DSCNN-LSTMs: A Lightweight and Efficient Model for Epilepsy Recognition

Epilepsy is the second most common disease of the nervous system. Because of its high disability rate and long disease course, it is a worldwide medical and public health problem, so timely detection and treatment are very important. Currently, medical professionals rely on their own diagnostic experience to identify seizures by visual inspection of the electroencephalogram (EEG), a process that is time-consuming, laborious, and cumbersome. Machine learning-based methods have recently been proposed for epilepsy detection, which can help clinicians make rapid and correct diagnoses. However, these methods often require features to be extracted from the EEG signals beforehand. In addition, feature selection often requires domain knowledge, and the choice of features has a significant impact on classifier performance. In this paper, a model combining a one-dimensional depthwise separable convolutional neural network with long short-term memory networks (1D DSCNN-LSTMs) is proposed to identify epileptic seizures by autonomously extracting features from raw EEG. The performance of the proposed 1D DSCNN-LSTMs model is verified on the UCI dataset by cross-validation and time complexity comparison. In comparison with previous models, the experimental results show that the highest recognition rates for binary and five-class classification are 99.57% and 81.30%, respectively. It can be concluded that the proposed 1D DSCNN-LSTMs model is an effective method for identifying seizures from EEG signals.


Introduction
According to the World Health Organization (WHO), epilepsy is the second most common disease of the nervous system after stroke, affecting about 50 million people around the world [1]. Epilepsy is a transient dysfunction of the central nervous system caused by the abnormal discharge of brain neurons [2]. Seizures can cause loss of control over part or all of the body, loss of consciousness, and even death. Moreover, seizures are unpredictable, which may have serious economic, physiological, and psychological impacts on patients and place a huge burden on their families. The Global Epilepsy Report, published by the WHO in 2019, points out that 25% of epilepsy cases can be prevented early, and that 70% of epilepsy patients can become seizure-free with low-cost and effective drugs. Therefore, early detection and diagnosis are of great significance for improving the outcome of epilepsy treatment and the quality of life of patients.
Electroencephalography (EEG) is a method of recording brain activity using electrophysiological indicators. It is formed by the sum of the postsynaptic potentials generated synchronously by a large number of neurons during brain activity. Electroencephalography is a commonly used non-invasive method to monitor and diagnose epilepsy. Thus, the abnormal state of the brain [3] can be effectively identified. In order to diagnose a seizure, doctors need to have a long record of the patient's EEG signals. Electroencephalography signals usually have many different channels and artifacts, which pose some difficulties sequence training process. Moreover, the temporal characteristics of the EEG signals can be extracted by LSTMs. These features are mainly needed for modeling calculation, but also indirectly help neurologists in clinical diagnosis.
(3) The model requires little pre-processing of raw data. In the future, it may be combined with existing wearable technology and smartphones to detect and predict the development of epileptic seizures accurately, providing more universal applications for patients, caregivers, clinicians, and researchers.

EEG Data
The public UCI epilepsy recognition dataset was used in this paper [26]. The original data consist of five folders, each containing one hundred files. Each file is a 23.6 s EEG recording of one subject's brain activity, sampled as a time series of 4097 data points, where each data point is the EEG value recorded at a different time point. These segments were selected and cut out from continuous multi-channel EEG recordings after visual inspection for artifacts such as muscle activity or eye movements. In the UCI version, each 4097-point recording is divided into 23 segments of 178 data points, each lasting 1 s. This yields 23 × 500 = 11,500 EEG samples, each stored as a row of 178 data points followed by a final column holding the label Y ∈ {1, 2, 3, 4, 5}.
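The segmentation arithmetic above can be illustrated with a minimal sketch. It uses synthetic data in place of the real recordings, and it assumes that the three samples left over after cutting 23 × 178 = 4094 points are simply dropped; the function name `split_record` is hypothetical.

```python
import numpy as np

POINTS_PER_RECORD = 4097   # samples in one 23.6 s recording
CHUNKS_PER_RECORD = 23     # 1 s segments cut from each recording
POINTS_PER_CHUNK = 178     # samples per segment


def split_record(record):
    """Cut one recording into 23 non-overlapping 178-point segments."""
    record = np.asarray(record)
    # Assumption: the 3 trailing samples (4097 - 23 * 178) are discarded.
    usable = CHUNKS_PER_RECORD * POINTS_PER_CHUNK
    return record[:usable].reshape(CHUNKS_PER_RECORD, POINTS_PER_CHUNK)


# Synthetic stand-in for the 500 recordings (5 groups x 100 files).
rng = np.random.default_rng(0)
records = rng.normal(size=(500, POINTS_PER_RECORD))
segments = np.concatenate([split_record(r) for r in records])
print(segments.shape)  # (11500, 178)
```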
The EEG signals in Group A and Group B were recorded using standard cortical electrode placement protocol in five healthy volunteers who were in awake and relaxed state with their eyes open (A) and their eyes closed (B). Groups C, D, and E were obtained from intracranial EEG from epileptic patients. Group C represents the interictal EEG data from the hippocampal region. Group D represents the interictal EEG data from tumor tissues. Only group E represents the seizure activity EEG data for epileptic patients. All EEG signals were recorded using the same 128-channel amplifier system, with a standard electrode position scheme designed according to the international 10-20 system, using average common reference values. After the 12-bit analog-to-digital conversion, the data were continuously written to the disk of the data acquisition computer system at a sampling rate of 173.61 Hz. The bandpass filter was set to 0.53 to 40 Hz. The original datasets were preprocessed by the UCI, which created the data in CSV file format to simplify access to the data. This was described in detail in the literature [26].
The dataset covers five states: Groups A and B are scalp EEG recordings, whereas Groups C, D, and E were recorded with intracranial implanted electrodes. The difference between the raw EEG waveform in the seizure state and in the normal state is easy to observe, while the differences among the waveforms of the non-seizure states are difficult to observe. Therefore, in order to evaluate the performance of our approach comprehensively, both a binary and a five-class epilepsy recognition task are considered in this paper. The binary task separates seizures from all other states, and the five-class task distinguishes all five states in the dataset. The five EEG signals are visualized in Figure 1, where the X-axis is time (s) and the Y-axis is amplitude (mV). The EEG signals from open or closed eyes and healthy brain areas have good amplitude characteristics, whereas the EEG recorded during seizures is the most periodic and has the highest amplitude, caused by the hypersynchronous activity of a large number of neurons [26].

Data Pre-Processing
The UCI dataset has already been pre-processed and restructured, so the only remaining pre-processing steps are normalization and label encoding. The EEG signal data are normalized to improve the convergence speed of the model: the data are divided by 255, which brings the inputs to a common scale at the input layer. In addition, since the model cannot work directly with non-numeric data, the class labels are converted to one-hot encoding, which turns categorical data into a uniform numeric format. One-hot encoding addresses the difficulty classifiers have with categorical attributes, expands the label into one feature per class, and simplifies the processing and computation of machine learning algorithms. After pre-processing, the dataset is divided into a training set and a test set and fed into the deep learning model.
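The two pre-processing steps can be sketched as follows. This is an illustrative sketch rather than the authors' code; it assumes labels are integers from 1 to 5 as in the CSV file, and the helper names `normalize` and `one_hot` are hypothetical.

```python
import numpy as np


def normalize(signals, scale=255.0):
    """Divide raw EEG values by a constant, as described in the text."""
    return np.asarray(signals, dtype=np.float64) / scale


def one_hot(labels, num_classes=5):
    """Map integer labels 1..num_classes to one-hot row vectors."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes))
    encoded[np.arange(labels.size), labels - 1] = 1.0  # labels start at 1
    return encoded


x = normalize([[255, -510, 0]])
y = one_hot([1, 5, 3])
print(x)  # [[ 1. -2.  0.]]
print(y)
```

Note that EEG amplitudes can be negative and exceed 255 in magnitude, so this scaling does not bound the values to [0, 1]; it only reduces their range.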

1D-CNN and 1D-DSCNN
The CNN has been proven to achieve good results in decoding brain signals. As a mature neural network architecture, the CNN is well suited to automatic feature learning. It is an end-to-end learning method that can learn local patterns directly from data without any prior feature engineering. The CNN is a feedforward neural network whose structure has great advantages in feature extraction and learning, and it performs well in many applications such as image classification, target detection, and medical image analysis. The main idea of the CNN is to extract local features from its input and pass them through successive layers to build more complex features. The CNN is generally composed of convolution layers, pooling layers, and fully connected layers. A convolution layer contains a certain number of convolution kernels that are convolved with the input signal. An activation function is then applied to make the result of the convolution nonlinear; in the one-dimensional CNN model, the rectified linear unit (ReLU) is used. The pooling layer, also known as the down-sampling layer, pools the output of the convolution layer to retain a higher-level representation. Both maximum pooling and global average pooling are used in our model. After the signals pass through the convolution layer and the pooling layer, the high-level features are fed into the fully connected layer for final classification.
The DSCNN is proposed in the literature [27] and underlies the high-performance MobileNets structure. Its basic principle is to split the standard convolution into a depthwise convolution followed by a pointwise convolution, where the pointwise convolution mixes the output channels. This factorized convolution can significantly reduce the computational complexity without losing accuracy. The DSCNN effectively decomposes traditional convolution by separating the spatial filtering and feature generation mechanisms. It is defined by two separate layers: a lightweight depthwise convolution for spatial filtering and a 1 × 1 pointwise convolution for feature generation. Specifically, in the depthwise step each convolution kernel has a single channel and is responsible for one channel of the feature map, so each channel is convolved by exactly one kernel. After the depthwise convolution, the number of channels in the output feature map is therefore the same as in the input layer. The 1 × 1 pointwise convolution can then reduce or raise the number of channels of the feature map by forming weighted combinations of the previous layer's feature maps in the depth direction. The size of the newly generated feature map is consistent with the input data, and the main function of this step is to combine the feature information of each channel. Since the EEG signals in this experiment are one-dimensional, and the convolution filters and feature maps are likewise one-dimensional, one-dimensional convolution with multiple filters is adopted in this paper.
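To make the two-step decomposition concrete, the following NumPy sketch implements a one-dimensional depthwise separable convolution. It is an illustration under stated assumptions, not the authors' implementation: it uses stride 1 and no padding ("valid"), omits biases and activation, and all function names are hypothetical.

```python
import numpy as np


def depthwise_conv1d(x, dw_kernels):
    """Filter each input channel with its own kernel.

    x: (length, channels); dw_kernels: (kernel_size, channels).
    Stride 1 and no padding are assumed.
    """
    length, channels = x.shape
    k = dw_kernels.shape[0]
    out = np.empty((length - k + 1, channels))
    for c in range(channels):
        # np.correlate slides the kernel without flipping it,
        # which matches the "convolution" used in deep learning layers.
        out[:, c] = np.correlate(x[:, c], dw_kernels[:, c], mode="valid")
    return out


def pointwise_conv1d(x, pw_weights):
    """1 x 1 convolution: mix channels at every time step.

    x: (length, channels_in); pw_weights: (channels_in, channels_out).
    """
    return x @ pw_weights


def separable_conv1d(x, dw_kernels, pw_weights):
    return pointwise_conv1d(depthwise_conv1d(x, dw_kernels), pw_weights)


rng = np.random.default_rng(0)
x = rng.normal(size=(178, 1))    # one EEG segment, single channel
dw = rng.normal(size=(3, 1))     # kernel size 3, one kernel per input channel
pw = rng.normal(size=(1, 64))    # expand to 64 output channels
features = separable_conv1d(x, dw, pw)
print(features.shape)  # (176, 64)
```

The depthwise step keeps the channel count unchanged, and only the pointwise step changes it, which mirrors the description above.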
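For standard convolution, let the input feature map have dimensions $(D_F, D_F, M)$, the convolution kernels $(N, D_K, D_K, M)$, and the output feature map $(D_G, D_G, N)$. The computational cost of standard convolution is then $D_K \cdot D_K \cdot M \cdot N \cdot D_G \cdot D_G$, whereas depthwise separable convolution costs $D_K \cdot D_K \cdot M \cdot D_G \cdot D_G + M \cdot N \cdot D_G \cdot D_G$, the sum of the depthwise and pointwise steps. The process of standard convolution and depthwise separable convolution is shown in Figure 2, and the ratio of the two costs is given by Formula (1):

$$\frac{D_K \cdot D_K \cdot M \cdot D_G \cdot D_G + M \cdot N \cdot D_G \cdot D_G}{D_K \cdot D_K \cdot M \cdot N \cdot D_G \cdot D_G} = \frac{1}{N} + \frac{1}{D_K^2} \quad (1)$$

It can be concluded that depthwise separable convolution is a much lighter convolution network.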
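The cost comparison can be checked numerically for the one-dimensional case, where the kernel has a single spatial dimension and the ratio becomes 1/N + 1/D_K rather than 1/N + 1/D_K². The layer sizes below (32 input channels, output length 176) are hypothetical values chosen only for illustration.

```python
def standard_conv1d_cost(kernel_size, in_channels, out_channels, out_length):
    """Multiply-accumulate count of a standard 1D convolution."""
    return kernel_size * in_channels * out_channels * out_length


def separable_conv1d_cost(kernel_size, in_channels, out_channels, out_length):
    """Multiply-accumulate count of depthwise plus pointwise 1D convolution."""
    depthwise = kernel_size * in_channels * out_length
    pointwise = in_channels * out_channels * out_length
    return depthwise + pointwise


# Hypothetical layer: kernel size 3, 32 -> 64 channels, 176 output steps.
k, m, n, length = 3, 32, 64, 176
standard = standard_conv1d_cost(k, m, n, length)
separable = separable_conv1d_cost(k, m, n, length)
ratio = separable / standard
print(standard, separable, round(ratio, 4))  # 1081344 377344 0.349
```

For this configuration the separable layer needs roughly one third of the multiply-accumulate operations of the standard layer.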

Long Short-Term Memory Networks
Long short-term memory networks are a special kind of recurrent neural network (RNN). As training time and the number of network layers increase, exploding or vanishing gradients occur easily in the RNN, which can make it unable to process long sequences and therefore unable to capture long-distance information. Long short-term memory networks are used in text generation, machine translation, speech recognition, the generation of image descriptions and video tags, and so on. As shown in Figure 3, LSTMs have three main gates, namely the input gate, the output gate, and the forget gate. At each time step, information from the input layer first passes through the input gate, whose opening and closing determines whether any information is written to the memory cell at that moment. Whether information is sent out of the memory cell at a given time depends on the output gate. Whether a value in the memory cell is forgotten is controlled by the forget gate: when the forget gate closes, the value in the memory cell is cleared.
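The mathematical expressions of LSTMs units are defined as follows:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad (2)$$
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad (3)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \quad (4)$$
$$C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t \quad (5)$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad (6)$$
$$h_t = o_t \times \tanh(C_t) \quad (7)$$

where σ represents the sigmoid activation function, tanh is the hyperbolic tangent activation function, × represents element-wise multiplication, and + represents addition. $W$ and $b$ denote the weights and biases of the respective layers, and the other variables carry temporal characteristic information.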
The first step in LSTMs is to determine what information will be discarded from the cell state. This decision is made by a layer called the forget gate $f_t$. The gate reads $h_{t-1}$ and $x_t$, usually with the sigmoid as its activation function, and outputs a value between 0 and 1 for each number in the cell state $C_{t-1}$. A value of 1 means the entry is completely retained and 0 means it is completely discarded; most gate values of a trained LSTM are very close to 0 or 1, and few lie in between. The second step is to determine what new information is stored in the cell state, and it has two parts. First, a sigmoid layer called the input gate layer decides which values to update; its output is $i_t$. Then, a tanh layer creates a new candidate vector $\tilde{C}_t$, which is obtained from the input data $x_t$ and the hidden state $h_{t-1}$ through a neural network layer and can be added to the state. Once these steps have determined what to update, the old cell state $C_{t-1}$ is updated to $C_t$: the old state is multiplied by $f_t$ to discard the unwanted information, and $i_t \times \tilde{C}_t$ is added to obtain the new cell state. Finally, the output value must be determined, where $o_t$ represents the output gate. The output is based on the cell state but is a filtered version of it. First, a sigmoid layer is run to determine which part of the cell state will be output. Next, the cell state is passed through tanh to obtain a value between −1 and 1, which is multiplied by the output of the sigmoid gate. Thus $h_t$ is obtained from the output gate $o_t$ and the cell state $C_t$, where $o_t$ is calculated in the same way as $f_t$ and $i_t$.
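The gate-by-gate update described above can be traced in a small NumPy sketch of a single LSTM time step. This is a generic textbook formulation, not the authors' implementation; the stacked weight layout (forget, input, candidate, output) and the names `lstm_step`, `W`, and `b` are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W maps the concatenation [h_prev, x_t] to four stacked pre-activations
    (forget, input, candidate, output); b is the matching bias vector.
    """
    hidden = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0 * hidden:1 * hidden])      # forget gate
    i_t = sigmoid(z[1 * hidden:2 * hidden])      # input gate
    c_tilde = np.tanh(z[2 * hidden:3 * hidden])  # candidate cell state
    o_t = sigmoid(z[3 * hidden:4 * hidden])      # output gate
    c_t = f_t * c_prev + i_t * c_tilde           # discard old, add new information
    h_t = o_t * np.tanh(c_t)                     # filtered view of the cell state
    return h_t, c_t


rng = np.random.default_rng(0)
hidden, features = 64, 1
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + features))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(10, features)):  # ten time steps of a toy sequence
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)  # (64,)
```

Because the hidden state is a sigmoid output multiplied by a tanh output, every entry of `h` lies strictly between −1 and 1.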

1D DSCNN-2LSTMs Model
The one-dimensional DSCNN-2LSTM model proposed in this paper consists of one input layer, one depthwise separable convolution layer, one pooling layer, two fully connected layers, two LSTM layers, and one output layer. A dropout layer is added to prevent over-fitting. The detailed model structure is shown in Figure 4. It can be seen that the DSCNN-2LSTM model uses very few neurons, which is the advantage of depthwise separable convolution. Table 1 shows the parameters of the DSCNN-2LSTM architecture.
First, the pre-processed 1D EEG data are fed directly into the input layer of the model, with an input dimension of 178 × 1. Then, a one-dimensional depthwise separable convolution is applied to the input data to extract features of the EEG signals. Specifically, in separable Conv1D Layer1 the number of one-dimensional convolution kernels is 64, the kernel size is 3 × 1, and the stride is 1. The kernel size determines the receptive field of the convolution. If the kernel is too small, the receptive field is insufficient to capture associations between neighboring samples over a larger range; if it is too large, associations between locally adjacent samples are easily ignored. A kernel that is either too small or too large therefore adversely affects the classification results. Over many tests, the appropriate kernel size was found to be 3 × 1. The nonlinear activation is the rectified linear unit (ReLU), which helps to avoid the over-fitting problem. The ReLU formula is shown in Equation (8):

$$f(x) = \max(0, x) \quad (8)$$

After the one-dimensional convolution layer, the data enter the pooling layer, whose function is to retain the main features while reducing the number of parameters (lowering the dimensionality) and the amount of computation, so as to prevent over-fitting.
The output of the pooling layer is then passed to a fully connected layer (FC Layer1), where a dropout layer is added to prevent over-fitting. After passing through FC Layer1, the output features are fed into the LSTMs layers, which are capable of learning useful information from EEG time series data. There are 64 units in both LSTMs Layer1 and LSTMs Layer2. After the LSTMs layers, the output features are sent to a second fully connected layer, FC Layer2, which has 32 neurons and holds the final features extracted by the whole model. These features are then fed into the softmax layer for classification. The softmax classifier first applies the exponential function to the prediction results of the model, which ensures that every value is non-negative. To make the predictions sum to 1, each exponentiated result is then divided by the sum of all exponentiated results, so that it can be understood as its share of the total. This gives an approximate probability: the final feature vector is mapped to values in (0, 1) whose sum is 1, satisfying the properties of a probability distribution. When the output node is finally selected, the node with the maximum probability is taken as the predicted class. Softmax is shown in Equation (9), where $z_i$ denotes the model output for class $i$:

$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}} \quad (9)$$
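The softmax steps described above (exponentiate, normalize, pick the largest probability) can be sketched in a few lines of NumPy. Subtracting the maximum before exponentiating is a standard numerical-stability measure that does not change the result; the example scores are made up for illustration.

```python
import numpy as np


def softmax(logits):
    """Map raw class scores to probabilities that sum to 1."""
    logits = np.asarray(logits, dtype=np.float64)
    # Shifting by the maximum avoids overflow and leaves the output unchanged.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Hypothetical scores for the five classes.
probs = softmax([2.0, 1.0, 0.1, -1.0, 0.5])
predicted_class = int(np.argmax(probs))  # index of the most probable class
print(probs.round(3), predicted_class)
```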
In this experiment, the categorical cross-entropy loss (categorical_crossentropy) and the Adam optimizer are used. Cross-entropy evaluates the difference between the probability distribution produced by the trained model and the true distribution. It describes the distance between the actual and the expected output probabilities: the smaller the cross-entropy, the closer the two distributions are. The Adam optimizer combines the advantages of two optimization algorithms, AdaGrad and RMSProp, and computes the update step size from estimates of the first and second moments of the gradient. Adam is chosen as the optimizer because it is a simple and computationally efficient stochastic gradient descent technique [28,29]. Empirical results suggest that Adam compares favorably with other stochastic optimization methods. The detailed configuration of the model can be adjusted according to the specific identification task. The cross-entropy loss function and its derivative transformation are given in Equations (10)-(13).
where y is the expected output; a is the actual output of the neuron.
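As an illustration of how the loss behaves, the sketch below computes the categorical cross-entropy in its usual textbook form, the mean over samples of −Σ y·ln(a), with y the one-hot expected output and a the predicted probabilities. This form is assumed here and is not copied from Equations (10)-(13); the clipping constant `eps` is an implementation detail added to avoid taking the logarithm of zero.

```python
import numpy as np


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


y = [[1, 0], [0, 1]]
good = categorical_cross_entropy(y, [[0.9, 0.1], [0.2, 0.8]])  # close to targets
bad = categorical_cross_entropy(y, [[0.4, 0.6], [0.7, 0.3]])   # far from targets
print(good < bad)  # True
```

Predictions closer to the expected output give a smaller loss, which is the property the text relies on.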

Evaluation Indicators
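Now, suppose that our classification target has only two categories, which are counted as positive and negative, respectively. Let TP and TN denote the numbers of positive and negative samples that are classified correctly, and let FP and FN denote the numbers of samples wrongly classified as positive and as negative. The four evaluation indicators used in this paper, namely accuracy, precision, recall, and F1 score, are then defined in Equations (14) to (17):

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (14)$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (15)$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (16)$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (17)$$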
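A minimal sketch of how these four indicators can be computed from predicted and true labels is given below. It uses the standard definitions of accuracy, precision, recall, and F1 score; the function name `binary_metrics` and the toy labels are illustrative only.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 score for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example: 3 seizure samples (1) and 5 non-seizure samples (0).
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print(metrics)
```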

Experimental Setup
In this experiment, the dataset was split into 90% for training and 10% for testing. The proposed model was compared with DNN, CNN, DSCNN, LSTMs, and Bi-LSTMs and their combined models. Each model was trained for 100 epochs with a batch size of 32. The data were randomly shuffled with the same random seed before being fed to each network model. Ten-fold cross-validation was also used to validate the performance of each model: the data were divided into ten parts, nine of which were used in turn as the training set and one as the test set, and the mean of the ten results was used as the estimate of the algorithm's accuracy. DSCNN-LSTMs and all of the above networks were implemented in Python 3.7 on a 12th Gen Intel(R) Core(TM) i9-12900KF 3.19 GHz processor.
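The ten-fold procedure can be sketched as an index generator. This is a plain, unstratified split written for illustration; whether the original experiments stratified the folds by class is not stated, and the function name `k_fold_indices` and the seed value are assumptions.

```python
import numpy as np


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)  # fixed seed so every model sees the same folds
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, test_idx


splits = list(k_fold_indices(11500, k=10, seed=0))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))  # 10350 1150
```

Each sample appears in exactly one test fold, so averaging the ten test accuracies uses every sample once for evaluation.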

LSTMs Layer Selection
The original LSTMs model consists of a single LSTMs layer followed by an output layer. Stacking LSTMs means taking the output of one LSTMs layer as the input of the next, which makes the model deeper and the extracted features more abstract, and can lead to more accurate prediction. In order to choose the appropriate number of LSTM layers, one-, two-, and three-layer LSTMs are compared through 10-fold cross-validation, as shown in Tables 2 and 3. The tables show that stacking two LSTMs layers gives the highest average accuracy in both the two-class and five-class tasks, at 99.46% and 77.58%, respectively, and that accuracy begins to decline with more than two layers. Therefore, it can be concluded that DSCNN with two stacked LSTMs layers achieves the highest classification accuracy.

Resolve Class Imbalances
Class imbalance refers to the situation in which the number of training examples differs greatly between the classes of a classification task. In general, if the imbalance is severe, the classifier will struggle to meet the classification requirements. Therefore, before building a classification model, it is necessary to deal with class imbalance. In the binary task, the number of seizure samples is clearly far smaller than the number of non-seizure samples. Common solutions to class imbalance include expanding the dataset, undersampling, and oversampling. Machine learning uses existing data to estimate the distribution of the entire data; therefore, more data can yield more distribution information. Undersampling reduces the data of the large class until its size is close to that of the other classes before learning; however, randomly discarding samples of the large class may lose important information. Oversampling increases the amount of data in the small classes by resampling them. However, these methods all affect the classification results to some degree. Instead, the skewed class distribution can be taken into account by modifying the training algorithm, which is achieved by giving different weights to the majority class and the minority class. During training, these weights change how much each class contributes to the loss. The overall purpose is to penalize the misclassification of the minority class by setting a higher class weight for it while lowering the weight of the majority class. The class weight is shown in Equation (18):

$$w_j = \frac{n}{k \cdot n_j} \quad (18)$$

where $n$ is the total number of samples, $k$ is the number of classes, and $n_j$ is the number of samples in class $j$. In the binary classification task, with 2300 seizure samples and 9200 samples of the other classes out of 11,500, this formula gives class weights of 2.5 for epilepsy and 0.625 for the other classes. The reliability of our trained model is further verified by adjusting the class weights and applying ten-fold cross-validation. The experimental results are shown in Table 4.
It can be seen from Tables 2 and 4 that our model has high classification accuracy regardless of whether the class weight is adjusted or not, and the average accuracy exceeds 99%, which shows that our model is suitable for the EEG prediction of epilepsy and has good diagnostic performance.
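The reported class weights of 2.5 and 0.625 can be reproduced with a short calculation. The sketch assumes the "balanced" weighting rule, total samples divided by the number of classes times the class count, which is consistent with the reported values, and it assumes 2300 seizure segments and 9200 others, as implied by 11,500 segments split evenly over five groups. The label values 1 and 0 are placeholders.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Placeholder labels: 1 = seizure, 0 = all other states.
labels = [1] * 2300 + [0] * 9200
weights = balanced_class_weights(labels)
print(weights)  # {1: 2.5, 0: 0.625}
```

With these weights, each class contributes the same total weight to the loss (2300 × 2.5 = 9200 × 0.625 = 5750).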

Ablation Experiments
Ablation experiments help identify which components of a model contribute to improvements in accuracy and speed. They are conducted here to evaluate the performance of our algorithm on the two-class and five-class recognition tasks. The results of a single ablation experiment are shown in Tables 5 and 6. The four evaluation indicators show that, in the binary classification task, the performance of the DSCNN-2LSTMs model differs little from that of the DSCNN model but is better than that of the LSTMs model. On the five-class recognition task, the DSCNN-2LSTMs model performs much better than the other two models. In order to further verify the superiority of the DSCNN-2LSTMs model for epilepsy classification, ten-fold cross-validation is also performed. The experimental results are shown in Tables 7 and 8. The average accuracy of our proposed model is still higher than that of the other two models, which shows that the combined model of DSCNN and LSTMs performs better than either the DSCNN model or the LSTMs model alone.

Binary Recognition Task
To further verify the classification performance of the proposed DSCNN-2LSTMs for seizure detection, it is compared with other deep learning models and with traditional machine learning models. The same random seed is used to ensure that the training and test data are consistent across models. The deep learning models include the convolutional neural network (CNN), the deep neural network (DNN), bidirectional LSTMs, and their combined models. Bidirectional LSTMs are an extension of traditional LSTMs that train two models on the input sequence: the first on the original input sequence and the second on a reversed copy of it. The traditional machine learning models include AdaBoost, K Nearest Neighbors (KNN), Random Forest, and Support Vector Machine (SVM). The results are shown in Table 9. When evaluated on the validation set, the DSCNN-2LSTMs model proposed in this paper performs best, with an accuracy of 99.57%, a precision of 98.79%, a recall of 98.79%, and an F1 score of 98.79%. The accuracy rate of Bidirectional DSCNN-LSTMs is 99.57%, second only to DSCNN-2LSTM. The DNN model has the worst performance among the deep learning models, with 96.35% accuracy, 95.18% precision, 87.50% recall, and an F1 score of 91.18%. The overall performance of the four traditional machine learning models is weaker than that of the deep learning models, which indicates that deep learning models are more suitable for seizure detection than traditional machine learning models. Among them, SVM performed the worst, with an accuracy of 82.26%, a precision of 85.55%, a recall of 82.23%, and an F1 score of 75.78%.
Table 9. The performance of DSCNN-2LSTMs and other models on binary classification tasks.
The time complexity of the proposed DSCNN-2LSTMs for epilepsy detection is also compared with that of the other models, where time complexity refers to the amount of computation required to execute the algorithm.
All models are tested individually in the same environment, and the mean training time per iteration and the mean training time per step are calculated for each model over ten iterations. The experimental results are shown in Figure 5. The DNN has the lowest time complexity, because the DNN model contains only neural units, which requires far less computation than the other models. The time complexity of the CNN model is slightly higher than that of the DNN. Compared with LSTMs, the computational cost of bidirectional LSTMs is greatly increased because bidirectional LSTMs need to obtain both forward and backward information. In contrast, our proposed DSCNN-LSTMs model requires less training time while maintaining accuracy.


Five-Class Recognition Task
Similarly, the training and testing of the above models are analyzed on the five-class classification task. The test performance of the models is shown in Table 10. The data show that the 1D DSCNN-2LSTMs model proposed in this paper achieves the best recognition performance on both recognition tasks. The accuracy of DSCNN-2LSTMs is 81.30%, the precision is 79.21%, the recall is 79.95%, and the F1 score is 79.59%. The CNN-Bidirectional LSTMs model performed worse than DSCNN-2LSTMs, and SVM performed the worst in the five-class task, with an accuracy of 26.39%, a precision of 33.31%, a recall of 26.40%, and an F1 score of 26.79%. We also calculate the time complexity of the deep learning models in the five-class classification task, as shown in Figure 6. The CNN-Bidirectional LSTMs model has the highest time complexity, far higher than the other models. The time complexity of DSCNN-2LSTMs ranks in the middle among all these models, but it is the lowest among the combined models. Considering both the evaluation indicators and the time complexity, DSCNN-2LSTMs is superior to the other models.

Comparison with Other Models under Cross-Validation
The DSCNN-2LSTMs model proposed in this study achieves good results on the binary and quintuple classification tasks. To further verify the accuracy advantage and stability of the model on both tasks, we compared the performance of each model using ten-fold cross-validation. The experimental results are shown in Tables 11 and 12. In the binary recognition task, our model has an average accuracy of 99.46%, which is the highest among all models. The average accuracy of the DSCNN-Bidirectional LSTMs model is 98.73%, second only to DSCNN-2LSTMs. SVM performs the worst in the binary recognition task, with an average accuracy of 81.88%. In the five-class recognition task, the CNN-Bidirectional LSTMs model has the highest average accuracy, 77.94%, which is 0.36 percentage points higher than that of our proposed model. A possible reason is that a standard CNN performs more convolution computation than a DSCNN, and bidirectional LSTMs capture both forward and backward information, so more features are extracted than in our model, resulting in higher accuracy. However, for a similar accuracy, its time complexity is much greater than that of our model. SVM again performs the worst in the quintuple recognition task, with an average accuracy of 26.69%, far lower than the other models. The experimental results show that our model is very effective for the binary and quintuple classification of epilepsy.
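The ten-fold cross-validation procedure used for this comparison can be sketched as follows. This is only an illustration under stated assumptions: the data are shuffled once with a fixed seed and split into ten nearly equal folds, and the helper names `k_fold_indices` and `cross_validate`, as well as the `train_and_score` callback, are hypothetical. Whether the original experiments shuffled or stratified the folds is not stated in the text.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: shuffle once, fixed seed
    # Spread the remainder so that fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(n_samples, train_and_score, k=10, seed=0):
    """Average the score of `train_and_score(train_idx, test_idx)` over k folds.

    `train_and_score` is a hypothetical callback that trains a fresh model on
    the training indices and returns its accuracy on the test indices.
    """
    scores = [train_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores), scores


if __name__ == "__main__":
    # Dummy callback standing in for model training; it returns the test-fold share.
    mean_score, fold_scores = cross_validate(103, lambda tr, te: len(te) / 103)
    print(mean_score, fold_scores)
```

Each sample appears in exactly one test fold, so the mean over the ten folds summarizes performance on the whole dataset, and the spread of the per-fold scores gives an indication of stability.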

Conclusions
This paper presents a one-dimensional depthwise separable convolutional neural network for the detection and diagnosis of epilepsy based on EEG signals. The experimental results show that the proposed method consumes fewer computing resources, achieves high-precision classification of seizures, and can use raw EEG data for real-time detection, which is helpful for the development of wearable and implantable EEG detection devices. However, the model cannot predict seizures in advance. Future studies could establish a multi-channel electrode DNN [30] or a multi-bipolar-channel input CNN [31] and a multi-classifier ensemble learning model to classify tasks under non-fixed-scale input. Pre-seizure EEG data could also be collected to train a model that predicts seizures in advance, which is crucial for epilepsy patients. Deep learning could also be applied to predict clinical drug response [32] and the prognosis of epilepsy surgery [33], so as to further improve the outcomes and living conditions of patients.