Detection Method of Epileptic Seizures Using a Neural Network Model Based on Multimodal Dual-Stream Networks

Epilepsy is a common neurological disorder, and its diagnosis mainly relies on the analysis of electroencephalogram (EEG) signals. However, raw EEG signals contain limited recognizable features. To enrich the recognizable features in the network input, the differential features of the signal, together with the amplitude spectrum and phase spectrum in the frequency domain, are extracted to form a two-dimensional feature vector. To recognize these multimodal features, a neural network model based on a multimodal dual-stream network is proposed, which mixes one-dimensional convolution, two-dimensional convolution and LSTM neural networks to extract the spatial features of the two-dimensional EEG vectors and the temporal features of the signals, respectively; by combining the advantages of the two streams, the hybrid network extracts temporal and spatial features simultaneously. In addition, a channel attention module is used to focus the model on seizure-related features. Finally, multiple sets of experiments were conducted on the Bonn and New Delhi datasets, achieving the highest test-set accuracies of 99.69% and 97.5%, respectively, verifying the superiority of the proposed model in the task of epileptic seizure detection.


Introduction
Epilepsy is a common neurological disorder characterized by abnormal electrical activity in the brain. These abnormal electrical activities can trigger various forms of seizures, which vary from person to person. Seizures may cause generalized convulsions, which are violent, involuntary contractions and spasms of muscles throughout the body. This type of seizure is often called a generalized seizure. However, epileptic seizures do not always manifest as generalized convulsions. Some types of seizures, such as focal seizures, may be limited to one part of the body and manifest as localized muscle twitching or abnormal sensations. In addition, some seizures may include brief loss of consciousness, abnormal behavior, or confusion without obvious convulsions. Epilepsy can manifest in many different ways, depending on where in the brain the abnormal electrical activity occurs and how it spreads [1]. The diagnosis and monitoring of epilepsy rely heavily on electroencephalography (EEG), a non-invasive brain testing technique that measures electrical signals in the brain through electrodes attached to the scalp. EEG can capture changes in brain activity during seizures. However, EEG signals are complex, noisy, and high-dimensional, which poses a challenge for accurate and efficient classification of EEG signals [2].
EEG-based detection and diagnosis of epilepsy can be divided into the following steps: signal preprocessing, feature extraction, feature selection and classification [3].
Signal preprocessing is carried out to remove noise and interference from the EEG signal and improve the signal quality. Then, useful features are extracted from the EEG signal that reflect its time-domain, frequency-domain, or time-frequency-domain characteristics. However, not all of the extracted features are favorable for epilepsy diagnosis, so feature selection is also needed to select the optimal subset from the extracted features, reducing the feature dimensionality and computational complexity. Finally, the EEG signals are categorized as either normal or abnormal based on these features, which also helps to further differentiate the type and degree of epilepsy. In recent years, researchers have proposed many intelligent epilepsy diagnostic methods, some based on traditional signal processing and some based on machine learning and deep learning.
Conventional methods usually require an artificially designed approach to feature extraction and selection [4][5][6][7][8][9][10][11]. Guangpeng et al. [12] extracted the time-frequency feature maps of interictal EEG signals, then used a single-channel method to reduce the network parameters, and finally used a convolutional neural network to predict epilepsy, with a prediction accuracy of 87.9%. Cansel et al. [13] used the discrete wavelet transform to process EEG signals in an automated discriminative method that distinguishes temporal lobe epilepsy (TLE) patients from psychogenic nonepileptic seizure (PNES) patients, quickly and accurately determining different epilepsy types. Sirin et al. [14] investigated the interaction between sleep architecture and seizure probability, using dual-channel subcutaneous EEG signals to account for changes in brain dynamics in each patient. Seyed Morteza et al. [15] also used the discrete wavelet transform to decompose the EEG signal; however, their approach was based on the Modified Binary Salp Swarm Algorithm (MBSSA) to extract time-domain features, thus avoiding manual and time-consuming computations. Ying et al. [16] used a wearable EEG monitoring device to capture EEG and automated epilepsy detection using support vector machines, providing a new approach to real-time monitoring. Traditional methods have certain limitations and require setting up appropriate recognition methods for specific scenarios, which hinders the rapid diagnosis of complex and variable epilepsy types [17][18][19].
Deep learning methods can automatically learn feature representations from raw signals. They can also process multi-channel EEG signals, utilizing the spatial relationships between the signals [20][21][22][23][24]. Zixu et al. [25] developed a unified framework for early-seizure detection and epilepsy diagnosis, mainly using autoregressive moving average models and support vector machine classifiers, which achieved classification accuracies of 93% and 94%, respectively. Weidon et al. [26] used multichannel EEG signals to construct a multilayer deep convolutional neural network model, effectively utilizing the time, frequency, and channel information of the EEG to extract seizure-related features, which greatly improved the diagnostic accuracy of epilepsy. Abdelhamid et al. [27] proposed a framework combining deep learning and EEG signal processing, without any manual feature extraction, for the detection of seizures and non-seizures; the combination of one-dimensional convolutional neural networks, recurrent neural networks, and attention mechanisms achieved high recognition accuracy on several publicly available datasets. Mingyang et al. [28] proposed a neural network based on wavelet envelope analysis, which combines the discrete wavelet transform with envelope analysis to extract important features from EEG signals. Aayesh et al. [23] performed time-domain, frequency-domain and nonlinear analysis on the signal to extract pattern features; they performed feature selection on the extracted features to obtain more discriminative features and constructed a fuzzy machine learning classifier for epileptic seizure detection.
Combining these methods, for the feature extraction of EEG signals, in this study we used the differential features of the EEG signal, the amplitude spectrum and the phase spectrum to jointly extract the features of the EEG signal. Differential feature extraction is a commonly used signal processing method that captures changes in a signal by calculating the difference between consecutive time points. In EEG signal processing, the trend of the signal can be obtained by calculating the difference between adjacent time points, thereby extracting the differential features and capturing the instantaneous or periodic changes in the signal. Frequency-domain analysis is the process of converting signals from the time domain to the frequency domain. The amplitude spectrum represents the amplitude of the signal at different frequencies, while the phase spectrum represents the phase information of the signal at different frequencies. In EEG signal processing, frequency-domain analysis can help reveal the different frequency components present in the signal, such as alpha waves, beta waves, etc., as well as the phase relationships between them, and help identify activity patterns at specific frequencies, leading to a better understanding and analysis of the characteristics of EEG signals. Differential feature extraction helps capture the instantaneous changes in the signal, while the amplitude spectrum and phase spectrum in the frequency domain provide information about the amplitude and phase of the signal at different frequencies. The combined use of these methods can describe the characteristics of EEG signals more comprehensively, providing richer feature information for subsequent signal analysis and processing.
The preprocessed data are fed into a neural network model based on a multimodal dual-stream network to process temporal features and spatial features, respectively. Specifically, the model is divided into two streams, one for processing temporal features and the other for processing spatial features. Each stream can adopt a network structure and algorithm suited to its respective features, improving the processing and representation of complex signals.
The remainder of this article is organized as follows. Section 2 introduces the EEG dataset and methods. Section 3 describes the experimental procedure and results. Section 4 concludes the paper and suggests some future directions.

Dataset

The University of Bonn Dataset
The Bonn EEG dataset is one of the public datasets widely used in brain-computer interface (BCI) and neuroscience research [29]. The dataset was created by the Center for Medical Epilepsy at the University of Bonn in Germany. It contains EEG data from 5 healthy people and 5 epilepsy patients, collected using the international 10-20 system EEG acquisition setup. It comprises a total of 5 data subsets, namely F, S, N, Z, and O. The data are described in Table 1 and visualized in Figure 1. The Bonn dataset is a single-channel dataset, in which each subset contains 100 data segments: each segment is 23.6 s long with 4097 data points, and the sampling frequency is 173.61 Hz.

Subsets Z and O were collected from a control group of 5 healthy individuals. The clips in Z are the EEG recorded with the subjects' eyes open, and the clips in O with the subjects' eyes closed. Subsets N, F, and S are intracranial EEG, collected from 5 patients who were diagnosed before surgery. Subset N comes from the intracranial hippocampal formation area during the patients' interictal period. Subset F comes from the intracranial lesion area during the interictal period. Subset S comes from the intracranial lesion area during the ictal period. In the experiment, Z, O, N, and F are regarded as one category and marked as the interictal period, and S (set E) is marked as the ictal period. The data are sliced into 2 s time windows, each window forming a single training sample.
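The slicing step above can be sketched as follows with NumPy. The window length of int(2 × 173.61) = 347 samples and the use of non-overlapping windows are assumptions for illustration; the paper does not state the exact sample count per window:

```python
import numpy as np

fs = 173.61          # Bonn sampling frequency, Hz
win = int(2 * fs)    # samples per 2 s window -> 347 (assumed rounding)
segment = np.zeros(4097)          # placeholder for one 23.6 s Bonn segment
n_win = len(segment) // win       # number of non-overlapping windows
samples = segment[: n_win * win].reshape(n_win, win)
print(samples.shape)  # (11, 347): 11 training samples from one segment
```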


New Delhi Dataset
The New Delhi dataset is a publicly available dataset created at the Center for Neurology and Sleep, Hauz Khas, New Delhi. The dataset contains EEG recordings of ten epilepsy patients [30]. Data were collected using a Grass Telefactor Comet AS40 amplification system at a sampling rate of 200 Hz. During the acquisition process, gold-coated scalp EEG electrodes were placed according to the 10-20 electrode placement system. The signal is filtered between 0.5 and 70 Hz and then divided into pre-ictal, interictal, and ictal segments.


Data Set Preprocessing
The EEG signal reflects the activity process of the brain. The amplitude of the EEG signal varies within the range of 2~100 µV, and the frequency range is 1~100 Hz. In this study, the EEG was divided into five frequency sub-bands. In general, delta waves often appear in the cerebral cortex during deep sleep. Specifically, this brain waveform, with a frequency between 0.5 and 4 Hz, is consistent with the deepest stage of non-rapid eye movement sleep and is associated with an extremely relaxed and restorative state of the brain and body. In contrast, theta waves, with frequencies between 4 and 8 Hz, usually appear in the shallow stages of sleep and during meditation, reflecting a transitional state between wakefulness and sleep, involving memory and learning processes. Alpha waves, with a frequency between 8 and 12 Hz, are clearly present in the cerebral cortex when a person is calm and unstressed, especially in the occipital area. This waveform is most prominent when resting with eyes closed or lightly relaxed, marking a state of being awake but relaxed. The frequency of beta waves is between 12 and 30 Hz. They generally appear when the frontal lobe is engaged in active thinking and are related to active cognitive activities and high concentration, commonly seen in problem solving, decision-making and reasoning. Finally, gamma waves, with frequencies above 30 Hz, typically occur when the brain is anxious or under emotional stress. Although this waveform is associated with high levels of cognitive function and information processing, gamma wave activity also increases significantly in states of stress or anxiety [31][32][33].
In this study, in order to better observe the different EEG signal characteristics of patients, the signal x is first converted from the time domain to the frequency domain. Its Fourier transform x 1 is the representation of the signal in the frequency domain, which contains the amplitude and phase information of x. By taking the absolute value of the Fourier transform result x 1 , we obtain the amplitude spectrum x 2 of the signal; by taking the angle of x 1 , the phase spectrum x 3 of the signal is obtained.
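As a minimal sketch of this step, the amplitude and phase spectra can be computed with NumPy's FFT (an assumed tooling choice; the paper does not specify its implementation):

```python
import numpy as np

x = np.array([1.0, 0.0, 0.0, 0.0])   # toy signal standing in for one EEG window

x1 = np.fft.fft(x)       # frequency-domain representation (complex-valued)
x2 = np.abs(x1)          # amplitude spectrum: magnitude at each frequency bin
x3 = np.angle(x1)        # phase spectrum: phase angle at each frequency bin

print(x2)  # all ones: a unit impulse has a flat magnitude spectrum
print(x3)  # all zeros: an impulse at n = 0 has zero phase
```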
Finally, the first-order difference x 4 and the second-order difference x 5 of the signal x are calculated, and a feature matrix [x, x 2 , x 3 , x 4 , x 5 ] is formed together with the original signal. On the other hand, the short-time Fourier transform is performed on the original signal to obtain the spectrogram data x 6 , which describe the signal at different frequencies over time. The EEG processing flow is shown in Figure 3.

Discrete Fourier Transform

The discrete Fourier transform (DFT) [34][35][36] converts a signal from the time domain to the frequency domain and represents the signal as a collection of frequency components. The discrete form of the DFT can be expressed as Formula (1), where x[n] is the discrete sample of the input signal, X[k] is the transformed signal, N is the number of samples of the signal, and i is the imaginary unit:

X[k] = Σ_{n=0}^{N−1} x[n] e^{−i 2πkn/N}, k = 0, 1, ..., N − 1. (1)
The STFT decomposes the signal into two dimensions: time and frequency. It segments the signal in time and applies the Fourier transform to each time segment to obtain the representation of the signal in frequency [37]. The principle is given by Formula (2), where X(t, ω) is the STFT of the time-domain signal x(t) at frequency ω, w(τ − t) is the window function (usually a Hanning window or a similar window), and ω is the angular frequency:

X(t, ω) = ∫ x(τ) w(τ − t) e^{−iωτ} dτ. (2)
The STFT is usually implemented in discretized form, replacing continuous time and frequency with discrete time and frequency. For discrete signals, the STFT can be expressed as Formula (3), where X[m, ω] is the STFT of the discrete time-domain signal x[n] at frequency ω, w[n − m] is the discrete window function, and m is the time index:

X[m, ω] = Σ_n x[n] w[n − m] e^{−iωn}. (3)
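A minimal NumPy sketch of the discrete STFT described above, assuming a Hanning window with the (illustrative) segment length and hop size below:

```python
import numpy as np

def stft(x, nperseg=64, hop=32):
    """Discrete STFT: slide a Hanning window over the signal and take the
    DFT of each windowed segment, giving X[m, ω] over time index m."""
    win = np.hanning(nperseg)
    n_frames = 1 + (len(x) - nperseg) // hop
    frames = np.stack([x[m * hop : m * hop + nperseg] * win
                       for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1)   # one spectrum per time frame

x = np.sin(2 * np.pi * 5 * np.arange(200) / 200)   # toy 200-sample signal
X = stft(x)
print(X.shape)   # (time frames, frequency bins)
```

`scipy.signal.stft` provides an equivalent, more featureful implementation in practice.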

First-Order Difference and Second-Order Difference
The difference operation refers to calculating the difference between each element in an array and its adjacent element to obtain a new array. When the result is the difference between the current data point and the next data point, it is called the forward difference. A positive result means that the function is rising at that point; a negative result means that the function is falling at that point. The forward difference is given by Formula (4):

Δx[n] = x[n + 1] − x[n]. (4)
When the result is the difference between the current data point and the previous data point, it is called the backward difference. Again, a positive result means that the function is rising at that point, and a negative result means that it is falling. The principle is shown in Formula (5):

∇x[n] = x[n] − x[n − 1]. (5)
The first-order difference is the operation of calculating the difference between each element in a sequence and its previous element. The second-order difference is the new sequence obtained by applying the difference operation twice. The forward second-order difference is shown in Formula (6):

Δ²x[n] = x[n + 2] − 2x[n + 1] + x[n]. (6)
The backward second-order difference is shown in Formula (7):

∇²x[n] = x[n] − 2x[n − 1] + x[n − 2]. (7)
For a sequence [a 1 , a 2 , a 3 , ..., a n ], its first-order difference can be expressed as [b 1 , b 2 , ..., b n−1 ], where b i = a i+1 − a i ; performing the difference operation again on this first-order difference sequence yields the second-order difference sequence [c 1 , c 2 , c 3 , ..., c n−2 ], where c i = b i+1 − b i .
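These difference features can be computed directly with NumPy, as a quick sketch:

```python
import numpy as np

a = np.array([1, 4, 9, 16, 25])   # toy sequence (squares)

b = np.diff(a)        # first-order forward difference, b_i = a_{i+1} - a_i
c = np.diff(a, n=2)   # second-order difference, c_i = b_{i+1} - b_i

print(b)  # [3 5 7 9]
print(c)  # [2 2 2] — matches x[n+2] - 2x[n+1] + x[n]
```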

Neural Network Module

One-Dimensional Convolutional Neural Network
One-dimensional convolution is often used to process time-series data, using a one-dimensional convolution kernel of a specified size to perform a one-dimensional convolution operation on a multi-channel one-dimensional input signal [38]. Assume that the size of the input is (N, C in , L in ), where N represents the batch size, C in represents the number of channels, and L in represents the length of the signal sequence. The size of the output is (N, C out , L out ), where C out represents the number of output channels and L out represents the length of the output signal. The operation is as shown in Formula (8), where * represents a valid cross-correlation operator:

out(N i , C out j ) = bias(C out j ) + Σ_{k=0}^{C in −1} weight(C out j , k) * input(N i , k). (8)

The principle is shown in Figure 4.
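A minimal sketch of this operation for a single output channel, implemented by hand in NumPy (deep-learning frameworks such as PyTorch provide the batched, multi-output-channel version):

```python
import numpy as np

def conv1d_valid(x, w, bias=0.0):
    """Valid 1D cross-correlation over a multi-channel signal.
    x: (C_in, L_in) input; w: (C_in, K) kernel for one output channel."""
    c_in, l_in = x.shape
    k = w.shape[1]
    l_out = l_in - k + 1                       # 'valid' output length
    y = np.empty(l_out)
    for t in range(l_out):
        # slide the kernel, summing over channels and kernel taps
        y[t] = np.sum(x[:, t:t + k] * w) + bias
    return y

x = np.ones((2, 5))    # C_in = 2 channels, L_in = 5 samples
w = np.ones((2, 3))    # kernel size K = 3
y = conv1d_valid(x, w)
print(y)  # [6. 6. 6.] — each output sums 2 channels x 3 taps of ones
```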



Two-Dimensional Convolutional Neural Network
Two-dimensional convolutional layers are used to process two-dimensional input signals [39]. Assume that the size of the input is (N, C in , H in , W in ), where N represents the batch size, C in represents the number of input channels, and H in and W in represent the height and width of the input image, respectively. The size of the output is (N, C out , H out , W out ), where C out represents the number of output channels, and H out and W out represent the height and width of the output image, respectively. Here, * represents a valid two-dimensional cross-correlation operator. The formula of the two-dimensional convolution is as shown in Formula (9):

out(N i , C out j ) = bias(C out j ) + Σ_{k=0}^{C in −1} weight(C out j , k) * input(N i , k). (9)

The principle is shown in Figure 5.
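The output height and width follow the standard convolution size formula; a quick sketch (the input size 28 and kernel settings are illustrative, not the paper's configuration):

```python
def conv2d_out_size(size, kernel, stride=1, padding=0):
    # Output size along one spatial dimension of a 2D convolution
    return (size + 2 * padding - kernel) // stride + 1

h_out = conv2d_out_size(28, kernel=3, padding=1)   # 'same'-style padding keeps 28
w_out = conv2d_out_size(28, kernel=3, stride=2)    # stride 2 roughly halves: 13
print(h_out, w_out)
```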
Long Short-Term Memory Network

LSTM is a special type of RNN. To solve the problems of gradient disappearance and gradient explosion in traditional RNNs, memory cells and gating mechanisms are introduced, which can retain old feature information when extracting features from sequence data, thereby extracting relevant features and achieving a better performance in data feature extraction [40]. Figure 6 shows the network structure of LSTM.

There are three types of gates in the LSTM: the input gate i, the forget gate f, and the output gate o. The input gate is used to control the update information of the memory cell. The forget gate is used to control how much memory-cell information from the previous moment is retained. The output gate
is used to control the amount of information output to the next hidden state. At time t, given the input vector x t and the hidden state h t−1 at the previous moment, the LSTM unit calculates the hidden state h t at the current moment through internal loops and updates, as shown in Formulas (10)-(15):

i t = σ(w ix x t + w ih h t−1 + b i ) (10)
f t = σ(w fx x t + w fh h t−1 + b f ) (11)
c̃ t = φ(w cx x t + w ch h t−1 + b c ) (12)
c t = f t ⊙ c t−1 + i t ⊙ c̃ t (13)
o t = σ(w ox x t + w oh h t−1 + b o ) (14)
h t = o t ⊙ φ(c t ) (15)
Among them, w fx , w ix , w cx , and w ox represent the weight matrices between the input layer and the corresponding gates at time t; w fh , w ih , w ch , and w oh are the hidden-layer weight matrices between times t and t − 1; and b f , b i , b c , and b o represent the corresponding biases. h t−1 and c t−1 are the hidden state and cell state at time t − 1, and i t , f t , and o t are the outputs of the input gate, forget gate and output gate, respectively. c t and h t correspond to the cell state and hidden state at the current time t, c̃ t represents the temporary (candidate) cell state, and φ and σ represent the tanh and sigmoid activation functions, respectively.
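One LSTM step, Formulas (10)-(15), can be sketched in NumPy as follows; the gate ordering, dimensions, and random initialization are illustrative assumptions, not the paper's trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W/U/b hold input weights, hidden weights and
    biases for the four gate computations, keyed 'i', 'f', 'c', 'o'."""
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])      # input gate
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])      # forget gate
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde                          # cell-state update
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])      # output gate
    h_t = o_t * np.tanh(c_t)                                    # new hidden state
    return h_t, c_t

# Illustrative sizes: 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W = {g: rng.standard_normal((d_h, d_in)) * 0.1 for g in 'ifco'}
U = {g: rng.standard_normal((d_h, d_h)) * 0.1 for g in 'ifco'}
b = {g: np.zeros(d_h) for g in 'ifco'}
h, c = lstm_step(rng.standard_normal(d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```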

Overall Process of Detection Method of Epileptic Seizures Using a Neural Network Model Based on Multimodal Dual-Stream Networks
In order to utilize EEG signals to identify patients with epilepsy, a neural network model based on a multimodal two-stream network was adopted, mixing one-dimensional convolution, two-dimensional convolution and the LSTM neural network to extract the spatial characteristics of the EEG and the temporal characteristics of the signal, respectively. Combining the advantages of the two networks allows EEG features to be extracted more comprehensively. The method ends with result analysis: the experimental results are analyzed, and the performance of the proposed model is compared with that of the baseline models.

Neural Network Model Based on Multimodal Dual-Stream Networks
Based on one-dimensional convolution, two-dimensional convolution and LSTM modules, we designed a neural network model of a multi-modal two-stream network to solve the epileptic seizure detection task.The architecture of the model is as follows.
Time-series signal processing flow:
A. Input: a 5 × 356 time-series matrix, containing the original signal, the first-order difference, the second-order difference, and the amplitude spectrum and phase spectrum in the frequency domain.
B. The input is processed through three one-dimensional convolution modules, outputting a 256 × 356 feature vector y 1 .
C. Batch normalization and the ReLU activation function are applied to y 1 , which is then added to the feature vector y 2 processed by a one-dimensional convolution module to obtain y 3 .
D. y 1 is input into the LSTM network to obtain a 356 × 4 output feature vector y 4 .

STFT matrix processing flow:
A. Input: the STFT matrix of the original signal.
B. The input is processed through three two-dimensional convolution modules with batch normalization and the ReLU activation function, obtaining a 256 × 11 × 18 feature matrix y 5 .

Feature fusion and classification:
A. Flatten y 3 , y 4 , and y 5 and concatenate them into one feature vector.
B. Feed the feature vector to the fully connected layer and output the classification probability through softmax.
Based on EEG signals, the model fuses features from time series signals and STFT matrices, uses one-dimensional convolution, two-dimensional convolution, and LSTM modules to extract temporal and spatial features, respectively, and performs classification through fully connected layers to achieve automatic epileptic seizure detection.The network structure is shown in Figure 7.
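The fusion step can be sketched with placeholder tensors of the shapes stated above, showing the size of the vector reaching the fully connected layer:

```python
import numpy as np

# Stream outputs with the shapes stated in the architecture (contents are placeholders).
y3 = np.zeros((256, 356))      # residual 1D-convolution features
y4 = np.zeros((356, 4))        # LSTM temporal features
y5 = np.zeros((256, 11, 18))   # 2D-convolution features of the STFT matrix

# Flatten each stream and concatenate into one feature vector.
fused = np.concatenate([y3.ravel(), y4.ravel(), y5.ravel()])
print(fused.shape)  # (143248,) features fed to the fully connected layer
```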

Bonn Dataset
In the EEG data from the University of Bonn, we treat Z, O, N and F as one category, labeled as the interictal period. The E mark indicates the ictal period. Then, we use the proposed neural network model for training. The experiment is divided into two phases: the training phase and the testing phase.
In the training phase, we trained for 30 epochs. Finally, on the validation set, we achieved an accuracy of 99.2% with a loss value of 0.03082. Cross-validation is a method used to observe the stability of the model. We divide the data into n parts, use one part as the test set, one part as the validation set, and the other n − 2 parts as the training set, and repeat the calculation multiple times. The average accuracy over the rounds is used to evaluate the model, as shown in Equation (19):

p̄ = (1/n) Σ_{i=1}^{n} p i . (19)
where p i refers to the accuracy obtained in each round of verification. After cross-validation, the average accuracy decreased slightly, to 98.55%.
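The cross-validation procedure can be sketched as follows. The per-round accuracies below are illustrative placeholders, not the paper's actual fold results:

```python
import numpy as np

def kfold_parts(n_samples, n_folds, seed=0):
    """Shuffle sample indices and split them into n roughly equal parts;
    each round uses one part for testing, one for validation, and the
    remaining n - 2 parts for training."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)

parts = kfold_parts(n_samples=1000, n_folds=5)
print([len(p) for p in parts])  # [200, 200, 200, 200, 200]

# Per-round accuracies p_i (illustrative values only)
p = [0.990, 0.982, 0.988, 0.985, 0.9825]
mean_accuracy = sum(p) / len(p)   # Equation (19): average over the rounds
print(round(mean_accuracy, 4))
```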
In addition, we also conducted ablation experiments, removing the LSTM module and two-dimensional convolution module of the network, and removing both the LSTM and two-dimensional convolution modules to verify the effectiveness of the network.The results are shown in Figures 8 and 9.
t-SNE can reduce the dimensionality of the high-dimensional data in the CNN fully connected layer to two dimensions, so that the performance of the current model can be judged intuitively [41].

The results of the training phase:
A. Remove the LSTM module: the accuracy is 98.2%, and the loss function is 0.05341.
B. Remove the two-dimensional convolution module: the accuracy is 98.1%, and the loss function is 0.03859.
C. Remove the LSTM and two-dimensional convolution modules at the same time: the accuracy is 98%, and the loss function is 0.04234.
The combination of multiple modules has the advantage of better extracting the characteristics of EEG signals.
In the testing phase, the performance of the saved network model was tested on the test set and evaluated using precision, recall, and F1-score. On the test set, the accuracy of our proposed network model was 0.9969, the precision was 0.9944, the recall was 1, and the F1-score was 0.9972. In the ablation experiments, the LSTM module and the two-dimensional convolution module were removed separately, and also both at the same time; the resulting accuracy, recall, and F1-score values are shown in Figure 10.
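As a sanity check, the reported F1-score follows directly from the reported precision and recall (F1 is their harmonic mean):

```python
# Reported test-set metrics for the full model
precision, recall = 0.9944, 1.0
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
print(round(f1, 4))  # 0.9972, matching the reported F1-score
```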
On the test set, the neural network combining one-dimensional convolution, two-dimensional convolution and the LSTM module achieved a clear performance advantage on data not seen during training, which shows that multi-modal feature extraction improves the generalization ability of the model.
Finally, the confusion matrix and t-SNE are used to visualize the predicted distribution of the test data.
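As a minimal sketch, the 2 × 2 confusion matrix underlying these visualizations can be tallied directly from predictions; the label coding (interictal = 0, ictal = 1) and the toy labels below are assumptions for illustration:

```python
# Hedged sketch: tallying the binary confusion matrix (interictal = 0,
# ictal = 1) from which Figure 11-style plots are drawn. Toy labels only.
def confusion_matrix_2x2(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0/1."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 1, 0]
print(confusion_matrix_2x2(y_true, y_pred))  # [[2, 1], [0, 3]]
```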

New Delhi Dataset
In order to verify the effectiveness and generalization ability of the proposed network, the public New Delhi dataset was used to evaluate the model performance. From the New Delhi EEG data, two categories were used: interictal and ictal. The experiment is divided into two phases: the training phase and the testing phase.
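One way to partition the EEG segments for these two phases is sketched below; the 70/15/15 train/validation/test ratio and the random seed are assumptions for illustration, as the exact split is not stated in this excerpt:

```python
import numpy as np

# Hedged sketch: splitting EEG samples into training, validation, and test
# sets. The 70/15/15 ratio is an assumption, not the paper's stated split.
def split_dataset(x, train_frac=0.70, val_frac=0.15, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))          # shuffle sample indices
    n_train = int(len(x) * train_frac)
    n_val = int(len(x) * val_frac)
    return (x[idx[:n_train]],
            x[idx[n_train:n_train + n_val]],
            x[idx[n_train + n_val:]])

samples = np.arange(100)                   # stand-in for 100 EEG segments
tr, va, te = split_dataset(samples)
print(len(tr), len(va), len(te))           # 70 15 15
```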

Comparison and Discussion with Related Studies
A seizure detection method based on multimodal dual-stream networks is proposed and validated using the widely recognized University of Bonn dataset, as shown in Table 2. Compared with existing methods, the proposed method outperforms them in all main performance metrics [42]. Richhariya and Tanveer [15] used PCA, ICA and DWT to achieve an accuracy of 99.0%. Li et al.'s [28] method based on wavelet envelope analysis achieved an accuracy of 98.8%. Shen et al. [43] adopted discrete wavelet transform and support vector machine methods and achieved an accuracy of 97% and a sensitivity of 96.67%, while Xu et al. [44] used a 1D CNN-LSTM method, reaching an accuracy, precision, recall and F1-score of 99.39%, 98.39%, 98.79% and 98.59%, respectively. In contrast, the multi-modal dual-stream network method we proposed achieved an accuracy of 99.69%, a precision of 99.44%, a recall of 100%, and an F1-score of 99.72%. These results show that our method not only achieves the highest accuracy but also outperforms existing methods in key performance indicators such as precision, recall, and F1-score. This further verifies the effectiveness of the multi-modal dual-stream network in processing complex data features and identifying samples of different categories. Future research can further optimize the model structure and verify its generality and robustness on more diverse datasets.

Conclusions
This paper studies the application of hybrid neural network models to epilepsy diagnosis using EEG signals. First, the complexity features of the EEG signal are extracted using several feature methods, such as signal differential features and the frequency-domain amplitude spectrum and phase spectrum, to form a two-dimensional time-series signal and two-dimensional spectrum features. In terms of network models, in order to extract the characteristics of EEG signals in multiple dimensions, three network structures are used: one-dimensional convolution, two-dimensional convolution, and LSTM. Through the combination of multiple network structures, the multi-dimensional characteristics of EEG signals are learned. Experiments were conducted on the public Bonn and New Delhi datasets, and the effectiveness of the proposed model was evaluated using indicators such as precision, recall, and F1-score; the test-set results were then analyzed using the confusion matrix and t-SNE. Our results show that the proposed network model achieved the best diagnostic performance in the experiments, with an accuracy of 0.9969, a precision of 0.9944, a recall of 1, and an F1-score of 0.9972. Even after changing the dataset, the hybrid network model still had the most stable classification performance and can achieve high accuracy in the diagnosis of epilepsy. This article provides an EEG-based hybrid neural network model for epilepsy diagnosis and applies a variety of feature extraction methods, providing a useful reference for the early detection and treatment of epilepsy.
5 and 70 Hz and then divided into pre-ictal, interictal and ictal. Each category contains MAT files of 50 EEG time-series signals. The sampling frequency is 200 Hz, and each MAT file contains 1024 samples. Each sample represents a set of EEG time-series data with a duration of 5.12 s. The EEG signal is shown in Figure 2.
Sensors 2024, 24, x FOR PEER REVIEW
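The stated segment parameters can be cross-checked with a one-line calculation: 1024 samples at a 200 Hz sampling rate do indeed span 5.12 s.

```python
# Consistency check of the dataset description: samples / sampling rate
# gives the segment duration in seconds.
fs = 200          # sampling frequency in Hz
n_samples = 1024  # samples per MAT file
duration_s = n_samples / fs
print(duration_s)  # 5.12
```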

Figure 8.
Figure 8. The training accuracy and loss functions of the proposed network, and of the ablation experiments on the Bonn dataset in which the LSTM module is removed, the two-dimensional convolution module is removed, and both the LSTM and two-dimensional convolution modules are removed at the same time.

Figure 9.
Figure 9. The proposed network, with the LSTM module removed, the two-dimensional convolution module removed, and both the LSTM and two-dimensional convolution modules removed. Ablation experimental performance on the Bonn dataset: (a) highest training accuracy; (b) loss function.

Figure 10.
Figure 10. Accuracy, precision, recall, and F1-score of the Bonn test-set ablation experiment for the proposed network with the LSTM module removed, with the two-dimensional convolution module removed, and with both modules removed at the same time.

Figure 11 contains the confusion matrices of four different models: the proposed model, No-LSTM, No-2DCONV and No-2DCONV-LSTM. Each confusion matrix shows the classification results between the two categories (interictal and ictal), including true-positive, false-positive, true-negative, and false-negative examples. Each confusion matrix represents the different classification outcomes with different colors, with darker colors indicating higher counts. The model proposed in this article obtained the best classification performance.

Figure 11.
Figure 11. Confusion matrices of the Bonn test-set ablation experiment: the proposed network, with the LSTM module removed, with the two-dimensional convolution module removed, and with both the LSTM and two-dimensional convolution modules removed at the same time.

Figure 12 (t-SNE) shows the clustering of the data for the four different models. Each subplot contains points of two colors, representing the two data categories. The proposed model shows the least confusion between the two categories (interictal and ictal).


Figure 12.
Figure 12. Cluster analysis of the Bonn test-set ablation experiment for the proposed network and for the variants with the LSTM module removed, with the two-dimensional convolution module removed, and with both the LSTM and two-dimensional convolution modules removed.

In the training phase, the model was trained for 30 epochs. The accuracy and loss functions of the training phase are shown in Figure 13. The final accuracy is 1 and the final loss is 6.71 × 10⁻⁸.

Figure 13.
Figure 13. Performance of the proposed network on the New Delhi dataset: (a) accuracy; (b) loss function.
On the test set, the obtained accuracy is 0.975, the precision is 0.9444, the recall is 1, and the F1-score is 0.9714. The confusion matrix and cluster analysis are shown in Figure 14. It can be seen that the proposed model still has good accuracy when trained and tested on a new EEG dataset without changing the network structure, verifying the effectiveness and generalization ability of the model.

Figure 14.
Figure 14. Confusion matrix and cluster analysis of the proposed network on the New Delhi dataset.

The FFT (Fast Fourier Transform) and STFT (Short-Time Fourier Transform) functions are used to calculate the Discrete Fourier Transform (DFT) of the input signal x. The Bonn and New Delhi datasets are divided into training, validation and test sets, which are used to train, validate, and test the model. The performance of the model on the epileptic seizure detection task was evaluated using accuracy, recall, precision, and F1-score.
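The frequency-domain feature extraction described above can be sketched minimally with numpy: the FFT of a segment yields an amplitude spectrum and a phase spectrum, which can be combined with the signal's first difference to form the two-dimensional feature vector. The toy 10 Hz signal, the segment length, and the omission of windowing/STFT parameters are all assumptions for illustration.

```python
import numpy as np

# Hedged sketch of the feature pipeline: amplitude spectrum, phase spectrum,
# and differential feature of one EEG segment. Toy sinusoid stands in for EEG.
fs = 200                                # Hz, as in the New Delhi description
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 10 * t)          # toy 10 Hz "EEG" segment

spectrum = np.fft.rfft(x)               # one-sided DFT of the segment
amplitude = np.abs(spectrum)            # amplitude spectrum
phase = np.angle(spectrum)              # phase spectrum
diff = np.diff(x, prepend=x[0])         # differential feature, same length

freqs = np.fft.rfftfreq(len(x), d=1 / fs)
peak_hz = freqs[np.argmax(amplitude)]
print(peak_hz)                          # dominant component near 10 Hz
```

The three arrays (amplitude, phase, difference) can then be stacked row-wise into the two-dimensional feature vector fed to the dual-stream network.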

Table 2.
Table 2. Comparison and discussion with related studies.