Robust Epileptic Seizure Detection Using Long Short-Term Memory and Feature Fusion of Compressed Time–Frequency EEG Images

Epilepsy is a prevalent neurological disorder with considerable risks, including physical impairment and irreversible brain damage from seizures. Given these challenges, the urgency of prompt and accurate seizure detection cannot be overstated. Traditionally, experts have relied on manual EEG signal analysis for seizure detection, which is labor-intensive and prone to human error. Recognizing this limitation, the rise of deep learning methods has been heralded as a promising avenue, offering more refined diagnostic precision. However, the prevailing challenge in many models is their constrained emphasis on specific domains, potentially diminishing their robustness and precision in complex real-world environments. This paper presents a novel model that seamlessly integrates salient features from the time–frequency domain with pivotal statistical attributes derived from EEG signals. The fusion process combines essential statistics, including the mean, median, and variance, with the rich information in compressed continuous wavelet transform (CWT) images processed using autoencoders. This multidimensional feature set provides a robust foundation for subsequent analytic steps. A long short-term memory (LSTM) network, meticulously optimized for the well-known Bonn Epilepsy dataset, was used to enhance the capability of the proposed model. Evaluations underscore the strength of the proposed model: 100% accuracy in most of the binary classifications, accuracy exceeding 95% in the three-class and four-class problems, and accuracy exceeding 93.5% in the five-class problem.


Introduction
Approximately 1% of the global population is affected by epilepsy [1]. This condition poses significant challenges and can even be life-threatening for those affected. Among these patients, one-third do not respond to medications and need physical interventions [2,3]. Epileptic seizures are characterized by swift and abnormal fluctuations in the electrical patterns of the brain [4]. In severe cases, they can cause the entire body to become unresponsive [5]. Electroencephalogram (EEG) signals have been the fundamental reference for detecting epileptic seizures, helping to identify the seizure origin and facilitating the treatment of the affected brain tissues through medication and surgical procedures [6]. EEG signals contain significant features that detail both regular and irregular brain activities, particularly epileptic seizures. In addition, high-temporal-resolution EEG data from the scalp, spanning multiple input channels, can be acquired through distributed continuous sensing techniques [7]. Traditionally, diagnosing epilepsy through visual analysis of EEG recordings is labor-intensive and prone to error, with varying consistency among experts, because of its heavy reliance on human expertise and skill [8,9]. Many automatic EEG seizure detection systems struggle with real-time specificity and sensitivity, making them less suitable for clinical applications. There is a pressing need for an advanced computer-aided system that can efficiently assist neurologists in detecting epileptic seizures, ultimately reducing the time spent analyzing extensive EEG recordings [10]. In areas with a scarcity of neurologists, excessive dependence on human expertise can increase costs and cause delays in treating epilepsy. Tackling these issues is essential to guarantee affordable epilepsy care in low-to-middle-income regions, particularly in isolated locations with restricted access to skilled professionals and advanced facilities. Improving access to automated seizure detection using EEG signals has been studied extensively to mitigate this issue [11].
Machine learning is used widely to detect diseases automatically from biomedical signals, such as ECG and EEG. For example, a previous study [12] used two distinct feature types to detect epileptic seizures: fractal-based nonlinear features and entropy-based features. These features were input into two machine learning classifiers: Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The classifiers were trained and tested on the Bonn Epilepsy database. This database comprises five distinct classes: Set S, Set F, Set N, Set O, and Set Z. Set S represents seizure activity typically observed in epileptic patients. Both Set F and Set N denote seizure-free states in the epileptic class; Set O is associated with a normal, non-epileptic state where the subject's eyes are closed, while Set Z corresponds to the normal state with the subject's eyes open. In their evaluation, they considered binary (e.g., Z-S, O-S, and N-S) and three-class (ZO-NF-S) detection problems. In addition, another study [13] introduced a framework that integrates fuzzy-based methods and conventional machine-learning techniques to identify epileptic EEG samples in binary classification problems. A limited set of features and linear (using the Naïve Bayes classifier) and nonlinear (using the K-Nearest Neighbor classifier) approaches were applied to classify the EEG samples [14]. The binary classification tasks covered various class pairings, i.e., Z-S, O-S, N-S, F-S, ZO-S, and ZN-S. Similarly, another study [15] used statistical features classified with an AdaBoost Least-Square SVM. The resulting accuracy for the binary FNOZ-S classification problem on the Bonn dataset was 99%. Notably, none of these authors extended the evaluation of their proposed methods to multi-class classification.
Beyond traditional machine learning techniques, various deep learning architectures have been introduced to detect epileptic seizures in EEG data. A previous study [16] utilized deep learning approaches to extract the important features from EEG data. In particular, a Convolutional Neural Network (CNN) was implemented to differentiate among the normal, preictal, and seizure classes. The author of [17] introduced an experimental and methodological approach that mapped microscale local network dynamics with high spatiotemporal resolution and employed a quantitative analysis framework to elucidate the dynamics of seizure initiation and progression in vivo. In addition, the discrete wavelet transform (DWT) was used for feature extraction from EEG data [18]. A combination of a genetic algorithm with artificial neural network (ANN) and Support Vector Machine (SVM) classifiers was used to address binary and three-class classification challenges on the Bonn Epilepsy database.
Many seizure detection methods concentrate on specific domains, such as time-frequency methods, i.e., the continuous wavelet transform (CWT), the time domain, the frequency domain, or statistical attributes [19][20][21][22]. Unlike these methods, the proposed epileptic detection model innovatively combines the best of these attributes. A comprehensive set of important features is obtained by leveraging insights from the statistical domain, characterized by rich features such as the mean, median, variance, skewness, and kurtosis, together with compressed time-frequency images (CWT images) processed through an autoencoder. This hybrid integration of the Convolutional Autoencoder (CAE) latent space and statistical features ensures model robustness, making it adept at capturing the most vital information for classification. A long short-term memory network was used to optimize the approach, allowing precise classifications ranging from binary to five-class problems.

Contribution
The main contributions of this work are as follows:

• This study introduces a significant advancement in epileptic seizure detection. The proposed novel deep learning method seamlessly merges the compressed latent space features from the time-frequency domain with statistical attributes of the EEG signal. This integrated feature pool captures both time-frequency and statistical information, distinguishing this approach in robustness and accuracy.

• The proposed hybrid model uses an optimal window size for EEG segmentation, ensuring minimal data loss and a set overlap ratio. After rigorous evaluation, this method selects the best window size for maximal data coverage, which is crucial for precise EEG classification. This strategy upholds data integrity, boosting the classification reliability of the model.

• The CAE latent space still contains some less informative features. Principal Component Analysis (PCA) was applied to extract the most relevant features from the latent space, enhancing the classification accuracy.

• LSTM networks were used for classification, capitalizing on their proficiency with time-series data. Given the sequential nature of EEG signals, LSTMs, with their ability to capture long-term dependencies, provided enhanced accuracy in detecting intricate seizure patterns.

• While many studies evaluate the Bonn dataset for binary classification, some extend to three or four classes, with few tackling a five-class problem. This study encompassed classifications from binary to five-class, achieving unprecedented accuracy, i.e., 100% for binary, above 95% for the three-class and four-class problems, and above 93% for the five-class categorization, marking the highest recorded accuracy.
The remainder of this article is organized as follows. Section 2 provides an in-depth explanation of the model design and components. Section 3 reports the dataset description and the model performance on the benchmark dataset. Finally, Section 4 provides the concluding remarks.

Proposed Method
This section provides an overview of the proposed methodology for epilepsy detection, leveraging a hybrid model that combines an autoencoder and a Recurrent Neural Network (RNN), specifically the long short-term memory (LSTM) variant. The procedure starts with a windowing technique, segmenting the continuous signal into smaller, manageable packets. This approach ensures that every datum is captured accurately. Once segmented, critical statistical features for each windowed segment are calculated, capturing the primary characteristics of the data. Subsequently, the continuous wavelet transform is applied to the segmented data. This transformation extracts time-frequency information from each segment, providing a more detailed representation of the signal dynamics. The resulting time-frequency images serve as input to the Convolutional Autoencoder, which distills the data into a latent feature space. Owing to the potential high dimensionality of this latent space, PCA was implemented to streamline the feature set, retaining only those components that contribute significantly to the variance and, by extension, the classifiability of the data. These condensed features are merged with the previously computed statistical features, producing a hybrid feature pool. This comprehensive feature set captures both the inherent characteristics of the signal and its nuanced, transformed representations. Finally, this paper introduces the LSTM model, which takes this hybrid feature set as input and determines the epilepsy state of the signal. The inherent capacity of the LSTM to process sequential data makes it particularly suited for this task, ensuring accurate classifications across various detection scenarios. Figure 1 presents a visual representation of the entire process.


Windowing
The Bonn University Epilepsy dataset comprises five distinct subsets, Set Z, Set O, Set N, Set F, and Set S, the details of which are described earlier in the introduction. Each subset contains 100 samples, resulting in 500 samples across the entire dataset. In the present study, all 100 samples were chained, and a windowing technique was applied to create small segments of the EEG signal. In signal processing, windowing plays a pivotal role, primarily in combating spectral leakage. Spectral leakage is a key concern in signal processing, particularly relevant when analyzing EEG signals. It occurs when energy from the signal's true frequency leaks into other frequencies, often due to the finite length of the signal window. This can distort the true frequency content of EEG data, potentially affecting the accuracy of seizure detection. Moreover, windowing enhances temporal localization, ensuring that specific spectral events are precisely mapped within distinct time frames. The technique also fine-tunes the frequency resolution, delineating closely packed frequency components with clarity [23]. Given these advantages, the sliding window technique was employed to partition each sample into multiple smaller signal segments. An overlapping sliding window method, implementing a 1458 data-point window with a 486 data-point overlap, was used to ensure no data-points were omitted. This window, shown in Figure 2, successively slides across the data, producing smaller signal segments, the combination of which represents the complete signal of the subject. The mathematical formulation of the sliding window technique with overlap gives, for a signal S of length L, the starting and ending points of the i-th windowed segment S_i. Equation (1) indicates the starting point of each window, and Equation (2) expresses its ending point:

Start_i = (i − 1)(ω − o) + 1    (1)

End_i = ω + (i − 1)(ω − o)    (2)

where:
• ω is the window length; in this case, ω = 1458.
• o is the overlap length; here, o = 486.
• i is the window number (e.g., i = 1 for the first window, i = 2 for the second, and so on).
• It must always hold that ω − o > 0 for the above formulation to be valid.
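The segmentation described by Equations (1) and (2) can be sketched as follows. The window and overlap lengths follow the paper; the signal itself is a synthetic stand-in, since the code does not load the Bonn data:

```python
import numpy as np

def sliding_windows(signal, win_len=1458, overlap=486):
    """Segment a 1-D signal into overlapping windows per Equations (1)-(2)."""
    step = win_len - overlap          # hop between consecutive window starts
    segments = []
    start = 0
    while start + win_len <= len(signal):
        segments.append(signal[start:start + win_len])
        start += step
    return np.array(segments)

# Placeholder for two chained 4097-sample Bonn recordings
x = np.arange(4097 * 2)
w = sliding_windows(x)
# Window i (0-indexed) starts i * (win_len - overlap) samples into the signal,
# and the last 486 samples of each window repeat in the next one.
```

Note that only windows that fit entirely within the signal are kept, which matches the requirement that no partial segments enter the feature pipeline.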


Continuous Wavelet Transformation (CWT)
Electroencephalography (EEG) records the electrical activity of the brain, producing inherently non-stationary signals. Traditional Fourier methods, which analyze signals in terms of sinusoids of infinite duration, may not effectively capture the transient or time-varying phenomena of EEG data [24]. The wavelet transform, by contrast, is a computational method designed to analyze non-stationary signals by decomposing them into various frequency components while maintaining temporal resolution. The wavelet transform employs basis functions called "wavelets", allowing simultaneous frequency and time domain analysis [25,26]. Equation (3) is a mathematical expression for the wavelet transform:

WT(s, t) = (1/√|s|) ∫ f(τ) ψ*((τ − t)/s) dτ    (3)

where f(τ) is the input signal; ψ*(·) represents the complex conjugate of the wavelet function; s is the scale factor (which is inversely related to frequency); and t is the translation factor (related to time). Extending this concept, the CWT is a specialized form of wavelet transform wherein the wavelet undergoes continuous scaling and translation, allowing temporal and spectral analysis [27]. The multi-resolution characteristic of the CWT is particularly advantageous for interpreting EEG signals, given that different physiological phenomena may present themselves at diverse scales. The CWT of a function f(t) relative to a wavelet ψ(t) uses the modified (scaled and translated) wavelet

ψ_{s,t}(τ) = (1/√|s|) ψ((τ − t)/s)    (4)

where:
• ψ is called the mother wavelet, which is a short wave-like oscillation.

• s is the scaling factor. The function is stretched if s > 1 or compressed if 0 < s < 1.

• t is the translation factor, which shifts the function in time.

• τ is the variable of integration, typically representing time.

• The factor 1/√|s| is a normalization term that ensures that the wavelet has the same energy at every scale.
Equations (3) and (4) describe how the original mother wavelet, ψ, is scaled and translated to analyze a signal at various frequencies and time positions.
The CWT was used to convert EEG signal segments into images, employing the Morlet wavelet. The Morlet wavelet, a complex sinusoid modulated by a Gaussian envelope, is valuable in signal processing for its ability to highlight oscillatory patterns, particularly in EEG/ECG data [28]. The CWT, with Morlet as the mother wavelet, extracted both the spectral and temporal content of the signal, which was subsequently represented as images. Figure 3 shows the CWT images for each class of the Bonn Epilepsy dataset.
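A naive discretization of Equation (3) with a real-valued Morlet mother wavelet can be sketched as below. This is a simplified stand-in for the paper's CWT-to-image step: the sampling rate matches the Bonn recordings, but the scale range, wavelet support width, and test signal are illustrative assumptions:

```python
import numpy as np

def morlet(t, w0=5.0):
    """Real Morlet mother wavelet: a cosine under a Gaussian envelope."""
    return np.exp(-t**2 / 2) * np.cos(w0 * t)

def cwt_morlet(x, scales):
    """Naive CWT: correlate x with scaled, 1/sqrt|s|-normalized wavelets."""
    out = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):
        # sample the wavelet on a grid wide enough to cover its support
        t = np.arange(-4 * s, 4 * s + 1)
        psi = morlet(t / s) / np.sqrt(abs(s))
        out[i] = np.convolve(x, psi[::-1], mode="same")
    return out

fs = 173.61                                  # Bonn sampling rate, Hz
t = np.arange(1458) / fs
x = np.sin(2 * np.pi * 10 * t)               # synthetic 10 Hz window
coeffs = cwt_morlet(x, scales=np.arange(1, 33))
# |coeffs| forms the 2-D time-frequency image later fed to the CAE;
# for a 10 Hz tone the energy concentrates near scale s ~ w0 * fs / (2*pi*10).
```

In practice a library routine (e.g., PyWavelets' `cwt`) would replace this loop, but the sketch makes the scaling, translation, and normalization of Equation (3) explicit.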



Convolutional Autoencoder
After being proposed by Theis et al. [29] and Balle et al. [30], the Convolutional Autoencoder (CAE) has attracted the interest of many researchers in recent years, particularly for learned image compression. A Convolutional Autoencoder is a specialized neural network that encodes and decodes data with spatial hierarchies, such as images. Unlike traditional autoencoders, CAEs utilize convolutional layers to exploit spatial localities in the data, making them particularly adept at handling images. A CAE aims to approximate an identity function while adhering to specific constraints, such as a limited number of neurons in the hidden layers. A CAE is structured into two main components:

Encoder
The encoder portion of a CAE serves as a funnel, responsible for mapping the input x ∈ ℝⁿ to a latent (or compressed) space. This is achieved using a series of convolution operations designed to capture the spatial hierarchies in the data. Considering a feedforward architecture, the output h_e^(l+1) of the l-th layer in the encoder is defined as follows:

h_e^(l+1) = σ(W_e^(l) ∗ h_e^(l) + b_e^(l)),  with h_e^(0) = x

where W_e^(l) denotes the convolutional filters (or kernels), which can be considered tiny feature detectors, and b_e^(l) is the bias term. The nonlinear activation function, σ, introduces non-linearity into the system, allowing the network to learn complex patterns. As the EEG image progresses through the L_e convolutional layers of the encoder, the final encoded representation, h_e^(L_e) = h, serves as a compressed, but rich, encapsulation of the most salient features of the image.

Decoder
The decoder acts as the inverse of the encoder. It takes the compressed representation h and attempts to reconstruct it back to the original space. This involves transposed convolutional operations, which can be visualized as deconvolutions or reverse convolutions. For a feedforward architecture, the output h_d^(l+1) of the l-th layer in the decoder is as follows:

h_d^(l+1) = σ(W_d^(l) ∗ᵀ h_d^(l) + b_d^(l)),  with h_d^(0) = h

where W_d^(l) are the transposed convolutional filters, which operate in a manner opposite to the encoder filters. The final output from the decoder, h_d^(L_d) = x′, aims to be a faithful reconstruction of the original image x, bringing the encoding-decoding process of the CAE full circle.
The primary objective of a CAE is to minimize the reconstruction error between the original input and its reconstruction. This error, typically termed the loss function, can be defined as the mean squared error between the input and its reconstruction:

L(x, x′) = ‖x − x′‖²

Optimization algorithms, such as backpropagation with gradient descent, minimize this loss when training a CAE. In the architecture presented in Table 1, a CAE with a five-layer encoder and decoder was used. The effectiveness of the CAE is demonstrated by a high PSNR value of 66 dB, indicating precise image reconstruction. Figure 4 shows the layer-wise architecture of the CAE.
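The reconstruction loss and the PSNR metric used to judge reconstruction quality can be computed as sketched below. The images here are synthetic placeholders, not actual CAE inputs or outputs, and the 255 peak value is an assumption about the image range:

```python
import numpy as np

def mse(x, x_rec):
    """Reconstruction loss L(x, x') minimized while training the CAE."""
    return np.mean((x - x_rec) ** 2)

def psnr(x, x_rec, peak=255.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    return 10.0 * np.log10(peak ** 2 / mse(x, x_rec))

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(64, 64))          # stand-in "original" image
noisy = img + rng.normal(0, 0.1, size=img.shape)  # stand-in "reconstruction"
quality = psnr(img, noisy)                        # roughly 68 dB at this noise level
```

A PSNR in the mid-60s dB, as reported for the CAE, corresponds to a per-pixel reconstruction error well below one gray level.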

Principal Component Analysis
PCA is a well-established dimensionality reduction technique that projects data into a lower-dimensional space while preserving as much of the original variance as possible [31]. This method is particularly useful for reducing the dimensionality of datasets with many correlated variables, transforming them into a new set of orthogonal variables known as the principal components [32,33].
In the context of this study, PCA was used to reduce the dimensionality of the latent space extracted from the autoencoder. Reducing the features to 128 dimensions ensured a compact representation of the data that retained most of the original variance. This processed latent space was combined with statistical features in a hybrid feature pool, paving the way for enhanced EEG signal classification.
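A minimal SVD-based sketch of the PCA reduction to 128 dimensions is shown below. The latent matrix is random, and its 512-dimensional width is an illustrative assumption, not the paper's actual CAE bottleneck size:

```python
import numpy as np

def pca_reduce(X, n_components=128):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                     # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T             # scores in the reduced space

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 512))            # e.g. 500 windows, 512 latent dims
reduced = pca_reduce(latent)                    # shape (500, 128)
```

Because the singular values are sorted in descending order, the retained components are exactly those carrying the most variance, matching the selection criterion described above.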

Statistical Features
Electroencephalogram (EEG) signals, which represent the electrical activities of the brain, are inherently dynamic and complex. It is therefore imperative to extract representative features that capture the underlying characteristics of the EEG data in order to discern information from these signals, particularly for applications such as epilepsy detection. Statistical features offer a compact representation of EEG signals, distilling them into metrics that reflect the distribution and behavior of the signal over time [34]. These include the mean, standard deviation, kurtosis, skewness, and various factors, such as crest, shape, and impulse. Each of these metrics captures a different signal characteristic, and together they provide a comprehensive overview of the signal. For example, the mean offers a central tendency, suggesting the average amplitude of the signal. Standard deviation and variance capture the dispersion and variability within the signal. Metrics such as kurtosis and skewness provide insights into the shape of the signal's distribution, indicating the presence of any irregular peaks or asymmetries. Factors such as crest and shape elucidate the transient behaviors of the signal and its oscillatory nature. Combining these statistical features with the latent features of an autoencoder derived from the CWT images can significantly enhance the classification performance of EEG signals, particularly in epilepsy detection. Whereas statistical features capture the basic characteristics of EEG signals, the latent space of the autoencoder, derived from the CWT images, encapsulates more complex, nonlinear patterns in the data; together they offer a more comprehensive representation of the EEG signal. The fusion of these two feature sets can increase the robustness of the model, benefiting from the generalization capabilities of autoencoders and the straightforward interpretability of statistical metrics. Furthermore, epileptic seizures lead to characteristic changes in EEG patterns. Statistical features can highlight sudden spikes, deviations, and anomalies in the signal, which are common indicators of epileptic activity. Combined with the high-level patterns learned by the autoencoder from CWT images, the classification system can better differentiate between epileptic and non-epileptic signals. Table 2 provides the list of calculated statistical features.
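The per-window statistics named above can be computed as sketched here. The exact feature set of Table 2 may differ slightly, and the window is a synthetic sine rather than a real EEG segment:

```python
import numpy as np

def statistical_features(seg):
    """Per-window statistics used alongside the CAE latent features."""
    mu, sd = seg.mean(), seg.std()
    rms = np.sqrt(np.mean(seg ** 2))            # root mean square amplitude
    peak = np.max(np.abs(seg))
    mean_abs = np.mean(np.abs(seg))
    return {
        "mean": mu,
        "median": np.median(seg),
        "variance": seg.var(),
        "std": sd,
        "skewness": np.mean(((seg - mu) / sd) ** 3),
        "kurtosis": np.mean(((seg - mu) / sd) ** 4),
        "crest_factor": peak / rms,             # peakiness relative to energy
        "shape_factor": rms / mean_abs,         # waveform shape descriptor
        "impulse_factor": peak / mean_abs,      # sensitivity to sharp spikes
    }

seg = np.sin(np.linspace(0, 20 * np.pi, 1458))  # stand-in for one EEG window
feats = statistical_features(seg)
```

For a pure sine the crest factor is close to √2 and the (non-excess) kurtosis is close to 1.5, which makes the routine easy to sanity-check before applying it to EEG windows.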

Hybrid Features Pool
EEG signals are complex yet rich in information, and extracting the right features is essential for analyzing them. Beyond simple statistical features, a broader and more useful set of attributes can be obtained by incorporating deep-learning-derived representations, such as those from CWT images. This approach combines detailed patterns (from CWT images) and basic signal traits (from statistical features) to provide a well-rounded view of the EEG data.
Ensuring the accurate alignment of features within this hybrid framework is essential to preserve data consistency and optimize subsequent analytical outcomes. Let F^AE represent the set of features derived from the bottleneck of the autoencoder for a specific EEG window, and let F^stat denote the statistical features for the same window. The harmonization of these features can be represented as the per-window concatenation

F_i^hybrid = [f_i^AE, f_i^stat]

The index i in f_i^AE and f_i^stat ensures that the autoencoder latent space features and the statistical features are obtained from the same EEG window packet. This hybrid feature pool offers a multidimensional view of EEG signals, amplifying the richness of information available in each class. This feature integration promises robustness against potential intra-class variations and maximizes inter-class disparities, emphasizing its importance for complex data, such as EEG and EMG signal classification applications. These hybrid features are then input into an LSTM network for final classification.
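The per-window concatenation can be sketched as follows. The feature dimensions (128 latent and 9 statistical features) are illustrative assumptions, and the arrays are random placeholders for the real feature matrices:

```python
import numpy as np

def fuse_features(latent_feats, stat_feats):
    """Concatenate per-window CAE latent features with statistical features.

    Row i of each array must come from the same EEG window, so the
    hybrid vector for window i is [f_i_AE, f_i_stat].
    """
    assert latent_feats.shape[0] == stat_feats.shape[0], "window counts differ"
    return np.hstack([latent_feats, stat_feats])

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 128))   # PCA-reduced latent features per window
stats = rng.normal(size=(500, 9))      # statistical features per window
hybrid = fuse_features(latent, stats)  # shape (500, 137)
```

Keeping both inputs indexed by window number is what guarantees the alignment requirement stated above: a single shuffled row would pair one window's spectral content with another window's statistics.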


Initially, the forget gate decides the segments of information that the cell state should discard:

f_t = σ(W_f · [h_(t−1), x_t] + b_f)    (1)
where h_(t−1) denotes the prior hidden-layer output; x_t symbolizes the current input, with σ being the sigmoid activation; and W and b represent the weight matrices and biases, respectively. Subsequently, the input gate governs the preservation of information in the cell state, splitting into identifying the data for updates and setting up an updated candidate state. This can be expressed mathematically as follows:

i_t = σ(W_i · [h_(t−1), x_t] + b_i)    (2)

C̃_t = tanh(W_C · [h_(t−1), x_t] + b_C)    (3)

The present state of the neuron can be derived by combining Equations (2) and (3):

C_t = f_t ⊙ C_(t−1) + i_t ⊙ C̃_t    (4)

The role of the output gate is pivotal for determining the final output. The sigmoid function evaluates which segment of the cell state to assign to the output, which subsequently undergoes processing by the tanh function and pointwise multiplication:

o_t = σ(W_o · [h_(t−1), x_t] + b_o)    (5)

h_t = o_t ⊙ tanh(C_t)    (6)

In biomedical contexts, the strength of the LSTM lies in its ability to recognize patterns over time, making it particularly effective for detecting epileptic seizures.
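A single time step of the gate computations above can be verified with a minimal NumPy implementation; the per-gate weight layout below is one common convention, not necessarily the paper's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step: forget, input, candidate, cell, and output.

    W holds four per-gate weight matrices mapping the concatenated
    [h_prev, x_t] to gate pre-activations; b holds the matching biases.
    """
    z = np.concatenate([h_prev, x_t])
    Wf, Wi, Wc, Wo = W
    bf, bi, bc, bo = b
    f_t = sigmoid(Wf @ z + bf)            # forget gate
    i_t = sigmoid(Wi @ z + bi)            # input gate
    c_tilde = np.tanh(Wc @ z + bc)        # candidate state
    c_t = f_t * c_prev + i_t * c_tilde    # cell state update
    o_t = sigmoid(Wo @ z + bo)            # output gate
    h_t = o_t * np.tanh(c_t)              # hidden output
    return h_t, c_t
```

With all weights and biases at zero, every gate outputs 0.5, so the cell state simply halves each step, which makes the arithmetic easy to check by hand.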
EEG data, characterized by detailed time-based patterns, benefit from accurate and timely analysis by the LSTM, ultimately improving patient care and treatment outcomes. This model uses an LSTM layer consisting of 128 units, designed specifically to process the time-dependent patterns in EEG data. The data are passed to a dense layer using softmax activation, sorting the LSTM outputs into specific categories. The model is fine-tuned for optimal performance with the "adam" optimizer and the categorical_crossentropy loss function, which is suited for classifying multiple categories. The hyperparameters for this study were selected through a series of experiments shown in Table 3. Combining the strengths of autoencoder latent-space features and statistical attributes, the LSTM provides a thorough and accurate representation of the complex patterns of the EEG data. This integration enhances the model's robustness and its ability to identify subtle EEG patterns accurately, which is crucial for advanced seizure detection. The effectiveness of the proposed model is discussed further in the next section.
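The softmax output layer and categorical_crossentropy loss mentioned above compute, in essence, the following (a NumPy illustration of the math, not the Keras internals):

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy between one-hot labels and softmax outputs."""
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=-1)))
```

A perfectly confident correct prediction gives a loss near zero, while a uniform prediction over C classes gives log(C), which is why the loss is a natural fit for the multi-category problems studied here.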

Meta Data
In this study, the EEG database from the University of Bonn, Germany, curated by Andrzejak et al. [36], was chosen for data incorporation. This database was selected because of its authority in the field and its frequent utilization in numerous epilepsy diagnostic studies. The dataset comprises five sets (Z, O, N, F, and S) of 100 EEG signals each, captured via a single channel. Each EEG signal spans a duration of 23.6 s and includes 4097 sample points. The signals were digitized using a 12-bit A/D converter at a sampling frequency of 173.61 Hz.
In the data collection process, a total of 10 subjects were involved. Sets Z and O originate from the EEG records of five healthy individuals, with eyes open and closed, respectively. Sets N, F, and S derive from the preoperative EEG records of five diagnosed epileptic patients. In particular, Set N segments were from the hippocampus located in the opposite hemisphere of the brain. Set F was obtained from within the epileptogenic zone, with both sets containing measurements during seizure-free intervals. Set S solely encompassed the seizure activity. Table 4 provides detailed information regarding these data. For this study, all five sets were utilized, with representative EEG signal samples from each group presented in Figure 6.
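Given the stated record length (4097 samples at 173.61 Hz, about 23.6 s), segmenting a record into overlapping windows can be sketched as follows; the window length and overlap below are illustrative assumptions, since the paper's optimal windowing parameters are not restated in this section:

```python
import numpy as np

FS = 173.61  # Bonn dataset sampling frequency, Hz

def overlapping_windows(signal, win_len=512, overlap=0.5):
    """Split one EEG record into overlapping fixed-length windows.

    win_len and overlap are illustrative choices; the paper's optimal
    overlapping windowing values are not given in this excerpt.
    """
    step = int(win_len * (1.0 - overlap))
    x = np.asarray(signal)
    starts = range(0, len(x) - win_len + 1, step)
    return np.stack([x[s:s + win_len] for s in starts])
```

With these assumed values, a 4097-sample Bonn record yields windows starting at samples 0, 256, 512, and so on, each later transformed into a CWT image and a statistical feature vector.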

In this study, the classification performance of the epilepsy seizure detection models is evaluated using multiple metrics: accuracy, F1-score, precision, recall, and sensitivity. The choice of these metrics provides a comprehensive understanding of the model's proficiency in accurately identifying seizures and distinguishing between the various classes.
In a binary classification framework, the terminologies employed are as follows:

• True Positive (TP): instances confirmed to be positive.

• True Negative (TN): instances confirmed to be negative.

• False Positive (FP): negative instances incorrectly identified as positive.

• False Negative (FN): positive instances incorrectly identified as negative.

The metrics for binary classification are given by the following:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision = TP / (TP + FP)

Recall (Sensitivity) = TP / (TP + FN)

F1-score = 2 × (Precision × Recall) / (Precision + Recall)

In this study, the performance of the model, built upon a hybrid feature pool, was examined across different classification scenarios. The aim was to assess its proficiency in distinguishing between various numbers of classes, ranging from binary classification to a more complex five-class scenario. The specific scenarios for each classification problem are detailed as follows:

• Binary Classification: N-S, Z-S, O-S, F-S, FN-S, FNZ-S, FNO-S, and NOZ-S.

• Three-Class Classification: F-O-S, N-Z-S, O-Z-S, and FN-OZ-S.

• Four-Class Classification: F-O-Z-S and N-O-Z-S.

• Five-Class Classification: Z-N-O-F-S.
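The binary metrics above follow directly from the four counts; a small sketch:

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall (sensitivity), and F1 from binary counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # a.k.a. sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```

Note how a model can post high accuracy on an imbalanced seizure/non-seizure split while precision or recall stays low, which is why all four numbers are reported together in the tables that follow.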

Binary Classification
The proposed classification system exhibited exceptional precision in classifying critical EEG states when assessing the model performance on the previously mentioned binary cases. As highlighted in Table 5, the model differentiates between the seizure activity (Set S) and various non-seizure states, including the eye-closed (Set O), eye-open (Set Z), and seizure-free states (Sets F and N), with remarkable accuracy, often achieving accuracy and F1-scores of 100%. Nevertheless, when classifying the F-S binary combination, the model accuracy decreased slightly, settling at 98.12%. The confusion matrices, which show the true versus predicted labels across these binary combinations, are illustrated in Figure 7.

Three-Class Classification
After observing the promising results from the model performance on binary class problems, the tests were extended to multi-class problems, specifically F-O-S, N-Z-S, O-Z-S, and FN-OZ-S. The initial approach involved classifying three distinct categories: the normal state, characterized by patients with closed eyes (Class "O"); the interictal state, representing patients diagnosed with epilepsy but currently in a seizure-free state (Class F); and the ictal state, indicative of active seizures. The proposed epilepsy seizure detection architecture classified these three states, achieving 100% accuracy with no misclassifications, as shown in Figure 8a. Furthermore, another three-class classification problem, N-Z-S, was used to evaluate the model performance. The confusion matrix in Figure 8b shows that the model precision remained high, achieving an overall accuracy and sensitivity of 98.75% and 97.2%, respectively, for detecting seizures. This performance was consistent, with an F1-score and a precision rate of 98.76%. In the subsequent O-Z-S and FN-OZ-S classifications, the model sustained its robust performance, surpassing an accuracy and sensitivity of 96% and 98%, respectively, for seizure detection (Figure 8c,d). Table 6 lists the comprehensive performance of the proposed model for different three-class problems.

Five-Class Classification
Finally, the proposed model was evaluated for its ability to detect epileptic EEG samples within complex signals. The model's performance was evaluated using the Z-N-O-F-S five-class problem. The confusion matrix shows that the model achieved promising results, with an overall accuracy, F1-score, precision, and general sensitivity of 93.25%, 93.21%, 93.23%, and 93.25%, respectively, as shown in Figure 10. In particular, the model revealed a sensitivity of 100% in detecting the epileptic seizure signals with no false detections. The model also recorded sensitivities of 95.00%, 91.56%, and 90% for class O, class N, and classes Z and F, respectively. In summary, these results confirm the reliable detection performance of the model across various scenarios, i.e., binary, three-class, four-class, and even five-class problems.

Discussion
After evaluating the model across various classification problems, ranging from binary to three-class, four-class, and even five-class scenarios, we observed that the proposed algorithm showed promising results in all these tasks. The enhanced performance of our epilepsy detection model is due to its hybrid architecture. This hybrid design leverages the autoencoder's feature distillation from high-dimensional data and the LSTM's sequential information processing. The integration of PCA retains key classification components, and merging these with statistical features creates a comprehensive feature set. This fusion effectively captures diverse signal characteristics, enhancing data classifiability. To assess the impact of concatenating statistical features with CAE (convolutional autoencoder) latent-space features, we conducted an ablation study within a five-class classification framework. Table 8 illustrates the outcomes of training the LSTM network with distinct feature sets. When solely CAE latent-space features were used, the LSTM achieved an accuracy of 89.50%, an F1-score of 89.57%, a precision of 89.83%, and a sensitivity for the epileptic class of 91.78%. In contrast, training with only statistical features resulted in lower performance across all metrics, with an accuracy of 78.50%, an F1-score of 78.60%, a precision of 79.17%, and a sensitivity for the epileptic class of 82.19%. However, the combination of both CAE latent-space features and statistical features substantially improved the model's performance, elevating the accuracy to 93.25%, the F1-score to 93.21%, and the precision to 93.23%, and achieving a perfect sensitivity for the epileptic class of 100%. This demonstrates that the integration of both feature types significantly enhances the LSTM network's ability to classify and detect epilepsy in a multi-class setting. The LSTM's proficiency in sequential data analysis further ensures accurate epilepsy detection across various scenarios. Overall, our approach sets a new standard in EEG data analysis for epilepsy detection. The performance of the proposed model was also compared with existing approaches; Table 9 presents this comparison.

Conclusions
This paper introduced an advanced intelligent EEG recognition framework for epileptic seizure detection. This framework integrates deep autoencoders, statistical features, and LSTM networks. An optimal overlapping windowing method was used to mitigate the inherent spectral leakage. Subsequently, the CWT was used to produce time-frequency images from each window. Simultaneously, statistical attributes, such as the mean, mode, and standard deviation, were extracted during this wavelet transformation. A deep convolutional autoencoder (CAE) was trained to extract the essential features from the CWT images. The latent space of this CAE, rich with features, was then refined using PCA and concatenated with the statistical features, forming a comprehensive hybrid feature pool. This enhanced pool was processed through LSTM-based classification, addressing multiple class problems.
The model demonstrated exceptional F1-score, precision, and accuracy. In most cases, it exhibited error-free classification in binary class problems, while in three- and four-class problems, it exhibited over 95% and 93% accuracy, respectively. The model's sensitivity metrics are equally notable, scoring 100% for binary and some three-class situations, maintaining over 97% for all three-class problems, and over 94% for four-class problems. Averaging across all classifications, this model achieved an accuracy exceeding 97%, highlighting its stability and validating its ability to detect epileptic events accurately within complex signal scenarios. The proposed framework thus addresses binary, three-class, four-class, and five-class classification challenges, particularly fine-tuned for the Bonn Epilepsy dataset.

Figure 1 .
Figure 1.Overview of the proposed epileptic seizure detection model.

Figure 3 .
Figure 3. CWT images of each class.

Figure 5 .
Figure 5.Long short-term memory unit architecture.

Table 1 .
Summary of the autoencoder architecture.

Table 2 .
Statistical features and their mathematical expressions.

Table 4 .
Overview of the Bonn EEG dataset of the University of Bonn, Germany.

Table 5 .
Performance metrics for binary classification.

Table 6 .
Performance metrics for the three-class classification.

Table 9 .
Comparison with some existing approaches.