Automatic Diagnosis of Epileptic Seizures in EEG Signals Using Fractal Dimension Features and Convolutional Autoencoder Method

: This paper proposes a new method for epileptic seizure detection in electroencephalography (EEG) signals using nonlinear features based on fractal dimension (FD) and a deep learning (DL) model. Firstly, Bonn and Freiburg datasets were used to perform experiments. The Bonn dataset consists of binary and multi-class classiﬁcation problems, and the Freiburg dataset consists of two-class EEG classiﬁcation problems. In the preprocessing step, all datasets were prepossessed using a Butterworth band pass ﬁlter with 0.5–60 Hz cut-off frequency. Then, the EEG signals of the datasets were segmented into different time windows. In this section, dual-tree complex wavelet transform (DT-CWT) was used to decompose the EEG signals into the different sub-bands. In the following section, in order to feature extraction, various FD techniques were used, including Higuchi (HFD), Katz (KFD), Petrosian (PFD), Hurst exponent (HE), detrended ﬂuctuation analysis (DFA), Sevcik, box counting (BC), multiresolution box-counting (MBC), Margaos-Sun (MSFD), multifractal DFA (MF-DFA), and recurrence quantiﬁcation analysis (RQA). In the next step, the minimum redundancy maximum relevance (mRMR) technique was used for feature selection. Finally, the k-nearest neighbors (KNN), support vector machine (SVM), and convolutional autoencoder (CNN-AE) were used for the classiﬁcation step. In the classiﬁcation step, the K-fold cross-validation with k = 10 was employed to demonstrate the effectiveness of the classiﬁer methods. The experiment results show that the proposed CNN-AE method achieved an accuracy of 99.736% and 99.176% for the Bonn and Freiburg datasets, respectively.


Introduction
Epilepsy is a neurological disorder that has threatened many lives [1,2]. If diagnosed late, the epileptic seizure will cause irreparable damages to the patient [1,2]. Generally, epileptic seizures are divided into three groups of general [3], focal [4], and epilepsy with unknown symptoms [5]. According to the World Health Organization (WHO), more than 60 million individuals suffer from different types of epileptic seizures, and their lives are threatened with severe health issues [6,7].
Neuroimaging modalities are among the essential techniques for epileptic seizure detection, including functional and structural techniques [8,9]. The analysis of the anatomy and anatomical connections of various parts of the brain is performed using structural neuroimaging modalities, such as structural magnetic resonance imaging (sMRI) and diffusion tensor imaging (DTI) [10]. On the other hand, by using functional neuroimaging modalities, specialists can investigate the activities and functional connectivity of the The DT-CWT was first proposed by Kingsbury [33]. The DT-CWT has two parallel structures for obtaining real and imaginary parts of EEG signals. As compared to DWT, DT-CWT has several advantages, such as approximate shift immutability [34,35]. This helps to extract meaningful frequency information from each EEG signal [36]. The DT-CWT conversion is used in this paper with five decomposition levels, and a variety of fractal-based features are extracted from each decomposition level.
In the feature extraction step, various nonlinear features based on FDs are extracted from DT-CWT sub-bands, including HE [37], PFD [38], KFD [38], HFD [38], BC [39], MBC [39], Sevcik [40], DFA [41], RQA [42], MSFD [43], and MF-DFA [44]. FD features contain crucial information about EEG signals, and for this purpose they were used in the present The DT-CWT was first proposed by Kingsbury [33]. The DT-CWT has two parallel structures for obtaining real and imaginary parts of EEG signals. As compared to DWT, DT-CWT has several advantages, such as approximate shift immutability [34,35]. This helps to extract meaningful frequency information from each EEG signal [36]. The DT-CWT conversion is used in this paper with five decomposition levels, and a variety of fractal-based features are extracted from each decomposition level.
In the feature extraction step, various nonlinear features based on FDs are extracted from DT-CWT sub-bands, including HE [37], PFD [38], KFD [38], HFD [38], BC [39], MBC [39], Sevcik [40], DFA [41], RQA [42], MSFD [43], and MF-DFA [44]. FD features contain crucial information about EEG signals, and for this purpose they were used in the present study [45]. This research is the first to combine these feature extraction methods, highlighting the significance of this article. In the feature selection step, the mRMR method [46] is employed. In this section, the proposed method selected 25 features. Ultimately, the KNN [47], SVM [48,49], and CNN-AE methods are employed for classification. The CNN-AE classifier method with a proposed number of layers is used for the first time in this paper.
Other sections of the article are presented as follows: Section 2 explores previous research carried out on epileptic seizure detection using the FD techniques. The proposed method for epileptic seizure detection is presented in Section 3. The experiment results are presented in Section 4. Section 5 presents statistical parameters for classification performance. Then, the discussion is presented in Section 6. Finally, conclusions and future works are presented in Section 7.

Related Works
This section presents a literature review of previous investigations into the diagnosis of epileptic seizures via EEG signals and FD feature extraction techniques. The first study by Wijayanto et al. [50] proposed an automated epileptic seizure detection based on the FD theory. First, the EEG signals were decomposed into five sub-bands. Afterward, the KFD was employed for feature extraction. Ultimately, the SVM method was employed for the classification of features. The results of the research showed that they could obtain favorable results.
El-Kishky et al. [51] employed different features for epileptic seizure detection. In this research, the EEG signals were first preprocessed. Afterward, various features, including HE, HFD, and Shannon entropy, were extracted from the EEG signals.
In another study, Jacob et al. [52] employed discrete wavelet transform (DWT) in the preprocessing step. Then, KFD and HFD features were extracted from DWT sub-bands. Finally, the SVM classifier was employed for classification.
Yuan et al. [51] proposed an intelligent method of epileptic seizure detection using EEG signals. In their proposed method, different techniques were employed in the preprocessing step. Afterward, differential box-counting (DBC) was investigated for feature extraction in EEG signals. They employed the SLFN method in the classification step and obtained favorable results.
In [52], the proposed method of epileptic seizure detection is implemented on the Freiburg dataset. In this study, the noises of the EEG signals were eliminated and then decomposed into time periods. Afterward, the FDs and gradient boosting methods were employed in the feature extraction, and classification steps, respectively.
Yang et al. [53] employed a new feature extraction method based on the FD theory for diagnosis of epileptic seizures using EEG signals. The dataset they employed for carrying out the experiments was of clinical type. The GFD feature extraction method was employed in this study. Following this, the run covering and SVM methods were employed in the feature selection and classification steps, respectively.
Moctezuma et al. [54] employed the CHB-MIT dataset to implement their proposed epileptic seizure diagnosis method. In this research, decomposition and preprocessing of EEG signals into different sub-bands was carried out using the empirical mode decomposition (EMD) method. Next, energy distribution and FD features were extracted from each EMD sub-band. Following this, K-means clustering and SVM methods were employed in the feature selection and classification steps, respectively.
In [55], an epileptic seizure diagnosis method via EEG signals based on the GFD technique was proposed. The Bonn dataset was employed to carry out experiments. In this research, the noises of the EEG signals were first eliminated in the preprocessing step and then decomposed into different sub-bands using DWT. The GFD method and the Renyi entropy were extracted from DWT sub-bands in the feature extraction step. Following this, the analysis of variance (ANOVA) test with the box plot method was employed in the feature selection method.
In [56], a method of epileptic seizure diagnosis using different FD methods was proposed. Different methods such as noise elimination, windowing of signals, and finally decomposition of EEG signals into different sub-bands were implemented in the preprocessing step. Next, the HFD, KFD, and Sevcik features were extracted from DWT sub-bands. Finally, the SVM was used for classification.
Sikdar et al. [57] employed the DWT to analyze the EEG signals of the Bonn dataset into different sub-bands. They investigated the MFDFA method for feature extraction. Finally, the statistical analysis and SVM methods were employed in the feature selection and classification steps, respectively.
Dalal et al. [58] used the FAWT method in the preprocessing step of the EEG signals. Then, they used the HFD and Kruskal-Wallis statistical test methods in the feature extraction and feature selection steps. Then, the SVM method was employed in the classification step.
This section introduced several articles on the diagnosis of epileptic seizures via EEG signals using the fractal technique. Other articles are introduced in Table 1.

Material and Methods
In this section, the steps of the proposed method are presented. First, the EEG dataset for epileptic seizures detection is introduced. In the second step, the preprocessing of EEG signals is conducted according to the DT-CWT method. In the third step, various feature extraction techniques based on FD are introduced. The fourth step is pertinent to the feature selection technique. Finally, various types of classifier algorithms are presented in step five.

The Bonn Dataset
The Bonn dataset consists of five sets of D, C, B, A, and E, and each one includes 100 single-channel signals with a time duration of 23.6 s [29]. All EEG signals from this dataset are recorded in the form of 12-bit with a sampling frequency of 173.61 Hz. In this dataset, the A and B sets of the EEG signals of five individuals are normal, where the A set of EEG signals is pertinent to the case of eyes open and the B set is pertinent to the case of eyes closed. The C and D sets include the EEG signals of five patients at the pre-ictal time [24]. Ultimately, the E set includes the seizure time of five patients with epilepsy with ictal time. In this dataset, the EEG signals of the A and B sets are recorded from the scalp, and the D, C, and E sets are recorded in an invasive way. Invasive electrodes are inserted into the hippocampal formation site symmetrically. All EEG signals are recorded by a 128-channel amplifier using a common, normal reference [29]. Further information regarding the dataset is presented in Tables 2 and 3. In addition, Figure 2 shows a sample of five sets of the Bonn dataset.  In this dataset, the Intracranial EEG (IEEG) signals were recorded in the Freiburg hospital, Germany, from 21 patients with focal epilepsy [30]. All IEEG signals from this dataset are recorded with a sampling frequency of 256 Hz and strip, depth, and grid electrodes [25]. The age range of the patients in this dataset was between 10 and 50 years, and thirteen individuals were female, and eight individuals were male. In this dataset, there are different pre-ictal, ictal, and inter-ictal times of patients with epileptic seizures. Three different types of seizures are reported among the patients, including complex partial (CP), generalized tonic-clonic (GTC), and simple partial (SP) [30]. Each one of the patients has experienced at least two types of these seizures. Further details regarding this dataset are provided in Table 4 [30]. Figure 3 shows a section of the EEG signals from the Freiburg dataset.

The Freiburg Dataset
In this dataset, the Intracranial EEG (IEEG) signals were recorded in the Freiburg hospital, Germany, from 21 patients with focal epilepsy [30]. All IEEG signals from this dataset are recorded with a sampling frequency of 256 Hz and strip, depth, and grid electrodes [25]. The age range of the patients in this dataset was between 10 and 50 years, and thirteen individuals were female, and eight individuals were male. In this dataset, there are different pre-ictal, ictal, and inter-ictal times of patients with epileptic seizures. Three different types of seizures are reported among the patients, including complex partial (CP), generalized tonic-clonic (GTC), and simple partial (SP) [30]. Each one of the patients has experienced at least two types of these seizures. Further details regarding this dataset are provided in Table 4 [30]. Figure 3 shows a section of the EEG signals from the Freiburg dataset.   are different pre-ictal, ictal, and inter-ictal times of patients with epileptic seizures. Three different types of seizures are reported among the patients, including complex partial (CP), generalized tonic-clonic (GTC), and simple partial (SP) [30]. Each one of the patients has experienced at least two types of these seizures. Further details regarding this dataset are provided in Table 4 [30]. Figure 3 shows a section of the EEG signals from the Freiburg dataset.

Preprocessing
This DT-CWT was first proposed by Kingsbury [33] and then improved by Ivan [34][35][36], in which two wavelet filter trees are used to obtain the wavelet coefficients corresponding to the real and imaginary parts of the wavelet [33]. Figure 4 shows the DT-CWT structural. As shown in Figure 4, h 0 and h 1 show low and high-pass filters for real coefficients. Furthermore, g 0 and g 1 represent low and high-pass filters for imaginary coefficients. In addition, (↓2) indicates a decrease in the sampling rate [33].  3  3  14  Male  Temporal  SP, CP  5  4  26  Female  Temporal  SP, CP, GTC  5  5  16  Female  Frontal  SP, CP, GTC  5  6  31  Female  Temporal  CP, GTC  3  7  42  Female  Temporal  SP, CP, GTC  3  8  32  Female  Temporal  SP, CP  2  9 44

Preprocessing
This DT-CWT was first proposed by Kingsbury [33] and then improved by Ivan [34][35][36], in which two wavelet filter trees are used to obtain the wavelet coefficients corresponding to the real and imaginary parts of the wavelet [33]. Figure 4 shows the DT-CWT structural. As shown in Figure 4, ℎ 0 and ℎ 1 show low and high-pass filters for real coefficients. Furthermore, 0 and 1 represent low and high-pass filters for imaginary coefficients. In addition, (↓2) indicates a decrease in the sampling rate [33]. Here, the first and second filter banks are represented by symbols ( ) and ′ ( ), respectively, and the complex wavelet base is defined as (1) [33]: (1) Here, the first and second filter banks are represented by symbols ψ(t) and ψ (t), respectively, and the complex wavelet base is defined as (1) [33]: The filter bank ψ(t) is defined as follows [33]: where

of 31
Equations are ψ (t) also similar to Equations (1) and (2), where parameters h 1 and h 0 are substituted with g 1 and g 0 , respectively. Ideally, ψ (t) is equal to Hilbert transform ψ(t) as follows [33]: An important step in establishing Equation (4) is that g 0 is equal to half the delayed sample of h 0 (n) [33].
In addition, in the case of orthogonal wavelet, Equation (4) is satisfied if [33]: Therefore, ideally, high-pass filters are defined as follows [33]: For accurate DT-CWT running, the first step of the filter bank entails particular precision, otherwise frequency responses will not be accurately analyzed. Suppose the k th step of the first filter bank that terminates with H 1 e jw . If H (k) e jw is the equivalent response, then we can apply new identities for k > 1, as represented as follows [33]: If the equivalent response of the second filter bank of the corresponding branch, which is obtained by substituting H i with H i , is shown with H (k) e jw , then we have [33]: Assuming pairs {h 0 , h 1 }, {g 0 , g 1 } satisfy Equations (6) and (7), it follows that for all k ≥ 1, H (k) e jw and H (k) e jw estimate the following equation [33]: That is equal to [33]: new e jw , respectively, then we can conclude: We have pairs of {h 0 , h 1 }, {g 0 , g 1 } which satisfy Equations (6) and (7) for k > 1 [33]: If and only if [33]: More information regarding DT-CWT is presented in [33]. In this study, we employed the Kingsbury toolbox, which covers various filters including Antonini (An), Q-shift (QS), LeGall (LeG), and near-symmetric (NS) filters [33]. Reference [70] shows that NS 13/19 and QS 14/14 filters give more successful results than other filters. Therefore, we performed our simulations using these filters. We considered five sub-bands because of the five important cerebral frequencies for epileptic seizures detection, including delta, theta, alpha, beta, and gamma waves. Therefore, each EEG signal in the datasets were segmented into five sub-bands, and then FDs features were extracted from each sub-band. Figure 5 shows the five frequency sub-bands of an EEG signal using DT-CWT with NS 13/19 and QS 14/14 filters.
More information regarding DT-CWT is presented in [33]. In this study, we employed the Kingsbury toolbox, which covers various filters including Antonini (An), Qshift (QS), LeGall (LeG), and near-symmetric (NS) filters [33]. Reference [70] shows that NS 13/19 and QS 14/14 filters give more successful results than other filters. Therefore, we performed our simulations using these filters. We considered five sub-bands because of the five important cerebral frequencies for epileptic seizures detection, including delta, theta, alpha, beta, and gamma waves. Therefore, each EEG signal in the datasets were segmented into five sub-bands, and then FDs features were extracted from each sub-band. Figure 5 shows the five frequency sub-bands of an EEG signal using DT-CWT with NS 13/19 and QS 14/14 filters.

Delta Sub-band
Theta Sub-band Alpha Sub-band

Feature Extraction
Feature extraction is accounted for as the essential step of epileptic seizure diagnosis. Various nonlinear features are introduced in this section. These features are mostly based on the FD theories.

Hurst Exponent
The HE is employed to calculate the correlation in time series, including EEG signals [37]. HE is widely used to assess whether there is a long-term correlation in a time series. According to the interesting features of HE, it is employed for feature extraction from EEG signals [37]. The HE relationship can be expressed as follows [37]: where R is the difference between the maximum and minimum deviation and is pertinent to the mean. S indicates the standard deviation of the time series and T indicates the time duration of data [37].

Higuchi
The FD refers to the fractional or non-integer dimension of a geometric object. This dimension is a relative criterion of the number of structural blocks forming a pattern [38]. This feature is useful in different fields of the medical sciences [38]. Consider the

Feature Extraction
Feature extraction is accounted for as the essential step of epileptic seizure diagnosis. Various nonlinear features are introduced in this section. These features are mostly based on the FD theories.

Hurst Exponent
The HE is employed to calculate the correlation in time series, including EEG signals [37]. HE is widely used to assess whether there is a long-term correlation in a time series. According to the interesting features of HE, it is employed for feature extraction from EEG signals [37]. The HE relationship can be expressed as follows [37]: where R is the difference between the maximum and minimum deviation and is pertinent to the mean. S indicates the standard deviation of the time series and T indicates the time duration of data [37].

Higuchi
The FD refers to the fractional or non-integer dimension of a geometric object. This dimension is a relative criterion of the number of structural blocks forming a pattern [38]. This feature is useful in different fields of the medical sciences [38]. Consider the x(1), (2), . . . , N time series. Consider x k m Equation that expresses k number of time series: In the above equation, m is the initial time, k is a delay, and a is the integer part of a. Mean duration of L m (k) for each time series is defined as follows [38]: In the above equation, N and are the duration of time series x and normalization factor, respectively. In Equation (18), the sum of mean durations L (k) for each k display is indicated [38].

Katz
KFD is one of the FD techniques. In the KFD algorithm, the FD is calculated according to Equation (19) [38].
where L Is the length of the whole curve or the sum of distances between consecutive points in the time series, and d is the estimated diameter that is equal to the distance between the first point of the sequence and a point in the sequence. Mathematically, L and d are calculated as follows [38]: where i will be equal to the point with a maximum distance from 1. The FD calculated by this method depends on the measurement unit. If the measurement unit changes, the calculated FD will change as well. The KFD method solved this problem using a general unit. L and d are normalized using the mean distance between consecutive points of a. Therefore, Equation (19) can be expressed as follows [38]: The final Kartz equation can be expressed as follows by considering n = L /a [38]:

Petrosian
PFD is a quick method of estimating FD [38]. The PFD method is expressed as Equation (24) where n is the length of the sequence and N∆ is the number of changes in generated binary sequence [38].

Sevcik
The Sevcik algorithm is based on the Hausdorff dimension D h , defined as follows [39,40]: In the equation above, N(ε) is the number of balls with ε radius. By defining N(ε) = 1 2ε , the equation is written as follows [39,40]: In this method, the signals are normalized in two dimensions, which is 1 square meter. The normalized value of x * and ordinate y * are as follows [39,40]: The square unit can be imagined as a network of N*N cells. The final equation is indicated in Equation (30) [39,40].

Box Counting
The BC method is one of the most important fractal methods, playing a crucial part in medical signals analysis [39]. In this method, the EEG signals are covered with a number of boxes, and their number is counted with a specific size to check how many are required for complete coverage. The mathematical definition is provided as follows [39]: In this equation, N(r) is the number of total boxes with r size. The counting box algorithm estimates the curve by the number of boxes required to cover the multiple box curve [39]. Thus, where C is a constant, the lowest square of the best straight line is considered as the DB box-counting dimension estimation of the curve [39].

Multiresolution Box-Counting
Assume a discrete-time signal s = {s(1), s(2), . . . , s(N)} with the sampling frequency of f s with N samples. Each sample point s(i) is in the form of i = 1, . . . , N(X(i), y(i)). The steps of this method are defined in the following [39].
Step 1: Consider two S(i) and S(i + 1) points indicating EEG signals with the interval of dt = x(i + 1) − x(i) = 1 f s , whose points height is in the form of h = y(i + 1) − y(i). This method is repeated for all points on the EEG signal. In Equation (33), the total number of required boxes are provided to cover EEG signal with r resolution [39].
Step 2: Each intermittent point is placed on the signal to obtain the temporal resolution of r = 2 f s . It is repeated in this temporal resolution, and the total number of required boxes is calculated to cover the entire curve [39].
Step 3: We receive the number of boxes to cover the curve by repeating the steps above for multiple temporal resolutions. For r = 1 f s , 2 f s , . . . , R f s that R f s the maximum temporal resolution is true where the investigated curve is placed [39].
Step 4: The linear regression coefficient of curve log(B(r)) versus log 1 r is considered an EEG signal fractal dimension estimation [39].

DFA
In order to extract DFA, the method proposed by Peng et al. is employed [41]. Consider { x(i), i = 1, 2, . . . , N} time series. {y(k)} Time series can be obtained according to Equation (34) using the difference sum of {x(i)} time series from its mean [41].
In this equation, <x> is the mean of {x(i)} time series [41].
In the next step, the obtained {y(k)} time series is divided into N n = int(N/n) identical windows with the length of n, and each window is a line on the point sets and is picked using the least square error method. Afterward, mean squared error function F(n) for whole time series {y(k)} and window length n is obtained according to Equation (35) [41].
In this equation, y n (k) is the time series obtained from the line in each window [41].

Recurrent Quantitative Analysis
Using these recurrent maps provides the possibility of not only reviewing but also quantifying the hidden pattern and finding structural changes and the similarities among various patterns in data [42]. Known as RQA, this ability is a nonlinear method of processing time series, capable of quantifying the repeat number and range in a dynamic system, thus, creating its trajectory in the phase space [42]. This method's various criteria are provided in the following [42].
(1) Recurrent Rate Recurrent Rate (RR) specifies the percentage of recurrent points in a map pertinent to the total correlation defined as follows [42]: (2) DET It defines the percentage of existing recurrent points of diagonal lines to the total recurrent points. The diagonal lines consist of two or multiple points that are neighbors in the diagonal form without any other distance [42].
(5) LAM Index It defines the percentage of recurrent points creating the perpendicular (vertical) lines [42].

(6) Trap Time (TT)
It defines the average length of vertical points [42].

Margaos-Sun Fractal Dimension (MSFD)
The MSFD is one of the other fractal methods used for extracting the features from signals or time series [43]. Similar to the aforementioned fractal methods, this method is of paramount importance in multi-dimensional signals analysis. In this method, the EEG signals of both datasets are segmented with different time length windows, and afterward, MSFD is applied to them. Further explanations regarding this method are provided in [43].

MFDFA
In this section, the MFDFA algorithm for the time series x k signal of length N is presented. This technique consists of five steps, which are described below [44].
In the first step, the cumulative deviation relation x k for y(i) is calculated using the Equation (42) [44]. where In Equation (43), the signal x k converts to the time series y(i) [44].
In the second step, y(i) is divided into Ns non-overlapping segments whose length is equal to s (Ns ≡ int (N/s)) [44].
In the third step, the least-squares for the polynomial series of order m are used to calculate distances v (v = 1,2, . . . , Ns) [44].
For each segment v, v = 1, . . . , Ns and The effects of noise interference as well as non-static processes are eliminated at this stage. This leads to obtaining multi-fractal properties [44].
In the fourth step, the oscillation function of order qth is calculated for each scale s. This function has a qth RMS order [44].
Finally, the comparison powers of the log-log diagrams of Fq (s) are calculated for each q in the fifth step. The relation of this section is defined as follows [44]: In this relation, h(q) is generalized Hurst power. The generalized HE is as follows [44]: Note that a multi-fractal signal has multiple HEs. Moreover, the parameter τ(q) has a nonlinear relation with q [44].
In addition, it is proved that the singularity spectrum f (α) is related to h (q), which is shown in Equations (49) and (50).
The spectrum can be specified using a quadratic function with minimum squares. It is important to note that the quadratic function in question must lie in the vicinity of α 0 .
In this relation, C is an additive constant, and B is a measure of spectral asymmetry. Finally, the spectrum width is defined as follows [44]: with f (α 1 ) = f (α 1 ) = 0.
As indicated by width W, the spectrum in this relation is multi-fractal [44].

Information on Number of Features
This section describes the output values for each feature extraction technique. As discussed earlier, the DT-CWT technique was employed to decompose EEG signals into five sub-bands alpha, beta, gamma, theta, and delta. Then, a total number of 11 features were extracted from each DT-CWT sub-band, including MFDFA, MSFD, MBC, BC, Sevcik, DFA, HE, PFD, KFD, HFD, and RQA, each with its own output values. Additionally, some features had only one output and others possessed multiple outputs. The MSFD, Sevcik, DFA, HE, PFD, KFD, HFD, BC, and MRBC have a single output and MFDFA and RQA have multi outputs. The RQA method has six outputs and the MFDFA has different outputs. For the Bonn dataset with time windows of 5 s, the MFDFA has 51 outputs. Additionally, the MFDFA has 55 outputs for the Freiburg dataset with time windows of 4 s.
For the Bonn dataset, each EEG signal was segmented into time windows of 5 s. Time windows of 5 s have given the greatest accuracy in epileptic seizure detection according to [71]. For the Bonn dataset, the number of segments is equal to 2000, and 330 features (11 FDs) were extracted from each frame. As reported in [51], each EEG signal in the Freiburg dataset was segmented into time windows of 4 s. The number of features extracted for each frame of the Freiburg dataset was 350. Table 4 shows that the Freiburg dataset includes multiple subjects with various data lengths.

Feature Selection Using mRMR Method
mRMR is a feature selection method proposed by Peng et al. [46]. This method selects a group of features whose mutual information is minimal and with the output (class label) as the maximum value [46]. The mutual information of two features, x, and y, is defined as Equation (53), where p(x), p(y), and p(x, y) are the probability density functions (PDF) [46].
The maximum correlation or dependence between the features and class label is calculated via the formula as follows [46]: where S is the set of features and c is the class label. In addition, the minimum redundancy among the features is calculated via the formula as follows [46]: In the end, the mRMR equation is calculated as follows [46]:

Classification Methods
The proposed classification algorithms are examined in this section. In this study, the proposed classification methods include KNN [47], SVM [48,49], and the CNN-AE method. The novelty of this section is to employ the CNN-AE method with the number of proposed layers. An explanation of each method is provided in the following section.

Support Vector Machine
The SVM method is one of the well-established, popular, and best classification methods [48,49]. This method is widely used in various 2-class and multiclass classification problems with significantly high performance. In the SVM method, the 2-class or multiclass classification is performed using various kernels such as linear and RBF [48,49]. In this article, the RBF kernel is used to classify binary-and multiclass problems.

KNN
The KNN method, also used as a classification method, operates such that it finds the nearest neighbor for each data (according to the Euclidean distance) [47]. Afterward, it selects the K-nearest neighbor, specifying the data label based on the label of the majority. This method operates significantly slowly. More information on this method is provided in [47]. In this article, K is considered to be three.

Convolutional Autoencoder (CNN-AE)
Nowadays, DL methods have a special role in various medical applications, and numerous investigations are being conducted in this field [72][73][74][75]. Various DL models have been provided so far, the combined CNN-AE architecture is among them. Unsupervised learning-based CNN-AE is used for input data training [76,77]. This architecture has two parts, decoder and encoder, each of which uses convolutional layers [76,77]. In this method, first, data are entered into encoder layers, received in the output data in compressed form. Then, the encoder section data is received in the decoder section, and the data recovery operation is performed. In this study, a CNN-AE model together with the number of proposed layers is employed. The proposed method is shown in detail in Table 5. A block diagram of the proposed technique is shown in Figure 6, where the parameters of each layer in the Bonn dataset are illustrated for more simplicity. EEG signals of each dataset are first applied to the network's inputs. As shown in Figure 6, EEG signals pass through the encoder layers allowing them to extract the features of interest. The input EEG signals are reconstructed in decoder layers. The number of features extracted are presented in Figure 6 and Table 5. As illustrated in Figure 6, handcrafted features are merged with features of the CNN-AE model upon feature reduction via mRMR. Here, 25 handcrafted features are merged with features of the CNN-AE model within the concatenate block. Thereupon, data were classified into three fully connected (FC) layers, with the first two layers having 256 neurons and the third layer incorporating the Softmax activation function. The steps used to extract features and classify data for the Freiburg dataset are similar to those used on the Bonn dataset. presented in Figure 6 and Table 5. As illustrated in Figure 6, handcrafted features are merged with features of the CNN-AE model upon feature reduction via mRMR. Here, 25 handcrafted features are merged with features of the CNN-AE model within the concatenate block. Thereupon, data were classified into three fully connected (FC) layers, with the first two layers having 256 neurons and the third layer incorporating the Softmax activation function. The steps used to extract features and classify data for the Freiburg dataset are similar to those used on the Bonn dataset.

Statistical Parameters for Classification Performance
The assessment parameters have a crucial role in demonstrating the performance of classification algorithms. In this paper, accuracy (Acc), precision (Prec), specificity (Spec), sensitivity (Sens), and F1_score statistical metrics are used [71]. The pertinent equations of each have been illustrated below. In addition, the K-fold cross validation with k = 10 is used for evaluation [71].

Experiment Results
In this section, we present the results of the proposed method. Simulations were performed using a hardware system with 16 GB RAM, GPU Nvidia 1070, and Core i7 processor. In addition, Matlab 2019a software has been used to implement preprocessing, feature extraction, and feature selection. Classification algorithms including KNN, SVM, and CNN-AE have been implemented using Python language and Scikit-Learn [78] and Keras toolboxes [79].
The proposed method of this article includes four sections: preprocessing, feature extraction, feature selection, and classification. In the preprocessing step, firstly, we used a Butterworth band-pass filter with 0.5-60 Hz cut-off frequency for removal of EEG artifacts. Then, the EEG signals of the datasets were segmented into different time windows. In this step, the EEG window length for the Bonn dataset is 5 s, and for the Freiburg dataset is 4 s. The process of choosing time frames is of utmost importance for CADS efficiency, as it largely influences the precise processing of EEG signals for reaching the best detection accuracy. Similar to [71], signals were segmented into time windows of 5 s in the Bonn dataset. Likewise, the EEG signals in the Freiburg dataset were segmented into time windows of 4 s, similar to the method reported in [51]. Additionally, the DT-CWT is used for EEG signal decomposition into different sub-bands.
The execution time of feature extraction algorithms is of great importance. In practical applications, the execution time of each feature extraction algorithm plays a crucial role in terms of hardware implementation. We examine the execution time of each feature extraction algorithm in this part of the experiment. Accordingly, in Table 6, the execution time of each feature extraction algorithm is shown in the Bonn and Freiburg datasets. The values in Table 6 are in seconds. As evidenced, MSFD has the lowest execution time, whereas MFDFA has a greater execution time. The third section is pertinent to feature selection. In this section, the mRMR method is employed for feature selection.
In the fourth section, various classification algorithms are used. These methods include KNN, SVM, and CNN-AE. The SVM classification algorithm was chosen with radial basis function (RBF) kernel and KNN method with k = 3. The CNN-AE method is a further novelty of the present study.
In this paper, the CNN-AE proposed model consists of two inputs. In the first input, the raw EEG signals of the datasets are applied to the proposed CNN-AE model. Then, CNN-AE features are combined with FD features. Results show that combining the handcrafted features with the CNN-AE features increases the CADS efficiency in epileptic seizure detection from EEG signals. Furthermore, in this part of the experiment, the K-Fold cross validation with k = 10 is used. In addition, to present logical results, the classifier algorithms have been run ten times under the same conditions, and their average results have been presented in tables. In this section, the results of the proposed method for the Bonn and Freiburg datasets are presented. Table 7 provides the results of the proposed method without considering the mRMR feature selection algorithm for the Bonn dataset. Table 8 presents the the results of using the mRMR feature selection algorithm for the Bonn dataset.   Table 7 shows that the proposed CNN-AE method has achieved satisfactory results. Compared to other classifier algorithms, the proposed method for binary and multi-class problems' classification of the Bonn datasets has higher efficiency. In Table 8, we present the results of the proposed method with the mRMR method for the Bonn dataset. As shown in Table 8, the use of the mRMR method alongside classifier algorithms led to higher diagnostic accuracy. In this section, we observe that the performance of KNN, SVM and CNN-AE algorithms has been significantly improved. Figure 7 shows the results of the classifier algorithms with the mRMR feature selection method for the  Figure 7 shows that using the mRMR method has increased the efficiency of classifier algorithms in epileptic seizure detection. Table 9 shows the results of the proposed method for the Freiburg dataset. As shown in Table 9, the CNN-AE algorithm has been able to achieve the highest accuracy compared to other classifier algorithms when using mRMR.

Discussion
This study provides a CADS for epileptic seizure detection via EEG signals using FD features and the proposed CNN-AE method. The proposed CADS include four sections: preprocessing, feature extraction, feature selection, and DL-based classification. First, the EEG signals of both datasets were decomposed into different time windows. Afterward, the DT-CWT technique was used for EEG signal decomposition into different sub-bands. In addition, the NS 13/19 and QS 14/14 filters were used for simulation of the DT-CWT.   Figure 8 shows the accuracy of various classifier algorithms with or without the mRMR method. As demonstrated in Tables 7-9, the proposed CNN-AE model has the highest detection accuracy for the Bonn and Freiburg datasets.

Discussion
This study provides a CADS for epileptic seizure detection via EEG signals using FD features and the proposed CNN-AE method. The proposed CADS include four sections: preprocessing, feature extraction, feature selection, and DL-based classification. First, the

Discussion
This study provides a CADS for epileptic seizure detection via EEG signals using FD features and the proposed CNN-AE method. The proposed CADS include four sections: preprocessing, feature extraction, feature selection, and DL-based classification. First, the EEG signals of both datasets were decomposed into different time windows. Afterward, the DT-CWT technique was used for EEG signal decomposition into different sub-bands. In addition, the NS 13/19 and QS 14/14 filters were used for simulation of the DT-CWT.
In the following step, in order to reduce the size of the feature matrix, the mRMR method is used. In this step, the proposed feature selection method extracted 25 features.
In the final step, various machine learning (ML) and DL algorithms are used to classify the input data. The classifier methods consist of KNN, SVM, and CNN-AE methods. The CNN-AE method with proposed layers is another novelty of the paper. As previously mentioned, this architecture has two inputs, one of them is for applying raw EEG signals to the model, and the other input is for applying FD features. In the classification section, in order to calculate valid results, K-Fold with k = 10 is used. The CNN-AE architecture of this study is a feature fusion method since the FD-based handcrafted features and DL features are combined in this method to increase the accuracy of epileptic seizure detection. For the Bonn dataset, the proposed method achieved 99.736% accuracy for two-class classification problems and 99.594% accuracy for three-class classification problem. In addition, the proposed method on the Freiburg dataset achieved 99.176% accuracy. The experiment results show that the proposed method for both datasets achieved satisfactory results.

Conclusions and Future Works
Epilepsy is among a group of neural brain disorders taking place at different ages with various symptoms. If not diagnosed in the first stages, epileptic seizures will develop further [1][2][3][4][5]. Thereby, diagnosis of epileptic seizures in the first stages is of paramount importance for specialist physicians [6]. There have been various methods introduced to date for the early diagnosis of epileptic seizures. Among these methods, the recording of EEG signals is crucial for specialist physicians and neurologists. The EEG signals include important information regarding brain function at the time of epileptic seizures, which is significantly important for physicians. Recording EEG signals for epileptic seizure diagnosis is carried out in a long-term form, such that physicians cannot precisely examine all signals in detail. In addition, many factors, e.g., different artifacts in EEG signals, contribute to the fact that physicians encounter challenges in epileptic seizure detection.
In order to overcome the aforementioned obstacles, different AI methods are provided to help epileptic seizure diagnosis via EEG signals, and numerous investigations are being conducted. The AI techniques consist of various ML and DL methods. Each of these methods has some drawbacks and benefits.
This paper provides a novel CADS for epileptic seizure detection in EEG signals using FD features and a CNN-AE model with proposed layers. In this study, the steps of CADS for epileptic seizure detection from EEG signals include preprocessing, feature extraction, feature selection, and classification. Firstly, the Bonn and Freiburg datasets were used for experiments. During preprocessing, a band-pass filter (BPF) with 0.5-60 Hz cut-off frequency was employed for removal of artifacts in the EEG signals. In the next step, the EEG signals of the datasets were segmented into different time windows. For this, each EEG signal was segmented into time windows of 5 s and 4 s for the Bonn and Freiburg datasets, respectively. The DT-CWT with QS 14/14 and NS 13/19 filters was then used to decompose the EEG signals into different sub-bands. Feature extraction was performed employing various FD techniques. This step is the first innovation of the paper. Then, the mRMR technique was used for feature selection and the number of features was reduced to 25. The classification was eventually carried out using various ML models and a CNN-AE model, where the CNN-AE model with proposed layers is another novelty of the paper. The experiment results shows that the proposed CADS achieved the highest accuracy. Tables 10 and 11 display the related works on epileptic seizure detection in EEG signals using the Bonn and Freiburg datasets, respectively. As shown in Tables 10 and 11, the proposed method has been able to achieve satisfactory results compared to previous work. The proposed method includes a combination of the most important preprocessing, feature extraction, feature selection, and classification methods for epileptic seizure detection. This has improved the efficiency of the proposed method compared to related work.
The proposed method has higher accuracy than other methods used in practical applications in epileptic seizure detection in EEG signals, such that it can be implemented on hardware platforms such as field programmable gate array (FPGA) [108][109][110] and employed in health centers as an auxiliary tool for quick epileptic seizure diagnosis [111].
In future works, we could use various CNN-AE models such as convolutional Denoising AE (CNN-DAE) [112,113] models for epileptic seizure detection. First, the EEG signals were converted to 2D images using the short-time Fourier transform (STFT) technique and then applied to one of the inputs of the CNN-AE model [114]. Then, various 2D features of STFT were extracted and applied to the other input of the CNN-AE model [115]. Another future study is using local binary pattern (LBP) feature extraction methods [116,117], such that the EEG signals of LBP features are extracted and applied to the CNN-AE input. For future works, we can use new deep feature fusion methods for epileptic seizure diagnosis using EEG signals [118,119]. Another future study is using advanced DL models for epileptic seizure detection in EEG signals [120][121][122][123][124][125][126].

Conflicts of Interest:
The authors declare no conflict of interest.