Redundancy Removed Dual-Tree Discrete Wavelet Transform to Construct Compact Representations for Automated Seizure Detection

Abstract: With the development of pervasive sensing and machine learning technologies, automated epileptic seizure detection based on electroencephalogram (EEG) signals has provided tremendous support for the lives of epileptic patients. Discrete wavelet transform (DWT) is an effective method for time-frequency analysis of EEG and has been used for seizure detection in daily healthcare monitoring systems. However, shift variance, lack of directionality and substantial aliasing limit the effectiveness of DWT in some applications. Dual-tree discrete wavelet transform (DTDWT) can overcome those drawbacks but may increase information redundancy. For classification tasks with small dataset sizes, such redundancy can greatly reduce learning efficiency and model performance. In this work, we propose a novel redundancy removed DTDWT (RR-DTDWT) framework for automated seizure detection. Energy and modified multi-scale entropy (MMSE) features in the dual-tree wavelet domain were extracted to construct a complete picture of mental states. To the best of our knowledge, this is the first study to employ MMSE as an indicator of epileptic seizures. Moreover, a compact EEG representation can be obtained after removing useless information redundancy (redundancy between wavelet trees, adjacent channels and entropy scales) by a general auto-weighted feature selection framework via global redundancy minimization (AGRM). Through validation on the Bonn and CHB-MIT databases, the proposed RR-DTDWT method can achieve better performance than previous studies.


Introduction
Epilepsy is a functional disorder caused by paroxysmal abnormal discharge of brain cells. According to the World Health Organization, about 50 million patients of all ages worldwide suffer from epilepsy [1]. Conventional manual seizure inspection of long-term EEG recordings is time-consuming. With the rapid development of pervasive sensing in daily healthcare systems and of machine learning technologies, computer-aided diagnosis mechanisms, such as automated seizure detection based on electroencephalogram (EEG), provide tremendous support for a patient's health and quality of life [2]. Such automated seizure detectors can trigger an alarm when users are, or will possibly be, in a state of seizure. So far, the algorithms for automated epileptic seizure detection proposed in most studies consist of three parts: (1) signal domain transformation, such as the frequency domain via Fourier transform [3], the wavelet time-frequency domain via discrete wavelet transform (DWT) [4,5], weighted and specific shapes via Hermite transformation [6] or the original domain without transformation [7]; (2) feature extraction in the target domain, such as energy features [8] and complexity features [9]; and (3) machine learning based classification using a support vector machine (SVM) [10], k-nearest neighbor (KNN) [11] or artificial neural network (ANN) [12]. However, all three parts have shown limitations in some application scenarios, which are discussed separately in the following paragraphs.
For signal domain transformation, in spite of the many transformed domains explored for automated epileptic seizure detection, the most discriminative information resides in the time-frequency domain due to the evident transient characteristics and rich frequency components of epileptic EEG. Accordingly, discrete wavelet transform (DWT) [4,5] has been widely applied in EEG-based epileptic seizure detection because it can represent both the time and frequency characteristics of EEG signals. However, some drawbacks of DWT, such as shift variance, lack of directionality and the oscillation of DWT coefficients [13], limit its effectiveness in some applications. Moreover, the iterated downsampling operations during wavelet decomposition may introduce severe aliasing [13]. Such aliasing can lead to loss of information about the original signals. Dual-tree DWT (DTDWT) [13] can overcome the aforementioned drawbacks of DWT at the cost of increased information redundancy (a redundancy factor of 2^d for d-dimensional signals). However, large redundancy may greatly reduce learning efficiency, which in turn weakens the performance of the trained models.
For feature extraction in the target domain, complexity features [9] have received a great deal of attention in the biomedical signal processing field in recent years. To measure the complexity of EEG signals during epileptic seizures, entropy-based features have been widely used, such as sample entropy [9], fuzzy entropy [10], approximate entropy [14], permutation entropy [15] and distribution entropy [9]. All the entropy indicators listed above measure EEG complexity on a single scale. A single-scale complexity measure may fail to quantify the underlying dynamics of extremely complex physiological signals. Therefore, multi-scale entropy (MSE) was proposed [16]. However, the coarse-graining procedure in MSE computation shortens the time sequence, whereas a precise entropy estimate relies heavily on a sufficiently long sequence.
For machine learning based classification between normal (or interictal) and ictal seizure EEG, as mentioned earlier, large redundancy may greatly reduce the learning efficiency and increase model complexity. Epileptic seizure detection is a pattern recognition task with a small dataset size due to the scarcity of ictal seizure EEG. With only a small database available for model training, the large amount of redundancy in the seizure detection scenario especially weakens model performance. Because data collection and labeling are very costly, an alternative solution is to construct a compact data representation, reducing the dependence on data volume. Therefore, redundancy removal is crucial. Some classical redundancy reduction methods, such as principal component analysis (PCA) [17], do not take feature separability into consideration. For a pattern recognition problem, high feature separability and low redundancy are both important.
To address the aforementioned issues, we propose a novel framework of redundancy removed DTDWT (RR-DTDWT) to reduce the global redundancy introduced by DTDWT and achieve a compact signal representation through low-redundancy features in the wavelet domain. First, DTDWT was employed to represent EEG signals in the dual-tree wavelet domain and, at the same time, to overcome the drawbacks of DWT at the cost of increased information redundancy. Then, both energy and complexity features were extracted for classification. The energy features refer to the mean absolute values and variance-like statistical metrics. To measure the complexity of EEG signals and reduce the influence of short sequence length on entropy calculation, we used modified multi-scale entropy (MMSE) [18]. MMSE overcomes the shortcoming of MSE by replacing the coarse-graining procedure with a moving-average procedure. To the best of our knowledge, this is the first study to evaluate the effectiveness of MMSE for seizure detection. We thus constructed a complete picture of mental electrophysiological activity in the wavelet domain that contains both a wealth of useful information and redundancy. The redundancy in this work was introduced at three levels: (1) redundancy between adjacent EEG channels, (2) redundancy between the dual wavelet trees and (3) redundancy between entropy values at different scales. Therefore, in the next step, we aimed to minimize the global redundancy while retaining feature separability.
In our work, auto-weighted feature selection via global redundancy minimization (AGRM) [19] was used to reduce the information redundancy. AGRM takes both feature redundancy and separability into account. Moreover, the optimization problem involved in AGRM is convex, so a global optimum rather than a local optimum can be obtained. A compact representation of the EEG signal can be obtained after removing information redundancy by AGRM. We validated the proposed RR-DTDWT framework on two benchmark databases (the Bonn database and the CHB-MIT database). The results on both databases demonstrate that RR-DTDWT can yield competitive results compared with previous studies.

Methodologies
The proposed RR-DTDWT consists of four parts, namely DTDWT-based signal domain transformation, feature extraction, AGRM-based feature selection and SVM-based classification. The framework of the proposed method is shown in Figure 1. The principles of the four parts are given separately in the following subsections.

Dual-Tree Discrete Wavelet Transform (DTDWT)
DWT employs analysis filters to decompose signals into approximation components and detail components. Then, 1/2 downsampling is applied to obtain the approximation coefficients and detail coefficients. The downsampling procedure may introduce severe aliasing, leading to distortion that may reduce the ability of the wavelet coefficients to characterize the original signals. DTDWT is an enhancement of DWT that can overcome the above-mentioned drawbacks. The flow chart of DTDWT is shown in Figure 2. As shown in Figure 2, DTDWT employs two wavelet trees for signal decomposition. For the first-level decomposition, if the delay between the two wavelet trees equals the sampling interval, then the sample discarded during the downsampling operation of the 1st wavelet tree is exactly the one retained by the 2nd wavelet tree (equivalent to no downsampling operation). For decomposition at higher levels, alternating odd-length and even-length linear phase filters are utilized in DTDWT. However, the alternating odd/even filter approach is impractical in some scenarios. Therefore, a Q-shift [20] dual-tree structure was proposed so that all filters beyond the level 1 decomposition are of even length. In our work, the 10-tap Q-shift filter was selected, which has been proven effective and is a common choice in many applications of DTDWT [21,22].
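The first-level argument above (the sample dropped by one tree is the one kept by the other) can be illustrated with a small sketch. It is a simplification made for illustration only: it ignores the analysis filters entirely and only shows the index bookkeeping of two 1/2 downsamplers offset by one sample. The function name `downsample_by_two` and the toy signal are hypothetical and not taken from the paper.

```python
def downsample_by_two(samples, delay=0):
    """Keep every second sample, starting at index `delay` (0 or 1)."""
    return samples[delay::2]


# Stand-in for a filtered first-level output (assumption: any sequence works here).
signal = list(range(10))

tree1 = downsample_by_two(signal, delay=0)  # keeps indices 0, 2, 4, ...
tree2 = downsample_by_two(signal, delay=1)  # keeps indices 1, 3, 5, ...

# Interleaving the two trees recovers every original sample,
# i.e. the pair of trees behaves like "no downsampling" at level 1.
merged = [None] * len(signal)
merged[0::2] = tree1
merged[1::2] = tree2
```

In a real DTDWT the two trees also use different filters, so the two coefficient streams are not simply interleaved samples of one signal; the sketch only shows why no sample position is lost.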
Through DTDWT (Figure 2), we can obtain 14 coefficient sets for each EEG channel: 2 sets of approximation coefficients (A_{T1L6} and A_{T2L6}) and 12 sets of detail coefficients (D_{T1L1}, D_{T2L1}, D_{T1L2}, D_{T2L2}, ..., D_{T1L6} and D_{T2L6}). Complexity and energy features were extracted directly from these wavelet coefficient sets. In Figure 2, L_{Ti} and H_{Ti} denote the low-pass and high-pass analysis filters in the ith wavelet tree, respectively; the operator "↓2" denotes the 1/2 downsampling operation; and A_{TiLj} and D_{TiLj} denote the approximation and detail components of the level-j decomposition in the ith wavelet tree, respectively.
(3) For a given X_m(i), let B_i be the number of indices j that satisfy d[X_m(i), X_m(j)] ≤ r with j ≠ i, where r is the tolerance parameter. Then, define:
(4) Define B^(m)(r) to be:
(5) Increase the length of the vector series from m to m + 1. Then repeat steps (1) to (4) to obtain B^(m+1)(r).
(6) The sample entropy of the time sequence X is:

$$\mathrm{SampEn}(m, r) = \lim_{N \to \infty} \left\{ -\ln \frac{B^{(m+1)}(r)}{B^{(m)}(r)} \right\}$$

When N is a finite value, sample entropy can be estimated by the following formula:

$$\mathrm{SampEn}(m, r, N) = -\ln \frac{B^{(m+1)}(r)}{B^{(m)}(r)}$$
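The counting procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the Chebyshev (maximum) distance for d[·,·], uses the same N − m template vectors for both lengths m and m + 1 (one common convention), and takes r as an absolute tolerance, so a caller following the paper would pass r = 0.15 × STD. The function name `sample_entropy` is hypothetical.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """Estimate SampEn(m, r, N) as -ln(A / B).

    B counts pairs of length-m templates within tolerance r (self-matches
    excluded); A counts the same for length m + 1.
    """
    n = len(x)

    def count_matches(length):
        # Assumption: use the same N - m starting points for both lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(len(templates)):
                if i == j:
                    continue  # j != i: self-matches are excluded
                distance = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if distance <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # convention chosen here when no matches are found
    return -math.log(a / b)
```

A perfectly regular sequence such as `[0, 1] * 20` gives an entropy of zero, because every length-2 match extends to a length-3 match. The double loop costs O(N²) time, which is the baseline cost quoted for sample entropy.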

Multi-Scale Entropy
Sample entropy may fail to capture signal complexity across multiple time scales. MSE was thus proposed [16]. MSE accounts for the complex dynamics of a time sequence over multiple scales by introducing a coarse-graining process directly before the entropy calculation. For a discrete time sequence X = {x(1), x(2), ..., x(N)}, the coarse-grained time sequence {y^(τ)} can be calculated by the following formula:

$$y^{(\tau)}(j) = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x(i), \quad 1 \le j \le N/\tau$$

where τ is the scale factor. Then, the sample entropy of the coarse-grained time sequence {y^(τ)} is taken as the MSE at scale τ.
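As a small illustration of the coarse-graining step, the sketch below averages consecutive, non-overlapping windows of length τ. The function name `coarse_grain` is hypothetical, and any trailing samples that do not fill a complete window are dropped, which is an assumption about boundary handling.

```python
def coarse_grain(x, tau):
    """Average consecutive, non-overlapping windows of length tau."""
    n_windows = len(x) // tau  # incomplete trailing window is discarded
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n_windows)]


# A 100-point sequence shrinks to 20 points at scale 5.
shortened = coarse_grain(list(range(100)), 5)
```

The MSE at scale τ is then the sample entropy of the returned sequence.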

Modified Multi-Scale Entropy
Although MSE can quantify the complex dynamics of a time sequence over multiple scales, it still has limitations. For a time sequence of 100 points, MSE at scale 5 is calculated from only 20 coarse-grained points. However, a precise complexity analysis relies heavily on a time sequence with sufficient samples. MMSE was proposed for complexity measurement of short-term time sequences [18] and has been applied for signal complexity analysis [24,25]. In this work, entropy features were extracted directly from wavelet coefficients. The iterated downsampling operations of DTDWT greatly reduce the lengths of the coefficient sets at high decomposition levels. Accordingly, MMSE is preferred over MSE for estimating the complexity of short wavelet coefficient sets. For MMSE, the coarse-grained time sequence {y^(τ)(j)} is calculated through a moving-average strategy as follows:

$$y^{(\tau)}(j) = \frac{1}{\tau} \sum_{i=j}^{j+\tau-1} x(i), \quad 1 \le j \le N - \tau + 1$$

Instead of dividing the original time sequence into coarse-grained sample points with disjoint time windows, MMSE applies an overlapping moving window. Therefore, the length of the coarse-grained time sequence increases from N/τ to N − τ + 1. A previous study [26] suggested m = 1 or m = 2, with r fixed at a value between 0.1 and 0.25 times the standard deviation (STD) of the time sequence. Many applications of sample entropy have selected parameters following this criterion [27]. Specifically, r = 0.15 × STD was selected in [28]. In our method, m = 2, r = 0.15 × STD and τ = {1, 2, 3, 4, 5} (5 scales) were selected for the MMSE calculation.
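The moving-average step and its effect on sequence length can be sketched as follows. The function name `moving_average_grain` is hypothetical; the sketch only covers the averaging, with the sample entropy of the resulting sequence (using the chosen m and r) left to a separate routine.

```python
def moving_average_grain(x, tau):
    """Average overlapping windows of length tau, sliding by one sample."""
    return [sum(x[j:j + tau]) / tau for j in range(len(x) - tau + 1)]


# For a 100-point sequence at scale 5, the overlapping window keeps
# N - tau + 1 = 96 points instead of the N / tau = 20 points of MSE.
sequence = list(range(100))
mmse_length = len(moving_average_grain(sequence, 5))
mse_length = len(sequence) // 5
```

At scale τ = 1 the sequence is returned unchanged, so the first of the five MMSE scales coincides with ordinary sample entropy.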

Energy Features
Energy features, namely, variance and mean absolute values of wavelet coefficient sets, were also extracted for classification, as a complement of the complexity features.
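A minimal sketch of the two energy indicators for one coefficient set is given below. The function name `energy_features` is hypothetical, and the population form of the variance (division by n rather than n − 1) is an assumption, since the text does not state which normalization was used.

```python
def energy_features(coefficients):
    """Return (variance, mean absolute value) of one wavelet coefficient set."""
    n = len(coefficients)
    mean = sum(coefficients) / n
    # Assumption: population variance (divide by n).
    variance = sum((c - mean) ** 2 for c in coefficients) / n
    mean_abs = sum(abs(c) for c in coefficients) / n
    return variance, mean_abs
```

Applying this to each of the 14 coefficient sets of a channel yields the 28 energy features per channel used later in the feature vectors.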

Feature Selection Based on Auto-Weighted Feature Selection via Global Redundancy Minimization (AGRM) Algorithm
The features extracted in the above procedures contain a large amount of redundant information. For pattern recognition problems with a small dataset size, such as epileptic seizure detection, such redundancy may greatly reduce learning efficiency and model performance. In this section, we introduce the principle of a recent redundancy removal algorithm, namely AGRM, which has proven quite effective in the image processing field. Traditional redundancy removal algorithms focus only on minimizing feature redundancy and ignore feature separability. AGRM takes feature redundancy and separability into consideration at the same time. We denote the feature matrix as F ∈ R^{n×d}, where the rows of F correspond to observations and the columns to features. Each column f_i (i ∈ {1, 2, ..., d}) has been normalized by Z-score. In the objective function of AGRM, Equation (8), 1 = (1, 1, ..., 1)^T and z ∈ R^{d×1} refers to the final feature score obtained by AGRM, so that the top-ranking features are selected to represent the original signals. Matrix A ∈ R^{d×d} in objective function (8) denotes the pair-wise feature redundancy, with entry A_{i,j} measuring the redundancy between features f_i and f_j. In addition, s ∈ R^{d×1} refers to another input feature score to be jointly taken into account. In this work, s denotes the Fisher score of the original features. We denote µ_i as the mean value of the ith feature over all EEG segments. Similarly, µ_i^k is the mean value of the ith feature over the segments in class k. Given a feature f_i in c known classes, with segment index j in a specific class, the Fisher score s can be computed as in Equation (9).
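For orientation, one widely used form of the Fisher score for a single feature is the ratio of between-class scatter to within-class scatter. The sketch below follows that form; the weighting of the between-class term by class size is an assumption and may differ from the exact expression in Equation (9). The function name `fisher_score` is hypothetical.

```python
def fisher_score(values, labels):
    """Between-class scatter divided by within-class scatter for one feature.

    values: feature value of every segment; labels: class label of every segment.
    """
    overall_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        members = [v for v, label in zip(values, labels) if label == cls]
        class_mean = sum(members) / len(members)
        # Assumption: the between-class term is weighted by the class size.
        between += len(members) * (class_mean - overall_mean) ** 2
        within += sum((v - class_mean) ** 2 for v in members)
    return between / within  # undefined if every class has zero spread
```

For example, a feature with values 0, 1 in one class and 10, 11 in the other has well-separated class means and small within-class spread, so its score is large.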
In Equation (9), S_B and S_W denote the feature distance between classes and within classes, respectively. Therefore, a higher Fisher score indicates higher feature separability. The first term z^T A z in (8) is the global redundancy of the extracted features. The second term z^T s in (8) denotes feature separability. λ acts as a trade-off parameter between the two terms and converges automatically during the optimization procedure. By minimizing the objective function (8), feature redundancy is minimized while feature separability is maximized. The optimization problem of AGRM has been proven convex [19], so a global optimum can be achieved. The optimal z and λ can be obtained by the general augmented Lagrangian multiplier (ALM) method given in [19]. In brief, the update steps of the optimization procedure involve the operator (x)_+ = max(x, 0), and for all i the optimal score is z*(i) = (g(i) − η*)_+.
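The closed-form step z*(i) = (g(i) − η*)_+ has the shape of a Euclidean projection onto the probability simplex. The sketch below shows that projection under an explicit assumption that is not confirmed by the surviving text: that the scores are constrained by z ≥ 0 and z^T 1 = 1, so that η* is the threshold making the scores sum to one. The function name `project_to_simplex` is hypothetical, and how g is formed inside the ALM iterations is not reproduced here.

```python
def project_to_simplex(g):
    """Return z with z[i] = max(g[i] - eta, 0), where eta makes sum(z) == 1.

    Assumption: the constraint set is {z >= 0, sum(z) = 1}.
    """
    sorted_g = sorted(g, reverse=True)
    cumulative = 0.0
    eta = 0.0
    for k, value in enumerate(sorted_g, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / k
        if value - candidate > 0:
            eta = candidate  # the last k that passes gives the final threshold
    return [max(v - eta, 0.0) for v in g]
```

Because of the max(·, 0) operator, the resulting scores are non-negative and many of them can be exactly zero, which is what allows features with zero score to be discarded.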
Following the optimization procedure above, the feature scores z can be obtained. The leading features with the highest scores are selected to represent the original data with minimal redundancy.

Classification with the Support Vector Machine
The SVM is a machine learning algorithm based on the structural risk minimization principle [29]. SVM has achieved great performance in many advanced technologies based on pattern recognition, such as the human-machine interface based on muscle activity classification [30], the brain-computer interface based on mental state classification [31] and the engine control system based on fault diagnosis [32]. The principle of SVM is introduced briefly below.
Consider input feature vectors x = {x_i, i = 1, 2, ..., N} and the corresponding labels y = {y_i, i = 1, 2, ..., N}, where x_i ∈ R^{d×1}, d is the dimensionality of the feature space, N is the dataset size and y_i = 1 or y_i = −1 refers to the two categories (normal or ictal in this work) to be classified. We define a function f(x) taking the following form:

$$f(x) = w^{T} x + b \tag{10}$$

where w ∈ R^{d×1}. SVM aims to find the optimal function f(x) that achieves f(x_i) ≥ 0 for y_i = 1 and f(x_i) < 0 for y_i = −1, so that the data of the two classes in the high-dimensional feature space are separated by the optimal hyper-plane w^T x + b = 0. To obtain the optimal hyper-plane as the classification boundary with the maximal separating margin between the two classes [33], the objective function of SVM takes the following form:

$$\min_{w, b} \; \frac{1}{2} \lVert w \rVert^{2} \quad \text{s.t.} \quad w^{T} x_i + b \ge 1 \ \text{for} \ y_i = 1, \quad w^{T} x_i + b \le -1 \ \text{for} \ y_i = -1 \tag{11}$$

The constraints in Equation (11) can be rewritten in a compact form:

$$y_i \left( w^{T} x_i + b \right) \ge 1, \quad i = 1, 2, \ldots, N \tag{12}$$

Because the training data may not be perfectly separated by the obtained hyper-plane, the slack variables ξ_i were employed to relax the constraints in Equation (12):

$$y_i \left( w^{T} x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, N \tag{13}$$

The objective function (11) can be further rewritten in the following form, which constitutes the structural risk [29]:

$$\min_{w, b, \xi} \; \frac{1}{2} \lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i \tag{14}$$

where C is the penalty parameter that weights the training error. The objective function (14) takes both the model complexity (the first term) and the training error (the second term) into consideration. The optimal solution of objective function (14) can be obtained through Lagrange multipliers [33].
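To make the two terms of the soft-margin objective concrete, the sketch below evaluates it for a given linear classifier, using the standard fact that the smallest feasible slack for sample i is max(0, 1 − y_i (w^T x_i + b)). It only evaluates the objective and does not solve the optimization problem. The function name `soft_margin_objective` and the penalty parameter `c` are hypothetical names chosen for this illustration.

```python
def soft_margin_objective(w, b, xs, ys, c=1.0):
    """Return 0.5 * ||w||^2 + c * sum of slacks for a linear classifier."""
    complexity = 0.5 * sum(wi * wi for wi in w)  # model complexity term
    slack_sum = 0.0
    for x, y in zip(xs, ys):
        decision = sum(wi * xi for wi, xi in zip(w, x)) + b
        # Smallest slack that satisfies y * f(x) >= 1 - slack, slack >= 0.
        slack_sum += max(0.0, 1.0 - y * decision)
    return complexity + c * slack_sum  # training error weighted by c
```

Samples that lie outside the margin on the correct side contribute no slack, so for well-separated data only the complexity term remains.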

Databases
In this section, we describe the two benchmark databases utilized in this study.

Bonn Database
The Bonn database, built by Andrzejak et al. at Bonn University, Germany, contains artifact-free data collected from five healthy subjects and five epilepsy patients. Artifacts due to muscle activity and eye movements were removed beforehand through visual examination. Two classification tasks were used to evaluate the effectiveness of our method: (1) normal EEG (set A) versus ictal EEG (set E); (2) interictal EEG (set C) versus ictal EEG (set E). Each set contains 100 single-channel EEG segments of 23.6 s duration. The EEG data were acquired at a 173.61 Hz sampling rate with 12-bit resolution. For more details, please refer to [34].
In the preprocessing procedure, the single-channel EEG signal was normalized by Z-score. The normalized signals, with zero mean and unit standard deviation, were then passed to the subsequent feature extraction process.
In the feature extraction procedure, due to the relatively short segment duration and low sampling rate, the approximation and detail coefficient sets at the 6th decomposition level (A_{T1L6}, D_{T1L6}, A_{T2L6} and D_{T2L6}) were extremely short (64 coefficients), which cannot support a reliable complexity analysis. MMSE features extracted from these four coefficient sets were therefore excluded from the feature vector. Accordingly, a 50-dimensional MMSE feature vector was constructed (the remaining 10 coefficient sets × 5 entropy scales × 1 EEG channel). As for energy features, the variance and mean absolute value were extracted from each wavelet coefficient set, constructing a 28-dimensional energy feature vector (14 coefficient sets × 2 indicators × 1 EEG channel). In summary, a 78-dimensional feature vector was constructed for each EEG segment.
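The figure of 64 coefficients at the 6th level follows from the segment length and the repeated halving. The short calculation below reproduces it under a simplifying assumption: each level exactly halves the length (integer division), whereas a real DTDWT implementation may differ by a few coefficients depending on boundary handling. The variable names are hypothetical.

```python
duration_s = 23.6         # segment duration in the Bonn database
sampling_rate_hz = 173.61  # sampling rate in the Bonn database

n_samples = round(duration_s * sampling_rate_hz)

# Approximate coefficient count per tree at each decomposition level,
# assuming every level halves the length.
lengths = {level: n_samples // 2 ** level for level in range(1, 7)}
```

With these numbers, a segment has 4097 samples and the level-6 sets have about 64 coefficients, which is too few for a stable entropy estimate at the larger scales.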
In the feature selection and classification procedure, we employed the "leave-one-out" cross-validation strategy. Because EEG data from all patients are mixed together in the Bonn database, no patient label is available. Most studies in the literature also employed a cross-validation strategy on all data [6,35,36], which is consistent with our study. Considering that the dimensionality of the original feature space is relatively low, all features with positive feature scores were retained and those with zero scores were discarded (according to the last step of the AGRM optimization procedure, the feature scores are non-negative). We first held out one EEG segment as test data. All remaining data were allocated to the training set, which was used for both AGRM-based feature score calculation and SVM model training. A label of "normal", "interictal" or "ictal" for the held-out test data was then given by the trained SVM. We applied the same procedure to all EEG data so that every EEG segment was used to test algorithm performance. We employed sensitivity, specificity, precision, accuracy and F1 score as classification evaluation metrics.
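The five evaluation metrics can be computed from the confusion counts accumulated over all held-out segments. The sketch below uses their standard definitions, with ictal taken as the positive class (an assumption about the labeling convention). The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from true/false positive/negative counts."""
    sensitivity = tp / (tp + fn)  # fraction of ictal segments detected
    specificity = tn / (tn + fp)  # fraction of non-ictal segments rejected
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }
```

In a leave-one-out run, each held-out segment adds exactly one count to one of the four cells, and the metrics are computed once at the end.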

CHB-MIT Database
The CHB-MIT database published by Shoeb [37] contains long-term multi-channel EEG data of 24 epilepsy patients (1.5-22 years old) collected at the Children's Hospital of Boston-Massachusetts Institute of Technology, with a 256 Hz sampling rate and 16-bit resolution. The data in the CHB-MIT database were acquired continuously during long-term EEG monitoring, with no signal processing performed after acquisition. Such raw EEG data better simulate a practical application scenario. The subset of each patient contains a varying number (between 9 and 42, 27.25 ± 9.68 on average) of .edf files. Generally, the digitized recordings in most files are one hour long. The beginning and end time of each ictal seizure case have been labeled based on expert judgment. We divided the continuously recorded data into 30 s EEG segments. In total, there are 198 ictal seizure cases in the database. Due to the non-uniform electrode distribution during data acquisition, the 181 out of 198 seizures that shared 23 common channels were utilized for method validation. For each ictal seizure case, only one 30 s segment was used in our experiment, to avoid overestimating algorithm performance due to the similarity among several segments of one ictal seizure case. Because of the scarcity of ictal segments (181 segments), we randomly downsampled the interictal EEG data to rebalance the ratio between ictal and interictal data (a final ratio of 1:3 in our validation experiment). To take signal variability over time into consideration, the segmented interictal data were evenly distributed over different periods of time with no overlap. More details can be found in [37].
In the preprocessing procedure, the EEG signal in each channel was normalized by Z-score. The normalized signals, with zero mean and unit standard deviation, were then passed to the subsequent feature extraction process.
In the feature extraction procedure, MMSE features were first extracted from each coefficient set in each EEG channel, constructing a 1610-dimensional MMSE feature vector (14 coefficient sets × 5 entropy scales × 23 EEG channels). As for energy features, the variance and mean absolute value were extracted from each wavelet coefficient set, constructing a 644-dimensional energy feature vector (14 coefficient sets × 2 indicators × 23 EEG channels). In summary, a 2254-dimensional feature vector was constructed for each EEG segment.
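The feature dimensionalities quoted for both databases follow from simple products, which the sketch below reproduces. The function name `feature_dimensions` and its parameter names are hypothetical; the numbers passed in are those stated in the text.

```python
def feature_dimensions(entropy_sets, energy_sets, scales, indicators, channels):
    """Return (MMSE dimension, energy dimension, total dimension)."""
    mmse_dim = entropy_sets * scales * channels
    energy_dim = energy_sets * indicators * channels
    return mmse_dim, energy_dim, mmse_dim + energy_dim


# Bonn: level-6 sets are excluded from MMSE, so only 10 sets carry entropy features.
bonn = feature_dimensions(entropy_sets=10, energy_sets=14, scales=5, indicators=2, channels=1)

# CHB-MIT: all 14 sets are used for both feature types, over 23 channels.
chb_mit = feature_dimensions(entropy_sets=14, energy_sets=14, scales=5, indicators=2, channels=23)
```

The roughly 29-fold difference in dimensionality between the two databases is the reason the number of retained features is handled differently in each experiment.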
In the feature selection and classification procedure, a specific classification model was developed for each patient. The "leave-one-out" cross-validation strategy was also applied to validate the proposed RR-DTDWT algorithm. We first held out one EEG segment as test data, and all remaining EEG segments of all patients were allocated to the training set. Although a total of 181 ictal seizure EEG segments were contained in the experiment database, very few ictal seizure cases were available for each patient, which was insufficient to estimate the feature correlations used for feature score calculation precisely. Therefore, the AGRM-based feature score was pre-calculated using all EEG data in the training set. The whole procedure was independent of the testing set. Due to the high dimensionality of the original feature space, only the leading 50 features were selected. As for classification, due to the large individual differences in EEG characteristics, only the training data from the same patient as the test data were used for SVM model training. The model trained in our work can therefore be viewed as a patient-specific model, which is consistent with most automated seizure detection studies in the literature [7,38,39]. As before, we employed sensitivity, specificity, precision, accuracy and F1 score as classification evaluation metrics.

Qualitative Results
The pair-wise redundancy matrix of the extracted features helps to visualize the redundancy in the features, as shown in Figure 3. Here, we take the CHB-MIT database as an example. According to the middle panel of Figure 3, the energy-based features (mean absolute value and variance) are highly correlated with each other, and the same is true for the complexity-based features (MMSE). However, there is only a small amount of information redundancy between the two kinds of features (energy-based and complexity-based). In the left panel of Figure 3, three bright diagonal lines are clearly visible. The middle diagonal line indicates that each feature is highly correlated with itself. The lower left and upper right diagonal lines demonstrate the high redundancy between corresponding features extracted from the two wavelet trees. In the right panel of Figure 3, the periodic occurrence of the bright blocks indicates the high redundancy between adjacent EEG channels and between the two wavelet trees. Moreover, the bright part of each block reflects the high redundancy between different entropy scales. In summary, Figure 3 shows that the features extracted by DTDWT had high redundancy, so an effective redundancy removal method was required to refine them. The visualization of the high-dimensional features (CHB-MIT database) by t-distributed stochastic neighbor embedding (t-SNE) [40] is shown in Figure 4. Although the epileptic seizure detector was trained specifically for each patient, here we visualize the data of all patients at the same time. Figure 4a shows the visualization of the original 2254-dimensional features with high redundancy. The ictal and interictal data are difficult to distinguish. Figure 4b

Quantitative Results on Bonn Database
Algorithm comparisons for the two classification tasks on the Bonn database, "normal versus ictal" and "interictal versus ictal," are shown in Tables 1 and 2, respectively. For both classification tasks, RR-DTDWT yielded a sensitivity and specificity of 100%. In contrast, DTDWT, RR-DWT and DWT achieved relatively lower performance. To avoid the performance bias caused by feature dimension, we also utilized the Fisher score to reduce feature dimensionality (to the same dimensionality as RR-DTDWT). As shown in Tables 1 and 2, RR-DTDWT outperformed all other methods on the Bonn database.
We also present the algorithms' performances on the CHB-MIT database in Table 3. Compared with DWT, the results of DTDWT and RR-DWT are almost the same (even slightly lower). This demonstrates that when DWT is simply extended to DTDWT, the redundancy introduced limits the learning efficiency of the model. Likewise, when AGRM is applied directly to DWT-based features, some useful information may be lost. However, if we first employ DTDWT to complement the missing information of DWT and then apply AGRM to reduce information redundancy, RR-DTDWT achieves significantly higher results for all evaluation metrics. As before, we also employed the Fisher score to reduce feature dimensionality (to the same dimensionality as RR-DTDWT). RR-DTDWT still outperforms all other methods, as shown in Table 3.

Discussion
In this work, we proposed an RR-DTDWT framework for automated epileptic seizure detection. The proposed RR-DTDWT consists of four parts: (1) DTDWT-based signal domain transformation; (2) feature extraction; (3) feature selection and (4) classification. In the signal domain transformation part, the signal representation was obtained through DTDWT. DTDWT can reduce the information loss caused by the iterated downsampling operation of DWT, at the cost of introducing information redundancy. In the feature extraction part, energy and complexity features were extracted. MMSE was employed as an indicator of epileptic seizures for the first time. Then, AGRM-based feature selection reduced the information redundancy while taking feature separability into consideration. Finally, the label of each EEG segment was given by a patient-specific SVM classifier. In the following subsections, the comparison between our method and those from previous studies, the computational cost analysis and the limitations of our method are presented separately.

Comparison with Previous Studies
In this subsection, the proposed RR-DTDWT algorithm is compared with recent studies (after 2014) that utilized the same databases.

Comparison with Previous Studies Based on Bonn Database
For the Bonn database, method comparisons on the two classification tasks, namely "normal versus ictal" and "interictal versus ictal," are presented in Tables 4 and 5, respectively. Because the data in the Bonn database are balanced between categories, classification accuracy is a suitable metric to characterize method performance. The proposed RR-DTDWT method outperformed most other recent methods, achieving an accuracy of 100% in both cases. As previously mentioned, the Bonn database contains artifact-free EEG signals. Artifacts due to muscle activity and eye movements were removed beforehand through visual examination. Accordingly, the Bonn database cannot simulate challenging practical application scenarios, although it can be utilized for method comparison. Some previous studies have also reported 100% accuracy. For example, Bhattacharyya et al. [41] employed the tunable-Q wavelet transform to develop a seizure detector that also achieved an accuracy of 100% for the "normal versus ictal" problem. The model of Kumar et al. [4] can discriminate normal data from ictal data with 100% accuracy and also discriminate interictal data from ictal data with a very low error rate. The methods proposed by Raghu et al. [42] yielded excellent performance for both classification tasks on the Bonn database. Moreover, previous studies also proposed wavelet-based methods [43,44] and achieved 100% or almost 100% classification accuracy. Akut et al. [45] proposed a wavelet-based deep learning approach in which no manual feature extraction was required. This method can achieve high accuracy on a small EEG database. All these studies made great contributions to the epilepsy monitoring field. Our method achieved 100% classification accuracy and outperformed most previous studies on the Bonn database; its performance on the more practical and challenging CHB-MIT database is described next.
For the CHB-MIT database, a method comparison is shown in Table 6. Different studies based on the CHB-MIT database employed a wide variety of metric combinations to evaluate algorithm performance. For example, studies [10] and [56] applied specificity and false positive rate (FPR), respectively, to characterize model performance. To compare all methods consistently, we converted the two metrics into each other. For the proposed method, a specificity of 99.63% is equivalent to a false positive rate of 0.44/h, under the assumption of 30 s EEG segments. The proposed RR-DTDWT achieves a higher F1 score than most recent studies validated on the CHB-MIT database. Only the F1 score of the detector developed by Xiang et al. [10] is slightly higher than ours. However, due to the long duration of EEG monitoring for epileptic seizure detection and the scarcity of ictal data, the EEG data acquired in real application scenarios are characterized by an extremely imbalanced ratio between ictal and interictal segments. Therefore, a seemingly high specificity in this application scenario may correspond to an unacceptable false alarm rate. For example, with 30 s segments, a detector with a specificity of 95% is expected to trigger six false alarms per hour, which is not acceptable in a practical application. Compared with the performance of [10], our detector achieves an approximately four-fold reduction in FPR (from 2.03/h to 0.44/h) at the cost of only a slight reduction in sensitivity (from 98.27% to 96.69%).
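The conversion between specificity and hourly false positive rate used above is a one-line calculation once the segment length is fixed. The sketch below assumes, as in the text, that every 30 s segment is classified independently and that each misclassified interictal segment counts as one false alarm. The function name `false_alarms_per_hour` is hypothetical.

```python
def false_alarms_per_hour(specificity, segment_seconds=30.0):
    """Convert a per-segment specificity into expected false alarms per hour."""
    segments_per_hour = 3600.0 / segment_seconds
    return (1.0 - specificity) * segments_per_hour


proposed = false_alarms_per_hour(0.9963)  # about 0.44 per hour
example = false_alarms_per_hour(0.95)     # about 6 per hour
```

Because 120 segments are evaluated every hour, each percentage point of lost specificity adds 1.2 false alarms per hour, which is why specificity alone can look deceptively good for long-term monitoring.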
Based on the above discussion, our RR-DTDWT based detector is more suitable for realistic application scenarios than those listed in Table 6. Raghu et al. [57] also proposed an automated epileptic seizure detection method based on DWT and a complexity measure via sigmoid entropy, achieving a sensitivity of 94.21% and an accuracy of 94.38%, both lower than those of our method. Previous studies such as [35] evaluated the performance of their method on both the Bonn and CHB-MIT databases. Although their model also showed perfect performance on both classification tasks ("normal versus ictal" and "interictal versus ictal") on the Bonn dataset (see Tables 4 and 5), its accuracy (97.5%) on the more challenging CHB-MIT database, shown in Table 6, is lower than ours (98.89%). These results suggest that the RR-DTDWT algorithm can further advance automated seizure detection technology.

In this subsection, we analyze the computational cost of the proposed RR-DTDWT framework. In the signal domain transformation part, DTDWT can be calculated with a computational cost of O(N) [61] for an EEG segment of length N.
In the energy-based feature extraction part, both the mean absolute value and the variance features can be calculated in O(N) time. Complexity-based features such as sample entropy require O(N²) computation time. Using fast computation algorithms, the computation time can be reduced to O(N …), where m is the parameter used for sample entropy calculation (as mentioned above, m = 2 in this work).

Feature selection methods can be divided into two categories: filter-based and embedded-based methods [63]. Filter-based methods rank all candidate features according to a predefined feature score based on the intrinsic properties of the data, independently of the subsequent classification procedure. Embedded-based feature selection methods integrate feature selection into the learning procedure and involve more parameters to be fine-tuned. Therefore, embedded-based methods are computationally expensive. The AGRM algorithm used in our work is a filter-based method and is computationally efficient. Moreover, the feature scores were pre-calculated on the training set, so in an online seizure detector the AGRM-based feature selection adds no additional computational burden. On the contrary, because only the features with minimal redundancy are selected, the computational cost is further reduced.

In the classification procedure, the computational cost of the SVM model is only O(1) for a given feature vector of a specific EEG segment. In summary, the computational cost of the proposed RR-DTDWT is O(N), which supports its application in an online system.
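To make the quadratic cost of sample entropy concrete, the following is a minimal sketch of a naive, single-scale computation. It is not the authors' implementation and does not cover the multi-scale MMSE variant. The tolerance r = 0.2 × standard deviation and the "distance ≤ r" match rule are common conventions assumed here, and the function name is hypothetical.

```python
import numpy as np


def sample_entropy(x, m=2, r_factor=0.2):
    """Naive sample entropy; compares every template pair, hence O(N^2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    r = r_factor * np.std(x)  # assumed tolerance convention

    def count_matches(length):
        # Use the same number of templates (n - m) for both lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance from template i to all later templates.
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matches(m)      # matching pairs of length m
    a = count_matches(m + 1)  # matching pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined when no matches are found
    return -np.log(a / b)


periodic = np.tile([0.0, 1.0], 50)                  # highly regular signal
noise = np.random.default_rng(0).standard_normal(200)
print(sample_entropy(periodic), sample_entropy(noise))
```

The regular signal yields an entropy of zero, while the noise yields a clearly larger value; the nested comparison over all template pairs is what drives the O(N²) cost mentioned above.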

Limitations of the Proposed RR-DTDWT
One limitation of the RR-DTDWT based seizure detection model is its generalization performance when applied to a new patient. The automated seizure detection models in the literature can be divided into two categories, namely the patient-nonspecific model [64] and the patient-specific model [7]. The former refers to a model trained on the data of other patients, with no data of the test patient involved in model training. This kind of model can save considerable cost on data collection and labeling because, for a new patient, no further data collection is required and the model can be used immediately. Data from the other patients can be collected beforehand and can therefore be viewed as cost-free. For example, Deng et al. [64] proposed an enhanced, transductive, transfer learning Takagi-Sugeno-Kang fuzzy system construction method (ETTL-TSK-FS) to enhance the generalization performance of automated seizure detection models. The ETTL-TSK-FS method achieved a sensitivity of 91.91%, a specificity of 93.16% and an accuracy of 94.04% on the CHB-MIT dataset. Although it performs worse than most patient-specific models, ETTL-TSK-FS can be viewed as the current state-of-the-art patient-nonspecific model. Patient-specific models take advantage of the useful information of the test patient. However, data from the test patient must be acquired to train the seizure detection model or to fine-tune a pre-trained model. Most automated seizure detection studies in the literature focus on patient-specific classification models [7,38,39]. Generalization performance is thus a common limitation, and addressing it remains future work for our study.
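The patient-nonspecific setting described above corresponds to a leave-one-patient-out split, in which every segment of the test patient is excluded from training. The sketch below illustrates only the data split, under the assumption that each EEG segment carries a patient identifier; the function name and the example identifiers are hypothetical.

```python
import numpy as np


def leave_one_patient_out(patient_ids):
    """Yield (test_patient, train_idx, test_idx) for each patient in turn."""
    patient_ids = np.asarray(patient_ids)
    for patient in np.unique(patient_ids):
        test_mask = patient_ids == patient
        # Training uses only segments that belong to the other patients.
        yield patient, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Hypothetical segment-to-patient assignment, for illustration only.
ids = ["p01", "p01", "p02", "p03", "p03", "p03"]
for patient, train_idx, test_idx in leave_one_patient_out(ids):
    print(patient, train_idx.tolist(), test_idx.tolist())
```

A patient-specific evaluation would instead split the segments of a single patient into training and test portions, which is why it requires labeled data from that patient.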

Conclusions
In this work, we proposed a novel RR-DTDWT framework for automated epileptic seizure detection. First, a more complete picture of mental electrophysiological activity was constructed via DTDWT, which can compensate for the information lost during the iterated downsampling operations of DWT. Then, MMSE was extracted as an indicator of epileptic seizures for the first time. Moreover, the information redundancy was reduced through AGRM so that a compact EEG representation could be obtained for seizure detection. Validated on two benchmark databases, this novel method yields superior performance compared with recent studies. Given the wide application of wavelet-based algorithms and complexity-based features, the proposed RR-DTDWT shows high potential for application in broader fields.

Figure 2. DTDWT flow chart. L_Ti and H_Ti denote the low-pass and high-pass analysis filters in the i-th wavelet tree, respectively. The operator "↓2" denotes the 1/2 down-sampling operation. A_TiLj and D_TiLj denote the approximation and detail components of the level-j decomposition in the i-th wavelet tree, respectively.
Figure 4b represents a visualization of the 50-dimensional features after redundancy removal by AGRM. Compared with the scatter in Figure 4a, the ictal and interictal data in Figure 4b are easier to classify, demonstrating the effectiveness of redundancy removal.

Figure 4. Feature visualization by t-SNE. (a) Visualization of original features; (b) Visualization of redundancy removed features.

Table 1. Performance of different methods on Bonn database (normal versus ictal).

Table 2. Performance of different methods on Bonn database (interictal versus ictal).

Table 3. Performance of different methods on CHB-MIT database.

Table 4. Comparison with previous studies on Bonn database (normal versus ictal).

Table 5. Comparison with previous studies on Bonn database (interictal versus ictal).

Table 6. Comparison with previous studies on the CHB-MIT database.