Statistical Model-Based Classiﬁcation to Detect Patient-Speciﬁc Spike-and-Wave in EEG Signals

: Spike-and-wave discharge (SWD) pattern detection in electroencephalography (EEG) is a crucial signal processing problem in epilepsy applications. It is particularly important for overcoming time-consuming, difﬁcult


Introduction
Epilepsy is a chronic neurological disorder that affects patients, causing recurrent seizures. Seizures are characterized by excessive electrical discharges in neurons. Their waveform, known as the spike, is characterized by brief bursts of high amplitude, synchronized and multiphasic activity with several polarity changes [1]. These are exhibited close to the epileptic focus and stand out from the background EEG activity. Electroencephalography (EEG) is currently the main technique to record electrical activity in the brain. Neurologists, trained in EEG, are able to properly determine an epilepsy diagnosis by analyzing the different types of spikes in the so-called rhythmic activity of the brain.
Existing, automatic methods for detecting epileptic events in EEG signals have performance that greatly exceed visual inspection. These methods focus mostly on interictal spikes [2,3], seizure onset detection [4], or waveform epileptic patterns [5,6]. There exists a wide variety of methods to accurately detect seizures and their patterns in EEG. Most of these methods are based on supervised Table 1. Some state-of-the-art methods for the spike-and-wave discharge (SWD) estimation in electroencephalography (EEG) signals in humans, compared in terms of classification techniques, features, and reported performance.Abbreviations are as follows: high Specificity, rule in (SpPIn), According to the frequency and magnitude weighted average (WA), According to an estimated threshold (AET).

Method
Features Classifier Accuracy in % Ref.
Generalized Gaussian distribution (GGD) GGD  This paper presents a new SWD patient-specific detection method based on the statistical modeling of the continuous Morlet wavelet coefficients. Precisely, we fit the generalized Gaussian distribution to these coefficients and estimate the corresponding parameters. These parameters are used as features in a 10-NN classifier. Training and testing of the learning model use different EEG datasets.
The remainder of the paper is structured as follows. Section 2 presents the EEG database, and the proposed methodology, where we explain the continuous Morlet wavelet transformation, the generalized Gaussian distribution (GGD) statistical model, and the k-NN classifier. Experimental results using the scale parameter from the GGD and the variance and median from the continuous wavelet coefficients are reported in Section 3, flowed by discussion in Section 4. Conclusions, remarks, and perspectives are presented in Section 5.

Materials and Methods
This section presents our statistical model-based method to detect spike-and-wave discharges (SWD) in EEG signals. It is computationally very efficient, suitable for real-time implementation, allowing on-line spike-and-wave detection. First, we describe the dataset used for experimentation, then we present the different processing steps.
All EEG signals were labeled by a neurologist from FLENI to indicate the onset and duration of the epilepticform. Based on these annotations, we extracted 212 short epochs (1-min average duration) focusing on the spike-and-wave waveform. As such, a database with the 212 monopolar signal epochs was created, with 106 SWD signals and 106 non-SWD signals. Each SWD signal has been restricted to a narrow frequency band between 1-3 Hz. Figure 1 shows an example of a typical SWD signal. Visually, one can observe that SWD patterns exhibit characteristic morphology; whereas non-SWD signals have normal waveform.
Six patients with twenty (20) signals each were used for training our predictive model (Section 2.2). This multiplicity of signals from the same patient has been decided to enforce learning-patient specific patterns. A set of 96 signals from the other six patients were used for testing.

Methodology
The proposed method is composed of four stages. The first stage divides every EEG signal epoch, X ∈ R N×M with M channels and N time instants, into two-second segments per channel, with one-second overlap across a rectangular sliding window. Please note that M is fixed to 22, whereas N varies for different epochs (with an average duration of 60 s giving about N = 256 * 60 = 15, 360 samples). This will give [N/60] segments per channel. The second stage consists of applying Morlet decomposition to create separate time-frequency representations for each segment X t ∈ R N×1 . The purpose of this decomposition is to evaluate the energy distribution throughout the SWD frequency band (1-3 Hz). In the third stage, the statistical distribution of the wavelet coefficients from each segment is represented using a zero-mean generalized Gaussian distribution (GGD). A maximum likelihood method is used to estimate the GGD parameters, scale (ς) and shape (τ) [4,[42][43][44]. This statistical modeling stage gives M × [N/60] pairs of scale (ς) and shape (τ) parameters, achieving a very strong dimension reduction. As we demonstrate it in the experimentation section, the scale parameter ς was found statistically characteristic of the SWD waveform, and it is proposed as a feature to detect such patterns [4,44]. Using a single parameter to classify patterns is too strict and misses the natural variability in the data. For this reason, we compute two other statistical parameters, namely the variance (σ 2 ) and the median ( x) from the wavelet coefficients of each segment. The data from one EEG epoch reduces therefore to three parameters, {ς, σ 2 , x}, giving a high dimension reduction while offering a flexible representation space to discriminate patterns while accounting for natural variability. Please note, in total, we will have M × [N/60] * 3 parameters for any EEG signal epoch. Finally, a classification model has been trained to detect SWD patterns, using the three features parameters. The proposed methodology is summarized in Figure 2. The following sections describe each stage.
We now introduce the Morlet wavelet, the Generalized Gaussian distribution, and the k-nearest neighbor classifier used in this paper.

Sliding window
Feature vector

Morlet Wavelet
The continuous wavelet transform is given by where a is the scaling parameter and b the shift parameter. Equation (2) is the mother wavelet function, where ( * ) denotes the complex conjugate operation. Equation (3) is the analytic expression of the Morlet wavelet [45]. In order to use the Morlet wavelet with frequency f c , we use the relationship where α is the scale, ∆ is the sampling period, f c is the center frequency of Morlet wavelet (in Hz) and f a is the pseudo-frequency corresponding to the scale a (in Hz). As such, the wavelet dominant frequency can be characterized using the center frequency, as it detects the main wavelet oscillations (the reader is referred to [46] for details). Note that the wavelet scale is estimated according to the narrow frequency 1-3 Hz from (Section 2.1). In our case, ∆ = 256, and 1 < f requency < 3, we have a scale α ∈ f c * ∆, f c * ∆/3. Figure 3 show an example of time-scale variation for a value of f c = 1.2308 for SWD and f c = 0.8125 for non-SWD. These data are from signal 1 from patient 1.

Generalized Gaussian Distribution
The generalized Gaussian distribution (GGD) is a flexible statistical model often used in science and engineering to represent data. We propose to represent the distribution of the Morlet wavelet coefficients (C t ) using the GGD [47]. The probability density function (PDF) of the GGD is given by the expression where ς ∈ R + is a scale parameter, τ ∈ R + is a shape parameter, and Γ (·) is the Gamma function.
Fitting equation (5) to the data C t is performed using the maximum likelihood estimators Θ 1 C t : For more details about the estimation of the GGD parameters, we refer the reader to our previous work [4,[42][43][44]48].

Feature Parameters
Through the previously-described stages, data from every signal epoch, M × N samples, is reduced to the matrix of parameters (or features) Θ C t , composed of M × [N/60] rows, with three columns consisting of our data, reduced to the matrix of parameters (or features) Θ C t , composed of the scale parameter from the GGD, the variance and the median of the Morlet wavelet coefficients.
Using this representation, the next stage consists of training a k-nearest neighbors classifier to detect SWD patterns. Please, recall that we have 212 signal epochs in our database. All went through the preceding dimension reduction process. A set of 120 of those (from six patients) served for training and the remaining 96 (from the six other patients) were used for testing. 20

k-Nearest Neighbors Classification
Using the feature vector Θ C t , consider a classification into two possible classes c = 0 (non-SWD) and c = 1 (SWD). The probability of classifying a sample in one of the two classes is given by where D is the dimension of the sammple Θ C t , N 0 and N 1 are the numbers of training samples from class 0 and class 1, respectively; and σ 2 is the variance. Using the Bayes rule to classify a new observation Θ * C t , we obtain the following equation The maximum likelihood gives ρ(c = 0) = N 0 /(N 0 + N 1 ), and ρ(c = 1) = N 1 /(N 0 + N 1 ). Substituting in Equation (10), we obtain the probability ρ c = 0|Θ * C t . The expression for ρ c = 1|Θ * C t can be derived in a similar manner. To determine which class is most likely, the ratio between the two expressions is evaluated If the ratio is greater than one, Θ * C t is classified as c = 0, otherwise it is classified as c = 1. It is important to note that in the case where σ 2 is very small in (11), then both the numerator and denominator will be dominated by the term for which the sample Θ n 0 C t in class-0 or Θ n 1 C t in class-1 are closest to the point Θ * C t , such that On the limit σ 2 → 0, Θ * C t is classified as class 0 if Θ * C t has a point in the class 0 data which is closer than the closest point in the class 1 data. The nearest neighbor method is therefore recovered as the limiting case of a probabilistic generative model. The parameter k is chosen based on √ N, where N is the number of samples in the training dataset. We refer the reader to [49,50] for a comprehensive treatment of the mathematical properties of k-nearest neighbors classifier.

Results
The annotated database introduced in Section 2.1 was used to compute the feature vector [ς t , σ 2 t , x t ] ∈ R 3 , based on the statistical model of the coefficients of the continuous Morlet wavelet. The resulting features were used for off-line training the k-nearest neighbor classifier. With the 212 samples, k was set to 10 giving a 10−nearest neighbor. Tables 2-4 show the statistical mean, standard deviation, variance and bounds values from the feature vector. One can note that, sigma (ς t ), variance (σ 2 t ), and median ( x t ) are larger for SWD that for non-SWD. Therefore, despite the overlapping statistical bounds, a threshold can be determined to detect SWD patterns. Table 2. Mean, standard deviation, variance and bound values for sigma parameter (ς) for class 0 (non-spike-and-wave) and class 1 (spike-and-wave).  Table 3. Mean, standard deviation, variance and bound values for the variance σ 2 t for class 0 (non-spike-and-wave) and class 1 (spike-and-wave).

Mean std Variance Bounds
Class 0 Table 4. Mean, standard deviation, variance and bound values for median ( x t ) for class 0 (non-spike-and-wave) and class 1 (spike-and-wave).

Mean std Variance Bounds
Class 0  To illustrate the above point, Figure 4 shows a 3D scatter plot of the feature vector for spike-and-wave events (SWD, class 1, red dots) and non-spike-and-wave events (non-SWD, class 0, blue dots). One observes that the SWD events tend to be more dispersed compared to non-SWD events. This is corroborated by Figure 5 that shows the parameters in pairs, with the following combinations 1. Scale parameter (ς t ) vs. variance σ 2 t : for class 1 (SWD), one observes a direct relationship between the variance and sigma, where both parameters grow proportionally. For class 0 (non-SWD), both sigma and variance remain in a limited range of values. 2. Scale parameter (ς t ) vs. median ( x t ): as sigma grows, median increases then decreases for both SWD and non-SWD, but is larger for SWD. A cone-shaped pattern can be observed. 3. Variance σ 2 t vs. median ( x t ): as the variance grows, the median increases then decreases for SWD, while it remains in a small range (cluster) for non-SWD.
The performance of our 10-nearest neighbor classifier was evaluated using a dataset consisting of 96 samples, separate from the training set. These samples were extracted from six EEG signals from subjects different from those used for training. We assessed the total accuracy of the classification. The proposed method achieved a 95% sensitivity (true positive rate), 87% specificity (true negative rate), and 92% accuracy.  x t ] for class 0 (non-spike-and-waves events, blue dots), and class 1 (spike-and-waves events, red dots). The SWD events tend to be more dispersed than non-SWD events. (c) variance (σ 2 t ) vs median ( x t ) Figure 5. Scatter plots of the signals used for training, with ς t , σ 2 t , and x t parameters class 0 (non-spike-and-waves events, blue dots), and class 1 (spike-and-waves events, red dots), showing the data dispersion of the proposed approach. In (a) Scale parameter (ς t ) vs. variance (σ 2 t ). For class 1 (SWD), we can see the direct relationship between the variance and sigma, both grow proportionally, while for class 0 (non-SWD) both sigma and variance remain in a range of values. (b) Scale parameter (ς t ) vs. median ( x t ). As sigma grows, the median increases then decreases for both SWD and non-SWD, but is larger for SWD. (c) variance (σ 2 t ) vs. median ( x t ). As variance grows, the median increases then decreases for SWD, while it remains in a small range for non-SWD.

Discussion
The proposed model-based classification method to detect patient-specific spike-and-wave events in long-term EEG signals is based on three feature parameters (or predictors). These are the scale parameter from the generalized Gaussian distribution, see Equation (6), the variance and the median, all estimated from the continuous Morlet coefficients. These features are used with a 10-nearest neighbors classifier to discriminates spike-and-wave from non-spike-and-wave events. Experimental results with real data from a hospital achieved 9 % sensitivity (true positive rate), 87% specificity (true negative rate), and 92% accuracy. Based on our rule to choose k, the value was √ 212 ≈ 14, but we found a better performance by choosing empirically k = 10 . Techniques used in this study are widely known in the scientific community, but they have never before been put together to detect patient-specific epileptiform patterns in EEG. Our main contribution lies in the type of features proposed to detect spike-and-wave patterns and its application to human data. From a technical point of view, the GGD scale parameter depends on the shape parameter, see Equation (5) and Tables 2-4. They can therefore not be used together as features. Using only the scale parameter would restrict the representation space leading to pour representation of natural variability in the data. We, therefore, augmented the representation space by considering the variance and the median of wavelet coefficients. This choice has proven pertinent to discriminating SWD patterns from non-SWD.
The data collection protocol consisted of a neurologist selecting ten SWD patterns for each patient to be part of the training database. Our hypothesis was that using multiple signal patterns from individual patients improves the classification. This enhances learning patient-specific patterns, leading to precise detection of epileptiform patterns compared to previous work [39].
The collected dataset was previously used with other methods (see Table 5). We can see that the proposed method doesn't provide significantly more precise results. However, it has the advantage of analyzing the EEG signal in the time-frequency domain, where previous methods were based on temporal waveform characterization. On the other hand, the assumption that the data has a generalized Gaussian distribution allows a strong dimension reduction, leading to low computational solutions relying on rigorous statistical properties.

Conclusions
This paper presented a new model-based classification method to detect spike-and-wave events in long-term EEG signals in humans. The proposed method is based on the scale parameter of the generalized Gaussian distribution augmented with the variance and the median of the continuous Morlet wavelet coefficients from EEG data and a k-nearest neighbors classification technique.
The performance of the method was evaluated by training the model with an annotated real dataset containing 212 signal recordings consisting of spike-and-wave and non-spike-and-wave events. The classification performance was assessed by utilizing 96 segments and achieved 95% sensitivity (true positive rate), 87% specificity (true negative rate), and 92% accuracy. These results set the path to potentially new research to study the causes underlying the so-called absence epilepsy in long-term EEG recordings.
In addition to its performance, the proposed method can be implemented in online epilepsy care applications. However, due to the high dynamics of the EEG epileptic signals, some waveform might be incomplete (with part of the signal missing due to artifacts). Our method is not able to detect such situations, as confirmed by physicians using visual inspection [39]. Future work will focus on other epileptic waveform patterns as well as on the extensive evaluation of the proposed approach and its comparison with other methods from the literature both in humans and rodents. Other techniques, such as visual data analysis with t-distributed stochastic neighbor embedding [51] and deep learning variational autoencoders [52] will be considered. For future clinical research, an on-line user interface will be implemented with different functionalities such as automatic SWD detection and SWD pattern counts for each brain region.  Acknowledgments: Part of the work was conducted by AQR during his Ph.D. work at Buenos Aires Institute of Technology (ITBA). We would also like to thank the Bioengineering Ivana Zorgno for her assistance in the writing.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

EEG
Electroencephalography FLENI Fight against Pediatric Neurological Disease GGD Generalized Gaussian distribution k-NN k-nearest neighbors SWD Spike-and-wave discharge