Wavelet Power Spectral Domain Functional Principal Component Analysis for Feature Extraction of Epileptic EEGs

: Feature extraction plays an important role in machine learning for signal processing, particularly for low-dimensional data visualization and predictive analytics. Data from real-world complex systems are often high-dimensional, multi-scale, and non-stationary. Extracting key features of this type of data is challenging. This work proposes a novel approach to analyze Epileptic EEG signals using both wavelet power spectra and functional principal component analysis. We focus on how the feature extraction method can help improve the separation of signals in a low-dimensional feature subspace. By transforming EEG signals into wavelet power spectra, the functionality of signals is signiﬁcantly enhanced. Furthermore, the power spectra transformation makes functional principal component analysis suitable for extracting key signal features. Therefore, we refer to this approach as a double feature extraction method since both wavelet transform and functional PCA are feature extractors. To demonstrate the applicability of the proposed method, we have tested it using a set of publicly available epileptic EEGs and patient-speciﬁc, multi-channel EEG signals, for both ictal signals and pre-ictal signals. The obtained results demonstrate that combining wavelet power spectra and functional principal component analysis is promising for feature extraction of epileptic EEGs. Therefore, they can be useful in computer-based medical systems for epilepsy diagnosis and epileptic seizure detection problems.


Introduction
In data classification, including biomedical signals classification, feature extraction is used to reduce the dimension of input data so that a classification method's efficiency can be improved [1][2][3][4]. Many current studies of this type of research are focusing on time-frequency feature extraction. Fourier transform, wavelet transform, empirical mode decomposition and sparse representation of signals are often used to extract signal features, while principal component analysis (PCA), linear discriminant analysis (LDA), independent component analysis (ICA), as well as canonical correlation analysis (CCA), are popular techniques for dimension reduction. In [5], a method was developed to find the optimal discrete wavelet transform (DWT) settings so that classification accuracy is improved and the computational cost of seizure detection is then reduced. In [6], EEG signals are first decomposed using DWT, then extracted wavelet features were used as an input to a mixture expert network for classification. In [7], a robust feature extraction method based on PCA was designed to classify multi-class EEG signals. In [8], the DWT method was employed for pre-processing, and approximate entropy was used as features for classification using artificial neural networks. In [9], a novel patient-specific seizure detection approach using wavelet decomposition of multi-channel EEG recordings was proposed, and the features extracted from different frequency bands were used to classify the seizure and non-seizure signals. In [10], an empirical mode decomposition (EMD)-based dictionary approach was proposed for epilepsy seizure detection, and the high accuracy of the obtained results suggests that the proposed method may be promising for classifying long-term multi-channel EEG recordings. In [11], a hybrid dimension reduction model that uses ICA and PCA was developed as a feature extraction method for epileptic seizure detection. For the data they considered in [11], the results showed significant performance using the combination of ICA and PCA.
More recent research is focusing on the methods that combine both domain transformation and data dimension reduction. Wavelet PCA, multi-scale PCA, and wavelet ICA are some of the examples. In [12], a method that combines independent component analysis, and continuous wavelet transformation was used to remove EKG artifact from EEG data. The technique demonstrated that it outperformed other algorithms that used general statistical features for artifact rejection. In [13], PCA, ICA and LDA were used to reduce the dimension of input data after DWT, and the extracted features were used as the input of support vector machines. The performance of classification showed that the proposed classification process was promising. In [14], multi-scale PCA (MSPCA) was used as a de-noising method, and the study showed that MSPCA greatly improves the classification performance in epileptic seizures detection. In [15], the wavelet-based sparse functional linear model was developed to capture discriminative random components of EEG signals. The proposed method outperformed many other state-of-the-art techniques for the EEG data from the University of Bonn. EEG signals' spectral information has been shown to be useful in epilepsy classification [16][17][18]. In [17,18], Fourier domain functional PCA was used for extracting the feature of EEG signals. The studies in [17,18] illustrated the success of combining the Fourier domain transformation and functional PCA. The rationale behind using the domain transform and dimension reduction approach is that after the data transformation, the separability of the extracted feature is greatly improved. The role of domain transform is to make data behave more discriminable.
The successful classification of the extracted low-dimensional features makes the complicated, multiple domains-based feature extraction more practical and appealing in data visualization. In medical information systems, data visualization enables a better understanding of data patterns and makes the decision more interpretable. Because of this, much of recent research focuses on data visualization problems for information systems. For instance, in [19], a medical data visualization system was developed to allow physicians to see the development of the patients at one glance. The study was done with the aim of assessing the system's usability. To help with the processing of high-dimensional signals, enable signal visualization, and improve the performance of simple classification methods, we develop a new feature extraction approach by focusing on using wavelet transformation and low-dimensional principal component features. We propose a novel approach that extracts epileptic EEG signal features using functional principal component analysis of wavelet power spectra. The transformation of signals from the time domain to wavelet spectral-domain helps improve random signals' functionality. Furthermore, the functional principal component analysis is applied to reduce the signal dimension. The preliminary study of this work was presented in [20], where mainly the functional mean wavelet power spectra were studied. Within this new work, we focus on an in-depth discussion of the technical aspects of the proposed methodologies. We also aim to demonstrate the proposed method's applicability more intensively by considering additional, patient-specific, multichannel EEG signals. Within this study, epilepsy diagnosis and epileptic seizure detection problems using extracted features will be discussed.
The significance of this work is the novelty of the proposed feature extraction methodology and the achieved high separability for the data we consider, which are useful for both epilepsy diagnosis and epileptic seizure detection problems. The proposed feature extraction method makes the use of simple classification methods possible. Also, the proposed methods are applicable to other types of signals, such as financial time series and long-term observational economic data for classification or pattern recognition problems. To the best of our knowledge, functional PCA in the wavelet spectral-domain is used for the first time in biomedical signal processing. This paper is organized as follows. In Section 2, we discuss wavelet spectral analysis and functional principal component analysis. We provide a detailed mathematical description of how the proposed method is developed and how the features are extracted via functional principal component analysis. In Section 3, publicly available short-term EEG data are analyzed, and the results are analyzed, including the study of the effect of the number of components in approximating wavelet power spectra on the separability of extracted features. We also analyze a set of patient-specific multi-channel long-term epileptic EEG signals and compare the extracted features from both interictal and ictal signals. Finally, we conclude our findings and provide further remarks in Section 4.

Methods
In this section, we focus on the in-depth discussion of the technical aspects of our proposed method. We summarize the wavelet power spectra, including both the discrete and continuous wavelet power spectra, and principal component analysis, to ensure that the paper is self-contained. We also discuss how the functional PCA approach is combined with the wavelet power spectra to serve as a double feature extraction method. For a complete reading on the wavelet methods and functional PCA, we refer readers to [21,22].

Wavelet Power Spectra
Mathematically speaking, the power spectrum transforms a given signal from the time domain to the spectral domain. If the transformation is a wavelet function, then we have a wavelet power spectrum. There are two types of wavelet power spectrum: continuous wavelet power spectrum and discrete wavelet power spectrum. We first briefly discuss the power spectrum based on continuous wavelet transform (CWT). We then discuss the wavelet power spectrum based on the discrete wavelet transform (DWT) and explain its superiority compared to CWT.

Continuous Wavelet Transform
The continuous wavelet transform of a continuous random signal x(t) concerning a given wavelet function ψ(t) is defined as where the (*) indicate the complex conjugate and ψ( t−b a ) is the dilated and translated version of wavelet function ψ(t) with a being the dilation parameter and b being the translation parameter. In continuous wavelet transform, the most common choices of ψ(t) are Morlet, Mexican hat, or Paul wavelets, to name a few [22]. If the random signal is discrete and is denoted by x k , k = 1, 2, . . . , N, the CWT of x k becomes After the wavelet transforms, one can define the wavelet power spectrum as | W b (a) | 2 . Notice that | W b (a) | 2 is a two-dimensional quantity. This work further considers the global wavelet power spectrum, which is defined as the average power spectrum over all the local wavelet spectra in time. It is given as follows for a continuous random signal. If a random signal is discrete, then the global wavelet power spectrum can be estimated by This is referred to as the wavelet variance of a signal at the scale a.

Discrete Wavelet Transform
The discrete wavelet transform is done by taking discrete values of the dilation and translation factors a and b. A common approach is to take the logarithmic discretization of the dilation factor a and link it to b by making b be proportional to a. This implies that we have the following forms of discretization of the normalized wavelet function: which is the result by taking a = a m 0 , and b = nb 0 a m 0 , respectively, in Equation (2). Here m, n are all integers that control dilation and translation, respectively. a 0 is the fixed dilation step and is greater than one. b 0 is the location parameter that has to be greater than zero.
In practice, a common choice of a 0 and b 0 are 2 and 1, respectively. This leads to the so-called dyadic scaling of wavelet transform. The dyadic scaling wavelet function is then defined as Based on this wavelet function, the wavelet transform of a sequence of discrete values of signal x k becomes The global wavelet power spectrum is then defined as where N m is the total number of translation values associated with dilation parameter m.
Notice that the wavelet power spectrum is different from a scaleogram for wavelets, which can be used to estimate instantaneous frequency. Instead, the wavelet power spectrum can be interpreted as a density function of instantaneous frequency. Using the wavelet power spectrum, one reduces the dimension of wavelet transform and extract the functionality of frequency values so that the application of functional principal component analysis becomes sensible.

Principal Component Analysis
In multivariate statistics, principal component analysis (PCA) of a p-variate random vector X = (X 1 , X 2 , . . . , X p ) looks for a set of weight values, ξ j = (ξ 1j , ξ 1j , . . . , ξ pj ), so that at the jth step, the linear combinations of variable X (j) i , for j = 1,2,· · · , p, has the greatest variance [23], with X (1) i being the original signal X i . Mathematically, the solution of weight values can be found by solving for the following maximization problem After the first principal component ξ 1 is obtained, the maximization process in (9) is repeated for j = 2, . . . , p by replacing X j−1 i with the value that subtracts X j−1 i by the previous principal component, subject to the normalization and orthogonality constraints, That is to say in Equation (9), we have The iterative input of X (j) will lead to the solution of ξ j in the optimization problem in (9), for j = 1, 2, . . . , p.
Alternatively, due to the fact that the optimization problem in (9) is mathematically equivalent to the following non-linear programming problem Furthermore, for n samples of X (with zero mean for each variate), we have a n × p data matrix X that implies Cov(X, X) = 1 n−1 X X, and the optimization problem becomes which can be solved by finding the solution of the following eigen equation where V = 1 n−1 X X is sample variance-covariance matrix calculated based on the n samples of X, and λ is the eigenvalue associated with the eigenvector ξ. There are a sequence of pairs (λ j , ξ j ), satisfying the eigen equation. This eigen analysis leads to the solutions for both the eigenvalues and the eigenvectors. In PCA, we refer ξ j to the jth principal component. For feature extraction of data matrix X, we then compute Xξ j , for j = 1, 2, . . . , p, and they are referred to as principal component scores.

Functional Principal Component Analysis
Now let us consider functional principal component analysis [24,25], which will be applied to the wavelet power spectra. We first consider the case when CWT was applied to a signal x t , in which the wavelet power spectra W 2 (a) is obtained. For a set of n samples, we denote the wavelet power spectrum of the ith signal by I i (a), where I i (a) = W 2 i (a). Thus, the mean wavelet power spectrum of these n signals becomesĪ(a). The functional variance-covariance at wavelet scales a and a can be estimated by Suppose that the wavelet power spectrum of a given signal can be approximated using K basis functions, and it is given as follows: where µ 0 (a) is the functional mean of wavelet power spectra. Without the loss of generality, for the purpose of function approximation, we can take µ 0 (a) to be a constant value denoted as µ 0 . Since we consider an approximation of I i (a) rather than represent the function exactly using infinity number of basis functions, we may then express Equation (14) in a matrix form where µ = (µ 0 , µ 0 , . . . , µ 0 ) , φ = (φ 1 , φ 2 , . . . , φ K ) , and the coefficient C is n × K matrix given as follows: Now we consider how to obtain the functional principal components and their scores. Let us first denote the variance-covariance function by v(a, a ), which is defined in (13).
To find the principal component weight functions, we have to solve the following eigen equation for an appropriate eigenvalue λ v(a, a )ξ(a )da = λξ(a). (16) Suppose that eigen function ξ(a) in the left hand side of Equation (16) has an expansion Here, Φ = φφ da is a K × K matrix, and it has entries φ k 1 φ k 2 da , where k 1 = 1, 2, . . . , K and k 2 = 1, 2, . . . , K. The eigen equation in (16) becomes Since (17) must hold for all a, this implies that the following matrix equation must hold To obtain the required principal components, we define By solving the symmetric eigenvalue problem in (19) for u, and then compute b = Φ −1/2 u to obtain the eigen function ξ(a), we obtain If φ k (a) are orthonormal, we have Φ = I, which is an K × K identity matrix. The eigenanalysis of functional PCA problem in (18) reduces to This becomes a PCA problem where the variance-covariance matrix is replaced by the coefficient matrix C obtained from function approximation of wavelet power spectra. However, one should realize that when orthogonal basis functions are used in functional PCA, the problem is not a standard PCA. From the discussion above, PCA conducts eigenanalysis for a p × p covariance matrix. If we apply PCA to the signal matrix, then this p becomes the length of signal n. With function approximation using K basis function, the eigenanalysis of functional PCA is applied to a K × K coefficient matrix, which depends on the value of K and often K p. From the computational complexity perspective, when using a sparse approximation, the functional PCA is more efficient.
If wavelet power spectra are obtained using DWT, then the left hand side of the equation in (14) is replaced by I i (m), which is defined as I i (m) = W 2 (m). Let us consider the discrete wavelet transform using the dyadic scale, with m taking values from 1 to M, where M is determined by 2 M = n. Because of this, the value of K cannot be larger than M. The eigen equation in (16) becomes By subsitituting the new form of eigen function ξ(m), it gives the following new eigen euqation This matrix has entries ∑ M m=1 φ k 1 (m)φ k 2 (m) for k 1 = 1, 2, . . . , K and k 2 = 1, 2, . . . , K. The equations of (18) and (19)

Feature Extraction by Functional Principal Component Analysis
After the eigen function ξ(a) is obtained, we can extract the jth principal component scores, denoted by P j , for the given power spectra I(a) by the following P j = I(a)ξ j (a)da, j = 1, . . . , K.
Substituting (15) and (20) to the equation above, we obtain Here,μ = µφ (a)da is a K × K matrix, where µ = (µ 0 , µ 0 , . . . , µ 0 ) . We refer P 1 to the first principal component score vector of the n power spectra of signals, and P 2 is the second principal component score vector, and so on. When DWT is applied, we can simply replace I(a) and ξ j (a) by I(m) and ξ j (m), respectively.

Discussion
There are other possible choices for transforming high-dimensional signals, such as Fourier transform. Additionally, different PCA variants may be considered. By comparing with our previous work in [17,18], we found that discrete wavelet domain transformation outperforms the Fourier domain transform in terms of the sparsity of functional approximation of power spectra. The Fourier power spectra tend to be less smooth, while discrete wavelet power spectra are smoother. Because of this, the separability of extracted features from power spectra is less affected by the non-smoothness for the discrete wavelet power spectra case. When CWT is used to obtain power spectra, this advantage disappears. This may imply that the sparsity on function approximation is critical for the success of feature extraction. On the other hand, to approximate the power spectra and extract the signal features, kernel PCA may be considered due to the nature of kernel methods. However, the kernel PCA is more suitable when the extension of dimension is needed rather than when dimension reduction is concerned. Functional PCA aims at approximating the function via a set of basis functions, and the sparsity of the approximation can be controlled by selecting the number of basis functions. This makes functional PCA a unique choice for feature extraction of functional data.

Feature Extraction of Short Epileptic EEGs
To illustrate the application of the proposed method, we first use a set of epileptic EEG signals that is from the University of Bonn, Germany. (http://epileptologie-bonn. de/cms/front_content.php?idcat=193, accessed on 1 January 2018). This data set has been widely used to test machine learning algorithms for epilepsy diagnosis and epileptic seizure detection. There are five different data sets, denoted as A, B, C, D and E. Signals in sets A and B are from healthy people with eyes closed and open. Signals in sets C, D and E are epileptic signals from patients who have epilepsy. Signals in set C were collected from patients' non-epileptogenic zone, and signals in set D were from patients' epileptogenic zone. Signals in set E were obtained when epilepsy seizure was onset. Each type of signal (i.e., A, B, C, D, E) contains 100 single-channel scalp EEG segments of 23.6 seconds, and they were sampled at 173.61 Hz (i.e., N = 4096).
In Figures 1a and 2a, the extracted two-dimensional features of wavelet power spectra using CWT for signals in sets A, B, C, D, and E are presented. The maximum number of basis functions (i.e., K = 114) is used to estimate the wavelet power spectra. The results show that the extracted features form clusters according to their signal types and are well separated by different types of signals. Even though both sets A and B are from healthy people, we do not see the same wavelet power spectra patterns. This may suggest that the brain activities (i.e., Eyes Open and Eyes Closed) can lead to different power spectra. Additionally, for epileptic signals, the wavelet power spectra depend on which zone of brains that EEG signals were collected. From Figure 2a, we observe that E set signals (i.e., seizure onset) are well separated from sets C and D (i.e., non seizure signals). This may imply that a simple classifier such as k-NN can successfully classify the signals based on the extracted two-dimensional signal features. In Figures 1b and 2b, the same type of results is reported, but they are obtained using DWT with maximum number of basis functions, i.e., K = 12. Notice that the wavelet power spectra under the DWT are obtained for a dyadic scale only. They can be interpreted as the wavelet variances at the dyadic scale by summarizing the variations of the neighbouring CWT coefficients at the corresponding dyadic scale. Because of this, the wavelet power spectra under DWT capture the overall smoother pattern so that the extracted features may be less noisy and more separable in a low-dimensional feature subspace. The results shown in Figures 1 and 2 suggest that feature extraction by functional PCA of wavelet power spectra leads to non-linear separable features that facilitate both epileptic seizure detection and epilepsy diagnosis problems. Another interesting question is how the number of basis functions K affects the feature separability. That is, how the sparsity of the number of basis functions affects the feature separability. The influence from the choice of the value of K is much more significant when CWT is used. From the results reported in Figure 3, one can see that principal component scores are overlapped when K is smaller than 20, but the results are improved with the increase of the K value. This implies that approximation using a small number of basis functions is not enough in capturing differences among signal groups for a two-dimensional feature vector. However, when K is larger than 30, they are clearly separable for all sets, i.e., A, B, C, and D. The results are almost the same for all choices of K if K > 30. By increasing the number of basis functions, it does not give extra benefit on improving the feature separability. We conclude that the sparsity of function approximation in functional PCA affects the feature extractions. For DWT, the sparsity of functional approximation corresponds to different wavelet levels, and the level is at the dyadic scale, so the maximum level allowed in function approximation is small. This has become superior when compared to the case using CWT, which requires a much larger number of basis functions for approximating the wavelet power spectra. Our study shows that DWT is a better choice for feature extraction using functional PCA. Using the wavelet power spectra based on DWT, feature separation level is much higher than the method using CWT, for the signals we consider. The results shown in Figures 3 and 4 may suggest a potential application of the proposed method to the epilepsy diagnosis problem, while the results presented in Figure 5 seem to be promising for epileptic seizure detection. Due to the significant difference in signal power spectra between seizure on-set signals (Set E) and the non-seizure signals (i.e., Sets C and D), we do not observe a significant impact from the choice of the value of K on the feature separability. One can see that the pattern of the extracted features remain the same starting from K = 5. We also notice that when K = 4, the extracted features behave differently, but it remains linearly separable in the two-dimensional feature subspace.  Finally, the eigenvalues and eigenfunctions of the power spectra under both CWT and DWT are presented in Figure 6. The first three eigenfunctions share some commonalities in terms of their shapes of functions between CWT and DWT. For eigenvalues, the method using CWT has a higher number of significant eigenvalues than the one using DWT because CWT captures more details in the local fluctuation of the power spectra. As we have discussed, for features extracted using DWT, the signals can be separated according to their groups using only the first PC. The is reflected by the first eigenvalue shown in (c) of Figure 6.

Feature Extraction of Long Term Multi-Channel Patient Specific Epileptic EEGs
To further demonstrate the applicability of the proposed method in the epileptic seizure detection problem, we analyze EEG signals at interictal and ictal stages from four epileptic patients. These EEG signals were collected from six channels. Figure 7 displays the DWT power spectra of each channel's signals of interictal patients. We observe that all wavelet power spectra contain some commonalities: the overall decaying pattern is the same, but some power spectra of selected channels appear to have heavy tails. The feature extraction from these power spectra would then become separable due to the different tail patterns of wavelet power spectra. This can be seen from the results displayed in Channel 4 of patient 3, channels 5 and 6 of patient 4, and channel 2 of patient 6 lead to linearly separable feature patterns that may be associated with the epileptic zone. This indicates that signals from the epileptic zone have higher values of wavelet power for the higher wavelet decomposition level. To study the variability of the extracted features, which are the principal component scores, we report the 95% confidence intervals for both first and second PCs, by patients and by channels. The results are displayed in Table 1. The comparison by channels between different patients is shown in Figure 9. The obtained results demonstrated that extracted features behave completely differently for a different channel between patients. For example, using the signals from channel 2 or 3, the first principal component scores can be used to differentiate EEG signals. It seems that channel 6 shares more commonalities than other channels due to the similar extracted features. This may suggest that the analysis of interictal epileptic EEGs should be done on a patientspecific basis. If combining signals from different patients, then an effort to select a specific channel less affected by the patients may be needed.  The results obtained from the ictal signals are presented in Figure 10. We can observe that the extracted features are formed into clusters by different channels and by different patients. Channels 1 and 2 are significantly overlapped for patients 1, 3 and 4, while they are well separated for patient 6. channels 3 and 4 and channels 5 and 6 are also overlapped for patients 1, 3 and 4. Although for patient 6, the extracted features are well separated by different channels, we still can see that they make into three groups, similar to those for patients 1, 3 and 4. This may imply that there are commonalities of the extracted features for the ictal signals from epileptic patients, and they may be useful in computer-aided epileptic seizure detection. To further verify this, we compare the variability by constructing the 95% confidence intervals for the first two PC of extracted features for both interictal and ictal. The results are shown in Table 2. We observe that the 95% confidence intervals do not overlap for interictal signals and ictal signals. In Figure 11, we observe some big differences in the first PC of wavelet power spectra between interictal and ictal types for channels 1 and 2, channels 3 and 4, and channels 5 and 6, for different patients. The large values of the PC scores are due to the stage of seizure onset. Due to this clear separation of extracted features and their low variability, feature extraction of DWT power spectra via functional PCA becomes useful for classifying the interictal and ictal signals. Additionally, the high degree of separation of these two types of signals through the proposed feature extraction methods implies the usability of simple classification method methods such as the K-NN classifier. This makes computer-aided medical diagnosis easily developed due to the simplicity of the classification method.

Conclusions
In this work, we proposed an approach that first transforms signals to the wavelet power spectra. Then Functional PCA is used to extract features of wavelet power spectra to facilitate epilepsy diagnosis and epileptic seizure detection problems. Transforming EEG signals to wavelet power spectra enhances the functionality of input signals so that functional PCA becomes useful in feature extraction. We applied the proposed method to both short EEGs and long-term, multi-channel EEGs. Using the proposed method, different types of epileptic EEG signal becomes linear or non-linear separable in the lowdimensional feature subspace. The extracted features are formed into clusters by channels for multi-channel signals and behave differently and non-linear separable for different patients. The degree of feature separation depends on selecting the number of basis functions used to approximate the power spectra in functional PCA. These may indicate the natural complexity of epileptic signals, and the analysis needs to be done on patientspecific. Fortunately, we can produce linear or non-linear separable features to classify the interictal and ictal signals. The obtained results demonstrated that the proposed method is promising for epilepsy diagnosis and seizure detection problems. The proposed method can also be applied to other types of random signals, such as financial or economic time series, for similar purposes. Our future work will be investigating how different ways of achieving sparsity of functional approximation of power spectra affect the separability of extracted low-dimensional features.