Development of Automated Sleep Stage Classification System Using Multivariate Projection-Based Fixed Boundary Empirical Wavelet Transform and Entropy Features Extracted from Multichannel EEG Signals

The categorization of sleep stages helps to diagnose different sleep-related ailments. In this paper, an entropy-based information-theoretic approach is introduced for the automated categorization of sleep stages using multi-channel electroencephalogram (EEG) signals. This approach comprises three stages. First, the decomposition of multi-channel EEG signals into sub-band signals or modes is performed using a novel multivariate projection-based fixed boundary empirical wavelet transform (MPFBEWT) filter bank. Second, entropy features such as bubble and dispersion entropies are computed from the modes of multi-channel EEG signals. Third, a hybrid learning classifier based on class-specific residuals using sparse representation and distances from nearest neighbors is used to categorize sleep stages automatically using entropy-based features computed from MPFBEWT domain modes of multi-channel EEG signals. The proposed approach is evaluated using the multi-channel EEG signals obtained from the cyclic alternating pattern (CAP) sleep database. Our results reveal that the proposed sleep staging approach has obtained accuracies of 91.77%, 88.14%, 80.13%, and 73.88% for the automated categorization of wake vs. sleep, wake vs. rapid eye movement (REM) vs. Non-REM, wake vs. light sleep vs. deep sleep vs. REM sleep, and wake vs. S1-sleep vs. S2-sleep vs. S3-sleep vs. REM sleep schemes, respectively. The developed method has obtained the highest overall accuracy compared to the state-of-the-art approaches and is ready to be tested with more subjects before clinical application.


Introduction
Sleep is one of the important activities of human beings and plays an important role in maintaining both mental and physical health [1,2]. Sufficient good quality sleep enhances the learning ability and performance of a person. Inadequate or a lack of proper sleep increases the occurrence of various sleep-related pathologies such as insomnia and bruxism, and other complications such as neurological diseases, cardiac diseases, hypertension, and diabetes [3]. Typically, sleep is categorized into wake, rapid eye movement (REM), and non-REM (NREM) sleep classes [4]. The sleep sub-types such as S1-sleep, S2-sleep, S3-sleep, and S4-sleep fall under the class of NREM sleep. The S1-sleep and S2-sleep sub-types are termed as light sleep (LS), whereas S3-sleep and S4-sleep sub-types are considered as deep sleep (DS) [5]. The heart activity, respiratory activity, eye movement, and muscle activity are slow during S1-sleep [6]. In S2-sleep, the eye movement is stopped, and there is also a drop in the body temperature. In the DS stage, the δ-wave activity of the brain increases, and heart rate and respiratory rate are dropped to the lowest level [7]. Moreover, during REM sleep, there is an increase in the physiological parameters such as blood pressure, heart rate, respiratory activity, and body temperature [8]. The rapid eye movements during this sleep stage affect the brain activity and these changes are faithfully reflected in the electroencephalography (EEG) signals of selected channels [7,9]. The polysomnography (PSG) test is normally performed in the clinical study for the diagnosis of sleep-related pathologies [10,11]. In the PSG test, various physiological signals such as EEG, electrocardiogram (ECG), respiratory signal, electromyogram (EMG), and oxygen saturation (SPO 2 ) are recorded from the subjects [1,3]. 
Human experts or sleep technologists manually assign the sleep classes to the segments of the physiological signal using the Rechtschaffen and Kales (R&K) guidelines [12]. This process of sleep staging is cumbersome and, hence, automated approaches based on the analysis and classification of different physiological signals are needed. The discrimination of sleep stages from the physiological signal using one modality (e.g., EEG) can reduce the number of sensors used in the PSG test [13]. The multi-channel EEG signal has been used for the automated categorization of different sleep stages [9,14]. The development of a new approach for the automated categorization of various sleep stages using multi-channel EEG signals is an important research topic in neuroscience.
In the last two decades, different automated approaches have been employed for the automated categorization of sleep stages using single-channel and multi-channel EEG signals [3,15-17]. A detailed review of the existing approaches is given in [18,19]. Song et al. [20] have used wavelet domain fractal analysis of single-channel EEG signal and quadratic discriminant analysis for the automated categorization of sleep stages. They have reported accuracies of 63.6%, 61.8%, 85.6%, and 21.7% for the classification of S1-sleep, S2-sleep, DS, and REM sleep categories, respectively. Similarly, Fraiwan et al. [21] have extracted Renyi entropy features in the time-frequency domains of single-channel EEG and used a random forest classifier for the discrimination of different sleep stages. They have compared the performance of three time-frequency analysis methods, namely the Hilbert-Huang transform (HHT), Choi-Williams distribution (CWD), and discrete-time continuous wavelet transform (CWT), using EEG signals [21]. An overall accuracy score of 73.21% is reported for the categorization of S1-sleep, S2-sleep, S3-sleep, and REM sleep classes. Tsinalis et al. [22] have considered a convolutional neural network (CNN)-based deep learning approach for the automated categorization of sleep stages using single-channel EEG signals. They have reported an overall accuracy of 74% for the discrimination of S1-sleep, S2-sleep, S3-sleep, and REM sleep stage classes. Moreover, Huang et al. [14] have extracted spectral features from different bands of multi-channel EEG signals and used a multi-class support vector machine (MSVM) model to classify various sleep stages. Their method has reported an overall accuracy of 68.24%. Rodriguez-Sotelo et al. [9] have computed various non-linear features such as Shannon entropy, approximate entropy, sample entropy, detrended fluctuation analysis, multi-scale entropy, and fractal dimension features from two-channel EEG signals for the discrimination of sleep stages. They have used an unsupervised learning method, J-means clustering, for the categorization of wake, S1-sleep, S2-sleep, S3-sleep, and REM sleep stages and obtained an accuracy of 57.4%. Moreover, Lagnef et al. [23] have extracted both time domain and spectral features from the multi-channel EEG signals and used a dendrogram-based SVM (DSVM) model for the categorization of wake, S1-sleep, S2-sleep, S3-sleep, and REM sleep types. They have achieved an overall accuracy of 74.8% using the DSVM classifier. Andreotti et al. [24] have used a CNN for the automated categorization of sleep stages using EEG signals from different databases. They have obtained a kappa score of 0.58 for the five-class sleep stage classification scheme using the CNN. The CNN-based approach has demonstrated lower classification performance compared to feature-based techniques using multi-channel EEG signals.
The existing approaches have used various uni-variate signal processing techniques for the classification of sleep stage classes with EEG signals. In recent years, various multivariate signal decomposition-based methods have been used for the analysis of different multi-channel physiological signals [1,25,26]. These methods considered all channel information of the physiological signals simultaneously for the decomposition. The multivariate empirical wavelet transform (MEWT) has been used for the categorization of seizure and seizure-free classes using multi-channel EEG signals [27]. In MEWT, the discrete Fourier transform (DFT) of individual channel EEG signal is computed and the average of DFTs of all EEG signals is used to generate the composite Fourier spectrum. The empirical wavelet filters are designed using the segments of composite Fourier spectrum [28]. The modes are evaluated using the designed wavelet filters and DFT of each channel EEG signal. The projection-based approach has been used in multivariate EMD (MEMD) and multivariate Fast and adaptive EMD (MFAEMD) to evaluate the composite signal from the multi-channel signal [29,30]. The advantage of the projection-based approach is that all channels are used to generate the composite signal. The parameters evaluated from the composite signal are used to extract the modes of each channel signal. The entropy measures have been widely used to quantify the information from EEG signals for various applications such as seizure detection, emotion recognition, and sleep stage classification [7,31,32]. The bubble entropy (BE)-based measure has been proposed for the analysis of heart rate variability (HRV) signals [33]. This measure used only one parameter such as an embedded dimension to quantify the regularity and complexity of a time series. 
Similarly, the dispersion entropy (DE)-based information measure has also been employed for the categorization of sleep stage classes using single-channel EEG signals [3,34]. Both BE and DE can be used in the multi-scale domain of multi-channel EEG signals for the categorization of sleep stages. The hybrid learning-based classifier has been considered for various applications, namely the detection of heart pathology such as congestive heart failure using electrocardiogram (ECG) signal features [35], and heart valve pathology detection using phonocardiogram (PCG) signal features [36]. This classification approach is distance-based and it does not use any weight updating rule based on the gradient descent algorithm like neural networks or deep learning methods. The number of training parameters is smaller in hybrid learning compared to the deep learning-based classifiers [35]. The hybrid learning classifier can be used for the automated categorization of different sleep stages using multi-scale entropy features extracted from the multi-channel EEG signals. The novelty of this work is the development of a multivariate multi-scale information-theoretic approach for the categorization of sleep stages using multi-channel EEG signals. The contributions of this paper are as follows:
(I) A novel multivariate projection-based fixed boundary empirical wavelet transform (MPFBEWT) is introduced for the multi-scale decomposition of multi-channel EEG signals.
(II) Two novel entropies (BE and DE) are used to extract the features in the multivariate multi-scale domain of multi-channel EEG signals.
(III) A hybrid learning-based classifier is employed for the categorization of sleep stages.
The remaining sections of this manuscript are organized as follows. In Section 2, the multi-channel EEG signals collected from the public database for the proposed classification task are described. The proposed approach for the categorization of sleep stages is explained in Section 3. In Section 4, experimental results and their discussion are presented. The conclusion of the paper is highlighted in Section 5.

Method
The approach employed in this work for sleep stage classification is depicted in Figure 1. This approach consists of the evaluation of multi-channel EEG frames obtained from multi-channel EEG recordings using a segmentation technique. The multi-channel EEG frames are decomposed into various sub-band signals using the MPFBEWT method. The BE and DE entropies are extracted from these sub-bands and clinically significant features are classified using the hybrid learning classifier. Each stage involved in the flow chart is described in detail in the following sub-sections.

EEG Frame Evaluation
In this work, we have segmented each of the multi-channel EEG recordings into frames of 30 s duration. Before segmentation, the amplitude of each channel EEG signal is normalized by dividing it by the gain parameter of 32.76 [37]. The segmentation process is performed using a non-overlapping moving window of 30 s duration (15360 samples) [3]. In Table 1, we show the number of multi-channel EEG frames (or instances) used to evaluate the proposed approach for the automated discrimination of sleep stages.
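The frame evaluation step can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `segment_frames`, the list-of-lists data layout, and the choice to drop a trailing partial frame are assumptions, while the 512 Hz sampling frequency, the 30 s window, and the gain of 32.76 are the values quoted in this paper.

```python
def segment_frames(recording, fs=512, frame_seconds=30, gain=32.76):
    """Split a multi-channel recording into non-overlapping frames.

    recording: list of channels, each a list of samples.
    Returns a list of frames; each frame is a list of channels.
    """
    frame_len = fs * frame_seconds              # 512 * 30 = 15360 samples
    n_samples = min(len(ch) for ch in recording)
    n_frames = n_samples // frame_len           # assumption: partial last frame is dropped
    scaled = [[s / gain for s in ch] for ch in recording]  # amplitude normalization
    frames = []
    for k in range(n_frames):
        start = k * frame_len
        frames.append([ch[start:start + frame_len] for ch in scaled])
    return frames
```

Each returned frame keeps all channels aligned in time, so it can be passed directly to a multivariate decomposition.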

Multivariate Fixed Boundary-Based EWT Filter Bank
The extension of EWT for the analysis of multi-channel signals is termed the multivariate EWT [27]. The objective of EWT is the detection of boundary points in the Fourier spectrum of the analyzed signal [28]. Then, the contiguous segments extracted from the Fourier spectrum of the analyzed signal are used to design the empirical wavelet filter bank. In this work, we have proposed an MPFBEWT filter bank for the decomposition of multi-channel EEG signals. The sub-band signals of multi-channel EEG are evaluated in five steps. First, the multi-channel EEG signal $X \in \mathbb{R}^{N \times m}$ is projected onto a unit vector. The factor $m$ is the total number of channels, and the parameter $N$ corresponds to the number of samples present in each channel of the multi-channel EEG signal. In MFAEMD, the projection of a multi-channel signal $X$ is based on the weighted sum of all channel signals [30]. For taking the projection, a point set for sampling on the $(m-1)$-dimensional unit sphere is considered [29]. The direction vector computed by a point on the $(m-1)$-dimensional unit sphere has the length $m$. The $(m-1)$-dimensional unit sphere contains the set of points $(v_1, v_2, \ldots, v_m)$ which satisfy the condition $v_1^2 + v_2^2 + \cdots + v_m^2 = 1$ in the Euclidean space. The vector representation of this point on the $(m-1)$-dimensional unit sphere is given as $\mathbf{n} = v_1 \mathbf{n}_1 + v_2 \mathbf{n}_2 + \cdots + v_m \mathbf{n}_m$, where $\mathbf{n}_1, \mathbf{n}_2, \ldots, \mathbf{n}_m$ are the unit vectors of different channels [30]. In this study, we have considered the values of the points $(v_1, v_2, \ldots, v_m)$ as the direction cosines for all channels and they are given by $1/\sqrt{m}$. The unit vector used in this work is given as follows [29,30]:

$$\mathbf{n} = \frac{1}{\sqrt{m}} \left( \mathbf{n}_1 + \mathbf{n}_2 + \cdots + \mathbf{n}_m \right)$$

The projected EEG signal is computed as follows:

$$x_{\mathrm{proj}}(n) = \frac{1}{\sqrt{m}} \left[ x_{ch_1}(n) + x_{ch_2}(n) + \cdots + x_{ch_m}(n) \right]$$

where $x_{ch_1}(n), x_{ch_2}(n), \ldots, x_{ch_m}(n)$ are the EEG signals for different channels.
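As a small worked illustration of the projection step, the sketch below forms the weighted sum of all channels with equal direction cosines $1/\sqrt{m}$. The function name `project_channels` and the list-based frame layout are assumptions made for this example only.

```python
import math

def project_channels(frame):
    """Project an m-channel frame onto the unit vector whose
    direction cosines are all 1/sqrt(m)."""
    m = len(frame)                  # number of channels
    w = 1.0 / math.sqrt(m)          # common direction cosine
    n_samples = len(frame[0])
    return [w * sum(ch[i] for ch in frame) for i in range(n_samples)]
```

For a four-channel frame the weight is 0.5, so the projected sample is half of the sum of the four channel samples at that time index.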
In EWT, methods such as local maxima, scale-space, and order statistics filter (OSF) have been used for the detection of boundary points in the Fourier spectrum of the analysed signal [28,39]. For multivariate projection-based EWT, the filter bank can be designed based on the extraction of segments from the Fourier spectrum of the projected signal using any one of the boundary detection methods. However, in this study, fixed boundary points are considered to design the filter bank. Hence, in the second step, we have considered a frequency grid $[-F_s/2, F_s/2]$ instead of the DFT of the projected EEG signal for the creation of the filter bank [28]. Third, the fixed boundary points are evaluated to design the EWT filter bank. These boundary points are computed from the frequency points [40]. In this work, we have created an MPFBEWT filter bank using the frequency ranges of the bands or rhythms of EEG signals. The δ, θ, α, β, and γ rhythms have frequency ranges of 0-4 Hz, 4-8 Hz, 8-13 Hz, 13-30 Hz, and 30-75 Hz, respectively [41]. In this work, the frequency points F = [4 8 13 30 75] are used to design the empirical wavelet filter bank [41]. The tth boundary point $FB_t$ is obtained from the tth frequency point using the relation given in [40]. After obtaining the boundary points, the frequency grid $[-F_s/2, F_s/2]$ is segregated into segments for both positive and negative sides, and these segments are denoted as $[FB_{t-1}, FB_t]$ for $t = 1, 2, \ldots, N_s$, where $FB_0 = 0$ and $FB_{N_s} = F_s/2$ [28]. The concatenation of all segments should cover the entire frequency range $[0, F_s/2]$, and it is given by [28]

$$\bigcup_{t=1}^{N_s} [FB_{t-1}, FB_t] = \left[ 0, \frac{F_s}{2} \right]$$

where $N_s$ is the number of segments. In this work, a total of $N_s = 6$ segments are extracted from the frequency domain representation of the projected EEG signal. In the fourth step, the empirical scaling and wavelet functions are used to create filters using the segments computed from the Fourier domain of the projected EEG signal.
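The empirical scaling function (SF) is given as follows [28]:

$$SF_1(f) = \begin{cases} 1, & |f| \le FB_1 - \eta_1 \\ \cos\left[ \frac{\pi}{2}\, g\!\left( \frac{|f| - FB_1 + \eta_1}{2\eta_1} \right) \right], & FB_1 - \eta_1 \le |f| \le FB_1 + \eta_1 \\ 0, & \text{otherwise} \end{cases} \quad (7)$$

Similarly, the empirical wavelet function (WF) is written as follows [28]:

$$WF_t(f) = \begin{cases} 1, & FB_{t-1} + \eta_{t-1} \le |f| \le FB_t - \eta_t \\ \cos\left[ \frac{\pi}{2}\, g\!\left( \frac{|f| - FB_t + \eta_t}{2\eta_t} \right) \right], & FB_t - \eta_t \le |f| \le FB_t + \eta_t \\ \sin\left[ \frac{\pi}{2}\, g\!\left( \frac{|f| - FB_{t-1} + \eta_{t-1}}{2\eta_{t-1}} \right) \right], & FB_{t-1} - \eta_{t-1} \le |f| \le FB_{t-1} + \eta_{t-1} \\ 0, & \text{otherwise} \end{cases} \quad (8)$$

The factor $g(z)$ is given as $g(z) = 35z^4 - 84z^5 + 70z^6 - 20z^7$ [28]. The transition phase width at the tth boundary point is given as $2\eta_t$ [41]. The factor $\eta_t$ can be selected as $\eta_t = \alpha FB_t$, where $0 < \alpha < 1$ [28]. The value of $\alpha$ is selected as $\alpha < \min_t \left( \frac{FB_{t+1} - FB_t}{FB_{t+1} + FB_t} \right)$ in order to get the set $\{SF_1, \{WF_t\}_{t=2}^{N_s}\}$ as a tight frame in the Euclidean space [28]. In the fifth step, the sub-band signals of the mth channel EEG signal $x^m(n)$ are evaluated. The mth channel approximation sub-band signal is evaluated as follows:

$$x_1^m(n) = \mathcal{R}\left\{ \mathrm{IDFT}\left[ \hat{x}^m\, \overline{SF}_1 \right] \right\}$$

where $\hat{x}^m\, \overline{SF}_1$ is the frequency domain approximation sub-band signal, obtained by multiplying the spectrum $\hat{x}^m$ of the mth channel EEG signal with the complex conjugate of the empirical scaling function [40]. The parameter $\overline{SF}_1$ is the complex conjugate of the scaling function. Moreover, the tth detailed sub-band signal for the mth EEG channel is computed as follows [40]:

$$x_t^m(n) = \mathcal{R}\left\{ \mathrm{IDFT}\left[ WW_t^m \right] \right\}$$

where $WW_t^m = [WW_t^m(k)]_{k=0}^{N-1} = \hat{x}^m\, \overline{WF}_t$ is the frequency domain representation of the tth detailed sub-band signal, obtained by multiplying the spectrum of the mth channel EEG signal with the complex conjugate of the empirical wavelet function for the tth segment [28]. The factor $\overline{WF}_t$ is the complex conjugate of $WF_t$. Similarly, $\mathcal{R}(\cdot)$ denotes the real part of the signal [28]. The algorithm for the evaluation of the sub-band signals of the mth channel is summarized in Algorithm 1.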
Algorithm 1: Evaluation of modes obtained from multi-channel electroencephalogram (EEG) signal using multivariate projection-based fixed boundary empirical wavelet transform (MPFBEWT) filter bank.
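The filter design step (fourth step) can be illustrated with the following sketch, which samples a Meyer-type empirical scaling function and wavelet functions on a frequency grid for fixed boundaries. It is a hedged example under stated assumptions and not the authors' Algorithm 1: the function names `g` and `design_filter_bank`, the default choice of the transition factor as 90% of its admissible bound, and the treatment of the last band as flat up to $F_s/2$ are assumptions.

```python
import math

def g(z):
    """Transition polynomial g(z) = 35z^4 - 84z^5 + 70z^6 - 20z^7."""
    return z ** 4 * (35 - 84 * z + 70 * z ** 2 - 20 * z ** 3)

def design_filter_bank(freqs, boundaries, fs, alpha=None):
    """Sample SF_1 and WF_2..WF_Ns at the frequencies in `freqs` (Hz).

    boundaries: interior boundary points in Hz, e.g. [4, 8, 13, 30, 75].
    Returns Ns = len(boundaries) + 1 filters, each a list of gains.
    """
    edges = list(boundaries) + [fs / 2.0]
    if alpha is None:
        # Assumed choice: 90% of the largest admissible transition factor.
        alpha = 0.9 * min((hi - lo) / (hi + lo)
                          for lo, hi in zip(edges[:-1], edges[1:]))

    def phase(f, b):
        # Phase inside the transition band of width 2*alpha*b around boundary b.
        return 0.5 * math.pi * g((f - (1 - alpha) * b) / (2 * alpha * b))

    def gain(f, lo, hi):
        f = abs(f)
        if lo is not None:                  # rising edge at the lower boundary
            if f < (1 - alpha) * lo:
                return 0.0
            if f <= (1 + alpha) * lo:
                return math.sin(phase(f, lo))
        if hi is not None:                  # falling edge at the upper boundary
            if f > (1 + alpha) * hi:
                return 0.0
            if f >= (1 - alpha) * hi:
                return math.cos(phase(f, hi))
        return 1.0

    lows = [None] + list(boundaries)        # None marks the scaling function
    highs = list(boundaries) + [None]       # None: last band runs up to fs/2
    return [[gain(f, lo, hi) for f in freqs] for lo, hi in zip(lows, highs)]
```

With the boundaries [4, 8, 13, 30, 75] Hz and a sampling frequency of 512 Hz this yields six filters whose squared gains sum to one at every frequency, which is the tight-frame property. To obtain the sub-band signals, each gain list would be multiplied with the DFT of a channel evaluated on the same frequency grid (for instance the grid returned by `numpy.fft.fftfreq(N, d=1/fs)`), followed by an inverse DFT and taking the real part.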
The four-channel EEG signals (F4-C4 channel, C4-P4 channel, P4-O2 channel, and C4-A1 channel) are shown in Figure 2a-d. The projected EEG signal evaluated from the multi-channel EEG is depicted in Figure 2e. The detected frequency points for the design of the MPFBEWT filter bank in the spectrum of the projected EEG signal are shown in Figure 2e. The MPFBEWT filter bank was computed using empirical scaling and wavelet functions that are depicted in Figure 2f. The purpose of considering the spectrum of the projected EEG signal for deriving an empirical wavelet filter bank is given as follows.
In multi-channel signal decomposition approaches like MEMD and MFAEMD, the composite signal is evaluated at the initial step by considering the information of all channel signals [29,30]. The mean envelope is computed from the composite signal using maxima-minima detection and the evaluation of upper and lower envelopes [29]. The mean envelope is used to obtain the modes of each channel signal at each iteration until the stopping criterion is fulfilled. Motivated by these studies, we have considered the segments from the spectrum of the projected EEG signal for deriving the empirical wavelet filter bank. Furthermore, this filter bank is used for the evaluation of sub-band signals of each channel EEG signal.
The two-sided Fourier spectrum of the projected EEG signal is depicted in Figure 3a. As the sampling frequency of the EEG signal is 512 Hz, the spectral energy is distributed between 0 and 256 Hz on both sides of the Fourier spectrum. The frequency domain scaling function obtained using Equation (7) for segment 1 is shown in Figure 3b. It is observed that the scaling function is a low-pass filter with a cut-off frequency of 4 Hz. Similarly, the wavelet functions obtained using Equation (8) for segment 2, segment 3, segment 4, segment 5, and segment 6 are shown in Figure 3c and Figure 4a-d, respectively.
The F4-C4 channel EEG signals for the wake, S1-sleep, and S2-sleep classes are depicted in Figure 5a,g,m, respectively. Similarly, for the S3-sleep, S4-sleep, and REM sleep classes, the F4-C4 channel EEG signals are shown in Figure 6a,g,m, respectively. The sub-band signals for the wake, S1-sleep, and S2-sleep classes are shown in the remaining panels of Figure 5, and those for the S3-sleep, S4-sleep, and REM sleep stage classes are shown in the remaining panels of Figure 6. In the S1-sleep stage, the θ-wave activity increases in the EEG signal [42]. Similarly, in the early portion of the S1-sleep stage, the α-waves are seen in the EEG signal [43].
Moreover, in S2-sleep, sleep spindles and K-complexes are present in the EEG signal. In S3-sleep and S4-sleep stages, the δ-wave activity increases in the EEG signal, and it is difficult to awaken a person during these sleep stages [42]. Furthermore, the REM sleep stage EEG signal characteristics are very similar to those of the wake stage EEG signal; this is the dreaming stage [42]. The muscle activity and the eye movements increase the amplitude of the EEG signal during the REM sleep stage [44]. As seen from the plots in Figures 5 and 6, for different sleep stage classes, the characteristics of sub-band signals or rhythms of EEG are also different. These differences can be effectively captured by extracting the features from the sub-band signals. In this study, the BE and the DE measures are computed from each sub-band signal of multi-channel EEG.

Entropy Features Extraction
In this work, we have extended the theories of DE and BE for the analysis of multi-channel EEG signals in the multi-scale domain. The DE of the tth sub-band signal of the mth channel, $x_t^m(n)$, is evaluated in six steps. First, the sub-band signal $x_t^m(n)$ is mapped into a new signal, $y_t^m(n)$, using a normal cumulative distribution function (NCDF). The value of $y_t^m(n)$ varies between 0 and 1. Second, a linear function is used to assign a decimal value or level with the relation as follows [34]:

$$z_t^{m,a}(n) = \mathrm{round}\left( a \cdot y_t^m(n) + 0.5 \right)$$

where $z_t^{m,a}(n)$ represents the nth sample of the mapped signal. The factor $a$ stands for the ath level or decimal value. In DE, each sample of the mapped signal is assigned a decimal value. In the third step, the embedded vectors are extracted from the mapped signal $z_t^{m,a}(n)$ using the embedded dimension $L$. The embedded vector is evaluated as follows:

$$\mathbf{z}_{t,i}^{m,a,L} = \left\{ z_{t,i}^{m,a},\; z_{t,i+d}^{m,a},\; \ldots,\; z_{t,i+(L-1)d}^{m,a} \right\}$$

where $i$ represents the ith embedded vector and $i = 1, 2, \ldots, N - (L-1)d$. The parameter $d$ is the time delay. The fourth step is the assignment of the dispersive pattern (DP) for the ith embedded vector, and it can be written as $\pi_{r_0 r_1 \cdots r_{L-1}}$, where each element of the ith embedded vector is given by $z_{t,i}^{m,a} = r_0$, $z_{t,i+d}^{m,a} = r_1$, ..., $z_{t,i+(L-1)d}^{m,a} = r_{L-1}$ [34]. The number of possible DPs for the mapped signal $z_t^{m,a}(n)$ is given as $a^L$ [34]. In the fifth step, the relative frequency or probability of each DP for the tth sub-band signal of the mth channel is given by

$$P_t^m\left( \pi_{r_0 r_1 \cdots r_{L-1}} \right) = \frac{\#\left\{ i : \mathbf{z}_{t,i}^{m,a,L} \text{ has the DP } \pi_{r_0 r_1 \cdots r_{L-1}} \right\}}{N - (L-1)d} \quad (12)$$

where $i = 1, 2, \ldots, N - (L-1)d$. In the last step, the DE of the tth sub-band signal of the mth channel EEG is evaluated, and it is given as follows [34]:

$$DE_t^m = -\sum_{\pi=1}^{a^L} P_t^m\left( \pi_{r_0 r_1 \cdots r_{L-1}} \right) \ln P_t^m\left( \pi_{r_0 r_1 \cdots r_{L-1}} \right)$$

Parameters such as the embedded vector length ($L$), time delay ($d$), and level ($a$) are used to compute the DE of each sub-band signal of the mth channel. In this work, we have considered $L$ as 10, $d$ as 1, and $a$ as 2.
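The six DE steps can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `dispersion_entropy`, the use of the sample mean and standard deviation as the NCDF parameters, the clamping of levels to the range 1 to `a`, and the return value of zero for a constant signal are assumptions.

```python
import math
from collections import Counter

def dispersion_entropy(x, L=10, d=1, a=2):
    """Dispersion entropy of a 1-D signal x (list of floats)."""
    n = len(x)
    n_vec = n - (L - 1) * d                    # number of embedded vectors
    if n_vec <= 0:
        raise ValueError("signal too short for the chosen L and d")
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sigma == 0:
        return 0.0                             # assumption: constant signal has one pattern
    # Step 1: map to (0, 1) with the normal cumulative distribution function
    y = [0.5 * (1 + math.erf((v - mu) / (sigma * math.sqrt(2)))) for v in x]
    # Step 2: assign levels 1..a; floor(a*y) + 1 equals round(a*y + 0.5) with half-up rounding
    z = [min(a, max(1, int(math.floor(a * v)) + 1)) for v in y]
    # Steps 3-4: embedded vectors and their dispersive patterns
    patterns = Counter(tuple(z[i + j * d] for j in range(L)) for i in range(n_vec))
    # Steps 5-6: relative frequencies and Shannon entropy
    return -sum((c / n_vec) * math.log(c / n_vec) for c in patterns.values())
```

The result is bounded above by $L \ln a$, which is reached only when all $a^L$ patterns are equally likely.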
In this work, a small value for $L$ is selected in order to avoid under-sampling in the embedded vector. BE is a recently proposed information quantification measure, and it has the advantage that it requires only a few features from the time series [33]. The BE of the tth sub-band signal of the mth channel, $x_t^m(n)$, is evaluated in five steps. First, the embedding vectors from the tth sub-band signal of the mth channel are computed using Equation (12) [33]. Second, the $L$ elements in the ith embedding vector are sorted in ascending order and the number of swaps is counted. The number of swaps for the ith embedding vector is denoted as $ns_i$. Third, a histogram of the swap vector (a vector containing the swaps of all embedding vectors) is evaluated, and it is normalized to obtain the probability. The probability for the tth sub-band signal of the mth channel is given as follows:

$$P_t^m(b) = \frac{h_b}{\sum_{b'=1}^{B} h_{b'}}$$

where $h_b$ is the number of embedding vectors whose swap count falls in the bth bin. Fourth, the Renyi entropy for the tth sub-band signal of the mth channel is evaluated as follows [33]:

$$E_t^{m,L} = -\log \sum_{b=1}^{B} \left[ P_t^m(b) \right]^2$$

where $B$ is the total number of bins. Similarly, the Renyi entropy is also calculated by considering the embedding dimension as $L + 1$, and it is denoted as $E_t^{m,L+1}$. In the fifth step, the BE for the tth sub-band signal of the mth channel is evaluated as follows:

$$BE_t^m = \frac{E_t^{m,L+1} - E_t^{m,L}}{\log\left( \frac{L+1}{L-1} \right)}$$

In this study, the DE and BE features are computed for each sub-band of all four channels of EEG signals. Thus, a 20-dimensional BE feature vector and a 20-dimensional DE feature vector are created. Hence, the entropy feature vector, which consists of 40 features of multi-channel EEG signals, is formulated and used as an input to the hybrid learning classifier for the automated categorization of sleep stages. The following sub-section describes the working of the hybrid learning classifier.
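The BE steps can be sketched in the same style. The example is hedged: the names `count_swaps`, `renyi2_of_swaps`, and `bubble_entropy` are hypothetical, and the unit time delay, the order-2 Renyi entropy, and the normalization by $\log((L+1)/(L-1))$ follow the usual definition of bubble entropy in [33] and are taken here as assumptions.

```python
import math
from collections import Counter

def count_swaps(vec):
    """Number of swaps bubble sort needs to sort vec in ascending order."""
    v = list(vec)
    swaps = 0
    for end in range(len(v) - 1, 0, -1):
        for j in range(end):
            if v[j] > v[j + 1]:
                v[j], v[j + 1] = v[j + 1], v[j]
                swaps += 1
    return swaps

def renyi2_of_swaps(x, L):
    """Order-2 Renyi entropy of the swap-count histogram for dimension L."""
    n_vec = len(x) - L + 1                     # unit time delay assumed
    hist = Counter(count_swaps(x[i:i + L]) for i in range(n_vec))
    return -math.log(sum((c / n_vec) ** 2 for c in hist.values()))

def bubble_entropy(x, L=10):
    """Normalized difference of Renyi entropies for dimensions L + 1 and L."""
    num = renyi2_of_swaps(x, L + 1) - renyi2_of_swaps(x, L)
    return num / math.log((L + 1) / (L - 1))
```

A monotonically increasing signal needs no swaps in any embedding vector, so both Renyi entropies vanish and the bubble entropy is zero.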

Hybrid Learning based Classifier
In this work, the hybrid learning classifier is used to discriminate various sleep stages using entropy features obtained from the multi-channel EEG signal in a multi-scale domain. This classifier is designed based on the residual of the class-specific sparse representation method and nearest neighbor distances [35]. The description of hybrid learning for sleep stage classification is shown in Algorithm 2.

Algorithm 2: Hybrid learning classifier algorithm for classification of sleep stages.
Inputs: Training feature matrix ($F_{tr} \in \mathbb{R}^{I_{tr} \times q}$), training class label ($L_{tr} \in \mathbb{R}^{I_{tr}}$), test feature matrix ($F_{te} \in \mathbb{R}^{I_{te} \times q}$), number of nearest neighbors ($nn$), desired sparsity level ($\rho$).
Output: Predicted class label, $L_P \in \mathbb{R}^{I_{te}}$.
Step 1: The training feature matrix $F_{tr}$ is taken as a dictionary for the sparse representation of the test feature vector. The rth test instance $f_r$ can be written as $f_r = \alpha_1 F_{tr}^1 + \alpha_2 F_{tr}^2 + \cdots + \alpha_e F_{tr}^e$ [35], where $F_{tr}^e$ is the feature matrix for the eth class and $\alpha_1, \alpha_2, \ldots, \alpha_e$ are the class-specific sparse representation vectors.
Step 2: The combined sparse representation vector $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_e]$ is evaluated using the orthogonal matching pursuit (OMP) method, because the minimization of the $L_0$-norm, $\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_0$ subject to $f_r = \alpha F_{tr}$, is NP-hard [45].
Step 3: The residual for the eth class is computed as $Res_e = \| f_r - \alpha_e F_{tr}^e \|_2$ [35].
Step 4: The distances between $f_r$ and all training instances of the eth class are computed, and these distances are given as $dis_e(j) = \| f_r - f_{tr_j}^e \|_2$. Then, the nearest distances for each class are selected. The median value of these distances for each class is evaluated, and it is given by $D_e = \mathrm{median}(dis_e(1:nn))$, where $nn$ is the number of nearest neighbors for each class.
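Steps 1 to 4 can be sketched as below for a single test instance. This is a hedged illustration, not the authors' implementation. The helper names (`dot`, `norm`, `solve`, `omp`, `hybrid_predict`) are hypothetical, the OMP variant is a plain greedy version with a least-squares refit, and, since the final decision rule is not stated in the steps above, the combination of $Res_e$ and $D_e$ by their product is purely an assumption made so that the example returns a label.

```python
import math
import statistics

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def solve(G, b):
    """Solve G x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(G[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def omp(atoms, f, sparsity):
    """Greedy OMP: {atom index: coefficient} with at most `sparsity` entries."""
    residual, support, coef = list(f), [], []
    for _ in range(min(sparsity, len(atoms))):
        corr = {j: abs(dot(a, residual)) / (norm(a) or 1.0)
                for j, a in enumerate(atoms) if j not in support}
        best = max(corr, key=corr.get)
        if corr[best] < 1e-12:                 # nothing left to explain
            break
        support.append(best)
        chosen = [atoms[j] for j in support]
        gram = [[dot(a, b) for b in chosen] for a in chosen]
        coef = solve(gram, [dot(a, f) for a in chosen])   # least-squares refit
        approx = [sum(c * a[k] for c, a in zip(coef, chosen)) for k in range(len(f))]
        residual = [x - y for x, y in zip(f, approx)]
    return dict(zip(support, coef))

def hybrid_predict(train_X, train_y, f, nn=10, sparsity=20):
    """Return (predicted label, per-class scores) for one test vector f."""
    coef = omp(train_X, f, sparsity)
    scores = {}
    for label in sorted(set(train_y)):
        idx = [j for j, y in enumerate(train_y) if y == label]
        recon = [sum(coef.get(j, 0.0) * train_X[j][k] for j in idx)
                 for k in range(len(f))]
        res = norm([a - b for a, b in zip(f, recon)])          # Res_e
        dists = sorted(norm([a - b for a, b in zip(f, train_X[j])]) for j in idx)
        d_med = statistics.median(dists[:nn])                  # D_e
        scores[label] = res * d_med    # ASSUMPTION: product rule, smaller is better
    return min(scores, key=scores.get), scores
```

The default values `nn=10` and `sparsity=20` mirror the hyper-parameters reported as optimal in this work; any other monotone combination of the two class-wise quantities could be substituted for the product.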
The matrix evaluated using the entropy features from the multi-channel EEG frames is written as $F \in \mathbb{R}^{I \times q}$, where $I$ denotes the total number of multi-channel EEG frames. Similarly, the factor $q$ is the number of entropy features. We have used hold-out and 10-fold cross-validation (CV) schemes to develop the hybrid learning classifier [3,40]. For hold-out CV, 60%, 10%, and 30% of instances are used for the training, validation, and testing of the hybrid learning classifier, respectively. Similarly, for 10-fold CV, 90% of instances from the feature matrix are used for training and the remaining 10% are used for the testing of the hybrid learning classifier in each fold [3,40]. The training and test feature matrices for the classification are given as $F_{tr}$ and $F_{te}$, respectively. Similarly, the class labels for training and testing the multi-channel EEG instances are given as $L_{tr}$ and $L_{te}$, respectively. In this work, five classification strategies are considered to evaluate the classification results using the hybrid learning classifier. These strategies are wake vs. sleep, wake vs. NREM vs. REM, wake vs. LS class vs. DS class vs. REM, wake vs. S1-sleep vs. S2-sleep vs. S3-sleep vs. REM, and wake vs. S1-sleep vs. S2-sleep vs. S3-sleep vs. S4-sleep vs. REM, respectively [3,17]. In order to evaluate the classification performance, the overall accuracy, the accuracy for the individual class, and the kappa score are used [36,46]. The Cohen kappa is evaluated using the following mathematical expression [47]:

$$\kappa = \frac{P_{op} - P_{tp}}{1 - P_{tp}}$$

where $P_{op}$ and $P_{tp}$ are the observed and total probability values, respectively. The observed and total probability values are evaluated from the confusion matrix [48]. The confusion matrix table for a four-class sleep stage categorization is shown in Table 2.
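The observed probability is evaluated as follows:

$$P_{op} = \frac{1}{T} \sum_{c} C_{c,c}$$

where $C_{i,j}$ is the entry of the confusion matrix for true class $i$ and predicted class $j$, and $T$ is the total number of instances in the confusion matrix. Similarly, the total probability is computed using the individual probability values, and it is written as follows [47]:

$$P_{tp} = P_W + P_{LS} + P_{DS} + P_{REM}$$

where $P_W$, $P_{LS}$, $P_{DS}$, and $P_{REM}$ are the probabilities for the wake, LS, DS, and REM sleep classes. These probabilities are evaluated as follows:

$$P_c = \frac{\left( \sum_{j} C_{c,j} \right) \left( \sum_{i} C_{i,c} \right)}{T^2}, \quad c \in \{W, LS, DS, REM\}$$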
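The kappa computation can be checked numerically with a short sketch. The function name `cohen_kappa` and the nested-list layout of the confusion matrix (rows as true classes, columns as predicted classes) are assumptions for this example.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of instances of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    n = len(confusion)
    # Observed probability: fraction of instances on the diagonal
    p_op = sum(confusion[i][i] for i in range(n)) / total
    # Total (chance) probability: sum over classes of row share times column share
    p_tp = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(n)
    )
    return (p_op - p_tp) / (1 - p_tp)
```

For the two-class matrix [[45, 5], [10, 40]], the observed probability is 0.85 and the total probability is 0.5, which gives a kappa of 0.7.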

Results and Discussion
This section shows the statistical analysis results of DE and BE features obtained from the sub-band signals of each EEG channel of wake, LS, DS, and REM sleep stages. The hybrid learning classifier results are shown for different classification schemes. A comparison with existing multi-channel based sleep stage classification approaches is also presented in this section. The box-plots of DE and BE features for different classes are shown in Figure 7. It can be observed from the statistical analysis results that eleven of the forty entropy features have shown higher mean values for the LS class. Similarly, five entropy features have obtained higher mean values for the DS class. For the REM sleep class, three entropy features have shown higher mean values. Moreover, twenty-one entropy features have demonstrated higher mean values for the wake class. The θ-waves present in the EEG signal during LS have shown higher amplitude values compared to the α-waves [7,9]. Similarly, during the wake class, the EEG signal is irregular, and the neural activities are not synchronous. Furthermore, during DS stages, δ-wave patterns appear in the EEG signal [3]. Moreover, during REM sleep, the EEG signal morphology is different from EEG signals for wake and NREM sleep stage classes [3]. Due to these physiological changes in the EEG signals for different sleep stages, BE and DE features extracted in the multivariate multi-scale domain of multi-channel EEG signals have different mean values. The analysis of variance (ANOVA) test employed in this study confirms the statistical significance of entropy features for the automated categorization of sleep stages [49]. It can be seen from the ANOVA results that all multi-scale DE and BE features have p < 0.001, and hence these entropy features are found to be statistically significant for the categorization of sleep stages using our proposed hybrid learning approach.
Table 3 shows the results obtained with the proposed multivariate multi-scale approach for the wake vs. sleep classification scheme with hold-out and 10-fold CV techniques.
Table 3. Performance of proposed method for the automated categorization of wake vs. sleep classification scheme.
It is evident that the hybrid learning classifier has obtained accuracy and kappa scores of more than 91% and 0.80, respectively, for the wake vs. sleep classification scheme (as shown in Table 3) using DE and BE features extracted from multi-channel EEG in the multi-scale domain. Similarly, the sensitivity and specificity values are also more than 90% for this classification scheme using the 10-fold CV strategy. For hold-out CV, the hybrid learning classifier has obtained sensitivity, specificity, and kappa score values of 86%, 91.16%, and 0.77, respectively. The classification results for the wake vs. NREM vs. REM classification scheme using hold-out and 10-fold CV methods are shown in Table 4. It is seen that the accuracy of the NREM class is higher than the accuracy of the REM and wake classes. An average kappa score of 0.74 is obtained using the hybrid learning classifier with the 10-fold CV method. Our proposed method has yielded a higher performance with 10-fold CV compared to hold-out CV. Moreover, for the wake vs. LS vs. DS vs. REM sleep classification scheme, the results obtained using our method are given in Table 5. It can be noted that the accuracy values of the wake and deep sleep classes are more than 85% with the 10-fold CV method. The kappa score and overall accuracy values are higher for the 10-fold CV scheme compared to hold-out CV. The results of the classification task for wake vs. S1-sleep vs. S2-sleep vs. S3-sleep vs. REM using our proposed method are shown in Table 6.
For the five-class scheme in Table 6, the accuracy values of the wake, S2-sleep, S3-sleep, and REM sleep stage classes are more than 72% with 10-fold CV. The S1 class has obtained the lowest accuracy using our proposed method. An average kappa score of 0.72 is obtained with 10-fold CV. Similarly, for the wake vs. S1-sleep vs. S2-sleep vs. S3-sleep vs. S4-sleep vs. REM classification scheme, the accuracy for each class, the kappa score, and the overall accuracy are shown in Table 7. The accuracy values of the proposed method are more than 80% for the wake and S4 classes using the 10-fold CV strategy, and more than 70% for the S2 and REM sleep classes. The average kappa score for the six-class sleep stage classification scheme using our method is 0.65. The confusion matrices obtained for the wake vs. LS vs. DS vs. REM, wake vs. S1-sleep vs. S2-sleep vs. S3-sleep vs. REM, and wake vs. S1-sleep vs. S2-sleep vs. S3-sleep vs. S4-sleep vs. REM sleep stage classification schemes are shown in Figure 8a-c. The true positive percentages obtained for the wake, LS, DS, and REM sleep classes are 84.92%, 77.82%, 87.88%, and 69.68%, respectively. For the wake, S1-sleep, S2-sleep, S3-sleep, and REM sleep stage classes, the true positive percentages are 89.01%, 46.85%, 74.42%, 76.72%, and 74.35%, respectively. For the wake, S1-sleep, S2-sleep, S3-sleep, S4-sleep, and REM classes, they are 87.5%, 48.37%, 71.82%, 55.41%, 84.23%, and 73.30%, respectively. These results indicate that the DE and BE features capture discriminative information from multi-channel EEG recordings for the automated categorization of the different sleep stage classes. The classification results are also evaluated by varying the DE and BE parameters, namely the embedding vector length (L), the time delay (d), and the level (a).
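To make the role of these parameters concrete, the sketch below follows the standard dispersion entropy definition (normal-CDF mapping, quantization into classes, pattern counting, Shannon entropy). It is not the authors' implementation. The argument names `m`, `c`, and `d` are this sketch's own; reading the paper's L as the embedding length `m`, a as the number of classes `c`, and d as the delay is an assumption. The use of the population standard deviation is likewise an assumption. Bubble entropy is not covered here.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def dispersion_entropy(x, m=2, c=3, d=1, normalize=True):
    """Dispersion entropy of a 1-D sequence x (illustrative sketch).

    m: embedding length, c: number of classes (>= 2), d: time delay.
    """
    if c < 2:
        raise ValueError("c must be at least 2")
    n_patterns = len(x) - (m - 1) * d
    if n_patterns < 1:
        raise ValueError("signal too short for the chosen m and d")

    mu, sigma = mean(x), pstdev(x)
    if sigma == 0:
        return 0.0  # constant signal: a single pattern, zero entropy

    # Step 1: map each sample into (0, 1) with the normal CDF
    y = [0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
         for v in x]
    # Step 2: quantize into integer classes 1..c
    z = [min(c, max(1, round(c * v + 0.5))) for v in y]
    # Step 3: count the delayed patterns of length m
    counts = Counter(tuple(z[i + k * d] for k in range(m))
                     for i in range(n_patterns))
    # Step 4: Shannon entropy of the pattern distribution
    h = -sum((n / n_patterns) * math.log(n / n_patterns)
             for n in counts.values())
    # Optional normalization by the maximum entropy ln(c^m)
    return h / math.log(c ** m) if normalize else h


# Synthetic alternating sequence (not EEG data)
toy_signal = [0, 1] * 5 + [0]
print(dispersion_entropy(toy_signal, m=2, c=2, d=1))
```

A regular signal uses only a few of the c^m possible patterns and therefore scores low, whereas an irregular signal spreads over many patterns and scores closer to 1 after normalization.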
The overall accuracy and kappa score values obtained using the hybrid learning classifier for the wake vs. S1-sleep vs. S2-sleep vs. S3-sleep vs. S4-sleep vs. REM sleep classification scheme by varying the DE and BE parameters are shown in Table 8. The results are shown for both the validation and test sets. The overall accuracy and kappa score values are 72.72% and 0.637, respectively, for L = 10, a = 2, and d = 1 using feature vectors obtained from the multi-channel EEG frames of the test set. Similarly, overall accuracy and kappa score values of 72.18% and 0.631 are obtained using the feature vectors of the validation set. For the other values of L, a, and d, the overall accuracy and kappa score are lower for both the test and validation sets. Hence, we have used L = 10, a = 2, and d = 1 to compute the DE and BE features from the sub-band signals of the multi-channel EEG signal. The hyper-parameters of the hybrid learning classifier, namely the desired sparsity level (ρ) and the number of nearest neighbors (nn), are selected using the accuracy on the validation set. The variations in the overall accuracy with the sparsity level and the number of nearest neighbors for the validation set and test set are shown in Table 9. The hybrid learning classifier has an overall accuracy of 37.96% for ρ = 2 and nn = 1. The overall accuracy increases as the sparsity level is increased from ρ = 2 to ρ = 20 and the number of nearest neighbors from nn = 1 to nn = 10. It then decreases as the sparsity level is increased from ρ = 20 to ρ = 22 and the number of nearest neighbors from nn = 10 to nn = 11. Hence, a sparsity level of ρ = 20 and nn = 10 nearest neighbors are taken as the optimal parameters of the hybrid learning classifier for the automated categorization of sleep stages.
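The selection of ρ and nn by validation accuracy can be sketched as a simple search over candidate values. Everything named here is hypothetical: `evaluate` stands in for training the hybrid classifier and scoring it on the validation set, and the toy scoring surface is synthetic, with its peak placed at the reported optimum purely for illustration. Whether the original sweep covered the full grid or only paired values is not stated, so the exhaustive grid is an assumption.

```python
from itertools import product


def select_hyperparameters(evaluate, sparsity_levels, neighbor_counts):
    """Return (best_accuracy, rho, nn) maximizing validation accuracy.

    evaluate(rho, nn) must return the validation accuracy for one setting.
    """
    best = None
    for rho, nn in product(sparsity_levels, neighbor_counts):
        acc = evaluate(rho, nn)
        # Keep the first setting that strictly improves on the best so far
        if best is None or acc > best[0]:
            best = (acc, rho, nn)
    return best


def toy_validation_accuracy(rho, nn):
    # Synthetic stand-in for "train, then score on the validation set";
    # NOT measured values from the paper.
    return 1.0 - 0.001 * (rho - 20) ** 2 - 0.002 * (nn - 10) ** 2


best_acc, best_rho, best_nn = select_hyperparameters(
    toy_validation_accuracy, range(2, 24, 2), range(1, 12))
print(best_rho, best_nn)
```

Selecting on the validation set and reporting on the test set, as described above, keeps the reported test accuracy from being biased by the tuning itself.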
The proposed information-theoretic approach is also compared with existing multi-channel EEG-based techniques for the categorization of wake vs. S1-sleep vs. S2-sleep vs. S3-sleep vs. REM sleep stages. Table 10 summarizes the comparison with the state-of-the-art techniques. The spectral features [14] and non-linear features [9], coupled with MSVM and J-means clustering techniques, respectively, have obtained lower overall accuracy values than the proposed MPFBEWT filter bank-based approach. The combination of time domain and spectral features with DSVM classifiers [23] led to a higher overall accuracy (74.80%) than our proposed information-theoretic approach (73.88%). The accuracy reported using the CNN-based transfer learning method is 67.70% [24], which is lower than that of our proposed method. The proposed multivariate multi-scale approach has also demonstrated a higher overall accuracy than the time-frequency domain Renyi entropy features combined with the random forest classifier [21]. The advantages of this study are summarized as follows:

(i) We obtained a higher classification performance than the methods based on spectral features and on time-frequency-based entropy features of EEG signals.
(ii) The extracted discriminative multi-scale BE and DE entropy features have yielded high classification accuracy.
(iii) The proposed information-theoretic approach is simple and computationally less intensive.
(iv) The developed hybrid learning model is evaluated for five types of sleep stage classification strategies.
(v) We achieved a robust model using 10-fold CV and hold-out strategies.
The limitation of this work is that we used multi-channel EEG recordings obtained from only 25 subjects. In the future, we intend to consider other entropy-based measures such as slope entropy [50], distribution entropy [51], state space domain correlation entropy [52,53], and other entropy measures [31] to improve the classification performance of sleep stages using more subjects.
Table 10. Comparison of our proposed method with existing techniques for the categorization of wake vs. S1-sleep vs. S2-sleep vs. S3-sleep vs. REM sleep stages using multi-channel and single-channel EEG signals.

Feature Extraction Methods | Classifier Used | Overall Accuracy (%)
Spectral features evaluated from different rhythms of multi-channel EEG signals [14] | MSVM | 68.24
Different non-linear features extracted from multi-channel EEG signals [9] | Unsupervised learning (J-means clustering) | 57.40
Time domain and spectral features extracted from multi-channel EEG [23] | DSVM | 74.80
Learnable features evaluated from multi-channel EEG signal in convolution layer stages [24] | Transfer learning using CNN | 67.70
Renyi entropy features computed from the time-frequency representation of single-channel EEG signals [21] | Random forest | 73.21
Multi-scale DE and BE features extracted from multi-channel EEG signal (proposed work) | Hybrid learning | 73.88

Conclusions
A novel information-theoretic approach is proposed for the automated categorization of different sleep stage classes using multi-channel EEG signals. The approach is based on the decomposition of the EEG signal of each channel into various sub-band signals using the MPFBEWT filter bank technique. The dispersion and bubble entropies are extracted from the sub-bands of the MPFBEWT filter bank. The classification of the various sleep stages is performed using a hybrid learning classifier with these entropy features. Our proposed approach has obtained classification accuracy values of 91.77% and 88.14% for the wake vs. sleep and wake vs. NREM vs. REM sleep categories, respectively. The classification results of the proposed approach can be further improved by using other entropy measures in the multi-scale domain of multi-channel EEG signals.