An Automatic Sleep Stage Classification Algorithm Using Improved Model Based Essence Features

The automatic sleep stage classification technique can facilitate the diagnosis of sleep disorders and release the medical expert from labor-consumption work. In this paper, novel improved model based essence features (IMBEFs) were proposed combining locality energy (LE) and dual state space models (DSSMs) for automatic sleep stage detection on single-channel electroencephalograph (EEG) signals. Firstly, each EEG epoch is decomposed into low-level sub-bands (LSBs) and high-level sub-bands (HSBs) by wavelet packet decomposition (WPD), separately. Then, the DSSMs are estimated by the LSBs and the LE calculation is carried out on HSBs. Thirdly, the IMBEFs extracted from the DSSM and LE are fed into the appropriate classifier for sleep stage classification. The performance of the proposed method was evaluated on three public sleep databases. The experimental results show that under the Rechtschaffen’s and Kale’s (R&K) standard, the sleep stage classification accuracies of six classes on the Sleep EDF database and the Dreams Subjects database are 92.04% and 78.92%, respectively. Under the American Academy of Sleep Medicine (AASM) standard, the classification accuracies of five classes in the Dreams Subjects database and the ISRUC database reached 79.90% and 81.65%. The proposed method can be used for reliable sleep stage classification with high accuracy compared with state-of-the-art methods.


Introduction
Automatic sleep stage classification is an important research focus due to its importance for the study of sleep related disorders. There are currently two classification criteria for sleep stages. According to Rechtschaffen's and Kale's (R&K) recommendations, sleep stages can be divided into six stages: The Awake stage (Awa), rapid Eye Movement stage (REM), Sleep stage 1 (S1), Sleep stage 2 (S2), Sleep stage 3 (S3), Sleep stage 4 (S4) [1]. Another sleep stage classification standard was provided by the AASM. In this standard, there are five sleep stages: Awa, N1 (S1), N2 (S2), N3 (the merging of stages S3 and S4) and REM [2]. Usually, the detection of each sleep stage requires manual marking by professionals, which requires a lot of work and may produce erroneous markings. Therefore, it is imperative to study the method for automatic sleep stage classification.
According to the characteristics of the adopted features, currently commonly used automatic detection methods can be divided into the following two categories. The first is the method based on statistical features (such as spectral energy) extracted from the one-dimensional EEG signal. The other is the implicit features, which can be obtained by training deep-learning based classifiers. Hassan et al. computed various spectral features by Tunable-Q factor wavelet transform (TQWT) on sleep-EEG signal segments [3]. With the random forest classifier, they achieved accuracies of 90.38%, 91.50%, 92.11%, 94.80%, 97.50% for 6-stage to 2-stage classification of sleep states on the Sleep-EDF database.
with various kinds of classifiers, the Bagged Trees was finally selected as the suitable classifier for this method to identify the sleep stages. In addition, experiments are conducted on three public sleep databases and the results are compared with state of the art published work in order to fully evaluate and validate the performance of the proposed method.
The paper is organized as follows: In Section 2, the experimental material and methodology of the proposed method are descripted in detail. Section 3 resents the experimental results. In Section 4, the results and findings of this paper are discussed. The conclusions of the paper are drawn in Section 5.

Sleep State Classes
According to the AASM and R&K standards, the classes of sleep stages can be divided into two to six classes. Moreover, under the AASM standard, it can be divided into two to five classes. The difference is that the N3 stage of AASM includes the S3 and S4 stages of the R&K standard. The detailed description of classes considered in this work are shown in Tables 1 and 2.

Sleep EDF (S-EDF) Database
The S-EDF database have 197 whole-night Polysomnography (PSG) sleep recordings, containing EEG, EOG, chin EMG and event markers [18,19]. All the Hypnograms (sleep patterns) were manually scored by well-trained technicians according to the R&K criteria. In this study, 34 EEG recordings from 26 subjects aged 25 to 96 years are randomly selected.

DREAMS Subjects (DRMS) Database
The DRMS Database consists of 20 whole-night PSG recordings coming from healthy subjects, annotated in sleep stages according to both the R&K criteria and the new standard of the AASM [20]. Data collected were acquired in a sleep laboratory of a Belgium hospital using a digital 32-channel polygraph (BrainnetTM System of MEDATEC, Brussels, Belgium). The sampling frequency was 200 Hz.

ISRUC(Subgroup 3, ISRUC3) Database
The ISRUC3 database is the third subgroup of ISRUC database [21]. The data were obtained from human adults, including healthy subjects, subjects with sleep disorders and subjects under the effect of sleep medication. Each recording was randomly selected between PSG recordings that were acquired by the Sleep Medicine Centre of the Hospital of Coimbra University (CHUC).
The S-EDF database was only labeled under the R&K criteria. Moreover, the ISRUC3 database was only labeled by the AASM criteria. The DRMS database was not only labeled by R&K criteria but also the AASM criteria. The annotations of S-EDF database and DRMS database were produced visually by a single expert. The ISRUC3 database was scored by two experts and the label made by the second expert was used in this paper. The Pz-Oz channel of the S-EDF database is used according to the recommendations of various studies [3][4][5][6][7]. At the same time, for the DRMS database, as the researches [9][10][11][12] recommended, the Cz-A1 channel was adopted in this work. Moreover, for the ISRUC database, the C3-A2 channel is the best choice [7]. Table 3 lists the detailed information of the above three databases.

EEG Data Preprocessing
Firstly, all the single-channel data will be extracted by the Matlab and EEGLAB [22] tools from the three database described previously. According to the prior work [5][6][7][8][9][10][11], the 0-35 Hz low pass filter can be used to eject the most of artifact. Once the dataset is filtered, it will be exported as one-dimensional vector without time information and saved as txt file which also can be denoted as the Formula (1).
where X is the vector containing the sampled EEG x k and where M is the length of vector. Furthermore, we use a window of length j to divide the full data X across time without overlap. That is X is converted into [X 1 , X 2 , . . . , X i , . . . , X L ] T which can be described as (2).
where j = T e × F s . The T e is the length of each epoch. Moreover, the F s is the sampling frequency of the database. For the S-EDF database, the T e = 30 and the F s = 100, so the j is 3000. Moreover, for the ISRUC3 database, the T e = 20 and the F s = 200, so the j is 4000.

Wavelet Package Decomposition
WPD is a powerful tool to analyze non-stationary EEG signals. In essence, WPD is a wavelet transform where the discrete-time signal is passed through more filters than the discrete wavelet transform, which can provide a multi-level time-frequency decomposition of signals [23]. Compared with discrete wavelet transform, WPD can provide more frequency resolutions. In the discrete wavelet transform, a signal is split into an approximation coefficient and a detail coefficient [24]. The approximation coefficient is then itself split into a second-level approximation coefficients and detail coefficients and the process is repeated. A wavelet packet function ω m l,d (q) is defined as (3): where l and d are the scaling (frequency localization) parameter and the translation (time localization) parameter, respectively; m = 0, 1, 2, . . . is the oscillation parameter. Wavelet packet (WP) coefficients of the EEG epoch X i are embedded in the inner product of the signal with every WP function, denoted by p i,m l (d), d = ..., −1, 0, 1, ... and given below: where p i,m l (d) denotes the m-th set of WPD coefficients at l-th scale parameter and d is the translation parameter. All frequency components and their occurring times are reflected in p i,m l (d) through change in m, l, d. Each coefficient p m l (d) measures a specific sub-band frequency content, controlled by scaling parameter l and oscillation parameter m. The essential function of WPD is the filtering operation through low-pass filter h(d) and high-pass filter g(d). For the l-th level of decomposition, there are 2 l sets of sub-band coefficients C i l,m , of length j/2 l . The wavelet packet coefficients of epoch X i are given as It can be seen from the (5) that each node of the WP tree is indexed with a pair of integers (l, m), where l is the corresponding level of decomposition and m is the order of the node position in the specific level. Here, the level l LE and wavelet basis ω LE of WPD on the epoch X i for LE calculation will be confirmed in the Section 3. Moreover, the wavelet basis ω DSSM for DSSM will be confirmed in the same section.

Locality Energy Calculation
The wavelet package energy E i l LE ,m at the m-th node on the level l LE of epoch X i can be defined as follows [25].

Dual State Space Models Estimation
As we have described before, after the wavelet packet decomposition, the low-level (the first level) coefficients will be used to estimate the dual state space models which can denoted by the difference Equation (7).
The y k ∈ C i 1,m is the coefficient at instant k ∈ [1, 2, . . . , j/2]. Vector u k ∈ R n×1 is the state vector of process at discrete time instant k and contains the numerical value of n states. Matrix A ∈ R n×n is the dynamical system matrix. K ∈ R n×1 is the steady state Kalman gain. B ∈ R 1×n is the output matrix, which describes how the internal state is transferred to the outside world in the observations y k . The e k ∈ R denotes zero mean white noise.
With the traditional subspace algorithm such as N4SID, the matrix A, B, K of the state space model of dynamic system can be estimated [26]. In this paper, the order n DSSM of dual state space models will be determined by the experiments in the Section 3. Moreover, the parameter matrixes of state space model estimated by the first level wavelet coefficients C i 1,m can be expressed as Then the DSSM S i of the X i can be defined as: So, the parameters extracted from DSSM here is called DSSM Features (DSSMFs) can be defined as DSSMFs = s i 1 s i 2 .

IMBEFs Construction
According to the previously calculated LEFs E i l LE ,m and the parameters S i of the DSSM, the features IMBEFs of epoch X i here are given by The feature dimension can be calculated by the Equation (11).
Here, the general form of features extracted from LE and multiple state space models (MSSM) which are estimated by the l MSSM -th level WPD coefficients can be depicted as Equation (12).
The dimension of the F i MSSM can be calculated by where n MSSM is the order of MSSM. Usually, the n MSSM range from 5 to 10. Assume the n MSSM = 5, the Dim MSSM will be too large. So the l MSSM is set to 1 in this paper, which means there are two state space models.

Experiments and Results
In this section, there are four experimental parts. The first is the experiment for selecting a suitable classifier among several candidate classifiers. Then is the determination of the most suitable wavelet basis ω DSSM and model order n DSSM for DSSM estimation. Next is the selection of the wavelet basis ω LE and the level l LE for the LE calculation. Finally, the test experiment will be conducted on the S-EDF database and ISRUCS3 database with the ω DSSM , ω LE , n DSSM and l LE determined according to the previous experiments.
In the process of selecting these parameters, the DRMS database was adopted for testing under the both R&K and AASM standards. There are several conventional verification strategies, including k-fold cross-validation, leave one-subject-out cross-validation (LOOCV) and corss-dataset validation, etc. In this paper, many commonly-used databases are adopted to verify the performance of the algorithm, in which the S-EDF database and the DRMS database contains lots of subjects. However, some subjects contained in these database possess unevenly distributed samples, which means the incomplete sleep stages. Consequently, the 10-fold cross-validation method would be more suitable for the performance verification in this research. In 10-fold cross-validation, the original sample is randomly partitioned into 10 equal size subsamples. Of the 10 subsamples, a single subsample is retained as the validation data for testing the model and the remaining nine subsamples are used as training data. The cross-validation process is then repeated 10 times, with each of the 10 subsamples used exactly once as the validation data. The 10 results from the folds can then be averaged to produce a single estimation. The advantage of this method is that all observations are used for both training and validation and each observation is used for validation exactly once. The accuracy (ACC) and Cohen's Kappa Coefficient (Kappa) are computed to evaluate the overall classification performance.
where TP, TN, FP and FN represent the number of true positive, true negative, false positive and false negative examples respectively. And p e is the hypothetical probability of agreement by chance.

Classifier Comparison and Selection
In this section, an algorithm is designed to search the best classifier for the method proposed in this paper. The detailed steps are shown in the Algorithm 1 below. In this algorithm, according to the distribution and characteristic of the samples, the candidate classifiers are including Linear Discriminant, Quadratic Discriminant, Quadratic SVM, Fine KNN, Bagged Trees and RUSBoosted Trees. The candidate wavelet bases include the db1, db2, db3, db4, db5, db6, db8, db16, db32, sym2, sym8, sym16, coif1, coif3 and dmey. The order of DSSM range from 5 to 10. Here only the DSSMFs are used for training and validation. Table 4 shows the experiment results of Algorithm 1. As can be seen from Table 4, the Bagged Trees is the optimal classifier in the classification of two to six classes. At the meantime, the corresponding order of DSSM is 6. In addition, in the two class classification, the optimal wavelet basis is sym2; the others, however, are db1. Furthermore, the comparison of different classifiers in two classes classification under the condition of n DSSM = 6 are listed in Table 5. It can be seen from Table 5 that the accuracy of sym2 is 95.79%, which is a little higher than the 95.71% of db1 and 95.72% of db2. Therefore, considering the results in Tables 4 and 5 , the Bagged Trees will be used as the classifier for subsequent experiments.

Wavelet Basis Comparison and Selection
After the classifier is determined, the model order n DSSM and wavelet basis ω DSSM should be further confirmed through the grid search method. This process can be seen in the step 1 of the Figure 2. The candidate wavelets include db1, db2, db3, db4, db5, db6, db8, db16, db32, sym2, sym8, sym16, coif1, coif3 and dmey. The candidate model order is 5 to 10. The Following Tables 6-14       From Tables 6-10, we can see that under the R&K standard, when the order of the DSSM is 6 and the wavelet basis is selected as db1, the classification accuracy for three to six classes can reach the highest. When the wavelet basis is selected as sym2, the accuracy of the two classes is the highest. Through further analysis, it can be seen that in the results of two class classification, the difference between the accuracy of the db1 and the highest is very small.   As can be seen from Tables 11-14, when the order n DSSM is 6, the highest classification accuracy can be obtained in two to five classes sleep state classification. Moreover, in the three to five classes classifications, when the wavelet basis is db1, the highest classification accuracy can be achieved. In the two classes of sleep classification, when the wavelet base is db1, the accuracy is 0.14% lower than the highest accuracy. Combining the classification results of the above tables, in order to facilitate subsequent calculations, the db1 was uniformly used as the wavelet basis for DSSM estimation and the model order of DSSM adopts 6.
Then, the wavelet basis ω LE and level l LE which are required to calculate LE should be further determined according to the experimental results in the next step. That is, on the basis of the features previously extracted from the DSSM, LEFs will be added which have been shown in the Step 2 of the    As can be seen from Tables 15-19, when l LE = 5, the ω LE is db4, the accuracy of two, four and six classes is the highest. Moreover, when the ω LE is set to the db5 and db3, the classification accuracy of three and five classes can reach the highest respectively. The Table 20 is the confusion matrix of six classes sleep state classification on DRMS database with IMBEFs under the R&K standard. As shown in the Table 20, the sensitivity of Awa, REM, S1, S2, S3 and S4 are 93.68%, 81.16%, 14.37%, 89.29%, 25.71% and 77.99%, respectively. Moreover, the overall accuracy of the six classes classification is 78.92%.  Tables 21-24 show the classification accuracy of 2-5 classes with LEFs on the DRMS database under the AASM, in which the highest accuracy values are highlighted in boldface. As can be seen from these tables, after adding LEFs, the accuracy of each classification has been greatly improved. Among them, the highest accuracy can be obtained when using the LEFs extracted from the 5 level WPD and there are three corresponding wavelet bases, which are db1, db2 and db4. When the wavelet basis is selected as db4, the accuracy of two classes and four classes can reach the highest. In addition, the accuracy of three and five classes are 88.22% and 79.90% respectively, which is not much different from 88.26% and 79.97% of the corresponding highest classification accuracy. Therefore, the parameter of l LE will be set as 5 and ω LE will be set as db4 in this paper.    The confusion matrix of five classes sleep state classification is listed in the Table 25. As can be seen in this table, the overall accuracy is 79.90%. The sensitivity of Awa, REM, N1, N2, N3 are 92.89%, 81.22%, 17.57%, 85.52% and 78.79%. Furthermore, the receiver operating characteristic (ROC) curve of the classifier trained by this dataset with the confirmed parameter is shown in Figure 3.
As can be seen in the Figure 3, when the positive samples is Awa, the true positive rate is 0.93 and the false positive rate is 0.05. In addition, when the positive samples are REM, N2 and N3, the corresponding positive sample rates are 0.81, 0.86 and 0.79. When the positive samples are N1, the area under the curve (AUC) area is only 0.18. Moreover, the issue of low classification accuracy of S1(N1) will be discussed in the Section 4.

Experiments on S-EDF and ISRUC3 Database
After experiments on the DRMS database, through the comprehensive comparison and selection, the classifier is selected as the Bagged Tress, n DSSM is set to 6, l LE is set to 5, ω DSSM is set to db1 and ω LE is set to db4. In order to further evaluate the performance of the method proposed in this paper, we will use these parameters to conduct experiments on the S-EDF database and the ISRUC3 database.
The classification accuracy and Cohen's Kappa Coefficients of the 2-6 classes on the S-EDF database are shown in Table 26. Furthermore, the confusion matrix of six class classification is listed for further analysis in Table 27. Similarly, the method proposed in this paper was also tested on the ISRUC3 database. The experimental results are shown in the following Tables 28 and 29. As can be seen from Table 28, the classification accuracies of two to five classes are 96.18%, 90.54%, 84.68% and 81.65%, respectively. In the five class classification, the sensitivity of Awa, REM, N1, N2, N3 are 90.31%, 83.36%, 57.70%, 81.12% and 87.50%, respectively. Table 30 shows the comparison of the classification accuracy from two to six classes of the various published method and the method proposed in this paper on the DRMS database under the R&K standard.

Epoch Mumber 6 Classes (%) 5 Classes (%) 4 Classes (%) 3 Classes (%) 2 Classes (%) Cross-Validation
Hassan et al. [ As can be seen from the Table 30 above, when the only DSSMFs is used, the method proposed in this paper has a certain improvement in accuracy compared with the others. After adding LEFs on the basis of DSSMFs, the classification accuracies of two to six classes are improved by 1.27%, 1.02%, 1.27%, 1.38% and 0.72% compared with our previous study [27].
It can be seen from Table 31 that the method proposed in this paper has a certain improvement in the sleep stage classification of 3-5 classes on the DRMS database compared with the current existing methods. The N1 sensitivity of this method on the DRMS database is 17.57%, which is higher than 14.3% of Ghimatgar [7]. Moreover, Table 32 is the accuracy comparison of various published methods on S-EDF database.

Epoch number 5 Classes (%) 4 Classes (%) 3 Classes (%) 2 Classes (%) Cross-validation
Hassan et al. [  It can be seen from Table 32 that when a large number of samples are used, the accuracy is also improved compared with other published methods. Among them, the accuracy for the classification of four classes is 93.87%, while the Sharma [28] is 92.1% and the Shen [27] is 93.0%. In the classification of two classes, Abdulla et al. [6] has the highest accuracy of 93%; however, the number of epoch they used is only 23806. The sensitivity of S1 in this paper is 19.32%, which is higher than 18.3% of Ghimatgar [7] and 15.9% of Shen [27].
The experiments results of the proposed method on ISRUC3 database are also compared with other methods, which can be seen in the following Table 33.

Epoch number 5 Classes 4 Classes 3 Classes 2 Classes
Overall Accuracy Ghimatgar  As can be seen from the Table 33, compared with Ghimatgar [7], the detection accuracy of two and three classes is improved by more than 2 points. The sensitivity of S1 in Table 29 is 57.70%, which is higher than 33% of Ghimatgar [7]. Furthermore, the Cohen's kappa Coefficient is also much higher than Ghimatgar [7].
It should be noted that the classification of S1 which is an enormous challenge to all of the published method. From neurophysiological standpoint, S1(N1) is a transition phase and is a mixture of wakefulness and sleep resulting in similarity with the neural oscillations of S1 and Awa. In REM state, the cortex shows 40-60 Hz gamma waves as it does in waking. So the S1 state is often misclassified as REM or Awa state during the visual inspection by experts [3,11]. This is why many of the S1 epochs are misclassified as REM, Awa or S2 stages in this work. In addition, with different databases, the classification accuracy of S1 (N1) are also different. The detection accuracy of N1 on the ISRUC3 database reached 57.7%; on the DRMS database and the S-EDF database, however, it is less than 20%. This is also related to the different proportions of S1 stages in each database. Under the same AASM standard, on the ISRUC3 database, the S1 accounted for 12.65%; however, on the DRMS database, the S1 accounted for only 7.3%. Furthermore, under the R&K standard, the sensitivity of S3 on the S-EDF and DRMS databases is low, only 46.11% and 25.71%, respectively. The reason relate to this phenomenon rely mainly on that the S3 is a transition phase of S2 and S4. Thus the further research should be conducted to improve the S3 detection accuracy. Moreover, as can be seen in Table 20, a large number of S3 is misclassified as S2 and the other large part is misclassified as S4. Similarly, in Table 27, almost half of S3 epochs are misclassified as S2 and a small part are misclassified as S4. In addition, when under the AASM standard, after combining the S3 and S4 into N3, the sensitivity of N3 has been improved. As shown in Table 25, only 761 epochs of N3 were misclassified as N2; however, in Table 20, 1022 epochs of S3 were misclassified as S2 and 231 epochs of S4 were misclassified as S2. Therefore, the AASM standard is more suitable for guiding the researchers to annotate the sleep stages than the R&K standard.

Conclusions
In this paper, a novel IMBEF based automatic sleep stage classification method is proposed. Moreover, a grid search strategy was presented to determine a suitable model order n DSSM and a wavelet basis ω DSSM for estimating the DSSM among 15 candidate wavelets and 6 candidate model orders. With the same search strategy, a proper wavelet basis ω LE and the WPD level l LE for LE calculation are determined under 15 candidate wavelets and multilevel decomposition. The fused IMBEFs extracted from the DSSM and LE would be used as the input features of the suitable classifier which can be selected by comparing a variety of classifiers' experiment results. In order to precisely verify the performance of the proposed IMBEF based automatic sleep stage classification method, experiments were carried out on three public databases. The comparison results with other state-of-the-art methods show that the proposed algorithm can achieve higher accuracy.
We demonstrated in this paper measurable improvements in automatic sleep stage classification, providing better understanding and diagnostic of the sleep phenomenon, clearly essential in medical, wellness and other fields.