Fast Sleep Stage Classification Using Cascaded Support Vector Machines with Single-Channel EEG Signals

Dezhao Li; Yangtao Ruan; Fufu Zheng; Yan Su; Qiang Lin

doi:10.3390/s22249914

¹ Zhejiang Provincial Key Laboratory of Quantum Precision Measurement, Collaborative Innovation Center for Information Technology in Biological and Medical Physics, College of Science, Zhejiang University of Technology, Hangzhou 310023, China

² School of Art, Zhejiang International Studies University, Hangzhou 310023, China

* Authors to whom correspondence should be addressed.

Sensors 2022, 22(24), 9914; https://doi.org/10.3390/s22249914

This article belongs to the Special Issue Human Signal Processing Based on Wearable Non-invasive Device

Abstract

Long-term sleep stage monitoring is very important for the diagnosis and treatment of insomnia. With the development of wearable electroencephalogram (EEG) devices, we developed a fast and accurate sleep stage classification method in this study with single-channel EEG signals for practical applications. The original sleep recordings were collected from the Sleep-EDF database. The wavelet threshold denoising (WTD) method and wavelet packet transformation (WPT) method were applied as signal preprocessing to extract six kinds of characteristic waves. With a comprehensive feature system including time, frequency, and nonlinear dynamics, we obtained the sleep stage classification results with different Support Vector Machine (SVM) models. We proposed a novel classification method based on cascaded SVM models with various features extracted from denoised EEG signals. To enhance the accuracy and generalization performance of this method, nonlinear dynamics features were taken into consideration. With nonlinear dynamics features included, the average classification accuracy was up to 88.11% using this method. In addition, with cascaded SVM models, the classification accuracy of the non-rapid eye movement sleep stage 1 (N1) was enhanced from 41.5% to 55.65% compared with the single SVM model, and the overall classification time for each epoch was less than 1.7 s. Moreover, we demonstrated that it was possible to apply this method for long-term sleep stage monitor applications.

Keywords: single-channel EEG signals; sleep stage classification; cascaded support vector machine; nonlinear dynamics features; long-term monitor

1. Introduction

Sleep, which contributes to self-recovery, replenishing psychophysiological resources, and upholding the immune system, is a critical physiological activity of the human body. Recently, sleep health has been widely discussed due to its association with mortality, coronary artery disease [1], and impaired neurobehavioral performance [2,3]. Unfortunately, according to demographics, up to 24% of the adult population suffers from various sleep problems, including insomnia, obstructive sleep apnea syndrome, or a mere lack of sleep hygiene, and sleep therapy is urgently needed [4]. To provide effective diagnosis and treatments, long-term sleep monitoring and sleep stage classification are very necessary [5,6]. According to Rechtschaffen, A., and Kales, A.D. (R&K) rules [7] and the recently updated American Academy of Sleep Medicine (AASM) standard [8], sleep can be evaluated and classified into five different stages: wake, non-rapid eye movement sleep stages (1, 2, 3), and rapid eye movement, namely W, N1, N2, N3, and R.

Currently, polysomnography (PSG) is considered to be an effective method for sleep stage classification [9], but it has several obvious disadvantages. PSG is cumbersome and needs to record multiple bio-signals of the patient simultaneously, including electromyogram (EMG), electrocardiogram (ECG), electroencephalogram (EEG), electrooculogram (EOG), blood oxygen saturation, etc. [10,11]. To carry out PSG, the patients are required to stay in a special sleep lab for at least one whole night. These requirements make this method costly and time-consuming, limiting its applications in fast and long-term sleep monitoring [12]. To overcome these shortcomings, one promising strategy using wearable EEG signal-acquiring systems has been proposed for sleep stage classification [13,14], since EEG signals have different characteristics in different sleep stages. Based on this property, various signal processing techniques have been applied to extract sleep-related feature information, including time-domain features [15], spectral features [16], time–frequency features [17], and nonlinear dynamics features [18]. Furthermore, to determine the sleep stage, several kinds of algorithms have been proposed, including K-means [19], Support Vector Machine (SVM) [20], Random Forest [21], Naive Bayes [22], Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN) [23,24,25]. Sors reported a one-dimensional convolutional network method for sleep stage classification with an average accuracy of 87% [26]. Using EEG signal energy features and a Recurrent Neural Network, Hsu et al. developed another system to classify five sleep stages, resulting in 87.2% average accuracy [27]. Most work focuses on classification accuracy but ignores the computation time of classification, and few studies have discussed how to balance high classification accuracy against computation time. In addition, it was difficult to obtain an accuracy higher than 40% for the N1 stage with single-channel EEG signals. To meet the requirement for long-term sleep monitoring, an accurate and fast sleep stage classification method with single-channel EEG signals is highly desired.

To address the aforementioned challenges, based on cascaded SVM models we proposed a fast sleep stage classification method with single-channel EEG signals following the AASM rules. We applied the nonlinear dynamics features of EEG signals, which resulted in a more comprehensive feature system to improve the accuracy and generalization performance. Moreover, the classification speed with this method has also been evaluated. These results revealed that it would be very promising to use this method for practical long-term sleep stage monitoring in the future.

2. Materials and Methods

The scheme of the sleep stage classification process adopted in this study is described in Figure 1, including data acquisition from sleep recordings, signal preprocessing, feature extraction, feature selection, and classification. The original sleep recordings were collected from the Sleep-EDF database [28,29], and we chose the data from the Fpz-Cz channel to analyze [30]. The wavelet threshold denoising (WTD) method and wavelet packet transformation (WPT) method were applied in the process of signal preprocessing to obtain six kinds of characteristic waves. After this, a comprehensive feature system with time, frequency, and nonlinear dynamics domains was made up. Subsequently, the minimum redundancy maximum relevance (mRMR) algorithm was used to select the most effective features. Finally, we obtained the cascaded SVM models with these selected features as the inputs. The final classification results were a combination of the calculation results from the two SVM models.

Figure 1. The sleep stage classification processing flow adopted in this paper.

2.1. Data Collection

The EEG sleep recordings used in this study were obtained from the Sleep-EDF database, which is publicly available from PhysioBank directly [28,29]. We collected eight different sleep data sets from healthy people (SC4001, SC4011, SC4021, SC4051, SC4062, SC4102, SC4112, and SC4122) aged from 21 to 35. Originally, the sleep recordings had three kinds of signals, which were the horizontal electrooculogram (EOG) and EEG signals from Fpz-Cz and Pz-Oz channels. All these signals were recorded with the sampling rate of 100 Hz, and we chose the EEG sleep signals from the Fpz-Cz channel for sleep stage classification analysis, since some studies revealed that EEG signals from Fpz-Cz and Pz-Oz can be replaced with each other without losing the AASM rules [31]. In this study, for every data set, we choose the sleep recording of nine hours from 11 p.m. to 8 a.m. to carry out the analysis.

2.2. Signal Preprocessing

The WTD method was used to denoise the original EEG signals for signal preprocessing. Daubechies wavelets of the order 8 (db8) method were applied to decompose the collected EEG signals into 7 layers, as shown in Figure 2. After that, using the soft threshold method with suitable process coefficients, the EEG signals were denoised.
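
The soft-threshold step can be illustrated with a minimal sketch. It shows only the shrinkage rule applied to a list of wavelet detail coefficients; the db8 decomposition itself is not reproduced. The function name `soft_threshold`, the coefficient values, and the threshold are illustrative assumptions, since the "suitable process coefficients" are not specified in the text.

```python
def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold` (soft thresholding)."""
    out = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            out.append(0.0)          # small coefficients are treated as noise
        elif c > 0:
            out.append(magnitude)    # keep the sign, reduce the magnitude
        else:
            out.append(-magnitude)
    return out


# Hypothetical detail coefficients and threshold, for illustration only.
denoised = soft_threshold([3.0, -0.5, 0.2, -2.0], threshold=1.0)
```

In a full pipeline the shrunken coefficients would be passed to the inverse wavelet transform to rebuild the denoised epoch.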

Figure 2. The wavelet packet tree applied for signal processing.

2.2.1. Wavelet Packet Transform

EEG signals were decomposed into different characteristic waves [27,32], including alpha (α, 8–13 Hz), beta (β, 12–30 Hz), theta (θ, 4–8 Hz), delta (δ, 0.5–2 Hz), spindle (12–14 Hz), sawtooth (2–6 Hz), and K complex (1 Hz). The characteristic frequency waves corresponding to different stages were summarized and shown in Table 1.

Table 1. Five sleep stages and their corresponding characteristic frequency.

To obtain the characteristic waves from every denoised epoch, the WPT method was applied to extract the corresponding frequency bands. The WPT method resolves the brain rhythms precisely into packets at a relatively low computational cost [33,34]. We constructed a wavelet packet tree with 7 decomposition levels, which gave a frequency band resolution of around 0.39 Hz. As seen in Figure 2, the wavelet packet coefficient of the j-th node in the i-th layer was named $(i, j)$, $1 \leq j \leq 2^{i} - 1$, which represented a decomposed frequency band. We chose suitable nodes according to the frequency band of each characteristic wave. By connecting the frequency bands of these nodes from low to high, a frequency band covering the required characteristic wave was formed.
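
The resolution of around 0.39 Hz follows from splitting the 0–50 Hz range (half the 100 Hz sampling rate) into 2^7 = 128 bands. The sketch below works through that arithmetic and lists which terminal nodes would cover a given band. It assumes the nodes are indexed from 0 in increasing frequency order, which is not necessarily the indexing used in Figure 2; `node_bandwidth` and `nodes_for_band` are hypothetical helper names.

```python
import math


def node_bandwidth(fs, level):
    """Width in Hz of one terminal node, assuming the tree spans 0..fs/2."""
    return (fs / 2) / (2 ** level)


def nodes_for_band(f_low, f_high, fs=100.0, level=7):
    """0-based, frequency-ordered terminal nodes overlapping [f_low, f_high)."""
    width = node_bandwidth(fs, level)
    first = math.floor(f_low / width)
    last = math.ceil(f_high / width) - 1
    return list(range(first, last + 1))


width = node_bandwidth(100.0, 7)          # 50 / 128 Hz per node
alpha_nodes = nodes_for_band(8.0, 13.0)   # nodes covering the alpha band
```

Because the node edges do not fall exactly on 8 Hz and 13 Hz, the joined band is slightly wider than the nominal alpha band.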

2.2.2. Feature Extraction

Feature extraction was the essential process for accurate sleep stage classification. Since the sampling rate was 100 Hz in this study, there were a total of 3000 samples in each 30 s epoch. To obtain a comprehensive feature system, time-domain features, energy features, frequency-domain features, and nonlinear dynamics features were comprehensively considered in this study.

(1): Time-domain features

The standard deviation was the average amount of variability in each epoch. For each characteristic wave, the standard deviation of the samples during an epoch was computed as:

$$\mathrm{Std} = \sqrt{\sum_{i = 1}^{3000} (w_{i} - \bar{w})^{2}}, \tag{1}$$

where $w_{i}$ was the i-th sample of an epoch corresponding to the characteristic wave. Thus, there were six standard deviations corresponding to six characteristic waves in an epoch: $\mathrm{Std}_{\alpha}, \mathrm{Std}_{\beta}, \mathrm{Std}_{\delta}, \mathrm{Std}_{\text{saw-tooth}}, \mathrm{Std}_{\theta}, \mathrm{Std}_{\text{spindle}}$. Other effective time-domain features of the EEG signals were calculated and summarized (details in Appendix A.1, Table A1).

(2): Energy-domain features

The energy of each characteristic wave in an epoch was defined as:

$$\mathrm{Energy} = \sum_{i = 1}^{3000} w_{i}^{2}. \tag{2}$$

Six energy features corresponded to the six characteristic waves in an epoch: $E_{\alpha}, E_{\beta}, E_{\delta}, E_{\text{saw-tooth}}, E_{\theta}, E_{\text{spindle}}$.

Previous research showed that N1 could not be classified accurately because it was usually confused with R or N2 [35]. To improve the accuracy of the N1 stage, two additional features were established in this study. The characteristic wave of the N1 stage was the $\theta$ wave, whereas the characteristic waves of R and N2 were the $\alpha$ and $\delta$ waves, respectively. Therefore, we set the ratios of the alpha and delta energies to the theta energy as important features:

$$E_{\alpha / \theta} = \frac{E_{\alpha}}{E_{\theta}}, \tag{3}$$

$$E_{\delta / \theta} = \frac{E_{\delta}}{E_{\theta}}, \tag{4}$$

where $E_{\alpha / \theta}$ in Equation (3) and $E_{\delta / \theta}$ in Equation (4) were the energy ratios of the alpha and delta waves to the theta wave, respectively.
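
As a minimal sketch of Equations (2) to (4), the energy of a band-limited epoch is the sum of its squared samples, and the two ratio features divide the alpha and delta energies by the theta energy. The function names and the toy 4-sample "epochs" are assumptions for illustration; a real epoch has 3000 samples.

```python
def band_energy(samples):
    """Energy of one characteristic wave in an epoch, as in Equation (2)."""
    return sum(w * w for w in samples)


def energy_ratios(e_alpha, e_delta, e_theta):
    """Alpha/theta and delta/theta energy ratios, as in Equations (3) and (4)."""
    return e_alpha / e_theta, e_delta / e_theta


# Made-up band-limited samples, not data from the study.
alpha = [1.0, -1.0, 2.0, 0.0]
delta = [0.5, 0.5, -0.5, 0.5]
theta = [1.0, 1.0, -1.0, 1.0]
ratios = energy_ratios(band_energy(alpha), band_energy(delta), band_energy(theta))
```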

(3): Frequency-domain features

The frequency features usually contain power information of EEG waves. In this study, the power of each characteristic wave was expressed as follows in Equation (5):

$$\mathrm{Power} = \sum_{k = 1}^{K} P_{k}, \tag{5}$$

where $P_{k}$ was the k-th magnitude of the wave's power spectral density (PSD), and K was the total sample number of the EEG signals in the frequency domain. Thus, six power features corresponded to the six characteristic waves (frequency spectra as in Figure A1) in an epoch: $P_{\alpha}, P_{\beta}, P_{\delta}, P_{\text{saw-tooth}}, P_{\theta}, P_{\text{spindle}}$.

Moreover, mean frequency (MNF) for an epoch was used, which was defined as:

$$\mathrm{MNF} = \frac{\sum_{k} p_{k} f_{k}}{\sum_{k} p_{k}}, \tag{6}$$

where $p_{k}$ and $f_{k}$ in Equation (6) were the k-th power and frequency of the power spectral density of the EEG signals in an epoch, respectively.
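
Equation (6) is a power-weighted average of the frequency bins. A minimal sketch, assuming the PSD has already been estimated and is passed in as two equal-length lists (the function name and the toy values are illustrative):

```python
def mean_frequency(psd, freqs):
    """Power-weighted mean frequency of a PSD, as in Equation (6)."""
    total = sum(psd)
    if total == 0:
        raise ValueError("PSD has zero total power")
    return sum(p * f for p, f in zip(psd, freqs)) / total


# Hypothetical PSD values at 1, 2, 3, and 4 Hz.
mnf = mean_frequency([1.0, 2.0, 1.0, 0.0], [1.0, 2.0, 3.0, 4.0])
```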

Similar to the energy features, we set the ratio of the power of alpha and delta to theta as new features:

$$P_{\alpha / \theta} = \frac{P_{\alpha}}{P_{\theta}}, \quad P_{\delta / \theta} = \frac{P_{\delta}}{P_{\theta}}, \tag{7}$$

where $P_{\alpha / \theta}$ and $P_{\delta / \theta}$ in Equation (7) were the power ratio of alpha and delta waves to theta wave, respectively.

(4): Nonlinear-dynamics-domain features

In this study, nonlinear-dynamics-domain features of Renyi entropy, Lempel–Ziv complexity, multi-scale entropy, spectral entropy, sample entropy, and fuzzy entropy were calculated with denoised EEG signals.

Renyi entropy (RE) was widely applied to analyze EEG signals as well [36,37]. RE can quantify the diversity, uncertainty, or randomness of a system. RE values were calculated as in Equation (8):

$$RE = - \log \left(\sum_{k} p_{k}^{2}\right). \tag{8}$$

Similarly, we calculated the RE values of six characteristic waves ($RE_{\alpha}, RE_{\beta}, RE_{\delta}, RE_{\text{saw-tooth}}, RE_{\theta}, RE_{\text{spindle}}$).
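
Equation (8) is the Renyi entropy of order 2. The sketch below assumes that $p_k$ is a normalized distribution (for example a PSD divided by its total power); the text does not state how $p_k$ is obtained for this feature, so the normalization step and the function name are assumptions.

```python
import math


def renyi_entropy_order2(weights):
    """Order-2 Renyi entropy of non-negative weights, as in Equation (8)."""
    total = sum(weights)
    probs = [w / total for w in weights]       # normalize so that sum(p_k) = 1
    return -math.log(sum(p * p for p in probs))


uniform = renyi_entropy_order2([1.0, 1.0, 1.0, 1.0])   # flat spectrum, highest entropy
peaked = renyi_entropy_order2([1.0, 0.0, 0.0, 0.0])    # single peak, zero entropy
```

A flat distribution over four bins gives log(4), while a single peak gives zero, which matches the reading of RE as a measure of randomness.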

The variation in EEG signals within a time scope indicated the self-invariant and self-similar structures, and this was measured by the nonlinear analysis method of the Lempel–Ziv complexity (LZC) algorithm.

Before calculating the LZC values, the sequence $A^{n}$ was transformed into a finite symbol sequence, namely a binary sequence $Z = \{z_{1}, z_{2}, \dots, z_{n}\}$, as in Equation (9) with the threshold $T_{d}$:

$$z_{i} = \begin{cases} 0, & a_{i} < T_{d} \\ 1, & \text{otherwise}, \end{cases} \tag{9}$$

The median of the sequence $A^{n}$ was taken as $T_{d}$. LZC was calculated following the computational flow chart as in Figure 3.

Figure 3. The nonlinear analysis computation process of Lempel–Ziv complexity (LZC).
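
A compact way to see the two steps is to binarize with the median and then count the phrases of the Lempel–Ziv (1976) parsing. The sketch below is a straightforward, unoptimized version of that parsing and is not claimed to match the flow chart of Figure 3 step by step; the function names are illustrative, and any normalization of the count is left out.

```python
from statistics import median


def binarize(samples):
    """Median-threshold binarization, as in Equation (9)."""
    t_d = median(samples)
    return "".join("0" if a < t_d else "1" for a in samples)


def lz76_complexity(symbols):
    """Number of phrases in the Lempel-Ziv (1976) parsing of a symbol string."""
    n = len(symbols)
    start, count = 0, 0
    while start < n:
        length = 1
        # Grow the phrase while it can still be copied from the earlier text.
        while (start + length <= n
               and symbols[start:start + length] in symbols[:start + length - 1]):
            length += 1
        count += 1
        start += length
    return count


c = lz76_complexity("0001101001000101")   # parsed as 0|001|10|100|1000|101
```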

Based on chaos theory, multi-scale entropy contributed to the improvement of the accuracy of sleep stage classification. We set the scale factor $\tau$, and the sequence $A^{n}$ was coarse-grained by averaging consecutive non-overlapping windows of $\tau$ samples. The coarse-grained time sequence $\lambda^{(\tau)}(J) = \{\lambda_{1}^{(\tau)}, \lambda_{2}^{(\tau)}, \dots, \lambda_{J}^{(\tau)}\}$ is given by:

$$\lambda_{j}^{(\tau)} = \frac{1}{\tau} \sum_{i = (j - 1) \tau + 1}^{j \tau} a_{i}, \quad 1 \leq j \leq \left[\frac{n}{\tau}\right]. \tag{10}$$

Considering the effect of the scale factor on the accuracy of sleep stage classification, R and N1 could not be classified properly if $1 \leq \tau \leq 8$, whereas the W, R, and N2 stages would be confused if $\tau \geq 14$. Therefore, $9 < \tau < 13$ was suitable, and we set $\tau = 11$ in this study. The multi-scale entropy in an epoch was expressed as:

$$MsEn(m, r, \lambda^{(\tau)}) = \sum_{j = 1}^{[\frac{n}{\tau}]} SpEn(m, r, \lambda^{(\tau)}), \quad \tau = 11, \tag{11}$$

where $m = 2$ and $r = 0.2\,SD$. Other nonlinear-dynamics-domain features of the EEG signals were calculated (details in Appendix A.2.).
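
The coarse-graining of Equation (10) and the entropy evaluation can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it uses the common definition of multi-scale entropy as the sample entropy of the coarse-grained series at one scale, takes $r$ as 0.2 times the population standard deviation of the original series, and counts a template match when the maximum absolute difference is at most $r$. All function names are hypothetical.

```python
import math
from statistics import pstdev


def coarse_grain(a, tau):
    """Average consecutive non-overlapping windows of tau samples (Equation (10))."""
    n_windows = len(a) // tau
    return [sum(a[j * tau:(j + 1) * tau]) / tau for j in range(n_windows)]


def sample_entropy(x, m, r):
    """Sample entropy with template length m and tolerance r (O(n^2) sketch)."""
    n = len(x)

    def matches(length):
        # Use the same n - m template start points for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        hits = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(u - v) for u, v in zip(templates[i], templates[j])) <= r:
                    hits += 1
        return hits

    b, a = matches(m), matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")          # undefined when no matches are found
    return -math.log(a / b)


def multiscale_entropy(a, tau=11, m=2, r_factor=0.2):
    """Sample entropy of the series coarse-grained at scale tau."""
    r = r_factor * pstdev(a)         # assumption: SD of the original series
    return sample_entropy(coarse_grain(a, tau), m, r)
```

With $\tau = 11$, a 3000-sample epoch shrinks to 272 points, so the quadratic cost of the template comparison stays modest.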

2.2.3. Feature Selection

In this article, the minimum redundancy maximum relevance (mRMR) algorithm was used to select the effective features. The mutual information I of the discrete random variables $Z_{1}$ and $Z_{2}$ was defined as in Equation (12) [38]:

$$I(Z_{1}, Z_{2}) = \sum_{i, j} P(Z_{1} = Z_{1i}, Z_{2} = Z_{2j}) \log \frac{P(Z_{1} = Z_{1i}, Z_{2} = Z_{2j})}{P(Z_{1} = Z_{1i}) P(Z_{2} = Z_{2j})}, \tag{12}$$

where $P(Z_{1} = Z_{1i})$, $P(Z_{2} = Z_{2j})$, and $P(Z_{1} = Z_{1i}, Z_{2} = Z_{2j})$ were the corresponding marginal and joint probabilities. The relevance between the feature set $F$ and the output sequence $g = \{g_{1}, g_{2}, \dots, g_{end}\}$ was $D(F, g)$ in Equation (13), and the redundancy within the feature set $F$ was $R(F)$ in Equation (14), which were defined as:

$$D(F, g) = \frac{1}{|F|} \sum_{f_{i} \in F} I(f_{i}, g), \tag{13}$$

$$R(F) = \frac{1}{|F|^{2}} \sum_{f_{i}, f_{j} \in F} I(f_{i}, f_{j}), \tag{14}$$

where $f_{i}$ and $f_{j}$ were different features in $F$, each given as a sequence of values: $f_{i} = \{f_{i1}, f_{i2}, \dots, f_{i,end}\}$, $f_{j} = \{f_{j1}, f_{j2}, \dots, f_{j,end}\}$. $|F|$ was the total number of features in $F$, which was 51 in this study (details in Appendix A.3, Table A2). The analysis process of the mRMR is shown in Figure 4.

Figure 4. The computation process of mRMR to select the effective features.

The MIQ value of each feature in the feature set $F$ was defined as in Equation (15):

$$MIQ(f_{i}) = \frac{I(f_{i}, g)}{\frac{1}{|F|} \sum_{f_{j} \in S} I(f_{i}, f_{j})}, \quad f_{i} \neq f_{j}. \tag{15}$$

Finally, we obtained the MIQ values for all the features and ranked the features by these values (Appendix A.4, Table A3).
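
The mutual information of Equation (12) and the quotient of Equation (15) can be sketched for discrete sequences as below. The sketch assumes that the features have already been discretized, reads $S$ as the list of features selected so far, and adds a small `eps` to avoid division by zero; these choices and the function names are assumptions, not details given in the text.

```python
import math
from collections import Counter


def mutual_information(z1, z2):
    """Mutual information of two equal-length discrete sequences (Equation (12))."""
    n = len(z1)
    p1, p2, p12 = Counter(z1), Counter(z2), Counter(zip(z1, z2))
    return sum(
        (c / n) * math.log((c / n) / ((p1[a] / n) * (p2[b] / n)))
        for (a, b), c in p12.items()
    )


def miq(candidate, selected, target, n_total, eps=1e-12):
    """Relevance divided by redundancy for one candidate (Equation (15))."""
    relevance = mutual_information(candidate, target)
    if not selected:
        return relevance             # nothing selected yet: rank by relevance
    redundancy = sum(mutual_information(candidate, s) for s in selected) / n_total
    return relevance / (redundancy + eps)


same = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])          # fully dependent
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])   # independent
```

A greedy ranking would repeatedly pick the unselected feature with the largest `miq` value and append it to `selected`.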

2.3. Cascaded Support Vector Machine Classifier

The cascaded SVM method, consisting of two 3-class SVM models, was applied as the sleep stage classifier in this study. Since SVM is inherently a binary classifier, we chose the one-against-one method and constructed 3 hyper-planes for each model, where each hyper-plane was trained on the epochs of two of the three classes. To classify an epoch, every decision function was given the same voting weight, and the predicted result was the class with the most votes.

2.3.1. Data Set for SVM I

All these collected EEG signals were divided into 30 s epochs. Every epoch was assigned to one of the five sleep stages, which were W, N1, N2, N3, and R [39]. In our study, the proposed SVM I was applied to identify three different sleep stages: W, REM-LS, and N3. The REM-LS includes R, N1, and N2 stages, which are frequently confused with each other [27].

2.3.2. Data Set for SVM II

Since the sample sizes of the sleep stages in the original data were unbalanced, as in Table 2, the N1 stage usually could not be classified accurately. To overcome this problem, we reconstructed the training set for the SVM II classifier from the training set of SVM I: we collected all epochs in stage N1 and randomly selected the same number of epochs in stage R and twice that number of epochs in stage N2. The training process for SVM II was conducted by 10-fold cross-validation on this training set, and the generalization performances of SVM I and SVM II were assessed on the same test set.

Table 2. The sample size distribution in different sleep stages.
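
One plausible reading of the cascade is that SVM I decides among W, REM-LS, and N3, and only epochs labeled REM-LS are passed to SVM II, which decides among R, N1, and N2. The sketch below encodes that routing together with the 1:1:2 (N1:R:N2) composition of the SVM II training set. The routing rule is an interpretation of the text, and the two stand-in classifiers are hypothetical rules rather than trained models.

```python
def cascaded_predict(features, svm_one, svm_two):
    """Two-step decision: SVM I picks W / REM-LS / N3; SVM II resolves REM-LS."""
    first = svm_one(features)
    if first != "REM-LS":
        return first                 # W or N3 is taken as the final stage
    return svm_two(features)         # R, N1, or N2


def balanced_counts(n1_epochs):
    """Epoch counts for the SVM II training set: N1 : R : N2 = 1 : 1 : 2."""
    return {"N1": n1_epochs, "R": n1_epochs, "N2": 2 * n1_epochs}


def stub_one(x):
    """Stand-in for SVM I (hypothetical rule)."""
    return "W" if x[0] > 1.0 else "REM-LS"


def stub_two(x):
    """Stand-in for SVM II (hypothetical rule)."""
    return "N1" if x[1] > 0.5 else "N2"


stage = cascaded_predict([0.0, 0.9], stub_one, stub_two)
```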

2.3.3. Transform the Input Data

The training set was expressed as $Data = \{(X_{i}^{*}, y_{i})\}_{i = 1}^{M}$, where M was the number of total epochs in the training set, $X_{i}^{*} = [X_{i1}^{*}, \dots, X_{i\chi}^{*}]$ was the input vector for the i-th epoch, and $y_{i}$ was the label of the i-th epoch. For the SVM method, the multi-class classification problem was decomposed into several binary classification problems. For each binary problem, the label was $y_{i} \in \{1, -1\}$.

There were $\chi$ features and M epochs to form an input matrix X. The k-th element of the input matrix in the i-th epoch was standardized as in Equation (16):

$$X_{ik} = \frac{X_{ik}^{*} - \mathrm{mean}(X_{k}^{*})}{\mathrm{std}(X_{k}^{*})}, \quad k = 1, \dots, \chi, \tag{16}$$

where $X_{k}^{*}$ was the vector for the k-th feature.
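
Equation (16) is a per-feature z-score. A minimal sketch, assuming the population standard deviation is used (the text does not say whether the population or sample form was applied) and that no feature is constant:

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score every feature column of an epochs-by-features matrix (Equation (16))."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]   # a constant column would give sigma = 0
    return [[(v - mu) / s for v, mu, s in zip(row, mus, sigmas)] for row in rows]


# Three hypothetical epochs with two features each.
scaled = standardize_columns([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

After scaling, both columns hold the same values even though the raw features differ by a factor of ten, which keeps one feature from dominating the kernel.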

The kernel function was applied to fulfill the space transform with a mapping relationship. In this study, the quadratic polynomial kernel function was utilized in Equation (17):

$$\theta_{i} \theta_{j} = \kappa(X_{i}, X_{j}) = (X_{i}^{T} X_{j})^{2}, \tag{17}$$

where $X_{i}$ and $X_{j}$ were different input vectors, and $\theta_{i}, \theta_{j}$ were the outputs in higher dimensional space corresponding to $X_{i}, X_{j}$ through nonlinear mapping [40].
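
For two-dimensional inputs the mapping behind Equation (17) can be written out explicitly, which shows that the kernel equals an ordinary inner product in the mapped space. The sketch below checks this numerically; `explicit_map_2d` is an illustrative helper that applies only to the 2-D case.

```python
import math


def quadratic_kernel(x, y):
    """Homogeneous quadratic polynomial kernel, as in Equation (17)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot ** 2


def explicit_map_2d(x):
    """Feature map whose inner product equals the kernel for 2-D inputs."""
    return [x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2]


x, y = [1.0, 2.0], [3.0, 4.0]
k = quadratic_kernel(x, y)                       # (1*3 + 2*4)^2
inner = sum(a * b for a, b in zip(explicit_map_2d(x), explicit_map_2d(y)))
```

The kernel gives the same value as the mapped inner product without ever forming the higher-dimensional vectors.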

SVM constructed an optimal separating hyper-plane (OSH) by maximizing the margin of separation between the classes [40]. The separating hyper-plane was expressed as in Equation (18):

$$W \theta + \varepsilon = 0 \quad \text{such that} \quad y_{i} (W \theta_{i} + \varepsilon) \geq 1 \quad \forall i, \tag{18}$$

where W was the normal vector of the hyper-plane and $\varepsilon$ was the offset of the hyper-plane from the origin. The OSH was found by solving a quadratic programming problem, which was converted into its dual form with Lagrangian multipliers, as shown in Equations (19) and (20):

$$\text{maximize} \quad \sum_{i = 1}^{M} \alpha_{i} - \frac{1}{2} \sum_{i = 1}^{M} \sum_{j = 1}^{M} \alpha_{i} \alpha_{j} y_{i} y_{j} \kappa(X_{i}, X_{j}), \tag{19}$$

$$\text{s.t.} \quad \sum_{i = 1}^{M} \alpha_{i} y_{i} = 0, \quad 0 \leq \alpha_{i} \leq Const, \tag{20}$$

where $\alpha = (\alpha_{1}, \dots, \alpha_{M})$ was the vector of non-negative Lagrangian multipliers and Const was a constant regularization parameter.

The final output $\hat{y}_{j}$ for the j-th epoch was expressed as in Equation (21):

$$\hat{y}_{j} = \mathrm{sgn}\left(\sum_{i = 1}^{M} \alpha_{i} y_{i} \kappa(X_{i}, X_{j}) + \varepsilon\right), \tag{21}$$

where $\mathrm{sgn}(u) = \begin{cases} 1, & u \geq 0 \\ -1, & u < 0 \end{cases}$.

We found that the classification accuracy did not increase linearly with the number of input features. When the number of input features was small, the accuracy of the classifiers increased very quickly with each added feature. However, beyond a certain number of input features, the accuracy changed little. The number of input features was therefore chosen to balance accuracy against computing time.

3. Results and Discussion

3.1. The Average Accuracy of Sleep Stage Classification

In this study, we randomly selected 90% labeled epochs to train the SVM classifiers with 10% data reserved for the classification accuracy test. To test the accuracy of the trained model, different epochs were classified with SVM classifier I and classifier II, and then the classification results were compared with true labels to obtain the test accuracy. The training and testing were conducted five times to obtain five random training and testing sets. The average accuracy values were used to evaluate the generalization performance of the model.

Through the above process, we compared the average accuracy of the cascaded SVM model and the single SVM model, as summarized in Table 3. The average accuracy for the single SVM model was 86.45% and the standard deviation was around 0.71%. For the cascaded SVM model, the average accuracy was 88.11% and the standard deviation was around 0.67%. Additionally, for the cascaded SVM method, the accuracy of N1 was 55.65% and the standard deviation was 3.13%, while the value of the single SVM method was only 41.5% and the standard deviation was 1.72%. These results indicated that the cascaded SVM method was effective to improve the overall average classification accuracy. Moreover, the accuracy of the N1 stage was significantly improved. We found that it did not take too much time for the cascaded SVM to identify a sleep stage compared with the single SVM. All results in Table 3 were calculated by the computer with Intel (R) Core (TM) i9-9900K CPU @ 3.60 GHz, 32 GB of memory, and a 64-bit operating system based on a ×64 processor. The computing time was the average run time of the process from denoised EEG signals to give the sleep stage results.

Table 3. The run time of training and testing of the Support Vector Machine for each epoch.

3.2. The Sleep Stage Classification Performance

To verify the performance of this proposed method for long-term applications, we compared the classification results with the method in this work and the original result from the Sleep-EDF data set (SC4001) as an example, shown in Figure 5.

Figure 5. The classification performance was evaluated with the test set of SC4001.

The accuracy for this case (SC4001) was 89.22%, which indicated that this method could be applied for continuous data analysis. To obtain more details about the classification result, we computed the confusion matrix shown in Figure 6. The most accurate stage was W, with a precision of 94.4%. In addition, owing to the optimized model, the precision of the N1 stage reached 63.9%. Based on the confusion matrix, the total accuracy for this case was defined as in Equation (22):

$$Acc_{i} = \frac{\sum N_{\mathrm{stage}\,i,\,PT}}{\sum N_{\mathrm{stage}\,i,\,PT} + \sum N_{\mathrm{stage}\,i,\,F}}, \tag{22}$$

where $Acc_{i}$ is the total accuracy of the model; $N_{\mathrm{stage}\,i,\,PT}$ is the number of correctly predicted samples for sleep stage i (W, R, N1, N2, and N3); and $N_{\mathrm{stage}\,i,\,F}$ is the number of samples for sleep stage i whose prediction and true label do not match.

Figure 6. The confusion matrix of trained model was calculated with different stages for test set of SC4001.

The precision of every stage Pi labeled by prediction was defined as in Equation (23):

$$\mathrm{Precision}_{Pi} = \frac{N_{\mathrm{stage}\,Pi,\,PT}}{N_{\mathrm{stage}\,Pi,\,PT} + \sum N_{\mathrm{stage}\,Pi,\,F}}, \tag{23}$$

where $N_{\mathrm{stage}\,Pi,\,PT}$ is the number of correctly predicted samples for the predicted stage Pi in the matrix, and $N_{\mathrm{stage}\,Pi,\,F}$ is the number of samples for the predicted stage Pi in the matrix whose prediction and true label do not match.

The recall of every originally labeled stage Ti was defined as in Equation (24):

$$\mathrm{Recall}_{Ti} = \frac{N_{\mathrm{stage}\,Ti,\,PT}}{N_{\mathrm{stage}\,Ti,\,PT} + \sum N_{\mathrm{stage}\,Ti,\,F}}, \tag{24}$$

where $N_{\mathrm{stage}\,Ti,\,PT}$ is the number of correctly predicted samples for the originally labeled stage Ti in the matrix, and $N_{\mathrm{stage}\,Ti,\,F}$ is the number of samples for the originally labeled stage Ti in the matrix whose prediction and true label do not match.
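
Equations (22) to (24) can be read directly off a confusion matrix: accuracy uses the diagonal over all entries, precision divides a diagonal entry by its column sum, and recall divides it by its row sum. The sketch below assumes rows are true labels and columns are predicted labels, and uses a made-up 3-class matrix rather than the values of Figure 6.

```python
def per_class_metrics(matrix, labels):
    """matrix[t][p] = number of epochs with true label t and predicted label p."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    metrics = {"accuracy": correct / total}
    for i, name in enumerate(labels):
        predicted = sum(row[i] for row in matrix)   # column sum
        actual = sum(matrix[i])                     # row sum
        metrics[name] = {
            "precision": matrix[i][i] / predicted if predicted else 0.0,
            "recall": matrix[i][i] / actual if actual else 0.0,
        }
    return metrics


# Made-up confusion matrix, not data from the study.
m = per_class_metrics([[8, 2, 0], [1, 6, 3], [0, 2, 8]], ["W", "N1", "N2"])
```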

In addition, the classification accuracy results using the cascaded Support Vector Machine method in this study and other previously reported studies are summarized in Table 4.

Table 4. The accuracy of the classification for five sleep stages using single-channel methods.

In Table 4, these methods classifying the five sleep stages, which were W, N1, N2, N3, and R, were compared. The prediction accuracy of Neural Networks is usually higher than that of SVM models, but large amounts of data are needed for calculation using Neural Networks. However, for the sleep stage classification task with single-channel EEG, we considered a more comprehensive feature system including the time-domain, energy-domain, frequency-domain, and nonlinear-dynamics-domain features to obtain better results in this work. Moreover, for N1 stage classification, we not only chose the special frequency features but also increased the number of N1 stages in the training set to acquire the model. From the application perspective, the cascaded Support Vector Machine method applied in our study can be applied for the EEG signal analysis for various applications. Based on the results presented above, it was demonstrated that our method can classify the sleep stage in a short time, which shows the potential for long-term sleep stage monitoring.

3.3. Influence by Number of Input Features on Classification Accuracy

Based on our preliminary calculation results, with the number of classifier inputs increasing, the accuracy of classifiers would increase very quickly at first. However, after a certain number of input parameters, there was little change in accuracy. To investigate how the number of input features affects classification accuracy, we took the first 26 to 36 features from the permuted rank of 51 features as the input features for every SVM model to calculate the mean accuracy of the total task, as in Figure 7a. Moreover, the accuracy of the N1 stage was also calculated, as shown in Figure 7b. As in Figure 7a, the accuracy of SVMI increased slightly when the number of input features was larger than 32, while a similar phenomenon was found for SVMII when the number of input features was larger than 30. Based on these calculation results, we chose 32 and 30 as the number of input features for calculation.

Figure 7. The average accuracy of five sleep stages’ classifications (a) and accuracy of sleep stage N1 (b), with the different number of input features.

We found that the most suitable number of input features can be chosen without affecting the classification accuracy.

3.4. Effectiveness Analysis of Different Nonlinear Dynamics Features

In this study, fuzzy entropy, LZC, sample entropy, and multi-scale entropy needed more time to be calculated than other features, whereas they contributed conspicuously to the improvement of accuracy. A fast calculation model needs to evaluate the computing time, average accuracy value of five sleep stages, and the accuracy value of stage N1 simultaneously to compare them. The technique for order preference by similarity to an ideal solution (TOPSIS) method was applied to give a comprehensive evaluation of every nonlinear dynamic feature. With TOPSIS test results, suitable nonlinear dynamics features were selected.

We set $Time_{j}$, $Acc_{j}$, and $AccN1_{j}$ to represent the computing time, the average accuracy value, and the accuracy value of N1 for the j-th nonlinear dynamics feature before normalization. The normalization was defined as in Equation (25):

$$\begin{cases} V_{j1} = \dfrac{\max Time - Time_{j}}{\max Time - \min Time} \\ V_{j2} = \dfrac{Acc_{j} - \min Acc}{\max Acc - \min Acc} \\ V_{j3} = \dfrac{AccN1_{j} - \min AccN1}{\max AccN1 - \min AccN1} \end{cases} \quad j = 1, 2, 3, 4, \tag{25}$$

where $V_{j1}, V_{j2}, V_{j3}$ were the normalized computing time, average accuracy for five sleep stages, and accuracy of N1, respectively.

Then, we set the positive and negative ideal values for each parameter: $V_{i}^{+}, V_{i}^{-}$, $i = 1, 2, 3$. The distances to the positive and negative ideal solutions were computed using Equations (26) and (27):

$$D_{j}^{+} = \sqrt{\sum_{i = 1}^{3} \alpha_{i} (V_{ji} - V_{i}^{+})^{2}}, \tag{26}$$

$$D_{j}^{-} = \sqrt{\sum_{i = 1}^{3} \alpha_{i} (V_{ji} - V_{i}^{-})^{2}}, \tag{27}$$

where $D_{j}^{+}$ and $D_{j}^{-}$ represented the distances to the positive and negative ideal solutions, respectively, and $\alpha_{1}, \alpha_{2}, \alpha_{3}$ were the weights of the three parameters. Considering the equal importance of computing time and accuracy value, we took $\alpha_{1} = \alpha_{2} = \alpha_{3} = 1/3$.

After that, the comprehensive score for the j-th feature was defined as in [41] with Equation (28):

$$score_{j} = \frac{D_{j}^{-}}{D_{j}^{+} + D_{j}^{-}}, \quad j = 1, 2, 3, 4. \tag{28}$$
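
The scoring of Equations (25) to (28) can be sketched end to end. The sketch assumes that the positive and negative ideal values are the column maximum and minimum of the normalized criteria, which the text does not state explicitly, and it uses made-up numbers for four features rather than the values of Table 5.

```python
import math


def min_max(values, smaller_is_better=False):
    """Min-max normalization of one criterion, as in Equation (25)."""
    lo, hi = min(values), max(values)
    if smaller_is_better:
        return [(hi - v) / (hi - lo) for v in values]
    return [(v - lo) / (hi - lo) for v in values]


def topsis_scores(columns, weights):
    """Closeness scores from weighted distances to the two ideal solutions."""
    ideal_pos = [max(c) for c in columns]    # assumed positive ideal values
    ideal_neg = [min(c) for c in columns]    # assumed negative ideal values
    scores = []
    for j in range(len(columns[0])):
        d_pos = math.sqrt(sum(w * (c[j] - p) ** 2
                              for w, c, p in zip(weights, columns, ideal_pos)))
        d_neg = math.sqrt(sum(w * (c[j] - q) ** 2
                              for w, c, q in zip(weights, columns, ideal_neg)))
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Made-up values for four features (not the values in Table 5).
time_s = [0.9, 0.3, 0.5, 0.1]
acc = [0.88, 0.86, 0.87, 0.85]
acc_n1 = [0.55, 0.45, 0.50, 0.40]
cols = [min_max(time_s, smaller_is_better=True), min_max(acc), min_max(acc_n1)]
scores = topsis_scores(cols, [1 / 3, 1 / 3, 1 / 3])
```

In this toy case the slowest feature still scores highest because it is best on both accuracy criteria, which together carry two thirds of the weight.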

All the scores are shown in Table 5. We found that different kinds of nonlinear dynamics features had different influences on the average accuracy results. The multi-scale entropy feature had the most significant impact on the accuracy results, even though its computing time was the longest. In a real application, to balance the accuracy against the computing time, only the most significant features can be chosen and the other features omitted. In this study, multi-scale entropy was the most suitable feature to select. EEG signals from a single channel are difficult to describe precisely; multi-scale entropy described the signal complexity, which was efficient for sleep stage classification.

Table 5. The evaluation of nonlinear dynamics features.

Through the above-mentioned process, the overall performances of the model before and after the selection of nonlinear dynamics features were obtained and are shown in Table 6.

Table 6. The performance of different models.

With all these nonlinear features included, the average computing time for each epoch was longer than 2.5 s. After nonlinear dynamic features selection, the computing time was 1.65 s, which was much shorter with the accuracy maintained. Nonlinear dynamic features have a great influence on the performance of the sleep stage classification model.

4. Conclusions

In this work, we proposed a fast and practically applicable sleep stage classification method that uses energy, time, frequency, and nonlinear dynamics features of EEG signals from the Fpz-Cz channel with a cascaded Support Vector Machine. Compared with the traditional single SVM model, the average accuracy of the N1 stage was enhanced from 41.5% to 55.65%. Moreover, different kinds of nonlinear dynamics features were included in the model, and it was revealed that different nonlinear dynamics features had different effects on the performance of the model. Multi-scale entropy was the most significant feature. To achieve a balance between accuracy and computing time, we selected nonlinear dynamics features based on overall performance parameters. Finally, the proposed method was shown to be fast, with a classification time of less than 1.7 s for each epoch, and practically applicable, with a high accuracy of 88.11%. Based on the proposed method, further investigations could be conducted with multi-channel EEG signal analysis for other applications. Moreover, signal preprocessing and feature extraction methods could be explored to further decrease the time taken for classification and increase the generalization performance. For real applications in long-term sleep monitoring and medical diagnosis, the robustness of this method can be further enhanced with more samples for every user.

5. Patents

The patent resulting from the work reported in this manuscript is CN202111182428.9.

Author Contributions

Conceptualization, D.L., Y.S. and Q.L.; methodology, F.Z.; software, Y.R.; validation, D.L., F.Z. and Y.R.; formal analysis, F.Z.; investigation, Y.S.; resources, Y.S.; data curation, D.L.; writing—original draft preparation, F.Z. and Y.R.; writing—review and editing, D.L. and Y.S.; visualization, Y.S.; supervision, D.L. and Q.L.; project administration, D.L. and Q.L.; funding acquisition, Q.L., D.L. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Joint Funds of the National Natural Science Foundation of China (No. U20A2019), the Natural Science Foundation of Zhejiang Province under Grant No. LQ21E060006, the China Academy of Space Technology (“Experiments for Space Exploration Program and the Qian Xuesen Laboratory”, No. TKTSPY-2020-06-01), and the Zhejiang International Studies University’s key project “A comparative study of Chinese and English art education from the perspective of nationalization” (No. 090500112016).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in Physionet: https://physionet.org/content/sleep-edfx/1.0.0/sleep-cassette/ (accessed on 11 November 2020).

Acknowledgments

We would like to thank Long Li from the Qian Xuesen Laboratory of Space Technology and Chi Cheng from Tsinghua University for their discussions and support.

Conflicts of Interest

The authors declare no competing interests.

Appendix A

Appendix A.1. Effective Time-Domain Features of the EEG Signals

Table A1. Functions of time-domain features *¹.

Time-Domain Features	Functions
Mean	$\bar{x} = \frac{1}{3000} \sum_{i = 1}^{3000} x_{i}$
Median	$M = \frac{x_{1500} + x_{1501}}{2}$
Variance	$\mathrm{var}(x) = \frac{1}{3000} \sum_{i = 1}^{3000} {(x_{i} - \bar{x})}^{2}$
Skewness	$\mathrm{skew}(x) = \frac{\frac{1}{3000} \sum_{i = 1}^{3000} {(x_{i} - \bar{x})}^{3}}{\mathrm{var}(x)^{3 / 2}}$
Kurtosis	$\mathrm{kurt}(x) = \frac{\frac{1}{3000} \sum_{i = 1}^{3000} {(x_{i} - \bar{x})}^{4}}{\mathrm{var}(x)^{2}}$
Hjorth mobility	$HM(x) = \sqrt{\frac{\mathrm{var}(dx / dt)}{\mathrm{var}(x)}}$
Peak-to-peak amplitude	$x_{ppk} = \max(x) - \min(x)$
Hjorth complexity	$HC(x) = \frac{HM(dx / dt)}{HM(x)}$
Average rectified value	$ARV(x) = \frac{1}{3000} \sum_{i = 1}^{3000} | x_{i} |$
Root mean square	$RMS(x) = \sqrt{\frac{1}{3000} \sum_{i = 1}^{3000} | x_{i} |^{2}}$

*¹ In this table, $x_{i}$ and $\bar{x}$ were the value of the $i$th sample and the mean value of an epoch of the denoised EEG signals, respectively; for the median, the samples are taken in ascending order.
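The functions in Table A1 can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this study: the function name `time_domain_features` is hypothetical, the derivative $dx/dt$ is approximated by the first difference between neighboring samples, and the 10 Hz test sine sampled at 100 Hz is an assumed input chosen only because its statistics are known in closed form.

```python
import numpy as np

def time_domain_features(x):
    """Compute the Table A1 features for one epoch (1-D array)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    var = np.mean((x - mean) ** 2)           # 1/N variance, as in Table A1
    dx = np.diff(x)                          # assumption: dx/dt ~ first difference
    ddx = np.diff(dx)
    mobility = np.sqrt(np.var(dx) / var)
    mobility_dx = np.sqrt(np.var(ddx) / np.var(dx))
    return {
        "mean": mean,
        "median": np.median(x),              # sorts the samples internally
        "variance": var,
        "skewness": np.mean((x - mean) ** 3) / var ** 1.5,
        "kurtosis": np.mean((x - mean) ** 4) / var ** 2,
        "hjorth_mobility": mobility,
        "peak_to_peak": x.max() - x.min(),
        "hjorth_complexity": mobility_dx / mobility,
        "arv": np.mean(np.abs(x)),
        "rms": np.sqrt(np.mean(x ** 2)),
    }

# Assumed test input: a 10 Hz sine, 3000 samples at 100 Hz (one 30 s epoch).
t = np.arange(3000) / 100.0
feats_demo = time_domain_features(np.sin(2 * np.pi * 10 * t))
for name, value in feats_demo.items():
    print(f"{name}: {value:.4f}")
```

For a pure sine the variance is 0.5, the kurtosis is 1.5, and the Hjorth complexity is close to 1, which gives a quick sanity check of the implementation. A constant signal would lead to a division by zero and is not handled here.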

Appendix A.2. Nonlinear-Dynamics-Domain Features of the EEG Signals

(1): Spectral entropy (SE)

Spectral entropy (SE) is a nonlinear measure that summarizes the irregularity of signal power over the measured frequencies and thereby describes the complexity of a system [45]. It is widely used to analyze electrophysiological signals [46,47]. It can be defined as follows:

SE = \sum_{k} p_{k} \log \left( \frac{1}{p_{k}} \right)

(A1)

where $p_{k}$ is the $k$th power spectral density of the EEG signals in an epoch. The spectral entropies of the six characteristic waves ($SE_{α}, SE_{β}, SE_{δ}, SE_{\text{saw-tooth}}, SE_{θ}, SE_{\text{spindle}}$) were calculated in this study.
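Equation (A1) can be illustrated with a short Python sketch. Several choices here are assumptions that the text does not specify: the power spectral density is estimated with a plain FFT periodogram, $p_{k}$ is normalized so that the values sum to 1, the natural logarithm is used, and the sampling rate of 100 Hz and the two test signals are hypothetical.

```python
import numpy as np

def spectral_entropy(x):
    """Shannon entropy of the normalized power spectrum of one epoch."""
    x = np.asarray(x, dtype=float)
    psd = np.abs(np.fft.rfft(x)) ** 2       # periodogram estimate (assumption)
    p = psd / psd.sum()                     # normalize so that sum(p_k) = 1
    p = p[p > 0]                            # terms with p_k = 0 contribute 0
    return float(np.sum(p * np.log(1.0 / p)))

fs = 100                                    # assumed sampling rate in Hz
t = np.arange(3000) / fs                    # one epoch of 3000 samples
rng = np.random.default_rng(0)
se_sine = spectral_entropy(np.sin(2 * np.pi * 10 * t))
se_noise = spectral_entropy(rng.standard_normal(3000))
print(se_sine, se_noise)
```

The sine concentrates its power in a single frequency bin, so its spectral entropy is close to zero, whereas white noise spreads power over all bins and gives a much larger value. To obtain a band-wise value such as $SE_{α}$, the same function could be applied to the corresponding characteristic wave.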

(2): Sample entropy

Sample entropy measures the complexity of a time series through the probability that new patterns are generated in the signal: the higher this probability, the higher the complexity of the signal [48]. It is also suitable for analyzing EEG signals with low signal noise [49]. To calculate the sample entropy of the EEG signals in an epoch, we suppose that the number of samples is $N$ and that the discrete time sequence is $A^{n} = \{a_{1}, a_{2}, \dots, a_{n}\}$ with $n = N$, and define the $m$-dimensional vectors in $A^{n}$ [50] as follows:

A_{m} (i) = \{a_{i}, a_{i + 1}, \dots, a_{i + m - 1}\}, \quad i = 1, \dots, n - m

(A2)

In this article, $n = 3000$. The distance between $A_{m} (i)$ and $A_{m} (j)$ can be defined as follows:

d_{i j}^{m} = d [A_{m} (i), A_{m} (j)] = \max_{k = 0, \dots, m - 1} (| a_{i + k} - a_{j + k} |)

(A3)

The next step is to obtain the probability of distance smaller than the similarity tolerance r as the following:

B^{m} (r) = \frac{\sum_{i = 1}^{n - m} B_{i}}{N - m - 1}

(A4)

where $B_{i}$ is the number of distances $d_{i j}^{m}$ that satisfy $d_{i j}^{m} < r$. The tolerance $r$ is effective when it satisfies $r \in [0.1 SD, 0.25 SD]$, where $SD$ is the standard deviation of the sequence $A^{n}$. We took $m = 2$ and $r = 0.2 SD$. Then, we increased the dimension from $m$ to $m + 1$ and repeated the above steps to obtain the sample entropy value:

SpEn (m, r) = \lim_{n \to \infty} \left\{ - \ln \left[ \frac{B^{m + 1} (r)}{B^{m} (r)} \right] \right\}

(A5)
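The steps in Equations (A2)–(A5) can be sketched as follows. This is a minimal, unoptimized illustration and not the implementation used in this study: the function name `sample_entropy` is hypothetical, self-matches are excluded, the same $n - m$ template vectors are used for both dimensions so that the normalization constants cancel in the ratio, and the limit $n \to \infty$ is replaced by the finite-length estimate. The test signals use 500 samples instead of 3000 only to keep the example fast.

```python
import numpy as np

def sample_entropy(a, m=2, r_factor=0.2):
    """Finite-length sample entropy estimate with tolerance r = r_factor * SD."""
    a = np.asarray(a, dtype=float)
    n = a.size
    r = r_factor * a.std()

    def count_matches(dim):
        # Template vectors A_dim(i), i = 1..n-m (same count for m and m + 1).
        templates = np.array([a[i:i + dim] for i in range(n - m)])
        count = 0
        for i in range(len(templates)):
            # Maximum absolute difference between templates, Equation (A3).
            d = np.max(np.abs(templates - templates[i]), axis=1)
            count += int(np.sum(d < r)) - 1   # subtract the self-match
        return count

    matches_m = count_matches(m)
    matches_m1 = count_matches(m + 1)
    return float(-np.log(matches_m1 / matches_m))

t = np.arange(500) / 100.0                    # assumed 100 Hz sampling rate
rng = np.random.default_rng(0)
sampen_sine = sample_entropy(np.sin(2 * np.pi * 10 * t))
sampen_noise = sample_entropy(rng.standard_normal(500))
print(sampen_sine, sampen_noise)
```

A perfectly periodic signal generates no new patterns when the dimension increases, so its sample entropy is close to zero, while white noise yields a clearly larger value. If no match is found for dimension $m + 1$, the estimate is infinite; a constant signal ($SD = 0$) is not handled in this sketch.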

(3): Fuzzy entropy

In addition to the sample entropy, the fuzzy entropy was calculated, which also reflects the probability of new patterns being generated in the signals [51]. To obtain this value, we first computed the degree of membership $μ_{i j}^{m}$:

μ_{i j}^{m} = \begin{cases} \exp \left[ - \left( \frac{d_{i j}^{m}}{r} \right)^{2} \ln 2 \right], & d_{i j}^{m} > 0 \\ 1, & d_{i j}^{m} = 0 \end{cases}

(A6)

where $m = 2$, $r = 0.15 SD$, and $d_{i j}^{m}$ is the distance between $A_{m} (i)$ and $A_{m} (j)$. Define $φ^{m} (r) = \frac{1}{n - m + 1} \sum_{i, j}^{n - m + 1} μ_{i j}^{m}$, and the fuzzy entropy is the following:

FzEn (m, r, s) = - \ln \frac{φ^{m + 1} (r)}{φ^{m} (r)}

(A7)
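A minimal Python sketch of the fuzzy entropy computation is given below. It rests on several assumptions that are not taken from this study: the function name `fuzzy_entropy` is hypothetical, the exponential membership is applied to every distance (it equals 1 at zero distance), self-comparisons are excluded, the same $n - m$ template vectors are averaged for both dimensions so that the normalization cancels in the ratio, no baseline is removed from the template vectors, and the result is taken as $- \ln [φ^{m + 1} (r) / φ^{m} (r)]$, the sign convention under which the entropy is non-negative.

```python
import numpy as np

def fuzzy_entropy(a, m=2, r_factor=0.15):
    """Fuzzy entropy estimate with tolerance r = r_factor * SD."""
    a = np.asarray(a, dtype=float)
    n = a.size
    r = r_factor * a.std()

    def phi(dim):
        templates = np.array([a[i:i + dim] for i in range(n - m)])
        k = len(templates)
        total = 0.0
        for i in range(k):
            # Maximum absolute difference between template vectors.
            d = np.max(np.abs(templates - templates[i]), axis=1)
            mu = np.exp(-((d / r) ** 2) * np.log(2))   # degree of membership
            total += mu.sum() - 1.0                    # drop the self-comparison
        return total / (k * (k - 1))                   # mean over pairs i != j

    return float(-np.log(phi(m + 1) / phi(m)))

t = np.arange(500) / 100.0                             # assumed 100 Hz sampling
rng = np.random.default_rng(0)
fuzzen_sine = fuzzy_entropy(np.sin(2 * np.pi * 10 * t))
fuzzen_noise = fuzzy_entropy(rng.standard_normal(500))
print(fuzzen_sine, fuzzen_noise)
```

Compared with the hard threshold $d_{i j}^{m} < r$ of the sample entropy, the membership function weights every pair of vectors smoothly, so the estimate changes gradually with $r$. As with the sample entropy, a regular sine gives a value near zero and white noise a much larger one.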

Appendix A.3. The Feature System Considered in this Study

Table A2. Different features considered in this study.

Time-Domain Features	Energy-Domain Features	Frequency-Domain Features	Nonlinear-Dynamics-Domain Features
1–6: $Std_{α}, Std_{β}, Std_{δ}, Std_{\text{saw-tooth}}, Std_{θ}, Std_{\text{spindle}}$	17–22: $E_{α}, E_{β}, E_{δ}, E_{\text{saw-tooth}}, E_{θ}, E_{\text{spindle}}$	25–30: $P_{α}, P_{β}, P_{δ}, P_{\text{saw-tooth}}, P_{θ}, P_{\text{spindle}}$	34–39: $SE_{α}, SE_{β}, SE_{δ}, SE_{\text{saw-tooth}}, SE_{θ}, SE_{\text{spindle}}$; 40–45: $RE_{α}, RE_{β}, RE_{δ}, RE_{\text{saw-tooth}}, RE_{θ}, RE_{\text{spindle}}$; 46: Spectral entropy
7: Mean	23: $E_{α / θ} = \frac{E_{α}}{E_{θ}}$	31: Mean frequency	47: Rényi entropy
8: Median	24: $E_{δ / θ} = \frac{E_{δ}}{E_{θ}}$	32: $P_{α / θ} = \frac{P_{α}}{P_{θ}}$	48: Sample entropy
9: Variance		33: $P_{δ / θ} = \frac{P_{δ}}{P_{θ}}$	49: Fuzzy entropy
10: Skewness			50: Multi-scale entropy
11: Kurtosis			51: Lempel–Ziv complexity
12: Hjorth mobility
13: Peak-to-peak amplitude
14: Hjorth complexity
15: Average rectified value
16: Root mean square (RMS)

Appendix A.4. The Permuted Rank of Different Features for SVM Classifiers

Table A3 lists the ranks of the features, which are ordered according to the MIQ value from high to low.

Table A3. The ranks of effective features for different classifiers.

Classifier	Feature Numbers in Rank Order
Support Vector Machine I	46→49→11→40→10→24→32→50→20→7→12→33→35→47→8→29→13→51→42→15→25→31→43→26→3→48→14→44→9→37→36→41→23→34→16→27→4→2→38→17→19→28→5→18→1→21→39→45→6→22→30
Support Vector Machine II	20→7→24→2→32→33→17→11→19→39→43→13→49→21→45→50→14→37→40→30→15→31→35→44→4→36→18→25→9→23→28→48→41→34→42→6→38→12→3→26→1→46→5→16→22→27→51→29→47→8→10

The feature numbers in Table A3 correspond to the features listed in Table A2.

Appendix A.5. The Frequency Spectra That Were Used for Computing the Frequency-Domain Features

Figure A1. The frequency spectra of (a) Alpha, (b) Beta, (c) Delta, (d) Sawtooth, (e) Spindle, and (f) Theta, applied for computing frequency domain features.

References

1. Labeix, P.; Berger, M.; Zellag, A.; Garcin, A.; Barthelemy, J.C.; Roche, F.; Hupin, D. Resistance training of inspiratory muscles after coronary artery disease may improve obstructive sleep apnea in outpatient cardiac rehabilitation: RICAOS study. Front. Physiol. 2022, 13, 846532.
2. Vrajova, M.; Slamberova, R.; Hoschl, C.; Ovsepian, S.V. Methamphetamine and sleep impairments: Neurobehavioral correlates and molecular mechanisms. Sleep 2021, 44, zsab001.
3. Yan, C.; Li, P.; Yang, M.; Li, Y.; Li, J.; Zhang, H.; Liu, C. Entropy Analysis of Heart Rate Variability in Different Sleep Stages. Entropy 2022, 24, 379.
4. Mitchell, H.A.; Weinshenker, D. Good night and good luck: Norepinephrine in sleep pharmacology. Biochem. Pharmacol. 2010, 79, 801–809.
5. Sharma, M.; Tiwari, J.; Patel, V.; Acharya, U.R. Automated identification of sleep disorder types using triplet half-band filter and ensemble machine learning techniques with EEG signals. Electronics 2021, 10, 1531.
6. Tripathy, R.K.; Gajbhiye, P.; Acharya, U.R. Automated sleep apnea detection from cardio-pulmonary signal using bivariate fast and adaptive EMD coupled with cross time-frequency analysis. Comput. Biol. Med. 2020, 120, 103769.
7. Hori, T.; Sugita, Y.; Koga, E.; Shirakawa, S.; Inoue, K.; Uchida, S.; Kuwahara, H.; Kousaka, M.; Kobayashi, T.; Tsuji, Y.; et al. Proposed supplements and amendments to ‘A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects’, the Rechtschaffen & Kales (1968) standard. Psychiat. Clin. Neuros. 2001, 55, 305–310.
8. Berry, R.B.; Budhiraja, R.; Gottlieb, D.J.; Gozal, D.; Iber, C.; Kapur, V.K.; Marcus, C.L.; Mehra, R.; Parthasarathy, S.; Quan, S.F.; et al. Rules for scoring respiratory events in sleep: Update of the 2007 AASM Manual for the Scoring of Sleep and Associated Events. Deliberations of the Sleep Apnea Definitions Task Force of the American Academy of Sleep Medicine. J. Clin. Sleep Med. 2012, 8, 597–619.
9. Schulz, H. Phasic or transient? Comment on the terminology of the AASM manual for the scoring of sleep and associated events. J. Clin. Sleep Med. 2007, 3, 752.
10. Arora, N.; Meskill, G.; Guilleminault, C. The role of flow limitation as an important diagnostic tool and clinical finding in mild sleep-disordered breathing. Sleep Sci. 2015, 8, 134–142.
11. Khalighi, S.; Sousa, T.; Pires, G.; Nunes, U. Automatic sleep staging: A computer assisted approach for optimal combination of features and polysomnographic channels. Expert Syst. Appl. 2013, 40, 7046–7059.
12. Zhang, X.; Kou, W.; Chang, E.I.-C.; Gao, H.; Fan, Y.; Xu, Y. Sleep stage classification based on multi-level feature learning and recurrent neural networks via wearable device. Comput. Biol. Med. 2018, 103, 71–81.
13. Nguyen, A.; Alqurashi, R.; Raghebi, Z.; Banaei-Kashani, F.; Halbower, A.C.; Vu, T. LIBS: A Lightweight and Inexpensive In-Ear Sensing System for Automatic Whole-Night Sleep Stage Monitoring. GetMobile Mob. Comp. Comm. 2017, 21, 31–34.
14. Tripathy, R.K.; Ghosh, S.K.; Gajbhiye, P.; Acharya, U.R. Development of automated sleep stage classification system using multivariate projection-based fixed boundary empirical wavelet transform and entropy features extracted from multichannel EEG signals. Entropy 2020, 22, 1141.
15. Ebrahimi, F.; Setarehdan, S.-K.; Nazeran, H. Automatic sleep staging by simultaneous analysis of ECG and respiratory signals in long epochs. Biomed. Signal Process. Control 2015, 18, 69–79.
16. Gomez-Pilar, J.; Gutierrez-Tobal, G.C.; Poza, J.; Fogel, S.; Doyon, J.; Northoff, G.; Hornero, R. Spectral and temporal characterization of sleep spindles-methodological implications. J. Neural Eng. 2021, 18, 036014.
17. Gupta, V.; Pachori, R.B. FBDM based time-frequency representation for sleep stages classification using EEG signals. Biomed. Signal Process. Control 2021, 64, 102265.
18. Peker, M. An efficient sleep scoring system based on EEG signal using complex-valued machine learning algorithms. Neurocomputing 2016, 207, 165–177.
19. Diykh, M.; Li, Y.; Wen, P. EEG Sleep Stages Classification Based on Time Domain Features and Structural Graph Similarity. IEEE Trans. Neural Syst. Rehabil. Eng. 2016, 24, 1159–1168.
20. Čić, M.; Šoda, J.; Bonković, M. Automatic classification of infant sleep based on instantaneous frequencies in a single-channel EEG signal. Comput. Biol. Med. 2013, 43, 2110–2117.
21. Memar, P.; Faradji, F. A Novel Multi-Class EEG-Based Sleep Stage Classification System. IEEE Trans. Neural Syst. Rehabil. Eng. 2018, 26, 84–95.
22. Dimitriadis, S.I.; Salis, C.; Linden, D. A novel, fast and efficient single-sensor automatic sleep-stage classification based on complementary cross-frequency coupling estimates. Clin. Neurophysiol. 2018, 129, 815–828.
23. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610.
24. Zhang, J.M.; Wu, Y. Automatic sleep stage classification of single-channel EEG by using complex-valued convolutional neural network. Biomed. Eng.-Biomed. Tech. 2018, 63, 177–190.
25. Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to forget: Continual prediction with LSTM. Neural Comput. 2000, 12, 2451–2471.
26. Sors, A.; Bonnet, S.; Mirek, S.; Vercueil, L.; Payen, J.-F. A convolutional neural network for sleep stage scoring from raw single-channel EEG. Biomed. Signal Process. Control 2018, 42, 107–114.
27. Hsu, Y.L.; Yang, Y.T.; Wang, J.S.; Hsu, C.Y. Automatic sleep stage recurrent neural classifier using energy features of EEG signals. Neurocomputing 2013, 104, 105–114.
28. Goldberger, A.L.; Amaral, L.A.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 2000, 101, E215–E220.
29. Kemp, B.; Zwinderman, A.H.; Tuk, B.; Kamphuisen, H.A.; Oberyé, J.J. Analysis of a sleep-dependent neuronal feedback loop: The slow-wave microcontinuity of the EEG. IEEE Trans. Bio-Med. Eng. 2000, 47, 1185–1194.
30. Mousavi, S.; Afghah, F.; Acharya, U.R. SleepEEGNet: Automated sleep stage scoring with sequence to sequence deep learning approach. PLoS ONE 2019, 14, e0216456.
31. van Sweden, B.; Kemp, B.; Kamphuisen, H.A.; van der Velde, E.A. Alternative electrode placement in (automatic) sleep scoring (Fpz-Cz/Pz-Oz versus C4-A1). Sleep 1990, 13, 279–283.
32. Park, H.J.; Oh, J.S.; Jeong, D.U.; Park, K.S. Automated sleep stage scoring using hybrid rule- and case-based reasoning. Comput. Biomed. Res. 2000, 33, 330–349.
33. Percival, D.B.; Walden, A.T. Wavelet Methods for Time Series Analysis; Cambridge University Press: Cambridge, UK, 2000.
34. da Silveira, T.L.T.; Kozakevicius, A.J.; Rodrigues, C.R. Automated drowsiness detection through wavelet packet analysis of a single EEG channel. Expert Syst. Appl. 2016, 55, 559–565.
35. Tanaka, H.; Hayashi, M.; Hori, T. Topographical characteristics and principal component structure of the hypnagogic EEG. Sleep 1997, 20, 523–534.
36. Oliva, J.T.; Rosa, J.L.G. Classification for EEG report generation and epilepsy detection. Neurocomputing 2019, 335, 81–95.
37. Yin, Y.; Sun, K.; He, S. Multiscale permutation Rényi entropy and its application for EEG signals. PLoS ONE 2018, 13, e0202558.
38. Ding, C.; Peng, H. Minimum redundancy feature selection from microarray gene expression data. J. Bioinform. Comput. Biol. 2005, 3, 185–205.
39. Rosenberg, R.S.; van Hout, S. The American Academy of Sleep Medicine inter-scorer reliability program: Sleep stage scoring. J. Clin. Sleep Med. 2013, 9, 81–87.
40. Horn, D.; Demircioğlu, A.; Bischl, B.; Glasmachers, T.; Weihs, C. A comparative study on large scale kernelized support vector machines. Adv. Data Anal. Classif. 2018, 12, 867–883.
41. Tzeng, G.-H.; Huang, J.-J. Multiple Attribute Decision Making: Methods and Applications; CRC Press: Boca Raton, FL, USA, 2011; p. 335.
42. Seifpour, S.; Niknazar, H.; Mikaeili, M.; Nasrabadi, A.M. A new automatic sleep staging system based on statistical behavior of local extrema using single channel EEG signal. Expert Syst. Appl. 2018, 104, 277–293.
43. Hassan, A.R.; Bashar, S.K.; Bhuiyan, M.I.H. On the classification of sleep states by means of statistical and spectral features from single channel Electroencephalogram. In Proceedings of the 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Kochi, India, 10–13 August 2015; IEEE: Kochi, India, 2015; pp. 2238–2243.
44. Sharma, M.; Goyal, D.; Achuth, P.V.; Acharya, U.R. An accurate sleep stages classification system using a new class of optimally time-frequency localized three-band wavelet filter bank. Comput. Biol. Med. 2018, 98, 58–75.
45. Dai, Y.; Zhang, H.; Mao, X.; Shang, P. Complexity–entropy causality plane based on power spectral entropy for complex time series. Phys. A Stat. Mech. Its Appl. 2018, 509, 501–514.
46. Helakari, H.; Kananen, J.; Huotari, N.; Raitamaa, L.; Tuovinen, T.; Borchardt, V.; Rasila, A.; Raatikainen, V.; Starck, T.; Hautaniemi, T.; et al. Spectral entropy indicates electrophysiological and hemodynamic changes in drug-resistant epilepsy—A multimodal MREG study. NeuroImage Clin. 2019, 22, 101763.
47. Zhang, A.; Yang, B.; Huang, L. Feature Extraction of EEG Signals Using Power Spectral Entropy. In Proceedings of the 2008 International Conference on BioMedical Engineering and Informatics, Sanya, China, 27–30 May 2008; pp. 435–439.
48. Jiayi, G.; Peng, Z.; Xin, Z.; Mingshi, W. Sample Entropy Analysis of Sleep EEG under Different Stages. In Proceedings of the 2007 IEEE/ICME International Conference on Complex Medical Engineering, Beijing, China, 23–27 May 2007; IEEE: Beijing, China, 2007; pp. 1499–1502.
49. Huang, J.-R.; Fan, S.-Z.; Abbod, M.; Jen, K.-K.; Wu, J.-F.; Shieh, J.-S. Application of Multivariate Empirical Mode Decomposition and Sample Entropy in EEG Signals via Artificial Neural Networks for Interpreting Depth of Anesthesia. Entropy 2013, 15, 3325–3339.
50. Pincus, S.M. Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. USA 1991, 88, 2297–2301.
51. Nakamura, T.; Adjei, T.; Alqurashi, Y.; Looney, D.; Morrell, M.J.; Mandic, D.P. Complexity science for sleep stage classification from EEG. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 4387–4394.

Figure 1. The sleep stage classification processing flow adopted in this paper.

Figure 2. The wavelet packet tree applied for signal processing.

Figure 3. The nonlinear analysis computation process of Lempel–Ziv complexity (LZC).

Figure 4. The computation process of mRMR to select the effective features.

Figure 5. The classification performance was evaluated with the test set of SC4001.

Figure 6. The confusion matrix of trained model was calculated with different stages for test set of SC4001.

Figure 7. The average accuracy of five sleep stages’ classifications (a) and accuracy of sleep stage N1 (b), with the different number of input features.

Table 1. Five sleep stages and their corresponding characteristic frequency.

Sleep Stage	Characteristic Waves
W	Alpha (8–13 Hz) and beta (12–30 Hz)
N1	Theta (4–8 Hz)
N2	Spindle (12–14 Hz) and K complex (1 Hz)
N3	Delta (0.5–2 Hz)
R	Alpha (8–13 Hz), beta (12–30 Hz), theta (4–8 Hz), and sawtooth wave (2–6 Hz)

Table 2. The sample size distribution in different sleep stages.

Sleep Stage	Number of Epochs in Stages	Proportion
W	2201	26.97%
N1	579	7.10%
N2	3221	39.47%
N3	900	11.03%
R	1259	15.43%

Table 3. The run time of training and testing of the Support Vector Machine for each epoch.

	Cascaded SVM	SVM
Computing time for each epoch (s)	1.65	1.57
Average accuracy	88.11% ± 0.67%	86.45% ± 0.71%
The average accuracy of N1	55.65% ± 3.13%	41.5% ± 1.72%

Table 4. The accuracy of the classification for five sleep stages using single-channel methods.

Reference	Classifier	Accuracy	Accuracy of N1
[41]	OCRNN	82.40%	33.39%
[18]	LSTM RNN	86.74%	61.09%
[42]	Elman RNN	87.20%	36.70%
[43]	Bagging	86.53%	27.48%
[26]	CNN	86.79%	34.92%
[44]	Multi-class SVM	83.92%	17.39%
This work	Cascaded Support Vector Machine	88.11%	55.65%

Table 5. The evaluation of nonlinear dynamics features.

Features	Computing Time (s)	Accuracy	Accuracy of N1	Score
Fuzzy entropy	2.04 ± 0.012	0.8560 ± 0.0133	0.4685 ± 0.0638	0.6136
LZC	2.03 ± 0.015	0.8586 ± 0.0143	0.4550 ± 0.0647	0.5399
Sample entropy	2.04 ± 0.021	0.8589 ± 0.0096	0.4750 ± 0.0740	0.3808
Multi-scale entropy	2.02 ± 0.014	0.8651 ± 0.0152	0.4524 ± 0.0391	0.5858

Table 6. The performance of different models.

Performance Parameter	Before Nonlinear Features Selection	After Nonlinear Feature Selection	Total Features
Computing time for each epoch (s)	2.08	1.65	2.65
Average accuracy	74.54% ± 0.82%	88.11% ± 0.67%	74.36% ± 1.93%
Average accuracy of N1	26.58% ± 1.76%	55.65% ± 3.13%	26.97% ± 0.75%

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
