1. Introduction
Classification of sleep stages is essential for monitoring sleep quality and diagnosing related sleep disorders [1]. The American Academy of Sleep Medicine (AASM) divides sleep into five stages, namely wakefulness (W), rapid eye movement (REM), and non-REM sleep (subdivided into N1, N2, and N3) [2]. Experts usually classify sleep stages from polysomnography (PSG), which includes the electroencephalogram (EEG), electrooculogram (EOG), electromyogram (EMG), and electrocardiogram (ECG) [3]. Manual scoring of PSG is time-consuming and labor-intensive. In addition, PSG is difficult to record, which makes it unsuitable for routine home use [4]. Single-channel EEG signals are easy to collect, convenient for home use, and have received increasing attention in sleep stage classification [5]. Therefore, an automatic sleep stage classification method based on single-channel sleep EEG is of considerable practical value.
Feature extraction and classification of single-channel EEG are the key to automatic sleep stage classification [6,7,8]. Venkat and colleagues [9] used wavelet packet decomposition to extract five sub-bands from the EEG and then extracted Hjorth parameters and inter-band feature ratios from the sub-bands; finally, K-nearest neighbor (KNN) and support vector machine (SVM) classifiers were used to classify the sleep stages. Liu and colleagues [10] used ensemble empirical mode decomposition (EEMD) to decompose single-channel sleep EEG and then extracted time domain and nonlinear features from each intrinsic mode function (IMF). Jiang and colleagues [11] used empirical mode decomposition (EMD) to decompose single-channel EEG, extracted multiple time domain, frequency domain, and nonlinear features from the first seven IMFs, and finally used a random forest (RF) to classify the sleep stages automatically. These studies used signal decomposition for feature extraction, which allows the EEG information to be exploited more fully. However, they did not compare different signal decomposition methods.
In recent years, a growing number of studies have shown that deep learning performs well in sleep EEG classification [5,12]. Sharma and colleagues [13] used a wavelet scattering network to extract features from single-channel EEG and a weighted K-nearest neighbor algorithm (WKNN) to classify sleep stages. Phan and colleagues [14] proposed a joint classification and prediction framework based on convolutional neural networks (CNN) and adopted a one-to-many classification strategy to classify sleep EEG automatically. Heng and colleagues [15] pointed out that a CNN can extract the time–frequency features of EEG signals, while a gated recurrent unit (GRU) can learn the transition rules of sleep stages; they built an end-to-end network based on CNN and GRU for single-channel sleep EEG classification. Given the latent regularity of sleep stage transitions, some studies have begun to classify single-channel sleep EEG with networks capable of learning temporal information, including long short-term memory (LSTM) networks and transformers [16]. LSTM captures the temporal information of signals and excels at processing sequential data with long-term dependencies [16]. The transformer learns contextual information, handles short sequences well, and allows parallel computation [1]. Nevertheless, most existing studies use features from a single time step as model input, which limits the models' ability to learn the transition rules between sleep stages [17,18].
Methods based on temporal networks can learn the sleep transition rules during classification and thereby improve accuracy. However, because classifier performance is limited, some misclassification is inevitable, and some misclassified sleep states can be corrected using the preceding sleep states. Ghimatgar and colleagues [19] proposed a single-channel sleep EEG classification method based on a random forest (RF) and a hidden Markov model (HMM): the RF first classifies the sleep EEG, and the HMM then learns the sleep transition rules. Their experimental results validated the effectiveness of this approach. Networks such as LSTM and the transformer [1,18] learn sleep stage transition rules during classification, whereas an HMM [11] learns these rules after classification. Consequently, this paper combines a temporal network with an HMM to learn the sleep transition rules from both perspectives, which has the potential to improve the classification accuracy of single-channel sleep EEG.
This paper proposed a single-channel sleep EEG classification method based on LSTM and HMM (LSTM-HMM). The contributions of this paper are as follows.
(1) We compared the performance of EMD, variational mode decomposition (VMD), singular spectrum analysis (SSA), and the wavelet transform (WT) in decomposing single-channel sleep EEG. Further, we analyzed the performance of twenty wavelet functions, providing a reference for other researchers working on EEG decomposition and feature extraction;
(2) The proposed method considers the temporal structure of sleep stage transitions from two perspectives. First, multi-step time features and an LSTM are used to learn the sleep transition rules during classification. Second, after classification, an HMM is used to find the most likely sleep state sequence and automatically correct the results;
(3) The proposed method was thoroughly validated on the Sleep-EDFx dataset. The results show that WT can extract deep information from EEG and that the proposed method, by exploiting sleep stage transition rules, achieves high-precision sleep EEG classification and outperforms most existing methods.
2. Method
The single-channel EEG classification method based on LSTM-HMM proposed in this paper is shown in Figure 1. The method consists of four stages: EEG segmentation, EEG feature extraction, EEG classification, and HMM-based correction. First, the whole-night sleep EEG is segmented into 30 s epochs. Then, the EEG is decomposed by WT, and time domain, frequency domain, and nonlinear features are extracted. After that, the multi-step time features are input into the LSTM network to classify the sleep EEG. Finally, the predicted sleep state sequence for the whole night is input into the HMM, and the hidden state sequence with the highest probability is taken as the final prediction.
2.1. EEG Decomposition Based on Wavelet Transform
The spatial resolution of single-channel EEG is low, and little information can be obtained from it directly. Before feature extraction, this paper therefore decomposes the single-channel EEG signal to extract more hidden information. WT is a time–frequency decomposition technique that decomposes signals using scaled and shifted versions of a wavelet function [20]. Compared with EMD, VMD, and SSA, WT provides better time–frequency localization [21]. In addition, the most useful information in sleep EEG is concentrated in low-frequency components below 30 Hz, and WT has significant advantages in extracting accurate low-frequency information [22]. This paper uses the discrete wavelet transform (DWT) to decompose the single-channel EEG signal. The DWT is defined as follows:
$$DWT(j, n) = 2^{-j/2} \int x(t)\, \psi\!\left(2^{-j} t - n\right) dt$$
where $x(t)$ represents the signal and $\psi(t)$ represents the wavelet function; $j$ and $n$ represent the scale and displacement parameters, respectively. Decomposing the single-channel EEG $x$ with a $k$-level DWT yields $k + 1$ sub-bands:
$$x = A_k + D_k + D_{k-1} + \cdots + D_1$$
where $A_k$ and $D_i$ ($i = 1, \ldots, k$) represent the low-frequency (approximation) and high-frequency (detail) components of the EEG, respectively. The EEG is usually divided into five rhythms, the lowest of which is the delta rhythm (0–4 Hz). To extract as much hidden information as possible without the extra computational load of unnecessary decomposition levels, the decomposition stops once the low-frequency component $A_k$ lies within the delta band. The EEG in the Sleep-EDFx dataset used in this paper is sampled at 100 Hz; since each level halves the approximation band (0–25, 0–12.5, 0–6.25, 0–3.125 Hz), a four-level DWT is used to decompose the EEG. The distribution of the sub-bands is shown in Figure 2.
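The choice of four levels can be checked with a few lines of Python. The sketch below assumes ideal dyadic splitting (each level halves the remaining approximation band), which real wavelet filters only approximate because neighboring bands overlap; the function name `dwt_subbands` is hypothetical.

```python
def dwt_subbands(fs, delta_upper=4.0):
    """Smallest DWT depth whose approximation band lies inside delta, plus the
    nominal (ideal dyadic) frequency range of every sub-band."""
    high = fs / 2.0            # Nyquist frequency
    level = 0
    bands = []
    while high > delta_upper:  # approximation still reaches above delta
        level += 1
        low = high / 2.0
        bands.append((f"D{level}", low, high))  # detail band of this level
        high = low
    bands.append((f"A{level}", 0.0, high))      # final approximation band
    return level, bands


level, bands = dwt_subbands(100)
for name, lo, hi in bands:
    print(f"{name}: {lo:g}-{hi:g} Hz")
```

For 100 Hz data this yields D1 (25–50 Hz), D2 (12.5–25 Hz), D3 (6.25–12.5 Hz), D4 (3.125–6.25 Hz), and A4 (0–3.125 Hz).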
Different wavelet functions (WFs) affect the decomposition results of EEG [21,23]. This paper analyzes the performance of 20 different wavelet functions in EEG decomposition, which can provide a reference for other EEG-related research. Details of the WFs are given in Table 1.
2.2. Time Domain, Frequency Domain, and Nonlinear Feature Extraction
The single-channel EEG signal is decomposed into five sub-bands, as shown in Figure 2. Multiple time domain, frequency domain, and nonlinear features are then extracted from each of the five sub-bands to exploit the EEG information fully. The EEG features extracted in this paper are listed in Table 2. For an EEG signal $x = \{x_1, x_2, \ldots, x_N\}$ of length $N$, the features are calculated as follows.
2.2.1. Time Domain Features
Time domain features describe the characteristics of signals in the time domain [9]. The time domain features extracted in this paper comprise statistical parameters and Hjorth parameters. The statistical parameters include the absolute mean value (MA), standard deviation (Std), skewness (Ske), and kurtosis (Kur) of the EEG [10].
Hjorth parameters characterize a signal in the time domain from three aspects, namely activity (HA), mobility (HM), and complexity (HC).
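The statistical and Hjorth parameters can be computed as in the following NumPy sketch. It uses textbook definitions rather than formulas reproduced from this paper; the population standard deviation, the non-excess kurtosis (about 3 for a Gaussian), first differences as a stand-in for derivatives, and the function name `time_domain_features` are all assumptions.

```python
import numpy as np


def time_domain_features(x):
    """Statistical and Hjorth parameters of one 1-D sub-band signal (sketch)."""
    x = np.asarray(x, dtype=float)
    mu, std = x.mean(), x.std()           # population standard deviation
    z = (x - mu) / std
    dx = np.diff(x)                       # first difference ~ first derivative
    ddx = np.diff(dx)                     # second difference ~ second derivative
    hm = np.sqrt(np.var(dx) / np.var(x))  # Hjorth mobility
    return {
        "MA": np.mean(np.abs(x)),         # absolute mean value
        "Std": std,
        "Ske": np.mean(z ** 3),           # skewness
        "Kur": np.mean(z ** 4),           # kurtosis (non-excess, Gaussian ~ 3)
        "HA": np.var(x),                  # Hjorth activity
        "HM": hm,
        "HC": np.sqrt(np.var(ddx) / np.var(dx)) / hm,  # Hjorth complexity
    }


# Usage: a 5 Hz sine sampled at 100 Hz for one 30 s epoch
t = np.arange(3000) / 100.0
feats = time_domain_features(np.sin(2 * np.pi * 5 * t))
```

For a pure sinusoid the Hjorth complexity is close to 1, which is a quick sanity check on the implementation.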
2.2.2. Frequency Domain Features
Frequency domain analysis requires converting the EEG from the time domain to the frequency domain, which this paper does with the fast Fourier transform (FFT) [24]. The statistical parameters of the signal in the frequency domain are then extracted, namely the mean, standard deviation, skewness, kurtosis, and mean square value. In addition, the power spectral density (PSD) of the signal is calculated, and the average power spectral density (Mpsd) and power (P) are extracted from the PSD [16]:
$$Mpsd = \frac{1}{f_2 - f_1}\int_{f_1}^{f_2} PSD(f)\,df, \qquad P = \int_{f_1}^{f_2} PSD(f)\,df$$
where $f$ represents frequency and $[f_1, f_2]$ is the frequency range of the sub-band. In addition, the power ratio between each pair of the five sub-bands is extracted, giving ten ratios of the form $P_i / P_j$.
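A minimal NumPy sketch of these frequency domain features, under stated assumptions: the spectral statistics are taken over the one-sided FFT amplitude spectrum, the PSD is a plain periodogram (the paper does not say which estimator it uses), and the ten power ratios are formed as $P_i/P_j$ for every pair of sub-bands. The function names and the band labels `D1` to `D4` and `A4` are illustrative.

```python
import numpy as np
from itertools import combinations


def periodogram(x, fs):
    """One-sided periodogram PSD and its frequency axis."""
    x = np.asarray(x, dtype=float)
    n = x.size
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * n)
    psd[1:] *= 2.0                  # fold negative frequencies onto positive ones
    if n % 2 == 0:
        psd[-1] /= 2.0              # the Nyquist bin has no mirror image
    return np.fft.rfftfreq(n, d=1.0 / fs), psd


def frequency_domain_features(x, fs=100.0):
    spec = np.abs(np.fft.rfft(np.asarray(x, dtype=float)))  # amplitude spectrum
    z = (spec - spec.mean()) / spec.std()
    freqs, psd = periodogram(x, fs)
    df = freqs[1] - freqs[0]
    return {
        "mean": spec.mean(), "std": spec.std(),
        "skew": np.mean(z ** 3), "kurt": np.mean(z ** 4),
        "mean_square": np.mean(spec ** 2),
        "Mpsd": psd.mean(),         # average power spectral density
        "P": psd.sum() * df,        # total power (rectangle-rule integral)
    }


def band_power_ratios(powers):
    """powers: dict band -> power; returns P_i / P_j for every pair (10 for 5 bands)."""
    return {f"{a}/{b}": powers[a] / powers[b] for a, b in combinations(powers, 2)}
```

By Parseval's theorem the integrated periodogram equals the mean squared amplitude of the signal, which makes `P` easy to check.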
2.2.3. Nonlinear Features
An EEG is a typical nonlinear signal, so nonlinear features are needed to characterize it [23]. This paper extracts five nonlinear features, namely approximate entropy (AE), differential entropy (DE), Shannon entropy (SE), C0 complexity (CC), and fractal dimension (FD) [10].
- (1)
Approximate entropy (AE)
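AE quantifies the regularity and unpredictability of signal fluctuations [24]. First, the EEG $x$ is embedded in $m$ dimensions:
$$X(i) = [x(i), x(i+1), \ldots, x(i+m-1)], \quad i = 1, 2, \ldots, N-m+1$$
The distance between $X(i)$ and $X(j)$ is defined as:
$$d[X(i), X(j)] = \max_{0 \le l \le m-1} |x(i+l) - x(j+l)|$$
Given the threshold $r$, count the number of vectors $X(j)$ with $d[X(i), X(j)] \le r$:
$$C_i^m(r) = \frac{\mathrm{num}\{\, j : d[X(i), X(j)] \le r \,\}}{N-m+1}, \qquad \Phi^m(r) = \frac{1}{N-m+1}\sum_{i=1}^{N-m+1} \ln C_i^m(r)$$
Finally, the AE of the EEG is obtained:
$$AE(m, r) = \Phi^m(r) - \Phi^{m+1}(r)$$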
- (2)
Shannon entropy (SE)
SE measures the uncertainty of a signal [25]; the greater the SE, the greater the randomness of the signal. SE is defined as:
$$SE = -\sum_{i} p_i \log p_i$$
where $p_i$ represents the probability of occurrence of the random event $x_i$.
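Estimating $p_i$ requires a discretization step that this section does not specify. The sketch below assumes a 16-bin amplitude histogram and base-2 logarithms (so SE is in bits); both choices and the function name are assumptions.

```python
import numpy as np


def shannon_entropy(x, bins=16):
    """Shannon entropy (bits) of a signal's amplitude histogram."""
    counts, _ = np.histogram(np.asarray(x, dtype=float), bins=bins)
    p = counts[counts > 0] / counts.sum()   # empirical probabilities p_i
    return float(-np.sum(p * np.log2(p)))
```

A signal spread evenly over all 16 bins reaches the maximum of 4 bits, while a constant signal gives 0.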
- (3)
Differential entropy (DE)
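DE generalizes Shannon entropy to continuous variables. The EEG approximately follows a Gaussian distribution $N(\mu, \sigma^2)$, for which the DE is [26]:
$$DE = -\int_{-\infty}^{\infty} p(x)\log p(x)\,dx = \frac{1}{2}\log\left(2\pi e \sigma^2\right)$$
where $p(x)$ is the Gaussian probability density.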
- (4)
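C0 complexity (CC)
CC measures the degree of irregularity of a signal [27]. First, the signal is Fourier-transformed and the mean of its power spectrum is calculated:
$$\bar{G} = \frac{1}{N}\sum_{k=1}^{N} |F(k)|^2$$
where $F(k)$ represents the Fourier transform of the EEG. Then a new sequence is defined by retaining only the components whose power exceeds the mean:
$$\tilde{F}(k) = \begin{cases} F(k), & |F(k)|^2 > \bar{G} \\ 0, & |F(k)|^2 \le \bar{G} \end{cases}$$
Finally, the C0 complexity of the EEG is obtained:
$$CC = \frac{\sum_{n=1}^{N} |x(n) - \tilde{x}(n)|^2}{\sum_{n=1}^{N} |x(n)|^2}$$
where $\tilde{x}(n)$ represents the inverse Fourier transform of $\tilde{F}(k)$.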
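The C0 complexity steps translate directly to NumPy. This sketch keeps only the spectral components whose power exceeds the mean power, as in the definition above; the function name `c0_complexity` is illustrative.

```python
import numpy as np


def c0_complexity(x):
    """Share of signal energy not explained by above-average spectral components."""
    x = np.asarray(x, dtype=float)
    F = np.fft.fft(x)
    power = np.abs(F) ** 2
    g_bar = power.mean()                       # mean power spectrum
    F_reg = np.where(power > g_bar, F, 0.0)    # keep only above-average components
    x_reg = np.fft.ifft(F_reg).real            # regular part (imaginary part ~ 0)
    return np.sum((x - x_reg) ** 2) / np.sum(x ** 2)
```

Because the power spectrum of a real signal is symmetric, the retained spectrum stays Hermitian and its inverse transform is real up to rounding. A pure sinusoid gives a value near 0, and white noise gives a value well between 0 and 1.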
- (5)
Fractal dimension (FD)
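The FD measures the complexity of a signal from the perspective of chaotic dynamics. In this paper, the Higuchi method is used to calculate the FD [11]. First, the signal is converted into $k$ new sequences:
$$x_m^k = \left\{ x(m), x(m+k), x(m+2k), \ldots, x\!\left(m + \left\lfloor \frac{N-m}{k} \right\rfloor k\right) \right\}$$
where $m = 1, 2, \ldots, k$. The length of each sequence is defined as:
$$L_m(k) = \frac{1}{k}\left[\left(\sum_{i=1}^{\lfloor (N-m)/k \rfloor} |x(m+ik) - x(m+(i-1)k)|\right)\frac{N-1}{\lfloor (N-m)/k \rfloor\, k}\right]$$
After that, the average length over the $k$ sequences is calculated:
$$L(k) = \frac{1}{k}\sum_{m=1}^{k} L_m(k)$$
For each interval $k = 1, 2, \ldots, k_{\max}$, the corresponding $L(k)$ is calculated. $\ln L(k)$ and $\ln(1/k)$ are then fitted linearly, and the slope of the fitted line is the fractal dimension of the EEG.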
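A sketch of the Higuchi procedure above. The maximum interval `k_max = 10` is an assumed setting (this section does not give the range of intervals), and indices are shifted to Python's 0-based convention.

```python
import numpy as np


def higuchi_fd(x, k_max=10):
    x = np.asarray(x, dtype=float)
    n = x.size
    ks = np.arange(1, k_max + 1)
    avg_len = []
    for k in ks:
        lengths = []
        for m in range(1, k + 1):                # m = 1..k as in the text
            idx = np.arange(m - 1, n, k)         # x(m), x(m+k), x(m+2k), ...
            num = len(idx) - 1                   # floor((N - m) / k)
            if num < 1:
                continue
            curve = np.sum(np.abs(np.diff(x[idx])))
            lengths.append(curve * (n - 1) / (num * k) / k)   # L_m(k)
        avg_len.append(np.mean(lengths))         # L(k)
    # Slope of ln L(k) against ln(1/k) is the fractal dimension.
    slope, _ = np.polyfit(np.log(1.0 / ks), np.log(avg_len), 1)
    return slope
```

A straight line has FD 1 and white noise has FD close to 2, which bracket the values expected for EEG.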
2.3. Classifier
The extracted features are reorganized into multi-step time features and input into the classifier to classify the sleep EEG. Ten classifiers are used in this paper, namely support vector machines with a radial basis function kernel (RBFSVM) and a linear kernel (LFSVM), random forest (RF), decision tree (DT), naive Bayes (NB), K-nearest neighbor (KNN), convolutional neural network (CNN), long short-term memory (LSTM), bidirectional LSTM (Bi-LSTM), and transformer encoder (TE).
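This section does not give the number of steps or how the window is aligned, so the sketch below shows only one plausible construction of multi-step time features: each sample stacks the feature vectors of the current epoch and the `n_steps - 1` preceding epochs, and the first epochs of a recording are padded by repeating the first feature vector. The window alignment, the padding, `n_steps = 5`, and the function name `make_multistep` are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_multistep(features, n_steps=5):
    """features: (n_epochs, n_features) -> (n_epochs, n_steps, n_features).

    Sample t stacks epochs t-n_steps+1 .. t; early epochs are edge-padded.
    """
    features = np.asarray(features, dtype=float)
    pad = np.repeat(features[:1], n_steps - 1, axis=0)
    padded = np.concatenate([pad, features], axis=0)
    # The window axis is appended last: (n_epochs, n_features, n_steps)
    windows = sliding_window_view(padded, n_steps, axis=0)
    return windows.transpose(0, 2, 1)
```

The resulting three-dimensional array matches the (sequence, time step, feature) layout that recurrent and attention-based classifiers consume, while the non-temporal classifiers would need it flattened per sample.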
- (1)
Support vector machine
SVM is a machine learning method based on statistical learning theory that performs well on small-sample data [28]. The core idea of SVM is to construct an optimal hyperplane in the projected feature space that separates different classes while maximizing the margin between them [23]. The choice of kernel function affects the performance of SVM. This paper uses the radial basis function and the linear function as kernel functions:
$$K_{RBF}(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right), \qquad K_{L}(x_i, x_j) = x_i^{\top} x_j$$
where $\sigma$ represents the bandwidth of the kernel function. The penalty factor and kernel bandwidth of the SVM are set to
and
, respectively.
- (2)
Random forest
RF is an ensemble learning model based on the bagging strategy; it is robust to noise, has low complexity, and computes quickly [22]. An RF contains many classification trees. During decision-making, each tree classifies the sample, and the sample category is determined by majority voting [19]. The two cores of RF are sample randomness and feature randomness [12]. Sample randomness means that samples are drawn randomly from the original dataset to form sub-datasets. Feature randomness means that, when selecting the optimal split feature, only a randomly selected subset of features is considered rather than all features. The number of trees and the maximum depth of the RF are set to 60 and 10, respectively, and the classification trees are constructed using the C4.5 algorithm.
- (3)
Decision tree
DT is a model that represents decision rules and classification results with a tree-like data structure [23]. A DT consists of a root node, several internal nodes, and several leaf nodes. Each internal node represents a test on a feature attribute, each branch represents a test outcome, and each leaf node represents a decision result. In this paper, the C4.5 algorithm is used to generate the DT. C4.5 selects features using the information gain ratio and handles continuous features by discretizing them with split thresholds [29]. The maximum depth of the DT is set to 10.
- (4)
Naive Bayes
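NB is a classification method based on Bayes' theorem and the assumption that features are conditionally independent given the class [30]. The core idea of NB is to use Bayes' theorem to calculate the posterior probability that a given sample belongs to class $c_k$:
$$P(c_k \mid x) = \frac{P(c_k)\prod_{l=1}^{n} P\left(x^{(l)} \mid c_k\right)}{\sum_{k'} P(c_{k'})\prod_{l=1}^{n} P\left(x^{(l)} \mid c_{k'}\right)}$$
where $x = (x^{(1)}, \ldots, x^{(n)})$ represents the features of the sample. The sample is assigned to the class that maximizes $P(c_k \mid x)$.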
- (5)
K-nearest neighbor
KNN is a distance-based classification method [9]. For a given sample, KNN first finds the K known samples closest to it, called its neighbors, by computing the distance between the given sample and each known sample. The sample is then assigned to the class that occurs most often among its K nearest neighbors [20]. In this paper, K is set to 5. For two samples $x_i$ and $x_j$, the Euclidean distance is used to measure the distance between them:
$$d(x_i, x_j) = \sqrt{\sum_{l=1}^{n}\left(x_i^{(l)} - x_j^{(l)}\right)^2}$$
- (6)
Convolutional neural network
CNN is a deep feed-forward neural network with local connections and weight sharing and is widely used in EEG classification [2]. A CNN usually contains input, convolutional, activation, pooling, fully connected, and output layers [14]. In this paper, a shallow CNN model is designed for EEG feature classification; its structure is shown in Figure 3.
- (7)
Long short-term memory
LSTM is a classical recurrent neural network that performs well in sequence data processing [18]. Building on the traditional recurrent neural network, LSTM introduces forget, input, and output gates to control the flow of information, which effectively alleviates the long-term dependency problem. The model structure is shown in Figure 4. The "hidden-size" and "num-layers" of the LSTM in this paper are set to 20 and 1, respectively. The Bi-LSTM network consists of two independent LSTM layers that process the time series in the forward and reverse directions, respectively. The "hidden-size" and "num-layers" of the Bi-LSTM in this paper are set to 10 and 1, respectively.
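To make the gating concrete, here is a NumPy forward pass of a single-layer LSTM over one multi-step feature sequence, using the hidden size of 20 stated above. The input dimension, the random weights, and the function name are hypothetical, and this illustrates the standard cell equations rather than the trained model; the weights are stacked in the input, forget, cell-candidate, output order that PyTorch's `nn.LSTM` also uses.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(x_seq, W, U, b, hidden):
    """x_seq: (n_steps, n_in); W: (4H, n_in); U: (4H, H); b: (4H,)."""
    H = hidden
    h, c = np.zeros(H), np.zeros(H)
    for x_t in x_seq:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:H])             # input gate
        f = sigmoid(z[H:2 * H])        # forget gate
        g = np.tanh(z[2 * H:3 * H])    # candidate cell content
        o = sigmoid(z[3 * H:])         # output gate
        c = f * c + i * g              # keep part of old memory, write new content
        h = o * np.tanh(c)             # expose gated memory as hidden state
    return h                           # last hidden state feeds a classifier layer


rng = np.random.default_rng(0)
n_steps, n_in, hidden = 5, 8, 20      # n_in is a hypothetical feature count
W = rng.normal(scale=0.1, size=(4 * hidden, n_in))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h_last = lstm_forward(rng.normal(size=(n_steps, n_in)), W, U, b, hidden)
```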
- (8)
Transformer encoder
The transformer consists of an encoder and a decoder and is a sequence model based on the self-attention mechanism [1]. It is mainly used for natural language processing tasks such as translation and text generation and is increasingly applied to time series prediction. This paper uses the transformer encoder (TE) to classify single-channel sleep EEG. The model structure is shown in Figure 5. The "nhead" and "num-layers" of the TE in this paper are set to 3 and 3, respectively.
2.4. HMM-Based Correction
The transition of sleep states has an apparent temporal structure: the sleep state of the next epoch is related to that of the previous epoch [17]. Therefore, some misclassified sleep states can be corrected by observing the preceding sleep states. However, manually identifying misclassified sleep states is time-consuming and subjective. Some studies have pointed out that an HMM can learn the sleep transition rules and adaptively correct misclassified sleep states. Therefore, an HMM is used in this paper to self-correct the prediction results [31]. The structure of the HMM is shown in Figure 6.
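$I = (i_1, i_2, \ldots, i_T)$ represents the sequence of hidden states, whose values are drawn from the set $Q = \{q_1, q_2, \ldots, q_N\}$. $O = (o_1, o_2, \ldots, o_T)$ represents the sequence of observed states, whose values are drawn from the set $V = \{v_1, v_2, \ldots, v_M\}$. The hidden state sequence is not visible, whereas the observed state sequence is visible.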
The HMM can be represented by the parameter set $\lambda = (A, B, \pi)$ [15]. $A = [a_{ij}]_{N \times N}$ is the hidden state transition probability matrix, where $a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i)$ is the probability of moving from hidden state $q_i$ to hidden state $q_j$. $B = [b_j(k)]_{N \times M}$ is the observation probability matrix, where $b_j(k) = P(o_t = v_k \mid i_t = q_j)$ is the probability of observing $v_k$ in hidden state $q_j$. $\pi = (\pi_1, \ldots, \pi_N)$ is the initial probability distribution of the hidden states.
The classifier's output is defined as the observed state sequence of the HMM, and the true sleep state sequence is defined as the hidden state sequence. Following the sleep stage scoring rules, both the hidden and the observed state sets in this paper are $Q = V = \{W, N1, N2, N3, REM\}$. The dataset is divided into three parts: a training set, a validation set, and a test set. The training set is used to train the classifier, and the trained classifier is then applied to the validation set. The predicted sequence and the true sleep state sequence of the validation set are taken as the observed and hidden state sequences of the HMM, respectively. The entries $a_{ij}$ of the transition probability matrix $A$ and $b_j(k)$ of the observation probability matrix $B$ are then estimated by maximum likelihood:
$$a_{ij} = \frac{A_{ij}}{\sum_{j=1}^{N} A_{ij}}, \qquad b_j(k) = \frac{B_{jk}}{\sum_{k=1}^{M} B_{jk}}$$
where $A_{ij}$ is the number of transitions from $q_i$ to $q_j$ in the hidden state sequence, and $B_{jk}$ is the number of times the observed state is $v_k$ while the hidden state is $q_j$. $N$ and $M$ are the sizes of the hidden and observed state sets, with $N = M = 5$ in this paper. Because sleep begins in the awake state, the initial probability distribution is set to $\pi = (1, 0, 0, 0, 0)$, i.e., all initial probability mass is on W. With these parameters, the HMM $\lambda = (A, B, \pi)$ is fully specified.
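The counting estimates above can be sketched in NumPy as follows, with states indexed in the order W, N1, N2, N3, REM. Two details are assumptions not taken from the paper: transitions are counted per recording so that no transition spans two nights, and a tiny additive constant `eps` keeps unseen transitions from having probability exactly 0. The function name `estimate_hmm` is hypothetical.

```python
import numpy as np

STATES = ["W", "N1", "N2", "N3", "REM"]   # index order assumed: W first


def estimate_hmm(recordings, n_states=5, eps=1e-6):
    """Count-based (maximum likelihood) estimate of lambda = (A, B, pi).

    recordings: list of (true_stages, predicted_stages) integer sequences,
    one pair per validation night.
    """
    trans = np.full((n_states, n_states), eps)   # A_ij counts
    emit = np.full((n_states, n_states), eps)    # B_jk counts
    for hidden, observed in recordings:
        for s, s_next in zip(hidden[:-1], hidden[1:]):
            trans[s, s_next] += 1
        for s, o in zip(hidden, observed):
            emit[s, o] += 1
    A = trans / trans.sum(axis=1, keepdims=True)  # a_ij
    B = emit / emit.sum(axis=1, keepdims=True)    # b_j(k)
    pi = np.zeros(n_states)
    pi[0] = 1.0                                   # every night starts in W
    return A, B, pi
```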
In the test phase, the features of the test set are input into the trained classifier to obtain predictions. These predictions form the observed state sequence $O$, and the corresponding hidden state sequence $I$ is the final prediction. Given the trained HMM $\lambda$ and the observed sequence $O$, the final prediction is:
$$I^{*} = \arg\max_{I} P(I \mid O, \lambda)$$
In this paper, the Viterbi algorithm [32] is used to find the most likely hidden state sequence.
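A minimal log-domain Viterbi decoder for this step is sketched below; it works in log probabilities so that whole-night sequences of roughly a thousand epochs do not underflow. The function name and the toy parameters in the usage example are hypothetical.

```python
import numpy as np


def viterbi(obs, A, B, pi):
    """Most likely hidden state path for observation indices `obs`."""
    with np.errstate(divide="ignore"):          # log(0) -> -inf is intended
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    T, N = len(obs), A.shape[0]
    delta = np.empty((T, N))                    # best log score ending in each state
    psi = np.zeros((T, N), dtype=int)           # back-pointers
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA   # scores[i, j]: from state i into j
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(N)] + logB[:, obs[t]]
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):              # backtrack
        path[t] = psi[t + 1, path[t + 1]]
    return path


# Toy usage: "sticky" transitions and an 80%-accurate classifier
A = np.full((5, 5), 0.025); np.fill_diagonal(A, 0.9)
B = np.full((5, 5), 0.05); np.fill_diagonal(B, 0.8)
pi = np.array([1.0, 0, 0, 0, 0])
corrected = viterbi([0, 0, 2, 0, 0], A, B, pi)
```

With these toy parameters, the isolated N2 prediction inside a run of W is relabeled as W, which is the kind of correction the HMM stage is meant to make, while a sustained run of N2 predictions is kept.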
3. Sleep EEG Dataset
The expanded Sleep-EDF database (Sleep-EDFx, 2013 version) published on PhysioNet was used for the experiments [33]. The database consists of two subsets, of which this paper uses the Sleep Cassette subset. Twenty healthy subjects (age: 28.65 ± 8.65 years, 10 males) participated. EEG was recorded for one night for subject 13 and for two nights for each of the other subjects, giving 39 whole-night recordings in total. The Fpz-Cz and Pz-Oz EEG channels were recorded at a sampling frequency of 100 Hz; this paper uses the Fpz-Cz channel for experimental verification. Trained technicians manually scored the sleep EEG (hypnograms) according to the Rechtschaffen and Kales (R&K) manual, labeling each 30 s epoch as W, N1, N2, N3, N4, REM, MOVEMENT, or UNKNOWN.
In the preprocessing stage, a 0.5–100 Hz bandpass filter is used to reduce EEG noise. Following the AASM scoring criteria, the MOVEMENT and UNKNOWN labels are discarded, and N3 and N4 are merged into N3, giving five sleep stages: W, N1, N2, N3, and REM. The EEG is segmented into 30 s epochs of 3000 samples each, and each epoch is mapped to its labeled sleep stage. Example 30 s EEG epochs for the different sleep states are shown in Figure 7. Because nearly 24 h of EEG was recorded in each session, the night sleep period had to be extracted: the EEG from 30 min of wakefulness before sleep onset to 30 min of wakefulness after final awakening was retained, which generally amounts to about 9 h. The number of samples for each sleep stage in the dataset is shown in Table 3.
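The segmentation, relabeling, and night-trimming steps can be sketched as follows. The label strings in `LABEL_MAP` are hypothetical placeholders (the actual Sleep-EDFx annotation strings differ), loading the EDF files is out of scope, and the function assumes one label per 30 s epoch aligned with the start of the signal.

```python
import numpy as np

FS = 100          # Sleep-EDFx EEG sampling rate (Hz)
EPOCH_S = 30      # scoring epoch length (s)

# Hypothetical label strings; N3 and N4 merge into N3, MOVEMENT/UNKNOWN are dropped.
LABEL_MAP = {"W": 0, "N1": 1, "N2": 2, "N3": 3, "N4": 3, "REM": 4,
             "MOVEMENT": None, "UNKNOWN": None}


def epoch_and_label(signal, labels, fs=FS, epoch_s=EPOCH_S, wake_margin_min=30):
    """Cut an EEG recording into 30 s epochs, map labels, and keep only the night:
    from 30 min before the first sleep epoch to 30 min after the last one."""
    spe = fs * epoch_s                                  # 3000 samples per epoch
    n = min(len(labels), len(signal) // spe)
    epochs = np.asarray(signal[: n * spe], dtype=float).reshape(n, spe)
    y = [LABEL_MAP[lab] for lab in labels[:n]]
    sleep = [i for i, v in enumerate(y) if v not in (0, None)]
    lo, hi = 0, n
    if sleep:
        margin = wake_margin_min * 60 // epoch_s        # 60 epochs of wake
        lo, hi = max(sleep[0] - margin, 0), min(sleep[-1] + margin + 1, n)
    keep = [i for i in range(lo, hi) if y[i] is not None]
    return epochs[keep], np.array([y[i] for i in keep], dtype=int)
```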
5. Discussion
This paper proposed a single-channel sleep EEG classification method based on LSTM and HMM. First, the proposed method used signal decomposition and multi-domain feature extraction to obtain deep EEG information. Second, it learned the sleep transition rules during classification using multi-step time features and temporal networks. Third, it post-processed the classification results with an HMM to correct them automatically. Comprehensive experiments were conducted on the Sleep-EDFx dataset. The results show that the proposed method achieves high-precision single-channel sleep EEG classification and outperforms most existing methods.
The performance of four signal decomposition methods (WT, EMD, VMD, and SSA) in single-channel sleep EEG classification was compared. The results in Figure 8 show that WT achieved the highest accuracy, indicating that WT is the most suitable of the four for EEG decomposition. The proposed method was then used to classify single-channel sleep EEG with each of 20 wavelet functions. The results in Figure 14 show that the combination of WT-db4 and LSTM performed best, with accuracy, MF1, and kappa of 82.71%, 0.75, and 0.76, respectively. This comparison of decomposition methods and wavelet functions provides a reference for other EEG analyses.
Table 4 and Figure 10 show the classification results of the different classifiers on single-channel sleep EEG. The classification accuracy of the temporal networks (LSTM, Bi-LSTM, and TE) was higher than 81%, and their AUC was 0.97, higher than that of the other classifiers. Because sleep state transitions follow latent regularities, temporal networks can learn temporal information from the multi-step features and thereby improve EEG classification accuracy.
Figure 11 shows box plots of the prediction results, from which the distribution of the classification results can be observed. LSTM, Bi-LSTM, and TE were all robust; the minimum accuracy of LSTM was about 70%, indicating that the method is robust across recordings.
Table 5 and Figure 12 show the EEG classification results before and after HMM processing. Before HMM processing, the accuracy, MF1, and kappa of the LSTM were 81.66%, 0.74, and 0.74, respectively; after HMM processing, they were 82.71%, 0.75, and 0.76, respectively. These improvements indicate that the HMM can correct the classification results by learning the sleep transition rules after classification.
Figure 13 shows the confusion matrices of the classification results, from which the results for each sleep stage can be observed. The number of N1 epochs is significantly smaller than that of the other stages; this class imbalance in the sleep EEG dataset contributes to the low classification accuracy of N1. After HMM processing, the number of correctly identified N1 epochs increased, indicating that the HMM can improve the sensitivity for N1.
Finally, we compared the performance of single-step features and multi-step time features in EEG classification; the results are shown in Figure 15. The multi-step time features outperformed the single-step features. Single-step features use only the EEG features of a single epoch as model input, making it difficult for the model to learn the sleep transition rules. Multi-step time features combine the EEG features of several consecutive epochs into a multi-step feature matrix that contains the temporal information of sleep transitions, which helps the temporal network learn the sleep transition rules.
This study also has some limitations. First, the proposed method is difficult to apply directly to EEG classification tasks with discontinuous states, such as motor imagery EEG classification. Second, applying the proposed method directly to multi-channel EEG may lead to the curse of dimensionality. Third, the classification accuracy of the proposed method for the N1 stage still needs improvement. In future work, the method can be adapted to discontinuous-state and multi-channel EEG classification, and more attention can be paid to feature extraction for the N1 stage.
6. Conclusions
This paper proposed a single-channel sleep EEG classification method based on LSTM and HMM, and comprehensive experiments were carried out on the Sleep-EDFx dataset. The performance of EMD, VMD, SSA, and WT in EEG classification was compared, and the results showed that WT was the most suitable for EEG decomposition. On this basis, the performance of 20 wavelet functions in EEG classification was examined, providing a reference for other EEG-related studies. During classification, the multi-step time features and the LSTM were used to learn the sleep transition rules and improve the classification accuracy. After classification, the HMM was used to learn the sleep transition rules and adaptively correct the classification results. The results showed that the proposed method learned the sleep transition rules from both perspectives and improved the classification accuracy of single-channel sleep EEG. The classification accuracy, MF1, and kappa of the proposed method were 82.71%, 0.75, and 0.76, respectively.