# Automatic Sleep Staging Based on Single-Channel EEG Signal Using Null Space Pursuit Decomposition Algorithm

## Abstract

## 1. Introduction

- (1) An automatic sleep scoring method based on single-channel EEG is proposed.
- (2) A new signal processing technique, null space pursuit (NSP) decomposition, is applied to sleep staging.
- (3) The effectiveness of the method is verified by statistical and graphical analysis.
- (4) Compared with existing schemes, the proposed scheme achieves competitive performance.
- (5) Automating the classification avoids the time-consuming and subjective nature of manual scoring.

## 2. Materials and Methods

#### 2.1. Datasets and Data Preprocessing

#### 2.2. Methods

#### 2.2.1. NSP Algorithms

#### 2.2.2. Feature Extraction

#### 2.2.3. Classification Algorithms

#### 2.2.4. Model Evaluation

## 3. Results

#### 3.1. Analysis of Classification Results

#### 3.2. Feature Importance Analysis Results

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

**Figure 3.** NSP decomposition of a preprocessed 30-second EEG signal in the deep sleep stage.

Database | Subjects | Age (years) | Gender | EEG Lead | Frequency (Hz) | Epochs |
---|---|---|---|---|---|---|
Sleep-EDF | 8 | $28.5\pm 5.4$ | 4M/4F | Pz-Oz | 100 | 15188 |
DREAMS | 20 | $33.5\pm 14.6$ | 4M/16F | Cz-A1 | 200 | 20242 |
SHHS ${}^{1}$ | 111 | $57.5\pm 11.3$ | 20M/91F | C4-A1 | 125 | 113347 |

${}^{1}$ SHHS is the abbreviation of Sleep Heart Health Study.

Feature | Computing Formula | Feature Description |
---|---|---|
Mean | $ME=\frac{1}{M}\sum_{n=1}^{M}S\left(n\right)$ | ME is the arithmetic mean of the samples in the epoch. |
Skewness | $SK=\frac{1}{M{\sigma}^{3}}\sum_{n=1}^{M}{\left(S\left(n\right)-ME\right)}^{3}$ | SK measures the asymmetry of the probability distribution of a real-valued variable [24]. |
Kurtosis | $KU=\frac{1}{M{\sigma}^{4}}\sum_{n=1}^{M}{\left(S\left(n\right)-ME\right)}^{4}$ | KU measures how heavy-tailed the probability distribution of a real-valued variable is [24]. |
Zero crossing rate | $ZCR=\frac{1}{M-1}\sum_{n=2}^{M}{\phi}_{r<0}\left(S\left(n\right)S\left(n-1\right)\right)$ | ZCR is the rate at which consecutive samples change sign; ${\phi}_{r<0}$ equals 1 when its argument is negative and 0 otherwise [25]. |
Sample entropy | $SE\left(S,m,r\right)=-\ln\left({A}_{m+1}\left(r\right)/{B}_{m}\left(r\right)\right)$ | SE is often used to measure the complexity of a time series [26]. |
Permutation entropy | $PE=-\sum_{t=1}^{L}{\xi}_{t}\ln\left({\xi}_{t}\right)$ | PE measures the complexity of the signal and responds quickly to sudden changes in it [27]. |
Hjorth activity | $HA={\sigma}^{2}=\frac{1}{M}\sum_{n=1}^{M}{\left(S\left(n\right)-ME\right)}^{2}$ | HA represents the degree of fluctuation of the EEG signal [28]. |
Hjorth mobility | $HM=\sqrt{\frac{\frac{1}{M-1}\sum_{n=1}^{M-1}{\left({S}^{\prime}\left(n\right)-\overline{{S}^{\prime}}\right)}^{2}}{\frac{1}{M}\sum_{n=1}^{M}{\left(S\left(n\right)-ME\right)}^{2}}}$ | HM represents the slope of the EEG signal [28]. |
Hjorth complexity | $HC=\frac{\sqrt{\frac{\frac{1}{M-2}\sum_{n=1}^{M-2}{\left({S}^{\prime\prime}\left(n\right)-\overline{{S}^{\prime\prime}}\right)}^{2}}{\frac{1}{M-1}\sum_{n=1}^{M-1}{\left({S}^{\prime}\left(n\right)-\overline{{S}^{\prime}}\right)}^{2}}}}{\sqrt{\frac{\frac{1}{M-1}\sum_{n=1}^{M-1}{\left({S}^{\prime}\left(n\right)-\overline{{S}^{\prime}}\right)}^{2}}{\frac{1}{M}\sum_{n=1}^{M}{\left(S\left(n\right)-ME\right)}^{2}}}}$ | HC represents the rate of change of the slope of the EEG signal [28,29]. |
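
As an illustration of how several of the features in the table above can be computed for one epoch, the following is a minimal sketch rather than the authors' implementation. The function names and the `order` parameter are hypothetical, `np.diff` is assumed as the discrete derivative, variances are assumed to be population variances (dividing by the number of samples), and sample entropy is omitted for brevity.

```python
import math

import numpy as np


def statistical_features(s):
    """Return (ME, SK, KU, ZCR) for a 1-D signal segment."""
    s = np.asarray(s, dtype=float)
    me = s.mean()
    sigma = s.std()  # population standard deviation (divides by M)
    sk = np.mean((s - me) ** 3) / sigma ** 3
    ku = np.mean((s - me) ** 4) / sigma ** 4
    # A sign change between neighbouring samples makes their product negative.
    zcr = np.mean(s[1:] * s[:-1] < 0)
    return me, sk, ku, zcr


def hjorth_parameters(s):
    """Return (HA, HM, HC): Hjorth activity, mobility, and complexity."""
    s = np.asarray(s, dtype=float)
    d1 = np.diff(s)   # first difference, assumed stand-in for S'
    d2 = np.diff(d1)  # second difference, assumed stand-in for S''
    activity = s.var()
    mobility = math.sqrt(d1.var() / activity)
    # Complexity is the mobility of the derivative divided by the mobility
    # of the signal itself.
    complexity = math.sqrt(d2.var() / d1.var()) / mobility
    return activity, mobility, complexity


def permutation_entropy(s, order=3):
    """Shannon entropy of the ordinal patterns of length `order`."""
    s = np.asarray(s, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(s, order)
    patterns = np.argsort(windows, axis=1)  # ordinal pattern of each window
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    xi = counts / counts.sum()  # relative frequency of each observed pattern
    return float(-np.sum(xi * np.log(xi)))
```

In a pipeline of the kind the paper describes, such functions would presumably be applied to each component produced by the decomposition of a 30-second epoch, with the results concatenated into one feature vector; that wiring is an assumption and is not shown here.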

Model Evaluation | Computing Formula |
---|---|
Accuracy | $accuracy=\frac{\sum_{i=1}^{A}{Q}_{ii}}{SUM}$ |
Specificity | $specificity=\frac{TN}{TN+FP}$ |
Sensitivity | $sensitivity=\frac{TP}{TP+FN}$ |
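
The formulas above can be evaluated directly from a multi-class confusion matrix. The sketch below is one hedged reading of them: it assumes $Q$ is the confusion matrix with rows as true stages and columns as predicted stages, $A$ is the number of classes, $SUM$ is the total number of epochs, and TP, TN, FP, and FN are counted per class in a one-vs-rest fashion. The function name is hypothetical.

```python
import numpy as np


def evaluate_confusion_matrix(q):
    """Return (accuracy, sensitivity per class, specificity per class).

    Assumes q[i, j] counts epochs whose true class is i and whose
    predicted class is j.
    """
    q = np.asarray(q, dtype=float)
    total = q.sum()               # SUM in the accuracy formula
    tp = np.diag(q)               # correctly classified epochs per class
    fn = q.sum(axis=1) - tp       # epochs of the class that were missed
    fp = q.sum(axis=0) - tp       # epochs wrongly assigned to the class
    tn = total - tp - fn - fp     # everything else
    accuracy = tp.sum() / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity
```

Under this one-vs-rest convention, sensitivity and specificity are vectors with one entry per sleep stage, which matches the per-stage layout of the results tables.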

Database | Metric (4-Class Classifier) | W | LS | N3 | R |
---|---|---|---|---|---|
Sleep-EDF | Specificity | 97.65% | 97.00% | 96.08% | 91.51% |
Sleep-EDF | Sensitivity | 98.1% | 89.14% | 91.56% | 95.75% |
DREAMS | Specificity | 95.23% | 93.87% | 92.15% | 83.24% |
DREAMS | Sensitivity | 96.11% | 92.87% | 90.06% | 80.42% |
SHHS | Specificity | 94.06% | 90.53% | 88.32% | 95.84% |
SHHS | Sensitivity | 90.08% | 93.74% | 92.51% | 94.73% |

Database | Metric (5-Class Classifier) | W | N1 | N2 | N3 | R |
---|---|---|---|---|---|---|
Sleep-EDF | Specificity | 96.41% | 94.46% | 93.63% | 93.11% | 85.38% |
Sleep-EDF | Sensitivity | 98.67% | 47.36% | 90.57% | 90.68% | 85.60% |
Sleep-EDF | Proportion | 53.03% | 3.1% | 23.9% | 8.8% | 11.17% |
DREAMS | Specificity | 93.96% | 91.47% | 89.81% | 87.39% | 81.53% |
DREAMS | Sensitivity | 95.18% | 59.32% | 89.85% | 92.12% | 83.10% |
DREAMS | Proportion | 35% | 3.4% | 28.6% | 13% | 20% |
SHHS | Specificity | 95.84% | 92.03% | 91.00% | 89.69% | 84.33% |
SHHS | Sensitivity | 94.73% | 9.21% | 92.00% | 89.09% | 81.90% |
SHHS | Proportion | 27.65% | 2.6% | 39.4% | 14.8% | 15.55% |

Database | Metric | 4-Class Classifier | 5-Class Classifier |
---|---|---|---|
Sleep-EDF | Accuracy | 93.59% | 92.89% |
Sleep-EDF | Kappa | 0.8924 | 0.8837 |
DREAMS | Accuracy | 91.32% | 90.01% |
DREAMS | Kappa | 0.8619 | 0.8392 |
SHHS | Accuracy | 90.25% | 88.37% |
SHHS | Kappa | 0.8412 | 0.8238 |
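
The kappa values reported above are not accompanied by a formula in this section. Assuming they refer to Cohen's kappa, which corrects the observed agreement for the agreement expected by chance, the statistic could be computed from a confusion matrix as in the following sketch. The function name is hypothetical.

```python
import numpy as np


def cohen_kappa(q):
    """Cohen's kappa for a square confusion matrix q (assumed definition)."""
    q = np.asarray(q, dtype=float)
    total = q.sum()
    # Observed agreement: fraction of epochs on the diagonal.
    p_observed = np.trace(q) / total
    # Chance agreement: product of the row and column marginals per class.
    p_expected = np.sum(q.sum(axis=0) * q.sum(axis=1)) / total ** 2
    return (p_observed - p_expected) / (1.0 - p_expected)
```

Unlike accuracy, this statistic does not reward a classifier for simply predicting the majority stage, which is relevant here because stages such as N1 make up only a few percent of the epochs.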

Database | Year | Name | Decomposition Algorithm | Features and Signal Channel | Classifiers | 4-Class Accuracy | 5-Class Accuracy |
---|---|---|---|---|---|---|---|
Sleep-EDF | 2014 | Zhu | | Degree distribution based on difference visibility (Pz-Oz) | Support vector machine | 89.3% | 88.9% |
Sleep-EDF | 2017 | Hassan | Tunable-Q factor wavelet transform | Four statistical moments (Pz-Oz) | Bagging | 94.36% | 93.69% |
Sleep-EDF | 2018 | Seifpour | | Statistical behavior of local extrema (Fpz-Cz) | Support vector machine | 92.8% | 91.8% |
Sleep-EDF | 2021 | Cong Liu | Ensemble empirical mode decomposition (EEMD) | Mean, skewness, kurtosis, time domain, and nonlinear dynamics features (Pz-Oz) | XGBoost | 93.1% | 91.9% |
Sleep-EDF | | In this work | NSP | Mean, skewness, kurtosis, time domain, and nonlinear dynamics features (Pz-Oz) | XGBoost | 93.59% | 92.89% |
DREAMS Subjects | 2017 | Hassan | EEMD | Statistical features (Cz-A1) | Random under sampling boosting | 80.0% | 74.6% |
DREAMS Subjects | 2017 | Hassan | Tunable-Q factor wavelet transform | Four statistical moments (Cz-A1) | Bagging | 83.78% | 78.95% |
DREAMS Subjects | 2018 | Seifpour | | Statistical behavior of local extrema (Cz-A1) | Support vector machine | 83.3% | |
DREAMS Subjects | 2021 | Cong Liu | EEMD | Mean, skewness, kurtosis, time domain, and nonlinear dynamics features (Cz-A1) | XGBoost | 86.4% | 83.4% |
DREAMS Subjects | | In this work | NSP | Mean, skewness, kurtosis, time domain, and nonlinear dynamics features (Cz-A1) | XGBoost | 91.32% | 90.01% |
SHHS | 2021 | Cong Liu | EEMD | Mean, skewness, kurtosis, time domain, and nonlinear dynamics features (C4-A1) | XGBoost | 87.5% | 85.8% |
SHHS | | In this work | NSP | Mean, skewness, kurtosis, time domain, and nonlinear dynamics features (C4-A1) | XGBoost | 90.25% | 88.37% |

Database | Setting | 4-Class Classifier | 5-Class Classifier |
---|---|---|---|
Sleep-EDF | Accuracy with NSP | 93.59% | 92.89% |
Sleep-EDF | Accuracy without NSP | 89.56% | 88.20% |
DREAMS | Accuracy with NSP | 91.32% | 90.01% |
DREAMS | Accuracy without NSP | 84.34% | 82.63% |
SHHS | Accuracy with NSP | 90.25% | 88.37% |
SHHS | Accuracy without NSP | 85.73% | 81.58% |

