A Multi-Class Automatic Sleep Staging Method Based on Photoplethysmography Signals

Automatic sleep staging with only one channel is a challenging problem in sleep-related research. In this paper, a simple and efficient method named PPG-based multi-class automatic sleep staging (PMSS) is proposed that uses only a photoplethysmography (PPG) signal. Single-channel PPG data were obtained from four categories of subjects in the CAP sleep database. After preprocessing of the PPG data, features were extracted from the time domain, frequency domain, and nonlinear domain, for a total of 21 features. Finally, the Light Gradient Boosting Machine (LightGBM) classifier was used for multi-class sleep staging. The accuracy of the multi-class automatic sleep staging was over 70%, and Cohen's kappa statistic k was over 0.6. The results also showed that the PMSS method can be applied to sleep staging for patients with sleep disorders.


Introduction
Sleep plays a very important role in daily life and is closely related to the operation of many physiological systems in the body. Poor sleep quality not only affects people's daily life but also causes insomnia, narcolepsy, and other sleep disorders [1]. These acquired sleep disorders and congenital disorders are highly correlated with the duration of each sleep phase [2]. In addition, sleep staging has been used in intensive care units to monitor the physiological status of patients with conditions such as stroke and cardiovascular and cerebrovascular diseases.
In early work on sleep staging criteria, researchers divided sleep into wake, non-rapid eye movement (NREM), and rapid eye movement (REM) stages, and identified REM according to the state of the brain, blood pressure, heart rate, blood oxygen content, energy consumption, and other indicators [3]. With the development of modern technology and further research on sleep, the NREM phase was refined into sleep 1 (s1), sleep 2 (s2), sleep 3 (s3), and sleep 4 (s4) [4,5]. Among them, s1 and s2 are collectively referred to as light sleep, and s3 and s4 as slow-wave sleep [6]. At present, the international standard divides sleep into five stages: REM, NREM I (N1), NREM II (N2), NREM III (N3), and wake (W) [7]. This division is derived from polysomnography (PSG) studies and from the American Academy of Sleep Medicine (AASM) rules for scoring sleep and associated events [8,9]. The "gold standard" for assessing the sleep stage is based on PSG, which consists of multiple digital signals, including electroencephalogram (EEG), electrocardiogram (ECG), leg and chin electromyography (EMG), electrooculogram (EOG), respiration, oxygen saturation, and airflow. PSG recordings are usually scored by multiple certified experts, who assign a sleep stage to each 30 s epoch [10].
Traditionally, sleep physiologists perform sleep staging by visually examining PSG signals. This method is labour-intensive and expensive, and it depends on the professional level and experience of the evaluator [11]. In recent years, automatic sleep staging has removed much of this manual burden and improved the efficiency of sleep staging; it has become the main research direction at this stage and has achieved good results. In the early development of automatic sleep staging, researchers mainly used a combination of physiological signals, and the accuracy of 5-class sleep staging could exceed 92% [12,13]. To reduce the impact of data collection on the subjects, researchers turned to single-channel EEG, EOG, and ECG signals. Hassan and Bhuiyan used single-channel EEG signals to perform 5-class sleep staging with an accuracy of 91% [14]. Rahman et al. used single-channel EOG signals and achieved over 90% accuracy for 5-class sleep staging [15]. Yücelbaş et al., Yoon et al., and Xiao et al. used ECG signals for 3-class sleep staging with an accuracy of more than 87% [7,16,17]. However, these methods affect the sleep of subjects during signal collection, and some must be performed in professional environments such as hospitals. To this end, Fonseca et al. and Beattie et al. used the PPG signal acquired by an optical sensor for sleep staging and demonstrated the feasibility of the approach [18,19]. High-precision sleep staging and home sleep monitoring methods that reduce the impact on subjects have become the next major goals for sleep staging researchers.
Here, the PMSS (PPG-based multi-class automatic sleep staging) method is proposed, which performs sleep staging on a single-channel optical PPG signal. Because data collection has little impact on the subjects, mobile monitoring and home monitoring can be easily achieved. From the CAP sleep database [20,21], the PPG signals in the PSG recordings of 27 subjects, covering more than 27,000 epochs, were used as data. Collecting this physiological signal does not affect the subjects' natural sleep and does not cause them psychological distress. After preprocessing, features were extracted from the frequency domain, time domain, and nonlinear domain. Finally, the Light Gradient Boosting Machine (LightGBM) classification model was used to perform multi-class sleep staging. The classification results are described with the confusion matrix, accuracy, recall, F1 score, and Cohen's kappa statistic. The method is suitable for home monitoring of different subjects, and the results obtained are in substantial agreement with PSG-based sleep staging.

Materials and Methods
PPG uses a reflection-type detector with an LED light source to measure the attenuation of reflected light, part of which is absorbed by blood vessels and tissue. The pulsation of the blood vessels is thereby recorded and plotted as a pulse wave. Physiological signals such as heart rate, SpO2, and heart rate variability can be extracted from the PPG signal. PPG is usually measured at the fingertip; it is safe and painless, and it carries the information needed for sleep staging and long-term monitoring, which makes it a preferred choice for portable sleep staging.
The PPG data in the CAP sleep database were extracted to verify the PMSS method. The CAP sleep database is a collection of 108 polysomnographic recordings registered at the Sleep Disorders Center of the Ospedale Maggiore of Parma, Italy. Using the Rechtschaffen and Kales (R&K) guidelines [8] and the AASM guidelines [9], several experts annotated all PSG records and assigned a sleep stage to each 30 s epoch. Because PPG signals are missing from many of the 108 PSG recordings, the sleep data of the 27 subjects without PPG signal loss were used in this experiment, including 4 healthy subjects, 8 patients with REM sleep behavior disorder, 10 patients with nocturnal frontal lobe epilepsy, and 5 patients with insomnia. The subject information is shown in Table 1. In this study, small changes in PPG signals across sleep stages were used to conduct multi-class sleep staging. Since there is a lot of interference in the collection of physiological signals, the PPG signals were first preprocessed. Next, time-domain features, nonlinear-domain features, and the heart rate variability signal were extracted from the preprocessed signals, and the frequency-domain features of heart rate variability were then computed. Finally, the extracted features were fed into the machine learning model for sleep staging. The basic flow is shown in Figure 1, and the details of each step are described below.

Preprocessing of Raw Data
To remove noise such as baseline drift and power-line interference from the PPG signal, wavelet-transform filtering with the bior3.5 wavelet was adopted in this experiment. At the same time, the six stage labels in the original data were converted into multi-class labels, as shown in Figure 2. In addition, this study kept the 30 s epoch division of the original data and manually deleted epochs without sleep-stage annotations, finally obtaining 27,333 labelled epochs.
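The label conversion can be sketched as follows. This is only an illustration: Figure 2 is not reproduced here, so the groupings below are assumptions based on the usual convention (s1 and s2 as light sleep, s3 and s4 as slow-wave sleep), and the label strings and the function name `relabel` are hypothetical.

```python
# Assumed mappings from the six R&K stage labels to 5-, 4- and 3-class labels.
# The exact grouping used in the paper (Figure 2) is not given in the text.
FIVE_CLASS = {"W": "W", "S1": "N1", "S2": "N2", "S3": "N3", "S4": "N3", "REM": "REM"}
FOUR_CLASS = {"W": "W", "S1": "Light", "S2": "Light", "S3": "Deep", "S4": "Deep", "REM": "REM"}
THREE_CLASS = {"W": "W", "S1": "NREM", "S2": "NREM", "S3": "NREM", "S4": "NREM", "REM": "REM"}
MAPPINGS = {5: FIVE_CLASS, 4: FOUR_CLASS, 3: THREE_CLASS}


def relabel(stages, n_classes):
    """Return (epoch_index, new_label) pairs, skipping epochs without a usable stage."""
    mapping = MAPPINGS[n_classes]
    return [(i, mapping[s]) for i, s in enumerate(stages) if s in mapping]
```

Returning the epoch index alongside the label keeps the labels aligned with the 30 s signal segments after unannotated epochs are dropped.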

Feature Extraction Process
In this experiment, all features were extracted from the PPG signal. These features can be divided into time-domain features, frequency-domain features, and nonlinear-domain features.

Time-Domain Feature Extraction of PPG Signal
The time-domain characteristics of the PPG signal can intuitively reflect the changes of sleep stage with time. The time-domain features extracted in this paper are shown in Table 2, where Z stands for PPG data.

Name | Meaning | Formula
Dif_PPG | Difference between the maximum and the minimum | max(Z) - min(Z)
En1st_PPG | Comentropy of the first-order difference | Comentropy(1st)
En2nd_PPG | Comentropy of the second-order difference | Comentropy(2nd)
En1st_2nd_PPG | Comentropy of the first-order difference divided by comentropy of the second-order difference | Comentropy(1st) / Comentropy(2nd)
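A minimal sketch of these time-domain features is given below. It assumes that "comentropy" means Shannon (information) entropy and that the entropy of a real-valued sequence is estimated from an equal-width histogram; the bin count of 16, the function names, and the feature name En1st_PPG are assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(values, n_bins=16):
    """Histogram-based Shannon entropy in bits (the binning scheme is an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant sequence carries no information
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def diff(values):
    """First-order difference of a sequence."""
    return [b - a for a, b in zip(values, values[1:])]


def time_domain_features(z):
    """Time-domain features of one PPG epoch z."""
    d1 = diff(z)
    d2 = diff(d1)
    en1, en2 = shannon_entropy(d1), shannon_entropy(d2)
    return {
        "Dif_PPG": max(z) - min(z),
        "En1st_PPG": en1,
        "En2nd_PPG": en2,
        "En1st_2nd_PPG": en1 / en2 if en2 else float("nan"),
    }
```

The ratio is returned as NaN when the second-order entropy is zero, so that degenerate epochs can be filtered out explicitly rather than raising an error.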

Frequency-Domain Features
The frequency-domain characteristics of the heart rate variability (HRV) signal extracted from the PPG signal reflect the activity of the autonomic nervous system. The features comprise the power in each frequency band of HRV. Beat-to-beat intervals, the PPG counterpart of the R-R intervals of the ECG, can be obtained from the PPG signal, and HRV is derived from them. HRV is usually obtained from ECG signals; in the PSG datasets, the HRV obtained from PPG signals correlates strongly with the HRV extracted from ECG signals, which has also been shown by other scholars [22]. The HRV spectrum can be divided into low frequency (LF, 0.04-0.15 Hz) and high frequency (HF, 0.15-0.4 Hz), where LF can be further divided into true low frequency (TLF, 0.04-0.1 Hz) and medium frequency (MF, 0.1-0.15 Hz). The power of the LF and HF bands is related to regulation by the sympathetic nervous system (SNS) and the parasympathetic nervous system (PNS), respectively.
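The band powers could be computed along the following lines. This sketch assumes that the beat-to-beat interval series has already been resampled onto a uniform time grid with sampling rate `fs`, and it uses a plain periodogram with a naive DFT. The paper does not state its spectral estimator, so both choices, and the function names, are assumptions.

```python
import cmath
import math

# HRV bands in Hz, as listed in the text; intervals are treated as [low, high).
BANDS_HZ = {
    "TLF": (0.04, 0.10),
    "MF": (0.10, 0.15),
    "LF": (0.04, 0.15),
    "HF": (0.15, 0.40),
}


def periodogram(x, fs):
    """Periodogram of a uniformly sampled series (naive DFT, fine for short epochs)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]  # remove the DC component
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power


def band_powers(x, fs, bands=BANDS_HZ):
    """Integrate the periodogram over each band (rectangle rule)."""
    freqs, power = periodogram(x, fs)
    df = fs / len(x)
    return {
        name: sum(p for f, p in zip(freqs, power) if lo <= f < hi) * df
        for name, (lo, hi) in bands.items()
    }
```

Note that resolving the lower edge of the LF band (0.04 Hz) requires a window of at least 25 s, so a single 30 s epoch gives only a coarse estimate of LF power.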

Nonlinear Features
The information captured by frequency-domain and time-domain features is still limited, so this experiment introduced several nonlinear feature extraction methods to further characterize the PPG signal. Since each sleep-stage epoch of the PPG signal lasts only 30 s, this paper adopted nonlinear feature extraction methods suitable for short-term PPG signals. These methods include approximate entropy (ApEn), sample entropy (SampEn), fuzzy entropy, permutation entropy, and the recurrence plot.
To address the difficulty of estimating entropy in chaotic systems, Pincus proposed approximate entropy (ApEn), an indicator of the complexity of a nonlinear time series [23,24]. The ApEn algorithm is as follows. Perform an m-dimensional reconstruction of a time series [u(1), u(2), . . . , u(N)] of length N obtained by sampling at equal time intervals. The reconstructed i-th vector is expressed as Equation (1):
$$X(i) = [u(i), u(i+1), \ldots, u(i+m-1)], \quad 1 \le i \le N-m+1. \tag{1}$$
For 1 ≤ i ≤ N − m + 1, count the vectors X(j) that lie within the threshold r of X(i), as in Formula (2), where usually r = 0.1~0.25 SD (SD is the standard deviation of the sequence):
$$C_i^m(r) = \frac{1}{N-m+1}\,\#\{\, j : d[X(i), X(j)] \le r \,\}, \tag{2}$$
where $d[X(i), X(j)] = \max_{0 \le k \le m-1} |u(i+k) - u(j+k)|$ represents the maximum distance between X(i) and X(j), and m is the pre-selected mode dimension. ApEn is calculated as:
$$\Phi^m(r) = \frac{1}{N-m+1}\sum_{i=1}^{N-m+1} \ln C_i^m(r), \qquad \mathrm{ApEn}(m, r, N) = \Phi^m(r) - \Phi^{m+1}(r).$$
In the calculation of ApEn, m is set to 2 and r to 0.15 SD.
SampEn also describes the complexity of a time series and is an improvement on the ApEn method [25]. It is applied in assessing the complexity of physiological time series and in diagnosing pathological states. The SampEn algorithm for an original time series [u(1), u(2), . . . , u(N)] is as follows. First, construct a set of m-dimensional vectors X(i) = [u(i), u(i + 1), . . . , u(i + m − 1)] from the original time series. For 1 ≤ i ≤ N − m + 1, count the vectors X(j), j ≠ i, that lie within r of X(i) [26][27][28]:
$$B_i^m(r) = \frac{1}{N-m}\,\#\{\, j \ne i : d[X(i), X(j)] \le r \,\}.$$
Define the function:
$$B^m(r) = \frac{1}{N-m+1}\sum_{i=1}^{N-m+1} B_i^m(r).$$
Then define another function with k = m + 1:
$$B^k(r) = \frac{1}{N-k+1}\sum_{i=1}^{N-k+1} B_i^k(r).$$
For a finite dataset, SampEn is estimated as:
$$\mathrm{SampEn}(m, r, N) = -\ln\frac{B^{m+1}(r)}{B^m(r)}.$$
Fuzzy entropy has a physical meaning similar to that of ApEn and SampEn: it measures the probability that a new pattern is generated. The larger the value, the greater the probability that a new pattern appears, and hence the greater the complexity of the sequence. The fuzzy entropy algorithm for an original time series [u(1), u(2), . . . , u(N)] is as follows [29,30]. First, construct a set of m-dimensional vectors from the original time series, each with its own mean removed:
$$X(i) = [u(i), u(i+1), \ldots, u(i+m-1)] - \frac{1}{m}\sum_{j=0}^{m-1} u(i+j), \quad i = 1, 2, \ldots, N-m+1.$$
Then introduce a fuzzy membership function μ, which maps the distance between two vectors to a degree of similarity (a common choice is an exponential function of the distance and r):
$$D_{ij}^m = \mu(d_{ij}^m, r), \qquad \phi^m(r) = \frac{1}{N-m+1}\sum_{i=1}^{N-m+1}\left(\frac{1}{N-m}\sum_{j \ne i} D_{ij}^m\right).$$
Therefore, the fuzzy entropy of the original time series is:
$$\mathrm{FuzzyEn}(m, r) = \lim_{N \to \infty}\left[\ln \phi^m(r) - \ln \phi^{m+1}(r)\right].$$
For a finite dataset, the fuzzy entropy is estimated as:
$$\mathrm{FuzzyEn}(m, r, N) = \ln \phi^m(r) - \ln \phi^{m+1}(r),$$
where $d_{ij}^m = d[X(i), X(j)]$ represents the maximum absolute distance between the window vectors X(i) and X(j), r is the similarity tolerance, and m is the preselected pattern dimension. The paper takes m = 2 and r = 0.15 SD.
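The ApEn and SampEn procedures described above can be sketched in plain Python as follows. The defaults m = 2 and r = 0.15 SD follow the text. The function names are hypothetical, and the SampEn routine uses the common convention of N − m templates for both vector lengths, which is an assumption and may differ slightly from the normalisation used by the authors.

```python
import math


def _std(u):
    """Population standard deviation of a sequence."""
    mean = sum(u) / len(u)
    return math.sqrt(sum((v - mean) ** 2 for v in u) / len(u))


def _embed(u, m):
    """All length-m windows X(i) = [u(i), ..., u(i+m-1)]."""
    return [u[i:i + m] for i in range(len(u) - m + 1)]


def _chebyshev(a, b):
    """Maximum absolute component-wise distance d[X(i), X(j)]."""
    return max(abs(x - y) for x, y in zip(a, b))


def approximate_entropy(u, m=2, r_factor=0.15):
    """ApEn(m, r, N) = Phi^m(r) - Phi^(m+1)(r); self-matches are counted."""
    r = r_factor * _std(u)

    def phi(dim):
        vecs = _embed(u, dim)
        total = 0.0
        for a in vecs:
            count = sum(1 for b in vecs if _chebyshev(a, b) <= r)
            total += math.log(count / len(vecs))
        return total / len(vecs)

    return phi(m) - phi(m + 1)


def sample_entropy(u, m=2, r_factor=0.15):
    """SampEn = -ln(A / B); self-matches are excluded."""
    r = r_factor * _std(u)
    n_templates = len(u) - m  # same number of templates for lengths m and m + 1

    def matches(dim):
        vecs = _embed(u, dim)[:n_templates]
        return sum(
            1
            for i, a in enumerate(vecs)
            for j, b in enumerate(vecs)
            if i != j and _chebyshev(a, b) <= r
        )

    b, a = matches(m), matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined when no template matches are found
    return -math.log(a / b)
```

Both routines are quadratic in the number of samples, so for long epochs at high sampling rates the signal would normally be downsampled or the pairwise comparison vectorised.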
Like ApEn, SampEn, and fuzzy entropy, permutation entropy is an indicator of the complexity of a time series. The difference is that it introduces the idea of permutation when comparing reconstructed subsequences [31]. The permutation entropy algorithm for an original time series [u(1), u(2), . . . , u(N)] is as follows. First, phase space reconstruction of the time series (the phase space dimension is denoted as m) yields a matrix whose i-th row is the reconstructed vector X(i). Next, the elements of each row are sorted in ascending order, and if two values are equal they are ordered by their subscripts, so that each row is mapped to a sequence of symbols (its ordinal pattern). Finally, the relative frequency of each ordinal pattern over all rows is taken as its probability $p_j$, and the permutation entropy is the Shannon entropy of these probabilities, $-\sum_j p_j \ln p_j$.
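The steps above can be sketched as follows, assuming a unit time delay between the elements of each reconstructed vector (the text does not specify a delay) and using the natural logarithm. The function name is hypothetical.

```python
import math
from collections import Counter


def permutation_entropy(u, m=3):
    """Shannon entropy (in nats) of the ordinal patterns of length-m windows."""
    patterns = Counter(
        # Sort positions by value; ties are broken by the smaller subscript.
        tuple(sorted(range(m), key=lambda k: (u[i + k], k)))
        for i in range(len(u) - m + 1)
    )
    total = sum(patterns.values())
    return -sum((c / total) * math.log(c / total) for c in patterns.values())
```

A strictly increasing series produces a single ordinal pattern and therefore zero entropy, while the maximum possible value is ln(m!), reached when all m! patterns are equally likely.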
The recurrence plot is a tool for analyzing the periodicity and nonlinearity of a time series, and it can reveal the internal structure of the series. The recurrence plot was used for experimental data by Eckmann et al. [32,33], and it is defined as follows:
$$R_{ij} = \Theta\big(\varepsilon - \|X(i) - X(j)\|\big), \quad i, j = 1, \ldots, N,$$
where Θ is the Heaviside step function, ε is a distance threshold, and N is the number of reconstructed vectors. The recurrence rate (RR) is the density of recurrence points in the plot, that is, the proportion of black points among all points; determinism (DET) is the percentage of recurrence points that form diagonal lines (the proportion of black points lying on line segments parallel to the main diagonal). The measures are defined as follows:
$$RR = \frac{1}{N^2}\sum_{i,j=1}^{N} R_{ij}, \qquad DET = \frac{\sum_{l=l_{\min}}^{N} l\,P(l)}{\sum_{l=1}^{N} l\,P(l)},$$
where P(l) is the number of diagonal lines of length l and $l_{\min}$ is the minimum length for a segment to count as a line.
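A small sketch of the two recurrence measures is given below. For brevity it works on a scalar series without embedding and excludes the main diagonal when computing determinism. Both simplifications, the minimum line length of 2, and the function names are assumptions rather than details taken from the paper.

```python
def recurrence_matrix(u, eps):
    """R[i][j] = 1 when |u[i] - u[j]| <= eps (scalar series, no embedding)."""
    n = len(u)
    return [[1 if abs(u[i] - u[j]) <= eps else 0 for j in range(n)] for i in range(n)]


def recurrence_rate(R):
    """Fraction of recurrence points in the whole plot."""
    n = len(R)
    return sum(map(sum, R)) / (n * n)


def determinism(R, l_min=2):
    """Fraction of off-diagonal recurrence points lying on diagonal runs >= l_min."""
    n = len(R)
    in_lines = 0
    total = 0
    for k in range(-(n - 1), n):
        if k == 0:
            continue  # skip the main diagonal (line of identity)
        diag = [R[i][i + k] for i in range(max(0, -k), min(n, n - k))]
        total += sum(diag)
        run = 0
        for v in diag + [0]:  # trailing 0 flushes the last run
            if v:
                run += 1
            else:
                if run >= l_min:
                    in_lines += run
                run = 0
    return in_lines / total if total else 0.0
```

For a strictly periodic series every recurrence point lies on a long diagonal, so determinism approaches 1, whereas isolated recurrence points, as produced by noise, lower it.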

Summary of PPG Features
A total of 27 features were explored and are summarized in Table 3.

Classification Procedures
Following the completion of the above preparation phase, the feature dataset was standardized and subjected to leave-one-out cross-validation. Afterward, the datasets were classified by using the LightGBM and the sleep staging process was performed.
LightGBM is a gradient boosting algorithm based on the Gradient Boosting Decision Tree (GBDT) [34]. Its main improvements include a histogram algorithm and a leaf-wise tree growth strategy with a depth limit. Because the decision tree sub-models in LightGBM split nodes leaf-wise, its computational cost is small compared with XGBoost [35]. The depth of the tree and the minimum number of samples in each leaf node must be controlled to avoid overfitting. The histogram-based decision tree algorithm divides the feature values into multiple bins and then searches for the optimal split over those bins, thereby reducing storage and computational costs. This enhances robustness to noise while maintaining good accuracy and training speed. LightGBM was proposed to solve the problems that GBDT encounters with massive data, so that GBDT can be better applied in practice.
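First, given the training set $X = \{(x_i, y_i)\}_{i=1}^{n}$, the purpose of the LightGBM algorithm is to find an approximation $\hat{p}(x)$ that is as close as possible to $p^*(x)$ and minimizes the expected value of a specific loss function L(y, p(x)):
$$\hat{p}(x) = \arg\min_{p} E_{y,X}\, L(y, p(x)).$$
LightGBM integrates T regression trees to approximate the final model:
$$F_T(X) = \sum_{t=1}^{T} p_t(X).$$
Each regression tree can be represented as $w_{q(x)}$, q ∈ {1, 2, . . . , J}, where J is the number of leaves, q is the decision rule of the tree, and w is the vector of leaf weights. Hence, LightGBM is trained additively, and at step t the objective is:
$$\phi_t = \sum_{i=1}^{n} L\big(y_i, F_{t-1}(x_i) + p_t(x_i)\big). \tag{21}$$
The objective function is rapidly approximated with Newton's method, that is, a second-order Taylor expansion. For the sake of simplicity, the constant term in (21) is removed and the formula becomes:
$$\phi_t \cong \sum_{i=1}^{n}\Big(g_i\, p_t(x_i) + \tfrac{1}{2} h_i\, p_t^2(x_i)\Big), \tag{22}$$
where $g_i$ and $h_i$ are the first-order and second-order gradient statistics of the loss function. Let $I_j$ denote the sample set of leaf j; then (22) can be rewritten as:
$$\phi_t = \sum_{j=1}^{J}\Big(\big(\sum_{i \in I_j} g_i\big) w_j + \tfrac{1}{2}\big(\sum_{i \in I_j} h_i\big) w_j^2\Big).$$
For a tree structure q(x), the optimal weight $w_j^*$ of each leaf node and the extreme value $\phi_t^*$ can be solved as:
$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i}, \qquad \phi_t^* = -\frac{1}{2}\sum_{j=1}^{J}\frac{\big(\sum_{i \in I_j} g_i\big)^2}{\sum_{i \in I_j} h_i}.$$
$\phi_t^*$ serves as a scoring function for the tree structure q, so the gain obtained by splitting a node with sample set I can be expressed as:
$$G = \frac{1}{2}\left(\frac{\big(\sum_{i \in I_L} g_i\big)^2}{\sum_{i \in I_L} h_i} + \frac{\big(\sum_{i \in I_R} g_i\big)^2}{\sum_{i \in I_R} h_i} - \frac{\big(\sum_{i \in I} g_i\big)^2}{\sum_{i \in I} h_i}\right),$$
where $I_L$ and $I_R$ are the sample sets of the left and right branches. LightGBM grows trees leaf-wise (vertically), which is more efficient when processing large amounts of data.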
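The role of the first-order and second-order statistics in choosing a split can be illustrated with a few lines of Python. This is a sketch of the unregularised Newton-step formulas only, not of LightGBM's actual implementation, which additionally uses histograms, regularisation terms, and many other refinements. The function names are hypothetical.

```python
def leaf_weight(g, h):
    """Optimal leaf weight w* = -sum(g) / sum(h) for the samples in one leaf."""
    return -sum(g) / sum(h)


def leaf_score(g, h):
    """Structure score (sum g)^2 / (sum h) of one leaf."""
    return sum(g) ** 2 / sum(h)


def split_gain(g, h, pos):
    """Gain of splitting the ordered samples into [0:pos] (left) and [pos:] (right)."""
    left = leaf_score(g[:pos], h[:pos])
    right = leaf_score(g[pos:], h[pos:])
    return 0.5 * (left + right - leaf_score(g, h))


def best_split(g, h):
    """Split position with the largest gain; samples are assumed sorted by the feature."""
    return max(range(1, len(g)), key=lambda pos: split_gain(g, h, pos))
```

For the squared-error loss the statistics are g_i = F(x_i) − y_i and h_i = 1, so the optimal leaf weight reduces to the mean residual of the samples in that leaf.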

Decision Mechanism
In machine learning, accuracy is the most basic evaluation indicator for classification, but it cannot fully characterize model performance. To assess the performance of the sleep staging algorithm objectively and comprehensively, the confusion matrix, recall rate, and F1 score were also used as evaluation criteria. In addition to these criteria, Cohen's kappa coefficient was used to measure the agreement between the sleep staging results and the expert annotations [36].
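All of these indicators can be derived from the confusion matrix. The sketch below assumes that rows hold the expert (true) stages and columns hold the predicted stages; the function name and the returned dictionary keys are hypothetical.

```python
def evaluate(cm):
    """Accuracy, per-class recall and F1, and Cohen's kappa from a confusion matrix.

    cm[i][j] is the number of epochs of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(map(sum, cm))
    true_counts = [sum(row) for row in cm]
    pred_counts = [sum(cm[i][j] for i in range(n)) for j in range(n)]

    accuracy = sum(cm[i][i] for i in range(n)) / total
    recall = [cm[i][i] / true_counts[i] if true_counts[i] else 0.0 for i in range(n)]
    precision = [cm[i][i] / pred_counts[i] if pred_counts[i] else 0.0 for i in range(n)]
    f1 = [2 * p * r / (p + r) if p + r else 0.0 for p, r in zip(precision, recall)]

    # Agreement expected by chance, from the marginal distributions.
    expected = sum(t * p for t, p in zip(true_counts, pred_counts)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {"accuracy": accuracy, "recall": recall, "f1": f1, "kappa": kappa}
```

Kappa corrects the observed agreement for the agreement expected by chance, which is why it is more informative than accuracy when the sleep stages are unevenly represented.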

Results and Discussion
This study used PPG data extracted from the CAP sleep database, derived from 27 subjects, 4 of whom were healthy, 5 had insomnia, 10 had nocturnal frontal lobe epilepsy, and 8 had REM behavior disorder. First, 27,333 epochs were obtained after data preprocessing, including data cleaning, filtering, and denoising. To make the final result more reliable, this study balanced the classes according to the multi-class staging criteria.
After preprocessing, features were extracted from the time domain, frequency domain, and nonlinear domain, giving 21 features in total. After the effective features were determined, the feature dataset was normalized and divided using 10-fold cross-validation. Finally, the training set was used to train the LightGBM classifier for sleep staging, the validation set was used to tune the model, and the test set was used to evaluate the model according to the above evaluation criteria.
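The 10-fold division can be sketched as follows. This version shuffles individual epochs, which is an assumption: the paper does not say whether epochs from the same subject may appear in both the training and test folds. The function name and the fixed seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

When the aim is to estimate performance on unseen people, grouping the folds by subject rather than by epoch is a common alternative, because neighbouring epochs from one recording are strongly correlated.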
In this experiment, four evaluation indexes, namely accuracy, recall rate, F1 score, and Cohen's kappa statistic k, were used to evaluate the performance of the model. The results of multi-class sleep staging on the test dataset are shown in Table 4. The accuracy of 3-class staging was higher than 86%, and Cohen's kappa statistic k was higher than 0.79, which is highly similar to the expert scoring results. The results of 4-class and 5-class staging were slightly inferior to those of 3-class staging, but the accuracy was still higher than 72% and Cohen's kappa statistic k was still higher than 0.6, which basically meets the accuracy requirements of most sleep staging scenarios. These results show that the PMSS method still falls short of multi-class sleep staging based on EEG signals. However, compared with sleep classification using other single-channel physiological signals, such as ECG and respiratory signals, the accuracy was significantly improved.
Table 4. Confusion matrix and evaluation indexes of multi-class sleep staging.
The test dataset was a mixture of data from healthy subjects, insomnia subjects, nocturnal frontal lobe epilepsy subjects, and REM behavior disorder subjects. To verify the ability of the model to classify subjects with sleep disorders, 4-class sleep staging was performed separately on the data of subjects in each of the four health conditions. The results are shown in Figure 3. Sleep staging performed best for healthy subjects, with an accuracy of more than 80%. For subjects with sleep disorders the performance decreased, but the consistency remained above 0.60. It can be concluded that the model is still suitable for sleep staging of subjects with sleep disorders.
Compared with recent articles that use PPG signals for sleep classification, this experiment uses only single-channel PPG signals as classification data and does not require the assistance of other signals, which greatly reduces the impact of the experiment on the subjects' natural sleep. Judging from the classification results, the accuracy and Cohen's kappa statistic k of 3-class, 4-class, and 5-class sleep staging in this experiment are higher than those of current studies of sleep classification using PPG signals. Moreover, the sleep classification model generalizes well and can meet the sleep staging needs of patients with sleep disorders. Nevertheless, this experiment still encountered some problems. The collection of PPG signals is based on the principle of light reflection. Therefore, when this method is used under strong ambient light, large errors will occur, which is a problem to be solved in the next step.

Conclusions
The PMSS method was proposed to stage sleep using only the PPG signal. It can achieve 3-class, 4-class, and 5-class sleep staging, and the results are highly consistent with manual sleep staging conducted by several experts based on PSG signals. The reason is that HRV and SpO2 signals can be extracted from PPG signals, so PPG carries the information of these signals, which many scholars have recognized as suitable for sleep staging. PPG signals are also readily obtained in sleep staging experiments. In addition, the method achieves consistent results on PPG data from subjects with sleep disorders. Therefore, this study concludes that the PMSS method generalizes and can be applied to home sleep monitoring for both patients with sleep disorders and healthy subjects, greatly reducing human resource consumption and the impact on the subject during sleep monitoring. As a next step, considering the accuracy of this method, we will try to apply it to the diagnosis of sleep disorders.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
Publicly available datasets were analyzed in this study. This data can be found here: https://www.physionet.org/content/capslpdb/1.0.0/.