1. Introduction
With the widespread use of wearable sensors, heartbeat and physical activity data are accumulating and forming physiological big data. To utilize these data for self-healthcare, the detection of sleep periods and the assessment of sleep quality are important issues. Although many earlier studies have reported algorithms and methods for this purpose, most of them used either heartbeat or actigraphic data [1,2,3,4,5,6,7,8], and only a few reported the combined use of both in a small population [9]. Because many recent wearable electrocardiogram (ECG)/pulse wave sensors are also equipped with triaxial accelerometers, algorithms utilizing both data modalities seem useful. In addition, the sample sizes of earlier studies were small compared to the number of features used for sleep stage classification. Thus, their performance when applied to real-world data from a large population is unknown.
In the present study, we collected ECG and actigraphic data during polysomnography (PSG) in 289 subjects. We examined the performance of sleep stage classification by the combination of heart rate variability (HRV)-derived autonomic indices and actigraphic parameters by comparing it with the sleep stage classification determined by PSG. The state of autonomic function is thought to differ between non-rapid-eye-movement (NREM) sleep and waking/rapid-eye-movement (REM) sleep [10,11,12,13], while it is similar between waking and REM sleep [8]. In contrast, the magnitude of body movement (BM) differs between waking and REM sleep because BM is suppressed during REM sleep. Thus, we evaluated the classification performance separately for the discrimination of NREM sleep from waking/REM sleep and of waking from REM sleep.
2. Methods
The present study was performed according to the protocol that was approved by the Institutional Review Board of Nagoya City University Graduate School of Medical Sciences and Nagoya City University Hospital (No. 60160097).
2.1. Data Collection
We studied ECG and wrist actigraphic data recorded simultaneously with overnight PSG in 289 subjects (199 males and 90 females, median age (interquartile range (IQR)) 52 (37–65) years). They were consecutive subjects whose ECG showed sinus rhythm. The PSG examinations were performed at the Sleep Disorder Center, Aichi Medical University Hospital (Nagakute city, Aichi, Japan) for 168 subjects and at the Gifu Mates Sleep Clinic (Gifu city, Gifu, Japan) for 121 subjects. The subjects comprised 36 normal subjects, 168 with sleep apnea syndrome, 31 with REM sleep behavior disorder, 23 with insomnia, 14 with hypersomnia, 12 with narcolepsy, and 5 with restless legs syndrome. None of the subjects had heart disease or atrial fibrillation.
The ECG was recorded with a modified V2 lead and sampled at 500 Hz. Actigraphic data were collected by a wrist actigraphic sensor (HFM-3D Prototype, Suzuken Co., Nagoya, Japan), which digitized and stored triaxial acceleration data at 31.25 Hz.
Sleep stages were scored per 30-s window according to the American Academy of Sleep Medicine (AASM) guidelines by registered polysomnographic technicians at each center. Then, six consecutive windows were grouped into one epoch (length, 3 min). The stage of an epoch was defined as the majority stage among its six windows, i.e., if ≥4 windows had the same stage, that stage was assigned to the epoch. If there was no majority stage among the six windows, the epoch was defined as a transitional stage and excluded from the analyses. This exclusion criterion is a mitigated form of the criterion used in earlier studies [1].
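The epoch labeling rule described above can be sketched as follows. This is a minimal illustration, not the scoring software used in the study: the stage labels, the function names, and the handling of an incomplete trailing epoch are assumptions, and the majority threshold is taken to be at least four of six windows.

```python
from collections import Counter

WINDOWS_PER_EPOCH = 6  # six 30-s windows form one 3-min epoch
MAJORITY = 4           # assumed threshold: at least 4 of 6 windows share a stage


def epoch_stage(window_stages, majority=MAJORITY):
    """Return the majority stage of one epoch, or 'transitional' if none exists."""
    if len(window_stages) != WINDOWS_PER_EPOCH:
        raise ValueError("an epoch must contain exactly six windows")
    stage, count = Counter(window_stages).most_common(1)[0]
    return stage if count >= majority else "transitional"


def score_epochs(stages):
    """Group consecutive 30-s window stages into 3-min epochs.

    An incomplete trailing epoch is dropped (an assumption of this sketch).
    """
    n_epochs = len(stages) // WINDOWS_PER_EPOCH
    return [
        epoch_stage(stages[i * WINDOWS_PER_EPOCH:(i + 1) * WINDOWS_PER_EPOCH])
        for i in range(n_epochs)
    ]
```

For example, an epoch with four windows of one stage and two of another takes the former stage, whereas a three-to-three split is labeled transitional.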
2.2. Data Analysis
The PSG, ECG, and actigraphic data were synchronized by entering a time marker into the PSG and ECG recordings when the actigraphic recording was started. From the ECG data, all R waves were detected and labeled for the rhythm of the beat (sinus, ventricular ectopic, supraventricular ectopic, noise, etc.), and R-R interval time series were generated. The R-R interval time series and actigraphic data were divided into consecutive 3-min epochs. For each epoch, HRV was analyzed by methods reported elsewhere [14]. Briefly, R-R intervals were interpolated by a step function using only consecutive sinus rhythm (normal-to-normal) R-R intervals (NN intervals). Then, the interpolated data were resampled equidistantly in time, filtered with a Hanning window, and analyzed by fast Fourier transformation. After correcting for the loss of variance resulting from the Hanning window, the obtained power spectra were integrated to obtain the power of the very low frequency (VLF, 0.003–0.04 Hz), low frequency (LF, 0.04–0.15 Hz), and high frequency (HF, 0.15–0.45 Hz) components. According to the guideline for heart rate variability analysis [15], the power of these components was transformed into logarithmic values. The ratio of LF to HF power (LF/HF) was also calculated.
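The spectral analysis steps outlined above can be sketched for a single 3-min epoch as follows. This is an illustration under stated assumptions rather than the implementation of [14]: the 2-Hz resampling rate, the removal of the mean before windowing, the use of the natural logarithm, and the function and argument names are all assumptions, and the handling of ectopic beats and noise is omitted.

```python
import numpy as np

# Band limits (Hz) as given in the text
BANDS = {"VLF": (0.003, 0.04), "LF": (0.04, 0.15), "HF": (0.15, 0.45)}


def hrv_indices(beat_times, nn_intervals, fs=2.0, duration=180.0):
    """Log band powers and LF/HF for one epoch.

    beat_times: time (s) at which each NN interval ends.
    nn_intervals: NN intervals (ms).
    fs: resampling rate in Hz (assumed; not stated in the text).
    """
    beat_times = np.asarray(beat_times, dtype=float)
    nn = np.asarray(nn_intervals, dtype=float)
    t = np.arange(0.0, duration, 1.0 / fs)  # equidistant time grid
    # Step interpolation: each sample takes the NN interval in progress at time t
    idx = np.clip(np.searchsorted(beat_times, t, side="left"), 0, len(nn) - 1)
    x = nn[idx]
    x = x - x.mean()  # remove the mean (assumed preprocessing step)
    w = np.hanning(len(x))
    spectrum = np.fft.rfft(x * w)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # One-sided power spectral density (ms^2/Hz); dividing by sum(w^2)
    # compensates for the variance lost to the window
    psd = 2.0 * np.abs(spectrum) ** 2 / (fs * np.sum(w ** 2))
    df = freqs[1] - freqs[0]
    power = {
        name: psd[(freqs >= lo) & (freqs < hi)].sum() * df
        for name, (lo, hi) in BANDS.items()
    }
    indices = {name: float(np.log(p)) for name, p in power.items()}
    indices["LF/HF"] = float(power["LF"] / power["HF"])  # ratio of raw powers
    return indices
```

With a 3-min epoch the frequency resolution is 1/180 Hz, so the VLF band contains only a few spectral bins in this sketch.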
For the actigraphic data in each 3-min epoch, acceleration signals in the x, y, and z axes were bandpass filtered (0.02–0.08 Hz) to remove baseline trend and high-frequency noise, and were composed into the magnitude, Act(t), with the following equation.
$$\mathrm{Act}(t) = \sqrt{x(t)^2 + y(t)^2 + z(t)^2}$$
Here, x(t), y(t), and z(t) denote the filtered acceleration signals in the respective axes. Then, the average, median, maximum, and upper 95% values of Act(t) during each 3-min epoch were calculated as the corresponding indices of BM.
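The BM indices for one epoch can be sketched as follows. Two assumptions are made here: the magnitude Act(t) is taken as the Euclidean norm of the three filtered axes, and the "upper 95% value" is interpreted as the 95th percentile. The function name and dictionary keys are hypothetical, and the bandpass filtering step is assumed to have been applied beforehand.

```python
import numpy as np


def bm_indices(ax, ay, az):
    """Body movement indices for one epoch.

    ax, ay, az: bandpass-filtered acceleration samples of the x, y, and z axes
    (at 31.25 Hz, a 3-min epoch holds 5625 samples per axis).
    """
    ax, ay, az = (np.asarray(a, dtype=float) for a in (ax, ay, az))
    # Assumed composition: Euclidean norm of the three axes
    act = np.sqrt(ax ** 2 + ay ** 2 + az ** 2)
    return {
        "average": float(act.mean()),
        "median": float(np.median(act)),
        "maximum": float(act.max()),
        # "upper 95% value" interpreted as the 95th percentile (assumption)
        "upper95": float(np.percentile(act, 95)),
    }
```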
2.3. Statistical Analysis
We used the Statistical Analysis System (SAS Institute, Cary, NC, USA) for the statistical analyses. We evaluated the performance of classification between NREM sleep and waking/REM sleep and between waking and REM sleep by multivariate logistic regression analyses, in which heart rate (HR), VLF, LF, HF, LF/HF, and the average, median, maximum, and upper 95% values of BM were candidate independent variables, and regression models were generated by stepwise variable selection. We evaluated the discriminative performance of each model by the area under the curve (AUC) of a receiver-operating characteristic (ROC) curve analysis and, using the best cutoff criteria, calculated the sensitivity, specificity, and accuracy (fraction of correctly classified epochs). We also computed Cohen’s kappa statistic of agreement. A p value < 0.05 was considered statistically significant.
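The performance measures named above can be computed from a 2 × 2 classification table as in the following sketch. It is not the SAS procedure used in the study; the function names are hypothetical, and the AUC is computed here by its rank-based (Mann-Whitney) definition, which is one of several equivalent ways to obtain it.

```python
def binary_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, accuracy, and Cohen's kappa from a 2 x 2 table."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal proportions
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((fp + tn) / n) * ((fn + tn) / n)
    p_chance = p_pos + p_neg
    kappa = (accuracy - p_chance) / (1.0 - p_chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


def auc(scores_positive, scores_negative):
    """AUC as the probability that a positive case outscores a negative one."""
    wins = 0.0
    for p in scores_positive:
        for q in scores_negative:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half
    return wins / (len(scores_positive) * len(scores_negative))
```

Because kappa corrects the accuracy for chance agreement, it depends on the proportions of the two classes as well as on sensitivity and specificity.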
3. Results
A total of 40,643 epochs (length, 3 min) were obtained (Table 1). After excluding the 12,337 epochs with a transitional stage, 28,306 were used for the analyses.
Figure 1 shows the indices of HRV and BM by sleep stages. HR and VLF were smaller and HF was greater during the NREM stage. All indices of BM were greater during the waking stage.
The logistic regression analysis revealed that the NREM stage was discriminated from the waking and REM stages by upper 95% BM, HF, VLF, median BM, and HR (Table 2) and that the REM stage was discriminated from the waking stage by upper 95% BM, VLF, average BM, and HR (Table 3). The ROC curve analysis showed that the models discriminated the NREM stage with an AUC of 0.830, 76.9% sensitivity, 74.5% specificity, 75.8% accuracy, and a Cohen’s kappa statistic of 0.514 and discriminated the REM stage with an AUC of 0.820, 77.2% sensitivity, 72.3% specificity, 74.5% accuracy, and a kappa statistic of 0.491 (Table 4).
4. Discussion
In this study, we demonstrated the performance of sleep stage classification by the indices obtained from HR and actigraphic signals. We studied 40,643 epochs (length, 3 min) of PSG data in 289 subjects. We observed that the NREM stage was discriminated from the waking/REM stages by the combination of HF, VLF, HR, and BM indices with 75.8% accuracy and kappa = 0.514, and that REM sleep was discriminated from waking by the combination of VLF, HR, and BM indices with 74.5% accuracy and kappa = 0.491. These observations indicate that the combined use of autonomic functional indices from HRV and BM indices derived from actigraphic data is useful for discriminating the NREM stage from the REM/waking stages and for discriminating the REM stage from the waking stage.
Many earlier studies have reported the performance of sleep stage classification by HRV indices [1,2,3,8]. Penzel et al. [1] studied 78 subjects undergoing PSG. They analyzed HRV indices (mean R-R interval, SD of R-R interval, VLF, LF, HF, and LF/HF) and scaling exponents for consecutive 5-min segments whose sleep stage was defined as the stage that continued for >3 min in the segment. They reported 85% accuracy for sleep stage separation by a discriminant model consisting of the mean and SD of R-R intervals and scaling exponents. Using 18 PSG recordings in the MIT/BIH Polysomnographic Database, Adnane et al. [2] reported 80% accuracy and a Cohen’s kappa of 0.41 for the classification of wake and sleep in 30-s epochs with 10 features of HRV and heart rate dynamics selected by a support vector machine recursive feature elimination method. Fonseca et al. [3], in a study of 48 PSG recordings in healthy adults, reported 80% accuracy and a kappa of 0.56 in the classification of 30-s epochs into wake, NREM sleep, and REM sleep with 80 features selected from 142 features of ECG and thoracic respiratory effort. Additionally, Aktaruzzaman et al. [8], in a study of 20 PSG recordings, reported 84% and 71% accuracy and kappa values of 0.68 and 0.45 in the classification of 5-min segments for NREM versus REM and for sleep versus wakefulness, respectively, with 4 features selected from 12 HRV features. Although the discriminant accuracy that we observed in this study seems comparable to those reported in these earlier studies, it should be noted that these earlier studies used a greater number of HRV indices in a smaller sample of subjects than ours. Thus, the classification models obtained in these previous studies have a higher risk of overfitting to the data used for the analysis, which could result in lower performance when they are applied to new data from other groups of subjects.
There are also many earlier studies reporting on the sleep-wakefulness classification performance of BM assessed by actigraphy [4,5,6]. In a study of 41 subjects undergoing a PSG examination, Cole et al. [4] reported that the sleep period was distinguished by wrist actigraphy with 88% accuracy. In a study of 100 sleep-disordered patients, Kushida et al. [5] reported that a combination of actigraphy and subjective reports yielded total sleep time and sleep efficiency that did not differ significantly from those obtained by PSG. Long et al. [6] reported 95.7% accuracy and a kappa of 0.59 for a sleep-wakefulness classification by a combination of actigraphy and respiration in a study of 15 healthy subjects. Additionally, Aktaruzzaman et al. [9] have recently reported classification performance by a combination of actigraphy and HRV in 18 subjects with no previous history of sleep disorders. They found that sleep and waking were distinguished with 78% accuracy by four features derived only from wrist actigraphy, and that the addition of HRV features resulted in no significant improvement in classification performance. Although there is an apparent discrepancy between their findings and those of the present study, they reported on sleep-wakefulness classification in subjects without sleep disorders, while we performed sleep stage classification in subjects consisting mainly (88%) of patients with sleep disorders.
The present study is characterized by: (1) providing sleep stage classification models with the combined use of actigraphic and HR data; (2) proposing two-step approaches separately discriminating NREM sleep from waking/REM sleep and REM sleep from wakefulness; and (3) examining models with a large sample size. Our observations support the merit of the combined use of HR and actigraphic data for the estimation of sleep stage.
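How the two models could be chained into a two-step decision can be sketched as follows. This is an illustration only: the coefficients and cutoffs below are hypothetical placeholders, not the fitted values in Tables 2 and 3, and the cascade itself was not evaluated as a three-way classifier in this study. The variable names follow the predictors selected by the stepwise regression.

```python
import math

# HYPOTHETICAL coefficients for illustration only (not the fitted values)
NREM_MODEL = {"intercept": 3.0, "upper95_bm": -2.0, "hf": 1.0,
              "vlf": -1.0, "median_bm": -2.0, "hr": -0.05}
REM_MODEL = {"intercept": 1.0, "upper95_bm": -2.0, "vlf": 0.1,
             "average_bm": -3.0, "hr": -0.01}


def logistic_probability(model, features):
    """Probability from a logistic model given a dict of feature values."""
    z = model["intercept"] + sum(
        coef * features[name] for name, coef in model.items() if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))


def classify_epoch(features, cutoff_nrem=0.5, cutoff_rem=0.5):
    """Step 1: NREM versus waking/REM. Step 2: REM versus waking.

    The cutoffs are hypothetical; in practice they would be taken from the ROC analysis.
    """
    if logistic_probability(NREM_MODEL, features) >= cutoff_nrem:
        return "NREM"
    if logistic_probability(REM_MODEL, features) >= cutoff_rem:
        return "REM"
    return "waking"
```

In this arrangement the autonomic indices carry most of the first decision and the BM indices most of the second, which mirrors the physiological rationale for treating the two discriminations separately.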
This study has limitations. First, 30.4% of the 3-min epochs were defined as a transitional stage and were excluded from the analysis. The definition of the transitional stage could affect the performance of the classification. Second, we evaluated only two-way classification accuracies: NREM versus the REM and waking stages, and REM versus the waking stage. This is because we tried to characterize the parameters contributing to each classification from a physiological point of view. To determine the models’ practical usefulness, however, performance for three-way classification is important. Finally, the classification of sleep stage may be affected by sleep-disordered breathing. In particular, because episodes of sleep apnea are accompanied by cyclic variation of heart rate [16,17], further studies are required to assess the impact of sleep-disordered breathing on sleep stage classification by HRV indices.
5. Conclusions
We demonstrated the performance of sleep stage classification by the indices obtained from HR and actigraphic signals. In the analysis of 40,643 3-min epochs of PSG data in 289 subjects, NREM was discriminated from the waking/REM stages by the combination of HF, VLF, HR, and BM indices with 75.8% accuracy and kappa = 0.514, and REM sleep was discriminated from the waking stage by the combination of VLF, HR, and BM indices with 74.5% accuracy and kappa = 0.491. Our observations indicate that the combined use of autonomic functional indices from HRV and BM indices derived from actigraphic data is useful for discriminating NREM from the REM/waking stages and for discriminating REM from the waking stage.