Sleep Staging Method Based on Multimodal Physiological Signals Using Snake–ACO

Chu, Wenjing; Wang, Chen; Yang, Liuwang; Guo, Lin; Wu, Chuquan; Wang, Binhui; Wan, Xiangkui

doi:10.3390/app16031316


by Wenjing Chu ¹, Chen Wang ¹, Liuwang Yang ¹, Lin Guo ¹, Chuquan Wu ², Binhui Wang ¹ and Xiangkui Wan ¹,*

¹ Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology, Wuhan 430068, China

² Puleap (Wuhan) Medical Technology Co., Ltd., Wuhan 430068, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(3), 1316; https://doi.org/10.3390/app16031316

Submission received: 26 December 2025 / Revised: 22 January 2026 / Accepted: 26 January 2026 / Published: 28 January 2026


Abstract

Non-invasive electrocardiogram (ECG) and respiratory signals are easy to acquire with low-cost sensors, making them promising alternatives for sleep staging. However, existing methods based on these signals often yield insufficient accuracy. To address this challenge, we incrementally optimized the sleep staging model through a structured experimental workflow. We first preprocessed the respiratory and ECG signals and then extracted fused features using an enhanced feature selection technique, which both reduces redundant features and improves the class discriminability of the retained features. The resulting fused features serve as a reliable feature subset for the classifier. We also propose a hybrid optimization algorithm that integrates the snake optimization (SO) algorithm and the ant colony optimization (ACO) algorithm for automated hyperparameter optimization of support vector machines (SVMs). Experiments were conducted on two PSG-derived public datasets, the Sleep Heart Health Study (SHHS) and the MIT-BIH Polysomnography Database (MIT-BPD), to compare the classification performance of multimodal and single-modal features. Results demonstrate that bimodal staging using the SHHS ECG and respiratory signals significantly outperformed single-modal ECG-based staging, and the overall accuracy on the SHHS dataset improved by 12%. The SVM model optimized using the hybrid Snake–ACO algorithm achieved an average accuracy of 89.6% for wake versus sleep classification on the SHHS dataset, representing a 5.1% improvement over traditional grid search methods. In the subject-independent partitioning experiment, the wake versus sleep classification task remained stable, with only a 1.8% reduction in accuracy. This study provides novel insights for non-invasive sleep monitoring and clinical decision support.

Keywords:

sleep staging; multimodal physiological signals; data analysis; support vector machine; snake optimization

1. Introduction

Sleep is a vital physiological process essential for somatic restoration and health maintenance, occupying approximately one-third of an individual’s lifespan [1]. Beyond evaluating sleep quality, sleep staging serves as a diagnostic cornerstone for managing sleep disorders [2]. Polysomnography (PSG), recognized as the clinical gold standard for sleep monitoring [3], comprehensively records physiological signals during sleep, including electroencephalogram (EEG), electrocardiogram (ECG), and respiratory signals [4]. However, PSG’s dependence on EEG leads to complex setups, prolonged testing time, and high costs [5]. Researchers have therefore focused on ECG and respiratory signals, leveraging their convenience and cost-effectiveness for portable sleep staging solutions. According to the classification criteria of the American Academy of Sleep Medicine (AASM), sleep stages can be categorized into wakefulness, rapid eye movement (REM), and non-rapid eye movement (NREM) [6]. In sleep staging studies based on ECG and respiratory signals, several classification schemes are commonly adopted: the two-class scheme (Wake–Sleep), the three-class scheme (Wake–REM–NREM), and the four-class scheme (Wake–REM–N1N2–N3). These schemes capture the core structural features of sleep [7].

Sleep structure closely correlates with physiological parameters such as heart rate and respiratory patterns, and current signal acquisition methods are diverse and technically mature [8]. Xiao et al. [9] extracted 41 Heart Rate Variability (HRV) features from RR interval sequences derived from ECG signals and employed a Random Forest (RF) classifier for sleep staging. Wang et al. segmented ECG signals into 5 min epochs for HRV analysis, filtered 24 extracted HRV features using Sequential Forward Selection (SFS), and utilized a decision tree-based support vector machine (SVM) to classify Wake, REM, and NREM stages, achieving an accuracy of 73.51% [10]. Long et al. applied a linear discriminant (LD) classifier to sleep staging from the respiratory signals of 48 subjects, attaining an accuracy of 76.2% (Cohen’s Kappa = 0.45) [11]. Earlier, Redmond et al. [12] extracted 30 features from ECG and respiratory signals and employed a linear classifier for the Wake–REM–NREM classification, achieving an average accuracy of 76% (Cohen’s Kappa of 0.46 ± 0.10). Willemen et al. [13] segmented signals into 60 s intervals, extracted 375 features from cardiac, respiratory, and somatic signals, and achieved an average accuracy of 81% (Cohen’s Kappa of 0.62) for the Wake–REM–NREM classification. More recently, Sharan et al. [14] used ECG-derived instantaneous heart rate (IHR) and ECG-derived respiration (EDR) signals as inputs to a 1D convolutional neural network (CNN), achieving an average classification accuracy of 83.8% in two-stage staging. Si et al. [1] conducted a five-stage experiment on ECG and respiratory signals from 255 subjects using the U-sleep network, achieving an overall accuracy of 64.1%. Li et al. [15] built an automatic REM–NREM identification model for HRV signals using the SVM algorithm, achieving an accuracy of 79.83%.

While SVMs have demonstrated reliable performance in sleep staging tasks, most existing studies directly adopt default or empirically selected hyperparameters when deploying SVMs. This empirical hyperparameter setting, however, often fails to fully exploit the optimization potential of SVMs: on the one hand, inappropriate hyperparameter configurations may lead to overfitting or underfitting and even prevent the model from capturing key patterns in sleep-related features; on the other hand, mismatched parameter combinations can disrupt the kernel function’s ability to map raw features to a discriminative high-dimensional space, thereby weakening SVM’s performance in distinguishing between REM and NREM sleep stages. Unfortunately, few studies in current sleep staging research have focused on the targeted optimization of SVM hyperparameters—most either rely on manual tuning or a simple grid search, which is computationally expensive and inefficient for high-dimensional parameter spaces.

In clinical practice, non-invasive sleep monitoring using ECG and respiratory signals is critical for screening sleep disorders in primary care settings and home-based scenarios, as it avoids the high cost and operational complexity of PSG. Against this backdrop, this study focuses on the incremental optimization of the classical sleep staging workflow based on ECG and respiratory signals. We target two key bottlenecks in the conventional workflow: (1) redundant features with poor discriminability that reduce the efficiency and performance of the sleep staging model and (2) suboptimal hyperparameters of SVMs that restrict the model’s classification stability and generalization ability. To tackle these problems, we first adopt an enhanced feature selection technique based on the Maximum Relevance Minimum Redundancy criterion (mRMR) [16]. We then combine snake optimization (SO) and ant colony optimization (ACO) to propose the hybrid snake–ant colony optimization (Snake–ACO) algorithm for automated SVM hyperparameter tuning [17,18], aiming to further improve the model’s accuracy, stability, and generalizability for ECG–respiratory signal-based sleep staging.

2. Materials and Methods

In this study, we propose a sleep staging method based on multimodal feature fusion to address the limitations of conventional feature selection techniques and the sensitivity of SVM hyperparameters. We utilized ECG and respiratory signals sourced from publicly available databases. These signals underwent preprocessing, followed by feature extraction and analysis across temporal, spectral, and nonlinear domains. Feature selection then removed redundant features, and the retained features were input into the machine learning model for sleep classification. The overall structure of the proposed algorithm is presented in Figure 1.

2.1. Dataset

The experimental data used in this paper were obtained from two publicly available databases: the MIT-BIH Polysomnographic Database (MIT-BPD) [19] and the Sleep Heart Health Study (SHHS) [20,21]. The MIT-BPD database consists of eighteen records containing polysomnographic monitoring data from 16 subjects collected between 1987 and 2003. The subjects ranged in age from 32 to 56 years old, including healthy individuals and patients with sleep apnea, with a total duration exceeding 250 h. The SHHS is a multicenter cohort study conducted by the National Heart, Lung, and Blood Institute to identify disorders such as cardiovascular diseases resulting from sleep apnea. This database stores 6441 data records from 1995 to 1998 and 3295 data records from 2001 to 2003, with a typical sleep time of 7 to 8 h per record.

2.2. Preprocessing

ECG denoising and R-wave detection were performed with the Pan–Tompkins algorithm [22], which locates R waves through band-pass filtering, differentiation and squaring, and adaptive thresholding. Each RR interval was computed as the time between two adjacent R waves. To keep the RR interval sequence free of outliers that could affect the experimental results, a thresholding technique was used to exclude RR intervals that fall outside the typical range. Respiratory signals are often disturbed by motion artifacts and low-frequency drifts. We suppressed motion artifacts using adaptive filtering and eliminated high-frequency noise and baseline drifts using 0.1 to 0.5 Hz band-pass filtering.
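As a minimal sketch of the RR interval step described above, the following Python snippet derives RR intervals from R-peak timestamps and drops values outside a plausible range. The function names and the 0.3 s and 2.0 s bounds are illustrative assumptions; the paper does not state the thresholds it used.

```python
def rr_intervals(r_peak_times_s):
    """Return successive RR intervals (seconds) from R-peak timestamps."""
    return [t1 - t0 for t0, t1 in zip(r_peak_times_s, r_peak_times_s[1:])]


def drop_outlier_rr(rr_s, low_s=0.3, high_s=2.0):
    """Keep RR intervals inside [low_s, high_s].

    The bounds are assumed for illustration, not taken from the paper.
    """
    return [rr for rr in rr_s if low_s <= rr <= high_s]


# Hypothetical R-peak times: one spurious detection (1.70 s) and one long gap.
r_peaks = [0.00, 0.80, 1.62, 1.70, 2.45, 5.10, 5.95]
rr = rr_intervals(r_peaks)
clean_rr = drop_outlier_rr(rr)
```

In this toy sequence the 0.08 s interval caused by the spurious peak and the 2.65 s gap are both removed, leaving four physiologically plausible intervals.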

The signals used in the experiments were split into 30 s non-overlapping epochs, yielding time-aligned valid epochs of ECG and respiratory signals for subsequent feature extraction. A total of 10,154 epochs from the MIT-BPD database were used in the experiments. For the SHHS database, strict inclusion criteria were applied to subject selection: all subjects were healthy adults aged 22–65 years (no self-reported sleep disorders, cardiovascular diseases, or neurological diseases); the gender ratio was controlled at 1:1 to eliminate the potential impact of gender differences on cardiorespiratory signal characteristics; and all subjects completed a full-night PSG recording with signal loss not exceeding 5% of the total recording time. Two non-overlapping data subsets were employed for different experimental purposes:

The first subset (referred to as the “base validation set”) contains 15,531 epochs, comprising data from 20 subjects with a gender distribution of 10 males and 10 females, which was used to evaluate the model’s basic performance.

The second subset (referred to as the “robustness verification set”) consists of an additional 12,500 epochs, derived from 16 independent subjects (8 males and 8 females) who did not overlap with those in the base validation set. This subset was specifically used for independent robustness testing, to verify whether the model maintains stable performance when generalized to data from new, previously unseen subjects.

2.3. Feature Extraction

2.3.1. Feature Extraction of ECG Signals

HRV extracted based on RR interval is a highly valuable physiological indicator in sleep staging research. Its core lies in its ability to indirectly reflect the dynamic regulation of the autonomic nervous system (ANS) through subtle changes in heart rate fluctuations. Different sleep stages are accompanied by significant changes in the balance between the sympathetic and parasympathetic branches of the ANS, which directly regulate heart rate patterns.

In this study, 17 features were extracted from the RR interval sequence derived from the ECG. These include six HRV time-domain features, such as the standard deviation of RR intervals (SDNN) and the number of pairs of successive RR intervals that differ by more than 50 ms (NN50). The power spectral density of the RR intervals was obtained using the Fast Fourier Transform (FFT). This decomposes HRV into frequency bands that correspond to distinct branches of the ANS, enabling targeted capture of sleep-related physiological dynamics. Six frequency-domain features were extracted, including low-frequency (LF, 0.04–0.15 Hz) power and high-frequency (HF, 0.15–0.4 Hz) power, following some of the frequency-domain features in Reference [23]. Finally, five nonlinear features were extracted: four based on the Poincaré scatterplot, plus the HRV Triangular Index [24,25].

All features extracted from the ECG in this study are given in Table 1. Analyzing HRV characteristics quantifies ANS-mediated physiological changes and provides an objective, non-invasive basis for distinguishing different sleep stages. Some of the features were calculated as shown in the following equations:

SD1 = \sqrt{\frac{1}{2} \mathrm{Var}(RR_{n} - RR_{n+1})}

(1)

SD2 = \sqrt{2 \mathrm{Var}(RR_{n}) - \frac{1}{2} \mathrm{Var}(RR_{n} - RR_{n+1})}

(2)

\mathrm{HRV\ Triangular\ Index} = \frac{\text{Number of RR intervals}}{Y}

(3)

where RR_n denotes the RR interval corresponding to the n-th cardiac cycle, and Y represents the height of the RR-interval histogram, i.e., the number of intervals in its most populated (modal) bin.
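Equations (1) to (3) can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions: population variance is used for Var, the histogram bin width is 7.8125 ms (1/128 s, a common convention that the paper does not confirm), and the RR values are hypothetical.

```python
import math
from collections import Counter
from statistics import pvariance


def poincare_sd(rr_ms):
    """Return (SD1, SD2) following Eqs. (1) and (2), using population variance."""
    diffs = [a - b for a, b in zip(rr_ms, rr_ms[1:])]  # RR_n - RR_{n+1}
    var_diff = pvariance(diffs)
    var_rr = pvariance(rr_ms)
    sd1 = math.sqrt(0.5 * var_diff)
    # Clamp at zero: on very short series the term under the root can go negative.
    sd2 = math.sqrt(max(2.0 * var_rr - 0.5 * var_diff, 0.0))
    return sd1, sd2


def hrv_triangular_index(rr_ms, bin_ms=7.8125):
    """Eq. (3): number of RR intervals divided by the modal histogram bin count."""
    bins = Counter(int(rr // bin_ms) for rr in rr_ms)
    return len(rr_ms) / max(bins.values())


# Hypothetical RR intervals in milliseconds.
rr = [780, 790, 800, 810, 820, 810, 800, 790]
sd1, sd2 = poincare_sd(rr)
tri_index = hrv_triangular_index(rr)
```

A quick sanity check on the two Poincaré descriptors is that SD1² + SD2² equals 2·Var(RR) whenever the clamp is not triggered, which follows directly from adding Eqs. (1) and (2).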

2.3.2. Feature Extraction of Respiratory Signals

The respiratory signal is a core physiological marker for sleep staging. Its rhythm, depth, and regularity undergo distinct, stage-specific changes, driven by shifts in ANS balance and the activity of brainstem respiratory centers during sleep. Unlike ECG-derived HRV, respiratory signals directly capture the body’s ventilatory dynamics, providing complementary information for distinguishing sleep states.

Based on the time–frequency features of the respiratory signal, a total of 12 features were extracted in this paper, as shown in Table 2. Nine time-domain features are included, such as the mean, standard deviation, and interquartile range. Among the time-domain features, kurtosis and skewness describe the distribution of the respiratory signal and correlate strongly with sleep stages. They are calculated as follows:

Skewness = \frac{\frac{1}{N} \sum_{i=1}^{N} (x_{i} - \bar{x})^{3}}{\sigma^{3}}

(4)

Kurtosis = \frac{\frac{1}{N} \sum_{i=1}^{N} (x_{i} - \bar{x})^{4}}{\sigma^{4}} - 3

(5)

where x_i denotes the amplitude of the i-th sampling point of the respiratory signal, \bar{x} is the mean amplitude of the segment, N is the number of sampling points in the segment, and σ represents the sample standard deviation of the respiratory signal segment.
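The two shape statistics in Eqs. (4) and (5) can be computed directly from central moments. The sketch below assumes the 1/N (population) form of σ so that it matches the 1/N moments in the numerators; the function name and sample values are illustrative.

```python
import math


def skewness_kurtosis(x):
    """Return (skewness, excess kurtosis) per Eqs. (4) and (5).

    Sigma is computed with the 1/N normalisation, an assumption made here
    for consistency with the 1/N central moments.
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    sigma = math.sqrt(m2)
    return m3 / sigma ** 3, m4 / sigma ** 4 - 3.0


# A symmetric toy segment: skewness is 0 and excess kurtosis is negative.
skew, kurt = skewness_kurtosis([1, 2, 3, 4, 5])
```

For this symmetric segment the skewness is zero and the excess kurtosis is -1.3, reflecting a distribution flatter than a Gaussian.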

In the frequency domain, three features were extracted as key supplements to the time-domain measurements to further link respiratory dynamics with ANS regulation. These include low-frequency (0.04–0.15 Hz) power, high-frequency (0.15–0.4 Hz) power, and the low-frequency-to-high-frequency power ratio. The dynamic changes in these features can be used as an important basis for sleep staging and provide key information for the accurate discrimination of sleep stages.
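To illustrate the three frequency-domain features, the following sketch sums discrete Fourier transform power over the LF and HF bands of a single epoch. It uses a plain DFT from the standard library rather than an optimized FFT, and the 4 Hz sampling rate and synthetic test signal are assumptions chosen so that both tones fall on exact frequency bins.

```python
import cmath
import math


def band_power(signal, fs_hz, f_lo_hz, f_hi_hz):
    """Sum one-sided DFT power of the mean-removed signal over [f_lo_hz, f_hi_hz)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs_hz / n
        if f_lo_hz <= freq < f_hi_hz:
            coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                        for t, v in enumerate(centered))
            power += abs(coeff) ** 2 / n ** 2
    return power


# Hypothetical 30 s epoch at 4 Hz: a 0.3 Hz tone plus a weaker 0.1 Hz tone.
fs = 4.0
epoch = [math.sin(2 * math.pi * 0.3 * i / fs)
         + 0.5 * math.sin(2 * math.pi * 0.1 * i / fs)
         for i in range(int(30 * fs))]
lf = band_power(epoch, fs, 0.04, 0.15)
hf = band_power(epoch, fs, 0.15, 0.40)
lf_hf_ratio = lf / hf
```

Because the LF tone has half the amplitude of the HF tone, the resulting LF/HF power ratio is 0.25, the square of the amplitude ratio.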

2.4. Feature Selection

Feature selection is a critical step in the sleep staging process. However, the traditional mRMR method, while leveraging the mutual information between features and the target class, lacks an explicit and direct quantification of class separability. To address this limitation of mRMR and enhance the discriminative focus by explicitly characterizing class separability, this study develops an improved dynamically weighted dual-criteria mRMR (DW_mRMR) based on the mRMR framework as a feature fusion strategy. The core of the mRMR criterion is to maximize the relevance between features and the target category while minimizing the redundancy between features, and its objective function is defined as follows:

mRMR(f_{i}) = I(f_{i}; Y) - \frac{1}{|S|} \sum_{f_{j} \in S} I(f_{i}; f_{j})

(6)

where f_i denotes the candidate feature to be selected, Y is the target class, S is the selected feature subset, |S| is the subset size, and I(f_i; f_j) is the mutual information (MI) between features f_i and f_j. The specific calculation method and parameter settings of the mutual information I(f_i; f_j) and the relevance term I(f_i; Y) are given in Reference [16]. In this study, we adopt the k-nearest neighbor (k-NN)-based MI estimation approach consistent with Reference [16] to adapt to the distribution characteristics of sleep-related physiological features.
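The mRMR objective in Eq. (6) can be sketched as follows. Note that this example swaps in a simple plug-in MI estimate for already-discretized features instead of the k-NN estimator used in the study, purely to keep the snippet self-contained; the function names and toy data are illustrative.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Plug-in MI (in nats) between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())


def mrmr_score(candidate, labels, selected):
    """Eq. (6): relevance to the labels minus mean redundancy with selected features."""
    relevance = mutual_information(candidate, labels)
    if not selected:
        return relevance
    redundancy = sum(mutual_information(candidate, s) for s in selected) / len(selected)
    return relevance - redundancy


labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]   # perfectly tracks the labels
unrelated = [0, 1, 0, 1]     # independent of the labels
first_pick = mrmr_score(informative, labels, [])
duplicate = mrmr_score(informative, labels, [informative])
```

The second call shows the redundancy penalty at work: a feature that duplicates an already selected one scores zero even though it is fully relevant.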

To improve the characterization of discriminative ability differences among different sleep stages, we introduced the Symmetric Kullback–Leibler Divergence (SKLD) [26], which effectively enhances the accuracy and interpretability of quantifying class separability between sleep stages. Its definition is shown below:

SKLD(f_{i} \mid c_{p}, c_{q}) = \frac{1}{2} [D_{KL}(p \parallel q) + D_{KL}(q \parallel p)]

(7)

where D_KL(p||q) is the KL divergence from distribution p to distribution q, and p and q denote the distributions of feature f_i within classes c_p and c_q, respectively.
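A minimal sketch of Eq. (7) for two discrete distributions is shown below. It assumes p and q are normalized histograms of one feature over a shared set of bins; the small epsilon guarding against empty bins is an implementation assumption, not part of the paper's definition.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL divergence D_KL(p || q) for two discrete distributions on the same bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def skld(p, q):
    """Eq. (7): symmetric KL divergence between two class-conditional histograms."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Hypothetical two-bin histograms of one feature for two sleep stages.
p_stage_a = [0.5, 0.5]
p_stage_b = [0.9, 0.1]
separation = skld(p_stage_a, p_stage_b)
```

Unlike the plain KL divergence, the value does not depend on the order of the two classes, which makes it usable as a pairwise separability score.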

We designed the dynamic weight coefficient α to adaptively adjust the contribution weights of SKLD and mRMR in the dual criterion, so as to maximize the relevance while preferentially retaining features with higher discriminative ability. The specific settings of α are as follows:

(1) Initialization: α₀ = 0.5 (balancing the two criteria at the initial stage);

(2) Update rule: After selecting the optimal feature f_best in each iteration, calculate the average SKLD (denoted as SKLD_avg) of all remaining candidate features. If SKLD(f_best) > SKLD_avg (indicating that f_best has strong class separability), update α_{t+1} = min(α_t + 0.02, 1) to enhance the weight of class separability; otherwise, update α_{t+1} = max(α_t − 0.02, 0) to emphasize the mRMR criterion;

(3) Constraint: α ∈ [0, 1] to avoid extreme weighting.

Based on the above settings, the composite score function integrating SKLD and mRMR is defined as follows:

DW\_mRMR(f_{i}) = \alpha \cdot SKLD(f_{i}) + (1 - \alpha) \cdot [I(f_{i}; Y) - \frac{1}{|S|} \sum_{f_{j} \in S} I(f_{i}; f_{j})]

(8)
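The weight update rule and the composite score in Eq. (8) can be sketched together as follows. The function names are illustrative, and the SKLD and mRMR values are passed in as precomputed numbers; how SKLD is aggregated over multiple class pairs is not specified here and is left to the caller.

```python
def update_alpha(alpha, skld_best, skld_remaining, step=0.02):
    """Apply the update rule: raise alpha if the chosen feature beats the average SKLD."""
    if not skld_remaining:
        return alpha  # nothing left to compare against
    skld_avg = sum(skld_remaining) / len(skld_remaining)
    if skld_best > skld_avg:
        return min(alpha + step, 1.0)
    return max(alpha - step, 0.0)


def dw_mrmr_score(alpha, skld_value, mrmr_value):
    """Eq. (8): convex combination of class separability and the mRMR term."""
    return alpha * skld_value + (1.0 - alpha) * mrmr_value


alpha = 0.5  # initial value stated in the text
alpha_up = update_alpha(alpha, skld_best=0.8, skld_remaining=[0.2, 0.4])
alpha_down = update_alpha(alpha, skld_best=0.1, skld_remaining=[0.2, 0.4])
score = dw_mrmr_score(alpha, skld_value=0.4, mrmr_value=0.2)
```

The min and max calls implement the constraint α ∈ [0, 1], so repeated updates in one direction saturate at the boundary rather than overshooting.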

To quantify the contribution of each feature to sleep staging, this study defines contribution rate (CR) and cumulative contribution rate (CCR) as the screening criteria, with specific definitions as follows:

CR(f_{i}) = \frac{Score(f_{i})}{\sum_{f \in F} Score(f)}

(9)

where Score(f) is the composite score of feature f, and F denotes the set of all candidate features.

CCR = \sum_{f_{j} \in S} CR(f_{j})

(10)

where S is the optimal feature subset. Features are selected and added to the subset in the descending order of their composite scores. The screening is terminated when the cumulative contribution rate of S exceeds 0.9, and S is the output for subsequent sleep staging.
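The screening procedure based on Eqs. (9) and (10) can be sketched as below. The feature names and scores are hypothetical, and the sketch assumes non-negative composite scores so that contribution rates behave like proportions.

```python
def select_by_ccr(scores, threshold=0.9):
    """Add features in descending score order until the CCR exceeds the threshold."""
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    selected, ccr = [], 0.0
    for name, score in ranked:
        selected.append(name)
        ccr += score / total  # contribution rate of this feature, Eq. (9)
        if ccr > threshold:
            break
    return selected, ccr


# Hypothetical composite scores for four candidate features.
scores = {"SDNN": 5.0, "LF/HF": 3.0, "SD1": 1.5, "NN50": 0.5}
subset, ccr = select_by_ccr(scores)
```

With these toy scores the first two features reach a CCR of 0.80, so a third is added and the loop stops at 0.95, leaving the weakest feature out.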

2.5. Classifier Hyperparameter Optimization

To address the sensitivity of SVM classifiers to the penalty coefficient BoxConstraint (denoted as C) and the kernel scale parameter KernelScale (denoted as σ) [27], this paper proposes Snake–ACO, a hybrid optimization algorithm. To ensure a reasonable search space and comparable parameters, bounds and a scaling strategy are defined for C and σ: C is constrained within [0.001, 1000] and σ within [0.01, 10], both with logarithmic scaling to balance the search resolution between small and large parameter values. The algorithm searches for the optimal parameter combination by combining the exploratory ability of the snake algorithm with the pheromone feedback mechanism of the ant colony algorithm, balancing global exploration and local exploitation.
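One plausible reading of the logarithmic scaling is a base-10 map of each parameter range onto the unit interval, sketched below. The exact mapping is not given in the text, so the helper names and the choice of a [0, 1] target range are assumptions.

```python
import math

C_BOUNDS = (1e-3, 1e3)      # bounds for C stated in the text
SIGMA_BOUNDS = (1e-2, 1e1)  # bounds for sigma stated in the text


def to_unit(value, bounds):
    """Map a parameter value to [0, 1] on a log10 scale (assumed mapping)."""
    lo, hi = (math.log10(b) for b in bounds)
    return (math.log10(value) - lo) / (hi - lo)


def from_unit(u, bounds):
    """Inverse map: recover the parameter value from its scaled coordinate."""
    lo, hi = (math.log10(b) for b in bounds)
    return 10.0 ** (lo + u * (hi - lo))


c_mid = from_unit(0.5, C_BOUNDS)          # midpoint of the scaled C axis
sigma_min = from_unit(0.0, SIGMA_BOUNDS)  # lower bound of sigma
```

On this scale the midpoint of the C axis corresponds to C = 1, so equal steps cover each decade of the range with the same resolution.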

The hybrid algorithm divides the optimization process into two phases, dominated by ACO and SO, respectively. ACO dominates the global exploration phase (the first 50% of the iterations) and uses the pheromone concentration gradient to guide the search direction.

For ACO, candidate path definitions are as follows: the parameter space of (C, σ) (after logarithmic scaling) is discretized into a grid with a step size of 0.01, and each candidate path corresponds to a discrete grid point, representing a feasible (C, σ) combination. The number of candidate paths is determined by the grid density, matching the ant colony size (set to 20 in this study). Each ant generates a candidate solution (C, σ) based on the path selection probability:

P_{j}^{k} = \frac{[\tau_{j}]^{\alpha}}{\sum_{l=1}^{N} [\tau_{l}]^{\alpha}}

(11)

where P_j^k is the probability that ant k selects candidate path j, τ_j is the pheromone concentration on path j, N is the number of candidate paths, and α is the pheromone weight coefficient (set to 1.2, balancing pheromone guidance and random exploration). This phase allows the ants to explore extensively within the specified range of the (C, σ) parameter space and avoid premature convergence to a local optimum.
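The path selection rule in Eq. (11) amounts to roulette-wheel sampling with pheromone-based weights, as in this short sketch. The pheromone values and function names are illustrative, and pheromone deposit and evaporation are deliberately left out because the text does not specify them.

```python
import random


def path_probabilities(pheromone, alpha=1.2):
    """Eq. (11): selection probability of each candidate path."""
    weights = [tau ** alpha for tau in pheromone]
    total = sum(weights)
    return [w / total for w in weights]


def choose_path(pheromone, rng, alpha=1.2):
    """Draw one path index according to the Eq. (11) probabilities."""
    probs = path_probabilities(pheromone, alpha)
    return rng.choices(range(len(pheromone)), weights=probs, k=1)[0]


# Hypothetical pheromone levels on three candidate (C, sigma) grid points.
pheromone = [1.0, 1.0, 4.0]
probs = path_probabilities(pheromone)
picked = choose_path(pheromone, random.Random(0))
```

With α above 1, paths with more pheromone are favored slightly more than proportionally, while low-pheromone paths still keep a nonzero chance of being explored.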

The local exploitation phase (the last 50% of the iterations) is taken over by SO, which performs a refined search based on the temperature decay model. The position update of snake individuals follows a Gaussian perturbation strategy:

x_{i}^{t+1} = x_{best}^{t} + \beta \cdot N(0, 1) \cdot \frac{1}{\sqrt{t+1}}

(12)

where β is the perturbation strength (set to 0.1), x_{best}^{t} is the current optimal solution, and N(0, 1) is a standard Gaussian distribution. This stage limits the search range to the elite solution neighborhood \| x - x_{best} \| \leq 0.05 to improve the fine-tuning accuracy of the classification boundary. The iteration number threshold controls the stage-switching condition:

\begin{cases} \text{ACO}, & t < 0.5\,T_{\max} \\ \text{SO}, & \text{otherwise} \end{cases}

(13)

where T_max is the maximum number of iterations. The stopping criteria are defined as follows: the algorithm terminates when either T_max is reached, or the optimal fitness value (SVM classification accuracy) remains unchanged for 5 consecutive iterations. To prevent overexploitation, if an area is frequently visited and the number of localized exploits exceeds a threshold (set to 3), the pheromone of this area is reset. Meanwhile, evaluation budget is defined as the total number of SVM classification performance evaluations during the entire optimization process, which is calculated by multiplying T_max and the ant colony size.
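The phase switch, the Gaussian update of Eq. (12), and the stopping criteria can be sketched as small helpers. This is an illustration under assumptions rather than the authors' implementation: positions live in the log-scaled unit square, the neighborhood constraint is enforced by rescaling the step to a Euclidean radius of 0.05, and the pheromone reset rule is omitted.

```python
import math
import random


def phase(t, t_max):
    """Eq. (13): ACO runs in the first half of the iterations, SO afterwards."""
    return "ACO" if t < 0.5 * t_max else "SO"


def so_update(x_best, t, rng, beta=0.1, radius=0.05):
    """Eq. (12) with the step rescaled to stay inside the elite neighborhood."""
    step = [beta * rng.gauss(0.0, 1.0) / math.sqrt(t + 1) for _ in x_best]
    norm = math.sqrt(sum(s * s for s in step))
    if norm > radius:
        step = [s * radius / norm for s in step]
    # Clip to the scaled search box [0, 1] (assumed parameterisation).
    return tuple(min(1.0, max(0.0, xb + s)) for xb, s in zip(x_best, step))


def should_stop(t, t_max, best_history, patience=5):
    """Stop at t_max or when the best fitness is unchanged for `patience` iterations."""
    if t >= t_max:
        return True
    recent = best_history[-(patience + 1):]
    return len(recent) == patience + 1 and len(set(recent)) == 1


t_max, colony_size = 20, 20
evaluation_budget = t_max * colony_size  # total SVM evaluations
rng = random.Random(0)
x_best = (0.5, 0.5)
candidates = [so_update(x_best, t, rng) for t in range(10, 20)]
```

With the settings reported for Figure 2, the budget works out to 400 evaluations, and every SO candidate stays within 0.05 of the elite solution in the scaled space.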

We demonstrate the collaborative optimization process of the Snake–ACO hybrid algorithm in Figure 2. The experimental parameters displayed in the figure are set as follows: T_max = 20, ant colony size (N) = 20, perturbation radius (r) = 0.05, α = 1.2, and perturbation strength β = 0.1. Correspondingly, the total evaluation budget is 400, calculated as T_max × N. The first phase of the algorithm is the ant colony exploration phase (shown on the left), which performs global exploration through random path (depicted as translucent curves) generation, along with the pheromone concentration gradient represented by color-mapped surfaces. Specifically, on the left side of the figure, the color gradient transitions from light orange to dark blue, corresponding to the change of pheromone concentration from low to high. The pheromone helps steer the population toward potentially optimal regions. The second phase, the SO stage (shown on the right), employs directed perturbations (illustrated by scatter points) within the neighborhood of the elite solution, indicated by gray dashed boundaries. The switching of two phases is represented by a vertical dashed line at t = 0.5 × T_max, thus achieving the efficient search of the parameter space through the dynamic collaboration of the global pheromone guidance of the ant colony algorithm and the local parameter optimization of the snake optimization algorithm.

3. Results

3.1. Evaluation Metrics

To evaluate the performance of the sleep staging model, this study utilizes five indicators: accuracy, precision, recall, F1-score, and Cohen’s Kappa coefficient for verification [28]. Accuracy reflects the model’s overall classification correctness rate. The calculation formula is as follows:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(14)

where TP (true positive) and TN (true negative) refer to the counts of correctly predicted positive and negative samples, respectively. FP (false positive) and FN (false negative) indicate the counts of incorrectly classified samples. This indicator is suitable for the overall performance evaluation of balanced datasets [29].

The precision reflects the proportion of true positives among all predicted positive samples, indicating the model’s ability to avoid false positives:

Precision = \frac{TP}{TP + FP}

(15)

Recall represents the proportion of true positives among all actual positive samples, evaluating the model’s ability to identify positive samples:

Recall = \frac{TP}{TP + FN}

(16)

The F1-score is the harmonic mean of precision and recall, which balances the trade-off between the two metrics and is robust for class imbalance in sleep staging tasks:

F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}

(17)

Cohen’s Kappa coefficient [30] quantifies the agreement between predicted and actual sleep stages beyond chance, which is more reliable than accuracy for imbalanced datasets. Its formula is as follows:

κ = \frac{P_{o} - P_{e}}{1 - P_{e}}

(18)

where P_o is the observed agreement rate, and P_e is the expected random agreement rate.
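For the two-class case, Eqs. (14) to (18) can all be computed from the four confusion-matrix counts, as in this sketch. The counts are hypothetical, and the expected agreement P_e is derived from the row and column totals in the usual way for Cohen's Kappa.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute Eqs. (14)-(18) from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Chance agreement: product of predicted and actual marginals, per class.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Hypothetical counts for 100 epochs.
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```

In this example the accuracy is 0.85 while Kappa is 0.70, showing how Kappa discounts the 0.50 agreement that would be expected by chance.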

3.2. Verification of the Multimodal Fusion Mechanism

This study adopted a stepwise comparison experiment to assess the effect of respiratory signals on sleep staging, with a unified five-fold cross-validation protocol applied to ensure consistent experimental settings—for each fold, the dataset was partitioned into training, validation, and test subsets at a fixed ratio of 7:1:2. Based on the MIT-BPD dataset, 17 HRV features were extracted from ECG signals for unimodal (ECG-only) sleep staging experiments. Additionally, for the SHHS dataset, 17 HRV features were first extracted from the base validation set, with 12 additional respiratory features extracted to construct multimodal fusion features; single-modal (ECG-only) and multimodal (ECG and respiratory) control experiments were then conducted, consistently adopting the aforementioned cross-validation protocol. For all experiments, DW_mRMR was used as the feature selection method, and Snake–ACO was employed to optimize the hyperparameters of the SVM model. Notably, a class-weighting strategy was integrated into the SVM training process across all experiments, where the weight of each sleep stage was assigned according to the inverse ratio of its sample size. All feature selection and hyperparameter optimization operations were strictly and properly nested within the training subset of each fold to avoid data leakage. This two-stage validation protocol was consistently applied to all subsequent experiments—encompassing feature selection method comparisons, hyperparameter optimization algorithm evaluations, and generalization performance validation—with epoch-level splitting for internal cross-validation and subject-level splitting for hold-out validation to assess the final generalization ability.
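The class-weighting strategy described above can be sketched as inverse-frequency weights. The normalization used here, n / (k · count), is an assumption borrowed from a common "balanced" convention; the text only states that weights follow the inverse ratio of class sample sizes.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class inversely to its sample count (assumed normalisation)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


# Hypothetical epoch labels: 80 sleep (S) epochs and 20 wake (W) epochs.
labels = ["S"] * 80 + ["W"] * 20
weights = inverse_frequency_weights(labels)
```

Here the minority wake class receives four times the weight of the sleep class, mirroring the 80:20 imbalance in the labels.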

The statistical results of overall classification accuracy (expressed as mean ± standard deviation, with 95% confidence interval (CI) included, derived from 10 independent runs with different random seeds to ensure statistical stability and reliability) for different sleep staging tasks are presented in Table 3. Table 4 reports the per-class precision, recall, and F1-score of a representative run from the SHHS multimodal group in Table 3, while the classification confusion details are illustrated in Figure 3. Details of these figures and tables are presented below.

From the statistical results in Table 3, the multimodal model achieves significantly better performance than the unimodal ECG model on the SHHS base validation set. Two-class staging yields a mean accuracy of 89.60 ± 1.30%, a gain of 14.1 percentage points over the unimodal model (75.50 ± 1.60%); three-class staging attains 78.50 ± 2.70%, a gain of 10.5 percentage points over the unimodal counterpart (68.00 ± 3.00%); and even the more challenging four-class staging maintains a mean accuracy of 68.90 ± 4.50%, a gain of 6.9 percentage points over the unimodal model (62.00 ± 4.70%). In a cross-dataset comparison, the SHHS multimodal model also exceeds the unimodal ECG model evaluated on the MIT-BPD dataset by 11.4 percentage points in W–S staging (89.60 ± 1.30% vs. 78.20 ± 1.20%). These results indicate that, compared with the unimodal ECG model, the fusion of ECG and respiratory signals can significantly improve the accuracy of sleep staging across tasks of different complexity.

To further dissect the classification performance of the multimodal model, we analyzed the results summarized in Table 4. For two-class staging, the S class achieves excellent performance, with a precision of 92.67%, a recall of 93.70%, and an F1-score of 93.18%, while the W class shows acceptable indicators, at 78.29%, 75.07%, and 76.65%, respectively. For three-class staging, the R and N classes have precision of 67.98% and 83.27%, recall of 68.06% and 85.61%, and F1-score of 68.02% and 84.42%, respectively. The R and N classes are prone to being confused, which is consistent with the confusion cases visualized in Figure 3b. For four-class staging, the W, R, N1N2, and N3 classes have precision of 71.72%, 65.15%, 67.15% and 60.26%, respectively, with their recall and F1-score summarized in Table 4. Most misclassifications occur between W and N1N2, N1N2 and N3, and R and N1N2. For instance, approximately 30% of the N3 class samples are misclassified as N1N2, and nearly 25% of the W class samples are incorrectly categorized as either R or N1N2.

3.3. Robustness Analysis of the Feature Selection Method

To validate the comprehensive performance of the DW_mRMR method, this study compared it with the traditional mRMR and with a wrapper feature selection method based on the Particle Swarm Optimization (PSO) algorithm [31] on the base validation set of the SHHS dataset. Consistent with the experimental protocol in Section 3.2, five-fold cross-validation was adopted, and SVM hyperparameters were adaptively optimized via Snake–ACO.

The experimental results presented in Figure 4 demonstrate that DW_mRMR improves overall sleep staging accuracy by approximately 5% compared to the PSO algorithm and by about 2% compared to traditional mRMR. PSO is prone to falling into local optima, whereas DW_mRMR, by balancing feature relevance and redundancy and leveraging its dynamic weighting mechanism, quickly approaches the optimal solution. The method also addresses the deficiency of conventional mRMR in class separability and reduces the feature set from 29 to 26 features (about a 10% reduction). This lowers redundancy while retaining key discriminative features with high stability across five-fold cross-validation, a trait attributed to the method’s inherent focus on physiologically interpretable discriminative features.

3.4. Performance Comparison of Hyperparameter Optimization Algorithms

To verify the advantages of Snake–ACO in SVM hyperparameter optimization, this study conducted controlled experiments on the base validation set of the SHHS dataset, focusing on two-class (wake versus sleep) staging with multimodal feature fusion. Consistent with the experimental protocol in Section 3.2, six optimization algorithms were compared: SO, PSO, Genetic Algorithm (GA), Bayesian optimization (BO), grid search, and Snake–ACO. Benchmarking against these mainstream methods serves to show how Snake–ACO balances exploration and exploitation.

The experimental results presented in Figure 5 demonstrate that the Snake–ACO algorithm exhibited significant advantages in SVM hyperparameter optimization for sleep staging. Specifically, compared to the other five algorithms, it achieved a well-balanced and leading performance in both classification accuracy and computational efficiency.

In terms of core sleep staging performance, Snake–ACO reached an average accuracy of 89.6% and a Kappa coefficient of 0.70, standing out as the top performer among all compared methods. It outperformed traditional grid search by a notable margin—boosting accuracy by 7.1 percentage points (from 82.5%) and increasing the Kappa coefficient by 0.07 (from 0.63). Against single metaheuristics, it also demonstrated clear superiority: compared to the base SO algorithm, it achieved a 4.9-percentage-point increase in accuracy (from 84.7%) and a 0.05 increase in the Kappa coefficient (from 0.65). It also outperformed other representative single metaheuristics, including GA, PSO, and even the probabilistic model-based BO, in both core sleep staging accuracy and the Kappa coefficient.
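For readers less familiar with the Kappa coefficient used alongside accuracy here, the following minimal sketch computes Cohen's Kappa from a confusion matrix. The matrix is a hypothetical two-class example chosen for illustration, not the study's data, and the function name is an assumption.

```python
def cohens_kappa(cm):
    """Cohen's Kappa for a square confusion matrix (rows: true, columns: predicted)."""
    n = sum(sum(row) for row in cm)
    k = len(cm)
    # Observed agreement: fraction of epochs on the diagonal.
    observed = sum(cm[i][i] for i in range(k)) / n
    # Chance agreement: sum over classes of (row marginal * column marginal) / n^2.
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(k)
    ) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: all mass in one class, Kappa is undefined
    return (observed - expected) / (1.0 - expected)


# Hypothetical confusion matrix (rows: true W, S; columns: predicted W, S).
toy_cm = [[75, 25],
          [20, 280]]
toy_kappa = cohens_kappa(toy_cm)
toy_accuracy = (75 + 280) / 400
```

In this toy case the accuracy is 88.75% while Kappa is about 0.69, which illustrates why Kappa is reported in addition to accuracy: it discounts the agreement expected by chance when the sleep class dominates.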

In terms of computational efficiency, Snake–ACO maintains a lightweight runtime of 15.7 s, which is far lower than that of the time-consuming grid search (45.8 s, a 65.7% reduction) and GA (38.2 s, a 58.9% reduction). It also outperforms BO, which takes 22.6 s (a 30.5% runtime reduction), and remains faster than PSO (18.5 s). Its runtime is slightly longer than that of the base SO (12.4 s), but this minor increase in computational cost is small when weighed against the substantial gains in staging accuracy and the Kappa coefficient. This confirms that Snake–ACO achieves a significant improvement in sleep staging performance without incurring excessive computational overhead.
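The runtime percentages quoted above can be checked with a short calculation. The snippet below is only a worked-arithmetic sketch using the runtimes stated in the text; the variable and function names are illustrative.

```python
# Runtimes in seconds, as reported in the text.
runtimes_s = {"Grid search": 45.8, "GA": 38.2, "BO": 22.6, "PSO": 18.5, "SO": 12.4}
snake_aco_s = 15.7


def relative_reduction(baseline, candidate):
    """Percentage by which `candidate` is faster than `baseline` (negative if slower)."""
    return 100.0 * (baseline - candidate) / baseline


reductions = {
    name: round(relative_reduction(t, snake_aco_s), 1)
    for name, t in runtimes_s.items()
}
# Grid search -> 65.7, GA -> 58.9, BO -> 30.5, PSO -> 15.1, SO -> -26.6
```

A negative value means that Snake–ACO is slower than the baseline, which is the case only for the base SO algorithm.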

To further validate the robustness of the Snake–ACO, PSO, and BO algorithms in practical optimization scenarios, supplementary robustness verification experiments were conducted. Consistent with the experimental protocol in Section 3.2 (epoch-level splitting for internal cross-validation), the robustness verification set was adopted, and its data were split by epoch for five-fold cross-validation. All compared algorithms were assigned the same function evaluation budget of 400 to ensure comparison fairness. Each algorithm was independently run 10 times with different random seeds to mitigate the influence of initial randomness, and results were reported as “mean ± standard deviation”.
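The comparison protocol described above (a fixed budget of 400 function evaluations, 10 independent runs with different seeds, results reported as mean ± standard deviation) can be sketched as a small harness. This is a simplified illustration under stated assumptions: plain random search stands in for the optimizers actually compared, and `toy_objective` is a hypothetical stand-in for the cross-validated SVM accuracy; neither is the implementation used in the study.

```python
import random
import statistics

BUDGET = 400  # function evaluations per run, as stated in the text
N_RUNS = 10   # independent runs with different random seeds, as stated in the text


def toy_objective(log_c, log_gamma):
    """Hypothetical stand-in for cross-validated accuracy at SVM settings (C, gamma)."""
    return 0.9 - 0.01 * (log_c - 1.0) ** 2 - 0.02 * (log_gamma + 2.0) ** 2


def random_search(objective, budget, seed):
    """Placeholder optimizer: spends exactly `budget` evaluations, returns the best score."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(budget):
        candidate = (rng.uniform(-3.0, 3.0), rng.uniform(-5.0, 1.0))
        best = max(best, objective(*candidate))
    return best


def evaluate_optimizer(optimizer, objective, budget=BUDGET, n_runs=N_RUNS):
    """Run the optimizer once per seed and summarize as (mean, sample standard deviation)."""
    scores = [optimizer(objective, budget, seed) for seed in range(n_runs)]
    return statistics.mean(scores), statistics.stdev(scores)


mean_score, sd_score = evaluate_optimizer(random_search, toy_objective)
print(f"{100 * mean_score:.1f}% ± {100 * sd_score:.1f}%")
```

Giving every optimizer the same signature `(objective, budget, seed)` is one simple way to enforce an identical evaluation budget across the compared algorithms.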

Table 5 presents the performance comparison of the three optimization algorithms (PSO, BO, and Snake–ACO) on multimodal ECG and respiration signals from two subsets of the SHHS: the base validation set and the robustness verification set. On the robustness verification set, Snake–ACO remained the best-performing algorithm and was stable relative to the base validation set: its mean accuracy decreased by only 0.3 percentage points, from 89.6% ± 1.3% to 89.3% ± 1.6%, with a small increase in standard deviation. In contrast, both PSO and BO showed larger performance drops and higher volatility. PSO’s mean accuracy fell from 86.3% ± 3.5% to 84.7% ± 4.8%, a 1.6-percentage-point decline accompanied by a notable increase in volatility. BO’s mean accuracy decreased from 87.8% ± 2.1% to 85.5% ± 2.9%, a 2.3-percentage-point decline with a moderate rise in fluctuation.

These results indicate that Snake–ACO not only delivers higher accuracy on the base validation set but also shows stronger robustness and statistical stability when applied to the separate robustness verification set. This supports the effectiveness of its optimization mechanism in maintaining stable performance across different data subsets, with a consistent evaluation budget ensuring the credibility of the comparative results.

3.5. The Generalization Verification Experimental Results

ECG and respiratory signals exhibit significant individual specificity. If data from the same subject are simultaneously involved in model training and validation, it leads to overly optimistic performance evaluations, failing to truly reflect the model’s generalization ability on new subjects. Therefore, this section adopts a strict subject-independent partitioning strategy for the robustness verification set, where each subject is exclusively included in this set. Consistent with the base experiment, the performance evaluation strictly follows the subject-level statistical standard: for each individual subject in the robustness verification set, the per-class precision, recall, F1-score, and overall accuracy are calculated independently. Subsequently, the mean ± standard deviation and 95% CI of these metrics are aggregated across all subjects to reflect the model’s stable performance at the individual level. This subject-level hold-out validation aligns with the final generalization stage defined in the two-stage protocol of Section 3.2, serving as the gold standard for evaluating the model’s practical applicability. The results of the multimodal model under this paradigm are shown in Table 6.
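The two ideas in this paragraph, keeping each subject in exactly one partition and aggregating metrics per subject, can be sketched as follows. The helper names, the 20% test fraction, and the normal-approximation confidence interval (z = 1.96) are assumptions made for illustration; the study's exact partitioning and interval formula are not reproduced here.

```python
import random
import statistics


def split_by_subject(subject_ids, test_fraction=0.2, seed=0):
    """Split epoch indices so that no subject contributes epochs to both partitions."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(test_fraction * len(subjects)))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


def subject_level_summary(per_subject_scores, z=1.96):
    """Mean, sample SD, and an approximate 95% CI of a metric computed per subject."""
    mean = statistics.mean(per_subject_scores)
    sd = statistics.stdev(per_subject_scores)
    half_width = z * sd / len(per_subject_scores) ** 0.5
    return mean, sd, (mean - half_width, mean + half_width)


# Hypothetical epoch-to-subject assignment (two epochs per subject).
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4", "s5", "s5"]
train_idx, test_idx = split_by_subject(subject_ids)

# Hypothetical per-subject accuracies (%), one value per held-out subject.
mean_acc, sd_acc, ci_acc = subject_level_summary([88.0, 86.0, 90.0, 87.0])
```

Splitting on subject identifiers rather than on epochs is what prevents epochs from the same person from appearing in both training and evaluation data.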

Under the subject-independent paradigm, the performance metrics of the multimodal model exhibited a moderate decline compared with those obtained on the base validation set (Tables 3 and 4), with the magnitude of the decline increasing with the complexity of the sleep staging task. As shown in Table 6, the four-class sleep staging task exhibited the most pronounced decline, with overall accuracy decreasing by 4.8 percentage points (from 68.90% to 64.10%) and the N3 sleep stage yielding the lowest F1-score (53.68%). The N3 stage is characterized by subtle physiological features (e.g., slow-wave ECG activity and regular shallow respiration) as well as a relatively small sample size, and individual variability in cardiorespiratory signals under the subject-independent setting blurs the discriminative boundary between the N3 stage and the N1/N2 stages. Similarly, individual differences in respiratory irregularity during REM sleep exacerbate its similarity to wakefulness, resulting in increased misclassification rates and lower F1-scores for the R class (66.61% for three-class staging, 61.16% for four-class staging) relative to the base validation set. The two-class (wake versus sleep) staging task maintained the highest stability, with only a 1.8-percentage-point reduction in overall accuracy (from 89.60% to 87.80%) and the S class retaining precision and recall values above 91%.

4. Discussion

Experimental results of this study confirm that all the implemented optimizations on the traditional sleep staging model yield remarkable effectiveness. The DW_mRMR feature selection strategy, constructed by integrating SKLD into the traditional mRMR framework, effectively enhances feature discriminability, while the hybrid Snake–ACO algorithm achieves more accurate SVM hyperparameter tuning. Experiments on two PSG-derived datasets confirm that the optimized model outperforms traditional methods in wake versus sleep classification, with improved accuracy and stability even under subject-independent verification. These findings provide technical support for non-invasive sleep monitoring and clinical decision making, verifying the feasibility of the incremental optimization of the classical sleep staging workflow.

Despite these positive outcomes, this study still has several aspects that need improvement, which point out directions for future research. First, the classification accuracy for certain sleep stages remains to be enhanced. The current model performs well in the dichotomous wake versus sleep classification, but its accuracy in distinguishing finer-grained stages (e.g., REM versus NREM or N1N2 versus N3) is relatively limited. A key reason for this is that the ECG feature extraction process only focuses on 17 HRV features, but fails to utilize the important morphological features of the ECG waveform obtained after preprocessing. In fact, ECG morphological features are closely linked to sleep stage changes: T wave amplitude and wave amplitude ratio exhibit systematic variations between REM and NREM sleep due to autonomic nervous system modulation; the QT interval (reflecting ventricular repolarization duration) differs across stages and aids in distinguishing REM from NREM; moreover, QRS complex morphology variability caused by sleep-related body position changes can assist in identifying awakening-associated motor activity. Neglecting these morphological features limits the comprehensiveness of the feature set and the model’s ability to capture stage-specific physiological differences, thereby restricting classification accuracy for finer-grained stages. Future work will integrate these morphological features (e.g., T wave amplitude, QT interval, QRS complex shape parameters) with existing HRV features to optimize the feature set, and simultaneously adjust the model structure to improve the discriminative power for low-accuracy stages.

Second, this study does not conduct quantitative statistical analysis on the potential collinearity or informational redundancy between MI and SKLD. Although both metrics are derived from class distribution characteristics, they quantify distinct and complementary aspects of feature discriminability for sleep staging tasks. Specifically, MI measures the global statistical dependence between features and class labels, reflecting the overall ability of a feature to distinguish between all sleep stages in a probabilistic sense. In contrast, SKLD focuses on the direct distribution divergence between pairwise sleep stages (e.g., light sleep vs. deep sleep), a key challenge in sleep staging that MI often fails to capture explicitly due to its global statistical nature. Experimentally, the DW_mRMR strategy outperforms the traditional mRMR method in feature screening, and the features selected by the improved strategy achieve higher sleep staging accuracy with reduced feature redundancy. This performance gain directly verifies the validity of the composite criterion formed by their fusion—if there were significant collinearity or informational redundancy between MI and SKLD, the integration of the two metrics would not yield such a notable improvement in classification performance and feature discriminability. However, it should be acknowledged that relevant quantitative analyses (e.g., correlation coefficient calculation between MI and SKLD values, variance inflation factor (VIF) test for collinearity, or ablation experiments on the two metrics) are lacking in the current study. Subsequent research will supplement such analyses to quantify the degree of redundancy between MI and SKLD, and further optimize the dynamic weight adjustment mechanism of α based on the quantitative results, thereby enhancing the interpretability and robustness of the DW_mRMR strategy.
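The distinction drawn above between MI (global dependence between a feature and the class labels) and SKLD (divergence between the feature distributions of a pair of stages) can be made concrete with a small sketch. It assumes discretized features and defines SKLD as KL(p‖q) + KL(q‖p); some definitions halve this sum, and the definition given in Section 2.4 takes precedence. All names and data below are illustrative.

```python
import math
from collections import Counter


def mutual_information(feature_bins, labels):
    """MI (in nats) between a discretized feature and class labels."""
    n = len(labels)
    joint = Counter(zip(feature_bins, labels))
    px = Counter(feature_bins)
    py = Counter(labels)
    # p(x, y) * log( p(x, y) / (p(x) p(y)) ), with the ratio simplified to c*n / (px*py).
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between two discrete distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        pi, qi = pi + eps, qi + eps  # smoothing to avoid log(0)
        total += (pi - qi) * math.log(pi / qi)
    return total


# Hypothetical discretized feature and labels: perfectly dependent vs. independent.
mi_dependent = mutual_information([0, 0, 1, 1], ["N1N2", "N1N2", "N3", "N3"])
mi_independent = mutual_information([0, 1, 0, 1], ["N1N2", "N1N2", "N3", "N3"])

# Hypothetical class-conditional histograms of one feature for two sleep stages.
skld_pair = symmetric_kl([0.5, 0.5], [0.9, 0.1])
```

MI is computed once over all classes, whereas SKLD is computed per pair of stages, which is why the two can rank the same feature differently.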

Third, if the proposed model is to be extended to wearable device applications in the future, motion artifacts must be taken into account, given that this study only utilized PSG data from public databases. In practical wearable scenarios, such artifacts—caused by body position changes, electrode displacement, and other factors—will degrade signal quality, which may lead to deviations in staging results. Future work will introduce adaptive anti-artifact algorithms (e.g., wavelet denoising, empirical mode decomposition) to optimize the preprocessing workflow and validate the model on wearable-acquired datasets, improving its noise resistance and practical applicability.

5. Conclusions

This study aims to improve the performance of sleep staging based on ECG and respiratory signals by optimizing the feature selection method and the classifier hyperparameters. First, the 29 original features are screened using the dynamically weighted dual-criterion mRMR (DW_mRMR), which introduces the symmetric KL divergence (SKLD) to overcome the shortcomings of conventional mRMR in terms of class separability. Second, for the hyperparameter optimization of SVM classifiers, we proposed the Snake–ACO algorithm. The algorithm combines the global search capability of ACO and the local optimization characteristics of SO to achieve the adaptive optimization of the hyperparameters of SVM classifiers and further improve the sleep staging performance of multimodal signals.

We systematically compared the performance of sleep staging methods based on physiological signals reported in the literature, as shown in Table 7, using two indicators: classification accuracy and the Kappa value. The comparison shows that the method proposed in this paper achieves favorable performance in the two-class sleep staging task, with the Cohen’s Kappa coefficient reaching 0.70, a medium-high level of consistency. It achieves a medium level of consistency in three-class sleep staging on public datasets, providing a potential technical reference for non-invasive clinical sleep monitoring. However, whether this method can be applied to wearable sleep monitoring scenarios requires further validation. Accordingly, in subsequent research, we plan to improve the accuracy of sleep staging by expanding the sample size and fusing handcrafted features with deep learning-based feature extraction techniques, and to add experiments in wearable sleep monitoring scenarios, so as to address these limitations and strengthen the conclusions for wearable applications.

Author Contributions

Conceptualization, X.W. and W.C.; methodology, X.W.; software, C.W. (Chen Wang); validation, L.Y. and L.G.; formal analysis, C.W. (Chuquan Wu); investigation, B.W.; resources, B.W.; data curation, C.W. (Chen Wang) and C.W. (Chuquan Wu); writing—original draft preparation, W.C.; writing—review and editing, C.W. (Chen Wang); visualization, W.C.; supervision, L.Y.; project administration, B.W.; funding acquisition, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Wuhan Knowledge Innovation Special Project of China (Grant No. 2022020801010258).

Data Availability Statement

All data used in this study are obtained from publicly available databases. The MIT-BIH Polysomnographic Database is publicly available on https://physionet.org/content/slpdb/1.0.0/ (accessed on 13 May 2023). The Sleep Heart Health Study (SHHS) is publicly available on https://sleepdata.org/datasets/shhs (accessed on 1 April 2024).

Conflicts of Interest

Author Chuquan Wu was employed by Puleap (Wuhan) Medical Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SDNN	Standard deviation of all normal-to-normal intervals
RMSSD	The square root of the mean of the sum of the squares of differences between adjacent NN intervals
SDSD	Standard deviation of differences between adjacent NN intervals
NN50	Number of pairs of adjacent NN intervals differing by more than 50 ms
pNN50	NN50 count divided by the total number of all NN intervals
AVNN	Average of all NN intervals
TP	Total energy in whole data
LF	Total energy in frequency 0.04–0.15 Hz
HF	Total energy in frequency 0.15–0.40 Hz
LF/HF	Ratio between LF and HF
LFn	Normalized LF
HFn	Normalized HF
SD1	Short-term Poincaré Plot Standard Deviation
SD2	Long-term Poincaré Plot Standard Deviation
SDratio	Ratio of SD1 to SD2
S	Poincaré Plot Area
std	Standard deviation
IQR	Interquartile Range

References

1. Si, K.; Dong, K.; Lu, J.; Zhao, L.; Xiang, W.; Li, J.; Liu, C. A U-Sleep Model for Sleep Staging Using Electrocardiography and Respiration Signals. In Asian-Pacific Conference on Medical and Biological Engineering; Springer Nature: Cham, Switzerland, 2023; pp. 475–482.
2. Yue, H.; Chen, Z.; Guo, W.; Sun, L.; Dai, Y.; Wang, Y.; Ma, W.; Fan, X.; Wen, W.; Lei, W. Research and application of deep learning-based sleep staging: Data, modeling, validation, and clinical practice. Sleep Med. Rev. 2024, 74, 101897.
3. Wang, Y.; Zhou, J.; Zha, F.; Zhou, M.; Li, D.; Zheng, Q.; Chen, S.; Yan, S.; Geng, X.; Long, J.; et al. Comparative analysis of sleep parameters and structures derived from wearable flexible electrode sleep patches and polysomnography in young adults. J. Neurophysiol. 2024, 131, 738–749.
4. Perslev, M.; Darkner, S.; Kempfner, L.; Nikolic, M.; Jennum, P.J.; Igel, C. U-Sleep: Resilient high-frequency sleep staging. npj Digit. Med. 2021, 4, 72.
5. Yang, C.Y.; Chen, P.C.; Huang, W.C. Cross-domain transfer of EEG to EEG or ECG learning for CNN classification models. Sensors 2023, 23, 2458.
6. Silva, F.B.; Uribe, L.F.; Cepeda, F.X.; Alquati, V.F.; Guimarães, J.P.; Silva, Y.G.; Santos, O.L.; Oliveira, A.A.; Aguiar, G.H.; Andersen, M.L.; et al. Sleep staging algorithm based on smartwatch sensors for healthy and sleep apnea populations. Sleep Med. 2024, 119, 535–548.
7. Wei, Y.; Qi, X.; Wang, H.; Liu, Z.; Wang, G.; Yan, X. A multi-class automatic sleep staging method based on long short-term memory network using single-lead electrocardiogram signals. IEEE Access 2019, 7, 85959–85970.
8. Byeon, Y.H.; Kwak, K.C. An Ensemble Deep Neural Network-Based Method for Person Identification Using Electrocardiogram Signals Acquired on Different Days. Appl. Sci. 2024, 14, 7959.
9. Xiao, M.; Yan, H.; Song, J.; Yang, Y.; Yang, X. Sleep stages classification based on heart rate variability and random forest. Biomed. Signal Process. Control 2013, 8, 624–633.
10. Wang, J.S.; Shih, G.R.; Chiang, W.C. Sleep stage classification of sleep apnea patients using decision-tree-based support vector machines based on ECG parameters. In Proceedings of the 2012 IEEE-EMBS International Conference on Biomedical and Health Informatics, Hong Kong, China, 5–7 January 2012; pp. 285–288.
11. Long, X.; Foussier, J.; Fonseca, P.; Haakma, R.; Aarts, R.M. Analyzing respiratory effort amplitude for automated sleep stage classification. Biomed. Signal Process. Control 2014, 14, 197–205.
12. Redmond, S.J.; Chazal, P.; O’Brien, C.; Ryan, S.; McNicholas, W.T.; Heneghan, C. Sleep staging using cardiorespiratory signals. Somnologie 2007, 11, 245–250.
13. Willemen, T.; Van Deun, D.; Verhaert, V.; Vandekerckhove, M.; Exadaktylos, V.; Verbraecken, J. An evaluation of cardiorespiratory and movement features with respect to sleep-stage classification. IEEE J. Biomed. Health Inform. 2014, 18, 661–669.
14. Sharan, R.V.; Takeuchi, H.; Kishi, A.; Yamamoto, Y. Macro-sleep staging with ECG-derived instantaneous heart rate and respiration signals and multi-input 1D CNN-BiGRU. IEEE Trans. Instrum. Meas. 2024, 73, 2535212.
15. Li, X.; Zhao, Z.; Zhu, Y.; Zhao, Q.; Li, J.; Feng, F. Automatic sleep identification using the novel hybrid feature selection method for HRV signal. Comput. Methods Programs Biomed. Update 2022, 2, 100050.
16. Wang, G.; Lauri, F.; Hassani, A.H.E. Feature selection by mRMR method for heart disease diagnosis. IEEE Access 2022, 10, 100786–100796.
17. Hashim, F.A.; Hussien, A.G. Snake Optimizer: A novel meta-heuristic optimization algorithm. Knowl.-Based Syst. 2022, 242, 108320.
18. Zhou, X.; Ma, H.; Gu, J.; Chen, H.; Deng, W. Parameter adaptation-based ant colony optimization with dynamic hybrid mechanism. Eng. Appl. Artif. Intell. 2022, 114, 105139.
19. Ichimaru, Y.; Moody, G.B. Development of the polysomnographic database on CD-ROM. Psychiatry Clin. Neurosci. 1999, 53, 175–177.
20. Zhang, G.Q.; Cui, L.; Mueller, R.; Tao, S.; Kim, M.; Rueschman, M.; Mariani, S.; Mobley, D.; Redline, S. The National Sleep Research Resource: Towards a sleep data commons. J. Am. Med. Inform. Assoc. 2018, 25, 1351–1358.
21. Quan, S.F.; Howard, B.V.; Iber, C.; Kiley, J.P.; Nieto, F.J.; O’Connor, G.T.; Rapoport, D.M.; Redline, S.; Robbins, J.; Samet, J.M.; et al. The Sleep Heart Health Study: Design, rationale, and methods. Sleep 1997, 20, 1077–1085.
22. Al-Jabbar, A.; Entisar, Y.; Mohamedsheet Al-Hatab, M.M.; Qasim, M.A.; Fathel, W.R.; Fadhil, M.A. Clinical Fusion for Real-Time Complex QRS Pattern Detection in Wearable ECG Using the Pan-Tompkins Algorithm. Fusion Pract. Appl. 2023, 12, 172–184.
23. Fonseca, P.; Long, X.; Radha, M.; Haakma, R.; Aarts, R.M.; Rolink, J. Sleep stage classification with ECG and respiratory effort. Physiol. Meas. 2015, 36, 2027.
24. Akbari, H.; Sadiq, M.T.; Jafari, N.; Too, J.; Mikaeilvand, N.; Cicone, A.; Serra-Capizzano, S. Recognizing seizure using Poincaré plot of EEG signals and graphical features in DWT domain. Bratisl. Med. J./Bratisl. Lek. Listy 2023, 1234, 123–124.
25. Haemmerle, P.; Hennings, E.; Eken, C.; Aeschbacher, S.; Coslovsky, M.; Schlageter, V.; Osswald, S.; Kuehne, M.; Zuern, C.S. Impaired heart rate variability triangular index predicts stroke and systemic embolism in patients with atrial fibrillation. Europace 2022, 24, euac053-161.
26. Zhang, Z.; Chen, H.; Ning, P.; Yang, N.; Yuan, D. CAKD: A correlation-aware knowledge distillation framework based on decoupling Kullback-Leibler divergence. In Proceedings of the 2024 IEEE International Conference on Data Mining (ICDM), Abu Dhabi, United Arab Emirates, 9–12 December 2024; pp. 959–964.
27. Wang, H.; Li, G.; Wang, Z. Fast SVM classifier for large-scale classification problems. Inf. Sci. 2023, 642, 119136.
28. Birrer, V.; Elgendi, M.; Lambercy, O.; Menon, C. Evaluating reliability in wearable devices for sleep staging. npj Digit. Med. 2024, 7, 74.
29. He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
30. Urtnasan, E.; Park, J.U.; Joo, E.Y.; Lee, K.J. Deep convolutional recurrent model for automatic scoring sleep stages based on single-lead ECG signal. Diagnostics 2022, 12, 1235.
31. Jain, M.; Saihjpal, V.; Singh, N.; Singh, S.B. An overview of variants and advancements of PSO algorithm. Appl. Sci. 2022, 12, 8392.
32. Migliorini, M.; Bianchi, A.M.; Nisticò, D.; Kortelainen, J.; Arce-Santana, E.; Cerutti, S.; Mendez, M.O. Automatic sleep staging based on ballistocardiographic signals recorded through bed sensors. In Proceedings of the 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, Buenos Aires, Argentina, 31 August–4 September 2010; pp. 3273–3276.
33. Canisius, S.; Ploch, T.; Gross, V.; Jerrentrup, A.; Penzel, T.; Kesper, K. Detection of sleep disordered breathing by automated ECG analysis. In Proceedings of the 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vancouver, BC, Canada, 20–25 August 2008; pp. 2602–2605.

Figure 1. Overall flowchart of algorithm design.

Figure 2. The example visualization process of the Snake–ACO algorithm.

Figure 3. Confusion matrices of sleep staging based on the ECG and respiration multimodal fusion model. (a) Confusion matrix of 2-class staging; (b) confusion matrix of 3-class staging; (c) confusion matrix of 4-class staging.

Figure 4. The comparison of three feature selection methods for sleep staging accuracy after feature selection.

Figure 5. Comparison of two-class staging results for different optimization algorithms.

Table 1. Features of ECG signal.

Feature Domain	Feature Number	Feature Description
Time Domain	1~6	SDNN, RMSSD, SDSD, NN50, pNN50, AVNN
Frequency Domain	7~12	TP, LF, HF, LF/HF, LFn, HFn
Nonlinear Domain	13~17	SD1, SD2, SDratio, S, HRV Triangular Index

Table 2. Features of respiratory signal.

Feature Domain	Feature Number	Feature Description
Time Domain	18~26	mean, std, skewness, kurtosis, max, min, range, median, IQR
Frequency Domain	27~29	LF, HF, LF/HF

Table 3. Comparison of experimental results of single-modal and multimodal signals.

Dataset	Signal Type	Stage	Accuracy (%)	95% CI (%)
MIT-BPD	ECG unimodal	W-S	78.20 ± 1.20	(75.94–80.46)
		W-R-N	70.20 ± 3.30	(66.15–74.25)
		W-R-N1N2-N3	65.00 ± 4.20	(59.83–70.17)
SHHS (base validation set)	ECG unimodal	W-S	75.50 ± 1.60	(72.85–77.15)
		W-R-N	68.00 ± 3.00	(64.32–71.68)
		W-R-N1N2-N3	62.00 ± 4.70	(56.68–67.32)
	ECG and respiration multimodal	W-S	89.60 ± 1.30	(87.23–91.97)
		W-R-N	78.50 ± 2.70	(75.01–81.99)
		W-R-N1N2-N3	68.90 ± 4.50	(63.82–73.98)

Table 4. Performance metrics of multi-class sleep staging tasks.

Stage	Class	Precision (%)	Recall (%)	F1-Score (%)
Two-Class (W-S)	W	78.29	75.07	76.65
Two-Class (W-S)	S	92.67	93.70	93.18
Three-Class (W-R-N)	W	76.41	74.11	75.24
	R	67.98	68.06	68.02
	N	83.27	85.61	84.42
Four-Class (W-R-N1N2-N3)	W	71.72	72.19	71.95
	R	65.15	65.34	65.24
	N1N2	67.15	69.10	68.11
	N3	60.26	55.51	57.79

Table 5. Accuracy of three algorithms on SHHS multimodal signals.

Dataset	Signal Type	Algorithm	Accuracy (%)
SHHS (base validation set)	ECG and respiration multimodal	PSO	86.30 ± 3.50
		BO	87.80 ± 2.10
		Snake–ACO	89.60 ± 1.30
SHHS (robustness verification set)	ECG and respiration multimodal	PSO	84.70 ± 4.80
		BO	85.50 ± 2.90
		Snake–ACO	89.30 ± 1.60

Table 6. Performance metrics of multi-class sleep staging tasks on robustness verification set based on subject partitioning.

Stage	Class	Precision (%)	Recall (%)	F1-score (%)	Overall Accuracy (%) ± SD	95% CI (%)
Two-Class (W-S)	W	76.85	73.56	75.18	87.80 ± 1.40	84.98–90.62
Two-Class (W-S)	S	91.23	92.45	91.83	87.80 ± 1.40	84.98–90.62
Three-Class (W-R-N)	W	74.12	72.05	73.07	76.30 ± 2.80	72.45–80.15
	R	66.52	66.71	66.61
	N	81.54	83.82	82.66
Four-Class (W-R-N1N2-N3)	W	67.33	67.95	67.64	64.10 ± 5.00	58.95–69.25
	R	61.08	61.25	61.16
	N1N2	63.01	64.82	63.90
	N3	55.89	51.67	53.68

Table 7. Comparison with related sleep staging results reported in the literature.

Author	Signal Type	Sleep Stage	Accuracy (%)	Kappa Coefficient
Redmond [12]	ECG, HR	W-S	89.00	0.60
Redmond [12]	ECG, HR	W-R-N	76.00	0.46 ± 0.10
Migliorini [32]	ECG, HR, MOV	W-R-N	77.00	0.55
Xiao [9]	ECG	W-R-N	76.20 ± 6.70	0.46 ± 0.09
Canisius [33]	HR	W-R-N	76.00	-
Li [15]	ECG	R-N	79.83	-
Sharan [14]	EDR, IHR	W-S	83.80	0.54
Si [1]	ECG, HR	W-R-N1-N2-N3	64.10	0.45
This paper	ECG, HR	W-S	89.60 ± 1.30	0.70 ± 0.05
		W-R-N	78.50 ± 2.70	0.62 ± 0.06
		W-R-N1N2-N3	68.90 ± 4.50	0.57 ± 0.08

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
