Modulation-Based Feature Extraction for Robust Sleep Stage Classification Across Apnea-Based Cohorts
Abstract
1. Introduction
- We introduce a domain-adaptive modulation-spectrogram framework that goes beyond the standard ‘out-of-the-box’ application of time–frequency representations. The framework employs a novel constraint-aware windowing mechanism that reconciles the trade-off between the fine modulation-frequency resolution required for modulation features (60 s context) and the strict temporal localization required for clinical scoring (30 s epochs). By integrating both constraints, the proposed method captures the spectral content and the amplitude-modulation patterns that characterize sleep architecture, and is particularly effective at identifying challenging transitional stages such as N1.
- We systematically demonstrate that our proposed methodology offers superior robustness against sleep fragmentation compared to the baselines. By evaluating Apnea–Hypopnea Index (AHI)-stratified cohorts (Normal, Mild, Moderate, and Severe), we show that baseline performance degrades significantly as sleep apnea severity increases. The proposed framework, in contrast, maintains stable performance across these patient populations, addressing a critical gap in clinical generalizability.
- We conduct detailed ablation studies to identify optimal configurations for modulation-based features, including window length and image resolution, providing evidence-based guidelines for implementation and establishing a more generalizable and efficient method.
2. Related Work
- (A)
- Feature representation: Automatic sleep stage classification fundamentally relies on extracting discriminative features from physiological signals [13,25]. Early systems used hand-crafted features derived from expert knowledge, including power spectral density in standard EEG frequency bands (delta, theta, alpha, sigma, beta), statistical measures, and specific event detections such as sleep spindles and K-complexes [4]. While these features encoded valuable domain knowledge, they required extensive manual engineering and could miss discriminative patterns not anticipated by human experts, motivating researchers to explore methods that could automatically learn optimal feature representations from data [8]. Time–frequency analysis emerged as a powerful approach for sleep-feature extraction because sleep EEG is fundamentally non-stationary: its frequency content changes as sleep stages evolve [26]. The Short-Time Fourier Transform (STFT) has become widely adopted for generating spectrograms that visualize how spectral content evolves [7]. However, the STFT suffers from a fixed time–frequency resolution trade-off: short windows provide good temporal localization but poor frequency resolution, while long windows provide the opposite [27]. To address this limitation, the Continuous Wavelet Transform (CWT) provides adaptive multi-resolution analysis, yielding fine frequency resolution at low frequencies for slow oscillations and fine temporal resolution at high frequencies for rapid transients like sleep spindles [28]. Comparative studies have systematically evaluated different feature extraction methods to identify optimal representations. Ref. [29] compared multiple transformation methods, including CWT, STFT, FFT-based features, and recurrence plots using identical deep learning architectures, demonstrating that feature representation choice significantly impacts performance, often more than architectural innovations. Similarly, Ref. 
[30] found that time–frequency representations consistently outperformed raw signals or simple spectral features across different network architectures. These studies establish that feature representation is critical for effective sleep staging, with time–frequency methods generally providing superior performance by capturing both spectral content and temporal dynamics. Despite extensive exploration of time–frequency methods, existing approaches share a common limitation: they focus on instantaneous frequency content but do not explicitly model amplitude modulation patterns that characterize many sleep phenomena [31,32]. N2 sleep spindles exhibit characteristic waxing-waning amplitude envelopes, K-complexes show sharp amplitude transients, and N3 slow-wave activity displays distinctive amplitude fluctuation patterns. While STFT and CWT detect oscillations at specific frequencies and times, they do not explicitly represent how amplitude varies over time [31]. Modulation analysis addresses this gap by decomposing signals into carrier frequencies (spectral content) and modulation frequencies (rate of amplitude variation), producing modulation spectrograms that explicitly represent both dimensions [11]. Although modulation features have proven valuable in speech processing [33] and have shown promise in biomedical applications [12], they remain largely unexplored for sleep staging despite the prevalence of amplitude-modulated phenomena in sleep EEG.
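The two-stage transform underlying a modulation spectrogram [11,31] can be illustrated in a few lines of code: a first STFT tracks each carrier band's magnitude over time, and a second Fourier transform along the time axis of each band exposes how fast that magnitude fluctuates. The sketch below is a minimal illustration of the general idea, not the exact pipeline used in this work; the 1 s sub-window and 0.25 s hop are arbitrary assumptions.

```python
import numpy as np
from scipy.signal import spectrogram

def modulation_spectrogram(x, fs, win_s=1.0, hop_s=0.25):
    """Carrier-frequency x modulation-frequency map of signal x.

    First transform: STFT magnitude -> carrier frequency vs. time.
    Second transform: FFT of each carrier band's (mean-removed)
    envelope over time -> modulation frequency axis.
    """
    nper = int(win_s * fs)
    f, t, S = spectrogram(x, fs=fs, nperseg=nper,
                          noverlap=nper - int(hop_s * fs),
                          mode='magnitude')
    # Remove each band's DC level so slow amplitude fluctuations dominate
    M = np.abs(np.fft.rfft(S - S.mean(axis=1, keepdims=True), axis=1))
    f_mod = np.fft.rfftfreq(S.shape[1], d=hop_s)  # frame rate = 1 / hop_s
    return f, f_mod, M
```

Applied to a test signal whose 10 Hz carrier is amplitude-modulated at 0.5 Hz, the resulting map should peak near carrier frequency 10 Hz and modulation frequency 0.5 Hz, which is exactly the kind of structure (e.g., spindle waxing–waning) that a plain spectrogram does not represent explicitly.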
- (B)
- Classification methods: For classification, deep learning has replaced traditional machine learning methods due to its ability to learn hierarchical representations. Convolutional neural networks (CNNs) have become the dominant approach for processing time–frequency representations of EEG signals, with numerous studies demonstrating their effectiveness on STFT spectrograms [34,35,36]. These architectures automatically learn spatial filters that detect discriminative patterns in spectrograms, such as sleep spindles, K-complexes, and characteristic frequency band signatures across different sleep stages [37]. Recurrent neural networks model temporal dependencies across consecutive sleep epochs. Ref. [8] introduced DeepSleepNet, combining CNNs with bidirectional LSTM to capture temporal context. More recently, attention mechanisms [9] and transformer architectures [38] have been explored to capture long-range dependencies. However, these require larger datasets and may not provide proportional gains when using well-designed feature representations. Ref. [39] proposed EEGSNet, integrating residual CNN blocks, bidirectional LSTM, and an auxiliary classifier for intermediate supervision, demonstrating strong performance across multiple datasets. We adopt EEGSNet as our backbone architecture to isolate the contribution of modulation-based features by maintaining a consistent classification framework across all feature types.
- (C)
- Clinical validation across patient populations: While significant progress has been made in developing sophisticated feature extraction methods and deep learning architectures for sleep staging, the systematic evaluation of these systems across clinically diverse patient populations remains critically underexplored. The majority of benchmark sleep datasets used in algorithm development, including Sleep-EDF [8], MASS [13], and Sleep-EDFX [30], consist primarily of healthy subjects or heterogeneous populations without stratification by clinical severity. This evaluation paradigm may overestimate real-world performance, as algorithms deployed in clinical settings predominantly encounter patients with sleep disorders whose physiological characteristics differ substantially from healthy cohorts. Obstructive sleep apnea (OSA), one of the most prevalent sleep disorders affecting an estimated 425 million adults globally [17], fundamentally alters sleep architecture in ways that challenge automated staging systems. OSA is characterized by repeated upper airway obstructions during sleep, leading to intermittent hypoxemia, recurrent arousals, and sleep fragmentation [16,20]. These respiratory events disrupt normal sleep stage progression, resulting in increased stage transitions, reduced slow-wave sleep, fragmented REM sleep, and atypical EEG patterns that may confound classification algorithms trained on healthy populations [21]. The Apnea–Hypopnea Index (AHI), defined as the average number of apnea and hypopnea events per hour of sleep [19], provides a clinically validated measure of OSA severity: AHI < 5 indicates normal sleep, AHI 5–14 indicates mild OSA, AHI 15–29 indicates moderate OSA, and AHI ≥ 30 indicates severe OSA [20]. Recent work has begun to address this validation gap. Ref. 
[40] introduced the DREAMT dataset specifically designed to assess sleep staging algorithms on a population with sleep disorders, demonstrating that model performance varies significantly across demographic subgroups and highlighting systematic biases in algorithms trained on predominantly healthy cohorts. Ref. [5] proposed a two-branch neural network architecture explicitly designed to balance performance across multiple cohorts with varying sleep disorder prevalence, recognizing that standard training procedures may optimize for majority patterns while underperforming on clinical subpopulations. However, despite these initial efforts, few studies systematically evaluate sleep staging performance across AHI-stratified severity groups, and most published work reports only aggregate metrics across mixed populations. The clinical implications of this validation gap are substantial. Sleep stage classification directly informs treatment decisions, including CPAP titration for OSA patients, surgical candidacy assessment, and monitoring of therapeutic response [19]. Algorithms that perform well on healthy subjects but degrade in OSA populations risk misclassifying sleep architecture precisely in patients who most need an accurate assessment. Furthermore, the relationship between AHI severity and algorithm performance remains poorly characterized: it is unclear whether performance degrades linearly with increasing apnea severity, whether specific sleep stages become more difficult to classify in OSA populations, or whether specific feature extraction methods exhibit greater robustness to OSA-related EEG alterations. This work addresses these gaps by systematically evaluating modulation-based feature extraction across AHI-stratified cohorts (Normal, Mild, Moderate, and Severe) using the DREAMT dataset [40]. 
By comparing performance across severity groups and analyzing per-stage classification metrics, we provide insight into both the clinical generalizability of modulation spectrograms and the specific challenges posed by OSA populations for automated sleep staging systems. This stratified validation approach represents a critical step toward developing sleep staging algorithms that maintain robust performance across the diverse clinical populations encountered in real-world deployment.
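The AHI thresholds quoted above translate into a trivial stratification rule, sketched below for reference; the helper is only illustrative, since the actual cohort assignments accompany the DREAMT metadata.

```python
def ahi_severity(ahi: float) -> str:
    """Map an Apnea-Hypopnea Index (events/hour of sleep) to the
    OSA severity category used for cohort stratification [20]."""
    if ahi < 5:
        return "Normal"
    elif ahi < 15:
        return "Mild"      # AHI 5-14
    elif ahi < 30:
        return "Moderate"  # AHI 15-29
    return "Severe"        # AHI >= 30
```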
3. Proposed Method
3.1. Dataset and AHI-Based Stratification
3.2. Modulation Spectrogram
- Extended Window (60 s): If the sleep stage is stable across consecutive epochs ($y_t = y_{t+1}$), we utilize a window length of $T = 60$ s covering the interval spanned by both epochs. This adheres to a strict agreement rule, ensuring the label represents the entire duration of the window.
- Fallback Window (30 s): If a transition occurs ($y_t \neq y_{t+1}$), we restrict the window length to the standard $T = 30$ s to preserve the temporal boundary of the transition.
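The adaptive rule above reduces to a per-epoch label-agreement check. The sketch below is one possible reading of that rule; the forward extension of the window and the 64 Hz sampling rate are illustrative assumptions, not the paper's exact indexing.

```python
EPOCH_S = 30  # standard AASM epoch length in seconds

def select_window(labels, i, fs=64):
    """Return (start, stop) sample indices of the analysis window for
    epoch i: 60 s when the next epoch shares the same stage label
    (strict agreement), otherwise the standard 30 s epoch."""
    start = i * EPOCH_S * fs
    if i + 1 < len(labels) and labels[i] == labels[i + 1]:
        return start, start + 2 * EPOCH_S * fs  # extended 60 s window
    return start, start + EPOCH_S * fs          # 30 s fallback at transitions
```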
4. Experimental Setup
4.1. Data Preprocessing
4.2. Deep Learning Models
4.2.1. Convolutional Neural Networks (CNNs)
4.2.2. Sequence Learning
4.2.3. Bidirectional LSTM (BiLSTM)
4.2.4. Auxiliary Classifier
4.3. Training Protocol
- Testing Set (20%): For final performance evaluation, a held-out subset was reserved exclusively to report the results presented in Section 5.
- Training and Validation Set (80%): The remaining subjects were utilized for model optimization. From this subset, we applied a stratified shuffle-split to allocate 15% of subjects to an internal Validation Set for hyperparameter tuning, with the remaining 85% used for training.
- Model Selection: We implemented an early-stopping mechanism based on the validation macro-F1 score to mitigate overfitting. The model was evaluated after every three epochs on the internal Validation Set, and training was halted if no improvement was observed for 10 consecutive evaluations (patience). The model state that achieved the highest validation macro-F1 was then restored and used for final inference on the held-out test set. This protocol was repeated for all five folds, with final results reported as mean ± standard deviation. This training workflow is illustrated in Figure 3.
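The patience logic above can be sketched as a small tracker that is queried after each validation pass (here, every third training epoch). The `EarlyStopper` name and the state handling are illustrative, not the actual training code.

```python
class EarlyStopper:
    """Track validation macro-F1; signal a stop after `patience`
    evaluations without improvement, remembering the best state."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_state = None
        self.bad_evals = 0

    def update(self, score, state):
        """Record one validation result; return True when training
        should halt. `state` stands in for a copy of model weights."""
        if score > self.best_score:
            self.best_score = score
            self.best_state = state
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

After the loop halts, `best_state` is restored for inference on the held-out test fold.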
4.4. Evaluation Metrics
- Accuracy (ACC): Accuracy measures the proportion of correctly classified epochs across all sleep stages. However, in imbalanced datasets such as sleep data, this metric can be misleading as it tends to bias toward the majority class and gives an inflated impression of performance [48]. While accuracy provides a general overview of model performance, it fails to capture how well the model performs on minority classes, which is critical for comprehensive sleep stage assessment.
- Macro-average F1 Score (MF1): To address the limitations of accuracy, we use the F1-score, which balances precision and recall. The macro F1-score assigns equal weight to all classes, offering a more balanced evaluation of model performance, particularly for minority stages such as N1 and N3. We define the F1-score for class $i$ as

$$\mathrm{F1}_i = \frac{2\,P_i R_i}{P_i + R_i},$$

where $P_i$ and $R_i$ denote the precision and recall of class $i$, and compute the macro average as

$$\mathrm{MF1} = \frac{1}{l} \sum_{i=1}^{l} \mathrm{F1}_i,$$

where $l$ is the number of sleep stages. This metric ensures that model performance on underrepresented classes receives equal consideration alongside that of majority classes.
- Cohen’s Kappa Coefficient (κ): We use Cohen’s kappa coefficient to evaluate the level of agreement between predicted and actual sleep stages after adjusting for chance agreement [49]. In the context of sleep staging, evaluating kappa is crucial as it assesses how consistently the model reproduces stage boundaries and transitions relative to expert annotations. Unlike accuracy, kappa accounts for chance agreement, providing a more robust measure of classification performance. Higher kappa values indicate stronger agreement between model predictions and ground-truth labels, reflecting the model’s ability to capture the complex temporal patterns of sleep architecture.
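All three metrics follow directly from the confusion matrix. The sketch below is an illustrative NumPy implementation of accuracy, macro-F1, and Cohen's kappa as defined above, not the evaluation code used in the experiments.

```python
import numpy as np

def sleep_metrics(y_true, y_pred, n_classes=5):
    """Accuracy, macro-F1, and Cohen's kappa from integer stage labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    C = np.zeros((n_classes, n_classes), int)  # confusion matrix
    np.add.at(C, (y_true, y_pred), 1)
    acc = np.trace(C) / C.sum()
    tp = np.diag(C).astype(float)
    col, row = C.sum(axis=0), C.sum(axis=1)
    prec = np.divide(tp, col, out=np.zeros_like(tp), where=col > 0)
    rec = np.divide(tp, row, out=np.zeros_like(tp), where=row > 0)
    denom = prec + rec
    f1 = np.divide(2 * prec * rec, denom, out=np.zeros_like(tp), where=denom > 0)
    mf1 = f1.mean()  # equal weight per class, regardless of class size
    # Cohen's kappa: observed agreement corrected for chance agreement
    pe = (col * row).sum() / C.sum() ** 2
    kappa = (acc - pe) / (1 - pe)
    return acc, mf1, kappa
```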
4.5. Benchmark Systems
- Short-Time Fourier Transform (STFT): For the STFT baseline, we utilized a 30-s window length. This configuration strictly follows the standard preprocessing pipeline established in previous benchmarks [39], ensuring our baseline represents the current state-of-the-art. We explicitly maintained this 30-s window rather than extending it to 60 s because the STFT relies on signal stationarity. Extending the window would introduce non-stationarity, thereby compromising spectral resolution and reducing classification accuracy. Furthermore, following the benchmark [39], we resampled the EEG channel C4–M1 to 64 Hz and set the maximum analysis frequency to 32 Hz to capture EEG oscillations influencing sleep stage transitions. We compute spectrograms using a Hamming window with overlapping segments, apply power scaling, and restrict the frequency range to 0–32 Hz. Lastly, the resultant images are cropped to 76 × 60 pixels to remove text and redundant blank areas.
- Continuous Wavelet Transform (CWT): To provide a robust comparison against established time–frequency representations, we implemented a CWT baseline. To remain consistent with the STFT baseline and the AASM clinical scoring standard, we employed a 30-s window length [30,50]. This alignment with the standard duration ensures the CWT baseline strictly adheres to established protocols, thereby preventing methodological discrepancies introduced by arbitrary window sizes. Preprocessing was matched to the STFT pipeline: we resampled channel C4–M1 to 64 Hz, restricted the maximum frequency to 32 Hz, and normalized each 30-s epoch. We then transform the signal using an analytic Morlet (’amor’) filterbank with 12 frequency divisions per octave across the 0–32 Hz range, providing multi-resolution time–frequency representations. The resulting scalogram is normalized, log-power scaling is applied, and the image is resized to 76 × 60 × 3 for input to the neural network.
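A minimal version of the STFT baseline pipeline (resample to 64 Hz, Hamming-windowed spectrogram, 0–32 Hz range) might look as follows. The 2 s segment length, 50% overlap, and log-power compression are illustrative assumptions here; the exact parameters follow the benchmark [39].

```python
import numpy as np
from scipy.signal import spectrogram, resample

def stft_epoch_image(x, fs_in, fs=64, fmax=32.0, win_s=2.0, overlap=0.5):
    """Log-power spectrogram of one 30 s epoch, restricted to 0-32 Hz."""
    # Resample the epoch to 64 Hz, as in the benchmark pipeline
    x = resample(x, int(len(x) * fs / fs_in))
    nper = int(win_s * fs)
    f, t, Sxx = spectrogram(x, fs=fs, window='hamming', nperseg=nper,
                            noverlap=int(nper * overlap))
    keep = f <= fmax  # discard bins above the 32 Hz analysis limit
    return f[keep], t, 10 * np.log10(Sxx[keep] + 1e-12)
```

The resulting array is what gets rendered, cropped, and fed to the CNN front-end in the baseline systems.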
5. Results and Discussion
5.1. Ablation Study: Optimal Modulation Spectrogram Configuration
5.1.1. Effect of Image Resolution
- 76 × 60 × 3: Matches baseline STFT/CWT dimensions for direct comparison
- 152 × 120 × 3: Doubles resolution to preserve finer carrier-modulation patterns
- 224 × 224 × 3: Standard CNN input size, enabling potential transfer learning
5.1.2. Effect of Window Length
- 30-s window: Single epoch (matches baseline methods).
- 60-s window: Formed by merging two consecutive epochs (2 × 30 s) only if they share the identical sleep stage label.
- 90-s window: Formed by merging three consecutive epochs (3 × 30 s) only if all three share the identical sleep stage label.
5.2. Deep Learning Based Model Comparison
5.3. Comparison of Proposed Method with BASELINE
- Normal AHI Group. Table 6 presents results for participants without clinically significant sleep apnea. Across the five-fold cross-validation, the proposed framework demonstrated robust performance, achieving a mean accuracy of 0.74 ± 0.08. In terms of agreement and class-wise balance, the model yielded a Cohen’s kappa of 0.62 ± 0.11 and a Macro-F1 score of 0.63 ± 0.06, confirming that the method generalizes well across different data partitions.
- Mild AHI Group. Table 7 reveals that the performance advantage widens in the Mild AHI group. The proposed framework achieved substantial gains, outperforming the STFT baseline by 7 percentage points in accuracy and 12 points in Macro-F1. Furthermore, it surpassed the CWT method by 5 points in accuracy, coupled with a 12-point gain in Macro-F1. The framework’s improvement over both baselines suggests that modulation features capture patterns that are more resilient to the increased arousals and stage fragmentation present in mild OSA. Despite the sleep irregularities characteristic of this clinical population, modulation-based feature extraction remained robust and provided superior discrimination of sleep stage transitions.
- Moderate AHI Group. Results for the Moderate AHI group, shown in Table 8, reveal a slightly different pattern where modulation spectrograms performed comparably to baselines rather than being superior. While baselines achieved marginally higher accuracy, modulation maintained competitive macro-F1, indicating more balanced per-stage performance despite lower overall accuracy. This reduced performance advantage likely reflects the increased sleep variability and fragmentation in moderate OSA, which may limit the effectiveness of detailed modulation pattern analysis. Notably, CWT’s higher accuracy, coupled with equal macro-F1 and higher standard deviation, suggests a dependency on majority classes (particularly N2 and Wake), whereas modulation spectrograms maintained greater class-level balance. This trade-off between overall accuracy and balanced class performance represents an important consideration for clinical applications, where reliable detection of all sleep stages, not just dominant ones, is essential for comprehensive sleep architecture assessment.
- Severe AHI Group. Table 9 demonstrates a comparison of the proposed and benchmark approaches in the Severe AHI group. Modulation spectrograms achieved 0.66 ± 0.05 accuracy, 0.52 ± 0.05 Cohen’s kappa, and 0.57 ± 0.05 macro-F1, substantially outperforming STFT and CWT. Both baseline methods showed marked performance degradation in this challenging population, with STFT experiencing particularly sharp drops in Cohen’s kappa (a 15 percentage-point decrease from Normal to Severe) and macro-F1 (an 11-point decrease). These drops indicate the baseline methods’ sensitivity to sleep fragmentation and inability to handle severe irregularities characteristic of advanced OSA. In contrast, modulation-based representations showed a more gradual decline in performance (7.8 percentage-point decrease in accuracy, 6-point decrease in macro-F1 from Normal to Severe), illustrating greater resilience to increased signal noise and irregular sleep patterns commonly observed in severe OSA patients.
5.3.1. Per-Stage Performance Analysis
5.3.2. Clinical Implications
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Chattu, V.K.; Manzar, M.D.; Kumary, S.; Burman, D.; Spence, D.W.; Pandi-Perumal, S.R. The global problem of insufficient sleep and its serious public health implications. Healthcare 2018, 7, 1. [Google Scholar] [CrossRef]
- Piwek, L.; Ellis, D.A.; Andrews, S.; Joinson, A. The Rise of Consumer Health Wearables: Promises and Barriers. PLoS Med. 2016, 13, e1001953. [Google Scholar] [CrossRef]
- Gaiduk, M.; Serrano Alarcón, Á.; Seepold, R.; Martínez Madrid, N. Current status and prospects of automatic sleep stages scoring: Review. Biomed. Eng. Lett. 2023, 13, 247–272. [Google Scholar] [CrossRef]
- Farag, A.F.; El-Metwally, S.M.; Morsy, A.A. A Sleep Scoring System Using EEG Combined Spectral and Detrended Fluctuation Analysis Features. J. Biomed. Sci. Eng. 2014, 7, 584–592. [Google Scholar] [CrossRef]
- Zhang, D.; Sun, J.; She, Y.; Cui, Y.; Zeng, X.; Lu, L.; Tang, C.; Xu, N.; Chen, B.; Qin, W. A two-branch trade-off neural network for balanced scoring sleep stages on multiple cohorts. Front. Neurosci. 2023, 17, 1176551. [Google Scholar] [CrossRef]
- Haghayegh, S.; Hu, K.; Stone, K.; Redline, S.; Schernhammer, E. Automated Sleep Stages Classification Using Convolutional Neural Network From Raw and Time-Frequency Electroencephalogram Signals: Systematic Evaluation Study. J. Med. Internet Res. 2023, 25, e40211. [Google Scholar] [CrossRef] [PubMed]
- Xu, Z.; Yang, X.; Sun, J.; Liu, P.; Qin, W. Sleep Stage Classification Using Time-Frequency Spectra From Consecutive Multi-Time Points. Front. Neurosci. 2020, 14, 14. [Google Scholar] [CrossRef]
- Supratak, A.; Dong, H.; Wu, C.; Guo, Y. DeepSleepNet: A model for automatic sleep stage scoring based on raw single-channel EEG. IEEE Trans. Neural Syst. Rehabil. Eng. 2017, 25, 1998–2008. [Google Scholar] [CrossRef]
- Eldele, E.; Chen, Z.; Liu, C.; Wu, M.; Kwoh, C.K.; Li, X.; Guan, C. An Attention-Based Deep Learning Approach for Sleep Stage Classification with Single-Channel EEG. IEEE Trans. Neural Syst. Rehabil. Eng. 2021, 29, 809–818. [Google Scholar] [CrossRef] [PubMed]
- Boostani, R.; Karimzadeh, F.; Nami, M. A Comparative Review on Sleep Stage Classification Methods in Patients and Healthy Individuals. Comput. Methods Programs Biomed. 2017, 140, 77–91. [Google Scholar] [CrossRef] [PubMed]
- Flinker, A.; Doyle, W.K.; Mehta, A.D.; Devinsky, O.; Poeppel, D. Spectrotemporal modulation provides a unifying framework for auditory cortical asymmetries. Nat. Hum. Behav. 2019, 3, 393–405. [Google Scholar] [CrossRef]
- Tiwari, A.; Cassani, R.; Kshirsagar, S.; Tobon, D.P.; Zhu, Y.; Falk, T.H. Modulation Spectral Signal Representation for Quality Measurement and Enhancement of Wearable Device Data: A Technical Note. Sensors 2022, 22, 4579. [Google Scholar] [CrossRef]
- Phan, H.; Andreotti, F.; Cooray, N.; Chén, O.Y.; De Vos, M. SeqSleepNet: End-to-end hierarchical recurrent neural network for automatic sleep staging. IEEE Trans. Neural Syst. Rehabil. Eng. 2019, 27, 400–410. [Google Scholar] [CrossRef]
- Biswal, S.; Sun, H.; Goparaju, B.; Westover, M.B.; Sun, J.; Bianchi, M.T. Expert-level sleep scoring with deep neural networks. J. Am. Med. Inform. Assoc. 2018, 25, 1643–1650. [Google Scholar] [CrossRef] [PubMed]
- Stephansen, J.B.; Olesen, A.N.; Olsen, M.; Ambati, A.; Leary, E.B.; Moore, H.E.; Carrillo, O.; Lin, L.; Han, F.; Yan, H.; et al. Neural network analysis of sleep stages enables efficient diagnosis of narcolepsy. Nat. Commun. 2018, 9, 5229. [Google Scholar] [CrossRef] [PubMed]
- Javaheri, S.; Barbe, F.; Campos-Rodriguez, F.; Dempsey, J.A.; Khayat, R.; Javaheri, S.; Malhotra, A.; Martinez-Garcia, M.A.; Mehra, R.; Pack, A.I.; et al. Sleep apnea: Types, mechanisms, and clinical cardiovascular consequences. J. Am. Coll. Cardiol. 2017, 69, 841–858. [Google Scholar] [CrossRef]
- Senaratna, C.V.; Perret, J.L.; Lodge, C.J.; Lowe, A.J.; Campbell, B.E.; Matheson, M.C.; Hamilton, G.S.; Dharmage, S.C. Prevalence of obstructive sleep apnea in the general population: A systematic review. Sleep Med. Rev. 2017, 34, 70–81. [Google Scholar] [CrossRef]
- Guilleminault, C.; Bassiri, A. Clinical features and evaluation of obstructive sleep apnea-hypopnea syndrome and upper airway resistance syndrome. In Principles and Practice of Sleep Medicine, 4th ed.; Elsevier Saunders: Philadelphia, PA, USA, 2005; pp. 1043–1052. [Google Scholar]
- Berry, R.B.; Budhiraja, R.; Gottlieb, D.J.; Gozal, D.; Iber, C.; Kapur, V.K.; Marcus, C.L.; Mehra, R.; Parthasarathy, S.; Quan, S.F.; et al. Rules for scoring respiratory events in sleep: Update of the 2007 AASM Manual for the Scoring of Sleep and Associated Events. J. Clin. Sleep Med. 2012, 8, 597–619. [Google Scholar] [CrossRef]
- Gottlieb, D.J.; Punjabi, N.M. Diagnosis and management of obstructive sleep apnea: A review. JAMA 2020, 323, 1389–1400. [Google Scholar] [CrossRef]
- Younes, M.; Raneri, J.; Hanly, P. Reliability of the American Academy of Sleep Medicine rules for assessing sleep depth in clinical practice. J. Clin. Sleep Med. 2015, 11, 109–116. [Google Scholar] [CrossRef] [PubMed]
- Zhang, J.; Wu, Y. Automatic sleep stage classification of single-channel EEG by using complex-valued convolutional neural network. Biomed. Eng. Tech. 2018, 63, 177–190. [Google Scholar]
- Fiorillo, L.; Puiatti, A.; Papandrea, M.; Ratti, P.L.; Favaro, P.; Roth, C.; Bargiotas, P.; Bassetti, C.L.; Faraci, F.D. Automated sleep scoring: A review of the latest approaches. Sleep Med. Rev. 2019, 48, 101204. [Google Scholar] [CrossRef]
- Sridhar, N.; Shoeb, A.; Stephens, P.; Kharbouch, A.; Shimol, D.B.; Burkart, J.; Ghoreyshi, A.; Myers, L. Deep learning for automated sleep staging using instantaneous heart rate. NPJ Digit. Med. 2020, 3, 106. [Google Scholar]
- Tsinalis, O.; Matthews, P.M.; Guo, Y. Automatic sleep stage scoring using time–frequency analysis and stacked sparse autoencoders. IEEE Trans. Neural Syst. Rehabil. Eng. 2016, 24, 1000–1011. [Google Scholar]
- Acharya, U.R.; Sree, S.V.; Swapna, G.; Martis, R.J.; Suri, J.S. Automated EEG analysis of epilepsy: A review. Knowl.-Based Syst. 2013, 45, 147–165. [Google Scholar] [CrossRef]
- Boashash, B. Time–Frequency Signal Analysis and Processing: A Comprehensive Reference; Elsevier Academic Press: Oxford, UK, 2003. [Google Scholar]
- Tary, J.B.; Herrera, R.H.; van der Baan, M. Analysis of time-varying signals using continuous wavelet and synchrosqueezed transforms. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2018, 376, 20170254. [Google Scholar] [CrossRef]
- Lekkas, G.; Vrochidou, E.; Papakostas, G.A. Time–Frequency Transformations for Enhanced Biomedical Signal Classification with Convolutional Neural Networks. BioMedInformatics 2025, 5, 7. [Google Scholar] [CrossRef]
- Jadhav, P.; Rajguru, G.; Datta, D.; Mukhopadhyay, S. Automatic sleep stage classification using time–frequency images of CWT and transfer learning using convolution neural network. Biocybern. Biomed. Eng. 2020, 40, 494–504. [Google Scholar] [CrossRef]
- Atlas, L.E.; Shamma, S.A. Joint Acoustic and Modulation Frequency. EURASIP J. Appl. Signal Process. 2003, 2003, 310290. [Google Scholar] [CrossRef]
- Singh, N.; Theunissen, F. Modulation spectra of natural sounds and ethological theories of auditory processing. J. Acoust. Soc. Am. 2003, 114, 3394–3411. [Google Scholar] [CrossRef]
- Kshirsagar, S.R.; Falk, T.H. Quality-aware bag of modulation spectrum features for robust speech emotion recognition. IEEE Trans. Affect. Comput. 2022, 13, 1892–1905. [Google Scholar] [CrossRef]
- Sors, A.; Bonnet, S.; Mirek, S.; Vercueil, L.; Payen, J.F. A convolutional neural network for sleep stage scoring from raw single-channel EEG. Biomed. Signal Process. Control 2018, 42, 107–114. [Google Scholar] [CrossRef]
- Chambon, S.; Galtier, M.N.; Arnal, P.J.; Wainrib, G.; Gramfort, A. A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series. IEEE Trans. Neural Syst. Rehabil. Eng. 2018, 26, 758–769. [Google Scholar] [CrossRef]
- Mousavi, S.; Afghah, F.; Acharya, U.R. SleepEEGNet: Automated sleep stage scoring with sequence to sequence deep learning approach. PLoS ONE 2019, 14, e0216456. [Google Scholar] [CrossRef]
- Vilamala, A.; Madsen, K.H.; Hansen, L.K. Deep convolutional neural networks for interpretable analysis of EEG sleep stage scoring. In Proceedings of the 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), Tokyo, Japan, 25–28 September 2017; pp. 1–6. [Google Scholar] [CrossRef]
- Phan, H.; Mikkelsen, K.; Chen, O.Y.; Koch, P.; Mertins, A.; De Vos, M. L-SeqSleepNet: Whole-cycle long sequence modelling for automatic sleep staging. IEEE J. Biomed. Health Inform. 2023, 27, 359–370. [Google Scholar]
- Li, C.; Qi, Y.; Ding, X.; Zhao, J.; Sang, T.; Lee, M. A deep learning method approach for sleep stage classification with EEG spectrogram. Int. J. Environ. Res. Public Health 2022, 19, 6322. [Google Scholar] [CrossRef]
- Wang, W.K.; Yang, J.; Hershkovich, L.; Jeong, H.; Chen, B.; Singh, K.; Roghanizad, A.R.; Shandhi, M.M.H.; Spector, A.R.; Dunn, J. Addressing wearable sleep tracking inequity: A new dataset and novel methods for a population with sleep disorders. In Proceedings of the Conference on Health, Inference, and Learning (CHIL), New York, NY, USA, 27–28 June 2024; Volume 248, pp. 380–396. [Google Scholar]
- Wu, S.; Falk, T.H.; Chan, W.Y. Automatic speech emotion recognition using modulation spectral features. Speech Commun. 2011, 53, 768–785. [Google Scholar] [CrossRef]
- Theunissen, F.E.; Elie, J.E. Neural processing of natural sounds. Nat. Rev. Neurosci. 2014, 15, 355–366. [Google Scholar] [CrossRef]
- Zhao, X.; Wang, L.; Zhang, Y.; Han, X.; Deveci, M.; Parmar, M. A review of convolutional neural networks in computer vision. Artif. Intell. Rev. 2024, 57, 99. [Google Scholar] [CrossRef]
- Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
- Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681. [Google Scholar] [CrossRef]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2015, arXiv:1409.0473. [Google Scholar]
- He, H.; Garcia, E.A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284. [Google Scholar] [CrossRef]
- Opitz, J. A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice. Trans. Assoc. Comput. Linguist. 2024, 12, 820–836. [Google Scholar] [CrossRef]
- Yang, J.; Chen, Y.; Yu, T.; Zhang, Y. LMCSleepNet: A Lightweight Multi-Channel Sleep Staging Model Based on Wavelet Transform and Multi-Scale Convolutions. Sensors 2025, 25, 6065. [Google Scholar] [CrossRef]
- Rosenberg, R.S.; Van Hout, S. The American Academy of Sleep Medicine inter-scorer reliability program: Sleep stage scoring. J. Clin. Sleep Med. 2013, 9, 81–87. [Google Scholar] [CrossRef]






| Dataset | Subjects (Train/Validation/Test) | W | N1 | N2 | N3 | REM | Total |
|---|---|---|---|---|---|---|---|
| DREAMT | 100 | 20,041 | 8818 | 39,953 | 2704 | 8387 | 79,903 |
| Normal (<5 events) | 26 (18/4/4) | 5638 | 1715 | 10,246 | 567 | 2508 | 20,674 |
| Mild (5–14) | 25 (17/4/4) | 4809 | 1971 | 10,295 | 906 | 2294 | 20,275 |
| Moderate (15–29) | 24 (16/4/4) | 4711 | 2189 | 9816 | 647 | 1907 | 19,270 |
| Severe (≥30) | 25 (17/4/4) | 4883 | 2943 | 9596 | 584 | 1678 | 19,684 |

**Experimental configuration** for the baseline (STFT, CWT) and proposed modulation pipelines.

| Parameter | STFT (Baseline) | CWT (Baseline) | Modulation (Proposed) |
|---|---|---|---|
| **1. Preprocessing & Input** | | | |
| Sampling Rate | 100 Hz | 100 Hz | 100 Hz |
| Window Duration | 30 s (fixed) | 30 s (fixed) | Adaptive (30 s/60 s) |
| Stride | 30 s | 30 s | 30 s |
| **2. Feature Extraction** | | | |
| Spectral Resolution | 256 freq. bins | 64 scales | 44 freq. bins |
| Window Function | Hamming | Morlet | Rectangular |
| Frequency Range | 0–32 Hz | 0–32 Hz | 0.5–30 Hz |
| **3. Model Training** | | | |
| Architecture | CNN + BiLSTM | CNN + BiLSTM | CNN + BiLSTM |
| Sequence Length | | | |
| Optimizer | Adam | Adam | Adam |
| Weight Decay | | | |
| Stopping Criterion | Patience = 10 (Val Macro-F1) | Patience = 10 (Val Macro-F1) | Patience = 10 (Val Macro-F1) |
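A minimal sketch of the two-stage idea behind modulation features, not the authors' implementation: an STFT magnitude yields per-band amplitude envelopes, and a second transform along time yields each band's modulation spectrum. The 1 s inner window, zero overlap, and per-band DC removal are assumptions; the rectangular (boxcar) window and 0.5–30 Hz range follow the configuration table.

```python
import numpy as np
from scipy.signal import stft

def modulation_spectrogram(x, fs=100, win_s=1.0):
    """Two-stage modulation-spectrogram sketch.

    Stage 1: STFT magnitude with a rectangular window gives the
    amplitude envelope of each frequency band.
    Stage 2: an FFT along the time axis of each envelope gives the
    modulation-frequency content of that band.
    """
    nperseg = int(fs * win_s)
    f, _, Z = stft(x, fs=fs, window="boxcar", nperseg=nperseg, noverlap=0)
    env = np.abs(Z)                              # (freq, time) envelopes
    env = env - env.mean(axis=1, keepdims=True)  # remove per-band DC
    mod = np.abs(np.fft.rfft(env, axis=1))       # freq x modulation-freq
    keep = (f >= 0.5) & (f <= 30.0)              # range from the table
    return f[keep], mod[keep]
```

With zero overlap the envelope is sampled once per inner window (here 1 Hz), so resolvable modulation frequencies stay below 0.5 Hz; practical implementations raise this ceiling by overlapping the inner windows, which is one reason the proposed method needs the longer 60 s context.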

**Ablation: input image resolution** (modulation features, 60 s window).

| Image Size | Accuracy | Kappa | Macro-F1 |
|---|---|---|---|
| 76 × 60 × 3 | 0.656 | 0.526 | 0.552 |
| 152 × 120 × 3 | 0.739 | 0.624 | 0.643 |
| 224 × 224 × 3 | 0.724 | 0.611 | 0.628 |

**Ablation: analysis window length** (modulation features).

| Window Length | Accuracy | Kappa | Macro-F1 |
|---|---|---|---|
| 30 s | 0.669 | 0.526 | 0.550 |
| 60 s | 0.739 | 0.624 | 0.643 |
| 90 s | 0.731 | 0.610 | 0.635 |

**Overall comparison** across feature representations and model backbones.

| Method | Model | Window | Resolution | Accuracy | Kappa | Macro-F1 |
|---|---|---|---|---|---|---|
| STFT | CNN | 30 s | 76 × 60 × 3 | 0.676 | 0.498 | 0.514 |
| STFT | EEGSNet | 30 s | 76 × 60 × 3 | 0.728 | 0.614 | 0.610 |
| CWT | CNN | 30 s | 76 × 60 × 3 | 0.629 | 0.459 | 0.490 |
| CWT | EEGSNet | 30 s | 76 × 60 × 3 | 0.718 | 0.599 | 0.591 |
| Modulation | CNN | 60 s | 152 × 120 × 3 | 0.659 | 0.512 | 0.560 |
| Modulation | EEGSNet | 60 s | 152 × 120 × 3 | 0.739 | 0.625 | 0.643 |

| Method | Window | Image Size | Accuracy (Mean ± SD) | Kappa (Mean ± SD) | Macro-F1 (Mean ± SD) |
|---|---|---|---|---|---|
| STFT | 30 s | 76 × 60 × 3 | 0.72 ± 0.11 | 0.61 ± 0.14 | 0.59 ± 0.11 |
| CWT | 30 s | 76 × 60 × 3 | 0.73 ± 0.04 | 0.61 ± 0.05 | 0.59 ± 0.07 |
| Modulation | 60 s | 152 × 120 × 3 | 0.74 ± 0.08 | 0.62 ± 0.11 | 0.63 ± 0.06 |

| Method | Window | Image Size | Accuracy (Mean ± SD) | Kappa (Mean ± SD) | Macro-F1 (Mean ± SD) |
|---|---|---|---|---|---|
| STFT | 30 s | 76 × 60 × 3 | 0.64 ± 0.09 | 0.49 ± 0.10 | 0.51 ± 0.08 |
| CWT | 30 s | 76 × 60 × 3 | 0.66 ± 0.11 | 0.51 ± 0.13 | 0.51 ± 0.12 |
| Modulation | 60 s | 152 × 120 × 3 | 0.71 ± 0.05 | 0.59 ± 0.06 | 0.63 ± 0.06 |

| Method | Window | Image Size | Accuracy (Mean ± SD) | Kappa (Mean ± SD) | Macro-F1 (Mean ± SD) |
|---|---|---|---|---|---|
| STFT | 30 s | 76 × 60 × 3 | 0.70 ± 0.06 | 0.56 ± 0.08 | 0.59 ± 0.06 |
| CWT | 30 s | 76 × 60 × 3 | 0.72 ± 0.05 | 0.59 ± 0.07 | 0.59 ± 0.09 |
| Modulation | 60 s | 152 × 120 × 3 | 0.68 ± 0.07 | 0.56 ± 0.09 | 0.59 ± 0.06 |

| Method | Window | Image Size | Accuracy (Mean ± SD) | Kappa (Mean ± SD) | Macro-F1 (Mean ± SD) |
|---|---|---|---|---|---|
| STFT | 30 s | 76 × 60 × 3 | 0.65 ± 0.05 | 0.46 ± 0.09 | 0.48 ± 0.09 |
| CWT | 30 s | 76 × 60 × 3 | 0.62 ± 0.08 | 0.44 ± 0.10 | 0.42 ± 0.06 |
| Modulation | 60 s | 152 × 120 × 3 | 0.66 ± 0.05 | 0.52 ± 0.05 | 0.57 ± 0.05 |
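The Accuracy, Kappa, and Macro-F1 columns above follow the standard definitions; as a reference, here is a minimal NumPy sketch of the latter two (equivalent in spirit to scikit-learn's `cohen_kappa_score` and `f1_score(average='macro')`):

```python
import numpy as np

def macro_f1(y_true, y_pred, labels):
    """Macro-F1: unweighted mean of per-class F1 scores, so the rare
    N1 and N3 stages weigh as much as the abundant N2 stage."""
    f1s = []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

def cohen_kappa(y_true, y_pred, labels):
    """Cohen's kappa: observed agreement corrected for the agreement
    expected by chance from the two label distributions."""
    po = np.mean(y_true == y_pred)
    pe = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in labels)
    return float((po - pe) / (1 - pe))
```
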
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Tallal, U.; Agrawal, R.; Kshirsagar, S. Modulation-Based Feature Extraction for Robust Sleep Stage Classification Across Apnea-Based Cohorts. Biosensors 2026, 16, 56. https://doi.org/10.3390/bios16010056

