MFSleepNet: An Interactive Multimodal Fusion Framework for Automatic Sleep Staging
Highlights
- MFSleepNet consistently outperforms representative single-modality and multimodal sleep staging methods on the Sleep-EDF, SHHS, and HSP datasets.
- The interaction-based EEG–EOG fusion mechanism improves multimodal feature representation and sleep stage classification performance.
- The gated temporal-channel attention module enhances discriminative temporal and channel-wise physiological patterns.
- Cross-subject evaluation reveals substantial inter-subject variability and limited baseline generalization on unseen subjects.
- Structured inter-modality interaction is beneficial for multimodal physiological signal fusion in sleep staging.
- Subject-specific adaptation can effectively improve personalization performance with limited labeled data.
- Inter-subject variability remains a major challenge for practical sleep staging systems and motivates future research on more robust subject-independent modeling.
Abstract
1. Introduction
- (1) A multimodal sleep staging framework is proposed that incorporates cross-modality interaction between EEG and EOG signals, moving beyond conventional fusion strategies based on independent processing and simple concatenation.
- (2) An interaction-based multimodal fusion module is developed to model cross-modal dependencies via bidirectional channel-wise modulation between EEG and EOG representations.
- (3) A gated temporal-channel attention block is introduced to refine the fused features by adaptively weighting temporal and channel-wise information, thereby enhancing discriminative patterns for sleep stage classification.
- (4) Extensive validation on multiple public datasets under an epoch-level cross-validation protocol demonstrates improved classification performance.
2. Related Work
2.1. Single-Modality Sleep Staging Methods
2.2. Multi-Modality Sleep Staging Methods
3. Materials and Methods
3.1. Dataset
3.1.1. Sleep-EDF
3.1.2. Sleep Heart Health Study
3.1.3. Human Sleep Project
3.2. Data Preprocessing
3.3. Model
3.3.1. Overview of the Model
3.3.2. Feature Extraction Backbone
3.3.3. Multimodal Feature Fusion Module
3.3.4. Gated Temporal-Channel Attention Block
3.3.5. Classification Head
3.4. Implementation Details
3.5. Evaluation Metrics
4. Experiments and Results
4.1. Comparison with State-of-the-Art Methods
4.2. Ablation Study
4.3. EEG–EOG Correlation Analysis and Inter-Modality Interaction Effects
4.4. Grad-CAM–Based Visualization and Interpretability Analysis
4.5. Cross-Subject Evaluation and Subject-Adaptive External Validation
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

Feature extraction backbone

| Signal | Stage | Layer Type | Repeats | Parameters | Output Shape |
|---|---|---|---|---|---|
| EEG | Stem | Input | - | - | (B, C, L) |
| EEG | Stem | Conv1D | - | k = 7, s = 2, p = 3 | (B, 64, L/2) |
| EEG | Stem | Max Pool | - | k = 3, s = 2, p = 1 | (B, 64, L/4) |
| EEG | Stage 1 | Residual Bottleneck | ×3 | k = 1, s = 1; k = 3, s = 1, p = 1; k = 1, s = 1 | (B, 256, L/4) |
| EEG | Stage 2 | Residual Bottleneck | ×4 | k = 1, s = 1; k = 3, s = 2/1, p = 1; k = 1, s = 1 | (B, 512, L/8) |
| EEG | Stage 3 | Residual Bottleneck | ×6 | k = 1, s = 1; k = 3, s = 2/1, p = 1; k = 1, s = 1 | (B, 1024, L/16) |
| EEG | Stage 4 | Residual Bottleneck | ×3 | k = 1, s = 1; k = 3, s = 2/1, p = 1; k = 1, s = 1 | (B, 2048, L/32) |
| EEG | - | Conv1D | - | k = 1, s = 1 | (B, 1024, L/32) |
| EOG | Stem | Input | - | - | (B, C, L) |
| EOG | Stem | Conv1D | - | k = 7, s = 2, p = 3 | (B, 64, L/2) |
| EOG | Stem | Max Pool | - | k = 3, s = 2, p = 1 | (B, 64, L/4) |
| EOG | Stage 1 | Residual BasicBlock | ×2 | k = 3, s = 1, p = 1; k = 3, s = 1, p = 1 | (B, 64, L/4) |
| EOG | Stage 2 | Residual BasicBlock | ×2 | k = 3, s = 2/1, p = 1; k = 3, s = 1, p = 1 | (B, 128, L/8) |
| EOG | Stage 3 | Residual BasicBlock | ×2 | k = 3, s = 2/1, p = 1; k = 3, s = 1, p = 1 | (B, 256, L/16) |
| EOG | Stage 4 | Residual BasicBlock | ×2 | k = 3, s = 2/1, p = 1; k = 3, s = 1, p = 1 | (B, 512, L/32) |
| EOG | - | Conv1D | - | k = 1, s = 1 | (B, 1024, L/32) |

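
The temporal downsampling in the backbone table (L to roughly L/32) can be checked with the usual convolution output-length formula. The sketch below is only an illustration: it assumes dilation 1 and floor rounding, and it reads "s = 2/1" as stride 2 in the first block of a stage and stride 1 in the remaining blocks. The 3000-sample epoch (30 s at 100 Hz) is a hypothetical input length, not a value taken from the table.

```python
def conv_out_len(length: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a 1D convolution or pooling layer (dilation 1, floor mode)."""
    return (length + 2 * padding - kernel) // stride + 1


def backbone_out_len(length: int) -> int:
    """Trace the temporal length through the stem and the four residual stages."""
    length = conv_out_len(length, kernel=7, stride=2, padding=3)  # stem Conv1D, about L/2
    length = conv_out_len(length, kernel=3, stride=2, padding=1)  # max pool, about L/4
    # Stage 1 uses stride 1 everywhere, so it keeps the length.
    # Stages 2-4 are assumed to halve the length once each (first block, k = 3, s = 2, p = 1).
    for _ in range(3):
        length = conv_out_len(length, kernel=3, stride=2, padding=1)
    return length


# Hypothetical epoch length: 30 s sampled at 100 Hz.
print(backbone_out_len(3000))
```

Because each strided layer rounds, the final length is close to L/32 rather than exactly equal to it when L is not a multiple of 32.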

Multimodal feature fusion module

| Layer Type | Parameters | Output Shape | Signal |
|---|---|---|---|
| Input | - | (B, 1024, L/32) | EEG |
| AdaptiveAvgPool1D | - | (B, 1024, 1) | EEG |
| Conv1D | k = 1 | (B, 1024, 1) | EEG |
| Conv1D | k = 1 | (B, 1024, 1) | EEG |
| Element-wise product | - | (B, 1024, L/32) | EOG |
| Input | - | (B, 1024, L/32) | EOG |
| AdaptiveAvgPool1D | - | (B, 1024, 1) | EOG |
| Conv1D | k = 1 | (B, 1024, 1) | EOG |
| Conv1D | k = 1 | (B, 1024, 1) | EOG |
| Element-wise product | - | (B, 1024, L/32) | EEG |
| Concat | - | (B, 2048, L/32) | EEG-EOG |

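
The layer sequence in the fusion table can be read as a bidirectional channel-wise gate: each modality is pooled over time, passed through two k = 1 convolutions, and the resulting per-channel weights scale the other modality before concatenation. The NumPy sketch below follows that reading and is not the authors' implementation. The ReLU and sigmoid activations, the bias-free weights, the direction of modulation inferred from the Signal column, and the names `channel_gate` and `interactive_fusion` are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_gate(feat, w1, w2):
    """Per-channel gate from one modality.

    feat: (B, C, T); w1, w2: (C, C). A k = 1 Conv1D applied to a length-1
    sequence is a matrix multiply over channels. Activations are assumed.
    """
    pooled = feat.mean(axis=2)                 # AdaptiveAvgPool1D: (B, C)
    hidden = np.maximum(pooled @ w1.T, 0.0)    # first Conv1D (k = 1) + assumed ReLU
    return sigmoid(hidden @ w2.T)[:, :, None]  # second Conv1D (k = 1): (B, C, 1)


def interactive_fusion(eeg, eog, params):
    """Cross-modulate EEG and EOG features, then concatenate along channels."""
    gate_from_eeg = channel_gate(eeg, *params["eeg"])
    gate_from_eog = channel_gate(eog, *params["eog"])
    eog_mod = eog * gate_from_eeg  # EEG-derived weights scale EOG features
    eeg_mod = eeg * gate_from_eog  # EOG-derived weights scale EEG features
    return np.concatenate([eeg_mod, eog_mod], axis=1)


rng = np.random.default_rng(0)
B, C, T = 2, 8, 5  # toy sizes; the table uses C = 1024 and T = L/32
eeg = rng.standard_normal((B, C, T))
eog = rng.standard_normal((B, C, T))
params = {
    "eeg": (rng.standard_normal((C, C)), rng.standard_normal((C, C))),
    "eog": (rng.standard_normal((C, C)), rng.standard_normal((C, C))),
}
fused = interactive_fusion(eeg, eog, params)
print(fused.shape)  # (2, 16, 5)
```

The gate has shape (B, C, 1), so broadcasting applies the same weight to every time step of a channel, which is why the product rows in the table keep the (B, 1024, L/32) shape.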

Gated temporal-channel attention block

| Sub-block | Layer Type | Parameters | Output Shape |
|---|---|---|---|
| - | Input | - | (B, 2048, L/32) |
| Temporal Attention | AdaptiveAvgPool1D | - | (B, 2048, 1) |
| Temporal Attention | Conv1D | k = 1 | (B, 256, 1) |
| Temporal Attention | Conv1D | k = 1 | (B, 2048, 1) |
| Temporal Attention | Element-wise product | - | (B, 2048, L/32) |
| Gated Fusion | Conv1D | k = 1 | (B, 2048, L/32) |
| Gated Fusion | Element-wise product | - | (B, 2048, L/32) |
| Gated Fusion | Element-wise addition | - | (B, 2048, L/32) |
| Efficient Channel Attention | AdaptiveAvgPool1D | - | (B, 2048, 1) |
| Efficient Channel Attention | Transpose | - | (B, 1, 2048) |
| Efficient Channel Attention | Conv1D | k = 7, s = 1, p = 3 | (B, 1, 2048) |
| Efficient Channel Attention | Transpose | - | (B, 2048, 1) |
| Efficient Channel Attention | Element-wise addition | - | (B, 2048, L/32) |

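
The Efficient Channel Attention rows (pool, transpose, k = 7 convolution, transpose) describe a small 1D convolution that slides along the channel axis rather than the time axis. The sketch below reproduces only those four rows and stops at the (B, C, 1) descriptor. How that descriptor is combined with the features is not recoverable from the table beyond the listed element-wise addition, so it is left out. Zero padding, a bias-free kernel, and the function name `channel_descriptor` are assumptions.

```python
import numpy as np


def channel_descriptor(feat, kernel):
    """Pool over time, then convolve across neighbouring channels.

    feat: (B, C, T); kernel: 1D array of odd length (7 in the table).
    Returns a (B, C, 1) descriptor with the channel count unchanged.
    """
    pooled = feat.mean(axis=2)                     # AdaptiveAvgPool1D: (B, C)
    pad = len(kernel) // 2                         # p = 3 for k = 7
    padded = np.pad(pooled, ((0, 0), (pad, pad)))  # assumed zero padding
    channels = pooled.shape[1]
    # Cross-correlation along the channel axis (the transposed Conv1D step).
    out = sum(kernel[k] * padded[:, k:k + channels] for k in range(len(kernel)))
    return out[:, :, None]                         # transpose back: (B, C, 1)


feat = np.ones((2, 16, 5))  # toy sizes; the table uses C = 2048
desc = channel_descriptor(feat, np.ones(7))
print(desc.shape)  # (2, 16, 1)
```

One kernel of seven weights is shared by all channels, so this step adds very few parameters compared with the two k = 1 convolutions in the Temporal Attention rows.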

Classification head

| Layer Type | Parameters | Output Shape |
|---|---|---|
| Input | - | (B, 2048, L/32) |
| AdaptiveAvgPool1D | - | (B, 2048, 1) |
| Flatten | - | (B, 2048) |
| Linear | - | (B, 5) |


| Dataset | Signal | OA before interaction (%) | OA after interaction (%) |
|---|---|---|---|
| Sleep-EDF | EEG | 89.60 | 90.03 |
| Sleep-EDF | EOG | 87.79 | 89.18 |
| SHHS | EEG | 85.97 | 86.52 |
| SHHS | EOG | 83.28 | 85.14 |
| HSP | EEG | 83.26 | 83.93 |
| HSP | EOG | 78.36 | 82.18 |

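
The effect of the interaction step is easier to compare across datasets as an absolute gain in overall accuracy (OA). The short script below recomputes those gains from the six table rows; the numbers are copied from the table, and the variable names are illustrative.

```python
# (dataset, signal) -> (OA before interaction, OA after interaction), in percent
results = {
    ("Sleep-EDF", "EEG"): (89.60, 90.03),
    ("Sleep-EDF", "EOG"): (87.79, 89.18),
    ("SHHS", "EEG"): (85.97, 86.52),
    ("SHHS", "EOG"): (83.28, 85.14),
    ("HSP", "EEG"): (83.26, 83.93),
    ("HSP", "EOG"): (78.36, 82.18),
}

# Absolute gain in percentage points, rounded to the table's precision.
gains = {key: round(after - before, 2) for key, (before, after) in results.items()}

for (dataset, signal), gain in gains.items():
    print(f"{dataset:9s} {signal}: +{gain:.2f}")
```

The gains are 0.43, 0.55, and 0.67 percentage points for EEG and 1.39, 1.86, and 3.82 for EOG on Sleep-EDF, SHHS, and HSP respectively, so in this table the EOG branch gains more than the EEG branch on every dataset.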
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Gui, R.; Wang, C.; Niu, Q.; Wang, L. MFSleepNet: An Interactive Multimodal Fusion Framework for Automatic Sleep Staging. Sensors 2026, 26, 3085. https://doi.org/10.3390/s26103085

