Multimodal Classification Algorithms for Emotional Stress Analysis with an ECG-Centered Framework: A Comprehensive Review
Abstract
1. Introduction
2. Physiological and Psychometric Basis of Emotional Stress
2.1. Physiological Foundations of Multimodal Stress Assessment
2.2. Common Stress-Related Questionnaires in Stress Assessment
3. Multimodal Stress Databases
3.1. Public and Self-Collected Stress Databases
3.2. Experimental Paradigms and Methodological Insights
4. Preprocessing and Feature Extraction
4.1. Preprocessing Overview
4.2. Feature Extraction for Emotional-Stress Classification
5. Multimodal Feature Fusion and Cross-Modal Representation
5.1. Multimodal Feature Fusion Strategies
| Strategy | Key Properties | Typical Methods | Strengths | Limitations | When to Use |
|---|---|---|---|---|---|
| Early fusion (feature-level) | Fusion at input feature stage; shallow interaction; sensitive to misalignment; low compute; high interpretability. | PCA [126], Z-score [127], CCA/DCCA [128], autoencoder mapping [129]. | Simple and deployable; transparent feature attribution; effective when signals are well synchronized. | Feature redundancy and modality imbalance; overfitting risk under high dimensionality; weak under asynchrony or missing signals. | Well-aligned datasets, stable acquisition, and sufficient sample size with mature feature engineering. |
| Mid fusion (hidden-layer) | Fusion at intermediate layers; deep interaction via shared latent space; moderate missing-modality tolerance; high compute; moderate interpretability. | Cross-modal attention [130], latent projection [131], MoE or gating [118,132], Transformer fusion [133]. | Learns explicit cross-modal dependencies; handles heterogeneous feature spaces; improved robustness under noise with proper regularization. | Data-hungry; tuning-sensitive; possible training instability (modality dominance or gradient imbalance); heavier deployment cost. | Asynchronous or heterogeneous signals where learned interactions are required and sufficient training data or resources are available. |
| Late fusion (decision-level) | Fusion at output stage; minimal interaction; strong robustness to missing modalities; moderate compute; high interpretability per modality. | Weighted voting [134], probability averaging [135], stacking ensembles [136], attention-weighted decision fusion [137]. | Fault-tolerant and modular; easy to add or remove modalities; works under partial signal loss; strong reproducibility with unimodal baselines. | Limited deep inter-modal semantics; performance depends on unimodal model quality; may underperform when strong cross-modal coupling exists. | Real-world deployment with missing data, sensor dropouts, or when modularity and reliability are priorities. |
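The contrast between the first and last rows of the table can be made concrete with a minimal sketch. The snippet below is illustrative only: the feature dimensions, modality weights, and classifier outputs are hypothetical placeholders, not taken from any cited method. Early (feature-level) fusion z-scores each modality and concatenates the features; late (decision-level) fusion combines per-modality class probabilities with a weighted average, so a missing modality can simply be dropped from the sum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-window features for 6 analysis windows:
# ECG-derived HRV features (4 dims) and EDA features (3 dims).
hrv = rng.normal(size=(6, 4))
eda = rng.normal(size=(6, 3))

# --- Early fusion: normalize each modality, then concatenate at the
# feature level (mitigates modality imbalance from differing scales).
def zscore(x):
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

early = np.concatenate([zscore(hrv), zscore(eda)], axis=1)
# early now feeds a single classifier; shape is (6, 4 + 3).

# --- Late fusion: weighted average of per-modality class probabilities
# (stress vs. no-stress). The 0.6/0.4 weights are arbitrary here; in
# practice they would reflect each unimodal model's validation accuracy.
p_hrv = np.array([[0.8, 0.2], [0.3, 0.7]])  # stub unimodal outputs
p_eda = np.array([[0.6, 0.4], [0.4, 0.6]])
fused = 0.6 * p_hrv + 0.4 * p_eda
label = fused.argmax(axis=1)  # final decision per window
```

Note that the late-fusion path never requires the modalities to share a feature space or sampling rate, which is why the table lists it as the most robust option under sensor dropout.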
5.2. Cross-Modal Representation and Consistency Modeling
5.3. Robustness and Generalization in Real-World Scenarios
6. Classification Algorithms for Emotional Stress Recognition
6.1. Traditional Machine Learning-Based Approaches
6.2. Deep Learning-Based Approaches
6.3. Comparison and Analysis
7. Key Challenges in Multimodal Deep Learning
8. Discussion and Future Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| ECG | Electrocardiogram |
| EDA | Electrodermal Activity |
| EMG | Electromyography |
| EEG | Electroencephalography |
| HRV | Heart Rate Variability |
| SCL | Skin Conductance Level |
| SCR | Skin Conductance Response |
| MUAP | Motor Unit Action Potential |
| PSS | Perceived Stress Scale |
| SRQ | Self-Reporting Questionnaire |
| STAI | State–Trait Anxiety Inventory |
| PANAS | Positive and Negative Affect Schedule |
| SVM | Support Vector Machine |
| KNN | k-Nearest Neighbors |
| HMM | Hidden Markov Model |
| CRF | Conditional Random Field |
| CNN | Convolutional Neural Network |
| RNN | Recurrent Neural Network |
| LSTM | Long Short-Term Memory |
| GRU | Gated Recurrent Unit |
| KAN | Kolmogorov–Arnold Network |
| MoE | Mixture of Experts |
| VAE | Variational Autoencoder |
| GAN | Generative Adversarial Network |
| DTW | Dynamic Time Warping |
| MMD | Maximum Mean Discrepancy |
| DANN | Domain-Adversarial Neural Network |
| CORAL | Correlation Alignment |
| LOSOCV | Leave-One-Subject-Out Cross-Validation |
| AUROC | Area Under the Receiver Operating Characteristic Curve |
| BVP | Blood Volume Pulse |
| IBI | Inter-Beat Interval |
| PPG | Photoplethysmography |
| VR | Virtual Reality |
| AR | Augmented Reality |
References
- Christiansen, J.; Qualter, P.; Friis, K.; Pedersen, S.; Lund, R.; Andersen, C.; Bekker-Jeppesen, M.; Lasgaard, M. Associations of loneliness and social isolation with physical and mental health among adolescents and young adults. Perspect. Public Health 2021, 141, 226–236. [Google Scholar] [CrossRef] [PubMed]
- Baxter, A.J.; Scott, K.M.; Ferrari, A.J.; Norman, R.E.; Vos, T.; Whiteford, H.A. Challenging the myth of an “epidemic” of common mental disorders: Trends in the global prevalence of anxiety and depression between 1990 and 2010. Depress. Anxiety 2014, 31, 506–516. [Google Scholar] [CrossRef]
- Franks, K.H.; Rowsthorn, E.; Bransby, L.; Lim, Y.Y.; Chong, T.T.J.; Pase, M.P. Association of self-reported psychological stress with cognitive decline: A systematic review. Neuropsychol. Rev. 2023, 33, 856–870. [Google Scholar] [CrossRef]
- Merten, T. The self-report fallacy: When diagnosis predominantly relies on subjective symptom report. Curr. Opin. Psychol. 2025, 65, 102096. [Google Scholar] [CrossRef]
- Sara, J.D.S.; Toya, T.; Ahmad, A.; Clark, M.M.; Gilliam, W.P.; Lerman, L.O.; Lerman, A. Mental stress and its effects on vascular health. Mayo Clin. Proc. 2022, 97, 951–990. [Google Scholar] [CrossRef]
- Janse, P.D.; van Sonsbeek, M.A.; Bovendeerd, B.; de Jong, K. Progress feedback in psychotherapy: Advantages, challenges, and future directions. Curr. Opin. Psychol. 2025, 66, 102110. [Google Scholar] [CrossRef]
- Naeem, M.; Fawzi, S.A.; Anwar, H.; Malek, A.S. Wearable ECG systems for accurate mental stress detection: A scoping review. J. Public Health 2025, 33, 1181–1197. [Google Scholar] [CrossRef]
- Kapase, A.B.; Uke, N. A comprehensive review in affective computing: An exploration of artificial intelligence in unimodal and multimodal emotion recognition systems. Int. J. Speech Technol. 2025, 28, 541–563. [Google Scholar] [CrossRef]
- Ometov, A.; Mezina, A.; Lin, H.C.; Arponen, O.; Burget, R.; Nurmi, J. Stress and emotion open access data: A review on datasets, modalities, methods, challenges, and future research perspectives. J. Healthc. Inform. Res. 2025, 9, 247–279. [Google Scholar] [CrossRef]
- Haque, Y.; Zawad, R.S.; Rony, C.S.A.; Al Banna, H.; Ghosh, T.; Kaiser, M.S.; Mahmud, M. State-of-the-art of stress prediction from heart rate variability using artificial intelligence. Cogn. Comput. 2024, 16, 455–481. [Google Scholar] [CrossRef]
- Zapf, H.; Boettcher, J.; Haukeland, Y.; Orm, S.; Coslar, S.; Fjermestad, K. A systematic review of the association between parent-child communication and adolescent mental health. JCPP Adv. 2024, 4, e12205. [Google Scholar] [CrossRef]
- Madaan, A.; Singh, A. From sensors to insight: A review of physiological signal processing for stress prediction. In Proceedings of the AI-Driven Smart Healthcare for Society 5.0, Kolkata, India, 14–15 February 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 182–187. [Google Scholar] [CrossRef]
- Kotlęga, D.; Gołąb-Janowska, M.; Masztalewicz, M.; Ciećwież, S.; Nowacki, P. The emotional stress and risk of ischemic stroke. Neurol. Neurochir. Pol. 2016, 50, 265–270. [Google Scholar] [CrossRef]
- Lazarus, R.S. Psychological stress and coping in adaptation and illness. Int. J. Psychiatry Med. 1974, 5, 321–333. [Google Scholar] [CrossRef]
- Levenson, R.W. The autonomic nervous system and emotion. Emot. Rev. 2014, 6, 100–112. [Google Scholar] [CrossRef]
- Cipresso, P.; Colombo, D.; Riva, G. Computational psychometrics using psychophysiological measures for the assessment of acute mental stress. Sensors 2019, 19, 781. [Google Scholar] [CrossRef] [PubMed]
- Giannakakis, G.; Grigoriadis, D.; Giannakaki, K.; Simantiraki, O.; Roniotis, A.; Tsiknakis, M. Review on psychological stress detection using biosignals. IEEE Trans. Affect. Comput. 2022, 13, 440–460. [Google Scholar] [CrossRef]
- Hammad, M.; Maher, A.; Wang, K.; Jiang, F.; Amrani, M. Detection of abnormal heart conditions based on characteristics of ECG signals. Measurement 2018, 125, 634–644. [Google Scholar] [CrossRef]
- Kim, H.G.; Cheon, E.J.; Bai, D.S.; Lee, Y.H.; Koo, B.H. Stress and heart rate variability: A meta-analysis and review of the literature. Psychiatry Investig. 2018, 15, 235–245. [Google Scholar] [CrossRef] [PubMed]
- Billman, G.E. The LF/HF ratio does not accurately measure cardiac sympatho-vagal balance. Front. Physiol. 2013, 4, 26. [Google Scholar] [CrossRef]
- Castaldo, R.; Melillo, P.; Bracale, U.; Caserta, M.; Triassi, M.; Pecchia, L. Acute mental stress assessment via short-term HRV analysis in healthy adults: A systematic review with meta-analysis. Biomed. Signal Process. Control 2015, 18, 370–377. [Google Scholar] [CrossRef]
- Moritani, T.; Stegeman, D.; Merletti, R. Basic physiology and biophysics of EMG signal generation. In Electromyography: Physiology, Engineering, and Noninvasive Applications; Wiley: Hoboken, NJ, USA, 2004; pp. 1–25. [Google Scholar]
- Farina, D.; Stegeman, D.; Merletti, R. Biophysics of the generation of EMG signals. In Surface Electromyography: Physiology, Engineering, and Applications; Wiley: Hoboken, NJ, USA, 2016; pp. 1–24. [Google Scholar]
- Kret, M.; Stekelenburg, J.; Roelofs, K.; De Gelder, B. Perception of Face and Body Expressions Using Electromyography, Pupillometry and Gaze Measures. Front. Psychol. 2013, 4, 28. [Google Scholar] [CrossRef] [PubMed]
- Ahmed, M.; Grillo, M.; Taebi, A.; Kaya, M.; Thibbotuwawa Gamage, P. A Comprehensive Analysis of Trapezius Muscle EMG Activity in Relation to Stress and Meditation. BioMedInformatics 2024, 4, 1047–1058. [Google Scholar] [CrossRef]
- Rissanen, J. Collecting Biosignals: Data Experiments with EDA and EEG. Master’s Thesis, Tampere University of Applied Sciences, Tampere, Finland, 2024. [Google Scholar]
- Braithwaite, J.J.; Watson, D.G.; Jones, R.; Rowe, M. A guide for analysing electrodermal activity (EDA) & skin conductance responses (SCRs) for psychological experiments. Psychophysiology 2013, 49, 1017–1034. [Google Scholar]
- Rahma, O.N.; Putra, A.P.; Rahmatillah, A.; Putri, Y.S.K.A.; Fajriaty, N.D.; Ain, K.; Chai, R. Electrodermal activity for measuring cognitive and emotional stress level. J. Med. Signals Sens. 2022, 12, 155–162. [Google Scholar] [CrossRef]
- Liu, Y.; Du, S. Psychological stress level detection based on electrodermal activity. Behav. Brain Res. 2018, 341, 50–53. [Google Scholar] [CrossRef] [PubMed]
- Bari, D.S. Gender differences in tonic and phasic electrodermal activity components. Sci. J. Univ. Zakho 2020, 8, 29–33. [Google Scholar] [CrossRef]
- Greco, A.; Valenza, G.; Scilingo, E.P. Modeling for the Analysis of the EDA. In Advances in Electrodermal Activity Processing with Applications for Mental Health: From Heuristic Methods to Convex Optimization; Springer: Berlin/Heidelberg, Germany, 2016; pp. 19–33. [Google Scholar]
- Ernst, H.; Scherpf, M.; Pannasch, S.; Helmert, J.R.; Malberg, H.; Schmidt, M. Assessment of the human response to acute mental stress—An overview and a multimodal study. PLoS ONE 2023, 18, e0294069. [Google Scholar] [CrossRef]
- Khalili, M.; GholamHosseini, H.; Lowe, A.; Kuo, M.M. Motion artifacts in capacitive ECG monitoring systems: A review of existing models and reduction techniques. Med. Biol. Eng. Comput. 2024, 62, 3599–3622. [Google Scholar] [CrossRef]
- Boyer, M.; Bouyer, L.; Roy, J.S.; Campeau-Lecours, A. Reducing Noise, Artifacts and Interference in Single-Channel EMG Signals: A Review. Sensors 2023, 23, 2927. [Google Scholar] [CrossRef]
- Yang, S.; Gao, Y.; Zhu, Y.; Zhang, L.; Xie, Q.; Lu, X.; Wang, F.; Zhang, Z. A deep learning approach to stress recognition through multimodal physiological signal image transformation. Sci. Rep. 2025, 15, 22258. [Google Scholar] [CrossRef]
- Li, J.; Li, J.; Wang, X.; Zhan, X.; Zeng, Z. A Domain Generalization and Residual Network-Based Emotion Recognition from Physiological Signals. Cyborg Bionic Syst. 2024, 5, 74. [Google Scholar] [CrossRef] [PubMed]
- Qasim, M.S.; Bari, D.S.; Martinsen, Ø.G. Influence of ambient temperature on tonic and phasic electrodermal activity components. Physiol. Meas. 2022, 43, 065001. [Google Scholar] [CrossRef] [PubMed]
- Khan, T.H.; Villanueva, I.; Vicioso, P.; Husman, J. Exploring relationships between electrodermal activity, skin temperature, and performance during. In Proceedings of the 2019 IEEE Frontiers in Education Conference (FIE), Covington, KY, USA, 16–19 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–5. [Google Scholar] [CrossRef]
- Posada-Quintero, H.F.; Reljin, N.; Mills, C.; Mills, I.; Florian, J.P.; VanHeest, J.L.; Chon, K.H. Time-varying analysis of electrodermal activity during exercise. PLoS ONE 2018, 13, e0198328. [Google Scholar] [CrossRef]
- Bjaastad, J.F.; Jensen-Doss, A.; Moltu, C.; Jakobsen, P.; Hagenberg, H.; Joa, I. Attitudes toward standardized assessment tools and their use among clinicians in a public mental health service. Nord. J. Psychiatry 2019, 73, 387–396. [Google Scholar] [CrossRef]
- Lee, E.H. Review of the psychometric evidence of the perceived stress scale. Asian Nurs. Res. 2012, 6, 121–127. [Google Scholar] [CrossRef]
- Spielberger, C.D.; Gorsuch, R.L.; Lushene, R.; Vagg, P.R.; Jacobs, G.A. Manual for the State-Trait Anxiety Inventory (STAI); Consulting Psychologists Press: Palo Alto, CA, USA, 1983; ISBN 0-87120-197-6. [Google Scholar]
- Watson, D.; Clark, L.A.; Tellegen, A. Development and validation of brief measures of positive and negative affect: The PANAS scales. J. Personal. Soc. Psychol. 1988, 54, 1063. [Google Scholar] [CrossRef]
- Beusenberg, M.; Orley, J.H.; WHO. A User’s Guide to the Self Reporting Questionnaire (SRQ)/Compiled by M. Beusenberg and J. Orley; WHO: Geneva, Switzerland, 1994. [Google Scholar]
- Ringgold, V.; Burkhardt, F.; Abel, L.; Kurz, M.; Müller, V.; Richer, R.; Eskofier, B.M.; Shields, G.S.; Rohleder, N. Multimodal stress assessment: Connecting task-related changes in self-reported stress, salivary biomarkers, heart rate, and facial expressions in the context of the stress response to the Trier Social Stress Test. Psychoneuroendocrinology 2025, 180, 107560. [Google Scholar] [CrossRef]
- Wuensch, M.; Frenzel, A.C.; Pekrun, R.; Sun, L. Enjoyable for some, stressful for others? Physiological and subjective indicators of achievement emotions during adaptive versus fixed-item testing. Contemp. Educ. Psychol. 2025, 82, 102388. [Google Scholar] [CrossRef]
- Kalateh, S.; Estrada-Jimenez, L.A.; Nikghadam-Hojjati, S.; Barata, J. A Systematic Review on Multimodal Emotion Recognition: Building Blocks, Current State, Applications, and Challenges. IEEE Access 2024, 12, 103976–104019. [Google Scholar] [CrossRef]
- Ladakis, I.; Fotopoulos, D.; Chouvarda, I. Integrative Analysis of Open Datasets for Stress Prediction. J. Med. Biol. Eng. 2025, 45, 385–399. [Google Scholar] [CrossRef]
- Zhang, X.; Wei, X.; Zhou, Z.; Zhao, Q.; Zhang, S.; Yang, Y.; Li, R.; Hu, B. Dynamic Alignment and Fusion of Multimodal Physiological Patterns for Stress Recognition. IEEE Trans. Affect. Comput. 2024, 15, 685–696. [Google Scholar] [CrossRef]
- Schmidt, P.; Reiss, A.; Duerichen, R.; Marberger, C.; Van Laerhoven, K. Introducing wesad, a multimodal dataset for wearable stress and affect detection. In Proceedings of the 20th ACM International Conference on Multimodal Interaction, Paris, France, 9–13 October 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 400–408. [Google Scholar]
- Miranda-Correa, J.A.; Abadi, M.K.; Sebe, N.; Patras, I. Amigos: A dataset for affect, personality and mood research on individuals and groups. IEEE Trans. Affect. Comput. 2018, 12, 479–493. [Google Scholar] [CrossRef]
- Soleymani, M.; Lichtenauer, J.; Pun, T.; Pantic, M. A multimodal database for affect recognition and implicit tagging. IEEE Trans. Affect. Comput. 2011, 3, 42–55. [Google Scholar] [CrossRef]
- Katsigiannis, S.; Ramzan, N. DREAMER: A database for emotion recognition through EEG and ECG signals from wireless low-cost off-the-shelf devices. IEEE J. Biomed. Health Inform. 2017, 22, 98–107. [Google Scholar] [CrossRef]
- Healey, J.A.; Picard, R.W. Detecting stress during real-world driving tasks using physiological sensors. IEEE Trans. Intell. Transp. Syst. 2005, 6, 156–166. [Google Scholar] [CrossRef]
- Jiang, Z.; Seyedi, S.; Griner, E.; Abbasi, A.; Rad, A.B.; Kwon, H.; Cotes, R.O.; Clifford, G.D. Evaluating and mitigating unfairness in multimodal remote mental health assessments. PLoS Digit. Health 2024, 3, e0000413. [Google Scholar] [CrossRef]
- Mordacq, J.; Milecki, L.; Vakalopoulou, M.; Oudot, S.; Kalogeiton, V. Multimodal Learning for Detecting Stress under Missing Modalities. In Proceedings of the WiCV 2024—Women in Computer Vision Workshop in Conjunction with CVPR, Seattle, WA, USA, 18 June 2024; IEEE: Piscataway, NJ, USA, 2024. [Google Scholar]
- Rodrigues, S.; Kaiseler, M.; Queirós, C. Psychophysiological Assessment of Stress Under Ecological Settings. Eur. Psychol. 2015, 20, 204–226. [Google Scholar] [CrossRef]
- Zhu, X.; Guo, C.; Feng, H.; Huang, Y.; Feng, Y.; Wang, X.; Wang, R. A review of key technologies for emotion analysis using multimodal information. Cogn. Comput. 2024, 16, 1504–1530. [Google Scholar] [CrossRef]
- Carvalho, M.; Pinho, A.J.; Brás, S. Resampling approaches to handle class imbalance: A review from a data perspective. J. Big Data 2025, 12, 71. [Google Scholar] [CrossRef]
- Huang, Y.; Zhang, Z.; Yang, Y.; Mo, P.C.; Zhang, Z.; He, J.; Hu, S.; Wang, X.; Li, Y. Exploring Skin Potential Signals in Electrodermal Activity: Identifying Key Features for Attention State Differentiation. IEEE Access 2024, 12, 100832–100847. [Google Scholar] [CrossRef]
- Akpinar, M.H.; Sengur, A.; Salvi, M.; Seoni, S.; Faust, O.; Mir, H.; Molinari, F.; Acharya, U.R. Synthetic Data Generation via Generative Adversarial Networks in Healthcare: A Systematic Review of Image- and Signal-Based Studies. IEEE Open J. Eng. Med. Biol. 2025, 6, 183–192. [Google Scholar] [CrossRef]
- Salazar, A.; Vergara, L.; Safont, G. Generative Adversarial Networks and Markov Random Fields for oversampling very small training sets. Expert Syst. Appl. 2021, 163, 113819. [Google Scholar] [CrossRef]
- Fajardo, V.A.; Findlay, D.; Jaiswal, C.; Yin, X.; Houmanfar, R.; Xie, H.; Liang, J.; She, X.; Emerson, D.B. On oversampling imbalanced data with deep conditional generative models. Expert Syst. Appl. 2021, 169, 114463. [Google Scholar] [CrossRef]
- Wu, Y.; Mi, Q.; Gao, T. A comprehensive review of multimodal emotion recognition: Techniques, challenges, and future directions. Biomimetics 2025, 10, 418. [Google Scholar] [CrossRef]
- Dominguez-Catena, I.; Paternain, D.; Galar, M. Metrics for dataset demographic bias: A case study on facial expression recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5209–5226. [Google Scholar] [CrossRef] [PubMed]
- Xu, H.; Zeng, M.; Liu, H.; Xie, X.; Tian, L.; Yan, J.; Chen, C. A dynamic transfer network for cross-database atrial fibrillation detection. Biomed. Signal Process. Control 2024, 90, 105799. [Google Scholar] [CrossRef]
- Al-Azani, S.; El-Alfy, E.S.M. A review and critical analysis of multimodal datasets for emotional AI. Artif. Intell. Rev. 2025, 58, 334. [Google Scholar] [CrossRef]
- Zhang, B.; Morère, Y.; Sieler, L.; Langlet, C.; Bolmont, B.; Bourhis, G. Reaction time and physiological signals for stress recognition. Biomed. Signal Process. Control 2017, 38, 100–107. [Google Scholar] [CrossRef]
- Romeo, Z.; Fusina, F.; Semenzato, L.; Bonato, M.; Angrilli, A.; Spironelli, C. Comparison of Slides and Video Clips as Different Methods for Inducing Emotions: An Electroencephalographic Alpha Modulation Study. Front. Hum. Neurosci. 2022, 16, 901422. [Google Scholar] [CrossRef]
- Parsons, T.D. Virtual Reality for Enhanced Ecological Validity and Experimental Control in the Clinical, Affective and Social Neurosciences. Front. Hum. Neurosci. 2015, 9, e00660. [Google Scholar] [CrossRef]
- Williams, J.M.G.; Mathews, A.; MacLeod, C. The emotional Stroop task and psychopathology. Psychol. Bull. 1996, 120, 3. [Google Scholar] [CrossRef] [PubMed]
- Galy, E.; Mélan, C. Effects of cognitive appraisal and mental workload factors on performance in an arithmetic task. Appl. Psychophysiol. Biofeedback 2015, 40, 313–325. [Google Scholar] [CrossRef]
- Lang, P.J.; Bradley, M.M.; Cuthbert, B.N. International affective picture system (IAPS): Technical manual and affective ratings. Nimh Cent. Study Emot. Atten. 1997, 1, 3. [Google Scholar]
- Allen, A.P.; Kennedy, P.J.; Dockray, S.; Cryan, J.F.; Dinan, T.G.; Clarke, G. The trier social stress test: Principles and practice. Neurobiol. Stress 2017, 6, 113–126. [Google Scholar] [CrossRef]
- Razzak, R.; Li, Y.; Sokhadze, E.; He, S. Stress and Driving Performance Evaluation through VR and Physiological Metrics: A Pilot Study. JISARA 2025, 18, 30. [Google Scholar] [CrossRef]
- Nasri, M. Towards Intelligent VR Training: A Physiological Adaptation Framework for Cognitive Load and Stress Detection. In Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization, New York, NY, USA, 16–19 June 2025; Association for Computing Machinery: New York, NY, USA, 2025; pp. 419–423. [Google Scholar]
- Mahesh, B.; Weber, D.; Garbas, J.; Foltyn, A.; Oppelt, M.; Becker, L.; Rohleder, N.; Lang, N. Setup for Multimodal Human Stress Dataset Collection. In Proceedings of the 12th International Conference on Methods and Techniques in Behavioral Research, and 6th Seminar on Behavioral Methods, Virtual, 18–20 May 2022. [Google Scholar]
- Ferreira, S.O. Emotional activation in human beings: Procedures for experimental stress induction. Psicol. USP 2019, 30, e180176. [Google Scholar] [CrossRef]
- Chaptoukaev, H.; Strizhkova, V.; Panariello, M.; D’Alpaos, B.; Reka, A.; Manera, V.; Thümmler, S.; Ismailova, E.; Evans, N.; Bremond, F.; et al. StressID: A Multimodal Dataset for Stress Identification. Adv. Neural Inf. Process. Syst. 2023, 36, 29798–29811. [Google Scholar]
- Miranda Calero, J.A.; Gutiérrez-Martín, L.; Rituerto-González, E.; Romero-Perales, E.; Lanza-Gutiérrez, J.M.; Peláez-Moreno, C.; López-Ongil, C. Wemac: Women and emotion multi-modal affective computing dataset. Sci. Data 2024, 11, 1182. [Google Scholar] [CrossRef]
- Rashed, A.; Shirmohammadi, S.; Hefeeda, M. Descriptor: Multimodal Dataset for Player Engagement Analysis in Video Games (MultiPENG). IEEE Data Descr. 2025, 2, 17–25. [Google Scholar] [CrossRef]
- Heimerl, A.; Prajod, P.; Mertes, S.; Baur, T.; Kraus, M.; Liu, A.; Risack, H.; Rohleder, N.; André, E.; Becker, L. The ForDigitStress Dataset: A Multi-Modal Dataset for Automatic Stress Recognition. IEEE Trans. Affect. Comput. 2025, 16, 1219–1234. [Google Scholar] [CrossRef]
- Jaiswal, M.; Bara, C.-P.; Luo, Y.; Burzo, M.; Mihalcea, R.; Provost, E.M. MuSE: A Multimodal Dataset of Stressed Emotion. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France, 13–15 May 2020; European Language Resources Association: Reykjavik, Iceland, 2020. [Google Scholar]
- Hosseini, M.; Sohrab, F.; Gottumukkala, R.; Bhupatiraju, R.T.; Katragadda, S.; Raitoharju, J.; Iosifidis, A.; Gabbouj, M. EmpathicSchool: A multimodal dataset for real-time facial expressions and physiological data analysis under different stress conditions. arXiv 2022, arXiv:2209.13542. [Google Scholar]
- Tabbaa, L.; Searle, R.; Bafti, S.M.; Hossain, M.M.; Intarasisrisawat, J.; Glancy, M.; Ang, C.S. VREED: Virtual Reality Emotion Recognition Dataset Using Eye Tracking & Physiological Measures. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2022, 1–20. [Google Scholar] [CrossRef]
- Oppelt, M.P.; Foltyn, A.; Deuschel, J.; Lang, N.R.; Holzer, N.; Eskofier, B.M.; Yang, S.H. ADABase: A Multimodal Dataset for Cognitive Load Estimation. Sensors 2023, 23, 340. [Google Scholar] [CrossRef]
- Soon, P.S.; Lim, W.M.; Gaur, S.S. The role of emotions in augmented reality. Psychol. Mark. 2023, 40, 2387–2412. [Google Scholar] [CrossRef]
- Dahiya, V. Interactive Emotional Resonance: Bidirectional Communication Between Heart Rate-Derived Player States, Game Music, and Gameplay Events. Master’s Thesis, Drexel University, Philadelphia, PA, USA, 2025. [Google Scholar]
- Paniagua-Gómez, M.; Fernandez-Carmona, M. Trends and Challenges in Real-Time Stress Detection and Modulation: The Role of the IoT and Artificial Intelligence. Electronics 2025, 14, 2581. [Google Scholar] [CrossRef]
- Yan, J.; Yue, Y.; Yu, K.; Zhou, X.; Liu, Y.; Wei, J.; Yang, Y. Multi-Representation Joint Dynamic Domain Adaptation Network for Cross-Database Facial Expression Recognition. Electronics 2024, 13, 1470. [Google Scholar] [CrossRef]
- Yan, L.; Gašević, D.; Echeverria, V.; Zhao, L.; Jin, Y.; Li, X.; Martinez-Maldonado, R. In Sync or Out of Sync? Understanding Stress and Learning Performance in Collaborative Healthcare Simulations through Physiological Synchrony and Arousal. Int. J. Artif. Intell. Educ. 2025, 35, 2421–2452. [Google Scholar] [CrossRef]
- Yadav, G.; Bokhari, M.U. Hybrid classifier for optimizing mental health prediction: Feature engineering and fusion technique. Int. J. Ment. Health Addict. 2024, 22, 1–41. [Google Scholar] [CrossRef]
- Silva, R.; Salvador, G.; Bota, P.; Fred, A.; Plácido da Silva, H. Impact of sampling rate and interpolation on photoplethysmography and electrodermal activity signals’ waveform morphology and feature extraction. Neural Comput. Appl. 2023, 35, 5661–5677. [Google Scholar] [CrossRef]
- Li, Z.; Tian, Y.; Jin, Y.; Wei, X.; Wang, M.; Liu, J.; Liu, C. EDDM: A Novel ECG Denoising Method Using Dual-Path Diffusion Model. IEEE Trans. Instrum. Meas. 2025, 74, 2509815. [Google Scholar] [CrossRef]
- Huang, W.; Chen, Y.; Jiang, X.; Zhang, T.; Chen, Q. GJFusion: A channel-level correlation construction method for multimodal physiological signal fusion. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 20, 1–23. [Google Scholar] [CrossRef]
- Iammarino, E.; Marcantoni, I.; Sbrollini, A.; Morettini, M.; Burattini, L. Normalization of Electrocardiogram-Derived Cardiac Risk Indices: A Scoping Review of the Open-Access Literature. Appl. Sci. 2024, 14, 9457. [Google Scholar] [CrossRef]
- Kantharaju, P.; Vakacherla, S.S.; Jacobson, M.; Jeong, H.; Mevada, M.N.; Zhou, X.; Major, M.J.; Kim, M. Framework for personalizing wearable devices using real-time physiological measures. IEEE Access 2023, 11, 81389–81400. [Google Scholar] [CrossRef]
- Orlhac, F.; Eertink, J.J.; Cottereau, A.S.; Zijlstra, J.M.; Thieblemont, C.; Meignan, M.; Boellaard, R.; Buvat, I. A guide to ComBat harmonization of imaging biomarkers in multicenter studies. J. Nucl. Med. 2022, 63, 172–179. [Google Scholar] [CrossRef]
- Rahman, S.; Karmakar, C.; Natgunanathan, I.; Yearwood, J.; Palaniswami, M. Robustness of electrocardiogram signal quality indices. J. R. Soc. Interface 2022, 19, 20220012. [Google Scholar] [CrossRef]
- Bahador, N. Assessment of Neurological Function with Multimodal and Multichannel Physiological Signal Analysis Using Machine and Deep Learning Techniques. Ph.D. Thesis, University of Oulu, Oulu, Finland, 2024. [Google Scholar]
- Liu, C.L.; Xiao, B.; Hsieh, C.H. Multimodal fusion of spatial-temporal and frequency representations for enhanced ECG classification. Inf. Fusion 2025, 118, 102999. [Google Scholar] [CrossRef]
- Dalmeida, K.M.; Masala, G.L. HRV features as viable physiological markers for stress detection using wearable devices. Sensors 2021, 21, 2873. [Google Scholar] [CrossRef]
- Schneider, M.; Kraemmer, M.M.; Weber, B.; Schwerdtfeger, A.R. Life events are associated with elevated heart rate and reduced heart complexity to acute psychological stress. Biol. Psychol. 2021, 163, 108116. [Google Scholar] [CrossRef]
- Chen, T.; Ma, Y.; Pan, Z.; Wang, W.; Yu, J. Fusion of multi-scale feature extraction and adaptive multi-channel graph neural network for 12-lead ECG classification. Comput. Methods Programs Biomed. 2025, 265, 108725. [Google Scholar] [CrossRef] [PubMed]
- Telangore, H.; Sharma, N.; Sharma, M.; Acharya, U.R. A novel ECG-based approach for classifying psychiatric disorders: Leveraging wavelet scattering networks. Med. Eng. Phys. 2025, 135, 104275. [Google Scholar] [CrossRef]
- Rauf, U.; Saeed, S.M.U. Towards Improved Classification of Perceived Stress using Time Domain Features. IEEE Access 2024, 12, 51650–51664. [Google Scholar] [CrossRef]
- Xiang, Y.; Zhang, X.; Zhang, W.; Dou, Z.; Wang, T. Wrist Motion Regression Using EMG Attention Feature Fusion Algorithm. IEEE Sens. J. 2025, early access. [Google Scholar] [CrossRef]
- Kartowisastro, I.H.; Trisetyarso, A.; Budiharto, W.; Sudimanto. Pain Classification Using Discrete Wavelet Transform Feature Extraction and Machine Learning Techniques. IEEE Access 2025, 13, 45912–45922. [Google Scholar] [CrossRef]
- Ghosh, S.; Tripathi, K.; Garg, A.; Singh, D.; Prasad, A.; Bhavsar, A.; Dutt, V. Predicting Stress among Students via Psychometric Assessments and Machine Learning. In Proceedings of the 17th International Conference on PErvasive Technologies Related to Assistive Environments, Crete, Greece, 26–28 June 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 662–669. [Google Scholar]
- Dogan, G.; Akbulut, F.P. Multi-modal fusion learning through biosignal, audio, and visual content for detection of mental stress. Neural Comput. Appl. 2023, 35, 24435–24454. [Google Scholar] [CrossRef]
- Rashid, N.; Mortlock, T.; Al Faruque, M.A. Stress detection using context-aware sensor fusion from wearable devices. IEEE Internet Things J. 2023, 10, 14114–14127. [Google Scholar] [CrossRef]
- Zhang, Q.; Wei, Y.; Han, Z.; Fu, H.; Peng, X.; Deng, C.; Hu, Q.; Xu, C.; Wen, J.; Hu, D.; et al. Multimodal fusion on low-quality data: A comprehensive survey. arXiv 2024, arXiv:2404.18947. [Google Scholar] [CrossRef]
- Hussain, M.; O’Nils, M.; Lundgren, J.; Mousavirad, S.J. A Comprehensive Review on Deep Learning-Based Data Fusion. IEEE Access 2024, 12, 180093–180124. [Google Scholar] [CrossRef]
- Bodaghi, M.; Hosseini, M.; Gottumukkala, R. A multimodal intermediate fusion network with manifold learning for stress detection. In Proceedings of the 2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI), Mt Pleasant, MI, USA, 13–14 April 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–8. [Google Scholar]
- Duan, J.; Xiong, J.; Li, Y.; Ding, W. Deep learning based multimodal biomedical data fusion: An overview and comparative review. Inf. Fusion 2024, 110, 102536. [Google Scholar] [CrossRef]
- Singh, R.; Ranjan, V.; Ganguly, A.; Halder, S. Physiological Patterns Classification of HRV Dynamics through Feature-Level Fusion and Machine Learning during Chi Meditation. Eng. Lett. 2025, 33, 1759. [Google Scholar]
- Wang, L.; Zhang, Y.; Zhou, B.; Cao, S.; Hu, K.; Tan, Y. Automatic depression prediction via cross-modal attention-based multi-modal fusion in social networks. Comput. Electr. Eng. 2024, 118, 109413. [Google Scholar] [CrossRef]
- Mengara Mengara, A.G.; Moon, Y.K. CAG-MoE: Multimodal Emotion Recognition with Cross-Attention Gated Mixture of Experts. Mathematics 2025, 13, 1907. [Google Scholar] [CrossRef]
- Roy, S.; Ogidi, F.; Etemad, A.; Dolatabadi, E.; Afkanpour, A. A Shared Encoder Approach to Multimodal Representation Learning. arXiv 2025, arXiv:2503.01654. [Google Scholar] [CrossRef]
- Zhu, J.; Li, Y.; Yang, C.; Cai, H.; Li, X.; Hu, B. Transformer-based fusion model for mild depression recognition with EEG and pupil area signals. Med. Biol. Eng. Comput. 2025, 63, 2011–2027. [Google Scholar] [CrossRef]
- Ghose, D.; Gitelson, O.; Scassellati, B. Integrating Multimodal Affective Signals for Stress Detection from Audio-Visual Data. In Proceedings of the 26th International Conference on Multimodal Interaction (ICMI ’24), San Jose, Costa Rica, 4–8 November 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 22–32. [Google Scholar] [CrossRef]
- Kim, H.; Hong, T. Enhancing emotion recognition using multimodal fusion of physiological, environmental, personal data. Expert Syst. Appl. 2024, 249, 123723. [Google Scholar] [CrossRef]
- Wang, M.; Fan, S.; Li, Y.; Xie, Z.; Chen, H. Missing-modality enabled multi-modal fusion architecture for medical data. J. Biomed. Inform. 2025, 164, 104796. [Google Scholar] [CrossRef] [PubMed]
- A multimodal fusion model with multi-level attention mechanism for depression detection. Biomed. Signal Process. Control 2023, 82, 104561. [CrossRef]
- Pereira, L.M.; Salazar, A.; Vergara, L. A Comparative Analysis of Early and Late Fusion for the Multimodal Two-Class Problem. IEEE Access 2023, 11, 84283–84300. [Google Scholar] [CrossRef]
- Ramteke, R.B.; Gajbhiye, G.O.; Thool, V.R. Discriminating psychological stress levels: Multi-level attentive LSTM approach. Neural Comput. Appl. 2025, 37, 25579–25599. [Google Scholar] [CrossRef]
- Kim, S.H. Mifu-ER: Modality Quality Index-based Incremental Fusion for Emotion Recognition. IEEE Access 2025, 13, 112703–112719. [Google Scholar] [CrossRef]
- Frey, S.; Spacone, G.; Cossettini, A.; Guermandi, M.; Schilk, P.; Benini, L.; Kartsch, V. BioGAP-Ultra: A Modular Edge-AI Platform for Wearable Multimodal Biosignal Acquisition and Processing. arXiv 2025, arXiv:2508.13728. [Google Scholar] [CrossRef]
- Zhao, S.; Hu, Y.; Chen, J.; Wang, W.; Hu, X. Multi-source Signal Fusion with Contrastive AutoEncoder for Emotion Classification. IEEE J. Biomed. Health Inform. 2025, 1–14. [Google Scholar] [CrossRef]
- Farmani, J.; Bargshady, G.; Gkikas, S.; Tsiknakis, M.; Rojas, R.F. A CrossMod-Transformer deep learning framework for multi-modal pain detection through EDA and ECG fusion. Sci. Rep. 2025, 15, 29467. [Google Scholar] [CrossRef] [PubMed]
- Li, A.; Wu, M.; Ouyang, R.; Wang, Y.; Li, F.; Lv, Z. A Multimodal-Driven Fusion Data Augmentation Framework for Emotion Recognition. IEEE Trans. Artif. Intell. 2025, 6, 2083–2097. [Google Scholar] [CrossRef]
- Mansourian, N.; Mohammadi, A.; Ahmad, M.O.; Swamy, M. ECG-EmotionNet: Nested mixture of expert (NMoE) adaptation of ECG-foundation model for driver emotion recognition. arXiv 2025, arXiv:2503.01750. [Google Scholar]
- Gkikas, S.; Kyprakis, I.; Tsiknakis, M. Tiny-BioMoE: A lightweight embedding model for biosignal analysis. In Proceedings of the Companion Proceedings of the 27th International Conference on Multimodal Interaction, Canberra, Australia, 13–17 October 2025; Association for Computing Machinery: New York, NY, USA, 2025; pp. 117–126. [Google Scholar]
- Li, Y.; Li, Y.; He, X.; Fang, J.; Zhou, C.; Liu, C. Learner’s cognitive state recognition based on multimodal physiological signal fusion. Appl. Intell. 2025, 55, 127. [Google Scholar] [CrossRef]
- Choi, H.S. Emotion Recognition Using a Siamese Model and a Late Fusion-Based Multimodal Method in the WESAD Dataset with Hardware Accelerators. Electronics 2025, 14, 723. [Google Scholar] [CrossRef]
- Muke, P.Z.; Kozierkiewicz, A. Machine learning techniques to improve the cognitive workload classification using multimodal sensors’ data. IEEE Access 2025, 13, 173415–173443. [Google Scholar] [CrossRef]
- Guo, Y.; Yang, K.; Wu, Y. A multi-modality attention network for driver fatigue detection based on frontal EEG, EDA and PPG signals. IEEE J. Biomed. Health Inform. 2025, 29, 4009–4022. [Google Scholar] [CrossRef]
- Fang, C.; Sandino, C.; Mahasseni, B.; Minxha, J.; Pouransari, H.; Azemi, E.; Moin, A.; Zippi, E. Promoting cross-modal representations to improve multimodal foundation models for physiological signals. arXiv 2024, arXiv:2410.16424. [Google Scholar] [CrossRef]
- Hou, K.; Zhang, X.; Yang, Y.; Zhao, Q.; Yuan, W.; Zhou, Z.; Zhang, S.; Li, C.; Shen, J.; Hu, B. Emotion recognition from multimodal physiological signals via discriminative correlation fusion with a temporal alignment mechanism. IEEE Trans. Cybern. 2023, 54, 3079–3092. [Google Scholar] [CrossRef]
- Tang, Z.; Qi, J.; Zheng, Y.; Huang, J. A Comprehensive Benchmark for Electrocardiogram Time-Series. arXiv 2025, arXiv:2507.14206. [Google Scholar]
- Roy, K.; Rao, A.C.S. Self-Supervised Learning of Cardiac Dynamics Using Masked Volume Modeling (MVM). TechRxiv 2025. [Google Scholar] [CrossRef]
- Nourbakhsh, A.; Mohammadzade, H. Deep Time Warping for Multiple Time Series Alignment. arXiv 2025, arXiv:2502.16324. [Google Scholar] [CrossRef]
- Kurtek, S.; Wu, W.; Christensen, G.E.; Srivastava, A. Segmentation, alignment and statistical analysis of biosignals with application to disease classification. J. Appl. Stat. 2013, 40, 1270–1288. [Google Scholar] [CrossRef]
- Yang, H.C.; Lee, C.C. An attribute-invariant variational learning for emotion recognition using physiology. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1184–1188. [Google Scholar]
- Zhang, Y.; Cai, H.; Wu, J.; Xie, L.; Xu, M.; Ming, D.; Yan, Y.; Yin, E. EMG-based cross-subject silent speech recognition using conditional domain adversarial network. IEEE Trans. Cogn. Dev. Syst. 2023, 15, 2282–2290. [Google Scholar] [CrossRef]
- Li, W.; Hou, B.; Shao, S.; Huan, W.; Tian, Y. Spatial-temporal constraint learning for cross-subject EEG-based emotion recognition. In Proceedings of the 2023 International Joint Conference on Neural Networks (IJCNN), Gold Coast, Australia, 18–23 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–8. [Google Scholar]
- Viana-Matesanz, M.; Sánchez-Ávila, C. Adaptive normalization and feature extraction for electrodermal activity analysis. Mathematics 2024, 12, 202. [Google Scholar] [CrossRef]
- Hou, M.; Zhang, Z.; Liu, C.; Lu, G. Semantic alignment network for multi-modal emotion recognition. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 5318–5329. [Google Scholar] [CrossRef]
- Pang, L. Contrastive Learning Neural Network with Multimodal Physiological Signal Fusion for Early Detection of Cognitive Impairment. In Proceedings of the 2025 5th International Conference on Artificial Intelligence, Big Data and Algorithms (CAIBDA), Beijing, China, 20–22 June 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1691–1695. [Google Scholar]
- Ramaswamy, M.P.A.; Palaniswamy, S. EOG and PPG fusion for subject independent multimodal emotion recognition: A prototypical networks approach. In Proceedings of the 2024 2nd International Conference on Self Sustainable Artificial Intelligence Systems (ICSSAS), Erode, India, 23–25 October 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1635–1640. [Google Scholar]
- Han, E.G.; Kang, T.K.; Lim, M.T. Physiological signal-based real-time emotion recognition based on exploiting mutual information with physiologically common features. Electronics 2023, 12, 2933. [Google Scholar] [CrossRef]
- Vieluf, S.; Hasija, T.; Kuschel, M.; Reinsberger, C.; Loddenkemper, T. Developing a deep canonical correlation-based technique for seizure prediction. Expert Syst. Appl. 2023, 234, 120986. [Google Scholar] [CrossRef]
- Zhang, T.; El Ali, A.; Wang, C.; Hanjalic, A.; Cesar, P. Weakly-supervised learning for fine-grained emotion recognition using physiological signals. IEEE Trans. Affect. Comput. 2022, 14, 2304–2322. [Google Scholar] [CrossRef]
- Demirel, B.U.; Holz, C. Shifting the Paradigm: A Diffeomorphism Between Time Series Data Manifolds for Achieving Shift-Invariancy in Deep Learning. arXiv 2025, arXiv:2502.19921. [Google Scholar] [CrossRef]
- Ji, J.; Cao, Y.; Ma, Y.; Yan, J. TITD: Enhancing optimized temporal position encoding with time intervals and temporal decay in irregular time series forecasting. Appl. Intell. 2025, 55, 415. [Google Scholar] [CrossRef]
- Zhao, S.; Ye, Z.; Adhin, B.; Vuori, M.; Laukkanen, J.; FinnGen; Fisch, S. Cardiorenal Interorgan Assessment via a Novel Clustering Method Using Dynamic Time Warping on Electrocardiogram: Model Development and Validation Study. JMIR Med. Inform. 2025, 13, e73353. [Google Scholar] [CrossRef]
- Wang, M.; You, C.; Zhang, W.; Xu, Z.; Liang, Q.; Li, Q. Causal ECGNet: Leveraging causal inference for robust ECG classification in cardiac disorders. Front. Physiol. 2025, 16, 1543417. [Google Scholar] [CrossRef] [PubMed]
- Manchanda, R.; Panchal, S.; Sandiri, R.; Sudhamsu, G.; Mehta, A.; Gupta, R.; Bhowmik, A.; Bukate, B.B. Energy-efficient clustering and routing for IoT-enabled healthcare using adaptive fuzzy logic and hybrid optimization. Sci. Rep. 2025, 15, 34619. [Google Scholar] [CrossRef]
- Jiménez-Guarneros, M.; Fuentes-Pineda, G.; Grande-Barreto, J. Multimodal semi-supervised domain adaptation using cross-modal learning and joint distribution alignment for cross-subject emotion recognition. IEEE Trans. Instrum. Meas. 2025, 74, 2518612. [Google Scholar] [CrossRef]
- Ghasemigarjan, R.; Mikaeili, M.; Setarehdan, S.K.; Saboori, A. Enhancing EEG-based sleep staging efficiency with minimal channels through adversarial domain adaptation and active deep learning. J. Neural Eng. 2025, 22, 046043. [Google Scholar] [CrossRef]
- Li, G.; Wu, C.; Liang, Z. Unsupervised Pairwise Learning Optimization Framework for Cross-Corpus EEG-Based Emotion Recognition Based on Prototype Representation. arXiv 2025, arXiv:2508.11663. [Google Scholar]
- Edder, A.; Ben-Bouazza, F.E.; Tafala, I.; Manchadi, O.; Jioudi, B. Self Attention-Driven ECG Denoising: A Transformer-Based Approach for Robust Cardiac Signal Enhancement. Signals 2025, 6, 26. [Google Scholar]
- Yu, J.; Ru, Y.; Lei, B.; Chen, H. GBV-Net: Hierarchical Fusion of Facial Expressions and Physiological Signals for Multimodal Emotion Recognition. Sensors 2025, 25, 6397. [Google Scholar] [CrossRef] [PubMed]
- Fang, X.; Jin, J.; Wang, H.; Liu, C.; Cai, J.; Nie, G.; Li, J.; Li, H.; Hong, S. PPGFlowECG: Latent Rectified Flow with Cross-Modal Encoding for PPG-Guided ECG Generation and Cardiovascular Disease Detection. arXiv 2025, arXiv:2509.19774. [Google Scholar]
- Sethi, S.; Chen, D.; Statchen, T.; Burkhart, M.C.; Bhandari, N.; Ramadan, B.; Beaulieu-Jones, B. ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning. arXiv 2025, arXiv:2504.08713. [Google Scholar]
- Wang, Y.; Hu, S.; Liu, J.; Wang, A.; Zhou, G.; Yang, C. PULSE: A personalized physiological signal analysis framework via unsupervised domain adaptation and self-adaptive learning. Expert Syst. Appl. 2025, 278, 127317. [Google Scholar] [CrossRef]
- Srivastava, S.; Kumar, D.; Jiwari, R.; Seth, S.; Sharma, D. rECGnition_v2.0: Self-Attentive Canonical Fusion of ECG and Patient Data using deep learning for effective Cardiac Diagnostics. arXiv 2025, arXiv:2502.16255. [Google Scholar]
- Zheng, Y. Fusing Cross-Domain Knowledge from Multimodal Data to Solve Problems in the Physical World. Acm Trans. Intell. Syst. Technol. 2025, 17, 1–13. [Google Scholar] [CrossRef]
- Berwal, D.; Vandana, C.; Dewan, S.; Jiji, C.; Baghini, M.S. Motion artifact removal in ambulatory ECG signal for heart rate variability analysis. IEEE Sens. J. 2019, 19, 12432–12442. [Google Scholar] [CrossRef]
- Bari, D.; Aldosky, H.; Tronstad, C.; Kalvøy, H.; Martinsen, Ø. Electrodermal responses to discrete stimuli measured by skin conductance, skin potential, and skin susceptance. Skin Res. Technol. 2018, 24, 108–116. [Google Scholar] [CrossRef]
- Fan, Y.; Liang, J.; Cao, X.; Pang, L.; Zhang, J. Effects of noise exposure and mental workload on physiological responses during task execution. Int. J. Environ. Res. Public Health 2022, 19, 12434. [Google Scholar] [CrossRef]
- Scarciglia, A.; Catrambone, V.; Bonanno, C.; Valenza, G. Physiological noise: Definition, estimation, and characterization in complex biomedical signals. IEEE Trans. Biomed. Eng. 2023, 71, 45–55. [Google Scholar] [CrossRef] [PubMed]
- Venton, J.; Harris, P.M.; Sundar, A.; Smith, N.A.; Aston, P.J. Robustness of convolutional neural networks to physiological electrocardiogram noise. Philos. Trans. R. Soc. A 2021, 379, 20200262. [Google Scholar] [CrossRef]
- de Jong, I.P.; Sburlea, A.I.; Valdenegro-Toro, M. Uncertainty Quantification in Machine Learning for Biosignal Applications – A Review. arXiv 2023, arXiv:2312.09454. [Google Scholar]
- Frachi, Y.; Takahashi, T.; Wang, F.; Barthet, M. Design of emotion-driven game interaction using biosignals. In Proceedings of the International Conference on Human-Computer Interaction, Virtual, 26 June–1 July 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 160–179. [Google Scholar]
- Parreira, J.D.; Chalumuri, Y.R.; Mousavi, A.S.; Modak, M.; Zhou, Y.; Sanchez-Perez, J.A.; Gazi, A.H.; Harrison, A.B.; Inan, O.T.; Hahn, J.O. A proof-of-concept investigation of multi-modal physiological signal responses to acute mental stress. Biomed. Signal Process. Control 2023, 85, 105001. [Google Scholar] [CrossRef]
- Mühl, C.; Jeunet, C.; Lotte, F. EEG-based workload estimation across affective contexts. Front. Neurosci. 2014, 8, 114. [Google Scholar] [CrossRef]
- Niu, L.; Chen, C.; Liu, H.; Zhou, S.; Shu, M. A deep-learning approach to ECG classification based on adversarial domain adaptation. Healthcare 2020, 8, 437. [Google Scholar] [CrossRef]
- O’Shea, R.; Katti, P.; Rajendran, B. Baseline drift tolerant signal encoding for ECG classification with deep learning. In Proceedings of the 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, USA, 15–19 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–5. [Google Scholar]
- Gholamiangonabadi, D.; Kiselov, N.; Grolinger, K. Deep Neural Networks for Human Activity Recognition With Wearable Sensors: Leave-One-Subject-Out Cross-Validation for Model Selection. IEEE Access 2020, 8, 133982–133994. [Google Scholar] [CrossRef]
- Han, J.; Wei, X.; Faisal, A.A. EEG decoding for datasets with heterogenous electrode configurations using transfer learning graph neural networks. J. Neural Eng. 2023, 20, 066027. [Google Scholar] [CrossRef] [PubMed]
- Dissanayake, T.; Fernando, T.; Denman, S.; Ghaemmaghami, H.; Sridharan, S.; Fookes, C. Domain Generalization in Biosignal Classification. IEEE Trans. Biomed. Eng. 2021, 68, 1978–1989. [Google Scholar] [CrossRef]
- Pup, F.D.; Atzori, M. Applications of Self-Supervised Learning to Biomedical Signals: A Survey. IEEE Access 2023, 11, 144180–144203. [Google Scholar] [CrossRef]
- Liu, Y.; Du, S.; Han, H.; Chen, X.; Zeng, W.; Tian, Z. Adaptive Distraction Recognition via Soft Prototype Learning and Probabilistic Label Alignment. IEEE Trans. Intell. Transp. Syst. 2024, 25, 18701–18713. [Google Scholar] [CrossRef]
- Zhang, T.; Ali, A.E.; Hanjalic, A.; Cesar, P. Few-Shot Learning for Fine-Grained Emotion Recognition Using Physiological Signals. IEEE Trans. Multimed. 2023, 25, 3773–3787. [Google Scholar] [CrossRef]
- Jeong, H.; Son, J.; Kim, H.; Kang, K. Defensive Adversarial Training for Enhancing Robustness of ECG based User Identification. In Proceedings of the 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Las Vegas, NV, USA, 6–8 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 3362–3369. [Google Scholar] [CrossRef]
- Luganga, A. Emofusion: Toward Emotion-Driven Adaptive Computational Design Workflows. In Proceedings of the 2025 ACM International Conference on Interactive Media Experiences, Niterói, RJ, Brazil, 3–6 June 2025; Association for Computing Machinery: New York, NY, USA, 2025; pp. 473–478. [Google Scholar] [CrossRef]
- Rim, B.; Sung, N.J.; Min, S.; Hong, M. Deep learning in physiological signal data: A survey. Sensors 2020, 20, 969. [Google Scholar] [CrossRef] [PubMed]
- Nayak, S.K.; Pradhan, B.; Mohanty, B.; Sivaraman, J.; Ray, S.S.; Wawrzyniak, J.; Jarzębski, M.; Pal, K. A review of methods and applications for a heart rate variability analysis. Algorithms 2023, 16, 433. [Google Scholar] [CrossRef]
- Orguc, S.; Khurana, H.S.; Stankovic, K.M.; Leel, H.; Chandrakasan, A. EMG-based Real Time Facial Gesture Recognition for Stress Monitoring. In Proceedings of the 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, USA, 18–21 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 2651–2654. [Google Scholar] [CrossRef]
- Torres-Valencia, C.; Álvarez López, M.; Orozco-Gutiérrez, Á. SVM-based feature selection methods for emotion recognition from multimodal data. J. Multimodal User Interfaces 2017, 11, 9–23. [Google Scholar] [CrossRef]
- Hao, T.; Xu, K.; Zheng, X.; Li, J.; Chen, S.; Nie, W. Towards mental load assessment for high-risk works driven by psychophysiological data: Combining a 1D-CNN model with random forest feature selection. Biomed. Signal Process. Control 2024, 96, 106615. [Google Scholar] [CrossRef]
- Gunawan, M.D.; Setiawan, R.; Hikmah, N.F. Estimation of Sleep Quality Based on HRV, EMG, and EEG Parameters with K-Nearest Neighbor Method. In Proceedings of the 2024 International Seminar on Intelligent Technology and Its Applications (ISITIA), Kuala Lumpur, Malaysia, 20–22 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 651–656. [Google Scholar] [CrossRef]
- Patil, M.S.; Patil, H.D. Logistic regression based model for pain intensity level detection from biomedical signal. Int. Res. J. Multidiscip. Scope 2024, 5, 652–662. [Google Scholar] [CrossRef]
- Dutsinma, L.I.F.; Temdee, P. VARK learning style classification using decision tree with physiological signals. Wirel. Pers. Commun. 2020, 115, 2875–2896. [Google Scholar] [CrossRef]
- Rivas, J.J.; Orihuela-Espina, F.; Sucar, L.E. Recognition of Affective States in Virtual Rehabilitation using Late Fusion with Semi-Naive Bayesian Classifier. In Proceedings of the 13th EAI International Conference on Pervasive Computing Technologies for Healthcare, Trento, Italy, 20–23 May 2019; PervasiveHealth’19; Association for Computing Machinery: New York, NY, USA, 2019; pp. 308–313. [Google Scholar] [CrossRef]
- Kyamakya, K.; Al-Machot, F.; Haj Mosa, A.; Bouchachia, H.; Chedjou, J.C.; Bagula, A. Emotion and stress recognition related sensors and machine learning technologies. Sensors 2021, 21, 2273. [Google Scholar] [CrossRef] [PubMed]
- Naidu, G.; Zuva, T.; Sibanda, E.M. A review of evaluation metrics in machine learning algorithms. In Proceedings of the Computer Science On-Line Conference, Virtual, 3–5 April 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 15–25. [Google Scholar]
- Lin, Z.; Wang, Y.; Zhou, Y.; Du, F.; Yang, Y. Ste-mamba: Automated multimodal depression detection through emotional analysis and spatio-temporal information ensemble. In Proceedings of the ICASSP 2025—2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–5. [Google Scholar]
- Tian, C.; Ma, Y.; Cammon, J.; Fang, F.; Zhang, Y.; Meng, M. Dual-encoder VAE-GAN with spatiotemporal features for emotional EEG data augmentation. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 2018–2027. [Google Scholar] [CrossRef] [PubMed]
- Thant, A.M.; Panitanarak, T. Emotion Recognition Through Advanced Signal Fusion and Kolmogorov-Arnold Networks. IEEE Access 2025, 13, 93259–93270. [Google Scholar] [CrossRef]
- Jain, P.; Kar, P. Non-convex optimization for machine learning. Found. Trends® Mach. Learn. 2017, 10, 142–363. [Google Scholar] [CrossRef]
- Ramachandram, D.; Taylor, G.W. Deep Multimodal Learning: A Survey on Recent Advances and Trends. IEEE Signal Process. Mag. 2017, 34, 96–108. [Google Scholar] [CrossRef]
- Fan, Y.; Xu, W.; Wang, H.; Wang, J.; Guo, S. Pmr: Prototypical modal rebalance for multimodal learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 20029–20038. [Google Scholar]
- Hatipoglu Yilmaz, B.; Kose, C.; Yilmaz, C.M. A novel multimodal EEG-image fusion approach for emotion recognition: Introducing a multimodal KMED dataset. Neural Comput. Appl. 2025, 37, 5187–5202. [Google Scholar] [CrossRef]
- Li, C.; Xie, L.; Wang, X.; Pan, H.; Wang, Z. A disentanglement mamba network with a temporally slack reconstruction mechanism for multimodal continuous emotion recognition. Multimed. Syst. 2025, 31, 169. [Google Scholar] [CrossRef]
- Wang, X.; Li, C.Z.; Sun, Z.; Xu, Y. Design and Analysis of a Closed-Loop Emotion Regulation System Based on Multimodal Affective Computing and Emotional Markov Chain. IEEE Trans. Syst. Man, Cybern. Syst. 2025, 55, 2426–2437. [Google Scholar] [CrossRef]
- Fang, L.; Chai, B.; Xu, Y.; Wang, S.J. KANFeel: A Novel Kolmogorov-Arnold Network-Based Multimodal Emotion Recognition Framework. In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 26 April–1 May 2025; CHI EA ’25; Association for Computing Machinery: New York, NY, USA, 2025. [Google Scholar] [CrossRef]
- Moorthy, S.; Moon, Y.K. Hybrid Multi-Attention Network for Audio–Visual Emotion Recognition Through Multimodal Feature Fusion. Mathematics 2025, 13, 1100. [Google Scholar] [CrossRef]
- Erdem Güler, S.; Patlar Akbulut, F. Multimodal Emotion Recognition: Emotion Classification Through the Integration of EEG and Facial Expressions. IEEE Access 2025, 13, 24587–24603. [Google Scholar] [CrossRef]
- Pavan, K.; Singh, A.; Pawar, D.S.; Ganapathy, N. Multimodal Wearable-Based Automated Driver Inattention State Assessment Using Multidevices and Novel Cross-Modal Attention Framework. IEEE Sens. Lett. 2025, 9, 1–4. [Google Scholar] [CrossRef]
- Can, Y.S.; Benouis, M.; Mahesh, B.; André, E. Application of Multimodal Self-Supervised Architectures for Daily Life Affect Recognition. IEEE Trans. Affect. Comput. 2025, 16, 2454–2465. [Google Scholar] [CrossRef]
- Thaduri, V.R.; R, R.; Rafi, M.; Fernandez, F.M.H.; Lakumarapu, S. Integrating Graph Neural Networks and Temporal Graph Convolutions for Enhanced Multimodal Stress Detection in Physiological and Behavioral Data Streams. In Proceedings of the 2025 International Conference on Sensors and Related Networks (SENNET) Special Focus on Digital Healthcare (64220), Vellore, India, 24–27 July 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–5. [Google Scholar] [CrossRef]
- Niu, Y.; Chen, X.; Fan, J.; Liu, C.; Fang, M.; Liu, Z.; Meng, X.; Liu, Y.; Lu, L.; Fan, H. Explainable machine learning model based on EEG, ECG, and clinical features for predicting neurological outcomes in cardiac arrest patient. Sci. Rep. 2025, 15, 11498. [Google Scholar] [CrossRef]
- Khuntia, S.; Amjad, A.; Tarekegen, R.B.; Tai, L.C. Deep Learning-Based Emotion Recognition Using Fusion of Multimodal Affective Data From Consumer-Grade Wearable ECG and Speech Sensors. In Proceedings of the 2025 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 11–14 January 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–6. [Google Scholar]
- Feng, G.; Manimurugan, S.; Yi, B.; Feng, Y. Towards Precision Cardiac Healthcare: Deep Learning and IoT Integration for Real-Time Monitoring and Personalized Diagnosis. IEEE Internet Things J. 2025, early access. [Google Scholar] [CrossRef]
- Wen, Y.; Chen, W. A Multi-Modal Emotion Recognition Method Considering the Contribution and Redundancy of Channels and the Correlation and Heterogeneity of Modalities. Measurement 2025, 258, 119247. [Google Scholar] [CrossRef]
- Tryon, J.; Colli Alfaro, J.G.; Trejos, A.L. Effects of Image Normalization on CNN-Based EEG–EMG Fusion. IEEE Sens. J. 2025, 25, 20894–20906. [Google Scholar] [CrossRef]
- Ringeval, F.; Eyben, F.; Kroupi, E.; Yuce, A.; Thiran, J.P.; Ebrahimi, T.; Lalanne, D.; Schuller, B. Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data. Pattern Recognit. Lett. 2015, 66, 22–30. [Google Scholar] [CrossRef]
- Zang, Z.; Yu, X.; Fu, B.; Liu, Y.; Ge, S.S. Contrastive reinforced transfer learning for EEG-based emotion recognition with consideration of individual differences. Biomed. Signal Process. Control 2025, 106, 107622. [Google Scholar] [CrossRef]
- Cañellas, M.L.; Casado, C.Á.; Nguyen, L.; López, M.B. A self-supervised multimodal framework for 1D physiological data fusion in remote health monitoring. Inf. Fusion 2025, 124, 103397. [Google Scholar] [CrossRef]
- Houssein, E.H.; Mohsen, S.; Emam, M.M.; Abdel Samee, N.; Alkanhel, R.I.; Younis, E.M. Leveraging explainable artificial intelligence for emotional label prediction through health sensor monitoring. Clust. Comput. 2025, 28, 86. [Google Scholar] [CrossRef]
- Gutiérrez-Martín, L.; López-Ongil, C.; Miranda-Calero, J.A. DeepBindi: An End-to-End Fear Detection System Optimized for Extreme-Edge Deployment. IEEE J. Biomed. Health Inform. 2025, 30, 688–699. [Google Scholar] [CrossRef]







| Dimension | ECG | EEG | EMG | EDA |
|---|---|---|---|---|
| Key Feature Indicators | HRV (SDNN, LF/HF), QTc, PQRST morphology | Band power (delta, theta, alpha, beta), frontal asymmetry, coherence, entropy | RMS, mean frequency, facial/limb MUAP waveforms | SCL (baseline), SCR (amplitude, rise time, recovery time) |
| Stress-Related Changes | HRV reduction, LF/HF increase, QTc fluctuation, PQRST waveform changes | Increased frontal theta and beta activity, reduced alpha power, altered connectivity | Muscle tension increase, higher amplitude and frequency, irregular contractions | SCR frequency and amplitude increase, SCL elevation |
| Advantages | Clinically mature, well established medical basis | High temporal resolution, direct reflection of central nervous system activity | Directly reflects muscle tension and facial expression changes | Highly sensitive to sympathetic activity, effective for short-term stress detection |
| Limitations | Strongly affected by inter-individual variability and motion artifacts [33]; limited generalizability [36] | Sensitive to noise and artifacts; requires careful preprocessing and calibration; limited portability | Sensitive to electrode placement; prone to noise and artifacts [34]; non-specific muscle tension changes | Strongly influenced by temperature [37], humidity, and skin properties [38]; nonlinear and time-varying [39] |
| Applicable Scenarios | Clinical stress monitoring, cardiovascular stress studies, wearable devices for health | Laboratory-based stress experiments, cognitive workload assessment, affective computing | Facial EMG in emotion recognition, workplace stress detection, muscle fatigue studies | Cognitive load tasks, affective computing, psychophysiological stress research |
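To make the ECG column concrete, the two HRV markers listed above (SDNN and the LF/HF ratio) can be estimated from an RR-interval series in a few lines. The sketch below is illustrative only, not a validated HRV pipeline: the function name `hrv_features`, the 4 Hz resampling rate, and the plain periodogram are our assumptions (standard toolchains typically use Welch's method and artifact-corrected NN intervals).

```python
import numpy as np

def hrv_features(rr_ms, fs=4.0):
    """Illustrative SDNN and LF/HF estimate from RR intervals in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    sdnn = rr.std(ddof=1)  # SDNN: standard deviation of normal-to-normal intervals

    # LF/HF needs a spectrum, so resample the beat-indexed RR series
    # onto a uniform time grid before taking a periodogram.
    t = np.cumsum(rr) / 1000.0                 # beat times in seconds
    t_uni = np.arange(t[0], t[-1], 1.0 / fs)   # uniform grid at fs Hz
    x = np.interp(t_uni, t, rr)
    x = x - x.mean()

    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    pxx = np.abs(np.fft.rfft(x)) ** 2 / (fs * len(x))  # one-sided periodogram
    df = f[1] - f[0]
    lf = pxx[(f >= 0.04) & (f < 0.15)].sum() * df      # low-frequency power
    hf = pxx[(f >= 0.15) & (f < 0.40)].sum() * df      # high-frequency power
    return sdnn, lf / hf
```

Under stress, the table predicts SDNN falling and LF/HF rising; in practice both are computed over windows of at least a few minutes.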
| Type | Measurement Dimensions | No. of Items | Scoring | Evaluation Objective | Application Scenarios | Advantages |
|---|---|---|---|---|---|---|
| PSS [41] | Perceived stress | 10/14 (PSS-10/PSS-14) | 5-point Likert (0–4) | Assess subjective stress perception | Mental health research; stress management | Easy to implement; suitable for general population |
| STAI [42] | State and Trait Anxiety (SA and TA) | 20 each (40 total) | 4-point Likert (1–4) | Differentiate SA and TA | Clinical anxiety assessment; screening | Separately assesses SA and TA |
| PANAS [43] | Positive and negative affect (PA and NA) | 10 each (20 total) | 5-point Likert (1–5) | Assess affective states | Emotion research; therapy evaluation | Distinguishes affect valence |
| SRQ [44] | Physiological, psychological, and behavioral responses to stress | 24 | 5-point Likert (1–5) | Assess comprehensive stress responses | Stress research; psychophysiological studies | Suited for multimodal physiological signal studies |
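As a concrete example of the scoring column: the PSS-10 total is the sum of the ten 0–4 responses after reverse-scoring the four positively worded items (items 4, 5, 7, and 8). A minimal sketch, with `pss10_score` being our own illustrative helper name:

```python
def pss10_score(responses):
    """Sum PSS-10 responses (each 0-4), reverse-scoring the positive items.

    Items 4, 5, 7, and 8 (1-indexed) are positively worded, so a response r
    contributes 4 - r; all other items contribute r directly.
    """
    REVERSED = {3, 4, 6, 7}  # 0-indexed positions of items 4, 5, 7, 8
    if len(responses) != 10 or any(not 0 <= r <= 4 for r in responses):
        raise ValueError("expected 10 responses, each in 0..4")
    return sum(4 - r if i in REVERSED else r
               for i, r in enumerate(responses))
```

Totals therefore range from 0 to 40, with higher scores indicating greater perceived stress.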
| Database | WESAD [50] | AMIGOS [51] | MAHNOB-HCI [52] | DREAMER [53] | PhysioNet (Stress Rec.) [54] |
|---|---|---|---|---|---|
| Physiological Modalities | ECG, EDA, Respiration, Motion | ECG, EDA, EEG, Facial expressions | ECG, EEG, EDA, Respiration | ECG, EEG | ECG, EDA, Respiration |
| No. of Subjects | 15 | 40 | 27 | 23 | 40 |
| Stress Induction Method | TSST protocol (public speaking and mental arithmetic) | Video-induced emotions | Video-induced emotions | Video-induced emotions | Driving simulation |
| Primary Application Domain | Wearable stress or affect detection | Affective computing; multimodal analysis | Human–computer interaction | Emotion recognition with low-cost portable devices | Driver stress evaluation |
| Annotation Type | Task phases; self-reports | Self-reports (arousal/valence) | Self-reports; event markers | Self-reports (arousal/valence) | Driving states; coarse labels |
| Recording Duration/Protocol | Baseline, task, and recovery blocks (minutes) | Multiple short video clips (minutes/clip) | Video clips with synchronization | Video sessions (multi-trial) | Long continuous driving sessions |
| Acquisition Device/Environment | Wearable sensors (lab/semi-free) | Lab; biosensors + video | Lab; medical-grade devices | Lab; biosensors | Driving simulator; semi-realistic |
| Synchronization Alignment | Provided; multi-sensor timestamps | Requires alignment across modalities | Provided across channels | Provided for physiological signals | Provided; per-segment labels |
| Key Limitations | Limited sample size | Modality alignment required | Lack of high-stress tasks | Restricted modality coverage | High inter-individual variability |
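Given the small subject counts above (15–40 per database), results on these corpora are commonly reported with leave-one-subject-out (LOSO) cross-validation, so that no test subject's data leaks into training. A minimal sketch, assuming NumPy; `loso_splits` is our illustrative helper, equivalent in spirit to scikit-learn's `LeaveOneGroupOut`:

```python
import numpy as np

def loso_splits(subject_ids):
    """Yield (train_idx, test_idx) index arrays, one fold per held-out subject."""
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        test = np.flatnonzero(subject_ids == s)   # all samples of subject s
        train = np.flatnonzero(subject_ids != s)  # everything else
        yield train, test
```

With WESAD's 15 subjects this yields 15 folds; averaging across folds gives a subject-independent performance estimate.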
| Task Type | Specific Task | Stress Induction Mechanism | Main Measured Signals | Advantages | Limitations |
|---|---|---|---|---|---|
| Cognitive | Stroop [71] | Color–word conflict and cognitive load | ECG, EDA, EEG | Simple and highly standardized | Limited stress intensity |
| Cognitive | MIST [72] | Time pressure and negative feedback | ECG, EDA, EEG | Strong induction and widely used | Artificial setting and low ecological validity |
| Emotional Induction | Video stimuli | Emotionally evocative film clips | ECG, EDA, EEG, facial expression | Easy to implement and well controlled | Large interindividual variation |
| Emotional Induction | Audio stimuli | Music and affective sounds | ECG, EDA | Simple and noninvasive | Narrow stimulus range |
| Emotional Induction | IAPS image set [73] | Standardized affective pictures | ECG, EDA, EEG | High standardization and replicable | Monotonous stimuli with weak, short-lived effects |
| Social Stress | TSST [74] | Public speaking and interview causing social evaluation pressure | ECG, EDA, cortisol, EMG | High ecological validity and strong stress | Hard to standardize and ethical concerns |
| Real-World Simulation | Driving simulation [75] | Traffic complexity and performance pressure | ECG, EDA, respiration, motion | High ecological validity | High cost and strong individual variability |
| Real-World Simulation | Immersive VR tasks [76] | Virtual environments that elicit stress | ECG, EDA, EEG, motion | High immersion and realism | Technology dependence and expense |
| Name | Sample Size | Task Type | Specific Tasks | Modalities | Annotation Scheme | Advantages | Limitations |
|---|---|---|---|---|---|---|---|
| Stress-ID [79] | 65 participants (18F and 47M, age 21–55) | Cognitive, Emotional, Social, Relaxation Control | Breathing baseline; emotional videos; seven interactive stress tasks; public speaking; relaxation | ECG, EDA, Respiration, Video, Audio | Self-assessments (0 to 10 stress); SAM; binary and three-class labels | Comprehensive multimodal coverage; large dataset; detailed annotations; baseline models provided and publicly available | Controlled lab setup; possible sensor-induced stress; subjective bias; gender imbalance; partial data loss |
| Muse [83] | 28 college students (during and after final exams) | Emotional induction and monologue elicitation | Baseline + monologues + emotional videos; stress level assessed via PSS scale | Video, Thermal camera, Audio, Heart rate, Skin conductance and temperature | Self-reports (PSS, SAM); external annotations via AMT | Naturalistic stress context; multimodal coverage; synchronized baseline and task data | Small sample and student-only cohort; contextual rather than controlled stress elicitation |
| EmpathicSchool [84] | 20 students (aged 21 to 35) | Cognitive, Emotional, Social, Relaxation | Magazine reading, presentation prep, IQ and Stroop, music, funny video, breathing, rest sessions | Facial video, EDA, HR, BVP, IBI, Skin temperature, ACC | NASA-TLX scores; video-based expression labels | Comprehensive wearable and facial coverage; multiple task types; public code | Lab-only environment; coarse temporal resolution; no chronic stress analysis |
| VREED [85] | 34 participants (final 26 used) | Real-world simulation | 360° VR video environment (12 clips) | Eye tracking, ECG, GSR, Self-reports | SAM and VAS scales with circumplex valence–arousal model | High immersion and ecological validity; multimodal integration; publicly available features | Small sample; no EEG data; raw video stimuli unavailable |
| ADABase [86] | 51 participants | Cognitive + Driving Simulation | n-back (1–3 back, single and dual) and semi-autonomous driving tasks with secondary infotainment load | ECG, EDA, EMG, PPG, Respiration, Skin Temp, Eye tracking, Facial video, Cortisol | Baseline and load levels (low, medium, high); subjective questionnaires (NASA-TLX, PSS, PANAS) | Broad multimodal coverage; realistic simulation; continuous load labels with synchronization | Missing EEG; limited sample; privacy restrictions in video data; artifact-prone sessions |
| WEMAC [80] | 100 women (20–77 years) | Emotional induction (VR fear vs neutral) | Immersive VR videos eliciting fear and neutral states | BVP, GSR, Skin Temp, Resp, EMG, Motion, Speech features | Discrete (12 emotions) and dimensional (VAD) ratings; speech annotations | High ecological validity; large female cohort; VR-based emotion elicitation | Order effects; limited stimuli range; gender-specific sample |
| Multi-PENG [81] | 39 participants (30M and 9F, mean age 24.3) | Video game tasks with graded difficulty | Sports and fighting games (FIFA’23, Street Fighter V) with round-level surveys and pauses | EEG, Eye tracking, Heart rate, Controller inputs, Facial video, Gameplay footage, Surveys | Self-reports; third-party annotations subset | Rich multimodal data; precise temporal alignment; public availability on Kaggle | Limited games; motion artifacts; partial missing modalities; small sample |
| ForDigitStress [82] | 40 participants (57.5% F, 40% M, 2.5% diverse; mean age 22.7 ± 3.2) | Social stress task (digital mock interview) | Remote job interviews with 14 stages (self-intro, motivation, logic and math questions, etc.) | PPG, EDA, Cortisol, Facial AUs, Eye tracking, Body skeleton, Speech, HD video, Audio | Frame-by-frame annotations (2 psychologists + self-reports + cortisol validation) | Ecological interview scenario; comprehensive modalities; continuous labels and baseline features | Limited sample; missing eye tracking; EDA delay vs labels; single scenario context |
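Datasets like those above combine modalities sampled at very different rates (ECG at hundreds of Hz, EDA at only a few Hz), so segment-level labels require cutting both streams into windows that cover the same wall-clock interval. A minimal sketch of that synchronization step, assuming hypothetical sampling rates and a fixed 10 s non-overlapping window (not any one dataset's official protocol):

```python
import numpy as np

def aligned_windows(ecg, eda, fs_ecg=256, fs_eda=4, win_s=10):
    """Cut both streams into time-aligned, non-overlapping windows.

    ecg, eda : 1-D arrays sampled at fs_ecg / fs_eda Hz.
    Returns a list of (ecg_window, eda_window) pairs, each pair
    covering the same win_s-second wall-clock interval.
    """
    n_win = min(len(ecg) // (fs_ecg * win_s),
                len(eda) // (fs_eda * win_s))
    pairs = []
    for k in range(n_win):
        e = ecg[k * fs_ecg * win_s:(k + 1) * fs_ecg * win_s]
        g = eda[k * fs_eda * win_s:(k + 1) * fs_eda * win_s]
        pairs.append((e, g))
    return pairs

# 60 s of synthetic signal at each rate -> six 10 s window pairs
pairs = aligned_windows(np.zeros(256 * 60), np.zeros(4 * 60))
```

Real recordings additionally need the device-latency and physiological-lag corrections discussed in Section 5; this sketch only handles the sampling-rate mismatch.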
| Aspect | Objective | Representative Algorithms | Theoretical Assumption | Advantages | Limitations | Scenarios |
|---|---|---|---|---|---|---|
| Temporal Alignment | To correct variations in sampling frequency and physiological response latency, enabling temporal correspondence across modalities. | Learnable time-shift modules [154]; temporal position encoding [155]; dynamic time warping (DTW) and differentiable variants [156]; causal temporal convolution [157]; delay-aware attention mechanisms [158]. | Modality-specific discrepancies can be modeled through learnable temporal shifts and local temporal scaling. | Directly accounts for physiological latency and asynchronous sampling, improving segment-level alignment and temporal consistency. | Excessive alignment may introduce artificial temporal distortions; global alignment is difficult under causal or real-time constraints. | Suitable when modalities exhibit distinct temporal dynamics (e.g., near-instant ECG response, delayed EDA activation, rapid EMG fluctuations). |
| Distribution Alignment | To reduce inter-modality divergence in feature distributions and mitigate domain shift caused by device, session, or subject variability. | Maximum Mean Discrepancy (MMD) [159]; Domain-Adversarial Neural Networks (DANN) [160]; Correlation Alignment (CORAL) [161]; whitening transformation [162]; layer-wise normalization recalibration [163]. | Modalities share alignable statistical moments that can be matched through explicit distributional constraints. | Mitigates domain shift without requiring strong supervision, enhancing robustness to inter-subject or inter-device variability. | Statistical alignment may overlook semantic structure; adversarial optimization can be unstable. | Preferred in scenarios with device heterogeneity, batch effects, or cross-session variability. |
| Semantic Alignment | To map heterogeneous modalities into a unified semantic space in which representations of the same emotional state converge. | Contrastive learning frameworks (InfoNCE, SupCon) [164]; triplet and prototypical networks [165]; mutual information maximization [166]; Deep Canonical Correlation Analysis (DCCA) [167]. | Different modalities approximate a shared semantic manifold and can be aligned through discriminative or correlation-based objectives. | Enhances discriminability and modality-invariant representation, facilitating external generalization and cross-modal transfer. | Requires reliable labels or pseudo-labels; risk of semantic collapse or dominance of a single modality. | Most effective when semantic consistency across modalities is critical for modality-agnostic inference or transfer learning. |
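Of the temporal-alignment methods in the table, classic dynamic time warping is the easiest to make concrete. The following NumPy sketch computes the DTW cost between two 1-D sequences (the plain, non-differentiable form, not the differentiable variants cited in [156]):

```python
import numpy as np

def dtw_cost(x, y):
    """Classic dynamic time warping cost between 1-D sequences.

    D[i, j] holds the minimal cumulative cost of aligning x[:i]
    with y[:j]; each cell extends the cheapest of the three
    predecessor alignments (match, insertion, deletion).
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j],
                                 D[i, j - 1],
                                 D[i - 1, j - 1])
    return D[n, m]

# A one-step-delayed copy of a ramp aligns perfectly under warping,
# which is exactly the delayed-EDA-vs-ECG situation described above.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 2.0, 3.0])  # same ramp, one-step lag
```

Here `dtw_cost(x, y)` is 0 despite the lag, whereas a sample-by-sample distance would penalize every shifted point — illustrating why warping-based alignment suits modalities with different response latencies.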
| Author | Datasets | Signals | Extracted Features | Feature Selection | Classifier | Accuracy | F1 Score | Limitations |
|---|---|---|---|---|---|---|---|---|
| Torres-Valencia et al. [191] | DEAP, MAHNOB-HCI | EEG (main), GSR, HR, Resp., Temp., EMG, EOG | EEG spectral features, HRV indices, statistical descriptors | Not specified | SVM | 75.17% | 79.25% | Limited to SVM; binary classification only; shallow fusion; no temporal modeling; no real-time validation |
| Hao et al. [192] | Self-collected | EEG, PPG, EOG | Time–frequency and statistical features | Random Forest Selection (RFS) | 1D-CNN + RFS | 90.67% | 91.47% | Small sample size; all-male cohort; limited diversity of feature selection methods |
| Gunawan et al. [193] | Self-built | ECG, EMG, EEG | Statistical and time-domain features | – | KNN | 73.33% | – | Small sample size; unbalanced data; no F1/recall reported; no model comparison |
| Patil et al. [194] | BioVid Heat Pain | ECG, EDA, EMG | HRV features, EDA statistical features, EMG amplitude descriptors | Not specified | Logistic Regression | 83.20% | – | F1 and recall not reported; sensitive to feature dimensionality; limited nonlinear modeling ability |
| Dutsinma et al. [195] | Self-built | Heart Rate, Blood Pressure | Heart rate and blood pressure statistics | – | Decision Tree | 95% | – | Small sample size; limited modalities; restricted applicability |
| Abadi et al. [196] | Self-built | Finger pressure, hand motion, facial expression | Pressure and motion statistical descriptors | Late fusion-based aggregation | Semi-Naive Bayesian (Late Fusion) | 93% | 93% | Facial data loss; limited physiological integration; robustness to sensor failure needs improvement |
(a)
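The classical pipelines in (a) share a common shape: hand-crafted features from each signal are normalized, concatenated (early fusion), and fed to a shallow classifier. A minimal sketch of that pattern, with synthetic feature values and a nearest-centroid classifier standing in for the SVM/KNN/tree choices above (hypothetical dimensions, not any cited author's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic hand-crafted features: 3 HRV indices + 2 EDA descriptors,
# 20 "baseline" samples followed by 20 shifted "stress" samples.
hrv = rng.normal(0, 1, (40, 3)) + np.repeat([[0], [2]], 20, axis=0)
eda = rng.normal(0, 1, (40, 2)) + np.repeat([[0], [2]], 20, axis=0)
y = np.repeat([0, 1], 20)            # 0 = baseline, 1 = stress

def zscore(a):
    """Per-feature standardization, applied per modality before fusion."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

# Early fusion: z-score each modality, then concatenate feature vectors
X = np.hstack([zscore(hrv), zscore(eda)])

# Nearest-centroid classifier on the fused vector
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
train_acc = (pred == y).mean()
```

The per-modality z-scoring step is what keeps one modality's scale from dominating the concatenated vector — the "modality imbalance" limitation listed for early fusion in Section 5.1.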
| Author | Datasets | Signals | Classifier | Accuracy | F1 Score | Limitations |
|---|---|---|---|---|---|---|
| Bahar et al. [205] | KMED, DEAP | EEG + facial image | AAT + SIFT + LBP + FLF + SVM | KMED: 89.95%, DEAP: 92.44% | – | Requires signal-to-image conversion; binary classification only; sensitive to data synchronization and frame selection |
| Chouinard et al. [206] | RECOLA, LM-TSST | Facial video, ECG, EDA | Decoupled Mamba Network + Time Relaxation Reconstruction Mechanism + Transformer | Concordance correlation coefficient only: RECOLA: 0.3921, LM-TSST: 0.3774 | – | Sensitive to modality differences; limited handling of modality imbalance; generalization depends on training distribution; lacks multi-scenario validation |
| Xingchao Wang et al. [207] | DEAP, Self-built, Million Song Dataset | EEG, ECG, facial video | Multimodal LSTM + Emotional Markov Chain | Valence: 86.75%, Arousal: 83.24% | – | Limited granularity of emotion modeling; high computational complexity; large inter-individual differences requiring personalized adjustment; real-time performance constrained by hardware |
| Le Fang et al. [208] | Emo-MG, IEMOCAP, EMOTIC | EEG + micro-gesture, voice + video, image | Multimodal model based on Kolmogorov–Arnold Networks (KAN); Transformer variant with KAN attention mechanism | Emo-MG: 83.54%, IEMOCAP: 72.16%, EMOTIC (Valence dim.): 73.41% | – | Slightly slower inference; IEMOCAP results still below some SOTA models (such as CORRECT); the Transformer variant is computationally heavier |
| Sathishkumar Moorthy et al. [209] | AffWild2, AFEW-VA, IEMOCAP | Speech audio, spectrogram, MFCC, facial video frames | Hybrid Multi-Attention Network with CSSA + HASPCM modules | IEMOCAP: 75.39% | – | Complex model structure; high resource consumption; long training time; dependent on annotation quality |
| Erdem et al. [210] | DEAP | EEG + facial expression video | GRU (main), LSTM, Transformer | Single-modality GRU: 91.8%, Multimodal: 97.8% | GRU: 97% | Transformer model is complex and data-hungry; facial expression data depends on recording quality, with some synthesized to fill gaps |
| Ao Li et al. [131] | DEAP, WESAD | EEG, ECG, EMG, EOG, GSR | CovNet + Attention Fusion Model + Conditional Self-Attention GAN + CovNet Classifier | DEAP: 96.06%, WESAD: 95.70% | DEAP: 95.80%, WESAD: 95.45% | High reliance on generative models; requires high-performance computing hardware; lacks validation in real, complex environments; scalability and cross-domain adaptability need further study |
(b)
| Author | Datasets | Signals | Classifier | Accuracy | F1 Score | Limitations |
|---|---|---|---|---|---|---|
| Kaveti Pavan et al. [211] | IIT Hyderabad in-house controlled driving dataset | ECG, EDA | 1D-CNN with cross squeeze-and-excitation attention; LOSOCV | 76.54% ECG→EDA attention; 76.37% LOSOCV mean | 73.02% (best) | Small, single-site, all-male cohort; two-class setup; short windows; device- and scenario-specific; no benchmarking on public datasets; limited evidence for real-world generalization and edge deployment latency/energy. |
| Axel Gedeon et al. [118] | ASCERTAIN and KEMDy20 | ECG, EEG, EDA, TEMP, IBI, EMO, Audio signal, Text | Cross-Attention Gated Mixture of Experts | Arousal: 99.71%, Valence: 99.71% | Arousal: 99.83%, Valence: 99.76% | Highly complex model with many parameters and high hardware requirements; requires complete multimodal input; generalization not verified in real environments; limited interpretability |
| Yekta et al. [212] | SWEET, DAPPER, LabToDaily | EDA, PPG, TEMP, ACC | CNN-complex | – | SWEET: 97.80% | Signals severely affected by motion artifacts; model insensitive to motion state; Transformer module underperforms when data are scarce; requires long time windows, limiting real-time emotion detection; self-supervision needs large unlabeled corpora and costly training |
| Thaduri et al. [213] | Stress-Lysis | ECG, EDA, Behavioral signals, Environmental signals | GNN-T-GCN | 92.3% | – | Generalization not verified; heavy computational demand; mixing environmental and physiological signals may introduce noise; motion artifacts and data loss on real wearables not handled; not validated under real-time monitoring |
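Several entries in (b) rely on cross-modal attention, e.g. the ECG→EDA attention of Kaveti Pavan et al. [211]. A minimal scaled dot-product sketch in NumPy — with random projection weights and hypothetical dimensions, not the authors' architecture — in which ECG-frame embeddings act as queries over the slower EDA-frame keys and values:

```python
import numpy as np

rng = np.random.default_rng(1)

def cross_attention(q_feat, kv_feat, d=16):
    """Scaled dot-product attention with queries from one modality and
    keys/values from another; projections are random for this sketch."""
    Wq = rng.normal(0, 0.1, (q_feat.shape[1], d))
    Wk = rng.normal(0, 0.1, (kv_feat.shape[1], d))
    Wv = rng.normal(0, 0.1, (kv_feat.shape[1], d))
    Q, K, V = q_feat @ Wq, kv_feat @ Wk, kv_feat @ Wv
    scores = Q @ K.T / np.sqrt(d)          # (n_q, n_kv) similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)     # softmax over EDA frames
    return w @ V                           # one fused vector per ECG frame

ecg_frames = rng.normal(size=(20, 8))      # 20 ECG-frame embeddings
eda_frames = rng.normal(size=(5, 6))       # 5 slower EDA-frame embeddings
fused = cross_attention(ecg_frames, eda_frames)
```

Because each ECG query attends over all EDA frames, the mechanism tolerates the sampling-rate mismatch between the two streams without explicit resampling — the property that motivates mid-level fusion in the comparison that follows.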
| Paradigm | Feature Method | Data Scale | Interpretability | Fusion Type | Alignment | Deployment Cost | Generalization | Sensitivity |
|---|---|---|---|---|---|---|---|---|
| Traditional Machine Learning | Manual feature extraction | Can work with small datasets | High | Mostly early or decision-level fusion | Requires manual synchronization or preprocessing | Low computational resources required; suitable for edge devices | Relatively robust in low-variance tasks | More reliant on individual feature quality |
| Deep Learning | Automatic feature learning via deep networks | Requires large-scale, labeled data for optimal performance | Low; needs XAI tools such as SHAP or Grad-CAM | Supports mid-level fusion via attention or KAN mechanisms | Can model inter-modal dynamics and delays | High computational cost; lightweight models needed for deployment | Sensitive to overfitting without regularization or transfer learning | Better at capturing high-order correlations between modalities |
| Challenge | Manifestation | Impact | Key Points | Solutions |
|---|---|---|---|---|
| Modal heterogeneity and structural incompatibility | Inconsistent sampling rates, temporal–spectral characteristics, and dynamic ranges across modalities cause misaligned features and redundant information. | Reduces discrimination and generalization; leads to negative transfer and unstable convergence. | Heterogeneous signals differ in temporal response and noise; shared encoders fail to capture modality-specific semantics. | Modality-specific encoders with shared–private embedding spaces; adaptive normalization and attention weighting; confidence-aware fusion; calibration-aware models [218]. |
| Temporal asynchrony between modalities | Physiological and device-level delays cause unsynchronized data streams. | Leads to semantic mismatches, feature redundancy, and cross-modal interference; affects temporal generalization. | Static alignment cannot adapt to dynamic tasks; physiological lags and rating delays persist. | Dynamic and Soft-DTW alignment; asynchronous attention; adaptive delay modeling; cross-modal temporal calibration [219]. |
| Dependence on large-scale labeled datasets | Physiological data collection and annotation are costly and subjective, leading to small-sample imbalance and missing modalities. | Overfitting and unstable training; poor scalability under real-world data scarcity. | Annotation subjectivity and limited public datasets restrict model capacity; multimodal imbalance worsens with missing data. | Self-supervised and contrastive learning; few-shot learning; incremental and federated adaptation; multi-expert Transformer with sparse gating [118]. |
| Poor generalization and small-sample adaptability | Strong inter-subject variability and inconsistent experimental paradigms cause domain shift and poor cross-dataset transfer. | Degraded performance under unseen subjects, devices, or contexts. | Differences in physiological baselines and sensor configurations hinder subject-independent modeling. | Domain adaptation; invariant representation learning; meta-learning for personalization; contrastive reinforcement transfer learning [220]. |
| Fusion dependency | Model performance heavily depends on complete, synchronized, high-quality multimodal inputs. | Missing or noisy modalities cause cascading attention mismatches and unstable predictions. | Deep fusion assumes modality complementarity and shared latent space; lacks physiological priors. | Uncertainty-aware fusion; modality dropout; reliability weighting; self-supervised multimodal contrastive learning [221]. |
| Lack of interpretability | Deep models act as black boxes; decisions lack physiological or psychological explainability. | Limits trust, clinical adoption, and ethical approval. | SHAP and LIME improve transparency but remain locally inconsistent and computationally costly. | Embedded explainable modules; multimodal causal analysis; model-agnostic and differentiable interpretability frameworks [222]. |
| Real-time deployment and efficiency | High latency, energy consumption, and limited compute capacity on edge devices hinder real-time response. | Prevents large-scale and wearable implementation; increases delay and power usage. | Multi-branch architectures and cross-modal attention raise complexity; quantization may reduce precision. | Model pruning, quantization, knowledge distillation; dynamic inference; edge–cloud co-inference; lightweight CNN or Transformer architectures [223]. |
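The "fusion dependency" row above names modality dropout and reliability weighting as remedies for missing or noisy inputs. A minimal sketch of reliability-weighted late fusion that renormalizes the weights when a modality is absent (the per-modality reliability values are hypothetical, not taken from any cited system):

```python
import numpy as np

def reliable_fuse(probs, reliab):
    """Late-fuse per-modality class probabilities with reliability
    weights; a missing modality is passed as None and its weight is
    redistributed over the modalities that are present."""
    present = [i for i, p in enumerate(probs) if p is not None]
    w = np.array([reliab[i] for i in present], dtype=float)
    w /= w.sum()                     # renormalize over present modalities
    return sum(wi * probs[i] for wi, i in zip(w, present))

p_ecg = np.array([0.8, 0.2])         # ECG branch: P(stress), P(baseline)
p_eda = np.array([0.6, 0.4])
p_emg = None                         # EMG dropped (e.g., electrode failure)
fused = reliable_fuse([p_ecg, p_eda, p_emg], reliab=[0.5, 0.3, 0.2])
```

Because the weights are renormalized over the surviving modalities, the fused output remains a valid probability distribution rather than cascading the failure downstream — the graceful-degradation behavior the table asks of uncertainty-aware fusion.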
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhang, X.; Zhang, H.; Xu, M. Multimodal Classification Algorithms for Emotional Stress Analysis with an ECG-Centered Framework: A Comprehensive Review. AI 2026, 7, 63. https://doi.org/10.3390/ai7020063