SGMT with S-PACE: A Framework for Temporal Alignment and Quality-Aware Multimodal Fusion in Emotion Recognition

Ahn, Jun-Young; Arthanari, Sathiyamoorthi; Moorthy, Sathishkumar; Moon, Yeon-Kug

doi:10.3390/math14101743

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article

by Jun-Young Ahn, Sathiyamoorthi Arthanari, Sathishkumar Moorthy and Yeon-Kug Moon *

HEART Lab (Human Emotion and Intelligent Agent Research for Future Transformation Lab), Department of Artificial Intelligence and Data Science, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, Republic of Korea

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(10), 1743; https://doi.org/10.3390/math14101743

Submission received: 14 April 2026 / Revised: 8 May 2026 / Accepted: 13 May 2026 / Published: 19 May 2026

(This article belongs to the Special Issue Mathematics-Driven Computer Vision and Multi-Modal Learning)

Abstract

Multimodal emotion recognition is challenging because behavioral signals and physiological responses evolve at different temporal rates. Facial expressions and speech often change rapidly after an emotional event, whereas peripheral biosignals such as electrodermal activity, blood volume pulse, and skin temperature exhibit delayed and smoother dynamics. This temporal inconsistency can degrade fusion performance, particularly in real-world recordings with noisy or missing modalities. To address this issue, this study proposes SGMT, an S-PACE Gated Multimodal Transformer for emotion recognition using speech, facial video, and physiological signals. SGMT introduces S-PACE, a physiology-guided cross-attention mechanism that aligns fast behavioral cues with slower biosignal representations without assuming a fixed temporal delay. A Quality-Aware Gate further improves robustness by adaptively weighting modalities according to signal reliability. The fused representations are processed using a Temporal Swin Transformer and a Perceiver Fusion module for arousal–valence prediction and emotion quadrant classification. Experiments are conducted on the Korean multimodal emotion datasets KEMDy20 and K-EmoCon under different modality settings. SGMT achieves arousal unweighted average recall (UAR) scores of 68.4% on KEMDy20 and 62.9% on K-EmoCon, with quadrant accuracies of 44.7% and 62.5%, respectively. Ablation studies demonstrate that the proposed alignment and gating strategies provide more stable multimodal fusion than conventional feature concatenation. The results indicate that SGMT effectively adapts to varying modality availability and improves multimodal emotion recognition in naturalistic environments.
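
The arousal results above are reported as UAR, which is commonly read as unweighted average recall: the mean of the per-class recalls, with every class counted equally regardless of how many samples it has. The sketch below illustrates that standard definition only. The function name `unweighted_average_recall` and the toy labels are hypothetical and are not taken from the article, and the article's own evaluation protocol may differ in details such as class definitions or data splits.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls, weighting each class equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # correctly predicted samples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall is computed only for classes that occur in y_true.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not data from the article).
y_true = ["low"] * 8 + ["high"] * 2
y_pred = ["low"] * 8 + ["low", "high"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

In this toy case the minority class is recalled only half the time, so UAR falls well below plain accuracy. This is why UAR is often preferred when emotion classes are imbalanced.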

Keywords: multimodal emotion recognition; physiological signal alignment; quality-aware fusion; temporal transformer; bio-behavioral synchronization; affective computing; gated fusion; wearable biosensors
