Article

MMER-LMF: Multi-Modal Emotion Recognition in Lightweight Modality Fusion

1 Department of Computer Science, Chosun University, Gwangju 61452, Republic of Korea
2 Department of Future Convergence, Chosun University, Gwangju 61452, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2025, 14(11), 2139; https://doi.org/10.3390/electronics14112139
Submission received: 15 April 2025 / Revised: 9 May 2025 / Accepted: 23 May 2025 / Published: 24 May 2025
(This article belongs to the Special Issue Modeling of Multimodal Speech Recognition and Language Processing)

Abstract

Recently, multimodal approaches that combine multiple modalities have attracted attention as a way to recognize emotions more accurately. Although multimodal fusion delivers strong performance, it is computationally intensive and difficult to run in real time. In addition, large-scale emotion datasets for training are fundamentally scarce; Korean emotion datasets in particular offer fewer resources than English-language datasets, which limits the generalization capability of emotion recognition models. In this study, we propose a lightweight modality fusion method, MMER-LMF, to overcome the scarcity of Korean emotion datasets and improve emotion recognition performance while reducing model training complexity. To this end, we present three algorithms that fuse emotion scores according to the reliability of each model: text emotion scores extracted with a pre-trained large language model and video emotion scores extracted with a 3D CNN model are combined using confidence-based weight adjustment, correlation-coefficient-based weighting, or a Dempster–Shafer theory-based combination. The three algorithms showed similar classification performance, differing only slightly on the disgust class. Accuracy was 80% and recall was 79%, higher than the 58% achieved with the text modality alone and 72% with the video modality alone. Compared with previous studies using Korean datasets, this is a superior result in terms of both training complexity and performance.
Keywords: multi-modal emotion recognition; decision-level fusion; text–video
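
The decision-level fusion described in the abstract can be illustrated with a brief sketch. The snippet below fuses a text emotion score vector and a video emotion score vector using confidence-based weighting and a Dempster–Shafer combination. The seven-class emotion set, the use of the maximum class probability as the confidence estimate, and the singleton-only Dempster–Shafer formulation are assumptions made for illustration; the paper's exact algorithms may differ.

```python
# Illustrative sketch of decision-level (late) fusion of per-modality emotion
# scores, in the spirit of MMER-LMF. The class list, the max-probability
# confidence estimate, and the singleton-only Dempster-Shafer rule are
# assumptions for illustration, not the paper's exact formulations.
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]

def confidence_weighted_fusion(p_text: np.ndarray, p_video: np.ndarray) -> np.ndarray:
    """Weight each modality's probability vector by its own confidence
    (here, its maximum class probability) and renormalize."""
    w_text, w_video = p_text.max(), p_video.max()
    fused = w_text * p_text + w_video * p_video
    return fused / fused.sum()

def dempster_shafer_fusion(p_text: np.ndarray, p_video: np.ndarray) -> np.ndarray:
    """Combine the two score vectors with Dempster's rule, treating each
    class probability as a basic probability assignment on a singleton set."""
    joint = p_text * p_video                 # agreement mass per class
    conflict = 1.0 - joint.sum()             # mass assigned to conflicting class pairs
    if np.isclose(conflict, 1.0):
        raise ValueError("Total conflict between modalities; rule undefined.")
    return joint / joint.sum()               # normalize away the conflict mass

if __name__ == "__main__":
    # Hypothetical per-modality emotion scores for a single utterance/clip.
    p_text = np.array([0.05, 0.02, 0.03, 0.70, 0.10, 0.05, 0.05])
    p_video = np.array([0.10, 0.05, 0.05, 0.55, 0.15, 0.05, 0.05])
    for name, fuse in [("confidence-weighted", confidence_weighted_fusion),
                       ("Dempster-Shafer", dempster_shafer_fusion)]:
        fused = fuse(p_text, p_video)
        print(f"{name}: {EMOTIONS[int(fused.argmax())]} ({fused.max():.2f})")
```

Because fusion happens only at the score level, each modality's classifier can be trained or swapped independently, which is what keeps this style of combination lightweight compared with feature-level fusion.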

Share and Cite

MDPI and ACS Style

Kim, E.-H.; Lim, M.-J.; Shin, J.-H. MMER-LMF: Multi-Modal Emotion Recognition in Lightweight Modality Fusion. Electronics 2025, 14, 2139. https://doi.org/10.3390/electronics14112139

