Article

Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion

Department of Biomedical Engineering, School of Engineering and Applied Science, George Washington University, Washington, DC 20052, USA
* Author to whom correspondence should be addressed.
Academic Editors: Soo-Hyung Kim and Gueesang Lee
Sensors 2021, 21(14), 4913; https://doi.org/10.3390/s21144913
Received: 1 July 2021 / Revised: 17 July 2021 / Accepted: 17 July 2021 / Published: 19 July 2021
(This article belongs to the Special Issue Sensor Based Multi-Modal Emotion Recognition)
Decades of research have been devoted to developing and evaluating methods for automated emotion recognition, and a growing range of emerging applications requires recognition of the user's emotional state. This paper investigates a robust approach to multimodal emotion recognition in conversation. Three separate models for the audio, video, and text modalities are constructed and fine-tuned on the Multimodal EmotionLines Dataset (MELD). A transformer-based crossmodality fusion with the EmbraceNet architecture is then employed to estimate the emotion. The proposed multimodal network achieves up to 65% accuracy, significantly surpassing each of the unimodal models. Multiple evaluation techniques are applied to show that the model is robust and can outperform state-of-the-art models on MELD.
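To illustrate the fusion strategy described in the abstract, the sketch below shows a minimal PyTorch module in which each modality's utterance features attend to the other two modalities via multi-head cross-attention, and the attended streams are combined with an EmbraceNet-style stochastic modality dropout before emotion classification. The layer sizes, the `CrossmodalFusion` name, and the simplified whole-modality dropout are illustrative assumptions under this sketch, not the authors' released implementation.

```python
# Minimal sketch of transformer-based crossmodality fusion with an
# EmbraceNet-style combination step (hypothetical layer sizes).
import torch
import torch.nn as nn


class CrossmodalFusion(nn.Module):
    """Fuses audio, video, and text utterance embeddings.

    Each modality attends to the concatenation of the other two via
    multi-head cross-attention; the attended features are combined with a
    stochastic modality-dropout step (a simplification of EmbraceNet's
    per-feature selection) before classification.
    """

    def __init__(self, dim=256, heads=4, num_classes=7, p_drop_modality=0.1):
        super().__init__()
        self.attn = nn.ModuleDict({
            m: nn.MultiheadAttention(dim, heads, batch_first=True)
            for m in ("audio", "video", "text")
        })
        self.p_drop = p_drop_modality
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, feats):
        # feats: dict of (batch, seq_len, dim) tensors, one per modality
        fused = {}
        for m, x in feats.items():
            # Cross-attention: queries from modality m, keys/values from the other two
            others = torch.cat([v for k, v in feats.items() if k != m], dim=1)
            out, _ = self.attn[m](x, others, others)
            fused[m] = out.mean(dim=1)               # pool over time

        stacked = torch.stack(list(fused.values()))  # (3, batch, dim)
        if self.training:
            # Randomly zero out whole modality streams during training
            keep = (torch.rand(stacked.shape[:2], device=stacked.device)
                    > self.p_drop).float().unsqueeze(-1)
            stacked = stacked * keep
        return self.classifier(stacked.mean(dim=0))  # combine the three streams


# Example: batch of 8 utterances, 20 time steps, 256-dim features per modality
model = CrossmodalFusion()
feats = {m: torch.randn(8, 20, 256) for m in ("audio", "video", "text")}
logits = model(feats)  # (8, 7) logits for MELD's seven emotion classes
```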
Keywords: multimodal emotion recognition; multimodal fusion; crossmodal transformer; attention mechanism
MDPI and ACS Style

Xie, B.; Sidulova, M.; Park, C.H. Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion. Sensors 2021, 21, 4913. https://doi.org/10.3390/s21144913

AMA Style

Xie B, Sidulova M, Park CH. Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion. Sensors. 2021; 21(14):4913. https://doi.org/10.3390/s21144913

Chicago/Turabian Style

Xie, Baijun, Mariia Sidulova, and Chung H. Park. 2021. "Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion" Sensors 21, no. 14: 4913. https://doi.org/10.3390/s21144913
