Article

Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets

Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute, Daejeon 34129, Korea
*
Author to whom correspondence should be addressed.
Academic Editor: Raffaele Gravina
Sensors 2021, 21(5), 1579; https://doi.org/10.3390/s21051579
Received: 7 January 2021 / Revised: 15 February 2021 / Accepted: 21 February 2021 / Published: 24 February 2021
(This article belongs to the Special Issue Emotion Intelligence Based on Smart Sensing)
Speech emotion recognition (SER) is a natural method of recognizing individual emotions in everyday life. To deploy SER models in real-world applications, some key challenges must be overcome, such as the scarcity of datasets tagged with emotion labels and the weak generalization of SER models to unseen target domains. This study proposes a multi-path and group-loss-based network (MPGLN) for SER to support multi-domain adaptation. The proposed model includes a bidirectional long short-term memory-based temporal feature generator and a feature extractor transferred from the pre-trained VGG-like audio classification model (VGGish), and it learns from multiple losses simultaneously, according to the association of emotion labels in the discrete and dimensional models. To evaluate the MPGLN SER on multi-cultural domain datasets, the Korean Emotional Speech Database (KESD), comprising KESDy18 and KESDy19, is constructed, and the English-speaking Interactive Emotional Dyadic Motion Capture database (IEMOCAP) is used. In the evaluations of multi-domain adaptation and domain generalization, the MPGLN SER improved the F1 score by 3.7% and 3.5%, respectively, compared with a baseline SER model that uses a temporal feature generator. We show that the MPGLN SER efficiently supports multi-domain adaptation and reinforces model generalization.
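The idea of learning from multiple losses at once can be illustrated with a minimal sketch. It assumes one classification head per label group and a weighted sum of per-head cross-entropy terms. The head names (`emotion`, `arousal`, `valence`), the class counts, and the weights are hypothetical, since the abstract does not give the exact loss formulation used by MPGLN.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the target class under softmax(logits)."""
    return -math.log(softmax(logits)[target_index])


def group_loss(head_logits, head_targets, weights=None):
    """Weighted sum of per-head cross-entropy losses.

    head_logits:  dict mapping a head name to its raw class scores
    head_targets: dict mapping a head name to the true class index
    weights:      optional dict of per-head weights (assumed 1.0 if omitted)
    """
    if weights is None:
        weights = {name: 1.0 for name in head_logits}
    return sum(
        weights[name] * cross_entropy(head_logits[name], head_targets[name])
        for name in head_logits
    )


# Hypothetical heads: one discrete emotion head and two dimensional heads.
example_logits = {
    "emotion": [2.0, 0.1, -1.0, 0.3],  # e.g. four discrete emotion classes
    "arousal": [0.2, 1.5, -0.4],       # e.g. low / mid / high
    "valence": [1.1, 0.0, -0.7],       # e.g. negative / neutral / positive
}
example_targets = {"emotion": 0, "arousal": 1, "valence": 0}

total = group_loss(example_logits, example_targets)
print(f"combined loss: {total:.4f}")
```

Summing the terms lets a single backward pass update shared features using both the discrete and the dimensional labels. Whether the original model weights its losses equally is not stated here.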
Keywords: speech emotion recognition; domain adaptation; SER generalization; Korean Emotional Speech Database; ensemble model; multi-path; group-loss; BLSTM network

MDPI and ACS Style

Noh, K.J.; Jeong, C.Y.; Lim, J.; Chung, S.; Kim, G.; Lim, J.M.; Jeong, H. Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets. Sensors 2021, 21, 1579. https://doi.org/10.3390/s21051579

AMA Style

Noh KJ, Jeong CY, Lim J, Chung S, Kim G, Lim JM, Jeong H. Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets. Sensors. 2021; 21(5):1579. https://doi.org/10.3390/s21051579

Chicago/Turabian Style

Noh, Kyoung Ju, Chi Yoon Jeong, Jiyoun Lim, Seungeun Chung, Gague Kim, Jeong Mook Lim, and Hyuntae Jeong. 2021. "Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets" Sensors 21, no. 5: 1579. https://doi.org/10.3390/s21051579
