Open Access Article

Attention-LSTM-Attention Model for Speech Emotion Recognition and Analysis of IEMOCAP Database

Department of Computer Engineering, Hanbat National University, Daejeon 34158, Korea
* Author to whom correspondence should be addressed.
Electronics 2020, 9(5), 713; https://doi.org/10.3390/electronics9050713
Received: 30 March 2020 / Revised: 22 April 2020 / Accepted: 23 April 2020 / Published: 26 April 2020
(This article belongs to the Special Issue Deep Neural Networks and Their Applications)
We propose a speech emotion recognition (SER) model with an "attention-Long Short-Term Memory (LSTM)-attention" component that combines IS09, a feature set commonly used for SER, with the mel spectrogram, and we analyze the reliability problem of the interactive emotional dyadic motion capture (IEMOCAP) database. The model's attention mechanisms focus on the emotion-related elements of the IS09 and mel-spectrogram features and on the emotion-related time segments within each utterance, allowing the model to extract emotion information from a given speech signal. In the baseline study, the proposed model achieved a weighted accuracy (WA) of 68% on the improvised subset of IEMOCAP. However, in the main study, neither the proposed model nor its modified variants exceeded a WA of 68% on the improvised subset. We attribute this to the limited reliability of the IEMOCAP labels; a more reliable dataset is required for a more accurate evaluation of model performance. Therefore, in this study, we reconstructed a more reliable dataset based on the labeling results provided with IEMOCAP. On this reconstructed dataset, the model achieved a WA of 73%.
Keywords: speech-emotion recognition; attention mechanism; LSTM
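The "attention-LSTM-attention" pipeline described in the abstract applies attention twice: once over the feature dimensions of each frame, and once over time after the LSTM. Below is a minimal PyTorch sketch of that structure. The layer sizes, the sigmoid feature gate, the module names (FeatureAttention, TemporalAttention), and the four-class output are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of an "attention-LSTM-attention" SER classifier in PyTorch.
# Dimensions and module names are illustrative assumptions, not the paper's
# exact architecture.
import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    """Re-weights each frame's feature dimensions (e.g., IS09 + mel bins)."""
    def __init__(self, feat_dim):
        super().__init__()
        self.score = nn.Linear(feat_dim, feat_dim)

    def forward(self, x):                         # x: (batch, time, feat_dim)
        weights = torch.sigmoid(self.score(x))    # per-dimension gates
        return x * weights

class TemporalAttention(nn.Module):
    """Pools LSTM outputs over time, emphasizing emotion-relevant frames."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, h):                         # h: (batch, time, hidden_dim)
        alpha = torch.softmax(self.score(h), dim=1)   # weights over time
        return (alpha * h).sum(dim=1)                 # (batch, hidden_dim)

class AttentionLSTMAttention(nn.Module):
    # num_classes=4 reflects a common IEMOCAP setup (angry/happy/neutral/sad).
    def __init__(self, feat_dim=512, hidden_dim=128, num_classes=4):
        super().__init__()
        self.feat_attn = FeatureAttention(feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.time_attn = TemporalAttention(hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                # x: (batch, time, feat_dim)
        x = self.feat_attn(x)            # first attention: which features matter
        h, _ = self.lstm(x)              # sequence modeling
        pooled = self.time_attn(h)       # second attention: which frames matter
        return self.classifier(pooled)

# Example: a batch of 8 utterances, 300 frames, 512-dim frame features.
model = AttentionLSTMAttention()
logits = model(torch.randn(8, 300, 512))   # -> (8, 4) class scores
```

The first attention decides which of the concatenated IS09/mel dimensions carry emotion cues, while the second pools the frames where emotion is actually expressed, which is what lets the model ignore emotionally neutral stretches of an utterance.

The abstract's second contribution, reconstructing a more reliable dataset from IEMOCAP's labeling results, amounts to keeping only utterances on which the evaluators agree. The following is a hedged sketch assuming a simple agreement-threshold rule; the paper's exact selection criterion is not given in the abstract, and the field names and example records here are hypothetical.

```python
# Sketch of reconstructing a higher-agreement subset from IEMOCAP labels.
# IEMOCAP ships per-utterance categorical judgments from multiple evaluators;
# the agreement threshold and record layout here are illustrative assumptions.
from collections import Counter

def filter_by_agreement(utterances, min_agreement=1.0):
    """Keep utterances whose evaluators agree on one emotion label.

    utterances: list of dicts like
        {"id": "Ses01F_impro01_F000", "labels": ["neu", "neu", "fru"]}
    min_agreement: fraction of evaluators that must share the majority label.
    """
    reliable = []
    for utt in utterances:
        votes = Counter(utt["labels"])
        label, count = votes.most_common(1)[0]
        if count / len(utt["labels"]) >= min_agreement:
            reliable.append({"id": utt["id"], "label": label})
    return reliable

# Example: only the first utterance has unanimous agreement.
data = [
    {"id": "Ses01F_impro01_F000", "labels": ["ang", "ang", "ang"]},
    {"id": "Ses01F_impro01_F001", "labels": ["hap", "exc", "hap"]},
]
print(filter_by_agreement(data))
# [{'id': 'Ses01F_impro01_F000', 'label': 'ang'}]
```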
MDPI and ACS Style

Yu, Y.; Kim, Y.-J. Attention-LSTM-Attention Model for Speech Emotion Recognition and Analysis of IEMOCAP Database. Electronics 2020, 9, 713.

