Open Access Article

CLSTM: Deep Feature-Based Speech Emotion Recognition Using the Hierarchical ConvLSTM Network

Interaction Technology Laboratory, Department of Software, Sejong University, Seoul 05006, Korea
* Author to whom correspondence should be addressed.
Mathematics 2020, 8(12), 2133; https://doi.org/10.3390/math8122133
Received: 18 October 2020 / Revised: 16 November 2020 / Accepted: 24 November 2020 / Published: 30 November 2020
(This article belongs to the Special Issue Machine Learning and Data Mining in Pattern Recognition)
Artificial intelligence, deep learning, and machine learning are the dominant tools for making systems smarter. Nowadays, smart speech emotion recognition (SER) is a basic necessity and an emerging research area in digital audio signal processing, and it plays an important role in many applications related to human–computer interaction (HCI). However, existing state-of-the-art SER systems have rather low prediction performance, which must improve before they become feasible for real-time commercial applications. The key reasons for the low accuracy and poor prediction rate are the scarcity of data and the model configuration, which is the most challenging part of building a robust machine learning technique. In this paper, we addressed the limitations of existing SER systems and proposed a unique artificial intelligence (AI) based system structure for SER that utilizes hierarchical blocks of convolutional long short-term memory (ConvLSTM) with sequence learning. We designed four ConvLSTM blocks, each called a local features learning block (LFLB), to extract local emotional features in a hierarchical correlation. The ConvLSTM layers are adopted for the input-to-state and state-to-state transitions so that spatial cues are extracted through convolution operations. We placed the four LFLBs so as to extract hierarchically correlated spatiotemporal cues from speech signals using a residual learning strategy. Furthermore, we utilized a novel sequence learning strategy to extract global information and adaptively adjust the relevant global feature weights according to the correlation of the input features. Finally, we used the center loss function together with the softmax loss to produce the class probabilities. The center loss improves the final classification results, ensures accurate prediction, and plays a prominent role in the overall proposed SER scheme. We tested the proposed system on two standard speech corpora, the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and obtained recognition rates of 75% and 80%, respectively.
Keywords: affective computing; artificial intelligence; deep learning; ConvLSTM; gated recurrent units (GRUs); speech emotion recognition; raw speech data
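
The abstract states that the center loss is used together with the softmax loss. As a rough illustration only, the sketch below combines the two in their commonly used form: softmax cross-entropy plus a weighted penalty on the distance between each feature vector and its class center. The function names, the weighting factor `lam`, and the batch averaging are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch (numerically stable)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def center_loss(features, labels, centers):
    """Half the squared distance of each feature to its class center,
    averaged over the batch (averaging is an assumption of this sketch)."""
    diffs = features - centers[labels]
    return 0.5 * np.sum(diffs ** 2, axis=1).mean()


def joint_loss(logits, features, labels, centers, lam=0.5):
    """Softmax loss plus lam times the center loss; lam is a hypothetical weight."""
    return softmax_cross_entropy(logits, labels) + lam * center_loss(
        features, labels, centers
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_classes, feat_dim, batch = 4, 8, 16
    centers = rng.normal(size=(num_classes, feat_dim))
    labels = rng.integers(0, num_classes, size=batch)
    features = centers[labels] + 0.1 * rng.normal(size=(batch, feat_dim))
    logits = rng.normal(size=(batch, num_classes))
    print(joint_loss(logits, features, labels, centers))
```

In this form the softmax term separates the classes while the center term pulls features of the same class toward a shared center. In practice the centers would be updated during training, which this sketch leaves out.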
MDPI and ACS Style

Mustaqeem; Kwon, S. CLSTM: Deep Feature-Based Speech Emotion Recognition Using the Hierarchical ConvLSTM Network. Mathematics 2020, 8, 2133. https://doi.org/10.3390/math8122133

AMA Style

Mustaqeem, Kwon S. CLSTM: Deep Feature-Based Speech Emotion Recognition Using the Hierarchical ConvLSTM Network. Mathematics. 2020; 8(12):2133. https://doi.org/10.3390/math8122133

Chicago/Turabian Style

Mustaqeem; Kwon, Soonil. 2020. "CLSTM: Deep Feature-Based Speech Emotion Recognition Using the Hierarchical ConvLSTM Network" Mathematics 8, no. 12: 2133. https://doi.org/10.3390/math8122133

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
