Discriminating Emotions in the Valence Dimension from Speech Using Timbre Features

1 Department of Digital Contents, Sejong University, Seoul 05006, Korea
2 Department of Electrical Engineering, Sejong University, Seoul 05006, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2019, 9(12), 2470; https://doi.org/10.3390/app9122470
Received: 22 March 2019 / Revised: 4 June 2019 / Accepted: 12 June 2019 / Published: 17 June 2019
(This article belongs to the Special Issue Advances in Deep Learning)
The most widely used acoustic features of a speech signal, the Mel-frequency cepstral coefficients (MFCC), cannot sufficiently characterize emotions in speech when the task is to classify both discrete emotions (i.e., anger, happiness, sadness, and neutral) and emotions in the valence dimension (positive and negative). The main reason is that some discrete emotions, such as anger and happiness, share similar acoustic features in the arousal dimension (high vs. low) but differ in the valence dimension. Timbre is the sound quality that distinguishes two sounds even when they have the same pitch and loudness. In this paper, we analyzed timbre acoustic features to improve classification performance for discrete emotions as well as for emotions in the valence dimension. Sequential forward selection (SFS) was used to find the most relevant features among the timbre acoustic features. The experiments were carried out on the Berlin Emotional Speech Database and the Interactive Emotional Dyadic Motion Capture Database. A support vector machine (SVM) and a long short-term memory recurrent neural network (LSTM-RNN) were used to classify emotions. Significant improvements in classification performance were achieved by combining the baseline features with the most relevant timbre acoustic features, identified by applying SFS to emotion classification on the Berlin Emotional Speech Database. Extensive experiments showed that timbre acoustic features can sufficiently characterize emotions in speech in the valence dimension.
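As a concrete illustration of the feature-extraction step described in the abstract, the sketch below computes MFCCs plus a handful of frame-level timbre descriptors and averages them into one vector per utterance. This is a minimal sketch assuming the librosa library; the particular timbre descriptors shown (spectral centroid, bandwidth, rolloff, flatness, and zero-crossing rate) are common timbre correlates chosen for illustration, not the paper's exact feature set.

```python
# Minimal sketch of MFCC + timbre feature extraction (assumes librosa;
# the descriptor choices are illustrative, not the paper's exact set).
import numpy as np
import librosa

def extract_features(path, sr=16000, n_mfcc=13):
    """Return one fixed-length vector per utterance:
    frame-averaged MFCCs concatenated with frame-averaged timbre descriptors."""
    y, sr = librosa.load(path, sr=sr)

    # Baseline features: MFCCs, shape (n_mfcc, n_frames).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

    # Timbre-related spectral descriptors, each shape (1, n_frames).
    centroid  = librosa.feature.spectral_centroid(y=y, sr=sr)
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)
    rolloff   = librosa.feature.spectral_rolloff(y=y, sr=sr)
    flatness  = librosa.feature.spectral_flatness(y=y)
    zcr       = librosa.feature.zero_crossing_rate(y)

    timbre = np.concatenate([centroid, bandwidth, rolloff, flatness, zcr], axis=0)

    # Average over frames to get a single utterance-level vector.
    return np.concatenate([mfcc.mean(axis=1), timbre.mean(axis=1)])

# Usage (hypothetical file): vec = extract_features("utterance.wav")
```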
Keywords: timbre acoustic features; valence dimension; affective computing; emotion recognition; neural networks; speech processing
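The abstract's feature-selection step can be sketched in the same spirit: a hedged example of sequential forward selection wrapped around an SVM, using scikit-learn's SequentialFeatureSelector with direction="forward". The synthetic X and y below stand in for extracted feature vectors and emotion labels; the kernel, subset size, and cross-validation settings are illustrative assumptions, not the paper's reported configuration.

```python
# Hedged sketch of SFS + SVM feature selection (scikit-learn >= 0.24).
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 200 utterances x 18 features, 4 emotion classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 18))
y = rng.integers(0, 4, size=200)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Greedily add one feature at a time, keeping the subset that most
# improves cross-validated accuracy (direction="forward" == SFS).
sfs = SequentialFeatureSelector(svm, n_features_to_select=10,
                                direction="forward", cv=5)
sfs.fit(X, y)

X_selected = sfs.transform(X)
scores = cross_val_score(svm, X_selected, y, cv=5)
print("Selected feature mask:", sfs.get_support())
print("Mean CV accuracy:", scores.mean())
```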
MDPI and ACS Style

Tursunov, A.; Kwon, S.; Pang, H.-S. Discriminating Emotions in the Valence Dimension from Speech Using Timbre Features. Appl. Sci. 2019, 9, 2470.
