Speech Emotion Recognition Based on Multiple Acoustic Features and Deep Convolutional Neural Network
Round 1
Reviewer 1 Report
The authors describe a neural-network-based system for detecting emotions from audio streams. In the introduction, they adequately state the problems with previously proposed approaches and describe the main contributions of the paper in bullet points.
For reproducibility purposes, the authors provide both the mathematical formulae describing each component and the architecture of the overall system. As the implementation/codebase is not provided, I would also have expected more detail on the training phase: how the hyperparameters were chosen, and which criteria were used. As it stands, the authors only report the number of epochs required for training and do not elaborate on how the training was carried out.
The authors also provide a good set of experimental settings in which different competing models for the same task are compared. The authors might also want to report the training time, to characterize the tradeoff between it and the accuracy of the reported results. Still, the authors make a good point in showing the differences across models via the number of total trainable parameters, from which the reader might argue that a system with fewer parameters is preferable, as it will be more robust and less prone to human error. Reporting such counts is straightforward, as sketched below.
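A minimal sketch of this kind of reporting, in PyTorch. The architecture here is a placeholder, not the authors' actual 1D-DCNN; the 40 input features per frame and the 7 emotion classes (as in Emo-DB) are assumptions for illustration:

```python
import torch.nn as nn

# Placeholder 1-D CNN for utterance-level emotion classification.
model = nn.Sequential(
    nn.Conv1d(in_channels=40, out_channels=64, kernel_size=5),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),  # pool over time to a fixed-size vector
    nn.Flatten(),
    nn.Linear(64, 7),         # 7 emotion classes, e.g. Emo-DB
)

# Count only parameters that are actually updated during training.
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {n_trainable:,}")
```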
The paper is well written and easy to read, and no major grammatical issues were found. The suggested edits are very minor, and I am confident they can be carried out in a few days of work.
Author Response
Please see the attachment
Author Response File: Author Response.docx
Reviewer 2 Report
The topic of speech emotion recognition (SER) is very relevant, and over the past decade quite a lot of articles have been published on it, including articles on selecting hand-crafted features and on using 1D-DCNNs for SER. That is why the originality/novelty, the significance of the content, and the interest to readers are in great doubt.
There are serious questions about the chief contributions of the proposed article:
1. What was the motivation for analyzing exactly the listed set of acoustic features? In numerous state-of-the-art SER challenges, the community uses well-known feature sets such as the INTERSPEECH Emotion Challenge set, the Geneva Minimalistic Acoustic Parameter Set (GeMAPS), and the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS). What is wrong with these feature sets? They include most of the features listed by the authors of the article. What is the originality/novelty of the proposed feature set? (An extraction sketch for eGeMAPS is given after this list.)
2. The second stated contribution is a reduction in the complexity of deep learning frameworks for SER, but this contribution is not substantiated in the article. The authors evaluate only classification accuracy and do not give estimates of computational complexity. It is therefore impossible to confirm the value of this contribution, to estimate the gain in computational complexity of the 1D-DCNN relative to a 2D-DCNN, or to judge whether this gain compensates for the decrease in accuracy from 96.07% to 93.31% on the Emo-DB dataset. (A back-of-the-envelope comparison is sketched after this list.)
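Regarding point 1: a minimal sketch of extracting the eGeMAPS feature set with audEERING's open-source opensmile Python package (pip install opensmile); the audio file name is a placeholder:

```python
import opensmile

# eGeMAPSv02 functionals: 88 summary features per utterance.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("speech_sample.wav")  # pandas DataFrame
print(features.shape)  # (1, 88)
```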
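Regarding point 2: a back-of-the-envelope sketch of the kind of complexity estimate that is missing, comparing parameters and multiply-accumulate (MAC) counts for a single 1-D versus 2-D convolutional layer. All input shapes and layer sizes here are hypothetical, not taken from the paper:

```python
import torch
import torch.nn as nn

# Hypothetical single layers: a 1-D conv over framewise features vs. a
# 2-D conv over a spectrogram.
conv1d = nn.Conv1d(in_channels=1, out_channels=64, kernel_size=9)
conv2d = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=(3, 3))

def n_params(module):
    return sum(p.numel() for p in module.parameters())

out1 = conv1d(torch.randn(1, 1, 400))       # e.g. 400 feature frames
out2 = conv2d(torch.randn(1, 1, 128, 128))  # e.g. a 128x128 spectrogram

# MACs ~= (number of weights) * (number of output spatial positions).
macs_1d = conv1d.weight.numel() * out1.shape[-1]
macs_2d = conv2d.weight.numel() * out2.shape[-2] * out2.shape[-1]
print(f"1D: {n_params(conv1d):>5} params, ~{macs_1d:>10,} MACs")
print(f"2D: {n_params(conv2d):>5} params, ~{macs_2d:>10,} MACs")
```

Even such a per-layer estimate would let readers weigh the claimed complexity gain against the reported drop in accuracy.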
Author Response
Please see the attachment
Author Response File: Author Response.docx
Round 2
Reviewer 2 Report
Comments for author File: Comments.pdf
Author Response
The revised manuscript and the response to the reviewers are attached
Author Response File: Author Response.pdf
Round 3
Reviewer 2 Report
The authors have corrected the manuscript in accordance with the recommendations of the reviewer, and the article can be accepted in its present form.