Article
Peer-Review Record

Uni2Mul: A Conformer-Based Multimodal Emotion Classification Model by Considering Unimodal Expression Differences with Multi-Task Learning

Appl. Sci. 2023, 13(17), 9910; https://doi.org/10.3390/app13179910
by Lihong Zhang *, Chaolong Liu and Nan Jia
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 8 July 2023 / Revised: 25 August 2023 / Accepted: 29 August 2023 / Published: 1 September 2023
(This article belongs to the Special Issue Advanced Technologies for Emotion Recognition)

Round 1

Reviewer 1 Report

The article proposes a model to classify multimodal emotions. The main problem addressed by the research is improving the effectiveness of such classification based on high-quality unimodal representations. The topic is not very novel, as it has already been considered in the literature, and the Authors provide adequate references for that. However, I find it relevant and interesting, as the Authors propose their own approach to the issue.

The main contribution of the Authors is the development and testing of several techniques for extracting unimodal representations, and the implementation of a model that helps achieve an improved multimodal representation.

I do not have any remarks concerning the methodology used in the research. However, I would find it advisable to also evaluate the model on other datasets, if they are available.

The discussion of the results provides all the necessary evidence and arguments to address the main questions posed in the research. The Conclusion is very short and does not add much to it.

The references given by the Authors are appropriate: they are up to date and sufficient to support the Authors' main arguments.

The tables and figures prepared by the Authors provide interesting insights and are a valuable addition to the text.

I have some minor remarks concerning the editing of the paper. There are some unnecessary hyphens in the text (lines 12 and 239). In Section 4.2 there is another level of subsections, each describing a single parameter. I would suggest a different presentation; a subsection containing only two sentences does not look good.

Author Response

Please see the attachment.


Author Response File: Author Response.docx

Reviewer 2 Report

I have thoroughly reviewed your paper entitled "Uni2Mul: A Conformer-Based Multimodal Emotion Classification Model by Considering Unimodal Expression Differences with Multi-Task Learning". The missing points in the study are listed below.

What is meant by the concepts of multidimensional and one-dimensional in the Abstract? The Abstract should summarize why the study was done, its contributions to the literature, its innovations, and its results; it needs to be updated.

I would like to state that the work is well written in general terms, but the most important parts are the least discussed in the study: for example, how the feature inferences were made, and what the contributions of the multimodal model are.

The contributions listed at the end of the Introduction should be written more clearly, and the model used should be mentioned in this section. Below this, a paragraph about the organization of the article should be added.

No information was given about the dataset. The limitations of the study should be mentioned.

I would like to state that the Methodology part is well written, but the model and results are very close. There are different performance measurement metrics used in the literature; it is interesting that none of these metrics are included.

As a result, the engineering side of the study should be highlighted.

Spelling and grammatical errors should be reviewed.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

It would be beneficial to explain all the used abbreviations in the text, including those common in the field, for a wider audience of readers.

Consider giving reasons for choosing the given timestep value in Section 4.2.1, and the given acoustic timestep value in Section 4.2.2.


Consider whether it would be beneficial to add more reasoning to the results described in this statement in Section 4.3:
("Additionally, our multi-task models showcased superior performance
when compared to the single-task models with identical structures, thereby validating the superiority of the multi-task framework over the single-task framework. Uni2Mul-M-Conformer  without pre-training outperformed Uni2Mul-S-Conformer without pre-training, and Uni2Mul-M-Conformer outperformed Uni2Mul-S-Conformer.")

Consider creating a bar chart for the figures in Table 2, as it might be more transparent.


I recommend a more detailed justification of the results presented in Fig. 6.

It would be fitting to significantly expand the Conclusion.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

It is important to review the article and correct some typos (lowercase and uppercase letters).


Author Response

Please see the attachment.

Author Response File: Author Response.docx
