Open Access Article

Peer-Review Record

A Simple Distortion-Free Method to Handle Variable Length Sequences for Recurrent Neural Networks in Text Dependent Speaker Verification

Appl. Sci. 2020, 10(12), 4092; https://doi.org/10.3390/app10124092

by Sung-Hyun Yoon and Ha-Jin Yu *

Reviewer 1: Anonymous

Reviewer 2: Rui Sousa-Silva

Reviewer 3: Anonymous

Reviewer 4: Anonymous

Submission received: 22 May 2020 / Revised: 10 June 2020 / Accepted: 11 June 2020 / Published: 14 June 2020

(This article belongs to the Special Issue Intelligent Speech and Acoustic Signal Processing)

Round 1

Reviewer 1 Report

The manuscript presents an interesting topic related to handling variable-length sequences of data. Although the underlying idea is very nice, I think the paper has important flaws.

The most pressing concern is the perspective proposed by the authors. They select recurrent neural networks (RNNs) to develop their approach. However, the problem they address is generic and also appears in other machine learning models. Therefore, I strongly recommend revising the manuscript to make the proposal as generic as possible. The authors can then present experiments focused on RNNs if they prefer. In any case, a comparison across models showing which ones benefit most from the new approach would be highly advisable.

The second concern relates to the conclusions. They are very brief and thin, and no future work or guidelines are given. For these reasons, I think they are incomplete.

I wish the authors the best of luck if they decide to make the indicated modifications to the manuscript.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The manuscript presents a highly interesting and promising piece of research. The authors' approach is promising because, if successful, it will help resolve a practical and common problem with neural networks. The manuscript is clear and well written overall, although it requires light linguistic editing. The authors also provide an extensive description of the methodology adopted.

However, the manuscript would benefit from some improvements. Firstly, the authors address text-dependent speaker verification, but the particular problems of speaker verification are described neither in detail nor sufficiently clearly. On the contrary, the authors assume that readers will understand the problem from the reference to the corpus used. However true this may be in most cases, it does not apply in general, and some readers would appreciate such information. Secondly, the methodology section is very dense, and the manuscript would benefit if some of this information were unpacked to give a clearer explanation of the procedural steps. Last but not least, the conclusion is rather disappointing, given that the success of the approach is rather limited. This disappointment might be softened if the authors flagged those limitations throughout the manuscript, so that readers are prepared in advance and do not set their expectations too high.

Notwithstanding these points, this is a relevant piece of research.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Dear authors,

Your article is interesting in its use of a simple "distortion-free method" to handle variable-length sequences for recurrent neural networks. It is important that there is no, or minimal, information distortion.

Author Response

Thank you for your opinion about our manuscript.

Reviewer 4 Report

This paper proposes a method to handle inputs with different lengths for recurrent neural networks in the context of text-dependent speaker verification. The proposed method is simple but effective. I find it interesting. I have only a few minor comments.

(1) In Section 4, I feel that the authors do not fully explain how embeddings are computed from the output sequences in practice. Some additional discussion would be helpful for readers.

(2) I assume Table 1 reports results for the method without self-attention, but this was unclear at first glance. An additional explanation is needed.

Author Response

Please see the attachment.

Round 2

Reviewer 1 Report

The authors have tried to improve the quality of the manuscript. Although they do not address my interest in, and concerns about, seeing the method work with other machine learning models, they have at least proposed this as future work.
