Article
Peer-Review Record

Audiovisual Biometric Network with Deep Feature Fusion for Identification and Text Prompted Verification

Algorithms 2023, 16(2), 66; https://doi.org/10.3390/a16020066
by Juan Carlos Atenco *,†, Juan Carlos Moreno and Juan Manuel Ramirez
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 18 November 2022 / Revised: 7 January 2023 / Accepted: 7 January 2023 / Published: 19 January 2023
(This article belongs to the Special Issue Recent Advances in Algorithms for Computer Vision Applications)

Round 1

Reviewer 1 Report

A method for fusing two biometric modalities is presented. Although the fusion of face and voice is an old and well-explored topic, the paper gives a good-quality description of the problem, methods, database and results. However, it has two crucial defects, both concerning the output of the system. The first is revealed in line 197: the number of output neurons corresponds to the number of persons in the database. This prevents any practical use of the developed system, since in the real world users should be added or removed without retraining the system. In addition, 45 users is quite a small number. The second problem is revealed in line 202: the authors train the system to recognize letters of the alphabet in the speech sound. This approach is flawed, since the letters of written text correspond only weakly to spoken sounds. Training the neural network to extract letters from spoken sounds wastes performance, both in computation and in precision. The alternatives are to recognize words directly (which requires a large neural network and extensive training) or to recognize phonemes. I think these two drawbacks are misleading and lead to the development of irrelevant things. Considering this, the paper cannot be accepted, although the design and presentation are good.
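The first concern (adding or removing users without retraining) is commonly addressed by comparing embeddings instead of using one output neuron per person. The following is a minimal sketch of that general idea, not the authors' method: the class `EmbeddingGallery`, its `threshold` value and the toy 3-dimensional vectors are all hypothetical, and the embeddings are assumed to come from some already-trained feature extractor.

```python
import numpy as np


class EmbeddingGallery:
    """Hypothetical open-set identifier that stores one template per user."""

    def __init__(self, threshold=0.7):
        self.threshold = threshold  # assumed cosine-similarity cutoff
        self.templates = {}

    @staticmethod
    def _normalize(v):
        # Scale a non-zero vector to unit length.
        v = np.asarray(v, dtype=float)
        return v / np.linalg.norm(v)

    def enroll(self, user_id, embeddings):
        # Average the user's unit-length embeddings into a single template.
        mean = np.mean([self._normalize(e) for e in embeddings], axis=0)
        self.templates[user_id] = self._normalize(mean)

    def remove(self, user_id):
        # Removing a user only deletes a template; no retraining is involved.
        self.templates.pop(user_id, None)

    def identify(self, embedding):
        # Return the best-matching user, or None if no score clears the threshold.
        if not self.templates:
            return None
        probe = self._normalize(embedding)
        scores = {u: float(t @ probe) for u, t in self.templates.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] >= self.threshold else None
```

In this sketch a probe whose best score falls below the threshold is rejected as unknown, which is one way an impostor could be handled; the threshold would have to be tuned on held-out data.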

Some typos:

1. Line 21: "a enrolled" -> "an enrolled"; 2. Line 48: "an unified" -> "a unified"; 3. Table 2: "Tahn" -> "Tanh".

Figure 4 displays the ROC curve on a linear scale. Contemporary face recognition systems have achieved such high performance that linear-scale ROC curves are all pushed to the upper-left corner of the graph and are indistinguishable. It is better to use a logarithmic scale and a DET curve for systems with an error rate below 5%.
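As an illustration of the DET suggestion, the sketch below computes false acceptance and false rejection rates from two score lists and maps them to the normal-deviate (probit) axes conventionally used for DET plots. The function names `det_points` and `probit` and the clipping value `eps` are assumptions made for this example; scores are assumed to be higher for genuine attempts.

```python
import numpy as np
from statistics import NormalDist


def det_points(genuine, impostor):
    """Return (thresholds, FAR, FRR), assuming higher scores mean 'more genuine'."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    # FAR: fraction of impostor scores accepted (score >= threshold).
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # FRR: fraction of genuine scores rejected (score < threshold).
    frr = np.array([(genuine < t).mean() for t in thresholds])
    return thresholds, far, frr


def probit(p, eps=1e-6):
    """Normal-deviate transform for DET axes; clips rates to avoid infinities."""
    nd = NormalDist()
    return np.array([nd.inv_cdf(min(max(x, eps), 1 - eps)) for x in p])
```

Plotting `probit(far)` against `probit(frr)` spreads out the low-error region that a linear ROC plot compresses into the corner.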

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper presents a bimodal multitask network to perform identification and text prompted verification. 

 

There are some important technical details that need to be clarified:

- The proposed network, as stated by the authors, has "a total population of 51 individuals, this set was divided into 45 target clients and 6 impostors". Hence, what is the output of the network for impostors?

- It is not clear whether the authors augmented the data before or after splitting it.

- It is not clear how the data were split and on which (and how many) data the system was evaluated.

- 1000 training epochs is a lot, especially considering the limited amount of data available. The authors should report the training loss and accuracy curves.
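The question about augmentation order matters because augmenting before the split can place variants of the same recording in both the training and the evaluation sets. A minimal sketch of the leak-free order, assuming a hypothetical `augment` callable that returns a list of variants for one sample (this is not taken from the paper):

```python
import random


def split_then_augment(samples, augment, test_fraction=0.2, seed=0):
    """Split the original samples first, then augment only the training part."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    test, train = shuffled[:n_test], shuffled[n_test:]
    # Augmented variants are derived from training samples only,
    # so no variant of a test sample can reach the training set.
    augmented = [variant for s in train for variant in augment(s)]
    return train + augmented, test
```

With this order the evaluation set contains only original, unaugmented samples, and the reported sizes of both sets are unambiguous.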

 

Minor issues:

- The authors should also highlight the limitations of their approach. For example, identifying new users requires training the network from scratch.

- A proofread is needed to correct errors and typos.

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have explained the two items that caused my concern in their response to the comments. However, I think they should convey these explanations in the text of the article, not only to me, and in particular cite the article "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin" (and maybe others), which validates the design of the research.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors addressed the raised issues.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
