Article
Peer-Review Record

Multi-Modal Alignment of Visual Question Answering Based on Multi-Hop Attention Mechanism

Electronics 2022, 11(11), 1778; https://doi.org/10.3390/electronics11111778
by Qihao Xia 1, Chao Yu 1,2,3,*, Yinong Hou 1, Pingping Peng 1, Zhengqi Zheng 1,2 and Wen Chen 1,2,3
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 28 April 2022 / Revised: 27 May 2022 / Accepted: 30 May 2022 / Published: 3 June 2022
(This article belongs to the Collection Image and Video Analysis and Understanding)

Round 1

Reviewer 1 Report

Dear Editors,

 

I enjoyed reading your paper.

Still, there are some issues to deal with.

For instance:

  • English language and style issues - Grammarly (https://app.grammarly.com) on default settings, applied only to the text block formed by concatenating the Title, Abstract, and Keywords, detected 8 correctness issues (critical alerts) and 13 more advanced ones, namely: Unclear sentences (4), Word choice (3), Wordy sentences (2), Passive voice misuse (2), Misplaced words or phrases (1), and more (1). This corresponds to a total score of 56 out of a maximum of 100 for this sample alone. Moreover, since you do not appear to be native English speakers, I suggest a full revision of the English language and style of the entire article using Grammarly or another specialized tool;
  • The paper must follow the specific structure of the journal, namely:
    Author Information, Abstract, Keywords, Introduction, Materials & Methods, Results, Discussion, Conclusions, etc., as indicated at: https://www.mdpi.com/journal/electronics/instructions
  • You must avoid ending sections/subsections with equations/formulas, figures, or tables (e.g., Table 1 just before Subsection 5.4, Figure 6 just before the Conclusions);
  • You must ensure that all figures have the required resolution (minimum 1000 pixels width/height, or a resolution of 300 dpi or higher, according to the Journal’s instructions: https://www.mdpi.com/journal/electronics/instructions);
  • All references to equations/formulas must be explicitly and precisely formulated in the main text (e.g., “… main text (see Eq. 9) …”, “… main text (see Eq. 21) …”);
  • I think more contributions published in journal papers should be cited in this research, both in the Introduction and in the section dedicated to the interpretation of the results (I think that just 30 references, most of them in conference proceedings, are not enough);
  • When mentioning the use of the sigmoid function (after Equation 20), references to the relevant existing scientific literature should be provided (the standard form of the function is recalled after this list for reference);
  • I think the full CPU (including main memory) and GPU specifications (e.g., GPU name/code, number of CUDA cores if it is an NVIDIA card, GPU frequency, memory bus width in bits, graphics memory type, frequency, and amount) must be provided (at least in a footnote) to support replication of the results (only a generic reference is available at the moment, just before Subsection 5.3);
  • A model accuracy of ~0.65 or ~65% (Table 1 - consider Accuracy (%) instead of just Accuracy) is far from Fair (>=0.7 & <0.8), Good (>=0.8 & <0.9), or Excellent (>=0.9). I think this must be acknowledged and specified in the paper, together with two or more corresponding references to the scientific literature (where these thresholds are mentioned);
  • It seems that the authors provided more than a single conclusion in this paper. Therefore, the singular form (Conclusion instead of Conclusions) is not justified;
  • I think the conclusions should better highlight your contribution;
  • The Acknowledgment section must include thanks to the providers of the VQA2.0 dataset.
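
For reference, the logistic sigmoid referred to above is commonly defined as σ(x) = 1/(1 + e^(−x)), mapping any real input to the interval (0, 1); whether the manuscript uses exactly this form should be checked against Equation 20.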

 

Thank you for your contribution and for trying to make the world a better place!

 

Sincerely,

D.H.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper is generally good and well structured. I suggest some minor revisions to improve the overall quality:

-The fluency of the paper should be improved in some sentences, for example: "We concatenate adjacent words of every word to that word before multi-modal fusion which makes full use of the information around a word when doing self-attention." (a sketch of the construction this sentence describes is given after this list).

-In the related work section, recent applications of VQA should be mentioned to give the reader some context. I suggest https://doi.org/10.1016/j.patrec.2021.09.008, which is a short but exhaustive survey of recent applications of VQA.

-In Table 2, the authors should add reference numbers next to the models evaluated. The models should also be discussed in further detail, and the text should always refer to them by their reference numbers (instead of the row numbers in the table).

-Some newer references should be added. At the moment, the paper presents too few recent works in its bibliography.
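
As context for the sentence quoted in the first point, it appears to describe concatenating each word's neighbouring word features to its own feature vector before multi-modal fusion and self-attention. Below is a minimal sketch of such a construction, assuming word features of shape [seq_len, dim], a window of one neighbour on each side, and zero padding at sentence boundaries; the function name and these choices are illustrative assumptions, not the authors' implementation.

import torch

def concat_adjacent(words, k=1):
    # words: [seq_len, dim] word features.
    # Returns [seq_len, dim * (2*k + 1)]: each row is the word's own feature
    # with its k left/right neighbours concatenated (zero-padded at the edges).
    seq_len, dim = words.shape
    pad = torch.zeros(k, dim, dtype=words.dtype)
    padded = torch.cat([pad, words, pad], dim=0)
    windows = [padded[i:i + 2 * k + 1].reshape(-1) for i in range(seq_len)]
    return torch.stack(windows)

# Example: 5 words with 4-dimensional features -> each row becomes 12-dimensional
# before being passed to the subsequent self-attention and fusion layers.
x = torch.randn(5, 4)
print(concat_adjacent(x).shape)  # torch.Size([5, 12])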

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Dear Authors,

You have substantially improved your paper since the first round of review.

I think it is now close to being ready for publication.

Sincerely,

D.H.
