Peer-Review Record

Structure Preserving Convolutional Attention for Image Captioning

Appl. Sci. 2019, 9(14), 2888; https://doi.org/10.3390/app9142888
by Shichen Lu 1,2,†,‡, Ruimin Hu 1,2,*,‡, Jing Liu 3, Longteng Guo 3 and Fei Zheng 4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 11 June 2019 / Revised: 16 July 2019 / Accepted: 16 July 2019 / Published: 19 July 2019
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

This paper introduces an architecture for automatic image captioning. The introduction is clearly written and adequate. The rest of the paper would benefit from a revision for grammar and typos (it feels as though the introduction and the rest were written by different people). The references are adequate, the experimentation is appropriately set up, and the results seem correct. However, the authors should provide a more extended discussion of some of the choices that have been made, e.g.,

- Why is ResNet-101 used as the encoder? I agree that it is a reasonable choice, but the authors should explicitly provide the arguments for such a choice.

- Also, the cross-channel attention method looks like a simple 2D convolution across the feature-map channels. Is there any other novelty related to it?
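To make the question concrete: the reviewer's reading of cross-channel attention is roughly the following sketch, in which a single 2D convolution mixes all encoder channels at each spatial position to produce one attention logit per position, normalized with a softmax. This is a minimal illustration of that reading, not the authors' implementation; the function name, kernel shape, and feature dimensions (e.g. a ResNet-101 conv5 map of 2048 × 7 × 7) are assumptions.

```python
import numpy as np

def conv_attention(features, kernel):
    """Cross-channel attention read as one 2D convolution: the kernel
    mixes all C input channels at each spatial location, yielding a
    single attention logit per position (C x H x W -> H x W)."""
    C, H, W = features.shape
    kC, kH, kW = kernel.shape
    assert kC == C, "kernel must span all feature channels"
    pad = kH // 2
    # zero-pad spatially so the output keeps the H x W grid
    padded = np.pad(features, ((0, 0), (pad, pad), (pad, pad)))
    logits = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            patch = padded[:, i:i + kH, j:j + kW]
            logits[i, j] = np.sum(patch * kernel)
    # softmax over all spatial positions -> attention map summing to 1
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
feats = rng.standard_normal((2048, 7, 7))    # assumed ResNet-101 conv5 output
kern = rng.standard_normal((2048, 3, 3)) * 0.01
att = conv_attention(feats, kern)
print(att.shape, float(att.sum()))
```

If the proposed module is more than this (e.g. structure-preserving constraints beyond the convolution itself), the paper should state explicitly where the novelty lies.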

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

In this paper a convolutional attention module that can preserve the spatial structure of an image is presented. The paper includes an appropriate state of the art, which is also used for comparison with the proposed methods.

The level of English must be improved. Authors whose primary language is not English are advised to seek help in preparing the paper.

It would be interesting to include the computation time required per method. In addition, better qualitative examples should be given in order to compare soft attention and the proposed attention map, since the current ones seem quite subjective (for instance, according to the reviewer, it is clear that a man is throwing a ball in both the soft-attention method and the proposed method).

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
