Bi-LS-AttM: A Bidirectional LSTM and Attention Mechanism Model for Improving Image Captioning
Round 1
Reviewer 1 Report
More comparisons with other methods are needed.
This is a strong paper, with a comprehensive outline of the methodology and a thorough review. The authors' methods are sound. My one criticism is that the comparative evaluation is a bit weak. While the authors show that their method compares favorably to two other methods, more evaluation of the results would strengthen the paper, as would comparison with additional methods. It is understandable that in NLP the outputs (here, the captions) can be somewhat subjective, but the authors could still do more to show, in a qualitative fashion, how their method is better.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
This paper proposes a bidirectional LSTM for image captioning. The idea is interesting. I only have the following concerns.
1) Table 6 shows the computational cost of the proposed model and other baselines. However, I found the difference to be marginal, so there is no clear advantage over these works.
2) Missing recent image captioning works. Image captioning is a hot topic, and there are many improved LSTM structures for image captioning. The authors should cite and discuss more recent works that use LSTM for image captioning, such as Switchable Novel Object Captioner, TPAMI 2023.
3) It would be better to compare with more recent state-of-the-art works in the experiments.
The English needs improvement.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 3 Report
The authors have proposed "Bi-LS-AttM: a Bidirectional LSTM and Attention Mechanism Model for Improving Image Captioning"; however, I have the following comments:
- The authors have not mentioned the datasets used to test their model.
- The number of classes in the datasets is not mentioned in the abstract either.
- The literature review is very limited and does not sufficiently cover the area. Many recent references are missing. The authors should include the latest literature in the manuscript and highlight their contribution. Some relevant studies are as follows:
- Generating Image Captions Using Bahdanau Attention Mechanism and Transfer Learning; Image captioning model using attention and object features to mimic human image understanding; A deep learning-based image captioning method to automatically generate comprehensive explanations of bridge damage.
- Section 4.1: The authors have not mentioned the number of classes in either dataset.
- The authors have shown examples from the proposed model only, in Figure 7. They must also provide examples from the existing models used for comparison.
- This will help readers understand and visualize the difference between the proposed approach and the existing ones.
Minor editing of English language required
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 3 Report
The authors have responded to all the comments.
Minor editing of the English language required