A Study of Weather-Image Classification Combining VIT and a Dual Enhanced-Attention Module
Round 1
Reviewer 1 Report
The authors proposed combining a Visual Interaction Transformer (VIT) with a Dual Enhanced Attention Module for weather image classification. The idea is to take advantage of the following:
1. VIT captures global and local dependencies among image regions and is designed to handle geometric relationships in the image better.
2. The Dual Enhanced Attention Module strengthens visual attention by applying a dual attention mechanism (a minimal illustrative sketch of such a combination is given after this list).
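For concreteness, a minimal sketch of a ViT-plus-dual-attention classifier of this kind is given below. It is only an illustration under assumptions: the patch size, embedding width, encoder depth, and the exact form of the dual attention (a channel branch plus a token branch) are placeholders, not the authors' actual implementation.

# Illustrative sketch only; the architecture details below are assumptions,
# not the method described in the manuscript.
import torch
import torch.nn as nn


class DualEnhancedAttention(nn.Module):
    """Hypothetical dual attention over ViT tokens: a channel branch re-weights
    embedding dimensions, a token branch re-weights patch positions."""

    def __init__(self, dim: int, reduction: int = 16):
        super().__init__()
        self.channel = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )
        self.token = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:  # tokens: (B, N, D)
        tokens = tokens * self.channel(tokens.mean(dim=1)).unsqueeze(1)  # channel weights
        return tokens * self.token(tokens)                               # per-token weights


class WeatherViT(nn.Module):
    """Minimal ViT-style classifier with a dual attention block on the tokens."""

    def __init__(self, num_classes: int, img_size: int = 224, patch: int = 16,
                 dim: int = 256, depth: int = 4, heads: int = 8):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.dual_attention = DualEnhancedAttention(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.patch_embed(x).flatten(2).transpose(1, 2)    # (B, N, D) patch tokens
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        tokens = torch.cat([cls, x], dim=1) + self.pos_embed  # prepend [CLS], add positions
        tokens = self.encoder(tokens)                         # global/local dependencies
        tokens = self.dual_attention(tokens)                  # dual enhanced attention
        return self.head(tokens[:, 0])                        # classify from the [CLS] token


# Example usage; 4 classes is a placeholder, set it to the dataset's class count.
model = WeatherViT(num_classes=4)
logits = model(torch.randn(2, 3, 224, 224))                   # -> shape (2, 4)

In this sketch the "dual" attention is read as channel re-weighting followed by token (spatial) re-weighting; the authors' actual module may combine its two branches differently.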
Experiments are conducted on public weather image datasets MWD (Multi-class Weather Database) and WEAPD (Weather Phenomenon Database). The idea is interesting and promising. However, I have significant concerns regarding the presented study:
- What makes the proposed study different from other recent works such as:
"VIT-DEAM: A Visual Interaction Transformer with Dual Enhanced Attention Module for Image Classification" (2021)
"Combining Visual Interaction Transformer and Dual Enhanced Attention Module for Image Classification" (2021)
I believe the authors should clearly highlight their contribution and what makes it novel with respect to existing related research.
- The overall approach is not well explained, and the main contribution should be detailed more thoroughly.
- Another important point concerns the experiments. Why did the authors stop their experiments after 8 epochs? Simpler ML or DL techniques might reach good results at a later stage; in that case, what is the need for a very advanced technique such as transformers, especially since their implementation cost could be high?
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 2 Report
Acronyms should be explained at their first use; I see some acronyms used before, or without, an explanation of their meaning, such as VIT, F1, ..
I recommend a review of the English grammar and punctuation.
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 3 Report
The figure captions state only the obvious. Making them more complete and comprehensive would be very useful, since readers are often interested only in the figures and their captions, without necessarily giving full attention to the main text. What one is really curious about when looking at a figure is its main message: the most interesting feature one should look out for, not just what is presented.
The authors need to analyze the potential reasons why the proposed method achieved better results than state-of-the-art approaches.
The literature review is not sufficient; more recent research works should be discussed.
Transfer learning is a popular way to generate image representations, and has been used in medical image analysis. Please discuss the methods in 'CGENet: A Deep Graph Model for COVID-19 Detection Based on Chest CT' and 'Detection of abnormal brain in MRI via improved AlexNet and ELM optimized by chaotic bat algorithm'.
Please provide some drawbacks of your method and some future research directions in the Conclusion.
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
The paragraph (lines 40-53) is repeated at the end of the Introduction and at the end of the Related Works section (lines 129-142).
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 3 Report
This revised paper can be accepted.
Author Response
Please see the attachment
Author Response File: Author Response.pdf