Peer-Review Record

Adversarial Robustness Enhancement of UAV-Oriented Automatic Image Recognition Based on Deep Ensemble Models

by Zihao Lu, Hao Sun * and Yanjie Xu
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Remote Sens. 2023, 15(12), 3007; https://doi.org/10.3390/rs15123007
Submission received: 27 April 2023 / Revised: 31 May 2023 / Accepted: 7 June 2023 / Published: 8 June 2023

Round 1

Reviewer 1 Report

A deep ensemble model is proposed for automatic image recognition. It is interesting but not complete enough for publication. Specific comments are listed as follows:

(1)    There are some spelling mistakes, and the writing style can be further improved; these small errors need to be corrected before the paper is published.

(2)    The abstract is not condensed enough. Some unnecessary descriptions need to be removed so that readers can quickly understand the content of the article.

(3)    Some figures are not clear and do not allow the reader to quickly follow the process of the proposed method.

(4)    The description of the method is not comprehensive enough, and it is difficult for readers to grasp it quickly.

(5)    The authors tried to summarize the recent advances. However, there are still some relevant works on the application of deep learning technology in SAR. It is recommended to add these works to the references, such as:

https://www.doi.org/10.1109/LGRS.2018.2865608

https://www.doi.org/10.1109/TGRS.2023.3248040

https://www.doi.org/10.1109/TIP.2018.2863046

There are some spelling mistakes, and the writing style can be further improved.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The overall paper is interesting, but I feel that the quality can be improved. First of all, please specify each time which metric is used in your results; check the tables and graphs. Also, please use more comprehensive metrics: accuracy alone can be very misleading.
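To illustrate what I mean (a minimal sketch using scikit-learn with placeholder labels, not the authors' code), per-class precision, recall, F1, and a confusion matrix would give a much fuller picture than accuracy alone:

```python
# Illustrative only: metrics beyond accuracy for a multi-class recognition task.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = [0, 0, 1, 1, 2, 2]   # placeholder ground-truth labels
y_pred = [0, 1, 1, 1, 2, 0]   # placeholder model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1 score :", f1_score(y_true, y_pred, average="macro"))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```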

Did you produce any usable models that can run in real time on a drone? What would be the effect on the drone's battery if you were to deploy such a solution?

Please add the complexity and latency values expected in real time (if possible).
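For reference, latency could be measured along these lines (a minimal sketch assuming a PyTorch model; ResNet-18 here is only a stand-in for the deployed network, and on-device numbers would need to be taken on the actual UAV hardware):

```python
# Illustrative only: mean single-image inference latency on CPU.
import time
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()   # stand-in for the deployed model
x = torch.randn(1, 3, 224, 224)                # one image-sized input

with torch.no_grad():
    for _ in range(10):                        # warm-up runs
        model(x)
    runs = 100
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    latency_ms = (time.perf_counter() - start) / runs * 1000

print(f"Mean CPU latency: {latency_ms:.2f} ms per image")
```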


The overall English language is adequate, but several issues are still present: grammar and syntax faults, some missing articles, and some duplicated words. Please have a third party analyze the paper, preferably someone who has an extensive knowledge of the English language.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

The manuscript presents a defensive ensemble framework of ResNet and ViT to enhance detection and recognition algorithms on UAVs, particularly against adversarial attacks. The proposed method is utilised to execute proactive and reactive defences for optical and SAR images and is compared against the relevant state of the art.

The manuscript is very well structured and provides a sound solution in terms of decision fusion and feature fusion, despite the authors avoiding the use of the term “fusion” and opting for “integration”.

The manuscript is missing line numbers and has several incomplete fields. It does not seem to be properly fitted to the journal’s template.

Overall, it is systematically written with a solid experimental procedure. It is felt that a little more analysis of the limitations of the proposition should be included. The open-source APER-RSI tool is a welcome idea, and it is great to see it available on GitHub.

Adjusting each model’s confidence seems a reasonable way to integrate different opinions, but this needs more clarification. How are the models’ weights decided? What prioritises the decision of one model over another? In the experiments, both models are weighted equally.
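For concreteness, this is how I read the decision fusion (a hypothetical sketch with placeholder tensors and weights, not the authors' implementation); the open question is how the weights should be chosen when the two models are not equally trustworthy under a given attack:

```python
# Illustrative only: weighted decision fusion of two classifiers' confidences.
import torch
import torch.nn.functional as F

logits_resnet = torch.randn(4, 10)   # placeholder logits: batch of 4, 10 classes
logits_vit    = torch.randn(4, 10)

w_resnet, w_vit = 0.5, 0.5           # equal weights, as in the experiments

fused = (w_resnet * F.softmax(logits_resnet, dim=1)
         + w_vit  * F.softmax(logits_vit, dim=1))
pred = fused.argmax(dim=1)           # ensemble decision
```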

There are various ways of fusing predictive models, and the proactive approach mainly implements decision fusion. When would it be feasible to have the fusion at the feature level instead of the decision level? The authors elaborate on the different ways ResNet and ViT create features in a previous section. It would be useful if the fusion choice were justified here, considering that argument and the fact that the reactive defence ensemble fuses the features in the next section.
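By contrast, feature-level fusion would look roughly like this (again a hypothetical sketch with placeholder feature shapes, not the authors' implementation):

```python
# Illustrative only: concatenating backbone features before a shared classifier,
# in contrast to the decision-level fusion sketched above.
import torch
import torch.nn as nn

feat_resnet = torch.randn(4, 512)    # placeholder ResNet-18 global features
feat_vit    = torch.randn(4, 768)    # placeholder ViT-Base/16 [CLS] features

head = nn.Linear(512 + 768, 10)      # shared classifier over the fused features
pred = head(torch.cat([feat_resnet, feat_vit], dim=1)).argmax(dim=1)
```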

Some English corrections and edits are also provided with the review.

Good overall with no major issues. There are, however, many minor corrections required. Authors are advised to kindly consider the following:

The article before “one-stop” should be “a”, not “an”. Please revise the whole manuscript for this.

Page 1: “objective detection” >> “object detection”

Page 2: “an attack that designed for...” >> “an attack that is designed for…” or “an attack designed for…”

Figures: please avoid using red text for accessibility.

Page 8: “2n” >> “2N”

Page 8: “Let an vector…” >> “Let a vector…”

Figure 4: is the caption missing the ViT model name for this to be an ensemble?

Equation (3): is there a reason the summation limits are unconventionally written as an equality below the symbol instead of j=1 below and 2N above? (A sketch of the conventional notation is given after this list of corrections.)

Page 9: “to correctly…” >> “to be correctly…”

Page 9: “comparison experiments with and…” with what? please revise sentence.

It would be useful if Figure 5 were moved to the top of the page.

Page 10: “to perfectly suited…” >> “to be perfectly suited…”

Page 10: “developed called…” >> missing conjunction “and”?

Page 10: “grapthic” >> “graphics”

Page 12: “The dataset are…” >> “The dataset is…”

Page 14: “training&testing” >> “training and testing”

Figures 13 and 14 captions: “above” instead of “below”?

Page 16: “which makes DNN models hard to learn…” >> “which makes it hard for the DNN model to learn…”

Page 17: “For adversarial examples generated against ViT-Base/16, similar with the discussions in ResNet-18, stably correct recognition in optical datasets.” Kindly revise sentence.

Page 17: “performances are pretty good”, consider a more scientific way of describing performance.

Page 18: “which causes that detectors cannot extract representative…” Kindly revise sentence. Consider “which inhibits the detector from extracting representative features...”

Page 18: “single detectors perform not well…” >> “single detectors not performing well…”

Page 18: “there also exist not good performances…” >> consider rephrasing the sentence: “Even though the proposed model does not perform as well on some attacks of the SAR datasets, this ensemble strategy…”

Page 18: “experimented with more types of attacks…” >> “experimented with on more types of attacks...”
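Regarding the Equation (3) point above, the conventional form would be as follows (a minimal sketch; the summand p_j is only a placeholder for whatever quantity the authors sum over the 2N ensemble outputs):

```latex
% Conventional summation limits: j = 1 below the symbol, 2N above.
\sum_{j=1}^{2N} p_j
```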

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 4 Report

The paper proposed an ensemble method of CNN and ViT for both proactive and reactive adversarial defense in resource-limited environments, particularly on UAV platforms. The authors also developed a platform, called AREP-RSIs, for conducting adversarial defense and evaluating the adversarial robustness of DNN-based visual systems on UAVs. Experimental results on remote sensing recognition tasks using both optical and SAR datasets look promising.

One of the arguments the authors made in section "3.1 Motives of Ensemble", particularly in subsection "3.1.3. Defects in AT for Edge Environment" is that "...More importantly, the extra cost (e.g., latency, memory space, battery energy) of re-training new robust models under new attacks is another defect for adversarial training when deploying the robust models for edge environment. Therefore, it is worthwhile to investigate ensemble of base models..." 

I think that this argument in favour of ensemble models is moot, because ensemble models add extra cost (e.g., latency, memory space, battery energy) to edge devices just like AT does.

The authors did not report the extra cost (e.g., latency, memory space, battery energy) of the ensemble models. The computation and memory footprints of the ensemble models were not reported. 

Just as the authors reported the "transferability test" in Fig. 2 and Fig. 3 to support their argument in subsection "3.1.2. Weak Adversarial Transferability", a similar experiment should be made to justify the claim made in subsection "3.1.3. Defects in AT for Edge Environment". 

Or I humbly suggest that the authors remove the argument made in subsection "3.1.3. Defects in AT for Edge Environment" and report only the computation and memory footprints of the ensemble models. Readers would be interested to know these costs (e.g., model size, latency, memory space, FLOPs, etc.).
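As a starting point, parameter counts and raw weight sizes are straightforward to report (a minimal sketch assuming PyTorch/torchvision models, not the authors' code; latency and FLOPs would additionally need device-specific measurement on the UAV hardware):

```python
# Illustrative only: parameter count and weight-memory footprint of each base
# model; an ensemble of the two carries roughly the sum of both footprints.
import torchvision.models as models

def footprint(model, name):
    params = sum(p.numel() for p in model.parameters())
    mbytes = sum(p.numel() * p.element_size() for p in model.parameters()) / 1e6
    print(f"{name}: {params / 1e6:.1f} M parameters, ~{mbytes:.1f} MB of weights")

footprint(models.resnet18(weights=None), "ResNet-18")
footprint(models.vit_b_16(weights=None), "ViT-Base/16")
```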


Minor issues:

The best results in Table 1 should be bolded, as done in Tables 3-5.

The authors should check some minor English expressions.

E.g., in the Abstract and in Section 3.3, 1st line, "...[an] one-stop platform..." -> "...[a] one-stop platform..."

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors have revised the manuscript according to the comments of all reviewers. I have no other questions.

Reviewer 2 Report

The authors have improved the overall quality of the paper, taking into consideration the limited amount of time that they had. Also, they have provided sound arguments for all of my requests. Thus, I consider the paper suitable for publication.
