Review
Peer-Review Record

Robustness of Deep Learning Models for Vision Tasks

Appl. Sci. 2023, 13(7), 4422; https://doi.org/10.3390/app13074422
by Youngseok Lee 1 and Jongweon Kim 2,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 9 February 2023 / Revised: 10 March 2023 / Accepted: 19 March 2023 / Published: 30 March 2023
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

Round 1

Reviewer 1 Report

This manuscript introduces various deep-learning models based on convolutional neural networks (CNNs), techniques for adversarial attacks, and defences against these attacks. The paper also presents brain-inspired deep-learning models that mimic the visual cortex, which is resistant to adversarial attacks.

I recommend rewriting the abstract. In the abstract, a reader should be informed about the main problem this study addresses, the objectives, the methods used, and the main results. From the current abstract, we cannot conclude what the main findings of this study are.

In the introduction in the paragraph starting in Line 39, the authors present recent literature in the neural network field. However, most of the listed studies were not newly published. I suggest adding more recent studies from the area. 

I recommend authors explain this manuscript's novelty and the main contribution of this work to the existing literature in the introduction. 

In section 2.2, the authors wrote the following: "In this context, we focus on three of the most important types of deep-learning models concerning applicability to vision tasks, CNNs, the "Boltzmann family" including deep belief networks (DBNs) and deep Boltzmann machines (DBMs), and Autoencoders and Sparse coding." – so, first, if I counted correctly, there are four types? In Figure 2, we can also see four types of deep learning models. Figure 2 is not explained in the text. That said, the sentence where the authors refer to Figure 2 should probably refer to Figure 3. Please check the sentence in Line 119. In section 2.2, I am also missing an explanation of why the authors chose to focus on the types of deep learning models mentioned above.

Figure 16 was not explained in the text. The same applies to Figure 27. In the manuscript, many figures need a better explanation to provide the reader with enough information to understand what is presented in the figure.

In Table 1, there are some results of the adversarial attack. Still, according to the description of Table 1, it is not clear whether these results were obtained from experiments conducted by the authors of this manuscript. The same applies to Table 2. If the presented data are the results of another study, this should be explained in the text as well as in the caption of the table.

Because of typos identified while reading the manuscript, I recommend proofreading the manuscript. Some examples of typos:

Line 177: Sentence starts with "the network's ability is limited to reconstructing only" – should begin with a capital letter. 

Line 203: "the Here, the green color represents" – the word "the" can be deleted.

Line 209: "Such variations the include novel blocks, parameter optimization, and structural reformulation." – the word "the" should be deleted.

Section "2. Adversarial Attacks and Defenses" is the third section and not the second.

 

Author Response

Responses to the Associate Editor’s and Reviewers’ Comments

10 March 2023

Dear reviewers and editorial staff of Applied Sciences,

 

We are sincerely grateful for your thorough consideration and scrutiny of our manuscript, “Robustness of Deep-Learning Models for Vision Tasks”, Manuscript ID: applsci-2241744. Through the accurate comments made by the reviewers, we better understand the critical issues in this paper. We have revised the manuscript according to the reviewers’ suggestions, and we hope that our revised manuscript will be considered and accepted for publication in Applied Sciences. We acknowledge that the scientific quality of our manuscript was improved by the scrutinizing efforts of the reviewers and editors.

 

 

Reviewer #1 :

<GENERAL COMMENTS>

1) Reviewer’s comment: I recommend rewriting the abstract. In the abstract, a reader should get informed about the main problem this study addresses, objectives, methods used and main results. From the current abstract, we can not conclude what the main findings of this study are.

Author’s response: We appreciate the reviewer’s comment. The abstract has been rewritten according to the reviewer's comments. In addition, the revised abstract presents the robustness of deep-learning models against adversarial attacks as the central issue and introduces state-of-the-art adversarial attack and defense technologies. Finally, we mention in the abstract that deep-learning models based on new physiological research are expected to emerge.

The revised abstract section is shown below: 

 In recent years, artificial intelligence technologies in vision tasks have gradually begun to be applied to the physical world, proving they are vulnerable to adversarial attacks. Thus, the importance of improving robustness against adversarial attacks has emerged as an urgent issue in vision tasks. This article aims to provide a historical summary of the evolution of adversarial attacks and defense methods on CNN-based models and also introduces studies focusing on brain-inspired models that mimic the visual cortex, which is resistant to adversarial attacks. As CNN models originated from the application of physiological findings about the visual cortex available at the time, new physiological studies of the visual cortex provide an opportunity to create models that are more robust against adversarial attacks. The authors hope this review will promote interest and progress in artificial-intelligence security by improving the robustness of deep-learning models for vision tasks.

2) Reviewer’s comment: In the introduction in the paragraph starting in Line 39, the authors present recent literature in the neural network field. However, most of the listed studies were not newly published. I suggest adding more recent studies from the area. 

Author’s response: We appreciate the reviewer’s comment. We admit that we had not cited the most recent papers. The revised paper cites 11 recent papers published from 2021 to 2022 at the end of the sentence running from line 39 to line 43. The revised paragraph is shown below:

Because adversarial attacks based on adversarial examples can be a fatal weakness, particularly in vision tasks, such as autonomous driving, a defense technology for such attacks has been proposed, along with a new adversarial attack that overcomes the defense technology [23–33].

 

3) Reviewer’s comment: I recommend authors explain this manuscript's novelty and the main contribution of this work to the existing literature in the introduction.

Author’s response: We appreciate the reviewer’s comment. We wholly agree with the reviewer’s opinion. In the revised paper, we describe our main contribution in lines 52–55. The revised paragraph is shown below:

The main contribution of our review is the introduction of robust deep-learning models based on physiological observations and discoveries in the visual cortex of mammals, in addition to the adversarial attack and defense techniques covered by existing reviews.

4) Reviewer’s comment: In section 2.2, the authors wrote the following: "In this context, we focus on three of the most important types of deep-learning models concerning applicability to vision tasks, CNNs, the "Boltzmann family" including deep belief networks (DBNs) and deep Boltzmann machines (DBMs), and Autoencoders and Sparse coding." – so, first, if I counted correctly, there are four types? In Figure 2, we can also see four types of deep learning models. Figure 2 is not explained in the text. That said, the sentence where the authors refer to Figure 2 should probably refer to Figure 3. Please check the sentence in Line 119. In section 2.2, I am missing an explanation of why authors chose to focus on the types mentioned above of deep learning models. 

Author’s response: We appreciate the reviewer’s comment. We wholly agree with the reviewer’s opinion. In the revised paper, "three types" was corrected to "four types"; this was an obvious mistake on our part. Likewise, the reference to Figure 2 on line 119 should have been to Figure 3, and this has also been corrected. The sentence beginning "In this context, we focus on the three types~" has been revised (lines 107–110) to clarify its meaning. The revised paragraph is shown below:

Regarding vision tasks, deep-learning models can be classified into four broad categories: CNNs; the “Boltzmann family,” including deep belief networks (DBNs) and deep Boltzmann machines (DBMs); autoencoders; and sparse coding, as shown in Figure 2.

In addition, the contents of Figure 2 are described from line 120 for CNNs, line 139 for RBMs, line 168 for autoencoders, and line 186 for sparse coding.

5) Reviewer’s comment: Figure 16 was not explained in the text. The same applies to Figure 27. In the manuscript, many figures need a better explanation to provide the reader with enough information to understand what is presented in the figure.

Author’s response: We appreciate the reviewer’s comment. We wholly agree with the reviewer’s opinion. In the revised paper, we describe Figure 16 in lines 463–467. The revised paragraph for Figure 16 is shown below:

For example, poisoning attacks [108,110] change the probability distribution of the original training data by injecting malicious data into the training data during the training stage to reduce the prediction accuracy of the model, as shown in Figure 16 [114].
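For readers unfamiliar with this attack class, a label-flipping flavor of data poisoning can be sketched in a few lines. This is a toy illustration with invented one-dimensional data, not the setup evaluated in [114]:

```python
# Toy nearest-centroid classifier: poisoning shifts a class centroid
# by injecting mislabeled points into the training data.

def centroid(points):
    return sum(points) / len(points)

def train(data):
    # data maps label -> list of 1-D feature values
    return {label: centroid(vals) for label, vals in data.items()}

def classify(model, x):
    return min(model, key=lambda label: abs(x - model[label]))

def accuracy(model, test):
    return sum(classify(model, x) == y for x, y in test) / len(test)

clean = {"A": [0.0, 0.2, 0.4], "B": [2.0, 2.2, 2.4]}
test = [(0.1, "A"), (0.3, "A"), (2.1, "B"), (2.3, "B")]
clean_model = train(clean)        # classifies the whole test set correctly

# Poisoning: inject points deep inside class A's region under label "B",
# dragging B's centroid across the decision boundary.
poisoned = {"A": clean["A"], "B": clean["B"] + [-4.0, -4.0, -4.0]}
poisoned_model = train(poisoned)  # test accuracy drops from 1.0 to 0.5
```

The poisoned centroid for "B" moves from 2.2 to -0.9, so every genuine "B" test point is now misclassified, mirroring the accuracy degradation the paragraph describes.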

We also describe Figure 27 in lines 886–889. The revised paragraph for Figure 27 is shown below:

A deep-learning model was trained as a wide deep convolutional neural network (WDCNN), and artificial test samples generated by this model were then transferred to target neural network models as shown in Figure 27.

Finally, we have rewritten almost all of the figure captions in the revised paper to make them clearer.

6) Reviewer’s comment: In Table 1, there are some results of the adversarial attack. Still, according to the description of Table 1, it is not clear whether these results were obtained from experiments conducted by the authors of this manuscript. The same applies to Table 2. If the presented data are the results of another study, this should be explained in the text as well as in the caption of the table.

Author’s response: We appreciate the reviewer’s comment. We wholly agree with the reviewer’s opinion. In the revised paper, we added references to the captions of Table 1 and Table 2 to clarify the source of the results. The revised captions for Table 1 (line 940) and Table 2 (lines 962–963) are shown below, respectively:

Table 1. Adversarial-attack results by integrated-contour model in [153].

Table 2. The results of adversarial tests for a two-hidden-layer artificial neural network trained by the GLR with different activation and loss functions in [163].

The descriptions of Tables 1 and 2 appear in lines 935–939 and lines 965–970, respectively, as follows.

As shown in Table 1, regardless of the number of pixels for perturbation, the attack success rate was reduced compared with the results of AlexNet. In addition, the author claimed that applying the same contour-integration layer to the pretrained MobileNet model yielded an accuracy improvement of 67% and a low attack success rate of approximately 13%.

 

As shown in Table 2, the accuracy reached 96% when the backpropagation was applied to the benign examples. However, when it was applied to adversarial examples generated via the L_BFGS method and FGSM for the same model, the accuracies decreased to 57% and 28%, respectively. For the model trained using the proposed method, the accuracies of the adversarial examples were 78% and 53% for L_BFGS and the FGSM, respectively, indicating a significant accuracy improvement.
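As context for the FGSM figures above, the attack itself is a single gradient-sign step, x_adv = x + eps * sign(grad_x L(x, y)). A minimal sketch on a toy logistic-regression model follows; all weights and inputs are invented for illustration and are not values from the paper:

```python
import math

# Hand-picked logistic-regression parameters; purely illustrative values.
W, B = [2.0, -3.0], 0.5

def predict_prob(x):
    # P(class 1 | x) under a logistic-regression model
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(x, y):
    # For logistic regression, d(cross-entropy)/dx_i = (p - y) * w_i
    p = predict_prob(x)
    return [(p - y) * w for w in W]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, eps):
    # x_adv = x + eps * sign(grad_x Loss(x, y))
    g = input_gradient(x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

x, y = [1.0, 0.2], 1
x_adv = fgsm(x, y, eps=0.5)
# predict_prob(x) is about 0.87 (correctly class 1);
# predict_prob(x_adv) is about 0.35, so the prediction flips to class 0.
```

The single signed step is enough to push the example across the decision boundary, which is why FGSM adversarial examples cause the sharp accuracy drops reported in Table 2.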

 

7) Reviewer’s comment: Because of typos identified while reading the manuscript, I recommend proofreading the manuscript. Some examples of typos:

Author’s response: We appreciate the reviewer’s comment. We wholly agree with the reviewer’s opinion. This is clearly our fault and a weakness of authors who are not native English speakers. We commissioned a company specializing in manuscript editing to revise the paper and incorporated their corrections. We will send a confirmation of the editing on file.

 

8) Reviewer’s comment: Line 177: Sentence starts with "the network's ability is limited to reconstructing only" – should begin with a capital letter. 

Author’s response: We appreciate the reviewer’s comment. We wholly agree with the reviewer’s opinion. This is clearly our fault and a weakness of authors who are not native English speakers. We corrected the typo pointed out. The revised paragraph (line 182) is shown below:

The network’s ability is limited to reconstructing only the average of the training data, but a solution was proposed in [9] through the use of a pretraining method that provides the network with initial weights closer to the final solution.

 

9) Reviewer’s comment: Line 203: "the Here, the green color represents" – the word "the" can be deleted.

Author’s response: We appreciate the reviewer’s comment. We wholly agree with the reviewer’s opinion. This is clearly our fault and a weakness of authors who are not native English speakers. We corrected the typo pointed out. The revised paragraph (lines 236–237) is shown below:

Here, the blue color represents the 2 × 2 kernels, and the green color represents the input region for taking the dot product in the input image.

 

10) Reviewer’s comment: Line 209: "Such variations the include novel blocks, parameter optimization, and structural reformulation." – the word "the" should be deleted.

Author’s response: We appreciate the reviewer’s comment. We wholly agree with the reviewer’s opinion. This is clearly our fault and a weakness of authors who are not native English speakers. We corrected the typo pointed out. The revised paragraph (lines 317–318) is shown below:

Such variations include novel blocks, parameter optimization, and structural reformulation.

 

11) Reviewer’s comment: Section "2. Adversarial Attacks and Defenses" is the third section and not the second.

Author’s response: We appreciate the reviewer’s comment. We wholly agree with the reviewer’s opinion. This is clearly our fault. We corrected the section numbering pointed out. The revised heading (line 458) is shown below:

3. Adversarial Attacks and Defenses

Author Response File: Author Response.pdf

Reviewer 2 Report

The article is very interesting, as it shows that recent studies on deep-learning models for vision tasks are vulnerable to adversarial examples. The authors present a review of various deep-learning models based on convolutional neural networks (CNNs), techniques for adversarial attacks, and defenses against these attacks. They also show brain-inspired deep-learning models that mimic the visual cortex, which is resistant to adversarial attacks. However, I have some observations that could enrich this work.

 

1.- Figure 1 cannot be read clearly; it is necessary to increase the font size and improve the quality of the figure.

 

2.- Figure 3 cannot be read clearly; it is necessary to increase the font size and improve the quality of the figure.

 

3.- Figure 7 cannot be read clearly; it is necessary to increase the font size and improve the quality of the figure.

 

4.- Align the numbering on page 7. For example, "1. Convolutional layer" is placed to the right, while "2. Pooling layer" and the other numbered items are aligned to the left; please organize that part.

 

5.- Figure 8 cannot be read clearly; it is necessary to increase the font size and improve the quality of the figure.

 

6.- Why are the texts highlighted in yellow in Figure 10?

 

7.- Figure 11 cannot be read clearly; it is necessary to increase the font size and improve the quality of the figure.

 

8.- Figure 12 cannot be read clearly; it is necessary to increase the font size and improve the quality of the figure.

 

9.- Figure 13 cannot be read clearly; it is necessary to increase the font size and improve the quality of the figure.

 

10.- On page 14 there is a comment; delete it if it has already been addressed.

 

After having reviewed this work, it is necessary to address these observations so that the article can be accepted.

Author Response

Responses to the Associate Editor’s and Reviewers’ Comments

10 March 2023

Dear reviewers and editorial staff of Applied Sciences,

 

We are sincerely grateful for your thorough consideration and scrutiny of our manuscript, “Robustness of Deep-Learning Models for Vision Tasks”, Manuscript ID: applsci-2241744. Through the accurate comments made by the reviewers, we better understand the critical issues in this paper. We have revised the manuscript according to the reviewers’ suggestions, and we hope that our revised manuscript will be considered and accepted for publication in Applied Sciences. We acknowledge that the scientific quality of our manuscript was improved by the scrutinizing efforts of the reviewers and editors.

 

Reviewer #2:

1) Reviewer’s comment: Figure 1 is not properly appreciated, it is necessary to increase the size of the letters and improve the quality of the figure.

Author’s response: We appreciate the reviewer’s comment. The characters and lines that make up the figure have been redrawn so that they are clearly visible.

 

2) Reviewer’s comment: Figure 3 is not properly appreciated, it is necessary to increase the size of the letters and improve the quality of the figure.

Author’s response: We appreciate the reviewer’s comment. The characters and lines that make up the figure have been redrawn so that they are clearly visible.

 

3) Reviewer’s comment: Figure 7 is not properly appreciated, it is necessary to increase the size of the letters and improve the quality of the figure.

Author’s response: We appreciate the reviewer’s comment. The characters and lines that make up the figure have been redrawn so that they are clearly visible.

 

4) Reviewer’s comment: Align the numbering of page 7. For example: 1. Convolutional layer (it is placed to the right) and 2. Pooling layer with the other numbers are loaded to the left, organize that part.

Author’s response: We appreciate the reviewer’s comment. We aligned the numbering with the MDPI paper format.

 

5) Reviewer’s comment: Figure 8 is not properly appreciated, it is necessary to increase the size of the letters and improve the quality of the figure.

Author’s response: We appreciate the reviewer’s comment. The characters and lines that make up the figure have been redrawn so that they are clearly visible.

 

6) Reviewer’s comment: Why are the texts highlighted in yellow in figure 10?

Author’s response: We appreciate the reviewer’s comment. We removed the yellow highlighting in Figure 10.

 

7) Reviewer’s comment: Figure 11 is not properly appreciated, it is necessary to increase the size of the letters and improve the quality of the figure.

Author’s response: We appreciate the reviewer’s comment. The characters and lines that make up the figure have been redrawn so that they are clearly visible.

 

8) Reviewer’s comment: Figure 12 is not properly appreciated, it is necessary to increase the size of the letters and improve the quality of the figure.

Author’s response: We appreciate the reviewer’s comment. The characters and lines that make up the figure have been redrawn so that they are clearly visible.

 

9) Reviewer’s comment: Figure 13 is not properly appreciated, it is necessary to increase the size of the letters and improve the quality of the figure.

Author’s response: We appreciate the reviewer’s comment. The characters and lines that make up the figure have been redrawn so that they are clearly visible.

 

10) Reviewer’s comment: On page 14 there is a comment, delete the comment if the comment has already been answered.

Author’s response: We appreciate the reviewer’s comment. We removed the comment on page 14.

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors review historically significant deep-learning models, adversarial attacks on vision tasks, the related defense techniques, and robust brain-inspired models, which are promising as more general robustness strategies against adversarial attacks. The manuscript was submitted at the end of 2022; however, the reference list contains only one paper published in 2022. Considering the fast development of robustness studies for deep-learning vision, it would be better to include more references from 2022 and 2023 to enhance the novelty of this study (e.g., PeerJ Computer Science 9 (2023): e1197; arXiv:2211.10670v2; arXiv:2112.00639v2; et al.). The structure of the manuscript is well organized. Here are other comments.

  Figure captions: it is suggested to add more explanation to clarify the details.

 Some typos may lead to misunderstanding. Please recheck and verify. For example,

1) line 50, different;

2) line 104, 4 types instead of 3, as illustrated in Fig. 2?

3) Line 119, Figure 3 instead of Figure 2?

Author Response

Responses to the Associate Editor’s and Reviewers’ Comments

10 March 2023

Dear reviewers and editorial staff of Applied Sciences,

 

We are sincerely grateful for your thorough consideration and scrutiny of our manuscript, “Robustness of Deep-Learning Models for Vision Tasks”, Manuscript ID: applsci-2241744. Through the accurate comments made by the reviewers, we better understand the critical issues in this paper. We have revised the manuscript according to the reviewers’ suggestions, and we hope that our revised manuscript will be considered and accepted for publication in Applied Sciences. We acknowledge that the scientific quality of our manuscript was improved by the scrutinizing efforts of the reviewers and editors.

Reviewer #3:

1) Reviewer’s comment: The authors review historically significant deep-learning models, adversarial attacks on vision tasks, the related defense techniques, and robust brain-inspired models, which are promising as more general robustness strategies against adversarial attacks. The manuscript was submitted at the end of 2022; however, the reference list contains only one paper published in 2022. Considering the fast development of robustness studies for deep-learning vision, it would be better to include more references from 2022 and 2023 to enhance the novelty of this study (e.g., PeerJ Computer Science 9 (2023): e1197; arXiv:2211.10670v2; arXiv:2112.00639v2; et al.). The structure of the manuscript is well organized. Here are other comments.

Author’s response: We appreciate the reviewer’s comment. We think it is a very good point, and we agree with the reviewer's opinion. The revised paper cites 11 recent papers, including papers recommended by the reviewer, at the end of the sentence running from line 39 to line 43. The revised paragraph is shown below:

Because adversarial attacks based on adversarial examples can be a fatal weakness, particularly in vision tasks, such as autonomous driving, a defense technology for such attacks has been proposed, along with a new adversarial attack that overcomes the defense technology [23–33].

 

2) Reviewer’s comment: Figure captions: it is suggested to add more explanation to clarify the details.

Author’s response: We appreciate the reviewer’s comment. We think it is a very good point, and we agree with the reviewer's opinion. We changed almost all of the figure captions for clearer expression. Please see the revised paper.

 

3) Reviewer’s comment: Some typos may lead to misunderstanding. Please recheck and verify. For example,

Author’s response: We appreciate the reviewer’s comment. We wholly agree with the reviewer’s opinion. This is clearly our fault and a weakness of authors who are not native English speakers. We commissioned a company specializing in manuscript editing to revise the paper and incorporated their corrections. We will send a confirmation of the editing on file.

 

4) Reviewer’s comment: line 50, different;

Author’s response: We appreciate the reviewer’s comment. We wholly agree with the reviewer’s opinion. This is clearly our fault and a weakness of authors who are not native English speakers. We corrected the typo at line 50 in the revised paper.

 

5) Reviewer’s comment: line 104, 4 types instead of 3 as illustrated in Fig. 2?

Author’s response: We appreciate the reviewer’s comment. We corrected the number of types from three to four in the revised paper.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have improved the manuscript according to all reviewers' comments and suggestions.
