Article
Peer-Review Record

A High-Transferability Adversarial Sample Generation Method Incorporating Frequency Domain Transformations

Electronics 2024, 13(22), 4480; https://doi.org/10.3390/electronics13224480
by Sijian Yan, Zhengjie Deng *, Jiale Dong and Xiyan Li
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 14 October 2024 / Revised: 10 November 2024 / Accepted: 12 November 2024 / Published: 15 November 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper presents a method that applies frequency-domain transformations to the input image of a neural network in order to attack it. In my opinion, the topic is interesting and aligns with the scientific scope of the Electronics journal. However, it should be noted that the current version of the paper contains a significant number of substantive errors in the theoretical section, which makes an objective assessment difficult. It is important to emphasize that one of the widely recognized values of scientific work is the clarity of the mathematical model, which allows potential readers to understand the essence of the proposed approach and, ultimately, to interpret the obtained results.

Considering the above, I outline below some points that may significantly improve future versions of the paper:

  • The symbols J and Clip are used in various formats: L120, 128, 142, 212/9,10.
  • Missing definition of x' on L128.
  • Missing definition of ϵ on L128.
  • What does t represent in equations (5) and (6) on L147 and L148?
  • Formatting issues in L54, 77, 168-171.

These are just a few examples, but unfortunately, the paper contains many more. A thorough and careful revision of the symbols and their descriptions is essential for an improved version.

Beyond editorial comments, I ask the authors to clarify the following substantive questions:

  1. The DCT transformation mechanism presented in the paper has been used effectively for years in the JPEG algorithm for lossy image compression. It is known that transform coefficients in the frequency domain may be eliminated not only quantitatively (as the authors describe) but also qualitatively; for example, only the smallest DCT coefficient values could be removed. Both cases produce a tiling effect on the input image, which can easily be detected through edge detection (high-pass filtering) of the original image (a short illustrative sketch of this effect follows these two questions). How did the authors take this phenomenon into account in their research?

  2. The structure of the input image is crucial for all image processing methods. For example, the derivative of the image function will be significantly different for an intensity image versus its indexed structure equivalent. What image structures did the authors consider in their research? Why? Would the results differ for different structures?
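For concreteness, the kind of qualitative coefficient removal and high-pass detection described in question 1 can be sketched as follows. This is only an illustrative sketch of the phenomenon, not the method under review; the 8x8 block grid (as in JPEG), the `keep_ratio` threshold, and the random stand-in image are assumptions made purely for the demonstration.

```python
# Illustrative sketch: block-wise removal of the smallest DCT coefficients,
# followed by a Laplacian (high-pass) measure in which the resulting tiling
# of a natural image would show up as a grid of block-boundary edges.
import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import laplace

def drop_small_dct_coeffs(img, block=8, keep_ratio=0.25):
    """Zero out the smallest-magnitude DCT coefficients in each block."""
    h, w = img.shape
    out = np.zeros_like(img, dtype=np.float64)
    for i in range(0, h - h % block, block):
        for j in range(0, w - w % block, block):
            patch = img[i:i + block, j:j + block].astype(np.float64)
            coeffs = dctn(patch, norm="ortho")
            thresh = np.quantile(np.abs(coeffs), 1.0 - keep_ratio)
            coeffs[np.abs(coeffs) < thresh] = 0.0   # qualitative removal
            out[i:i + block, j:j + block] = idctn(coeffs, norm="ortho")
    return out

def high_pass_energy(img):
    """Mean absolute Laplacian response of the image."""
    return np.abs(laplace(img.astype(np.float64))).mean()

rng = np.random.default_rng(0)
gray = rng.uniform(0, 255, (64, 64))        # stand-in for a grayscale image
processed = drop_small_dct_coeffs(gray)
print(high_pass_energy(gray), high_pass_energy(processed))
```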

Author Response

Thank you for your comments on our manuscript entitled "A High-Transferability Adversarial Sample Generation Method Incorporating Frequency Domain Transformations". We have carefully studied these comments and have made revisions that we hope will meet with your approval.

Comments 1: [Errors in formulas and symbols.]

Response 1: Thank you for pointing this out. We have carefully revised and corrected the related errors in formulas and symbols.

Comments 2: [The DCT transformation mechanism presented in the paper has been used effectively for years in the JPEG algorithm for lossy image compression. It is known that transform coefficients in the frequency domain may be eliminated not only quantitatively (as the authors describe) but also qualitatively; for example, only the smallest DCT coefficient values could be removed. Both cases produce a tiling effect on the input image, which can easily be detected through edge detection (high-pass filtering) of the original image. How did the authors take this phenomenon into account in their research?]

Response 2: Indeed, removing the frequency-domain coefficients of an image can lead to blocky artifacts, which are easily detectable through edge detection (high-pass filtering). However, in this paper, the modification of frequency-domain coefficients is akin to a data augmentation technique. The transformations applied to the input images are used to calculate the average gradient over several transformed images. This approach helps mitigate overfitting to the source model, thereby improving the attack success rate of adversarial examples on other models. Ultimately, we introduce a small perturbation in the direction of the gradient. As shown in the third row of Figure 2, the generated adversarial examples remain visually recognizable to the human eye, yet they cause the model to misclassify them. This is precisely the purpose of generating adversarial examples.
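A minimal sketch of the gradient-averaging idea described above, under explicit assumptions: it uses PyTorch, substitutes a random FFT-based spectral rescaling for the paper's DCT-based transformation, and takes a single FGSM-style sign step rather than running the full FDE algorithm; the function names and parameter values are hypothetical.

```python
# Sketch only (assumptions: PyTorch; an FFT-based spectral perturbation stands
# in for the paper's DCT-based transformation; one FGSM-style step).
import torch

def spectral_augment(x, rho=0.5):
    """Randomly rescale the frequency coefficients of the input (hypothetical)."""
    X = torch.fft.fft2(x)                          # to the frequency domain
    mask = 1.0 + rho * (torch.rand_like(x) - 0.5)  # random per-coefficient scaling
    return torch.fft.ifft2(X * mask).real          # back to the spatial domain

def avg_gradient_step(model, x, y, loss_fn, eps=8 / 255, num_copies=5):
    """Average the loss gradient over several frequency-augmented copies of x,
    then perturb x by one small step in the sign of that average gradient."""
    grad = torch.zeros_like(x)
    for _ in range(num_copies):
        x_aug = spectral_augment(x).detach().requires_grad_(True)
        loss = loss_fn(model(x_aug), y)
        grad += torch.autograd.grad(loss, x_aug)[0]
    grad /= num_copies
    x_adv = x + eps * grad.sign()
    return torch.clamp(x_adv, 0.0, 1.0)
```

Averaging the gradient over several transformed copies is what reduces overfitting to the source model; the full method would additionally use momentum and iterative updates.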

Comments 3: [The structure of the input image is crucial for all image processing methods. For example, the derivative of the image function will be significantly different for an intensity image versus its indexed structure equivalent. What image structures did the authors consider in their research? Why? Would the results differ for different structures?]

Response 3: Our image processing method is largely independent of the underlying file structure of the image. Before processing, we convert the image into a two-dimensional pixel matrix, so different structural formats should have minimal impact on the method. We employed a widely used dataset in the field, the ImageNet-compatible dataset, and achieved favorable results.

Reviewer 2 Report

Comments and Suggestions for Authors

The paper concerns the important problem of generating adversarial samples in adversarial attack methods. The authors present a novel input transformation-based attack: the Frequency Domain Enhancement (FDE) method, which performs input transformations in the frequency domain to increase input diversity. However, the theoretical part should be improved. In particular:

1. In the "Related works" section, the abbreviations of the methods used should be defined here. e.g. Fast Gradient Sign Method (FGSM), Iterative Fast Gradient Sign Method (I-FGSM), etc.

2. The "Preliminary" section was written too cursorily and should be expanded and clarified.

3. In formula (1), the loss function J should be defined. Its specific form should be given. 

4. Under formula (2), it is written "...gradient of the clean image x". Shouldn't it rather be "...gradient of the loss function of the clean image x"?

5. In formula (3), the 'Clip' operation should be defined.

6. In formulas (5) and (6) the parameters µ and α should be defined.

7. In formula (3) the iteration number is marked with the letter N and in formulas (5) and (6) with the letter t. Is N+1 in formula (3) the last iteration of the algorithm? If not, this should be standardized.

8. In addition, the notation must be reviewed. Several symbols are used more than once, with different meanings, throughout the text. For example, in formula (7) M is one of the dimensions of the digital image matrix, in formula (8) M denotes the weight matrix, and in the "Experiments" section M denotes the number of enhancements. This is very confusing for the reader.

9. In formulas (9) and (10) the Hadamard product should be defined.

10. Both the figures and tables are placed before the text referring to these figures or tables. This makes reading the text a bit more difficult. They should be moved to the appropriate places in the text.

Author Response

Thank you for your comments on our manuscript entitled "A High-Transferability Adversarial Sample Generation Method Incorporating Frequency Domain Transformations". We have carefully studied these comments and have made revisions that we hope will meet with your approval.

Comments 1: [In the "Related works" section, the abbreviations of the methods used should be defined here, e.g., Fast Gradient Sign Method (FGSM), Iterative Fast Gradient Sign Method (I-FGSM), etc.]

Response 1: Thank you for pointing this out. We have made revisions to the "Related works" section.

Comments 2: [The "Preliminary" section was written too cursorily and should be expanded and clarified.]

Response 2: Thank you for pointing this out. We have modified the "Preliminary" section.

Comments 3: [In formula (1), the loss function J should be defined. Its specific form should be given.]

Response 3: Thank you for pointing this out. The loss function J is defined as the cross-entropy loss function (L136).
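For reference, the standard form of the cross-entropy loss for a classifier with softmax output, which the revised definition presumably follows (the manuscript's exact notation may differ), is

J(x, y; \theta) = -\log p_\theta(y \mid x),

where p_\theta(y \mid x) is the probability that the model with parameters \theta assigns to the true label y.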

Comments 4: [Under formula (2), it is written "...gradient of the clean image x". Shouldn't it rather be "...gradient of the loss function of the clean image x"?]

Response 4: Thank you for pointing this out. Equation (2) has been modified so that it now reads "the gradient of the loss function with respect to the image x" (L136).

Comments 5: [In formula (3), the 'Clip' operation should be defined.]

Response 5: Thank you for pointing this out. We define the Clip operation of Equation (3) (L152).

Comments 6: [In formulas (5) and (6) the parameters µ and α should be defined.]

Response 6: Thank you for pointing this out. The parameters μ and α are now defined in Equations (5) and (6) (L160).
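For readers of this review record, μ (the momentum decay factor) and α (the step size) typically appear in the standard momentum-based iterative update below; Equations (5) and (6) in the manuscript presumably take this form, though the exact notation may differ:

g_{t+1} = \mu \, g_t + \frac{\nabla_x J(x_t^{adv}, y)}{\lVert \nabla_x J(x_t^{adv}, y) \rVert_1}

x_{t+1}^{adv} = \mathrm{Clip}_{x,\epsilon}\left\{ x_t^{adv} + \alpha \cdot \mathrm{sign}(g_{t+1}) \right\}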

Comments 7: [In formula (3) the iteration number is marked with the letter N and in formulas (5) and (6) with the letter t. Is N+1 in formula (3) the last iteration of the algorithm? If not, this should be standardized.]

Response 7: Thank you for pointing this out. The iteration count N in Equation (3) has been changed to t (L151).

Comments 8: [In addition, the notation must be reviewed. Several symbols are used more than once, with different meanings, throughout the text. For example, in formula (7) M is one of the dimensions of the digital image matrix, in formula (8) M denotes the weight matrix, and in the "Experiments" section M denotes the number of enhancements. This is very confusing for the reader.]

Response 8: Thank you for pointing this out. We have corrected the duplicated symbols in the paper and marked the changes in red text.

Comments 9: [In formulas (9) and (10) the Hadamard product should be defined.]

Response 9: Thank you for pointing this out. We define the Hadamard product used in Equations (9) and (10) (L205).
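For completeness, the Hadamard product is the standard element-wise product of two matrices of the same size:

(A \odot B)_{ij} = A_{ij} \, B_{ij}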

Comments 10: [Both the figures and tables are placed before the text referring to these figures or tables. This makes reading the text a bit more difficult. They should be moved to the appropriate places in the text.]

Response 10: Thank you for pointing this out. We have moved the figures and tables to the appropriate places in the text.

 

Reviewer 3 Report

Comments and Suggestions for Authors

Dear authors,

I think your work is interesting, but some points should be adjusted:

1) Related works should be enlarged to consider custom approaches similar to yours.

2) Once these works are cited, it would be interesting to compare performance with these other models.

3) As metrics you used just the success rate; can you include other metrics?

4) Conclusions should be supported by the results in a more quantitative way.

Author Response

Thank you for your comments on our manuscript entitled "A High-Transferability Adversarial Sample Generation Method Incorporating Frequency Domain Transformations". We have carefully studied these comments and have made revisions that we hope will meet with your approval.

Comments 1: [Related works should be enlarged to consider custom approaches similar to yours.]

Response 1: Thank you for pointing this out. We have extended the related work by adding Affordable and Generalizable Substitute (AGS) and Momentum Iterative FGSM (MI-FGSM).

Comments 2: [Once these works are cited, it would be interesting to compare performance with these other models.]

Response 2: Thank you for pointing this out. Some methods in the related work differ significantly from the approach presented in this paper. Therefore, we selected a set of comparative models based on other studies that address the same problem.

Comments 3: [As metrics you used just the success rate; can you include other metrics?]

Response 3: Thank you for pointing this out. In the studies we referenced that address the same problem, most use the success rate as the primary metric. Since this type of attack allows adversarial examples to be generated in advance, the algorithm's speed is not a critical factor. Therefore, we did not include a comparison of execution efficiency.

Comments 4: [Conclusions should be supported by results in a more quantitative way.]

Response 4: Thank you for pointing this out. We have revised the conclusions to be more quantitative.

 

Reviewer 4 Report

Comments and Suggestions for Authors

Summary

The paper "A High-Transferability Adversarial Sample Generation Method Incorporating Frequency Domain Transformations" proposes a novel Frequency Domain Enhancement (FDE) method to enhance adversarial sample transferability across different Deep Neural Network (DNN) models. By focusing on frequency domain transformations, the authors address the limitations of spatial domain transformations for adversarial attacks. Their approach shows notable improvements in transferability across various models, both defended and undefended, making it a valuable addition to research in adversarial machine learning.

 

General Concept Comments

- The FDE approach of applying frequency domain transformations is well-motivated, with a novel application of weight matrices in the frequency domain to improve adversarial transferability. However, the methodology could benefit from an in-depth comparison with spatial domain methods, particularly around cases where FDE does not outperform spatial methods.

- The hypothesis—that frequency domain transformations improve transferability in adversarial samples—is testable and well-supported by the experimental results presented. Nevertheless, further evidence comparing FDE against other frequency-based approaches would strengthen the findings, particularly in terms of reproducibility across other datasets.

- The experiments are extensive, covering multiple models and parameters, but they could benefit from more statistically rigorous measures. While success rates are reported, confidence intervals or additional statistical analysis on the transferability improvements would offer more robust evidence.
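As one concrete way to address this point, a simple normal-approximation confidence interval can be attached to each reported attack success rate, given the number of adversarial examples evaluated. The sketch below uses placeholder counts, not results from the paper.

```python
# Sketch: 95% normal-approximation confidence interval for a success rate.
# The counts are placeholders, not results reported in the paper.
import math

def success_rate_ci(successes, trials, z=1.96):
    p = successes / trials
    half = z * math.sqrt(p * (1 - p) / trials)
    return p, max(0.0, p - half), min(1.0, p + half)

rate, low, high = success_rate_ci(successes=872, trials=1000)
print(f"success rate = {rate:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```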

Author Response

Thank you for your comments on our manuscript entitled "A High-Transferability Adversarial Sample Generation Method Incorporating Frequency Domain Transformations". We have carefully studied these comments and have made revisions that we hope will meet with your approval.

In response to your suggestions, we will continue to conduct related research to further optimise the methodology of this paper.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Dear Authors,

Thank you for providing the explanations and modifications to the manuscript text. Unfortunately, it still contains issues related to mathematical notation:

  • Line 136: Function,ϵ
  • Lines 150, 158: Different formats for "adv"
  • Line 152: "t-th" and "iteration,Clip"
  • Line 159: Format for "}"
  • Line 160: μ, format for "t"-th iteration
  • Line 228: Format for "The adversarial example x^adv"

Additionally, please clarify the use of the Clip operator. A common interpretation of this operator is as follows:

Given a value x, the clipping operator Clip(x, a, b) restricts x to be within the range [a, b]. Mathematically, this is usually defined as:

\mathrm{Clip}(x, a, b) =
\begin{cases}
a & \text{if } x < a \\
x & \text{if } a \leq x \leq b \\
b & \text{if } x > b
\end{cases}

Therefore, what is the interpretation of the Clip operator in this paper? Please introduce appropriate explanations in the manuscript text. What kind of mapping is being performed here?
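For comparison, in iterative attacks such as I-FGSM the operator is commonly applied element-wise to keep the perturbed image x' within an ϵ-neighbourhood of the original image x, usually combined with clamping to the valid pixel range:

\mathrm{Clip}_{x,\epsilon}\{x'\} = \min\big(\max(x',\, x - \epsilon),\, x + \epsilon\big)

Is this the mapping intended in the paper?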

 

 

Author Response

Thank you for your comments on our manuscript entitled "A High-Transferability Adversarial Sample Generation Method Incorporating Frequency Domain Transformations". We have carefully studied these comments and have made revisions that we hope will meet with your approval.

Comments 1: [Errors in formulas and symbols.]

Response 1: Thank you for pointing this out.

Line 136: Function, ϵ

Revised (Line 136): We have corrected Line 136.

Lines 150, 158: Different formats for "adv"

Revised (Line 158): We have unified the format of "adv".

Line 152: "t-th" and "iteration, Clip"

Revised (Line 151): We have corrected "t-th" and "iteration, Clip".

Line 159: Format for "}"

Revised (Line 158): We have corrected the format of "}".

Line 160: μ, format for "t"-th iteration

Revised (Line 159): We have corrected μ and the format of "t-th iteration".

Line 228: Format for "The adversarial example x^adv"

Revised (Line 228): We have corrected "x^adv".

We have also provided the definition of Clip(x, a, b) on Line 228.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have incorporated the changes suggested by the reviewer into the manuscript.

Author Response

Thank you for your valuable feedback. Please let us know if further clarifications or adjustments are needed.
