Article

Efficient Adversarial Attack Based on Moment Estimation and Lookahead Gradient

1 Hubei Province Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430205, China
2 School of Mathematics & Computer Science, Hubei University of Arts and Science, Xiangyang 441053, China
3 School of Science, Wuhan University of Technology, Wuhan 430070, China
4 School of Artificial Intelligence, Hubei Business College, Wuhan 430079, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(13), 2464; https://doi.org/10.3390/electronics13132464
Submission received: 29 May 2024 / Revised: 14 June 2024 / Accepted: 16 June 2024 / Published: 24 June 2024
(This article belongs to the Special Issue Artificial Intelligence and Applications—Responsible AI)

Abstract

Adversarial example generation is a technique that perturbs inputs with imperceptible noise to induce misclassifications in neural networks, serving as a means to assess the robustness of such models. Among adversarial attack algorithms, the momentum iterative fast gradient sign method (MI-FGSM) and its variants constitute a class of highly effective offensive strategies, achieving near-perfect attack success rates in white-box settings. However, these methods' use of the sign activation function severely degrades gradient information, which leads to low success rates in black-box attacks and results in large adversarial perturbations. In this paper, we introduce a novel adversarial attack algorithm, NA-FGTM. Our method employs the Tanh activation function instead of sign, which preserves gradient information more accurately. In addition, it utilizes the Adam optimization algorithm together with Nesterov acceleration, which stabilizes gradient update directions and expedites gradient convergence; as a result, the transferability of adversarial examples is enhanced. Through integration with data augmentation techniques such as DIM, TIM, and SIM, NA-FGTM can further improve the efficacy of black-box attacks. Extensive experiments on the ImageNet dataset demonstrate that our method outperforms state-of-the-art approaches in terms of black-box attack success rate and generates adversarial examples with smaller perturbations.

1. Introduction

Currently, neural networks are ubiquitously employed across intelligent software domains such as object detection [1], security monitoring [2], autonomous vehicles [3], speech recognition [4] and image classification [5]; however, despite significant advancements in these areas, they remain plagued with security and robustness concerns. Adversarial examples refer to intentionally crafted input modifications [6], imperceptible to the human eye, designed to cause misclassification in neural networks. The existence of such adversarial examples presents profound challenges and dilemmas for contemporary neural network systems. For instance, when an autonomous vehicle encounters an adversarial target during operation, it may be misled into altering its driving state, thereby engendering traffic safety issues. In the context of security, adversarial examples can potentially deceive security system assessments, affording opportunities for malicious actors. In military applications, where unmanned aerial vehicles are utilized for reconnaissance in uncharted territories, adversarial examples could lead to erroneous target identification, compromising battlefield awareness. Consequently, adversarial examples pose a severe threat to application security within computer vision domains reliant on deep neural networks. Hence, it is critical to delve into the generation mechanisms and underlying principles of adversarial examples, and to develop either defensive strategies [7,8,9,10] against them or more secure and robust deep neural network architectures.
The research on adversarial examples has matured over time, giving rise to a multitude of attack methodologies. In 2018, Dong et al. [11] introduced the momentum iterative fast gradient sign method (MI-FGSM), which combined momentum-based optimization with the iterative fast gradient sign approach. This integration expedited gradient convergence and rendered the update direction of the objective function as more consistent, mitigating the influence of local noise. MI-FGSM significantly enhanced the efficacy of white-box attacks and demonstrated a marginal improvement in black-box scenarios. Subsequently, several variants of MI-FGSM were successively proposed [12,13,14], all of which achieved nearly 100% attack success rates under white-box conditions; however, their performance in black-box attacks was less satisfactory.
In gradient-based adversarial example generation methods, the gradient of the objective function is computed and then added to the original example to generate the adversarial example. If the gradient of the objective function is too large, the resulting adversarial perturbation is also large and can easily be perceived by the human eye. To keep the perturbation imperceptible, gradient normalization and an activation function are used to process the gradient of the objective function, limiting its values to the range −1 to 1 and thereby bounding the size of the adversarial perturbation. MI-FGSM and its variants all use the sign function as the activation function. The sign function maps every normalized gradient value to −1, 0, or 1, which not only discards gradient information, reducing the efficiency of black-box attacks, but also inflates the generated perturbation, making the adversarial examples easier to detect.
In this paper, we focus on investigating gradient-based black-box attacks and propose a novel adversarial attack method termed NA-FGTM. Unlike MI-FGSM and its variants that employ the sign function for gradient manipulation, NA-FGTM utilizes the hyperbolic tangent (Tanh) function [15] to preserve gradient information more effectively, thereby reducing the magnitude of induced adversarial perturbations. The NA-FGTM approach further incorporates the Adam [16] optimization algorithm to expedite gradient convergence and employs Nesterov acceleration [17] to enhance the transferability [18] of adversarial examples, thereby improving the success rate of black-box attacks. Moreover, NA-FGTM can be integrated with data augmentation techniques such as DIM [19], TIM [20], and SIM [12] to further boost the success rates of black-box attacks.
This study was conducted in a system environment running Python 3.9 and TensorFlow 2.6.0, employing a dataset consisting of 1000 clean images drawn from the ILSVRC 2012 validation set [21]. Several prevalent models were utilized, including Inception-v3 [22], Inception-v4, Inception-Resnet-v2 [23], and Resnet-v2-101 [24], along with three adversarial-trained models, namely Inc-v3ens3, Inc-v3ens4, and IncRes-v2ens [25]. The experiments adopted a comparative approach to validate the performance of four distinct methods: MI-FGSM [11], NI-FGSM [12], VMI-FGSM [13], and NA-FGTM, with all experimental parameters and configurations kept consistent across the compared methods. The experimental results demonstrated that our proposed NA-FGTM outperforms other algorithms in terms of black-box attack success rates, indicating higher efficiency of our algorithm. Furthermore, under equivalent attack success rates, our method generates adversarial examples with smaller perturbation values, suggesting a superior balance between effectiveness and imperceptibility in crafting adversarial examples.
The remainder of this paper is structured as follows: Section 2 provides related work on adversarial example generation. Our novel attack algorithm, NA-FGTM, is detailed in Section 3. The experimental results and corresponding analyses are presented in Section 4. Finally, the conclusions of this paper are provided in the last section.

2. Related Work

2.1. Problem Definition

The process of an adversarial attack can be generally defined as follows: Consider a pair of pre-trained neural network models {Mw, Mb}, where Mw denotes a white-box model with transparent internal information, and Mb represents a black-box model with unknown internal information. Given a clean input sample x, when fed into the neural network M, it outputs the true label ytrue, such that ytrue = M(x). Let J(x, y) denote the loss function within M, and η represent the adversarial perturbation. The objective of an adversarial attack is to craft an adversarial example xadv = x + η that maximizes the loss function J(xadv, ytrue), thereby causing the predicted label yadv ≠ ytrue. To ensure the imperceptibility of the adversarial perturbation, η is constrained within a certain range, i.e., ||η||p ≤ ε, where ε signifies the maximum allowable adversarial perturbation under the Lp norm, with p = 0, 1, 2, or ∞. The adversarial example xadv can potentially fool both the white-box model Mw and the black-box model Mb; however, this study primarily focuses on black-box attacks.

2.2. Fast Gradient Sign Method

The fast gradient sign method (FGSM), proposed by Goodfellow [26], is an algorithm for generating adversarial examples based on gradients, which belongs to the category of untargeted attacks in adversarial machine learning—attacks that do not mandate the perturbed sample to be misclassified into a specific class but merely require disagreement with the original prediction. In the context of training neural networks, where minimization of the loss function entails movement against the gradient direction via gradient descent, FGSM can be interpreted as a gradient ascent strategy that maximizes the loss function. The FGSM algorithm modifies the image once after calculating the gradient and is a very fast and efficient single-step attack algorithm. Assuming the original data are represented by x, the adversarial example is xadv, and the classification output is denoted as y, the adversarial example xadv is obtained by adding a small perturbation η to the original image: xadv = x + η. Throughout this process, FGSM operates under an L∞ norm constraint, i.e., ||xadv − x||∞ ≤ ε, ensuring that the difference between the original and adversarial examples remains within a predefined bound.
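The single-step update can be sketched in a few lines of plain Python. This is an illustrative toy, not an attack on a real network: a hypothetical quadratic loss with an analytic gradient stands in for the classifier's cross-entropy, and only the sign of the gradient drives the step.

```python
def fgsm_step(x, grad_fn, eps):
    """Single FGSM step: move each element of x a distance eps in the
    sign direction of the loss gradient (gradient ascent on the loss)."""
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad_fn(x))]

# Toy loss J(x) = sum((x_i - c_i)^2), whose gradient is 2 * (x - c).
c = [0.2, -0.5]
grad = lambda x: [2 * (xi - ci) for xi, ci in zip(x, c)]
x_adv = fgsm_step([0.5, 0.1], grad, eps=0.1)
```

Because only the sign is used, every coordinate moves exactly eps, regardless of how large or small its gradient component is.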

2.3. Basic Iterative Method

In 2016, Kurakin [27] proposed the basic iterative method (BIM, also known as I-FGSM), which builds upon the fast gradient sign method to efficiently generate adversarial examples. BIM essentially constitutes an iterative variant of FGSM, also employing the L∞ norm as a constraint: it divides the single step of FGSM into multiple smaller steps. The authors suggested that this gradual approach ensures a more optimal gradient ascent trajectory, at worst matching the performance of FGSM. For untargeted attacks, FGSM is applied within each iteration, augmented by a clipping function that keeps the images valid. During the iterative update procedure, some pixel values might exceed their valid range (e.g., beyond [0, 1]), necessitating their replacement with either 0 or 1 to produce a valid image. This preserves the integrity of the new sample by keeping its pixels within a neighborhood of the original image’s pixel values, avoiding significant distortion.
Empirically, the authors recommended setting the step size to 1. Intuitively, BIM shares FGSM’s ease of comprehension while being both concise and efficient, demonstrating superior attack effectiveness compared to FGSM. However, subsequent research has shown that BIM exhibits relatively lower transferability, leading to less effective black-box attacks.
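The iterate-and-clip loop described above can be sketched in plain Python. As before, this is a toy illustration under stated assumptions: a hypothetical loss whose gradient is always positive stands in for a real classifier, and clipping enforces both the eps-ball and the valid pixel range.

```python
def clip(v, lo, hi):
    return max(lo, min(hi, v))

def bim(x, grad_fn, eps, alpha, steps):
    """I-FGSM: repeated small FGSM steps, clipping each pixel both into
    the eps-ball around the original image and into the range [0, 1]."""
    adv = list(x)
    for _ in range(steps):
        adv = [ai + alpha * ((gi > 0) - (gi < 0))
               for ai, gi in zip(adv, grad_fn(adv))]
        adv = [clip(clip(ai, xi - eps, xi + eps), 0.0, 1.0)
               for ai, xi in zip(adv, x)]
    return adv

# With an always-positive gradient, the attack walks to the edge of the
# eps-ball around x and stays there.
adv = bim([0.5], lambda z: [1.0] * len(z), eps=0.3, alpha=0.1, steps=10)
```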

2.4. Momentum Iterative Fast Gradient Sign Method

The basic iterative method computes and accumulates gradients at each step of the iterative process and adds them to the adversarial example, so the result can effectively deceive only the white-box model used to generate it, not unknown black-box models; this significantly constrains the method in practical applications. In conventional optimization algorithms, momentum terms are known to expedite convergence by reducing the likelihood of becoming stuck in local optima and imparting greater stability to the update direction. Inspired by this concept, Dong et al. [11] incorporated a momentum term into the iterative method for generating adversarial examples, aiming to circumvent noisy data and ineffective local extrema encountered during iterations, thereby enhancing the overall attack efficacy.
Dong et al. proposed the MI-FGSM algorithm, which integrates momentum into the I-FGSM algorithm to generate noise by the following equations:
$$g_{t+1} = \mu \, g_t + \frac{\nabla_x J(x_t^{*}, y)}{\lVert \nabla_x J(x_t^{*}, y) \rVert_1} \tag{1}$$
$$x_{t+1}^{*} = x_t^{*} + \alpha \cdot \mathrm{sign}(g_{t+1}) \tag{2}$$
where gt+1 cumulatively integrates the velocity vector along the gradient direction, thereby transcending local optima during optimization to enhance the probability of reaching global optima. However, this momentum-based approach can lead to an accelerated growth in gradients, resulting in excessive perturbations in the generated adversarial examples.
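The momentum accumulation of Equation (1) can be illustrated in a few lines of plain Python on toy two-component gradients; with μ = 1 (no decay), the L1-normalized gradient simply accumulates step after step.

```python
def mi_update(g_prev, grad, mu):
    """Eq. (1): decay the momentum buffer by mu and add the
    L1-normalized current gradient."""
    l1 = sum(abs(v) for v in grad)
    return [mu * gp + v / l1 for gp, v in zip(g_prev, grad)]

# Two identical toy gradients in a row: the normalized direction
# [0.75, -0.25] accumulates to [1.5, -0.5] when mu = 1.
g = mi_update([0.0, 0.0], [3.0, -1.0], mu=1.0)
g = mi_update(g, [3.0, -1.0], mu=1.0)
```

Note that only the direction of g feeds the update in Equation (2); the sign function discards its magnitude.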

2.5. Nesterov Iterative Fast Gradient Sign Method

Lin et al. [12] employed the Nesterov accelerated gradient (NAG) [17] and proposed the NI-FGSM algorithm to enhance the transferability of adversarial examples. In NI-FGSM, prior to each gradient iteration, a leap is made along the accumulated gradient direction: it replaces $x_t^{*}$ in Equation (1) with $x_t^{adv} + \alpha \cdot \mu \cdot g_t$, leveraging the lookahead feature of NAG to construct a robust adversarial attack. This lookahead property of NAG helps escape shallow local maxima more efficiently and rapidly, thereby improving transferability.

2.6. Data Enhancement Methods

The scale invariance method (SIM) [12] exploits the scale-invariant property of the loss: scaled copies of an image fed into the neural network yield nearly identical losses, so optimizing over these copies acts like generating a large number of training examples and enhances the transferability of the resulting adversarial examples:
$$\arg\max_{x^{adv}} \frac{1}{m} \sum_{i=0}^{m} J(S_i(x^{adv}), y_{true}), \quad \text{s.t.} \ \lVert x_t^{adv} - x \rVert_\infty \le \varepsilon \tag{3}$$
where Si(x) = x/2i denotes a scaled copy of the input x when the scaling factor is 1/2i, and m denotes the number of scaled copies. The scale invariant method effectively enhances the transferability of adversarial examples; however, it demands substantial computational time and resources.
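Differentiating the objective in Equation (3) amounts to averaging the loss gradient over the scaled copies. A minimal sketch, using a hypothetical toy loss whose gradient equals its input (so the averaged result is easy to verify by hand):

```python
def sim_gradient(x, grad_fn, m):
    """Average the loss gradient over the scaled copies
    S_i(x) = x / 2^i for i = 0..m."""
    acc = [0.0] * len(x)
    for i in range(m + 1):
        g = grad_fn([xi / 2 ** i for xi in x])
        acc = [a + gi for a, gi in zip(acc, g)]
    return [a / (m + 1) for a in acc]

# With grad(x) = x and m = 2, the average over x, x/2, x/4 is
# x * (1 + 1/2 + 1/4) / 3.
g = sim_gradient([1.2], lambda z: list(z), m=2)
```

Each extra copy costs one additional forward/backward pass, which is the computational overhead noted above.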
The diversity input method (DIM) [19] randomly resizes and pads input images with a fixed probability p at each iteration before feeding the transformed images into the classifier for gradient calculation. DIM can be readily integrated with other gradient-based approaches to further bolster the transferability of adversarial examples. The transformation function is represented as follows:
$$T(x_t^{adv}, p) = \begin{cases} T(x_t^{adv}) & \text{with probability } p \\ x_t^{adv} & \text{with probability } 1 - p \end{cases} \tag{4}$$
The translation invariance method (TIM) [20] optimizes the adversarial perturbations across a set of translated images rather than on a single image, rendering it more effective against defended black-box models. To alleviate the computational demands on gradients, Dong et al. introduced small image translations and approximated gradient computations by convolving the gradients of the untranslated image with a kernel matrix. The generation of adversarial examples under this framework can be expressed as follows:
$$x_{t+1}^{adv} = \sum_{i,j} T_{ij}(x_t^{adv}), \quad \text{s.t.} \ \lVert x_t^{adv} - x^{real} \rVert_\infty \le \varepsilon \tag{5}$$
where $T_{ij}(x_t^{adv})$ denotes the translation function that shifts the input $x_t^{adv}$ by i and j pixels, respectively, along the two-dimensional directions.
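The kernel-convolution approximation can be illustrated in one dimension. This sketch uses a hypothetical 3-tap smoothing kernel; the actual TIM convolves 2-D gradient maps with a larger (e.g., Gaussian) kernel.

```python
def tim_gradient(grad, kernel):
    """Approximate TIM in 1-D: convolve the gradient with a fixed
    kernel instead of recomputing gradients on translated images."""
    k = len(kernel) // 2
    out = []
    for i in range(len(grad)):
        s = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - k
            if 0 <= idx < len(grad):  # zero-pad at the borders
                s += w * grad[idx]
        out.append(s)
    return out

# A unit spike of gradient gets smeared across neighboring positions,
# mimicking the effect of averaging gradients over small translations.
out = tim_gradient([1.0, 0.0, 0.0], [0.25, 0.5, 0.25])
```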

3. Our Method

In this section, we delve into the detailed workings of the NA-FGTM method. To address the gradient information loss in sign activation function, NA-FGTM employs the Tanh function to augment gradient preservation. Subsequently, it utilizes the Adam optimization algorithm as well as the Nesterov acceleration, which is able to stabilize gradient update directions and expedite gradient convergence. The detailed principles underlying these methodologies are presented subsequently.

3.1. Activation Function

Activation functions play a pivotal role in gradient-based adversarial example generation methods. In the process of computing adversarial perturbations, once the gradient of the objective function with respect to the current iteration’s input is calculated, this gradient is fed into the activation function. The output of this function then determines both the update direction for the next iteration and the magnitude of the computed perturbation.
Most gradient-based methods for generating adversarial examples are built upon or improved from the fast gradient sign method (FGSM), hence commonly employing the sign activation function. However, the sign function in these gradient-oriented approaches has two significant drawbacks. Firstly, it normalizes all gradient values to either −1, +1, or zero, leading to the loss of gradient information. Secondly, the sign function maps gradient values within the interval (−1, 1) to binary extremes of −1 or +1, which inherently amplifies the magnitude of the perturbation introduced.
As Figure 1 illustrates, for the sign function, y = 1 when x > 0, y = −1 when x < 0, and y = 0 when x = 0. In contrast, the hyperbolic tangent (Tanh) function maps x smoothly to y = Tanh(x). Once the gradient values of the objective function have been computed, they are fed into an activation function. For gradient values in the range −1 to 1, the sign function converts every value to either −1 or 1, amplifying the gradient magnitude and producing larger perturbations. The Tanh function, by contrast, approximately preserves the original gradient values, leading to smaller perturbations. Consequently, the Tanh function can be employed as an alternative to the sign function to reduce the size of the generated perturbations.
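The contrast is easy to demonstrate numerically on a few example gradient values already lying in (−1, 1): sign inflates every one of them to magnitude 1, while Tanh roughly preserves them.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

grads = [0.05, -0.3, 0.9]
signed = [sign(g) for g in grads]        # [1, -1, 1]
tanhed = [math.tanh(g) for g in grads]   # stays close to the inputs
```

For small gradients, tanh(g) ≈ g, so the per-pixel perturbation scales with the true gradient instead of jumping to the maximum allowed step.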

3.2. Adaptive Learning Rate Adjustment Strategy Based on Moment Estimation

The Adam optimization algorithm integrates the strengths of both the AdaGrad and RMSprop methods, inheriting advantageous properties from each. Moreover, Adam leverages first-order moment estimates and second-order moment estimates to handle gradient information, where the first moment reflects the mean of gradients, guiding the direction of gradient updates, and the second moment represents the variance of gradients, used to modulate the learning rate during the update process. The procedure of the Adam algorithm can be outlined as follows:
$$m_t = \mu_1 m_{t-1} + (1 - \mu_1) g_t \tag{6}$$
$$v_t = \mu_2 v_{t-1} + (1 - \mu_2) g_t^2 \tag{7}$$
$$m_t^{*} = m_t / (1 - \mu_1^{t}) \tag{8}$$
$$v_t^{*} = v_t / (1 - \mu_2^{t}) \tag{9}$$
$$\theta_t = \theta_{t-1} - \alpha \, m_t^{*} / (\sqrt{v_t^{*}} + \sigma) \tag{10}$$
where mt denotes the estimate of the first moment (mean) of the gradients and vt represents the estimate of the second moment (uncentered variance) of the gradients. Here, μ1 and μ2 are exponential decay rates, typically set to 0.9 and 0.999, respectively. gt signifies the gradient information at a given time step. A constant σ is introduced to prevent division by zero in the denominator.
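Equations (6) through (10) translate directly into a short plain-Python sketch of one Adam step (a toy on flat float lists, not the authors' implementation):

```python
import math

def adam_step(theta, grad, m, v, t, lr, mu1=0.9, mu2=0.999, sigma=1e-8):
    """One Adam update (Eqs. 6-10): exponential moment estimates,
    bias correction, then a variance-normalized step."""
    m = [mu1 * mi + (1 - mu1) * gi for mi, gi in zip(m, grad)]
    v = [mu2 * vi + (1 - mu2) * gi * gi for vi, gi in zip(v, grad)]
    m_hat = [mi / (1 - mu1 ** t) for mi in m]
    v_hat = [vi / (1 - mu2 ** t) for vi in v]
    theta = [th - lr * mh / (math.sqrt(vh) + sigma)
             for th, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v

# On the first step (t = 1) the bias-corrected ratio m_hat / sqrt(v_hat)
# is ~1 regardless of the gradient's scale, so the move is ~lr.
theta, m, v = adam_step([0.0], [10.0], [0.0], [0.0], t=1, lr=0.001)
```

This scale-invariance of the normalized step is exactly what the adaptive learning rate of Equation (11) later exploits.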
Here, Adam employs exponentially weighted averages to accumulate the first and second moments, mitigating the impact of drastic fluctuations in the data on gradient updates. Additionally, to address the cold-start problem inherent in exponentially weighted averaging (since m0 and v0 are initialized to zero, m1 is biased towards zero), Adam corrects the bias by dividing mt by $1 - \mu_1^{t}$ and, likewise, vt by $1 - \mu_2^{t}$. Lastly, $m_t^{*} / (\sqrt{v_t^{*}} + \sigma)$ is used to normalize the gradients. In our method, Equations (8) and (9) are combined into Equation (10), with further refinement made to the learning rate α:
$$\alpha_t = \frac{\varepsilon}{\sqrt{v_{t+1}} + \sigma} \tag{11}$$
$$x_{t+1}^{adv} = \mathrm{Clip}_\phi^{x} \left\{ x_t^{adv} + \alpha_t \tanh\!\left( \frac{\sqrt{1 - \mu_2^{t}}}{1 - \mu_1^{t}} \cdot \frac{m_{t+1}}{\sqrt{v_{t+1}} + \sigma} \right) \right\} \tag{12}$$
where ε denotes the initial learning rate and αt represents the adaptive learning rate, which decreases when the second moment vt increases in Equation (11), and vice versa. In Equation (12), we employ the Tanh function as the activation function, following the bias correction and normalization of gradient information. After processing the gradient through the Tanh activation function, it preserves gradient information while generating adversarial perturbations of smaller magnitude.

3.3. Lookahead Gradient Prediction

The momentum algorithm incorporates an accumulator for past gradients, thereby dominating the update steps with a substantial influence from historical directions while attenuating the sway of immediate gradient variations. This approach can be likened to a physical system where a ball descending along a gradient path maintains its cumulative momentum, rendering it less susceptible to transient perturbations and enabling a rapid, stable descent towards the global optimum. The momentum method proceeds as follows:
$$m_t = \mu m_{t-1} + g(\theta_{t-1}) \tag{13}$$
$$\theta_t = \theta_{t-1} - \alpha m_t \tag{14}$$
In Equation (13), mt and mt−1 denote the accumulated historical gradients at the current and previous iterations, respectively, whereas g represents the gradient of the objective function. μ is a hyperparameter signifying the weight assigned to the previous accumulated history gradient in the current accumulation, taking a value between 0 and 1, and α refers to the learning rate. The momentum technique constrains the current gradient update using the gradient from the previous iteration, thereby promoting a more stable convergence of the objective function toward the global optimum.
Building upon the momentum method, Nesterov proposes an enhancement where instead of computing the next update direction based solely on the current gradient, the method preemptively uses the gradient at the next iteration to calculate the update direction within the current iteration itself. Nesterov thus improves upon Equation (13) as follows:
$$m_t = \mu m_{t-1} + g(\theta_{t-1} - \alpha \mu m_{t-1}) \tag{15}$$
From Equation (15), it can be observed that the gradient computation does not rely solely on the current position θt−1; rather, it takes into account the gradient a step ahead, at $(\theta_{t-1} - \alpha \mu m_{t-1})$. By modifying the current gradient direction according to the anticipated gradient at the next step, this approach accelerates the convergence of gradients and hastens the objective function’s progression towards the global optimum.
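Equation (15) can be sketched as a single plain-Python step. The toy below minimizes a hypothetical quadratic bowl f(x) = x², whose gradient is 2x; the only difference from plain momentum is that the gradient is evaluated at the lookahead point.

```python
def nag_step(theta, m_prev, grad_fn, alpha, mu):
    """Eq. (15): evaluate the gradient at the lookahead point
    theta - alpha*mu*m_prev, then update momentum and parameters."""
    look = [th - alpha * mu * mp for th, mp in zip(theta, m_prev)]
    m = [mu * mp + gi for mp, gi in zip(m_prev, grad_fn(look))]
    theta = [th - alpha * mi for th, mi in zip(theta, m)]
    return theta, m

# One step on f(x) = x^2 starting from theta = 1 with empty momentum.
theta, m = nag_step([1.0], [0.0], lambda z: [2 * zi for zi in z],
                    alpha=0.1, mu=0.9)
```

With zero initial momentum the lookahead point coincides with the current point, so the first step matches plain momentum; the two methods diverge from the second step onwards.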
Figure 2 demonstrates that although the momentum method steadily converges towards the optimal point, Nesterov achieves faster convergence, settling into a stable state shortly after the initial few updates and proceeding smoothly towards the global minimum. Analogously, our NA-FGTM method draws inspiration from Nesterov’s acceleration strategy, as follows:
$$x_t^{nes} = x_t^{adv} + \alpha \cdot \frac{m_t}{\sqrt{v_t} + \delta} \tag{16}$$
$$g_t = \nabla_{x_t^{nes}} L(f(x_t^{nes}), y_{true}) \tag{17}$$
As depicted in Equation (16), the objective function is anticipated and virtually advanced by one step before its gradient is computed. Following this, the gradient of the virtually progressed objective function is determined according to Equation (17). Thereafter, the gradient update is computed utilizing the Adam method as per Equations (6) through (12), culminating in the generation of the adversarial example xadv and the completion of the current iteration cycle.

3.4. NA-FGTM Algorithm

In this section, we outline the general flow of the NA-FGTM method. In each iteration, the process begins with leveraging Nesterov acceleration to estimate the gradient of the function at a projected next-step point xnes. Subsequently, the Adam optimization algorithm is employed to calculate the first-moment estimate m and second-moment estimate v. Building on these estimates, an adaptive learning rate α is derived using the second-moment estimate v. Following bias correction, gradient normalization and the Tanh activation function are used to compute the gradient update, which is then added to the original input sample to generate the adversarial example for the current iteration, thereby concluding the iteration. Once all iterative steps have been executed, the final adversarial example is obtained. We encapsulate the step-by-step process of the NA-FGTM method in Algorithm 1.
Algorithm 1: NA-FGTM
Input: classifier f with loss function L, original image x, true label ytrue, number of iterations T, first moment estimate m, second moment estimate v, exponential decay factor μ1, μ2, initial learning rate ε, constant value σ, maximum perturbation φ;
Output: Adversarial example xadv;
1: $x_0^{adv} = x$, $m_0 = 0$, $v_0 = 0$
2: for t = 0 to T − 1 do:
3:  $x_t^{nes} = x_t^{adv} + \alpha_t \cdot \dfrac{m_t}{\sqrt{v_t} + \delta}$
4:  $g_t = \nabla_{x_t^{nes}} L(f(x_t^{nes}), y_{true})$
5:  $m_{t+1} = \mu_1 m_t + (1 - \mu_1) g_t$
6:  $v_{t+1} = \mu_2 v_t + (1 - \mu_2) g_t^2$
7:  $\alpha_t = \varepsilon / (\sqrt{v_{t+1}} + \sigma)$
8:  $x_{t+1}^{adv} = \mathrm{Clip}_\phi^{x}\left\{ x_t^{adv} + \alpha_t \tanh\!\left( \frac{\sqrt{1 - \mu_2^{t}}}{1 - \mu_1^{t}} \cdot \frac{m_{t+1}}{\sqrt{v_{t+1}} + \sigma} \right) \right\}$
9: end for
10: return $x^{adv} = x_T^{adv}$
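To make the flow concrete, here is a minimal, self-contained plain-Python sketch of the steps of Algorithm 1 on a toy one-dimensional input. It is an illustration of the update rule under stated assumptions, not the authors' TensorFlow implementation: a hypothetical constant-ascent gradient stands in for the classifier, and clipping to the valid pixel range is omitted.

```python
import math

def na_fgtm(x, grad_fn, T=10, eps=1.6, phi=16.0,
            mu1=0.9, mu2=0.999, sigma=1e-6, delta=1e-6):
    """Toy sketch of the NA-FGTM loop on a flat list of floats;
    grad_fn(z) returns the loss gradient at point z."""
    n = len(x)
    adv, m, v = list(x), [0.0] * n, [0.0] * n
    alpha = [eps / (math.sqrt(vi) + sigma) for vi in v]
    for t in range(T):
        # Step 3: Nesterov lookahead point (Eq. 16).
        nes = [ai + al * mi / (math.sqrt(vi) + delta)
               for ai, al, mi, vi in zip(adv, alpha, m, v)]
        # Step 4: gradient at the lookahead point (Eq. 17).
        g = grad_fn(nes)
        # Steps 5-6: first and second moment estimates (Eqs. 6-7).
        m = [mu1 * mi + (1 - mu1) * gi for mi, gi in zip(m, g)]
        v = [mu2 * vi + (1 - mu2) * gi * gi for vi, gi in zip(v, g)]
        # Step 7: adaptive learning rate (Eq. 11).
        alpha = [eps / (math.sqrt(vi) + sigma) for vi in v]
        # Step 8: bias-corrected, Tanh-activated update, clipped into
        # the phi-ball around the original image (Eq. 12).
        corr = math.sqrt(1 - mu2 ** (t + 1)) / (1 - mu1 ** (t + 1))
        adv = [min(max(ai + al * math.tanh(corr * mi / (math.sqrt(vi) + sigma)),
                       xi - phi), xi + phi)
               for ai, al, mi, vi, xi in zip(adv, alpha, m, v, x)]
    return adv

# With a gradient that always points the same way, the example walks to
# the boundary of the phi-ball around x and stays there.
adv = na_fgtm([0.0], lambda z: [1.0] * len(z))
```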

4. Experiments

4.1. Experimental Setup

Datasets. In this study, a dataset comprising 1000 clean images selected from the ILSVRC 2012 validation set was employed. Each of these images belonged to a different class among a total of 1000 classes, and nearly all of these images were correctly classified by the experimental models.
Models. In this study, several common models were employed, including Inception-v3, Inception-v4, Inception-Resnet-v2, and Resnet-v2-101, along with three adversarial-trained models, namely Inc-v3ens3, Inc-v3ens4, and IncRes-v2ens.
Hyperparameters. To ensure the effectiveness of the proposed method, we set the experimental parameters to be consistent with the MI-FGSM method: the maximum perturbation magnitude φ was set to 16, the number of iterations T to 10, and the initial learning rate ε to 1.6. Within our methodology, the first-moment estimate m0 and the second-moment estimate v0 were initialized to zero, while the exponential decay factors μ1 and μ2 were set to 0.9 and 0.999, respectively. The constant σ was set to $10^{-6}$.

4.2. Experimental Method

To validate the superiority of the proposed approach, we devised four experiments to compare the attack success rates and average adversarial perturbation magnitudes among MI-FGSM, NI-FGSM, VMI-FGSM, and NA-FGTM methods.
(1)
Adversarial example generation on individual models: Using MI-FGSM, NI-FGSM, VMI-FGSM, and NA-FGTM techniques, we generated adversarial examples separately on Inception-v3, Inception-v4, Inception-Resnet-v2, and Resnet-v2-101 models. These generated adversarial examples were then used to attack these four base models as well as the three adversarial-trained models—Inc-v3ens3, Inc-v3ens4, and IncRes-v2ens. Ultimately, we contrasted the attack success rates and the sizes of the induced adversarial perturbations across these different methods.
(2)
Adversarial example generation with data augmentation for individual models: We integrated three data augmentation techniques (DIM, TIM, SIM) into the four attack methods—MI-FGSM, NI-FGSM, VMI-FGSM, and NA-FGTM—and employed these augmented variants to generate adversarial examples on Inception-v3, Inception-v4, Inception-Resnet-v2, and Resnet-v2-101 models. The generated adversarial examples were then utilized to attack both the four base models and the three robust models, namely Inc-v3ens3, Inc-v3ens4, and IncRes-v2ens. Finally, we compared the attack success rates across these distinct integrated methods.
(3)
Adversarial example generation for a model ensemble: We formed an ensemble using Inception-v3, Inception-v4, Inception-Resnet-v2, and Resnet-v2-101 models, and individually applied MI-FGSM, NI-FGSM, VMI-FGSM, and NA-FGTM to generate adversarial examples on this ensemble model. Subsequently, these adversarial examples were used to attack the same set of robust models: Inc-v3ens3, Inc-v3ens4, and IncRes-v2ens. Lastly, we compared the attack success rates and the sizes of the produced adversarial perturbations among these various methods.
(4)
Adversarial example generation with data augmentation for a model ensemble: We utilized the four augmented attack methods from Experiment (2) to generate adversarial examples on the ensemble model established in Experiment (3). The resulting adversarial examples were then employed to attack the same three robust models: Inc-v3ens3, Inc-v3ens4, and IncRes-v2ens. Finally, we compared the attack success rates among these different augmented methods when applied to the ensemble setting.

4.3. Adversarial Example Generation on Individual Models

The experimental results of adversarial examples generated using individual models are presented in Table 1, where entries marked with an asterisk (*) denote white-box attacks while the rest represent black-box attacks. From the table, it is evident that for models such as Inc-v3, Inc-v4, IncRes-v2, and Res-101, all four attack methods—MI-FGSM, NI-FGSM, VMI-FGSM, and NA-FGTM—attain either 100% or near-100% success rates in white-box attacks, indicating high efficiency in this setting. Under black-box scenarios, NA-FGTM exhibits a slightly higher attack success rate compared to the other methods. Specifically, when adversarial examples crafted on Inc-v3 were used to attack Inc-v4, IncRes-v2, and Res-101, NA-FGTM achieved black-box success rates of 55.4%, 49.3%, and 45.0%, respectively, surpassing the other three techniques. Conversely, for adversarial examples generated from Res-101, NI-FGSM demonstrated the highest black-box attack success rates at 63.8%, 57.7%, and 56.4%, suggesting that NI-FGSM is more effective when generating adversarial examples against Res-101. Moreover, across the adversarial-defended models Inc-v3ens3, Inc-v3ens4, and IncRes-v2ens, NA-FGTM consistently displayed the highest black-box attack success rates. For instance, when adversarial examples were produced using IncRes-v2, NA-FGTM achieved a black-box success rate of 41.6% on Inc-v3ens3, more than twice NI-FGSM’s 18.7%. This highlights that our proposed NA-FGTM method outperforms the other three methods significantly in attacking defensively enhanced models.
Additionally, we computed the L2 norm values of adversarial perturbations generated by the four methods, as shown in Table 2. The table reveals that the average magnitude of adversarial perturbations produced by MI-FGSM is around 16, while that of NI-FGSM hovers around 17. The VMI-FGSM method yields the largest average perturbation size, consistently staying around 22, which indicates that VMI-FGSM introduces more adversarial noise to enhance its attack success rate. By contrast, our NA-FGTM approach generates the smallest average perturbation sizes, maintaining them approximately at 11. This suggests that NA-FGTM achieves a higher attack success rate while also minimizing the size of the generated adversarial perturbations, making the adversarial examples less perceptible.

4.4. Adversarial Example Generation with Data Augmentation for Individual Models

In the second experiment, we incorporated three data augmentation techniques, DIM, TIM, and SIM, into the adversarial example generation methods. These data augmentation approaches effectively enhanced the black-box attack efficacy of adversarial examples, as demonstrated in Table 3. Comparing these results with those from the previous experiment, it is evident that data augmentation has indeed led to a significant increase in black-box attack efficiency for adversarial examples. For instance, when adversarial examples were generated using Res-101 and subsequently inputted to Inc-v3 in Table 1, MI-FGSM, NI-FGSM, VMI-FGSM, and NA-FGTM achieved attack success rates of 55.8%, 63.8%, 60.2%, and 58.1%, respectively. However, after incorporating data augmentation strategies in Table 3, these four methods recorded improved attack success rates of 76.4%, 75.2%, 78.1%, and 81.0%, respectively, representing an upsurge of up to 22.9%. This underscores the effectiveness of integrating data augmentation methods with adversarial example generation, thereby significantly boosting the success rate of black-box attacks.
Within Table 3, the white-box success rates of the four methods differ only marginally, all close to 100%. In black-box scenarios, however, NA-FGTM clearly outperforms the other methods. Adversarial examples generated on IncRes-v2 by NA-FGTM with data augmentation attain success rates of 73.8%, 69.1%, and 62.2% against the other three normally trained models, markedly higher than those of the competing methods. On the adversarially defended models Inc-v3ens3, Inc-v3ens4, and IncRes-v2ens, the success rates are 47.9%, 42.3%, and 40.4%, respectively, showing that NA-FGTM combined with data augmentation attacks such defended models more effectively. In summary, data augmentation substantially boosts the black-box attack efficiency of adversarial example generation methods, and in conjunction with these techniques NA-FGTM delivers the best black-box performance, particularly against adversarially trained deep neural network models.
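For reference, the input-diversity idea behind DIM can be sketched as follows. This is an illustrative nearest-neighbour implementation: the resize range (299 to 330) and transform probability follow the common Inception-size settings from the DIM paper, but the exact parameters used in our experiments are not reproduced here.

```python
import numpy as np

def diverse_input(image, low=299, high=330, p=0.5, rng=None):
    """DIM-style transform: with probability p, randomly resize a CHW image
    (nearest-neighbour for simplicity) and zero-pad it back to `high`."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() >= p:
        return image                       # keep the original input
    c, h, w = image.shape
    new = rng.integers(low, high)          # random target size in [low, high)
    ys = np.arange(new) * h // new         # nearest-neighbour row indices
    xs = np.arange(new) * w // new         # nearest-neighbour column indices
    resized = image[:, ys][:, :, xs]
    top = rng.integers(0, high - new + 1)  # random padding offsets
    left = rng.integers(0, high - new + 1)
    out = np.zeros((c, high, high), dtype=image.dtype)
    out[:, top:top + new, left:left + new] = resized
    return out
```

In the attack loop, the gradient is taken through the transformed input rather than the raw one, which is what improves transferability.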

4.5. Adversarial Example Generation for a Model Ensemble

In the third experiment, we combined the four deep neural network models Inc-v3, Inc-v4, IncRes-v2, and Res-101 into an ensemble and used it to generate adversarial examples with the four methods. The results in Table 4 show that, with the ensemble, the success rates on the four source models approach 100% for every method, while the success rates of the generated examples against the adversarially defended models also rise substantially, indicating that ensembling improves black-box transferability. For instance, in Table 4, MI-FGSM-generated adversarial examples attack the defended models Inc-v3ens3, Inc-v3ens4, and IncRes-v2ens with success rates of 40.8%, 35.5%, and 23.4%, whereas in Table 1 the best MI-FGSM results against the same defended models (obtained with Res-101 as the source model) were 23.3%, 20.3%, and 12.2%, respectively. Crucially, Table 4 also reveals that NA-FGTM continues to yield the most transferable adversarial examples after ensembling, particularly against defended models: with success rates of 75.0%, 70.6%, and 62.3% on the three defended models, it is markedly more effective in black-box attacks than the other three methods.
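The ensemble attack follows the common logit-fusion recipe: the logits of the member models are averaged, and a single loss gradient is then taken through the fused logits. A minimal sketch, with equal weighting assumed:

```python
import numpy as np

def fuse_logits(logit_list, weights=None):
    """Weighted average of per-model logits; an attack then differentiates
    the cross-entropy of these fused logits to get one shared gradient."""
    logits = [np.asarray(l, dtype=np.float64) for l in logit_list]
    if weights is None:
        weights = [1.0 / len(logits)] * len(logits)  # equal weights
    return sum(w * l for w, l in zip(weights, logits))
```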
Additionally, Table 5 presents the mean perturbation sizes of the adversarial examples generated by the ensemble under the four methods. NA-FGTM again produces the smallest mean perturbation, 11.48, while among the other methods MI-FGSM yields the smallest at 15.95. Even with the ensemble strategy, NA-FGTM thus consistently generates adversarial examples with the smallest average distortion.

4.6. Adversarial Example Generation with Data Augmentation for a Model Ensemble

In the fourth experiment, we employed both the model ensemble and data augmentation when generating adversarial examples, with the results presented in Table 6. Combining the two significantly enhances black-box success rates compared to examples generated on individual models. For instance, in Table 1, NA-FGTM with IncRes-v2 as the source model achieves black-box success rates of 41.6%, 36.1%, and 30.8% against the defended models Inc-v3ens3, Inc-v3ens4, and IncRes-v2ens; with the ensemble and data augmentation in Table 6, these rates rise to 89.5%, 86.2%, and 81.0%, respectively, substantially outperforming the other three methods. This confirms that combining the model ensemble and data augmentation with our NA-FGTM approach greatly improves the efficiency of black-box attacks.
Moreover, we randomly selected several images from the dataset; Figure 3 shows the adversarial examples produced for them by the four methods on the Inc-v3 model. Visual inspection confirms that the adversarial perturbations are imperceptible to the human eye.

5. Conclusions

This paper primarily focuses on gradient-based black-box attack methodologies, specifically enhancing the iterative fast gradient sign method (IFGSM) family of attacks. IFGSM-based approaches are efficient and representative among attack methods, generally requiring fewer resources and less time than alternatives. Nevertheless, conventional attack algorithms such as MI-FGSM and its variants rely on a sign function to process gradient information, which discards gradient magnitude details, lowers black-box attack success rates, and enlarges the generated adversarial perturbations.
Herein, we propose a novel adversarial attack method termed NA-FGTM. Unlike MI-FGSM and its relatives that utilize a sign function, NA-FGTM adopts the Tanh function to preserve gradient information more effectively, thereby reducing the magnitude of the generated adversarial perturbations. Moreover, NA-FGTM integrates Adam optimization to expedite gradient convergence and employs Nesterov acceleration to enhance the transferability of adversarial examples, thus boosting the black-box attack success rate.
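To make this combination concrete, one iteration of an NA-FGTM-style update can be sketched as below. The classification-loss gradient is abstracted as a `grad_fn` callable, the step size and Adam constants are placeholder defaults, and the exact ordering of the Nesterov lookahead and moment updates may differ from our implementation; this is an illustrative sketch, not reference code.

```python
import numpy as np

def na_fgtm_step(x_adv, grad_fn, m, v, t, alpha=1.6 / 255,
                 mu=1.0, beta1=0.9, beta2=0.999, eps=1e-8):
    """One NA-FGTM-style update: look ahead along the accumulated momentum
    (Nesterov), update Adam's first/second moments, then step with tanh of
    the bias-corrected ratio instead of sign()."""
    x_nes = x_adv + alpha * mu * m             # Nesterov lookahead point
    g = grad_fn(x_nes)                         # loss gradient at lookahead
    m = beta1 * m + (1 - beta1) * g            # first moment (momentum)
    v = beta2 * v + (1 - beta2) * g * g        # second moment
    m_hat = m / (1 - beta1 ** t)               # Adam bias correction
    v_hat = v / (1 - beta2 ** t)
    # tanh keeps gradient magnitude information that sign() would discard
    x_adv = x_adv + alpha * np.tanh(m_hat / (np.sqrt(v_hat) + eps))
    return np.clip(x_adv, 0.0, 1.0), m, v
```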
Furthermore, NA-FGTM can be coupled with data augmentation techniques such as DIM, TIM, and SIM to further raise black-box attack success rates. Experimental results on 1000 images from the ILSVRC 2012 validation set substantiate the superior performance of NA-FGTM in generating adversarial examples compared to other methods. Notably, at comparable attack success rates, NA-FGTM yields adversarial examples with smaller average perturbation magnitudes, and it achieves promising black-box success rates against adversarially defended models, indicating its efficacy against such defenses. That said, NA-FGTM still has room for improvement, for example in generating large numbers of adversarial examples quickly and in scaling to very large deep neural networks, both of which remain challenging; we will explore these directions in future research. Future work will also consider the practical application of our method in real-world scenarios, contributing to the advancement of adversarial defense mechanisms.

Author Contributions

Conceptualization, D.H. and D.C.; methodology, D.H. and D.C.; software, D.H.; validation, D.H., Y.Z. and H.Z.; formal analysis, L.X. and H.Z.; investigation, L.X. and J.J.; resources, J.J. and J.T.; data curation, J.T. and L.X.; writing—original draft preparation, D.H.; writing—review and editing, D.C. and J.T.; visualization, D.H. and J.J.; supervision, H.Z.; project administration, Y.Z.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (No. 62171328, 62171327).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Diagrams of sign function (a) and Tanh function (b).
Figure 2. Gradient optimization trajectories for momentum method (a) and Nesterov method (b).
Figure 3. Adversarial example images generated by MI-FGSM, NI-FGSM, VMI-FGSM, and NA-FGTM methods.
Table 1. Attack success rates (%) of adversarial examples generated by individual models.

| Model | Attack | Inc-v3 | Inc-v4 | IncRes-v2 | Res-101 | Inc-v3ens3 | Inc-v3ens4 | IncRes-v2ens |
|---|---|---|---|---|---|---|---|---|
| Inc-v3 | MI-FGSM | 100.0 * | 43.0 | 41.7 | 34.8 | 12.7 | 12.7 | 6.5 |
| Inc-v3 | NI-FGSM | 100.0 * | 50.1 | 49.1 | 39.9 | 13.1 | 13.4 | 6.2 |
| Inc-v3 | VMI-FGSM | 100.0 * | 51.6 | 48.3 | 42.9 | 25.4 | 26.1 | 17.6 |
| Inc-v3 | NA-FGTM | 100.0 * | 55.4 | 49.3 | 45.0 | 29.1 | 28.5 | 18.3 |
| Inc-v4 | MI-FGSM | 53.9 | 99.9 * | 43.9 | 39.9 | 16.2 | 14.2 | 7.8 |
| Inc-v4 | NI-FGSM | 63.4 | 100.0 * | 50.9 | 44.1 | 15.1 | 14.2 | 7.1 |
| Inc-v4 | VMI-FGSM | 61.3 | 100.0 * | 55.2 | 46.5 | 30.4 | 28.6 | 21.0 |
| Inc-v4 | NA-FGTM | 64.8 | 100.0 * | 54.3 | 47.6 | 32.4 | 30.0 | 23.7 |
| IncRes-v2 | MI-FGSM | 57.1 | 48.6 | 98.5 * | 42.5 | 21.5 | 15.8 | 11.8 |
| IncRes-v2 | NI-FGSM | 61.1 | 52.3 | 99.0 * | 44.1 | 18.7 | 14.8 | 10.4 |
| IncRes-v2 | VMI-FGSM | 68.9 | 63.2 | 100.0 * | 52.5 | 35.9 | 32.4 | 28.1 |
| IncRes-v2 | NA-FGTM | 73.1 | 66.5 | 99.8 * | 58.3 | 41.6 | 36.1 | 30.8 |
| Res-101 | MI-FGSM | 55.8 | 50.1 | 45.6 | 99.3 * | 23.3 | 20.3 | 12.2 |
| Res-101 | NI-FGSM | 63.8 | 57.7 | 56.4 | 99.4 * | 23.7 | 21.6 | 11.7 |
| Res-101 | VMI-FGSM | 60.2 | 55.4 | 53.6 | 99.8 * | 31.7 | 28.0 | 26.3 |
| Res-101 | NA-FGTM | 58.1 | 51.2 | 51.6 | 99.2 * | 36.6 | 32.8 | 27.1 |

* Results of white-box setting.
Table 2. Mean perturbation values (L2) of adversarial examples produced by individual models.

| Attack | Inc-v3 | Inc-v4 | IncRes-v2 | Res-101 |
|---|---|---|---|---|
| MI-FGSM | 15.97 | 16.1 | 15.98 | 15.90 |
| NI-FGSM | 17.49 | 17.59 | 17.41 | 17.35 |
| VMI-FGSM | 21.58 | 22.12 | 21.65 | 21.50 |
| NA-FGTM | 11.81 | 11.96 | 11.74 | 11.76 |
Table 3. Attack success rates (%) of adversarial examples generated using data augmentation on individual models.

| Model | Attack | Inc-v3 | Inc-v4 | IncRes-v2 | Res-101 | Inc-v3ens3 | Inc-v3ens4 | IncRes-v2ens |
|---|---|---|---|---|---|---|---|---|
| Inc-v3 | MI-FGSM | 99.3 * | 64.2 | 59.6 | 52.0 | 18.5 | 18.4 | 9.4 |
| Inc-v3 | NI-FGSM | 99.6 * | 60.6 | 58.8 | 47.6 | 16.1 | 14.4 | 7.8 |
| Inc-v3 | VMI-FGSM | 99.8 * | 61.9 | 59.3 | 53.1 | 31.6 | 34.7 | 21.9 |
| Inc-v3 | NA-FGTM | 99.4 * | 65.8 | 60.2 | 56.8 | 39.8 | 36.8 | 26.4 |
| Inc-v4 | MI-FGSM | 72.0 | 98.5 * | 63.3 | 54.8 | 21.3 | 22.0 | 11.7 |
| Inc-v4 | NI-FGSM | 70.7 | 99.2 * | 58.6 | 49.7 | 17.2 | 16.5 | 8.7 |
| Inc-v4 | VMI-FGSM | 71.2 | 97.6 * | 61.9 | 54.1 | 32.9 | 34.6 | 25.7 |
| Inc-v4 | NA-FGTM | 73.5 | 98.8 * | 63.8 | 56.8 | 41.4 | 39.9 | 30.3 |
| IncRes-v2 | MI-FGSM | 69.5 | 65.7 | 94.2 * | 58.6 | 30.6 | 23.5 | 18.3 |
| IncRes-v2 | NI-FGSM | 68.0 | 63.2 | 97.2 * | 51.0 | 22.1 | 18.1 | 11.4 |
| IncRes-v2 | VMI-FGSM | 70.3 | 65.2 | 96.5 * | 59.6 | 35.3 | 31.6 | 34.9 |
| IncRes-v2 | NA-FGTM | 73.8 | 69.1 | 94.0 * | 62.2 | 47.9 | 42.3 | 40.4 |
| Res-101 | MI-FGSM | 76.4 | 68.2 | 69.9 | 98.4 * | 37.3 | 33.6 | 20.8 |
| Res-101 | NI-FGSM | 75.2 | 68.2 | 68.4 | 99.0 * | 27.9 | 26.2 | 15.6 |
| Res-101 | VMI-FGSM | 78.1 | 71.4 | 74.6 | 99.1 * | 48.7 | 45.2 | 32.1 |
| Res-101 | NA-FGTM | 81.0 | 74.1 | 75.1 | 98.9 * | 63.5 | 57.6 | 48.0 |

* Results of white-box setting.
Table 4. Attack success rates (%) of adversarial examples generated by the ensemble model.

| Attack | Inc-v3 | Inc-v4 | IncRes-v2 | Res-101 | Inc-v3ens3 | Inc-v3ens4 | IncRes-v2ens |
|---|---|---|---|---|---|---|---|
| MI-FGSM | 99.9 | 99.1 | 96.2 | 99.9 | 40.8 | 35.5 | 23.4 |
| NI-FGSM | 99.8 | 99.8 | 99.0 | 99.8 | 41.8 | 34.9 | 25.5 |
| VMI-FGSM | 99.9 | 99.9 | 100.0 | 99.9 | 59.6 | 61.8 | 53.4 |
| NA-FGTM | 99.9 | 99.9 | 99.8 | 100.0 | 75.0 | 70.6 | 62.3 |
Table 5. Mean perturbation values (L2) of adversarial examples produced by the ensemble model.

| Attack | Avg. Perturbation (L2) |
|---|---|
| MI-FGSM | 15.95 |
| NI-FGSM | 16.71 |
| VMI-FGSM | 20.82 |
| NA-FGTM | 11.48 |
Table 6. Attack success rates (%) of adversarial examples generated by the model ensemble with data augmentation.

| Attack | Inc-v3 | Inc-v4 | IncRes-v2 | Res-101 | Inc-v3ens3 | Inc-v3ens4 | IncRes-v2ens |
|---|---|---|---|---|---|---|---|
| MI-FGSM | 99.5 | 98.2 | 95.7 | 99.7 | 58.9 | 52.9 | 38.5 |
| NI-FGSM | 100.0 | 99.8 | 99.5 | 100.0 | 47.5 | 42.7 | 28.2 |
| VMI-FGSM | 99.9 | 99.7 | 99.9 | 99.8 | 76.4 | 71.8 | 65.3 |
| NA-FGTM | 99.8 | 99.7 | 99.8 | 99.3 | 89.5 | 86.2 | 81.0 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hong, D.; Chen, D.; Zhang, Y.; Zhou, H.; Xie, L.; Ju, J.; Tang, J. Efficient Adversarial Attack Based on Moment Estimation and Lookahead Gradient. Electronics 2024, 13, 2464. https://doi.org/10.3390/electronics13132464
