Article

Enhancing Transferability with Intra-Class Transformations and Inter-Class Nonlinear Fusion on SAR Images

1
School of Electronic Science, National University of Defense Technology (NUDT), Changsha 410073, China
2
Test Center, National University of Defense Technology (NUDT), Xi’an 710100, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(14), 2539; https://doi.org/10.3390/rs16142539
Submission received: 21 May 2024 / Revised: 22 June 2024 / Accepted: 27 June 2024 / Published: 10 July 2024
(This article belongs to the Section Remote Sensing Image Processing)

Abstract

Recent research has revealed that deep neural network (DNN)-based synthetic-aperture radar (SAR) automatic target recognition (ATR) techniques are vulnerable to adversarial examples, which poses significant security risks for their deployment in real-world systems. At the same time, adversarial examples often exhibit transferability across DNN models: examples generated on a surrogate model can also attack other target models. As a key property in black-box scenarios, transferability has been enhanced by various methods, among which input transformations have demonstrated excellent effectiveness. However, we find that existing transformations yield only limited gains in transferability on SAR images because of their unique imaging mechanism and scattering characteristics. To overcome this issue, we propose a novel method called intra-class transformations and inter-class nonlinear fusion attack (ITINFA). It enhances transferability from two perspectives: intra-class single-image transformations and inter-class multiple-image fusion. The intra-class transformations module utilizes a series of diverse transformations that align with the intrinsic characteristics of SAR images to obtain a more stable gradient update direction and prevent the adversarial examples from overfitting the surrogate model. The inter-class fusion strategy incorporates the information from other categories in a nonlinear manner, effectively enhances the feature fusion effect, and guides the misclassification of adversarial examples. Extensive experiments on the MSTAR and SEN1-2 datasets demonstrate that ITINFA exhibits significantly better transferability than existing transfer-based methods, with the average transfer attack success rate increasing by more than 8% for single models and more than 4% for ensemble models.

1. Introduction

Synthetic-aperture radar (SAR), as a sensor utilizing microwaves for imaging, can conduct all-weather, day-and-night imaging without being influenced by factors such as lighting and weather conditions. Through imaging processing algorithms, high-resolution images can be obtained, enabling the accurate extraction of feature information for target recognition [1]. Therefore, SAR holds significant application value in both military and civilian domains [2]. In recent years, with the rapid development of deep learning, deep neural networks (DNNs) have made significant breakthroughs in SAR automatic target recognition (ATR). With end-to-end training and powerful feature extraction capability, DNN-based methods have dramatically surpassed traditional recognition methods in terms of recognition efficiency and accuracy [3,4,5,6,7,8,9].
However, the internal working process and decision-making mechanisms of DNN models are not transparent, and their reliability and robustness still need improvement. Recent research has shown that DNN models are vulnerable to adversarial examples, which are crafted by adding subtle yet carefully designed perturbations to the input images [10,11]. While appearing virtually indistinguishable from the original images to the human eye, they can cause DNN models to produce incorrect results. The existence of adversarial examples reveals the vulnerability inherent in DNN models, which poses significant security risks for their deployment and application in radar intelligent recognition systems [12]. Consequently, this issue has garnered widespread attention and has become a hot topic in the SAR-ATR field.
Current research on SAR adversarial attacks primarily focuses on white-box scenarios, where the attacker has full access to the internal information of the target model, including its structures, parameters, and gradients [13,14,15,16]. However, in the real world, the opponent’s SAR systems and deep recognition models are typically unknown, making it impractical to employ white-box attacks directly. Consequently, it is imperative to conduct attacks in black-box settings, which do not necessitate any information about the target model and align with the SAR non-cooperative scenarios. One of the significant properties in black-box scenarios is the transferability of adversarial examples, whereby the adversarial examples generated on the surrogate model can also attack other models. Nevertheless, black-box attacks are unable to achieve comparable performance to white-box attacks, as the generated adversarial examples tend to overfit the surrogate model and have limited transferability on the target model. For instance, as contrasted in Figure 1, all attack algorithms achieve nearly 100% attack success rates in white-box settings. However, in black-box scenarios, the majority of attack algorithms exhibit transfer attack success rates between 60% and 70%, a significant decrease compared to their white-box performance. Consequently, enhancing the transferability of SAR adversarial examples is a significant and challenging problem.
Various methods have been proposed to enhance transferability, among which input transformations stand out as a prevailing strategy. The input transformation-based attacks aim to apply a series of transformations to enhance the input diversity and alleviate overfitting to the surrogate model. These transformations include resizing and padding [17], scaling [18], translation [19], and so on. We proceed in this direction and discover that existing transformations typically neglect the unique characteristics of SAR images, such as the multiplicative speckle noise generated by the coherent principle, which results in limited diversity and insufficient transformations for SAR images. Moreover, Admix [20] introduces images from other categories to achieve transformation. However, due to the high contrast and sharp edges between different regions in SAR images, its linear fusion strategy results in insufficient feature fusion and the disruption of original information.
To tackle these problems, we have explored two special strategies via creating diverse input patterns. First, from the perspective of intra-class transformation, we argue that enhancing the input diversity and considering some transformations that align with the characteristics of SAR images can achieve a superior transformation effect and enrich the gradient information, thereby stabilizing the update direction to avoid falling into local optima and alleviating the overfitting to the surrogate model. Second, from the perspective of inter-class transformation, we believe that nonlinear fusion is more suitable for SAR images than a linear mixing strategy, thereby reducing damage to the original features and achieving a better fusion effect.
In this article, we propose a novel method called intra-class transformations and inter-class nonlinear fusion attack (ITINFA) to enhance transferability. At first, we employ a series of transformations on a single image to increase the input diversity. Subsequently, we nonlinearly integrate the information from other categories into the transformed images to guide the adversarial examples to cross the decision boundary. Finally, we utilize the various transformed and mixed images to calculate the average gradient and combine the momentum iterative process to generate more transferable SAR adversarial examples. Extensive experiments on the MSTAR and SEN1-2 datasets demonstrate that ITINFA outperforms the state-of-the-art transfer-based attacks, dramatically enhancing the transferability in black-box scenarios while preserving comparable attack performance in white-box scenarios. The main contributions are summarized as follows:
  • We propose a novel method called ITINFA for generating more transferable adversarial examples to fool the DNN-based SAR image classifiers in black-box scenarios.
  • To improve the limited diversity and insufficient transformations of existing methods commonly utilized for optical images, we fully consider the unique characteristics of SAR imagery and apply diverse intra-class transformations to obtain a more stable update direction, which helps to avoid falling into local optima and alleviates overfitting to the surrogate model.
  • To overcome the insufficient fusion and damage of existing linear mixing, we devise a nonlinear fusion strategy according to the high-contrast characteristic inherent in SAR images, which is beneficial to reduce the background clutter information and avoid excessive disruption of the original SAR image, thereby incorporating information from other categories more effectively.
  • Extensive evaluations based on the MSTAR and SEN1-2 datasets show that the proposed ITINFA significantly improves the transferability of SAR adversarial examples on various cutting-edge DNN models.
The remainder of this article is organized as follows. Section 2 briefly reviews the related works. Section 3 introduces the details of our proposed ITINFA. Section 4 shows experimental results on the MSTAR and SEN1-2 datasets. Finally, we present the discussion and conclusions in Section 5 and Section 6.

2. Related Works

In this section, we first briefly introduce adversarial attacks in the image classification domain and the transferability of adversarial examples. Secondly, we provide a review of input transformation-based attacks, the most commonly employed approach to enhance the transferability of adversarial examples. Finally, we review the related works about adversarial attacks in SAR-ATR.

2.1. Adversarial Attacks and Transferability

Suppose $x$ denotes the input image and $y^{true}$ represents the ground-truth label. $F_\theta$ denotes the classifier with parameters $\theta$, which can accurately predict the category of the input image:
$$F_\theta(x) = y^{true}. \tag{1}$$
Adversarial attacks aim to find a subtle adversarial perturbation $\delta$ that can induce misclassification while remaining imperceptible to the human eye, which is expressed in the following mathematical form:
$$F_\theta(x + \delta) \neq y^{true} \quad \text{s.t.} \quad \|\delta\|_p \le \epsilon, \tag{2}$$
where $\|\delta\|_p \le \epsilon$ represents the $\ell_p$ norm constraint that guarantees the adversarial perturbation $\delta$ is visually imperceptible. In this article, we mainly focus on the constraint $p = \infty$, which is widely adopted in various transfer-based attacks.
FGSM: As the first gradient-based attack, fast gradient sign method (FGSM) [13] calculates the gradient of the loss function and updates the adversarial examples along the direction of the gradient, which is expressed as follows:
$$x^{adv} = x + \epsilon \cdot \text{sign}\left(\nabla_x J(x, y^{true}; \theta)\right), \tag{3}$$
where $\text{sign}(\cdot)$ indicates the sign function and $\nabla_x J(\cdot)$ denotes the gradient of the loss function with respect to the input $x$. Due to its single-update strategy, FGSM exhibits high efficiency but limited attack performance.
I-FGSM: Building upon the single-step update of FGSM, Kurakin et al. [14] introduced an iterative version named the iterative fast gradient sign method (I-FGSM). I-FGSM extends FGSM through multiple iterative updates and is described as follows:
$$x_{t+1}^{adv} = x_t^{adv} + \alpha \cdot \text{sign}\left(\nabla_x J\left(x_t^{adv}, y^{true}; \theta\right)\right), \tag{4}$$
where $x_0^{adv} = x$ and $\alpha = \epsilon / n$, with $n$ being the number of iterations. Research has demonstrated that I-FGSM exhibits more powerful white-box attack performance than FGSM, but poorer transferability due to overfitting the surrogate model.
MI-FGSM: To alleviate the issue of overfitting and improve the transferability, Dong et al. [21] proposed the momentum iterative fast gradient sign method (MI-FGSM). MI-FGSM integrates a momentum term into I-FGSM to stabilize the update direction and prevent it from falling into local optima; it is represented as follows:
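$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J\left(x_t^{adv}, y^{true}; \theta\right)}{\left\|\nabla_x J\left(x_t^{adv}, y^{true}; \theta\right)\right\|_1}, \tag{5}$$
$$x_{t+1}^{adv} = x_t^{adv} + \alpha \cdot \text{sign}\left(g_{t+1}\right), \tag{6}$$
where $\mu$ denotes the decay factor and $g_t$ indicates the accumulated gradient information up to the $t$-th iteration.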
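The momentum update can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than a reference implementation: the surrogate is a hypothetical linear softmax classifier (`W`, `loss_grad`), chosen only because its input gradient has a closed form, and the values of `eps`, `n_iter`, and `mu` are placeholders. Setting `mu=0` reduces the loop to I-FGSM, and additionally setting `n_iter=1` reduces it to FGSM.

```python
import numpy as np

def loss_grad(x, y, W):
    """Input gradient of softmax cross-entropy for a toy model logits = W @ x."""
    logits = W @ x
    p = np.exp(logits - logits.max())   # numerically stable softmax
    p /= p.sum()
    p[y] -= 1.0                         # dJ/dlogits = softmax - one_hot(y)
    return W.T @ p                      # chain rule back to the input

def mi_fgsm(x, y, W, eps=0.1, n_iter=10, mu=1.0):
    """Momentum iterative FGSM with step size alpha = eps / n_iter."""
    alpha = eps / n_iter
    g = np.zeros_like(x)                # accumulated (momentum) gradient
    x_adv = x.copy()
    for _ in range(n_iter):
        grad = loss_grad(x_adv, y, W)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)  # L1-normalised gradient
        x_adv = x_adv + alpha * np.sign(g)
    return x_adv
```

Because every step moves each pixel by at most `alpha`, the accumulated perturbation stays within the $\ell_\infty$ budget `eps` without an explicit projection.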
NI-FGSM: Lin et al. [18] adopted the Nesterov algorithm to stabilize the gradient update and improve the transferability. The optimization algorithm is called the Nesterov iterative fast gradient sign method (NI-FGSM), which can be expressed as follows:
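$$x_t^{nes} = x_t^{adv} + \alpha \cdot \mu \cdot g_t, \tag{7}$$
$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J\left(x_t^{nes}, y^{true}; \theta\right)}{\left\|\nabla_x J\left(x_t^{nes}, y^{true}; \theta\right)\right\|_1}. \tag{8}$$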
VMI-FGSM: Wang et al. [22] proposed an approach to utilize the gradient variance from previous iterations to stabilize the update direction, and thereby avoid local optima during the search process. Specifically, they approximate the gradient variance by sampling N images in the neighborhood of x, as follows:
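$$V(x) = \frac{1}{N} \sum_{i=1}^{N} \nabla_{x_i} J\left(x_i, y^{true}; \theta\right) - \nabla_x J(x, y^{true}; \theta), \tag{9}$$
where $x_i$ is uniformly sampled in the neighborhood of $x$. Subsequently, they integrate the gradient variance into the momentum iteration, as follows:
$$g_{t+1} = \mu \cdot g_t + \frac{\hat{g}_{t+1} + v_t}{\left\|\hat{g}_{t+1} + v_t\right\|_1}, \tag{10}$$
where $\hat{g}_{t+1}$ is the gradient of the loss at the current iterate $x_t^{adv}$ and $v_t$ is the gradient variance carried over from the previous iteration.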

2.2. Input Transformation-Based Adversarial Attacks

Recent studies [18,21,23] have analogized the optimization process of adversarial examples to the training process of neural networks, comparing the transferability of adversarial examples to the generalization ability of the trained networks. Applying data augmentation to the input images is a standard practice for improving a model's generalization ability, and the same strategies have proven effective in enhancing the transferability of adversarial examples.
DIM: As the first input transformation-based attack, DIM [17] employs random resizing and padding on the input images. This involves resizing the images to a random size and subsequently padding them with zeros to revert to their initial size. To strike a balance between the performance of white-box attacks and transferability, a transformation probability p is introduced to determine whether to perform the transformation. The complete process is illustrated as follows:
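$$T\left(x_t^{adv}; p\right) = \begin{cases} T\left(x_t^{adv}\right) & \text{with probability } p \\ x_t^{adv} & \text{with probability } 1 - p \end{cases}, \tag{11}$$
$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J\left(T\left(x_t^{adv}; p\right), y^{true}; \theta\right)}{\left\|\nabla_x J\left(T\left(x_t^{adv}; p\right), y^{true}; \theta\right)\right\|_1}, \tag{12}$$
where $T\left(x_t^{adv}; p\right)$ denotes randomly resizing and padding the image with probability $p$.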
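The resize-and-pad transformation can be illustrated as follows. This is a hedged sketch that follows the description above (shrink to a random size, then zero-pad back to the initial size); the function name `diverse_input`, the lower size bound `low`, the nearest-neighbour resampling, and the random placement of the resized patch are assumptions made for illustration.

```python
import numpy as np

def diverse_input(x, rng, p=0.5, low=0.8):
    """With probability p, shrink the 2-D image x and zero-pad it back to its size."""
    if rng.random() >= p:
        return x                                   # keep the input unchanged
    h, w = x.shape
    new_h = int(rng.integers(int(low * h), h + 1))
    new_w = int(rng.integers(int(low * w), w + 1))
    rows = np.arange(new_h) * h // new_h           # nearest-neighbour row indices
    cols = np.arange(new_w) * w // new_w           # nearest-neighbour column indices
    small = x[np.ix_(rows, cols)]
    top = int(rng.integers(0, h - new_h + 1))      # random placement inside the canvas
    left = int(rng.integers(0, w - new_w + 1))
    out = np.zeros_like(x)
    out[top:top + new_h, left:left + new_w] = small
    return out
```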
SIM: Lin et al. [18] discovered that DNNs exhibit a scale-invariant property, where the outputs remain similar for both the original and scaled versions of images. Leveraging this property, they introduced the scale-invariant method (SIM), which involves scaling images with different factors and averaging the gradients obtained from these scaled images. This process is described as follows:
$$\bar{g}_{t+1} = \frac{1}{m} \sum_{i=0}^{m-1} \nabla_x J\left(\frac{x_t^{adv}}{2^i}, y^{true}; \theta\right), \tag{13}$$
where $m$ is the number of scaled images.
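As a small worked illustration of the averaging over scaled copies, the sketch below takes any gradient function and averages its output over $x / 2^i$. The callable `grad_fn` is a stand-in for the surrogate model's input gradient and is an assumption of this example; a literal implementation of the formula would evaluate it exactly like this.

```python
import numpy as np

def sim_gradient(grad_fn, x_adv, m=5):
    """Average grad_fn over the m scaled copies x_adv / 2**i, i = 0..m-1."""
    return sum(grad_fn(x_adv / 2**i) for i in range(m)) / m
```

For example, with the toy loss $J(z) = \|z\|_2^2$, whose gradient is $2z$, the averaged gradient for $m = 5$ is $2x \cdot (1 + 1/2 + 1/4 + 1/8 + 1/16) / 5 = 0.775\,x$.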
TIM: Dong et al. [19] proposed the translation-invariant method (TIM), which utilizes multiple translated images to calculate the average gradient for optimization. Furthermore, they approximated this process by convolving the gradient with a pre-defined convolutional kernel W to generate adversarial examples, which can be represented as follows:
$$g_{t+1} = \mu \cdot g_t + \frac{W * \nabla_x J\left(x_t^{adv}, y^{true}; \theta\right)}{\left\|W * \nabla_x J\left(x_t^{adv}, y^{true}; \theta\right)\right\|_1}. \tag{14}$$
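The gradient smoothing step can be sketched as a plain 2-D convolution. The Gaussian form of the kernel $W$, its size, and its width are assumptions chosen for illustration, since the text only states that $W$ is pre-defined; the helper names are hypothetical.

```python
import numpy as np

def gaussian_kernel(size=7, sigma=3.0):
    """Normalised 2-D Gaussian kernel (size should be odd)."""
    ax = np.arange(size) - (size - 1) / 2
    k = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k2 = np.outer(k, k)
    return k2 / k2.sum()

def smooth_gradient(grad, kernel):
    """Zero-padded 'same' convolution of a 2-D gradient with a symmetric kernel."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(grad, ((ph, ph), (pw, pw)))
    out = np.zeros_like(grad, dtype=float)
    for i in range(kh):                 # accumulate one shifted copy per kernel tap
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + grad.shape[0], j:j + grad.shape[1]]
    return out
```

The smoothed gradient would then replace the raw gradient in the momentum update before the $\ell_1$ normalisation.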
Admix: Zhang et al. [24] proposed an effective data augmentation strategy known as mixup, aimed at enhancing the generalization capabilities of training models. It treats two images equally and mixes them, including their respective labels. This process is expressed as follows:
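$$\tilde{x} = \lambda x_i + (1 - \lambda) x_j, \quad \tilde{y} = \lambda y_i + (1 - \lambda) y_j, \tag{15}$$
where $x_i$ and $x_j$ are the original images, $y_i$ and $y_j$ are their one-hot label encodings, and $\lambda \in [0, 1]$ is the mixing coefficient.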
Drawing inspiration from this strategy, Wang et al. [20] proposed introducing images from other categories for input transformation. Specifically, they made two key enhancements to the mixing strategy. First, they substituted the equal manner with the master and slave manner, which considered the original image as primary and the introduced images as secondary. Controlling the weights ensured that the introduced images contributed only a small portion. Second, they only mixed the images while retaining the labels, which was able to avoid introducing the gradient of other categories during optimization. The process is expressed as
$$\tilde{x} = \gamma \cdot x + \eta' \cdot x' = \gamma \cdot \left(x + \eta \cdot x'\right), \tag{16}$$
where $x$ is the original image and $x'$ is the image sampled from other categories. The ratio $\eta = \eta' / \gamma$ is employed to ensure that $x'$ only occupies a small portion of the admixed image $\tilde{x}$. Subsequently, they calculate the average gradient on a series of admixed images to generate adversarial examples, which can be represented as follows:
$$\bar{g}_{t+1} = \frac{1}{m_1 \cdot m_2} \sum_{x' \in X'} \sum_{i=0}^{m_1 - 1} \nabla_x J\left(\gamma_i \cdot \left(x_t^{adv} + \eta \cdot x'\right), y^{true}; \theta\right), \tag{17}$$
where $m_1$ is the number of admixed images controlled by different weight factors $\gamma_i$, $X'$ is the set of images sampled from other categories, and $m_2$ is the number of images in $X'$.
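The construction of the admixed inputs can be sketched as below. The choice $\gamma_i = 1/2^i$ mirrors the scaling used by SIM and is an assumption here, as are the function name `admix_copies` and the default value of `eta`; the labels are left untouched, as described above.

```python
import numpy as np

def admix_copies(x_adv, others, eta=0.2, m1=5):
    """Return the m1 * m2 admixed inputs gamma_i * (x_adv + eta * x')."""
    copies = []
    for x_other in others:                  # m2 images sampled from other categories
        mixed = x_adv + eta * x_other       # the original image remains dominant
        for i in range(m1):
            copies.append(mixed / 2 ** i)   # gamma_i = 1 / 2**i (assumed)
    return np.stack(copies)
```

The average gradient is then obtained by evaluating the surrogate's input gradient on every copy and taking the mean.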

2.3. Adversarial Attacks in SAR-ATR

In addition to garnering significant attention in computer vision, adversarial attacks have also become a research hotspot in SAR-ATR. Li et al. [25] first pointed out that adversarial examples also existed in SAR images. They empirically analyzed SAR adversarial examples, noting that SAR images containing richer information are more vulnerable to adversarial perturbations. Following this, Huang et al. [26] employed several well-known algorithms to craft adversarial examples, illustrating the vulnerability of SAR target recognition models to such threats. Du et al. [27] designed a generative adversarial network (GAN) to generate SAR adversarial examples that were both efficiently crafted and highly similar to measured SAR images. Subsequently, they further proposed utilizing a U-Net encoding network [28] to replace the traditional optimization process of the C&W algorithm [15], significantly increasing computational speed while maintaining comparable attack performance.
As research in this domain advanced, researchers started to incorporate SAR image characteristics to generate adversarial examples and considered more practical attack scenarios. Peng et al. [29] transformed the input image to the corresponding speckle variant and applied the target mask to confine the perturbation within the target region, aligning with practical physical implementations. Zhang et al. [30] also emphasized that adversarial perturbation confined to the target region would lead to significant performance degradation. Du et al. [31] utilized a generator to generate local universal adversarial perturbations on target regions, achieving comparable performance to global adversarial perturbations. In a fully black-box scenario, where training samples were unavailable, Peng et al. [32] devised a framework for generating universal adversarial perturbations. Experiments on the MSTAR and SARSIM datasets revealed significant effectiveness of their framework for SAR image classification models. Zhang et al. [33] exploited the characteristics of SAR image data to design several objective functions, achieving better performance compared to other baseline approaches. Lin et al. [34] proposed a targeted attack method that extracted the feature maps at shallow layers and constructed the feature-level loss function to generate adversarial examples. Although these methods consider various scenarios in the real world, there is still a long way to go before physically implementing the adversarial perturbations in the signal domain. Furthermore, the transferability remains severely limited when confronted with unknown target models. Therefore, enhancing the transferability of SAR adversarial examples poses a highly challenging task, and this article mainly focuses on developing an algorithm to generate more transferable adversarial examples in black-box settings.

3. Methods

In this section, we provide a detailed description of the proposed ITINFA. Firstly, we elaborate on several intra-class input transformation methods that align with the characteristics of SAR images. They are employed on the single input image to generate diverse transformed images and obtain rich gradient information, thereby alleviating overfitting to the surrogate model. Subsequently, an inter-class nonlinear fusion strategy is employed to incorporate information from other categories into the transformed images, which is beneficial to guide adversarial examples to cross the decision boundary to induce misclassification. Finally, we combine the momentum iterative algorithm with the intra-class transformations and inter-class nonlinear fusion strategies to further enhance the transferability of adversarial examples. The framework of the proposed ITINFA is shown in Figure 2.

3.1. Intra-Class Transformations

Input transformation-based methods [17,18,19,20] have significantly improved the transferability of adversarial examples. They regard generating adversarial examples as a network training process and utilize data augmentation, which is widely employed to improve the generalization ability of the network, to enhance the transferability [23]. However, due to the substantial disparities between SAR images and optical images in characteristics, spectral bands, and imaging mechanisms, these input transformation-based methods still yield limited improvement in the transferability of SAR adversarial examples. Therefore, it is necessary to consider transformations that align with the characteristics of SAR images, such as the speckle transformation method proposed in [29], which considers the inherent speckle noise in SAR images and alleviates overfitting to the surrogate model. Moreover, Wang et al. [35] proposed and empirically validated that more diverse transformed images result in better transferability. Motivated by this finding, we employ a series of input transformations on a single input image to obtain more diverse transformed images. The employed input transformations are shown in Figure 3.

3.1.1. Translation

Because convolutional neural networks inherently exhibit a certain level of small-scale translation invariance [36], we consider translation to simulate the positional deviation. Translation can be mathematically expressed as follows:
$$x'_{i,j} = x_{i+u,\, j+v}, \tag{18}$$
where $x$ is the input image of size $m \times n$, $x'$ is the translated image, $(i, j)$ is the original coordinate, and $(u, v)$ is the shift. In this article, we select a random step for translation ranging from 10 to 30 pixels in the range and azimuth directions, which can guarantee that the target does not exceed the image boundary.
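A direct NumPy rendering of the translation is given below as a sketch. Filling the pixels that are shifted in from outside the image with zeros, and drawing the sign of each shift at random, are assumptions of this example; only the 10 to 30 pixel range comes from the text.

```python
import numpy as np

def translate(x, u, v):
    """Return x' with x'[i, j] = x[i + u, j + v]; out-of-range pixels become 0."""
    h, w = x.shape
    i = np.arange(h)[:, None] + u
    j = np.arange(w)[None, :] + v
    valid = (i >= 0) & (i < h) & (j >= 0) & (j < w)
    ii, jj = np.broadcast_arrays(i, j)
    out = np.zeros_like(x)
    out[valid] = x[ii[valid], jj[valid]]
    return out

def random_translate(x, rng, low=10, high=30):
    """Shift by a random 10-30 pixel step along each axis (random sign assumed)."""
    steps = rng.integers(low, high + 1, size=2) * rng.choice([-1, 1], size=2)
    return translate(x, int(steps[0]), int(steps[1]))
```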

3.1.2. Rotation

As a typical geometric data augmentation technique, rotation is widely employed to transform the input images. In this article, we also utilize rotation to simulate the ideal perspective variation in SAR images, generating more diverse input images. Specifically, we randomly select one angle from 90°, 180°, and 270° to perform the input transformation.

3.1.3. Multiplicative Speckle Noise

Because SAR is a coherent imaging system, multiplicative speckle noise is inherent in SAR images [37]. According to the multiplicative noise model, the observed intensity at each resolution cell of a SAR image is formulated as the radar cross-section (RCS) multiplied by an exponentially distributed noise term, which can be modeled as follows [36]:
$$I = \sigma \times n, \tag{19}$$
$$p_n(n) = \frac{e^{-n}}{1 - e^{-a}}, \tag{20}$$
where $I$ represents the observed intensity and $\sigma$ denotes the estimated RCS of the illuminated resolution cell. $n$ denotes the multiplicative speckle noise, which follows an exponential distribution truncated to $[0, a]$ with parameter $a$.
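Samples from the truncated exponential density can be drawn by inverting its cumulative distribution function, $F(n) = (1 - e^{-n}) / (1 - e^{-a})$, which gives $n = -\ln\left(1 - U\,(1 - e^{-a})\right)$ for uniform $U \in [0, 1)$. The sketch below assumes the support $[0, a]$ implied by the normalising constant and treats the input image as the RCS estimate $\sigma$; the function names and the default value of `a` are illustrative.

```python
import numpy as np

def sample_speckle(shape, a, rng):
    """Draw speckle noise with density exp(-n) / (1 - exp(-a)) on [0, a]."""
    u = rng.random(shape)                            # uniform samples in [0, 1)
    return -np.log1p(-u * (1.0 - np.exp(-a)))        # inverse-CDF sampling

def speckle_variant(x, rng, a=2.0):
    """Multiply the image (taken as the RCS estimate) by fresh speckle noise."""
    return x * sample_speckle(x.shape, a, rng)
```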

3.1.4. Additive Noise

Most imaging sensors are dominated by additive noise. In addition to the multiplicative noise model, we also consider the additive noise model to diversify the input images further. Specifically, we consider three types of noise in this article: Gaussian noise, salt-and-pepper noise, and uniform noise, and randomly select one type of noise to add to the original SAR images for input transformations.
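The random choice among the three additive noise types can be sketched as follows. The noise strengths (`std`, `width`, `amount`) and the final clipping to $[0, 1]$ are assumptions of this example, as the text does not state them; the input is assumed to be a floating-point image normalized to $[0, 1]$.

```python
import numpy as np

def add_random_noise(x, rng, std=0.05, width=0.05, amount=0.02):
    """Add Gaussian, uniform, or salt-and-pepper noise, chosen at random."""
    kind = rng.choice(["gaussian", "uniform", "salt_pepper"])
    if kind == "gaussian":
        out = x + rng.normal(0.0, std, x.shape)
    elif kind == "uniform":
        out = x + rng.uniform(-width, width, x.shape)
    else:
        out = x.copy()
        mask = rng.random(x.shape)
        out[mask < amount / 2] = 0.0         # pepper: darkest value
        out[mask > 1 - amount / 2] = 1.0     # salt: brightest value
    return np.clip(out, 0.0, 1.0)
```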

3.1.5. Denoising

SAR images are often affected by significant amounts of noise, such as Gaussian noise and speckle noise [38]. Therefore, denoising is necessary to enhance the quality of SAR images, enabling better application in downstream tasks such as target detection and recognition. In this article, we randomly select one of the following denoising methods for input transformations: SAR-block-matching 3D (SAR-BM3D) and discrete cosine transform (DCT). SAR-BM3D [39] is one of the most promising nonlocal despeckling algorithms, achieving a satisfactory balance between speckle reduction and detail preservation. Its basic idea follows the BM3D algorithm [40] for additive white Gaussian noise denoising, but revises the processing steps and considers the unique characteristics of SAR images. DCT [35] is a transformation method that transforms the input image to the frequency domain, eliminates high-frequency components from the image, and restores the image to the spatial domain with the inverse DCT.
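The DCT-based transformation can be illustrated with an orthonormal DCT-II built directly in NumPy. This is a sketch under assumptions: the rectangular low-frequency mask and the `keep` ratio are illustrative choices, since the text only states that high-frequency components are eliminated. SAR-BM3D is considerably more involved and is not sketched here.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ C.T is the identity."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c

def dct_lowpass(x, keep=0.5):
    """Transform to the DCT domain, zero the high frequencies, transform back."""
    h, w = x.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    coeffs = ch @ x @ cw.T                           # forward 2-D DCT
    mask = np.zeros_like(coeffs)
    mask[:int(keep * h), :int(keep * w)] = 1.0       # keep low frequencies only
    return ch.T @ (coeffs * mask) @ cw               # inverse 2-D DCT
```

With `keep=1.0` the mask retains every coefficient and the input is recovered up to floating-point error, which is a convenient sanity check for the transform pair.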

3.2. Inter-Class Nonlinear Fusion

Adversarial attacks seek to mislead DNNs, resulting in the misclassification of adversarial examples into wrong categories. Mixing images from other categories during the input transformation process will aid in guiding adversarial examples to cross the decision boundary towards other categories, thereby achieving misclassification. However, existing research has indicated that Admix has limitations due to its linear mixing strategy [41], which incorporates the feature information of the original image insufficiently and without adaptation. At the same time, as sharp edges exist between different regions in SAR images, it may disrupt certain regions in the original SAR image, leading to a suboptimal mixing effect. For instance, when a region in the original image has relatively small pixel values, and the corresponding pixel values in the introduced image are significantly large, even though the weighting coefficient is small, the linear addition may still result in substantial changes in that region and disrupt the feature information of the original image. Furthermore, linear addition may also incorporate background clutter information excessively, which undermines the mixing effect and hinders the misclassification of adversarial examples.
To overcome these problems, we utilize a nonlinear fusion strategy for a better mixing effect. First, we notice that the radar cross-section (RCS) of the target is relatively large, corresponding to regions with higher pixel values in SAR imagery. Therefore, we map the introduced images to a nonlinear space in the form of an exponential function. This exponential mapping expands the information within higher-pixel-value regions while compressing information within lower-pixel-value regions, thereby effectively concentrating the primary information within the target region and mitigating the influence of background clutter. Moreover, we employ multiplication modulation based on the original image, which can alleviate the distortions to the original image features when there exists a substantial difference between the introduced image and the original image within certain regions, thereby integrating information from other categories in a smooth fashion. The complete process is expressed as follows:
$$\tilde{x} = x \odot e^{x' / r}, \tag{21}$$
where $x$ is the original image and $x'$ is the randomly sampled image from another category. $\odot$ denotes the element-wise product. $r$ represents the parameter of the exponential function, which is used to modulate the strength of the nonlinear mapping.
Through the nonlinear multiplication strategy, we conduct information fusion more efficiently because it mitigates the disruptive effect on the original image and reduces the background clutter information in the introduced image. Since it is based on pixel-wise multiplication of the original image, the mixed image can preserve the information of the original image more adaptively and avoid excessive disruption while incorporating information from other categories to induce misclassification.
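The difference between linear mixing and the multiplicative fusion can be seen on a toy example. The sketch below assumes the exponent takes the form $x'/r$ and that images are normalized to $[0, 1]$; the values of `r` and `eta` and the function names are illustrative, not taken from the text.

```python
import numpy as np

def nonlinear_fuse(x, x_other, r=2.0):
    """Element-wise product of x with the exponentially mapped image x_other."""
    return x * np.exp(x_other / r)

def linear_mix(x, x_other, eta=0.2):
    """Admix-style linear addition, shown for comparison."""
    return x + eta * x_other
```

For a dark region of the original image (`x = 0`) that coincides with a bright region of the introduced image (`x_other = 1`), the linear mix raises the pixel to `eta`, whereas the multiplicative fusion leaves it at 0. This is the sense in which multiplication limits the disruption of the original features.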

3.3. Intra-Class Transformations and Inter-Class Nonlinear Fusion Attack

Combining the above intra-class transformation and inter-class fusion strategies, we propose a novel input transformation-based attack, namely, intra-class transformations and inter-class nonlinear fusion attack (ITINFA). Firstly, we apply a series of image transformations $T_i(\cdot)$ randomly selected from those in Section 3.1 to obtain diverse inputs and prevent the adversarial examples from overfitting the surrogate model. Secondly, we randomly sample several images from other categories and mix them up with the transformed images $T_i(x)$ in a nonlinear manner, which is elaborated on in Section 3.2. The nonlinear multiplication is beneficial to guide adversarial examples to be better classified into other incorrect categories while preserving the information of the original category without significant damage. After the intra-class input transformations and inter-class nonlinear fusion, we calculate the average gradient over the transformed and mixed copies of the input for updates, which can be expressed as
$$\bar{g}_{t+1} = \frac{1}{m_1 m_2} \sum_{i=0}^{m_1 - 1} \sum_{j=0}^{m_2 - 1} \nabla_x J\left(T_i(x) \odot e^{x'_j / r}, y^{true}; \theta\right), \tag{22}$$
where $T_i(\cdot)$ denotes the input transformations and $m_1$ is the number of transformations. $x'_j$ represents the image randomly sampled from another category and $m_2$ is the number of these sampled images.
Finally, we incorporate momentum for iterative updates to prevent becoming stuck in local optima, thereby boosting transferability more effectively. The pseudo-code of ITINFA is shown in Algorithm 1.
Algorithm 1 Intra-class Transformations and Inter-class Nonlinear Fusion Attack
Input: A surrogate model $F_\theta$; the input image $x$ with ground-truth label $y^{true}$; the maximum perturbation budget $\epsilon$; the number of iterations $N$; the decay factor $\mu$; the number of transformations $m_1$; the number of images from other categories $m_2$
Output: An adversarial example $x^{adv}$
1: $\alpha = \epsilon / N$; $g_0 = 0$; $\bar{g}_0 = 0$; $x_0^{adv} = x$
2: for $t = 0$ to $N - 1$ do
3:     Construct a set of $m_1$ randomly transformed images according to Section 3.1;
4:     Mix up each transformed image with $m_2$ images from other categories according to Section 3.2;
5:     Calculate the average gradient $\bar{g}_{t+1}$ according to Equation (22);
6:     Update the enhanced momentum: $g_{t+1} = \mu \cdot g_t + \bar{g}_{t+1} / \left\|\bar{g}_{t+1}\right\|_1$;
7:     Update the adversarial example: $x_{t+1}^{adv} = x_t^{adv} + \alpha \cdot \text{sign}\left(g_{t+1}\right)$;
8: end for
9: return $x^{adv} = x_N^{adv}$
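The loop of Algorithm 1 can be sketched end to end in NumPy. This is a simplified illustration under several assumptions, not the authors' implementation: the surrogate is a hypothetical linear softmax classifier (`W`) whose input gradient is available in closed form, only two stand-in transformations are used (a random rotation by a multiple of 90° and multiplicative speckle), the fusion exponent is taken as $x'/r$, images are square, and all hyperparameter values are placeholders. The gradient is propagated back through the fusion and the transformations by hand.

```python
import numpy as np

def loss_grad(z, y, W):
    """Input gradient of softmax cross-entropy for logits = W @ z.ravel()."""
    logits = W @ z.ravel()
    p = np.exp(logits - logits.max())
    p /= p.sum()
    p[y] -= 1.0
    return (W.T @ p).reshape(z.shape)

def itinfa(x, y, W, others, eps=0.06, n_iter=10, mu=1.0, m1=4, r=2.0, a=2.0, seed=0):
    """Toy ITINFA loop: transform, fuse, average gradients, momentum update."""
    rng = np.random.default_rng(seed)
    alpha = eps / n_iter
    g = np.zeros_like(x)
    x_adv = x.copy()
    for _ in range(n_iter):
        g_bar = np.zeros_like(x)
        for _i in range(m1):                              # m1 intra-class transforms
            k = int(rng.integers(0, 4))                   # rotation by k * 90 degrees
            n = -np.log1p(-rng.random(x.shape) * (1 - np.exp(-a)))  # speckle noise
            for x_other in others:                        # m2 inter-class images
                fuse = np.exp(x_other / r)
                z = np.rot90(x_adv * n, k) * fuse         # T_i(x) fused with x'_j
                gz = loss_grad(z, y, W) * fuse            # back through the fusion
                g_bar += np.rot90(gz, -k) * n             # back through T_i
        g_bar /= m1 * len(others)
        g = mu * g + g_bar / (np.abs(g_bar).sum() + 1e-12)
        x_adv = x_adv + alpha * np.sign(g)
    return x_adv
```

With a real DNN surrogate, the manual chain rule would be replaced by automatic differentiation through the transformations and the fusion.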

4. Experiments

4.1. Datasets

4.1.1. MSTAR Dataset

The MSTAR dataset [42] is a benchmark dataset for SAR-ATR, which contains a series of X-band SAR images of ten different classes of ground targets at different azimuth and depression angles. The optical images and their corresponding SAR images are depicted in Figure 4. In the experiments, we select the standard operating condition (SOC), which contains 2747 images collected at a 17° depression angle as the training dataset and 2426 images collected at a 15° depression angle as the test dataset. The detailed information of each category is shown in Table 1.

4.1.2. SEN1-2 Dataset

The SEN1-2 dataset [43] is an SAR–optical image pair dataset. It comprises four subsets, each corresponding to a different season—spring, summer, autumn, and winter—and contains 282,384 pairs of corresponding image patches. In this article, following the selection strategy adopted in [25], we select the summer subset, which consists of two parts: s1 and s2. The s1 part contains SAR images, while the s2 part contains corresponding optical images. There are 49 folders within s1, each containing SAR images from the same area and representing the same category. After careful consideration, we selected ten distinct categories for our experiments, striving to ensure maximum difference among different categories while maintaining similarity within each category, thus supporting the tasks of target recognition and adversarial attack. Within each category, we selected 500 images for training and 300 images for testing, as summarized in Table 1. Figure 5 illustrates the examples of the SAR images alongside their corresponding optical images.

4.2. Experimental Setup

4.2.1. Models

We conduct the experiments on eight classical DNN models. Among them, AlexNet [44], VGG11 [45], ResNet50 [46], ResNeXt50 [47], and DenseNet121 [48] are widely utilized as feature extraction networks and have achieved excellent target recognition performance. In addition, SqueezeNet [49], ShuffleNetV2 [50], and MobileNetV2 [51] are lightweight DNN architectures that reduce the model parameters and sizes while achieving comparable performance in various target recognition tasks. All of these models are well trained on the MSTAR and SEN1-2 datasets to evaluate the transferability of adversarial examples, and the specific information and recognition accuracies are shown in Table 2.
During the training process, we adopt the preprocessing strategy proposed in [3]. At first, the single-channel gray-scale SAR images are normalized to [0, 1] to accelerate the convergence of the loss function. Subsequently, we randomly crop 88 × 88 patches of training images for data augmentation and centrally crop 88 × 88 patches of test images to evaluate the recognition performance. After that, we resize these images to 224 × 224 and input them into the DNN models. Furthermore, the cross-entropy loss function and the Adam optimizer [52] are employed to train the models, with a learning rate of 0.0001, batch size of 20, and 500 training epochs.
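The preprocessing steps described above can be sketched with nested Python lists. This is only an illustrative sketch: the text does not say how the normalization to [0, 1] is computed, so min-max scaling is assumed here, and the final resize to 224 × 224 is left out. The function names are hypothetical.

```python
import random


def normalize(img):
    """Min-max scale a 2-D list of pixel values to [0, 1] (assumed scheme)."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1.0          # avoid division by zero on constant images
    return [[(p - lo) / span for p in row] for row in img]


def crop(img, top, left, size):
    """Cut a size x size patch whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def center_crop(img, size=88):
    """Central patch, as used for the test images."""
    top = (len(img) - size) // 2
    left = (len(img[0]) - size) // 2
    return crop(img, top, left, size)


def random_crop(img, size=88, rng=random):
    """Randomly placed patch, as used for training-time augmentation."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return crop(img, top, left, size)
```

For a 128 × 128 chip (an assumed input size), `center_crop` starts at row and column 20, while `random_crop` draws the corner uniformly from 0 to 40.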

4.2.2. Baselines

In the experiments, we compare our proposed ITINFA with several classic transfer-based attack algorithms, namely, MI-FGSM [21], NI-FGSM [18], and VMI-FGSM [22]. We also compare it with several input transformation-based methods, namely, DIM [17], SIM [18], TIM [19], Admix [20], and SVA [29]. For all these algorithms, the maximum perturbation budget $\epsilon$ is set to 16/255, the number of iterations $T$ to 10, the step size $\alpha$ to 16/2550, and the decay factor $\mu$ to 1. For VMI-FGSM, we set the number of sampled examples in the neighborhood to 5 and the upper bound to 1.5. We adopt transformation probability $p = 0.5$ for DIM and the number of scaled copies $m = 30$ for SIM. For TIM, we adopt a Gaussian kernel with a size of 5 × 5. For Admix, we randomly sample $m = 3$ images from other categories with a strength of 0.2 and scale 30 images for each admixed image. For SVA, we adopt a median filter with a kernel size of 5 × 5 and tail $\beta = 1.5$ of the truncated exponential distribution. We set $m_1 = 30$, $m_2 = 3$, and $r = 0.8$ for our proposed ITINFA. For fair comparisons, all of these methods are integrated into momentum iteration.
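The ITINFA settings above imply a step size and a gradient-evaluation count that are easy to check. The sketch below is an assumption-laden illustration: `AttackConfig` is a hypothetical container, and the gradient count is our reading of the double sum over $m_1$ transformations and $m_2$ sampled images in the averaged gradient. A real implementation may batch these evaluations.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class AttackConfig:
    """Hypothetical container for the ITINFA hyper-parameters in the text."""
    eps: Fraction = Fraction(16, 255)   # maximum perturbation budget
    n_iter: int = 10                    # number of iterations
    mu: float = 1.0                     # decay factor
    m1: int = 30                        # number of transformations
    m2: int = 3                         # images sampled from other categories
    r: float = 0.8                      # strength of the nonlinear mapping

    @property
    def alpha(self):
        """Step size: the budget split evenly across iterations."""
        return self.eps / self.n_iter

    @property
    def grads_per_iter(self):
        """Gradient evaluations per iteration (one per transform/image pair)."""
        return self.m1 * self.m2

    @property
    def grads_total(self):
        """Gradient evaluations for one adversarial example."""
        return self.grads_per_iter * self.n_iter


cfg = AttackConfig()
print(cfg.alpha, cfg.grads_per_iter, cfg.grads_total)  # prints "8/1275 90 900"
```

The printed step size 8/1275 is the reduced form of 16/2550. Under this reading, one adversarial example costs 900 gradient evaluations.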

4.2.3. Metrics

The transferability is evaluated by the attack success rate, which is defined as the ratio of the number of examples misclassified after the attack to the total number of correctly classified examples. The formula is expressed as follows:
$$ASR = \frac{\sum_{i=1}^{N} I\left(\arg\max F_\theta\left(x_i^{adv}\right) \neq y^{true}\right)}{N},$$
where $I(\cdot)$ is the indicator function and $N$ is the total number of correctly classified examples.
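As a minimal sketch of this metric, the function below counts how many adversarial predictions differ from the true labels. It assumes the inputs were already restricted to examples that the target model classified correctly before the attack. The function name is hypothetical.

```python
def attack_success_rate(adv_preds, true_labels):
    """Fraction of adversarial examples whose predicted class differs from the label.

    Both sequences must cover only examples that were classified correctly
    before the attack, matching the definition of N in the text.
    """
    if len(adv_preds) != len(true_labels):
        raise ValueError("adv_preds and true_labels must have the same length")
    if not true_labels:
        raise ValueError("at least one example is required")
    fooled = sum(p != y for p, y in zip(adv_preds, true_labels))
    return fooled / len(true_labels)
```

For example, `attack_success_rate([1, 2, 3, 0], [1, 0, 3, 1])` returns 0.5, because two of the four predictions no longer match their labels.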
In addition, the misclassification confidence [33] is utilized to further evaluate the attack performance and transferability of various attack algorithms. This is obtained from the softmax output of the target model and denotes the confidence probability that the adversarial example is misclassified into the wrong category in a successful attack.
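One way to compute such a confidence score is sketched below. It assumes the metric is the softmax probability of the predicted wrong class, averaged over successful attacks only. The precise definition in [33] may differ, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_misclassification_confidence(adv_logits, true_labels):
    """Average softmax probability of the wrong predicted class.

    Only successful attacks (prediction != label) contribute. Returns NaN
    when no attack succeeded.
    """
    confidences = []
    for logits, label in zip(adv_logits, true_labels):
        probs = softmax(logits)
        pred = max(range(len(probs)), key=probs.__getitem__)
        if pred != label:
            confidences.append(probs[pred])
    if not confidences:
        return float("nan")
    return sum(confidences) / len(confidences)
```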
Adversarial attacks require not only deceiving the DNNs but also ensuring that the generated adversarial perturbations are sufficiently subtle and imperceptible to the human eye. In the experiments, the $\ell_2$ norm and structural similarity (SSIM) are employed to evaluate the imperceptibility of adversarial examples. The $\ell_2$ norm [53] is a widely employed metric that measures the Euclidean distance between the adversarial example and the original image, while the SSIM [54] is utilized to measure the similarity of these two images.
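The $\ell_2$ distance is simple to compute, and the per-pixel budget bounds it from above: if every pixel changes by at most $\epsilon$, then the $\ell_2$ norm of the perturbation is at most $\epsilon \sqrt{n}$ for an image with $n$ pixels. The sketch below illustrates both with hypothetical function names. SSIM is not reproduced here.

```python
import math


def l2_distance(x_adv, x):
    """Euclidean distance between two images given as flat lists of pixels."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_adv, x)))


def l2_upper_bound(eps, num_pixels):
    """Largest l2 norm reachable when each pixel moves by at most eps."""
    return eps * math.sqrt(num_pixels)
```

With $\epsilon = 16/255$ and, as an assumption, a single-channel 224 × 224 input, the bound is $16/255 \times 224 \approx 14.05$.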
Finally, we calculate the average time consumption for generating a single adversarial example, serving as a metric to assess the computational efficiency of the attack algorithms.

4.2.4. Environment

In this article, the Python programming language (v3.11.8) and the PyTorch deep learning framework (v2.2.2) [55] are used to implement all the experiments and evaluate the performance of various adversarial attack algorithms. All the experiments are supported by four NVIDIA RTX 6000 Ada Generation GPUs and powered by an Intel Xeon Silver 4310 CPU.

4.3. Single-Model Experiments

We first conduct experiments on a single model to evaluate the transferability of various attack algorithms. The adversarial examples are crafted on a single surrogate model and subsequently employed to attack the target models. If the surrogate model is identical to the target model, it denotes a white-box attack scenario; otherwise, it is classified as a black-box attack. Figure 6 summarizes the attack success rates of adversarial examples generated by each algorithm when the surrogate models are AlexNet, VGG, ResNet, ResNeXt, DenseNet, SqueezeNet, ShuffleNet, and MobileNet, respectively.
For every surrogate model, the white-box attack success rates notably surpass the black-box ones. For the AlexNet, VGG, and ShuffleNet models, the white-box attack success rates of all the employed algorithms nearly reach 100%. In the black-box setting, the transferability of the generated adversarial examples differs significantly across surrogate models. For instance, when ResNet and ResNeXt are employed as surrogate models, the adversarial examples transfer well to the different target models, with attack success rates consistently surpassing 50%. However, when ShuffleNet and MobileNet are utilized as surrogate models, the transferability is notably limited, with attack success rates scarcely exceeding 20%. The attack algorithms also differ in transferability across target models. NI-FGSM generally transfers worse than MI-FGSM on most models, which may suggest that Nesterov’s accelerated gradient is less effective than momentum on SAR images. Among the input transformation-based methods, SIM shows the poorest transferability on most target models, which indicates that the scaling transformation may not be suitable for SAR images with sparse characteristics. DIM and TIM exhibit similar transferability on most models, suggesting that resizing and translation contribute comparably to transferability for SAR images. Compared to these baselines, our proposed ITINFA consistently achieves the best transferability across the eight surrogate models, indicating that it can effectively enhance transferability across various target models.
ITINFA does exhibit a slightly lower white-box attack success rate than some baseline methods on specific models (e.g., AlexNet, ResNet, MobileNet). Nevertheless, it still ensures a satisfactory level of white-box attack effectiveness, and a slight reduction in white-box performance is a normal trade-off for enhanced transferability.
To better present the experimental results of transferability, Table 3 and Table 4, respectively, demonstrate the average transfer attack success rates on the MSTAR and SEN1-2 datasets when each model serves as the surrogate model and generates adversarial examples to attack the other seven target models. The experimental results indicate that regardless of which model acts as the surrogate model, our proposed ITINFA consistently achieves the best transferability. Specifically, ITINFA outperforms the best baseline method with a clear margin of 9.0% on the MSTAR dataset and 8.1% on the SEN1-2 dataset, further underscoring that ITINFA can significantly enhance the transferability across different architectures.
In addition, we calculate the average misclassification confidence, $\ell_2$ norm, SSIM, and time consumption on the MSTAR dataset for a more comprehensive evaluation of the proposed algorithm. VGG is utilized as the surrogate model to attack the other seven target models, and the experimental results are reported in Table 5. ITINFA not only achieves the highest attack success rate but also attains the highest misclassification confidence, indicating that it is more effective than the baselines. At the same time, ITINFA shows $\ell_2$ norm and SSIM values comparable to those of the other algorithms. This indicates that ITINFA maintains a level of imperceptibility similar to that of the baseline methods, rendering the adversarial perturbations imperceptible to the human eye, especially under a limited perturbation budget. Moreover, both ITINFA and Admix require a longer execution time than the other methods, indicating that incorporating images from additional categories to guide the optimization of adversarial examples is more time-consuming than other transformation approaches. Although ITINFA takes relatively longer than the other algorithms, it remains within an acceptable range, with an average time consumption of less than 1 s for generating a single adversarial example. It is also worth mentioning that the running time of ITINFA grows with the number of transformations and introduced samples. Reducing either number can significantly shorten the running time at the cost of a slight decrease in attack success rate.

4.4. Ensemble-Model Experiments

To further validate the effectiveness of our proposed ITINFA, we conduct the ensemble-model experiments proposed in [21], which fuse the logit outputs of various models to generate the adversarial examples. Based on the MSTAR dataset, we construct an ensemble model by integrating four surrogate models, namely AlexNet, VGG, ResNet, and ResNeXt, and all the models in the ensemble are assigned equal weights. Subsequently, we evaluate the transferability by conducting attacks on DenseNet, SqueezeNet, ShuffleNet, and MobileNet. The results are shown in Table 6.
From Table 6, it is observed that adversarial examples generated on the ensemble model exhibit more stable transferability across various target models with different architectures, with the attack success rate consistently remaining within a specific range. Compared with all the baseline methods, ITINFA achieves an average attack success rate of 87.30% and outperforms the best baseline method by a clear margin of 4.40%. Meanwhile, it exhibits the highest misclassification confidence of 90.7%, demonstrating its superior performance in generating transferable adversarial examples on the ensemble model.
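Logit fusion with equal weights can be sketched as follows. This is an illustrative sketch rather than the code used in the experiments: each model's logits are given as a plain list, the weights default to $1/K$ for $K$ models, and the fused logits feed a softmax cross-entropy loss as in [21]. The function names are hypothetical.

```python
import math


def fuse_logits(logits_per_model, weights=None):
    """Weighted sum of per-model logits; equal weights when none are given."""
    k = len(logits_per_model)
    if weights is None:
        weights = [1.0 / k] * k
    n_classes = len(logits_per_model[0])
    return [
        sum(w * logits[c] for w, logits in zip(weights, logits_per_model))
        for c in range(n_classes)
    ]


def cross_entropy(logits, label):
    """Softmax cross-entropy of the fused logits for one example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]
```

The attack then differentiates this single loss with respect to the input, so one gradient reflects all surrogate models at once.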

4.5. Ablation Studies

In this subsection, to further explore the superior transferability achieved by ITINFA, we conduct ablation studies on the MSTAR dataset to validate that both intra-class transformations and inter-class nonlinear fusion are beneficial to enhance transferability. We set up the following four configurations for ablation studies: (1) the original ITINFA; (2) removing intra-class transformations; (3) removing inter-class nonlinear fusion; (4) removing both intra-class transformations and inter-class nonlinear fusion. We employ AlexNet as the surrogate model to generate adversarial examples and attack the other seven models to evaluate the transferability.
The experimental results are shown in Table 7. Configuration 1 exhibits the best transferability. Removing either the intra-class transformations module or the inter-class nonlinear fusion module decreases transferability, and configuration 4, in which both modules are removed, exhibits the poorest transferability. Therefore, both intra-class transformations and inter-class nonlinear fusion are effective in enhancing transferability, and they act synergistically to enhance it further.

4.6. Parameter Studies

In this subsection, we explore the impact of three hyper-parameters on the transferability: the number of transformations $m_1$, the number of randomly sampled images from other categories $m_2$, and the strength of nonlinear mapping $r$. All the adversarial examples are generated on VGG and utilized to attack the target models to evaluate transferability.
ITINFA applies $m_1$ transformations to the input image to obtain diverse gradient information. To explore the impact of $m_1$ on the transferability, we conduct experiments with $m_1$ varying from 5 to 50, $m_2$ fixed to 3, and $r$ fixed to 0.8. The attack success rates are recorded in Figure 7. The white-box attack success rates consistently remain close to 100%, and the transfer attack success rate is lowest when $m_1 = 5$. As $m_1$ increases, the gradient information becomes richer, leading to a gradual improvement in the transferability of adversarial examples. However, once $m_1$ exceeds 30, the improvement in transferability becomes negligible, while the computational cost increases substantially. Therefore, we select $m_1 = 30$ in the experiments to maintain a favorable balance between transferability and computational cost.
In Figure 8, we further report the attack success rates of ITINFA with various values of $m_2$. We select $m_2$ from 1 to 5, fix $m_1$ to 30, and set $r$ to 0.8. When $m_2 \le 3$, the transferability of adversarial examples on all the target models improves as the value of $m_2$ increases. This suggests that incorporating the information from other categories more effectively guides the misclassification of adversarial examples. However, when $m_2 > 3$, the improvement in transferability becomes less pronounced and even shows a downward trend. This phenomenon might result from introducing excessive information from other categories, which impairs the transferability of adversarial examples. Additionally, increasing the value of $m_2$ also brings excessive computational costs. Consequently, $m_2 = 3$ is chosen for our experiments.
In addition, we explore the impact of the nonlinear mapping strength on transferability, and the results are illustrated in Figure 9. In the experiment, $r$ varies from 0.4 to 2.0, $m_1$ is fixed to 30, and $m_2$ is fixed to 3. When $r$ is lower than 0.8, the attack success rate increases significantly as $r$ grows. However, after $r$ surpasses 1, the attack success rates gradually converge to a stable range and begin to decline slowly. For most of the target models, the best attack performance is observed at $r = 0.8$. Consequently, this value is selected as the setting for $r$ in our experiments.

4.7. Visualizations

Some results of the adversarial perturbations and adversarial examples generated by the various algorithms are visualized in Figure 10. The first two rows represent the examples from the MSTAR dataset, while the last two rows illustrate the examples from the SEN1-2 dataset. To enhance the visual effect, we utilize the Parula color map to visualize the gray-scale SAR images. It is evident that our proposed ITINFA is capable of achieving satisfactory visual imperceptibility. That is, the adversarial example generated by ITINFA can fool the target model while maintaining high similarity with the original image.
Figure 11 further illustrates an adversarial threat scenario in which a military target tank (T62) is erroneously recognized as a truck (ZIL-131) after the adversarial perturbation generated by ITINFA is added to the original image. We utilize the feature visualization technique GradCAM [56] to visualize the output feature map weights of the last convolutional layer of VGG. It is evident that adding adversarial perturbation results in a significant alteration in the attention regions. When the target category is set to T62, the attention of the model is mainly focused on the target area of the original image, whereas for the adversarial example, it shifts to the background clutter area and relocates back to the target area when the target category is set to ZIL-131. This presentation further enhances the visual understanding of the attack process and indicates the potential of ITINFA for application in real-world scenarios.
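The Grad-CAM computation behind such attention maps can be sketched in a few lines. This is a minimal illustration assuming the feature maps and the gradients of the class score with respect to them are already available as nested lists. The function name is hypothetical, and upsampling the map to the input size is omitted.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM map from K feature maps and their gradients.

    activations, gradients: lists of K maps, each an H x W nested list.
    Each channel weight is the global average of that channel's gradient, and
    the map is the ReLU of the weighted sum of the activations.
    """
    cam = None
    for act, grad in zip(activations, gradients):
        h, w = len(grad), len(grad[0])
        weight = sum(sum(row) for row in grad) / (h * w)   # global average pooling
        if cam is None:
            cam = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                cam[i][j] += weight * act[i][j]
    return [[max(v, 0.0) for v in row] for row in cam]     # ReLU keeps positive evidence
```

Computing the map once for the original class and once for the misclassified class, as done for T62 and ZIL-131 above, shows where the evidence for each class is located.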

5. Discussion

In the field of adversarial attacks on SAR-ATR, the highly non-cooperative nature and secrecy of an adversary’s SAR systems typically make it challenging to acquire information about target models. Therefore, enhancing the transferability of adversarial examples to unknown target models becomes exceptionally crucial. Input transformation-based attacks represent a prevalent method for boosting transferability today. The essence of these approaches lies in employing data augmentation to process the input image, thereby generating richer gradient information. This facilitates a more generalized optimization direction for updates, avoiding local optima in the optimization process and alleviating the overfitting of adversarial examples on the surrogate model. Therefore, this article continues to explore this direction.
The proposed ITINFA comprises two main modules. First, from the perspective of the image itself, diverse intra-class transformations are applied to the input SAR image. The transformations include typical techniques used in optical images and methods developed explicitly for the unique properties of SAR imagery. Second, from the perspective of images from other categories, information from those categories is incorporated to better guide adversarial examples toward misclassification, regardless of the target model. Notably, we employ a nonlinear multiplication for integration rather than linear addition. Experimental results indicate that this modulation approach achieves better effects. Future research might also consider other nonlinear mapping methods, such as using a logarithmic function instead of an exponential one; due to space limitations, this article does not explore them. The ablation studies show that both the intra-class transformations and inter-class fusion modules significantly enhance the transferability of adversarial examples. They can also be integrated with other types of methods, such as ensemble attacks [57,58] or specialized objective functions [59,60], to further improve transferability.
Future research in this field will focus on the following aspects. First, enhancing the transferability to unknown black-box models. Given the non-cooperative nature of SAR systems, how to generate highly transferable adversarial examples using a known surrogate model without accessing target models will remain an essential research direction. Second, improving the robustness of adversarial perturbations. Since there may be some degree of position deviation, perspective changes, and other uncontrollable errors when physically implanting adversarial perturbations, ensuring that the generated adversarial examples can still maintain attack capabilities in the face of uncertainties will be a significant challenge. Third, enhancing the physical realizability of adversarial perturbations. Current research primarily focuses on pixel-level perturbations in the digital domain. Although some studies [53,61,62] have linked adversarial perturbations with electromagnetic scattering characteristics of the target by using the attribute scattering center model (ASCM), there is still a long way to go to realize adversarial perturbations physically. Future studies should consider passive interference methods like electromagnetic materials and metasurfaces [63] to alter the target’s electromagnetic scattering properties, or using jammers and other active interference methods to add adversarial perturbations to the target’s echo signals, thereby physically realizing adversarial attacks in the real world and bridging the gap between theoretical advancements and practical applicability.

6. Conclusions

In this article, we propose a novel adversarial attack method to enhance the transferability of SAR adversarial examples. Initially, we perform diverse intra-class transformations on the input image with a series of data augmentation techniques aligned with the unique characteristics of SAR imagery. This enriches gradient information, guides the optimization direction, and alleviates overfitting to the surrogate model. Subsequently, we innovatively incorporate image information from other categories in a nonlinear manner, which effectively assists the adversarial examples in navigating through the decision boundaries of various models, enhancing their adaptability across different recognition models. Moreover, we integrate a momentum iterative updating algorithm to escape local optima in the optimization process, further enhancing the transferability of the adversarial examples. Comprehensive experimental analysis based on the MSTAR dataset and SEN1-2 dataset has demonstrated that our proposed ITINFA significantly surpasses other baseline methods in terms of transferability while maintaining commendable white-box attack performance.

Author Contributions

Conceptualization, X.H. and Z.L.; methodology, X.H.; software, X.H.; validation, X.H. and Z.L.; formal analysis, X.H. and B.P.; investigation, X.H. and B.P.; resources, B.P.; data curation, X.H.; writing—original draft preparation, X.H.; writing—review and editing, B.P. and Z.L.; visualization, X.H.; supervision, B.P.; project administration, B.P.; funding acquisition, B.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the Changsha Outstanding Innovative Youth Training Program under Grant kq2107002 and in part by the National Natural Science Foundation of China under Grant 61921001.

Data Availability Statement

The data presented in this study are available in MSTAR at https://download.csdn.net/download/qq_34277608/11537943 (accessed on 13 August 2019) and in SEN1-2 at https://doi.org/10.48550/arXiv.1807.01569 (accessed on 4 July 2018).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chang, S.; Deng, Y.; Zhang, Y.; Zhao, Q.; Wang, R.; Zhang, K. An Advanced Scheme for Range Ambiguity Suppression of Spaceborne SAR Based on Blind Source Separation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–12. [Google Scholar] [CrossRef]
  2. Yue, D.; Xu, F.; Frery, A.C.; Jin, Y. Synthetic Aperture Radar Image Statistical Modeling: Part One-Single-Pixel Statistical Models. IEEE Geosci. Remote Sens. Mag. 2020, 9, 82–114. [Google Scholar] [CrossRef]
  3. Chen, S.; Wang, H.; Xu, F.; Jin, Y. Target Classification Using the Deep Convolutional Networks for SAR Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4806–4817. [Google Scholar] [CrossRef]
  4. Wang, C.; Pei, J.; Wang, Z.; Huang, Y.; Wu, J.; Yang, H.; Yang, J. When Deep Learning Meets Multi-Task Learning in SAR ATR: Simultaneous Target Recognition and Segmentation. Remote Sens. 2020, 12, 3863. [Google Scholar] [CrossRef]
  5. Zhu, X.; Montazeri, S.; Ali, M.; Hua, Y.; Wang, Y.; Mou, L.; Shi, Y.; Xu, F.; Bamler, R. Deep Learning Meets SAR: Concepts, Models, Pitfalls, and Perspectives. IEEE Geosci. Remote Sens. Mag. 2021, 9, 143–172. [Google Scholar] [CrossRef]
  6. Pei, J.; Wang, Z.; Sun, X.; Huo, W.; Zhang, Y.; Huang, Y.; Wu, J.; Yang, J. FEF-Net: A Deep Learning Approach to Multiview SAR Image Target Recognition. Remote Sens. 2021, 13, 3493. [Google Scholar] [CrossRef]
  7. Zeng, Z.; Sun, J.; Xu, C.; Wang, H. Unknown SAR Target Identification Method Based on Feature Extraction Network and KLD–RPA Joint Discrimination. Remote Sens. 2021, 13, 2901. [Google Scholar] [CrossRef]
  8. Li, J.; Yu, Z.; Yu, L.; Cheng, P.; Chen, J.; Chi, C. A Comprehensive Survey on SAR ATR in Deep-Learning Era. Remote Sens. 2023, 15, 1454. [Google Scholar] [CrossRef]
  9. Li, B.; Cui, Z.; Wang, H.; Deng, Y.; Ma, J.; Yang, J.; Cao, Z. SAR Incremental Automatic Target Recognition Based on Mutual Information Maximization. IEEE Geosci. Remote Sens. Lett. 2024, 21, 4005305. [Google Scholar] [CrossRef]
  10. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing Properties of Neural Networks. In Proceedings of the International Conference on Learning Representations, Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
  11. Yuan, X.; He, P.; Zhu, Q.; Li, X. Adversarial Examples: Attacks and Defenses for Deep Learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2805–2824. [Google Scholar] [CrossRef]
  12. Xu, Y.; Bai, T.; Yu, W.; Chang, S.; Atkinson, P.; Ghamisi, P. AI Security for Geoscience and Remote Sensing: Challenges and Future Trends. IEEE Geosci. Remote Sens. Mag. 2023, 11, 60–85. [Google Scholar] [CrossRef]
  13. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  14. Kurakin, A.; Goodfellow, I.J.; Bengio, S. Adversarial Examples in the Physical World. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
  15. Carlini, N.; Wagner, D. Towards Evaluating the Robustness of Neural Networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy, San Jose, CA, USA, 22–24 May 2017; pp. 39–57. [Google Scholar]
  16. Moosavi-Dezfooli, S.M.; Fawzi, A.; Frossard, P. Deepfool: A Simple and Accurate Method to Fool Deep Neural Networks. In Proceedings of the Computer Vision and Pattern Recognition Conference, Las Vegas, NV, USA, 26–30 June 2016; pp. 2574–2582. [Google Scholar]
  17. Xie, C.; Zhang, Z.; Zhou, Y.; Bai, S.; Wang, J.; Ren, Z.; Yuille, A.L. Improving Transferability of Adversarial Examples with Input Diversity. In Proceedings of the Computer Vision and Pattern Recognition Conference, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  18. Lin, J.; Song, C.; He, K.; Wang, L.; Hopcroft, J. Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  19. Dong, Y.; Pang, T.; Su, H.; Zhu, J. Evading Defenses to Transferable Adversarial Examples by Translation-Invariant Attacks. In Proceedings of the Computer Vision and Pattern Recognition Conference, Long Beach, CA, USA, 15–20 June 2019; pp. 4312–4321. [Google Scholar]
  20. Wang, X.; He, X.; Wang, J.; He, K. Admix: Enhancing the Transferability of Adversarial Attacks. In Proceedings of the International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 16158–16167. [Google Scholar]
  21. Dong, Y.; Liao, F.; Pang, T.; Su, H.; Zhu, J.; Hu, X.; Li, J. Boosting Adversarial Attacks with Momentum. In Proceedings of the Computer Vision and Pattern Recognition Conference, Salt Lake City, UT, USA, 18–23 June 2018; pp. 9185–9193. [Google Scholar]
  22. Wang, X.; He, K. Enhancing the Transferability of Adversarial Attacks Through Variance Tuning. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 20–25 June 2021; pp. 1924–1933. [Google Scholar]
  23. Ge, Z.; Liu, H.; Wang, X.; Shang, F.; Liu, Y. Boosting Adversarial Transferability by Achieving Flat Local Maxima. In Proceedings of the Conference and Workshop on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; pp. 70141–70161. [Google Scholar]
  24. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. Mixup: Beyond Empirical Risk Minimization. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  25. Li, H.; Huang, H.; Chen, L.; Peng, J.; Huang, H.; Cui, Z.; Mei, X.; Wu, G. Adversarial Examples for CNN-Based SAR Image Classification: An Experience Study. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 1333–1347. [Google Scholar] [CrossRef]
  26. Huang, T.; Zhang, Q.; Liu, J.; Hou, R.; Wang, X.; Li, Y. Adversarial Attacks on Deep-Learning-Based SAR Image Target Recognition. J. Netw. Comput. Appl. 2020, 162, 102632. [Google Scholar] [CrossRef]
  27. Du, C.; Zhang, L. Adversarial Attack for SAR Target Recognition Based on UNet-Generative Adversarial Network. Remote Sens. 2021, 13, 4358. [Google Scholar] [CrossRef]
  28. Du, C.; Huo, C.; Zhang, L.; Chen, B.; Yuan, Y. Fast C&W: A Fast Adversarial Attack Algorithm to Fool SAR Target Recognition with Deep convolutional neural networks. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5. [Google Scholar]
  29. Peng, B.; Peng, B.; Zhou, J.; Xia, J.; Liu, L. Speckle-Variant Attack: Toward Transferable Adversarial Attack to SAR Target Recognition. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar]
  30. Zhang, Z.; Gao, X.; Liu, S.; Peng, B.; Wang, Y. Energy-Based Adversarial Example Detection for SAR Images. Remote Sens. 2022, 14, 5168. [Google Scholar] [CrossRef]
  31. Du, M.; Bi, D.; Du, M.; Xu, X.; Wu, Z. ULAN: A Universal Local Adversarial Network for SAR Target Recognition Based on Layer-Wise Relevance Propagation. Remote Sens. 2022, 15, 21. [Google Scholar] [CrossRef]
  32. Peng, B.; Peng, B.; Yong, S.; Liu, L. An Empirical Study of Fully Black-Box and Universal Adversarial Attack for SAR Target Recognition. Remote Sens. 2022, 14, 4017. [Google Scholar] [CrossRef]
  33. Zhang, F.; Meng, T.; Xiang, D.; Ma, F.; Sun, X.; Zhou, Y. Adversarial Deception against SAR Target Recognition Network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4507–4520. [Google Scholar] [CrossRef]
  34. Lin, G.; Pan, Z.; Zhou, X.; Duan, Y.; Bai, W.; Zhan, D.; Zhu, L.; Zhao, G.; Li, T. Boosting Adversarial Transferability with Shallow-Feature Attack on SAR Images. Remote Sens. 2023, 15, 2699. [Google Scholar] [CrossRef]
  35. Wang, X.; Zhang, Z.; Zhang, J. Structure Invariant Transformation for Better Adversarial Transferability. In Proceedings of the International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 4607–4619. [Google Scholar]
  36. Ding, J.; Chen, B.; Liu, H.; Huang, M. convolutional neural network with Data Augmentation for SAR Target Recognition. IEEE Geosci. Remote Sens. Lett. 2016, 13, 364–368. [Google Scholar] [CrossRef]
  37. Argenti, F.; Lapini, A.; Bianchi, T.; Alparone, L. A Tutorial on Speckle Reduction in Synthetic Aperture Radar Images. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–35. [Google Scholar] [CrossRef]
  38. Deledalle, C.A.; Denis, L.; Tupin, F.; Reigber, A.; Jäger, M. NL-SAR: A Unified Nonlocal Framework for Resolution-Preserving (Pol)(In) SAR Denoising. IEEE Trans. Geosci. Remote Sens. 2014, 53, 2021–2038. [Google Scholar] [CrossRef]
  39. Parrilli, S.; Poderico, M.; Angelino, C.V.; Verdoliva, L. A Nonlocal SAR Image Denoising Algorithm Based on LLMMSE Wavelet Shrinkage. IEEE Trans. Geosci. Remote Sens. 2011, 50, 606–616. [Google Scholar] [CrossRef]
  40. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095. [Google Scholar] [CrossRef] [PubMed]
  41. Wang, T.; Ying, Z.; Li, Q.; Lian, Z. Boost Adversarial Transferability by Uniform Scale and Mix Mask Method. arXiv 2023, arXiv:2311.12051. [Google Scholar]
  42. Ross, T.D.; Worrell, S.W.; Velten, V.J.; Mossing, J.C.; Bryant, M.L. Standard SAR ATR Evaluation Experiments Using the MSTAR Public Release Data Set. In Proceedings of the Algorithms for Synthetic Aperture Radar Imagery V, SPIE, Bellingham, WA, USA, 14–17 April 1998; Volume 3370, pp. 566–573. [Google Scholar]
  43. Schmitt, M.; Hughes, L.; Zhu, X. The SEN1-2 Dataset for Deep Learning in SAR-Optical Data Fusion. ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci. 2018, 4, 141–146. [Google Scholar] [CrossRef]
  44. Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Conference and Workshop on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
  45. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  46. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the Computer Vision and Pattern Recognition Conference, Las Vegas, NV, USA, 26–30 June 2016; pp. 770–778. [Google Scholar]
  47. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the Computer Vision and Pattern Recognition Conference, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [Google Scholar]
  48. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K. Densely Connected Convolutional Networks. In Proceedings of the Computer Vision and Pattern Recognition Conference, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  49. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5 MB Model Size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
  50. Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 116–131. [Google Scholar]
  51. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the Computer Vision and Pattern Recognition Conference, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  52. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  53. Peng, B.; Peng, B.; Zhou, J.; Xie, J.; Liu, L. Scattering Model Guided Adversarial Examples for SAR Target Recognition: Attack and Defense. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17. [Google Scholar] [CrossRef]
  54. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  55. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Brooklyn, NY, USA, 2019; pp. 8024–8035. [Google Scholar]
  56. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
  57. Liu, Y.; Chen, X.; Liu, C.; Song, D. Delving into Transferable Adversarial Examples and Black-Box Attacks. arXiv 2016, arXiv:1611.02770. [Google Scholar]
  58. Xiong, Y.; Lin, J.; Zhang, M.; Hopcroft, J.E.; He, K. Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability. In Proceedings of the Computer Vision and Pattern Recognition Conference, New Orleans, LA, USA, 19–24 June 2022; pp. 14983–14992. [Google Scholar]
  59. Zhou, W.; Hou, X.; Chen, Y.; Tang, M.; Huang, X.; Gan, X.; Yang, Y. Transferable Adversarial Perturbations. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 452–467. [Google Scholar]
  60. Wu, W.; Su, Y.; Chen, X.; Zhao, S.; King, I.; Lyu, M.R.; Tai, Y.W. Boosting the Transferability of Adversarial Samples via Attention. In Proceedings of the Computer Vision and Pattern Recognition Conference, Seattle, WA, USA, 14–19 June 2020; pp. 1161–1170. [Google Scholar]
  61. Qin, W.; Long, B.; Wang, F. SCMA: A Scattering Center Model Attack on CNN-SAR Target Recognition. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar]
  62. Zhou, J.; Feng, S.; Sun, H.; Zhang, L.; Kuang, G. Attributed Scattering Center Guided Adversarial Attack for DCNN SAR Target Recognition. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar]
  63. Xu, H.; Guan, D.F.; Peng, B.; Liu, Z.; Yong, S.; Liu, Y. Radar One-Dimensional Range Profile Dynamic Jamming Based on Programmable Metasurface. IEEE Antennas Wirel. Propag. Lett. 2021, 20, 1883–1887. [Google Scholar] [CrossRef]
Figure 1. Comparisons of the attack performance of existing algorithms in white-box and black-box scenarios. The data are obtained from single-model experiments with VGG11 utilized as the surrogate model.
Figure 2. The framework of the proposed ITINFA. For enhanced visual presentation, the Parula color map is utilized to visualize the gray-scale SAR images in this article.
Figure 3. Examples of the input image and transformed images.
Figure 4. Examples in the MSTAR dataset: (top) optical images and (bottom) the corresponding SAR images.
Figure 5. Examples in the SEN1-2 dataset: (top) optical images and (bottom) the corresponding SAR images.
Figure 6. The attack success rates against the eight models for adversarial examples generated on each surrogate model.
Figure 7. The attack success rates of ITINFA with various numbers of transformations $m_1$.
Figure 8. The attack success rates of ITINFA with various numbers of sampled images $m_2$.
Figure 9. The attack success rates of ITINFA with different strengths of nonlinear mapping r.
Figure 10. Adversarial perturbations and adversarial examples generated by various algorithms: (top) adversarial perturbations and (bottom) the corresponding adversarial examples.
Figure 11. Discrimination maps generated by Grad-CAM on VGG. The maps represent the attention mechanism for the original military target tank (T62) and the misclassified truck (ZIL-131).
Table 1. Details of the MSTAR dataset and SEN1-2 dataset used in this article.

| MSTAR Class | MSTAR Training Set | MSTAR Test Set | SEN1-2 Class | SEN1-2 Training Set | SEN1-2 Test Set |
|---|---|---|---|---|---|
| 2S1 | 299 | 274 | s1_18 | 500 | 300 |
| BMP2 | 233 | 196 | s1_19 | 500 | 300 |
| BRDM2 | 298 | 274 | s1_20 | 500 | 300 |
| BTR60 | 256 | 195 | s1_32 | 500 | 300 |
| BTR70 | 233 | 196 | s1_38 | 500 | 300 |
| D7 | 299 | 274 | s1_45 | 500 | 300 |
| T62 | 299 | 273 | s1_49 | 500 | 300 |
| T72 | 232 | 196 | s1_59 | 500 | 300 |
| ZIL131 | 299 | 274 | s1_61 | 500 | 300 |
| ZSU234 | 299 | 274 | s1_78 | 500 | 300 |
| Total | 2747 | 2426 | Total | 5000 | 3000 |
Table 2. The information and recognition accuracies of the DNN models.

| Model | # params. | FLOPs ($\times 10^{9}$) | MSTAR Acc. (%) | SEN1-2 Acc. (%) |
|---|---|---|---|---|
| AlexNet [44] | 57,029,258 | 0.66 | 97.57 | 93.37 |
| VGG11 [45] | 128,806,154 | 7.55 | 98.69 | 95.21 |
| ResNet50 [46] | 23,522,314 | 4.05 | 97.89 | 96.79 |
| ResNeXt50 [47] | 22,994,186 | 4.21 | 98.01 | 96.43 |
| DenseNet121 [48] | 6,957,898 | 2.82 | 98.31 | 98.03 |
| SqueezeNet [49] | 726,474 | 0.25 | 96.66 | 95.29 |
| ShuffleNetV2 [50] | 1,263,446 | 0.14 | 97.95 | 94.15 |
| MobileNetV2 [51] | 2,236,138 | 0.31 | 96.93 | 96.73 |
Table 3. The transfer attack success rates (%) on the MSTAR dataset. Each datum represents the average attack success rate obtained using the surrogate model specified in that column to attack the other seven target models.

| Attack | AlexNet | VGG | ResNet | ResNeXt | DenseNet | SqueezeNet | ShuffleNet | MobileNet | Average |
|---|---|---|---|---|---|---|---|---|---|
| MI-FGSM | 42.4 | 62.3 | 73.5 | 77.1 | 62.6 | 59.2 | 11.4 | 12.5 | 50.1 |
| NI-FGSM | 39.3 | 55.7 | 68.1 | 76.1 | 57.9 | 49.1 | 11.3 | 11.9 | 46.2 |
| VMI-FGSM | 51.4 | 64.9 | 75.9 | 81.9 | 57.7 | 67.2 | 16.5 | 13.7 | 53.7 |
| DIM | 45.1 | 69.5 | 74.1 | 78.6 | 65.4 | 59.5 | 13.2 | 12.7 | 52.3 |
| SIM | 44.9 | 62.5 | 61.7 | 63.9 | 51.5 | 58.9 | 12.5 | 13.3 | 46.2 |
| TIM | 46.3 | 69.2 | 74.9 | 78.9 | 64.7 | 60.8 | 13.9 | 12.8 | 52.7 |
| Admix | 47.8 | 69.2 | 67.4 | 77.2 | 65.5 | 69.9 | 14.9 | 19.8 | 54.0 |
| SVA | 54.5 | 72.6 | 67.9 | 76.8 | 62.5 | 68.1 | 23.7 | 19.3 | 55.7 |
| ITINFA | 58.6 | 80.6 | 83.2 | 85.9 | 76.3 | 81.0 | 29.5 | 22.3 | 64.7 |
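The averaging described in the Table 3 caption can be sketched as follows. This is a minimal illustration rather than the authors' evaluation code: the function name `average_transfer_rate`, the nested dictionary `asr`, and the three placeholder model names are all assumptions, and the numbers are made up for demonstration (they are not taken from the paper).

```python
from statistics import mean


def average_transfer_rate(asr, models, surrogate):
    """Mean attack success rate over every target model except the surrogate.

    asr[s][t] is assumed to hold the success rate (%) of adversarial
    examples generated on surrogate s when evaluated on target t.
    """
    return mean(asr[surrogate][t] for t in models if t != surrogate)


# Hypothetical success rates (%) for illustration only; not the paper's data.
models = ["A", "B", "C"]
asr = {
    "A": {"A": 100.0, "B": 40.0, "C": 60.0},
    "B": {"A": 30.0, "B": 100.0, "C": 50.0},
    "C": {"A": 20.0, "B": 10.0, "C": 100.0},
}

for surrogate in models:
    print(surrogate, average_transfer_rate(asr, models, surrogate))
```

The diagonal entries (the white-box case, where surrogate and target coincide) are excluded, so with eight models each reported value would be a mean over seven targets.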
Table 4. The transfer attack success rates (%) on the SEN1-2 dataset. Each datum represents the average attack success rate obtained using the surrogate model specified in that column to attack the other seven target models.

| Attack | AlexNet | VGG | ResNet | ResNeXt | DenseNet | SqueezeNet | ShuffleNet | MobileNet | Average |
|---|---|---|---|---|---|---|---|---|---|
| MI-FGSM | 37.5 | 60.8 | 45.9 | 54.4 | 72.1 | 45.1 | 31.1 | 35.2 | 47.8 |
| NI-FGSM | 39.8 | 59.7 | 51.8 | 55.6 | 78.5 | 42.1 | 34.7 | 39.1 | 50.2 |
| VMI-FGSM | 49.1 | 53.1 | 60.4 | 64.3 | 82.3 | 51.9 | 38.4 | 43.1 | 55.3 |
| DIM | 40.9 | 66.5 | 55.1 | 63.8 | 80.4 | 56.7 | 33.7 | 40.3 | 54.7 |
| SIM | 37.2 | 53.9 | 63.5 | 55.1 | 74.5 | 54.6 | 36.5 | 36.6 | 51.5 |
| TIM | 40.9 | 67.7 | 56.4 | 65.4 | 80.3 | 57.5 | 34.4 | 41.3 | 55.5 |
| Admix | 42.2 | 59.1 | 61.5 | 59.9 | 71.2 | 45.7 | 33.5 | 40.2 | 51.7 |
| SVA | 40.3 | 67.3 | 57.1 | 61.1 | 73.8 | 47.9 | 36.4 | 38.9 | 52.9 |
| ITINFA | 59.3 | 67.9 | 65.7 | 71.5 | 83.7 | 56.8 | 58.3 | 45.4 | 63.6 |
Table 5. The average values of misclassification confidence, $\ell_2$ norm, SSIM, and time consumption.

| Attack | Confidence (%) | $\ell_2$ Norm | SSIM | Time Consumption (ms) |
|---|---|---|---|---|
| MI-FGSM | 95.2 | 5.183 | 0.727 | 10.2 |
| NI-FGSM | 94.7 | 5.060 | 0.685 | 9.5 |
| VMI-FGSM | 94.9 | 5.115 | 0.733 | 39.2 |
| DIM | 92.1 | 5.037 | 0.701 | 10.7 |
| SIM | 96.0 | 5.257 | 0.731 | 68.3 |
| TIM | 93.7 | 5.022 | 0.703 | 9.8 |
| Admix | 93.6 | 5.289 | 0.707 | 438.9 |
| SVA | 96.5 | 5.067 | 0.725 | 33.5 |
| ITINFA | 96.9 | 5.073 | 0.726 | 525.1 |
Table 6. The attack success rates and average misclassification confidences of adversarial examples generated on the ensemble model.

| Attack | DenseNet | SqueezeNet | ShuffleNet | MobileNet | Average | Confidence |
|---|---|---|---|---|---|---|
| MI-FGSM | 67.3 | 55.4 | 64.1 | 53.9 | 60.18 | 89.2 |
| NI-FGSM | 72.5 | 64.3 | 69.0 | 54.3 | 65.03 | 90.5 |
| VMI-FGSM | 65.5 | 61.7 | 64.3 | 52.7 | 61.05 | 85.6 |
| DIM | 75.2 | 69.9 | 73.7 | 60.9 | 69.93 | 87.9 |
| SIM | 81.5 | 73.1 | 80.6 | 73.1 | 77.08 | 89.0 |
| TIM | 79.6 | 67.7 | 68.9 | 63.5 | 69.93 | 87.2 |
| Admix | 90.6 | 82.2 | 85.4 | 73.4 | 82.90 | 86.1 |
| SVA | 85.5 | 79.0 | 81.5 | 74.9 | 80.23 | 87.5 |
| ITINFA | 92.3 | 87.2 | 90.0 | 79.7 | 87.30 | 90.7 |
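The Average column of Table 6 appears to be the arithmetic mean of the four target-model success rates. The short check below, with values transcribed from the table, is a sketch under that assumption; the names `rows` and `row_mean` and the rounding tolerance of 0.01 are illustrative choices, not part of the original article.

```python
# Values transcribed from Table 6: four target-model success rates (%)
# followed by the reported Average for each attack.
rows = {
    "MI-FGSM": ([67.3, 55.4, 64.1, 53.9], 60.18),
    "NI-FGSM": ([72.5, 64.3, 69.0, 54.3], 65.03),
    "VMI-FGSM": ([65.5, 61.7, 64.3, 52.7], 61.05),
    "DIM": ([75.2, 69.9, 73.7, 60.9], 69.93),
    "SIM": ([81.5, 73.1, 80.6, 73.1], 77.08),
    "TIM": ([79.6, 67.7, 68.9, 63.5], 69.93),
    "Admix": ([90.6, 82.2, 85.4, 73.4], 82.90),
    "SVA": ([85.5, 79.0, 81.5, 74.9], 80.23),
    "ITINFA": ([92.3, 87.2, 90.0, 79.7], 87.30),
}


def row_mean(values):
    """Arithmetic mean of a list of success rates."""
    return sum(values) / len(values)


# Assumed tolerance: the reported averages are rounded to two decimals.
TOLERANCE = 0.01

for attack, (rates, reported) in rows.items():
    computed = row_mean(rates)
    consistent = abs(computed - reported) <= TOLERANCE
    print(f"{attack}: computed {computed:.3f}, reported {reported:.2f}, consistent={consistent}")
```

Under this reading, the gap between ITINFA and the strongest baseline in the Average column (Admix) is 87.30 − 82.90 = 4.40 percentage points.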
Table 7. The average attack success rates of adversarial examples generated on AlexNet under different configurations.

| Attack | VGG | ResNet | ResNeXt | DenseNet | SqueezeNet | ShuffleNet | MobileNet | Average |
|---|---|---|---|---|---|---|---|---|
| Configuration 1 | 63.1 | 50.7 | 56.7 | 55.1 | 69.4 | 63.5 | 50.5 | 58.4 |
| Configuration 2 | 42.7 | 41.9 | 44.6 | 44.1 | 60.5 | 56.6 | 41.3 | 47.4 |
| Configuration 3 | 51.7 | 43.0 | 45.9 | 45.4 | 55.1 | 57.2 | 45.1 | 49.1 |
| Configuration 4 | 32.7 | 37.5 | 37.9 | 40.3 | 51.7 | 45.9 | 35.5 | 40.2 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Huang, X.; Lu, Z.; Peng, B. Enhancing Transferability with Intra-Class Transformations and Inter-Class Nonlinear Fusion on SAR Images. Remote Sens. 2024, 16, 2539. https://doi.org/10.3390/rs16142539

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.