1. Introduction
Remote sensing images (RSIs) are widely used in mapping, environmental monitoring, battlefield detection, and other fields [1,2]. To prevent possible risks, remote sensing image recognition technology needs to be not only accurate but also highly reliable and safe. In the field of remote sensing, scene classification is a fundamental research direction. Current studies focus not only on optical images but also on synthetic aperture radar (SAR) [3] and hyperspectral images [4]. Deep neural networks (DNNs) have achieved excellent performance in remote sensing image classification. In the early stages, convolutional neural networks (CNNs) played a key role in image feature extraction [5]. Since the emergence of the vision transformer (ViT) in 2020, many ViT variants have been proposed to enhance classification performance on satellite and aerial images [6]. Unlike the convolution operation with its local receptive field, ViT transforms the input image into a sequence of patches and uses the self-attention mechanism to capture global relationships, which helps capture contextual information both within and between objects. The vision Mamba, proposed in 2024, is capable of processing high-resolution remote sensing images [7], extracting large-scale spatial features in multiple directions, and is more efficient in terms of GPU memory usage and inference time than transformer-based models [8].
Many studies have shown that DNNs face security threats from adversarial attacks. An adversarial attack alters the original image by introducing carefully designed perturbations that cause the deep neural network to produce false predictions [9,10]. In remote sensing application scenarios, adversarial attacks can have serious consequences [11]. For example, an adversarial attack could fabricate environmental disasters to mislead rescue decision-making. Additionally, attackers can obscure military forces and infrastructure, disrupting battlefield situational awareness. Therefore, researching and defending against adversarial attacks targeting remote sensing images is of significant practical importance and academic value [12,13,14,15].
Adversarial attacks in remote sensing have been widely studied. However, the field of adversarial attacks on RSIs has three limitations. First, most adversarial attacks are conducted and validated on CNN-based models, with limited exploration of cross-architecture transferable black-box attacks [16,17,18]. Although some studies have investigated adversarial attacks on ViT-based models, these approaches often exploit architectural properties specific to ViTs, such as class tokens or self-attention mechanisms [19,20]. Consequently, such ViT-specific attack methods are difficult to transfer to models with different architectures. Second, like CNNs and ViTs, Mamba also faces challenges from adversarial perturbations [21]. However, to the best of our knowledge, no existing research has explored the adversarial robustness of Mamba models in RSI classification. To improve the reliability of Mamba in remote sensing applications, it is necessary to expose its weaknesses through adversarial attacks. Finally, transfer-based black-box attacks require a surrogate model to generate adversarial examples, yet existing studies on transfer-based attacks in remote sensing focus primarily on crafting perturbations, with little emphasis on the selection and enhancement of surrogate models [22]. Our experiments provide additional insights into improving surrogate models for better attack performance.
To address the aforementioned research gaps, we propose a distilled surrogate model with feature-based adversarial attack (DMFAA) for the remote sensing domain. Our method enhances the transferability of adversarial examples for RSI classification, enabling attacks to achieve better performance across models with different architectures. As shown in Figure 1, models with different architectures attend to different regions of the image for RSI classification. The adversarial examples generated by DMFAA can alter a model's attention regions on the image, leading to misclassification. First, to ensure that the adversarial examples transfer well across models with different architectures, we propose a model distillation approach to train a new surrogate model. This approach ensures that the shallow features of the surrogate model incorporate information from various teacher models. We use the trained student model as the surrogate model and generate adversarial examples via a feature-based attack.
Figure 1 shows that, on the original image, the student model with the ResNet50 architecture focuses on features more similar to those of the Mamba model in the fourth column. This indicates that the distilled student model can learn feature representations from models with different architectures, thus improving its ability to generate cross-architecture adversarial examples. Second, we improve the feature-based adversarial attack by extracting feature importance from both semantic and non-semantic features. We obtain aggregated semantic features by transforming the low-frequency region of the image through a frequency domain transformation. At the same time, we use white-box attacks to fabricate sets of adversarial examples and obtain an aggregated non-semantic feature gradient. Finally, we adopt a data augmentation strategy for remote sensing images, including rotation and random occlusion, and optimize the perturbation with momentum iteration. To evaluate cross-architecture black-box attacks, we selected target models from four different architectures: CNN, ViT, CLIP, and Mamba. We conducted our experiments on the AID [23], UC [24], and NWPU [2] datasets to verify the validity of the proposed approach, which achieves state-of-the-art performance across model architectures in a black-box setup.
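For illustration, the following is a minimal PyTorch sketch of the kind of distillation objective described above, combining hard-label cross-entropy, soft-label KL divergence toward a frozen teacher (e.g., Mamba), and an optional shallow-feature alignment term; the weighting scheme, temperature, and function names are our own assumptions, not the exact DMFAA formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      student_feat=None, teacher_feat=None,
                      alpha=0.5, beta=0.1, temperature=4.0):
    """Hard-label cross-entropy plus soft-label KL toward a frozen teacher,
    with an optional shallow-feature alignment term."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    loss = (1.0 - alpha) * ce + alpha * kl
    if student_feat is not None and teacher_feat is not None:
        # Encourage the student's shallow features to match the teacher's.
        loss = loss + beta * F.mse_loss(student_feat, teacher_feat)
    return loss

# Toy usage with random tensors (batch of 8, 30 classes as in AID).
student_logits = torch.randn(8, 30, requires_grad=True)
teacher_logits = torch.randn(8, 30)  # e.g., outputs of a frozen Mamba teacher
labels = torch.randint(0, 30, (8,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```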
The main contributions of this paper are summarized as follows:
- 1.
A new framework, DMFAA, is proposed, in which Mamba is used as a teacher model to distill a surrogate model and improve the feature-based attack, thereby enhancing the transferability of adversarial examples in the RSI classification task.
- 2.
The feature-based attack method is improved: unlike existing feature-based adversarial attacks, it computes both semantic and non-semantic aggregate features. This method focuses on the transferability of adversarial examples across models with different architectures.
- 3.
An input transformation technique is developed specifically for remote sensing target recognition. This method utilizes the region sequence-independent characteristics of RSIs to perform input variations, thereby effectively improving the transferability of adversarial examples across black-box models.
- 4.
The performance of DMFAA is tested on 14 different models, including CNNs, ViTs, CLIP, and Mamba. To the best of our knowledge, this is the first evaluation of adversarial attacks on Mamba in RSI classification.
The remainder of this paper is organized as follows. Section 2 reviews works relevant to our research. Section 3 provides an overview of the problem setup and a detailed explanation of our proposed method. Section 4 presents the experimental results, ablation studies, and analysis of the proposed approach. Section 5 discusses the work and suggests directions for future research. Section 6 concludes the paper.
4. Experiments
4.1. Experimental Setup
4.1.1. Datasets
We used three RSI datasets for scene classification: AID [23], UC Merced [24], and NWPU [2]. In the experiments, all images were resized to 224 × 224 pixels.
(1) AID dataset: AID contains 30 categories, each with a different number of images, ranging from 220 to 400. It consists of 10,000 images, each with a resolution of 600 × 600 pixels, saved in PNG format.
(2) UC dataset: The UC dataset consists of high-resolution RSIs covering 21 scene categories. Each image has a resolution of 256 × 256 pixels.
(3) NWPU dataset: The NWPU dataset consists of RSIs covering 45 scene categories. Each image has a resolution of 256 × 256 pixels.
4.1.2. Model Architectures
For model selection, we used 14 typical DNN classification models to evaluate attack performance, including ResNet50, ResNet18, DenseNet161, VGG19, MobileNet, EfficientNet-B0, ViT, Swin Transformer, MaxViT, CLIP-ResNet50, CLIP-ViT, Mamba-b, and Mamba-l. As shown in Table 2, all models achieve more than 70% accuracy on the AID, UC, and NWPU datasets, with the exception of CLIP-ResNet50.
4.1.3. Comparison Methods
In the comparative experiment, we compare DMFAA with seven state-of-the-art attack methods, including PGD, MIM, FIA, NAA, BSR, BFA, and SFCoT.
Table 1 lists the publication year and original application scenario of each of these methods.
4.1.4. Evaluation Indicator
In the study of adversarial attacks, a commonly used metric for evaluating attack effectiveness is the attack success rate (ASR). ASR is calculated as follows:
$$\mathrm{ASR} = \frac{n_{\mathrm{wrong}}}{n_{\mathrm{total}}} \times 100\%,$$
where $n_{\mathrm{total}}$ denotes the total number of images in the test dataset and $n_{\mathrm{wrong}}$ represents the number of images misclassified by the target model. A higher ASR indicates a more effective attack method.
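A minimal sketch of this computation is given below; the model and data-loader names are placeholders.

```python
import torch

@torch.no_grad()
def attack_success_rate(model, adv_loader, device="cuda"):
    """ASR = n_wrong / n_total * 100, evaluated on a loader of adversarial examples."""
    model.eval().to(device)
    n_wrong, n_total = 0, 0
    for adv_images, labels in adv_loader:
        preds = model(adv_images.to(device)).argmax(dim=1)
        n_wrong += (preds != labels.to(device)).sum().item()
        n_total += labels.size(0)
    return 100.0 * n_wrong / n_total
```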
4.1.5. General Implementation
We randomly select images from the test set as clean original images to generate adversarial examples. During the training of the surrogate model, the number of iterations I and the two loss-function weights are set to 20, 0.5, and 0.1, respectively. In the feature extraction stage, the random mask ratio and the random scaling factor r in the frequency domain transformation are set to 0.05 and 0.1, respectively, using the db3 wavelet transform. In the white-box adversarial example generation stage of feature extraction, the attack strength is set to 28 and the number of iterations t to 30. When generating adversarial examples, the perturbation constraint, step size, number of iterations T, and loss-function parameter are set to 16, 1, 30, and 3, respectively. The size of the transformed image set N is set to 5 for gradient computation. The experiments were conducted with the PyTorch framework on an Nvidia RTX 3090 GPU.
4.2. Result
Quantitative results: We conducted a comprehensive evaluation of the attack performance on different models in a black-box environment, with the results shown in Table 3, Table 4, and Table 5, which present the results on the AID, UC, and NWPU datasets, respectively. The target models and attack methods are listed in the rows and columns, respectively. For the other methods, ResNet50 trained on the original dataset is used as the surrogate model, while for DMFAA, the ResNet50 model distilled with Mamba is used as the surrogate model. The average accuracies of the target models trained on the AID, UC, and NWPU datasets were 92.44%, 94.03%, and 85.93%, respectively. After applying the proposed DMFAA attack algorithm, the average accuracy of the 14 target models decreased to 14%, 24.84%, and 9.85%, respectively.
The experimental results demonstrate that DMFAA exhibits strong cross-model attack performance on average. For the AID and UC datasets, the results indicate that the distillation-based surrogate model does not perform as well on CNN-based targets but achieves higher ASR on ViTs, CLIP, and Mamba. On the NWPU dataset, ASR improved on almost all target models except the white-box surrogate model ResNet50. This may be because the NWPU dataset contains more image categories, allowing the distilled surrogate model to capture richer data information and extract more generalizable cross-architecture features. Our average ASR on the AID, UC, and NWPU datasets exceeds that of state-of-the-art methods, with improvements of 3.49%, 7.54%, and 13.09%, respectively. This demonstrates that our method significantly enhances the transferability of adversarial examples across different architectures.
Adversarial defense: To further assess the robustness of the proposed DMFAA method in adversarial defense scenarios, we selected several commonly used defense techniques for evaluation. First, we applied the adversarial attack strategies mentioned earlier to generate adversarial samples. Then, we used three typical adversarial defense methods—Total Variance Minimization (TVM), JPEG Compression (JC), and Quantization-based Defense (QT)—to process these adversarial examples. After applying these defense methods, we re-evaluated the success rate of the adversarial attacks.
The parameter settings for these defense methods are as follows: for TVM, the Bernoulli distribution probability is set to 0.3, and the norm and lambda parameters are set to 2 and 0.5, respectively; for QT, the bit depth is set to 3. The experimental results are shown in Table 6, Table 7, Table 8, and Table 9. After defense processing, the ASR of the DMFAA attack remains the highest on the evaluated datasets. Furthermore, compared with the setting without defense methods, the ASR of DMFAA decreases by less than that of some other attacks. This indicates that the proposed DMFAA attack method remains highly effective, even in adversarial defense scenarios.
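For reference, the snippet below sketches two of the input-purification defenses used here, JPEG Compression and bit-depth quantization; the JPEG quality value is an illustrative assumption, while the bit depth of 3 follows the setting above.

```python
import io
import numpy as np
from PIL import Image

def jpeg_compress(img_uint8, quality=75):
    """JPEG Compression (JC): lossily re-encode the adversarial image."""
    buf = io.BytesIO()
    Image.fromarray(img_uint8).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf))

def bit_depth_quantize(img_uint8, bits=3):
    """Quantization-based defense (QT): keep only 2**bits intensity levels per channel."""
    levels = 2 ** bits
    quantized = np.floor(img_uint8 / 256.0 * levels)        # 0 .. levels-1
    return (quantized * (255.0 / (levels - 1))).astype(np.uint8)
```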
Qualitative results: In order to better illustrate the feasibility of our proposed method, we perform a visual analysis.
Figure 6 shows the adversarial examples produced by different attack methods on the two datasets and the effect of adversarial defense processing. It can be seen that the perturbation our method adds to the image hardly interferes with human recognition, yet it successfully attacks the DNN models.
Imperceptibility of adversarial examples: Structural similarity (SSIM) is a commonly used metric for measuring the imperceptibility of adversarial examples; an SSIM of 1 means that the two images are identical. To verify that the adversarial examples generated by our method remain imperceptible, we compared the SSIM between adversarial examples and clean samples across the three datasets for different attack methods. A higher SSIM means that the adversarial example is more structurally similar to the clean image. The results in Table 10 show that the adversarial examples generated by DMFAA have an SSIM similar to those produced by the state-of-the-art method SFCoT. Although the SSIM of PGD, FIA, and NAA is higher than that of DMFAA, their ASRs are much lower. These results demonstrate that our proposed method improves ASR while maintaining imperceptibility.
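A sketch of the SSIM measurement between a clean image and its adversarial counterpart is shown below, assuming a recent scikit-image version and 8-bit RGB inputs.

```python
from skimage.metrics import structural_similarity

def ssim_score(clean_uint8, adv_uint8):
    """SSIM between a clean image and its adversarial example (HxWx3, uint8)."""
    return structural_similarity(clean_uint8, adv_uint8,
                                 channel_axis=2, data_range=255)
```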
4.3. Selection of Teacher Model
To emphasize the key role of Mamba model distillation, we conducted an ablation study. We first compare the ability of the CNN, ViT, and Mamba models to resist adversarial attacks.
Figure 7 shows the results of attacking models with different architectures using four classical white-box attack algorithms. We used FGSM, PGD, MIM, and DIM to conduct white-box attacks against VGG, ResNet, Swin Transformer, and Mamba. All four models had a classification accuracy greater than 98% on clean images. The results show that the performance of all models decreases as the perturbation intensity increases. Further analysis shows that the Mamba model is more robust than the CNNs against multi-step attacks, including PGD, MIM, and DIM. For single-step attacks such as FGSM, especially under large perturbations, Mamba is slightly less robust than the CNNs. We believe this difference may be related to the correlations between image regions introduced by the Mamba model: these correlations allow FGSM to capture patterns across larger spatial regions with fewer gradient steps, which affects Mamba's response to such perturbations.
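For clarity, the following sketch contrasts the standard single-step FGSM update with the multi-step PGD update used in this comparison; it is a simplified reference implementation, not the exact attack code used in our experiments.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step attack: one signed-gradient step of size eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps, alpha, steps):
    """Multi-step attack: repeated small steps projected back into the eps-ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into the L_inf ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```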
Next, we distill the surrogate model using teacher models with different architectures. We used an untrained ResNet50 as the student model, with ResNet18, ViT, and Mamba-l as teacher model T1 and the ResNet50 model as teacher model T2. Knowledge distillation was performed on the student model under the same parameter settings, with the two loss weights set to 0.5 and 0.1, respectively.
Table 11 presents the classification accuracy of adversarial examples generated using student models trained with four different teacher models, evaluated on various target models. From the data in the table, it can be seen that when using only the MIM method, the average ASR is 53.82%. Adding Mamba as the teacher model increases the ASR by approximately 8%. Although the models distilled using ViT and ResNet improve the quality of the generated adversarial examples compared with non-distilled MIM, their attack performance on most black-box models remains inferior to that of the Mamba-distilled model. This ablation study further indicates that distillation with a Mamba teacher can significantly improve the transferability of black-box attacks.
4.4. Ablation Study
In this section, we focus on some of the hyperparameters that affect the performance of our methods.
Effectiveness of feature extraction. In the feature-based adversarial attack stage, the aggregate feature extraction process in our proposed method consists of two components: frequency domain transformation and white-box attack. The former aims to extract semantic information from the regions of the image where the energy is concentrated, while the latter seeks to extract non-semantic features that are more vulnerable to attack. We conduct ablation experiments on the AID and UC datasets to demonstrate the ability of these two components to enhance adversarial transferability. Adversarial examples are generated on ResNet152 and tested on other models. The attack success rates for each dataset are shown in Figure 8. First, as shown in the figure, compared with MIM, adversarial examples generated using frequency-domain transformations exhibit lower ASR on white-box models but achieve better performance on black-box models. This suggests that extracting low-frequency information from images helps identify regions where the energy of semantic features is highly concentrated, thereby enhancing semantic feature integration. Furthermore, adversarial examples generated under white-box attack can capture non-semantic features of black-box models, improving attack effectiveness. Finally, by integrating frequency-domain transformations, white-box attacks, and image transformations within the DMFAA framework, the best attack performance is achieved.
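The sketch below illustrates how a single DMFAA-style iteration could aggregate gradients over a set of transformed copies (e.g., rotation and random occlusion) and accumulate them with momentum; `loss_fn` and `transform` are hypothetical placeholders for the feature-based loss and the RSI input transformations, and the step sizes only approximately mirror the settings reported above.

```python
import torch

def momentum_step(x_adv, x_clean, grad_momentum, model, y, loss_fn, transform,
                  n_copies=5, alpha=1/255, eps=16/255, mu=1.0):
    """One iteration: aggregate gradients over N transformed copies, then a momentum update."""
    agg_grad = torch.zeros_like(x_adv)
    for _ in range(n_copies):
        # Rotation / random occlusion copy; gradients are taken w.r.t. the copy.
        x_t = transform(x_adv).clone().detach().requires_grad_(True)
        loss = loss_fn(model, x_t, y)
        agg_grad += torch.autograd.grad(loss, x_t)[0]
    agg_grad = agg_grad / (agg_grad.abs().mean(dim=(1, 2, 3), keepdim=True) + 1e-12)
    grad_momentum = mu * grad_momentum + agg_grad          # MIM-style accumulation
    x_adv = x_adv.detach() + alpha * grad_momentum.sign()
    x_adv = x_clean + (x_adv - x_clean).clamp(-eps, eps)   # stay within the constraint
    return x_adv.clamp(0, 1), grad_momentum
```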
Selection of frequency domain. In DMFAA, we apply transformations in the low-frequency domain of the image to enhance the extraction of semantic information. We used ResNet152 as the surrogate model and performed experiments on nine target models using the UC dataset. The results in Table 12 show that applying transformations to the different high-frequency sub-bands performs worse than using the original image. Meanwhile, transforming in the low-frequency domain improves the average ASR by around 2% compared with the original image and around 5% compared with the high-frequency transformations. These results demonstrate that transforming in the low-frequency space can effectively capture semantic information and enhance the transferability of adversarial examples.
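As an illustration, the following sketch perturbs only the approximation (low-frequency) sub-band of a db3 wavelet decomposition using PyWavelets; the noise scale and function name are illustrative assumptions rather than the exact DMFAA transformation.

```python
import numpy as np
import pywt

def low_freq_transform(img, scale=0.1, rng=None):
    """Perturb only the approximation (low-frequency) band of an HxWx3 image."""
    rng = rng or np.random.default_rng()
    cA, (cH, cV, cD) = pywt.dwt2(img.astype(np.float32), "db3", axes=(0, 1))
    cA = cA * (1.0 + scale * rng.standard_normal(cA.shape).astype(np.float32))
    out = pywt.idwt2((cA, (cH, cV, cD)), "db3", axes=(0, 1))
    return np.clip(out[: img.shape[0], : img.shape[1]], 0, 255)
```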
Parameters of the distillation stage. We conducted ablation studies on the loss-weight parameters of the teacher model during the distillation phase. As shown in Figure 9, the ablation study was conducted on the AID dataset. We fixed one of the two loss weights and varied the other among 0.9, 0.5, and 0.1. The results show that the average ASR gradually increases as this weight decreases. Furthermore, once this weight is fixed, the success rate remains comparable across different values of the other weight, including 0.3, 0.5, and 5. Based on these experiments, we chose 0.5 and 0.1 as the default values of the two weights in our experiments.
Number of iterations. Finally, we investigate the effect of the number of iterations used to generate adversarial examples on their performance. We conducted experiments with different numbers of iterations and report the recognition accuracy for ResNet50, VGG16, MobileNet, CLIP-ResNet50, ViT, and MaxViT, as shown in Figure 10. This helps us understand how the number of iterations affects the effectiveness of adversarial attacks. We observe that from 10 to 30 iterations, the recognition accuracy of the models decreases and the transferability gradually improves. Beyond 30 iterations, the transferability of the adversarial examples saturates or decreases slightly with additional iterations. To align with the previous experiments, we set the number of iterations to 30.
5. Discussion
In summary, our extensive experimental results demonstrate that the proposed DMFAA exhibits outstanding performance across multiple datasets. It not only surpasses classic adversarial attack methods such as PGD and FIA but also outperforms methods such as SFCoT that are tailored to remote sensing images. Additionally, to the best of our knowledge, we are the first to explore adversarial attacks and defense methods for Mamba on remote sensing images, highlighting the practical significance and future potential of this work.
Although the effectiveness of DMFAA in black-box attacks has been validated in our experiments, many challenges and limitations remain in real-world applications. On the one hand, DMFAA requires model distillation for the surrogate model, which demands more time and computational resources than other attacks. On the other hand, the perturbations generated by DMFAA are not completely imperceptible to the human eye. Therefore, future algorithm improvements will focus on enhancing the imperceptibility of attack perturbations and reducing the resource consumption of the attack. In addition, our method has only been verified on optical RSIs and not on other modalities such as SAR and hyperspectral imagery; we will explore these in future work.
6. Conclusions
In this paper, we combine model distillation with feature-based attacks and propose DMFAA for remote sensing target recognition, which effectively enhances transferability between models. During the model distillation phase, we select Mamba as the teacher model, enabling the student model to learn cross-architecture features. In the feature attack phase, we selectively transform low-frequency components to reduce the gap between models. Simultaneously, we employ white-box attacks to extract aggregated non-semantic features from the model. Finally, aggregate gradients are computed on the transformed image set, enabling the stable generation of adversarial examples through iterative refinement. We evaluate the proposed method on three remote sensing datasets to demonstrate its effectiveness; it outperforms state-of-the-art methods in cross-architecture black-box attacks. Beyond the performance advantages, the established benchmark for black-box attacks on remote sensing target recognition also provides insights into model vulnerabilities. In future work, we aim to further investigate adversarial learning in remote sensing, including adversarial attacks and defenses.