A Novel Adversarial Detection Method for UAV Vision Systems via Attribution Maps

Abstract: With the rapid advancement of unmanned aerial vehicles (UAVs) and the Internet of Things (IoT), UAV-assisted IoT has become integral in areas such as wildlife monitoring, disaster surveillance, and search and rescue operations. However, recent studies have shown that these systems are vulnerable to adversarial example attacks during data collection and transmission. These attacks subtly alter input data to trick UAV-based deep learning vision systems, significantly compromising the reliability and security of IoT systems. Consequently, various methods have been developed to identify adversarial examples within model inputs, but they often lack accuracy against complex attacks such as C&W. Drawing inspiration from model visualization technology, we observed that adversarial perturbations markedly alter the attribution maps of clean examples. This paper introduces a new, effective detection method for UAV vision systems that uses attribution maps created by model visualization techniques. The method differentiates between genuine and adversarial examples by extracting their attribution maps and then training a classifier on these maps. Validation experiments on the ImageNet dataset showed that our method achieves an average detection accuracy of 99.58%, surpassing the state-of-the-art methods.


Introduction
The rapid advancement in the fields of unmanned aerial vehicles (UAVs) and synthetic aperture radar (SAR) has led to significant breakthroughs in remote sensing capabilities [1,2]. The integration of SAR technology with UAVs [1,2] has unlocked the potential for acquiring high-definition imagery from above, providing a versatile tool for detailed surveillance and analysis across vast and varied landscapes. Building upon this development, the high-resolution SAR images captured by UAVs are revolutionizing the Internet of Things (IoT) landscape by enriching the spectrum of data available for automated processing and interpretation. In the realm of IoT, UAVs equipped with vision systems have become pivotal in tasks requiring extensive area coverage with precision, such as monitoring wildlife migration patterns in their natural habitats or providing real-time data on environmental conditions for ecological studies.
Meanwhile, deep neural networks (DNNs) have proven to be particularly effective in processing the complex data obtained from SAR images [3,4]. Their advanced representation capabilities facilitate accurate analysis and have been instrumental in the development of SAR automatic target recognition (SAR-ATR) models [5][6][7][8]. By leveraging the computational prowess of DNNs, these models have rapidly gained popularity due to their efficiency and reliability in identifying and classifying targets in diverse environments.
Although the integration of DNNs into UAV vision systems has markedly improved their capability to deal with complex data, it has concurrently opened up a vector for a new type of threat: adversarial attacks [9]. These attacks are executed through the generation of adversarial examples, which are seemingly normal images that have been meticulously modified with imperceptible perturbations. These alterations are crafted to exploit the inherent vulnerabilities of DNNs, causing them to misinterpret the image and make incorrect decisions [10]. The attack process is shown in Figure 1. Such adversarial attacks on UAV vision systems can lead to unpredictable behavior and severely impact the security of data acquisition and transmission in UAV-assisted IoT [11]. In scenarios where UAVs are employed for critical missions, such as search and rescue operations, security surveillance, or precision agriculture, a false classification could lead to dire outcomes. For instance, an adversarial image could cause a UAV to overlook a lost hiker it was tasked to locate or misidentify a benign object as a security threat, triggering unwarranted responses. Moreover, the challenge is exacerbated by the fact that these perturbations are designed to be imperceptible to human analysts, which means that the reliability of UAV systems could be compromised without immediate detection. This underlines the urgency of developing defense strategies that can detect and neutralize adversarial examples before they impact UAVs. To address this emerging threat, researchers have proposed two kinds of defense strategy: active and passive. Active defense methods increase the robustness of models against adversarial attacks through techniques such as adversarial training [12,13] and network distillation [14], which are applied during training. In contrast, passive defense involves detecting and filtering adversarial inputs during model deployment. Our research focuses on passive defense, aiming to identify features that differentiate adversarial from clean samples, thereby enabling the detection of adversarial examples and safeguarding the model from potential attacks.
Reflecting the ongoing efforts of the deep learning community, adversarial detection is the subject of intensive research, and numerous passive defense techniques have been developed to tackle adversarial attacks. Hendrycks and Gimpel [15] proposed three methods (Reconstruction, PCA, and Softmax), which, while effective against FGSM [9] and BIM [16] attacks, have limitations in detecting more sophisticated threats. Metzen et al. [17] developed an adversarial detection network (ADN), which adds binary detection to pretrained networks and shows promise against FGSM, DeepFool [18], and BIM attacks but not against the more challenging C&W [19] attacks. Gong et al. [20] made further attempts to improve the ADN, yet it remained inadequate for C&W attack detection. Xu et al. [21] suggested that adversarial example generation is linked to excessive input dimensions and introduced feature squeezing to identify adversarial instances by contrasting the outputs of squeezed and original samples. This approach shows potential in detecting C&W attacks, but its precision still requires enhancement.
In summary, while the aforementioned methods are effective at detecting adversarial attacks such as FGSM and BIM, they fall short when faced with more potent threats like the C&W attack.
Building on this premise, researchers have demonstrated that the adversarial perturbations from C&W attacks tend to be subtler than those from other methods like FGSM when executed under comparable conditions [22]. As a result, adversarial examples crafted via C&W attacks are notably more challenging to detect. In this paper, we refer to such finely perturbed instances, typified by the C&W attack, as strong adversarial examples. In contrast, examples with more pronounced perturbations, like those from FGSM attacks, are deemed weak adversarial examples.
Additionally, recent studies have revealed that the convolutional layers in convolutional neural networks (CNNs) inherently function as object detectors, even without explicit object location supervision. Consequently, it is possible to visualize the model by extracting the features detected by each CNN layer. This paper extends an earlier version of this work [23]: we have added considerably more detailed charts, conducted adversarial detection experiments based on different attribution maps, and further compared our method with several state-of-the-art approaches. We propose an adversarial detection method for UAV vision systems based on different attribution maps. The proposed method consists of two steps. First, we generate attribution maps for both clean and adversarial samples using three different types of attribution map: class activation mapping (CAM) [24], guided backpropagation (G-BP) [25], and guided Grad-CAM (GGCAM) [26]. We then train a binary classifier on the generated attribution maps to detect adversarial samples. Experimental analysis shows that there are distinguishing features between the attribution maps of clean and adversarial samples. In particular, the G-BP and GGCAM maps of clean samples have discriminative contour features, but these contour features are destroyed once adversarial perturbations are added, showing that the attribution maps generated by model visualization techniques can effectively distinguish adversarial samples from clean samples. We conducted experiments on ImageNet, and the results show that, with any of the three attribution maps, this method can detect not only weak adversarial samples, such as FGSM and BIM, but also strong adversarial samples, like C&W. Among them, the detection success rate achieved using G-BP reached up to 99.58%, surpassing the state-of-the-art methods.
The main contributions of this paper are as follows:
(1) This paper introduces model visualization techniques to the detection of adversarial examples for the first time. Model visualization approaches are employed to analyze sample features, and we find that the attribution maps of adversarial and clean samples differ considerably. Specifically, the contour features of G-BP and GGCAM are destroyed when adversarial perturbations are added.
(2) We propose a novel adversarial detection method for UAV vision systems via attribution maps. With each of the different attribution maps, the success rate of adversarial sample detection exceeds 90%. Among them, the detection success rate based on the G-BP map reaches 99.58%, which is 1.68% higher than the state-of-the-art method.

Adversarial Attacks
Szegedy et al. [9] first presented the concept of adversarial perturbations. By leveraging gradient information, they introduced subtle perturbations to original images. These minor modifications transformed clean samples into adversarial examples. While often imperceptible to the human eye, these examples can mislead classifiers into making incorrect predictions. Based on their operational domain, current research primarily categorizes adversarial attacks into digital and physical attacks. Let $x \in \mathbb{R}^d$ denote a clean image sample, with $x^{adv}$ representing its corresponding adversarial example. The generation of adversarial perturbations can be represented as an optimization problem:

$$\arg\max_{x^{adv}} J(x^{adv}, y), \quad \text{s.t.} \quad \|x^{adv} - x\|_p \le \epsilon, \tag{1}$$

where $J(\cdot)$ denotes the loss function, which measures the error magnitude of the adversarial example $x^{adv}$ in relation to $y$, the ground-truth label of $x$. $\|x^{adv} - x\|_p$ signifies the $l_p$-norm distance between $x^{adv}$ and $x$, typically with $p \in \{0, 2, \infty\}$. $\epsilon$ is a threshold that restricts the magnitude of the perturbation, implying that the difference between the adversarial and the original samples should remain within a narrow and explicitly defined boundary. Adversarial attacks involve perturbing input images at the pixel level to mislead DNN predictions. Depending on the attacker's knowledge of the target model, digital attacks can be categorized into white-box attacks and black-box attacks. In white-box scenarios, attackers have full access to all information about the target model, enabling them to effectively exploit model gradients to craft precise perturbations and misguide model predictions [9,19,[27][28][29]. Conversely, under black-box assumptions, attackers have no knowledge of the target model and can only query the model to obtain its outputs. Such attackers often capitalize on the transferability of adversarial examples between different models [30][31][32][33] or reveal the inner structure of the model through repeated queries [34][35][36][37]. Sometimes, attackers even combine both approaches for more potent attacks [38].
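As an illustration of the norm constraint described above, the following minimal Python sketch computes the $l_p$ distance between a clean sample and a candidate adversarial example for $p \in \{0, 2, \infty\}$ and checks it against a perturbation budget. The function names `lp_distance` and `within_budget`, the flattened-list image representation, and the numeric values are assumptions made for illustration; they are not part of the method described in this paper.

```python
import math

def lp_distance(x, x_adv, p):
    """l_p distance between two flattened images given as lists of floats."""
    diffs = [abs(a - b) for a, b in zip(x_adv, x)]
    if p == 0:
        # l_0 counts how many pixels were changed at all.
        return float(sum(1 for d in diffs if d != 0))
    if p == 2:
        return math.sqrt(sum(d * d for d in diffs))
    if p == math.inf:
        # l_inf is the largest change applied to any single pixel.
        return max(diffs, default=0.0)
    raise ValueError("p must be 0, 2, or math.inf")

def within_budget(x, x_adv, p, eps):
    """True when the perturbation respects the budget eps under the l_p norm."""
    return lp_distance(x, x_adv, p) <= eps

# Hypothetical 4-pixel image and a perturbed copy (two pixels moved by 0.25).
x = [0.0, 0.5, 1.0, 0.25]
x_adv = [0.0, 0.5, 0.75, 0.5]
print(lp_distance(x, x_adv, 0), lp_distance(x, x_adv, 2), lp_distance(x, x_adv, math.inf))
```

The three norms answer different questions about the same perturbation: how many pixels changed, how much energy was added overall, and how large the worst single-pixel change is.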

Adversarial Example-Detection Technology
Adversarial example-detection technology can be roughly divided into two stages. In the early stage, researchers mainly explored the differences between adversarial and clean samples. Hendrycks et al. [15] applied principal component analysis (PCA) and found that the variance in the principal components of adversarial examples is usually larger than that of clean samples. They also found that adversarial samples place abnormal emphasis on the lower-ranked principal components. Combining these two findings, they designed a PCA-based adversarial sample detection method, which can detect FGSM and BIM adversarial samples. However, this method is only effective if the attacker is unaware of the defense strategy. Hendrycks et al. also found that the Softmax outputs of adversarial and clean samples differ for several types of attack, so adversarial samples can be detected by examining the Softmax distribution. However, the Softmax distribution is not stable, which generally leads to low confidence in the prediction, and this method is only applicable to specific attacks. Xu et al. [21] found that higher-dimensional input features are more vulnerable to adversarial attacks. Therefore, they proposed the feature squeezing method. The main idea of this method is to squeeze the dimension of input features by removing unnecessary features. If the L1 norm difference between the predictions for the squeezed and unsqueezed inputs is larger than some threshold, T, the input is marked as an adversarial sample. Feature squeezing has been shown to detect FGSM, BIM, DeepFool, and C&W attacks. Based on the assumption that adversarial samples do not lie on the manifold of non-adversarial data, Feinman et al. [39] proposed two adversarial detection methods: kernel density estimates (KDEs) and Bayesian uncertainty estimates (BUEs). The purpose of KDEs is to determine whether data are far from the class manifold, and BUEs can detect data near regions of low confidence when the KDEs are ineffective.
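The feature squeezing decision rule described above (flag an input when the L1 distance between the predictions on the original and squeezed inputs exceeds a threshold T) can be sketched as follows. This is a minimal sketch under stated assumptions: bit-depth reduction is used as the only squeezer, `toy_model` is a hypothetical stand-in for a real classifier, and the threshold value is arbitrary.

```python
def squeeze_bit_depth(x, bits):
    """Quantize pixel values in [0, 1] to 2**bits levels (one possible squeezer)."""
    levels = (1 << bits) - 1
    return [round(v * levels) / levels for v in x]

def l1_distance(p, q):
    """L1 distance between two prediction vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def is_adversarial(model, x, threshold, bits=1):
    """Flag x when squeezing changes the prediction vector by more than threshold."""
    original = model(x)
    squeezed = model(squeeze_bit_depth(x, bits))
    return l1_distance(original, squeezed) > threshold

def toy_model(x):
    # Hypothetical two-class "model": P(class 1) is simply the mean pixel value.
    p1 = sum(x) / len(x)
    return [1.0 - p1, p1]

stable = [0.0, 1.0, 1.0, 0.0]        # already on the 1-bit grid, unchanged by squeezing
fragile = [0.45, 0.45, 0.45, 0.45]   # every pixel is pushed to 0 by squeezing

print(is_adversarial(toy_model, stable, threshold=0.5))
print(is_adversarial(toy_model, fragile, threshold=0.5))
```

In practice the threshold would be tuned on held-out clean data, and several squeezers could be combined; both choices are outside the scope of this sketch.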
Currently, adversarial detection typically involves decomposing input samples and analyzing the extracted features to identify adversarial examples. Metzen et al. [17] introduced an adversarial detection network (ADN) to safeguard deep neural networks (DNNs). The ADN employs a binary detector network appended to a pretrained neural network. This detector is trained to differentiate between adversarial and genuine samples. Across 10 subsets of CIFAR10 and ImageNet, the authors successfully trained a highly accurate adversarial detection network using ADN. To enhance the detector's resilience against new attacks, they incorporated the generation of adversarial examples into the ADN training process. This approach effectively detects FGSM, DeepFool, and BIM attacks. Gong et al. [20] further refined the ADN method by training a binary classifier that operates independently of the main classifier. Unlike previous methods that tailor adversarial samples to the detector, this technique uses adversarial samples generated against the pretrained classifier to augment the original training dataset, which is then used to train the binary classifier. Nevertheless, Carlini et al. [40] noted that such methods yield a high false-positive rate when confronted with more potent attacks like the C&W. While these strategies are adept at detecting weaker adversarial examples, such as those from FGSM and BIM, their detection accuracy decreases against stronger attacks, exemplified by the C&W.

Model Visualization Methods
Model visualization techniques play a pivotal role in enhancing the transparency and interpretability of deep learning models. Ribeiro et al. [41] proposed local interpretable model-agnostic explanations (LIME). LIME provides a local approximation of a model's behavior, which can be highly insightful when trying to understand individual predictions. However, its locality and linear approach might not capture the global complexity of the model. On the other hand, CAM, proposed by Zhou et al. [24], offers a straightforward mechanism for highlighting the regions of an image that are influential for a given prediction. Yet its reliance on model modification and retraining can be resource-intensive and may not be feasible for all applications. While post hoc interpretability methods like LIME and CAM have laid the groundwork, there is a continuous effort to refine these approaches to minimize their limitations and expand their applicability. To address these concerns, Grad-CAM [26] emerged as a versatile and powerful tool that extends the capabilities of CAM without the need for structural modifications or retraining. By utilizing the gradients flowing into the final convolutional layer, Grad-CAM provides a visualization of the areas impacting the model's decision-making process. Building on the strengths of Grad-CAM, recent advancements have introduced more sophisticated techniques that further refine the interpretability of convolutional neural networks. One such development is Grad-CAM++ [42], an extension that captures the importance of each feature map for a particular class, allowing for even more detailed visual explanations.

Methodology
In recent years, researchers have studied the interpretability of DNNs from two aspects: models [43][44][45] and samples [40,46,47]. This paper approaches the issue of adversarial example detection from the sample perspective, utilizing model visualization techniques to produce attribution maps for both clean and adversarial samples, thereby investigating their distinguishing features. It is acknowledged that the smaller the perturbations in adversarial examples, the more challenging they are to detect. Consequently, this study opts for untargeted attacks to generate adversarial samples, aiming to minimize the perturbations introduced.
Inspired by the study of model interpretability [48], we find a large gap between the CAMs of normal and adversarial samples. As an example, we select two samples and transform them into adversarial samples by using the C&W and TPGD [49] methods. Then, we compare their CAMs, as shown in Figure 2. The Normal row shows the clean samples and their corresponding attribution maps. The highlighted part contains the most abundant input information, revealing the causal relationship between model output (classification) and input. The TPGD row shows the adversarial samples generated by the TPGD method and their corresponding attribution maps. The highlighted part of the attribution map of each adversarial sample changes greatly and is clearly distinguishable from the attribution map of the normal sample. It also reveals the causal relationship between model misclassification and the perturbed input. Similarly, the third row presents the adversarial samples created by C&W and their attribution maps.
When compared to TPGD, the differences in attribution maps with C&W are much less pronounced, confirming the strength of C&W as a white-box attack [22]. Figure 2 shows that there is a significant difference between the attribution maps of the clean and adversarial samples. Determining which attribution map offers the greatest discriminative power is key to designing the adversarial detection algorithm in this paper. Therefore, we extracted the CAM, G-BP, and GGCAM maps of the two samples in Figure 2 for comparison, as shown in Figure 3.
We have noted that CAMs exhibit limited differentiation for certain attacks, such as the C&W method. Conversely, guided backpropagation (G-BP) and guided Grad-CAM (GGCAM) demonstrate higher levels of discrimination: the Normal samples display distinct contour features in their attribution maps (see Figure 3: Normal), allowing for rough classification judgments by visual inspection. The introduction of adversarial perturbations, however, leads to a pronounced disruption of these contours (see Figure 3: TPGD and C&W), rendering the residual features unintelligible to the human observer. Hence, the disparities in G-BP and GGCAM between the clean and adversarial samples could provide a sufficient basis for detecting adversarial instances. Beyond the TPGD and C&W attacks, this study also examines BIM, DeepFool, FGSM, OnePixel, PGD, Square, and AutoAttack, that is, seven further adversarial sample generation methods, for nine in total. The resulting attribution maps are depicted in Figure 4.
Figure 4 compares the attribution maps of the nine kinds of adversarial sample with those of the clean samples. Among the attacks, OnePixel modifies a single pixel, and AutoAttack is an adversarial sample generation method that integrates APGD [50], APGDT [50], FAB [51], and Square [52]. We also use the $L_2$ norm to calculate the difference between the attribution maps of the clean and adversarial samples. The analysis reveals that the attribution maps of the adversarial samples exhibit discernible alterations compared to those of the clean samples once adversarial perturbations are introduced. Measured by the $L_2$ norm, the CAM attribution maps display the most pronounced differences, characterized by brighter colors and more significant pixel value variations. Visually, however, the changes in G-BP and GGCAM are even more apparent, with the contour features of clean samples experiencing substantial disruption. Consequently, the attribution maps generated by CAM, G-BP, and GGCAM all demonstrate effective capabilities in identifying adversarial samples.
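The $L_2$ comparison of attribution maps mentioned above can be sketched as follows. This is only an illustrative sketch: the min-max normalization step, the function names, and the toy 2×2 maps are assumptions introduced here so that maps on different scales become comparable, and the text does not state how the maps were scaled before the norm was computed.

```python
import math

def min_max_normalize(m):
    """Rescale a 2-D map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]

def map_l2_difference(map_a, map_b):
    """L2 norm of the pixel-wise difference between two normalized attribution maps."""
    a, b = min_max_normalize(map_a), min_max_normalize(map_b)
    return math.sqrt(sum((va - vb) ** 2
                         for row_a, row_b in zip(a, b)
                         for va, vb in zip(row_a, row_b)))

# Hypothetical 2x2 attribution maps of a clean sample and its adversarial version.
clean_map = [[0.0, 2.0], [4.0, 0.0]]
adv_map = [[1.0, 1.0], [3.0, 1.0]]
print(map_l2_difference(clean_map, adv_map))
```

A single scalar of this kind summarizes overall intensity change but ignores spatial structure such as contours, which is one reason a learned classifier on the maps can behave differently from a norm-based comparison.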

The Proposed Method
The proposed detection method (via attribution maps) can be divided into two parts. The first part is the generation of attribution maps: generating the clean and adversarial samples and turning them into attribution maps through visualization techniques. The second part involves training a binary classifier on the generated attribution maps to detect adversarial samples. The structure of the two parts is shown in Figure 5.

The Generation of an Attribution Map
For the first part, we generate the attribution maps of clean and adversarial samples via model visualization technology. The original CAM method requires modifying the model, whereas Grad-CAM, an improvement on CAM, does not. Therefore, all the CAMs mentioned in this paper are produced with the Grad-CAM method. The specific procedure is as follows:
Step 1: To obtain the gradient of the classification score $y^c$ for class $c$ with respect to the feature map $A^k$ in the convolutional layer, we calculate the partial derivative $\partial y^c / \partial A^k$. The backpropagated gradients are then global-average-pooled to obtain the weight of the feature map, denoted as $\alpha_k^c$, as shown in Equation (2), where $Z$ is the number of elements in the feature map $A^k$ and $(i, j)$ indexes its spatial positions.

$$\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k} \tag{2}$$
Step 2: Once the weight $\alpha_k^c$ of each feature map has been computed, it is used to weight and combine the feature map activations. This yields the activation mapping $L^c$ for class $c$, according to Equation (3).

$$L^c = \sum_k \alpha_k^c A^k \tag{3}$$
Step 3: To focus solely on the features that contribute positively to class $c$ and disregard negative information that might be associated with other classes, ReLU is applied to $L^c$. This yields the class activation mapping $L^c_{CAM}$ (Equation (4)), which highlights the regions in the input that strongly activate class $c$.

$$L^c_{CAM} = \mathrm{ReLU}(L^c) = \mathrm{ReLU}\Big(\sum_k \alpha_k^c A^k\Big) \tag{4}$$
Step 4: The primary distinction between backpropagation and DeconvNet lies in how they handle nonlinearity through the ReLU function. The forward pass through a ReLU layer is given by Equation (5), where $f_i^l$ denotes the input to the ReLU layer $l$. In backpropagation, the threshold applied at the ReLU is based on the feature value of the forward pass: the gradient of the output $f^{out}$, written $R_i^{l+1} = \partial f^{out} / \partial f_i^{l+1}$, is passed backward only where the forward input was positive, as shown in Equation (6). DeconvNet, on the other hand, employs the gradient value itself as the threshold, as depicted in Equation (7). G-BP, which combines backpropagation and DeconvNet, effectively visualizes the features learned at higher levels of the neural network and is obtained using Equation (8). The equations follow the standard formulation of guided backpropagation [25], with $\mathbb{1}[\cdot]$ denoting the indicator function.

$$f_i^{l+1} = \mathrm{ReLU}(f_i^l) = \max(f_i^l, 0) \tag{5}$$

$$R_i^l = \mathbb{1}[f_i^l > 0] \cdot R_i^{l+1} \tag{6}$$

$$R_i^l = \mathbb{1}[R_i^{l+1} > 0] \cdot R_i^{l+1} \tag{7}$$

$$R_i^l = \mathbb{1}[f_i^l > 0] \cdot \mathbb{1}[R_i^{l+1} > 0] \cdot R_i^{l+1} \tag{8}$$
Step 5: The CAM attribution map is upsampled to the size of the input sample using bilinear interpolation, that is, linear interpolation in two variables. Subsequently, the element-wise product of G-BP and the upsampled CAM is computed. This calculation is given in Equation (9), where $B$ denotes the bilinear interpolation operation, $L_{G\text{-}BP}$ denotes the G-BP map, and $\odot$ denotes element-wise multiplication. The result is GGCAM.

$$L^c_{GGCAM} = L_{G\text{-}BP} \odot B(L^c_{CAM}) \tag{9}$$
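The five steps above can be sketched in dependency-free Python on small nested lists. This is a minimal illustration rather than the implementation used in the experiments: the function names, the corner-aligned bilinear sampling convention, and the toy activations and gradients are all assumptions. In practice, the activations and gradients would come from a forward and backward pass through the network.

```python
def channel_weights(grads):
    """Step 1 / Eq. (2): global-average-pool the gradients of each feature map."""
    return [sum(sum(row) for row in g) / (len(g) * len(g[0])) for g in grads]

def grad_cam(acts, grads):
    """Steps 2-3 / Eqs. (3)-(4): ReLU of the weighted sum of feature maps."""
    alphas = channel_weights(grads)
    h, w = len(acts[0]), len(acts[0][0])
    return [[max(0.0, sum(a * A[i][j] for a, A in zip(alphas, acts)))
             for j in range(w)] for i in range(h)]

def guided_relu_backward(forward_in, grad_out):
    """Step 4 / Eq. (8): keep a gradient only where the forward input
    and the incoming gradient are both positive."""
    return [g if (f > 0 and g > 0) else 0.0 for f, g in zip(forward_in, grad_out)]

def bilinear_upsample(m, out_h, out_w):
    """Bilinear interpolation B(.), sampling with aligned corners (assumed)."""
    h, w = len(m), len(m[0])
    out = []
    for i in range(out_h):
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            top = m[y0][x0] * (1 - dx) + m[y0][x1] * dx
            bottom = m[y1][x0] * (1 - dx) + m[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out

def guided_grad_cam(gbp, cam):
    """Step 5 / Eq. (9): element-wise product of G-BP with the upsampled CAM."""
    up = bilinear_upsample(cam, len(gbp), len(gbp[0]))
    return [[g * c for g, c in zip(g_row, c_row)] for g_row, c_row in zip(gbp, up)]

# Toy example: two 2x2 feature maps with constant gradients +1 and -1.
acts = [[[1, 0], [0, 1]], [[0, 2], [2, 0]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
cam = grad_cam(acts, grads)                  # negative evidence is clipped by ReLU
gbp = [[2.0] * 3 for _ in range(3)]          # stand-in for a 3x3 G-BP map
ggcam = guided_grad_cam(gbp, cam)
```

In the toy example the second feature map receives a negative weight, so the positions where it dominates are zeroed by the ReLU in Step 3, and the final product keeps G-BP detail only where the upsampled CAM is non-zero.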

Adversarial Detection Classifier
The task of detecting adversarial samples in UAV-assisted operations demands classifiers that are not only precise but also efficient, considering the UAV constraints on energy, computation, communication, and storage. In the context of UAVs, the models used for such tasks must be optimized for performance while operating within these resource limitations. EfficientNet-B0 [53] stands out as a good choice for UAV vision systems due to its automated model optimization and compound scaling, which allow it to excel in environments where computational resources are at a premium. Its architectural design, consisting of a sequence of mobile inverted bottleneck convolutions (MBConv), convolution layers (Conv), a global average pooling layer (Pooling), and a classification layer (FC), has been tailored for efficiency without compromising accuracy (Table 1). This configuration ensures that high-level performance is maintained even when computational resources are constrained. In contrast to EfficientNet-B0, the widely recognized ResNet50 [54] model serves as a general benchmark to evaluate performance in adversarial detection. While it is a more computationally intensive model, known for its deep residual learning framework that eases the training of networks, it offers a comparison point to EfficientNet-B0 in terms of robustness and accuracy. The inclusion of ResNet50 in our comparative analysis provides a broader perspective on how different models perform under the stringent operational conditions of UAVs, allowing us to assess the trade-offs between computational demand and detection capability. We train our model using a dataset comprising the attribution maps of both adversarial and normal samples. Throughout the training process, we follow the training methodologies and parameter selections delineated by Tan et al. [53] for EfficientNet-B0 and He et al. [54] for ResNet50. Our objective is to enable the model to distinguish between these two categories effectively, thereby achieving robust classification performance.

Experiment
In this section, we set up three experiments: (1) EfficientNet-B0 as the detector, which verifies the effectiveness of attribution maps and compares which of the three attribution maps is most suitable for detecting adversarial samples. (2) ResNet50 as the detector, further illustrating that the attribution maps also achieve good accuracy with a different classifier. (3) Comparisons with state-of-the-art methods. The experiments use the ImageNet validation set as the dataset. Adversarial samples are generated by five attacks: C&W, BIM, FGSM, PGD, and AutoAttack (APGD, APGDT, FAB, and Square). The code in this paper is implemented with the PyTorch deep learning framework, and we use the TorchAttacks library to generate the adversarial samples.

Dataset and Models
ImageNet [55] consists of over 1.2 million images across 1000 diverse categories. It is widely used for benchmarking machine learning models in visual object recognition due to its variety of classes and high volume of data. In the context of UAVs, leveraging ImageNet can significantly aid in developing robust object classification algorithms, which are critical for UAV autonomous navigation and operational tasks, such as surveillance or search and rescue. Utilizing ImageNet for adversarial sample detection in UAV data transmission and output processing is valuable because it represents a broad spectrum of real-world scenarios that UAVs may encounter. In this paper, due to the extensive size of the ImageNet dataset, only the validation set, consisting of 50,000 images, was used as the data source.
We selected EfficientNet-B0 and ResNet50 as our classifier models and trained them using the attribution maps of adversarial and normal samples. For EfficientNet-B0, we follow the scaling method proposed by Tan et al. [53]. Additionally, we employ the same RMSprop optimizer with a decay of 0.9 and a momentum of 0.9, alongside a learning rate warm-up and exponential decay, as suggested by Tan et al. [53]. As for ResNet50, following the research of He et al. [54], we utilized a batch normalization momentum of 0.1 and a standard cross-entropy loss function. The model was trained using SGD with momentum, with an initial learning rate set as recommended, which was adjusted following a cosine decay schedule.

Training Sets and Validation Sets
We selected the first 40,000 images in the ImageNet validation set as the normal class. Using the C&W method, these samples were also transformed into adversarial samples, which form the adversarial class (not every normal sample can be converted into an adversarial sample). For training, we selected 20% of the data as the validation set through random sampling, and the remaining 80% of the data was used for training. Table 2 shows the training set and validation set.
Finally, our training set contained 32,000 normal samples and 17,616 adversarial samples, and the validation set contained 8000 normal samples and 4403 adversarial samples.
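The random 80/20 split can be sketched as follows. The helper name `train_val_split`, the fixed seed, and the integer rounding rule are assumptions made for illustration; the adversarial pool size of 22,019 is simply the sum of the reported adversarial counts (17,616 + 4403). With this rounding rule, the sketch reproduces the reported set sizes.

```python
import random

def train_val_split(items, val_percent=20, seed=0):
    """Randomly hold out val_percent of the items for validation."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = len(items) * val_percent // 100   # integer arithmetic, rounds down
    return items[n_val:], items[:n_val]

# Indices stand in for attribution maps; 22,019 = 17,616 + 4403 adversarial samples.
normal_train, normal_val = train_val_split(range(40000))
adv_train, adv_val = train_val_split(range(22019))
print(len(normal_train), len(normal_val), len(adv_train), len(adv_val))
# prints: 32000 8000 17616 4403
```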

Test Sets
We selected the remaining 10,000 images in the ImageNet validation set as the normal class. These 10,000 images were converted into five types of adversarial sample by using the C&W, BIM, FGSM, PGD, and AutoAttack methods, and these were used as the adversarial samples in the test set. AutoAttack integrates the APGD, APGDT, FAB, and Square attack methods. Table 3 shows the test sets.

Performance Metrics
The effectiveness of the method is assessed using four basic counts: true negatives (TN), true positives (TP), false positives (FP), and false negatives (FN). A TP occurs when a positive-class sample in the test dataset is correctly classified as positive, and a TN occurs when a negative-class sample is correctly predicted as negative. An FN occurs when a positive-class sample is mistakenly classified as negative. Conversely, an FP occurs when a negative-class sample is incorrectly predicted as positive. In this paper, the effectiveness of the method is evaluated using a combination of precision (Equation (10)), recall (Equation (11)), and accuracy (Equation (12)).

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{10}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{11}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$
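As a small worked example of precision, recall, and accuracy, the sketch below computes the three metrics from the four counts. The function name `detection_metrics` and the counts are hypothetical and are not results from this paper. Treating the adversarial class as the positive class is also an assumption, since the text does not state which class is positive.

```python
def detection_metrics(tp, tn, fp, fn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, accuracy

# Hypothetical counts for illustration only: 100 adversarial and 100 normal samples.
precision, recall, accuracy = detection_metrics(tp=90, tn=95, fp=5, fn=10)
print(f"precision={precision:.4f} recall={recall:.4f} accuracy={accuracy:.4f}")
```

Here precision answers how many flagged inputs were truly adversarial, while recall answers how many adversarial inputs were caught, so a detector can score highly on one and poorly on the other.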

Experiments Using EfficientNet-B0
As shown in Table 4, all three attribution maps (CAM, GGCAM, and G-BP) can effectively detect adversarial samples. The average accuracy of CAM is 94.06%, and CAM is lower than GGCAM and G-BP in terms of recall, precision, and accuracy. This suggests that CAM separates normal and adversarial samples less well than the other two attribution maps, which contradicts the $L_2$-norm comparison, where CAM showed the largest differences. The $L_2$ norm is therefore not a suitable measure of the difference between attribution maps. The average accuracy of GGCAM is 99.38%, which is 5.32 percentage points higher than that of CAM. G-BP performs best, with an average accuracy of 99.56%, 0.18 percentage points higher than GGCAM. Therefore, the detection of C&W, BIM, FGSM, PGD, and AutoAttack can be realized well via G-BP.

Experiments Using ResNet50
As shown in Table 5, the detection model is replaced by ResNet50. The results of this experiment are similar to those of the experiments using EfficientNet-B0. The average accuracy of CAM is 93.03%, and CAM is lower than GGCAM and G-BP in terms of recall, precision, and accuracy. Compared with the EfficientNet-B0 used in Experiment 1, the accuracy of CAM decreases by 1.03%, whereas that of GGCAM only decreases by 0.014%. Therefore, CAM is less stable than GGCAM for adversarial sample detection. Meanwhile, all the evaluation indexes of G-BP are higher than those of CAM and GGCAM. Its average accuracy is almost the same as in Experiment 1, even increasing by 0.02%. Therefore, G-BP is better suited to the detection of adversarial samples.

Comparisons with State-of-the-Art Methods
Our detection approach was benchmarked against leading state-of-the-art adversarial detection methods, including kernel density (KD) [39], local intrinsic dimensionality (LID) [56], Mahalanobis distance (MD) [57], LiBRe [58], S-N [59], and EPS-N [59]. Following the experimental protocol established by Zhang et al. [59], our implementation uses the ResNet50 architecture as the foundation for our detector. As indicated in Table 6, our proposed methods exhibit robust performance, surpassing the other approaches in average detection success rate across the attack types considered.

Discussion
The proposed method based on attribution maps can effectively detect C&W, BIM, FGSM, PGD, and other attacks. The average detection accuracy of G-BP reaches 99.58%, indicating that attribution maps can distinguish clean samples from adversarial samples. We choose the attribution maps as the distinguishing features. This method is feasible and independent of the choice of classifiers.
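The idea of treating attribution maps as features for an interchangeable classifier can be sketched as follows. This is a toy illustration under stated assumptions, not the detector used in the paper. The nearest-centroid classifier is a deliberately simple stand-in, the class name `NearestCentroidDetector` is hypothetical, and the 2x2 maps contain made-up values chosen only so that the two classes differ.

```python
def flatten(attr_map):
    """Turn a 2-D attribution map (list of rows) into a flat feature vector."""
    return [value for row in attr_map for value in row]


def centroid(vectors):
    """Element-wise mean of equally sized feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


class NearestCentroidDetector:
    """Stand-in classifier: labels a map by its closest class centroid."""

    def fit(self, maps, labels):
        self.centroids = {}
        for label in set(labels):
            members = [flatten(m) for m, y in zip(maps, labels) if y == label]
            self.centroids[label] = centroid(members)
        return self

    def predict(self, attr_map):
        features = flatten(attr_map)
        return min(
            self.centroids,
            key=lambda label: squared_distance(features, self.centroids[label]),
        )


# Toy training maps with made-up values (not real attribution maps).
train_maps = [
    [[0.9, 0.1], [0.1, 0.0]],
    [[0.8, 0.2], [0.0, 0.1]],
    [[0.3, 0.3], [0.2, 0.3]],
    [[0.2, 0.3], [0.3, 0.2]],
]
train_labels = ["clean", "clean", "adversarial", "adversarial"]

detector = NearestCentroidDetector().fit(train_maps, train_labels)
prediction = detector.predict([[0.85, 0.1], [0.05, 0.05]])
print(prediction)
```

Because the classifier only sees the flattened maps, any other binary classifier exposing the same `fit` and `predict` interface could be swapped in without changing the feature extraction step.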
To analyze our approach further, we examine the error cases in detecting C&W attacks under the EfficientNet-B0 model. In the case of misclassified normal samples (Figure 6), we observe two types of errors. Samples (a) and (b) have relatively clear contours, but the model fails to recognize these features (the highlighted part is not on the contour feature), leading to incorrect classification. Samples (c) and (d), which have fuzzy contours, are also misclassified. This suggests that the model does not consistently extract contour features, leading to the misidentification of these samples as adversarial examples. Improving the stable extraction of contour features will be our focus in future work.
For misclassified adversarial examples (Figure 7), we again notice two scenarios. In cases (a) and (c), the object's contour almost disappears due to adversarial perturbations. The model's regions of interest are not focused on areas with significant gradient changes. This type of error may be attributed to an issue within the model itself. In cases (b) and (d), partial contours remain visible despite increased adversarial perturbations. A human observer can distinguish objects like a bench or the alphabet. The model focuses on areas with rich contour features, leading to the misclassification of these adversarial samples as normal. This indicates that the adversarial perturbations added during example generation are not always sufficient.
Our analysis identifies two main issues. The first concerns the stability of contour feature extraction in a small number of samples. As shown in Figure 6c,d, the contour information of these normal samples is not fully captured in the attribution maps. This suggests that our method may struggle to extract contour features consistently, especially from samples with limited instances. The second issue arises in classes with a small number of samples. Despite the adversarial perturbations, the contour features are not completely disrupted. Figure 7b,d show that only parts of the contours are affected by the perturbations. The remaining intact contour information can lead to misclassification. This underscores the challenge of effectively disrupting the contour features in adversarial samples, particularly in classes with fewer instances.

Limitation
While the proposed method, which leverages model visualization technology to generate attribution maps, offers an innovative approach for detecting adversarial examples in UAV vision systems, it also presents certain limitations that warrant further investigation. A primary limitation of our approach lies in the inherent instability of attention-based model visualization methods when extracting features from images. Despite their intuitive appeal and ease of implementation, these methods can sometimes yield inconsistent feature attributions, which may lead to the unreliable detection of adversarial examples. This is particularly problematic in scenarios where decisive image features are subtle or subject to variations in environmental conditions, such as lighting or occlusion.
To address this, future work will need to focus on enhancing the stability and consistency of feature extraction within attention mechanisms. This could involve developing more robust attention models that are less sensitive to input perturbations or integrating supplemental techniques that can corroborate and refine the attention-driven feature mappings. Ensuring stable feature extraction is crucial for the dependable deployment of UAV vision systems, where the accuracy of real-time adversarial example detection is paramount.

Figure 1. The implementation process of adversarial example attacks. During UAV data acquisition and transmission, attackers add subtle perturbations to the data, generating adversarial examples that can easily mislead the UAV vision system and cause misclassifications.

Figure 2. Samples and their corresponding attribution maps (CAMs). (a,b) are the two selected samples. Normal denotes the clean samples and their attribution maps, and TPGD and C&W denote the adversarial samples and their attribution maps generated by the corresponding attack method.

Figure 3. Samples and their corresponding three attribution maps (CAM, G-BP, and GGCAM). (a,b) are the two selected samples. Normal denotes the clean samples and their attribution maps, and TPGD and C&W denote the adversarial samples and their attribution maps generated by the corresponding attack method.

Figure 4.

Figure 5. The proposed adversarial detection framework based on attribution maps.

Figure 6. Four normal samples (a-d) that have been misclassified and their corresponding CAMs.

Figure 7. Four adversarial samples (a-d) that have been misclassified and their corresponding CAMs.

Table 2. Training set and validation set.

Table 4. Experiments using EfficientNet-B0. The highest value for each performance metric is highlighted in bold.

Table 5. Experiments using ResNet50. The highest value for each performance metric is highlighted in bold.

Table 6. Comparisons with state-of-the-art methods. The two sets of data with the highest detection success rates are highlighted in bold.