Review

Defenses Against Adversarial Attacks on Object Detection: Methods and Future Directions

1 School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
2 Intel Corporation, Hillsboro, OR 97124, USA
* Author to whom correspondence should be addressed.
Information 2025, 16(11), 1003; https://doi.org/10.3390/info16111003
Submission received: 28 September 2025 / Revised: 14 November 2025 / Accepted: 14 November 2025 / Published: 18 November 2025

Abstract

Object detection systems aim to classify and localize objects in an image or video. Over the past decade, we have seen how adversarial attacks can impact the performance of object detection systems and make them unusable in many situations. This survey summarizes different types of digital adversarial attacks bounded by L_p norms and the corresponding defense techniques for object detection systems. It categorizes the defenses into six groups, namely, preprocessing, adversarial training, detection of adversarial noise, architectural changes, ensemble defense, and certified defenses, and highlights the effectiveness of each technique. We end the paper with a discussion of the weaknesses of different defenses and possible approaches to make them stronger. Patch- or physical-based attacks are excluded from this survey, as they follow a different threat model.

1. Introduction

Deep learning has been used to solve multiple difficult tasks, such as language translation, building recommendation engines, image classification, object detection, and many others. Object detection is a computer vision task that involves analyzing an input image or video to identify and localize multiple objects within the scene, returning a set of detected instances, each annotated with a class label and an associated bounding box. This is a much harder task compared with classification, as there can be many objects in a given image, and the task is to predict the category of each object as well as the bounding box that covers the object. There are multiple real-world applications that use this task, such as video surveillance, autonomous driving, traffic monitoring, medical image processing, and others. When deploying these solutions in production, they need to be robust enough to deal with out-of-distribution data. One particular type of out-of-distribution data is adversarial examples, which are input data in which small perturbations, usually measured using the L_p norm, are added to the input to make the network output incorrect results or results that the attacker prefers. Object detectors have exhibited these vulnerabilities, where carefully crafted adversarial examples have made the networks output incorrect labels for objects, decreased the accuracy of the model, or prohibitively slowed down the detection through latency attacks. There are also patch attacks where the image is corrupted by adding a patch in the form of an adversarial pattern that compromises the model predictions. Although patch attacks constitute an important class of physical attacks, they differ fundamentally from L_p-bounded digital perturbations in both their threat model and implementation. Patch-based attacks involve localized and physically realizable modifications, whereas L_p attacks alter the values of the pixels within a bounded norm throughout the image. To maintain a coherent digital threat model and ensure comparability between defenses, this survey focuses exclusively on L_p-bounded adversarial attacks and the corresponding defenses.
Here are our main contributions:
  • We give background on L_p adversarial attacks on object detection and characterize different defenses that protect against them.
  • For each defense technique, we provide a detailed list of metrics they test against, target datasets used, types of models that can leverage these defenses, and the kind of attacks against which they are able to defend.
  • We categorize defenses into different groups, highlight the strengths of each defense, and identify possible directions for future research.

2. Background

2.1. Object Detection

Object Detection: Unlike the image classification task, which only predicts a single category for an image, this task involves detecting and identifying a variable number of objects. More specifically, an object detector takes images or videos as input and returns a set of detected objects, and for each object predicts a category label (from a given set of categories) and a bounding box that encompasses the object.
Two-stage object detectors: These types of object detectors typically use a pre-trained Convolutional Neural Network (CNN) on an image classification task to propose category-independent regions of interest. These regions are then warped and propagated through a CNN network, which extracts a feature vector for each proposal and performs classification and localization on each of these vectors. The first technique of this type was the region-based convolutional neural network (R-CNN) [1]. Fast R-CNN [2] used backbone networks like AlexNet [3], VGG [4], and ResNet [5] to extract image features, and object proposals are mapped to these image features. These regions were then fed to CNNs to get the desired output. Faster R-CNN [6], the third model in the R-CNN family, was able to achieve near real-time object detection. The authors were able to achieve the speedup by incorporating the region proposal network (RPN) into the CNN model. The input image would pass through a CNN, and the output is a set of feature maps. These are sent to the RPN, which outputs a set of bounding boxes along with their classifications. These proposals are then mapped to feature maps obtained from the CNN layer in the Region of Interest (ROI) pooling layer, fed to a fully connected layer, and then sent to a classifier and bounding box regressor.
Single-stage object detectors: These types of detectors reframe the task of object detection as a regression problem, eliminate the step of object proposal extraction, and instead focus on simultaneously predicting the class score and the bounding box for objects. You Only Look Once (YOLO) [3] found inspiration in the GoogLeNet [7] model for image classification. YOLO divided the input image into an S × S grid, where S is a hyperparameter (7 × 7 in the original YOLO). Each grid cell predicted multiple bounding boxes and returned the center, height, and width of each bounding box along with a corresponding confidence score. The Single Shot Multibox Detector (SSD) [8] used a VGG-based architecture to predict objects at different scales.
Transformer-based object detector: Nicolas et al. [9] designed an architecture that does object detection using CNN and then used a transformer [10] to output a set of predictions, which were then fed into prediction heads, which consisted of a predefined number of feedforward networks. The architecture used a bipartite matching loss as a training objective to perform object detection. EfficientDet [11] was an extension of EfficientNet [12] for the object detection task. Like EfficientNet, it aimed to achieve higher accuracy with fewer parameters. It used EfficientNet as the backbone, proposed Bidirectional Feature Pyramid Network (BiFPN), which is a weighted bidirectional feature network to ensure fast multiscale feature fusion, and employed compound scaling that scales up network depth, width, and resolution.
Recent open-world object detectors such as YOLO-World [13] and Grounding DINO [14] extend traditional closed-set detection to open-vocabulary and vision-language grounded settings.

2.2. Adversarial Attacks on Image Classification

Adversarial attacks on object detection are often based on techniques for adversarial attacks on image classification, which are reviewed in this section. Depending on the level of access to the model and its weights, attacks can be classified as white-box or black-box. In the case of white-box attacks, the attacker has complete access to the deep learning model and its weights. Black-box attacks usually refer to attacks in which the attacker just has complete or partial access to the output (in the case of image classification, that can refer to the highest class score or scores of each class ranked in descending order). Attacks can also be classified as targeted or untargeted. Targeted attacks imply that the attacker wants to change the output to a fixed target class. In untargeted attacks, the attackers just want the predicted output to be different from the true output.
For adversarial attacks, the notion of similarity between a clean image x and its adversarial version x′ is measured using the L_p norm [15], defined as ‖x − x′‖_p = (Σ_i |x_i − x′_i|^p)^(1/p). Usually, the L_0, L_2, and L_∞ norms are used to calculate this similarity. L_0 mainly captures the number of pixels changed between the clean and adversarial images, L_2 captures the Euclidean distance, and L_∞ captures the maximum change added to or subtracted from any pixel of the clean image.
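As a concrete illustration, the three norms can be computed directly from the pixel difference between a clean and an adversarial image (a minimal NumPy sketch; the random images and the 8/255 budget are only for illustration):

```python
import numpy as np

def lp_distances(x_clean, x_adv):
    """Compute the L0, L2, and L-infinity distances between a clean and an
    adversarial image, given as float arrays of identical shape."""
    diff = (x_adv - x_clean).ravel()
    l0 = np.count_nonzero(diff)        # number of pixels changed
    l2 = np.linalg.norm(diff, ord=2)   # Euclidean distance
    linf = np.max(np.abs(diff))        # largest per-pixel change
    return l0, l2, linf

# Toy example: a random image plus a small bounded perturbation
x = np.random.rand(3, 32, 32).astype(np.float32)
noise = np.random.uniform(-8 / 255, 8 / 255, x.shape).astype(np.float32)
x_adv = np.clip(x + noise, 0.0, 1.0)
print(lp_distances(x, x_adv))
```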
Szegedy et al. [16] showed that neural networks are not resilient to small imperceptible perturbations and have blind spots, demonstrating how adversarial examples generated for a given network can also cause other neural networks trained with different hyperparameters to produce incorrect results.
The previous assumptions for the cause of these adversarial examples were insufficient regularization or the nonlinearity of neural networks. Goodfellow et al. [15], however, showed that adversarial examples exploit the linearity of neural networks. Components such as Rectified Linear Unit (ReLU) activations, Maxout networks, and Long Short-Term Memory (LSTM) cells behave in a largely linear manner for easier optimization. To prove this hypothesis, they introduced a simple non-iterative technique called the Fast Gradient Sign Method (FGSM). It took the sign of the gradient of the classifier's loss with respect to the input, multiplied it by a small constant, subtracted it from the image in the case of a targeted attack, and added it to the image in the case of an untargeted attack. The intent of this attack was to focus on the direction of the perturbation rather than the magnitude of the gradient.
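The FGSM step described above can be sketched in a few lines of PyTorch (an illustrative sketch, not the reference implementation from [15]; `model` is assumed to be any differentiable classifier):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon, targeted=False):
    """One-step FGSM: take the sign of the gradient of the loss with respect
    to the input, scale it by epsilon, and add it (untargeted) or subtract it
    (targeted, with y being the target label)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    step = epsilon * x.grad.sign()
    x_adv = x - step if targeted else x + step
    return torch.clamp(x_adv, 0.0, 1.0).detach()
```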
Madry et al. [17] introduced the Projected Gradient Descent (PGD) attack, which applies FGSM in an iterative manner with the goal of finding an adversarial example that maximizes the classification loss. The main difference between the PGD and Basic Iterative Method (BIM) [18] attacks is that the PGD attack is initialized to a random point within the ℓ_∞ ball.
Carlini and Wagner (C&W) [19] designed a set of attacks to generate adversarial examples by formulating the search for these examples as an optimization problem. The goal is to find a perturbation δ that, when added to the image, still results in a valid image:
  • minimize D(x, x + δ) + c · f(x + δ)
  • subject to x + δ ∈ [0, 1]^n.
Here, c is a constant greater than zero, D is the distance metric, and f is an objective function. The paper offered multiple objective functions to encode the constraint C(x + δ) = t, where t is not the true label. These attacks achieved 100% misclassification against defensively distilled networks (elaborated in the next section), and the authors demonstrated attacks under three distance metrics: the ℓ_0, ℓ_2, and ℓ_∞ norms.
Universal adversarial perturbations: Moosavi-Dezfooli et al. [20] were the first to report the existence of universal adversarial perturbations (UAPs) that could fool state-of-the-art image classifiers on most natural images with high probability. A subset of the training dataset is used to calculate the UAP. The UAP is initialized to 0, and at each step, if the UAP fools the image classifier on the current image, it is skipped; otherwise, the DeepFool technique [21] is used to compute the minimal perturbation that fools the classifier on the current image. This perturbation is added to the UAP, which is then projected onto the L_p ball of radius ξ centered at 0. This procedure is repeated several times over the selected subset of training data until a predetermined percentage of images in the dataset are misclassified.
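The UAP accumulation loop can be sketched as follows (a simplified sketch: `is_fooled` and `minimal_perturbation` are hypothetical callbacks standing in for the classifier check and the DeepFool step from [21]):

```python
import numpy as np

def project_lp_ball(v, xi, p=np.inf):
    """Project a perturbation v onto the L_p ball of radius xi centered at 0
    (shown for the L-infinity and L2 cases)."""
    if p == np.inf:
        return np.clip(v, -xi, xi)
    norm = np.linalg.norm(v.ravel())
    return v if norm <= xi else v * (xi / norm)

def universal_perturbation(images, is_fooled, minimal_perturbation,
                           xi, target_rate=0.8, max_passes=10, p=np.inf):
    """Accumulate per-image minimal perturbations into a single universal
    perturbation v, projecting back onto the L_p ball after each update."""
    v = np.zeros_like(images[0])
    for _ in range(max_passes):
        for x in images:
            if is_fooled(x + v):          # already fooled: skip this image
                continue
            v = project_lp_ball(v + minimal_perturbation(x + v), xi, p)
        fooling_rate = np.mean([is_fooled(x + v) for x in images])
        if fooling_rate >= target_rate:   # stop once enough images are fooled
            break
    return v
```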

3. Adversarial Attacks on Object Detection

The object detection task usually involves minimizing a mixture of cross-entropy loss to figure out whether a bounding box contains a foreground object or the background, a regression or localization loss to predict the bounding box coordinates, Intersection over Union (IoU) loss to measure the overlap between predicted and true bounding boxes, and optionally a focal loss [22], which tries to address the class imbalance in the dataset by modifying the cross-entropy loss to give more weight to hard examples and less weight to easier background examples.
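For concreteness, the focal loss term mentioned above can be written as a small modification of binary cross-entropy (a minimal PyTorch sketch of the standard formulation from [22], with the usual α and γ hyperparameters):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weight easy examples by the factor (1 - p_t)^gamma
    so that hard, misclassified examples dominate the gradient.
    `targets` is a float tensor of 0/1 labels with the same shape as `logits`."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)            # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```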
L_p-bounded adversarial attacks on object detection models can be broadly classified into three categories of attacks: mislabeling, bounding-box, and fabrication. They can be understood within a single data–model–inference framework. Mislabeling attacks mainly target the classification head by corrupting class-label predictions. Bounding-box attacks disrupt the regression or localization branch, causing inaccurate or missing detections. Meanwhile, fabrication attacks manipulate the inference stage, such as Non-Maximum Suppression (NMS), to insert or suppress detections.

3.1. Mislabeling Attack

This type of attack results in objects being mislabeled as either an incorrect class or the background. PGD and FGSM attacks can cause mislabeling as shown in Figure 1.
An extreme version of the mislabeling attack is a vanishing attack in which every object is classified as background. One such attack is Dense Adversary Generation (DAG) [23]. DAG is a white-box method that employs a gradient-based algorithm to craft adversarial perturbations against object detectors. It targeted all densely distributed bounding boxes simultaneously and optimized the loss function accordingly. The algorithm took as input the classifier f, the input image, the original set of labels L = (l_1, l_2, …, l_n), and the adversarial set of labels L′ = (l′_1, l′_2, …, l′_n), in which each entry is different from the original set of labels. At each iteration, it found the correctly predicted targets, computed the gradient with respect to the input data for the incorrect target label and the correct target label for each proposal, and accumulated it. The goal was to increase the confidence in the incorrect label and decrease the confidence in the correct label. The algorithm iterated until the target proposal set of bounding boxes was empty or the maximum iterations were reached. The authors set the maximum number of iterations at 150 and found that in less than 1% of the images, DAG did not converge but still generated perturbations that worked well. When this attack was evaluated for black-box attacks, the authors found that it is harder to transfer across different architecture types, and creating an ensemble attack was the best way to enhance the effectiveness.
Most adversarial examples tend to do well when used on a source model that was used for generating the attack. They exhibit a low success rate when trying to attack models that have a different architecture than the source model; that is, they exhibit weak transferability [24]. Wei et al. tried to enhance this transferability by proposing a framework called Unified and Efficient Adversary (UEA). They leveraged Generative Adversarial Networks (GANs) [25] in this framework, where the Generator plays the role of generating adversarial examples from a given input image and the Discriminator tries to distinguish clean and adversarial examples. In addition to GAN loss and similarity loss, the authors also used DAG’s loss function, which tried to misclassify all predictions of proposal regions. To address transferability, the authors used a multiscale attention feature loss.
Data sets like MS-COCO [26] (description in Table 1) that are used to train object detectors tend to have a class imbalance [27], i.e., some classes, such as person, exist in a higher number than other classes. When an adversarial perturbation tries to attack all objects in an image jointly by maximizing the loss for each object, it does not take into consideration the dominance of some classes in the loss function. The class-wise attack (CWA) was introduced by Chen et al. [28] to balance the influence of each class and create a stronger attack compared with PGD that can attack all instances of objects in a uniform way.

3.2. Bounding Box Attack

PGD and FGSM attacks can also use the localization loss, taking the gradient ∇_x L_loc(x_adv, b_k), where b_k represents the true bounding boxes. The attacker takes the sign of this gradient and multiplies it by a small constant, applied in a single step for FGSM or over multiple steps for PGD.
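A minimal sketch of this single-step variant on the localization branch (the `loc_loss` callable is a hypothetical stand-in for the detector's regression loss on the ground-truth boxes):

```python
import torch

def fgsm_localization(loc_loss, x, boxes, epsilon):
    """One-step attack on the localization branch: move the input in the
    direction of the sign of the gradient of the localization loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = loc_loss(x, boxes)   # hypothetical: L_loc(x, b_k) for the target detector
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return torch.clamp(x_adv, 0.0, 1.0).detach()
```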
Two-stage detectors such as Faster R-CNN [6] or R-FCN [32] used RPN to extract these proposals. Li et al. [33] proposed an attack called Robust Adversarial Perturbation (RAP), which could be universal for all region-based object detectors, i.e., it could degrade performance without knowledge of the underlying object detector. RAP attacked RPN by trying to find the minimum perturbation that could be added to an image to ensure that no correct proposals are generated. They used a combination of label loss that seeks to disrupt the label predictions of the proposals and shape loss that aimed to disrupt bounding box shape regression.
DAG, PGD, and FGSM are white-box attacks, and RAP is a black-box attack focusing mainly on region proposal networks. The Evaporate attack [34] is a novel black-box L_2 attack that could work on both regression-based models (such as YOLO) and region-based models (such as Faster R-CNN). The attack assumed query access to the target model. The attack worked by first creating adversarial candidates with random noise and then feeding them, alongside the clean image, into the target detection model. The algorithm subsequently calculated the distance loss between the clean and adversarial examples and used the particle swarm optimization algorithm [35] to continuously optimize and find the best adversarial example.

3.3. Fabrication Attack

Fabrication attacks generate spurious detections that increase false positive rates. Object detectors can output multiple overlapping boxes that try to explain or annotate a single object. NMS is an algorithm that removes redundant predictions by keeping the prediction with the maximum score and suppressing the rest. It works by selecting an anchor bounding box that has the maximum score, finding all other boxes that overlap this anchor box with an IoU value beyond the selected threshold, and discarding those boxes, whose confidence scores are lower than the anchor box's. NMS is used at test time. We cover three different attacks that exploit NMS to make the object detector unusable. The Daedalus [36] attack caused the object detector to produce extremely dense false positives, resulting in zero Mean Average Precision (mAP) scores and increasing false positive rates to 99%. Daedalus generated bounding boxes with high confidence using a loss function and then compressed the IoUs for each pair of boxes by reducing their sizes using a second loss function. Daedalus is an image-specific attack that could take at least a minute to generate the attack per image. Phantom Sponges [37] formulated a UAP attack [20] that targeted the NMS algorithm and could be applied to images in real time. It used the PGD algorithm with the ℓ_2 norm and combined three loss functions: a max-object loss, which increases the number of candidates passed to NMS; a bounding box area loss; and an IoU loss to create this UAP. Overload [38] is a latency attack similar to Daedalus and Phantom Sponges, but it adopts a simpler strategy based on spatial attention, directing the perturbation towards image regions with low density, as observed in Figure 2. This attack used fewer computing resources than Daedalus and Phantom Sponges.
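The greedy NMS procedure that these latency attacks overload can be summarized in a short sketch (standard NMS in NumPy, not the exact implementation of any particular detector); its cost grows with the number of candidate boxes, which is what the latency attacks above exploit by flooding NMS with candidates:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and discard the
    remaining boxes whose IoU with it exceeds the threshold.
    `boxes` is an (N, 4) array of [x1, y1, x2, y2] corners."""
    order = np.argsort(scores)[::-1]   # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # IoU between the anchor box i and every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = rest[iou <= iou_threshold]   # drop heavily overlapping boxes
    return keep
```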
Chow et al. [40] designed a framework called Targeted Adversarial Objectness Gradient Attacks (TOG) that could induce object vanishing, object fabrication, and object mislabeling, and generated universal perturbation attacks on object detection models. The framework can attack single- and two-stage object detectors; it takes an input image, and based on the attack type, it customizes the loss function and uses an iterative approach to create an attack. The iteration continued until either the desired attack behavior was achieved or a maximum number of iterations was reached.
We refer the reader to [41] for a more complete list of state-of-the-art attacks against object detection.

4. Defenses Against Object Detection Attacks

We classify defenses against object detection attacks into six groups, as shown in Table 2, based on their functional role in the object detection pipeline and the type of robustness they provide: Preprocessing defenses work on the input stage by filtering images to suppress perturbations. Adversarial training defenses improve robustness by incorporating adversarial examples as part of training. Detection-of-adversarial-noise defenses identify and filter out adversarial inputs before inference. Architectural-change defenses modify the detector structure or feature representation to strengthen inherent robustness. Ensemble defenses combine multiple detectors trained under diverse conditions to exploit model diversity and reduce transferability. Finally, certified defenses offer provable guarantees of robustness within bounded perturbations. These are outlined in the following sections.

4.1. Preprocessing

Input data can consist of robust, useful features that a model is supposed to learn and non-robust, useless features that the model is inappropriately sensitive to. Ilyas et al. [64] predicted the existence of these robust and non-robust features, where non-robust features are features that are indicative of a true label but could be easily manipulated by an adversary. Zhou et al. [42] devised an input transformation defense that could clean the input to make it benign before feeding it into an object detector. The defense relied on removing non-robust features from an image and reconstructing the image with high quality. This allowed the model to be further enhanced by adversarial training (described in the next section) to combat specific attacks. A feature filtration model captures more robust features from the input and reconstructs high-quality examples to defend against adversarial examples. The model contained a filter network (a combination of encoder and decoder) and a critic network. The critic network's job was to constrain the mapping from input to latent space so that differences in high-level representations between clean and adversarial examples are reduced. The design of the critic network was based on Wasserstein GAN [65], and rather than using weight clipping, which makes it difficult to enforce Lipschitz constraints on the model, the authors used the gradient penalty technique suggested by [66]. This plug-in defense model could be added to single- or two-stage detectors. An advantage of this filtration technique is that it could also defend against adversarial patch attacks. The mAP on clean examples was still affected by this technique. Using spectral normalization [67] to constrain the critic network to be Lipschitz continuous, instead of the gradient penalty, could possibly help ensure that the mAP on clean examples remains close to that of networks trained only on clean examples.

4.2. Adversarial Training

Adversarial training, which incorporates adversarial examples in addition to original (clean) examples during training, has been shown to achieve reasonable accuracy against adversarial attacks. State-of-the-art neural network models tend to employ a variety of input transformations during training to increase robustness against unseen examples. However, as per Szegedy et al. [16], these transformations tend to be drawn from the same distribution throughout the model training process and are unable to model the local space around each training point. Using adversarial examples in training along with clean examples, neural networks can be somewhat regularized. Goodfellow et al. [15] explained in their work that by using adversarial examples, neural networks can generally resist the same kind of attack that was used to generate these examples. They also clearly state that adversarial training is only useful when the model has sufficient capacity.
Madry et al. [17] presented an optimization view of adversarial robustness as shown in Algorithm 1. The algorithm trained a model f, parameterized by θ , over multiple epochs using clean and adversarially perturbed examples. For each minibatch of training data, adversarial examples were generated by first adding small random noise to the clean inputs, followed by iterative refinement through k-step PGD. In each PGD step, the input was updated in the direction of the gradient of the loss with respect to the input and then projected (via clipping) back into the valid ϵ -bounded region around the original input. After generating adversarial examples, the model parameters were updated using gradients computed on these perturbed inputs. This adversarial training process can be interpreted as a saddle point optimization problem, consisting of a nonconcave inner maximization (adversarial example generation) and a nonconvex outer minimization (model parameter update), thereby framing robustness as a min–max optimization task. The training procedure achieved 64% accuracy on CIFAR data against black-box/transfer attacks. The paper was able to show, with experiments, how networks with larger capacity were less vulnerable to transfer attacks and how adversarial training using the k-step PGD was superior to FGSM. Adversarial training decreased the accuracy on clean examples, as pointed out by Tsipras et al. [68].
Algorithm 1 Adversarial Training using k-step PGD [17]
Input: model f parameterized by θ, training dataset D, steps k, number of epochs n, perturbation bound ϵ, step size ϵ_step, learning rate lr
Output: adversarially trained model f
Initialize θ
 1: for epoch = 1 to n do
 2:     for minibatch B ⊂ D do
 3:         get (x, y) ∈ B
 4:         x_adv ← x + uniform(−ϵ, ϵ)
 5:         for i = 1 to k do
 6:             x_adv ← x_adv + ϵ_step · sign(∇_x L(x_adv, y; θ))
 7:             x_adv ← clip(x_adv, x − ϵ, x + ϵ)
 8:         end for
 9:         update θ:
10:         θ ← θ − lr · E_{(x,y)∈B}[∇_θ L(x_adv, y; θ)]
11:     end for
12: end for
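A compact PyTorch rendering of Algorithm 1 (a sketch under the assumption of a standard classifier with cross-entropy loss; hyperparameter names mirror the algorithm, but the values are up to the user):

```python
import torch
import torch.nn.functional as F

def pgd_adversarial_training(model, loader, optimizer, epochs, k, eps, eps_step, device="cpu"):
    """k-step PGD adversarial training: craft a perturbed batch inside the
    epsilon ball around the clean batch, then update the model on it."""
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            # random start inside the epsilon ball
            x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
            for _ in range(k):
                x_adv = x_adv.detach().requires_grad_(True)
                loss = F.cross_entropy(model(x_adv), y)
                grad = torch.autograd.grad(loss, x_adv)[0]
                x_adv = x_adv + eps_step * grad.sign()
                x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project back into the ball
                x_adv = torch.clamp(x_adv, 0.0, 1.0)
            optimizer.zero_grad()
            F.cross_entropy(model(x_adv.detach()), y).backward()
            optimizer.step()
```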
The k-step PGD attack can be computationally expensive, making it impractical for large datasets like ImageNet [69], as the training algorithm performs k forward passes and k backward passes without updating the classifier's parameters. Instead, Shafahi et al. [70] updated the classifier's parameters as well as the perturbation with every backward pass. This work presented the first approach to efficiently train a robust model on ImageNet (non-targeted), achieving performance comparable to existing, substantially slower methods. The authors successfully trained a robust ImageNet classifier using only 4 P100 GPUs over a span of 2 days, a runtime closely matching that of standard (natural) training.
Among defenses against adversarial attacks for object detection tasks, most of the works are centered on adversarial training. We divide it into two subsections. The first subsection covers papers where the model’s loss function is modified, and the second subsection covers papers that make changes to the model’s architecture and its loss function.

4.2.1. Changes to Loss Function

Most adversarial attacks for object detection tasks tend to utilize variants of individual task losses (classification or localization) or a combination of both. The authors of the MTD [43] defense observed in a t-SNE plot that the two task losses interact with each other and that their gradients share a certain level of common directions without being fully aligned, which could lead to misaligned task gradients that obfuscate adversarial training. Madry et al. [17] aimed to find an adversarial example that solves the inner maximization and then to find a classifier that minimizes this loss. Similarly, MTD expanded this adversarial training algorithm (Algorithm 2) by incorporating both classification and localization losses. For each sample x^i, an initial perturbed version x̃^i was generated within an ϵ-bounded ball. Adversarial examples were then computed separately in the classification and localization task domains. These perturbations were then projected back into the valid input space using a projection operator. MTD then selected between the two adversarial examples (x̃^i_cls and x̃^i_loc) based on which one yielded a higher combined loss across both tasks. The selected example was used in the adversarial training step to update the model parameters via gradient descent. MTD is able to defend effectively against attacks with different budgets and is quite successful against DAG and RAP attacks.
Algorithm 2 Adversarial Training using MTD [43]
Input: dataset D, number of training epochs T, batch size S, attack budget ϵ, learning rate γ, model parameters θ, projection operator P_{S_x}(·) that projects the input into the feasible region S_x
Output: adversarially trained model f
 1: for t = 1 to T do
 2:     for minibatch {x^i, {y^i_k, b^i_k}}_{i=1}^S ⊂ D do
 3:         x̃^i ← B(x^i, ϵ)
 4:         compute attacks in the classification task domain:
 5:         x̃^i_cls = P_{S_x}(x̃^i + ϵ · sign(∇_x loss_cls(x̃^i, {y^i_k})))
 6:         compute attacks in the localization task domain:
 7:         x̃^i_loc = P_{S_x}(x̃^i + ϵ · sign(∇_x loss_loc(x̃^i, {b^i_k})))
 8:         compute the final attack example:
 9:         m = [L(x̃^i_cls, {y^i_k, b^i_k}) > L(x̃^i_loc, {y^i_k, b^i_k})]
10:         x̃^i = m · x̃^i_cls + (1 − m) · x̃^i_loc
11:         perform the adversarial training step:
12:         θ = θ − γ · ∇_θ (1/S) Σ_{i=1}^S L(x̃^i, {y^i_k, b^i_k}; θ)
13:     end for
14: end for
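The task-domain selection at the heart of Algorithm 2 can be sketched as follows (the `loss_cls`, `loss_loc`, and `total_loss` callables are hypothetical stand-ins for the detector's task losses, each returning a scalar tensor):

```python
import torch

def mtd_adversarial_example(x, eps, loss_cls, loss_loc, total_loss):
    """Generate one adversarial example per task domain (classification and
    localization) and keep whichever maximizes the combined detection loss."""
    x_tilde = torch.clamp(x + torch.empty_like(x).uniform_(-eps, eps), 0.0, 1.0)

    def single_step(loss_fn):
        x_in = x_tilde.clone().detach().requires_grad_(True)
        loss_fn(x_in).backward()
        x_out = x_tilde + eps * x_in.grad.sign()
        x_out = torch.min(torch.max(x_out, x - eps), x + eps)  # project into the feasible region
        return torch.clamp(x_out, 0.0, 1.0).detach()

    x_cls, x_loc = single_step(loss_cls), single_step(loss_loc)
    # keep the example with the higher combined classification + localization loss
    return x_cls if total_loss(x_cls) > total_loss(x_loc) else x_loc
```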
Object detection datasets tend to have class imbalance, i.e., in given images, there are more objects of a specific class than other classes. Chen et al. [28] took this imbalance into account and proposed a Class-wise Adversarial Training (CWAT) technique for object detection tasks. CWAT generated adversarial examples using the Class-wise Adversarial (CWA) attack. CWAT used three different adversarial training techniques (multitask, object-wise, and class-wise) based on class-weighted loss, such that they not only balanced the influence of each class but also evenly improved adversarial robustness for all object classes. This helped reduce the dataset imbalance issue and, in turn, helped the object detection model to become robust against PGD, FGSM, and CWA attacks, as shown in Figure 3. CWAT used the fast adversarial training technique [70], and this resulted in the technique being 3.19× faster than MTD with 4 2080Ti GPUs and a batch size of 14 for each GPU.
Starting with YOLOv2 [71], instead of directly predicting class in a given anchor box, the authors instead predicted class and objectness for each anchor box. The objectness score predicted whether the anchor box contained any object. Most adversarial attacks covered so far only targeted either classification or localization, or possibly both, while ignoring objectness loss. Choi et al. [44] found that objectness-based gradient domain overlapped with both the classification and localization gradient domains. They modified the MTD algorithm to include objectness loss and generated three adversarial examples using classification, localization, and objectness loss. The one that gave the maximum loss among the three adversarial examples was used. To speed up training, adversarial examples were generated using FGSM with random initialization. Using this objectness loss in adversarial training, the authors were able to improve robustness on the KITTI data set and MS-COCO traffic (a subset of MS-COCO to include just eight categories: person, bicycle, car, motorcycle, bus, truck, traffic light, and stop sign).
Besides a decrease in accuracy on clean examples, adversarial training can lead to overfitting to specific attack types, which is why it exhibits weak generalization to different kinds of attacks on which the object detector was not trained. Jung et al. [45] took inspiration from domain adaptation and combined it with adversarial training to separate the characteristics of clean and adversarial samples into private and shared features to mitigate conflicts. They used an additional feature calibrator that prevented conflicts between clean and adversarial domains by recalibrating shared features of the adversarial domain. This resulted in good performance against unseen attacks (attacks that were not used to generate adversarial examples during adversarial training), such as DAG, CWA, C&W, as well as good performance against clean samples.
So far, the defenses covered have mainly focused on defending against classification, localization, or objectness attacks. Attacks such as Daedalus, Overload, and Phantom Sponges target the computational bottleneck inside NMS to congest the NMS processing pipeline, thus making the object detector unusable. Wang et al. [46] developed a background-attentive adversarial training technique that could defend against latency attacks. They were able to obtain a reasonable balance between clean and robust accuracy by exploiting the distinction between robust and non-robust features. The adversarial technique only used objectness loss instead of classification and bounding-box losses, thus reducing some redundancy, and built background attention into the adversarial training pipeline. The primary limitation of the background-attentive adversarial technique was its reliance on hardware capacity to set the maximum number of candidate bounding boxes, thereby constraining robustness based on the underlying hardware. To potentially improve this defense, Xiao et al. [72] proposed leveraging spatial consistency to differentiate between clean and adversarial examples, a strategy that could be integrated into background-attentive adversarial training.

4.2.2. Changes to Model

AdvProp [73] showed that using adversarial examples as additional training examples can prevent overfitting. It employed separate batchnorm [74] layers for clean training images and adversarial examples to accommodate their distinct statistics. Det-AdvProp [47] adapted this for the object detection task. It generates adversarial examples using FGSM such that one maximizes the classification loss and the other maximizes the localization loss. As the MTD paper showed that the gradients of these two losses can be entangled, the authors selected the example of the two that maximized the total loss of the detection task. They used the main batchnorm for clean examples and the auxiliary batchnorm for adversarial examples. The auxiliary batchnorm was discarded during inference. The authors showed that the resulting network trained using this technique is capable of improving mAP on the MS-COCO and COCO-C datasets and can resist targeted and non-targeted adversarial attacks, although the evaluated attack strength is weak (ϵ ≤ 3).
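The dual-batchnorm idea behind AdvProp and Det-AdvProp can be illustrated with a small PyTorch module that routes clean and adversarial batches through separate normalization statistics (an illustrative sketch, not the authors' implementation):

```python
import torch.nn as nn

class DualBNConv(nn.Module):
    """Convolution followed by two BatchNorm layers: the main BN sees clean
    batches and the auxiliary BN sees adversarial batches; at inference time
    only the main BN is used and the auxiliary one is discarded."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.bn_main = nn.BatchNorm2d(out_channels)
        self.bn_aux = nn.BatchNorm2d(out_channels)

    def forward(self, x, adversarial=False):
        x = self.conv(x)
        return self.bn_aux(x) if adversarial else self.bn_main(x)
```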
MTD resulted in a robust detector at the expense of a drop in mAP values on clean images. Dong et al. [48] designed the RobustDet architecture, which enhances SSD by inserting an Adversarial Image Discriminator (AID) and Consistent Features with Reconstruction (CFR) into its backbone. The intent of these layers was to ensure adversarial robustness while staying close to the non-robust SSD object detector's mAP on clean images. The output of AID is fed into an adversarially aware convolution that uses different kernels for clean and adversarial images. CFR draws inspiration from VAEs [75] to extract robust features so that it reconstructs consistent features of clean and adversarial images using clean images via adversarially aware convolutions. RobustDet's mAP on clean images for the PASCAL VOC and MS-COCO test datasets shows a 2% and 5–6% decrease, respectively. It is capable of resisting larger perturbations better than MTD and CWAT, but Cheng et al. [49] found that RobustDet performs poorly when the number of steps in PGD classification and localization attacks is low (e.g., 2 or 3), that is, when the attack intensity is low. Adversarial training generally ignores attack intensity. Cheng et al. [49] proposed a robust object detection method called Adversarial Intensity Aware Robust Detector (AIAD), which detects this attack intensity information in an image and categorizes it as clean, weak, or strong. The intensity-aware discriminator is trained to distinguish the adversarial intensity using k-step PGD classification and localization attacks, taking the adversarial example that causes the maximum loss. They then replace SSD's convolution with intensity-guided convolution, which is based on dynamic convolutions [76], and remove the batchnorm layers in SSD. Dynamic convolutions increase the representation capabilities of a convolutional neural network by adapting convolutional kernel weights to the input, i.e., the convolution is a function of the input. This ensures that different images are processed by their own convolution kernels. Hence, AIAD was able to learn robust features in clean as well as adversarial images. Their experiments on the MS-COCO test dataset showed higher mAP values on clean images and under A_loc (FGSM and PGD attacks in the localization domain), A_cls (FGSM and PGD attacks in the classification domain), and CWA attacks compared with the MTD, CWAT, and RobustDet defenses.
Pérez et al. [77] replaced standard convolution layers in networks such as AlexNet, VGG16, and Wide-ResNet with parameterized Gabor-structured convolution layers and observed that they could increase robustness against FGSM and PGD attacks. Gabor filters come from classic computer vision, where researchers use them to create a set of these filters with multiple frequencies and orientations to extract features from the input image. This paper uses these Gabor layers to create a regularizer based on the Lipschitz constant, which enhances the robustness of the network. Along these lines, Amirkhani et al. [50] replaced the convolutional layers in the single- and two-stage object detector backbones with convolutional Gabor layers. They enhance adversarial training by having a bank of Gabor filters pick up the robust features of the perturbed images to be considered for the object detection task. This technique achieved reasonable results against targeted attacks (TOG-mislabeling, TOG-vanishing, and TOG-fabrication) and untargeted random attacks (DAG, RAP, and UEA).
Zeng et al. [51] used contrastive learning to ensure that adversarial training does not affect the mAP of the object detector in clean images. The goal of this contrastive learning module is to reduce the high-dimensional features of the backbone and obtain low-dimensional contrastive features for the comparison similarity function. They used the output of this compact module to reduce the distance between clean and corresponding adversarial examples of the same class and kept clustering centers of different categories away from each other. Yang et al. [78] also proposed a robust object method based on contrastive learning. Rather than just comparing the distance between clean and corresponding adversarial examples of the same class, they used a circle loss function to decrease the distances between samples of the same class (clean vs. clean and adversarial vs. adversarial) and increase the distances between samples of different classes (clean vs. adversarial). Using this contrastive learning method allows the object detector to supervise dynamic convolutions [76] that extract robust features from clean and adversarial samples.
Object detection models usually have a pre-trained classification-based backbone. Using merely an adversarially trained backbone for the object detection task does not result in a significant improvement in the robustness of these object detection models, as standard training leads to the backbone losing its robustness. Awais et al. [52] proposed a computationally efficient method, Free Robust Object Detection (FROD), to preserve robustness through batch normalization updates. FROD-DAT is an enhanced version of FROD that further improves robustness by using two lightweight components called imitation loss and delayed adversarial training. Imitation loss allows the object detector to leverage robust features of a pre-trained robust backbone to ensure that adversarial training does not impact the backbone’s robustness.
One way to speed up adversarial training is to use FGSM with random initialization. Wong et al. [79] showed that adversarial training using adversarial examples generated by FGSM with random initialization was comparable to using adversarial examples generated with k-step PGD. Delayed adversarial training takes inspiration from [79], where the object detector is trained with clean examples for t_1 epochs and with single-step adversarial examples with random initialization for t_2 epochs. Li et al. [53] enhance the robustness of the object detector by modifying the backbone to do more computation and reducing the computation of the downstream classification and detection modules. Standard training for an object detection task involves training an upstream classification model, using it as a backbone for the downstream object detection task, and then training this component. The adversarial training approaches covered so far focus only on the downstream object detection task. Li et al. [53] showed that adversarial examples transfer more easily across different detection-specific modules than across different backbone networks. They reuse gradients, similar to [70]. They were able to use ConvNeXt-T as a backbone, allocate more computation toward the backbone, and reduce the computation of detection-specific modules.

4.3. Detection of Adversarial Noise

Adversarial training, while increasing the robustness of object detection networks against adversarial attacks, tends to cause performance degradation on clean samples. Given this disadvantage, a natural question is whether adversarial perturbations can be detected and filtered. Yang et al. [54] aimed to address this question by proposing a robust object detection method based on a contrastive learning perspective (RCP) that can learn features from both clean and adversarial samples. The technique also used a circle loss function similar to [78]. It is made up of two modules: (1) the RPU (robust optimization based on the perspective of uniform metrics) module, which allows the backbone to assign different feature weights to dynamic convolution so that relevant features from clean and adversarial images are extracted, and (2) the PVB (perturbation filtering verification based on bilinear interpolation) module, which filters out the adversarial noise. On the PASCAL VOC dataset, the paper showed that this defense of the SSD model against attacks such as DAG, CWA, and PGD-based classification and localization attacks outperformed MTD, CWAT, and RobustDet, while incurring a marginal drop in mAP on clean samples. For each attack, separate RPU-PVB modules are needed. Instead, following the approach proposed by Choi et al. [44], RPU-PVB could leverage objectness-based gradient-domain PGD attacks, thus reducing the need to deploy distinct RPU-PVB modules tailored to individual attack types. It is important to note that the objectness-based gradient domain inherently overlaps with both the classification and localization gradient domains.

4.4. Architecture Changes

Contextual information refers to data obtained from an object's own statistical properties and its surrounding region, encompassing both intra-class and inter-class cues. This information can improve object detection performance by providing additional scene-level understanding. Transformers [10] consist of an encoder followed by a decoder. They achieved strong performance on Natural Language Processing (NLP) tasks and have also been used for computer vision tasks. One reason they work well is their ability to use contextual information. Alamri et al. [55] showed that by using just a transformer encoder module (TEDM) pre-trained on MS-COCO, one could improve the labeling of object instances and improve the object detector's performance on natural images and its robustness to adversarial attacks. By simply adding this module to the object detector's extracted features and running a classifier on top of it, the technique achieved higher mAP, F1 scores, and average AUC scores compared with the Faster R-CNN detector. However, evaluating this module against stronger attacks such as TOG, PGD-classification, or PGD-bounding box could have further reinforced the effectiveness of the defense.
A function f is said to be ℓ-locally Lipschitz in a ball of radius r around an input x if, for all x̃ with ‖x − x̃‖ ≤ r, ‖f(x) − f(x̃)‖ ≤ ℓ‖x − x̃‖. Similarly, Szegedy et al. used the Lipschitz bounds of each layer to express the instability of the network. Xu et al. [56] devised the Knowledge-Distilled Feature Alignment (KDFA) module and the Self-Supervised Feature Alignment (SSFA) module, which could guide the network to generate more effective features and reduce the instability of the network. They took advantage of concepts from [80] and used an object detector trained using standard training as a teacher and distilled the feature knowledge of the middle layer to the student network. KDFA tried to make sure that the output of the middle layer of the student model with perturbed input is close enough to the output of the middle layer of the teacher model with the clean example. SSFA guided the output of the middle layer of the student model with perturbed input to be equal to the output of the middle layer of the student model with clean input. Using this feature alignment, the object detector could generate more effective features that strengthen its robustness against adversarial attacks. Xu et al. had a follow-up work [57] to address the drop in precision on clean inputs. They updated the framework by extending KDFA to perform contrastive loss between the student and the teacher for clean images. The authors also adopted decoupled foreground-background features instead of global-pooling features to improve distillation effectiveness. The proposed technique improves robustness compared with FA [56] and Det-AdvProp [47].
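The two alignment objectives can be expressed as simple distances between intermediate feature maps (a minimal sketch; the feature tensors are assumed to come from the same middle layer of the student and a frozen teacher):

```python
import torch.nn.functional as F

def feature_alignment_losses(student_feat_adv, student_feat_clean, teacher_feat_clean):
    """KDFA pulls the student's features on a perturbed input towards the frozen
    teacher's features on the clean input; SSFA pulls them towards the student's
    own features on the clean input."""
    kdfa = F.mse_loss(student_feat_adv, teacher_feat_clean.detach())
    ssfa = F.mse_loss(student_feat_adv, student_feat_clean.detach())
    return kdfa, ssfa
```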

4.5. Ensemble Defense

When authors of the DAG attack experimented with their attack in black-box scenarios, they found that it is difficult to transfer the perturbation generated for the source detector to the target detector if their architecture is different. UEA tries to address this by using GANs to construct these perturbations. FUSE [58] utilized an ensemble of object detection models that relied on heterogeneous backbone networks. The framework trained multiple object detection models at the same time and used a diversity regularizer that encouraged each model to learn differently using a diversified detection loss and/or kernel filter regularizer. Adversarial input could fool an ensemble of homogeneous detectors if they were trained using standard training, but if they were trained in the FUSE network, which encouraged diversity, they found that the attack had weak transferability. This group of detectors trained in this manner is capable of defending against adversarial perturbations generated by DAG, RAP, UEA, and TOG, as well as adversarial patch attacks such as [81,82]. Once this training is complete, during test time, the input image is sent to FUSE, which sends it to n detectors to perform independent object detection in parallel. The detection results, which contain the bounding boxes of detected objects, their objectness, and class labels along with probabilities, are sent to a custom detection fusion algorithm. This fusion algorithm uses a consistency evaluator to consolidate detections and a graph partitioning algorithm to fuse bounding boxes. FUSE also reported a higher mAP on clean images compared with the vanilla object detector that was trained using standard training.
An ensemble of models can usually help to give better results on a task they are attempting to solve. Chow et al. [59] attempted to answer the question: Is it better to use all available object detection models to solve object detection, or can we pick a subset of these models such that they are diverse and have weak error correlation between them? Does choosing such a subset give better mAP than individual models or all available models? To solve this, the authors have created a focal error diversity framework called EDI for robust object detection ensembles. Using focal error diversity measures such as Jaccard, Sorensen, and generalized diversity, they are able to construct subensembles that have higher mAP than individual base models as well as combinations of all models. They introduced negative sampling methods to capture the negative correlation of component models in the ensemble. Using these negative correlation scores, the authors present a focal error diversity measure. They used this score to develop an ensemble pruning method such that the top subensembles have a higher mAP on the given dataset. The paper also showed that using such an ensemble, they can resist adversarial attacks such as TOG-vanishing, TOG-fabrication, and TOG-mislabeling. Similarly to FUSE, mAP measured using ensembles is significantly better than individual models.
There are multiple ways to create an ensemble. These can include different backbones for object detection models, different training strategies, different architectures, etc. Peng et al. [60] tried to create an ensemble using the same architecture and the same backbone, but with a training strategy that incorporated diversity. The goal was to increase dissimilarity between submodels in an ensemble during the training process, which will encourage submodels to have different gradient vectors for the same input while minimizing overall loss of the object detection model. To achieve this, the authors devised a novel diversity training strategy such that penalties are added to the loss function based on the cosine similarity of input gradients (with respect to both localization loss and classification loss between submodels), as well as an aggregation algorithm that combines detected bounding boxes from all submodels in an ensemble. The training strategy requires minimal changes to the standard training algorithm. The ensemble was able to resist DAG, TOG-vanishing, and physical adversarial attacks to evade person detectors [83]. The defense is very similar to FUSE. It mainly differs in the manner in which outputs from the ensemble are aggregated and the fact that the ensemble here is created using the same architecture and the same backbone as compared with FUSE, which allows users to create an ensemble with different architectures with different backbones.
Training models on multiple tasks can improve robustness against adversarial attacks, as shown by some works such as Mao et al. [84] and Yeo et al. [85]. Chow et al. [61] improve their previous works [58,59] by creating a two-tier heterogeneous ensemble learning. In the first tier, similar to [58], Chow et al. [61] select heterogeneous models, which are composed of different backbones and different detection algorithms, and then identify a subset of these models using focal diversity metrics to create a superior ensemble (as measured by mAP) compared with the naive ensemble comprising all models. The detection results of the individual models in the ensemble are combined using the algorithm implemented in [58]. In the next tier, the authors incorporate a semantic segmentation model into the ensemble (created in the first tier) to further enhance adversarial robustness against TOG-mislabeling and TOG-vanishing attacks. Using these techniques, Chow et al. [61] build an ensemble team that has a high diversity and a low negative correlation between individual models in the ensemble.
Table 3 lists the defenses addressing L_p-bounded adversarial attacks on object detection. These methods represent the main approaches evaluated on standard datasets such as PASCAL VOC and MS-COCO and tested against common L_p-bounded attacks, including PGD, DAG, TOG, CWA, RAP, and latency-based perturbations.

4.6. Certified Defenses

Adversarial training is a technique that can empirically improve the robustness of a neural network. But is it possible to obtain a robustness guarantee against these attacks? Using a powerful attack to generate adversarial examples can be computationally expensive, as shown in the previous section on adversarial training. When adversarial training employs adversarial examples generated using a relatively weak attack method, such as the single-step FGSM, the resulting model typically fails to exhibit resilience against stronger, multi-step attacks, such as the k-step PGD method. Cohen et al. [62] addressed this question by using randomized smoothing to provide certified bounds for ℓ_2 perturbations and thus guarantee the robustness of the classifier. The main idea is to construct a classifier g from the classifier f such that g has robustness guarantees to adversarial perturbations under the ℓ_2 norm. Their paper provides a guarantee of robustness with high probability and is scalable to classification architectures trained on ImageNet data. But the object detection task is far more complicated than classification due to the size of the networks and the object detector's variable-length output, which includes class labels and bounding box coordinates.
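The prediction step of a randomized-smoothing classifier reduces to a Monte Carlo vote over Gaussian-noised copies of the input (a minimal sketch that omits the statistical test and certified-radius computation of Cohen et al. [62]; `x` is assumed to be a single image with a batch dimension):

```python
import torch

def smoothed_predict(model, x, sigma, n_samples=100, num_classes=10):
    """Monte Carlo estimate of the smoothed classifier g: add Gaussian noise
    to x repeatedly and return the most frequently predicted class."""
    counts = torch.zeros(num_classes)
    with torch.no_grad():
        for _ in range(n_samples):
            noisy = x + sigma * torch.randn_like(x)
            counts[model(noisy).argmax(dim=-1)] += 1
    return int(counts.argmax())
```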
Chiang et al. [87] extended the idea proposed by Cohen et al. [62] to certify object detection networks. This is the first model-agnostic, training-free, certified defense for object detection against ℓ_2-bounded attacks. They reduce the object detection task to a regression problem and then implement a method to obtain a certifiably robust wrapper around YOLOv3, Mask R-CNN, and Faster R-CNN. They used this approach to guarantee robustness against all possible attackers within the threat model. The authors used a Monte Carlo sampling process for inference, taking 2000 samples to approximate the smoothed model for each image. Although this can be quite expensive, the technique was able to achieve nontrivial certified AP without any retraining of the base detector. Because object detectors produce variable-length outputs, existing certification techniques based on randomized smoothing are weaker; the authors developed a simple sorting and binning technique to address the impact of smoothing.
The goal of formal verification is to ensure that a given neural network satisfies the desired properties under all possible input scenarios. Elboher et al. [63] developed an approach to identify and mitigate vulnerabilities such as adversarial attacks, as well as to improve model robustness and reliability. In this paper, the authors use the Alpha-Beta-Crown verifier, which supports networks with multiple outputs and a variety of activation functions. The paper treats provable defenses against object detection attacks as a formal verification problem. The authors mainly focused on misclassification and misdetection attacks. The proposed technique only works for the single-object case. It does not deal with post-processing steps, which could reduce false positives.

5. Discussion and Future Work

5.1. Mapping Between Attacks and Defenses

To better illustrate the relationship between adversarial attacks and corresponding defense mechanisms, Table 4 summarizes how each of the three attack types affects detection and which defense categories provide partial or full mitigation.

5.2. Unified Evaluation Pipeline

To facilitate reproducible and comparable research, we outline a unified pipeline for evaluating adversarial attacks and defenses on object detection models (a code sketch follows the list):
  • Dataset selection: Standard datasets such as PASCAL VOC and MS-COCO should be used. For real-time detectors, a subset of BDD100K or KITTI can be used for efficiency.
  • Model preparation: Train or fine-tune representative detectors (e.g., Faster R-CNN, YOLO, SSD, DETR) using standard clean datasets.
  • Attack simulation: Apply common L p -bounded attacks (PGD, C&W, TOG, UEA, CWA, DAG, RAP, OOA) and latency-based attacks such as Daedalus, Overload, and Phantom Sponge at multiple perturbation strengths ( ϵ values). Evaluate both white-box and transfer (black-box) settings.
  • Integration of defenses: Implement each defense in its respective category—input transformations, adversarial training, noise detection modules, or certified smoothing.
  • Evaluation metrics: Measure performance using clean and adversarial mAP, per-class accuracy, detection latency, and certified radius (for certified defenses).
  • Visualization: Use Grad-CAM to qualitatively assess robustness and localization fidelity.
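The sketch below is our illustration of the core of this pipeline, assuming a torchvision-style detector (torchvision ≥ 0.13) that returns a loss dictionary in training mode: an L ∞ PGD attack on the summed detection losses, followed by inference on the perturbed images. Dataset loading, mAP computation, and defense hooks are left as placeholders.

```python
import torch
import torchvision

def pgd_attack_detector(model, images, targets, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-infinity PGD against the summed detection losses (illustrative sketch)."""
    model.train()  # torchvision detectors return a dict of losses in training mode
    adv = [img.clone().detach() for img in images]
    for _ in range(steps):
        for img in adv:
            img.requires_grad_(True)
        loss = sum(model(adv, targets).values())        # classification + box-regression losses
        grads = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            for i, orig in enumerate(images):
                stepped = adv[i] + alpha * grads[i].sign()                       # ascend the loss
                stepped = torch.max(torch.min(stepped, orig + eps), orig - eps)  # project to the eps-ball
                adv[i] = stepped.clamp(0.0, 1.0).detach()
    return adv

# Hypothetical usage: attack a pretrained Faster R-CNN on a placeholder image/target pair,
# then run inference on the perturbed image and pass the detections to an mAP metric.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
images = [torch.rand(3, 480, 640)]
targets = [{"boxes": torch.tensor([[50.0, 60.0, 200.0, 220.0]]), "labels": torch.tensor([1])}]
adv_images = pgd_attack_detector(model, images, targets)
model.eval()
with torch.no_grad():
    detections = model(adv_images)  # list of dicts with 'boxes', 'labels', 'scores'
```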

5.3. Future Work and Research Directions

We have discussed state-of-the-art adversarial attacks on object detection and covered most of the defenses that aim to safeguard object detection models against them. We conclude with some key insights that can help improve the landscape of defenses:
  • Broadening the attack surface
    Table 3 summarizes the empirical defenses along with the corresponding attacks they address. Notably, only one defense, Underload [46], specifically targets latency-based attacks such as Daedalus, Overload, and Phantom Sponge, indicating a clear need for further research on defenses in this area. Similarly, many defenses are validated only against PGD attacks; relatively few are tested against stronger or more recent object detection–specific attacks (DAG, TOG, CWA, universal adversarial perturbations). This creates blind spots: a defense that is not evaluated against a broad attack set may fail once deployed.
  • Scalability and training efficiency
    Table 5 lists the impact of each defense on clean-image mAP and the additional training time it incurs, since most rely on multi-step PGD adversarial training. Methods that reuse gradients [70] can reduce this cost (see the sketch after this list). The loss of accuracy on clean data can be alleviated by strengthening the backbone classification network [53] or by incorporating a Transformer-Encoder Module [55] to capture broader contextual information.
  • Benchmarking and evaluation frameworks
    Existing defenses are evaluated under different experimental setups that vary in model architecture, dataset, or attack configuration. Many defenses are tested on only a single dataset, such as PASCAL VOC or MS-COCO, and a single detector, typically YOLO, SSD, or Faster R-CNN, and transferability across datasets, architectures, and deployment scenarios is rarely shown. A shared evaluation pipeline, like the one described in Section 5.2, can help standardize both accuracy- and latency-based metrics. Integrating visualization tools such as Grad-CAM [88] can strengthen interpretability, and further research is needed to develop explanations for both robust and non-robust features.
  • Hybrid and cross-domain defenses
    Chiang et al. [87] proposed certifying object detection through median smoothing. Their approach relies on Monte Carlo sampling, requiring about 2000 samples per image to approximate the smoothed model, which makes inference computationally expensive. A more practical direction would be to leverage data generated with diffusion models, as demonstrated by Altstidl et al. [89], to obtain stronger robustness certificates on PASCAL VOC and MS-COCO under various L p attacks. A key opportunity for future work lies in developing formal verification techniques for larger architectures such as YOLO and Faster R-CNN, aimed at verifying properties related to overdetection, misclassification, and misdetection in multi-object settings. In the ensemble defense section, we covered four approaches that leverage diversity (across upstream backbones or downstream detectors) to improve robustness against adversarial attacks. The Daedalus attack [36] demonstrated how ensembles of surrogate models can be exploited to bypass defenses. These findings underscore a potential limitation of ensemble defenses: the same diversity that strengthens a defense can be harnessed by adversaries to improve transferability. Future research could therefore explore adaptive ensemble defenses that dynamically reconfigure model diversity in response to evolving attacks, or investigate hybrid strategies that combine ensembles with input transformations.
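As referenced in the scalability item above, the following sketch illustrates the gradient-reuse idea of “free” adversarial training [70] for a generic classifier with cross-entropy loss (our simplification, not the detector-specific variants used in [52,53]): each minibatch is replayed m times, and the single backward pass per replay updates both the model parameters and a persistent perturbation.

```python
import torch
import torch.nn.functional as F

def free_adversarial_training(model, loader, optimizer, epochs=10, m=4, eps=8 / 255):
    """Sketch of 'free' adversarial training: each minibatch is replayed m times, and the
    single backward pass per replay updates both the model and a persistent perturbation."""
    model.train()
    delta = None  # perturbation shared across replays (and across same-shaped batches)
    for _ in range(epochs):
        for images, labels in loader:
            if delta is None or delta.shape != images.shape:
                delta = torch.zeros_like(images)
            for _ in range(m):  # minibatch replay
                delta.requires_grad_(True)
                outputs = model(torch.clamp(images + delta, 0.0, 1.0))
                loss = F.cross_entropy(outputs, labels)

                optimizer.zero_grad()
                loss.backward()       # one backward pass provides gradients for both updates
                optimizer.step()      # model update

                with torch.no_grad():  # FGSM-style perturbation update from the same gradients
                    delta = torch.clamp(delta + eps * delta.grad.sign(), -eps, eps)
    return model
```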

6. Conclusions

In this survey paper, we have highlighted the broad spectrum of L p attack strategies that threaten object detectors, such as mislabeling objects, making all foreground objects disappear, reducing detection accuracy, and significantly increasing processing time through latency attacks. Each of these attack strategies presents a distinct challenge, often exploiting a different aspect of the machine learning pipeline. To counter these threats, we covered a variety of defenses, including adversarial training, input preprocessing, robust optimization, ensemble model defenses, architectural changes, detection of adversarial noise, and certified defenses. We find that adversarial training remains the most practical and widely used defense, although it usually sacrifices accuracy on clean examples. Preprocessing, ensemble defenses, and architectural changes offer complementary protection but can be circumvented by strong adaptive attacks. Certified defenses are promising for long-term robustness but need scalability improvements, as well as formal verification techniques that handle larger object detectors in multi-object scenarios. Overall, the survey reveals that no single defense provides universal robustness. This underscores the need for layered defense strategies that combine complementary approaches, as well as adaptive mechanisms capable of evolving alongside emerging attack techniques. Several open problems persist, including the need for defenses that balance robustness and efficiency with only a small loss of clean accuracy, better evaluation standards, and a benchmark to compare defenses fairly across a diverse set of attacks, such as TOG, UEA, sponge/latency, and universal perturbations.

Author Contributions

Conceptualization, A.T., P.T., and G.R.; methodology, A.T., P.T., and G.R.; formal analysis, A.T.; investigation, A.T.; writing—original draft preparation, A.T.; writing—review and editing, P.T. and G.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

In this study, no new data was created or analyzed. Data sharing is not applicable to this article.

Conflicts of Interest

Author Giuseppe Raffa was employed by the company Intel (United States). The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Girshick, R.B.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  2. Girshick, R.B. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  3. Redmon, J.; Divvala, S.K.; Girshick, R.B.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  4. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  5. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  6. Ren, S.; He, K.; Girshick, R.B.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39, 1137–1149. [Google Scholar] [CrossRef]
  7. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.E.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  8. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
  9. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. arXiv 2020, arXiv:2005.12872. [Google Scholar]
  10. Vaswani, A.; Shazeer, N.M.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  11. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787. [Google Scholar]
  12. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946. [Google Scholar]
  13. Cheng, T.; Song, L.; Ge, Y.; Liu, W.; Wang, X.; Shan, Y. YOLO-World: Real-Time Open-Vocabulary Object Detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 16901–16911. [Google Scholar]
  14. Liu, S.; Zeng, Z.; Ren, T.; Li, F.; Zhang, H.; Yang, J.; Li, C.; Yang, J.; Su, H.; Zhu, J.; et al. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection. In Proceedings of the European Conference on Computer Vision, Paris, France, 2–3 October 2023. [Google Scholar]
  15. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. arXiv 2014, arXiv:1412.6572. [Google Scholar]
  16. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.J.; Fergus, R. Intriguing properties of neural networks. arXiv 2013, arXiv:1312.6199. [Google Scholar]
  17. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv 2017, arXiv:1706.06083. [Google Scholar]
  18. Kurakin, A.; Goodfellow, I.J.; Bengio, S. Adversarial examples in the physical world. arXiv 2016, arXiv:1607.02533. [Google Scholar]
  19. Carlini, N.; Wagner, D.A. Towards Evaluating the Robustness of Neural Networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; pp. 39–57. [Google Scholar]
  20. Moosavi-Dezfooli, S.; Fawzi, A.; Fawzi, O.; Frossard, P. Universal Adversarial Perturbations. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 86–94. [Google Scholar]
  21. Moosavi-Dezfooli, S.; Fawzi, A.; Frossard, P. DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  22. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal loss for dense object detection. In Proceedings of the International Conference on Computer Vision, Venice, Italy, 22–29 October 2017. [Google Scholar]
  23. Xie, C.; Wang, J.; Zhang, Z.; Zhou, Y.; Xie, L.; Yuille, A.L. Adversarial Examples for Semantic Segmentation and Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1378–1387. [Google Scholar]
  24. Wei, X.; Liang, S.; Cao, X.; Zhu, J. Transferable Adversarial Attacks for Image and Video Object Detection. In Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018. [Google Scholar]
  25. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.C.; Bengio, Y. Generative Adversarial Networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
  26. Lin, T.; Maire, M.; Belongie, S.J.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014. [Google Scholar]
  27. Agnew, C.; Eising, C.; Denny, P.; Scanlan, A.G.; Van de Ven, P.; Grua, E.M. Quantifying the Effects of Ground Truth Annotation Quality on Object Detection and Instance Segmentation Performance. IEEE Access 2023, 11, 25174–25188. [Google Scholar] [CrossRef]
  28. Chen, P.-C.; Kung, B.-H.; Chen, J.-C. Class-Aware Robust Adversarial Training for Object Detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10415–10424. [Google Scholar] [CrossRef]
  29. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The Pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
  30. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar]
  31. Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 2633–2642. [Google Scholar]
  32. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-based Fully Convolutional Networks. In Proceedings of the Neural Information Processing Systems (2016), Barcelona, Spain, 5–10 December 2016. [Google Scholar]
  33. Li, Y.; Tian, D.; Chang, M.; Bian, X.; Lyu, S. Robust Adversarial Perturbation on Deep Proposal-based Models. In Proceedings of the British Machine Vision Conference, Newcastle, UK, 3–6 September 2018. [Google Scholar]
  34. Wang, Y.; Tan, Y.; Zhang, W.; Zhao, Y.; Kuang, X. An adversarial attack on DNN-based black-box object detectors. J. Netw. Comput. Appl. 2020, 161, 102634. [Google Scholar] [CrossRef]
  35. Venter, G.; Sobieszczanski-Sobieski, J. Particle Swarm Optimization. In Advances in Metaheuristic Algorithms for Optimal Design of Structures; Springer: Cham, Switzerland, 2019. [Google Scholar]
  36. Wang, D.; Li, C.; Wen, S.; Nepal, S.; Xiang, Y. Daedalus: Breaking Non-Maximum Suppression in Object Detection via Adversarial Examples. arXiv 2019, arXiv:1902.02067. [Google Scholar] [CrossRef] [PubMed]
  37. Shapira, A.; Zolfi, A.; Demetrio, L.; Biggio, B.; Shabtai, A. Phantom Sponges: Exploiting Non-Maximum Suppression to Attack Deep Object Detectors. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–7 January 2023; pp. 4560–4569. [Google Scholar]
  38. Chen, E.; Chen, P.; Chung, I.; Lee, C. Overload: Latency Attacks on Object Detection for Edge Devices. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 24716–24725. [Google Scholar]
  39. Ultralytics. Yolov5. 2022. Available online: https://github.com/ultralytics/yolov5 (accessed on 1 August 2025).
  40. Chow, K.; Liu, L.; Loper, M.L.; Bae, J.; Gursoy, M.E.; Truex, S.; Wei, W.; Wu, Y. Adversarial Objectness Gradient Attacks in Real-time Object Detection Systems. In Proceedings of the 2020 Second IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA), Atlanta, GA, USA, 28–31 October 2020; pp. 263–272. [Google Scholar]
  41. Nguyen, K.N.; Zhang, W.; Lu, K.; Wu, Y.; Zheng, X.; Tan, H.L.; Zhen, L. A Survey and Evaluation of Adversarial Attacks in Object Detection. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 15706–15722. [Google Scholar] [CrossRef] [PubMed]
  42. Zhou, L.; Liu, Q.; Zhou, S. Preprocessing-based Adversarial Defense for Object Detection via Feature Filtration. In Proceedings of the 7th International Conference on Algorithms, Computing and Systems (ICACS ’23). Association for Computing Machinery, New York, NY, USA, 19–21 October 2024; pp. 80–87. [Google Scholar]
  43. Zhang, H.; Wang, J. Towards Adversarially Robust Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 421–430. [Google Scholar]
  44. Choi, J.I.; Tian, Q. Adversarial Attack and Defense of YOLO Detectors in Autonomous Driving Scenarios. In Proceedings of the 2022 IEEE Intelligent Vehicles Symposium (IV), Aachen, Germany, 4–9 June 2022; pp. 1011–1017. [Google Scholar]
  45. Jung, Y.; Song, B.C. Toward Enhanced Adversarial Robustness Generalization in Object Detection: Feature Disentangled Domain Adaptation for Adversarial Training. IEEE Access 2024, 12, 179065–179076. [Google Scholar] [CrossRef]
  46. Wang, T.; Wang, Z.; Wang, C.; Shu, Y.; Deng, R.; Cheng, P.; Chen, J. Can’t Slow me Down: Learning Robust and Hardware-Adaptive Object Detectors against Latency Attacks for Edge Devices. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025. [Google Scholar]
  47. Chen, X.; Xie, C.; Tan, M.; Zhang, L.; Hsieh, C.; Gong, B. Robust and Accurate Object Detection via Adversarial Learning. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 16617–16626. [Google Scholar]
  48. Dong, Z.; Wei, P.; Lin, L. Adversarially-Aware Robust Object Detector. In Computer Vision—ECCV 2022. Lecture Notes in Computer Science; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer: Cham, Switzerland, 2022; Volume 13669. [Google Scholar]
  49. Cheng, J.; Huang, B.; Fang, Y.; Han, Z.; Wang, Z. Adversarial intensity awareness for robust object detection. Comput. Vis. Image Underst. 2024, 251, 104252. [Google Scholar] [CrossRef]
  50. Amirkhani, A.; Karimi, M.P. Adversarial defenses for object detectors based on Gabor convolutional layers. Vis. Comput. 2022, 38, 1929–1944. [Google Scholar] [CrossRef]
  51. Zeng, W.; Gao, S.; Zhou, W.; Dong, Y.; Wang, R. Improving the Adversarial Robustness of Object Detection with Contrastive Learning. In Chinese Conference on Pattern Recognition and Computer Vision; Springer: Singapore, 2023. [Google Scholar]
  52. Muhammad, A.; Zhuang, W.; Lyu, L.; Bae, S.-H. FROD: Robust Object Detection for Free. arXiv 2023, arXiv:2308.01888. [Google Scholar] [CrossRef]
  53. Li, X.; Chen, H.; Hu, X. On the Importance of Backbone to the Adversarial Robustness of Object Detectors. IEEE Trans. Inf. Forensics Secur. 2025, 20, 2387–2398. [Google Scholar] [CrossRef]
  54. Yang, H.; Wang, X.; Chen, Y.; Dou, H.; Zhang, Y. RPU-PVB: Robust object detection based on a unified metric perspective with bilinear interpolation. J. Cloud Comput. 2023, 12, 169. [Google Scholar] [CrossRef]
  55. Alamri, F.; Kalkan, S.; Pugeault, N. Transformer-Encoder Detector Module: Using Context to Improve Robustness to Adversarial Attacks on Object Detection. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2020; pp. 9577–9584. [Google Scholar]
  56. Xu, W.; Huang, H.; Pan, S. Using Feature Alignment Can Improve Clean Average Precision and Adversarial Robustness in Object Detection. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 2184–2188. [Google Scholar]
  57. Xu, W.; Chu, P.; Xie, R.; Xiao, X.; Huang, H. Robust and Accurate Object Detection Via Self-Knowledge Distillation. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 91–95. [Google Scholar]
  58. Chow, K.-H. Robust Object Detection Fusion Against Deception. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Singapore, 14–18 August 2021. [Google Scholar]
  59. Chow, K.H.; Liu, L. Boosting Object Detection Ensembles with Error Diversity. In Proceedings of the 2022 IEEE International Conference on Data Mining (ICDM), Orlando, FL, USA, 28 November–1 December 2022; pp. 903–908. [Google Scholar]
  60. Peng, Z.; Chen, X.; Huang, W.; Kong, X.; Li, J.; Xue, S. Shielding Object Detection: Enhancing Adversarial Defense through Ensemble Methods. In Proceedings of the 2024 5th Information Communication Technologies Conference (ICTC), Nanjing, China, 10–12 May 2024; pp. 88–97. [Google Scholar] [CrossRef]
  61. Wu, Y.; Chow, K.; Wei, W.; Liu, L. Exploring Model Learning Heterogeneity for Boosting Ensemble Robustness. In Proceedings of the 2023 IEEE International Conference on Data Mining (ICDM), Shanghai, China, 1–4 December 2023; pp. 648–657. [Google Scholar]
  62. Cohen, J.M.; Rosenfeld, E.; Kolter, J.Z. Certified Adversarial Robustness via Randomized Smoothing. arXiv 2019, arXiv:1902.02918. [Google Scholar] [CrossRef]
  63. Elboher, Y.Y.; Raviv, A.; Weiss, Y.L.; Cohen, O.; Assa, R.; Katz, G.; Kugler, H. Formal Verification of Deep Neural Networks for Object Detection. arXiv 2024, arXiv:2407.01295. [Google Scholar]
  64. Ilyas, A.; Santurkar, S.; Tsipras, D.; Engstrom, L.; Tran, B.; Madry, A. Adversarial Examples Are Not Bugs, They Are Features. In Proceedings of the Neural Information Processing Systems 2019, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
  65. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN. arXiv 2017, arXiv:1701.07875. [Google Scholar] [PubMed]
  66. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved training of wasserstein gans. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; p. 30. [Google Scholar]
  67. Miyato, T.; Kataoka, T.; Koyama, M.; Yoshida, Y. Spectral Normalization for Generative Adversarial Networks. arXiv 2018, arXiv:1802.05957. [Google Scholar] [CrossRef]
  68. Tsipras, D.; Santurkar, S.; Engstrom, L.; Turner, A.; Madry, A. There Is No Free Lunch In Adversarial Robustness (But There Are Unexpected Benefits). arXiv 2018, arXiv:1805.12152. [Google Scholar]
  69. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.S.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2014, 115, 211–252. [Google Scholar] [CrossRef]
  70. Shafahi, A.; Najibi, M.; Ghiasi, A.; Xu, Z.; Dickerson, J.P.; Studer, C.; Davis, L.S.; Taylor, G.; Goldstein, T. Adversarial Training for Free! In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 3358–3369. [Google Scholar]
  71. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525. [Google Scholar]
  72. Xiao, C.; Deng, R.; Li, B.; Yu, F.; Liu, M.; Song, D.X. Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation. arXiv 2018, arXiv:1810.05162. [Google Scholar] [CrossRef]
  73. Xie, C.; Tan, M.; Gong, B.; Wang, J.; Yuille, A.L.; Le, Q.V. Adversarial examples improve image recognition. In Proceedings of the Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  74. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv 2015, arXiv:1502.03167. [Google Scholar] [CrossRef]
  75. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. In Proceedings of the International Conference on Learning Representations (ICLR), Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
  76. Chen, Y.; Dai, X.; Liu, M.; Chen, D.; Yuan, L.; Liu, Z. Dynamic convolution: Attention over convolution kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11030–11039. [Google Scholar]
  77. Pérez, J.C.; Alfarra, M.; Jeanneret, G.; Bibi, A.; Thabet, A.K.; Ghanem, B.; Arbeláez, P. Gabor Layers Enhance Network Robustness. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2019. [Google Scholar]
  78. Yang, H.; Chen, Y.; Dou, H.; Luo, Y.; Tan, C.J.; Zhang, Y. Robust Object Detection Based on a Comparative Learning Perspective. In Proceedings of the 2023 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), Abu Dhabi, United Arab Emirates, 14–17 November 2023; pp. 0557–0563. [Google Scholar]
  79. Wong, E.; Rice, L.; Kolter, J.Z. Fast is better than free: Revisiting adversarial training. arXiv 2020, arXiv:2001.03994. [Google Scholar] [CrossRef]
  80. Hinton, G.E.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531. [Google Scholar] [CrossRef]
  81. Thys, S.; Ranst, W.V.; Goedemé, T. Fooling Automated Surveillance Cameras: Adversarial Patches to Attack Person Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019; pp. 49–55. [Google Scholar]
  82. Liu, X.; Yang, H.; Liu, Z.; Song, L.; Chen, Y.; Li, H.H. DPATCH: An Adversarial Patch Attack on Object Detectors. arXiv 2018, arXiv:1806.02299. [Google Scholar]
  83. Xu, K.; Zhang, G.; Liu, S.; Fan, Q.; Sun, M.; Chen, H.; Chen, P.; Wang, Y.; Lin, X. Adversarial T-Shirt! Evading Person Detectors in a Physical World. In Computer Vision—ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part V; Springer: Berlin/Heidelberg, Germany, 2020; pp. 665–681. [Google Scholar]
  84. Mao, C.; Gupta, A.; Nitin, V.; Ray, B.; Song, S.; Yang, J.; Vondrick, C. Multitask learning strengthens adversarial robustness. In Computer Vision—ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part II, Ser. Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12347, pp. 158–174. [Google Scholar]
  85. Yeo, T.; Kar, O.F.; Zamir, A. Robustness via cross-domain ensembles. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 12189–12199. [Google Scholar]
  86. Mopuri, K.R.; Garg, U.; Babu, R.V. Fast feature fool: A data independent approach to universal adversarial perturbations. arXiv 2017, arXiv:1707.05572. [Google Scholar] [CrossRef]
  87. Chiang, P.Y.; Curry, M.; Abdelkader, A.; Kumar, A.; Dickerson, J.; Goldstein, T. Detection as Regression: Certified Object Detection by Median Smoothing. Adv. Neural Inf. Process. Syst. 2020, 33, 1275–1286. [Google Scholar]
  88. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 2020, 128, 336–359. [Google Scholar] [CrossRef]
  89. Altstidl, T.R.; Dobre, D.; Kosmala, A.; Eskofier, B.M.; Gidel, G.; Schwinn, L. On the Scalability of Certified Adversarial Robustness with Generated Data. In Proceedings of the 38th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024. [Google Scholar]
Figure 1. Faster R-CNN output on a clean image (left) and an adversarial image generated using the PGD attack (right).
Figure 2. YOLOv5 [39] output on a clean image (left) and an adversarial image generated using the Overload [38] attack (right).
Figure 3. (Source: [28]) Standard SSD output on a clean image (left), standard SSD output under the CWA attack (center), and CWAT SSD output under the CWA attack (right). The bounding box color indicates the category; in the leftmost image, all boxes belong to the vehicle category, whereas under attack the detector assigns boxes to animal or other categories, which are therefore shown in different colors.
Table 1. Object detection datasets.
Dataset | Description
PASCAL VOC 2007-12 [29] | 20 classes in 4 categories: person, animal, vehicle, and indoor. 11,530 training images and 4921 test images.
MS-COCO [26] | 80 classes. 80k training images and 40k validation images (115k/5k in 2017), 40k test images.
KITTI [30] | 7481 training images and 7518 test images, containing 80,256 labeled objects.
BDD [31] | 100k videos with objects belonging to 12 classes (lane, drivable area, car, traffic sign, traffic light, person, train, motor, rider, bike, bus, truck).
Table 2. Defense techniques and the papers in each category.
Defense Technique | Defenses
Preprocessing | Feature Filtration [42]
Adversarial Training | Changes to loss function: MTD [43], CWAT [28], OOD [44], FDDA [45], Underload [46]; Changes to model: Det-AdvProp [47], RobustDet [48], AIAD [49], Gabor conv layers [50], Contrastive Adv Training [51], FROD [52], Backbone Adv Robustness [53]
Detection of Adversarial Noise | RPU-PVB [54]
Architecture Changes | Transformer-Encoder Module [55], FA [56], UDFA [57]
Ensemble Defense | FUSE [58], Error Diversity Framework (EDI) [59], DEM [60], Two-tier ensemble [61]
Certified Defenses | Certified Object Detection by Median Smoothing [87], Formal Verification of Deep Neural Networks for Object Detection [63]
Table 3. Empirical defenses and the attacks the authors tested against.
Defenses | PGD * | DAG | RAP | CWA | OOA ** | TOG | UEA | FFF [86] | Latency ***
Feature Filtration [42]xxxxxxx
MTD [43]xxxxxx
CWAT [28]xxxxxx
OOD [44]xxxxxxxx
FDDA [45]xxxxxx
Underload [46]xxxxxxxx
Det-AdvProp [47]xxxxxxxx
RobustDet [48]xxxxxx
AIAD [49]xxxxxxx
Gabor conv layers [50]xxxxx
Contrastive Adv Training [51]xxxxxxx
FROD [52]xxxxxxxx
Backbone Adv Robustness [53]xxxxxxx
RPU-PVB [54]xxxxxx
Transformer-Encoder Module [55]xxxxxxxx
FA [56]xxxxxxxx
UDFA [57]xxxxxxxx
FUSE [58]xxxxx
EDI [59]xxxxxxxx
DEM [60]xxxxxxx
Two-tier ensemble [61]xxxxxxxx
* PGD also covers FGSM. ** Objectness Oriented Attack. *** Latency attacks such as Daedalus, Phantom Sponge, and Overload.
Table 4. Mapping between three attack types and six defense categories. A check mark (✓) indicates that the defense provides partial or direct mitigation.
Attack Type | Preprocessing | Adversarial Training | Noise Detection | Architectural Change | Ensemble | Certified Defense
Mislabeling
Bounding-box
Fabrication
Table 5. Empirical defenses and their impact on clean-image mAP and additional training time.
Defenses | mAP on Clean Images | Additional Training Time
Feature Filtration [42] | 12% decrease for YOLOv3, Faster R-CNN | Time to train the filter and critic networks
MTD [43] | Decreases (12 in the case of SSD) | k-step PGD cls and loc attack time
CWAT [28] | Decreases, but less than MTD | 3.19× faster than MTD
OOD [44] | Not mentioned | FGSM attack 3 times for each batch per epoch
FDDA [45] | 0.8% decrease on MS-COCO, 0.3–0.5% decrease for PASCAL VOC | 24% increase compared with standard adversarial training
Underload [46] | 6 to 13% decrease on PASCAL VOC | k-step PGD loop until the maximum number of candidates is reached
Det-AdvProp [47] | +1.1 mAP increase | k-step PGD cls and loc attack time
RobustDet [48] | Decrease of 2.7% for PASCAL VOC, 6% for MS-COCO | Time to train AID and CFR
AIAD [49] | 2% decrease for PASCAL VOC, 10% decrease for MS-COCO | k-step PGD cls and loc attack time
Gabor conv layers [50] | <1% drop in accuracy reported | Perturbation generation time and discrete Gabor filter bank generation time
Contrastive Adv Training [51] | Higher than MTD, CWAT, FA | Perturbation generation time, contrastive learning module training time
FROD [52] | 10% decrease for PASCAL VOC (less than MTD, CWAT) | Minimal, takes advantage of the technique in [70]
Backbone Adv Robustness [53] | Increases for PASCAL VOC (Faster R-CNN), slight decrease for MS-COCO | Adversarial training of backbone and object detector (takes advantage of [70])
RPU-PVB [54] | Decrease of 2.8% for PASCAL VOC, 5.8% for MS-COCO | MTD adversarial sample generation time, RCP module training
Transformer-Encoder Module [55] | Increases mAP by 4.71 for MS-COCO | No change, pre-trained Transformer encoder is used
FA [56] | Slight decrease | k-step PGD training and distillation time
UDFA [57] | Increase of 1.6 AP for PASCAL VOC | k-step PGD cls and loc attack time and feature generation time
FUSE [58] | Increases mAP by 1.20× for Faster R-CNN | Time to perform diverse joint training of detection models
EDI [59] | Increases mAP by 1.27× for Faster R-CNN | Time to create the best ensemble using EDI
DEM [60] | Increases for PASCAL VOC, mild increase for MS-COCO | 9× increase per epoch
Two-tier ensemble [61] | Increases mAP by 1.07× for Faster R-CNN, 1.18× for SSD | Time to create ensemble
