Article

Towards an End-to-End (E2E) Adversarial Learning and Application in the Physical World †

1 Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel
2 Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa 211-8588, Japan
* Authors to whom correspondence should be addressed.
J. Cybersecur. Priv. 2025, 5(4), 108; https://doi.org/10.3390/jcp5040108
Submission received: 9 October 2025 / Revised: 16 November 2025 / Accepted: 28 November 2025 / Published: 1 December 2025
(This article belongs to the Section Privacy)

Abstract

The traditional process for learning patch-based adversarial attacks, conducted in the digital domain and later applied in the physical domain (e.g., via printed stickers), may suffer reduced performance due to adversarial patches’ limited transferability between domains. Given that previous studies have considered using film projectors to apply adversarial attacks, we ask: Can adversarial learning (i.e., patch generation) be performed entirely in the physical domain using a film projector? In this work, we propose Physical-domain Adversarial Patch Learning Augmentation (PAPLA), a novel end-to-end (E2E) framework that shifts adversarial learning from the digital domain to the physical domain using a film projector. We evaluate PAPLA in multiple scenarios, including controlled laboratory and realistic outdoor settings, demonstrating its ability to ensure attack success compared to conventional digital learning–physical application (DL-PA) methods. We also analyze how environmental factors such as projection surface color, projector strength, ambient light, distance, and the target object’s angle relative to the camera affect patch effectiveness. Finally, we demonstrate the feasibility of the attack against a parked car and a stop sign in a real-world outdoor environment. Our results show that under specific conditions, E2E adversarial learning in the physical domain eliminates transferability issues and ensures evasion of object detectors. We also discuss the challenges and opportunities of adversarial learning in the physical domain and identify where this approach is more effective than using a printed sticker.

1. Introduction

In recent years, object detectors have been integrated into various systems that collect data from the physical domain, including security cameras [1], license plate recognition systems [2], and autonomous vehicles [3]. However, these object detectors are vulnerable to adversarial attacks, where small, often imperceptible, image alterations can cause object detection systems to misclassify input. These misclassifications present significant challenges for the reliability and safety of such systems.
Previous studies have carried out the generation process of adversarial examples in the digital domain [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]. Using these digitally generated patches, several studies [4,9,10,11,13,14,15,16,18,19,22,23,31,35,36,43] printed them as stickers and applied them in the physical domain. However, these methods face limitations due to the challenge of transferability between the digital and physical domains. Specifically, patches that are effective in the digital domain may not perform as intended when applied in the physical domain. Figure 1 illustrates this issue. In the non-adversarial case (Figure 1a), the object detector identifies the cup with a high confidence score of 1.0. In the digital learning-digital application (DL-DA) scenario (Figure 1b), the adversarial patch (applied digitally) successfully reduces the confidence score to 0.4, achieving its goal. However, when the same patch is printed and applied in the physical domain, as shown in the digital learning–physical application (DL-PA) scenario (Figure 1c), it fails to affect the object detector, which maintains a confidence score of 1.0. This issue of transferability raises two essential questions: (1) Can adversarial patch learning be conducted entirely in the physical domain to ensure that physical patches maintain their effectiveness in the application phase? (2) Under what constraints may the physical learning approach yield better results than the traditional digital learning approach?
In this paper, we address the transferability issue by extending digital adversarial learning methodologies to the physical domain. We start by adapting the learning process of digital patch attacks, in particular the Dpatch [14] attack and the Naturalistic Adversarial Patch (NAP) [4] attack, for physical domain adversarial learning. We extend the conventional approach, where adversarial learning occurs entirely in the digital environment, by introducing Physical-domain Adversarial Patch Learning Augmentation (PAPLA), a new framework that enables the conversion of digital adversarial learning processes to their physical-domain equivalents. PAPLA addresses the difficulty of transferring digital adversarial examples to physical scenarios by performing the adversarial patch learning process end-to-end (E2E) in the physical domain using a projector. We find that transferring digital patches to the physical domain is affected by: (1) additional noise from external environmental factors, and (2) difficulty in matching digital colors to printed colors. By performing patch learning E2E in the physical domain, these factors are either avoided or integrated into the learning process. As illustrated in Figure 1d, PAPLA’s physical learning–physical application (PL-PA) approach enables E2E adversarial patch learning and application directly in the physical domain, thus completely hiding the cup from the object detector. Prior studies [37,38,39,44] have explored the use of projectors to apply adversarial attacks directly in the physical domain, leveraging light-based projections to mislead detection systems. We use projectors because they allow rapid iteration of adversarial patches, supporting patch learning in the physical domain.
Our research shows that under specific conditions, PAPLA improves adversarial patch effectiveness by reducing object detector confidence scores in real-world settings. By incorporating environmental factors like distance, angle, and lighting into the learning process, PAPLA enhances robustness compared to traditional digital learning–physical application methods. However, its effectiveness is influenced by several factors. For example, the color of the projection surface impacts performance, with lighter surfaces yielding better results. Furthermore, PAPLA introduces higher L2 and L∞ norms compared to digital learning–physical application patches, indicating a trade-off between image quality and the success of the attack. Environmental factors, including the distance and angle of the camera and the light intensity of the projector, also play a critical role: greater projector strength and optimized camera positioning improve attack effectiveness. While PAPLA demonstrates clear advantages in controlled environments, it requires careful setup. Most of the experiments described in this work were conducted in a controlled laboratory environment to ensure consistency and reproducibility. However, we also conducted experiments in an outdoor environment to evaluate PAPLA’s robustness under realistic and dynamic conditions and demonstrate E2E physical learning and application against a parked car and a stop sign, underscoring its relevance to safety-critical scenarios, such as autonomous driving. In our evaluations, we applied PAPLA to object detectors including YOLOv3 [45], YOLOv4 [46], and Faster R-CNN [47]. These models are frequently studied in the context of autonomous driving security [44,48], thus substantiating the implications of our findings for real-world systems.
Contributions. Our contributions can be summarized as follows: (1) In contrast to previous works where adversarial learning and attack application were conducted in different domains (e.g., [4,8,9,10,11,13,14,15,16,17,18,19,20,21,22,23,24,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]), we take the first step in conducting E2E adversarial learning entirely in the physical domain. We present PAPLA, a framework that converts the adversarial learning of existing digital adversarial attacks to the physical domain. We convert two attacks (Dpatch [14] and NAP [4]) from digital learning-digital application and digital learning–physical application scenarios to a physical learning–physical application scenario and ensure the success of the attacks in the physical world. (2) We perform a detailed analysis of the factors that influence the effectiveness of E2E physical domain adversarial attacks. Specifically, we investigate environmental factors including projector strength, ambient lighting, camera distance, camera angle, and projection surface, in order to determine their impact on attack effectiveness. (3) We compare the results obtained from three distinct adversarial learning scenarios: digital learning-digital application (patch learning and application are performed in the digital domain), digital learning–physical application (patch learning is performed digitally, and printed/applied in the physical domain—the current practice for physical-domain attacks), and physical learning–physical application (our framework—patch learning and application are performed in the physical domain).
Structure. The remainder of the paper is organized as follows: In Section 2, we present the motivation for PL-PA. This is followed by a detailed explanation of our threat model and method in Section 3. We present a detailed analysis of PAPLA in Section 4, and our evaluations in Section 5. We then discuss the limitations of PL-PA in Section 6. In Section 7, we review related work in the field. We conclude with a summary of our findings and suggestions for future research in Section 8.

2. Transferability Between Digital and Physical Domains

Here, we present the motivation for converting digital adversarial attacks to E2E execution in the physical domain. We show that digitally learned patches may display reduced effectiveness when transferred to the physical domain as printed stickers. Specifically, in our experiments, we find that when patches are applied in the digital domain, the average confidence of object detectors decreases from 0.97 in the non-adversarial scenario to 0.66. However, when these patches are printed and applied in the physical domain, the average confidence only decreases to 0.90, highlighting the challenge of maintaining an attack’s effectiveness when transferring adversarial patches from the digital domain to the physical domain. In Section 2.3, we review two key factors contributing to the difficulty of transferring adversarial patches from the digital to the physical domain. First, the printing process introduces color discrepancies, with an average of 99.34% of the pixels differing between digital and printed patches. Second, environmental noise in the physical domain causes pixel values to vary significantly over short time intervals, even in controlled conditions.

2.1. Experimental Setup

We generated digital patches for five target objects: a car, a stop sign, a bottle, a cup, and a potted plant. These patches were generated using two untargeted adversarial attacks: DPatch [14] and Robust DPatch [10] (both from the ART [49] library) against the YOLOv3 [45] and Faster R-CNN [47] object detectors, respectively. We selected Faster R-CNN and YOLOv3 because the DPatch and Robust DPatch attacks were originally designed and evaluated for them.
For each patch, we used the default settings defined in ART, specifically a learning rate of 5.0, batch size of 16, maximum iterations of 1000, and varying patch sizes depending on the size of the target object and the implementation of each attack. The patches were generated on a machine equipped with an NVIDIA RTX 2080 Ti GPU, six CPU cores, and 24 GB RAM. We used the ZED2i camera to capture the objects from a distance of 0.5 m. The same camera was used for both the patch learning and application phases.
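For illustration, the sketch below shows how a patch of this kind can be generated with the ART library using the parameters quoted above. It is a minimal example, not the exact code used in our experiments: it pairs DPatch with a pretrained torchvision Faster R-CNN for brevity (our experiments paired DPatch with YOLOv3 and Robust DPatch with Faster R-CNN), and the model and frame loading details are assumptions.

import numpy as np
import torchvision
from art.estimators.object_detection import PyTorchFasterRCNN
from art.attacks.evasion import DPatch

# Wrap a pretrained torchvision Faster R-CNN as an ART object detector.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
detector = PyTorchFasterRCNN(model=model, clip_values=(0, 255), channels_first=False)

# Frames of the target object captured with the camera, shape (N, H, W, 3), pixel range [0, 255].
x = np.load("target_frames.npy").astype(np.float32)  # hypothetical capture dump

attack = DPatch(
    detector,
    patch_shape=(40, 40, 3),  # adjusted per target object in practice
    learning_rate=5.0,
    max_iter=1000,
    batch_size=16,
)
patch = attack.generate(x)     # optimize the patch digitally
x_adv = attack.apply_patch(x)  # apply the learned patch to the frames for digital evaluation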
For the physical domain application, we printed the patches as stickers on Chromo 300 gsm paper using a Xerox Versant 280 printer (Xerox Corporation, Norwalk, CT, USA), an industrial-grade printer typically found in professional print shops (approximate cost ∼$35,000; see https://www.xerox.com/en-us/digital-printing/digital-presses/xerox-versant-280 (accessed on 27 November 2025) for further specifications). We evaluated the patches’ effectiveness, i.e., the difference in the object detector’s confidence when detecting the target object before and after applying the patch, in both the digital and physical domains.

2.2. Failure of Transferability

Table 1 illustrates the effect of the digitally generated patches, both when applied digitally and when printed and placed in the physical domain, on the Faster R-CNN and YOLOv3 object detectors. The average confidences of YOLOv3 and Faster R-CNN on the clean images without patches were 0.96 and 0.98, respectively. When applying the DPatch and Robust DPatch attacks to YOLOv3 and Faster R-CNN digitally, the average confidences dropped to 0.55 and 0.77, respectively. When printing and placing the patches in the physical domain, rather than applying the patches digitally, the average confidences of YOLOv3 and Faster R-CNN rose to 0.88 and 0.93, respectively. These results demonstrate that when patches generated in the digital domain are transferred to the physical domain, they may fail to perform as intended, since the object detectors continue to detect all objects with a high level of confidence.

2.3. Cause of the Failure

Here, we conduct two experiments to explore the causes of the transferability failure between digital and physical domains.
In Section 2.3.1, we compare digital patches with their printed counterparts as stickers, analyzing how they are perceived by the camera.
In Section 2.3.2, we evaluate the impact of environmental noise on recorded consecutive frames. We captured multiple images of the same object at short intervals of 30 s in a controlled environment to examine the variability in pixel values over time.
Both experiments utilized three cameras from different categories: a smartphone camera (iPhone 16), a stereo camera (ZED2i), and a dash camera (YI Smart Dash Camera with ADAS capabilities). For the first experiment, we used three different patches generated during the experiment described in Section 2.1.

2.3.1. Differences Between Digital and Physical Patches

One key factor in the transferability issue is the difficulty in accurately reproducing the colors of digital patches when printing them for physical application. This is evident in Table 1, which shows that identical digital patches appear different when applied digitally versus physically (see also Appendix A Figure A1b,c), even when using a professional industrial printer.
Table 2 shows the pixel-level differences for three different digital patches and their printed counterparts. On average, 99.34% of the pixels differed between respective digital and printed patches, highlighting the inconsistencies caused by the printing process.

2.3.2. Differences Between Consecutive Captures of the Same Scene

An additional factor affecting patch transferability is the noise introduced by recording in the physical domain. While the environment in the digital domain is stable (only the patches change over time), in the physical domain the majority of pixels in the image change over time, adding noise that can affect the patches’ performance. This is demonstrated in Table 3. For three camera models, four images of the same scene are captured in a controlled laboratory environment under constant conditions at 30 s intervals. No projections (e.g., performing PAPLA) were made during the recording. For each camera, we measured the L2, L∞, and L0 values of consecutive images. We found high values of L2, L∞, and L0 between consecutive images, averaging 8264.40, 74.67, and 78.26%, respectively, which demonstrates that external environmental factors introduce significant noise in the physical domain, even under controlled conditions.
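As a concrete illustration of this measurement, the sketch below computes the L2, L∞, and L0 (percentage of changed pixels) differences between consecutive captures of a static scene; the frame file names are placeholders.

import numpy as np

def frame_diff_norms(a: np.ndarray, b: np.ndarray):
    """a, b: images of identical shape (H, W, 3), e.g., uint8 captures of the same scene."""
    d = a.astype(np.float64) - b.astype(np.float64)
    l2 = np.linalg.norm(d)                          # Euclidean norm over all pixel values
    linf = np.abs(d).max()                          # largest single-channel change
    l0 = 100.0 * np.mean(np.any(d != 0, axis=-1))   # % of pixels that changed at all
    return l2, linf, l0

# Four captures taken 30 s apart under constant laboratory conditions (placeholder files).
frames = [np.load(f"capture_{i}.npy") for i in range(4)]
for prev, curr in zip(frames, frames[1:]):
    print(frame_diff_norms(prev, curr))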
These findings and insights indicate that in order to ensure the success of adversarial attacks, the learning process for adversarial patches must be performed in the physical domain. This is because adversarial learning in the digital domain does not ensure similar performance when applied in the physical domain.

3. Threat Model and Method

Here, we outline the threat model and methodology for adversarial learning in the PL-PA scenario. We define the adversary’s capabilities, justify the approach, and detail PAPLA, our framework for generating and applying adversarial patches under real-world conditions in the physical domain.

3.1. Threat Model

We assume that an adversary is interested in performing an evasion attack in the physical domain to hide an object from an object detector. The purpose of the attack is to produce a patch through physical domain learning, and thus ensure the hiding of an object from the object detector in the physical environment. Furthermore, we assume that the target object is static and has a suitable surface for projection. A potential use case for this scenario is hiding parked vehicles and road signs from the detection systems of autonomous vehicles.

3.1.1. Attacker’s Capabilities and Knowledge

We assume that the adversary has access to a position with a visual line of sight to the target object (the object that the attacker wants to hide) to allow projection of the adversarial patch. The target object is assumed to possess a suitable surface for projecting a patch. In addition, we assume the adversary can position a projector to project a patch on the target object, as well as a camera to capture the scene with the projected patch (see Figure 2).

3.1.2. Extension of Previously Evaluated Threat Model

We note that state-of-the-art methods have adopted a similar threat model, employing projectors to apply adversarial attacks in the physical domain [37,38,39,44]. Our approach extends this threat model by assuming that the adversarial learning process itself can also be conducted entirely in the physical domain using a projector.

3.1.3. Significance

The significance of our threat model is that, unlike previous methods where adversarial learning and attack application were conducted in different domains (e.g., [4,8,9,10,11,13,14,15,16,17,18,19,20,21,22,23,24,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]), we conduct both adversarial learning and attack application in the physical domain. This eliminates the transferability challenges between digital and physical domains, ensuring that successful learning directly results in a successful attack.

3.2. Method—PAPLA

Here, we review the methodology of the Physical-domain Adversarial Patch Learning Augmentation (PAPLA) framework, employed to conduct adversarial patch learning in the PL-PA scenario. Our approach is to adapt existing adversarial methods, where the learning process is carried out in the digital domain, to be carried out E2E in the physical domain. Unlike prior methods, where the patch learning phase occurs before the application phase, our approach performs both phases simultaneously. In each iteration, the PL-PA learning process projects the latest patch onto the target object and, according to the selected adversarial method, updates the patch based on the captured physical conditions before the next projection. This improves the performance of the patch application phase since it is integrated into the learning process.
PAPLA’s adversarial patch learning process is summarized in Algorithm 1 and illustrated in Figure 2; it proceeds as follows.
  • Iterative Learning: In this phase, the framework generates a random digital patch using an existing digital attack method (e.g., DPatch [14] or NAP [4]). The patch is then iteratively optimized according to the chosen attack, using footage from the physical domain:
    (a) Apply (project) the patch onto the target object using a projector.
    (b) Capture the physical scene containing the object and the projected patch with a camera.
    (c) Update the patch pixels iteratively using the chosen attack, incorporating the physical conditions to maximize the adversarial effect.
  • Attack Application: After the patch is fully optimized, it is projected onto the target object in the physical environment to mislead the object detector.
Algorithm 1 PAPLA
Require: Detector D, Projector P, Camera C, target object; initial patch P_0; max iterations T; target threshold τ    ▹ stopping threshold on confidence
1: for t = 0 to T − 1 do
2:     Project P_t onto the target object with P
3:     Capture frame I_t ← Capture(C)
4:     Infer detector confidence c_t ← Confidence(D, I_t)
5:     if c_t ≤ τ then break
6:     end if
7:     Update patch P_{t+1} ← UpdatePatch_attack(P_t, I_t, D)    ▹ e.g., NAP/DPatch
8: end for
9: Apply the final patch P_t by projection to execute the attack
PAPLA wraps existing adversarial attack methods, originally designed for adversarial learning in the digital domain, and transforms them into a framework for physical-domain adversarial learning. This transformation is achieved by projecting the patches onto the target object and iteratively capturing images of the physical scene using a camera. By integrating the physical conditions directly into the learning process, PAPLA ensures that the adversarial attack is optimized for real-world scenarios.
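The following Python sketch mirrors Algorithm 1. The projector, camera, and detector helpers are assumed interfaces (not a specific SDK), and the patch update delegates to whichever wrapped attack (e.g., DPatch or NAP) performs one optimization step.

def papla(project, capture, confidence, update_patch, patch0, max_iters=1000, tau=0.1):
    """
    project(patch): display the patch on the target object via the projector
    capture(): return the current camera frame as an array
    confidence(frame): detector confidence for the target object in the frame
    update_patch(patch, frame): one optimization step of the wrapped digital attack
    """
    patch = patch0
    for _ in range(max_iters):
        project(patch)                       # (a) apply the patch physically
        frame = capture()                    # (b) record the scene with the projected patch
        if confidence(frame) <= tau:         # stop once the detector is evaded
            break
        patch = update_patch(patch, frame)   # (c) optimize against the physical feedback
    project(patch)                           # attack application with the final patch
    return patch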

4. Analysis

Here, we analyze the factors influencing the success of PL-PA. We investigate the impact of environmental conditions and surface characteristics. We aim to identify the key elements that contribute to the effectiveness and robustness of adversarial patches under various physical-domain learning scenarios, as well as the physical constraints under which E2E adversarial learning in the physical domain demonstrates reduced effectiveness. The experiments presented in this section were conducted in a controlled laboratory environment to allow us to control and isolate the examined factors.

4.1. Impact of Environmental Factors on Attack Success

Here, we analyze how various environmental factors affect PAPLA’s success in improving confidence score reduction. Specifically, we perform the Dpatch [14] attack on a car against YOLOv3, with the learning process carried out in the physical domain, and examine the effect of the following factors: projector strength, ambient light (measured in lux), distance, and the angle of the target object in relation to the camera. We test the effect of each factor in combination with the others. In total, we performed 81 different runs to analyze each factor’s impact on the attack’s success.

4.1.1. Experimental Setup

We examine several values for each environmental factor (projector strength, ambient light, distance, and target object angle) to analyze the impact of each factor on PAPLA’s success. Figure 3 provides an overview of the experimental setup. We used three types of projectors: Innova HD-9 with a light output of 1800 ANSI lumens, Philips NeoPix Prime One NPX525 with a light output of 3000 ANSI lumens, and EIKI EK-308U with a light output of 6000 ANSI lumens. Additionally, we tested three ambient light levels: 100 lux, 200 lux, and 400 lux, measured using the Extech HD450 light meter. Furthermore, we analyzed three distances and three different angles of the camera in relation to the object. We tested distances of 0.5, 1.0, and 1.5 m. For angles, we analyzed 0°, 20°, and −20° (340°). We used default parameters from the ART library for the DPatch attack. The camera used was the ZED 2i Stereo Camera. The size of the car (a Range Rover Sport miniature) was 33 × 11 cm, and the size of the patches in this analysis was 4.5 × 4.5 cm.

4.1.2. Results

The results of our experiments, presented in Figure 3, highlight several key insights into the factors affecting the confidence reduction percentage of PAPLA on a car in the physical domain. To understand the effect of each factor, we used box plots (see Figure 4) and ANOVA analysis, which provided a clear visualization and statistical significance of the impact of projector strength, distance, angle, and ambient light on the confidence reduction percentage.
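As a sketch of this analysis (with assumed column names for how the 81 runs are logged), the confidence reduction percentages can be grouped by each factor level and tested with a one-way ANOVA:

import pandas as pd
from scipy.stats import f_oneway

# Assumed log of the 81 runs: one row per run with its factor levels and outcome.
runs = pd.read_csv("papla_factor_runs.csv")  # columns: projector, lux, distance_m, angle_deg, reduction_pct

for factor in ["projector", "lux", "distance_m", "angle_deg"]:
    groups = [g["reduction_pct"].to_numpy() for _, g in runs.groupby(factor)]
    f_stat, p_value = f_oneway(*groups)
    print(f"{factor}: F = {f_stat:.2f}, p = {p_value:.3g}")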
Projector Strength. The projector strength appears to have a moderate impact on PAPLA’s confidence reduction percentage. The ANOVA results show a p-value of approximately 0.0908, indicating a trend approaching significance, but not a strong effect. Figure 4a shows that the EIKI projector has the highest median confidence reduction percentage at approximately 29.60%, followed by the Philips projector at 9.18%, and the Innova projector at 6.74%. This suggests that the EIKI projector, with the highest light output of 6000 ANSI lumens, is the most effective.
Ambient Light. Ambient light appears to have a relatively minor impact on the confidence reduction percentage. The ANOVA results show a p-value of approximately 0.332, indicating a non-significant effect. Figure 4b indicates overlapping distributions for different lux levels, with medians of 12.67% for 100 lux, 10.56% for 200 lux, and 8.86% for 400 lux. This indicates no clear trend of ambient light levels significantly affecting the confidence reduction percentage.
Camera Distance. The distance of the camera from the attacked object has a significant impact on the confidence reduction percentage. The ANOVA results show a very low p-value (approximately 2.35 × 10⁻¹⁵), indicating a highly significant effect. Figure 4c shows that the median confidence reduction percentage increases substantially with distance: 0.65% at 0.5 m, 15.91% at 1 m, and 76.53% at 1.5 m. We note that beyond the tested range (greater than 1.5 m), the object is not detected at all, regardless of whether a patch is projected or not.
Angle. The angle of the camera in relation to the attacked object shows a moderate effect on PAPLA’s confidence reduction percentage. The ANOVA results provide a p-value of approximately 0.112, indicating some effect but not highly significant. Figure 4d shows that the median confidence reduction percentage is highest at 0° (45.49%), followed by −20° (10.51%) and 20° (6.34%). This suggests that a 0° angle tends to yield a higher confidence reduction percentage.

4.2. Effect of Surface Color on Patch Effectiveness

In this part, we analyze the impact of surface color on PAPLA’s effectiveness in reducing object detection confidence.

4.2.1. Experimental Setup

To demonstrate the projection surface color’s effect on PAPLA’s effectiveness, we conducted the NAP [4] attack using PAPLA’s E2E physical domain learning framework. The attack was applied to six identical ceramic cups, with the only difference being their color, as seen in Figure 5. For each color, we measured the percentage difference in the confidence score returned by the Faster R-CNN object detector between the clean cup (without patch projection) and the cup with the adversarial patch projected and learned E2E in the physical domain using PAPLA. For each color, we ran the attack three times and averaged the results.
This analysis was performed on a TITAN X Pascal machine with four CPU cores and 32 GB of RAM. A Stereolabs ZED2i camera was used to capture the scene, while an EIKI EK-308U projector was used to project the patch in the physical domain. The same parameters were applied across all scenarios, following the default settings from the original attack implementation, with a slight modification: each patch learning process was limited to 50 epochs instead of the original 100. Environmental conditions, including camera position, distance, patch size, and ambient lighting, were kept constant throughout all runs. The cup was placed 0.85 m from the camera at a 0° angle, with a 4 × 4 cm patch applied. The ambient light was maintained at 100 lux, measured using an Extech HD450 light meter (Extech Instruments, Woburn, MA, USA).

4.2.2. Results

The results, as illustrated in Figure 5, demonstrate how the color of the projection surface impacts the success of the adversarial patch attack. The lighter surfaces (white, light grey, and yellow) yielded the most significant reductions, achieving a 100% confidence score drop. In contrast, darker surfaces (green, orange, and blue) showed reduced effectiveness, with confidence score decreases of 78.44%, 46.22%, and 44.76%, respectively. This suggests that lighter colors allow the adversarial patch to be projected more clearly, enhancing its effectiveness. While PAPLA reduces detection confidence across all tested surfaces, the results highlight a variation in performance according to the color of the surface.

5. Evaluations

This section reviews the evaluations conducted for our proposed framework, PAPLA. All evaluations (except Section 5.5) were performed in a controlled laboratory environment to maintain consistency and reproducibility. The final evaluation, in Section 5.5, was conducted outdoors to assess the framework’s performance in realistic conditions. Specifically, we evaluate the DPatch [14] and NAP [4] attacks after applying PAPLA. We evaluate the effectiveness of these attacks in three scenarios. In the digital learning-digital application (DL-DA) scenario, both patch learning and attack execution occur in the digital domain. In the digital learning–physical application (DL-PA) scenario, the patch is learned in the digital domain and applied as a printed sticker in the physical domain. Finally, in the physical learning–physical application (PL-PA) scenario, which is our proposed framework, both the patch learning and attack execution are performed E2E in the physical domain. We use the DL-PA scenario as a baseline to compare with PAPLA’s physical domain performance. We also evaluate patch transferability across different object detectors.

5.1. Robustness Against Various Target Objects

We evaluate PAPLA’s robustness in improving DL-PA adversarial attack performance on various target objects.

5.1.1. Experimental Setup

We evaluate PAPLA on four target objects: a stop sign, a car, a potted plant, and a cup. All evaluations were performed on a TITAN X Pascal machine with four CPU cores and 32 GB RAM. For the physical domain experiments, we used a two-lens camera (Stereolabs ZED2i) to capture the scene and an EIKI EK-308U projector to project the patch. The digital patches were printed as stickers on Chromo 300 gsm paper using a Xerox Versant 280 printer. We used identical parameters in each scenario, specifically the default settings defined in the original attack implementations. We also ensured consistency in the environmental conditions, including ambient lighting, which was maintained at 100 lux and measured using an Extech HD450 light meter. Camera position, distance, and patch size were consistent across both digital and physical domains. The car and stop sign were captured from a distance of 1.5 m, the potted plant was captured from 0.6 m, and the cup was captured from 0.5 m; these distances were chosen to ensure a high confidence score from the object detector (around 1.0) for the clean objects. All objects were captured from a 0° angle relative to the camera. The sizes of the objects are as follows: the car (a Range Rover Sport miniature) was 33 × 11 cm, the stop sign was 30 × 30 cm, the potted plant was 34 × 20 cm, and the cup was 10 × 11 cm. The patch sizes were 4.5 × 4.5 cm for the car, 10 × 10 cm for the stop sign, 4 × 4 cm for the potted plant, and 4 × 4 cm for the cup. The recorded scene in the physical domain was identical to that in the digital domain. Each experiment was conducted 10 times, and the reported confidence scores represent the average across these runs for each scenario. PAPLA was evaluated in both stereoscopic and monocular scenarios. In the monocular scenario, we evaluate the object detector’s confidence score for each target object observed by a single lens. In the stereoscopic scenario, we use the highest confidence score returned from the two lenses of the camera, allowing us to determine whether the object was evaded in a scene that involves communication between the stereoscopic camera and the object detector.
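The stereoscopic rule can be summarized by the small helper below; the detect function is an assumed interface returning the detector’s confidence for the target class in a single frame. The object counts as evaded only if it is missed in both lens views.

def stereoscopic_confidence(detect, left_frame, right_frame, target_class):
    """Return the higher of the two per-lens confidences for the target class."""
    return max(detect(left_frame, target_class), detect(right_frame, target_class))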

5.1.2. Results

The results are presented in Figure 6. For all evaluated target objects (car, stop sign, potted plant, cup), the PL-PA scenario shows a significant reduction in confidence scores, highlighting the effectiveness and potential of E2E learning in the physical domain.
For the DPatch attack on YOLOv3 (Figure 6a,b) in the monocular camera setup, the PL-PA scenario reduced the confidence score of the stop sign to 0, the car to 0.04, the potted plant to 0, and the cup to 0, compared to higher scores for the DL-PA scenario (1.00 for the stop sign, 1.00 for the car, 0.99 for the potted plant, and 0.96 for the cup). Similarly, in the stereoscopic camera setup, the PL-PA scenario reduced confidence scores of the stop sign (0.13), car (0.29), potted plant (0.04), and cup (0), compared to 1.00, 1.00, 0.99, and 1.00, respectively, for the DL-PA scenario.
For the NAP attack on Faster R-CNN (Figure 6c,d) in the monocular camera setup, the PL-PA scenario reduced the confidence scores of the cup to 0, the car to 0.12, and the potted plant to 0.46. The stereoscopic camera setup also produced a confidence reduction; the car’s score was reduced to 0.23, the cup’s to 0.74, and the potted plant’s to 0.57. This contrasts with the significantly higher confidence scores for the DL-PA scenario (0.99 for the car, 1.00 for the cup, and 0.90 for the potted plant). For both camera setups in the PL-PA scenario, the stop sign confidence scores were slightly higher than the other target objects but remained lower than those in the DL-PA scenario.
In conclusion, the PL-PA scenario, where both patch learning and attack execution occur in the physical domain, is more effective at evading object detection compared to the DL-PA scenario, in both monocular and stereoscopic camera setups.

5.2. Robustness Against Various Object Detectors

Here, we evaluate the robustness of PAPLA’s ability to improve the performance of DL-PA adversarial attack methods on various object detectors.

5.2.1. Experimental Setup

We evaluated PAPLA’s performance at improving the NAP [4] attack’s confidence score reduction on three object detectors: Faster R-CNN [47], RetinaNet [50], and SSD [51]. The target object in this evaluation was a potted plant. We evaluated the performance of the NAP [4] attack under four different scenarios: non-adversarial (a clean potted plant), DL-DA, DL-PA, and PL-PA. Each experiment was conducted 10 times, and the reported confidence scores represent the average across these runs for each scenario. We used the same experimental setup as described in Section 5.1. The potted plant had a size of 34 × 20 cm, with a patch size of 4 × 4 cm, and was captured from a distance of 0.6 m at a 0° angle to the camera.

5.2.2. Results

The results of this evaluation, as presented in Figure 7, demonstrate the effectiveness of the NAP attack across different object detectors: Faster R-CNN, RetinaNet, and SSD.
For Faster R-CNN, in the non-adversarial scenario, the confidence score was 0.93, dropping to 0.81 in the DL-DA scenario. In the DL-PA scenario, the confidence score remained relatively high at 0.90. However, in the PL-PA scenario, the confidence score dropped to 0.46, showing the effectiveness of the PL-PA approach.
For RetinaNet, the confidence score was 0.95 in the non-adversarial scenario, 0.80 in the DL-DA scenario, and 0.86 in the DL-PA scenario. In the PL-PA scenario, the confidence score dropped to 0.34, indicating the robustness of PAPLA’s PL-PA confidence reduction.
Lastly, for SSD, the non-adversarial confidence score was 0.72, 0.60 in the DL-DA scenario, and 0.68 in the DL-PA scenario. In the PL-PA scenario, the confidence score dropped to 0.38, a significant reduction compared to the non-adversarial and DL-PA setups.
These results indicate that the PL-PA scenario consistently reduces confidence scores across all evaluated object detectors compared to the DL-PA scenario.
In conclusion, the PL-PA scenario leveraged by PAPLA improves the confidence reduction in attacks normally performed in the DL-PA scenario across various object detectors.

5.3. Evaluating Image Quality Using L2 and L∞

Here, we evaluate the image quality of the recorded scenes in each adversarial patch learning scenario: DL-DA, DL-PA, and PL-PA (PAPLA). We use the L2 and L∞ norms to assess and compare the quality of the results across the three scenarios. We examine the image quality of the attacks conducted in Section 5.1 and Section 5.2, using a clean object image without a patch as a baseline for comparison. The experimental setup remains consistent with the settings used in Section 5.1 and Section 5.2.

Results

Figure 8 presents the image qualities obtained in the three scenarios (DL-DA, DL-PA, and PL-PA) using the L2 and L∞ norms.
According to the L2 norm, the PL-PA scenario (PAPLA) consistently produced the highest norm values, with an average of 26,295.67, indicating that this scenario produced the largest differences compared to the clean object images. This was followed by the DL-DA scenario, with an average L2 norm of 19,060.69, and the DL-PA scenario, which produced the lowest average L2 norm at 6606.05. This suggests that scenes captured in the DL-PA scenario more closely resemble the original scene than those in the other two scenarios. This could be due to the projector-based patch application in the PL-PA scenario, whose emitted light affects the surrounding scene beyond the area of the patch.
When examining the L∞ norm, the DL-DA scenario had the highest average value at 232.69, indicating the greatest single-pixel differences between the images and their clean counterparts. The PL-PA scenario had a slightly lower L∞ norm of 219.31, while DL-PA had the lowest L∞ norm at 49.44. These results further highlight that DL-PA results in the least distortion in terms of both maximum pixel differences and overall image differences.
In conclusion, while PAPLA’s PL-PA demonstrates enhanced robustness in reducing detection confidence scores and evading object detection, the lighting emitted by the projector in the physical domain introduces more significant changes to the image.

5.4. Transferability of Patches Across Object Detectors

Here, we evaluate the transferability of adversarial patches created using PAPLA across different object detectors. Our goal is to examine the effectiveness of patches intended for specific attacks (NAP against Faster R-CNN and DPatch against YOLOv3) when tested on various object detectors, including SSD, RetinaNet, and YOLOv11. We compare the results with those obtained from a DL-DA scenario. This comparison allows us to analyze the effectiveness and transferability of physical patches in real-world settings.

5.4.1. Experimental Setup

We created adversarial patches targeting specific object detectors using (1) PAPLA, our PL-PA framework, and (2) the original DL-DA attacks. The patches were generated for the NAP attack against Faster R-CNN and the DPatch attack against YOLOv3, then tested on various object detectors to assess transferability in both PL-PA and DL-DA settings. We evaluated four target objects: a potted plant, a car, a stop sign, and a cup. For each target object, we performed ten runs to calculate the average clean and patched detection confidence scores. We measured the percentage difference in confidence scores between clean and patched objects for each detector. For further details, see the experimental setup description in Section 5.1.1.
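The reported metric is simply the percentage drop in average confidence between the clean and patched object, as in the small sketch below (the run scores are illustrative placeholders):

def confidence_drop_pct(clean_scores, patched_scores):
    """Percentage difference between average clean and average patched confidence."""
    clean = sum(clean_scores) / len(clean_scores)
    patched = sum(patched_scores) / len(patched_scores)
    return 100.0 * (clean - patched) / clean

# Example: ten runs for one detector/object pair (illustrative values only).
print(confidence_drop_pct([0.95] * 10, [0.40] * 10))  # -> ~57.9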

5.4.2. Results

The results are summarized in Table 4 and Table 5. These tables highlight the effectiveness of the NAP and DPatch attacks when using PAPLA, comparing the percentage difference in detection confidence between clean and patched objects for both scenarios across various object detectors.
For patches generated by the NAP attack targeting Faster R-CNN, the average percentage confidence difference was 39.1% in the DL-DA scenario and 52.4% in the PL-PA scenario, indicating greater transferability success across different object detectors in the PL-PA scenario. The detailed results show varying effectiveness across different object detectors: For the potted plant, the NAP patch reduced the confidence score by 41.6% on Faster R-CNN in the PL-PA scenario, with similar effectiveness on RetinaNet (40.4%) and a complete reduction (100.0%) on YOLOv11. The DL-DA scenario showed lower effectiveness, with the highest reduction at 40.8% on YOLOv11. For the car object, the PL-PA scenario demonstrated a 100.0% reduction on Faster R-CNN and SSD, with moderate reductions on RetinaNet (23.9%) and YOLOv3 (47.1%). In the DL-DA scenario, the patch achieved a 39.2% reduction on SSD and a complete confidence reduction (100.0%) on YOLOv3, indicating a stronger effect on YOLOv3 in DL-DA compared to PL-PA. The stop sign object showed varied results across object detectors. In the PL-PA scenario, the patch achieved a 100.0% reduction on SSD and a moderate reduction on YOLOv3 (32.7%), while producing minimal reduction on RetinaNet (0.1%) and a 14.9% reduction on Faster R-CNN. In the DL-DA scenario, SSD had a reduction of 45.8%, with similar low reductions on RetinaNet (5.2%) and a slightly stronger effect on Faster R-CNN (28.5%) compared to PL-PA. For the cup, the PL-PA patches achieved high confidence reductions on most object detectors, including 100.0% on Faster R-CNN, SSD, and YOLOv11, while DL-DA also achieved complete confidence reduction on SSD, YOLOv11, and RetinaNet.
For the DPatch attack targeting YOLOv3, the results showed varying effectiveness, with average reductions of 19.8% in the DL-DA scenario and 35.6% in the PL-PA scenario. Detailed analysis of each object reveals some notable differences: For the potted plant, the PL-PA patch was more effective on YOLOv3 (71.0%) and YOLOv11 (100.0%), compared to the DL-DA scenario, which had a maximum reduction of 32.9% on YOLOv11. The car object showed high effectiveness in the PL-PA scenario on YOLOv3, with an 84.3% reduction, compared to the DL-DA scenario with a 16.5% reduction. SSD showed low confidence reductions across both scenarios, indicating its robustness against the DPatch attack for this target object. For the stop sign, the PL-PA scenario achieved complete confidence reduction on YOLOv3, while the DL-DA scenario had a minimal impact across all object detectors, with a maximum reduction of 12.1% on YOLOv3. The cup object displayed high reductions in both scenarios, with 84.1% reduction on YOLOv3 in the PL-PA scenario and 57.3% in the DL-DA scenario. YOLOv11 was vulnerable in both scenarios, showing a 100.0% reduction both times.

5.5. Performance in the Real World

Here, we evaluate the performance of PAPLA in real-world outdoor environments. We analyze the learning process and effectiveness of adversarial patches applied to objects under realistic environmental conditions. We perform the NAP [4] attack against the YOLOv4 object detector for evaluation.

5.5.1. Experimental Setup

The experimental setup for the outdoor environment is depicted in Figure 9. A projector, camera, and PC were set up outdoors to simulate realistic environmental conditions for patch projection and detection. Two target objects were evaluated: a parked car and a stop sign.

5.5.2. Results

The results of the experiments in the outdoor environment are presented in Figure 9. The learning process of the adversarial patches, including changes in confidence scores over successive iterations, can also be observed in a demonstration video (https://www.youtube.com/watch?v=AtambR-sJD4 (accessed on 27 November 2025)).
For the parked car, the confidence score in the non-adversarial scenario was 0.95. By the final iteration, the parked car was no longer detected, with a confidence score of 0. Similarly, for the stop sign, the confidence score started at 0.95 in the non-adversarial scenario and decreased to 0.39 in the last iteration.
We observed “noisy learning” in these experiments. In the outdoor environment, rather than decreasing steadily from a high initial value, the confidence score fluctuates across iterations. This phenomenon is further discussed in Section 6.

6. Limitations & Constraints

While PAPLA can ensure the success of an adversarial attack, it has the following limitations:
(1) Noisy Learning in the Physical Domain. Table 3 shows significant pixel value changes over short time intervals, even in a controlled, consistent physical environment, due to external factors. As a result, the learning process in the physical domain introduces noise, as illustrated in Figure 9.
(2) Ineffective Against Moving or Hollow Objects. PAPLA cannot be used with moving objects, as the projector cannot adjust the patch to match target movement in real time. This limitation is also relevant for applications such as evading face recognition systems, where faces are typically moving targets and real-time adaptive projection is infeasible. Similarly, the method does not work with hollow objects (e.g., bikes, scissors), as they lack a suitable surface onto which a patch can be projected. In these cases, printed patches have a clear advantage over PAPLA.
(3) Constraints of the Threat Model. PAPLA requires a line of sight to the target object to project the adversarial patch during the learning process. Compared to printed patch approaches, PAPLA yields higher L2 and L∞ norms, resulting in more visually conspicuous patches that are less naturally integrated into the scene. These factors may attract attention and raise suspicion.

7. Related Work

This section reviews related work on physical-domain model evasion attacks, focusing on Patch-based and Projection-based methods.
Patch-based physical evasion attacks generate a patch in the digital domain, which is then printed and deployed in the physical domain. These attacks pose a significant threat to real-time object detection systems, as they require only the patch’s deployment, not access to the object detector itself [4,8,17,18,19,20,22,24,27,28,29,30,32,33,34,35]. Many methods have been proposed to deploy adversarial patches on shirts [4,17,18,19,22,24,29,30,32,33,34,35] in order to avoid detection by object detectors. Physical adversarial patches have also been applied to items such as hats [41] and glasses [40,52,53], or as makeup [42,54,55], in order to evade detection by facial recognition systems.
Projection-based attacks deploy adversarial perturbations using lasers [56,57], lights [58,59,60,61,62], or projectors [37,38,39,44]. These methods include projecting virtual objects to deceive advanced driver assistance systems (ADASs) [38,44], using colored light to misclassify objects [39], and projecting infrared light to manipulate ADAS perception [62]. The objectives of these methods range from evading detection by facial recognition systems [58,59] to hiding objects from detection by ADASs [37,38,39,44,56,57,61,62].

8. Conclusions & Discussion

This work introduces PAPLA, a novel framework for adversarial learning in the PL-PA setting. The findings of this work are not intended to argue against previous work using traditional digital learning and physical application for adversarial attacks. Rather, this work demonstrates a new approach aimed at ensuring that adversarial attacks, which are traditionally learned in the digital domain, do not degrade when applied in the physical world.
Returning to the initial research question: under what constraints might PAPLA outperform the traditional digital learning approach? PAPLA is preferred when (1) a clear line of sight to the target exists for projecting a patch, (2) the object is static, and (3) the object has a surface suitable for projection (e.g., not a hollow object like a bicycle). Use cases include autonomous vehicles detecting parked cars or road signs. In contrast, if the object is moving, lacks a suitable surface or line of sight, or when the quality of the patch is a priority, traditional digital learning approaches that produce printed patches are preferred.
While traditional printed patch attacks can be limited in their real-world success, our work demonstrates that projector-based adversarial attacks like PAPLA can be highly effective under certain conditions. This highlights a significant real-world implication and a potential security vulnerability. From a safety perspective, these findings underscore the need to develop robust mitigation and detection methods for this type of attack to ensure the safety and reliability of critical systems, such as autonomous vehicles.

Future Work

Future work could explore (1) evaluating PAPLA’s robustness against existing adversarial patch defenses [63,64], and, where necessary, designing a dedicated classifier (e.g., CNN-based) to detect projector-based adversarial patches; (2) systematically analyzing projector specifications (resolution, projection distance, and contrast ratio) to quantify their effect on projection-based learning and application; (3) analyzing the computational efficiency and scalability of PAPLA’s physical iterative learning, including convergence behavior (iterations and wall-clock time) when scaling from single-target indoor setups to larger scenes or multiple concurrent targets, and comparing these costs to traditional DL-DA/DL-PA pipelines under matched conditions; (4) building on Section 5.3, isolating projector-induced changes (e.g., color shifts, non-uniformity, flicker), evaluating mitigations (calibration, hardware), and assessing sensitivity to lighting and time; and (5) applying self-supervised and progressive domain adaptation [65,66] to stabilize PAPLA’s learning under dynamic lighting, sensors, and environments, improving robustness.

Author Contributions

Conceptualization, B.N., Y.E., A.S. and S.K.; methodology, B.N.; software, D.B.; validation, B.N.; formal analysis, D.B.; investigation, D.B.; resources, S.K., A.S. and Y.E.; data curation, D.B.; writing—original draft preparation, D.B.; writing—review and editing, J.S. and B.N.; visualization, D.B.; supervision, A.S., Y.E. and B.N.; project administration, B.N.; funding acquisition, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

The research was funded by a joint research effort conducted by the authors and Fujitsu Limited.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

Satoru Koda is employed by Fujitsu Limited. All other authors declare no conflicts of interest.

Appendix A

Figure A1. Representative adversarial patches used in our experiments. (Top) DPatch under DL-DA/DL-PA scenarios; (bottom) NAP under PL-PA (PAPLA). Additional examples of DPatch and NAP patches appear in the original works [4,14].

References

  1. Ouardirhi, Z.; Mahmoudi, S.A.; Zbakh, M. Enhancing Object Detection in Smart Video Surveillance: A Survey of Occlusion-Handling Approaches. Electronics 2024, 13, 541. [Google Scholar] [CrossRef]
  2. Lubna; Mufti, N.; Shah, S.A.A. Automatic number plate Recognition: A detailed survey of relevant algorithms. Sensors 2021, 21, 3028. [Google Scholar] [CrossRef] [PubMed]
  3. Juyal, A.; Sharma, S.; Matta, P. Deep learning methods for object detection in autonomous vehicles. In Proceedings of the 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 3–5 June 2021; pp. 751–755. [Google Scholar]
  4. Hu, Y.C.T.; Kung, B.H.; Tan, D.S.; Chen, J.C.; Hua, K.L.; Cheng, W.H. Naturalistic physical adversarial patch for object detectors. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 7848–7857. [Google Scholar]
  5. Chen, J.; Jordan, M.I.; Wainwright, M.J. Hopskipjumpattack: A query-efficient decision-based attack. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 18–21 May 2020; pp. 1277–1294. [Google Scholar]
  6. Guo, C.; Gardner, J.; You, Y.; Wilson, A.G.; Weinberger, K. Simple black-box adversarial attacks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 2484–2493. [Google Scholar]
  7. Brendel, W.; Rauber, J.; Bethge, M. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv 2017, arXiv:1712.04248. [Google Scholar]
  8. Athalye, A.; Engstrom, L.; Ilyas, A.; Kwok, K. Synthesizing Robust Adversarial Examples. arXiv 2018, arXiv:1707.07397. [Google Scholar] [CrossRef]
  9. Song, D.; Eykholt, K.; Evtimov, I.; Fernandes, E.; Li, B.; Rahmati, A.; Tramer, F.; Prakash, A.; Kohno, T. Physical adversarial examples for object detectors. In Proceedings of the 12th USENIX Workshop on Offensive Technologies (WOOT 18), Baltimore, MD, USA, 13–14 August 2018. [Google Scholar]
  10. Lee, M.; Kolter, Z. On physical adversarial patches for object detection. arXiv 2019, arXiv:1906.11897. [Google Scholar] [CrossRef]
  11. Eykholt, K.; Evtimov, I.; Fernandes, E.; Li, B.; Rahmati, A.; Xiao, C.; Prakash, A.; Kohno, T.; Song, D. Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1625–1634. [Google Scholar]
  12. Katzav, R.; Giloni, A.; Grolman, E.; Saito, H.; Shibata, T.; Omino, T.; Komatsu, M.; Hanatani, Y.; Elovici, Y.; Shabtai, A. Adversarialeak: External information leakage attack using adversarial samples on face recognition systems. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 288–303. [Google Scholar]
  13. Chen, S.T.; Cornelius, C.; Martin, J.; Chau, D.H. Shapeshifter: Robust physical adversarial attack on faster R-CNN object detector. In Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018, Dublin, Ireland, 10–14 September 2018; pp. 52–68. [Google Scholar]
  14. Liu, X.; Yang, H.; Liu, Z.; Song, L.; Li, H.; Chen, Y. Dpatch: An adversarial patch attack on object detectors. arXiv 2018, arXiv:1806.02299. [Google Scholar]
  15. Zhang, Y.; Foroosh, H.; David, P.; Gong, B. CAMOU: Learning physical vehicle camouflages to adversarially attack detectors in the wild. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  16. Thys, S.; Van Ranst, W.; Goedemé, T. Fooling automated surveillance cameras: Adversarial patches to attack person detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019. [Google Scholar]
  17. Xu, K.; Zhang, G.; Liu, S.; Fan, Q.; Sun, M.; Chen, H.; Chen, P.Y.; Wang, Y.; Lin, X. Adversarial t-shirt! evading person detectors in a physical world. In Computer Vision—ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part V 16; Springer: Cham, Switzerland, 2020; pp. 665–681. [Google Scholar]
  18. Huang, L.; Gao, C.; Zhou, Y.; Xie, C.; Yuille, A.L.; Zou, C.; Liu, N. Universal physical camouflage attacks on object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 720–729. [Google Scholar]
  19. Wu, Z.; Lim, S.N.; Davis, L.S.; Goldstein, T. Making an invisibility cloak: Real world adversarial attacks on object detectors. In Computer Vision—ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part IV 16; Springer: Cham, Switzerland, 2020; pp. 1–17. [Google Scholar]
  20. Zolfi, A.; Kravchik, M.; Elovici, Y.; Shabtai, A. The translucent patch: A physical and universal attack on object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15232–15241. [Google Scholar]
  21. Jing, P.; Tang, Q.; Du, Y.; Xue, L.; Luo, X.; Wang, T.; Nie, S.; Wu, S. Too good to be safe: Tricking lane detection in autonomous driving with crafted perturbations. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Virtual Event, 11–13 August 2021; pp. 3237–3254. [Google Scholar]
  22. Tan, J.; Ji, N.; Xie, H.; Xiang, X. Legitimate adversarial patches: Evading human eyes and detection models in the physical world. In Proceedings of the 29th ACM International Conference on Multimedia, Chengdu, China, 20–24 October 2021; pp. 5307–5315. [Google Scholar]
  23. Suryanto, N.; Kim, Y.; Kang, H.; Larasati, H.T.; Yun, Y.; Le, T.T.H.; Yang, H.; Oh, S.Y.; Kim, H. DTA: Physical camouflage attacks using differentiable transformation network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 15305–15314. [Google Scholar]
  24. Hu, Z.; Huang, S.; Zhu, X.; Sun, F.; Zhang, B.; Hu, X. Adversarial texture for fooling person detectors in the physical world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13307–13316. [Google Scholar]
  25. Biton, D.; Misra, A.; Levy, E.; Kotak, J.; Bitton, R.; Schuster, R.; Papernot, N.; Elovici, Y.; Nassi, B. The Adversarial Implications of Variable-Time Inference. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, Copenhagen, Denmark, 30 November 2023; pp. 103–114. [Google Scholar]
  26. Jia, W.; Lu, Z.; Zhang, H.; Liu, Z.; Wang, J.; Qu, G. Fooling the eyes of autonomous vehicles: Robust physical adversarial examples against traffic sign recognition systems. arXiv 2022, arXiv:2201.06192. [Google Scholar] [CrossRef]
  27. Huang, H.; Chen, Z.; Chen, H.; Wang, Y.; Zhang, K. T-SEA: Transfer-based self-ensemble attack on object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 20514–20523. [Google Scholar]
  28. Zhu, W.; Ji, X.; Cheng, Y.; Zhang, S.; Xu, W. {TPatch}: A Triggered Physical Adversarial Patch. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), Anaheim, CA, USA, 9–11 August 2023; pp. 661–678. [Google Scholar]
  29. Hu, Z.; Chu, W.; Zhu, X.; Zhang, H.; Zhang, B.; Hu, X. Physically realizable natural-looking clothing textures evade person detectors via 3d modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 16975–16984. [Google Scholar]
  30. Guesmi, A.; Ding, R.; Hanif, M.A.; Alouani, I.; Shafique, M. Dap: A dynamic adversarial patch for evading person detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 24595–24604. [Google Scholar]
  31. Wei, H.; Wang, Z.; Zhang, K.; Hou, J.; Liu, Y.; Tang, H.; Wang, Z. Revisiting Adversarial Patches for Designing Camera-Agnostic Attacks against Person Detection. In Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024. [Google Scholar]
  32. Cheng, Z.; Hu, Z.; Liu, Y.; Li, J.; Su, H.; Hu, X. Full-Distance Evasion of Pedestrian Detectors in the Physical World. In Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024. [Google Scholar]
  33. Zhu, X.; Hu, Z.; Huang, S.; Li, J.; Hu, X. Infrared invisible clothing: Hiding from infrared detectors at multiple angles in real world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13317–13326. [Google Scholar]
  34. Wei, H.; Wang, Z.; Jia, X.; Zheng, Y.; Tang, H.; Satoh, S.; Wang, Z. Hotcold block: Fooling thermal infrared detectors with a novel wearable design. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 15233–15241. [Google Scholar]
  35. Wei, X.; Yu, J.; Huang, Y. Infrared adversarial patches with learnable shapes and locations in the physical world. Int. J. Comput. Vis. 2024, 132, 1928–1944. [Google Scholar] [CrossRef]
  36. Zhu, X.; Liu, Y.; Hu, Z.; Li, J.; Hu, X. Infrared Adversarial Car Stickers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 24284–24293. [Google Scholar]
  37. Lovisotto, G.; Turner, H.; Sluganovic, I.; Strohmeier, M.; Martinovic, I. {SLAP}: Improving physical adversarial examples with {Short-Lived} adversarial perturbations. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Virtual Event, 11–13 August 2021; pp. 1865–1882. [Google Scholar]
  38. Wen, H.; Chang, S.; Zhou, L.; Liu, W.; Zhu, H. OptiCloak: Blinding Vision-Based Autonomous Driving Systems Through Adversarial Optical Projection. IEEE Internet Things J. 2024, 11, 28931–28944. [Google Scholar] [CrossRef]
  39. Hu, C.; Shi, W.; Tian, L. Adversarial color projection: A projector-based physical-world attack to DNNs. Image Vis. Comput. 2023, 140, 104861. [Google Scholar] [CrossRef]
  40. Hwang, R.H.; Lin, J.Y.; Hsieh, S.Y.; Lin, H.Y.; Lin, C.L. Adversarial patch attacks on deep-learning-based face recognition systems using generative adversarial networks. Sensors 2023, 23, 853. [Google Scholar] [CrossRef]
  41. Komkov, S.; Petiushko, A. Advhat: Real-world adversarial attack on arcface face id system. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 819–826. [Google Scholar]
  42. Lin, C.S.; Hsu, C.Y.; Chen, P.Y.; Yu, C.M. Real-world adversarial examples via makeup. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; pp. 2854–2858. [Google Scholar]
  43. Wei, X.; Huang, Y.; Sun, Y.; Yu, J. Unified adversarial patch for cross-modal attacks in the physical world. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 4445–4454. [Google Scholar]
  44. Nassi, B.; Mirsky, Y.; Nassi, D.; Ben-Netanel, R.; Drokin, O.; Elovici, Y. Phantom of the adas: Securing advanced driver-assistance systems from split-second phantom attacks. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, 9–13 November 2020; pp. 293–308. [Google Scholar]
  45. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
  46. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
  47. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
  48. Choi, J.I.; Tian, Q. Adversarial attack and defense of yolo detectors in autonomous driving scenarios. In Proceedings of the 2022 IEEE Intelligent Vehicles Symposium (IV), Aachen, Germany, 4–9 June 2022; pp. 1011–1017. [Google Scholar]
  49. Nicolae, M.I.; Sinn, M.; Tran, M.N.; Buesser, B.; Rawat, A.; Wistuba, M.; Zantedeschi, V.; Baracaldo, N.; Chen, B.; Ludwig, H.; et al. Adversarial Robustness Toolbox v1.0.0. arXiv 2018, arXiv:1807.01069. [Google Scholar]
  50. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  51. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Computer Vision—ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14; Springer: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
  52. Pautov, M.; Melnikov, G.; Kaziakhmedov, E.; Kireev, K.; Petiushko, A. On adversarial patches: Real-world attack on arcface-100 face recognition system. In Proceedings of the 2019 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON), Novosibirsk, Russia, 21–27 October 2019; pp. 0391–0396. [Google Scholar]
  53. Sharif, M.; Bhagavatula, S.; Bauer, L.; Reiter, M.K. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 1528–1540. [Google Scholar]
  54. Zhu, Z.A.; Lu, Y.Z.; Chiang, C.K. Generating adversarial examples by makeup attacks on face recognition. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 2516–2520. [Google Scholar]
  55. Yin, B.; Wang, W.; Yao, T.; Guo, J.; Kong, Z.; Ding, S.; Li, J.; Liu, C. Adv-makeup: A new imperceptible and transferable attack on face recognition. arXiv 2021, arXiv:2105.03162. [Google Scholar] [CrossRef]
  56. Sato, T.; Bhupathiraju, S.H.V.; Clifford, M.; Sugawara, T.; Chen, Q.A.; Rampazzi, S. Invisible Reflections: Leveraging Infrared Laser Reflections to Target Traffic Sign Perception. arXiv 2024, arXiv:2401.03582. [Google Scholar] [CrossRef]
  57. Duan, R.; Mao, X.; Qin, A.K.; Chen, Y.; Ye, S.; He, Y.; Yang, Y. Adversarial laser beam: Effective physical-world attack to dnns in a blink. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16062–16071. [Google Scholar]
  58. Zhou, Z.; Tang, D.; Wang, X.; Han, W.; Liu, X.; Zhang, K. Invisible Mask: Practical Attacks on Face Recognition with Infrared. arXiv 2018, arXiv:1803.04683. [Google Scholar] [CrossRef]
  59. Shen, M.; Liao, Z.; Zhu, L.; Xu, K.; Du, X. VLA: A practical visible light-based attack on face recognition systems in physical world. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2019, 3, 103. [Google Scholar] [CrossRef]
  60. Zhu, X.; Li, X.; Li, J.; Wang, Z.; Hu, X. Fooling thermal infrared pedestrian detectors in real world using small bulbs. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event, 2–9 February 2021; Volume 35, pp. 3616–3624. [Google Scholar]
  61. Yufeng, L.; Fengyu, Y.; Qi, L.; Jiangtao, L.; Chenhong, C. Light can be dangerous: Stealthy and effective physical-world adversarial attack by spot light. Comput. Secur. 2023, 132, 103345. [Google Scholar] [CrossRef]
  62. Wang, W.; Yao, Y.; Liu, X.; Li, X.; Hao, P.; Zhu, T. I can see the light: Attacks on autonomous vehicles using invisible lights. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, 15–19 November 2021; pp. 1930–1944. [Google Scholar]
  63. Chou, E.; Tramer, F.; Pellegrino, G. Sentinet: Detecting localized universal attacks against deep learning systems. In Proceedings of the 2020 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 21 May 2020; pp. 48–54. [Google Scholar]
  64. Liu, J.; Levine, A.; Lau, C.P.; Chellappa, R.; Feizi, S. Segment and complete: Defending object detectors against adversarial patch attacks with robust patch detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 14973–14982. [Google Scholar]
  65. Geng, G.; Zhou, S.; Tang, J.; Zhang, X.; Liu, Q.; Yuan, D. Self-Supervised Visual Tracking via Image Synthesis and Domain Adversarial Learning. Sensors 2025, 25, 4621. [Google Scholar] [CrossRef] [PubMed]
  66. Li, Q.; Tan, K.; Yuan, D.; Liu, Q. Progressive Domain Adaptation for Thermal Infrared Tracking. Electronics 2025, 14, 162. [Google Scholar] [CrossRef]
Figure 1. Application of adversarial patches in different learning scenarios. We applied the NAP [4] attack against the Faster R-CNN object detector in four different scenarios: (a) no application of NAP, (b) the adversarial patch was generated and applied to the object in the digital domain, (c) the patch was generated digitally and physically applied to the object as a sticker, and (d) using PAPLA, our E2E framework, the patch was generated and applied in the physical domain, causing the object detector to fail to detect the cup.
Figure 2. PAPLA learning process: an adversary points a projector and a camera at the target object and (1) projects a patch onto the object, as well as (2) captures the scene that contains the object with the projected patch. (3) The patch pixels are updated using PAPLA, and the process repeats.
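As an illustrative sketch only (not the authors' released implementation), the following Python function shows how one iteration of such a projector-in-the-loop optimization could be driven with a black-box, finite-difference gradient estimate. The callables project_patch, capture_frame, and detector_confidence are assumed placeholders for the projector, camera, and object detector interfaces, and the hyperparameter values are arbitrary.

```python
import numpy as np

def papla_iteration(patch, project_patch, capture_frame, detector_confidence,
                    step_size=0.05, num_samples=8, sigma=0.1):
    """One projector-in-the-loop update of a projected adversarial patch.

    The gradient of the detector's confidence with respect to the patch
    pixels is estimated from physical measurements only: each perturbed
    patch is projected, photographed, and scored by the detector.
    """
    grad = np.zeros_like(patch)
    for _ in range(num_samples):
        noise = np.random.randn(*patch.shape) * sigma
        # (1) Project the positively perturbed patch and (2) capture the scene.
        project_patch(np.clip(patch + noise, 0.0, 1.0))
        conf_plus = detector_confidence(capture_frame())
        # Repeat with the negatively perturbed patch (antithetic sample).
        project_patch(np.clip(patch - noise, 0.0, 1.0))
        conf_minus = detector_confidence(capture_frame())
        # Two-sided finite-difference estimate of the confidence gradient.
        grad += (conf_plus - conf_minus) / (2.0 * sigma) * noise
    grad /= num_samples
    # (3) Update the patch pixels: step against the gradient to lower confidence.
    return np.clip(patch - step_size * grad, 0.0, 1.0)
```

The loop in Figure 2 corresponds to calling such a step repeatedly until the target object's confidence falls below the detector's threshold; because every candidate patch is physically projected and photographed before being scored, no digital-to-physical transfer step is involved.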
Figure 3. Confidence reduction percentage for different angles, distances, ambient light levels, and projectors. Each cell shows the percentage difference between the original confidence score (without patch projection) and the confidence score with patch projection learned E2E in the physical domain.
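For clarity, the confidence reduction percentage reported in Figures 3 and 4 (and in the "Conf. Diff." columns of Tables 4 and 5) can be read as the relative drop in the detector's confidence on the target object. The notation below is ours, with c_orig and c_patch denoting the confidence without and with the projected patch; we assume the relative form here, as the exact normalization is not restated in this back matter:

$$\text{Confidence reduction (\%)} = \frac{c_{\mathrm{orig}} - c_{\mathrm{patch}}}{c_{\mathrm{orig}}} \times 100$$

For example, if c_orig = 0.96 and c_patch = 0.48, the reduction is 50%; a value of 100% corresponds to the object no longer being detected at all.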
Figure 4. Box plots illustrating the impact of each environmental factor on the confidence reduction percentage of the DPatch attack performed using PAPLA (E2E in the physical domain): (a) impact of projector strength, (b) impact of ambient light, (c) impact of distance, and (d) impact of angle. The Y-axis represents the percentage difference between the original confidence score (without patch projection) and the confidence score with patch projection using PAPLA.
Figure 5. Impact of surface color on patch projection effectiveness: Each bar corresponds to a cup of a specific color, indicated by the bar’s color. The Y-axis shows the percentage decrease in the object detection model’s confidence score when a patch is projected onto the cup, compared to the confidence score without the patch.
Figure 6. Target object confidence scores for different attack and camera setups. Sub-figures show: (a) DPatch against YOLOv3 using a monocular camera, (b) DPatch against YOLOv3 using a stereoscopic camera, (c) NAP against Faster R-CNN using a monocular camera, and (d) NAP against Faster R-CNN using a stereoscopic camera. The purple bar represents the confidence score of the object without a patch (non-adversarial), the dark blue bar represents the confidence score when the patch was learned and applied in the digital domain (DL-DA), the dark green bar represents the confidence score when the patch was learned in the digital domain and applied in the physical domain (DL-PA), and the orange bar represents the confidence score when the patch was learned and applied in the physical domain (PL-PA).
Figure 7. Performance comparison of NAP on a potted plant object against Faster R-CNN, RetinaNet, and SSD, evaluated in four scenarios: non-adversarial, DL-DA, DL-PA, and PL-PA.
Figure 8. Average L2 and L∞ norm values for each scenario (DL-DA, DL-PA, and PL-PA) on a log scale.
Figure 9. (Left) PAPLA setup visualization. (Right) Confidence scores of parked car and stop sign targets while conducting PAPLA in an outdoor environment.
Table 1. Generated adversarial patches and their effect on the Faster R-CNN and YOLOv3 object detectors in the digital and physical domains. (Per-object patch images in the Target Objects column are omitted; only the average confidence score over the five target objects is shown.)
Attack | Target Detector | Scenario | Avg. Conf. Score
DPatch | YOLOv3 | Non-Adversarial | 0.96
DPatch | YOLOv3 | Digital Learning–Digital Application | 0.55
DPatch | YOLOv3 | Digital Learning–Physical Application | 0.88
Robust DPatch | Faster R-CNN | Non-Adversarial | 0.98
Robust DPatch | Faster R-CNN | Digital Learning–Digital Application | 0.77
Robust DPatch | Faster R-CNN | Digital Learning–Physical Application | 0.93
Table 2. The difference between patches applied digitally and the same patches applied physically as stickers.
Camera Model | Patch | L2 | L∞ | L0 (%)
ZED2i | #1 | 15,770.25 | 221 | 99.42
ZED2i | #2 | 17,249.71 | 213 | 99.50
ZED2i | #3 | 13,635.54 | 207 | 99.39
iPhone 16 | #1 | 11,087.62 | 197 | 99.17
iPhone 16 | #2 | 11,329.53 | 204 | 99.06
iPhone 16 | #3 | 12,847.78 | 236 | 99.30
YI Dash Camera | #1 | 15,779.49 | 231 | 99.39
YI Dash Camera | #2 | 16,285.12 | 239 | 99.40
YI Dash Camera | #3 | 13,981.11 | 242 | 99.46
Average | | 14,218.46 | 221.11 | 99.34
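Tables 2 and 3 (and Figure 8) report L2, L∞, and L0 (%) differences between pairs of images. A minimal NumPy sketch of how such values can be computed is shown below; the exact alignment and preprocessing applied to the captured frames in the paper is not reproduced here, and the function name is an assumption for illustration.

```python
import numpy as np

def image_difference_norms(img_a: np.ndarray, img_b: np.ndarray):
    """Compute L2, L-infinity, and L0 (%) differences between two images
    of identical shape (e.g., H x W x 3 arrays with values in [0, 255])."""
    diff = img_a.astype(np.float64) - img_b.astype(np.float64)
    l2 = np.linalg.norm(diff.ravel())                     # Euclidean norm over all pixel values
    linf = np.abs(diff).max()                             # largest single per-channel change
    l0_pct = 100.0 * np.count_nonzero(diff) / diff.size   # share of values that changed
    return l2, linf, l0_pct
```

For two aligned frames prev and curr, image_difference_norms(prev, curr) returns the three values reported per row in Tables 2 and 3.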
Table 3. Comparison of pixel differences between consecutive images captured at 30-second intervals under consistent and controlled conditions. (Previous/current image thumbnails are omitted.)
Camera Model | L2 | L∞ | L0 (%)
ZED2i | 10,371.30 | 45 | 89.69
ZED2i | 9981.69 | 85 | 89.59
ZED2i | 9844.01 | 87 | 89.22
iPhone 16 | 4595.37 | 33 | 79.23
iPhone 16 | 4075.02 | 35 | 68.28
iPhone 16 | 5970.31 | 78 | 73.33
YI Dash Camera | 10,340.12 | 125 | 71.85
YI Dash Camera | 9762.41 | 96 | 71.78
YI Dash Camera | 9439.34 | 88 | 71.35
Average | 8264.40 | 74.67 | 78.26
Table 4. Transferability of Digital Learning–Digital Application and Physical Learning–Physical Application patches for NAP targeting Faster R-CNN. “–” indicates failure to detect the clean object.
Target Object | Tested Detector | Conf. Diff. (%) DL-DA | Conf. Diff. (%) PL-PA
Potted Plant | Faster R-CNN | 18.0% | 41.6%
Potted Plant | SSD | 13.4% | 3.0%
Potted Plant | RetinaNet | 16.1% | 40.4%
Potted Plant | YOLOv3 | 0% | 0%
Potted Plant | YOLOv11 | 40.8% | 100.0%
Car | Faster R-CNN | 15.4% | 100.0%
Car | SSD | 39.2% | 100.0%
Car | RetinaNet | 0% | 23.9%
Car | YOLOv3 | 100.0% | 47.1%
Car | YOLOv11 | – | –
Stop Sign | Faster R-CNN | 28.5% | 14.9%
Stop Sign | SSD | 45.8% | 100.0%
Stop Sign | RetinaNet | 5.2% | 0.1%
Stop Sign | YOLOv3 | 0% | 32.7%
Stop Sign | YOLOv11 | 38.3% | 33.3%
Cup | Faster R-CNN | 59.6% | 100.0%
Cup | SSD | 100.0% | 100.0%
Cup | RetinaNet | 100.0% | 44.9%
Cup | YOLOv3 | 22.9% | 13.8%
Cup | YOLOv11 | 100.0% | 100.0%
Average Percentage Confidence Difference | | 39.1% | 52.4%
Table 5. Transferability of Digital Learning–Digital Application and Physical Learning–Physical Application patches for DPatch targeting YOLOv3. “–” indicates failure to detect the clean object.
Target Object | Tested Detector | Conf. Diff. (%) DL-DA | Conf. Diff. (%) PL-PA
Potted Plant | YOLOv3 | 21.8% | 71.0%
Potted Plant | SSD | 0% | 0%
Potted Plant | RetinaNet | 6.3% | 19.3%
Potted Plant | Faster R-CNN | 7.2% | 14.2%
Potted Plant | YOLOv11 | 32.9% | 100.0%
Car | YOLOv3 | 16.5% | 84.3%
Car | SSD | 0% | 4.3%
Car | RetinaNet | 12.3% | 17.2%
Car | Faster R-CNN | 1.3% | 0.1%
Car | YOLOv11 | – | –
Stop Sign | YOLOv3 | 12.1% | 100.0%
Stop Sign | SSD | 0% | 0%
Stop Sign | RetinaNet | 0% | 0%
Stop Sign | Faster R-CNN | 0% | 0%
Stop Sign | YOLOv11 | 4.5% | 31.4%
Cup | YOLOv3 | 57.3% | 84.1%
Cup | SSD | 100.0% | 21.5%
Cup | RetinaNet | 4.0% | 24.9%
Cup | Faster R-CNN | 0% | 3.9%
Cup | YOLOv11 | 100.0% | 100.0%
Average Percentage Confidence Difference | | 19.8% | 35.6%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
