SENTINEL: Action-Level Adversarial Defense for Autonomous Vehicles via Counterfactual Policy Verification
Abstract
1. Introduction
2. Literature Review
2.1. Adversarial Attacks Against Autonomous Vehicle Perception
2.2. Perception-Layer Adversarial Defenses
2.3. Foundation Models as External Verification
2.4. Cognition-Layer and Decision-Time Verification
2.5. Runtime Safety Shielding and Graceful Degradation
3. System Architecture and Methodology
3.1. Threat Model and Problem Formalization
3.2. Overall System Architecture
3.3. Foundation Model Verification Ensemble
3.4. Multi-Horizon Temporal Consistency Scorer
3.5. Counterfactual Policy Verifier
3.6. Risk-Adaptive Safety Shield
3.7. Calibration and Integration
4. Experimental Setup
4.1. Datasets and Simulation Platforms
4.2. Adversarial Attack Suite and Adaptive Adversary Threat Model
4.3. Baseline Defenses, Metrics, and Implementation
5. Results and Discussion
5.1. Perception-Layer Robustness
5.2. Action-Layer Safety in Closed-Loop Driving
5.3. Cross-Dataset Generalization, Latency, and Ablation
5.4. Adaptive Adversary Robustness
6. Discussion and Limitations
6.1. Mechanistic Analysis and Implications for Transportation Systems
6.2. Limitations
6.3. Hardware Validation Roadmap
7. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Eykholt, K.; Evtimov, I.; Fernandes, E.; Li, B.; Rahmati, A.; Xiao, C.; Prakash, A.; Kohno, T.; Song, D. Robust Physical-World Attacks on Deep Learning Visual Classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2018; pp. 1625–1634.
2. Thys, S.; Van Ranst, W.; Goedemé, T. Fooling Automated Surveillance Cameras: Adversarial Patches to Attack Person Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); IEEE: New York, NY, USA, 2019; pp. 49–55.
3. Wei, H.; Tang, H.; Jia, X.; Wang, Z.; Yu, H.; Li, Z.; Satoh, S.; Van Gool, L.; Wang, Z. Physical Adversarial Attack Meets Computer Vision: A Decade Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 9797–9817.
4. Wang, T.; Han, C.; Liang, J.; Yang, W.; Liu, D.; Zhang, L.X.; Wang, Q.; Luo, J.; Tang, R. Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, HI, USA; IEEE: New York, NY, USA, 2025; pp. 6948–6958.
5. Wei, X.; Kang, C.; Dong, Y.; Wang, Z.; Ruan, S.; Chen, Y.; Su, H. Real-World Adversarial Defense Against Patch Attacks Based on Diffusion Model. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 11124–11140.
6. Salman, H.; Jain, S.; Wong, E.; Madry, A. Certified Patch Robustness via Smoothed Vision Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA; IEEE: New York, NY, USA, 2022; pp. 15116–15126.
7. Elsken, T.; Staffler, B.; Metzen, J.H.; Hutter, F. Meta-Learning of Neural Architectures for Few-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA; IEEE: New York, NY, USA, 2020; pp. 12365–12375.
8. Carvalho, J.P.M.; Stefenon, S.F.; Leithardt, V.R.Q.; Seman, L.O.; Yow, K.C.; De Paz Santana, J.F. Input Attention, Squeeze and Excitation, and Spatial Transformer of YOLO for Fault Detection Using UAV. Ain Shams Eng. J. 2026, 17, 104067.
9. Liang, J.; Yi, R.; Chen, J.; Nie, Y.; Zhang, H. Securing Autonomous Vehicles’ Visual Perception: Adversarial Patch Attack and Defense Schemes With Experimental Validations. IEEE Trans. Intell. Veh. 2024, 9, 7865–7875.
10. Wang, W.; Qi, L.; Jie, Z. Enhanced Sensor Fusion and Adaptive Control for UAV Fire Control Systems: A Quantitative Evaluation of Graceful Degradation Under Adverse Conditions. Ain Shams Eng. J. 2025, 16, 103613.
11. Qian, H.; Wang, M.; Zhu, M.; Wang, H. A Review of Multi-Sensor Fusion in Autonomous Driving. Sensors 2025, 25, 6033.
12. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models from Natural Language Supervision. In Proceedings of the International Conference on Machine Learning (ICML); PMLR 139; PMLR: Brookline, MA, USA, 2021; pp. 8748–8763.
13. Oquab, M.; Darcet, T.; Moutakanni, T.; Vo, H.; Szafraniec, M.; Khalidov, V.; Fernandez, P.; Haziza, D.; Massa, F.; El-Nouby, A.; et al. DINOv2: Learning Robust Visual Features Without Supervision. arXiv 2024, arXiv:2304.07193.
14. Ravi, N.; Gabeur, V.; Hu, Y.T.; Hu, R.; Ryali, C.; Ma, T.; Khedr, H.; Rädle, R.; Rolland, C.; Gustafson, L.; et al. SAM 2: Segment Anything in Images and Videos. In Proceedings of the International Conference on Learning Representations (ICLR); ICLR: Singapore, 2025; pp. 28085–28128.
15. Liu, X.; Yang, H.; Liu, Z.; Song, L.; Li, H.; Chen, Y. DPatch: An Adversarial Patch Attack on Object Detectors. In Proceedings of the AAAI Workshop on Artificial Intelligence Safety (SafeAI), Honolulu, HI, USA, 27 January 2019.
16. Wang, Z.; Ma, X.; Jiang, Y.G. BadPatch: Diffusion-Based Generation of Physical Adversarial Patches. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Honolulu, HI, USA; IEEE: New York, NY, USA, 2025; pp. 6303–6313.
17. Xi, H.; Ru, L.; Tian, J.; Wang, W.; Zhu, R.; Li, S.; Zhang, Z.; Liu, L.; Luan, X. Towards Robust Physical Adversarial Attacks on UAV Object Detection: A Multi-Dimensional Feature Optimization Approach. Machines 2025, 13, 1060.
18. Xi, H.; Ru, L.; Tian, J.; Lu, B.; Hu, S.; Wang, W.; Luan, X. URAdv: A Novel Framework for Generating Ultra-Robust Adversarial Patches Against UAV Object Detection. Mathematics 2025, 13, 591.
19. Cao, Y. From 2D-Patch to 3D-Camouflage: A Review of Physical Adversarial Attack in Object Detection. Electronics 2025, 14, 4236.
20. Kim, T.H.; Krichen, M.; Alamro, M.A.; Sampedro, G.A. A Novel Dataset and Approach for Adversarial Attack Detection in Connected and Automated Vehicles. Electronics 2024, 13, 2420.
21. Liu, X.; Xu, R. From Vulnerability to Robustness: A Survey of Patch Attacks and Defenses in Computer Vision. Electronics 2025, 14, 4553.
22. Liu, L.; Guo, Y.; Zhang, Y.; Yang, J. Understanding and Defending Patch-Based Adversarial Attacks for Vision Transformer. In Proceedings of the International Conference on Machine Learning (ICML); PMLR 202; PMLR: Brookline, MA, USA, 2023; pp. 21631–21657.
23. Cai, M.; Wang, X.; Sohel, F.; Lei, H. Unsupervised Anomaly Detection for Improving Adversarial Robustness of 3D Object Detection Models. Electronics 2025, 14, 236.
24. Csizmadia, D.; Codreanu, A.; Sim, V.; Prabhu, V.; Lu, M.; Zhu, K.; O’Brien, S.; Sharma, V. Distill CLIP (DCLIP): Enhancing Image-Text Retrieval via Cross-Modal Transformer Distillation. arXiv 2025, arXiv:2505.21549.
25. Faghri, F.; Vasu, P.K.A.; Koc, C.; Shankar, V.; Toshev, A.; Tuzel, O.; Pouransari, H. MobileCLIP2: Improving Multi-Modal Reinforced Training. arXiv 2025, arXiv:2508.20691.
26. Guan, J.; Pan, L.; Wang, C.; Yu, S.; Gao, L.; Zheng, X. Trustworthy Sensor Fusion Against Inaudible Command Attacks in Advanced Driver-Assistance Systems. IEEE Internet Things J. 2023, 10, 17254–17264.
27. Mohammed, A.; Ibrahim, H.M.; Omar, N.M. FDSNet: Dynamic Multimodal Fusion Stage Selection for Autonomous Driving via Feature Disagreement Scoring. Sci. Rep. 2025, 15, 44209.
28. Alsadie, D. Cybersecurity and Artificial Intelligence in Unmanned Aerial Vehicles: Emerging Challenges and Advanced Countermeasures. IET Inf. Secur. 2025, 2025, 2046868.
29. Lopez Pellicer, A.; Angelov, P.; Suri, N. Securing (Vision-Based) Autonomous Systems: Taxonomy, Challenges, and Defense Mechanisms Against Adversarial Threats. Artif. Intell. Rev. 2025, 58, 373.
30. Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An Open Urban Driving Simulator. In Proceedings of the Conference on Robot Learning (CoRL); PMLR 78; PMLR: Brookline, MA, USA, 2017; pp. 1–16.
31. Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A Multimodal Dataset for Autonomous Driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2020; pp. 11621–11631.
32. Geiger, A.; Lenz, P.; Urtasun, R. Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2012; pp. 3354–3361.
33. Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2020; pp. 2636–2645.
34. Athalye, A.; Carlini, N.; Wagner, D. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. In Proceedings of the International Conference on Machine Learning (ICML); PMLR 80; PMLR: Brookline, MA, USA, 2018; pp. 274–283.
35. ISO 26262:2018; Road Vehicles—Functional Safety. International Organization for Standardization: Geneva, Switzerland, 2018.
36. ISO 21448:2022; Road Vehicles—Safety of the Intended Functionality. International Organization for Standardization: Geneva, Switzerland, 2022.






| Method | Plug-and-Play | Cognition Layer | Foundation Model | Temporal Consistency | Graceful Degradation | Closed-Loop AV Eval. |
|---|---|---|---|---|---|---|
| Adversarial Training [7] | × | × | × | × | × | × |
| Smoothed ViT (certified) [6] | × | × | × | × | × | × |
| Jedi | ✓ | × | × | × | × | × |
| DIFFender [5] | × | × | Partial | × | × | × |
| Guan et al. [26] | ✓ | × | × | × | Partial | ✓ |
| FDSNet [27] | × | × | × | × | × | ✓ |
| VLA Defense [4] | × | × | Partial | × | × | ✓ |
| SENTINEL (ours) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

| Group | Parameter | Value |
|---|---|---|
| Backbones (frozen) | Foundation models | CLIP ViT-L/14, DINOv2 ViT-L/14, SAM-2 Hiera-L |
| | Detector/BEV encoder | YOLOv10/BEVFormer |
| | Counterfactual inpainter | Distilled SD v1.5, 4-step LCM, FP16 |
| | Temporal scorer | 4 layers, 4 heads, hidden dim 128 |
| Ensemble weights | (CLIP) | 0.35 |
| | (DINOv2) | 0.40 |
| | (SAM-2) | 0.25 |
| Detection thresholds | (Foundation) | 0.68 |
| | (Temporal) | 0.72 |
| | (Target false alarm rate) | ≈2% on clean data |
| Shield | coefficients | 1.2, 0.8, 1.5 |
| | bias b | |
| | divergence scale | 2.5 |
| | action weights | 1.0, 0.8, 1.2 (lane choice highest) |
| Temporal/attack | Window T | 16 frames |
| | PGD budgets | 16/255, 32/255, 64/255 |
| | PGD steps/restarts | 100/500/2000; 1/1/5 restarts |
| Protocol | Software | Python 3.11, PyTorch 2.4 |
| | Seeds/significance | 5 seeds; Wilcoxon signed-rank, |
| | Calibration objective | Binary cross-entropy on clean + adversarial set |
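
One way to read the ensemble weights and the foundation threshold listed above is as a weighted vote across the three frozen backbones. The sketch below assumes, purely for illustration, that each backbone emits an anomaly score in [0, 1] and that a frame is flagged when the weighted sum exceeds 0.68. The fusion rule, the score direction, the function names `ensemble_score` and `ensemble_flag`, and the example scores are all hypothetical; only the weights and the threshold come from the table.

```python
# Hypothetical sketch: the weighted-sum fusion rule and the score direction are
# assumptions for illustration, not the method described in the paper.
WEIGHTS = {"clip": 0.35, "dinov2": 0.40, "sam2": 0.25}  # ensemble weights from the table
TAU_FOUNDATION = 0.68  # foundation detection threshold from the table


def ensemble_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-backbone anomaly scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def ensemble_flag(scores: dict[str, float]) -> bool:
    """Flag the frame when the fused score exceeds the threshold (assumed direction)."""
    return ensemble_score(scores) > TAU_FOUNDATION


# Made-up per-backbone scores, used only to exercise the functions.
example = {"clip": 0.9, "dinov2": 0.8, "sam2": 0.4}
fused = ensemble_score(example)
print(f"fused score = {fused:.3f}, flagged = {ensemble_flag(example)}")
```

Because the three weights sum to 1, the fused score stays on the same [0, 1] scale as the individual scores, which keeps a single threshold meaningful under this reading.
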

| Component | Precision | Approx. Resident Memory |
|---|---|---|
| CLIP ViT-L/14 | FP16 | ∼1.7 GB |
| DINOv2 ViT-L/14 | FP16 | ∼1.6 GB |
| SAM-2 Hiera-L | FP16 | ∼0.9 GB |
| Distilled SD v1.5 inpainter (4-step LCM) | FP16 | ∼2.0 GB (only when ) |
| Deployed detector + BEV encoder | FP16 | ∼1.5 GB |
| Temporal scorer + buffers + exemplars | FP16 | ∼0.6 GB |
| Total (concurrent worst case) | | ∼8.3 GB (fits within 24 GB) |
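
The worst-case total in the memory table can be checked by summing the per-component figures. The short sketch below does that arithmetic and reports the headroom against the 24 GB budget the table mentions; the dictionary keys and variable names are illustrative labels, not identifiers from the paper.

```python
# Approximate FP16 resident memory per component in GB, copied from the table above.
# The dictionary keys are illustrative labels only.
MEMORY_GB = {
    "clip_vit_l14": 1.7,
    "dinov2_vit_l14": 1.6,
    "sam2_hiera_l": 0.9,
    "sd15_inpainter": 2.0,  # resident only when the counterfactual verifier runs
    "detector_bev_encoder": 1.5,
    "temporal_scorer_buffers": 0.6,
}
GPU_BUDGET_GB = 24.0  # budget quoted in the table's total row

total_gb = sum(MEMORY_GB.values())
headroom_gb = GPU_BUDGET_GB - total_gb
print(f"worst-case total: {total_gb:.1f} GB, headroom: {headroom_gb:.1f} GB")
```
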

| Defense | RP2 Sign | Person Patch | DPatch | BadPatch | Temporal Patch | Clean Acc. (%) |
|---|---|---|---|---|---|---|
| No Defense | 89.7 ± 2.1 | 84.3 ± 2.8 | 91.2 ± 1.9 | 87.6 ± 2.4 | 86.4 ± 2.7 | 94.2 ± 0.4 |
| Adv. Training (PGD) | 31.2 ± 3.4 | 35.8 ± 3.1 | 29.7 ± 2.9 | 42.1 ± 3.6 | 48.3 ± 3.8 | 89.1 ± 0.7 |
| Smoothed ViT | 22.4 ± 2.7 | 27.9 ± 2.5 | 24.6 ± 2.3 | 31.5 ± 2.8 | 38.7 ± 3.1 | 86.8 ± 0.9 |
| Jedi | 18.6 ± 2.3 | 23.1 ± 2.4 | 20.5 ± 2.1 | 28.4 ± 2.7 | 34.2 ± 2.9 | 90.3 ± 0.6 |
| DIFFender | 11.3 ± 1.8 | 14.7 ± 2.0 | 13.2 ± 1.7 | 19.8 ± 2.2 | 22.6 ± 2.4 | 91.8 ± 0.5 |
| SENTINEL | 7.1 ± 1.2 | 8.9 ± 1.4 | 8.3 ± 1.3 | 11.6 ± 1.6 | 9.4 ± 1.5 | 92.4 ± 0.4 |
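
To make the gap between SENTINEL and the strongest baseline in the table above easier to read, the per-attack columns can be turned into relative reductions. The sketch below assumes the five attack columns are attack success rates in percent, where lower is better; that reading of the metric is an assumption, since the column headers name only the attacks. The helper `relative_reduction` is a hypothetical name.

```python
# Per-attack values (%) copied from the table above. Treating them as attack
# success rates (lower is better) is an assumption about the table's metric.
ATTACKS = ["RP2 Sign", "Person Patch", "DPatch", "BadPatch", "Temporal Patch"]
DIFFENDER = [11.3, 14.7, 13.2, 19.8, 22.6]
SENTINEL = [7.1, 8.9, 8.3, 11.6, 9.4]


def relative_reduction(baseline: float, defended: float) -> float:
    """Fraction by which `defended` undercuts `baseline` (0.25 means 25% lower)."""
    return (baseline - defended) / baseline


reductions = {
    attack: relative_reduction(base, ours)
    for attack, base, ours in zip(ATTACKS, DIFFENDER, SENTINEL)
}
mean_sentinel = sum(SENTINEL) / len(SENTINEL)

for attack, value in reductions.items():
    print(f"{attack}: {value:.1%} lower than DIFFender")
print(f"SENTINEL mean over the five attacks: {mean_sentinel:.2f}")
```

Under this reading, the largest relative gain falls on the Temporal Patch column, and the five-attack mean for SENTINEL is close to the 9.1 reported for CARLA in the cross-dataset and ablation tables.
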

| Defense | Collision Rate (%) | Violation Rate (%) | Deviation (m) |
|---|---|---|---|
| No Defense | 38.6 ± 3.2 | 47.1 ± 3.8 | 4.72 ± 0.58 |
| Adv. Training (PGD) | 18.3 ± 2.4 | 26.4 ± 2.9 | 2.31 ± 0.34 |
| Smoothed ViT | 14.7 ± 2.1 | 22.8 ± 2.6 | 1.94 ± 0.29 |
| Jedi | 12.1 ± 1.9 | 19.6 ± 2.3 | 1.72 ± 0.26 |
| DIFFender | 8.4 ± 1.5 | 14.3 ± 2.0 | 1.28 ± 0.21 |
| SENTINEL | 4.9 ± 1.1 | 8.7 ± 1.4 | 0.83 ± 0.16 |

| Defense | CARLA | nuScenes | KITTI | BDD100K |
|---|---|---|---|---|
| No Defense | 88.0 ± 2.3 | 86.7 ± 2.6 | 85.4 ± 2.9 | 87.3 ± 2.7 |
| DIFFender | 16.3 ± 2.0 | 19.7 ± 2.4 | 21.3 ± 2.6 | 18.9 ± 2.3 |
| SENTINEL | 9.1 ± 1.4 | 11.8 ± 1.7 | 12.9 ± 1.8 | 11.4 ± 1.6 |

| Component | Latency (ms) |
|---|---|
| Foundation Model Ensemble | 18.4 ± 1.2 |
| Temporal Consistency Scorer | 4.7 ± 0.6 |
| Counterfactual Verifier (when triggered) | 19.8 ± 2.1 |
| Safety Shield | 0.8 ± 0.1 |
| SENTINEL Total (avg, conditional ) | 42.6 ± 3.4 |
| DIFFender | 87.5 ± 4.8 |
| Smoothed ViT | 64.3 ± 3.9 |
| Jedi | 29.6 ± 2.4 |
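
Because the counterfactual verifier runs only when triggered, the per-frame cost depends on how often it fires. The sketch below bounds the mean latency from the component figures in the table above, assuming the stages run strictly in sequence with no overlap. The trigger rate is not given in this excerpt, so `trigger_rate` is a free, hypothetical parameter, and the sketch does not reproduce how the paper averaged its reported total.

```python
# Mean per-component latencies in ms, copied from the table above.
ENSEMBLE_MS = 18.4
TEMPORAL_MS = 4.7
VERIFIER_MS = 19.8  # paid only on frames where the verifier is triggered
SHIELD_MS = 0.8
REPORTED_TOTAL_MS = 42.6  # total row of the table


def expected_latency_ms(trigger_rate: float) -> float:
    """Mean latency if the verifier runs on a fraction `trigger_rate` of frames.

    Assumes strictly sequential stages with no overlap (an assumption).
    """
    if not 0.0 <= trigger_rate <= 1.0:
        raise ValueError("trigger_rate must be in [0, 1]")
    always_on = ENSEMBLE_MS + TEMPORAL_MS + SHIELD_MS
    return always_on + trigger_rate * VERIFIER_MS


lower_bound = expected_latency_ms(0.0)  # verifier never runs
upper_bound = expected_latency_ms(1.0)  # verifier runs on every frame
frames_per_second = 1000.0 / REPORTED_TOTAL_MS  # throughput at the reported total

print(f"bounds: {lower_bound:.1f} to {upper_bound:.1f} ms")
print(f"reported total corresponds to about {frames_per_second:.1f} frames/s")
```

The reported total lies between the two bounds, which is what a conditionally triggered stage would produce under this sequential model.
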

| Configuration | ASR (%) | Collision (%) |
|---|---|---|
| only | 17.3 ± 2.1 | 10.6 ± 1.7 |
| only | 24.8 ± 2.6 | 14.2 ± 2.0 |
| (no ) | 12.5 ± 1.8 | 7.4 ± 1.4 |
| No Safety Shield (binary) | 10.2 ± 1.6 | 6.8 ± 1.3 |
| Full SENTINEL | 9.1 ± 1.4 | 4.9 ± 1.1 |

| Defense | ASR (Low) | ASR (Med) | ASR (High) | Collision (Low) | Collision (Med) | Collision (High) |
|---|---|---|---|---|---|---|
| No Defense | 88.0 | 88.7 | 89.1 | 38.6 | 39.1 | 39.4 |
| DIFFender | 18.4 | 34.9 | 51.2 | 9.7 | 18.7 | 28.4 |
| SENTINEL | 12.4 | 18.6 | 27.3 | 6.2 | 9.8 | 14.6 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Alserhani, A.F.; Alserhani, F.M. SENTINEL: Action-Level Adversarial Defense for Autonomous Vehicles via Counterfactual Policy Verification. Electronics 2026, 15, 2901. https://doi.org/10.3390/electronics15132901

