Generating Hard-Label Black-Box Adversarial Examples for Video Recognition Models
Abstract
1. Introduction
2. Proposed Framework
2.1. Finding a Boundary Video at the Decision Boundary
2.2. Calculating the Next Movement Direction by Estimating the Gradient
2.3. Moving Along the Estimated Gradient Direction
| Algorithm 1 DBA |
|---|
| Input: The source video, the target video, the target model f, and the adversarial label. |
| Output: The adversarial video. |
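Sections 2.1 to 2.3 name three steps: finding a boundary video, estimating a gradient direction, and moving along it. The following is a minimal sketch of a generic decision-based (hard-label) attack loop with that shape, not the authors' DBA implementation. Everything specific in it is an assumption made for illustration: the toy linear model behind `hard_label`, the helper names, the sample count, the perturbation radius `delta`, the step schedule, and the rule that keeps a proposal only if it is closer to the source. Pixel-range clipping is omitted.

```python
import numpy as np

def hard_label(weights, video):
    """Toy stand-in for the target model f: returns only the top-1 label."""
    return int(np.argmax(weights @ video.ravel()))

def is_adv(weights, video, adv_label):
    return hard_label(weights, video) == adv_label

def boundary_search(weights, source, adv, adv_label, tol=1e-3):
    """Binary search on the segment source -> adv for a point just on the adversarial side."""
    lo, hi = 0.0, 1.0  # blend weight on adv; hi = 1 is assumed adversarial
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_adv(weights, (1 - mid) * source + mid * adv, adv_label):
            hi = mid
        else:
            lo = mid
    return (1 - hi) * source + hi * adv

def estimate_direction(weights, boundary, adv_label, rng, n_samples=200, delta=0.05):
    """Monte Carlo direction estimate that uses label-only queries."""
    u = rng.standard_normal((n_samples,) + boundary.shape)
    norms = np.linalg.norm(u.reshape(n_samples, -1), axis=1)
    u /= norms.reshape((-1,) + (1,) * boundary.ndim)  # unit-norm perturbations
    signs = np.array([1.0 if is_adv(weights, boundary + delta * ui, adv_label) else -1.0
                      for ui in u])
    signs -= signs.mean()  # baseline subtraction to reduce variance
    grad = np.tensordot(signs, u, axes=1) / n_samples
    return grad / (np.linalg.norm(grad) + 1e-12)

def attack(weights, source, target, adv_label, rng, iters=20):
    """Start from the target video and shrink the distance to the source video."""
    best = boundary_search(weights, source, target, adv_label)
    for t in range(1, iters + 1):
        direction = estimate_direction(weights, best, adv_label, rng)
        step = np.linalg.norm(best - source) / np.sqrt(t)
        candidate = best + step * direction
        # Halve the step until the candidate keeps the adversarial label.
        while step > 1e-8 and not is_adv(weights, candidate, adv_label):
            step /= 2
            candidate = best + step * direction
        if is_adv(weights, candidate, adv_label):
            proposal = boundary_search(weights, source, candidate, adv_label)
            # Hypothetical accept rule: keep the proposal only if it is closer.
            if np.linalg.norm(proposal - source) < np.linalg.norm(best - source):
                best = proposal
    return best

# Toy data: 4 frames of 3x3 pixels, 3 classes, one-hot linear scores.
rng = np.random.default_rng(0)
weights = np.eye(3, 36)
source = np.zeros((4, 3, 3)); source.flat[0] = 1.0   # classified as label 0
target = np.zeros((4, 3, 3)); target.flat[1] = 1.0   # classified as label 1
adv_video = attack(weights, source, target, adv_label=1, rng=rng)
print("label:", hard_label(weights, adv_video))
print("distance to source:", np.linalg.norm(adv_video - source))
print("initial distance:  ", np.linalg.norm(target - source))
```

Each call to `is_adv` is one model query, so in this sketch the query budget is dominated by `n_samples` per iteration plus the binary-search steps.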
3. Experiments
3.1. Algorithm Competitors
3.2. Experimental Settings
3.3. Results and Analysis
3.3.1. Magnitude of Perturbations
3.3.2. Success Rate
3.3.3. Ablation Studies
3.4. Failure Cases and Future Directions
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Proof of Lemma 1
Appendix B. Proof of Lemma 2
Appendix C. Proof of Theorem 1












| Serial Number | Symbol | Description |
|---|---|---|
| 1 | | The boundary video generated in the t-th iteration |
| 2 | | The adversarial video generated in the t-th iteration |
| 3 | | The target video |
| 4 | | The probability difference function |
| 5 | | The sign function |
| 6 | u | The sampled perturbation |
| 7 | | A small constant |
| 8 | L | The Lipschitz continuity coefficient of |
| 9 | | The true gradient of |
| 10 | | The estimated gradient of |
| 11 | | The expectation function |
| 12 | d | The number of pixels of videos |
| 13 | | The probability function |
| 14 | | The Beta distribution |
| 15 | | The video between and |
| 16 | | The uniform distribution |
| 17 | | The three subregions of the spherical region |
| 18 | | The orthogonal basis |
| Algorithm | MQN | MSE | PSNR | AOA |
|---|---|---|---|---|
| DBA | 29,910 | 20.8 | 35.0 | 0.08 |
| STDE | 29,954 | 90.1 | 28.6 | 0.33 |
| EARL | - | - | - | - |
| VBAD | - | - | - | - |

| Algorithm | MQN | MSE | PSNR | AOA |
|---|---|---|---|---|
| DBA | 9,870 | 14.6 | 36.5 | 0.15 |
| STDE | 29,875 | 39.2 | 32.2 | 0.40 |
| EARL | - | - | - | - |
| VBAD | 11,750 | 14.6 | 36.5 | 0.15 |
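The MSE and PSNR columns in the tables above are related. As a hedged illustration, the sketch below assumes the standard definition PSNR = 10 log10(peak² / MSE) with an 8-bit peak value of 255. Neither the definition nor the pixel range is stated in the tables, so both are assumptions, and `psnr_from_mse` is a hypothetical helper name.

```python
import math

def psnr_from_mse(mse, peak=255.0):
    """PSNR in dB for a given mean squared error, assuming pixel values in [0, peak]."""
    return 10.0 * math.log10(peak ** 2 / mse)

# (MSE, reported PSNR) pairs copied from the DBA and STDE rows of the two tables.
reported = [(20.8, 35.0), (90.1, 28.6), (14.6, 36.5), (39.2, 32.2)]

for mse, psnr in reported:
    print(f"MSE={mse:5.1f}  computed PSNR={psnr_from_mse(mse):.2f} dB  reported={psnr} dB")
```

Under these assumptions, each computed value rounds to the reported PSNR at one decimal place, which suggests that the two columns carry the same information on different scales.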
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Jing, Y.; Wu, L.; Su, K.; Wu, W.; Li, Z.; Deng, Q. Generating Hard-Label Black-Box Adversarial Examples for Video Recognition Models. Mathematics 2026, 14, 1016. https://doi.org/10.3390/math14061016

