Universal Low-Frequency Noise Black-Box Attack on Visual Object Tracking
Abstract
1. Introduction
- We present the Universal Low-frequency Noise (ULN) black-box attack, which generates perturbations solely through an iterative search-and-query process. In each iteration, we pregenerate an orthogonal vector space and sample perturbations from the low-frequency discrete cosine transform (DCT) space. In addition, a sampling–reconstruction strategy substantially reduces the query cost of perturbation search.
- The proposed attack eliminates the need for pretraining a generator or utilizing subsequent video frames, and also significantly reduces the time required for each iteration, ensuring low hardware requirements and enabling the use of parallel computing for enhanced efficiency.
- We conducted extensive attack experiments against trackers in four prominent domains, demonstrating the versatility and effectiveness of the proposed attack algorithm.
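The low-frequency DCT sampling described above can be sketched as follows. This is a minimal illustration, not the paper's exact routine: the function name, the block size `k`, and the L∞ budget `eps` are assumptions. Because the orthonormal DCT basis restricted to the top-left `k × k` block spans an orthogonal subspace, sampling there yields smooth, low-frequency pixel-space noise.

```python
import numpy as np
from scipy.fft import idctn


def sample_low_freq_perturbation(h, w, k, eps, rng):
    """Draw a random perturbation whose spectrum occupies only the
    k x k lowest-frequency DCT coefficients, then scale it to an
    L_inf budget of eps in pixel space."""
    coeffs = np.zeros((h, w))
    coeffs[:k, :k] = rng.standard_normal((k, k))  # low-frequency block only
    delta = idctn(coeffs, norm="ortho")           # inverse 2-D DCT -> pixel space
    return delta * (eps / np.abs(delta).max())    # rescale to the budget


rng = np.random.default_rng(0)
delta = sample_low_freq_perturbation(224, 224, 16, 8 / 255, rng)
```

Because only the low-frequency block is populated, the resulting `delta` varies slowly across the image, which is what makes such perturbations hard to remove with simple denoising.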
2. Background
2.1. Visual Object Tracking
2.2. Adversarial Attacks
2.3. Low-Frequency Perturbation
3. Method
3.1. Motivation
3.2. ULN Attack
3.2.1. Perturbation Generation
Algorithm 1. Universal Low-frequency Noise Black-box Attack.
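The body of Algorithm 1 is not reproduced above; the following is a hypothetical sketch of a greedy query-based search over the orthonormal low-frequency DCT basis, in the spirit of SimBA-style black-box attacks. All names (`uln_search`, `score_fn`, `k`, `eps`) are illustrative assumptions: `score_fn` stands in for one tracker query returning a confidence for the target box, where a lower score means a more successful attack.

```python
import numpy as np
from scipy.fft import idctn


def uln_search(frame, score_fn, k=16, eps=8 / 255, max_queries=100, seed=0):
    """Greedy coordinate search over the k*k low-frequency DCT basis.

    frame: HxWx3 image in [0, 1]; score_fn(img) -> float tracker score.
    Returns a per-pixel perturbation clipped to the L_inf budget eps.
    """
    rng = np.random.default_rng(seed)
    h, w = frame.shape[:2]
    delta = np.zeros((h, w))
    best = score_fn(np.clip(frame + delta[..., None], 0, 1))
    # Visit low-frequency DCT basis directions in random order.
    idx = [(i, j) for i in range(k) for j in range(k)]
    rng.shuffle(idx)
    queries = 0
    for (i, j) in idx:
        if queries >= max_queries:
            break
        basis = np.zeros((h, w))
        basis[i, j] = 1.0
        # Orthonormal pixel-space direction for this DCT coefficient.
        direction = idctn(basis, norm="ortho") * eps
        for sign in (+1.0, -1.0):                 # try both signs per direction
            cand = np.clip(delta + sign * direction, -eps, eps)
            s = score_fn(np.clip(frame + cand[..., None], 0, 1))
            queries += 1
            if s < best:                          # keep the step if it helps
                best, delta = s, cand
                break
    return delta
```

Each accepted step moves along one orthogonal direction, so earlier progress is never undone, and the query count per iteration stays at most two.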
3.2.2. Optimization
4. Experiments
4.1. Testing Dataset
4.2. Tracking Challenge
4.2.1. VOT Challenge
4.2.2. OPE Challenge
4.3. Defense Scenario Experiment
4.4. Efficiency Experiment
4.5. Practicality Verification
4.6. Ablation Study
4.7. Comparisons with Other Methods
5. Discussion
5.1. Similar Systems
5.2. Limitations
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhang, Y.; Wang, Q. Pedestrian tracking through coordinated mining of multiple moving cameras. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 252–261. [Google Scholar]
- Menezes, R.; de Miranda, A.; Maia, H. Pymicetracking: An open-source toolbox for real-time behavioral neuroscience experiments. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 21459–21465. [Google Scholar]
- Cao, Z.; Huang, Z.; Pan, L.; Zhang, S.; Liu, Z.; Fu, C. Tctrack: Temporal contexts for aerial tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 14778–14788. [Google Scholar]
- Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.J.; Fergus, R. Intriguing properties of neural networks. arXiv 2013, arXiv:1312.6199. [Google Scholar]
- Croce, F.; Hein, M. Minimally distorted adversarial examples with a fast adaptive boundary attack. In Proceedings of the International Conference on Machine Learning (ICML), Virtual Event, 13–18 July 2020; pp. 2196–2205. [Google Scholar]
- Guo, Q.; Cheng, Z.; Juefei-Xu, F.; Ma, L.; Xie, X.; Liu, Y.; Zhao, J. Learning to adversarially blur visual object tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 10839–10848. [Google Scholar]
- Croce, F.; Hein, M. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In Proceedings of the International Conference on Machine Learning (ICML), Virtual Event, 13–18 July 2020; pp. 2206–2216. [Google Scholar]
- Li, Z.; Shi, Y.; Gao, J.; Wang, S.; Li, B.; Liang, P.; Hu, W. A simple and strong baseline for universal targeted attacks on siamese visual tracking. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 3880–3894. [Google Scholar] [CrossRef]
- Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. Siamrpn++: Evolution of siamese visual tracking with very deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–19 June 2019; pp. 4282–4291. [Google Scholar]
- Bhat, G.; Danelljan, M.; Gool, L.V.; Timofte, R. Learning discriminative model prediction for tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6182–6191. [Google Scholar]
- Zhao, M.; Okada, K.; Inaba, M. Trtr: Visual tracking with transformer. arXiv 2021, arXiv:2105.03817. [Google Scholar]
- Gao, S.; Zhou, C.; Zhang, J. Generalized relation modeling for transformer tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 18686–18695. [Google Scholar]
- Fan, H.; Lin, L.; Yang, F.; Chu, P.; Deng, G.; Yu, S.; Bai, H.; Xu, Y.; Liao, C.; Ling, H. Lasot: A high-quality benchmark for large-scale single object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–19 June 2019; pp. 5374–5383. [Google Scholar]
- Kugarajeevan, J.; Kokul, T.; Ramanan, A.; Fernando, S. Transformers in single object tracking: An experimental survey. IEEE Access 2023, 11, 80297–80326. [Google Scholar] [CrossRef]
- Yan, X.; Chen, X.; Jiang, Y.; Xia, S.-T.; Zhao, Y.; Zheng, F. Hijacking tracker: A powerful adversarial attack on visual tracking. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 2897–2901. [Google Scholar]
- Yan, B.; Wang, D.; Lu, H.; Yang, X. Cooling-shrinking attack: Blinding the tracker with imperceptible noises. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 990–999. [Google Scholar]
- Zhou, Z.; Sun, Y.; Sun, Q.; Li, C.; Ren, Z. Only once attack: Fooling the tracker with adversarial template. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 3173–3184. [Google Scholar] [CrossRef]
- Liu, S.; Chen, Z.; Li, W.; Zhu, J.; Wang, J.; Zhang, W.; Gan, Z. Efficient universal shuffle attack for visual object tracking. In Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; pp. 2739–2743. [Google Scholar]
- Guo, Q.; Xie, X.; Juefei-Xu, F.; Ma, L.; Li, Z.; Xue, W.; Feng, W.; Liu, Y. Spark: Spatial-aware online incremental attack against visual tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 202–219. [Google Scholar]
- Chen, X.; Yan, X.; Zheng, F.; Jiang, Y.; Xia, S.-T.; Zhao, Y.; Ji, R. One-shot adversarial attacks on visual tracking with dual attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 10176–10185. [Google Scholar]
- Jia, S.; Song, Y.; Ma, C.; Yang, X. Iou attack: Towards temporally coherent black-box adversarial attack for visual object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 6709–6718. [Google Scholar]
- Yin, X.; Ruan, W.; Fieldsend, J. Dimba: Discretely masked black-box attack in single object tracking. Mach. Learn. 2022, 13, 1705–1723. [Google Scholar] [CrossRef]
- Nakka, K.K.; Salzmann, M. Temporally-transferable perturbations: Efficient, one-shot adversarial attacks for online visual object trackers. arXiv 2020, arXiv:2012.15183. [Google Scholar]
- Guo, C.; Gardner, J.; You, Y.; Wilson, A.G.; Weinberger, K. Simple black-box adversarial attacks. In Proceedings of the International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 2484–2493. [Google Scholar]
- Qiao, Y.; Liu, D.; Wang, R.; Liang, K. Low-frequency black-box backdoor attack via evolutionary algorithm. arXiv 2024, arXiv:2402.15653. [Google Scholar] [CrossRef]
- Guo, C.; Frank, J.S.; Weinberger, K.Q. Low frequency adversarial perturbation. In Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI 2019, Tel Aviv, Israel, 22–25 July 2019; Volume 115, pp. 1127–1137. Available online: http://proceedings.mlr.press/v115/guo20a.html (accessed on 6 March 2025).
- Wallace, G.K. The JPEG still picture compression standard. Commun. ACM 1991, 34, 30–44. [Google Scholar] [CrossRef]
- Mishra, D.; Singh, S.K.; Singh, R.K. Deep architectures for image compression: A critical review. Signal Process. 2022, 191, 108346. [Google Scholar] [CrossRef]
- Kristan, M.; Leonardis, A.; Matas, J.; Felsberg, M.; Pflugfelder, R.; Cehovin Zajc, L.; Vojir, T.; Bhat, G.; Lukezic, A.; Eldesokey, A.; et al. The sixth visual object tracking vot2018 challenge results. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018; pp. 3–53. [Google Scholar]
- Kristan, M.; Matas, J.; Leonardis, A.; Felsberg, M.; Pflugfelder, R.; Kamarainen, J.-K.; Zajc, L.C.; Drbohlav, O.; Lukezic, A.; Berg, A.; et al. The seventh visual object tracking vot2019 challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2206–2241. [Google Scholar]
- Roffo, G.; Melzi, S. The visual object tracking vot2016 challenge results. In Computer Vision–ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8–10 and 15–16, 2016, Proceedings, Part II; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 777–823. [Google Scholar]
- Wu, Y.; Lim, J.; Yang, M.-H. Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1834–1848. [Google Scholar] [CrossRef] [PubMed]
- Galoogahi, H.K.; Fagg, A.; Huang, C.; Ramanan, D.; Lucey, S. Need for speed: A benchmark for higher frame rate object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
- Mueller, M.; Smith, N.; Ghanem, B. A benchmark and simulator for UAV tracking. In Computer Vision-ECCV 2016-14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I; Lecture Notes in Computer Science; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Cham, Switzerland, 2016; Volume 9905, pp. 445–461. [Google Scholar] [CrossRef]
- Hou, Y.; Zhao, C.; Yang, D.; Cheng, Y. Comments on “image denoising by sparse 3-d transform-domain collaborative filtering”. IEEE Trans. Image Process. 2011, 20, 268–270. [Google Scholar] [CrossRef] [PubMed]
- Jia, S.; Ma, C.; Song, Y.; Yang, X. Robust tracking against adversarial attacks. In Computer Vision-ECCV 2020-16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIX; Lecture Notes in Computer Science; Vedaldi, A., Bischof, H., Brox, T., Frahm, J., Eds.; Springer: Cham, Switzerland, 2020; Volume 12364, pp. 69–84. [Google Scholar] [CrossRef]
| Tracker | SiamRPN++ | DiMP |  |  |  |  |  |
|---|---|---|---|---|---|---|---|
| Challenge | SSIM | ACC ↓ | FAIL ↑ | EAO ↓ | ACC ↓ | FAIL ↑ | EAO ↓ |
| VOT2016 | 1216 | 701 |  |  |  |  |  |
| VOT2018 | 1247 | 941 |  |  |  |  |  |
| VOT2019 | 1207 | 827 |  |  |  |  |  |
| Tracker | TrTr | GRM |  |  |  |  |  |
| Challenge | ACC ↓ | FAIL ↑ | EAO ↓ | ACC ↓ | FAIL ↑ | EAO ↓ |  |
| VOT2016 | 1313 | 1331 |  |  |  |  |  |
| VOT2018 | 1428 | 1607 |  |  |  |  |  |
| VOT2019 | 1370 | 1333 |  |  |  |  |  |
| Tracker | Metrics | Origin | Ours | DROP |
|---|---|---|---|---|
| SiamRPN++ | AUC ↓ | 0.695 | 0.163 | 0.532 |
|  | Pre ↓ | 0.905 | 0.191 | 0.714 |
|  | SSIM ↑ | 1.000 | 0.862 | 0.138 |
| DiMP | AUC ↓ | 0.671 | 0.257 | 0.414 |
|  | Pre ↓ | 0.869 | 0.360 | 0.509 |
|  | SSIM ↑ | 1.000 | 0.877 | 0.123 |
| TrTr | AUC ↓ | 0.683 | 0.160 | 0.523 |
|  | Pre ↓ | 0.887 | 0.219 | 0.668 |
|  | SSIM ↑ | 1.000 | 0.867 | 0.133 |
| GRM | AUC ↓ | 0.701 | 0.165 | 0.536 |
|  | Pre ↓ | 0.908 | 0.226 | 0.682 |
|  | SSIM ↑ | 1.000 | 0.853 | 0.147 |
| Tracker | Metrics | Origin | Ours | DROP |
|---|---|---|---|---|
| SiamRPN++ | AUC ↓ | 0.509 | 0.134 | 0.375 |
|  | Pre ↓ | 0.601 | 0.119 | 0.482 |
|  | SSIM ↑ | 1.000 | 0.871 | 0.129 |
| DiMP | AUC ↓ | 0.614 | 0.166 | 0.448 |
|  | Pre ↓ | 0.729 | 0.169 | 0.560 |
|  | SSIM ↑ | 1.000 | 0.864 | 0.136 |
| TrTr | AUC ↓ | 0.559 | 0.114 | 0.445 |
|  | Pre ↓ | 0.668 | 0.115 | 0.553 |
|  | SSIM ↑ | 1.000 | 0.863 | 0.137 |
| GRM | AUC ↓ | 0.644 | 0.194 | 0.450 |
|  | Pre ↓ | 0.779 | 0.167 | 0.612 |
|  | SSIM ↑ | 1.000 | 0.869 | 0.131 |
| Tracker | Challenge | Metrics | Origin | Ours | DROP |
|---|---|---|---|---|---|
| SiamRPN++ | UAV123 | AUC ↓ | 0.605 | 0.165 | 0.440 |
|  |  | Pre ↓ | 0.798 | 0.202 | 0.596 |
|  |  | SSIM ↑ | 1.000 | 0.864 | 0.136 |
|  | LaSOT | AUC ↓ | 0.495 | 0.208 | 0.287 |
|  |  | Pre ↓ | 0.491 | 0.092 | 0.399 |
|  |  | SSIM ↑ | 1.000 | 0.842 | 0.158 |
| GRM | UAV123 | AUC ↓ | 0.701 | 0.184 | 0.517 |
|  |  | Pre ↓ | 0.788 | 0.232 | 0.556 |
|  |  | SSIM ↑ | 1.000 | 0.865 | 0.135 |
|  | LaSOT | AUC ↓ | 0.699 | 0.194 | 0.505 |
|  |  | Pre ↓ | 0.816 | 0.076 | 0.740 |
|  |  | SSIM ↑ | 1.000 | 0.839 | 0.161 |
| Tracker | IOU [21] Cost/Frame | ULN (RTG) Cost/Frame | ULN (RTG) EAO | ULN (PreG) Cost/Frame | ULN (PreG) EAO |
|---|---|---|---|---|---|
| SiamRPN++ | 150 ms | 128 ms | 0.020 | 56 ms | 0.077 |
| DiMP | 212 ms | 143 ms | 0.024 | 51 ms | 0.085 |
| TrTr | 611 ms | 156 ms | 0.015 | 62 ms | 0.081 |
| GRM | 649 ms | 134 ms | 0.009 | 53 ms | 0.071 |
Attack mode: ULN (RTG)

| Device | Executing Unit | Query | Success | Precision |
|---|---|---|---|---|
| Raspberry Pi 4B | CPU @ 1.8 GHz | 35.21 | 0.423 | 0.551 |
| Raspberry Pi 4B | GPU (GL 3.1) | 22.54 | 0.405 | 0.533 |
| iPhone 14 Pro | A16 @ 3.46 GHz | 66.25 | 0.288 | 0.375 |
| iPhone 14 Pro | SoC GPU (Metal) | 61.88 | 0.306 | 0.412 |
| Laptop (Ubuntu 20.04) | Intel i7-12700H | 89.77 | 0.216 | 0.265 |
| Laptop (Ubuntu 20.04) | NVIDIA RTX 3060 | 105.21 | 0.212 | 0.259 |

Attack mode: ULN (PreG)

| Device | Executing Unit | Query | Success | Precision |
|---|---|---|---|---|
| Raspberry Pi 4B | CPU @ 1.8 GHz | 58.92 | 0.406 | 0.532 |
| Raspberry Pi 4B | GPU (GL 3.1) | 31.70 | 0.412 | 0.542 |
| iPhone 14 Pro | A16 @ 3.46 GHz | 111.38 | 0.274 | 0.369 |
| iPhone 14 Pro | SoC GPU (Metal) | 95.34 | 0.325 | 0.455 |
| Laptop (Ubuntu 20.04) | Intel i7-12700H | 166.31 | 0.221 | 0.271 |
| Laptop (Ubuntu 20.04) | NVIDIA RTX 3060 | 197.52 | 0.211 | 0.262 |
| Method | Success ↓ | Precision ↓ | Operability |
|---|---|---|---|
| Orig. | 0.696 | 0.905 |  |
| SPARK [19] | 0.629 | 0.878 | On-site deployment |
| IOU [21] | 0.499 | 0.644 | On-site deployment |
| CSA [16] | 0.324 | 0.471 | Pretraining needed |
| EUSA [18] | 0.236 | 0.327 | Pretraining needed |
| Ours | 0.163 | 0.191 | On-site deployment |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Hou, H.; Bao, H.; Wei, K.; Wu, Y. Universal Low-Frequency Noise Black-Box Attack on Visual Object Tracking. Symmetry 2025, 17, 462. https://doi.org/10.3390/sym17030462