Scaling-Invariant Max-Filtering Enhancement Transformers for Efficient Visual Tracking
Abstract
1. Introduction
- This paper presents scaling-invariant max-filtering, which suppresses background responses in the feature map while enhancing and preserving suspected target regions, improving target localization accuracy;
- We use Pixel-Shuffle to reconstruct lost spatial information, yielding a finer-grained feature map and more compact bounding boxes;
- Extensive experiments verify the effectiveness of the proposed method.
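The first two contributions lend themselves to a small illustration. The sketch below is a hypothetical NumPy rendition, not the paper's actual modules (the function names and the toy response map are ours; AnteaTrack applies these operations to learned transformer features): the max-filter keeps only cells that equal their local maximum, which suppresses background responses and is invariant to any positive rescaling of the map, and Pixel-Shuffle rearranges channels into space to recover a finer-grained grid, as in sub-pixel convolution.

```python
import numpy as np

def local_max_filter(score_map, k=3):
    """Sliding-window maximum (k x k, zero-padded with -inf) over a 2-D float map."""
    H, W = score_map.shape
    pad = k // 2
    padded = np.pad(score_map, pad, mode="constant", constant_values=-np.inf)
    out = np.empty_like(score_map)
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def max_filter_enhance(score_map, k=3):
    """Keep only local maxima (suspected targets); zero out everything else.
    Comparing each cell against its local maximum makes the resulting mask
    invariant to any positive rescaling of the feature responses."""
    m = local_max_filter(score_map, k)
    mask = (score_map == m).astype(score_map.dtype)
    return score_map * mask

def pixel_shuffle(x, r):
    """Rearrange (C*r^2, H, W) -> (C, H*r, W*r): trade channels for resolution."""
    C2, H, W = x.shape
    C = C2 // (r * r)
    x = x.reshape(C, r, r, H, W)
    x = x.transpose(0, 3, 1, 4, 2)  # -> (C, H, r, W, r)
    return x.reshape(C, H * r, W * r)
```

For example, on a 3 × 3 response map whose peak is at the center, `max_filter_enhance` zeroes every cell except the peak, and the surviving-cell mask is identical if the whole map is multiplied by any positive constant, which is the scaling-invariance property.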
2. Related Work
2.1. Siamese-Based Tracker
2.2. Efficient Tracker
2.3. Transformer-Based Tracker
3. Methods
3.1. Standard Transformer
3.2. Scale-Invariant Max-Filtering Enhancement Transformer
3.3. Fine-Grained Feature Representation with Pixel-Shuffle
3.4. AnteaTrack Architecture
4. Experiments and Analysis of Results
4.1. Experimental Requirements and Implementation Details
4.2. Evaluation Datasets and Analysis of Results
4.2.1. GOT-10K Dataset and Analysis of Evaluation Results
4.2.2. OTB100 Dataset and Analysis of Evaluation Results
4.2.3. UAV123 Dataset and Analysis of Evaluation Results
4.2.4. LaSOT Dataset and Analysis of Evaluation Results
4.2.5. NFS Dataset and Analysis of Evaluation Results
4.3. Attribute-Based Performance with UAV123
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-Speed Tracking with Kernelized Correlation Filters. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 583–596.
- Danelljan, M.; Bhat, G.; Shahbaz Khan, F.; Felsberg, M. ECO: Efficient Convolution Operators for Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6638–6646.
- Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H.S. Fully-Convolutional Siamese Networks for Object Tracking. In Proceedings of the Computer Vision—ECCV 2016 Workshops, Amsterdam, The Netherlands, 8–16 October 2016; Lecture Notes in Computer Science. Hua, G., Jégou, H., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 850–865.
- Li, B.; Yan, J.; Wu, W.; Zhu, Z.; Hu, X. High Performance Visual Tracking with Siamese Region Proposal Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8971–8980.
- Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4282–4291.
- Zhang, Z.; Peng, H.; Fu, J.; Li, B.; Hu, W. Ocean: Object-Aware Anchor-Free Tracking. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; Lecture Notes in Computer Science. Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 771–787.
- Yan, B.; Peng, H.; Wu, K.; Wang, D.; Fu, J.; Lu, H. LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15180–15189.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
- Lin, J.; Gan, C.; Han, S. TSM: Temporal Shift Module for Efficient Video Understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7083–7093.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; Lecture Notes in Computer Science. Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229.
- Jia, D.; Yuan, Y.; He, H.; Wu, X.; Yu, H.; Lin, W.; Sun, L.; Zhang, C.; Hu, H. DETRs with Hybrid Matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 19702–19712.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
- Chen, X.; Peng, H.; Wang, D.; Lu, H.; Hu, H. SeqTrack: Sequence to Sequence Learning for Visual Object Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 14572–14581.
- Cui, Y.; Jiang, C.; Wang, L.; Wu, G. MixFormer: End-to-End Tracking with Iterative Mixed Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13608–13618.
- Thangavel, J.; Kokul, T.; Ramanan, A.; Fernando, S. Transformers in Single Object Tracking: An Experimental Survey. arXiv 2023, arXiv:2302.11867.
- Blatter, P.; Kanakis, M.; Danelljan, M.; Van Gool, L. Efficient Visual Tracking with Exemplar Transformers. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–7 January 2023; pp. 1571–1581.
- Mayer, C.; Danelljan, M.; Paudel, D.P.; Van Gool, L. Learning Target Candidate Association to Keep Track of What Not to Track. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 13444–13454.
- Zhang, Q.; Yang, Y.B. ResT V2: Simpler, Faster and Stronger. Adv. Neural Inf. Process. Syst. 2022, 35, 36440–36452.
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883.
- Fan, H.; Lin, L.; Yang, F.; Chu, P.; Deng, G.; Yu, S.; Bai, H.; Xu, Y.; Liao, C.; Ling, H. LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5374–5383.
- Wu, Y.; Lim, J.; Yang, M.H. Object Tracking Benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1834–1848.
- Mueller, M.; Smith, N.; Ghanem, B. A Benchmark and Simulator for UAV Tracking. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Lecture Notes in Computer Science. Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 445–461.
- Kiani Galoogahi, H.; Fagg, A.; Huang, C.; Ramanan, D.; Lucey, S. Need for Speed: A Benchmark for Higher Frame Rate Object Tracking. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1125–1134.
- Huang, L.; Zhao, X.; Huang, K. GOT-10K: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1562–1577.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015; Volume 28.
- Huang, L.; Zhao, X.; Huang, K. GlobalTrack: A Simple and Strong Baseline for Long-Term Tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11037–11044.
- Yan, B.; Zhao, H.; Wang, D.; Lu, H.; Yang, X. ‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-Term Tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2385–2393.
- Xue, Y.; Zhang, J.; Lin, Z.; Li, C.; Huo, B.; Zhang, Y. SiamCAF: Complementary Attention Fusion-Based Siamese Network for RGBT Tracking. Remote Sens. 2023, 15, 3252.
- Zhang, T.; Liu, X.; Zhang, Q.; Han, J. SiamCDA: Complementarity- and Distractor-Aware RGB-T Tracking Based on Siamese Network. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 1403–1417.
- Deng, A.; Han, G.; Chen, D.; Ma, T.; Liu, Z. Slight Aware Enhancement Transformer and Multiple Matching Network for Real-Time UAV Tracking. Remote Sens. 2023, 15, 2857.
- Danelljan, M.; Häger, G.; Khan, F.; Felsberg, M. Accurate Scale Estimation for Robust Visual Tracking. In Proceedings of the British Machine Vision Conference, Nottingham, UK, 1–5 September 2014; BMVA Press: Durham, UK, 2014.
- Yan, B.; Peng, H.; Fu, J.; Wang, D.; Lu, H. Learning Spatio-Temporal Transformer for Visual Tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10448–10457.
- Chen, X.; Yan, B.; Zhu, J.; Wang, D.; Yang, X.; Lu, H. Transformer Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8126–8135.
- Mayer, C.; Danelljan, M.; Bhat, G.; Paul, M.; Paudel, D.P.; Yu, F.; Van Gool, L. Transforming Model Prediction for Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8731–8740.
- Wang, N.; Zhou, W.; Wang, J.; Li, H. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1571–1580.
- Xie, F.; Wang, C.; Wang, G.; Yang, W.; Zeng, W. Learning Tracking Representations via Dual-Branch Fully Transformer Networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2688–2697.
- Lin, L.; Fan, H.; Zhang, Z.; Xu, Y.; Ling, H. SwinTrack: A Simple and Strong Baseline for Transformer Tracking. Adv. Neural Inf. Process. Syst. 2022, 35, 16743–16754.
- Fu, Z.; Fu, Z.; Liu, Q.; Cai, W.; Wang, Y. SparseTT: Visual Tracking with Sparse Transformers. arXiv 2022, arXiv:2205.03776.
- Javed, S.; Danelljan, M.; Khan, F.S.; Khan, M.H.; Felsberg, M.; Matas, J. Visual Object Tracking with Discriminative Filters and Siamese Networks: A Survey and Outlook. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 6552–6574.
- Ye, B.; Chang, H.; Ma, B.; Shan, S.; Chen, X. Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 341–357.
- Wei, X.; Bai, Y.; Zheng, Y.; Shi, D.; Gong, Y. Autoregressive Visual Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 9697–9706.
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. UnitBox: An Advanced Object Detection Network. In Proceedings of the 24th ACM International Conference on Multimedia, Rhodes, Greece, 24–28 October 2016; pp. 516–520.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255.
- Ruder, S. An Overview of Gradient Descent Optimization Algorithms. arXiv 2017, arXiv:1609.04747.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision—ECCV 2014, Zurich, Switzerland, 6–12 September 2014; Lecture Notes in Computer Science. Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 740–755.
- Zhao, H.; Wang, D.; Lu, H. Representation Learning for Visual Object Tracking by Masked Appearance Transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 18696–18705.
- Bhat, G.; Danelljan, M.; Gool, L.V.; Timofte, R. Learning Discriminative Model Prediction for Tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6182–6191.
- Danelljan, M.; Gool, L.V.; Timofte, R. Probabilistic Regression for Visual Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7183–7192.
- Müller, M.; Bibi, A.; Giancola, S.; Alsubaihi, S.; Ghanem, B. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; Volume 11205, pp. 310–327.
Comparison on GOT-10K (columns TransT–OSTrack are non-real-time trackers; ECO–AnteaTrack (Ours) are real-time):

| Metric | TransT [34] | TrDiMP [36] | MATTrack [47] | MixFormer [15] | OSTrack [41] | ECO [2] | LT-Mobile [7] | E.T.Track [17] | AnteaTrack (Ours) |
|---|---|---|---|---|---|---|---|---|---|
| AO | 0.671 | 0.671 | 0.677 | 0.712 | 0.775 | 0.316 | 0.582 | 0.562 | 0.589 |
| SR0.50 | 0.768 | 0.777 | 0.784 | 0.799 | 0.876 | 0.309 | 0.671 | 0.641 | 0.690 |
| SR0.75 | 0.682 | 0.597 | 0.776 | 0.728 | 0.764 | 0.111 | 0.442 | 0.423 | 0.463 |
| FPS | 5 | 6 | 6 | 4 | 3 | 25 | 47 | 47 | 47 |
(a) Normalized Precision (columns TrDiMP–TransT are non-real-time trackers; ECO–AnteaTrack (Ours) are real-time)

| Attribute | TrDiMP [36] | TrSiam [36] | DiMP50 [48] | PrDiMP50 [49] | TransT [34] | ECO [2] | LT-Mobile [7] | E.T.Track [17] | AnteaTrack (Ours) |
|---|---|---|---|---|---|---|---|---|---|
| ARC | 0.812 | 0.807 | 0.775 | 0.819 | 0.816 | 0.570 | 0.732 | 0.724 | 0.744 ↑ |
| BC | 0.567 | 0.644 | 0.607 | 0.688 | 0.567 | 0.534 | 0.543 | 0.537 | 0.547 ↑ |
| CM | 0.821 | 0.849 | 0.827 | 0.862 | 0.859 | 0.650 | 0.788 | 0.768 | 0.788 ↑ |
| FM | 0.783 | 0.770 | 0.774 | 0.800 | 0.824 | 0.577 | 0.758 | 0.749 | 0.758 ↑ |
| FOC | 0.614 | 0.659 | 0.599 | 0.677 | 0.625 | 0.484 | 0.549 | 0.498 | 0.571 ↑ |
| IV | 0.762 | 0.804 | 0.795 | 0.779 | 0.781 | 0.599 | 0.711 | 0.712 | 0.697 ↓ |
| LR | 0.669 | 0.630 | 0.654 | 0.694 | 0.708 | 0.531 | 0.600 | 0.592 | 0.600 ↑ |
| OV | 0.821 | 0.825 | 0.781 | 0.788 | 0.849 | 0.592 | 0.762 | 0.717 | 0.762 ↑ |
| POC | 0.755 | 0.761 | 0.743 | 0.785 | 0.784 | 0.591 | 0.661 | 0.669 | 0.693 ↑ |
| SOB | 0.791 | 0.782 | 0.791 | 0.802 | 0.804 | 0.639 | 0.666 | 0.658 | 0.680 ↑ |
| SV | 0.795 | 0.792 | 0.782 | 0.814 | 0.824 | 0.633 | 0.746 | 0.732 | 0.747 ↑ |
| VC | 0.846 | 0.842 | 0.806 | 0.857 | 0.865 | 0.584 | 0.767 | 0.772 | 0.782 ↑ |
(b) Precision (columns TrDiMP–TransT are non-real-time trackers; ECO–AnteaTrack (Ours) are real-time)

| Attribute | TrDiMP [36] | TrSiam [36] | DiMP50 [48] | PrDiMP50 [49] | TransT [34] | ECO [2] | LT-Mobile [7] | E.T.Track [17] | AnteaTrack (Ours) |
|---|---|---|---|---|---|---|---|---|---|
| ARC | 0.851 | 0.842 | 0.808 | 0.851 | 0.840 | 0.654 | 0.760 | 0.762 | 0.787 ↑ |
| BC | 0.540 | 0.722 | 0.687 | 0.775 | 0.614 | 0.624 | 0.625 | 0.599 | 0.625 ↑ |
| CM | 0.864 | 0.892 | 0.872 | 0.903 | 0.893 | 0.721 | 0.826 | 0.813 | 0.839 ↑ |
| FM | 0.842 | 0.823 | 0.832 | 0.858 | 0.860 | 0.652 | 0.800 | 0.793 | 0.803 ↑ |
| FOC | 0.687 | 0.731 | 0.673 | 0.760 | 0.678 | 0.576 | 0.602 | 0.571 | 0.652 ↑ |
| IV | 0.809 | 0.850 | 0.847 | 0.838 | 0.816 | 0.710 | 0.757 | 0.757 | 0.751 ↓ |
| LR | 0.767 | 0.720 | 0.747 | 0.790 | 0.772 | 0.683 | 0.673 | 0.677 | 0.674 ↓ |
| OV | 0.835 | 0.835 | 0.790 | 0.797 | 0.857 | 0.590 | 0.767 | 0.743 | 0.795 ↑ |
| POC | 0.810 | 0.814 | 0.797 | 0.839 | 0.823 | 0.669 | 0.705 | 0.721 | 0.748 ↑ |
| SOB | 0.848 | 0.833 | 0.800 | 0.848 | 0.850 | 0.747 | 0.716 | 0.725 | 0.763 ↑ |
| SV | 0.845 | 0.839 | 0.830 | 0.862 | 0.860 | 0.707 | 0.785 | 0.778 | 0.796 ↑ |
| VC | 0.878 | 0.870 | 0.828 | 0.883 | 0.892 | 0.680 | 0.789 | 0.795 | 0.813 ↑ |
(c) Success (columns TrDiMP–TransT are non-real-time trackers; ECO–AnteaTrack (Ours) are real-time)

| Attribute | TrDiMP [36] | TrSiam [36] | DiMP50 [48] | PrDiMP50 [49] | TransT [34] | ECO [2] | LT-Mobile [7] | E.T.Track [17] | AnteaTrack (Ours) |
|---|---|---|---|---|---|---|---|---|---|
| ARC | 0.643 | 0.639 | 0.601 | 0.645 | 0.648 | 0.445 | 0.585 | 0.581 | 0.594 ↑ |
| BC | 0.429 | 0.495 | 0.461 | 0.527 | 0.430 | 0.387 | 0.433 | 0.409 | 0.428 ↑ |
| CM | 0.661 | 0.683 | 0.660 | 0.662 | 0.692 | 0.506 | 0.644 | 0.627 | 0.640 ↑ |
| FM | 0.629 | 0.617 | 0.615 | 0.640 | 0.656 | 0.415 | 0.610 | 0.595 | 0.603 ↑ |
| FOC | 0.435 | 0.474 | 0.422 | 0.491 | 0.444 | 0.308 | 0.386 | 0.353 | 0.416 ↑ |
| IV | 0.601 | 0.634 | 0.627 | 0.617 | 0.617 | 0.458 | 0.574 | 0.568 | 0.558 ↓ |
| LR | 0.517 | 0.488 | 0.495 | 0.530 | 0.542 | 0.396 | 0.459 | 0.457 | 0.458 ↑ |
| OV | 0.632 | 0.634 | 0.590 | 0.608 | 0.663 | 0.425 | 0.601 | 0.571 | 0.605 ↑ |
| POC | 0.593 | 0.599 | 0.576 | 0.619 | 0.614 | 0.456 | 0.523 | 0.529 | 0.546 ↑ |
| SOB | 0.634 | 0.627 | 0.594 | 0.641 | 0.638 | 0.518 | 0.537 | 0.534 | 0.550 ↑ |
| SV | 0.645 | 0.643 | 0.625 | 0.656 | 0.667 | 0.496 | 0.608 | 0.599 | 0.605 ↑ |
| VC | 0.689 | 0.683 | 0.641 | 0.688 | 0.708 | 0.473 | 0.633 | 0.635 | 0.641 ↑ |
| Baseline | Cls | Reg | Pixel-Shuffle | UAV123 [23] | GOT-10K [25] | NFS [24] |
|:---:|:---:|:---:|:---:|---|---|---|
| ✔ | | | | 61.9 | 64.1 | 57.8 |
| | ✔ | | | 61.9 | 65.9 | 59.5 |
| | ✔ | ✔ | | 62.0 | 68.1 | 60.4 |
| | ✔ | | ✔ | 61.3 | 67.2 | 59.3 |
| | ✔ | ✔ | ✔ | 62.3 | 69.0 | 62.4 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Chen, Z.; Xiong, X.; Meng, F.; Xiao, X.; Liu, J. Scaling-Invariant Max-Filtering Enhancement Transformers for Efficient Visual Tracking. Electronics 2023, 12, 3905. https://doi.org/10.3390/electronics12183905