Learning Precise Mask Representation for Siamese Visual Tracking
Abstract
1. Introduction
- A simple and efficient precise mask representation (PMR) module is proposed to address the limitations of the bounding-box-based tracking paradigm in estimating target extent; by learning multi-scale target masks as an auxiliary task, it enables pixel-wise segmentation.
- To enhance the discriminative power of the module, a saliency localization (SL) head is designed to capture the spatial saliency of the target and suppress visually similar distractors.
- The proposed PMR module and SL head are generic and easy to integrate into existing Siamese frameworks, allowing such trackers to perform accurate segmentation-based tracking and improve performance without significant additional cost; a schematic sketch of such an integration follows this list.
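To make the integration concrete, the sketch below shows how a multi-scale mask branch and a saliency head could be attached to the correlation feature of a generic Siamese tracker. All module names, channel sizes, and the fusion scheme are illustrative assumptions for exposition only; they are not the paper's PMR module or SL head design.

```python
# Illustrative sketch only: a generic multi-scale mask branch and saliency head
# attached to a Siamese correlation feature map. Channel sizes and structure are
# assumptions, not the PMR/SL architecture from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskBranch(nn.Module):
    """Predicts target mask logits from the cross-correlation feature at two scales."""

    def __init__(self, in_channels: int = 256, out_size: int = 127):
        super().__init__()
        self.out_size = out_size
        self.refine1 = nn.Sequential(
            nn.Conv2d(in_channels, 128, 3, padding=1), nn.ReLU(inplace=True))
        self.refine2 = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.head_coarse = nn.Conv2d(128, 1, 1)  # low-resolution mask logits
        self.head_fine = nn.Conv2d(64, 1, 1)     # high-resolution mask logits

    def forward(self, corr_feat: torch.Tensor):
        x = self.refine1(corr_feat)
        coarse = self.head_coarse(x)
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        x = self.refine2(x)
        fine = self.head_fine(x)
        # Resize both scales to the search-region resolution for pixel-wise supervision.
        coarse = F.interpolate(coarse, size=(self.out_size, self.out_size),
                               mode="bilinear", align_corners=False)
        fine = F.interpolate(fine, size=(self.out_size, self.out_size),
                             mode="bilinear", align_corners=False)
        return coarse, fine


class SaliencyHead(nn.Module):
    """Produces a spatial saliency map used to re-weight the classification response."""

    def __init__(self, in_channels: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1))

    def forward(self, corr_feat: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.conv(corr_feat))  # values in (0, 1)


if __name__ == "__main__":
    corr_feat = torch.randn(1, 256, 25, 25)   # correlation feature of a Siamese tracker
    coarse, fine = MaskBranch()(corr_feat)
    saliency = SaliencyHead()(corr_feat)
    cls_response = torch.rand(1, 1, 25, 25)   # stand-in classification score map
    suppressed = cls_response * saliency      # distractor suppression by re-weighting
    print(coarse.shape, fine.shape, suppressed.shape)
```

Because both heads consume only the shared correlation feature, they can in principle be bolted onto any Siamese tracker that exposes this tensor, which is the sense in which the integration is "generic".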
2. Related Work
2.1. Siamese Visual Tracking
2.2. Visual Object Segmentation
2.3. Segmentation Guidance Tracking
3. Proposed Method
3.1. Overview
3.2. Precise Mask Representation Module
3.3. Saliency Localization Head
Algorithm 1: Saliency feature generation
3.4. Supervised Training Loss
4. Experiments
4.1. Implementation Details
4.2. Quantitative Analysis
4.3. Qualitative Analysis
4.4. Ablation Study
4.5. Failure Examples and Future Work
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Javed, S.; Danelljan, M.; Khan, F.S.; Khan, M.H.; Felsberg, M.; Matas, J. Visual object tracking with discriminative filters and siamese networks: A survey and outlook. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 6552–6574. [Google Scholar] [CrossRef] [PubMed]
- Marvasti-Zadeh, S.M.; Cheng, L.; Ghanei-Yakhdan, H.; Kasaei, S. Deep learning for visual tracking: A comprehensive survey. IEEE Trans. Intell. Transp. Syst. 2021, 23, 3943–3968. [Google Scholar] [CrossRef]
- Tao, R.; Gavves, E.; Smeulders, A.W. Siamese instance search for tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1420–1429. [Google Scholar]
- Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H. Fully-convolutional siamese networks for object tracking. In Proceedings of the Computer Vision–ECCV 2016 Workshops, Amsterdam, The Netherlands, 8–10 and 15–16 October 2016; Proceedings, Part II 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 850–865. [Google Scholar]
- Li, B.; Yan, J.; Wu, W.; Zhu, Z.; Hu, X. High performance visual tracking with siamese region proposal network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8971–8980. [Google Scholar]
- Fan, H.; Ling, H. Siamese cascaded region proposal networks for real-time visual tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7952–7961. [Google Scholar]
- Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. Siamrpn++: Evolution of siamese visual tracking with very deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4282–4291. [Google Scholar]
- Zhang, Z.; Peng, H. Deeper and wider siamese networks for real-time visual tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4591–4600. [Google Scholar]
- Zhu, Z.; Wang, Q.; Li, B.; Wu, W.; Yan, J.; Hu, W. Distractor-aware siamese networks for visual object tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 101–117. [Google Scholar]
- Nie, J.; He, Z.; Yang, Y.; Gao, M.; Dong, Z. Learning localization-aware target confidence for siamese visual tracking. IEEE Trans. Multimed. 2022, 25, 6194–6206. [Google Scholar] [CrossRef]
- Ma, Z.; Wang, L.; Zhang, H.; Lu, W.; Yin, J. Rpt: Learning point set representation for siamese visual tracking. In Proceedings of the Computer Vision–ECCV 2020 Workshops, Glasgow, UK, 23–28 August 2020; Proceedings, Part V 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 653–665. [Google Scholar]
- Chen, Z.; Zhong, B.; Li, G.; Zhang, S.; Ji, R.; Tang, Z.; Li, X. SiamBAN: Target-aware tracking with Siamese box adaptive network. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 5158–5173. [Google Scholar] [CrossRef]
- Han, G.; Su, J.; Liu, Y.; Zhao, Y.; Kwong, S. Multi-stage visual tracking with siamese anchor-free proposal network. IEEE Trans. Multimed. 2021, 25, 430–442. [Google Scholar] [CrossRef]
- Zhang, S.; Zhao, X.; Fang, L. CAT: Corner aided tracking with deep regression network. IEEE Trans. Multimed. 2020, 23, 859–870. [Google Scholar] [CrossRef]
- Huang, X.; Cao, S.; Dong, C.; Song, T.; Xu, Z. Improved Fully Convolutional Siamese Networks for Visual Object Tracking Based on Response Behaviour Analysis. Sensors 2022, 22, 6550. [Google Scholar] [CrossRef]
- Paul, M.; Danelljan, M.; Mayer, C.; Van Gool, L. Robust visual tracking by segmentation. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2022; pp. 571–588. [Google Scholar]
- Voigtlaender, P.; Luiten, J.; Torr, P.H.; Leibe, B. Siam r-cnn: Visual tracking by re-detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6578–6588. [Google Scholar]
- Yan, B.; Zhang, X.; Wang, D.; Lu, H.; Yang, X. Alpha-refine: Boosting tracking performance by precise bounding box estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5289–5298. [Google Scholar]
- Ning, T.; Zhong, B.; Liang, Q.; Tang, Z.; Li, X. Robust Tracking via Bidirectional Transduction With Mask Information. IEEE Trans. Multimed. 2023, 26, 4308–4319. [Google Scholar] [CrossRef]
- Lukežič, A.; Matas, J.; Kristan, M. A discriminative single-shot segmentation network for visual object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 9742–9755. [Google Scholar] [CrossRef]
- Hu, W.; Wang, Q.; Zhang, L.; Bertinetto, L.; Torr, P.H. Siammask: A framework for fast online object tracking and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 3072–3089. [Google Scholar]
- Zhou, T.; Porikli, F.; Crandall, D.J.; Van Gool, L.; Wang, W. A survey on deep learning technique for video segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 7099–7122. [Google Scholar] [CrossRef] [PubMed]
- Hu, Y.T.; Huang, J.B.; Schwing, A.G. Videomatch: Matching based video object segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 54–70. [Google Scholar]
- Chen, Z.; Zhong, B.; Li, G.; Zhang, S.; Ji, R. Siamese box adaptive network for visual tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6668–6677. [Google Scholar]
- Huang, L.; Zhao, X.; Huang, K. Got-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 1562–1577. [Google Scholar] [CrossRef]
- Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 583–596. [Google Scholar] [CrossRef] [PubMed]
- Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. Exploiting the circulant structure of tracking-by-detection with kernels. In Proceedings of the Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Proceedings, Part IV 12. Springer: Berlin/Heidelberg, Germany, 2012; pp. 702–715. [Google Scholar]
- Arthanari, S.; Moorthy, S.; Jeong, J.H.; Joo, Y.H. Adaptive spatially regularized target attribute-aware background suppressed deep correlation filter for object tracking. Signal Process. Image Commun. 2025, 136, 117305. [Google Scholar] [CrossRef]
- Danelljan, M.; Bhat, G.; Shahbaz Khan, F.; Felsberg, M. Eco: Efficient convolution operators for tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6638–6646. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1137–1149. [Google Scholar] [CrossRef]
- Xu, Y.; Wang, Z.; Li, Z.; Yuan, Y.; Yu, G. Siamfc++: Towards robust and accurate visual tracking with target estimation guidelines. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12549–12556. [Google Scholar]
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. arXiv 2019, arXiv:1904.01355. [Google Scholar] [CrossRef]
- Zhang, Z.; Peng, H.; Fu, J.; Li, B.; Hu, W. Ocean: Object-aware anchor-free tracking. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXI 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 771–787. [Google Scholar]
- Yang, K.; He, Z.; Pei, W.; Zhou, Z.; Li, X.; Yuan, D.; Zhang, H. SiamCorners: Siamese corner networks for visual tracking. IEEE Trans. Multimed. 2021, 24, 1956–1967. [Google Scholar] [CrossRef]
- Li, Q.; Qin, Z.; Zhang, W.; Zheng, W. Siamese keypoint prediction network for visual object tracking. arXiv 2020, arXiv:2006.04078. [Google Scholar] [CrossRef]
- Fan, B.; Chen, K.; Jiang, G.; Tian, J. Two-way complementary tracking guidance. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 6200–6212. [Google Scholar] [CrossRef]
- Meng, F.; Gong, X.; Zhang, Y. SiamRank: A siamese based visual tracking network with ranking strategy. Pattern Recognit. 2023, 141, 109630. [Google Scholar] [CrossRef]
- Zhang, J.; Dai, K.; Li, Z.; Wei, R.; Wang, Y. Spatio-temporal matching for siamese visual tracking. Neurocomputing 2023, 522, 73–88. [Google Scholar] [CrossRef]
- Zhou, Z.; Zhou, X.; Chen, Z.; Guo, P.; Liu, Q.Y.; Zhang, W. Memory network with pixel-level spatio-temporal learning for visual object tracking. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 6897–6911. [Google Scholar] [CrossRef]
- Voigtlaender, P.; Leibe, B. Online adaptation of convolutional neural networks for video object segmentation. arXiv 2017, arXiv:1706.09364. [Google Scholar] [CrossRef]
- Xiao, H.; Kang, B.; Liu, Y.; Zhang, M.; Feng, J. Online meta adaptation for fast video object segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 1205–1217. [Google Scholar] [CrossRef] [PubMed]
- Cheng, J.; Tsai, Y.H.; Hung, W.C.; Wang, S.; Yang, M.H. Fast and accurate online video object segmentation via tracking parts. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7415–7424. [Google Scholar]
- Huang, X.; Xu, J.; Tai, Y.W.; Tang, C.K. Fast video object segmentation with temporal aggregation network and dynamic template matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8879–8889. [Google Scholar]
- Xie, H.; Yao, H.; Zhou, S.; Zhang, S.; Sun, W. Efficient regional memory network for video object segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1286–1295. [Google Scholar]
- Wang, H.; Jiang, X.; Ren, H.; Hu, Y.; Bai, S. Swiftnet: Real-time video object segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1296–1305. [Google Scholar]
- Yang, P.; Wang, Q.; Dou, J.; Dou, L. Learning saliency-awareness Siamese network for visual object tracking. J. Vis. Commun. Image Represent. 2024, 103, 104237. [Google Scholar] [CrossRef]
- Qin, X.; Zhang, Z.; Huang, C.; Dehghan, M.; Zaiane, O.R.; Jagersand, M. U2-Net: Going deeper with nested U-structure for salient object detection. Pattern Recognit. 2020, 106, 107404. [Google Scholar] [CrossRef]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Guo, D.; Wang, J.; Cui, Y.; Wang, Z.; Chen, S. SiamCAR: Siamese fully convolutional classification and regression for visual tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6269–6277. [Google Scholar]
- Zhuge, M.; Fan, D.P.; Liu, N.; Zhang, D.; Xu, D.; Shao, L. Salient object detection via integrity learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 3738–3752. [Google Scholar] [CrossRef]
- De Boer, P.T.; Kroese, D.P.; Mannor, S.; Rubinstein, R.Y. A tutorial on the cross-entropy method. Ann. Oper. Res. 2005, 134, 19–67. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755. [Google Scholar]
- Xu, N.; Yang, L.; Fan, Y.; Yang, J.; Yue, D.; Liang, Y.; Price, B.; Cohen, S.; Huang, T. Youtube-vos: Sequence-to-sequence video object segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 585–601. [Google Scholar]
- Zhao, B.; Bhat, G.; Danelljan, M.; Van Gool, L.; Timofte, R. Generating masks from boxes by mining spatio-temporal consistencies in videos. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 13556–13566. [Google Scholar]
- Fan, H.; Lin, L.; Yang, F.; Chu, P.; Deng, G.; Yu, S.; Bai, H.; Xu, Y.; Liao, C.; Ling, H. Lasot: A high-quality benchmark for large-scale single object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5374–5383. [Google Scholar]
- Kristan, M.; Matas, J.; Leonardis, A.; Felsberg, M.; Pflugfelder, R.; Kamarainen, J.K.; Čehovin Zajc, L.; Drbohlav, O.; Lukezic, A.; Berg, A.; et al. The seventh visual object tracking VOT2019 challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019; pp. 2206–2241. [Google Scholar]
- Mueller, M.; Smith, N.; Ghanem, B. A benchmark and simulator for uav tracking. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 445–461. [Google Scholar]
- Wu, Y.; Lim, J.; Yang, M.H. Object Tracking Benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1834–1848. [Google Scholar] [CrossRef]
- Danelljan, M.; Bhat, G.; Khan, F.S.; Felsberg, M. Atom: Accurate tracking by overlap maximization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4660–4669. [Google Scholar]
- Huang, B.; Xu, T.; Shen, Z.; Jiang, S.; Zhao, B.; Bian, Z. SiamATL: Online update of siamese tracking network via attentional transfer learning. IEEE Trans. Cybern. 2021, 52, 7527–7540. [Google Scholar] [CrossRef] [PubMed]
- Liu, J.; Wang, H.; Ma, C.; Su, Y.; Yang, X. Siamdmu: Siamese dual mask update network for visual object tracking. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8, 1656–1669. [Google Scholar] [CrossRef]
- Arthanari, S.; Jeong, J.H.; Joo, Y.H. Learning multi-regularized mutation-aware correlation filter for object tracking via an adaptive hybrid model. Neural Netw. 2025, 191, 107746. [Google Scholar] [CrossRef] [PubMed]
- Wang, Q.; Zhang, L.; Bertinetto, L.; Hu, W.; Torr, P.H. Fast online object tracking and segmentation: A unifying approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1328–1338. [Google Scholar]
- Chen, B.X.; Tsotsos, J.K. Fast visual object tracking with rotated bounding boxes. arXiv 2019, arXiv:1907.03892. [Google Scholar] [CrossRef]
- Fan, N.; Liu, Q.; Li, X.; Zhou, Z.; He, Z. Siamese residual network for efficient visual tracking. Inf. Sci. 2023, 624, 606–623. [Google Scholar] [CrossRef]
- Wu, Y.; Lim, J.; Yang, M.H. Online object tracking: A benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2411–2418. [Google Scholar]
- Guo, D.; Shao, Y.; Cui, Y.; Wang, Z.; Zhang, L.; Shen, C. Graph attention tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9543–9552. [Google Scholar]
- Wei, B.; Chen, H.; Ding, Q.; Luo, H. Siamagn: Siamese attention-guided network for visual tracking. Neurocomputing 2022, 512, 69–82. [Google Scholar] [CrossRef]
| Dataset | No. of Videos | Total Frames | Object Categories | No. of Attributes | Scope |
|---|---|---|---|---|---|
| GOT-10k | 420 | 56k | 84 | 6 | short-term, wild |
| LaSOT | 280 | 568k | 70 | 14 | long-term, wild |
| VOT2019 | 60 | 19.9k | 30 | 5 | short-term, rotated bounding box |
| UAV123 | 123 | 113k | 9 | 12 | UAV vision |
| OTB100 | 100 | 59k | 22 | 11 | short-term, general |
| Tracker | GOT-10k AO↑ | GOT-10k SR0.5↑ | GOT-10k SR0.75↑ | LaSOT Succ.↑ | LaSOT Prec.↑ | VOT2019 EAO↑ | VOT2019 A↑ | VOT2019 R↓ |
|---|---|---|---|---|---|---|---|---|
| SiamFC [4] | 0.392 | 0.426 | 0.135 | - | - | - | - | - |
| SiamRPN [5] | 0.463 | 0.549 | 0.253 | 0.448 | 0.436 | 0.233 | 0.604 | 0.487 |
| SiamFC++-AlexNet [31] | 0.493 | 0.577 | 0.323 | 0.501 | - | - | - | - |
| SiamMask [21] | 0.514 | 0.587 | 0.366 | 0.467 | 0.469 | 0.282 | 0.604 | 0.487 |
| SiamKPN [35] | 0.529 | 0.606 | 0.362 | 0.489 | 0.489 | - | - | - |
| ATOM [62] | 0.556 | 0.635 | 0.402 | 0.515 | - | 0.301 | 0.603 | 0.411 |
| SiamCAR [50] | 0.569 | 0.687 | 0.415 | 0.507 | 0.510 | - | - | - |
| SiamRAKN [37] | 0.544 | 0.646 | 0.368 | - | - | 0.294 | 0.588 | 0.461 |
| SiamATL [63] | 0.388 | - | - | 0.429 | 0.412 | - | - | - |
| SiamCorners [34] | - | - | - | 0.480 | 0.555 | - | - | - |
| Ocean_offline [33] | - | - | - | 0.526 | 0.526 | 0.327 | 0.590 | 0.376 |
| SiamDMU [64] | - | - | - | 0.499 | 0.498 | - | - | - |
| ASTABSCF [28] | - | 0.581 | - | 0.460 | 0.487 | - | - | - |
| MRMACF [65] | - | - | - | 0.475 | 0.500 | - | - | - |
| SiamBAN [24] | 0.541 | 0.642 | 0.401 | 0.514 | 0.518 | 0.314 | 0.597 | 0.381 |
| SiamRPN++ [7] | 0.518 | 0.618 | 0.325 | 0.495 | 0.493 | 0.287 | 0.595 | 0.467 |
| SiamRPN++-PMRSL | 0.545 | 0.667 | 0.317 | 0.532 | 0.529 | 0.301 | 0.597 | 0.426 |
| SiamBAN-PMRSL | 0.582 | 0.687 | 0.446 | 0.534 | 0.531 | 0.324 | 0.600 | 0.321 |
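For reference, the GOT-10k columns follow the standard definitions: AO is the mean intersection-over-union (IoU) between predicted and ground-truth boxes over a sequence, and SR0.5/SR0.75 are the fractions of frames whose IoU exceeds 0.5/0.75. The minimal sketch below computes these quantities under the assumption of axis-aligned (x, y, w, h) boxes; it is a sketch of the definitions, not the official benchmark toolkit.

```python
# Minimal sketch of the GOT-10k-style metrics (AO and SR at a threshold),
# assuming axis-aligned boxes in (x, y, w, h) format.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax1, ay1, aw, ah = a
    bx1, by1, bw, bh = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax1 + aw, bx1 + bw), min(ay1 + ah, by1 + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def ao_and_sr(pred: List[Box], gt: List[Box], thr: float = 0.5) -> Tuple[float, float]:
    """Average overlap (AO) and success rate SR_thr over one sequence."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    ao = sum(overlaps) / len(overlaps)
    sr = sum(o > thr for o in overlaps) / len(overlaps)
    return ao, sr


if __name__ == "__main__":
    pred = [(10, 10, 50, 50), (12, 11, 48, 52)]
    gt = [(11, 10, 50, 50), (30, 30, 40, 40)]
    print(ao_and_sr(pred, gt, thr=0.5))
```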
| Tracker | PMR | SL | GFLOPs | Params | GOT-10k AO↑ | GOT-10k SR0.5↑ | GOT-10k FPS↑ | LaSOT Succ.↑ | LaSOT Prec.↑ | LaSOT FPS↑ | VOT2019 EAO↑ | VOT2019 A↑ | VOT2019 FPS↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SiamBAN | | | 59.59 | 53.93M | 0.541 | 0.642 | 114.1 | 0.514 | 0.518 | 127.0 | 0.312 | 0.597 | 142.5 |
| SiamBAN | ✓ | | 61.82 | 54.60M | 0.574 | 0.676 | 83.3 | 0.519 | 0.527 | 97.2 | 0.309 | 0.598 | 107.6 |
| SiamBAN | ✓ | ✓ | 62.31 | 54.71M | 0.582 | 0.687 | 70.9 | 0.534 | 0.531 | 79.4 | 0.324 | 0.600 | 80.5 |
| SiamRPN++ | | | 59.60 | 53.95M | 0.518 | 0.618 | 115.6 | 0.496 | 0.491 | 126.5 | 0.276 | 0.603 | 142.9 |
| SiamRPN++ | ✓ | | 61.61 | 54.55M | 0.537 | 0.641 | 80.8 | 0.515 | 0.521 | 95.4 | 0.290 | 0.609 | 104.7 |
| SiamRPN++ | ✓ | ✓ | 62.25 | 54.66M | 0.545 | 0.667 | 67.6 | 0.532 | 0.529 | 81.3 | 0.301 | 0.597 | 87.8 |
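The Params column of such an ablation can be reproduced for any PyTorch tracker by summing parameter element counts, as in the short sketch below; the model used here is a placeholder, and the GFLOPs figures would additionally require a profiler.

```python
# Sketch: counting learnable parameters (in millions) for an arbitrary PyTorch module.
import torch.nn as nn


def count_params_millions(model: nn.Module) -> float:
    """Total number of learnable parameters, reported in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6


if __name__ == "__main__":
    toy = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.Conv2d(64, 1, 1))
    print(f"{count_params_millions(toy):.2f}M parameters")
```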
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).