ACSiamRPN: Adaptive Context Sampling for Visual Object Tracking
Abstract
1. Introduction
2. Methods
2.1. Global Context Block
2.2. Cropping-Inside Selective Kernel Block
3. Experiments
3.1. Implementation Details
3.2. Result on OTB100
3.3. Result on VOT2016
3.4. Result on VOT2019
3.5. Ablation Study
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Qin, X.; Fan, Z. Initial Matting-Guided Visual Tracking with Siamese Network. IEEE Access 2019, 7, 41669–41677.
2. Li, B.; Yan, J.; Wu, W.; Zhu, Z.; Hu, X. High Performance Visual Tracking with Siamese Region Proposal Network. In Proceedings of the IEEE Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8971–8980.
3. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
4. Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H. Fully-Convolutional Siamese Networks for Object Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 850–865.
5. Fan, H.; Ling, H. Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking. In Proceedings of the IEEE Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7952–7961.
6. Zhu, Z.; Wang, Q.; Li, B.; Wu, W.; Yan, J.; Hu, W. Distractor-aware Siamese Networks for Visual Object Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 103–119.
7. Zhang, Z.; Peng, H. Deeper and Wider Siamese Networks for Real-Time Visual Tracking. In Proceedings of the IEEE Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4591–4600.
8. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
9. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
10. Xie, S.; Girshick, R.; Dollar, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995.
11. Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. SiamRPN++: Evolution of siamese visual tracking with very deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4282–4291.
12. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
13. Li, X.; Wang, W.; Hu, X.; Yang, J. Selective Kernel Networks. In Proceedings of the IEEE Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 510–519.
14. Qin, X.; Wu, C.; Chang, H.; Lu, H.; Zhang, X. Match Feature U-Net: Dynamic Receptive Field Networks for Biomedical Image Segmentation. Symmetry 2020, 12, 1230.
15. Woo, S.; Park, J.; Lee, J.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–19.
16. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803.
17. Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCNet: Non-local networks meet squeeze-excitation networks and beyond. arXiv 2019, arXiv:1904.11492.
18. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
19. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
20. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollar, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
21. Real, E.; Shlens, J.; Mazzocchi, S.; Pan, X.; Vanhoucke, V. YouTube-BoundingBoxes: A large high-precision human-annotated data set for object detection in video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5296–5305.
22. Wu, Y.; Lim, J.; Yang, M.-H. Object Tracking Benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1834–1848.
23. Kristan, M.; Leonardis, A.; Matas, J.; Felsberg, M.; Pflugfelder, R.; Cehovin, L.; Chi, Z. The Visual Object Tracking VOT2016 Challenge Results. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–10 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 777–823.
24. Kristan, M.; Berg, A.; Zheng, L.; Rout, L.; Van Gool, L.; Bertinetto, L.; Zhou, L. The Seventh Visual Object Tracking VOT2019 Challenge Results. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Seoul, Korea, 27–28 October 2019.
25. Zhang, J.; Ma, S.; Sclaroff, S. MEEM: Robust Tracking via Multiple Experts Using Entropy Minimization. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 188–203.
26. Hong, Z.; Chen, Z.; Wang, C.; Mei, X.; Prokhorov, D.V.; Tao, D. MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking. In Proceedings of the Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 749–758.
27. Danelljan, M.; Hager, G.; Khan, F.S.; Felsberg, M. Accurate Scale Estimation for Robust Visual Tracking. In Proceedings of the British Machine Vision Conference, Nottingham, UK, 1–5 September 2014; BMVA Press: Nottingham, UK, 2014.
28. Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J.P. High-Speed Tracking with Kernelized Correlation Filters. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 583–596.
29. Hare, S.; Golodetz, S.; Saffari, A.; Vineet, V.; Cheng, M.-M.; Hicks, S.L.; Torr, P.H.S. Struck: Structured Output Tracking with Kernels. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 2096–2109.
30. Kalal, Z.; Mikolajczyk, K.; Matas, J. Tracking-Learning-Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 1409–1422.
31. Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. Exploiting the circulant structure of tracking-by-detection with kernels. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 702–715.
32. Danelljan, M.; Robinson, A.; Khan, F.S.; Felsberg, M. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 472–488.
33. Nam, H.; Baek, M.; Han, B. Modeling and Propagating CNNs in a Tree Structure for Visual Tracking. arXiv 2016, arXiv:1608.07242.
34. Wang, L.; Ouyang, W.; Wang, X.; Lu, H. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3119–3127.
35. Bertinetto, L.; Valmadre, J.; Golodetz, S.; Miksik, O.; Torr, P.H. Staple: Complementary Learners for Real-Time Tracking. In Proceedings of the Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1401–1409.
36. Zhu, G.; Porikli, F.; Li, H. Tracking Randomly Moving Objects on Edge Box Proposals. arXiv 2015, arXiv:1507.08085.
37. Li, X.; Ma, C.; Wu, B.; He, Z.; Yang, M. Target-Aware Deep Tracking. In Proceedings of the Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 1369–1378.
38. Lukezic, A.; Vojir, T.; Zajc, L.C.; Matas, J.; Kristan, M. Discriminative Correlation Filter with Channel and Spatial Reliability. In Proceedings of the IEEE Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4847–4856.
Trackers | EAO | Accuracy | Robustness |
---|---|---|---|
Ours | 0.397 | 0.601 | 0.252 |
SiamDW [7] | 0.376 | 0.580 | 0.240 |
SiamRPN [2] | 0.344 | 0.560 | 0.260 |
CCOT [32] | 0.331 | 0.539 | 0.238 |
TCNN [33] | 0.325 | 0.554 | 0.268 |
SSAT [23] | 0.321 | 0.577 | 0.291 |
MLDF [34] | 0.311 | 0.490 | 0.233 |
Staple [35] | 0.295 | 0.544 | 0.378 |
EBT [36] | 0.291 | 0.465 | 0.251 |
SRBT [23] | 0.290 | 0.496 | 0.350 |
Trackers | EAO | Accuracy | Robustness |
---|---|---|---|
Ours | 0.240 | 0.562 | 0.642 |
SSRCCOT [24] | 0.234 | 0.495 | 0.507 |
MemDTC [24] | 0.228 | 0.485 | 0.587 |
SiamRPNX [24] | 0.224 | 0.517 | 0.552 |
Siamfcos [24] | 0.223 | 0.561 | 0.788 |
TADT [37] | 0.207 | 0.516 | 0.677 |
CSRDCF [38] | 0.201 | 0.496 | 0.632 |
CSRpp [24] | 0.187 | 0.468 | 0.662 |
FSC2F [24] | 0.185 | 0.480 | 0.752 |
ALTO [24] | 0.182 | 0.358 | 0.818 |
Settings | VOT2016 EAO | VOT2016 Accuracy | VOT2016 Robustness |
---|---|---|---|
SiamRPN | 0.344 | 0.560 | 0.260 |
SiamRPN+GC | 0.366 | 0.597 | 0.265 |
SiamRPN+CiSK | 0.373 | 0.599 | 0.270 |
ACSiamRPN(Ours) | 0.397 | 0.601 | 0.252 |
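
To make the size of each ablation step easier to read, the EAO column of the SiamRPN/GC/CiSK table above can be turned into gains over the SiamRPN baseline. The following is only an illustrative sketch: the helper name `eao_gains` and the dictionary layout are assumptions introduced here, and the numbers are copied from the table rather than recomputed from tracker output.

```python
# EAO values copied from the VOT2016 ablation table (SiamRPN baseline and variants).
eao = {
    "SiamRPN": 0.344,
    "SiamRPN+GC": 0.366,
    "SiamRPN+CiSK": 0.373,
    "ACSiamRPN(Ours)": 0.397,
}


def eao_gains(scores, baseline="SiamRPN"):
    """Hypothetical helper: map each non-baseline setting to
    (absolute EAO gain, relative gain in percent) over the baseline."""
    base = scores[baseline]
    return {
        name: (round(value - base, 3), round(100.0 * (value - base) / base, 1))
        for name, value in scores.items()
        if name != baseline
    }


gains = eao_gains(eao)
for name, (abs_gain, rel_gain) in gains.items():
    print(f"{name}: +{abs_gain:.3f} EAO ({rel_gain:+.1f}% relative)")
```

Read this way, the GC block alone adds 0.022 EAO (about 6.4% relative), the CiSK block alone adds 0.029 (about 8.4%), and the full ACSiamRPN adds 0.053 (about 15.4%) over SiamRPN.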
Crop | GAP | GMP | VOT2016 | ||
---|---|---|---|---|---|
EAO | Accuracy | Robustness | |||
√ | 0.369 | 0.607 | 0.266 | ||
√ | 0.367 | 0.601 | 0.308 | ||
√ | √ | 0.370 | 0.605 | 0.294 | |
√ | √ | 0.382 | 0.601 | 0.266 | |
√ | √ | 0.395 | 0.604 | 0.256 | |
√ | √ | √ | 0.397 | 0.601 | 0.252 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Qin, X.; Zhang, Y.; Chang, H.; Lu, H.; Zhang, X. ACSiamRPN: Adaptive Context Sampling for Visual Object Tracking. Electronics 2020, 9, 1528. https://doi.org/10.3390/electronics9091528