Article

HROM: Learning High-Resolution Representation and Object-Aware Masks for Visual Object Tracking

Department of Computer Science, College of Mathematics and Computer Science, Zhejiang Normal University, No 688, Yingbin Road, Jinhua 321004, China
* Author to whom correspondence should be addressed.
Sensors 2020, 20(17), 4807; https://doi.org/10.3390/s20174807
Received: 23 July 2020 / Revised: 18 August 2020 / Accepted: 22 August 2020 / Published: 26 August 2020
(This article belongs to the Section Intelligent Sensors)
Siamese network-based trackers treat tracking as feature cross-correlation between the target template and the search region, so feature representation plays an important role in constructing a high-performance tracker. However, existing Siamese networks extract deep but low-resolution features of the entire patch, which are not robust enough to estimate the target bounding box accurately. To address this issue, we propose a novel high-resolution Siamese network that connects high-to-low resolution convolution streams in parallel and repeatedly exchanges information across resolutions to maintain high-resolution representations. Through a simple yet effective multi-scale feature fusion strategy, the resulting representation is semantically richer and spatially more precise. Moreover, we exploit attention mechanisms to learn object-aware masks for adaptive feature refinement and use deformable convolution to handle complex geometric transformations, which makes the target more discriminative against distractors and background. Without bells and whistles, extensive experiments on popular tracking benchmarks, including OTB100, UAV123, VOT2018 and LaSOT, demonstrate that the proposed tracker achieves state-of-the-art performance and runs in real time, confirming its efficiency and effectiveness.
Keywords: Siamese network; high-resolution representation; multi-scale fusion; visual tracking; attention mechanisms; deformable convolution
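
For readers unfamiliar with the cross-correlation formulation mentioned in the abstract, the following is a minimal sketch of the general idea, not the authors' implementation. It slides a small template feature map over a larger search feature map and takes the peak of the response as the most likely target position. The function names `cross_correlate` and `peak_location` and the toy feature values are assumptions made for illustration. A real Siamese tracker would compute the features with a deep backbone and perform this step as a batched convolution.

```python
# Illustrative sketch only: naive sliding-window cross-correlation between
# template features and search-region features. All names are hypothetical
# and the "features" are hand-written toy values, not network outputs.

def cross_correlate(search, template):
    """Slide `template` over `search` and return the response map.

    Both inputs are feature maps stored as [channel][row][col] nested lists
    with the same number of channels. No padding, stride 1.
    """
    channels = len(template)
    th, tw = len(template[0]), len(template[0][0])
    sh, sw = len(search[0]), len(search[0][0])
    out_h, out_w = sh - th + 1, sw - tw + 1
    response = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            score = 0.0
            for c in range(channels):
                for i in range(th):
                    for j in range(tw):
                        score += search[c][y + i][x + j] * template[c][i][j]
            response[y][x] = score
    return response


def peak_location(response):
    """Return (row, col) of the highest response, i.e. the best match."""
    best = max(
        ((value, (r, c))
         for r, row in enumerate(response)
         for c, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1]


# Toy single-channel example: the template pattern sits at row 1, col 2
# of the search region.
template = [[[1.0, 2.0], [3.0, 4.0]]]
search = [[
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 3.0, 4.0],
]]
response = cross_correlate(search, template)
peak = peak_location(response)
print(peak)  # (1, 2)
```

The resolution of the response map follows directly from the resolution of the two feature maps, which is one way to see why the abstract argues that low-resolution features limit how precisely the target can be localized.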

MDPI and ACS Style

Zhang, D.; Zheng, Z.; Wang, T.; He, Y. HROM: Learning High-Resolution Representation and Object-Aware Masks for Visual Object Tracking. Sensors 2020, 20, 4807. https://doi.org/10.3390/s20174807

AMA Style

Zhang D, Zheng Z, Wang T, He Y. HROM: Learning High-Resolution Representation and Object-Aware Masks for Visual Object Tracking. Sensors. 2020; 20(17):4807. https://doi.org/10.3390/s20174807

Chicago/Turabian Style

Zhang, Dawei, Zhonglong Zheng, Tianxiang Wang, and Yiran He. 2020. "HROM: Learning High-Resolution Representation and Object-Aware Masks for Visual Object Tracking" Sensors 20, no. 17: 4807. https://doi.org/10.3390/s20174807

