High-Order Temporal Context-Aware Aerial Tracking with Heterogeneous Visual Experts
Abstract
1. Introduction
- High-Order Temporal Context-Aware Representation: We propose a dual-stream framework that enriches motion and appearance features with temporal contexts. Motion features, extracted from optical flow and motion-history images (MHIs), capture dynamic trajectories, while appearance features, refined through temporally adaptive convolutions, adapt to appearance variations. Both enhanced cues improve robustness against the drift and disruption commonly encountered in aerial tracking (see the motion-cue sketch after this list).
- Heterogeneous Feature Representation Refinement: We leverage a bidirectional cross-domain interaction that mutually refines the motion and appearance features: motion guides appearance to ensure temporal consistency, while appearance calibrates motion to suppress noise, yielding a cohesive representation that is resilient to occlusions and complex backgrounds.
- Consistent Decision Fusion with Smooth Constraint: We introduce a decision module that dynamically weighs the refined feature pairs via velocity-based gating and enforces consistent inference through smooth spatial–temporal constraints, avoiding unreasonable decision switches between the heterogeneous visual experts.
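To make the motion stream concrete, the sketch below shows how the two motion cues named above (Farnebäck optical flow and an MHI) can be computed per frame. It is a minimal illustration, not the paper's implementation; the decay window `tau` and the frame-difference threshold are assumed values.

```python
import cv2
import numpy as np

def motion_cues(prev_gray, curr_gray, mhi, t, tau=3.0, thresh=25):
    """Compute dense optical flow and update a motion-history image (MHI).

    prev_gray, curr_gray: uint8 grayscale frames; mhi: float32 timestamp map;
    t: current frame index. tau and thresh are illustrative hyperparameters.
    """
    # Dense two-frame optical flow (Farneback polynomial expansion).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Frame differencing marks moving pixels, which refresh the MHI.
    moving = cv2.absdiff(curr_gray, prev_gray) > thresh
    mhi = mhi.copy()
    mhi[moving] = t                 # stamp moving pixels with the current time
    mhi[mhi < t - tau] = 0.0        # forget motion older than tau frames
    # Normalize so recent motion is bright, fading linearly over tau frames.
    mhi_norm = np.clip((mhi - (t - tau)) / tau, 0.0, 1.0)
    return flow, mhi, mhi_norm
```

Iterated over a video, `flow` feeds the trajectory cue and `mhi_norm` gives a single-channel map in which bright pixels moved recently, ready to be stacked with appearance features.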
2. Related Works
2.1. Visual Tracking in Aerial Remote Sensing
2.2. Temporal Context in Visual Tracking
2.3. Motion Cue in Visual Tracking
2.4. Heterogeneous Feature Fusion in Visual Tracking
3. Proposed Method
3.1. High-Order Temporal-Aware Representation
3.1.1. Motion Representation
3.1.2. Appearance Representation
3.1.3. Temporal Context-Aware Enhancement
3.2. Heterogeneous Feature Refinement
3.2.1. Feature Projection Embedding
3.2.2. Bidirectional Cross-Domain Interaction
3.3. Consistent Decision Fusion with Smooth Constraint
3.3.1. Dynamic Feature Gating
- Robust context-driven feature selection. The motion gating response is more sensitive to short-term velocity changes, enabling swift adaptation to rapid movements, while the appearance gating favors stable scenes, ensuring visual consistency when motion is less informative.
- Temporal smoothness. Incorporating temporal gradients suppresses erratic cue shifts when the video content changes slowly (e.g., cruise or hover mode).
- Historical complementarity optimization. By leveraging historical confidence and velocity weighting, the gate enables the model to offset short-term degradation in either stream by relying on the complementary one (a minimal sketch follows this list).
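A minimal sketch of such a gate is given below. The softmax over a velocity-change score and a recent-confidence score, and the exponential moving average used for temporal smoothness, are assumed forms for illustration; the paper's exact gating functions and weights are not reproduced here.

```python
import numpy as np

class DynamicGate:
    """Toy velocity/confidence gate with EMA temporal smoothing (assumed form)."""

    def __init__(self, beta=0.7):
        self.beta = beta                    # assumed smoothing factor
        self.prev = np.array([0.5, 0.5])    # start with equal expert weights

    def __call__(self, vel_hist, app_conf_hist):
        # Motion gate: sensitive to short-term velocity change.
        v_now = np.linalg.norm(vel_hist[-1])
        v_prev = np.linalg.norm(vel_hist[-2])
        motion_score = np.tanh(abs(v_now - v_prev))
        # Appearance gate: favors stable scenes via recent confidence history.
        app_score = float(np.mean(app_conf_hist[-3:]))
        w = np.exp([motion_score, app_score])
        w /= w.sum()                        # softmax over the two experts
        # Temporal smoothness: blend with the previous gate to avoid jitter.
        self.prev = self.beta * self.prev + (1.0 - self.beta) * w
        return self.prev                    # (w_motion, w_appearance), sums to 1
```

Because the EMA blends two weight vectors that each sum to one, the smoothed gate also sums to one, so the fused decision remains a convex combination of the two experts.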
3.3.2. Smooth Decision Fusion
3.3.3. Failure-Aware Recovery
- Confidence-Aware Fallback: If the appearance cue is unreliable (e.g., its confidence drops below a preset threshold), we preserve the gating weight from the previous frame and estimate the target’s position using a short-term average of recent velocities. This strategy assumes that the target continues along a consistent path over a short period, effectively bridging visual disruptions.
- Velocity-Guided Prediction: If the appearance confidence drops below 0.2 (e.g., severe occlusion), we update the position by extrapolating the motion trend, $\hat{\mathbf{p}}_t = \mathbf{p}_{t-1} + \bar{\mathbf{v}}_t$, where $\bar{\mathbf{v}}_t$ is a weighted average of the past three velocities, $\bar{\mathbf{v}}_t = \sum_{i=1}^{3} w_i \mathbf{v}_{t-i}$ with $\sum_{i=1}^{3} w_i = 1$. This method extends the motion trend to estimate the target’s location when the appearance cue is unavailable, as sketched below.
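The two rules can be sketched as follows. Only the 0.2 hard threshold comes from the text; the soft-failure threshold and the three velocity weights are placeholder assumptions.

```python
import numpy as np

def failure_aware_recovery(pos_prev, gate_prev, vel_hist, app_conf,
                           soft_thresh=0.5, hard_thresh=0.2,
                           weights=(0.5, 0.3, 0.2)):
    """Return (predicted_position, gate) when the appearance expert fails,
    or (None, None) when normal fusion should proceed."""
    vels = [np.asarray(v, dtype=float) for v in vel_hist]
    if app_conf < hard_thresh:
        # Velocity-guided prediction: weighted average of the past three
        # velocities, most recent weighted highest (weights are assumed).
        v_bar = sum(w * v for w, v in zip(weights, vels[::-1]))
        return np.asarray(pos_prev) + v_bar, gate_prev
    if app_conf < soft_thresh:
        # Confidence-aware fallback: keep the previous gating weight and
        # bridge the disruption with a short-term average velocity.
        v_avg = np.mean(vels[-3:], axis=0)
        return np.asarray(pos_prev) + v_avg, gate_prev
    return None, None  # appearance cue reliable; no recovery needed
```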
4. Experimental Results
4.1. Dataset
4.2. Evaluation Metrics
4.3. Implementation Details
4.4. Quantitative Evaluation
4.5. Qualitative Evaluation
4.6. Ablation and Hyperparameter Sensitivity Study
4.6.1. Component Ablation of Overall Framework
4.6.2. Sensitivity to MHI Parameters
4.6.3. Sensitivity to Cross-Attention Head Number
4.6.4. Sensitivity to Smooth Constraint Parameters
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Gaffey, C.; Bhardwaj, A. Applications of unmanned aerial vehicles in cryosphere: Latest advances and prospects. Remote Sens. 2020, 12, 948.
- Sishodia, R.P.; Ray, R.L.; Singh, S.K. Applications of remote sensing in precision agriculture: A review. Remote Sens. 2020, 12, 3136.
- Manfreda, S.; McCabe, M.F.; Miller, P.E.; Lucas, R.; Pajuelo Madrigal, V.; Mallinis, G.; Ben Dor, E.; Helman, D.; Estes, L.; Ciraolo, G.; et al. On the use of unmanned aerial systems for environmental monitoring. Remote Sens. 2018, 10, 641.
- Jorge, V.A.; Granada, R.; Maidana, R.G.; Jurak, D.A.; Heck, G.; Negreiros, A.P.; Dos Santos, D.H.; Gonçalves, L.M.; Amory, A.M. A survey on unmanned surface vehicles for disaster robotics: Main challenges and directions. Sensors 2019, 19, 702.
- Hamdi, Z.M.; Brandmeier, M.; Straub, C. Forest damage assessment using deep learning on high resolution remote sensing data. Remote Sens. 2019, 11, 1976.
- Danelljan, M.; Bhat, G.; Shahbaz Khan, F.; Felsberg, M. ECO: Efficient Convolution Operators for Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6638–6646.
- Galoogahi, H.K.; Fagg, A.; Lucey, S. Learning background-aware correlation filters for visual tracking. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1135–1143.
- Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 583–596.
- Danelljan, M.; Häger, G.; Khan, F.S.; Felsberg, M. Learning spatially regularized correlation filters for visual tracking. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4310–4318.
- Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H. Fully-convolutional siamese networks for object tracking. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 850–865.
- Cao, Z.; Fu, C.; Ye, J.; Li, B.; Li, Y. SiamAPN++: Siamese attentional aggregation network for real-time UAV tracking. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 1–7.
- Li, F.; Tian, C.; Zuo, W.; Zhang, L.; Yang, M.H. Learning spatial-temporal regularized correlation filters for visual tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4904–4913.
- Zhang, L.; Gonzalez-Garcia, A.; van de Weijer, J.; Danelljan, M.; Khan, F.S. Learning the model update for siamese trackers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4010–4019.
- Yang, T.; Chan, A.B. Learning dynamic memory networks for object tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 152–167.
- Fu, Z.; Liu, Q.; Fu, Z.; Wang, Y. STMTrack: Template-free visual tracking with space-time memory networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13774–13783.
- Cao, Z.; Huang, Z.; Pan, L.; Zhang, S.; Liu, Z.; Fu, C. TCTrack: Temporal contexts for aerial tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 14778–14788.
- Yan, B.; Peng, H.; Fu, J.; Wang, D.; Lu, H. Learning spatio-temporal transformer for visual tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10448–10457.
- Zhu, Z.; Wu, W.; Zou, W.; Yan, J. End-to-end flow correlation tracking with spatial-temporal attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 548–557.
- Moudgil, A.; Gandhi, V. Long-term tracking in the wild: A benchmark. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 670–686.
- Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468.
- Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. SiamRPN++: Evolution of siamese visual tracking with very deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4277–4286.
- Huang, Z.; Fu, C.; Li, Y.; Lin, F.; Lu, P. Learning aberrance repressed correlation filters for real-time UAV tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2891–2900.
- Danelljan, M.; Bhat, G.; Khan, F.S.; Felsberg, M. ATOM: Accurate tracking by overlap maximization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4660–4669.
- Zhu, X.; Xiong, Y.; Dai, J.; Yuan, L.; Wei, Y. Deep feature flow for video recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2349–2358.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Yu, Y.; Xiong, Y.; Huang, W.; Scott, M.R. Deformable siamese attention networks for visual object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6728–6737.
- Guo, D.; Shao, Y.; Cui, Y.; Wang, Z.; Zhang, L.; Shen, C. Graph attention tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9543–9552.
- Chen, X.; Yan, B.; Zhu, J.; Wang, D.; Yang, X.; Lu, H. Transformer tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8126–8135.
- Cui, Y.; Jiang, C.; Wang, L.; Wu, G. MixFormer: End-to-end tracking with iterative mixed attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13608–13618.
- He, K.; Zhang, C.; Xie, S.; Li, Z.; Wang, Z. Target-aware tracking with long-term context attention. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 773–780.
- Farnebäck, G. Two-Frame Motion Estimation Based on Polynomial Expansion. In Image Analysis: 13th Scandinavian Conference, SCIA 2003, Halmstad, Sweden, 29 June–2 July 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 363–370.
- Huang, Z.; Zhang, S.; Pan, L.; Qing, Z.; Tang, M.; Liu, Z.; Ang, M.H., Jr. TAda! Temporally-Adaptive Convolutions for Video Understanding. arXiv 2021, arXiv:2110.06178.
- Mueller, M.; Smith, N.; Ghanem, B. A benchmark and simulator for UAV tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 445–461.
- Li, S.; Yeung, D.Y. Visual object tracking for unmanned aerial vehicles: A benchmark and new motion models. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
- Fan, H.; Lin, L.; Yang, F.; Chu, P.; Deng, G.; Yu, S.; Bai, H.; Xu, Y.; Liao, C.; Ling, H. LaSOT: A high-quality benchmark for large-scale single object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5374–5383.
- Huang, L.; Zhao, X.; Huang, K. GOT-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 1562–1577.
- Zhang, Z.; Peng, H. Deeper and wider siamese networks for real-time visual tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4591–4600.
- Danelljan, M.; Robinson, A.; Shahbaz Khan, F.; Felsberg, M. Beyond correlation filters: Learning continuous convolution operators for visual tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 472–488.
- Ma, C.; Yang, X.; Zhang, C.; Yang, M.H. Long-term correlation tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5388–5396.
- Wang, N.; Song, Y.; Ma, C.; Zhou, W.; Liu, W.; Li, H. Unsupervised deep tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1308–1317.
- Ma, C.; Huang, J.B.; Yang, X.; Yang, M.H. Hierarchical convolutional features for visual tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3074–3082.
- Li, B.; Yan, J.; Wu, W.; Zhu, Z.; Hu, X. High performance visual tracking with siamese region proposal network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8971–8980.
- Wang, N.; Zhou, W.; Tian, Q.; Hong, R.; Wang, M.; Li, H. Multi-cue correlation filters for robust visual tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4844–4853.
- Fu, C.; Cao, Z.; Li, Y.; Ye, J.; Feng, C. Siamese anchor proposal network for high-speed aerial tracking. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 510–516.
- Zhu, G.; Wang, J.; Wu, Y.; Lu, H. Collaborative Correlation Tracking. In Proceedings of the British Machine Vision Conference, Swansea, UK, 7–10 September 2015; pp. 184.1–184.12.
- Danelljan, M.; Häger, G.; Khan, F.S.; Felsberg, M. Discriminative scale space tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1561–1575.
- Bertinetto, L.; Valmadre, J.; Golodetz, S.; Miksik, O.; Torr, P.H. Staple: Complementary learners for real-time tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1401–1409.
- Danelljan, M.; Hager, G.; Shahbaz Khan, F.; Felsberg, M. Adaptive decontamination of the training set: A unified formulation for discriminative visual tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1430–1438.
- Li, Y.; Fu, C.; Ding, F.; Huang, Z.; Lu, G. AutoTrack: Towards high-performance visual tracking for UAV with automatic spatio-temporal regularization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11923–11932.
- Li, Y.; Zhu, J. A scale adaptive kernel correlation filter tracker with feature integration. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 254–265.
- Zhu, Z.; Wang, Q.; Li, B.; Wu, W.; Yan, J.; Hu, W. Distractor-aware siamese networks for visual object tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 101–117.
- Danelljan, M.; Shahbaz Khan, F.; Felsberg, M.; Van de Weijer, J. Adaptive color attributes for real-time visual tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1090–1097.
- Wang, C.; Zhang, L.; Xie, L.; Yuan, J. Kernel cross-correlator. Proc. AAAI Conf. Artif. Intell. 2018, 32.
- Li, F.; Yao, Y.; Li, P.; Zhang, D.; Zuo, W.; Yang, M.H. Integrating boundary and center correlation filters for visual tracking with aspect ratio variation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2001–2009.
- Li, X.; Ma, C.; Wu, B.; He, Z.; Yang, M.H. Target-aware deep tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1369–1378.
- Lukezic, A.; Vojir, T.; Čehovin Zajc, L.; Matas, J.; Kristan, M. Discriminative correlation filter with channel and spatial reliability. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6309–6318.
- Zhang, T.; Xu, C.; Yang, M.H. Multi-task correlation particle filter for robust object tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4335–4343.
Trackers | Succ. | Prec. | Trackers | Succ. | Prec. |
---|---|---|---|---|---|
Ours | 0.629 | 0.828 | UDT [42] | 0.422 | 0.605 |
TCTrack [16] | 0.621 | 0.815 | ARCF_H [22] | 0.416 | 0.618 |
SiamAPN++ [11] | 0.594 | 0.792 | CF2 [43] | 0.415 | 0.645 |
SiamRPN [44] | 0.586 | 0.794 | MCCT_H [45] | 0.405 | 0.606 |
SiamAPN [46] | 0.585 | 0.786 | BACF [7] | 0.402 | 0.601 |
CCOT [40] | 0.517 | 0.771 | CoKCF [47] | 0.378 | 0.592 |
DeepSTRCF [12] | 0.505 | 0.736 | SRDCF [9] | 0.363 | 0.707 |
ECO_gpu [6] | 0.502 | 0.724 | fDSST [48] | 0.357 | 0.536 |
SiamDW [39] | 0.492 | 0.727 | Staple_CA [49] | 0.351 | 0.535 |
MCCT [45] | 0.484 | 0.707 | SRDCFdecon [50] | 0.351 | 0.506 |
AutoTrack [51] | 0.478 | 0.698 | SAMF_CA [52] | 0.346 | 0.609 |
DASiamRPN [53] | 0.474 | 0.684 | SAMF [52] | 0.340 | 0.524 |
ARCF [22] | 0.472 | 0.671 | CN [54] | 0.315 | 0.514 |
UDTplus [42] | 0.462 | 0.665 | KCC [55] | 0.291 | 0.442 |
IBCCF [56] | 0.460 | 0.671 | LCT [41] | 0.282 | 0.464 |
TADT [57] | 0.459 | 0.691 | KCF [8] | 0.280 | 0.470 |
ECO_HC [6] | 0.453 | 0.645 | DCF [58] | 0.280 | 0.469 |
CSRDCF [58] | 0.438 | 0.648 | DSST [48] | 0.276 | 0.521 |
STRCF [12] | 0.437 | 0.651 | Staple [49] | 0.265 | 0.368 |
MCPF [59] | 0.432 | 0.666 | | | |
Motion Feature Extraction | Heterogeneous Feature Refinement | Consistent Decision Fusion | UAV123 Succ. | UAV123 Prec. | DTB70 Succ. | DTB70 Prec.
---|---|---|---|---|---|---
× | × | × | 0.575 | 0.765 | 0.586 | 0.794 |
✓ | × | × | 0.591 | 0.783 | 0.601 | 0.809 |
✓ | ✓ | × | 0.607 | 0.801 | 0.614 | 0.816 |
✓ | ✓ | ✓ | 0.616 | 0.812 | 0.629 | 0.828 |
Configuration | Fast Motion Succ. | Fast Motion Prec. | Occlusion Succ. | Occlusion Prec. | Overall Succ. | Overall Prec.
---|---|---|---|---|---|---
MHI (τ = 0, L = 3) | 0.459 | 0.665 | 0.370 | 0.595 | 0.594 | 0.785
MHI (τ = 1, L = 3) | 0.537 | 0.752 | 0.438 | 0.687 | 0.602 | 0.796
MHI (τ = 3, L = 3) | 0.573 | 0.786 | 0.527 | 0.735 | 0.616 | 0.812
MHI (τ = 5, L = 3) | 0.551 | 0.761 | 0.489 | 0.702 | 0.606 | 0.801
MHI (τ = 3, L = 5) | 0.565 | 0.773 | 0.503 | 0.718 | 0.609 | 0.805
Head Number | Background Clutter Succ. | Background Clutter Prec. | Viewpoint Change Succ. | Viewpoint Change Prec. | Overall Succ. | Overall Prec.
---|---|---|---|---|---|---
4 heads | 0.327 | 0.551 | 0.605 | 0.758 | 0.601 | 0.807 |
8 heads | 0.398 | 0.581 | 0.632 | 0.821 | 0.616 | 0.812 |
12 heads | 0.362 | 0.563 | 0.617 | 0.793 | 0.603 | 0.804 |
Parameters | Occlusion Succ. | Occlusion Prec. | Motion Blur Succ. | Motion Blur Prec. | Overall Succ. | Overall Prec.
---|---|---|---|---|---|---
λs = 0, λt = 0 | 0.446 | 0.633 | 0.500 | 0.710 | 0.519 | 0.703
λs = 0.3, λt = 0.7 | 0.492 | 0.687 | 0.537 | 0.761 | 0.535 | 0.728
λs = 0.5, λt = 0.7 | 0.537 | 0.737 | 0.557 | 0.821 | 0.552 | 0.737
λs = 0.7, λt = 0.7 | 0.514 | 0.713 | 0.525 | 0.783 | 0.539 | 0.721
λs = 0.5, λt = 0.5 | 0.509 | 0.706 | 0.542 | 0.804 | 0.547 | 0.731
λs = 0.5, λt = 0.9 | 0.493 | 0.682 | 0.507 | 0.771 | 0.523 | 0.718
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).