HB-YOLO: An Improved YOLOv7 Algorithm for Dim-Object Tracking in Satellite Remote Sensing Videos
Abstract
1. Introduction
- The field of view in satellite remote sensing images is vast [1], and moving objects are generally faint, occupying only a few to a few dozen pixels and lacking color, texture, and other distinguishing features. In the presence of noise, conventional algorithms often fail to detect and track such dim objects reliably.
- Moreover, because the satellite itself is moving, the background of a remote sensing image exhibits weak, non-uniform motion [4]. Since each frame is a two-dimensional projection captured while the satellite undergoes complex three-dimensional motion [6,7], the observed motion patterns are correspondingly complex.
- Furthermore, dark current in the photosensitive device, material structure, and other local differences introduce additional noise sources, so pixels receiving the same irradiance can register different gray values. Even after image stabilization, these variations appear as random fluctuations in pixel brightness, which further complicates detecting and tracking faint objects in satellite remote sensing images.
- In the backbone, we employ an enhanced variant of HorNet that replaces the conventional convolution used in YOLO with a module built around three 1 × 1 convolutions. This achieves higher-order spatial interaction and thus stronger feature extraction (a sketch of the underlying recursive gated convolution follows this list).
- In the neck, we replace the Extended Efficient Layer Aggregation Network (ELAN) with the BoTNet attention mechanism, allowing the features extracted by the backbone to be fused more effectively (see the self-attention sketch below).
- The anchor boxes were re-selected to improve performance metrics, including mean average precision (mAP) (see the anchor-clustering sketch below).
- The proposed framework performs object detection and is combined with image segmentation and the recent BoT-SORT tracker for object tracking; the full pipeline is evaluated with standard metrics.
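To make the backbone modification concrete, the following is a minimal PyTorch sketch of the recursive gated convolution (gnConv) at the core of HorNet [29]. It uses the HorNet paper's defaults (interaction order 3, a 7 × 7 depth-wise kernel); note it sketches only the gnConv operator itself, while the C3HB module used in HB-YOLO wraps it with further 1 × 1 convolutions whose exact arrangement is not reproduced here.

```python
import torch
import torch.nn as nn

class GnConv(nn.Module):
    """Minimal recursive gated convolution (gnConv) from HorNet."""

    def __init__(self, dim, order=3):
        super().__init__()
        # Channel widths double toward the highest order, e.g. [dim/4, dim/2, dim]
        self.dims = [dim // 2 ** i for i in range(order)][::-1]
        self.proj_in = nn.Conv2d(dim, 2 * dim, 1)            # entry 1x1 conv
        self.dwconv = nn.Conv2d(sum(self.dims), sum(self.dims), 7,
                                padding=3, groups=sum(self.dims))
        # 1x1 convs that lift features between successive interaction orders
        self.pws = nn.ModuleList(
            nn.Conv2d(self.dims[i], self.dims[i + 1], 1)
            for i in range(order - 1))
        self.proj_out = nn.Conv2d(dim, dim, 1)               # exit 1x1 conv

    def forward(self, x):
        # Split the projected features into a gate and the depth-wise branch
        gate, feats = self.proj_in(x).split(
            [self.dims[0], sum(self.dims)], dim=1)
        dw = self.dwconv(feats).split(self.dims, dim=1)
        y = gate * dw[0]                     # 1st-order spatial interaction
        for i, pw in enumerate(self.pws):
            y = pw(y) * dw[i + 1]            # higher-order interactions
        return self.proj_out(y)
```

Each element-wise product mixes features with their spatially aggregated neighbours, so stacking the gating steps yields progressively higher-order spatial interactions at roughly convolutional cost.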
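The neck replacement can be illustrated with the multi-head self-attention (MHSA) layer that BoTNet [30] places inside a bottleneck block. This is a simplified sketch: the learned 2-D relative position embeddings follow the BoTNet paper, while the head count and the fixed feature-map size (h, w) assumed at construction time are choices of this illustration rather than settings confirmed for HB-YOLO.

```python
import torch
import torch.nn as nn

class MHSA(nn.Module):
    """Simplified BoTNet-style multi-head self-attention over a feature map."""

    def __init__(self, dim, h, w, heads=4):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.qkv = nn.Conv2d(dim, dim * 3, 1)   # joint query/key/value projection
        # Learned relative position embeddings, factored by height and width
        self.rel_h = nn.Parameter(torch.randn(1, heads, dim // heads, h, 1))
        self.rel_w = nn.Parameter(torch.randn(1, heads, dim // heads, 1, w))

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).reshape(
            b, 3, self.heads, c // self.heads, h * w).unbind(1)
        pos = (self.rel_h + self.rel_w).reshape(
            1, self.heads, c // self.heads, h * w)
        # Content-content plus content-position attention logits
        logits = (q.transpose(-2, -1) @ k + q.transpose(-2, -1) @ pos) * self.scale
        attn = logits.softmax(dim=-1)
        return (v @ attn.transpose(-2, -1)).reshape(b, c, h, w)
```

Because the attention is global over the feature map, this layer lets the neck relate the few pixels of a dim object to distant context that a fixed-size convolution cannot reach.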
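Anchor re-selection can follow the standard YOLO-style k-means clustering over ground-truth box sizes with a 1 − IoU distance. The sketch below assumes k = 9 anchors, 100 iterations, and a median update rule; the paper does not spell out its clustering settings, so these are illustrative defaults only.

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """Cluster label widths/heights into k anchor boxes (1 - IoU distance)."""
    wh = np.asarray(wh, dtype=float)          # (N, 2) box widths and heights
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # IoU between every box and every anchor, treating boxes as co-centred
        inter = np.minimum(wh[:, None], anchors[None]).prod(-1)
        iou = inter / (wh.prod(-1)[:, None] + anchors.prod(-1)[None] - inter)
        assign = iou.argmax(1)                # nearest anchor = highest IoU
        for j in range(k):                    # median is robust to outliers
            if (assign == j).any():
                anchors[j] = np.median(wh[assign == j], axis=0)
    return anchors[np.argsort(anchors.prod(-1))]   # sorted by area
```

For dim objects spanning only a few pixels, re-clustering typically shrinks the smallest anchors, which is consistent with the effect of the Anchor column in the ablation table in Section 4.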
2. Related Works
2.1. Object Detection
2.2. Object Tracking
3. Materials and Methods
3.1. Detection and Improvement
3.1.1. Original and Improved YOLOv7
3.1.2. Anchor Box
3.1.3. Image Segmentation
3.2. Object-Tracking Model
- Unmatched tracks indicate the temporary disappearance of an object; its ID is removed if the object does not reappear within a specified number of frames.
- Unmatched detections correspond to new objects, and a new ID is assigned to each object, with Kalman filter predictions initiated.
- Matched tracks represent successful matches, with the tracks and Kalman filter being updated accordingly.
- A new cost matrix is designed that combines motion and appearance information (see the cost-fusion sketch after this list).
- Camera motion compensation is introduced.
- A more accurate bounding-box estimate is obtained with a more precise Kalman filter state vector, in which the original aspect ratio and height are replaced by the box width and height.
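As referenced above, the sketch below shows how the fused cost matrix can be formed in the spirit of BoT-SORT [28]: appearance (cosine) distances are trusted only where the appearance and IoU cues agree, and the final cost is the element-wise minimum of the two. The thresholds and the 0.5 re-weighting are the BoT-SORT paper's defaults, not values confirmed for this work.

```python
import numpy as np

def fused_cost(iou_dist, emb_dist, theta_iou=0.5, theta_emb=0.25):
    """Fuse motion (1 - IoU) and appearance (cosine) distances, BoT-SORT style.

    iou_dist, emb_dist: (num_tracks, num_dets) distance matrices.
    """
    emb = emb_dist.copy()
    # Gate appearance: reject pairs that are far apart in either cue
    emb[(emb_dist > theta_emb) | (iou_dist > theta_iou)] = 1.0
    emb[emb < 1.0] *= 0.5              # down-weight trusted appearance terms
    return np.minimum(iou_dist, emb)   # element-wise minimum of the two cues
```

The resulting matrix is then solved with the Hungarian algorithm (e.g., scipy.optimize.linear_sum_assignment) to produce the matched tracks, unmatched tracks, and unmatched detections handled in the list above.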
4. Results
4.1. Dataset
4.2. Evaluation and Indicators
4.3. Experiment and Analysis
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Yin, Q.; Hu, Q.; Liu, H.; Zhang, F.; Wang, Y.; Lin, Z.; An, W.; Guo, Y. Detecting and Tracking Small and Dense Moving Objects in Satellite Videos: A Benchmark. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5612518.
- Zhao, M.; Li, S.; Xuan, S.; Kou, L.; Gong, S.; Zhou, Z. SatSOT: A Benchmark Dataset for Satellite Video Single Object Tracking. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5617611.
- Ye, F.; Ai, T.; Wang, J.; Yao, Y.; Zhou, Z. A Method for Classifying Complex Features in Urban Areas Using Video Satellite Remote Sensing Data. Remote Sens. 2022, 14, 2324.
- Yang, L.; Yuan, G.; Zhou, H.; Liu, H.; Chen, J.; Wu, H. RS-YOLOX: A High-Precision Detector for Object Detection in Satellite Remote Sensing Images. Appl. Sci. 2022, 12, 8707.
- Wang, Y.; Wang, T.; Zhang, G.; Cheng, Q.; Wu, J. Small Object Tracking in Satellite Videos Using Background Compensation. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7010–7021.
- Li, W.; Gao, F.; Zhang, P.; Li, Y.; An, Y.; Zhong, X.; Lu, Q. Research on Multiview Stereo Mapping Based on Satellite Video Images. IEEE Access 2021, 9, 44069–44083.
- Shao, J.; Du, B.; Wu, C.; Zhang, L. Tracking Objects From Satellite Videos: A Velocity Feature Based Correlation Filter. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7860–7871.
- Wu, J.; Cao, C.; Zhou, Y.; Zeng, X.; Feng, Z.; Wu, Q.; Huang, Z. Multiple Ship Tracking in Remote Sensing Images Using Deep Learning. Remote Sens. 2021, 13, 3601.
- Liu, Z.; Gao, Y.; Du, Q.; Chen, M.; Lv, W. YOLO-Extract: Improved YOLOv5 for Aircraft Object Detection in Remote Sensing Images. IEEE Access 2023, 11, 1742–1751.
- Etten, A.V. You Only Look Twice: Rapid Multi-Scale Object Detection in Satellite Imagery. arXiv 2018, arXiv:1805.09512.
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 11–18 December 2015; pp. 1440–1448.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Xu, D.; Wu, Y. Improved YOLO-V3 with DenseNet for Multi-Scale Remote Sensing Object Detection. Sensors 2020, 20, 4276.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. arXiv 2022, arXiv:2207.02696.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Yazdi, M.; Bouwmans, T. New Trends on Moving Object Detection in Video Images Captured by a Moving Camera: A Survey. Comput. Sci. Rev. 2018, 28, 157–177.
- Hu, Q.; Guo, Y.; Lin, Z.; An, W.; Cheng, H. Object Tracking Using Multiple Features and Adaptive Model Updating. IEEE Trans. Instrum. Meas. 2017, 66, 2882–2897.
- Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H.S. Fully-Convolutional Siamese Networks for Object Tracking. arXiv 2016, arXiv:1606.09549.
- Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple Online and Realtime Tracking. In Proceedings of the 23rd IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468.
- Saleemi, I.; Shah, M. Multiframe Many–Many Point Correspondence for Vehicle Tracking in High Density Wide Area Aerial Videos. Int. J. Comput. Vis. 2013, 104, 198–219.
- Wojke, N.; Bewley, A.; Paulus, D. Simple Online and Realtime Tracking with a Deep Association Metric. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649.
- Lalonde, R.; Zhang, D.; Shah, M. ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4003–4012.
- Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. arXiv 2021, arXiv:2110.06864.
- Aharon, N.; Orfaig, R.; Bobrovsky, B. BoT-SORT: Robust Associations Multi-Pedestrian Tracking. arXiv 2022, arXiv:2206.14651.
- Rao, Y.; Zhao, W.; Tang, Y.; Zhou, J.; Lim, S.; Lu, J. HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions. arXiv 2022, arXiv:2207.14284.
- Srinivas, A.; Lin, T.; Parmar, N.; Shlens, J.; Abbeel, P.; Vaswani, A. Bottleneck Transformers for Visual Recognition. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 16514–16524.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Cao, Y.; Wang, G.; Yan, D.; Zhao, Z. Two Algorithms for the Detection and Tracking of Moving Vehicle Objects in Aerial Infrared Image Sequences. Remote Sens. 2016, 8, 28.
- Zivkovic, Z.; Van der Heijden, F. Efficient Adaptive Density Estimation per Image Pixel for the Task of Background Subtraction. Pattern Recognit. Lett. 2006, 27, 773–780.
- Barnich, O.; Van Droogenbroeck, M. ViBe: A Universal Background Subtraction Algorithm for Video Sequences. IEEE Trans. Image Process. 2011, 20, 1709–1724.
- Rezaei, B.; Ostadabbas, S. Background Subtraction via Fast Robust Matrix Completion. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 22–29 October 2017; pp. 1871–1879.
- Pflugfelder, R.; Weissenfeld, A.; Wagner, J. On Learning Vehicle Detection in Satellite Video. arXiv 2020, arXiv:2001.10900.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Reilly, V.; Idrees, H.; Shah, M. Detection and Tracking of Large Number of Objects in Wide Area Surveillance. Proc. Eur. Conf. Comput. Vis. 2010, 6313, 186–199.
- Rodriguez, P.; Wohlberg, B. Fast Principal Component Pursuit via Alternating Minimization. In Proceedings of the 2013 IEEE International Conference on Image Processing, Melbourne, VIC, Australia, 15–18 September 2013; pp. 69–73.
| Improved Structure | Anchor | Rcll | Prcn | mAP@0.5 |
|---|---|---|---|---|
| -- | -- | -- | -- | -- |
| -- | √ | -- | -- | -- |
| √ | -- | 6.65 × 10⁻⁶ | 1.29 × 10⁻⁶ | 6 × 10⁻⁷ |
| √ | √ | 0.49 | 0.72 | 0.53 |

| Method | Box Loss | Obj Loss | Cls Loss | Rcll | Prcn | F1 | mAP@0.5 |
|---|---|---|---|---|---|---|---|
| C3HB only | 0.142 | 0.177 | 0.0025 | 0.32 | 0.48 | 0.384 | 0.27 |
| C3HB × 2 | 0.139 | 0.155 | 0.0011 | 0.39 | 0.53 | 0.449 | 0.39 |
| C3HB × 2 and BoT | 0.137 | 0.153 | 0.0009 | 0.49 | 0.72 | 0.583 | 0.53 |

| Method | F1 | Rcll | Prcn | mAP |
|---|---|---|---|---|
| FD [32] | 0.34 | 0.72 | 0.23 | 0.43 |
| FRMC [35] | 0.32 | 0.53 | 0.26 | 0.22 |
| GMM [33] | 0.43 | 0.46 | 0.45 | 0.38 |
| MGB [38] | 0.44 | 0.70 | 0.34 | 0.46 |
| FPCP [39] | 0.47 | 0.59 | 0.36 | 0.36 |
| Ours | 0.58 | 0.49 | 0.72 | 0.53 |

| Segmentation Index (m) | MOTA/% | MOTP/% | IDs | FPS |
|---|---|---|---|---|
| 1 | 41.3 | 30.2 | 3551 | 12.33 |
| 2 | 42.4 | 30.1 | 3480 | 9.82 |
| 3 | 45.5 | 29.7 | 3071 | 5.40 |