RN-YOLO: A Small Target Detection Model for Aerial Remote-Sensing Images
Abstract
1. Introduction
- (1) To address the difficulty of detecting small targets and extracting detailed features under resource constraints, NAM [21] is integrated between the feature extraction and feature fusion networks. Its lightweight design helps preserve small-target features, and it improves detection accuracy by adjusting the weight contribution factors.
- (2) The RepGhost module [22] is introduced into the feature fusion network to form RepGhost_C2f. This addresses the inadequate detection capacity for targets spanning a wide range of sizes while significantly reducing the number of model parameters.
- (3) The WIoU loss function [23] replaces CIoU in the original model. It improves the detector's overall performance by assigning different weights to targets of various sizes and eases the difficulty of localizing small targets.
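The "weight contribution factor" in contribution (1) can be illustrated with a minimal sketch of a NAM-style channel gate, assuming the formulation in [21]: each channel is batch-normalized, scaled by its share of the batch-norm scale factors, passed through a sigmoid, and used to rescale the input. The function name, the list-based feature layout, and all numeric values are hypothetical, and the per-channel statistics are computed from the input itself rather than from running averages; this is not the authors' implementation.

```python
import math


def nam_channel_attention(x, gamma, beta, eps=1e-5):
    """Apply a NAM-style channel gate to a feature map.

    x     : list of channels, each a flat list of activations
    gamma : per-channel batch-norm scale factors (assumed already learned,
            with at least one non-zero entry)
    beta  : per-channel batch-norm shifts
    """
    total = sum(abs(g) for g in gamma)
    out = []
    for channel, g, b in zip(x, gamma, beta):
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        weight = abs(g) / total  # contribution factor of this channel
        gated = []
        for v in channel:
            normed = g * (v - mean) / math.sqrt(var + eps) + b  # batch norm
            gate = 1.0 / (1.0 + math.exp(-weight * normed))     # sigmoid
            gated.append(gate * v)  # rescale the original activation
        out.append(gated)
    return out


# Hypothetical two-channel feature map with two positions per channel.
features = [[1.0, 3.0], [1.0, 3.0]]
out = nam_channel_attention(features, gamma=[2.0, 0.0], beta=[0.0, 0.0])
```

In this toy input the second channel has a zero scale factor, so its gate stays at the neutral value 0.5 everywhere, whereas the first channel is suppressed or emphasized position by position.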
2. Related Works
2.1. Machine Learning in Remote-Sensing Images
2.2. Reinforcement Learning in Remote-Sensing Images
2.3. YOLOv8 Model
3. Methods
3.1. Overall Architecture
3.2. Normalization-Based Attention Module (NAM)
3.3. RepGhost_C2f
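RepGhost [22] builds on structural re-parameterization: branches used during training are merged into a single operator for inference. The following is a minimal sketch of that idea only, assuming a convolution-plus-batch-norm branch and a batch-norm-only shortcut, with the depthwise convolution reduced to one scalar weight per channel. All names and values are hypothetical, and the sketch does not reproduce the RepGhost_C2f block itself.

```python
import math


def fold_bn(weight, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm statistics into a scalar weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, beta - mean * scale


def batch_norm(v, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm on a single value."""
    return gamma * (v - mean) / math.sqrt(var + eps) + beta


def train_time(x, w, bn_conv, bn_identity):
    """Two branches: conv followed by BN, plus a BN-only shortcut."""
    return batch_norm(w * x, *bn_conv) + batch_norm(x, *bn_identity)


def fuse(w, bn_conv, bn_identity):
    """Merge both branches into one weight and one bias."""
    k_conv, b_conv = fold_bn(w, *bn_conv)
    k_id, b_id = fold_bn(1.0, *bn_identity)  # shortcut acts as weight 1
    return k_conv + k_id, b_conv + b_id


# Hypothetical values: (gamma, beta, running_mean, running_var)
w = 0.8
bn_conv = (1.2, 0.1, 0.3, 0.5)
bn_identity = (0.9, -0.2, 0.1, 2.0)
k, b = fuse(w, bn_conv, bn_identity)
```

After fusion, `k * x + b` gives the same output as the two-branch form, so the extra branch adds no inference cost in this simplified setting.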
3.4. WIoU (Wise Intersection over Union)
4. Experiments
4.1. Experimental Datasets and Their Preprocessing
4.2. Experimental Environment and Training Setting
4.3. Experimental Comparison and Analysis
4.3.1. Comparative Experiments for Attention Module
4.3.2. Comparative Experiments for Lightweight Convolution
4.3.3. Comparative Experiments for Loss Function
4.3.4. Ablation Study
4.3.5. Comparative Experiments with Other Models
4.4. Visualization Experiments
5. Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Melesse, M.; Weng, Q.; Thenkabail, P.S.; Senay, G.B. Remote sensing sensors and applications in environmental resources mapping and modelling. Sensors 2007, 7, 3209–3241.
- Gakhar, S.; Tiwari, K.C. Spectral–spatial urban target detection for hyperspectral remote sensing data using artificial neural network. Egypt. J. Remote Sens. Space Sci. 2021, 24, 173–180.
- Yang, C. Remote sensing and precision agriculture technologies for crop disease detection and management with a practical application example. Engineering 2020, 6, 528–532.
- Koshimura, S.; Moya, L.; Mas, E.; Bai, Y. Tsunami damage detection with remote sensing: A review. Geosciences 2020, 10, 177.
- Bi, Y.; Bai, X.; Jin, T.; Guo, S. Multiple feature analysis for infrared small target detection. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1333–1337.
- Zhou, P.; Cheng, G.; Liu, Z.; Bu, S.; Hu, X. Weakly supervised target detection in remote sensing images based on transferred deep features and negative bootstrapping. Multidimens. Syst. Signal Process. 2016, 27, 925–944.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11–14 October 2016; Volume 14, pp. 21–37, Part I.
- Li, Z.; Yang, L.; Zhou, F. FSSD: Feature fusion single shot multibox detector. arXiv 2017, arXiv:1712.00960.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
- Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2778–2788.
- Yang, L.; Yuan, G.; Zhou, H.; Liu, H.; Chen, J.; Wu, H. RS-YOLOX: A high-precision detector for object detection in satellite remote sensing images. Appl. Sci. 2022, 12, 8707.
- Yu, J.; Liu, S.; Xu, T. Research on YOLOv7 remote sensing small target detection algorithm incorporating attention mechanism. J. Comput. Eng. Appl. 2023, 59, 167. (In Chinese)
- Liu, Z.; Gao, Y.; Du, Q.; Chen, M.; Lv, W. YOLO-extract: Improved YOLOv5 for aircraft object detection in remote sensing images. IEEE Access 2023, 11, 1742–1751.
- Wang, G.; Chen, Y.; An, P.; Hong, H.; Hu, J.; Huang, T. UAV-YOLOv8: A small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios. Sensors 2023, 23, 7190.
- Liu, Y.; Shao, Z.; Hoffmann, Y.N. NAM: Normalization-based attention module. arXiv 2021, arXiv:2111.12419.
- Chen, C.; Guo, Z.; Zeng, H.; Xiong, P.; Dong, J. RepGhost: A hardware-efficient ghost module via re-parameterization. arXiv 2022, arXiv:2211.06088.
- Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding box regression loss with dynamic focusing mechanism. arXiv 2023, arXiv:2301.10051.
- Vedaldi, A.; Gulshan, V.; Varma, M.; Zisserman, A. Multiple kernels for object detection. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 27 September–4 October 2009; pp. 606–613.
- Ranzato, M.A.; Boureau, Y.L.; LeCun, Y. Sparse feature learning for deep belief networks. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 3–6 December 2007; Volume 20.
- Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA, 8–14 December 2001; Volume 1, p. I.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Terven, J.; Cordova-Esparza, D.-M.; Romero-González, J.-A. A comprehensive review of YOLO: From YOLOv1 to YOLOv8 and beyond. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716.
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983.
- Zhang, Y.; Yuan, Y.; Feng, Y.; Lu, X. Hierarchical and robust convolutional neural network for very high-resolution remote sensing object detection. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5535–5548.
- Long, Y.; Gong, Y.; Xiao, Z.; Liu, Q. Accurate object localization in remote sensing images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2486–2498.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Gao, P.; Lu, J.; Li, H.; Mottaghi, R.; Kembhavi, A. Container: Context aggregation network. arXiv 2021, arXiv:2106.01401.
- Zhang, Q.L.; Yang, Y.B. SA-Net: Shuffle attention for deep convolutional neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), Toronto, ON, Canada, 6–11 June 2021; pp. 2235–2239.
- Li, J.; Wen, Y.; He, L. SCConv: Spatial and channel reconstruction convolution for feature redundancy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 6153–6162.
Methods | P (%) | R (%) | Params (M) | mAP@50 (%) | mAP@.5:.95 (%) |
---|---|---|---|---|---|
YOLOv8 | 85.0 | 81.6 | 11.13 | 86.2 | 60.2 |
+NAM | 86.3 | 82.3 | 11.13 | 86.7 | 61.5 |
+CBAM | 85.2 | 82.3 | 11.48 | 86.7 | 60.8 |
+ContextAggregation | 86.9 | 82.0 | 11.82 | 86.9 | 61.3 |
+ShuffleAttention | 87.2 | 82.6 | 11.13 | 87.2 | 61.2 |
Methods | RSOD P (%) | RSOD R (%) | RSOD mAP@.5:.95 (%) | TGRS-HRRSD P (%) | TGRS-HRRSD R (%) | TGRS-HRRSD mAP@.5:.95 (%) | DOTA-v1.5 P (%) | DOTA-v1.5 R (%) | DOTA-v1.5 mAP@.5:.95 (%) |
---|---|---|---|---|---|---|---|---|---|
YOLOv8 | 95.1 | 94.5 | 78.5 | 91.2 | 87.1 | 67.9 | 82.0 | 81.6 | 60.2 |
+NAM | 95.4 | 95.7 | 79.7 | 91.6 | 87.2 | 68.7 | 86.3 | 82.3 | 61.5 |
Position | Methods | RSOD Params (M) | RSOD mAP@50 (%) | RSOD mAP@.5:.95 (%) | DOTA-v1.5 Params (M) | DOTA-v1.5 mAP@50 (%) | DOTA-v1.5 mAP@.5:.95 (%) |
---|---|---|---|---|---|---|---|
None | YOLOv8 | 11.13 | 99.1 | 78.5 | 11.13 | 86.2 | 60.2 |
backbone | +SCConv | 10.36 | 97.3 | 77.7 | 10.36 | 84.9 | 58.6 |
backbone | +RepGhost | 9.59 | 97.5 | 78.2 | 9.59 | 86.6 | 60.4 |
head | +SCConv | 10.51 | 98.0 | 79.4 | 10.51 | 86.7 | 60.9 |
head | +RepGhost | 9.50 | 97.5 | 79.5 | 9.76 | 87.0 | 62.3 |
Methods | RSOD P (%) | RSOD R (%) | RSOD mAP@.5:.95 (%) | TGRS-HRRSD P (%) | TGRS-HRRSD R (%) | TGRS-HRRSD mAP@.5:.95 (%) | DOTA-v1.5 P (%) | DOTA-v1.5 R (%) | DOTA-v1.5 mAP@.5:.95 (%) |
---|---|---|---|---|---|---|---|---|---|
YOLOv8 | 95.1 | 94.5 | 78.5 | 91.2 | 87.1 | 67.9 | 85.0 | 81.6 | 60.2 |
+RepGhost | 95.8 | 95.9 | 79.5 | 91.1 | 88.0 | 68.4 | 88.1 | 82.4 | 62.3 |
Methods | RSOD mAP@50 (%) | RSOD mAP@.5:.95 (%) | TGRS-HRRSD mAP@50 (%) | TGRS-HRRSD mAP@.5:.95 (%) | DOTA-v1.5 mAP@50 (%) | DOTA-v1.5 mAP@.5:.95 (%) |
---|---|---|---|---|---|---|
YOLOv8 | 99.1 | 78.5 | 91.6 | 67.9 | 86.2 | 60.2 |
+WIoUv3 | 98.4 | 79.4 | 91.9 | 68.5 | 87.6 | 62.0 |
+GIoU | 98.8 | 78.5 | 92.0 | 68.3 | 86.4 | 60.5 |
Model | RepGhost | WIoUv3 | NAM | P (%) | R (%) | Params (M) | mAP@50 (%) | mAP@.5:.95 (%) |
---|---|---|---|---|---|---|---|---|
YOLOv8 | | | | 85.0 | 81.6 | 11.13 | 86.2 | 60.2 |
YOLOv8 | √ | | | 88.1 | 82.4 | 9.76 | 87.0 | 62.3 (+2.1) |
YOLOv8 | √ | √ | | 87.3 | 83.8 | 9.77 | 87.8 | 62.8 (+0.5) |
YOLOv8 | √ | √ | √ | 87.9 | 84.9 | 9.77 | 87.8 | 63.8 (+1.0) |
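The increments in the ablation table can be checked with a short calculation. The sketch below uses only the parameter counts and mAP@.5:.95 values from the table rows (DOTA-v1.5); the variable names, row labels, and the `summarize` helper are illustrative.

```python
# (label, Params in M, mAP@.5:.95 in %) for the three ablation rows.
baseline_params, baseline_map = 11.13, 60.2
steps = [
    ("one module", 9.76, 62.3),
    ("two modules", 9.77, 62.8),
    ("three modules", 9.77, 63.8),
]


def summarize(baseline_params, baseline_map, steps):
    """Return per-step mAP gains, total gain, and parameter reduction (%)."""
    gains, prev = [], baseline_map
    for _, _, m in steps:
        gains.append(round(m - prev, 1))
        prev = m
    final_params, final_map = steps[-1][1], steps[-1][2]
    total_gain = round(final_map - baseline_map, 1)
    reduction = round(
        100 * (baseline_params - final_params) / baseline_params, 1
    )
    return gains, total_gain, reduction


gains, total_gain, reduction = summarize(baseline_params, baseline_map, steps)
print(gains, total_gain, reduction)  # [2.1, 0.5, 1.0] 3.6 12.2
```

The per-step gains match the bracketed values in the table and sum to 3.6 points of mAP@.5:.95, obtained with about 12.2% fewer parameters than the YOLOv8 baseline.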
Dataset | Evaluation Metrics | Improved Faster R-CNN | YOLOX | YOLOv5 | YOLOv7 | YOLOv8 | RN-YOLO (Ours) |
---|---|---|---|---|---|---|---|
DOTA-v1.5 | mAP@50 (%) | 72.5 | 84.2 | 87.3 | 85.7 | 86.2 | 87.8 |
DOTA-v1.5 | mAP@.5:.95 (%) | 50.3 | 59.9 | 58.9 | 58.5 | 60.2 | 63.8 |
DOTA-v1.5 | Params (M) | 60.40 | 8.94 | 7.05 | 36.56 | 11.13 | 9.77 |
TGRS-HRRSD | mAP@50 (%) | 74.3 | 77.4 | 91.7 | 92.0 | 91.6 | 92.5 |
TGRS-HRRSD | mAP@.5:.95 (%) | 53.2 | 58.9 | 66.3 | 67.5 | 67.9 | 69.1 |
TGRS-HRRSD | Params (M) | 60.42 | 8.94 | 7.04 | 36.54 | 11.13 | 9.77 |
RSOD | mAP@50 (%) | 80.8 | 92.1 | 97.9 | 98.5 | 99.1 | 98.0 |
RSOD | mAP@.5:.95 (%) | 62.1 | 70.9 | 73.5 | 76.6 | 78.5 | 80.5 |
RSOD | Params (M) | 60.42 | 8.94 | 7.02 | 36.4 | 11.13 | 9.77 |
Categories | YOLOv5 (%) | YOLOv7 (%) | YOLOv8 (%) | RN-YOLO (Ours) (%) |
---|---|---|---|---|
ship | 66.7 | 72.4 | 65.2 | 61.2 |
bridge | 35.3 | 40.6 | 43.5 | 50.1 |
ground_track | 74.5 | 79.9 | 77.5 | 81.1 |
storage_tank | 84.2 | 84.1 | 85.4 | 88.3 |
basketball_court | 67.9 | 71.9 | 69.0 | 63.9 |
tennis_court | 87.5 | 87.7 | 88.2 | 89.2 |
airplane | 86.2 | 87.6 | 85.5 | 83.5 |
baseball_diamond | 63.4 | 63.1 | 65.8 | 70.3 |
harbor | 73.9 | 71.3 | 80.6 | 82.8 |
vehicle | 74.6 | 71.9 | 73.7 | 72.9 |
crossroad | 53.5 | 52.7 | 54.9 | 58.1 |
T_junction | 45.1 | 44.3 | 43.3 | 42.6 |
parking_lot | 48.1 | 49.4 | 49.6 | 55.2 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, K.; Zhou, H.; Wu, H.; Yuan, G. RN-YOLO: A Small Target Detection Model for Aerial Remote-Sensing Images. Electronics 2024, 13, 2383. https://doi.org/10.3390/electronics13122383