A Multi-Feature Fusion and Attention Network for Multi-Scale Object Detection in Remote Sensing Images
Abstract
1. Introduction
- (1) To enhance feature extraction for multi-scale objects in remote sensing images, MFANet used the structural re-parameterization technique of the VGG-style network RepVGG to build a new backbone, improving multi-object detection accuracy without increasing computation time.
- (2) Detail enhancement channels were introduced into the path aggregation feature pyramid network (PAFPN) to express more object information. Using residual connections, a new multi-branch convolutional module (Res-RFBs) was formed to improve the recognition rate of multi-scale objects in remote sensing images. The coordinate attention (CA) mechanism was introduced to reduce interference from background information and to strengthen the network's perception of remote sensing objects.
- (3) To address the baseline's shortcomings in object localization and identification, generalized intersection over union (GIoU) was used to optimize the loss, speed up model convergence, and reduce the target miss rate.
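The GIoU term named in contribution (3) can be illustrated with a short sketch. It follows the standard definition of Rezatofighi et al. (listed in the references) for axis-aligned boxes; the function names `giou` and `giou_loss` and the `(x1, y1, x2, y2)` box layout are assumptions made for this example, not the paper's implementation.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2 for both boxes (non-degenerate boxes).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Area of the smallest box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def giou_loss(pred_box, target_box):
    """Loss in [0, 2): 0 for a perfect match, above 1 for disjoint boxes."""
    return 1.0 - giou(pred_box, target_box)
```

Unlike plain IoU, which is zero for every pair of non-overlapping boxes, GIoU keeps decreasing as disjoint boxes move apart, so the loss still distinguishes near misses from far misses.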
2. Methods
2.1. The Structure of the Network
2.2. RepVGG Block
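As a rough illustration of the structural re-parameterization idea behind RepVGG (Ding et al., listed in the references), the sketch below fuses a 3×3 branch, a 1×1 branch, and an identity branch, each followed by inference-mode batch normalization, into a single 3×3 convolution. It is a single-channel toy in pure Python; all helper names and the numeric values are assumptions chosen for the example, and the activation applied after the sum is left out because it is unaffected by the fusion.

```python
import math


def conv3x3(image, kernel, bias=0.0):
    """Single-channel 3x3 convolution, stride 1, zero padding 1."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = bias
            for di in range(3):
                for dj in range(3):
                    y, x = i + di - 1, j + dj - 1
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * image[y][x]
            row.append(acc)
        out.append(row)
    return out


def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm with fixed (scalar) statistics."""
    scale = gamma / math.sqrt(var + eps)
    return [[(v - mean) * scale + beta for v in row] for row in x]


def fold_bn(kernel, gamma, beta, mean, var, eps=1e-5):
    """Fold a batch norm into the bias-free convolution that precedes it."""
    scale = gamma / math.sqrt(var + eps)
    return [[k * scale for k in row] for row in kernel], beta - mean * scale


def pad_1x1(k):
    """Embed a 1x1 kernel value at the centre of a 3x3 kernel."""
    return [[0.0, 0.0, 0.0], [0.0, k, 0.0], [0.0, 0.0, 0.0]]


def multi_branch(x, k3, k1, bn3, bn1, bn_id):
    """Training-time form: three parallel branches summed element-wise."""
    y3 = batch_norm(conv3x3(x, k3), *bn3)
    y1 = batch_norm(conv3x3(x, pad_1x1(k1)), *bn1)
    yid = batch_norm(x, *bn_id)
    return [[a + b + c for a, b, c in zip(r3, r1, rid)]
            for r3, r1, rid in zip(y3, y1, yid)]


def reparameterize(k3, k1, bn3, bn1, bn_id):
    """Inference-time form: one 3x3 kernel and one bias."""
    f3, b3 = fold_bn(k3, *bn3)
    f1, b1 = fold_bn(pad_1x1(k1), *bn1)
    fid, bid = fold_bn(pad_1x1(1.0), *bn_id)  # identity is a 1x1 conv with weight 1
    kernel = [[f3[i][j] + f1[i][j] + fid[i][j] for j in range(3)] for i in range(3)]
    return kernel, b3 + b1 + bid


# Hypothetical parameters: each BN tuple is (gamma, beta, mean, var).
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
k1 = 0.7
bn3, bn1, bn_id = (1.2, 0.1, 0.5, 2.0), (0.8, -0.3, 1.0, 0.5), (1.0, 0.2, 4.0, 3.0)

reference = multi_branch(x, k3, k1, bn3, bn1, bn_id)
fused_kernel, fused_bias = reparameterize(k3, k1, bn3, bn1, bn_id)
fused = conv3x3(x, fused_kernel, fused_bias)
max_diff = max(abs(a - b) for ra, rb in zip(reference, fused) for a, b in zip(ra, rb))
```

In this toy setting `max_diff` stays at floating-point rounding level, which is the property that lets a multi-branch block be trained and then deployed as a plain 3×3 convolution.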
2.3. Improved Feature Detection
2.4. Coordinate Attention Mechanism
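The data flow of coordinate attention (Hou et al., listed in the references) can be sketched in a few lines of pure Python. This is a simplified sketch under stated assumptions: the batch normalization and h-swish non-linearity of the original design are replaced by a plain ReLU, weights are random, and the names `coordinate_attention`, `mix_channels`, `w_shared`, `w_h`, and `w_w` are hypothetical. It is not the configuration used in MFANet.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def mix_channels(feat, weight):
    """1x1 convolution: feat[c][p] with weight[out_c][in_c] -> out[out_c][p]."""
    n_pos = len(feat[0])
    return [[sum(w * feat[c][p] for c, w in enumerate(row)) for p in range(n_pos)]
            for row in weight]


def coordinate_attention(x, w_shared, w_h, w_w):
    """Apply coordinate attention to x[c][h][w]; the output has the same shape."""
    n_c, n_h, n_w = len(x), len(x[0]), len(x[0][0])

    # Directional average pooling: one descriptor per row and one per column.
    pool_h = [[sum(x[c][h]) / n_w for h in range(n_h)] for c in range(n_c)]
    pool_w = [[sum(x[c][h][w] for h in range(n_h)) / n_h for w in range(n_w)]
              for c in range(n_c)]

    # Concatenate along the spatial axis, then a shared 1x1 conv and ReLU.
    z = [pool_h[c] + pool_w[c] for c in range(n_c)]
    f = [[max(0.0, v) for v in row] for row in mix_channels(z, w_shared)]

    # Split back into the two directions and map to per-channel gates in (0, 1).
    f_h = [row[:n_h] for row in f]
    f_w = [row[n_h:] for row in f]
    g_h = [[sigmoid(v) for v in row] for row in mix_channels(f_h, w_h)]
    g_w = [[sigmoid(v) for v in row] for row in mix_channels(f_w, w_w)]

    # Re-weight every position by its row gate and its column gate.
    return [[[x[c][h][w] * g_h[c][h] * g_w[c][w] for w in range(n_w)]
             for h in range(n_h)] for c in range(n_c)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes: 4 channels, a 3x5 feature map, 2 reduced channels (all assumed).
rng = random.Random(0)
C, H, W, M = 4, 3, 5, 2
x = [random_matrix(H, W, rng) for _ in range(C)]
out = coordinate_attention(
    x,
    random_matrix(M, C, rng),  # shared 1x1 conv, C -> M
    random_matrix(C, M, rng),  # height branch, M -> C
    random_matrix(C, M, rng),  # width branch, M -> C
)
```

Because the two gates are indexed by row and by column, the re-weighting retains positional information along each axis, which a single global-pooling channel attention would discard.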
2.5. Loss Function Improvement
3. Experiment
3.1. Experimental Environment
3.2. Data Set
3.3. Evaluation Metrics
3.4. Ablation Experiment
3.5. Comparison with Other Algorithms
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Li, W. Detection of Ship in Optical Remote Sensing Image of Median-Low Resolution. Master’s Thesis, National University of Defense Technology, Changsha, China, November 2008.
- Wang, Y.; Ma, L.; Tian, Y. State-of-the-art of Ship Detection and Recognition in Optical Remotely Sensed Imagery. Acta Autom. Sin. 2011, 37, 1029–1039.
- Rajendran, G.B.; Kumarasamy, U.M.; Zarro, C.; Divakarachari, P.B.; Ullo, S.L. Land-Use and Land-Cover Classification Using a Human Group-Based Particle Swarm Optimization Algorithm with an LSTM Classifier on Hybrid Pre-Processing Remote-Sensing Images. Remote Sens. 2020, 12, 4135.
- Zhang, W.; Zhang, B.; Zhu, W.; Tang, X.; Li, F.; Liu, X.; Yu, Q. Comprehensive assessment of MODIS-derived near-surface air temperature using wide elevation-spanned measurements in China. Sci. Total Environ. 2021, 800, 149535.
- Nie, G.; Huang, H. A survey of object detection in optical remote sensing images. Acta Autom. Sin. 2021, 47, 1749–1768.
- Parameshachari, B.; Gurumoorthy, S.; Frnda, J.; Nelson, S.C.; Balmuri, K.R. Cognitive linear discriminant regression computing technique for HTTP video services in SDN networks. Soft Comput. 2022, 26, 621–633.
- Wang, R.; Wu, X.; Kittler, J. SymNet: A simple symmetric positive definite manifold deep learning method for image set classification. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 2208–2222.
- Gao, X.; Niu, S.; Wei, D.; Liu, X.; Wang, T.; Zhu, F.; Dong, J.; Sun, Q. Joint Metric Learning-Based Class-Specific Representation for Image Set Classification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–15.
- Parameshachari, B.; Panduranga, H. Medical image encryption using SCAN technique and chaotic tent map system. In Recent Advances in Artificial Intelligence and Data Engineering; Springer: Singapore, 2022; pp. 181–193.
- Zhou, F.; Jin, L.; Dong, J. Review of Convolutional Neural Network. Chin. J. Comput. 2017, 40, 1229–1251.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Zhu, M.; Xu, Y.; Ma, S.; Li, S.; Ma, H.; Han, Y. Effective airplane detection in remote sensing images based on multilayer feature fusion and improved nonmaximal suppression algorithm. Remote Sens. 2019, 11, 1062.
- Shivappriya, S.N.; Priyadarsini, M.J.P.; Stateczny, A.; Puttamadappa, C.; Parameshachari, B.D. Cascade Object Detection and Remote Sensing Object Detection Method Based on Trainable Activation Function. Remote Sens. 2021, 13, 200.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430.
- Laban, N.; Abdellatif, B.; Ebeid, H.M.; Shedeed, H.A.; Tolba, M.F. Convolutional Neural Network with Dilated Anchors for Object Detection in Very High Resolution Satellite Images. In Proceedings of the International Conference on Computer Engineering and Systems (ICCES), Cairo, Egypt, 17 December 2019; pp. 34–39.
- Hong, Z.; Yang, T.; Tong, X.; Zhang, Y.; Jiang, S.; Zhou, R.; Han, Y.; Wang, J.; Yang, S.; Liu, S. Multi-scale ship detection from SAR and optical imagery via a more accurate YOLOv3. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 6083–6101.
- Zhou, H.; Guo, W. Improved YOLOv5 Network in Application of Remote Sensing Image Object Detection. Remote Sens. Inf. 2022, 37, 23–30.
- Wang, X.; Li, W.; Guo, W.; Cao, K. SPB-YOLO: An Efficient Real-Time Detector For Unmanned Aerial Vehicle Images. In Proceedings of the International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Jeju Island, Republic of Korea, 13–16 April 2021; pp. 099–104.
- Han, X.; Li, F. Remote Sensing Small Object Detection Based on Cross-Layer Attention Enhancement. Laser Optoelectron. Prog. 2022, pp. 1–19. Available online: https://kns.cnki.net/kcms/detail/31.1690.TN.20220722.2132.050.html (accessed on 17 February 2023).
- Wu, Q.; Zhang, B.; Xu, C.; Zhang, H.; Wang, C. Dense Oil Tank Detection and Classification via YOLOX-TR Network in Large-Scale SAR Images. Remote Sens. 2022, 14, 3246.
- Yang, L.; Yuan, G.; Zhou, H.; Liu, H.; Chen, J.; Wu, H. RS-YOLOX: A High-Precision Detector for Object Detection in Satellite Remote Sensing Images. Appl. Sci. 2022, 12, 8707.
- Guo, Q.; Yuan, C. Leveraging Spatial-Semantic Information in Object Detection and Segmentation. Ruan Jian Xue Bao/J. Softw. 2022, pp. 1–13. Available online: http://www.jos.org.cn/jos/article/abstract/6509 (accessed on 17 February 2023). (In Chinese).
- Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. RepVGG: Making VGG-style ConvNets Great Again. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13728–13737.
- Shang, W.; Sohn, K.; Almeida, D.; Lee, H. Understanding and improving convolutional neural networks via concatenated rectified linear units. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 2217–2225.
- Ramachandran, P.; Zoph, B.; Le, Q. Swish: A Self-Gated Activation Function. arXiv 2017, arXiv:1710.05941.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Liu, S.; Huang, D.; Wang, Y. Receptive field block net for accurate and fast object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 385–400. Available online: https://doi.org/10.48550/arXiv.1711.07767 (accessed on 28 August 2022).
- Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13708–13717.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
- Cheng, G.; Han, J. A survey on object detection in optical remote sensing images. ISPRS J. Photogramm. Remote Sens. 2016, 117, 11–28.
- Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 159, 296–307.
- Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects. In Proceedings of the International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8231–8240.
- Fan, X.; Yan, W.; Shi, P.; Zhang, X. Remote sensing image target detection based on a multi-scale deep feature fusion network. Natl. Remote Sens. Bull. 2022, 26, 2292–2303.
- Zhang, J.; Wu, X.; Zhao, X.; Zhuo, L.; Zhang, J. Scene Constrained Object Detection Method in High-Resolution Remote Sensing Images by Relation-Aware Global Attention. J. Electron. Inf. Technol. 2022, 44, 2924–2931.
- Xue, J.; Zhu, J.; Zhang, J.; Li, X.; Dou, S.; Mi, L.; Li, Z.; Yuan, X.; Li, C. Object Detection in Optical Remote Sensing Images Based on FFC-SSD Model. Acta Opt. Sin. 2022, 42, 138–148.
- Cheng, G.; Wang, J.; Li, K.; Xie, X.; Lang, C.; Yao, Y.; Han, J. Anchor-Free Oriented Proposal Generator for Object Detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5625411.
- Huang, Z.; Li, W.; Xia, X.; Wang, H.; Jie, F.; Tao, R. LO-Det: Lightweight Oriented Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15.
- Li, W.; Chen, Y.; Hu, K.; Zhu, J. Oriented reppoints for aerial object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 1829–1838.
- Xu, T.; Sun, X.; Diao, W.; Zhao, L.; Fu, K.; Wang, K. ASSD: Feature Aligned Single-Shot Detection for Multiscale Objects in Aerial Imagery. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5607117.
- Yao, Y.; Cheng, G.; Xie, X.; Han, J. Optical remote sensing image object detection based on multi-resolution feature fusion. Natl. Remote Sens. Bull. 2021, 25, 1124–1137.
- Yang, X.; Yan, J.; Liao, W.; Yang, X.; Tang, J.; He, T. SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2384–2399.
- Zhou, L.; Zheng, C.; Yan, H.; Zuo, X.; Liu, Y.; Qiao, B.; Yang, Y. RepDarkNet: A Multi-Branched Detector for Small-Target Detection in Remote Sensing Images. ISPRS Int. J. Geo-Inf. 2022, 11, 158.
- Ye, Y.; Ren, Y.; Gao, X.; Wang, J. Remote sensing image target detection based on improved YOLOv4. J. Optoelectron. Laser 2022, 33, 607–613.
- Zhu, F.; Gao, J.; Yang, J.; Ye, N. Neighborhood linear discriminant analysis. Pattern Recognit. 2022, 123, 108422.
- Zhu, F.; Ning, Y.; Chen, X.; Zhao, Y.; Gang, Y. On removing potential redundant constraints for SVOR learning. Appl. Soft Comput. 2021, 102, 106941.
Location | P (%) | R (%) | mAP (%) | FPS (f/s) |
RepVGG_1 (ReLU) | 91.55 | 86.40 | 90.14 | 45.64 |
RepVGG_1 | 91.94 | 87.20 | 92.19 | 46.89 |
RepVGG_2 | 92.91 | 88.07 | 93.07 | 46.37 |
RepVGG_3 | 91.03 | 90.58 | 93.34 | 48.39 |
RepVGG_4 | 91.17 | 91.93 | 92.99 | 47.45 |
RepVGG_5 | 90.67 | 89.78 | 93.53 | 48.50 |
RepVGG | Q+ Res-RFBs | CA | GIoU | P (%) | R (%) | mAP (%) | FPS (f/s) |
- | - | - | - | 89.68 | 90.37 | 92.23 | 48.02 |
√ | - | - | - | 90.67 | 89.78 | 93.53 | 48.50 |
- | √ | - | - | 93.45 | 91.66 | 94.98 | 33.61 |
- | - | √ | - | 90.51 | 93.28 | 94.14 | 41.26 |
- | - | - | √ | 93.12 | 90.65 | 93.56 | 35.61 |
√ | √ | √ | √ | 94.09 | 94.94 | 96.63 | 30.09 |
Method | mAP (%) | FPS (f/s) |
Faster RCNN | 79.48 | 10.59 |
Laban’s [20] | 78.00 | - |
YOLOv4-tiny | 84.14 | 81.36 |
YOLOv5 | 89.14 | 54.91 |
SCRDet [37] | 91.75 | - |
YOLOX-s | 92.23 | 48.02 |
Fan’s [38] | 93.40 | - |
Zhang’s [39] | 95.59 | 30.07 |
Xue’s [40] | 95.70 | - |
MFANet | 96.63 | 30.09 |
Method | mAP (%) | FPS (f/s) |
Faster RCNN | 57.35 | 10.42 |
AOPG [41] | 64.41 | - |
LO-Det [42] | 65.85 | 60.03 |
Li’s [43] | 66.71 | - |
YOLOv4-tiny | 66.77 | 56.68 |
ASSD [44] | 71.80 | 21.00 |
Yao’s [45] | 75.80 | - |
SCRDet++ [46] | 77.80 | - |
YOLOv5 | 80.96 | 51.41 |
SPB-YOLO [23] | 81.10 | - |
YOLOX-s | 82.23 | 47.99 |
Zhou’s [47] | 84.30 | - |
YOLOX [24] | 85.70 | - |
Ye’s [48] | 86.55 | - |
MFANet | 87.88 | 29.45 |
Method | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | mAP |
YOLOv4-tiny | 0.91 | 0.98 | 0.99 | 0.66 | 1.00 | 0.71 | 0.99 | 0.76 | 0.82 | 0.59 | 0.84 |
YOLOv5 | 1.00 | 1.00 | 0.98 | 0.83 | 1.00 | 0.99 | 0.98 | 0.91 | 0.90 | 0.33 | 0.89 |
YOLOX-s | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 0.94 | 0.89 | 0.40 | 0.92 |
Fan’s | 1.00 | 1.00 | 0.91 | 0.91 | 1.00 | 0.90 | 0.94 | 0.91 | 0.90 | 0.91 | 0.93 |
Zhang’s | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.90 | 1.00 | 0.90 | 0.87 | 0.90 | 0.96 |
Xue’s | 0.90 | 0.96 | 1.00 | 1.00 | 1.00 | 0.88 | 1.00 | 0.96 | 0.89 | 0.99 | 0.96 |
MFANet | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.97 | 0.94 | 0.77 | 0.96 |
Method | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 |
ASSD | 0.86 | 0.82 | 0.76 | 0.90 | 0.41 | 0.78 | 0.65 | 0.67 | 0.62 | 0.81 |
Yao’s | 0.91 | 0.75 | 0.93 | 0.83 | 0.47 | 0.92 | 0.63 | 0.68 | 0.61 | 0.80 |
SCRDet++ | 0.81 | 0.88 | 0.80 | 0.90 | 0.58 | 0.81 | 0.75 | 0.90 | 0.83 | 0.85 |
YOLOv5 | 0.96 | 0.86 | 0.97 | 0.86 | 0.48 | 0.86 | 0.75 | 0.86 | 0.77 | 0.71 |
YOLOX-s | 0.96 | 0.88 | 0.96 | 0.83 | 0.48 | 0.78 | 0.78 | 0.94 | 0.79 | 0.83 |
Zhou’s | 0.98 | 0.90 | 0.95 | 0.93 | 0.62 | 0.91 | 0.68 | 0.96 | 0.86 | 0.87 |
MFANet | 0.97 | 0.93 | 0.97 | 0.86 | 0.59 | 0.88 | 0.87 | 0.97 | 0.90 | 0.86 |
Method | mAP | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | C20 |
ASSD | 0.71 | 0.79 | 0.62 | 0.58 | 0.85 | 0.77 | 0.65 | 0.88 | 0.62 | 0.45 | 0.76 |
Yao’s | 0.76 | 0.83 | 0.57 | 0.66 | 0.80 | 0.93 | 0.81 | 0.89 | 0.63 | 0.73 | 0.78 |
SCRDet++ | 0.78 | 0.84 | 0.63 | 0.67 | 0.73 | 0.79 | 0.70 | 0.90 | 0.71 | 0.59 | 0.90 |
YOLOv5 | 0.81 | 0.92 | 0.67 | 0.71 | 0.95 | 0.89 | 0.86 | 0.96 | 0.63 | 0.60 | 0.93 |
YOLOX-s | 0.82 | 0.90 | 0.69 | 0.71 | 0.96 | 0.96 | 0.87 | 0.95 | 0.65 | 0.62 | 0.92 |
Zhou’s | 0.84 | 0.91 | 0.63 | 0.73 | 0.96 | 0.92 | 0.90 | 0.96 | 0.57 | 0.71 | 0.94 |
MFANet | 0.87 | 0.93 | 0.75 | 0.79 | 0.97 | 0.97 | 0.92 | 0.97 | 0.79 | 0.73 | 0.94 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cheng, Y.; Wang, W.; Zhang, W.; Yang, L.; Wang, J.; Ni, H.; Guan, T.; He, J.; Gu, Y.; Tran, N.N. A Multi-Feature Fusion and Attention Network for Multi-Scale Object Detection in Remote Sensing Images. Remote Sens. 2023, 15, 2096. https://doi.org/10.3390/rs15082096