A Lightweight Object Detection Algorithm for Remote Sensing Images Based on Attention Mechanism and YOLOv5s
Abstract
1. Introduction
- (1) Depthwise convolution replaces the standard convolution in the decoupled head module to construct a new detection head, the DD-head, which mitigates the negative impact of the conflict between the classification and regression tasks while reducing the parameter count of the decoupled head.
- (2) Following the design principle of the GS bottleneck, the CBS module in the SPPCSPC module is replaced with the GSConv module to obtain a lightweight SPPCSPG module, which is introduced into the backbone to optimize the YOLOv5s network model.
- (3) The effect of embedding the SA module in the backbone, neck, and head of the network is studied, and the SA module is ultimately embedded in the head to enhance the spatial and channel attention of the feature map, thereby improving the accuracy of multi-scale object detection.
- (4) The CARAFE module replaces the nearest-neighbor interpolation up-sampling module; it reassembles feature points with similar semantic information in a content-aware manner and aggregates features over a larger receptive field to perform the up-sampling operation.
- (5) The Conv module in the variety of view-GS cross-stage partial (VoV-GSCSP) module is replaced by the GSConv module, yielding a new VoV-GSCSP module with fewer parameters. The GSConv module and the improved VoV-GSCSP module are embedded in the neck to maintain the model’s detection accuracy while reducing the parameter count.
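The parameter saving behind contribution (1) can be illustrated with a short calculation. The sketch below is not the DD-head itself: the exact layer layout of the head is not given in this outline, so the channel width (256) and kernel size (3) are illustrative assumptions, and the helper names `conv_params` and `depthwise_params` are hypothetical. It only compares the weight count of a standard convolution with that of a depthwise convolution, optionally followed by a 1×1 pointwise convolution.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1, bias: bool = False) -> int:
    """Count the learnable parameters of a 2-D convolution layer."""
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must both be divisible by groups")
    # Each output channel sees only c_in / groups input channels.
    weights = (c_in // groups) * c_out * k * k
    return weights + (c_out if bias else 0)


def depthwise_params(channels: int, k: int) -> int:
    """A depthwise convolution is a grouped convolution with groups == channels."""
    return conv_params(channels, channels, k, groups=channels)


# Assumed values for illustration only (not taken from the paper).
channels, kernel = 256, 3

standard = conv_params(channels, channels, kernel)   # 3*3*256*256 weights
depthwise = depthwise_params(channels, kernel)       # 3*3*256 weights
pointwise = conv_params(channels, channels, 1)       # 1*1*256*256 weights
separable = depthwise + pointwise                    # depthwise + pointwise pair

print(f"standard 3x3 conv      : {standard}")
print(f"depthwise 3x3 conv     : {depthwise}")
print(f"depthwise + 1x1 conv   : {separable}")
print(f"standard / depthwise   : {standard // depthwise}x")
```

With these assumed sizes the depthwise layer needs a factor of `channels` fewer weights than the standard layer, which is the kind of reduction the DD-head relies on; the real saving depends on how many convolutions in the head are replaced.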
2. Related Work
2.1. Object Detection Algorithms for Remote Sensing Images
2.2. Attention Mechanism
2.3. Multi-Scale Feature Fusion
3. YOLOv5 Algorithm
3.1. Input
3.2. Backbone
3.3. Neck
3.4. Head
3.5. Loss Function
4. Improved YOLOv5s Algorithm
4.1. Shuffle Attention Module
4.2. DD-Head Module
4.3. Content-Aware Reassembly of the Features Module
4.4. GSConv Module
4.5. SPPCSPG Module
5. Experimental Results and Analysis
5.1. Experimental Platform and Dataset
5.2. Evaluation Metrics
5.3. Experimental Results and Analysis
5.3.1. Performance Evaluation of the SA Module Embedded Model
5.3.2. Effect Experiment of the DD-Head Module
5.3.3. Effect Experiment of the SPPCSPG Module
5.3.4. Performance Comparison in the RSOD Dataset
5.3.5. Performance Comparison in the DIOR Dataset
5.3.6. Performance Comparison in the PASCAL VOC Dataset
5.3.7. Performance Comparison in the MS COCO Dataset
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
- Zhang, M.; Xu, S.; Song, W.; He, Q.; Wei, Q. Lightweight Underwater Object Detection Based on YOLO v4 and Multi-Scale Attentional Feature Fusion. Remote Sens. 2021, 13, 4706. [Google Scholar] [CrossRef]
- He, X.; Song, X. Improved YOLOv4-Tiny lightweight target detection algorithm. J. Front. Comput. Sci. Technol. 2023, 1–17. [Google Scholar]
- Junayed, M.S.; Islam, M.B.; Imani, H.; Aydin, T. PDS-Net: A novel point and depth-wise separable convolution for real-time object detection. Int. J. Multimed. Inf. Retr. 2022, 11, 171–188. [Google Scholar] [CrossRef]
- Wang, K.; Wang, Y.; Zhang, S.; Tian, Y.; Li, D. SLMS-SSD: Improving the balance of semantic and spatial information in object detection. Expert Syst. Appl. 2022, 206, 117682. [Google Scholar] [CrossRef]
- Kong, T.; Sun, F.; Yao, A.; Liu, H.; Lu, M.; Chen, Y. RON: Reverse Connection with Objectness Prior Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5244–5252. [Google Scholar]
- Zhou, P.; Ni, B.; Geng, C.; Hu, J.; Xu, Y. Scale-Transferrable Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 528–537. [Google Scholar]
- Qu, Z.; Gao, L.; Wang, S.; Yin, H.; Yi, T. An improved YOLOv5 method for large objects detection with multi-scale feature cross-layer fusion network. Image Vis. Comput. 2022, 125, 104518. [Google Scholar] [CrossRef]
- Tu, X.; Bao, X.; Wu, B.; Jin, Y.; Zhang, Q. Object detection algorithm for 3D coordinate attention path aggregation network. J. Front. Comput. Sci. Technol. 2023, 1–16. [Google Scholar]
- Yang, J.; Hong, L.; Du, Y.; Mao, Y.; Liu, Q. A Lightweight Object Detection Algorithm Based on Improved YOLOv5s. Electron. Opt. Control 2023, 30, 24–30. [Google Scholar]
- Song, Z.; Xiao, B.; Ai, Y.; Zheng, L.; Tie, J. Improved lightweight YOLOv4 target detection algorithm. Electron. Meas. Technol. 2022, 45, 142–152. [Google Scholar]
- Hu, J.; Wang, Y.; Cheng, S.; Liu, J.; Kang, J.; Yang, W. SFGNet detecting objects via spatial fine-grained feature and enhanced RPN with spatial context. Syst. Sci. Control Eng. 2022, 10, 388–406. [Google Scholar] [CrossRef]
- Dai, J.F.; Li, Y.; He, K.M.; Sun, J. R-FCN: Object Detection via Region-based Fully Convolutional Networks. In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS), Barcelona, Spain, 5–10 December 2016; Volume 29. [Google Scholar]
- Bacea, D.; Oniga, F. Single stage architecture for improved accuracy real-time object detection on mobile devices. Image Vis. Comput. 2023, 130, 104613. [Google Scholar] [CrossRef]
- Wang, G.; Ding, H.; Yang, Z.; Li, B.; Wang, Y.; Bao, L. TRC-YOLO: A real-time detection method for lightweight targets based on mobile devices. IET Comput. Vis. 2022, 16, 126–142. [Google Scholar] [CrossRef]
- Wang, G.; Ding, H.; Li, B.; Nie, R.; Zhao, Y. Trident-YOLO: Improving the precision and speed of mobile device object detection. IET Image Process. 2022, 16, 145–157. [Google Scholar] [CrossRef]
- Xiao, J.; Guo, H.; Zhou, J.; Zhao, T.; Yu, Q.; Chen, Y.; Wang, Z. Tiny object detection with context enhancement and feature purification. Expert Syst. Appl. 2023, 211, 118665. [Google Scholar] [CrossRef]
Method | Data Size | Param. | GFLOPs | Precision | Recall | mAP@0.5 | FPS |
---|---|---|---|---|---|---|---|
YOLOv5s | 640 × 640 | 7.02 M | 15.8 | 0.942 | 0.932 | 0.950 | 119.0 |
YOLOv5s-SA-A | 640 × 640 | 7.02 M | 15.8 | 0.930 | 0.920 | 0.948 | 105.8 |
YOLOv5s-SA-B | 640 × 640 | 7.02 M | 15.8 | 0.937 | 0.935 | 0.948 | 112.3 |
YOLOv5s-SA-C | 640 × 640 | 7.02 M | 15.8 | 0.939 | 0.930 | 0.952 | 113.9 |
Method | Data Size | Param. | GFLOPs | Precision | Recall | mAP@0.5 | FPS |
---|---|---|---|---|---|---|---|
YOLOv5s | 640 × 640 | 7.02 M | 15.8 | 0.942 | 0.932 | 0.950 | 119.0 |
YOLOv5s + Decoupled head | 640 × 640 | 14.33 M | 56.2 | 0.937 | 0.969 | 0.956 | 94.9 |
YOLOv5s + DD-head | 640 × 640 | 7.28 M | 16.7 | 0.960 | 0.929 | 0.955 | 106.8 |
Method | Data Size | Param. | GFLOPs | Precision | Recall | mAP@0.5 | FPS |
---|---|---|---|---|---|---|---|
YOLOv5s | 640 × 640 | 7.02 M | 15.8 | 0.942 | 0.932 | 0.950 | 119.0 |
YOLOv5s + SPPCSPC | 640 × 640 | 13.45 M | 20.9 | 0.947 | 0.931 | 0.953 | 117.0 |
YOLOv5s + SPPCSPG | 640 × 640 | 9.95 M | 18.1 | 0.942 | 0.954 | 0.955 | 113.1 |
Method | Param. | GFLOPs | mAP@0.5 | Aircraft | Oil Tank | Playground | Overpass | FPS |
---|---|---|---|---|---|---|---|---|
YOLOv5s | 7.02 M | 15.8 | 0.950 | 0.983 | 0.986 | 0.839 | 0.994 | 119.0 |
Ours | 9.67 M | 18.4 | 0.964 | 0.981 | 0.987 | 0.893 | 0.992 | 88.8 |
Method | mAP | AL | AT | BF | BC | B | C | D | ESA | ETS | GC |
---|---|---|---|---|---|---|---|---|---|---|---|
HawkNet [119] | 72.0 | 65.7 | 84.2 | 76.1 | 87.4 | 45.3 | 79.0 | 64.5 | 82.8 | 72.4 | 82.5 |
CANet [120] | 74.3 | 70.3 | 82.4 | 72.0 | 87.8 | 55.7 | 79.9 | 67.7 | 83.5 | 77.2 | 77.3 |
Yao et al. [121] | 75.8 | 91.0 | 74.5 | 93.3 | 83.2 | 47.4 | 91.9 | 63.3 | 68.0 | 61.4 | 80.0 |
MFPNet [122] | 71.2 | 76.6 | 83.4 | 80.6 | 82.1 | 44.3 | 75.6 | 68.5 | 85.9 | 63.9 | 77.3 |
FSoD-Net [123] | 71.8 | 88.9 | 66.9 | 86.8 | 90.2 | 45.5 | 79.6 | 48.2 | 86.9 | 75.5 | 67.0 |
ASDN [124] | 66.9 | 63.9 | 73.8 | 71.8 | 81 | 46.3 | 73.4 | 56.3 | 73.4 | 66.2 | 74.7 |
MSFC-Net [125] | 70.1 | 85.8 | 76.2 | 74.4 | 90.1 | 44.2 | 78.1 | 55.5 | 60.9 | 59.5 | 76.9 |
Xue et al. [126] | 80.5 | 95.2 | 84.2 | 94.8 | 85.2 | 54.0 | 90.5 | 71.0 | 75.3 | 70.7 | 82.0 |
DFPN-YOLO [127] | 69.33 | 80.2 | 76.8 | 72.7 | 89.1 | 43.4 | 76.9 | 72.3 | 59.8 | 56.4 | 74.3 |
AC-YOLO [128] | 77.1 | 93.1 | 80.9 | 79.9 | 84.4 | 76.0 | 81.7 | 77.1 | 67.6 | 70.0 | 66.7 |
SCRDet++ [129] | 75.1 | 71.9 | 85.0 | 79.5 | 88.9 | 52.3 | 79.1 | 77.6 | 89.5 | 77.8 | 84.2 |
MSSDet [130] | 76.9 | 70.7 | 88.6 | 81.8 | 90.4 | 56.5 | 82.5 | 73.0 | 90.1 | 78.6 | 86.6 |
Gao et al. [131] | 72.5 | 78.1 | 83.9 | 73.0 | 89.0 | 48.2 | 79.4 | 65.6 | 63.9 | 61.9 | 80.6 |
MDCT [132] | 80.5 | 92.5 | 85.0 | 93.5 | 84.7 | 53.7 | 90.2 | 74.3 | 79.9 | 68.2 | 68.6 |
YOLOv5s | 80.4 | 87.2 | 86.9 | 86.2 | 92.3 | 55.5 | 83.0 | 72.6 | 91.1 | 83.0 | 81.6 |
Ours | 81.6 | 87.9 | 91.1 | 84.9 | 91.7 | 55.8 | 80.7 | 78.9 | 92.8 | 82.6 | 86.6 |
Method | mAP | GTF | HB | O | S | SD | ST | TC | TS | V | W |
HawkNet [119] | 72.0 | 74.7 | 50.2 | 59.6 | 89.7 | 66.0 | 70.8 | 87.2 | 61.4 | 52.8 | 88.2 |
CANet [120] | 74.3 | 83.6 | 56.0 | 63.6 | 81.0 | 79.8 | 70.8 | 88.2 | 67.6 | 51.2 | 89.6 |
Yao et al. [121] | 75.8 | 82.8 | 57.4 | 65.8 | 80.0 | 92.5 | 81.1 | 88.7 | 63.0 | 73.0 | 78.1 |
MFPNet [122] | 71.2 | 77.2 | 62.1 | 58.8 | 77.2 | 76.8 | 60.3 | 86.4 | 64.5 | 41.5 | 80.2 |
FSoD-Net [123] | 71.8 | 77.3 | 53.6 | 59.7 | 78.3 | 69.9 | 75.0 | 91.4 | 52.3 | 52.0 | 90.6 |
ASDN [124] | 66.9 | 75.2 | 51.1 | 58.4 | 76.2 | 67.4 | 60.2 | 81.4 | 58.7 | 45.8 | 83.1 |
MSFC-Net [125] | 70.1 | 73.7 | 49.6 | 57.2 | 89.6 | 69.2 | 76.5 | 86.7 | 51.8 | 55.2 | 84.3 |
Xue et al. [126] | 80.5 | 82.1 | 70.6 | 67.3 | 95.0 | 94.3 | 83.8 | 91.6 | 61.2 | 79.8 | 81.8 |
DFPN-YOLO [127] | 69.33 | 71.6 | 63.1 | 58.7 | 81.5 | 40.1 | 74.2 | 85.8 | 73.6 | 49.7 | 86.5 |
AC-YOLO [128] | 77.1 | 75.7 | 75.5 | 76.7 | 87.0 | 65.8 | 70.1 | 88.7 | 63.5 | 81.2 | 80.5 |
SCRDet++ [129] | 75.1 | 83.1 | 64.2 | 65.6 | 71.3 | 76.5 | 64.5 | 88.0 | 70.9 | 47.1 | 85.1 |
MSSDet [130] | 76.9 | 85.6 | 63.5 | 66.5 | 82.5 | 82.0 | 63.3 | 88.7 | 71.7 | 46.7 | 89.2 |
Gao et al. [131] | 72.5 | 76.6 | 63.5 | 61.6 | 89.6 | 68.7 | 76.4 | 87.0 | 66.4 | 57.0 | 78.7 |
MDCT [132] | 80.5 | 92.9 | 68.4 | 83.8 | 92.9 | 77.4 | 83.0 | 92.8 | 64.7 | 77.4 | 83.0 |
YOLOv5s | 80.4 | 86.4 | 66.5 | 67.3 | 91.8 | 81.0 | 80.4 | 93.2 | 69.7 | 60.3 | 92.2 |
Ours | 81.6 | 86.4 | 68.7 | 67.3 | 91.7 | 81.5 | 80.3 | 93.0 | 77.1 | 60.9 | 91.4 |
Method | Backbone | mAP@0.5 (%) | FPS | GPU |
---|---|---|---|---|
Faster R-CNN [27] | VGGNet | 73.2 | 7 | Titan X |
SSD 300 [35] | VGGNet | 74.1 | 46 | Titan X |
ASSD 300 [133] | VGGNet | 79.1 | 39.6 | GTX 1080Ti |
MFFAMM 300 [134] | VGG16 | 80.7 | 26 | - |
Zhe et al. 300 [135] | VGG16 | 80.1 | 42.2 | RTX 2080Ti |
FESSD 300 [136] | ResNet-50 | 82.2 | 41.3 | RTX 3090 |
YOLOv3 320 [37] | Darknet53 | 74.5 | 45.5 | Titan X |
GC-YOLOv3 320 [137] | Darknet53 | 81.3 | 39 | GTX 1080Ti |
DSP-YOLO 416 [138] | Darknet53 | 82.2 | 56 | Titan Xp |
Zhang et al. 416 [139] | MobileNetv2 | 81.67 | 44.18 | RTX 2080Ti |
He et al. 416 [140] | ECA-CSPNet | 78.6 | 94 | RTX 2080Ti |
SSD 512 [35] | VGGNet | 76.8 | 19.0 | Titan X |
ASSD 512 [133] | VGGNet | 81.0 | 20.8 | GTX 1080Ti |
PDS-Net 512 [141] | CSPDarknet-53 | 84.9 | 32.2 | RTX 2070 |
SLMS-SSD 512 [142] | VGG16 | 81.2 | 17.4 | RTX 2080Ti |
YOLOv3 544 [37] | Darknet53 | 78.6 | 40 | Titan X |
GC-YOLOv3 544 [137] | Darknet53 | 83.7 | 31 | GTX 1080Ti |
RON384++ [143] | VGG16 | 77.6 | - | Titan X |
STDN 513 [144] | DenseNet169 | 80.9 | 28.6 | Titan Xp |
Zhong et al. [145] | BottleneckCSP | 84.3 | 85.2 | RTX 2080Ti |
YOLO-T 640 [146] | CSPDarknet-53 | 85.2 | 65.7 | RTX 3090 |
BFBG-YOLO [147] | CSPDarknet-53 | 80.3 | 99.0 | RTX 3090 |
SL-YOLO [148] | ShuffleNet v2 | 81.2 | 17.8 | Tesla P40 |
YOLOv5s 640 | CSPDarknet-53 | 83.7 | 143 | RTX A5000 |
Ours 640 | CSPDarknet-53 | 85.1 | 90.2 | RTX A5000 |
Method | mAP | Aero | Bike | Bird | Boat | Bottle | Bus | Car | Cat | Chair | Cow |
---|---|---|---|---|---|---|---|---|---|---|---|
Faster R-CNN [27] | 73.2 | 76.5 | 79 | 70.9 | 65.5 | 52.1 | 83.1 | 84.7 | 86.4 | 52 | 81.9 |
SSD 300 [35] | 74.1 | 74.6 | 80.2 | 72.2 | 66.2 | 47.1 | 82.9 | 83.4 | 86.1 | 54.4 | 78.5 |
ASSD 300 [133] | 79.1 | 85.4 | 84.1 | 78.7 | 71.8 | 54.0 | 86.2 | 85.3 | 89.5 | 60.4 | 87.4 |
FESSD 300 [136] | 82.2 | 89.4 | 86.2 | 84.3 | 78.2 | 57.8 | 91.6 | 91.5 | 91.7 | 62.2 | 90.4 |
Zhe et al. 300 [135] | 80.1 | 84.6 | 87.6 | 80.1 | 73.0 | 50.4 | 89.3 | 88.3 | 90.9 | 60.2 | 87.8 |
He et al. 416 [140] | 78.6 | 86.1 | 86.2 | 76.5 | 66.5 | 66.4 | 86.6 | 91.3 | 80.7 | 64.3 | 84.4 |
Zhang et al. 416 [139] | 81.6 | 88.5 | 87.5 | 83.1 | 75.2 | 67.1 | 85.3 | 90.2 | 88.9 | 60.9 | 89.7 |
DSP-YOLO 416 [138] | 82.2 | 88.5 | 89.5 | 79.1 | 74.0 | 68.7 | 89.7 | 90.6 | 89.9 | 66.7 | 84.4 |
SSD 512 [35] | 76.8 | 82.4 | 84.7 | 78.4 | 73.8 | 53.2 | 86.2 | 87.5 | 86.0 | 57.8 | 83.1 |
ASSD 512 [133] | 81.0 | 86.8 | 85.2 | 84.1 | 75.2 | 60.5 | 88.3 | 88.4 | 89.3 | 63.5 | 87.6 |
PDS-Net 512 [141] | 84.2 | 93.3 | 98.0 | 80.2 | 73.8 | 70.2 | 90.9 | 96.3 | 87.1 | 65.0 | 87.3 |
SLMS-SSD 512 [142] | 81.2 | 88.5 | 87.1 | 83.2 | 76.4 | 59.2 | 88.3 | 88.4 | 89.0 | 66.6 | 86.9 |
STDN513 [144] | 80.9 | 86.1 | 89.3 | 79.5 | 74.3 | 61.9 | 88.5 | 88.3 | 89.4 | 67.4 | 86.5 |
DSP-YOLO 608 [138] | 83.1 | 91.0 | 90.7 | 81.8 | 75.6 | 73.8 | 91.3 | 92.7 | 91.2 | 66.9 | 86.9 |
RON384++ [143] | 77.6 | 86.0 | 82.5 | 76.9 | 69.1 | 59.2 | 86.2 | 85.5 | 87.2 | 59.9 | 81.4 |
SFGNet [149] | 81.2 | 82.2 | 83.9 | 80.3 | 71.5 | 78.2 | 89.6 | 86.9 | 90.0 | 65.7 | 87.9 |
SL-YOLO [148] | 81.2 | 86.4 | 85.7 | 77.9 | 75.5 | 72.5 | 85.4 | 87.8 | 86.2 | 85.9 | 72.1 |
YOLOv5s 640 | 83.7 | 91.6 | 91.9 | 81.1 | 75.0 | 78.5 | 91.2 | 92.9 | 87.3 | 67.4 | 88.0 |
Ours 640 | 85.1 | 92.2 | 92.0 | 82.9 | 74.4 | 78.1 | 92.6 | 93.7 | 91.1 | 68.8 | 89.3 |
Method | mAP | Table | Dog | Horse | Mbike | Person | Plant | Sheep | Sofa | Train | Tv |
Faster R-CNN [27] | 73.2 | 65.7 | 84.8 | 84.6 | 77.5 | 76.7 | 38.8 | 73.6 | 73.9 | 83 | 72.6 |
SSD 300 [35] | 74.1 | 73.9 | 84.4 | 84.5 | 82.4 | 76.1 | 48.6 | 74.3 | 75.0 | 84.3 | 74.0 |
ASSD 300 [133] | 79.1 | 77.1 | 87.4 | 86.8 | 84.8 | 79.5 | 57.8 | 81.5 | 80.1 | 87.4 | 76.9 |
FESSD 300 [136] | 82.2 | 74.4 | 89.4 | 90.5 | 87.7 | 83.7 | 52.4 | 88.6 | 81.6 | 91.0 | 81.7 |
Zhe et al. 300 [135] | 80.1 | 81.4 | 87.1 | 89.1 | 87.9 | 82.1 | 54.6 | 80.4 | 80.5 | 89.2 | 78.1 |
He et al. 416 [140] | 78.6 | 73.7 | 77.6 | 85.3 | 85.9 | 86.1 | 52.4 | 80.8 | 75.5 | 84.5 | 80.7 |
Zhang et al. 416 [139] | 81.6 | 78.4 | 89.5 | 89.5 | 84.9 | 84.8 | 55.1 | 86.9 | 74.3 | 90.8 | 82.0 |
DSP-YOLO 416 [138] | 82.2 | 75.0 | 89.2 | 89.3 | 89.8 | 85.8 | 56.6 | 84.4 | 81.1 | 89.1 | 81.6 |
SSD 512 [35] | 76.8 | 70.2 | 84.9 | 85.2 | 83.9 | 79.7 | 50.3 | 77.9 | 73.9 | 82.5 | 75.3 |
ASSD 512 [133] | 81.0 | 76.6 | 88.2 | 86.7 | 85.7 | 82.8 | 59.2 | 83.6 | 80.5 | 87.5 | 80.8 |
PDS-Net 512 [141] | 84.2 | 74.5 | 86.5 | 91.8 | 91.9 | 89.7 | 59.9 | 92.8 | 79.3 | 89.1 | 84.6 |
SLMS-SSD 512 [142] | 81.2 | 74.6 | 87.3 | 88.6 | 86.5 | 82.2 | 54.8 | 85.5 | 80.9 | 87.9 | 81.0 |
STDN513 [144] | 80.9 | 79.5 | 86.4 | 89.2 | 88.5 | 79.3 | 53.0 | 77.9 | 81.4 | 86.6 | 85.5 |
DSP-YOLO 608 [138] | 83.1 | 75.5 | 89.0 | 90.4 | 88.6 | 87.3 | 55.1 | 87.3 | 80.0 | 86.9 | 80.0 |
RON384++ [143] | 77.6 | 73.3 | 85.9 | 86.8 | 82.8 | 79.6 | 52.4 | 78.2 | 76.0 | 86.2 | 78.0 |
SFGNet [149] | 81.2 | 72.4 | 90.3 | 89.9 | 83.5 | 82.5 | 67.8 | 79.0 | 81.6 | 86.7 | 75.7 |
SL-YOLO [148] | 81.2 | 79.5 | 78.8 | 88.2 | 86.5 | 81.1 | 71.2 | 84.4 | 79.2 | 82.7 | 76.3 |
YOLOv5s 640 | 83.7 | 79.1 | 86.0 | 91.4 | 89.3 | 89.5 | 60.5 | 86.1 | 76.0 | 86.8 | 84.9 |
Ours 640 | 85.1 | 76.7 | 88.1 | 92.6 | 92.2 | 90.0 | 61.2 | 89.4 | 79.0 | 90.6 | 84.7 |
Method | AP | AP50 | AP75 | AP Small | AP Medium | AP Large |
---|---|---|---|---|---|---|
R-FCN [150] | 29.2 | 51.5 | - | 10.3 | 32.4 | 43.3 |
SSD 300 [35] | 25.1 | 43.1 | 25.8 | 6.6 | 25.9 | 41.4 |
FESSD 300 [136] | 28.3 | - | 29.6 | - | - | - |
Zhe et al. 300 [135] | 29.9 | 49.9 | 31.3 | 10.6 | 24.5 | 47.6 |
YOLOv3 416 [37] | 31.0 | 55.3 | 32.3 | 15.2 | 33.2 | 42.8 |
GC-YOLOv3 416 [137] | - | 55.5 | - | - | - | - |
Mini-YOLOv4-tiny 416 [151] | 23.4 | 42.2 | 23.4 | - | - | - |
TRC-YOLO 416 [152] | 18.4 | 38.4 | 15.6 | 6.3 | 17.6 | 27.2 |
Trident-YOLO 416 [153] | 18.8 | 37.0 | 17.3 | 20.9 | 25.1 | 29.3 |
He et al. 416 [140] | 23.6 | 43.8 | 26.8 | 8.4 | 27.1 | 42.3 |
SSD 512 [35] | 28.8 | 48.5 | 30.3 | 10.9 | 31.8 | 43.5 |
SLMS-SSD 512 [142] | 30.8 | 52.4 | 32.0 | 16.1 | 33.7 | 44.0 |
BANet_S 640 [154] | 40.2 | 58.6 | - | 23.5 | 44.6 | 53.2 |
STDN 513 [144] | 31.8 | 51.0 | 33.6 | 14.4 | 36.1 | 43.4 |
YOLOv3 608 [37] | 33.0 | 57.9 | 34.4 | 18.3 | 35.4 | 41.9 |
SFGNet [149] | 32.3 | 54.1 | - | - | - | - |
Zhong et al. [145] | 38.4 | 56.2 | 42.1 | 21.6 | 43.0 | 52.4 |
YOLO-T 640 [146] | 42.0 | 58.3 | 44.1 | - | - | - |
SL-YOLO [148] | 36.8 | 51.3 | 37.3 | 11.7 | 37.7 | 48.4 |
YOLOv5s 640 | 37.4 | 56.8 | 40.7 | 21.2 | 42.3 | 49.0 |
Ours 640 | 40.8 | 59.9 | 43.7 | 21.4 | 44.2 | 52.9 |
Model | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 | Model 10 |
---|---|---|---|---|---|---|---|---|---|---|
CARAFE |  | √ |  |  |  |  | √ | √ | √ | √ |
SA |  |  | √ |  |  |  | √ | √ | √ | √ |
SPPCSPG |  |  |  | √ |  |  |  | √ | √ | √ |
GSConv |  |  |  |  | √ |  |  |  | √ | √ |
DD-head |  |  |  |  |  | √ |  |  |  | √ |
Params | 7.02 M | 7.15 M | 7.02 M | 9.95 M | 6.35 M | 7.28 M | 7.15 M | 10.09 M | 9.41 M | 9.67 M |
FLOPs (G) | 15.8 | 16.3 | 15.8 | 18.1 | 14.6 | 16.7 | 16.3 | 18.6 | 17.5 | 18.4 |
mAP@0.5 | 0.950 | 0.952 | 0.952 | 0.955 | 0.953 | 0.955 | 0.955 | 0.958 | 0.961 | 0.964 |
mAP@0.5:0.95 | 0.653 | 0.659 | 0.652 | 0.676 | 0.649 | 0.674 | 0.671 | 0.672 | 0.669 | 0.648 |
FPS | 119.0 | 110.9 | 113.9 | 113.1 | 107.8 | 106.8 | 109.9 | 96.2 | 91.2 | 88.8 |
Model | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 | Model 10 |
---|---|---|---|---|---|---|---|---|---|---|
CARAFE |  | √ |  |  |  |  | √ | √ | √ | √ |
SA |  |  | √ |  |  |  | √ | √ | √ | √ |
SPPCSPG |  |  |  | √ |  |  |  | √ | √ | √ |
GSConv |  |  |  |  | √ |  |  |  | √ | √ |
DD-head |  |  |  |  |  | √ |  |  |  | √ |
Params | 7.06 M | 7.20 M | 7.06 M | 10.00 M | 6.39 M | 7.32 M | 7.20 M | 10.13 M | 9.46 M | 9.71 M |
FLOPs (G) | 15.9 | 16.4 | 15.9 | 18.3 | 14.8 | 16.9 | 16.4 | 18.8 | 17.6 | 18.6 |
mAP@0.5 | 0.837 | 0.840 | 0.839 | 0.847 | 0.841 | 0.842 | 0.841 | 0.848 | 0.850 | 0.851 |
mAP@0.5:0.95 | 0.585 | 0.595 | 0.588 | 0.606 | 0.603 | 0.596 | 0.595 | 0.615 | 0.619 | 0.619 |
FPS | 143.0 | 125.4 | 141.9 | 122.5 | 125.2 | 128.6 | 120.8 | 110.8 | 98.4 | 90.2 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, P.; Wang, Q.; Zhang, H.; Mi, J.; Liu, Y. A Lightweight Object Detection Algorithm for Remote Sensing Images Based on Attention Mechanism and YOLOv5s. Remote Sens. 2023, 15, 2429. https://doi.org/10.3390/rs15092429