YOLO-DRS: A Bioinspired Object Detection Algorithm for Remote Sensing Images Incorporating a Multi-Scale Efficient Lightweight Attention Mechanism
Abstract
1. Introduction
2. Related Work
- Based on the EMA attention mechanism, a new module, LEC (LDW-EMA-C3), is proposed, which fuses a multi-scale lightweight efficient attention with the C3 structure in YOLOv5. The last two C3 modules of the backbone are replaced with LDW-EMA to extract high-dimensional feature information at different scales.
- In the upsampling stage of YOLOv5, transposed convolution is introduced to replace the original nearest-neighbour interpolation, reducing the loss of small-target feature information during upsampling.
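The upsampling swap described in the second point can be sketched in PyTorch. This is a minimal illustration, not the paper's exact network definition; the channel count `c` and feature-map size are hypothetical placeholders:

```python
import torch
import torch.nn as nn

# Hypothetical channel count and feature-map size; YOLOv5s uses
# different widths at each neck stage.
c = 256
x = torch.randn(1, c, 20, 20)  # a coarse feature map

# Original YOLOv5 upsampling: parameter-free nearest-neighbour interpolation.
nearest = nn.Upsample(scale_factor=2, mode="nearest")

# Replacement: learnable 2x upsampling via transposed convolution
# (kernel_size=2 with stride=2 exactly doubles the spatial resolution).
trans = nn.ConvTranspose2d(c, c, kernel_size=2, stride=2)

# Both produce the same output shape, but the transposed convolution
# learns how to redistribute feature information during upsampling.
assert nearest(x).shape == trans(x).shape == (1, c, 40, 40)
```

Because the transposed convolution has trainable weights, it can learn an upsampling filter suited to small targets instead of simply copying neighbouring pixels.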
3. The Basic Structure of YOLOv5s
4. Proposed Method
4.1. The LEC Module
4.2. Transposed Convolution
- Insert s − 1 rows and columns of zeros between the elements of the input feature map (where s denotes the stride of the transposed convolution).
- Pad k − p − 1 rows and columns of zeros around the input feature map (where k denotes the kernel_size of the transposed convolution and p its padding).
- Flip the convolution kernel vertically and horizontally.
- Perform an ordinary convolution (padding = 0, stride = 1).
- First, insert s − 1 = 0 rows and columns of zeros between elements (i.e., no zeros are inserted).
- Second, pad k − p − 1 = 2 rows and columns of zeros around the feature map.
- Third, flip the convolution kernel vertically and horizontally.
- Finally, perform an ordinary convolution (padding = 0, stride = 1).
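The four steps above can be verified numerically for the worked example (k = 3, s = 1, p = 0): performing the zero-padding, kernel flip, and ordinary convolution by hand reproduces PyTorch's built-in transposed convolution. This is an illustrative single-channel sketch with random data:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 3, 3)  # input feature map
w = torch.randn(1, 1, 3, 3)  # kernel: k = 3, with s = 1, p = 0 as in the example

# Steps 1-2: insert s - 1 = 0 zeros between elements (nothing to do),
# then pad k - p - 1 = 2 rows and columns of zeros around the map.
padded = F.pad(x, (2, 2, 2, 2))          # (1, 1, 7, 7)

# Step 3: flip the kernel vertically and horizontally.
flipped = torch.flip(w, dims=[2, 3])

# Step 4: ordinary convolution with stride 1 and no padding.
manual = F.conv2d(padded, flipped, stride=1, padding=0)  # (1, 1, 5, 5)

# The manual procedure matches the built-in transposed convolution.
builtin = F.conv_transpose2d(x, w, stride=1, padding=0)
assert torch.allclose(manual, builtin, atol=1e-6)
```

The output size also follows the usual relation for transposed convolution, (H − 1)·s − 2p + k; here (3 − 1)·1 − 0 + 3 = 5.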
5. Experiments
5.1. Datasets
5.2. Evaluation Metrics
5.2.1. Precision
5.2.2. Recall
5.2.3. Mean Average Precision
5.2.4. FLOPs
5.2.5. FPS
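The precision and recall metrics listed above can be illustrated with a short sketch; the TP/FP/FN counts here are hypothetical and not taken from the paper's experiments:

```python
# Hypothetical per-class detection counts (not from the paper):
# TP = correct detections, FP = spurious detections, FN = missed objects.
tp, fp, fn = 90, 10, 5

precision = tp / (tp + fp)  # fraction of predictions that are correct
recall = tp / (tp + fn)     # fraction of ground-truth objects found

# mAP@0.5 then averages, over all classes, the area under the
# precision-recall curve computed at an IoU threshold of 0.5.
print(round(precision, 3), round(recall, 3))  # 0.9 0.947
```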
5.3. Network Training and Parameter Setting
5.3.1. Parameter Setting
5.3.2. Network Training
5.4. Analysis of Results
5.5. Visualization Experiments
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Liu, F.; Zhu, J.; Wang, W.; Kuang, M. Surface-to-air missile sites detection agent with remote sensing images. Sci. China Inf. Sci. 2021, 64, 1–3. [Google Scholar] [CrossRef]
- Zhang, Y.; Ning, G.; Chen, S.; Yang, Y. Impact of rapid urban sprawl on the local meteorological observational environment based on remote sensing images and GIS technology. Remote Sens. 2021, 13, 2624. [Google Scholar] [CrossRef]
- Friedman, J.; Hastie, T.; Tibshirani, R. Additive logistic regression: A statistical view of boosting (with discussion and a rejoinder by the authors). Ann. Stat. 2000, 28, 337–407. [Google Scholar] [CrossRef]
- Platt, J.C. Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines; Microsoft: Redmond, WA, USA, 1998. [Google Scholar]
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893. [Google Scholar]
- Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.A. Cascade object detection with deformable part models. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010. [Google Scholar]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 June 2017; pp. 7263–7271. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Ullah, F.; Ullah, S.; Naeem, M.R.; Mostarda, L.; Rho, S.; Cheng, X. Cyber-threat detection system using a hybrid approach of transfer learning and multi-model image representation. Sensors 2022, 22, 5883. [Google Scholar] [CrossRef] [PubMed]
- Du, F.J.; Jiao, S.J. Improvement of lightweight convolutional neural network model based on YOLO algorithm and its research in pavement defect detection. Sensors 2022, 22, 3537. [Google Scholar] [CrossRef] [PubMed]
- Li, W.; Zhang, L.; Wu, C.; Cui, Z.; Niu, C. A new lightweight deep neural network for surface scratch detection. Int. J. Adv. Manuf. Technol. 2022, 123, 1999–2015. [Google Scholar] [CrossRef] [PubMed]
- Shen, L.; Tao, H.; Ni, Y.; Wang, Y.; Stojanovic, V. Improved YOLOv3 model with feature map cropping for multi-scale road object detection. Meas. Sci. Technol. 2023, 34, 045406. [Google Scholar] [CrossRef]
- Wang, J.; Zhang, F.; Zhang, Y.; Liu, Y.; Cheng, T. Lightweight Object Detection Algorithm for UAV Aerial Imagery. Sensors 2023, 23, 5786. [Google Scholar] [CrossRef]
- Qu, Z.; Zhu, F.; Qi, C. Remote sensing image target detection: Improvement of the YOLOv3 model with auxiliary networks. Remote Sens. 2021, 13, 3908. [Google Scholar] [CrossRef]
- Xu, D.; Wu, Y. Improved YOLO-V3 with DenseNet for multi-scale remote sensing target detection. Sensors 2020, 20, 4276. [Google Scholar] [PubMed]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 June 2017; pp. 4700–4708. [Google Scholar]
- Xu, D.; Wu, Y. FE-YOLO: A feature enhancement network for remote sensing target detection. Remote Sens. 2021, 13, 1311. [Google Scholar] [CrossRef]
- Gao, S.H.; Cheng, M.M.; Zhao, K.; Zhang, X.Y.; Yang, M.H.; Torr, P. Res2Net: A new multi-scale backbone architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 652–662. [Google Scholar] [CrossRef] [PubMed]
- Cao, C.; Wu, J.; Zeng, X.; Feng, Z.; Wang, T.; Yan, X.; Wu, Z.; Wu, Q.; Huang, Z. Research on airplane and ship detection of aerial remote sensing images based on convolutional neural network. Sensors 2020, 20, 4696. [Google Scholar] [CrossRef] [PubMed]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 June 2017; pp. 2881–2890. [Google Scholar]
- Li, R.; Shen, Y. YOLOSR-IST: A deep learning method for small target detection in infrared remote sensing images based on super-resolution and YOLO. Signal Process. 2023, 208, 108962. [Google Scholar] [CrossRef]
- Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 13–19 June 2020; pp. 390–391. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 June 2015; pp. 448–456. [Google Scholar]
- Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 2018, 107, 3–11. [Google Scholar] [CrossRef] [PubMed]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 June 2017; pp. 2117–2125. [Google Scholar]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. UnitBox: An advanced object detection network. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 516–520. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
| Project | Environment |
|---|---|
| Operating System | Ubuntu |
| CPU | E5-2680 v4 |
| GPU | GeForce RTX 3060 |
| Memory | 14 GB |
| PyTorch version | 1.10.0 |
| CUDA | 11.1 |
| Class Label | Number of Images |
|---|---|
| aircraft | 446 |
| playground | 189 |
| overpass | 176 |
| oiltank | 165 |
| Parameter | Value |
|---|---|
| weights | yolov5s.pt |
| division ratio | 7:2:1 (train:val:test) |
| optimizer | SGD |
| batch size | 16 |
| epochs | 100 |
| Method | Precision | Recall | mAP@0.5 | FLOPs/G |
|---|---|---|---|---|
| YOLOv5s | 0.930 | 0.939 | 0.950 | 15.8 |
| +LEC-top | 0.959 | 0.956 | 0.972 | 15.9 |
| +LEC-bottom | 0.962 | 0.925 | 0.970 | 15.9 |
| +LEC-both | 0.967 | 0.936 | 0.963 | 16.0 |
| Method | Precision | Recall | mAP@0.5 | FLOPs/G |
|---|---|---|---|---|
| YOLOv5s | 0.930 | 0.939 | 0.950 | 15.8 |
| +LDW (ReLU) | 0.959 | 0.956 | 0.972 | 15.9 |
| +LDW (SiLU) | 0.953 | 0.975 | 0.967 | 15.9 |
| +LDW (Mish) | 0.961 | 0.952 | 0.968 | 15.9 |
| Method | Precision | Recall | mAP@0.5 | FLOPs/G |
|---|---|---|---|---|
| YOLOv5s | 0.930 | 0.939 | 0.950 | 15.8 |
| +Trans-first | 0.963 | 0.939 | 0.961 | 15.8 |
| +Trans-second | 0.966 | 0.936 | 0.975 | 15.8 |
| +Trans-both | 0.959 | 0.971 | 0.971 | 15.8 |
| Method | Precision | Recall | mAP@0.5 | FLOPs/G |
|---|---|---|---|---|
| YOLOv5s | 0.930 | 0.939 | 0.950 | 15.8 |
| +EMA | 0.937 | 0.957 | 0.956 | 15.8 |
| +LDW (3 × 3) | 0.921 | 0.960 | 0.964 | 15.8 |
| +LDW-EMA | 0.959 | 0.956 | 0.972 | 16.0 |
| Method | Precision | Recall | mAP@0.5 | FLOPs/G |
|---|---|---|---|---|
| YOLOv5s | 0.930 | 0.939 | 0.950 | 15.8 |
| +C3CA | 0.970 | 0.935 | 0.957 | 15.8 |
| +C3ECA | 0.950 | 0.940 | 0.955 | 15.8 |
| +C3SE | 0.990 | 0.923 | 0.968 | 15.8 |
| +LEC | 0.959 | 0.956 | 0.972 | 16.0 |
| Method | Precision | Recall | mAP@0.5 | FLOPs/G | FPS/(frame/s) |
|---|---|---|---|---|---|
| YOLOv5s | 0.930 | 0.939 | 0.950 | 15.8 | 76.8 |
| +LEC | 0.959 | 0.956 | 0.972 | 16.0 | 51.9 |
| +Trans-both | 0.959 | 0.971 | 0.971 | 15.8 | 69.7 |
| YOLO-DRS | 0.953 | 0.971 | 0.975 | 16.0 | 53.9 |
| Method | Precision | Recall | mAP@0.5 |
|---|---|---|---|
| YOLOv5s | 0.930 | 0.939 | 0.950 |
| YOLOv7-tiny | 0.953 | 0.957 | 0.957 |
| YOLOv8s | 0.871 | 0.864 | 0.902 |
| YOLO-DRS | 0.953 | 0.971 | 0.975 |
Share and Cite
Liao, H.; Zhu, W. YOLO-DRS: A Bioinspired Object Detection Algorithm for Remote Sensing Images Incorporating a Multi-Scale Efficient Lightweight Attention Mechanism. Biomimetics 2023, 8, 458. https://doi.org/10.3390/biomimetics8060458