DSW-YOLOv8n: A New Underwater Target Detection Algorithm Based on Improved YOLOv8n
Abstract
1. Introduction
- (1) To improve adaptability to object deformations and enable more precise convolution operations, we replace some of the C2f modules in the YOLOv8n backbone feature extraction network with deformable convolution v2 modules.
- (2) We introduce the SimAm attention mechanism into the network structure. It adds no extra parameters and assigns a 3D attention weight to the feature map.
- (3) We address a problem with the loss function: a mismatch in direction between the prediction boxes and the ground-truth bounding boxes can cause the position of the prediction box to oscillate during training, which slows convergence and lowers prediction accuracy. To overcome this, we adopt the WIoU v3 loss function to further improve the network.
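As a rough illustration of the third contribution, the following sketch computes WIoU v1 and WIoU v3 for a single pair of axis-aligned boxes in plain Python. It follows the formulation given in the Wise-IoU paper (Tong et al.): WIoU v1 scales the IoU loss by a distance-based factor, and WIoU v3 further multiplies by a non-monotonic gain r = β / (δ · α^(β − δ)), where β is the ratio of the current IoU loss to its running mean. The function names, the (x1, y1, x2, y2) box format, and the values alpha = 1.9 and delta = 3 are assumptions made for illustration, not settings confirmed by this article. The running mean, normally tracked with momentum during training, is simply passed in as an argument.

```python
import math


def iou_loss(pred, gt):
    """Return 1 - IoU for two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - inter
    return 1.0 - inter / union


def wiou_v1(pred, gt):
    """WIoU v1: IoU loss scaled by a centre-distance attention factor."""
    # Box centres.
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    # In an autograd framework this denominator would be detached
    # from the gradient graph; plain Python has no such notion.
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    r_wiou = math.exp(((px - gx) ** 2 + (py - gy) ** 2) / (wg ** 2 + hg ** 2))
    return r_wiou * iou_loss(pred, gt)


def wiou_v3(pred, gt, mean_iou_loss, alpha=1.9, delta=3.0):
    """WIoU v3: WIoU v1 scaled by a non-monotonic focusing gain.

    mean_iou_loss stands in for the running mean of the IoU loss
    (an assumption: real training code updates it with momentum).
    """
    beta = iou_loss(pred, gt) / mean_iou_loss  # outlier degree
    gain = beta / (delta * alpha ** (beta - delta))
    return gain * wiou_v1(pred, gt)


if __name__ == "__main__":
    pred_box = (0.0, 0.0, 2.0, 2.0)
    gt_box = (1.0, 1.0, 3.0, 3.0)
    print("IoU loss:", iou_loss(pred_box, gt_box))
    print("WIoU v1 :", wiou_v1(pred_box, gt_box))
    print("WIoU v3 :", wiou_v3(pred_box, gt_box, mean_iou_loss=0.5))
```

The gain equals 1 when β = δ and shrinks for both very small and very large β, so under these assumptions neither the best-fitting nor the worst-fitting boxes dominate the gradient.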
2. Related Work
2.1. Object Detection Algorithm
2.2. Fusion of Deformable Convolutional Feature Extraction Network
2.3. Simple and Efficient Parameter-Free Attention Mechanism
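A minimal sketch of a parameter-free attention step in the spirit of SimAm is given here, assuming the closed-form weighting from the SimAM paper (Yang et al.): within each channel, every activation receives a weight sigmoid((x − μ)² / (4(σ² + λ)) + 0.5), where μ and σ² are the channel mean and variance. The function name `simam`, the nested-list C × H × W layout, and λ = 1e-4 are illustrative assumptions; a real model would apply the same arithmetic to framework tensors.

```python
import math


def simam(feature_map, lam=1e-4):
    """Apply SimAM-style attention to a C x H x W nested list.

    No learnable parameters are involved: the weight of each
    activation depends only on its channel's statistics.
    """
    output = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        # n = H*W - 1; guarded so a 1x1 channel does not divide by zero.
        n = max(len(values) - 1, 1)
        variance = sum((v - mean) ** 2 for v in values) / n
        new_channel = []
        for row in channel:
            new_row = []
            for v in row:
                # Inverse energy: larger for activations far from the mean.
                inv_energy = (v - mean) ** 2 / (4 * (variance + lam)) + 0.5
                weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
                new_row.append(v * weight)
            new_channel.append(new_row)
        output.append(new_channel)
    return output


if __name__ == "__main__":
    fmap = [[[0.0, 0.0], [0.0, 4.0]]]  # one channel, one distinctive value
    print(simam(fmap))
```

Because each element gets its own weight across channel, height, and width, the result is a full 3D attention map rather than a per-channel or per-location one.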
2.4. Loss Function with Dynamic Focusing Mechanism
2.4.1. WIoU v1
2.4.2. WIoU v2
2.4.3. WIoU v3
3. Experiment
3.1. Underwater Target Detection Dataset
3.2. Experimental Configuration and Environment
3.3. Model Evaluation Metrics
4. Analysis and Discussion of Experimental Results
4.1. Comparison of Experimental Results of Different Models
4.2. Comparison of Ablation Experiments
4.3. Pascal VOC Dataset Experimental Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Experiment Set | Train | Test | Validation | Total |
---|---|---|---|---|
Quantity of images | 1109 | 317 | 159 | 1585 |

Category | Jellyfish | Fish | Sea urchin | Scallop | Sea grass | Shark | Sea turtle |
---|---|---|---|---|---|---|---|
Quantity of samples | 356 | 1939 | 3335 | 1537 | 271 | 527 | 340 |
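The counts in the dataset table above can be checked with a few lines of arithmetic. The short script below is only a consistency sketch: the dictionary and variable names are made up for illustration, and the numbers are copied from the table. It computes the share of images in each split and the share of annotated samples per category.

```python
# Image counts per split, copied from the dataset table.
splits = {"train": 1109, "test": 317, "validation": 159}
total_images = sum(splits.values())
split_share = {name: count / total_images for name, count in splits.items()}

# Annotated sample counts per category, copied from the dataset table.
samples = {
    "Jellyfish": 356,
    "Fish": 1939,
    "Sea urchin": 3335,
    "Scallop": 1537,
    "Sea grass": 271,
    "Shark": 527,
    "Sea turtle": 340,
}
total_samples = sum(samples.values())
sample_share = {name: count / total_samples for name, count in samples.items()}

if __name__ == "__main__":
    print("total images:", total_images)
    for name, share in split_share.items():
        print(f"{name:>10}: {share:.1%}")
    print("total samples:", total_samples)
    for name, share in sample_share.items():
        print(f"{name:>10}: {share:.1%}")
```

The split works out to roughly 7:2:1 for train, test, and validation, and the per-category shares show a clear class imbalance, with sea urchins the most frequent category and sea grass the least.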
Environment | Version or Model Number |
---|---|
Operating System | Ubuntu 18.04 |
CUDA Version | 11.3 |
CPU | Intel(R) Xeon(R) CPU E5-2620 v4 |
GPU | 4 |
RAM | 126G |
Python version | Python 3.8 |
Deep learning framework | PyTorch 1.12.0 |
Model | Backbone | Flops/G | Params/M | mAP@0.5 | mAP@0.5:0.95 |
---|---|---|---|---|---|
DAMO-YOLO | CSP-Darknet | 18.1 | 8.5 | 72.5 | 37.2 |
YOLOX | Darknet53 | 26.8 | 9.0 | 81.45 | 42.7 |
YOLOv7 | E-ELAN | 105.2 | 37.2 | 83.5 | 46.3 |
YOLOv8n | Darknet53 | 3.0 | 8.2 | 88.6 | 51.8 |
DSW-YOLOv8n | Darknet53 (ours) | 3.13 | 7.7 | 91.8 | 55.9 |
Model | Flops/G | Params/M | Average Detection Time/ms | Recall | mAP@0.5 | mAP@0.5:0.95 |
---|---|---|---|---|---|---|
YOLOv8n | 3.0 | 8.2 | 5 | 84.8 | 88.6 | 51.8 |
YOLOv8n + DefConv2 | 3.13 | 7.7 | 7.4 | 86.1 | 91.0 | 54 |
YOLOv8n + SimAm | 3.0 | 8.2 | 10.1 | 87.5 | 90.2 | 55.2 |
YOLOv8n + WIoUv3 | 3.0 | 8.2 | 5.4 | 85.9 | 91.6 | 53.8 |
YOLOv8n + DefConv2 + SimAm | 3.13 | 7.7 | 10.6 | 80.4 | 91.6 | 53.5 |
YOLOv8n + DefConv2 + WIoUv3 | 3.13 | 7.7 | 8.1 | 85.8 | 91.5 | 54.3 |
YOLOv8n + SimAm + WIoUv3 | 3.0 | 8.2 | 5 | 81.4 | 88.5 | 54.8 |
DSW-YOLOv8n | 3.13 | 7.7 | 8.7 | 85.1 | 91.8 | 55.9 |
Base model | DefConv2 | SimAm | WIoUv1 | WIoUv2 | WIoUv3 | Average Detection Time/ms | mAP@0.5 | mAP@0.5:0.95 |
---|---|---|---|---|---|---|---|---|
YOLOv8n | √ | √ | √ | | | 8.73 | 90.94 | 55.8 |
YOLOv8n | √ | √ | | √ | | 10.6 | 91.01 | 55.3 |
YOLOv8n | √ | √ | | | √ | 8.7 | 91.8 | 55.9 |
Dataset | Model | Flops/G | Params/M | Recall | mAP@0.5 | mAP@0.5:0.95 |
---|---|---|---|---|---|---|
Pascal VOC 2012 | YOLOv8n | 3.0 | 8.2 | 55.1 | 62.2 | 45.9 |
Pascal VOC 2012 | YOLOv8n + DefConv2 | 3.13 | 7.8 | 56.3 | 64.7 | 48 |
Pascal VOC 2012 | YOLOv8n + SimAm | 3.0 | 8.2 | 58.3 | 64.1 | 47.5 |
Pascal VOC 2012 | YOLOv8n + WIoUv1 | 3.0 | 8.2 | 55.2 | 63.3 | 46.5 |
Pascal VOC 2012 | YOLOv8n + WIoUv2 | 3.0 | 8.2 | 56.8 | 63.9 | 46.7 |
Pascal VOC 2012 | YOLOv8n + WIoUv3 | 3.0 | 8.2 | 55.5 | 63.5 | 46.5 |
Pascal VOC 2012 | YOLOv8n + DefConv2 + SimAm | 3.13 | 7.8 | 55.8 | 64.4 | 48.2 |
Pascal VOC 2012 | YOLOv8n + DefConv2 + WIoUv1 | 3.13 | 7.8 | 58.6 | 65.4 | 48.4 |
Pascal VOC 2012 | YOLOv8n + DefConv2 + WIoUv2 | 3.13 | 7.8 | 56.9 | 65.1 | 48.1 |
Pascal VOC 2012 | YOLOv8n + DefConv2 + WIoUv3 | 3.13 | 7.8 | 57.8 | 64.9 | 47.6 |
Pascal VOC 2012 | YOLOv8n + SimAm + WIoUv1 | 3.0 | 8.2 | 57 | 63.8 | 46.8 |
Pascal VOC 2012 | YOLOv8n + SimAm + WIoUv2 | 3.0 | 8.2 | 53.6 | 62.8 | 45.6 |
Pascal VOC 2012 | YOLOv8n + SimAm + WIoUv3 | 3.0 | 8.2 | 54.5 | 64.2 | 46.4 |
Pascal VOC 2012 | YOLOv8n + DefConv2 + SimAm + WIoUv1 | 3.13 | 7.8 | 56.5 | 64.5 | 47.7 |
Pascal VOC 2012 | YOLOv8n + DefConv2 + SimAm + WIoUv2 | 3.13 | 7.8 | 59.8 | 64.7 | 47.3 |
Pascal VOC 2012 | DSW-YOLOv8n | 3.13 | 7.8 | 59.5 | 65.7 | 48.3 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, Q.; Huang, W.; Duan, X.; Wei, J.; Hu, T.; Yu, J.; Huang, J. DSW-YOLOv8n: A New Underwater Target Detection Algorithm Based on Improved YOLOv8n. Electronics 2023, 12, 3892. https://doi.org/10.3390/electronics12183892