Improved Ship Detection with YOLOv8 Enhanced with MobileViT and GSConv
Abstract
1. Introduction
2. Related Work
2.1. Algorithms Based on Feature Extraction
2.2. Algorithms Based on CNN
2.3. Algorithms Based on Transformer
3. The Proposed Method
3.1. YOLOv8 Algorithm
3.2. Improved YOLOv8
3.2.1. MobileViTSF Block
3.2.2. Small Target Detection Layer
3.2.3. GSConv
4. Experiments
4.1. Dataset Production
4.2. Experimental Configuration
4.3. Model Evaluation Indicators
5. Results and Discussion
5.1. Ablation Experiments
5.2. Comparison Experiments
5.3. Experimental Analysis
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Chen, P.; Li, X.; Zheng, G. Rapid detection to long ship wake in synthetic aperture radar satellite imagery. J. Oceanol. Limnol. 2019, 37, 1523–1532.
- Wu, W.; Li, X.; Hu, Z.; Liu, X. Ship Detection and Recognition Based on Improved YOLOv7. Comput. Mater. Contin. 2023, 76, 1.
- Yu, N.; Fan, X.; Deng, T.; Mao, G. Ship detection algorithm with complex background based on multi-head self-attention. J. Zhejiang Univ. Eng. Ed. 2022, 12, 2392–2402.
- Lee, S.J.; Roh, M.I.; Lee, H.W.; Ha, J.S.; Woo, I.G. Image-Based Ship Detection and Classification for Unmanned Surface Vehicle using Real-Time Object Detection Neural Networks. In Proceedings of the ISOPE International Ocean and Polar Engineering Conference, Sapporo, Japan, 10–15 June 2018.
- Shao, Z.; Wang, L.; Wang, Z.; Du, W.; Wu, W. Saliency-aware convolution neural network for ship detection in surveillance video. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 781–794.
- Ting, L.; Baijun, Z.; Yongsheng, Z.; Shun, Y. Ship Detection Algorithm Based on Improved YOLO V5. In Proceedings of the 2021 6th International Conference on Automation, Control and Robotics Engineering (CACRE), Dalian, China, 15–17 July 2021; pp. 483–487.
- Li, H.; Deng, L.; Yang, C.; Liu, J.; Gu, Z. Enhanced YOLO v3 tiny network for real-time ship detection from visual image. IEEE Access 2021, 9, 16692–16706.
- Xie, P.; Tao, R.; Luo, X.; Shi, Y. YOLOv4-MobileNetV2-DW-LCARM: A Real-Time Ship Detection Network. In Communications in Computer and Information Science, Proceedings of the International Conference on Knowledge Management in Organizations, Hagen, Germany, 11–14 July 2022; Springer International Publishing: Cham, Switzerland, 2022; pp. 281–293.
- Shao, Z.; Wu, W.; Wang, Z.; Du, W.; Li, C. SeaShips: A large-scale precisely annotated dataset for ship detection. IEEE Trans. Multimed. 2018, 20, 2593–2604.
- Chen, J.; Wang, J.; Lu, H. Ship Detection in Complex Weather Based on CNN. In Proceedings of the 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 9–11 April 2021; pp. 1225–1228.
- Han, X.; Zhao, L.; Ning, Y.; Hu, J. ShipYolo: An enhanced model for ship detection. J. Adv. Transp. 2021, 2021, 1–11.
- Terven, J.; Cordova-Esparza, D. A comprehensive review of YOLO: From YOLOv1 to YOLOv8 and beyond. arXiv 2023, arXiv:2304.00501.
- Mehta, S.; Rastegari, M. MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178.
- Wadekar, S.N.; Chaurasia, A. MobileViTv3: Mobile-friendly vision transformer with simple and effective fusion of local, global and input features. arXiv 2022, arXiv:2209.15159.
- Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles. arXiv 2022, arXiv:2206.02424.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893.
- Forsyth, D. Object detection with discriminatively trained part-based models. Computer 2014, 47, 6–7.
- Neubeck, A.; Van Gool, L. Efficient Non-Maximum Suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; Volume 3, pp. 850–855.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28.
- Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards Balanced Learning for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 821–830.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Trans. Cybern. 2021, 52, 8574–8586.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
- Cui, Y.; Yang, L.; Liu, D. Dynamic proposals for efficient object detection. arXiv 2022, arXiv:2207.05252.
- Jung, H.K.; Choi, G.S. Improved YOLOv5: Efficient object detection using drone images under various conditions. Appl. Sci. 2022, 12, 7255.
- Zhao, J.; Xu, S.; Wang, R.; Zhang, B.; Guo, G.; Doermann, D.; Sun, D. Data-adaptive binary neural networks for efficient object detection and recognition. Pattern Recognit. Lett. 2022, 153, 239–245.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Lecture Notes in Computer Science, Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Houlsby, N. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
- Li, Y.; Mao, H.; Girshick, R.; He, K. Exploring Plain Vision Transformer Backbones for Object Detection. In Lecture Notes in Computer Science, Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer Nature: Cham, Switzerland, 2022; pp. 280–296.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-Aligned One-Stage Object Detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; IEEE Computer Society: Washington, DC, USA, 2021; pp. 3490–3499.
- Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Yang, J. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Adv. Neural Inf. Process. Syst. 2020, 33, 21002–21012.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
- Xiong, E.; Zhang, R.; Liu, Y.; Peng, J. Ghost-YOLOv8 detection algorithm for traffic signs. Comput. Eng. Appl. 2023, 59, 200–207.
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
- Chen, H.; Wang, Y.; Guo, J.; Tao, D. VanillaNet: The Power of Minimalism in Deep Learning. arXiv 2023, arXiv:2305.12972.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
YOLOv8 | MobileViTSF | P2 | GSConv | mAP0.5 (%) | mAP0.5:0.95 (%) | FLOPs (G) | Size (MB)
---|---|---|---|---|---|---|---
√ | | | | 97.9 | 78.0 | 8.1 | 6.2
√ | √ | | | 97.8 | 76.4 | 4.9 | 2.7
√ | | √ | | 98.5 | 82.3 | 12.2 | 6.5
√ | | | √ | 98.1 | 77.5 | 7.0 | 5.5
√ | √ | √ | | 98.1 | 80.3 | 1.2 | 2.6
√ | √ | √ | √ | 98.8 | 82.5 | 3.9 | 3.6
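As a rough guide to reproducing a single row of this ablation, the sketch below uses the public Ultralytics YOLOv8 training and validation API; the modified architecture YAML, dataset YAML, and training hyperparameters are placeholders for illustration rather than files or settings released with this article.

```python
# Minimal sketch of one ablation run with the Ultralytics YOLOv8 API.
# Assumptions: "yolov8n-mvitsf-p2-gsconv.yaml" is a hypothetical model config that adds
# the MobileViTSF backbone, the P2 small-target layer, and GSConv in the neck, and
# "datasets/seaships.yaml" is a hypothetical dataset config; neither is provided here.
from ultralytics import YOLO

model = YOLO("yolov8n-mvitsf-p2-gsconv.yaml")       # build the modified architecture
model.train(data="datasets/seaships.yaml",          # train on a SeaShips-style dataset
            imgsz=1280, epochs=300, batch=16)       # placeholder hyperparameters

metrics = model.val(imgsz=1280)                     # evaluate on the validation split
print(f"mAP0.5      = {metrics.box.map50:.3f}")     # corresponds to the mAP0.5 column
print(f"mAP0.5:0.95 = {metrics.box.map:.3f}")       # corresponds to the mAP0.5:0.95 column
```

Running the same script with the baseline yolov8n.yaml instead of the modified config would give the counterpart of the first row of the table.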
Models | Input | mAP0.5 (%) | mAP0.5:0.95 (%) | FLOPs (G) | Size (MB) | FPS (frames/s)
---|---|---|---|---|---|---
YOLOv8n | 640 × 640 | 97.9 | 78.0 | 8.1 | 6.2 | 102.04 |
YOLOv7-tiny | 640 × 640 | 94.4 | 70.2 | 13.2 | 12.0 | 66.22 |
YOLOX-s | 640 × 640 | 92.1 | 76.8 | 26.77 | 8.9 | 94.3 |
YOLOv6 | 640 × 640 | 98.0 | 78.7 | 11.9 | 8.7 | 83.33 |
YOLOv5n(6.0) | 640 × 640 | 97.7 | 74.5 | 7.8 | 5.2 | 65.36 |
YOLOv5n(6.0)-P2 | 1280 × 1280 | 98.2 | 80.9 | 7.2 | 8.7 | 67.11 |
YOLOv4-tiny | 416 × 416 | 97.1 | 75.2 | 20.8 | 18.6 | 44.64 |
YOLOv3-tiny | 416 × 416 | 96.8 | 71.4 | 18.9 | 24.4 | 163.98 |
Slim-neck+v8 | 640 × 640 | 98.0 | 77.7 | 7.0 | 5.5 | 110.48 |
VanillaNet [49] | 640 × 640 | 96.8 | 73.3 | 10.2 | 7.9 | 96.98 |
MobileViT+v8 | 640 × 640 | 93.6 | 69.0 | 5.3 | 2.6 | 70.42 |
MobileViTv3+v8 | 640 × 640 | 95.8 | 74.1 | 5.5 | 2.9 | 72.53 |
Ours | 1280 × 1280 | 98.8 | 82.5 | 3.9 | 3.6 | 68.49 |
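The FPS column above is simple inference throughput. The following sketch shows one way such a figure can be measured for a detector in PyTorch; the checkpoint name, warm-up count, and 640 × 640 input are illustrative assumptions, not the exact protocol used in the paper.

```python
# Rough FPS measurement for a detection model in PyTorch.
# "yolov8n.pt" and the timing parameters below are placeholders, not values from the paper.
import time
import torch
from ultralytics import YOLO

device = "cuda" if torch.cuda.is_available() else "cpu"
net = YOLO("yolov8n.pt").model.to(device).eval()     # underlying nn.Module of the detector
dummy = torch.zeros(1, 3, 640, 640, device=device)   # batch size 1, 640 x 640 input

with torch.no_grad():
    for _ in range(10):                               # warm-up iterations
        net(dummy)
    if device == "cuda":
        torch.cuda.synchronize()
    runs = 100
    start = time.perf_counter()
    for _ in range(runs):                             # timed forward passes
        net(dummy)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"FPS ≈ {runs / elapsed:.2f}")
```

Measured throughput depends heavily on hardware, input resolution, and batch size, which is why the comparison table reports the input size alongside FPS.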
Category | YOLOv8n mAP0.5 (%) | YOLOv8n mAP0.5:0.95 (%) | Ours mAP0.5 (%) | Ours mAP0.5:0.95 (%)
---|---|---|---|---
Ore carrier | 98.4 | 75.6 | 98.9 | 81.6 |
Bulk cargo carrier | 97.8 | 77.2 | 98.8 | 83.9 |
General cargo carrier | 98.5 | 80.5 | 99.2 | 85.3 |
Container carrier | 99.0 | 83.7 | 99.5 | 86.7 |
Fishing boat | 96.1 | 72.4 | 98.0 | 78.7 |
Passenger ship | 97.3 | 73.0 | 98.6 | 78.9 |
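Per-class figures like those in the table above can be read out of an Ultralytics validation run. The snippet below is a sketch under the assumption of a trained checkpoint named best.pt and a SeaShips-style dataset config, neither of which is distributed with this article.

```python
# Sketch: per-class AP extraction after validation with the Ultralytics API.
# "best.pt" and "datasets/seaships.yaml" are placeholders for the user's own files.
from ultralytics import YOLO

model = YOLO("best.pt")
metrics = model.val(data="datasets/seaships.yaml")

# metrics.box.maps holds one mAP0.5:0.95 value per class, indexed by class id.
for class_id, class_name in model.names.items():
    print(f"{class_name:>22s}: mAP0.5:0.95 = {metrics.box.maps[class_id]:.3f}")
```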
Share and Cite
Zhao, X.; Song, Y. Improved Ship Detection with YOLOv8 Enhanced with MobileViT and GSConv. Electronics 2023, 12, 4666. https://doi.org/10.3390/electronics12224666