RCF-YOLOv8: A Multi-Scale Attention and Adaptive Feature Fusion Method for Object Detection in Forward-Looking Sonar Images
Abstract
Highlights
- YOLO-family detectors are effective for underwater object detection. RCF-YOLOv8, proposed for forward-looking sonar (FLS) images, improves on the YOLOv8 baseline, raising mAP50 from 96.4% to 98.8% and mAP50-95 from 57.2% to 67.6%.
- RCF-YOLOv8 improves spatial perception, feature representation, and fusion quality, which reduces false positives and missed detections in complex underwater environments.
- The method provides a robust and efficient solution for underwater object detection and supports target identification and operational tasks on underwater unmanned systems.
- This research provides new insights into acoustic image analysis and highlights the importance of domain adaptation mechanisms in detection tasks.
1. Introduction
- CoordConv is used to address the limited spatial perception of ordinary convolution. Explicitly embedding coordinate information in the input improves the network's ability to model target positions.
- The EMA module is integrated into the backbone for cross-spatial learning, addressing the feature incoherence caused by sparse target distribution in FLS images. Its cross-channel and cross-spatial attention strengthens the correlation among multi-scale features and improves the representation of sparse targets, and a new connection strategy reduces information loss.
- The C2f-Fusion module is proposed to mitigate the blurring of FLS images. It optimizes feature fusion, which improves fusion quality and captures context information more effectively.
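The coordinate-embedding idea in the first contribution can be illustrated with a minimal sketch. It assumes the common CoordConv formulation, in which two extra channels holding normalized row and column positions are concatenated to the feature map before an ordinary convolution. The function names `coord_channels` and `add_coords` and the [-1, 1] scaling are illustrative assumptions, not taken from the paper's implementation.

```python
def coord_channels(height, width):
    """Return (y_channel, x_channel), each height x width, scaled to [-1, 1]."""

    def scale(index, size):
        # Map index 0..size-1 linearly onto [-1, 1]; a single row/column maps to 0.
        return 0.0 if size == 1 else 2.0 * index / (size - 1) - 1.0

    y_channel = [[scale(r, height) for _ in range(width)] for r in range(height)]
    x_channel = [[scale(c, width) for c in range(width)] for _ in range(height)]
    return y_channel, x_channel


def add_coords(feature_map):
    """Append y and x coordinate channels to a C x H x W nested list.

    Assumption: a following convolution would consume C + 2 input channels.
    """
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    y_channel, x_channel = coord_channels(height, width)
    return list(feature_map) + [y_channel, x_channel]


# Toy input: one channel of zeros, 3 rows by 4 columns.
fmap = [[[0.0] * 4 for _ in range(3)]]
augmented = add_coords(fmap)

print(len(augmented))                           # channel count after augmentation
print(augmented[1][0][0], augmented[1][2][0])   # y coordinate at top and bottom rows
print(augmented[2][0][0], augmented[2][0][3])   # x coordinate at left and right columns
```

Because the coordinate channels are fixed functions of position, the convolution that follows can learn position-dependent filters, which a plain translation-equivariant convolution cannot do from the image content alone.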
2. Methods
2.1. CoordConv
2.2. EMA Module
2.3. C2f-Fusion Module
3. Experimental Results and Analysis
3.1. Sonar Image Dataset
3.2. Experimental Setup and Model Training
3.3. Performance Evaluation Indicators
3.4. Experimental Results
3.4.1. Image Denoising Strategy
3.4.2. Ablation Experiment
3.4.3. Comparative Experiment
3.5. Verification of the Generalization Ability of Classification Models
4. Discussion
4.1. Acoustic Image Object Detection Network
4.2. Challenges in Underwater Object Detection
4.3. Generalization and Interpretability Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Experiment | YOLOv8 | CoordConv | EMA | C2f-Fusion Module | mAP50 (%) | mAP50-95 (%) | Params (M) | FPS |
---|---|---|---|---|---|---|---|---|
1 | ✓ | - | - | - | 96.4 | 57.2 | 11.14 | 909 |
2 | ✓ | ✓ | - | - | 98.8 | 67.0 | 11.16 | 227 |
3 | ✓ | - | ✓ | - | 98.6 | 67.1 | 11.55 | 833 |
4 | ✓ | - | - | ✓ | 98.5 | 65.7 | 11.56 | 833 |
5 | ✓ | - | ✓ | ✓ | 98.8 | 66.9 | 11.97 | 769 |
6 | ✓ | ✓ | - | ✓ | 98.7 | 67.4 | 11.58 | 222 |
7 | ✓ | ✓ | ✓ | - | 98.7 | 67.1 | 11.57 | 217 |
8 | ✓ | ✓ | ✓ | ✓ | 98.8 | 67.6 | 11.99 | 208 |
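The trade-offs in the ablation table can be read off with a few lines of arithmetic. The sketch below transcribes the eight rows and compares the baseline (experiment 1) with the full model (experiment 8), then averages FPS over runs with and without CoordConv. The `Run` record and its field names are hypothetical helpers introduced only for this illustration.

```python
from typing import NamedTuple


class Run(NamedTuple):
    exp: int
    coordconv: bool
    ema: bool
    c2f_fusion: bool
    map50: float      # mAP50 (%)
    map50_95: float   # mAP50-95 (%)
    params_m: float   # parameters, millions
    fps: int


# Rows transcribed from the ablation table.
RUNS = [
    Run(1, False, False, False, 96.4, 57.2, 11.14, 909),
    Run(2, True, False, False, 98.8, 67.0, 11.16, 227),
    Run(3, False, True, False, 98.6, 67.1, 11.55, 833),
    Run(4, False, False, True, 98.5, 65.7, 11.56, 833),
    Run(5, False, True, True, 98.8, 66.9, 11.97, 769),
    Run(6, True, False, True, 98.7, 67.4, 11.58, 222),
    Run(7, True, True, False, 98.7, 67.1, 11.57, 217),
    Run(8, True, True, True, 98.8, 67.6, 11.99, 208),
]


def mean_fps(runs):
    """Average frames per second over a list of runs."""
    return sum(r.fps for r in runs) / len(runs)


baseline, full = RUNS[0], RUNS[-1]
map50_gain = round(full.map50 - baseline.map50, 1)
map50_95_gain = round(full.map50_95 - baseline.map50_95, 1)
param_growth_pct = round(
    100 * (full.params_m - baseline.params_m) / baseline.params_m, 1
)
fps_with_coordconv = mean_fps([r for r in RUNS if r.coordconv])
fps_without_coordconv = mean_fps([r for r in RUNS if not r.coordconv])

print(f"mAP50 gain: {map50_gain} points")
print(f"mAP50-95 gain: {map50_95_gain} points")
print(f"Parameter growth: {param_growth_pct}%")
print(f"Mean FPS with CoordConv: {fps_with_coordconv}")
print(f"Mean FPS without CoordConv: {fps_without_coordconv}")
```

On these numbers, the accuracy gain comes with a modest increase in parameters, while the frame rate is markedly lower in every configuration that includes CoordConv.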
Model | Input Size | mAP50 (%) | mAP50-95 (%) | Params (M) | GFLOPs |
---|---|---|---|---|---|
Faster R-CNN | 640 × 640 | 64.5 | 24.0 | 137.1 | 370.2 |
SSD | 640 × 640 | 91.9 | 46.2 | 26.29 | 62.7 |
YOLOv3-tiny | 640 × 640 | 96.0 | 51.4 | 8.69 | 13.0 |
YOLOv4-tiny | 640 × 640 | 94.9 | 48.0 | 5.89 | 16.2 |
YOLOv5s | 640 × 640 | 95.4 | 46.2 | 7.04 | 16.0 |
YOLOv7s | 640 × 640 | 89.7 | 38.8 | 6.03 | 13.2 |
YOLOv8s | 640 × 640 | 96.4 | 57.2 | 11.14 | 28.7 |
RCF-YOLOv8 | 640 × 640 | 98.8 | 67.6 | 11.99 | 29.9 |
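The two mAP columns rest on the intersection-over-union (IoU) between predicted and ground-truth boxes. The sketch below assumes the COCO-style convention used by common YOLOv8 tooling, in which mAP50 counts a match at IoU ≥ 0.50 and mAP50-95 averages AP over thresholds from 0.50 to 0.95 in steps of 0.05. The paper's own evaluation code is not shown here, and `iou`, `IOU_THRESHOLDS`, and the example boxes are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height are clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Ten thresholds: 0.50, 0.55, ..., 0.95 (assumed COCO-style sweep).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

ground_truth = (0, 0, 10, 10)
prediction = (0, 0, 9, 9)  # slightly undersized prediction

score = iou(ground_truth, prediction)
passed = [t for t in IOU_THRESHOLDS if score >= t]

print(f"IoU = {score:.2f}")
print(f"Counts as a match at {len(passed)} of {len(IOU_THRESHOLDS)} thresholds")
```

A prediction like this one is a hit for mAP50 but fails the strictest thresholds, which is why mAP50-95 is the more demanding measure of localization quality and shows the larger gap between models.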
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, X.; Chen, Y.; Liu, X.; Qin, Z.; Wan, J.; Yan, Q. RCF-YOLOv8: A Multi-Scale Attention and Adaptive Feature Fusion Method for Object Detection in Forward-Looking Sonar Images. Remote Sens. 2025, 17, 3288. https://doi.org/10.3390/rs17193288