DiCAF: A Dual-Input Co-Attention Fusion Network with NMS Ensemble for Underwater Debris Detection
Abstract
1. Introduction
- A dual-backbone architecture based on YOLOv11 is designed, incorporating a co-attention fusion module to effectively integrate complementary features from the original and enhanced images, thereby improving robustness to the structural variability of underwater scenes.
- A lightweight Gaussian filter is introduced as a post-processing step to suppress high-frequency noise amplified during image enhancement, thereby reducing false detections caused by speckle noise from suspended particles.
- An NMS-based ensemble strategy is proposed to integrate detection outputs from three independent branches—DiCAF-fused, original-only, and enhanced-only—achieving superior detection performance compared to any single branch.
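The lightweight Gaussian post-filter in the second contribution can be sketched as a separable blur over a grayscale image. This is an illustrative sketch only, not the authors' implementation: the kernel size, sigma, and clamp-style border handling are assumptions, and the image is represented as nested Python lists for self-containment.

```python
import math

def gaussian_kernel_1d(sigma, radius):
    """Sampled 1-D Gaussian, normalised so the weights sum to 1."""
    k = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def convolve_rows(img, kernel):
    """Convolve each row with the 1-D kernel, clamping indices at the borders."""
    r = len(kernel) // 2
    out = []
    for row in img:
        n = len(row)
        out.append([
            sum(kernel[j + r] * row[min(max(i + j, 0), n - 1)]
                for j in range(-r, r + 1))
            for i in range(n)
        ])
    return out

def gaussian_blur(img, sigma=1.0, radius=2):
    """Separable 2-D blur: filter rows, transpose, filter rows again, transpose back."""
    kernel = gaussian_kernel_1d(sigma, radius)
    tmp = convolve_rows(img, kernel)
    tmp_t = [list(col) for col in zip(*tmp)]
    out_t = convolve_rows(tmp_t, kernel)
    return [list(col) for col in zip(*out_t)]
```

Because the kernel is normalised, smooth regions pass through unchanged while isolated high-frequency spikes (e.g. speckle from suspended particles) are spread out and attenuated, which is the suppression effect the contribution describes.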
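The ensemble idea in the third contribution can be illustrated with a minimal sketch: detections from the three branches (DiCAF-fused, original-only, enhanced-only) are pooled and merged with greedy per-class non-maximum suppression. The box format `(x1, y1, x2, y2)`, the IoU threshold of 0.5, and the function names are assumptions for illustration, not the paper's actual implementation.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms_ensemble(branch_outputs, iou_thr=0.5):
    """Pool (box, score, class) detections from all branches, then greedy NMS.

    A detection survives only if it does not overlap (IoU >= iou_thr)
    any already-kept, higher-scoring detection of the same class.
    """
    pooled = [d for branch in branch_outputs for d in branch]
    pooled.sort(key=lambda d: d[1], reverse=True)  # highest confidence first
    kept = []
    for box, score, cls in pooled:
        if all(k[2] != cls or iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score, cls))
    return kept
```

For example, if the fused and original branches both fire on the same object with heavily overlapping boxes, only the higher-confidence box survives, while a detection of a different class from the enhanced branch is kept alongside it.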
2. Related Works
2.1. Underwater Image Enhancement
2.2. Object Detection in Underwater Environment
2.3. Research Gap and Motivation
3. Methodology
3.1. Overall Framework
3.2. Image Enhancement
3.3. Dual-Input Co-Attention Fusion
3.4. NMS-Based Ensemble Strategy
4. Experimental Results
4.1. Experimental Setup
4.2. Ablation Study
4.3. Quantitative Comparison
4.4. Noise Robustness Evaluation
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Jalil, B.; Maggiani, L.; Valcarenghi, L. Convolutional neural networks and transformers-based techniques for underwater marine debris classification: A comparative study. IEEE J. Ocean. Eng. 2025, 50, 594–607. [Google Scholar] [CrossRef]
- Nyadjro, E.S.; Webster, J.A.B.; Boyer, T.P.; Cebrian, J.; Collazo, L.; Kaltenberger, G.; Larsen, K.; Lau, Y.H.; Mickle, P.; Toft, T.; et al. The NOAA NCEI marine microplastics database. Sci. Data 2023, 10, 726. [Google Scholar] [CrossRef]
- Sawant, S.; D’Souza, L.; Kulkarni, A.; Uchil, D.; Nagvenkar, K. Performance evaluation of YOLO variants on marine trash images: A comparative study of YOLOv5, YOLOv7, YOLOv8, and TinyYOLO. In Proceedings of the IEEE Bangalore Humanitarian Technology Conference, Karkala, India, 22–23 March 2024. [Google Scholar]
- Chin, C.S.; Neo, A.B.H.; See, S. Visual marine debris detection using YOLOv5s for autonomous underwater vehicle. In Proceedings of the IEEE/ACIS International Conference on Computer and Information Science, Zhuhai, China, 26–28 June 2022. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state of the art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
- Varghese, R.; Sambath, M. YOLOv8: A novel object detection algorithm with enhanced performance and robustness. In Proceedings of the International Conference on Advances in Data Engineering and Intelligent Computing Systems, Chennai, India, 18–19 April 2024. [Google Scholar]
- Loh, A.I.L.; Goh, Y.X.; Bingi, L.; Ibrahim, R. YOLOv11-powered emergency vehicle detection algorithm using Tello quadrotor drone. In Proceedings of the IEEE International Conference on Robotics and Technologies for Industrial Automation, Kuala Lumpur, Malaysia, 12 April 2025. [Google Scholar]
- Corrigan, B.C.; Tay, Z.Y.; Konovessis, D. Real-time instance segmentation for detection of underwater litter as a plastic source. J. Mar. Sci. Eng. 2023, 11, 1532. [Google Scholar] [CrossRef]
- Adimulam, R.P.; Thumu, R.R.; Lanke, N. Real-time underwater trash object detection using enhanced YOLOv8. In Proceedings of the International Conference on Intelligent Systems, Advanced Computing and Communication, Silchar, India, 27–28 February 2025. [Google Scholar]
- Rasheed, S.; Mirza, A.; Saeed, M.S.; Yousaf, M.H. Trash detection in water bodies using YOLO with explainable AI insight. In Proceedings of the International Conference on Robotics and Automation in Industry, Rawalpindi, Pakistan, 18–19 December 2024. [Google Scholar]
- Saleem, A.; Awad, A.; Paheding, S.; Lucas, E.; Havens, T.C.; Esselman, P.C. Understanding the influence of image enhancement on underwater object detection: A quantitative and qualitative study. Remote Sens. 2025, 17, 185. [Google Scholar] [CrossRef]
- Ouyang, W.; Wei, Y.; Hou, T.; Liu, J. An in-situ image enhancement method for the detection of marine organisms by remotely operated vehicles. ICES J. Mar. Sci. 2024, 81, 440–452. [Google Scholar] [CrossRef]
- Xu, S.; Zhang, M.; Song, W.; Mei, H.; He, Q.; Liotta, A. A systematic review and analysis of deep learning-based underwater object detection. Neurocomputing 2023, 527, 204–232. [Google Scholar] [CrossRef]
- Song, H.; Xia, W.; Kang, J.; Zhang, S.; Ye, C.; Kang, W.; Toe, T.T. Underwater image enhancement method based on dark channel prior and guided filtering. In Proceedings of the International Conference on Automation, Robotics and Computer Engineering, Wuhan, China, 16–17 December 2022. [Google Scholar]
- Zhang, W.; Zhou, L.; Zhuang, P.; Li, G.; Pan, X.; Zhao, W.; Li, C. Underwater image enhancement via weighted wavelet visual perception fusion. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 2469–2483. [Google Scholar] [CrossRef]
- Zhang, G.; Li, C.; Yan, J.; Zheng, Y. ULD-CycleGAN: An underwater light field and depth map-optimized CycleGAN for underwater image enhancement. IEEE J. Ocean. Eng. 2024, 49, 1275–1288. [Google Scholar] [CrossRef]
- Cong, R.; Yang, W.; Zhang, W.; Li, C.; Guo, C.-L.; Huang, Q.; Kwong, S. PUGAN: Physical model-guided underwater image enhancement using GAN with dual-discriminators. IEEE Trans. Image Process. 2023, 32, 4472–4485. [Google Scholar] [CrossRef]
- Wang, B.; Xu, H.; Jiang, G.; Yu, M. UIE-convformer: Underwater image enhancement based on convolution and feature fusion transformer. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8, 1952–1968. [Google Scholar] [CrossRef]
- Yu, F.; Xiao, F.; Li, C.; Cheng, E.; Yuan, F. AO-UOD: A novel paradigm for underwater object detection using acousto-optic fusion. IEEE J. Ocean. Eng. 2025, 50, 919–940. [Google Scholar] [CrossRef]
- Zhou, W.; Zheng, F.; Yin, G.; Pang, Y.; Yi, J. YOLOTrashCan: A deep learning marine debris detection network. IEEE Trans. Instrum. Meas. 2022, 72, 1–12. [Google Scholar] [CrossRef]
- Li, X.; Zhao, Y.; Su, H.; Wang, Y.; Chen, G. Efficient underwater object detection based on feature enhancement and attention detection head. Sci. Rep. 2025, 15, 5973. [Google Scholar] [CrossRef]
- He, X.; Zhang, Y.; Zhang, Q. AIN-YOLO: A lightweight YOLO network with attention-based InceptionNext and knowledge distillation for underwater object detection. Adv. Eng. Inform. 2025, 66, 103504. [Google Scholar] [CrossRef]
- Lucas, E.; Awad, A.; Geglio, A.; Saleem, A. Underwater image enhancement and object detection: Are poor object detection results on enhanced images due to missing human labels? In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, Tucson, AZ, USA, 28 February–4 March 2025. [Google Scholar]
- Qiu, X.; Shi, Y. Multimodal fusion image enhancement technique and CFEC-YOLOv7 for underwater target detection algorithm research. Front. Neurorobot. 2025, 19, 1616919. [Google Scholar] [CrossRef] [PubMed]
- Liu, Z.; Wang, B.; Li, Y.; He, J.; Li, Y. UnitModule: A lightweight joint image enhancement module for underwater object detection. Pattern Recognit. 2025, 151, 110435. [Google Scholar] [CrossRef]
- Zhang, W.; Li, X.; Xu, S.; Li, X.; Yang, Y.; Xu, D.; Liu, T.; Hu, H. Underwater image restoration via adaptive color correction and contrast enhancement fusion. Remote Sens. 2023, 15, 4699. [Google Scholar] [CrossRef]
- Liu, W.; Xu, J.; He, S.; Chen, Y.; Zhang, X.; Shu, H.; Qi, P. Underwater-image enhancement based on maximum information-channel correction and edge-preserving filtering. Symmetry 2025, 17, 725. [Google Scholar] [CrossRef]
- Chen, X.; Zhang, P.; Quan, L.; Yi, C.; Lu, C. Underwater image enhancement based on deep learning and image formation model. arXiv 2021, arXiv:2101.00991. [Google Scholar] [CrossRef]
- AI Hub. Enhanced Marine Debris Image Data. Available online: https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=71340 (accessed on 3 January 2025).
- Đuraš, A.; Wolf, B.J.; Ilioudi, A.; Palunko, I.; De Schutter, B.A. Dataset of detection and segmentation of underwater marine debris in shallow waters. Sci. Data 2024, 11, 921. [Google Scholar] [CrossRef]
- Wang, Z.; Zhou, D.; Li, Z.; Yuan, Z.; Yang, C. Underwater image enhancement via adaptive color correction and stationary wavelet detail enhancement. IEEE Access 2024, 12, 11066–11082. [Google Scholar] [CrossRef]
- Mughal, A.B.; Shah, S.M.; Khan, R.U. Real-time detection of underwater debris using YOLOv11: An optimized approach. In Proceedings of the International Conference on Emerging Technologies in Electronics, Computing, and Communication, Jamshoro, Pakistan, 23–25 April 2025. [Google Scholar]
| Co-Attention | Filtering | Ensemble | AP@0.5 | AP@0.5:0.95 |
|---|---|---|---|---|
| - | - | - | 0.88 | 0.72 |
| ✓ | - | - | 0.84 | 0.67 |
| - | ✓ | - | 0.85 | 0.66 |
| ✓ | ✓ | - | 0.87 | 0.71 |
| ✓ | ✓ | ✓ | 0.91 | 0.75 |

| Model | Enhancement Method | AP@0.5 | AP@0.5:0.95 |
|---|---|---|---|
| Mughal et al. [32] | None | 0.88 | 0.72 |
| Wang et al. [31] | Physical model | 0.79 | 0.59 |
| Zhang et al. [15] | Physical model | 0.83 | 0.63 |
| Chen et al. [28] | Deep learning model | 0.84 | 0.66 |
| Ours (without NMS) | Deep learning model | 0.87 | 0.71 |
| Ours (with NMS) | Deep learning model | 0.91 | 0.75 |

| Model | Enhancement Method | AP@0.5 | AP@0.5:0.95 |
|---|---|---|---|
| Mughal et al. [32] | None | 0.69 | 0.52 |
| Wang et al. [31] | Physical model | 0.33 | 0.21 |
| Zhang et al. [15] | Physical model | 0.32 | 0.23 |
| Chen et al. [28] | Deep learning model | 0.44 | 0.32 |
| Ours (without NMS) | Deep learning model | 0.58 | 0.43 |
| Ours (with NMS) | Deep learning model | 0.71 | 0.53 |

| Model | Enhancement Method | AP@0.5 | AP@0.5:0.95 |
|---|---|---|---|
| Mughal et al. [32] | None | 0.32 | 0.21 |
| Wang et al. [31] | Physical model | 0.16 | 0.11 |
| Zhang et al. [15] | Physical model | 0.23 | 0.15 |
| Chen et al. [28] | Deep learning model | 0.12 | 0.08 |
| Ours (without NMS) | Deep learning model | 0.60 | 0.42 |
| Ours (with NMS) | Deep learning model | 0.62 | 0.44 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yoon, S.; Cho, J. DiCAF: A Dual-Input Co-Attention Fusion Network with NMS Ensemble for Underwater Debris Detection. J. Mar. Sci. Eng. 2025, 13, 2228. https://doi.org/10.3390/jmse13122228