BSE-YOLO: An Enhanced Lightweight Multi-Scale Underwater Object Detection Model
Abstract
1. Introduction
2. Related Work
2.1. Underwater Object Detection
2.2. Attention Mechanism
3. Methods
3.1. The Architecture of YOLOv10n
3.2. The Structure of the Proposed BSE-YOLO
3.2.1. Improved Bidirectional Feature Pyramid Network (BiFPN)
3.2.2. Multi-Scale Attention Synergy Module (MASM)
3.2.3. Efficient Multi-Scale Attention (EMA)
4. Experiment and Results
4.1. Datasets
4.2. Experimental Setup
4.3. Evaluation Metrics
4.4. Comparisons with Other Methods on URPC2020
4.5. Comparisons with Other Methods on DUO
4.6. Ablation Study
4.7. Application Test
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Apriliani, E.; Nurhadi, H. Ensemble and Fuzzy Kalman Filter for position estimation of an autonomous underwater vehicle based on dynamical system of AUV motion. Expert Syst. Appl. 2017, 68, 29–35.
- Lu, H.; Li, Y.; Zhang, Y.; Chen, M.; Serikawa, S.; Kim, H. Underwater optical image processing: A comprehensive review. Mob. Netw. Appl. 2017, 22, 1204–1211.
- Yeh, C.H.; Lin, C.H.; Kang, L.W.; Huang, C.H.; Lin, M.H.; Chang, C.Y.; Wang, C.C. Lightweight deep neural network for joint learning of underwater object detection and color conversion. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6129–6143.
- Xu, S.; Zhang, M.; Song, W.; Mei, H.; He, Q.; Liotta, A. A systematic review and analysis of deep learning-based underwater object detection. Neurocomputing 2023, 527, 204–232.
- Zhao, C.; Shu, X.; Yan, X.; Zuo, X.; Zhu, F. RDD-YOLO: A modified YOLO for detection of steel surface defects. Measurement 2023, 214, 112776.
- Yang, Y.; Zhang, J.; Shu, X.; Pan, L.; Zhang, M. A lightweight Transformer model for defect detection in electroluminescence images of photovoltaic cells. IEEE Access 2024, 12, 194922–194931.
- Hong, L.; Shu, X.; Wang, Q.; Ye, H.; Shi, J.; Liu, C. CCM-Net: Color compensation and coordinate attention guided underwater image enhancement with multi-scale feature aggregation. Opt. Lasers Eng. 2025, 184, 108590.
- Liu, C.; Shu, X.; Xu, D.; Shi, J. GCCF: A lightweight and scalable network for underwater image enhancement. Eng. Appl. Artif. Intell. 2024, 128, 107462.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. arXiv 2015, arXiv:1504.08083.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10781–10790.
- Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient multi-scale attention module with cross-spatial learning. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
- Chen, L.; Zhou, F.; Wang, S.; Dong, J.; Li, N.; Ma, H.; Wang, X.; Zhou, H. SWIPENET: Object detection in noisy underwater scenes. Pattern Recognit. 2022, 132, 108926.
- Liu, J.; Liu, S.; Xu, S.; Zhou, C. Two-stage underwater object detection network using swin transformer. IEEE Access 2022, 10, 117235–117247.
- Pan, T.S.; Huang, H.C.; Lee, J.C.; Chen, C.H. Multi-scale ResNet for real-time underwater object detection. Signal Image Video Process. 2021, 15, 941–949.
- Gao, J.; Zhang, Y.; Geng, X.; Tang, H.; Bhatti, U.A. PE-Transformer: Path enhanced transformer for improving underwater object detection. Expert Syst. Appl. 2024, 246, 123253.
- Cai, S.; Li, G.; Shan, Y. Underwater object detection using collaborative weakly supervision. Comput. Electr. Eng. 2022, 102, 108159.
- Zhang, M.; Xu, S.; Song, W.; He, Q.; Wei, Q. Lightweight underwater object detection based on YOLO v4 and multi-scale attentional feature fusion. Remote Sens. 2021, 13, 4706.
- Liu, K.; Peng, L.; Tang, S. Underwater object detection using TC-YOLO with attention mechanisms. Sensors 2023, 23, 2567.
- Feng, J.; Jin, T. CEH-YOLO: A composite enhanced YOLO-based model for underwater object detection. Ecol. Inform. 2024, 82, 102758.
- Guo, M.H.; Xu, T.X.; Liu, J.J.; Liu, Z.N.; Jiang, P.T.; Mu, T.J.; Zhang, S.H.; Martin, R.R.; Cheng, M.M.; Hu, S.M. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368.
- Wang, X.; Xue, G.; Huang, S.; Liu, Y. Underwater object detection algorithm based on adding channel and spatial fusion attention mechanism. J. Mar. Sci. Eng. 2023, 11, 1116.
- Yan, J.; Zhou, Z.; Zhou, D.; Su, B.; Xuanyuan, Z.; Tang, J.; Lai, Y.; Chen, J.; Liang, W. Underwater object detection algorithm based on attention mechanism and cross-stage partial fast spatial pyramidal pooling. Front. Mar. Sci. 2022, 9, 1056300.
- Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. Adv. Neural Inf. Process. Syst. 2015, 28, 2017–2025.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 3–19.
- Liu, N.; Han, J.; Yang, M.H. PiCANet: Learning pixel-wise contextual attention for saliency detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3089–3098.
- Si, Y.; Xu, H.; Zhu, X.; Zhang, W.; Dong, Y.; Chen, Y.; Li, H. SCSA: Exploring the synergistic effects between spatial and channel attention. arXiv 2024, arXiv:2407.05128.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J. YOLOv10: Real-time end-to-end object detection. Adv. Neural Inf. Process. Syst. 2025, 37, 107984–108011.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Zhao, Y.; Sun, F.; Wu, X. FEB-YOLOv8: A multi-scale lightweight detection model for underwater object detection. PLoS ONE 2024, 19, e0311173.
- Liu, C.; Li, H.; Wang, S.; Zhu, M.; Wang, D.; Fan, X. A dataset and benchmark of underwater object detection for robot picking. In Proceedings of the 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Shenzhen, China, 5–9 July 2021; pp. 1–6.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q. DETRs beat YOLOs on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
- Li, N.; Ding, B.; Yang, G.; Ni, S.; Wang, M. Lightweight LUW-DETR for efficient underwater benthic organism detection. Vis. Comput. 2025, 1–16.
Comparison with other methods on the URPC2020 dataset.

Methods | mAP@0.5 | mAP@0.5:0.95 | Parameters (M) | GFLOPs | FPS |
---|---|---|---|---|---|
SSD [14] | 76.2 | 37.6 | 24.01 | 274.4 | 70.6 |
YOLOv3 | 72.9 | 31.4 | 61.54 | 155.3 | 69.1 |
YOLOv4 | 75.4 | 34.0 | 63.95 | 141.9 | 51.9 |
YOLOv5s | 80.4 | 43.5 | 7.27 | 17.1 | 103.1 |
YOLOX [41] | 79.7 | 42.0 | 5.03 | 15.2 | 102.7 |
YOLOv8n | 82.2 | 47.9 | 3.01 | 8.2 | 140.8 |
YOLOv10n | 81.5 | 47.6 | 2.70 | 8.2 | 136.4 |
FEB-YOLO [39] | 83.5 | 48.9 | 1.64 | 6.2 | - |
RT-DETR [42] | 80.5 | 44.9 | 31.99 | 103.4 | 37.8 |
LUW-DETR [43] | 83.1 | - | 14.23 | 40.1 | - |
BSE-YOLO | 83.7 | 48.6 | 2.47 | 8.3 | 97.7 |
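The mAP@0.5 and mAP@0.5:0.95 columns above rest on an Intersection-over-Union (IoU) threshold test. As a minimal illustration of that test (a sketch, not the evaluation code used in the paper), the IoU of two boxes in (x1, y1, x2, y2) format can be computed as:

```python
# Minimal IoU sketch behind the mAP columns; assumes axis-aligned boxes
# given as (x1, y1, x2, y2). Not the authors' evaluation pipeline.

def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A prediction counts as a true positive at mAP@0.5 when IoU >= 0.5;
# mAP@0.5:0.95 averages AP over thresholds 0.5, 0.55, ..., 0.95.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # intersection 50, union 150
```

Under mAP@0.5 the two boxes in the example would not match (IoU ≈ 0.33 < 0.5), which is why the stricter mAP@0.5:0.95 values in the table are consistently lower than mAP@0.5.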
Comparison with other methods on the DUO dataset.

Methods | mAP@0.5 | mAP@0.5:0.95 | Parameters (M) | GFLOPs | FPS |
---|---|---|---|---|---|
SSD [14] | 79.7 | 50.7 | 24.01 | 274.4 | 70.6 |
Faster R-CNN [17] | 74.4 | 39.3 | 136.75 | 401.7 | 38.2 |
YOLOv3 | 71.6 | 40.3 | 61.54 | 155.3 | 69.1 |
YOLOv4 | 76.7 | 43.9 | 63.95 | 141.9 | 51.9 |
YOLOv7-Tiny | 81.0 | 57.7 | 6.02 | 13.2 | 105.2 |
YOLOv8n | 81.7 | 61.8 | 3.01 | 8.2 | 140.8 |
YOLOv10n | 80.9 | 61.7 | 2.70 | 8.2 | 136.4 |
FEB-YOLO [39] | 82.9 | 63.2 | 1.64 | 6.2 | - |
RT-DETR [42] | 77.4 | 55.5 | 31.99 | 103.4 | 37.8 |
BSE-YOLO | 83.9 | 64.2 | 2.47 | 8.3 | 97.7 |
Ablation study on the URPC2020 dataset.

Methods | mAP@0.5 | mAP@0.5:0.95 | Parameters (M) | GFLOPs | FPS |
---|---|---|---|---|---|
YOLOv10n | 81.5 | 47.6 | 2.70 | 8.2 | 136.4 |
B-YOLO | 83.0 | 47.5 | 2.43 | 7.9 | 144.4 |
S-YOLO | 82.2 | 47.6 | 2.73 | 8.4 | 103.6 |
E-YOLO | 81.6 | 47.2 | 2.71 | 8.4 | 115.2 |
BS-YOLO | 83.5 | 48.3 | 2.46 | 8.0 | 103.0 |
BE-YOLO | 83.2 | 47.7 | 2.44 | 8.0 | 125.6 |
SE-YOLO | 82.8 | 47.8 | 2.74 | 8.5 | 94.2 |
BSE-YOLO | 83.7 | 48.6 | 2.47 | 8.3 | 97.7 |
Application test: ONNX inference time at different input sizes.

Methods | Input Size | Parameters (M) | GFLOPs | Framework | Time (ms) |
---|---|---|---|---|---|
YOLOv10n | 256 | 2.70 | 8.2 | ONNX | 44.2 |
BSE-YOLO | 256 | 2.47 | 8.3 | ONNX | 57.6 |
YOLOv10n | 640 | 2.70 | 8.2 | ONNX | 236.2 |
BSE-YOLO | 640 | 2.47 | 8.3 | ONNX | 264.6 |
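Latency figures like those above are typically averaged wall-clock times over repeated forward passes after a warm-up. The sketch below shows a generic timing harness using only the Python standard library; the onnxruntime session call and the model file name in the comment are illustrative assumptions, not the authors' benchmarking script.

```python
import time

def measure_latency_ms(run, warmup=5, runs=50):
    """Average wall-clock latency of calling run(), in milliseconds."""
    for _ in range(warmup):       # warm-up passes excluded from timing
        run()
    start = time.perf_counter()
    for _ in range(runs):
        run()
    return (time.perf_counter() - start) / runs * 1000.0

# With onnxruntime installed, `run` would wrap a session call, e.g.:
#   sess = onnxruntime.InferenceSession("bse_yolo.onnx")  # hypothetical file
#   x = numpy.zeros((1, 3, 640, 640), dtype=numpy.float32)
#   name = sess.get_inputs()[0].name
#   measure_latency_ms(lambda: sess.run(None, {name: x}))
print(measure_latency_ms(lambda: sum(range(10000))))
```

Averaging over many runs (and discarding warm-up iterations, during which caches and graph optimizations settle) is what makes per-image times such as 44.2 ms vs. 57.6 ms comparable across models.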
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, Y.; Ye, H.; Shu, X. BSE-YOLO: An Enhanced Lightweight Multi-Scale Underwater Object Detection Model. Sensors 2025, 25, 3890. https://doi.org/10.3390/s25133890