Ship Target Detection Method Based on Feature Fusion and Bi-Level Routing Attention
Abstract
1. Introduction
- (1) Constructed a representative self-built ship dataset: The study focuses on the East China Sea and the lower reaches of the Yangtze River, establishing a custom ship dataset of approximately ten thousand images. The dataset covers nine target categories, including cargo ships, cruise ships, military vessels, and sailboats. It captures typical ship types in complex navigation scenarios with high-density traffic and variable backgrounds.
- (2) Proposed a lightweight and efficient network model based on an improved YOLOv11 framework: Using YOLOv11 as the baseline, the study introduces three core enhancements. Each module has a distinct and complementary function, and together they form a progressive optimization pipeline.
  - The iterative attentional feature fusion (iAFF) module acts as a feature optimizer, deployed at the end of the backbone network. It dynamically selects and fuses multi-scale features through its iterative attention mechanism. Its core function is to enhance the discriminative features of targets, especially small-scale and occluded objects, while suppressing background noise. This gives the subsequent stages a more discriminative primary feature representation.
  - The BiFormer module serves as a context modeler, receiving the features preprocessed by iAFF. Its bi-level routing attention (BRA) mechanism captures long-range dependencies and global semantic relationships between features at relatively low computational complexity. Building on the features optimized by iAFF, the module analyzes scene context: for example, it helps distinguish real ship targets from interference such as waves and reflections, and models the spatial relationships among densely arranged ships, thereby reducing false detections caused by complex backgrounds and inter-target occlusion. The preceding feature optimization by iAFF also lets BiFormer allocate its attention more efficiently.
  - The MPDIoU loss acts as a geometry optimizer, operating at the network output layer during training. Based on the semantic and contextual features jointly learned by iAFF and BiFormer, it directly optimizes bounding box regression by minimizing the Euclidean distance between the key corners of the predicted and ground truth boxes. This geometry-sensitive design is particularly suitable for the elongated shapes common to ships and supports higher-precision localization.

  In summary, iAFF and BiFormer jointly improve the model's reliability in target recognition and preliminary localization, while MPDIoU further ensures the geometric accuracy of bounding box regression.
- (3) Achieved comprehensive performance improvements on the custom dataset: Experiments show that the improved model surpasses mainstream detection algorithms on several key metrics. On the custom dataset, it achieves 93.96% mAP, 92.93% Recall, and 94.97% Precision with only 2.90 M parameters, striking a good balance between accuracy and efficiency. Ablation studies further validate the effectiveness of each proposed module.
2. Materials and Methods
2.1. Network Framework
2.2. Iterative Attentional Feature Fusion
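For orientation, the fusion rule usually associated with iterative attentional feature fusion can be written as below. This is a sketch of the general AFF/iAFF formulation, not necessarily the exact configuration used in this network. The symbols are assumptions for illustration: $X$ and $Y$ are the two feature maps being fused, $M(\cdot)$ is a multi-scale channel attention module that outputs weights in $[0, 1]$, and $\otimes$ is element-wise multiplication. Which features play the roles of $X$ and $Y$ at the end of the backbone is left open here.

```latex
\begin{aligned}
X \uplus Y &= M(X + Y) \otimes X + \bigl(1 - M(X + Y)\bigr) \otimes Y, \\
Z &= M(X \uplus Y) \otimes X + \bigl(1 - M(X \uplus Y)\bigr) \otimes Y .
\end{aligned}
```

The first line yields an attention-weighted initial fusion; the second reuses that result to compute the final weights, which is the iterative step. Because the two weights sum to one at every position, the output is a soft selection between $X$ and $Y$.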
2.3. BiFormer
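The bi-level routing idea behind BRA can be illustrated with a small, dependency-free sketch: regions are first routed to their most relevant regions at a coarse level, and token-level attention is then computed only over the routed regions. The sketch assumes tokens are already grouped by region and already projected to queries, keys, and values. The function names (`region_means`, `route_regions`, `routed_attention`) and the `top_k` parameter are illustrative, and the multi-head projections and local context enhancement branch of a full BiFormer block are omitted.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def region_means(tokens_by_region):
    """Average the token vectors in each region into one region descriptor."""
    means = []
    for tokens in tokens_by_region:
        dim = len(tokens[0])
        means.append([sum(t[d] for t in tokens) / len(tokens) for d in range(dim)])
    return means


def route_regions(q_regions, k_regions, top_k):
    """Coarse level: for each query region, keep the top_k most similar key regions."""
    routes = []
    for q in q_regions:
        scores = [dot(q, k) for k in k_regions]
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        routes.append(ranked[:top_k])
    return routes


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def routed_attention(queries, keys, values, top_k):
    """Fine level: each token attends only to tokens from its region's routed regions.

    queries, keys, values: list over regions, each a list of token vectors.
    Returns (routes, outputs), where outputs mirrors the shape of queries.
    """
    routes = route_regions(region_means(queries), region_means(keys), top_k)
    outputs = []
    for region_idx, region_queries in enumerate(queries):
        # Gather keys and values from the routed regions only.
        gathered_k = [k for j in routes[region_idx] for k in keys[j]]
        gathered_v = [v for j in routes[region_idx] for v in values[j]]
        scale = 1.0 / math.sqrt(len(gathered_k[0]))
        dim = len(gathered_v[0])
        region_out = []
        for q in region_queries:
            weights = softmax([dot(q, k) * scale for k in gathered_k])
            region_out.append(
                [sum(w * v[d] for w, v in zip(weights, gathered_v)) for d in range(dim)]
            )
        outputs.append(region_out)
    return routes, outputs


# Toy input: three regions of two 2-D tokens each (hypothetical values).
demo = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
    [[-1.0, 0.0], [-1.0, 0.0]],
]
demo_routes, demo_outputs = routed_attention(demo, demo, demo, top_k=1)
```

With `top_k` much smaller than the number of regions, each token attends to a small gathered set instead of the whole feature map, which is where the reduction in attention cost comes from.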
2.4. MPDIoU Loss Function
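The Introduction describes MPDIoU as minimizing the Euclidean distance between the key corners of the predicted and ground truth boxes. A minimal sketch of that idea follows, using the commonly cited MPDIoU form: IoU minus the squared top-left and bottom-right corner distances, each normalized by the squared image diagonal. The box format `(x1, y1, x2, y2)` and the names `iou`, `mpdiou_loss`, `img_w`, and `img_h` are assumptions for illustration; the training code used in this work may differ.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mpdiou_loss(pred, gt, img_w, img_h):
    """1 - MPDIoU for one predicted box and one ground truth box.

    img_w and img_h are the input image width and height (assumed
    normalization, following the usual MPDIoU definition).
    """
    # Squared distance between the top-left corners.
    d1_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    # Squared distance between the bottom-right corners.
    d2_sq = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou(pred, gt) - d1_sq / norm - d2_sq / norm
    return 1.0 - mpdiou
```

Unlike plain IoU, the corner terms stay informative when two boxes overlap equally but differ in shape or position, which is relevant for long, thin ship boxes.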
3. Results
3.1. Experiment Settings
3.2. Datasets
3.3. Evaluation Indexes
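The reported metrics (mAP, Recall, Precision) are conventionally computed as sketched below. These are the standard definitions rather than a restatement of this paper's evaluation protocol; the IoU threshold used to count true positives and the interpolation scheme are not stated in this excerpt, so the all-point interpolation here is an assumption, as are the function names.

```python
def precision(tp, fp):
    """Fraction of predicted boxes that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of ground truth boxes that are found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_precision(recalls, precisions):
    """Area under the precision-recall curve for one class.

    recalls must be sorted in ascending order, with precisions aligned to it.
    Precision is first replaced by its monotonically decreasing envelope.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """mAP: the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```

For a dataset with nine categories, `mean_average_precision` would be called with nine per-class AP values.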
3.4. Comparative Experiment
3.4.1. Detection Performance
3.4.2. Ablation Study
4. Conclusions
- (1) Although this study achieves modular design and lightweight improvements for ship target detection at the algorithmic level, it has not yet been developed into a complete, robust, real-time system for ship detection under complex weather conditions on water. Future research should design and implement an end-to-end system that integrates weather-adaptive perception, real-time image enhancement, multi-target detection, and visual interaction. Such a system should provide a user-friendly graphical interface and be directly deployable in port monitoring centers, shipborne sensing units, or shore-based intelligent systems, giving real-time technical support to applications such as maritime traffic supervision, intelligent ship collision avoidance, autonomous navigation, and unmanned vessel docking. In addition, although the current model has been made lighter in parameter count and computational load, its inference efficiency, memory footprint, power consumption, and thermal performance on edge devices with strictly limited computational resources still require further tuning. Targeted work on model pruning, quantization, compilation optimization, and hardware adaptation is needed, together with a framework for assessing the trade-off between performance and power, to meet the demands of engineering and large-scale deployment.
- (2) The current research focuses on the perception of ship targets and has not yet explored high-level semantic understanding in depth. The rapid development of vision-language multimodal large models opens new possibilities for giving intelligent ship perception systems deeper cognitive and analytical capabilities. Future work should explore the deep integration of ship detection systems with multimodal large models. One route is to fine-tune general vision-language models on maritime data to build domain-specific models with maritime expertise, supplemented by a professional knowledge base covering ship types, navigation rules, maritime regulations, and risk case studies. On this foundation, the system would not only detect ship targets but also identify nuanced ship states, behavioral patterns, and interactions, and automatically generate structured semantic reports or multimodal risk warnings. This would shift maritime monitoring from a passive approach based on image perception and information listing to an intelligent system that integrates active perception, deep understanding, intelligent decision-making, and early warning. Such a system would provide clear observation of maritime situations, support an understanding of their implications, and help anticipate future risks, making maritime safety supervision more proactive and intelligent.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Method | mAP (%) | Recall (%) | Precision (%) | Parameters (M) | FLOPs (G) |
|---|---|---|---|---|---|
| Faster R-CNN | 82.18 | 77.35 | 83.67 | 40.63 | 207 |
| CenterNet | 88.19 | 79.61 | 89.31 | 45.21 | 50 |
| YOLOv5 | 88.32 | 83.66 | 84.97 | 7.04 | 24.0 |
| YOLOv8 | 88.54 | 88.55 | 85.16 | 3.01 | 28.6 |
| YOLOv10 | 90.13 | 89.35 | 91.36 | 2.72 | 21.6 |
| YOLOv11 | 91.56 | 90.84 | 91.39 | 2.59 | 21.5 |
| Ours | 93.96 | 92.93 | 94.97 | 2.90 | 7.9 |

| Method | mAP (%) | Recall (%) | Precision (%) |
|---|---|---|---|
| Faster R-CNN | 89.68 | 88.34 | 89.96 |
| CenterNet | 90.17 | 90.04 | 92.17 |
| YOLOv5 | 95.52 | 97.63 | 96.83 |
| YOLOv8 | 97.82 | 96.35 | 97.61 |
| YOLOv10 | 94.17 | 95.44 | 96.37 |
| YOLOv11 | 95.68 | 96.34 | 96.61 |
| Ours | 96.93 | 97.82 | 97.89 |

| Method | iAFF | BiFormer | MPDIoU | mAP (%) | Recall (%) | Precision (%) | Parameters (M) |
|---|---|---|---|---|---|---|---|
| Baseline | - | - | - | 91.56 | 90.84 | 91.39 | 2.59 |
| 1 | √ | - | - | 92.14 | 91.28 | 82.76 | 2.95 |
| 2 | √ | √ | - | 92.89 | 91.75 | 93.64 | 2.60 |
| 3 | √ | - | √ | 92.61 | 92.46 | 94.26 | 2.94 |
| 4 | √ | √ | √ | 93.96 | 92.93 | 94.97 | 2.90 |
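The contribution of each configuration in the ablation table can be read off as a difference from the baseline. The short sketch below only does that arithmetic on the mAP and parameter values copied from the table; the row labels are shorthand for the checked modules, and the helper name `gains_over_baseline` is illustrative.

```python
# (mAP in %, parameters in M) copied from the ablation table above.
ablation = {
    "Baseline": (91.56, 2.59),
    "iAFF": (92.14, 2.95),
    "iAFF + BiFormer": (92.89, 2.60),
    "iAFF + MPDIoU": (92.61, 2.94),
    "iAFF + BiFormer + MPDIoU": (93.96, 2.90),
}


def gains_over_baseline(table, baseline="Baseline"):
    """Return {config: (mAP gain in points, parameter change in M)} vs. the baseline."""
    base_map, base_params = table[baseline]
    return {
        name: (round(m - base_map, 2), round(p - base_params, 2))
        for name, (m, p) in table.items()
        if name != baseline
    }


gains = gains_over_baseline(ablation)
for name, (d_map, d_params) in gains.items():
    print(f"{name}: mAP {d_map:+.2f} points, parameters {d_params:+.2f} M")
```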

| Method | mAP (%) | Recall (%) | Precision (%) |
|---|---|---|---|
|  | 91.69 | 90.81 | 91.37 |
|  | 91.87 | 89.29 | 93.64 |
|  | 90.53 | 91.35 | 89.76 |
|  | 92.32 | 90.41 | 95.22 |
|  | 93.96 | 92.93 | 94.97 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zuo, D.; Qi, L.; Ni, H.; Song, S.; Li, H.; Wang, X. Ship Target Detection Method Based on Feature Fusion and Bi-Level Routing Attention. Symmetry 2026, 18, 729. https://doi.org/10.3390/sym18050729

