Exploring Attention Placement in YOLOv5 for Ship Detection in Infrared Maritime Scenes
Abstract
1. Introduction
2. Proposed Improved YOLOv5
2.1. Overview of the YOLOv5 Architecture
2.2. YOLOv5n Enhanced with Attention Mechanism
- Neck Position: Enhancing early-stage attention to small-target features. In infrared imagery, small maritime objects such as canoes and fishing boats exhibit sparse texture and diminished saliency after deep downsampling, making them prone to being overwhelmed by background clutter. Embedding CBAM in the neck therefore sharpens the model's sensitivity to small targets during multi-scale feature fusion: the spatial attention branch emphasizes the locations of potential targets, while the channel attention branch strengthens semantic channels associated with the thermal emissions and structural contours of ships. This integration improves early feature perception and helps the model preserve discriminative information relevant to small objects.
- Head Position: Improving late-stage suppression of background noise. Infrared maritime scenes often feature complex backgrounds, with interference from waves, fog, and strong reflections that can mislead the detection process. To address this, integrating CBAM into the prediction head refines the candidate feature maps at the output stage: by selectively compressing and reweighting features, the model suppresses background noise and amplifies the response to target regions, improving robustness and discriminative capability in the final prediction phase.
- Neck+Head Joint Integration: Enabling multi-stage collaborative optimization. Incorporating attention modules into both the neck and the prediction head is theoretically attractive, as it simultaneously enhances small-target representations in the early stages and suppresses background interference in the later stages, potentially yielding a synergistic effect. However, dual attention modules may also introduce feature redundancy or interference, affecting inference efficiency and training stability. To evaluate this joint strategy, a variant named YOLOv5n-HN is constructed, and extensive experiments across varied infrared maritime scenarios assess whether this configuration offers generalized performance gains or benefits limited to specific task settings.
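The CBAM placements above all reuse the same building block: channel attention followed by spatial attention. As a rough sketch (a generic PyTorch implementation following the standard CBAM formulation, not the authors' exact code), such a module could be inserted after a neck fusion block or in front of a detection head:

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze spatial dims with avg- and max-pooling, then a shared MLP."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)  # per-channel weights in (0, 1)


class SpatialAttention(nn.Module):
    """Concatenate channel-wise avg and max maps, then a 7x7 conv."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class CBAM(nn.Module):
    """Channel attention first, then spatial attention, applied in sequence."""

    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)   # reweight semantic channels (e.g., thermal cues)
        return x * self.sa(x)  # reweight spatial locations of candidate targets
```

The module preserves the feature-map shape, which is what makes the neck-only, head-only, and joint placements interchangeable experiments: the variant is defined purely by where `CBAM` is wrapped around an existing feature map.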
3. Experiment Results Analysis
3.1. Dataset Description and Analysis
3.2. Training Strategy and Implementation Details
3.2.1. Hyperparameter Setting
3.2.2. Loss Function Analysis
3.2.3. mAP Metric Analysis
3.3. Performance Evaluation and Comparative Analysis
3.3.1. Metric-Based Comparison
3.3.2. Case-Level Analysis
4. Conclusions
- The YOLOv5n-HN, which integrates attention modules into both the neck and head, delivers the most comprehensive performance, effectively fusing semantic and spatial representations for improved detection of medium- and large-scale targets.
- The YOLOv5n-N demonstrates clear advantages in small-object detection, suggesting that attention integration within the neck module is particularly beneficial for capturing fine-grained features.
- The attention mechanism enhances background suppression and target saliency. However, deploying multiple attention modules simultaneously may lead to overfitting or feature redundancy.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
| Component | Specification |
|---|---|
| CPU | 22 vCPU AMD EPYC 7T83 64-Core Processor |
| GPU | vGPU-32GB (32 GB) |
| Memory | 90 GB |
| CUDA | Compatible with PyTorch 1.13.1 |
| Operating System | Linux (Ubuntu LTS) |
| Deep Learning Framework | PyTorch 1.13.1 |
| Hyperparameter | Value |
|---|---|
| Batch Size | 32 |
| Epochs | 300 |
| Learning Rate | 0.01 |
| Mosaic Probability | 0.5 |
| MixUp Probability | 0.5 |
| Optimizer | SGD |
| Momentum | 0.937 |
| Special Auto Ratio | 0.7 |
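The settings in this table can be collected into a YOLOv5-style hyperparameter dictionary. The sketch below is illustrative only: the key names (`lr0`, `mosaic`, `mixup`, etc.) follow common Ultralytics conventions rather than the authors' actual configuration file, and the "Special Auto Ratio" entry is omitted because its mapping is unclear from the table.

```python
# Illustrative training configuration mirroring the hyperparameter table.
# Key names are assumed Ultralytics-style conventions, not the authors' file.
hyp = {
    "lr0": 0.01,        # initial learning rate
    "momentum": 0.937,  # SGD momentum
    "mosaic": 0.5,      # probability of Mosaic augmentation
    "mixup": 0.5,       # probability of MixUp augmentation
}

train_cfg = {
    "batch_size": 32,
    "epochs": 300,
    "optimizer": "SGD",
}
```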
| Model Variant | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| YOLOv5n | 86.20 | 50.33 | 91.62 | 80.24 |
| YOLOv5n-H | 86.37 | 51.04 | 93.41 | 76.50 |
| YOLOv5n-N | 85.95 | 50.43 | 93.49 | 75.39 |
| YOLOv5n-HN | 86.83 | 50.00 | 93.13 | 76.13 |
| Model Variant | Bulk Carrier (%) | Canoe (%) | Container Ship (%) | Fishing Boat (%) | Liner (%) | Sailboat (%) | Warship (%) |
|---|---|---|---|---|---|---|---|
| YOLOv5n | 95.18 | 71.78 | 95.95 | 80.51 | 86.75 | 85.46 | 98.79 |
| YOLOv5n-H | 95.19 | 73.21 | 94.36 | 79.10 | 86.93 | 85.31 | 98.84 |
| YOLOv5n-N | 94.71 | 74.30 | 91.68 | 78.52 | 85.22 | 86.98 | 98.63 |
| YOLOv5n-HN | 94.12 | 71.89 | 91.64 | 78.96 | 86.59 | 85.58 | 99.02 |
Per-class precision (%):

| Model Variant | Bulk Carrier (%) | Canoe (%) | Container Ship (%) | Fishing Boat (%) | Liner (%) | Sailboat (%) | Warship (%) |
|---|---|---|---|---|---|---|---|
| YOLOv5n | 96.97 | 80.12 | 95.77 | 89.02 | 86.75 | 94.16 | 94.81 |
| YOLOv5n-H | 96.39 | 85.90 | 98.51 | 88.70 | 95.83 | 93.55 | 95.74 |
| YOLOv5n-N | 96.39 | 88.41 | 98.48 | 87.97 | 94.26 | 93.18 | 95.05 |
| YOLOv5n-HN | 96.23 | 86.87 | 98.48 | 88.19 | 94.35 | 92.32 | 95.47 |
Per-class recall (%):

| Model Variant | Bulk Carrier (%) | Canoe (%) | Container Ship (%) | Fishing Boat (%) | Liner (%) | Sailboat (%) | Warship (%) |
|---|---|---|---|---|---|---|---|
| YOLOv5n | 87.91 | 59.91 | 93.15 | 68.42 | 79.63 | 74.80 | 97.86 |
| YOLOv5n-H | 87.91 | 57.76 | 90.41 | 63.79 | 70.99 | 68.08 | 96.43 |
| YOLOv5n-N | 87.91 | 57.54 | 89.04 | 63.17 | 70.99 | 70.58 | 96.07 |
| YOLOv5n-HN | 84.07 | 55.60 | 89.04 | 64.51 | 78.22 | 69.64 | 97.86 |
Per-class F1-score:

| Model Variant | Bulk Carrier | Canoe | Container Ship | Fishing Boat | Liner | Sailboat | Warship |
|---|---|---|---|---|---|---|---|
| YOLOv5n | 0.92 | 0.69 | 0.94 | 0.77 | 0.86 | 0.82 | 0.96 |
| YOLOv5n-H | 0.92 | 0.69 | 0.94 | 0.74 | 0.82 | 0.79 | 0.96 |
| YOLOv5n-N | 0.92 | 0.70 | 0.94 | 0.74 | 0.81 | 0.80 | 0.96 |
| YOLOv5n-HN | 0.90 | 0.68 | 0.94 | 0.75 | 0.82 | 0.79 | 0.97 |
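Assuming the two preceding per-class tables report precision and recall, the values in this table follow as their harmonic mean, F1 = 2PR/(P + R). A quick spot check against two cells:

```python
def f1(precision_pct: float, recall_pct: float) -> float:
    """Harmonic mean of precision and recall, returned on a 0-1 scale."""
    p, r = precision_pct / 100.0, recall_pct / 100.0
    return 2 * p * r / (p + r)

# YOLOv5n on Bulk Carrier: precision 96.97%, recall 87.91%
print(round(f1(96.97, 87.91), 2))  # → 0.92, matching the table

# YOLOv5n on Canoe: precision 80.12%, recall 59.91%
print(round(f1(80.12, 59.91), 2))  # → 0.69, matching the table
```

Because F1 penalizes the weaker of the two components, it highlights classes such as Canoe where recall lags well behind precision.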
Share and Cite
Zhu, R.; Zhang, J.; Yang, D.; Zhao, D.; Chen, J.; Zhu, Z. Exploring Attention Placement in YOLOv5 for Ship Detection in Infrared Maritime Scenes. Technologies 2025, 13, 391. https://doi.org/10.3390/technologies13090391