GDS-YOLOv7: A High-Performance Model for Water-Surface Obstacle Detection Using Optimized Receptive Field and Attention Mechanisms
Abstract
1. Introduction
- (1)
- To mitigate the limitations caused by insufficient feature representation for small- and medium-scale objects, this paper proposes a novel receptive-field enlargement module, Ghost Spatial Pyramid Pooling Cross Stage Partial Connections (GhostSPPCSPC), and introduces an attention mechanism. By shrinking the pooling kernels and reducing the model parameters, this approach enhances the model’s multi-scale target detection capability.
- (2)
- Depthwise Separable Convolution (DSC) [19] is introduced to replace some of the traditional convolutional layers in the baseline model, significantly reducing the network’s parameter count and improving computational speed while keeping the accompanying loss of precision small.
- (3)
- To mitigate feature degradation caused by the absence of global contextual cues in convolution operations, the Spatial–Channel Synergetic Attention (SCSA) mechanism is introduced. Applied to the Efficient Layer Aggregation Network (ELAN) module in the backbone, it strengthens the network’s representational capacity, improving model accuracy while reducing parameters.
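As a quick illustration of why the DSC of contribution (2) shrinks the network, the sketch below compares the weight count of a standard convolution with its depthwise-separable factorization (a per-channel spatial convolution followed by a 1 × 1 pointwise convolution). The layer sizes used here (256 channels, 3 × 3 kernel) are illustrative and not taken from the model.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    # standard convolution weight tensor: c_out x c_in x k x k
    return c_out * c_in * k * k

def dsc_params(c_in: int, c_out: int, k: int) -> int:
    # depthwise stage: one k x k filter per input channel (c_in x k x k)
    # pointwise stage: c_out x c_in x 1 x 1
    return c_in * k * k + c_in * c_out

# example: a 256 -> 256 channel layer with a 3 x 3 kernel
std = standard_conv_params(256, 256, 3)   # 589,824 weights
dsc = dsc_params(256, 256, 3)             # 67,840 weights
```

The ratio is roughly 1/k² + 1/c_out, so for a 3 × 3 kernel DSC needs under an eighth of the standard layer’s parameters, which is the source of the speedup described above.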
2. Materials and Methods
2.1. Fundamental Concepts of the YOLOv7 Structure
2.2. YOLOv7 Model Improvement
2.2.1. Spatial Pyramid Pooling Module Based on GhostConv (GhostSPPCSPC)
- (1)
- The Conv module is substituted with GhostConv, a neural-network structure optimized for efficiency and particularly suited to deployment on mobile and edge computing devices [28]. The core idea of GhostConv is to cut computational resource consumption through an efficient feature-reuse strategy, enabling the model to achieve competitive accuracy with markedly reduced architectural complexity. The computational process of GhostConv is shown in Figure 3. GhostConv first applies a standard convolution to derive preliminary (intrinsic) features from the input. Cheap linear operations are then performed on these features to increase the channel count, while identity mappings run in parallel to preserve the initial feature representations. The final output is generated by concatenating the two sets of feature maps. This approach reduces the demand for computational resources while largely preserving the model’s performance.
- (2)
- Introduction of SimAM, a simple, parameter-free attention module with an efficient structure [29]. SimAM assigns three-dimensional attention weights to the feature maps without increasing the network’s parameter count, enabling the model to prioritize informative spatial features and compensating for the limited feature cues of small obstacles on the water surface. SimAM derives these weights from an energy function motivated by neuroscience principles, given in Equation (1).
- (3)
- The three MaxPool kernels in the SPPCSPC module are optimized by reducing the original 5 × 5, 9 × 9, and 13 × 13 pooling kernels to 3 × 3, 5 × 5, and 9 × 9, respectively. Besides lowering the module’s parameter count and computational cost, the smaller kernels better capture the details of small objects, markedly improving small-object detection. However, the adjustment may also sacrifice global contextual information for large objects, hindering the integration of larger-region cues and potentially weakening the model’s grasp of background information around large targets. Given the characteristics of the dataset used in this study and the specific requirements of the detection targets, the smaller pooling kernels suit the needs of this paper, minimizing spatial information loss and thus improving detection accuracy.
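The feature-reuse idea of GhostConv in item (1) can be sketched in a few lines. This is a minimal NumPy illustration, not the module’s actual implementation: a 1 × 1 primary convolution stands in for the standard convolution, and per-channel scalars stand in for the cheap linear operations (the real module uses depthwise k × k convolutions for those).

```python
import numpy as np

def ghost_conv(x, w_primary, w_cheap):
    """Toy GhostConv on a (C_in, H, W) feature map.

    w_primary: (M, C_in) weights of a 1x1 primary convolution (intrinsic features)
    w_cheap:   (M,) per-channel scalars standing in for the cheap linear ops
    """
    # primary 1x1 convolution -> M intrinsic feature maps
    intrinsic = np.einsum('mc,chw->mhw', w_primary, x)
    # cheap linear operation on each intrinsic map -> M "ghost" maps
    ghosts = w_cheap[:, None, None] * intrinsic
    # the identity branch keeps the intrinsic maps; concatenation yields 2M channels
    return np.concatenate([intrinsic, ghosts], axis=0)
```

Only the small primary convolution carries a full weight matrix; the ghost half of the output is produced by far cheaper per-map operations, which is where the resource savings come from.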
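For item (2), the SimAM weighting can be sketched as below. This follows the published SimAM formulation [29] (the inverse of the minimal energy passed through a sigmoid gate); the regularizer value `lam` is the SimAM paper’s default and is an assumption here, since Equation (1) of this article is not reproduced above.

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM gating for a (C, H, W) feature map."""
    n = x.shape[1] * x.shape[2] - 1
    mu = x.mean(axis=(1, 2), keepdims=True)           # per-channel mean
    d = (x - mu) ** 2                                  # squared deviation per neuron
    var = d.sum(axis=(1, 2), keepdims=True) / n        # per-channel variance estimate
    # inverse of the minimal energy: neurons far from the mean get larger weights
    e_inv = d / (4.0 * (var + lam)) + 0.5
    return x * (1.0 / (1.0 + np.exp(-e_inv)))          # sigmoid gating
```

Note that no learned weights appear anywhere, which is why the module adds attention without increasing the parameter count.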
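The pooling change in item (3) is easiest to see with stride-1, same-padding max pooling of the kind used in SPPCSPC’s parallel branches. The NumPy sketch below is illustrative only: it stacks an identity branch with the three reduced kernel sizes, showing how a small activation is spread over a 3 × 3, 5 × 5, or 9 × 9 neighborhood.

```python
import numpy as np

def maxpool_same(x, k):
    # stride-1 max pooling with 'same' padding on an (H, W) map
    p = k // 2
    xp = np.pad(x, p, mode='constant', constant_values=-np.inf)
    win = np.lib.stride_tricks.sliding_window_view(xp, (k, k))
    return win.max(axis=(-2, -1))

def spp_branches(x, kernels=(3, 5, 9)):
    # identity branch plus one stride-1 max-pool branch per kernel size,
    # stacked along a channel axis as in the SPP family of modules
    return np.stack([x] + [maxpool_same(x, k) for k in kernels])
```

A single peak on a 5 × 5 map is spread over 9 cells by the 3 × 3 branch but over the whole map by the 5 × 5 and 9 × 9 branches, which is the intuition behind the trade-off between small-object detail and large-object context discussed above.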
2.2.2. Depthwise Separable Convolution
2.2.3. Integrating the SCSA Attention Mechanism
- (1)
- Spatial and Channel Decomposition: SMSA decomposes the given input along the height and width dimensions. Applying global average pooling along each spatial dimension yields two unidirectional one-dimensional sequence structures, X_h and X_w. Simultaneously, in order to capture different spatial distributions and contextual relationships, the feature set is divided into K equally sized independent sub-features, each with C/K channels. In this paper, the default value of K is set to 4. The decomposition process of the sub-features is as follows:
- (2)
- Lightweight Convolution Strategy Across Non-Intersecting Sub-Features: After the cross-channel grouping of the feature set, convolution operations with kernel sizes of 3, 5, 7, and 9 are applied to the corresponding sub-features to learn the distinct semantic spatial structures within each one. This optimizes the continuity of feature representation and reduces the representation discrepancy across semantic layers. Furthermore, SMSA employs lightweight shared convolutions to address the limited receptive field caused by one-dimensional convolutions. The information extraction process is defined as follows:
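The decomposition in step (1) can be sketched as follows. This is a minimal NumPy illustration under the description above (global average pooling along each spatial axis, then K = 4 channel groups); the variable names are ours, not from the SCSA implementation.

```python
import numpy as np

def smsa_decompose(x, k_groups=4):
    """Sketch of the SMSA decomposition for a (C, H, W) input."""
    # pool the width away -> a height-wise sequence per channel, shape (C, H)
    x_h = x.mean(axis=2)
    # pool the height away -> a width-wise sequence per channel, shape (C, W)
    x_w = x.mean(axis=1)
    # split channels into K independent sub-features of C // K channels each
    return np.split(x_h, k_groups, axis=0), np.split(x_w, k_groups, axis=0)
```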
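The multi-scale convolution of step (2) can likewise be sketched: each sub-feature receives a one-dimensional depthwise convolution with its own kernel size (3, 5, 7, or 9). As an assumption for illustration, a simple box filter stands in for the learned shared kernels.

```python
import numpy as np

def dw_conv1d_same(seq, kernel):
    # depthwise 1-D convolution with 'same' padding on a (channels, length) array
    return np.stack([np.convolve(ch, kernel, mode='same') for ch in seq])

def multi_scale_branch(sub_feats, kernel_sizes=(3, 5, 7, 9)):
    # each sub-feature gets its own kernel size, so each group aggregates
    # context at a different semantic scale before re-concatenation
    out = []
    for sub, k in zip(sub_feats, kernel_sizes):
        out.append(dw_conv1d_same(sub, np.ones(k) / k))
    return np.concatenate(out, axis=0)
```

Applying progressively larger 1-D kernels to disjoint channel groups is what keeps this branch lightweight while still covering several receptive-field sizes.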
2.3. Experimental Environment and Parameter Setting
2.4. Evaluation Methods
3. Results and Discussion
3.1. Comparison of Attention Mechanism Fusion
3.2. Ablation Experiment
3.3. Comparison of Model Experiments Before and After Improvement
3.4. Comparison of the Improved Baseline Model with Other Network Models
4. Conclusions
- (1)
- On top of the baseline model, improvements were made by enhancing the SPPCSPC module, introducing the DSC module, and adding the SCSA module. Precision (P), recall (R), and mAP@0.5 were improved by 4.3%, 6.9%, and 4.9%, respectively, demonstrating the effectiveness of the improvements.
- (2)
- The proposed method yields a slightly lower mAP@0.5 than the YOLOv8 and YOLOv9 models. However, it achieves higher precision (P) than both and meets the accuracy and real-time requirements of water-surface detection.
- (3)
- Adding more modules to the model is not always better. Excessive additions may lead to a decrease in some model metrics (such as R and mAP@0.5).
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Alamoush, A.S.; Ölçer, A.I. Maritime Autonomous Surface Ships: Architecture for Autonomous Navigation Systems. J. Mar. Sci. Eng. 2025, 13, 122. [Google Scholar] [CrossRef]
- Guo, S.Y.; Zhang, X.G.; Zheng, Y.S.; Du, Y.Q. An Autonomous Path Planning Model for Unmanned Ships Based on Deep Reinforcement Learning. J. Mar. Sci. Eng. 2020, 20, 426. [Google Scholar] [CrossRef] [PubMed]
- Chen, Z.; Liu, C.; Filaretov, V.F.; Yukhimets, D. Multi-scale ship detection algorithm based on YOLOv7 for complex scene SAR images. Remote Sens. 2023, 15, 2071. [Google Scholar] [CrossRef]
- Ren, S.Q.; He, K.M.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Yang, Z.; Lan, X.; Wang, H. Comparative Analysis of YOLO Series Algorithms for UAV-Based Highway Distress Inspection: Performance and Application Insights. Sensors 2025, 25, 1475. [Google Scholar] [CrossRef] [PubMed]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Gan, L.; Yan, Z.; Zhang, L.; Liu, K.; Zheng, Y.; Zhou, C.; Shu, Y. Ship path planning based on safety potential field in inland rivers. Ocean Eng. 2022, 260, 111928. [Google Scholar] [CrossRef]
- Ning, Y.; Zhao, L.; Zhang, C.; Yuan, Z. STD-Yolov5: A ship-type detection model based on improved Yolov5. Ships Offshore Struct. 2024, 19, 66–75. [Google Scholar] [CrossRef]
- Yang, S.; Wei, S.; Wei, L.; Shuai, W.; Yang, Z. Review of research on information fusion of shipborne radar and AIS. Ship Sci. Technol. 2021, 43, 167–171. [Google Scholar]
- Qi, L.L.; Gao, J.L. Small Object Detection Based on Improved YOLOv7. Comput. Eng. 2023, 49, 41–48. [Google Scholar]
- Hao, K.; Wang, K.; Wang, B.B. Lightweight Underwater Biological Detection Algorithm Based on Improved Mobilenet-YOLOv3. J. Zhejiang Univ. (Eng. Sci.) 2022, 56, 1622–1632. [Google Scholar]
- Tang, Y.S.; Zhang, Y.; Xiao, J.R.; Cao, Y.; Yu, Z.J. An Enhanced Shuffle Attention with Context Decoupling Head with Wise IoU Loss for SAR Ship Detection. Remote Sens. 2024, 16, 4128. [Google Scholar] [CrossRef]
- Sun, Z.; Leng, X.; Zhang, X.; Zhou, Z.; Xiong, B.; Ji, K.; Kuang, G. Arbitrary-Direction SAR Ship Detection Method for Multiscale Imbalance. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5208921. [Google Scholar] [CrossRef]
- Zhang, X.; Zhang, S.; Sun, Z.; Liu, C.; Sun, Y.; Ji, K.; Kuang, G. Cross-sensor SAR image target detection based on dynamic feature discrimination and center-aware calibration. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5209417–5209433. [Google Scholar] [CrossRef]
- Chang, S.; Deng, Y.K.; Zhang, Y.Y.; Zhao, Q.C.; Wang, R.; Zhang, K. An advanced scheme for range ambiguity suppression of spaceborne SAR based on blind source separation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5230112–5230123. [Google Scholar] [CrossRef]
- Zhang, M.H.; Xu, S.B.; Song, W.; He, Q.; Wei, Q.M. Lightweight Underwater Object Detection Based on YOLO v4 and Multi-Scale Attentional Feature Fusion. Remote Sens. 2021, 13, 4706. [Google Scholar] [CrossRef]
- Li, Z.Z.; Ren, H.X.; Yang, X.; Wang, D.; Sun, J. LWS-YOLOv7: A Lightweight Water-Surface Object-Detection Model. J. Mar. Sci. Eng. 2024, 12, 861. [Google Scholar] [CrossRef]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.X.; Wang, W.J.; Zhu, Y.K.; Pang, R.M.; Vasudevan, V. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
- Zhang, Y.; Sun, Y.P.; Wang, Z.; Jiang, Y. YOLOv7-RAR for urban vehicle detection. Sensors 2023, 23, 1801. [Google Scholar] [CrossRef] [PubMed]
- He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
- Lee, Y.; Hwang, J.-W.; Lee, S.; Bae, Y.; Park, J. An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 752–760. [Google Scholar]
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. Scaled-yolov4: Scaling cross stage partial network. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13029–13038. [Google Scholar]
- Liu, S.; Qi, L.; Qin, H.F.; Shi, J.P.; Jia, J.Y. Path aggregation network for instance segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
- Hu, M.; Li, Y.; Fang, L.; Wang, S.J. A2-FPN: Attention aggregation based feature pyramid network for instance segmentation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 15343–15352. [Google Scholar]
- Han, K.; Wang, Y.H.; Tian, Q.; Guo, J.Y.; Xu, C.J.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589. [Google Scholar]
- Yang, L.; Zhang, R.-Y.; Li, L.; Xie, X. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; pp. 11863–11874. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
Model | P/% | R/% | Params/M | mAP@0.5/% |
---|---|---|---|---|
SPPCSPC (13 × 13, 9 × 9, 5 × 5) | 79.10 | 73.80 | 37.28 | 73.20 |
SPPCSPC (9 × 9, 5 × 5, 3 × 3) | 79.10 | 75.80 | 37.28 | 73.00 |
GhostSPPCSPC (13 × 13, 9 × 9, 5 × 5) | 78.10 | 76.20 | 32.81 | 73.37 |
GhostSPPCSPC (9 × 9, 5 × 5, 3 × 3) | 80.10 | 71.70 | 32.81 | 74.00 |
Name | Configuration | Parameter | Parameter Values |
---|---|---|---|
GPU | RTX 4060Ti | Image size/pixel | 640 × 640 |
CPU | Core(TM) i7-13700KF | Learning rate | 0.01 |
Batch-Size | 8 | Optimizer | SGD |
Model | Params/M | mAP@0.5/% | FPS/(frame·s−1) |
---|---|---|---|
- | 36.57 | 73.20 | 128.21 |
CBAM | 37.92 | 79.30 | 131.58 |
MSAM | 37.93 | 81.40 | 84.03 |
SCSA | 35.24 | 81.00 | 149.25 |
Network Model | SCSA | GhostSPPCSPC | DSC | P/% | R/% | GFLOPS | mAP@0.5/% |
---|---|---|---|---|---|---|---|
Baseline model | | | | 79.10 | 73.80 | 103.4 | 73.20 |
1 | √ | | | 81.10 | 86.00 | 93.4 | 81.00 |
2 | | √ | | 80.10 | 71.70 | 102.4 | 74.00 |
3 | | | √ | 81.90 | 75.10 | 90.1 | 74.60 |
4 | | √ | √ | 80.50 | 77.10 | 90.4 | 75.80 |
5 | √ | √ | √ | 83.40 | 80.70 | 79.8 | 78.10 |
Network Model | Input Size/Pixel | mAP@0.5/% | P/% | Params/M | R/% |
---|---|---|---|---|---|
Faster-RCNN | 640 × 640 | 19.17 | 35.79 | 137.099 | 21.31 |
SSD | 640 × 640 | 62.23 | 69.80 | 26.285 | 40.38 |
YOLOv7 | 640 × 640 | 73.20 | 79.10 | 37.28 | 73.80 |
YOLOv7-tiny | 640 × 640 | 77.80 | 79.50 | 25.06 | 82.00 |
YOLO-NAS | 640 × 640 | 84.43 | 25.84 | 19.03 | 94.95 |
YOLOv8 | 640 × 640 | 82.00 | 78.20 | 30.10 | 79.80 |
LWS-YOLOv7 [18] | 640 × 640 | 61.30 | 72.10 | 34.77 | 64.60 |
YOLOv9 | 640 × 640 | 79.80 | 75.90 | 19.74 | 79.40 |
YOLOv10 | 640 × 640 | 76.70 | 74.80 | 27.01 | 78.20 |
GDS-YOLOv7 | 640 × 640 | 78.10 | 83.40 | 31.36 | 80.70 |
© 2025 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yang, X.; Huang, L.; Ke, F.; Liu, C.; Yang, R.; Xie, S. GDS-YOLOv7: A High-Performance Model for Water-Surface Obstacle Detection Using Optimized Receptive Field and Attention Mechanisms. ISPRS Int. J. Geo-Inf. 2025, 14, 238. https://doi.org/10.3390/ijgi14070238