CRS-Y: A Study and Application of a Target Detection Method for Underwater Blasting Construction Sites
Abstract
1. Introduction
- Backbone Reconstruction for Enhanced Feature Extraction: By integrating the C3K-RVB structure with the Efficient Multi-Scale Attention (EMA) mechanism, the C3K module is reconstructed to significantly strengthen the model’s context awareness and feature capture capabilities for tiny explosive targets against complex backgrounds.
- Optimization of Detection Head and Downsampling Strategy: The Space-to-Depth Convolution (SPDConv) is integrated into depthwise separable convolutions to replace conventional convolutions in the Neck network. This preserves fine-grained feature information and improves multi-scale detection performance while effectively reducing the model’s parameter count and computational complexity.
- Refinement of Bounding Box Regression Accuracy: The Inner-IoU loss function is introduced to accelerate the regression process by controlling the scale of auxiliary bounding boxes. This more precisely optimizes the localization of explosives and other targets, addressing the significant localization deviations that the baseline model exhibits in complex, occluded environments (a minimal sketch of the auxiliary-box idea follows this list).
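To make the auxiliary-box idea concrete, the snippet below is a minimal PyTorch sketch of Inner-IoU, assuming axis-aligned boxes in (cx, cy, w, h) format. The function names, the default ratio, and the stand-alone 1 − IoU loss form are illustrative assumptions, not the paper's exact implementation.

```python
# A minimal sketch of the Inner-IoU idea (Zhang et al., arXiv:2311.02877), assuming
# axis-aligned boxes given as (cx, cy, w, h) tensors. Names and the stand-alone loss
# form are illustrative, not the authors' exact code.
import torch


def inner_iou(pred, target, ratio=0.75, eps=1e-7):
    """IoU computed on auxiliary boxes scaled around each box centre by `ratio`."""
    px, py, pw, ph = pred.unbind(-1)
    tx, ty, tw, th = target.unbind(-1)

    # Auxiliary boxes share the original centres but have widths/heights scaled by `ratio`.
    p_x1, p_x2 = px - ratio * pw / 2, px + ratio * pw / 2
    p_y1, p_y2 = py - ratio * ph / 2, py + ratio * ph / 2
    t_x1, t_x2 = tx - ratio * tw / 2, tx + ratio * tw / 2
    t_y1, t_y2 = ty - ratio * th / 2, ty + ratio * th / 2

    # Intersection and union of the auxiliary boxes.
    inter_w = (torch.min(p_x2, t_x2) - torch.max(p_x1, t_x1)).clamp(min=0)
    inter_h = (torch.min(p_y2, t_y2) - torch.max(p_y1, t_y1)).clamp(min=0)
    inter = inter_w * inter_h
    union = (ratio * pw) * (ratio * ph) + (ratio * tw) * (ratio * th) - inter + eps
    return inter / union


def inner_iou_loss(pred, target, ratio=0.75):
    """1 - Inner-IoU as a regression loss. In the original formulation, ratios below 1
    sharpen gradients for high-IoU samples, while ratios above 1 favour low-IoU
    (hard or small) samples."""
    return 1.0 - inner_iou(pred, target, ratio)
```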
2. Feature Analysis of Underwater Blasting Construction Sites
3. Model Selection and Model Improvement
3.1. Overview of Object Detection Algorithms
3.2. YOLOv11 Model
3.3. CRS-Y Model
- Backbone Network Reconstruction and Feature Decoupling: Underwater blasting environments exhibit highly unstructured background characteristics, where uneven surface reflections often lead to significant confusion between targets and background textures. To mitigate this issue, the original C3K2 module is replaced with the proposed C3K-RVB module. By explicitly decoupling spatial feature modeling (Token Mixing) from channel-wise feature modeling (Channel Mixing), the network is able to more robustly disentangle complex background interference (spatial information) from the semantic attributes of targets (channel information), thereby significantly enhancing feature representation accuracy under adverse lighting conditions. In addition, the SE module is replaced with an EMA attention mechanism, which leverages cross-dimensional interactions to strengthen the recovery of degraded visual features (e.g., partially occluded targets) without introducing additional computational overhead.
- Spatial Information Preservation in the Neck Network: In large-scale operational scenes, explosive objects located at long distances typically appear at extremely small scales, making them highly susceptible to feature loss during conventional convolutional downsampling. To alleviate this problem, SPDConv is embedded into depthwise separable convolutions within the Neck network. By transferring spatial pixel information into the channel dimension, this design avoids the information degradation caused by stride-based convolutions or pooling operations, thereby substantially improving feature retention and representational capability for small-scale targets (see the sketch following this list).
- Multi-Scale Loss Function Optimization: To address the large variation in target scales and the blurred object boundaries resulting from occlusion in construction scenes, an Inner-IoU loss function is incorporated into the detection head, together with a scale ratio factor. By generating auxiliary bounding boxes at different scales to participate in the loss computation, the model is able to more effectively capture fine-grained boundary information of multi-scale targets, leading to improved localization accuracy in complex and occlusion-heavy environments.
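As referenced in the Neck item above, the following is a minimal PyTorch sketch of combining a space-to-depth rearrangement with a depthwise separable convolution (the SPD-DSConv idea). The class name, channel choices, and the use of pixel_unshuffle for the rearrangement are assumptions made for illustration, not the authors' exact module.

```python
# A minimal sketch of the SPD-DSConv idea: space-to-depth (no pixels discarded)
# followed by a depthwise separable convolution that fuses the stacked channels.
# Module and parameter names are illustrative.
import torch
import torch.nn as nn


class SPDDSConv(nn.Module):
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        self.scale = scale
        spd_ch = in_ch * scale * scale          # channels after space-to-depth
        # Depthwise separable convolution: per-channel 3x3 conv, then 1x1 pointwise mix.
        self.depthwise = nn.Conv2d(spd_ch, spd_ch, 3, padding=1, groups=spd_ch, bias=False)
        self.pointwise = nn.Conv2d(spd_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        # Space-to-depth: (B, C, H, W) -> (B, C*s*s, H/s, W/s) by stacking pixel blocks
        # into the channel dimension instead of striding or pooling them away.
        x = nn.functional.pixel_unshuffle(x, self.scale)
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


# Example: downsample a P2-level feature map from 160x160 to 80x80 without losing pixels.
feat = torch.randn(1, 64, 160, 160)
print(SPDDSConv(64, 128)(feat).shape)   # torch.Size([1, 128, 80, 80])
```

Because every input pixel is redistributed into the channel dimension before the convolution, downsampling no longer discards spatial detail the way a stride-2 convolution or pooling layer would.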
3.4. Design of the Backbone Network
3.4.1. Design of the C3K-RVB Structure
3.4.2. Design of the C3K-RVB-EMA Structure
3.5. Design of the Neck Network
3.5.1. Depthwise Separable Convolution
3.5.2. SPD-DSConv
3.6. Improvements in Detection Head Loss Function
4. Detection Testing and Result Analysis
4.1. Test Dataset
4.2. Network Configuration
4.3. Evaluation Metrics
4.4. Result Analysis
4.4.1. Comparison of Different Loss Functions
4.4.2. Comparison of Different Attention Mechanisms
4.4.3. Comparison of Different Detection Algorithms
4.4.4. Visual Comparison of the Model on Different Datasets
4.4.5. Ablation Study
5. Conclusions and Future Work
- The introduction of the Inner-IoU loss function effectively enhances target localization accuracy, performing especially well in regression tasks involving targets with irregular shapes and varying scales. Through its weighting mechanism, the loss function strengthens the model's capability to detect hard and small samples.
- The combination of the C3K-RVB (RepViT-based) structure and the EMA mechanism significantly improves the model's feature extraction and spatial feature representation, yielding high detection accuracy in both complex backgrounds and multi-scale scenarios.
- The proposed SPD-DSConv structure reduces the computational cost of the P2 feature layer while integrating downsampling and channel expansion into a single operation. It enhances the joint representation of multi-dimensional features while keeping the model lightweight.
- The collaborative integration of these modules achieves an excellent balance among accuracy, recall, and computational complexity. Consequently, the CRS-Y model outperforms mainstream detection models on multiple public datasets and the self-built dataset, exhibiting strong generalization ability and practical deployment value.
- Model Optimization and Efficiency Enhancement: Further streamlining and optimizing the network architecture to improve inference speed and energy efficiency on resource-constrained edge devices, while maintaining detection accuracy;
- Advanced Compression and Acceleration Techniques: Investigating efficient model compression and inference acceleration strategies to achieve a better balance between high detection accuracy and lightweight deployment;
- Multimodal and Spatiotemporal Information Fusion: Integrating multimodal sensory information with spatiotemporal feature modeling to enhance robustness and generalization in complex and dynamic underwater environments, thereby meeting the stringent real-time and reliability requirements of large-scale waterway blasting operations.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Wang, X.G. Blasting Design and Construction; Metallurgical Industry Press: Beijing, China, 2014. [Google Scholar]
- Ye, Z.Y.; Liang, H.L.; Lan, C.D. Application of YOLOv5s algorithm model for underwater target detection. TV Technol. 2023, 47, 39–43. [Google Scholar]
- Zhao, L.; Yun, Q.; Yuan, F.; Ren, X.; Jin, J.; Zhu, X. YOLOv7-CHS: An emerging model for underwater object detection. J. Mar. Sci. Eng. 2023, 11, 1949. [Google Scholar] [CrossRef]
- Lei, F.; Tang, F.; Li, S. Underwater target detection algorithm based on improved YOLOv5. J. Mar. Sci. Eng. 2022, 10, 310. [Google Scholar] [CrossRef]
- Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object Detection in 20 Years: A Survey. Proc. IEEE 2023, 111, 257–276. [Google Scholar] [CrossRef]
- Edozie, E.; Shuaibu, A.N.; John, U.K.; Sadiq, B.O. Comprehensive Review of Recent Developments in Visual Object Detection Based on Deep Learning. Artif. Intell. Rev. 2025, 58, 277. [Google Scholar] [CrossRef]
- Malagoli, E.; Di Persio, L. 2D Object Detection: A Survey. Mathematics 2025, 13, 893. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE CVPR, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE ICCV, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE CVPR, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot Multibox Detector. In Proceedings of the ECCV, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE ICCV, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Rong, X.L.; Xie, A.Q.; Zhao, P.; Chen, W.; Wang, B.X. Design of intelligent maintenance system for concrete beams based on machine vision. China J. Highw. Transp. 2025, 38, 307–317. [Google Scholar]
- Peng, L.P.; Zhao, B.T. Vertical shaft guide surface defect detection model based on MELE-YOLOv11n. J. Hubei Minzu Univ. (Nat. Sci. Ed.) 2025, 43, 376–381. [Google Scholar]
- Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar] [CrossRef]
- Guo, C.; Fan, B.; Zhang, Q.; Xiang, S.; Pan, C. AugFPN: Improving Multi-scale Feature Learning for Object Detection. arXiv 2019, arXiv:1912.05384. [Google Scholar]
- He, W.; Zhang, Y.; Chen, J. Multi-Scale Residual Aggregation Feature Pyramid Network (MSRA-FPN). Electronics 2022, 12, 93. [Google Scholar] [CrossRef]
- Du, S.; Zhou, L.; Zhang, H. ASC-YOLO: Multi-Scale Feature Fusion and Adaptive Decoupled Head for Fracture Detection in Medical Imaging. Appl. Sci. 2025, 15, 9031. [Google Scholar] [CrossRef]
- Xun, J.; Li, Q.; Zhou, K. An Efficient Algorithm for Pedestrian Fall Detection Based on Lightweight YOLO and Adaptive Binary Cross-Entropy. Sci. Rep. 2025, 15, 9036. [Google Scholar] [CrossRef]
- Jing, C.L. Research and Implementation of Lightweight Network Embedded Machine Vision System for Traffic Monitoring. Master’s Thesis, Qilu University of Technology, Jinan, China, 2024. [Google Scholar]
- Wang, A.; Chen, H.; Lin, Z.J.; Han, J.D.; Ding, G.G. RepViT: Revisiting Mobile CNN from ViT Perspective. arXiv 2023, arXiv:2307.09283. [Google Scholar]
- Hu, Y.; Chen, Y.; Li, X.; Feng, J. Dynamic Feature Fusion for Semantic Edge Detection. arXiv 2019, arXiv:1902.09104. [Google Scholar] [CrossRef]
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar] [CrossRef]
- Ouyang, D.L.; He, S.; Zhang, G.Z.; Luo, M.; Guo, H.; Zhan, J. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes, Greece, 4–10 June 2023. [Google Scholar]
- Rozhbayani, G.; Tuama, A.; Al-Azzo, E. Social Distancing Monitoring by Human Detection Through Bird’s-Eye View Technique. In Proceedings of the VISIGRAPP: VISAPP, Rome, Italy, 27–29 February 2024. [Google Scholar]
- Chen, L.; Yu, Z.; Yang, J. SPD-CNN: A plain CNN-based model using the symmetric positive definite (SPD) matrix. Front. Neurorobot. 2022, 16, 958052. [Google Scholar] [CrossRef] [PubMed]
- Zhang, H.; Xu, C.; Zhang, S.J. Inner-IoU: More Effective Intersection over Union Loss with Auxiliary Bounding Box. arXiv 2023, arXiv:2311.02877. [Google Scholar]
- Padilla, R.; Netto, S.L.; da Silva, E.A.B. A Survey on Performance Metrics for Object-Detection Algorithms. In Proceedings of the 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Niterói, Brazil, 7–9 July 2020; pp. 237–242. [Google Scholar] [CrossRef]
- Zangana, M.; Zangana, H.M. Survey and Performance Analysis of Deep Learning Based Object Detection in Challenging Environments. Sensors 2021, 21, 5116. [Google Scholar] [CrossRef]
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020. [Google Scholar]
- Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. arXiv 2020, arXiv:2005.03572. [Google Scholar] [CrossRef]
- Zhang, Y.-F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and Efficient IoU Loss for Accurate Bounding Box Regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Gu, R.; Wang, G.; Song, T.; Huang, R.; Aertsen, M.; Deprest, J.; Ourselin, S.; Vercauteren, T. CA-Net: Comprehensive Attention Convolutional Neural Networks for Explainable Medical Image Segmentation. arXiv 2020, arXiv:2009.10549. [Google Scholar] [CrossRef]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11634–11642. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–24 June 2023. [Google Scholar]
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
- He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]

| Loss Function | Recall/% | mAP0.5/% | GFLOPs |
|---|---|---|---|
| IoU [30] | 50.4 | 57.7 | 6.4 |
| CIoU [31] | 51.7 | 57.9 | 6.4 |
| EIoU [32] | 51.1 | 58.8 | 6.4 |
| GIoU [33] | 51.4 | 56.7 | 6.4 |
| Inner-IoU | 55.7 | 61.3 | 6.4 |
| Attention Mechanism | Recall/% | mAP0.5/% | GFLOPs |
|---|---|---|---|
| CBAM [34] | 52.1 | 60.6 | 7.5 |
| CA [35] | 50.6 | 59.8 | 7.0 |
| ECA [36] | 52.4 | 54.1 | 7.3 |
| SE | 51.2 | 56.9 | 6.8 |
| EMA | 56.8 | 62.8 | 6.6 |
| Model | mAP0.5:0.95/% | Recall/% | Params/M |
|---|---|---|---|
| YOLOv5 | 31.9 | 44.2 | 1.9 |
| YOLOv8 | 33.5 | 48.1 | 3.2 |
| YOLOX | 33.8 | 48.7 | 11.2 |
| Faster R-CNN | 30.1 | 43.5 | 14.3 |
| Mask R-CNN | 31.7 | 44.2 | 16.2 |
| CRS-Y | 43.2 | 59.7 | 2.2 |
| Number | A | B | C | GFLOPs | mAP0.5/% | mAP0.5:0.95/% |
|---|---|---|---|---|---|---|
| 1 |  |  |  | 6.4 | 57.7 | 33.5 |
| 2 | √ |  |  | 6.4 | 61.3 | 33.6 |
| 3 |  | √ |  | 5.9 | 62.8 | 37.6 |
| 4 |  |  | √ | 6.2 | 60.2 | 35.2 |
| 5 | √ | √ |  | 6.1 | 60.8 | 36.6 |
| 6 | √ | √ | √ | 6.6 | 66.6 | 43.2 |