SDA-Mask R-CNN: An Advanced Seabed Feature Extraction Network for UUV
Abstract
1. Introduction
- SSGAR-Net uses a backbone network with Group Convolution (GC), Cross-scale Convolutional Block Attention Module (CCBAM), and Skip Integration to extract multi-scale features, minimize parameter redundancy, and improve edge response through redundancy verification.
- DWHF-Net incorporates Depthwise Separable Convolution (DSC) to reduce computational complexity while maintaining model performance, making it especially suitable for high-resolution SSS image processing. The module further uses a weighted pyramid structure for multi-scale feature fusion, significantly improving adaptability to targets of different scales in underwater environments.
- The ASMO strategy systematically combines dynamic learning-rate scheduling with a collaborative optimization mechanism for the bounding-box regression and mask segmentation tasks.
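To make the DSC motivation concrete, the following pure-Python parameter count compares a standard convolution with its depthwise separable counterpart. The 3×3 kernel and 256-channel layer sizes are hypothetical examples, not taken from DWHF-Net's actual configuration:

```python
# Parameter-count comparison: standard convolution vs. depthwise
# separable convolution (DSC). Illustrative sketch; layer sizes are
# hypothetical and biases are omitted for simplicity.

def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution layer."""
    return k * k * c_in * c_out

def dsc_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

# Hypothetical layer: 3x3 kernel, 256 input and 256 output channels.
std = standard_conv_params(3, 256, 256)  # 589824 weights
dsc = dsc_params(3, 256, 256)            # 67840 weights (~8.7x fewer)
print(f"standard: {std}, DSC: {dsc}, ratio: {std / dsc:.1f}x")
```

This roughly 8.7× reduction is what lets DWHF-Net keep a weighted pyramid of multi-scale branches without a proportional increase in compute on high-resolution SSS images.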
2. Materials and Methods
2.1. SSS Data Preprocessing
2.2. Architecture of Seabed Feature Extraction of SSS for UUV Based on SDA-Mask R-CNN
2.2.1. Preliminary Screening of Backbone Networks
2.2.2. Structural Synergistic Group-Attention Residual Network
2.2.3. Depth-Weighted Hierarchical Fusion Network (DWHF-Net)
2.2.4. Adaptive Synergistic Mask Optimization (ASMO)
- Initial Stage: Set the initial learning rate η_max and gradually reduce it to η_min through cosine annealing to optimize the weight-update direction of the backbone network;
- Restart Strategy: Restart the learning rate every T_0 training steps (a period matched to medium-resolution SSS imagery), forcing the model to escape local minima and alleviating the gradient stagnation caused by blurred target edges.
- During the restart phase, dynamically adjust the gradient weights of the classification, regression, and mask branches, prioritizing the optimization of the bounding box regression branch, which is critical for SSS target localization.
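The cosine-annealing-with-warm-restarts schedule described above can be sketched as a single function. The default values mirror the SGDR settings listed in Section 3.1 (0.1, 0.001, 50); the η_max/η_min/T_0 names are the conventional SGDR symbols, assumed here rather than taken verbatim from the paper:

```python
import math

# SGDR learning-rate schedule: decay from eta_max to eta_min over a
# cycle of t0 steps via cosine annealing, then restart at eta_max.
# Sketch only; defaults mirror the hyperparameters in Section 3.1.

def sgdr_lr(step, eta_max=0.1, eta_min=0.001, t0=50):
    """Learning rate at a given training step under SGDR."""
    t_cur = step % t0  # position within the current restart cycle
    return eta_min + 0.5 * (eta_max - eta_min) * (
        1 + math.cos(math.pi * t_cur / t0))

# The rate starts at eta_max, anneals toward eta_min, and jumps back
# to eta_max at each restart (steps 50, 100, ...).
```

The periodic jump back to η_max is what gives the optimizer a chance to leave sharp local minima around blurred SSS target edges.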
3. Results
3.1. Experimental Settings and Data Split
- Initial learning rate: 0.1;
- Batch size: 8;
- SGDR: η_max = 0.1, η_min = 0.001, T_0 = 50;
- Focal Loss: α = 0.8, γ = 2;
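Assuming the two Focal Loss values map to the standard α (class-balance) and γ (focusing) parameters, the per-sample loss can be sketched as:

```python
import math

# Focal Loss for one binary prediction. The (1 - p_t)^gamma factor
# down-weights easy, well-classified samples so training focuses on
# hard ones such as faint seabed edges. Scalar sketch, not the
# paper's actual training code; defaults match Section 3.1.

def focal_loss(p, y, alpha=0.8, gamma=2.0):
    """p: predicted foreground probability, y: label in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With γ = 2, a confidently correct positive (p = 0.9) contributes roughly three orders of magnitude less loss than a hard positive (p = 0.1), which is the intended re-weighting effect.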
3.2. Loss Function
3.3. Discussion
3.3.1. Experiment 1
3.3.2. Experiment 2
3.3.3. Experiment 3
3.3.4. Experiment 4
- Noise suppression: The DWHF-Net’s depthwise separable convolutions and weighted pyramid fusion effectively preserve high-frequency edge details while attenuating Gaussian noise, as evidenced by SDA-Mask R-CNN’s 18.9% higher AP@0.5-Noisy compared to Mask R-CNN;
- Adaptive optimization: The ASMO strategy, particularly Focal Loss and Matrix-NMS, prioritizes hard samples (e.g., faint seabed edges) and suppresses false positives, reducing fragmentation in noisy masks;
- Multi-task stability: SGDR’s cyclic learning-rate scheduling prevents overfitting to noisy annotations, stabilizing both detection and segmentation branches during training (Figure 15).
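The IoU figures cited in these comparisons measure overlap between predicted and ground-truth segmentation masks. A minimal sketch of mask IoU on binary masks stored as nested lists (illustrative only, not the paper's evaluation code):

```python
# Intersection-over-union of two same-shaped binary masks.
# Masks are nested lists of 0/1 values; illustrative sketch of the
# metric reported in the result tables.

def mask_iou(pred, gt):
    """IoU = |pred AND gt| / |pred OR gt| over all pixels."""
    inter = union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            inter += p & g
            union += p | g
    return inter / union if union else 0.0
```

For example, a prediction covering two pixels of which one matches a one-pixel ground-truth mask yields IoU = 1/2.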
3.4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Sample Description | Training Set | Validation Set | Testing Set |
---|---|---|---|
Positive Sample (including foreground) | 1320 | 284 | 284 |
Negative Sample (background) | 80 | 16 | 16 |
Model | AP@0.5 | IoU | Core Improvements | Adaptation to SSS Images |
---|---|---|---|---|
Model v1 | 0.96 | 0.519 | Post-activated structure | Baseline performance with post-activation residual blocks. |
Model v2 | 0.94 | 0.478 | Pre-activation structure | Pre-activation design underperforms on small datasets due to over-smoothing. |
Model v3 | 0.96 | 0.568 | GN replaces BN; Mish replaces ReLU | GN stabilizes small-batch training; Mish enhances low-contrast feature retention. |
Model v4 | 0.98 | 0.582 | Res2Net-style grouping | Hierarchical multi-scale fusion improves edge preservation and IoU. |
Model v5 | 0.97 | 0.542 | ResNeXt-style grouping | Homogeneous grouping reduces feature diversity; unsuitable for SSS multi-scale tasks. |
Model v6 | 0.98 | 0.595 | V4 + CBAM in each block | Local attention refines edge features but lacks cross-scale interaction. |
Model v7 | 0.97 | 0.552 | V4 + CBAM at stage ends | Global attention loses spatial resolution, degrading edge precision. |
Model v8 | 0.98 | 0.628 | V4 + CBAM in each block + CCBAM in skip connections | Cross-scale attention balances hierarchical features, achieving optimal edge-semantic fusion. |
Model v9 | 0.98 | 0.594 | branch participates in multi-layer fusion | Forced convolution in shallow layers blurs details, breaking progressive enhancement. |
Model v10 | 0.97 | 0.586 | Unbalanced grouping | Uneven channel allocation skews scale coverage, harming small-target segmentation. |
Model v11 | 1.00 | 0.569 | V8 + BiFPN with standard convolutions | Using standard convolutions sacrifices spatial detail, critical for SSS edge accuracy. |
Model v12 | 1.00 | 0.656 | V8 + BiFPN with DSC | DSC preserves high-frequency features, maximizing IoU. |
Model v13 | 1.00 | 0.695 | ASMO | Synergy of loss reweighting, mask refinement, and cyclic learning-rate optimization. |
Model | AP@0.5 | IoU |
---|---|---|
SDA-Mask R-CNN | 1.00 | 0.695 |
Mask R-CNN | 0.861 | 0.633 |
YOLOv5s_seg | 0.878 | 0.659 |
YOLACT | 0.824 | 0.631 |
SOLOv2 | 0.786 | 0.612 |
Deeplabv3+ | 0.856 | 0.651 |
Operator | Canny | Harris | Prewitt | SUSAN | Sobel | Laplace | Roberts | Kirsch | LoG | DoG |
---|---|---|---|---|---|---|---|---|---|---|
IoU | 0.124 | 0.218 | 0.217 | 0.035 | 0.109 | 0.128 | 0.121 | 0.236 | 0.112 | 0.126 |
Model | AP@0.5-Noisy | AP@0.5-Clean | ΔAP@0.5 | IoU-Noisy | IoU-Clean | ΔIoU |
---|---|---|---|---|---|---|
SDA-Mask R-CNN | 0.952 | 1.00 | 0.048 | 0.664 | 0.695 | 0.031 |
Mask R-CNN | 0.793 | 0.861 | 0.068 | 0.585 | 0.633 | 0.048 |
YOLOv5s_seg | 0.811 | 0.878 | 0.067 | 0.617 | 0.659 | 0.042 |
YOLACT | 0.732 | 0.824 | 0.092 | 0.569 | 0.631 | 0.062 |
SOLOv2 | 0.709 | 0.786 | 0.077 | 0.524 | 0.612 | 0.088 |
Deeplabv3+ | 0.778 | 0.856 | 0.078 | 0.607 | 0.651 | 0.044 |
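The two difference columns in the table above are clean-minus-noisy degradations. Recomputing them from the reported values (two rows shown) confirms the table's internal consistency:

```python
# Noise-robustness degradation: clean score minus noisy score,
# computed from the values reported in the table above.

rows = {
    "SDA-Mask R-CNN": {"ap_noisy": 0.952, "ap_clean": 1.00,
                       "iou_noisy": 0.664, "iou_clean": 0.695},
    "Mask R-CNN":     {"ap_noisy": 0.793, "ap_clean": 0.861,
                       "iou_noisy": 0.585, "iou_clean": 0.633},
}

def degradation(r):
    """(delta AP@0.5, delta IoU) under Gaussian noise."""
    return (round(r["ap_clean"] - r["ap_noisy"], 3),
            round(r["iou_clean"] - r["iou_noisy"], 3))

# SDA-Mask R-CNN degrades least: (0.048, 0.031) vs. Mask R-CNN's
# (0.068, 0.048), matching the table's difference columns.
```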
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xiao, Y.; Dai, D.; Wang, H.; Li, C.; Song, S. SDA-Mask R-CNN: An Advanced Seabed Feature Extraction Network for UUV. J. Mar. Sci. Eng. 2025, 13, 863. https://doi.org/10.3390/jmse13050863