A Spatial Distribution Probability-Guided Detection Framework for Underwater Sonar Imagery
Highlights
- A real sonar dataset containing practice mines and practice subsurface buoys is created. The dataset is used to train and validate a deep neural network.
- A Spatial Distribution Probability-Guided Detection Framework is designed for submarine point target detection; it achieves a higher mAP50–95 than the YOLOv8 and YOLO26 baselines on both the public and the self-collected dataset.
- Spatial distribution probability provides reliable prior knowledge which enables the deep neural network to focus on the target part.
- The proposed framework achieves robust object detection in data-scarce scenarios, demonstrating generalizability beyond underwater sonar datasets.
Abstract
1. Introduction
- A real sonar dataset containing practice mines and practice subsurface buoys is created. The dataset is used to train and validate the deep neural network.
- A novel spatial distribution probability-guided network, the core of the framework, is designed for submarine point target detection. It outputs the location and confidence of each detected target for further processing.
- A post-processing method based on the DBSCAN algorithm is designed to cluster the coordinate points of the detections.
- Experiments on public datasets and our self-collected dataset are designed to verify the effectiveness of the proposed framework.
2. Overview of Our Proposed Submarine Target Inspection System
3. The Proposed Method
3.1. Versatile Vision Foundation Model for Sonar Image Processing
3.2. Spatial Distribution Probability-Guided Detection Module
3.2.1. Spatial Distribution Probability
3.2.2. Framework of Spatial Distribution Probability-Guided Detection Module
- Transformer Blocks: The Transformer-based feature extraction module employs four Transformer Blocks to capture the long-range dependencies inherent in linear target features. The architecture of a single Transformer block is illustrated in Figure 7a. Each block uses a pre-normalization design. First, the input feature maps are normalized using Layer Normalization (LayerNorm) and then processed by a Multi-Head Attention (MHA) mechanism. The output of the MHA is combined with the original input via a residual connection. This summed tensor is then passed through a second LayerNorm layer before entering a Feed-Forward Network (FFN), which produces the final output of the block. The core of the Transformer block is the Multi-Head Attention mechanism. It projects input vectors into multiple distinct subspaces, enabling each “head” to independently compute attention weights from a different representational perspective. The outputs of all heads are concatenated and then integrated via a final linear projection, so that the final representation combines information from all subspaces. Mathematically, the scaled dot-product attention is defined as

  $$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,$$

  where Q, K, and V denote the query, key, and value matrices and d_k is the dimension of the keys.
- Patch Merging: The proposed network incorporates Patch Merging modules to perform spatial downsampling. The detailed operation is illustrated in Figure 7b. Given an input feature map of size H × W × C, the module partitions the spatial dimensions into non-overlapping 2 × 2 groups. Each group consists of four adjacent pixels: the top-left (orange), top-right (green), bottom-left (yellow), and bottom-right (red). Pixels that share the same relative position are gathered into one patch, and the four resulting patches are concatenated along the channel dimension, yielding a downsampled feature map of size H/2 × W/2 × 4C. A key advantage of Patch Merging is that it compresses the spatial dimensions without discarding any pixel values, so detailed information is preserved.
- Feature Guidance Module: To achieve feature guidance via spatial distribution probability, we designed the Feature Guidance Module, as detailed in Figure 7c. First, the spatial distribution probability map with dimensions h × w × 1 is upsampled to H × W × 1 to align with the spatial resolution of the input feature map. It is then replicated along the channel dimension to match the channel count C of the input. Finally, utilizing a residual structure, the input feature map is multiplied by the transformed probability map to emphasize target regions. This result is then added to the original input feature map to yield the final feature-guided output.
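The three components described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the attention function covers a single head without the learned projections, and nearest-neighbour upsampling is assumed for the probability map because the interpolation method is not stated.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """Single-head attention softmax(q k^T / sqrt(d_k)) v; q, k, v have shape (n, d_k)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def patch_merging(x):
    """Downsample (H, W, C) to (H/2, W/2, 4C) by regrouping each 2 x 2 pixel group."""
    h, w, _ = x.shape
    assert h % 2 == 0 and w % 2 == 0, "H and W must be even"
    top_left = x[0::2, 0::2]
    top_right = x[0::2, 1::2]
    bottom_left = x[1::2, 0::2]
    bottom_right = x[1::2, 1::2]
    # Concatenate the four position-wise patches along the channel axis.
    return np.concatenate([top_left, top_right, bottom_left, bottom_right], axis=-1)


def feature_guidance(x, prob):
    """Residual guidance: x * upsampled(prob) + x.

    x has shape (H, W, C); prob has shape (h, w, 1) with H and W integer
    multiples of h and w. Nearest-neighbour upsampling is an assumption.
    """
    H, W, C = x.shape
    h, w, _ = prob.shape
    up = np.repeat(np.repeat(prob, H // h, axis=0), W // w, axis=1)  # (H, W, 1)
    up = np.repeat(up, C, axis=-1)                                   # (H, W, C)
    return x * up + x


# Usage on toy data
features = np.ones((4, 4, 2))
probability = np.array([[0.0, 1.0], [0.5, 0.0]]).reshape(2, 2, 1)
guided = feature_guidance(features, probability)   # shape (4, 4, 2)
merged = patch_merging(guided)                     # shape (2, 2, 8)
```

Because of the residual term, regions with probability 0 pass through unchanged rather than being zeroed, while regions with probability 1 are doubled in magnitude.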
3.3. Target Position Calculation Module
3.4. DBSCAN-Based Post-Processing Module
- Core Point: A point is defined as a Core Point if its ε-neighborhood contains at least MinPts points (including itself). Core Points serve as the “seeds” of a cluster.
- Border Point: A point that is not a Core Point but lies within the ε-neighborhood of a Core Point is classified as a Border Point. Border Points belong to the cluster of the associated Core Point but cannot expand the cluster further.
- Noise Point: Points that are neither Core Points nor Border Points are considered Noise Points. These are treated as outliers or background noise and do not belong to any cluster.
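The three point types can be illustrated with a brute-force classifier. This is a minimal sketch, assuming Euclidean distance and a hypothetical function name `classify_points`; it is meant to mirror the definitions above, not to be an efficient implementation.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise' following the definitions above."""
    # eps-neighborhood of every point (each point is its own neighbor).
    neighborhoods = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighborhoods) if len(nb) >= min_pts}
    kinds = []
    for i, nb in enumerate(neighborhoods):
        if i in core:
            kinds.append("core")
        elif any(j in core for j in nb):
            kinds.append("border")  # within eps of at least one Core Point
        else:
            kinds.append("noise")
    return kinds


# Four points on a line plus one far-away outlier.
pts = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 10)]
print(classify_points(pts, eps=1.0, min_pts=3))
# ['border', 'core', 'core', 'border', 'noise']
```

The two end points of the line have only two points in their neighborhoods, so they are Border Points attached to the inner Core Points.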
| Algorithm 1 Density-Based Spatial Clustering of Applications with Noise (DBSCAN). |
|---|
| 1. Initialization: Mark all points as “unvisited”. |
| 2. Iteration: Randomly select an unvisited point p from the dataset. |
| 3. Check: If the ε-neighborhood of p contains at least MinPts points, p is a Core Point and a new cluster is created from p and its neighborhood; otherwise, p is provisionally labeled as a Noise Point. |
| 4. Expansion: For each Core Point newly added to the cluster, the algorithm recursively examines its ε-neighborhood. If a point within this neighborhood is also identified as a Core Point, all points in its neighborhood are incorporated into the current cluster. This process continues until no new points can be added to the current cluster. |
| 5. Repetition: Repeat steps 2–4 until all points have been visited. Ultimately, every point is either assigned to a cluster or labeled as a Noise Point. |
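The DBSCAN procedure can be sketched in pure Python as follows. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, Euclidean distance on planar coordinates is assumed, and points are visited in index order instead of randomly, which does not change which points are Core or Noise Points.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)          # None means "unvisited"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE              # may be relabeled as a Border Point later
            continue
        cluster_id += 1                    # points[i] is a Core Point: start a cluster
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id     # Border Point: joins but does not expand
                continue
            if labels[j] is not None:
                continue                   # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # Core Point: keep expanding the cluster
    return labels


# Two groups of repeated detections plus one isolated false alarm.
detections = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(detections, eps=1.5, min_pts=3))
# [0, 0, 0, 1, 1, 1, -1]
```

In a post-processing setting such as the one described here, each cluster would correspond to repeated detections of one target, and the isolated point labeled -1 would be discarded as a false alarm. The values of `eps` and `min_pts` above are arbitrary illustration values.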
4. Dataset
4.1. Data Acquisition
4.2. Data Processing
5. Experiments and Analysis
5.1. Experimental Setup
5.2. Experiments on Public Dataset
5.3. Experiments on Our Dataset
6. Discussion
6.1. Generalization Capability in Few-Shot Scenarios
6.2. Localization Accuracy and Robustness in Complex Environments
6.3. Noise Suppression and False Alarm Handling
6.4. Limitations of the Method
7. Conclusions
- Innovative Detection Architecture: We designed the Spatial Distribution Probability-Guided Detection Module. This module uses a general-purpose Vision Foundation Model (DINOv3) to generate spatial distribution probability maps, which guide the Transformer-based feature extraction network. This mechanism reduces the reliance on large amounts of annotated data that is typical of traditional Convolutional Neural Networks and enables accurate object detection under few-shot conditions.
- Complete Perception System: We constructed a complete system comprising a Target Position Calculation Module (converting image coordinates to global longitude and latitude) and a DBSCAN-based post-processing module (aggregating discrete detection points). This enables the UUV to perform online detection, global localization, and intelligent navigation.
- Empirical Validity: On the public mine dataset, the method outperformed the baselines in low-data regimes. On the self-constructed complex scenario dataset, the model achieved a mAP50 of 0.715, compared with 0.510 for YOLOv8 and 0.598 for YOLO26. Field sea trials verified that the system can effectively distinguish real targets from noise and, through clustering, correct localization errors caused by inertial navigation drift.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- McKay, J.; Gerg, I.; Monga, V.; Raj, R.G. What’s mine is yours: Pretrained CNNs for limited training sonar ATR. In Proceedings of the OCEANS 2017-Anchorage, Anchorage, AK, USA, 18–21 September 2017; IEEE: New York, NY, USA, 2017; pp. 1–7.
- Zhu, K.; Tian, J.; Huang, H. Underwater objects classification method in high-resolution sonar images using deep neural network. Acta Acust. 2019, 44, 595–603.
- Cheng, Z.; Huo, G.; Li, H. A multidomain collaborative transfer learning method with multiscale repeated attention mechanism for underwater sidescan sonar image classification. Remote Sens. 2022, 14, 355.
- Xu, H.; Yang, L.; Long, X. Underwater sonar image classification with small samples based on parameter-based transfer learning and deep learning. In Proceedings of the Global Conference on Robotics, Artificial Intelligence and Information Technology (GCRAIT), Chicago, IL, USA, 30–31 July 2022; IEEE: New York, NY, USA, 2022; pp. 304–307.
- Ochal, M.; Vazquez, J.; Petillot, Y.; Wang, S. A comparison of few-shot learning methods for underwater optical and sonar image classification. In Proceedings of the Global Oceans 2020, Biloxi, MS, USA, 5–30 October 2020; IEEE: New York, NY, USA, 2020; pp. 1–10.
- Chen, Y.; Li, B.; Liang, H.; Yang, C. Research on sonar image few-shot classification based on deep learning. J. Northwest. Polytech. Univ. 2022, 40, 739–745.
- Xu, H.; Yang, L.; Zhang, M. Unsupervised classification based on deep adaptation network for sonar images. J. Electron. Imaging 2023, 32, 013029.
- Bell, J.M. A Model for the Simulation of Sidescan Sonar. Ph.D. Thesis, Heriot-Watt University, Edinburgh, UK, 1995.
- Bell, J.M.; Darlington, D.J.; Elston, G.R. Techniques for the physical modelling of the sonar image generation process. In Proceedings of the SEE International Conference on Physics in Signal and Image Processing (PSIP’99); SEE: Paris, France, 1999; pp. 66–72.
- Cerqueira, R.; Trocoli, T.; Neves, G.; Joyeux, S.; Albiez, J.; Oliveira, L. A novel GPU-based sonar simulator for real-time applications. Comput. Graph. 2017, 68, 66–76.
- Cerqueira, R.; Trocoli, T.; Albiez, J.; Oliveira, L. A rasterized ray-tracer pipeline for real-time, multi-device sonar simulation. Graph. Model. 2020, 111, 101086.
- Li, X. Research on Target Sample Generation and Classification of Side Scan Sonar Image. Master’s Thesis, Harbin Engineering University, Harbin, China, 2020.
- Li, B.; Huang, H.; Liu, J.; Li, Y. Optical image-to-underwater small target synthetic aperture sonar image translation algorithm based on improved CycleGAN. Acta Electron. Sin. 2021, 49, 1746–1753.
- Hu, Y.; Zhang, W.; Li, B.; Liu, J.; Huang, H. Self-perceptual generative adversarial network for synthetic aperture sonar image generation. In Proceedings of the Fourteenth International Conference on Graphics and Image Processing (ICGIP 2022); SPIE: Bellingham, WA, USA, 2023; Volume 12705, pp. 864–872.
- Du, Y.; Lin, W.; Zhong, W.; Yuan, Y. An effective approach for sonar image recognition with improved efficientdet and ensemble learning. J. Phys. Conf. Ser. 2022, 2258, 012038.
- Lei, C.; Wang, H.; Lei, J. Enhancing side-scan sonar image classification based on graph structure. IEEE Sens. J. 2024, 24, 24388–24404.
- Ruan, F.; Dang, L.; Ge, Q.; Zhang, Q.; Qiao, B.; Zuo, X. Dualpath residual “shrinkage” network for side-scan sonar image classification. Comput. Intell. Neurosci. 2022, 2022, 6962838.
- Tang, Y.; Li, H.; Zhang, W.; Bian, S.; Zhai, G.; Liu, M.; Zhang, X. Lightweight DETR-YOLO method for detecting shipwreck target in side-scan sonar. Syst. Eng. Electron. 2022, 44, 2427–2436.
- Steiniger, Y.; Groen, J.; Stoppe, J.; Kraus, D.; Meisen, T. A study on modern deep learning detection algorithms for automatic target recognition in sidescan sonar images. In Proceedings of the Meetings on Acoustics; Acoustical Society of America: Melville, NY, USA, 2021; Volume 44, p. 070010.
- Yu, Y.; Zhao, J.; Gong, Q.; Huang, C.; Zheng, G.; Ma, J. Realtime underwater maritime object detection in side-scan sonar images based on transformer-YOLOv5. Remote Sens. 2021, 13, 3555.
- Li, B.; Huang, H.; Liu, J.; Wei, L. Underwater Small Target Detection Method and System for Synthetic Aperture Sonar Image. CN202311062705.1, 5 January 2024.
- Yang, C.; Li, Y.; Jiang, L.; Huang, J. Foreground enhancement network for object detection in sonar images. Mach. Vis. Appl. 2023, 34, 56.
- Tang, Y.; Wang, L.; Li, H.; Bian, S. Side-scan sonar underwater target segmentation using the BHPUNet. EURASIP J. Adv. Signal Process. 2023, 2023, 76.
- Er, M.J.; Chen, J.; Zhang, Y. Marine robotics 4.0: Present and future of real-time detection techniques for underwater objects. In Industry 4.0—Perspectives and Applications; IntechOpen: Rijeka, Croatia, 2022; p. 8.
- Wang, J.; Shan, T.; Muthukumaran, C.; Osedach, T.; Englot, B. Deep learning for detection and tracking of underwater pipelines using multibeam imaging sonar. In Proceedings of the IEEE International Conference on Robotics and Automation Workshop, Montreal, QC, Canada, 20–24 May 2019; IEEE: New York, NY, USA, 2019.
- Siméoni, O.; Vo, H.V.; Seitzer, M.; Baldassarre, F. DINOv3. arXiv 2025, arXiv:2508.10104.
- Schubert, E.; Sander, J.; Ester, M.; Kriegel, H.P.; Xu, X. DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Trans. Database Syst. 2017, 42, 1–21.
- Nuno, P.S.; Ricardo, M.; Gonçalo, S.T.; Lobo, V.; de Castro Neto, M. Side-scan sonar imaging data of underwater vehicles for mine detection. Data Br. 2024, 53, 110132.
- Glenn, J.; Ayush, C.; Qiu, J. Ultralytics YOLOv8, version 8.0.0; Ultralytics: Frederick, MD, USA. Available online: https://github.com/ultralytics/ultralytics (accessed on 14 April 2026).
- Ranjan, S.; Rahul, H.C.; Ajay, S.; Manoj, K. YOLO26: Key Architectural Enhancements and Performance Benchmarking for Real-Time Object Detection. arXiv 2025, arXiv:2509.25164.

| Parameter | High Frequency | Low Frequency |
|---|---|---|
| Frequency | 900 kHz | 450 kHz |
| Range | 75 m | 150 m |
| Horizontal beam width | 0.2° | 0.2° |
| Vertical beam width | 50° | 50° |
| Horizontal resolution | 1 cm | 2 cm |
| Vertical resolution | 0.07 m @ 20 m | 0.17 m @ 50 m |
| | 0.17 m @ 50 m | 0.34 m @ 100 m |
| | 0.26 m @ 75 m | 0.51 m @ 150 m |

| Methods | Precision | Recall | mAP50 | mAP50–95 | Parameters | Time (ms) |
|---|---|---|---|---|---|---|
| YOLOv8 | 0.81920 | 0.85510 | 0.90701 | 0.56403 | 3.0 M | 3.9 |
| YOLOv8 + Transformer | 0.95707 | 0.88889 | 0.96115 | 0.59123 | 74.0 M | 20.3 |
| YOLOv8 + Guidance | 0.99186 | 1 | 0.995 | 0.69136 | 314.4 M | 32.4 |
| YOLOv8 + Transformer + Guidance (ours) | 0.99335 | 1 | 0.995 | 0.88613 | 387.4 M | 46.6 |

| Methods | Precision | Recall | mAP50 | mAP50–95 |
|---|---|---|---|---|
| YOLOv4 | 0.82 | 0.64 | 0.75 | - |
| YOLOv8 | 0.81920 | 0.85510 | 0.90701 | 0.56403 |
| YOLO26 | 0.99751 | 1 | 0.995 | 0.74311 |
| YOLOv8 + Transformer + Guidance (ours) | 0.99335 | 1 | 0.995 | 0.88613 |

| Methods | Precision | Recall | mAP50 | mAP50–95 | Parameters | Time (ms) |
|---|---|---|---|---|---|---|
| YOLOv8 | 0.74618 | 0.45332 | 0.50996 | 0.18642 | 3.0 M | 4.5 |
| YOLOv8 + Transformer | 0.68769 | 0.61538 | 0.67939 | 0.28468 | 74.0 M | 23.0 |
| YOLOv8 + Guidance | 0.67336 | 0.63546 | 0.65109 | 0.20411 | 314.4 M | 56.9 |
| YOLOv8 + Transformer + Guidance (ours) | 0.81269 | 0.69231 | 0.71465 | 0.24956 | 387.4 M | 70.0 |

| Methods | Precision | Recall | mAP50 | mAP50–95 |
|---|---|---|---|---|
| YOLOv8 | 0.74618 | 0.45332 | 0.50996 | 0.18642 |
| YOLO26 | 0.74263 | 0.46154 | 0.59807 | 0.20426 |
| YOLOv8 + Transformer + Guidance (ours) | 0.81269 | 0.69231 | 0.71465 | 0.24956 |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Jia, D.; Huang, Y.; Qiao, J.; Wang, Z.; Feng, H.; Yu, J. A Spatial Distribution Probability-Guided Detection Framework for Underwater Sonar Imagery. Remote Sens. 2026, 18, 1906. https://doi.org/10.3390/rs18121906

