SimNFND: A Forward-Looking Sonar Denoising Model Trained on Simulated Noise-Free and Noisy Data
Abstract
1. Introduction
- A forward-looking sonar data simulation method based on RGBD data is proposed, eliminating the cumbersome manual construction of virtual scenes required by other simulation methods. This approach lays the groundwork for the rapid generation of large volumes of high-quality forward-looking sonar simulation data.
- A supervised training method for forward-looking sonar denoising models based on simulated ground truth is introduced. This method addresses the difficulty of performance evaluation and the incomplete denoising caused by the absence of truly noise-free ground truth in previous research.
- A forward-looking sonar denoising model is trained, leveraging the simulated noise-free and noisy data together with a new loss function. This guides the model to understand the imaging characteristics of the sonar, significantly enhancing its denoising capability and detail retention for FLS data.
2. Related Works
2.1. Forward-Looking Sonar Simulation
2.2. Denoising for Forward-Looking Sonar
3. Forward-Looking Sonar Simulation
3.1. Virtual Scene Construction Method
3.2. Sonar Image Rendering Method
3.3. Forward-Looking Sonar Noise Modeling
3.3.1. Background Noise
3.3.2. Range Spreading
3.3.3. Multipath Effects
3.3.4. Azimuthal Artifacts
3.3.5. Resampling
4. Forward-Looking Sonar Image Denoising Model
4.1. Problem Definition
4.2. Denoising Model
- Encoding part: The encoder first applies a 3 × 3 convolution to embed the original image, expanding the 3-channel input into 48-channel high-dimensional features. Transformer and downsampling modules then perform feature extraction at progressively coarser scales; three downsampling steps form a feature pyramid that captures both high-level semantic information and low-level texture information.
- Decoding part: The decoder restores resolution using transformer and upsampling modules. At each of the three upsampling steps, the features are fused with the corresponding encoder layer; the fused features are first encoded by double-layer convolutions and then passed to a transformer module for semantic analysis. This allows high-level semantic information to propagate into and merge with low-level texture information.
- Output part: Finally, transformer and convolution modules complete the feature fusion, project the features back to the three image channels, and output the result.
- The multi-head convolutional attention module first produces queries (Q), keys (K), and values (V) from the feature map using three sets of double-layer convolutions. The queries and keys are then reshaped and multiplied to form a channel attention matrix, which, after an activation function, is multiplied with V. The model thus attends to channels rather than using traditional spatial attention over pixels or regions: it learns to weight the channels that express specific attributes of the image. Because the attention matrix scales with the number of channels rather than the number of pixels, this design suits high-resolution images, reducing computational demands while keeping the model focused on significant channel features.
- The gated convolutional feedforward network splits the features into gating features and transformed features through two sets of double-layer convolutions. The gating features, after passing through an activation function, are multiplied elementwise with the transformed features, thereby controlling the flow of information. A PyTorch sketch of both modules follows this list.
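The sketch below shows how these two modules might compose into a transformer block. It is an illustrative reconstruction from the descriptions above, not the authors' released code: the head count, the GELU activation, the GroupNorm placement, and the pointwise-then-depthwise form of the "double-layer convolution" are all assumptions.

```python
# Minimal PyTorch sketch of the channel-attention transformer block described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadConvAttention(nn.Module):
    """Q, K, V come from double-layer convolutions; attention is computed
    across channels, so the attention matrix is (C/heads x C/heads) per head,
    independent of image resolution."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        # "double-layer convolution": 1x1 pointwise followed by 3x3 depthwise
        self.qkv = nn.Sequential(
            nn.Conv2d(dim, dim * 3, 1, bias=False),
            nn.Conv2d(dim * 3, dim * 3, 3, padding=1, groups=dim * 3, bias=False),
        )
        self.project_out = nn.Conv2d(dim, dim, 1, bias=False)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        # reshape to (batch, heads, channels_per_head, pixels)
        q = q.reshape(b, self.num_heads, c // self.num_heads, h * w)
        k = k.reshape(b, self.num_heads, c // self.num_heads, h * w)
        v = v.reshape(b, self.num_heads, c // self.num_heads, h * w)
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature  # channel attention matrix
        attn = attn.softmax(dim=-1)
        out = (attn @ v).reshape(b, c, h, w)
        return self.project_out(out)

class GatedConvFeedForward(nn.Module):
    """One conv branch passes through an activation and gates the other
    branch elementwise, controlling the flow of information."""
    def __init__(self, dim, expansion=2):
        super().__init__()
        hidden = dim * expansion
        self.proj_in = nn.Sequential(
            nn.Conv2d(dim, hidden * 2, 1, bias=False),
            nn.Conv2d(hidden * 2, hidden * 2, 3, padding=1, groups=hidden * 2, bias=False),
        )
        self.proj_out = nn.Conv2d(hidden, dim, 1, bias=False)

    def forward(self, x):
        gate, value = self.proj_in(x).chunk(2, dim=1)
        return self.proj_out(F.gelu(gate) * value)

class TransformerBlock(nn.Module):
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, dim)  # normalizes over channels and space
        self.attn = MultiHeadConvAttention(dim, num_heads)
        self.norm2 = nn.GroupNorm(1, dim)
        self.ffn = GatedConvFeedForward(dim)

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        return x + self.ffn(self.norm2(x))

# Usage: blocks like TransformerBlock(48) would sit at each level of the
# three-stage encoder-decoder pyramid, behind the initial 3x3 embedding.
# y = TransformerBlock(48)(torch.randn(1, 48, 64, 64))
```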
4.3. Loss Function
5. Experiment and Discussion
5.1. Experimental Equipment
5.2. Forward-Looking Sonar Image Denoising Dataset Generation
5.3. The Effectiveness Verification of the Proposed Denoising Method
- Shadow Areas: These are regions where sound waves do not reach due to field-of-view limitations or obstacles blocking the path. Ideally, there should be no return signal, and these areas should appear completely black. However, background noise often causes speckle noise in these regions.
- Illuminated Areas: These are regions where the sound waves emitted by the FLS reach. Ideally, the structural features of objects within this area should be clearly reflected.
- High-Reflectance Areas: These regions have high reflectance due to the structure or density of objects. The strong, numerous reflected beams often produce artifacts such as ghosting. A toy sketch of isolating these three region types appears after this list.
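To make the taxonomy concrete, here is a toy sketch that separates the three region types by simple intensity thresholding. The thresholds and the procedure itself are purely illustrative assumptions; the paper does not describe region extraction this way.

```python
# Toy sketch: partition an FLS image into shadow / illuminated / high-reflectance
# masks. Assumes a grayscale float image in [0, 1] and hand-picked thresholds.
import numpy as np

def partition_regions(img, t_shadow=0.05, t_high=0.8):
    shadow = img < t_shadow        # little or no return: ideally black, often speckled
    high = img >= t_high           # strong returns: prone to ghosting artifacts
    illuminated = ~shadow & ~high  # ordinary ensonified regions with object structure
    return shadow, illuminated, high
```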
5.3.1. Qualitative Analysis
5.3.2. Quantitative Analysis
- Peak Signal-to-Noise Ratio (PSNR) is one of the most widely used evaluation metrics in image processing. It expresses the ratio between the maximum possible power of a signal and the power of the noise that corrupts it, and is usually given in logarithmic decibel (dB) units. For a high-quality reference image y and a noisy or denoised image x of size M × N, it is defined as follows:
$$\mathrm{PSNR}(x, y) = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}(x, y)}, \qquad \mathrm{MSE}(x, y) = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( x_{ij} - y_{ij} \right)^2,$$
where MAX is the maximum possible pixel value (255 for 8-bit images). A higher PSNR indicates less distortion relative to the reference.
- The Structural Similarity Index (SSIM) is another widely used image similarity metric. Unlike PSNR, which evaluates pixel-by-pixel differences, SSIM measures structural information, which is closer to how humans perceive visual content, so it is often considered to better reflect subjective image quality. The core idea is that natural images are highly structured: neighboring pixels are strongly correlated and carry structural information about the objects in the scene, and the human visual system tends to extract this structure when observing images. Measuring structural distortion is therefore central to image quality assessment. Given two image signals x and y, SSIM is defined as follows:
$$\mathrm{SSIM}(x, y) = [l(x, y)]^{\alpha} \, [c(x, y)]^{\beta} \, [s(x, y)]^{\gamma},$$
where l(x, y), c(x, y), and s(x, y) represent luminance, contrast, and structure comparisons, respectively, and the exponents α, β, γ adjust the relative importance of the three terms, which are defined as follows:
$$l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \qquad c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \qquad s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}.$$
Here, μ_x, μ_y and σ_x, σ_y denote the means and standard deviations of x and y, σ_xy represents their covariance, and C_1, C_2, C_3 are small constants that keep the results stable. In practice, SSIM is computed over local windows moved pixel by pixel until the entire image is covered, and the overall SSIM is the average of the local values. A higher SSIM indicates greater similarity between the two images. Generally, SSIM and PSNR trends are similar, so high PSNR values often correspond to high SSIM values.
- Gradient Magnitude Similarity Deviation (GMSD) assesses image quality from image gradients and is known for its high accuracy and low computational cost. Natural images contain varied local structures whose gradient magnitudes degrade to different extents when distorted, and these local changes are crucial for quality assessment. GMSD measures local quality as the gradient magnitude similarity (GMS) at each location, then takes the standard deviation of these local scores as the overall quality score, combining detailed local information with a global view of the image. For an image divided into N local regions, GMSD is defined as follows:
$$\mathrm{GMS}(i) = \frac{2\, m_r(i)\, m_d(i) + c}{m_r^2(i) + m_d^2(i) + c}, \qquad \mathrm{GMSM} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{GMS}(i), \qquad \mathrm{GMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{GMS}(i) - \mathrm{GMSM} \right)^2},$$
where GMS denotes gradient magnitude similarity, GMSM denotes its mean, m_r(i) and m_d(i) denote the gradient magnitudes of the reference and degraded images at location i (computed with the Prewitt operators h_x and h_y), and c is a positive constant for numerical stability. A lower GMSD indicates higher quality. A sketch computing all three metrics follows this list.
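Below is a minimal Python sketch of the three metrics. PSNR and SSIM come from scikit-image; the GMSD helper follows the definition above. The value of c, the 1/3-scaled Prewitt kernels, and the omission of the 2× downsampling used in the original GMSD paper are simplifying assumptions.

```python
# Sketch: compute PSNR, SSIM, and GMSD between a reference and a test image.
# Assumes grayscale float images in [0, 1].
import numpy as np
from scipy.ndimage import convolve
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def gmsd(reference, distorted, c=170.0 / 255**2):
    # Prewitt operators for horizontal and vertical gradients
    hx = np.array([[1.0, 0.0, -1.0]] * 3) / 3.0
    hy = hx.T

    def grad_mag(img):
        return np.sqrt(convolve(img, hx) ** 2 + convolve(img, hy) ** 2)

    m_r, m_d = grad_mag(reference), grad_mag(distorted)
    gms = (2.0 * m_r * m_d + c) / (m_r**2 + m_d**2 + c)  # local similarity map
    return float(gms.std())  # deviation of the local scores is the GMSD

# Usage on a noise-free reference y and a denoised output x:
# psnr = peak_signal_noise_ratio(y, x, data_range=1.0)  # higher is better
# ssim = structural_similarity(y, x, data_range=1.0)    # higher is better
# score = gmsd(y, x)                                    # lower is better
```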
5.3.3. Ablation Experiment
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Component | Specification |
|---|---|
| System | Ubuntu 20.04 |
| CPU | AMD Ryzen™ Threadripper™ PRO 5975WX |
| GPU | NVIDIA GeForce RTX 4090 × 4 |
| Memory | 256 GB |
| Video Memory | 96 GB |
| Metric | Origin | Median | Gaussian | Bilateral | NLM | BM3D | Ours |
|---|---|---|---|---|---|---|---|
| PSNR ↑ | 17.64 | 17.76 | 17.76 | 17.90 | 17.73 | 16.65 | 20.77 |
| SSIM ↑ | 0.68 | 0.69 | 0.69 | 0.69 | 0.69 | 0.69 | 0.79 |
| GMSD ↓ | 0.22 | 0.19 | 0.20 | 0.19 | 0.21 | 0.19 | 0.18 |
| Method | PSNR ↑ | SSIM ↑ | GMSD ↓ |
|---|---|---|---|
| OldData + L1loss | 17.90 | 0.745 | 0.195 |
| SimNFND + L1loss | 20.17 | 0.789 | 0.188 |
| SimNFND + Newloss | 20.78 | 0.795 | 0.180 |