Application and Analysis of the MFF-YOLOv7 Model in Underwater Sonar Image Target Detection
Abstract
1. Introduction
- The sonar system emits sound waves.
- The sound waves pass through the water, reflect off underwater targets, and return.
- The reflected echoes return to the sonar system.
- Images are formed through the complex processing of these echoes.
- The traditional YOLOv7 model struggles with high-noise, low-clarity sonar images: the underwater environment is complex, target sizes vary widely, interfering information abounds, imaging resolution is low, and targets are often small and dense. These factors easily lead to missed and false detections. To address these issues, we designed the MFF-YOLOv7 model.
- The Multi-Scale Information Fusion Module (MIFM) is introduced to enhance the YOLOv7 model. The module integrates information from multiple scales, improving the model’s ability to process features at different levels, which is essential in complex underwater environments where target dimensions fluctuate. Its fusion capability overcomes the constraints of conventional modules, captures the characteristics of targets of diverse sizes, and dynamically shifts attention across target scales according to actual underwater conditions, allocating resources intelligently. This mechanism substantially improves the precision of sonar image target identification while reducing missed and false detections (an illustrative sketch of the multi-scale fusion pattern appears after this list).
- Rigorous comparative evaluations were conducted on three real-world sonar image datasets: URPC, SCTD, and UATD. The results indicate that MFF-YOLOv7 performs exceptionally well on all three datasets, exhibits strong generalization ability, and adapts to sonar image recognition tasks in different scenarios.
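To make the fusion pattern concrete, the following is a minimal, illustrative PyTorch sketch of a multi-scale fusion block. It is not the authors’ exact MIFM (Section 2.2 defines the real module); the class name `MultiScaleFusionSketch`, the 1/3/5 branch kernels, and the global-context branch are our own assumptions, shown only to illustrate how parallel receptive fields can be extracted and recombined.

```python
import torch
import torch.nn as nn

class MultiScaleFusionSketch(nn.Module):
    """Illustrative multi-scale fusion block (not the paper's exact MIFM).

    Parallel branches with different receptive fields capture small and
    large targets; a global-context branch summarizes the whole map; a
    1x1 convolution fuses the concatenated branches back to `channels`.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2)
            for k in (1, 3, 5)                      # three receptive-field sizes
        ])
        self.context = nn.Sequential(               # global-context branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
        )
        self.fuse = nn.Conv2d(4 * channels, channels, 1)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]
        feats.append(self.context(x).expand_as(x))  # broadcast global context
        return self.act(self.fuse(torch.cat(feats, dim=1)))
```

A feature map of shape `(B, C, H, W)` enters and leaves with the same shape, so a block of this kind can stand in for a single-scale module (such as SPPCSPC) without disturbing the surrounding architecture.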
2. Background
2.1. YOLOv7
2.2. Multi-Scale Information Fusion Module (MIFM)
2.3. RFAConv
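RFAConv replaces the fixed, parameter-shared convolution kernel with receptive-field attention: every position inside each k × k receptive field is reweighted by learned attention before aggregation. The sketch below is a simplified, hypothetical PyTorch rendering of that idea; the class name is ours, and the final 1 × 1 projection stands in for the original’s rearrange-and-stride-k convolution, so it should be read as an illustration rather than the paper’s implementation.

```python
import torch
import torch.nn as nn

class RFAConvSketch(nn.Module):
    """Simplified receptive-field attention convolution (illustrative).

    Each k*k receptive field is unfolded, reweighted by a softmax
    attention over its k*k positions, and then projected to the output
    channels -- so positions inside one receptive field are no longer
    forced to share identical kernel weights.
    """

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.k = k
        # Attention logits over the k*k positions of every receptive field.
        self.attn = nn.Sequential(
            nn.AvgPool2d(k, stride=1, padding=k // 2),
            nn.Conv2d(in_ch, in_ch * k * k, 1, groups=in_ch),
        )
        self.unfold = nn.Unfold(k, padding=k // 2)
        # Simplification: 1x1 projection instead of rearrange + stride-k conv.
        self.project = nn.Conv2d(in_ch * k * k, out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        kk = self.k * self.k
        weights = torch.softmax(
            self.attn(x).view(b, c, kk, h, w), dim=2)   # per-position weights
        fields = self.unfold(x).view(b, c, kk, h, w)    # unfolded fields
        weighted = (weights * fields).reshape(b, c * kk, h, w)
        return self.project(weighted)
```

Dropped into a CBS block in place of a plain `nn.Conv2d`, a module of this shape keeps the interface identical while letting the network emphasize informative positions within each receptive field, which is the property the paper relies on for noisy sonar imagery.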
2.4. Spatial and Channel Synergistic Attention (SCSA) Module
1. Spatial and Channel Decomposition: We decompose the given input $X \in \mathbb{R}^{B \times C \times H \times W}$ along the height and width dimensions. Global average pooling is then applied along each dimension, yielding two unidirectional 1D sequences, $X_H \in \mathbb{R}^{B \times C \times H}$ and $X_W \in \mathbb{R}^{B \times C \times W}$. To capture different spatial distributions and contextual relationships, we divide each sequence into $K$ equally sized, mutually independent sub-features $X_H^i$ and $X_W^i$, each with a channel count of $C/K$; in this paper, the default is $K = 4$. The decomposition into sub-features is: $X_H^i = X_H[:, (i-1)C/K : iC/K, :]$ and $X_W^i = X_W[:, (i-1)C/K : iC/K, :]$, for $i = 1, \dots, K$.
2. Efficient Convolutional Approach: Apply depthwise-separable one-dimensional convolutions with kernel sizes of 3, 5, 7, and 9 across the four sub-features to detect different semantic spatial patterns, sharing these lightweight convolutions between the two branches to counteract the limited receptive field that results from splitting the features into the H and W dimensions and using 1D convolutions. The extraction of the diverse semantic spatial information is defined as $\tilde{X}_H^i = \mathrm{DWConv1d}_{k_i}(X_H^i)$ and $\tilde{X}_W^i = \mathrm{DWConv1d}_{k_i}(X_W^i)$, where the superscript $i$ denotes the $i$-th sub-feature and $k_i \in \{3, 5, 7, 9\}$ is its kernel size.
3. Computing the Spatial Attention Map: Aggregate the different semantic sub-features, normalize them using group normalization (GN) with $K$ groups, and then generate the spatial attention through the Sigmoid activation function. The output feature is calculated as $\mathrm{Attn}_H = \sigma\big(\mathrm{GN}_K(\mathrm{Concat}(\tilde{X}_H^1, \dots, \tilde{X}_H^K))\big)$ and $\mathrm{Attn}_W = \sigma\big(\mathrm{GN}_K(\mathrm{Concat}(\tilde{X}_W^1, \dots, \tilde{X}_W^K))\big)$, giving $X_{\mathrm{out}} = X \times \mathrm{Attn}_H \times \mathrm{Attn}_W$, where $\sigma$ denotes the Sigmoid function.
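The three steps above can be summarized in a compact sketch. The following minimal PyTorch implementation of the SMSA branch is written from the description in this section, not from the authors’ code; the class name `SMSASketch` and internal variable names are assumptions.

```python
import torch
import torch.nn as nn

class SMSASketch(nn.Module):
    """Minimal sketch of Shared Multi-Semantic Spatial Attention (SMSA).

    Step 1: pool along W and H to get 1D sequences; split channels into
            K = 4 sub-features.
    Step 2: depthwise 1D convolutions with kernel sizes 3/5/7/9, shared
            between the H and W branches.
    Step 3: group normalization with K groups, Sigmoid, and reweighting
            of the input feature map.
    """

    def __init__(self, channels: int, kernel_sizes=(3, 5, 7, 9)):
        super().__init__()
        assert channels % len(kernel_sizes) == 0
        self.k = len(kernel_sizes)                 # K, default 4
        sub = channels // self.k                   # C/K channels per sub-feature
        self.convs = nn.ModuleList([
            nn.Conv1d(sub, sub, ks, padding=ks // 2, groups=sub, bias=False)
            for ks in kernel_sizes                 # shared by both branches
        ])
        self.gn = nn.GroupNorm(self.k, channels)   # GN with K groups
        self.sigmoid = nn.Sigmoid()

    def _branch_attn(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (B, C, L), a sequence pooled along one spatial dimension.
        subs = torch.chunk(seq, self.k, dim=1)     # K equal sub-features
        mixed = torch.cat([conv(s) for conv, s in zip(self.convs, subs)], dim=1)
        return self.sigmoid(self.gn(mixed))        # (B, C, L) attention

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        attn_h = self._branch_attn(x.mean(dim=3)).view(b, c, h, 1)
        attn_w = self._branch_attn(x.mean(dim=2)).view(b, c, 1, w)
        return x * attn_h * attn_w                 # X_out = X * Attn_H * Attn_W
```

Note that the full SCSA module pairs this spatial branch with the progressive channel self-attention (PCSA) branch; only the SMSA half is sketched here.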
2.5. MFF-YOLOv7
3. Experimental Design and Analysis
3.1. Experimental Environment
3.2. Experimental Indicators
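For reference, the indicators reported in the tables below follow the standard definitions (stated here in conventional notation, not quoted from the paper’s own equations):

```latex
% TP, FP, FN: true positives, false positives, false negatives, with a
% detection counted as TP when its IoU with the ground truth exceeds
% the chosen threshold.
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}

% AP is the area under the precision-recall curve for one class;
% mAP@0.5 averages AP over classes at an IoU threshold of 0.5, and
% mAP@0.5:0.95 additionally averages over thresholds 0.5, 0.55, ..., 0.95.
\mathrm{AP} = \int_0^1 P(R)\,\mathrm{d}R, \qquad
\mathrm{mAP} = \frac{1}{N_{\mathrm{cls}}} \sum_{i=1}^{N_{\mathrm{cls}}} \mathrm{AP}_i
```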
3.3. Experimental Results and Analysis of the URPC Dataset
3.3.1. Attention Mechanism Comparison Experiments
3.3.2. Ablation Experiment
3.3.3. Comparison Experiments with Other Algorithms
3.4. Experimental Results and Analysis of the SCTD Dataset
Model Generalizability Experiment
3.5. Generalization Experiment
3.5.1. UATD Dataset
3.5.2. Experimental Results and Analysis
4. Limitations of the Proposed Method
4.1. Data Requirements and Adaptability Challenges
4.2. Noise Processing and Computational Efficiency
4.3. Model Complexity and Generalization Ability
4.4. Dataset-Specific Performance and Improvement Directions
5. Conclusions
- We have introduced a series of new modules to enhance the model’s performance. The Multi-Scale Information Fusion Module (MIFM) replaces the SPPCSPC in YOLOv7 and integrates multi-scale information more effectively, strengthening the model’s ability to handle features of different scales. This addresses the difficulty traditional modules have with the large size differences among underwater targets, significantly improving the accuracy of sonar image target recognition and reducing missed and false detections.
- RFAConv has been introduced to replace the Conv in the CBS blocks of ELAN. It offers stronger feature extraction and is better suited to the high-noise, low-clarity characteristics of sonar images. As a result, it significantly enhances the model’s ability to learn and represent sonar image features, enabling it to extract useful target features from noise more effectively.
- Moreover, the SCSA mechanism has been introduced at three connection positions between the backbone network and the head. It helps the model concentrate on important feature information and reduces interference from irrelevant information, further improving recognition accuracy and robustness and allowing the model to focus on target features more precisely in complex underwater environments.
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
MFF-YOLOv7 | Multi-Gradient Feature Fusion YOLOv7 model
MIFM | Multi-Scale Information Fusion Module |
SPPCSPC | Spatial Pyramid Pooling Cross Stage Partial Connections
CBS | Convolution-Batch Normalization-SiLU activation function |
ELAN | Efficient Layer Aggregation Network |
RFAConv | Receptive-Field Attention Convolution
SCSA | Spatial and Channel Synergistic Attention |
UATD | Underwater Acoustic Target Detection Dataset |
SCTD | Smaller Common Sonar Target Detection Dataset |
URPC | The Underwater Optical Target Detection Intelligent Algorithm Competition 2021 Dataset |
GCC-Net | Gated Cross-domain Collaborative Network
YOLO | You Only Look Once |
Conv | Convolution |
BN | Batch Normalization |
SiLU | Sigmoid Linear Unit |
RFA | Receptive Field Attention |
SMSA | Shared Multi-Semantic Spatial Attention |
PCSA | Progressive Channel Self-Attention |
IOU | Intersection over Union
CBAM | Convolutional Block Attention Module |
ECA | Efficient Channel Attention |
SE | Squeeze-and-Excitation |
SimAM | A Simple, Parameter-Free Attention Module for Convolutional Neural Networks |
BiFormer | Vision Transformer with Bi-Level Routing Attention
Faster R-CNN | Faster Region-based Convolutional Neural Networks |
References
- Ahmad-Kamil, E.; Zakaria, S.Z.S.; Othman, M.; Chen, F.L.; Deraman, M.Y. Enabling marine conservation through education: Insights from the Malaysian Nature Society. J. Clean. Prod. 2024, 435, 140554. [Google Scholar] [CrossRef]
- Khoo, L.S.; Hasmi, A.H.; Mahmood, M.S.; Vanezis, P. Underwater DVI: Simple fingerprint technique for positive identification. Forensic Sci. Int. 2016, 266, e4–e9. [Google Scholar] [CrossRef] [PubMed]
- Fan, X.; Lu, L.; Shi, P.; Zhang, X. A novel sonar target detection and classification algorithm. Multimed. Tools Appl. 2022, 81, 10091–10106. [Google Scholar] [CrossRef]
- Yin, Z.; Zhang, S.; Sun, R.; Ding, Y.; Guo, Y. Sonar image target detection based on deep learning. In Proceedings of the 2023 International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE), Ballar, India, 29–30 April 2023. [Google Scholar]
- Wang, X.; Yuen, K.F.; Wong, Y.D.; Li, K.X. How can the maritime industry meet Sustainable Development Goals? An analysis of sustainability reports from the social entrepreneurship perspective. Transp. Res. Part D Transp. Environ. 2020, 78, 102173. [Google Scholar] [CrossRef]
- Vijaya Kumar, D.T.T.; Mahammad Shafi, R. A fast feature selection technique for real-time face detection using hybrid optimized region based convolutional neural network. Multimed. Tools Appl. 2022, 82, 1–14. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Huang, C.; Zhao, J.; Zhang, H.; Yu, Y. Seg2Sonar: A Full-Class Sample Synthesis Method Applied to Underwater Sonar Image Target Detection, Recognition, and Segmentation Tasks. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5909319. [Google Scholar] [CrossRef]
- Zhou, T.; Si, J.; Wang, L.; Xu, C.; Yu, X. Automatic detection of underwater small targets using forward-looking sonar images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4207912. [Google Scholar] [CrossRef]
- Xi, J.; Ye, X.; Li, C. Sonar image target detection based on style transfer learning and random shape of noise under zero shot target. Remote Sens. 2022, 14, 6260. [Google Scholar] [CrossRef]
- Villon, S.; Mouillot, D.; Chaumont, M.; Darling, E.S.; Subsol, G.; Claverie, T.; Villéger, S. A deep learning method for accurate and fast identification of coral reef fishes in underwater images. Ecol. Inform. 2018, 48, 238–244. [Google Scholar] [CrossRef]
- Guo, X.; Zhao, X.; Liu, Y.; Li, D. Underwater sea cucumber identification via deep residual networks. Inf. Process. Agric. 2019, 6, 307–315. [Google Scholar] [CrossRef]
- Dai, L.; Liu, H.; Song, P.; Liu, M. A gated cross-domain collaborative network for underwater object detection. Pattern Recognit. 2024, 149, 110222. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
- Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
- Al Muksit, A.; Hasan, F.; Emon, M.F.H.B.; Haque, M.R.; Anwary, A.R.; Shatabda, S. YOLO-Fish: A robust fish detection model to detect fish in realistic underwater environment. Ecol. Inform. 2022, 72, 101847. [Google Scholar] [CrossRef]
- Liu, Z.; Wang, B.; Li, Y.; He, J.; Li, Y. UnitModule: A light-weight joint image enhancement module for underwater object detection. Pattern Recognit. 2024, 151, 110435. [Google Scholar] [CrossRef]
- Lei, F.; Tang, F.; Li, S. Underwater target detection algorithm based on improved YOLOv5. J. Mar. Sci. Eng. 2022, 10, 310. [Google Scholar] [CrossRef]
- Zhou, Y.; Chen, S.; Wu, K.; Ning, M.; Chen, H.; Zhang, P. SCTD1.0: Common Sonar Target Detection Dataset. Ship Sci. Technol. 2021, 43, 54–58. [Google Scholar]
- Dong, J.; Yang, M.; Xie, Z.; Cai, L. Overview of Underwater Image Object Detection Dataset and Detection Algorithms. J. Ocean. Technol. 2022, 41, 60–72. [Google Scholar]
- Xie, K.; Yang, J.; Qiu, K. A dataset with multibeam forward-looking sonar for underwater object detection. Sci. Data 2022, 9, 739. [Google Scholar] [CrossRef] [PubMed]
- Woo, S.; Park, J.; Lee, J.; Kweon, I. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521. [Google Scholar]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. arXiv 2020, arXiv:1910.03151v4. [Google Scholar]
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. arXiv 2019, arXiv:1709.01507v4. [Google Scholar]
- Yang, L.; Zhang, R.; Li, L.; Xie, X. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the 38th International Conference on Machine Learning, Online, 18–24 July 2021; pp. 11863–11874. [Google Scholar]
- Zhu, L.; Wang, X.; Ke, Z.; Zhang, W.; Lau, R. BiFormer: Vision Transformer with Bi-Level Routing Attention. arXiv 2023, arXiv:2303.08810v1. [Google Scholar]
- Si, Y.; Xu, H.; Zhu, X.; Zhang, W.; Dong, Y.; Chen, Y.; Li, H. SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention. arXiv 2024, arXiv:2407.05128v1. [Google Scholar]
- Wu, W.; Luo, X. Sonar Object Detection Based on Global Context Feature Fusion and Extraction. In Proceedings of the 2024 12th International Conference on Intelligent Control and Information Processing (ICICIP), Nanjing, China, 8–10 March 2024; pp. 195–202. [Google Scholar]
- Mehmood, S.; Irfan Muhammad, H.U.H.; Ali, S. Underwater Object Detection from Sonar Images Using Transfer Learning. In Proceedings of the 2024 21st International Bhurban Conference on Applied Sciences Technology (IBCAST), Murree, Pakistan, 20–23 August 2024; pp. 1–2. [Google Scholar]
- Xue, G.; Zhang, J.; Wang, K.; Ma, D.; Weichen, P.; Hu, S.; Yang, Z.; Liu, T. Application of YOLOv7-tiny in the detection of steel surface defects. In Proceedings of the 3rd International Conference on Computer, Artificial Intelligence and Control Engineering, Xi’an, China, 26–28 January 2024; pp. 718–723. [Google Scholar] [CrossRef]
- Jocher, G. YOLOv8. Available online: https://github.com/ultralytics/ultralytics/tree/main (accessed on 16 November 2024).
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
- Jocher, G. YOLOv5 Release v7.0. Available online: https://github.com/ultralytics/yolov5/tree/v7.0 (accessed on 16 November 2024).
- Wang, Z.; Guo, J.; Zeng, L.; Zhang, C.; Wang, B. MLFFNet: Multilevel feature fusion network for object detection in sonar images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5119119. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Hou, J. Underwater Detection using Forward-Looking Sonar Images based on Deformable Convolution YOLOv3. In Proceedings of the 2024 4th International Conference on Neural Networks, Information and Communication (NNICE), Guangzhou, China, 19–21 January 2024; pp. 490–493. [Google Scholar]
- Pebrianto, W.; Mudjirahardjo, P.; Pramono, S.H.; Rahmadwati; Setyawan, R.A. YOLOv3 with Spatial Pyramid Pooling for Object Detection with Unmanned Aerial Vehicles. arXiv 2023, arXiv:2305.12344. [Google Scholar]
| Method | APechinus | APstarfish | APholothurian | APscallop | mAP@0.5 | mAP@0.5:0.95 | Precision (%) | Recall (%) |
|---|---|---|---|---|---|---|---|---|
| YOLOv7 [20] | 85.0% | 89.6% | 78.5% | 35.4% | 72.1% | 42.6% | 82.1 | 66.9 |
| YOLOv7-CBAM [28] | 85.3% | 89.8% | 79.3% | 34.5% | 72.2% | 42.6% | 82.0 | 67.1 |
| YOLOv7-ECA [29] | 85.0% | 89.5% | 79.5% | 36.2% | 72.6% | 42.8% | 82.1 | 68.1 |
| YOLOv7-SE [30] | 85.5% | 89.4% | 78.6% | 35.7% | 72.3% | 42.7% | 83.0 | 66.4 |
| YOLOv7-SimAM [31] | 85.2% | 89.9% | 79.2% | 36.2% | 72.7% | 42.8% | 82.0 | 67.1 |
| YOLOv7-Biformer [32] | 85.3% | 90.1% | 79.8% | 36.3% | 72.9% | 42.9% | 81.4 | 66.5 |
| YOLOv7-SCSA [33] | 86.0% | 90.8% | 84.7% | 37.7% | 74.8% | 43.5% | 84.9 | 68.9 |
| Model | RFAConv | SCSA | MIFM | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|---|---|
| YOLOv7 | | | | 82.1% | 66.9% | 72.1% | 42.6% |
| YOLOv7 | √ | | | 85.1% | 67.2% | 73.1% | 42.7% |
| YOLOv7 | √ | √ | | 88.2% | 69.3% | 75.9% | 44.3% |
| YOLOv7 | √ | √ | √ | 89.9% | 73.0% | 79.1% | 45.0% |
| Model | Precision (%) | Recall (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
|---|---|---|---|---|
| YOLOv5s [34] | 80.4 | 65.5 | 68.9 | 38.1 |
| YOLOv5m [35] | 82.6 | 65.2 | 69.5 | 41.2 |
| YOLOv7 [20] | 82.1 | 66.9 | 72.1 | 42.6 |
| YOLOv7-Tiny [36] | 82.7 | 63.6 | 69.6 | 36.9 |
| YOLOv7-SDBB [20] | 82.0 | 66.8 | 72.4 | 43.4 |
| YOLOv8n [37] | 80.1 | 64.8 | 68.6 | 38.6 |
| YOLOv9 [21] | 82.1 | 64.8 | 71.0 | 42.0 |
| MFF-YOLOv7 | 89.9 | 73.0 | 79.1 | 45.0 |
| Method | APship | APplane | APhuman | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|
| SSD [38] | 86.2% | 86.8% | 86.1% | 86.4% | 43.0% |
| Faster R-CNN [17] | 88.2% | 86.8% | 87.3% | 87.5% | 43.8% |
| YOLOv3 [17] | 87.3% | 89.0% | 86.1% | 87.5% | 47.8% |
| YOLOv4 [18] | 89.2% | 87.9% | 87.3% | 88.2% | 49.6% |
| YOLOv5 [39] | 90.2% | 90.1% | 89.9% | 90.1% | 56.6% |
| YOLOv7 [20] | 89.2% | 89.0% | 89.9% | 89.3% | 54.0% |
| YOLOv8 [37] | 89.2% | 90.1% | 89.9% | 89.7% | 54.7% |
| MFF-YOLOv7 | 96.3% | 99.9% | 99.9% | 98.7% | 63.2% |
| Model | Precision (%) | Recall (%) | mAP@0.5 (%) |
|---|---|---|---|
| RetinaNet [40] | 63.2 | 62.4 | 62.5 |
| Faster R-CNN [17] | 74.3 | 75.3 | 75.1 |
| YOLOv3 [17] | 85.4 | 82.1 | 79.1 |
| SDD Net [41] | 81.3 | 79.7 | 80.2 |
| YOLO-DCN [42] | 86.2 | 83.4 | 80.5 |
| YOLOv3SPP [43] | 91.1 | 93.0 | 92.2 |
| YOLOv8 [37] | 85.4 | 81.0 | 83.3 |
| MFF-YOLOv7 | 91.2 | 88.9 | 87.2 |