Research on Rapid Detection of Underwater Targets Based on Global Differential Model Compression
Abstract
1. Introduction
- A lightweight model, YOLO-TN, is designed for underwater target recognition. Experimental results show that, after extreme compression, YOLO-TN achieves an mAP@0.5 of 0.5425 at an input size of 416 × 416, a drop of less than 1% relative to YOLO-V5s, while reaching 28.8 FPS for inference on the CPU. The model therefore combines high accuracy with a lightweight design, making offline deployment and real-time inference feasible.
- A realistic underwater dataset is constructed and processed to mitigate the imbalanced target counts and homogeneous imaging conditions of existing underwater datasets. We also analyze the challenges inherent in underwater imagery, including poor illumination, image degradation, and blurring. For different real underwater environments, this paper applies preprocessing techniques including dark channel dehazing, underwater image color restoration, and an automatic color balance algorithm (a minimal sketch of the dark channel step follows this list). These methods improve the quality of the underwater datasets and, in turn, the model's generalization.
- The YOLO-TN model is deployed on the Jetson TX2 embedded platform with the MNN inference engine, establishing a real-time offline underwater target recognition system (an illustrative deployment sketch appears after the pruning results table). Test results indicate that YOLO-TN sustains real-time frame rates: the pruned YOLO-TN reaches 28.6 and 20.4 FPS at input sizes of 320 × 320 and 416 × 416, respectively.
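To make the preprocessing concrete, the sketch below implements dark channel prior dehazing in the spirit of He et al.; it is a minimal illustration rather than the authors' exact pipeline (which also includes color restoration and automatic color balance), and the patch size, `omega`, and `t0` defaults are assumed values.

```python
import cv2
import numpy as np

def dark_channel(img, patch=15):
    """Per-pixel minimum over RGB, then a min-filter over a local patch."""
    min_rgb = img.min(axis=2)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(min_rgb, kernel)

def estimate_airlight(img, dark, top=0.001):
    """Average the brightest 0.1% of dark-channel pixels as the airlight A."""
    n = max(1, int(dark.size * top))
    idx = np.argsort(dark.ravel())[-n:]
    return img.reshape(-1, 3)[idx].mean(axis=0)

def dehaze(img_bgr, omega=0.95, t0=0.1, patch=15):
    """Dark-channel-prior dehazing; img_bgr is a uint8 BGR image."""
    img = img_bgr.astype(np.float64) / 255.0
    dark = dark_channel(img, patch)
    A = estimate_airlight(img, dark)
    # Transmission map: t(x) = 1 - omega * dark_channel(I / A).
    # omega < 1 keeps a trace of haze so results look natural.
    t = 1.0 - omega * dark_channel(img / A, patch)
    t = np.clip(t, t0, 1.0)[..., None]
    # Recovered scene radiance: J = (I - A) / t + A.
    J = (img - A) / t + A
    return (np.clip(J, 0, 1) * 255).astype(np.uint8)

# Usage: result = dehaze(cv2.imread("underwater.jpg"))
```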
2. Model Construction
2.1. Loss Function
2.2. Construction of the Student Model
2.3. Parameter Pruning
2.4. Evaluation Criteria
3. Model Validation
Hardware environment:
- CPU: Intel(R) Xeon(R) Silver 4110;
- Memory: 36 GB DDR4;
- GPU: NVIDIA RTX 2080 Ti.

Software environment:
- Ubuntu 18.04;
- PyTorch 1.9.0;
- CUDA 11.2.
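A quick sanity check of this software stack can be run before training; the snippet below is illustrative and not part of the paper's code:

```python
import torch

# Confirm the stack matches the environment listed above.
print("PyTorch:", torch.__version__)                    # expect 1.9.0
print("CUDA (compiled against):", torch.version.cuda)   # expect 11.x
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))        # expect an RTX 2080 Ti
```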
4. Engineering Experiment
4.1. Evaluation Criteria
4.2. Data Preprocessing
4.3. Model Application
5. Conclusions and Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Xu, S.; Zhang, M.; Song, W.; Mei, H.; He, Q.; Liotta, A. A systematic review and analysis of deep learning-based underwater object detection. Neurocomputing 2023, 527, 204–232.
- Zhang, R.; Li, S.; Ji, G.; Zhao, X.; Li, J.; Pan, M. Survey on Deep Learning-Based Marine Object Detection. J. Adv. Transp. 2021, 2021, 5808206.
- Dakhil, R.A.; Khayeat, A.R.H. Review on deep learning techniques for marine object recognition: Architectures and algorithms. In Proceedings of the CS & IT-CSCP 2022, Vancouver, BC, Canada, 26–27 February 2022; pp. 49–63.
- Myers, V.; Fawcett, J. A template matching procedure for automatic target recognition in synthetic aperture sonar imagery. IEEE Signal Process. Lett. 2010, 17, 683–686.
- Barngrover, C.M. Automated Detection of Mine-like Objects in Side Scan Sonar Imagery; University of California: San Diego, CA, USA, 2014.
- Abu, A.; Diamant, R. A statistically-based method for the detection of underwater objects in sonar imagery. IEEE Sensors J. 2019, 19, 6858–6871.
- Kim, B.; Yu, S. Imaging sonar based real-time underwater object detection utilizing AdaBoost method. In Proceedings of the 2017 IEEE Underwater Technology (UT), Busan, Republic of Korea, 21–24 February 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–5.
- Chiang, J.; Chen, Y. Underwater image enhancement by wavelength compensation and dehazing. IEEE Trans. Image Process. 2012, 21, 1756–1769.
- Akkaynak, D.; Treibitz, T. A revised underwater image formation model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Lu, H.; Li, Y.; Zhang, Y.; Chen, M.; Kim, S.S. Underwater Optical Image Processing: A Comprehensive Review. Mobile Netw. Appl. 2017, 22, 1204–1211.
- Anwar, S.; Li, C. Diving deeper into underwater image enhancement: A survey. Signal Process. Image Commun. 2020, 89, 115978.
- Liu, C.; Wang, Z.; Wang, S.; Tang, T.; Tao, Y.; Yang, C.; Li, H.; Liu, X.; Fan, X. A New Dataset, Poisson GAN and AquaNet for Underwater Object Grabbing. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 5.
- Yang, J.; Han, P.; Li, X. Equilibrating the impact of fluid scattering attenuation on underwater optical imaging via adaptive parameter learning. Opt. Express 2024, 32, 23333–23346.
- Beijbom, O.; Edmunds, P.J.; Kline, D.I.; Mitchell, B.G.; Kriegman, D. Automated annotation of coral reef survey images. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 1170–1177.
- Palazzo, S.; Kavasidis, I.; Spampinato, C. Covariance based modeling of underwater scenes for fish detection. In Proceedings of the 20th IEEE International Conference on Image Processing, Melbourne, Australia, 15–18 September 2013; pp. 1481–1485.
- Ravanbakhsh, M.; Shortis, M.R.; Shafait, F.; Mian, A.; Harvey, E.S.; Seager, J.W. Automated fish detection in underwater images using shape-based level sets. Photogramm. Rec. 2015, 30, 46–62.
- Hou, G.-J.; Luan, X.; Song, D.-L.; Ma, X.-Y. Underwater man-made object recognition on the basis of color and shape features. J. Coast. Res. 2015, 32, 1135–1141.
- Vasamsetti, S.; Setia, S.; Mittal, N.; Sardana, H.K.; Babbar, G. Automatic underwater moving object detection using multi-feature integration framework in complex backgrounds. IET Comput. Vis. 2018, 12, 770–778.
- Wang, Q.; Zeng, X. Deep learning methods and their applications in underwater targets recognition. In Proceedings of the 2015 Academic Conference of the Hydroacoustics Branch of the Acoustical Society of China, Hydroacoustics Branch of the Acoustical Society of China, Harrogate, UK, 15 October 2015; p. 3. Available online: https://kns.cnki.net/kcms2/article/abstract?v=zcLOVLBHd2yuc0K9K0lIzqLOnyKffA5JXrD7S_1b3A_AZXUYyZdd4zqOJi6uoXZuBegPu97bvG__mRmWiZ1qiES5LkrfFdAaLnkYK8_GA9f1_xAZ0NOvmf3X2L4wqsnvfrs4_PiwGj1e4kfoQ9LpLw==&uniplatform=NZKPT&language=CHS (accessed on 20 December 2023).
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1440–1448.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
- Beery, S.; Wu, G.; Rathod, V. Context R-CNN: Long term temporal context for per-camera object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 13075–13085.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 779–788.
- Terven, J.; Córdova-Esparza, D.M. A Comprehensive Review of YOLO: From YOLOv1 to YOLOv8 and Beyond. arXiv 2023.
- Liu, W.; Anguelov, D.; Erhan, D. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Huynh-Thu, Q.; Ghanbari, M. Perceived Quality of the Variation of the Video Temporal Resolution for Low Bit Rate Coding. Available online: https://www.researchgate.net/publication/266575823/_Perceived_quality_of_the_variation_of_the_video_temporal_resolution_for_low_bit_rate_coding (accessed on 20 December 2023).
- Han, S.; Pool, J.; Tran, J.; Dally, W.J. Learning both weights and connections for efficient neural networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS'15, Montreal, QC, Canada, 7–12 December 2015; MIT Press: Cambridge, MA, USA, 2015; Volume 1, pp. 1135–1143.
- Wen, W.; Wu, C.; Wang, Y.; Chen, Y.; Li, H. Learning structured sparsity in deep neural networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS'16, Barcelona, Spain, 5–10 December 2016; Curran Associates Inc.: Red Hook, NY, USA, 2016; pp. 2082–2090.
- Lin, M.; Ji, R.; Wang, Y.; Zhang, Y. HRank: Filter Pruning Using High-Rank Feature Map. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1526–1535.
- Gao, S.; Huang, F.; Cai, W.; Huang, H. Network Pruning via Performance Maximization. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 9266–9276.
- Gholami, A.; Kim, S.; Dong, Z.; Yao, Z.; Mahoney, M.W.; Keutzer, K. A Survey of Quantization Methods for Efficient Neural Network Inference. arXiv 2021.
- Faraone, J.; Fraser, N.; Blott, M.; Leong, H.W. SYQ: Learning Symmetric Quantization for Efficient Deep Neural Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 4300–4309.
- Courbariaux, M.; Hubara, I.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. arXiv 2016.
- Chen, P.; Liu, J.; Zhuang, B.; Tan, M.; Shen, C. AQD: Towards Accurate Quantized Object Detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 21–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 104–113.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 6848–6856.
- Wang, X.; Kan, M.; Shan, S.; Chen, X. Fully Learnable Group Convolution for Acceleration of Deep Neural Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 9041–9050.
- Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge Distillation: A Survey. Int. J. Comput. Vis. 2021, 129, 1789–1819.
- Buciluǎ, C.; Caruana, R.; Niculescu-Mizil, A. Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 20–23 August 2006; ACM: New York, NY, USA, 2006; pp. 535–541.
- Zagoruyko, S.; Komodakis, N. Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer. arXiv 2017.
- Heo, B.; Lee, M.; Yun, S.; Choi, J.Y. Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons. AAAI 2019, 33, 3779–3787.
- Peng, B.; Jin, X.; Liu, J.; Zhou, S.; Wu, Y.; Liu, Y.; Li, D.; Zhang, Z. Correlation Congruence for Knowledge Distillation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 5006–5015.
- Cho, J.H.; Hariharan, B. On the Efficacy of Knowledge Distillation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 4793–4801.
- Mirzadeh, S.I.; Farajtabar, M.; Li, A.; Levine, N.; Matsukawa, A.; Ghasemzadeh, H. Improved Knowledge Distillation via Teacher Assistant. AAAI 2020, 34, 5191–5198.
- Liu, Y.; Jia, X.; Tan, M.; Vemulapalli, R.; Zhu, Y.; Green, B.; Wang, X. Search to Distill: Pearls Are Everywhere but Not the Eyes. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 7536–7545.
- Shen, S.H.; Li, Y.L.; Qiang, Y.K.; Xue, R.L.; Jun, W.L. Research on Compression of Teacher Guidance Network Use Global Differential Computing Neural Architecture Search. In Proceedings of the 2022 5th International Conference on Artificial Intelligence and Big Data (ICAIBD), Chengdu, China, 27–30 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 526–531.
- Liu, H.; Simonyan, K.; Yang, Y. DARTS: Differentiable Architecture Search. arXiv 2019.
- He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353.
- Berman, D.; Treibitz, T.; Avidan, S. Diving into haze-lines: Color restoration of underwater images. In Proceedings of the British Machine Vision Conference (BMVC), London, UK, 4–7 September 2017; Volume 1.
- Getreuer, P. Automatic color enhancement (ACE) and its fast implementation. Image Process. Line 2012, 2, 266–277.
| Model | Cell Number | Initial Channels | mAP@0.5 (Undistilled/Distilled) | Parameters (M) | FPS (GPU/CPU) | FLOPs (G) |
|---|---|---|---|---|---|---|
| YOLO-TN(a) | 10 | 8 | 0.5038/0.5205 | 2.8704 | 112.7/9.8 | 7.8 |
| YOLO-TN(b) | 10 | 16 | 0.5312/0.5441 | 3.0516 | 109.4/7.7 | 9.1 |
| YOLO-TN(c) | 7 | 16 | 0.5326/0.5437 | 1.2083 | 134.5/8.9 | 3.9 |
| YOLO-TN(d) | 5 | 16 | 0.5355/0.5471 | 0.9481 | 162.7/10.2 | 3.5 |
| YOLO-TN(e) | 4 | 16 | 0.5384/0.5592 | 0.8401 | 176.3/14.1 | 3.3 |
| YOLO-V5s | – | – | 0.5495/– | 7.2 | 178.9/8.3 | 16.5 |
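The "Distilled" figures above come from teacher-student training. The paper's exact teacher-guided objective is not reproduced in this excerpt, so the sketch below shows the standard soft-target distillation loss of Hinton et al. as a generic reference point; the temperature `T` and mixing weight `alpha` are assumed hyperparameters, and distilling a detection head in practice involves additional terms not shown here.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Generic soft-target distillation loss (Hinton et al.), for illustration."""
    # Soften both distributions with temperature T; the T^2 factor keeps
    # gradient magnitudes comparable to the hard-label term.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, targets)  # ordinary supervised term
    return alpha * soft + (1.0 - alpha) * hard
```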
| Model | Input Size | mAP@0.5 | FPS (GPU/CPU) | FLOPs (G) |
|---|---|---|---|---|
| YOLO-TN-640 | 640 × 640 | 0.5592 | 176.8/17.2 | 3.3 |
| YOLO-TN-416 | 416 × 416 | 0.5425 | 176.9/28.8 | 2.8 |
| YOLO-TN-320 | 320 × 320 | 0.5101 | 177.6/38.4 | 2.8 |
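The FPS columns above vary with input size. As an illustration of how such throughput figures are typically measured (the paper's exact protocol is not specified in this excerpt, and end-to-end FPS would also include pre- and post-processing), the following PyTorch sketch times averaged forward passes after a warm-up; `model` is a placeholder for a loaded network:

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, size, warmup=10, iters=100, device="cpu"):
    """Average single-image inference FPS at a given square input size."""
    model.eval().to(device)
    x = torch.randn(1, 3, size, size, device=device)
    for _ in range(warmup):   # warm-up passes are excluded from timing
        model(x)
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    return iters / (time.perf_counter() - start)

# e.g. for s in (320, 416, 640): print(s, measure_fps(model, s))
```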
| Model | Input Size | FPS (CPU) |
|---|---|---|
| Pruned YOLO-TN-640 | 640 × 640 | 10.8 |
| Pruned YOLO-TN-416 | 416 × 416 | 20.4 |
| Pruned YOLO-TN-320 | 320 × 320 | 28.6 |
| Unpruned YOLO-TN | 640 × 640 | 8.9 |
| YOLO-V5s | 640 × 640 | 2.4 |
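For the Jetson TX2 system, the network is executed through the MNN inference engine. The sketch below runs a single forward pass using MNN's legacy Python session API; the model filename and input size are placeholders, and a deployed system would add camera capture, preprocessing, and decoding of the detection outputs (typically via the C++ API on-device):

```python
import numpy as np
import MNN  # pip install MNN

# Load a converted model and create an inference session.
interpreter = MNN.Interpreter("yolo_tn_416.mnn")   # placeholder model path
session = interpreter.createSession()
input_tensor = interpreter.getSessionInput(session)

# Stand-in for a preprocessed camera frame (NCHW, float32, normalized).
frame = np.random.rand(1, 3, 416, 416).astype(np.float32)
tmp = MNN.Tensor((1, 3, 416, 416), MNN.Halide_Type_Float,
                 frame, MNN.Tensor_DimensionType_Caffe)
input_tensor.copyFrom(tmp)

interpreter.runSession(session)
output = interpreter.getSessionOutput(session)
print(len(output.getData()))  # raw detection head output; decoded to boxes downstream
```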