RT-Seg: A Real-Time Semantic Segmentation Network for Side-Scan Sonar Images
Abstract
1. Introduction
2. Related Work
2.1. SSS Images Segmentation
2.2. Semantic Segmentation
2.3. Real-Time Semantic Segmentation
3. Network Architecture
3.1. Encoder Architecture
3.2. Decoder Architecture
4. Implementation Details
4.1. Patch-Wise Strategy and Datasets
4.2. Details of Training
4.2.1. Loss Function
4.2.2. Optimization
4.3. Evaluation Metrics
5. Experimental Results and Analysis
5.1. Qualitative Results and Analysis
5.2. Quantitative Results and Analysis
5.3. Inference Time
5.4. Hardware Requirements
5.5. Real-Time Process and Analysis
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
Branch | Input | Operator | Output |
---|---|---|---|
 | | 1 × 1, conv, BN, ReLU | |
Left branch | | 3 × 3, DWconv, BN, ReLU | |
 | | 1 × 1, PWconv, BN, Linear | |
Right branch | | 5 × 5, DWconv, BN, ReLU | |
 | | 1 × 1, PWconv, BN, Linear | |
 | | Filter Concatenate | |
 | | Max-pooling | |
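
The operator list above can be read as a two-branch block built from depthwise (DWconv) and pointwise (PWconv) convolutions. The following is a minimal sketch of how the weight count of such a block could be estimated. It assumes that both branches consume the output of the shared 1 × 1 convolution, and it ignores biases and BN parameters. The function names and the channel counts `c_in`, `c_mid`, and `c_branch` are hypothetical and do not come from the table.

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_params(k, channels):
    """Weights of a k x k depthwise convolution: one k x k filter per channel."""
    return k * k * channels


def two_branch_block_params(c_in, c_mid, c_branch):
    """Estimate weights for the block: 1x1 conv, then 3x3 and 5x5 DW+PW branches.

    Assumption: both branches read the output of the shared 1x1 convolution,
    and their outputs are concatenated along the channel axis.
    """
    entry = conv_params(1, c_in, c_mid)
    left = depthwise_params(3, c_mid) + conv_params(1, c_mid, c_branch)
    right = depthwise_params(5, c_mid) + conv_params(1, c_mid, c_branch)
    return {
        "entry": entry,
        "left": left,
        "right": right,
        "total": entry + left + right,
        "out_channels": 2 * c_branch,  # filter concatenation
    }


# Hypothetical channel counts, chosen only for illustration.
block = two_branch_block_params(c_in=16, c_mid=32, c_branch=32)
print(block)
# For comparison: the same branches with standard (non-separable) convolutions.
print(conv_params(3, 32, 32), conv_params(5, 32, 32))
```

With these illustrative channel counts, each separable branch needs far fewer weights than a standard convolution with the same kernel size, which is the usual motivation for the DWconv plus PWconv pairing.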
Network | PA | MPA | MIOU | FWIOU |
---|---|---|---|---|
SegNet | 90.09 | 80.77 | 59.54 | 83.24 |
UNet | 92.79 | 83.11 | 67.70 | 85.77 |
ENet | 89.23 | 79.12 | 57.23 | 82.11 |
RT-Seg-t-4 | 86.53 | 76.24 | 53.46 | 79.36 |
RT-Seg-t-6 | 91.33 | 82.32 | 65.78 | 83.46 |
RT-Seg-t-8 | 91.42 | 82.13 | 66.32 | 86.38 |
Network | PA | MPA | MIOU | FWIOU |
---|---|---|---|---|
SegNet | 91.56 | 83.76 | 50.72 | 84.57 |
UNet | 93.88 | 85.62 | 66.62 | 88.12 |
ENet | 90.10 | 80.15 | 60.48 | 81.59 |
RT-Seg-t-4 | 85.64 | 85.93 | 56.82 | 85.15 |
RT-Seg-t-6 | 92.43 | 84.32 | 63.17 | 86.10 |
RT-Seg-t-8 | 92.67 | 85.13 | 63.45 | 86.97 |
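
The four columns in the tables above (PA, MPA, MIOU, FWIOU) are commonly computed from a class confusion matrix. The sketch below assumes the standard definitions used in the fully convolutional network literature (pixel accuracy, mean pixel accuracy, mean intersection over union, and frequency-weighted intersection over union); the definitions actually used in this article are not reproduced here, so treat this as an illustration only. The function name and the example matrix are hypothetical.

```python
def segmentation_metrics(conf):
    """Compute PA, MPA, MIoU and FWIoU from a confusion matrix.

    conf[i][j] is the number of pixels whose true class is i and whose
    predicted class is j. Classes with no ground-truth pixels are skipped
    when averaging (an assumption of this sketch).
    """
    k = len(conf)
    total = sum(sum(row) for row in conf)
    true_per_class = [sum(conf[i]) for i in range(k)]
    pred_per_class = [sum(conf[i][j] for i in range(k)) for j in range(k)]
    diag = [conf[i][i] for i in range(k)]

    present = [i for i in range(k) if true_per_class[i] > 0]
    acc = {i: diag[i] / true_per_class[i] for i in present}
    iou = {
        i: diag[i] / (true_per_class[i] + pred_per_class[i] - diag[i])
        for i in present
    }

    return {
        "PA": sum(diag) / total,
        "MPA": sum(acc.values()) / len(present),
        "MIOU": sum(iou.values()) / len(present),
        "FWIOU": sum(true_per_class[i] * iou[i] for i in present) / total,
    }


# Hypothetical two-class confusion matrix (values are fractions, not percent).
example = [[50, 10],
           [5, 35]]
print(segmentation_metrics(example))
```

The function returns fractions in the range 0 to 1; multiplying by 100 gives values on the same scale as the tables.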
Model (NVIDIA Jetson AGX Xavier) | 224 × 224, ms | 224 × 224, fps | 500 × 500, ms | 500 × 500, fps |
---|---|---|---|---|
SegNet | 72.9 | 13.717 | 316.2 | 3.162 |
UNet | 68.2 | 14.662 | 320.1 | 3.124 |
ENet | 54.4 | 18.382 | 80.5 | 12.422 |
RT-Seg-t-6 | 25.6 | 39.063 | 38.9 | 25.678 |
RT-Seg-t-8 | 60.4 | 16.547 | 87.4 | 11.435 |
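
The ms and fps columns above are related by a simple reciprocal: frames per second is 1000 divided by the per-image latency in milliseconds. The short sketch below shows that conversion, assuming one image is processed at a time; the function name is hypothetical, and the latencies are copied from the 224 × 224 column of the table.

```python
def fps_from_latency_ms(latency_ms):
    """Convert a per-image latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


# Latencies (ms) at 224 x 224, taken from the table above.
latencies_ms = {"SegNet": 72.9, "UNet": 68.2, "ENet": 54.4, "RT-Seg-t-6": 25.6}
for name, ms in latencies_ms.items():
    print(f"{name}: {fps_from_latency_ms(ms):.3f} fps")
```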
Network | GFLOPs | Parameters | Model Size (fp16) |
---|---|---|---|
SegNet | 286.0 | 29.5M | 58.9M |
UNet | 328.1 | 31.03M | 62.04M |
ENet | 3.83 | 0.35M | 0.7M |
RT-Seg-t-6 | 2.14 | 0.46M | 1.4M |
RT-Seg-t-8 | 4.96 | 1.4M | 2.5M |
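
As a rough cross-check of the model-size column, a half-precision (fp16) weight occupies 2 bytes, so a lower bound on the stored size is the parameter count times 2. The sketch below assumes that "M" in the size column means megabytes (10^6 bytes) and that only weights are stored; both are assumptions, and the function name is hypothetical. The estimate is close to the table for SegNet, UNet, and ENet, while the RT-Seg rows differ from it, so the reported sizes may have been measured differently.

```python
BYTES_PER_FP16 = 2  # half precision uses 16 bits per value


def fp16_size_mb(num_params):
    """Estimate the fp16 weight storage in megabytes (10^6 bytes)."""
    return num_params * BYTES_PER_FP16 / 1e6


# Parameter counts taken from the table above.
params = {"SegNet": 29.5e6, "UNet": 31.03e6, "ENet": 0.35e6}
for name, n in params.items():
    print(f"{name}: about {fp16_size_mb(n):.2f} MB")
```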
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Wang, Q.; Wu, M.; Yu, F.; Feng, C.; Li, K.; Zhu, Y.; Rigall, E.; He, B. RT-Seg: A Real-Time Semantic Segmentation Network for Side-Scan Sonar Images. Sensors 2019, 19, 1985. https://doi.org/10.3390/s19091985