Multiple Receptive Field Network (MRF-Net) for Autonomous Underwater Vehicle Fishing Net Detection Using Forward-Looking Sonar Images
Abstract
1. Introduction
2. Related Work
3. Method
3.1. Data Preprocessing
3.2. Network Architecture
3.2.1. Feature Extraction Network
- Dilated Convolution. Dilated convolution introduces a new parameter, the dilation rate, into standard convolution. The positions at which the kernel samples the input vary with the dilation rate, as shown in Figure 7. The purpose of dilated convolution is to replace max pooling, which causes information loss, and to provide a larger receptive field at an equivalent computational cost. However, dilated convolution makes the receptive field discontinuous, known as the gridding issue: when stacking dilated convolutions with the same dilation rate, many neighboring pixels are ignored and only a small subset contributes to the output. Moreover, as the dilation rate increases, local information, especially around the center point, is increasingly lost. Therefore, we combined dilated convolutions with different dilation rates to detect large and small targets correctly at the same time (see the sketch after this list). As shown in Figure 6a,b, one branch has a 3 × 3 convolution layer with a dilation rate of 2, and the other branch has two 3 × 3 convolution layers with dilation rates of 1 and 5, respectively. After the down-sampling factor reaches 16, we used the three-branch MRF block, with dilation rates of 2, 3, and 5, respectively, as shown in Figure 6c.
- IBN Layer. Following IBN-Net [37], we added the IBN layer to the MRF block; it combines instance normalization (IN) and batch normalization (BN) to improve both the learning ability and the generalization ability of the network. IN learns features that are invariant to appearance changes but discards useful content information, while BN is essential for preserving content-related information. It is known that the low-level features of a CNN reflect appearance differences, while the high-level features reflect semantic information. We exploited this property by introducing IN into the MRF block. As shown in Figure 6, we placed IN only in the lower layers, which filters out appearance-related information while retaining semantic information. Moreover, to retain the content information of the image in the lower layers, we normalized half of the feature channels with IN and the other half with BN. To preserve semantic information, we added IN only to the low layers whose downsampling factor is less than 16.
- Activation Function. A nonlinear function is usually used as the activation function to add nonlinearity to a CNN and enhance its feature-expression ability. ReLU is the most widely used nonlinear activation; compared with sigmoid and similar functions, it avoids the vanishing-gradient problem and is cheap to compute. However, ReLU has been shown to lose information when applied to low-dimensional features [34]. To avoid this loss, we replaced ReLU with a linear activation after the depthwise (DW) convolution layers, as shown in Figure 6.
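To make these three design choices concrete, the following is a minimal PyTorch sketch of a two-branch MRF block in the spirit of Figure 6a,b. The module names (`IBN`, `MRFBlock`), the channel handling, the residual connection, and the placement of BN after each DW convolution are illustrative assumptions; the authors' exact block configuration follows Figure 6.

```python
import torch
import torch.nn as nn


class IBN(nn.Module):
    """IBN layer: half of the channels go through InstanceNorm,
    the other half through BatchNorm (cf. IBN-Net [37])."""

    def __init__(self, channels):
        super().__init__()
        self.half = channels // 2
        self.IN = nn.InstanceNorm2d(self.half, affine=True)
        self.BN = nn.BatchNorm2d(channels - self.half)

    def forward(self, x):
        a, b = torch.split(x, [self.half, x.size(1) - self.half], dim=1)
        return torch.cat([self.IN(a), self.BN(b)], dim=1)


class MRFBlock(nn.Module):
    """Two-branch multiple-receptive-field block (cf. Figure 6a,b).

    Branch 1: one 3x3 depthwise conv with dilation 2.
    Branch 2: two 3x3 depthwise convs with dilations 1 and 5.
    DW convs are followed by a linear activation (no ReLU),
    per the information-loss argument of [34].
    """

    def __init__(self, channels, use_ibn=True):
        super().__init__()
        # padding = dilation keeps the spatial size of a 3x3 conv unchanged
        self.branch1 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2,
                      groups=channels, bias=False),
            nn.BatchNorm2d(channels),  # linear: no ReLU after the DW conv
        )
        self.branch2 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, dilation=1,
                      groups=channels, bias=False),
            nn.BatchNorm2d(channels),
            nn.Conv2d(channels, channels, 3, padding=5, dilation=5,
                      groups=channels, bias=False),
            nn.BatchNorm2d(channels),
        )
        # 1x1 pointwise conv fuses the concatenated branch outputs
        self.fuse = nn.Conv2d(2 * channels, channels, 1, bias=False)
        self.norm = IBN(channels) if use_ibn else nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = torch.cat([self.branch1(x), self.branch2(x)], dim=1)
        out = self.norm(self.fuse(out))
        return self.relu(out + x)  # residual connection (assumed)
```

For example, `MRFBlock(64)` maps an (N, 64, H, W) feature map to a tensor of the same shape, with the two branches covering effective 5 × 5 and 13 × 13 receptive fields, so large and small targets are seen at once.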
3.2.2. Prediction Module
4. Implementation and Experimental Details
4.1. Dataset
4.2. Loss
4.3. Training Details
4.3.1. Mixup Strategy
4.3.2. Training Parameters
5. Experimental Results and Analysis
5.1. Accuracy
5.2. Inference Time
5.3. Ablation Experiment
- Preprocessing Methods. As described in Section 3.1, we used gray stretching and threshold segmentation to preprocess the FLS images, which suppresses noise interference to a certain extent and highlights the target. This operation improves the detection performance by 0.6% (from 88.6% to 89.2%). In addition, Table 5 shows that the median filter and Lee filter did not improve the detection accuracy but instead decreased it by varying amounts.
- IBN layer. The IBN layer combines two normalization methods, IN and BN, retaining both appearance-invariant features and rich semantic features. In the experiment, we added the IBN layer to the shallow layers of the network, which improved the learning ability and generalization ability of the network and raised the mAP by a further 0.5% (from 89.2% to 89.7%).
- Dilated Convolution. Dilated convolution increases the receptive field without increasing the number of parameters. We introduced it into our architecture to keep the feature maps at high resolution without shrinking the receptive field, and we combined convolution kernels of different sizes with different dilation rates to detect multi-scale targets accurately. This is one of the reasons why MRF-Net performs well on small objects. As shown in Table 5, the dilated convolutions improve the accuracy by 1.0% (from 87.6% to 88.6%).
- Mixup Strategy. The scale of our own dataset is far smaller than that of open-source datasets, so there is a risk of overfitting during training. To enrich the dataset and reduce this risk, we used the mixup strategy, which randomly synthesizes virtual training samples (see the sketch after this list). Applying this technique during training further boosted the performance of our MRF-Net by 0.6% (from 89.7% to 90.3%).
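For illustration, below is a minimal sketch of mixup adapted to detection in the style of Zhang et al.: two images are blended pixel-wise with a Beta-distributed coefficient, and both sets of ground-truth boxes are kept with loss weights λ and 1 − λ. The function name, the α value, and the box representation are assumptions, not the paper's reported configuration.

```python
import numpy as np


def detection_mixup(img_a, boxes_a, img_b, boxes_b, alpha=1.5):
    """Blend two training images and keep both sets of ground-truth boxes.

    img_a, img_b: float arrays/tensors of identical shape (C, H, W).
    boxes_a, boxes_b: lists of ground-truth boxes for each image.
    Returns the mixed image, the union of boxes, and per-box loss weights,
    so the loss is weighted by each box's mixing coefficient.
    """
    lam = float(np.random.beta(alpha, alpha))  # mixing coefficient
    mixed = lam * img_a + (1.0 - lam) * img_b  # convex combination of pixels
    boxes = list(boxes_a) + list(boxes_b)
    weights = [lam] * len(boxes_a) + [1.0 - lam] * len(boxes_b)
    return mixed, boxes, weights
```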
5.4. Real-Time Experimental Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Huang, S.-W.; Chen, E.; Guo, J. Efficient Seafloor Classification and Submarine Cable Route Design Using an Autonomous Underwater Vehicle. IEEE J. Ocean. Eng. 2017, 43, 7–18.
- Bingham, B.; Foley, B.; Singh, H.; Camilli, R.; Delaporta, K.; Eustice, R.M.; Mallios, A.; Mindell, D.; Roman, C.; Sakellariou, D. Robotic tools for deep water archaeology: Surveying an ancient shipwreck with an autonomous underwater vehicle. J. Field Robot. 2010, 27, 702–717.
- Neves, G.; Cerqueira, R.; Albiez, J.; Oliveira, L. Rotation-invariant shipwreck recognition with forward-looking sonar. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
- Qin, H.; Li, X.; Liang, J.; Peng, Y.; Zhang, C. DeepFish: Accurate underwater live fish recognition with a deep architecture. Neurocomputing 2016, 187, 49–58.
- Kim, J.; Yu, S.-C. Convolutional neural network-based real-time ROV detection using forward-looking sonar image. In Proceedings of the 2016 IEEE/OES Autonomous Underwater Vehicles (AUV), Tokyo, Japan, 6–9 November 2016; pp. 396–400.
- Valdenegro-Toro, M. Learning Objectness from Sonar Images for Class-Independent Object Detection. In Proceedings of the 2019 European Conference on Mobile Robots (ECMR), Prague, Czech Republic, 4–6 September 2019; pp. 1–6.
- Lee, Y.; Kim, T.G.; Choi, H.-T. Preliminary study on a framework for imaging sonar based underwater object recognition. In Proceedings of the 2013 10th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), Jeju, Korea, 30 October–2 November 2013; pp. 517–520.
- Petillot, Y.; Ruiz, I.T.; Lane, D.M. Underwater vehicle obstacle avoidance and path planning using a multi-beam forward looking sonar. IEEE J. Ocean. Eng. 2001, 26, 240–251.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
- Zhou, X.; Wang, D.; Krahenbuhl, P. Objects as Points. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
- Zhang, Z.; He, T.; Zhang, H.; Zhang, Z.; Xie, J.; Li, M. Bag of Freebies for Training Object Detection Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 20–25 June 2005.
- Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-based Fully Convolutional Networks. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016.
- Lin, T.-Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 386–397.
- Uijlings, J.R.R.; Van De Sande, K.E.A.; Gevers, T.; Smeulders, A.W.M. Selective search for object recognition. Int. J. Comput. Vis. 2013, 104, 154–171.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007.
- Liu, S.; Huang, D.; Wang, Y. Receptive Field Block Net for Accurate and Fast Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 385–400.
- Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. In Proceedings of the International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2–4 May 2016.
- Weng, L.-Y.; Li, M.; Gong, Z.; Ma, S. Underwater object detection and localization based on multi-beam sonar image processing. In Proceedings of the 2012 IEEE International Conference on Robotics and Biomimetics (ROBIO), Guangzhou, China, 11–14 December 2012; pp. 514–519.
- Hurtos, N.; Palomeras, N.; Nagappa, S.; Salvi, J. Automatic detection of underwater chain links using a forward-looking sonar. In Proceedings of the 2013 MTS/IEEE OCEANS—Bergen, Bergen, Norway, 10–14 June 2013; pp. 1–7.
- Valdenegro-Toro, M. Submerged marine debris detection with autonomous underwater vehicles. In Proceedings of the 2016 International Conference on Robotics and Automation for Humanitarian Applications (RAHA), Hong Kong, China, 18–20 December 2016; pp. 1–7.
- Valdenegro-Toro, M. End-to-end object detection and recognition in forward-looking sonar images with convolutional neural networks. In Proceedings of the 2016 IEEE/OES Autonomous Underwater Vehicles (AUV), Tokyo, Japan, 6–9 November 2016; pp. 144–150.
- Zacchini, L.; Franchi, M.; Manzari, V.; Pagliai, M.; Ridolfi, A. Forward-Looking Sonar CNN-based Automatic Target Recognition: An experimental campaign with FeelHippo AUV. In Proceedings of the IEEE/OES Autonomous Underwater Vehicles Symposium (AUV), St. John's, NL, Canada, 30 September–2 October 2020.
- Kvasic, I.; Miskovic, N.; Vukic, Z. Convolutional Neural Network Architectures for Sonar-Based Diver Detection and Tracking. In Proceedings of the OCEANS 2019—Marseille, Marseille, France, 17–20 June 2019; pp. 1–6.
- Li, Z.; Peng, C.; Yu, G.; Zhang, X.; Deng, Y.; Sun, J. DetNet: A Backbone Network for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
- Sandler, M.; Howard, A.G.; Zhu, M.; Zhmoginov, A.; Chen, L. Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
- Howard, A.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
- Pan, X.; Luo, P.; Shi, J.; Tang, X. Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 484–500.
- Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond Empirical Risk Minimization. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
- Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
Table 1. Technical parameters of the forward-looking sonar.

| Technical Parameter | Value |
|---|---|
| Operating frequency | 720 kHz |
| Voltage supply | 20–75 V DC |
| Detection range | 0.2–120 m |
| Number of beams | 256 |
| Visual angle | 120° |
| Angular resolution | 0.5° |
| Range resolution | 8 mm |
| Vertical beam width | 20° |
| Size | 228 mm × 135 mm × 110 mm |
Table 2. Definitions of the evaluation quantities (IoU threshold of 0.5).

| Symbol | Definition |
|---|---|
| TP | Number of predicted bounding boxes whose IoU with the ground truth is higher than 0.5 |
| FP | Number of predicted bounding boxes whose IoU with the ground truth is less than or equal to 0.5 |
| TN | Not used |
| FN | Number of undetected ground-truth boxes |
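For reference, the IoU underlying these definitions can be computed as follows; this is the standard formulation, with the (x1, y1, x2, y2) corner convention assumed for box coordinates.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A prediction with `iou(pred, gt) > 0.5` against a ground-truth box counts as a TP under the definitions above.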
Table 3. Detection accuracy (per-class AP and mAP).

| Method | mAP (%) | Cloth (%) | Fishnet (%) | Plastic Bag (%) |
|---|---|---|---|---|
| CenterNet-dla [10] | 90.0 | 86.3 | 93.0 | 90.7 |
| Faster RCNN [16] | 86.7 | 79.7 | 92.3 | 88.2 |
| YoloV3 [21] | 71.2 | 42.0 | 83.6 | 87.9 |
| RFBNet [25] | 87.2 | 83.3 | 87.3 | 90.9 |
| SSD300 [33] | 80.6 | 71.4 | 88.1 | 82.4 |
| MRF-Net (Ours) | 90.3 | 88.2 | 91.8 | 90.7 |
Table 4. Model complexity and inference speed.

| Method | Parameters | Model Size | GFLOPs | Time (ms) | FPS |
|---|---|---|---|---|---|
| CenterNet-dla [10] | 17.9 M | 70.5 M | 30.9 G | 127 | 7.87 |
| Faster RCNN [16] | 417.9 M | 547.0 M | 231.3 G | 957 | 1.04 |
| YoloV3 [21] | 61.9 M | 246.4 M | 33.0 G | 94 | 10.64 |
| RFBNet [25] | 34.1 M | 136.6 M | 35.0 G | 99 | 10.01 |
| SSD300 [33] | 23.7 M | 96.1 M | 30.4 G | 113 | 8.85 |
| MRF-Net (Ours) | 6.4 M | 26.1 M | 18.5 G | 89 | 11.13 |
Table 5. Ablation study of MRF-Net (√ = component enabled).

| MRF-Net | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
|---|---|---|---|---|---|---|---|
| Median filter | | | √ | | | | |
| Lee filter | | | | √ | | | |
| Our preprocessing method | | | | | √ | √ | √ |
| IBN layer | | | | | | √ | √ |
| Dilated convolution | | √ | √ | √ | √ | √ | √ |
| Mixup strategy | | | | | | | √ |
| mAP (%) | 87.6 | 88.6 | 88.3 | 88.1 | 89.2 | 89.7 | 90.3 |