3D-GIoU: 3D Generalized Intersection over Union for Object Detection in Point Cloud
Abstract
1. Introduction
2. Related Work
2.1. Monocular Image-Based Detection
2.2. Point Cloud-Based Detection
2.3. Multimodal Fusion-Based Detection
3. Method
3.1. Data Preprocessing
3.2. Point-Voxel Feature Encoder
3.3. Sparse Convolution Middle Layers
3.4. Region Proposal Network
4. Loss Function
4.1. Classification Loss
4.2. 3D Bounding Box Regression Loss
4.3. 3D GIoU Loss
- (1) When the predicted and ground-truth bounding boxes do not overlap, the gradient of the IoU loss function is 0, which makes optimization impossible;
- (2) Two shapes can overlap in different ways and still yield the same IoU value; that is, IoU does not reflect how the overlap between two objects occurs (see Figure 4).
Algorithm 1: 3D Generalized Intersection over Union Loss
Input: the parameters of the predicted and ground-truth bounding boxes. Output: the 3D GIoU loss.
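The core idea of the 3D GIoU loss can be sketched for the simplified case of axis-aligned boxes. This is a minimal illustration, not the paper's Algorithm 1: the paper's boxes also carry a yaw angle, and computing the rotated intersection requires polygon clipping, which is omitted here. The function name and the `(x_min, y_min, z_min, x_max, y_max, z_max)` box encoding are illustrative assumptions.

```python
import numpy as np

def giou_3d_axis_aligned(box_a, box_b):
    """3D GIoU for two axis-aligned boxes.

    Boxes are (x_min, y_min, z_min, x_max, y_max, z_max).
    Simplified sketch: rotated (yaw-angled) boxes, as used in the
    paper, require a polygon-clipping intersection instead.
    """
    a = np.asarray(box_a, dtype=float)
    b = np.asarray(box_b, dtype=float)

    # Volumes of the two boxes.
    vol_a = np.prod(a[3:] - a[:3])
    vol_b = np.prod(b[3:] - b[:3])

    # Intersection volume (zero if the boxes are disjoint along any axis).
    inter_dims = np.minimum(a[3:], b[3:]) - np.maximum(a[:3], b[:3])
    inter = np.prod(np.clip(inter_dims, 0.0, None))
    union = vol_a + vol_b - inter
    iou = inter / union

    # Smallest enclosing axis-aligned box C.
    vol_c = np.prod(np.maximum(a[3:], b[3:]) - np.minimum(a[:3], b[:3]))

    # GIoU = IoU - |C \ (A ∪ B)| / |C|; the training loss is 1 - GIoU.
    return iou - (vol_c - union) / vol_c
```

Unlike plain IoU, this value stays negative (and informative) for disjoint boxes, approaching -1 as they move apart, so the loss `1 - GIoU` keeps a nonzero gradient even without overlap.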
5. Experiments
5.1. Network Details
5.1.1. Car Detection
5.1.2. Cyclist and Pedestrian Detection
5.2. Training
5.3. Comparisons on the KITTI Validation Set
5.4. Analysis of the Detection Results
5.4.1. Car Detection
5.4.2. Cyclist and Pedestrian Detection
5.5. Ablation Studies
- (1) Comparing Baseline 1 with SECOND [18] shows that the proposed 3D GIoU loss improves detection performance; in particular, the AP at the Hard level increased by 6.4%.
- (2) Comparing Baseline 2 with SECOND [18] shows that the proposed backbone network improved Hard-level detection performance by 7.28%.
- (3) Comparing 3D-GIoU with Baseline 1, Baseline 2, and SECOND [18] shows that when the 3D GIoU loss and the backbone network are used together, 3D object detection performance improves substantially.
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Chao, M.; Yulan, G.; Yinjie, L.; Wei, A. Binary Volumetric Convolutional Neural Networks for 3-D Object Recognition. IEEE Trans. Instrum. Meas. 2019, 68, 38–48. [Google Scholar]
- Chao, M.; Yulan, G.; Jungang, Y.; Wei, A. Learning Multi-view Representation with LSTM for 3D Shape Recognition and Retrieval. IEEE Trans. Multimed. 2019, 21, 1169–1182. [Google Scholar]
- Ankit, K.; Ozan, I.; Peter, O.; Mohit, I.; James, B.; Ishaan, G.; Victor, Z.; Romain, P.; Richard, S. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. arXiv 2015, arXiv:1506.07285. [Google Scholar]
- Alexis, C.; Holger, S.; Yann Le, C.; Loïc, B. Deep Convolutional Networks for Natural Language Processing. arXiv 2018, arXiv:1805.09843. [Google Scholar]
- Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-View 3D Object Detection Network for Autonomous Driving. In Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; Volume 1, pp. 1907–1915. [Google Scholar]
- Chen, X.; Kundu, K.; Zhang, Z.; Ma, H.; Fidler, S.; Urtasun, R. Monocular 3d Object Detection for Autonomous Driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2147–2156. [Google Scholar]
- Li, B.; Ouyang, W.; Sheng, L.; Zeng, X.; Wang, X. GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–21 June 2019; pp. 1019–1028. [Google Scholar]
- Guan, P.; Ulrich, N. 3D Point Cloud Object Detection with Multi-View Convolutional Neural Network. In Proceedings of the IEEE Conference on International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 2040–2049. [Google Scholar]
- Zeng, Y.; Hu, Y.; Liu, S.; Ye, J.; Han, Y.; Li, X.; Sun, N. RT3D: Real-Time 3-D Vehicle Detection in LiDAR Point Cloud for Autonomous Driving. IEEE Robot. Autom. Lett. 2018, 3, 3434–3440. [Google Scholar] [CrossRef]
- François, P.; Francis, C.; Roland, S. A Review of Point Cloud Registration Algorithms for Mobile Robotics; Foundations and Trends® in Robotics, Mike Casey: Boston, MA, USA, 2015; Volume 4, pp. 1–104. [Google Scholar]
- Boyoon, J.; Sukhatme, G.S. Detecting Moving Objects Using a Single Camera on a Mobile Robot in an Outdoor Environment. In Proceedings of the 8th Conference on Intelligent Autonomous Systems, Amsterdam, The Netherlands, 10–13 March 2004; pp. 980–987. [Google Scholar]
- Lavanya, S.; Nirvikar, L.; Dileep, K.Y. A Study of Challenging Issues on Video Surveillance System for Object Detection. J. Basic Appl. Eng. Res. 2017, 4, 313–318. [Google Scholar]
- Khan, M.; Jamil, A.; Zhihan, L.; Paolo, B.; Po, Y.; Sung, W. Efficient Deep CNN-Based Fire Detection and Localization in Video Surveillance Applications. IEEE Trans. Syst. Man Cybern. Syst. 2019, 49, 1419–1434. [Google Scholar]
- Cheng-bin, J.; Shengzhe, L.; Trung, D.D.; Hakil, K. Real-Time Human Action Recognition Using CNN Over Temporal Images for Static Video Surveillance Cameras. In Advances in Multimedia Information Processing—PCM 2015; Springer: Cham, Switzerland, 2015; pp. 330–339. [Google Scholar]
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D Object Detection from RGB-D Data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 918–927. [Google Scholar]
- Kitti 3D Object Detection Benchmark Leader Board. Available online: http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d (accessed on 28 April 2018).
- Zhou, Y.; Tuzel, O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4490–4499. [Google Scholar]
- Yan, Y.; Mao, Y.; Li, B. SECOND: Sparsely Embedded Convolutional Detection. Sensors 2018, 18, 3337. [Google Scholar] [CrossRef] [PubMed]
- Simon, M.; Milz, S.; Amende, K.; Gross, H.M. Complex-YOLO: An Euler-Region-Proposal for Real-Time 3D Object Detection on Point Clouds. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 197–209. [Google Scholar]
- Hamid, R.; Nathan, T.; JunYoung, G.; Amir, S.; Ian, R.; Silvio, S. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 658–666. [Google Scholar]
- Yang, B.; Luo, W.; Urtasun, R. PIXOR: Real-time 3D Object Detection from Point Clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7652–7660. [Google Scholar]
- Li, B.; Zhang, T.; Xia, T. Vehicle detection from 3D lidar using fully convolutional network. arXiv 2016, arXiv:1608.07916. [Google Scholar]
- Engelcke, M.; Rao, D.; Wang, D.Z.; Tong, C.H.; Posner, I. Vote3deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 1355–1361. [Google Scholar]
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 77–85. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 24–27 June 2014; pp. 580–587. [Google Scholar]
- Kiwoo, S.; Youngwook Paul, K.; Masayoshi, T. RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement. arXiv 2018, arXiv:1811.03818. [Google Scholar]
- Liu, W.; Ji, R.; Li, S. Towards 3D Object Detection with Bimodal Deep Boltzmann Machines over RGBD Imagery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3013–3021. [Google Scholar]
- Zhuo, D.; Londin, J.L. Amodal Detection of 3D Objects: Inferring 3D Bounding Boxes from 2D Ones in RGB-Depth Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5762–5770. [Google Scholar]
- Qianhui, L.; Huifang, M.; Yue, W.; Li, T.; Rong, X. 3D-SSD: Learning Hierarchical Features from RGB-D Images for Amodal 3D Object Detection. arXiv 2017, arXiv:1711.00238. [Google Scholar]
- Song, S.; Xiao, J. Deep Sliding Shapes for Amodal 3D Object Detection in Rgb-d Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 808–816. [Google Scholar]
- Ling, M.; Yang, B.; Wang, S.; Raquel, U. Deep Continuous Fusion for Multi-Sensor 3D Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 641–656. [Google Scholar]
- Huitl, R.; Schroth, G.; Hilsenbeck, S.; Schweiger, F.; Steinbach, E. TUMindoor: An Extensive Image and Point Cloud Dataset for Visual Indoor Localization and Mapping. In Proceedings of the IEEE International Conference on Image Processing, Orlando, FL, USA, 30 September–3 October 2012. [Google Scholar]
- Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S. Joint 3D Proposal Generation and Object Detection from View Aggregation. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Madrid, Spain, 1–5 October 2018; pp. 1–8. [Google Scholar]
- Li, M.; Hu, Y.; Zhao, N.; Qian, Q. One-Stage Multi-Sensor Data Fusion Convolutional Neural Network for 3D Object Detection. Sensors 2019, 19, 1434. [Google Scholar] [CrossRef] [PubMed]
- Xu, J.; Ma, Y.; He, S.; Zhu, J.; Xiao, Y.; Zhang, J. PVFE: Point-Voxel Feature Encoders for 3D Object Detection. In Proceedings of the IEEE International Conference on Signal, Information and Data Processing, Chongqing, China, 11–13 December 2019. accepted. [Google Scholar]
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2999–3007. [Google Scholar]
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
- Lin, T.-Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Alex, H.; Sourabh, V.; Holger, C.; Zhou, L.; Jiong, Y.; Oscar, B. PointPillars: Fast Encoders for Object Detection from Point Clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 12697–12705. [Google Scholar]
3D detection AP (%) on the KITTI validation set.

| Method | Modality | Car Easy | Car Mod. | Car Hard | Cyclist Easy | Cyclist Mod. | Cyclist Hard | Pedestrian Easy | Pedestrian Mod. | Pedestrian Hard |
|---|---|---|---|---|---|---|---|---|---|---|
| MV3D | Img. & Lidar | 71.09 | 62.35 | 55.12 | N/A | N/A | N/A | N/A | N/A | N/A |
| AVOD | Img. & Lidar | 81.94 | 71.88 | 66.38 | 64.00 | 52.18 | 46.61 | 50.80 | 42.81 | 40.88 |
| F-PointNet | Img. & Lidar | 81.20 | 70.39 | 62.19 | 71.96 | 56.77 | 50.39 | 51.21 | 44.89 | 51.21 |
| VoxelNet | Lidar | 77.47 | 65.11 | 57.73 | 61.22 | 48.36 | 44.37 | 39.48 | 33.69 | 31.50 |
| PointPillars | Lidar | 86.96 | 76.35 | 70.19 | 77.75 | 58.55 | 54.85 | 67.07 | 58.74 | 55.97 |
| PVFE | Lidar | 87.32 | 77.12 | 68.87 | 81.58 | 62.41 | 56.33 | 58.48 | 51.74 | 45.09 |
| SECOND | Lidar | 85.99 | 75.51 | 68.25 | 80.47 | 57.02 | 55.79 | 56.99 | 50.22 | 43.59 |
| 3D-GIoU | Lidar | 87.83 | 77.91 | 75.55 | 83.32 | 64.69 | 63.51 | 67.23 | 59.58 | 52.69 |
BEV detection AP (%) on the KITTI validation set.

| Method | Modality | Car Easy | Car Mod. | Car Hard | Cyclist Easy | Cyclist Mod. | Cyclist Hard | Pedestrian Easy | Pedestrian Mod. | Pedestrian Hard |
|---|---|---|---|---|---|---|---|---|---|---|
| MV3D | Img. & Lidar | 86.02 | 76.90 | 68.48 | N/A | N/A | N/A | N/A | N/A | N/A |
| AVOD | Img. & Lidar | 88.53 | 83.79 | 77.90 | 68.09 | 57.48 | 50.77 | 58.75 | 51.05 | 47.54 |
| F-PointNet | Img. & Lidar | 88.07 | 84.00 | 75.33 | 75.38 | 61.96 | 54.68 | 58.09 | 50.22 | 47.02 |
| PIXOR | Lidar | 89.38 | 83.70 | 77.97 | N/A | N/A | N/A | N/A | N/A | N/A |
| VoxelNet | Lidar | 89.35 | 79.26 | 77.39 | 66.07 | 54.76 | 50.55 | 46.13 | 40.74 | 38.11 |
| PointPillars | Lidar | 90.12 | 86.67 | 84.53 | 80.89 | 61.54 | 58.63 | 73.08 | 68.20 | 63.20 |
| PVFE | Lidar | 89.98 | 87.03 | 79.31 | 84.30 | 64.72 | 58.42 | 61.93 | 54.88 | 51.93 |
| SECOND | Lidar | 89.23 | 86.25 | 78.95 | 82.88 | 63.46 | 57.63 | 60.81 | 53.67 | 51.10 |
| 3D-GIoU | Lidar | 90.16 | 87.92 | 86.55 | 85.35 | 66.91 | 65.06 | 70.16 | 62.57 | 55.52 |
Ablation results: 3D and BEV detection AP (%) on the KITTI validation set.

| Metric | Method | Car Easy | Car Mod. | Car Hard | Cyclist Easy | Cyclist Mod. | Cyclist Hard | Pedestrian Easy | Pedestrian Mod. | Pedestrian Hard |
|---|---|---|---|---|---|---|---|---|---|---|
| 3D | SECOND | 85.99 | 75.51 | 68.25 | 80.47 | 57.02 | 55.79 | 56.99 | 50.22 | 43.59 |
| 3D | Baseline 1 | 87.20 | 76.80 | 74.65 | 82.84 | 62.34 | 56.66 | 58.16 | 51.42 | 44.74 |
| 3D | Baseline 2 | 87.62 | 77.37 | 75.53 | 83.89 | 64.27 | 62.75 | 59.37 | 52.42 | 49.78 |
| 3D | 3D-GIoU | 87.83 | 77.91 | 75.55 | 83.32 | 64.69 | 63.51 | 67.23 | 59.58 | 52.69 |
| BEV | SECOND | 89.23 | 86.25 | 78.95 | 82.88 | 63.46 | 57.63 | 60.81 | 53.67 | 51.10 |
| BEV | Baseline 1 | 89.99 | 86.82 | 86.03 | 84.83 | 64.56 | 58.55 | 62.34 | 59.35 | 52.70 |
| BEV | Baseline 2 | 89.80 | 87.13 | 86.31 | 85.42 | 65.78 | 64.45 | 66.40 | 59.40 | 52.56 |
| BEV | 3D-GIoU | 90.16 | 87.92 | 86.55 | 85.35 | 66.91 | 65.06 | 70.16 | 62.57 | 55.52 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Xu, J.; Ma, Y.; He, S.; Zhu, J. 3D-GIoU: 3D Generalized Intersection over Union for Object Detection in Point Cloud. Sensors 2019, 19, 4093. https://doi.org/10.3390/s19194093