One-Stage Anchor-Free 3D Vehicle Detection from LiDAR Sensors
Abstract
1. Introduction
- We regress rotation in the same way as PIXOR [10]. Unlike anchor-based methods, our detector needs neither a pivot angle prior nor an auxiliary angle classification branch: the unique orientation angle is decoded directly from the output of the regression branch, and the detector achieves high average orientation similarity.
- We combine the SmoothL1 loss with a high-level IoU loss for training. The SmoothL1 loss trains each regression branch separately, while the IoU loss trains all regression parameters jointly. Experimental results show that our detector performs nearly on par with anchor-based detectors. We also analyze the performance difference between anchor-based and anchor-free methods.
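
The two contributions above can be sketched in a few lines. The example assumes the PIXOR-style encoding, in which the network regresses (cos θ, sin θ) and the angle is recovered with atan2, and it uses an axis-aligned bird's-eye-view IoU as a simplified stand-in for the rotated 3D IoU loss, which is considerably more involved. The function names, the box layout `(cx, cy, w, l, cos, sin)`, and `iou_weight` are hypothetical and not taken from the paper.

```python
import math

def decode_yaw(cos_pred, sin_pred):
    """Recover a single yaw angle in [-pi, pi] from a regressed (cos, sin) pair."""
    # atan2 ignores a common positive scale, so un-normalised outputs still decode.
    return math.atan2(sin_pred, cos_pred)

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss for one regression value."""
    diff = abs(pred - target)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta

def axis_aligned_iou(box_a, box_b):
    """IoU of two BEV boxes (cx, cy, w, l). Simplification: rotation is ignored."""
    ax, ay, aw, al = box_a
    bx, by, bw, bl = box_b
    overlap_x = max(0.0, min(ax + aw / 2, bx + bw / 2) - max(ax - aw / 2, bx - bw / 2))
    overlap_y = max(0.0, min(ay + al / 2, by + bl / 2) - max(ay - al / 2, by - bl / 2))
    inter = overlap_x * overlap_y
    union = aw * al + bw * bl - inter
    return inter / union if union > 0 else 0.0

def combined_loss(pred, target, iou_weight=1.0):
    """pred and target are (cx, cy, w, l, cos, sin) tuples."""
    # Per-parameter term: each regression output is penalised separately.
    reg_term = sum(smooth_l1(p, t) for p, t in zip(pred, target))
    # Joint term: one scalar that depends on centre and size together.
    iou_term = 1.0 - axis_aligned_iou(pred[:4], target[:4])
    return reg_term + iou_weight * iou_term

target = (0.0, 0.0, 2.0, 4.0, math.cos(2.5), math.sin(2.5))
pred = (0.5, 0.0, 2.0, 4.0, math.cos(2.4), math.sin(2.4))
print(decode_yaw(pred[4], pred[5]), combined_loss(pred, target))
```

Because the angle comes from a single atan2 call, no angle bins or per-anchor pivot angles are needed to resolve it.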
2. Related Work
3. Our Approach
3.1. Point Cloud Feature Extractor
3.1.1. Viewpoint Selection
3.1.2. Pillar Feature Extractor
3.1.3. Parameterless Voxel Feature Extractor
3.1.4. Projection Relationship
3.2. Backbone Network
3.3. Anchor-Free Detector
3.3.1. Heatmap for Classification
3.3.2. 3D Information Regression
3.3.3. Auxiliary Loss and Joint Training
4. Experiments
4.1. Implementation Details
4.2. Experiments on the KITTI Validation Set
4.3. Experiments on the KITTI Test Set
4.4. Experiments on the Average Orientation Similarity
4.5. Experiments on the Training Sample
4.6. Influence of Classification Loss Function
4.7. From Anchor-Based to Anchor-Free
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Shin, K.; Kwon, Y.P.; Tomizuka, M. Roarnet: A robust 3d object detection based on region approximation refinement. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019; pp. 2510–2515.
- Wang, Z.; Jia, K. Frustum convnet: Sliding frustums to aggregate local point-wise features for amodal 3d object detection. arXiv 2019, arXiv:1903.01864.
- Zhou, Y.; Tuzel, O. Voxelnet: End-to-end learning for point cloud based 3d object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499.
- Yan, Y.; Mao, Y.; Li, B. Second: Sparsely embedded convolutional detection. Sensors 2018, 18, 3337.
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. PointPillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12697–12705.
- Shi, S.; Wang, Z.; Wang, X.; Li, H. Part-A^2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud. arXiv 2019, arXiv:1907.03670.
- Tian, Z.; Shen, C.; Chen, H.; He, T. Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 9627–9636.
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
- Wu, B.; Wan, A.; Yue, X.; Keutzer, K. Squeezeseg: Convolutional neural nets with recurrent crf for real-time road-object segmentation from 3d lidar point cloud. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 1887–1893.
- Yang, B.; Luo, W.; Urtasun, R. Pixor: Real-time 3d object detection from point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7652–7660.
- Chen, X.; Kundu, K.; Zhang, Z.; Ma, H.; Fidler, S.; Urtasun, R. Monocular 3d object detection for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2147–2156.
- Mousavian, A.; Anguelov, D.; Flynn, J.; Kosecka, J. 3d bounding box estimation using deep learning and geometry. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7074–7082.
- Chen, X.; Kundu, K.; Zhu, Y.; Ma, H.; Fidler, S.; Urtasun, R. 3d object proposals using stereo imagery for accurate object class detection. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1259–1272.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5105–5114.
- Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3d object detection network for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1907–1915.
- Vora, S.; Lang, A.H.; Helou, B.; Beijbom, O. PointPainting: Sequential Fusion for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 4603–4611.
- Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3d proposal generation and object detection from view aggregation. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1–8.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99.
- Shi, S.; Wang, X.; Li, H. Pointrcnn: 3d object proposal generation and detection from point cloud. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–21 June 2019; pp. 770–779.
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum pointnets for 3d object detection from rgb-d data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 918–927.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Xie, L.; Xiang, C.; Yu, Z.; Xu, G.; Yang, Z.; Cai, D.; He, X. PI-RCNN: An Efficient Multi-Sensor 3D Object Detector with Point-Based Attentive Cont-Conv Fusion Module. AAAI 2020, 34, 12460–12467.
- You, Y.; Wang, Y.; Chao, W.L.; Garg, D.; Pleiss, G.; Hariharan, B.; Campbell, M.; Weinberger, K.Q. Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020.
- Ge, R.; Ding, Z.; Hu, Y.; Wang, Y.; Chen, S.; Huang, L.; Li, Y. Afdet: Anchor free one stage 3d object detection. arXiv 2020, arXiv:2006.12671.
- Wang, G.; Tian, B.; Ai, Y.; Xu, T.; Chen, L.; Cao, D. CenterNet3D: An Anchor free Object Detector for Autonomous Driving. arXiv 2020, arXiv:2007.07214.
- Zhou, X.; Wang, D.; Krähenbühl, P. Objects as Points. arXiv 2019, arXiv:1904.07850.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Cao, Z.; Martinez, G.H.; Simon, T.; Wei, S.E.; Sheikh, Y.A. OpenPose: Realtime multi-person 2D pose estimation using Part Affinity Fields. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 172–186.
- Toshev, A.; Szegedy, C. Deeppose: Human pose estimation via deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 25 September 2014; pp. 1653–1660.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Huang, L.; Yang, Y.; Deng, Y.; Yu, Y. Densebox: Unifying landmark localization with end to end object detection. arXiv 2015, arXiv:1509.04874.
- Kong, T.; Sun, F.; Liu, H.; Jiang, Y.; Li, L.; Shi, J. Foveabox: Beyound anchor-based object detection. IEEE Trans. Image Process. 2020, 29, 7389–7398.
- Law, H.; Deng, J. Cornernet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Zhou, D.; Fang, J.; Song, X.; Guan, C.; Yin, J.; Dai, Y.; Yang, R. Iou loss for 2D/3D object detection. In Proceedings of the 2019 International Conference on 3D Vision (3DV), Quebec City, QC, Canada, 31 October 2019; pp. 85–94.
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
- Wang, Z.; Fu, H.; Wang, L.; Xiao, L.; Dai, B. SCNet: Subdivision Coding Network for Object Detection Based on 3D Point Cloud. IEEE Access 2019, 7, 120449–120462.
- Yang, Z.; Sun, Y.; Liu, S.; Jia, J. 3dssd: Point-based 3d single stage object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11040–11048.
- Zheng, W.; Tang, W.; Chen, S.; Jiang, L.; Fu, C.W. CIA-SSD: Confident IoU-Aware Single-Stage Object Detector From Point Cloud; AAAI: Menlo Park, CA, USA, 2021.
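
Table: target assignment compared between anchor-based methods and ours, with columns for the similarity matrix and the training label. The cell contents were figures and are not recoverable here.
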
Method | Input | 3D AP Easy (%) | 3D AP Moderate (%) | 3D AP Hard (%) | BEV AP Easy (%) | BEV AP Moderate (%) | BEV AP Hard (%) |
---|---|---|---|---|---|---|---|
MV3D [15] | RGB&LiDAR | 71.29 | 62.68 | 56.56 | 86.55 | 78.10 | 76.67 |
AVOD [17] | RGB&LiDAR | 84.41 | 74.44 | 68.65 | N/A | N/A | N/A |
F-PointNet [20] | RGB&LiDAR | 83.76 | 70.91 | 67.47 | 88.16 | 84.02 | 74.44 |
PointPainting [16] | RGB&LiDAR | 86.26 | 76.77 | 70.25 | 90.01 | 87.65 | 85.56 |
PL++ [23] | RGB&LiDAR | 75.10 | 63.80 | 57.40 | 88.20 | 76.90 | 73.40 |
PI-RCNN [22] | RGB&LiDAR | 88.27 | 78.53 | 77.75 | N/A | N/A | N/A |
PIXOR [10] | LiDAR | N/A | N/A | N/A | 86.79 | 80.75 | 76.60 |
VoxelNet [3] | LiDAR | 81.97 | 65.46 | 62.85 | 89.60 | 84.81 | 78.57 |
SECOND [4] | LiDAR | 87.43 | 76.48 | 69.10 | 89.79 | 87.07 | 79.66 |
PointPillars [5] | LiDAR | 86.53 | 77.20 | 70.93 | 89.93 | 87.16 | 85.03 |
SCNet [39] | LiDAR | 87.83 | 77.77 | 75.97 | 90.35 | 88.09 | 87.30 |
AFDet [24] | LiDAR | 85.68 | 75.57 | 69.31 | 89.42 | 85.45 | 80.56 |
CenterNet3D-SL1 [25] | LiDAR | 87.92 | 76.84 | 75.74 | 89.97 | 86.81 | 85.85 |
3DSSD [40] | LiDAR | 89.71 | 79.45 | 78.67 | N/A | N/A | N/A |
CIA-SSD [41] | LiDAR | 90.04 | 79.81 | 78.80 | N/A | N/A | N/A |
Ours (PP) | LiDAR | 82.55 | 75.14 | 72.70 | 89.79 | 86.73 | 84.91 |
Ours (VFE) | LiDAR | 88.31 | 77.97 | 76.17 | 89.84 | 87.43 | 86.63 |

Method | Input | 3D AP Easy (%) | 3D AP Moderate (%) | 3D AP Hard (%) | BEV AP Easy (%) | BEV AP Moderate (%) | BEV AP Hard (%) |
---|---|---|---|---|---|---|---|
MV3D [15] | RGB&LiDAR | 74.97 | 63.63 | 54.00 | 86.62 | 78.93 | 69.80 |
AVOD [17] | RGB&LiDAR | 76.39 | 66.47 | 60.23 | 89.75 | 84.95 | 78.32 |
F-PointNet [20] | RGB&LiDAR | 82.19 | 69.79 | 60.59 | 91.17 | 84.67 | 74.77 |
PointPainting [16] | RGB&LiDAR | 82.11 | 71.70 | 67.08 | 92.45 | 88.11 | 83.36 |
PL++ [23] | RGB&LiDAR | 68.38 | 54.88 | 49.16 | 84.61 | 73.80 | 65.59 |
PI-RCNN [22] | RGB&LiDAR | 84.37 | 74.82 | 70.03 | 91.44 | 85.81 | 81.00 |
PIXOR [10] | LiDAR | N/A | N/A | N/A | 81.70 | 77.05 | 72.95 |
VoxelNet [3] | LiDAR | 77.47 | 65.11 | 57.73 | 89.35 | 79.26 | 77.39 |
SECOND [4] | LiDAR | 83.13 | 73.66 | 66.20 | 88.01 | 79.37 | 77.95 |
PointPillars [5] | LiDAR | 82.58 | 74.31 | 68.99 | 90.07 | 86.56 | 82.81 |
SCNet [39] | LiDAR | 83.34 | 73.17 | 67.93 | 90.07 | 86.48 | 81.30 |
CenterNet3D [25] | LiDAR | 86.20 | 77.90 | 73.03 | 91.08 | 88.46 | 83.62 |
3DSSD [40] | LiDAR | 88.36 | 79.57 | 74.55 | 92.66 | 89.02 | 85.86 |
CIA-SSD [41] | LiDAR | 89.59 | 80.28 | 72.87 | 93.74 | 89.84 | 82.39 |
Ours (VFE) | LiDAR | 84.41 | 75.39 | 69.89 | 91.58 | 85.83 | 80.54 |

Loss Function | AOS Easy (%) | AOS Moderate (%) | AOS Hard (%) |
---|---|---|---|
IoU | 45.23 | 44.54 | 43.97 |
IoU + Cls | 90.59 | 89.02 | 87.83 |
RIoU | 90.49 | 88.57 | 87.02 |
Combined | 90.65 | 89.13 | 88.07 |
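
For reference, the KITTI benchmark scores the orientation of each matched detection with the cosine similarity (1 + cos Δθ) / 2, where Δθ is the difference between the predicted and ground-truth yaw. The sketch below illustrates only that per-detection term, not the full recall-averaged AOS metric, and the function names are hypothetical. It shows that a heading flipped by π scores 0, so a detector that cannot tell front from back would average about one half on this term. That may be why the IoU-only row is so much lower than the others, although this reading is an assumption.

```python
import math

def orientation_similarity(yaw_pred, yaw_gt):
    """Per-detection AOS term: 1 for equal headings, 0 for opposite headings."""
    return (1.0 + math.cos(yaw_pred - yaw_gt)) / 2.0

def mean_similarity(pairs):
    """Average the term over (yaw_pred, yaw_gt) pairs of matched detections."""
    return sum(orientation_similarity(p, g) for p, g in pairs) / len(pairs)

# Half of the headings are correct, half are flipped by 180 degrees.
ambiguous = [(0.3, 0.3), (0.3 + math.pi, 0.3), (-1.2, -1.2), (-1.2 + math.pi, -1.2)]
print(round(mean_similarity(ambiguous), 3))  # prints 0.5
```
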

Loss Function | Easy | Moderate | Hard |
---|---|---|---|
IoU | 86.31 | 76.88 | 74.98 |
IoU + Cls | 88.18 | 78.15 | 76.82 |
RIoU | 86.87 | 76.91 | 74.81 |
Combined | 87.94 | 77.74 | 76.39 |
Param. | Param. | ||||||
---|---|---|---|---|---|---|---|
Easy | Moderate | Hard | Easy | Moderate | Hard | ||
80.69 | 65.36 | 58.37 | 87.21 | 77.81 | 69.95 | ||
81.89 | 72.99 | 66.44 | 87.52 | 84.56 | 78.05 | ||
87.45 | 76.75 | 74.44 | 89.96 | 86.01 | 85.94 |
Param. | Param. | ||||||
---|---|---|---|---|---|---|---|
Easy | Moderate | Hard | Easy | Moderate | Hard | ||
86.38 | 75.42 | 68.40 | 90.17 | 87.36 | 79.80 | ||
87.14 | 76.68 | 74.84 | 89.99 | 86.30 | 79.77 | ||
86.84 | 76.46 | 74.15 | 90.08 | 86.18 | 85.82 |

Loss Function | 3D AP Easy (%) | 3D AP Moderate (%) | 3D AP Hard (%) | BEV AP Easy (%) | BEV AP Moderate (%) | BEV AP Hard (%) |
---|---|---|---|---|---|---|
RetinaNet (Equation (2)) | 87.17 | 77.05 | 75.66 | 89.84 | 87.38 | 86.72 |
CornerNet (Equation (12)) | 81.54 | 72.67 | 72.28 | 87.92 | 84.67 | 85.19 |
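
For context, the two classification losses compared above are commonly written as follows. This is a minimal per-location sketch of the forms published in the RetinaNet and CornerNet papers, using their default hyperparameters. It is an assumption that Equations (2) and (12) of this paper use the same forms and values, and the function names are illustrative.

```python
import math

def retinanet_focal(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Focal loss for one prediction p in (0, 1) with a hard label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p                # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha    # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))

def cornernet_focal(p, y, alpha=2.0, beta=4.0, eps=1e-12):
    """Penalty-reduced focal loss for one heatmap cell with a soft target y in [0, 1]."""
    if y == 1.0:
        # Object centre: standard focal term on the positive.
        return -((1.0 - p) ** alpha) * math.log(max(p, eps))
    # Elsewhere: the (1 - y)^beta factor softens the penalty near a centre.
    return -((1.0 - y) ** beta) * (p ** alpha) * math.log(max(1.0 - p, eps))

# A cell next to an object centre (y = 0.9) is penalised far less than a
# distant background cell (y = 0.0) for the same predicted score.
print(cornernet_focal(0.5, 0.9), cornernet_focal(0.5, 0.0))
```

The main practical difference is the target: the RetinaNet form uses hard 0/1 labels, while the CornerNet form expects a Gaussian-smoothed heatmap and is usually normalised by the number of objects.
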
Method | Anchor-Aware Param. | Aux. Cls. | |||||||
---|---|---|---|---|---|---|---|---|---|
Cls. | Reg. | Easy | Moderate | Hard | Easy | Moderate | Hard | ||
SECOND-lite | ✓ | ✓ | ✓ | 88.10 | 77.68 | 75.35 | 90.18 | 87.34 | 86.38 |
Modified(v1) | ✓ | ✓ | 86.84 | 75.98 | 68.63 | 89.88 | 85.71 | 79.35 | |
Modified(v2) | 86.84 | 76.46 | 74.15 | 90.08 | 86.18 | 85.82 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, H.; Zhao, S.; Zhao, W.; Zhang, L.; Shen, J. One-Stage Anchor-Free 3D Vehicle Detection from LiDAR Sensors. Sensors 2021, 21, 2651. https://doi.org/10.3390/s21082651