The Graph Neural Network Detector Based on Neighbor Feature Alignment Mechanism in LIDAR Point Clouds
Abstract
1. Introduction
- (1) We propose a novel graph neural network framework based on a neighbor feature alignment mechanism for 3D object detection. The framework converts the input point cloud into a graph and extracts features with a graph neural network, realizing 3D object detection directly in LiDAR point clouds.
- (2) We propose a neighbor feature alignment mechanism. It exploits the structural information of the graph, aggregating neighbor and edge features to update the vertex states at each iteration. This reduces the offset error of the vertices and preserves the spatial (translation) invariance of the point cloud.
- (3) We conduct extensive experiments on the public KITTI benchmark for autonomous driving. The results demonstrate that the proposed method achieves competitive detection accuracy.
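The pipeline in contributions (1) and (2) can be sketched in a few lines: build a radius graph over the points, then run one vertex-state update in which each vertex predicts a coordinate offset from its own state before the edge geometry is computed, so the update depends only on relative (aligned) positions. This is a minimal NumPy illustration under our own assumptions (random weights, a single ReLU layer standing in for the MLPs, max aggregation), not the authors' implementation.

```python
import numpy as np

def build_radius_graph(points, radius=4.0):
    """Connect each point to all neighbors within `radius` (brute-force O(N^2))."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    src, dst = np.nonzero((d < radius) & (d > 0))
    return src, dst  # directed edges j -> i

def mlp(x, w, b):
    return np.maximum(x @ w + b, 0.0)  # single ReLU layer as an MLP stand-in

def gnn_iteration(points, states, src, dst, params, align=True):
    """One vertex-state update with (optional) neighbor feature alignment.

    With align=True, each vertex predicts a coordinate offset from its own
    state; edge features are then computed from the aligned relative position,
    which keeps the update invariant to global translations of the cloud.
    """
    offsets = states @ params["w_off"] if align else np.zeros_like(points)
    rel = points[src] - (points[dst] + offsets[dst])   # aligned edge geometry
    edge_in = np.concatenate([rel, states[src]], axis=-1)
    edge_feat = mlp(edge_in, params["w_e"], params["b_e"])
    # max-aggregate incoming edge features per destination vertex
    agg = np.full((len(points), edge_feat.shape[1]), -np.inf)
    np.maximum.at(agg, dst, edge_feat)
    agg[np.isinf(agg)] = 0.0                           # isolated vertices
    return states + mlp(agg, params["w_u"], params["b_u"])  # residual update

rng = np.random.default_rng(0)
pts = rng.uniform(0, 10, size=(64, 3))
st = rng.normal(size=(64, 16))
params = {
    "w_off": rng.normal(scale=0.1, size=(16, 3)),
    "w_e": rng.normal(scale=0.1, size=(3 + 16, 32)), "b_e": np.zeros(32),
    "w_u": rng.normal(scale=0.1, size=(32, 16)), "b_u": np.zeros(16),
}
src, dst = build_radius_graph(pts, radius=4.0)
out = gnn_iteration(pts, st, src, dst, params)
# Translation-invariance check: shifting the whole cloud leaves states unchanged.
out_shift = gnn_iteration(pts + 5.0, st, src, dst, params)
assert np.allclose(out, out_shift)
```

Because every geometric quantity enters only through differences of coordinates, the absolute position of the cloud cancels out, which is the invariance property claimed in contribution (2).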
2. Related Works
3. Proposed Method
3.1. Framework Overview
3.2. The Point Cloud Processing Module
3.3. The Feature Extraction Module
3.4. The Prediction Module
Loss Function
4. Experiments and Analysis
4.1. Dataset and Experimental Details
4.1.1. Datasets
4.1.2. Experimental Details
4.2. Comparison with Other Advanced Methods
4.3. Ablation Study
4.3.1. The Ablation Study of Activation Function
4.3.2. The Ablation Study for GNN Iteration
4.3.3. The Ablation Study for the Proposed Module
4.4. Qualitative Results and Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Chen, X.; Kundu, K.; Zhang, Z.; Ma, H.; Fidler, S.; Urtasun, R. Monocular 3D Object Detection for Autonomous Driving. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2147–2156.
- Mousavian, A.; Anguelov, D.; Flynn, J.; Kosecka, J. 3D Bounding Box Estimation Using Deep Learning and Geometry. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5632–5640.
- Song, S.; Xiao, J. Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 808–816.
- Yang, B.; Luo, W.; Urtasun, R. PIXOR: Real-time 3D Object Detection from Point Clouds. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7652–7660.
- Engelcke, M.; Rao, D.; Wang, D.Z.; Tong, C.H.; Posner, I. Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation, Singapore, 29 May–3 June 2017; pp. 1355–1361.
- Zhou, Y.; Tuzel, O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4490–4499.
- Charles, R.Q.; Su, H.; Kaichun, M.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 77–85.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Advances in Neural Information Processing Systems 30; Curran Associates Inc.: Red Hook, NY, USA, 2017.
- Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Netw. 2009, 20, 61–80.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
- Zhong, Z.; Xiao, G.; Wang, S.; Wei, L.; Zhang, X. PESA-Net: Permutation-Equivariant Split Attention Network for Correspondence Learning. Inf. Fusion 2022, 77, 81–89.
- Yan, Y.; Mao, Y.; Li, B. SECOND: Sparsely Embedded Convolutional Detection. Sensors 2018, 18, 3337.
- Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4–24.
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. PointPillars: Fast Encoders for Object Detection from Point Clouds. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12689–12697.
- Shi, S.; Wang, X.; Li, H. PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
- Yang, Z.; Sun, Y.; Liu, S.; Jia, J. 3DSSD: Point-Based 3D Single Stage Object Detector. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11037–11045.
- Shi, W.; Rajkumar, R. Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1708–1716.
- Feng, M.; Gilani, S.Z.; Wang, Y.; Zhang, L.; Mian, A. Relation Graph Network for 3D Object Detection in Point Clouds. IEEE Trans. Multimed. 2021, 23, 92–107.
- Liu, Z.; Tang, H.; Lin, Y.; Han, S. Point-Voxel CNN for Efficient 3D Deep Learning. In Advances in Neural Information Processing Systems 32; Curran Associates Inc.: Red Hook, NY, USA, 2019.
- Shi, S.; Guo, C.; Jiang, L.; Wang, Z.; Shi, J.; Wang, X.; Li, H. PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection. In Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10526–10535.
- He, C.; Zeng, H.; Huang, J.; Hua, X.S.; Zhang, L. Structure Aware Single-Stage 3D Object Detection from Point Cloud. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11870–11879.
- Li, Z.; Yao, Y.; Quan, Z.; Yang, W.; Xie, J. SIENet: Spatial Information Enhancement Network for 3D Object Detection from Point Cloud. arXiv 2021, arXiv:2103.15396.
- Deng, J.; Shi, S.; Li, P.; Zhou, W.; Zhang, Y.; Li, H. Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection. arXiv 2020, arXiv:2012.15712.
- Zheng, L.; Xiao, G.; Shi, Z.; Wang, S.; Ma, J. MSA-Net: Establishing Reliable Correspondences by Multiscale Attention Network. IEEE Trans. Image Process. 2022, 31, 4598–4608.
- Chen, S.; Zheng, L.; Xiao, G.; Zhong, Z.; Ma, J. CSDA-Net: Seeking Reliable Correspondences by Channel-Spatial Difference Augment Network. Pattern Recognit. 2022, 126, 108539.
- Zhong, Z.; Xiao, G.; Zheng, L.; Lu, Y.; Ma, J. T-Net: Effective Permutation-Equivariant Network for Two-View Correspondence Learning. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 1930–1939.
- Xiao, G.; Luo, H.; Zeng, K.; Wei, L.; Ma, J. Robust Feature Matching for Remote Sensing Image Registration via Guided Hyperplane Fitting. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14.
- Geiger, A.; Lenz, P.; Urtasun, R. Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012.
- Zheng, W.; Tang, W.; Jiang, L.; Fu, C.-W. SE-SSD: Self-Ensembling Single-Stage Object Detector from Point Cloud. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Online, 19–25 June 2021; pp. 14494–14503.
- Liu, S.; Huang, W.; Cao, Y.; Li, D.; Chen, S. SMS-Net: Sparse Multi-Scale Voxel Feature Aggregation Network for LiDAR-Based 3D Object Detection. Neurocomputing 2022, 501, 555–565.
- Liu, M.; Ma, J.; Zheng, Q.; Liu, Y.; Shi, G. 3D Object Detection Based on Attention and Multi-Scale Feature Fusion. Sensors 2022, 22, 3935.
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D Object Detection from RGB-D Data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 918–927.
- Jiang, T.; Song, N.; Liu, H.; Yin, R.; Gong, Y.; Yao, J. VIC-Net: Voxelization Information Compensation Network for Point Cloud 3D Object Detection. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation, Xi’an, China, 30 May–5 June 2021; pp. 13408–13414.
- Noh, J.; Lee, S.; Ham, B. HVPR: Hybrid Voxel-Point Representation for Single-Stage 3D Object Detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 14600–14609.
- Zhang, Y.; Hu, Q.; Xu, G.; Ma, Y.; Wan, J.; Guo, Y. Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 18953–18962.
- Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3D Proposal Generation and Object Detection from View Aggregation. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, Madrid, Spain, 1–5 October 2018; pp. 1–8.
- Yang, Z.; Sun, Y.; Liu, S.; Shen, X.; Jia, J. STD: Sparse-to-Dense 3D Object Detector for Point Cloud. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1951–1960.
- Zhou, Y.; Sun, P.; Zhang, Y.; Anguelov, D.; Gao, J.; Ouyang, T.; Guo, J.; Ngiam, J.; Vasudevan, V. End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds. In Proceedings of the Conference on Robot Learning (CoRL), Osaka, Japan, 30 October–1 November 2019; pp. 923–932.
| Method | Modality | Car (Easy) | Car (Moderate) | Car (Hard) | Pedestrian (Easy) | Pedestrian (Moderate) | Pedestrian (Hard) | Cyclist (Easy) | Cyclist (Moderate) | Cyclist (Hard) |
|---|---|---|---|---|---|---|---|---|---|---|
| *Voxel-based methods* | | | | | | | | | | |
| VoxelNet [6] | LiDAR only | 77.47 | 65.11 | 57.73 | 39.48 | 33.69 | 31.50 | 61.22 | 48.36 | 44.37 |
| PointPillars [14] | LiDAR only | 79.05 | 74.99 | 68.30 | 52.08 | 43.53 | 41.49 | 75.78 | 59.07 | 52.92 |
| CIA-SSD [13] | LiDAR+Image | 89.59 | 80.20 | 72.87 | - | - | - | - | - | - |
| Voxel-RCNN [23] | LiDAR only | 90.90 | 81.62 | 77.06 | - | - | - | - | - | - |
| SE-SSD [29] | LiDAR only | 91.49 | 85.54 | 77.15 | - | - | - | - | - | - |
| SMS-Net [30] | LiDAR only | 89.34 | 79.04 | 77.76 | - | - | - | - | - | - |
| MA-MFFC [31] | LiDAR only | 92.60 | 84.98 | 83.21 | - | - | - | - | - | - |
| *Point-voxel methods* | | | | | | | | | | |
| F-PointNet [32] | LiDAR+Image | 81.02 | 70.39 | 62.19 | 51.21 | 44.89 | 40.23 | 71.96 | 56.77 | 50.39 |
| PV-RCNN [20] | LiDAR+Image | 90.25 | 81.43 | 76.82 | 52.17 | 43.29 | 40.29 | 78.60 | 63.71 | 57.65 |
| VIC-Net [33] | LiDAR only | 88.25 | 80.61 | 75.83 | 43.82 | 37.18 | 35.35 | 78.29 | 63.65 | 57.27 |
| HVPR [34] | LiDAR only | 86.38 | 77.92 | 73.04 | 53.47 | 43.96 | 40.64 | - | - | - |
| *Point-based methods* | | | | | | | | | | |
| PointRCNN [15] | LiDAR only | 86.69 | 75.64 | 70.70 | 47.98 | 39.37 | 36.01 | 74.96 | 58.82 | 52.53 |
| 3DSSD [16] | LiDAR only | 88.36 | 79.57 | 74.55 | 54.64 | 44.27 | 40.23 | 82.48 | 64.10 | 56.90 |
| Point-GNN [17] | LiDAR only | 88.33 | 79.47 | 72.29 | 51.92 | 43.77 | 40.14 | 78.60 | 63.48 | 57.08 |
| IA-SSD [35] | LiDAR only | 88.34 | 80.13 | 75.04 | 46.51 | 39.03 | 35.60 | 78.35 | 61.94 | 55.70 |
| Ours | LiDAR only | 90.63 | 80.26 | 74.02 | 51.43 | 43.84 | 40.42 | 77.36 | 60.83 | 57.39 |
| Method | Modality | Car (Easy) | Car (Moderate) | Car (Hard) | Pedestrian (Easy) | Pedestrian (Moderate) | Pedestrian (Hard) | Cyclist (Easy) | Cyclist (Moderate) | Cyclist (Hard) |
|---|---|---|---|---|---|---|---|---|---|---|
| AVOD [36] | LiDAR+Image | 88.53 | 83.79 | 77.90 | 58.75 | 51.05 | 47.54 | 68.06 | 57.48 | 50.77 |
| F-PointNet [32] | LiDAR+Image | 88.70 | 84.00 | 75.33 | 58.09 | 50.22 | 47.20 | 75.38 | 61.96 | 54.68 |
| PointPillars [14] | LiDAR only | 88.35 | 86.10 | 79.83 | 58.66 | 50.23 | 47.19 | 79.14 | 62.25 | 56.00 |
| STD [37] | LiDAR only | 89.66 | 87.76 | 86.89 | 60.99 | 51.39 | 45.89 | 81.04 | 65.32 | 57.85 |
| Point-GNN [17] | LiDAR only | 93.11 | 89.17 | 83.90 | 55.36 | 47.07 | 44.61 | 81.17 | 67.28 | 59.67 |
| SMS-Net [30] | LiDAR only | 90.34 | 87.89 | 87.01 | - | - | - | - | - | - |
| Ours | LiDAR only | 96.07 | 90.79 | 89.01 | 58.99 | 56.01 | 51.30 | 81.72 | 62.53 | 58.75 |
| Exp. | Activation | BEV (Easy) | BEV (Moderate) | BEV (Hard) | 3D (Easy) | 3D (Moderate) | 3D (Hard) |
|---|---|---|---|---|---|---|---|
| 1 | ReLU | 89.49 | 85.63 | 83.02 | 86.04 | 74.04 | 74.68 |
| 2 | LeakyReLU | 94.32 | 88.37 | 86.41 | 89.78 | 78.33 | 77.34 |
| 3 | GELU | 96.07 | 90.79 | 89.01 | 90.63 | 80.26 | 74.02 |
| Iterations | BEV (Easy) | BEV (Moderate) | BEV (Hard) | 3D (Easy) | 3D (Moderate) | 3D (Hard) |
|---|---|---|---|---|---|---|
| 0 | 90.08 | 79.92 | 75.64 | 85.10 | 74.28 | 70.68 |
| 1 | 96.07 | 90.79 | 89.01 | 90.63 | 80.26 | 74.02 |
| 2 | 95.70 | 88.26 | 86.76 | 89.85 | 79.54 | 72.59 |
| Method | 3D (Easy) | 3D (Moderate) | 3D (Hard) | mAP | Time (ms) |
|---|---|---|---|---|---|
| Ours w/SV | 86.78 | 73.57 | 70.43 | 76.93 | 641 |
| Ours w/DV | 89.68 | 76.68 | 73.46 | 79.94 | 573 |
| Ours | 90.63 | 80.26 | 74.02 | 80.85 | 599 |
| Method | 3D (Easy) | 3D (Moderate) | 3D (Hard) | mAP | Gain |
|---|---|---|---|---|---|
| Ours w/o align | 85.58 | 78.98 | 73.89 | 78.82 | +0.00% |
| Ours w/align | 90.53 | 80.26 | 74.02 | 81.60 | +2.78% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, X.; Zhang, B.; Liu, N. The Graph Neural Network Detector Based on Neighbor Feature Alignment Mechanism in LIDAR Point Clouds. Machines 2023, 11, 116. https://doi.org/10.3390/machines11010116