Density-Aware Tree–Graph Cross-Message Passing for LiDAR Point Cloud 3D Object Detection
Abstract
1. Introduction
- The DAGC strategy adaptively builds local neighborhoods by considering spatial proximity and local point density, thus avoiding redundant connections in dense regions and maintaining connectivity in sparse areas.
- HTR captures multi-scale semantic features through structured downsampling, preserving global context while remaining aware of spatial layout.
- The TGCMP mechanism enables bi-directional feature exchange between graph and tree domains, allowing local geometric patterns and global semantic cues to reinforce each other.
2. Related Works
2.1. Overview of LiDAR-Based Methods
2.2. Methods Based on Graph Network
2.3. Methods Based on Multi-Scale Feature Learning
2.4. Transformer-Based 3D Object Detection Methods
3. Methodology
3.1. Overview of Our Method
3.2. Local Density Estimation
3.3. Dynamic Point Cloud Grouping
Algorithm 1: Dynamic Point Cloud Grouping (DPCG)
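To make the grouping step concrete, here is a minimal NumPy sketch of the idea behind DPCG, assuming density is estimated as the inverse mean k-nearest-neighbor distance (Section 3.2) and mapped linearly onto a per-point grouping radius; the function names, `r_min`, `r_max`, and `k` are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def local_density(points, k=16):
    """Per-point density as the inverse mean distance to the k nearest
    neighbors (a KD-tree would replace the O(N^2) distances at scale)."""
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    knn = np.sqrt(np.sort(d2, axis=1)[:, 1:k + 1])   # drop the zero self-distance
    return 1.0 / (knn.mean(axis=1) + 1e-6)

def dynamic_groups(points, r_min=0.2, r_max=1.5, k=16):
    """Group each point with the neighbors inside a density-dependent
    radius: small radius in dense regions, large radius in sparse ones."""
    rho = local_density(points, k)
    rho = (rho - rho.min()) / (rho.max() - rho.min() + 1e-6)  # normalize to [0, 1]
    radii = r_max - (r_max - r_min) * rho                     # dense -> r_min
    d = np.sqrt(np.sum((points[:, None] - points[None]) ** 2, axis=-1))
    return [np.flatnonzero(d[i] <= radii[i]) for i in range(len(points))]
```

The returned index sets shrink in dense regions (suppressing redundant edges) and widen in sparse ones (preserving connectivity), which is the behavior the density graph of Section 3.4 builds on.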
3.4. Density Graph Construction
3.5. Hierarchical Tree Representation
3.5.1. Voxel Grid Construction and Parent–Child Mapping
3.5.2. Parent–Child Relationship and Layer-Wise Indexing
3.5.3. Coarse-Level Node Computation
Algorithm 2: Tree Construction for Point Cloud Downsampling
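A minimal sketch of the tree construction, under the assumption that each coarse node is the centroid of the child nodes falling into its voxel and that the voxel size doubles at every level; the paper's actual coarse-level feature computation may differ.

```python
import numpy as np

def build_level(points, voxel):
    """One parent level: bucket child nodes into voxels, create one parent
    per occupied voxel (the child centroid), return the child -> parent map."""
    keys = np.floor(points / voxel).astype(np.int64)          # integer voxel coords
    uniq, parent_of = np.unique(keys, axis=0, return_inverse=True)
    counts = np.bincount(parent_of, minlength=len(uniq)).astype(float)
    parents = np.stack([
        np.bincount(parent_of, weights=points[:, d], minlength=len(uniq)) / counts
        for d in range(3)], axis=1)                           # per-voxel centroid
    return parents, parent_of

def build_tree(points, base_voxel=0.4, levels=3):
    """Hierarchy of progressively coarser levels (voxel size doubles)."""
    nodes, child_to_parent = [points], []
    for l in range(levels):
        parents, parent_of = build_level(nodes[-1], base_voxel * 2 ** l)
        nodes.append(parents)
        child_to_parent.append(parent_of)
    return nodes, child_to_parent
```

Here `child_to_parent[l][i]` gives the level-(l+1) parent of node i at level l, which is the layer-wise indexing described in Section 3.5.2.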
3.6. Tree–Graph Cross-Message Passing
- The tree features at level $l$ are denoted as $\mathbf{T}^{(l)} \in \mathbb{R}^{N_l \times D}$, where $N_l$ is the number of nodes and $D$ is the feature dimension.
- The graph features at level $l$ are denoted as $\mathbf{G}^{(l)} \in \mathbb{R}^{N_l \times D}$.
- The 3D coordinates of the nodes are denoted as $\mathbf{P}^{(l)} \in \mathbb{R}^{N_l \times 3}$.
3.6.1. Top-Down Message Passing (Tree → Graph)
- Enhanced connectivity: cross-layer aggregation increases the number of node-disjoint paths between graph nodes, so messages are less likely to be blocked by local bottlenecks.
- Reduced effective resistance: cross-layer connections lower the effective resistance between node pairs, easing information flow across the graph.
- Expanded receptive field: tree-to-graph cross-message passing expands the receptive field logarithmically with the number of tree levels, without increasing the network’s depth (a minimal sketch follows this list).
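As a concrete illustration of the top-down direction, the NumPy sketch below assumes each graph node has a single tree parent (the child-to-parent map produced during tree construction) and that fusion is a shared linear projection followed by a ReLU; the projection matrix `W` stands in for whatever fusion network the paper actually uses.

```python
import numpy as np

def top_down(tree_feat, graph_feat, parent_of, W):
    """Tree -> graph: each graph node pulls its tree parent's feature and
    fuses it with its own through a shared projection.
    tree_feat: (M, D); graph_feat: (N, D); parent_of: (N,) ints; W: (2D, D)."""
    pulled = tree_feat[parent_of]                          # broadcast parents to children
    fused = np.concatenate([graph_feat, pulled], axis=-1) @ W
    return np.maximum(fused, 0.0)                          # ReLU
```

Because every tree level covers a region roughly twice as large as the one below it, each such pull injects context from an exponentially larger neighborhood, which is where the logarithmic receptive-field growth comes from.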
3.6.2. Bottom-Up Message Passing (Graph → Tree)
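The reverse direction can be sketched as a scatter-mean of child graph features into their tree parents; the mean aggregator is an assumption for illustration (an attention-weighted sum would fit the same interface).

```python
import numpy as np

def bottom_up(graph_feat, parent_of, num_parents):
    """Graph -> tree: each tree node averages its children's features, so
    fine-grained geometric evidence refines the coarser semantic nodes."""
    sums = np.zeros((num_parents, graph_feat.shape[1]))
    np.add.at(sums, parent_of, graph_feat)                  # scatter-add children
    counts = np.bincount(parent_of, minlength=num_parents).clip(min=1)
    return sums / counts[:, None]                           # scatter-mean
```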
3.7. Loss Functions
4. Experiment
4.1. Datasets and Evaluation Metric
4.2. Implementation Details
5. Results and Analysis
5.1. Comparison on the KITTI Test Set
5.2. Comparison on the nuScenes Test Set
5.3. Comparison on Waymo Validation Set
5.4. Ablation Studies
5.4.1. Effect of Each Module
5.4.2. Effect of DAGC
5.4.3. Effect in Dense Versus Sparse Regions
5.4.4. Effect of Different Dynamic Radius Designs
5.4.5. Effect of Tree Structure
5.4.6. Effect of TGCMP
5.4.7. Memory–Performance Trade-Off Analysis
6. Visualization
7. Discussion
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wu, T.J.; He, R.; Peng, C.C. Real-Time Environmental Contour Construction Using 3D LiDAR and Image Recognition with Object Removal. Remote Sens. 2024, 16, 4513. [Google Scholar] [CrossRef]
- Ren, H.; Zhou, R.; Zou, L.; Tang, H. Hierarchical Distribution-Based Exemplar Replay for Incremental SAR Automatic Target Recognition. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 6576–6588. [Google Scholar] [CrossRef]
- Zhang, X.; Zhang, S.; Sun, Z.; Liu, C.; Sun, Y.; Ji, K.; Kuang, G. Cross-sensor SAR image target detection based on dynamic feature discrimination and center-aware calibration. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5209417. [Google Scholar] [CrossRef]
- Xia, X.; Meng, Z.; Han, X.; Li, H.; Tsukiji, T.; Xu, R.; Zhang, Z.; Ma, J. Automated driving systems data acquisition and processing platform. arXiv 2022, arXiv:2211.13425. [Google Scholar]
- Liao, Z.; Dong, X.; He, Q. Calculating the Optimal Point Cloud Density for Airborne LiDAR Landslide Investigation: An Adaptive Approach. Remote Sens. 2024, 16, 4563. [Google Scholar] [CrossRef]
- Guo, Y.; Wang, H.; Hu, Q.; Liu, H.; Liu, L.; Bennamoun, M. Deep learning for 3d point clouds: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 4338–4364. [Google Scholar] [CrossRef]
- Gupta, A.; Jain, S.; Choudhary, P.; Parida, M. Dynamic object detection using sparse LiDAR data for autonomous machine driving and road safety applications. Expert Syst. Appl. 2024, 255, 124636. [Google Scholar]
- Li, Y.; Ren, H.; Yu, X.; Zhang, C.; Zou, L.; Zhou, Y. Threshold-free open-set learning network for SAR automatic target recognition. IEEE Sens. J. 2024, 24, 6700–6708. [Google Scholar] [CrossRef]
- Shi, W.; Rajkumar, R. Point-gnn: Graph neural network for 3d object detection in a point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1711–1719. [Google Scholar]
- Zhang, Y.; Huang, D.; Wang, Y. PC-RGNN: Point cloud completion and graph neural network for 3D object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 3430–3437. [Google Scholar]
- Yin, J.; Shen, J.; Gao, X.; Crandall, D.J.; Yang, R. Graph neural network and spatiotemporal transformer attention for 3D video object detection from point clouds. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 45, 9822–9835. [Google Scholar] [CrossRef]
- Yue, C.; Wang, Y.; Tang, X.; Chen, Q. DRGCNN: Dynamic region graph convolutional neural network for point clouds. Expert Syst. Appl. 2022, 205, 117663. [Google Scholar] [CrossRef]
- Zhang, R.; Wang, L.; Guo, Z.; Shi, J. Nearest neighbors meet deep neural networks for point cloud analysis. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 1246–1255. [Google Scholar]
- Pei, Y.; Zhao, X.; Li, H.; Ma, J.; Zhang, J.; Pu, S. Clusterformer: Cluster-based transformer for 3d object detection in point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 6664–6673. [Google Scholar]
- Zhang, M.; Ma, Y.; Li, J.; Zhang, J. A density connection weight-based clustering approach for dataset with density-sparse region. Expert Syst. Appl. 2023, 230, 120633. [Google Scholar] [CrossRef]
- Chen, G.; Wang, M.; Yang, Y.; Yuan, L.; Yue, Y. Fast and Robust Point Cloud Registration with Tree-based Transformer. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 773–780. [Google Scholar]
- Pang, J.; Bui, K.; Tian, D. PIVOT-Net: Heterogeneous point-voxel-tree-based framework for point cloud compression. In Proceedings of the 2024 International Conference on 3D Vision (3DV), Davos, Switzerland, 18–21 March 2024; pp. 1270–1279. [Google Scholar]
- Ren, S.; Pan, X.; Zhao, W.; Nie, B.; Han, B. Dynamic graph transformer for 3D object detection. Knowl.-Based Syst. 2023, 259, 110085. [Google Scholar] [CrossRef]
- Vonessen, C.; Grötschla, F.; Wattenhofer, R. Next Level Message-Passing with Hierarchical Support Graphs. arXiv 2024, arXiv:2406.15852. [Google Scholar]
- Zhou, Y.; Tuzel, O. Voxelnet: End-to-end learning for point cloud based 3d object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4490–4499. [Google Scholar]
- Yan, Y.; Mao, Y.; Li, B. Second: Sparsely embedded convolutional detection. Sensors 2018, 18, 3337. [Google Scholar] [CrossRef]
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. Pointpillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12697–12705. [Google Scholar]
- Shi, S.; Wang, Z.; Shi, J.; Wang, X.; Li, H. From points to parts: 3d object detection from point cloud with part-aware and part-aggregation network. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 2647–2664. [Google Scholar] [CrossRef]
- Deng, J.; Shi, S.; Li, P.; Zhou, W.; Zhang, Y.; Li, H. Voxel r-cnn: Towards high performance voxel-based 3d object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 1201–1209. [Google Scholar]
- Ye, M.; Xu, S.; Cao, T. Hvnet: Hybrid voxel network for lidar based 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1631–1640. [Google Scholar]
- Zheng, W.; Tang, W.; Chen, S.; Jiang, L.; Fu, C.W. Cia-ssd: Confident iou-aware single-stage object detector from point cloud. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 3555–3562. [Google Scholar]
- Mao, J.; Xue, Y.; Niu, M.; Bai, H.; Feng, J.; Liang, X.; Xu, H.; Xu, C. Voxel transformer for 3d object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 3164–3173. [Google Scholar]
- Yang, Y.Q.; Guo, Y.X.; Xiong, J.Y.; Liu, Y.; Pan, H.; Wang, P.S.; Tong, X.; Guo, B. Swin3d: A pretrained transformer backbone for 3d indoor scene understanding. Comput. Vis. Media 2025, 11, 83–101. [Google Scholar] [CrossRef]
- Shi, S.; Wang, X.; Li, H. Pointrcnn: 3d object proposal generation and detection from point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 770–779. [Google Scholar]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30, 5105–5114. [Google Scholar]
- Yang, Z.; Sun, Y.; Liu, S.; Jia, J. 3dssd: Point-based 3d single stage object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11040–11048. [Google Scholar]
- Zhang, Y.; Hu, Q.; Xu, G.; Ma, Y.; Wan, J.; Guo, Y. Not all points are equal: Learning highly efficient point-based detectors for 3d lidar point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 18953–18962. [Google Scholar]
- Yang, Z.; Sun, Y.; Liu, S.; Shen, X.; Jia, J. STD: Sparse-to-Dense 3D Object Detector for Point Cloud. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1951–1960. [Google Scholar] [CrossRef]
- He, C.; Zeng, H.; Huang, J.; Hua, X.S.; Zhang, L. Structure aware single-stage 3d object detection from point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11873–11882. [Google Scholar]
- Shi, S.; Guo, C.; Jiang, L.; Wang, Z.; Shi, J.; Wang, X.; Li, H. Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10529–10538. [Google Scholar]
- Sheng, H.; Cai, S.; Liu, Y.; Deng, B.; Huang, J.; Hua, X.S.; Zhao, M.J. Improving 3d object detection with channel-wise transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 2743–2752. [Google Scholar]
- Mao, J.; Niu, M.; Bai, H.; Liang, X.; Xu, H.; Xu, C. Pyramid r-cnn: Towards better performance and adaptability for 3d object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 2723–2732. [Google Scholar]
- Li, Y.; Ma, L.; Tan, W.; Sun, C.; Cao, D.; Li, J. GRNet: Geometric relation network for 3D object detection from point clouds. ISPRS J. Photogramm. Remote Sens. 2020, 165, 43–53. [Google Scholar] [CrossRef]
- Chen, J.; Lei, B.; Song, Q.; Ying, H.; Chen, D.Z.; Wu, J. A hierarchical graph network for 3d object detection on point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 392–401. [Google Scholar]
- Cheng, H.; Zhu, J.; Lu, J.; Han, X. EDGCNet: Joint dynamic hyperbolic graph convolution and dual squeeze-and-attention for 3D point cloud segmentation. Expert Syst. Appl. 2024, 237, 121551. [Google Scholar] [CrossRef]
- He, Q.; Wang, Z.; Zeng, H.; Zeng, Y.; Liu, Y. Svga-net: Sparse voxel-graph attention network for 3d object detection from point clouds. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 870–878. [Google Scholar]
- Sun, Z.; Leng, X.; Zhang, X.; Zhou, Z.; Xiong, B.; Ji, K.; Kuang, G. Arbitrary-direction SAR ship detection method for multi-scale imbalance. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5208921. [Google Scholar]
- Wang, R.; Wang, Z.; Zhang, G.; Kang, H.; Luo, F. KD-Net: A novel knowledge- and data-driven network for SAR target recognition with limited data. In Proceedings of the IET International Radar Conference (IRC 2023), Chongqing, China, 3–5 December 2023; Volume 2023, pp. 2623–2627. [Google Scholar]
- Han, K.; Xiao, A.; Wu, E.; Guo, J.; Xu, C.; Wang, Y. Transformer in transformer. Adv. Neural Inf. Process. Syst. 2021, 34, 15908–15919. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Zhou, W.; Jiang, W.; Zheng, Z.; Li, J.; Su, T.; Hu, H. From grids to pseudo-regions: Dynamic memory augmented image captioning with dual relation transformer. Expert Syst. Appl. 2025, 273, 126850. [Google Scholar] [CrossRef]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Yu, X.; Tang, L.; Rao, Y.; Huang, T.; Zhou, J.; Lu, J. Point-bert: Pre-training 3d point cloud transformers with masked point modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 19313–19322. [Google Scholar]
- Pan, X.; Xia, Z.; Song, S.; Li, L.E.; Huang, G. 3d object detection with pointformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 7463–7472. [Google Scholar]
- Ning, Y.; Cao, J.; Bao, C.; Hao, Q. DVST: Deformable voxel set transformer for 3D object detection from point clouds. Remote Sens. 2023, 15, 5612. [Google Scholar] [CrossRef]
- Xu, X.; Dong, S.; Xu, T.; Ding, L.; Wang, J.; Jiang, P.; Song, L.; Li, J. Fusionrcnn: Lidar-camera fusion for two-stage 3d object detection. Remote Sens. 2023, 15, 1839. [Google Scholar] [CrossRef]
- Chen, Z.; Pham, K.T.; Ye, M.; Shen, Z.; Chen, Q. Cross-Cluster Shifting for Efficient and Effective 3D Object Detection in Autonomous Driving. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 4273–4280. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Shi, S.; Jiang, L.; Deng, J.; Wang, Z.; Guo, C.; Shi, J.; Wang, X.; Li, H. PV-RCNN++: Point-voxel feature set abstraction with local vector representation for 3D object detection. Int. J. Comput. Vis. 2023, 131, 531–551. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum pointnets for 3d object detection from rgb-d data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 918–927. [Google Scholar]
- Vora, S.; Lang, A.H.; Helou, B.; Beijbom, O. Pointpainting: Sequential fusion for 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 4604–4612. [Google Scholar]
- Yoo, J.H.; Kim, Y.; Kim, J.; Choi, J.W. 3d-cvf: Generating joint camera and lidar features using cross-view spatial feature fusion for 3d object detection. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXVII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 720–736. [Google Scholar]
- Xie, T.; Wang, L.; Wang, K.; Li, R.; Zhang, X.; Zhang, H.; Yang, L.; Liu, H.; Li, J. FARP-Net: Local-global feature aggregation and relation-aware proposals for 3D object detection. IEEE Trans. Multimed. 2023, 26, 1027–1040. [Google Scholar] [CrossRef]
- Yang, H.; Wang, W.; Chen, M.; Lin, B.; He, T.; Chen, H.; He, X.; Ouyang, W. Pvt-ssd: Single-stage 3d object detector with point-voxel transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 13476–13487. [Google Scholar]
- Noh, J.; Lee, S.; Ham, B. Hvpr: Hybrid voxel-point representation for single-stage 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 14605–14614. [Google Scholar]
- Wang, C.H.; Chen, H.W.; Chen, Y.; Hsiao, P.Y.; Fu, L.C. VoPiFNet: Voxel-pixel fusion network for multi-class 3D object detection. IEEE Trans. Intell. Transp. Syst. 2024, 25, 8527–8537. [Google Scholar] [CrossRef]
- Yin, T.; Zhou, X.; Krähenbühl, P. Multimodal virtual point 3d detection. Adv. Neural Inf. Process. Syst. 2021, 34, 16494–16507. [Google Scholar]
- Xu, S.; Zhou, D.; Fang, J.; Yin, J.; Bin, Z.; Zhang, L. Fusionpainting: Multimodal fusion with adaptive attention for 3d object detection. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 3047–3054. [Google Scholar]
- Nagesh, S.; Baig, A.; Srinivasan, S.; Rangesh, A.; Trivedi, M. Structure Aware and Class Balanced 3D Object Detection on nuScenes Dataset. arXiv 2022, arXiv:2205.12519. [Google Scholar]
- Chen, Y.; Liu, J.; Zhang, X.; Qi, X.; Jia, J. Largekernel3d: Scaling up kernels in 3d sparse cnns. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 13488–13498. [Google Scholar]
- Bai, X.; Hu, Z.; Zhu, X.; Huang, Q.; Chen, Y.; Fu, H.; Tai, C.L. Transfusion: Robust lidar-camera fusion for 3d object detection with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1090–1099. [Google Scholar]
- Shi, G.; Li, R.; Ma, C. Pillarnet: Real-time and high-performance pillar-based 3d object detection. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 35–52. [Google Scholar]
- Zhou, Y.; Sun, P.; Zhang, Y.; Anguelov, D.; Gao, J.; Ouyang, T.; Guo, J.; Ngiam, J.; Vasudevan, V. End-to-end multi-view fusion for 3d object detection in lidar point clouds. In Proceedings of the Conference on Robot Learning, PMLR, Virtual, 16–18 November 2020; pp. 923–932. [Google Scholar]
- Ge, R.; Ding, Z.; Hu, Y.; Wang, Y.; Chen, S.; Huang, L.; Li, Y. Afdet: Anchor free one stage 3d object detection. arXiv 2020, arXiv:2006.12671. [Google Scholar]
- Wang, Y.; Fathi, A.; Kundu, A.; Ross, D.A.; Pantofaru, C.; Funkhouser, T.; Solomon, J. Pillar-based object detection for autonomous driving. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 18–34. [Google Scholar]
- Yin, T.; Zhou, X.; Krahenbuhl, P. Center-based 3d object detection and tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 11784–11793. [Google Scholar]
- Zhu, Z.; Meng, Q.; Wang, X.; Wang, K.; Yan, L.; Yang, J. Curricular object manipulation in lidar-based object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 1125–1135. [Google Scholar]
- Nie, M.; Xue, Y.; Wang, C.; Ye, C.; Xu, H.; Zhu, X.; Huang, Q.; Mi, M.B.; Wang, X.; Zhang, L. Partner: Level up the polar representation for lidar 3d object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 3801–3813. [Google Scholar]
- Lu, B.; Sun, Y.; Yang, Z. Voxel graph attention for 3-D object detection from point clouds. IEEE Trans. Instrum. Meas. 2023, 72, 5023012. [Google Scholar] [CrossRef]
Comparison on the KITTI test set (3D AP, %).

| Method | Car (Easy) | Car (Mod) | Car (Hard) | Ped. (Easy) | Ped. (Mod) | Ped. (Hard) | Cyc. (Easy) | Cyc. (Mod) | Cyc. (Hard) |
|---|---|---|---|---|---|---|---|---|---|
| Fusion-based (Camera + LiDAR) | | | | | | | | | |
| F-PointNet [57] | 82.19 | 69.79 | 60.59 | 50.53 | 42.15 | 38.08 | 72.27 | 56.12 | 49.01 |
| PointPainting [58] | 88.93 | 78.27 | 77.48 | 50.32 | 40.97 | 37.87 | 77.63 | 63.78 | 55.89 |
| 3D-CVF [59] | 89.20 | 80.05 | 73.11 | − | − | − | − | − | − |
| FusionRCNN [52] | 88.12 | 81.98 | 77.53 | − | − | − | − | − | − |
| Point-based (LiDAR) | | | | | | | | | |
| PointRCNN [29] | 86.96 | 75.64 | 70.70 | 47.98 | 39.37 | 36.01 | 74.96 | 58.82 | 52.53 |
| Point-GNN [9] | 88.33 | 79.46 | 72.29 | 51.92 | 43.77 | 40.14 | 78.60 | 63.48 | 57.08 |
| SA-SSD [34] | 88.75 | 79.79 | 74.16 | − | − | − | − | − | − |
| PV-RCNN [35] | 90.25 | 81.43 | 76.82 | 52.17 | 43.29 | 40.29 | 78.57 | 63.71 | 57.65 |
| IA-SSD [32] | 88.34 | 80.13 | 75.04 | 46.51 | 39.03 | 35.60 | 78.35 | 61.94 | 55.70 |
| FARP-Net [60] | 88.36 | 81.55 | 78.98 | − | − | − | − | − | − |
| PVT-SSD [61] | 90.65 | 82.29 | 76.85 | − | − | − | − | − | − |
| Voxel-based (LiDAR) | | | | | | | | | |
| SECOND [21] | 84.65 | 75.96 | 68.71 | − | − | − | − | − | − |
| PointPillars [22] | 82.58 | 74.31 | 68.99 | 51.45 | 41.92 | 38.89 | 77.10 | 58.65 | 51.92 |
| Part-A2 [23] | 87.81 | 78.49 | 73.51 | 53.10 | 43.35 | 40.06 | 79.17 | 63.52 | 56.93 |
| CIA-SSD [26] | 89.59 | 80.28 | 72.87 | − | − | − | − | − | − |
| HVPR [62] | 86.38 | 77.92 | 73.04 | 53.47 | 43.96 | 40.64 | − | − | − |
| VoPiFNet [63] | 88.51 | 80.97 | 76.74 | 53.07 | 47.43 | 45.22 | 77.64 | 64.10 | 58.00 |
| DA-TGCMP (ours) | 91.01 | 83.56 | 79.14 | 53.53 | 47.92 | 45.96 | 81.62 | 67.15 | 61.73 |
Comparison on the nuScenes test set (per-class AP, %; L = LiDAR, C = Camera).

| Method | Modality | mAP | NDS | Car | Truck | C.V. | Bus | Trailer | Barrier | Motor. | Bike | Ped. | T.C. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PointPainting [58] | L + C | 46.4 | 58.1 | 77.9 | 35.8 | 15.8 | 36.2 | 37.3 | 60.2 | 41.5 | 24.1 | 73.3 | 62.4 |
| 3D-CVF [59] | L + C | 52.7 | 62.3 | 83.0 | 45.0 | 15.9 | 48.8 | 49.6 | 65.9 | 51.2 | 30.4 | 74.2 | 62.9 |
| MVP [64] | L + C | 66.4 | 70.5 | 86.8 | 58.5 | 26.1 | 67.4 | 57.3 | 74.8 | 70.0 | 49.3 | 89.1 | 85.0 |
| FusionPainting [65] | L + C | 68.1 | 71.6 | 87.1 | 60.8 | 30.0 | 68.5 | 61.7 | 71.8 | 74.7 | 53.5 | 88.3 | 85.0 |
| PointPillars [22] | L | 30.5 | 45.3 | 68.4 | 23.0 | 4.1 | 28.2 | 23.4 | 38.9 | 27.4 | 1.1 | 59.7 | 30.8 |
| CBGS [66] | L | 52.8 | 63.3 | 81.1 | 48.5 | 10.5 | 54.9 | 42.9 | 65.7 | 51.5 | 22.3 | 80.1 | 70.9 |
| LargeKernel3D [67] | L | 65.3 | 70.5 | 85.9 | 55.3 | 26.8 | 66.2 | 60.2 | 74.3 | 72.5 | 46.6 | 85.6 | 80.0 |
| Transfusion-L [68] | L | 65.5 | 70.2 | 86.2 | 56.7 | 28.2 | 66.3 | 58.8 | 78.2 | 68.3 | 44.2 | 86.1 | 82.0 |
| PillarNet-34 [69] | L | 66.0 | 71.4 | 87.6 | 57.5 | 27.9 | 63.6 | 63.1 | 77.2 | 70.1 | 42.3 | 87.3 | 83.3 |
| DA-TGCMP (ours) | L | 66.6 | 71.7 | 86.9 | 59.5 | 32.1 | 67.5 | 62.1 | 77.5 | 74.3 | 47.2 | 86.9 | 82.3 |
Comparison on the Waymo validation set (mAP/mAPH, %; L1/L2 = LEVEL_1/LEVEL_2).

| Method | Veh. L1 mAP | Veh. L1 mAPH | Veh. L2 mAP | Veh. L2 mAPH | Ped. L1 mAP | Ped. L1 mAPH | Ped. L2 mAP | Ped. L2 mAPH | Cyc. L1 mAP | Cyc. L1 mAPH | Cyc. L2 mAP | Cyc. L2 mAPH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SECOND [21] | 72.27 | 71.69 | 63.85 | 63.33 | 68.70 | 58.18 | 60.72 | 51.31 | 60.62 | 59.28 | 58.34 | 57.05 |
| PointPillars [22] | 56.62 | − | − | − | 59.25 | − | − | − | − | − | − | − |
| MVF [70] | 62.93 | − | − | − | 65.33 | − | − | − | − | − | − | − |
| AFDet [71] | 63.69 | − | − | − | 65.33 | − | − | − | − | − | − | − |
| Pillar-based [72] | 69.80 | − | − | − | 72.51 | − | − | − | − | − | − | − |
| Part-A2-Net [23] | 74.82 | 74.32 | 65.88 | 65.42 | 71.76 | 63.64 | 62.53 | 55.30 | 67.35 | 66.15 | 65.05 | 63.89 |
| Voxel-RCNN [24] | 75.59 | − | 66.59 | − | 72.51 | − | − | − | − | − | − | − |
| CT3D [36] | 76.30 | − | 69.04 | − | 65.33 | − | − | − | − | − | − | − |
| Pyramid-RCNN [37] | 76.30 | 75.68 | 67.23 | 66.68 | − | − | − | − | − | − | − | − |
| CenterPoint [73] | 76.70 | 76.20 | 68.80 | 68.30 | 79.00 | 72.90 | 71.00 | 65.30 | − | − | − | − |
| Curricular [74] | 72.15 | 71.04 | 64.64 | 64.29 | 73.62 | 64.47 | 65.84 | 60.39 | − | − | − | − |
| VoTr-TSD [27] | 74.95 | 74.25 | 65.91 | 65.29 | − | − | − | − | − | − | − | − |
| CenterFormer [21] | 72.27 | 71.69 | 63.85 | 63.33 | 68.70 | 58.18 | 60.72 | 51.31 | 60.62 | 59.28 | 58.34 | 57.05 |
| PV-RCNN [35] | 75.17 | 74.60 | 66.35 | 65.84 | 72.65 | 63.52 | 63.42 | 55.29 | 67.26 | 65.82 | 64.88 | 63.48 |
| PARTNER [75] | 76.05 | 75.52 | 68.58 | 68.11 | − | − | − | − | − | − | − | − |
| PV-RCNN++ [55] | 76.14 | 75.62 | 68.05 | 67.56 | 73.97 | 65.43 | 65.64 | 57.82 | 68.38 | 67.06 | 65.92 | 64.65 |
| DA-TGCMP (ours) | 76.64 | 76.69 | 68.86 | 68.23 | 74.46 | 66.35 | 64.43 | 57.54 | 68.52 | 67.36 | 65.52 | 64.78 |
Effect of each module (KITTI recall_40 mAP, %; nuScenes NDS and mAP).

| Module | KITTI (Easy) | KITTI (Mod) | KITTI (Hard) | nuScenes NDS | nuScenes mAP |
|---|---|---|---|---|---|
| Baseline | 91.19 | 82.63 | 82.01 | 65.4 | 0.592 |
| Baseline + DAGC | 91.56 | 84.86 | 82.61 | 66.7 | 0.622 |
| Baseline + HTR | 91.35 | 84.24 | 82.24 | 65.9 | 0.610 |
| Baseline + DAGC + HTR | 92.52 | 86.87 | 82.59 | 68.1 | 0.630 |
| Baseline + DAGC + HTR + TGCMP | 92.66 | 87.82 | 82.75 | 68.5 | 0.638 |
Effect of DAGC on KITTI 3D AP (Car at IoU = 0.7; Pedestrian and Cyclist at IoU = 0.5). √ marks the grouping strategies enabled.

| Fixed-Radius (r = 0.5) | kNN | DAGC | Car (Easy) | Car (Mod) | Car (Hard) | Ped. (Easy) | Ped. (Mod) | Ped. (Hard) | Cyc. (Easy) | Cyc. (Mod) | Cyc. (Hard) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| √ | | | 89.63 | 79.53 | 78.55 | 68.44 | 61.87 | 55.52 | 88.87 | 75.47 | 68.12 |
| √ | √ | | 91.61 | 80.33 | 78.12 | 69.12 | 62.35 | 58.89 | 89.12 | 76.78 | 68.22 |
| √ | √ | √ | 69.08 | | | | | | | | |
Detection quality in dense versus sparse regions (KITTI recall_40 mAP, %; IoU thresholds in parentheses).

| Method | Dense: Car (0.7) | Dense: Ped. (0.5) | Dense: Cyc. (0.5) | Sparse: Car (0.7) | Sparse: Ped. (0.5) | Sparse: Cyc. (0.5) |
|---|---|---|---|---|---|---|
| PointGNN [9] | 93.23 | 86.21 | 89.02 | 23.12 | 12.36 | 7.15 |
| VoxelGraph-RCNN [76] | 95.39 | 88.43 | 91.60 | 25.03 | 14.59 | 10.34 |
| DA-TGCMP (ours) | 95.72 | 87.31 | 92.38 | 28.26 | 15.11 | 17.36 |
Effect of different dynamic-radius designs (Pedestrian AP, Mod).

| Design | Radius Form | Pedestrian AP (Mod) | Remarks |
|---|---|---|---|
| Fixed Radius | | 56.21 | No adaptation to density |
| Linear Mapping | | 60.82 | Balanced performance |
| Exponential | | 61.47 | Enhanced recall in sparse areas |
| Logarithmic | | 59.35 | Conservative adaptation |
| Extended Range | | 62.03 | Finer granularity in dense regions |
| Learnable Radius | | 63.40 | Task-adaptive, best performance |
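For intuition, the sketch below gives plausible closed forms for the first four designs, assuming a density value normalized to [0, 1] (1 = densest); these formulas are illustrative assumptions rather than the paper's exact definitions, and the extended-range and learnable designs are omitted (the learnable variant would replace the closed form with a small MLP over the density).

```python
import numpy as np

def radius(rho, design="linear", r_min=0.2, r_max=1.5, alpha=1.0):
    """Candidate density-to-radius mappings; rho in [0, 1], 1 = densest.
    All adaptive forms shrink the radius as density grows."""
    if design == "fixed":
        return np.full_like(rho, 0.5)              # no adaptation
    if design == "linear":
        return r_max - (r_max - r_min) * rho       # balanced shrinkage
    if design == "exponential":                    # reacts strongly when sparse
        return r_min + (r_max - r_min) * np.exp(-alpha * rho)
    if design == "logarithmic":                    # conservative adaptation
        return r_max - (r_max - r_min) * np.log1p(rho) / np.log(2.0)
    raise ValueError(f"unknown design: {design}")
```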
Effect of the hierarchical tree structure (KITTI recall_40 mAP, %; nuScenes NDS and mAP).

| Baseline Method | KITTI (Easy) | KITTI (Mod) | KITTI (Hard) | nuScenes NDS | nuScenes mAP |
|---|---|---|---|---|---|
| Single-Scale Voxel | 90.10 | 81.22 | 77.36 | 61.42 | 0.517 |
| PointNet-Style Global Pooling | 91.11 | 81.25 | 79.53 | 62.9 | 0.568 |
| Hierarchical Tree Structure | 91.53 | 84.33 | 83.27 | 65.8 | 0.593 |
Effect of message-passing direction in TGCMP (moderate-difficulty AP, %).

| Message Passing Method | Car AP, Mod (IoU = 0.7) | Ped. AP, Mod (IoU = 0.5) | Cyc. AP, Mod (IoU = 0.5) |
|---|---|---|---|
| Top-Down | 82.23 | 62.54 | 76.13 |
| Bottom-Up | 79.65 | 60.98 | 73.45 |
| Top-Down and Bottom-Up | 83.05 | 62.88 | 77.55 |
Memory–performance trade-off (moderate-difficulty AP, %; GFLOPs given as change relative to the baseline; memory in GB).

| Structure | Car (Mod) | Ped. (Mod) | Cyc. (Mod) | ΔGFLOPs | Mem. (GB) |
|---|---|---|---|---|---|
| DAGC | 77.56 | 59.69 | 74.35 | −0 | 11.854 |
| HTR | 78.36 | 58.37 | 71.60 | −2.875 | 8.949 |
| TGCMP | 82.56 | 63.31 | 78.38 | +3.581 | 14.372 |