HFSA-Net: A 3D Object Detection Network with Structural Encoding and Attention Enhancement for LiDAR Point Clouds
Abstract
1. Introduction
- (1) A structured voxel feature encoder is proposed that explicitly compensates for the loss of local geometric information during voxelization through intra-voxel feature refinement and multi-scale neighborhood context aggregation, improving the model's representation of fine-grained structures.
- (2) A hybrid-domain attention-guided sparse backbone is constructed. Its decoupled hybrid-domain attention mechanism lets the network dynamically focus on salient feature regions within sparse point clouds, making feature extraction more effective.
- (3) A scale-aggregated detection head is designed that fuses a multi-level feature pyramid to adapt to variations in point cloud density, improving the perception and localization of objects at varying distances and sizes.
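The paper's exact S-VFE formulation is given in Section 3.2; as a rough, hypothetical illustration of the intra-voxel refinement idea in contribution (1), the NumPy sketch below voxelizes a point cloud and augments each point with its offset from its voxel's centroid. The function name and voxel sizes are illustrative, not taken from the paper.

```python
import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.4)):
    """Assign each point to a voxel and augment it with its offset from the
    voxel centroid -- one simple form of intra-voxel geometric refinement."""
    vs = np.asarray(voxel_size)
    idx = np.floor(points[:, :3] / vs).astype(np.int64)
    # Unique occupied voxels and a point -> voxel inverse map
    voxels, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    # Per-voxel centroids via scatter-add
    counts = np.bincount(inverse).astype(np.float64)
    sums = np.zeros((len(voxels), 3))
    np.add.at(sums, inverse, points[:, :3])
    centroids = sums / counts[:, None]
    # Per-point feature: raw xyz plus offset to the voxel centroid
    offsets = points[:, :3] - centroids[inverse]
    return voxels, np.hstack([points[:, :3], offsets])

pts = np.array([[0.05, 0.05, 0.10],
                [0.15, 0.10, 0.20],
                [1.05, 1.05, 1.05]])
voxels, feats = voxelize(pts)
print(len(voxels), feats.shape)  # 2 occupied voxels, (3, 6) features
```

The centroid offsets retain sub-voxel geometry that hard quantization alone discards, which is the loss the S-VFE is said to compensate for.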
2. Related Work
2.1. Point-Based Methods
2.2. Voxel-Based Methods
2.3. Point-Voxel Fusion Methods
3. Methods
3.1. Overall Architecture
3.2. Structured Voxel Feature Encoder
3.3. Hybrid-Domain Attention-Guided Sparse Backbone
3.4. Scale-Aggregation Head
3.5. Loss Function
4. Experiments and Result Analysis
4.1. Dataset
4.2. Experimental Setup and Parameters
4.3. Evaluation Metrics
4.4. Experimental Result Analysis
4.4.1. Loss Curve
4.4.2. Quantitative Analysis
4.4.3. Qualitative Analysis
4.5. Ablation Study
4.6. Real-Vehicle Experiment
5. Discussion
5.1. Performance Analysis and Interpretations
5.2. Generalization and Practicality
5.3. Limitations and Future Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| HFSA-Net | Hierarchical Focus and Structural-Aware Network |
| S-VFE | Structured Voxel Feature Encoder |
| HDA-Backbone | Hybrid-Domain Attention-Guided Sparse Backbone |
| SA-Head | Scale-Aggregation Head |
| FastCA | Fast Coordinate Attention |
| GCT | Gated Channel Transformation |
| FPN | Feature Pyramid Network |
| AP | Average Precision |
| AOS | Average Orientation Similarity |
| TP | True Positive |
| Parameters | Values |
|---|---|
| Laser channels (beams) | 64 |
| Measuring range (m) | 120 |
| Range accuracy (cm) | 2 |
| Horizontal FoV (°) | 360 |
| Vertical FoV (°) | 26.8 |
| Output (points per second) | 1,300,000 |
| Model | Car Easy | Car Mod. | Car Hard | Ped. Easy | Ped. Mod. | Ped. Hard | Cyc. Easy | Cyc. Mod. | Cyc. Hard | mAP |
|---|---|---|---|---|---|---|---|---|---|---|
| SECOND | 88.07 | 84.00 | 75.33 | 58.09 | 50.22 | 47.20 | 83.66 | 66.19 | 62.13 | 68.24 |
| VoxelNet | 89.35 | 79.26 | 77.39 | 46.13 | 40.74 | 38.11 | 66.70 | 54.76 | 50.55 | 60.33 |
| F-PointNet | 88.70 | 84.00 | 75.33 | 58.09 | 51.05 | 47.54 | 75.38 | 61.69 | 54.68 | 66.27 |
| PointPillars | 92.05 | 87.80 | 85.19 | 56.53 | 50.83 | 46.43 | 81.32 | 65.07 | 60.73 | 69.55 |
| CenterPoint | 91.41 | 85.63 | 83.04 | 56.02 | 51.77 | 47.96 | 80.10 | 68.11 | 64.80 | 69.87 |
| Ours | 91.31 | 87.62 | 86.06 | 62.16 | 57.06 | 52.58 | 83.19 | 68.93 | 64.83 | 72.64 |
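The mAP column is consistent with the unweighted mean of the nine per-class, per-difficulty AP values; for example, for the proposed model in the table above:

```python
# Per-class AP values for the proposed model, from the table above
aps = [91.31, 87.62, 86.06,   # Car: easy / moderate / hard
       62.16, 57.06, 52.58,   # Pedestrian
       83.19, 68.93, 64.83]   # Cyclist
map_score = sum(aps) / len(aps)
print(round(map_score, 2))  # 72.64, matching the mAP column
```

The same arithmetic reproduces most of the other rows to within rounding.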
| Model | Car Easy | Car Mod. | Car Hard | Ped. Easy | Ped. Mod. | Ped. Hard | Cyc. Easy | Cyc. Mod. | Cyc. Hard | mAP |
|---|---|---|---|---|---|---|---|---|---|---|
| SECOND | 86.44 | 76.97 | 73.39 | 47.47 | 40.47 | 36.26 | 81.28 | 63.49 | 59.29 | 62.78 |
| VoxelNet | 77.47 | 65.11 | 57.73 | 39.48 | 33.69 | 31.50 | 61.22 | 48.36 | 44.37 | 50.99 |
| F-PointNet | 81.20 | 70.39 | 62.19 | 51.21 | 44.89 | 40.23 | 71.96 | 56.77 | 50.39 | 58.80 |
| PointPillars | 85.03 | 75.76 | 72.74 | 50.08 | 44.18 | 39.53 | 77.13 | 60.94 | 56.91 | 62.48 |
| CenterPoint | 86.86 | 75.98 | 73.09 | 49.70 | 45.13 | 41.16 | 76.73 | 63.34 | 60.34 | 63.59 |
| Ours | 85.80 | 77.63 | 75.46 | 56.18 | 51.17 | 46.85 | 82.39 | 65.01 | 61.90 | 66.93 |
| Model | Car Easy | Car Mod. | Car Hard | Ped. Easy | Ped. Mod. | Ped. Hard | Cyc. Easy | Cyc. Mod. | Cyc. Hard | mAP |
|---|---|---|---|---|---|---|---|---|---|---|
| SECOND | 94.84 | 90.94 | 90.11 | 60.01 | 53.92 | 50.77 | 89.40 | 72.82 | 68.87 | 74.63 |
| SubCNN | 90.61 | 88.43 | 78.63 | 78.33 | 66.28 | 61.37 | 71.39 | 63.41 | 46.34 | 71.64 |
| AVOD-FPN | 89.95 | 87.13 | 79.74 | 53.36 | 44.92 | 43.77 | 67.61 | 57.53 | 54.16 | 64.24 |
| PointPillars | 95.02 | 91.24 | 88.46 | 47.33 | 44.40 | 41.31 | 84.75 | 71.35 | 67.24 | 70.12 |
| CenterPoint | 95.56 | 89.73 | 88.95 | 70.10 | 65.31 | 61.61 | 90.59 | 78.68 | 75.25 | 79.53 |
| Ours | 96.03 | 92.52 | 90.36 | 74.35 | 69.59 | 65.21 | 90.01 | 76.38 | 72.34 | 80.76 |
| S-VFE | HDA | SA-H | Car Easy | Car Mod. | Car Hard | Ped. Easy | Ped. Mod. | Ped. Hard | Cyc. Easy | Cyc. Mod. | Cyc. Hard | mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  |  |  | 91.41 | 85.63 | 83.04 | 56.02 | 51.77 | 47.96 | 80.10 | 68.11 | 64.80 | 69.87 |
| √ |  |  | 89.79 | 86.16 | 85.58 | 55.62 | 51.88 | 47.83 | 81.13 | 68.71 | 64.93 | 70.18 |
|  | √ |  | 89.34 | 86.08 | 85.61 | 56.65 | 51.78 | 47.72 | 83.53 | 67.80 | 63.60 | 70.23 |
|  |  | √ | 89.89 | 86.40 | 86.01 | 58.98 | 53.27 | 48.90 | 84.49 | 70.64 | 66.17 | 71.64 |
| √ | √ |  | 90.07 | 86.66 | 85.98 | 58.87 | 53.81 | 49.30 | 82.04 | 70.25 | 66.07 | 71.45 |
| √ |  | √ | 91.99 | 87.94 | 86.33 | 59.53 | 54.07 | 50.02 | 82.46 | 67.80 | 64.16 | 71.59 |
|  | √ | √ | 89.61 | 87.64 | 86.16 | 60.93 | 56.54 | 51.51 | 82.07 | 70.71 | 66.73 | 72.44 |
| √ | √ | √ | 91.31 | 87.62 | 86.06 | 62.16 | 57.06 | 52.58 | 83.19 | 68.93 | 64.83 | 72.64 |
| S-VFE | HDA | SA-H | Car Easy | Car Mod. | Car Hard | Ped. Easy | Ped. Mod. | Ped. Hard | Cyc. Easy | Cyc. Mod. | Cyc. Hard | mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  |  |  | 86.86 | 75.98 | 73.09 | 49.70 | 45.13 | 41.16 | 76.73 | 63.34 | 60.34 | 63.59 |
| √ |  |  | 84.09 | 76.02 | 74.02 | 51.20 | 46.92 | 42.61 | 78.19 | 63.02 | 59.15 | 63.91 |
|  | √ |  | 85.49 | 75.91 | 74.07 | 50.83 | 45.97 | 41.55 | 81.54 | 63.18 | 59.04 | 64.18 |
|  |  | √ | 84.41 | 77.83 | 75.85 | 53.44 | 47.68 | 43.35 | 81.70 | 66.92 | 62.59 | 65.97 |
| √ | √ |  | 86.49 | 76.46 | 74.47 | 54.01 | 49.31 | 44.30 | 79.17 | 66.69 | 62.69 | 65.96 |
| √ |  | √ | 85.63 | 78.15 | 74.70 | 53.19 | 48.09 | 44.22 | 79.54 | 62.98 | 59.32 | 65.09 |
|  | √ | √ | 84.95 | 77.29 | 75.63 | 57.17 | 52.40 | 47.45 | 77.90 | 66.03 | 61.95 | 66.75 |
| √ | √ | √ | 85.80 | 77.63 | 75.46 | 56.18 | 51.17 | 46.85 | 82.39 | 65.01 | 61.90 | 66.93 |
| S-VFE | HDA | SA-H | Car Easy | Car Mod. | Car Hard | Ped. Easy | Ped. Mod. | Ped. Hard | Cyc. Easy | Cyc. Mod. | Cyc. Hard | mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  |  |  | 95.56 | 89.73 | 88.95 | 70.10 | 65.31 | 61.61 | 90.59 | 78.68 | 75.25 | 79.53 |
| √ |  |  | 94.80 | 90.84 | 90.06 | 68.61 | 65.45 | 62.35 | 88.63 | 77.82 | 73.39 | 79.11 |
|  | √ |  | 94.30 | 92.06 | 90.23 | 71.68 | 67.10 | 62.63 | 89.22 | 75.72 | 72.23 | 79.47 |
|  |  | √ | 94.71 | 90.78 | 90.10 | 70.85 | 66.54 | 62.09 | 91.20 | 77.45 | 72.81 | 79.61 |
| √ | √ |  | 94.82 | 91.02 | 90.34 | 72.36 | 67.86 | 62.94 | 90.90 | 78.09 | 74.19 | 80.28 |
| √ |  | √ | 94.75 | 92.02 | 90.20 | 71.50 | 67.37 | 63.57 | 88.85 | 76.82 | 73.52 | 79.84 |
|  | √ | √ | 94.48 | 92.46 | 90.24 | 71.26 | 67.84 | 64.40 | 88.51 | 77.31 | 73.11 | 79.95 |
| √ | √ | √ | 96.03 | 92.52 | 90.36 | 74.35 | 69.59 | 65.21 | 90.01 | 76.38 | 72.34 | 80.76 |
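Read against the baselines, the full three-module configuration in the ablation tables above yields the following mAP gains, computed from the first and last rows of each table (table order as printed here):

```python
# (baseline mAP, full-model mAP) from the three ablation tables, in order
results = [(69.87, 72.64), (63.59, 66.93), (79.53, 80.76)]
gains = [round(full - base, 2) for base, full in results]
print(gains)  # [2.77, 3.34, 1.23]
```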
| Parameters | Values |
|---|---|
| Laser channels (beams) | 32 |
| Measuring range (m) | 80–100 |
| Range accuracy (cm) | ±2 |
| Dimensions (mm) | 85 × 144 |
| Horizontal FoV (°) | 360 |
| Vertical FoV (°) | +10.67 to −30.67 |
| Supply voltage (VDC) | 9–32 |
| Laser class | Class 1 |
| Power (W) | 31.4 |
| Output (points per second) | 700,000 |
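As a quick sanity check on the 32-beam spec above — assuming, hypothetically, evenly spaced beams, which many real sensors are not — the vertical FoV implies an angular spacing of roughly 1.3° between adjacent beams:

```python
# Values from the 32-beam sensor table above
fov_top, fov_bottom = 10.67, -30.67   # vertical FoV limits, degrees
channels = 32
vfov = fov_top - fov_bottom           # total vertical FoV: 41.34 degrees
spacing = vfov / (channels - 1)       # ~1.33 degrees between adjacent beams
print(round(vfov, 2), round(spacing, 2))
```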
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Yin, X.; Xiao, Z.; Shao, J.; Qiu, Z.; Wang, L. HFSA-Net: A 3D Object Detection Network with Structural Encoding and Attention Enhancement for LiDAR Point Clouds. Sensors 2026, 26, 338. https://doi.org/10.3390/s26010338
