Research on Cooperative Vehicle–Infrastructure Perception Integrating Enhanced Point-Cloud Features and Spatial Attention
Abstract
1. Introduction
- A point-cloud-enhanced feature modeling approach tailored for vehicle–infrastructure cooperative perception is proposed. By integrating a dual-dimension squeeze-and-excitation mechanism with a multi-scale feature pyramid, the representation capability of sparse point clouds is improved, particularly for long-range objects and heavily occluded regions.
- A spatially adaptive feature fusion module is designed to explicitly encode feature sources and generate fusion weights using both max pooling and average pooling. Through this design, dynamic and balanced weighting between vehicle-side local features and infrastructure-side global semantic information is achieved, thereby effectively mitigating fusion bias caused by field-of-view discrepancies.
- Extensive experiments are conducted on the DAIR-V2X dataset and an additional self-collected dataset. On DAIR-V2X, the proposed method achieves the highest 3D detection accuracy among the compared cooperative perception approaches (AP@0.5 of 0.762 and AP@0.7 of 0.617), and it exhibits enhanced robustness for long-range targets, occluded regions, and scenarios with incomplete information.
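
The fusion idea in the second contribution (fusion weights derived from both max pooling and average pooling) can be illustrated with a small sketch. This is not the authors' module: the function names `channel_pool` and `fuse_features`, and the fixed coefficients `w_max` and `w_avg`, are assumptions standing in for a learned convolution over the pooled maps, and the explicit source encoding mentioned above is omitted. It only shows how per-cell pooled statistics can produce complementary weights for two spatially aligned feature maps.

```python
import math


def channel_pool(feat, y, x):
    """Return (max, mean) over channels at one cell of a C x H x W feature map."""
    values = [channel[y][x] for channel in feat]
    return max(values), sum(values) / len(values)


def fuse_features(vehicle, infra, w_max=1.0, w_avg=1.0):
    """Fuse two aligned C x H x W maps with per-cell, per-source weights.

    w_max and w_avg are hypothetical fixed coefficients; a trained module
    would learn this mapping (for example with a convolution) instead.
    Returns the fused map and the H x W map of vehicle-side weights.
    """
    channels, height, width = len(vehicle), len(vehicle[0]), len(vehicle[0][0])
    fused = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    vehicle_weight = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            v_max, v_avg = channel_pool(vehicle, y, x)
            i_max, i_avg = channel_pool(infra, y, x)
            score_v = w_max * v_max + w_avg * v_avg
            score_i = w_max * i_max + w_avg * i_avg
            # A two-way softmax over (score_v, score_i) equals a sigmoid
            # of the score difference.
            a_v = 1.0 / (1.0 + math.exp(score_i - score_v))
            vehicle_weight[y][x] = a_v
            for c in range(channels):
                fused[c][y][x] = (
                    a_v * vehicle[c][y][x] + (1.0 - a_v) * infra[c][y][x]
                )
    return fused, vehicle_weight


# Toy 2 x 1 x 2 maps: the vehicle responds in the left cell,
# the infrastructure in the right cell.
vehicle_map = [[[1.0, 0.0]], [[1.0, 0.0]]]
infra_map = [[[0.0, 2.0]], [[0.0, 2.0]]]
fused_map, weights = fuse_features(vehicle_map, infra_map)
print(weights)
```

In this toy case the vehicle-side weight is above 0.5 where only the vehicle map responds and below 0.5 where only the infrastructure map responds, which is the balancing behaviour the contribution describes.
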
2. Materials and Methods
2.1. Related Work
2.1.1. LiDAR-Based 3D Object Detection
2.1.2. LiDAR-Based 3D Object Detection
2.2. Method
2.2.1. Point Cloud Data Preprocessing
2.2.2. Point Cloud Feature Extraction
- (1) Feature Encoding with the Improved PointPillars Network
- (2) 2D Backbone Network
2.2.3. Feature Compression and Transmission
2.2.4. Spatially Adaptive Vehicle–Infrastructure Feature Fusion
2.2.5. Detection Head
3. Results
3.1. Device Information
3.2. Experimental Datasets
3.2.1. DAIR-V2X Dataset
3.2.2. Self-Collected Dataset
3.3. Evaluation Metrics
3.3.1. Intersection over Union (IoU)
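
As a reminder of what the metric measures, the sketch below computes IoU for two 3D boxes as intersection volume divided by union volume. It assumes axis-aligned boxes given as `(x_min, y_min, z_min, x_max, y_max, z_max)`, which is a simplification: 3D detection benchmarks normally use boxes with a heading angle, and the exact IoU variant used in this paper is not stated in this excerpt. The function name is hypothetical.

```python
def iou_3d_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned 3D boxes (x_min, y_min, z_min, x_max, y_max, z_max)."""
    intersection = 1.0
    for axis in range(3):
        low = max(box_a[axis], box_b[axis])
        high = min(box_a[axis + 3], box_b[axis + 3])
        if high <= low:
            return 0.0  # no overlap along this axis, so no overlap at all
        intersection *= high - low

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - intersection
    return intersection / union


# Two 2 x 2 x 2 boxes shifted by 1 along x: intersection 4, union 12.
print(iou_3d_axis_aligned((0, 0, 0, 2, 2, 2), (1, 0, 0, 3, 2, 2)))
```

A detection is typically counted as correct when its IoU with a ground-truth box meets a threshold; the thresholds 0.5 and 0.7 appear in the result tables as AP@0.5 and AP@0.7.
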
3.3.2. Average Precision (AP)
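
The following sketch shows one common way to compute AP: rank detections by confidence, build the precision-recall curve, and integrate the precision envelope over recall. The function name and inputs are hypothetical, and the all-point interpolation used here is an assumption, since the exact evaluation protocol is not given in this excerpt. Each detection is assumed to have already been labelled as a true or false positive by IoU matching at the chosen threshold.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve using the precision envelope.

    scores:            confidence of each detection
    is_true_positive:  whether each detection matched a ground-truth box
    num_ground_truth:  total number of ground-truth boxes
    """
    if num_ground_truth == 0 or not scores:
        return 0.0
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    precisions, recalls = [], []
    for rank, index in enumerate(order, start=1):
        if is_true_positive[index]:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas between consecutive recall values.
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


# Three detections, two ground-truth objects; the second detection is a false positive.
print(average_precision([0.9, 0.8, 0.7], [True, False, True], 2))
```
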
3.4. Experimental Setup
3.5. Quantitative Results
3.6. Robustness Analysis
3.6.1. Robustness to Localization and Heading Errors
3.6.2. Robustness to Transmission Latency
3.7. Performance–Bandwidth Trade-off Analysis
3.8. Qualitative Results
3.9. Ablation Study
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Yurtsever, E.; Lambert, J.; Carballo, A.; Takeda, K. A survey of autonomous driving: Common practices and emerging technologies. IEEE Access 2020, 8, 58443–58469.
2. Huang, T.; Liu, J.; Zhou, X.; Nguyen, D.C.; Azghadi, M.R.; Xia, Y.; Sun, S. V2X cooperative perception for autonomous driving: Recent advances and challenges. arXiv 2023, arXiv:2310.03525.
3. Noor-A-Rahim, M.; Liu, Z.; Lee, H.; Khyam, M.O.; He, J.; Pesch, D.; Poor, H.V. 6G for vehicle-to-everything (V2X) communications: Enabling technologies, challenges, and opportunities. Proc. IEEE 2022, 110, 712–734.
4. Ye, X.; Shu, M.; Li, H.; Shi, Y.; Li, Y.; Wang, G.; Tan, X.; Ding, E. Rope3D: The roadside perception dataset for autonomous driving and monocular 3D object detection task. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 21341–21350.
5. Wu, J.; Xu, H.; Tian, Y.; Pi, R.; Yue, R. Vehicle detection under adverse weather from roadside LiDAR data. Sensors 2020, 20, 3433.
6. Liu, S.; Gao, C.; Chen, Y.; Peng, X.; Kong, X.; Wang, K.; Wang, M. Towards vehicle-to-everything autonomous driving: A survey on collaborative perception. arXiv 2023, arXiv:2308.16714.
7. Chen, Q.; Ma, X.; Tang, S.; Guo, J.; Yang, Q.; Fu, S. F-Cooper: Feature-based cooperative perception for autonomous vehicle edge computing system using 3D point clouds. In Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, Arlington, VA, USA, 7–9 November 2019; pp. 88–100.
8. Yu, H.; Tang, Y.; Xie, E.; Mao, J.; Yuan, J.; Luo, P.; Nie, Z. Vehicle–infrastructure cooperative 3D object detection via feature flow prediction. arXiv 2023, arXiv:2303.10552.
9. Ren, S.; Lei, Z.; Wang, Z.; Dianati, M.; Wang, Y.; Chen, S.; Zhang, W. Interruption-aware cooperative perception for V2X communication-aided autonomous driving. IEEE Trans. Intell. Veh. 2024, 9, 4698–4714.
10. Bai, Z.; Wu, G.; Barth, M.J.; Liu, Y.; Sisbot, E.A.; Oguchi, K. PillarGrid: Deep learning-based cooperative perception for 3D object detection from onboard-roadside LiDAR. In Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022; pp. 1743–1749.
11. Xiang, C.; Xie, X.; Feng, C.; Bai, Z.; Niu, Z.; Yang, M. V2I-BEVF: Multi-modal fusion based on BEV representation for vehicle–infrastructure perception. In Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 24–28 September 2023; pp. 5292–5299.
12. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
13. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30.
14. Shi, S.; Wang, X.; Li, H. PointRCNN: 3D object proposal generation and detection from point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 770–779.
15. Shi, S.; Guo, C.; Jiang, L.; Wang, Z.; Shi, J.; Wang, X.; Li, H. PV-RCNN: Point-voxel feature set abstraction for 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10529–10538.
16. Yang, Z.; Sun, Y.; Liu, S.; Jia, J. 3DSSD: Point-based 3D single-stage object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11040–11048.
17. Liu, Z.; Zhang, Z.; Cao, Y.; Hu, H.; Tong, X. Group-free 3D object detection via transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2949–2958.
18. Mao, J.; Xue, Y.; Niu, M.; Bai, H.; Feng, J.; Liang, X.; Xu, C. Voxel transformer for 3D object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 3164–3173.
19. Chen, Y.; Liu, J.; Zhang, X.; Qi, X.; Jia, J. VoxelNeXt: Fully sparse VoxelNet for 3D object detection and tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–23 June 2023; pp. 21674–21683.
20. Yan, Y.; Mao, Y.; Li, B. SECOND: Sparsely embedded convolutional detection. Sensors 2018, 18, 3337.
21. Yin, T.; Zhou, X.; Krahenbuhl, P. Center-based 3D object detection and tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 11784–11793.
22. Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. PointPillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12697–12705.
23. Xie, Q.; Zhou, X.; Qiu, T.; Zhang, Q.; Qu, W. Soft actor–critic-based multilevel cooperative perception for connected autonomous vehicles. IEEE Internet Things J. 2022, 9, 21370–21381.
24. Guo, A.; Zhang, S.; Tang, E.; Gao, X.; Pang, H.; Tian, H.; Chen, Z. When autonomous vehicle meets V2X cooperative perception: How far are we? arXiv 2025, arXiv:2509.24927.
25. Chen, Q.; Tang, S.; Yang, Q.; Fu, S. COOPER: Cooperative perception for connected autonomous vehicles based on 3D point clouds. In Proceedings of the 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), Dallas, TX, USA, 7–9 July 2019; pp. 514–524.
26. Arnold, E.; Dianati, M.; De Temple, R.; Fallah, S. Cooperative perception for 3D object detection in driving scenarios using infrastructure sensors. IEEE Trans. Intell. Transp. Syst. 2020, 23, 1852–1864.
27. Mo, Y.; Zhang, P.; Chen, Z.; Ran, B. A method of vehicle–infrastructure cooperative perception based vehicle state information fusion using improved Kalman filter. Multimed. Tools Appl. 2022, 81, 4603–4620.
28. Yu, H.; Luo, Y.; Shu, M.; Huo, Y.; Yang, Z.; Shi, Y.; Nie, Z. DAIR-V2X: A large-scale dataset for vehicle–infrastructure cooperative 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 21361–21370.
29. Feng, X.; Sun, H.; Zheng, H. LCV2I: Communication-efficient and high-performance collaborative perception framework with low-resolution LiDAR. arXiv 2025, arXiv:2502.17039.
30. Wang, T.H.; Manivasagam, S.; Liang, M.; Yang, B.; Zeng, W.; Urtasun, R. V2VNet: Vehicle-to-vehicle communication for joint perception and prediction. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 605–621.
31. Xu, R.; Xiang, H.; Xia, X.; Han, X.; Li, J.; Ma, J. OPV2V: An open benchmark dataset and fusion pipeline for perception with vehicle-to-vehicle communication. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 2583–2589.
32. Xu, R.; Xiang, H.; Tu, Z.; Xia, X.; Yang, M.H.; Ma, J. V2X-ViT: Vehicle-to-everything cooperative perception with vision transformer. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 107–124.
33. Hu, Y.; Fang, S.; Lei, Z.; Zhong, Y.; Chen, S. Where2Comm: Communication-efficient collaborative perception via spatial confidence maps. arXiv 2022, arXiv:2209.12836.
34. Yan, W.; Cao, H.; Chen, J.; Wu, T. FETR: Feature transformer for vehicle–infrastructure cooperative 3D object detection. Neurocomputing 2024, 600, 128147.
35. Li, X.; Yin, J.; Li, W.; Xu, C.; Yang, R.; Shen, J. Di-V2X: Learning domain-invariant representation for vehicle–infrastructure collaborative 3D object detection. Proc. AAAI Conf. Artif. Intell. 2024, 38, 3208–3215.
36. Chen, Z.; Shi, Y.; Jia, J. TransIFF: An instance-level feature fusion framework for vehicle–infrastructure cooperative 3D detection with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–8 October 2023; pp. 18205–18214.
37. Wang, J.; Nordström, T. Latency robust cooperative perception using asynchronous feature fusion. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV); IEEE: New York, NY, USA, 2025; pp. 1–10.
38. Li, B.; Zhao, Y.; Tan, H. CoFormerNet: A Transformer-Based Fusion Approach for Enhanced Vehicle–Infrastructure Cooperative Perception. Sensors 2024, 24, 4101.
39. Li, Y.; Dai, X.; Ge, B.; Song, Y.; Wang, J. Multi-Scale Dynamic Spatial Attention Module for Robust Point Cloud Perception in Cooperative Vehicle Infrastructure System. IEEE Access 2025, 13, 172895–172904.
40. Zhang, H.; Li, Y.; Zheng, S.; Lu, Z.; Gui, X.; Xu, W.; Bian, J. Battery lifetime prediction across diverse ageing conditions with inter-cell deep learning. Nat. Mach. Intell. 2025, 7, 270–277.
41. Zhang, H.; Gui, X.; Zheng, S.; Lu, Z.; Li, Y.; Bian, J. BatteryML: An open-source platform for machine learning on battery degradation. In Proceedings of the International Conference on Learning Representations (ICLR 2024), Vienna, Austria, 7–11 May 2024.
42. Wang, L.; Lan, J.; Li, M. PAFNet: Pillar attention fusion network for vehicle–infrastructure cooperative target detection using LiDAR. Symmetry 2024, 16, 401.
43. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
44. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
45. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 3–19.
46. Mushtaq, H.; Deng, X.; Ullah, I.; Ali, M.; Malik, B.H. O2SAT: Object-oriented-segmentation-guided spatial-attention network for 3D object detection in autonomous vehicles. Information 2024, 15, 376.
47. Li, E.; Wang, S.; Li, C.; Li, D.; Wu, X.; Hao, Q. SUSTechPOINTS: A Portable 3D Point Cloud Interactive Annotation Platform System. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV 2020), Las Vegas, NV, USA, 19–23 October 2020; pp. 1108–1115.
48. Xu, R.; Tu, Z.; Xiang, H.; Shao, W.; Zhou, B.; Ma, J. CoBEVT: Cooperative bird’s-eye-view semantic segmentation with sparse transformers. arXiv 2022, arXiv:2207.02202.
49. Lu, Y.; Li, Q.; Liu, B.; Dianati, M.; Feng, C.; Chen, S.; Wang, Y. Robust collaborative 3D object detection in presence of pose errors. arXiv 2022, arXiv:2211.07214.

| Category | Device | Description |
|---|---|---|
| Roadside Equipment | LiDAR Sensor | RoboSense (RoboSense Technology Co., Ltd., Shenzhen, China); 16-beam; 10 Hz; 360°/30° |
| Vehicle Equipment | LiDAR Sensor | RoboSense; 16-beam; 20 Hz; 360°/30° |
| Vehicle Equipment | Positioning system | RTK-based high-precision localization |
| System Integration | Synchronization | Hardware-trigger via Time Server |
| System Integration | Calibration | Precise Extrinsic Calibration |

| Method | Fusion Type | DAIR-V2X AP@0.5 | DAIR-V2X AP@0.7 | DAIR-V2X Inference Time (ms) | Self-Collected AP@0.5 | Self-Collected AP@0.7 |
|---|---|---|---|---|---|---|
| Baseline (PointPillars [22]) | None | 0.481 | - | 25.36 | 0.359 | - |
| Late Fusion [28] | Late | 0.561 | - | 36.72 | 0.437 | - |
| Cooper [25] | Early | 0.617 | - | 69.86 | 0.561 | - |
| Cooperative Baseline | Intermediate | 0.689 | 0.531 | 50.23 | 0.607 | 0.488 |
| F-Cooper [7] | Intermediate | 0.734 | 0.559 | 35.17 | 0.712 | 0.546 |
| V2VNet [30] | Intermediate | 0.654 | 0.402 | 73.58 | 0.656 | 0.409 |
| CoBEVT [48] | Intermediate | 0.580 | 0.443 | 63.76 | 0.571 | 0.440 |
| V2X-ViT [32] | Intermediate | 0.585 | 0.449 | 161.04 | 0.564 | 0.453 |
| Where2comm [33] | Intermediate | 0.625 | 0.488 | 82.52 | 0.611 | 0.462 |
| CoAlign [49] | Intermediate | 0.741 | 0.594 | 97.41 | 0.668 | 0.547 |
| Proposed method | Intermediate | 0.762 | 0.617 | 60.67 | 0.694 | 0.563 |

| Module Configuration | Params (M) | FLOPs (G) | Latency (ms) |
|---|---|---|---|
| Baseline | 4.82 | 63.52 | 50.23 |
| +R-SENet | 4.86 (+0.04) | 64.38 (+0.86) | 53.91 (+3.68) |
| +FPB-Net | 5.38 (+0.52) | 69.55 (+5.17) | 58.22 (+4.31) |
| +SAFF (Full Model) | 5.55 (+0.17) | 70.93 (+1.38) | 60.67 (+2.45) |

| Ablation Setting | Fusion Type | DAIR-V2X AP@0.5 | DAIR-V2X AP@0.7 | Self-Collected AP@0.5 | Self-Collected AP@0.7 |
|---|---|---|---|---|---|
| Baseline (PointPillars) | None | 0.481 | - | 0.359 | - |
| Cooperative Baseline (Concatenation) | Intermediate | 0.641 | 0.498 | 0.572 | 0.451 |
| Coop Baseline + R-SENet | Intermediate | 0.701 | 0.556 | 0.628 | 0.507 |
| Coop Baseline + R-SENet + FPB-Net | Intermediate | 0.732 | 0.588 | 0.659 | 0.534 |
| Coop Baseline + R-SENet + SAFF | Intermediate | 0.719 | 0.573 | 0.642 | 0.520 |
| Coop Baseline + R-SENet + FPB-Net + SAFF | Intermediate | 0.762 | 0.617 | 0.694 | 0.563 |

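
To make the ablation numbers easier to compare, the short sketch below computes each configuration's gain over the cooperative (concatenation) baseline. The values are copied from the ablation table; the dictionary layout and the function name `gains_over_baseline` are only illustrative choices.

```python
# Columns: DAIR-V2X AP@0.5, DAIR-V2X AP@0.7, Self-Collected AP@0.5, Self-Collected AP@0.7
ABLATION = {
    "Cooperative Baseline (Concatenation)": (0.641, 0.498, 0.572, 0.451),
    "+ R-SENet": (0.701, 0.556, 0.628, 0.507),
    "+ R-SENet + FPB-Net": (0.732, 0.588, 0.659, 0.534),
    "+ R-SENet + SAFF": (0.719, 0.573, 0.642, 0.520),
    "+ R-SENet + FPB-Net + SAFF": (0.762, 0.617, 0.694, 0.563),
}


def gains_over_baseline(table, baseline="Cooperative Baseline (Concatenation)"):
    """Return per-metric AP differences of every other setting versus the baseline."""
    reference = table[baseline]
    return {
        name: tuple(round(value - ref, 3) for value, ref in zip(values, reference))
        for name, values in table.items()
        if name != baseline
    }


for setting, deltas in gains_over_baseline(ABLATION).items():
    print(setting, deltas)
```

Read this way, the full configuration gains 0.121 AP@0.5 and 0.119 AP@0.7 on DAIR-V2X over the concatenation baseline, and each of FPB-Net and SAFF adds accuracy on top of R-SENet alone.
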
© 2026 by the authors. Published by MDPI on behalf of the World Electric Vehicle Association. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yan, S.; Wu, Y.; Liu, Z.; Xie, C. Research on Cooperative Vehicle–Infrastructure Perception Integrating Enhanced Point-Cloud Features and Spatial Attention. World Electr. Veh. J. 2026, 17, 164. https://doi.org/10.3390/wevj17040164

