Article

A Pseudo-Point-Based Adaptive Fusion Network for Multi-Modal 3D Detection

1 Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China
2 University of Chinese Academy of Sciences, Beijing 101408, China
3 School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
* Authors to whom correspondence should be addressed.
Electronics 2026, 15(1), 59; https://doi.org/10.3390/electronics15010059
Submission received: 27 November 2025 / Revised: 18 December 2025 / Accepted: 20 December 2025 / Published: 23 December 2025

Abstract

3D multi-modal detection using a monocular camera and LiDAR has drawn much attention for its low cost and broad applicability, making it highly valuable for autonomous driving and unmanned aerial vehicles (UAVs). However, conventional fusion approaches that rely on static arithmetic operations often fail to adapt to dynamic, complex scenarios. Furthermore, existing ROI alignment techniques, such as local projection and cross-attention, cannot adequately mitigate the feature misalignment triggered by depth-estimation noise in pseudo-point clouds. To address these issues, this paper proposes a pseudo-point-based 3D object detection method that achieves biased fusion of multi-modal data. First, a meta-weight fusion module dynamically generates fusion weights from global context, adaptively balancing the contributions of point clouds and images. Second, a module combining bidirectional cross-attention with a gating filter mechanism eliminates the ROI feature misalignment caused by depth-completion noise. Finally, a class-agnostic box fusion strategy aggregates highly overlapping detection boxes at the decision level, improving localization accuracy. Experiments on the KITTI dataset show that the proposed method achieves APs of 92.22%, 85.03%, and 82.25% on the Easy, Moderate, and Hard difficulty levels, respectively, demonstrating leading performance. Ablation studies further validate the effectiveness and computational efficiency of each module.
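As a rough illustration of the meta-weight idea described above — deriving per-modality fusion weights from global context rather than using a fixed arithmetic combination — the following is a minimal plain-Python sketch. It is not the authors' implementation: the pooling, the toy projection function, and all names here are hypothetical, and the real module would operate on feature maps with a learned network.

```python
import math

def softmax(logits):
    # numerically stable softmax over a small list of logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def meta_weight_fuse(f_pc, f_img, ctx_proj):
    """Hypothetical sketch of context-adaptive fusion: pool a global
    context from both modalities, map it to two logits via `ctx_proj`
    (a stand-in for a learned projection), and blend the point-cloud
    and image features with the resulting softmax weights."""
    ctx = [(a + b) / 2 for a, b in zip(f_pc, f_img)]   # elementwise context
    pooled = sum(ctx) / len(ctx)                        # global mean pool
    w_pc, w_img = softmax(ctx_proj(pooled))             # adaptive weights
    return [w_pc * a + w_img * b for a, b in zip(f_pc, f_img)]

# toy stand-in for a learned context-to-logits projection
proj = lambda s: (0.8 * s + 0.1, -0.5 * s + 0.2)
fused = meta_weight_fuse([1.0, 2.0], [3.0, 0.0], proj)
```

Because the weights are softmax-normalized, each fused element is a convex combination of the two modality features; the context-dependent logits are what let the balance shift per scene.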
Keywords: 3D object detection; multimodal; cross attention; unmanned vehicles; autonomous driving; intelligent transportation systems

Share and Cite

MDPI and ACS Style

Zhang, C.; Wang, W.; Yu, B.; Wei, H. A Pseudo-Point-Based Adaptive Fusion Network for Multi-Modal 3D Detection. Electronics 2026, 15, 59. https://doi.org/10.3390/electronics15010059


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
