Adaptive Cross-Modal Denoising: Enhancing LiDAR–Camera Fusion Perception in Adverse Circumstances
Highlights
- An Adaptive Cross-Modal Denoising (ACMD) framework is presented, introducing a reliability-driven uni-directional fusion mechanism that selectively refines the noisy modality using semantic cues from the cleaner sensor.
- A novel attention-based ABC + CMD pipeline is developed, enabling efficient noise-aware feature alignment and outperforming state-of-the-art unimodal and multimodal denoising methods across LiDAR–camera perception tasks.
- ACMD enhances the robustness of autonomous perception in adverse weather by achieving large gains in PSNR, Chamfer Distance, and Joint Denoising Effect, without adding computational burden.
- The plug-and-play ACMD design generalizes to any encoder–decoder backbone, making it suitable for deployment in real-time AV systems and for future multimodal sensing combinations (LiDAR–thermal, radar–camera).
Abstract
1. Introduction
2. Related Work
2.1. Denoising in LiDAR and Camera Data
2.2. Sensor Fusion Techniques
3. Adaptive Cross-Modal Denoising
3.1. Framework
3.2. Encoders
3.3. Adaptive Bridge Controller (ABC)
| Algorithm 1: Adaptive Bridge Controller (ABC) | | |
| [1] | Input: Self-denoised tensors and mid-level semantic tensors from both modalities | |
| [2] | Output: Semantic guidance for CMD | //Flow from clean sensor to noisier sensor |
| 1: | begin ABC module | |
| 2: | for each input pair do | |
| 3: | Common latent space | |
| 4: | project both tensors into a shared latent space | |
| 5: | Bidirectional cross-attention | //Parallel operation |
| 6: | attend LiDAR features over camera features | |
| 7: | attend camera features over LiDAR features | |
| 8: | Calculate reliability scores for both modalities | |
| 9: | r_L ← reliability score of the LiDAR features | //Reliability for LiDAR |
| 10: | r_C ← reliability score of the camera features | //Reliability for Camera |
| 11: | Δr ← r_L − r_C | //Reliability difference |
| 12: | g ← gating(Δr) | //Gating function |
| 13: | if Δr > 0 then | //LiDAR is more reliable |
| 14: | guidance mode ← “LiDAR_to_Camera” | //LiDAR guides Camera |
| 15: | semantic tensor ← LiDAR semantic tensor | //LiDAR semantic tensor as clean guide |
| 16: | noisy tensor ← camera tensor | //Camera as noisy modality |
| 17: | else | |
| 18: | guidance mode ← “Camera_to_LiDAR” | //Camera is more reliable |
| 19: | semantic tensor ← camera semantic tensor | //Use Camera semantic tensor |
| 20: | noisy tensor ← LiDAR tensor | //LiDAR is noisy |
| 21: | end if | |
| 22: | Compose the guidance (semantic flow decision) | |
| 23: | guidance ← (guidance mode, g, semantic tensor, noisy tensor) | |
| 24: | return guidance | |
| 25: | end for | |
| 26: | end ABC module | |
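The reliability-gated direction selection in Algorithm 1 can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the reliability score here is a stand-in proxy (negative feature variance), and the sigmoid gate over the reliability difference is an assumed form of the unspecified gating function.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def abc_select_guidance(feat_lidar, feat_camera):
    """Reliability-driven direction selection (sketch of Algorithm 1).

    Reliability is approximated as negative activation variance; the
    paper's actual scoring function is not reproduced here.
    """
    r_l = -float(np.var(feat_lidar))   # assumed reliability proxy, LiDAR
    r_c = -float(np.var(feat_camera))  # assumed reliability proxy, camera
    delta = r_l - r_c                  # reliability difference
    gate = sigmoid(delta)              # soft gate on guidance strength
    if delta > 0:                      # LiDAR is more reliable
        mode = "LiDAR_to_Camera"
        semantic, noisy = feat_lidar, feat_camera
    else:                              # camera is more reliable
        mode = "Camera_to_LiDAR"
        semantic, noisy = feat_camera, feat_lidar
    return mode, gate, semantic, noisy

# a low-variance (clean) LiDAR map guiding a high-variance (noisy) camera map
rng = np.random.default_rng(0)
mode, gate, _, _ = abc_select_guidance(np.zeros((8, 8)), rng.standard_normal((8, 8)))
```

Because the gate is a sigmoid of the reliability difference, guidance strength degrades smoothly as the two modalities become equally reliable, rather than switching abruptly.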
3.4. Cross-Modal Denoising (CMD)
| Algorithm 2: CMD Module | |
| [1] | Input: semantic tensor (clean modality), noisy tensor (noisier modality) |
| | Parameters: learned projection and gating weights |
| [2] | Output: Denoised features |
| 1: | procedure FORWARD(semantic tensor, noisy tensor) |
| 2: | Common latent space |
| 3: | project both tensors into a shared latent space |
| 4: | Noise-aware queries |
| 5: | Q ← projection of the noisy features # Query from noisier modality |
| 6: | Key and value matrices |
| 7: | K ← projection of the clean features # Key from clean modality |
| 8: | V ← projection of the clean features # Value from clean modality |
| 9: | Scaled dot-product |
| 10: | A ← softmax(Q·Kᵀ/√d)·V |
| 11: | Semantic gating mask |
| 12: | M ← gating mask from the clean semantic features |
| 13: | Amplified attention |
| 14: | A′ ← A ⊙ M |
| 15: | Residual fusion |
| 16: | F ← noisy features + A′ |
| 17: | GDFN |
| 18: | return GDFN(F) |
| 19: | end procedure |
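The CMD forward pass above can be sketched in NumPy as below. All projection matrices (`W_q`, `W_k`, `W_v`, `W_g`) are random stand-ins for learned weights, the GDFN is omitted, and the two modalities are assumed to share token count and feature width; this is an illustrative sketch of the attention flow, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cmd_denoise(noisy, clean):
    """Cross-modal denoising forward pass (sketch of Algorithm 2).

    noisy, clean: (tokens, dim) feature maps, same shape assumed.
    """
    rng = np.random.default_rng(0)
    d = noisy.shape[-1]
    # random stand-ins for the learned projection / gating weights
    W_q, W_k, W_v, W_g = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    Q = noisy @ W_q                               # queries from noisier modality
    K = clean @ W_k                               # keys from clean modality
    V = clean @ W_v                               # values from clean modality
    A = softmax((Q @ K.T) / np.sqrt(d)) @ V       # scaled dot-product attention
    M = 1.0 / (1.0 + np.exp(-(clean @ W_g)))      # semantic gating mask in (0, 1)
    return noisy + A * M                          # gated attention + residual fusion

rng = np.random.default_rng(1)
clean = rng.standard_normal((16, 32))
noisy = clean + 0.5 * rng.standard_normal((16, 32))
out = cmd_denoise(noisy, clean)
```

The residual connection keeps the noisy modality's own content intact; the gated attention term only adds clean-modality guidance where the semantic mask permits it.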
3.5. Decoder
3.6. Training Strategy and Loss Function
4. Experiment and Analysis
4.1. Experimental Setup
4.2. WeatherKITTI Dataset
4.3. Noise Model
4.4. Metrics
4.4.1. Peak Signal-to-Noise Ratio (PSNR)
4.4.2. Chamfer Distance (CD)
4.4.3. Joint Denoising Effect (JDE)
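PSNR and Chamfer Distance follow their standard definitions; a minimal NumPy sketch is given below. JDE, the paper's combined denoising score, is not reproduced here since its exact formulation is specific to this work.

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB (standard definition)."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Brute-force pairwise distances; fine for small clouds, use a KD-tree
    for real LiDAR scans.
    """
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # (N, M)
    return float(np.sqrt(d2.min(axis=1)).mean() + np.sqrt(d2.min(axis=0)).mean())
```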
4.5. Performance on WeatherKITTI Dataset
4.6. Ablation Experiments
4.7. Comparative Experiments
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| ABC | Adaptive Bridge Controller |
| ACMD | Adaptive Cross-Modal Denoising |
| AV | Autonomous Vehicle |
| CD | Chamfer Distance |
| CMD | Cross-Modal Denoising |
| GDFN | Gated Depthwise Feed-Forward Network |
| InfoNCE | Information Noise-Contrastive Estimation |
| JDE | Joint Denoising Effect |
| Lcm | Cross-Modal Denoising Loss |
| Lcon | Contrastive Loss |
| Lrec | Reconstruction Loss |
| LN | Layer Normalization |
| mIoU | Mean Intersection over Union |
| MLP | Multi-Layer Perceptron |
| PSNR | Peak Signal-to-Noise Ratio |
References
- Li, Y.; Zeng, F.; Lai, R.; Wu, T.; Guan, J.; Zhu, A.; Zhu, Z. TinyFusionDet: Hardware-Efficient LiDAR-Camera Fusion Framework for 3D Object Detection at Edge. IEEE Trans. Circuits Syst. Video Technol. 2025, 35, 8819–8834. [Google Scholar] [CrossRef]
- Park, J.I.; Jo, S.; Seo, H.T.; Park, J. LiDAR Denoising Methods in Adverse Environments: A Review. IEEE Sens. J. 2025, 25, 7916–7932. [Google Scholar] [CrossRef]
- Ding, J.; An, P.; Xu, Y.; Ma, J. Neighborhood Structure Consistency for Point Cloud Registration With Severe Outliers. IEEE Sens. J. 2025, 25, 20209–20223. [Google Scholar] [CrossRef]
- Zhang, P.; Fang, X.; Zhang, Z.; Fang, X.; Liu, Y.; Zhang, J. Horizontal multi-party data publishing via discriminator regularization and adaptive noise under differential privacy. Inf. Fusion 2025, 120, 103046. [Google Scholar] [CrossRef]
- Bi, T.; Li, X.; Chen, W.; Ma, Z.; Yu, R.; Xu, L. A probability density functions convolution based analytical detection probability model for LiDAR with pulse peak discriminator. Measurement 2025, 242, 115904. [Google Scholar] [CrossRef]
- Mushtaq, H.; Latif, S.; Ilyas, M.S.B.; Mohsin, S.M.; Ali, M. CLS-3D: Content-wise LiDAR-Camera Fusion and Slot Reweighting Transformer for 3D Object Detection in Autonomous Vehicles. IEEE Access 2025, 13, 69840–69856. [Google Scholar] [CrossRef]
- Wu, H.; Li, Y.; Xu, W.; Kong, F.; Zhang, F. Moving event detection from LiDAR point streams. Nat. Commun. 2024, 15, 345, Erratum in Nat. Commun. 2024, 15, 1994. [Google Scholar] [CrossRef]
- Zhi, P.; Xu, X.; Nie, H.; Yong, B.; Shen, J.; Zhou, Q. DefDeN: A Deformable Denoising-Based LiDAR and Camera Feature Fusion Model for 3D Object Detection. Tsinghua Sci. Technol. 2026, 31, 760–776. [Google Scholar] [CrossRef]
- Hu, M.; Mao, J.; Li, J.; Wang, Q.; Zhang, Y. A novel LiDAR signal denoising method based on convolutional autoencoding deep learning neural network. Atmosphere 2021, 12, 1403. [Google Scholar] [CrossRef]
- Zhang, W.; Ling, M. An improved point cloud denoising method in adverse weather conditions based on PP-LiteSeg network. PeerJ Comput. Sci. 2024, 10, e1832. [Google Scholar] [CrossRef]
- Zhao, X.; Wen, C.; Wang, Y.; Bai, H.; Dou, W. TripleMixer: A 3D Point Cloud Denoising Model for Adverse Weather. arXiv 2024, arXiv:2408.13802. [Google Scholar]
- Cai, H.; Zhang, Z.; Zhou, Z.; Li, Z.; Ding, W.; Zhao, J. BEVFusion4D: Learning LiDAR-Camera Fusion under Bird’s-Eye-View via Cross-Modality Guidance and Temporal Aggregation. arXiv 2023, arXiv:2303.17099. [Google Scholar]
- Li, X.; Fan, B.; Tian, J.; Fan, H. GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 21209–21218. [Google Scholar]
- Jiao, Y.; Jie, Z.; Chen, S.; Chen, J.; Ma, L.; Jiang, Y.G. MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 21643–21652. [Google Scholar]
- Yu, K.; Tao, T.; Xie, H.; Lin, Z.; Liang, T.; Wang, B.; Chen, P.; Hao, D.; Wang, Y.; Liang, X. Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 3188–3198. [Google Scholar]
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. arXiv 2016. [Google Scholar] [CrossRef]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. arXiv 2017. [Google Scholar] [CrossRef]
- Shi, S.; Wang, X.; Li, H. PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud. arXiv 2018. [Google Scholar] [CrossRef]
- Zhou, Y.; Tuzel, O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. arXiv 2017. [Google Scholar] [CrossRef]
- Yang, Z.; Sun, Y.; Liu, S.; Shen, X.; Jia, J. STD: Sparse-to-Dense 3D Object Detector for Point Cloud. arXiv 2019. [Google Scholar] [CrossRef]
- Mohapatra, S.; Yogamani, S.; Gotzig, H.; Milz, S.; Mader, P. BEVDetNet: Bird’s eye view LiDAR point cloud based real-time 3D object detection for autonomous driving. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 2809–2815. [Google Scholar]
- Zhou, J.; Tan, X.; Shao, Z.; Ma, L. FVNet: 3D front-view proposal generation for real-time object detection from point clouds. In Proceedings of the 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Suzhou, China, 19–21 October 2019; pp. 1–8. [Google Scholar]
- Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5728–5739. [Google Scholar]
- Wei, Z.; Chen, H.; Nan, L.; Wang, J.; Qin, J.; Wei, M. PathNet: Path-selective point cloud denoising. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 4426–4442. [Google Scholar] [CrossRef] [PubMed]
- Zeng, H.; Feng, K.; Zhao, X.; Cao, J.; Huang, S.; Luong, H.; Philips, W. Degradation-noise-aware deep unfolding transformer for hyperspectral image denoising. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5507112. [Google Scholar] [CrossRef]
- Niu, T.; Xu, Z.; Luo, H.; Zhou, Z. Process Regression with Temporal Feature Extraction for Partially Interpretable Remaining Useful Life Interval Prediction in Aeroengine Prognostics. Sci. Rep. 2025, 15, 11057. [Google Scholar] [CrossRef] [PubMed]
- Wang, J.; Lin, C.; Nie, L.; Huang, S.; Zhao, Y.; Pan, X.; Ai, R. Weatherdepth: Curriculum contrastive learning for self-supervised depth estimation under adverse weather conditions. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 4976–4982. [Google Scholar]
- Lebrun, M. An analysis and implementation of the BM3D image denoising method. Image Process. On Line 2012, 2, 175–213. [Google Scholar] [CrossRef]
- Rakotosaona, M.J.; La Barbera, V.; Guerrero, P.; Mitra, N.J.; Ovsjanikov, M. Pointcleannet: Learning to denoise and remove outliers from dense point clouds. Comput. Graph. Forum 2020, 39, 185–203. [Google Scholar] [CrossRef]





| Configuration | Name | Specific Information |
|---|---|---|
| Hardware | CPU | Intel(R) Core i7-12700K |
| | GPU | NVIDIA GeForce RTX 3090 |
| | VRAM | 24 GB |
| | RAM | 32 GB |
| Software | Operating system | Windows 11 |
| | Python | 3.1 |
| | CUDA | 13.0 |
| | cuDNN | 9.16.0 |
| Experiment | Self-Denoising | ABC | CMD | PSNR (dB) | CD (mm) | JDE (%) | Inference Time (ms) | Model Size (M) |
|---|---|---|---|---|---|---|---|---|
| Baseline | ✗ | ✗ | ✗ | 30.0 | 4.10 | 81.1 | 119.2 | 8.0 |
| SD + CMD | ✓ | ✗ | ✓ | 32.6 ↑ | 3.55 ↓ | 88.5 ↑ | 78.0 ↓ | 9.0 ↑ |
| ABC + CMD | ✗ | ✓ | ✓ | 33.5 ↑ | 3.46 ↓ | 91.5 ↑ | 70.6 ↓ | 9.2 ↑ |
| ACMD | ✓ | ✓ | ✓ | 35.9 ↑ | 2.80 ↓ | 97.3 ↑ | 49.4 ↓ | 9.5 ↑ |
| Models | PSNR (dB) | CD (mm) | Joint Denoising Effect (%) | Fused (mIoU) |
|---|---|---|---|---|
| BM3D | 28.0 | - | - | - |
| PointCleanNet | - | 4.20 | - | - |
| BEVFusion | 30.0 | 4.10 | 81.1 | 77.2 |
| GAFusion | 30.5 | 3.95 | 83.3 | 71.3 |
| MSMDFusion | 31.1 | 4.15 | 84.9 | 74.6 |
| TripleMixer | 32.4 | 3.80 | 89.5 | 78.3 |
| CMD (Ours) | 35.9 | 2.80 | 97.3 | 87.6 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Ghaffar, M.A.; Zhang, K.; Pan, N.; Peng, L. Adaptive Cross-Modal Denoising: Enhancing LiDAR–Camera Fusion Perception in Adverse Circumstances. Sensors 2026, 26, 408. https://doi.org/10.3390/s26020408

