Infra-3DRC-FusionNet: Deep Fusion of Roadside Mounted RGB Mono Camera and Three-Dimensional Automotive Radar for Traffic User Detection
Abstract
1. Introduction
Contributions
- A novel deep fusion architecture is proposed that fuses RGB mono-camera images and 3D radar point cloud data to enhance frame-wise object detection in the 3D ground plane for the smart infrastructure-based sensor setup.
- A 3D radar-based region proposal generator, inspired by [16], is developed to improve object detection under poor lighting and adverse weather conditions.
- Experiments are conducted both at the feature level (separate backbones for each sensor) and at the low level (all six camera and radar channels stacked into a single backbone) to identify the best-performing model configuration; a minimal sketch of the two variants follows this list.
- The model output is validated against lidar ground truth data and benchmarked against four variants of frame-wise object-level (high-level) spatial fusion under various environmental conditions.
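The following minimal PyTorch sketch illustrates the two fusion variants compared in the experiments: feature-level fusion with separate ResNet18 backbones whose feature maps are merged by concatenation or element-wise addition, and low-level fusion with all six channels stacked into one backbone. It is not the authors' exact implementation; taking the last ResNet18 stage as the feature map, the random weight initialization, and the omission of the detection heads are simplifications.

```python
# Minimal sketch of the two fusion variants explored in this work (assumptions:
# features are taken from the last ResNet18 stage; detection heads are omitted).
import torch
import torch.nn as nn
from torchvision.models import resnet18


def resnet18_trunk(in_channels: int) -> nn.Sequential:
    """ResNet18 feature extractor; the first conv is replaced when the input
    has a channel count other than 3 (e.g., six stacked channels)."""
    net = resnet18(weights=None)
    if in_channels != 3:
        net.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2,
                              padding=3, bias=False)
    # keep everything up to (and including) the last residual stage
    return nn.Sequential(*list(net.children())[:-2])


class FeatureLevelFusion(nn.Module):
    """Separate backbones for the RGB image and the 3-channel radar pseudo
    image; feature maps are merged by concatenation or element-wise addition."""
    def __init__(self, merge: str = "concat"):
        super().__init__()
        self.cam_backbone = resnet18_trunk(3)
        self.radar_backbone = resnet18_trunk(3)
        self.merge = merge

    def forward(self, cam_img: torch.Tensor, radar_img: torch.Tensor) -> torch.Tensor:
        f_cam = self.cam_backbone(cam_img)       # (B, 512, H/32, W/32)
        f_rad = self.radar_backbone(radar_img)   # (B, 512, H/32, W/32)
        if self.merge == "concat":
            return torch.cat([f_cam, f_rad], dim=1)
        return f_cam + f_rad                     # element-wise addition


class LowLevelFusion(nn.Module):
    """All six input channels (RGB + radar pseudo image) stacked together and
    fed through a single ResNet18 backbone."""
    def __init__(self):
        super().__init__()
        self.backbone = resnet18_trunk(6)

    def forward(self, cam_img: torch.Tensor, radar_img: torch.Tensor) -> torch.Tensor:
        return self.backbone(torch.cat([cam_img, radar_img], dim=1))
```

In the experiments of Section 5, both designs are further varied by the optimizer, the number of re-trained backbone layers, and class-balancing weights.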
2. Related Work
3. Measurement Setup and Dataset Generation
3.1. Training and Validation Dataset
3.2. Ground Truth Data
4. Methodology
4.1. RGB Camera Image
4.2. Three-Dimensional Radar Three-Channel Pseudo Image
4.2.1. Spatial Position Encoding
4.2.2. Measurement Encoding
4.2.3. Rendering of Projected Points to Solid Circles
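As an illustration of Sections 4.2.1–4.2.3, the sketch below projects 3D radar points into the camera image plane and renders them as solid circles whose channels carry normalized radar measurements. The projection matrix, the specific quantities encoded in the three channels (range, radial velocity, and RCS are assumed here), the normalization, and the circle radius are illustrative assumptions rather than the paper's exact choices.

```python
# Sketch of building the 3-channel radar pseudo image (assumptions: P is a
# calibrated 3x4 matrix projecting radar-frame points to pixels; the three
# channels encode normalized range, radial velocity, and RCS; the circle
# radius is a free parameter).
import cv2
import numpy as np


def radar_pseudo_image(points_xyz: np.ndarray,    # (N, 3) radar points (radar frame)
                       measurements: np.ndarray,  # (N, 3) e.g. range, radial velocity, RCS
                       P: np.ndarray,             # (3, 4) radar-to-pixel projection matrix
                       img_hw: tuple,             # (height, width) of the camera image
                       radius: int = 6) -> np.ndarray:
    h, w = img_hw
    pseudo = np.zeros((h, w, 3), dtype=np.float32)

    # Spatial position encoding: project the 3D points onto the image plane.
    pts_h = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])  # (N, 4)
    uvw = (P @ pts_h.T).T                                               # (N, 3)
    valid = uvw[:, 2] > 0                        # keep points in front of the camera
    uv = uvw[valid, :2] / uvw[valid, 2:3]        # pixel coordinates

    # Measurement encoding: normalize each measurement channel to [0, 1].
    m = measurements[valid].astype(np.float32)
    m = (m - m.min(axis=0)) / (np.ptp(m, axis=0) + 1e-6)

    # Rendering: draw every projected point as a solid (filled) circle.
    for (u, v), vals in zip(uv, m):
        if 0 <= u < w and 0 <= v < h:
            cv2.circle(pseudo, (int(u), int(v)), radius,
                       color=tuple(float(c) for c in vals), thickness=-1)
    return pseudo
```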
4.3. Camera Backbone
4.4. Radar Backbone
4.5. Radar-Camera Feature Merger
4.6. Radar Point Cloud-Based Region Proposal Network
4.6.1. Region Proposal Head
4.6.2. Radar Point Cloud-Based Anchor Generator
4.6.3. Post-Processing of Anchors to Obtain the Proposals
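A minimal sketch of radar-point-based proposal generation in the spirit of RRPN [16] is given below: anchors of several sizes and aspect ratios are centered at the projected radar points, then post-processed by expanding each box by a fraction of its size (cf. the proposal expansion fraction in the hyper-parameter table) and clipping it to the image. The anchor sizes, aspect ratios, and expansion value are illustrative assumptions; the region proposal head that scores and refines these anchors is omitted.

```python
# Sketch of radar-point-based anchors and their post-processing (anchor sizes,
# aspect ratios, and the expansion fraction are placeholders).
import numpy as np


def radar_anchors(uv: np.ndarray,                  # (N, 2) projected radar points (pixels)
                  sizes=(32, 64, 128),             # anchor edge lengths in pixels (assumed)
                  ratios=(0.5, 1.0, 2.0)) -> np.ndarray:
    """Place one anchor box per (point, size, ratio), centered on the point."""
    boxes = []
    for u, v in uv:
        for s in sizes:
            for r in ratios:
                bw, bh = s * np.sqrt(r), s / np.sqrt(r)
                boxes.append([u - bw / 2, v - bh / 2, u + bw / 2, v + bh / 2])
    return np.asarray(boxes, dtype=np.float32)     # (N * len(sizes) * len(ratios), 4)


def post_process(boxes: np.ndarray, img_hw: tuple, expand_frac: float = 0.1) -> np.ndarray:
    """Expand each proposal by a fraction of its size and clip it to the image."""
    h, w = img_hw
    bw = boxes[:, 2] - boxes[:, 0]
    bh = boxes[:, 3] - boxes[:, 1]
    out = boxes.copy()
    out[:, 0] -= expand_frac * bw
    out[:, 1] -= expand_frac * bh
    out[:, 2] += expand_frac * bw
    out[:, 3] += expand_frac * bh
    out[:, [0, 2]] = out[:, [0, 2]].clip(0, w - 1)
    out[:, [1, 3]] = out[:, [1, 3]].clip(0, h - 1)
    return out
```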
4.7. 2D Bounding Box and Class Head
4.8. Radar Segmentation Mask Head
4.9. Losses
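The cited PyTorch loss functions suggest the following combination: binary cross entropy with logits for proposal objectness and the radar segmentation mask, Smooth L1 for bounding-box regression, and cross entropy for classification. The sketch below sums these terms with equal weights, which is an assumption; the actual weighting may differ.

```python
# Sketch of the multi-task loss implied by the cited PyTorch functions
# (equal weighting of the terms is an assumption).
import torch
import torch.nn.functional as F


def total_loss(obj_logits, obj_targets,       # (M,) objectness logits, float 0/1 targets
               box_deltas, box_targets,       # (M, 4) box regression deltas and targets
               cls_logits, cls_targets,       # (M, num_classes) logits, (M,) class ids
               mask_logits, mask_targets):    # (M, H, W) mask logits, float 0/1 targets
    l_obj = F.binary_cross_entropy_with_logits(obj_logits, obj_targets)
    l_box = F.smooth_l1_loss(box_deltas, box_targets)
    l_cls = F.cross_entropy(cls_logits, cls_targets)
    l_mask = F.binary_cross_entropy_with_logits(mask_logits, mask_targets)
    return l_obj + l_box + l_cls + l_mask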
5. Experiments and Results
- Mostly Detected (MD)—detected and classified correctly for or more frames.
- Partially Detected (PD)—detected and classified correctly for more than but less than of frames.
- Partially Lost (PL)—detected and classified correctly for more than but less than of frames.
- Mostly Lost (ML)—detected and classified correctly for less than of frames.
- —detected and classified correctly at least once within the first 5 consecutive frames of entry into the sensor FoV.
- —detected and classified correctly at least once within the first 10 consecutive frames of entry into the sensor FoV (a sketch of how tracks are assigned to these categories follows this list).
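The sketch below shows how a track can be assigned to these categories from its per-frame detection flags. The percentage thresholds are not reproduced in this section, so they are left as parameters rather than the values used in the paper.

```python
# Sketch of the track-level evaluation categories (t_md > t_pd > t_pl are
# placeholder thresholds, not the paper's values).
def track_category(detected_flags, t_md, t_pd, t_pl):
    """detected_flags[i] is True if the object was detected and classified
    correctly in frame i of its presence in the sensor FoV."""
    frac = sum(detected_flags) / max(len(detected_flags), 1)
    if frac >= t_md:
        return "MD"   # Mostly Detected
    if frac >= t_pd:
        return "PD"   # Partially Detected
    if frac >= t_pl:
        return "PL"   # Partially Lost
    return "ML"       # Mostly Lost


def detected_within_first_k(detected_flags, k):
    """True if the object was detected and classified correctly at least once
    within the first k consecutive frames after entering the sensor FoV."""
    return any(detected_flags[:k])
```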
Discussions
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Agrawal, S.; Elger, G. Concept of Infrastructure Based Environment Perception for IN2Lab test field for Automated Driving. In Proceedings of the 2021 IEEE International Smart Cities Conference (ISC2), Online, 7–10 September 2021; pp. 1–4.
- Creß, C.; Bing, Z.; Knoll, A.C. Intelligent Transportation Systems Using Roadside Infrastructure: A Literature Survey. IEEE Trans. Intell. Transp. Syst. 2023, 25, 6309–6327.
- Bai, Z.; Wu, G.; Qi, X.; Liu, Y.; Oguchi, K.; Barth, M.J. Infrastructure-based object detection and tracking for cooperative driving automation: A survey. In Proceedings of the 2022 IEEE Intelligent Vehicles Symposium (IV), Aachen, Germany, 4–9 June 2022; pp. 1366–1373.
- Agrawal, S.; Song, R.; Doycheva, K.; Knoll, A.; Elger, G. Intelligent Roadside Infrastructure for Connected Mobility. In Proceedings of the Smart Cities, Green Technologies, and Intelligent Transport Systems, Prague, Czech Republic, 26–28 April 2023; pp. 134–157.
- Agrawal, S.; Song, R.; Kohli, A.; Korb, A.; Andre, M.; Holzinger, E.; Elger, G. Concept of Smart Infrastructure for Connected Vehicle Assist and Traffic Flow Optimization. In Proceedings of the 8th International Conference on Vehicle Technology and Intelligent Transport Systems—VEHITS, Online, 27–29 April 2022; INSTICC. SciTePress: Setúbal, Portugal, 2022; pp. 360–367.
- Guerrero-Ibáñez, J.; Zeadally, S.; Contreras-Castillo, J. Sensor Technologies for Intelligent Transportation Systems. Sensors 2018, 18, 1212.
- Zhong, Z.; Liu, S.; Mathew, M.; Dubey, A. Camera radar fusion for increased reliability in ADAS applications. Electron. Imaging 2018, 30, 1–4.
- Yao, S.; Guan, R.; Huang, X.; Li, Z.; Sha, X.; Yue, Y.; Lim, E.G.; Seo, H.; Man, K.L.; Zhu, X.; et al. Radar-camera fusion for object detection and semantic segmentation in autonomous driving: A comprehensive review. IEEE Trans. Intell. Veh. 2023, 9, 2094–2128.
- Wei, Z.; Zhang, F.; Chang, S.; Liu, Y.; Wu, H.; Feng, Z. Mmwave radar and vision fusion for object detection in autonomous driving: A review. Sensors 2022, 22, 2542.
- Wang, L.; Zhang, Z.; Di, X.; Tian, J. A roadside camera-radar sensing fusion system for intelligent transportation. In Proceedings of the 2020 17th European Radar Conference (EuRAD), Utrecht, The Netherlands, 13–15 January 2021; pp. 282–285.
- Notz, D.; Becker, F.; Kühbeck, T.; Watzenig, D. Extraction and assessment of naturalistic human driving trajectories from infrastructure camera and radar sensors. In Proceedings of the 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE), Online, 20–21 August 2020; pp. 455–462.
- Fu, Y.; Tian, D.; Duan, X.; Zhou, J.; Lang, P.; Lin, C.; You, X. A camera–radar fusion method based on edge computing. In Proceedings of the 2020 IEEE International Conference on Edge Computing (EDGE), Beijing, China, 18–24 October 2020; pp. 9–14.
- Jibrin, F.A.; Deng, Z.; Zhang, Y. An object detection and classification method using radar and camera data fusion. In Proceedings of the 2019 IEEE International Conference on Signal, Information and Data Processing (ICSIDP), Chongqing, China, 11–13 December 2019; pp. 1–6.
- Tsaregorodtsev, A.; Buchholz, M.; Belagiannis, V. Infrastructure-based Perception with Cameras and Radars for Cooperative Driving Scenarios. In Proceedings of the 2024 IEEE Intelligent Vehicles Symposium (IV), Jeju Shinhwa World, Jeju Island, Republic of Korea, 2–5 June 2024; pp. 1678–1685.
- Garvin, J.; McVicker, M.; Mshar, A.; Williamson, J.; Alkhelaifi, Y.; Anderson, W.; Jeong, N. Sensor fusion for traffic monitoring using camera, radar, and ROS. In Proceedings of the 2022 Third International Conference on Intelligent Computing Instrumentation and Control Technologies (ICICICT), Kannur, India, 11–12 August 2022; pp. 44–50.
- Nabati, R.; Qi, H. Rrpn: Radar region proposal network for object detection in autonomous vehicles. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 3093–3097.
- Nobis, F.; Geisslinger, M.; Weber, M.; Betz, J.; Lienkamp, M. A deep learning-based radar and camera sensor fusion architecture for object detection. In Proceedings of the 2019 Sensor Data Fusion: Trends, Solutions, Applications (SDF), Bonn, Germany, 15–17 October 2019; pp. 1–7.
- Chadwick, S.; Maddern, W.; Newman, P. Distant vehicle detection using radar and vision. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 8311–8317.
- Chang, S.; Zhang, Y.; Zhang, F.; Zhao, X.; Huang, S.; Feng, Z.; Wei, Z. Spatial attention fusion for obstacle detection using mmwave radar and vision sensor. Sensors 2020, 20, 956.
- John, V.; Mita, S. Deep feature-level sensor fusion using skip connections for real-time object detection in autonomous driving. Electronics 2021, 10, 424.
- Song, Y.; Xie, Z.; Wang, X.; Zou, Y. MS-YOLO: Object detection based on YOLOv5 optimized fusion millimeter-wave radar and machine vision. IEEE Sens. J. 2022, 22, 15435–15447.
- Shuai, X.; Shen, Y.; Tang, Y.; Shi, S.; Ji, L.; Xing, G. millieye: A lightweight mmwave radar and camera fusion system for robust object detection. In Proceedings of the International Conference on Internet-of-Things Design and Implementation, Online, 18–21 May 2021; pp. 145–157.
- Jia, D.; Shi, H.; Zhang, S.; Qu, Y. ODSen: A Lightweight, Real-time, and Robust Object Detection System via Complementary Camera and mmWave Radar. IEEE Access 2024, 12, 129120–129133.
- John, V.; Mita, S. RVNet: Deep sensor fusion of monocular camera and radar for image-based obstacle detection in challenging environments. In Proceedings of the Image and Video Technology: 9th Pacific-Rim Symposium, PSIVT 2019, Sydney, NSW, Australia, 18–22 November 2019; Proceedings 9. Springer: Berlin/Heidelberg, Germany, 2019; pp. 351–364.
- Qi, C.; Song, C.; Zhang, N.; Song, S.; Wang, X.; Xiao, F. Millimeter-Wave Radar and Vision Fusion Target Detection Algorithm Based on an Extended Network. Machines 2022, 10, 675.
- Stäcker, L.; Heidenreich, P.; Rambach, J.; Stricker, D. Fusion point pruning for optimized 2d object detection with radar-camera fusion. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 3087–3094.
- Danapal, G.; Mayr, C.; Kariminezhad, A.; Vriesmann, D.; Zimmer, A. Attention Empowered Feature-level Radar-Camera Fusion for Object Detection. In Proceedings of the 2022 Sensor Data Fusion: Trends, Solutions, Applications (SDF), Bonn, Germany, 12–14 October 2022; pp. 1–6.
- Yadav, R.; Vierling, A.; Berns, K. Radar + RGB Fusion For Robust Object Detection In Autonomous Vehicle. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Online, 25–28 September 2020; pp. 1986–1990.
- Nabati, R.; Qi, H. Radar-Camera Sensor Fusion for Joint Object Detection and Distance Estimation in Autonomous Vehicles. arXiv 2020, arXiv:2009.08428. Available online: http://arxiv.org/abs/2009.08428 (accessed on 1 April 2025).
- Liu, Y.; Chang, S.; Wei, Z.; Zhang, K.; Feng, Z. Fusing mmWave radar with camera for 3-D detection in autonomous driving. IEEE Internet Things J. 2022, 9, 20408–20421.
- Meyer, M.; Kuschk, G. Deep learning based 3d object detection for automotive radar and camera. In Proceedings of the 2019 16th European Radar Conference (EuRAD), Paris, France, 2–4 October 2019; pp. 133–136.
- Nabati, R.; Qi, H. Centerfusion: Center-based radar and camera fusion for 3d object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 1527–1536.
- Bansal, K.; Rungta, K.; Bharadia, D. Radsegnet: A reliable approach to radar camera fusion. arXiv 2022, arXiv:2208.03849. Available online: http://arxiv.org/abs/2208.03849 (accessed on 1 April 2025).
- Zhu, H.; Hu, B.J.; Chen, Z. Instance Fusion for Addressing Imbalanced Camera and Radar Data. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024; pp. 1–8.
- Kalgaonkar, P.; El-Sharkawy, M. NeXtFusion: Attention-Based Camera-Radar Fusion Network for Improved Three-Dimensional Object Detection and Tracking. Future Internet 2024, 16, 114.
- Zhang, H.; Wu, K.; Chen, R.; Wu, Z.; Zhong, Y.; Li, W. TL-4DRCF: A two-level 4D radar-camera fusion method for object detection in adverse weather. IEEE Sens. J. 2024, 24, 16408–16418.
- Agrawal, S.; Bhanderi, S.; Elger, G. Semi-Automatic Annotation of 3D Radar and Camera for Smart Infrastructure-Based Perception. IEEE Access 2024, 12, 34325–34341.
- Agrawal, S.; Bhanderi, S.; Doycheva, K.; Elger, G. Static Multitarget-Based Autocalibration of RGB Cameras, 3-D Radar, and 3-D Lidar Sensors. IEEE Sens. J. 2023, 23, 21493–21505.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the CVPR09, Miami, FL, USA, 20–25 June 2009.
- Torchvision Python Package. Available online: https://pytorch.org/vision/stable/index.html (accessed on 27 September 2024).
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; Bach, F., Blei, D., Eds.; Proceedings of Machine Learning Research: New York, NY, USA, 2015; Volume 37, pp. 448–456.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. Available online: http://arxiv.org/abs/1804.02767 (accessed on 1 April 2025).
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Binary Cross Entropy Loss with Logits. Available online: https://pytorch.org/docs/stable/generated/torch.nn.functional.binary_cross_entropy_with_logits.html (accessed on 27 September 2024).
- Smooth L1 Loss. Available online: https://pytorch.org/docs/stable/generated/torch.nn.SmoothL1Loss.html (accessed on 27 September 2024).
- Binary Cross Entropy Loss. Available online: https://pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html (accessed on 27 September 2024).
- Adam Optimizer PyTorch Documentation. Available online: https://pytorch.org/docs/stable/generated/torch.optim.Adam.html (accessed on 27 September 2024).
- Agrawal, S.; Bhanderi, S. INFRA-3DRC Dataset, 2024.
Set | Person | Bicycle | Motorcycle | Car | Bus | Frames |
---|---|---|---|---|---|---|
train: easy | 670 | 1772 | 1151 | 3634 | 539 | 3232 |
train: difficult | 382 | 1259 | 277 | 1429 | 418 | 2137 |
train: ∑ | 1052 | 3031 | 1428 | 5063 | 957 | 5369 |
val: easy | 305 | 694 | 406 | 1591 | 226 | 1310 |
val: difficult | 143 | 548 | 104 | 588 | 197 | 896 |
val: ∑ | 448 | 1242 | 510 | 2179 | 423 | 2206 |
train + val ∑ | 1500 | 4273 | 1938 | 7242 | 1380 | 7575 |
Environment | Scenes | Person | Bicycle | Motorcycle | Car | Bus | Frames |
---|---|---|---|---|---|---|---|
Night + clear sky | 7 | 0 | 323 | 51 | 566 | 174 | 651 |
Night + rain | 4 | 285 | 329 | 0 | 196 | 0 | 360 |
Day + clear sky | 7 | 579 | 577 | 0 | 547 | 0 | 719 |
Day + rain | 10 | 959 | 504 | 53 | 627 | 82 | 921 |
∑ | 28 | 1823 | 1733 | 104 | 1936 | 256 | 2651 |
Exp No. | Description | Avg F1-Score |
---|---|---|
1 | Separate backbones for camera and radar; SGD (0.01); feature concatenation; ResNet18 with only the last three layers re-trained | |
2 | Separate backbones for camera and radar; SGD (0.01); feature concatenation; ResNet18 with all five layers re-trained | |
3 | Separate backbones for camera and radar; SGD (0.01); feature element-wise addition; ResNet18 with all five layers re-trained | |
4 | Separate backbones for camera and radar; SGD (0.01); feature element-wise addition; ResNet18 with all five layers re-trained; classes balanced by adding weights | |
5 | Separate backbones for camera and radar; Adam (0.001); feature element-wise addition; ResNet18 with all five layers re-trained; classes balanced by adding weights | |
6 | Separate backbones for camera and radar; SGD (0.01); feature concatenation; ResNet18 with all five layers re-trained; classes balanced by adding weights | |
7 | One backbone (ResNet18 residual blocks) with all six input channels stacked together; SGD (0.01); only the last three layers re-trained | |
8 | One backbone (ResNet18 residual blocks) with all six input channels stacked together; SGD (0.01); all five layers re-trained; classes balanced by adding weights | |
Hyper-Parameter | Value |
---|---|
Batch size | 4 |
Optimizer | SGD |
Learning rate | |
Gradient clipping | 10 |
LR scheduler | CosineAnnealingLR |
Epochs | 100 |
Classes | 5 |
Camera image size (h, w, c) | |
Radar pseudo image size (h, w, c) | |
Proposal expansion fraction | |
uv binarization threshold | |
Feature merging | Concatenation/element-wise addition |
Camera feature upsampling | TransposeConv |
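A minimal training-loop sketch wiring up the settings listed above (SGD, CosineAnnealingLR, gradient clipping at 10, 100 epochs, batch size 4) is given below. The learning rate is not reproduced in the table, so it is passed as a parameter; `model` and `train_loader` are placeholders for the fusion network and dataset loader.

```python
# Sketch of the training setup implied by the hyper-parameter table
# (lr is a placeholder; the model is assumed to return its total loss
# when called in training mode).
import torch


def train(model, train_loader, lr, epochs=100, grad_clip=10.0, device="cuda"):
    model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    for _ in range(epochs):
        for batch in train_loader:                         # batch size 4, as in the table
            optimizer.zero_grad()
            loss = model(*[x.to(device) for x in batch])   # returns the summed loss
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
            optimizer.step()
        scheduler.step()
```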
Object Type | ||||||
---|---|---|---|---|---|---|
Object-level fusion—Faster-RCNN and DBSCAN | ||||||
person | ||||||
bicycle | ||||||
motorcycle | ||||||
car | ||||||
bus | ||||||
Object-level fusion—YOLOv8x and DBSCAN | ||||||
person | ||||||
bicycle | ||||||
motorcycle | ||||||
car | ||||||
bus | ||||||
Object-level fusion—Faster-RCNN and Radar objects | ||||||
person | ||||||
bicycle | ||||||
motorcycle | ||||||
car | ||||||
bus | ||||||
Object-level fusion—YOLOv8x and Radar objects | ||||||
person | ||||||
bicycle | ||||||
motorcycle | ||||||
car | ||||||
bus | ||||||
Deep fusion (ours) | ||||||
person | ||||||
bicycle | ||||||
motorcycle | ||||||
car | ||||||
bus |
Fusion Type | Precision | Recall | F1-Score |
---|---|---|---|
Faster-RCNN and DBSCAN | | | |
YOLOv8x and DBSCAN | | | |
Faster-RCNN and radar objects | | | |
YOLOv8x and radar objects | | | |
Deep fusion (ours) | 0.92 | 0.78 | 0.85 |