Multi-Agent Sensor Fusion Methodology Using Deep Reinforcement Learning: Vehicle Sensors to Localization
Abstract
1. Introduction
2. Connected and Autonomous Vehicles Background
2.1. Vehicle Sensors
2.2. Vehicular Connectivity
- 5G: The fifth generation of mobile networks, offering data transfer speeds from 1 to 10 Gbps with latencies as low as 1 ms [14], thus supporting real-time applications.
- Cloud computing: All connected devices leveraging online processing and storage resources.
- V2X: Encompasses all vehicle communications with “anything” (vehicle-to-infrastructure, vehicle-to-vehicle, vehicle-to-device, vehicle-to-grid, and vehicle-to-cloud).
2.3. CARLA Simulator and CarAware Framework
3. Deep Reinforcement Learning Background
3.1. Overview
3.2. Curriculum Learning
4. Collective Perception Methodology
4.1. Training Setup
- Actor Network: An MLP with three fully connected layers, whose hidden-layer width was selected experimentally and whose output dimension is 2 (the coordinates x, y). The first two layers use ReLU activation; the last uses none. Inputs and outputs are normalized to prevent early weight overfitting. During training, actions are drawn from a multivariate Gaussian centered on the network output; during evaluation, the mean is used directly.
- Critic Network: An MLP with three fully connected layers, whose single scalar output is the value function V(s). The first two layers use ReLU activation; the last uses none.
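As a concrete illustration, the two networks above could be sketched in PyTorch as follows. The hidden width `H`, the input dimension `OBS_DIM`, and the use of a learned log standard deviation are assumptions made here for illustration; the paper only fixes the output sizes and activations.

```python
import math

import torch
import torch.nn as nn

H = 64        # hypothetical hidden-layer width ("selected experimentally")
OBS_DIM = 4   # hypothetical normalized sensor-input dimension

class Actor(nn.Module):
    """Three fully connected layers; ReLU on the first two, none on the last."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, H), nn.ReLU(),
            nn.Linear(H, H), nn.ReLU(),
            nn.Linear(H, 2),                  # 2 outputs: coordinates (x, y)
        )
        # Learned log-std, initialized from the "Initial Deviation" value 0.7.
        self.log_std = nn.Parameter(torch.full((2,), math.log(0.7)))

    def forward(self, obs):
        return self.net(obs), self.log_std.exp()

class Critic(nn.Module):
    """Same shape, but with a single scalar output: the state value V(s)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, H), nn.ReLU(),
            nn.Linear(H, H), nn.ReLU(),
            nn.Linear(H, 1),
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

actor, critic = Actor(), Critic()
obs = torch.zeros(8, OBS_DIM)                 # a batch of normalized inputs
mean, std = actor(obs)
action = torch.distributions.Normal(mean, std).sample()  # training-time sampling
value = critic(obs)                           # evaluation would use `mean` directly
```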
- Start a simulation episode, spawning vehicles and assigning automatic agents.
- Store the tuple (state, action, reward) for each transition.
- Calculate advantage estimates using GAE (Equation (2)).
- Divide the complete horizon of data into stochastically sampled mini-batches and feed them to the actor–critic network, repeating for the number of passes defined by the hyperparameter “Epoch Number”. The training process optimizes the parameters of the actor (θ) and critic (ϕ) networks via the Adam optimizer, based on the Clipped Surrogate Objective loss function.
- Repeat the previous steps until all episodes have been completed (defined in “Episodes Number”).
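The advantage estimation and clipped objective used in the loop above can be sketched as follows, assuming PyTorch; γ, λ, and ε match the hyperparameter table, but the actual CarAware implementation may differ in detail.

```python
import torch

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one horizon of transitions."""
    adv = torch.zeros_like(rewards)
    last = 0.0
    for t in reversed(range(len(rewards))):
        # Bootstrap with the next state's value, or 0 at the horizon's end.
        next_v = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_v * (1 - dones[t]) - values[t]
        last = delta + gamma * lam * (1 - dones[t]) * last
        adv[t] = last
    return adv

def clipped_surrogate_loss(new_logp, old_logp, adv, eps=0.2):
    """PPO clipped surrogate objective, negated so a minimizer can be used."""
    ratio = torch.exp(new_logp - old_logp)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * adv
    return -torch.min(unclipped, clipped).mean()

# Toy horizon of two transitions, just to show the call pattern.
rewards = torch.tensor([1.0, 1.0])
values = torch.tensor([0.0, 0.0])
dones = torch.tensor([0.0, 0.0])
adv = gae(rewards, values, dones)
loss = clipped_surrogate_loss(torch.zeros(2), torch.zeros(2), adv)
```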
4.2. Training Methodology
5. Results and Discussions
5.1. Scenario 1: Town 02—Localization with All Sensors and No Blackout Events
5.2. Scenario 2: Town 02—Localization with Eventual GNSS Blackout Events
5.3. Scenario 3: Town 02—Localization with Eventual IMU, SAS/WO Blackout Events
5.4. Scenario 4: Town 01—Localization with All Sensors and No Blackout Events
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Aria, M. A Survey of Self-driving Urban Vehicles Development. IOP Conf. Ser. Mater. Sci. Eng. 2019, 662, 042006. [Google Scholar] [CrossRef]
- Chougule, A.; Chamola, V.; Sam, A.; Yu, F.R.; Sikdar, B. A Comprehensive Review on Limitations of Autonomous Driving and Its Impact on Accidents and Collisions. IEEE Open J. Veh. Technol. 2024, 5, 142–161. [Google Scholar] [CrossRef]
- Hwang, S.; Lee, K.; Jeon, H.; Kum, D. Autonomous Vehicle Cut-In Algorithm for Lane-Merging Scenarios via Policy-Based Reinforcement Learning Nested Within Finite-State Machine. IEEE Trans. Intell. Transp. Syst. 2022, 23, 17594–17606. [Google Scholar] [CrossRef]
- Cheng, J.; Ju, M.; Zhou, M.; Liu, C.; Gao, S.; Abusorrah, A.; Jiang, C. A Dynamic Evolution Method for Autonomous Vehicle Groups in a Highway Scene. IEEE Internet Things J. 2022, 9, 1445–1457. [Google Scholar] [CrossRef]
- García Cuenca, L.; Puertas, E.; Fernandez Andrés, J.; Aliane, N. Autonomous Driving in Roundabout Maneuvers Using Reinforcement Learning with Q-Learning. Electronics 2019, 8, 1536. [Google Scholar] [CrossRef]
- Sidauruk, A.; Ikmah. Congestion Correlation and Classification from Twitter and Waze Map Using Artificial Neural Network. In Proceedings of the 2018 3rd International Conference on Information Technology, Information System and Electrical Engineering (ICITISEE), Yogyakarta, Indonesia, 13–14 November 2018; pp. 224–229. [Google Scholar] [CrossRef]
- Shang, W.; Song, X.; Xiang, Q.; Chen, H.; Elhajj, M.; Bi, H.; Wang, K.; Ochieng, W. The impact of deep reinforcement learning-based traffic signal control on emission reduction in urban road networks empowered by cooperative vehicle-infrastructure systems. Appl. Energy 2025, 390, 125884. [Google Scholar] [CrossRef]
- Fang, S.; Yang, L.; Shang, W.; Zhao, X.; Li, F.; Ochieng, W. Cooperative Control Model Using Reinforcement Learning for Connected and Automated Vehicles and Traffic Signal Light at Signalized Intersections. IEEE Internet Things J. 2025, 2, 44037–44050. [Google Scholar] [CrossRef]
- Mahtani, A.; Sanchez, L.; Fernandez, E.; Martinez, A.; Joseph, L. ROS Programming: Building Powerful Robots; Packt Publishing: Birmingham, UK, 2018. [Google Scholar]
- Rosique, F.; Navarro, P.J.; Fernández, C.; Padilla, A. A Systematic Review of Perception System and Simulators for Autonomous Vehicles Research. Sensors 2019, 19, 648. [Google Scholar] [CrossRef] [PubMed]
- Vargas, J.; Alsweiss, S.; Toker, O.; Razdan, R.; Santos, J. An Overview of Autonomous Vehicles Sensors and Their Vulnerability to Weather Conditions. Sensors 2021, 21, 5397. [Google Scholar] [CrossRef] [PubMed]
- Ignatious, H.A.; Hesham-El-Sayed; Khan, M. An overview of sensors in Autonomous Vehicles. Procedia Comput. Sci. 2022, 198, 736–741. [Google Scholar] [CrossRef]
- Araújo, T.O.; Netto, M.L.; Justo, J.F. CarAware: A Deep Reinforcement Learning Platform for Multiple Autonomous Vehicles Based on CARLA Simulation Framework. In Proceedings of the 2023 8th International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS), Nice, France, 14–16 June 2023; pp. 1–6. [Google Scholar] [CrossRef]
- Jansen, M.; Beaton, P. 5G vs. 4G: How Does the Newest Network Improve on the Last? 2022. Available online: http://www.digitaltrends.com/mobile/5g-vs-4g/ (accessed on 15 December 2025).
- SAE. V2X Communications Message Set Dictionary; SAE: London, UK, 2020. [Google Scholar]
- Dosovitskiy, A.; Ros, G.; Codevilla, F.; López, A.M.; Koltun, V. CARLA: An Open Urban Driving Simulator. arXiv 2017. [Google Scholar] [CrossRef]
- Naeem, M.; Rizvi, S.T.H.; Coronato, A. A Gentle Introduction to Reinforcement Learning and its Application in Different Fields. IEEE Access 2020, 8, 209320–209344. [Google Scholar] [CrossRef]
- Alharin, A.; Doan, T.N.; Sartipi, M. Reinforcement Learning Interpretation Methods: A Survey. IEEE Access 2020, 8, 171058–171077. [Google Scholar] [CrossRef]
- OpenAI. Part 2: Kinds of RL Algorithms; OpenAI: San Francisco, CA, USA, 2022. [Google Scholar]
- Wang, T.; Bao, X.; Clavera, I.; Hoang, J.; Wen, Y.; Langlois, E.; Zhang, S.; Zhang, G.; Abbeel, P.; Ba, J. Benchmarking Model-Based Reinforcement Learning. arXiv 2019. [Google Scholar] [CrossRef]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017. [Google Scholar] [CrossRef]
- Wang, X.; Chen, Y.; Zhu, W. A Survey on Curriculum Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4555–4576. [Google Scholar] [CrossRef] [PubMed]
- Khaitan, S.; Dolan, J.M. State Dropout-Based Curriculum Reinforcement Learning for Self-Driving at Unsignalized Intersections. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022. [Google Scholar] [CrossRef]
- Berner, C.; Brockman, G.; Chan, B.; Cheung, C.; Dębiak, P.; Dennison, C.; Farhi, D.; Fischer, Q.; Hashme, S.; Hesse, C.; et al. Dota 2 with Large Scale Deep Reinforcement Learning. arXiv 2019. [Google Scholar] [CrossRef]
- Kirk, R.; Zhang, A.; Grefenstette, E.; Rocktäschel, T. A Survey of Generalisation in Deep Reinforcement Learning. arXiv 2021. [Google Scholar] [CrossRef]
- Li, Z. A Hierarchical Autonomous Driving Framework Combining Reinforcement Learning and Imitation Learning. In Proceedings of the 2021 International Conference on Computer Engineering and Application (ICCEA), Kunming, China, 25–27 June 2021; pp. 395–400. [Google Scholar] [CrossRef]
| Feature | Ultrasonic | RADAR | LiDAR | Camera |
|---|---|---|---|---|
| Primary Technology | Sound wave | Radio wave | Laser beam | Light |
| Range | ∼5 m | ∼250 m | ∼200 m | ∼200 m |
| Operating Frequency | 40–70 kHz | 24, 74 or 79 GHz | 193 or 331 THz | 272–1498 THz |
| Affected by Weather | Yes | No | Yes | Yes |
| Affected by Lighting | No | No | No | Yes |
| Size | Small | Small | Big | Small |
| Detects Speed | Poor | Very Good | Good | Poor |
| Resolution | Poor | Average | Good | Very Good |
| Detects Distance | Good | Very Good | Good | Poor |
| Interference Susceptibility | Good | Poor | Good | Very Good |
| Field of View | Poor | Average | Very Good | Good |
| Accuracy | Poor | Average | Very Good | Very Good |
| Frame Rate | Average | Average | Average | Good |
| Colour Perception | Poor | Poor | Poor | Very Good |
| Maintenance | Average | Poor | Poor | Average |
| Visibility | Poor | Poor | Average | Good |
| Price | Good | Average | Poor | Average |
| Hyperparameter | Value |
|---|---|
| Learning Rate | 0.0001 |
| Learning Rate Decay | 1 |
| GAE Discount Factor | 0.99 |
| GAE Lambda | 0.95 |
| Value Loss Scale Factor | 1 |
| Initial Deviation | 0.7 |
| Entropy Scale | 0.01 |
| PPO Epsilon | 0.2 |
| Horizon Number | 32,768 |
| Batch Size | 2048 |
| Epoch Number | 4 |
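Gathered into a single configuration mapping, the hyperparameters above would look like the following sketch (the key names are illustrative, not necessarily those used by the framework):

```python
# PPO hyperparameters from the table, as one configuration mapping.
PPO_CONFIG = {
    "learning_rate": 1e-4,
    "learning_rate_decay": 1.0,   # 1 => no decay
    "gae_discount_factor": 0.99,  # gamma in the GAE recursion
    "gae_lambda": 0.95,
    "value_loss_scale": 1.0,
    "initial_deviation": 0.7,     # initial policy standard deviation
    "entropy_scale": 0.01,
    "ppo_epsilon": 0.2,           # clipping range of the surrogate objective
    "horizon": 32768,             # transitions collected before each update
    "batch_size": 2048,
    "epochs": 4,                  # passes over the horizon per update
}

# With these values, each epoch splits the horizon into 16 mini-batches.
minibatches_per_epoch = PPO_CONFIG["horizon"] // PPO_CONFIG["batch_size"]
```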
| Step | Vehicle | Restart | GNSS Error | Blackout | Town |
|---|---|---|---|---|---|
| 1 | Single | No | High | None | 01 |
| 2 | Single | No | High | None | 02 |
| 3 | Single | Yes | High | None | 01/02 |
| 4 | Single | Yes | High | None | 02 |
| 5 | Single | Yes | High | None | 02 |
| 6 | Single | Yes | Low | None | 02 |
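The Scenario 1 curriculum above can be encoded as the kind of step list a training driver might iterate over; the field names and value encodings here are illustrative, since the paper presents this only as a table.

```python
# Scenario 1 curriculum: single vehicle throughout, no blackout events.
CURRICULUM_SCENARIO_1 = [
    {"step": 1, "vehicle": "single", "restart": False, "gnss_error": "high", "blackout": None, "town": "01"},
    {"step": 2, "vehicle": "single", "restart": False, "gnss_error": "high", "blackout": None, "town": "02"},
    {"step": 3, "vehicle": "single", "restart": True,  "gnss_error": "high", "blackout": None, "town": "01/02"},
    {"step": 4, "vehicle": "single", "restart": True,  "gnss_error": "high", "blackout": None, "town": "02"},
    {"step": 5, "vehicle": "single", "restart": True,  "gnss_error": "high", "blackout": None, "town": "02"},
    {"step": 6, "vehicle": "single", "restart": True,  "gnss_error": "low",  "blackout": None, "town": "02"},
]

for stage in CURRICULUM_SCENARIO_1:
    # Each stage would reconfigure the simulator before training resumes;
    # here we only illustrate the iteration order over curriculum steps.
    pass
```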
| Step | Vehicle | Restart | GNSS Error | Blackout | Town |
|---|---|---|---|---|---|
| 1 | Single | No | High | None | 01 |
| 2 | Single | No | High | None | 02 |
| 3 | Single | Yes | High | None | 01/02 |
| 4 | Single | Yes | High | None | 02 |
| 5 | Single | Yes | High | GNSS | 02 |
| 6 | Single | Yes | Low | GNSS | 02 |
| Step | Vehicle | Restart | GNSS Error | Blackout | Town |
|---|---|---|---|---|---|
| 1 | Single | No | High | None | 01 |
| 2 | Single | No | High | None | 02 |
| 3 | Single | Yes | High | None | 01/02 |
| 4 | Single | Yes | High | None | 02 |
| 5 | Single | Yes | High | IMU/SAS/WO | 02 |
| 6 | Single | Yes | Low | IMU/SAS/WO | 02 |
| Step | Vehicle | Restart | GNSS Error | Blackout | Town |
|---|---|---|---|---|---|
| 1 | Single | No | High | None | 01 |
| 2 | Single | No | High | None | 02 |
| 3 | Single | Yes | High | None | 01/02 |
| 4 | Single | Yes | High | None | 01 |
| 5 | Single | Yes | Low | None | 01 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Araújo, T.O.; Netto, M.L.; Francisco Justo, J. Multi-Agent Sensor Fusion Methodology Using Deep Reinforcement Learning: Vehicle Sensors to Localization. Sensors 2026, 26, 1105. https://doi.org/10.3390/s26041105

