Multi-Agent Cooperative Control of CAVs in Toll Plaza Diverging Areas: A Target-Path Approach
Abstract
1. Introduction
- (1) A MAPPO-based cooperative control framework is developed for CAVs in mixed traffic flow at toll plaza diverging areas. Unlike existing hierarchical or lane-based decision formulations for structured road environments, the proposed method is tailored to cooperative target toll lane selection and maneuvering under weak lane constraints.
- (2) Using a high-fidelity simulation environment, the state and reward functions are customized for the weakly constrained diverging scenario. The design incorporates path accessibility, queue conditions, surrounding vehicle distribution, and steering-related motion characteristics, which enables the learned policy to better balance traffic efficiency and safety in complex multi-vehicle interactions.
2. Methodology
3. Simulation Platform Establishment
3.1. Accessible Path Perception
3.1.1. Accessible Diverging Path Generation
3.1.2. Perception Based on Path
- Vehicle-related information: It comprises three categories: (i) dynamic kinematic states, including the vehicle’s longitudinal and lateral positions, longitudinal and lateral velocities, and longitudinal acceleration; (ii) static attributes, including the vehicle’s toll collection type and initial lane; and (iii) surrounding vehicle indicators, which flag the presence of other vehicles in predefined surrounding zones.
- Path-related information: For each candidate path at each time step, it includes the longitudinal available moving distance, the lateral moving magnitude, and the queue length, where each path is indexed by its toll lane number.
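
The two information groups above can be held in a small record type. The following is a minimal sketch, not the authors' implementation: the class names, field names, units, and the `shortest_queue_lane` helper are hypothetical, while the toll type encoding (0 = MTC, 1 = ETC), the four surrounding zones, and the sign of the lateral magnitude follow the variable table of this article.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PathInfo:
    """Path-related information for one candidate path (one toll lane)."""
    available_distance: float  # longitudinal available moving distance (metres assumed)
    lateral_magnitude: float   # positive = leftward, negative = rightward
    queue_length: int          # number of vehicles queued on this path


@dataclass
class Perception:
    """Vehicle-related and path-related information of one vehicle at one time step."""
    x: float           # longitudinal position
    y: float           # lateral position
    vx: float          # longitudinal velocity
    vy: float          # lateral velocity
    ax: float          # longitudinal acceleration
    toll_type: int     # 0 = MTC vehicle, 1 = ETC vehicle
    initial_lane: int  # mainline lane before entering the diverging area
    # Zone name -> occupied flag for "left", "right", "right_behind", "left_behind".
    zone_occupied: Dict[str, bool] = field(default_factory=dict)
    # Toll lane number -> path information for that candidate path.
    paths: Dict[int, PathInfo] = field(default_factory=dict)

    def shortest_queue_lane(self) -> int:
        """Hypothetical helper: toll lane with the fewest queued vehicles.

        This only illustrates how the record can be queried; it is not
        the decision rule used in the article.
        """
        return min(self.paths, key=lambda lane: self.paths[lane].queue_length)


# Illustrative values only, not data from the article.
sample = Perception(
    x=35.0, y=5.2, vx=14.0, vy=0.3, ax=-0.4,
    toll_type=1, initial_lane=2,
    zone_occupied={"left": False, "right": True,
                   "right_behind": False, "left_behind": False},
    paths={4: PathInfo(90.0, 1.8, 3), 5: PathInfo(90.0, -1.7, 1)},
)
best_lane = sample.shortest_queue_lane()
```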
3.2. Dynamic Toll Lane Decision
3.3. Car-Following Model Considering Lateral Offsets
4. Multi-Agent Cooperative Decision Model
4.1. Action Space
4.2. State Space
- Local observation space: During the decentralized execution phase, each agent perceives the environment solely through its own sensors and forms a local observation. This allows any differences in the vehicles’ traffic performance to be attributed solely to their respective control strategies or human behavioral models. Specifically, the local observation includes the vehicle’s ego state, surrounding vehicle information, and path-related information.
- Global state space: During the centralized training phase, the critic network takes the global state information as input to accurately estimate the expected joint return of the agents, enabling the learning of cooperative policies. Consequently, the global state is defined as:
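
As a rough illustration of the split between local observations and the global state under CTDE, the sketch below flattens each agent's ego state, surrounding vehicle indicators, and path features into one vector and then builds a critic input. The function names and sample numbers are hypothetical, and concatenating the local observations in a fixed agent order is an assumption (a common CTDE convention); the article's own definition of the global state may differ.

```python
from typing import List, Sequence


def local_observation(
    ego: Sequence[float],
    surround: Sequence[int],
    paths: Sequence[Sequence[float]],
) -> List[float]:
    """Flatten ego state, zone indicators, and per-path features into one vector."""
    obs = [float(v) for v in ego]
    obs += [float(flag) for flag in surround]
    for path in paths:
        # Each path contributes (available distance, lateral magnitude, queue length).
        obs += [float(v) for v in path]
    return obs


def global_state(observations: Sequence[Sequence[float]]) -> List[float]:
    """ASSUMPTION: the critic input is the concatenation of all local observations."""
    state: List[float] = []
    for obs in observations:
        state.extend(obs)
    return state


# Two agents, each with five ego values, four zone flags, and two candidate paths.
obs_a = local_observation(
    [12.0, 3.5, 15.0, 0.2, -0.5], [0, 1, 0, 0],
    [[80.0, 1.5, 2], [80.0, -2.0, 0]],
)
obs_b = local_observation(
    [20.0, 7.0, 13.0, -0.1, 0.3], [1, 0, 0, 1],
    [[72.0, 0.5, 1], [72.0, -3.0, 4]],
)
state = global_state([obs_a, obs_b])
```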
4.3. Reward Function
4.3.1. Traffic Efficiency Reward
4.3.2. Traffic Safety Reward
4.4. MAPPO Training Framework
5. Simulation Experiments
5.1. Data Collection and Processing
5.2. Model Setup
5.2.1. Simulation Platform Setup
5.2.2. MAPPO Algorithm Configuration
6. Simulation Results and Analysis
6.1. Benchmark Implementation
6.2. Performance Evaluation
6.3. Comparative Analysis
- Traffic volume sensitivity test: The length of the diverging area was fixed at 140 m, while traffic volumes were set to 1500, 1750, and 2000 veh/h.
- Geometric sensitivity test: With the traffic volume fixed at 1500 veh/h, the lengths of the diverging area were set to 120, 140, 160, and 180 m.
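
The two sensitivity tests above amount to a small scenario grid. A minimal sketch of enumerating it follows; the `Scenario` type and the function name are hypothetical, while the traffic volumes and diverging area lengths are those listed above.

```python
from typing import List, NamedTuple


class Scenario(NamedTuple):
    test: str                # "volume" or "geometry"
    volume_veh_h: int        # traffic volume in veh/h
    diverging_length_m: int  # diverging area length in m


def sensitivity_scenarios() -> List[Scenario]:
    """Enumerate the settings of both sensitivity tests."""
    # Traffic volume test: length fixed at 140 m.
    volume = [Scenario("volume", v, 140) for v in (1500, 1750, 2000)]
    # Geometric test: volume fixed at 1500 veh/h.
    geometry = [Scenario("geometry", 1500, d) for d in (120, 140, 160, 180)]
    return volume + geometry


scenarios = sensitivity_scenarios()
# The (1500 veh/h, 140 m) setting appears in both tests, so it only needs one run.
unique_settings = {(s.volume_veh_h, s.diverging_length_m) for s in scenarios}
```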
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| CAVs | Connected and autonomous vehicles |
| MAPPO | Multi-agent proximal policy optimization |
| CTDE | Centralized training and decentralized execution |
| MARL | Multi-agent reinforcement learning |
| PDA | Perception-Decision-Action |
| ETC | Electronic toll collection |
| MTC | Manual toll collection |
| FVD | Full velocity difference |
| MADDPG | Multi-agent deep deterministic policy gradient |
| QMIX | Monotonic mixing network |
| PPO | Proximal policy optimization |
| GAE | Generalized advantage estimation |
| TD | Temporal difference |
| UAV | Unmanned aerial vehicle |
| MLP | Multilayer perceptron |
| HVs | Human-driven vehicles |
| ETTC | Extended time-to-collision |

| Category | Description |
|---|---|
| Vehicle-related variables | Longitudinal position of SV at the current time step. |
| Vehicle-related variables | Lateral position of SV at the current time step. |
| Vehicle-related variables | Velocity of SV in the X direction at the current time step. |
| Vehicle-related variables | Velocity of SV in the Y direction at the current time step. |
| Vehicle-related variables | Longitudinal acceleration of SV at the current time step. |
| Vehicle-related variables | Current toll collection type of SV (0 = MTC vehicle, 1 = ETC vehicle). |
| Vehicle-related variables | Initial lane of SV before it enters the diverging area. |
| Vehicle-related variables | Presence of another vehicle in the left area at the current time step (1 = yes, 0 = no). |
| Vehicle-related variables | Presence of another vehicle in the right area at the current time step (1 = yes, 0 = no). |
| Vehicle-related variables | Presence of another vehicle in the right-behind area at the current time step (1 = yes, 0 = no). |
| Vehicle-related variables | Presence of another vehicle in the left-behind area at the current time step (1 = yes, 0 = no). |
| Path-related variables | Available longitudinal distance on the path at the current time step. |
| Path-related variables | Required steering magnitude for selecting the path at the current time step (positive: leftward turn; negative: rightward turn). |
| Path-related variables | Number of vehicles queued on the path at the current time step. |

| Mainline lane ID | ETC vehicle count | MTC vehicle count |
|---|---|---|
| 1 | 115 | 29 |
| 2 | 202 | 54 |
| 3 | 122 | 106 |
| Total | 439 | 189 |

| Toll lane ID | Toll type | Vehicle count |
|---|---|---|
| 1 | ETC | 165 |
| 2 | ETC | 128 |
| 3 | ETC | 94 |
| 4 | ETC | 42 |
| 5 | ETC | 10 |
| 6 | MTC | 94 |
| 7 | MTC | 69 |
| 8 | MTC | 26 |

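
The counts above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the tabulated values; the variable names are hypothetical. It sums the counts by toll type on both the mainline and toll lane sides and derives the ETC share of the sample.

```python
# Vehicle counts by mainline lane ID and toll type, as tabulated above.
mainline = {
    1: {"ETC": 115, "MTC": 29},
    2: {"ETC": 202, "MTC": 54},
    3: {"ETC": 122, "MTC": 106},
}

# Toll lane ID -> (toll type, vehicle count), as tabulated above.
toll_lanes = {
    1: ("ETC", 165), 2: ("ETC", 128), 3: ("ETC", 94), 4: ("ETC", 42),
    5: ("ETC", 10), 6: ("MTC", 94), 7: ("MTC", 69), 8: ("MTC", 26),
}

# Totals by toll type on the mainline side.
mainline_totals = {
    toll: sum(lane[toll] for lane in mainline.values()) for toll in ("ETC", "MTC")
}

# Totals by toll type on the toll lane side.
toll_totals = {"ETC": 0, "MTC": 0}
for toll, count in toll_lanes.values():
    toll_totals[toll] += count

total_vehicles = sum(mainline_totals.values())
etc_share = mainline_totals["ETC"] / total_vehicles

# Every vehicle counted on the mainline should also be counted at a toll lane.
counts_consistent = mainline_totals == toll_totals
```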
| Parameters | Values | Parameters | Values |
|---|---|---|---|
| Number of hidden layers | 2 | Actor learning rate | 0.001 |
| Number of units per layer | 256 | Critic learning rate | 0.001 |
| Entropy coefficient | 0.1 | Batch size | 128 |
| Discount coefficient | 0.98 | Buffer size | 20,000 |
| coefficient | 0.2 | | |
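
For reuse in training scripts, the settings in the table above can be collected in a single configuration object. This is a minimal sketch under stated assumptions: the class, field, and method names are hypothetical, the networks are assumed to be plain MLPs, and the last table entry is kept under a neutral name because its symbol is missing from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MappoConfig:
    """Hyperparameters copied from the table above (field names are hypothetical)."""
    hidden_layers: int = 2
    units_per_layer: int = 256
    entropy_coef: float = 0.1
    discount: float = 0.98
    actor_lr: float = 0.001
    critic_lr: float = 0.001
    batch_size: int = 128
    buffer_size: int = 20_000
    # The symbol of this coefficient is missing in the table, so its role is not asserted here.
    unnamed_coef: float = 0.2

    def mlp_sizes(self, input_dim: int, output_dim: int) -> List[int]:
        """Layer widths of an MLP with the configured hidden layers (assumed architecture)."""
        return [input_dim] + [self.units_per_layer] * self.hidden_layers + [output_dim]


cfg = MappoConfig()
# Input and output sizes here are illustrative, not taken from the article.
actor_layers = cfg.mlp_sizes(input_dim=15, output_dim=8)
```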
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Long, S.; Zheng, L.; Fei, Y. Multi-Agent Cooperative Control of CAVs in Toll Plaza Diverging Areas: A Target-Path Approach. Actuators 2026, 15, 267. https://doi.org/10.3390/act15050267
