Enhancing Mixed Traffic Stability with TD3-Driven Bilateral Control in Autonomous Vehicle Chains
Abstract
1. Introduction
- Theoretical Contribution: This study extends the existing TD3-BCM framework to heterogeneous mixed traffic involving both AVs and HDVs. By introducing the AV penetration rate as a key parameter, we derive a chain stability criterion and identify the critical penetration threshold required for stable operation. On this basis, we develop a multi-dimensional evaluation framework covering stability, traffic efficiency, safety, and energy conservation, enriching the theoretical foundation of mixed traffic flow control.
- Methodological Contribution: We construct a TD3-BCM control model suitable for heterogeneous vehicle platoons by embedding AV–HDV interaction states into the state–action–reward formulation of the DRL framework. Compared to previous BCM-DRL models designed for homogeneous AV scenarios, our approach demonstrates superior policy generalization and disturbance suppression, making it better suited for dynamic control tasks in real-world AV–HDV mixed environments.
- Empirical Validation: Leveraging reconstructed NGSIM I-80 trajectory data and extensive numerical simulations, we evaluate the performance of TD3-BCM across varying AV penetration rates. The results indicate that, compared to conventional unidirectional CACC and fixed-gain BCM models, TD3-BCM significantly reduces the amplitude and frequency of traffic oscillations, improves speed stability and throughput, and concurrently lowers fuel consumption and collision risk. These findings confirm the model’s adaptability and practical relevance in future mixed traffic scenarios.
2. Related Work
2.1. Traffic Oscillations and Sustainability Impacts
2.2. Evolution of Control Models: From Car-Following Model to Bilateral Control Model
2.3. Reinforcement Learning in Mixed Traffic Control
3. Methodology
3.1. Introduction to Traditional Bilateral Control Models
- Acceleration of vehicle n at time t.
- Distances between vehicle n and its leading and following vehicles, respectively.
- Speeds of the leading, current, and following vehicles.
- Target cruising speed of vehicle n.
- Accelerations of the leading vehicle and vehicle n, respectively.

- Acceleration of the n-th vehicle at time t.
- Desired distance, with parameter values taken from [36].
- Actual distance between the n-th vehicle and the preceding vehicle.
- Desired speed, typically set as the target speed or road speed limit.
- Actual speed of the n-th vehicle at time t.
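As a rough illustration of how the quantities listed above combine, the sketch below implements a linear bilateral control law in the spirit of Horn's bilateral control. It is a minimal sketch under stated assumptions, not the paper's exact formulation: the names `BcmGains` and `bcm_acceleration` and all gain values are hypothetical, and the acceleration-feedback term is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BcmGains:
    """Hypothetical feedback gains; placeholder values, not the paper's."""
    k_d: float = 0.1   # distance feedback gain (1/s^2)
    k_v: float = 0.5   # velocity feedback gain (1/s)
    k_c: float = 0.05  # target-velocity feedback gain (1/s)


def bcm_acceleration(d_lead, d_follow, v_lead, v_self, v_follow, v_target,
                     gains=BcmGains()):
    """Commanded acceleration from information ahead of and behind the vehicle.

    d_lead / d_follow: gaps to the leading and following vehicles (m).
    v_lead / v_self / v_follow: speeds of leader, ego, follower (m/s).
    v_target: target cruising speed (m/s).
    """
    # Pull toward the midpoint between leader and follower.
    spacing_term = gains.k_d * (d_lead - d_follow)
    # Match the relative speed ahead against the relative speed behind.
    speed_term = gains.k_v * ((v_lead - v_self) - (v_self - v_follow))
    # Weak attraction to the target cruising speed.
    cruise_term = gains.k_c * (v_target - v_self)
    return spacing_term + speed_term + cruise_term


if __name__ == "__main__":
    # Larger gap ahead than behind: the vehicle accelerates to re-centre.
    print(bcm_acceleration(30.0, 20.0, 15.0, 15.0, 15.0, 15.0))
```

With equal gaps, equal speeds, and the ego vehicle at the target speed, all three terms vanish, which is the equilibrium such a controller is meant to hold.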
3.2. TD3-Driven Bilateral Control Model
3.2.1. Core Mechanisms of TD3
- (a) Actor-Critic Architecture
- Tail Vehicle (Actor 1 and Critic 1): The tail vehicle operates under a unilateral car-following model. Its state vector is fed into Actor 1, which comprises two hidden layers with 32 neurons each and outputs the acceleration action. Critic 1 receives the same state vector together with that action and estimates the corresponding Q-value using a 32 × 32 hidden structure, as shown in Figure 2.
- Middle Vehicles (Actor 2 and Critic 2): Middle vehicles adopt a bilateral control structure, extending their state vector to incorporate information from both preceding and following vehicles. Actor 2 consists of two hidden layers with 64 neurons each and outputs the acceleration action. Critic 2 estimates the Q-value from the same state and action input, using a 64 × 64 hidden-layer architecture, as shown in Figure 3.
- (b) Target Policy Smoothing
- (c) Clipped Double Q-Learning of TD3
- Two independent critic networks.
- Next state and next action.
- Discount factor, balancing short- and long-term returns, typically 0.95 or 0.99 [46].
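The two mechanisms named above, target policy smoothing and clipped double Q-learning, can be sketched in a few lines. This is an illustrative sketch of the standard TD3 target, not the paper's implementation: the function names, the noise settings (`noise_std`, `noise_clip`), and the acceleration bounds (`a_min`, `a_max`) are assumed placeholder values, and the actor and critics are passed in as plain callables.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               a_min=-3.0, a_max=3.0, rng=random):
    """Bootstrapped TD3 target for one transition.

    target_actor(state) -> action; target_q*(state, action) -> Q-value.
    All default numbers are assumptions for illustration only.
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, a_min, a_max)
    # Clipped double Q-learning: use the smaller of the two critic estimates.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    # No bootstrapping past a terminal state.
    return reward + gamma * (1.0 - float(done)) * q_min


if __name__ == "__main__":
    y = td3_target(1.0, None, False,
                   target_actor=lambda s: 1.0,
                   target_q1=lambda s, a: 10.0,
                   target_q2=lambda s, a: 8.0,
                   gamma=0.9, noise_std=0.0)
    print(y)
```

Taking the minimum of the two critics biases the target downward, which counteracts the overestimation that a single critic tends to accumulate.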
3.2.2. TD3-BCM Formulation
- (a) State Space
- Current speed of the middle vehicle.
- Distance from the middle vehicle to its preceding vehicle.
- Relative speed between the middle vehicle and its preceding vehicle.
- Current speed of the vehicle following the middle vehicle.
- Distance from the middle vehicle to its following vehicle.
- Current speed of the tail vehicle.
- Distance from the tail vehicle to its preceding vehicle.
- Relative speed between the tail vehicle and its preceding vehicle.
- (b) Action Space
- (c) Reward Function
- Safe speed, computed to avoid collisions.
- Speed of the preceding vehicle.
- Reaction time [36].
- Maximum deceleration [56].
- Penalty weight for unsafe speeds.
- (d) Policy Update
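The safe-speed term of the reward function depends on the preceding vehicle's speed, the reaction time, and the maximum deceleration. The sketch below shows one common kinematic way to compute such a bound and turn it into a penalty. The closed form used here is an assumption (the paper's own expression is not reproduced in this text), and the function names and default values for `reaction_time`, `max_decel`, and `weight` are hypothetical.

```python
import math


def safe_speed(gap, v_lead, reaction_time, max_decel):
    """Largest follower speed that still allows stopping behind the leader.

    Assumed kinematic condition: the follower travels v*tau during the
    reaction time and then brakes at b, the leader brakes at b immediately:
        v*tau + v**2 / (2*b) = gap + v_lead**2 / (2*b)
    Solving the quadratic for v gives the expression below.
    """
    b, tau = max_decel, reaction_time
    return -b * tau + math.sqrt((b * tau) ** 2 + v_lead ** 2
                                + 2.0 * b * max(gap, 0.0))


def unsafe_speed_penalty(v, gap, v_lead, reaction_time=1.0, max_decel=3.0,
                         weight=1.0):
    """Reward term: zero at or below the safe speed, linear penalty above it."""
    excess = max(0.0, v - safe_speed(gap, v_lead, reaction_time, max_decel))
    return -weight * excess


if __name__ == "__main__":
    print(safe_speed(gap=20.0, v_lead=15.0, reaction_time=1.0, max_decel=3.0))
    print(unsafe_speed_penalty(v=25.0, gap=20.0, v_lead=15.0))
```

A linear penalty is only one option; a quadratic penalty on the excess speed would punish large violations more strongly.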
3.3. Training and Evaluation of TD3-BCM
3.3.1. Training and Test Data
3.3.2. Training and Test Steps
3.3.3. Parameters and Training Results
4. Simulation Setup and Performance Evaluation
4.1. Simulation Setup
4.2. Stability Results
4.2.1. Theoretical Stability Analysis
- AV penetration rate.
- Transfer function for HDVs.
- Transfer function for AVs.
- Stability of the AV sub-chain, defined as the inverse of the AV sub-chain’s speed.
- Stability of the HDV sub-chain, defined as the inverse of the HDV sub-chain’s speed.
- Stability of the boundary between the AV and HDV sub-chains, equal to the reciprocal of the speed of the boundary vehicles (AV and HDV).
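To make the role of the penetration rate and the two transfer functions concrete, the sketch below evaluates a simplified head-to-tail criterion at a single frequency. It assumes, purely for illustration, that disturbances pass through the chain as a cascade, so the average per-vehicle amplification is `g_av**p * g_hdv**(1 - p)`, where `g_av` and `g_hdv` are the transfer-function magnitudes and `p` is the AV penetration rate. This cascade assumption ignores the backward coupling of bilateral control and is not the paper's derivation; all names are hypothetical.

```python
import math


def chain_gain_per_vehicle(p, g_av, g_hdv):
    """Geometric-mean disturbance amplification per vehicle at one frequency."""
    return (g_av ** p) * (g_hdv ** (1.0 - p))


def critical_penetration(g_av, g_hdv):
    """Smallest p with g_av**p * g_hdv**(1 - p) <= 1.

    Returns 0.0 if HDVs alone already attenuate, and None if the AVs do not
    attenuate, in which case no penetration rate stabilises the chain.
    Both magnitudes must be positive.
    """
    if g_hdv <= 1.0:
        return 0.0
    if g_av >= 1.0:
        return None
    # Solve p*ln(g_av) + (1 - p)*ln(g_hdv) = 0 for p.
    return math.log(g_hdv) / (math.log(g_hdv) - math.log(g_av))


if __name__ == "__main__":
    p_star = critical_penetration(g_av=0.8, g_hdv=1.1)
    print(p_star, chain_gain_per_vehicle(p_star, 0.8, 1.1))
```

In practice the criterion would have to hold over all frequencies, so the binding threshold is the largest value of the critical penetration across the frequency range of interest.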
4.2.2. Cumulative Damping Ratio
- Cumulative damping ratio.
- Acceleration of the n-th vehicle at time t.
- Acceleration of the lead vehicle at time t.
- T denotes the total simulation time.
- Scenario 1: The ratio decreases from 0.534 at vehicle 2 to 0.338 at vehicle 30, but with noticeable fluctuations, indicating that the unidirectional feedback of CACC limits its ability to suppress disturbances, leading to persistent oscillations across the chain.
- Scenario 2: The ratio drops more rapidly, reaching 0.060 at vehicle 30, demonstrating improved disturbance suppression with bidirectional feedback. However, residual oscillations persist due to the lack of adaptive adjustment, limiting overall stability.
- Scenario 3: The damping ratio further reduces to 0.048 at vehicle 30, highlighting the effectiveness of TD3-BCM in dynamically mitigating disturbances and enhancing overall stability compared to traditional BCM.
- Scenario 4: With an increased AV penetration rate, the ratio declines to 0.035 at vehicle 30, reflecting improved coordination among AVs and more efficient disturbance absorption across the vehicle chain.
- Scenario 5: The damping ratio reaches its lowest level, below 0.030 at vehicle 30, indicating near elimination of traffic oscillations. This confirms that higher AV penetration combined with TD3-BCM maximizes stability and minimizes disturbance propagation.
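The damping ratios reported above compare each vehicle's acceleration with that of the lead vehicle over the simulation horizon. A minimal sketch of one such measure follows, assuming the ratio is defined as the quotient of the L2 norms of the two sampled acceleration series. That definition is an assumption made for illustration, and `cumulative_damping_ratio` is a hypothetical name.

```python
import math


def cumulative_damping_ratio(accel_n, accel_lead):
    """L2-norm ratio of vehicle n's acceleration to the lead vehicle's.

    Both inputs are equally sampled acceleration series over the same
    simulation window. A value below 1 means the disturbance has been
    attenuated by the time it reaches vehicle n.
    """
    if len(accel_n) != len(accel_lead):
        raise ValueError("acceleration series must have the same length")
    lead_energy = sum(a * a for a in accel_lead)
    if lead_energy == 0.0:
        raise ValueError("lead vehicle shows no acceleration disturbance")
    return math.sqrt(sum(a * a for a in accel_n) / lead_energy)


if __name__ == "__main__":
    lead = [1.0, -1.0, 1.0, -1.0]
    follower = [0.5, -0.5, 0.5, -0.5]
    print(cumulative_damping_ratio(follower, lead))
```

Because the sampling interval cancels in the quotient, the result does not depend on the time step as long as both series share it.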
4.2.3. Traffic Dynamics
4.3. Performance Results
4.3.1. Efficiency
- In Scenario 1, using the CACC model, there is significant fluctuation in the time gaps, especially at the head of the vehicle chain, with a mean time gap of 2.64 s and a standard deviation of 1.44 s. This indicates that, due to the limitations of unidirectional control, the spacing between vehicles fluctuates considerably, leading to unstable traffic flow.
- In Scenario 2, after the introduction of the BCM model with bidirectional feedback, fluctuations are reduced, and the mean time gap decreases to 2.1 s with a reduced standard deviation, indicating improved traffic stability.
- In Scenario 3, with the introduction of TD3-BCM, the stability of time gaps improves further. The average time gap for vehicles 2 to 10 stabilizes around 2 s, with a standard deviation reduced to 0.34 s, ensuring smoother traffic flow.
- In Scenario 4, as AV penetration increases, the mean time gap for vehicles beyond the 15th position decreases to 1.94 s, with a standard deviation below 0.2 s, indicating that TD3-BCM retains its advantage at higher penetration rates.
- In Scenario 5, with further increases in AV penetration, the mean time gap reaches its lowest level across the vehicle chain. The average time gap for mid- and tail-end vehicles remains between 1.84 s and 0.80 s, with a standard deviation falling below 0.036 s, demonstrating the superior performance of TD3-BCM in high AV penetration scenarios.
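The time-gap statistics quoted above can be reproduced from trajectory samples with a short helper. The sketch assumes the time gap is the bumper-to-bumper gap divided by the follower's speed and that near-stationary samples are excluded; both choices, the `min_speed` cutoff, and the function names are assumptions rather than the paper's stated procedure.

```python
import statistics


def time_gaps(gaps_m, speeds_mps, min_speed=0.1):
    """Per-sample time gap (s): spacing divided by the follower's speed.

    Samples at or below min_speed are skipped, since the time gap is
    undefined for a stationary follower.
    """
    return [g / v for g, v in zip(gaps_m, speeds_mps) if v > min_speed]


def summarize(values):
    """Mean and sample standard deviation (needs at least two samples)."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    gaps = [20.0, 30.0, 41.0]
    speeds = [10.0, 15.0, 20.0]
    print(summarize(time_gaps(gaps, speeds)))
```

`statistics.pstdev` could be substituted if the population standard deviation is preferred over the sample estimate.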
4.3.2. Safety
- Distance between the vehicle and its preceding vehicle at time t.
- Relative velocity between vehicles.
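With the two quantities above, time-to-collision (TTC) is the gap divided by the closing speed. The sketch below computes TTC and, as an assumed aggregate in the spirit of the extended TTC measures of Minderhoud and Bovy, the total time spent below a TTC threshold. The function names and the choice of aggregate are illustrative assumptions, not necessarily the statistic tabulated in the paper.

```python
import math


def time_to_collision(gap, closing_speed):
    """TTC in seconds; infinite when the follower is not closing in.

    closing_speed = follower speed minus leader speed (m/s).
    """
    return gap / closing_speed if closing_speed > 0.0 else math.inf


def time_exposed_ttc(gaps, closing_speeds, threshold, dt):
    """Total time (s) during which TTC is below the threshold."""
    return sum(dt for g, dv in zip(gaps, closing_speeds)
               if time_to_collision(g, dv) < threshold)


if __name__ == "__main__":
    gaps = [20.0, 10.0, 5.0]
    closing = [5.0, 5.0, 5.0]
    print([time_to_collision(g, dv) for g, dv in zip(gaps, closing)])
    print(time_exposed_ttc(gaps, closing, threshold=3.0, dt=0.1))
```

Returning infinity for non-closing situations keeps those samples out of every threshold count without a separate filter.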
4.3.3. Energy Savings
- Empirical regression coefficients derived from real-world data [66].
- Acceleration of the n-th vehicle at time t.
- Speed of the n-th vehicle at time t.
- In Scenario 1, using the CACC model, fuel consumption remains the highest across the vehicle chain. The lead vehicle’s average consumption is 0.99 mL/s, with trailing vehicles showing significant fluctuations, and a standard deviation of 0.073 mL/s. This indicates substantial inefficiencies due to inconsistent driving behavior.
- In Scenario 2, the introduction of bidirectional control with the BCM model reduces fluctuations, leading to a moderate decrease in fuel consumption. The fuel consumption of the mid- and tail-end vehicles stabilizes at approximately 0.88 mL/s, with a standard deviation below 0.05 mL/s, indicating an improvement in traffic efficiency.
- In Scenario 3, after deploying TD3-BCM, fuel consumption further decreases and stabilizes at 0.85 mL/s, showing the advantage of TD3-BCM in optimizing fuel consumption.
- In Scenario 4, with the increase in AV penetration, fuel consumption decreases further to approximately 0.82 mL/s, with a standard deviation below 0.04 mL/s. This shows that TD3-BCM continues to demonstrate better stability and lower fuel consumption in higher penetration scenarios.
- In Scenario 5, with further increases in AV penetration, fuel consumption reaches its lowest level across the vehicle chain. The average consumption of mid- and tail-end vehicles remains between 0.78 mL/s and 0.80 mL/s, with a standard deviation falling below 0.036 mL/s, highlighting the superior performance of TD3-BCM in high AV penetration scenarios.
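The fuel figures above come from a regression model driven by instantaneous speed and acceleration. A minimal sketch of that kind of model is given below, assuming a VT-Micro-style form in which the consumption rate is the exponential of a polynomial in speed and acceleration. The functional form is an assumption, the coefficient matrix is supplied by the caller (the values from [66] are not reproduced here), and `vt_micro_rate` is a hypothetical name.

```python
import math


def vt_micro_rate(speed, accel, coeffs):
    """Instantaneous rate exp(sum_ij K[i][j] * speed**i * accel**j).

    coeffs[i][j] multiplies speed**i * accel**j. Units of speed, accel and
    of the returned rate follow whatever coefficient set is supplied.
    Models of this type usually keep separate coefficient sets for
    accelerating and decelerating regimes; the caller chooses which to pass.
    """
    exponent = sum(
        k_ij * speed ** i * accel ** j
        for i, row in enumerate(coeffs)
        for j, k_ij in enumerate(row)
    )
    return math.exp(exponent)


def mean_rate(speeds, accels, coeffs):
    """Average rate over a sampled trajectory."""
    rates = [vt_micro_rate(v, a, coeffs) for v, a in zip(speeds, accels)]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Placeholder coefficients for demonstration only, not calibrated values.
    demo = [[-0.5, 0.2], [0.01, 0.0]]
    print(mean_rate([10.0, 12.0], [0.5, -0.3], demo))
```

The exponential form guarantees a positive rate for any input, which is one reason this structure is popular for instantaneous fuel and emission regressions.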
5. Conclusions and Discussion
5.1. Conclusions
5.2. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
TD3 | Twin Delayed Deep Deterministic Policy Gradient |
BCM | Bilateral Control Model |
CFM | Car-Following Model |
AVs | Autonomous Vehicles |
HDVs | Human-Driven Vehicles |
TTC | Time-To-Collision |
IDM | Intelligent Driver Model |
References
- Chen, X.; Sun, D.; Li, Y. A future intelligent traffic system with mixed autonomous vehicles and human-driven vehicles. Inf. Sci. 2020, 529, 59–72.
- Guo, Q.; Ban, X.J.; Aziz, H.M.A. Mixed traffic flow of human driven vehicles and automated vehicles on dynamic transportation networks. Transp. Res. Part C Emerg. Technol. 2021, 128, 103159.
- Zhou, Y.; Wang, M.; Zhang, H. Study on mixed traffic of autonomous vehicles and human-driven vehicles with different cyber interaction approaches. Veh. Commun. 2022, 33, 100550.
- Li, Y.; Zhang, H.; Wang, M. Traffic breakdown probability estimation for mixed flow of autonomous vehicles and human driven vehicles. Sensors 2023, 23, 3486.
- Wang, J.; Li, K.; Wu, G. Optimizing mixed traffic flow: Longitudinal control of connected and automated vehicles to mitigate traffic oscillations. IEEE Trans. Intell. Transp. Syst. 2022, 23, 4001–4012.
- Sun, M. A day-to-day dynamic model for mixed traffic flow of autonomous vehicles and inertial human-driven vehicles. Transp. Res. Part E Logist. Transp. Rev. 2023, 173, 103113.
- Zhang, Y.; Wang, X.; Li, Z. Car-following behavior of human-driven vehicles in mixed-flow traffic: A driving simulator study. Transp. Res. Rec. 2023, 2677, 1–12.
- Bang, S.; Ahn, S. Mixed traffic of connected and autonomous vehicles and human-driven vehicles: Traffic evolution and control using spring-mass-damper system. Transp. Res. Rec. 2019, 2673, 1–10.
- Ge, J.; Orosz, G. Connected cruise control design in mixed traffic flow consisting of human-driven and automated vehicles. Transp. Res. Part C Emerg. Technol. 2018, 95, 445–459.
- Wang, Y.; Jiang, Y.; Wu, Y.; Yao, Z. Cooperative driving in mixed-flow traffic of connected vehicles and human-driven vehicles: A state estimation approach. Expert Syst. Appl. 2023, 235, 121275.
- Yao, Z.; Luo, R.; Gu, Q.; Xu, T. Analysis of linear internal stability for mixed traffic flow of connected and automated vehicles considering multiple influencing factors. Phys. A Stat. Mech. Its Appl. 2022, 597, 127200.
- Li, K.; Zhang, Y.; Wang, X. A survey of lateral stability criterion and control application for autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 2023, 24, 4567–4580.
- Pan, X.; Li, H.; Zhang, M. The impacts of connected autonomous vehicles on mixed traffic flow: A comprehensive review. Phys. A Stat. Mech. Its Appl. 2023, 635, 129454.
- Ding, H.; Pan, H.; Bai, H.; Zheng, X.; Chen, J. Driving strategy of connected and autonomous vehicles based on multiple preceding vehicles state estimation in mixed vehicular traffic. Phys. A Stat. Mech. Its Appl. 2022, 596, 127154.
- Sun, M.; Li, Y.; Wang, J. Trajectory planning and control of autonomous vehicles for static vehicle avoidance in dynamic traffic environments. IEEE Trans. Intell. Transp. Syst. 2023, 24, 1234–1245.
- Zhou, Y.; Wang, M.; Zhang, H. A survey on urban traffic control under mixed traffic environment with connected automated vehicles. Transp. Res. Part C Emerg. Technol. 2023, 145, 103902.
- Zhu, J.; Easa, S.; Gao, K. Merging control strategies of connected and autonomous vehicles at freeway on-ramps: A comprehensive review. J. Intell. Connect. Veh. 2022, 5, 15–30.
- Kim, S.; Lee, J.; Park, H. Active lane management and control using connected and automated vehicles in a mixed traffic environment. Transp. Res. Part C Emerg. Technol. 2022, 139, 103648.
- Zhao, C.; Yu, H.; Molnar, T.G. Safety-critical traffic control by connected automated vehicles. Transp. Res. Part C Emerg. Technol. 2023, 154, 104230.
- Ozioko, E.F.; Kunkel, J.; Stahl, F. Road Intersection Coordination Scheme for Mixed Traffic (Human-Driven and Driverless Vehicles): A Systematic Review. J. Adv. Transp. 2022, 2022, 2951999.
- Guo, L.; Jia, Y. Bilateral Adaptation of Longitudinal Control of Automated Vehicles and Human Drivers. IEEE Trans. Intell. Transp. Syst. 2023, 24, 5663–5671.
- Shi, T.; Ai, Y.; ElSamadisy, O.; Abdulhai, B. Bilateral deep reinforcement learning approach for better-than-human car-following. In Proceedings of the IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022; pp. 3986–3992.
- Xie, J.; Liu, Y.; Chen, N. Two-Sided Deep Reinforcement Learning for Dynamic Mobility-on-Demand Management with Mixed Autonomy. Transp. Sci. 2022, 56, 1123–1144.
- Poudel, B.; Li, W.; Li, S. Carl: Congestion-aware reinforcement learning for imitation-based perturbations in mixed traffic control. In Proceedings of the IEEE 14th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Copenhagen, Denmark, 16–19 July 2024; pp. 7–14.
- Liu, K.; Jiao, P.; Hong, W.; Chen, Y. Bilateral Control Model for Autonomous Vehicles Based on Deep Reinforcement Learning. IEEE Trans. Intell. Transp. Syst. 2025, 26, 6216–6230.
- Treiber, M.; Hennecke, A.; Helbing, D. Congested traffic states in empirical observations and microscopic simulations. Phys. Rev. E 2000, 62, 1805.
- Kerner, B.S.; Lieu, H. The physics of traffic: Empirical freeway pattern features, engineering applications, and theory. Phys. Today 2005, 58, 54–56.
- Sugiyama, Y.; Fukui, M.; Kikuchi, M.; Hasebe, K.; Nakayama, A.; Nishinari, K.; Tadaki, S.; Yukawa, S. Traffic jams without bottlenecks—Experimental evidence for the physical mechanism of the formation of a jam. New J. Phys. 2008, 10, 033001.
- Zheng, Z.; Ahn, S.; Monsere, C.M. Impact of traffic oscillations on freeway crash occurrences. Accid. Anal. Prev. 2010, 42, 626–636.
- Li, X.; Cui, J.; An, S.; Parsafard, M. Stop-and-go traffic analysis: Theoretical properties, environmental impacts and oscillation mitigation. Transp. Res. Part B Methodol. 2014, 70, 319–339.
- Qin, Y.; Liu, M.; Hao, W. Energy-optimal car-following model for connected automated vehicles considering traffic flow stability. Energy 2024, 298, 131333.
- Jeon, C.M.; Amekudzi, A.; Guensler, R.L. Evaluating Transportation System Sustainability: Atlanta Metropolitan Region. Transp. Res. Rec. 2006, 1983, 10–17.
- Heckelmann, P.; Rinderknecht, S. Influence of an automated vehicle with predictive longitudinal control on mixed urban traffic using SUMO. World Electr. Veh. J. 2024, 15, 448.
- Hou, K.; Giannopoulos, G. Modeling the Deployment and Management of Large-Scale Autonomous Vehicle Circulation in Mixed Road Traffic Conditions Considering Virtual Track Theory. Future Transp. 2024, 4, 215–235.
- Li, P.; Liu, M.; Zhu, M.; Yao, M. Preemptive-Level-Based Cooperative Autonomous Vehicle Trajectory Optimization for Unsignalized Intersection with Mixed Traffic. Electronics 2025, 14, 71.
- Brackstone, M.; McDonald, M. Car-following: A historical review. Transp. Res. Part F Traffic Psychol. Behav. 1999, 2, 181–196.
- Kesting, A.; Treiber, M.; Helbing, D. Enhanced intelligent driver model to access the impact of driving strategies on traffic capacity. Philos. Trans. R. Soc. A 2010, 368, 4585–4605.
- Milanés, V.; Shladover, S.E. Cooperative adaptive cruise control: A state-of-the-art review. IEEE Trans. Intell. Veh. 2014, 1, 98–113.
- Horn, B.K.P. Suppressing traffic flow instabilities. In Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), The Hague, The Netherlands, 6–9 October 2013; IEEE: The Hague, The Netherlands, 2013; pp. 13–20.
- Horn, B.K.P.; Wang, J. Wave equation based control of vehicular platoons. Transp. Res. Part B Methodol. 2018, 106, 340–360.
- Wang, J.; Wang, R.; Horn, B.K.P. Chain stability of a platoon with bidirectional control. Transp. Res. Part C Emerg. Technol. 2019, 100, 1–17.
- Wang, J.; Horn, B.K.P.; Wang, R. Multi-node bidirectional control for vehicle platooning. IEEE Trans. Intell. Transp. Syst. 2019, 20, 2262–2276.
- Wang, J.; Horn, B.K.P. Eigenvalue-based analysis of bidirectional platoon stability. IEEE Trans. Intell. Transp. Syst. 2016, 17, 2211–2221.
- Wang, J.; Horn, B.K.P.; Wang, R. Mixed platoon stability analysis with bidirectional control. Transp. Res. Part C Emerg. Technol. 2019, 102, 1–14.
- Watkins, C.J.C.H. Learning from Delayed Rewards. Ph.D. Thesis, University of Cambridge, Cambridge, UK, 1989.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.M.O.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D.P. Continuous Control with Deep Reinforcement Learning. U.S. Patent 10,776,692, 15 September 2020.
- Lin, Y.; McPhee, J.; Azad, N.L. Comparison of deep reinforcement learning and model predictive control for adaptive cruise control. IEEE Trans. Intell. Veh. 2020, 6, 221–231.
- Ernst, D.; Glavic, M.; Capitanescu, F.; Wehenkel, L. Reinforcement learning versus model predictive control: A comparison on a power system problem. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2008, 39, 517–529.
- Li, H.; Roncoli, C.; Ju, Y. A Helly model-based MPC control system for jam-absorption driving strategy against traffic waves in mixed traffic. Appl. Sci. 2024, 14, 1424.
- Wang, L.; Horn, B.K.P. On the stability analysis of mixed traffic with vehicles under car-following and bilateral control. IEEE Trans. Autom. Control 2019, 65, 3076–3083.
- Fujimoto, S.; Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; pp. 1587–1596.
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. Proc. Int. Conf. Mach. Learn. 2018, 80, 1861–1870.
- Zhu, M.; Wang, Y.; Pu, Z.; Hu, J.; Wang, X.; Ke, R. Safe, efficient, and comfortable velocity control based on reinforcement learning for autonomous driving. Transp. Res. Part C Emerg. Technol. 2020, 117, 102662.
- Dong, J.; Wang, J.; Chen, L.; Gao, Z.; Luo, D. Effect of adaptive cruise control on mixed traffic flow: A comparison of constant time gap policy with variable time gap policy. J. Adv. Transp. 2021, 3745989.
- Kesting, A.; Treiber, M.; Helbing, D. General lane-changing model MOBIL for car-following models. Transp. Res. Rec. 2007, 86–94.
- Liu, Y.; Sun, W.; Xu, W.; Xiong, X.; Hao, L.; Qu, L. Multi-agent collaborative adaptive cruise control based on reinforcement learning. In Proceedings of the 2021 China Automation Congress (CAC), Beijing, China, 22–24 October 2021; pp. 3388–3393.
- Montanino, M.; Punzo, V. Trajectory data reconstruction and simulation-based validation against macroscopic traffic patterns. Transp. Res. Part B Methodol. 2015, 80, 82–106.
- Jiang, L.; Xie, Y.; Evans, N.G.; Wen, X.; Li, T.; Chen, D. Reinforcement Learning based cooperative longitudinal control for reducing traffic oscillations and improving platoon stability. Transp. Res. Part C Emerg. Technol. 2022, 141, 103744.
- Li, Y.; Chen, S.; Ha, P.Y.J.; Dong, J.; Steinfeld, A.; Labi, S. Leveraging vehicle connectivity and autonomy to stabilize flow in mixed traffic conditions: Accounting for human-driven vehicle driver behavioral heterogeneity and perception-reaction time delay. Transp. Res. Part C Emerg. Technol. 2020, 121, 102890.
- He, Y.; Zhou, Q.; Wang, C.; Li, J.; Shuai, B.; Lei, L.; Xu, H. Microscopic modeling of car-following behavior: Developments and future directions. Int. J. Automot. Manuf. Mater. 2023, 2, 6.
- Németh, B.; Gáspár, P. LPV-based control design of vehicle platoon considering road inclinations. IFAC Proc. Vol. 2011, 44, 3837–3842.
- Ploeg, J.; Van De Wouw, N.; Nijmeijer, H. LP string stability of cascaded systems: Application to vehicle platooning. IEEE Trans. Control Syst. Technol. 2013, 22, 786–793.
- Minderhoud, M.M.; Bovy, P.H. Extended time-to-collision measures for road traffic safety assessment. Accid. Anal. Prev. 2001, 33, 89–97.
- Minocha, V.K.; Saini, G. Discussion of “Estimating Vehicle Fuel Consumption and Emissions Based on Instantaneous Speed and Acceleration Levels” by Kyoung Ahn, Hesham Rakha, Antonio Trani, and Michel Van Aerde. J. Transp. Eng. 2003, 129, 578–579.
- West, B.H.; McGill, R.N.; Hodgson, J.W.; Sluder, C.S.; Smith, D.E. Development of data-based light-duty modal emissions and fuel consumption models. SAE Trans. 1997, 106, 1274–1280.
Component | Parameters | Functions |
---|---|---|
Stability Reward | | Reduces acceleration fluctuations. |
Time Gap Penalty | | Penalizes insufficient time gaps. |
Distance Penalty | | Penalizes unsafe following distances. |
TTC Penalty | | Enhances collision avoidance. |
Efficiency Reward | | Encourages target speed tracking. |
Smoothness Reward | None | Improves control smoothness. |

| Parameter Name | Description | Value |
|---|---|---|
| | Speed of the vehicle at time t | Dynamic |
| | Acceleration of the vehicle | |
| | Distance to the preceding vehicle | Dynamic |
| | Distance to the following vehicle | Dynamic |
| | Minimum safe distance | |
| | Target speed | |
| | Minimum time gap ensuring safety | |
| | Distance feedback gain | |
| | Velocity feedback gain | |
| | Target velocity feedback gain | |
| | Acceleration feedback gain | |
| | Stability penalty weight | |
| | Reaction time for vehicles | |
| | Maximum deceleration | |
| | Penalty weight for unsafe time gaps | |
| | Penalty weight for unsafe distances | |
| | Penalty weight for exceeding safe speed | |
| | Penalty weight for efficiency deviations | |
| | Actor network learning rate | |
| | Critic network learning rate | |
| | Discount factor | |
| | Soft update rate | |
| Replay Buffer Size | Experience replay buffer size | (tail), (mid) |
| Batch Size | Mini-batch size | 64 (tail), 256 (mid) |
| | Standard deviation of target noise | |
| Clipping Range | Clipping range for target noise | |

| Scenario | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5 |
|---|---|---|---|---|---|
| | 0.814 | 0.920 | 1.056 | 1.066 | 1.130 |
| | 0.870 | 0.967 | 1.076 | 1.145 | 1.121 |
| | 0.860 | 1.015 | 1.098 | 1.153 | 1.143 |
| | 0.556 | 1.015 | 1.132 | 1.163 | 1.182 |
| (Std) | 0.061 | 0.057 | 0.039 | 0.033 | 0.023 |
| (Std) | 0.085 | 0.068 | 0.067 | 0.056 | 0.043 |
| (Std) | 0.046 | 0.044 | 0.037 | 0.021 | 0.016 |
| (Std) | 0.087 | 0.081 | 0.077 | 0.065 | 0.054 |

TTC Threshold | 1 s, 1.5 s, 2 s | 2.5 s | 3 s |
---|---|---|---|
Scenario 1—Mean | 0.0000 | 0.2453 | 1.4385 |
Scenario 1—Std | 0.0000 | 0.7341 | 2.8119 |
Scenario 2—Mean | 0.0000 | 0.3186 | 1.6214 |
Scenario 2—Std | 0.0000 | 0.6453 | 2.7348 |
Scenario 3—Mean | 0.0000 | 0.3924 | 1.8749 |
Scenario 3—Std | 0.0000 | 0.5632 | 2.5231 |
Scenario 4—Mean | 0.0000 | 0.4657 | 2.1874 |
Scenario 4—Std | 0.0000 | 0.4712 | 2.3142 |
Scenario 5—Mean | 0.0000 | 0.5432 | 2.6951 |
Scenario 5—Std | 0.0000 | 0.3951 | 2.0713 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, K.; Jiao, P.; Hong, W.; Chen, Y. Enhancing Mixed Traffic Stability with TD3-Driven Bilateral Control in Autonomous Vehicle Chains. Sustainability 2025, 17, 4790. https://doi.org/10.3390/su17114790