Article

AM-DIMPO: Action-Masked Diffusion-Implicit Policy Optimization for On-Ramp Merging Under Dense Traffic

1 Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
2 School of Robotics, Beijing Union University, Beijing 100101, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(8), 3687; https://doi.org/10.3390/app16083687
Submission received: 10 February 2026 / Revised: 2 April 2026 / Accepted: 7 April 2026 / Published: 9 April 2026

Abstract

Highway ramp merging requires autonomous vehicles to make safe and efficient decisions in dense mixed traffic, where strong vehicle interactions and rapidly changing acceptable gaps make the task particularly challenging. Existing reinforcement learning methods are often unimodal and overly conservative, while diffusion-based policies, despite their ability to generate multimodal actions, usually suffer from high inference latency and safety risks caused by unconstrained sampling. To address these issues, this paper proposes AM-DIMPO, an action-mask-guided safe diffusion-implicit policy optimization framework for ramp-merging tasks. The proposed method combines DDIM-based implicit sampling with a state-dependent continuous action mask to improve multimodal action generation efficiency while enhancing action feasibility. In addition, the mask correction signal is incorporated into policy learning to encourage the policy to generate actions closer to the safe feasible region. Experiments are conducted in a Gym-based ramp-merging simulator under both light-traffic and dense-traffic scenarios, where the proposed method is compared with classical reinforcement learning baselines, diffusion reinforcement learning baselines, and a safety-aware PPO baseline. The results show that, in dense traffic, AM-DIMPO achieves a merging success rate of 97.3%, an average speed of 16.27 m/s, and an inference latency of 68 ms; in light traffic, the success rate reaches 98.1%. Moreover, the proposed method maintains robust performance under the tested noisy-observation and reduced-visibility settings. Overall, AM-DIMPO achieves a favorable balance among empirical safety, traffic efficiency, robustness, and real-time inference performance in dense highway ramp-merging tasks.
Keywords: reinforcement learning; diffusion-implicit policy; autonomous driving; on-ramp merging; action mask
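To make the two ideas named in the abstract concrete, the sketch below is a minimal, illustrative example (not the authors' implementation) of (1) a DDIM-style deterministic denoising loop that maps Gaussian noise to a continuous merging action, and (2) a state-dependent continuous action mask that projects the sampled action into a feasible acceleration interval, with the residual projection used as a correction signal. All names and the simplified kinematics (feasible_accel_bounds, denoise_step, the heuristic target) are hypothetical placeholders.

```python
# Hedged sketch of action-masked diffusion-implicit sampling for ramp merging.
# This is NOT the AM-DIMPO code; it only illustrates the mechanism described
# in the abstract under simplifying assumptions.
import numpy as np

def feasible_accel_bounds(gap_m, rel_speed_mps, t_headway=1.5, a_lim=3.0):
    """State-dependent mask: bound acceleration so the ego keeps roughly a
    time-headway-sized gap to the lead vehicle (simplified kinematics)."""
    a_max = np.clip((gap_m - t_headway * max(rel_speed_mps, 0.0)) / t_headway**2,
                    -a_lim, a_lim)
    return -a_lim, a_max  # (lower, upper) feasible acceleration in m/s^2

def denoise_step(a_t, t, state, n_steps):
    """Toy stand-in for one DDIM update: deterministically pull the noisy
    action toward a state-conditioned target (a real policy would use a
    learned noise-prediction network instead of this heuristic)."""
    target = 0.1 * state["gap_m"] - 0.5 * state["rel_speed_mps"]
    alpha = (n_steps - t) / n_steps
    return (1 - alpha) * a_t + alpha * target

def sample_masked_action(state, n_steps=8, rng=np.random.default_rng(0)):
    a = rng.standard_normal()             # start from Gaussian noise
    for t in range(n_steps, 0, -1):       # few deterministic DDIM-like steps
        a = denoise_step(a, t, state, n_steps)
    lo, hi = feasible_accel_bounds(state["gap_m"], state["rel_speed_mps"])
    a_masked = float(np.clip(a, lo, hi))  # project into the feasible set
    correction = a_masked - a             # signal that could penalize the policy
    return a_masked, correction

action, corr = sample_masked_action({"gap_m": 25.0, "rel_speed_mps": 2.0})
print(f"masked action: {action:.2f} m/s^2, mask correction: {corr:.2f}")
```

In this sketch the mask acts as a post-sampling projection; the abstract's "mask correction signal" is approximated by the difference between the raw and projected actions, which a training loss could penalize to push the policy toward the feasible region.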

Share and Cite

MDPI and ACS Style

Gao, Q.; Li, J.; Huang, X.; Zhu, Y.; Du, Y. AM-DIMPO: Action-Masked Diffusion-Implicit Policy Optimization for On-Ramp Merging Under Dense Traffic. Appl. Sci. 2026, 16, 3687. https://doi.org/10.3390/app16083687



