Open Access Article
Optimization of Empty Railcar Distribution at the Loading End of a Heavy-Haul Railway Based on Deep Reinforcement Learning
by Liang Ma 1,2,* and Yuanli Bao 1
1 School of Information Science and Technology, Southwest Jiaotong University, No. 999, Xi’an Road, Pidu District, Chengdu 611756, China
2 Sichuan Engineering Research Center of Train Operation Control Technology, No. 999, Xi’an Road, Pidu District, Chengdu 611756, China
* Author to whom correspondence should be addressed.
Future Transp. 2026, 6(3), 127; https://doi.org/10.3390/futuretransp6030127 (registering DOI)
Submission received: 12 May 2026 / Revised: 9 June 2026 / Accepted: 12 June 2026 / Published: 14 June 2026
Abstract
In heavy-haul railway systems, effective empty railcar distribution (ERD) can optimize composition planning and meet the empty railcar requirements (ERRs) at all loading ends, thereby improving the efficiency of train operations. To address practical challenges such as the supply–demand imbalance of empty trains, redundant loading and unloading cycles, and prolonged waiting times, this study establishes a multi-objective 0-1 integer programming model for ERD at the loading end of a heavy-haul railway. The model simultaneously maximizes the fulfilment of all ERRs, minimizes the ERD delay time, and reduces the waiting time in the heavy-train combination problem under complex constraints, including the passing capacity of sections, the combination capacity of stations, and the ERRs at the loading end. Traditional optimization methods, such as mathematical programming and heuristic algorithms, partially address these issues but are ineffective under dynamic constraints and state-space explosion. Traditional reinforcement learning methods, such as Q-learning, are likewise limited in railway scheduling by state-space explosion and inadequate model generalization. To overcome these limitations, this study formalizes ERD at the loading end of the heavy-haul railway as a Markov decision process and optimizes it using deep Q-network (DQN) reinforcement learning. In addition, it proposes an experience data fusion mechanism that integrates the empirical rules of dispatchers through a modular architecture, achieving real-time constraint compliance while maintaining scalability for practical implementation. The NSGA-II multi-objective genetic algorithm is used as a comparison to evaluate the performance of the DQN algorithm. The experimental results demonstrate that the DQN algorithm can fully meet ERRs with zero delay and produce optimal schemes for train combinations, whereas NSGA-II performs better in minimizing the combination waiting time and in same-destination train combinations. Moreover, the DQN algorithm can identify superior ERD strategies in expanded action and state spaces, enabling the effective handling of ERD under complex constraints.
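As a rough illustration of the DQN update mentioned in the abstract, the sketch below computes the standard one-step temporal-difference target, r + γ · max Q(s', a'), restricted to feasible actions. It is not the authors' implementation: the function name `masked_td_target`, the `feasible` mask (standing in for constraints such as section passing capacity), and the default discount factor are all assumptions made for this example, and the abstract does not say that the paper enforces constraints by action masking.

```python
from typing import Sequence


def masked_td_target(reward: float,
                     next_q_values: Sequence[float],
                     feasible: Sequence[bool],
                     gamma: float = 0.9,
                     done: bool = False) -> float:
    """One-step DQN target: r + gamma * max of Q(s', a') over feasible a'.

    `feasible` is a hypothetical mask marking which distribution actions
    satisfy the operating constraints in the next state.
    """
    if len(next_q_values) != len(feasible):
        raise ValueError("next_q_values and feasible must have equal length")
    if done:
        # Terminal transition: no future value to bootstrap from.
        return reward
    allowed = [q for q, ok in zip(next_q_values, feasible) if ok]
    if not allowed:
        # Assumption: a state with no feasible action is treated as terminal.
        return reward
    return reward + gamma * max(allowed)


# The highest Q-value (5.0) belongs to an infeasible action and is ignored.
target = masked_td_target(1.0, [2.0, 5.0, 3.0], [True, False, True], gamma=0.5)
```

In a full DQN, `next_q_values` would come from a target network evaluated on the next state, and the target would be regressed against the online network's Q-value for the action actually taken.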