Optimization of Empty Railcar Distribution at the Loading End of a Heavy-Haul Railway Based on Deep Reinforcement Learning

Ma, Liang; Bao, Yuanli

doi:10.3390/futuretransp6030127

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article

by Liang Ma ¹,²,* and Yuanli Bao

¹ School of Information Science and Technology, Southwest Jiaotong University, No. 999, Xi’an Road, Pidu District, Chengdu 611756, China

² Sichuan Engineering Research Center of Train Operation Control Technology, No. 999, Xi’an Road, Pidu District, Chengdu 611756, China

* Author to whom correspondence should be addressed.

Future Transp. 2026, 6(3), 127; https://doi.org/10.3390/futuretransp6030127 (registering DOI)

Submission received: 12 May 2026 / Revised: 9 June 2026 / Accepted: 12 June 2026 / Published: 14 June 2026

(This article belongs to the Special Issue Advancements in Traffic Simulation, Calibration, and Optimization for Future Transportation Systems)

Abstract

In heavy-haul railway systems, effective empty railcar distribution (ERD) can optimize composition planning and meet empty railcar requirements (ERRs) at all loading ends, thereby improving the efficiency of train operations. To address practical challenges such as the supply–demand imbalance of empty trains, redundant loading and unloading cycles, and prolonged waiting times, this study establishes a multi-objective 0-1 integer programming model for ERD at the loading end of a heavy-haul railway. The model simultaneously maximizes the fulfilment of all ERRs, minimizes the ERD delay time, and reduces the waiting time in the heavy-train combination problem under complex constraints, including the passing capacity of sections, the combination capacity of stations, and the ERR at the loading end. Although traditional optimization methods such as mathematical programming and heuristic algorithms partially address these issues, they are ineffective under dynamic constraints and state-space explosion. Furthermore, traditional reinforcement learning methods such as Q-learning are limited in railway scheduling by the state-space explosion problem and inadequate model generalization. To overcome these limitations, this study formalizes the ERD at the loading end of the heavy-haul railway as a Markov decision process and optimizes it using deep Q-network (DQN) reinforcement learning. In addition, this study proposes an experience data fusion mechanism that integrates the empirical rules of dispatchers through a modular architecture, achieving real-time constraint compliance while maintaining scalability for practical implementation. The NSGA-II genetic algorithm for multi-objective problems is used as a benchmark to evaluate the performance of the DQN algorithm. The experimental results demonstrate that the DQN algorithm can fully meet ERRs with zero delay and produce optimal schemes for train combinations, whereas NSGA-II performs better in minimizing the combination waiting time and in same-destination train combinations. The DQN algorithm can also identify superior ERD strategies in expanded action and state spaces, enabling the effective handling of complex constraint-based ERD.
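The abstract does not give the state, action, or reward definitions, so the following is only a minimal sketch of how an ERD decision step might be framed as a Markov decision process with rule-based action masking. Every name here (`State`, `feasible_actions`, `step`, `select_action`, `td_target`, the reward weights, and the single `section_capacity` number) is a hypothetical stand-in, not the authors' model. In a DQN the Q-values would come from a neural network; a plain list is used here so the example runs with the standard library alone.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical ERD state: empties on hand and unmet ERR per loading point."""
    empties: int
    unmet: tuple


def feasible_actions(state, section_capacity):
    """Rule-based mask (assumed): action i sends one block of empties to point i.

    An action is allowed only while empties remain, the assumed section
    capacity is positive, and loading point i still has unmet demand.
    """
    if state.empties <= 0 or section_capacity <= 0:
        return []
    return [i for i, need in enumerate(state.unmet) if need > 0]


def step(state, action, delay, wait, weights=(1.0, 0.5, 0.2)):
    """Apply one dispatch; return (next_state, scalarized reward).

    The reward rewards fulfilment and penalizes delay and combination
    waiting time; the weights are illustrative, not taken from the paper.
    """
    unmet = list(state.unmet)
    unmet[action] -= 1
    next_state = State(state.empties - 1, tuple(unmet))
    w_fill, w_delay, w_wait = weights
    reward = w_fill * 1.0 - w_delay * delay - w_wait * wait
    return next_state, reward


def select_action(q_values, mask, epsilon, rng):
    """Epsilon-greedy choice restricted to feasible actions."""
    if not mask:
        return None
    if rng.random() < epsilon:
        return rng.choice(mask)
    return max(mask, key=lambda a: q_values[a])


def td_target(reward, next_q_values, next_mask, gamma=0.99):
    """DQN-style target: r + gamma * max Q over feasible next actions."""
    if not next_mask:
        return reward
    return reward + gamma * max(next_q_values[a] for a in next_mask)


if __name__ == "__main__":
    rng = random.Random(0)
    s = State(empties=2, unmet=(1, 0, 2))
    mask = feasible_actions(s, section_capacity=3)
    a = select_action([0.1, 9.9, 0.4], mask, epsilon=0.0, rng=rng)
    s2, r = step(s, a, delay=0.0, wait=1.0)
    print(mask, a, s2, r)
```

Masking infeasible actions before the greedy choice is one simple way to keep every selected action within the constraints; whether the paper's experience data fusion mechanism works this way is not stated in the abstract.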

Keywords: heavy-haul railway; empty railcar distribution; DQN; reinforcement learning; multi-objective
