Energy-Efficient Train Control Based on Energy Consumption Estimation Model and Deep Reinforcement Learning
Abstract
1. Introduction
- The proposed energy consumption estimation model accurately reflects the influence of basic resistance on running energy consumption and generalizes well to changes in the running environment, such as train mass, gradient, curve radius, and running length.
- The proposed method ensures that every optional action is feasible and adapts well to different train masses and different running times while maintaining its energy-saving effect.
- The proposed energy consumption estimation model is trained in a data-driven manner and effectively replaces an exact mathematical model as part of the reinforcement learning environment without degrading the energy optimization of the running trajectory.
2. Methodology
2.1. Assumptions and Problem Conditions
- Point Mass Model: The train is modeled as a single point mass, ignoring the internal coupler forces between carriages. This is a standard assumption in longitudinal train dynamics control.
- Discretized Environment: The line conditions, including gradient, curvature, and speed limits, are assumed to be constant within each discretized distance segment.
- Stable Tunnel Environment: The study focuses on urban metro systems operating primarily in tunnels (specifically based on Guangzhou Metro data). Therefore, external environmental factors such as strong crosswinds, rain, or snow are considered negligible, creating a relatively stable operating environment compared to open-air railways.
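Under the point-mass and discretized-environment assumptions above, one distance-segment update of the train state can be sketched as follows. This is a minimal illustration, not the paper's implementation; the gradient handling and all numeric values are illustrative placeholders.

```python
# Hedged sketch: one discretized point-mass update step. Within a segment,
# gradient and commanded acceleration are held constant, matching the
# discretized-environment assumption of Sec. 2.1.

def step(position, speed, accel, ds, grade_permille=0.0):
    """Advance the point-mass train by one segment of length ds (m).

    Returns (new_position, new_speed, segment_time). The gradient enters
    as a longitudinal component of gravity (grade in per mille).
    """
    g = 9.81
    # Net acceleration = commanded acceleration minus gradient component.
    a_net = accel - g * grade_permille / 1000.0
    # Kinematics over a constant-acceleration distance segment.
    v_new = max(speed**2 + 2.0 * a_net * ds, 0.0) ** 0.5
    # Traversal time from average segment speed (guard against v = 0).
    v_avg = max((speed + v_new) / 2.0, 1e-6)
    return position + ds, v_new, ds / v_avg

pos, v, dt = step(0.0, 10.0, accel=0.5, ds=100.0)
```

Iterating this step over all segments of the line yields a full speed profile from a sequence of acceleration commands.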
2.2. Train Dynamics Model
2.3. Unit Basic Resistance Coefficient Fitting
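The unit basic resistance is conventionally modeled in the Davis form w(v) = a + b·v + c·v², and its coefficients can be identified by ordinary least squares. The sketch below solves the 3×3 normal equations in plain Python under synthetic coefficients; it illustrates the fitting idea only and does not use the paper's regressed values.

```python
# Hedged sketch: least-squares fit of Davis-form unit basic resistance
# w(v) = a + b*v + c*v^2 via the normal equations (synthetic data).

def fit_davis(vs, ws):
    """Return (a, b, c) minimizing sum of (w - a - b*v - c*v^2)^2."""
    # Accumulate normal equations A x = y for the basis [1, v, v^2].
    A = [[0.0] * 3 for _ in range(3)]
    y = [0.0] * 3
    for v, w in zip(vs, ws):
        basis = [1.0, v, v * v]
        for i in range(3):
            y[i] += basis[i] * w
            for j in range(3):
                A[i][j] += basis[i] * basis[j]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        y[col], y[piv] = y[piv], y[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c2 in range(col, 3):
                A[r][c2] -= f * A[col][c2]
            y[r] -= f * y[col]
    # Back substitution.
    x = [0.0] * 3
    for i in range(2, -1, -1):
        s = y[i] - sum(A[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = s / A[i][i]
    return tuple(x)

# Recover known coefficients from noise-free synthetic samples.
a, b, c = 2.0, 0.01, 0.0004
vs = [float(v) for v in range(0, 91, 5)]
ws = [a + b * v + c * v * v for v in vs]
fit = fit_davis(vs, ws)
```

With measured (v, w) pairs in place of the synthetic samples, the same regression yields the line-specific coefficients used downstream by the energy model.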
2.4. Train-Operation Energy Consumption Estimation Model
2.5. Design of Reinforcement Learning Algorithm
- Control Roughness: Discrete actions can cause abrupt changes in acceleration, resulting in “chattering” control behavior that reduces passenger comfort and increases mechanical wear.
- Curse of Dimensionality: To achieve high control precision with DQN, the action space must be finely discretized, leading to an exponential increase in the number of action states and making the model difficult to converge.
2.5.1. Constraint-Based Prior Knowledge and Experience Replay Mechanism
- State and Track Information: The agent can acquire real-time state information, such as position, speed, acceleration, and runtime, as well as track information such as slope and curve radius. This comprehensive environmental awareness allows the agent to make precise decisions based on the real-time state and track conditions.
- Stopping Requirement: The agent is required to bring the train’s speed to exactly zero at the endpoint to ensure accurate stopping. This goal sets a clear objective for the agent, guiding it to optimize decisions and ensure stopping precision at the endpoint.
- Acceleration Safety Constraints: The agent is fully aware of the maximum and minimum achievable acceleration under given position and speed conditions. This knowledge provides safe boundaries for exploration and optimization, preventing potential risks from exceeding system limits.
- Time Constraints: The agent is aware of the minimum remaining running time at each position. If the actual remaining time is below this minimum, the agent will take appropriate acceleration measures to ensure on-time arrival at the destination. This mechanism helps optimize running efficiency and prevent delays.
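The acceleration and time constraints above can be enforced by projecting the agent's raw action into the feasible set before execution. The sketch below is one possible realization, assuming bounds and a minimum-remaining-time value are supplied by the environment; the specific function names and numbers are illustrative, not the paper's.

```python
# Hedged sketch of the constraint-based prior knowledge: clip the raw
# action into the achievable acceleration range, and force a catch-up
# action when the remaining time budget is about to be violated.

def project_action(a_raw, a_min, a_max, t_remaining, t_min_remaining):
    """Return a feasible acceleration satisfying both constraints."""
    # Acceleration safety constraint: stay within achievable limits.
    a = min(max(a_raw, a_min), a_max)
    # Time constraint: if the actual remaining time falls below the
    # minimum feasible remaining time, accelerate at the maximum rate.
    if t_remaining < t_min_remaining:
        a = a_max
    return a

safe = project_action(2.0, a_min=-1.0, a_max=1.0,
                      t_remaining=100.0, t_min_remaining=50.0)
```

Because every action the agent can emit is mapped to a feasible one, exploration never leaves the safe operating envelope.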
2.5.2. State Space of the Train
2.5.3. Action Space of the Train
2.5.4. Design of Reward Function
- Positive Reward Bias: Ensuring sufficient positive incentives is crucial for accelerating learning. Our experiments indicated that when the positive reward bias significantly exceeds 1, the algorithm tends to overlook the energy consumption factor (measured in kWh) because the reward magnitude overshadows the energy penalty. Conversely, when the bias is close to 0, the convergence speed becomes slow. A bias value of 1 (and similar magnitudes between 0.1 and 5) was found to ensure stable convergence while maintaining sensitivity to energy costs.
- Penalty for Abnormal States: When a significant time deviation occurs, the train is in an abnormal state. A penalty mechanism is needed to constrain the agent. We found that a penalty coefficient of 0.5 effectively balances the need for constraint without imposing an excessive negative impact that could destabilize the learning process.
- Time Threshold (50 s): The value of 50 represents the critical boundary for time deviation. Prior to training, we calculated the theoretical fastest running time based on expert knowledge. Experimental results showed that setting this threshold to 50 s achieves a good trade-off between exploration capability and stability. If set too large, the agent may over-explore detrimental actions; if set too small, the agent’s exploration capability is unduly restricted.
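The three design choices above can be combined into a per-step reward of the following shape. This is a hedged sketch of the stated roles of the constants (bias 1, penalty coefficient 0.5, threshold 50 s); the paper's exact functional form may differ.

```python
# Hedged sketch of the reward shaping: a constant positive bias per step,
# an energy penalty in kWh, and a 0.5-weighted penalty once the time
# deviation exceeds the 50 s threshold.

BIAS = 1.0             # positive reward bias keeping returns net-positive
PENALTY_COEF = 0.5     # weight on the abnormal-state penalty
TIME_THRESHOLD = 50.0  # tolerated time deviation in seconds

def step_reward(delta_energy_kwh, time_deviation_s):
    reward = BIAS - delta_energy_kwh
    if abs(time_deviation_s) > TIME_THRESHOLD:
        # Abnormal state: penalize the excess deviation beyond 50 s.
        reward -= PENALTY_COEF * (abs(time_deviation_s) - TIME_THRESHOLD)
    return reward

r_normal = step_reward(0.3, 0.0)
r_abnormal = step_reward(0.3, 60.0)
```

The bias keeps typical step rewards positive, so the agent is not driven toward premature termination, while the energy term preserves sensitivity to consumption.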
2.5.5. Training and Parameter Updating of Network
Algorithm 1: DDPG algorithm
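A core step of DDPG training is the soft update of the target networks. The sketch below uses τ = 0.1, the soft update parameter listed in the hyperparameter table; plain lists stand in for network weight tensors, so this only illustrates the update rule, not a full training loop.

```python
# Hedged sketch of the DDPG soft target update:
#   theta_target <- tau * theta + (1 - tau) * theta_target

TAU = 0.1  # soft update parameter from the hyperparameter table

def soft_update(target_params, online_params, tau=TAU):
    """Blend online weights into the target network weights."""
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_params, target_params)]

target = [0.0, 1.0]
online = [1.0, 0.0]
target = soft_update(target, online)  # -> [0.1, 0.9]
```

The slowly moving targets stabilize the bootstrapped critic targets, which is why DDPG maintains separate target copies of both the actor and the critic.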
3. Numerical Experiments
3.1. Identification of Unit Basic Resistance Coefficient
3.2. Data Generation and Preprocessing
- Parameter Sampling: The input state variables were sampled to cover the full operational range of the Guangzhou Metro Line 21. Specifically, the train mass was sampled from t. To ensure the diversity of the training environment, other track parameters were sampled from the specific design specifications of the line: the gradient ranges from ‰, the curve radius ranges from m, and the tunnel length ranges from m. Additionally, the running length for energy calculation was sampled within m, and the train speed covered the range of km/h.
- Noise Injection: To simulate real-world sensor uncertainties and improve model robustness, Gaussian noise was added to the calculated energy consumption labels.
- Data Preprocessing: To accelerate convergence, all input features were normalized to the range [0, 1] using Min-Max scaling before being fed into the network. Outliers in the synthetic generation process (e.g., physically impossible kinematic states) were filtered out.
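The noise injection and scaling steps above can be sketched as follows. The noise standard deviation and sample values are illustrative placeholders, not the settings used in the paper.

```python
# Hedged sketch of the preprocessing pipeline: Min-Max scaling of each
# feature column and Gaussian noise injection on the energy labels.
import random

def min_max_scale(column):
    """Scale a feature column to [0, 1]."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]

def add_label_noise(labels, sigma=0.05, seed=0):
    """Add zero-mean Gaussian noise to simulate sensor uncertainty."""
    rng = random.Random(seed)
    return [y + rng.gauss(0.0, sigma) for y in labels]

speeds = [0.0, 20.0, 45.0, 80.0]        # placeholder speed samples (km/h)
scaled = min_max_scale(speeds)          # -> [0.0, 0.25, 0.5625, 1.0]
noisy = add_label_noise([21.3, 15.5, 8.4])
```

Scaling each input to a common range keeps the gradient magnitudes of the BP network balanced across features with very different physical units.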
3.3. Analysis of Training Results of Energy Consumption Estimation Model
3.4. Reinforcement Learning Trajectory Optimization—Unchanged Initial State
3.5. Reinforcement Learning Trajectory Optimization—Changed Initial State
4. Conclusions
- Dependence on Synthetic Data: Although the training data are synthetic, they are generated from parameters strictly regressed from real-world operation data of the Guangzhou Metro. The simulation environment functions as a “Digital Twin” of the specific track section, maintaining high fidelity to the actual physical dynamics. However, we acknowledge that the gap between this high-fidelity simulation and the stochastic raw sensor data in real-time operations still requires further validation.
- Model Simplicity: Compared to modern deep learning techniques (e.g., LSTMs or Transformers) that capture long-term temporal dependencies, the proposed BP neural network is relatively simple. While this ensures computational efficiency for real-time control, it may limit the prediction accuracy under highly dynamic transient conditions.
- Sim-to-Real Gap: There is a potential risk of the RL policy overfitting to the estimated energy model rather than the true physical system (the “Sim-to-Real” gap). Although our resistance regression mitigates this, further validation on real-world hardware is required.
- Computational Load: While our experimental analysis (Section 3.3) shows a low inference time (2 ms), this is based on a PC environment. A rigorous analysis of the computational load on embedded onboard controllers remains to be conducted.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| EETC | Energy-efficient Train Control |
| BP | Back Propagation |
| DDPG | Deep Deterministic Policy Gradient |
| PMP | Pontryagin Maximum Principle |
| DQN | Deep Q-Network |
References
- Wang, B.; Sun, Y.; Chen, Q.; Wang, Z. Determinants analysis of carbon dioxide emissions in passenger and freight transportation sectors in China. Struct. Chang. Econ. Dyn. 2018, 47, 127–132.
- Li, J.; Sun, X.; Cong, W.; Miyoshi, C.; Ying, L.C.; Wandelt, S. On the air-HSR mode substitution in China: From the carbon intensity reduction perspective. Transp. Res. Part A Policy Pract. 2024, 180, 103977.
- González-Gil, A.; Palacin, R.; Batty, P.; Powell, J. A systems approach to reduce urban rail energy consumption. Energy Convers. Manag. 2014, 80, 509–524.
- Yin, J.; Tang, T.; Yang, L.; Xun, J.; Huang, Y.; Gao, Z. Research and development of automatic train operation for railway transportation systems: A survey. Transp. Res. Part C Emerg. Technol. 2017, 85, 548–572.
- Wang, Y.; De Schutter, B.; van den Boom, T.J.; Ning, B. Optimal trajectory planning for trains—A pseudospectral method and a mixed integer linear programming approach. Transp. Res. Part C Emerg. Technol. 2013, 29, 97–114.
- Ichikawa, K. Application of optimization theory for bounded state variable problems to the operation of train. Bull. JSME 1968, 11, 857–865.
- Howlett, P. An optimal strategy for the control of a train. ANZIAM J. 1990, 31, 454–471.
- Khmelnitsky, E. On an optimal control problem of train operation. IEEE Trans. Autom. Control 2000, 45, 1257–1266.
- He, D.; Guo, S.; Chen, Y.; Liu, B.; Chen, J.; Xiang, W. Energy efficient metro train running time rescheduling model for fully automatic operation lines. J. Transp. Eng. Part A Syst. 2021, 147, 04021032.
- Deng, L.; Cai, L.; Zhang, G.; Tang, S. Energy consumption analysis of urban rail fast and slow train modes based on train running curve optimization. Energy Rep. 2024, 11, 412–422.
- Huang, Y.; Yang, C.; Gong, S. Energy optimization for train operation based on an improved ant colony optimization methodology. Energies 2016, 9, 626.
- Yildiz, A.; Arikan, O.; Keskin, K. Traction energy optimization considering comfort parameter: A case study in Istanbul metro line. Electr. Power Syst. Res. 2023, 218, 109196.
- Peng, Y.; Lu, S.; Chen, F.; Liu, X.; Tian, Z. Energy-efficient train control incorporating inherent reduced-power and hybrid braking characteristics of railway vehicles. Transp. Res. Part C Emerg. Technol. 2024, 163, 104626.
- Goverde, R.M.; Scheepmaker, G.M.; Wang, P. Pseudospectral optimal train control. Eur. J. Oper. Res. 2021, 292, 353–375.
- Feng, M.; Huang, Y.; Lu, S. Eco-driving strategy optimization for high-speed railways considering dynamic traction system efficiency. IEEE Trans. Transp. Electrif. 2023, 10, 1617–1627.
- Haahr, J.T.; Pisinger, D.; Sabbaghian, M. A dynamic programming approach for optimizing train speed profiles with speed restrictions and passage points. Transp. Res. Part B Methodol. 2017, 99, 167–182.
- Lu, S.; Hillmansen, S.; Ho, T.K.; Roberts, C. Single-train trajectory optimization. IEEE Trans. Intell. Transp. Syst. 2013, 14, 743–750.
- Ghaviha, N.; Bohlin, M.; Holmberg, C.; Dahlquist, E.; Skoglund, R.; Jonasson, D. A driver advisory system with dynamic losses for passenger electric multiple units. Transp. Res. Part C Emerg. Technol. 2017, 85, 111–130.
- Zhou, K.; Song, S.; Xue, A.; You, K.; Wu, H. Smart train operation algorithms based on expert knowledge and reinforcement learning. IEEE Trans. Syst. Man Cybern. Syst. 2020, 52, 716–727.
- Zhao, Z.; Xun, J.; Wen, X.; Chen, J. Safe reinforcement learning for single train trajectory optimization via shield SARSA. IEEE Trans. Intell. Transp. Syst. 2022, 24, 412–428.
- Tang, H.; Wang, Y.; Liu, X.; Feng, X. Reinforcement learning approach for optimal control of multiple electric locomotives in a heavy-haul freight train: A Double-Switch-Q-network architecture. Knowl.-Based Syst. 2020, 190, 105173.
- Zhang, H.; Xu, K.; Huang, D.; He, D.; Wu, S.; Xian, G. Hybrid decision-making for intelligent high-speed train operation: A boundary constraint and pre-evaluation reinforcement learning approach. IEEE Trans. Intell. Transp. Syst. 2024, 25, 17979–17992.
- Chen, X.; Guo, X.; Meng, J.; Xu, R.; Li, S.; Li, D. Research on ATO control method for urban rail based on deep reinforcement learning. IEEE Access 2023, 11, 5919–5928.
- Yin, J.; Chen, D.; Li, L. Intelligent train operation algorithms for subway by expert system and reinforcement learning. IEEE Trans. Intell. Transp. Syst. 2014, 15, 2561–2571.
- Busetto, R.; Lucchini, A.; Formentin, S.; Savaresi, S.M. Data-driven optimal tuning of BLDC motors with safety constraints: A set membership approach. IEEE/ASME Trans. Mechatronics 2023, 28, 1975–1983.
- Ning, L.; Zhou, M.; Hou, Z.; Goverde, R.M.; Wang, F.Y.; Dong, H. Deep Deterministic Policy Gradient for High-Speed Train Trajectory Optimization. IEEE Trans. Intell. Transp. Syst. 2022, 23, 11562–11574.
- Zhu, Q.; Su, S.; Tang, T.; Xiao, X. Energy-efficient train control method based on soft actor-critic algorithm. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 2423–2428.
- Yin, Y.; Wang, Z.; Zheng, L.; Su, Q.; Guo, Y. Autonomous UAV navigation with adaptive control based on deep reinforcement learning. Electronics 2024, 13, 2432.
- Sresakoolchai, J.; Kaewunruen, S. Railway infrastructure maintenance efficiency improvement using deep reinforcement learning integrated with digital twin based on track geometry and component defects. Sci. Rep. 2023, 13, 2439.
- Seo, J.; Kim, S.; Jalalvand, A.; Conlin, R.; Rothstein, A.; Abbate, J.; Erickson, K.; Wai, J.; Shousha, R.; Kolemen, E. Avoiding fusion plasma tearing instability with deep reinforcement learning. Nature 2024, 626, 746–751.
- Zhang, C.; Zhang, W.; Wu, Q.; Fan, P.; Fan, Q.; Wang, J.; Letaief, K.B. Distributed deep reinforcement learning based gradient quantization for federated learning enabled vehicle edge computing. IEEE Internet Things J. 2024, 12, 4899–4913.
- Niu, W.; Zhou, Y.; Jiao, X.; Fujita, H.; Aljuaid, H. Trajectory optimization of train cooperative energy-saving operation using a safe deep reinforcement learning approach. Appl. Intell. 2025, 55, 651.
- Li, H.; Yin, J.; Tang, T.; D’Ariano, A.; You, M. Integrated Optimization of Energy-Efficient Timetable and Speed Profiles for Train Platoons in Urban Rail Transit Systems. In Proceedings of the 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), Edmonton, AB, Canada, 24–27 September 2024; pp. 4082–4088.
- Jia, C.; He, H.; Zhou, J.; Li, J.; Wei, Z.; Li, K.; Li, M. A novel deep reinforcement learning-based predictive energy management for fuel cell buses integrating speed and passenger prediction. Int. J. Hydrogen Energy 2025, 100, 456–465.
- Cunillera, A.; Bešinović, N.; Lentink, R.M.; van Oort, N.; Goverde, R.M. A literature review on train motion model calibration. IEEE Trans. Intell. Transp. Syst. 2023, 24, 3660–3677.
- Cunillera, A.; Bešinović, N.; van Oort, N.; Goverde, R.M. Real-time train motion parameter estimation using an unscented Kalman filter. Transp. Res. Part C Emerg. Technol. 2022, 143, 103794.
- Zhao, X.H.; Ke, B.R.; Lian, K.L. Optimization of train speed curve for energy saving using efficient and accurate electric traction models on the mass rapid transit system. IEEE Trans. Transp. Electrif. 2018, 4, 922–935.
- Xiao, Z.; Wang, Q.; Sun, P.; You, B.; Feng, X. Modeling and energy-optimal control for high-speed trains. IEEE Trans. Transp. Electrif. 2020, 6, 797–807.
- Wang, J.; Rakha, H.A. Electric train energy consumption modeling. Appl. Energy 2017, 193, 346–355.
- Liu, X.; Ning, B.; Xun, J.; Wang, C.; Xiao, X.; Liu, T. Parameter identification of train basic resistance using multi-innovation theory. IFAC-PapersOnLine 2018, 51, 637–642.
- Radosavljevic, A. Measurement of train traction characteristics. Proc. Inst. Mech. Eng. Part F J. Rail Rapid Transit 2006, 220, 283–291.
- Fernández, P.M.; Román, C.G.; Franco, R.I. Modelling electric trains energy consumption using Neural Networks. Transp. Res. Procedia 2016, 18, 59–65.
- Pineda-Jaramillo, J.; Martínez-Fernández, P.; Villalba-Sanchis, I.; Salvador-Zuriaga, P.; Insa-Franco, R. Predicting the traction power of metropolitan railway lines using different machine learning models. Int. J. Rail Transp. 2021, 9, 461–478.
- Peng, Y.; Chen, F.; Chen, F.; Wu, C.; Wang, Q.; He, Z.; Lu, S. Energy-efficient train control: A comparative study based on permanent magnet synchronous motor and induction motor. IEEE Trans. Veh. Technol. 2024, 73, 16148–16159.
- Davis, W.J. The tractive resistance of electric locomotives and cars. Gen. Electr. Rev. 1926, 29, 685–707.
- Zhou, K. Research on the Car-Ground Simulation System of the LKJ-15 Train Operation Monitoring System Simulation Test Platform. Master’s Thesis, Southwest Jiaotong University, Chengdu, China, 2022.
- Li, K.; Zhou, J.; Jia, C.; Yi, F.; Zhang, C. Energy sources durability energy management for fuel cell hybrid electric bus based on deep reinforcement learning considering future terrain information. Int. J. Hydrogen Energy 2024, 52, 821–833.
- Jia, C.; Liu, W.; He, H.; Chau, K. Superior energy management for fuel cell vehicles guided by improved DDPG algorithm: Integrating driving intention speed prediction and health-aware control. Appl. Energy 2025, 394, 126195.
- Fujimoto, S.; Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1587–1596.
- Chen, F.; Peng, Y.; Lu, S. Energy-Efficient Train Control Based on Improved Dynamic Programming Algorithm for Online Applications. In Proceedings of the International Conference on Electrical and Information Technologies for Rail Transportation, Beijing, China, 19–21 October 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 518–529.
| Parameter | Value |
|---|---|
| Input Layer Nodes | 7 |
| Hidden Layers | 3 |
| Hidden Layer Nodes | 100, 400, 100 |
| Output Layer Nodes | 1 |
| Activation Function | Leaky ReLU |
| Loss Function | MSE |
| Optimizer | Adam |
| Learning Rate | 0.005 |
| Training Sample Size | 1000 |
| Training Iterations | 2000 |
| Parameter | Value |
|---|---|
| Network Architecture | |
| Actor Hidden Layers | 3 |
| Actor Hidden Nodes | 60, 1000, 60 |
| Actor Activation | ReLU (Hidden), Tanh (Output) |
| Critic Hidden Layers | 3 |
| Critic Hidden Nodes | 60, 1000, 60 |
| Critic Activation | ReLU (Hidden), Linear (Output) |
| Training Parameters | |
| Actor Learning Rate | |
| Critic Learning Rate | |
| Discount Factor | 1.0 |
| Soft Update Parameter | 0.1 |
| Replay Buffer Size | 2000 |
| Batch Size | 500 |
| Exploration Noise | 0.3 |
| Interval | THSC-SZL | SZL-KXC | KXC-SY | SY-SX | SX-CP | CP-JK |
|---|---|---|---|---|---|---|
| Measured energy (kWh) | 21.34 | 15.49 | 8.40 | 23.92 | 32.53 | 48.56 |
| Estimated energy (kWh) | 21.50 | 15.85 | 8.62 | 24.33 | 32.80 | 48.81 |
| Absolute error (kWh) | 0.16 | 0.36 | 0.22 | 0.41 | 0.27 | 0.25 |
| Relative error (%) | 0.75 | 2.32 | 2.64 | 1.7 | 0.83 | 0.51 |
| Running time adjustment | −20 s | −10 s | Original | +10 s | +20 s |
|---|---|---|---|---|---|
| Energy (kWh) | 24.87 | 24.00 | 23.28 | 22.98 | 22.76 |
| Time deviation (s) | 1.0 | 1.2 | 0.5 | 0.9 | 0.3 |
Liu, J.; Wang, Y.; Liu, Y.; Li, X.; Chen, F.; Lu, S. Energy-Efficient Train Control Based on Energy Consumption Estimation Model and Deep Reinforcement Learning. Electronics 2025, 14, 4939. https://doi.org/10.3390/electronics14244939