1. Introduction
With the rapid development of China’s high-speed railway system, operating mileage has expanded continuously, the rail network has grown increasingly complex, and both operating speeds and train frequencies have risen significantly. At the same time, high-speed rail operation disruptions caused by natural disasters have become notably more frequent. In the Northeast, Northwest, and high-altitude regions of China, railway lines are perennially vulnerable to blizzards: snow accumulation and rail icing extend braking distances, poor electrical conductivity caused by ice accretion on catenary systems may trigger power supply interruptions, and reduced visibility necessitates speed restrictions. These factors directly compromise train dynamics and operational stability, posing serious threats to traffic safety and punctuality. To mitigate blizzard impacts, railway authorities typically implement speed restrictions, service suspensions, or timetable adjustments. However, these countermeasures often induce cascading train delays, potentially leading to regional rail network disruptions with significant economic and social consequences.
Significant progress has been made in high-speed rail dynamic scheduling research through multidisciplinary approaches. In the domain of delay propagation modeling, Meester et al. [
1] established a train delay propagation model and analytically derived the probabilistic distribution of cascading delays from initial delay distributions. Wang et al. [
2] identified critical stations for delay propagation and proposed temporal interval thresholds to construct delay propagation chains for determining propagation occurrences. For large-scale disruptions like section blockages that severely impact passenger mobility, Zhu et al. [
3] developed a train operation adjustment model minimizing generalized travel time, stoppage waiting time, and transfer frequencies, enabling more passengers to complete planned journeys under disruption scenarios. Hong et al. [
4] incorporated passenger reallocation mechanisms in rescheduling processes to mitigate interval blockage impacts. Zhan et al. [
5] proposed a mixed-integer programming model for real-time scheduling under complete high-speed rail blockage scenarios, employing a two-stage optimization strategy to minimize total weighted delays and train cancellations while satisfying interval and station capacity constraints. Empirical validation on the Beijing-Tianjin Intercity High-Speed Railway demonstrated 32.7% improvement in passenger service recovery efficiency compared to heuristic methods. Törnquist et al. [
6] addressed scheduling challenges in high-density heterogeneous railway systems through multi-track network optimization, validating its effectiveness in minimizing multi-stakeholder impacts using Swedish railway data while systematically analyzing theoretical advantages and practical limitations. Yang et al. [
7] formulated a mixed-integer linear programming model for timetable and stop-schedule co-optimization, minimizing total train dwell and delay times, solved via CPLEX. Yue et al. [
8] developed an integer programming model maximizing train profits with penalties for stop-schedule frequencies and durations, solved by column generation algorithms. Dai et al. [
9] conducted a systematic review of high-speed rail dynamic scheduling and train control integration, proposing a co-optimization framework through three-layer information-driven architecture that enhances safety, punctuality, operational efficiency, and system resilience, while identifying critical future challenges in information fusion mechanisms, real-time co-optimization algorithms, and cross-layer decision coordination. Nitisiri et al. [
10] introduced a parallel multi-objective genetic algorithm with hybrid sampling strategies and learning-based mutation for railway scheduling. Peng et al. [
11] provided integrated solutions for optimal rescheduling and speed control strategies under disruption uncertainties, employing rolling horizon algorithms. Shi et al. [
12] developed a delay prediction method combining XGBoost with Bayesian optimization, achieving superior performance on Chinese high-speed rail lines through feature modeling and hyperparameter optimization, validated by Friedman and Wilcoxon tests for long-term anomaly delay prediction. Zhang et al. [
13] explored multi-stage decision-making via stochastic optimization models. Song et al. [
14] proposed an adaptive co-evolutionary differential evolution algorithm (QGDECC) integrating quantum evolution and genetic algorithms with quantum variable decomposition, incremental mutation, and parameter self-adaptation strategies. The algorithm demonstrated 18.3% faster convergence and 24.6% higher precision in real operational data tests, effectively mitigating network-wide delay impacts while minimizing schedule deviation from original timetables.
Significant advancements in high-speed rail dynamic scheduling have been achieved through multidisciplinary research addressing emergency operation adjustment and resilience enhancement. Bešinović [
15] highlighted that train operation plan adjustments under emergent incidents have become a critical research focus in railway transportation organization. Chen et al. [
16] addressed interference management in high-frequency urban rail transit during peak hours by developing a nonlinear programming model integrating dynamic train scheduling with skip-stop strategies. Combining train vehicle plan constraints with customized model predictive control (MPC) methods enabled real-time solutions, with empirical validation on the Yizhuang Metro Line in Beijing demonstrating its superiority in reducing train deviation, enhancing service quality, and accommodating uncertain passenger demand. Multi-scenario robustness tests and predictive time adjustment analyses further emphasized the information updating value of the MPC approach. Dong et al. [
17] tackled capacity restoration limitations in existing hierarchical emergency response systems for high-speed rail, proposing an integrated operational control and online rescheduling framework. Through analyzing information flow processing defects and mechanism validation using wind-induced speed restrictions as a case study, the system was shown to significantly enhance dynamic capacity restoration capabilities in high-speed rail networks, providing theoretical foundations and practical pathways for intelligent emergency management. Li et al. [
18] investigated frequent emergency incidents in metro systems during peak hours, developing a discrete-event hybrid simulation method based on multi-agent modeling and parallel computing. By constructing train motion algorithms, defining three agent types (passengers, stations, trains), six emergency event categories, and parallel acceleration strategies, the method demonstrated efficiency and practicality in evaluating emergency event impacts on train and passenger delays through case studies on the Yizhuang Line in Beijing. This provided a high-precision simulation tool for metro emergency response optimization. Li et al. [
19] addressed post-earthquake high-speed rail traffic demand dynamics by proposing a mixed-integer linear programming (MILP) model integrating track deactivation/reactivation, station recovery, and dynamic traffic demand. Leveraging the original timetable as a guiding solution significantly reduced computational time, with empirical validation on the Harbin-Dalian High-Speed Railway between Shenyang and Dalian demonstrating the model’s effectiveness in generating optimal recovery timetables within short timeframes, thereby enhancing seismic resilience. Hassannayebi et al. [
20] addressed replanning challenges under stochastic disruptions in high-speed urban railways by developing an integrated optimization model combining short-haul operations with skip-stop services. A discrete-event simulation coupled with variable neighborhood search algorithms was employed, with probability-based scenario analysis addressing obstacle duration uncertainties. Validation on the Tehran Metro Network confirmed the simulation-optimization method’s superiority in minimizing average passenger waiting times, suppressing cascading effects, and improving system responsiveness, offering robustness-recovery synergistic control strategies for urban rail. Adithya et al. [
21] revealed significant meteorological impacts on Swedish railway delays through extreme weather event analyses, while William et al. [
22] demonstrated strong correlations between abrupt weather changes and delay propagation. Zhou et al. [
23] tackled scheduling complexities in high-speed rail emergency scenarios (e.g., strong winds, foreign object collisions) by proposing a parallel railway traffic management (RTM) system based on the ACP framework (Artificial Systems-Computational Experiments-Parallel Execution). Through agent-based modeling of artificial RTM environments and multi-objective optimization strategies (hybrid, FCFS, FSFS), the system demonstrated superior train rescheduling capabilities via real-time physical-artificial system feedback loops in temporary speed restriction and complete blockage scenarios, outperforming traditional strategies in emergency response efficiency and dispatcher decision support. Zhou et al. [
24] integrated GIS high-resolution precipitation data with non-spatial high-speed rail operation data to construct a grid model. Empirical analysis of the 2015–2017 rainy seasons in eastern China revealed that extreme rainfall significantly exacerbated rainfall-induced daily delays on the Hangzhou-Shenzhen and Nanjing-Hangzhou lines, with the Beijing-Shanghai line more sensitive to rainfall intensity and the Shanghai-Nanjing/Denver-Wenzhou lines most vulnerable to extreme precipitation. This motivated regional adaptive strategies for enhancing climate resilience in high-speed rail systems. Wang et al. [
25] proposed a dual-layer model predictive control (MPC) framework for high-speed rail online delay management and train control. The upper layer optimized global train delay minimization, while the lower layer coordinated operational time constraints with energy efficiency objectives. Validation using Beijing-Shanghai High-Speed Railway data demonstrated significant improvements in real-time performance, delay reduction efficiency, and robustness against multi-disturbance scenarios compared to FCFS/FSFS benchmarks. Song et al. [
26] developed an autonomous route management system based on colored Petri nets, verifying its safety and performance to enhance station delay handling efficiency. Song et al. [
27] proposed an autonomous train control system that improves train operation coordination and delay handling capabilities through data fusion and predictive control.
Despite these advancements, conventional methods remain constrained by static rules or single-scenario assumptions, exhibiting notable shortcomings in dynamic modeling, multi-objective real-time trade-offs, and high-dimensional constraint solving efficiency. In recent years, machine learning techniques have demonstrated transformative potential in addressing these challenges: Chen et al. [
28] developed a deep learning model that effectively captures complex spatiotemporal correlations, significantly improving delay prediction accuracy. Luo et al. [
29] introduced a Bayesian-optimized multi-output model for dynamic parameter adjustment, enhancing sequential train delay assessment and real-time forecasting capabilities. Shady et al. [
30] empirically validated the adaptability and practicality of machine learning in complex railway scenarios through real-world deployment. Sun et al. [
31] addressed the challenges of electromagnetic suspension systems in maglev trains under complex operational conditions such as track irregularities, external disturbances, time-varying mass, and input delays. They proposed an adaptive neural network controller integrating input delay compensation and parameter optimization. This approach employs a dual-layer neural network to approximate uncertain dynamics, a sliding mode surface delay compensation design, and Actor-Critic reinforcement learning for real-time parameter optimization. Lyapunov theory was used to prove finite-time stability, with simulations and experiments demonstrating superior performance in suppressing air-gap vibrations caused by delays and uncertain dynamics, significantly outperforming traditional methods and enhancing suspension control efficiency. Yue et al. [
32] tackled the real-time train timetable reorganization (TTR) challenge in high-speed rail by introducing a multi-stage decision-making framework based on reinforcement learning. The framework optimizes training efficiency through a compact, high-quality action set and uncertainty-aware action sampling strategies while designing a rule-free scheduling policy self-learning mechanism. Experimental validation confirmed its universality and competitiveness across diverse scenarios, establishing a novel paradigm for intelligent scheduling under dynamic disruptions. Wang et al. [
33] addressed the issue of traction power consumption accounting for 50% of total metro operational energy, proposing an energy-saving deep reinforcement learning algorithm (ES-MEDRL) that integrates Lagrange multipliers and maximum policy entropy. By constructing a dual-objective optimization function with enhanced velocity domain exploration and a quadratic time-energy trade-off strategy, the algorithm achieved a 20% reduction in traction energy consumption compared to manual driving on the Yizhuang Metro Line in Beijing. It simultaneously balanced operational comfort, punctuality, and safety, offering a new paradigm for intelligent energy-efficient scheduling at the metro system planning level. Qiao et al. [
34] addressed challenges in millimeter-wave communication for high-speed rail, including rapid time-varying channel modeling and beam management. Their intelligent beam management scheme based on deep Q-networks (DQN) exploits hidden patterns in millimeter-wave train-to-ground communication systems, improving downlink signal-to-noise ratio (SNR) while ensuring communication stability and low training overhead. Simulations confirmed its superior performance over four baseline methods, highlighting advantages in SNR stability and implementation complexity. Ling et al. [
35] focused on lightweight, high-quality data transmission and dynamic interaction requirements for sensor monitoring and remote communication in future intelligent high-speed rail networks. They proposed a self-powered multi-sensor monitoring and communication integration system, featuring a low-power backscatter communication framework, Gaussian mixture model analysis for coverage regions, and a total task completion time optimization problem considering energy transfer, data collection, and transmission rate constraints. An innovative option-based hierarchical deep reinforcement learning method (OHDRL) was developed to address system complexity, with experiments showing significant improvements in reward values and learning stability over existing algorithms. These advancements establish a theoretical foundation for the integration of intelligent algorithms and dynamic modeling. However, developing scheduling optimization methods that simultaneously achieve forward-looking design, robustness, and real-time capability remains a critical challenge for addressing multidimensional uncertainties in high-speed rail systems under adverse weather conditions.
In summary, existing research has made significant progress in delay propagation modeling, multi-objective optimization, and collaborative scheduling. However, three shortcomings persist in snowstorm scenarios: First, most methods rely on static rules or single-scenario assumptions, failing to capture the dynamic propagation characteristics of extreme weather. Second, achieving real-time trade-offs among safety constraints, on-time performance, and scheduling stability remains challenging. Third, solution efficiency is limited under high-dimensional constraints. Although deep learning and reinforcement learning demonstrate potential in delay prediction and dynamic scheduling, constructing a scheduling optimization framework that integrates foresight, robustness, and real-time capability remains a core challenge. To address these issues, this paper proposes an LSTM-PPO-based dynamic scheduling optimization algorithm for high-speed rail. This method leverages LSTM networks to capture long-term dependencies in snowstorm propagation and delay diffusion, while employing the PPO algorithm to ensure stable policy updates. By simulating snowstorm conditions, it predicts speed-restricted sections and overhead contact system failure risks, ultimately establishing a dynamic scheduling strategy for high-speed rail that combines real-time responsiveness with adaptability to complex scenarios. This approach effectively addresses the limitations of traditional methods in snowstorm response, providing new theoretical support for high-speed rail dynamic scheduling.
4. Simulation Experiments and Results Analysis
4.1. Experimental Environment Design
This study selects the Lanzhou-Xinjiang High-Speed Railway (Lanzhou-Xinjiang Passenger Dedicated Line) section from Lanzhou West Station to Urumqi Station as the experimental validation object. This line is characterized by a prominent high-cold climate, with an average winter operating temperature below −15 °C, a maximum wind speed of 30 m/s, and an annual average of over 20 days of heavy snowfall. The operational stability of the line faces multiple meteorological threats, including dynamic changes in snow depth, high probability of contact wire icing, and abrupt reductions in visibility.
According to the disaster classification criteria in this study, when the snowfall intensity on the core sections of the line exceeds the threshold of 0.6, the track friction coefficient decreases by 60%, triggering a speed restriction mechanism of 200 km/h. When the blizzard intensity surpasses the threshold of 0.8, the probability of contact wire icing surges to 50%, accompanied by visibility dropping below 50 m, necessitating an emergency speed restriction of 150 km/h. These meteorological conditions lead to compound risks, such as degradation of train dynamic performance, extended braking distances, and instability of the power supply system, posing severe challenges to the real-time risk perception and multi-objective coordination capabilities of the dynamic scheduling system. This provides a high-value experimental scenario for validating the adaptability of the LSTM-PPO algorithm in key aspects such as blizzard propagation modeling, contact wire icing warning, and emergency speed restriction decision-making.
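The threshold logic above can be sketched as a simple rule. The 0.6 and 0.8 intensity thresholds and the 200/150 km/h restrictions come from the scenario description; the 250 km/h normal line speed is an assumption for illustration, not a value stated in this section.

```python
def speed_limit_kmh(snow_intensity: float) -> int:
    """Illustrative speed-restriction rule from the disaster classification criteria.

    Blizzard intensity is normalized to [0, 1]. The normal line speed of
    250 km/h is an assumption; the 0.6/0.8 thresholds and 200/150 km/h
    limits follow the scenario description.
    """
    if snow_intensity > 0.8:   # blizzard: icing probability ~50%, visibility < 50 m
        return 150             # emergency speed restriction
    if snow_intensity > 0.6:   # heavy snow: friction coefficient reduced by ~60%
        return 200             # standard speed restriction
    return 250                 # assumed normal operating speed
```

A dispatcher model would query this rule per section as the intensity field evolves.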
The specific engineering parameters and operational characteristics of this section are detailed in
Table 3. These parameters include the line length, number of interval stations, average distance between stations, and major geographical environmental features, providing comprehensive benchmark data for model performance evaluation.
4.2. Model Parameter Settings
Prior to the formal training of the LSTM-PPO model, a series of parameter sensitivity analyses and hyperparameter optimization experiments were conducted to determine the optimal model configuration. These parameter tuning experiments were based on a systematic evaluation of model performance, employing a progressive and adaptive parameter adjustment strategy. The model utilizes a dynamic parameter adjustment mechanism, where the clipping range parameter follows a decremental strategy, progressively narrowing as training progresses to enhance policy stability and promote convergence. By comparing cumulative rewards, policy loss, and value function loss under different parameter configurations, the optimal parameter settings presented in
Table 4 were obtained.
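The decremental clipping-range strategy described above can be expressed as a linear annealing schedule. The endpoint values 0.2 and 0.05 are illustrative assumptions; the text states only that the clipping range narrows as training progresses.

```python
def clip_range(step: int, total_steps: int,
               clip_start: float = 0.2, clip_end: float = 0.05) -> float:
    """Linearly anneal the PPO clipping range over training.

    clip_start/clip_end are placeholder values (assumptions); the source
    specifies only a decremental strategy, not the exact endpoints.
    """
    frac = min(step / total_steps, 1.0)   # training progress in [0, 1]
    return clip_start + frac * (clip_end - clip_start)
```

A tighter clipping range late in training limits the size of policy updates, which is what promotes the convergence stability the text describes.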
4.3. Sensitivity Analysis
To validate the robustness of the multi-objective reward function in the LSTM-PPO model and guide weight parameter optimization, this study conducted a systematic sensitivity analysis on five key weight parameters. A grid search method tested 80 different weight combinations, evaluating the impact and interactions of each weight parameter on model performance. The results are shown in
Table 5.
Experimental results indicate that the prediction accuracy reward weight is the most critical factor influencing system performance, with a sensitivity score as high as 19.91. It exhibits a strong positive correlation (r = 0.831) with overall performance, meaning even minor weight adjustments can significantly impact the model’s overall performance. The weights for the forward-looking reward and on-time reward exhibit moderate sensitivity, both positively correlated with performance, indicating these reward mechanisms play a vital role in enhancing system performance. In contrast, the track occupancy penalty and scheduling stability penalty weights demonstrate low sensitivity and negative correlations with performance, suggesting excessive penalty mechanisms may suppress the system’s learning effectiveness.
The optimal weight configuration identified through sensitivity analysis significantly outperforms the baseline: the prediction accuracy reward weight increases substantially from 0.25 to 0.596, the forward reward weight is moderately reduced to 0.138, the punctuality reward weight is fine-tuned to 0.158, while both penalty weights are substantially decreased. This configuration embodies an optimization strategy of “reward-driven with penalty-assisted,” achieving a 19.3% improvement in overall performance score, an 18.3% increase in convergence speed, and a 15.3% rise in final reward value.
Sensitivity analysis revealed distinct mechanisms for reward and penalty components within the multi-objective reward function: reward components primarily drive system performance improvement and should be assigned higher weights; penalty components primarily serve as constraints and should maintain moderate weights to avoid excessive suppression of the system’s exploration capabilities.
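The “reward-driven with penalty-assisted” scalarization can be sketched as a weighted sum. The three reward weights are those reported by the sensitivity analysis (0.596, 0.138, 0.158); the penalty weights and the exact component definitions are placeholders, since the text says only that both penalty weights were substantially decreased.

```python
# Optimal reward weights from the sensitivity analysis in this section.
REWARD_WEIGHTS = {
    "prediction": 0.596,    # prediction-accuracy reward
    "foresight": 0.138,     # forward-looking reward
    "punctuality": 0.158,   # on-time reward
}

def composite_reward(rewards: dict, penalties: dict,
                     penalty_weights: dict) -> float:
    """Scalarize multi-objective terms: weighted rewards minus weighted penalties.

    `penalties`/`penalty_weights` (e.g. track occupancy, scheduling
    stability) use illustrative keys and values, not figures from the paper.
    """
    gain = sum(REWARD_WEIGHTS[k] * rewards[k] for k in REWARD_WEIGHTS)
    cost = sum(penalty_weights[k] * penalties[k] for k in penalty_weights)
    return gain - cost
```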
4.4. Simulation Setup
To simulate the uncertainties of real operational environments, a multi-level stochastic perturbation mechanism is designed. This study adopts a three-stage progressive blizzard intensity generation strategy, dynamically adjusting parameters to mimic the real evolution of weather conditions. The blizzard intensity parameter is defined on the domain [0.0, 0.8], and its assignment follows a piecewise linear regulation principle. In the initial stage of the training cycle, the blizzard intensity follows a linear increment strategy, starting from an initial value of 0.0 and increasing proportionally with training progress until reaching the upper threshold of 0.8; in the mid-training stage, it is held at the maximum threshold of 0.8 to simulate a prolonged heavy snowfall environment; in the late-training stage, a random perturbation factor is introduced, drawing the blizzard intensity uniformly from the interval [0.0, 0.8], thereby enhancing the model’s environmental adaptability.
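The three-stage schedule can be sketched as a piecewise function of training progress. The stage boundaries at 1/3 and 2/3 are assumptions; the source specifies only “initial”, “mid”, and “late” training stages.

```python
import random

def blizzard_intensity(progress: float, rng: random.Random) -> float:
    """Three-stage blizzard intensity schedule over training progress in [0, 1].

    Stage boundaries (1/3 and 2/3) are illustrative assumptions; the domain
    [0.0, 0.8] and the ramp/hold/random structure follow the text.
    """
    if progress < 1 / 3:
        return 0.8 * (progress / (1 / 3))   # linear ramp from 0.0 to 0.8
    if progress < 2 / 3:
        return 0.8                          # hold at the ceiling (prolonged heavy snow)
    return rng.uniform(0.0, 0.8)            # late stage: uniform random perturbation
```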
Referring to the friction coefficient range of 0.25–0.35 for dry rails and wheels specified in the Railway Track Engineering Construction Quality Acceptance Standard, the midpoint value is adopted as the baseline for snow-free and ice-free conditions, defining the base friction coefficient as 0.3. As blizzard intensity increases and snow accumulation thickens, track surface roughness decreases, reducing the friction coefficient. A decay function is established by introducing a linear combination of blizzard intensity parameters and icing probability parameters. The spatiotemporal evolution of track occupancy status is modeled using a discrete-time Markov chain, with the state space defined as S = {Idle, Occupied}. The probability distribution satisfies the following: in the Idle state, the probability of remaining in the current state is 0.8, and the probability of transitioning to the Occupied state is 0.2; in the Occupied state, the probability of remaining in the current state is 0.7, and the probability of transitioning back to the Idle state is 0.3. Detailed parameter configurations are presented in
Table 6.
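The friction decay and the two-state Markov chain described above can be sketched as follows. The transition probabilities (0.8/0.2 from Idle, 0.7/0.3 from Occupied) and the base friction coefficient 0.3 come from the text; the linear decay coefficients and the lower friction bound are placeholders, since the source states only that a linear combination of blizzard intensity and icing probability is used.

```python
import random

# Transition probabilities from the text: P(stay | Idle) = 0.8, P(stay | Occupied) = 0.7.
TRANSITIONS = {
    "idle": {"idle": 0.8, "occupied": 0.2},
    "occupied": {"occupied": 0.7, "idle": 0.3},
}

def next_occupancy(state: str, rng: random.Random) -> str:
    """One step of the discrete-time Markov chain over S = {Idle, Occupied}."""
    stay = TRANSITIONS[state][state]
    if rng.random() < stay:
        return state
    return "occupied" if state == "idle" else "idle"

def friction(base: float, snow: float, icing_prob: float,
             a: float = 0.3, b: float = 0.2) -> float:
    """Linear friction decay from the 0.3 baseline.

    a, b and the 0.05 floor are illustrative assumptions.
    """
    return max(base * (1.0 - a * snow - b * icing_prob), 0.05)
```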
To quantify the impact of different blizzard intensities on high-speed rail operations, this study establishes a five-level blizzard intensity classification system and defines corresponding operational parameter adjustment criteria. As shown in
Table 7, based on actual operational experience and meteorological data, blizzard intensity is categorized into five levels: no snow, light snow, moderate snow, heavy snow, and blizzard. Each level corresponds to specific blizzard intensity ranges, friction coefficient impacts, and maximum speed restrictions. This classification method not only considers the direct impact of blizzard intensity on train operation safety but also takes into account operational efficiency and practical feasibility, providing a quantitative basis for subsequent scheduling decisions.
To comprehensively evaluate the effectiveness of the proposed scheduling method, this study selects two mainstream reinforcement learning algorithms—PPO and DQN—as baseline comparisons. Both algorithms are representative in the field of reinforcement learning and can reflect the difficulty of scheduling tasks and the adaptability of models from different perspectives. In terms of experimental design, all algorithms are trained and evaluated within the same high-speed rail scheduling simulation environment to ensure a fair comparison. By comparing the performance of the three algorithms during the training process, their strengths and weaknesses in terms of final performance and stability are analyzed.
4.5. Result Analysis
To validate the superiority of the LSTM-PPO algorithm in high-speed rail dynamic scheduling under blizzard conditions, this study conducted comparative experiments, evaluating the proposed method against DQN and PPO algorithms.
Figure 3 illustrates the training loss and average reward trends in the high-speed rail scheduling simulation environment. Left panel (Training Loss): The DQN algorithm exhibits a rapid decline in loss during the initial phase, stabilizing after approximately 400 training episodes with a minimum loss of ~0.99. While this indicates fast convergence, further improvements are limited, and minor fluctuations persist. The PPO algorithm maintains a relatively high loss level throughout training, demonstrating slower convergence but consistent stability with minimal fluctuations. In contrast, the LSTM-PPO algorithm initially shows higher loss values, which steadily decrease over training, ultimately achieving a final loss of 0.03—significantly outperforming both DQN and PPO. This highlights the enhanced capability of the LSTM-PPO framework to capture dynamic environmental features and optimize policy learning. Right panel (Average Reward): The DQN algorithm exhibits large initial reward fluctuations, dropping as low as −34.01, followed by gradual improvement and stabilization at ~9.11, reflecting moderate policy refinement. The PPO algorithm achieves a relatively high initial reward of ~11.43, maintaining stability throughout training, demonstrating robust baseline performance. The LSTM-PPO algorithm, however, demonstrates a continuous upward trend in average reward, ultimately reaching 21.37—substantially exceeding both DQN and PPO. This underscores the LSTM architecture’s ability to enhance long-term reward accumulation and adaptability to complex temporal dependencies.
This study demonstrates significant advantages in two core aspects: decision variable design and multi-objective handling. Traditional scheduling methods are constrained by discrete, single-dimensional decision variables and operate in a “memoryless” manner, failing to leverage historical state information. In contrast, the proposed agent’s decision is represented as a multidimensional action vector incorporating critical parameters such as speed restrictions and stop-time adjustments, enabling fine-grained control over train operations. The innovation lies in the LSTM-PPO model’s integration of LSTM units to establish temporal dependencies in decision-making. The LSTM network dynamically encodes historical state sequences into a high-dimensional context vector, allowing decisions to be based not on isolated current states but on a comprehensive understanding of the entire journey’s dynamic evolution. This substantially enhances the foresight and global awareness of decision-making. As quantitatively validated in
Figure 4, the LSTM-PPO’s Value Loss curve achieves the lowest values and stabilizes around 20 in later training stages, whereas PPO’s Value Loss stabilizes at ~40 (double the magnitude). This indicates that LSTM-PPO’s history-informed decisions improve future cumulative reward estimation accuracy by nearly 50%, enabling superior long-term scheduling strategies. Regarding multi-objective handling, traditional methods rely on static weights predefined by expert experience, which struggle to adapt to dynamic environments. This study employs reinforcement learning algorithms to implicitly and adaptively balance multiple objectives. The integration of curriculum learning further enhances robustness in complex environments. By maximizing a composite cumulative reward, the model autonomously learns to trade-off sub-objectives across different states without manual intervention.
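The history-encoding step described above can be illustrated with a minimal LSTM cell that folds a sequence of state vectors into a single context vector. This is a conceptual sketch with random, untrained weights; the hidden size, initialization, and interface are assumptions, and in the actual model the resulting context vector feeds the PPO actor and critic heads.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """Single LSTM cell step: input x, hidden state h, cell state c."""
    z = W @ x + U @ h + b                       # stacked gate pre-activations (4H,)
    H = h.shape[0]
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f, o = sig(z[:H]), sig(z[H:2*H]), sig(z[2*H:3*H])  # input/forget/output gates
    g = np.tanh(z[3*H:])                        # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def encode_history(states, H=8, seed=0):
    """Encode a (T, D) sequence of state vectors into a context vector h_T.

    Weights are random placeholders; a trained model would learn them.
    """
    rng = np.random.default_rng(seed)
    D = states.shape[1]
    W = rng.normal(0.0, 0.1, (4 * H, D))
    U = rng.normal(0.0, 0.1, (4 * H, H))
    b = np.zeros(4 * H)
    h, c = np.zeros(H), np.zeros(H)
    for x in states:                            # unroll over the history window
        h, c = lstm_step(x, h, c, W, U, b)
    return h                                    # context vector for the policy/value heads
```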
Figure 4 experimentally validates these advantages: LSTM-PPO achieves a loss reduction to <30% of DQN/PPO levels within the first 10% of training, significantly shortening training time. Final losses are 60–90% lower than DQN/PPO in later stages, reflecting superior performance. The loss curves for LSTM-PPO exhibit smoother profiles with markedly smaller fluctuations compared to DQN/PPO, demonstrating enhanced stability and generalization. The LSTM architecture’s temporal modeling capability—capturing historical information to improve policy and value estimation accuracy—proves particularly effective for tasks involving temporal dependencies.
Through reinforcement learning algorithms, the objective function evolves from traditional single-metric optimization to maximizing a comprehensive cumulative reward. As demonstrated in the final train schedule comparison (
Figure 5), this integrated objective function yields significant results. After optimization by the three algorithms, cumulative train delays are substantially reduced: departure time curves shift downward overall, with markedly diminished slopes. Relative to the unoptimized timetable, LSTM-PPO reduces delays to less than 5% of the original level, while PPO and DQN reduce them to approximately 25% and 16% of the original level, respectively. This optimization magnitude far exceeds that of the other two algorithms. These results demonstrate that, by maximizing a comprehensive reward function, the model learns a scheduling strategy that is more comprehensive and efficient than single-objective approaches.