1. Introduction
Heavy-haul trains (HHT) serve as the backbone of bulk cargo transportation, offering significant advantages in both scale and cost-effectiveness. They play a crucial role in maintaining stable resource distribution and promoting sustainable economic growth by ensuring the continuous operation of critical energy and mineral supply chains [1]. To meet the increasing global demand for these resources, enhancing the transport capacity and operational efficiency of heavy-haul railways (HHR) has become a major focus of innovation. As an advanced train control technology, the virtual coupling train system (VCTS) employs real-time train-to-train (T2T) communication and high-precision positioning to dynamically shorten inter-train spacing from the absolute braking distance to the relative braking distance. This capability substantially reduces train headways and enhances overall line capacity [2], addressing the urgent need for higher efficiency in HHR.
However, implementing VCTS in HHR introduces unique challenges absent in conventional rail systems. The enormous mass and length of HHT make them highly sensitive to internal forces: even minor communication delays or losses may trigger control actions that lead to serious safety risks, such as in-train force instability or derailment. Moreover, heavy-haul railways often operate in remote, complex terrain with unreliable wireless channels, rendering the system vulnerable to both natural interference and intentional attacks. Given the high strategic and economic value of the transported cargo, ensuring the security and resilience of VCTS operation in HHR is crucial. In particular, the integrity and availability of T2T communication in VCTS are highly vulnerable to jamming attacks (JAs) and other external disturbances [3]. JAs, as one of the most prevalent forms of communication threats, directly compromise the stability of VCTS by causing transmission delays and degrading T2T communication quality [4], which can lead to unintentional decoupling or emergency braking events [5]. Therefore, mitigating the effects of JAs on the operation of virtually coupled fleets in HHR is of vital importance.
In response to these operational and safety challenges, a substantial body of research has explored cooperative control strategies for VCTS. Early studies used model-based control frameworks such as dynamic programming (DP) and model predictive control (MPC) [5,6] for trajectory optimization and predictive coordination. Subsequent work sought to enhance robustness by integrating adaptive and nonlinear control techniques, including sliding mode control and variable-structure control, to address model uncertainties and disturbances [7,8,9]. At the formation-control level, hybrid and distributed approaches have also been proposed, such as integrating generalized predictive control (GPC) with potential field methods or deploying distributed MPC (DMPC) for multi-train coordination [10,11]. Recently, a paradigm shift toward data-driven and learning-based control has emerged. Reinforcement learning (RL) and other model-free methods are increasingly applied to cope with nonlinear dynamics without requiring precise physical models [12,13,14,15]. Meanwhile, the broader control community has explored imitation learning for approximating stochastic models of systems with complex uncertainties [16].
In HHR, these control paradigms have been extended to address unique operational challenges. Research efforts have focused on cooperative speed tracking, trajectory smoothing, and train formation coordination [17,18,19]. Operational-level studies have also tackled problems such as headway-energy joint optimization, disruption-tolerant timetable rescheduling, and large-scale scheduling optimization [20,21,22]. Furthermore, recent work has emphasized distributed communication-based learning protocols to mitigate the effects of in-train forces and enhance cooperative stability [23,24,25].
From a methodological standpoint, robust control and game theory have independently proven effective in ensuring stability under uncertainty. On the one hand, H∞ control has been widely adopted in railway systems to guarantee bounded responses against stochastic faults and disturbances [26]. On the other hand, game-theoretic approaches have been used to capture strategic interactions in networked control systems, with cooperative frameworks such as potential games enabling optimal coordination among agents to achieve a system-wide objective [27].
Nevertheless, most existing VCTS studies implicitly assume an ideal and secure communication environment, overlooking the strategic and adversarial nature of cyber-physical threats. While some efforts have addressed communication resilience through event-triggered control (ETC) or resilient distributed methods for intermittent data transmission under attacks [28,29], these approaches are predominantly reactive. More recently, a proactive defense based on a stochastic game framework has also been proposed for jamming-resistant virtually coupled trains [30]. However, a critical research gap remains in applying such advanced defense mechanisms to the specific operational context of HHR. The stringent stability requirements of HHR, magnified by the immense mass of the trains and the severe consequences of control failure, present challenges under communication uncertainty that are not fully addressed by general VCTS security frameworks. To bridge this gap, this paper proposes a stochastic game-based anti-jamming control (SGAC) strategy. The SGAC method explicitly couples the adversarial dynamics at the communication layer, modeled as a stochastic game, with the robust H∞ control synthesis at the physical layer. This cross-layer co-design offers a proactive defense mechanism capable of anticipating and mitigating worst-case jamming attacks, thereby ensuring the stability, safety, and resilience of virtually coupled heavy-haul train systems.
The main contributions of this paper are summarized as follows:
A novel cross-layer defense method is proposed, specifically tailored to the stringent stability requirements of VCTS in HHR. This method systematically couples the adversarial dynamics at the communication layer, modeled as a stochastic game, with the design of a robust controller at the physical layer.
A zero-sum stochastic game is formulated to model the strategic conflict between the jammer and the VCTS. Solving for the saddle-point equilibrium of this game yields an optimal probabilistic defense strategy, enabling a paradigm shift from conventional reactive measures to proactive, anticipatory defense.
A game-theory-based H∞ controller is developed through a co-design process. The equilibrium outcomes of the stochastic game are systematically embedded into the linear matrix inequality (LMI) constraints of the controller synthesis, ensuring the controller is inherently robust against the worst-case attacks predicted by the game.
Together, these innovations establish a proactive, cross-layer defense mechanism that significantly enhances the resilience of virtual coupling train systems in heavy-haul railways. The remainder of this paper is organized as follows: Section 2 introduces the proposed SGAC methodology, including the stochastic game formulation and H∞-based control synthesis. Section 3 presents numerical simulations and comparative analysis, and Section 4 concludes the paper with future research directions.
2. The Stochastic Game-Based Anti-Jamming Control (SGAC) Strategy
To address the critical security vulnerabilities of VCTS in HHR, this paper proposes an SGAC strategy. The fundamental challenge motivating this work is illustrated in Figure 1. While VCTS enables efficient operation under ideal conditions, its reliance on T2T communication makes it susceptible to jamming attacks. As shown, JAs can degrade communication quality, leading to hazardous events such as emergency braking, which highlights the necessity for a robust defense mechanism.
The overall architecture of the proposed SGAC strategy is illustrated in Figure 2. The method adopts a cross-layer co-design that integrates game-theoretic decision-making at the communication layer with robust control synthesis at the physical layer. This architecture enables a proactive defense paradigm that anticipates and mitigates adversarial actions, rather than reacting to them.
The methodological workflow of SGAC is presented in Figure 3, providing a detailed roadmap from problem formulation to final controller synthesis. At its core, SGAC formulates a zero-sum stochastic game between the attacker and defender, where jamming attacks are modeled as stochastic disturbances causing random packet losses. Solving for the saddle-point equilibrium yields the optimal probabilistic defense strategy, which is subsequently mapped to control actions through an H∞ robust controller to ensure system stability and performance under worst-case attacks.
In summary, the SGAC method provides a unified methodology that translates strategic decisions into robust control actions, ensuring safe and efficient operation under contested communication environments. The following section details its mathematical formulation and algorithmic implementation.
2.1. Train Dynamics Model
The longitudinal dynamics of virtually coupled train fleets in heavy-haul transportation can be expressed as follows:
where the subscript
k denotes the
k-th train in a virtually coupled fleet.
and
represent the mass and speed of the
k-th train, respectively.
is the traction or dynamic braking force generated by the locomotive, and
denotes the air braking force.
and
represent the basic resistance and line resistance, respectively. The variable
t denotes the running time.
where
is the standard acceleration due to gravity, and
is an empirical resistance coefficient. The constants
,
, and
are positive coefficients determined by the train type and external environment.
The line resistance represents the additional resistance caused by the railway line’s geometric and topographic features, including both gradient and curvature effects. It is defined as
where
is the track gradient,
is the track curvature radius, and
is an empirical coefficient reflecting the influence of curve sharpness and train length.
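As a rough illustration of the force balance described in this subsection, the following Python sketch evaluates the longitudinal acceleration of a single train. The function and parameter names, the quadratic (Davis-type) form of the basic resistance, the per-mille gradient convention, and the placeholder curve coefficient of 600 are all assumptions made for this example; they are not taken from the paper's equations.

```python
G = 9.81  # standard acceleration due to gravity, m/s^2


def basic_resistance(mass_kg, speed_kmh, a, b, c):
    """Basic (rolling and aerodynamic) resistance in newtons.

    ASSUMPTION: quadratic Davis-type form, with a, b, c giving the unit
    resistance in N per kN of train weight.
    """
    unit = a + b * speed_kmh + c * speed_kmh ** 2
    return unit * mass_kg * G / 1000.0


def line_resistance(mass_kg, gradient_permille, curve_radius_m=None,
                    curve_coeff=600.0):
    """Gradient plus curve resistance in newtons.

    ASSUMPTION: gradient in per mille (positive uphill) and a unit curve
    resistance of curve_coeff / radius in N/kN; 600 is a placeholder value.
    """
    unit = gradient_permille
    if curve_radius_m is not None:
        unit += curve_coeff / curve_radius_m
    return unit * mass_kg * G / 1000.0


def acceleration(mass_kg, speed_kmh, traction_n, air_brake_n,
                 a, b, c, gradient_permille=0.0, curve_radius_m=None):
    """Longitudinal acceleration in m/s^2 from the assumed force balance.

    traction_n is positive for traction and negative for dynamic braking;
    air_brake_n is a non-negative braking force magnitude.
    """
    resist = (basic_resistance(mass_kg, speed_kmh, a, b, c)
              + line_resistance(mass_kg, gradient_permille, curve_radius_m))
    return (traction_n - air_brake_n - resist) / mass_kg
```

Under these assumptions, a train on a 10 per mille upgrade with no applied force and zero basic resistance decelerates at roughly 0.098 m/s², which gives a sense of how strongly gradients act on a heavy-haul consist.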
2.2. The Hybrid Control Model of Heavy-Haul Train-Based Virtual Coupling Train System
Let
and
represent the dynamic characteristics of the
k-th train and the state of T2T communication. The hybrid control model of VCTS can be expressed as
where
represents the state of the
k-th train,
denotes the control input,
is the disturbance, and
represents the state of the
k-th train at the initial time
. The time-varying state value
depends on the defense action
and the attack action
.
represents the cross-layer control strategy. Besides, attacks are defined as
, while defense strategies are defined as
.
2.3. The Stochastic Game Method in SGAC
The proposed SGAC algorithm transforms the robust control problem into a minimax optimization by employing a zero-sum game framework. Within the VCTS, the stochastic game in SGAC represents the interactive decision-making process between two agents: the jammer (attacker) and the train control system (defender). The jammer seeks to maximize communication degradation by inducing packet loss and delay, whereas the defender aims to minimize performance deterioration via adaptive control. Each player’s strategy affects communication quality, thereby influencing the accuracy and timeliness of control information exchange. Thus, the stochastic game method quantitatively describes how cyber-level conflicts translate into variations in the control performance of the VCTS.
To obtain the optimal closed-loop controller
, the attacker–defender interaction is formulated as a minimax stochastic game, in which the performance cost
quantifies the influence of closed-loop disturbances and serves as a guaranteed upper bound on system degradation.
The cost function is composed of two main components:
where
denotes the nominal system performance, and the second term models the degradation caused by jamming, reflecting the trade-off between control effort and robustness.
By treating the stochastic game outcome as an uncertainty source in system dynamics, the optimal attenuation level
can be determined, yielding the corresponding
controller
that guarantees robustness against worst-case jammer-induced disturbances, as constrained by
where
denotes the
-norm.
When the game reaches its saddle-point equilibrium, the optimal strategies
define an upper-bound value function, obtained from the following partial differential equation:
where
denotes the state transition probability of the system.
As both players act simultaneously and independently, neither knows the other’s decision in advance, and each action affects both the instantaneous cost and state transition probability of the VCTS. To quantify this effect, a cost function
is defined, mapping the current system state and player strategies to the corresponding instantaneous performance loss. To evaluate the long-term impact of repeated attacker–defender interactions, a discounted payoff metric is introduced to quantify the expected cumulative performance degradation of the VCTS over time, given an initial communication state
and the strategy pair
. The specific calculation is formulated as follows:
where
denotes the discounting factor, and
represents the expectation operator.
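The discounted payoff described above can be approximated numerically for a fixed pair of stationary mixed strategies. The sketch below is a minimal illustration only: the nested-list layout `cost[s][a][b]` (stage cost in state `s` for defender action `a` and attacker action `b`), `trans[s][a][b][t]` (transition probability to state `t`), and the function name are hypothetical choices, not the paper's notation.

```python
def discounted_payoff(cost, trans, f, g, beta, iters=500):
    """Expected discounted cumulative cost for fixed mixed strategies.

    cost[s][a][b]     : stage cost in state s (ASSUMED layout)
    trans[s][a][b][t] : probability of moving from state s to state t
    f[s][a], g[s][b]  : defender and attacker action probabilities
    beta              : discount factor in (0, 1)
    Returns one value per initial state, computed by fixed-point iteration.
    """
    n = len(cost)
    values = [0.0] * n
    for _ in range(iters):
        updated = []
        for s in range(n):
            total = 0.0
            for a, prob_a in enumerate(f[s]):
                for b, prob_b in enumerate(g[s]):
                    future = sum(trans[s][a][b][t] * values[t]
                                 for t in range(n))
                    total += prob_a * prob_b * (cost[s][a][b] + beta * future)
            updated.append(total)
        values = updated
    return values
```

Because the update is a contraction with factor `beta`, the iteration converges geometrically; a fixed iteration count is used here only to keep the sketch short.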
Before deriving the optimal defensive control law, the existence of an equilibrium under the given stochastic dynamics must be ensured, as formalized in the following theorem.
Theorem 1. Consider a game defined for each state . If the following conditions hold:
- 1.
the state transition rates are continuous with respect to the mixed strategy pair ;
- 2.
the cost function is bounded,
then, for each state , there exists at least one stationary strategy pair that satisfies the following equilibrium condition:
where is the abbreviation for . Additionally, and denote the lower and upper bounds of the mixed-strategy game, respectively. The pair constitutes a saddle-point equilibrium, with a unique game value satisfying .
Given the existence of equilibrium, the saddle-point is rigorously defined to characterize the relationship between both players’ optimal strategies. This definition serves as the foundation for deriving the optimal defensive algorithm through a cross-layer design approach.
Definition 1 (Saddle-Point Equilibrium).
A pair of strategies constitutes a saddle-point equilibrium for a zero-sum stochastic game if the following inequalities hold for any other strategy pair , and for all initial states i:
where represents the cumulative discounted return. The unique value is called the Value of the Game.
The equilibrium strategies can be computed via a value iteration algorithm, which reformulates the stochastic game as a dynamic programming problem and updates the expected cost recursively until convergence according to the Bellman equation:
where the
operator computes the value of the matrix game
, which is constructed at each step from the immediate cost
and the expected future costs.
Once the value iteration process has converged, the saddle-point equilibrium strategies
constitute the optimal policies satisfying the following condition:
The SGAC algorithm integrates game-theoretic equilibrium strategies with an H∞ controller to ensure robustness against worst-case JAs. This establishes a cross-layer link from communication-layer defense to physical-layer control, ensuring safe headway and coordinated motion under varying jamming conditions. The next section details the final solution process of the SGAC algorithm.
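The value iteration described in this subsection can be sketched for the simplest case of two actions per player. The code below is an illustrative assumption rather than the paper's implementation: it restricts each state's matrix game to 2x2 so that the game value has a closed form, treats the defender as the minimizing row player and the jammer as the maximizing column player, and reuses a hypothetical `cost[s][a][b]` and `trans[s][a][b][t]` data layout.

```python
def solve_2x2(m):
    """Value and mixed strategies of a 2x2 zero-sum matrix game.

    Rows belong to the minimizing player (defender), columns to the
    maximizing player (attacker). Returns (value, row_probs, col_probs).
    """
    (a, b), (c, d) = m
    row_max = [max(a, b), max(c, d)]
    col_min = [min(a, c), min(b, d)]
    upper = min(row_max)   # best guarantee for the minimizer
    lower = max(col_min)   # best guarantee for the maximizer
    if upper - lower <= 1e-12:          # a pure saddle point exists
        i = row_max.index(upper)
        j = col_min.index(lower)
        return upper, [1.0 - i, float(i)], [1.0 - j, float(j)]
    denom = a + d - b - c               # nonzero when no pure saddle exists
    p = (d - c) / denom                 # probability of row 0
    q = (d - b) / denom                 # probability of column 0
    return (a * d - b * c) / denom, [p, 1.0 - p], [q, 1.0 - q]


def value_iteration(cost, trans, beta, tol=1e-10, max_iter=10000):
    """Shapley-style value iteration for a discounted zero-sum game.

    cost[s][a][b] and trans[s][a][b][t] use an ASSUMED nested-list layout
    with exactly two actions per player in every state.
    """
    n = len(cost)
    values = [0.0] * n
    policy = [None] * n
    for _ in range(max_iter):
        updated = []
        for s in range(n):
            stage = [[cost[s][a][b]
                      + beta * sum(trans[s][a][b][t] * values[t]
                                   for t in range(n))
                      for b in range(2)] for a in range(2)]
            val, f, g = solve_2x2(stage)
            updated.append(val)
            policy[s] = (f, g)
        converged = max(abs(x - y) for x, y in zip(updated, values)) < tol
        values = updated
        if converged:
            break
    return values, policy
```

For larger action sets, the closed-form 2x2 solver would have to be replaced by a linear program for each matrix game; the outer iteration stays the same.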
2.4. The H∞-Based Cross-Layer Controller in the SGAC Strategy
The SGAC algorithm is a robust closed-loop state-feedback controller formulated within the H∞ framework to ensure the operational stability of virtual coupling fleets in HHR, even under worst-case conditions. Within the VCTS architecture, the controller continuously regulates traction and braking commands according to real-time communication states, which may be disrupted by jamming attacks manifested as random packet losses.
To formalize this process, the relationship between the ideal controller output and the received measurement at time
is expressed as
At time , the controller cannot directly access the ideal output due to potential communication disruptions and instead relies on the received measurement transmitted through an unreliable channel. This formulation explicitly captures the information mismatch caused by packet loss, distinguishing the intended control signal from the actually received measurement.
The state of the T2T communication link is denoted by
, where the discrete state space is defined as
. Packet loss events are modeled as a state-dependent random variable
. Under the influence of JAs, the random variable
follows a Bernoulli distribution:
where
signifies a successful data transmission, allowing the controller to use the current information,
. Conversely,
indicates that a packet has been lost, requiring the controller to utilize the information from the previous time step,
.
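The Bernoulli packet-loss mechanism described above can be illustrated with a short simulation. This is a sketch under stated assumptions: the function name, the `initial` value used before any packet arrives, and the choice to hold the most recently received value on a loss are illustrative and are not confirmed by the paper's equations.

```python
import random


def received_sequence(ideal, loss_prob, rng, initial=0.0):
    """Simulate what the controller receives over a lossy channel.

    ideal     : list of ideal outputs, one per time step
    loss_prob : probability that a packet is lost at each step
    rng       : random.Random instance, passed in for reproducibility
    initial   : ASSUMED value held before the first successful delivery
    """
    received = []
    last = initial
    for value in ideal:
        delivered = rng.random() >= loss_prob  # indicator equals 1 on success
        if delivered:
            last = value                       # use the current information
        received.append(last)                  # otherwise hold the old value
    return received


if __name__ == "__main__":
    rng = random.Random(42)
    print(received_sequence([1.0, 2.0, 3.0, 4.0, 5.0], 0.3, rng))
```

With `loss_prob = 0` the received sequence equals the ideal one, and with `loss_prob = 1` the controller never leaves its initial value, which brackets the behavior between the two extremes of the jamming intensity.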
The dynamics of the SGAC observer and state-feedback controller under stochastic packet losses are described by the following state-space equations, which serve as the basis for the subsequent H∞ stability and performance analysis:
where
is the estimated state of train
k at time
,
denotes the control signal received by the vehicle onboard controller, and
are the control and the observation gains, respectively.
The quality of service (QoS) of the T2T communication link is quantified using two random variables,
and
, whose distributions are dynamically governed by the strategic interaction between JAs and the system’s defense mechanisms. Given the communication state
and the players’ strategies
, the expectations of these QoS indicators are calculated as follows:
The preceding equations establish a mathematical linkage between the attacker–defender game strategies and the resulting QoS performance, providing a bridge through which the SGAC algorithm dynamically adjusts each train’s control input according to the real-time communication state. Based on this connection, the next section derives the performance metrics for the VCTS hybrid system, explicitly accounting for these game-dependent variables.
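To make the link between strategies and QoS concrete, the sketch below computes the expectation of a QoS indicator when both players randomize independently. The table `qos[a][b]`, giving the indicator's mean for defender action `a` and attacker action `b` in one communication state, and the example numbers are hypothetical; the paper's own expressions may weight additional terms.

```python
def expected_qos(qos, f, g):
    """Expected QoS indicator under independent mixed strategies.

    qos[a][b] : mean indicator value for defender action a and
                attacker action b (ASSUMED per-state table)
    f, g      : defender and attacker action probabilities
    """
    return sum(f[a] * g[b] * qos[a][b]
               for a in range(len(f))
               for b in range(len(g)))


if __name__ == "__main__":
    # Hypothetical delivery probabilities for two defense and two attack actions.
    delivery = [[0.95, 0.60],
                [0.90, 0.85]]
    print(expected_qos(delivery, [0.4, 0.6], [0.5, 0.5]))
```

The same bilinear form applies to any payoff table indexed by the two players' actions, which is why an equilibrium strategy pair directly fixes the expected packet-delivery rate seen by the controller.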
2.5. The SGAC Model Solution
The central principle of the proposed cross-layer design is the establishment of a direct link that translates the strategic outcomes of the stochastic game into robust control actions. This linkage is achieved by explicitly mapping the expected results of the game-theoretic strategies to the system’s overall performance index for each network state .
For zero initial conditions, the performance objective is achieved if
satisfies the following inequality for all possible states
:
A smaller value of signifies better disturbance rejection performance.
Although this inequality defines the control objective, it cannot be solved directly and must therefore be reformulated into a tractable form. To this end, the following theorem converts this control condition into a set of linear matrix inequalities.
Theorem 2. For a given scalar and any state , if there exist positive definite matrices , , , and , and real matrices , , such that the LMI is feasible, then a hybrid algorithm pair exists that guarantees the following properties:
- 1.
The VCTS hybrid system is exponentially mean-square stable.
- 2.
The prescribed H∞-norm constraint is satisfied for all nonzero disturbances .
The LMI feasibility condition is given by the inequality:
where the entries of the matrix are given as follows:
Furthermore, if the LMI condition is satisfied, the controller gain and observer gain can be derived as follows:
Building on the feasibility conditions of Theorem 2, Formulas (18) and (19) are reformulated into a convex optimization problem, which determines the minimum achievable H∞ performance index
and offers quantitative performance guarantees for the controller design:
where
denotes the optimal value obtained from the optimization, corresponding to the game value in the underlying zero-sum stochastic game. To formalize this relationship, we define the system’s performance payoff matrix,
, where each element
represents the physical layer
performance under
. Consequently, the Value of the Game,
, is the expected outcome of this payoff matrix when both players adopt their saddle-point equilibrium strategies
. The value
results from the following mapping:
Based on the theoretical analysis and objective control function, the stochastic game defense algorithm employs a coupled co-design to address the
index and the
optimal controller. The solution to the LMI problem presented in Theorem 2 simultaneously provides both the optimal game strategies
and the corresponding optimal controller gains. These strategies then determine the expected QoS of the communication link, ensuring that the design satisfies the conditions
and
. The proof of Theorem 2 can be found in Appendix A.
The primary control objective for the VCTS system is to maintain a predefined target spacing between consecutive trains. Strict adherence to this spacing is crucial for ensuring platoon stability and guaranteeing both operational efficiency and safety. However, this critical task is compromised by JAs that disrupt the T2T communication necessary for cooperative control.
To address this challenge, this paper develops the SGAC algorithm, an integrated defense strategy that combines stochastic game theory with robust control. By jointly optimizing the communication defense strategies and physical control parameters, the SGAC method enables the VCTS to dynamically adjust its feedback control inputs in response to real-time network states, thereby preserving stability and coordination under severe jamming attacks.
3. Numerical and Simulation Results
In this section, a series of simulations is conducted to validate the effectiveness and robustness of the proposed SGAC algorithm. The simulations are implemented in MATLAB R2021a, using the YALMIP toolbox [Available online: https://yalmip.github.io/; accessed on 20 October 2024] to formulate the convex optimization problem and the SDPT3 solver [Available online: https://www.math.nus.edu.sg/~mattohkc/sdpt3.html; accessed on 20 October 2024] to solve it.
The simulation is based on a virtual platoon scenario consisting of four heavy-haul trains. The predefined target spacing between consecutive trains in the VCTS is set to 600 m. Each train is configured to include one HXD1 locomotive and fifty C80 carriages, resulting in a total train length of 635 m. The emergency braking deceleration is set to −0.6 m/s².
The primary objective of these experiments is to evaluate the performance of the SGAC strategy under varying probabilities and durations of JAs. For comparison, the robustness of the proposed SGAC algorithm is evaluated against two classical control methods, DP and MPC, as well as the ETC strategy.
The state-space matrices for the train dynamics model are defined as follows:
,
;
. Other system parameters are set to
and
. The state transition matrix for the Signal-to-Interference-plus-Noise Ratio (SINR), which is used for the payoff matrices
and
, is given by
Before running the dynamic simulations, the LMI optimization problem presented in Theorem 2 is solved to synthesize the required controller and derive the game-theoretic solution. The results of this offline computation are as follows: the saddle-point equilibrium strategies are and . The optimized minimum value of is . The controller gain is and the observer gain is . These synthesized gains and strategies are then used to implement the SGAC controller in the following simulation tests.
The operational performance of the proposed SGAC strategy was comprehensively evaluated against baseline methods under jamming attacks using quantitative criteria. To facilitate this rigorous assessment, the root mean square error (RMSE) metric is employed. The RMSE quantifies the average magnitude of deviation between the actual and reference values over the simulation horizon, providing a clear measure of tracking performance and stability. A lower RMSE value indicates superior performance. The RMSE is calculated as follows:
where
and
are the speed and position of train
k at time
,
is the desired inter-train headway, and
L denotes the physical length of the train.
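The RMSE metric described above can be computed along the following lines. This is a hedged sketch: the function names are illustrative, and it assumes that positions refer to the head of each train, so that the spacing error is the leader's position minus the follower's position minus the train length and the desired headway. The example values of 600 m and 635 m are the target spacing and train length stated in this section.

```python
import math


def rmse(errors):
    """Root mean square of a sequence of error samples."""
    errors = list(errors)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def headway_errors(leader_pos, follower_pos, target_spacing, train_length):
    """Spacing error per time step between two consecutive trains.

    ASSUMPTION: positions are measured at the head of each train, so the
    actual gap is leader head minus follower head minus the train length.
    """
    return [x_lead - x_follow - train_length - target_spacing
            for x_lead, x_follow in zip(leader_pos, follower_pos)]


if __name__ == "__main__":
    leader = [1300.0, 1310.0]     # hypothetical sampled positions, m
    follower = [65.0, 70.0]
    print(rmse(headway_errors(leader, follower, 600.0, 635.0)))
```

The same `rmse` helper applies to speed tracking by passing the differences between the actual and reference speeds.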
Figure 4 illustrates the trajectory results under a targeted jamming attack, while Table 1 summarizes the corresponding quantitative RMSE metrics. In the simulated scenario, a communication loss occurs in the second train. As shown in Figure 4b,c, both the traditional DP and MPC algorithms interpret this data loss as a critical fault, forcing all subsequent trains into emergency braking. This behavior results in catastrophic degradation, reflected in their extremely high headway RMSE values exceeding 3700 m (Table 1). The ETC strategy, depicted in Figure 4d, demonstrates significantly better resilience than DP and MPC, managing to avoid complete failure but still exhibiting noticeable deviations, resulting in a headway RMSE of
m. In stark contrast, the proposed SGAC method, shown in Figure 4e, maintains exceptional stability. By leveraging its integrated game-theoretic model and robust H∞ controller, the SGAC algorithm effectively anticipates and compensates for the disruption, achieving a near-ideal headway RMSE of only
m, significantly outperforming all baseline methods.
The robustness of the SGAC strategy against varying attack intensities was further assessed, as shown in Figure 5 and quantified in Table 2. The lower plot in each subfigure of Figure 5 visualizes the corresponding random distribution of jamming events. Despite the visually evident increase in attack frequency and randomness, the operational trajectories demonstrate that platoon stability is well preserved across all scenarios. This qualitative observation is strongly corroborated by the quantitative data in Table 2, where the headway RMSE remains negligible even as the jamming probability increases to a high of
, rising only from
m to
m. These results confirm the high resilience of the SGAC strategy against frequent and intermittent attacks.
The capability of the SGAC system to withstand sustained communication disruptions is illustrated in Figure 6, with quantitative results listed in Table 2. When the communication failure is transient (lasting 1–5 cycles), the operational dynamics remain largely unaffected, as the state observer and predictive game strategy effectively compensate for short-term data loss. As the failure duration increases to 10 cycles, uncertainty in the state estimation grows, resulting in more conservative speed regulation. Nevertheless, the headway RMSE increases only to
m, demonstrating the system’s graceful degradation property. Unlike the abrupt performance collapse observed in the baseline methods, the SGAC method degrades smoothly and predictably as the attack severity intensifies, validating its effectiveness as a proactive and robust game-theoretic defense mechanism for secure train operations.