Article

A Game Theory Model for Network Attack–Defense Strategy Selection in Power Internet of Things

1
Jilin Information & Telecommunication Company, State Grid Jilin Electric Power, Changchun 130021, China
2
School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
*
Author to whom correspondence should be addressed.
Electronics 2026, 15(2), 426; https://doi.org/10.3390/electronics15020426
Submission received: 5 November 2025 / Revised: 19 December 2025 / Accepted: 25 December 2025 / Published: 19 January 2026
(This article belongs to the Special Issue Intelligent Solutions for Network and Cyber Security)

Abstract

As the digitalization and intelligent transformation of power systems accelerates, the Power Internet of Things (PIoT) plays a pivotal role in ensuring efficient energy transmission and real-time regulation. However, this openness and interconnectivity also expose the system to diverse cyber threats, where attackers can disrupt stable power communication and dispatch operations through means such as data tampering, denial-of-service attacks, and control intrusion. To characterize the dynamic adversarial process between attackers and defenders in the PIoT, this paper constructs a zero-sum differential game model for cyber attack–defense strategy selection. To achieve equilibrium in the formulated differential game, optimal control theory is employed to solve the optimization problems of the game participants, thereby deriving the optimal strategies for both attackers and defenders. Finally, simulation results illustrate the evolution of network resource competition between attackers and defenders in the PIoT. The results also demonstrate that our proposed model can effectively and accurately describe the evolution of the system security state and the impact of strategic interactions between attackers and defenders.

1. Introduction

As the physical backbone of the energy transmission system, the power grid constitutes the most fundamental and strategically critical infrastructure for modern electricity supply [1,2]. It serves not merely as a carrier for delivering electrical energy from producers to consumers, but also forms the operational foundation for various sectors including industrial manufacturing, commercial services, healthcare, and transportation systems [3]. The stability and reliability of the power grid directly determine a nation’s economic operational efficiency and public safety levels [4]. Furthermore, any interruption or failure in the power grid may trigger cascading effects, resulting in severe consequences such as production line shutdowns, data service disruptions, and urban function paralysis [5,6], thereby posing substantial threats to socio-economic stability and national security. Driven by the wave of digital transformation, Internet of Things (IoT) technology is rapidly proliferating across various domains including energy, industrial operations, and urban management [7]. One of the most distinctive features of IoT applications lies in their capacity to achieve comprehensive perception of the physical world and real-time data acquisition through the extensive connectivity of massive terminal devices [8]. This characteristic aligns closely with the modernization requirements of power systems, a synergy particularly evident in the power grid sector [9]. Hundreds of millions of devices—such as smart meters, line monitoring equipment, substation inspection robots, and distribution automation terminals—are being integrated into the grid system at an unprecedented density, forming a pervasive IoT network that spans the entire chain of power generation, transmission, transformation, distribution, and consumption, thereby constituting the Power Internet of Things (PIoT) [10,11]. 
This infrastructure provides robust data support and an intelligent foundation for building a new generation of power systems [12].
Smart energy grid systems equipped with intelligent bidirectional data communication capabilities can significantly enhance the operational and control performance of traditional energy grids [13]. Such improvements address long-standing challenges related to the reliability, flexibility, and efficiency of conventional grid systems [14,15]. In smart grid environments, the system must deliver a range of services, including large-scale integration of distributed renewable energy resources, real-time dynamic data communication between consumers and service providers regarding electricity pricing and energy consumption, collection and transmission of system parameters for statistical analysis, and implementation of necessary actions based on such analyses. Smart energy grids generate vast volumes of data and information that require transmission, processing, and storage to facilitate intelligent decision-making and operational responses.
The large-scale deployment of PIoT devices, communication networks, and intelligent control technologies has given rise to what is termed the PIoT or smart grid environment, and within this context the security of the PIoT becomes particularly critical. Increased system openness and interconnectivity expand the attack surface; intrusions into communication or control layers could lead to imbalances in energy dispatch, interruptions of critical infrastructure, and threats to national power security [9,13,16]. Therefore, it is imperative to construct a secure, reliable, and sustainable PIoT architecture to ensure the safety and resilience of power IoT services.
With the deep integration of the Power Internet of Things (PIoT) into all segments of modern power systems, its security risks have become increasingly prominent. Zhang [17] proposed a zero-trust-based security protection architecture to meet the requirements of massive connectivity and open-sharing services in PIoT environments. Choi [18], by analyzing security vulnerabilities in cloud-enabled power systems, developed an appropriate security service framework tailored for cyber-attack detection in PIoT cloud environments. Gong and Xie [19] employed a zero-sum Markov game to examine security issues in smart grids, focusing on how network-layer attacks can maximize physical system damage and deriving optimal defensive strategies in discrete time steps. To address challenges related to device trust, data integrity, and large-scale data processing in PIoT, Zhou et al. [9] explored blockchain-integrated security architectures for emerging applications such as vehicle-to-grid (V2G) energy trading. For data injection attacks, Sanjab proposed a Stackelberg game-based attack–defense model that analyzes scenarios involving multiple attackers and a single smart grid defender. Although these studies contribute valuable insights into PIoT security from perspectives such as architectural design, cloud-based security services, blockchain mechanisms, and static or discrete-time game models, they still lack a system-level formulation capable of capturing the continuous-time evolution of defensive and adversarial resource competition across heterogeneous device populations. This gap motivates the core contribution of this paper, which develops a differential-game-based framework to model and analyze the dynamic attack–defense interactions in PIoT.
In this paper, we adopt a macroscopic system-level perspective, focusing on the strategic interaction and dynamic evolution of cybersecurity states. Recognizing that the essence of cyber attack and defense can be attributed to competition for limited resources, modeling the contention for network resources and the strategic interaction between attackers and defenders provides theoretical support for behavior prediction and the design of proactive security defense mechanisms. Within this framework, attackers gradually erode the integrity of the PIoT by compromising its devices, whereas defenders work to maintain normal network functionality by protecting these devices. We abstract PIoT devices as the core network resources contested between the two adversarial parties. The state of each PIoT device undergoes transitions in response to dynamic adjustments of attack and defense strategies, manifested as the probabilistic allocation of competitive intensity to each device by both sides. The evolutionary process of the system state is described by a set of differential equations, each capturing the temporal variation in the number of devices in a particular state. To this end, we formulate a zero-sum differential game model to analyze the competitive relationship between attackers and defenders over devices in the PIoT. In this model, the attacker aims to maximize their payoff, while the defender seeks to minimize system losses through optimal competitive strategies. We employ optimal control theory to solve the aforementioned optimization problem and achieve an equilibrium state. The main contributions of this paper are summarized as follows:
(1) Accounting for the dynamic confrontation and strategic interaction inherent in the struggle for control over PIoT devices, the real-time evolution of device states is characterized by a set of nonlinear differential equations. Concurrently, a differential game model is established based on this evolutionary process to formally describe the competitive behavior between attackers and defenders over device control in the PIoT across a continuous time dimension.
(2) By formulating a dual-objective optimization problem that aims to maximize the attacker’s benefits and minimize the defender’s system losses, the Hamiltonian-based optimal control method is introduced to solve it. Theoretical analysis demonstrates that the differential game admits an equilibrium solution in the form of a saddle-point strategy. Numerical simulations reveal the dynamic evolution patterns of the system under different security states, and the results indicate that the proposed game-theoretic framework significantly enhances system security performance.
The remainder of this paper is structured as follows. Section 2 introduces a game-theoretic model for offensive and defensive operations in networks. Section 3 derives optimal offensive and defensive strategies by analyzing the proposed game model. Section 4 presents and analyzes numerical results. Finally, Section 5 concludes the paper.

2. Game Theory-Based PIoT Attack–Defense Model

This section analyzes the dynamic process of security state transitions in electric power communication networks and constructs an attack–defense game model.

2.1. Network Model and Network States

Figure 1 illustrates an example of the considered attack and defense model for the electric power communication network. The network is assumed to contain $n$ PIoT devices that serve as network resources. These power grid IoT terminal devices include smart meters, phasor measurement units (PMUs), and other grid equipment [20,21,22]. The fundamental objective of both attackers and defenders is to gain control over these PIoT devices, with the core strategy revolving around the dynamic allocation of competitive intensity to each device. At time $t$, the competitive intensity invested by each party in a particular IoT device is denoted $L_A(t)$ for the attacker and $L_D(t)$ for the defender. This intensity reflects the real-time level of adversarial investment by the attacker in seizing device control versus the defender’s effort to maintain device security. It is important to note that the competitive intensities allocated to all PIoT devices are mutually independent and identically distributed.
In the aforementioned network model, we consider that the attacker possesses $J$ levels of competitive intensity $(a_1, a_2, \ldots, a_J)$, with corresponding selection probabilities $(p_1, p_2, \ldots, p_J)$. The defender, on the other hand, has $K$ levels of competitive intensity $(d_1, d_2, \ldots, d_K)$, associated with selection probabilities $(q_1, q_2, \ldots, q_K)$. The objective of the cyber attacker is to maximize system loss in the PIoT by paralyzing PIoT devices through the allocation of appropriate attack competitive intensity. Conversely, the cyber defender aims to minimize system loss in the PIoT by protecting each device from attacks through the assignment of suitable defense competitive intensity.

2.2. Network State Transition Relationships

In this paper, we consider scenarios where more competitive players can control devices and even render them inoperable, even when there is a significant disparity in capabilities. For each IoT device, we consider four states: normal state (N), attack state (A), defense state (D), and malfunction state (M).
N indicates that the power IoT device is in normal operating status, but the node may be vulnerable to attack due to inherent weaknesses.
A indicates that the power IoT device is under attack, but its service quality has not yet degraded.
D indicates that the power IoT device node is protected by defense strategies and immune to attack strategies.
M indicates that the power IoT device node is experiencing severe degradation in service quality or has lost service capability.
The state transition patterns for power IoT devices under the above four states are illustrated in Figure 2.
Referring to [23], we define the state of power IoT devices as transitioning between four states based on the relative contention strength injected by attackers and defenders.
N → A: For a device in its normal state, if the attacker’s injected contention strength exceeds that of the defender, the attacker gains control over the dormant device. The device state transitions to A when the strength difference $L_A(t) - L_D(t)$ reaches or exceeds the predefined threshold $\delta_1$. At time $t$, the transition probability from the normal state to the attacked state is defined as
$$\alpha_A^N(t) = \Pr\{L_A(t) - L_D(t) \ge \delta_1\}, \tag{1}$$
where, here and below, the subscript of $\alpha$ denotes the destination state and the superscript denotes the source state.
N → D: For a device in its normal state, if the defender provides a stronger contest on the device than the attacker, the defender gains control over the dormant device, transitioning its state to D when the strength difference $L_D(t) - L_A(t)$ reaches or exceeds $\delta_1$. At time $t$, the transition probability shifting the device state from normal to defense is defined as
$$\alpha_D^N(t) = \Pr\{L_D(t) - L_A(t) \ge \delta_1\}. \tag{2}$$
The aforementioned conversion process can also be termed control over neutral devices within the power IoT, where attackers and defenders vie for control over power IoT devices in their normal state.
D → A: For a power IoT device in a defensive state, if the attacker injects a contention strength exceeding the device’s own strength, the attacker gains control over the defended device. The device state transitions to A when the strength difference $L_A(t) - L_D(t)$ reaches or exceeds the predefined threshold $\delta_2$. At time $t$, the transition probability shifting the device state from defensive to attacked is defined as
$$\alpha_A^D(t) = \Pr\{L_A(t) - L_D(t) \ge \delta_2\}. \tag{3}$$
A → D: For power IoT devices under attack, if the defender injects a contention strength exceeding that of the attacker, the defender gains control of the device. The device state transitions to D when the strength difference $L_D(t) - L_A(t)$ reaches or exceeds the predefined threshold $\delta_2$. At time $t$, the transition probability shifting the device state from attacked to defended is defined as
$$\alpha_D^A(t) = \Pr\{L_D(t) - L_A(t) \ge \delta_2\}. \tag{4}$$
The transition process from state D to state A and from state A to state D represents the struggle between attackers and defenders to gain control over power IoT devices. It is important to note that this process is significantly more challenging than seizing a neutral device, which is reflected by $\delta_1 < \delta_2$.
A → M: If an attacker aims to induce a fault state in an IoT device, they must inject a substantially higher level of intensity. In this paper, we assume that if the contention strength injected by the attacker into an attacked device exceeds the strength the defender can provide by a sufficient margin, the device becomes paralyzed and its state transitions to the fault state. That is, the strength difference $L_A(t) - L_D(t)$ reaches or exceeds a predefined threshold $\delta_3$. At time $t$, the transition probability of changing the device state to the fault state is defined as
$$\alpha_M^A(t) = \Pr\{L_A(t) - L_D(t) \ge \delta_3\}. \tag{5}$$
The transition process from state A to state M is termed the adversarial process of the device, wherein an attacker (or defender) attempts to render the defended (or attacked) device inoperable. Note that the adversarial process is more challenging than gaining device control, hence we define $\delta_2 < \delta_3$. Thus, the relationship among these three thresholds is $\delta_1 < \delta_2 < \delta_3$.
M → N: For power IoT devices in a fault state, security measures such as patching, rebooting, or repairing can restore them to the normal state N. We define a fixed parameter $\rho$ as the transition probability for a device moving from the fault state to the normal state.
In this paper, we assume that the competition intensity among different PIoT devices is independent and identically distributed (i.i.d.), which facilitates the analytical characterization of the strategic interactions between attackers and defenders competing for system-wide resources. However, this assumption inevitably overlooks the heterogeneous nature of PIoT nodes in terms of workload, criticality, and vulnerability. In practical PIoT deployments, devices often exhibit substantial variations in operational importance, traffic load, exposure to cyber threats, and communication patterns, all of which may significantly influence their actual competitive behaviors. This heterogeneity can be partially accommodated by calibrating the model parameters with field data. Specifically, the cost coefficients $c_A$ and $c_D$ can be adjusted using historical attack and defense cost data, while the transition thresholds $\delta_1$, $\delta_2$, and $\delta_3$ can be weighted based on vulnerability assessment scores or security classifications. Such data-driven calibration allows the proposed framework to approximate heterogeneous behaviors at a macroscopic level and enhances its applicability to real-world PIoT environments.
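The five strategy-dependent transition probabilities defined in this subsection can be evaluated directly once both players' mixed strategies are fixed. The following is a minimal Python sketch, assuming that a transition fires when the intensity difference is at least the threshold and that the two players choose their levels independently. The function name `transition_probs`, the dictionary keys, and the numerical levels are illustrative assumptions, not values from the paper's tables; integer levels are used only to avoid floating-point comparison issues.

```python
from itertools import product

def transition_probs(a, p, d, q, delta1, delta2, delta3):
    """Return the strategy-dependent transition probabilities.

    a, p: attacker intensity levels and their selection probabilities.
    d, q: defender intensity levels and their selection probabilities.
    Keys of the result are (source_state, destination_state) pairs.
    """
    probs = {("N", "A"): 0.0, ("N", "D"): 0.0, ("D", "A"): 0.0,
             ("A", "D"): 0.0, ("A", "M"): 0.0}
    for (ai, pi), (dk, qk) in product(zip(a, p), zip(d, q)):
        joint = pi * qk  # independent choices: Pr{L_A = a_i, L_D = d_k}
        if ai - dk >= delta1:
            probs[("N", "A")] += joint
        if dk - ai >= delta1:
            probs[("N", "D")] += joint
        if ai - dk >= delta2:
            probs[("D", "A")] += joint
        if dk - ai >= delta2:
            probs[("A", "D")] += joint
        if ai - dk >= delta3:
            probs[("A", "M")] += joint
    return probs

# Hypothetical intensity levels, mixed strategies, and thresholds.
a_levels, p_mix = [2, 5, 8], [0.25, 0.5, 0.25]
d_levels, q_mix = [3, 7], [0.5, 0.5]
alpha = transition_probs(a_levels, p_mix, d_levels, q_mix, 0, 2, 3)
print(alpha)
```

Because the thresholds are ordered, the attacker-side probabilities in this sketch can only shrink from N → A to D → A to A → M, which mirrors the statement that paralyzing a device is harder than seizing it.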

2.3. Network Evolution

Let $N(t)$, $A(t)$, $D(t)$, and $M(t)$ denote the number of devices in states N, A, D, and M, respectively, at time $t$. As illustrated in Figure 2, the state transitions $N \to D$ and $A \to D$ positively influence $D(t)$, with rates $\alpha_D^N(t) N(t)$ and $\alpha_D^A(t) A(t)$, respectively. Conversely, the state transition $D \to A$ negatively affects $D(t)$, with a rate of $\alpha_A^D(t) D(t)$. Therefore, the rate of change of $D(t)$ can be expressed as
$$\dot{D}(t) = \alpha_D^N(t) N(t) + \alpha_D^A(t) A(t) - \alpha_A^D(t) D(t). \tag{6}$$
Similarly, the rates of change for $A(t)$, $M(t)$, and $N(t)$ can be expressed as $\dot{A}(t)$, $\dot{M}(t)$, and $\dot{N}(t)$ in Equation (7). Therefore, the evolution of the network state can be described by the following system of differential equations:
$$\begin{aligned}
\dot{D}(t) &= \alpha_D^N(t) N(t) + \alpha_D^A(t) A(t) - \alpha_A^D(t) D(t), \\
\dot{A}(t) &= \alpha_A^N(t) N(t) + \alpha_A^D(t) D(t) - \alpha_D^A(t) A(t) - \alpha_M^A(t) A(t), \\
\dot{M}(t) &= \alpha_M^A(t) A(t) - \rho M(t), \\
\dot{N}(t) &= \rho M(t) - \alpha_D^N(t) N(t) - \alpha_A^N(t) N(t).
\end{aligned} \tag{7}$$
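To make the state evolution concrete, the system above can be integrated numerically. The sketch below uses a plain forward-Euler scheme with transition probabilities held constant, which is a simplifying assumption (in the game they vary with the players' strategies). The initial state and recovery rate follow Section 4; the `alpha` values, step size, and function names are hypothetical.

```python
def derivatives(state, alpha, rho):
    """Right-hand side of the four-state device dynamics.

    state: device counts keyed by "N", "A", "D", "M".
    alpha: transition probabilities keyed by (source, destination).
    rho:   recovery rate from the fault state M back to N.
    """
    N, A, D, M = state["N"], state["A"], state["D"], state["M"]
    dD = alpha[("N", "D")] * N + alpha[("A", "D")] * A - alpha[("D", "A")] * D
    dA = (alpha[("N", "A")] * N + alpha[("D", "A")] * D
          - alpha[("A", "D")] * A - alpha[("A", "M")] * A)
    dM = alpha[("A", "M")] * A - rho * M
    dN = rho * M - alpha[("N", "D")] * N - alpha[("N", "A")] * N
    return {"N": dN, "A": dA, "D": dD, "M": dM}

def euler_simulate(state, alpha, rho, dt, steps):
    """Forward-Euler integration; returns the list of visited states."""
    trajectory = [dict(state)]
    for _ in range(steps):
        rates = derivatives(state, alpha, rho)
        state = {key: state[key] + dt * rates[key] for key in state}
        trajectory.append(state)
    return trajectory

# Hypothetical constant transition probabilities (not from the paper).
alpha = {("N", "A"): 0.3, ("N", "D"): 0.2, ("D", "A"): 0.1,
         ("A", "D"): 0.15, ("A", "M"): 0.05}
initial = {"N": 100.0, "A": 50.0, "D": 50.0, "M": 0.0}
trajectory = euler_simulate(initial, alpha, rho=0.17, dt=0.01, steps=5000)
final = trajectory[-1]
print({key: round(value, 2) for key, value in final.items()})
```

Every outflow term of one state appears as an inflow of another, so the four derivatives sum to zero and the total device count is conserved along the trajectory.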

2.4. Game Model

Based on the preceding analysis, we propose a differential game model in which the attacker and defender serve as the game participants. Specifically, the differential game can be represented as $G = \{n, W, O, x(t), t, U\}$, where:
$n = \{n_A, n_D\}$ denotes the set of players in the attack–defense game, with $n_D$ representing the defender and $n_A$ the attacker;
$W = \{W_A, W_D\}$ defines the action space, where $W_A(t) = \{a_1, a_2, \ldots, a_J\}$ denotes the competitive intensity levels available to the attacker, and $W_D(t) = \{d_1, d_2, \ldots, d_K\}$ denotes those available to the defender;
$O = \{O_A, O_D\}$ represents the strategy space of the players in the attack–defense game, with $O_D$ corresponding to the defender and $O_A$ to the attacker;
$x(t)$ describes the state evolution variables of the PIoT, modeled via differential equations capturing the network dynamics;
$t \in [0, T]$ denotes the time variable in the attack–defense differential game. The system state, the control strategy trajectories of both players, and the game payoff are all functions of $t$;
U is the utility function of the differential game. In this work, U is defined as the system loss resulting from the competitive interactions between attack and defense over the IoT devices. The attacker aims to maximize the system loss, while the defender seeks to minimize it, forming a zero-sum game structure.
In the proposed game, the system loss is quantitatively defined as the number of devices favorably influenced by the attacker minus the number of devices favorably influenced by the defender. It is worth noting that although this study develops the PIoT differential game model from a macro-level network perspective, real-world PIoT datasets, such as those derived from PMU-based false data injection attack analyses, can serve as an essential basis for defining realistic adversarial behaviors within the proposed framework. Specifically, vulnerability assessment at the device or measurement level can be leveraged to determine the weighting of attack intensities and to refine the attacker’s actionable strategy space, thereby enhancing the fidelity of the modeled adversarial interactions.
Within the PIoT, the objective of the cyber attacker is to maximize the number of compromised devices and induce failures among defended devices at minimal cost, whereas the defender aims to maximize the number of securely defended devices and neutralize attacks with minimal expenditure. The attack cost is modeled as a function of both the attack intensity applied to PIoT devices and the number of devices the attacker seeks to engage. Meanwhile, the number of devices contested between the attacker and defender depends on the quantity of devices in states N, A, and D. This follows from the state definition, wherein devices in a failed state are excluded from the competition target set of both parties.
For an individual PIoT device, let $E_A(t)$ and $E_D(t)$ denote the average competitive intensity injected by the attacker and defender, respectively, expressed as [23]
$$E_A = \sum_{i=1}^{J} a_i p_i(t), \qquad E_D = \sum_{k=1}^{K} d_k q_k(t). \tag{8}$$
When implementing strategies, both attackers and defenders incur corresponding strategy costs, which are typically proportional to the performance of the strategies. With reference to [24], the strategy execution cost at time t is given by
$$C(t) = \frac{1}{2} c_A E_A^2 \left( A(t) + D(t) + N(t) \right) - \frac{1}{2} c_D E_D^2 \left( A(t) + D(t) + N(t) \right), \tag{9}$$
where $c_A$ and $c_D$ represent the cost/utility coefficients of the attack strategy and the defense strategy, respectively. These dimensionless coefficients indicate the ratio of strategy cost to effectiveness: the smaller their values, the better the cost performance of the strategy. Both attackers and defenders pursue strategies with smaller cost/utility coefficients, which are relatively more challenging to achieve. Taking into account both strategy benefits and execution costs, the utility function of the game over the time interval $[0, T]$ is given by
$$U(p, q) = \int_{0}^{T} u(t)\, dt = \int_{0}^{T} \left[ A(t) + M(t) - D(t) - C(t) \right] dt. \tag{10}$$
Therefore, the attacker aims to maximize the utility function U by optimizing the attack strategy, while the defender seeks to minimize the utility function U by optimizing the defense strategy.
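As a worked illustration of the mean intensities, the net strategy cost, and the instantaneous utility, the sketch below evaluates them for one state and one pair of mixed strategies. The cost coefficients (both 10) and the device counts follow Section 4; the intensity levels, probabilities, and all function names are hypothetical, and the integral is approximated by a simple left Riemann sum.

```python
def mean_intensity(levels, probs):
    """Expected competitive intensity under a mixed strategy."""
    return sum(level * prob for level, prob in zip(levels, probs))

def instantaneous_utility(state, a, p, d, q, c_A, c_D):
    """u(t) = A + M - D - C, with C the net strategy execution cost."""
    contested = state["A"] + state["D"] + state["N"]  # M is not contested
    E_A = mean_intensity(a, p)
    E_D = mean_intensity(d, q)
    cost = 0.5 * c_A * E_A ** 2 * contested - 0.5 * c_D * E_D ** 2 * contested
    return state["A"] + state["M"] - state["D"] - cost

def total_utility(utilities, dt):
    """Left Riemann-sum approximation of the integral of u(t)."""
    return sum(value * dt for value in utilities)

state = {"N": 100.0, "A": 50.0, "D": 50.0, "M": 0.0}
# Hypothetical levels and strategies: E_A = 3, E_D = 2.
u = instantaneous_utility(state, [2, 4], [0.5, 0.5], [1, 3], [0.5, 0.5],
                          c_A=10, c_D=10)
print(u)  # -5000.0
```

Here the attacker's quadratic cost (9000) outweighs the defender's (4000) and the device terms cancel, so the attacker's payoff is negative: a higher mean attack intensity is only worthwhile if it converts enough devices.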
In the cyber attack–defense process of the PIoT, the optimization problem for the cyber attacker can be formulated as
$$\max_{p_i(t)} U\left( p_i(t), q_k^*(t) \right) \quad \text{s.t.} \quad 0 \le p_i(t) \le 1, \quad \sum_{i=1}^{J} p_i(t) = 1. \tag{11}$$
Correspondingly, the optimization problem for the cyber defender can be formulated as
$$\min_{q_k(t)} U\left( p_i^*(t), q_k(t) \right) \quad \text{s.t.} \quad 0 \le q_k(t) \le 1, \quad \sum_{k=1}^{K} q_k(t) = 1. \tag{12}$$

3. Optimal Network Decision Analysis

In the offense–defense game described above, the strategies of both players are interdependent. The pair of optimal strategies $\{p_i^*(t), q_k^*(t)\}$ constitutes the optimal decision strategy for this offense–defense game.
Definition 1.
If the strategy combination $\{p_i^*(t), q_k^*(t)\}$ satisfies the condition
$$U\left( p_i^*(t), q_k^*(t) \right) \ge U\left( p_i(t), q_k^*(t) \right), \qquad U\left( p_i^*(t), q_k^*(t) \right) \le U\left( p_i^*(t), q_k(t) \right), \tag{13}$$
then this strategy combination $\{p_i^*(t), q_k^*(t)\}$ is an optimal strategy, which also represents the equilibrium (saddle) point of the zero-sum differential game described.
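The two inequalities in Definition 1 say that neither player can gain by deviating unilaterally. As a simplified static analogue (pure strategies in a finite matrix game, not the differential game itself), the check can be sketched as follows; the payoff matrix and the function name `is_saddle_point` are purely illustrative.

```python
def is_saddle_point(payoff, i_star, k_star):
    """Check the saddle-point condition in a finite zero-sum game.

    payoff[i][k] is the system loss when the attacker (maximizer) plays
    row i and the defender (minimizer) plays column k.
    """
    value = payoff[i_star][k_star]
    # The attacker cannot raise the loss by switching rows.
    attacker_ok = all(payoff[i][k_star] <= value for i in range(len(payoff)))
    # The defender cannot lower the loss by switching columns.
    defender_ok = all(payoff[i_star][k] >= value
                      for k in range(len(payoff[0])))
    return attacker_ok and defender_ok

# Hypothetical 3x2 loss matrix.
payoff = [[3, 5],
          [4, 6],
          [1, 7]]
print(is_saddle_point(payoff, 1, 0))  # True
print(is_saddle_point(payoff, 2, 1))  # False
```

In the example, the entry 4 is the largest in its column and the smallest in its row, so it satisfies both inequalities; the entry 7 fails because the defender would switch columns.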
To identify the optimal saddle-point strategy, this paper employs Pontryagin’s Minimum Principle (PMP) from optimal control theory to derive the optimal strategies of the game players [25]. Specifically, we first construct the Hamiltonian function by introducing costate (adjoint) variables. Subsequently, we jointly solve the network state equations and the costate equations to obtain the optimal control strategy. Before solving for the optimal strategy, we first analyze the existence of optimal solutions in the game, as detailed in Theorem 1.
Theorem 1.
In game $G$, there exists a set of optimal strategies satisfying
$$p_i^*(t) = \arg\max_{p_i(t)} U, \tag{14}$$
$$q_k^*(t) = \arg\min_{q_k(t)} U. \tag{15}$$
Proof of Theorem 1.
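Both the network dynamics evolution equations of the PIoT considered herein and the instantaneous payoff $u(t)$ of the players are continuous and bounded. Based on the definition of the maximin problem in (13) and (14), we differentiate $u(t)$ twice with respect to the strategies $p_i$ and $q_k$, respectively. With at least one contested device, i.e., $A(t) + D(t) + N(t) > 0$, this yields
$$\frac{\partial^2 u}{\partial p_i^2} = -c_A a_i^2 \left[ A(t) + D(t) + N(t) \right] < 0, \tag{16}$$
$$\frac{\partial^2 u}{\partial q_k^2} = c_D d_k^2 \left[ A(t) + D(t) + N(t) \right] > 0. \tag{17}$$
The utility function $u(t)$ is therefore concave with respect to $p_i$ and convex with respect to $q_k$, which matches the attacker’s maximization and the defender’s minimization. Hence, a set of optimal strategies $\{p_i^*(t), q_k^*(t)\}$ exists in the constructed game. □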
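Before determining the optimal strategies for both sides, we must first construct the Hamiltonian function. In this paper, the Hamiltonian combines the instantaneous utility function with the right-hand sides of the state differential equations, each weighted by a costate variable, expressed as [26]
$$\begin{aligned}
H\left( \beta, O, \lambda_\beta, t \right) &= u(t) + \sum_{\beta \in \{A, D, N, M\}} \lambda_\beta(t)\, \dot{\beta}(t) \\
&= u(t) + \lambda_N \dot{N}(t) + \lambda_A \dot{A}(t) + \lambda_D \dot{D}(t) + \lambda_M \dot{M}(t),
\end{aligned} \tag{18}$$
where $\lambda_\beta(t)$ is the costate variable associated with the state variable $\beta(t)$.
Next, we derive the optimal strategies of the attacker and the defender separately, as shown in Theorems 2 and 3.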
Theorem 2.
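In the proposed game, the optimal strategy for the network attacker can be expressed as
$$p_i^*(t) = \begin{cases}
0, & p_i(t) \le 0, \\[4pt]
\dfrac{\lambda_A \theta_A + \lambda_D \theta_D + \lambda_N \theta_N + \lambda_M \theta_M}{\left[ A(t) + D(t) + N(t) \right] c_A a_i^2} - \dfrac{\sum_{s \ne i} a_s p_s}{a_i}, & 0 < p_i(t) < 1, \\[4pt]
1, & p_i(t) \ge 1,
\end{cases} \tag{19}$$
where the switching functions $\theta_\beta$ are defined as
$$\begin{aligned}
\theta_D &= \sum_{d_k - a_i \ge \delta_1} q_k N(t) + \sum_{d_k - a_i \ge \delta_2} q_k A(t) - \sum_{a_i - d_k \ge \delta_2} q_k D(t), \\
\theta_A &= \sum_{a_i - d_k \ge \delta_1} q_k N(t) + \sum_{a_i - d_k \ge \delta_2} q_k D(t) - \sum_{d_k - a_i \ge \delta_2} q_k A(t) - \sum_{a_i - d_k \ge \delta_3} q_k A(t), \\
\theta_M &= \sum_{a_i - d_k \ge \delta_3} q_k A(t), \\
\theta_N &= -\sum_{d_k - a_i \ge \delta_1} q_k N(t) - \sum_{a_i - d_k \ge \delta_1} q_k N(t).
\end{aligned} \tag{20}$$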
Proof of Theorem 2.
Equations (1)–(5) describe the state transitions of power IoT devices in terms of the transition probabilities between states. After defining the differential game model and analyzing attack and defense strategies, the transition probabilities can be expressed in strategy form as follows [25].
At time $t$, the attacker selects a competitive intensity level $a_i$ with probability $p_i(t)$, while the defender selects a competitive intensity level $d_k$ with probability $q_k(t)$. Under the mixed-strategy assumption and the independence of strategy selections, the joint probability that the intensity pair $(a_i, d_k)$ is chosen is given by
$$\Pr\{L_A(t) = a_i, L_D(t) = d_k\} = p_i q_k. \tag{21}$$
At time $t$, the transition probability of a PIoT device moving from the normal state N to the defense state D is
$$\alpha_D^N = \Pr\{L_D(t) - L_A(t) \ge \delta_1\} = \sum_{d_k - a_i \ge \delta_1} p_i q_k. \tag{22}$$
At time $t$, the transition probability of a power IoT device from its normal state N to its attacked state A is
$$\alpha_A^N = \Pr\{L_A(t) - L_D(t) \ge \delta_1\} = \sum_{a_i - d_k \ge \delta_1} p_i q_k. \tag{23}$$
At time $t$, the transition probability of a power IoT device from defense state D to attack state A is
$$\alpha_A^D = \Pr\{L_A(t) - L_D(t) \ge \delta_2\} = \sum_{a_i - d_k \ge \delta_2} p_i q_k. \tag{24}$$
At time $t$, the probability of transition from attack state A to defense state D for power IoT devices is
$$\alpha_D^A = \Pr\{L_D(t) - L_A(t) \ge \delta_2\} = \sum_{d_k - a_i \ge \delta_2} p_i q_k. \tag{25}$$
At time $t$, the transition probability of a power IoT device from the attacked state A to the failure state M is
$$\alpha_M^A = \Pr\{L_A(t) - L_D(t) \ge \delta_3\} = \sum_{a_i - d_k \ge \delta_3} p_i q_k. \tag{26}$$
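By substituting the updated state transition probabilities from (22) to (26) into Equation (18), the Hamiltonian function $H(\beta, O, \lambda_\beta, t)$ can be expressed as
$$\begin{aligned}
H\left( \beta, O, \lambda_\beta, t \right) ={}& A(t) - D(t) + M(t) \\
&- \frac{c_A \left[ N(t) + A(t) + D(t) \right]}{2} \left[ \sum_{i=1}^{J} a_i^2 p_i^2 + 2 \sum_{i=1}^{J-1} \sum_{v=i+1}^{J} a_i p_i a_v p_v \right] \\
&+ \frac{c_D \left[ N(t) + A(t) + D(t) \right]}{2} \left[ \sum_{i=1}^{K} d_i^2 q_i^2 + 2 \sum_{i=1}^{K-1} \sum_{v=i+1}^{K} d_i q_i d_v q_v \right] \\
&+ \lambda_D \sum_{d_k - a_i \ge \delta_1} p_i q_k N(t) + \lambda_D \sum_{d_k - a_i \ge \delta_2} p_i q_k A(t) - \lambda_D \sum_{a_i - d_k \ge \delta_2} p_i q_k D(t) \\
&+ \lambda_A \sum_{a_i - d_k \ge \delta_1} p_i q_k N(t) + \lambda_A \sum_{a_i - d_k \ge \delta_2} p_i q_k D(t) - \lambda_A \sum_{d_k - a_i \ge \delta_2} p_i q_k A(t) - \lambda_A \sum_{a_i - d_k \ge \delta_3} p_i q_k A(t) \\
&+ \lambda_M \sum_{a_i - d_k \ge \delta_3} p_i q_k A(t) - \lambda_M \rho M(t) \\
&+ \lambda_N \rho M(t) - \lambda_N \sum_{d_k - a_i \ge \delta_1} p_i q_k N(t) - \lambda_N \sum_{a_i - d_k \ge \delta_1} p_i q_k N(t).
\end{aligned} \tag{27}$$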
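Then, we take the partial derivative of the Hamiltonian function with respect to the attack strategy and set it to zero to obtain
$$\frac{\partial H\left( \beta, O, \lambda_\beta, t \right)}{\partial p_i} = -\left[ A(t) + N(t) + D(t) \right] c_A \left( a_i^2 p_i + a_i \sum_{s \ne i} a_s p_s \right) + \sum_{\beta \in \{A, D, N, M\}} \lambda_\beta \theta_\beta = 0. \tag{28}$$
Therefore, we can determine that the optimal strategy for cyber attackers is
$$p_i(t) = \frac{\lambda_A \theta_A + \lambda_D \theta_D + \lambda_N \theta_N + \lambda_M \theta_M}{\left[ A(t) + D(t) + N(t) \right] c_A a_i^2} - \frac{\sum_{s \ne i} a_s p_s}{a_i}. \tag{29}$$
By considering the constraint $0 \le p_i(t) \le 1$ on the strategy, we can derive the piecewise optimal attack strategy stated in Theorem 2. □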
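The attacker's rule in Theorem 2 amounts to evaluating the stationary point of the Hamiltonian and clipping it to the interval [0, 1]. The following Python sketch shows one such evaluation for a single intensity level. It assumes the costate values are already known (in practice they come from solving the costate equations), that the number of contested devices and the level `a[i]` are positive, and that thresholds are met with "at least"; all names and numbers are hypothetical.

```python
def switching_terms(a_i, d, q, state, deltas):
    """theta_beta: derivative of each state's rate with respect to p_i."""
    delta1, delta2, delta3 = deltas
    N, A, D = state["N"], state["A"], state["D"]

    def mass(condition):
        # Total defender probability on levels d_k meeting the condition.
        return sum(qk for dk, qk in zip(d, q) if condition(dk))

    def_wins_1 = mass(lambda dk: dk - a_i >= delta1)
    def_wins_2 = mass(lambda dk: dk - a_i >= delta2)
    att_wins_1 = mass(lambda dk: a_i - dk >= delta1)
    att_wins_2 = mass(lambda dk: a_i - dk >= delta2)
    att_wins_3 = mass(lambda dk: a_i - dk >= delta3)
    return {
        "D": def_wins_1 * N + def_wins_2 * A - att_wins_2 * D,
        "A": att_wins_1 * N + att_wins_2 * D - def_wins_2 * A - att_wins_3 * A,
        "M": att_wins_3 * A,
        "N": -def_wins_1 * N - att_wins_1 * N,
    }

def attacker_strategy(i, a, p, d, q, state, costate, c_A, deltas):
    """Stationary point of H in p_i, clipped to the feasible range [0, 1]."""
    theta = switching_terms(a[i], d, q, state, deltas)
    contested = state["A"] + state["D"] + state["N"]  # assumed > 0
    gain = sum(costate[beta] * theta[beta] for beta in theta)
    others = sum(a[s] * p[s] for s in range(len(a)) if s != i)
    candidate = gain / (contested * c_A * a[i] ** 2) - others / a[i]
    return min(1.0, max(0.0, candidate))

# Hypothetical inputs; the costate values are placeholders, not solved ones.
a, p = [2, 5, 8], [0.25, 0.5, 0.25]
d, q = [3, 7], [0.5, 0.5]
state = {"N": 100.0, "A": 50.0, "D": 50.0, "M": 0.0}
costate = {"N": 0.0, "A": 640.0, "D": 0.0, "M": 0.0}
theta = switching_terms(a[2], d, q, state, (0, 2, 3))
p_new = attacker_strategy(2, a, p, d, q, state, costate, 10, (0, 2, 3))
print(theta, p_new)
```

This updates one component at a time and does not by itself enforce that the probabilities sum to one; a full solver would need a renormalization or projection step, which the sketch deliberately leaves out.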
Theorem 3.
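In the proposed game, the optimal strategy for the network defender can be expressed as
$$q_k^*(t) = \begin{cases}
0, & q_k(t) \le 0, \\[4pt]
-\dfrac{\lambda_A \tau_A + \lambda_D \tau_D + \lambda_N \tau_N + \lambda_M \tau_M}{\left[ A(t) + D(t) + N(t) \right] c_D d_k^2} - \dfrac{\sum_{s \ne k} d_s q_s}{d_k}, & 0 < q_k(t) < 1, \\[4pt]
1, & q_k(t) \ge 1,
\end{cases} \tag{30}$$
where the switching functions $\tau_\beta$ are defined as
$$\begin{aligned}
\tau_D &= \sum_{d_k - a_i \ge \delta_1} p_i N(t) + \sum_{d_k - a_i \ge \delta_2} p_i A(t) - \sum_{a_i - d_k \ge \delta_2} p_i D(t), \\
\tau_A &= \sum_{a_i - d_k \ge \delta_1} p_i N(t) + \sum_{a_i - d_k \ge \delta_2} p_i D(t) - \sum_{d_k - a_i \ge \delta_2} p_i A(t) - \sum_{a_i - d_k \ge \delta_3} p_i A(t), \\
\tau_M &= \sum_{a_i - d_k \ge \delta_3} p_i A(t), \\
\tau_N &= -\sum_{d_k - a_i \ge \delta_1} p_i N(t) - \sum_{a_i - d_k \ge \delta_1} p_i N(t).
\end{aligned} \tag{31}$$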
Proof of Theorem 3.
Similar to the proof of Theorem 2, by solving the partial derivatives of the Hamiltonian function H ( β , O , λ β , t ) with respect to the defend policy we obtain
H β , O , λ β , t q k = [ A ( t ) + N ( t ) + D ( t ) ] c D d k 2 q k d k s k d s q s + β { A , D , N , M } λ β τ β = 0 ,
where
τ D = d k a i δ 1 p i N ( t ) + d k a i δ 2 p i A ( t ) a i d k δ 2 p i D ( t ) , τ A = a i d k δ 1 p i N ( t ) + a i d k δ 2 p i D ( t ) d k a i δ 2 p i A ( t ) a i d k δ 3 p i A ( t ) , τ M = a i d k δ 3 p i A ( t ) , τ N = d k a i δ 1 p i N ( t ) a i d k δ p i N ( t ) .
Therefore, we can determine that the optimal strategy for network defenders is
$$
q_k(t) = -\frac{\lambda_A \tau_A + \lambda_D \tau_D + \lambda_N \tau_N + \lambda_M \tau_M}{[A(t)+D(t)+N(t)]\, c_D d_k^2} - \frac{\sum_{s \ne k} d_s q_s}{d_k}.
$$
By considering the constraints of the strategy, we obtain Equation (30). □
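To make the roles of the switching functions and the saturation step concrete, the following Python sketch evaluates the four terms $\tau_\beta$ for a single defense level $d_k$ and then clips a candidate strategy value to $[0, 1]$. It is a minimal sketch under stated assumptions: the threshold conditions are read as `d_k - a_i >= delta` and `a_i - d_k >= delta`, the signs follow the inflow and outflow of each compartment, and the names `gated_mass`, `switching_terms`, `saturate`, and the tolerance `EPS` are hypothetical helpers, not part of the paper.

```python
from typing import Dict, Sequence, Tuple

EPS = 1e-12  # assumed tolerance so that 0.9 - 0.8 still counts as >= 0.1


def gated_mass(a: Sequence[float], p: Sequence[float], d_k: float,
               delta: float, attacker_ahead: bool) -> float:
    """Sum p_i over attack levels i whose intensity gap to d_k meets delta."""
    total = 0.0
    for a_i, p_i in zip(a, p):
        gap = (a_i - d_k) if attacker_ahead else (d_k - a_i)
        if gap >= delta - EPS:
            total += p_i
    return total


def switching_terms(a: Sequence[float], p: Sequence[float], d_k: float,
                    N: float, A: float, D: float,
                    deltas: Tuple[float, float, float]) -> Dict[str, float]:
    """Evaluate tau_D, tau_A, tau_M, tau_N for one defense level d_k."""
    d1, d2, d3 = deltas
    def_1 = gated_mass(a, p, d_k, d1, attacker_ahead=False)  # d_k - a_i >= delta_1
    def_2 = gated_mass(a, p, d_k, d2, attacker_ahead=False)  # d_k - a_i >= delta_2
    att_1 = gated_mass(a, p, d_k, d1, attacker_ahead=True)   # a_i - d_k >= delta_1
    att_2 = gated_mass(a, p, d_k, d2, attacker_ahead=True)   # a_i - d_k >= delta_2
    att_3 = gated_mass(a, p, d_k, d3, attacker_ahead=True)   # a_i - d_k >= delta_3
    return {
        "D": def_1 * N + def_2 * A - att_2 * D,
        "A": att_1 * N + att_2 * D - def_2 * A - att_3 * A,
        "M": att_3 * A,
        "N": -def_1 * N - att_1 * N,
    }


def saturate(value: float) -> float:
    """Project an unconstrained strategy value onto the interval [0, 1]."""
    return min(1.0, max(0.0, value))


# Illustration with the scenario 1 attacker levels and the initial device counts.
terms = switching_terms(a=[0.4, 0.9], p=[0.1, 0.9], d_k=0.6,
                        N=100.0, A=50.0, D=50.0, deltas=(0.0, 0.1, 0.2))
print(terms)
print(saturate(-0.3), saturate(0.4), saturate(1.7))
```

Under this reading the four terms sum to zero, because every device that leaves one compartment through a strategy-dependent transition enters another; this gives a quick consistency check on any implementation of the switching functions.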

4. Simulation Results

In this section, we numerically evaluate the interaction between attack and defense using the constructed differential game model. The total number of network devices is set to 200. In the initial state, 100 devices are in the normal state, 50 are in the attacked state, and 50 are in the defended state, while no devices are in the malfunction state. The competitive intensities $L_A$ and $L_D$ for the attacker and defender are specified in Table 1 and Table 2, respectively. The strategy cost coefficients for both the cyber attacker and the cyber defender are set to 10, the device recovery coefficient $\rho$ is 0.17, and the thresholds are configured as $\delta_1 = 0$, $\delta_2 = 0.1$, and $\delta_3 = 0.2$.
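The setup above can be explored with a simplified Python sketch that integrates the four compartments with a forward Euler scheme. Several points are assumptions rather than the authors' method: the strategies are frozen at the initial probabilities of Table 1 and Table 2 instead of being updated by the optimal-control solution, the threshold conditions are read as `>=`, recovered devices return from M to N at rate $\rho$, and the step size, horizon, and function names are hypothetical. The output is therefore not expected to reproduce the reported figures.

```python
from typing import List, Sequence, Tuple

EPS = 1e-12  # assumed tolerance so that 0.9 - 0.8 still counts as >= 0.1

State = Tuple[float, float, float, float]  # (N, A, D, M)


def gated_weight(a: Sequence[float], p: Sequence[float],
                 d: Sequence[float], q: Sequence[float],
                 delta: float, attacker_ahead: bool) -> float:
    """Sum p_i * q_k over level pairs (i, k) whose intensity gap meets delta."""
    total = 0.0
    for a_i, p_i in zip(a, p):
        for d_k, q_k in zip(d, q):
            gap = (a_i - d_k) if attacker_ahead else (d_k - a_i)
            if gap >= delta - EPS:
                total += p_i * q_k
    return total


def simulate(a, p, d, q, deltas, rho: float, state: State,
             dt: float = 0.01, steps: int = 5000) -> List[State]:
    """Forward Euler integration of the assumed compartment dynamics."""
    d1, d2, d3 = deltas
    w_nd = gated_weight(a, p, d, q, d1, False)  # N -> D
    w_na = gated_weight(a, p, d, q, d1, True)   # N -> A
    w_ad = gated_weight(a, p, d, q, d2, False)  # A -> D
    w_da = gated_weight(a, p, d, q, d2, True)   # D -> A
    w_am = gated_weight(a, p, d, q, d3, True)   # A -> M
    N, A, D, M = state
    history: List[State] = [(N, A, D, M)]
    for _ in range(steps):
        dN = rho * M - (w_nd + w_na) * N
        dA = w_na * N + w_da * D - (w_ad + w_am) * A
        dD = w_nd * N + w_ad * A - w_da * D
        dM = w_am * A - rho * M
        N, A, D, M = N + dt * dN, A + dt * dA, D + dt * dD, M + dt * dM
        history.append((N, A, D, M))
    return history


history = simulate(a=[0.4, 0.9], p=[0.1, 0.9], d=[0.6, 0.8], q=[0.25, 0.75],
                   deltas=(0.0, 0.1, 0.2), rho=0.17,
                   state=(100.0, 50.0, 50.0, 0.0))
final = history[-1]
print("final (N, A, D, M):", tuple(round(x, 2) for x in final))
print("total devices:", round(sum(final), 6))
```

Because every outflow in this sketch appears as an inflow elsewhere, the total number of devices stays at 200 throughout the run, which is a useful sanity check before adding the strategy updates.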
The evolution of device quantities in different states is illustrated in Figure 3. As observed in Figure 3, the four device states (A, D, N, M) gradually stabilize over time through iterative updates. During the initial phase, the number of devices in the attacked state (A) increases rapidly and reaches a peak within a short period, indicating a sharp rise in the proportion of compromised devices due to initial system disturbances or intensified attack strategies. Subsequently, the number of devices in state A gradually declines and stabilizes at approximately 100, suggesting that defense mechanisms begin to take effect, partially suppressing malicious activity while a steady level of attack presence persists. The number of devices in the defended state (D) increases slightly at first and then remains at a moderate level, reflecting continuous investment in defensive resources and adaptive system balancing. The number of devices in the normal state (N) decreases sharply at first and then stabilizes, illustrating the significant impact of attacks during the initial stage and the subsequent recovery of some devices to normal operation through dynamic adjustments. The number of devices in the malfunction state (M) rises rapidly from zero and stabilizes at a relatively low level, indicating that a subset of devices requires ongoing maintenance to support stable system operation. From a physical-layer perspective, this behavior aligns well with the characteristics of large-scale PIoT deployments. In such environments, an initial wave of attacks typically causes a sharp degradation in security, after which coordinated defensive actions and self-healing mechanisms drive the system toward a stable operational state. In this equilibrium, a certain proportion of devices remain compromised, another proportion is effectively protected, and only a relatively small subset continues to experience persistent failures due to sustained damage.
Overall, the smooth convergence of the curves, along with the conservation of the total number of devices and the dynamic equilibrium achieved among all states, demonstrates the stability of device state evolution in the Power Internet of Things under cyber attack–defense game conditions.
The evolution of the system utility function is illustrated in Figure 4. As shown in Figure 4, the system utility function $u(t)$ fluctuates and declines markedly during the initial phase, then rises rapidly and stabilizes, exhibiting typical convergence behavior. The low utility value at the initial moment reflects that, during the system startup phase, the strategies of both attackers and defenders have not yet converged and resource allocation is inefficient, resulting in low overall system returns. Over time, as attackers and defenders gradually adjust their strategies, the transfer of devices between compartments stabilizes. A dynamic equilibrium is achieved among the system's energy consumption, attack losses, and defense gains, causing the utility value to gradually increase. Ultimately, the system converges toward stability. This indicates that, under long-term operation, the system reaches a steady-state game equilibrium in which resource allocation and strategy selection form a mutual best-response relationship. This result validates the convergence and stability of the established differential game model. It also demonstrates that optimized strategy control can effectively enhance the security benefits of the PIoT system, mitigate the impact of attacks on network operational efficiency, and maintain the system at an acceptable stable utility level. From a physical standpoint, such convergence implies that, over time, the network defender is able to maintain the PIoT system within an acceptable security–performance trade-off region, even under sustained adversarial attacks.
The impact of different cost coefficients on the utility function is illustrated in Figure 5. As shown in Figure 5, the cost coefficient significantly influences the evolution of system utility. All curves rise rapidly in the initial phase before stabilizing, indicating that the system converges to a stable equilibrium state under the dynamic game. When the cost coefficient is small, system utility rises rapidly and stabilizes at a high level, indicating that lower cost inputs by both attackers and defenders yield higher overall benefits and better system efficiency. As the cost coefficient increases, the utility curve shifts downward overall, with steady-state values gradually decreasing, reflecting that higher cost inputs diminish the system's net benefits. At the same time, all curves in the figure ultimately converge smoothly, demonstrating the model's stability and convergence under varying cost parameters. From a physical standpoint, this behavior reflects the fundamental trade-off between security performance and resource investment in practical IoT systems. Overall, the analysis indicates that the cost coefficient is a critical factor influencing the performance of the PIoT attack–defense game system. Moderate defense and attack costs facilitate an optimal balance between security and economic efficiency.
The impact of different device recovery parameters on the utility function is illustrated in Figure 6. The figure reveals that the recovery rate parameter $\rho$ significantly influences the dynamic evolution of system utility. All three curves rise rapidly at first and then gradually converge toward stable values over time, demonstrating that the system achieves steady-state equilibrium under the dynamic attack–defense game. When the recovery rate is low, the system utility ultimately reaches its maximum level. This indicates that a slower rate of device recovery from the malfunction state to the normal state helps sustain stable growth in overall benefits. As $\rho$ increases, system utility declines markedly. This occurs because excessively high recovery rates accelerate the restoration of attacked or defended devices, ultimately diminishing the system's overall utility. From a physical standpoint, although a certain degree of self-healing is beneficial to network resilience, overly aggressive recovery mechanisms may induce unstable operational patterns and additional operational overhead. These factors, in turn, manifest as a reduction in net utility within the performance metrics.
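To see how $\rho$ enters the equilibrium device counts, the following Python sketch computes a closed-form steady state of a simplified compartment model for several recovery rates. It is a sketch under stated assumptions and does not evaluate the utility shown in Figure 6: strategies are frozen at the initial probabilities of Table 1 and Table 2, thresholds are read as `>=`, recovered devices return from M to N at rate $\rho$, and the values 0.10 and 0.30 are hypothetical sweep points chosen around the configured 0.17.

```python
from typing import Dict

# Transition weights for scenario 1 under the assumed ">=" threshold reading,
# with a = (0.4, 0.9), p = (0.1, 0.9), d = (0.6, 0.8), q = (0.25, 0.75).
W_ND = 0.1    # N -> D: pairs with d_k - a_i >= 0   -> 0.1*0.25 + 0.1*0.75
W_NA = 0.9    # N -> A: pairs with a_i - d_k >= 0   -> 0.9*0.25 + 0.9*0.75
W_AD = 0.1    # A -> D: pairs with d_k - a_i >= 0.1 -> same pairs as W_ND
W_DA = 0.9    # D -> A: pairs with a_i - d_k >= 0.1 -> same pairs as W_NA
W_AM = 0.225  # A -> M: pairs with a_i - d_k >= 0.2 -> 0.9*0.25
TOTAL = 200.0


def steady_state(rho: float) -> Dict[str, float]:
    """Equilibrium of the assumed linear model, expressed relative to A."""
    n_rel = W_AM / (W_ND + W_NA)          # dN/dt = 0 with rho*M = W_AM*A
    m_rel = W_AM / rho                    # dM/dt = 0
    d_rel = (W_ND * n_rel + W_AD) / W_DA  # dD/dt = 0
    scale = TOTAL / (1.0 + n_rel + m_rel + d_rel)
    return {"N": n_rel * scale, "A": scale,
            "D": d_rel * scale, "M": m_rel * scale}


sweep = [(rho, steady_state(rho)) for rho in (0.10, 0.17, 0.30)]
for rho, s in sweep:
    counts = ", ".join(f"{name}={value:.1f}" for name, value in s.items())
    print(f"rho={rho:.2f}: {counts}")
```

In this simplified setting a larger $\rho$ drains the malfunction compartment faster, so its equilibrium share shrinks while the other compartments grow proportionally; how that shift translates into utility depends on the cost terms and the evolving strategies, which this sketch leaves out.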
Next, we modify the competitive intensity between attackers and defenders in the network, with specific parameters detailed in Table 3 and Table 4.
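Since each player's strategy is a probability vector over intensity levels, a quick validity check is useful before feeding table values into a simulation. The helper below is a minimal sketch; the function name `is_mixed_strategy` and the tolerance are hypothetical, and the inputs are the initial probabilities as listed in Tables 1 to 4.

```python
import math
from typing import Dict, List, Sequence


def is_mixed_strategy(probs: Sequence[float], tol: float = 1e-9) -> bool:
    """Return True if every entry lies in [0, 1] and the entries sum to 1."""
    in_range = all(-tol <= x <= 1.0 + tol for x in probs)
    return in_range and math.isclose(sum(probs), 1.0, abs_tol=tol)


initial_probabilities: Dict[str, List[float]] = {
    "Table 1 (attacker, scenario 1)": [0.1, 0.9],
    "Table 2 (defender, scenario 1)": [0.25, 0.75],
    "Table 3 (attacker, scenario 2)": [0.6, 0.9],
    "Table 4 (defender, scenario 2)": [0.2, 0.8],
}

for name, probs in initial_probabilities.items():
    print(f"{name}: sum={sum(probs):.2f}, valid={is_mixed_strategy(probs)}")
```

With the values as listed, the scenario 2 attacker row sums to 1.5, so it would need to be normalized or re-checked before being used as a probability vector.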
The evolution of device counts across different states is illustrated in Figure 7. As shown, the number of PIoT devices in each state converges dynamically over time. During the initial phase, the count of devices in the normal state (N) is high but declines rapidly once attacks begin. Meanwhile, the counts of devices in the defense state (D) and malfunction state (M) increase significantly, while the count of devices in the attack state (A) fluctuates sharply before stabilizing. As the iterations progress, the curves for each state gradually stabilize, and the system reaches a dynamic equilibrium. A comparison with Figure 3 shows that the competitive intensities have a significant impact on the number of devices in each state. This is because different competitive intensities lead attackers and defenders to adopt different strategies, thereby altering the evolution of the network.
The evolution of the system utility function is shown in Figure 8. As depicted in Figure 8, the system utility exhibits a distinct transient peak during the initial phase, followed by a rapid decline and subsequent stabilization. As time progresses, the attacker and defender gradually reach a dynamic equilibrium through the game process, with strategies tending toward stability. This enables the system utility to maintain a steady level. This trend indicates that the model has good convergence and steady-state characteristics, enabling the system to converge to the optimal strategies within a short time while sustaining stable long-term operational benefits. A comparison with Figure 4 shows that the competitive intensities have a significant impact on system utility. Overall, both parameter sets converge within a finite time, validating the stability and controllability of the attack–defense game model.

5. Conclusions and Future Research Directions

In this paper, we investigate the competitive process between attackers and defenders in the PIoT from a macroscopic perspective, with primary focus on modeling and analyzing strategic interactions between adversarial parties to enhance security threat response capabilities in PIoT systems. Initially, PIoT devices are characterized as network resources contested between cyber attackers and defenders, and a differential game model is constructed to capture the dynamic interactions and continuous strategy selection of participants. Subsequently, with the objective of maximizing attack benefits and minimizing system losses, both attackers and defenders probabilistically select strategies from multiple levels of competitive intensity. The Pontryagin Maximum Principle (PMP) is then employed to optimize the formulated game and derive optimal strategies for both parties in the cyber attack–defense scenario. Finally, experimental simulations are conducted, and numerical results demonstrate the evolutionary process of network resource competition in the PIoT, along with the utility functions achieved by attackers and defenders.
For future research directions, we will introduce timeliness analysis of defense responses to enhance the effectiveness of defense strategies. We will also explore the integrated application of stochastic and differential game theory to broaden the model’s applicability. In addition, future work will further incorporate QoS-related metrics—such as latency, packet loss, and service availability—into the modeling framework to enhance its applicability in practical power system environments. By integrating these performance considerations, we aim to develop a more comprehensive and realistic security strategy model that better reflects the operational characteristics of the PIoT.

Author Contributions

Conceptualization, D.L. and W.S.; methodology, D.L.; software, D.L. and W.S.; validation, D.W.; formal analysis, L.C.; investigation, D.W.; writing—original draft preparation, D.L.; writing—review and editing, T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Unveiling Projects (Research and Application of Key Technologies for Security and Self-healing of Power Communication Networks Based on Distributed Intelligent Perception) of State Grid Jilin Electric Power Corporation Ltd., under Grant 2024JBGS-13.

Data Availability Statement

All simulation data used for parameter identification were generated based on model assumptions and synthetic settings and do not involve real-world proprietary datasets.

Conflicts of Interest

Authors Danni Liu, Weijia Su, Li Cong and Di Wu were employed by the company Jilin Information & Telecommunication Company, State Grid Jilin Electric Power. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
Symbol | Description
$L_A(t)$ | The competitive intensity invested by the attacker on specific IoT devices at time $t$
$L_D(t)$ | The competitive intensity invested by the defender on specific IoT devices at time $t$
$N$ | Normal state
$D$ | Defense state
$A$ | Attack state
$M$ | Malfunction state
$\alpha$ | Transition probability
$G$ | Attack–defense game model
$E_A(t)$ | The average competitive intensity injected by the attacker
$E_D(t)$ | The average competitive intensity injected by the defender
$C$ | The strategy execution cost
$U$ | The utility function

References

  1. Abir, S.A.A.; Anwar, A.; Choi, J.; Kayes, A. Iot-enabled smart energy grid: Applications and challenges. IEEE Access 2021, 9, 50961–50981. [Google Scholar] [CrossRef]
  2. Liu, Y.; Ning, P.; Reiter, M.K. False data injection attacks against state estimation in electric power grids. ACM Trans. Inf. Syst. Secur. 2011, 14, 13. [Google Scholar] [CrossRef]
  3. Hatziargyriou, N.; Milanovic, J.; Rahmann, C.; Ajjarapu, V.; Canizares, C.; Erlich, I.; Hill, D.; Hiskens, I.; Kamwa, I.; Pal, B.; et al. Definition and classification of power system stability–revisited & extended. IEEE Trans. Power Syst. 2020, 36, 3271–3281. [Google Scholar]
  4. Zhang, H.; Liu, B.; Wu, H. Smart grid cyber-physical attack and defense: A review. IEEE Access 2021, 9, 29641–29659. [Google Scholar] [CrossRef]
  5. He, H.; Yan, J. Cyber-physical attacks and defences in the smart grid: A survey. IET Cyber-Phys. Syst. Theory Appl. 2016, 1, 13–27. [Google Scholar] [CrossRef]
  6. Hahn, A.; Ashok, A.; Sridhar, S.; Govindarasu, M. Cyber-physical security testbeds: Architecture, application, and evaluation for smart grid. IEEE Trans. Smart Grid 2013, 4, 847–855. [Google Scholar] [CrossRef]
  7. Chen, M.; Zhang, Y.; Li, Y.; Mao, S.; Leung, V.C. EMC: Emotion-aware mobile cloud computing in 5G. IEEE Netw. 2015, 29, 32–38. [Google Scholar] [CrossRef]
  8. Miorandi, D.; Sicari, S.; De Pellegrini, F.; Chlamtac, I. Internet of things: Vision, applications and research challenges. Ad Hoc Netw. 2012, 10, 1497–1516. [Google Scholar] [CrossRef]
  9. Zhou, Z.; Wang, B.; Dong, M.; Ota, K. Secure and efficient vehicle-to-grid energy trading in cyber physical systems: Integration of blockchain and edge computing. IEEE Trans. Syst. Man Cybern. Syst. 2019, 50, 43–57. [Google Scholar] [CrossRef]
  10. Kong, X.; Xu, Y.; Jiao, Z.; Dong, D.; Yuan, X.; Li, S. Fault location technology for power system based on information about the power internet of things. IEEE Trans. Ind. Inform. 2019, 16, 6682–6692. [Google Scholar] [CrossRef]
  11. Qin, P.; Zhao, H.; Fu, Y.; Geng, S.; Chen, Z.; Zhou, H.; Zhao, X. Energy-efficient resource allocation for space–air–ground integrated industrial power Internet of Things network. IEEE Trans. Ind. Inform. 2023, 20, 5274–5284. [Google Scholar] [CrossRef]
  12. Fang, X.; Misra, S.; Xue, G.; Yang, D. Smart grid—The new and improved power grid: A survey. IEEE Commun. Surv. Tutor. 2011, 14, 944–980. [Google Scholar] [CrossRef]
  13. Bakare, M.S.; Abdulkarim, A.; Zeeshan, M.; Shuaibu, A.N. A comprehensive overview on demand side energy management towards smart grids: Challenges, solutions, and future direction. Energy Inform. 2023, 6, 4. [Google Scholar] [CrossRef]
  14. Wang, W.; Lu, Z. Cyber security in the smart grid: Survey and challenges. Comput. Netw. 2013, 57, 1344–1371. [Google Scholar] [CrossRef]
  15. Ghasempour, A. Internet of things in smart grid: Architecture, applications, services, key technologies, and challenges. Inventions 2019, 4, 22. [Google Scholar] [CrossRef]
  16. Hossain, M.M.; Peng, C. Cyber–physical security for on-going smart grid initiatives: A survey. IET Cyber-Phys. Syst. Theory Appl. 2020, 5, 233–244. [Google Scholar] [CrossRef]
  17. Zhang, X.; Chen, L.; Fan, J.; Wang, X.; Qi, W. Power IoT security protection architecture based on zero trust framework. In Proceedings of the 2021 IEEE 5th International Conference on Cryptography, Security and Privacy (CSP), Zhuhai, China, 8–10 January 2021; pp. 166–170. [Google Scholar]
  18. Choi, C.; Choi, J. Ontology-based security context reasoning for power IoT-cloud security service. IEEE Access 2019, 7, 110510–110517. [Google Scholar] [CrossRef]
  19. Guo, Y.; Gong, Y.; Njilla, L.L.; Kamhoua, C.A. A stochastic game approach to cyber-physical security with applications to smart grid. In Proceedings of the IEEE INFOCOM 2018-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Honolulu, HI, USA, 15–19 April 2018; pp. 33–38. [Google Scholar]
  20. Theodorakatos, N.P.; Babu, R.; Moschoudis, A.P. The branch-and-bound algorithm in optimizing mathematical programming models to achieve power grid observability. Axioms 2023, 12, 1040. [Google Scholar] [CrossRef]
  21. Varghese, A.C.; Shah, H.; Azimian, B.; Pal, A.; Farantatos, E. Deep neural network-based state estimator for transmission system considering practical implementation challenges. J. Mod. Power Syst. Clean Energy 2024, 12, 1810–1822. [Google Scholar] [CrossRef]
  22. Alexopoulos, T.A.; Korres, G.N.; Manousakis, N.M. Complementarity reformulations for false data injection attacks on PMU-only state estimation. Electr. Power Syst. Res. 2020, 189, 106796. [Google Scholar] [CrossRef]
  23. Zhang, H.; Jiang, L.V.; Huang, S.; Wang, J.; Zhang, Y. Attack-defense differential game model for network defense strategy selection. IEEE Access 2018, 7, 50618–50629. [Google Scholar] [CrossRef]
  24. Agah, A.; Das, S.K. Preventing DoS attacks in wireless sensor networks: A repeated game theory approach. Int. J. Netw. Secur. 2007, 5, 145–153. [Google Scholar]
  25. Wu, H.; Gao, Q.; Tao, X.; Zhang, N.; Chen, D.; Han, Z. Differential game approach for attack-defense strategy analysis in internet of things networks. IEEE Internet Things J. 2021, 9, 10340–10353. [Google Scholar] [CrossRef]
  26. Bressan, A. Noncooperative differential games. Milan J. Math. 2011, 79, 357–427. [Google Scholar] [CrossRef]
Figure 1. Network model.
Figure 2. Migration Relationships Between Different Device States.
Figure 3. Evolution of device quantity across different states under scenario 1.
Figure 4. Evolution of the system utility function under scenario 1.
Figure 5. Effect of cost coefficients on the utility function.
Figure 6. Effect of parameter ρ on the utility function.
Figure 7. Evolution of device quantity across different states under scenario 2.
Figure 8. Evolution of the system utility function under scenario 2.
Table 1. Competitive Strengths and Initial Strategies of Network Attackers (scenario 1).
$L_A$ | 0.4 | 0.9
Initial Probability | 0.1 | 0.9
Table 2. Competitive Strength and Initial Strategy of Network Defenders (scenario 1).
$L_D$ | 0.6 | 0.8
Initial Probability | 0.25 | 0.75
Table 3. Competitive Strengths and Initial Strategies of Network Attackers (scenario 2).
$L_A$ | 0.5 | 0.8
Initial Probability | 0.6 | 0.9
Table 4. Competitive Strength and Initial Strategy of Network Defenders (scenario 2).
$L_D$ | 0.6 | 0.9
Initial Probability | 0.2 | 0.8
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, D.; Lv, T.; Su, W.; Cong, L.; Wu, D. A Game Theory Model for Network Attack–Defense Strategy Selection in Power Internet of Things. Electronics 2026, 15, 426. https://doi.org/10.3390/electronics15020426
