Stackelberg Dynamic Game-Based Resource Allocation in Threat Defense for Internet of Things

With the rapid development of the Internet of Things, there are a series of security problems faced by the IoT devices. As the IoT devices are generally devices with limited resources, how to effectively allocate the restricted resources facing the security problems is the key issue at present. In this paper, we study the resource allocation problem in threat defense for the resource-constrained IoT system, and propose a Stackelberg dynamic game model to get the optimal allocated resources for both the defender and attackers. The proposed Stackelberg dynamic game model is composed by one defender and many attackers. Given the objective functions of the defender and attackers, we analyze both the open-loop Nash equilibrium and feedback Nash equilibrium for the defender and attackers. Then both the defender and attackers can control their available resources based on the Nash equilibrium solutions of the dynamic game. Numerical simulation results show that correctness and effeteness of the proposed model.


Introduction
The Internet of Things (IoT) [1] refers to a huge network of various information-sensing devices combined with the Internet. These sensing devices include infrared sensors, Radio Frequency Identification Devices (RFID) [2], laser scanners, global positioning systems (GPS), and other devices. In recent years, with the development of computer intelligence technology, communication technology and perceptual recognition technology, the IoT has been widely used in smart home, smart medical, smart grid, Intelligent Transportation System (ITS) and other fields, and brought great convenience to people's lives [3][4][5].
Generally, the IoT system is composed of a large number of nodes that are often exposed to public situations, lack effective protection, and are easily attacked [6][7][8]. Therefore, the security threats faced by IoT devices are more serious than those of the traditional network. In addition, the IoT environment is complex and IoT devices with limited resources are more vulnerable to cyber-attack [9]. Faced with limited resources, how to effectively allocate resources [10] to defend against threats in the IoT system has become a serious problem that desperately needs to be solved.
Lots of work has been done on resource allocation problems in threat defense for IoT systems. In order to build up the overall security of the IoT, studies [11,12] propose an overall security architecture for the IoT from different perspectives. In order to promote multiple resource-sharing and heterogeneous resource-demanding allocations, Intrusion Detection Systems (IDS) architecture and resource allocation are recommended [13]. Zhang et al. [14] evaluated the four levels of a security index system of the Internet of Things through fuzzy analytic hierarchy process, and concluded that the key indicators for improving the security of the Internet of Things are privacy protection, WSN

•
Firstly, we research a cyber-security IoT system, which consists of one defender and N attackers. The defender tries to find the optimal allocated resources for threat defense, and the attackers try to use their resources to attack the IoT system. • Secondly, a Stackelberg dynamic game model is proposed to formulate the resource allocation problem in threat defense for the Internet of Things. The Stackelberg game is a one-leader-many-followers game, where the defender is the leader and the attackers are the followers. • For the dynamic game, we use the risk level as the system state. The objectives for the defender and attackers are to optimize the cost during the threat defense to find the optimal allocated resources for both the defender and attackers. • Finally, the open-loop control solutions and the feedback control solutions for both the defender and attackers are given based on Bellman dynamic programming.
The remainder of the paper is organized as follows. Section 2 introduces the system model and problem formulation. Section 3 provides the Nash equilibrium solutions for the proposed game model. Numerical simulations are given in Section 4. Finally, we conclude the work in Section 5.

System Model and Problem Formulation
We consider a cyber-security IoT system that is composed of one defender and N attackers. The defender can control its resources, such as energy resources, computing resources, and bandwidth resources, to resist intrusion from all sorts of attackers. The attackers will use all available resources to successfully break the defense and break into the IoT system. Based on these, we will try to formulate a dynamic model for both the defender and the attackers, to find their optimal strategies for resources allocation of the cyber-security IoT system in the process of defense and attack. In our proposed model, the relationship between the defender and the attackers is considered a one-leader-N-followers Stackelberg dynamic game, where the defender is the leader and the attackers are the followers. At the beginning of the Stackelberg dynamic game, the defender will choose a resource strategy to protect the system. After observing the defender's strategy, the attackers will choose their optimal resource strategies for intrusion based on the observed defense strategy. Then the defender will allocate its resources to defense the invasion based on the attackers' strategies. The strategies for both the defender and the attackers are dynamic and time-dependent.
In order to formulate the Stackelberg dynamic game, there should be a system state for both the defender and the attackers. As the risk level of the IoT system concerns both the defender and the attackers, we use it as the system state of the Stackelberg dynamic game. Generally speaking, the risk level is affected by the input variables, which are the defense strategy and the attack strategies. In our proposed model, we assume that the risk level is not only related to the current input variables, but also is affected by the current risk level with a degradation coefficient. Assuming u 0 (t) and u i (t) are the control variables of the defender and the attackers for the resource allocation, respectively, let x(t) denote the risk level of the IoT system, which can be given by the following differential equation [32]: where α is a negative weighted factor and β i is a positive weighted factor that denote the strategies' relative importance on the risk level. Through an effective defense strategy, the system will be more robust as time elapses. Then the risk level of the system will decrease with a random degradation coefficient, which is denoted by ε. Based on Equation (1), we find that, in our proposed IoT system, the risk level will decrease with the defender's action, and increase with the attackers' actions. Meanwhile, the longer the system continues, the lower the risk level, which is denoted by the random degradation coefficient ε.
Given the system state, we can now discuss the objective functions for the defender and the attackers. As the leader of the Stackelberg dynamic game, we will formulate the objective function for the defender first. The IoT system is usually composed of devices with limited resources, such as low power supply and limited battery capacity, so the defense cost is the main concern for the IoT system protection. For the defender, it aims to minimize the cost for protecting the IoT system and resisting the attackers with limited resources. Its objective function can be given as follows: where µ 0 , ν 0 , and ρ 0 are positive weighted factors that denote the relative importance of the components. In our paper, we assume that the weighted factors add up to 1, which means the weighted factors are decimals larger than 0 and less than 1. The physical meanings of the weighted factors are the importance of the components in the cost function. The objective function of the defender has three components. The first part is , which means the observing cost. The defender should observe the attackers' strategies to generate its own strategy for system defense. The observing cost is a direct ratio to the attackers' resource strategies; when the attackers spend more resources, the defender should pay more attention for observation. u 2 0 (t) is the second part of the objective function, and means the defending cost of the IoT system. Generally, the defending cost is directly related to the resources allocated for defending. The third part of the objective function of the defender is given by ( x − x(t)), which is the disparity between the maximum permissible risk level and the real-time risk level. The purpose of the defender is to reduce the risk level to ensure data security, even with a risk level of zero. Let x denote the maximum permissible risk level; we should try to make the risk level no higher than the threshold. According to Equation (2), we find that the total instantaneous cost of the defender is a function of the allocated resources u 0 (t), u i (t), and the risk level x(t). The defender wants to find the optimal allocated resources u 0 (t) that minimize its cost function over time interval [0, T]: subject to Equation (1). Here, r is the discount rate. The attackers also want to invade the entire IoT system at a low cost. Therefore the aims of the attackers are to minimize the cost of breaking into the IoT system with limited resources. Its objective function can be given as follows: where µ i , ν i , and ρ i are positive weighted factors that denote the relative importance of the components. The objective function of the defender has three components. The first part is u 2 i (t), the resource cost to the attacker. Generally, the attacker should allocate enough energy resources to attack the IoT system, and we use the linear quadratic form to denote the attacking resource cost, the energy or power cost during attack. The second part is the cost for observing and weakening the IoT system. We use u − u i (t) to denote the resources available for observing, except for the allocated attacking resources, where u denotes the maximum resources that can be allocated. u 0 (t) denotes the difficulty of weakening the IoT system; the more resources allocated for protecting by the defender, the harder it is to weaken the IoT system. We combine the above components to denote the system observing and weakening cost. The third part is the cost caused by the risk level, which is denoted by (x(t) − x). Each attacker wants to increase the risk level of the IoT system higher than the maximum permissible risk level x. Based on the objective function, attacker i wants to find the optimal allocated resources u i (t) that minimize its cost function over time interval [0, T]: subject to Equation (1). Here, r is the discount rate.

Game Analysis
In this section, we will discuss both the open-loop Nash equilibrium solutions and the feedback Nash equilibrium solutions to the proposed game model established in the previous section, and analyze the optimal strategies for the attackers and defender in the IoT system. The open-loop Nash equilibrium solutions will be given first, followed by the feedback Nash equilibriums. Both solutions are given based on Bellman's dynamic programming principle.

Open-Loop Nash Equilibrium Solutions
During the Stackelberg relations in the threat defense, the IoT system will first consume certain resources for implementing a defense strategy, then the attackers will attack the system based on the initial defending resource. After observing the attackers' strategies, the resources of the IoT system will be recalibrated to cope with all kinds of risk. Because both the IoT system and attackers' resources are limited, it is important to effectively allocate resources during the defense and attack based on the proposed Stackelberg dynamic game. If the defenders and attackers choose to commit their strategies from the outset, their information structure can be seen as an open-loop pattern, which means the optimal strategies for the defender and attackers are functions of the initial risk level x(0) and the time instant t. In this section, the open-loop Nash equilibrium will be given to the game Equations (3) and (5) to obtain the optimal strategies in a finite time horizon [0, T].

Open-Loop Solutions for the Attackers
In order to minimize the cost function, each attacker needs to choose their optimal resource strategies based on the observed defense strategy. Before getting the optimal allocated resources, we first give some definitions for understanding the proposed game model.
constitutes an open-loop Stackelberg equilibrium to the problem in Equation (5), and x * (t) is the corresponding state trajectory, if there exists a costate function Λ i (t) such that the following relations are satisfied, .
where Λ i (t) in Equation (8) is an adjoint equation to describe the dynamics of a costate variable. The costate function Λ i (t) is a function associated with the state variable x(t). Generally, Equation (7) can be considered a Hamiltonian system H i (t) of the proposed game model, and x(t).
Based on the definitions given above, we can solve the attacker's optimal resource strategy problem based on the Bellman's dynamic programming principle. Lemma 1. The optimal resource strategy to attacker i is where Λ i (t) is given by the following: Proof. See Appendix A.
Equation (9) shows that the optimal resource strategy of the attacker will be affected by u 0 (t) and costate functions Λ i (t). We can see that the optimal resource strategy of attacker i is in positive proportion to the resource strategy u 0 (t) of the defender. The attackers will choose their optimal resource strategies for intrusion based on the initial defense strategy.

Open-Loop Solutions for the Defender
The defender will allocate its resources to defend the attackers based on the attackers' strategies. In this subsection, we will give the open-loop Nash equilibrium to the defender.

Definition 3.
For the defender, the resource strategy u * 0 (t) is optimal if the following inequality holds for all feasible control u 0 (t) = u * 0 (t):  (3), and x * (t) is the corresponding state trajectory, if there exist costate functions λ 0 (t) and λ i (t) such that the following relations are satisfied: . .
where the Hamiltonian system H 0 (t) of the defender can be expressed as follows: Calculate the partial derivative for λ 0 (t) and λ i (t) in the Hamiltonian system H 0 (t), we can obtain, .
Solving Equations (16) and (17), we have, Calculating the partial derivative for u 0 (t) in Equation (15), we obtain, Based on the above analysis, the optimal solutions for both the attackers and defender are obtained, we get the corresponding state trajectory x * (t) using Equations (9) and (20) as follows:

Open-Loop Control Algorithm
In this subsection, we discuss the implementation open-loop control algorithm for the proposed game analysis. Algorithm 1 is the open-loop control algorithm for the attackers and defenders. The whole algorithm cycling can be divided into two parts. One is the "open-loop control of attackers" part, which is used to calculate the optimal strategies of resource allocation during the attacks. The other is the "open-loop control of defender" part, to make a decision on the resource level for threat Step 4.2. Calculate the solutions for the attackers.
Step 5. Get the open-loop solutions of the attackers for the defender; Step 6. Start to calculate the open-loop control solutions for the defender; Step 6.1. Set up the objective function for the defender; Step 6.2. Calculate the solutions for the defender. Algorithm End

Feedback Nash Equilibrium Solutions
To eliminate information nonuniqueness in the derivation of Nash equilibria, we can obtain the optimal solutions for the proposed game mode to satisfy the feedback Nash equilibrium property. In the feedback situation, the information structures of the defender and attackers follow a closed-loop perfect state pattern, and the optimal strategies for the defender and attackers become functions of the initial risk level x(t), the current risk level x(t) at time instant t, and the current time t. In this subsection, the feedback Nash equilibrium solutions to the proposed Stackelberg dynamic game are discussed based on the dynamic optimization programming technique developed by Bellman [33]. In the following subsections, we first discuss the optimal resource strategies for each attacker in a finite time horizon [0, T]. Then, the optimal strategy of the defender is obtained based on the attackers' solutions.

Feedback Solutions for the Attackers
In this section, we first discuss the optimal resource strategies for the attackers, the feedback Nash equilibrium solutions to the game Equations (1) and (5) will be discussed.

Definition 5.
A set of control u * i (t) constitutes a feedback solution to Equations (1) and (5); if there exists a continuously differentiable function V i (t, x), and V i (t, x) satisfies the following differential equation: Calculating the partial derivative for u i (t) in Equation (22), we can then obtain Lemma 2. The value function V i (t, x) admits a solution that satisfies, where A i (t) is given by and B i (t) are satisfied, Proof. See Appendix B.

Feedback Solutions for the Defender
In this subsection, the feedback Nash equilibrium solution for the defender will be discussed.

Definition 6.
A set of control u * 0 (t) constitutes an feedback solution to Equations (1) and (3), if there exists a continuously differentiable function V 0 (t, x), and V 0 (t, x) satisfies the following differential equation: As the game leader, the defender should consider the resource strategies of the attackers before making a decision on the resource strategies. Calculating the partial derivative for u 0 (t) in (27), we obtain Lemma 3. The value function V 0 (t, x) admits a solution that satisfies, where A 0 (t) and B 0 (t) are given by Proof. By taking the derivative of V 0 (t, x) with respect to t and x, we obtain, V 0 Solving Equations (27), (31), and (32), A 0 (t) and B 0 (t) are satisfied: Solving the above equation, we can obtain the expression of A 0 (t) as follows: Substituting Equation (34) into Equation (28), we can derive the optimal resource strategy for the defender as follows: Solving Equation (1), we can get the optimal state:

Feedback Control Algorithm
In this subsection, we will discuss the implementation feedback control algorithm for the proposed game analysis, which is given in Algorithm 2. Similarly, the whole algorithm cycling can be divided into the attackers' part, and the defender's part. The time complexity of the feedback control algorithm will be O(n), and the space complexity is O(n). The progress can be described as follows.

Start algorithm
Step 1. Set up the parameter for the attackers and defender; Step 2. The defender control its initial strategy for resource allocation for threat defense; Step 3. Start the feedback control of the attackers and the defender; Step 4. Start to calculate the feedback control solutions for the attackers, Step 4.1. Set up the objective function for the attackers; Step 4.2. Calculate the solutions for the attackers.
Step 5. Get the feedback solutions of the attackers for the defender; Step 6. Start to calculate the feedback control solutions for the defender; Step 6.1. Set up the objective function for the defender; Step 6.2. Calculate the solutions for the defender. Algorithm End

Numerical Simulations
In this section, we will use MATLAB software to simulate the proposed Stackelberg dynamic game model. We will analyze the resource strategies of attackers and defender, in the form of open-loop and feedback. The simulation parameters are shown in Table 1. To simplify the simulations, we assume all the attackers are uniform with the same simulation parameters.

Numerical Simulations of Open-Loop Nash Equilibrium Solutions
We first simulate the open-loop Nash equilibrium solutions of the model to get the optimal defense resource strategies of the defender and attackers. Figure 1 describes the relationship between the optimal strategies u * i (t) and u * 0 (t) over time t(t ∈ [0, 10]). As shown in Figure 1, the optimal resource strategy of both the attacker and defenders monotonically decrease with time t. In order to protect the security of the system, the defender adopts a strategy to consume its own resources when the attacker attacks. The attacker adopts a strategy to attack the system and consumes its own resources. As the time changes, the optimal strategies for the defender and attackers tend to convergence. assume all the attackers are uniform with the same simulation parameters.

Numerical Simulations of Open-Loop Nash Equilibrium Solutions
We first simulate the open-loop Nash equilibrium solutions of the model to get the optimal defense resource strategies of the defender and attackers.  ). As shown in Figure 1, the optimal resource strategy of both the attacker and defenders monotonically decrease with time t . In order to protect the security of the system, the defender adopts a strategy to consume its own resources when the attacker attacks. The attacker adopts a strategy to attack the system and consumes its own resources. As the time changes, the optimal strategies for the defender and attackers tend to convergence.  Figure 2 describes the changes in the attacker's optimal resource strategy when u 0 (t) takes different values. We find that the smaller u 0 (t), the smaller the optimal resource strategy u i (t). This is because the attacker will choose their optimal resource strategies for attacks based on the observed defense strategy. Figure 3 describes the relationship between the risk level of the system and time t. The risk level at the initial moment is the highest, and with the effective defense of the defender, the risk level shows a decreasing trend. As shown in Figure 3b, the risk level is a decreasing function with respect to time t, which is proportional to the number of attackers. The number of attackers are set to 1, 5, and 20, respectively. Figure 4 shows the risk level variation of the system, when the number of the devices in the IoT system becomes a large number, to analyze the scalability of the proposed model. We can prove that the proposed model can be used for IoTs with a large number of devices based on Figure 4.  Table 1. To simplify the simulations, we assume all the attackers are uniform with the same simulation parameters.

Numerical Simulations of Open-Loop Nash Equilibrium Solutions
We first simulate the open-loop Nash equilibrium solutions of the model to get the optimal defense resource strategies of the defender and attackers.  ). As shown in Figure 1, the optimal resource strategy of both the attacker and defenders monotonically decrease with time t . In order to protect the security of the system, the defender adopts a strategy to consume its own resources when the attacker attacks. The attacker adopts a strategy to attack the system and consumes its own resources. As the time changes, the optimal strategies for the defender and attackers tend to convergence. Variation of u i (t) with time risk level shows a decreasing trend. As shown in Figure 3b, the risk level is a decreasing function with respect to time t , which is proportional to the number of attackers. The number of attackers are set to 1, 5, and 20, respectively. Figure 4 shows the risk level variation of the system, when the number of the devices in the IoT system becomes a large number, to analyze the scalability of the proposed model. We can prove that the proposed model can be used for IoTs with a large number of devices based on Figure 4.

Numerical Simulations of Feedback Nash Equilibrium Solutions
This subsection mainly simulates the feedback Nash equilibrium solution of the model. Figure  5 describes the relationship between the optimal strategy ( ) As shown in Figure 5a, the attacker's optimal resource strategy is an increasing function with respect to time t . The attackers control their own resource strategies. The aim of the attack is to increase the risk level, so they allocate more resources for attacks under the feedback control situation. As shown in Figure 5b, the defender's optimal resource strategy is a decreasing function with respect to time t with respect to time t , which is proportional to the number of attackers. The number of attackers are set to 1, 5, and 20, respectively. Figure 4 shows the risk level variation of the system, when the number of the devices in the IoT system becomes a large number, to analyze the scalability of the proposed model. We can prove that the proposed model can be used for IoTs with a large number of devices based on Figure 4.

Numerical Simulations of Feedback Nash Equilibrium Solutions
This subsection mainly simulates the feedback Nash equilibrium solution of the model. Figure  5 describes the relationship between the optimal strategy ( ) As shown in Figure 5a, the attacker's optimal resource strategy is an increasing function with respect to time t . The attackers control their own resource strategies. The aim of the attack is to increase the risk level, so they allocate more resources for attacks under the feedback control situation. As shown in Figure 5b, the defender's optimal resource strategy is a decreasing function with respect to time t

Numerical Simulations of Feedback Nash Equilibrium Solutions
This subsection mainly simulates the feedback Nash equilibrium solution of the model. Figure 5 describes the relationship between the optimal strategy u * i (t) and u * 0 (t) with time t(t ∈ [0, 10]). As shown in Figure 5a, the attacker's optimal resource strategy is an increasing function with respect to time t. The attackers control their own resource strategies. The aim of the attack is to increase the risk level, so they allocate more resources for attacks under the feedback control situation. As shown in Figure 5b, the defender's optimal resource strategy is a decreasing function with respect to time t. The defender controls its own resource strategy to minimize risk, but, because of limited resources, may not have enough for defense as the time changes. Figure 6 describes the relationship between the risk level of the system and time t. As shown in Figure 6, the risk level is proportional to the number of attackers. In the feedback Nash equilibrium solution, the attacker uses more attacks to increase the risk. Figure 7 shows the risk level variation of the system, when the number of devices in the IoT system becomes large. Figure 8 gives the time complexity of the proposed Stackelberg dynamic game. As shown in Figure 6, the time complexity of both the open-loop and feedback control algorithm will be O(n). . The defender controls its own resource strategy to minimize risk, but, because of limited resources, may not have enough for defense as the time changes.
(a) (b) Figure 5. (a) Optimal strategy of the attacker; (b) optimal strategy of the defender. Figure 6 describes the relationship between the risk level of the system and time t . As shown in Figure 6, the risk level is proportional to the number of attackers. In the feedback Nash equilibrium solution, the attacker uses more attacks to increase the risk. Figure 7 shows the risk level variation of the system, when the number of devices in the IoT system becomes large. Figure 8 gives the time complexity of the proposed Stackelberg dynamic game. As shown in Figure 6 Figure 6 describes the relationship between the risk level of the system and time t . As shown in Figure 6, the risk level is proportional to the number of attackers. In the feedback Nash equilibrium solution, the attacker uses more attacks to increase the risk. Figure 7 shows the risk level variation of the system, when the number of devices in the IoT system becomes large. Figure 8 gives the time complexity of the proposed Stackelberg dynamic game. As shown in Figure 6

Conclusions
This paper proposes a Stackelberg dynamic game-based resource allocation model in the cybersecurity IoT system that is composed by one defender and N attackers. We formulate a dynamic model for both the defender and the attackers to find their optimal strategies for resource allocation in the process of defense and attack. By solving the open-loop Nash equilibrium solution and the feedback Nash equilibrium solution, we find that the optimal resource solution for the defender is the open-loop Nash equilibrium solution, and under the open-loop situation, the defender can effectively reduce the risk level of the system. However, attackers can obtain more profit under the feedback situation.