Optimal Power Control in Wireless Powered Sensor Networks: A Dynamic Game-Based Approach

In wireless powered sensor networks (WPSN), it is essential to study uplink transmit power control in order to balance throughput performance and schedule energy. Each sensor should have an optimal transmit power level that maximizes its revenue. In this paper, we discuss a dynamic game-based algorithm for optimal power control in WPSN. The main idea is to use a non-cooperative differential game to control the uplink transmit power of the wireless sensors, extending their working hours while meeting QoS (Quality of Service) requirements. The Nash equilibrium solutions are then obtained through Bellman dynamic programming, and a distributed uplink power control algorithm is proposed. Through numerical simulations, we demonstrate that our algorithm achieves optimal power control and converges over an infinite horizon.


Introduction
Conventional wireless sensor networks (WSN) are typically disposable systems: because of random deployment, sensors cannot be recharged, and the entire network becomes inoperable when the sensors' batteries run out of energy [1]. As the use of a WSN is strictly limited by the lifespan of the sensors' batteries, energy consumption has become one of the biggest constraints on wireless sensor nodes and poses many challenges to WSNs [2]; energy is one of the scarcest resources in a WSN [3]. Wireless power transfer (WPT) and other energy harvesting technologies provide solutions in such situations. Benefiting from microwave wireless power transfer, wireless powered sensor networks (WPSN) can reduce operational cost, provide a stable energy supply, and achieve much longer operating lifetimes [4].
WPSN has been widely studied in the recent literature [5][6][7][8]. Wireless sensors in a WPSN are powered through radio frequency (RF) WPT in the downlink and use the harvested energy for information transmission in the uplink. Compared with other energy harvesting technologies, WPT can achieve long-distance energy transfer and continuous energy replenishment [6]. However, in a WPSN, the distance between the wireless sensors and the energy nodes (ENs) may cause performance unfairness because of the near-far effect: sensors located far from the ENs receive less energy because of propagation attenuation, yet they may need more energy for uplink information transmission. Communication and energy scheduling must therefore be considered together.

For the downlink energy transfer, energy beamforming technology is [...] mode. The wireless sensors use the energy harvested from the hybrid access point (H-AP) for information transmission [17]. The energy harvested by each sensor is stored in a rechargeable battery and then used for wireless information transmission (WIT). Moreover, each sensor controls its uplink transmit power to extend its working time while improving its own QoS. This is a distributed optimization problem that leads to a dynamic game, which can be modeled as a non-cooperative differential game.

In this paper, the "harvest-then-transmit" protocol [18] is considered. Transmission is organized into consecutive blocks, each of normalized (unit) duration, and energy and information are transferred block by block. Each transmission block is divided into two phases (as shown in Figure 2). The first phase, of duration τ_i, is used for wireless energy transfer (WET), and the second phase, of duration 1 − τ_i, is used for wireless information transmission (WIT).

Figure 1. System model for a WPSN (an energy transmitter serving several wireless sensors, with energy and information links).
Figure 2. Two-step transmission phase.
Our target is to control the uplink transmit power of the sensors in the WPSN to maximize each sensor's own economic revenue during the time period t ∈ [0, T]. A differential game-based model is constructed to describe the revenue maximization problem. Through optimal power control, the wireless sensors can achieve a balance between energy consumption and QoS improvement. To simplify the system, we consider a WPSN with one H-AP and N wireless sensors, where N also denotes the set of wireless sensors (players). During the first phase of wireless energy transfer, let p_i^T denote the transfer power from the H-AP to sensor i. It is assumed that p_i^T satisfies a maximum power constraint P_max^T (i.e., 0 ≤ p_i^T ≤ P_max^T) [19]. The energy harvested by sensor i is given by [20]: where η_i is the energy conversion efficiency of player i, with 0 < η_i ≤ 1, and |g_i^T|^2 is the downlink channel gain. Let x_i(t) denote the power level of player i, which can be interpreted as the state variable of the system. State variables are dynamic variables that evolve over time and are influenced by the uplink transmit power as well as by the existing levels of the state variables. Let p_i(t) denote the uplink transmit power of player i, which serves as the control variable. The dynamics of the power level can be characterized by a linear differential equation, i.e.: where µ_i is the energy loss coefficient and x_i(0) = 0 is the initial state, meaning that no energy has been transferred at the beginning of the game. We now discuss how wireless sensors control their uplink transmit power to maximize revenue, balancing energy consumption against achievable throughput. Given its limited energy, each sensor aims to minimize its uplink transmit power to extend its working time, but doing so may reduce the amount of information transmitted and lower its QoS.
Therefore, each sensor needs to balance the conflict between energy consumption and achievable throughput. In general, each H-AP has a finite queue length or buffer size; when the buffer of the H-AP is full, it refuses to serve any further uplink information transmission. In our game, we therefore assume that the buffer is large enough for information transmission and consider only how to control the uplink transmit power to maximize revenue. The optimization model consists of an energy revenue term and a QoS revenue term.
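The harvested-energy expression and the linear power-level dynamics are referenced above without their explicit equations. As a minimal sketch, we assume the standard harvest-then-transmit form E_i = η_i p_i^T |g_i^T|^2 τ_i and a hypothetical linear dynamic dx_i/dt = E_i − (1 − τ_i) p_i − µ_i x_i (harvested input, minus uplink spending during the WIT phase, minus energy loss). Both forms and the function names are illustrative assumptions, not the paper's own equations.

```python
def harvested_energy(eta: float, p_T: float, g_T: complex, tau: float,
                     p_max: float) -> float:
    """Energy harvested by sensor i in one WET phase of a unit-length block.

    ASSUMED form: E_i = eta_i * p_i^T * |g_i^T|^2 * tau_i.
    g_T is the downlink channel coefficient; abs() gives its magnitude.
    """
    if not 0.0 < eta <= 1.0:
        raise ValueError("conversion efficiency must satisfy 0 < eta <= 1")
    if not 0.0 <= p_T <= p_max:
        raise ValueError("transfer power must satisfy 0 <= p_T <= P_max^T")
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau is a fraction of a unit-length block")
    return eta * p_T * abs(g_T) ** 2 * tau


def power_level_rate(x: float, p_up: float, E: float, tau: float,
                     mu: float) -> float:
    """Right-hand side of a HYPOTHETICAL linear power-level dynamic:
    dx/dt = E - (1 - tau) * p_up - mu * x
    i.e. harvested input, minus uplink spending during WIT, minus energy loss.
    """
    return E - (1.0 - tau) * p_up - mu * x
```

For example, with E = 1, no uplink transmission and µ = 0.2, the rate is zero at x = 5, the level at which harvesting exactly offsets the loss term.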
First, we define the energy revenue, which depends on the energy stored in the sensor and the unit price of energy. Assuming the unit price is ε, the instantaneous energy revenue is defined in linear form as follows: In perfect competition, each sensor would use the lowest possible power to reduce energy consumption and increase the energy revenue, given by Equation (2). However, lower transmit power may lead to a lower transmission rate and lower QoS. We therefore introduce a QoS revenue to capture the conflict between energy consumption and QoS requirements. Because the "harvest-then-transmit" protocol is considered, energy transfer and information transmission occur in separate phases, so there is no interference from the energy transmission. The achievable rate of sensor i is taken as its QoS revenue, defined as follows: where γ_i = g_i/σ_i^2, p_i is the uplink transmit power (the control variable of the game), g_i is the uplink channel power gain, and σ_i^2 is the noise power. ρ is a constant parameter that denotes the unit rate revenue.
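The QoS revenue equation itself is not reproduced above. As an illustrative sketch, we assume the Shannon-type form ρ log2(1 + γ_i p_i), with γ_i = g_i/σ_i^2, alongside the linear energy revenue ε x_i; the function names are hypothetical. The concavity of the logarithm is what creates the trade-off described in the text: each additional unit of uplink power buys less rate than the previous one, while the energy it consumes grows linearly.

```python
import math


def energy_revenue(x: float, eps: float) -> float:
    """Instantaneous energy revenue, linear in the stored energy x."""
    return eps * x


def qos_revenue(p: float, g: float, noise_power: float, rho: float) -> float:
    """Instantaneous QoS revenue, ASSUMED to be rho * log2(1 + gamma * p)
    with gamma = g / sigma^2 (uplink channel power gain over noise power)."""
    if p < 0.0 or noise_power <= 0.0:
        raise ValueError("need p >= 0 and a positive noise power")
    gamma = g / noise_power
    return rho * math.log2(1.0 + gamma * p)


def total_revenue(x: float, p: float, eps: float, g: float,
                  noise_power: float, rho: float) -> float:
    """Energy revenue plus QoS revenue for one sensor at one instant."""
    return energy_revenue(x, eps) + qos_revenue(p, g, noise_power, rho)
```

With γ = ρ = 1, raising the power from p = 1 to p = 3 (tripling it) only raises the QoS revenue from 1 to 2.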
Based on the above assumptions, the total revenue of wireless sensor i is given as follows: In this paper, we use noncooperative differential game theory [21] to analyze the optimal uplink transmit power and achieve revenue maximization for each sensor. Let S_i denote the target QoS level of sensor i. We evaluate the balance between energy consumption and QoS over the time interval [0, T] using the term α_i (x_i(T) − S_i), where α_i is a constant parameter and T is the end of the control period. Let r denote the discount rate. In the dynamic power-control game, each sensor noncooperatively chooses its uplink transmit power to solve: subject to the deterministic dynamics: We now formulate the optimal power control for all sensors in the WPSN as a differential game, as follows.
• Players: All wireless sensors i ∈ N in the WPSN.
• Strategy space: Each wireless sensor noncooperatively chooses its uplink transmit power p_i(t) to maximize its revenue.
• State: The power level of sensor i is denoted by x_i(t); the state evolves according to the dynamic constraint in Equation (2).
• Objective function: Each wireless sensor maximizes its own discounted revenue over the time interval [0, T].
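The objective (6) is stated above only in words. A minimal numerical sketch, assuming the instantaneous revenue is the sum of the energy revenue ε x and a QoS revenue of the form ρ log2(1 + γ p), that it is discounted at rate r, and that the terminal term α_i(x_i(T) − S_i) is discounted by e^{−rT}, evaluates the objective on a sampled trajectory with the trapezoidal rule. The function name and keyword arguments are hypothetical.

```python
import math


def discounted_objective(ts, xs, ps, *, eps, rho, gamma, r, alpha, S):
    """Trapezoidal approximation of the ASSUMED objective
        J = int_0^T e^{-rt} [eps*x(t) + rho*log2(1 + gamma*p(t))] dt
            + e^{-rT} * alpha * (x(T) - S)
    from samples ts (increasing times starting at 0), xs (power levels)
    and ps (uplink powers)."""
    if not (len(ts) == len(xs) == len(ps)) or len(ts) < 2:
        raise ValueError("need at least two aligned samples")

    def integrand(t, x, p):
        return math.exp(-r * t) * (eps * x + rho * math.log2(1.0 + gamma * p))

    running = 0.0
    for k in range(len(ts) - 1):
        f0 = integrand(ts[k], xs[k], ps[k])
        f1 = integrand(ts[k + 1], xs[k + 1], ps[k + 1])
        running += 0.5 * (f0 + f1) * (ts[k + 1] - ts[k])
    T = ts[-1]
    return running + math.exp(-r * T) * alpha * (xs[-1] - S)
```

The terminal term rewards a sensor whose final power level exceeds its target S and penalizes one that falls short, so the same trajectory can score differently for sensors with different α_i and S_i.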

Game Analysis
In this section, we analyse the optimal uplink transmit power for each wireless sensor. In the following subsections, we first discuss the optimal uplink transmit power in a finite horizon. Then, the optimal strategy will be considered under an infinite horizon. An uplink power control algorithm based on the differential game will be given in the third subsection.

Analysis of Differential Game in Finite-Horizon
The finite-horizon differential game is solved with the dynamic programming technique developed by Bellman [22,23]. According to Bellman's principle of optimality, an optimal uplink transmit power remains optimal over every remaining part of the given time duration.

Lemma 1. For the optimization problem (6) and (7), an n-tuple of strategies p_i^*(t, x), for i ∈ N, constitutes a feedback Nash equilibrium solution if there exists a functional V_i(t, x), defined on the time interval [0, T], that satisfies the following relations for each i ∈ N [22,23]:

For all t ∈ [0, T], if the strategies p_i^*(s), for i ∈ N, provide a feedback Nash equilibrium to the differential game on the time interval [0, T], they also provide a feedback Nash equilibrium for the same problem on the time interval [t, T]. [...] (6) and (7) has to satisfy the following conditions:

Lemma 3. In the wireless information transmission phase, the optimal uplink transmit power of each sensor i ∈ N in the WPSN satisfies:

Proof. See Appendix A.
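The relations required by Lemma 1 are not reproduced above. For reference, in the standard Hamilton-Jacobi-Bellman (HJB) framework for feedback Nash equilibria [22,23] they take the following form. It is written here under stated assumptions: the instantaneous revenue is ε x_i + R_i(p_i), where R_i is a hypothetical shorthand for the QoS revenue of Equation (4); f_j denotes the right-hand side of sensor j's state dynamics; and the terminal term is discounted by e^{−rT}. It is a sketch of the general form, not a transcription of the paper's equations.

```latex
\begin{aligned}
-\frac{\partial V_i(t,x)}{\partial t}
  &= \max_{p_i}\Big\{\big[\varepsilon x_i + R_i(p_i)\big]e^{-rt}
     + \sum_{j\in N}\frac{\partial V_i(t,x)}{\partial x_j}\,
       f_j\big(t,x,p_i,p^*_{-i}(t,x)\big)\Big\},\\
V_i(T,x) &= e^{-rT}\,\alpha_i\big(x_i - S_i\big),\qquad i\in N,
\end{aligned}
```

The maximum is attained at p_i = p_i^*(t, x) when the other sensors play their equilibrium strategies p_{-i}^*(t, x).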

Analysis of Infinite-Horizon Differential Game
Consider the infinite-horizon autonomous game with constant discounting, in which T approaches infinity and both the objective functions and the state dynamics are autonomous (they do not depend explicitly on time). Now consider the following alternative to game (6) and (7): subject to the deterministic dynamics: Because the game is autonomous, the game starting at time t does not depend on the choice of t itself, only on the state at the starting time, which can be taken as 0. A feedback Nash equilibrium solution of the infinite-horizon autonomous game (14) and (15) can then be characterized as follows.

Lemma 4. An n-tuple of strategies q_i^*(x), for i ∈ N, constitutes a feedback Nash equilibrium solution if there exists a time-independent functional W_i(x) satisfying the following set of partial differential equations for each i ∈ N:

Lemma 5. The optimal uplink power of each wireless sensor, which is the equilibrium strategy of the game, is independent of time and can be expressed as:

Proof. See Appendix B.
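The partial differential equations of Lemma 4 are not reproduced above. Substituting V_i(t, x) = e^{−rt} W_i(x) into finite-horizon HJB relations of the standard form and letting T → ∞ yields the standard stationary form below. It uses assumed notation (instantaneous revenue ε x_i + R_i(q_i), with R_i a hypothetical shorthand for the QoS revenue, and f_j for the right-hand side of sensor j's state dynamics) and is not a transcription of the paper's own equation.

```latex
r\,W_i(x) = \max_{q_i}\Big\{\varepsilon x_i + R_i(q_i)
  + \sum_{j\in N}\frac{\partial W_i(x)}{\partial x_j}\,
    f_j\big(x,q_i,q^*_{-i}(x)\big)\Big\},\qquad i\in N.
```

Because neither W_i nor f_j depends explicitly on t, the maximizing strategy q_i^*(x) depends on the state but not explicitly on time; Lemma 5 states that, for this game, the optimal uplink power is in fact independent of time.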
Lemma 6. The optimal strategy for the infinite-horizon differential game satisfies:

Proof. Substituting the optimal uplink power obtained in Equation (17), which is also the game equilibrium strategy, into the state dynamics yields:

The optimal state trajectory is then obtained by solving the above dynamics and is denoted as:
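The simulations in this paper report that the power level first grows and then converges. A minimal sketch of why, assuming hypothetical linear dynamics dx_i/dt = c_i − µ_i x_i with a constant net input c_i (constant because the equilibrium uplink power is time-independent): with x_i(0) = 0, the trajectory is x_i(t) = (c_i/µ_i)(1 − e^{−µ_i t}), which rises monotonically toward c_i/µ_i at a speed set by µ_i. The code compares this closed form with a forward-Euler integration; the names and parameter values are illustrative.

```python
import math


def closed_form_power_level(t: float, c: float, mu: float) -> float:
    """x(t) = (c / mu) * (1 - exp(-mu t)): solution of the ASSUMED
    dynamics dx/dt = c - mu x with x(0) = 0 and constant net input c."""
    return (c / mu) * (1.0 - math.exp(-mu * t))


def euler_power_level(t_end: float, c: float, mu: float,
                      dt: float = 1e-3) -> float:
    """Forward-Euler integration of the same ODE, as a numerical cross-check."""
    x = 0.0
    for _ in range(round(t_end / dt)):
        x += dt * (c - mu * x)
    return x
```

With the illustrative values c = 0.8 and µ = 0.4, both functions give a level that rises from 0 toward the steady state c/µ = 2.0; at t = 5 the two agree to within 10^-3. A larger µ_i shortens the transient, which is one way to read the fast convergence reported below.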

Uplink Power Control Algorithm
In this subsection, we present an uplink power control algorithm (Algorithm 1) for wireless powered sensor networks, based on the infinite-horizon solutions presented in Section 3.2:

Algorithm 1. The strategy for each sensor to determine the optimal uplink transmit power
1: Initially, each sensor sets its power level x to 0; there is no energy transmission at the beginning of the game.
2: for sensor i ∈ N do
3:     Start the game and initialize the parameters τ_i, µ_i, η_i;
4:     Based on the QoS requirements, set the final rate revenue level S_i;
5:     while x_i > 0 do
6:         Calculate the optimal uplink power based on Equation (17);
7:         Calculate the optimal power-level trajectory based on Equation (20);
8:         Calculate the maximized revenue of each sensor based on Equations (14), (17) and (20);
9:         Update the power level x_i of each sensor;
10:    end while
11: end for

In the above algorithm, each sensor continues to calculate the optimal uplink transmit power until no energy is left in its battery for information transmission.
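Algorithm 1 depends on Equations (17) and (20), which are not reproduced here. The following sketch renders its loop under explicitly hypothetical assumptions: the dynamics dx/dt = E − (1 − τ) q − µ x with E = η p^T |g^T|^2 τ, the QoS revenue ρ log2(1 + γ q), and a linear value function W(x) = A x + B. Under these assumptions the stationary HJB equation gives A = ε/(r + µ), and the first-order condition gives the time-independent power q* = ρ/(A(1 − τ) ln 2) − 1/γ, clipped to [0, p_max]. This stand-in for Equation (17) is our own illustration, not the paper's result. Because the power level starts at 0, the sketch updates the state before testing whether energy remains.

```python
import math
from dataclasses import dataclass


@dataclass
class Sensor:
    """Hypothetical per-sensor parameters (illustrative, not from the paper)."""
    tau: float      # WET fraction of each unit-length block
    mu: float       # energy loss coefficient
    eta: float      # energy conversion efficiency, 0 < eta <= 1
    p_T: float      # downlink transfer power from the H-AP
    g_T: float      # downlink channel power gain |g^T|^2
    gamma: float    # uplink coefficient g / sigma^2
    x: float = 0.0  # power level: no stored energy when the game starts


def stationary_power(s: Sensor, eps: float, rho: float, r: float,
                     p_max: float) -> float:
    """Time-independent uplink power from the assumed linear value function
    W(x) = A x + B with A = eps / (r + mu); clipped to [0, p_max]."""
    A = eps / (r + s.mu)
    q = rho / (A * (1.0 - s.tau) * math.log(2)) - 1.0 / s.gamma
    return min(max(q, 0.0), p_max)


def run_power_control(sensors, eps, rho, r, p_max, dt=0.01, max_steps=10_000):
    """Hypothetical rendering of Algorithm 1. Each sensor fixes its stationary
    power, then repeatedly updates its power level (forward Euler) and
    accumulates discounted revenue (left Riemann sum) until its battery is
    empty or max_steps is reached. The state is updated before the
    emptiness test because every sensor starts at x = 0."""
    results = []
    for s in sensors:
        q = stationary_power(s, eps, rho, r, p_max)
        E = s.eta * s.p_T * s.g_T * s.tau  # harvested energy per block
        revenue = 0.0
        for k in range(max_steps):
            t = k * dt
            inst = eps * s.x + rho * math.log2(1.0 + s.gamma * q)
            revenue += math.exp(-r * t) * inst * dt
            s.x = max(s.x + dt * (E - (1.0 - s.tau) * q - s.mu * s.x), 0.0)
            if s.x == 0.0:
                break  # no energy left for uplink transmission
        results.append((q, s.x, revenue))
    return results
```

Under this model, a sensor with a weak uplink (small γ) ends up with q* = 0, because the −1/γ term dominates; the clipping step handles that case. When the net input E − (1 − τ) q* is positive, the power level settles near (E − (1 − τ) q*)/µ, so the loop runs until max_steps.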

Optimal Power and Revenue
In this section, we evaluate the proposed differential game model through simulations. Simulation results for both the finite-horizon and the infinite-horizon differential game are presented. We assume that the number of sensors in the WPSN is N = 20 and consider the time horizon T = 100. Based on Equation (21), the parameter A_i(t) of the value function V_i(t, x) directly affects the variation of the optimal uplink power. Figure 3 therefore shows how the key parameter A_i(t) varies with time (in seconds). We observe that A_i(t) increases monotonically for sensors 1, 2, and 5 and decreases monotonically for sensor 3. From Equation (27) in Appendix A, the variation of A_i(t) depends on the constant parameter α_i and the energy loss coefficient µ_i, so different sensors exhibit different trends of A_i(t). The optimal uplink transmit power of the sensors under a finite horizon is plotted in Figure 4; it follows the same trend as the parameter A_i(t). Figure 5 shows the optimal uplink transmit power under an infinite horizon, which is constant and independent of time. Figure 6 shows the optimal state trajectories, i.e., the power level of each sensor.
It can be observed that the power level initially grows; however, as time increases, it converges to a steady-state value. In other words, the power-level dynamics are convergent, and convergence is fast. Finally, the revenue of each sensor over time and its maximized revenue are evaluated and shown in Figures 7 and 8.
(Figure: Variation of A(t); legend WN1 to WN20.)
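The text attributes the differing trends of A_i(t) to α_i and µ_i, but Equation (27) is not reproduced here. As a hedged illustration only, assume the dynamics dx/dt = E − (1 − τ) p − µ x, energy revenue ε x, terminal term α (x(T) − S), and a value function V(t, x) = e^{−rt}(A(t) x + B(t)). Matching the x terms of the HJB equation gives dA/dt = (r + µ)A − ε with A(T) = α, so A(t) = ε/(r + µ) + (α − ε/(r + µ)) e^{(r+µ)(t−T)}. Under this assumed model, A(t) increases over time when α > ε/(r + µ) and decreases when α < ε/(r + µ), which is one way different α_i and µ_i can produce the different trends in Figure 3. This A(t) is our own current-value coefficient and need not coincide with the paper's A_i(t); the function names are hypothetical.

```python
import math


def value_coefficient(t: float, T: float, alpha: float, eps: float,
                      mu: float, r: float) -> float:
    """Current-value coefficient A(t) of V(t, x) = e^{-rt} (A(t) x + B(t))
    under the ASSUMED model; solves dA/dt = (r + mu) A - eps, A(T) = alpha."""
    k = r + mu
    a_inf = eps / k  # level A(t) approaches far before the horizon T
    return a_inf + (alpha - a_inf) * math.exp(k * (t - T))


def coefficient_trend(alpha: float, eps: float, mu: float, r: float) -> str:
    """Direction in which A(t) moves as t runs from 0 to T."""
    a_inf = eps / (r + mu)
    if alpha > a_inf:
        return "increasing"
    if alpha < a_inf:
        return "decreasing"
    return "constant"
```

For instance, with ε = 1 and r + µ = 1, a sensor with α = 3 has an increasing A(t), while one with α = 0.2 has a decreasing A(t).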

Residual Energy
In this section, we compare the proposed differential game (DG) algorithm with the Nash bargaining game (NBG) algorithm in [14], which is also a game theory-based power control method for WPSN. Both simulations use the same information transmission power and the same parameter configuration. The residual energy of sensors 1 to 4 is shown in Figure 9. Each sensor needs residual energy to handle information transmission tasks. As time increases, the residual energy of the sensors under our algorithm increases and rapidly converges to a stable level, whereas the residual energy of the sensors under the NBG algorithm remains unchanged. Figure 9 also shows that the residual energy under our algorithm is higher than that under the NBG algorithm, so wireless sensors have more energy available for information transmission under our algorithm.

QoS Revenue
According to the QoS revenue function in Equation (4), the QoS revenue is simulated, and the comparison between our DG algorithm and the NBG algorithm is shown in Figure 10. All sensors are included in the simulations. As time increases, the QoS revenue under the DG algorithm increases, because the QoS revenue is directly proportional to the energy level. Although the QoS revenue under the DG algorithm is lower than that under the NBG algorithm, it increases quickly, whereas the QoS revenue under the NBG algorithm remains constant. Overall, our algorithm shows better performance than the NBG algorithm.

Conclusions
In this paper, we study the uplink transmit power control problem in wireless powered sensor networks. We propose a non-cooperative differential game model to analyze the optimal transmit power of energy harvesting sensors. In the game, each sensor determines its uplink transmit power to maximize a utility combining energy revenue and QoS revenue over a time horizon. Using Bellman dynamic programming, we obtain the Nash equilibrium (NE) solutions for both the finite-horizon and the infinite-horizon cases. When all sensors reach the NE, the optimal trajectory of the power level can be derived and the maximized revenue can be obtained. The correctness and convergence of the proposed algorithm are verified through numerical simulations.
In future work, we will attempt to combine the power control problem with the time scheduling problem in order to analyse the influence of buffer size in our model, which is more practical given limited network resources. We will then determine how optimal power control can be achieved under an appropriate MAC algorithm, so that the overall revenue can be maximized.