1. Introduction
With the development of the Internet of Things (IoT), everyday objects are gaining the ability to perceive their environment. Through the deep integration of the physical world and the digital world [1], the IoT revolutionizes the connection and interaction among all objects. With the IoT, physical equipment and the physical world are no longer inert but form a living community. In addition, wireless sensor networks (WSNs), as the nerve endings of the IoT, have attracted extensive research interest in recent years [2], and both their theory and their applications have received increasing attention [3].
WSNs are composed of a series of sensor nodes. Generally, the sensor nodes communicate over a common frequency band. However, with the development of wireless communication, this band is increasingly crowded, and communication between sensor nodes suffers interference not only from other nodes of the same network but also, increasingly seriously and uncontrollably, from networks serving other applications. As the number of interconnected devices grows rapidly, sensor nodes in areas where the coverage of multiple applications overlaps will suffer more severe interference. Over the past few years, cognitive radio networks have been proposed as an effective way to alleviate the spectrum scarcity problem [4]. Cognitive radio technology is widely regarded as an effective means of improving the utilization of spectrum resources and has been studied extensively [5].
By equipping a sensor node with a cognitive radio module, the node can detect the state of the licensed spectrum and opportunistically use idle licensed spectrum for data transmission [6]. A wireless network of sensor nodes with cognitive radio modules is known as a cognitive radio wireless sensor network (CRWSN), which is no longer subject to signal interference from the public frequency band. Although CRWSNs avoid the transmission interference caused by a lack of spectrum, their nodes, which are generally powered by difficult-to-replace batteries, must consume additional energy to implement cognitive radio functions such as spectrum detection and spectrum switching. For sensor nodes, the problem of insufficient energy therefore becomes more severe, and compared with traditional wireless sensor networks, the network lifetime problem of CRWSNs caused by insufficient node energy becomes more urgent [7]. To overcome the lack of node energy and ensure that the network can continue to operate effectively, sensor nodes have begun to adopt energy-harvesting technology. With energy harvesting, sensor nodes can collect ambient energy such as solar, wind, and vibration energy [8,9]. Because the nodes continuously obtain energy from the surrounding environment, the lifetime of the sensor network can be effectively extended [10].
Cognitive radio wireless sensor networks with energy harvesting can achieve a continuous supply of both spectrum and energy resources, making up for the shortcomings of traditional sensor networks [11]. Compared with traditional sensor networks, CRWSNs with energy harvesting have obvious advantages and great development potential, which brings new opportunities to the development of sensor networks and lays a foundation for the development of the Internet of Things. However, both energy and spectrum resources in CRWSNs are dynamic and difficult to predict. A sensor node consumes energy to perform spectrum detection and thereby obtain transmission opportunities. If its residual energy is sufficient, the node can use such an opportunity to transmit the collected data; otherwise, even if a transmission opportunity exists, the node does not have enough energy and can perform neither data transmission nor data collection.
Therefore, resource allocation in CRWSNs with energy harvesting needs to consider both the dynamic energy-harvesting process and the randomly available spectrum resources. Researchers have made some progress on these problems. Firstly, there are three popular resource allocation schemes [12] in the field of cognitive radio sensor networks: centralized, cluster-based, and distributed. In the centralized scheme, the central node of the cognitive radio network acts as the network backbone; it can control the power of the other sensing nodes in the network and makes the resource allocation decisions. Centralized schemes can be found in the literature [13,14,15,16], where efficient decision making is used to achieve multiple optimization goals, such as improving energy efficiency and maximizing throughput. However, transmitting information over the control channel requires higher power, and the network is susceptible to central node failure. In the cluster-based scheme, the network is divided into multiple clusters, and each cluster has one cluster head and multiple intracluster nodes. After the cluster head aggregates the information, it makes the resource allocation decision and communicates it to each node. A number of cluster-based approaches have been adopted in the literature [17,18,19,20]. Because each cluster has fewer members, the signaling overhead for each cluster head is reduced, but if the cluster head node fails, the connections to neighboring nodes in the cluster may become invalid. In the distributed scheme, each node makes decisions either autonomously or in cooperation with neighboring nodes. In a time-varying environment, this makes it possible to adapt quickly to node failures and ensure robustness. However, decisions in this scheme are based on local information and are susceptible to interference and malicious attacks, which can result in inefficient solutions [21,22,23].
Additionally, much productive work has been done on the resource allocation problem in cognitive radio networks with energy harvesting. Previous work focused mainly on optimal spectrum sensing algorithms to improve spectrum utilization efficiency [24]. Other studies focused on transmission strategies, aiming to improve the efficiency of energy use and choosing different channel models to improve the efficiency of energy collection [25,26]. Recently, energy-harvesting cognitive radio networks based on spectrum sharing have become a significant research direction. A novel energy cooperation transmission scheme was proposed for a cognitive spectrum-sharing-based D2D communication system in [27]. Li et al. [28] considered a spectrum-sharing scheme based on simultaneous wireless information and power transfer (SWIPT). Zhang et al. [29] proposed a new cooperative spectrum-sharing protocol with dynamic time-slot allocation based on energy harvesting. However, none of the related works mentioned above takes into account resource allocation in CRWSNs with energy harvesting.
In CRWSNs with energy harvesting, the wireless sensors can be considered as the secondary users (SUs), and the base station or infrastructure can be seen as the primary user (PU). The SUs can obtain spectrum from the PU. Before information transmission, the SUs should harvest enough energy for the transmission. Then, when the channels are not occupied by the PU, the SUs use the harvested energy to transmit information. The spectrum available for information transmission mainly depends on the spectrum leased from the PU. The SUs should control their spectrum requirements to lower the spectrum lease cost while completing their information transmission tasks. In this paper, we investigate the optimal resource allocation problem for the PU and the SUs based on a differential game, in which the SUs control their spectrum requirements to minimize their cost.
In this paper, we propose a differential game-based formulation of the resource allocation problem in CRWSNs. The system state of the proposed CRWSN is the capacity of the spectrum resource that the PU wants to lease. All the wireless sensors, which can be seen as SUs, control the resource level leased from the PU to minimize their cost during information transmission. We obtain open loop and feedback Nash equilibria for the SUs. Numerical results are given to verify the differential game analysis. The remainder of the paper is organized as follows. Section 2 presents the system model and the differential game formulation. Section 3 gives the game analysis in three parts: the open loop Nash equilibrium, the feedback Nash equilibrium under the finite horizon, and the feedback Nash equilibrium under the infinite horizon. Section 4 presents the numerical simulations and their analysis. Finally, Section 5 concludes the paper and summarizes the main work.
3. Game Analysis
In this section, we seek the equilibrium solution to the proposed differential game. Based on the spectrum price controlled by the PU, the SUs should control their spectrum requirements to minimize the overall cost given by Equations (3) and (4). Firstly, assuming that the initial state of the spectrum capacity is known by all the SUs, we find the open loop Nash equilibrium for each SU. Then, the feedback solutions to the proposed game are discussed based on Bellman dynamic programming for the case in which the game players know the exact system state at time instant t.
3.1. Open Loop Nash Equilibrium
First, we discuss the open loop solutions. In order to get the open loop equilibrium solutions, some definitions are needed as follows.
Definition 1. The allocated spectrum resource from PU to SU is optimal under the open loop condition if the following inequality holds for all control variables in the set of admissible spectrum ,
Definition 2. The Hamiltonian function of each SU in the proposed differential game in the time period can be given as follows:
where is the costate function and is given by:
Theorem 1. The allocated spectrum resource provides an open loop Nash equilibrium to the proposed resource allocation game in Equations (3) and (4) if there are costate functions , satisfying the following equations:
Considering the optimal control problem given by Equations (3) and (4), and based on Pontryagin’s maximum principle, we can obtain the Nash equilibrium solutions of the optimal resource allocation problem for each SU.
Theorem 2. The optimal resource allocation strategy of SU is given by:
where are the solutions to the following Riccati equation:
In Equation (9), the allocated spectrum resource for SU is a linear function of the system state and is affected by the unit spectrum price controlled by the PU. Each SU should decide how much spectrum to request from the PU based on the currently available system capacity and should consider the influence of the unit resource price.
Proof. For the open loop equilibrium, Pontryagin’s maximum principle can be used as the necessary condition to find the optimal strategies. The Hamiltonian function of the SU is given by Equation (6), and taking the derivative of the Hamiltonian function yields:
which is the optimal strategy for the SUs under the open loop condition. Based on Equation (12), we can allocate the spectrum resource to the SUs over the duration of the proposed game. □
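Theorem 2 expresses the open loop strategy as a linear function of the system state, with coefficients obtained from a Riccati equation. As a rough illustration of how such a coefficient can be computed numerically, the following minimal sketch integrates a generic scalar Riccati equation backward from the terminal time with a classical fourth-order Runge-Kutta scheme. The dynamics dx/dt = a x + b u, the cost weights q and r, the zero terminal weight, and all numerical values are assumptions taken from the textbook linear-quadratic problem; they are not the model of Equations (3) and (4) and are not the Riccati equation of Theorem 2.

```python
import math


def riccati_rhs(p, a, b, q, r):
    """Right-hand side of dP/ds in reversed time s = T - t.

    Assumed textbook scalar LQ problem (not the paper's model):
    dx/dt = a*x + b*u, running cost q*x^2 + r*u^2.
    In forward time the equation is dP/dt = -2*a*P + (b^2/r)*P^2 - q.
    """
    return 2.0 * a * p - (b * b / r) * p * p + q


def solve_riccati_backward(a, b, q, r, p_terminal, horizon, steps):
    """Integrate from t = T down to t = 0 with classical RK4.

    Returns a list P with P[k] approximating P(t_k), t_k = k * horizon / steps.
    """
    h = horizon / steps
    p = p_terminal
    values = [p]
    for _ in range(steps):
        k1 = riccati_rhs(p, a, b, q, r)
        k2 = riccati_rhs(p + 0.5 * h * k1, a, b, q, r)
        k3 = riccati_rhs(p + 0.5 * h * k2, a, b, q, r)
        k4 = riccati_rhs(p + h * k3, a, b, q, r)
        p += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        values.append(p)
    values.reverse()  # values[0] is P(0), values[-1] is P(T)
    return values


def feedback_gain(p, b, r):
    """Gain k of the linear strategy u = -k * x in the assumed LQ problem."""
    return b * p / r


# Hypothetical parameter values, chosen only for illustration.
A, B, Q, R = -0.5, 1.0, 1.0, 1.0
P = solve_riccati_backward(A, B, Q, R, p_terminal=0.0, horizon=20.0, steps=2000)

# Positive root of the algebraic Riccati equation, used as a sanity check:
# far from the terminal time, P(t) should be close to this value.
P_STEADY = R * (A + math.sqrt(A * A + B * B * Q / R)) / (B * B)
```

Far from the terminal time, the computed coefficient P[0] should sit close to P_STEADY, which gives a quick check that the backward integration behaves as expected before the gain is used in a strategy of the form u = -k x.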
3.2. Feedback Nash Equilibrium under Finite Horizon
In the open loop Nash equilibrium, the current system state is not known by the game players; they know only the initial system state. The SUs therefore decide on the optimal allocated resource based on the time instant and the initial system state. Next, we seek the feedback strategies that apply when the system state is known by the SUs. The optimal solutions to the proposed game under the feedback condition depend on the current time and the current system state. In order to obtain the feedback solutions, some definitions are needed.
Definition 3. The allocated spectrum resource from PU to SU is optimal under the feedback condition if the following inequality holds for all control variables in the set of admissible spectrum :
Theorem 3. The allocated spectrum resource provides a feedback Nash equilibrium to the proposed resource allocation game in Equations (3) and (4) if there are continuously differentiable functions , satisfying the following set of partial differential equations:
where is the game equilibrium payoff of SU at time with the system state being , which is called the value function of SU in the proposed resource allocation problem.
Definition 4. The value function of each SU under feedback control can be given as follows:
and satisfies the boundary condition:
Given another resource allocation strategy , with the corresponding system state , we have the following equations:
and:
Integrating the above expressions from to , we obtain:
Performing the indicated minimization in Equations (14) and (15) yields:
Substituting Equation (18) into Equations (16) and (17) and solving Equations (16) and (17), one obtains:
where , and satisfy the following differential equations:
Theorem 4. The optimal resource allocation strategy of SU under the feedback control situation is given by:
where and are the solutions of the differential equations given by Equations (20)–(22).
3.3. Feedback Nash Equilibrium under Infinite Horizon
We now turn to the infinite-horizon autonomous version of the proposed game with a constant discount factor. In this subsection, the time horizon of the differential game is infinite. The objective function and system state function given by Equations (3) and (4) are both non-autonomous. The game problem in Equations (3) and (4) is recast as an infinite-horizon problem as follows:
subject to:
Under the infinite horizon, the solutions are independent of the time instant and depend only on the system state at the starting time. A feedback solution for the infinite horizon game in Equations (24) and (25) can be characterized as follows.
Theorem 5. The allocated spectrum resource provides a feedback Nash equilibrium to the proposed resource allocation game in Equations (24) and (25) if there are functions , satisfying the following set of partial differential equations:
Theorem 6. The optimal resource allocation strategy of SU is given by:
Proof. Performing the indicated minimization in Equation (27), we can obtain:
Incorporating the solution into Equations (24) and (25), and solving the equations, we can obtain:
where , and satisfy the following equations:
□
4. Numerical Simulations
In this section, a series of numerical simulation experiments is carried out in MATLAB (version R2016a) to show how each SU’s optimal leased-resource strategy changes over time under the differential game model formulated and solved above. We analyze both the open loop and the feedback Nash equilibrium solutions for the spectrum band resource leased by the SUs. For both the open loop and feedback situations, three SUs are used in the simulation environment to show the dynamic evolution of the system strategy under the proposed differential game model. All the simulation parameters used in the experiments are shown in Table 1. Some parameters (the discount rate r, the unit price at which the PU leases its spectrum resource, and the spectrum loss rate during the process of spectrum leasing) are the same for the three users; this special scenario is considered to simplify the simulations.
Firstly, we analyze the open loop solution and the feedback solution of the differential game model formulated in this paper. Section 3 gives the expressions of the optimal resource allocation strategy of each SU under the open loop Nash equilibrium and the feedback Nash equilibrium. From these expressions and the parameter settings, we obtain the strategy curves in Figure 3a and Figure 4a. In both the open loop and the feedback case, the resource allocation strategy decreases gradually over time and stabilizes at a certain value. This agrees with the physical meaning of the strategy, which represents the spectrum resource that SU i leases from the PU at time t. In the information transmission stage, the SUs use the leased spectrum and the harvested energy to complete certain information transmission tasks. As time goes by, some transmission tasks tend to finish or stabilize, so fewer spectrum resources are needed than at earlier moments and the spectrum leased from the PU decreases gradually. Once the transmission tasks are finished or stable, only a constant amount of spectrum is needed to maintain the transmission status, so the curve settles near a certain value.
Secondly, we analyze the variation of x(t) over time under the optimal resource allocation strategy in both the open loop and feedback situations. From Equation (4), x(t) represents the spectrum resource capacity that the PU wants to lease out at time t, and the variation of x(t) depends not only on the optimal strategy but also on x(t) itself. Using Equation (4), the optimal strategy, and the parameter settings, we obtain Figure 3b, which shows the change of x(t) with time under the open loop solution, and Figure 4b, which shows the change of x(t) with time under the feedback solution. In both figures, x(t) gradually decreases over time and stabilizes at a certain value. This is consistent with two observations. On the one hand, as the information transmission tasks near completion, the spectrum that the SUs want to lease from the PU may be reduced, so the PU may lower the resource it offers for lease in order to maximize resource utilization. On the other hand, long-term leasing may reduce the spectrum available to the PU itself and thus its work efficiency; therefore, the spectrum that the PU leases out may decrease over time. The optimal solutions in the feedback situation under an infinite horizon are also analyzed and are given in Figure 5. The results in Figure 5 show that the system state varies with time and that the optimal solutions vary accordingly, so that more optimal solutions are available for the users to choose from.
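The qualitative behaviour described in this section, with both the leased spectrum and the system state decreasing and then settling at a constant level, can be reproduced with a very small simulation. The sketch below is written in Python rather than MATLAB and rests entirely on assumptions: the state equation dx/dt = inflow - loss_rate * x - (sum of leased spectrum), the linear feedback strategies u_i = k_i * x, and every numerical value are hypothetical stand-ins, not Equation (4), not the equilibrium strategies of Section 3, and not the parameters of Table 1.

```python
def simulate_state(x0, gains, inflow, loss_rate, horizon, steps):
    """Forward-Euler simulation under linear feedback strategies u_i = k_i * x.

    Assumed dynamics (a hypothetical stand-in for the state equation):
        dx/dt = inflow - loss_rate * x - sum_i u_i(t)
    Returns (times, states, strategies) with strategies[i][k] = u_i at step k.
    """
    h = horizon / steps
    x = x0
    times, states = [0.0], [x]
    strategies = [[k * x] for k in gains]
    for step in range(1, steps + 1):
        leased = sum(k * x for k in gains)  # total spectrum leased by the SUs
        x = x + h * (inflow - loss_rate * x - leased)
        times.append(step * h)
        states.append(x)
        for i, k in enumerate(gains):
            strategies[i].append(k * x)
    return times, states, strategies


# Three SUs with hypothetical feedback gains; all values are illustrative only.
GAINS = [0.10, 0.15, 0.20]
X0, INFLOW, LOSS_RATE = 10.0, 1.0, 0.05
times, states, strategies = simulate_state(
    X0, GAINS, INFLOW, LOSS_RATE, horizon=40.0, steps=4000
)

# Steady state of the assumed dynamics: inflow balances loss plus leasing.
X_STEADY = INFLOW / (LOSS_RATE + sum(GAINS))
```

Under these assumptions the state starts above its steady level and decays monotonically toward X_STEADY, and each strategy follows the same shape scaled by its gain, which is the kind of trend the text reports for Figures 3 and 4. Plotting `states` and `strategies` against `times` with any plotting tool gives curves that can be compared with that description.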