DDPG-Based Throughput Optimization with AoI Constraint in Ambient Backscatter-Assisted Overlay CRN

Jia, Xueli; Zheng, Kechen; Chi, Kaikai; Liu, Xiaoying

doi:10.3390/s22093262

School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China

* Author to whom correspondence should be addressed.

Sensors 2022, 22(9), 3262; https://doi.org/10.3390/s22093262

Submission received: 4 April 2022 / Revised: 18 April 2022 / Accepted: 22 April 2022 / Published: 24 April 2022

(This article belongs to the Section Communications)

Abstract

The combination of ambient backscatter (AB) communications (ABCs) and RF-powered cognitive radio networks (CRNs) addresses the challenges of both energy supply and spectrum shortage, and improves network performance. With the expansion of wireless networks, many applications require both high throughput and timely data. Driven by these facts, we study the long-term throughput optimization of the secondary network in the AB-assisted overlay CRN (ABO-CRN), ABCs, and CRNs under an age of information (AoI) constraint, where AoI is a novel metric for measuring the freshness of data received by receivers. Because the environment is dynamic, complete knowledge of the environment cannot be obtained. Therefore, the deep deterministic policy gradient (DDPG), a deep reinforcement learning (DRL) method that addresses decision issues in both continuous and discrete spaces, is deployed to address the throughput optimization. We consider the impacts of time and energy allocation on the reward when the AoI constraint cannot be satisfied, and develop the corresponding reward functions. Furthermore, we analyze the impacts of the minimum throughput requirement and maximum allowable AoI on the throughput and AoI of the secondary networks in the ABO-CRN, ABCs, and CRNs. We compare the throughput optimization scheme under the AoI constraint with two baseline schemes, i.e., the throughput-optimal (T-O) and AoI-optimal (A-O) baseline schemes. The simulation results show that the throughput of the ABO-CRN is close to the optimal throughput of the T-O baseline scheme, and the AoI of the ABO-CRN is close to the optimal AoI of the A-O baseline scheme.

Keywords: cognitive radio networks; ambient backscatter; age of information; DDPG

1. Introduction

The number of wireless devices has increased year by year, and the demand for spectrum has grown with it [1]. However, the amount of spectrum is limited, and the majority of spectrum has been allocated to licensed users (primary users, PUs) as the licensed spectrum. PUs occupy the licensed spectrum only at certain times and places, and the licensed spectrum may remain unoccupied for long periods. In this case, the utilization ratio of the licensed spectrum is low [2]. In addition, due to their limited size, wireless devices cannot carry large-capacity batteries, and frequent replacement of batteries is not feasible under certain circumstances, such as inside chimneys and inside bodies [3]. Therefore, the issues of spectrum shortage and energy supply have attracted a large number of researchers. The RF-powered ambient backscatter (AB)-assisted cognitive radio networks (CRNs) (AB-CRNs) are introduced to improve the spectrum utilization ratio and alleviate the difficulty of energy supply. We introduce the RF-powered AB-CRNs from the following three aspects: the energy harvesting technology, the CRNs, and the AB communications (ABCs).

Energy harvesting technology, which allows wireless devices to scavenge energy from environmental energy sources, is an efficient way to address the difficulty of energy supply [2,4]. The energy sources include solar, wind, heat, and RF signals. Some energy sources are unstable and only conditionally available. Taking solar energy as an example, harvesting depends on weather conditions and is only possible during the daytime. In addition, solar-powered devices are bulky and costly [5], and most wireless devices cannot carry large solar panels. In contrast, harvesting energy from RF signals offers several advantages, i.e., stability, ubiquity, and controllability [6], and has been applied in a variety of wireless network fields. In [7], Yang et al. investigated the application of energy harvesting technology in wireless body area networks. Chi et al. [8] aimed to minimize the RF energy provision of time-division multiple access (TDMA) and non-orthogonal multiple access (NOMA) schemes in wireless powered communication networks. In [9], Hoang et al. explored security and energy harvesting requirements in MIMO-OFDM networks. The authors of [10,11] applied energy harvesting technology in wireless sensor networks (WSNs). Azarhava et al. [10] studied the energy harvesting WSN with a TDMA scheme. Ghosh et al. [11] maximized the energy efficiency by proposing an upper confidence bound-based algorithm.

Cognitive radio (CR) technology is introduced to deal with the issue of spectrum shortage. Secondary users (SUs), i.e., cognitive users, are allowed to access the licensed spectrum opportunistically without causing harmful interference to the PUs, which improves the spectrum utilization ratio [12]. SUs are able to execute various transmission modes in the CRNs, such as overlay mode and underlay mode. When secondary transmitters (STs) transmit data in overlay mode, they exclusively occupy the licensed spectrum while PUs are not using it, and immediately vacate it when PUs begin to transmit. When STs transmit data in underlay mode, they can access the licensed spectrum even while it is used by PUs. However, the secondary transmission should not lower the transmission quality of PUs, hence the transmit power of STs cannot exceed the power threshold that PUs tolerate [13]. RF-powered CRNs allow SUs to harvest energy from RF signals, and to access the licensed spectrum following various transmission modes. As a result, RF-powered CRNs are able to alleviate both spectrum shortage and energy insufficiency, and have been extensively studied. In [14], Zheng et al. divided a unit circular area into three concentric regions, namely the overlay region, the underlay region, and the energy harvesting region, and examined spectrum-limited and energy-limited scenarios. In the cooperative CRN, Papadopoulos et al. [15] enhanced the overall throughput performance with two network coding algorithms. In [16], Rathee et al. considered security threats in the CRNs, and proposed a secure hand-off mechanism against the emulation attack of cognitive users.

Although the RF-powered CRNs perform well in addressing spectrum shortage, low spectrum utilization, and energy supply difficulties, whether the transmission performance of the CRNs can be improved depends on the energy harvested from the RF signals of primary users and on the spectrum usage [17]. For example, if an ST harvests little energy, or its energy storage capacity is too small to store sufficient energy, the ST lacks energy to transmit data or spends much time harvesting energy, which degrades the throughput performance of the secondary network. Furthermore, if the PU occupies the licensed spectrum with high probability, the ST has few opportunities to transmit data. Driven by these facts, AB technology is proposed to alleviate the difficulties of both spectrum shortage and energy insufficiency. AB mode transmission is performed by using ambient RF signals, such as TV signals, and does not need extra spectrum. After receiving the RF signal, a device with AB functionality maps the data to a series of reflective states by adjusting the load impedance of the antenna, and then reflects the RF signals to the receiver according to the matching degree between the antenna impedance and the load impedance. The operation of reflection consumes little energy [18]. Ye et al. [19] investigated the ABCs with multiple channel links, and minimized the outage probability. Liu et al. [20] utilized coherent and non-coherent orthogonal space-time block codes to improve the backscatter efficiency in ABCs with multiple antennas. In [21], Madavani et al. studied a full-duplex ABC, and optimized the throughput of the minimum AB device and the overall throughput.

However, the throughput of AB mode transmission is relatively low, especially when the primary RF signals are weak. Considering the characteristics of CR technology and AB technology, the combination of AB technology and RF-powered CRNs is potentially helpful for dealing with spectrum shortage and energy insufficiency. The RF-powered AB-CRNs are energy-saving and spectrum-saving [22]. As far as we know, Hoang et al. [22] first introduced the RF-powered AB-CRNs, and analyzed the throughput of the AB-assisted overlay CRN (ABO-CRN) and the AB-assisted underlay CRN (ABU-CRN). Extending [22], Zhuang et al. [23] studied the RF-powered AB-CR-NOMA networks, where STs perform underlay mode transmission. In [24], Zhu et al. investigated the distributed resource allocation in AB-CRNs.

In [22,23], the authors investigated popular metrics, such as throughput. Besides, the age of information (AoI), a novel metric that measures the freshness of data received at the receiver, has also been extensively studied in recent years [25]. Different from the delay metric, AoI focuses on the data timeliness of the receiver, while the delay metric focuses on that of the transmitter. The authors of [26,27,28] investigated the AoI minimization in the CRNs. In [26], Leng et al. utilized the partially observable Markov decision process to analyze the AoI performance in several cases. In [27], Gu et al. studied the AoI in overlay and underlay scenarios, and analyzed the effect of the critical generation rate of the primary IoT on the secondary IoT. In [28], Wang et al. took the collision constraint into account to minimize the long-term average AoI. AoI minimization in ABCs is also an important research topic. In [29], Abbas et al. focused on the minimization of AoI in backscatter communications, and introduced several algorithms for the AoI minimization.

With the expansion of wireless networks, timely delivery is required [30], and in scenarios such as cyber-physical systems, low-quality but timely data is useless [31]. However, throughput optimization cannot guarantee the freshness of data, and AoI optimization cannot guarantee the quality of data. Liu et al. [32] investigated the AoI minimization under throughput requirements in the multi-path network. Kadota et al. [33] proposed a low-complexity scheduling algorithm for the AoI minimization with throughput constraints in the wireless network. Bhat et al. [34] studied the throughput maximization under the AoI constraint in fading channels.

Compared with the optimization of a single frame, long-term optimization is more practical. In practice, the network environment is dynamic, and the channel quality, such as the channel gain, varies from frame to frame. The optimization of a single frame is limited by the channel quality of that frame, and affects the optimization of the subsequent frames. Short-term optimization ignores the connection between the optimization of the current frame and that of subsequent frames, which degrades the performance of the network. In contrast, long-term optimization takes this connection into account, and provides more practical decisions to enhance the performance of the network. Take the throughput optimization of a single frame by energy management as an example: the throughput increases with the consumed energy, hence consuming all the available energy of the frame is optimal for short-term optimization. However, consuming all the available energy in a frame with poor channel quality leaves no energy for a later frame with good channel quality. Therefore, long-term optimization is more practical than short-term optimization. Due to the dynamic and uncertain network parameters, complete knowledge about the network cannot be obtained in advance. Some traditional methods are incapable of addressing problems with too many dynamic and unpredictable environmental parameters. Deep reinforcement learning (DRL) has proved to be an effective way to tackle this challenge [35,36]. When applying value-based DRL methods, the action space has to be discrete. If the discretization is improper for the scenario, vital information may be lost, or the dimension of the action space may become too large [37]. Different from value-based DRL methods, DRL methods based on the policy gradient are able to deal with continuous spaces, and have no need to discretize the action space. Policy gradient-based DRL methods have been applied in the field of wireless networks [38].

Taking the limitations summarized in Table 1 into account, we state the novelties and contributions as follows. Considering the problems of spectrum shortage and energy insufficiency, we focus on AB-CRNs. The majority of studies on AB-CRNs evaluated network performance metrics such as throughput, energy consumption, and delay, and ignored the data freshness at the secondary receivers (SRs). Driven by this fact, we optimize the long-term throughput of the secondary network in the ABO-CRN under the AoI constraint, in order to guarantee both the high throughput and the data freshness of the secondary network. To the best of our knowledge, we are the first to study the optimization of both throughput and AoI in the research area of AB-CRNs. The main contributions are summarized as follows.

In order to achieve the long-term throughput optimization of the secondary network under the AoI constraint, we utilize the deep deterministic policy gradient (DDPG), a DRL method based on the policy gradient, to find the optimal policy for jointly managing the time and energy of STs. Considering the impacts of time and energy allocation on the reward when the AoI constraint cannot be satisfied, we develop the corresponding reward functions with respect to the channel states.
We analyze how the minimum throughput requirement and the maximum allowable AoI relate to the throughput and AoI performances in the ABO-CRN, ABCs, and CRNs.
We introduce throughput-optimal (T-O) and AoI-optimal (A-O) baseline schemes as comparisons for the throughput optimization with the AoI constraint. The simulation results show that the throughput of the ABO-CRN is close to the optimal throughput of the T-O baseline scheme, and the AoI of the ABO-CRN is close to the optimal AoI of the A-O baseline scheme.
We evaluate the impacts of the minimum throughput requirement and maximum allowable AoI on the throughput and AoI performances of the secondary networks in the ABO-CRN, ABCs, and CRNs, and demonstrate that the ABO-CRN improves the throughput and AoI performances of the ABCs and CRNs.

The remainder of this paper is organized as follows. In Section 2, we introduce the network model and operations of STs in the ABO-CRN, ABCs, and CRNs. In Section 3, we introduce the problem formulation, such as throughput and AoI definitions. In Section 4, we utilize DDPG to find the optimal policy for jointly managing time and energy of STs. In Section 5, simulation results are shown. In Section 6, we conclude the paper.

2. System Model

In this section, we first depict the structures and channel models of the ABO-CRN, ABCs, and CRNs, and then introduce the network models of the ABO-CRN, ABCs, and CRNs, respectively.

2.1. Structures and Channel Models

The ABO-CRN, ABCs, and CRNs are each composed of a primary network and a secondary network. The secondary network consists of an SR and $n + 1$ STs, $n \in \{0, 1, \dots\}$. In the primary network, the primary transmitter (PT) utilizes the licensed channel to transmit data. The probability that the PT occupies the channel in each frame, denoted by $P_{a}$, can be obtained through long-term observation. When the PT transmits data in frame $t \in \{1, 2, \dots, K\}$, the channel state is active, denoted by $s_{t}^{a} = 1$. When the PT does not transmit data in frame $t$, the channel state is inactive, denoted by $s_{t}^{a} = 0$.

In the secondary network, following the random distribution of the SUs in [39], the SR and STs are randomly placed within the coverage of the primary RF signals from the PT, as shown in Figure 1a,c. Each ST is equipped with a single antenna and a rechargeable capacitor with finite capacity $E$. The SR is equipped with a single antenna and a wired energy source, hence there is no need to consider the energy supply of the SR. In order to measure the freshness of the data that the SR receives from the STs, the SR is able to record the AoI of the data from each ST. Similar to [40], the SR plays the role of a central controller that manages the time and energy of the STs. At the very beginning of each frame, the SR senses the channel, and then provides the allocation of time and energy to the STs.

In the considered scenario, frames of equal duration are successive. The frame duration is synchronized with the primary network, and without loss of generality, we normalize the frame duration to 1 [2]. As shown in Figure 1b,d, Figure 2b and Figure 3b,d, each frame consists of one or more slots. The duration of each slot is determined by the SR. We consider that the channel state remains unchanged within one frame, but may vary from frame to frame. When $s_{t}^{a} = 1$, the SR and STs receive stable and continuous RF signals from the PT. In frame $t$, the channel gain between the PT and ST$_{i}$, $i = 0, 1, \dots, n$, is denoted by $g_{t, i}$, and that between ST$_{i}$ and the SR is denoted by $h_{t, i}$. The Rayleigh distribution [41] is used to model the channel gains, which remain unchanged within one frame, and the channel noise is modeled as additive white Gaussian noise (AWGN) with variance $\delta^{2}$.
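To make the per-frame randomness concrete, the following is a minimal sketch of how one frame of the channel model could be sampled: a Bernoulli channel state with probability $P_{a}$ and Rayleigh-distributed gains for each ST. The function names (`sample_rayleigh`, `sample_frame`) and the unit Rayleigh scale are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def sample_rayleigh(rng, scale=1.0):
    """Draw one Rayleigh sample by inverse-transform sampling.

    `scale` is the Rayleigh scale parameter (an assumed value here).
    """
    u = rng.random()  # uniform in [0, 1)
    return scale * math.sqrt(-2.0 * math.log(1.0 - u))


def sample_frame(rng, num_sts, p_active, scale=1.0):
    """Sample the channel state and channel gains of a single frame.

    Returns (s_active, g, h), where s_active is 1 with probability
    p_active (the PT occupies the channel), g[i] is the PT -> ST_i gain
    and h[i] is the ST_i -> SR gain. All values stay fixed for the frame.
    """
    s_active = 1 if rng.random() < p_active else 0
    g = [sample_rayleigh(rng, scale) for _ in range(num_sts)]
    h = [sample_rayleigh(rng, scale) for _ in range(num_sts)]
    return s_active, g, h


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    state, g, h = sample_frame(rng, num_sts=3, p_active=0.6)
    print(state, g, h)
```

A fresh call to `sample_frame` per frame reflects the assumption that the state and gains are constant within a frame and redrawn for the next one.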

2.2. Network Models

2.2.1. Network Model of ABO-CRN

The operations executed by the STs in the ABO-CRN depend on the value of $s_{t}^{a}$. When $s_{t}^{a} = 1$, as shown in Figure 1a,b, STs execute AB mode transmission following the TDMA scheme, and ST$_{i}$ harvests energy while the other STs transmit data in AB mode. We consider the scenario where the energy consumption of AB mode transmission is negligible. Therefore, no dedicated slot for energy harvesting is required by STs. When $s_{t}^{a} = 0$, as shown in Figure 1c,d, following the TDMA scheme, STs execute overlay mode transmission by consuming the energy stored in the rechargeable capacitor. With the aim of optimizing the long-term average throughput under the AoI constraint, ST$_{i}$ may not consume all the available energy $\varepsilon_{t, i}$ during frame $t$. The energy consumed by ST$_{i}$, denoted by $e_{t, i}$, is determined by the SR.

We provide a flow chart of the ABO-CRN in Figure 4. The actions in Figure 4 are executed by STs according to $s_{t}^{a}$. The SR decides the time and energy allocation of STs according to the channel information and the states of each ST. The channel information includes the channel state $s_{t}^{a}$ and the channel gains. The states of each ST include the available energy and the AoI of the current frame. Note that the feedback information in Figure 4 includes two parts. The first part is the new energy state of each ST, which is the available energy of the next frame. The second part is the reward received after STs execute the actions decided by the SR.

2.2.2. Network Model of ABCs

The operations executed by the STs in the ABCs depend on the value of $s_{t}^{a}$. When $s_{t}^{a} = 1$, as shown in Figure 2, STs take turns to execute AB mode transmission following the TDMA scheme. When $s_{t}^{a} = 0$, since the PT does not broadcast RF signals on the current channel, STs do not execute AB mode transmission.

2.2.3. Network Model of CRNs

The operations executed by the STs in the CRNs depend on the value of $s_{t}^{a}$. When $s_{t}^{a} = 1$, as shown in Figure 3a,b, STs harvest energy. When $s_{t}^{a} = 0$, as shown in Figure 3c,d, following the TDMA scheme, STs execute overlay mode transmission by consuming the energy stored in the rechargeable capacitor. The energy $e_{t, i}$ consumed by ST$_{i}$ is determined by the SR.
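The three network models differ only in what the STs do for each channel state, which can be summarized as a small lookup table. The sketch below is illustrative only; the function name `st_operation` and the string labels are hypothetical and are not part of the paper.

```python
# Operation of the STs per (network, channel state s_t^a),
# as described in Sections 2.2.1-2.2.3. Labels are illustrative.
ST_OPERATIONS = {
    ("ABO-CRN", 1): "AB transmission (TDMA) + harvesting during other STs' slots",
    ("ABO-CRN", 0): "overlay transmission (TDMA) using stored energy",
    ("ABCs", 1): "AB transmission (TDMA)",
    ("ABCs", 0): "idle (no ambient RF signal to backscatter)",
    ("CRNs", 1): "energy harvesting",
    ("CRNs", 0): "overlay transmission (TDMA) using stored energy",
}


def st_operation(network, s_active):
    """Return the operation executed by the STs in the given network
    when the channel state is s_active (1 = PT active, 0 = inactive)."""
    return ST_OPERATIONS[(network, s_active)]


if __name__ == "__main__":
    for net in ("ABO-CRN", "ABCs", "CRNs"):
        for s in (1, 0):
            print(f"{net:8s} s={s}: {st_operation(net, s)}")
```

The table makes the advantage of the ABO-CRN visible at a glance: it is the only model in which the STs can transmit in both channel states.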

3. Formulation and Analysis of the Problem

For readability, Table 2 summarizes the main parameters and their meanings.

3.1. Throughput Definition

The total throughput $T_{t}$ of the secondary network in the ABO-CRN, ABCs, and CRNs in frame $t$ can be expressed as

$$T_{t} = \sum_{i = 0}^{n} T_{t, i}, \tag{1}$$

where $T_{t, i}$ denotes the throughput of ST$_{i}$ in frame $t$. Since the operations executed by STs depend on the channel state $s_{t}^{a}$, the calculation of $T_{t, i}$ depends on the value of $s_{t}^{a}$.

3.1.1. Throughput Definition of ABO-CRN

In the ABO-CRN, $T_{t, i}$ is expressed as

$$T_{t, i} = \begin{cases} T_{t, i}^{A}, & s_{t}^{a} = 1; \\ T_{t, i}^{O}, & s_{t}^{a} = 0. \end{cases} \tag{2}$$

When $s_{t}^{a} = 1$, according to the Shannon capacity formula, the throughput of ST$_{i}$ by AB mode transmission, denoted by $T_{t, i}^{A}$, is expressed as

$$T_{t, i}^{A} = \alpha_{t, i} W \log_{2}\left(1 + \frac{\theta P g_{t, i} h_{t, i}}{\delta^{2}}\right), \tag{3}$$

where $\alpha_{t, i} \in [0, 1]$ denotes the duration of data transmission by ST$_{i}$ through AB mode, $W$ denotes the bandwidth, $\theta \in [0, 1]$ denotes the backscatter reflection coefficient that depends on the electronic component factors, $P$ denotes the transmit power of the PT, $g_{t, i}$ denotes the channel gain from the PT to ST$_{i}$, and $h_{t, i}$ denotes the channel gain from ST$_{i}$ to the SR. In particular, $\theta P g_{t, i}$ represents the transmit power of ST$_{i}$ for AB mode transmission. ST$_{i}$ harvests energy while the other STs transmit data in AB mode, i.e., during the slots $\alpha_{t, j}$ with $j \neq i$. The harvested energy of ST$_{i}$, denoted by $e_{t, i}^{h}$, is calculated as

$$e_{t, i}^{h} = \min\left\{\sum_{j = 0, j \neq i}^{n} \alpha_{t, j} P g_{t, i},\; E - \varepsilon_{t, i}\right\}. \tag{4}$$

After energy harvesting, the available energy in ST$_{i}$ in frame $t + 1$, denoted by $\varepsilon_{t + 1, i}$, is updated as

$$\varepsilon_{t + 1, i} = \min\{\varepsilon_{t, i} + e_{t, i}^{h}, E\}. \tag{5}$$

When $s_{t}^{a} = 0$, according to the Shannon capacity formula, the throughput of ST$_{i}$ by overlay mode transmission, denoted by $T_{t, i}^{O}$, is expressed as

$$T_{t, i}^{O} = \alpha_{t, i} W \log_{2}\left(1 + \frac{e_{t, i} h_{t, i}}{\alpha_{t, i} \delta^{2}}\right), \tag{6}$$

where $\alpha_{t, i} \in [0, 1]$ denotes the duration of data transmission by ST$_{i}$ through overlay mode, and $e_{t, i} \in [0, \varepsilon_{t, i}]$ denotes the energy consumed for overlay mode transmission in frame $t$. In particular, $e_{t, i}$ is determined by the SR, and the available energy $\varepsilon_{t + 1, i}$ of ST$_{i}$ in frame $t + 1$ is updated as

$$\varepsilon_{t + 1, i} = \max\{\varepsilon_{t, i} - e_{t, i}, 0\}. \tag{7}$$
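The per-frame quantities of the ABO-CRN in Equations (3) to (7) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the harvesting sum is read as running over the slot durations of the other STs (as the accompanying text describes), and a zero-length overlay slot is assumed to carry no data.

```python
import math


def ab_throughput(alpha, W, theta, P, g, h, noise_var):
    """Equation (3): throughput of one ST in AB mode (s_t^a = 1)."""
    return alpha * W * math.log2(1.0 + theta * P * g * h / noise_var)


def overlay_throughput(alpha, W, e, h, noise_var):
    """Equation (6): throughput of one ST in overlay mode (s_t^a = 0).

    e / alpha is the transmit power obtained by spending energy e
    over a slot of duration alpha.
    """
    if alpha <= 0.0:
        return 0.0  # assumption: a zero-length slot carries no data
    return alpha * W * math.log2(1.0 + e * h / (alpha * noise_var))


def harvested_energy(alphas, i, P, g_i, eps_i, E):
    """Equation (4): ST_i harvests during the slots of the other STs,
    capped by the free capacity E - eps_i of its capacitor."""
    others = sum(a for j, a in enumerate(alphas) if j != i)
    return min(others * P * g_i, E - eps_i)


def next_energy(s_active, eps_i, E, e_h=0.0, e_used=0.0):
    """Equation (5) if the channel is active, Equation (7) otherwise."""
    if s_active == 1:
        return min(eps_i + e_h, E)
    return max(eps_i - e_used, 0.0)


if __name__ == "__main__":
    alphas = [0.2, 0.3, 0.5]  # slot durations of three STs
    e_h = harvested_energy(alphas, 0, P=1.0, g_i=1.0, eps_i=0.0, E=10.0)
    print(e_h, next_energy(1, 0.0, 10.0, e_h=e_h))
    print(overlay_throughput(0.5, 1.0, 0.5, 1.0, 1.0))
```

Note that Equation (6) is concave in the consumed energy, which is one reason why spending all stored energy in a single poor-channel frame can be wasteful over the long term.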

3.1.2. Throughput Definition of ABCs

In the ABCs, the throughput of STs is achieved by AB mode transmission when the channel state is active. Therefore, when $s_{t}^{a} = 0$, $T_{t, i} = 0$ holds, and when $s_{t}^{a} = 1$, according to the Shannon capacity formula, $T_{t, i}$ is expressed as

$$T_{t, i} = T_{t, i}^{A} = \alpha_{t, i} W \log_{2}\left(1 + \frac{\theta P g_{t, i} h_{t, i}}{\delta^{2}}\right), \tag{8}$$

where $\alpha_{t, i}$, $\theta$, $P$, $g_{t, i}$, and $h_{t, i}$ have the same meanings as in Equation (3). Since the energy consumption of AB mode transmission is negligible, the energy update is not considered in the ABCs.

3.1.3. Throughput Definition of CRNs

In the CRNs, the throughput of STs is achieved by overlay mode transmission when the channel is inactive. When the channel is active, STs harvest energy from the RF signal of the PT. Therefore, when $s_{t}^{a} = 1$, $T_{t, i} = 0$ holds, and the available energy $\varepsilon_{t + 1, i}$ of ST$_{i}$ in frame $t + 1$ is updated as

$$\varepsilon_{t + 1, i} = \min\{\varepsilon_{t, i} + e_{t, i}^{h}, E\}. \tag{9}$$

When $s_{t}^{a} = 0$, according to the Shannon capacity formula, $T_{t, i}$ is expressed as

$$T_{t, i} = T_{t, i}^{O} = \alpha_{t, i} W \log_{2}\left(1 + \frac{e_{t, i} h_{t, i}}{\alpha_{t, i} \delta^{2}}\right). \tag{10}$$

The available energy $\varepsilon_{t + 1, i}$ of ST$_{i}$ in frame $t + 1$ is updated as in Equation (7).

3.2. Definition of AoI

AoI is a novel metric to measure the freshness of data received by the receiver. In particular, AoI tracks the time elapsed from the generation of the latest data to the moment at which this data is successfully received by the receiver [33]. We utilize the linear scheme to calculate the AoI of STs, where the AoI is updated as

$$a_{t + 1, i} = \begin{cases} 1, & \lambda_{t, i} = 1; \\ a_{t, i} + 1, & \lambda_{t, i} = 0, \end{cases} \tag{11}$$

where $a_{t, i}$ denotes the AoI of ST$_{i}$ in frame $t$, $\lambda_{t, i} = 1$ indicates that the latest data of ST$_{i}$ is successfully received by the SR, and $\lambda_{t, i} = 0$ indicates that the latest data of ST$_{i}$ is not successfully received by the SR. With the aim of optimizing the long-term average throughput of the secondary network under the AoI constraint, we set a minimum throughput requirement $T_{\min}$ for every ST. Specifically, if the throughput of ST$_{i}$ during frame $t$ is no less than $T_{\min}$, the latest transmitted data of ST$_{i}$ is considered to be successfully received by the SR. Accordingly, $\lambda_{t, i}$ is expressed as

$$\lambda_{t, i} = \begin{cases} 1, & T_{t, i} \geq T_{\min}; \\ 0, & \text{otherwise}. \end{cases} \tag{12}$$

By combining Equations (11) and (12), the update of AoI is calculated as

$$a_{t + 1, i} = \begin{cases} 1, & T_{t, i} \geq T_{\min}; \\ a_{t, i} + 1, & \text{otherwise}. \end{cases} \tag{13}$$

When $s_{t}^{a} = 0$ in the ABCs, STs achieve no throughput ($T_{t, i} = 0$), hence the throughput of each ST cannot reach $T_{\min}$, and $a_{t + 1, i} = a_{t, i} + 1$ holds. When $s_{t}^{a} = 1$ in the CRNs, the same conclusion holds.

3.3. Problem Formulation

The throughput optimization problem of the ABO-CRN, ABCs, and CRNs is expressed as

$$\text{Maximize} \quad \bar{T} = \lim_{K \to \infty} \frac{1}{K} E\left(\sum_{t = 1}^{K} T_{t}\right) \tag{14a}$$

$$\text{s.t.:} \quad 0 \leq \alpha_{t, i} \leq 1 \ \text{and} \ \sum_{i = 0}^{n} \alpha_{t, i} \leq 1, \tag{14b}$$

$$0 \leq e_{t, i} \leq \varepsilon_{t, i}, \tag{14c}$$

$$\lim_{K \to \infty} \frac{1}{K (n + 1)} E\left(\sum_{t = 1}^{K} \sum_{i = 0}^{n} a_{t, i}\right) \leq A_{\max}, \tag{14d}$$

where $A_{\max}$ denotes the maximum allowable AoI that the secondary network tolerates, and Equation (14d) indicates that the average accumulated AoI should not exceed $A_{\max}$. Since the energy consumed by AB mode transmission of STs is negligible, the SR in the ABCs does not need to consider the constraint in Equation (14c).
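For a finite simulated horizon, the objective (14a) and the constraints (14b) to (14d) can be checked empirically. The following sketch replaces the limits and expectations with finite-sample averages, which is an approximation; all function names are illustrative assumptions.

```python
def allocation_is_feasible(alphas, energies, available, tol=1e-12):
    """Check the per-frame constraints (14b) and (14c).

    alphas[i]   : slot duration of ST_i
    energies[i] : energy ST_i is asked to consume
    available[i]: energy currently stored in ST_i
    """
    if any(a < 0.0 or a > 1.0 for a in alphas):
        return False
    if sum(alphas) > 1.0 + tol:  # slots must fit in the unit frame
        return False
    return all(0.0 <= e <= eps for e, eps in zip(energies, available))


def average_throughput(throughput_frames):
    """Finite-horizon estimate of (14a): mean total throughput per frame."""
    return sum(sum(f) for f in throughput_frames) / len(throughput_frames)


def average_aoi(aoi_frames):
    """Finite-horizon estimate of the left-hand side of (14d):
    AoI averaged over all frames and all STs."""
    num_sts = len(aoi_frames[0])
    return sum(sum(f) for f in aoi_frames) / (len(aoi_frames) * num_sts)


if __name__ == "__main__":
    thr = [[1.0, 2.0], [3.0, 4.0]]  # two frames, two STs
    aoi = [[1, 2], [3, 2]]
    a_max = 2.5
    print(average_throughput(thr), average_aoi(aoi) <= a_max)
```

Such a check is useful when evaluating a learned policy: a policy with high `average_throughput` is only acceptable if `average_aoi` stays at or below $A_{\max}$.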

3.4. Analysis of $T_{\min}$ and $A_{\max}$

In this subsection, we analyze $T_{\min}$ and $A_{\max}$ in the ABO-CRN, ABCs, and CRNs. The expectation of the long-term average throughput in the ABO-CRN, ABCs, and CRNs is expressed as

$$E(\bar{T}_{t, i}) = \lim_{K \to \infty} \frac{1}{K (n + 1)} E\left(\sum_{t = 1}^{K} \sum_{i = 0}^{n} T_{t, i}\right). \tag{15}$$

According to Equations (2), (8) and (10), we have

$$E\left(\sum_{t = 1}^{K} \sum_{i = 0}^{n} T_{t, i}\right) = \begin{cases} E\left(\sum_{t = 1}^{K} \sum_{i = 0}^{n} \left(P_{a} T_{t, i}^{A} + (1 - P_{a}) T_{t, i}^{O}\right)\right), & \text{ABO-CRN}; \\ E\left(\sum_{t = 1}^{K} \sum_{i = 0}^{n} P_{a} T_{t, i}^{A}\right), & \text{ABCs}; \\ E\left(\sum_{t = 1}^{K} \sum_{i = 0}^{n} (1 - P_{a}) T_{t, i}^{O}\right), & \text{CRNs}. \end{cases} \tag{16}$$

Let $T^{-}$ denote the long-term average throughput of each ST whose average throughput is smaller than $T_{\min}$, let $T^{+}$ denote that of each ST whose average throughput is no smaller than $T_{\min}$, let $a$ denote the long-term average AoI of each ST whose average throughput is smaller than $T_{\min}$, and let $N = n + 1$.

Lemma 1.

When $E(\bar{T}_{t, i}) \leq T_{\min}$ holds, the closer $T^{-}$ is to $T_{\min}$, the larger the tolerable value interval of $a$. When $a > A_{\max} - 1$, the AoI constraint cannot be satisfied.

Proof.

We assume there are $x$ STs with $T^{-}$, and $N - x$ STs with $T^{+}$. We have

$$E(\bar{T}_{t, i}) = \frac{x T^{-} + (N - x) T^{+}}{N} = \frac{x (T^{-} - T^{+}) + N T^{+}}{N}. \tag{17}$$

Since $E(\bar{T}_{t, i}) \leq T_{\min}$ holds, we have

$$E(\bar{T}_{t, i}) - T_{\min} = \frac{x (T^{-} - T^{+}) + N T^{+}}{N} - T_{\min} \leq 0, \ \text{and} \ x \geq \frac{N (T_{\min} - T^{+})}{T^{-} - T^{+}}. \tag{18}$$

In order to satisfy the AoI constraint, the left-hand side of Equation (14d) is rewritten as $\frac{x (a + 1) + N - x}{N}$, and we have

$$\frac{x (a + 1) + N - x}{N} - A_{\max} \leq 0, \ \text{and} \ a \leq \frac{(A_{\max} - 1) N}{x}. \tag{19}$$

Substituting $x = \frac{N (T_{\min} - T^{+})}{T^{-} - T^{+}}$ into Equation (19), we have

$$a \leq (A_{\max} - 1) \frac{T^{-} - T^{+}}{T_{\min} - T^{+}}. \tag{20}$$

Obviously, when $T^{-}$ is closer to $T_{\min}$, the tolerable value interval of $a$ is larger.

Since $T^{-} < T_{\min}$ holds, $\frac{T^{-} - T^{+}}{T_{\min} - T^{+}} < 1$ holds, hence $a$ cannot exceed $A_{\max} - 1$. Therefore, when $a > A_{\max} - 1$, the AoI constraint cannot be satisfied. The proof is completed. □

Lemma 2.

The lower bound of $A_{\max}$ that makes STs satisfy Equation (14d) decreases with $n$, and increases with the number of STs whose average throughput is smaller than $T_{\min}$.

Proof.

We assume that $x$ is the number of STs whose average throughput is smaller than $T_{\min}$, and that $a$ is given. Rearranging the first inequality in Equation (19) with $N = n + 1$, we deduce

$$A_{\max} \geq \frac{x (a + 1) + N - x}{N} = \frac{a x}{n + 1} + 1. \tag{21}$$

With a larger value of $n$, the lower bound of $A_{\max}$ that makes STs satisfy Equation (14d) decreases. With a larger value of $x$, the lower bound of $A_{\max}$ increases. The proof is completed. □
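The monotonic behavior stated in Lemma 2 can be checked numerically. The sketch below evaluates the average AoI expression used in Equation (19), $\frac{x (a + 1) + N - x}{N}$ with $N = n + 1$, and its simplified form $\frac{a x}{n + 1} + 1$, which is the smallest $A_{\max}$ that satisfies the constraint. The function names and the sample values of $a$, $x$, and $n$ are illustrative assumptions.

```python
def average_aoi_two_groups(a, x, n):
    """Average AoI when x of the N = n + 1 STs have AoI a + 1
    and the remaining N - x STs have AoI 1 (expression in Equation (19))."""
    N = n + 1
    return (x * (a + 1) + N - x) / N


def amax_lower_bound(a, x, n):
    """Smallest A_max for which the AoI constraint can hold,
    obtained by simplifying the expression above."""
    return a * x / (n + 1) + 1.0


if __name__ == "__main__":
    a = 4
    for n, x in [(3, 2), (7, 2), (3, 3)]:
        print(n, x, amax_lower_bound(a, x, n))
```

For $a = 4$, moving from $(n, x) = (3, 2)$ to $(7, 2)$ lowers the bound from 3 to 2, while moving to $(3, 3)$ raises it to 4, in line with the lemma.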

We then compare the impacts of $T_{\min}$ and $A_{\max}$ on the ABO-CRN, ABCs, and CRNs. We discuss the impacts in some extreme cases, i.e., when $P_{a}$, the probability of $s_{t}^{a} = 1$, is relatively small or relatively large. In the ABCs, when $P_{a}$ is relatively small, there are few opportunities for AB mode transmission. In the CRNs, when $P_{a}$ is relatively large, there are few opportunities for overlay mode transmission. In these two cases, $E(\bar{T}_{t, i})$ is small, the lower bound of $T_{\min}$ is low, and the tolerable value interval of $A_{\max}$ is small. Different from the ABCs and CRNs, the STs in the ABO-CRN execute AB mode transmission when $s_{t}^{a} = 1$, and overlay mode transmission when $s_{t}^{a} = 0$. As described in Equation (16), $E(\bar{T}_{t, i})$ of the ABO-CRN is higher than that of the ABCs and CRNs. Under the same conditions of $P_{a}$, $T_{\min}$, and $A_{\max}$, the ABO-CRN achieves higher throughput while satisfying the AoI constraint.

4. Policies of Time and Energy Management

As described in the introduction, long-term optimization is more practical than the optimization of a single frame, and maximizing the throughput of a single frame under the AoI constraint may not be desirable. As a result, we consider the long-term optimization of the throughput. However, since network environmental factors, such as the channel state and channel gains, are dynamic and uncertain, it is difficult for SUs to obtain complete knowledge about them in advance. DRL is an effective way to tackle this challenge. Some DRL methods, such as the deep Q-network (DQN), are value-based and require discrete spaces. If discretization is not suitable for the scenario, it may lose important information or lead to a high space dimension. Therefore, we utilize DDPG, which deals with problems in continuous spaces, to find the optimal policy of time and energy management for throughput optimization. We define the details of DDPG in the following subsections.

4.1. Definitions of Spaces and Rewards

The SR plays the role of agent that provides decisions for STs. According to Equations (2)–(14d), the state spaces, action space, and rewards are introduced as follows.

4.1.1. State Space

The SR determines time and energy allocation of STs based on the states of STs and channel information of the current frame, including the available energy in STs, the AoI about STs, channel gains, and the channel state. Therefore, the state space contains information about energy states, AoI states, states of channel gains, and channel states. The energy-state space is represented by

S_{E} = \{(ε_{t, 0}, ε_{t, 1}, \dots, ε_{t, n}); 0 \leq ε_{t, i} \leq E\} .

(22)

The AoI-state space is represented by

S_{A} = \{(a_{t, 0}, a_{t, 1}, \dots, a_{t, n})\},

(23)

where the average accumulated AoI satisfies Equation (14d). In order to reduce the dimension of the channel-gain-state space, we represent the channel gains as

\begin{matrix} h_{t, i} & = 10^{- 3} η_{h, t} l_{i}^{ϵ}, \\ g_{t, i} & = 10^{- 3} η_{g, t} L_{i}^{ϵ}, \end{matrix}

(24)

where

η_{h, t}

denotes the path loss coefficient from the PT to STs, and

η_{g, t}

denotes the path loss coefficient from STs to the SR,

l_{i}

denotes the distance between the PT and ST

_{i}

, and

L_{i}

denotes the distance between the SR and ST

_{i}

, and

ϵ

denotes the channel path fading exponent. Therefore, the channel-gain-state space is represented by

S_{G} = \{(η_{h, t}, η_{g, t})\},

(25)

where

η_{h, t}

and

η_{g, t}

follow the Rayleigh distribution. The channel-state space is expressed as

S_{C} = \{s_{t}^{a}; s_{t}^{a} \in \{0, 1\}\},

(26)

where

P_{a}

represents the probability of

s_{t}^{a} = 1

.
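As an illustration, the per-frame quantities in Equations (24)–(26) could be sampled in a simulation as in the minimal Python sketch below. The function names, the Rayleigh scale parameter, the distances, and the value of ϵ are assumptions made for the example and are not taken from the paper; the exponent is applied exactly as written in Equation (24), so attenuation with distance requires a negative ϵ here.

```python
import math
import random

def sample_rayleigh(rng, scale=1.0):
    """Draw one Rayleigh-distributed value by inverse-transform sampling."""
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is finite
    return scale * math.sqrt(-2.0 * math.log(u))

def sample_frame(rng, l, L, eps, p_a, scale=1.0):
    """Sample (eta_h, eta_g, s_a) and derive the per-ST gains of Equation (24).

    l[i] is the PT-to-ST_i distance and L[i] the SR-to-ST_i distance.
    All numeric values passed in are hypothetical.
    """
    eta_h = sample_rayleigh(rng, scale)
    eta_g = sample_rayleigh(rng, scale)
    s_a = 1 if rng.random() < p_a else 0  # channel is active with probability P_a
    h = [1e-3 * eta_h * (l_i ** eps) for l_i in l]
    g = [1e-3 * eta_g * (L_i ** eps) for L_i in L]
    return eta_h, eta_g, s_a, h, g

rng = random.Random(0)
eta_h, eta_g, s_a, h, g = sample_frame(
    rng, l=[2.0, 3.0], L=[1.0, 1.5], eps=-2.0, p_a=0.6
)
```

Because only the two coefficients change from frame to frame while the distances are fixed, the pair (eta_h, eta_g) is enough to recover every h and g, which is why the channel-gain-state space can be kept two-dimensional.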

In summary, the state space of the ABO-CRN and of the CRNs when

s_{t}^{a} = 0

is expressed as

S = S_{E} \times S_{A} \times S_{G} \times S_{C} .

(27)

The state space of the ABCs when

s_{t}^{a} = 1

is expressed as

S = S_{A} \times S_{G} \times S_{C} .

(28)

Note that, when

s_{t}^{a} = 0

, STs in the ABCs do not execute AB mode transmission. When

s_{t}^{a} = 1

, STs in the CRNs only harvest energy. In these two cases the SR does not need to determine actions for STs, hence we do not define a state space for them.
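The product spaces in Equations (27) and (28) can be pictured as one flat vector per frame. The following Python sketch, with a hypothetical helper name and made-up numbers, concatenates the components and shows the resulting dimensions: 2(n + 1) + 3 entries when the energy states are included and (n + 1) + 3 entries when they are not.

```python
def build_state(energy, aoi, eta_h, eta_g, s_a, include_energy=True):
    """Concatenate the state components into one flat list.

    include_energy=True follows Equation (27); False follows Equation (28).
    """
    state = []
    if include_energy:
        state.extend(energy)       # S_E: available energy of each ST
    state.extend(aoi)              # S_A: AoI of each ST
    state.extend([eta_h, eta_g])   # S_G: the two path loss coefficients
    state.append(s_a)              # S_C: channel state
    return state

# Hypothetical values for n + 1 = 3 STs.
full_state = build_state([10.0, 20.0, 30.0], [1, 1, 2], 0.8, 1.1, 0)
abc_state = build_state([], [1, 1, 2], 0.8, 1.1, 1, include_energy=False)
```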

4.1.2. Action Space

In the ABO-CRN, based on the state of the current frame, the SR determines the actions that STs execute in the current frame. When

s_{t}^{a} = 1

, STs execute AB mode transmission, and ST

_{i}

harvests energy when the other STs transmit data in AB mode. When

s_{t}^{a} = 0

, STs execute overlay mode transmission. We define the action space as

A = \{(α_{t, 0}, α_{t, 1}, \dots, α_{t, n}, e_{t, 0}, e_{t, 1}, \dots, e_{t, n}); \sum_{i = 0}^{n} α_{t, i} \leq 1, 0 \leq e_{t, i} \leq ε_{t, i}\} .

(29)

In particular, when

s_{t}^{a} = 1

, the SR only utilizes the time

(α_{t, 0}, α_{t, 1}, \dots, α_{t, n})

of A to address the time management for STs, as shown in Figure 5.

When

s_{t}^{a} = 1

, STs in the ABCs execute AB mode transmission. Since the energy consumption of AB mode transmission can be ignored, the SR in the ABCs focuses on the time allocation of STs. We define the action space of the ABCs as

A = \{(α_{t, 0}, α_{t, 1}, \dots, α_{t, n}); \sum_{i = 0}^{n} α_{t, i} \leq 1\} .

(30)

When

s_{t}^{a} = 0

, STs in the CRNs execute overlay mode transmission. The action space of the CRNs is defined as Equation (29).
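A neural network does not automatically produce outputs that satisfy the constraints in Equation (29). One possible way to enforce them, shown here only as an assumption and not as the mapping used in the paper, is to treat the raw actor outputs as values in [0, 1], rescale the time fractions when their sum exceeds 1, and interpret the energy outputs as fractions of the available energy. The function name is hypothetical.

```python
def _clip01(value):
    """Clip a number to the interval [0, 1]."""
    return min(1.0, max(0.0, value))

def to_feasible_action(raw_alpha, raw_energy, available_energy):
    """Map raw actor outputs to (alpha, e) satisfying the constraints of Equation (29)."""
    alpha = [_clip01(a) for a in raw_alpha]
    total = sum(alpha)
    if total > 1.0:
        # Rescale so that the time fractions sum to 1 within one frame.
        alpha = [a / total for a in alpha]
    # Each e_i is a fraction of the energy currently available at ST_i.
    energy = [_clip01(f) * eps for f, eps in zip(raw_energy, available_energy)]
    return alpha, energy

alpha, energy = to_feasible_action(
    [0.8, 0.6, 0.6], [0.5, 1.2, -0.1], [10.0, 20.0, 30.0]
)
```

When the channel is active, only the `alpha` part of the result would be used, which corresponds to the action space in Equation (30).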

4.1.3. Rewards

After the SR determines the action

x_{t}

based on the state

s_{t}

of frame t, an immediate reward

r_{t} (s_{t}, x_{t})

is obtained, where

r_{t} (s_{t}, x_{t})

represents the evaluation of choosing

x_{t}

under

s_{t}

. With the aim to optimize the throughput of the secondary networks with the AoI constraint,

r_{t} (s_{t}, x_{t})

is defined as

r_{t} (s_{t}, x_{t}) = \frac{1}{n + 1} (\frac{T_{t}}{T_{m i n}} - \sum_{i = 0}^{n} ρ_{t, i}),

(31)

where

ρ_{t, i}

denotes the penalty that is related to the AoI of ST

_{i}

. Since the actions vary with respect to the value of

s_{t}^{a}

, the values of

ρ_{t, i}

vary with respect to the value of

s_{t}^{a}

. When

s_{t}^{a} = 1

, we have

ρ_{t, i} = \begin{cases} a_{t, i}, & T_{t, i} \geq T_{m i n}; \\ a_{t, i}^{2}, & \text{otherwise} . \end{cases}

(32)

When

s_{t}^{a} = 0

, we have

ρ_{t, i} = \begin{cases} a_{t, i}, & T_{t, i} \geq T_{m i n}; \\ a_{t, i}^{2} m \frac{e_{t, i}}{E} α_{t, i}, & \text{otherwise}, \end{cases}

(33)

where m is a constant that is set according to

A_{m a x}

, and is used to ensure that

ρ_{t, i}

is larger than

A_{m a x}

.

\frac{e_{t, i}}{E} α_{t, i}

indicates that the penalty increases with

e_{t, i}

and

α_{t, i}

when

a_{t, i}

is larger than

A_{m a x}

. The SR in the ABCs does not determine actions of the energy allocation for STs, hence we set

\frac{e_{t, i}}{E}

in Equation (33) as

\frac{e_{t, i}}{E} = 1

.
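Equations (31)–(33) can be evaluated directly once the per-ST throughputs and AoI values of a frame are known. The Python sketch below is a minimal illustration under two assumptions: the total throughput T_t is taken as the sum of the per-ST throughputs, and the numeric inputs, including the value of m, are invented for the example.

```python
def penalty(aoi, throughput, t_min, s_a, alpha=0.0, energy=0.0, capacity=1.0, m=1.0):
    """AoI-related penalty rho_{t,i} of one ST, following Equations (32) and (33)."""
    if throughput >= t_min:
        return aoi
    if s_a == 1:
        return aoi ** 2                                  # Equation (32)
    return aoi ** 2 * m * (energy / capacity) * alpha    # Equation (33)

def reward(throughputs, aois, t_min, s_a, alphas, energies, capacity, m):
    """Immediate reward r_t of Equation (31) for one frame."""
    total_throughput = sum(throughputs)   # assumed: T_t is the sum over STs
    rho_sum = sum(
        penalty(a, T, t_min, s_a, al, e, capacity, m)
        for a, T, al, e in zip(aois, throughputs, alphas, energies)
    )
    return (total_throughput / t_min - rho_sum) / len(throughputs)

# Two STs; the second one misses the minimum throughput requirement.
r_active = reward([800.0, 600.0], [2, 3], 700.0, 1, [0.5, 0.5], [0.0, 0.0], 30.0, 10.0)
r_inactive = reward([800.0, 600.0], [2, 3], 700.0, 0, [0.5, 0.5], [15.0, 15.0], 30.0, 10.0)
```

In this example the ST that misses T_min contributes a squared AoI term, so the reward turns negative even though the total throughput is twice T_min.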

4.2. Time and Energy Management by DDPG

DDPG combines the actor-critic architecture with techniques from DQN. It therefore consists of two parts, an actor and a critic. The actor outputs a deterministic action, and the critic outputs an evaluation of that action, approximating the Q-value in place of a Q-table. Both the actor and the critic consist of an evaluated network and a target network. The target networks make the training process more stable and have the same structure as the evaluated networks. The evaluated network of the actor is called the actor network, and that of the critic is called the critic network. The target network of the actor is called the actor target network, and that of the critic is called the critic target network.

These networks are expressed as parametric functions. The actor network is expressed as a function mapping

s_{t}

to

x_{t}

,

x_{t} = Π (s_{t} | ω),

(34)

where

Π

denotes the policy of time and energy management, and

ω

denotes the weights of the neural network in the actor network. The critic network is expressed as an action-value function, which maps

s_{t}

and

x_{t}

to a Q-value,

Q = Q (s_{t}, x_{t} | μ),

(35)

where

μ

denotes the weights of the neural network in the critic network. Furthermore, the Q-value function is expressed as

Q = E [r_{t} (s_{t}, x_{t}) + γ [Q (s_{t + 1}, Π (s_{t + 1} | ω^{+}) | μ^{+})]],

(36)

where

ω^{+}

denotes the weights of the neural network in actor target network,

μ^{+}

denotes the weights of the neural network in critic target network, and

γ \in [0, 1]

denotes the discount factor, which represents the effect of future action choices.
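In practice the expectation in Equation (36) is estimated from sampled transitions. The sketch below shows a one-sample target and a mean-squared-error critic loss over a mini-batch. The loss form is the one commonly used with DDPG and is an assumption here, since the paper does not spell it out in this section; the toy functions standing in for the networks and the value of γ are hypothetical.

```python
def td_target(r, next_state, actor_target, critic_target, gamma=0.9):
    """One-sample estimate of the right-hand side of Equation (36)."""
    next_action = actor_target(next_state)          # Pi(s_{t+1} | omega^+)
    return r + gamma * critic_target(next_state, next_action)

def critic_loss(batch, critic, actor_target, critic_target, gamma=0.9):
    """Mean squared error between Q(s, x | mu) and the targets over a mini-batch."""
    errors = []
    for s, x, r, s_next in batch:
        y = td_target(r, s_next, actor_target, critic_target, gamma)
        errors.append((critic(s, x) - y) ** 2)
    return sum(errors) / len(errors)

# Toy stand-ins for the networks (for illustration only).
def toy_actor(state):
    return [0.5 * v for v in state]

def toy_critic(state, action):
    return sum(state) + sum(action)

y = td_target(1.0, [2.0], toy_actor, toy_critic, gamma=0.5)
loss = critic_loss([([1.0], [0.0], 1.0, [2.0])], toy_critic, toy_actor, toy_critic, gamma=0.5)
```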

In order to weaken the dependence of DDPG on hyper-parameters, batch normalization [42] is adopted, i.e., each layer in the neural networks of DDPG is connected to a batch normalization layer. This makes DDPG less sensitive to the initial parameters and prevents the unstable training that results from unstable data distributions in each layer of the neural networks. Batch normalization accelerates the convergence of DDPG and efficiently avoids vanishing gradients. Furthermore, because the factors in a state have different value ranges, we normalize the input state of DDPG so that each factor in the state has the same value range.
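The input normalization mentioned above can be as simple as min-max scaling when a value range is known for each factor. The Python sketch below assumes such ranges are available; the bounds used in the example, in particular the AoI upper bound, are hypothetical, and the paper does not state which normalization it applies.

```python
def normalize_state(state, lows, highs):
    """Min-max scale every state factor to [0, 1] using per-factor value ranges."""
    scaled = []
    for value, lo, hi in zip(state, lows, highs):
        if hi == lo:
            scaled.append(0.0)   # degenerate range: the factor carries no information
        else:
            scaled.append((value - lo) / (hi - lo))
    return scaled

# Example factors: available energy in [0, 30], AoI in [1, 9], channel state in [0, 1].
normalized = normalize_state([15.0, 5, 1], [0.0, 1, 0], [30.0, 9, 1])
```

Factors without a finite range, such as the Rayleigh-distributed coefficients, would need a chosen upper bound and clipping before this scaling is applied.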

Algorithm 1 finds the optimal policy for the time and energy management by DDPG. The exploration noise

N_{t}^{e}

in Algorithm 1 is used to explore the action space fully and thus avoid becoming stuck in a locally optimal policy. During training, the exploration noise decay factor

κ

restricts the exploration range. The weights

ω^{+}

and

μ^{+}

of the target networks are updated by soft replacement, which increases the stability of the evaluated networks.
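The soft replacement and the decaying exploration noise can be sketched as follows. This is a minimal illustration and not a reproduction of Algorithm 1: Gaussian noise, a multiplicative decay by κ, the clipping of actions to [0, 1], and the values of τ and κ are all assumptions, and the weights are represented as plain lists of numbers.

```python
import random

def soft_update(target_weights, source_weights, tau=0.01):
    """Soft replacement: target <- tau * source + (1 - tau) * target."""
    return [
        tau * w + (1.0 - tau) * w_t
        for w, w_t in zip(source_weights, target_weights)
    ]

def explore(action, noise_std, rng):
    """Add zero-mean Gaussian exploration noise and clip each entry to [0, 1]."""
    return [min(1.0, max(0.0, a + rng.gauss(0.0, noise_std))) for a in action]

def decay_noise(noise_std, kappa=0.999):
    """Shrink the exploration range by the decay factor kappa."""
    return noise_std * kappa

rng = random.Random(0)
noisy_action = explore([0.2, 0.5, 0.3], noise_std=0.1, rng=rng)
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
next_std = decay_noise(1.0, kappa=0.5)
```

A small τ makes the target networks track the evaluated networks slowly, so the targets used by the critic change gradually between updates.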

Algorithm 1: Finding the optimal policy for the time and energy management by DDPG.

5. Simulation

To evaluate the throughput and AoI performance, we compare the long-term average throughput

T

of the ABO-CRN under the AoI constraint with that of two baseline schemes, the throughput-optimal (T-O) scheme and the AoI-optimal (A-O) scheme. The T-O baseline scheme optimizes the throughput of the secondary network, and the A-O baseline scheme optimizes the AoI of STs. Furthermore, we compare the throughput and AoI performances among the ABO-CRN, ABCs, and CRNs to evaluate the impacts of

T_{m i n}

and

A_{m a x}

on the throughput and AoI performances. The simulation configuration is set as follows unless otherwise specified: the transmit power of the PT

P = 17

kW, the bandwidth

W = 6

MHz, the AWGN variance

δ^{2} = 10^{- 3} μ W

, the energy capacity

E = 30 μ J

, and the backscatter reflection efficiency

θ = 0.9

.

Figure 6 plots

T

and the AoI of the ABO-CRN, the T-O baseline scheme, and the A-O baseline scheme versus the minimum throughput requirement

T_{m i n}

under

P_{a} =

0.3, 0.6, 0.9. We observe from Figure 6a that

T

of the ABO-CRN decreases with

T_{m i n}

, and observe from Figure 6b that AoI of the ABO-CRN increases with

T_{m i n}

. For the T-O baseline scheme, the throughput does not change with

T_{m i n}

, and the AoI increases faster than that of the ABO-CRN and the A-O baseline scheme. The throughput of the A-O baseline scheme decreases faster with

T_{m i n}

than that of the ABO-CRN. The AoI of the A-O baseline scheme increases with

T_{m i n}

, is close to that of the ABO-CRN, and is lower than that of the T-O baseline scheme. The reasons are as follows. When

T_{m i n}

increases, each ST needs more throughput to reach the minimum throughput requirement. The SR in the ABO-CRN has to allocate more time and energy to the STs with high AoI and poor channel quality, and thus sacrifices total throughput to satisfy the AoI constraint.

When

P_{a} = 0.9

, we observe that

T

in Figure 6a decreases faster than that when

P_{a} =

0.3 and 0.6, and the corresponding AoI in Figure 6b increases faster. The reason is provided as follows. When

P_{a} = 0.9

, the channel is active most of the time, so STs execute only AB mode transmission most of the time. When

T_{m i n}

is higher than the expected throughput achieved by STs through AB mode transmission, the AoI increases in most frames, and the number of frames in which the AoI increases grows with

T_{m i n}

. Therefore, the average AoI increases with

T_{m i n}

. The expected throughput achieved by STs through AB mode transmission and overlay mode transmission when

P_{a} =

0.3 and 0.6 is higher than that when

P_{a} =

0.9. Therefore, the average AoI when

P_{a} =

0.9 increases faster with

T_{m i n}

than that when

P_{a} =

0.3 and 0.6. We also observe that when

T_{m i n}

is small, the curves of the ABO-CRN and the two baseline schemes are close. The reason is that the throughput of all three schemes meets the minimum throughput requirement, and their AoI satisfies the AoI constraint. In addition, Figure 6 shows that

T

of the ABO-CRN is closer to that of the T-O scheme than to that of the A-O scheme, and the AoI of the ABO-CRN is closer to that of the A-O scheme than to that of the T-O scheme. This indicates that DDPG finds the optimal policy of time and energy management to optimize the throughput while satisfying the AoI constraint.

Figure 7 plots

T

and the AoI of the ABO-CRN, the T-O baseline scheme, and the A-O baseline scheme versus the maximum allowable AoI

A_{m a x}

. We observe that both

T

and AoI increase with

A_{m a x}

, and when

A_{m a x}

is large, the throughput of the ABO-CRN and that of two baseline schemes are close. The reason is provided as follows. When

A_{m a x}

increases, the limitation of the AoI constraint on throughput becomes weak. The SR allocates more time and energy to the STs with high throughput, hence the throughput increases with

A_{m a x}

. With the increase of

A_{m a x}

, STs with high AoI (but not exceeding

A_{m a x}

) and poor channel quality are allocated less time and energy, hence the AoI of these STs increases, and so does the average AoI of STs. When

A_{m a x}

is large, all the three schemes satisfy the AoI constraint.

Figure 8 plots

T

and AoI in the ABO-CRN, ABCs, and CRNs with

T_{m i n}

, and Figure 9 plots

T

and AoI in the ABO-CRN, ABCs, and CRNs with

A_{m a x}

, under

P_{a} =

0.3, 0.6, 0.9. It is obvious that

T

in the ABO-CRN is higher than that in the ABCs and CRNs, and the AoI in the ABO-CRN is lower than that in the ABCs and CRNs. When

P_{a} =

0.3 and 0.9, the AoI in the ABCs and CRNs is high, and when

T

is large, STs in the ABCs and CRNs cannot satisfy the AoI constraint. The reason is as follows. When

P_{a} =

0.3, the channel remains inactive most of the time. For the ABCs, AB mode transmission has few opportunities to be executed. From Figure 6, we infer that, when

T_{m i n}

is higher than the expected throughput achieved by each ST through AB mode transmission in a frame, it is difficult for STs in the ABCs to satisfy the AoI constraint by sacrificing the total throughput. Therefore, the throughput of the ABCs in this case remains nearly unchanged. When

P_{a} =

0.9, the channel remains active most of the time. For the CRNs, overlay mode transmission has few opportunities to be executed, hence the AoI of the CRNs is high, and the throughput of the CRNs remains nearly unchanged. STs in the ABO-CRN execute AB mode transmission when the channel is active, and execute overlay mode transmission when the channel is inactive. As described in Equation (16), the expected throughput achieved by STs in the ABO-CRN is higher than that in the ABCs and CRNs. Therefore, the ABO-CRN achieves better throughput and AoI performances than the ABCs and CRNs.

6. Conclusions

We optimized the long-term throughput of the secondary network under the AoI constraint by jointly managing the time and energy of STs in the ABO-CRN, ABCs, and CRNs through DDPG. We investigated the impacts of time and energy allocation on the reward when the AoI constraint cannot be satisfied, and developed the corresponding reward functions based on the channel states. We discussed how the minimum throughput requirement and the maximum allowable AoI relate to the throughput and AoI performances. We compared the throughput optimization scheme under the AoI constraint with the T-O and A-O baseline schemes, and varied the minimum throughput requirement and maximum allowable AoI to evaluate their effects on the throughput and AoI performances of the secondary networks in the ABO-CRN, ABCs, and CRNs. Our findings are as follows:

Throughput of the ABO-CRN is close to the optimal throughput of T-O baseline scheme, and the AoI of the ABO-CRN is close to the optimal AoI of A-O baseline scheme. DDPG finds the optimal policy of time and energy management to optimize the throughput, and satisfies the AoI constraint at the same time.
Throughput of the ABO-CRN is higher than that of A-O baseline scheme, and AoI of the ABO-CRN is lower than that of T-O baseline scheme. The observation validates the benefit of considering both throughput and AoI performances over only one metric.
The ABO-CRN improves the throughput and AoI performances of the ABCs and CRNs. Even in extreme cases, such as the long time active channel state, the ABO-CRN obtains better throughput and AoI performances than the ABCs and CRNs.
The lower bound of the maximum allowable AoI that makes STs satisfy the AoI constraint decreases with the total number of STs, and increases with the number of STs whose average throughput is smaller than the minimum throughput requirement.

Author Contributions

This research was carried out through a concerted effort over seven months. Each author's role is summarized as follows: Conceptualization, X.J.; methodology, X.J.; software, X.J.; validation, K.Z., K.C., and X.L.; investigation, X.J., K.Z., K.C., and X.L.; writing—original draft preparation, X.J.; writing—review and editing, K.Z.; supervision, K.C.; funding acquisition, K.C. All authors have read and agreed to the published version of the manuscript.

Funding

National Natural Science Foundation of China, Grant Number: 61902351, 61902353, 61872322. Zhejiang Provincial Natural Science Foundation of China, Grant Number: LY21F020022, LY21F020023, LR20F020003.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this paper:

RF	Radio frequency
CR	Cognitive radio
CRN	CR network
AB	Ambient backscatter
ABC	AB communication
AB-CRN	AB-assisted CRN
ABO-CRN	AB-assisted overlay CRN
ABU-CRN	AB-assisted underlay CRN
DRL	Deep reinforcement learning
DDPG	Deep deterministic policy gradient
AoI	Age of information
PU	Primary user
PT	Primary transmitter
PR	Primary receiver
SU	Secondary user
ST	Secondary transmitter
SR	Secondary receiver

References

1. Liu, X.; Zheng, K.; Chi, K.; Zhu, Y. Cooperative spectrum sensing optimization in energy-harvesting cognitive radio networks. IEEE Trans. Wirel. Commun. 2020, 19, 7663–7676.
2. Zheng, K.; Liu, X.; Zhu, Y.; Chi, K.; Liu, K. Total throughput maximization of cooperative cognitive radio networks with energy harvesting. IEEE Trans. Wirel. Commun. 2019, 19, 533–546.
3. Siegel, J.E.; Kumar, S.; Sarma, S.E. The future Internet of Things: Secure, efficient, and model-based. IEEE Internet Things J. 2018, 5, 2386–2398.
4. Zhang, S.; Kong, S.; Chi, K.; Huang, L. Energy management for secure transmission in wireless powered communication networks. IEEE Internet Things J. 2021, 9, 1171–1181.
5. Niyato, D.; Kim, D.I.; Han, Z.; Maso, M. Wireless powered communication networks: Architectures, protocol designs, and standardization. IEEE Wirel. Commun. 2016, 23, 8–9.
6. Chi, K.; Zhu, Y.; Li, Y.; Huang, L.; Xia, M. Minimization of transmission completion time in wireless powered communication networks. IEEE Internet Things J. 2017, 4, 1671–1683.
7. Yang, L.; Zhou, Y.J.; Zhang, C.; Zhang, X.; Yang, X.; Tan, C. Compact multiband wireless energy harvesting based battery-free body area networks sensor for mobile healthcare. IEEE J. Electromagn. Microw. Med. Biol. 2018, 2, 109–115.
8. Chi, K.; Chen, Z.; Zheng, K.; Zhu, Y.H.; Liu, J. Energy provision minimization in wireless powered communication networks with network throughput demand: TDMA or NOMA. IEEE Trans. Commun. 2019, 67, 6401–6414.
9. Hoang, T.M.; El Shafie, A.; da Costa, D.B.; Duong, T.; Tuan, H.; Marshall, A. Security and energy harvesting for MIMO-OFDM networks. IEEE Trans. Commun. 2019, 68, 2593–2606.
10. Azarhava, H.; Musevi Niya, J. Energy efficient resource allocation in wireless energy harvesting sensor networks. IEEE Wirel. Commun. Lett. 2020, 9, 1000–1003.
11. Ghosh, D.; Hanawal, M.K.; Zlatanov, N. Learning to optimize energy efficiency in energy harvesting wireless sensor networks. IEEE Wirel. Commun. Lett. 2021, 10, 1153–1157.
12. Gu, Z.; Shen, T.; Wang, Y.; Lau, F.C.M. Efficient rendezvous for heterogeneous interference in cognitive radio networks. IEEE Trans. Wirel. Commun. 2020, 19, 91–105.
13. Sangdeh, P.K.; Pirayesh, H.; Quadri, A.; Zeng, H. A practical spectrum sharing scheme for cognitive radio networks: Design and experiments. IEEE/ACM Trans. Netw. 2020, 28, 1818–1831.
14. Zheng, K.; Liu, X.; Liu, X.; Zhu, Y. Hybrid overlay-underlay cognitive radio networks with energy harvesting. IEEE Trans. Commun. 2019, 67, 4669–4682.
15. Papadopoulos, A.; Chatzidiamantis, N.D.; Georgiadis, L. Network coding techniques for primary-secondary user cooperation in cognitive radio networks. IEEE Trans. Wirel. Commun. 2020, 19, 4195–4208.
16. Rathee, G.; Jaglan, N.; Garg, S.; Choi, B.J.; Choo, K.K.R. A secure spectrum handoff mechanism in cognitive radio networks. IEEE Trans. Cogn. Commun. Netw. 2020, 6, 959–969.
17. Hoang, D.T.; Niyato, D.; Kim, D.I.; Van Huynh, N.; Gong, S. Ambient Backscatter Communication Networks, 1st ed.; Cambridge University Press: Cambridge, UK, 2020; pp. 18–93.
18. Liu, V.; Parks, A.; Talla, V.; Gollakota, S.; Wetherall, D.; Smith, J.R. Ambient backscatter: Wireless communication out of thin air. ACM SIGCOMM Comput. Commun. Rev. 2013, 43, 39–50.
19. Ye, Y.; Shi, L.; Chu, X.; Lu, G. On the outage performance of ambient backscatter communications. IEEE Internet Things J. 2020, 7, 7265–7278.
20. Liu, W.; Shen, S.; Tsang, D.H.; Murch, R. Enhancing ambient backscatter communication utilizing coherent and non-coherent space-time codes. IEEE Trans. Wirel. Commun. 2022, 20, 6884–6897.
21. Madavani, F.K.; Soleimanpour-Moghadam, M.; Talebi, S.; Chatzinotas, S.; Ottersten, B. Joint resource allocation for full-duplex ambient backscatter communication: A difference convex algorithm. IEEE Trans. Wirel. Commun. 2022, in press.
22. Hoang, D.T.; Niyato, D.; Wang, P.; Kim, D.I.; Han, Z. Ambient backscatter: A new approach to improve network performance for RF-powered cognitive radio networks. IEEE Trans. Commun. 2017, 65, 3659–3674.
23. Zhuang, Y.; Li, X.; Ji, H.; Zhang, H.; Leung, V.C.M. Optimal resource allocation for RF-powered underlay cognitive radio networks with ambient backscatter communication. IEEE Trans. Veh. Technol. 2020, 69, 15216–15228.
24. Zhu, K.; Xu, L.; Niyato, D. Distributed resource allocation in RF-powered cognitive ambient backscatter networks. IEEE Trans. Green Commun. Netw. 2021, 5, 1657–1668.
25. Kaul, S.; Yates, R.; Gruteser, M. Real-time status: How often should one update. In Proceedings of the IEEE INFOCOM, Orlando, FL, USA, 25–30 March 2012.
26. Leng, S.; Yener, A. Age of information minimization for an energy harvesting cognitive radio. IEEE Trans. Cogn. Commun. Netw. 2019, 5, 427–439.
27. Gu, Y.; Chen, H.; Zhai, C.; Li, Y.; Vucetic, B. Minimizing age of information in cognitive radio-based IoT systems: Underlay or overlay? IEEE Internet Things J. 2019, 6, 10273–10288.
28. Wang, Q.; Chen, H.; Gu, Y.; Li, Y.; Vucetic, B. Minimizing the age of information of cognitive radio-based IoT systems under a collision constraint. IEEE Trans. Wirel. Commun. 2020, 19, 8054–8067.
29. Abbas, Q.; Zeb, S.; Hassan, S.A. Wireless-Powered Backscatter Communications for Internet of Things, 1st ed.; Springer International Publishing: Cham, Switzerland, 2021; pp. 67–80.
30. Sutton, G.J.; Zeng, J.; Liu, R.P.; Ni, W.; Nguyen, D.N.; Jayawickrama, B.A.; Huang, X.; Abolhasan, M.; Zhang, Z.; Dutkiewicz, E.; et al. Enabling technologies for ultra-reliable and low latency communications: From PHY and MAC layer perspectives. IEEE Commun. Surv. Tuts. 2019, 21, 2488–2524.
31. Rajaraman, N.; Vaze, R.; Reddy, G. Not just age but age and quality of information. IEEE J. Sel. Area Commun. 2021, 39, 1325–1338.
32. Liu, Q.; Zeng, H.; Chen, M. Minimizing age-of-information with throughput requirements in multi-path network communication. In Proceedings of the Twentieth ACM International Symposium on Mobile Ad Hoc Networking and Computing, New York, NY, USA, 2 July 2019.
33. Kadota, I.; Sinha, A.; Modiano, E. Scheduling algorithms for optimizing age of information in wireless networks with throughput constraints. IEEE/ACM Trans. Netw. 2019, 27, 1359–1372.
34. Bhat, R.V.; Vaze, R.; Motani, M. Throughput maximization with an average age of information constraint in fading channels. IEEE Trans. Commun. 2021, 20, 481–494.
35. Luong, N.; Hoang, D.; Gong, S.; Niyato, D.; Wang, P.; Liang, Y.; Kim, D.I. Applications of deep reinforcement learning in communications and networking: A survey. IEEE Commun. Surv. Tuts. 2019, 21, 3133–3174.
36. Zhu, B.; Chi, K.; Liu, J.; Yu, K.; Mumtaz, S. Efficient offloading for minimizing task computation delay of NOMA-based multi-access edge computing. IEEE Trans. Commun. 2022, in press.
37. Wei, Y.; Yu, F.; Song, M.; Han, Z. User scheduling and resource allocation in hetNets with hybrid energy supply: An actor-critic reinforcement learning approach. IEEE Trans. Wirel. Commun. 2018, 17, 680–692.
38. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. In Proceedings of the ICLR, San Juan, Puerto Rico, 2–4 May 2016.
39. Yan, Z.; Chen, S.; Zhang, X.; Liu, H. Outage performance analysis of wireless energy harvesting relay-assisted random underlay cognitive networks. IEEE Internet Things J. 2018, 5, 2691–2699.
40. Huynh, N.V.; Hoang, D.T.; Nguyen, D.N.; Dutkiewicz, E.; Niyato, D.; Wang, P. Reinforcement learning approach for RF-powered cognitive radio network with ambient backscatter. In Proceedings of the IEEE Global Communications Conference (GLOBECOM), Abu Dhabi, United Arab Emirates, 9–13 December 2018.
41. Taricco, G. On the convergence of multipath fading channel gains to the Rayleigh distribution. IEEE Wirel. Commun. Lett. 2015, 4, 549–552.
42. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the PMLR, Lille, France, 7–9 July 2015.

Figure 1. System model and frame structure of the AB-assisted overlay CRN: (a) depicts the system model when

s_{t}^{a} = 1

, i.e., the channel is active. (b) depicts the time frame structure when

s_{t}^{a} = 1

. (c) depicts the system model when

s_{t}^{a} = 0

, i.e., the channel is inactive. (d) depicts the time frame structure when

s_{t}^{a} = 0

.

Figure 2. System model and frame structure of the ABCs: (a) depicts the system model when

s_{t}^{a} = 1

. (b) depicts the time frame structure when

s_{t}^{a} = 1

.

Figure 3. System model and frame structure of the CRNs: (a) depicts the system model when

s_{t}^{a} = 1

. (b) depicts the time frame structure when

s_{t}^{a} = 1

. (c) depicts the system model when

s_{t}^{a} = 0

. (d) depicts the time frame structure when

s_{t}^{a} = 0

.

Figure 4. Flow chart of the AB-assisted overlay CRN.

Figure 5. The action space diagram.

Figure 6. Throughput

T

and AoI versus

T_{m i n}

under

P_{a} =

0.3 (circle), 0.6 (square), 0.9 (triangle) compared with that of T-O and A-O baseline schemes, and

A_{m a x} = 5

: (a) describes the throughput performance. (b) describes the AoI performance.

Figure 7. Throughput

T

and AoI versus

A_{m a x}

under

P_{a} =

0.3 (circle), 0.6 (square), 0.9 (triangle) compared with that of T-O and A-O baseline schemes, and

T_{m i n} = 700

bps: (a) describes the throughput performance. (b) describes the AoI performance.

Figure 8. Throughput

T

and AoI versus

T_{m i n}

under

P_{a} =

0.3 (circle), 0.6 (square), 0.9 (triangle) compared with that in the ABCs and CRNs, and

A_{m a x} = 5

: (a) describes the throughput performance. (b) describes the AoI performance.

Figure 9. Throughput

T

and AoI versus

A_{m a x}

under

P_{a} =

0.3 (circle), 0.6 (square), 0.9 (triangle) compared with that in the ABCs and CRNs, and

T_{m i n} = 700

bps: (a) describes the throughput performance. (b) describes the AoI performance.

Table 1. Comparison of related works.

Scenario	Metric	Limitations
CRNs	Throughput [14,15], AoI [26,27,28]	Short-term optimization [14,15,27], single ST [14,15,26,27,28], single metric optimization, single resource management [14,15,26,27].
ABCs	Outage probability [19], backscatter efficiency [20], throughput [21], AoI [29]	Short-term optimization, single resource management [19,20,21], single metric optimization [19,20,21,29]
AB-CRNs	Throughput [22,23], coverage probability [24]	Short-term optimization [22,23], single ST [22,23,26], single metric optimization [22,23,24], single resource management [22,23].

Table 2. Parameter list.

Parameter	Description
n	The number of STs is n + 1
$s_{t}^{a}$	The channel state in frame t
$P_{a}$	The probability of the active channel state
E	The capacity of rechargeable capacitor
$e_{t, i}$	The allocated energy for overlay mode transmission of ST $_{i}$
$ε_{t, i}$	The available energy of ST $_{i}$ in frame t
$α_{t, i}$	The duration of data transmission by ST $_{i}$ in frame t
$T_{t}$	The total throughput of secondary network in frame t
$T_{t, i}$	The throughput of ST $_{i}$ in frame t
$T_{t}^{A}$	The throughput of STs by AB mode transmission
$T_{t}^{O}$	The throughput of STs by overlay mode transmission
$T_{m i n}$	The minimum throughput requirement for each ST
W	The bandwidth
$P$	The transmit power of the PT
$g_{t, i}$	The channel gain from the PT to ST $_{i}$ in frame t
$h_{t, i}$	The channel gain from ST $_{i}$ to gateway in frame t
$θ$	The backscatter reflection coefficient
$δ^{2}$	The variance of AWGN
$a_{t, i}$	The AoI of ST $_{i}$ in frame t
$A_{m a x}$	The maximum allowable AoI

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

