DDPG-Based Throughput Optimization with AoI Constraint in Ambient Backscatter-Assisted Overlay CRN

Combining ambient backscatter (AB) communications (ABCs) with RF-powered cognitive radio networks (CRNs) addresses the challenges of both energy supply and spectrum shortage, and improves network performance. With the expansion of wireless networks, many applications require both high throughput and timely data. Motivated by these facts, we study the long-term throughput optimization of the secondary network in the AB-assisted overlay CRN (ABO-CRN), ABCs, and CRNs under an age of information (AoI) constraint, where AoI is a novel metric for measuring the freshness of data received by receivers. Because the environment is dynamic, complete knowledge of the environment cannot be obtained. We therefore deploy the deep deterministic policy gradient (DDPG), a deep reinforcement learning (DRL) method that addresses decision problems in both continuous and discrete spaces, to solve the throughput optimization. We consider the impacts of time and energy allocation on the reward when the AoI constraint cannot be satisfied, and develop the corresponding reward functions. Furthermore, we analyze the impacts of the minimum throughput requirement and the maximum allowable AoI on the throughput and AoI of the secondary networks in the ABO-CRN, ABCs, and CRNs. We compare the throughput optimization scheme under the AoI constraint with two baseline schemes, namely the throughput-optimal (T-O) and AoI-optimal (A-O) baseline schemes. The simulation results show that the throughput of the ABO-CRN is close to the optimal throughput of the T-O baseline scheme, and the AoI of the ABO-CRN is close to the optimal AoI of the A-O baseline scheme.


Introduction
The number of wireless devices has increased year by year, and the demand for spectrum has increased accordingly [1]. However, the amount of spectrum is limited, and the majority of it has been allocated to licensed users (primary users, PUs) as licensed spectrum. PUs occupy the licensed spectrum only at certain times and places, so the licensed spectrum may remain unoccupied for long periods. In this case, the utilization ratio of the licensed spectrum is low [2]. In addition, due to their limited size, wireless devices cannot carry large-capacity batteries, and frequent battery replacement is not possible under certain circumstances, such as inside chimneys and inside bodies [3]. Therefore, the issues of spectrum shortage and energy supply have attracted a large number of researchers. The RF-powered ambient backscatter (AB)-assisted cognitive radio networks (CRNs) (AB-CRNs) are introduced to improve the spectrum utilization ratio and alleviate the difficulty of energy supply. We introduce the RF-powered AB-CRNs from the following three aspects: the energy harvesting technology, the CRNs, and the AB communications (ABCs).
Energy harvesting technology, which allows wireless devices to scavenge energy from the environmental energy sources, is an efficient way to address the difficulty of energy supply [2,4]. The energy sources can be solar, wind, heat, and RF signals, etc. Some energy sources are unstable and conditionally available. Taking solar as an example, AB technology, the combination of AB technology and RF-powered CRNs is potentially helpful for dealing with spectrum shortage and energy insufficiency. The RF-powered AB-CRNs are energy-saving and spectrum-saving [22]. As far as we know, Hoang et al. in [22] initially introduced the RF-powered AB-CRNs, and analyzed the throughput of the AB-assisted overlay CRN (ABO-CRN) and the AB-assisted underlay CRN (ABU-CRN). Extended from [22], Zhuang et al. [23] studied the RF-powered AB-CR-NOMA networks, where STs perform underlay mode transmission. In [24], Zhu et al. investigated the distributed resource allocation in AB-CRNs.
In [22,23], the authors investigated popular metrics such as throughput. In addition, the age of information (AoI), a novel metric that measures the freshness of data received at the receiver, has been extensively studied in recent years [25]. Unlike the delay metric, which focuses on data timeliness at the transmitter, AoI focuses on data timeliness at the receiver. The authors in [26-28] investigated AoI minimization in CRNs. In [26], Leng et al. utilized the partially observable Markov decision process to analyze the AoI performance in several cases. In [27], Gu et al. studied the AoI in overlay and underlay scenarios, and analyzed the effect of the critical generation rate of the primary IoT on the secondary IoT. In [28], Wang et al. took the collision constraint into account to minimize the long-term average AoI. AoI minimization in ABCs is also an important research topic. In [29], Abbas et al. focused on the minimization of AoI in backscatter communications, and introduced several algorithms for AoI minimization.
With the expansion of wireless networks, timely delivery is required [30], and in scenarios such as cyber-physical systems, low-quality but timely data is useless [31]. However, throughput optimization cannot guarantee the freshness of data, and AoI optimization cannot guarantee the quality of data. Liu et al. [32] investigated AoI minimization under throughput requirements in the multi-path network. Kadota et al. [33] proposed a low-complexity scheduling algorithm for AoI minimization with throughput constraints in the wireless network. Bhat et al. [34] studied throughput maximization under the AoI constraint in fading channels.
Compared with the optimization of a single frame, long-term optimization is more practical. In practice, the network environment is dynamic, and the channel quality, such as the channel gain, varies from frame to frame. The optimization of a single frame is limited by the channel quality of that frame, and affects the optimization of the subsequent frames. Short-term optimization ignores the connection between the current frame and the subsequent frames, which degrades the performance of the network. Long-term optimization takes this connection into account, and provides more practical decisions to enhance the performance of the network. Take the throughput optimization of a single frame by energy management as an example. The throughput increases with the consumed energy, hence consuming all the available energy in the frame is optimal for short-term optimization. However, consuming all the available energy in a frame with poor channel quality leaves no energy for a later frame with good channel quality. Therefore, long-term optimization is more practical than short-term optimization.

Due to the dynamic and uncertain network parameters, complete knowledge about the network cannot be obtained in advance. Some traditional methods are incapable of addressing problems with too many dynamic and unpredictable environmental parameters. Deep reinforcement learning (DRL) has been shown to be an effective way to tackle this challenge [35,36]. When value-based DRL methods are applied, the action space has to be discrete. If the discretization is improper for the scenario, vital information may be lost, or the dimension of the action space may become too large [37]. Unlike value-based DRL methods, DRL methods based on the policy gradient are able to deal with problems in continuous spaces, and do not need to discretize the action space. Policy gradient-based DRL methods have been applied to wireless networks [38].
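The energy-management example above can be illustrated with a small numerical sketch. It assumes a Shannon-type throughput of the form alpha * W * log2(1 + e * h / (alpha * noise)), two frames with hypothetical channel gains, and no energy harvesting between the frames. The function names and all numbers are illustrative assumptions and are not taken from the paper.

```python
import math

def overlay_throughput(alpha, energy, gain, bandwidth=1.0, noise=1.0):
    """Shannon-type throughput when `energy` is spent over a fraction `alpha` of a frame.

    Assumed form (not the paper's exact equation): alpha * W * log2(1 + (e / alpha) * gain / noise).
    """
    if alpha <= 0.0 or energy <= 0.0:
        return 0.0
    power = energy / alpha
    return alpha * bandwidth * math.log2(1.0 + power * gain / noise)

def two_frame_total(energy_budget, spend_first, gains):
    """Total throughput over two frames when `spend_first` is consumed in frame 1."""
    first = overlay_throughput(1.0, spend_first, gains[0])
    second = overlay_throughput(1.0, energy_budget - spend_first, gains[1])
    return first + second

gains = (0.1, 2.0)   # hypothetical: poor channel in frame 1, good channel in frame 2
budget = 4.0         # hypothetical energy available at the start of frame 1

greedy = two_frame_total(budget, budget, gains)   # short-term view: spend everything now
patient = two_frame_total(budget, 0.0, gains)     # keep all energy for the good frame

# Coarse grid search over the energy spent in frame 1 (long-term view).
best_split = max(
    (two_frame_total(budget, k * budget / 100, gains), k * budget / 100)
    for k in range(101)
)
print(greedy, patient, best_split)
```

Under these assumed numbers, spending the whole budget in the poor frame yields less total throughput than saving it, which is the effect a long-term policy is meant to capture.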
Taking the limitations summarized in Table 1 into account, we summarize the novelties and contributions as follows. Considering the problems of spectrum shortage and energy insufficiency, we focus on AB-CRNs. The majority of studies on AB-CRNs evaluated network performance metrics such as throughput, energy consumption, and delay, and ignored the data freshness at the secondary receivers (SRs). Motivated by this fact, we optimize the long-term throughput of the secondary network in the ABO-CRN with the AoI constraint, in order to guarantee both high throughput and data freshness for the secondary network. To the best of our knowledge, we are the first to study the optimization of both throughput and AoI in the research area of AB-CRNs. The main contributions are summarized as follows.
• In order to achieve the long-term throughput optimization of the secondary network with the AoI constraint, we utilize the deep deterministic policy gradient (DDPG), a DRL method based on the policy gradient, to find the optimal policy for jointly managing the time and energy of STs. Considering the impacts of time and energy allocation on the reward when the AoI constraint cannot be satisfied, we develop the corresponding reward functions with respect to the channel states.
• We analyze the minimum throughput requirement and the maximum allowable AoI for the throughput and AoI performances in the ABO-CRN, ABCs, and CRNs.
• We introduce throughput-optimal (T-O) and AoI-optimal (A-O) baseline schemes as comparisons for the throughput optimization with the AoI constraint. The simulation results show that the throughput of the ABO-CRN is close to the optimal throughput of the T-O baseline scheme, and the AoI of the ABO-CRN is close to the optimal AoI of the A-O baseline scheme.
• We evaluate the impacts of the minimum throughput requirement and the maximum allowable AoI on the throughput and AoI performances of the secondary networks in the ABO-CRN, ABCs, and CRNs, and demonstrate that the ABO-CRN improves the throughput and AoI performances of the ABCs and CRNs.
The remainder of this paper is organized as follows. In Section 2, we introduce the network model and operations of STs in the ABO-CRN, ABCs, and CRNs. In Section 3, we introduce the problem formulation, such as throughput and AoI definitions. In Section 4, we utilize DDPG to find the optimal policy for jointly managing time and energy of STs. In Section 5, simulation results are shown. In Section 6, we conclude the paper.

System Model
In this section, we first depict the structures and channel models of the ABO-CRN, ABCs, and CRNs, and then introduce the network models of the ABO-CRN, ABCs, and CRNs, respectively.

Structures and Channel Models
The ABO-CRN, ABCs, and CRNs are each composed of a primary network and a secondary network. The secondary network consists of an SR and $n + 1$ STs, $n \in \{0, 1, \ldots\}$. In the primary network, the primary transmitter (PT) utilizes the licensed channel to transmit data. The probability that the PT occupies the channel in each frame, denoted by $P_a$, can be obtained through long-term observation. When the PT transmits data in frame $t \in \{1, 2, \ldots, K\}$, the channel state is active, denoted by $s^a_t = 1$. When the PT does not transmit data in frame $t$, the channel state is inactive, denoted by $s^a_t = 0$. In the secondary network, following the random distribution of the SUs in [39], the SR and STs are randomly placed within the coverage of the primary RF signals from the PT, as shown in Figure 1a,c. Each ST is equipped with a single antenna and a rechargeable capacitor with finite capacity $E$. The SR is equipped with a single antenna and a wired energy source, hence there is no need to consider the energy supply of the SR. In order to measure the freshness of the data that the SR receives from STs, the SR is able to record the AoI of the data from each ST. Similar to [40], the SR plays the role of a central controller that manages the time and energy of STs. At the very beginning of each frame, the SR senses the channel, and then provides the allocation of time and energy for STs.
In the considered scenario, frames with equal duration are successive. The frame duration is synchronized with the primary network, and without loss of generality, we normalize the frame duration to 1 [2]. As shown in Figures 1b,d, 2b and 3b,d, each frame consists of one or more slots. The duration of each slot is determined by the SR. We consider that the channel state remains unchanged within one frame, but varies across frames. When $s^a_t = 1$, the SR and STs receive stable and continuous RF signals from the PT. In frame $t$, the channel gain between the PT and ST$_i$, $i = 0, 1, \ldots, n$, is denoted by $g_{t,i}$, and that between ST$_i$ and the SR is denoted by $h_{t,i}$. The Rayleigh distribution [41] is used to model the channel gains, which remain unchanged within one frame, and the channel noise is modeled as additive white Gaussian noise (AWGN) with variance $\delta^2$.
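The frame-level randomness described above can be sketched as follows. This is a minimal illustration under stated assumptions: the channel state is drawn independently per frame with probability P_a of being active, each channel gain is drawn from a Rayleigh distribution with a hypothetical scale parameter, and the function names are illustrative rather than the authors' simulator.

```python
import math
import random

def sample_channel_state(p_active, rng):
    """Return 1 (PT active) with probability p_active, otherwise 0."""
    return 1 if rng.random() < p_active else 0

def sample_rayleigh(scale, rng):
    """Draw a Rayleigh-distributed value as the norm of two zero-mean Gaussians."""
    return math.hypot(rng.gauss(0.0, scale), rng.gauss(0.0, scale))

def simulate_frames(num_frames, num_sts, p_active, scale=1.0, seed=0):
    """Generate (state, g, h) per frame; g and h stay fixed within a frame."""
    rng = random.Random(seed)
    frames = []
    for _ in range(num_frames):
        state = sample_channel_state(p_active, rng)
        g = [sample_rayleigh(scale, rng) for _ in range(num_sts)]  # PT -> ST_i
        h = [sample_rayleigh(scale, rng) for _ in range(num_sts)]  # ST_i -> SR
        frames.append((state, g, h))
    return frames

frames = simulate_frames(num_frames=10000, num_sts=3, p_active=0.6)
active_ratio = sum(state for state, _, _ in frames) / len(frames)
print(round(active_ratio, 3))  # empirical estimate of P_a from long-term observation
```

The empirical ratio of active frames plays the role of the long-term observation from which P_a is obtained.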

Network Model of ABO-CRN
The operations executed by the STs in the ABO-CRN depend on the value of $s^a_t$. When $s^a_t = 1$, as shown in Figure 1a,b, STs execute AB mode transmission following the TDMA scheme, and ST$_i$ harvests energy while the other STs transmit data in AB mode. We consider the scenario where the energy consumption of AB mode transmission is negligible. Therefore, no dedicated slot for energy harvesting is required by STs. When $s^a_t = 0$, as shown in Figure 1c,d, STs follow the TDMA scheme and execute overlay mode transmission by consuming the energy stored in the rechargeable capacitor. Since the aim is to optimize the long-term average throughput with the AoI constraint, ST$_i$ may not consume all the available energy $\varepsilon_{t,i}$ during frame $t$. The energy consumed by ST$_i$, denoted by $e_{t,i}$, is determined by the SR.
We provide a flow chart of the ABO-CRN in Figure 4. The actions in Figure 4 are executed by STs according to $s^a_t$. The SR decides the time and energy allocation of STs according to the channel information and the states of each ST. The channel information includes the channel state $s^a_t$ and the channel gains. The states of each ST include the available energy and the AoI in the current frame. Note that the feedback information in Figure 4 includes two parts. The first part is the new energy state of each ST, which is the available energy in the next frame. The second part is the reward received after STs execute the actions decided by the SR.

Network Model of ABCs
The operations executed by the STs in the ABCs depend on the value of $s^a_t$. When $s^a_t = 1$, as shown in Figure 2, STs take turns to execute AB mode transmission following the TDMA scheme. When $s^a_t = 0$, since the PT does not broadcast RF signals on the current channel, STs do not execute AB mode transmission.

Network Model of CRNs
The operations executed by the STs in the CRNs depend on the value of $s^a_t$. When $s^a_t = 1$, as shown in Figure 3a,b, STs harvest energy. When $s^a_t = 0$, as shown in Figure 3c,d, STs follow the TDMA scheme and execute overlay mode transmission by consuming the energy stored in the rechargeable capacitor. The energy $e_{t,i}$ consumed by ST$_i$ is determined by the SR.

Formulation and Analysis of the Problem
For readability, Table 2 lists the main parameters and their meanings.

Parameter | Description
$n$ | The number of STs is $n + 1$
$s^a_t$ | The channel state in frame $t$
$P_a$ | The probability of the active channel state
$E$ | The capacity of the rechargeable capacitor
$e_{t,i}$ | The allocated energy for overlay mode transmission of ST$_i$
$\varepsilon_{t,i}$ | The available energy of ST$_i$ in frame $t$
$\alpha_{t,i}$ | The duration of data transmission by ST$_i$ in frame $t$
$T_t$ | The total throughput of the secondary network in frame $t$
$T_{t,i}$ | The throughput of STs by overlay mode transmission
$T_{\min}$ | The minimum throughput requirement for each ST
$W$ | The bandwidth
$P$ | The transmit power of the PT
$g_{t,i}$ | The channel gain from the PT to ST$_i$ in frame $t$
$h_{t,i}$ | The channel gain from ST$_i$ to the SR in frame $t$
$\theta$ | The backscatter reflection coefficient
$\delta^2$ | The variance of AWGN
$a_{t,i}$ | The AoI of ST$_i$ in frame $t$
$A_{\max}$ | The maximum allowable AoI

Throughput Definition
The total throughput $T_t$ of the secondary network in the ABO-CRN, ABCs, and CRNs in frame $t$ is the sum of the throughput of all STs, $T_t = \sum_{i=0}^{n} T_{t,i}$, where $T_{t,i}$ denotes the throughput of ST$_i$ in frame $t$. Because the operations executed by STs depend on the channel state $s^a_t$, the calculation of $T_{t,i}$ depends on the value of $s^a_t$.

Throughput Definition of ABO-CRN
In the ABO-CRN, $T_{t,i}$ is given by Equation (2): it equals the AB mode throughput $T^A_{t,i}$ when $s^a_t = 1$, and the overlay mode throughput $T^O_{t,i}$ when $s^a_t = 0$.
When $s^a_t = 1$, according to Shannon's theorem, the throughput of ST$_i$ by AB mode transmission, denoted by $T^A_{t,i}$, is given by Equation (3), where $\alpha_{t,i} \in [0, 1]$ denotes the duration of data transmission by ST$_i$ in AB mode, $W$ denotes the bandwidth, $\theta \in [0, 1]$ denotes the backscatter reflection coefficient that depends on the electronic components, $P$ denotes the transmit power of the PT, $g_{t,i}$ denotes the channel gain from the PT to ST$_i$, and $h_{t,i}$ denotes the channel gain from ST$_i$ to the SR. In particular, $\theta P g_{t,i}$ represents the transmit power of ST$_i$ for AB mode transmission. ST$_i$ harvests energy while the other STs transmit data in AB mode, and the harvested energy of ST$_i$ is denoted by $e^h_{t,i}$. After energy harvesting, the available energy of ST$_i$ in frame $t + 1$, denoted by $\varepsilon_{t+1,i}$, is updated from $\varepsilon_{t,i}$ and $e^h_{t,i}$.
When $s^a_t = 0$, according to Shannon's theorem, the throughput of ST$_i$ by overlay mode transmission, denoted by $T^O_{t,i}$, is a function of $\alpha_{t,i}$ and $e_{t,i}$, where $\alpha_{t,i} \in [0, 1]$ denotes the duration of data transmission by ST$_i$ in overlay mode, and $e_{t,i} \in [0, \varepsilon_{t,i}]$ denotes the energy consumed for overlay mode transmission in frame $t$. In particular, $e_{t,i}$ is determined by the SR, and the available energy $\varepsilon_{t+1,i}$ of ST$_i$ in frame $t + 1$ is updated according to Equation (7).
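The throughput and energy bookkeeping described above can be sketched in code. The display equations are not reproduced in this text, so every formula below is an assumption chosen to match the verbal description: AB mode uses the backscattered power theta * P * g received through gain h, overlay mode transmits with power e / alpha, harvesting is proportional to the time the other STs spend backscattering (with a hypothetical efficiency factor), and the stored energy is capped by the capacity E.

```python
import math

def ab_throughput(alpha, bandwidth, theta, pt_power, g, h, noise_var):
    """Assumed AB-mode throughput: alpha * W * log2(1 + theta * P * g * h / noise)."""
    return alpha * bandwidth * math.log2(1.0 + theta * pt_power * g * h / noise_var)

def overlay_throughput(alpha, energy, bandwidth, h, noise_var):
    """Assumed overlay-mode throughput with transmit power energy / alpha."""
    if alpha <= 0.0 or energy <= 0.0:
        return 0.0
    tx_power = energy / alpha
    return alpha * bandwidth * math.log2(1.0 + tx_power * h / noise_var)

def harvested_energy(other_sts_time, pt_power, g, efficiency=1.0):
    """Assumed harvest while the other STs backscatter; `efficiency` is hypothetical."""
    return efficiency * other_sts_time * pt_power * g

def next_energy(available, capacity, harvested=0.0, consumed=0.0):
    """Energy available in the next frame, capped by the capacitor capacity."""
    if not 0.0 <= consumed <= available:
        raise ValueError("consumed energy must lie in [0, available]")
    return min(available - consumed + harvested, capacity)

# Active frame: ST_i backscatters for 25% of the frame and harvests for the rest.
rate_ab = ab_throughput(0.25, 6e6, 0.9, 1.0, 0.5, 0.5, 1e-3)
energy_after_active = next_energy(20.0, 30.0, harvested=harvested_energy(0.75, 1.0, 0.5))

# Inactive frame: ST_i spends 5 units of stored energy over half of the frame.
rate_overlay = overlay_throughput(0.5, 5.0, 6e6, 0.5, 1e-3)
energy_after_inactive = next_energy(20.0, 30.0, consumed=5.0)
print(rate_ab, rate_overlay, energy_after_active, energy_after_inactive)
```

All numerical values in the usage lines are placeholders; only the ranges of alpha, e, and theta follow the text.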

Throughput Definition of ABCs
In the ABCs, the throughput of STs is achieved by AB mode transmission when the channel state is active. Therefore, when $s^a_t = 0$, $T_{t,i} = 0$ holds, and when $s^a_t = 1$, according to Shannon's theorem, $T_{t,i}$ is given by Equation (8), where $\alpha_{t,i}$, $\theta$, $P$, $g_{t,i}$, and $h_{t,i}$ have the same meanings as in Equation (3). Since the energy consumption of AB mode transmission is negligible, the energy update is not considered in the ABCs.

Throughput Definition of CRNs
In the CRNs, the throughput of STs is achieved by overlay mode transmission when the channel is inactive. When the channel is active, STs harvest energy from the RF signal of the PT. Therefore, when $s^a_t = 1$, $T_{t,i} = 0$ holds, and the available energy $\varepsilon_{t+1,i}$ of ST$_i$ in frame $t + 1$ is updated with the harvested energy. When $s^a_t = 0$, according to Shannon's theorem, $T_{t,i}$ is given by Equation (10), and $\varepsilon_{t+1,i}$ is updated as in Equation (7).

Definition of AoI
AoI is a novel metric that measures the freshness of data received by the receiver. In particular, AoI tracks the time elapsed from the generation of the latest data to the time at which the latest data is successfully received by the receiver [33]. We utilize the linear scheme to calculate the AoI of STs, where the AoI is updated according to Equation (11). Here, $a_{t,i}$ denotes the AoI of ST$_i$ in frame $t$, $\lambda_{t,i} = 1$ indicates that the latest data of ST$_i$ is successfully received by the SR, and $\lambda_{t,i} = 0$ indicates that it is not. Since the aim is to optimize the long-term average throughput of the secondary network with the AoI constraint, we set a minimum throughput requirement $T_{\min}$ for every ST. Specifically, if the throughput of ST$_i$ during frame $t$ is no less than $T_{\min}$, the latest transmitted data of ST$_i$ is considered to be successfully received by the SR. Based on Equation (11) and this analysis, $\lambda_{t,i}$ is expressed in Equation (12): $\lambda_{t,i} = 1$ if $T_{t,i} \ge T_{\min}$, and $\lambda_{t,i} = 0$ otherwise. Combining Equations (11) and (12) gives the AoI update directly in terms of $T_{t,i}$ and $T_{\min}$. When $s^a_t = 0$ in the ABCs, STs achieve negligible throughput, hence the throughput of each ST cannot reach $T_{\min}$, and $a_{t+1,i} = a_{t,i} + 1$ holds. When $s^a_t = 1$ in the CRNs, the same conclusion holds.
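The AoI update rule above can be sketched as a short function. The success test (throughput no less than the minimum requirement) and the increment by one on failure follow the text. The value to which the AoI returns after a successful reception is not stated in this text, so `reset_value=1` is an assumption, and the function names are illustrative.

```python
def delivery_indicator(throughput, t_min):
    """lambda_{t,i}: 1 if the frame throughput meets the minimum requirement, else 0."""
    return 1 if throughput >= t_min else 0

def next_aoi(aoi, throughput, t_min, reset_value=1):
    """Linear AoI update: reset on success (assumed value), otherwise grow by one frame."""
    if delivery_indicator(throughput, t_min) == 1:
        return reset_value
    return aoi + 1

def aoi_trace(initial_aoi, throughputs, t_min):
    """Apply the update frame by frame and return the AoI after each frame."""
    trace, aoi = [], initial_aoi
    for throughput in throughputs:
        aoi = next_aoi(aoi, throughput, t_min)
        trace.append(aoi)
    return trace

# Hypothetical per-frame throughputs of one ST with T_min = 1.0.
trace = aoi_trace(1, [0.2, 0.2, 1.5, 0.1], t_min=1.0)
print(trace)
```

In this hypothetical trace the AoI grows during the two frames that miss the requirement, resets in the third frame, and grows again in the fourth.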

Problem Formulation
The throughput optimization problem of the ABO-CRN, ABCs, and CRNs is formulated as Problem (14), where $A_{\max}$ denotes the maximum allowable AoI that the secondary network tolerates, and Equation (14d) indicates that the average accumulated AoI should be smaller than $A_{\max}$. Since the energy consumed by AB mode transmission of STs is negligible, the SR in the ABCs does not need to consider the constraint in Equation (14c).
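For orientation, one plausible form of the optimization problem is sketched below. Only two facts are taken from the text: constraint (14c) concerns the energy of STs and constraint (14d) bounds the average accumulated AoI by $A_{\max}$. The objective, the time constraint, the averaging over $K$ frames and $n + 1$ STs, and the use of a non-strict inequality are assumptions inferred from the definitions of $\alpha_{t,i}$, $e_{t,i}$, and the normalized frame duration.

```latex
\begin{align*}
\max_{\{\alpha_{t,i},\, e_{t,i}\}} \quad
  & \frac{1}{K} \sum_{t=1}^{K} T_t
  && \text{(assumed objective: long-term average throughput)} \\
\text{s.t.} \quad
  & \sum_{i=0}^{n} \alpha_{t,i} \le 1, \quad \alpha_{t,i} \in [0, 1],
  && \text{(assumed: TDMA within a unit-length frame)} \\
  & 0 \le e_{t,i} \le \varepsilon_{t,i},
  \tag{14c} \\
  & \frac{1}{K (n + 1)} \sum_{t=1}^{K} \sum_{i=0}^{n} a_{t,i} \le A_{\max}.
  \tag{14d}
\end{align*}
```

Under this reading, dropping (14c) for the ABCs simply removes the energy variables, which is consistent with the statement that AB mode transmission consumes negligible energy.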

Analysis of T min and A max
In this subsection, we analyze $T_{\min}$ and $A_{\max}$ in the ABO-CRN, ABCs, and CRNs. The expectation of the long-term average throughput in the ABO-CRN, ABCs, and CRNs is denoted by $E[T_{t,i}]$. According to Equations (2), (8) and (10), we obtain Equation (16). Let $T^-$ denote the long-term average throughput of each ST whose average throughput is smaller than $T_{\min}$, let $T^+$ denote that of each ST whose average throughput is no smaller than $T_{\min}$, let $a$ denote the long-term average AoI of each ST whose average throughput is smaller than $T_{\min}$, and let $N = n + 1$.

Lemma 1.
When $E[T_{t,i}] \le T_{\min}$ holds, the closer $T^-$ is to $T_{\min}$, the larger the tolerable value interval of $a$. When $a > A_{\max} - 1$, the AoI constraint cannot be satisfied.

Proof.
We assume there are $x$ STs with $T^-$, and $N - x$ STs with $T^+$, and express the average throughput in terms of $x$, $T^-$, and $T^+$. Since $E[T_{t,i}] \le T_{\min}$ holds, and in order to satisfy the AoI constraint, Equation (14d) is rewritten as Equation (19). Substituting $x = \frac{N(T - T^+)}{T^- - T^+}$ into Equation (19) gives a bound on $a$. From this bound, when $T^-$ is closer to $T_{\min}$, the tolerable value interval of $a$ is larger. Since $T^- < T$ holds, $\frac{T^- - T^+}{T - T^+} < 1$ holds, hence $a$ cannot exceed $A_{\max} - 1$. Therefore, when $a > A_{\max} - 1$, the AoI constraint cannot be satisfied. The proof is completed.

Lemma 2.
The lower bound of $A_{\max}$ that makes STs satisfy Equation (14d) decreases with $n$, and increases with the number of STs whose average throughput is smaller than $T_{\min}$.

Proof.
We assume that $x$ is the number of STs whose average throughput is smaller than $T_{\min}$, and that $a$ is given. From Equation (19), we deduce a lower bound of $A_{\max}$ in terms of $x$, $a$, and $n$. With a larger value of $n$, the lower bound of $A_{\max}$ that makes STs satisfy Equation (14d) decreases. With a larger value of $x$, the lower bound of $A_{\max}$ increases. The proof is completed.
Then we compare the impacts of $T_{\min}$ and $A_{\max}$ on the ABO-CRN, ABCs, and CRNs. We discuss the impacts in some extreme cases, i.e., when $P_a$, the probability of $s^a_t = 1$, is relatively small or relatively large. In the ABCs, when $P_a$ is relatively small, there are few opportunities for AB mode transmission. In the CRNs, when $P_a$ is relatively large, there are few opportunities for overlay mode transmission. In these two cases, $E[T_{t,i}]$ is small, the lower bound of $T_{\min}$ is low, and the tolerable value interval and $A_{\max}$ are small. In contrast to the ABCs and CRNs, the STs in the ABO-CRN execute AB mode transmission when $s^a_t = 1$, and execute overlay mode transmission when $s^a_t = 0$. As described in Equation (16), $E[T_{t,i}]$ of the ABO-CRN is higher than that of the ABCs and CRNs. Under the same conditions of $P_a$, $T_{\min}$, and $A_{\max}$, the ABO-CRN achieves higher throughput while satisfying the AoI constraint.
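The monotonic behavior stated in Lemma 2 can be checked numerically with a small sketch. It rests on an assumption that is not spelled out in this text: the STs that meet the throughput requirement keep a long-term average AoI of 1, while the x STs below the requirement have average AoI a. Under that assumption, the average AoI over all N = n + 1 STs is the smallest value of A_max that can be satisfied. The function name and the numbers are illustrative.

```python
def a_max_lower_bound(num_sts, num_below, aoi_below, aoi_met=1.0):
    """Average AoI over all STs, i.e. the smallest feasible A_max under the assumption above."""
    if not 0 <= num_below <= num_sts:
        raise ValueError("num_below must lie in [0, num_sts]")
    return (num_below * aoi_below + (num_sts - num_below) * aoi_met) / num_sts

# Fix x = 2 lagging STs with average AoI a = 5 and grow the network size N.
bounds_by_n = [a_max_lower_bound(N, 2, 5.0) for N in (2, 4, 8)]

# Fix N = 8 and increase the number of lagging STs x.
bounds_by_x = [a_max_lower_bound(8, x, 5.0) for x in (1, 2, 4)]
print(bounds_by_n, bounds_by_x)
```

The first list decreases as N grows and the second increases as x grows, matching the two directions claimed in the lemma.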

Policies of Time and Energy Management
As described in the introduction, long-term optimization is more practical than the optimization of a single frame. Maximizing the throughput of a single frame with the AoI constraint may not be desirable. As a result, we consider the long-term optimization of the throughput. However, since network environmental factors, such as the channel state and channel gains, are dynamic and uncertain, it is difficult for SUs to obtain complete knowledge about these factors in advance. DRL is a suitable way to tackle this challenge. For some DRL methods, such as the deep Q-network (DQN), which is based on the value function, discrete spaces are necessary. If the discretization is not suitable for the scenario, important information may be lost, or the space dimension may become too high. Therefore, we utilize DDPG, which deals with problems in continuous spaces, to find the optimal policy of time and energy management for throughput optimization. We define the details of DDPG in the following subsections.

Definitions of Spaces and Rewards
The SR plays the role of agent that provides decisions for STs. According to Equations (2)-(14d), the state spaces, action space, and rewards are introduced as follows.

State Space
The SR determines the time and energy allocation of STs based on the states of STs and the channel information of the current frame, including the available energy of STs, the AoI of STs, the channel gains, and the channel state. Therefore, the state space contains information about energy states, AoI states, channel-gain states, and channel states. The energy-state space collects the available energy $\varepsilon_{t,i}$ of every ST. The AoI-state space collects the AoI $a_{t,i}$ of every ST, where the average accumulated AoI satisfies Equation (14d). In order to reduce the dimension of the channel-gain-state space, we represent the channel gains in terms of path loss coefficients, distances, and the channel path fading exponent, where $\eta_{g,t}$ denotes the path loss coefficient from the PT to STs, $\eta_{h,t}$ denotes the path loss coefficient from STs to the SR, $l_i$ denotes the distance between the PT and ST$_i$, and $L_i$ denotes the distance between the SR and ST$_i$. Therefore, the channel-gain-state space consists of $\eta_{h,t}$ and $\eta_{g,t}$, which follow the Rayleigh distribution. The channel-state space consists of $s^a_t \in \{0, 1\}$, where $P_a$ represents the probability of $s^a_t = 1$. In summary, the state space of the ABO-CRN, and that of the CRNs when $s^a_t = 0$, combines the energy, AoI, channel-gain, and channel states. A separate state space is defined for the ABCs when $s^a_t = 1$. Note that when $s^a_t = 0$, STs in the ABCs do not execute AB mode transmission, and when $s^a_t = 1$, STs in the CRNs only harvest energy. As a result, the SR does not need to determine actions for STs in these two cases, hence we do not design state spaces for them.

Action Space
In the ABO-CRN, based on the state of the current frame, the SR determines the actions that STs execute in the current frame. When s a t = 1, STs execute AB mode transmission, and ST i harvests energy when the other STs transmit data in AB mode. When s a t = 0, STs execute overlay mode transmission. We define the action space as A = (α t,0 , α t,1 , . . . , α t,n , e t,0 , e t,1 , . . . , e t,n ); In particular, when s a When s a t = 1, STs in the ABCs execute AB mode transmission. Since the energy consumption of AB mode transmission can be ignored, the SR in the ABCs focuses on the time allocation of STs. We define the action space of the ABCs as A = (α t,0 , α t,1 , . . . , α t,n ); When s a t = 0, STs in the CRNs execute overlay mode transmission. The action space of the CRNs is defined as Equation (29).
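An actor network typically outputs unconstrained numbers, so some mapping onto the feasible action set is needed. The sketch below shows one hypothetical way to do this: the time shares are clipped to [0, 1] and rescaled so that they fit within the unit-length frame, and each energy output is read as a fraction of the energy currently available to that ST. The constraint that the time shares sum to at most 1 is inferred from the normalized frame duration and the TDMA scheme; the function and variable names are assumptions and do not come from the paper.

```python
def clip(value, low, high):
    """Limit `value` to the closed interval [low, high]."""
    return max(low, min(high, value))

def project_action(raw_alpha, raw_energy_fraction, available_energy):
    """Map raw actor outputs to feasible (alpha, e) allocations for all STs."""
    alpha = [clip(a, 0.0, 1.0) for a in raw_alpha]
    total = sum(alpha)
    if total > 1.0:
        # Rescale so that the TDMA slots fit within the unit-length frame.
        alpha = [a / total for a in alpha]
    energy = [
        clip(fraction, 0.0, 1.0) * eps  # e_{t,i} in [0, eps_{t,i}]
        for fraction, eps in zip(raw_energy_fraction, available_energy)
    ]
    return alpha, energy

# Hypothetical raw outputs for three STs.
alpha, energy = project_action([0.7, 0.6, -0.1], [0.5, 1.2, 0.3], [10.0, 20.0, 5.0])
print(alpha, energy)
```

For the ABCs, only the time-share part of such a projection would be needed, since the SR there does not allocate energy.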

Rewards
After the SR determines the action $x_t$ based on the state $s_t$ of frame $t$, an immediate reward $r_t(s_t, x_t)$ is obtained, where $r_t(s_t, x_t)$ represents the evaluation of choosing $x_t$ under $s_t$. Since the aim is to optimize the throughput of the secondary networks with the AoI constraint, $r_t(s_t, x_t)$ is defined in terms of the throughput and a penalty $\rho_{t,i}$ that is related to the AoI of ST$_i$. Since the actions vary with the value of $s^a_t$, the values of $\rho_{t,i}$ also vary with the value of $s^a_t$, and separate expressions are used for $s^a_t = 1$ and $s^a_t = 0$. In these expressions, $m$ is a constant that is set according to $A_{\max}$, and is used to ensure that $\rho_{t,i}$ is larger than $A_{\max}$. The term $\frac{e_{t,i}}{E} \alpha_{t,i}$ indicates that the penalty increases with $e_{t,i}$ and $\alpha_{t,i}$ when $a_{t,i}$ is larger than $A_{\max}$. The SR in the ABCs does not determine the energy allocation for STs, hence the penalty in the ABCs is set independently of $e_{t,i}$.

Time and Energy Management by DDPG
DDPG utilizes the architecture of the actor-critic algorithm and the scheme of DQN. Therefore, DDPG consists of two parts, the actor and the critic. The actor outputs a deterministic action, and the critic outputs an evaluation that approximates the Q-value. Both the actor and the critic consist of evaluated networks and target networks. The target networks make the training process more stable, and have the same structure as the evaluated networks. The evaluated network of the actor is named the actor network, and that of the critic is named the critic network. The target network of the actor is named the actor target network, and that of the critic is named the critic target network.
These networks are expressed as parametric functions. The actor network is expressed as a function mapping $s_t$ to $x_t$, i.e., $x_t = \Pi(s_t | \omega)$ (Equation (34)), where $\Pi$ denotes the policy of time and energy management, and $\omega$ denotes the weights of the neural network in the actor network. The critic network is expressed as an action-value function, which maps $s_t$ and $x_t$ to a Q-value, where $\mu$ denotes the weights of the neural network in the critic network. Furthermore, the Q-value function is expressed using the target networks, where $\omega^+$ denotes the weights of the neural network in the actor target network, $\mu^+$ denotes the weights of the neural network in the critic target network, and $\gamma \in [0, 1]$ denotes the discounting factor, which represents the effect of future action choices. In order to weaken the dependence of DDPG on hyper-parameters, batch normalization [42] is adopted, i.e., each layer in the neural networks of DDPG is connected to a batch normalization layer. This makes DDPG less sensitive to the initial parameters, and prevents the unstable training that results from unstable data distributions in each layer of the neural networks. Batch normalization accelerates the convergence of DDPG, and efficiently avoids vanishing gradients. Furthermore, due to the different value ranges of the factors in the state, we normalize the input state of DDPG so that each factor in the state has the same value range.
Algorithm 1 finds the optimal policy for the time and energy management by DDPG. The exploration noise $N^e_t$ in Algorithm 1 is used to fully explore the action space, in order to avoid being stuck in a locally optimal policy. In the training process, the exploration noise decay factor $\kappa$ restricts the exploration range. The weights $\omega^+$ and $\mu^+$ of the target networks are updated by soft replacement, which increases the stability of the evaluated networks.
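Three of the mechanisms named above (the target value built from the target networks, the soft replacement of target weights, and the decaying exploration noise) can be sketched without any deep learning library. This is a simplified illustration, not Algorithm 1 itself: network weights are plain lists of floats, the soft-replacement rate `tau` is a hypothetical name for a standard DDPG hyper-parameter, and Gaussian exploration noise is an assumption since the noise process is not specified in this text.

```python
import random

def td_target(reward, gamma, next_q_value):
    """Target for the critic: r_t + gamma * Q'(s_{t+1}, Pi'(s_{t+1}))."""
    return reward + gamma * next_q_value

def soft_update(target_weights, source_weights, tau):
    """Soft replacement: move each target weight a fraction tau toward the evaluated weight."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target_weights, source_weights)]

def noisy_action(action, noise_std, rng, low=0.0, high=1.0):
    """Add Gaussian exploration noise (assumed) and keep the action within its bounds."""
    return [min(high, max(low, a + rng.gauss(0.0, noise_std))) for a in action]

def decay_noise(noise_std, kappa):
    """Shrink the exploration range by the decay factor kappa after each training step."""
    return noise_std * kappa

rng = random.Random(0)
noise_std = 0.5
target = [0.0, 0.0]
for step in range(3):
    explored = noisy_action([0.4, 0.6], noise_std, rng)
    target = soft_update(target, [1.0, 2.0], tau=0.1)
    noise_std = decay_noise(noise_std, kappa=0.99)
print(explored, target, noise_std)
```

A small `tau` makes the target networks change slowly, which is the stabilizing effect the soft replacement is meant to provide.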

Simulation
In order to evaluate the performances of throughput and AoI, we compare the longterm average throughput T of the ABO-CRN with the AoI constraint with two baseline schemes, throughput-optimal (T-O) scheme and AoI-optimal (A-O) scheme. The T-O baseline scheme optimizes the throughput of the secondary network, and the A-O baseline scheme optimizes the AoI of STs. Furthermore, we compare the throughput and AoI performances among the ABO-CRN, ABCs, and CRNs to evaluate the impacts of T min and A max on the throughput and AoI performances. The simulation configuration is set as follows unless otherwise specified: The transmit power of PT P = 17 kW, the bandwidth W = 6 MHz, the AWGN δ 2 = 10 −3 µW, the energy capacity E = 30 µJ, and backscatter reflection efficiency θ = 0.9. Figure 6 plots T and AoI of the ABO-CRN, T-O baseline scheme, and A-O baseline scheme with the minimum throughput requirement T min under P a = 0.3, 0.6, 0.9. We observe from Figure 6a that T of the ABO-CRN decreases with T min , and observe from Figure 6b that AoI of the ABO-CRN increases with T min . For T-O baseline scheme, the throughput does not change with T min , and the AoI increases faster than that of the ABO-CRN and A-O baseline scheme. The throughput of A-O baseline scheme decreases faster with T min than that of the ABO-CRN. The AoI of A-O baseline scheme increases with T min , is close to that of the ABO-CRN, and is lower than that of T-O baseline scheme. The reasons can be explained as follows. When T min increases, each ST needs more throughput to reach the minimum throughput requirement. The SR in the ABO-CRN has to allocate more time and energy for the STs with high AoI and poor channel quality, and sacrifices the total throughput to satisfy the AoI constraint. When P a = 0.9, we observe that T in Figure 6a decreases faster than that when P a = 0.3 and 0.6, and the corresponding AoI in Figure 6b increases faster. The reason is provided as follows. 
When P a = 0.9, due to the active channel state for the most time, STs only execute AB mode transmission for the most time. When T min is higher than the expected throughput achieved by STs through AB mode transmission, the AoI increases for the most frames, and the number of frames of the increased AoI becomes more with T min . Therefore, the average AoI increases with T min . The expected throughput achieved by STs through AB mode transmission and overlay mode transmission when P a = 0.3 and 0.6 is higher than that when P a = 0.9. Therefore, the average AoI when P a = 0.9 increases faster with T min than that when P a = 0.3 and 0.6. We also observe that when T min is small, the curves of the ABO-CRN and two baseline schemes are close. The reason is that, the throughput of three schemes meets the minimum throughput requirement, and the AoI of them satisfies the AoI constraint. In addition, Figure 6 shows that T of the ABO-CRN is closer to that of the T-O than that of A-O, and the AoI of the the ABO-CRN is closer to that of the A-O than that of T-O. It indicates that DDPG finds the optimal policy of time and energy management to optimize the throughput, and satisfies the AoI constraint. Figure 7 plots T and AoI of the ABO-CRN, T-O baseline scheme, and A-O baseline scheme with the maximum allowable AoI constraint A max . We observe that both T and AoI increase with A max , and when A max is large, the throughput of the ABO-CRN and that of two baseline schemes are close. The reason is provided as follows. When A max increases, the limitation of the AoI constraint on throughput becomes weak. The SR allocates more time and energy to the STs with high throughput, hence the throughput increases with A max . With the increase of A max , STs with high AoI (but not exceed A max ) and poor channel quality is allocated less time and energy, hence AoI of these STs increases, and the average AoI of STs increases. 
When A_max is large, all three schemes satisfy the AoI constraint.
Figure 8 plots T and the AoI in the ABO-CRN, ABCs, and CRNs versus T_min, and Figure 9 plots T and the AoI in the ABO-CRN, ABCs, and CRNs versus A_max, under P_a = 0.3, 0.6, and 0.9. In both figures, T in the ABO-CRN is higher than that in the ABCs and CRNs, and the AoI in the ABO-CRN is lower than that in the ABCs and CRNs. When P_a = 0.3 and 0.9, the AoI in the ABCs and CRNs is high, and when T is large, STs in the ABCs and CRNs cannot satisfy the AoI constraint. The reason is as follows. When P_a = 0.3, the channel is inactive most of the time. For the ABCs, AB mode transmission has few opportunities to be executed. From Figure 6, we infer that when T_min is higher than the expected throughput achieved by each ST through AB mode transmission in a frame, it is difficult for STs in the ABCs to satisfy the AoI constraint by sacrificing the total throughput. Therefore, the throughput of the ABCs in this case remains nearly unchanged. When P_a = 0.9, the channel is active most of the time. For the CRNs, overlay mode transmission has few opportunities to be executed, hence the AoI of the CRNs is high, and the throughput of the CRNs remains nearly unchanged. STs in the ABO-CRN execute AB mode transmission when the channel is active and overlay mode transmission when the channel is inactive. As described in Equation (16), the expected throughput achieved by STs in the ABO-CRN is higher than that in the ABCs and CRNs. Therefore, the ABO-CRN achieves better throughput and AoI performances than the ABCs and CRNs.
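The comparison among the ABO-CRN, ABCs, and CRNs can be illustrated with a toy simulation. The sketch below is not the DDPG scheme or the system model of this paper. It assumes a single ST, a channel that is active in each frame independently with probability P_a, hypothetical fixed per-frame throughputs `r_ab` (AB mode) and `r_ov` (overlay mode), and a simplified AoI rule in which the AoI resets to 1 in any frame whose throughput reaches T_min and otherwise grows by 1. The function name `simulate` and all numeric values are assumptions made for illustration only.

```python
import random


def simulate(p_active, r_ab, r_ov, t_min, frames=10_000, seed=0):
    """Toy single-ST model; the rates and the AoI rule are illustrative assumptions."""
    rng = random.Random(seed)
    # Per-frame throughput of each network type in (active, inactive) channel states.
    rates = {
        "ABO-CRN": (r_ab, r_ov),  # AB mode when active, overlay mode when inactive
        "ABCs": (r_ab, 0.0),      # AB mode only
        "CRNs": (0.0, r_ov),      # overlay mode only
    }
    total = {name: 0.0 for name in rates}
    aoi = {name: 1 for name in rates}
    aoi_sum = {name: 0 for name in rates}
    for _ in range(frames):
        # All three networks see the same channel realization in each frame.
        active = rng.random() < p_active
        for name, (when_active, when_inactive) in rates.items():
            got = when_active if active else when_inactive
            total[name] += got
            # Assumed AoI rule: reset to 1 if the frame meets t_min, else age by 1.
            aoi[name] = 1 if got >= t_min else aoi[name] + 1
            aoi_sum[name] += aoi[name]
    # Map each network to (average throughput, average AoI) over all frames.
    return {name: (total[name] / frames, aoi_sum[name] / frames) for name in rates}


if __name__ == "__main__":
    for p_a in (0.3, 0.6, 0.9):
        print(p_a, simulate(p_a, r_ab=1.0, r_ov=4.0, t_min=0.5))
```

Under these assumptions, the ABO-CRN obtains in every frame at least the throughput of either single-mode network, so its average throughput is never lower and its average AoI is never higher. The single-mode networks age whenever the channel is in the state they cannot use, which mirrors the behavior described for the CRNs at P_a = 0.9 and for the ABCs at P_a = 0.3.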

Conclusions
We optimized the long-term throughput of the secondary network under the AoI constraint by jointly managing the time and energy of STs in the ABO-CRN, ABCs, and CRNs through DDPG. For the case in which the AoI constraint cannot be satisfied, we investigated the impacts of time and energy allocation on the reward and developed the corresponding reward functions based on the channel states. We discussed how the minimum throughput requirement and the maximum allowable AoI relate to the throughput and AoI performances. We compared the throughput optimization scheme under the AoI constraint with the T-O and A-O baseline schemes, and varied the minimum throughput requirement and the maximum allowable AoI to evaluate their effects on the throughput and AoI performances of the secondary networks in the ABO-CRN, ABCs, and CRNs. Our findings are as follows:
• The throughput of the ABO-CRN is close to the optimal throughput of the T-O baseline scheme, and the AoI of the ABO-CRN is close to the optimal AoI of the A-O baseline scheme. DDPG finds the optimal policy of time and energy management to optimize the throughput while satisfying the AoI constraint.
• The throughput of the ABO-CRN is higher than that of the A-O baseline scheme, and the AoI of the ABO-CRN is lower than that of the T-O baseline scheme. This observation validates the benefit of considering both throughput and AoI performances rather than only one metric.
• The ABO-CRN improves on the throughput and AoI performances of the ABCs and CRNs. Even in extreme cases, such as a channel that stays active for a long time, the ABO-CRN obtains better throughput and AoI performances than the ABCs and CRNs.
• The lower bound of the maximum allowable AoI that allows STs to satisfy the AoI constraint decreases with the total number of STs, and increases with the number of STs whose average throughput is smaller than the minimum throughput requirement.
Author Contributions: This research was carried out through a concerted effort over seven months. Each author's role is summarized as follows: conceptualization, X.J.; methodology, X.J.; software, X.J.; validation, K.Z., K.C., and X.L.; investigation, X.J., K.Z., K.C., and X.L.; writing-original draft preparation, X.J.; writing-review and editing, K.Z.; supervision, K.C.; funding acquisition, K.C. All authors have read and agreed to the published version of the manuscript.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.

Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this paper: