Article

Age of Information Minimization in Multicarrier-Based Wireless Powered Sensor Networks

1 School of Computer and Data Engineering, NingboTech University, Ningbo 315100, China
2 School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
* Author to whom correspondence should be addressed.
Entropy 2025, 27(6), 603; https://doi.org/10.3390/e27060603
Submission received: 1 March 2025 / Revised: 14 April 2025 / Accepted: 15 April 2025 / Published: 5 June 2025

Abstract

This study investigates the challenge of ensuring timely information delivery in wireless powered sensor networks (WPSNs), where multiple sensors forward status-update packets to a base station (BS). Time is partitioned into blocks, with each time block dedicated to either data packet transmission or energy transfer. Our objective is to minimize the long-term average weighted sum of the Age of Information (WAoI) for the physical processes monitored by the sensors. We formulate this optimization problem as a multi-stage stochastic optimization program. To tackle this intricate problem, we propose a novel approach that leverages Lyapunov optimization to transform the complex original problem into a sequence of per-time-block deterministic problems. These deterministic problems are then solved using model-free deep reinforcement learning (DRL). Simulation results demonstrate that our proposed algorithm achieves significantly lower WAoI compared to the DQN, AoI-based greedy, and energy-based greedy algorithms. Furthermore, our method effectively mitigates the issue of excessive instantaneous AoI experienced by individual sensors compared to the DQN.

1. Introduction

The timeliness of status updates, originating from diverse stochastic processes and collected by source nodes, plays a crucial role in the performance of numerous real-time systems [1,2,3]. Examples of such applications include safety-critical systems, health monitoring, and environmental surveillance. In these time-sensitive contexts, rapid delivery of sampled information to the destination is necessary. Outdated information can result in suboptimal control decisions and potentially severe consequences. Consequently, the Age of Information (AoI) metric has emerged as a valuable tool for quantifying the freshness and timeliness of status-update data [4]. It represents the elapsed time since the most recent update was generated at the source and successfully received at the destination. The AoI at time t is given by $t - u(t)$, where $u(t)$ is the generation timestamp of the latest received update [5].
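For readers who prefer code to notation, a minimal sketch of this definition follows (plain Python; the update times are hypothetical, chosen only for illustration): at each time instant, the AoI equals the current time minus the generation timestamp of the freshest update received so far.

```python
# Minimal sketch of A(t) = t - u(t); timestamps here are illustrative only.
def aoi_trace(received, horizon):
    """received: maps reception time -> generation timestamp of that update."""
    u, trace = 0, []
    for t in range(horizon):
        if t in received:
            u = max(u, received[t])  # freshest generation time seen so far
        trace.append(t - u)          # AoI grows by 1 per slot between receptions
    return trace

# Updates generated at t = 1 and t = 5, each received one slot later:
print(aoi_trace({2: 1, 6: 5}, 9))    # -> [0, 1, 1, 2, 3, 4, 1, 2, 3]
```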
Energy limitations in wireless devices (WDs) pose a significant challenge to timely data delivery due to the increased likelihood of packet loss. Wireless energy transfer (WET) has emerged as a key technology to address this issue, allowing for over-the-air recharging of WD batteries and obviating the need for manual replacement [6]. In this work, we focus on the long-term average weighted sum of AoI (WAoI) performance in wireless powered sensor networks (WPSNs), where sensors rely on radio frequency (RF) energy harvested from the base station (BS) to sustain their sensing and communication activities. In WPSNs, the wireless link quality between sensors and the BS is time-varying. Furthermore, the residual energy of sensors and their monitored AoI values vary over time. Consequently, the design of a scheduling policy that minimizes the WAoI at the BS becomes crucial. The network should dynamically select sensors for packet transmission based on a combination of channel quality, energy availability, and AoI. Intuitively, sensors with strong channels, sufficient energy, and high AoI should be prioritized. Additionally, the BS should adapt to network conditions by conducting WET when the communication links are degraded or sensors’ energy is depleted.

1.1. Related Work

The introduction of AoI in [4] has spurred extensive research into the characterization of both average and peak AoI [7,8,9,10,11,12,13]. Another line of inquiry develops optimal transmission policies for AoI minimization in communication systems [14,15,16,17,18,19,20,21,22,23,24,25,26], e.g., broadcast networks [14,15,16], information-update systems employing multiple servers [17], relay-based multi-hop communication systems [18], Internet of Things (IoT) networks [19], UAV-enabled networks [20,21], cognitive radio communication systems [22], multicast networks [23,24], mission-critical vehicular networks [25], and multi-state time-varying networks [26]. Notably, [14,15,16,17,18,19,20,21,22,23,25] do not consider source nodes that use energy harvesting to sustain self-powered communication.
Different from [14,15,16,17,18,19,20,21,22,23,25], another research direction focuses on communication systems in which the source nodes are powered by energy harvesting. This line of work seeks age-oriented optimal policies for transmitting status-update packets under the source nodes' energy causality constraint. In [27], the authors demonstrate that an energy-dependent threshold policy is optimal for triggering new samples to minimize AoI. Multiple RF energy harvesting sensors are considered in [28], where the authors employ a DRL framework that concurrently optimizes WET and update packet transmissions to minimize the weighted sum of AoI. In [29], the authors study relay-based networks in which a source sends status updates to a destination via a relay. Considering spectrum scarcity, the authors of [30] address AoI minimization in cognitive radio networks (CRNs) and derive optimal scheduling actions for both imperfect and perfect spectrum sensing. Departing from fixed energy sources, the work in [31] dispatches a UAV to transfer energy to ground sensor nodes and jointly optimizes the energy harvesting durations, the UAV's trajectory, and the sensors' data collection time to minimize the average AoI.
However, none of these prior studies employs a DRL-based algorithm for the efficient design of freshness-aware WPSNs utilizing multiple subcarriers. In contrast to these works, this paper investigates a scenario where multiple RF-powered source nodes sense potentially distinct physical processes and transmit status-update packets across multiple orthogonal subcarriers. For this setting, we provide a novel reinforcement learning-based scheduling design. Before detailing our contributions, we emphasize the key distinctions between our work and those presented in [24,28].
  • Ref. [28] investigates a single-carrier system, whereas this work focuses on a multicarrier system.
  • Ref. [24] considers source nodes with embedded power supplies, whereas our work adopts WET to energize these nodes.
  • In contrast to the approach presented in [24], which aims to minimize the average total transmit power subject to per-sensor AoI constraints, our work focuses on minimizing the long-term WAoI.
  • In terms of optimization strategies, ref. [24] relies on conventional numerical methods. In contrast, our work pioneers a scheduling algorithm based on DRL. Moreover, while [28] employs the classical Deep Q-Network (DQN) algorithm, our research introduces a distinctly different DRL algorithm tailored to the specific challenges of our problem.

1.2. Contributions

This paper investigates WPSNs for status updates, comprising multiple sensors and a BS, where the BS receives timely information regarding different physical processes monitored by the sensors. The sensors harvest energy from RF signals transmitted by the BS and communicate using orthogonal subcarriers within each time block. The primary contributions of this work are summarized as follows:
  • We formulate the problem of jointly optimizing subcarrier assignment, WET duration, and sensor sampling schedules to minimize the WAoI for diverse physical processes at the BS within a time-sensitive communication system. This is modeled as a multi-stage stochastic optimization problem, subject to energy causality constraints at the sensors.
  • To address this optimization problem, we propose a novel dynamic control algorithm that integrates DRL and Lyapunov optimization techniques. Specifically, Lyapunov optimization is employed to decompose the multi-stage stochastic problem into a sequence of deterministic optimization problems, one for each time block. Subsequently, a DRL algorithm is utilized to determine the optimal scheduling decisions for each time block, with action exploration facilitated by a randomization policy.
  • Extensive simulation results demonstrate the significant performance gains of our proposed algorithm in reducing the WAoI compared to benchmark algorithms, including the DQN, energy-based greedy, and AoI-based greedy schemes. Notably, our DRL algorithm exhibits good convergence performance and eliminates the need for a predefined upper limit for AoI values, unlike the DQN approach.

2. System Model and Problem Formulation

2.1. Network Model

A real-time monitoring system is considered, as shown in Figure 1, where a BS collects time-critical information from N sensors. The sensors are responsible for providing the BS with fresh information about their respective measured processes. Additionally, the sensors share K subcarriers, each with a bandwidth of W Hz. The BS is assumed to have a stable power supply, whereas each sensor n is powered by the RF energy transmitted by the BS in the downlink. This harvested energy is stored in a battery with a finite capacity of $B_{\max,n}$ joules. The communication timeline is divided into discrete time blocks, indexed by $t = 0, 1, 2, \ldots, T-1$. In each time block, either energy transfer or packet transmission is conducted. Figure 2 illustrates a representative schedule, where the BS broadcasts RF energy in time blocks 0, 2, and 3, and sensors 1, 3, and 5 transmit update packets in time block 1.
We consider a quasi-static fading channel model, where the channel power gain is constant within a time block but varies independently across different time blocks. Let g n , k ( t ) and h n , k ( t ) represent the channel power gains on subcarrier k of the uplink and downlink channels between the BS and sensor n, respectively. Additionally, let A n ( t ) and B n ( t ) represent the AoI of sensor n’s monitoring process and its remaining energy, respectively. It should be noted that this paper does not impose an upper bound on the AoI.

2.2. State and Action Spaces

At time block t, sensor n's state, $s_n(t)$, is defined by its downlink and uplink channel power gains on subcarrier k, the AoI of its measured process at the BS, and its battery level, i.e., $s_n(t) \triangleq (h_{n,k}(t), g_{n,k}(t), A_n(t), B_n(t)) \in \mathcal{S}_n^a$, where $\mathcal{S}_n^a$ denotes the state space encompassing all combinations of $h_{n,k}(t)$, $g_{n,k}(t)$, $A_n(t)$, and $B_n(t)$. Consequently, $s(t) = \{s_n(t)\}_{n \in \mathcal{N}} \in \mathcal{S}^a$ denotes the system state at time block t, where $\mathcal{S}^a$ is the system state space. Then, at time block t, the possible actions are expressed as $a(t) \in \mathcal{A} \triangleq \{H, (T_1, T_2, \ldots)_K, (T_1, T_3, \ldots)_K, \ldots, (T_i, T_j, \ldots)_K\}$. If $a(t) = H$, the BS transmits RF energy to the sensors via the downlink. For sensor n, the captured energy is expressed as
$$
E_n^H(t) = \eta P h_{n,k}(t),
\tag{1}
$$
where $\eta$ denotes the energy harvesting efficiency and $P$ represents the transmit power of the BS.
If $a(t) = (T_i, T_j, \ldots)_K$, K sensor nodes (i.e., node i, node j, …) send status-update packets to the BS through the uplink over the K subcarriers. The sensors employ a generate-at-will strategy, where data packets are generated immediately following a scheduling decision [32]. When sensor n transmits a data packet of size $S_n$ to the BS at time block t, the energy consumption is denoted by $E_n^C(t)$. According to Shannon's formula, $E_n^C(t)$ is expressed as
$$
E_n^C(t) = \frac{\sigma^2}{g_{n,k}(t)} \left( 2^{S_n / W} - 1 \right),
\tag{2}
$$
where $\sigma^2$ represents the noise variance. Therefore, sensor n is eligible for data transmission only if its remaining energy satisfies
$$
B_n(t) \ge E_n^C(t) = \frac{\sigma^2}{g_{n,k}(t)} \left( 2^{S_n / W} - 1 \right).
\tag{3}
$$
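To make (1)-(3) concrete, the sketch below evaluates the harvested energy, the transmission cost, and the eligibility test in plain Python; the numeric values are placeholders of ours, not the paper's simulation settings.

```python
def harvested_energy(eta, P, h):
    """Eq. (1): energy captured by a sensor during a WET block."""
    return eta * P * h

def tx_energy(sigma2, g, S, W):
    """Eq. (2): energy needed to deliver S bits over bandwidth W Hz."""
    return (sigma2 / g) * (2.0 ** (S / W) - 1.0)

def eligible(B, sigma2, g, S, W):
    """Eq. (3): a sensor may transmit only if its battery covers the cost."""
    return B >= tx_energy(sigma2, g, S, W)

# Illustrative placeholder numbers (ours, not the paper's settings):
print(harvested_energy(eta=0.8, P=1.0, h=1e-3))                  # ~8.0e-4 J
print(tx_energy(sigma2=1e-9, g=1e-4, S=15e3, W=1e6))             # ~1.0e-7 J
print(eligible(B=0.3e-3, sigma2=1e-9, g=1e-4, S=15e3, W=1e6))    # True
```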
At time block t, the AoI of each physical process and the battery level of each sensor are updated after the decision is executed, as shown in Figure 1. Specifically, when $a(t) = H$, the AoI of process n observed at the BS increments by one and the remaining battery energy of sensor n increases by $E_n^H(t)$. When $a(t) = (T_i, T_j, \ldots)_K$, i.e., packets are transmitted, the battery levels of the K scheduled sensors decrease by $E_n^C(t)$, while those of the other sensors remain unchanged. Meanwhile, the AoI of the physical processes monitored by the K selected sensors is reset to 1, while that of the other processes keeps growing. Therefore, the dynamics of sensor n's remaining energy and AoI are given, respectively, by
$$
B_n(t+1) =
\begin{cases}
B_n(t) - E_n^C(t), & \text{if } a(t) = (T_i, T_j, \ldots)_K \text{ and } T_n \in (T_i, T_j, \ldots)_K, \\
\min\{B_n(t) + E_n^H(t),\, B_{\max,n}\}, & \text{if } a(t) = H, \\
B_n(t), & \text{otherwise}.
\end{cases}
\tag{4}
$$
$$
A_n(t+1) =
\begin{cases}
1, & \text{if } a(t) = (T_i, T_j, \ldots)_K \text{ and } T_n \in (T_i, T_j, \ldots)_K, \\
A_n(t) + 1, & \text{otherwise}.
\end{cases}
\tag{5}
$$
To help visualize (5), Figure 3 shows the AoI evolution of process 1 over time, where N = 5 and K = 2. We can observe from Figure 3 that $a(1) = (T_1, T_2)$ yields $A_1(2) = 1$, and $a(5) = (T_1, T_3)$ yields $A_1(6) = 1$. Specifically, the AoI of process 1 is reset to 1 at the start of time blocks 2 and 6, corresponding to the status updates transmitted at time blocks 1 and 5. During time blocks 0, 2, 3, 4, 6, and 7, sensor 1 remains inactive (either harvesting energy or idle), causing the AoI of its monitored process to increment by 1 in each of these blocks.
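The dynamics (4) and (5) translate directly into a per-block update routine. The following sketch assumes our own conventions, with the action encoded either as the string "H" or as the set of the K scheduled sensor indices.

```python
def step(B, A, E_h, E_c, B_max, action):
    """One time block: battery update (4) and AoI update (5).

    B, A, E_h, E_c, B_max are per-sensor lists; action is either the
    string "H" (WET block) or the set of K scheduled sensor indices."""
    for n in range(len(B)):
        if action == "H":                       # all sensors harvest and age
            B[n] = min(B[n] + E_h[n], B_max[n])
            A[n] += 1
        elif n in action:                       # scheduled: pay Eq. (2), reset AoI
            B[n] -= E_c[n]
            A[n] = 1
        else:                                   # idle: battery frozen, AoI ages
            A[n] += 1
    return B, A

# Example: sensors 0 and 2 scheduled (K = 2) out of N = 3.
print(step([1.0, 1.0, 1.0], [3, 3, 3], [0.1] * 3, [0.4] * 3, [2.0] * 3, {0, 2}))
# -> ([0.6, 1.0, 0.6], [1, 4, 1])
```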

2.3. Problem Formulation

We aim to minimize the WAoI at the BS by finding the optimal policy for action selection at each time block. The policy consists of a set of decision rules $\{\pi(0), \pi(1), \ldots\}$ such that for any time block t, $\pi(t)$ assigns an action $a(t) \in \mathcal{A}$ to each possible system state $s(t) \in \mathcal{S}^a$. Given policy $\pi$ and initial state $s(0)$, process n's long-term average AoI is given by
$$
\bar{A}_n^{\pi} \triangleq \limsup_{T \to \infty} \frac{1}{T+1}\, \mathbb{E}\left[ \sum_{t=0}^{T} A_n(t) \,\middle|\, s(0) \right].
\tag{6}
$$
Consequently, the WAoI minimization problem for WPSNs is formulated as
$$
(\mathrm{P1}):\quad \pi^{*} = \arg\min_{\pi} \sum_{n=1}^{N} \beta_n \bar{A}_n^{\pi}, \qquad \text{s.t. } (3)\text{–}(5),
\tag{7}
$$
where $\pi^{*}$ denotes the optimal policy, and the weights $\beta_n$ are non-negative and satisfy $\sum_{n=1}^{N} \beta_n = 1$. If $\pi(t) = H$, all sensors harvest RF energy from the BS; if $\pi(t) = (T_i, T_j, \ldots)_K$, K sensors are scheduled to send status-update packets to the BS, selected according to each sensor's battery level, channel state information, and the AoI of the process it monitors.
Classic optimization techniques like combinatorial optimization and heuristics are not appropriate for this problem due to its long-term and stochastic nature. Primarily designed for deterministic scenarios, these methods can only optimize for the immediate time block, thus struggling to achieve optimal performance over extended periods.

3. The Decoupling Strategy for Multi-Stage Stochastic Optimization Based on Lyapunov Theory

This section introduces LODR, an algorithm that combines Lyapunov optimization and DRL, to tackle problem (P1). We begin by employing Lyapunov optimization to transform the original problem into a series of deterministic problems, one for each time block. We define N virtual energy queues, one per sensor, denoted by $\{Q_n(t)\}_{n=1}^{N}$. These queues are initialized with $Q_n(1) = 0$ and updated according to
$$
Q_n(t+1) = \max\{Q_n(t) - \nu E_n^C(t) + \nu E_n^H(t),\, 0\}, \quad \forall n,
\tag{8}
$$
where $\nu$ is a scaling coefficient. $Q_n(t)$ functions as a queue that is incremented by the scaled harvested energy $\nu E_n^H(t)$ and decremented by the scaled consumed energy $\nu E_n^C(t)$.
We introduce a queue backlog vector $Q(t) = \{Q_n(t)\}_{n=1}^{N}$ to characterize the energy queue status of all sensors, where $Q_n(t)$ denotes the backlog of sensor n's energy queue at time block t. We then define a Lyapunov function $L(Q(t))$ and its associated drift $\Delta L(Q(t))$ as [33]
$$
L(Q(t)) = \frac{1}{2} \sum_{n=1}^{N} Q_n(t)^2,
\tag{9}
$$
$$
\Delta L(Q(t)) = \mathbb{E}\left[ L(Q(t+1)) - L(Q(t)) \,\middle|\, Q(t) \right].
\tag{10}
$$
Then, we utilize the drift-plus-penalty minimization method from [34] to minimize the WAoI subject to the stability of the queues $Q(t)$. Our approach minimizes an upper bound on the following drift-plus-penalty expression for time block t:
$$
\Delta(Q(t)) \triangleq \Delta L(Q(t)) + V \cdot \mathbb{E}\left[ \sum_{n=1}^{N} \beta_n \bar{A}_n^{\pi} \,\middle|\, Q(t) \right].
\tag{11}
$$
The parameter $V > 0$ serves as a control variable that balances the penalty term (i.e., the objective function) against the queue backlog sizes; adjusting V trades the objective value off against the backlogs.
We next establish an upper bound for $\Delta(Q(t))$. From (8), we obtain
$$
Q_n(t+1)^2 \le Q_n(t)^2 + 2 Q_n(t)\left( E_n^H(t) - E_n^C(t) \right) + \left( E_n^H(t) - E_n^C(t) \right)^2.
\tag{12}
$$
Summing over all queues yields
$$
\frac{1}{2} \sum_{n=1}^{N} Q_n(t+1)^2 - \frac{1}{2} \sum_{n=1}^{N} Q_n(t)^2 \le \frac{1}{2} \sum_{n=1}^{N} \left( E_n^H(t) - E_n^C(t) \right)^2 + \sum_{n=1}^{N} Q_n(t) \left( E_n^H(t) - E_n^C(t) \right).
\tag{13}
$$
Applying the conditional expectation operator to both sides of (13) [35], we obtain
$$
\Delta L(Q(t)) \le B + \sum_{n=1}^{N} Q_n(t)\, \mathbb{E}\left[ E_n^H(t) - E_n^C(t) \,\middle|\, Q(t) \right].
\tag{14}
$$
Here, B is a constant satisfying
$$
\frac{1}{2} \sum_{n=1}^{N} \mathbb{E}\left[ \left( E_n^H(t) - E_n^C(t) \right)^2 \right] \le \frac{1}{2} \sum_{n=1}^{N} \left[ \left( E_{n,\max}^H(t) \right)^2 + \left( E_{n,\max}^C(t) \right)^2 \right] \triangleq B,
\tag{15}
$$
where $E_{n,\max}^H(t)$ denotes the maximum energy sensor n can harvest at time block t, and $E_{n,\max}^C(t)$ denotes the maximum energy it can consume for data forwarding. Therefore, the drift-plus-penalty expression in (11) is upper-bounded by
$$
B + \sum_{n=1}^{N} Q_n(t)\, \mathbb{E}\left[ E_n^H(t) - E_n^C(t) \,\middle|\, Q(t) \right] + V \cdot \mathbb{E}\left[ \sum_{n=1}^{N} \beta_n \bar{A}_n^{\pi} \,\middle|\, Q(t) \right].
\tag{16}
$$
Applying the principle of opportunistic expectation minimization [36] at time block t, the scheduling decision is made based on the observed queue backlog $Q(t)$ so as to minimize the upper bound established in (16). Since the control action at time block t affects only the second and third terms, the algorithm selects the action that minimizes the following expression, obtained after dropping the constant terms:
$$
V \sum_{n=1}^{N} \beta_n \bar{A}_n^{\pi} + \sum_{n=1}^{N} Q_n(t) E_n^H(t) - \sum_{n=1}^{N} Q_n(t) E_n^C(t),
\tag{17}
$$
where the second term corresponds to RF energy harvesting by sensor n, and the third term indicates that sensor n is selected for transmission. Then, the original problem (P1) is reformulated as
$$
(\mathrm{P2}):\quad \min_{\pi(t)} \ \Delta(s(t), \pi(t)) = V \sum_{n=1}^{N} \beta_n \bar{A}_n^{\pi} + \sum_{n=1}^{N} Q_n(t) E_n^H(t) - \sum_{n=1}^{N} Q_n(t) E_n^C(t), \qquad \text{s.t. } (3)\text{–}(5).
\tag{18}
$$
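In code, the per-block decision implied by (8) and (18) amounts to scoring every candidate action and keeping the minimizer. A sketch under our own naming conventions follows; as one concrete convention, the penalty term is evaluated on the AoI vector the candidate action would produce.

```python
def lyapunov_score(A_next, Q, E_h, E_c, beta, V, action):
    """Objective of (P2)/(18) for one candidate action; the penalty uses
    the AoI vector A_next that the action would produce (our convention)."""
    penalty = V * sum(b * a for b, a in zip(beta, A_next))
    if action == "H":                    # WET block: only the E^H term is active
        return penalty + sum(q * e for q, e in zip(Q, E_h))
    return penalty - sum(Q[n] * E_c[n] for n in action)   # transmission block

def update_queues(Q, E_h, E_c, nu, action):
    """Virtual energy queue recursion (8)."""
    new_Q = []
    for n in range(len(Q)):
        gain = nu * E_h[n] if action == "H" else 0.0
        cost = nu * E_c[n] if action != "H" and n in action else 0.0
        new_Q.append(max(Q[n] - cost + gain, 0.0))
    return new_Q
```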
Theorem 1. If problem (7) admits a feasible solution, then channel-only policies are sufficient to approach the optimal performance with arbitrary precision. Under the assumption of stationary and independent and identically distributed (i.i.d.) channels across time blocks, the feasibility of problem (7) implies that for any $v > 0$, there exists a policy relying solely on channel state information that satisfies
$$
\sum_{n=1}^{N} \mathbb{E}\left[ \beta_n \bar{A}_n^{\pi} \right] \le G_{\mathrm{opt}} + v,
$$
where $G_{\mathrm{opt}}$ denotes the optimal value of the WAoI.
Proof. 
See the proof of Theorem 4.5 in [33] (Appendix 4.A).    □
Subsequently, the primary obstacle lies in effectively solving problem (P2) in each time block to achieve WAoI minimization. For the solution of (P2), we consider the system state $s(t) \triangleq \{h_{n,k}(t), g_{n,k}(t), A_n(t), B_n(t)\}_{n=1}^{N}$ at time block t, encompassing the downlink and uplink channel gains, the AoI of each sensor's monitored process, and each sensor's remaining energy. The scheduling control action is then determined based on this state. In general, obtaining the optimal policy requires enumerating $\binom{N}{K} + 1$ scheduling actions, a computationally intensive task even for moderate N and K. Alternative search-based techniques, including block coordinate descent and branch-and-bound, also suffer from high computational cost. To address the challenge of online decision-making in dynamically varying channel environments, we introduce LODR, a DRL-based algorithm. In the following, we describe how LODR solves (P2) efficiently.

4. Lyapunov-Guided DRL for Online Scheduling Decisions

Figure 4 shows the architectural framework of the LODR algorithm. It consists of three core modules: the scheduling action generation module, the scheduling policy update module, and the input update module. The scheduling action generation module receives the current system state $s(t)$ and generates a set of candidate scheduling actions, from which an evaluation process selects $a_t^{*}$, the most advantageous action. The policy of the scheduling action generation mechanism is refined over time by the scheduling policy update module. Following execution of the scheduled action, the input update module updates the sensors' remaining battery energy and the AoI of their monitored processes. These three modules iterate sequentially through successive interactions with the stochastic environment $\{h_{n,k}(t), g_{n,k}(t)\}_{n=1}^{N}$, as outlined below.
(1) Scheduling Action Generation Module: A DNN, parameterized by $\theta_t$, is employed for scheduling action synthesis. At t = 1, the parameters $\theta_t$ are initialized randomly from a zero-mean Gaussian distribution. After the DNN produces a real-valued relaxed scheduling output $\hat{a}_t$, a discretization step converts it into a binary-valued scheduling action: the K largest entries of $\hat{a}_t$ are set to 1 and the rest to 0, where 1 indicates that the corresponding sensor is selected to send a status update to the BS and 0 indicates that the sensor remains idle. Furthermore, we generate M additional random actions and one fixed action $\{0, 0, \ldots, 0\}$, which together with the DNN output action form the candidate set; see the sketch below. Here, $\{0, 0, \ldots, 0\}$ represents all sensors harvesting RF energy from the BS.
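A possible implementation of this candidate-generation step (top-K rounding of the relaxed output $\hat{a}_t$, M random K-subsets, and the all-harvest action) is sketched below with numpy; the helper names are ours.

```python
import numpy as np

def candidate_actions(a_hat, K, M, rng):
    """Build the candidate set: the DNN top-K action, M random K-subsets,
    and the all-zero action (every sensor harvests RF energy)."""
    N = len(a_hat)
    top_k = np.zeros(N, dtype=int)
    top_k[np.argsort(a_hat)[-K:]] = 1            # K largest entries -> 1
    candidates = [top_k]
    for _ in range(M):                           # random exploration actions
        a = np.zeros(N, dtype=int)
        a[rng.choice(N, size=K, replace=False)] = 1
        candidates.append(a)
    candidates.append(np.zeros(N, dtype=int))    # {0,...,0}: WET block
    return candidates

acts = candidate_actions(np.array([0.9, 0.2, 0.7, 0.4, 0.1]), K=2, M=3,
                         rng=np.random.default_rng(0))
```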
It is established by the universal approximation theorem that a neural network with a single hidden layer of adequate size can approximate any continuous function f, provided that an appropriate activation function is employed, such as ReLU, tanh, or sigmoid [37]. In this implementation, the ReLU activation function is used in the hidden layers, where the relationship between a neuron's output y and input x is $y = \max\{x, 0\}$. For the output layer, the sigmoid activation function is used, so the relaxed scheduling action $\hat{a}_t$ is constrained to the interval (0, 1).
Each candidate action yields a value of $\Delta(s(t), \pi(t))$, the performance metric, through the solution of problem (P2). Consequently, the optimal scheduling action $a_t^{*}$ at time block t is determined as
$$
a_t^{*} = \arg\min \Delta(s(t), \pi(t)).
\tag{19}
$$
(2) Scheduling Policy Update Module: The scheduling solution obtained in (19) will be used to update the scheduling policy of the DNN. To facilitate training, a memory repository of bounded capacity, starting in an empty state, is maintained. In the tth time block, a fresh training sample ( s ( t ) , a t * ) is incorporated into the memory buffer. When the memory buffer is at maximum capacity, a first-in-first-out (FIFO) replacement policy is enacted, with the oldest sample being replaced by the newest.
Leveraging the data samples stored in the memory, the DNN is trained using an approach known as experience replay [38]. In the t-th time block, a training batch $\{(s(\tau), a_\tau^{*}) \mid \tau \in \Gamma_t\}$ is randomly selected from the memory, where $\Gamma_t$ denotes the set of sampled time indices. The DNN parameters $\theta_t$ are updated via the Adam optimizer to minimize the mean cross-entropy loss
$$
L(\theta_t) = -\frac{1}{|\Gamma_t|} \sum_{\tau \in \Gamma_t} \left[ (a_\tau^{*})^{\mathsf{T}} \log f_{\theta_t}(s(\tau)) + (1 - a_\tau^{*})^{\mathsf{T}} \log\left( 1 - f_{\theta_t}(s(\tau)) \right) \right],
\tag{20}
$$
where $|\Gamma_t|$ is the number of elements in $\Gamma_t$, $(\cdot)^{\mathsf{T}}$ denotes transpose, and log is applied element-wise. The DNN is retrained every $\zeta$ time blocks, once a sufficient amount of fresh data has accumulated.
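This update module can be sketched in PyTorch as a standard experience replay loop minimizing the element-wise binary cross-entropy of (20); the layer sizes follow Table 1, while the buffer handling and names are our illustrative scaffolding.

```python
import random
from collections import deque

import torch
import torch.nn as nn

N = 7                                     # number of sensors, as in Section 5
dnn = nn.Sequential(                      # layer sizes from Table 1
    nn.Linear(4 * N, 120), nn.ReLU(),
    nn.Linear(120, 80), nn.ReLU(),
    nn.Linear(80, N + 1), nn.Sigmoid())
opt = torch.optim.Adam(dnn.parameters(), lr=0.01)   # learning rate from Table 2
loss_fn = nn.BCELoss()                    # element-wise cross-entropy of (20)
memory = deque(maxlen=1024)               # FIFO replay buffer (Table 2)

def train_step(batch_size=128):
    """One Adam update on a randomly replayed batch of (state, a*) pairs."""
    if len(memory) < batch_size:
        return
    batch = random.sample(list(memory), batch_size)
    s = torch.stack([torch.as_tensor(x, dtype=torch.float32) for x, _ in batch])
    a = torch.stack([torch.as_tensor(y, dtype=torch.float32) for _, y in batch])
    opt.zero_grad()
    loss_fn(dnn(s), a).backward()         # minimize L(theta) over the batch
    opt.step()
```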
(3) Input Update Module: After the system executes the scheduling action $a_t^{*}$, sensor n's battery level $B_n(t+1)$ is updated using (4), the AoI $A_n(t+1)$ is updated using (5), and the virtual energy queues $\{Q_n(t+1)\}_{n=1}^{N}$ are updated using (8). After observing the new wireless channel gains $\{h_{n,k}(t+1), g_{n,k}(t+1)\}_{n=1}^{N}$, the system feeds the composite input $s(t+1) = \{h_{n,k}(t+1), g_{n,k}(t+1), A_n(t+1), B_n(t+1)\}_{n=1}^{N}$ to the DNN, triggering a new iteration starting from the scheduling action generation module in Step (1).
In summary, through successive iterations, the DNN refines its scheduling policy by learning from optimal state-action pairs, ( s ( t ) , a t * ) , thereby enhancing decision-making over time. Due to the limitation imposed by the finite replay memory, the DNN’s learning is restricted to the most recent data samples, which reflect the latest scheduling policies. Through continuous feedback and adaptation, this closed-loop reinforcement learning framework optimizes the scheduling policy, leading to convergence. Algorithm 1 delineates the methodology employed to determine the optimal scheduling policy for (P1).
Algorithm 1: LODR algorithm to solve the AoI minimization problem. [The pseudocode is reproduced as an image in the published article.]
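Since Algorithm 1 is available only as an image in the published version, the skeleton below restates its control flow under our reading of Sections 3 and 4. Every collaborator (candidate generation, Lyapunov scoring, replay training, environment update) is injected as a callable, and all names are ours.

```python
def lodr(T, dnn, observe, env_step, candidates, score, replay, train, zeta):
    """Skeleton of Algorithm 1; collaborators are passed in as callables,
    mirroring the three-module structure of Figure 4 (names are ours)."""
    s = observe()                                  # s(1): channels, AoI, batteries
    for t in range(1, T + 1):
        a_hat = dnn(s)                             # Module 1: relaxed DNN output
        a_star = min(candidates(a_hat),            # quantized + M random + all-H
                     key=lambda a: score(s, a))    # minimize (18), cf. Eq. (19)
        replay.append((s, a_star))                 # Module 2: store (s(t), a_t*)
        if t % zeta == 0:
            train()                                # retrain the DNN on a batch
        env_step(a_star)                           # Module 3: apply (4), (5), (8)
        s = observe()                              # observe the next channel draw
    return dnn
```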

5. Performance Evaluation

The proposed LODR algorithm's performance was evaluated against the DQN, energy-based greedy, and AoI-based greedy algorithms. Specifically, for the energy-based and AoI-based greedy algorithms, subject to the minimum energy requirement being met, the K sensors with the largest remaining battery energy or the highest AoI, respectively, are scheduled to send status-update packets to the BS.

5.1. Experimental Settings

This subsection details the simulation parameters. The channel model incorporates both small-scale fading and path loss. The downlink and uplink channel gains between the BS and sensor n, denoted by $h_{n,k}$ and $g_{n,k}$, are modeled as random variables given by $h_{n,k} = \Upsilon \Psi_1^2 d_n^{-\kappa}$ and $g_{n,k} = \Upsilon \Psi_2^2 d_n^{-\kappa}$, where $\Upsilon$ is the signal power gain at the reference distance of 1 m, $d_n$ is the distance between the BS and sensor n, and $\kappa$ is the path loss exponent. $\Psi_1^2$ and $\Psi_2^2$ represent independent, exponentially distributed (unit-mean) small-scale fading gains. Unless otherwise stated, the primary simulation parameters are as follows: $d_1 = 20$, $d_2 = 25$, $d_3 = 40$, $d_4 = 15$, $d_5 = 50$, $d_6 = 10$, $d_7 = 45$, $S_n = 15$, $B_{n,\max} = 0.3$ mJ, $K = 2$, and $A_n(1) = 2$.
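The channel draw described above can be reproduced as follows (numpy); the distances match those quoted in this subsection, while the path loss exponent and reference gain are assumed placeholder values not stated in this excerpt.

```python
import numpy as np

rng = np.random.default_rng(1)
d = np.array([20.0, 25.0, 40.0, 15.0, 50.0, 10.0, 45.0])  # sensor-BS distances
kappa = 2.5        # path loss exponent (assumed; not stated in this excerpt)
upsilon = 1e-3     # 1 m reference-distance power gain (assumed placeholder)

def draw_channels():
    """Quasi-static block fading: gain = Upsilon * Psi^2 * d^(-kappa), with
    Psi^2 ~ Exp(1) redrawn independently per block for each link direction."""
    psi1_sq = rng.exponential(1.0, size=d.shape)   # downlink small-scale fading
    psi2_sq = rng.exponential(1.0, size=d.shape)   # uplink small-scale fading
    h = upsilon * psi1_sq * d ** (-kappa)          # downlink gains h_{n,k}
    g = upsilon * psi2_sq * d ** (-kappa)          # uplink gains g_{n,k}
    return h, g

h, g = draw_channels()
```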
The DNN employed, whose architecture is specified in Table 1, processed 4 N input features. The network’s core consisted of two hidden layers, the first with 120 neurons and the second with 80, both employing the ReLU activation function. The final layer, responsible for generating N + 1 outputs, utilized a sigmoid activation. Table 2 shows the simulation parameters for the LODR algorithm.
The state space for the DQN is constructed by discretizing system parameters. Five levels are used to represent both the uplink and downlink channel gains and four levels for the remaining battery energy.

5.2. Training Loss for LODR Algorithm

Figure 5 presents the training loss of LODR as a function of training steps for a network with N = 7 and K = 2. The gradual decrease and subsequent convergence of the loss to a low value demonstrate LODR's ability to automatically adapt its scheduling policy and approach the optimal value. The DNN within LODR reaches a stable state within approximately 8000 time blocks, indicating rapid convergence. This convergence behavior is consistently observed in simulations with a larger number of sensors. In contrast, the DQN converges significantly more slowly, or not at all, as the state space expands. For instance, with seven sensors, discretizing the uplink and downlink channel gains into six and five levels, respectively, the battery energy into four levels, and setting the maximum AoI to 5 results in a state space of size $(6 \times 5 \times 4 \times 5)^7 = 600^7 \approx 2.8 \times 10^{19}$. Such a vast state space requires extensive exploration for Q-value stabilization, leading to slow convergence. Furthermore, even upon convergence, the DQN's loss remains comparatively high.

5.3. Impact of M and ζ

Figure 6 illustrates the influence of the number of random actions M on the WAoI, using the parameter configuration of Figure 5. The figure demonstrates rapid convergence of the WAoI for all three values of M considered, and at 3000 time blocks the gap between M = 15 and M = 5 is only about 3.5%. This suggests that the number of random actions has a limited impact on the WAoI, although the WAoI decreases slightly as M increases.
Figure 7 shows the effect of the training interval $\zeta$ on the WAoI, using parameters consistent with Figure 5. The WAoI converges to similar values for $\zeta = 10$, $\zeta = 20$, and $\zeta = 30$ after 3000 time blocks; $\zeta = 10$ was therefore selected for the subsequent experiments.

5.4. The WAoI of LODR

Figure 8 shows the WAoI for LODR and the DQN, with a parameter setup similar to that of Figure 5. Two conclusions can be drawn from Figure 8. First, the WAoI of the proposed LODR algorithm is clearly superior to that of the classic DQN. Second, for the DQN, the system performance deteriorates as $A_{\max}$ increases: the more information staleness the system tolerates (i.e., the larger $A_{\max}$), the larger the AoI is likely to become. Thus, the setting of $A_{\max}$ markedly affects the DQN's performance.
Figure 9 compares the AoI evolution of each sensor under LODR and the DQN. As shown in Figure 9, LODR achieves greater AoI stability for all sensors than the DQN. Under the DQN, sensor 1's AoI reaches 8 by time block 8, and its status-update packet is not sent until time block 9. The maximum AoI values of sensors 2 and 3 are significantly lower, at only 50% and 37.5% of sensor 1's peak, respectively. This imbalance is due to the DQN's failure to converge to an optimal policy. Conversely, LODR maintains a maximum AoI of 2 for all sensors, demonstrating effective dynamic scheduling. Moreover, in subsequent time blocks, the AoI of every sensor is identical in the steady state. It should be noted that this holds only for the current parameter setup; with different parameters, each sensor settles at a different AoI.
Figure 10 shows the WAoI for the LODR, DQN, AoI-based greedy, and energy-based greedy algorithms. As illustrated in Figure 10, LODR outperforms the other three algorithms. Our simulations revealed that the DQN struggles to converge as the number of sensors grows. When the number of sensors is large but only a few can be selected (e.g., two nodes per block), the DQN's state values fail to stabilize; after multiple rounds of selection, several sensors inevitably see their AoI grow until the upper limit is reached. Moreover, under a greedy selection of two nodes, the sensor with the maximum energy tends to also have the maximum AoI, so the two greedy algorithms effectively degenerate into one and yield similar AoI. Additionally, the WAoI increases with the number of sensors: larger deployments reduce each sensor's transmission opportunities and thus increase the WAoI.
Figure 11 illustrates how the WAoI varies with the number of subcarriers K, for N = 6, 7, 8, and 9 sensors. In Figure 11, the WAoI exhibits an initial decrease followed by a subsequent increase. Larger K values correlate with a higher chance of repeatedly selecting the same sensors. When K is below roughly half of N, repeated selection is unlikely: the sensors are used evenly, deplete their batteries slowly, and the WAoI decreases as K grows. However, when K exceeds roughly half of N, the same sensors are reused across time blocks, accelerating energy consumption and necessitating more frequent charging, so the WAoI increases with K.
Figure 12 illustrates the influence of the status-update packet size on the WAoI for the LODR, AoI-based greedy, energy-based greedy, and DQN methods with $A_{\max} = 10$. The WAoI exhibits a positive correlation with the packet size: the increased energy demands of larger status-update packets necessitate longer energy harvesting periods, which in turn lead to a higher AoI.
Figure 13 illustrates the relationship between the battery capacity and the WAoI for the LODR, AoI-based greedy, energy-based greedy, and DQN methods with $A_{\max} = 6$. From Figure 13, the WAoI exhibits an inverse relationship with the battery capacity: a larger battery allows sensors to store more harvested energy, which in turn increases the probability of status-update transmissions.

6. Conclusions

We considered WPSNs in which multiple sensors send status-update packets to the BS, aiming to minimize the WAoI of the different processes observed at the BS. Specifically, we first formulated the WAoI minimization problem as a multi-stage stochastic optimization problem subject to energy causality constraints. Second, we designed the LODR algorithm, which jointly exploits DRL and Lyapunov optimization. Compared to the classic DQN algorithm, LODR converges faster, alleviates the problem of individual sensors accumulating large AoI, and effectively schedules energy transfer and packet transmission under network state dynamics. LODR achieves a smaller WAoI than the DQN, AoI-based greedy, and energy-based greedy algorithms. Additionally, the WAoI is increasing in the size of status-update packets and decreasing in the battery capacity.

Author Contributions

Conceptualization, S.Z. and J.S.; methodology, J.S.; software, J.X.; validation, X.Y. and S.Z.; formal analysis, J.S.; investigation, S.Z.; resources, X.Y.; data curation, J.S.; writing—original draft preparation, J.S.; writing—review and editing, J.S.; visualization, J.S.; supervision, X.Y.; project administration, X.Y.; funding acquisition, S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Zhejiang Provincial Natural Science Foundation of China, grant number LQ22F020009.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AoI | Age of Information
WPSNs | Wireless powered sensor networks
WAoI | Long-term average weighted sum of Age of Information
RF | Radio frequency

References

  1. Kosta, A.; Pappas, N.; Angelakis, V. Age of information: A new concept, metric, and tool. Found. Trends Netw. 2017, 12, 1–73. [Google Scholar] [CrossRef]
  2. Yates, R.D.; Kaul, S.K. The age of information: Real-time status updating by multiple sources. IEEE Trans. Inf. Theory 2019, 65, 1807–1827. [Google Scholar] [CrossRef]
  3. Sun, Y.; Kadota, I.; Talak, R.; Modiano, E. Age of information: A new metric for information freshness. Synth. Lect. Commun. Netw. 2019, 12, 1–24. [Google Scholar]
  4. Kaul, S.; Yates, R.; Gruteser, M. Real-time status: How often should one update. In Proceedings of the IEEE Conference Computer Communications (INFOCOM), Orlando, FL, USA, 25–30 March 2012; pp. 2731–2735. [Google Scholar]
  5. Feng, S.; Yang, J. Age of information minimization for an energy harvesting source with updating erasures: Without and with feedback. IEEE Trans. Commun. 2021, 69, 5091–5105. [Google Scholar] [CrossRef]
  6. Bi, S.; Ho, C.K.; Zhang, R. Wireless powered communication: Opportunities and challenges. IEEE Commun. Mag. 2015, 53, 117–125. [Google Scholar] [CrossRef]
  7. Yates, R.D.; Kaul, S. Real-time status updating: Multiple sources. In Proceedings of the 2012 IEEE International Symposium on Information Theory, Cambridge, MA, USA, 1–6 July 2012; pp. 2666–2670. [Google Scholar]
  8. Kam, C.; Kompella, S.; Ephremides, A. Age of information under random updates. In Proceedings of the 2013 IEEE International Symposium on Information Theory, Istanbul, Turkey, 7–12 July 2013; pp. 66–70. [Google Scholar]
  9. Huang, L.; Modiano, E. Optimizing age-of-information in a multiclass queueing system. In Proceedings of the 2015 IEEE International Symposium on Information Theory (ISIT), Hong Kong, China, 14–19 June 2015; pp. 1681–1685. [Google Scholar]
  10. Costa, M.; Codreanu, M.; Ephremides, A. On the age of information in status update systems with packet management. IEEE Trans. Inf. Theory 2016, 62, 1897–1910. [Google Scholar] [CrossRef]
  11. Chen, K.; Huang, L. Age-of-information in the presence of error. In Proceedings of the 2016 IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, 10–15 July 2016; pp. 2579–2583. [Google Scholar]
  12. Barakat, B.; Keates, S.; Wassell, I.; Arshad, K. Is the zero-wait policy always optimum for information freshness (Peak Age) or throughput? IEEE Commun. Lett. 2019, 23, 987–990. [Google Scholar] [CrossRef]
  13. Kosta, A.; Pappas, N.; Ephremides, A.; Angelakis, V. Age and value of information: Non-linear age case. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 326–330. [Google Scholar]
  14. Hsu, Y.-P.; Modiano, E.; Duan, L. Scheduling algorithms for minimizing age of information in wireless broadcast networks with random arrivals. IEEE Trans. Mobile Comput. 2019, 19, 2903–2915. [Google Scholar] [CrossRef]
  15. Kadota, I.; Uysal-Biyikoglu, E.; Singh, R.; Modiano, E. Minimizing the age of information in broadcast wireless networks. In Proceedings of the 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 27–30 September 2016; pp. 844–851. [Google Scholar]
  16. Chen, X.; Bidokhti, S.S. Benefits of coding on age of information in broadcast networks. In Proceedings of the 2019 IEEE Information Theory Workshop (ITW), Visby, Sweden, 25–28 August 2019. [Google Scholar]
  17. Bedewy, A.M.; Sun, Y.; Shroff, N.B. Optimizing data freshness, throughput, and delay in multi-server information-update systems. In Proceedings of the 2016 IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, 10–15 July 2016; pp. 2569–2573. [Google Scholar]
  18. Talak, R.; Karaman, S.; Modiano, E. Minimizing age-of-information in multi-hop wireless networks. In Proceedings of the 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 3–6 October 2017; pp. 486–493. [Google Scholar]
  19. Abd-Elmagid, M.A.; Pappas, N.; Dhillon, H.S. On the role of age of information in the Internet of Things. IEEE Commun. Mag. 2019, 57, 72–77. [Google Scholar] [CrossRef]
  20. Liu, J.; Wang, X.; Bai, B.; Dai, H. Age-optimal trajectory planning for UAV-assisted data collection. In Proceedings of the IEEE INFOCOM 2018-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Honolulu, HI, USA, 15–19 April 2018; pp. 553–558. [Google Scholar]
  21. Abd-Elmagid, M.A.; Dhillon, H.S. Average peak Age-of-Information minimization in UAV-assisted IoT networks. IEEE Trans. Veh. Technol. 2019, 68, 2003–2008. [Google Scholar] [CrossRef]
  22. Valehi, A.; Razi, A. Maximizing energy efficiency of cognitive wireless sensor networks with constrained age of information. IEEE Trans. Cognit. Commun. Netw. 2017, 3, 643–654. [Google Scholar] [CrossRef]
  23. Buyukates, B.; Soysal, A.; Ulukus, S. Age of information in two-hop multicast networks. In Proceedings of the 2018 52nd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 28–31 October 2018; pp. 513–517. [Google Scholar]
  24. Moltafet, M.; Leinonen, M.; Codreanu, M.; Pappas, N. Power minimization for Age of information constrained dynamic control in wireless sensor networks. IEEE Trans. Commun. 2022, 70, 419–432. [Google Scholar] [CrossRef]
  25. Abdel-Aziz, M.K.; Liu, C.-F.; Samarakoon, S.; Bennis, M.; Saad, W. Ultra-reliable low-latency vehicular networks: Taming the age of information tail. In Proceedings of the 2018 IEEE Global Communications Conference (GLOBECOM), Abu Dhabi, United Arab Emirates, 9–13 December 2018; pp. 1–7. [Google Scholar]
  26. Tang, H.; Wang, J.; Wang, J.; Song, L.; Song, J.; Song, J. Minimizing age of information with power constraints: Multiuser opportunistic scheduling in multi-state time-varying channels. IEEE J. Sel. Areas Commun. 2020, 38, 854–868. [Google Scholar] [CrossRef]
  27. Arafa, A.; Yang, J.; Ulukus, S.; Poor, H.V. Age-minimal transmission for energy harvesting sensors with finite batteries: Online policies. IEEE Trans. Inf. Theory 2020, 66, 534–556. [Google Scholar] [CrossRef]
  28. Abd-Elmagid, M.A.; Dhillon, H.S.; Pappas, N. A reinforcement learning framework for optimizing Age of information in RF-powered communication systems. IEEE Trans. Commun. 2020, 68, 4747–4760. [Google Scholar] [CrossRef]
  29. Arafa, A.; Ulukus, S. Timely updates in energy harvesting two-Hop networks: Offline and online policies. IEEE Trans. Wirel. Commun. 2019, 18, 4017–4030. [Google Scholar] [CrossRef]
  30. Leng, S.; Yener, A. Age of information minimization for an energy harvesting cognitive radio. IEEE Trans. Cognit. Commun. Netw. 2019, 5, 427–439. [Google Scholar] [CrossRef]
  31. Hu, H.; Xiong, K.; Qu, G.; Ni, Q.; Fan, P.; Letaief, K.B. AoI-minimal trajectory planning and data collection in UAV-assisted wireless powered IoT networks. IEEE Internet Things J. 2021, 8, 1211–1223. [Google Scholar] [CrossRef]
  32. Sun, Y.; Uysal-Biyikoglu, E.; Yates, R.D.; Koksal, C.E.; Shroff, N.B. Update or wait: How to keep your data fresh. IEEE Trans. Inf. Theory 2017, 63, 7492–7508. [Google Scholar] [CrossRef]
  33. Neely, M.J. Stochastic network optimization with application to communication and queueing systems. In Synthesis Lectures on Communication Networks; Morgan and Claypool: Belmont, MA, USA, 2010; Volume 3, pp. 1–211. [Google Scholar]
  34. Georgiadis, L.; Neely, M.J.; Tassiulas, L. Resource allocation and cross-layer control in wireless networks. Found. Trends Netw. 2006, 1, 1–144. [Google Scholar] [CrossRef]
  35. Bi, S.; Huang, L.; Wang, H.; Zhang, Y.-J.A. Lyapunov-guided deep reinforcement learning for stable online computation offloading in mobile-edge computing networks. IEEE Trans. Wirel. Commun. 2021, 20, 7519–7537. [Google Scholar] [CrossRef]
  36. Mao, Y.; Zhang, J.; Letaief, K.B. Dynamic computation offloading for mobile-edge computing with energy harvesting devices. IEEE J. Sel. Areas Commun. 2016, 34, 3590–3605. [Google Scholar] [CrossRef]
  37. Lin, L.-J. Reinforcement Learning for Robots Using Neural Networks; School of Computer Science, Carnegie Mellon University: Pittsburgh, PA, USA, 1993; Tech. Rep. CMU-CS-93-103. [Google Scholar]
  38. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
Figure 1. System model.
Figure 2. Illustration of the scheduling process for energy transfer and packet transmission in WPSNs.
Figure 3. Illustration of the sensor's AoI evolution.
Figure 4. The structure of the proposed LODR algorithm.
Figure 5. LODR algorithm's training loss.
Figure 6. Impact of M on the WAoI.
Figure 7. Impact of the training interval ζ on the WAoI.
Figure 8. Comparison of LODR and the DQN in terms of WAoI.
Figure 9. The comparison of each sensor's AoI state for LODR and the DQN.
Figure 10. The WAoI for the LODR, DQN, AoI-based greedy, and energy-based greedy algorithms.
Figure 11. The influence of the number of subcarriers K on the WAoI.
Figure 12. The impact of the status-update packet size on the WAoI.
Figure 13. The impact of the battery capacity on the WAoI.
Table 1. The DNN architecture.

Layers | Number of Neurons | Activation Function
Input layer | 4N | /
Hidden layer 1 | 120 | ReLU
Hidden layer 2 | 80 | ReLU
Output layer | N + 1 | Sigmoid
Table 2. Simulation parameters of LODR.

Simulation Parameter | Value
Learning rate | 0.01
Training interval ζ | 10
Memory size | 1024
Batch size | 128