DRL-Based Scheduling for AoI Minimization in CR Networks with Perfect Sensing
Abstract
1. Introduction
- We focus on minimizing the average AoI of a single RF energy-harvesting SU over a finite time horizon. The SU obtains its energy by harvesting it from PU transmissions. It can deliver status update packets to the CBS only when the PU spectrum is sensed to be idle. In each time slot, the SU adaptively decides whether to sense the spectrum and whether to send a status update. These decisions depend on its current battery energy, its AoI, the channel link quality, and the (partially observed) availability of the PU spectrum. We model this decision problem as a POMDP with discrete state and action sets and derive the optimal policy by dynamic programming.
- We extend the scenario to multiple SUs, where the objective is to minimize the long-term average weighted sum-AoI through adaptive sensing and update decisions. We again model the decision problem as a POMDP with finite state and action spaces. The joint state space grows exponentially with the number of SUs (the curse of dimensionality), which makes exact dynamic programming computationally intractable. We therefore propose an improved DQN approach to learn the optimal policy. This DQN variant is tailored to the POMDP setting, in which the partially observable PU spectrum state is modeled as a Markov chain.
- Extensive simulations show that the proposed policies substantially improve system performance compared with the myopic policy. We also analyze how the system parameter settings affect performance.
2. System Model for RF Energy-Harvesting CRN with One SU
2.1. Primary User Model
2.2. Secondary User Model
3. Finite POMDP Formulation for RF Energy-Harvesting CRN with One SU
3.1. POMDP Formulation
- Actions: The SU first decides whether to perform spectrum sensing. If it does not sense the spectrum, it harvests energy from PU transmissions and does not deliver a status update packet. If it senses the spectrum and finds it occupied by the PU, it cannot deliver an update either. If it senses the spectrum and finds it vacated by the PU, it must further decide whether to deliver a status update packet. This decision is based on its AoI, the channel state from the SU to the CBS, the channel state from the PU to the SU, and its available energy. Consequently, the action in each time slot is a pair of binary decisions: whether to sense and whether to update.
- Observations and beliefs: The PU's spectrum state is observed by the SU through spectrum sensing. The belief denotes the probability that the spectrum is available (idle). After each time slot, the belief is updated from the history of actions and observations according to the transition function in Equation (9).
  - If the SU does not perform spectrum sensing, the update depends on two cases. (1) If the battery level remains unchanged, the belief is propagated using only the PU state Markov chain. (2) If the battery energy increases, the SU must have harvested energy, so the PU channel was busy in time slot t. The belief then becomes the probability that a busy channel turns idle in the next slot.
  - If the SU performs spectrum sensing, the (perfect) sensing outcome reveals the actual occupancy status of the spectrum.
  
  Equation (9) therefore permits only three distinct forms of belief update. As a result, only finitely many belief values can be reached within the T-slot horizon; that is, the belief space is a finite set.
- States: The discrete battery energy level of the SU at the start of time slot t ranges from zero up to a maximum level corresponding to the SU's battery capacity. Each energy quantum therefore corresponds to a fixed number of Joules, namely the capacity divided by the maximum level. The continuous battery energy is mapped to a discrete level by dividing it by the quantum size and applying the floor function. Because flooring never overestimates the stored energy, the discretized model is conservative. It yields a lower bound on the performance of the continuous system, i.e., an upper bound on its AoI. Similarly, the continuous channel power gains are quantized into a finite number of levels according to the fading probability density function (PDF). The highest levels correspond to the peak channel power gains of the SU-to-CBS link and the PU-to-SU link, respectively. In each time slot, the fully observable part of the state consists of the AoI, the SU-to-CBS channel state, the PU-to-SU channel state, and the battery state, and this observable state space is finite. The PU spectrum state, by contrast, is only partially observable and is characterized by the belief. The complete system state therefore combines the observable state with the belief. Since both the observable state space and the belief space are finite, the SU can encounter only a finite number of system states.
- Transition probabilities: Given the current state and action, the transition probability specifies the probability of moving to each possible next state. Since the harvested energy and the channel power gains are i.i.d. across time slots, this probability decomposes into separate terms for the individual state components. Equation (12) states that the battery-state transition probability is 1 if the next battery level is the one produced by the chosen action, and 0 otherwise.
- Cost: In each state, the immediate cost is the AoI accumulated at time slot t. Minimizing the total cost over the T slots is therefore equivalent to minimizing the average AoI.
- Policy: A policy is a sequence of deterministic decision rules, one for each time slot. The rule for slot t maps the system state to an action. This paper considers the set of all such deterministic policies.
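The three-case belief update, the floor-based battery quantization, and the finiteness argument can be illustrated with a short sketch. It uses assumed notation and is not the paper's Equation (9). The PU spectrum is modeled as a two-state Markov chain with hypothetical parameters P11 (idle stays idle) and P01 (busy becomes idle). The belief is the probability that the spectrum is idle in the next slot. The function names `update_belief`, `quantize_battery`, and `reachable_beliefs` are illustrative only.

```python
import math

# Hypothetical PU Markov-chain parameters (assumed values, not taken from the paper):
# P11 = P(idle in slot t+1 | idle in slot t), P01 = P(idle in slot t+1 | busy in slot t).
P11 = 0.8
P01 = 0.3


def update_belief(belief, sensed, observed_idle=False, battery_increased=False):
    """Next-slot probability that the PU spectrum is idle (three update forms)."""
    if sensed:
        # Perfect sensing reveals the true PU state in slot t.
        return P11 if observed_idle else P01
    if battery_increased:
        # Energy was harvested, so the PU must have been busy in slot t.
        return P01
    # No new information: propagate the belief through the Markov chain.
    return belief * P11 + (1.0 - belief) * P01


def quantize_battery(energy_joules, capacity_joules, max_level):
    """Floor-based mapping of continuous energy to a level in {0, ..., max_level}."""
    quantum = capacity_joules / max_level  # Joules per energy quantum
    level = math.floor(energy_joules / quantum)  # flooring never overestimates energy
    return max(0, min(level, max_level))


def reachable_beliefs(initial_belief, horizon):
    """Set of belief values reachable from initial_belief within `horizon` slots."""
    reached = {initial_belief}
    frontier = {initial_belief}
    for _ in range(horizon):
        frontier = {P11, P01} | {update_belief(b, sensed=False) for b in frontier}
        reached |= frontier
    return reached


print(sorted(reachable_beliefs(0.5, 3)))
```

Each slot adds at most a few new belief values, so the reachable set stays finite for any finite horizon. This is the property the finite-horizon formulation relies on.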
3.2. Dynamic Programming-Based POMDP Solution
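Finite-horizon backward induction can be sketched on a deliberately small toy model. This is not the paper's exact model: every parameter below (horizon, AoI cap, energy costs, harvest amount, Markov probabilities) is a hypothetical value. The quantized channel states are replaced by a single update-success probability `P_SUCCESS`, which is also an assumption. The sketch shows the Bellman recursion "current AoI plus the minimum expected future cost". Here the action (1,1) means "sense, then update if the spectrum is idle".

```python
from functools import lru_cache

# Toy parameters: every value below is a hypothetical choice for illustration.
T = 6            # horizon length (time slots)
A_MAX = 10       # AoI cap, keeps the state space finite
B_MAX = 4        # maximum battery level (energy quanta)
E_SENSE = 1      # quanta consumed by spectrum sensing
E_UPDATE = 2     # quanta consumed by one status update
HARVEST = 2      # quanta harvested when the SU does not sense and the PU is busy
P_SUCCESS = 0.9  # update success probability, a stand-in for the quantized channel states
P11, P01 = 0.8, 0.3  # P(idle next | idle now), P(idle next | busy now)

NO_SENSE, SENSE_ONLY, SENSE_AND_UPDATE = (0, 0), (1, 0), (1, 1)


def predict(belief):
    """Propagate the idle-probability belief one slot through the PU Markov chain."""
    return round(belief * P11 + (1 - belief) * P01, 12)


@lru_cache(maxsize=None)
def value(t, aoi, battery, belief):
    """Backward induction: (min expected AoI summed over slots t..T-1, best action)."""
    if t == T:
        return 0.0, None
    nxt = min(aoi + 1, A_MAX)
    options = {}

    # Do not sense: energy is harvested only if the PU is busy (probability 1 - belief).
    harvest = value(t + 1, nxt, min(battery + HARVEST, B_MAX), P01)[0]
    quiet = value(t + 1, nxt, battery, predict(belief))[0]
    options[NO_SENSE] = belief * quiet + (1 - belief) * harvest

    if battery >= E_SENSE:
        b = battery - E_SENSE
        busy = value(t + 1, nxt, b, P01)[0]       # sensed busy: no update possible
        idle_wait = value(t + 1, nxt, b, P11)[0]  # sensed idle but chose not to update
        options[SENSE_ONLY] = belief * idle_wait + (1 - belief) * busy
        if b >= E_UPDATE:
            b2 = b - E_UPDATE
            ok = value(t + 1, 1, b2, P11)[0]      # successful update resets AoI to 1
            fail = value(t + 1, nxt, b2, P11)[0]
            idle_send = P_SUCCESS * ok + (1 - P_SUCCESS) * fail
            options[SENSE_AND_UPDATE] = belief * idle_send + (1 - belief) * busy

    best = min(options, key=options.get)
    return aoi + options[best], best
```

For example, `value(0, 1, B_MAX, 0.5)` returns the minimum expected cumulative AoI over the toy horizon together with the first-slot action. Memoization with `lru_cache` plays the role of the DP table.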
4. System Model for RF Energy-Harvesting CRN with Multiple SUs
Secondary Users Model
5. Infinite POMDP Formulation for RF Energy-Harvesting CRN with Multiple SUs
5.1. POMDP Formulation
- Actions: The CC decides which SU performs spectrum sensing and which SU delivers a status update. The action in each time slot is the pair formed by the selected sensing SU and the selected updating SU.
- Observations and beliefs: Over an infinite horizon, the belief can take infinitely many values, so it is quantized. The discrete belief level at the beginning of time slot t ranges from zero up to a maximum belief level, and the continuous belief is converted into the corresponding discrete level.
- State: The discrete battery energy level of the n-th SU at the start of time slot t ranges from zero up to a maximum level determined by the maximum energy storage capacity of that SU's battery. Each energy quantum of the n-th SU's battery thus corresponds to a fixed number of Joules. The n-th SU's continuous battery energy is converted into a discrete battery level in the same floor-based manner. Likewise, the continuous channel power gains of the link from the n-th SU to the CBS and of the link from the PU to the n-th SU are mapped to discrete levels. These levels are bounded above by the corresponding maximum channel power gain levels. The fully observable state of the n-th SU in each time slot consists of its AoI, the channel condition between it and the CBS, the channel condition from the PU to it, and its residual battery energy. The joint state of all SUs in time slot t collects these per-SU states. The complete system state combines this joint state with the PU spectrum belief.
- Transition probabilities: For the n-th SU, the probability of moving from the current state to the next state under a given action is derived first. The overall transition probability then follows by combining these per-SU transition probabilities with the transition of the belief.
- Cost: The immediate cost incurred in time slot t is the weighted sum-AoI at that time instant. It is computed by multiplying each SU's AoI by its weight and summing over all SUs.
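The multi-SU quantities above can be made concrete with a sketch built on several stated assumptions. The belief is quantized with a floor rule. Index 0 in the joint action is a hypothetical "no SU" choice. The overall transition probability is taken to be the product of conditionally independent per-SU terms and the belief term. None of the function names come from the paper. The helper `state_space_size` shows how quickly the joint state space grows with the number of SUs.

```python
import itertools
import math


def quantize_belief(belief, max_level):
    """Map a belief in [0, 1] to a level in {0, ..., max_level} (floor rule assumed)."""
    return min(math.floor(belief * max_level), max_level)


def joint_actions(num_sus):
    """(sensing SU, updating SU) pairs; index 0 is a hypothetical 'no SU' choice."""
    choices = range(num_sus + 1)
    return list(itertools.product(choices, choices))


def weighted_sum_aoi(aois, weights):
    """Immediate cost: sum over SUs of weight_n * AoI_n."""
    return sum(w * a for w, a in zip(weights, aois))


def joint_transition_prob(per_su_probs, belief_prob):
    """Overall transition probability, assuming conditionally independent per-SU terms."""
    return belief_prob * math.prod(per_su_probs)


def state_space_size(num_sus, aoi_levels, cbs_gain_levels, pu_gain_levels,
                     battery_levels, belief_levels):
    """Joint states: per-SU observable states raised to the number of SUs, times belief levels."""
    per_su = aoi_levels * cbs_gain_levels * pu_gain_levels * battery_levels
    return per_su ** num_sus * belief_levels
```

With these hypothetical level counts, `state_space_size(5, 10, 4, 4, 5, 11)` already exceeds 3 × 10^15 states. This illustrates why exact dynamic programming is impractical here and a learning-based approach is used instead.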
5.2. DRL-Based POMDP Solution
Algorithm 1: The new DQN for average weighted sum-AoI minimization
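Algorithm 1 itself is not reproduced here. The sketch below shows only the generic DQN building blocks (Mnih et al.) that any DQN variant relies on; it is not the authors' improved algorithm. These are an experience-replay buffer, ε-greedy action selection adapted to cost minimization (argmin instead of argmax), and the target y = c + γ min over a' of Q_target(s', a'). The discount factor γ is an assumption, since the paper minimizes a long-term average cost. The target network is represented by a plain callable `target_q(state)` returning one Q-value per action, which is a hypothetical interface.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience replay memory of (state, action, cost, next_state) tuples."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest transitions are dropped first

    def push(self, state, action, cost, next_state):
        self.memory.append((state, action, cost, next_state))

    def sample(self, batch_size):
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy for cost minimization: explore at random, else take argmin Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return min(range(len(q_values)), key=lambda a: q_values[a])


def td_targets(batch, target_q, gamma):
    """Targets y = c + gamma * min_a' Q_target(s', a') for each sampled transition."""
    return [cost + gamma * min(target_q(next_state)) for _, _, cost, next_state in batch]
```

In a full DQN, the online network is trained to reduce the squared error between Q(s, a) and these targets. Its weights are copied into the target network periodically.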
6. Numerical Results and Discussions
6.1. One SU’s Finite-Horizon AoI Evaluation
6.2. Multiple-SU Infinite-Horizon AoI Evaluation
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Simulation Parameter | Value |
---|---|
W | 1 MHz |
 | −95 dBm |
 | 0.5 |
 | 2 |
 | 0.2 |
 | 0.2 s |
 | 13 |
 | 10 |
 | 10 |
 | 5 |
t | a | b | g | h | Belief | Action |
---|---|---|---|---|---|---|
0 | 3 | 2 | 3 | 0 | 0.8 | (0,0) |
1 | 4 | 4 | 2 | 7 | 0.5 | (1,0) |
2 | 5 | 4 | 6 | 7 | 0.5 | (1,0) |
3 | 6 | 4 | 7 | 4 | 0.5 | (1,0) |
4 | 7 | 4 | 6 | 5 | 0.5 | (1,1) |
5 | 1 | 2 | 3 | 2 | 0.8 | (0,0) |
6 | 2 | 2 | 6 | 7 | 0.8 | (1,1) |
7 | 1 | 0 | 7 | 7 | 0.8 | (0,0) |
8 | 2 | 4 | 3 | 7 | 0.5 | (1,1) |
9 | 1 | 2 | 3 | 7 | 0.8 | (1,1) |
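The table's caption is missing, so column a is assumed to be the SU's AoI, and every update attempted in the trace is assumed to succeed. Under these assumptions, the tabulated values follow the rule "reset to 1 after action (1,1), otherwise increase by 1". The short script below replays the rows to check this. `next_aoi` and `check_trace` are illustrative helpers, not part of the paper.

```python
# Rows copied from the table: (t, a, action). Column a is assumed to be the AoI.
TRACE = [
    (0, 3, (0, 0)),
    (1, 4, (1, 0)),
    (2, 5, (1, 0)),
    (3, 6, (1, 0)),
    (4, 7, (1, 1)),
    (5, 1, (0, 0)),
    (6, 2, (1, 1)),
    (7, 1, (0, 0)),
    (8, 2, (1, 1)),
    (9, 1, (1, 1)),
]


def next_aoi(aoi, action, success=True):
    """Assumed AoI dynamics: reset to 1 after a successful update, else grow by 1."""
    sense, update = action
    return 1 if (sense and update and success) else aoi + 1


def check_trace(trace):
    """Return the slots where the tabulated next AoI disagrees with next_aoi()."""
    mismatches = []
    for (t, aoi, action), (_, tabulated_next, _) in zip(trace, trace[1:]):
        if next_aoi(aoi, action) != tabulated_next:
            mismatches.append(t)
    return mismatches


print(check_trace(TRACE))  # prints []
```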
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sun, J.; Zhang, S.; Yu, X. DRL-Based Scheduling for AoI Minimization in CR Networks with Perfect Sensing. Entropy 2025, 27, 855. https://doi.org/10.3390/e27080855