Throughput Maximization in EH Symbiotic Radio System Based on LSTM-Attention-Driven DDPG
Abstract
1. Introduction
2. Related Works
2.1. EH in CR-NOMA Networks
2.2. Deep Reinforcement Learning for Resource Allocation
2.3. Motivations
3. System Model
4. Problem Formulation and Decomposition
4.1. Problem Transformation
4.2. Solving Problem P3
4.3. Solving Problem P4
5. Proposed LAMDDPG Algorithm
5.1. Framework of LAMDDPG Algorithm
5.2. Neural Network Architectures and Training Parameters
| Algorithm 1 LAMDDPG algorithm | |
|---|---|
| Input: | Environment, settings of the SD and PDs |
| Output: | Trained network parameters |
| | Initialize the system parameters, the Actor network, and the Critic network |
| | Initialize the target network weight parameters |
| | Initialize the experience replay memory |
| 1 | For episode = 1 to nep, do: |
| 2 | Initialize the exploration noise n, the large-scale fading, and the small-scale random fading |
| 3 | Obtain the initial state s1 |
| 4 | For k = 1 to T, do: |
| 5 | Select action ak according to the current Actor policy and the exploration noise |
| 6 | Execute action ak, receive the reward rk and the next environment state sk+1, and store the experience tuple in the experience replay memory |
| 7 | Randomly sample a mini-batch of experiences from the replay memory |
| 8 | Compute the target value from the reward and the output of the Critic target network |
| 9 | Update the Critic network by minimizing the loss function |
| 10 | Update the Actor policy network with the sampled policy gradient |
| 11 | Softly update the target networks |
| 12 | End for |
| 13 | End for |
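A minimal PyTorch sketch of the Actor/Critic structure and of one update iteration corresponding to steps 7–11 above. The class and function names, the single-head attention placement, and the sigmoid action squashing are illustrative assumptions rather than the authors' exact implementation; the hidden size 64, discount rate 0.9, and update coefficient 0.01 follow the simulation settings listed later.

```python
import torch
import torch.nn as nn

class LSTMAttentionActor(nn.Module):
    """LSTM + self-attention Actor: maps a short state history to an action in [0, 1]."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, action_dim), nn.Sigmoid())

    def forward(self, state_seq):                    # state_seq: (batch, time, state_dim)
        h, _ = self.lstm(state_seq)                  # temporal features, (batch, time, hidden)
        ctx, _ = self.attn(h, h, h)                  # self-attention over the time steps
        return self.head(ctx[:, -1])                 # action read from the last time step

class Critic(nn.Module):
    """Q(s, a) estimator, used for both the Critic and its target network."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def soft_update(target, source, tau=0.01):
    """Step 11: theta_target <- tau * theta + (1 - tau) * theta_target."""
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1 - tau).add_(tau * s.data)

def update(actor, actor_t, critic, critic_t, batch, actor_opt, critic_opt, gamma=0.9):
    """Steps 7-11 of Algorithm 1 on one sampled mini-batch."""
    s_seq, a, r, s_next_seq = batch                  # state sequences feed the LSTM; r has shape (batch, 1)
    with torch.no_grad():                            # step 8: target value from the target networks
        y = r + gamma * critic_t(s_next_seq[:, -1], actor_t(s_next_seq))
    critic_loss = nn.functional.mse_loss(critic(s_seq[:, -1], a), y)   # step 9
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    actor_loss = -critic(s_seq[:, -1], actor(s_seq)).mean()            # step 10
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    soft_update(critic_t, critic); soft_update(actor_t, actor)         # step 11
```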
6. Simulation Results
- (1) DDPG algorithm: the baseline algorithm proposed in [27].
- (2) Greedy algorithm: consumes all of the battery energy during data transmission and then begins energy harvesting [27].
- (3) Random algorithm: transmits data with a transmit power randomly generated between 0 and the maximum transmit power; a brief sketch of the Greedy and Random power choices follows this list.
- (4) LMDDPG algorithm: to demonstrate the individual effects of the LSTM and attention mechanisms, we design the LMDDPG algorithm by incorporating LSTM layers into the Actor networks of DDPG.
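A brief sketch of the Greedy and Random baseline power choices described above. The function and argument names are hypothetical, and the uniform draw in the Random baseline is an assumption; the text only states that the power is randomly generated between 0 and the maximum.

```python
import random

def greedy_power(battery_energy_j, slot_duration_s):
    """Greedy baseline: spend all currently stored battery energy on this transmission."""
    return battery_energy_j / slot_duration_s

def random_power(p_max_w):
    """Random baseline: draw the transmit power at random between 0 and the maximum."""
    return random.uniform(0.0, p_max_w)
```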
6.1. Rewards Analysis Under Different Algorithms
6.2. Rewards Analysis Under Different EH Model
6.3. Mechanism Analysis of Performance Improvement
6.4. Rewards Analysis Under Different Numbers of PDs
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| IoT | Internet of Things |
| EH | Energy harvesting |
| SR | Symbiotic radio |
| NOMA | Non-orthogonal multiple access |
| CR | Cognitive radio |
| LSTM | Long short-term memory |
| DDPG | Deep Deterministic Policy Gradient |
| SIC | Successive interference cancellation |
| QoS | Quality of service |
| SUs | Secondary users |
| PUs | Primary users |
| SWIPT | Simultaneous wireless information and power transfer |
| TS | Time-switching |
| PS | Power-splitting |
| CSI | Channel state information |
| DRL | Deep reinforcement learning |
| PPO | Proximal Policy Optimization |
| CER-DDPG | Combined experience replay with DDPG |
| NLPMs | Nonlinear power models |
| PER | Prioritized experience replay |
| TDMA | Time-Division Multiple Access |
| PDs | Primary devices |
Appendix A. Proof That Problems P7 and P8 Are Feasible
Appendix B. Proof That Problems P7 and P8 Are Concave Functions
References
- Donta, P.K.; Srirama, S.N.; Amgoth, T.; Annavarapu, C.S.R. Survey on recent advances in IoT application layer protocols and machine learning scope for research directions. Digit. Commun. Netw. 2022, 8, 727–744. [Google Scholar] [CrossRef]
- Andrews, J.G.; Buzzi, S.; Choi, W.; Hanly, S.V.; Lozano, A.; Soong, A.K.; Zhang, J.C. What will 5G be? IEEE J. Sel. Areas Commun. 2014, 32, 1065–1082. [Google Scholar] [CrossRef]
- Makki, B.; Chitti, K.; Behravan, A.; Alouini, M.-S. A survey of NOMA: Current status and open research challenges. IEEE Open J. Commun. Soc. 2020, 1, 179–189. [Google Scholar] [CrossRef]
- Lei, H.; She, X.; Park, K.-H.; Ansari, I.S.; Shi, Z.; Jiang, J.; Alouini, M.-S. On secure CDRT with NOMA and physical-layer network coding. IEEE Trans. Commun. 2023, 71, 381–396. [Google Scholar] [CrossRef]
- Kilzi, A.; Farah, J.; Nour, C.A.; Douillard, C. Mutual successive interference cancellation strategies in NOMA for enhancing the spectral efficiency of CoMP systems. IEEE Trans. Commun. 2020, 68, 1213–1226. [Google Scholar] [CrossRef]
- Li, X.; Zheng, Y.; Khan, W.U.; Zeng, M.; Li, D.; Ragesh, G.K.; Li, L. Physical layer security of cognitive ambient backscatter communications for green Internet-of-Things. IEEE Trans. Green Commun. Netw. 2021, 5, 1066–1076. [Google Scholar] [CrossRef]
- Chen, B.; Chen, Y.; Chen, Y.; Cao, Y.; Zhao, N.; Ding, Z. A novel spectrum sharing scheme assisted by secondary NOMA relay. IEEE Wirel. Commun. Lett. 2018, 7, 732–735. [Google Scholar] [CrossRef]
- Do, D.-T.; Le, A.-T.; Lee, B.M. NOMA in Cooperative Underlay Cognitive Radio Networks Under Imperfect SIC. IEEE Access 2020, 8, 86180–86195. [Google Scholar] [CrossRef]
- Ali, Z.; Khan, W.U.; Sidhu, G.A.S.; K, N.; Li, X.; Kwak, K.S.; Bilal, M. Fair power allocation in cooperative cognitive systems under NOMA transmission for future IoT networks. Alex. Eng. J. 2022, 61, 575–583. [Google Scholar] [CrossRef]
- Jiang, Q.; Zhang, C.; Zheng, W.; Wen, X. Research on Delay DRL in Energy-Constrained CR-NOMA Networks based on Multi-Threads Markov Reward Process. In Proceedings of the 2022 IEEE 8th International Conference on Computer and Communications (ICCC), Nanjing, China, 29 March–1 April 2021. [Google Scholar] [CrossRef]
- Elmadina, N.N.; Saeid, E.; Mokhtar, R.A.; Saeed, R.A.; Ali, E.S.; Khalifa, O.O. Performance of Power Allocation Under Priority User in CR-NOMA. In Proceedings of the 2023 IEEE 3rd International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering (MI-STA), Benghazi, Libya, 21–23 May 2023. [Google Scholar] [CrossRef]
- Alhamad, R.; Boujemâa, H. Optimal power allocation for CRN-NOMA systems with adaptive transmit power. Signal Image Video Process. 2020, 14, 1327–1334. [Google Scholar] [CrossRef]
- Abidrabbu, S.S.; Arslan, H. Energy-Efficient Resource Allocation for 5G Cognitive Radio NOMA Using Game Theory. In Proceedings of the 2021 IEEE Wireless Communications and Networking Conference (WCNC), Nanjing, China, 29 March–1 April 2021. [Google Scholar] [CrossRef]
- Xie, N.; Tan, H.; Huang, L.; Liu, A.X. Physical-layer authentication in wirelessly powered communication networks. IEEE/ACM Trans. Netw. 2021, 29, 1827–1840. [Google Scholar] [CrossRef]
- Huang, J.; Xing, C.; Guizani, M. Power allocation for D2D communications with SWIPT. IEEE Trans. Wirel. Commun. 2020, 19, 2308–2320. [Google Scholar] [CrossRef]
- Liu, Y.; Ding, Z.; Elkashlan, M.; Poor, H.V. Cooperative non-orthogonal multiple access with simultaneous wireless information and power transfer. IEEE J. Sel. Areas Commun. 2016, 34, 938–953. [Google Scholar] [CrossRef]
- Mazhar, N.; Ullah, S.A.; Jung, H.; Nadeem, Q.-U.-A.; Hassan, S.A. Enhancing spectral efficiency in IoT networks using deep deterministic policy gradient and opportunistic NOMA. In Proceedings of the 2024 IEEE 100th Vehicular Technology Conference (VTC2024-Fall), Washington, DC, USA, 7–10 October 2024. [Google Scholar] [CrossRef]
- Yang, J.; Cheng, Y.; Peppas, K.P.; Mathiopoulos, P.T.; Ding, J. Outage performance of cognitive DF relaying networks employing SWIPT. China Commun. 2018, 15, 28–40. [Google Scholar] [CrossRef]
- Song, Z.; Wang, X.; Liu, Y.; Zhang, Z. Joint Spectrum Resource Allocation in NOMA-based Cognitive Radio Network with SWIPT. IEEE Access 2019, 7, 89594–89603. [Google Scholar] [CrossRef]
- Yang, C.; Lu, W.; Huang, G.; Qian, L.; Li, B.; Gong, Y. Power Optimization in Two-way AF Relaying SWIPT based Cognitive Sensor Networks. In Proceedings of the 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall), Victoria, BC, Canada, 18 November–16 December 2020. [Google Scholar] [CrossRef]
- Liu, X.; Zheng, K.; Chi, K.; Zhu, Y.-H. Cooperative Spectrum Sensing Optimization in Energy-Harvesting Cognitive Radio Networks. IEEE Trans. Wirel. Commun. 2020, 19, 7663–7676. [Google Scholar] [CrossRef]
- Wang, Y.; Chen, S.; Wu, Y.; Zhao, C. Maximizing Average Throughput of Cooperative Cognitive Radio Networks Based on Energy Harvesting. Sensors 2022, 22, 8921. [Google Scholar] [CrossRef]
- Luong, N.C.; Hoang, D.T.; Gong, S.; Niyato, D.; Wang, P.; Liang, Y.-C.; Kim, D.I. Applications of deep reinforcement learning in communications and networking: A survey. IEEE Commun. Surv. Tutor. 2019, 21, 3133–3174. [Google Scholar] [CrossRef]
- Umeonwuka, O.O.; Adejumobi, B.S.; Shongwe, T. Deep Learning Algorithms for RF Energy Harvesting Cognitive IoT Devices: Applications, Challenges and Opportunities. In Proceedings of the 2022 International Conference on Electrical, Computer and Energy Technologies (ICECET), Prague, Czech Republic, 20–22 July 2022. [Google Scholar] [CrossRef]
- Du, K.; Xie, X.; Shi, Z.; Li, M. Joint Time and Power Control of Energy Harvesting CRN Based on PPO. In Proceedings of the 2022 Wireless Telecommunications Symposium (WTS), Pomona, CA, USA, 6–8 April 2022. [Google Scholar] [CrossRef]
- Al Rabee, F.T.; Masadeh, A.; Abdel-Razeq, S.; Salameh, H.B. Actor–Critic Reinforcement Learning for Throughput-Optimized Power Allocation in Energy Harvesting NOMA Relay-Assisted Networks. IEEE Open J. Commun. Soc. 2024, 5, 7941–7953. [Google Scholar] [CrossRef]
- Ding, Z.; Schober, R.; Poor, H.V. No-Pain No-Gain: DRL Assisted Optimization in Energy-Constrained CR-NOMA Networks. IEEE Trans. Commun. 2021, 69, 5917–5932. [Google Scholar] [CrossRef]
- Shi, Z.; Xie, X.; Lu, H.; Yang, H.; Cai, J.; Ding, Z. Deep Reinforcement Learning-Based Multidimensional Resource Management for Energy Harvesting Cognitive NOMA Communications. IEEE Trans. Commun. 2022, 70, 3110–3125. [Google Scholar] [CrossRef]
- Ullah, A.; Zeb, S.; Mahmood, A.; Hassan, S.A.; Gidlund, M. Opportunistic CR-NOMA Transmissions for Zero-Energy Devices: A DRL-Driven Optimization Strategy. IEEE Wirel. Commun. Lett. 2023, 12, 893–897. [Google Scholar] [CrossRef]
- Du, K.; Xie, X.; Shi, Z.; Li, M. Throughput maximization of EH-CRN-NOMA based on PPO. In Proceedings of the 2023 International Conference on Inventive Computation Technologies (ICICT), Lalitpur, Nepal, 26–28 April 2023. [Google Scholar] [CrossRef]
- Zhou, F.; Chu, Z.; Wu, Y.; Al-Dhahir, N.; Xiao, P. Enhancing PHY security of MISO NOMA SWIPT systems with a practical non-linear EH model. In Proceedings of the 2018 IEEE International Conference on Communications Workshops (ICC Workshops), Kansas City, MO, USA, 20–24 May 2018. [Google Scholar] [CrossRef]
- Kumar, D.; Singya, P.K.; Choi, K.; Bhatia, V. SWIPT enabled cooperative cognitive radio sensor network with non-linear power amplifier. IEEE Trans. Cogn. Commun. Netw. 2023, 9, 884–896. [Google Scholar] [CrossRef]
- Mohammed, A.A.; Baig, M.W.; Sohail, M.A.; Ullah, S.A.; Jung, H.; Hassan, S.A. Navigating boundaries in quantifying robustness: A DRL expedition for non-linear energy harvesting IoT networks. IEEE Commun. Lett. 2024, 28, 2447–2451. [Google Scholar] [CrossRef]
- Ullah, S.A.; Mahmood, A.; Nasir, A.A.; Gidlund, M.; Hassan, S.A. DRL-driven optimization of a wireless powered symbiotic radio with non-linear EH model. IEEE Open J. Commun. Soc. 2024, 5, 5232–5247. [Google Scholar] [CrossRef]
- Li, K.; Ni, W.; Dressler, F. LSTM-Characterized Deep Reinforcement Learning for Continuous Flight Control and Resource Allocation in UAV-Assisted Sensor Network. IEEE Internet Things J. 2022, 9, 4179–4189. [Google Scholar] [CrossRef]
- He, X.; Mao, Y.; Liu, Y.; Ping, P.; Hong, Y.; Hu, H. Channel assignment and power allocation for throughput improvement with PPO in B5G heterogeneous edge networks. Digit. Commun. Netw. 2024, 10, 109–116. [Google Scholar] [CrossRef]
- Ullah, I.; Singh, S.K.; Adhikari, D.; Khan, H.; Jiang, W.; Bai, X. Multi-Agent Reinforcement Learning for task allocation in the Internet of Vehicles: Exploring benefits and paving the future. Swarm Evol. Comput. 2025, 94, 101878. [Google Scholar] [CrossRef]
- Alhartomi, M.; Salh, A.; Audah, L.; Alzahrani, S.; Alzahmi, A. Enhancing Sustainable Edge Computing Offloading via Renewable Prediction for Energy Harvesting. IEEE Access 2024, 12, 74011–74023. [Google Scholar] [CrossRef]
- Choi, J.; Lee, B.-J.; Zhang, B.-T. Multi-focus Attention Network for Efficient Deep Reinforcement Learning. In Proceedings of the AAAI 2017 Workshop on What’s Next for AI in Games, AAAI 2017, San Francisco, CA, USA, 4–9 February 2017. [Google Scholar] [CrossRef]
- Zhou, X.; Zhang, R.; Ho, C.K. Wireless Information and Power Transfer: Architecture Design and Rate-Energy Tradeoff. IEEE Trans. Commun. 2013, 61, 4754–4767. [Google Scholar] [CrossRef]
- Yuan, T.; Liu, M.; Feng, Y. Performance Analysis for SWIPT Cooperative DF Communication Systems with Hybrid Receiver and Non-Linear Energy Harvesting Model. Sensors 2020, 20, 2472. [Google Scholar] [CrossRef]
- Boyd, S.; Vandenberghe, L. Convex Optimization, 1st ed.; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
- Gradshteyn, I.S.; Ryzhik, I.M. Table of Integrals, Series and Products, 6th ed.; Academic Press: New York, NY, USA, 2000. [Google Scholar]

| Notation | Description |
|---|---|
| M | Number of primary devices |
| | EH symbiotic radio SD |
| | Any given time slot |
| | Channel gain between BS and |
| | Channel gain between and |
| | Channel gain between and BS |
| | Time allocation factor |
| | Duration of every time slot |
| | Transmit power of |
| | Energy harvesting threshold |
| | Energy harvesting efficiency |
| | Residual energy in the battery of |
| | Maximum battery capacity of |
| | Transmit power of |
| | Maximum transmit power of |
| | Discount rate |
| | Energy surplus |
| | Principal branch of the Lambert W function |
| | Current state |
| | Parameters of the Actor and target Actor networks |
| | Noise |
| | Action |
| | Target action |
| | Features from the hidden layer of the Critic |
| | Parameters of the Critic and Critic target networks |
| | Output of the Critic network |
| | Output of the Critic target network |
| | System reward |
| | New state |
| | Experience tuple |
| | Weight |
| | Batch size |
| | Target value |
| | Update coefficient |
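The notation table lists the principal branch of the Lambert W function, which appears in the paper's closed-form expressions. As a side note, it can be evaluated numerically with SciPy; a minimal check of the defining identity, assuming SciPy is available:

```python
import math
from scipy.special import lambertw

x = 1.0
w0 = lambertw(x, k=0).real                    # k=0 selects the principal branch W_0
assert abs(w0 * math.exp(w0) - x) < 1e-9      # defining identity: W_0(x) * e^{W_0(x)} = x
```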
| Parameter | Value |
|---|---|
| Learning rate of the Actor network | 0.002 |
| Learning rate of the Critic network | 0.004 |
| Discount rate | 0.9 |
| Target network update coefficient | 0.01 |
| Batch size of the experience replay pool | 32 |
| Transmission power of | 30 dBm |
| Initial energy | 0.1 J |
| Duration of each time slot T | 1 s |
| Maximum transmission power of | 0.1 W |
| Energy harvesting efficiency | 0.7 |
| Noise power spectral density | −170 dBm/Hz |
| Noise bandwidth | 1 MHz |
| Carrier frequency | 914 MHz |
| Path loss exponent | 3 |
| Number of hidden layer nodes | 64 |
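The simulation settings from the table above, collected into a plain Python dictionary for reference; the key names are illustrative, not taken from the authors' code. The last two lines work out the total noise power implied by the listed spectral density and bandwidth.

```python
import math

# Simulation settings from the table above (key names are illustrative).
SETTINGS = {
    "actor_lr": 0.002,
    "critic_lr": 0.004,
    "discount_rate": 0.9,
    "target_update_coeff": 0.01,
    "batch_size": 32,
    "tx_power_dbm": 30,
    "initial_energy_j": 0.1,
    "slot_duration_s": 1.0,
    "max_tx_power_w": 0.1,
    "eh_efficiency": 0.7,
    "noise_psd_dbm_per_hz": -170,
    "noise_bandwidth_hz": 1e6,
    "carrier_freq_mhz": 914,
    "path_loss_exponent": 3,
    "hidden_nodes": 64,
}

# Total noise power over the 1 MHz bandwidth: -170 dBm/Hz + 10*log10(1e6) = -110 dBm
noise_power_dbm = SETTINGS["noise_psd_dbm_per_hz"] + 10 * math.log10(SETTINGS["noise_bandwidth_hz"])
noise_power_w = 10 ** ((noise_power_dbm - 30) / 10)   # about 1e-14 W
```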