Abstract
Massive Internet of Things (IoT) deployments face critical spectrum crowding and energy scarcity challenges. Energy harvesting (EH) symbiotic radio (SR), or EHSR, in which secondary devices share spectrum with and harvest energy from a non-orthogonal multiple access (NOMA)-based primary system, offers a sustainable solution. We consider long-term throughput maximization in an EHSR network with a nonlinear EH model. To solve this non-convex problem, we design a two-layer optimization algorithm that combines convex optimization with a deep reinforcement learning (DRL) framework. The derived optimal power and time allocation factor, together with the time-varying environment state, are fed into the proposed algorithm, which combines long short-term memory (LSTM) and an attention mechanism with Deep Deterministic Policy Gradient (DDPG) and is named LAMDDPG, to maximize the long-term throughput. Simulation results demonstrate that, by equipping the Actor with an LSTM to capture temporal state information and enhancing the Critic with a channel-wise attention mechanism, namely the Squeeze-and-Excitation block, for more precise Q-value evaluation, LAMDDPG converges faster and achieves higher long-term throughput than the baseline algorithms. Moreover, we identify the optimal number of PDs for maintaining efficient network performance under NLPM, which provides guidance for practical EHSR applications.