An Intelligent Energy-Aware Framework for 6G-Enabled Non-Terrestrial IoT via Reinforcement Learning
Abstract
1. Introduction
1.1. Motivation
1.2. Challenges
- Dynamic Channel Conditions: The wireless channel between IoT devices and HAPS undergoes rapid temporal variations due to atmospheric turbulence, shadowing, and multipath propagation. Static power allocation schemes cannot adapt to these variations, leading to either wasted energy or degraded link quality.
- Battery Constraints: IoT devices in remote areas are typically constrained by finite battery capacity. Aggressive power allocation may achieve higher instantaneous throughput but at the cost of premature battery depletion, ultimately reducing the operational lifetime of the device.
- Queue Management: Stochastic packet arrivals create variable traffic loads that must be managed efficiently. Excessive queue buildup increases end-to-end latency, while aggressive queue draining may waste power on transmissions of marginal benefit.
- Multi-Objective Optimization: Jointly optimizing EE, throughput, and packet delivery ratio under dynamic channel conditions represents a complex, high-dimensional optimization problem that does not admit closed-form solutions in general settings.
- Model Uncertainty: Accurate mathematical models of the channel, battery dynamics, and traffic arrivals are often unavailable or computationally expensive to maintain in real time. Model-free approaches that can learn directly from environmental interaction are therefore highly desirable.
1.3. Contributions
- A comprehensive framework is developed that jointly considers channel variability, battery constraints, and traffic dynamics in HAPS-assisted IoT communication. The model captures the interdependencies between these components and serves as the foundation for RL-based optimization.
- The transmit power control problem is formulated as an MDP and solved using a Q-learning approach that enables adaptive and model-free decision-making. The agent interacts directly with the system environment, learning optimal policies without requiring explicit knowledge of channel statistics or traffic patterns.
- A carefully designed reward function balances EE, queueing delay, and packet loss, ensuring practical applicability in real-world IoT scenarios. The reward formulation encodes the trade-offs among competing objectives and guides the RL agent toward policies that are beneficial in the long run.
- Extensive simulations demonstrate that the proposed method significantly outperforms conventional schemes, including random power selection, round-robin (RR), Max-SNR, and fixed-power allocation, achieving higher EE, lower power consumption, and more stable performance. The superiority of the proposed approach is further confirmed by statistical analysis using the cumulative distribution function (CDF).
- The computational complexity of the Q-learning agent is analyzed with respect to the state and action space sizes, demonstrating that the proposed framework scales gracefully and is deployable on resource-constrained IoT hardware.
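The model-free learning step named in the contributions can be illustrated with a minimal tabular Q-learning sketch. The learning rate (0.1) and discount factor (0.95) are the values listed in the parameter table; the two-state toy table, the `epsilon_greedy` helper, and its exploration parameter are hypothetical and are not taken from the paper.

```python
import random

ALPHA = 0.1   # learning rate (value listed in the parameter table)
GAMMA = 0.95  # discount factor (value listed in the parameter table)


def q_update(q, state, action, reward, next_state, alpha=ALPHA, gamma=GAMMA):
    """Apply one tabular Q-learning update in place and return the new value."""
    best_next = max(q[next_state])          # greedy value of the next state
    td_target = reward + gamma * best_next  # bootstrapped target
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def epsilon_greedy(q, state, epsilon, rng=random):
    """Explore with probability epsilon (hypothetical schedule), else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    return max(range(len(q[state])), key=lambda a: q[state][a])


# Hypothetical toy table: 2 states x 2 actions, initialized to zero.
q = [[0.0, 0.0], [0.0, 0.0]]
q_update(q, state=0, action=1, reward=1.0, next_state=1)
```

In this sketch the reward is passed in as a number, so any weighting of EE, delay, and packet loss would have to be computed before calling `q_update`.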
1.4. Related Work
2. System Model
2.1. Network Architecture
2.2. Channel Model
2.3. Assumptions
- The HAPS-to-BS backhaul link has sufficient capacity and is not a bottleneck; hence, we focus exclusively on the IoT-to-HAPS uplink.
- The IoT device has no knowledge of future channel states; decisions at time t are based solely on the current state.
- Packet arrivals are independent and identically distributed (i.i.d.) across time slots.
- The RL agent operates on the IoT device and has access only to locally observable quantities (battery level and queue length), making the framework fully distributed.
- The Q-table is pre-trained in simulation and loaded onto the IoT device for inference; the device does not perform online Q-table updates during deployment.
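The last two assumptions imply that on-device operation reduces to a table lookup on the locally observed battery level and queue length. The sketch below shows one possible form of that lookup. The battery and queue ranges and the ten levels per dimension follow the parameter table; the count of four actions is inferred from 400 table entries over 10 x 10 states, and the uniform binning in `discretize` and the flat row layout are assumptions.

```python
MAX_BATTERY = 100      # energy units (parameter table)
MAX_QUEUE = 10         # packets (parameter table)
N_BATTERY_LEVELS = 10  # battery state levels (parameter table)
N_QUEUE_LEVELS = 10    # queue state levels (parameter table)
N_ACTIONS = 4          # assumption: 400 entries / (10 x 10 states)


def discretize(value, max_value, n_levels):
    """Map a value in [0, max_value] to an integer level in [0, n_levels - 1]."""
    clipped = min(max(value, 0), max_value)
    return min(int(clipped * n_levels / max_value), n_levels - 1)


def state_index(battery, queue_len):
    """Flatten the (battery level, queue level) pair into one row index."""
    b = discretize(battery, MAX_BATTERY, N_BATTERY_LEVELS)
    q = discretize(queue_len, MAX_QUEUE, N_QUEUE_LEVELS)
    return b * N_QUEUE_LEVELS + q


def select_action(q_table, battery, queue_len):
    """Greedy inference: return the action index with the largest Q-value."""
    row = q_table[state_index(battery, queue_len)]
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical pre-trained table with two hand-set entries for illustration.
q_table = [[0.0] * N_ACTIONS for _ in range(N_BATTERY_LEVELS * N_QUEUE_LEVELS)]
q_table[state_index(5, 0)][0] = 1.0      # low battery, empty queue
q_table[state_index(100, 10)][3] = 1.0   # full battery, full queue
```

Because no update is performed at inference time, the per-slot cost in this sketch is one index computation plus a maximum over the action row.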
2.4. Communication Model
2.5. Energy and Battery Model
2.6. Traffic and Queueing Model
2.7. Energy Efficiency and Reward Function
3. Reinforcement Learning Formulation
3.1. State Space
3.2. Action Space
3.3. Q-Learning Algorithm
3.4. Convergence Analysis
3.5. Complexity Analysis
4. Simulation Setup and Parameter Configuration
Benchmark Schemes
5. Results and Analysis
6. Discussion, Limitations and Future Directions
- Battery-aware conservatism: At low battery levels, the agent predominantly selects silence or minimum power.
- Queue-responsive aggression: When the queue is long, the agent escalates to the higher power levels to clear the backlog and avoid drops.
- Moderate power preference: In most intermediate states, a moderate power level is favored, matching the peak efficiency region of the Shannon capacity curve at moderate SNR.
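One way to see why a moderate power level can be the most energy-efficient choice is to compare the Shannon rate with the total power consumed. The sketch below assumes a fixed circuit-power overhead `p_circuit`, a normalized gain-to-noise ratio, and a five-point power grid. All three are hypothetical and are not taken from the paper's EE or reward definition.

```python
import math


def energy_efficiency(p_tx, gain_over_noise, p_circuit):
    """Shannon rate per unit of total power: log2(1 + SNR) / (p_tx + p_circuit)."""
    if p_tx <= 0:
        return 0.0  # silence transmits nothing
    return math.log2(1.0 + gain_over_noise * p_tx) / (p_tx + p_circuit)


# Hypothetical power grid (W), gain-to-noise ratio (1/W), and circuit power (W).
powers = [0.05, 0.1, 0.2, 0.5, 1.0]
ee = {p: energy_efficiency(p, gain_over_noise=10.0, p_circuit=0.1) for p in powers}
best = max(ee, key=ee.get)  # power level with the highest efficiency
```

Under these assumed numbers the maximum falls at an interior grid point rather than at either extreme. Without the overhead term, the ratio log2(1 + aP)/P decreases monotonically in P, so an interior peak depends on some fixed cost per transmission.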
6.1. Limitations
- The single-device setting isolates the core battery–queue–EE trade-off without confounding multi-access interference, providing a clean proof-of-concept. Analytically, the robustness to parameter variation can be argued as follows: at lower packet arrival rates, the queue rarely builds and the agent’s battery-conserving behavior dominates; at higher rates, the queue-responsive aggression behavior naturally activates. The battery capacity scales the number of transmissions before depletion but does not alter the structure of the learned policy once states are normalized. These arguments support robustness to parameter variations without requiring retraining.
- The channel model abstracts away Doppler shift, elevation-angle dependency, and atmospheric scintillation. As discussed in Section 1.2, this is a conservative worst-case model; Rician or shadowed-Rician (Loo) models per 3GPP TR 38.811 would be more realistic and constitute a direct future extension.
- No energy harvesting; battery is non-replenishable in the current model.
- Stationary channel and traffic statistics; non-stationary environments require periodic retraining.
- Tabular Q-learning scalability; the curse of dimensionality limits state-space richness.
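The scalability limitation can be made concrete with a short footprint calculation. The sketch assumes 4 bytes per entry and four actions, both inferred from the reported 400 entries and 1.6 KB for a 10 x 10 state grid. The third state dimension in the last line is purely hypothetical.

```python
from math import prod


def q_table_entries(state_levels, n_actions):
    """Number of Q-values: product of the levels per state dimension, times actions."""
    return prod(state_levels) * n_actions


def q_table_bytes(state_levels, n_actions, bytes_per_entry=4):
    """Storage in bytes, assuming a fixed-width entry (4 bytes is an assumption)."""
    return q_table_entries(state_levels, n_actions) * bytes_per_entry


single_device = q_table_bytes([10, 10], n_actions=4)      # battery x queue
hundred_devices = 100 * single_device                     # independent tables
with_extra_dim = q_table_bytes([10, 10, 10], n_actions=4) # hypothetical third dimension
```

Each additional state dimension multiplies the table size by its number of levels, which is the growth pattern behind the curse of dimensionality noted above.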
6.2. Future Research Directions
- Multi-Device Scenarios: Extending to multiple IoT devices sharing the HAPS uplink requires multi-agent RL (MARL) [23].
- Deep Reinforcement Learning: Replacing the Q-table with a DQN [26] agent enables richer state representations and continuous action spaces. DQN would likely improve performance, particularly in non-stationary channels, at the cost of inference overhead incompatible with current Class-AB IoT hardware [17].
- Energy Harvesting Integration: Solar or RF energy harvesting transforms the problem into a co-optimization of harvest–store–transmit decisions. Lyapunov optimization combined with RL provides a framework with provable near-optimal guarantees [27].
- Reconfigurable Intelligent Surfaces: Joint optimization of RIS phase shifts and IoT transmit power via RL can substantially improve coverage for shadowed devices [28].
- Federated RL: With many IoT devices, federated RL enables distributed policy learning without sharing raw data, preserving privacy while leveraging collective device experience [29].
- Non-Stationary Adaptation: Meta-RL and sliding-window Q-learning can maintain performance under time-varying channel or traffic statistics without full retraining [30].
- Comparison with Lyapunov-Based Optimization: Lyapunov drift-plus-penalty methods provide provable near-optimal guarantees under stationary channel statistics and represent a strong analytical baseline. Comparison under both stationary and non-stationary conditions would clarify the operating regimes where model-free RL is preferable.
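For the Lyapunov baseline mentioned in the last item, a per-slot drift-plus-penalty rule for a single queue can be sketched as follows. This is a simplified illustration and not a method from the paper: it ignores the battery, assumes the rate function is known, and uses a hypothetical power grid, gain-to-noise ratio, and trade-off parameter `v`.

```python
import math


def rate(p_tx, gain_over_noise=10.0):
    """Hypothetical service rate per slot from the Shannon formula."""
    return math.log2(1.0 + gain_over_noise * p_tx)


def dpp_power(queue_len, powers, rate_fn, v):
    """Pick the power minimizing v * power - queue_len * rate (drift-plus-penalty)."""
    return min(powers, key=lambda p: v * p - queue_len * rate_fn(p))


POWERS = [0.0, 0.1, 0.5, 1.0]  # hypothetical power levels in W

idle_choice = dpp_power(0, POWERS, rate, v=1.0)     # empty queue
backlog_choice = dpp_power(10, POWERS, rate, v=1.0) # long queue
frugal_choice = dpp_power(1, POWERS, rate, v=100.0) # heavy power penalty
```

A larger `v` weights power saving more heavily relative to backlog, which is the knob such a baseline would sweep when compared against the learned policy.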
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Jamshed, M.A.; Ayaz, F.; Kaushik, A.; Fischione, C.; Ur-Rehman, M. Green UAV-enabled Internet-of-Things network with AI-assisted NOMA for disaster management. In Proceedings of the 2024 IEEE 35th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC); IEEE: New York, NY, USA, 2024; pp. 1–6.
- Jamshed, M.A.; Kaushik, A.; Manzoor, S.; Shakir, M.Z.; Seong, J.; Toka, M.; Shin, W.; Schellmann, M. A tutorial on non-terrestrial networks: Towards global and ubiquitous 6G connectivity. Found. Trends Netw. 2025, 14, 160–253.
- Haq, B.; Jamshed, M.A.; Nauman, A. Integrated Terrestrial and Non-Terrestrial Network: An Overview. In Integrated Terrestrial and Non-Terrestrial Networks; Springer: Cham, Switzerland, 2024; pp. 1–16.
- Zeng, Y.; Zhang, R.; Lim, T.J. Wireless communications with unmanned aerial vehicles: Opportunities and challenges. IEEE Commun. Mag. 2016, 54, 36–42.
- Khan, W.U.; Sheemar, C.K.; Jamshed, M.A.; Lagunas, E.; Querol, J.; Kaushik, A.; Chatzinotas, S. RIS-Enabled Joint Communications and Sensing in 6G NTNs: Opportunities and Challenges. IEEE Open J. Commun. Soc. 2026, 7, 821–843.
- Kurt, G.K.; Khoshkholgh, M.G.; Alfattani, S.; Ibrahim, A.; Darwish, T.S.; Alam, M.S.; Yanikomeroglu, H.; Yongacoglu, A. A vision and framework for the high altitude platform station (HAPS) networks of the future. IEEE Commun. Surv. Tutor. 2021, 23, 729–779.
- Shibata, Y.; Kanazawa, N.; Konishi, M.; Hoshino, K.; Ohta, Y.; Nagate, A. System design of gigabit HAPS mobile communications. IEEE Access 2020, 8, 157995–158007.
- Jamshed, M.A.; Ali, K.; Abbasi, Q.H.; Imran, M.A.; Ur-Rehman, M. Challenges, applications, and future of wireless sensors in Internet of Things: A review. IEEE Sens. J. 2022, 22, 5482–5494.
- Nauman, A.; Jamshed, M.A.; Ahmad, Y.; Ali, R.; Zikria, Y.B.; Kim, S.W. An intelligent deterministic D2D communication in narrow-band Internet of Things. In Proceedings of the 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC); IEEE: New York, NY, USA, 2019; pp. 2111–2115.
- Li, Z. Research on dynamic allocation of wireless resources optimized by reinforcement learning. In Proceedings of the 2024 International Symposium on Artificial Intelligence for Education, Xi’an, China, 6–8 September 2024; pp. 579–582.
- Sinha, S. Number of Connected IoT Devices Growing 14% to 21.1 Billion; IoT Analytics: Hamburg, Germany, 2025.
- Jamshed, M.A. (Ed.) Artificial Intelligence for Integrated Terrestrial and Non-Terrestrial Networks; Synthesis Lectures on Communications; Springer: Cham, Switzerland, 2026.
- Jamshed, M.A.; Haq, B.; Mohsin, M.A.; Nauman, A.; Yanikomeroglu, H. Artificial Intelligence, Ambient Backscatter Communication and Non-Terrestrial Networks: A 6G Commixture. IEEE Internet Things Mag. 2025, 8, 88–94.
- Plastras, S.; Tsoumatidis, D.; Skoutas, D.N.; Rouskas, A.; Kormentzas, G.; Skianis, C. Non-terrestrial networks for energy-efficient connectivity of remote IoT devices in the 6G era: A survey. Sensors 2024, 24, 1227.
- Worka, C.E.; Khan, F.A.; Ahmed, Q.Z.; Sureephong, P.; Alade, T. Reconfigurable intelligent surface (RIS)-assisted non-terrestrial network (NTN)-based 6G communications: A contemporary survey. Sensors 2024, 24, 6958.
- Yang, D.; Wu, J.; He, Y. Optimizing the agricultural internet of things (IoT) with edge computing and low-altitude platform stations. Sensors 2024, 24, 7094.
- Luong, N.C.; Hoang, D.T.; Gong, S.; Niyato, D.; Wang, P.; Liang, Y.C.; Kim, D.I. Applications of deep reinforcement learning in communications and networking: A survey. IEEE Commun. Surv. Tutor. 2019, 21, 3133–3174.
- Takabatake, W.; Shibata, Y.; Hoshino, K. Neural-network-based dynamic area optimization algorithm for high-altitude platform station. In Proceedings of the 2023 IEEE 98th Vehicular Technology Conference (VTC2023-Fall); IEEE: New York, NY, USA, 2023; pp. 1–6.
- Cao, Y.; Sun, Y.; Zhu, Y.; An, K.; Lin, Z. Game-theoretic clustering and scalable beamforming for multi-RIS-assisted cohesive satellite anti-jamming systems. Sci. China Inf. Sci. 2025, 68, 190304.
- Mahmoud, K.R.; Montaser, A.M. Synthesize multiple V/H directional beams for high altitude platform station based on deep-learning algorithm. Sci. Rep. 2025, 15, 10846.
- Niknam, S.; Dhillon, H.S.; Reed, J.H. Federated learning for wireless communications: Motivation, opportunities, and challenges. IEEE Commun. Mag. 2020, 58, 46–51.
- Mao, Q.; Hu, F.; Hao, Q. Deep learning for intelligent wireless networks: A comprehensive survey. IEEE Commun. Surv. Tutor. 2018, 20, 2595–2621.
- Al Janaby, A.; Al-Rizzo, H.; Qassim, Y. Enhancing Spectral Efficiency of 6G Downlink Beamforming via Cooperative Multi-Agent Deep Reinforcement Learning. Sensors 2026, 26, 950.
- Nauman, A.; Jamshed, M.A.; Ali, R.; Cengiz, K.; Zulqarnain; Kim, S.W. Reinforcement learning-enabled intelligent device-to-device (I-D2D) communication in narrowband Internet of Things (NB-IoT). Comput. Commun. 2021, 176, 13–22.
- Bracciale, L.; Loreti, P. Lyapunov Drift-Plus-Penalty Optimization for Queues With Finite Capacity. IEEE Commun. Lett. 2020, 24, 2555–2558.
- Jameel, F.; Jamshed, M.A.; Chang, Z.; Jäntti, R.; Pervaiz, H. Low latency ambient backscatter communications with deep Q-learning for beyond 5G applications. In Proceedings of the 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring); IEEE: New York, NY, USA, 2020; pp. 1–6.
- Jamshed, M.A.; Khan, W.U.; Pervaiz, H.; Imran, M.A.; Ur-Rehman, M. Emission-aware resource optimization framework for backscatter-enabled uplink NOMA networks. In Proceedings of the 2022 IEEE 95th Vehicular Technology Conference (VTC2022-Spring); IEEE: New York, NY, USA, 2022; pp. 1–5.
- Hassouna, S.; Jamshed, M.A.; Ur-Rehman, M.; Imran, M.A.; Abbasi, Q.H. Configuring reconfigurable intelligent surfaces using a practical codebook approach. Sci. Rep. 2023, 13, 11869.
- Meng, R.; Shah, A.A.; Jamshed, M.A.; Pezaros, D. Federated learning-based intrusion detection framework for internet of things and edge computing backed critical infrastructure. In Proceedings of the 2024 IEEE International Conference on Communications Workshops (ICC Workshops); IEEE: New York, NY, USA, 2024; pp. 810–815.
- Bilal, A.; Mohsin, M.A.; Umer, M.; Bangash, M.A.K.; Jamshed, M.A. Meta-thinking in LLMs via multi-agent reinforcement learning: A survey. arXiv 2025, arXiv:2504.14520.

| Metric | Single Device | 100-Device Extension |
|---|---|---|
| Q-table entries | 400 per device | |
| Storage | 1.6 KB | 160 KB total |
| Inference/slot | ops | per device |
| Training updates | Offline; loaded as static table | |

| Parameter | Symbol | Value |
|---|---|---|
| Time steps per episode | T | 100 |
| Training episodes | | 500 |
| Max battery level | | 100 (energy units) |
| Max queue length | | 10 packets |
| Battery state levels | | 10 |
| Queue state levels | | 10 |
| Power action space | | W |
| Noise power | | W |
| Energy scaling | | 10 |
| Arrival probability | | 0.3 |
| Learning rate | | 0.1 |
| Discount factor | | 0.95 |
| - | | |
| Reward weights | | |
| Regularization | | |
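The traffic parameters in the table (arrival probability 0.3, maximum queue length 10 packets) can be exercised with a small queue simulation. This is a sketch under stated assumptions: at most one packet arrives per slot, service happens before arrival within a slot, and `served_per_slot` is a hypothetical fixed service rate rather than the channel-dependent rate used in the paper.

```python
import random


def simulate_queue(steps, arrival_prob, max_queue, served_per_slot, seed=0):
    """Simulate a finite queue with Bernoulli arrivals; return (final length, drops)."""
    rng = random.Random(seed)  # seeded for reproducibility
    queue, dropped = 0, 0
    for _ in range(steps):
        # Serve first (assumed ordering), never going below an empty queue.
        queue = max(queue - served_per_slot, 0)
        # At most one arrival per slot with probability arrival_prob.
        if rng.random() < arrival_prob:
            if queue < max_queue:
                queue += 1
            else:
                dropped += 1  # buffer overflow counts as a packet drop
    return queue, dropped


# Table values: 100 time steps per episode, arrival probability 0.3, buffer of 10.
final_len, drops = simulate_queue(100, 0.3, 10, served_per_slot=0)
```

Setting `served_per_slot=0` models a device that never transmits, which isolates how quickly the buffer saturates and drops begin at the configured arrival rate.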
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Nauman, A.; Kim, S.W. An Intelligent Energy-Aware Framework for 6G-Enabled Non-Terrestrial IoT via Reinforcement Learning. Sensors 2026, 26, 4057. https://doi.org/10.3390/s26134057
