Article

Lightweight Reinforcement Learning for Priority-Aware Spectrum Management in Vehicular IoT Networks

1 School of Computer Science and Engineering, Yeungnam University, Gyeongsan-si 38541, Republic of Korea
2 Department of Electrical Engineering, Yeungnam University, Gyeongsan-si 38541, Republic of Korea
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Sensors 2025, 25(21), 6777; https://doi.org/10.3390/s25216777
Submission received: 19 September 2025 / Revised: 31 October 2025 / Accepted: 2 November 2025 / Published: 5 November 2025
(This article belongs to the Section Internet of Things)

Abstract

The Vehicular Internet of Things (V-IoT) has emerged as a cornerstone of next-generation intelligent transportation systems (ITSs), enabling applications ranging from safety-critical collision avoidance and cooperative awareness to infotainment and fleet management. These heterogeneous services impose stringent quality-of-service (QoS) demands for latency, reliability, and fairness while competing for limited and dynamically varying spectrum resources. Conventional schedulers, such as round-robin or static priority queues, lack adaptability, whereas deep reinforcement learning (DRL) solutions, though powerful, remain computationally intensive and unsuitable for real-time roadside unit (RSU) deployment. This paper proposes a lightweight and interpretable reinforcement learning (RL)-based spectrum management framework for V-IoT networks. Two enhanced Q-Learning variants are introduced: a Value-Prioritized Action Double Q-Learning with Constraints (VPADQ-C) algorithm that enforces reliability and blocking constraints through a Constrained Markov Decision Process (CMDP) with online primal–dual optimization, and a contextual Q-Learning with Upper Confidence Bound (Q-UCB) method that integrates uncertainty-aware exploration and a Success-Rate Prior (SRP) to accelerate convergence. A Risk-Aware Heuristic baseline is also designed as a transparent, low-complexity benchmark to illustrate the interpretability–performance trade-off between rule-based and learning-driven approaches. A comprehensive simulation framework incorporating heterogeneous traffic classes, physical-layer fading, and energy-consumption dynamics is developed to evaluate throughput, delay, blocking probability, fairness, and energy efficiency. The results demonstrate that the proposed methods consistently outperform conventional Q-Learning and Double Q-Learning baselines.
VPADQ-C achieves the highest energy efficiency (≈8.425 × 10⁷ bits/J) and reduces interruption probability by over 60%, while Q-UCB achieves the fastest convergence (within ≈190 episodes), the lowest blocking probability (≈0.0135), and the lowest mean delay (≈0.351 ms). Both schemes maintain fairness near 0.364, preserve throughput around 28 Mbps, and exhibit sublinear training-time scaling with O(1) per-update complexity and O(N²) overall runtime growth. Scalability analysis confirms that the proposed frameworks sustain URLLC-grade latency (<0.2 ms) and reliability under dense vehicular loads, validating their suitability for real-time, large-scale V-IoT deployments.
Keywords: V-IoT; QoS; reinforcement learning; Markov Decision Process; 5G; IoT; priority-aware spectrum management; spectrum access; resource allocation
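The abstract combines two standard tabular techniques: Double Q-Learning (two value tables, one selecting the greedy action and the other evaluating it, to reduce maximization bias) and Upper Confidence Bound (UCB) exploration (an uncertainty bonus that shrinks with visit counts). The following is a minimal, hypothetical sketch of that combination under generic assumptions; the class name `DoubleQUCB` and the parameters `alpha`, `gamma`, and `c` are illustrative choices, not taken from the paper, and the paper's SRP prior and CMDP constraint handling are omitted.

```python
import numpy as np

class DoubleQUCB:
    """Sketch: tabular Double Q-Learning with a UCB exploration bonus."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, c=1.0):
        self.QA = np.zeros((n_states, n_actions))  # first value table
        self.QB = np.zeros((n_states, n_actions))  # second value table
        self.counts = np.ones((n_states, n_actions))  # visit counts (start at 1 to avoid div-by-zero)
        self.t = 1  # global time step
        self.alpha, self.gamma, self.c = alpha, gamma, c

    def select_action(self, s):
        # UCB rule: mean value estimate plus a bonus that decays as the
        # (state, action) pair is visited more often.
        q_mean = 0.5 * (self.QA[s] + self.QB[s])
        bonus = self.c * np.sqrt(np.log(self.t) / self.counts[s])
        return int(np.argmax(q_mean + bonus))

    def update(self, s, a, r, s_next):
        self.counts[s, a] += 1
        self.t += 1
        # Double Q-Learning: choose the argmax with one table,
        # evaluate that action with the other table.
        if np.random.rand() < 0.5:
            a_star = int(np.argmax(self.QA[s_next]))
            target = r + self.gamma * self.QB[s_next, a_star]
            self.QA[s, a] += self.alpha * (target - self.QA[s, a])
        else:
            a_star = int(np.argmax(self.QB[s_next]))
            target = r + self.gamma * self.QA[s_next, a_star]
            self.QB[s, a] += self.alpha * (target - self.QB[s, a])
```

Because both the action selection and each update touch only one table row, the per-update cost is constant in the number of states, which is consistent with the O(1) per-update complexity claimed in the abstract.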

Share and Cite

MDPI and ACS Style

Iqbal, A.; Nauman, A.; Khurshaid, T. Lightweight Reinforcement Learning for Priority-Aware Spectrum Management in Vehicular IoT Networks. Sensors 2025, 25, 6777. https://doi.org/10.3390/s25216777


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
