Lightweight Reinforcement Learning for Priority-Aware Spectrum Management in Vehicular IoT Networks

Iqbal, Adeel; Nauman, Ali; Khurshaid, Tahir

doi:10.3390/s25216777

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article


by Adeel Iqbal^1,†, Ali Nauman^1,*,† and Tahir Khurshaid^2,*

¹ School of Computer Science and Engineering, Yeungnam University, Gyeongsan-si 38541, Republic of Korea

² Department of Electrical Engineering, Yeungnam University, Gyeongsan-si 38541, Republic of Korea

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Sensors 2025, 25(21), 6777; https://doi.org/10.3390/s25216777

Submission received: 19 September 2025 / Revised: 31 October 2025 / Accepted: 2 November 2025 / Published: 5 November 2025

(This article belongs to the Section Internet of Things)


Abstract

The Vehicular Internet of Things (V-IoT) has emerged as a cornerstone of next-generation intelligent transportation systems (ITSs), enabling applications ranging from safety-critical collision avoidance and cooperative awareness to infotainment and fleet management. These heterogeneous services impose stringent quality-of-service (QoS) demands for latency, reliability, and fairness while competing for limited and dynamically varying spectrum resources. Conventional schedulers, such as round-robin or static priority queues, lack adaptability, whereas deep reinforcement learning (DRL) solutions, though powerful, remain computationally intensive and unsuitable for real-time roadside unit (RSU) deployment. This paper proposes a lightweight and interpretable reinforcement learning (RL)-based spectrum management framework for V-IoT networks. Two enhanced Q-Learning variants are introduced: a Value-Prioritized Action Double Q-Learning with Constraints (VPADQ-C) algorithm, which enforces reliability and blocking constraints through a Constrained Markov Decision Process (CMDP) with online primal–dual optimization, and a contextual Q-Learning with Upper Confidence Bound (Q-UCB) method, which integrates uncertainty-aware exploration and a Success-Rate Prior (SRP) to accelerate convergence. A Risk-Aware Heuristic baseline is also designed as a transparent, low-complexity benchmark to illustrate the interpretability–performance trade-off between rule-based and learning-driven approaches. A comprehensive simulation framework incorporating heterogeneous traffic classes, physical-layer fading, and energy-consumption dynamics is developed to evaluate throughput, delay, blocking probability, fairness, and energy efficiency. The results demonstrate that the proposed methods consistently outperform conventional Q-Learning and Double Q-Learning. VPADQ-C achieves the highest energy efficiency (≈$8.425 \times 10^{7}$ bits/J) and reduces interruption probability by over 60%, while Q-UCB achieves the fastest convergence (within ≈190 episodes), the lowest blocking probability (≈0.0135), and the lowest mean delay (≈0.351 ms). Both schemes maintain fairness near 0.364, preserve throughput around 28 Mbps, and exhibit sublinear training-time scaling with $O(1)$ per-update complexity and $O(N^{2})$ overall runtime growth. Scalability analysis confirms that the proposed frameworks sustain URLLC-grade latency (<0.2 ms) and reliability under dense vehicular loads, validating their suitability for real-time, large-scale V-IoT deployments.
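The abstract describes VPADQ-C only through its ingredients: Double Q-Learning, a CMDP with reliability and blocking constraints, and online primal–dual optimization. The Python sketch below shows one common way to combine those ingredients; it is not the authors' implementation. It assumes a tabular setting, ε-greedy exploration, per-step constraint costs (for example, 1 when a request is blocked) with hypothetical budgets, and a per-step projected dual-ascent update of each Lagrange multiplier. The "value-prioritized action" component is not modeled, and all class, method, and parameter names are hypothetical.

```python
import random
from collections import defaultdict


class ConstrainedDoubleQ:
    """Tabular Double Q-learning on a Lagrangian-penalized reward (illustrative sketch).

    Hypothetical interface: every step yields a reward plus one cost per
    constraint (e.g. 1.0 if a request was blocked), and each constraint has a
    budget d_k (the tolerated long-run average cost).
    """

    def __init__(self, n_actions, budgets, alpha=0.1, gamma=0.9,
                 epsilon=0.1, lr_lambda=0.01, seed=0):
        self.n_actions = n_actions
        self.budgets = list(budgets)             # constraint thresholds d_k
        self.lambdas = [0.0] * len(budgets)      # dual variables, kept >= 0
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.lr_lambda = lr_lambda               # dual step size
        self.qa = defaultdict(lambda: [0.0] * n_actions)
        self.qb = defaultdict(lambda: [0.0] * n_actions)
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy on the sum of both estimators.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        q = [a + b for a, b in zip(self.qa[state], self.qb[state])]
        return max(range(self.n_actions), key=q.__getitem__)

    def update(self, s, a, reward, costs, s_next, done=False):
        # Primal step: learn from the Lagrangian reward r - sum_k lambda_k * c_k.
        shaped = reward - sum(lam * c for lam, c in zip(self.lambdas, costs))
        # Double Q: a random table is updated; it selects the greedy next
        # action, and the other table evaluates that action.
        if self.rng.random() < 0.5:
            learn, evaluate = self.qa, self.qb
        else:
            learn, evaluate = self.qb, self.qa
        target = shaped
        if not done:
            nxt = learn[s_next]
            best = max(range(self.n_actions), key=nxt.__getitem__)
            target += self.gamma * evaluate[s_next][best]
        learn[s][a] += self.alpha * (target - learn[s][a])
        # Dual step: projected gradient ascent, lambda_k <- max(0, lambda_k + eta * (c_k - d_k)).
        for k, (c, d) in enumerate(zip(costs, self.budgets)):
            self.lambdas[k] = max(0.0, self.lambdas[k] + self.lr_lambda * (c - d))
```

For example, `agent.update(0, 1, reward=1.0, costs=[1.0], s_next=1)` with a budget of 0.05 raises the blocking multiplier, so later blocked outcomes are penalized more heavily. Each update touches one table entry, one multiplier per constraint, and a max over the action set, so its cost is constant for a fixed number of channels and constraints, which is in line with the per-update complexity stated above.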
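Q-UCB is likewise summarized only as contextual Q-Learning with uncertainty-aware (UCB) exploration and a Success-Rate Prior. The sketch below assumes a UCB1-style bonus $c\sqrt{\ln N(s)/N(s,a)}$ and a hypothetical form for the prior: a Laplace-smoothed success rate per (context, action) pair, weighted by `prior_weight` and added to the action score. A context could be, for instance, a (traffic class, channel-quality bucket) tuple. Names and default values are illustrative and not taken from the paper.

```python
import math
from collections import defaultdict


class QUCB:
    """Contextual tabular Q-learning with UCB exploration and a success-rate prior (sketch).

    Assumed prior: a Beta(1,1)-smoothed per-(context, action) success rate,
    weighted by `prior_weight` and added to the Q-value plus UCB bonus.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, c=1.0, prior_weight=0.5):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.c, self.prior_weight = alpha, gamma, c, prior_weight
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.counts = defaultdict(lambda: [0] * n_actions)     # N(s, a)
        self.successes = defaultdict(lambda: [0] * n_actions)  # successful transmissions

    def success_rate(self, ctx, a):
        # Laplace smoothing: an untried action starts at 0.5.
        return (self.successes[ctx][a] + 1) / (self.counts[ctx][a] + 2)

    def act(self, ctx):
        counts = self.counts[ctx]
        for a in range(self.n_actions):
            if counts[a] == 0:
                return a  # try every action once before applying the UCB bonus
        total = sum(counts)  # N(s)

        def score(a):
            bonus = self.c * math.sqrt(math.log(total) / counts[a])
            return self.q[ctx][a] + bonus + self.prior_weight * self.success_rate(ctx, a)

        return max(range(self.n_actions), key=score)

    def update(self, ctx, a, reward, success, next_ctx, done=False):
        self.counts[ctx][a] += 1
        self.successes[ctx][a] += int(success)
        # Standard Q-learning target; no bootstrap at episode end.
        target = reward if done else reward + self.gamma * max(self.q[next_ctx])
        self.q[ctx][a] += self.alpha * (target - self.q[ctx][a])
```

For example, if `update` reports `success=True` only for channel 1 in a fixed context, selections concentrate on that channel because both its Q-value and its success-rate prior rise, while channel 0 is revisited only when its exploration bonus, which grows like $\sqrt{\ln N(s)}$, outweighs the gap.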

Keywords: V-IoT; QoS; reinforcement learning; Markov Decision Process; 5G; IoT; priority-aware spectrum management; spectrum access; resource allocation

Share and Cite

MDPI and ACS Style

Iqbal, A.; Nauman, A.; Khurshaid, T. Lightweight Reinforcement Learning for Priority-Aware Spectrum Management in Vehicular IoT Networks. Sensors 2025, 25, 6777. https://doi.org/10.3390/s25216777

AMA Style

Iqbal A, Nauman A, Khurshaid T. Lightweight Reinforcement Learning for Priority-Aware Spectrum Management in Vehicular IoT Networks. Sensors. 2025; 25(21):6777. https://doi.org/10.3390/s25216777

Chicago/Turabian Style

Iqbal, Adeel, Ali Nauman, and Tahir Khurshaid. 2025. "Lightweight Reinforcement Learning for Priority-Aware Spectrum Management in Vehicular IoT Networks" Sensors 25, no. 21: 6777. https://doi.org/10.3390/s25216777

APA Style

Iqbal, A., Nauman, A., & Khurshaid, T. (2025). Lightweight Reinforcement Learning for Priority-Aware Spectrum Management in Vehicular IoT Networks. Sensors, 25(21), 6777. https://doi.org/10.3390/s25216777

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
