
Search Results (242)

Search Parameters:
Keywords = double deep Q-network

19 pages, 2115 KB  
Article
Graph-Density-Aware Joint Energy-Latency Optimization in Multi-UAV IoT Networks Using Dueling Deep Q-Network
by Mohammad Ahmed Alnakhli
Drones 2026, 10(4), 275; https://doi.org/10.3390/drones10040275 - 10 Apr 2026
Viewed by 314
Abstract
Multi-UAV communication networks face significant challenges in achieving high energy efficiency and low communication latency under dynamic topology and interference conditions. This paper proposes a Dueling Deep Q-Network (DQN) framework for joint resource optimization in 6G-enabled multi-UAV systems. The proposed approach jointly optimizes transmit power allocation, inter-UAV link association, and adaptive graph density within a unified reinforcement learning framework. By employing a dueling value–advantage decomposition, the proposed model improves learning stability and convergence compared to conventional DQN and Double DQN (DDQN) schemes. Simulation results under varying network densities and UAV scales show that the proposed Dueling DQN achieves up to 15% higher energy efficiency and 12% lower end-to-end latency, while maintaining robust performance in dense connectivity scenarios. These results demonstrate the effectiveness and scalability of the proposed framework for energy- and latency-sensitive UAV communication applications. Full article
(This article belongs to the Section Drone Communications)
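The dueling value–advantage decomposition mentioned in the abstract can be sketched in a few lines. This is the standard mean-subtracted aggregation, shown with NumPy and hypothetical head outputs rather than the authors' network:

```python
import numpy as np

def dueling_q(value, advantages):
    """Combine a scalar state value V(s) and per-action advantages A(s, a)
    into Q-values via the mean-subtracted aggregation
    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()

# Hypothetical head outputs for a 3-action agent:
q = dueling_q(2.0, [1.0, 0.0, -1.0])
# Mean advantage is 0, so Q = [3.0, 2.0, 1.0]; the greedy action is 0.
```

Subtracting the mean advantage makes the decomposition identifiable (V and A cannot trade off freely), which is what improves training stability relative to a plain Q head.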

32 pages, 2316 KB  
Article
Energy-Efficient and Maintenance-Aware Control of a Residential Split-Type Air Conditioner Using an Enhanced Deep Q-Network
by Natdanai Kiewwath, Pattaraporn Khuwuthyakorn and Orawit Thinnukool
Sustainability 2026, 18(7), 3578; https://doi.org/10.3390/su18073578 - 6 Apr 2026
Viewed by 299
Abstract
Residential air conditioning systems are a major contributor to household electricity consumption in tropical regions, where environmental factors such as climate variability and particulate pollution (PM10) can further increase cooling demand and accelerate equipment degradation. This study proposes an Enhanced Deep Q-Network (Enhanced DQN) for energy-efficient and maintenance-aware control of residential split-type air conditioners under dynamic environmental conditions. The proposed method integrates several stability-oriented reinforcement learning mechanisms, including Double Q-learning, a dueling architecture, prioritized experience replay, multi-step returns, Bayesian-style regularization via Monte Carlo dropout, and entropy-aware exploration. The framework is evaluated through a two-stage process consisting of a diagnostic benchmark on LunarLander-v3 to assess learning stability, followed by a realistic 365-day simulation driven by Thai weather and PM10 data. Compared with a fixed 25 °C baseline, the proposed controller reduced annual electricity consumption from 5116.22 kWh to as low as 4440.03 kWh, corresponding to a saving of 13.22%. The learned policy also exhibited environmentally adaptive behavior under high PM10 conditions, indicating maintenance-aware characteristics. These findings demonstrate that reinforcement learning can provide robust, adaptive, and sustainable control strategies for residential cooling systems in tropical environments. Full article
(This article belongs to the Special Issue AI in Smart Cities and Urban Mobility)
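Among the stability mechanisms the abstract lists, multi-step returns are the easiest to illustrate. A minimal sketch of the standard n-step target, with hypothetical rewards and discount rather than values from the paper:

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99, done=False):
    """n-step return: G = sum_k gamma^k * r_k + gamma^n * V(s_n),
    dropping the bootstrap term if the episode terminated within the window."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    if not done:
        g += (gamma ** len(rewards)) * bootstrap_value
    return g

# 3-step return with gamma = 0.5 and a bootstrap value of 8:
# 1 + 0.5*2 + 0.25*4 + 0.125*8 = 4.0
target = n_step_target([1.0, 2.0, 4.0], 8.0, gamma=0.5)
```

Longer windows propagate reward information faster at the cost of higher variance, which is why n-step returns are usually paired with the other stabilizers named above.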

45 pages, 7679 KB  
Article
Conquering the Urban Firefighting Challenge: A Deep Q-Network Approach for Autonomous UAV Navigation
by Shafiqul Alam Khan, Damian Valles, Marcelo M. Carvalho and Wenquan Dong
Inventions 2026, 11(2), 35; https://doi.org/10.3390/inventions11020035 - 2 Apr 2026
Viewed by 405
Abstract
Firefighters must locate victims reliably to carry out rescue operations within burning structures during urban firefighting events. Low visibility, reduced oxygen levels, weakened structural rigidity, and dense smoke make it difficult to locate victims. In addition to these challenges, victims may be unconscious and unable to report their locations to firefighters. This research work explores the Double Deep Q-Network (Double DQN), Dueling Deep Q-Network (Dueling DQN), and Dueling Double Deep Q-Network (D3QN) agents for an unmanned aerial vehicle (UAV) to navigate around a structure and locate trapped victims within it. The UAV’s position, Light Detection and Ranging (LiDAR), and infrared camera data are utilized as inputs for the Deep Q-Networks. Prioritized Experience Replay (PER) is used to store transitions and sample them according to priority for training. Python’s Pygame library is used in this research to create a simulated environment in which infrared camera and LiDAR data are simulated. The performance of the UAV agent is evaluated using cumulative maximum reward, reward distribution histogram, Temporal Difference (TD) error over time, and number of successful episodes. Among the three DQN UAV agents, the Dueling DQN and Double DQN have potential for real-world applications in firefighting. Full article
(This article belongs to the Special Issue Unmanned Aerial Vehicles (UAVs): Innovations and Applications)
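The proportional prioritization behind PER can be sketched as follows. The priorities, exponent, and buffer contents here are hypothetical, and a production implementation would normally use a sum-tree rather than renormalizing the whole buffer on every draw:

```python
import numpy as np

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: p_i = (|delta_i| + eps)^alpha,
    normalized into a sampling distribution over the replay buffer."""
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()

def sample_batch(td_errors, batch_size, rng):
    """Sample transition indices with probability proportional to priority."""
    probs = per_probabilities(td_errors)
    return rng.choice(len(probs), size=batch_size, p=probs)

# Transitions with larger TD error are replayed more often:
rng = np.random.default_rng(0)
idx = sample_batch([0.1, 2.0, 0.05, 1.5], batch_size=2, rng=rng)
```

The small `eps` keeps zero-error transitions sampleable; in the full algorithm, importance-sampling weights correct the bias this non-uniform sampling introduces.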

31 pages, 1687 KB  
Article
A Hybrid Planning–Learning Framework for Autonomous Navigation with Dynamic Obstacles
by Hatice Arslan Öztürk, Sırma Yavuz and Çetin Kaya Koç
Appl. Sci. 2026, 16(6), 2961; https://doi.org/10.3390/app16062961 - 19 Mar 2026
Viewed by 363
Abstract
Traditional navigation methods work well in known, static environments but degrade in real-world settings with dynamic and unpredictable obstacles. This paper presents Double Deep Q-Network with A* guidance (DDQNA), a hybrid navigation algorithm that enables an agent to traverse mazes containing static and dynamic obstacles while maintaining a low probability of collision. DDQNA combines A* guidance with Double Deep Q-Network (DDQN) learning using an ϵ-greedy policy, and it introduces a redesigned reward function and an improved action-selection mechanism to better exploit A*’s directional cues during training. We evaluate DDQNA in a custom Pygame simulation across 11 environments of increasing difficulty. Experimental results show that DDQNA consistently outperforms the standard DDQN and other state-of-the-art reinforcement learning baselines, achieving higher goal-reaching rates, fewer visited cells, shorter computation times, and higher cumulative rewards. These results indicate that DDQNA provides both effective navigation and computational efficiency in complex environments with static and dynamic obstacles. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
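The Double DQN update that DDQNA builds on decouples action selection (online network) from action evaluation (target network) to curb the overestimation bias of vanilla DQN. A minimal sketch of the standard target, with hypothetical Q-values rather than the authors' code:

```python
import numpy as np

def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Double DQN target: the online network picks the next action,
    the target network evaluates it:
    y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return reward
    a_star = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a_star]

# Hypothetical next-state Q-values for a 3-action agent:
y = double_dqn_target(1.0, q_online_next=[0.2, 0.9, 0.1],
                      q_target_next=[0.5, 0.4, 0.3], gamma=0.5)
# Online argmax is action 1; the target network evaluates it: 1.0 + 0.5 * 0.4 = 1.2
```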

25 pages, 2297 KB  
Article
A Multi-Agent Advisory Board Reinforcement Learning Framework for Adaptive Cooperative Control
by Onur Osman, Tolga Kudret Karaca, Bahar Yalcin Kavus, Gokalp Tulum and Sajjad Nematzadeh
Algorithms 2026, 19(3), 230; https://doi.org/10.3390/a19030230 - 18 Mar 2026
Viewed by 284
Abstract
This study proposes Advisory Board Reinforcement Learning (AdvB-RL), a cooperative reinforcement-learning framework that integrates multiple advisory neural networks to guide policy optimization. Unlike conventional single-agent architectures, AdvB-RL maintains a set of independently trained advisory networks that contribute to action selection through a dynamic aggregation mechanism. This design preserves diverse experiential knowledge while improving learning stability and the exploration–exploitation balance. The framework is evaluated on three benchmark control tasks, namely LunarLander-v2, CartPole-v1, and MountainCar-v0, using advisory board sizes of 1, 5, and 10 members against a Double Deep Q-Network (DDQN) baseline. The best-performing configuration, 10 AdvB, achieved 270.02 ± 24.74 on LunarLander-v2 versus 227.92 ± 86.02 for DDQN, 497.79 ± 5.18 on CartPole-v1 versus 304.37 ± 144.04, and −103.16 ± 15.46 on MountainCar-v0 versus −130.71 ± 31.64, indicating higher returns together with markedly lower variability. Across the three environments, these results show that increasing the number of advisory members improves both reward consistency and overall robustness, with the 10-member setting providing the strongest performance. Within the tested configurations, the advisory board mechanism remains computationally feasible, while preliminary experiments beyond 10 advisors show diminishing returns relative to added complexity. Overall, AdvB-RL provides a robust and modular alternative to single-policy reinforcement learning for adaptive cooperative control. Full article
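Aggregating advisory opinions can be illustrated with a weighted average of per-member Q-values followed by a greedy choice. The paper's aggregation mechanism is dynamic, so this fixed-weight mean is only an illustrative stand-in with made-up numbers:

```python
import numpy as np

def board_action(member_q_values, weights=None):
    """Combine each advisor's Q-value vector by a weighted average,
    then act greedily on the combined estimate. Uniform weights are
    used when none are given; the paper's weights adapt over time."""
    q = np.asarray(member_q_values, dtype=float)   # shape: (members, actions)
    if weights is None:
        weights = np.full(q.shape[0], 1.0 / q.shape[0])
    combined = weights @ q
    return int(np.argmax(combined)), combined

# Three hypothetical advisors disagree; the average resolves the conflict:
action, combined = board_action([[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]])
```

Averaging independently trained value estimates reduces the variance of any single network's errors, which is consistent with the lower result variability the abstract reports.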

36 pages, 4478 KB  
Article
CBAM-BiLSTM-DDQN: A Novel Adaptive Quantitative Trading Model for Financial Data Analysis
by Yan Zhang, Mingxuan Zhou, Feng Sun and Yuehua Wu
Axioms 2026, 15(3), 222; https://doi.org/10.3390/axioms15030222 - 16 Mar 2026
Viewed by 691
Abstract
Financial data analysis remains a significant challenge due to the inherent stochasticity, non-stationarity, and low signal-to-noise ratio of market data. Conventional methods often struggle to disentangle intrinsic trends from noise and frequently overlook the critical influence of investor sentiment on price dynamics. To address these issues, we propose an adaptive trading model named CBAM-BiLSTM-DDQN, which integrates signal decomposition, multi-source feature fusion, and deep reinforcement learning. First, we construct a comprehensive heterogeneous feature set by combining price signals decomposed via Variational Mode Decomposition (VMD) and investor sentiment indices extracted from financial texts. Subsequently, a Genetic Algorithm (GA) is employed to identify the most significant feature subset, effectively reducing dimensionality and redundancy. Finally, these optimized features are input into a Double Deep Q-Network (DDQN) agent equipped with a Convolutional Block Attention Module (CBAM) and a Bidirectional Long Short-Term Memory (BiLSTM) network to capture complex spatiotemporal dependencies. We evaluated this approach through simulated trading on three major Chinese stock indices—the Shanghai Stock Exchange Composite (SSEC), the Shenzhen Stock Exchange Component (SZSE), and the China Securities 300 (CSI 300). Experimental results demonstrate the superiority of our method over traditional strategies and standard baselines; specifically, the trading agent achieved robust cumulative returns across the SSEC and CSI 300 indices, confirming the model’s exceptional capability in balancing profitability and risk aversion in complex financial environments. Furthermore, additional experiments on individual stocks in the Chinese A-share market reinforce the robustness and generalization ability of our proposed model, validating its practical potential for diverse trading scenarios. 
Full article
(This article belongs to the Special Issue New Perspectives in Mathematical Statistics, 2nd Edition)

27 pages, 2344 KB  
Article
Cloud-Edge Resource Scheduling and Offloading Optimization Based on Deep Reinforcement Learning
by Lili Yin, Yunze Xie, Ze Zhao and Jie Gao
Sensors 2026, 26(5), 1704; https://doi.org/10.3390/s26051704 - 8 Mar 2026
Viewed by 420
Abstract
In the context of smart manufacturing, with the widespread deployment of Industrial Internet of Things (IoT) devices, a large number of computation tasks that are highly sensitive to latency and have strict deadlines have emerged, requiring real-time processing. Effectively offloading tasks to address the issues of increased latency and task dropouts caused by dynamic changes in edge node load has become a key challenge in the cloud–edge–end collaborative environment of smart manufacturing. To tackle the complex issues of unknown edge node loads and dynamic system state changes, this paper proposes a distributed algorithm based on deep reinforcement learning, utilizing convolutional neural networks (CNN) and the Informer architecture. The proposed algorithm leverages CNN to extract local features of edge node loads while utilizing Informer’s self-attention mechanism to capture long-term load variation trends, thereby effectively handling the uncertainty and dynamics inherent in node loads. Furthermore, by integrating the Dueling Deep Q-Network (DQN) and Double DQN techniques, the algorithm achieves a precise approximation of the state–action value function, further enhancing its capability to perceive system temporal characteristics and adapt to heterogeneous tasks. Each mobile device can independently make task offloading decisions and scheduling strategies based on its observations, enabling dynamic task allocation and optimization of execution order. Simulation results show that, compared to various existing algorithms, the proposed method reduces task dropout rates by 82.3–94% and average latency by 28–39.2%. Experimental results validate the significant advantages of this method in intelligent manufacturing scenarios with high load and latency-sensitive tasks. Full article
(This article belongs to the Section Internet of Things)

14 pages, 1058 KB  
Article
QCNN-Inspired Variational Circuits for Enhanced Noise Robustness in Quantum Deep Q-Learning
by Louyang Yu, Wenbin Yu, Yadang Chen and Chengjun Zhang
Information 2026, 17(3), 250; https://doi.org/10.3390/info17030250 - 3 Mar 2026
Viewed by 339
Abstract
Quantum reinforcement learning (QRL) is often evaluated under idealized, noiseless assumptions, yet realistic quantum devices inevitably introduce noise that can severely degrade performance. This paper improves the robustness of quantum deep Q-learning (QDQN) by redesigning the variational quantum circuit (VQC) used in its value-function approximator. Motivated by recent advances in quantum convolutional neural networks (QCNNs), we construct four QCNN-inspired VQC variants (Models A–D) by combining representative QCNN two-qubit building blocks with an explicit fully connected (all-to-all) layer. Using a 10-fold evaluation protocol at a fixed noise level p = 0.005, Model D achieves the best robustness, reducing the mean number of episodes required to reach a target reward from 1981 (baseline) to 1243. Under a stricter success criterion, Model D also doubles the empirically observed noise-tolerance boundary from 0.002 to 0.004. These results indicate that carefully chosen QCNN-style circuit components and connectivity can significantly improve the noise robustness of QDQN-like QRL agents. Full article

22 pages, 35239 KB  
Article
TBDDQN: Imbalanced Fault Diagnosis for Blast Furnace Ironmaking Process via Transformer–BiLSTM Double Deep Q-Networks
by Jinlong Zheng, Ping Wu, Ruirui Zuo, Xin Su, Yinzhu Liu and Nabin Kandel
Machines 2026, 14(3), 276; https://doi.org/10.3390/machines14030276 - 2 Mar 2026
Viewed by 346
Abstract
The blast furnace ironmaking process (BFIP) is a highly complex and dynamic industrial system where strong spatiotemporal coupling and severe data imbalance pose substantial challenges for fault diagnosis. To address these issues, this study proposes a Transformer–BiLSTM Double Deep Q-Network (TBDDQN) framework for intelligent fault diagnosis. The framework employs a dual-branch architecture that integrates a Transformer-based spatial encoder with a BiLSTM-attention temporal extractor to capture global dependencies and dynamic patterns from multivariate time-series data. To mitigate class imbalance and asymmetric fault costs, a cost-sensitive reinforcement learning scheme based on Double DQN is incorporated, featuring prioritized experience replay and adaptive misclassification penalties. Experiments on real blast furnace datasets show that TBDDQN achieves a macro-averaged precision of 0.970 and a macro-averaged F1-score of 0.929, outperforming conventional CNN, LSTM, and DQN-based baselines. These results demonstrate that TBDDQN offers a robust and interpretable solution for imbalanced industrial fault diagnosis in the BFIP. Full article
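The cost-sensitive idea — penalizing a missed fault more heavily than a false alarm — can be sketched as a reward function for classification framed as RL. The class names and penalty magnitudes below are hypothetical, not taken from the paper:

```python
def diagnosis_reward(predicted, actual, fault_penalty=5.0):
    """Cost-sensitive reward: +1 for a correct label, a mild penalty
    for flagging a normal sample as faulty, and a larger penalty for
    missing a true fault. Penalty values are illustrative only."""
    if predicted == actual:
        return 1.0
    # Misclassifying a fault (rare, costly) hurts more than a false alarm.
    return -fault_penalty if actual != "normal" else -1.0
```

Because the agent maximizes expected reward, the asymmetric penalties steer it toward recall on the rare fault classes, counteracting the class imbalance the abstract describes.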

32 pages, 2534 KB  
Article
A Knowledge-Guided Deep Reinforcement Learning Approach for Energy-Aware Distributed Flexible Job Shop Scheduling with Job Priority
by Zhi-Yong Luo, Jia-Bao Song and Chun-Qiao Ge
Processes 2026, 14(4), 662; https://doi.org/10.3390/pr14040662 - 14 Feb 2026
Viewed by 537
Abstract
Energy-aware distributed manufacturing has become a key focus in modern production systems due to the growing demand for sustainable and efficient operations. This study investigates the energy-aware distributed flexible job shop scheduling problem with job priority, where multiple factories cooperate to process prioritized jobs under energy consumption considerations. Considering job priorities is essential for reflecting the practical importance and urgency of different customer orders, which directly affects scheduling fairness and production responsiveness. The proposed bi-objective model aims to simultaneously minimize total weighted tardiness and total energy consumption, accounting for both processing and idle power. To effectively solve this complex NP-hard problem, a knowledge-guided deep reinforcement learning approach is developed. Domain knowledge is integrated into a double deep Q-network to guide the adaptive selection of local search operators, while a co-evolutionary mechanism maintains global exploration and accelerates convergence. Extensive computational experiments are conducted on 24 benchmark instances, which are categorized into five groups according to factory scale, with the maximum problem size reaching 160 jobs × 6 machines × 5 factories, together with a real-world case study. Compared with four state-of-the-art multi-objective baseline algorithms (NSGA-II, MOPSO, MOEA/D, and SPEA2), the proposed D2QN-COEA demonstrates substantial performance advantages. On average, it achieves an HV improvement of 23.1% compared with the best-performing baseline on each instance, while GD and IGD are reduced by 70.8% and 63.7%, respectively. When averaged across all four baseline algorithms, D2QN-COEA yields improvements of 203.4% in HV, 83.9% in GD, 79.9% in IGD, and 70.8% in Spacing, confirming its superior convergence accuracy and solution diversity. 
The results confirm that embedding domain knowledge into deep reinforcement learning enhances optimization robustness and provides an intelligent solution for energy-efficient distributed scheduling in modern manufacturing systems. Full article
(This article belongs to the Section AI-Enabled Process Engineering)

16 pages, 3489 KB  
Article
A Deployment Strategy for Reconfigurable Intelligent Surfaces with Joint Phase and Position Optimization
by Guangsong Yang, Hongbo Huang, Chuwei Sun, Yiliang Wu, Xinjie Xu and Shan Huang
Electronics 2026, 15(3), 718; https://doi.org/10.3390/electronics15030718 - 6 Feb 2026
Cited by 1 | Viewed by 389
Abstract
The actual implementation of fifth-generation (5G) and beyond networks faces persistent challenges, including environmental interference and limited coverage, which compromise transmission stability and network feasibility. Reconfigurable Intelligent Surfaces (RISs) have emerged as a promising technology to dynamically reconfigure wireless propagation environments and enhance communication quality. To fully unlock the potential of RIS, this paper proposes a novel deployment strategy based on Double Deep Q-Networks (DDQNs) that jointly optimizes the RIS placement and phase shift configuration to maximize the system sum-rate. Specifically, the coverage area is discretized into a grid, and at each candidate location, a DDQN-based method is developed to solve the corresponding non-convex phase optimization problem. Simulation results reveal that our proposed strategy significantly surpasses conventional benchmark schemes, resulting in a sum-rate improvement of up to 38.41%. The study provides a practical and efficient pre-deployment framework for RIS-enhanced wireless networks. Full article

8 pages, 1055 KB  
Proceeding Paper
Subchannel Allocation in Massive Multiple-Input Multiple-Output Orthogonal Frequency-Division Multiple Access and Hybrid Beamforming Systems with Deep Reinforcement Learning
by Jih-Wei Lee and Yung-Fang Chen
Eng. Proc. 2025, 120(1), 55; https://doi.org/10.3390/engproc2025120055 - 6 Feb 2026
Viewed by 295
Abstract
In this study, we emphasize that the maximum sum rate can be achieved through AI-based subchannel allocation, while taking into account all users’ quality of service (QoS) requirements in data rates for hybrid beamforming systems. We assume a limited number of radio frequency (RF) chains in practical hybrid beamforming architectures. This constraint makes subchannel allocation a critical aspect of hybrid beamforming in massive multiple-input multiple-output (MIMO) systems with orthogonal frequency division multiple access (MIMO-OFDMA), as it enables the system to serve more users within a single time slot. Unlike conventional subcarrier allocation methods, we employ a deep reinforcement learning (DRL)-based algorithm to address real-time decision-making challenges. Specifically, we propose a dueling double deep Q-network (Dueling-DDQN) to implement dynamic subchannel allocation. Simulation results demonstrate that the performance of the proposed algorithm gradually approaches that of the greedy method. Furthermore, both the average sum rate and the average spectral efficiency per user improve with a reasonable variation in outage probability. Full article
(This article belongs to the Proceedings of 8th International Conference on Knowledge Innovation and Invention)

26 pages, 5704 KB  
Article
Intent-Aware Collision Avoidance for UAVs in High-Density Non-Cooperative Environments Using Deep Reinforcement Learning
by Xuchuan Liu, Yuan Zheng, Chenglong Li, Bo Jiang and Wenyong Gu
Aerospace 2026, 13(2), 111; https://doi.org/10.3390/aerospace13020111 - 23 Jan 2026
Viewed by 527
Abstract
Collision avoidance between unmanned aerial vehicles (UAVs) and non-cooperative targets (e.g., off-nominal operations or birds) presents significant challenges in urban air mobility (UAM). This difficulty arises due to the highly dynamic and unpredictable flight intentions of these targets. Traditional collision-avoidance methods primarily focus on cooperative targets or non-cooperative ones with fixed behavior, rendering them ineffective when dealing with highly unpredictable flight patterns. To address this, we introduce a deep reinforcement learning-based collision-avoidance approach leveraging global and local intent prediction. Specifically, we propose a Global and Local Perception Prediction Module (GLPPM) that combines a state-space-based global intent association mechanism with a local feature extraction module, enabling accurate prediction of short- and long-term flight intents. Additionally, we propose a Fusion Sector Flight Control Module (FSFCM) that is trained with a Dueling Double Deep Q-Network (D3QN). The module integrates both predicted future and current intents into the state space and employs a specifically designed reward function, thereby ensuring safe UAV operations. Experimental results demonstrate that the proposed method significantly improves mission success rates in high-density environments, with up to 80 non-cooperative targets per square kilometer. In 1000 flight tests, the mission success rate is 15.2 percentage points higher than that of the baseline D3QN. Furthermore, the approach retains an 88.1% success rate even under extreme target densities of 120 targets per square kilometer. Finally, interpretability analysis via Deep SHAP further verifies the decision-making rationality of the algorithm. Full article
(This article belongs to the Section Aeronautics)

25 pages, 4648 KB  
Systematic Review
Deep Reinforcement Learning Algorithms for Intrusion Detection: A Bibliometric Analysis and Systematic Review
by Lekhetho Joseph Mpoporo, Pius Adewale Owolawi and Chunling Tu
Appl. Sci. 2026, 16(2), 1048; https://doi.org/10.3390/app16021048 - 20 Jan 2026
Viewed by 971
Abstract
Intrusion detection systems (IDSs) are crucial for safeguarding modern digital infrastructure against the ever-evolving cyber threats. As cyberattacks become increasingly complex, traditional machine learning (ML) algorithms, while remaining effective in classifying known threats, face limitations such as static learning, dependency on labeled data, and susceptibility to adversarial exploits. Deep reinforcement learning (DRL) has recently surfaced as a viable substitute, providing resilience in unanticipated circumstances, dynamic adaptation, and continuous learning. This study conducts a thorough bibliometric analysis and systematic literature review (SLR) of DRL-based intrusion detection systems (DRL-based IDS). The relevant literature from 2020 to 2024 was identified and investigated using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework. Emerging research themes, influential works, and structural relationships in the research fields were identified using a bibliometric analysis. SLR was used to synthesize methodological techniques, datasets, and performance analysis. The results indicate that DRL algorithms such as deep Q-network (DQN), double DQNs (DDQN), dueling DQN (D3QN), policy gradient methods, and actor–critic models have been actively utilized for enhancing IDS performance in various applications and datasets. The results highlight the increasing significance of DRL-based solutions for developing intelligent and robust intrusion detection systems and advancing cybersecurity. Full article
(This article belongs to the Special Issue Advances in Cyber Security)

14 pages, 2906 KB  
Proceeding Paper
Onboard Deep Reinforcement Learning: Deployment and Testing for CubeSat Attitude Control
by Sajjad Zahedi, Jafar Roshanian, Mehran Mirshams and Krasin Georgiev
Eng. Proc. 2026, 121(1), 26; https://doi.org/10.3390/engproc2025121026 - 20 Jan 2026
Viewed by 481
Abstract
Recent progress in Reinforcement Learning (RL), especially deep RL, has created new possibilities for autonomous control in complex and uncertain environments. This study explores these possibilities through a practical approach, implementing an RL agent on a custom-built CubeSat. The CubeSat, equipped with a reaction wheel for active attitude control, serves as a physical testbed for validating RL-based strategies. To mimic space-like conditions, the CubeSat was placed on a custom air-bearing platform that allows near-frictionless rotation along a single axis, simulating microgravity. Unlike simulation-only research, this work showcases real-time hardware-level implementation of a Double Deep Q-Network (DDQN) controller. The DDQN agent receives real system state data and outputs control commands to orient the CubeSat via its reaction wheel. For comparison, a traditional PID controller was also tested under identical conditions. Both controllers were evaluated based on response time, accuracy, and resilience to disturbances. The DDQN outperformed the PID, showing better adaptability and control. This research demonstrates the successful integration of RL into real aerospace hardware, bridging the gap between theoretical algorithms and practical space applications through a hands-on CubeSat platform. Full article
