Search Results (227)

Search Parameters:
Keywords = double deep Q-network

17 pages, 2723 KB  
Article
Reinforcement Learning-Based Handover Algorithm for 5G/6G AI-RAN
by Ildar A. Safiullin, Ivan P. Ashaev, Alexey A. Korobkov, Artur K. Gaysin and Adel F. Nadeev
Inventions 2026, 11(1), 8; https://doi.org/10.3390/inventions11010008 - 10 Jan 2026
Viewed by 72
Abstract
The increasing number of Base Stations (BSs) and connected devices, coupled with their mobility, poses significant challenges and makes mobility management even more pressing. Advanced handover (HO) management technologies are therefore required. This paper focuses on the ping-pong HO problem and proposes an algorithm using Reinforcement Learning (RL) based on the Double Deep Q-Network (DDQN). The novelty of the approach lies in assigning specialized RL agents to users according to their mobility patterns, which simplifies the learning process. The effectiveness of the proposed algorithm is demonstrated through tests on the ns-3 platform, chosen for its ability to replicate real-world scenarios, with a baseline handover algorithm based on Events A2 and A4 used for comparison. The results show that the proposed approach reduces the number of HOs by more than four times on average, yielding a more stable data rate and increasing it up to two times in the best case.
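
A note for readers scanning these results: every entry below builds on the same Double DQN target, which differs from vanilla DQN only in which network selects the bootstrap action. As a reference point (standard notation, not taken from this abstract; θ is the online network, θ⁻ the target network):

```latex
% Vanilla DQN target: the target network both selects and evaluates the action.
y_t^{\text{DQN}} = r_t + \gamma \max_a Q(s_{t+1}, a;\, \theta^-)

% Double DQN target: the online network selects, the target network evaluates,
% which curbs the overestimation bias of the max operator.
y_t^{\text{DDQN}} = r_t + \gamma\, Q\big(s_{t+1},\, \operatorname*{arg\,max}_a Q(s_{t+1}, a;\, \theta);\; \theta^-\big)
```
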
38 pages, 4535 KB  
Article
Double Deep Q-Network-Based Solution for the Dynamic Electric Vehicle Routing Problem
by Mehmet Bilge Han Taş, Kemal Özkan, İnci Sarıçiçek and Ahmet Yazıcı
Appl. Sci. 2026, 16(1), 278; https://doi.org/10.3390/app16010278 - 26 Dec 2025
Viewed by 253
Abstract
The Dynamic Electric Vehicle Routing Problem (D-EVRP) presents a framework that requires electric vehicles to meet demand with limited energy capacity. When dynamic demand flows and charging requirements are considered together, traditional methods cannot adapt sufficiently for real-time decision-making, so a learning-based approach was chosen to ensure that decision-making responds quickly to changing conditions. The solution utilizes a model with a Double Deep Q-Network (DDQN) architecture and a discrete valuation structure. Prioritized Experience Replay (PER) was implemented to increase model stability, allowing infrequent but effective experiences to contribute more to the learning process. The state representation is constructed from the vehicle’s location, battery level, load status, and current customer demands. Scalability is ensured by dividing customer locations into clusters using the K-means method, with each cluster handled by an independent representative. The approach was tested with real-world road data obtained from the Meşelik Campus of Osmangazi University in Eskişehir. Experiments conducted under different demand levels and data sizes show that the PER-assisted DDQN structure produces more stable and shorter routes in dynamic scenarios, whereas random selection, the greedy method, and the genetic algorithm suffer significant performance losses as dynamicity increases.
(This article belongs to the Section Computing and Artificial Intelligence)
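
The PER mechanism this abstract credits for stability can be sketched compactly. A minimal proportional-priority buffer, assuming the usual α/β exponents and TD-error-based priorities; the buffer layout and hyperparameter values are illustrative, not the authors' implementation:

```python
import numpy as np

class PERBuffer:
    """Sketch of proportional Prioritized Experience Replay."""
    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:        # drop oldest when full
            self.data.pop(0); self.priorities.pop(0)
        self.data.append(transition)
        # Rare but high-error transitions get proportionally more replay.
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.priorities); p = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        # Importance-sampling weights correct the non-uniform sampling bias.
        w = (len(self.data) * p[idx]) ** (-beta)
        w = w / w.max()                            # normalize for stability
        return [self.data[i] for i in idx], idx, w
```
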

22 pages, 3688 KB  
Article
An End-to-End Hierarchical Intelligent Inference Model for Collaborative Operation of Grid Switches
by Mingrui Zhao, Tie Chen, Jiaxin Yuan, Yuting Jiang and Junlin Ren
Energies 2025, 18(24), 6574; https://doi.org/10.3390/en18246574 - 16 Dec 2025
Viewed by 243
Abstract
To address the heavy reliance on manual intervention in substation maintenance tasks, this paper proposes an end-to-end hierarchical intelligent inference method for collaborative operation of grid switches. The method constructs a unified knowledge environment that simultaneously describes the operational characteristics of both the power grid and the substation, and combines the Dueling Double Deep Q-Network (D3QN) and Multi-Task Dueling Double Deep Q-Network (MT-D3QN) algorithms in interactive training to achieve hierarchical inference. The upper layer uses bays as base nodes to reflect the power flow, designing a reward and penalty function under N-1 power flow and ring-current impact constraints and optimizing the load transfer plan for outages caused by maintenance tasks. The lower layer uses switches as base nodes to reflect the main wiring status of the substation, introduces a multi-task learning mechanism for parallel training of bays with the same tasks, designs the reward and penalty function according to the five protection rules, and optimizes the switching operations within each bay. The experimental results show that the trained model can quickly deduce the switching operation sequence for different maintenance tasks.
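
The dueling architecture inside the D3QN/MT-D3QN algorithms named here splits the Q-function into state-value and advantage streams. A minimal PyTorch sketch of that head, with placeholder layer sizes rather than the paper's network:

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)          # state value V(s)
        self.adv = nn.Linear(hidden, n_actions)    # advantages A(s, a)

    def forward(self, s):
        h = self.trunk(s)
        v, a = self.value(h), self.adv(h)
        # Mean-subtraction keeps the V/A decomposition identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```
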

25 pages, 821 KB  
Article
Enhancing Microservice Security Through Adaptive Moving Target Defense Policies to Mitigate DDoS Attacks in Cloud-Native Environments
by Yuyang Zhou, Guang Cheng and Kang Du
Future Internet 2025, 17(12), 580; https://doi.org/10.3390/fi17120580 - 16 Dec 2025
Viewed by 268
Abstract
Cloud-native microservice architectures offer scalability and resilience but introduce complex interdependencies and new attack surfaces, making them vulnerable to resource-exhaustion Distributed Denial-of-Service (DDoS) attacks. These attacks propagate along service call chains, closely mimic legitimate traffic, and evade traditional detection and mitigation techniques, resulting in cascading bottlenecks and degraded Quality of Service (QoS). Existing Moving Target Defense (MTD) approaches lack adaptive, cost-aware policy guidance and are often ineffective against spatiotemporally adaptive adversaries. To address these challenges, this paper proposes ScaleShield, an adaptive MTD framework powered by Deep Reinforcement Learning (DRL) that learns coordinated, attack-aware defense policies for microservices. ScaleShield formulates defense as a Markov Decision Process (MDP) over multi-dimensional discrete actions, leveraging a Multi-Dimensional Double Deep Q-Network (MD3QN) to optimize service availability and minimize operational overhead. Experimental results demonstrate that ScaleShield achieves near 100% defense success rates and reduces compromised nodes to zero within approximately 5 steps, significantly outperforming state-of-the-art baselines. It lowers service latency by up to 72% under dynamic attacks while maintaining over 94% resource efficiency, providing robust and cost-effective protection against resource-exhaustion DDoS attacks in cloud-native environments.
(This article belongs to the Special Issue DDoS Attack Detection for Cyber–Physical Systems)
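
The abstract's multi-dimensional discrete action space (MD3QN) is not specified further; one common pattern for such spaces is a shared trunk with one Q-head per action dimension, each maximized independently. This is a sketch of that general pattern only, not the ScaleShield network:

```python
import torch
import torch.nn as nn

class MultiDimQNet(nn.Module):
    """One Q-vector per action dimension over a shared state encoding."""
    def __init__(self, state_dim, dim_sizes, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, n) for n in dim_sizes)

    def forward(self, s):
        h = self.trunk(s)
        return [head(h) for head in self.heads]   # per-dimension Q-values

def greedy_action(net, s):
    # Joint action = independent per-dimension greedy choice.
    return [q.argmax(dim=-1) for q in net(s)]
```
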

14 pages, 2239 KB  
Article
Energy-Efficient Path Planning for Snake Robots Using a Deep Reinforcement Learning-Enhanced A* Algorithm
by Yang Gu, Zelin Wang and Zhong Huang
Biomimetics 2025, 10(12), 826; https://doi.org/10.3390/biomimetics10120826 - 10 Dec 2025
Viewed by 420
Abstract
Snake-like robots, characterized by their high flexibility and multi-joint structure, exhibit exceptional adaptability to complex terrains such as snowfields, jungles, deserts, and underwater environments. Their ability to navigate narrow spaces and circumvent obstacles makes them ideal for operations in confined or rugged environments. However, efficient motion in such conditions requires not only mechanical flexibility but also effective path planning to ensure safety, energy efficiency, and overall task performance. Most existing path planning algorithms for snake-like robots focus primarily on finding the shortest path between the start and target positions while neglecting the optimization of energy consumption during real operations. To address this limitation, this study proposes an energy-efficient path planning method based on an improved A* algorithm enhanced with deep reinforcement learning: Dueling Double-Deep Q-Network (D3QN). An Energy Consumption Estimation Model (ECEM) is first developed to evaluate the energetic cost of snake robot motion in three-dimensional space. This model is then integrated into a new heuristic function to guide the A* search toward energy-optimal trajectories. Simulation experiments were conducted in a 3D environment to assess the performance of the proposed approach. The results demonstrate that the improved A* algorithm effectively reduces the energy consumption of the snake robot compared with conventional algorithms. Specifically, the proposed method achieves an energy consumption of 68.79 J, which is 3.39%, 27.26%, and 5.91% lower than that of the traditional A* algorithm (71.20 J), the bidirectional A* algorithm (94.61 J), and the weighted improved A* algorithm (73.11 J), respectively. These findings confirm that integrating deep reinforcement learning with an adaptive heuristic function significantly enhances both the energy efficiency and practical applicability of snake robot path planning in complex 3D environments.
(This article belongs to the Section Locomotion and Bioinspired Robotics)
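
The paper's central move is folding an energy estimate into the A* heuristic. A structural sketch of what that looks like, where `energy_est` stands in for the learned ECEM/D3QN term and `dist` for the ordinary distance heuristic; both callables and the weight `w` are assumptions:

```python
import heapq
import itertools

def a_star_energy(start, goal, neighbors, dist, energy_est, w=1.0):
    """A* whose heuristic adds a weighted energy estimate to the distance term."""
    counter = itertools.count()        # tie-breaker so the heap never compares nodes
    open_set = [(dist(start, goal), next(counter), 0.0, start, [start])]
    best_g = {start: 0.0}
    while open_set:
        _, _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        for nxt, step_cost in neighbors(node):
            g2 = g + step_cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                # f = cost so far + distance heuristic + weighted energy estimate
                f = g2 + dist(nxt, goal) + w * energy_est(nxt, goal)
                heapq.heappush(open_set, (f, next(counter), g2, nxt, path + [nxt]))
    return None                        # goal unreachable
```
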

26 pages, 4507 KB  
Article
A Hybrid Type-2 Fuzzy Double DQN with Adaptive Reward Shaping for Stable Reinforcement Learning
by Hadi Mohammadian KhalafAnsar, Jaime Rohten and Jafar Keighobadi
AI 2025, 6(12), 319; https://doi.org/10.3390/ai6120319 - 6 Dec 2025
Viewed by 629
Abstract
Objectives: This paper presents an innovative control framework for the classical Cart–Pole problem. Methods: The proposed framework combines Interval Type-2 Fuzzy Logic, the Dueling Double DQN deep reinforcement learning algorithm, and adaptive reward shaping. Fuzzy logic acts as an a priori knowledge layer that incorporates measurement uncertainty in both angle and angular velocity, allowing the controller to generate adaptive actions dynamically, while the deep Q-network learns the optimal policy. To ensure stability, the Double DQN mechanism alleviates the overestimation bias commonly observed in value-based reinforcement learning, and convergence is accelerated through a multi-component reward shaping function that prioritizes angle stability and survival. Results: The method stabilizes rapidly in training, achieving a 100% success rate by episode 20 and maintaining consistently high rewards (650–700) throughout training. While standard DQN and other baselines take 100+ episodes to become reliable, the method converges in about 20 episodes (4–5 times faster), and compared with advanced baselines such as C51 and PER it achieves about 15–20% better final performance. We also found that PPO and QR-DQN surprisingly struggle on this task, highlighting the need for stability mechanisms. Conclusions: The proposed approach provides a practical solution that balances exploration with safety through the integration of fuzzy logic and deep reinforcement learning. The rapid convergence is particularly important for real-world applications where data collection is expensive, achieving stable performance much faster than existing methods without requiring complex theoretical guarantees.
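
The multi-component reward shaping described here (angle stability plus survival) might look roughly like the following; the terms and weights are invented for illustration and are not the paper's function:

```python
def shaped_reward(theta, theta_dot, x, alive, w_angle=1.0, w_rate=0.1,
                  w_center=0.05, survival_bonus=1.0):
    """Illustrative shaped reward for Cart-Pole (all weights hypothetical)."""
    r = survival_bonus if alive else -10.0         # survival term dominates
    r += -w_angle * abs(theta)                     # keep the pole upright
    r += -w_rate * abs(theta_dot)                  # damp angular velocity
    r += -w_center * abs(x)                        # stay near track center
    return r
```
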

18 pages, 1169 KB  
Article
Fusion of Deep Reinforcement Learning and Educational Data Mining for Decision Support in Journalism and Communication
by Weichen Jia and Zhi Li
Information 2025, 16(12), 1029; https://doi.org/10.3390/info16121029 - 26 Nov 2025
Viewed by 537
Abstract
The project-based learning model in journalism and communication faces challenges of sparse multimodal behavior data and delayed teaching interventions, making it difficult to perceive student states and optimize decisions in real time. This study constructs an intelligent decision-support framework integrating educational data mining (EDM) and deep reinforcement learning (DRL) to address these issues. A bidirectional long short-term memory (Bi-LSTM) network models behavioral sequences, while a conditional generative adversarial network (cGAN) with Wasserstein optimization augments low-activity student data. The extracted and augmented features are then fed into a Double Deep Q-Network (DDQN) to generate adaptive teaching intervention strategies. Experimental results from a 26-week study show that the proposed framework improved personalized learning-path matching from 0.42 to 0.68, increased knowledge mastery from 40.46% to 77.13%, and reduced intervention latency from 210.5 min to 144.6 min. The results demonstrate that fusing EDM and DRL can achieve efficient, adaptive decision-making, providing a viable approach for intelligent teaching support in journalism and communication education.
(This article belongs to the Special Issue Human–Computer Interactions and Computer-Assisted Education)
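
The pipeline shape the abstract implies — a Bi-LSTM encoding each behavior sequence into the state fed to the value network — can be sketched as below; the dimensions and pooling choice are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Encodes a behavior sequence into a fixed vector for the DDQN state."""
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)

    def forward(self, seq):              # seq: (batch, time, feat_dim)
        out, _ = self.lstm(seq)
        return out[:, -1, :]             # (batch, 2*hidden) -> DDQN input
```
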

22 pages, 2341 KB  
Article
Assembly Measurement Path Planning for Mobile Robots Using an Improved Deep Reinforcement Learning
by Gang Yuan, Bo Zhu, Yi Hu, Guangdong Tian and Zhiwu Li
Appl. Sci. 2025, 15(23), 12406; https://doi.org/10.3390/app152312406 - 22 Nov 2025
Viewed by 399
Abstract
To address the challenges associated with mobile robot path planning during complex product assembly measurements, this study introduces the N-step Priority Double DQN (NDDQN) algorithm, which integrates double Q-learning and an N-step priority strategy to accelerate convergence. This approach aims to improve the obstacle avoidance capabilities of mobile robots while accelerating their learning efficiency. We conducted three grid-based obstacle avoidance simulation experiments of varying scales to compare the path planning performance of the proposed algorithm against the proximal policy optimization and Deep Q-Network algorithms. To accurately simulate real-world robotic measurement scenarios, two Gazebo environments were used to validate the effectiveness of the proposed algorithm. A comprehensive analysis of the simulation results from all three algorithms demonstrates that the NDDQN algorithm exhibits significant effectiveness and stability in path planning, substantially reducing iteration counts and improving convergence speed. This research provides a theoretical foundation for adaptive path planning in mobile robots engaged in complex product assembly measurements.
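
The N-step component of an algorithm like NDDQN bootstraps from a multi-step return rather than a one-step target. A minimal sketch of that computation, with the buffer layout and the source of `bootstrap_value` (e.g., a double-Q estimate at the n-th next state) left as assumptions:

```python
def n_step_target(rewards, gamma, bootstrap_value, done):
    """rewards: the n rewards r_t..r_{t+n-1}; bootstrap_value: value estimate
    at s_{t+n}; done: whether the episode ended inside the window."""
    g = 0.0
    for r in reversed(rewards):              # discounted sum of the n rewards
        g = r + gamma * g
    if not done:                             # bootstrap only past non-terminal
        g += (gamma ** len(rewards)) * bootstrap_value
    return g
```
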

18 pages, 5815 KB  
Article
Dual-Objective Pareto Optimization Method of Flapping Hydrofoil Propulsion Performance Based on MLP and Double DQN
by Jingling Zhang, Xuchen Qiu, Wenyu Chen, Ertian Hua and Yajie Shen
Water 2025, 17(22), 3290; https://doi.org/10.3390/w17223290 - 18 Nov 2025
Viewed by 449
Abstract
To address the inherent complexities of underwater operating environments and achieve a highly efficient, energy-saving flapping hydrofoil design, this paper proposes an intelligent agent-based model for real-time parametric optimization. A non-parametric surrogate model based on a Multilayer Perceptron (MLP) is established using data samples of multi-dimensional flapping hydrofoil geometric parameters obtained through Computational Fluid Dynamics (CFD) simulations. An improved Double Deep Q-Network (DDQN) algorithm incorporating Pareto frontier information is deployed within the surrogate model to obtain the Pareto optimal solution set for propulsion efficiency and average input power, yielding a set of propulsion parameter combinations with errors between 0.24% and 1.27% across continuous intervals. Experimental results demonstrate that the proposed MLP-DDQN method learns the domain-wide optimal solution within the experimental environment, satisfying Pareto optimality between propulsion efficiency and average input power. Further analysis of the flow field around the flapping hydrofoil under the optimal parameter combination revealed that stable, continuously attached vortex structures on the wing surface are the intrinsic mechanism behind its superior propulsion performance.
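
For the two objectives here (maximize propulsion efficiency, minimize average input power), a Pareto-optimal set can be extracted with a simple non-dominated filter; the candidate format below is an assumption, not the paper's data structure:

```python
def pareto_front(candidates):
    """candidates: list of (efficiency, power) tuples.
    Keeps points no other point dominates (>= efficiency, <= power,
    strictly better in at least one objective)."""
    front = []
    for eff, pwr in candidates:
        dominated = any(e >= eff and p <= pwr and (e > eff or p < pwr)
                        for e, p in candidates)
        if not dominated:
            front.append((eff, pwr))
    return front
```
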

28 pages, 5070 KB  
Article
Energy-Efficient Scheduling for Distributed Hybrid Flowshop of Offshore Wind Blade Manufacturing Considering Limited Buffers
by Qinglei Zhang, Qianyuan Zhang, Jianguo Duan, Jiyun Qin and Ying Zhou
J. Mar. Sci. Eng. 2025, 13(11), 2176; https://doi.org/10.3390/jmse13112176 - 17 Nov 2025
Viewed by 332
Abstract
Amid the energy transition, scheduling problems in offshore manufacturing have emerged as critical challenges in marine engineering, yet the inherently coupled constraints of sequence-dependent setup times (SDST) and limited buffers (LB) have been largely overlooked. This paper therefore establishes the first multi-objective scheduling model, DHFSP-SDST&LB, specifically tailored to large components such as turbine blades. A hybrid optimization algorithm, DDQN-MOCE, integrating an evolutionary algorithm (EA) and a double deep Q-network (DDQN), is proposed to overcome the inherent limitations of traditional multi-objective evolutionary algorithms (MOEAs). In the EA component, a three-phase crossover and mutation policy generates offspring. In the DDQN component, dimension-reduced feature vectors serve as the state input, and three makespan-oriented and two energy-oriented heuristic search actions are defined based on domain knowledge. Finally, the optimal parameter combination is determined via Taguchi experimental design, and the effectiveness of DDQN-MOCE is evaluated on 36 instances and one industrial case. Experimental results demonstrate that DDQN-MOCE's hypervolume (HV) surpasses the second-best result by over 50% in 34 instances. It achieves the best generational distance (GD) and near-absolute dominance, and saves over 22% in total energy, with its high volume of solutions compensating for a minor weakness in spacing.
(This article belongs to the Section Ocean Engineering)
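
In the DDQN component described here, actions are heuristic search operators rather than raw scheduling moves. A hedged sketch of that hyper-heuristic dispatch; the heuristic names and the ε-greedy schedule are placeholders, not the paper's definitions:

```python
import random

# Three makespan-oriented + two energy-oriented actions, per the abstract;
# the names themselves are hypothetical.
HEURISTICS = ["makespan_h1", "makespan_h2", "makespan_h3",
              "energy_h1", "energy_h2"]

def select_heuristic(q_values, epsilon=0.1):
    """q_values: dict mapping heuristic name -> learned Q estimate."""
    if random.random() < epsilon:                 # occasional exploration
        return random.choice(HEURISTICS)
    return max(HEURISTICS, key=lambda h: q_values[h])
```
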

12 pages, 578 KB  
Article
A Power-Aware 5G Network Slicing Scheme for IIoT Systems with Age Tolerance
by Mingjiang Weng, Yixuan Bai and Xin Xie
Sensors 2025, 25(22), 6956; https://doi.org/10.3390/s25226956 - 14 Nov 2025
Viewed by 591
Abstract
Network slicing has emerged as a pivotal technology for addressing the diverse customization requirements of the Industrial Internet of Things (IIoT) within 5G networks, enabling the deployment of multiple logical networks over shared infrastructure. Efficient resource management in this context is essential to ensure energy efficiency and meet the stringent real-time demands of IIoT applications. This study focuses on the scheduling problem of minimizing average transmission power while maintaining Age of Information (AoI) tolerance constraints within 5G wireless network slicing. To tackle this challenge, an improved Dueling Double Deep Q-Network (D3QN) is leveraged to devise intelligent slicing schemes that dynamically allocate resources, ensuring optimal performance in time-varying wireless environments. The proposed improved D3QN approach introduces a novel heuristic-based exploration strategy that restricts action choices to the most effective options, significantly reducing ineffective learning steps. The simulation results show that the method not only speeds up convergence considerably but also achieves lower transmit power while preserving strict AoI reliability constraints and slice isolation.
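
"Restricts action choices to the most effective options" reads as a masked exploration policy; one plausible rendering, with the mask's source (the paper's heuristic) left as an assumption:

```python
import numpy as np

def masked_epsilon_greedy(q_values, valid_mask, epsilon=0.1):
    """q_values: (n_actions,) float array; valid_mask: boolean array marking
    the heuristically admissible actions."""
    valid = np.flatnonzero(valid_mask)
    if np.random.rand() < epsilon:
        return int(np.random.choice(valid))       # explore within the mask
    q = np.where(valid_mask, q_values, -np.inf)   # never pick masked actions
    return int(np.argmax(q))
```
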

30 pages, 695 KB  
Article
Task Offloading and Resource Allocation for ICVs in Vehicular Edge Computing Networks Based on Hybrid Hierarchical Deep Reinforcement Learning
by Jiahui Liu, Yuan Zou, Guodong Du, Xudong Zhang and Jinming Wu
Sensors 2025, 25(22), 6914; https://doi.org/10.3390/s25226914 - 12 Nov 2025
Viewed by 1079
Abstract
Intelligent connected vehicles (ICVs) face challenges in handling intensive onboard computational tasks due to limited computing capacity. Vehicular edge computing networks (VECNs) offer a promising solution by enabling ICVs to offload tasks to mobile edge computing (MEC), alleviating computational load. As transportation systems are dynamic, vehicular tasks and MEC capacities vary over time, making efficient task offloading and resource allocation crucial. We explored a vehicle–road collaborative edge computing network and formulated the task offloading scheduling and resource allocation problem to minimize the sum of time and energy costs. To address the mixed nature of discrete and continuous decision variables and reduce computational complexity, we propose a hybrid hierarchical deep reinforcement learning (HHDRL) algorithm structured in two layers. The upper layer enhances the double deep Q-network (DDQN) with a self-attention mechanism to improve feature correlation learning and generates discrete actions (communication decisions), while the lower layer employs deep deterministic policy gradient (DDPG) to produce continuous actions (power control, task offloading, and resource allocation decisions). This hybrid design enables efficient decomposition of complex action spaces and improves adaptability in dynamic environments. Numerical simulations show that HHDRL achieves a significant reduction in total computational cost relative to current benchmark algorithms. Furthermore, HHDRL's robustness to varying environmental conditions was confirmed by uniformly randomizing certain simulation parameters within specified ranges.
(This article belongs to the Section Vehicular Sensing)
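
The self-attention enhancement to the upper-layer DDQN can be sketched with torch's built-in attention module, treating state features as tokens; the dimensions and mean-pooling are assumptions, not the paper's encoder:

```python
import torch
import torch.nn as nn

class AttnEncoder(nn.Module):
    """Self-attention over state-feature tokens, pooled into a DQN state."""
    def __init__(self, d_model=32, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                 # x: (batch, n_tokens, d_model)
        h, _ = self.attn(x, x, x)         # learn feature-feature correlations
        return h.mean(dim=1)              # pooled embedding for the Q-heads
```
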

41 pages, 5751 KB  
Article
Efficient Scheduling for GPU-Based Neural Network Training via Hybrid Reinforcement Learning and Metaheuristic Optimization
by Nana Du, Chase Wu, Aiqin Hou, Weike Nie and Ruiqi Song
Big Data Cogn. Comput. 2025, 9(11), 284; https://doi.org/10.3390/bdcc9110284 - 10 Nov 2025
Viewed by 1711
Abstract
On GPU-based clusters, the training workloads of machine learning (ML) models, particularly neural networks (NNs), are often structured as Directed Acyclic Graphs (DAGs) and typically deployed for parallel execution across heterogeneous GPU resources. Efficient scheduling of these workloads is crucial for optimizing performance metrics such as execution time under various constraints, including GPU heterogeneity, network capacity, and data dependencies. DAG-structured ML workload scheduling can be modeled as a Nonlinear Integer Program (NIP) and is shown to be NP-complete. Leveraging a positive correlation between Scheduling Plan Distance (SPD) and Finish Time Gap (FTG) identified through an empirical study, we develop a Running Time Gap Strategy for scheduling based on the Whale Optimization Algorithm (WOA) and Reinforcement Learning, referred to as WORL-RTGS. The proposed method integrates the global search capabilities of WOA with the adaptive decision-making of Double Deep Q-Networks (DDQN). In particular, we derive a novel function to generate effective scheduling plans using DDQN, enhancing adaptability to complex DAG structures. Comprehensive evaluations on practical ML workload traces collected from Alibaba, run on simulated GPU-enabled platforms, demonstrate that WORL-RTGS significantly improves WOA's stability for DAG-structured ML workload scheduling and reduces completion time by up to 66.56% compared with five state-of-the-art scheduling algorithms.
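
For context on the WOA half of WORL-RTGS: the algorithm's standard "encircling prey" update moves each candidate toward the current best solution. This is textbook WOA, not code from the paper; vector shapes and the decay of `a` (2 → 0 over iterations) follow the usual convention:

```python
import numpy as np

def woa_encircle(x, best, a, rng=np.random):
    """One WOA encircling step for candidate x toward the best solution."""
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    A = 2 * a * r1 - a                 # exploration/exploitation coefficient
    C = 2 * r2
    D = np.abs(C * best - x)           # distance to the current best whale
    return best - A * D                # move relative to the best solution
```
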

27 pages, 4763 KB  
Article
Lightweight Reinforcement Learning for Priority-Aware Spectrum Management in Vehicular IoT Networks
by Adeel Iqbal, Ali Nauman and Tahir Khurshaid
Sensors 2025, 25(21), 6777; https://doi.org/10.3390/s25216777 - 5 Nov 2025
Viewed by 686
Abstract
The Vehicular Internet of Things (V-IoT) has emerged as a cornerstone of next-generation intelligent transportation systems (ITSs), enabling applications ranging from safety-critical collision avoidance and cooperative awareness to infotainment and fleet management. These heterogeneous services impose stringent quality-of-service (QoS) demands for latency, reliability, and fairness while competing for limited and dynamically varying spectrum resources. Conventional schedulers, such as round-robin or static priority queues, lack adaptability, whereas deep reinforcement learning (DRL) solutions, though powerful, remain computationally intensive and unsuitable for real-time roadside unit (RSU) deployment. This paper proposes a lightweight and interpretable reinforcement learning (RL)-based spectrum management framework for Vehicular Internet of Things (V-IoT) networks. Two enhanced Q-Learning variants are introduced: a Value-Prioritized Action Double Q-Learning with Constraints (VPADQ-C) algorithm that enforces reliability and blocking constraints through a Constrained Markov Decision Process (CMDP) with online primal–dual optimization, and a contextual Q-Learning with Upper Confidence Bound (Q-UCB) method that integrates uncertainty-aware exploration and a Success-Rate Prior (SRP) to accelerate convergence. A Risk-Aware Heuristic baseline is also designed as a transparent, low-complexity benchmark to illustrate the interpretability–performance trade-off between rule-based and learning-driven approaches. A comprehensive simulation framework incorporating heterogeneous traffic classes, physical-layer fading, and energy-consumption dynamics is developed to evaluate throughput, delay, blocking probability, fairness, and energy efficiency. The results demonstrate that the proposed methods consistently outperform conventional Q-Learning and Double Q-Learning methods. VPADQ-C achieves the highest energy efficiency (≈8.425×10⁷ bits/J) and reduces interruption probability by over 60%, while Q-UCB achieves the fastest convergence (within ≈190 episodes), lowest blocking probability (≈0.0135), and lowest mean delay (≈0.351 ms). Both schemes maintain fairness near 0.364, preserve throughput around 28 Mbps, and exhibit sublinear training-time scaling with O(1) per-update complexity and O(N²) overall runtime growth. Scalability analysis confirms that the proposed frameworks sustain URLLC-grade latency (<0.2 ms) and reliability under dense vehicular loads, validating their suitability for real-time, large-scale V-IoT deployments.
(This article belongs to the Section Internet of Things)
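
The Double Q-Learning core that a variant like VPADQ-C extends maintains two tables that cross-evaluate each other: one selects the next action, the other scores it. A minimal tabular sketch, assuming defaultdict-style tables and an illustrative learning rate:

```python
import random

def double_q_update(qa, qb, s, a, r, s2, actions, alpha=0.1, gamma=0.99):
    """One Double Q-Learning step. qa, qb: dict-like tables keyed by
    (state, action), e.g. collections.defaultdict(float)."""
    if random.random() < 0.5:
        qa, qb = qb, qa                              # update each table half the time
    a_star = max(actions, key=lambda x: qa[(s2, x)])  # select with one table
    target = r + gamma * qb[(s2, a_star)]             # evaluate with the other
    qa[(s, a)] += alpha * (target - qa[(s, a)])
```
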

26 pages, 2510 KB  
Article
GA-HPO PPO: A Hybrid Algorithm for Dynamic Flexible Job Shop Scheduling
by Yiming Zhou, Jun Jiang, Qining Shi, Maojie Fu, Yi Zhang, Yihao Chen and Longfei Zhou
Sensors 2025, 25(21), 6736; https://doi.org/10.3390/s25216736 - 4 Nov 2025
Cited by 1 | Viewed by 1125
Abstract
The Job Shop Scheduling Problem (JSP), a classical NP-hard challenge, has given rise to various complex extensions to accommodate modern manufacturing requirements. Among them, the Dynamic Flexible Job Shop Scheduling Problem (DFJSP) remains particularly challenging due to its stochastic task arrivals, heterogeneous deadlines, and varied task types. Traditional optimization- and rule-based approaches often fail to capture these dynamics effectively. To address this gap, this study proposes a hybrid algorithm, GA-HPO PPO, tailored for the DFJSP. The method integrates genetic-algorithm-based hyperparameter optimization with proximal policy optimization to enhance learning efficiency and scheduling performance. The algorithm was trained on four datasets and evaluated on ten benchmark datasets widely adopted in DFJSP research. Comparative experiments against Double Deep Q-Network (DDQN), standard PPO, and rule-based heuristics demonstrated that GA-HPO PPO consistently achieved superior performance. Specifically, it reduced the number of overdue tasks by an average of 18.5 in 100-task scenarios and 197 in 1000-task scenarios, while maintaining a machine utilization above 67% and 28% in these respective scenarios, and limiting the makespan to within 108–114 and 506–510 time units. The model also demonstrated a 25% faster convergence rate and 30% lower variance in performance across unseen scheduling instances compared to standard PPO, confirming its robustness and generalization capability across diverse scheduling conditions. These results indicate that GA-HPO PPO provides an effective and scalable solution for the DFJSP, contributing to improved dynamic scheduling optimization in practical manufacturing environments.
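
The PPO objective that the GA tunes hyperparameters for is the standard clipped surrogate loss; `clip_eps` is exactly the kind of hyperparameter a GA search could cover. The value below is illustrative, not the paper's setting:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (negated for gradient descent)."""
    ratio = torch.exp(logp_new - logp_old)            # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```
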
