Search Results (805)

Search Parameters:
Keywords = deep Q-network

17 pages, 1850 KiB  
Article
Cloud–Edge Collaborative Model Adaptation Based on Deep Q-Network and Transfer Feature Extraction
by Jue Chen, Xin Cheng, Yanjie Jia and Shuai Tan
Appl. Sci. 2025, 15(15), 8335; https://doi.org/10.3390/app15158335 - 26 Jul 2025
Viewed by 50
Abstract
With the rapid development of smart devices and the Internet of Things (IoT), the explosive growth of data has placed increasingly higher demands on real-time processing and intelligent decision making. Cloud-edge collaborative computing has emerged as a mainstream architecture to address these challenges. However, in sky-ground integrated systems, the limited computing capacity of edge devices and the inconsistency between cloud-side fusion results and edge-side detection outputs significantly undermine the reliability of edge inference. To overcome these issues, this paper proposes a cloud-edge collaborative model adaptation framework that integrates deep reinforcement learning via Deep Q-Networks (DQN) with local feature transfer. The framework enables category-level dynamic decision making, allowing for selective migration of classification head parameters to achieve on-demand adaptive optimization of the edge model and enhance consistency between cloud and edge results. Extensive experiments conducted on a large-scale multi-view remote sensing aircraft detection dataset demonstrate that the proposed method significantly improves cloud-edge consistency. The detection consistency rate reaches 90%, with some scenarios approaching 100%. Ablation studies further validate the necessity of the DQN-based decision strategy, which clearly outperforms static heuristics. In the model adaptation comparison, the proposed method improves the detection precision of the A321 category from 70.30% to 71.00% and the average precision (AP) from 53.66% to 53.71%. For the A330 category, the precision increases from 32.26% to 39.62%, indicating strong adaptability across different target types. This study offers a novel and effective solution for cloud-edge model adaptation under resource-constrained conditions, enhancing both the consistency of cloud-edge fusion and the robustness of edge-side intelligent inference. Full article
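
For readers unfamiliar with how a DQN can drive such category-level decisions, the sketch below scores two actions (keep the edge classification head vs. migrate the cloud head parameters) from a small feature vector. The feature names, network size, and epsilon-greedy policy are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0.0, 0.1, (3, 16)), np.zeros(16)   # tiny Q-network weights
W2, b2 = rng.normal(0.0, 0.1, (16, 2)), np.zeros(2)

def q_values(state):
    h = np.maximum(state @ W1 + b1, 0.0)   # ReLU hidden layer
    return h @ W2 + b2                     # [Q(s, keep), Q(s, migrate)]

def decide(state, eps=0.1):
    if rng.random() < eps:                 # epsilon-greedy exploration
        return int(rng.integers(2))
    return int(np.argmax(q_values(state)))

# One decision for a single target category; features are assumed, normalized
# (e.g., cloud-edge disagreement rate, confidence gap, bandwidth budget).
state = np.array([0.35, 0.20, 0.80])
print("migrate classification head" if decide(state) == 1 else "keep edge head")
```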

27 pages, 3211 KiB  
Article
Hybrid Deep Learning-Reinforcement Learning for Adaptive Human-Robot Task Allocation in Industry 5.0
by Claudio Urrea
Systems 2025, 13(8), 631; https://doi.org/10.3390/systems13080631 - 26 Jul 2025
Viewed by 72
Abstract
Human-Robot Collaboration (HRC) is pivotal for flexible, worker-centric manufacturing in Industry 5.0, yet dynamic task allocation remains difficult because operator states—fatigue and skill—fluctuate abruptly. I address this gap with a hybrid framework that couples real-time perception and double-estimating reinforcement learning. A Convolutional Neural Network (CNN) classifies nine fatigue–skill combinations from synthetic physiological cues (heart-rate, blink rate, posture, wrist acceleration); its outputs feed a Double Deep Q-Network (DDQN) whose state vector also includes task-queue and robot-status features. The DDQN optimises a multi-objective reward balancing throughput, workload and safety and executes at 10 Hz within a closed-loop pipeline implemented in MATLAB R2025a and RoboDK v5.9. Benchmarking on a 1000-episode HRC dataset (2500 allocations·episode−1) shows the hybrid CNN+DDQN controller raises throughput to 60.48 ± 0.08 tasks·min−1 (+21% vs. rule-based, +12% vs. SARSA, +8% vs. Dueling DQN, +5% vs. PPO), trims operator fatigue by 7% and sustains 99.9% collision-free operation (one-way ANOVA, p < 0.05; post-hoc power 1 − β = 0.87). Visual analyses confirm responsive task reallocation as fatigue rises or skill varies. The approach outperforms strong baselines (PPO, A3C, Dueling DQN) by mitigating Q-value over-estimation through double learning, providing robust policies under stochastic human states and offering a reproducible blueprint for multi-robot, Industry 5.0 factories. Future work will validate the controller on a physical Doosan H2017 cell and incorporate fairness constraints to avoid workload bias across multiple operators. Full article
(This article belongs to the Section Systems Engineering)
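
As a minimal illustration of the double learning the abstract credits with mitigating Q-value over-estimation, the sketch below contrasts the DDQN target (online network selects the next action, target network evaluates it) with the vanilla DQN target. The Q-value arrays are placeholders, not the paper's CNN+DDQN model.

```python
import numpy as np

gamma, reward = 0.99, 1.0                      # placeholder discount and one-step reward
q_online_next = np.array([2.1, 2.4, 1.7])      # online network, next state
q_target_next = np.array([2.0, 2.2, 1.9])      # target network, next state

a_star = int(np.argmax(q_online_next))                      # action chosen by the online network
ddqn_target = reward + gamma * q_target_next[a_star]        # double-learning target
dqn_target = reward + gamma * float(np.max(q_target_next))  # vanilla target, for contrast
print(round(ddqn_target, 3), round(dqn_target, 3))
```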

33 pages, 4841 KiB  
Article
Research on Task Allocation in Four-Way Shuttle Storage and Retrieval Systems Based on Deep Reinforcement Learning
by Zhongwei Zhang, Jingrui Wang, Jie Jin, Zhaoyun Wu, Lihui Wu, Tao Peng and Peng Li
Sustainability 2025, 17(15), 6772; https://doi.org/10.3390/su17156772 - 25 Jul 2025
Viewed by 209
Abstract
The four-way shuttle storage and retrieval system (FWSS/RS) is an advanced automated warehousing solution for achieving green and intelligent logistics, and task allocation is crucial to its logistics efficiency. However, current research on task allocation in three-dimensional storage environments is mostly conducted in the single-operation mode that handles inbound or outbound tasks individually, with limited attention paid to the more prevalent composite operation mode where inbound and outbound tasks coexist. To bridge this gap, this study investigates the task allocation problem in an FWSS/RS under the composite operation mode, and deep reinforcement learning (DRL) is introduced to solve it. Initially, the FWSS/RS operational workflows and equipment motion characteristics are analyzed, and a task allocation model with the total task completion time as the optimization objective is established. Furthermore, the task allocation problem is transformed into a partially observable Markov decision process corresponding to reinforcement learning. Each shuttle is regarded as an independent agent that receives localized observations, including shuttle position information and task completion status, as inputs, and a deep neural network is employed to fit value functions to output action selections. Correspondingly, all agents are trained within an independent deep Q-network (IDQN) framework that facilitates collaborative learning through experience sharing while maintaining decentralized decision-making based on individual observations. Moreover, to validate the efficiency and effectiveness of the proposed model and method, experiments were conducted across various problem scales and transport resource configurations. The experimental results demonstrate that the DRL-based approach outperforms conventional task allocation methods, including the auction algorithm and the genetic algorithm. Specifically, the proposed IDQN-based method reduces the task completion time by up to 12.88% compared to the auction algorithm, and up to 8.64% compared to the genetic algorithm across multiple scenarios. Moreover, task-related factors are found to have a more significant impact on the optimization objectives of task allocation than transport resource-related factors. Full article
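
The experience-sharing idea behind the IDQN framework described above can be sketched as independent per-shuttle agents that act on local observations but push transitions into one shared replay buffer. The observation layout, action count, and reward are assumptions for illustration only.

```python
import random
from collections import deque

shared_replay = deque(maxlen=10_000)   # experience pool shared by all agents

class ShuttleAgent:
    def __init__(self, agent_id):
        self.agent_id = agent_id

    def act(self, local_obs):
        # placeholder policy; an IDQN agent would return argmax_a Q(local_obs, a)
        return random.randrange(4)

    def store(self, obs, action, reward, next_obs, done):
        shared_replay.append((obs, action, reward, next_obs, done))

agents = [ShuttleAgent(i) for i in range(3)]
for agent in agents:
    obs = (agent.agent_id, 0, 0)               # assumed: (id, position index, tasks completed)
    action = agent.act(obs)
    agent.store(obs, action, -1.0, (agent.agent_id, 1, 0), False)

batch = random.sample(shared_replay, k=min(2, len(shared_replay)))
print(len(shared_replay), len(batch))
```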

27 pages, 5145 KiB  
Article
An Improved Deep Q-Learning Approach for Navigation of an Autonomous UAV Agent in 3D Obstacle-Cluttered Environment
by Ghulam Farid, Muhammad Bilal, Lanyong Zhang, Ayman Alharbi, Ishaq Ahmed and Muhammad Azhar
Drones 2025, 9(8), 518; https://doi.org/10.3390/drones9080518 - 23 Jul 2025
Viewed by 241
Abstract
The performance of the UAVs while executing various mission profiles greatly depends on the selection of planning algorithms. Reinforcement learning (RL) algorithms can effectively be utilized for robot path planning. Due to random action selection in case of action ties, the traditional Q-learning algorithm and its other variants face the issues of slow convergence and suboptimal path planning in high-dimensional navigational environments. To solve these problems, we propose an improved deep Q-network (DQN), incorporating an efficient tie-breaking mechanism, prioritized experience replay (PER), and L2-regularization. The adopted tie-breaking mechanism improves the action selection and ultimately helps in generating an optimal trajectory for the UAV in a 3D cluttered environment. To improve the convergence speed of the traditional Q-algorithm, prioritized experience replay is used, which learns from experiences with high temporal difference (TD) error and avoids uniform sampling of stored transitions during training. This also allows the prioritization of high-reward experiences (e.g., reaching a goal), which helps the agent to rediscover these valuable states and improve learning. Moreover, L2-regularization is adopted that encourages smaller weights for more stable and smoother Q-values to reduce the erratic action selections and promote smoother UAV flight paths. Finally, the performance of the proposed method is presented and thoroughly compared against the traditional DQN, demonstrating its superior effectiveness. Full article
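
A rough sketch of the proportional prioritized experience replay described above: transitions with larger TD error are sampled more often than under uniform replay. The alpha exponent and the small epsilon offset are common PER conventions assumed here, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
td_errors = np.array([0.05, 1.20, 0.30, 2.50, 0.01])   # |TD error| per stored transition
alpha, eps = 0.6, 1e-3                                  # assumed PER hyperparameters

priorities = (np.abs(td_errors) + eps) ** alpha
probs = priorities / priorities.sum()                   # sampling distribution
batch_idx = rng.choice(len(td_errors), size=3, replace=False, p=probs)
print(probs.round(3), batch_idx)                        # high-error transitions dominate the batch
```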

12 pages, 1562 KiB  
Article
Intra-Host Evolution During Relapsing Parvovirus B19 Infection in Immunocompromised Patients
by Anne Russcher, Yassene Mohammed, Margriet E. M. Kraakman, Xavier Chow, Stijn T. Kok, Eric C. J. Claas, Manfred Wuhrer, Ann C. T. M. Vossen, Aloys C. M. Kroes and Jutte J. C. de Vries
Viruses 2025, 17(8), 1034; https://doi.org/10.3390/v17081034 - 23 Jul 2025
Viewed by 194
Abstract
Background: Parvovirus B19 (B19V) can cause severe relapsing episodes of pure red cell aplasia in immunocompromised individuals, which are commonly treated with intravenous immunoglobulins (IVIGs). Few data are available on B19V intra-host evolution and the role of humoral immune selection. Here, we report the dynamics of genomic mutations and subsequent protein changes during relapsing infection. Methods: Longitudinal plasma samples from immunocompromised patients with relapsing B19V infection in the period 2011–2019 were analyzed using whole-genome sequencing to evaluate intra-host evolution. The impact of mutations on the 3D viral protein structure was predicted by deep neural network modeling. Results: Of the three immunocompromised patients with relapsing infections for 3 to 9 months, one patient developed two consecutive nonsynonymous mutations in the VP1/2 region: T372S/T145S and Q422L/Q195L. The first mutation was detected in multiple B19V IgG-seropositive follow-up samples and resolved after IgG seroreversion. Computational prediction of the VP1 3D structure of this mutant showed a conformational change in the proximity of the antibody binding domain. No conformational changes were predicted for the other mutations detected. Discussion: Analysis of relapsing B19V infections showed mutational changes occurring over time. Resulting amino acid changes were predicted to lead to a conformational capsid protein change in an IgG-seropositive patient. The impact of humoral response and IVIG treatment on B19V infections should be further investigated to understand viral evolution and potential immune escape. Full article
(This article belongs to the Collection Parvoviridae)

18 pages, 1138 KiB  
Article
Intelligent Priority-Aware Spectrum Access in 5G Vehicular IoT: A Reinforcement Learning Approach
by Adeel Iqbal, Tahir Khurshaid and Yazdan Ahmad Qadri
Sensors 2025, 25(15), 4554; https://doi.org/10.3390/s25154554 - 23 Jul 2025
Viewed by 214
Abstract
Efficient and intelligent spectrum access is crucial for meeting the diverse Quality of Service (QoS) demands of Vehicular Internet of Things (V-IoT) systems in next-generation cellular networks. This work proposes a novel reinforcement learning (RL)-based priority-aware spectrum management (RL-PASM) framework, a centralized self-learning priority-aware spectrum management framework operating through Roadside Units (RSUs). RL-PASM dynamically allocates spectrum resources across three traffic classes: high-priority (HP), low-priority (LP), and best-effort (BE), utilizing reinforcement learning (RL). This work compares four RL algorithms: Q-Learning, Double Q-Learning, Deep Q-Network (DQN), and Actor-Critic (AC) methods. The environment is modeled as a discrete-time Markov Decision Process (MDP), and a context-sensitive reward function guides fairness-preserving decisions for access, preemption, coexistence, and hand-off. Extensive simulations conducted under realistic vehicular load conditions evaluate the performance across key metrics, including throughput, delay, energy efficiency, fairness, blocking, and interruption probability. Unlike prior approaches, RL-PASM introduces a unified multi-objective reward formulation and centralized RSU-based control to support adaptive priority-aware access for dynamic vehicular environments. Simulation results confirm that RL-PASM balances throughput, latency, fairness, and energy efficiency, demonstrating its suitability for scalable and resource-constrained deployments. The results also demonstrate that DQN achieves the highest average throughput, followed by vanilla QL. DQL and AC maintain fairness at high levels and low average interruption probability. QL demonstrates the lowest average delay and the highest energy efficiency, making it a suitable candidate for edge-constrained vehicular deployments. Selecting the appropriate RL method, RL-PASM offers a robust and adaptable solution for scalable, intelligent, and priority-aware spectrum access in vehicular communication infrastructures. Full article
(This article belongs to the Special Issue Emerging Trends in Next-Generation mmWave Cognitive Radio Networks)
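
To make the first two algorithms compared above concrete, the sketch below shows the tabular Q-Learning update next to the Double Q-Learning update with paired tables. The state/action sizes, learning rate, and discount factor are illustrative, not the RL-PASM settings.

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions, lr, gamma = 4, 3, 0.1, 0.95
Q = np.zeros((n_states, n_actions))          # single table for Q-Learning
QA = np.zeros((n_states, n_actions))         # paired tables for Double Q-Learning
QB = np.zeros((n_states, n_actions))

def q_learning_update(s, a, r, s2):
    Q[s, a] += lr * (r + gamma * Q[s2].max() - Q[s, a])

def double_q_update(s, a, r, s2):
    if rng.random() < 0.5:                   # update A, evaluate the chosen action with B
        QA[s, a] += lr * (r + gamma * QB[s2, QA[s2].argmax()] - QA[s, a])
    else:                                    # update B, evaluate the chosen action with A
        QB[s, a] += lr * (r + gamma * QA[s2, QB[s2].argmax()] - QB[s, a])

q_learning_update(0, 1, 1.0, 2)
double_q_update(0, 1, 1.0, 2)
print(Q[0], QA[0] + QB[0])
```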

20 pages, 3000 KiB  
Article
NRNH-AR: A Small Robotic Agent Using Tri-Fold Learning for Navigation and Obstacle Avoidance
by Carlos Vasquez-Jalpa, Mariko Nakano, Martin Velasco-Villa and Osvaldo Lopez-Garcia
Appl. Sci. 2025, 15(15), 8149; https://doi.org/10.3390/app15158149 - 22 Jul 2025
Viewed by 211
Abstract
We propose a tri-fold learning algorithm, called Neuroevolution of Hybrid Neural Networks in a Robotic Agent (acronym in Spanish, NRNH-AR), based on deep reinforcement learning (DRL) with self-supervised learning (SSL) and unsupervised learning (USL) steps, specifically designed to be implemented in a small autonomous navigation robot capable of operating in constrained physical environments. The NRNH-AR algorithm is designed for a small physical robotic agent with limited resources. The proposed algorithm was evaluated in four critical aspects: computational cost, learning stability, required memory size, and operation speed. The results show that the performance of NRNH-AR is within the ranges of the Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms. The proposed algorithm comprises three types of learning: SSL, USL, and DRL. Thanks to this series of learning stages, the algorithm optimizes the use of resources and demonstrates adaptability in dynamic environments, a crucial aspect of navigation robotics. By integrating computer vision techniques based on a Convolutional Neural Network (CNN), the algorithm enhances its ability to rapidly interpret visual observations of the environment and to detect a specific object while avoiding obstacles. Full article

24 pages, 6250 KiB  
Article
A Failure Risk-Aware Multi-Hop Routing Protocol in LPWANs Using Deep Q-Network
by Shaojun Tao, Hongying Tang, Jiang Wang and Baoqing Li
Sensors 2025, 25(14), 4416; https://doi.org/10.3390/s25144416 - 15 Jul 2025
Viewed by 202
Abstract
Multi-hop routing over low-power wide-area networks (LPWANs) has emerged as a promising technology for extending network coverage. However, existing protocols face high transmission disruption risks due to factors such as dynamic topology driven by stochastic events, dynamic link quality, and coverage holes induced by imbalanced energy consumption. To address this issue, we propose a failure risk-aware deep Q-network-based multi-hop routing (FRDR) protocol, aiming to reduce transmission disruption probability. First, we design a power regulation mechanism (PRM) that works in conjunction with pre-selection rules to optimize end-device node (EN) activations and candidate relay selection. Second, we introduce the concept of routing failure risk value (RFRV) to quantify the potential failure risk posed by each candidate next-hop EN, which correlates with its neighborhood state characteristics (i.e., the number of neighbors, the residual energy level, and link quality). Third, a deep Q-network (DQN)-based routing decision mechanism is proposed, where a multi-objective reward function incorporating RFRV, residual energy, distance to the gateway, and transmission hops is utilized to determine the optimal next-hop. Simulation results demonstrate that FRDR outperforms existing protocols in terms of packet delivery rate and network lifetime while maintaining comparable transmission delay. Full article
(This article belongs to the Special Issue Security, Privacy and Trust in Wireless Sensor Networks)
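
The multi-objective reward in the abstract combines the routing failure risk value, residual energy, distance to the gateway, and hop count. A hypothetical shape of such a function is sketched below; the weights, signs, and normalizations are assumptions, not the paper's exact formulation.

```python
def next_hop_reward(rfrv, residual_energy, dist_to_gw, hops,
                    w=(1.0, 0.5, 0.3, 0.2)):
    """Assumed weighted combination: lower risk, shorter distance, and fewer
    hops are rewarded; higher residual energy is rewarded."""
    return (-w[0] * rfrv
            + w[1] * residual_energy
            - w[2] * dist_to_gw
            - w[3] * hops)

# Candidate next-hop with moderate risk, good energy, normalized distance, 3 hops.
print(next_hop_reward(rfrv=0.2, residual_energy=0.8, dist_to_gw=0.4, hops=3))
```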

17 pages, 1301 KiB  
Article
Carbon-Aware, Energy-Efficient, and SLA-Compliant Virtual Machine Placement in Cloud Data Centers Using Deep Q-Networks and Agglomerative Clustering
by Maraga Alex, Sunday O. Ojo and Fred Mzee Awuor
Computers 2025, 14(7), 280; https://doi.org/10.3390/computers14070280 - 15 Jul 2025
Viewed by 253
Abstract
The rapid expansion of cloud computing has increased carbon emissions and energy usage in cloud data centers, making creative solutions for sustainable resource management increasingly necessary. This work presents a new algorithm, Carbon-Aware, Energy-Efficient, and SLA-Compliant Virtual Machine Placement using Deep Q-Networks (DQNs) and Agglomerative Clustering (CARBON-DQN), that intelligently balances environmental sustainability, service level agreement (SLA) compliance, and energy efficiency. The method combines a deep reinforcement learning model that learns optimal placement strategies over time, carbon-aware data center profiling, and the hierarchical clustering of virtual machines (VMs) according to their resource requirements. Extensive simulations show that CARBON-DQN significantly outperforms conventional and state-of-the-art algorithms such as GRVMP, NSGA-II, RLVMP, GMPR, and MORLVMP. Across a wide range of virtual machine configurations, including micro, small, high-CPU, and extra-large instances, it delivers the lowest carbon emissions, the fewest SLA violations, and the lowest energy usage. Driven by real-time input, the algorithm's adaptive decision-making capacity allows it to react dynamically to changing data center conditions and workloads. These findings highlight CARBON-DQN as a sustainable and intelligent virtual machine placement approach for cloud systems. To further improve scalability, environmental impact, and practical applicability, future work will investigate the integration of renewable energy forecasts, dynamic pricing models, and deployment across multi-cloud and edge computing environments. Full article
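
The clustering stage paired with the DQN agent can be illustrated with scikit-learn's agglomerative clustering over VM resource-demand vectors; the feature columns (vCPU count, RAM in GB) and the choice of three clusters are made up for the example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Assumed VM resource-demand vectors: [vCPU count, RAM in GB].
vm_demands = np.array([
    [1, 1], [1, 2],          # micro / small instances
    [8, 4], [8, 8],          # high-CPU instances
    [16, 64], [32, 64],      # extra-large instances
], dtype=float)

labels = AgglomerativeClustering(n_clusters=3).fit_predict(vm_demands)
print(labels)                # cluster id per VM, available to the placement policy
```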

22 pages, 2108 KiB  
Article
Deep Reinforcement Learning for Real-Time Airport Emergency Evacuation Using Asynchronous Advantage Actor–Critic (A3C) Algorithm
by Yujing Zhou, Yupeng Yang, Bill Deng Pan, Yongxin Liu, Sirish Namilae, Houbing Herbert Song and Dahai Liu
Mathematics 2025, 13(14), 2269; https://doi.org/10.3390/math13142269 - 15 Jul 2025
Viewed by 314
Abstract
Emergencies can occur unexpectedly and require immediate action, especially in aviation, where time pressure and uncertainty are high. This study focused on improving emergency evacuation in airport and aircraft scenarios using real-time decision-making support. A system based on the Asynchronous Advantage Actor–Critic (A3C) algorithm, an advanced deep reinforcement learning method, was developed to generate faster and more efficient evacuation routes compared to traditional models. The A3C model was tested in various scenarios, including different environmental conditions and numbers of agents, and its performance was compared with the Deep Q-Network (DQN) algorithm. The results showed that A3C achieved evacuations 43.86% faster on average and converged in fewer episodes (100 vs. 250 for DQN). In dynamic environments with moving threats, A3C also outperformed DQN in maintaining agent safety and adapting routes in real time. As the number of agents increased, A3C maintained high levels of efficiency and robustness. These findings demonstrate A3C’s strong potential to enhance evacuation planning through improved speed, adaptability, and scalability. The study concludes by highlighting the practical benefits of applying such models in real-world emergency response systems, including significantly faster evacuation times, real-time adaptability to evolving threats, and enhanced scalability for managing large crowds in high-density environments including airport terminals. The A3C-based model offers a cost-effective alternative to full-scale evacuation drills by enabling virtual scenario testing, supports proactive safety planning through predictive modeling, and contributes to the development of intelligent decision-support tools that improve coordination and reduce response time during emergencies. Full article

24 pages, 8216 KiB  
Article
Application of Dueling Double Deep Q-Network for Dynamic Traffic Signal Optimization: A Case Study in Danang City, Vietnam
by Tho Cao Phan, Viet Dinh Le and Teron Nguyen
Mach. Learn. Knowl. Extr. 2025, 7(3), 65; https://doi.org/10.3390/make7030065 - 14 Jul 2025
Viewed by 453
Abstract
This study investigates the application of the Dueling Double Deep Q-Network (3DQN) algorithm to optimize traffic signal control at a major urban intersection in Danang City, Vietnam. The objective is to enhance signal timing efficiency in response to mixed traffic flow and real-world traffic dynamics. A simulation environment was developed using the Simulation of Urban Mobility (SUMO) software version 1.11, incorporating both a fixed-time signal controller and two 3DQN models trained with 1 million (1M-Step) and 5 million (5M-Step) iterations. The models were evaluated using randomized traffic demand scenarios ranging from 50% to 150% of baseline traffic volumes. The results demonstrate that the 3DQN models outperform the fixed-time controller, significantly reducing vehicle delays, with the 5M-Step model achieving average waiting times of under five minutes. To further assess the model’s responsiveness to real-time conditions, traffic flow data were collected using YOLOv8 for object detection and SORT for vehicle tracking from live camera feeds, and integrated into the SUMO-3DQN simulation. The findings highlight the robustness and adaptability of the 3DQN approach, particularly under peak traffic conditions, underscoring its potential for deployment in intelligent urban traffic management systems. Full article
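
A minimal sketch of the dueling head that gives the 3DQN its name: Q-values are assembled from a state-value stream and a mean-centred advantage stream. The feature vector and the four candidate signal phases stand in for the trained model's representation and action set.

```python
import numpy as np

rng = np.random.default_rng(3)
features = rng.normal(size=8)              # assumed shared hidden features
Wv = rng.normal(size=(8, 1))               # value-stream weights
Wa = rng.normal(size=(8, 4))               # advantage-stream weights (4 phases, assumed)

value = features @ Wv                      # V(s), shape (1,)
advantage = features @ Wa                  # A(s, a), shape (4,)
q = value + advantage - advantage.mean()   # Q(s, a) with the usual identifiability trick
print(q.round(3), int(q.argmax()))         # phase the agent would select
```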

19 pages, 3865 KiB  
Article
The Voltage Regulation of Boost Converters via a Hybrid DQN-PI Control Strategy Under Large-Signal Disturbances
by Pengqiang Nie, Yanxia Wu, Zhenlin Wang, Song Xu, Seiji Hashimoto and Takahiro Kawaguchi
Processes 2025, 13(7), 2229; https://doi.org/10.3390/pr13072229 - 12 Jul 2025
Viewed by 331
Abstract
The DC-DC boost converter plays a crucial role in interfacing low-voltage sources with high-voltage DC buses in DC microgrid systems. To enhance the dynamic response and robustness of the system under large-signal disturbances and time-varying system parameters, this paper proposes a hybrid control strategy that integrates proportional–integral (PI) control with a deep Q-network (DQN). The proposed framework leverages the advantages of PI control in terms of steady-state regulation and a fast transient response, while also exploiting the capabilities of the DQN agent to learn optimal control policies in dynamic and uncertain environments. To validate the effectiveness and robustness of the proposed hybrid control framework, a detailed boost converter model was developed in the MATLAB 2024/Simulink environment. The simulation results demonstrate that the proposed framework exhibits a significantly faster transient response and enhanced robustness against nonlinear disturbances compared to the conventional PI and fuzzy controllers. Moreover, by incorporating PI-based fine-tuning in the steady-state phase, the framework effectively compensates for the control precision limitations caused by the discrete action space of the DQN algorithm, thereby achieving high-accuracy voltage regulation without relying on an explicit system model. Full article
(This article belongs to the Special Issue Challenges and Advances of Process Control Systems)
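
The hybrid idea described above can be sketched as a duty-cycle update that applies a coarse discrete DQN action during large transients and a PI correction once the voltage error is small; the action set, thresholds, gains, and saturation limits are illustrative assumptions only.

```python
DISCRETE_ACTIONS = [-0.05, -0.01, 0.0, 0.01, 0.05]   # assumed duty-cycle steps the DQN can pick

def hybrid_duty_update(error, integral, duty, dqn_action_idx,
                       kp=0.02, ki=0.5, dt=1e-4, switch_threshold=0.5):
    if abs(error) > switch_threshold:       # large disturbance: take the DQN's discrete step
        duty += DISCRETE_ACTIONS[dqn_action_idx]
    else:                                   # near steady state: PI fine-tuning
        integral += error * dt
        duty += kp * error + ki * integral
    return min(max(duty, 0.0), 0.95), integral   # clamp duty cycle to an assumed safe range

duty, integral = 0.50, 0.0
duty, integral = hybrid_duty_update(error=2.0, integral=integral,
                                    duty=duty, dqn_action_idx=4)
print(round(duty, 3))
```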

22 pages, 2261 KiB  
Article
Learning Deceptive Strategies in Adversarial Settings: A Two-Player Game with Asymmetric Information
by Sai Krishna Reddy Mareddy and Dipankar Maity
Appl. Sci. 2025, 15(14), 7805; https://doi.org/10.3390/app15147805 - 11 Jul 2025
Viewed by 312
Abstract
This study explores strategic deception and counter-deception in multi-agent reinforcement learning environments for a police officer–robber game. The research is motivated by real-world scenarios where agents must operate with partial observability and adversarial intent. We develop a suite of progressively complex grid-based environments featuring dynamic goals, fake targets, and navigational obstacles. Agents are trained using deep Q-networks (DQNs) with game-theoretic reward shaping to encourage deceptive behavior in the robber and intent inference in the police officer. The robber learns to reach the true goal while misleading the police officer, and the police officer adapts to infer the robber’s intent and allocate resources effectively. The environments include fixed and dynamic layouts with varying numbers of goals and obstacles, allowing us to evaluate scalability and generalization. Experimental results demonstrate that the agents converge to equilibrium-like behaviors across all settings. The inclusion of obstacles increases complexity but also strengthens learned policies when guided by reward shaping. We conclude that integrating game theory with deep reinforcement learning enables the emergence of robust, deceptive strategies and effective counter-strategies, even in dynamic, high-dimensional environments. This work advances the design of intelligent agents capable of strategic reasoning under uncertainty and adversarial conditions. Full article
(This article belongs to the Special Issue Research Progress on the Application of Multi-agent Systems)

17 pages, 624 KiB  
Article
Parallel Simulation Multi-Sample Task Scheduling Approach Based on Deep Reinforcement Learning in Cloud Computing Environment
by Yuhao Xiao, Yiping Yao and Feng Zhu
Mathematics 2025, 13(14), 2249; https://doi.org/10.3390/math13142249 - 11 Jul 2025
Viewed by 249
Abstract
Complex scenario analysis and evaluation simulations often involve multiple sets of simulation applications with different combinations of parameters, thus resulting in high computing power consumption, which is one of the factors that limits the efficiency of multi-sample parallel simulations. Cloud computing provides considerable amounts of cheap and convenient computing resources, thus providing efficient support for multi-sample simulation tasks. However, traditional simulation scheduling methods do not consider the collaborative parallel scheduling of multiple samples and multiple entities under multi-objective constraints. Deep reinforcement learning methods can continuously learn and adjust their strategies through interactions with the environment, demonstrating strong adaptability in response to dynamically changing task requirements. Therefore, herein, a parallel simulation multi-sample task scheduling method based on deep reinforcement learning in a cloud computing environment is proposed. The method collects cluster load information and simulation application information as state inputs in the cloud environment, designs a multi-objective reward function to balance the cost and execution efficiency, and uses deep Q-networks (DQNs) to train agents for intelligent scheduling of multi-sample parallel simulation tasks. In a real cloud environment, the proposed method demonstrates runtime reductions of 4–11% and execution cost savings of 11–22% compared to the Round-Robin algorithm, Best Fit algorithm, and genetic algorithm. Full article

22 pages, 2867 KiB  
Article
Hierarchical Deep Reinforcement Learning-Based Path Planning with Underlying High-Order Control Lyapunov Function—Control Barrier Function—Quadratic Programming Collision Avoidance Path Tracking Control of Lane-Changing Maneuvers for Autonomous Vehicles
by Haochong Chen and Bilin Aksun-Guvenc
Electronics 2025, 14(14), 2776; https://doi.org/10.3390/electronics14142776 - 10 Jul 2025
Viewed by 312
Abstract
Path planning and collision avoidance are essential components of an autonomous driving system (ADS), ensuring safe navigation in complex environments shared with other road users. High-quality planning and reliable obstacle avoidance strategies are essential for advancing the SAE autonomy level of autonomous vehicles, which can largely reduce the risk of traffic accidents. In daily driving scenarios, lane changing is a common maneuver used to avoid unexpected obstacles such as parked vehicles or suddenly appearing pedestrians. Notably, lane-changing behavior is also widely regarded as a key evaluation criterion in driver license examinations, highlighting its practical importance in real-world driving. Motivated by this observation, this paper aims to develop an autonomous lane-changing system capable of dynamically avoiding obstacles in multi-lane traffic environments. To achieve this objective, we propose a hierarchical decision-making and control framework in which a Double Deep Q-Network (DDQN) agent operates as the high-level planner to select lane-level maneuvers, while a High-Order Control Lyapunov Function–High-Order Control Barrier Function–based Quadratic Program (HOCLF-HOCBF-QP) serves as the low-level controller to ensure safe and stable trajectory tracking under dynamic constraints. Simulation studies are used to evaluate the planning efficiency and overall collision avoidance performance of the proposed hierarchical control framework. The results demonstrate that the system is capable of autonomously executing appropriate lane-changing maneuvers to avoid multiple obstacles in complex multi-lane traffic environments. In computational cost tests, the low-level controller operates at 100 Hz with an average solve time of 0.66 ms per step, and the high-level policy operates at 5 Hz with an average solve time of 0.60 ms per step. The results demonstrate real-time capability in autonomous driving systems. Full article
(This article belongs to the Special Issue Intelligent Technologies for Vehicular Networks, 2nd Edition)
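
The two execution rates quoted above (high-level policy at 5 Hz, low-level controller at 100 Hz) imply a fixed number of tracking steps per maneuver decision, sketched below with trivial placeholder policies rather than the paper's DDQN planner or HOCLF-HOCBF-QP controller.

```python
HIGH_LEVEL_HZ, LOW_LEVEL_HZ = 5, 100
STEPS_PER_DECISION = LOW_LEVEL_HZ // HIGH_LEVEL_HZ   # 20 low-level steps per maneuver

def high_level_policy(state):
    # Placeholder rule; the paper's DDQN agent selects the lane-level maneuver.
    return "change_left" if state["gap_ahead"] < 30.0 else "keep_lane"

def low_level_control(maneuver, state):
    # Placeholder proportional steering toward the target lane centerline.
    target_y = state["lane_y"] + (3.5 if maneuver == "change_left" else 0.0)
    return 0.1 * (target_y - state["y"])

state = {"gap_ahead": 12.0, "lane_y": 0.0, "y": 0.0}
maneuver = high_level_policy(state)
for _ in range(STEPS_PER_DECISION):
    state["y"] += low_level_control(maneuver, state)
print(maneuver, round(state["y"], 2))
```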
