Search Results (98)

Search Parameters:
Keywords = dueling network

27 pages, 3211 KiB  
Article
Hybrid Deep Learning-Reinforcement Learning for Adaptive Human-Robot Task Allocation in Industry 5.0
by Claudio Urrea
Systems 2025, 13(8), 631; https://doi.org/10.3390/systems13080631 - 26 Jul 2025
Viewed by 468
Abstract
Human-Robot Collaboration (HRC) is pivotal for flexible, worker-centric manufacturing in Industry 5.0, yet dynamic task allocation remains difficult because operator states—fatigue and skill—fluctuate abruptly. I address this gap with a hybrid framework that couples real-time perception and double-estimating reinforcement learning. A Convolutional Neural Network (CNN) classifies nine fatigue–skill combinations from synthetic physiological cues (heart rate, blink rate, posture, wrist acceleration); its outputs feed a Double Deep Q-Network (DDQN) whose state vector also includes task-queue and robot-status features. The DDQN optimises a multi-objective reward balancing throughput, workload and safety and executes at 10 Hz within a closed-loop pipeline implemented in MATLAB R2025a and RoboDK v5.9. Benchmarking on a 1000-episode HRC dataset (2500 allocations·episode⁻¹) shows the hybrid CNN+DDQN controller raises throughput to 60.48 ± 0.08 tasks·min⁻¹ (+21% vs. rule-based, +12% vs. SARSA, +8% vs. Dueling DQN, +5% vs. PPO), trims operator fatigue by 7% and sustains 99.9% collision-free operation (one-way ANOVA, p < 0.05; post-hoc power 1 − β = 0.87). Visual analyses confirm responsive task reallocation as fatigue rises or skill varies. The approach outperforms strong baselines (PPO, A3C, Dueling DQN) by mitigating Q-value over-estimation through double learning, providing robust policies under stochastic human states and offering a reproducible blueprint for multi-robot, Industry 5.0 factories. Future work will validate the controller on a physical Doosan H2017 cell and incorporate fairness constraints to avoid workload bias across multiple operators.
(This article belongs to the Section Systems Engineering)
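The double-learning mechanism this abstract credits with mitigating Q-value over-estimation has a standard form. Below is a minimal PyTorch sketch of the Double DQN target computation; the state dimension, action count, and layer sizes are illustrative assumptions, not the authors' implementation.
```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the paper's state vector (fatigue/skill class plus
# task-queue and robot-status features) and action set are not specified here.
STATE_DIM, N_ACTIONS, GAMMA = 16, 8, 0.99

q_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())

def ddqn_target(reward, next_state, done):
    """Double DQN: the online net selects the next action, the target net
    evaluates it, which curbs the over-estimation of vanilla Q-learning."""
    with torch.no_grad():
        best = q_net(next_state).argmax(dim=1, keepdim=True)        # selection
        next_q = target_net(next_state).gather(1, best).squeeze(1)  # evaluation
    return reward + GAMMA * next_q * (1.0 - done)
```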

24 pages, 8216 KiB  
Article
Application of Dueling Double Deep Q-Network for Dynamic Traffic Signal Optimization: A Case Study in Danang City, Vietnam
by Tho Cao Phan, Viet Dinh Le and Teron Nguyen
Mach. Learn. Knowl. Extr. 2025, 7(3), 65; https://doi.org/10.3390/make7030065 - 14 Jul 2025
Viewed by 503
Abstract
This study investigates the application of the Dueling Double Deep Q-Network (3DQN) algorithm to optimize traffic signal control at a major urban intersection in Danang City, Vietnam. The objective is to enhance signal timing efficiency in response to mixed traffic flow and real-world traffic dynamics. A simulation environment was developed using the Simulation of Urban Mobility (SUMO) software version 1.11, incorporating both a fixed-time signal controller and two 3DQN models trained with 1 million (1M-Step) and 5 million (5M-Step) iterations. The models were evaluated using randomized traffic demand scenarios ranging from 50% to 150% of baseline traffic volumes. The results demonstrate that the 3DQN models outperform the fixed-time controller, significantly reducing vehicle delays, with the 5M-Step model achieving average waiting times of under five minutes. To further assess the model’s responsiveness to real-time conditions, traffic flow data were collected using YOLOv8 for object detection and SORT for vehicle tracking from live camera feeds, and integrated into the SUMO-3DQN simulation. The findings highlight the robustness and adaptability of the 3DQN approach, particularly under peak traffic conditions, underscoring its potential for deployment in intelligent urban traffic management systems.
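The dueling decomposition at the core of 3DQN also has a textbook form; a sketch follows, with layer sizes as assumptions and the paper's SUMO state encoding omitted.
```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.value = nn.Linear(128, 1)               # state value V(s)
        self.advantage = nn.Linear(128, n_actions)   # per-action advantage A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        a = self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable.
        return self.value(h) + a - a.mean(dim=1, keepdim=True)
```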

31 pages, 17361 KiB  
Article
Path Planning Design and Experiment for a Recirculating Aquaculture AGV Based on Hybrid NRBO-ACO with Dueling DQN
by Zhengjiang Guo, Yingkai Xia, Jiajun Liu, Jian Gao, Peng Wan and Kan Xu
Drones 2025, 9(7), 476; https://doi.org/10.3390/drones9070476 - 5 Jul 2025
Viewed by 252
Abstract
This study introduces an advanced automated guided vehicle (AGV) specifically designed for application in recirculating aquaculture systems (RASs). The proposed AGV seamlessly integrates automated feeding, real-time monitoring, and an intelligent path-planning system to enhance operational efficiency. To achieve optimal and adaptive navigation, a hybrid algorithm is developed, incorporating Newton–Raphson-based optimisation (NRBO) alongside ant colony optimisation (ACO). Additionally, dueling deep Q-networks (dueling DQNs) dynamically optimise critical parameters, thereby improving the algorithm’s adaptability to the complexities of RAS environments. Both simulation-based and real-world experiments substantiate the system’s effectiveness, demonstrating superior convergence speed, path quality, and overall operational efficiency compared to traditional methods. The findings of this study highlight the potential of AGVs to enhance precision and sustainability in recirculating aquaculture management.
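NRBO builds on the classical Newton–Raphson update; for orientation, a sketch of that underlying iteration follows (this is the textbook method, not the paper's hybrid NRBO-ACO planner).
```python
def newton_raphson(f, df, x0, tol=1e-9, max_iter=50):
    """Classic Newton-Raphson iteration x_{k+1} = x_k - f(x_k) / f'(x_k),
    the root-finding update that NRBO-style optimisers generalise."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: solve x^2 = 2.
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```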

31 pages, 9881 KiB  
Article
Guide Robot Based on Image Processing and Path Planning
by Chen-Hsien Yang and Jih-Gau Juang
Machines 2025, 13(7), 560; https://doi.org/10.3390/machines13070560 - 27 Jun 2025
Viewed by 288
Abstract
While guide dogs remain the primary aid for visually impaired individuals, robotic guides continue to be an important area of research. This study introduces an indoor guide robot designed to physically assist a blind person by holding their hand with a robotic arm and guiding them to a specified destination. To enable hand-holding, we employed a camera combined with object detection to identify the human hand and a closed-loop control system to manage the robotic arm’s movements. For path planning, we implemented a Dueling Double Deep Q Network (D3QN) enhanced with a genetic algorithm. To address dynamic obstacles, the robot utilizes a depth camera alongside fuzzy logic to control its wheels and navigate around them. A 3D point cloud map is generated to determine the start and end points accurately. The D3QN algorithm, with its variables tuned by the genetic algorithm, is then used to plan the robot’s path. As a result, the robot can safely guide blind individuals to their destinations without collisions.
(This article belongs to the Special Issue Autonomous Navigation of Mobile Robots and UAVs, 2nd Edition)
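The abstract does not specify how the genetic algorithm defines the D3QN's variables; a generic real-valued GA of the kind typically used for such hyperparameter tuning is sketched below, with all ranges, rates, and the toy fitness being assumptions.
```python
import random

def genetic_search(fitness, bounds, pop_size=20, generations=30, mut_rate=0.2):
    """Plain real-valued GA: rank selection, midpoint crossover, uniform mutation."""
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                    # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]   # midpoint crossover
            for i, (lo, hi) in enumerate(bounds):
                if random.random() < mut_rate:            # mutate: resample gene
                    child[i] = random.uniform(lo, hi)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Example: tune (discount factor, learning rate) against a toy fitness.
best = genetic_search(lambda g: -(g[0] - 0.95) ** 2 - (g[1] - 1e-3) ** 2,
                      bounds=[(0.8, 0.999), (1e-4, 1e-2)])
```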

19 pages, 3650 KiB  
Article
Enhanced-Dueling Deep Q-Network for Trustworthy Physical Security of Electric Power Substations
by Nawaraj Kumar Mahato, Junfeng Yang, Jiaxuan Yang, Gangjun Gong, Jianhong Hao, Jing Sun and Jinlu Liu
Energies 2025, 18(12), 3194; https://doi.org/10.3390/en18123194 - 18 Jun 2025
Viewed by 369
Abstract
This paper introduces an Enhanced-Dueling Deep Q-Network (EDDQN) specifically designed to bolster the physical security of electric power substations. We model the intricate substation security challenge as a Markov Decision Process (MDP), segmenting the facility into three zones, each with potential normal, suspicious, or attacked states. The EDDQN agent learns to strategically select security actions, aiming for optimal threat prevention while minimizing disruptive errors and false alarms. This methodology integrates Double DQN for stable learning, Prioritized Experience Replay (PER) to accelerate the learning process, and a sophisticated neural network architecture tailored to the complexities of multi-zone substation environments. Empirical evaluation using synthetic data derived from historical incident patterns demonstrates the significant advantages of EDDQN over other standard DQN variations, yielding an average reward of 7.5, a threat prevention success rate of 91.1%, and a notably low false alarm rate of 0.5%. The learned action policy exhibits a proactive security posture, establishing EDDQN as a promising and reliable intelligent solution for enhancing the physical resilience of power substations against evolving threats. This research directly addresses the critical need for adaptable and intelligent security mechanisms within the electric power infrastructure.
(This article belongs to the Special Issue Energy, Electrical and Power Engineering: 3rd Edition)
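Of the components EDDQN combines, prioritized experience replay is the most self-contained; a minimal proportional-PER sketch (Schaul et al.) follows, with capacity, alpha, and the transition format as assumptions.
```python
import random

class PrioritizedReplay:
    """Proportional PER: sampling probability grows with the TD error."""
    def __init__(self, capacity=10000, alpha=0.6, eps=1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.buffer, self.priorities = [], []

    def push(self, transition, td_error=1.0):
        if len(self.buffer) >= self.capacity:        # drop the oldest entry
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        idxs = random.choices(range(len(self.buffer)),
                              weights=self.priorities, k=batch_size)
        return idxs, [self.buffer[i] for i in idxs]

    def update(self, idxs, td_errors):
        for i, err in zip(idxs, td_errors):          # refresh after each step
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```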

29 pages, 5292 KiB  
Article
Path Planning for Lunar Rovers in Dynamic Environments: An Autonomous Navigation Framework Enhanced by Digital Twin-Based A*-D3QN
by Wei Liu, Gang Wan, Jia Liu and Dianwei Cong
Aerospace 2025, 12(6), 517; https://doi.org/10.3390/aerospace12060517 - 8 Jun 2025
Viewed by 619
Abstract
In lunar exploration missions, rovers must navigate multiple waypoints within strict time constraints while avoiding dynamic obstacles, demanding real-time, collision-free path planning. This paper proposes a digital twin-enhanced hierarchical planning method, A*-D3QN-Opt (A-Star-Dueling Double Deep Q-Network-Optimized). The framework combines the A* algorithm for global optimal paths in static environments with an improved D3QN (Dueling Double Deep Q-Network) for dynamic obstacle avoidance. A multi-dimensional reward function balances path efficiency, safety, energy, and time, while priority experience replay accelerates training convergence. A high-fidelity digital twin simulation environment integrates a YOLOv5-based multimodal perception system for real-time obstacle detection and distance estimation. Experimental validation across low-, medium-, and high-complexity scenarios demonstrates superior performance: the method achieves shorter paths, zero collisions in dynamic settings, and 30% faster convergence than baseline D3QN. Results confirm its ability to harmonize optimality, safety, and real-time adaptability under dynamic constraints, offering critical support for autonomous navigation in lunar missions like Chang’e and future deep space exploration, thereby reducing operational risks and enhancing mission efficiency.
(This article belongs to the Section Astronautics & Space Science)
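The global-planning half of A*-D3QN-Opt is the textbook A* search; a minimal grid version with a Manhattan heuristic is sketched below (uniform step costs are an assumption; the paper's lunar-terrain costs are not modelled).
```python
import heapq

def a_star(grid, start, goal):
    """A* on a 4-connected occupancy grid (1 = obstacle), Manhattan heuristic."""
    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_set = [(h(start), 0, start, [start])]   # entries are (f, g, node, path)
    seen = set()
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                heapq.heappush(open_set, (g + 1 + h((nr, nc)), g + 1,
                                          (nr, nc), path + [(nr, nc)]))
    return None  # no path exists

# Example: a_star([[0, 0, 0], [1, 1, 0], [0, 0, 0]], (0, 0), (2, 0))
```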

25 pages, 7158 KiB  
Article
Anti-Jamming Decision-Making for Phased-Array Radar Based on Improved Deep Reinforcement Learning
by Hang Zhao, Hu Song, Rong Liu, Jiao Hou and Xianxiang Yu
Electronics 2025, 14(11), 2305; https://doi.org/10.3390/electronics14112305 - 5 Jun 2025
Viewed by 612
Abstract
In existing phased-array radar systems, anti-jamming strategies are mainly generated through manual judgment, yet manually designing or selecting anti-jamming decisions is difficult and unreliable in complex jamming environments. Reinforcement learning has therefore been applied to anti-jamming decision-making, but existing reinforcement learning models for this task often suffer from low convergence speeds and low decision-making accuracy. In this paper, a multi-aspect improved deep Q-network (MAI-DQN) is proposed to improve the exploration policy, the network structure, and the training methods of the deep Q-network. To address the ϵ-greedy strategy’s strong dependence on hyperparameter settings, and the Q-value in other deep Q-networks being overly influenced by the action, this paper proposes a structure that combines a noisy network, a dueling network, and a double deep Q-network, incorporating an adaptive exploration policy into the neural network and increasing the influence of the state itself on the Q-value. These enhancements enable a highly adaptive exploration strategy and a high-performance network architecture, thereby improving the decision-making accuracy of the model. To calculate the target value more accurately during training and stabilize parameter updates, this paper proposes a training method that combines n-step learning, target soft update, a variable learning rate, and gradient clipping. Moreover, a novel variable double-depth priority experience replay (VDDPER) method that more closely simulates the storage and update mechanism of human memory is used in the MAI-DQN. VDDPER improves decision-making accuracy by dynamically adjusting the sample size according to the value of experiences during training, enhancing exploration in the early stages and placing greater emphasis on high-value experiences in the later stages. These training enhancements improve the model’s convergence speed. In addition, a reward function combining signal-level and data-level benefits is proposed to adapt to complex jamming environments, ensuring fast reward convergence with fewer computational resources. Simulation results show that the proposed phased-array radar anti-jamming decision-making method based on MAI-DQN achieves a high convergence speed and high decision-making accuracy in environments where deceptive jamming and suppressive jamming coexist.
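Two of the named training methods, target soft update and n-step learning, have standard forms; the sketch below shows those generic forms, with tau, gamma, and the clipping threshold as assumptions rather than the paper's settings.
```python
import torch

def soft_update(target_net, online_net, tau=0.005):
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target."""
    for tp, op in zip(target_net.parameters(), online_net.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * op.data)

def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """n-step target: r_0 + gamma r_1 + ... + gamma^n * Q_target(s_n, a_n)."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Gradient clipping, applied before each optimiser step (threshold assumed):
# torch.nn.utils.clip_grad_norm_(online_net.parameters(), max_norm=10.0)
```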

19 pages, 713 KiB  
Article
LLM-Assisted Reinforcement Learning for U-Shaped and Circular Hybrid Disassembly Line Balancing in IoT-Enabled Smart Manufacturing
by Xiwang Guo, Chi Jiao, Jiacun Wang, Shujin Qin, Bin Hu, Liang Qi, Xianming Lang and Zhiwei Zhang
Electronics 2025, 14(11), 2290; https://doi.org/10.3390/electronics14112290 - 4 Jun 2025
Viewed by 507
Abstract
With the sharp increase in the number of products and the development of the remanufacturing industry, disassembly lines have become the mainstream recycling method. Given the insufficient research on the layout of multi-form disassembly lines and human factors, we previously proposed a linear-U-shaped hybrid layout that considers employee posture constraints and a Duel-DQN algorithm assisted by a Large Language Model (LLM). However, there is still room for improvement in workstation utilization efficiency. Building on that work, this study proposes an innovative U-shaped and circular disassembly line layout and retains the employee posture constraints. The LLM is instruction-fine-tuned using the Quantized Low-Rank Adaptation (QLoRA) technique to improve the accuracy of disassembly sequence generation, and the Dueling Deep Q-Network (Duel-DQN) algorithm is reconstructed to maximize profit under posture constraints. Experiments show that, in the more complex U-shaped and circular layout, this method still improves iteration efficiency by about 26% over the traditional Duel-DQN, and its profit is close to the optimal solution from the traditional CPLEX solver, verifying the feasibility of the algorithm in complex scenarios. This study further optimizes the layout problem of multi-form disassembly lines and provides an innovative solution that accounts for both human factors and computational efficiency, which has important theoretical and practical significance.
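QLoRA instruction fine-tuning of the kind described is commonly set up with the Hugging Face peft and bitsandbytes libraries; a hedged sketch follows, in which the base model name and all adapter hyperparameters are placeholders rather than the authors' choices.
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization of the frozen base model (QLoRA's memory saving).
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
# Hypothetical base model; the abstract does not name the LLM being tuned.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
                                             quantization_config=bnb)
# Low-rank adapters on the attention projections; ranks are assumptions.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapters are trainable
```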

23 pages, 4463 KiB  
Article
Dual-Priority Delayed Deep Double Q-Network (DPD3QN): A Dueling Double Deep Q-Network with Dual-Priority Experience Replay for Autonomous Driving Behavior Decision-Making
by Shuai Li, Peicheng Shi, Aixi Yang, Heng Qi and Xinlong Dong
Algorithms 2025, 18(5), 291; https://doi.org/10.3390/a18050291 - 19 May 2025
Viewed by 442
Abstract
The behavior decision control of autonomous vehicles is a critical aspect of advancing autonomous driving technology. However, current behavior decision algorithms based on deep reinforcement learning still face several challenges, such as insufficient safety and sparse reward mechanisms. To solve these problems, this paper proposes DPD3QN, a dueling double deep Q-network based on dual-priority experience replay. Initially, the dueling network is integrated with the double deep Q-network, and the original network’s output layer is restructured to enhance the precision of action value estimation. Subsequently, dual-priority experience replay is incorporated to help the model swiftly recognize and exploit critical experiences. Finally, training and evaluation are conducted on the OpenAI Gym simulation platform. The test results show that DPD3QN speeds up the convergence of autonomous driving behavior decision-making. Compared with the currently popular DQN and DDQN algorithms, it achieves higher success rates in challenging scenarios: in test scenario I the success rate rises by 11.8 and 25.8 percentage points, respectively, and in test scenario II by 8.8 and 22.2 percentage points, indicating a more secure and efficient autonomous driving decision-making capability.
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)

28 pages, 4738 KiB  
Article
AEM-D3QN: A Graph-Based Deep Reinforcement Learning Framework for Dynamic Earth Observation Satellite Mission Planning
by Shuo Li, Gang Wang and Jinyong Chen
Aerospace 2025, 12(5), 420; https://doi.org/10.3390/aerospace12050420 - 9 May 2025
Viewed by 591
Abstract
Efficient and adaptive mission planning for Earth Observation Satellites (EOSs) remains a challenging task due to the growing complexity of user demands, task constraints, and limited satellite resources. Traditional heuristic and metaheuristic approaches often struggle with scalability and adaptability in dynamic environments. To overcome these limitations, we introduce AEM-D3QN, a novel intelligent task scheduling framework that integrates Graph Neural Networks (GNNs) with an Adaptive Exploration Mechanism-enabled Double Dueling Deep Q-Network (D3QN). This framework constructs a Directed Acyclic Graph (DAG) atlas to represent task dependencies and constraints, leveraging GNNs to extract spatial–temporal task features. These features are then encoded into a reinforcement learning model that dynamically optimizes scheduling policies under multiple resource constraints. The adaptive exploration mechanism improves learning efficiency by balancing exploration and exploitation based on task urgency and satellite status. Extensive experiments conducted under both periodic and emergency planning scenarios demonstrate that AEM-D3QN outperforms state-of-the-art algorithms in scheduling efficiency, response time, and task completion rate. The proposed framework offers a scalable and robust solution for real-time satellite mission planning in complex and dynamic operational environments.
(This article belongs to the Section Astronautics & Space Science)
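The adaptive exploration rule itself is not given in the abstract; one plausible reading, shrinking epsilon when tasks are urgent and relaxing it when resources are slack, is sketched below, with the exact weighting being an assumption.
```python
import random

def adaptive_epsilon(base_eps, urgency, resource_margin, floor=0.01, ceil=1.0):
    """Hypothetical adaptive schedule: urgency in [0, 1] suppresses exploration,
    spare resources (margin in [0, 1]) encourage it."""
    eps = base_eps * (1.0 - urgency) * (0.5 + 0.5 * resource_margin)
    return min(max(eps, floor), ceil)

def act(q_values, eps):
    """Standard epsilon-greedy draw over the adapted epsilon."""
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```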

23 pages, 6216 KiB  
Article
A Macro-Control and Micro-Autonomy Pathfinding Strategy for Multi-Automated Guided Vehicles in Complex Manufacturing Scenarios
by Jiahui Le, Lili He and Junhong Zheng
Appl. Sci. 2025, 15(10), 5249; https://doi.org/10.3390/app15105249 - 8 May 2025
Viewed by 507
Abstract
To effectively plan the travel paths of automated guided vehicles (AGVs) in complex manufacturing scenarios and avoid dynamic obstacles, this paper proposes a pathfinding strategy that integrates macro-control and micro-autonomy. At the macro level, a central system employs a modified A* algorithm for preliminary pathfinding, guiding the AGVs toward their targets. At the micro level, a distributed system incorporates a navigation and obstacle avoidance strategy trained by Prioritized Experience Replay Double Dueling Deep Q-Network with ε-Dataset Aggregation (PER-D3QN-EDAgger). Each AGV integrates its current state with information from the central system and the neighboring AGVs to make autonomous pathfinding decisions. The experimental results indicate that this strategy exhibits strong adaptability to diverse environments, low path costs, and rapid solution speeds. It effectively avoids neighboring AGVs and other dynamic obstacles, and maintains a high task completion rate of over 95% when the number of AGVs is below 200 and the obstacle density is below 0.5. This approach combines the advantages of centralized pathfinding, which ensures high path quality, with distributed planning, which enhances adaptability to dynamic environments.

28 pages, 16065 KiB  
Article
Optimization of Adaptive Observation Strategies for Multi-AUVs in Complex Marine Environments Using Deep Reinforcement Learning
by Jingjing Zhang, Weidong Zhou, Xiong Deng, Shuo Yang, Chunwang Yang and Hongliang Yin
J. Mar. Sci. Eng. 2025, 13(5), 865; https://doi.org/10.3390/jmse13050865 - 26 Apr 2025
Cited by 1 | Viewed by 505
Abstract
This paper explores the application of Deep Reinforcement Learning (DRL) to optimize adaptive observation strategies for multi-AUV systems in complex marine environments. Traditional algorithms struggle with the strong coupling between environmental information and observation modeling, making it challenging to derive optimal strategies. To address this, we designed a DRL framework based on the Dueling Double Deep Q-Network (D3QN), enabling AUVs to interact directly with the environment for more efficient 3D dynamic ocean observation. However, traditional D3QN faces slow convergence and weak action–decision correlation in partially observable, dynamic marine settings. To overcome these challenges, we integrate a Gated Recurrent Unit (GRU) into the D3QN, improving state-space prediction and accelerating reward convergence. This enhancement allows AUVs to optimize observations, leverage ocean currents, and navigate obstacles while minimizing energy consumption. Experimental results demonstrate that the proposed approach excels in safety, energy efficiency, and observation effectiveness. Additionally, experiments with three, five, and seven AUVs reveal that while increasing platform numbers enhances predictive accuracy, the benefits diminish with additional units.
(This article belongs to the Special Issue Underwater Observation Technology in Marine Environment)
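A dueling Q-network fed by a GRU over the observation history, as the abstract describes, can be sketched as follows; hidden sizes and the observation encoding are assumptions.
```python
import torch
import torch.nn as nn

class GRUDuelingQNet(nn.Module):
    """D3QN head on top of a GRU, so the Q-values condition on the history
    of partial observations rather than on a single frame."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, h0=None):
        out, h_n = self.gru(obs_seq, h0)     # out: (batch, time, hidden)
        h = out[:, -1]                       # summary of the history so far
        a = self.advantage(h)
        q = self.value(h) + a - a.mean(dim=1, keepdim=True)
        return q, h_n                        # carry h_n into the next step
```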

18 pages, 1793 KiB  
Article
FL-MD3QN-Based IoT Intelligent Access Algorithm for Smart Construction Sites
by Qiangwen Zong, Jiaxiang Xu, Wenqiang Li, Feng Pan, Wenting Wang, Yang Liao and Yong Liao
Electronics 2025, 14(7), 1372; https://doi.org/10.3390/electronics14071372 - 29 Mar 2025
Viewed by 492
Abstract
With the deployment of fifth-generation (5G) mobile communication technology and rapid advancements in artificial intelligence and edge computing, smart construction sites have emerged as a critical direction for the construction industry’s transformation and upgrading. However, existing intelligent Internet of Things (IoT) access algorithms often struggle to simultaneously meet practical requirements for high-efficiency data transmission rates, low latency, and secure privacy-aware access in the dynamic and complex environments of smart construction sites. To address this, this paper proposes a federated-learning-based Multi-Objective Dueling Double Deep Q-Network (FL-MD3QN) IoT access algorithm for multi-site, multi-modal, multi-user IoT systems under the same Base Station (BS). First, a three-objective optimization model is established, with the goals of maximizing data transmission rates, minimizing transmission delays, and maximizing reliability, under constraints on bandwidth, rate, bit error rate (BER), and security/privacy. Second, the FL-MD3QN algorithm is developed to solve this optimization problem. The algorithm adaptively adjusts the access strategy to cope with the complex and ever-changing communication needs of smart construction sites and, by introducing a federated learning mechanism, achieves collaborative optimization across multiple construction-site IoT systems while preserving user privacy. Simulation results demonstrate significant advantages of the FL-MD3QN algorithm. For latency, it achieved markedly lower delays across multi-modal services compared to benchmark algorithms, with the shortest training time. In transmission rates, FL-MD3QN delivered the highest average rates, particularly excelling in video services. Under high signal-to-noise ratio conditions, FL-MD3QN achieved exceptionally low BER values. Additionally, it attained high levels in average access success rate and average reward value, confirming its robust adaptability and optimization performance in complex smart construction environments.
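The federated mechanism is plausibly FedAvg-style weight averaging across sites; a minimal sketch follows, where the data-size weighting is an assumption about how FL-MD3QN aggregates site updates.
```python
def fed_avg(client_states, client_sizes):
    """FedAvg (McMahan et al.): average client state_dicts, weighted by the
    amount of local data, without the raw data ever leaving each site."""
    total = float(sum(client_sizes))
    return {key: sum(state[key] * (n / total)
                     for state, n in zip(client_states, client_sizes))
            for key in client_states[0]}

# Each site trains its local MD3QN, then the base station aggregates:
# global_state = fed_avg([site.q_net.state_dict() for site in sites],
#                        [site.num_samples for site in sites])
```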

16 pages, 1040 KiB  
Article
Trade-Offs in Navigation Problems Using Value-Based Methods
by Petra Csereoka and Mihai V. Micea
AI 2025, 6(3), 53; https://doi.org/10.3390/ai6030053 - 10 Mar 2025
Viewed by 818
Abstract
Deep Q-Networks (DQNs) have shown remarkable results over the last decade in scenarios ranging from simple 2D fully observable short episodes to partially observable, graphically intensive, and complex tasks. However, the base architecture of a vanilla DQN presents several shortcomings, some of which were mitigated by new variants focusing on increased stability, faster convergence, and time dependencies. These additions, on the other hand, bring increased costs in terms of required memory and lengthier training times. In this paper, we analyze the performance of state-of-the-art DQN families on a simple partially observable mission created in Minecraft and try to determine the optimal architecture for such problem classes in terms of cost and accuracy. To the best of our knowledge, the analyzed methods have not been tested on the same scenario before, and hence a more in-depth comparison is required to better understand the real performance improvement they provide. This manuscript also offers a detailed overview of state-of-the-art DQN methods, together with the training heuristics and performance metrics recorded during the proposed mission, allowing researchers to select better-suited models for future problems. Our experiments show that Double DQN networks handle partially observable scenarios gracefully while maintaining a low hardware footprint, Recurrent Double DQNs are a good candidate even when resources must be restricted, and double-dueling DQNs are a well-performing middle ground in terms of cost and performance.
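All the compared families extend the same vanilla DQN temporal-difference loss; for reference, a minimal PyTorch sketch of that baseline follows, with the batch layout as an assumption.
```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Vanilla DQN TD loss: Q(s, a) regressed toward r + gamma * max_a' Q'(s', a')."""
    s, a, r, s2, d = batch                   # assumed tensor batch layout
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s2).max(dim=1).values * (1.0 - d)
    return F.smooth_l1_loss(q, target)       # Huber loss for stability
```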

30 pages, 22571 KiB  
Article
Joint Pricing, Server Orchestration and Network Slice Deployment in Mobile Edge Computing Networks
by Yijian Hou, Kaisa Zhang, Gang Chuai, Weidong Gao, Xiangyu Chen and Siqi Liu
Electronics 2025, 14(5), 841; https://doi.org/10.3390/electronics14050841 - 21 Feb 2025
Viewed by 770
Abstract
The integration of mobile edge computing (MEC) and network slicing can provide low-latency and customized services. In such integrated wireless networks, we propose a pricing-driven joint MEC server orchestration and network slice deployment scheme (PD-JSOSD), jointly solving the pricing, MEC server orchestration and network slice deployment problems. We divide the system into an infrastructure provider layer (IPL), a network planning layer (NPL) and a resource allocation layer (RAL), and propose a three-stage Stackelberg game to describe their relationships. To obtain the Stackelberg equilibrium, we propose a three-layer deep reinforcement learning (DRL) algorithm. Specifically, the dueling double deep Q-network (D3QN) is used in the IPL, and DRL with a branching dueling Q-network (BDQ) is used in the NPL and the RAL to cope with the large-scale discrete action spaces. Moreover, we propose an innovative illegal-action modification algorithm to improve the convergence of the BDQ. Simulations verify the convergence of the three-layer DRL and the superiority of modified-BDQ in dealing with large-scale action spaces, where modified-BDQ improves convergence by 21.9% and 28.3%. Furthermore, compared with the benchmark algorithms, JSOSD in the NPL and the RAL improves system utility by up to 52.1%, proving the superiority of the server orchestration and slice deployment scheme.
(This article belongs to the Special Issue New Advances in Distributed Computing and Its Applications)
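The illegal-action modification is not detailed in the abstract; a common realisation, masking illegal actions to negative infinity before the greedy pick, is sketched below, and whether modified-BDQ does exactly this is an assumption.
```python
import torch

def masked_argmax(q_values, legal_mask):
    """Clamp Q-values of illegal actions to -inf so the greedy policy
    (and the bootstrapped target) can never select them."""
    masked = q_values.masked_fill(~legal_mask, float("-inf"))
    return masked.argmax(dim=-1)

# Example: 4 actions, action 2 illegal in this state.
q = torch.tensor([[0.3, 1.2, 5.0, -0.4]])
mask = torch.tensor([[True, True, False, True]])
action = masked_argmax(q, mask)  # -> tensor([1])
```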
