MDPI - Publisher of Open Access Journals

22 pages, 4519 KB

Open Access Article

Multi-Level Attention Dueling Double Deep Q-Network for Local Path Planning

by Hepengfei Wang, Jie Huang, Nan Wang and Huajie Hong

Appl. Sci. 2026, 16(12), 6235; https://doi.org/10.3390/app16126235 - 21 Jun 2026

Viewed by 199

Deep reinforcement learning (DRL) has shown considerable potential in local path planning for autonomous robots. However, existing DRL methods still suffer from limited training efficiency, poor generalization, and weak sim-to-real transferability in complex environments. To address these issues, this paper proposes a Multi-Level Attention Dueling Double Deep Q-Network (MLA-D3QN) framework, which progressively enhances feature extraction, spatial perception, and modality fusion through three attention levels: rule-based attention for obstacle contour extraction, implicit neural multi-scale spatial attention for environment perception, and bidirectional cross-attention for multi-modal feature alignment. Simulation results show that MLA-D3QN outperforms baseline and comparison methods in terms of convergence speed and average reward. Real-world experiments are conducted on a Scout mini platform with 50 trials in simple task scenarios (sparse obstacles, short distance) and 50 trials in complex task scenarios (dense obstacles, long distance). The proposed method achieves success rates of 98% in simple tasks and 94% in complex tasks. Compared to CNN-D3QN and D3QN, MLA-D3QN improves success rates by 10 percentage points (vs. CNN-D3QN) and 38 percentage points (vs. D3QN) in simple tasks, and by 34 percentage points (vs. CNN-D3QN) and 84 percentage points (vs. D3QN) in complex tasks. Path costs are reduced by 24.0% (vs. CNN-D3QN) and 59.9% (vs. D3QN). These results validate the effectiveness of MLA-D3QN in improving generalization and sim-to-real transferability for local path planning in complex environments. Full article
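
The D3QN backbone named in this abstract combines two standard ideas: a dueling head that splits Q-values into a state value and per-action advantages, and a Double DQN target in which the online network picks the next action and the target network scores it. The following is a minimal sketch of those two textbook pieces only, not the authors' MLA-D3QN code; the function names and the plain-list interface are assumptions made for illustration.

```python
from typing import Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> list[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def double_dqn_target(
    reward: float,
    done: bool,
    gamma: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
) -> float:
    """Double DQN target: the online net selects the action, the target net evaluates it."""
    if done:
        return reward  # no bootstrapping past a terminal state
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the call shape.
    print(dueling_q_values(1.0, [0.5, -0.5, 0.0]))
    print(double_dqn_target(1.0, False, 0.9, [1.5, 0.5, 1.0], [2.0, 3.0, 1.0]))
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, and decoupling action selection from evaluation is the usual way to reduce Q-value overestimation.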

28 pages, 8499 KB

Open Access Article

A Load-Aware Task Offloading Method for Mobile Edge Computing Under Eligibility Constraints

by Yarong Liu, Zijian Che and Xiaolan Xie

Future Internet 2026, 18(6), 317; https://doi.org/10.3390/fi18060317 - 10 Jun 2026

Viewed by 272

Abstract

Mobile edge computing (MEC) enables computation-intensive and latency-sensitive tasks to be offloaded from mobile devices to nearby edge servers. Most existing MEC task offloading studies formulate offloading as a selection problem over a fixed or fully available set of candidate servers, which is restrictive in heterogeneous MEC scenarios with task-node eligibility constraints. Under such constraints, a task can be processed by an edge server only when task attributes, service requirements, link conditions, and node states jointly satisfy the corresponding eligibility conditions. The feasible action set therefore varies over time, while offloading decisions are further coupled with edge-node-side queue competition and long-term load evolution. To address this problem, this paper proposes Resource-oriented Scheduling Coordination (RoSCo), a load-aware task offloading method with scheduling-level constraint handling for eligibility-constrained MEC systems. In this paper, scheduling coordination refers to the joint use of feasible-action control, priority-aware edge-node service-order modeling, and load-responsive feedback within the task offloading decision process; it does not denote inter-server communication, task aggregation, federated model aggregation, or a distributed coordination protocol. RoSCo constructs a dynamic feasible action set, applies eligibility-aware action masking to exclude infeasible offloading actions, incorporates priority-aware edge-node service-order information to characterize queueing competition among heterogeneous tasks, and designs a load-responsive reward to guide congestion mitigation and load balancing. A dueling double deep Q-network (D3QN) is adopted as the value-learning backbone, while the main methodological contribution lies in embedding task-specific feasible-action control, priority-aware node-side queue information, and load-responsive feedback into the constrained offloading process. 
Simulation results show that RoSCo reduces the task drop rate and edge-node load imbalance while maintaining competitive task completion delay and energy consumption, especially under high-load and sparse-eligibility conditions. Full article
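
Eligibility-aware action masking, as described in this abstract, can be illustrated by excluding infeasible servers before the greedy action is chosen. This is a minimal sketch under stated assumptions: the function name, the boolean eligibility vector, and the `None` return for an empty feasible set are all hypothetical and are not taken from RoSCo itself.

```python
import math
from typing import Optional, Sequence


def masked_greedy_action(
    q_values: Sequence[float], eligible: Sequence[bool]
) -> Optional[int]:
    """Return the index of the best eligible action, or None if none is eligible."""
    if not any(eligible):
        return None  # assumption: the caller handles an empty feasible set
    # Infeasible actions get -inf so they can never win the argmax.
    masked = [q if ok else -math.inf for q, ok in zip(q_values, eligible)]
    return max(range(len(masked)), key=lambda i: masked[i])


if __name__ == "__main__":
    # Hypothetical Q-values for three edge servers; server 1 is ineligible.
    print(masked_greedy_action([0.2, 0.9, 0.5], [True, False, True]))
```

Because the mask is recomputed at every decision step, the feasible action set can change over time without changing the network's output dimension.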

34 pages, 68569 KB

Open Access Article

Perception-Aware Cooperative Path Planning for Multi-UAV Systems in Urban Wind Fields via Deep Reinforcement Learning

by Jie Ding, Linshen Wang, Shuxin Jin and Di Wang

Sensors 2026, 26(10), 2960; https://doi.org/10.3390/s26102960 - 8 May 2026

Viewed by 837

Abstract

The safe deployment of multiple Unmanned Aerial Vehicles (UAVs) in complex urban environments relies heavily on accurate environmental perception and efficient cooperative path planning. However, executing multi-UAV operations in low-altitude airspaces faces severe challenges due to the dual constraints of complex building clusters and steady-state wind field disturbances. These dynamic environmental factors frequently distort sensory expectations, inducing trajectory drift and degrading policy robustness. To address these limitations, this paper proposes an enhanced Dueling Double Deep Q-Network (D3QN) algorithm, termed NPD3QN, tailored for perception-aware multi-UAV cooperative path planning. By formulating the perceived environmental data (e.g., wind speed, obstacle distances, and inter-UAV states) into a Markov Decision Process, an N-step update strategy is integrated to enhance the characterization of long-term returns. Simultaneously, an improved Prioritized Experience Replay (PER) mechanism is developed to actively filter negative experiences and assign dynamic weights to critical state-action samples, thereby significantly elevating training stability. A 3D urban kinematic environment incorporating a steady-state simulated wind field is constructed. Extensive ablation and comparative results demonstrate that NPD3QN effectively maps high-dimensional state perceptions to robust control commands. In wind-disturbed scenarios, it generates highly streamlined cooperative trajectories, reducing the total path length by approximately 11.7% compared to the standard D3QN baseline. While currently evaluated within steady-state simulated constraints, this study establishes a robust, sensor-driven methodological foundation for autonomous multi-UAV cooperative path planning in wind-disturbed airspaces. Full article
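
The N-step update mentioned in this abstract replaces the one-step bootstrap with a sum of several discounted rewards followed by a single bootstrapped value. The sketch below shows the generic N-step return only; the function name and arguments are illustrative assumptions, and how NPD3QN obtains the bootstrap value is not specified here.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Compute r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value."""
    g = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # Hypothetical 3-step rollout with a bootstrapped value of 4.0.
    print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=4.0, gamma=0.5))
```

Longer horizons propagate reward information faster at the cost of higher variance, which is the usual trade-off when choosing N.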

(This article belongs to the Section Navigation and Positioning)

25 pages, 4053 KB

Open Access Article

Resource Allocation for D2D Communications in Multi-Slice NOMA-Based Cellular Networks

by Lijun Dong, Jingjing Wu and Yitong Yang

Future Internet 2026, 18(5), 246; https://doi.org/10.3390/fi18050246 - 6 May 2026

Viewed by 250

Abstract

Significant challenges will be encountered in next-generation cellular networks to achieve both high spectral efficiency (SE) and diverse quality of service (QoS) requirements simultaneously, particularly under stringent bandwidth and power budgets within highly dynamic and dense topologies. To address these challenges, we formulate an optimization problem in a multi-slice non-orthogonal multiple access (NOMA) system with underlay device-to-device (D2D) communications. This problem aims to maximize SE and satisfy user QoS demands by jointly optimizing power allocation and resource block (RB) assignment. To solve this non-convex and NP-hard problem, we propose a resource allocation mechanism based on joint optimization and cooperative multi-agent deep reinforcement learning (MADRL). Specifically, we construct an optimization framework based on successive convex approximation (SCA) and the Lagrange duality method to derive an analytical iterative solution for the optimal power allocation under a given RB assignment, thereby avoiding the inherent discretization error of the action space in pure learning methods. Furthermore, we propose a cooperative multi-agent algorithm based on dueling double deep Q-Network (CMAD3QN) to address the discrete RB assignment problem. Simulation results demonstrate that, compared with benchmark schemes, the proposed scheme exhibits faster convergence speed and significantly enhances system spectral efficiency while ensuring slice isolation and resource constraints. Full article

(This article belongs to the Special Issue 6G Wireless Network Technologies)

45 pages, 7679 KB

Open Access Article

Conquering the Urban Firefighting Challenge: A Deep Q-Network Approach for Autonomous UAV Navigation

by Shafiqul Alam Khan, Damian Valles, Marcelo M. Carvalho and Wenquan Dong

Inventions 2026, 11(2), 35; https://doi.org/10.3390/inventions11020035 - 2 Apr 2026

Viewed by 1095

Abstract

Firefighters must locate victims reliably to carry out rescue operations within burning structures during urban firefighting events. Low visibility, reduced oxygen levels, weakened structural rigidity, and dense smoke make it difficult to locate victims. In addition to these challenges, victims may be unconscious and unable to report their locations to firefighters. This work explores Double Deep Q-Network (Double DQN), Dueling Deep Q-Network (Dueling DQN), and Dueling Double Deep Q-Network (D3QN) agents that let an unmanned aerial vehicle (UAV) navigate around a structure and locate trapped victims within it. The UAV’s position, Light Detection and Ranging (LiDAR) data, and infrared camera data are used as inputs to the Deep Q-Networks. Prioritized Experience Replay (PER) is used to store transitions and sample them according to priority for training. The simulated environment, including the infrared camera and LiDAR data, is built with Python’s Pygame library. The performance of the UAV agent is evaluated using cumulative maximum reward, reward distribution histogram, Temporal Difference (TD) error over time, and number of successful episodes. Among the three DQN UAV agents, the Dueling DQN and Double DQN have potential for real-world applications in firefighting. Full article
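
Sampling stored transitions "according to priority" is commonly done with the proportional variant of prioritized replay, where each transition's priority grows with the magnitude of its TD error. The sketch below assumes that proportional variant; the abstract does not say which variant the authors use, and the function names and the `alpha` and `eps` defaults are illustrative assumptions.

```python
import random
from typing import Sequence


def per_sampling_probabilities(
    td_errors: Sequence[float], alpha: float = 0.6, eps: float = 1e-6
) -> list[float]:
    """P(i) = p_i^alpha / sum_k p_k^alpha, with p_i = |td_error_i| + eps."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(probabilities: Sequence[float], batch_size: int) -> list[int]:
    """Draw transition indices (with replacement) according to their probabilities."""
    return random.choices(range(len(probabilities)), weights=probabilities, k=batch_size)


if __name__ == "__main__":
    # Hypothetical TD errors for four stored transitions.
    probs = per_sampling_probabilities([0.1, 2.0, 0.5, 1.0])
    print(probs)
    print(sample_indices(probs, batch_size=3))
```

Setting `alpha` to 0 recovers uniform sampling, while larger values concentrate training on transitions with large TD error.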

(This article belongs to the Special Issue Unmanned Aerial Vehicles (UAVs): Innovations and Applications)

26 pages, 5704 KB

Open Access Article

Intent-Aware Collision Avoidance for UAVs in High-Density Non-Cooperative Environments Using Deep Reinforcement Learning

by Xuchuan Liu, Yuan Zheng, Chenglong Li, Bo Jiang and Wenyong Gu

Aerospace 2026, 13(2), 111; https://doi.org/10.3390/aerospace13020111 - 23 Jan 2026

Viewed by 890

Abstract

Collision avoidance between unmanned aerial vehicles (UAVs) and non-cooperative targets (e.g., off-nominal operations or birds) presents significant challenges in urban air mobility (UAM). This difficulty arises due to the highly dynamic and unpredictable flight intentions of these targets. Traditional collision-avoidance methods primarily focus on cooperative targets or non-cooperative ones with fixed behavior, rendering them ineffective when dealing with highly unpredictable flight patterns. To address this, we introduce a deep reinforcement learning-based collision-avoidance approach leveraging global and local intent prediction. Specifically, we propose a Global and Local Perception Prediction Module (GLPPM) that combines a state-space-based global intent association mechanism with a local feature extraction module, enabling accurate prediction of short- and long-term flight intents. Additionally, we propose a Fusion Sector Flight Control Module (FSFCM) that is trained with a Dueling Double Deep Q-Network (D3QN). The module integrates both predicted future and current intents into the state space and employs a specifically designed reward function, thereby ensuring safe UAV operations. Experimental results demonstrate that the proposed method significantly improves mission success rates in high-density environments, with up to 80 non-cooperative targets per square kilometer. In 1000 flight tests, the mission success rate is 15.2 percentage points higher than that of the baseline D3QN. Furthermore, the approach retains an 88.1% success rate even under extreme target densities of 120 targets per square kilometer. Finally, interpretability analysis via Deep SHAP further verifies the decision-making rationality of the algorithm. Full article

(This article belongs to the Section Aeronautics)

25 pages, 4648 KB

Open Access Systematic Review

Deep Reinforcement Learning Algorithms for Intrusion Detection: A Bibliometric Analysis and Systematic Review

by Lekhetho Joseph Mpoporo, Pius Adewale Owolawi and Chunling Tu

Appl. Sci. 2026, 16(2), 1048; https://doi.org/10.3390/app16021048 - 20 Jan 2026

Viewed by 1591

Abstract

Intrusion detection systems (IDSs) are crucial for safeguarding modern digital infrastructure against ever-evolving cyber threats. As cyberattacks become increasingly complex, traditional machine learning (ML) algorithms, while remaining effective in classifying known threats, face limitations such as static learning, dependency on labeled data, and susceptibility to adversarial exploits. Deep reinforcement learning (DRL) has recently surfaced as a viable substitute, providing resilience in unanticipated circumstances, dynamic adaptation, and continuous learning. This study conducts a thorough bibliometric analysis and systematic literature review (SLR) of DRL-based intrusion detection systems (DRL-based IDS). The relevant literature from 2020 to 2024 was identified and investigated using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework. Emerging research themes, influential works, and structural relationships in the research fields were identified using a bibliometric analysis. The SLR was used to synthesize methodological techniques, datasets, and performance analysis. The results indicate that DRL algorithms such as deep Q-network (DQN), double DQN (DDQN), dueling double DQN (D3QN), policy gradient methods, and actor–critic models have been actively utilized for enhancing IDS performance in various applications and datasets. The results highlight the increasing significance of DRL-based solutions for developing intelligent and robust intrusion detection systems and advancing cybersecurity. Full article

(This article belongs to the Special Issue Advances in Cyber Security)

22 pages, 3688 KB

Open Access Article

An End-to-End Hierarchical Intelligent Inference Model for Collaborative Operation of Grid Switches

by Mingrui Zhao, Tie Chen, Jiaxin Yuan, Yuting Jiang and Junlin Ren

Energies 2025, 18(24), 6574; https://doi.org/10.3390/en18246574 - 16 Dec 2025

Cited by 1 | Viewed by 503

Abstract

To address the issue of heavy reliance on manual intervention in substation maintenance tasks, this paper proposes an end-to-end hierarchical intelligent inference method for collaborative operation of grid switches. The method constructs a unified knowledge environment that can simultaneously describe the operational characteristics of both the power grid and the substation, and combines Dueling Double Deep Q-Network (D3QN) with Multi-Task Dueling Double Deep Q-Network (MT-D3QN) algorithms for interactive training, achieving hierarchical inference. The upper layer uses bays as the base nodes to reflect the power flow, designing a reward and penalty function under N-1 power flow constraints and ring-current impact constraints, optimizing the load transfer plan for the power outages caused by maintenance tasks. The lower layer uses switches as the base nodes to reflect the main wiring status of the substation, introduces a multi-task learning mechanism for parallel training of bays with the same tasks, designs the reward and penalty function according to the five protection rules, and optimizes the switching operations within the bay. The experimental results show that the trained model can quickly deduce the switching operation sequence for different maintenance tasks. Full article

14 pages, 2239 KB

Open Access Article

Energy-Efficient Path Planning for Snake Robots Using a Deep Reinforcement Learning-Enhanced A* Algorithm

by Yang Gu, Zelin Wang and Zhong Huang

Biomimetics 2025, 10(12), 826; https://doi.org/10.3390/biomimetics10120826 - 10 Dec 2025

Cited by 2 | Viewed by 866

Abstract

Snake-like robots, characterized by their high flexibility and multi-joint structure, exhibit exceptional adaptability to complex terrains such as snowfields, jungles, deserts, and underwater environments. Their ability to navigate narrow spaces and circumvent obstacles makes them ideal for operations in confined or rugged environments. However, efficient motion in such conditions requires not only mechanical flexibility but also effective path planning to ensure safety, energy efficiency, and overall task performance. Most existing path planning algorithms for snake-like robots focus primarily on finding the shortest path between the start and target positions while neglecting the optimization of energy consumption during real operations. To address this limitation, this study proposes an energy-efficient path planning method based on an improved A* algorithm enhanced with deep reinforcement learning, namely a Dueling Double Deep Q-Network (D3QN). An Energy Consumption Estimation Model (ECEM) is first developed to evaluate the energetic cost of snake robot motion in three-dimensional space. This model is then integrated into a new heuristic function to guide the A* search toward energy-optimal trajectories. Simulation experiments were conducted in a 3D environment to assess the performance of the proposed approach. The results demonstrate that the improved A* algorithm effectively reduces the energy consumption of the snake robot compared with conventional algorithms. Specifically, the proposed method achieves an energy consumption of 68.79 J, which is 3.39%, 27.26%, and 5.91% lower than that of the traditional A* algorithm (71.20 J), the bidirectional A* algorithm (94.61 J), and the weighted improved A* algorithm (73.11 J), respectively. These findings confirm that integrating deep reinforcement learning with an adaptive heuristic function significantly enhances both the energy efficiency and practical applicability of snake robot path planning in complex 3D environments. Full article

(This article belongs to the Special Issue Theory and Application of Bioinspired Robotics and Intelligent Control)

12 pages, 578 KB

Open Access Article

A Power-Aware 5G Network Slicing Scheme for IIoT Systems with Age Tolerance

by Mingjiang Weng, Yixuan Bai and Xin Xie

Sensors 2025, 25(22), 6956; https://doi.org/10.3390/s25226956 - 14 Nov 2025

Cited by 2 | Viewed by 1086

Abstract

Network slicing has emerged as a pivotal technology in addressing the diverse customization requirements of the Industrial Internet of Things (IIoT) within 5G networks, enabling the deployment of multiple logical networks over shared infrastructure. Efficient resource management in this context is essential to ensure energy efficiency and meet the stringent real-time demands of IIoT applications. This study focuses on the scheduling problem of minimizing average transmission power while maintaining Age of Information (AoI) tolerance constraints within 5G wireless network slicing. To tackle this challenge, an improved Dueling Double Deep Q-Network (D3QN) is leveraged to devise intelligent slicing schemes that dynamically allocate resources, ensuring optimal performance in time-varying wireless environments. The proposed improved D3QN approach introduces a novel heuristic-based exploration strategy that restricts action choices to the most effective options, significantly reducing ineffective learning steps. The simulation results show that the method not only speeds up convergence considerably but also achieves lower transmit power while preserving strict AoI reliability constraints and slice isolation. Full article

(This article belongs to the Special Issue Effective Software-Defined Internet-of-Things (SD-IoT) Leveraging AI, 5G and NFV—2nd Edition)

20 pages, 3535 KB

Open Access Article

Optimization Method of Energy Saving Strategy for Networked Driving in Road Sections with Frequent Traffic Flow Changes

by Minghao Gao, Dayi Qu, Kedong Wang, Yicheng Chen and Jintao Zhan

Vehicles 2025, 7(4), 118; https://doi.org/10.3390/vehicles7040118 - 16 Oct 2025

Viewed by 766

Abstract

It is of great significance to construct a networked energy-saving driving strategy and application framework to address the traffic disorder, speed fluctuations, and high energy consumption caused by frequent acceleration, deceleration, and lane changing of vehicles in road sections with variable traffic flow. Considering the mixed traffic scenario in which autonomous vehicles and manually driven vehicles interact and intermix, an energy-saving driving model for vehicles in mixed traffic flow was established, and the Dueling Double Deep Q-Network (D3QN) was used to solve it. Qingdao urban intersections were selected as the application scenario, and energy-saving driving strategy application facilities were constructed in simulation experiments to verify the energy-saving driving strategies for mixed traffic flow in the context of vehicle networking. The simulation results show that, across scenarios with different proportions of CAVs, the energy-saving strategy based on the D3QN deep reinforcement learning algorithm achieves fuel savings of 6.67% to 8.41% compared with conventional strategies. Compared with the ordinary reinforcement learning algorithm Q-learning, its fuel saving rate is higher by 1.5% to 1.94%, and the energy-saving effect becomes more significant as traffic density increases. In terms of dynamic characteristics, speed stability under D3QN control is superior to that under Q-learning and significantly better than that under conventional strategies, further highlighting the comprehensive advantages of the D3QN algorithm in optimizing traffic flow status and controlling energy consumption. The energy-saving driving strategy in the networked environment can reduce the fuel consumption caused by speed fluctuations and frequent traffic flow disturbances, and can improve the stability of traffic flow operation. Full article

31 pages, 9881 KB

Open Access Article

Guide Robot Based on Image Processing and Path Planning

by Chen-Hsien Yang and Jih-Gau Juang

Machines 2025, 13(7), 560; https://doi.org/10.3390/machines13070560 - 27 Jun 2025

Cited by 1 | Viewed by 1557

Abstract

While guide dogs remain the primary aid for visually impaired individuals, robotic guides continue to be an important area of research. This study introduces an indoor guide robot designed to physically assist a blind person by holding their hand with a robotic arm and guiding them to a specified destination. To enable hand-holding, we employed a camera combined with object detection to identify the human hand and a closed-loop control system to manage the robotic arm’s movements. For path planning, we implemented a Dueling Double Deep Q Network (D3QN) enhanced with a genetic algorithm. To address dynamic obstacles, the robot utilizes a depth camera alongside fuzzy logic to control its wheels and navigate around them. A 3D point cloud map is generated to determine the start and end points accurately. The D3QN algorithm, supplemented by variables defined using the genetic algorithm, is then used to plan the robot’s path. As a result, the robot can safely guide blind individuals to their destinations without collisions. Full article

(This article belongs to the Special Issue Autonomous Navigation of Mobile Robots and UAVs, 2nd Edition)

29 pages, 5292 KB

Open Access Article

Path Planning for Lunar Rovers in Dynamic Environments: An Autonomous Navigation Framework Enhanced by Digital Twin-Based A*-D3QN

by Wei Liu, Gang Wan, Jia Liu and Dianwei Cong

Aerospace 2025, 12(6), 517; https://doi.org/10.3390/aerospace12060517 - 8 Jun 2025

Cited by 4 | Viewed by 3454

Abstract

In lunar exploration missions, rovers must navigate multiple waypoints within strict time constraints while avoiding dynamic obstacles, demanding real-time, collision-free path planning. This paper proposes a digital twin-enhanced hierarchical planning method, A*-D3QN-Opt (A-Star-Dueling Double Deep Q-Network-Optimized). The framework combines the A* algorithm for global optimal paths in static environments with an improved D3QN (Dueling Double Deep Q-Network) for dynamic obstacle avoidance. A multi-dimensional reward function balances path efficiency, safety, energy, and time, while prioritized experience replay accelerates training convergence. A high-fidelity digital twin simulation environment integrates a YOLOv5-based multimodal perception system for real-time obstacle detection and distance estimation. Experimental validation across low-, medium-, and high-complexity scenarios demonstrates superior performance: the method achieves shorter paths, zero collisions in dynamic settings, and 30% faster convergence than baseline D3QN. Results confirm its ability to harmonize optimality, safety, and real-time adaptability under dynamic constraints, offering critical support for autonomous navigation in lunar missions like Chang’e and future deep space exploration, thereby reducing operational risks and enhancing mission efficiency. Full article

(This article belongs to the Section Astronautics & Space Science)

28 pages, 4738 KB

Open Access Article

AEM-D3QN: A Graph-Based Deep Reinforcement Learning Framework for Dynamic Earth Observation Satellite Mission Planning

by Shuo Li, Gang Wang and Jinyong Chen

Aerospace 2025, 12(5), 420; https://doi.org/10.3390/aerospace12050420 - 9 May 2025

Cited by 4 | Viewed by 2359

Abstract

Efficient and adaptive mission planning for Earth Observation Satellites (EOSs) remains a challenging task due to the growing complexity of user demands, task constraints, and limited satellite resources. Traditional heuristic and metaheuristic approaches often struggle with scalability and adaptability in dynamic environments. To overcome these limitations, we introduce AEM-D3QN, a novel intelligent task scheduling framework that integrates Graph Neural Networks (GNNs) with an Adaptive Exploration Mechanism-enabled Double Dueling Deep Q-Network (D3QN). This framework constructs a Directed Acyclic Graph (DAG) atlas to represent task dependencies and constraints, leveraging GNNs to extract spatial–temporal task features. These features are then encoded into a reinforcement learning model that dynamically optimizes scheduling policies under multiple resource constraints. The adaptive exploration mechanism improves learning efficiency by balancing exploration and exploitation based on task urgency and satellite status. Extensive experiments conducted under both periodic and emergency planning scenarios demonstrate that AEM-D3QN outperforms state-of-the-art algorithms in scheduling efficiency, response time, and task completion rate. The proposed framework offers a scalable and robust solution for real-time satellite mission planning in complex and dynamic operational environments. Full article

(This article belongs to the Section Astronautics & Space Science)

23 pages, 6216 KB

Open Access Article

A Macro-Control and Micro-Autonomy Pathfinding Strategy for Multi-Automated Guided Vehicles in Complex Manufacturing Scenarios

by Jiahui Le, Lili He and Junhong Zheng

Appl. Sci. 2025, 15(10), 5249; https://doi.org/10.3390/app15105249 - 8 May 2025

Viewed by 1514

Abstract

To effectively plan the travel paths of automated guided vehicles (AGVs) in complex manufacturing scenarios and avoid dynamic obstacles, this paper proposes a pathfinding strategy that integrates macro-control and micro-autonomy. At the macro level, a central system employs a modified A* algorithm for preliminary pathfinding, guiding the AGVs toward their targets. At the micro level, a distributed system incorporates a navigation and obstacle avoidance strategy trained by Prioritized Experience Replay Double Dueling Deep Q-Network with ε-Dataset Aggregation (PER-D3QN-EDAgger). Each AGV integrates its current state with information from the central system and the neighboring AGVs to make autonomous pathfinding decisions. The experimental results indicate that this strategy exhibits a strong adaptability to diverse environments, low path costs, and rapid solution speeds. It effectively avoids the neighboring AGVs and other dynamic obstacles, and maintains a high task completion rate of over 95% when the number of AGVs is below 200 and the obstacle density is below 0.5. This approach combines the advantages of centralized pathfinding, which ensures high path quality, with distributed planning, which enhances adaptability to dynamic environments. Full article

Search Results (38)
