Search Results (7,060)

Search Parameters:
Keywords = reinforcement learning

33 pages, 2030 KB  
Article
Distributed Task Allocation Algorithm for Heterogeneous UAVs Based on Reinforcement Learning
by Peng Sun, Guangwei Yang, Xin Xu, Jieyong Zhang, Xida Deng, Yongzhuang Zhang and Jie Cui
Drones 2026, 10(3), 220; https://doi.org/10.3390/drones10030220 - 20 Mar 2026
Abstract
To address the challenges faced by heterogeneous Unmanned Aerial Vehicle (UAV) systems in complex task allocation, including over-reliance on centralized scheduling, training deadlock, inadequate capture of temporal collaboration, and unstable training under sparse reward conditions, this paper proposes a distributed task allocation algorithm based on reinforcement learning. The algorithm adopts a decentralized decision-making architecture, which enables the autonomous formation of UAV collaborative groups without the need for a global scheduling center. A cascaded submission timeout mechanism is introduced to prevent training deadlock; the combination of Long Short-Term Memory (LSTM) and attention mechanism is employed to accurately model temporal correlations and collaborative dependencies; and the Proximal Policy Optimization (PPO) algorithm is leveraged to optimize the training stability under sparse reward conditions. Experimental results demonstrate that the proposed algorithm achieves a 100% task success rate in scenarios of different scales, and its key metrics, including makespan, time cost and waiting time, are significantly superior to those of mainstream baseline methods such as the Genetic Algorithm (GA) and the Hungarian Algorithm (HA). Moreover, the algorithm still maintains excellent robustness under the conditions of UAV failures, parameter variations, and dynamic task perturbations. This method supports zero-shot generalization for any number of UAVs and tasks and provides an efficient and reliable solution for the real-time collaborative scheduling of heterogeneous UAV systems. Full article
(This article belongs to the Section Artificial Intelligence in Drones (AID))
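The clipped-surrogate update at the heart of PPO, which the abstract above credits for stable training under sparse rewards, can be sketched as follows. This is a minimal numpy illustration of the standard PPO loss, not the paper's implementation; the function name and clip range are illustrative.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss (to be minimized).

    Arguments are 1-D arrays over sampled actions; `clip_eps` is the
    usual clip range, shown here with its common default.
    """
    ratio = np.exp(logp_new - logp_old)           # pi_new / pi_old
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps)
    # Pessimistic bound: take the worse (smaller) of the two surrogates,
    # which caps the incentive to move the policy far from pi_old.
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

When the new and old policies coincide the ratio is 1 and the loss reduces to the negative mean advantage, which is what makes the objective safe to optimize with several epochs of minibatch updates per rollout.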
18 pages, 1843 KB  
Article
Heterogeneous Computing Resources Scheduling Based on Time-Varying Graphs and Multi-Agent Reinforcement Learning
by Jinshan Yuan, Xuncai Zhang and Kexin Gong
Future Internet 2026, 18(3), 168; https://doi.org/10.3390/fi18030168 (registering DOI) - 20 Mar 2026
Abstract
The evolution toward 6G Computing Power Networks (CPN) aims to deeply integrate multi-tier computing resources across Cloud, Edge, and end devices. However, the significant heterogeneity of computing resources, characterized by varying hardware architectures such as CPUs, GPUs, and NPUs, coupled with the time-varying network topology caused by terminal mobility, poses severe challenges to realizing efficient integrated scheduling that satisfies Quality of Service (QoS). To address spatiotemporal mismatches between task requirements and hardware architectures, this paper proposes an integrated scheduling method combining Discrete Time-Varying Graph (DTVG) construction with Multi-Agent Reinforcement Learning (MARL). Specifically, we model the dynamic interaction between mobile tasks and heterogeneous nodes as a DTVG to capture spatiotemporal evolution and employ a QMIX-based algorithm to enable collaborative decision-making among distributed agents. Simulation results demonstrate that the proposed approach effectively solves the joint optimization problem of heterogeneous resource matching and dynamic path planning, significantly outperforming traditional baselines in terms of resource utilization and average latency. This study confirms that incorporating graph-theoretic modeling with reinforcement learning offers a robust solution for the complex coupling of communication and computation in dynamic 6G networks. Full article
(This article belongs to the Special Issue Collaborative Intelligence for Connected Agents)
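The QMIX-based collaboration mentioned above rests on a monotonic mixing of per-agent Q-values. The following numpy sketch shows the core constraint; `state_w` and `state_b` stand in for the hypernetwork outputs conditioned on the global state and are assumptions for illustration.

```python
import numpy as np

def qmix_mix(agent_qs, state_w, state_b):
    """Monotonic mixing of per-agent Q-values, QMIX-style.

    Taking |state_w| enforces dQ_tot/dQ_i >= 0, so each agent's greedy
    action with respect to its own Q-value also maximizes the joint value.
    """
    return float(np.dot(np.abs(state_w), agent_qs) + state_b)
```

The monotonicity constraint is what lets decentralized agents act greedily at execution time while the mixer is trained centrally.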
26 pages, 2242 KB  
Article
A Multi-Source Feedback-Driven Framework for Generating WAF Test Cases
by Pengcheng Lu, Xiaofeng Zhong, Wenbo Xu and Yongjie Wang
Future Internet 2026, 18(3), 167; https://doi.org/10.3390/fi18030167 (registering DOI) - 20 Mar 2026
Abstract
Web application firewalls (WAFs) are critical defenses against persistent threats to web applications, yet their security evaluation remains challenging. Traditional manual testing methods are often inefficient and resource-intensive, while existing reinforcement learning (RL)-based automated approaches face two key limitations: (1) attackers cannot perceive opaque WAF rule logic; (2) boolean feedback from WAFs results in sparse/delayed rewards—sparse rewards trap agents in blind exploration, and delayed rewards hinder the association between early actions and final outcomes, adversely affecting learning efficiency. To address those challenges, we propose Ouroboros—a framework integrating genetic algorithm-based symbolic rule reconstruction (translating WAF rules into interpretable RNNs for fine-grained confidence scoring), timing side-channel analysis (evaluating rule-matching depth), and a multi-tiered reward mechanism to enable self-evolving RL testing. Experiments show that the framework reaches 89.2% bypass success rate on signature-based WAFs. This paper presents an efficient solution for automated WAF testing and delivers insights for optimizing rule logic and anomaly detection mechanisms. Full article
(This article belongs to the Special Issue Adversarial Attacks and Cyber Security)
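A multi-tiered reward of the kind the Ouroboros abstract describes — a full bypass dominating, with confidence-score drops and deeper rule matching as dense shaping signals — might look like the sketch below. The function name, tier weights, and signal definitions are all hypothetical, not the paper's values.

```python
def tiered_reward(bypassed, confidence_drop, match_depth_gain):
    """Hypothetical multi-tiered reward for WAF-bypass RL.

    A confirmed bypass earns the top-tier reward outright; otherwise the
    agent is shaped by how much the reconstructed rules' confidence score
    dropped and how much deeper the payload matched (timing side channel).
    """
    if bypassed:
        return 10.0
    return 1.0 * confidence_drop + 0.5 * match_depth_gain
```

Dense intermediate tiers like these are the usual remedy for the sparse, delayed boolean feedback the abstract identifies as the key limitation.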
27 pages, 1492 KB  
Article
Managing Demand and Travel Time Uncertainties in Pandemic Emergencies: A Risk-Averse Multi-Objective Location-Routing Model
by Fenggang Li, Xiaodong Sun, Bangxing Xue, Jing Zhang, Pengpeng Yao and Qingbin Zou
Symmetry 2026, 18(3), 534; https://doi.org/10.3390/sym18030534 (registering DOI) - 20 Mar 2026
Abstract
During pandemic emergencies, demand for relief supplies in affected areas surges abruptly and evolves randomly and dynamically, resulting in highly asymmetric supply and demand. Ensuring timely and reliable supply requires robust decision-making under risk. This study addresses a stochastic multi-objective location-routing problem (LRP) that simultaneously considers demand uncertainty and travel time variability. A multi-scenario stochastic programming model is developed with three objectives: minimizing total system cost, minimizing total waiting time, and minimizing the composite conditional value at risk (CVaR–Rcomp) to capture tail risks under extreme scenarios. A novel regret-based risk mechanism is introduced to unify temporal and cost dimensions, enabling joint evaluation of uncertainties within a single framework. To solve this challenging high-dimensional problem, a reinforcement learning-enhanced NSGA-III (RL-NSGAIII) is proposed. Specifically, Q-learning generates high-quality initial solutions, which accelerate convergence and improve population diversity for NSGA-III. Case studies demonstrate that the proposed method outperforms traditional evolutionary algorithms in convergence efficiency and Pareto solution quality, while effectively revealing potential risk blind spots. The results provide quantitative decision support and robust optimization insights for emergency logistics networks operating under uncertain conditions. Full article
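The tail-risk term that the composite CVaR-Rcomp objective builds on is the standard conditional value at risk: the mean loss in the worst (1 - alpha) fraction of scenarios. A generic numpy sketch, not the paper's composite measure:

```python
import numpy as np

def cvar(losses, alpha=0.95):
    """Conditional value at risk at level alpha.

    Sorts scenario losses and averages the worst ceil((1 - alpha) * N)
    of them, i.e. the expected loss conditional on being in the tail.
    """
    losses = np.sort(np.asarray(losses, dtype=float))
    k = max(1, int(np.ceil((1 - alpha) * losses.size)))
    return float(losses[-k:].mean())
```

Minimizing CVaR rather than expected loss is what makes the model risk-averse: it penalizes the extreme scenarios directly instead of letting them average out.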
24 pages, 427 KB  
Review
A Survey on Recent Advances in the Integration of Discrete Event Systems and Artificial Intelligence
by Jie Ren, Ruotian Liu, Agostino Marcello Mangini and Maria Pia Fanti
Appl. Sci. 2026, 16(6), 3000; https://doi.org/10.3390/app16063000 - 20 Mar 2026
Abstract
The increasing complexity and uncertainty of modern discrete event systems (DES) challenge traditional model-based control approaches, while artificial intelligence (AI) techniques offer powerful data-driven decision-making capabilities but lack formal guarantees. This review surveys recent research on the integration of AI with DES and supervisory control theory. Following a systematic literature mapping methodology, the literature is organized using a taxonomy based on three orthogonal perspectives: control and decision paradigm, system capability and property, and application and operational objectives. The review highlights how learning-based methods enhance adaptability and performance in DES, while also exposing persistent challenges related to safety, nonblocking behavior, data efficiency, and interpretability. By structuring existing approaches and identifying open issues, this review provides a coherent overview of the current research landscape and outlines key directions for future work on AI-enabled DES. Full article
(This article belongs to the Special Issue Modeling and Control of Discrete Event Systems)
23 pages, 6306 KB  
Article
Trustless Federated Reinforcement Learning for VPP Dispatch
by Xin Zhang and Fan Liang
Electronics 2026, 15(6), 1303; https://doi.org/10.3390/electronics15061303 - 20 Mar 2026
Abstract
Large-scale Virtual Power Plants (VPPs) are increasingly essential as Distributed Energy Resources (DERs) assume ancillary service duties once supplied by conventional generation, yet scaling a VPP exposes a persistent trilemma among economic efficiency, data privacy, and operational security. Centralized coordination can approach optimal revenue but requires collecting fine-grained DER operational data and creates a single point of compromise. Federated Learning (FL) mitigates raw data centralization by keeping measurements and experience local, but it introduces a fragile trust assumption that the aggregator will correctly and fairly combine model updates. This trust gap is acute in reinforcement learning-based VPP control because aggregation deviations, including selectively dropping updates, manipulating weights, replaying stale models, or injecting a replacement model, can silently bias the learned policy and degrade both profit and compliance. We propose a zero-knowledge federated reinforcement learning framework for trustless VPP coordination in which each DER trains a local deep reinforcement learning agent to solve a multi-objective dispatch problem that balances ancillary service revenue against battery degradation under operational and grid constraints, while the global aggregation step is made externally verifiable. In each round, participants bind membership via signed receipts and commit to their updates, and the aggregator produces a zk-SNARK, proving that the published global parameters equal the agreed aggregation rule applied to the receipt-bound set of committed updates under a fixed-point encoding with range constraints. Verification is lightweight and can be performed independently by each DER, removing the need to trust the aggregator for aggregation integrity without centralizing raw DER operational data or trajectories. The proposed design does not aim to hide model updates from the aggregator. 
Instead, it provides external verifiability of the aggregation computation while keeping raw measurements and local experience local. We formalize the threat model and verifiable security properties for aggregation correctness and update inclusion, present a circuit construction with proof complexity characterized by model dimension and fleet size, and evaluate the approach in power and cyber co-simulation on the IEEE 33-bus feeder with ancillary service signals. Results show near-centralized economic performance under benign conditions and improved robustness to aggregator-side deviations compared to standard federated reinforcement learning. Full article
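The relation a zk-SNARK would attest to here — that the published global parameters equal the agreed aggregation rule applied to the committed, receipt-bound updates under a fixed-point encoding — can be illustrated by simply re-running the check. The sketch below uses plain hash commitments and averaging as stand-ins; a SNARK proves the same relation without re-execution, and all names and the fixed-point scale are illustrative.

```python
import hashlib

SCALE = 10**6  # fixed-point scale; illustrative

def commit(update):
    """Hash commitment to a fixed-point-encoded update vector."""
    enc = b"".join(int(round(x * SCALE)).to_bytes(8, "big", signed=True)
                   for x in update)
    return hashlib.sha256(enc).hexdigest()

def verify_aggregation(global_params, updates, commitments):
    """Recompute the agreed rule (plain averaging here) over the
    committed updates and compare against the published result."""
    if [commit(u) for u in updates] != commitments:
        return False  # an update was dropped, replayed, or swapped
    n = len(updates)
    mean = [sum(col) / n for col in zip(*updates)]
    return all(abs(a - b) * SCALE < 1 for a, b in zip(global_params, mean))
```

A manipulated weight, a dropped update, or an injected replacement model fails either the commitment check or the equality check, which is exactly the class of aggregator deviations the abstract enumerates.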
16 pages, 237 KB  
Article
Sanctification and the Ordo Extractionis: Formative Sovereignty and Predictive Habituation
by Åke Elden
Religions 2026, 17(3), 392; https://doi.org/10.3390/rel17030392 - 20 Mar 2026
Abstract
Theological engagement with artificial intelligence has largely focused on applied ethics, addressing bias, governance, and labor displacement. While indispensable, this framing often presumes that algorithmic systems operate as external instruments acting upon already constituted subjects. This article argues that contemporary predictive architectures intervene at a deeper anthropological level by structuring attention, expectation, and habituation prior to deliberative judgment. It introduces the concept of ordo extractionis to designate a technologically mediated regime of formation characterized by behavioral trace extraction, probabilistic modeling, and recursive projection of statistically inferred continuity. Drawing on Augustine’s account of ordered love and temporality and Aquinas’s doctrine of habitus and the invisible mission of the Spirit, the article distinguishes algorithmic projection from sanctification as divergent pedagogies of temporal formation. Predictive systems stabilize continuity by extrapolating from measurable past behavior; sanctification reorders desire teleologically toward a final end not deducible from prior pattern and grounded in non-competitive divine causality. Algorithmic mediation is therefore interpreted pedagogically rather than metaphysically: it does not rival divine agency but participates creaturely in shaping the ecology within which habituation unfolds. Engagement with contemporary AI research on recommender systems, reinforcement learning, and generative models situates the argument within technological realism and resists determinism. The digital twin is analyzed as a probabilistic representation that acquires institutional authority when operationalized in ranking, profiling, and evaluative systems, without constituting a metaphysical competitor to the imago Dei. 
In response to anticipatory closure, Eucharistic anamnesis and epiclesis are developed as practices that re-situate memory and expectation within eschatological promise. The article concludes that the central theological question posed by AI is not whether machines can think, but how formative sovereignty over desire is exercised within technologically mediated modernity. Full article
(This article belongs to the Special Issue Theological and Ethical Reflections on Artificial Intelligence)
29 pages, 5347 KB  
Article
Optimized Reinforcement Learning-Driven Model for Remote Sensing Change Detection
by Yan Zhao, Zhiyun Xiao, Tengfei Bao and Yulong Zhou
J. Imaging 2026, 12(3), 139; https://doi.org/10.3390/jimaging12030139 - 19 Mar 2026
Abstract
In recent years, deep learning has driven remarkable progress in remote sensing change detection (CD); however, practical deployment is still hindered by two limitations. First, CD results are easily degraded by imaging-induced uncertainties—mixed pixels and blurred boundaries, radiometric inconsistencies (e.g., shadows and seasonal illumination changes), and slight residual misregistration—leading to pseudo-changes and fragmented boundaries. Second, prevailing methods follow a static one-pass inference paradigm and lack an explicit feedback mechanism for adaptive error correction, which weakens generalization in complex or unseen scenes. To address these issues, we propose a feedback-driven CD framework that integrates a dual-branch U-Net with deep reinforcement learning (RL) for pixel-level probabilistic iterative refinement of an initial change probability map. The backbone produces a preliminary posterior estimate of change likelihood from multi-scale bi-temporal features, while a PPO-based RL agent formulates refinement as a Markov decision process. The agent leverages a state representation that fuses multi-scale features, prediction confidence/uncertainty, and spatial consistency cues (e.g., neighborhood coherence and edge responses) to apply multi-step corrective actions. From an imaging and interpretation perspective, the RL module can be viewed as a learnable, self-adaptive imaging optimization mechanism: for high-risk regions affected by blurred boundaries, radiometric inconsistencies, and local misalignment, the agent performs feedback-driven multi-step corrections to improve boundary fidelity and spatial coherence while suppressing pseudo-changes caused by shadows and illumination variations. Experiments on four datasets (CDD, SYSU-CD, PVCD, and BRIGHT) verify consistent improvements. 
Using SiamU-Net as an example, the proposed RL refinement increases mIoU by 3.07, 2.54, 6.13, and 3.1 points on CDD, SYSU-CD, PVCD, and BRIGHT, respectively, with similarly consistent gains observed when the same RL module is integrated into other representative CD backbones. Full article
(This article belongs to the Section AI in Imaging)
26 pages, 4527 KB  
Article
Dynamic Pricing of Multi-Peril Agricultural Insurance via Backward Stochastic Differential Equations with Copula Dependence and Reinforcement Learning
by Yunjiao Pei, Jun Zhao, Yankai Chen, Jianfeng Li, Qiaoting Chen, Zichen Liu, Xiyan Li, Yifan Zhai and Qi Tang
Mathematics 2026, 14(6), 1043; https://doi.org/10.3390/math14061043 - 19 Mar 2026
Abstract
Pricing multi-peril agricultural insurance under compound climate hazards demands a framework that captures stochastic dependence among heterogeneous perils, accommodates non-stationary loss dynamics, and supports adaptive policy optimisation. We demonstrate that backward stochastic differential equations, combined with copula dependence, recurrent neural networks, and reinforcement learning, provide a unifying language for this task; the contribution lies in their principled integration. The dynamic premium is the unique adapted solution of a BSDE whose driver encodes compound-risk dependence through a Student-t copula, forward loss dynamics through a jump-diffusion process, and a green-finance adjustment through an optimal control variable. Within this framework we derive three progressive results by adapting standard BSDE theory to the compound-dependence and policy-control setting. First, existence and uniqueness hold under Lipschitz and square-integrability conditions. Second, a comparison theorem guarantees that a larger correlation matrix yields higher premiums; the degrees-of-freedom effect enters separately through the risk-loading magnitude. Third, the Euler discretisation converges at a rate of one half of the time-step size, with copula estimation, LSTM conditional expectation approximation, and Q-learning HJB solution as sequential components. Applied to eleven Zhejiang cities (2014–2023, N × T=110), in this illustrative application the framework reduces premium variance by 43.5 percent (bootstrap 95% CI: [38.2%,48.7%]) while maintaining actuarial adequacy with a mean loss ratio of 0.678, though the modest sample size warrants caution in generalising these findings. Each component contributes statistically significant improvements confirmed by the Friedman test at the 0.1 percent significance level. Full article
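The Student-t copula mentioned above induces tail dependence among perils by scaling correlated normals with a shared chi-square draw. The numpy sketch below generates such correlated t draws; pushing each margin through the t CDF would yield the copula uniforms. The correlation matrix and degrees of freedom are illustrative — the paper estimates them from loss data.

```python
import numpy as np

def t_copula_draws(corr, df, n, seed=0):
    """Correlated multivariate Student-t draws.

    Correlated standard normals are divided by sqrt(chi2_df / df); the
    single chi-square per draw is shared across margins, which is what
    creates joint tail dependence (all perils blow up together).
    """
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(np.asarray(corr, dtype=float))
    z = rng.standard_normal((n, L.shape[0])) @ L.T
    g = rng.chisquare(df, size=(n, 1))   # shared across margins
    return z * np.sqrt(df / g)
```

With df = 4 and corr = 0.8, a large sample of these draws exhibits strong positive dependence and visibly heavier joint tails than a Gaussian copula with the same correlation.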
21 pages, 308 KB  
Article
Boys Don’t Cry? Rethinking Emotions and Manhood Through SEL in Pakistani Secondary Schools
by Rahat Shah, Sayed Attaullah Shah and Sadia Saeed
Behav. Sci. 2026, 16(3), 458; https://doi.org/10.3390/bs16030458 - 19 Mar 2026
Abstract
Global research on social–emotional learning (SEL) demonstrates robust benefits for student well-being and academic outcomes, yet SEL is still largely treated as gender and culturally neutral, with little attention to how it intersects with locally specific constructions of masculinity. We address this gap through a qualitative study in three urban secondary schools in Khyber Pakhtunkhwa, Pakistan, combining focus groups with boys aged 13–16 (n = 18), student interviews (n = 10), and teacher/counsellor interviews (n = 10). Using critical masculinity theory, the sociology of emotions, and transformative SEL, a reflexive thematic analysis identifies four patterns: (i) sadness and fear framed as status risks while anger signals strength, (ii) “switching off” feelings as masculinized emotion work tied to locally valued ideals of sabar (endurance) and izzat (honour), (iii) fragile “islands of care” where privacy and dignity enable conditional vulnerability, and (iv) SEL-like practices fostering empathy but also reinforcing stigma when emotions are labelled unmanly. We argue that SEL is a contested site where masculinities are reproduced and renegotiated, and we propose five findings-grounded design principles, including graduated emotional entry points, anti-ridicule norms, and indirect pedagogy for gender-attentive SEL that reduces stigma and supports non-violent masculinities in Pakistani secondary schooling. Full article
20 pages, 933 KB  
Review
Robotic Welding Technologies for Intersecting and Irregular Pipes and Pipe Joints Toward Automated Production Line Integration: A Review
by Hrvoje Cajner, Patrik Vlašić, Viktor Ložar, Matija Golec and Maja Trstenjak
Appl. Sci. 2026, 16(6), 2974; https://doi.org/10.3390/app16062974 - 19 Mar 2026
Abstract
Robotic pipe welding represents a key and rapidly evolving technology for the automation of pipe and pipe-joint welding processes with standard, intersecting, and complex geometries. This review analyses 84 studies published over the past three decades, categorising them into four primary research areas: general pipe welding, intersecting pipes, boiler and tube-to-tubesheet welding, and control and modelling. Two separate comparative analyses were conducted: one within intersecting pipe research and another within the control and modelling category. The aggregated findings reveal consistent, complementary patterns: simulation and laboratory experiments clearly dominate validation methods, while industrial-scale evaluations remain scarce. The results further demonstrate that control strategies, sensor integration, and validation levels are strongly interconnected, collectively determining system performance, reliability, and practical applicability. Despite significant progress, challenges remain, including system integration complexity, limited robustness in variable industrial environments, insufficient real-time adaptive control, and inconsistent quantitative performance evaluation. Further research should prioritise the development of digital twins, human–robot collaboration, multi-sensor fusion, reinforcement learning-based adaptive control, and scalable industrial deployment. This review provides an overview of current progress and outlines key directions for developing intelligent and reliable robotic pipe welding systems. Full article
(This article belongs to the Section Mechanical Engineering)
23 pages, 3219 KB  
Article
Hybrid Data Curation for Imitation Learning with Physics-Generated Trajectories
by Mincheol Lee, Deun-Sol Cho and Won-Tae Kim
Appl. Sci. 2026, 16(6), 2968; https://doi.org/10.3390/app16062968 - 19 Mar 2026
Abstract
Robotic manipulators were initially introduced to replace repetitive human labor and have since evolved to perform complex tasks in dynamic environments. In such systems, imitation learning and reinforcement learning models capable of real-time trajectory generation are widely applied. Among these approaches, imitation learning enables rapid training when high-quality datasets are available. However, it suffers from high costs associated with collecting expert demonstration data and significant performance variability depending on data quality. Recently, learning approaches utilizing large-scale datasets have been explored, but they often struggle to guarantee reliable performance in tasks requiring precise control and incur substantial computational costs for model construction, limiting their applicability as a general-purpose learning strategy. To address these limitations, this paper proposes an imitation learning framework that integrates sampling-based motion planning with a hybrid data curation strategy. The proposed method employs a sampling-based planner (e.g., RRT*) to generate diverse physically feasible trajectories, thereby reducing the cost of acquiring expert demonstration data. The generated trajectories are then curated through clustering-based grouping and rule-based filtering to select high-quality training samples from large-scale datasets. The proposed framework automatically generates physically feasible trajectories while selecting high-quality data from large trajectory pools, thereby improving training stability and reducing data-related costs. Experimental results demonstrate that the proposed method achieves an average success rate of 79.1% (95% CI: 74.3–83.2%) and produces shorter trajectories, lower final distances, and reduced joint movements compared to conventional filtering methods. Full article
(This article belongs to the Special Issue Digital Twin and IoT, 2nd Edition)
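Rule-based filtering of planner-generated trajectories, as described above, can be sketched in a few lines. The thresholds below (median path length, a maximum-step smoothness limit) are illustrative assumptions; the paper also applies clustering-based grouping before filtering.

```python
import numpy as np

def curate(trajectories, length_quantile=0.5, smooth_limit=0.3):
    """Keep trajectories that are shorter than the pool's length
    quantile and whose largest per-step joint change stays under a
    smoothness limit; each trajectory is an (steps, dof) array."""
    lengths = [np.linalg.norm(np.diff(t, axis=0), axis=1).sum()
               for t in trajectories]
    cutoff = np.quantile(lengths, length_quantile)
    return [t for t, l in zip(trajectories, lengths)
            if l <= cutoff
            and np.abs(np.diff(t, axis=0)).max() <= smooth_limit]
```

Filtering on simple geometric rules like these is cheap relative to training, which is why curation pays off when the planner can generate trajectories in bulk.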
29 pages, 15025 KB  
Article
Robot End-Effectors Adaptive Design Method Based on Embedding Domain Knowledge into Reinforcement Learning
by Yong Zhu, Taihua Zhang, Yao Lu and Liguo Yao
Sensors 2026, 26(6), 1933; https://doi.org/10.3390/s26061933 - 19 Mar 2026
Abstract
Existing robot end-effectors design methods lack structured domain prior knowledge support and have insufficient interaction with the environment, making it difficult to guarantee the accuracy of the design results. An adaptive design method is proposed that deeply embeds domain knowledge of end effectors into the design process, treats key design parameters as environmental variables, and optimizes them adaptively through reinforcement learning algorithms in perception and feedback. In a simulation environment constructed by combining a knowledge graph, a two-finger translational gripper is used as an example robot end-effector to acquire target data via sensors, and reinforcement learning is used to adaptively optimize the gripper’s key parameters. Experiments are conducted on a simulation platform with three typical tasks, yielding the optimal parameter range. Compared to the proximal policy optimization (PPO) algorithm, which has no prior knowledge input, the knowledge graph embedding proximal policy optimization (KGPPO) algorithm improves the average reward for gripper length and gripper force by 63.96% and 43.09%, respectively, for grasping eggs. The KGPPO algorithm achieves the highest average reward and the best stability compared with other algorithms. Experiments show that this method can significantly improve the efficiency, stability, and accuracy of design parameter optimization. Full article
31 pages, 1687 KB  
Article
A Hybrid Planning–Learning Framework for Autonomous Navigation with Dynamic Obstacles
by Hatice Arslan Öztürk, Sırma Yavuz and Çetin Kaya Koç
Appl. Sci. 2026, 16(6), 2961; https://doi.org/10.3390/app16062961 - 19 Mar 2026
Abstract
Traditional navigation methods work well in known, static environments but degrade in real-world settings with dynamic and unpredictable obstacles. This paper presents Double Deep Q-Network with A* guidance (DDQNA), a hybrid navigation algorithm that enables an agent to traverse mazes containing static and dynamic obstacles while maintaining a low probability of collision. DDQNA combines A* guidance with Double Deep Q-Network (DDQN) learning using an ϵ-greedy policy, and it introduces a redesigned reward function and an improved action-selection mechanism to better exploit A*’s directional cues during training. We evaluate DDQNA in a custom Pygame simulation across 11 environments of increasing difficulty. Experimental results show that DDQNA consistently outperforms the standard DDQN and other state-of-the-art reinforcement learning baselines, achieving higher goal-reaching rates, fewer visited cells, shorter computation times, and higher cumulative rewards. These results indicate that DDQNA provides both effective navigation and computational efficiency in complex environments with static and dynamic obstacles.
(This article belongs to the Section Computing and Artificial Intelligence)
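One way the A* guidance described above could enter an ϵ-greedy policy is to bias the exploratory branch toward the planner’s suggested action. The sketch below is a minimal illustration under that assumption; the function name, the `guidance_prob` parameter, and the selection scheme are invented for the example and are not DDQNA’s actual mechanism.

```python
import random

# Illustrative sketch (not the paper's code): epsilon-greedy selection in
# which exploration is biased toward the action suggested by an A* planner.

def select_action(q_values, astar_action, epsilon=0.1,
                  guidance_prob=0.5, rng=random):
    """With probability epsilon explore; within exploration, follow the
    A*-suggested action with probability guidance_prob, otherwise pick
    uniformly at random. Exploit the max-Q action the rest of the time."""
    if rng.random() < epsilon:
        if rng.random() < guidance_prob:
            return astar_action                       # follow the planner's cue
        return rng.randrange(len(q_values))           # uniform exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])  # greedy

# With epsilon = 0 the choice reduces to a purely greedy policy on Q-values.
greedy_choice = select_action([1.0, 3.0, 2.0], astar_action=0, epsilon=0.0)
```

Compared with uniform exploration, directing even part of the exploratory budget along the planner’s route tends to reach the goal region sooner, which is consistent with the faster training the abstract reports.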
18 pages, 800 KB  
Article
Transient Dynamic Feature Adaptive Cooperative Control for Renewable Grids via Multi-Agent Deep Reinforcement Learning
by Mingyu Pang, Min Li, Xi Ye, Peng Shi, Zongsheng Zheng, Lai Yuan and Hongwen Tan
Electronics 2026, 15(6), 1285; https://doi.org/10.3390/electronics15061285 - 19 Mar 2026
Abstract
The increasing integration of inverter-based distributed energy resources (DERs) significantly reduces power system inertia, posing critical challenges to transient stability. Traditional fault ride-through strategies, relying on passive and localized rules, often fail to provide effective coordinated support in low-inertia grids. To address these limitations, this paper proposes a Transient Dynamic Features Adaptation Distributed Cooperative Control (TDA-DCC) framework. This approach integrates a dynamic context-aware policy network based on multi-head attention mechanisms to extract temporal features from local observations, allowing agents to anticipate transient dynamics rather than merely reacting to instantaneous states. A multi-agent deep deterministic policy gradient algorithm is employed to optimize a global multi-dimensional objective function encompassing frequency, voltage, and rotor angle stability. Furthermore, to ensure engineering reliability, a hybrid execution architecture is introduced, which embeds a deterministic safety monitor to switch between the intelligent policy and a robust backup controller during extreme anomalies. Case studies on a modified IEEE 39-bus system demonstrate that the proposed method significantly enhances transient stability margins and robustness against sensor failures compared to conventional baselines.
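A single-head, NumPy-only stand-in for the multi-head attention feature extractor described above: scaled dot-product attention over a window of past local observations. The shapes, weight matrices, and window length are illustrative assumptions, not the TDA-DCC architecture.

```python
import numpy as np

# Minimal sketch: single-head scaled dot-product attention over the last T
# local observations, producing attention-weighted temporal features.
# Dimensions and weights are illustrative, not the paper's network.

def temporal_attention(obs_window, w_q, w_k, w_v):
    """obs_window: (T, d) matrix of the last T observations.
    Returns a (T, d_v) matrix of attention-weighted temporal features."""
    q, k, v = obs_window @ w_q, obs_window @ w_k, obs_window @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (T, T) similarities
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over time steps
    return weights @ v                             # convex mix of values

rng = np.random.default_rng(0)
T, d = 5, 4                                        # assumed window and obs size
obs = rng.normal(size=(T, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
feats = temporal_attention(obs, w_q, w_k, w_v)
```

Because each output row mixes the whole observation window, the features can encode trends (for example, a frequency excursion accelerating over several steps) rather than only the instantaneous state, which is the anticipatory behavior the abstract attributes to the policy network.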