Search Results (250)

Search Parameters:
Keywords = soft actor–critic

30 pages, 4996 KB  
Article
Energy-Efficient, Multi-Agent Deep Reinforcement Learning Approach for Adaptive Beacon Selection in AUV-Based Underwater Localization
by Zahid Ullah Khan, Hangyuan Gao, Farzana Kulsoom, Syed Agha Hassnain Mohsan, Aman Muhammad and Hassan Nazeer Chaudry
J. Mar. Sci. Eng. 2026, 14(3), 262; https://doi.org/10.3390/jmse14030262 - 27 Jan 2026
Abstract
Accurate and energy-efficient localization of autonomous underwater vehicles (AUVs) remains a fundamental challenge due to the complex, bandwidth-limited, and highly dynamic nature of underwater acoustic environments. This paper proposes a fully adaptive deep reinforcement learning (DRL)-driven localization framework for AUVs operating in Underwater Acoustic Wireless Sensor Networks (UAWSNs). The localization problem is formulated as a Markov Decision Process (MDP) in which an intelligent agent jointly optimizes beacon selection and transmit power allocation to minimize long-term localization error and energy consumption. A hierarchical learning architecture is developed by integrating four actor–critic algorithms, namely (i) Twin Delayed Deep Deterministic Policy Gradient (TD3), (ii) Soft Actor–Critic (SAC), (iii) Multi-Agent Deep Deterministic Policy Gradient (MADDPG), and (iv) Distributed DDPG (D2DPG), enabling robust learning under non-stationary channels, cooperative multi-AUV scenarios, and large-scale deployments. A round-trip time (RTT)-based geometric localization model incorporating a depth-dependent sound speed gradient is employed to accurately capture realistic underwater acoustic propagation effects. A multi-objective reward function jointly balances localization accuracy, energy efficiency, and ranging reliability through a risk-aware metric. Furthermore, the Cramér–Rao Lower Bound (CRLB) is derived to characterize the theoretical performance limits, and a comprehensive complexity analysis is performed to demonstrate the scalability of the proposed framework. Extensive Monte Carlo simulations show that the proposed DRL-based methods achieve significantly lower localization error, lower energy consumption, faster convergence, and higher overall system utility than classical TD3. These results confirm the effectiveness and robustness of DRL for next-generation adaptive underwater localization systems.
(This article belongs to the Section Ocean Engineering)
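A minimal sketch of the kind of multi-objective reward described in the abstract above, trading off localization error, transmit energy, and a risk-aware reliability term. All variable names, weights, and scales here are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

# Illustrative multi-objective reward (assumed weights, not the paper's):
# penalize localization error, beacon transmit energy, and ranging risk.
def localization_reward(loc_error_m, tx_energy_j, range_outage_prob,
                        w_err=1.0, w_energy=0.1, w_risk=0.5):
    accuracy_term = -w_err * loc_error_m          # penalize position error (m)
    energy_term   = -w_energy * tx_energy_j       # penalize energy spent on beacons (J)
    risk_term     = -w_risk * range_outage_prob   # penalize unreliable ranging
    return accuracy_term + energy_term + risk_term

# Example: 2 m error, 5 J spent, 10% chance a ranging attempt fails.
print(localization_reward(2.0, 5.0, 0.1))  # -> -2.55
```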
23 pages, 3037 KB  
Article
Depth Matters: Geometry-Aware RGB-D-Based Transformer-Enabled Deep Reinforcement Learning for Mapless Navigation
by Alpaslan Burak İnner and Mohammed E. Chachoua
Appl. Sci. 2026, 16(3), 1242; https://doi.org/10.3390/app16031242 - 26 Jan 2026
Abstract
Autonomous navigation in unknown environments demands policies that can jointly perceive semantic context and geometric safety. Existing Transformer-enabled deep reinforcement learning (DRL) frameworks, such as the Goal-guided Transformer Soft Actor–Critic (GoT-SAC), rely on temporal stacking of multiple RGB frames, which encodes short-term motion cues but lacks explicit spatial understanding. This study introduces a geometry-aware RGB-D early fusion modality that replaces temporal redundancy with cross-modal alignment between appearance and depth. Within the GoT-SAC framework, we integrate a pixel-aligned RGB-D input into the Transformer encoder, enabling the attention mechanism to simultaneously capture semantic textures and obstacle geometry. A comprehensive systematic ablation study was conducted across five modality variants (4RGB, RGB-D, G-D, 4G-D, and 4RGB-D) and three fusion strategies (early, parallel, and late) under identical hyperparameter settings in a controlled simulation environment. The proposed RGB-D early fusion achieved a 40.0% success rate and +94.1 average reward, surpassing the canonical 4RGB baseline (28.0% success, +35.2 reward), while a tuned configuration further improved performance to 54.0% success and +146.8 reward. These results establish early pixel-level multimodal fusion (RGB-D) as a principled and efficient successor to temporal stacking, yielding higher stability, sample efficiency, and geometry-aware decision-making. This work provides the first controlled evidence that spatially aligned multimodal fusion within Transformer-based DRL significantly enhances mapless navigation performance and offers a reproducible foundation for sim-to-real transfer in autonomous mobile robots.
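A small sketch of the pixel-aligned RGB-D early fusion idea: the depth channel is concatenated with RGB into a single 4-channel tensor before encoding. The CNN stem and layer sizes below stand in for the paper's Transformer encoder and are illustrative only.

```python
import torch
import torch.nn as nn

# Early fusion sketch (assumed 64x64 inputs, toy encoder, not the GoT-SAC architecture).
class EarlyFusionEncoder(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2, padding=2),  # 4 channels = RGB + depth
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, rgb, depth):
        # rgb: (B, 3, H, W); depth: (B, 1, H, W), pixel-aligned with rgb.
        x = torch.cat([rgb, depth], dim=1)  # early fusion: one 4-channel tensor
        return self.stem(x)

enc = EarlyFusionEncoder()
feat = enc(torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64))
print(feat.shape)  # torch.Size([2, 128])
```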
23 pages, 5234 KB  
Article
Training Agents for Strategic Curling Through a Unified Reinforcement Learning Framework
by Yuseong Son, Jaeyoung Park and Byunghwan Jeon
Mathematics 2026, 14(3), 403; https://doi.org/10.3390/math14030403 - 23 Jan 2026
Abstract
Curling presents a challenging continuous-control problem in which shot outcomes depend on long-horizon interactions between complex physical dynamics, strategic intent, and opponent responses. Despite recent progress in applying reinforcement learning (RL) to games and sports, curling lacks a unified environment that jointly supports stable, rule-consistent simulation, structured state abstraction, and scalable agent training. To address this gap, we introduce a comprehensive learning framework for curling AI, consisting of a full-sized simulation environment, a task-aligned Markov decision process (MDP) formulation, and a two-phase training strategy designed for stable long-horizon optimization. First, we propose a novel MDP formulation that incorporates stone configuration, game context, and dynamic scoring factors, enabling an RL agent to reason simultaneously about physical feasibility and strategic desirability. Second, we present a two-phase curriculum learning procedure that significantly improves sample efficiency: Phase 1 trains the agent to master delivery mechanics by rewarding accurate placement around the tee line, while Phase 2 transitions to strategic learning with score-based rewards that encourage offensive and defensive planning. This staged training stabilizes policy learning and reduces the difficulty of direct exploration in the full curling action space. We integrate this MDP and training procedure into a unified Curling RL Framework, built upon a custom simulator designed for stability, reproducibility, and efficient RL training, and a self-play mechanism tailored for strategic decision-making. Agent policies are optimized using Soft Actor–Critic (SAC), an entropy-regularized off-policy algorithm designed for continuous control. As a case study, we compare the learned agent’s shot patterns with elite match records from the men’s division of the Le Gruyère AOP European Curling Championships 2023, using 6512 extracted shot images. Experimental results demonstrate that the proposed framework learns diverse, human-like curling shots and outperforms ablated variants across both learning curves and head-to-head evaluations. Beyond curling, our framework provides a principled template for developing RL agents in physics-driven, strategy-intensive sports environments.
(This article belongs to the Special Issue Applications of Intelligent Game and Reinforcement Learning)
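A sketch of the two-phase curriculum idea using the standard stable-baselines3 SAC implementation. Pendulum-v1 substitutes for the paper's curling simulator (which is not public), and the phase wrapper and shaping term are illustrative assumptions, not the paper's reward design.

```python
import gymnasium as gym
from stable_baselines3 import SAC

# Two-phase curriculum sketch on a stand-in continuous-control task.
class PhaseReward(gym.Wrapper):
    def __init__(self, env):
        super().__init__(env)
        self.phase = 1

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if self.phase == 1:
            reward = reward + 0.1  # stand-in for placement-accuracy shaping
        return obs, reward, terminated, truncated, info

env = PhaseReward(gym.make("Pendulum-v1"))
model = SAC("MlpPolicy", env, verbose=0)

model.learn(total_timesteps=10_000)              # Phase 1: delivery mechanics
env.phase = 2                                     # Phase 2: score-based strategy
model.learn(total_timesteps=10_000, reset_num_timesteps=False)
model.save("curling_sac_sketch")
```

The key design point mirrored here is that Phase 2 continues from the Phase 1 policy rather than retraining from scratch.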
26 pages, 3381 KB  
Article
Intelligent Control Framework for Optimal Energy Management of University Campus Microgrid
by Galia Marinova, Edmond Hajrizi, Besnik Qehaja and Vassil Guliashki
Smart Cities 2026, 9(1), 18; https://doi.org/10.3390/smartcities9010018 - 22 Jan 2026
Abstract
This study proposes a smart energy management framework for a university campus microgrid aimed at reducing dependence on the main power grid and increasing the utilization of photovoltaic (PV) generation under dynamic load and environmental conditions. The core contribution is a two-stage approach that combines a genetic algorithm (GA) for static day-ahead optimization with a soft actor-critic (SAC) reinforcement learning (RL) agent performing adaptive supervisory management of microgrid active and reactive power flows via battery control. The GA provides an optimal reference schedule under forecasted conditions, while the SAC agent is trained on eight representative scenarios derived from measured PV generation and campus load data to adapt battery operation and grid exchange under uncertainty. The results show that the benefit of RL does not lie in reproducing the static GA solution, but in learning economically rational, adaptive behavior. In particular, the SAC agent exploits low-tariff periods and hedges against adverse PV conditions by proactively adjusting battery charging strategies in real time. This adaptive behavior addresses a key limitation of static optimization, which cannot respond to deviations from forecasted operation, and represents the main added value of the proposed framework. From a practical perspective, the GA-SAC architecture operates at a supervisory level with low computational requirements, making it suitable for scalable deployment in smart campus and smart city energy management systems.
17 pages, 26741 KB  
Article
Dual-Agent Deep Reinforcement Learning for Low-Carbon Economic Dispatch in Wind-Integrated Microgrids Based on Carbon Emission Flow
by Wenjun Qiu, Hebin Ruan, Xiaoxiao Yu, Yuhang Li, Yicheng Liu and Zhiyi He
Energies 2026, 19(2), 551; https://doi.org/10.3390/en19020551 - 22 Jan 2026
Abstract
High renewable penetration in microgrids makes low-carbon economic dispatch under uncertainty challenging, and single-agent deep reinforcement learning (DRL) often yields unstable cost–emission trade-offs. This study proposes a dual-agent DRL framework that explicitly balances operational economy and environmental sustainability. A Proximal Policy Optimization (PPO) agent focuses on minimizing operating cost, while a Soft Actor–Critic (SAC) agent targets carbon emission reduction; their actions are combined through an adaptive weighting strategy. The framework is supported by carbon emission flow (CEF) theory, which enables network-level tracing of carbon flows, and a stepped carbon pricing mechanism that internalizes dynamic carbon costs. Demand response (DR) is incorporated to enhance operational flexibility. The dispatch problem is formulated as a Markov Decision Process, allowing the dual-agent system to learn policies through interaction with the environment. Case studies on a modified PJM 5-bus test system show that, compared with a Deep Deterministic Policy Gradient (DDPG) baseline, the proposed method reduces total operating cost, carbon emissions, and wind curtailment by 16.8%, 11.3%, and 15.2%, respectively. These results demonstrate that the proposed framework is an effective solution for economical and low-carbon operation in renewable-rich power systems.
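One way such a dual-agent combination could work is a convex blend of the two agents' dispatch actions with a weight adapted online. The weight-update rule and all quantities below are assumptions for illustration, not the paper's exact adaptive weighting scheme.

```python
import numpy as np

# Convex combination of the cost-focused (PPO) and carbon-focused (SAC) actions.
def combine_actions(a_cost, a_carbon, w):
    return w * a_cost + (1.0 - w) * a_carbon

# Assumed update rule: shift weight toward whichever objective is further off target.
def update_weight(w, cost_gap, carbon_gap, lr=0.05):
    w = w - lr * (carbon_gap - cost_gap)   # larger carbon gap -> favour the carbon-focused agent
    return float(np.clip(w, 0.0, 1.0))

w = 0.5
a = combine_actions(np.array([10.0, 2.0]), np.array([6.0, 4.0]), w)
w = update_weight(w, cost_gap=0.2, carbon_gap=0.6)
print(a, w)   # [8. 3.] 0.48
```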
23 pages, 4564 KB  
Article
Control of Wave Energy Converters Using Reinforcement Learning
by Odai R. Bani Hani, Zeiad Khafagy, Matthew Staber, Ashraf Gaffar and Ossama Abdelkhalik
J. Mar. Sci. Eng. 2026, 14(2), 211; https://doi.org/10.3390/jmse14020211 - 20 Jan 2026
Abstract
Efficient control of wave energy converters (WECs) is crucial for maximizing energy capture and reducing the Levelized Cost of Energy (LCoE). In this study, we employ a deep reinforcement learning (DRL) framework based on the Soft Actor-Critic (SAC) and Deep Deterministic Policy Gradient (DDPG) algorithms for WEC control. Our approach leverages a novel decoupled co-simulation architecture, training agents episodically in MATLAB and exporting a robust policy into the WEC-Sim environment. Furthermore, we utilize a rigorous benchmarking protocol to compare the SAC and DDPG agents against a classical Bang-Singular-Bang (BSB) optimal control benchmark. Evaluation under realistic, irregular Pierson-Moskowitz sea states demonstrates that the performance of the RL agents is very close to that of the BSB optimal control baseline. Monte Carlo simulations show that both the DDPG and SAC agents can perform even better than the BSB when the model used by the BSB controller differs from the simulation environment.
40 pages, 7546 KB  
Article
Hierarchical Soft Actor–Critic Agent with Automatic Entropy, Twin Critics, and Curriculum Learning for the Autonomy of Rock-Breaking Machinery in Mining Comminution Processes
by Guillermo González, John Kern, Claudio Urrea and Luis Donoso
Processes 2026, 14(2), 365; https://doi.org/10.3390/pr14020365 - 20 Jan 2026
Abstract
This work presents a hierarchical deep reinforcement learning (DRL) framework based on Soft Actor–Critic (SAC) for the autonomy of rock-breaking machinery in surface mining comminution processes. The proposed approach explicitly integrates mobile navigation and hydraulic manipulation as coupled subprocesses within a unified decision-making architecture, designed to operate under the unstructured and highly uncertain conditions characteristic of open-pit mining operations. The system employs a hysteresis-based switching mechanism between specialized SAC subagents, incorporating automatic entropy tuning to balance exploration and exploitation, twin critics to mitigate value overestimation, and curriculum learning to manage the progressive complexity of the task. Two coupled subsystems are considered, namely: (i) a tracked mobile machine with a differential drive, whose continuous control enables safe navigation, and (ii) a hydraulic manipulator equipped with an impact hammer, responsible for the fragmentation and dismantling of rock piles through continuous joint torque actuation. Environmental perception is modeled using processed perceptual variables obtained from point clouds generated by an overhead depth camera, complemented with state variables of the machinery. System performance is evaluated in unstructured and uncertain simulated environments using process-oriented metrics, including operational safety, task effectiveness, control smoothness, and energy consumption. The results show that the proposed framework yields robust, stable policies that achieve superior overall process performance compared to equivalent hierarchical configurations and ablation variants, thereby supporting its potential applicability to DRL-based mining automation systems.
(This article belongs to the Special Issue Advances in the Control of Complex Dynamic Systems)
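Two of the SAC ingredients named in the abstract above (twin critics and automatic entropy tuning) are standard in the SAC literature: the Bellman target uses the minimum of two target critics, and the temperature alpha is adjusted to keep policy entropy near a target (usually minus the action dimension). The sketch below uses random tensors in place of network outputs; it illustrates the update equations, not the paper's implementation.

```python
import torch

# Stand-in tensors for one minibatch (not real network outputs).
batch, act_dim = 256, 6
reward  = torch.randn(batch, 1)
done    = torch.zeros(batch, 1)
log_pi  = torch.randn(batch, 1)   # log-prob of the next action under the policy
q1_targ = torch.randn(batch, 1)   # outputs of the two target critics
q2_targ = torch.randn(batch, 1)
gamma   = 0.99

log_alpha = torch.zeros(1, requires_grad=True)
alpha = log_alpha.exp()

# Twin-critic target: taking the min over both critics mitigates value overestimation.
q_next = torch.min(q1_targ, q2_targ) - alpha.detach() * log_pi
td_target = reward + gamma * (1.0 - done) * q_next

# Automatic entropy tuning: adjust alpha so policy entropy tracks a target (-|A| heuristic).
target_entropy = -float(act_dim)
alpha_loss = -(log_alpha * (log_pi + target_entropy).detach()).mean()
alpha_loss.backward()
print(td_target.shape, float(alpha_loss))
```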
33 pages, 4465 KB  
Article
Environmentally Sustainable HVAC Management in Smart Buildings Using a Reinforcement Learning Framework SACEM
by Abdullah Alshammari, Ammar Ahmed E. Elhadi and Ashraf Osman Ibrahim
Sustainability 2026, 18(2), 1036; https://doi.org/10.3390/su18021036 - 20 Jan 2026
Abstract
Heating, ventilation, and air-conditioning (HVAC) systems dominate energy consumption in hot-climate buildings, where maintaining occupant comfort under extreme outdoor conditions remains a critical challenge, particularly under emerging time-of-use (TOU) electricity pricing schemes. While deep reinforcement learning (DRL) has shown promise for adaptive HVAC control, existing approaches often suffer from comfort violations, myopic decision making, and limited robustness to uncertainty. This paper proposes a comfort-first hybrid control framework that integrates Soft Actor–Critic (SAC) with a Cross-Entropy Method (CEM) refinement layer, referred to as SACEM. The framework combines data-efficient off-policy learning with short-horizon predictive optimization and safety-aware action projection to explicitly prioritize thermal comfort while minimizing energy use, operating cost, and peak demand. The control problem is formulated as a Markov Decision Process using a simplified thermal model representative of commercial buildings in hot desert climates. The proposed approach is evaluated through extensive simulation using Saudi Arabian summer weather conditions, realistic occupancy patterns, and a three-tier TOU electricity tariff. Performance is assessed against state-of-the-art baselines, including PPO, TD3, and standard SAC, using comfort, energy, cost, and peak demand metrics, complemented by ablation and disturbance-based stress tests. Results show that SACEM achieves a comfort score of 95.8%, while reducing energy consumption and operating cost by approximately 21% relative to the strongest baseline. The findings demonstrate that integrating comfort-dominant reward design with decision-time look-ahead yields robust, economically viable HVAC control suitable for deployment in hot-climate smart buildings.
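A sketch of decision-time Cross-Entropy Method refinement around a SAC proposal, in the spirit of the SACEM description above. The horizon, cost model, comfort setpoint, and action bounds are toy assumptions, not the paper's thermal model or tariff.

```python
import numpy as np

# CEM refinement: resample actions around the SAC proposal, keep the elites, refit.
def cem_refine(sac_action, cost_fn, iters=3, pop=64, elite_frac=0.125, sigma=0.2):
    mean = np.asarray(sac_action, dtype=float)
    std = np.full_like(mean, sigma)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        candidates = np.random.normal(mean, std, size=(pop, mean.size))
        candidates = np.clip(candidates, 0.0, 1.0)      # safety projection (assumed bounds)
        costs = np.array([cost_fn(c) for c in candidates])
        elites = candidates[np.argsort(costs)[:n_elite]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

# Toy short-horizon cost: comfort deviation dominates, energy is secondary.
def toy_cost(a, setpoint=0.6):
    comfort = (a[0] - setpoint) ** 2
    energy = 0.05 * a.sum()
    return 10.0 * comfort + energy

refined = cem_refine(np.array([0.3, 0.5]), toy_cost)
print(refined)   # first component is pulled toward the comfort setpoint ~0.6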
25 pages, 4405 KB  
Article
Research on Multi-USV Collision Avoidance Based on Priority-Driven and Expert-Guided Deep Reinforcement Learning
by Lixin Xu, Zixuan Wang, Zhichao Hong, Chaoshuai Han, Jiarong Qin and Ke Yang
J. Mar. Sci. Eng. 2026, 14(2), 197; https://doi.org/10.3390/jmse14020197 - 17 Jan 2026
Abstract
Deep reinforcement learning (DRL) has demonstrated considerable potential for autonomous collision avoidance in unmanned surface vessels (USVs). However, its application in complex multi-agent maritime environments is often limited by challenges such as convergence issues and high computational costs. To address these issues, this paper proposes an expert-guided DRL algorithm that integrates a Dual-Priority Experience Replay (DPER) mechanism with a Hybrid Reciprocal Velocity Obstacles (HRVO) expert module. Specifically, the DPER mechanism prioritizes high-value experiences by considering both temporal-difference (TD) error and collision avoidance quality. The TD error prioritization selects experiences with large TD errors, which typically correspond to critical state transitions with significant prediction discrepancies, thus accelerating value function updates and enhancing learning efficiency. At the same time, the collision avoidance quality prioritization reinforces successful evasive actions, preventing them from being overshadowed by a large volume of ordinary experiences. To further improve algorithm performance, this study integrates a COLREGs-compliant HRVO expert module, which guides early-stage policy exploration while ensuring compliance with regulatory constraints. The expert mechanism is incorporated into the Soft Actor-Critic (SAC) algorithm and validated in multi-vessel collision avoidance scenarios using maritime simulations. The experimental results demonstrate that, compared to traditional DRL baselines, the proposed algorithm reduces training time by 60.37% and, in comparison to rule-based algorithms, achieves shorter navigation times and lower rudder frequencies.
(This article belongs to the Section Ocean Engineering)
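A sketch of the dual-priority idea described above: replay sampling probabilities built from both TD error and a collision-avoidance quality score. The mixing weight, exponent, and quality metric are illustrative assumptions, not the paper's exact DPER formulation.

```python
import numpy as np

# Combine |TD error| and an avoidance-quality score into sampling probabilities.
def dual_priorities(td_errors, avoid_quality, lam=0.5, alpha=0.6, eps=1e-3):
    priority = lam * np.abs(td_errors) + (1.0 - lam) * avoid_quality + eps
    probs = priority ** alpha
    return probs / probs.sum()

td_errors = np.array([0.1, 2.0, 0.05, 0.7])      # critical transitions have large TD error
avoid_quality = np.array([0.0, 0.2, 1.0, 0.9])    # successful evasive manoeuvres score high
probs = dual_priorities(td_errors, avoid_quality)
idx = np.random.choice(len(td_errors), size=2, p=probs, replace=False)
print(probs.round(3), idx)
```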
22 pages, 3437 KB  
Article
A Soft Actor-Critic-Based Energy Management Strategy for Fuel Cell Vehicles Considering Fuel Cell Degradation
by Handong Zeng, Changqing Du and Yifeng Hu
Energies 2026, 19(2), 430; https://doi.org/10.3390/en19020430 - 15 Jan 2026
Abstract
Energy management strategies (EMSs) play a critical role in improving both the efficiency and durability of fuel cell electric vehicles (FCEVs). To overcome the limited adaptability and insufficient durability consideration of existing deep reinforcement learning-based EMSs, this study develops a degradation-aware energy management strategy based on the Soft Actor–Critic (SAC) algorithm. By leveraging SAC’s maximum-entropy framework, the proposed method enhances exploration efficiency and avoids premature convergence to operating patterns that are unfavorable to fuel cell durability. A reward function explicitly penalizing hydrogen consumption, power fluctuation, and degradation-related operating behaviors is designed, and the influences of reward weighting and key hyperparameters on learning stability and performance are systematically analyzed. The proposed SAC-based EMS is evaluated against Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) strategies under both training and unseen driving cycles. Simulation results demonstrate that SAC achieves a superior and robust trade-off between hydrogen economy and degradation mitigation, maintaining improved adaptability and durability under varying operating conditions. These findings indicate that integrating degradation awareness with entropy-regularized reinforcement learning provides an effective framework for practical EMS design in FCEVs.
(This article belongs to the Section E: Electric Vehicles)
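The maximum-entropy framework the abstract refers to is the standard SAC objective, which maximizes expected return plus a weighted policy-entropy bonus. Below is a sketch of a degradation-aware reward of the kind described; the weights, units, and degradation proxies are assumed for illustration and do not reproduce the paper's formulation.

```python
# Degradation-aware EMS reward sketch (assumed weights and units).
# Negative cost: hydrogen use, fuel-cell power fluctuation, and
# degradation-related behaviours (low-load operation, start-stop cycles).
def ems_reward(h2_g, dP_fc_kw, low_load_frac, start_stop,
               w_h2=1.0, w_fluct=0.05, w_deg=0.5, w_ss=2.0):
    return -(w_h2 * h2_g
             + w_fluct * abs(dP_fc_kw)
             + w_deg * low_load_frac
             + w_ss * start_stop)

# One control step: 0.8 g H2, 4 kW power swing, 10% of the step at low load, no restart.
print(ems_reward(0.8, 4.0, 0.1, 0))   # -> -1.05
```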
18 pages, 11774 KB  
Article
Retrieval Augment: Robust Path Planning for Fruit-Picking Robot Based on Real-Time Policy Reconstruction
by Binhao Chen, Shuo Zhang, Zichuan He and Liang Gong
Sustainability 2026, 18(2), 829; https://doi.org/10.3390/su18020829 - 14 Jan 2026
Abstract
The working environment of fruit-picking robots is highly complex, involving numerous obstacles such as branches. Sampling-based algorithms like Rapidly Exploring Random Trees (RRTs) are computationally fast but suffer from low success rates and poor path quality. Deep reinforcement learning (DRL) has excelled in high-degree-of-freedom (DOF) robot path planning, but typically requires substantial computational resources and long training cycles, which limits its applicability in resource-constrained and large-scale agricultural deployments. However, picking robot agents trained by DRL underperform because of the complexity and dynamics of the picking scenes. We propose a real-time policy reconstruction method based on experience retrieval to augment an agent trained by DRL. The key idea is to optimize the agent’s policy during inference rather than retraining, thereby reducing training cost, energy consumption, and data requirements, which are critical factors for sustainable agricultural robotics. We first use Soft Actor–Critic (SAC) to train the agent with simple picking tasks and fewer episodes. When faced with complex picking tasks, instead of retraining the agent, we reconstruct its policy by retrieving experience from similar tasks and revising actions in real time, which is implemented specifically by real-time action evaluation and rejection sampling. Overall, the agent evolves into an augment agent through policy reconstruction, enabling it to perform much better in complex tasks with narrow passages and dense obstacles than the original agent. We test our method both in simulation and in the real world. Results show that the augment agent outperforms the original agent and sampling-based algorithms such as BIT* and AIT* in terms of success rate (+133.3%) and path quality (+60.4%), demonstrating its potential to support reliable, scalable, and sustainable fruit-picking automation.
(This article belongs to the Section Sustainable Agriculture)
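A sketch of inference-time policy reconstruction via action evaluation and rejection sampling, following the abstract's description. The frozen policy, quadratic critic, and retrieval store below are stand-in functions, not the paper's trained networks or experience database.

```python
import numpy as np

# Keep only candidates the critic scores at least as well as the original policy action,
# then return the best of them (rejection sampling + real-time action evaluation).
def reconstruct_action(policy, critic, retrieved_actions, obs, n_samples=32, noise=0.1):
    base = policy(obs)
    baseline = critic(obs, base)                      # value of the unmodified policy action
    candidates = [base + np.random.normal(0.0, noise, size=base.shape)
                  for _ in range(n_samples)]
    candidates += [np.asarray(a, float) + np.random.normal(0.0, noise, size=np.shape(a))
                   for a in retrieved_actions]        # experience retrieved from similar tasks
    scores = [critic(obs, a) for a in candidates]
    accepted = [c for c, s in zip(candidates, scores) if s >= baseline]   # rejection step
    if not accepted:                                  # no candidate beats the original action
        return base
    return accepted[int(np.argmax([critic(obs, a) for a in accepted]))]

# Stand-ins: a frozen policy, a quadratic critic peaked at 0.3, one retrieved action.
policy = lambda obs: np.zeros(2)
critic = lambda obs, a: -float(np.sum((np.asarray(a) - 0.3) ** 2))
best = reconstruct_action(policy, critic, [np.array([0.28, 0.32])], obs=None)
print(best)   # typically close to the retrieved near-optimal action
```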
27 pages, 3772 KB  
Article
Research on Three-Dimensional Simulation Technology Based on an Improved RRT Algorithm
by Nan Zhang, Yang Luan, Chengkun Li, Weizhou Xu, Fengju Zhu, Chao Ye and Nianxia Han
Electronics 2026, 15(2), 286; https://doi.org/10.3390/electronics15020286 - 8 Jan 2026
Abstract
As urban power grids grow increasingly complex and underground space resources become increasingly scarce, traditional two-dimensional cable design methods face significant challenges in spatial representation accuracy and design efficiency. This study proposes an automated cable path planning method based on an improved Rapidly exploring Random Tree (RRT) algorithm. This framework first introduces an enhanced RRT algorithm (referred to as ABS-RRT) that integrates adaptive stride, target-biased sampling, and Soft Actor-Critic reinforcement learning. This algorithm automates the planning of serpentine cable laying paths in confined environments such as cable tunnels and manholes. Subsequently, through trajectory simplification and smoothing optimization, it generates final paths that are safe, smooth, and compliant with engineering specifications. Simulation validation on a typical cable tunnel project in a city’s core area demonstrates that compared to the traditional RRT algorithm, this approach reduces path planning time by over 57%, decreases path length by 8.1%, and lowers the number of nodes by 52%. These results validate the algorithm’s broad application potential in complex urban power grid projects.
(This article belongs to the Special Issue Planning, Scheduling and Control of Grids with Renewables)
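A sketch of the two sampling ideas named above (target-biased sampling and an adaptive stride) for a single RRT extension step in free 3-D space. The bias probability, stride rule, and clearance handling are illustrative assumptions, not the ABS-RRT algorithm itself.

```python
import numpy as np

# Target-biased sampling: with some probability, sample the goal directly.
def sample_point(goal, bounds, goal_bias=0.2, rng=np.random.default_rng()):
    if rng.random() < goal_bias:
        return np.asarray(goal, dtype=float)
    lo, hi = bounds
    return rng.uniform(lo, hi)

# Adaptive stride: shrink the extension step when clearance to obstacles is small.
def extend(nearest, sample, clearance, base_step=0.5):
    step = base_step * min(1.0, clearance)
    direction = sample - nearest
    dist = np.linalg.norm(direction)
    if dist < 1e-9:
        return nearest
    return nearest + direction / dist * min(step, dist)

goal = np.array([10.0, 2.0, 1.0])
bounds = (np.zeros(3), np.array([12.0, 5.0, 3.0]))
s = sample_point(goal, bounds)
new_node = extend(np.array([1.0, 1.0, 1.0]), s, clearance=0.4)
print(s, new_node)
```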
27 pages, 5365 KB  
Article
Autonomous Maneuvering Decision-Making Method for Unmanned Aerial Vehicle Based on Soft Actor-Critic Algorithm
by Shiming Quan, Su Cao, Chang Wang and Huangchao Yu
Drones 2026, 10(1), 35; https://doi.org/10.3390/drones10010035 - 6 Jan 2026
Abstract
Focusing on continuous action space methods for autonomous maneuvering decision making in 1v1 unmanned aerial vehicle scenarios, this paper first establishes a UAV kinematic model and a decision-making framework under the Markov Decision Process. Second, a continuous control strategy based on the Soft Actor-Critic (SAC) reinforcement learning algorithm is developed to generate precise maneuvering commands. Then, a multi-dimensional situation-coupled reward function is designed, introducing a Health Point (HP) metric to assess situational advantages and simulate cumulative effects quantitatively. Finally, extensive simulations in a custom Gym environment validate the effectiveness of the proposed method and its robustness under both ideal and noisy observation conditions.
33 pages, 5328 KB  
Article
AI-Guided Inference of Morphodynamic Attractor-like States in Glioblastoma
by Simona Ruxandra Volovăț, Diana Ioana Panaite, Mădălina Raluca Ostafe, Călin Gheorghe Buzea, Dragoș Teodor Iancu, Maricel Agop, Lăcrămioara Ochiuz, Dragoș Ioan Rusu and Cristian Constantin Volovăț
Diagnostics 2026, 16(1), 139; https://doi.org/10.3390/diagnostics16010139 - 1 Jan 2026
Abstract
Background/Objectives: Glioblastoma (GBM) exhibits heterogeneous, nonlinear invasion patterns that challenge conventional modeling and radiomic prediction. Most deep learning approaches describe the morphology but rarely capture the dynamical stability of tumor evolution. We propose an AI framework that approximates a latent attractor landscape of GBM morphodynamics—stable basins in a continuous manifold that are consistent with reproducible morphologic regimes. Methods: Multimodal MRI scans from BraTS 2020 (n = 494) were standardized and embedded with a 3D autoencoder to obtain 128-D latent representations. Unsupervised clustering identified latent basins (“attractors”). A neural ordinary differential equation (neural-ODE) approximated latent dynamics. All dynamics were inferred from cross-sectional population variability rather than longitudinal follow-up, serving as a proof-of-concept approximation of morphologic continuity. Voxel-level perturbation quantified local morphodynamic sensitivity, and proof-of-concept control was explored by adding small inputs to the neural-ODE using both a deterministic controller and a reinforcement learning agent based on soft actor–critic (SAC). Survival analyses (Kaplan–Meier, log-rank, ridge-regularized Cox) assessed associations with outcomes. Results: The learned latent manifold was smooth and clinically organized. Three dominant attractor basins were identified with significant survival stratification (χ² = 31.8, p = 1.3 × 10⁻⁷) in the static model. Dynamic attractor basins derived from neural-ODE endpoints showed modest and non-significant survival differences, confirming that these dynamic labels primarily encode the morphodynamic structure rather than fixed prognostic strata. Dynamic basins inferred from neural-ODE flows were not independently prognostic, indicating that the inferred morphodynamic field captures geometric organization rather than additional clinical risk information. The latent stability index showed a weak but borderline significant negative association with survival (ρ = −0.13 [−0.26, −0.01]; p = 0.0499). In multivariable Cox models, age remained the dominant covariate (HR = 1.30 [1.16–1.45]; p = 5 × 10⁻⁶), with overall C-indices of 0.61–0.64. Voxel-level sensitivity maps highlighted enhancing rims and peri-necrotic interfaces as influential regions. In simulation, deterministic control redirected trajectories toward lower-risk basins (≈57% success; ≈96% terminal distance reduction), while a soft actor–critic (SAC) agent produced smoother trajectories and modest additional reductions in terminal distance, albeit without matching the deterministic controller’s success rate. The learned attractor classes were internally consistent and clinically distinct. Conclusions: Learning a latent attractor landscape links generative AI, dynamical systems theory, and clinical outcomes in GBM. Although limited by the cross-sectional nature of BraTS and modest prognostic gains beyond age, these results provide a mechanistic, controllable framework for tumor morphology in which inferred dynamic attractor-like flows describe latent organization rather than a clinically predictive temporal model, motivating prospective radiogenomic validation and adaptive therapy studies.
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
23 pages, 9766 KB  
Article
Generalization and Exploitation: Meta-GSAC for Multi-Task UAV Path Planning and Obstacle Avoidance
by Jingyi Huang, Shuangxia Bai, Liangliang Huai, Yujie Cui, Bo Li and Kaifang Wan
Drones 2026, 10(1), 14; https://doi.org/10.3390/drones10010014 - 27 Dec 2025
Abstract
Deep reinforcement learning (DRL) is extensively applied in autonomous unmanned aerial vehicle (UAV) control yet faces critical challenges regarding adaptability and generalization in dynamic environments. To address these limitations, this paper proposes the Meta Gated Transformer-XL Soft Actor-Critic (Meta-GSAC) algorithm. This framework integrates a Gated Transformer-XL module to capture long-term temporal dependencies from multimodal inputs and incorporates the Reptile algorithm to facilitate multi-task meta-learning. Experimental results demonstrate that Meta-GSAC significantly outperforms standard baselines. Notably, it achieves optimal policy convergence with approximately 50% fewer training epochs while effectively eliminating the high-frequency control oscillations observed in the GSAC baseline. Moreover, the proposed method exhibits superior few-shot adaptation capabilities, enabling the UAV to rapidly adapt to novel task scenarios with minimal gradient updates.
(This article belongs to the Section Artificial Intelligence in Drones (AID))