Search Results (1,665)

Search Parameters:
Keywords = reward system

21 pages, 1612 KB  
Article
MPG-OD-DDPG-Based Optimal Scheduling for Park-Level Multi-Source Synergistic Power Supply System
by Jiafei Sun, Gaoyu Zhao and Jianyan Tian
Energies 2026, 19(9), 2116; https://doi.org/10.3390/en19092116 - 28 Apr 2026
Abstract
To address the uncertainty of renewable energy output power and load demand in the park-level multi-source synergistic power supply system and achieve economical system operation, an optimal scheduling method based on the mathematical programming guided imitation learning-oscillation-decaying deep deterministic policy gradient (MPG-OD-DDPG) algorithm is proposed. First, a scheduling model of the park-level multi-source synergistic power supply system is established with the objective of minimizing system operating cost. Second, to address the low sample utilization efficiency of the deep deterministic policy gradient (DDPG), a mathematical programming guided imitation learning (MPGIL) method is introduced to obtain demonstration experience and guide the initial training process. Then, targeting the insufficient exploration capability of DDPG, an oscillation-decaying Ornstein–Uhlenbeck noise (OD-OU) is designed to enhance the exploratory capability of the algorithm through the oscillation-decaying process of noise intensity. Finally, according to the scheduling model, the reward function is designed, and the proposed algorithm is used to search for the optimal energy regulation strategy, thereby realizing the optimal scheduling of the park-level multi-source synergistic power supply system. Case study results show that compared with the DDPG, the MPG-OD-DDPG reduces the operating cost by 7.41%. It can effectively exploit the energy time-shift capability of the energy storage system (ESS) and reduce the operating cost of the park-level multi-source synergistic power supply system. Full article
(This article belongs to the Section A1: Smart Grids and Microgrids)
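The oscillation-decaying exploration noise described in this abstract can be illustrated with a minimal sketch. The damped-oscillation envelope below and all class, method, and parameter names are assumptions for illustration, not the authors' implementation:

```python
import math
import numpy as np

class OscillationDecayingOUNoise:
    """Ornstein-Uhlenbeck exploration noise whose intensity follows a
    damped oscillation over training steps (illustrative sketch; the
    paper's exact OD-OU schedule is not reproduced here)."""

    def __init__(self, dim, theta=0.15, sigma0=0.3, decay=1e-3, omega=0.05):
        self.dim, self.theta = dim, theta
        self.sigma0, self.decay, self.omega = sigma0, decay, omega
        self.state = np.zeros(dim)
        self.step_count = 0

    def sigma(self):
        # Envelope decays overall but periodically rebounds,
        # re-injecting exploration in later episodes.
        t = self.step_count
        return self.sigma0 * math.exp(-self.decay * t) * abs(math.cos(self.omega * t))

    def sample(self):
        # Simplified OU update with unit time step and zero mean.
        self.step_count += 1
        dx = -self.theta * self.state + self.sigma() * np.random.randn(self.dim)
        self.state = self.state + dx
        return self.state
```

A plain OU process keeps the intensity fixed; the envelope here is one plausible way to realize "oscillation-decaying" intensity, trading late-stage exploitation against occasional bursts of renewed exploration.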
24 pages, 4230 KB  
Article
Retention and Distribution of Dopamine-Dependent Reward Memory in Regenerating Planaria
by Kenneth Samuel, Abigail K. Hakes, Easter S. Suviseshamuthu and Maria E. Fichera
Biomolecules 2026, 16(5), 649; https://doi.org/10.3390/biom16050649 (registering DOI) - 27 Apr 2026
Abstract
Memory is generally thought to be stored within centralized neural circuits. However, whether learned behaviors can persist in the absence of a brain remains unresolved. Planaria (Girardia spp.) possess a primitive cephalic ganglion and a remarkable capacity for regeneration, providing a unique system to examine non-cephalic memory retention. The primary aim of this study was to determine whether sucrose-induced conditioned place preference (CPP) is retained in posterior, brainless planarian fragments. Planaria were trained using a Pavlovian conditioning paradigm in which an initially unpreferred surface was paired with a 10% sucrose solution, resulting in a robust shift in surface preference. Following amputation, anterior fragments containing the cephalic ganglion as well as posterior fragments lacking the brain preserved the conditioned preference, demonstrating that reward-associated memory is stored even outside the cephalic nervous system. As a secondary objective, we examined the role of dopaminergic reinforcement using a D1 dopamine receptor antagonist during training. While antagonist-treated planaria failed to develop a CPP, posterior fragments from these amputated planaria likewise showed no conditioned preference, indicating that dopamine-dependent signaling is essential for sucrose-associated memory formation across the body. These results provide support for the hypothesis that reward-associated memory in planaria is distributed beyond the brain and can be modulated by dopaminergic pathways, highlighting the utility of this model for exploring fundamental mechanisms of reward, memory, and potential pharmacological interventions. Full article
(This article belongs to the Special Issue The Planarian Model in Pharmacology, Toxicology, and Neuroscience)
23 pages, 3938 KB  
Article
Research on Proximal Policy Optimization Algorithm in Path Planning for UAV-Based Vehicle Tracking
by Dongna Qiao and Hongxin Zhang
Drones 2026, 10(5), 319; https://doi.org/10.3390/drones10050319 - 23 Apr 2026
Viewed by 218
Abstract
Unmanned Aerial Vehicle (UAV) tracking of ground moving targets holds significant applications in domains such as intelligent transportation, logistics distribution, and environmental monitoring, placing greater demands on efficient and stable path-planning methods for vehicular tracking. This study investigates a UAV path tracking approach based on a deep reinforcement learning algorithm, Proximal Policy Optimization (PPO). Starting from the kinematic characteristics of UAVs and ground vehicles, a 3D path planning model was constructed that considers spatial coordinates, velocity, and attitude constraints. A well-designed objective function—including tracking error minimization, energy optimization, and safety distance constraints—was incorporated. By designing the state space, action space, and reward function, the PPO algorithm is capable of adaptive learning in complex environments. Compared with traditional Artificial Potential Field (APF), Q-learning, and TD3 algorithms, PPO better balances exploration and exploitation and demonstrates stronger learning stability and global optimization capability in dynamic multi-obstacle scenarios. Simulation results show that PPO-based UAV path planning outperforms Q-learning and other comparative algorithms in terms of tracking accuracy, convergence speed, and robustness. In specific scenarios, Q-learning achieves a trajectory error of approximately 1 m, TD3 and APF exhibit errors around 0.3 m with noticeable oscillations, and PPO achieves an error of about 0.2 m. The UAV can follow the vehicle trajectory smoothly, with a more continuous path and rapidly converging, stable error curves, indicating the promising application potential of PPO in intelligent UAV control. 
The PPO-based UAV-tracking path planning method effectively enhances the UAV’s intelligent decision-making and path optimization capabilities, providing new technical approaches and a research foundation for intelligent UAV traffic and cooperative control systems. Full article
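The PPO algorithm compared above centers on a clipped surrogate objective that limits how far each policy update can move from the behavior policy. A minimal per-sample sketch (function name and the default clip value are illustrative, not from this paper):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective (to be maximized), per sample:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the
    probability ratio pi_new(a|s) / pi_old(a|s) and A the advantage."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum removes the incentive to push the ratio
    # outside the trust region in the direction that inflates the objective.
    return np.minimum(unclipped, clipped)
```

The pessimistic minimum is what gives PPO the stable exploration-exploitation balance the abstract credits over Q-learning and TD3.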
20 pages, 4963 KB  
Article
Complex-Scene-Oriented Autonomous Decision-Making Method for UAVs
by Hongwei Qu and Jinlin Zou
Electronics 2026, 15(8), 1757; https://doi.org/10.3390/electronics15081757 - 21 Apr 2026
Viewed by 219
Abstract
The extensive application of unmanned aerial vehicles (UAVs) in power inspection, military operations and environmental monitoring demands stronger robustness and adaptability for autonomous decision-making systems. Existing methods suffer from heavy map dependence, high computational complexity and insufficient exploration and generalization. Traditional approaches based on expert rules and planning algorithms only suit fixed scenarios and degrade severely in complex dynamic environments. To address these problems, this paper proposes a complex-scene-oriented autonomous decision-making method for UAVs (CADU). It builds a closed-loop decision chain by integrating perception, strategy and execution modules, and adopts a curiosity mechanism and contrastive learning to enhance exploration and adaptability. Experimental results show that the proposed CADU achieves an average reward of 0.85, a trajectory smoothness of 0.87, a flight stability of 0.85, and a cumulative collision count of 8 ± 1.2, which significantly outperforms the DDPG, PPO and SAC baselines. It provides a reliable and efficient scheme for UAV autonomous decision-making in complex scenarios. Full article
(This article belongs to the Section Artificial Intelligence)
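The curiosity mechanism mentioned above is commonly realized as an intrinsic reward proportional to a forward model's prediction error (ICM-style). A minimal sketch under that assumption; the function names and the scale constant are illustrative and not taken from the paper:

```python
import numpy as np

def intrinsic_reward(pred_next_state, next_state, scale=0.1):
    """Curiosity bonus as forward-model prediction error: transitions
    the agent predicts poorly earn a larger bonus, steering exploration
    toward unfamiliar parts of the state space."""
    pred = np.asarray(pred_next_state, dtype=float)
    actual = np.asarray(next_state, dtype=float)
    return scale * float(np.sum((pred - actual) ** 2))

def total_reward(extrinsic, pred_next_state, next_state, scale=0.1):
    """Training signal = task reward + curiosity bonus."""
    return extrinsic + intrinsic_reward(pred_next_state, next_state, scale)
```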
29 pages, 488 KB  
Review
Glucagon-like Peptide-1 and Dual GIP/GLP-1 Receptor Agonists in Brain: Exploring the Expanding Role and Safety in Neuropsychiatry
by Ana Cristina Tudosie, Loredana-Maria Marin, Simona Georgiana Popa and Andreea Loredana Golli
Int. J. Mol. Sci. 2026, 27(8), 3628; https://doi.org/10.3390/ijms27083628 - 18 Apr 2026
Viewed by 566
Abstract
Glucagon-like peptide-1 (GLP-1) and dual GIP/GLP-1 receptor agonists, originally introduced for the management of type 2 diabetes mellitus and obesity, are increasingly recognized for their broader actions within the central nervous system, with emerging implications in neuropsychiatry and neurodegeneration. This review integrates current preclinical and clinical evidence, emphasizing their pharmacodynamic profile, central receptor distribution, and the molecular pathways linking metabolic signaling to neural function. Evidence suggests that GLP-1 receptor activation across key brain regions involved in energy balance and reward modulates multiple neurotransmitter systems, including dopamine and serotonin, as well as glutamatergic and GABAergic transmission, thereby influencing behavior, affective processes, and cognitive function. In parallel, these agents exhibit neuroprotective properties through improved neuronal insulin sensitivity, attenuation of neuroinflammatory pathways, and support of neuroplasticity, alongside effects on limiting pathological protein aggregation. Dual GIP/GLP-1 agonism may further potentiate these central actions through complementary metabolic and synaptic mechanisms. Although pharmacovigilance data have identified isolated neuropsychiatric adverse events, current clinical evidence does not support a consistent causal association. Collectively, incretin-based therapies represent a promising translational approach at the interface of metabolic and neuropsychiatric disorders, warranting further investigation into their long-term central safety, therapeutic efficacy, and clinical relevance. Full article
(This article belongs to the Special Issue Role of the Gut-Islet Axis in and Beyond Metabolic Diseases)
22 pages, 919 KB  
Article
Large Autonomous Driving Overtaking Decision and Control System Based on Hierarchical Reinforcement Learning
by Chen-Ning Wang and Xiuhui Tang
Electronics 2026, 15(8), 1711; https://doi.org/10.3390/electronics15081711 - 17 Apr 2026
Viewed by 168
Abstract
To address the bottlenecks of low sample efficiency and poor control accuracy in traditional single-layer reinforcement learning during autonomous driving overtaking, this paper proposes an overtaking decision and control system based on hierarchical reinforcement learning to decouple complex tasks in spatial and temporal dimensions. A heterogeneous two-layer architecture is constructed, where the upper layer adopts the Proximal Policy Optimization algorithm to generate macroscopic discrete decisions, while the lower layer employs Twin Delayed Deep Deterministic Policy Gradient combined with Long Short-Term Memory to achieve smooth continuous control of steering and acceleration by perceiving temporal features of dynamic obstacles. A composite reward mechanism, integrating hard safety constraints and soft efficiency incentives, is designed to balance safety, efficiency, and comfort. Experimental results in complex scenarios with multiple interfering vehicles and random lane-changing behaviors demonstrate that the proposed system improves the training convergence speed by approximately 30% within 500,000 steps compared to single-layer algorithms. In tests across varying traffic densities, the system achieves a 98.3% success rate in medium-density scenarios with a collision rate of only 0.6%. In high-density challenges, the success rate remains above 95%, with the collision rate reduced by about 80% compared to baseline models. Furthermore, the lateral control deviation is strictly limited to within 0.2 m, and the longitudinal safety distance remains stable above 5 m. This system provides a robust, high-efficiency paradigm for autonomous overtaking. Full article
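A composite reward of the kind described, with hard safety constraints dominating soft efficiency and comfort terms, might be sketched as follows. All weights, penalty magnitudes, and argument names are illustrative assumptions, not the paper's design:

```python
def composite_reward(collision, lane_dev_m, speed_mps, target_speed_mps,
                     jerk, w_eff=1.0, w_comfort=0.1):
    """Composite reward sketch for an overtaking agent: a hard safety
    penalty dominates everything else, while soft terms trade off
    efficiency (speed tracking) against comfort (low jerk) and
    lateral accuracy (lane deviation)."""
    if collision:
        # Hard constraint: a collision ends the episode with a penalty
        # large enough that no soft-term gain can offset it.
        return -100.0
    efficiency = -abs(speed_mps - target_speed_mps)   # track desired speed
    comfort = -abs(jerk)                              # penalize harsh actuation
    lane_penalty = -abs(lane_dev_m)                   # stay near lane center
    return w_eff * efficiency + w_comfort * comfort + lane_penalty
```

Separating a dominant hard penalty from weighted soft incentives is one standard way to balance safety, efficiency, and comfort in a single scalar signal.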
24 pages, 6370 KB  
Article
Ketogenic Diet Promotes Reward Learning by Upregulating Hippocampal CAMK2A Expression and Activating Dopamine Synaptic Signaling
by Yanan Qiao, Yubing Zeng, Chen Chen, Jinying Shen, Yi Wang, Pei Pei and Shan Wang
Int. J. Mol. Sci. 2026, 27(8), 3587; https://doi.org/10.3390/ijms27083587 - 17 Apr 2026
Viewed by 192
Abstract
Various neuromodulatory benefits of the ketogenic diet (KD) have been demonstrated, yet its influence on reward learning and the underlying mechanisms remain poorly defined. This study combined proteomics and metabolomics to identify key molecular changes in the hippocampus of KD-fed mice. Our analysis revealed significant upregulation of the “dopaminergic synapse” pathway, with CAMK2A emerging as a central regulator. In vitro, treatment of the hippocampal neuronal cell line HT22 with β-hydroxybutyrate (BHB), a primary KD metabolite, increased CAMK2A protein expression and enhanced the phosphorylation of its downstream target, GluA1. Crucially, Camk2a knockdown completely blocked the BHB-induced p-GluA1 enhancement. To determine the behavioral relevance, we stereotaxically delivered AAV-shCamk2a into the hippocampus of KD-fed mice. Knockdown of Camk2a reversed the pro-reward effects of KD, as measured by the sucrose preference test and conditioned place preference test, without impairing general locomotor activity in the open field test. Together, these results suggest a novel BHB–CAMK2A–dopaminergic signaling axis through which KD enhances reward learning, thus bridging systemic metabolism with cognitive function and expanding our understanding of KD-mediated neuromodulation. Full article
(This article belongs to the Section Bioactives and Nutraceuticals)
25 pages, 1418 KB  
Article
Artificial Intelligence-Based Decision Support System for UAV Control in a Simulated Environment
by Przemysław Sujecki and Damian Frąszczak
Sensors 2026, 26(8), 2436; https://doi.org/10.3390/s26082436 - 15 Apr 2026
Viewed by 255
Abstract
Unmanned aerial vehicles (UAVs) are increasingly deployed in missions that require high autonomy and reliable decision-making; however, many operational concepts still assume access to GNSS and stable communication with a human operator. In contested environments, this assumption may no longer hold because GNSS degradation, radio-frequency interference, and intentional jamming can disrupt positioning and communication, thereby reducing mission effectiveness and safety. Recent surveys show that operation in GNSS-denied environments remains a major challenge and often requires alternative perception, localization, and control strategies. In response, this article investigates a reinforcement learning (RL)-based decision-support system for the autonomous control of a quadrotor UAV in a three-dimensional simulated environment. Rather than following pre-programmed waypoints, the UAV learns a control policy through interaction with the environment and reward-driven adaptation. The proposed system is designed for mission execution under uncertainty, limited external guidance, and partial observability. Two policy-gradient approaches are implemented and compared: classical REINFORCE and Proximal Policy Optimization (PPO) with an Actor–Critic architecture. The study presents the simulation environment, state and action representation, reward formulation, staged training procedure, and comparative evaluation. The results indicate that, in the final unseen test scenario, the PPO-based configuration achieved higher mission effectiveness than REINFORCE, supporting the practical relevance of structured deep reinforcement learning for UAV operation in GPS-denied and communication-constrained environments. Full article
22 pages, 1136 KB  
Article
Co-Optimized Scheduling of a Multi-Microgrid System Based on a Reputation Point Trading Mechanism
by Jiankai Fang, Dongmei Yan, Hongkun Wang, Hui Deng, Xinyu Meng and Hong Zhang
Smart Cities 2026, 9(4), 69; https://doi.org/10.3390/smartcities9040069 - 15 Apr 2026
Viewed by 296
Abstract
With the rapid integration of distributed energy resources, achieving a balance between economic efficiency and environmental sustainability in multi-microgrid (MMG) systems is critical. However, existing studies typically treat microgrid operators as fully compliant entities. They often neglect the “trust-risk” dimension along with potential default behaviors in decentralized markets. This paper proposes a novel co-optimized scheduling model for urban MMG systems, centered on a unified “Social–Economic–Physical” coupling framework. To ensure transaction integrity, a robust reputation evaluation framework is developed using root mean square error (RMSE), mean absolute error (MAE), and dynamic time warping (DTW). This framework effectively identifies fraudulent data and contractual breaches. Furthermore, to enhance fairness while promoting decarbonization, the model integrates a dynamic network pricing strategy based on the Shapley value, working alongside a reputation-weighted reward–penalty step-type carbon trading scheme. The proposed model is formulated as a mixed-integer linear programming (MILP) problem and solved using MATLAB R2025b with CPLEX 12.10. Simulation results demonstrate that the integrated approach significantly optimizes system performance. Total carbon emissions are reduced by 49.6 tons, while revenues for the MMG Alliance, individual microgrids, and shared energy storage operators increase by 4.08% to 33.00%. The proposed framework provides a practical governance solution for Smart City multi-microgrid systems, effectively addressing the “trust-risk” challenge in decentralized urban energy markets. The findings validate that the proposed mechanism effectively fosters a trustworthy trading environment, achieving a “win-win” outcome for economic profitability and urban energy resilience. Full article
(This article belongs to the Section Smart Urban Energies and Integrated Systems)
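The three similarity measures named in the reputation framework are standard and compact to state. A self-contained sketch using the classic textbook definitions (not the authors' code):

```python
import numpy as np

def rmse(a, b):
    """Root mean square error between two equal-length series."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

def mae(a, b):
    """Mean absolute error between two equal-length series."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.mean(np.abs(a - b)))

def dtw(a, b):
    """Classic O(len(a) * len(b)) dynamic-time-warping distance,
    tolerant of temporal misalignment between the series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

Combining a pointwise measure (RMSE/MAE) with an alignment-tolerant one (DTW) is a reasonable way to flag reported profiles that deviate from delivered ones, whether the deviation is in magnitude or in timing.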
29 pages, 10011 KB  
Article
Method for Controlling the Movement of an AUV Follower Based on Visual Information About the Position of the AUV Leader Using Reinforcement Learning Methods
by Evgenii Norenko, Vadim Kramar and Aleksey Kabanov
Drones 2026, 10(4), 282; https://doi.org/10.3390/drones10040282 - 14 Apr 2026
Viewed by 341
Abstract
This paper considers the problem of controlling the motion of an autonomous underwater vehicle (AUV) following a leader in a leader–follower scheme based on visual information about the leader’s position. It is assumed that the leader is equipped with a system of light markers with known geometry, and the follower determines its relative position based on data from an onboard camera without using a hydroacoustic communication channel or direct exchange of navigation information. To synthesize the control law, a reinforcement learning method based on the Proximal Policy Optimization algorithm is used. Policy learning is performed in a simulation environment, taking into account the dynamic model of the agent in the horizontal plane and observation noise. A structure of state space, actions, and reward function is proposed, aimed at minimizing the error in relative position and orientation. Additionally, Bayesian optimization of the reward function weight coefficients is performed; it reduces the RMS tracking error from 0.24 m to 0.09 m and demonstrates that heading regulation has a significantly stronger impact on stability than position penalties. The results of modeling, testing in the Webots environment, and experiments on MiddleAUV-class devices confirm the feasibility and scalability of the approach. It is shown that a single trained policy ensures stable formation maintenance when the number of follower agents and initial conditions change, without additional retraining. Full article
(This article belongs to the Special Issue Intelligent Cooperative Technologies of UAV Swarm Systems)
21 pages, 2353 KB  
Article
An Adaptive Bidding Strategy for Virtual Power Plants in Day-Ahead Markets Under Multiple Uncertainties
by Wei Yang and Wenjun Wang
Energies 2026, 19(8), 1878; https://doi.org/10.3390/en19081878 - 12 Apr 2026
Viewed by 473
Abstract
To address the challenges posed by multiple uncertainties in modern power systems to the market bidding of Virtual Power Plants (VPPs), this paper proposes an adaptive bidding strategy based on Deep Reinforcement Learning (DRL). First, a heterogeneous VPP aggregation model integrating dedicated energy storage, Vehicle-to-Grid (V2G), and flexible loads is constructed, incorporating complex physical and operational constraints. Second, to overcome the “myopic” local optimality problem of traditional DRL in temporal arbitrage tasks, a potential-based reward shaping mechanism linked to future price trends is designed to guide the agent toward long-term optimal strategies. Finally, multi-dimensional comparative experiments and mechanism analyses are conducted in a simulated day-ahead electricity market. Simulation results demonstrate the following: (1) The proposed algorithm exhibits robust convergence stability and effectively handles stochastic noise in market prices and renewable generation. (2) Economically, the strategy significantly outperforms the rule-based strategy and remains highly competitive with the deterministic-optimization benchmark under perfect-information assumptions. (3) Mechanism analysis further reveals that the DRL agent breaks through the rigid logic of fixed thresholds, learning a non-linear dynamic game mechanism based on “Price-SOC” states, thereby achieving full-depth utilization of energy storage resources. This work provides an interpretable data-driven paradigm for intelligent VPP decision-making in uncertain environments. Full article
(This article belongs to the Special Issue Transforming Power Systems and Smart Grids with Deep Learning)
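The potential-based reward shaping mentioned above has a well-known general form, F(s, s') = γΦ(s') − Φ(s), which is known to preserve the optimal policy (the Ng, Harada and Russell result). A minimal sketch; tying the potential Φ to forecast future prices is only suggested by the abstract, so the potential passed in here is a placeholder:

```python
def shaped_reward(r, s, s_next, potential, gamma=0.99):
    """Potential-based reward shaping: the shaping term
    F = gamma * Phi(s') - Phi(s) telescopes along any trajectory,
    so it changes learning speed but not which policy is optimal.
    `potential` is any state-value heuristic, e.g. one derived from
    price forecasts for a VPP agent (assumed example)."""
    return r + gamma * potential(s_next) - potential(s)
```

Because the shaping term telescopes, the agent is nudged toward states the heuristic favors (e.g. a charged battery before a forecast price peak) without altering the long-run optimum, which is exactly the antidote to the "myopic" behavior the abstract describes.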
36 pages, 3241 KB  
Article
Optimizing Risk–Return Tradeoffs in Wind–Storage Bidding: A Soft Actor–Critic Approach
by Tongtao Ma, Zongxing Li, Dunnan Liu, Zetian Zhao, Yuting Li, Wantong Cai and Qun Li
Energies 2026, 19(8), 1861; https://doi.org/10.3390/en19081861 - 10 Apr 2026
Viewed by 295
Abstract
Strategic bidding for wind–battery hybrid systems is increasingly critical as electricity spot markets transition toward market-oriented mechanisms, particularly in Chinese pilot regions. However, dual uncertainties—wind generation variability and volatile locational marginal prices (LMPs)—expose market participants to significant financial tail risk. This study develops a risk-constrained reinforcement learning framework for optimal bidding of wind–storage hybrid systems. We employ soft actor–critic (SAC) for continuous action control and integrate conditional value-at-risk (CVaR) into reward design to explicitly penalize low-probability, high-loss outcomes. The framework incorporates realistic operational constraints, including linearized battery degradation costs and a market-compatible single-bid abstraction for hourly settlement. Using one-year historical operational data from a 150 MW wind farm (with a 91-day test period), we find that storage integration increases annual profit by 108.4–114.2% relative to wind-only operation. Critically, the SAC–CVaR policy (η = 0.35) preserves 97.3% of risk-neutral profit ($7.71 M vs. $7.93 M) while substantially mitigating downside risk: CVaR@95% improves by 42.4% (−$549 vs. −$952) and VaR@95% improves by 30.1% (−$275 vs. −$393). The trained policy achieves sub-millisecond inference (0.262 ms per decision, ~3820 decisions/s), corresponding to a 3.8 × 10^4–5.7 × 10^4× speedup over optimization-based solvers (10–15 s per decision), enabling real-time deployment. Behavioral analysis reveals that the agent learns adaptive, forecast-normalized bidding strategies with more conservative reporting in high-price regimes and counter-cyclical battery dispatch patterns, demonstrating effective coordination between profitability and risk control under volatile market conditions. Full article
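CVaR, the tail-risk measure integrated into the reward design above, is simply the mean of the worst (1 − α) fraction of outcomes. A minimal empirical-CVaR sketch (a generic sample estimator, not the paper's exact formulation):

```python
import numpy as np

def cvar(returns, alpha=0.95):
    """Empirical conditional value-at-risk of sampled profits: the
    mean of the worst (1 - alpha) tail. A more negative value means
    heavier downside risk; penalizing it in the reward makes the
    agent avoid low-probability, high-loss bids."""
    r = np.sort(np.asarray(returns, dtype=float))   # ascending: worst first
    k = max(1, int(np.ceil((1.0 - alpha) * len(r))))
    return float(np.mean(r[:k]))
```

A risk-adjusted reward of the form `profit + eta * cvar(...)` (with `eta` a weight like the η = 0.35 quoted above; the exact combination rule is an assumption here) is one common way to trade a small amount of mean profit for a much better tail, matching the pattern the abstract reports.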
32 pages, 9226 KB  
Article
Regenerative–Frictional Brake Blending in Electric Vehicles Considering Energy Recovery and Dynamic Battery Charging Limit: A Reinforcement Learning-Based Approach
by Farshid Naseri, Bjartur Ragnarsson a Nordi, Konstantinos Spiliotopoulos and Erik Schaltz
Machines 2026, 14(4), 416; https://doi.org/10.3390/machines14040416 - 9 Apr 2026
Viewed by 482
Abstract
This paper presents the design, development, and evaluation of a Reinforcement Learning (RL)–based torque-split controller for the regenerative braking system (RBS) in battery electric vehicles (BEVs). The controller employs a Deep Deterministic Policy Gradient (DDPG) agent to distribute the braking demand between regenerative and frictional braking systems with the aim of maximizing energy recovery while adhering to the physical and operational constraints. To capture the charging limitation of the battery, a State-of-Power (SoP) calculation mechanism is incorporated, providing a time-varying bound on the regenerative charge power. The agent is trained in a MATLAB/Simulink environment representing the digital twin of a BEV drivetrain, and considers a mix of different braking scenarios, i.e., light braking, medium braking, hard braking, and emergency braking. The RL’s reward shaping promotes efficient utilization of the SoP-limited regenerative capability while discouraging constraint violations and aggressive control behavior. Across a range of State-of-Charge (SoC) conditions and driving cycles, including the Worldwide Harmonized Light–Vehicle Test Procedure (WLTP) and synthetic random-rich driving cycle, the RL controller consistently delivers promising performance, yielding energy recovery of up to ~98% of the total braking energy available on WLTP type 3 driving cycle while being able to operate closely to the battery SoP limit. The results demonstrate the proposed controller’s capability for adaptive, constraint-aware energy management in BEVs and underline its potential for future intelligent braking strategies. Full article
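The core of a SoP-aware brake blend like the one described is a cap on regenerative torque derived from the battery's time-varying charge-power bound. A heavily simplified rule-based sketch; the signature and numbers are illustrative assumptions, and the actual controller in the paper is a learned DDPG policy rather than a fixed rule:

```python
def split_brake_torque(demand_nm, regen_cap_nm, sop_kw, wheel_speed_radps):
    """Split a total braking torque demand between regenerative and
    friction brakes. The regenerative share is capped both by the
    machine's torque limit and by the battery state-of-power (SoP)
    charge limit, converted to a torque via T = P / omega."""
    if wheel_speed_radps <= 0.0:
        regen = 0.0                              # no regen at standstill
    else:
        sop_torque = (sop_kw * 1e3) / wheel_speed_radps
        regen = min(demand_nm, regen_cap_nm, sop_torque)
    friction = demand_nm - regen                 # friction covers the rest
    return regen, friction
```

The learned policy effectively has to discover a split of this shape while also anticipating SoC-dependent changes in the SoP bound, which is what the reward shaping described above encourages.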
36 pages, 8897 KB  
Article
Evolutionary Game Analysis of AI-Generated Disinformation Governance on UGC Platforms Based on Prospect Theory
by Licai Lei, Yanyan Wu and Shang Gao
Systems 2026, 14(4), 416; https://doi.org/10.3390/systems14040416 - 9 Apr 2026
Viewed by 456
Abstract
While Generative Artificial Intelligence technology empowers content production on user-generated content platforms, it also gives rise to novel risks of disinformation dissemination. The effective governance of these risks is critical to ensuring the cybersecurity of the online ecosystem and maintaining long-term social stability. To address the collaborative governance dilemma, this study constructs a tripartite “platform-user-government” evolutionary game model based on prospect theory. It explores the evolutionarily stable strategies and stability conditions of each actor, supplemented by numerical simulations and practical case validation. The results indicate that: (1) under specific conditions, the system can converge to an ideal equilibrium {active platform governance, engaged user participation, stringent government supervision}; (2) the government’s reward–penalty mechanisms can drive the system towards this ideal equilibrium; (3) users’ digital literacy is a key variable influencing the system’s evolutionary path; (4) both the risk preference coefficient (β) and loss aversion coefficient (λ) from prospect theory have a significant moderating effect on the system’s evolution. Finally, targeted recommendations are proposed for the three aforementioned stakeholders to accelerate the improvement of China’s collaborative governance of the content ecosystem. Full article
(This article belongs to the Special Issue Advancing Open Innovation in the Age of AI and Digital Transformation)
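The evolutionary dynamics underlying tripartite game models of this kind are typically replicator equations. The sketch below shows a generic single-population replicator step as a stand-in; the paper's model is tripartite and prospect-theoretic (with β and λ distorting the payoffs), which this deliberately does not reproduce:

```python
import numpy as np

def replicator_step(x, payoffs, dt=0.01):
    """One Euler step of replicator dynamics over a strategy mix x:
    dx_i = x_i * (f_i - average fitness). Strategies earning more than
    the population average grow; the mix stays on the simplex."""
    x = np.asarray(x, dtype=float)
    f = np.asarray(payoffs, dtype=float)
    avg = float(x @ f)                       # population-average fitness
    x_new = x + dt * x * (f - avg)
    return x_new / x_new.sum()               # renormalize against drift
```

Iterating such steps for each of the three populations, with payoffs distorted by a prospect-theory value function, is how evolutionarily stable strategies like the ideal equilibrium quoted in the abstract are located numerically.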
23 pages, 3301 KB  
Article
Hierarchical Active Perception and Stability Control for Multi-Robot Collaborative Search in Unknown Environments
by Zeyu Xu, Kai Xue, Ping Wang and Decheng Kong
Actuators 2026, 15(4), 209; https://doi.org/10.3390/act15040209 - 7 Apr 2026
Viewed by 388
Abstract
Multi-robot systems (MRS) have attracted a lot of attention from researchers due to their widespread application in various environments. However, in multi-robot collaborative search tasks, two problems often arise: sparse rewards for capturing targets and control oscillations. To address these issues, this paper proposes the hierarchical active perception multi-agent deep deterministic policy gradient (HAP-MADDPG) framework. This framework guides robots to efficiently explore maps and discover targets through global utility planning based on global exploration rate and local information aggregation based on local exploration rate. A stability control mechanism, which includes hysteresis logic and reward decay, is introduced to suppress control oscillations. Experimental results show that the HAP-MADDPG framework achieves a success rate of 96.25% and an average search time of 216.3 steps. The path trajectories are smooth, demonstrating the effectiveness of the proposed approach. Full article
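Hysteresis logic of the kind used here for oscillation suppression fits in a few lines: switching on and off at different thresholds prevents rapid toggling when a signal hovers near a single threshold. Threshold values and the function name are illustrative assumptions:

```python
def hysteresis_switch(value, state, on_threshold=0.6, off_threshold=0.4):
    """Two-threshold hysteresis: turn on only above on_threshold,
    turn off only below off_threshold. In the dead band between the
    two, keep the previous state, so small fluctuations around a
    single set point cannot cause control chattering."""
    if not state and value >= on_threshold:
        return True
    if state and value <= off_threshold:
        return False
    return state
```

Pairing this with reward decay, as the abstract describes, penalizes residual oscillation during training as well, so the learned policy and the low-level switching logic both discourage chattering.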