Search Results (139)

Search Parameters:
Keywords = partially observable Markov decision processes

30 pages, 4268 KB  
Article
A Bumblebee-Inspired Spatial Memory Navigation Framework for Robotic Odor Source Localization
by Tianyi Xu, Yizhu Guo, Zhigang Wu and Jianing Wu
Biomimetics 2026, 11(5), 350; https://doi.org/10.3390/biomimetics11050350 - 18 May 2026
Viewed by 198
Abstract
Odor source localization in turbulent environments remains a major challenge for autonomous robots, as odor plumes are highly intermittent, spatially fragmented, and often lack stable concentration gradients. Here, we propose a bio-inspired navigation framework that translates key principles of bumblebee olfactory cognition into robotic decision-making. First, classical conditioning and olfactorily triggered spatial memory experiments demonstrated that bumblebees could form robust odor memories and that training frequency is positively correlated with both proboscis extension response retention and spatial directional preference. Based on these biological findings, a bio-inspired navigation framework, termed Bio-Nav, is constructed by integrating a Partially Observable Markov Decision Process, a Hidden Markov Model, short-term memory, long-term directional reference memory, fuzzy inference, and value iteration. High-fidelity two-dimensional turbulent simulations show that the proposed algorithm substantially outperforms moth-inspired search, Infotaxis, and standard POMDP-based navigation. In 100 Monte Carlo trials, Bio-Nav achieved a success rate of 96.0%, an average of 20.3 search steps, an average path length of 155.1 cm, and a path-to-straight-line distance ratio of 1.6. Even under strong turbulence, the success rate remained above 91%. These results indicate that memory–perception coupling, inspired by bumblebee navigation, provides an effective and robust strategy for odor source localization in complex turbulent environments, offering a generalizable principle for bio-inspired robotic search under uncertainty.
(This article belongs to the Special Issue Bio-Inspired Robotics and Applications 2026)
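The abstract above names a POMDP and a Hidden Markov Model as components of Bio-Nav but does not give the update equations. As a rough illustration only, the standard discrete Bayes-filter belief update that POMDP methods generally rely on might look like the sketch below. The function name, the three-cell grid, and the likelihood values are assumptions made for illustration and are not taken from the paper.

```python
def belief_update(belief, transition, obs_likelihood):
    """One Bayes-filter step: predict with the transition model, then correct.

    belief[s]          -- prior probability of hidden state s
    transition[s][s2]  -- P(s2 | s, a) for the action just taken
    obs_likelihood[s2] -- P(o | s2) for the observation just received
    """
    n = len(belief)
    # Prediction: push the prior belief through the transition model.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correction: weight each state by how well it explains the observation.
    unnorm = [obs_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnorm]


# Hypothetical example: three candidate source cells, static source,
# so the transition matrix is the identity.
prior = [1 / 3, 1 / 3, 1 / 3]
stay = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
odor_hit_likelihood = [0.1, 0.3, 0.6]  # made-up P(odor detected | source in cell)
posterior = belief_update(prior, stay, odor_hit_likelihood)
```

With a uniform prior and a static source, the posterior is simply the normalized likelihood, so the belief shifts toward the cell that best explains the odor detection.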

20 pages, 4232 KB  
Article
Coordinated Active Voltage Control Strategy for Active Distribution Networks Based on Multi-Agent Actor-Critic with Multi-Head Attention
by Jianli Zhao, Jiani Xiang, Qing Wang, Weijian Tao and Qian Ai
Electronics 2026, 15(10), 2026; https://doi.org/10.3390/electronics15102026 - 9 May 2026
Viewed by 252
Abstract
High penetration of distributed photovoltaic (PV) generation in active distribution networks (ADNs) has intensified voltage violations and rapid voltage fluctuations, especially under extreme reverse-power-flow conditions. Traditional centralized voltage regulation methods rely on accurate physical network parameters and wide-area communication, making it difficult to achieve fast online coordination under rapidly changing operating conditions. To address this issue, this paper proposes a coordinated active voltage control strategy for ADNs based on multi-agent actor-critic learning with a multi-head attention mechanism. The PV-cluster reactive power coordination problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP), and a reward function combining a bowl-shaped voltage barrier term, a voltage-stability safety term, and an equipment-utilization regularization term is designed. In addition, the multi-head attention mechanism is used to extract state-dependent decision relevance among PV agents, thereby reducing redundant information in high-dimensional state spaces. Case studies on IEEE 33-node and 141-node systems demonstrate that the proposed method outperforms both OPF and benchmark DRL methods in voltage regulation performance. Additional ablation, interpretability, and online-time analyses further verify the contributions of the attention module and the voltage barrier reward design.

19 pages, 3333 KB  
Article
Energy-Harvesting-Assisted UAV Swarm Anti-Jamming Communication Based on Multi-Agent Reinforcement Learning
by Yongfang Li, Tianyu Zhao, Zhijuan Wu, Yan Lin and Yijin Zhang
Drones 2026, 10(4), 294; https://doi.org/10.3390/drones10040294 - 16 Apr 2026
Viewed by 560
Abstract
Because unmanned aerial vehicles (UAVs) with limited onboard battery energy are susceptible to both co-channel interference and malicious jamming, this paper proposes an energy-harvesting-assisted anti-jamming communication framework for UAV swarm networks. Specifically, we first model the problem as a decentralized partially observable Markov decision process (Dec-POMDP), aiming to achieve a long-term trade-off between data transmission success rate and energy consumption. We then propose a multi-agent independent advantage actor–critic (IA2C)-based energy-harvesting-assisted anti-jamming communication solution, which enables each cluster head (CH) to learn its transmit channel, power, and energy harvesting time policy independently. By constructing a time-space-based extended Dec-POMDP, the spatiotemporal correlations among neighboring nodes are learned by allowing adjacent agents to share discounted local observations. Extensive simulations show that, compared with the benchmark schemes, the proposed scheme improves the average cumulative reward and average cumulative success rate by 17.26% and 10.37%, respectively, while achieving a higher transmission success rate with lower energy consumption under different numbers of available channels.
(This article belongs to the Special Issue Intelligent Spectrum Management in UAV Communication)

30 pages, 2640 KB  
Article
Environment-Aware Optimal Placement and Dynamic Reconfiguration of Underwater Robotic Sonar Networks Using Deep Reinforcement Learning
by Qiming Sang, Yu Tian, Jin Zhang, Yuyang Xiao, Zhiduo Tan, Jiancheng Yu and Fumin Zhang
J. Mar. Sci. Eng. 2026, 14(8), 733; https://doi.org/10.3390/jmse14080733 - 15 Apr 2026
Viewed by 380
Abstract
Underwater dynamic target detection, classification, localization, and tracking (DCLT) is central to maritime surveillance and monitoring and increasingly relies on distributed AUV-based robotic sonar networks operating in passive listening and, when required, cooperative multistatic modes. Achieving a robust performance in realistic oceans remains challenging, because sensor placement must adapt to time-varying acoustic conditions and target priors while preserving acoustic communication connectivity, and because frequent reconfiguration under dynamic currents makes classical large-scale planning computationally expensive. This paper presents an integrated deep reinforcement learning (DRL)-based framework for passive-stage sonar placement and dynamic reconfiguration in distributed AUV networks. First, we cast placement as a constructive finite-horizon Markov decision process (MDP) and train a Proximal Policy Optimization (PPO) agent to sequentially build a collision-free layout on a discretized surveillance grid. The terminal reward is formulated to jointly optimize the environment-aware detection performance, computed from BELLHOP-based transmission loss models, and global network connectivity, quantified using algebraic connectivity. Second, to enable time-critical reconfiguration, we estimate flow-aware motion costs for all AUV–destination pairs using a PPO with a Long Short-Term Memory (LSTM) trajectory policy trained for partial observability. The learned policy can be deployed onboard, allowing each AUV to refine its path online using locally sensed currents, improving robustness to ocean-model uncertainty. The resulting cost matrix is solved via an efficient zero-element assignment method to obtain the optimal one-to-one reassignment. 
In the reported simulation studies, the proposed Sequential PPO placement method achieves a final reward 16–21% higher than Particle Swarm Optimization (PSO) and 2–3.7% higher than the Genetic Algorithm (GA), while the proposed PPO + LSTM planner reduces average travel time by 30.44% compared with A*. The proposed closed-loop architecture supports frequent re-optimization, scalable fleet operation, and a seamless transition to communication-supported cooperative multistatic tracking after detection, enabling efficient, adaptive DCLT in dynamic marine environments.
(This article belongs to the Section Ocean Engineering)

28 pages, 11994 KB  
Article
Multi-UAV Cooperative Path Planning Method Based on an Improved MADDPG Algorithm
by Feiqiao Zhang, Qian Wang and Xin Ma
Electronics 2026, 15(8), 1632; https://doi.org/10.3390/electronics15081632 - 14 Apr 2026
Viewed by 452
Abstract
To address cooperative path planning for multiple UAVs in complex environments, this paper proposes an improved multi-agent deep deterministic policy gradient algorithm, named Prioritized Experience Multi-Agent Deep Deterministic Policy Gradient (PE-MADDPG). An urban low-altitude inspection environment is first constructed within a reinforcement-learning framework, in which dynamic constraints, safety-separation requirements, and formation-cooperation objectives are incorporated into a partially observable Markov decision process. To improve training effectiveness, prioritized experience replay is introduced to increase the utilization of informative samples, an adaptive exploration-noise strategy is designed to regulate exploration intensity, and a multi-head attention mechanism is embedded in the Critic network to enhance the representation of inter-agent interactions. Simulation results in a three-dimensional urban inspection scenario show that PE-MADDPG outperforms the selected benchmark methods in task completion rate, formation maintenance, flight efficiency, and energy consumption. These results provide an effective solution for urban low-altitude inspection tasks.

26 pages, 5800 KB  
Article
Agentic AI-Based IoT Precision Agriculture Framework—Our Vision and Challenges
by Danco Davcev, Slobodan Kalajdziski, Ivica Dimitrovski, Ivan Kitanovski and Kosta Mitreski
AgriEngineering 2026, 8(4), 147; https://doi.org/10.3390/agriengineering8040147 - 9 Apr 2026
Viewed by 1675
Abstract
Accurate, timely, and resource-efficient decision-making is critical for sustainable precision agriculture. This paper proposes an agentic AI-based Internet of Things (IoT) framework that enables coordinated, closed-loop perception–decision–action processes across heterogeneous sensing and actuation components. The framework models agricultural systems as distributed collections of goal-driven agents responsible for multimodal sensing, uncertainty-aware reasoning, and adaptive decision-making. To provide a structured foundation, the proposed architecture is formalized within a Multi-Agent Partially Observable Markov Decision Process (MPOMDP) perspective, enabling systematic treatment of coordination, uncertainty, and decision policies. The framework integrates multimodal information sources, including vision-based perception and environmental sensing, and defines mechanisms for their fusion and use in system-level decision-making. A proof-of-concept instantiation is presented using publicly available datasets, combining visual perception models and tabular reasoning models within the proposed agentic workflow. The experiments are designed to demonstrate the feasibility, modularity, and coordination capabilities of the framework, rather than to benchmark predictive performance or provide field-validated evaluation. The results illustrate how multimodal information can be integrated to support adaptive and resource-aware decision processes. Finally, the paper discusses key challenges and outlines directions for future work, including real-world deployment, integration with physical actuation systems, and validation under operational conditions.
(This article belongs to the Special Issue The Future of Artificial Intelligence in Agriculture, 2nd Edition)

28 pages, 4423 KB  
Article
A Neighbor Feature Aggregation-Based Multi-Agent Reinforcement Learning Method for Fast Solution of Distributed Real-Time Power Dispatch Problem
by Baisen Chen, Chenghuang Li, Qingfen Liao, Wenyi Wang, Lingteng Ma and Xiaowei Wang
Electronics 2026, 15(7), 1415; https://doi.org/10.3390/electronics15071415 - 28 Mar 2026
Viewed by 335
Abstract
To address the challenges posed by the strong uncertainty of high-proportion renewable energy sources (RES) to the secure and stable operation of distributed real-time power dispatch (D-RTPD) in new-type power systems, this paper proposes an integrated solution combining a neighborhood feature aggregation-based graph attention network (NFA-GAT) and multi-agent deep deterministic policy gradient (MADDPG). First, the D-RTPD problem is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), which effectively captures the stochastic game characteristics of multi-regional agents and the partial observability of grid states. Second, the NFA-GAT is designed to enhance agents’ perception of grid operating states: by introducing a spatial discount factor, it realizes rational aggregation of multi-order neighborhood information while modeling the attenuation of electrical quantity influence with topological distance. Third, a prior-guided mechanism is integrated into the MADDPG framework to eliminate constraint-violating actions by setting their actor logits to negative infinity, improving training efficiency and strategy reliability. Simulation validations on the IEEE 118-bus test system (75.2% RES installed capacity ratio) show that the proposed method achieves efficient training convergence. Compared with the multi-layer perceptron (MLP) structure, it attains higher cumulative reward values and scenario win rates. Compared with traditional model-driven (ADMM) and data-driven (Q-MIX) methods, the proposed method balances solution efficiency, operational safety (98.7% maximum line load rate, zero power flow violation rate), and economic performance ($12,845 daily dispatch cost), providing reliable technical support for D-RTPD under high-proportion RES integration.
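The abstract above mentions eliminating constraint-violating actions by setting their actor logits to negative infinity. A minimal sketch of that general masking idea over a discrete action set is given below. It assumes the feasibility flags are already available; the function name, the logits, and the mask are hypothetical and do not reproduce the paper's prior-guided mechanism.

```python
import math


def masked_softmax(logits, feasible):
    """Softmax in which infeasible actions receive logit -inf, hence probability 0."""
    masked = [l if ok else -math.inf for l, ok in zip(logits, feasible)]
    top = max(masked)
    if top == -math.inf:
        raise ValueError("at least one action must be feasible")
    # Subtract the maximum for numerical stability; exp(-inf) evaluates to 0.0.
    exps = [math.exp(l - top) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: the second action is assumed to violate a constraint.
probs = masked_softmax([2.0, 1.0, 0.5], [True, False, True])
```

Because the masked action has exactly zero probability, it can never be sampled, and the remaining probability mass is renormalized over the feasible actions.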

35 pages, 1839 KB  
Article
Adversarially Robust Reinforcement Learning for Energy Management in Microgrids with Voltage Regulation Under Partial Observability
by Elida Domínguez, Xiaotian Zhou and Hao Liang
Energies 2026, 19(6), 1497; https://doi.org/10.3390/en19061497 - 17 Mar 2026
Viewed by 521
Abstract
Modern microgrids increasingly rely on learning-based energy management systems (EMSs) for real-time decision-making, yet remain vulnerable to cyber–physical disturbances, sensor tampering, and model uncertainty. Existing resilient control and robust reinforcement learning methods provide useful foundations, but rarely address adversarial measurement perturbations that distort belief evolution under partial observability. This gap is critical, as structured perturbations in sensing channels can destabilize learning-based policies and propagate into voltage-regulation violations. This paper proposes an adversarially robust reinforcement learning framework for energy management with voltage regulation under partial observability in microgrids. The EMS decision-making problem is formulated as a partially observable Markov decision process (POMDP) that accounts for adversarial measurement perturbations, belief evolution, and system-level economic and voltage constraints. To avoid excessive conservatism under worst-case uncertainty, an adversary-aware belief construction based on adversarial belief balancing (A3B) is employed to focus on policy-relevant perturbations. Building on this belief representation, an adversarially robust learning framework is developed by incorporating adversarial counterfactual error (ACoE) as a learning regularization mechanism, enabling a balance between nominal operating efficiency and robustness under adversarial measurement distortion. The case study is conducted on a medium-voltage radial distribution feeder (IEEE 123-Node Test Feeder). Case study results demonstrate that the proposed ACoE-regularized policies substantially reduce voltage-deficit events, improve policy stability, and maintain operational constraints under adversarial perturbations, consistently outperforming standard proximal policy optimization (PPO)-based controllers. 
These results indicate that counterfactual-aware, belief-based learning substantially enhances voltage quality and operational resilience in microgrids with high penetration of distributed energy resources.
(This article belongs to the Special Issue Transforming Power Systems and Smart Grids with Deep Learning)

24 pages, 2763 KB  
Article
Dynamic Hierarchical Fusion for Space Multi-Target Passive Tracking with Limited Field-of-View
by Jizhe Wang, Di Zhou, Runle Du and Jiaqi Liu
Aerospace 2026, 13(3), 282; https://doi.org/10.3390/aerospace13030282 - 17 Mar 2026
Viewed by 331
Abstract
Space-based multi-target passive tracking is critical for space situational awareness, but faces severe challenges due to the limited field-of-view (FoV) and directional ambiguity of onboard sensors. These constraints often lead to target loss, poor observability, and decreased estimation accuracy. To address these issues, different fusion architectures have been explored. While centralized measurement-level fusion offers superior accuracy for estimating target states, distributed estimation-level fusion provides greater reliability for estimating the number of targets. To adaptively leverage these two complementary strengths, a dynamic hierarchical fusion method based on real-time optimization of the fusion topology is proposed. Specifically, at each decision epoch, sensor nodes are dynamically partitioned into local fusion nodes (LFNs) and detection-only nodes (DONs). Each LFN receives measurements from selected DONs and executes an iterated-correction Gaussian-mixture probability hypothesis density filter. Subsequently, LFNs share and fuse their estimates using intensity-dependent arithmetic average fusion. This dynamic process is achieved by applying a sensor management scheme based on a partially observable Markov decision process (POMDP). To ensure accurate cardinality estimation, the reward function in the POMDP utilizes the posterior expected number of targets. The resultant optimization is efficiently solved using a binary particle swarm optimization algorithm. Numerical and hardware-in-the-loop simulations demonstrate the effectiveness of the proposed method in balancing the accuracy of target number and state estimation.
(This article belongs to the Section Astronautics & Space Science)

82 pages, 6808 KB  
Article
Agentic Finance: An Adaptive Inference Framework for Bounded-Rational Investing Agents
by Samuel Montañez Jacquez, John H. Clippinger and Matthew Moroney
Entropy 2026, 28(3), 321; https://doi.org/10.3390/e28030321 - 12 Mar 2026
Cited by 2 | Viewed by 1431
Abstract
We propose Adaptive Inference, a portfolio management framework extending Active Inference to non-stationary financial environments. The framework integrates inference, control, and execution under endogenous uncertainty, modeling investment decisions as coupled dynamics of belief updating, preference encoding, and action selection rather than optimization over fixed objectives. In this approach, portfolio behavior is governed by expected free energy (EFE) minimization, and we show that classical valuation models emerge as limiting cases when epistemic components vanish. Using train–test evaluation on the ARKK Innovation ETF (2015–2025), we identify a Passivity Paradox: frozen belief transfer outperforms naive adaptive learning. A Professional Agent achieves a Sharpe ratio of 0.39 while its adaptive counterpart degrades to 0.28, reflecting belief contamination when learning from policy-dependent signals. Crucially, the architecture is not designed to generate alpha but to perform endogenous risk management that mitigates overtrading under regime ambiguity and distributional shift. Adaptive Inference Agents maintain long exposure most of the time while tactically reducing positions during high-entropy periods, implementing uncertainty-aware passive investing. All agents reduce realized volatility relative to ARKK Buy-and-Hold (43.0% annualized). Cross-asset validation on the S&P 500 ETF (SPY) shows that inference-guided risk shaping achieves a positive Entropic Sharpe Ratio (ESR), defined as excess return per unit of informational work, thereby quantifying the economic value of information under thermodynamic constraints on inference.

22 pages, 2733 KB  
Article
Attention-Enhanced Multi-Agent Deep Reinforcement Learning for Inverter-Based Volt-VAR Control in Active Distribution Networks
by Wenwen Chen, Hao Niu, Linbo Liu, Jianglong Lin and Huan Quan
Mathematics 2026, 14(5), 839; https://doi.org/10.3390/math14050839 - 1 Mar 2026
Viewed by 559
Abstract
The increasing penetration of inverter-interfaced photovoltaic (PV) generation in active distribution networks (ADNs) intensifies fast voltage violations and makes real-time Volt-VAR control (VVC) challenging, especially when each inverter has only partial and noisy measurements and communication is limited. Existing local droop-type strategies lack coordination, while fully centralized optimization/learning is often impractical for online deployment. To address these gaps, an attention-enhanced multi-agent deep reinforcement learning (MADRL) framework is developed for inverter-based VVC under the centralized training and decentralized execution (CTDE) paradigm. First, the voltage regulation problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP) to explicitly account for system stochasticity and temporal variability under partial observability. To solve this complex game, an attention-enhanced MADRL architecture is employed, where an agent-level attention mechanism is integrated into the centralized critic. Unlike traditional methods that treat all neighbor information equally, the proposed mechanism enables each inverter agent to dynamically prioritize and selectively focus on the most influential states from other agents, effectively capturing complex intercorrelations while enhancing training stability and learning efficiency. Operating under the CTDE paradigm, the framework realizes coordinated reactive power support using only local measurements, ensuring high scalability and practical implementability in communication-constrained environments. Simulations on the IEEE 33-bus system with six PV inverters show that the proposed method reduces the average voltage deviation on the test set from 0.0117 p.u. (droop control) and 0.0112 p.u. (MADDPG) to 0.0074 p.u., while maintaining millisecond-level execution time comparable to other MADRL baselines. 
Scalability tests with up to 12 agents further demonstrate robust performance of the proposed method under higher PV penetration.

22 pages, 1345 KB  
Article
Multi-UAVs Searching and Tracking for USV Swarm: A Center-Sub-Critics Reinforcement Learning Approach
by Ye Hou, Bo Li and Xueru Miao
Drones 2026, 10(2), 123; https://doi.org/10.3390/drones10020123 - 11 Feb 2026
Viewed by 767
Abstract
This work proposes a cooperative trajectory planning scheme for multiple unmanned aerial vehicles (UAVs), built on multi-agent reinforcement learning with hybrid critics, that improves searching and tracking efficiency and fairness when the vehicles in a dynamic unmanned surface vehicle (USV) swarm outnumber the UAVs. A confidence map of targets’ existence probability with spatio-temporal decay is first established through a local information fusion mechanism based on Bayesian update theory. This leads to a reformulation of the problem model as a communication-enhanced partially observable Markov decision process. To suppress policy variance and credibility imbalance among the UAVs, a center-sub-critics deep deterministic policy gradient algorithm is then proposed, combining multiple centralized critics with decentralized critics. Meanwhile, a segmented reward function is designed to incentivize the UAVs to revisit detected targets. Finally, simulation comparisons with diverse baseline algorithms demonstrate the efficacy and scalability of the proposed scheme.
(This article belongs to the Section Artificial Intelligence in Drones (AID))
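The abstract above describes a confidence map of target existence probability that combines Bayesian updating with spatio-temporal decay, without giving the formulas. One common way to realize such a map for a single grid cell is sketched below. The linear relaxation toward a prior and the sensor parameters (p_detect, p_false, rate) are assumptions chosen for illustration, not the authors' model.

```python
def bayes_cell_update(p, detected, p_detect=0.9, p_false=0.1):
    """Posterior probability that a target occupies the cell after one reading.

    p        -- prior existence probability for the cell
    detected -- True if the (hypothetical) sensor reported a target
    p_detect -- assumed P(report | target present)
    p_false  -- assumed P(report | no target)
    """
    like_target = p_detect if detected else 1.0 - p_detect
    like_empty = p_false if detected else 1.0 - p_false
    numerator = like_target * p
    return numerator / (numerator + like_empty * (1.0 - p))


def decay_toward_prior(p, prior=0.5, rate=0.1):
    """Relax an unobserved cell's probability toward the uninformed prior."""
    return p + rate * (prior - p)


# Hypothetical sequence: one positive detection, then one step without observation.
after_detection = bayes_cell_update(0.5, detected=True)
after_decay = decay_toward_prior(after_detection)
```

The decay step models the idea that information about a moving target grows stale, so a cell that is no longer observed drifts back toward maximum uncertainty and becomes worth revisiting.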

29 pages, 627 KB  
Review
Learning-Based Multi-Robot Active SLAM: A Conceptual Framework and Survey
by Bowen Lv and Shihong Duan
Appl. Sci. 2026, 16(3), 1412; https://doi.org/10.3390/app16031412 - 30 Jan 2026
Cited by 1 | Viewed by 1267
Abstract
Multi-robot systems (MRSs) offer distinct advantages in large-scale exploration but require tight coupling between decentralized decision-making and collaborative estimation. This survey reviews learning-based multi-robot Active Collaborative Simultaneous Localization and Mapping (AC-SLAM), modeling it as a coupled system comprising a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) decision layer and a distributed factor-graph estimation layer. By synthesizing these components into a conceptual framework, recent methods for cooperative perception, mapping, and policy learning are systematically critiqued. The analysis concludes that Hierarchical Reinforcement Learning (HRL) and graph-based spatial abstraction currently offer superior scalability and robustness compared to monolithic end-to-end approaches. Furthermore, a comprehensive analysis of Sim-to-Real transfer strategies is provided, ranging from domain randomization to emerging Real-to-Sim techniques based on NeRF and 3D Gaussian Splatting. Finally, future directions are outlined, moving from geometric mapping toward LLM-driven active semantic understanding and dynamic digital twins to bridge the reality gap.
(This article belongs to the Special Issue Applications of Robot Navigation in Autonomous Systems)

28 pages, 2028 KB  
Article
Dynamic Resource Games in the Wood Flooring Industry: A Bayesian Learning and Lyapunov Control Framework
by Yuli Wang and Athanasios V. Vasilakos
Algorithms 2026, 19(1), 78; https://doi.org/10.3390/a19010078 - 16 Jan 2026
Viewed by 402
Abstract
Wood flooring manufacturers face complex challenges in dynamically allocating resources across multi-channel markets, characterized by channel conflicts, demand uncertainty, and long-term cumulative effects of decisions. Traditional static optimization or myopic approaches struggle to address these intertwined factors, particularly when critical market states like brand reputation and customer base cannot be precisely observed. This paper establishes a systematic and theoretically grounded online decision framework to tackle this problem. We first model the problem as a Partially Observable Stochastic Dynamic Game. The core innovation lies in introducing an unobservable market position vector as the central system state, whose evolution is jointly influenced by firm investments, inter-channel competition, and macroeconomic randomness. The model further captures production lead times, physical inventory dynamics, and saturation/cross-channel effects of marketing investments, constructing a high-fidelity dynamic system. To solve this complex model, we propose a hierarchical online learning and control algorithm named L-BAP (Lyapunov-based Bayesian Approximate Planning), which innovatively integrates three core modules. It employs particle filters for Bayesian inference to nonparametrically estimate latent market states online. Simultaneously, the algorithm constructs a Lyapunov optimization framework that transforms long-term discounted reward objectives into tractable single-period optimization problems through virtual debt queues, while ensuring stability of physical systems like inventory. Finally, the algorithm embeds a game-theoretic module to predict and respond to rational strategic reactions from each channel. We provide theoretical performance analysis, rigorously proving the mean-square boundedness of system queues and deriving the performance gap between long-term rewards and optimal policies under complete information. This bound clearly quantifies the trade-off between estimation accuracy (determined by particle count) and optimization parameters. Extensive simulations demonstrate that our L-BAP algorithm significantly outperforms several strong baselines, including myopic learning and decentralized reinforcement learning methods, across multiple dimensions: long-term profitability, inventory risk control, and customer service levels.
(This article belongs to the Section Analysis of Algorithms and Complexity Theory)
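
The abstract above mentions particle filters for online estimation of latent market states but does not give the model. As a rough illustration of the general technique only, the following is a generic bootstrap particle filter for a scalar latent state. The Gaussian random-walk dynamics, the Gaussian observation noise, and every name and parameter (`bootstrap_particle_filter`, `process_std`, `obs_std`) are assumptions made for this sketch and are not taken from the paper.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_std=0.1, obs_std=0.5, rng=None):
    """Return the filtered mean of a scalar latent state after each observation.

    Hypothetical model (an assumption, not the paper's):
        x_t = x_{t-1} + N(0, process_std^2)   latent state
        y_t = x_t     + N(0, obs_std^2)       noisy observation
    """
    rng = rng or random.Random(0)
    # Initial particles drawn from an assumed standard normal prior.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate each particle through the assumed dynamics.
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # 2. Weight each particle by the (unnormalized) Gaussian likelihood.
        weights = [math.exp(-0.5 * ((y - x) / obs_std) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed: fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        # 3. Posterior-mean estimate from the weighted particles.
        estimates.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # 4. Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Raising `n_particles` reduces Monte Carlo error at higher per-step cost, which is the kind of accuracy trade-off the abstract attributes to particle count.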

25 pages, 1770 KB  
Article
Comparative Evaluation of Bandit-Style Heuristic Policies for Moving Target Detection in a Linear Grid Environment
by Hyunmin Kang, Minho Ahn and Yongduek Seo
Sensors 2026, 26(1), 226; https://doi.org/10.3390/s26010226 - 29 Dec 2025
Viewed by 604
Abstract
Moving-target detection under strict sensing constraints is a recurring subproblem in surveillance, search-and-rescue, and autonomous robotics. We study a canonical one-dimensional finite grid in which a sensor probes one location per time step with binary observations while the target follows reflecting random-walk dynamics. The objective is to minimize the expected time to detection using transparent, training-free decision rules defined on the belief state of the target location. We compare two belief-driven heuristics with purely online implementation: a greedy rule that always probes the most probable location and a belief-proportional sampling (BPS, probability matching) rule that samples sensing locations according to the belief distribution (i.e., posterior probability of the target location). Repeated Monte Carlo simulations quantify the exploitation–exploration trade-off and provide a self-comparison between the two policies. Across tested grid sizes, the greedy policy consistently yields the shortest expected time to detection, improving by roughly 17–20% over BPS and uniform random probing in representative settings. BPS trades some average efficiency for stochastic exploration, which can be beneficial under model mismatch. This study provides an interpretable baseline and quantitative reference for extensions to noisy sensing and higher-dimensional search.
(This article belongs to the Special Issue Multi-Sensor Technology for Tracking, Positioning and Navigation)
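
The two belief-driven rules described in the abstract above (greedy probing and belief-proportional sampling) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes noiseless binary sensing, a uniform initial belief, a target that steps left or right with probability 1/2 each and stays in place when a step would leave the grid, and a grid of at least two cells. All function names are hypothetical.

```python
import random


def predict(belief):
    """Push the belief one step through the assumed reflecting random walk."""
    n = len(belief)
    new = [0.0] * n
    for i, p in enumerate(belief):
        left = i - 1 if i > 0 else i        # blocked at the left wall
        right = i + 1 if i < n - 1 else i   # blocked at the right wall
        new[left] += 0.5 * p
        new[right] += 0.5 * p
    return new


def update_after_miss(belief, probed):
    """Bayes update after a noiseless 'target not here' reading at `probed`."""
    post = list(belief)
    post[probed] = 0.0
    total = sum(post)
    return [p / total for p in post]


def greedy(belief, rng):
    """Probe the most probable cell (lowest index wins ties)."""
    return max(range(len(belief)), key=belief.__getitem__)


def bps(belief, rng):
    """Belief-proportional sampling: draw the probe cell from the belief."""
    return rng.choices(range(len(belief)), weights=belief, k=1)[0]


def time_to_detection(policy, n, rng, max_steps=10_000):
    """Simulate one episode and return the step at which the target is found."""
    target = rng.randrange(n)
    belief = [1.0 / n] * n
    for t in range(1, max_steps + 1):
        probe = policy(belief, rng)
        if probe == target:
            return t
        belief = update_after_miss(belief, probe)
        # The target moves after the probe; the belief is propagated to match.
        target = min(max(target + rng.choice((-1, 1)), 0), n - 1)
        belief = predict(belief)
    return max_steps  # episode cap reached without detection
```

Averaging `time_to_detection(greedy, n, rng)` and `time_to_detection(bps, n, rng)` over many episodes gives a Monte Carlo comparison of the two rules under these assumptions.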