Search Results (128)

Search Parameters:
Keywords = Bellman problem

20 pages, 723 KB  
Article
Optimal Investment and Consumption Problem with Stochastic Environments and Delay
by Stanley Jere, Danny Mukonda, Edwin Moyo and Samuel Asante Gyamerah
J. Risk Financial Manag. 2026, 19(1), 62; https://doi.org/10.3390/jrfm19010062 - 13 Jan 2026
Abstract
This paper examines an optimal investment–consumption problem in a setting where the financial environment is influenced by both stochastic factors and delayed effects. The investor, endowed with Constant Relative Risk Aversion (CRRA) preferences, allocates wealth between a risk-free asset and a single risky asset. The short rate follows a Vasiček-type term structure model, while the risky asset price dynamics are driven by a delayed Heston specification whose variance process evolves according to a Cox–Ingersoll–Ross (CIR) diffusion. Delayed dependence in the wealth dynamics is incorporated through two auxiliary variables that summarize past wealth trajectories, enabling us to recast the naturally infinite-dimensional delay problem into a finite-dimensional Markovian framework. Using Bellman’s dynamic programming principle, we derive the associated Hamilton–Jacobi–Bellman (HJB) partial differential equation and demonstrate that it generalizes the classical Merton formulation to simultaneously accommodate delay, stochastic interest rates, stochastic volatility, and consumption. Under CRRA utility, we obtain closed-form expressions for the value function and the optimal feedback controls. Numerical illustrations highlight how delay and market parameters impact optimal portfolio allocation and consumption policies. Full article
(This article belongs to the Special Issue Quantitative Methods for Financial Derivatives and Markets)
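For orientation, the HJB equation invoked in this abstract has, in generic stochastic-control form, the structure below; the paper's actual equation additionally carries the delay variables, the short rate, and the variance factor as state arguments.

```latex
0 = \sup_{\pi, c} \Big\{ V_t + \mathcal{L}^{\pi, c} V + U(c) \Big\},
\qquad
U(c) = \frac{c^{1-\gamma}}{1-\gamma},
```

where $V$ is the value function, $\mathcal{L}^{\pi,c}$ is the generator of the controlled wealth-and-factor dynamics, and $U$ is the CRRA utility; the classical Merton equation is the special case with constant interest rate, constant volatility, and no delay.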

21 pages, 988 KB  
Article
Study of Performance from Hierarchical Decision Modeling in IVAs Within a Greedy Context
by Francisco Federico Meza-Barrón, Nelson Rangel-Valdez, María Lucila Morales-Rodríguez, Claudia Guadalupe Gómez-Santillán, Juan Javier González-Barbosa, Guadalupe Castilla-Valdez, Nohra Violeta Gallardo-Rivas and Ana Guadalupe Vélez-Chong
Math. Comput. Appl. 2026, 31(1), 8; https://doi.org/10.3390/mca31010008 - 7 Jan 2026
Abstract
This study examines decision-making in intelligent virtual agents (IVAs) and formalizes the distinction between tactical decisions (individual actions) and strategic decisions (composed of sequences of tactical actions) using a mathematical model based on set theory and the Bellman equation. Although the equation itself is not modified, the analysis reveals that the discount factor (γ) influences the type of decision: low values favor tactical decisions, while high values favor strategic ones. The model was implemented and validated in a proof-of-concept simulated environment, namely the Snake Coin Change Problem (SCCP), using a Deep Q-Network (DQN) architecture, showing significant differences between agents with different decision profiles. These findings suggest that adjusting γ can serve as a useful mechanism to regulate both tactical and strategic decision-making processes in IVAs, thus offering a conceptual basis that could facilitate the design of more intelligent and adaptive agents in domains such as video games, and potentially in robotics and artificial intelligence as future research directions. Full article
(This article belongs to the Special Issue Numerical and Evolutionary Optimization 2025)
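The claimed effect of the discount factor can be seen directly in the Bellman backup. A minimal sketch on a hypothetical two-step choice (not the paper's SCCP environment): a "tactical" action pays 1 immediately, while a "strategic" action pays nothing now and 10 at the next step.

```python
# One-step Bellman backup: Q = immediate reward + gamma * future value.
def q_value(immediate, future, gamma):
    return immediate + gamma * future

# Hypothetical choice: tactical pays 1 now; strategic pays 10 one step later.
for gamma in (0.05, 0.9):
    q_tac = q_value(1.0, 0.0, gamma)
    q_str = q_value(0.0, 10.0, gamma)
    best = "tactical" if q_tac > q_str else "strategic"
    print(f"gamma={gamma}: tactical={q_tac:.2f}, strategic={q_str:.2f} -> {best}")
```

With γ = 0.05 the tactical action wins (1.00 vs. 0.50); with γ = 0.9 the strategic one does (1.00 vs. 9.00) — the regulation mechanism the abstract describes, without modifying the equation itself.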

32 pages, 1486 KB  
Article
Optimal Carbon Emission Reduction Strategies Considering the Carbon Market
by Wenlin Huang and Daming Shan
Mathematics 2026, 14(1), 68; https://doi.org/10.3390/math14010068 - 24 Dec 2025
Abstract
In this study, we develop a stochastic optimal control model for corporate carbon management that synergistically combines emission reduction initiatives with carbon trading mechanisms. The model incorporates two control variables: the autonomous emission reduction rate and initial carbon allowance purchases, while accounting for both deterministic and stochastic carbon pricing scenarios. The solution is obtained through a two-step optimization procedure that addresses each control variable sequentially. In the first step, the problem is transformed into a Hamilton–Jacobi–Bellman (HJB) equation in the sense of viscosity solution. A key aspect of the methodology is deriving the corresponding analytical solution based on this equation’s structure. The second-step optimization results are shown to depend on the relationship between the risk-free interest rate and carbon price dynamics. Furthermore, we employ daily closing prices from 16 July 2021, to 31 December 2024, as the sample dataset to calibrate the parameters governing carbon allowance price evolution. The marginal abatement cost (MAC) curve is calibrated using data derived from the Emissions Prediction and Policy Analysis (EPPA) model, enabling the estimation of the emission reduction efficiency parameter. Additional policy-related parameters are obtained from relevant regulatory documents. The numerical results demonstrate how enterprises can implement the model’s outputs to inform carbon emission reduction decisions in practice and offer enterprises a decision-support tool that integrates theoretical rigor and practical applicability for achieving emission targets in the carbon market. Full article
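Calibrating the price-evolution parameters from daily closing prices, as described here, typically comes down to moment estimates on log returns. A minimal sketch assuming, purely for illustration, geometric Brownian motion; the paper's allowance-price dynamics and calibration procedure may differ.

```python
import math

# Annualized GBM drift/volatility estimates from a series of daily closes
# (dt = 1/252 trading years per observation; illustrative assumption only).
def calibrate_gbm(prices, dt=1 / 252):
    rets = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]
    n = len(rets)
    mean = sum(rets) / n
    var = sum((r - mean) ** 2 for r in rets) / (n - 1)
    sigma = math.sqrt(var / dt)        # annualized volatility
    mu = mean / dt + 0.5 * sigma ** 2  # annualized drift (Ito correction)
    return mu, sigma
```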

19 pages, 1281 KB  
Article
The Optimal Frequency Control Problem of a Nonlinear Oscillator
by Victor Ilyutko, Dmitrii Kamzolkin and Vladimir Ternovski
Mathematics 2026, 14(1), 37; https://doi.org/10.3390/math14010037 - 22 Dec 2025
Abstract
We study a minimum-time (time-optimal) control problem for a nonlinear pendulum-type oscillator, in which the control input is the system’s natural frequency constrained to a prescribed interval. The objective is to transfer the oscillator from a given initial state to a prescribed terminal state in the shortest possible time. Our approach combines Pontryagin’s maximum principle with Bellman’s principle of optimality. First, we decompose the original problem into a sequence of auxiliary problems, each corresponding to a single semi-oscillation. For every such subproblem, we obtain a complete analytical solution by applying Pontryagin’s maximum principle. These results allow us to reduce the global problem of minimizing the transfer time between the prescribed states to a finite-dimensional optimization problem over a sequence of intermediate amplitudes, which is then solved numerically by dynamic programming. Numerical experiments reveal characteristic features of optimal trajectories in the nonlinear regime, including a non-periodic switching structure, non-uniform semi-oscillation durations, and significant deviations from the behavior of the corresponding linearized system. The proposed framework provides a basis for the synthesis of fast oscillatory regimes in systems with controllable frequency, such as pendulum and crane systems and robotic manipulators. Full article
(This article belongs to the Section E: Applied Mathematics)
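The two-stage reduction described — analytical subproblems per semi-oscillation, then dynamic programming over a sequence of intermediate amplitudes — has the following skeleton. Here `semi_osc_time` is a hypothetical stand-in for the durations the paper obtains from Pontryagin's maximum principle, and the amplitude grid is illustrative.

```python
# Hypothetical semi-oscillation duration; a placeholder for the analytical
# Pontryagin solution of each subproblem (not the paper's actual formula).
def semi_osc_time(a_from, a_to):
    return 1.0 + 0.1 * abs(a_to - a_from)

def min_transfer_time(a0, a_target, grid, n_swings):
    # Dynamic programming over sequences of intermediate amplitudes:
    # best[a] = minimal total time to reach amplitude a so far.
    best = {a0: 0.0}
    for _ in range(n_swings):
        nxt = {}
        for a, t in best.items():
            for b in grid:
                cand = t + semi_osc_time(a, b)
                if b not in nxt or cand < nxt[b]:
                    nxt[b] = cand
        best = nxt
    return best[a_target]
```

`min_transfer_time(1.0, 0.0, [0.0, 0.5, 1.0], 2)` considers every amplitude sequence of length two and keeps the fastest — the finite-dimensional optimization the abstract says is solved numerically by dynamic programming.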

30 pages, 2066 KB  
Article
Adaptive Control for a Robotic Bipedal Device Using a Hybrid Discrete-Continuous Reinforcement Learning Strategy
by Karla Rincon-Martinez, Wen Yu and Isaac Chairez
Appl. Sci. 2026, 16(1), 1; https://doi.org/10.3390/app16010001 - 19 Dec 2025
Abstract
This research develops and implements a novel reinforcement learning (RL) architecture to address the trajectory-tracking problem in bipedal robotic systems under articulated-joint constraints. The proposed RL framework extends previously designed adaptive controllers characterized by state-dependent gain structures. The learning mechanism comprises two hierarchical adaptation layers: the first employs an adaptive dynamic programming (ADP) formulation to approximate the Bellman value function using a class of continuous-time dynamic neural networks. In contrast, the second uses an iterative optimization scheme based on the deep deterministic policy gradient (DDPG) algorithm. The resulting control strategy minimizes a robust performance index defined over the tracking trajectories of a system with uncertain and nonlinear dynamics representative of bipedal locomotion. The dynamic programming formulation ensures robustness to bounded parametric uncertainties and external perturbations. By approximating the Hamilton–Jacobi–Bellman (HJB) value function using neural network structures, a closed-loop controller design is systematically established. Numerical simulations demonstrate the convergence of the tracking error to a region centered at the origin with a size that depends on the approximation quality of the selected neural network. To assess the effectiveness of the proposed approach, a conventional state-feedback control design is adopted as a benchmark, revealing that the suggested method produces a lower cumulative tracking error norm (0.023 vs. 0.037 rad·s) in the trajectory-tracking control problem for all robotic joints while simultaneously reducing the control effort required to complete motion tasks. Full article
(This article belongs to the Special Issue Human–Robot Interaction and Control)

16 pages, 291 KB  
Article
New Generalizations of Gronwall–Bellman–Bihari-Type Integral Inequalities
by Liqiang Chen and Norazrizal Aswad Abdul Rahman
Axioms 2025, 14(12), 929; https://doi.org/10.3390/axioms14120929 - 18 Dec 2025
Abstract
This paper develops several new generalizations of Gronwall–Bellman–Bihari-type integral inequalities. We establish three novel integral inequalities that extend classical results to more complex settings, including integrals with mixed linear and nonlinear terms, delayed (retarded) arguments, and general integral kernels. In the preliminaries, we review known Gronwall–Bellman–Bihari inequalities and useful lemmas. In the main results, we present at least three new theorems. The first theorem provides an explicit bound for solutions of an integral inequality involving a separable kernel function and a nonlinear (Bihari-type) term, significantly extending the classical Bihari inequality. The second theorem addresses integral inequalities with delayed arguments, showing that the delay does not enlarge the growth bound compared to the non-delay case. The third theorem handles inequalities with combined linear and nonlinear terms; using a monotone iterative technique, we prove the existence of a maximal solution that bounds any solution of the inequality. Rigorous proofs are given for all main results. In the Applications section, we illustrate how these inequalities can be applied to deduce qualitative properties of differential equations. As an example, we prove a uniqueness result for an initial value problem with a non-Lipschitz nonlinear term using our new inequalities. The paper concludes with a summary of results and a brief discussion of potential further generalizations. Our results provide powerful tools for researchers to obtain a priori bounds and uniqueness criteria for various differential, integral, and functional equations. It is important to note that the integral inequalities established in this work provide bounds on the solution under the assumption of its existence on the considered interval [t0,T]. 
For nonlinear differential or integral equations where the nonlinearity F fails to be Lipschitz continuous, solutions may develop movable singularities (blow-up) in finite time. The bounds derived from our Gronwall–Bellman–Bihari-type inequalities are valid only on the maximal interval of existence of the solution. Determining the region where solutions are guaranteed to be free of such singularities is a separate and profound problem, often requiring additional techniques such as the construction of Lyapunov functions or the use of differential comparison principles. The primary contribution of this paper is to provide sharp estimates and uniqueness criteria within the domain where a solution is known to exist a priori. Full article
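For reference, the classical Gronwall–Bellman inequality these theorems generalize: if $u$ is continuous and nonnegative, $a \ge 0$, and $b$ is continuous and nonnegative with

```latex
u(t) \le a + \int_{t_0}^{t} b(s)\, u(s)\, ds ,
\qquad \text{then} \qquad
u(t) \le a \exp\!\left( \int_{t_0}^{t} b(s)\, ds \right).
```

Bihari's nonlinear extension replaces $u(s)$ by $g(u(s))$ inside the integral; with $G(v) = \int dv / g(v)$, the bound becomes $u(t) \le G^{-1}\!\big( G(a) + \int_{t_0}^{t} b(s)\, ds \big)$ on the interval where $G^{-1}$ is defined — the kernel, delay, and mixed-term settings above extend exactly this pattern.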
34 pages, 23756 KB  
Article
Fuzzy-Partitioned Multi-Agent TD3 for Photovoltaic Maximum Power Point Tracking Under Partial Shading
by Diana Ortiz-Muñoz, David Luviano-Cruz, Luis Asunción Pérez-Domínguez, Alma Guadalupe Rodríguez-Ramírez and Francesco García-Luna
Appl. Sci. 2025, 15(23), 12776; https://doi.org/10.3390/app152312776 - 2 Dec 2025
Abstract
Maximum power point tracking (MPPT) under partial shading is a nonconvex, rapidly varying control problem that challenges multi-agent policies deployed on photovoltaic modules. We present Fuzzy–MAT3D, a fuzzy-augmented multi-agent TD3 (Twin-Delayed Deep Deterministic Policy Gradient) controller trained under centralized training/decentralized execution (CTDE). On the theory side, we prove that differentiable fuzzy partitions of unity endow the actor–critic maps with global Lipschitz regularity, reduce temporal-difference target variance, enlarge the input-to-state stability (ISS) margin, and yield a global Lγ-contraction of fixed-policy evaluation (hence, non-expansive with κ=γ<1). We further state a two-time-scale convergence theorem for CTDE-TD3 with fuzzy features; a PL/last-layer-linear corollary implies point convergence and uniqueness of critics. We bound the projected Bellman residual with the correct contraction factor (for L and L2(ρ) under measure invariance) and quantified the negative bias induced by min{Q1,Q2}; an N-agent extension is provided. Empirically, a balanced common-random-numbers design across seven scenarios and 20 seeds, analyzed by ANOVA and CRN-paired tests, shows that Fuzzy–MAT3D attains the highest mean MPPT efficiency (92.0% ± 4.0%), outperforming MAT3D and Multi-Agent Deep Deterministic Policy Gradient controller (MADDPG). Overall, fuzzy regularization yields higher efficiency, suppresses steady-state oscillations, and stabilizes learning dynamics, supporting the use of structured, physics-compatible features in multi-agent MPPT controllers. At the level of PV plants, such gains under partial shading translate into higher effective capacity factors and smoother renewable generation without additional hardware. Full article
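The negative bias of the clipped target min{Q1, Q2}, which the paper bounds analytically, is easy to check by simulation: when two critics are unbiased but noisy, their pointwise minimum underestimates the true value on average. The Gaussian noise model below is illustrative, not the paper's.

```python
import random

# Monte Carlo check: with Q1, Q2 = true value + independent N(0, sigma) noise,
# E[min(Q1, Q2)] - true value = -sigma / sqrt(pi), a strictly negative bias.
random.seed(0)
true_q, sigma, n = 1.0, 0.5, 100_000
bias = sum(
    min(true_q + random.gauss(0, sigma), true_q + random.gauss(0, sigma))
    for _ in range(n)
) / n - true_q
# bias is negative, around -sigma / sqrt(pi) ~ -0.28 for sigma = 0.5
```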

19 pages, 998 KB  
Article
Optimal Impulsive Control and Stabilization of Dynamic Systems Based on Quasi-Variational Inequalities
by Wenxuan Wang, Chuandong Li and Mingchen Huan
Mathematics 2025, 13(23), 3864; https://doi.org/10.3390/math13233864 - 2 Dec 2025
Abstract
In this paper, we investigate the optimal control problem regarding a class of dynamic systems, aiming to address the challenge of simultaneously ensuring cost minimization and system asymptotic stability. The theoretical framework proposed in this paper integrates the value function concept from optimal control theory with Lyapunov stability theory. By setting the impulse cost at any finite time to be strictly positive, we exclude Zeno behavior, and a set of sufficient conditions is established that simultaneously guarantees system asymptotic stability and cost minimization based on Quasi-Variational Inequalities (QVIs). To address the challenge of solving the Hamilton–Jacobi–Bellman (HJB) equation in high-dimensional nonlinear systems, we employ an inverse optimal control framework to synthesize the strategy and its corresponding cost function. Finally, we validate the feasibility of our method by applying the theoretical results obtained to three numerical examples. Full article
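In the standard impulse-control setting, the quasi-variational inequality referred to couples the HJB operator with an intervention operator (the paper's version adds the stability conditions):

```latex
\min\Big\{ -\partial_t V - \mathcal{L} V - \ell, \;\; V - \mathcal{M} V \Big\} = 0,
\qquad
\mathcal{M} V(x) = \inf_{\xi} \big\{ V(x + \xi) + c(\xi) \big\},
```

where $\ell$ is the running cost and $c(\xi) > 0$ is the impulse cost; its strict positivity is what rules out Zeno behavior, as noted in the abstract.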

23 pages, 1113 KB  
Article
Optimal Investment Considerations for a Single Cohort Life Insurance Portfolio
by Sari Cahyaningtias, Petar Jevtić, Carl Gardner and Traian A. Pirvu
Risks 2025, 13(12), 233; https://doi.org/10.3390/risks13120233 - 1 Dec 2025
Abstract
This study examines the portfolio optimization problem of an insurance company that issues an annuity, receives the associated premiums as a lump sum, and invests in a financial market. The insurer’s objective is to determine an investment strategy that minimizes the likelihood of defaulting on annuity payments before ceasing operations, where default occurs if the portfolio value, net of the annuity liability, becomes negative. Unlike previous work, here the mortality intensity is stochastic and follows a Cox–Ingersoll–Ross (CIR) process. Dynamic programming is employed; the value function is characterized by a Hamilton–Jacobi–Bellman (HJB) equation, which is then linearized through the Legendre transform. Numerical results show that default probability declines with higher initial wealth and mortality intensity, while stochastic mortality volatility has little impact—though slightly higher volatility marginally reduces default risk. Optimal stock investment falls with increasing wealth and mortality intensity, and is nearly constant for low wealth levels. Mortality volatility has minimal influence, but a higher Sharpe ratio raises optimal investment, underscoring the role of risk-adjusted returns. Full article
(This article belongs to the Special Issue Advancements in Actuarial Mathematics and Insurance Risk Management)
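The CIR mortality intensity can be simulated with a full-truncation Euler scheme, a standard discretization that keeps the square-root argument nonnegative. Parameter values below are illustrative and are not the paper's calibration.

```python
import math
import random

# Full-truncation Euler scheme for a CIR process
#   dX_t = kappa * (theta - X_t) dt + sigma * sqrt(X_t) dW_t.
def simulate_cir(x0, kappa, theta, sigma, T, n_steps, rng):
    dt = T / n_steps
    x, path = x0, [x0]
    for _ in range(n_steps):
        x_pos = max(x, 0.0)  # truncate so the square root stays defined
        x += kappa * (theta - x_pos) * dt \
            + sigma * math.sqrt(x_pos) * rng.gauss(0.0, math.sqrt(dt))
        path.append(x)
    return path
```

For example, `simulate_cir(0.02, 0.5, 0.02, 0.05, 1.0, 252, random.Random(1))` produces one intensity path over a year at daily resolution.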

29 pages, 5351 KB  
Article
Scalable Wireless Sensor Network Control Using Multi-Agent Reinforcement Learning
by Zejian Zhou
Electronics 2025, 14(22), 4445; https://doi.org/10.3390/electronics14224445 - 14 Nov 2025
Abstract
In this paper, the real-time decentralized integrated sensing, navigation, and communication co-optimization problem is investigated for large-scale mobile wireless sensor networks (MWSNs) under limited energy. Compared with traditional sensor network optimization and control problems, large-scale resource-constrained MWSNs are associated with two new challenges, i.e., (1) increased computational and communication complexity due to a large number of mobile wireless sensors and (2) an uncertain environment with limited system resources, e.g., unknown wireless channels, limited transmission power, etc. To overcome these challenges, the Mean Field Game theory is adopted and integrated along with the emerging decentralized multi-agent reinforcement learning algorithm. Specifically, the problem is decomposed into two scenarios, i.e., cost-effective navigation and transmission power allocation optimization. Then, the Actor–Critic–Mass reinforcement learning algorithm is applied to learn the decentralized co-optimal design for both scenarios. To tune the reinforcement-learning-based neural networks, the coupled Hamilton–Jacobi–Bellman (HJB) and Fokker–Planck–Kolmogorov (FPK) equations derived from the Mean Field Game formulation are utilized. Finally, numerical simulations are conducted to demonstrate the effectiveness of the developed co-optimal design. Specifically, the optimal navigation algorithm achieved an average accuracy of 2.32% when tracking the given routes. Full article
(This article belongs to the Special Issue Advanced Control Strategies and Applications of Multi-Agent Systems)

21 pages, 2842 KB  
Article
Robust Optimal Reinsurance and Investment Problem Under Markov Switching via Actor–Critic Reinforcement Learning
by Fang Jin, Kangyong Cheng, Xiaoliang Xie and Shubo Chen
Mathematics 2025, 13(21), 3502; https://doi.org/10.3390/math13213502 - 2 Nov 2025
Abstract
This paper investigates a robust optimal reinsurance and investment problem for an insurance company operating in a Markov-modulated financial market. The insurer’s surplus process is modeled by a diffusion process with jumps, which is correlated with financial risky assets through a common shock structure. The economic regime switches according to a continuous-time Markov chain. To address model uncertainty concerning both diffusion and jump components, we formulate the problem within a robust optimal control framework. By applying the Girsanov theorem for semimartingales, we derive the dynamics of the wealth process under an equivalent martingale measure. We then establish the associated Hamilton–Jacobi–Bellman (HJB) equation, which constitutes a coupled system of nonlinear second-order integro-differential equations. An explicit form of the relative entropy penalty function is provided to quantify the cost of deviating from the reference model. The theoretical results furnish a foundation for numerical solutions using actor–critic reinforcement learning algorithms. Full article
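The relative-entropy penalty mentioned follows the usual multiplier formulation of robust control: the adversarial change of measure is penalized by its divergence from the reference model. Schematically (the paper derives the explicit penalty for its jump-diffusion, regime-switching setting):

```latex
\sup_{u} \; \inf_{Q} \;
\mathbb{E}^{Q}\big[\, \text{objective}(u) \,\big]
+ \frac{1}{\theta}\, R(Q \,\|\, P),
\qquad
R(Q \,\|\, P) = \mathbb{E}^{Q}\!\left[ \ln \frac{dQ}{dP} \right],
```

where $P$ is the reference model and $\theta > 0$ scales the insurer's ambiguity aversion.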

24 pages, 2934 KB  
Article
Selected Methods for Designing Monetary and Fiscal Targeting Rules Within the Policy Mix Framework
by Agnieszka Przybylska-Mazur
Entropy 2025, 27(10), 1082; https://doi.org/10.3390/e27101082 - 19 Oct 2025
Abstract
In the existing literature, targeting rules are typically determined separately for monetary and fiscal policy. This article proposes a framework for determining targeting rules that account for the policy mix of both monetary and fiscal policy. The aim of this study is to compare selected optimization methods used to derive targeting rules as solutions to a constrained minimization problem. The constraints are defined by a model that incorporates a monetary and fiscal policy mix. The optimization methods applied include the linear–quadratic regulator, Bellman dynamic programming, and Euler’s calculus of variations. The resulting targeting rules are solutions to a discrete-time optimization problem with a finite horizon and without discounting. In this article, we define targeting rules that take into account the monetary and fiscal policy mix. The derived rules allow for the calculation of optimal values for the interest rate and the balance-to-GDP ratio, which ensure price stability, a stable debt-to-GDP ratio, and the desired GDP growth dynamics. It can be noted that all the optimization methods used yield the same optimal vector of decision variables, and the specific method applied does not affect the form of the targeting rules. Full article
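Of the three optimization methods compared, the linear–quadratic regulator is the most mechanical to illustrate. A scalar finite-horizon sketch without discounting, matching the setting described; the coefficients are made up, and the paper's policy-mix model is of course multivariate.

```python
# Backward Riccati recursion for the scalar LQR problem
#   x_{k+1} = a x_k + b u_k,  cost = sum_k (q x_k^2 + r u_k^2),
# returning the time-varying feedback gains for u_k = -K_k x_k.
def lqr_gains(a, b, q, r, horizon):
    P = q  # terminal cost-to-go weight
    gains = []
    for _ in range(horizon):
        K = a * b * P / (r + b * b * P)
        gains.append(K)
        P = q + a * a * P - a * b * P * K
    gains.reverse()  # gains[k] applies at step k
    return gains
```

For a = b = q = r = 1 and horizon 5, the gain at the final step is 0.5, and the earlier gains settle toward the stationary value of about 0.618 as the horizon recedes.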

21 pages, 629 KB  
Article
Finite Time Stability and Optimal Control for Stochastic Dynamical Systems
by Ronit Chitre and Wassim M. Haddad
Axioms 2025, 14(10), 767; https://doi.org/10.3390/axioms14100767 - 16 Oct 2025
Abstract
In real-world applications, finite time convergence to a desired Lyapunov stable equilibrium is often necessary. This notion of stability is known as finite time stability and refers to systems in which the state trajectory reaches an equilibrium in finite time. This paper explores the notion of finite time stability in probability within the context of nonlinear stochastic dynamical systems. Specifically, we introduce sufficient conditions based on Lyapunov methods, utilizing Lyapunov functions that satisfy scalar differential inequalities involving fractional powers for guaranteeing finite time stability in probability. Then, we address the finite time optimal control problem by developing a framework for designing optimal feedback control laws that achieve finite time stochastic stability of the closed-loop system using a Lyapunov function that also serves as the solution to the steady-state stochastic Hamilton–Jacobi–Bellman equation. Full article
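The fractional-power Lyapunov condition alluded to is, in the deterministic case, the classical finite-time stability criterion: if along trajectories

```latex
\dot V(x(t)) \le -c\, V(x(t))^{\alpha}, \qquad c > 0, \;\; \alpha \in (0, 1),
```

then $V$ reaches zero no later than the settling time $T(x_0) \le V(x_0)^{1-\alpha} / \big( c (1-\alpha) \big)$, obtained by integrating the scalar comparison equation; the paper's contribution is the stochastic (in-probability) analogue of such conditions and the corresponding optimal feedback design.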

33 pages, 3062 KB  
Article
Gradient-Free De Novo Learning
by Karl Friston, Thomas Parr, Conor Heins, Lancelot Da Costa, Tommaso Salvatori, Alexander Tschantz, Magnus Koudahl, Toon Van de Maele, Christopher Buckley and Tim Verbelen
Entropy 2025, 27(9), 992; https://doi.org/10.3390/e27090992 - 22 Sep 2025
Cited by 1
Abstract
This technical note applies active inference to the problem of learning goal-directed behaviour from scratch, namely, de novo learning. By de novo learning, we mean discovering, directly from observations, the structure and parameters of a discrete generative model for sequential policy optimisation. Concretely, our procedure grows and then reduces a model until it discovers a pullback attractor over (generalised) states; this attracting set supplies paths of least action among goal states while avoiding costly states. The implicit efficiency rests upon reframing the learning problem through the lens of the free energy principle, under which it is sufficient to learn a generative model whose dynamics feature such an attracting set. For context, we briefly relate this perspective to value-based formulations (e.g., Bellman optimality) and then apply the active inference formulation to a small arcade game to illustrate de novo structure learning and ensuing agency. Full article
(This article belongs to the Special Issue Active Inference in Cognitive Neuroscience)

19 pages, 4228 KB  
Article
Data-Driven Optimal Bipartite Containment Tracking for Multi-UAV Systems with Compound Uncertainties
by Bowen Chen, Mengji Shi, Zhiqiang Li and Kaiyu Qin
Drones 2025, 9(8), 573; https://doi.org/10.3390/drones9080573 - 13 Aug 2025
Abstract
With the increasing deployment of Unmanned Aerial Vehicle (UAV) swarms in uncertain and dynamically changing environments, optimal cooperative control has become essential for ensuring robust and efficient system coordination. To this end, this paper designs a data-driven optimal bipartite containment tracking control scheme for multi-UAV systems under compound uncertainties. A novel Dynamic Iteration Regulation Strategy (DIRS) is proposed, which enables real-time adjustment of the learning iteration step according to the task-specific demands. Compared with conventional fixed-step data-driven algorithms, the DIRS provides greater flexibility and computational efficiency, allowing for better trade-offs between the performance and cost. First, the optimal bipartite containment tracking control problem is formulated, and the associated coupled Hamilton–Jacobi–Bellman (HJB) equations are established. Then, a data-driven iterative policy learning algorithm equipped with the DIRS is developed to solve the optimal control law online. The stability and convergence of the proposed control scheme are rigorously analyzed. Furthermore, the control law is approximated via the neural network framework without requiring full knowledge of the model. Finally, numerical simulations are provided to demonstrate the effectiveness and robustness of the proposed DIRS-based optimal containment tracking scheme for multi-UAV systems, which can reduce the number of iterations by 88.27% compared to that for the conventional algorithm. Full article
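The iterative policy learning loop described — alternating policy evaluation and policy improvement until the policy stops changing — has the following tabular skeleton. The paper's version is data-driven, continuous-time, and neural-network-approximated; this discrete sketch on a made-up two-state MDP shows only the iteration structure.

```python
# Tabular policy iteration: P[s][a][s2] = transition probability,
# R[s][a] = reward. Returns the optimal policy and its value function.
def policy_iteration(P, R, gamma=0.9, tol=1e-8):
    n_states, n_actions = len(R), len(R[0])
    policy = [0] * n_states
    while True:
        # Policy evaluation: iterate the Bellman equation for the fixed policy.
        V = [0.0] * n_states
        while True:
            delta = 0.0
            for s in range(n_states):
                a = policy[s]
                v = R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to V.
        stable = True
        for s in range(n_states):
            best = max(
                range(n_actions),
                key=lambda a: R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n_states)),
            )
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, V
```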
