Abstract
This paper studies an averaged Linear Quadratic Regulator (LQR) problem for a parabolic partial differential equation (PDE) whose dynamics are affected by uncertain parameters. Instead of assuming a deterministic operator, we model the uncertainty using a probability distribution over a set of possible system dynamics, extending classical optimal control theory with an averaging framework that accounts for parameter uncertainty. We establish the existence and uniqueness of the optimal control and analyze its convergence as the probability distribution governing the system parameters becomes more concentrated. These results provide a rigorous foundation for optimal control in the presence of parameter uncertainty and lay the groundwork for further studies of dynamic systems under uncertainty.
Keywords:
linear quadratic regulator; optimal control; parabolic PDEs; averaging method; uncertainty modeling; control convergence
MSC:
49J20; 49N10
1. Introduction
Optimal control problems for partial differential equations (PDEs) are widely used in engineering, physics, and economics to model systems that evolve over time [1]. Among these, the Linear Quadratic Regulator (LQR) problem has been extensively studied due to its well-established theoretical properties and practical applicability. The LQR framework provides an optimal strategy for controlling systems governed by linear PDEs while minimizing a quadratic cost functional [1,2]. Recent advances also explore approximate bounded feedback synthesis in parabolic PDEs with nonlinear perturbations and semidefinite performance criteria, extending classical LQR concepts to weakly nonlinear systems [3].
In this paper, we consider an averaged LQR problem for a parabolic PDE, where the system dynamics contain uncertain parameters. Instead of assuming a single deterministic system, we model the uncertainty using a probability distribution over a set of possible dynamics. This approach is inspired by previous studies in optimal control and stochastic averaging methods [4]. Related work addresses optimal regulation under rapidly oscillating parameters using homogenized models and superposition-type cost structures [5].
Reinforcement learning (RL) has become a cornerstone of modern machine learning, operating alongside supervised and unsupervised learning to solve decision-making problems under uncertainty. In this paradigm, agents learn optimal policies by interacting with an environment, optimizing a long-term performance criterion [6]. The connection between RL and classical optimal control theory has long been recognized [7], and recent developments in RL are now significantly influencing the field of control theory [8].
A key distinction within RL lies between model-free and model-based methods. Model-free approaches directly approximate value functions or policies without constructing a model of the environment, while model-based methods aim to learn a model from data and use it for planning [6]. The latter often suffer from model bias, a challenge identified early on [9]. To overcome this, the PILCO algorithm was proposed [10,11], which models system dynamics probabilistically using Gaussian processes and performs policy improvement based on expected trajectories.
PILCO and its extensions [12,13,14] operate within a Bayesian model-based RL framework that integrates data-driven learning with optimal control. These methods have demonstrated remarkable data efficiency and robustness and inspired numerous theoretical developments [15,16,17]. The formalization of such Bayesian approaches using distributions over system dynamics is central to understanding and reducing model uncertainty [18,19].
In this context, averaged optimal control emerges as a powerful framework that optimizes the expected behavior over a distribution of dynamics. This idea has strong theoretical roots in the Riemann–Stieltjes optimal control setting [20,21,22] and averaged controllability [23,24]. Additionally, the challenge of maintaining stability in distributed control systems under destabilizing factors has motivated algorithmic approaches to resilient network synthesis [25], further highlighting the importance of robust control strategies. Our work is motivated by these formulations and aims to explore how solutions to averaged optimal control problems relate to those of deterministic counterparts.
Additionally, reinforcement learning in continuous-time systems is gaining momentum in control engineering [26,27,28,29]. These studies provide a basis for the development of algorithms that can operate in real-world, high-frequency environments.
This paper contributes to this growing body of work by investigating the convergence of optimal policies derived from averaged LQR problems to those of classical LQR problems as the distribution over the dynamics concentrates. Our approach complements existing algorithms such as PILCO [11] by offering a theoretical justification for their observed empirical success [18].
The main contributions of this paper are as follows:
- We establish the existence and uniqueness of optimal solutions for the averaged control problem.
- We analyze the convergence of optimal controls as the probability measure representing system uncertainty becomes more concentrated.
The structure of this paper is as follows: First, we define the mathematical formulation of the problem. Then, we present theoretical preliminaries and key functional analysis tools. The main results, including proofs of existence, uniqueness, and convergence, follow. Finally, we summarize our findings and discuss potential future directions.
2. Setting of the Problem
For unknown functions $y = y(t,x)$ (the state) and $u = u(t,x)$ (the control), where $(t,x) \in (0,T) \times \Omega$, $\Omega \subset \mathbb{R}^d$ is a bounded domain, and $T > 0$, we consider the linear quadratic optimal control problem
$$
\begin{cases}
\dfrac{\partial y}{\partial t} = \operatorname{div}\,(A \nabla y) + u, & (t,x) \in (0,T) \times \Omega,\\
y\big|_{\partial \Omega} = 0,\\
y\big|_{t=0} = y_0,
\end{cases}
\tag{1}
$$
where the cost functional is given by
$$
J_{\pi}(u) = \int_{\mathfrak{A}} \left( \alpha \int_0^T\!\!\int_{\Omega} |y^A(t,x)|^2\,dx\,dt + \beta \int_0^T\!\!\int_{\Omega} |u(t,x)|^2\,dx\,dt \right) d\pi(A) \to \inf,
\tag{2}
$$
where $y^A$ denotes the solution of (1) corresponding to the symmetric matrix $A = \{a_{ij}\}_{i,j=1}^{d}$, and the operator $y \mapsto \operatorname{div}\,(A \nabla y)$ satisfies the uniform ellipticity condition given by
$$
(A \xi, \xi) \ge \gamma\,|\xi|^2 \quad \text{for all } \xi \in \mathbb{R}^d.
\tag{3}
$$
The uniform ellipticity condition ensures the well-posedness of the PDE and is a standard assumption in the study of elliptic and parabolic operators.
Here, $\alpha$ and $\beta$ are positive numbers that represent the weight coefficients in the objective functional, $\gamma$ is a positive number that defines the uniform ellipticity of the differential operator, and $y_0 \in L^2(\Omega)$ is a given initial state.
$\mathfrak{A}$ is a set of symmetric $d \times d$ matrices satisfying condition (3) (with one and the same constant $\gamma$).
$\pi$ is a probability measure on $\mathfrak{A}$.
In the following, we denote $H = L^2(\Omega)$ and $V = H_0^1(\Omega)$, and write $\|\cdot\|$ and $(\cdot,\cdot)$ for the norm and the scalar product in $H$.
For $\xi \in \mathbb{R}^d$, we denote by $|\xi|$ the standard Euclidean norm.
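For a symmetric matrix, condition (3) is equivalent to requiring that its smallest eigenvalue be at least $\gamma$. As a quick sanity check (a sketch only; the matrix and the constant below are assumed for illustration and are not from the paper):

```python
# Checking the uniform ellipticity condition (3): for a symmetric matrix A,
# (A xi, xi) >= gamma * |xi|^2 for all xi  iff  min eigenvalue of A >= gamma.
import numpy as np

gamma = 0.5                          # assumed ellipticity constant
A = np.array([[2.0, 0.3],
              [0.3, 1.0]])           # a sample symmetric matrix
print(np.linalg.eigvalsh(A).min() >= gamma)   # True: A is admissible
```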
3. Preliminary Results
It is known that for every $u \in L^2(0,T;H)$ and $A \in \mathfrak{A}$, problem (1) has a unique solution in the weak sense ([30], Theorem 3.1, p. 70):
$$
y \in L^2(0,T;V), \qquad \frac{dy}{dt} \in L^2(0,T;V^{*}),
$$
such that $y(0) = y_0$ and, for every $\varphi \in V$, in the sense of scalar distributions on $(0,T)$,
$$
\frac{d}{dt}\,(y,\varphi) + (A \nabla y, \nabla \varphi) = (u, \varphi).
$$
Due to the embedding
$$
\bigl\{\, y \in L^2(0,T;V) \ :\ \tfrac{dy}{dt} \in L^2(0,T;V^{*}) \,\bigr\} \subset C([0,T];H),
$$
the equality $y(0) = y_0$ makes sense.
Moreover, for every weak solution, the functions $t \mapsto \|y(t)\|^2$ and $t \mapsto (y(t),\varphi)$ are absolutely continuous, and
$$
\frac{1}{2}\,\frac{d}{dt}\,\|y(t)\|^2 + (A \nabla y(t), \nabla y(t)) = (u(t), y(t)) \quad \text{for a.e. } t \in (0,T).
$$
In particular, this identity and condition (3) yield the a priori estimate
$$
\|y\|_{C([0,T];H)}^2 + \|y\|_{L^2(0,T;V)}^2 \le C \bigl( \|y_0\|^2 + \|u\|_{L^2(0,T;H)}^2 \bigr),
$$
with a constant $C$ depending only on $\gamma$ and $T$.
If, for a given $u$, we denote by $y_1$ (resp. $y_2$) the solution of (1) with matrix $A_1$ (resp. $A_2$), then for $w = y_1 - y_2$, we get
$$
\frac{1}{2}\,\frac{d}{dt}\,\|w(t)\|^2 + (A_1 \nabla w, \nabla w) = ((A_2 - A_1) \nabla y_2, \nabla w).
$$
Therefore, by condition (3) and the Cauchy inequality,
$$
\frac{d}{dt}\,\|w(t)\|^2 + \gamma\,\|\nabla w(t)\|^2 \le \frac{1}{\gamma}\,\|A_1 - A_2\|^2\,\|\nabla y_2(t)\|^2.
$$
So, from (7), we deduce
$$
\|y_1 - y_2\|_{C([0,T];H)}^2 + \|y_1 - y_2\|_{L^2(0,T;V)}^2 \le C\,\|A_1 - A_2\|^2 \bigl( \|y_0\|^2 + \|u\|_{L^2(0,T;H)}^2 \bigr).
$$
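To make the role of this continuous-dependence estimate concrete, here is a minimal numerical sketch on a scalar surrogate of (1) (the ODE $y' = -ay + u$ standing in for the PDE; the coefficients, control, and horizon are assumed for illustration). The distance between solutions scales linearly with the distance between the coefficients, as the estimate predicts:

```python
# Continuous dependence on the coefficient for the scalar surrogate y' = -a*y + u:
# for w = y1 - y2 one has w' = -a1*w + (a2 - a1)*y2, so Gronwall's lemma gives
# max_t |w(t)| <= C * |a1 - a2|.
import numpy as np

def solve(a, T=1.0, K=1000, y0=1.0):
    dt = T / K
    y = np.empty(K + 1)
    y[0] = y0
    for k in range(K):
        y[k + 1] = y[k] + dt * (-a * y[k] + np.sin(3 * k * dt))  # explicit Euler
    return y

for da in [0.1, 0.01, 0.001]:
    w = solve(1.0) - solve(1.0 + da)
    print(f"|a1 - a2| = {da:6.3f}   max_t |y1 - y2| = {np.abs(w).max():.2e}")
```

The printed maxima drop by roughly a factor of ten with each tenfold decrease in $|a_1 - a_2|$, which is exactly the linear dependence asserted above.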
Finally, we note that the weak convergence (4) can be described by the Wasserstein metric:
$$
W(\pi, \rho) = \inf_{\chi \in \Pi(\pi, \rho)} \int_{X \times X} d(A_1, A_2)\, d\chi(A_1, A_2),
$$
where $(X, d)$ is a Polish metric space, and $\Pi(\pi, \rho)$ is the collection of all probability measures on $X \times X$ with projections $\pi$ and $\rho$, respectively.
Then, for probability measures on a compact metric space, $\pi_n \to \pi$ weakly if and only if $W(\pi_n, \pi) \to 0$ [31].
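For instance (a sketch under the simplifying assumption that $\mathfrak{A}$ reduces to scalar diffusion coefficients, so that SciPy's one-dimensional Wasserstein distance applies), the distance from a concentrating sequence of discrete measures to its Dirac limit vanishes:

```python
# W(pi_n, delta) -> 0 as discrete measures pi_n concentrate at a single coefficient.
import numpy as np
from scipy.stats import wasserstein_distance

coeffs = np.array([1.0, 2.0, 3.0])   # a finite set of scalar diffusion coefficients
for n in [2, 10, 50, 250]:
    # pi_n puts weight 1 - 1/n on a = 2.0 and spreads 1/n over the other two points
    w = np.array([0.5 / n, 1.0 - 1.0 / n, 0.5 / n])
    dist = wasserstein_distance(coeffs, [2.0], u_weights=w, v_weights=[1.0])
    print(f"n = {n:4d}   W(pi_n, delta) = {dist:.4f}")   # equals 1/n here
```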
4. Main Results
Theorem 1.
For every measure π, the LQR optimal control problem (1), (2) has a unique solution $(\bar{u}, \bar{y}^A)$ (here, $\bar{y}^A$ depends on $A$), and
$$
\|\bar{u}\|_{L^2(0,T;H)} + \|\bar{y}^A\|_{L^2(0,T;V)} \le C\,\|y_0\|,
$$
where the constant $C$ does not depend on π and A.
Remark 1.
The optimal control $\bar{u}$ depends on π but does not depend on A.
Proof of Theorem 1.
Let $\{u_n\}$ be a minimizing sequence and $\{y_n^A\}$ be the corresponding solutions of (1). Then,
$$
J_{\pi}(u_n) \to \inf_{u} J_{\pi}(u) \ge 0.
$$
So, $\{u_n\}$ is bounded in $L^2(0,T;H)$ and, up to a subsequence, $u_n \to \bar{u}$ weakly in $L^2(0,T;H)$.
Moreover, from (7), we deduce that, for every $A$, $\{y_n^A\}$ is bounded in $L^2(0,T;V)$. As $\{\, y \in L^2(0,T;V) : \frac{dy}{dt} \in L^2(0,T;V^{*}) \,\}$ is compactly embedded in $L^2(0,T;H)$, up to a subsequence, for some $\bar{y}^A$,
$$
y_n^A \to \bar{y}^A \ \text{ weakly in } L^2(0,T;V) \text{ and strongly in } L^2(0,T;H).
$$
Taking an arbitrary $\varphi \in V$ and passing to the limit in the equality
$$
\frac{d}{dt}\,(y_n^A, \varphi) + (A \nabla y_n^A, \nabla \varphi) = (u_n, \varphi),
$$
we obtain that $\bar{y}^A$ is a solution of (1) with control $\bar{u}$ and matrix $A$. Due to the uniqueness of such a solution, the whole sequence $\{y_n^A\}$ tends to $\bar{y}^A$.
Moreover, due to (11), for every $A \in \mathfrak{A}$, we get
$$
\alpha\,\|\bar{y}^A\|_{L^2(0,T;H)}^2 + \beta\,\|\bar{u}\|_{L^2(0,T;H)}^2 \le \liminf_{n \to \infty} \Bigl( \alpha\,\|y_n^A\|_{L^2(0,T;H)}^2 + \beta\,\|u_n\|_{L^2(0,T;H)}^2 \Bigr).
$$
Also, using Fatou's Lemma, we get
$$
J_{\pi}(\bar{u}) \le \liminf_{n \to \infty} J_{\pi}(u_n) = \inf_{u} J_{\pi}(u),
$$
so $\bar{u}$ is an optimal control.
Because of the strict convexity of $J_{\pi}$ (the map $u \mapsto y^A$ is affine and $\beta > 0$), we obtain uniqueness. Thus, the theorem is proved. □
Now, let us assume that
$$
\pi_n \to \pi \ \text{ weakly as } n \to \infty.
$$
The convergence of optimal controls under the weak convergence of probability measures is analyzed using the Wasserstein metric, a powerful tool in probability and optimal transport theory [31].
Theorem 2.
Let $\pi_n \to \pi$ weakly, and let $\bar{u}_n$ be the optimal control of problem (1), (2) with measure $\pi_n$. Then,
$$
\bar{u}_n \to \bar{u} \ \text{ strongly in } L^2(0,T;H),
$$
and, for every $A \in \mathfrak{A}$, the corresponding optimal states satisfy
$$
\bar{y}_n^A \to \bar{y}^A \ \text{ in } C([0,T];H).
$$
Proof of Theorem 2.
For every $u$, due to (6) and (8),
$$
|J_{\pi_n}(u) - J_{\pi}(u)| \le C_u \int_{\mathfrak{A} \times \mathfrak{A}} \|A_1 - A_2\|\, d\chi(A_1, A_2),
$$
where $C_u$ is bounded if $\|u\|_{L^2(0,T;H)}$ is bounded, and $\chi$ is a probability measure on $\mathfrak{A} \times \mathfrak{A}$ with projections $\pi_n$ and $\pi$, respectively. If we take $X = \mathfrak{A}$ with the metric $d(A_1, A_2) = \|A_1 - A_2\|$, then $X$ is a Polish metric space. Therefore, from (16), we get, for $n \to \infty$,
$$
J_{\pi_n}(u) \to J_{\pi}(u).
$$
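The inequality above says that, for a fixed control, the averaged cost is Lipschitz in the measure with respect to the Wasserstein distance. A minimal numerical sketch on the scalar surrogate used earlier (all values assumed for illustration) shows the cost gap staying proportional to $W(\pi_n, \pi)$:

```python
# For fixed u, |J_{pi_n}(u) - J_{delta}(u)| stays proportional to W(pi_n, delta)
# (scalar surrogate y' = -a*y + u with u(t) = sin(3t); alpha = beta = 1 assumed).
import numpy as np
from scipy.stats import wasserstein_distance

def cost(a, T=1.0, K=1000, y0=1.0):
    dt = T / K
    y, J = y0, 0.0
    for k in range(K):
        u = np.sin(3 * k * dt)
        J += dt * (y**2 + u**2)
        y += dt * (-a * y + u)       # explicit Euler step
    return J

coeffs = np.array([1.0, 2.0, 3.0])
J_vals = np.array([cost(a) for a in coeffs])
for n in [2, 10, 50]:
    w = np.array([1.0 - 1.0 / n, 0.5 / n, 0.5 / n])   # pi_n concentrating at a = 1.0
    gap = abs(w @ J_vals - J_vals[0])
    W = wasserstein_distance(coeffs, [1.0], u_weights=w, v_weights=[1.0])
    print(f"n = {n:3d}   cost gap = {gap:.4f}   W = {W:.4f}   ratio = {gap / W:.3f}")
```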
Let us denote by $(\bar{u}_n, \bar{y}_n^A)$ the optimal process of problem (1), (2) with measure $\pi_n$. Due to (9), the bound
$$
\|\bar{u}_n\|_{L^2(0,T;H)} + \|\bar{y}_n^A\|_{L^2(0,T;V)} \le C\,\|y_0\|
$$
does not depend on n.
Then, the optimality of $\bar{u}_n$ and (16) implies
$$
\limsup_{n \to \infty} J_{\pi_n}(\bar{u}_n) \le \lim_{n \to \infty} J_{\pi_n}(\bar{u}) = J_{\pi}(\bar{u}).
$$
In the following, we denote $u_n := \bar{u}_n$, $y_n^A := \bar{y}_n^A$.
So, up to a subsequence, for some $u^*$, $y^{*,A}$, the pair $(u_n, y_n^A)$ tends to $(u^*, y^{*,A})$ in the sense of (11). Passing to the limit yields that $y^{*,A}$ is a solution of (1) with control $u^*$. Due to (17),
$$
J_{\pi}(u^*) \le \liminf_{n \to \infty} J_{\pi_n}(u_n) \le J_{\pi}(\bar{u}),
$$
so, by the uniqueness part of Theorem 1, $u^* = \bar{u}$.
This inequality also implies that $u_n \to \bar{u}$ strongly in $L^2(0,T;H)$. Indeed, on the one hand, by the weak lower semicontinuity of the norm,
$$
\|\bar{u}\|_{L^2(0,T;H)} \le \liminf_{n \to \infty} \|u_n\|_{L^2(0,T;H)}.
$$
And on the other hand, since $J_{\pi_n}(u_n) \to J_{\pi}(\bar{u})$ and the state terms of the cost are lower semicontinuous along the sequence,
$$
\limsup_{n \to \infty} \|u_n\|_{L^2(0,T;H)} \le \|\bar{u}\|_{L^2(0,T;H)}.
$$
So,
$$
\|u_n\|_{L^2(0,T;H)} \to \|\bar{u}\|_{L^2(0,T;H)}.
$$
This convergence of norms and the weak convergence of $u_n$ to $\bar{u}$ in $L^2(0,T;H)$ imply the strong convergence of (14), since
$$
\|u_n - \bar{u}\|^2 = \|u_n\|^2 - 2\,(u_n, \bar{u}) + \|\bar{u}\|^2 \to 0.
$$
From (1), we deduce the energy equality
$$
\|y_n^A(t)\|^2 = \|y_0\|^2 - 2 \int_0^t (A \nabla y_n^A, \nabla y_n^A)\, ds + 2 \int_0^t (u_n, y_n^A)\, ds.
$$
Therefore, for all $t \in [0,T]$,
$$
\|y_n^A(t)\|^2 - 2 \int_0^t (u_n, y_n^A)\, ds \to \|\bar{y}^A(t)\|^2 - 2 \int_0^t (\bar{u}, \bar{y}^A)\, ds.
$$
So, the function
$$
t \mapsto \|y_n^A(t)\|^2 - 2 \int_0^t (u_n, y_n^A)\, ds
$$
is monotone (non-increasing, by condition (3)), continuous, and converges pointwise to a continuous limit. Then, from Dini's Theorem, this convergence is uniform on $[0,T]$,
which implies (15). □
5. An Example
The aim of this section is to illustrate the obtained results using the scheme utilized in [4]. We assume that $\mathfrak{A}$ is a finite set, i.e., $\mathfrak{A} = \{A_1, \ldots, A_N\}$ for some $N \ge 1$. We consider a sequence of probability distributions $\pi_n$:
$$
\pi_n = \sum_{k=1}^{N} c_k^n\, \delta_{A_k}, \qquad c_k^n \ge 0, \quad \sum_{k=1}^{N} c_k^n = 1,
$$
where $\delta_{A}$ is a Dirac delta concentrated at $A$.
Assume that
$$
c_k^n \to c_k \ \text{ as } n \to \infty, \quad k = 1, \ldots, N.
$$
Then, clearly,
$$
\pi_n \to \pi = \sum_{k=1}^{N} c_k\, \delta_{A_k} \ \text{ weakly}.
$$
According to Theorem 1, problem (22), (23) has a unique solution $\bar{u}$, which satisfies (9). For the measure $\pi_n$, we have the following problem of minimizing the functional
$$
J_n(u) = \sum_{k=1}^{N} c_k^n\, \alpha\, \|y^{A_k}\|_{L^2(0,T;H)}^2 + \beta\, \|u\|_{L^2(0,T;H)}^2,
$$
where $y^{A_k}$ is a solution of (1) with $A = A_k$. With obvious changes, we can apply arguments (10), (11) to problem (24) and obtain that such a problem has a unique solution $\bar{u}_n$, which, together with its y-components, satisfies estimate (9) with a constant C not depending on n. This means that, up to a subsequence, $\bar{u}_n \to \bar{u}$ weakly in $L^2(0,T;H)$.
Therefore, using a well-known fact [3] (if, in (1), $u_n \to u$ weakly in $L^2(0,T;H)$, then $y_n \to y$ in $C([0,T];H)$), we can pass to the limit and obtain
$$
\bar{u}_n \to \bar{u} \ \text{ weakly in } L^2(0,T;H), \qquad \bar{y}_n^{A_k} \to \bar{y}^{A_k} \ \text{ in } C([0,T];H), \quad k = 1, \ldots, N.
$$
We can give a numerical illustration in the simplest one-dimensional case; a sketch of such an experiment for problem (22), (23) is given below.
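The following is a minimal computational sketch, not the paper's original experiment: every parameter value is an assumption chosen for illustration ($\Omega = (0, \pi)$, two scalar diffusion coefficients $\{1.0, 2.0\}$, weights $\alpha = \beta = 1$, initial state $y_0 = \sin x$, and measures $\pi_n$ placing weight $1 - 1/n$ on the first coefficient). It discretizes (1) by finite differences in space and implicit Euler in time, computes the exact gradient of the discrete averaged cost via the adjoint equation, and checks that the optimal controls $\bar{u}_n$ approach the optimal control of the limit problem as $\pi_n$ concentrates:

```python
# Averaged LQR sketch: y_t = a * y_xx + u on (0, pi), y = 0 on the boundary.
import numpy as np
from scipy.optimize import minimize

M, K, T = 30, 60, 1.0                  # space points, time steps, horizon (assumed)
dx, dt = np.pi / (M + 1), T / K
alpha, beta = 1.0, 1.0                 # cost weights (assumed)
coeffs = [1.0, 2.0]                    # candidate diffusion coefficients (assumed)
x = np.linspace(dx, np.pi - dx, M)
y0 = np.sin(x)                         # assumed initial state

# Discrete Laplacian with homogeneous Dirichlet boundary conditions
Lap = (np.diag(-2.0 * np.ones(M)) + np.diag(np.ones(M - 1), 1)
       + np.diag(np.ones(M - 1), -1)) / dx**2

def solve_state(u, a):
    """Implicit Euler for y' = a*Lap*y + u; the control u has shape (K, M)."""
    y = np.zeros((K + 1, M))
    y[0] = y0
    step = np.eye(M) - dt * a * Lap
    for k in range(K):
        y[k + 1] = np.linalg.solve(step, y[k] + dt * u[k])
    return y

def cost_and_grad(u_flat, weights):
    """Discrete averaged cost J_n(u) and its exact gradient via the adjoint sweep."""
    u = u_flat.reshape(K, M)
    scale = dt * dx
    J = beta * scale * np.sum(u**2)
    grad = 2.0 * beta * scale * u
    for w, a in zip(weights, coeffs):
        y = solve_state(u, a)
        J += alpha * w * scale * np.sum(y[1:]**2)
        p = np.zeros((K + 1, M))       # adjoint state, p[K] = 0
        step = np.eye(M) - dt * a * Lap
        for k in range(K - 1, -1, -1):
            p[k] = np.linalg.solve(step, p[k + 1] + 2.0 * alpha * w * dt * y[k + 1])
        grad += scale * p[:-1]
    return J, grad.ravel()

def optimal_control(weights):
    res = minimize(cost_and_grad, np.zeros(K * M), args=(weights,),
                   jac=True, method="L-BFGS-B")
    return res.x.reshape(K, M)

u_bar = optimal_control([1.0, 0.0])    # limit problem: pi = Dirac at a = 1.0
for n in [2, 5, 20, 100]:
    u_n = optimal_control([1.0 - 1.0 / n, 1.0 / n])
    err = np.sqrt(dt * dx * np.sum((u_n - u_bar)**2))
    print(f"n = {n:3d}   L2 distance to the limit control = {err:.4f}")
```

Since the discrete problem is a strictly convex quadratic in $u$, the optimizer recovers its unique minimizer, mirroring Theorem 1, and the printed distances shrink roughly at the $O(1/n)$ rate at which $W(\pi_n, \pi) \to 0$, mirroring Theorem 2.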
6. Conclusions
In this paper, we studied an averaged Linear Quadratic Regulator (LQR) problem for a parabolic partial differential equation (PDE), where the system dynamics are described by a probability distribution over possible operators. This formulation generalizes the classical LQR problem by incorporating uncertainty in the system parameters through an averaging approach.
The main contributions of this work are the following:
- We established the existence and uniqueness of the optimal control solution under appropriate assumptions.
- We proved the convergence of the optimal control as the probability distribution governing the system dynamics becomes more concentrated.
These results provide a rigorous theoretical foundation for analyzing control problems with uncertainty in system parameters.
In future research, we plan to generalize our results to multidimensional parabolic systems and evolution problems on infinite-time intervals. Moreover, it would be interesting to extend the results to hyperbolic partial differential equations (PDEs), which model wave-like and transport phenomena [32]. Exploring the control of such systems within a reinforcement learning framework may yield new methods for optimal control in dynamic environments [33], with potential applications in robotics, physics-based simulations, and real-time decision-making systems.
Beyond the realm of physical systems, intelligent control and learning frameworks are gaining traction in socio-technical domains. One notable example is the use of machine learning models to support automated recruitment and decision-making for hiring young professionals [34]. Additionally, the development of advanced database connectors underscores the relevance of scalable control and dataflow mechanisms in distributed computing systems [35].
Author Contributions
Conceptualization, O.K., A.M., and O.L.; methodology, O.K.; formal analysis, O.K.; investigation, O.K., A.M., and O.L.; writing—original draft preparation, A.M.; writing—review and editing, O.K. and A.M.; visualization, A.M.; supervision, O.K.; project administration, O.K.; funding acquisition, O.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Evans, L.C. Partial Differential Equations, 2nd ed.; American Mathematical Society: Providence, RI, USA, 2010.
- Anderson, B.D.O.; Moore, J.B. Optimal Control: Linear Quadratic Methods; Prentice Hall: Englewood Cliffs, NJ, USA, 1989.
- Kapustyan, O.V.; Kapustyan, O.A.; Sukretna, A.V. Approximate bounded synthesis for one weakly nonlinear boundary-value problem. Nonlinear Oscil. 2009, 12, 297–304.
- Pesare, A.; Palladino, M.; Falcone, M. Convergence results for an averaged LQR problem with applications to reinforcement learning. Math. Control Signals Syst. 2021, 33, 379–411.
- Kapustian, O.A. Approximate optimal regulator for distributed control problem with superposition functional and rapidly oscillating coefficients. In Modern Mathematics and Mechanics; Sadovnichiy, V., Zgurovsky, M., Eds.; Springer: Cham, Switzerland, 2019; pp. 199–208.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018.
- Sutton, R.S.; Barto, A.G.; Williams, R.J. Reinforcement learning is direct adaptive optimal control. IEEE Control Syst. 1992, 12, 19–22.
- Recht, B. A tour of reinforcement learning: The view from continuous control. Annu. Rev. Control Robot. Auton. Syst. 2019, 2, 253–279.
- Atkeson, C.G.; Santamaria, J.C. A comparison of direct and model-based reinforcement learning. In Proceedings of the International Conference on Robotics and Automation, Albuquerque, NM, USA, 20–25 April 1997; Volume 4, pp. 3557–3564.
- Deisenroth, M.P. Efficient Reinforcement Learning Using Gaussian Processes; KIT Scientific Publishing: Karlsruhe, Germany, 2010.
- Deisenroth, M.P.; Fox, D.; Rasmussen, C.E. Gaussian processes for data-efficient learning in robotics and control. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 37, 408–423.
- Gal, Y.; McAllister, R.; Rasmussen, C.E. Improving PILCO with Bayesian neural network dynamics models. In Proceedings of the ICML Workshop on Data-Efficient Machine Learning, New York, NY, USA, 24 June 2016; Volume 4, p. 25.
- Janner, M.; Fu, J.; Zhang, M.; Levine, S. When to trust your model: Model-based policy optimization. Adv. Neural Inf. Process. Syst. 2019, 32, 12519–12530.
- Kamthe, S.; Deisenroth, M. Data-efficient reinforcement learning with probabilistic model predictive control. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), Playa Blanca, Lanzarote, Spain, 9–11 April 2018; pp. 1701–1710.
- Chowdhary, G.; Kingravi, H.A.; How, J.P.; Vela, P.A. A Bayesian nonparametric approach to adaptive control using Gaussian processes. In Proceedings of the IEEE Conference on Decision and Control (CDC), Florence, Italy, 10–13 December 2013; pp. 874–879.
- Chua, K.; Calandra, R.; McAllister, R.; Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. Adv. Neural Inf. Process. Syst. 2018, 31, 4754–4765.
- Wang, T.; Bao, X.; Clavera, I.; Hoang, J.; Wen, Y.; Langlois, E.; Zhang, S.; Zhang, G.; Abbeel, P.; Ba, J. Benchmarking model-based reinforcement learning. arXiv 2019, arXiv:1907.02057.
- Murray, R.; Palladino, M. A model for system uncertainty in reinforcement learning. Syst. Control Lett. 2018, 122, 24–31.
- Murray, R.; Palladino, M. Modelling uncertainty in reinforcement learning. In Proceedings of the IEEE Conference on Decision and Control (CDC), Nice, France, 11–13 December 2019; pp. 2436–2441.
- Bettiol, P.; Khalil, N. Necessary optimality conditions for average cost minimization problems. Discrete Contin. Dyn. Syst. B 2019, 24, 2093.
- Palladino, M. Necessary conditions for adverse control problems expressed by relaxed derivatives. Set-Valued Var. Anal. 2016, 24, 659.
- Ross, I.M.; Proulx, R.J.; Karpenko, M.; Gong, Q. Riemann–Stieltjes optimal control problems for uncertain dynamic systems. J. Guid. Control Dyn. 2015, 38, 1251–1263.
- Lohéac, J.; Zuazua, E. From averaged to simultaneous controllability. Ann. Fac. Sci. Toulouse Math. 2016, 25, 785–828.
- Zuazua, E. Averaged control. Automatica 2014, 50, 3077–3087.
- Barabash, O.; Sobchuk, V.; Sobchuk, A.; Musienko, A.; Laptiev, O. Algorithms for synthesis of functionally stable wireless sensor network. Adv. Inf. Syst. 2025, 9, 70–79.
- Doya, K. Reinforcement learning in continuous time and space. Neural Comput. 2000, 12, 219–245.
- Lee, J.Y.; Park, J.B.; Choi, Y.H. Integral reinforcement learning for continuous-time input-affine nonlinear systems with simultaneous invariant explorations. IEEE Trans. Neural Netw. Learn. Syst. 2014, 26, 916–932.
- Lewis, F.L.; Vrabie, D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 2009, 9, 32–50.
- Munos, R. A study of reinforcement learning in the continuous case by the means of viscosity solutions. Mach. Learn. 2000, 40, 265–297.
- Temam, R. Infinite-Dimensional Dynamical Systems in Mechanics and Physics; Springer Science & Business Media: Cham, Switzerland, 2013.
- Villani, C. Optimal Transport: Old and New; Springer: Berlin, Germany, 2009.
- Lions, J.L. Contrôlabilité Exacte, Perturbations et Stabilisation de Systèmes Distribués; Masson: Paris, France, 1988.
- Fleming, W.H.; Soner, H.M. Controlled Markov Processes and Viscosity Solutions, 2nd ed.; Springer: New York, NY, USA, 2006.
- Makarovych, V.; Makarovych, A. Analysis of socio-economic determinants of youth employment using machine learning methods. Acta Acad. Beregsasiensis Econ. 2024, 6, 81–101.
- Glebena, M.I.; Makarovych, A.V. SingleStoreDB connector for Apache Beam. Sci. Bull. Uzhhorod Univ. Ser. Math. Inf. 2024, 44, 66–82.