Abstract
This paper studies an averaged Linear Quadratic Regulator (LQR) problem for a parabolic partial differential equation (PDE) whose dynamics are affected by uncertain parameters. Instead of assuming a deterministic operator, we model the uncertainty using a probability distribution over a set of possible system dynamics, extending classical optimal control theory with an averaging framework that accounts for parameter uncertainty. We establish the existence and uniqueness of the optimal control and analyze its convergence as the probability distribution governing the system parameters becomes more concentrated. These results provide a rigorous foundation for optimal control in the presence of parameter uncertainty and lay the groundwork for further studies of dynamic systems under uncertainty.
Keywords:
linear quadratic regulator; optimal control; parabolic PDEs; averaging method; uncertainty modeling; control convergence
MSC:
49J20; 49N10
1. Introduction
Optimal control problems for partial differential equations (PDEs) are widely used in engineering, physics, and economics to model systems that evolve over time [1]. Among these, the Linear Quadratic Regulator (LQR) problem has been extensively studied due to its well-established theoretical properties and practical applicability. The LQR framework provides an optimal strategy for controlling systems governed by linear PDEs while minimizing a quadratic cost functional [1,2]. Recent advances also explore approximate bounded feedback synthesis in parabolic PDEs with nonlinear perturbations and semidefinite performance criteria, extending classical LQR concepts to weakly nonlinear systems [3].
In this paper, we consider an averaged LQR problem for a parabolic PDE, where the system dynamics contain uncertain parameters. Instead of assuming a single deterministic system, we model the uncertainty using a probability distribution over a set of possible dynamics. This approach is inspired by previous studies in optimal control and stochastic averaging methods [4]. Related work addresses optimal regulation under rapidly oscillating parameters using homogenized models and superposition-type cost structures [5].
Reinforcement learning (RL) has become a cornerstone of modern machine learning, operating alongside supervised and unsupervised learning to solve decision-making problems under uncertainty. In this paradigm, agents learn optimal policies by interacting with an environment, optimizing a long-term performance criterion [6]. The connection between RL and classical optimal control theory has long been recognized [7], and recent developments in RL are now significantly influencing the field of control theory [8].
A key distinction within RL lies between model-free and model-based methods. Model-free approaches directly approximate value functions or policies without constructing a model of the environment, while model-based methods aim to learn a model from data and use it for planning [6]. The latter often suffer from model bias, a challenge identified early on [9]. To overcome this, the PILCO algorithm was proposed [10,11], which models system dynamics probabilistically using Gaussian processes and performs policy improvement based on expected trajectories.
PILCO and its extensions [12,13,14] operate within a Bayesian model-based RL framework that integrates data-driven learning with optimal control. These methods have demonstrated remarkable data efficiency and robustness and inspired numerous theoretical developments [15,16,17]. The formalization of such Bayesian approaches using distributions over system dynamics is central to understanding and reducing model uncertainty [18,19].
In this context, averaged optimal control emerges as a powerful framework that optimizes the expected behavior over a distribution of dynamics. This idea has strong theoretical roots in the Riemann–Stieltjes optimal control setting [20,21,22] and averaged controllability [23,24]. Additionally, the challenge of maintaining stability in distributed control systems under destabilizing factors has motivated algorithmic approaches to resilient network synthesis [25], further highlighting the importance of robust control strategies. Our work is motivated by these formulations and aims to explore how solutions to averaged optimal control problems relate to those of deterministic counterparts.
Additionally, reinforcement learning in continuous-time systems is gaining momentum in control engineering [26,27,28,29]. These studies provide a basis for the development of algorithms that can operate in real-world, high-frequency environments.
This paper contributes to this growing body of work by investigating the convergence of optimal policies derived from averaged LQR problems to those of classical LQR problems as the distribution over the dynamics concentrates. Our approach complements existing algorithms such as PILCO [11] by offering a theoretical justification for their observed empirical success [18].
The main contributions of this paper are as follows:
- We establish the existence and uniqueness of optimal solutions for the averaged control problem.
- We analyze the convergence of optimal controls as the probability measure representing system uncertainty becomes more concentrated.
The structure of this paper is as follows: First, we define the mathematical formulation of the problem. Then, we present theoretical preliminaries and key functional analysis tools. The main results, including proofs of existence, uniqueness, and convergence, follow. Finally, we summarize our findings and discuss potential future directions.
2. Setting of the Problem
For unknown functions $y = y(t,x)$ (the state) and $u = u(t,x)$ (the control), where $(t,x) \in (0,T) \times \Omega$, $\Omega \subset \mathbb{R}^d$ is a bounded domain, and $T > 0$, we consider the linear quadratic optimal control problem
$$
\begin{cases}
\dfrac{\partial y}{\partial t} = \operatorname{div}\,(A \nabla y) + u, & (t,x) \in (0,T) \times \Omega,\\
y\big|_{\partial \Omega} = 0,\\
y\big|_{t=0} = y_0,
\end{cases}
\tag{1}
$$
where the cost functional is given by
$$
J_{\pi}(u) = \int_{\mathfrak{A}} \left( \alpha \int_0^T\!\!\int_{\Omega} |y^A(t,x)|^2\,dx\,dt + \beta \int_0^T\!\!\int_{\Omega} |u(t,x)|^2\,dx\,dt \right) d\pi(A) \to \inf,
\tag{2}
$$
where $y^A$ denotes the solution of (1) corresponding to the symmetric matrix $A = \{a_{ij}\}_{i,j=1}^{d}$, and the operator $y \mapsto \operatorname{div}\,(A \nabla y)$ satisfies the uniform ellipticity condition given by
$$
(A \xi, \xi) \ge \gamma\,|\xi|^2 \quad \text{for all } \xi \in \mathbb{R}^d.
\tag{3}
$$
The uniform ellipticity condition ensures the well-posedness of the PDE and is a standard assumption in the study of elliptic and parabolic operators.
Here, $\alpha$ and $\beta$ are positive numbers that represent the weight coefficients in the objective functional, $\gamma$ is a positive number that defines the uniform ellipticity of the differential operator, and $y_0 \in L^2(\Omega)$ is a given initial state.
$\mathfrak{A}$ is a set of symmetric $d \times d$ matrices satisfying condition (3) (with one and the same constant $\gamma$).
$\pi$ is a probability measure on $\mathfrak{A}$.
In the following, we denote $H = L^2(\Omega)$ and $V = H_0^1(\Omega)$, and write $\|\cdot\|$ and $(\cdot,\cdot)$ for the norm and the scalar product in $H$.
For $\xi \in \mathbb{R}^d$, we denote by $|\xi|$ the standard Euclidean norm.
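For a symmetric matrix, condition (3) is equivalent to requiring that its smallest eigenvalue be at least $\gamma$. As a quick sanity check (a sketch only; the matrix and the constant below are assumed for illustration and are not from the paper):

```python
# Checking the uniform ellipticity condition (3): for a symmetric matrix A,
# (A xi, xi) >= gamma * |xi|^2 for all xi  iff  min eigenvalue of A >= gamma.
import numpy as np

gamma = 0.5                          # assumed ellipticity constant
A = np.array([[2.0, 0.3],
              [0.3, 1.0]])           # a sample symmetric matrix
print(np.linalg.eigvalsh(A).min() >= gamma)   # True: A is admissible
```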
3. Preliminary Results
It is known that for every $u \in L^2(0,T;H)$ and $A \in \mathfrak{A}$, problem (1) has a unique solution in the weak sense ([30], Theorem 3.1, p. 70):
$$
y \in L^2(0,T;V), \qquad \frac{dy}{dt} \in L^2(0,T;V^{*}),
$$
such that $y(0) = y_0$ and, for every $\varphi \in V$, in the sense of scalar distributions on $(0,T)$,
$$
\frac{d}{dt}\,(y,\varphi) + (A \nabla y, \nabla \varphi) = (u, \varphi).
$$
Due to the embedding
$$
\bigl\{\, y \in L^2(0,T;V) \ :\ \tfrac{dy}{dt} \in L^2(0,T;V^{*}) \,\bigr\} \subset C([0,T];H),
$$
the equality $y(0) = y_0$ makes sense.
Moreover, for every weak solution, the functions $t \mapsto \|y(t)\|^2$ and $t \mapsto (y(t),\varphi)$ are absolutely continuous, and
$$
\frac{1}{2}\,\frac{d}{dt}\,\|y(t)\|^2 + (A \nabla y(t), \nabla y(t)) = (u(t), y(t)) \quad \text{for a.e. } t \in (0,T).
$$
In particular, this identity and condition (3) yield the a priori estimate
$$
\|y\|_{C([0,T];H)}^2 + \|y\|_{L^2(0,T;V)}^2 \le C \bigl( \|y_0\|^2 + \|u\|_{L^2(0,T;H)}^2 \bigr),
$$
with a constant $C$ depending only on $\gamma$ and $T$.
If, for a given $u$, we denote by $y_1$ (resp. $y_2$) the solution of (1) with matrix $A_1$ (resp. $A_2$), then for $w = y_1 - y_2$, we get
$$
\frac{1}{2}\,\frac{d}{dt}\,\|w(t)\|^2 + (A_1 \nabla w, \nabla w) = ((A_2 - A_1) \nabla y_2, \nabla w).
$$
Therefore, by condition (3) and the Cauchy inequality,
$$
\frac{d}{dt}\,\|w(t)\|^2 + \gamma\,\|\nabla w(t)\|^2 \le \frac{1}{\gamma}\,\|A_1 - A_2\|^2\,\|\nabla y_2(t)\|^2.
$$
So, from (7), we deduce
$$
\|y_1 - y_2\|_{C([0,T];H)}^2 + \|y_1 - y_2\|_{L^2(0,T;V)}^2 \le C\,\|A_1 - A_2\|^2 \bigl( \|y_0\|^2 + \|u\|_{L^2(0,T;H)}^2 \bigr).
$$
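To make the role of this continuous-dependence estimate concrete, here is a minimal numerical sketch on a scalar surrogate of (1) (the ODE $y' = -ay + u$ standing in for the PDE; the coefficients, control, and horizon are assumed for illustration). The distance between solutions scales linearly with the distance between the coefficients, as the estimate predicts:

```python
# Continuous dependence on the coefficient for the scalar surrogate y' = -a*y + u:
# for w = y1 - y2 one has w' = -a1*w + (a2 - a1)*y2, so Gronwall's lemma gives
# max_t |w(t)| <= C * |a1 - a2|.
import numpy as np

def solve(a, T=1.0, K=1000, y0=1.0):
    dt = T / K
    y = np.empty(K + 1)
    y[0] = y0
    for k in range(K):
        y[k + 1] = y[k] + dt * (-a * y[k] + np.sin(3 * k * dt))  # explicit Euler
    return y

for da in [0.1, 0.01, 0.001]:
    w = solve(1.0) - solve(1.0 + da)
    print(f"|a1 - a2| = {da:6.3f}   max_t |y1 - y2| = {np.abs(w).max():.2e}")
```

The printed maxima drop by roughly a factor of ten with each tenfold decrease in $|a_1 - a_2|$, which is exactly the linear dependence asserted above.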
Finally, we note that the weak convergence (4) can be described by the Wasserstein metric:
$$
W(\pi, \rho) = \inf_{\chi \in \Pi(\pi, \rho)} \int_{X \times X} d(A_1, A_2)\, d\chi(A_1, A_2),
$$
where $(X, d)$ is a Polish metric space, and $\Pi(\pi, \rho)$ is the collection of all probability measures on $X \times X$ with projections $\pi$ and $\rho$, respectively.
Then, for probability measures on a compact metric space, $\pi_n \to \pi$ weakly if and only if $W(\pi_n, \pi) \to 0$ [31].
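For instance (a sketch under the simplifying assumption that $\mathfrak{A}$ reduces to scalar diffusion coefficients, so that SciPy's one-dimensional Wasserstein distance applies), the distance from a concentrating sequence of discrete measures to its Dirac limit vanishes:

```python
# W(pi_n, delta) -> 0 as discrete measures pi_n concentrate at a single coefficient.
import numpy as np
from scipy.stats import wasserstein_distance

coeffs = np.array([1.0, 2.0, 3.0])   # a finite set of scalar diffusion coefficients
for n in [2, 10, 50, 250]:
    # pi_n puts weight 1 - 1/n on a = 2.0 and spreads 1/n over the other two points
    w = np.array([0.5 / n, 1.0 - 1.0 / n, 0.5 / n])
    dist = wasserstein_distance(coeffs, [2.0], u_weights=w, v_weights=[1.0])
    print(f"n = {n:4d}   W(pi_n, delta) = {dist:.4f}")   # equals 1/n here
```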
4. Main Results
Theorem 1.
For every measure π, the LQR optimal control problem (1), (2) has a unique solution $(\bar{u}, \bar{y}^A)$ (here, $\bar{y}^A$ depends on $A$), and
$$
\|\bar{u}\|_{L^2(0,T;H)} + \|\bar{y}^A\|_{L^2(0,T;V)} \le C\,\|y_0\|,
$$
where the constant $C$ does not depend on π and A.
Remark 1.
The optimal control $\bar{u}$ depends on π but does not depend on A.
Proof of Theorem 1.
Let $\{u_n\}$ be a minimizing sequence and $\{y_n^A\}$ be the corresponding solutions of (1). Then,
$$
J_{\pi}(u_n) \to \inf_{u} J_{\pi}(u) \ge 0.
$$
So, $\{u_n\}$ is bounded in $L^2(0,T;H)$ and, up to a subsequence, $u_n \to \bar{u}$ weakly in $L^2(0,T;H)$.
Moreover, from (7), we deduce that, for every $A$, $\{y_n^A\}$ is bounded in $L^2(0,T;V)$. As $\{\, y \in L^2(0,T;V) : \frac{dy}{dt} \in L^2(0,T;V^{*}) \,\}$ is compactly embedded in $L^2(0,T;H)$, up to a subsequence, for some $\bar{y}^A$,
$$
y_n^A \to \bar{y}^A \ \text{ weakly in } L^2(0,T;V) \text{ and strongly in } L^2(0,T;H).
$$
Taking an arbitrary $\varphi \in V$ and passing to the limit in the equality
$$
\frac{d}{dt}\,(y_n^A, \varphi) + (A \nabla y_n^A, \nabla \varphi) = (u_n, \varphi),
$$
we obtain that $\bar{y}^A$ is a solution of (1) with control $\bar{u}$ and matrix $A$. Due to the uniqueness of such a solution, the whole sequence $\{y_n^A\}$ tends to $\bar{y}^A$.
Moreover, due to (11), for every $A \in \mathfrak{A}$, we get
$$
\alpha\,\|\bar{y}^A\|_{L^2(0,T;H)}^2 + \beta\,\|\bar{u}\|_{L^2(0,T;H)}^2 \le \liminf_{n \to \infty} \Bigl( \alpha\,\|y_n^A\|_{L^2(0,T;H)}^2 + \beta\,\|u_n\|_{L^2(0,T;H)}^2 \Bigr).
$$
Also, using Fatou's Lemma, we get
$$
J_{\pi}(\bar{u}) \le \liminf_{n \to \infty} J_{\pi}(u_n) = \inf_{u} J_{\pi}(u),
$$
so $\bar{u}$ is an optimal control.
Because of the strict convexity of $J_{\pi}$ (the map $u \mapsto y^A$ is affine and $\beta > 0$), we obtain uniqueness. Thus, the theorem is proved. □
Now, let us assume that
$$
\pi_n \to \pi \ \text{ weakly as } n \to \infty.
$$
The convergence of optimal controls under the weak convergence of probability measures is analyzed using the Wasserstein metric, a powerful tool in probability and optimal transport theory [31].
Theorem 2.
Let $\pi_n \to \pi$ weakly, and let $\bar{u}_n$ be the optimal control of problem (1), (2) with measure $\pi_n$. Then,
$$
\bar{u}_n \to \bar{u} \ \text{ strongly in } L^2(0,T;H),
$$
and, for every $A \in \mathfrak{A}$, the corresponding optimal states satisfy
$$
\bar{y}_n^A \to \bar{y}^A \ \text{ in } C([0,T];H).
$$
Proof of Theorem 2.
For every $u$, due to (6) and (8),
$$
|J_{\pi_n}(u) - J_{\pi}(u)| \le C_u \int_{\mathfrak{A} \times \mathfrak{A}} \|A_1 - A_2\|\, d\chi(A_1, A_2),
$$
where $C_u$ is bounded if $\|u\|_{L^2(0,T;H)}$ is bounded, and $\chi$ is a probability measure on $\mathfrak{A} \times \mathfrak{A}$ with projections $\pi_n$ and $\pi$, respectively. If we take $X = \mathfrak{A}$ with the metric $d(A_1, A_2) = \|A_1 - A_2\|$, then $X$ is a Polish metric space. Therefore, from (16), we get, for $n \to \infty$,
$$
J_{\pi_n}(u) \to J_{\pi}(u).
$$
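The inequality above says that, for a fixed control, the averaged cost is Lipschitz in the measure with respect to the Wasserstein distance. A minimal numerical sketch on the scalar surrogate used earlier (all values assumed for illustration) shows the cost gap staying proportional to $W(\pi_n, \pi)$:

```python
# For fixed u, |J_{pi_n}(u) - J_{delta}(u)| stays proportional to W(pi_n, delta)
# (scalar surrogate y' = -a*y + u with u(t) = sin(3t); alpha = beta = 1 assumed).
import numpy as np
from scipy.stats import wasserstein_distance

def cost(a, T=1.0, K=1000, y0=1.0):
    dt = T / K
    y, J = y0, 0.0
    for k in range(K):
        u = np.sin(3 * k * dt)
        J += dt * (y**2 + u**2)
        y += dt * (-a * y + u)       # explicit Euler step
    return J

coeffs = np.array([1.0, 2.0, 3.0])
J_vals = np.array([cost(a) for a in coeffs])
for n in [2, 10, 50]:
    w = np.array([1.0 - 1.0 / n, 0.5 / n, 0.5 / n])   # pi_n concentrating at a = 1.0
    gap = abs(w @ J_vals - J_vals[0])
    W = wasserstein_distance(coeffs, [1.0], u_weights=w, v_weights=[1.0])
    print(f"n = {n:3d}   cost gap = {gap:.4f}   W = {W:.4f}   ratio = {gap / W:.3f}")
```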
Let us denote by $(\bar{u}_n, \bar{y}_n^A)$ the optimal process of problem (1), (2) with measure $\pi_n$. Due to (9), the bound
$$
\|\bar{u}_n\|_{L^2(0,T;H)} + \|\bar{y}_n^A\|_{L^2(0,T;V)} \le C\,\|y_0\|
$$
does not depend on n.
Then, the optimality of $\bar{u}_n$ and (16) implies
$$
\limsup_{n \to \infty} J_{\pi_n}(\bar{u}_n) \le \lim_{n \to \infty} J_{\pi_n}(\bar{u}) = J_{\pi}(\bar{u}).
$$
In the following, we denote $u_n := \bar{u}_n$, $y_n^A := \bar{y}_n^A$.
So, up to a subsequence, for some $u^*$, $y^{*,A}$, the pair $(u_n, y_n^A)$ tends to $(u^*, y^{*,A})$ in the sense of (11). Passing to the limit yields that $y^{*,A}$ is a solution of (1) with control $u^*$. Due to (17),
$$
J_{\pi}(u^*) \le \liminf_{n \to \infty} J_{\pi_n}(u_n) \le J_{\pi}(\bar{u}),
$$
so, by the uniqueness part of Theorem 1, $u^* = \bar{u}$.
This inequality also implies that $u_n \to \bar{u}$ strongly in $L^2(0,T;H)$. Indeed, on the one hand, by the weak lower semicontinuity of the norm,
$$
\|\bar{u}\|_{L^2(0,T;H)} \le \liminf_{n \to \infty} \|u_n\|_{L^2(0,T;H)}.
$$
And on the other hand, since $J_{\pi_n}(u_n) \to J_{\pi}(\bar{u})$ and the state terms of the cost are lower semicontinuous along the sequence,
$$
\limsup_{n \to \infty} \|u_n\|_{L^2(0,T;H)} \le \|\bar{u}\|_{L^2(0,T;H)}.
$$
So,
$$
\|u_n\|_{L^2(0,T;H)} \to \|\bar{u}\|_{L^2(0,T;H)}.
$$
This convergence of norms and the weak convergence of $u_n$ to $\bar{u}$ in $L^2(0,T;H)$ imply the strong convergence of (14), since
$$
\|u_n - \bar{u}\|^2 = \|u_n\|^2 - 2\,(u_n, \bar{u}) + \|\bar{u}\|^2 \to 0.
$$
From (1), we deduce the energy equality
$$
\|y_n^A(t)\|^2 = \|y_0\|^2 - 2 \int_0^t (A \nabla y_n^A, \nabla y_n^A)\, ds + 2 \int_0^t (u_n, y_n^A)\, ds.
$$
Therefore, for all $t \in [0,T]$,
$$
\|y_n^A(t)\|^2 - 2 \int_0^t (u_n, y_n^A)\, ds \to \|\bar{y}^A(t)\|^2 - 2 \int_0^t (\bar{u}, \bar{y}^A)\, ds.
$$
So, the function
$$
t \mapsto \|y_n^A(t)\|^2 - 2 \int_0^t (u_n, y_n^A)\, ds
$$
is monotone (non-increasing, by condition (3)), continuous, and converges pointwise to a continuous limit. Then, from Dini's Theorem, this convergence is uniform on $[0,T]$,
which implies (15). □
5. An Example
The aim of this section is to illustrate the obtained results using the scheme utilized in [4]. We assume that $\mathfrak{A}$ is a finite set, i.e., $\mathfrak{A} = \{A_1, \ldots, A_N\}$ for some $N \ge 1$. We consider a sequence of probability distributions $\pi_n$:
$$
\pi_n = \sum_{k=1}^{N} c_k^n\, \delta_{A_k}, \qquad c_k^n \ge 0, \quad \sum_{k=1}^{N} c_k^n = 1,
$$
where $\delta_{A}$ is a Dirac delta concentrated at $A$.
Assume that
$$
c_k^n \to c_k \ \text{ as } n \to \infty, \quad k = 1, \ldots, N.
$$
Then, clearly,
$$
\pi_n \to \pi = \sum_{k=1}^{N} c_k\, \delta_{A_k} \ \text{ weakly}.
$$
According to Theorem 1, problem (22), (23) has a unique solution $\bar{u}$, which satisfies (9). For the measure $\pi_n$, we have the following problem of minimizing the functional
$$
J_n(u) = \sum_{k=1}^{N} c_k^n\, \alpha\, \|y^{A_k}\|_{L^2(0,T;H)}^2 + \beta\, \|u\|_{L^2(0,T;H)}^2,
$$
where $y^{A_k}$ is a solution of (1) with $A = A_k$. With obvious changes, we can apply arguments (10), (11) to problem (24) and obtain that such a problem has a unique solution $\bar{u}_n$, which, together with its y-components, satisfies estimate (9) with a constant C not depending on n. This means that, up to a subsequence, $\bar{u}_n \to \bar{u}$ weakly in $L^2(0,T;H)$.
Therefore, using a well-known fact [3] (if, in (1), $u_n \to u$ weakly in $L^2(0,T;H)$, then $y_n \to y$ in $C([0,T];H)$), we can pass to the limit and obtain
$$
\bar{u}_n \to \bar{u} \ \text{ weakly in } L^2(0,T;H), \qquad \bar{y}_n^{A_k} \to \bar{y}^{A_k} \ \text{ in } C([0,T];H), \quad k = 1, \ldots, N.
$$
We can give a numerical illustration in the simplest one-dimensional case; a sketch of such an experiment for problem (22), (23) is given below.
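The following is a minimal computational sketch, not the paper's original experiment: every parameter value is an assumption chosen for illustration ($\Omega = (0, \pi)$, two scalar diffusion coefficients $\{1.0, 2.0\}$, weights $\alpha = \beta = 1$, initial state $y_0 = \sin x$, and measures $\pi_n$ placing weight $1 - 1/n$ on the first coefficient). It discretizes (1) by finite differences in space and implicit Euler in time, computes the exact gradient of the discrete averaged cost via the adjoint equation, and checks that the optimal controls $\bar{u}_n$ approach the optimal control of the limit problem as $\pi_n$ concentrates:

```python
# Averaged LQR sketch: y_t = a * y_xx + u on (0, pi), y = 0 on the boundary.
import numpy as np
from scipy.optimize import minimize

M, K, T = 30, 60, 1.0                  # space points, time steps, horizon (assumed)
dx, dt = np.pi / (M + 1), T / K
alpha, beta = 1.0, 1.0                 # cost weights (assumed)
coeffs = [1.0, 2.0]                    # candidate diffusion coefficients (assumed)
x = np.linspace(dx, np.pi - dx, M)
y0 = np.sin(x)                         # assumed initial state

# Discrete Laplacian with homogeneous Dirichlet boundary conditions
Lap = (np.diag(-2.0 * np.ones(M)) + np.diag(np.ones(M - 1), 1)
       + np.diag(np.ones(M - 1), -1)) / dx**2

def solve_state(u, a):
    """Implicit Euler for y' = a*Lap*y + u; the control u has shape (K, M)."""
    y = np.zeros((K + 1, M))
    y[0] = y0
    step = np.eye(M) - dt * a * Lap
    for k in range(K):
        y[k + 1] = np.linalg.solve(step, y[k] + dt * u[k])
    return y

def cost_and_grad(u_flat, weights):
    """Discrete averaged cost J_n(u) and its exact gradient via the adjoint sweep."""
    u = u_flat.reshape(K, M)
    scale = dt * dx
    J = beta * scale * np.sum(u**2)
    grad = 2.0 * beta * scale * u
    for w, a in zip(weights, coeffs):
        y = solve_state(u, a)
        J += alpha * w * scale * np.sum(y[1:]**2)
        p = np.zeros((K + 1, M))       # adjoint state, p[K] = 0
        step = np.eye(M) - dt * a * Lap
        for k in range(K - 1, -1, -1):
            p[k] = np.linalg.solve(step, p[k + 1] + 2.0 * alpha * w * dt * y[k + 1])
        grad += scale * p[:-1]
    return J, grad.ravel()

def optimal_control(weights):
    res = minimize(cost_and_grad, np.zeros(K * M), args=(weights,),
                   jac=True, method="L-BFGS-B")
    return res.x.reshape(K, M)

u_bar = optimal_control([1.0, 0.0])    # limit problem: pi = Dirac at a = 1.0
for n in [2, 5, 20, 100]:
    u_n = optimal_control([1.0 - 1.0 / n, 1.0 / n])
    err = np.sqrt(dt * dx * np.sum((u_n - u_bar)**2))
    print(f"n = {n:3d}   L2 distance to the limit control = {err:.4f}")
```

Since the discrete problem is a strictly convex quadratic in $u$, the optimizer recovers its unique minimizer, mirroring Theorem 1, and the printed distances shrink roughly at the $O(1/n)$ rate at which $W(\pi_n, \pi) \to 0$, mirroring Theorem 2.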
6. Conclusions
In this paper, we studied an averaged Linear Quadratic Regulator (LQR) problem for a parabolic partial differential equation (PDE), where the system dynamics are described by a probability distribution over possible operators. This formulation generalizes the classical LQR problem by incorporating uncertainty in the system parameters through an averaging approach.
The main contributions of this work are the following:
- We established the existence and uniqueness of the optimal control solution under appropriate assumptions.
- We proved the convergence of the optimal control as the probability distribution governing the system dynamics becomes more concentrated.
These results provide a rigorous theoretical foundation for analyzing control problems with uncertainty in system parameters.
In future research, we plan to generalize our results to multidimensional parabolic systems and evolution problems on infinite-time intervals. Moreover, it would be interesting to extend the results to hyperbolic partial differential equations (PDEs), which model wave-like and transport phenomena [32]. Exploring the control of such systems within a reinforcement learning framework may yield new methods for optimal control in dynamic environments [33], with potential applications in robotics, physics-based simulations, and real-time decision-making systems.
Beyond the realm of physical systems, intelligent control and learning frameworks are gaining traction in socio-technical domains. One notable example is the use of machine learning models to support automated recruitment and decision-making for hiring young professionals [34]. Additionally, the development of advanced database connectors underscores the relevance of scalable control and dataflow mechanisms in distributed computing systems [35].
Author Contributions
Conceptualization, O.K., A.M., and O.L.; methodology, O.K.; formal analysis, O.K.; investigation, O.K., A.M., and O.L.; writing—original draft preparation, A.M.; writing—review and editing, O.K. and A.M.; visualization, A.M.; supervision, O.K.; project administration, O.K.; funding acquisition, O.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Evans, L.C. Partial Differential Equations, 2nd ed.; American Mathematical Society: Providence, RI, USA, 2010.
- Anderson, B.D.O.; Moore, J.B. Optimal Control: Linear Quadratic Methods; Prentice Hall: Englewood Cliffs, NJ, USA, 1989.
- Kapustyan, O.V.; Kapustyan, O.A.; Sukretna, A.V. Approximate bounded synthesis for one weakly nonlinear boundary-value problem. Nonlinear Oscil. 2009, 12, 297–304.
- Pesare, A.; Palladino, M.; Falcone, M. Convergence results for an averaged LQR problem with applications to reinforcement learning. Math. Control Signals Syst. 2021, 33, 379–411.
- Kapustian, O.A. Approximate optimal regulator for distributed control problem with superposition functional and rapidly oscillating coefficients. In Modern Mathematics and Mechanics; Sadovnichiy, V., Zgurovsky, M., Eds.; Springer: Cham, Switzerland, 2019; pp. 199–208.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018.
- Sutton, R.S.; Barto, A.G.; Williams, R.J. Reinforcement learning is direct adaptive optimal control. IEEE Control Syst. 1992, 12, 19–22.
- Recht, B. A tour of reinforcement learning: The view from continuous control. Annu. Rev. Control Robot. Auton. Syst. 2019, 2, 253–279.
- Atkeson, C.G.; Santamaria, J.C. A comparison of direct and model-based reinforcement learning. In Proceedings of the International Conference on Robotics and Automation, Albuquerque, NM, USA, 20–25 April 1997; Volume 4, pp. 3557–3564.
- Deisenroth, M.P. Efficient Reinforcement Learning Using Gaussian Processes; KIT Scientific Publishing: Karlsruhe, Germany, 2010.
- Deisenroth, M.P.; Fox, D.; Rasmussen, C.E. Gaussian processes for data-efficient learning in robotics and control. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 37, 408–423.
- Gal, Y.; McAllister, R.; Rasmussen, C.E. Improving PILCO with Bayesian neural network dynamics models. In Proceedings of the ICML Workshop on Data-Efficient Machine Learning, New York, NY, USA, 24 June 2016; Volume 4, p. 25.
- Janner, M.; Fu, J.; Zhang, M.; Levine, S. When to trust your model: Model-based policy optimization. Adv. Neural Inf. Process. Syst. 2019, 32, 12519–12530.
- Kamthe, S.; Deisenroth, M. Data-efficient reinforcement learning with probabilistic model predictive control. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), Playa Blanca, Lanzarote, Spain, 9–11 April 2018; pp. 1701–1710.
- Chowdhary, G.; Kingravi, H.A.; How, J.P.; Vela, P.A. A Bayesian nonparametric approach to adaptive control using Gaussian processes. In Proceedings of the IEEE Conference on Decision and Control (CDC), Florence, Italy, 10–13 December 2013; pp. 874–879.
- Chua, K.; Calandra, R.; McAllister, R.; Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. Adv. Neural Inf. Process. Syst. 2018, 31, 4754–4765.
- Wang, T.; Bao, X.; Clavera, I.; Hoang, J.; Wen, Y.; Langlois, E.; Zhang, S.; Zhang, G.; Abbeel, P.; Ba, J. Benchmarking model-based reinforcement learning. arXiv 2019, arXiv:1907.02057.
- Murray, R.; Palladino, M. A model for system uncertainty in reinforcement learning. Syst. Control Lett. 2018, 122, 24–31.
- Murray, R.; Palladino, M. Modelling uncertainty in reinforcement learning. In Proceedings of the IEEE Conference on Decision and Control (CDC), Nice, France, 11–13 December 2019; pp. 2436–2441.
- Bettiol, P.; Khalil, N. Necessary optimality conditions for average cost minimization problems. Discrete Contin. Dyn. Syst. B 2019, 24, 2093.
- Palladino, M. Necessary conditions for adverse control problems expressed by relaxed derivatives. Set-Valued Var. Anal. 2016, 24, 659.
- Ross, I.M.; Proulx, R.J.; Karpenko, M.; Gong, Q. Riemann–Stieltjes optimal control problems for uncertain dynamic systems. J. Guid. Control Dyn. 2015, 38, 1251–1263.
- Lohéac, J.; Zuazua, E. From averaged to simultaneous controllability. Ann. Fac. Sci. Toulouse Math. 2016, 25, 785–828.
- Zuazua, E. Averaged control. Automatica 2014, 50, 3077–3087.
- Barabash, O.; Sobchuk, V.; Sobchuk, A.; Musienko, A.; Laptiev, O. Algorithms for synthesis of functionally stable wireless sensor network. Adv. Inf. Syst. 2025, 9, 70–79.
- Doya, K. Reinforcement learning in continuous time and space. Neural Comput. 2000, 12, 219–245.
- Lee, J.Y.; Park, J.B.; Choi, Y.H. Integral reinforcement learning for continuous-time input-affine nonlinear systems with simultaneous invariant explorations. IEEE Trans. Neural Netw. Learn. Syst. 2014, 26, 916–932.
- Lewis, F.L.; Vrabie, D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 2009, 9, 32–50.
- Munos, R. A study of reinforcement learning in the continuous case by the means of viscosity solutions. Mach. Learn. 2000, 40, 265–297.
- Temam, R. Infinite-Dimensional Dynamical Systems in Mechanics and Physics; Springer Science & Business Media: Cham, Switzerland, 2013.
- Villani, C. Optimal Transport: Old and New; Springer: Berlin, Germany, 2009.
- Lions, J.L. Contrôlabilité Exacte, Perturbations et Stabilisation de Systèmes Distribués; Masson: Paris, France, 1988.
- Fleming, W.H.; Soner, H.M. Controlled Markov Processes and Viscosity Solutions, 2nd ed.; Springer: New York, NY, USA, 2006.
- Makarovych, V.; Makarovych, A. Analysis of socio-economic determinants of youth employment using machine learning methods. Acta Acad. Beregsasiensis Econ. 2024, 6, 81–101.
- Glebena, M.I.; Makarovych, A.V. SingleStoreDB connector for Apache Beam. Sci. Bull. Uzhhorod Univ. Ser. Math. Inf. 2024, 44, 66–82.