1. Introduction
As a result of swift advancements in networking, communication technology, and computing, the management of these systems closely intertwines the cyber and physical realms. To ensure stability, dependable signals must be transmitted among the components over a shared communication network. As demonstrated in [1,2], improving performance and reliability is challenging in the presence of resource constraints. Because of these constraints, measurement errors are inevitable when predicting actual systems and making decisions; such errors are often handled in probabilistic terms to achieve better results [3,4].
Linear–quadratic–Gaussian (LQG) control, a classical and widely used optimal control method in modern control theory, primarily addresses the optimal control of linear stochastic systems. It achieves optimal feedback control laws by solving a Riccati equation, integrating state feedback with Kalman filtering. In practice, LQG control has been successfully applied in fields such as aerospace, robotic control, and power systems. However, as modern industries transition toward intelligence and networking, the dependence of control systems on networks has increased significantly, and the impact of communication constraints on system performance can no longer be overlooked [5,6,7]. Taking smart grids as an example, limited communication bandwidth restricts the amount of real-time data transmitted between power nodes. When grid loads change suddenly, nodes may fail to exchange power adjustment information in time, exacerbating voltage fluctuations and potentially triggering local grid failures. Aghaee et al. focused on the impact of communication delays and data packet loss on distributed secondary control for microgrids [8]. The study in [9] investigates the safety of autonomously controlled surgical robots, focusing on the modeling and LQG-based control of nonlinear tissue compression and heating. The authors in [10] characterize the trade-off between control cost and data transmission rate in infinite-horizon LQG systems. Furthermore, the work in [11] examines the optimal tracking performance of single-input single-output networked control systems under restricted communication channels. In a similar vein, this paper addresses the LQG control problem under communication limitations.
Prior research efforts [12,13,14] have examined the energy transmission problem in systems incorporating energy harvesters, which play a crucial role in converting various environmental energy sources into electricity. In [15], the authors demonstrated that a distributed resource allocation game algorithm significantly enhances energy efficiency and reduces network interference, thereby ensuring successful data transmission. Additionally, in [16], the authors investigated energy transmission for a sensor that harvests energy for remote state estimation, using a continuous-time methodology together with perturbation analysis. Building on this foundation, they demonstrated the existence of an optimal deterministic and stationary power transmission allocation policy [17]. It is evident that integrating energy harvesters substantially improves system performance; consequently, we assume that the energy selector is capable of efficiently harvesting energy.
Generally speaking, control strategies and energy selection strategies have long needed to be co-designed. For example, researchers demonstrated that the optimal controller does not exhibit the separation principle [2]; conversely, they also provided the essential criteria under which the controller does satisfy a separation principle [5]. Based on the separation principle proposed in [5], a novel framework is introduced in [18] that integrates a controller with a quantization selector, enabling the dynamic determination of the optimal quantization level from a specified set of levels. Nevertheless, the optimal structure of the controller and the energy selector has not been sufficiently investigated. This paper adopts a similar approach and investigates the separation structure between the controller and the energy selector. In contrast to [18], this paper considers an unreliable communication channel, which may lead to packet loss. In [19], the authors addressed energy management for a controller that transmits information to a sensor over channels with unknown packet loss. Unlike [19], this paper examines a scenario where the plant transmits information to the filter through an unknown packet-dropping link. Similar to the channel model in [17], the energy used for transmission affects the packet loss probability. Beyond scenarios with perfect receipt acknowledgments [17,20], this study also considers a feedback channel that itself acts as an imperfect packet-dropping link between the filter and the plant.
This study examines a joint control and energy scheduling problem under a faulty feedback channel. During each time interval, the energy selector chooses the power allocated for data transmission, relying on imperfect acknowledgments and the currently available energy supply. Simultaneously, the controller produces outputs to maintain an ideal balance between control effectiveness and cost. The optimal control problem, assuming the state is estimated optimally, is characterized by a Riccati equation that incorporates packet loss. The energy selection problem is addressed by reinterpreting it as a Markov decision process (MDP) with accurate acknowledgments. In summary, this study makes three main contributions:
(1) The LQG optimal control problem under the optimal estimated state with given energy levels is investigated by considering imperfect acknowledgments and energy constraints.
(2) Accounting for packet loss information and transmission cost, a novel optimal controller structure under the optimal estimated state is derived using backward induction.
(3) The optimal energy selection strategy that ensures filtering performance and the suboptimal controller gain are co-designed and can be computed offline and independently.
Notations: The symbols $X^\top$ and $\mathrm{tr}(X)$ denote the transpose and the trace of a real-valued matrix $X$, respectively. The inverse of the matrix $X$ is expressed as $X^{-1}$. The sets of non-negative integers and positive integers are represented by $\mathbb{N}$ and $\mathbb{N}_+$, respectively. The $n$-dimensional Euclidean space is referred to as $\mathbb{R}^n$. Additionally, $\Pr(X)$ signifies the probability of $X$, while $\mathbb{E}[X \mid Y]$ indicates the conditional expectation of $X$ given $Y$.
As shown in Figure 1, the plant, on one hand, acquires energy from the environment via its integrated energy-harvesting subsystem (comprising an energy collector, an energy storage battery, and an energy selector). The harvested energy is stored in the battery, and the energy selector then chooses the optimal transmission energy from $M$ energy levels to power the transmission of the state value. On the other hand, the plant transmits its state to the filter through a wireless channel subject to packet loss. The filter performs state estimation based on the received state value and sends the estimate to the controller. The controller, relying on its own information set, calculates the optimal control signal in accordance with the separation principle and feeds it back to the plant to stabilize the state. Meanwhile, the filter sends an imperfect acknowledgment signal to the plant, which uses this feedback to adjust the subsequent energy-selection strategy. This ultimately forms a complete closed loop of “state → energy optimization → estimation → control → feedback regulation” and enables stable operation of the system under energy constraints and imperfect feedback. In the following sections, we detail the submodules represented in the plant diagram.
3. Filter and Imperfect Feedback
The filter receives the state value transmitted by the plant, performs state estimation based on historical measurements and control information, and transmits the estimated result to the controller; at the same time, it sends an imperfect feedback signal to the plant.
3.1. Imperfect Feedback
A signal acknowledging transmission is dispatched from the filter to the plant at every time step. This study considers a feedback channel that may introduce errors: the true packet loss sequence remains unknown to the plant, which instead receives an imperfect acknowledgment sequence from the filter. Following [4], the erroneous feedback channel is modeled as follows. With a specified dropout probability, all feedback signals are completely lost; otherwise, a transmission error occurs with a given error probability, in which case the acknowledgment bit received by the plant is flipped relative to the true packet loss indicator. The conditional probability matrix of the feedback channel is formulated accordingly. The system receives perfect packet reception acknowledgments when both the dropout probability and the error probability are set to 0.
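To make the channel model concrete, here is a minimal Python sketch of one acknowledgment transmission; the names imperfect_ack, p_drop, and p_err are illustrative placeholders rather than the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)

def imperfect_ack(gamma, p_drop, p_err):
    """Simulate the imperfect feedback channel described above.

    gamma  : true packet-arrival indicator at the filter (0 = lost, 1 = received)
    p_drop : probability that the acknowledgment itself is lost
    p_err  : probability that a delivered acknowledgment bit is flipped
    Returns the acknowledgment observed at the plant; None models a lost ACK.
    """
    if rng.random() < p_drop:      # feedback signal completely lost
        return None
    if rng.random() < p_err:       # transmission error flips the bit
        return 1 - gamma
    return gamma                   # acknowledgment delivered correctly
```

Setting p_drop and p_err to 0 recovers the perfect-acknowledgment case discussed above.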
3.2. State Estimation
Let us define the following histories: the state history, the history of state values that the filter can receive, the control history, the history of feedback signals, and the history of transmission selections.
The data accessible to the controller at time $t$ is represented by an information set with a given initial state, where the information set is generated as a $\sigma$-algebra of the above histories. Based on the definition in Equation (9), an admissible control strategy can be interpreted as a function that maps this information set to a control action; we denote the collection of such strategies accordingly. In contrast to the feedback-based control methods described in [2], a different quantity is transmitted at time $t$; it is important to note that the quantity used in [2] can be easily calculated from the transmitted value and the remaining available information.
Now, we introduce notation for the one-step prediction of the state, together with its filtered version; the state estimate can then be expressed accordingly.
Using (1) and the fact that the estimate is measurable with respect to the available information, we obtain the prediction recursion. Let us introduce the estimation error as the difference between the state and its estimate; from this, the error recursion can be derived. The estimation error of the state is influenced by the energy selection sequence only through the packet arrival variables; however, it is independent of the control strategy.
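As a concrete reading of this recursion, the sketch below updates the filter's estimate under packet loss, assuming, per the system description, that a successful transmission delivers the exact state value; all names are illustrative.

```python
import numpy as np

def filter_update(x_hat_prev, u_prev, received, x_received, A, B):
    """One step of the packet-loss filter sketched above.

    On a successful transmission the filter resets its estimate to the
    received state value; on a dropout it propagates the model forward.
    """
    if received:
        return x_received                  # exact state received
    return A @ x_hat_prev + B @ u_prev     # model-based prediction
```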
4. Suboptimal Control and Energy Selection Strategy
The controller receives the state estimate transmitted by the filter and, based on its own information set, outputs the optimal control signal to ensure the stability of the plant state and to minimize the total cost. The energy transmission selector at time $t$ has access to its own information set; the strategies for selecting the transmission energy can likewise be considered as mappings from this information set to the available energy levels, and are denoted accordingly. Thus, the decision-making process during a single time step is illustrated in Figure 2.
The controller and the energy transmission selector cooperatively minimize a finite-horizon quadratic cost criterion, expressed as follows, where the weighting matrices penalize state deviation and control effort, the energy cost term penalizes transmission energy, and the control and energy selection policies represent the entire sequences of decisions over the horizon.
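For illustration, the following sketch evaluates a cost of this general shape; the linear transmission-energy term with weight lam is an assumed stand-in for the Λ-driven term in (14).

```python
import numpy as np

def finite_horizon_cost(xs, us, energies, Q, R, lam, QT):
    """Finite-horizon quadratic cost of the general shape in (14):
    state penalty x'Qx, control penalty u'Ru, a linear energy term
    lam * e_t (an illustrative stand-in for the Λ-driven term),
    and a terminal penalty x_T' QT x_T.
    """
    J = 0.0
    for x, u, e in zip(xs[:-1], us, energies):
        J += x @ Q @ x + u @ R @ u + lam * e
    return J + xs[-1] @ QT @ xs[-1]
```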
Remark 1. The overall cost function to be minimized is a weighted combination of control efficiency and transmission cost, directly reflecting the balance between them. The quadratic terms penalize state deviation from equilibrium and control input consumption, so smaller values mean better control stability. The energy term, driven by Λ, penalizes high-cost energy selections, so smaller values mean lower energy expenditure. To avoid the complexity of joint optimization, we use the separation principle to decompose the problem into two decoupled but mutually constrained subproblems, ensuring that neither control efficiency nor energy selection cost is sacrificed excessively.
It is essential to identify the optimal control and energy selection mappings that minimize the cost function (14) across all permissible strategies:
4.1. Optimal Control Under the Optimal Estimated State with the Separation Principle
When packet loss occurs during transmission, the controller cannot receive the transmitted value, and the resulting estimate is given by the following function. It is clear that its values belong to a countably infinite collection.
We may introduce a random variable to represent the time elapsed since the most recent successful transmission before time $t$ (Equation (17)).
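This counter admits a one-line recursion, sketched below with the illustrative name update_loss_duration.

```python
def update_loss_duration(tau_prev, received):
    """Holding-time counter: resets to 0 on a successful transmission
    and increases by one for every consecutive dropout.
    """
    return 0 if received else tau_prev + 1
```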
The analysis above suggests a structural separation between the controller and the transmission selector. In the following, we formally demonstrate that a separation principle emerges for this problem. Associated with the cost function (14), let us define the value function as follows. Expression (18) is then rewritten using the dynamic programming principle:
If a control action and an energy selection minimize the right-hand side of (19), then they are the optimal decisions at that stage. Additionally, from (18), we can derive the following. Here, the expectation in Equation (20) pertains to the random packet loss and noise variables. For conciseness in the upcoming analysis, we introduce the combined information set of the controller and the energy selector. Note that this combined information set encompasses the realization of the state. The subsequent theorem describes the optimal control policy at every time step.
Theorem 1. Given the information set at time k provided to the controller, the optimal control policy, which operates under the optimal estimated state to minimize the right-hand side of Equation (19), can be expressed in the following manner. This holds at every time step. Proof. To ensure conciseness, we use abbreviated notation in what follows. The proof of this theorem relies on the dynamic programming principle. The central concept is to establish that the value function linked to the optimal control problem, assuming the best estimate of the state, is represented as follows:
where the first term signifies the realization of the state, and the remaining quantity is defined similarly to that in (23b); this holds for all time steps.
The matrix and the scalar appearing above are defined by the following equations:
Neither the matrix nor the scalar relies on previous or forthcoming control or energy selection decisions; thus, both can be calculated offline. Additionally, from (12), it can be deduced that the estimation error is independent of the earlier control history, being solely determined by the packet arrival process. Consequently, it is not influenced by the control action. Equation (25) can be reformulated as follows.
Based on the definition of the value function presented in (18), we can express it as follows:
Following this, we need to confirm that the value function conforms to the structure outlined in (24):
Substituting Equation (1) and performing a series of simplifications results in the following expression:
In Formula (30), the indicated term is the sole component influenced by the control input. Therefore, the conditional expectation of the state serves as its minimum mean squared estimate. Then,
After substituting the optimal control into (29), we derive
Consequently, applying the preceding definitions leads us to the conclusion that
Therefore, the expression is indeed of the form described by Equation (24). To confirm that Equation (24) also holds at time $k$, we apply backward induction, presuming that it is true at time $k+1$. Toward that end, we obtain
Using Equations (1), (23a) and (23b), one can obtain
The optimal control, derived under the best estimated state to minimize (35), is expressed as follows:
After substituting the optimal control under the optimal estimated state from (36) into (35), we can obtain
Consequently, the value function can be expressed in the form of (24), and the optimal control based on the best estimated state at each time step is provided in (36). □
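The offline computation in Theorem 1 amounts to a backward Riccati-type recursion. The sketch below shows the standard finite-horizon LQR backward pass for reference; the recursion (23a)–(23c) in the paper additionally carries packet-loss-dependent terms, so this is a simplified illustration rather than the exact equations.

```python
import numpy as np

def backward_lqr_gains(A, B, Q, R, QT, T):
    """Standard finite-horizon backward Riccati recursion (illustrative).
    The recursion in (23a)-(23c) augments this with packet-loss terms.
    Returns the list of feedback gains K_0, ..., K_{T-1}.
    """
    P = QT                                        # terminal condition
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)             # Riccati update
        gains.append(K)
    return gains[::-1]                            # reorder as K_0..K_{T-1}
```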
Remark 2. The optimal feedback controller for the separated control problem is given by Theorem 1 in [5]; it indicates that a separation principle exists if the policy satisfies the structure of Equation (4) in [5]. In this paper, Equation (23a) has exactly that structure, which shows that the optimal control strategy proposed here satisfies the separation principle. Remark 3. Designing a controller based on the filter requires considering filtering performance and control performance simultaneously. One typical method is to prioritize filtering performance before ensuring control performance. For example, the separation structure between the controller and the quantizer is studied in [18] to ensure filtering performance and control performance individually. Through a similar analytical approach, we find that the state estimation error at the filter depends on the energy selection policy only through the packet arrival variables and not on the control. Likewise, we obtain a separation principle between the energy selector and the controller. Note that, in contrast to [18], this paper takes delay into account: in the presence of delay, the information available at the controller is affected, since some measurements arrive late, and hence state estimation is affected. Theorem 1 incorporates this delay and provides the optimal controller structure for this scenario.
4.2. Optimal Energy Selection to Ensure Filtering Performance and MDP
Owing to the energy constraint, packet dropouts may occur during the plant's information transmission. Consequently, the system determines the most efficient transmission energy to reduce both the error covariance at the filter and the energy selection cost. The optimization challenge can be reformulated as follows:
This subsection aims to identify the optimal energy selection strategy that guarantees filtering efficacy while minimizing the above cost function. Owing to the imperfect feedback communication channel, the system cannot verify the receipt of packets by the filter; thus, at time t, the system acquires only imperfect state information through the acknowledgment sequence. This optimization problem can be interpreted as a Markov decision process (MDP) with flawed state information. Furthermore, an MDP with imperfect acknowledgments can be transformed into an MDP with accurate acknowledgments by utilizing the information-state framework.
Let all observations obtained from the filter up to time $t$ be collected into an observation history, with a given initial observation. Next, we define the information state as the conditional probability of the estimation error covariance given this observation history and the transmission decisions. The following lemma, taken from [4], illustrates how the information state can be determined recursively from its current value together with the new acknowledgment and energy selection.
Lemma 1 ([4]). The dynamics of the information-state function are described as follows, with the initial condition given by a Dirac delta function δ centered at the initial covariance. In light of the information state, the estimation error covariances at the filter satisfy the following iteration expression:
The condition of having perfect state information is characterized by an information state concentrated at a single covariance value. Define a binary random variable akin to the packet arrival indicator. For a given covariance $P$, we define the operator of the random Riccati equation. Let the collection of all non-negative definite matrices be the underlying space, and denote the space of all probability density functions defined on it by Ψ. Building on the recursion of the information state, the corresponding operators are defined for each energy level and acknowledgment value. It is important to highlight that when both the feedback dropout probability and the error probability are set to zero, the optimization problem reduces to a stochastic control problem with perfect state information.
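A minimal sketch of such a random Riccati operator follows, under the assumption, consistent with the plant transmitting its exact state, that a successful reception resets the error covariance to zero; the names are illustrative.

```python
import numpy as np

def random_riccati(P, received, A, W):
    """Random Riccati operator (illustrative). On a dropout the error
    covariance is propagated open loop, P -> A P A' + W; on a successful
    reception of the exact state it resets to zero. The reset-to-zero
    behaviour is an assumption consistent with the plant transmitting
    its state directly.
    """
    if received:
        return np.zeros_like(P)
    return A @ P @ A.T + W
```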
We formulate a Markov decision process (MDP) to address the stochastic control problem. At time $t$, the MDP state is the information state, which resides within the corresponding state space. The action set consists of the transmission energy levels, with the collection of permissible actions constrained by the current state. The reward for a single stage is given by
The optimal energy selection policy that guarantees effective filtering performance is calculated offline using the Bellman dynamic programming equations presented below.
Theorem 2. Given the initial condition, the finite-time horizon minimization problem, which accounts for imperfect acknowledgments, satisfies the following Bellman optimality equation, with a termination condition in which all available energy is used for transmission at the final time T. Accordingly, the optimal selection policy is obtained by minimizing the right-hand side of this equation. Proof. For this proof, please consult the corresponding theorem in reference [21]. □
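Once the information-state space is discretized (see Remark 4 below), the Bellman recursion of Theorem 2 reduces to finite-state value iteration. The following sketch uses a generic transition tensor and stage cost as stand-ins for (45) and (46).

```python
import numpy as np

def value_iteration(P_trans, cost, T):
    """Finite-horizon value iteration for a finite MDP (illustrative).

    P_trans : (n_actions, n_states, n_states) transition probabilities,
              obtained here by discretizing the information-state space
    cost    : (n_states, n_actions) one-stage costs standing in for (45)
    Returns the greedy policy for each stage, computed backward in time.
    """
    n_actions, n_states, _ = P_trans.shape
    V = np.zeros(n_states)                 # terminal value
    policies = []
    for _ in range(T):
        # Q[s, a] = c(s, a) + sum_{s'} P(s' | s, a) V(s')
        Q = cost + (P_trans @ V).T
        policies.append(Q.argmin(axis=1))  # greedy action per state
        V = Q.min(axis=1)
    return policies[::-1], V
```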
To facilitate the calculations, we model the energy harvesting sequence as a finite-state Markov chain, in which a transition matrix represents the state transition probabilities of the harvesting process.
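A finite-state Markov harvesting process can be simulated directly from its transition matrix; the two-level matrix below is a placeholder, not the paper's calibration.

```python
import numpy as np

rng = np.random.default_rng(1)

def next_harvest(level_idx, T_matrix):
    """Sample the next energy-harvesting level from a finite-state Markov
    chain with row-stochastic transition matrix T_matrix.
    """
    return rng.choice(len(T_matrix), p=T_matrix[level_idx])

# Example: two harvesting levels with sticky transitions.
T_matrix = np.array([[0.9, 0.1],
                     [0.2, 0.8]])
level = next_harvest(0, T_matrix)
```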
Remark 4. The energy selection problem with imperfect state information is reduced to one with perfect state information using the notion of the information state [21]. The information state is the entire probability density function, not just its value at any particular point. We note that discretized versions of the Bellman equations (46), which in particular include the discretization of the space of probability density functions Ψ, are used for the numerical computation of a suboptimal solution to the energy selection problem. As the degree of discretization increases, the suboptimal solution converges to the optimal solution [22]. The Bellman optimality equation (46) can be solved by applying the value iteration algorithm [23].
4.3. Algorithm
Algorithm 1 summarizes the overall procedure.

Algorithm 1: Joint Optimization of Control and Energy Selection for LQG Systems
Require: time horizon T; system matrices; noise covariance; state penalty Q; control cost R; energy cost vector; M energy levels; battery capacity C; energy harvesting transition matrix; harvesting set; noise spectral density S; bandwidth K; feedback error probabilities.
Initialize: system initial state and initial battery level; initial estimation error covariance (a Dirac delta distribution); initial information state; energy selection decision variable.
Offline calculation, controller and energy policy (backward induction):
1: for t = T down to 0 do
2:   if t = T then initialize the terminal matrix (Equation (23c)) end if
3:   Calculate the Riccati matrix (Equation (23b))
4:   Compute and store the control gain (Equation (23a))
5:   while the MDP value function has not converged do
6:     Predict the information state via Lemma 1
7:     Calculate the one-stage reward (Equation (45)) and update the value function (Equation (46))
8:   end while
9:   Determine and store the optimal energy policy (Equation (48))
10: end for
Online simulation, system operation (forward propagation):
1: for t = 0 to T do
2:   Select the transmission energy, subject to the battery feasibility constraint
3:   Compute the packet loss probability and generate the binary loss indicator
4:   Generate the measurement (Equation (5))
5:   Calculate the continuous loss duration (Equation (17)) and the filtered state
6:   Compute the optimal control (Equation (22))
7:   Update the state (Equation (1)) and the battery (Equation (6))
8:   Generate the next harvesting energy
9:   Record the performance indicators: estimation error covariance, transmission cost, and control cost
10: end for
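As a companion to Algorithm 1, the following end-to-end Python sketch wires a simplified offline backward pass to the online forward phase. The scalar plant, the exponential loss model, and the constant energy policy are illustrative assumptions rather than the paper's models.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative scalar plant x_{t+1} = a x_t + b u_t + w_t; all parameters
# are placeholders, not values from the paper.
a, b, q, r, w_std, T = 1.1, 1.0, 1.0, 0.5, 0.1, 50

# Offline backward pass (simplified Riccati recursion, cf. Theorem 1).
P, gains = q, []
for _ in range(T):
    K = (b * P * a) / (r + b * P * b)
    P = q + a * P * (a - b * K)
    gains.append(K)
gains.reverse()

# Online forward pass (cf. the online phase of Algorithm 1).
x, x_hat, u_prev, tau = 1.0, 1.0, 0.0, 0
for t in range(T):
    e_level = 1.0                          # stand-in for the MDP energy policy
    p_loss = np.exp(-e_level)              # loss prob. decreasing in energy (assumption)
    received = rng.random() > p_loss
    x_hat = x if received else a * x_hat + b * u_prev   # packet-loss filter
    tau = 0 if received else tau + 1       # continuous loss duration
    u_prev = -gains[t] * x_hat             # certainty-equivalent control
    x = a * x + b * u_prev + w_std * rng.standard_normal()
```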