Abstract
This paper aims to find optimal feedback policies for the tracking control of Markovian jump Boolean control networks (MJBCNs) over a finite horizon. The tracking objective is a predetermined time-varying trajectory of finite length. To minimize the expected total tracking error between the output trajectory of the MJBCN and the reference trajectory, an algorithm is proposed to determine the optimal policy for the system. Furthermore, considering the penalty for control input changes, a new objective function is obtained by taking a weighted sum of the total tracking error and the total variation of the control input. Optimal policies are designed using an algorithm to minimize the expectation of the new objective function. Finally, the methodology is applied to two simplified biological models to demonstrate its effectiveness.
Keywords:
Boolean control networks; Markov switching; optimal output tracking control; dynamic programming; semi-tensor product (STP) MSC:
93C29; 93E99; 94C11
1. Introduction
Boolean networks (BNs) were first put forward by Kauffman as a kind of model for genetic regulatory networks (GRNs) [1]. When simulating a GRN with a BN, each node of the BN represents a gene, whose state is quantified as 1 or 0, and the genes’ mutual regulatory interactions are characterized by logical functions. Moreover, sometimes, there are external drug interventions in GRNs, which can be interpreted as control inputs in BNs. BNs with control input nodes are termed Boolean control networks (BCNs). There was no unified tool for studying BNs until semi-tensor product (STP) was proposed [2], which has greatly facilitated research on many related problems regarding BNs, including stability and stabilization [3,4,5], controllability and observability [6,7,8], state estimation [9,10,11,12], state compression [13], and the output tracking control (OTC) problem [14,15,16,17]. In addition, STP has also been widely adopted in other areas like nonlinear shift registers [18] and fuzzy relational inequalities [19].
As we know, there is usually randomness in gene expression. Consequently, Shmulevich et al. proposed probabilistic Boolean networks (PBNs) to characterize the uncertainty in GRNs [20,21]. A PBN has several realizations, and at each time step one realization is selected according to a probability distribution. Similarly, probabilistic Boolean control networks (PBCNs) are derived from PBNs by incorporating control inputs into the networks. In addition, a Markov chain was shown to emulate the dynamics of a small GRN well [22]. Furthermore, Markovian jump Boolean control networks (MJBCNs) form another category of stochastic extensions to BNs. As stated in [23], PBCNs can be regarded as a special kind of MJBCN. In an MJBCN, mode transitions are governed by a Markov chain. Recently, some results on MJBCNs have been obtained, such as controllability [23] and stabilization [24,25].
The significance of OTC lies in its ability to design the control so that the system output closely follows a desired reference trajectory. This is valuable in various practical applications such as robotic manipulator control [26] and flight control [27]. By solving the OTC problem, systems can operate stably in dynamic environments, reduce errors, improve performance, and lower costs and energy consumption. In fields like disease treatment, it can also help precisely control drug concentrations and treatment processes, enhancing effectiveness and safety. The finite-time OTC of BCNs was first investigated by Li et al. [14,15]. Moreover, the finite-time OTC of PBCNs has been studied in [28]. In addition, the asymptotic OTC of PBCNs has been addressed by Chen et al. [29]. In these studies, the tracking objective is a constant state or a reference system.
As stated in [30], a useful method to improve the efficiency of bioreactors is forcing the states of microalgae to follow a predetermined reference trajectory. Motivated by this, Zhang et al. [16] investigated the optimal (i.e., minimum-error) OTC of BCNs over a finite time horizon, where the tracking objective is a predefined finite-length trajectory. Moreover, in practice, it is difficult to make substantial changes to the control inputs in a short period, or doing so may incur increased costs. For example, in disease treatment, the concentration of a drug in the body typically decreases gradually rather than instantaneously. Thus, a penalty for control input changes was taken into account in [16]. After that, the OTC problems of PBCNs and MJBCNs with respect to a predefined reference trajectory were studied in [31,32] and [33], respectively. However, no feedback policy was provided for all of the initial states in [33]; moreover, a penalty for control input changes was not considered in [31,32,33].
In this article, we study the finite horizon OTC of MJBCNs with respect to a predefined reference trajectory with a finite length.
- The OTC problem is reformulated as an optimal control problem, and then an optimal policy is obtained to minimize the expected total tracking error.
- A new objective function is constructed by performing a weighted sum of the total tracking error and the total variation of the control input. An optimal policy is given to minimize the expected objective function value.
- The optimal feedback policies obtained in this paper apply to all initial states. As shown in the examples, the design of policies is based on the specific weightings given to the two objectives (i.e., reducing tracking errors and decreasing input variations).
2. Preliminaries
The basic symbols are given in Table 1. Since STP is a generalization of ordinary matrix multiplication [2], in the sequel we omit the symbol ⋉ whenever there is no ambiguity.
Table 1.
Notations.
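For readers who wish to reproduce the matrix computations, the STP of an m×n matrix A and a p×q matrix B is commonly defined as A ⋉ B = (A ⊗ I_{t/n})(B ⊗ I_{t/p}), where t = lcm(n, p). The following NumPy sketch is our own illustration (not code from the paper) of this definition:

```python
import numpy as np

def stp(A, B):
    """Semi-tensor product A ⋉ B = (A ⊗ I_{t/n})(B ⊗ I_{t/p}),
    where A is m×n, B is p×q, and t = lcm(n, p)."""
    n = A.shape[1]
    p = B.shape[0]
    t = int(np.lcm(n, p))
    # When n == p, both identity factors are I_1 and this reduces
    # to the ordinary matrix product, as noted in the text.
    return np.kron(A, np.eye(t // n)) @ np.kron(B, np.eye(t // p))
```

For two canonical logical vectors x ∈ Δ_2 and y ∈ Δ_2, `stp(x, y)` coincides with the Kronecker product `np.kron(x, y)`, which is the property used throughout the algebraic state-space representation.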
The MJBCN is presented as follows:
where is the state vector, is the control input vector, is the output vector, and is a Markov switching signal with a transition probability matrix (TPM) , that is, . In addition, all and are logical mappings.
Assumption 1
([35]). and are conditionally independent of for all .
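A sample path of the Markov switching signal can be simulated directly from its TPM. The sketch below is illustrative only and assumes the row convention P[i, j] = Pr(σ(t+1) = j | σ(t) = i); the function name `simulate_switching` is our own:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_switching(P, sigma0, T):
    """Sample a switching signal sigma(0), ..., sigma(T) from a Markov
    chain with row-stochastic TPM P, starting from mode sigma0."""
    sigma = [sigma0]
    for _ in range(T):
        # Draw the next mode from the row of P indexed by the current mode.
        sigma.append(int(rng.choice(len(P), p=P[sigma[-1]])))
    return sigma
```

Such simulated mode sequences are useful for Monte Carlo evaluation of a candidate policy's expected tracking error.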
Let , , and . Introduce instrumental variables and . Define a matrix
where is called the swap matrix [2]. Split matrix into M blocks of the same size: .
Proposition 1.
For any and ,
Proof.
Suppose and . Note that , so
Moreover, note that , so
Therefore, under Assumption 1,
That is, . □
3. Main Results
3.1. Finite Horizon OTC of MJBCNs
A reference trajectory of length is given as follows:
where , .
A policy is a sequence of mappings in the form of , where . There is a for each such that . If policy is determined, we stipulate
Conversely, if the feedback control (5) is given, policy is also determined. represents the set of all policies.
Definition 1.
If the reference output trajectory (4) cannot be exactly tracked, we attempt to find a policy that can minimize the expected total tracking error between the output trajectory of MJBCN (2) and the reference trajectory (4) from to .
Given two output state vectors and , let and . The distance between and (or between and ) is given as follows [16]:
For example, suppose and . We can calculate and . By (6), . Although and (likewise and ) uniquely determine each other, cannot be intuitively observed from and . In fact, the Euclidean distance of and is either 0 or , which cannot reflect the degree of difference between and . The distance formula (6), by contrast, represents the number of differing components of and , and is therefore more in line with our requirements than the Euclidean distance.
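In Boolean coordinates, the distance (6) is simply the number of differing components (a Hamming distance). A minimal sketch, with illustrative function and variable names of our own choosing:

```python
def tracking_distance(y, y_ref):
    """Distance formula (6): count the components in which the Boolean
    output vector y differs from the reference output y_ref."""
    return sum(a != b for a, b in zip(y, y_ref))
```

For instance, `tracking_distance([1, 0, 1], [1, 1, 1])` returns 1, since the vectors differ only in the second component.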
The total tracking error between the output trajectory of MJBCN (2) and the reference trajectory (4) is expressed as
As the output state vectors of system (2) are finite in number, means that MJBCN (2) exactly tracks the reference trajectory (4). We intend to find a policy to minimize for all .
Define weight factor vectors:
Then , and . Consequently,
When the dimensions of the matrices do not match, we default to using STP.
Update the weight factor vectors:
Then we have
Minimizing by a policy can be formulated as addressing an optimization problem as follows:
is called an optimal policy if (11) holds for all .
The sub-policies of are denoted by , . The set of all possible is represented by . Next, define the optimal values of the optimization problem (11) and its sub-problems as follows:
The expectation in (12) is actually the conditional expectation given and . The following lemma, based on dynamic programming, is given to calculate through an iterative process for all . We omit the detailed proof for brevity, as it is similar to Lemma 2 of [36] and Theorem 4.1 of [37].
Lemma 1.
Specifically, given in (13), by Proposition 1,
Furthermore, define the optimal value vectors:
Then, Equation (14) is equivalent to
where represents the -th entry of .
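The backward recursion underlying Lemma 1 can be sketched generically. The code below is our own illustration of finite-horizon dynamic programming over a state–mode pair, not the paper's algorithm itself; the names `step_cost` and `trans` (which returns a list of ((next state, next mode), probability) pairs) are assumptions for the sketch, and the terminal cost is taken to be zero:

```python
import numpy as np

def backward_dp(T, N, M, U, step_cost, trans):
    """Backward DP sketch: V[t, x, s] is the minimal expected cost-to-go
    from state x in mode s at time t; policy[t, x, s] is a minimizing
    input. trans(x, s, u) yields ((x_next, s_next), prob) pairs."""
    V = np.zeros((T + 1, N, M))          # terminal cost assumed zero
    policy = np.zeros((T, N, M), dtype=int)
    for t in range(T - 1, -1, -1):
        for x in range(N):
            for s in range(M):
                costs = []
                for u in range(U):
                    # Expected cost-to-go under input u.
                    exp_next = sum(p * V[t + 1, xn, sn]
                                   for (xn, sn), p in trans(x, s, u))
                    costs.append(step_cost(t, x, s, u) + exp_next)
                policy[t, x, s] = int(np.argmin(costs))
                V[t, x, s] = min(costs)
    return V, policy
```

The per-stage complexity is proportional to the number of (state, mode, input) triples times the branching of `trans`, which matches the flavor of the complexity counts in Remark 1.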
Theorem 1.
Proof.
Remark 1.
Algorithm 1 calculates recursively and elementwise. The calculation of takes a time complexity of , while lines require time complexity, which is negligible in comparison. Therefore, Algorithm 1 takes at most time complexity.
Algorithm 1: Find an optimal solution for (11).
3.2. Finite Horizon OTC of MJBCNs with a Penalty for Control Input Changes
Given two control input vectors , let and . Similar to the distance formula (6), the distance between and (or between and ) is given by
Then, the total variation of the control input of MJBCN (2) within the time period of is expressed as
Define a penalty factor vector:
which satisfies . Then
By performing a weighted sum of (10) and (17), a new objective function denoted by is obtained as follows:
where . We aim to minimize for all . When we are more concerned with the tracking error, we can set to a larger value. Conversely, if we want to reduce the variation of the control input, we can set to a smaller value.
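The weighted objective described above combines the two criteria linearly. As a small illustrative helper (our own naming, with per-step errors and input changes supplied as plain lists):

```python
def weighted_objective(errors, input_changes, alpha):
    """Weighted objective: alpha times the total tracking error plus
    (1 - alpha) times the total variation of the control input."""
    return alpha * sum(errors) + (1 - alpha) * sum(input_changes)
```

With `alpha` close to 1 the tracking-error term dominates; with `alpha` close to 0 the input-variation term dominates, matching the tuning guidance in the text.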
Based on the Kronecker product and STP, we can derive
Introduce another instrumental variable as follows:
Define weight factor vectors:
Then we have
Define two matrices as follows:
Split R and L into M blocks of the same size: and .
Proposition 2.
For any and ,
for any and ,
Proof.
The proof is similar to Proposition 1. □
Next, a policy is in the form of , where and . There is a such that , and for each with , there is a such that . Once a policy is given, a feedback control is determined as follows:
The set of all possible is represented by .
Minimizing the expected value of (20) by a policy is equivalent to solving an optimization problem as follows:
is called an optimal policy if (23) holds for all .
The sub-policies of are denoted by , . Denote by the set of all possible . Define the optimal values of the optimization problem (23) and its sub-problems as follows:
Similar to Lemma 1, the following lemma is given to determine through an iterative process for all .
Lemma 2.
Theorem 2.
The policy obtained by Algorithm 2 can minimize the expectation of the objective function (20) for all initial states.
Proof.
Based on Lemma 2, the proof is similar to Theorem 1. □
Remark 2.
Algorithm 2 calculates recursively and elementwise. The calculation of takes a time complexity of , while lines operate in time. Thus, lines require . The time complexity of the rest is clearly smaller. Therefore, Algorithm 2 takes at most time complexity. Comparing the two algorithms, Algorithm 1 has a lower time complexity. Hence, if the penalty for control input changes is not considered in the OTC problem, we prioritize Algorithm 1 to design an optimal policy for MJBCN (2). When such a penalty is incorporated, however, it becomes necessary to employ Algorithm 2 to develop an optimal policy .
Algorithm 2: Calculate an optimal solution for (23).
Remark 3.
Generally, is given from . In [16], the virtual variable needs to be used. Although time-invariant BCNs were considered there, the optimal finite-horizon OTC problem with a penalty for control input changes was not completely addressed. In this paper, we define in segmented form, which effectively solves this problem.
4. Illustrative Examples
Example 1.
Consider an MJBCN model of the form (1) with 3 internal nodes, 1 input node, 2 output nodes, and 2 realizations [38], where , , , , , , , . The TPM of is assumed to be .
Let , , , and . Then, this system can be converted into the form (2) with , and . The calculation results of , R, and L are omitted.
A reference trajectory is given in Table 2. By (8), we can obtain
Next, by (9), we can calculate
By Algorithm 1, we can successively obtain
and an optimal policy with feedback matrices
Table 2.
Reference trajectory 1.
Next, take the penalty for the control input changes into account. Let . By (16) and (19), we can obtain and
By Algorithm 2, we can successively obtain
and an optimal policy with feedback matrices
For each given parameter α, we can always obtain the optimal value of and an optimal policy by Algorithm 2. However, these optimal values are not directly comparable across different α settings. Therefore, to evaluate the relative merits of different α values, we compare the performance of and under their respective optimal policies generated by varying α. This approach allows us to select a preferable α value.
Let α in the objective function (18) take sequentially, and an optimal policy can be determined by Algorithm 2 for each value of α. Under each corresponding optimal policy, and for each are shown in Figure 1 and Figure 2, respectively, where a horizontal axis value of i actually means , .
Figure 1.
for each under the optimal policy.
Figure 2.
for each under the optimal policy.
As shown in the figures, increasing α results in a smaller tracking error, whereas reducing α leads to diminished variation in the control input. This aligns with the design intent of the objective function (18). In comparison, we observe that when , both the tracking error and variation of control input are effectively maintained at satisfactory levels.
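The α comparison above can be illustrated with a toy sketch (entirely hypothetical numbers, not data from the example): given candidate policies with known expected tracking error E and input variation W, the preferred policy for each α minimizes αE + (1−α)W.

```python
# Hypothetical (error E, variation W) pairs for three candidate policies.
candidates = {"aggressive": (1.0, 4.0), "moderate": (2.0, 2.0), "lazy": (5.0, 0.5)}

def best_policy(alpha):
    """Pick the candidate minimizing the weighted sum alpha*E + (1-alpha)*W."""
    return min(candidates,
               key=lambda k: alpha * candidates[k][0]
                             + (1 - alpha) * candidates[k][1])
```

A large α selects the low-error "aggressive" policy, a small α selects the low-variation "lazy" policy, and intermediate α favors the balanced "moderate" policy, mirroring the trade-off seen in Figures 1 and 2.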
Example 2.
Consider an MJBCN model of the form (1) with 9 internal nodes, 2 input nodes, 3 output nodes, and 2 realizations [39], where
The TPM of is assumed to be . Let , , , and . The transition matrices of this MJBCN are not presented here due to the large dimensionality.
A reference trajectory is given in Table 3. Letting α take in (18) sequentially, we can obtain an optimal policy by Algorithm 2 for each value of α. Under each corresponding optimal policy, and for each are shown in Figure 3 and Figure 4, respectively, where a horizontal axis value of i actually means , . To avoid visual clutter, the figures display only sparsely sampled data points on the horizontal axis.
Table 3.
Reference trajectory 2.
Figure 3.
for each under the optimal policy.
Figure 4.
for each under the optimal policy.
5. Conclusions
This paper studied the minimum error OTC of an MJBCN with respect to a predefined trajectory with a finite length, which was transformed into a dynamic optimization problem in terms of the instrumental variable . An optimal policy was designed using an algorithm to minimize the expected total tracking error.
Next, the penalty for control input changes was taken into account. Through the weighted summation of the total tracking error and the total variation of the control input, a new objective function was constructed. The optimal expected value of the objective function and the optimal policy were determined through the dynamic programming of the instrumental variable . A methodology framework diagram of this paper is provided in Figure 5.
Figure 5.
Methodology framework diagram.
Finally, the main results were applied to two simplified biological models. As shown in the examples, the parameter can be adjusted according to different requirements, and different values of lead to optimal policies with varying emphasis on the tracking error and the variation of the control input.
Author Contributions
Writing—original draft preparation, B.C.; validation, writing—review and editing, Y.X. and A.S. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 62403253 and 12401642), in part by the Natural Science Foundation of Jiangsu Province (Grant Nos. BK20240604 and BK20240606), and in part by the Natural Science Research Start-up Foundation of Recruiting Talents of Nanjing University of Posts and Telecommunications (Grant Nos. NY223195 and NY223198).
Data Availability Statement
All data are included in this paper.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Kauffman, S.A. Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theor. Biol. 1969, 22, 437–467. [Google Scholar] [CrossRef] [PubMed]
- Cheng, D.; Qi, H.; Li, Z. Analysis and Control of Boolean Networks: A Semi-Tensor Product Approach; Springer: London, UK, 2011. [Google Scholar]
- Liu, W.; Fu, S.; Zhao, J. Set stability and set stabilization of Boolean control networks avoiding undesirable set. Mathematics 2021, 9, 2864. [Google Scholar] [CrossRef]
- Sun, Q.; Li, H. Robust stabilization of impulsive Boolean control networks with function perturbation. Mathematics 2022, 10, 4029. [Google Scholar] [CrossRef]
- Deng, L.; Cao, X.; Zhao, J. One-bit function perturbation impact on robust set stability of Boolean networks with disturbances. Mathematics 2024, 12, 2258. [Google Scholar] [CrossRef]
- Tang, T.; Ding, X.; Lu, J.; Liu, Y. Improved criteria for controllability of Markovian jump Boolean control networks with time-varying state delays. IEEE Trans. Autom. Control 2024, 69, 7028–7035. [Google Scholar] [CrossRef]
- Li, Y.; Feng, J.-E.; Wang, B. Observability of singular Boolean control networks with state delays. J. Franklin Inst. 2022, 359, 331–351. [Google Scholar] [CrossRef]
- Li, Y.; Li, H. Relation coarsest partition method to observability of probabilistic Boolean networks. Inf. Sci. 2024, 681, 121221. [Google Scholar] [CrossRef]
- Chen, H.; Wang, Z.; Liang, J.; Li, M. State estimation for stochastic time-varying Boolean networks. IEEE Trans. Autom. Control 2020, 65, 5480–5487. [Google Scholar] [CrossRef]
- Chen, H.; Wang, Z.; Shen, B.; Liang, J. Model evaluation of the stochastic Boolean control networks. IEEE Trans. Autom. Control 2022, 67, 4146–4153. [Google Scholar] [CrossRef]
- Li, Y.; Li, H.; Xiao, G. Luenberger-like observer design and optimal state estimation of logical control networks with stochastic disturbances. IEEE Trans. Autom. Control 2023, 68, 8193–8200. [Google Scholar] [CrossRef]
- Li, B.; Pan, Q.; Zhong, J.; Xu, W. Long-run behavior estimation of temporal Boolean networks with multiple data losses. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 15004–15011. [Google Scholar] [CrossRef]
- Li, B.; Lu, J.; Xu, W.; Zhong, J. Lossless state compression of Boolean control networks. IEEE Trans. Autom. Control 2024, 69, 4166–4173. [Google Scholar] [CrossRef]
- Li, H.; Wang, Y.; Xie, L. Output tracking control of Boolean control networks via state feedback: Constant reference signal case. Automatica 2015, 59, 54–59. [Google Scholar] [CrossRef]
- Li, H.; Xie, L.; Wang, Y. Output regulation of Boolean control networks. IEEE Trans. Autom. Control 2017, 62, 2993–2998. [Google Scholar] [CrossRef]
- Zhang, Z.; Leifeld, T.; Zhang, P. Finite horizon tracking control of Boolean control networks. IEEE Trans. Autom. Control 2018, 63, 1798–1805. [Google Scholar] [CrossRef]
- Zhao, Y.; Zhao, X.; Fu, S.; Xia, J. Robust output tracking of Boolean control networks over finite time. Mathematics 2022, 10, 4078. [Google Scholar] [CrossRef]
- Gao, Z.; Feng, J.-E. Research status of nonlinear feedback shift register based on semi-tensor product. Mathematics 2022, 10, 3538. [Google Scholar] [CrossRef]
- Wang, S.; Li, H. Resolution of fuzzy relational inequalities with Boolean semi-tensor product composition. Mathematics 2021, 9, 937. [Google Scholar] [CrossRef]
- Shmulevich, I.; Dougherty, E.R.; Zhang, W. From Boolean to probabilistic Boolean networks as models of genetic regulatory networks. Proc. IEEE 2002, 90, 1778–1792. [Google Scholar] [CrossRef]
- Shmulevich, I.; Dougherty, E.R.; Kim, S.; Zhang, W. Probabilistic Boolean networks: A rule-based uncertainty model for gene regulatory networks. Bioinformatics 2002, 18, 261–274. [Google Scholar] [CrossRef]
- Kim, S.; Li, H.; Dougherty, E.R.; Cao, N.; Chen, Y.; Bittner, M.; Suh, E.B. Can Markov chain models mimic biological regulation? J. Biol. Syst. 2002, 10, 337–357. [Google Scholar] [CrossRef]
- Meng, M.; Xiao, G.; Zhai, C.; Li, G. Controllability of Markovian jump Boolean control networks. Automatica 2019, 106, 70–76. [Google Scholar] [CrossRef]
- Chen, B.; Cao, J.; Lu, G.; Rutkowski, L. Stabilization of Markovian jump Boolean control networks via sampled-data control. IEEE Trans. Cybern. 2022, 52, 10290–10301. [Google Scholar] [CrossRef]
- Chen, B.; Cao, J.; Lu, G.; Rutkowski, L. Stabilization of Markovian jump Boolean control networks via event-triggered control. IEEE Trans. Autom. Control 2023, 68, 1215–1222. [Google Scholar] [CrossRef]
- Melhem, K.; Wang, W. Global output tracking control of flexible joint robots via factorization of the manipulator mass matrix. IEEE Trans. Robot. 2009, 25, 428–437. [Google Scholar] [CrossRef]
- Al-Hiddabi, S.A.; McClamroch, N.H. Tracking and maneuver regulation control for nonlinear nonminimum phase systems: Application to flight control. IEEE Trans. Control Syst. Technol. 2002, 10, 780–792. [Google Scholar] [CrossRef]
- Li, H.; Wang, Y.; Guo, P. State feedback based output tracking control of probabilistic Boolean networks. Inf. Sci. 2016, 349–350, 1–11. [Google Scholar] [CrossRef]
- Chen, B.; Cao, J.; Luo, Y.; Rutkowski, L. Asymptotic output tracking of probabilistic Boolean control networks. IEEE Trans. Circuits Syst. I. Reg. Papers 2020, 67, 2780–2790. [Google Scholar] [CrossRef]
- Abdollahi, J.; Dubljevic, S. Lipid production optimization and optimal control of heterotrophic microalgae fed-batch bioreactor. Chem. Eng. Sci. 2012, 84, 619–627. [Google Scholar] [CrossRef]
- Zhang, Q.; Feng, J.-E.; Jiao, T. Finite horizon tracking control of probabilistic Boolean control networks. J. Franklin Inst. 2021, 358, 9909–9928. [Google Scholar] [CrossRef]
- Zhang, A.; Li, L.; Li, Y.; Lu, J. Finite-time output tracking of probabilistic Boolean control networks. Appl. Math. Comput. 2021, 411, 126413. [Google Scholar] [CrossRef]
- Li, Y.; Li, H.; Xiao, G. Optimal control for reachability of Markov jump switching Boolean control networks subject to output trackability. Int. J. Control 2025, 98, 200–207. [Google Scholar] [CrossRef]
- Khatri, C.G.; Rao, C.R. Solutions to some functional equations and their applications to characterization of probability distributions. Sankhyā Indian J. Stat. A 1968, 30, 167–180. [Google Scholar]
- Li, C.; Zhang, X.; Feng, J.-E.; Cheng, D. Transition analysis of stochastic logical control networks. IEEE Trans. Autom. Control 2024, 69, 1226–1233. [Google Scholar] [CrossRef]
- Liu, Z.; Wang, Y.; Li, H. Two kinds of optimal controls for probabilistic mix-valued logical dynamic networks. Sci. China Inf. Sci. 2014, 57, 1–10. [Google Scholar] [CrossRef]
- Wu, Y.; Shen, T. An algebraic expression of finite horizon optimal control algorithm for stochastic logical dynamical systems. Syst. Control Lett. 2015, 82, 108–114. [Google Scholar] [CrossRef]
- Meng, M.; Liu, L.; Feng, G. Stability and l1 gain analysis of Boolean networks with Markovian jump parameters. IEEE Trans. Autom. Control 2017, 62, 4222–4228. [Google Scholar] [CrossRef]
- Acernese, A.; Yerudkar, A.; Glielmo, L.; Del Vecchio, C. Reinforcement learning approach to feedback stabilization problem of probabilistic Boolean control networks. IEEE Control Syst. Lett. 2021, 5, 337–342. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).