Article

Towards Tensor Representation of Controlled Coupled Markov Chains

Daniel McInnes, Boris Miller, Gregory Miller and Sergei Schreider

1 School of Mathematics, Monash University, Wellington Road, Clayton VIC 3800, Australia
2 Institute for Information Transmission Problems RAS, 19/1 Bolshoy Karetny Per., 127051 Moscow, Russia
3 Institute of Informatics Problems of Federal Research Center “Computer Science and Control” RAS, 44/2 Vavilova Str., 119333 Moscow, Russia
4 Department of Management and Information Systems, Rutgers Business School, Rutgers, The State University of New Jersey, New Brunswick, NJ 07102, USA
* Author to whom correspondence should be addressed.
Mathematics 2020, 8(10), 1712; https://doi.org/10.3390/math8101712
Submission received: 29 July 2020 / Revised: 25 September 2020 / Accepted: 27 September 2020 / Published: 5 October 2020

Abstract

For a controlled system of coupled Markov chains, which share common control parameters, a tensor description is proposed. A control optimality condition in the form of a dynamic programming equation is derived in tensor form. This condition can be reduced to a system of coupled ordinary differential equations and admits an effective numerical solution. As an application example, the problem of the optimal control for a system of water reservoirs with phase and balance constraints is considered.

1. Introduction

The analysis and optimization of systems of coupled Markov Chains (MC) appear in various applied areas such as environmental management [1,2], control of dams [3,4,5], control of data transmission systems for the avoidance of congestion [6,7,8], and many others. In general, the optimal control of MC with constraints and various criteria leads to the solution of a system of ordinary differential equations. However, the coupled MC produces serious difficulties due to a dramatic increase in the number of states, which makes the numerical solution of the optimization problem on standard computers infeasible in a reasonable time [9]. Moreover, the presence of constraints very often leads to rather complicated and cumbersome mathematical problems.
The evolution of a single controlled MC may be effectively represented via a vector-valued function satisfying a stochastic differential equation with a control-dependent generator [10]. If a set of MCs shares common control parameters, they can no longer be analyzed separately, since the system becomes coupled. Approaching the coupled system as a whole leads to the need to consider a global system state, which incorporates the states of all individual MCs involved. Joining together the states of all MCs as a set of vectors (perhaps of different dimensions) leads to the description of their evolution in tensor form. The next step of the control synthesis for this coupled system is the derivation of the optimality conditions in the form of a special dynamic programming equation (DPE). The demand for accuracy leads, however, to the extension of the MCs’ state space and, by that, to an increase in the dimensionality of the DPE. At the same time, the numerical solution of the DPE allows the parallelization of the most time-consuming operations, and this is one of the advantages of the tensor representation. Here we give an algorithm for the DPE derivation, which is the principal step towards its numerical solution. It should be noted that this paper follows the previously published works [6,9,11] and provides a proper mathematical justification for the results presented there.
As an example, we consider the management of coupled dams under non-stationary seasonally changing random inflows/outflows. The presentation of a dam’s state as a continuous-time MC permits one to take into account the random character of the incoming and outgoing water flow due to rain, evaporation, and, of course, customer demands. However, sometimes it is necessary to organize the interflow between the different parts of a system, such as controlled flow from one dam to another or controlled release, to avoid overflow. In all these cases, it is necessary to consider the system as a whole, that is, the global state of the dam system. The current water level of each dam is described by the state of a continuous-time MC, hence the state of the whole system of the coupled dams is represented in tensor form. The connection of MCs is a result of the controlled flow between the dams. The control aims to maintain the required water levels in given weather conditions, and at the same time, to satisfy the customer demands. The general approach is based on the solution of DPE in tensor form. This equation may be reduced to a system of ordinary differential equations. We suggest here the procedure for the generation of this system and also the approach to the minimization in its right-hand side (RHS), which may be realized for each state of the coupled MC independently.
The structure of the paper is as follows: in Section 2 we describe the model of the controlled MC system. The martingale representation and stochastic evolution of the global state of the coupled MC system are given in Section 3. The optimal control and the Kolmogorov equation for the distribution of states of the controlled coupled MC system are presented in Section 4. A numerical example motivated by the Goulburn River water cascade is presented in Section 5.

2. Model of Controlled Markov Chains System

In this section we give the definition of the model of the controlled coupled MCs. Let us consider a set of $M$ MCs. The $m$-th MC has $N_m$ possible states and a time- and control-dependent generator $A^m(t,u) = [a^m_{i,j}(t,u)]$, $i,j = 1,\dots,N_m$:
  • $a^m_{i,j}(t,u) \ge 0$ for $i \ne j$;
  • $a^m_{i,i}(t,u) = -\sum_{j \ne i} a^m_{j,i}(t,u)$.
The control parameter $u$ takes values in $U$, a compact set in some complete metric space, and the matrix-valued functions $A^m$, $m = 1,\dots,M$, are assumed to be continuous in both parameters $(t,u) \in [0,T] \times U$, where $T < \infty$. The control parameter $u$ is shared by all the MCs, and hence defines their interconnection.

2.1. Global State

Let the local state space of the $m$-th MC be given by a set of standard basis vectors $X_t^m \in S^m = \{e_1,\dots,e_{N_m}\}$; then its dynamics can be described by the following stochastic differential equation [10]:
$$X_t^m = X_0^m + \int_0^t A^m(s,u(s))\,X_s^m\,ds + W_t^m, \quad m = 1,\dots,M, \tag{1}$$
where $X_0^m$ is the initial state of the $m$-th MC and the process $W_t^m$ is a square integrable martingale with bounded quadratic variation. We assume that the martingales $W_t^l$ and $W_t^m$ are independent for $m \ne l$, and hence the MCs do not experience simultaneous transitions.
The first approach to the controlled coupled MCs was presented in References [6,9,11], where the global system state was described as a tensor product of the individual MCs’ states:
$$X_t = X_t^1 \otimes X_t^2 \otimes \dots \otimes X_t^M = \bigotimes_{m=1}^M X_t^m, \tag{2}$$
with $X_t \in \mathcal{S}$, $\mathcal{S} = S^1 \times \dots \times S^M$, where $\times$ denotes the Cartesian product (the definition of the tensor product can be found in Appendix A). For the combination of local states $X_t^1 = e_{i_1},\dots,X_t^M = e_{i_M}$, the global state $X_t = e_{i_1\ldots i_M} = e_{i_1}\otimes\dots\otimes e_{i_M}$ can be represented as a multidimensional array of shape $(N_1,\dots,N_M)$ with a single non-zero element $X_t^{i_1\ldots i_M} = 1$. This representation allows us to significantly simplify the expressions involved in the DPE and makes the implementation of the control optimization algorithms rather straightforward, especially with programming tools that support multidimensional array objects (e.g., Matlab, Python NumPy or Julia).
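To make this representation concrete, the following sketch builds such a global state array with NumPy (one of the tools mentioned above); the helper names are ours and not part of the paper.

```python
import numpy as np

def basis_vector(i, n):
    """Standard basis vector e_i (1-indexed) of R^n."""
    e = np.zeros(n)
    e[i - 1] = 1.0
    return e

def global_state(indices, dims):
    """Tensor (outer) product of local basis states: an array of shape `dims`
    with a single non-zero entry at the position given by `indices` (1-indexed)."""
    x = basis_vector(indices[0], dims[0])
    for i, n in zip(indices[1:], dims[1:]):
        x = np.multiply.outer(x, basis_vector(i, n))
    return x

# Example: M = 3 chains with 4, 3 and 5 states, in local states e_2, e_1, e_4.
X = global_state((2, 1, 4), (4, 3, 5))
assert X.shape == (4, 3, 5) and X[1, 0, 3] == 1.0 and X.sum() == 1.0
```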
Let the set of admissible controls $\mathcal{U}$ be defined as the set of $\mathcal{F}_t^X$-predictable controls taking values in $U$ (here $\mathcal{F}_t^X$ stands for the natural filtration associated with the stochastic process $X_t$). This assumption ensures the measurability of the admissible controls. In other words, if the number of global state transitions of the coupled MC system up to the current time $t \in [0,T]$ is $N_t$, $\tau_k$ is the time of the $k$-th jump, and
$$X^m_{[0,t]} = \left\{(X_0^m, 0), (X_1^m, \tau_1), \dots, (X_{N_t}^m, \tau_{N_t})\right\}$$
is the history of the $m$-th MC on the time interval $[0,t]$ (the set of state and jump time pairs), then for $\tau_{N_t} \le t < \tau_{N_t+1}$ the controls $u(t) = u(t, X_{[0,t]})$ are measurable with respect to $t$ and the global history $X_{[0,t]} = \{X^1_{[0,t]},\dots,X^M_{[0,t]}\}$ [9].

2.2. Performance Criterion

Let $f(s, u(s), X_s)$ be a running cost function defined for the global state $X_s$ and time $s \in [0,T]$, and let $\varphi_0(X_T)$ be a terminal condition. Then a general performance criterion to be minimized is given by
$$J[u(\cdot), X(\cdot)] = \mathbb{E}\left[\varphi_0(X_T) + \int_0^T f(s, u(s), X_s)\,ds\right]. \tag{3}$$
Here $f(s,u(s),\cdot)$ is, in fact, a multidimensional array-valued function defined for any possible global state of the coupled MC system $X_t = e_{i_1\ldots i_M} = e_{i_1}\otimes\dots\otimes e_{i_M}$. For example, the running cost function can be represented as
$$f(s,u(s),X_t) = \langle f(s,u(s)), X_t\rangle = \sum_{i_1\ldots i_M} f_{i_1\ldots i_M}(s,u(s))\,X_t^{i_1\ldots i_M} = \sum_{i_1\ldots i_M} f_{i_1\ldots i_M}(s,u(s))\,I\{X_t^{i_1\ldots i_M} = 1\}, \tag{4}$$
where $\langle\cdot,\cdot\rangle$ denotes the inner product and $I\{\cdot\}$ is the indicator function. Thus, for given $t$ and $u(t)$, the state $X_t = e_{i_1\ldots i_M}$ selects a single value $f_{i_1\ldots i_M}(s,u(s))$ from the multidimensional array $f(s,u(s))$. The same is valid for the terminal condition $\varphi_0(X_T) = \langle\varphi_T, X_T\rangle$, which is defined by a multidimensional array $\varphi_T$.
Assumption A1.
The elements of $f(s,u(s))$ are bounded from below and are continuous functions on $[0,T] \times U$.

3. Martingale Representation of the Global State

In this section we derive the martingale representation for the global state of the controlled coupled MC system analogous to (1). To that end, we need to introduce some additional notation. Let $I_{A^m}$ denote the identity matrix of the same shape as the matrix $A^m(s,u(s))$, $m = 1,\dots,M$, and let $A \otimes B$ denote the tensor product of the matrices $A$ and $B$ (see Appendix A). Then
$$A(s,u(s)) = \bigoplus_{m=1}^M A^m(s,u(s)) = A^1(s,u(s)) \oplus A^2(s,u(s)) \oplus \dots \oplus A^M(s,u(s)) = A^1(s,u(s))\otimes I_{A^2}\otimes\dots\otimes I_{A^M} + I_{A^1}\otimes A^2(s,u(s))\otimes\dots\otimes I_{A^M} + \dots + I_{A^1}\otimes I_{A^2}\otimes\dots\otimes A^M(s,u(s)) \tag{5}$$
is called the tensor sum [12] of the square matrices $A^m(s,u(s))$, $m = 1,\dots,M$.
Theorem 1.
Let $X_t$ be the global state of the coupled MC system, where each MC satisfies the representation (1). Then $X_t$ satisfies the representation
$$X_t = X_0 + \int_0^t A(s,u(s))\,X_s\,ds + W_t = X_0 + \int_0^t \left(\bigoplus_{m=1}^M A^m(s,u(s))\right)\left(\bigotimes_{m=1}^M X_s^m\right) ds + W_t, \tag{6}$$
where $W_t$ is a square integrable martingale.
Note that from Theorem 1 it follows that the piecewise constant right-continuous process $X_t$ and the square integrable martingale $W_t$ give the solution of the martingale problem (see Reference [13], Chapter III, for details) for SDE (6).
Proof. 
The theory of stochastic differential equations with values in finite-dimensional Hilbert spaces can be found in Reference [14], and the present proof is based on a corollary of the Ito formula for a finite-dimensional Hilbert space [14], Section 4.1. This result, applied to the case $M = 2$, that is, to the tensor product $X_t = X_t^1 \otimes X_t^2$, gives the following:
$$X_t^1 \otimes X_t^2 = X_0^1 \otimes X_0^2 + \int_0^t \left(X_s^1 \otimes dX_s^2 + dX_s^1 \otimes X_s^2\right) + \langle W^1, W^2\rangle_t,$$
where $\langle W^1, W^2\rangle_t$ stands for the tensor mutual variation of the processes $X_t^1, X_t^2$, which is a zero tensor due to the independence of the martingales $W_t^1, W_t^2$.
Substituting the martingale representation of the individual MCs (1) for $m = 1,2$ we obtain
$$X_t^1 \otimes X_t^2 = X_0^1 \otimes X_0^2 + \int_0^t \left(A^1(s,u(s))X_s^1 \otimes X_s^2 + X_s^1 \otimes A^2(s,u(s))X_s^2\right) ds + W_t^{1,2},$$
where $dW_t^{1,2} = X_t^1 \otimes dW_t^2 + dW_t^1 \otimes X_t^2$, so $W_t^{1,2}$ is a square integrable martingale.
Proceeding further for $m > 2$, we finally have
$$X_t^1 \otimes X_t^2 \otimes\dots\otimes X_t^M = X_0^1 \otimes X_0^2 \otimes\dots\otimes X_0^M + \int_0^t \Big(A^1(s,u(s))X_s^1 \otimes X_s^2 \otimes\dots\otimes X_s^M + X_s^1 \otimes A^2(s,u(s))X_s^2 \otimes\dots\otimes X_s^M + \dots + X_s^1 \otimes X_s^2 \otimes\dots\otimes A^M(s,u(s))X_s^M\Big)\,ds + W_t.$$
Now using the definition of the tensor product of matrices (see Appendix A), we can factorize the summands in the last expression as
$$X_t^1 \otimes X_t^2 \otimes\dots\otimes X_t^M = X_0^1 \otimes X_0^2 \otimes\dots\otimes X_0^M + \int_0^t \Big(A^1(s,u(s))\otimes I_{A^2}\otimes\dots\otimes I_{A^M} + I_{A^1}\otimes A^2(s,u(s))\otimes\dots\otimes I_{A^M} + \dots + I_{A^1}\otimes I_{A^2}\otimes\dots\otimes A^M(s,u(s))\Big)\,X_s^1 \otimes X_s^2 \otimes\dots\otimes X_s^M\,ds + W_t, \tag{7}$$
which, along with the definition of the tensor sum (5), yields the theorem statement.  □
Note that Formula (7) provides an efficient way to calculate the tensor contraction $\left(A(s,u(s))X_s\right)^{i_1\ldots i_M} = \sum_{j_1\ldots j_M} A^{i_1\ldots i_M}_{j_1\ldots j_M}\,X_s^{j_1\ldots j_M}$:
$$A(s,u(s))X_s = A^1(s,u(s))X_s^1 \otimes X_s^2 \otimes\dots\otimes X_s^M + X_s^1 \otimes A^2(s,u(s))X_s^2 \otimes\dots\otimes X_s^M + \dots + X_s^1 \otimes X_s^2 \otimes\dots\otimes A^M(s,u(s))X_s^M. \tag{8}$$
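Formula (8) suggests computing the contraction mode by mode instead of assembling the full $\prod_m N_m \times \prod_m N_m$ generator matrix. A minimal NumPy sketch of this idea (the helper names are ours; the local generators are assumed to be given as a list of $N_m \times N_m$ arrays):

```python
import numpy as np

def apply_mode(A_m, X, m):
    """Apply the matrix A_m along the m-th axis of the state array X,
    i.e., the action of I x ... x A_m x ... x I."""
    Y = np.tensordot(A_m, X, axes=([1], [m]))  # the contracted axis becomes axis 0
    return np.moveaxis(Y, 0, m)

def tensor_sum_action(A_list, X):
    """Contraction A(s, u(s)) X_s of Formula (8): a sum of per-mode applications."""
    return sum(apply_mode(A_m, X, m) for m, A_m in enumerate(A_list))
```

For $M$ chains this costs roughly $\prod_m N_m \cdot \sum_m N_m$ operations instead of $(\prod_m N_m)^2$ for a dense matrix–vector product, which is the practical benefit of the tensor representation.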

4. Optimal Control

4.1. Value Function Representation

The value function of the coupled MC system is the function equal to the minimum total cost for the system given the starting time $t \in [0,T]$ and state $X_t = X$:
$$V(t,X) = \inf_{u(\cdot)} J[u(\cdot), X(\cdot)\,|\,X_t = X],$$
where
$$J[u(\cdot), X(\cdot)\,|\,X_t = X] = \mathbb{E}\left[\langle\varphi_T, X_T\rangle + \int_t^T \langle f(s,u(s)), X_s\rangle\,ds \,\Big|\, X_t = X\right].$$
As in Formula (4) for the running cost function, we now represent the value function $V(t,X)$ as
$$V(t,X) = \langle\varphi_t, X\rangle, \tag{9}$$
where $\varphi_t$ is a multidimensional array-valued function defined for any possible global state of the coupled MC system $X_t = e_{i_1\ldots i_M} = e_{i_1}\otimes\dots\otimes e_{i_M}$.

4.2. Dynamic Programming Equation

Further, we generalize the approach to the control optimization of an ordinary MC [7] to controlled MC systems. Define the dynamic programming equation with respect to $\hat\varphi_t \in \mathbb{R}^{N_1\times\dots\times N_M}$:
$$-\frac{d\langle\hat\varphi_t, X_t\rangle}{dt} = \min_{u\in U}\left\{\langle\hat\varphi_t, A(t,u)X_t\rangle + \langle f(t,u), X_t\rangle\right\} = \min_{u\in U} H(t,\hat\varphi_t,u,X_t) = H(t,\hat\varphi_t,X_t) \tag{10}$$
with the terminal condition $\hat\varphi(T) = \varphi_T$ [7,10,15]. Our aim is to show that (10) has a unique solution $\hat\varphi_t$, and that $\hat\varphi_t = \varphi_t$ from the value function representation (9). This will allow us to present an optimal Markov control which minimizes the performance criterion (3).
If Assumption 1 holds, then $H(t,\hat\varphi_t,u,X)$ is continuous in $(t,u)$, and hence for any $X \in \mathcal{S}$ there exists
$$u^*(t,X) = \operatorname*{argmin}_{u\in U} H(t,\hat\varphi_t,u,X),$$
which minimizes this continuous function on the compact set $U$. Moreover, $H(t,\hat\varphi_t,u,X)$ is affine in $\hat\varphi_t$ for any $(t,X) \in [0,T]\times\mathcal{S}$, so $H(t,\hat\varphi_t,X)$ is Lipschitz in $\hat\varphi_t$, and hence Equation (10) has a unique solution on $[0,T]$.
The existence of an $\mathcal{F}_t^X$-predictable (i.e., $\mathcal{F}_t^X$-measurable) optimal control, provided that Assumption 1 is true, follows from the general result in Reference [16], Theorem 4.2. The following theorem states that the optimal control can be chosen among the Markov controls, which depend not on the whole process history $X_{[0,t]}$, but only on the left limit $X_{t-}$. It should be noted that this result is a generalization of Theorem 2.8 in [7] for controlled MCs to the case of controlled MC systems.
Theorem 2.
If Assumption 1 holds, then the value function is defined by the solution of the DPE (10), that is, $V(t,X) = \langle\hat\varphi_t, X\rangle$, and the Markov control
$$\hat u(t, X_{[0,t]}) = u^*(t, X_t) = \operatorname*{argmin}_{u\in U} H(t,\hat\varphi_t, u, X_t)$$
is the optimal control.
Proof. 
Since $\hat u(\cdot)$ is a predictable control, for any initial condition $X_0 \in \mathcal{S}$ there exists a unique solution of the martingale problem (6). That is, there exists a process $X_t^{\hat u} \in \mathcal{S}_D[0,T]$, where $\mathcal{S}_D[0,T]$ is the class of $\mathcal{S}$-valued right-continuous functions with left limits, which satisfies Equation (6), that is,
$$X_t^{\hat u} = X_0 + \int_0^t A(s, \hat u(s))\,X_s^{\hat u}\,ds + W_t^{\hat u},$$
where $W^{\hat u}$ is a square integrable $\mathcal{F}_t^{X^{\hat u}}$-martingale.
Let us take some admissible control $u(\cdot)$ and the corresponding solution $X_t^u$ of the martingale problem (6). Applying Ito’s formula to the process $\langle\hat\varphi_t, X_t^u\rangle$ we get
$$\langle\hat\varphi_t, X_t^u\rangle = \langle\hat\varphi_T, X_T^u\rangle - \int_t^T \left(\langle d\hat\varphi_s, X_s^u\rangle + \langle\hat\varphi_s, dX_s^u\rangle\right).$$
Since the first integrand in the RHS is the minimum in (10), and $X_s^u$ in the second integrand obeys the martingale representation (6), we can transform the last equation into the following inequality:
$$\langle\hat\varphi_t, X_t^u\rangle \le \langle\hat\varphi_T, X_T^u\rangle + \int_t^T \left(H(s,\hat\varphi_s, u(s), X_s^u)\,ds - \langle\hat\varphi_s, A(s,u(s))X_s^u\rangle\,ds - \langle\hat\varphi_s, dW_s^u\rangle\right) = \langle\hat\varphi_T, X_T^u\rangle + \int_t^T \left(\langle f(s,u(s)), X_s^u\rangle\,ds - \langle\hat\varphi_s, dW_s^u\rangle\right).$$
Taking the expectation $\mathbb{E}[\,\cdot\,|X_t = X]$ of both parts of the inequality, we get
$$J[u(\cdot), X(\cdot)\,|\,X_t = X] = \mathbb{E}\left[\langle\hat\varphi_T, X_T^u\rangle + \int_t^T \langle f(s,u(s)), X_s^u\rangle\,ds \,\Big|\, X_t = X\right] \ge \langle\hat\varphi_t, X\rangle,$$
since $\hat\varphi_s$ is a continuous deterministic function and the expectation of the integral over the martingale is equal to zero. Substitution of the control $\hat u(t, X_{[0,t]}) = u^*(t, X_t)$ yields the equality, which means that the solution to the DPE (10) defines the value function $V(t,X) = \langle\hat\varphi_t, X\rangle$ and $u^*(t, X_t)$ is the optimal control.  □

4.3. Optimal Control Calculation

In the previous section it was shown that the solution to the DPE (10) is unique, and from Theorem 2 it follows that the Markov control which minimizes the RHS of this equation is optimal. Now, letting $X_t = e_{i_1\ldots i_M} = e_{i_1}\otimes\dots\otimes e_{i_M}$, we get a system of $\prod_{m=1}^M N_m$ ordinary differential equations (ODEs):
$$-\frac{d\hat\varphi_t^{\,i_1\ldots i_M}}{dt} = \min_{u\in U}\left\{\langle\hat\varphi_t, A(t,u)e_{i_1\ldots i_M}\rangle + f_{i_1\ldots i_M}(t,u)\right\} = H(t,\hat\varphi_t, e_{i_1\ldots i_M}). \tag{11}$$
The inner product on the right-hand side of the ODEs (11) can be simplified as follows:
$$\begin{aligned}
\langle\hat\varphi_t, A(t,u)e_{i_1\ldots i_M}\rangle &= \sum_{j_1\ldots j_M}\hat\varphi_t^{\,j_1\ldots j_M}\,\langle A(t,u)e_{i_1\ldots i_M}, e_{j_1\ldots j_M}\rangle \\
&= \sum_{j_1\ldots j_M}\hat\varphi_t^{\,j_1\ldots j_M}\Big(\langle A^1(t,u)e_{i_1}, e_{j_1}\rangle\cdots\langle e_{i_M}, e_{j_M}\rangle + \dots + \langle e_{i_1}, e_{j_1}\rangle\cdots\langle A^M(t,u)e_{i_M}, e_{j_M}\rangle\Big) \\
&= \sum_{j_1\ldots j_M}\hat\varphi_t^{\,j_1\ldots j_M}\Big(\langle (A^1)^T(t,u)e_{j_1}, e_{i_1}\rangle\cdots\langle e_{j_M}, e_{i_M}\rangle + \dots + \langle e_{j_1}, e_{i_1}\rangle\cdots\langle (A^M)^T(t,u)e_{j_M}, e_{i_M}\rangle\Big) \\
&= \sum_{j_1\ldots j_M}\hat\varphi_t^{\,j_1\ldots j_M}\,\langle A^T(t,u)e_{j_1\ldots j_M}, e_{i_1\ldots i_M}\rangle = \langle A^T(t,u)\hat\varphi_t, e_{i_1\ldots i_M}\rangle,
\end{aligned}$$
where we used the definition of the tensor sum (5) and the representation of the inner product as a component-wise sum, and denoted
$$A^T(t,u) = \bigoplus_{m=1}^M \left(A^m\right)^T(t,u).$$
Finally, we get the following tensor form for the system of ODEs (11):
$$-\frac{d\hat\varphi_t}{dt} = \min_{u\in U}\left[A^T(t,u)\,\hat\varphi_t + f(t,u)\right], \tag{12}$$
where the minimization is performed component-wise.
The system (12) provides a method for the simultaneous calculation of the optimal Markov control $u^*(t, X_t)$ and the DPE solution. This system is solved backwards in time, starting from the terminal condition $\hat\varphi(T) = \varphi_T$. The minimizations of the RHSs of the equations of this system are independent of each other and, in the case of a numerical solution, these time-consuming operations can be performed in parallel. The result of the minimization yields a set of control functions $u^*_{i_1\ldots i_M}(t)$, which define the Markov optimal control:
$$u^*(t, X_t) = \sum_{i_1\ldots i_M} u^*_{i_1\ldots i_M}(t)\,I\{X_t = e_{i_1\ldots i_M}\}. \tag{13}$$
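A possible numerical realization of this backward sweep is sketched below (NumPy; the function names and the replacement of the compact set $U$ by a finite grid of candidate controls are our assumptions, not the paper's). The per-state minimization is carried out by evaluating the RHS of (12) for every candidate control and taking the element-wise minimum.

```python
import numpy as np

def tensor_sum_action(mats, X):
    """Action of the tensor sum of the local matrices on the array X (cf. Formula (8))."""
    return sum(np.moveaxis(np.tensordot(A, X, axes=([1], [m])), 0, m)
               for m, A in enumerate(mats))

def solve_dpe(A_funcs, f_func, phi_T, control_grid, T, n_steps):
    """Backward Euler sweep for the system (12) with independent per-state minimization.
    A_funcs(t, u) -> list of local generators A^m(t, u);
    f_func(t, u)  -> running-cost array f(t, u) of shape (N_1, ..., N_M);
    phi_T         -> terminal condition array; control_grid -> candidate controls."""
    dt = T / n_steps
    phi = phi_T.copy()
    phi_traj, u_idx_traj = [phi], []
    for k in range(n_steps, 0, -1):
        t = k * dt
        # RHS of (12) for every candidate control, evaluated state by state.
        candidates = np.stack([
            tensor_sum_action([A.T for A in A_funcs(t, u)], phi) + f_func(t, u)
            for u in control_grid
        ])
        u_idx = candidates.argmin(axis=0)   # independent minimization for each state
        rhs = np.take_along_axis(candidates, u_idx[None, ...], axis=0)[0]
        phi = phi + dt * rhs                # step backwards from t to t - dt
        phi_traj.append(phi)
        u_idx_traj.append(u_idx)
    return phi_traj[::-1], u_idx_traj[::-1]
```

Since each candidate RHS is computed for all states at once, the inner loop over the control grid is the natural place for parallelization mentioned above.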

4.4. Kolmogorov Equation

Once the optimal control is obtained, the problem of calculating the state probabilities arises. The main difficulty is that the optimal Markov control depends on the state $X_t$, so one cannot directly compute the generator using the tensor sum Formula (5). To overcome this issue for a single controlled MC with the generator $A(t, u(t,X_t))$, it suffices to substitute $u(t, X_t)$ in the $j$-th column with the control $u(t, e_j)$, which corresponds to the state $e_j$. For the controlled MC system, the generator $A(t, u^*(t,X_t))$ can be constructed in a similar block-wise manner. Indeed, for any global state $e_{j_1\ldots j_M} = e_{j_1}\otimes\dots\otimes e_{j_M}$, the corresponding Markov control is equal to $u^*(t, e_{j_1\ldots j_M})$ and
$$A^{i_1\ldots i_M}_{j_1\ldots j_M}(t, u^*(t,X_t)) = A^{i_1\ldots i_M}_{j_1\ldots j_M}(t, u^*(t, e_{j_1\ldots j_M})) \quad\text{for every multi-index } i_1\ldots i_M, \tag{14}$$
where $A(t, u^*(t, e_{j_1\ldots j_M}))$ is defined by Formula (5) for $u(t) = u^*(t, e_{j_1\ldots j_M})$.
Finally, Equation (14) defines the transition rate tensor $A(t, u^*(t,X_t))$, which is nonrandom, and the state probabilities $P_t^{i_1\ldots i_M} = \mathbb{P}\{X_t = e_{i_1\ldots i_M}\}$ for the process $X_t^{\hat u}$ satisfy the Kolmogorov equation
$$\frac{dP_t}{dt} = A(t, u^*(t,X_t))\,P_t. \tag{15}$$
Note that, as in the case of a single controlled MC, this equation is simply the result of taking the expectation of both sides of the corresponding SDE (6). Note also that the right-hand side of (15) is the tensor contraction $A(t,u^*(t,X_t))P_t$ with components $\sum_{j_1\ldots j_M}A^{i_1\ldots i_M}_{j_1\ldots j_M}(t, u^*(t,X_t))\,P_t^{j_1\ldots j_M}$, defined as in (8).
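Because the control entering (15) depends only on the source state, the contraction can still be evaluated in tensor form by grouping the source states that share the same optimal control. A minimal NumPy sketch (our own names; `u_idx` is assumed to hold the per-state argmin indices produced by a backward DPE sweep such as the one sketched in Section 4.3):

```python
import numpy as np

def tensor_sum_action(mats, X):
    """Action of the tensor sum of the local generators on the array X (Formula (8))."""
    return sum(np.moveaxis(np.tensordot(A, X, axes=([1], [m])), 0, m)
               for m, A in enumerate(mats))

def kolmogorov_step(P, t, dt, A_funcs, u_idx, control_grid):
    """One forward Euler step of the Kolmogorov Equation (15)."""
    dP = np.zeros_like(P)
    for k, u in enumerate(control_grid):
        mask = (u_idx == k)            # source states governed by the control u
        if mask.any():
            dP += tensor_sum_action(A_funcs(t, u), P * mask)
    return P + dt * dP
```

Masking the probability array by the source states with a given control and summing the contributions over the control grid reproduces the column-wise substitution of (14) without forming the full transition rate tensor.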

5. Numerical Study: Goulburn River Cascade

To illustrate our approach to the modeling of real-world systems as coupled MCs and optimal control synthesis, in this section we present a numerical study based on real-world data. The usual area of applications for MCs and numerous generalizations is service systems, which can be described in terms of Poisson arrival flows, queues, and servers. For example, in References [6,17], one can find an application of coupled MCs for congestion control in networks. Specifically, an optimal control strategy was proposed for a multi-homing network connection, where the packets (portions of incoming traffic) could be sent through one of two lines of different speed and price or discarded. However, in the present paper, we stick to another application area, namely, the control of water reservoirs. The reason is that the proposed approach allows successful modeling of these systems, accounting for the stochastic nature of the incoming and outgoing flows, being dependent on the weather condition and users’ behavior. Moreover, the proposed form of the performance criterion (3) permits a variety of optimization goals, for example, minimization of the difference between the actual and desired demand intensities, maintenance of the balance between the incoming and outgoing flows, minimization of the probability that either dam falls below some critical level on average or at the terminal time [9]. Besides, this criterion allows penalizing control in certain states, so that natural restrictions (such as the impossibility to transfer water from an empty or to an overflowing reservoir) can also be taken into account. The terminal condition can reflect the desire to bring the system to a certain state by the end of the control interval.
The object of our study is a system of interconnected dams of the Goulburn River cascade. The model is deliberately simplified to avoid cumbersome details, which are unnecessary for a journal on mathematical subjects; nevertheless, it demonstrates all the necessary techniques, such as the choice of the MCs' states, the relations between the discharges and the state transition intensities, and accounting for the natural inflows and outflows, consumer demands, and environmental constraints.
The Goulburn River Basin is located in the south-western part of the State of Victoria, Australia. The description of the region and a review of the literature devoted to water resource modeling and planning can be found, for instance, in References [18,19,20]. A review of papers on allocation modeling in this region and on water trading can be found in References [21,22,23]. The Goulburn Basin, together with the Upper Murray, Ovens, Kiewa, Broken, Loddon, and Campaspe Basins, constitutes an area called the Goulburn-Murray Irrigation District (the common abbreviation is GMW, from Goulburn-Murray Water, an organization governing water resources in this area). Since the Goulburn River is a major waterway and a major contributor of water supply in the district, it is an object of very intensive research and modeling. GMW uses the Goulburn simulation model (GSM) for the optimization and planning of water supply in the region. The GSM represents the Goulburn system as a set of nodes of different types: storages, demands, and streamflow inputs. These nodes are connected by a network of carriers characterized by their capacities and delivery penalties (penalty functions). The GSM was calibrated using a trial-and-error procedure, which optimizes the model parameters related to the water supply infrastructure [24]. The model presented here reflects a part of the GSM and involves Lake Eildon, a major storage of water that can be required by farmers during the irrigation season, and Lake Nagambie with the Goulburn Weir, which is connected by three major channels to the end-users. The Goulburn River connects the two water storages and continues to run after Nagambie, serving as its fourth outflow. Based on the requests of irrigators, water is released from Lake Eildon to Lake Nagambie and then delivered to the farmers (water right holders) by one of the three channels or by the river. The important point is that, apart from irrigation requests, the system must satisfy environmental demand, which means that the levels in the rivers and channels should exceed some minimal threshold that allows the support of healthy ecosystems and provides some required minimal inflows to satisfy downstream demands.
The scheme of the Goulburn River cascade is presented in Figure 1. We assume that the water level of Lake Eildon is affected by two major flows. One is the incoming flow, which reflects the sum of all the natural water arrivals and losses, including precipitation, upstream tributaries, evaporation, and other causes. The single outgoing flow is the controlled discharge of the Goulburn River, which also serves as the single incoming flow of Lake Nagambie (other natural inflows and outflows of this lake are ignored here due to the significant difference in the size of the two lakes). From Lake Nagambie, there are four controlled discharges, namely the Stuart Murray Canal, the Cattanach Canal, the East Goulburn Main Channel, and the Goulburn River.

5.1. MCs and Generators

The states of the water reservoirs indicate their water levels, expressed in equal portions of their volume. So for Lake Eildon, with a total volume equal to $V_E = 3{,}390{,}000$ ML, the bounds of the states are $i \cdot \frac{V_E}{N_E}$, where $i = 1,\dots,N_E$ and $N_E$ is the desired number of states, which is chosen according to accuracy demands. The same state division is considered for Lake Nagambie with volume $V_N = 25{,}500$ ML: the bounds are $j \cdot \frac{V_N}{N_N}$, $j = 1,\dots,N_N$, where $N_N$ is the desired number of states for this reservoir. Associating the states with water volumes instead of water surface levels leads to the same dependency of the transitions on the incoming and outgoing flows and hence simplifies the resulting model.
We model the reservoirs as MCs, so the evolution of the reservoirs’ states in time is described by the stochastic differential Equation (1), where $M = 2$ and $X_t^1$ and $X_t^2$ stand for Lakes Eildon and Nagambie, respectively. Due to natural reasons, transitions are only possible between adjacent states, so the generators $A^m(t,u)$ reflect birth-and-death processes:
$$A^1(t,u) = \begin{pmatrix} -\Lambda_R & U_E & 0 & \cdots & 0 \\ \Lambda_R & -(U_E + \Lambda_R) & U_E & \cdots & 0 \\ 0 & \Lambda_R & -(U_E + \Lambda_R) & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \Lambda_R & -U_E \end{pmatrix},$$
where $\Lambda_R = \Lambda_R(t)$ is the intensity of the incoming flow of natural water arrivals and losses, and $U_E = U_E(t)$ is the controlled intensity, which reflects the flow from Lake Eildon to Lake Nagambie. The variables $\Lambda_R$ and $U_E$ have a purely probabilistic sense, which has to be translated into the common units of water discharge measurement, say [ML/day]. Here the basic assumption is the equality of the average transition times, which for a constant rate $\lambda$ and a constant discharge $u$ are equal to $1/\lambda$ on the one hand, and to $v/u$ on the other. Here $v$ is the volume of water that corresponds to the difference between the levels; for example, for Lake Eildon, $v = V_E/N_E$. Thus, in terms of the discharges, the generator $A^1(t,u)$ can be represented as follows:
$$A^1(t,u) = \frac{N_E}{V_E}\begin{pmatrix} -\lambda_R & u_E & 0 & \cdots & 0 \\ \lambda_R & -(u_E + \lambda_R) & u_E & \cdots & 0 \\ 0 & \lambda_R & -(u_E + \lambda_R) & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_R & -u_E \end{pmatrix}, \tag{16}$$
where $u_E$ is the discharge from Lake Eildon, and $\lambda_R$ is the sum of all the natural inflows expressed in the common units [ML/day].
Denote the controlled discharges from Lake Nagambie by $u_{SM}, u_{Ca}, u_{EG}, u_{GR}$, which stand respectively for the discharges of the Stuart Murray Canal, the Cattanach Canal, the East Goulburn Main Channel, and the Goulburn River, and denote the sum of these discharges by $u_N = u_{SM} + u_{Ca} + u_{EG} + u_{GR}$. Since the outflow of Lake Eildon is at the same time the incoming flow of Lake Nagambie, we have the following representation of the generator $A^2(t,u)$:
$$A^2(t,u) = \frac{N_N}{V_N}\begin{pmatrix} -u_E & u_N & 0 & \cdots & 0 \\ u_E & -(u_N + u_E) & u_N & \cdots & 0 \\ 0 & u_E & -(u_N + u_E) & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & u_E & -u_N \end{pmatrix}. \tag{17}$$
It should be noted that the capacity of Lake Eildon is more than 100 times larger than the capacity of Lake Nagambie. This causes certain problems for the numerical simulation, since the same discharge values affect the transition rates on different scales. To overcome this issue, one can normalize the transition rates by adjusting the numbers of states so that $N_E \gg N_N$. Another way is to scale down the time discretization step, ensuring adequate probabilities of state transitions for the MC which corresponds to Lake Nagambie.
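For illustration, a generator of the form (16) or (17) can be assembled as follows (NumPy sketch; the function name and the numerical discharge values in the example are ours and purely illustrative):

```python
import numpy as np

def birth_death_generator(n_states, inflow, outflow, volume):
    """Column-convention generator of the form (16)/(17): transitions between adjacent
    levels with 'birth' rate given by the inflow and 'death' rate given by the outflow,
    both expressed as discharges [ML/day] and scaled by n_states / volume."""
    scale = n_states / volume
    A = np.zeros((n_states, n_states))
    for j in range(n_states):
        if j + 1 < n_states:           # the level can rise from state j
            A[j + 1, j] = scale * inflow
        if j - 1 >= 0:                 # the level can fall from state j
            A[j - 1, j] = scale * outflow
        A[j, j] = -A[:, j].sum()       # columns sum to zero
    return A

# Example for Lake Eildon with made-up discharge values (illustration only).
A1 = birth_death_generator(10, inflow=5000.0, outflow=3000.0, volume=3_390_000.0)
assert np.allclose(A1.sum(axis=0), 0.0)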

5.2. Performance Criterion and Constraints

The performance criterion should take into account the customer demands and ensure the balance of the water reservoirs' inflows and outflows. We assume that there are some reference values for the demands in the channels and the Goulburn River, $\tilde u_{SM}, \tilde u_{Ca}, \tilde u_{EG}, \tilde u_{GR}$, which the optimal control aims to satisfy. Let
$$f_{ij}(t,u) = (u_N - u_E)^2 + (u_{SM} - \tilde u_{SM})^2 + (u_{Ca} - \tilde u_{Ca})^2 + (u_{EG} - \tilde u_{EG})^2 + (u_{GR} - \tilde u_{GR})^2 + C u_E^2\,I\{i=1\} + C u_{SM}^2\,I\{j=1\} + C u_{Ca}^2\,I\{j=1\} + C u_{EG}^2\,I\{j=1\} + C u_{GR}^2\,I\{j=1\}, \tag{18}$$
where $i$ and $j$ stand for the state numbers of Lakes Eildon and Nagambie, respectively, $C$ is some large number, and $I\{\cdot\}$ is an indicator function.
The running cost function (4) with the state-related cost given by (18) reflects the mentioned aim along with the intention to balance the arriving and outgoing flows of Lake Nagambie. In addition, this running cost penalizes nonzero discharges from both lakes when they are in the states $X_t^1 = e_1$, $X_t^2 = e_1$, which correspond to the lowest possible level, or drought.
The terminal condition $\varphi_0(X_T) = \langle\varphi_T, X_T\rangle$ in (3) is defined by the matrix $\varphi_T$ with elements $\varphi_T^{ij}$, which correspond to the combination of states $X_T^1 = e_i$ and $X_T^2 = e_j$. Let $\tilde X_T$ be the desired terminal state of the system; then define the terminal condition as follows:
$$\varphi_T^{ij} = C\cdot\exp\left(\left\|(i,j)^T - \tilde X_T\right\|_1\right), \tag{19}$$
where $\|\cdot\|_1$ is the $L_1$ norm (the sum of absolute values). This condition penalizes any deviation of the system’s state at the end of the control interval $T$ from the desired one and, moreover, assigns a larger penalty to greater deviations.
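A small sketch of the terminal array, matching the reconstruction of Formula (19) above (NumPy; the value of C and the function name are arbitrary illustrative choices):

```python
import numpy as np

def terminal_condition(n_e, n_n, target, C=1e3):
    """Terminal penalty matrix: C * exp(L1 distance to the desired terminal
    state `target`, given as a 1-indexed pair of state numbers)."""
    i = np.arange(1, n_e + 1)[:, None]
    j = np.arange(1, n_n + 1)[None, :]
    dist = np.abs(i - target[0]) + np.abs(j - target[1])
    return C * np.exp(dist)

phi_T = terminal_condition(10, 10, target=(9, 9))
```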
For all the control variables we also define phase constraints in the form of upper and lower bounds:
$$\underline u \le u \le \overline u, \quad u = (u_E, u_{SM}, u_{Ca}, u_{EG}, u_{GR})^T. \tag{20}$$
These constraints bound the maximal possible discharges for the channels and the Goulburn River and guarantee the environmental demand mentioned earlier.

5.3. Parameters Definition

In the previous subsections, the Goulburn River cascade model was defined in general terms. Nevertheless, there are still some undefined parameters, which will be specified using real-world data. To define the reference demand values $\tilde u_{SM}, \tilde u_{Ca}, \tilde u_{EG}, \tilde u_{GR}$, we use the data on the discharges of the corresponding channels and river available from the Water Measurement Information System of the Department of Environment, Land, Water & Planning of the Victoria State Government [25]. The data on the daily discharges were grouped by the day of year, and for each group the mean value was calculated. The resulting time series were approximated by functions of the form
$$u(t) = a_0 + \sum_{i=1}^K\left(a_i\sin\frac{2\pi i}{T}t + b_i\cos\frac{2\pi i}{T}t\right), \tag{21}$$
where $T = 365$ days and the coefficients $a_i, b_i$ are chosen using the least-squares method. The resulting functions are smooth and have a period equal to one year, which makes them suitable as the reference values of the irrigation demands in the proposed model. The time series of the average daily discharges and the approximating functions are presented in Figure 2.
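The fit of (21) reduces to an ordinary least-squares problem in the coefficients $a_i, b_i$. A sketch of this step (NumPy; the number of harmonics K is our assumption, since the paper does not state it):

```python
import numpy as np

def fit_periodic(day_of_year, values, K=3, period=365.0):
    """Least-squares fit of the truncated Fourier series (21) to daily averages.
    Returns a callable smooth periodic approximation u(t)."""
    t = np.asarray(day_of_year, dtype=float)
    cols = [np.ones_like(t)]
    for i in range(1, K + 1):
        cols += [np.sin(2 * np.pi * i * t / period), np.cos(2 * np.pi * i * t / period)]
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, np.asarray(values, dtype=float), rcond=None)

    def u(t_new):
        t_new = np.asarray(t_new, dtype=float)
        out = np.full_like(t_new, coef[0])
        for i in range(1, K + 1):
            out = out + coef[2 * i - 1] * np.sin(2 * np.pi * i * t_new / period) \
                      + coef[2 * i] * np.cos(2 * np.pi * i * t_new / period)
        return out

    return u
```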
The data for the Lake Eildon incoming flow are obtained from the Climate Data Online service of the Bureau of Meteorology of the Australian Government [26]. The rainfall data are collected at the weather station situated there. These data are averaged on a daily basis and approximated by a smooth periodic function in the same way as the daily discharges. The resulting function serves as an incoming flow pattern $\Lambda_R(t)$. For this model, we assume that the irrigation demands agree with the total available natural resources, so the incoming flow pattern is normalized to make the incoming flow $\lambda_R(t) = \mathrm{const}\cdot\Lambda_R(t)$ satisfy the following condition:
$$\int_0^T\left(\tilde u_{SM}(t) + \tilde u_{Ca}(t) + \tilde u_{EG}(t) + \tilde u_{GR}(t)\right)dt = \int_0^T \lambda_R(t)\,dt. \tag{22}$$
The resulting flow approximation along with the normalized rainfall averages is presented in Figure 3.
Specifying the upper and lower bounds for the control parameters, $\underline u = (\underline u_E, \underline u_{SM}, \underline u_{Ca}, \underline u_{EG}, \underline u_{GR})^T$ and $\overline u = (\overline u_E, \overline u_{SM}, \overline u_{Ca}, \overline u_{EG}, \overline u_{GR})^T$, by the maximum and minimum registered values of the corresponding discharge time series obtained from Reference [25] finalizes the definition of our Goulburn River cascade model.

5.4. Simulation Results

The complete definition of the control optimization problem is given by:
  • the state Equation (1) with generators (16) and (17);
  • the optimization criterion (3) with running cost (18) and terminal condition (19);
  • the phase constraints (20);
  • the incoming flow $\lambda_R(t)$, the reference values of the irrigation demands $\tilde u_{SM}, \tilde u_{Ca}, \tilde u_{EG}, \tilde u_{GR}$, and the upper and lower bounds for the control parameters defined in Section 5.3.
Now the optimal Markov control (13) can be calculated as a solution to the system of ODEs (12). Recall that the minimization in the RHS of the ODEs can be done independently for each state and hence can be effectively parallelized. The system (12) was solved numerically using the Euler method with the discretization step $\delta t = 10^{-4}$, while the time scale and all the necessary functions were normalized so that $T = 1.0$. The number of MC states for both lakes was chosen as $N_E = N_N = 10$, and the desired terminal state is $\tilde X_T = (9,9)^T$.
In Figure 4 and Figure 5, the optimal control values are presented for the states $(9,9)^T$ and $(7,9)^T$. One can see the nontrivial behavior of the optimal discharges, which differ from the reference values even for the desired terminal state. The evolution of the value function components $\varphi_t^{ij}$ exhibits the influence of both the integral part of the criterion and the terminal condition. Indeed, for $\tilde X_T = (9,9)^T$ it decreases monotonically, since the demands here are mostly satisfied and the system is already in the desired state. For the state $(7,9)^T$, on the contrary, there are intervals of monotonic growth: the first is due to insufficient supplies in the late summer and autumn months, and the second starts in spring and shows the necessity of transiting to the desired terminal state $\tilde X_T = (9,9)^T$.
The Kolmogorov Equation (15) allows us to calculate the probabilities of the global system states given the initial state or distribution. The probabilities of the states $(9,9)^T$ and $(7,9)^T$ are presented in the upper subplots of Figure 4 and Figure 5. The initial state here was chosen to be equal to the terminal one, $\tilde X_0 = \tilde X_T = (9,9)^T$. The solid red lines present the solution of (15), and the dotted lines are the result of Monte-Carlo sampling: the state rate was estimated using a set of 100 sample paths governed by the optimal control strategy. The state probabilities also allow us to calculate the average states of the MCs, which can be converted into the average levels of Lakes Eildon and Nagambie. In Figure 6, the average levels calculated by means of the Kolmogorov Equation (15) are presented by solid lines. The corresponding averages calculated using Monte-Carlo sampling are depicted with dotted lines.
As an alternative to the optimal control $u^*(t, X_t)$ we present here a program control $u_p(t, X_t)$, which aims to fully satisfy the customer demands:
$$u_p(t, X_t) = \begin{pmatrix} \left(1 - I\{X_t = e_1\otimes e_j\}\right)\left(\tilde u_{SM} + \tilde u_{Ca} + \tilde u_{EG} + \tilde u_{GR}\right) \\ \left(1 - I\{X_t = e_i\otimes e_1\}\right)\tilde u_{SM} \\ \left(1 - I\{X_t = e_i\otimes e_1\}\right)\tilde u_{Ca} \\ \left(1 - I\{X_t = e_i\otimes e_1\}\right)\tilde u_{EG} \\ \left(1 - I\{X_t = e_i\otimes e_1\}\right)\tilde u_{GR} \end{pmatrix}. \tag{23}$$
The indicator functions make sure that the natural constraints are satisfied, that is, the water cannot be taken from an empty dam. In Figure 7, the average levels calculated by means of the Kolmogorov equation and Monte-Carlo sampling are presented. One can observe that this program control leads to overuse of the resources in contrast with the optimal one, which is able to refill the dams after the winter demand growth and rainfall shortage.

6. Conclusions

The paper presents a framework for the optimization of controlled coupled MCs. The specific character of the problem requires the use of a tensor representation of the set of MCs, which permits us to derive the joint set of DPEs and organize the calculation in a parallel manner. This is important especially for the most time-consuming operation, the minimization of the right-hand side at each discretization step of the numerical solution of the DPE. For a demonstration of the method, we chose a model of the Goulburn River cascade. The model was deliberately simplified to provide a clear demonstration of the approach; however, it is just a first step in a possible detailed analysis of this dam system, which would rely on more detailed data on the natural and agricultural processes in the district.

Author Contributions

Conceptualization, B.M., G.M. and S.S.; Data curation, G.M.; Formal analysis, D.M., B.M., G.M. and S.S.; Investigation, D.M., B.M., G.M. and S.S.; Methodology, B.M. and S.S.; Software, G.M.; Supervision, B.M. and S.S.; Validation, B.M.; Visualization, G.M.; Writing–original draft, D.M., B.M., G.M. and S.S.; Writing–review and editing, D.M., G.M. and S.S. All authors have read and agreed to the published version of the manuscript.

Funding

The work of G. Miller was partially supported by the Russian Foundation for Basic Research (RFBR Grant No. 19-07-00187-A).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DPE   Dynamic programming equation
GMW   Goulburn-Murray Water
GSM   Goulburn simulation model
MC    Markov chain
ODE   Ordinary differential equation
RHS   Right-hand side

Appendix A. Tensor Product Properties

In this appendix we provide the tensor-related definitions used throughout the paper. For more details we refer the reader to Reference [27].
For the present paper it is sufficient to define a tensor by its entries, as an $M$-dimensional array of elements $V^{i_1\ldots i_M} \in \mathbb{R}$, where $i_1\ldots i_M$ is a multi-index (tuple) given by indices $i_m \in \{1,\dots,N_m\}$. The set of such objects is denoted by $\mathbb{R}^{N_1\times\dots\times N_M}$.
Definition A1.
Let $V^m \in \mathbb{R}^{N_m}$, $m = 1,\dots,M$; then $V \in \mathbb{R}^{N_1\times\dots\times N_M}$ is called the tensor product of the vectors $V^m$,
$$V = V^1\otimes\dots\otimes V^M = \bigotimes_{m=1}^M V^m, \tag{A1}$$
if its entries are equal to $V^{i_1\ldots i_M} = V^1_{i_1}\cdot\ldots\cdot V^M_{i_M}$.
From Definition A1 it follows that the tensor space $\mathbb{R}^{N_1\times\dots\times N_M}$ can be defined as the set of all finite linear combinations of tensor products of vectors from the vector spaces $\mathbb{R}^{N_1},\dots,\mathbb{R}^{N_M}$:
$$\mathbb{R}^{N_1\times\dots\times N_M} = \bigotimes_{m=1}^M \mathbb{R}^{N_m} = \operatorname{span}\left\{V^1\otimes\dots\otimes V^M,\ \text{where } V^m \in \mathbb{R}^{N_m},\ m = 1,\dots,M\right\}. \tag{A2}$$
Definition A2.
Let $A^m \in \mathbb{R}^{N_m\times Q_m}$, $m = 1,\dots,M$, be a set of matrices; then the tensor product of these matrices is the linear mapping
$$A = A^1\otimes\dots\otimes A^M = \bigotimes_{m=1}^M A^m:\ \mathbb{R}^{Q_1\times\dots\times Q_M} \to \mathbb{R}^{N_1\times\dots\times N_M} \tag{A3}$$
defined for any $V = \bigotimes_{m=1}^M V^m$ with $V^m \in \mathbb{R}^{Q_m}$ (and extended by linearity) by
$$AV = \left(\bigotimes_{m=1}^M A^m\right)\left(\bigotimes_{m=1}^M V^m\right) = \bigotimes_{m=1}^M A^m V^m \in \mathbb{R}^{N_1\times\dots\times N_M}. \tag{A4}$$
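A small numerical check of Definition A2 for $M = 2$ (NumPy; the dimensions are chosen arbitrarily): the tensor product of matrices acts factor-wise on a tensor product of vectors.

```python
import numpy as np

# Check that (A1 x A2)(V1 x V2) = (A1 V1) x (A2 V2) for random matrices and vectors.
rng = np.random.default_rng(0)
A1, A2 = rng.normal(size=(3, 4)), rng.normal(size=(2, 5))
V1, V2 = rng.normal(size=4), rng.normal(size=5)

V = np.multiply.outer(V1, V2)                 # V1 x V2, shape (4, 5)
lhs = np.einsum('ip,jq,pq->ij', A1, A2, V)    # action of A1 x A2 on V
rhs = np.multiply.outer(A1 @ V1, A2 @ V2)     # (A1 V1) x (A2 V2)
assert np.allclose(lhs, rhs)
```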

References

  1. Filar, J.; Vrieze, K. Competitive Markov Decision Processes; Springer: New York, NY, USA, 1997.
  2. Williams, B.K. Markov decision processes in natural resources management: Observability and uncertainty. Ecol. Model. 2015, 220, 830–840.
  3. Delebecque, F.; Quadrat, J.P. Optimal control of Markov chains admitting strong and weak interactions. Automatica 1981, 17, 281–296.
  4. Miller, B.M.; McInnes, D.J. Management of dam systems via optimal price control. Procedia Comput. Sci. 2011, 4, 1373–1382.
  5. McInnes, D.; Miller, B. Optimal control of time-inhomogeneous Markov chains with application to dam management. In Proceedings of the 2013 Australian Control Conference, Perth, Australia, 4–5 November 2013; pp. 230–237.
  6. Miller, A.; Miller, B. Control of connected Markov chains. Application to congestion avoidance in the Internet. In Proceedings of the 2011 50th IEEE Conference on Decision and Control and European Control Conference, Orlando, FL, USA, 12–15 December 2011; pp. 7242–7248.
  7. Miller, B.M. Optimization of queuing system via stochastic control. Automatica 2009, 45, 1423–1430.
  8. Miller, A. Using methods of stochastic control to prevent overloads in data transmission networks. Autom. Remote Control 2010, 71, 1804–1815.
  9. Miller, B.; McInnes, D. Optimal management of a two dam system via stochastic control: Parallel computing approach. In Proceedings of the 2011 50th IEEE Conference on Decision and Control and European Control Conference, Orlando, FL, USA, 12–15 December 2011; pp. 1417–1423.
  10. Elliott, R.J.; Aggoun, L.; Moore, J.B. Hidden Markov Models. Estimation and Control; Stochastic Modelling and Applied Probability; Springer: New York, NY, USA, 1995.
  11. Miller, A.; Miller, B.; Popov, A.; Stepanyan, K. Towards the development of numerical procedure for control of connected Markov chains. In Proceedings of the 2015 5th Australian Control Conference (AUCC), Gold Coast, Australia, 5–6 November 2015; pp. 336–341.
  12. Fourneau, J.; Plateau, B.; Stewart, W. Product form for Stochastic Automata Networks. In Proceedings of the 2nd International ICST Conference on Performance Evaluation Methodologies and Tools (Valuetools), Nantes, France, 23–25 October 2007.
  13. Jacod, J.; Shiryaev, A. Limit Theorems for Stochastic Processes. In Grundlehren der Mathematischen Wissenschaften; Springer: Berlin/Heidelberg, Germany, 2003.
  14. Metivier, M.; Pellaumail, J. Stochastic Integration. In Probability and Mathematical Statistics; Academic Press: New York, NY, USA, 1980.
  15. Miller, B.; Miller, G.; Siemenikhin, K. Towards the optimal control of Markov chains with constraints. Automatica 2010, 46, 1495–1502.
  16. Wan, C.B.; Davis, M.H.A. Existence of Optimal Controls for Stochastic Jump Processes. SIAM J. Control Optim. 1978, 17, 511–524.
  17. Miller, A.; Miller, B. Application of stochastic control to analysis and optimization of TCP. In Proceedings of the 2013 Australian Control Conference, Victoria, Australia, 4–5 November 2013; pp. 238–243.
  18. Schreider, S.; Whetton, P.; Jakeman, A.; Pittock, A. Runoff modelling for snow-affected catchments in the Australian alpine region, eastern Victoria. J. Hydrol. 1997, 200, 1–23.
  19. Schreider, S.; Jakeman, A.; Letcher, R.; Nathan, R.; Neal, B.; Beavis, S. Detecting changes in streamflow response to changes in non-climatic catchment conditions: Farm dam development in the Murray–Darling basin, Australia. J. Hydrol. 2002, 262, 84–98.
  20. Griffith, M.; Codner, G.; Weinmann, E.; Schreider, S. Modelling hydroclimatic uncertainty and short-run irrigator decision making: The Goulburn system. Aust. J. Agric. Resour. Econ. 2009, 53, 565–584.
  21. Cui, J.; Schreider, S. Modelling of pricing and market impacts for water options. J. Hydrol. 2009, 371, 31–41.
  22. Dixon, P.; Schreider, S.; Wittwer, G. Combining engineering-based water models with a CGE model. In Quantitative Tools in Microeconomic Policy Analysis; Pincus, J., Ed.; Australian Productivity Commission: Canberra, Australia, 2005; Chapter 2; pp. 17–30.
  23. Schreider, S.; Weinmann, P.E.; Codner, G.; Malano, H.M. Integrated Modelling System for Sustainable Water Allocation Planning. In Proceedings of the International Congress on Modelling and Simulation MODSIM01, Canberra, Australia, 1 January 2001; Ghassemi, F., McAleer, M., Oxley, L., Scoccimarro, M., Eds.; Volume 3, pp. 1207–1212.
  24. Perera, B.; James, B.; Kularathna, M. Computer software tool REALM for sustainable water allocation and management. J. Environ. Manag. 2005, 77, 291–300.
  25. Water Measurement Information System of the Department of Environment, Land, Water & Planning of Victoria State Government. Available online: https://data.water.vic.gov.au/ (accessed on 25 July 2020).
  26. Climate Data Online Service of the Bureau of Meteorology of the Australian Government. Available online: http://www.bom.gov.au// (accessed on 25 July 2020).
  27. Hackbusch, W. Tensor Spaces and Numerical Tensor Calculus; Springer Series in Computational Mathematics; Springer: Berlin/Heidelberg, Germany, 2012.
Figure 1. Goulburn River cascade scheme.
Figure 2. Average daily discharges (dotted lines) and the approximating smooth periodic functions (solid lines).
Figure 3. Normalized rainfall daily averages (dotted line) and the approximation (solid line).
Figure 4. Probability of the state $X^1(t) = 9$, $X^2(t) = 9$, the corresponding value function component $\varphi_t^{9,9}$ and the optimal Markov control $u^*_{9,9}$.
Figure 5. Probability of the state $X^1(t) = 7$, $X^2(t) = 9$, the corresponding value function component $\varphi_t^{7,9}$ and the optimal Markov control $u^*_{7,9}$.
Figure 6. Average levels of Lakes Eildon and Nagambie for the optimal control.
Figure 7. Average levels of Lakes Eildon and Nagambie for the program control.


