Article

On a Class of Tensor Markov Fields

by Enrique Hernández-Lemus 1,2
1 Computational Genomics Division, National Institute of Genomic Medicine, 14610 Mexico City, Mexico
2 Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, 04510 Mexico City, Mexico
Entropy 2020, 22(4), 451; https://doi.org/10.3390/e22040451
Submission received: 6 March 2020 / Revised: 1 April 2020 / Accepted: 9 April 2020 / Published: 16 April 2020
(This article belongs to the Special Issue Data Science: Measuring Uncertainties)

Abstract

Here, we introduce a class of Tensor Markov Fields intended as probabilistic graphical models for random variables spanned over multiplexed contexts. These fields are an extension of Markov Random Fields to tensor-valued random variables. By extending the results of Dobruschin, Hammersley and Clifford to such tensor-valued fields, we prove that Tensor Markov Fields are indeed Gibbs fields whenever strictly positive probability measures are considered. Hence, there is a direct relationship with many results from theoretical statistical mechanics. We also show how this class of Markov fields can be built from statistical dependency structures inferred on information-theoretical grounds over empirical data. Thus, aside from their purely theoretical interest, the Tensor Markov Fields described here may be useful for mathematical modeling and data analysis due to their intrinsic simplicity and generality.

1. General Definitions

Here, we introduce Tensor Markov Fields, i.e., Markov random fields [1,2] over tensor spaces. Tensor Markov Fields (TMFs) represent the joint probability distribution for a set of tensor-valued random variables.
Let $\mathbf{X} = X_\alpha^\beta$ be one such tensor-valued random variable. Here $X_i^j \in \mathbf{X}$ may represent either a variable $i \in \alpha$ that may exist in a given context or layer $j \in \beta$ (giving rise to a class of so-called multilayer graphical models or multilayer networks), or a single tensor-valued quantity $X_i^j$. A TMF will be an undirected multilayer graph representing the statistical dependency structure of $\mathbf{X}$ as given by the joint probability distribution $P(\mathbf{X})$.
As an extension of the case of Markov random fields, a TMF is a multilayer graph $\hat{G} = (V, E)$ formed by a set $V$ of vertices or nodes (the $X_i^j$'s) and a set $E \subseteq V \times V$ of edges connecting the nodes, either on the same layer or across different layers (Figure 1). The set of edges represents a neighborhood law $\mathcal{N}$ stating which vertex is connected to (i.e., dependent on) which other vertex in the multilayer graph. With this in mind, a TMF can also be represented (slightly abusing notation) as $\hat{G} = (V, \mathcal{N})$. The set of neighbors of a given point $X_i^j$ will be denoted $N_{X_i^j}$.
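To make these objects concrete, the following is a minimal sketch (an illustration only, not part of the formal development) of a two-layer TMF skeleton in Python, encoding each vertex $X_i^j$ as an (i, j) tuple. The specific node set, the edge list and the use of the networkx library are assumptions made for the example.

```python
# A minimal illustrative sketch (not from the paper): a two-layer TMF skeleton with
# vertices X_i^j encoded as (i, j) tuples and the neighborhood law given by adjacency.
import networkx as nx

G_hat = nx.Graph()
variables, layers = [1, 2, 3, 4], ["I", "II"]
G_hat.add_nodes_from((i, j) for i in variables for j in layers)

# Intra-layer and inter-layer edges (statistical dependencies), chosen arbitrarily here.
G_hat.add_edges_from([
    ((1, "I"), (2, "I")), ((2, "I"), (3, "I")), ((3, "I"), (4, "I")),   # layer I
    ((1, "II"), (2, "II")), ((2, "II"), (4, "II")),                     # layer II
    ((2, "I"), (2, "II")), ((3, "I"), (1, "II")),                       # inter-layer edges
])

# The neighborhood law N: the set of neighbors N_{X_i^j} of each vertex.
N = {v: set(G_hat.neighbors(v)) for v in G_hat.nodes}
print(N[(2, "I")])   # e.g. {(1, 'I'), (3, 'I'), (2, 'II')}
```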

1.1. Configuration

It is possible to assign to each point in the multilayer graph one of a finite set $S$ of labels. Such an assignment will be called a configuration. We will assign probability measures to the set $\Omega$ of all possible configurations $\omega$. Hence, $\omega_A$ represents the configuration $\omega$ restricted to the subset $A$ of $V$. It is possible to think of $\omega_A$ as a configuration on the smaller multilayer graph $\hat{G}_A$ obtained by restricting $V$ to the points of $A$ (Figure 2).

1.2. Local Characteristics

It is also possible to extend the notion of local characteristics from MRFs. The local characteristics of a probability measure P defined on Ω are the conditional probabilities of the form:
$$P(\omega_t \mid \omega_{T \setminus t}) = P(\omega_t \mid \omega_{N_t}) \qquad (1)$$
i.e., the probability that the point $t$ is assigned the value $\omega_t$, given the values at all other points of the multilayer graph. In order to make explicit the tensorial nature of the multilayer graph $\hat{G}$, let us re-write Equation (1). Let us also recall that the probability measure will define a tensor Markov random field (a TMF) if the local characteristics depend only on the outcomes at neighboring points, i.e., if for every $\omega$
$$P(\omega_{X_i^j} \mid \omega_{\hat{G}\setminus X_i^j}) = P(\omega_{X_i^j} \mid \omega_{N_{X_i^j}}) \qquad (2)$$

1.3. Cliques

Given an arbitrary graph (or, in the present case, a multilayer graph), we shall say that a set of points $C$ is a clique if every pair of points in $C$ are neighbors (see Figure 3). This definition includes the empty set as a clique. A clique is thus a set whose induced subgraph is complete; for this reason, cliques are also called complete induced subgraphs or maximal subgraphs (although this latter term may be ambiguous).
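As a quick illustration of this definition (again on an assumed toy graph, only loosely mirroring Figure 3), one can enumerate all vertex subsets of a small multilayer graph and keep those in which every pair of points are neighbors:

```python
# Illustrative check of the clique definition: keep every vertex subset whose points
# are pairwise neighbors. The example graph and node labels are assumptions; by the
# definition above, the empty set and singletons count as cliques.
import itertools
import networkx as nx

G_hat = nx.Graph([
    ((2, "I"), (3, "I")), ((3, "I"), (4, "I")), ((2, "I"), (4, "I")),  # a triangle in layer I
    ((3, "I"), (3, "II")),                                             # one inter-layer edge
])

def is_clique(G, points):
    # A set of points is a clique if every pair of its points are neighbors.
    return all(G.has_edge(u, v) for u, v in itertools.combinations(points, 2))

cliques = [set(s)
           for r in range(len(G_hat) + 1)
           for s in itertools.combinations(G_hat.nodes, r)
           if is_clique(G_hat, s)]
print(max(cliques, key=len))   # the largest clique: {(2, 'I'), (3, 'I'), (4, 'I')}
```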

1.4. Configuration Potentials

A potential $\eta$ is a way to assign a number $\eta_A(\omega)$ to every subconfiguration $\omega_A$ of a configuration $\omega$ in the multilayer graph $\hat{G}$. Given a potential, we shall say that it defines (or, better, induces) a dimensionless energy $U(\omega)$ on the set of all configurations $\omega$ by
$$U(\omega) = \sum_{A} \eta_A(\omega) \qquad (3)$$
In the preceding expression, for fixed $\omega$, the sum is taken over all subsets $A \subseteq V$, including the empty set. We can define a probability measure, called the Gibbs measure induced by $U$, as
$$P(\omega) = \frac{e^{U(\omega)}}{Z} \qquad (4)$$
with $Z$ a normalization constant called the partition function,
$$Z = \sum_{\omega} e^{U(\omega)} \qquad (5)$$
In physics, the term potential is often used in connection with the so-called potential energies. Physicists often call $\eta_A$ a dimensionless potential energy, and they call $\phi_A = e^{\eta_A}$ a potential.
Equations (4) and (5) can thus be rewritten as:
$$P(\omega) = \frac{\prod_{A} \phi_A(\omega)}{Z} \qquad (6)$$
$$Z = \sum_{\omega} \prod_{A} \phi_A(\omega) \qquad (7)$$
Since this latter use is more common in probability and graph theory, we will refer to Equations (6) and (7) as the definitions of the Gibbs measure and the partition function (respectively), unless otherwise stated.
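For intuition, a brute-force sketch of Equations (6) and (7) on a toy field is shown below; the three points, the binary label set and the particular potentials $\phi_A$ are arbitrary assumptions chosen only to make the computation explicit.

```python
# Brute-force sketch of Eqs. (6)-(7) under illustrative assumptions: binary labels,
# a handful of subsets A with hand-picked positive potentials phi_A (phi_A = 1 for
# every other subset), and exhaustive enumeration of all configurations.
import itertools

points = ["a", "b", "c"]                     # stand-ins for points X_i^j
potentials = {                               # phi_A for a few chosen subsets A
    ("a",): lambda w: 2.0 if w["a"] == 1 else 1.0,
    ("a", "b"): lambda w: 3.0 if w["a"] == w["b"] else 0.5,
    ("b", "c"): lambda w: 1.5 if w["b"] != w["c"] else 1.0,
}

def unnormalized(w):
    # Product over subsets A of phi_A(omega), i.e. the numerator of Eq. (6).
    prod = 1.0
    for phi in potentials.values():
        prod *= phi(w)
    return prod

configs = [dict(zip(points, vals)) for vals in itertools.product([0, 1], repeat=len(points))]
Z = sum(unnormalized(w) for w in configs)                        # Eq. (7)
P = {tuple(w.values()): unnormalized(w) / Z for w in configs}    # Eq. (6)
print(sum(P.values()))   # -> 1.0 (up to floating point)
```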

1.5. Gibbs Fields

A potential is called a nearest neighbor Gibbs potential if $\phi_A(\omega) = 1$ whenever $A$ is not a clique. It is customary to refer to a measure induced by a nearest neighbor Gibbs potential as a Gibbs measure. However, it is possible to define more general Gibbs measures by considering other types of potentials.
The inclusion of all cliques in the calculation of the Gibbs measure is necessary to establish the equivalence between Gibbs random fields and Markov random fields. Let us see how a nearest neighbor Gibbs measure on a multilayer graph determines a TMF.
Let $P(\omega)$ be a probability measure determined on $\Omega$ by a nearest neighbor Gibbs potential $\phi$:
$$P(\omega) = \frac{\prod_{C} \phi_C(\omega)}{Z} \qquad (8)$$
with the product taken over all cliques $C$ in the multilayer graph $\hat{G}$. Then,
$$P(\omega_{X_i^j} \mid \omega_{\hat{G}\setminus X_i^j}) = \frac{P(\omega)}{\sum_{\omega'} P(\omega')} \qquad (9)$$
Here $\omega'$ is any configuration which agrees with $\omega$ at all points except $X_i^j$.
$$P(\omega_{X_i^j} \mid \omega_{\hat{G}\setminus X_i^j}) = \frac{\prod_{C} \phi_C(\omega)}{\sum_{\omega'} \prod_{C} \phi_C(\omega')} \qquad (10)$$
For any clique $C$ that does not contain $X_i^j$, $\phi_C(\omega) = \phi_C(\omega')$, so all the terms corresponding to cliques that do not contain the point $X_i^j$ cancel from both the numerator and the denominator in Equation (10); therefore, this probability depends only on the values $x_i^j$ at $X_i^j$ and its neighbors. $P$ thus defines a TMF.
A more general proof of this equivalence is given by the Hammersley-Clifford theorem, which will be presented in the following section.

2. Extended Hammersley-Clifford Theorem

Here we will outline a proof of an extension of the Hammersley-Clifford theorem for Tensor Markov Fields (i.e., we will show that a Tensor Markov Field is equivalent to a Tensor Gibbs Field).
Let $\hat{G} = (V, \mathcal{N})$ be a multilayer graph representing a TMF as defined in the previous section, with $V = X_\alpha^\beta = \{X_i^j\}$ a set of vertices over a tensor field and $\mathcal{N}$ a neighborhood law that connects vertices over this tensor field. The field $\hat{G}$ obeys the following neighborhood law, given its Markov property (see Equation (2)):
$$P(X_i^j \mid X_{\hat{G}\setminus X_i^j}) = P(X_i^j \mid X_{N_i^j}) \qquad (11)$$
Here $X_{N_i^j}$ is any neighbor of $X_i^j$. The Hammersley-Clifford theorem states that an MRF is also a local Gibbs field. In the case of a TMF we have the following expression:
$$P(\mathbf{X}) = \frac{1}{Z} \prod_{c \in C_{\hat{G}}} \phi_c(X_c) \qquad (12)$$
In order to prove the equivalence of Equations (11) and (12), we will first build the deductive (backward direction) part of the proof, to be complemented with a constructive (forward direction) part, as presented in the following subsections.

2.1. Backward Direction

Let us consider Equation (11) in the light of Bayes' theorem:
$$P(X_i^j \mid X_{\hat{G}\setminus X_i^j}) = \frac{P(X_i^j, X_{N_i^j})}{P(X_{N_i^j})} \qquad (13)$$
Using a clique approach to calculate the joint and marginal probabilities (see the next subsection to support the following statement), and writing $D_{ij}$ for the closure of $X_i^j$, i.e., the node together with its neighborhood:
$$P(X_i^j \mid X_{\hat{G}\setminus X_i^j}) = \frac{\sum_{\hat{G}\setminus D_{ij}} \prod_{c \in C_{\hat{G}}} \phi_c(X_c)}{\sum_{X_i^j} \sum_{\hat{G}\setminus D_{ij}} \prod_{c \in C_{\hat{G}}} \phi_c(X_c)} \qquad (14)$$
Let us split the product $\prod_{c \in C_{\hat{G}}} \phi_c(X_c)$ into two products: one over the set of cliques that contain $X_i^j$ (let us call it $C_{ij}$) and another over the set of cliques not containing $X_i^j$ (let us call it $R_{ij}$):
$$P(X_i^j \mid X_{\hat{G}\setminus X_i^j}) = \frac{\sum_{\hat{G}\setminus D_{ij}} \prod_{c \in C_{ij}} \phi_c(X_c) \prod_{c \in R_{ij}} \phi_c(X_c)}{\sum_{X_i^j} \sum_{\hat{G}\setminus D_{ij}} \prod_{c \in C_{ij}} \phi_c(X_c) \prod_{c \in R_{ij}} \phi_c(X_c)} \qquad (15)$$
Factoring out the terms depending on $X_i^j$ (which do not contribute to cliques in the domain $\hat{G}\setminus X_i^j$):
$$P(X_i^j \mid X_{\hat{G}\setminus X_i^j}) = \frac{\prod_{c \in C_{ij}} \phi_c(X_c) \sum_{\hat{G}\setminus D_{ij}} \prod_{c \in R_{ij}} \phi_c(X_c)}{\sum_{X_i^j} \prod_{c \in C_{ij}} \phi_c(X_c) \sum_{\hat{G}\setminus D_{ij}} \prod_{c \in R_{ij}} \phi_c(X_c)} \qquad (16)$$
The term $\sum_{\hat{G}\setminus D_{ij}} \prod_{c \in R_{ij}} \phi_c(X_c)$ does not involve $X_i^j$ (by construction), so it can be factored out of the summation over $X_i^j$ in the denominator:
$$P(X_i^j \mid X_{\hat{G}\setminus X_i^j}) = \frac{\prod_{c \in C_{ij}} \phi_c(X_c) \sum_{\hat{G}\setminus D_{ij}} \prod_{c \in R_{ij}} \phi_c(X_c)}{\sum_{\hat{G}\setminus D_{ij}} \prod_{c \in R_{ij}} \phi_c(X_c) \sum_{X_i^j} \prod_{c \in C_{ij}} \phi_c(X_c)} \qquad (17)$$
We can cancel this term in the numerator and the denominator:
$$P(X_i^j \mid X_{\hat{G}\setminus X_i^j}) = \frac{\prod_{c \in C_{ij}} \phi_c(X_c)}{\sum_{X_i^j} \prod_{c \in C_{ij}} \phi_c(X_c)} \qquad (18)$$
Then we multiply numerator and denominator by $\prod_{c \in R_{ij}} \phi_c(X_c)$:
$$P(X_i^j \mid X_{\hat{G}\setminus X_i^j}) = \frac{\prod_{c \in C_{ij}} \phi_c(X_c) \prod_{c \in R_{ij}} \phi_c(X_c)}{\sum_{X_i^j} \prod_{c \in C_{ij}} \phi_c(X_c) \prod_{c \in R_{ij}} \phi_c(X_c)} \qquad (19)$$
Remembering that $C_{ij} \cup R_{ij} = C_{\hat{G}}$,
$$P(X_i^j \mid X_{\hat{G}\setminus X_i^j}) = \frac{\prod_{c \in C_{\hat{G}}} \phi_c(X_c)}{\sum_{X_i^j} \prod_{c \in C_{\hat{G}}} \phi_c(X_c)} \qquad (20)$$
Equation (20) is nothing but the definition of a local Gibbs Tensor Field (Equation (12)).

2.2. Forward Direction

In this subsection we will show how to express the clique potential functions $\phi_c(X_c)$, given the joint probability distribution over the tensor field and the Markov property.
Consider any subset $\sigma \subseteq \hat{G}$ of the multilayer graph $\hat{G}$. We define a candidate potential function (following the Möbius inversion lemma [3]) as follows:
$$f_\sigma(X_\sigma = x_\sigma) = \prod_{\zeta \subseteq \sigma} P(X_\zeta = x_\zeta, X_{\hat{G}\setminus\zeta} = 0)^{(-1)^{|\sigma| - |\zeta|}} \qquad (21)$$
In order for f σ to be a proper clique potential, it must satisfy the following two conditions:
(i) $\prod_{\sigma \subseteq \hat{G}} f_\sigma(X_\sigma) = P(\mathbf{X})$;
(ii) $f_\sigma(X_\sigma) = 1$ whenever $\sigma$ is not a clique.
To prove (i), we need to show that, in the product $\prod_{\sigma \subseteq \hat{G}} f_\sigma(X_\sigma = x_\sigma)$, all factors cancel out except for $P(\mathbf{X})$.
To do this, it will be useful to consider the following combinatorial expansion of zero:
$$0 = (1 - 1)^K = \binom{K}{0} - \binom{K}{1} + \binom{K}{2} - \cdots + (-1)^K \binom{K}{K} \qquad (22)$$
Here, of course, $\binom{A}{B}$ is the number of combinations of $B$ elements from an $A$-element set.
Let us consider any subset $\zeta$ of $\hat{G}$, and let us consider the factor $\Delta = P(X_\zeta = x_\zeta, X_{\hat{G}\setminus\zeta} = 0)$. In the case of $f_\zeta(X_\zeta)$ it occurs as $\Delta^{(-1)^0} = \Delta$. Such a factor also occurs in the functions over subsets containing $\zeta$ and additional elements. If a subset includes $\zeta$ and one additional element, there are $\binom{|\hat{G}| - |\zeta|}{1}$ such functions, and the additional element creates an inverse factor $\Delta^{(-1)^1} = \Delta^{-1}$. The functions over subsets containing $\zeta$ and two additional elements contribute a factor $\Delta^{(-1)^2} = \Delta^{1} = \Delta$. If we continue this process and consider Equation (22), it is evident that all odd-cardinality-difference terms cancel out with all even-cardinality-difference terms, so that the only remaining factor corresponds to $\zeta = \hat{G}$ and equals $P(\mathbf{X})$, thus fulfilling condition (i).
In order to show how condition (ii) is fulfilled, we will need to use the Markov property of TMFs. Let us consider a subset $\sigma^* \subseteq \hat{G}$ that is not a clique. Then it is possible to find two nodes $X_i^h$ and $X_j^k$ in $\sigma^*$ that are not connected to each other. Let us recall Equation (21):
$$f_{\sigma^*}(X_{\sigma^*} = x_{\sigma^*}) = \prod_{\zeta \subseteq \sigma^*} P(X_\zeta = x_\zeta, X_{\hat{G}\setminus\zeta} = 0)^{(-1)^{|\sigma^*| - |\zeta|}} \qquad (23)$$
An arbitrary subset $\zeta$ may belong to any of the following classes: (i) $\zeta = \omega$, a generic subset of $\sigma^*$ containing neither $X_i^h$ nor $X_j^k$; (ii) $\zeta = \omega \cup \{X_i^h\}$; (iii) $\zeta = \omega \cup \{X_j^k\}$; or (iv) $\zeta = \omega \cup \{X_i^h, X_j^k\}$. If we write down Equation (23) factored into these contributions we get:
$$f_{\sigma^*}(X_{\sigma^*} = x_{\sigma^*}) = \prod_{\omega \subseteq \sigma^* \setminus \{X_i^h, X_j^k\}} \left[ \frac{P(X_\omega, X_{\hat{G}\setminus\omega} = 0)\; P(X_{\omega\cup\{X_i^h, X_j^k\}}, X_{\hat{G}\setminus(\omega\cup\{X_i^h, X_j^k\})} = 0)}{P(X_{\omega\cup\{X_i^h\}}, X_{\hat{G}\setminus(\omega\cup\{X_i^h\})} = 0)\; P(X_{\omega\cup\{X_j^k\}}, X_{\hat{G}\setminus(\omega\cup\{X_j^k\})} = 0)} \right]^{(-1)^{|\sigma^*| - |\omega|}} \qquad (24)$$
Let us consider two of the factors in Equation (24) in the light of Bayes' theorem:
$$\frac{P(X_\omega, X_{\hat{G}\setminus\omega} = 0)}{P(X_{\omega\cup\{X_i^h\}}, X_{\hat{G}\setminus(\omega\cup\{X_i^h\})} = 0)} = \frac{P(X_i^h = 0 \mid X_j^k = 0, X_\omega, X_{\hat{G}\setminus(\omega\cup\{X_i^h, X_j^k\})} = 0)\; P(X_j^k = 0, X_\omega, X_{\hat{G}\setminus(\omega\cup\{X_i^h, X_j^k\})} = 0)}{P(X_i^h \mid X_j^k = 0, X_\omega, X_{\hat{G}\setminus(\omega\cup\{X_i^h, X_j^k\})} = 0)\; P(X_j^k = 0, X_\omega, X_{\hat{G}\setminus(\omega\cup\{X_i^h, X_j^k\})} = 0)} \qquad (25)$$
We can notice that the priors in the numerator and the denominator of Equation (25) are the same, so we can cancel them out. Since, by definition, $X_i^h$ and $X_j^k$ are conditionally independent given the rest of the multilayer graph, we can also replace the default value $X_j^k = 0$ with $X_j^k$ instead:
$$\frac{P(X_\omega, X_{\hat{G}\setminus\omega} = 0)}{P(X_{\omega\cup\{X_i^h\}}, X_{\hat{G}\setminus(\omega\cup\{X_i^h\})} = 0)} = \frac{P(X_i^h = 0 \mid X_j^k, X_\omega, X_{\hat{G}\setminus(\omega\cup\{X_i^h, X_j^k\})} = 0)\; P(X_j^k, X_\omega, X_{\hat{G}\setminus(\omega\cup\{X_i^h, X_j^k\})} = 0)}{P(X_i^h \mid X_j^k, X_\omega, X_{\hat{G}\setminus(\omega\cup\{X_i^h, X_j^k\})} = 0)\; P(X_j^k, X_\omega, X_{\hat{G}\setminus(\omega\cup\{X_i^h, X_j^k\})} = 0)} \qquad (26)$$
Since $X_i^h$ and $X_j^k$ are conditionally independent given the rest of the multilayer graph, we can also replace the condition on $X_j^k$ with any other, without affecting $X_i^h$. By adjusting this prior conveniently, we can write:
$$\frac{P(X_\omega, X_{\hat{G}\setminus\omega} = 0)}{P(X_{\omega\cup\{X_i^h\}}, X_{\hat{G}\setminus(\omega\cup\{X_i^h\})} = 0)} = \frac{P(X_{\omega\cup\{X_j^k\}}, X_{\hat{G}\setminus(\omega\cup\{X_j^k\})} = 0)}{P(X_{\omega\cup\{X_i^h, X_j^k\}}, X_{\hat{G}\setminus(\omega\cup\{X_i^h, X_j^k\})} = 0)} \qquad (27)$$
By substituting Equation (27) into Equation (24), we obtain condition (ii):
$$f_{\sigma^*}(X_{\sigma^*}) = 1 \qquad (28)$$
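The construction above can be checked numerically on a small example. The following sketch is purely illustrative: the graph, the pairwise potentials and the chosen configuration are assumptions. It builds a strictly positive Gibbs distribution on four binary nodes, evaluates the candidate potentials of Equation (21) by brute force, and verifies conditions (i) and (ii).

```python
# Numerical check of the Moebius-inversion construction in Eq. (21) on a toy field:
# 4 binary nodes indexed as (variable, layer) and hand-picked positive pairwise potentials.
import itertools
import numpy as np

nodes = [(1, "I"), (2, "I"), (1, "II"), (2, "II")]
edges = {((1, "I"), (2, "I")), ((1, "I"), (1, "II")), ((2, "I"), (2, "II"))}

def phi(xu, xv):
    # Arbitrary strictly positive pairwise potential (same form for every edge here).
    return 1.0 + 0.5 * xu * xv + 0.1 * xu + 0.2 * xv

def joint(x):
    # Unnormalized Gibbs measure: product of pairwise clique potentials, as in Eq. (12).
    p = 1.0
    for (u, v) in edges:
        p *= phi(x[u], x[v])
    return p

configs = [dict(zip(nodes, vals)) for vals in itertools.product([0, 1], repeat=len(nodes))]
Z = sum(joint(x) for x in configs)
P = lambda x: joint(x) / Z

def P_default(x, zeta):
    # P(X_zeta = x_zeta, X_{G\zeta} = 0): clamp every node outside zeta to the default 0.
    y = {v: (x[v] if v in zeta else 0) for v in nodes}
    return P(y)

def f(sigma, x):
    # Candidate potential, Eq. (21): alternating product over all subsets of sigma.
    val = 1.0
    for r in range(len(sigma) + 1):
        for zeta in itertools.combinations(sigma, r):
            val *= P_default(x, set(zeta)) ** ((-1) ** (len(sigma) - r))
    return val

x = {(1, "I"): 1, (2, "I"): 0, (1, "II"): 1, (2, "II"): 1}   # any fixed configuration
subsets = [set(s) for r in range(len(nodes) + 1) for s in itertools.combinations(nodes, r)]

# Condition (i): the product of all candidate potentials recovers P(x).
prod_all = np.prod([f(s, x) for s in subsets])
print(np.isclose(prod_all, P(x)))          # -> True

# Condition (ii): f_sigma = 1 whenever sigma is not a clique of the graph.
def is_clique(s):
    return all(((u, v) in edges) or ((v, u) in edges)
               for u, v in itertools.combinations(s, 2))
print(all(np.isclose(f(s, x), 1.0) for s in subsets if not is_clique(s)))  # -> True
```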

3. An Information-Theoretical Class of Tensor Markov Fields

Let us consider again the set of tensor-valued random variables $\mathbf{X} = X_\alpha^\beta$. It is possible to calculate, for every duplex (pair of elements) in $\mathbf{X}$, the mutual information function $I(\cdot,\cdot)$ [4]:
$$I(X_i^h, X_j^k) = \sum_{\Omega} \sum_{\Omega'} p(X_i^h, X_j^k) \log \frac{p(X_i^h, X_j^k)}{p(X_i^h)\, p(X_j^k)} \qquad (29)$$
Let us consider a multilayer graph scenario. From now on, the indices $i, j$ will refer to the random variables, whereas $h, k$ will be indices for the layers. $\Omega$ and $\Omega'$ are the respective sampling spaces (which may, of course, be equal). In order to discard self-information, let us define the off-diagonal mutual information $\hat{I}$ as follows:
$$\hat{I}(X_i^h, X_j^k) = I(X_i^h, X_j^k) \times \left(1 - \delta_{X_i^h X_j^k}\right) \qquad (30)$$
with the bi-delta function $\delta_{X_i^h X_j^k}$ defined as:
$$\delta_{X_i^h X_j^k} = \begin{cases} 1, & \text{if } i = j \text{ and } h = k \\ 0, & \text{otherwise} \end{cases} \qquad (31)$$
By having the complete set of off-diagonal mutual information functions for all the random variables and layers, it is possible to define the following hyper-matrix elements:
$$A_{ij}^{hk} = \Theta\left[\hat{I}(X_i^h, X_j^k) - I_0\right] \qquad (32)$$
as well as:
$$W_{ij}^{hk} = A_{ij}^{hk} \circ \hat{I}(X_i^h, X_j^k) \qquad (33)$$
Here $\Theta[\cdot]$ is Heaviside's function and $I_0$ is a lower bound for the mutual information (a threshold) to be considered significant.
We call $A_{ij}^{hk}$ and $W_{ij}^{hk}$ the adjacency hypermatrix and the strength hypermatrix, respectively (notice that $\circ$ in Equation (33) represents the product of a scalar times a hypermatrix). The adjacency hypermatrix and the strength hypermatrix define the (unweighted and weighted, respectively) neighborhood law of the associated TMF, hence the statistical dependency structure for the set of random variables and contexts (layers).
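As a rough computational counterpart of Equations (29)-(33) (not the paper's actual pipeline), the sketch below estimates pairwise mutual information from discretized samples and assembles the adjacency and strength hypermatrices; the array shapes, the plug-in MI estimator and the threshold value I0 are illustrative assumptions.

```python
# A minimal sketch of building the adjacency (Eq. 32) and strength (Eq. 33)
# hypermatrices from data; names, shapes and the threshold are assumptions.
import numpy as np

def mutual_information(x, y, bins=8):
    # Plug-in MI estimate (in nats) between two 1-D samples via a joint histogram.
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
N, L, samples = 5, 2, 500                     # N variables, L layers
data = rng.normal(size=(N, L, samples))       # data[i, h, :] ~ realizations of X_i^h

I0 = 0.1                                      # mutual information threshold (arbitrary)
A = np.zeros((N, N, L, L))                    # adjacency hypermatrix A_{ij}^{hk}
W = np.zeros((N, N, L, L))                    # strength hypermatrix  W_{ij}^{hk}
for i in range(N):
    for j in range(N):
        for h in range(L):
            for k in range(L):
                if (i, h) == (j, k):          # off-diagonal MI: discard self-information
                    continue
                mi = mutual_information(data[i, h], data[j, k])
                A[i, j, h, k] = 1.0 if mi > I0 else 0.0   # Heaviside step at I0
                W[i, j, h, k] = A[i, j, h, k] * mi
print(int(A.sum()), "entries pass the threshold")
```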
Although the adjacency and strength hypermatrices are indeed proper representations of the undirected (unweighted and weighted, respectively) dependency structure of $P(\mathbf{X})$, it has been considered advantageous to embed them into a tensor linear structure, in order to work out some of the mathematical properties of such fields relying on the methods of tensor algebra. One relevant proposal in this regard has been advanced by De Domenico and collaborators in the context of multilayer networks.
Following the ideas of De Domenico and co-workers [5], we introduce the unweighted and weighted adjacency 4-tensors (respectively) as follows:
$$A_{\beta\delta}^{\alpha\gamma} = \sum_{h,k=1}^{L} \sum_{i,j=1}^{N} A_{ij}^{hk} \otimes \xi_{\beta\delta}^{\alpha\gamma}[ijhk] \qquad (34)$$
$$W_{\beta\delta}^{\alpha\gamma} = \sum_{h,k=1}^{L} \sum_{i,j=1}^{N} W_{ij}^{hk} \otimes \xi_{\beta\delta}^{\alpha\gamma}[ijhk] \qquad (35)$$
Here, $\xi_{\beta\delta}^{\alpha\gamma} = \xi_{\beta\delta}^{\alpha\gamma}[ijhk]$ is a unit four-tensor whose role is to provide the hypermatrices with the desired linear properties (projections, contractions, etc.). The square brackets indicate that the indices $i, j, h$ and $k$ belong to the $\alpha, \beta, \gamma$ and $\delta$ dimensions, and $\otimes$ represents a form of tensor matricization product (i.e., one producing a 4-tensor out of a 4-index hypermatrix times a unit 4-tensor).

3.1. Conditional Independence in Tensor Markov Fields

In order to discuss the conditional independence structure induced by the present class of TMFs, let us analyze Equation (32). As already mentioned, the adjacency hypermatrix $A_{ij}^{hk}$ represents the neighborhood law (as given by the Markov property) on the multilayer graph $\hat{G}$ (i.e., the TMF). Every non-zero entry of this hypermatrix represents a statistical dependence relation between two elements of $\mathbf{X}$. The conditional dependence structure of TMFs inferred from mutual information measures via Equation (32) is related not only to the statistical independence conditions (as given by a zero mutual information measure between two elements), but also to the lower bound $I_0$ and, in general, to the dependency structure of the whole multilayer graph.
The definition of conditional independence (CI) for tensor random variables is as follows:
$$(X_i^h \perp X_j^k) \mid X_l^m \iff F_{X_i^h, X_j^k \mid X_l^m = X_l^{m*}}(X_i^{h*}, X_j^{k*}) = F_{X_i^h \mid X_l^m = X_l^{m*}}(X_i^{h*}) \cdot F_{X_j^k \mid X_l^m = X_l^{m*}}(X_j^{k*}) \qquad (36)$$
$\forall\, X_i^h, X_j^k, X_l^m \in \mathbf{X}$.
Here $\perp$ represents conditional independence between two random variables, where $F_{X_i^h, X_j^k \mid X_l^m = X_l^{m*}}(X_i^{h*}, X_j^{k*}) = \Pr(X_i^h \le X_i^{h*}, X_j^k \le X_j^{k*} \mid X_l^m = X_l^{m*})$ is the joint conditional cumulative distribution of $X_i^h$ and $X_j^k$ given $X_l^m$, and $X_i^{h*}$, $X_j^{k*}$ and $X_l^{m*}$ are realization events of the corresponding random variables.
In the case of MRFs (and, by extension, TMFs), CI is defined by means of (multi)graph separation: in this sense, we say that $X_i^h \perp_{\hat{G}} X_j^k \mid X_l^m$ iff $X_l^m$ separates $X_i^h$ from $X_j^k$ in the multilayer graph $\hat{G}$. This means that if we remove node $X_l^m$ there are no undirected paths from $X_i^h$ to $X_j^k$ in $\hat{G}$.
Conditional independence in random fields is often considered in terms of subsets of $V$. Let $A$, $B$ and $C$ be three subsets of $V$. The statement $X_A \perp_{\hat{G}} X_B \mid X_C$, which holds iff $C$ separates $A$ from $B$ in the multilayer graph $\hat{G}$ (meaning that, if we remove all vertices in $C$, there will be no paths connecting any vertex in $A$ to any vertex in $B$), is called the global Markov property of TMFs.
The smallest set of vertices that renders a vertex $X_i^h$ conditionally independent of all other vertices in the multilayer graph is called its Markov blanket, denoted $mb(X_i^h)$. If we define the closure of a node $X_i^h$ as $\mathcal{C}(X_i^h) = \{X_i^h\} \cup mb(X_i^h)$, then $X_i^h \perp \hat{G}\setminus\mathcal{C}(X_i^h) \mid mb(X_i^h)$.
It is possible to show that in a TMF the Markov blanket of a vertex is its set of first neighbors. This is called the undirected local Markov property. Starting from the local Markov property, it is possible to show that two vertices $X_i^h$ and $X_j^k$ are conditionally independent given the rest if there is no direct edge between them. This has been called the pairwise Markov property.
If we denote by $\hat{G}_{X_i^h X_j^k}$ the set of undirected paths in the multilayer graph $\hat{G}$ connecting vertices $X_i^h$ and $X_j^k$, then the pairwise Markov property of a TMF can be stated as:
$$X_i^h \perp X_j^k \mid \hat{G}\setminus\{X_i^h, X_j^k\} \iff \hat{G}_{X_i^h X_j^k} = \varnothing \qquad (37)$$
It is clear that the global Markov property implies the local Markov property, which in turn implies the pairwise Markov property. For systems with strictly positive probability densities, it has been proven (in the case of MRFs) that the pairwise Markov property actually implies the global Markov property (see [6], p. 119, for a proof). For the present extension this is important, since it is easier to assess pairwise conditional independence statements.
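The local and pairwise Markov properties can be illustrated directly as graph-separation statements. The following sketch (the toy graph and the networkx dependency are assumptions made only for the illustration) removes a candidate separating set and checks whether any undirected path survives.

```python
# Graph-separation CI on a small two-layer TMF: nodes are (variable, layer) tuples.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ((1, "I"), (2, "I")), ((2, "I"), (3, "I")),      # intra-layer edges, layer I
    ((1, "II"), (2, "II")),                          # intra-layer edge, layer II
    ((2, "I"), (2, "II")), ((3, "I"), (1, "II")),    # inter-layer edges
])

def separated(G, a, b, C):
    # a is graph-separated from b given the set C iff removing C breaks all paths.
    H = G.copy()
    H.remove_nodes_from(C)
    return not nx.has_path(H, a, b)

# Local Markov property: the Markov blanket of a node is its neighborhood.
node = (1, "I")
mb = set(G.neighbors(node))
others = set(G.nodes) - {node} - mb
print(all(separated(G, node, o, mb) for o in others))   # -> True

# Pairwise Markov property: non-adjacent nodes are separated by all remaining nodes.
a, b = (1, "I"), (1, "II")
rest = set(G.nodes) - {a, b}
print((a, b) in G.edges, separated(G, a, b, rest))       # -> False True
```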

3.2. Independence Maps

Let $I_{\hat{G}}$ denote the set of all conditional independence relations encoded by the multilayer graph $\hat{G}$ (i.e., those CI relations given by the global Markov property). Let $I_P$ be the set of all CI relations implied by the probability distribution $P(X_i^j)$. A multilayer graph $\hat{G}$ will be called an independence map (I-map) for a probability distribution $P(X_i^j)$ if all CI relations implied by $\hat{G}$ hold for $P(X_i^j)$, i.e., $I_{\hat{G}} \subseteq I_P$ [6].
The converse statement is not necessarily true, i.e., there may be some CI relations implied by $P(X_i^j)$ that are not encoded in the multilayer graph $\hat{G}$. We are usually interested in minimal I-maps, i.e., I-maps from which none of the edges could be removed without destroying their CI properties.
Every distribution has a unique minimal I-map (and a given graph representation). Let $P(X_i^j) > 0$, and let $\hat{G}$ be the multilayer graph obtained by introducing edges between all pairs of vertices $X_i^h, X_j^k$ such that $X_i^h \not\perp X_j^k \mid \mathbf{X}\setminus\{X_i^h, X_j^k\}$; then $\hat{G}$ is the unique minimal I-map. We call $\hat{G}$ a perfect map of $P$ when there are no dependencies in $\hat{G}$ which are not indicated by $P$, i.e., $I_{\hat{G}} = I_P$ [6].

3.3. Conditional Independence Tests

Conditional independence tests are useful to evaluate whether CI conditions apply, either exactly or, in the case of applications, under a certain bounded error. In order to be able to write down expressions for CI tests, let us introduce the following conditional kernels [7]:
$$C_A(B) = P(B \mid A) = \frac{P(A \cap B)}{P(A)} \qquad (38)$$
as well as their generalized recursive relations:
$$C_{A \cap B \cap C}(D) = C_{A \cap B}(D \mid C) = \frac{C_{A \cap B}(C \cap D)}{C_{A \cap B}(C)} \qquad (39)$$
The conditional probability of $X_h^k$ given $X_i^j$ can thus be written as:
$$C_{X_i^j}(X_h^k) = P(X_h^k \mid X_i^j) = \frac{P(X_h^k, X_i^j)}{P(X_i^j)} \qquad (40)$$
We can then write down expressions for Markov conditional independence as follows:
$$X_i^j \perp X_h^k \mid X_l^m \iff P(X_i^j, X_h^k \mid X_l^m) = P(X_i^j \mid X_l^m) \times P(X_h^k \mid X_l^m) \qquad (41)$$
Following Bayes' theorem, CI conditions will, in this case, be of the form:
$$P(X_i^j, X_h^k \mid X_l^m) = \frac{P(X_i^j, X_l^m)}{P(X_l^m)} \times \frac{P(X_h^k, X_l^m)}{P(X_l^m)} = \frac{P(X_i^j, X_l^m) \times P(X_h^k, X_l^m)}{P(X_l^m)^2} \qquad (42)$$
Equation (42) is useful since, in large-scale data applications, it is computationally cheaper to work with joint and marginal probabilities than with conditionals.
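A simple empirical counterpart of Equation (42) (a sketch under assumptions: discrete data, plug-in frequency estimates, and synthetic samples generated only for the demonstration) compares the two sides of the equation over all observed value combinations; the largest discrepancy should be close to zero when the CI condition holds.

```python
# Test the CI condition of Eq. (42) from discrete samples: compare P(x, y | z)
# against P(x, z) * P(y, z) / P(z)^2 for every observed value triple.
import numpy as np
from collections import Counter

def ci_gap(x, y, z):
    # x, y, z: equal-length 1-D arrays of discrete observations of X_i^j, X_h^k, X_l^m.
    n = len(z)
    p_xyz = Counter(zip(x, y, z))
    p_xz = Counter(zip(x, z))
    p_yz = Counter(zip(y, z))
    p_z = Counter(z)
    gap = 0.0
    for (a, b, c), nxyz in p_xyz.items():
        lhs = nxyz / p_z[c]                                              # P(x, y | z)
        rhs = (p_xz[(a, c)] / n) * (p_yz[(b, c)] / n) / (p_z[c] / n) ** 2
        gap = max(gap, abs(lhs - rhs))
    return gap   # ~0 under conditional independence (up to sampling noise)

rng = np.random.default_rng(1)
z = rng.integers(0, 2, 5000)
x = (z + rng.integers(0, 2, 5000)) % 2   # depends on z only
y = (z + rng.integers(0, 2, 5000)) % 2   # depends on z only -> x CI of y given z
print(ci_gap(x, y, z))                   # typically a small value, well below 0.05
```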
Now let us consider the case of conditional independence given several conditioning variables. The case of CI given two variables could be written, using conditional kernels, as follows:
$$X_i^j \perp X_h^k \mid X_l^m, X_n^o \iff P(X_i^j, X_h^k \mid X_l^m, X_n^o) = P(X_i^j \mid X_l^m, X_n^o) \times P(X_h^k \mid X_l^m, X_n^o) \qquad (43)$$
Hence,
$$P(X_i^j, X_h^k \mid X_l^m, X_n^o) = C_{X_l^m, X_n^o}(X_i^j) \times C_{X_l^m, X_n^o}(X_h^k) \qquad (44)$$
Using Bayes’ theorem,
$$P(X_i^j, X_h^k \mid X_l^m, X_n^o) = \frac{P(X_i^j, X_l^m, X_n^o)}{P(X_l^m, X_n^o)} \times \frac{P(X_h^k, X_l^m, X_n^o)}{P(X_l^m, X_n^o)} \qquad (45)$$
$$P(X_i^j, X_h^k \mid X_l^m, X_n^o) = \frac{P(X_i^j, X_l^m, X_n^o) \times P(X_h^k, X_l^m, X_n^o)}{P(X_l^m, X_n^o)^2} \qquad (46)$$
In order to generalize the previous results to CI relations given an arbitrary set of conditioning variables, let us consider the following sigma-algebraic approach.
Let $\Sigma_{ih}^{jk}$ be the $\sigma$-algebra of all subsets of $\mathbf{X}$ that do not contain $X_i^j$ or $X_h^k$. If we consider the contravariant index $i \in \alpha$ with $i = 1, 2, \ldots, N$ and the covariant index $j \in \beta$ with $j = 1, 2, \ldots, L$, then there are $M = \binom{NL}{2}$ such $\sigma$-algebras in $\mathbf{X}$ (let us recall that TMFs are undirected graphical models).
A relevant problem for network reconstruction is that of establishing the most general Markov pairwise CI conditions, i.e., the CI relations for every edge not drawn in the graph. Two arbitrary nodes $X_i^j$ and $X_h^k$ are conditionally independent given the rest of the graph iff:
$$X_i^j \perp X_h^k \mid \Sigma_{ih}^{jk} \iff P(X_i^j, X_h^k \mid \Sigma_{ih}^{jk}) = P(X_i^j \mid \Sigma_{ih}^{jk}) \times P(X_h^k \mid \Sigma_{ih}^{jk}) \qquad (47)$$
By using the conditional kernels, the recursive relations and Bayes' theorem, it is possible to write down $M$ expressions of the form:
$$P(X_i^j, X_h^k \mid \Sigma_{ih}^{jk}) = \frac{P(X_i^j, \Sigma_{ih}^{jk}) \times P(X_h^k, \Sigma_{ih}^{jk})}{P(\Sigma_{ih}^{jk})^2} \qquad (48)$$
The family of Equations (48) represents the CI relations for all the non-existing edges in the hypergraph $\hat{G}$, i.e., every pair of nodes $X_i^j$ and $X_h^k$ not connected in $\hat{G}$ must be conditionally independent given the rest of the nodes in the graph. These expressions may serve to implement exact tests or optimization strategies for graph reconstruction and/or graph sparsification in applications, considering a mutual information threshold $I_0$ as in Equation (32).
In brief, for every node pair with a mutual information value less than $I_0$, the presented graph reconstruction approach will not draw an edge, hence implying CI between the two nodes given the rest. Such a CI condition may be tested on the data to see whether it holds, or the threshold itself can be determined by resorting to optimization schemes (e.g., error bounds) in Equation (48).

4. Graph Theoretical Features and Multilinear Structure

Once the probabilistic properties of TMFs have been set, it may be fitting to briefly present some of their graph-theoretical features, as well as some preliminaries regarding the reasons to embed hyperadjacency matrices into multilayer adjacency tensors. Given that TMFs are indeed PGMs, some of their graph characteristics will be relevant here.
Since the work by De Domenico and coworkers [5] covers in great detail how the multilinear structure of the multilayer adjacency tensor allows the calculation of these quantities (usually as projection operations), we will only mention connectivity degree vectors, since these are related to the size of the TMF dependency neighborhoods.
Let us recall the multilayer adjacency tensors, as defined in Equations (34) and (35). To ease presentation, we will work with the unweighted tensor $A_{\beta\delta}^{\alpha\gamma}$ (Equation (34)). The multidegree centrality vector $K^\alpha$, which contains the connectivity degrees of the nodes spanning the different layers, can be written as follows:
$$K^\alpha = A_{\beta\delta}^{\alpha\gamma}\, U_{\gamma}^{\delta}\, u^{\beta} \qquad (49)$$
Here $U_{\gamma}^{\delta}$ is a rank-2 tensor that contains a 1 in every component and $u^{\beta}$ is a rank-1 tensor that contains a 1 in every component; these quantities are called 1-tensors by De Domenico and coworkers [5]. It can be shown that $K^\alpha$ is indeed given by the sum of the connectivity degree vectors $k^\alpha$ corresponding to all the different layers:
$$K^\alpha = \sum_{h=1}^{L} \sum_{k=1}^{L} k^\alpha(hk) \qquad (50)$$
Here $k^\alpha(hk)$ is the vector of connections that nodes in the set $\alpha = 1, 2, \ldots, N$ in layer $h$ have to any other nodes in layer $k$, whereas $K^\alpha$ is the vector of connections across all the layers. Appropriate projections will yield measures such as the size of the neighborhood of a given vertex, $|N_{X_i^j}|$, the size of its Markov blanket, $|mb(X_i^h)|$, or other similar quantities.
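In array terms, the contraction in Equation (49) and the layer-wise sum in Equation (50) can be sketched with numpy as follows; the random stand-in hypermatrix, its index ordering (i, j, h, k) and the use of einsum are illustrative assumptions rather than the formalism's prescribed implementation.

```python
# Multidegree centrality, Eq. (49): contract the layer indices with all-ones tensors
# and the second node index with an all-ones vector; then compare with Eq. (50).
import numpy as np

N, L = 5, 2
rng = np.random.default_rng(2)
A = (rng.random((N, N, L, L)) > 0.6).astype(float)   # stand-in adjacency hypermatrix
A = np.maximum(A, A.transpose(1, 0, 3, 2))           # enforce undirectedness
idx = np.arange(N)
A[idx, idx, :, :] *= (1 - np.eye(L))                 # drop self-information entries (i=j, h=k)

U = np.ones((L, L))                                   # rank-2 "1-tensor" over layers
u = np.ones(N)                                        # rank-1 "1-tensor" over nodes

# K^alpha = A^{alpha gamma}_{beta delta} U^delta_gamma u^beta, Eq. (49)
K = np.einsum('ijhk,hk,j->i', A, U, u)

# Equivalently, Eq. (50): sum of per-layer-pair degree vectors k^alpha(hk)
k_layers = A.sum(axis=1)                              # k^alpha(hk), shape (N, L, L)
print(np.allclose(K, k_layers.sum(axis=(1, 2))))      # -> True
```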

5. Specific Applications

After having considered some of the properties of this class of Tensor Markov Fields, it may become evident that, aside from their purely theoretical importance, there are a number of relevant applications in which they may arise as probabilistic graphical models for tensor-valued problems; among the somewhat evident ones are the following:
  • The analysis of multidimensional biomolecular networks such as the ones arising from multi-omic experiments (For a real-life example, see Figure 4) [8,9,10];
  • Probabilistic graphical models in computer vision (especially 3D reconstructions and 4D [3D+time] rendering) [11];
  • The study of fracture mechanics in continuous deformable media [12];
  • Probabilistic network models for seismic dynamics [13];
  • Boolean networks in control theory [14].
Some of these problems are indeed being treated as multiple instances of Markov fields, or as multipartite graphs or hypergraphs. However, it may become evident that, when random variables across layers are interdependent (which is often the case), the definitions of potentials, cliques and partition functions, as well as the conditional statistical independence features, become manageable (and in some cases even meaningful) under the presented formalism of Tensor Markov Fields.

6. Conclusions

Here we have presented the definitions and fundamental properties of Tensor Markov Fields, i.e., Markov random fields over tensor spaces. We have proved, by extending the results of Dobruschin, Hammersley and Clifford to such tensor-valued fields, that Tensor Markov Fields are indeed Gibbs fields whenever strictly positive probability measures are considered. We also introduced a class of Tensor Markov Fields obtained by using information-theoretical statistical dependence measures inducing local and global Markov properties, and showed how these can be used as probabilistic graphical models in multi-context environments, much in the spirit of the so-called multilayer network approach. Finally, we discussed the convenience of embedding Tensor Markov Fields in the multilinear tensor representation of multilayer networks.

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Dobruschin, R.L. The description of a random field by means of conditional probabilities and conditions of its regularity. Theory Probab. Appl. 1968, 13, 197–224.
  2. Grimmett, G.R. A theorem about random fields. Bull. Lond. Math. Soc. 1973, 5, 81–84.
  3. Rota, G.C. On the foundations of combinatorial theory I: Theory of Möbius functions. Probab. Theory Relat. Fields 1964, 2, 340–368.
  4. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 2012.
  5. De Domenico, M.; Solé-Ribalta, A.; Cozzo, E.; Kivelä, M.; Moreno, Y.; Porter, M.A.; Gómez, S.; Arenas, A. Mathematical formulation of multilayer networks. Phys. Rev. X 2013, 3, 041022.
  6. Koller, D.; Friedman, N. Probabilistic Graphical Models: Principles and Techniques; MIT Press: Cambridge, MA, USA, 2009.
  7. Williams, D. Probability with Martingales; Cambridge University Press: Cambridge, UK, 1991.
  8. Hernández-Lemus, E.; Espinal-Enríquez, J.; de Anda-Jáuregui, G. Probabilistic multilayer networks. arXiv 2018, arXiv:1808.07857.
  9. De Anda-Jauregui, G.; Hernandez-Lemus, E. Computational Oncology in the Multi-Omics Era: State of the Art. Front. Oncol. 2020, 10, 423.
  10. Hernández-Lemus, E.; Reyes-Gopar, H.; Espinal-Enríquez, J.; Ochoa, S. The Many Faces of Gene Regulation in Cancer: A Computational Oncogenomics Outlook. Genes 2019, 10, 865.
  11. McGee, F.; Ghoniem, M.; Melançon, G.; Otjacques, B.; Pinaud, B. The state of the art in multilayer network visualization. In Computer Graphics Forum; John Wiley & Sons: Hoboken, NJ, USA, 2019; Volume 38, pp. 125–149.
  12. Krejsa, M.; Koubová, L.; Flodr, J.; Protivínský, J.; Nguyen, Q.T. Probabilistic prediction of fatigue damage based on linear fracture mechanics. Fract. Struct. Integr. 2017, 39, 143–159.
  13. Abe, S.; Suzuki, N. Complex network of earthquakes. In International Conference on Computational Science; Springer: Berlin/Heidelberg, Germany, 2004; pp. 1046–1053.
  14. Liu, F.; Cui, Y.; Wang, J.; Ji, D. Observability of probabilistic Boolean multiplex networks. Asian J. Control 2020, 1, 1–8.
Figure 1. A Tensor Markov Field, represented as a multilayer graph spanning over $X_i^j$ with $i = \{1, 2, 3, 4\}$ and $j = \{I, II\}$. To illustrate, layer $I$ is colored in blue and layer $II$ is colored in green.
Figure 2. Three different configurations of a Tensor Markov Field: panels (i), (ii) and (iii) present different configurations or states of the TMF. Labels are represented by color intensity.
Figure 3. Cliques on a Tensor Markov Field: the set $\{X_2^I, X_3^I, X_4^I\}$ forms an intra-layer 2-clique (as marked by the red edges, all on layer $I$), and the set $\{X_3^I, X_3^{II}\}$ forms an inter-layer 1-clique (marked by the blue edge connecting layers $I$ and $II$). However, the set $\{X_3^I, X_3^{II}, X_4^I, X_4^{II}\}$ is not a clique, since there are no edges between $X_3^I$ and $X_4^{II}$ nor between $X_3^{II}$ and $X_4^{I}$.
Figure 4. Gene and microRNA regulatory network: a Tensor Markov Field depicting the statistical dependence of genome-wide gene and microRNA (miR) expression on a human phenotype. Edge width is given by the mutual information $I(X_i^j, X_h^k)$ between the expression levels of genes (layer $j$) and miRs (layer $k$) in a very large corpus of RNA-Seq samples; vertex size is proportional to the degree, i.e., the size of the node's neighborhood, $N_{X_i^j}$.
