Article

Conditioning in Tropical Probability Theory

by Rostislav Matveev 1,* and Jacobus W. Portegies 2,*

1 Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany
2 Department of Mathematics and Computer Science, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands
* Authors to whom correspondence should be addressed.
Entropy 2023, 25(12), 1641; https://doi.org/10.3390/e25121641
Submission received: 6 November 2023 / Revised: 4 December 2023 / Accepted: 7 December 2023 / Published: 9 December 2023

Abstract: We define a natural operation of conditioning of tropical diagrams of probability spaces and show that it is Lipschitz continuous with respect to the asymptotic entropy distance.

1. Introduction

In [1,2], we have initiated the study of tropical probability spaces and their diagrams. In [1], we endowed (commutative) diagrams of probability spaces with the intrinsic entropy distance and, in [2], we defined tropical diagrams as points in the asymptotic cone of the metric space. They are represented by certain sequences of diagrams of probability spaces.
We expect that tropical diagrams will be helpful in the study of information optimization problems, such as the ones considered in [3,4,5,6,7,8], and we have indeed applied them to derive a dimension-reduction result for the shape of the entropic cone in [9].
In the present article, we introduce the notion of conditioning on a space in a tropical diagram and show that this operation is Lipschitz continuous with respect to the asymptotic entropy distance.
The result is rather technical, which is why we treat it in this separate article; it is nonetheless an important ingredient in the theory, and in particular we need it for the dimension-reduction result mentioned above.
Given a tuple of finite-valued random variables $(X_i)_{i=1}^{n}$ and a random variable $Y$, one may "condition" the collection $(X_i)$ on $Y$. The result of this operation is a family of $n$-tuples of random variables, denoted $(X_i|Y)_{i=1}^{n}$, parameterized by those values of $Y$ that have positive probability. Each tuple of random variables in this family is defined on a separate probability space.
When passing to the tropical setting, the situation is different: when we condition a tropical diagram $[\mathcal{X}]$ on a space $[Y]$, the result is again a tropical diagram $[\mathcal{X}|Y]$ rather than a family. After recalling some preliminaries in Section 2, we describe the operation of conditioning and prove that the result depends in a Lipschitz way on the original diagram in Section 3.

2. Preliminaries

Our main objects of study are commutative diagrams of probability spaces and their tropical counterparts. In this section, we recall briefly the main definitions and results.

2.1. Probability Spaces and Their Diagrams

2.1.1. Probability Spaces

By a finite probability space, we mean a set with a probability measure that has finite support. A reduction from one probability space to another is an equivalence class of measure-preserving maps, where two maps are equivalent if they coincide on a set of full measure. We call a point $x$ in a probability space $X = (\underline{X}, p)$ an atom if it has positive weight, and we write $x \in X$ to mean that $x$ is an atom in $X$ (as opposed to $x \in \underline{X}$ for points in the underlying set). For a probability space $X$, we denote by $|X|$ the cardinality of the support of the probability measure.
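To make these objects concrete, here is a minimal Python sketch (ours, not from the paper; all helper names are illustrative) modelling a finite probability space as a dictionary from atoms to positive weights, together with the support cardinality $|X|$ and the Shannon entropy used throughout:

```python
from math import log2

# A finite probability space X = (underlying set, p): atoms mapped to positive weights.
X = {"a": 0.5, "b": 0.25, "c": 0.25}

def support_size(space):
    """|X|: the number of atoms, i.e., points of positive weight."""
    return sum(1 for w in space.values() if w > 0)

def entropy(space):
    """Shannon entropy (in bits) of a finite probability space."""
    return -sum(w * log2(w) for w in space.values() if w > 0)

print(support_size(X), entropy(X))  # 3 1.5
```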

2.1.2. Indexing Categories

To record the combinatorial structure of commutative diagrams of probability spaces and reductions, we use an object that we call an indexing category. By an indexing category, we mean a finite category $\boldsymbol{G}$ such that for any pair of objects $i, j \in \boldsymbol{G}$, there is at most one morphism between them either way. In addition, we will assume it satisfies one further property, which we describe after introducing some terminology. For a pair of objects $i, j \in \boldsymbol{G}$ such that there is a morphism $\gamma_{ij} : i \to j$, object $i$ will be called an ancestor of $j$ and object $j$ will be called a descendant of $i$. The subcategory of all descendants of an object $i \in \boldsymbol{G}$ is called the ideal generated by $i$, while the subcategory consisting of all ancestors of $i$, together with all the morphisms in it, is called the co-ideal generated by $i$. (The term filter is also used for a co-ideal in the literature on lattices.)
The additional property that an indexing category has to satisfy is that any pair of objects $i, j \in \boldsymbol{G}$ has a minimal common ancestor $\hat{\imath}$: an ancestor of both $i$ and $j$ such that any other common ancestor of the two is also an ancestor of $\hat{\imath}$; in other words, $\boldsymbol{G}$ is an upper semi-lattice.
An equivalent formulation of this property is the following: the intersection of the co-ideals generated by two objects $i, j \in \boldsymbol{G}$ is itself the co-ideal generated by some object $\hat{\imath} \in \boldsymbol{G}$.
Any indexing category $\boldsymbol{G}$ necessarily has an initial object, that is, an object $i_0$ such that $\boldsymbol{G}$ coincides with the ideal generated by $i_0$.
A fan in a category is a pair of morphisms with the same domain. A fan $(i \leftarrow k \to j)$ is called minimal if, for any other fan $(i \leftarrow l \to j)$ included in a commutative diagram
[Commutative diagram: the fan $(i \leftarrow k \to j)$ placed above the fan $(i \leftarrow l \to j)$, connected by a vertical arrow $k \to l$ making both triangles commute]
the vertical arrow must be an isomorphism; in other words, $k$ is a minimal common ancestor of $i$ and $j$.
For any pair of objects $i, j$ in an indexing category $\boldsymbol{G}$, there exists a unique minimal fan $(i \leftarrow \hat{\imath} \to j)$ in $\boldsymbol{G}$.

2.1.3. Diagrams

We denote by $\mathbf{Prob}$ the category of finite probability spaces and reductions, i.e., equivalence classes of measure-preserving maps. For an indexing category $\boldsymbol{G} = \{i; \gamma_{ij}\}$, a $\boldsymbol{G}$-diagram is a functor $\mathcal{X} : \boldsymbol{G} \to \mathbf{Prob}$. A reduction $f$ from one $\boldsymbol{G}$-diagram $\mathcal{X} = \{X_i; \chi_{ij}\}$ to another $\mathcal{Y} = \{Y_i; \upsilon_{ij}\}$ is a natural transformation between the functors. It amounts to a collection of reductions $f_i : X_i \to Y_i$ such that the big diagram consisting of all spaces $X_i$, $Y_i$ and all morphisms $\chi_{ij}$, $\upsilon_{ij}$, and $f_i$ is commutative. The category of $\boldsymbol{G}$-diagrams and reductions will be denoted $\mathbf{Prob}\langle\boldsymbol{G}\rangle$. The construction of diagrams can be iterated; thus, we can consider $\boldsymbol{H}$-diagrams of $\boldsymbol{G}$-diagrams and denote the corresponding category $\mathbf{Prob}\langle\boldsymbol{G}\rangle\langle\boldsymbol{H}\rangle = \mathbf{Prob}\langle\boldsymbol{G}, \boldsymbol{H}\rangle$. Every $\boldsymbol{H}$-diagram of $\boldsymbol{G}$-diagrams can also be considered as a $\boldsymbol{G}$-diagram of $\boldsymbol{H}$-diagrams; thus, there is a natural equivalence of categories $\mathbf{Prob}\langle\boldsymbol{G}, \boldsymbol{H}\rangle \cong \mathbf{Prob}\langle\boldsymbol{H}, \boldsymbol{G}\rangle$.
A $\boldsymbol{G}$-diagram $\mathcal{X}$ will be called minimal if it maps minimal fans in $\boldsymbol{G}$ to minimal fans in the target category. The subcategory of all minimal $\boldsymbol{G}$-diagrams will be denoted $\mathbf{Prob}\langle\boldsymbol{G}\rangle^{\mathrm{m}}$. In [1], we have shown that for any fan in $\mathbf{Prob}$ or in $\mathbf{Prob}\langle\boldsymbol{G}\rangle$, its minimization exists and is unique up to isomorphism.

2.1.4. Tensor Product

The tensor product of two probability spaces $X = (\underline{X}, p)$ and $Y = (\underline{Y}, q)$ is their independent product $X \otimes Y := (\underline{X} \times \underline{Y}, p \otimes q)$. For two $\boldsymbol{G}$-diagrams $\mathcal{X} = \{X_i; \chi_{ij}\}$ and $\mathcal{Y} = \{Y_i; \upsilon_{ij}\}$, we define their tensor product componentwise as $\mathcal{X} \otimes \mathcal{Y} = \{X_i \otimes Y_i; \chi_{ij} \times \upsilon_{ij}\}$.
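Continuing the sketch above, the tensor product of two finite probability spaces is simply the product measure on pairs of atoms (again a hypothetical helper of ours):

```python
def tensor(X, Y):
    """Independent product X (x) Y: the product measure on pairs of atoms."""
    return {(x, y): p * q for x, p in X.items() for y, q in Y.items()}
```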

2.1.5. Constant Diagrams

Given an indexing category $\boldsymbol{G}$ and a probability space $X$, we can form the constant diagram $X^{\boldsymbol{G}}$, which has all spaces equal to $X$ and all reductions equal to the identity isomorphism. Sometimes, when such a constant diagram is included in a diagram together with other $\boldsymbol{G}$-diagrams (such as, for example, a reduction $\mathcal{X} \to X^{\boldsymbol{G}}$), we will write simply $X$ in place of $X^{\boldsymbol{G}}$.

2.1.6. Entropy

Evaluating entropy on every space in a $\boldsymbol{G}$-diagram, we obtain a tuple of non-negative numbers indexed by the objects in $\boldsymbol{G}$; thus, entropy gives a map
$$\operatorname{Ent}_* : \mathbf{Prob}\langle\boldsymbol{G}\rangle \to \mathbb{R}^{\boldsymbol{G}},$$
where the target space $\mathbb{R}^{\boldsymbol{G}}$ is the space of real-valued functions on the set of objects in $\boldsymbol{G}$, endowed with the $\ell^{1}$-norm. Entropy is a homomorphism in that it satisfies
$$\operatorname{Ent}_*(\mathcal{X} \otimes \mathcal{Y}) = \operatorname{Ent}_*(\mathcal{X}) + \operatorname{Ent}_*(\mathcal{Y}).$$
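Using the entropy and tensor helpers from the sketches above, the homomorphism property can be checked numerically on an example:

```python
X = {"a": 0.5, "b": 0.5}
Y = {0: 0.9, 1: 0.1}

# Ent(X (x) Y) = Ent(X) + Ent(Y), up to floating-point error.
assert abs(entropy(tensor(X, Y)) - (entropy(X) + entropy(Y))) < 1e-12
```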

2.1.7. Entropy Distance

Let $\boldsymbol{G}$ be an indexing category and $\mathcal{K} = (\mathcal{X} \leftarrow \mathcal{Z} \to \mathcal{Y})$ a fan of $\boldsymbol{G}$-diagrams. We define the entropy distance of the fan as
$$\operatorname{kd}(\mathcal{K}) := \|\operatorname{Ent}_* \mathcal{Z} - \operatorname{Ent}_* \mathcal{X}\|_1 + \|\operatorname{Ent}_* \mathcal{Z} - \operatorname{Ent}_* \mathcal{Y}\|_1.$$
The intrinsic entropy distance between two $\boldsymbol{G}$-diagrams is defined to be the infimal entropy distance over all fans with terminal diagrams $\mathcal{X}$ and $\mathcal{Y}$:
$$\mathbf{k}(\mathcal{X}, \mathcal{Y}) := \inf\big\{\operatorname{kd}(\mathcal{K}) : \mathcal{K} = (\mathcal{X} \leftarrow \mathcal{Z} \to \mathcal{Y})\big\}.$$
The intrinsic entropy distance was introduced in [10,11] for probability spaces.
In [1], it is shown that the infimum is attained, that the optimal fan is minimal, that $\mathbf{k}$ is a pseudo-distance which vanishes if and only if $\mathcal{X}$ and $\mathcal{Y}$ are isomorphic, and that $\operatorname{Ent}_*$ is a 1-Lipschitz linear functional with respect to $\mathbf{k}$.
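For the simplest indexing category (a single object), a fan $(X \leftarrow Z \to Y)$ is just a coupling $Z$ with marginals $X$ and $Y$, and since reductions cannot increase entropy, $\operatorname{kd}$ reduces to $(\operatorname{Ent}(Z) - \operatorname{Ent}(X)) + (\operatorname{Ent}(Z) - \operatorname{Ent}(Y))$. The following sketch (ours, building on the helpers above) computes it from a coupling given as a distribution on pairs:

```python
def marginal(joint, axis):
    """Push a distribution on pairs forward to one coordinate."""
    out = {}
    for atoms, w in joint.items():
        out[atoms[axis]] = out.get(atoms[axis], 0.0) + w
    return out

def kd_of_coupling(joint):
    """kd of the fan (X <- Z -> Y) whose initial space Z is the coupling itself."""
    hz = entropy(joint)
    return (hz - entropy(marginal(joint, 0))) + (hz - entropy(marginal(joint, 1)))

# Independence coupling of two fair coins: kd = Ent(X) + Ent(Y) = 2 bits;
# the diagonal coupling of two copies of one coin would give kd = 0.
print(kd_of_coupling(tensor({"h": 0.5, "t": 0.5}, {"h": 0.5, "t": 0.5})))  # 2.0
```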

2.2. Diagrams of Sets, Distributions, and Empirical Reductions

2.2.1. Distributions on Sets

For a set $S$, we denote by $\Delta S$ the collection of all finitely supported probability distributions on $S$. For a pair of distributions $\pi_1, \pi_2 \in \Delta S$, we denote by $\|\pi_1 - \pi_2\|_1$ the total variation distance between them.
For a map $f : S \to S'$ between two sets, we denote by $f_* : \Delta S \to \Delta S'$ the induced affine map (the map preserving convex combinations).
For $n \in \mathbb{N}$, we define the empirical map $q : S^n \to \Delta S$ by the following assignment: for $\bar{s} = (s_1, \dots, s_n) \in S^n$ and $A \subset S$,
$$q(\bar{s})(A) := \frac{1}{n} \cdot \big|\{k : s_k \in A\}\big|.$$
For a finite probability space $X = (S, p)$, the empirical distribution on $\Delta X$ is the push-forward $\tau_n := q_* p^{\otimes n}$. Thus,
$$q : X^{\otimes n} \to (\Delta X, \tau_n)$$
is a reduction of finite probability spaces. The construction of the empirical reduction is functorial; that is, for a reduction between two probability spaces $f : X \to Y$, the square of reductions
[Commutative square: $f^{\otimes n} : X^{\otimes n} \to Y^{\otimes n}$ on one side and the induced map $f_* : (\Delta X, \tau_n) \to (\Delta Y, \tau_n)$ on the other, intertwined by the empirical reductions $q$]
commutes.
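In code, the empirical map just records normalized letter frequencies (a sketch under the same conventions as above):

```python
from collections import Counter

def empirical(seq):
    """Empirical map q: S^n -> Delta S, sending a sample to its frequency distribution."""
    n = len(seq)
    return {s: c / n for s, c in Counter(seq).items()}

print(empirical("abaa"))  # {'a': 0.75, 'b': 0.25}
```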

2.2.2. Distributions on Diagrams of Sets

Let $\mathbf{Set}$ denote the category of sets and surjective maps. For an indexing category $\boldsymbol{G}$, we denote by $\mathbf{Set}\langle\boldsymbol{G}\rangle$ the category of $\boldsymbol{G}$-diagrams in $\mathbf{Set}$. The objects in $\mathbf{Set}\langle\boldsymbol{G}\rangle$ are commutative diagrams of sets indexed by $\boldsymbol{G}$, in which the arrows represent surjective maps, subject to commutativity relations.
For a diagram of sets $\mathcal{S} = \{S_i; \sigma_{ij}\}$, we define the space of distributions on the diagram $\mathcal{S}$ by
$$\Delta\mathcal{S} := \Big\{(\pi_i)_i \in \prod_i \Delta S_i : (\sigma_{ij})_* \pi_i = \pi_j\Big\}.$$
If $S_0$ is the initial set of $\mathcal{S}$, then there is an isomorphism
$$\Delta S_0 \cong \Delta\mathcal{S}, \qquad \pi_0 \mapsto \big((\sigma_{0i})_* \pi_0\big)_i, \tag{1}$$
whose inverse sends $(\pi_i)_i \in \Delta\mathcal{S}$ to its component $\pi_0$.
Given a $\boldsymbol{G}$-diagram of sets $\mathcal{S} = \{S_i; \sigma_{ij}\}$ and an element $\pi \in \Delta\mathcal{S}$, we can construct a $\boldsymbol{G}$-diagram of probability spaces $(\mathcal{S}, \pi) := \{(S_i, \pi_i); \sigma_{ij}\}$. Note that any diagram $\mathcal{X}$ of probability spaces has this form.

2.3. Conditioning

Consider a $\boldsymbol{G}$-diagram of probability spaces $\mathcal{X} = (\mathcal{S}, \pi)$, where $\mathcal{S}$ is a diagram of sets and $\pi \in \Delta\mathcal{S}$. Let $X_0 = (S_0, \pi_0)$ be the initial space in $\mathcal{X}$ and let $U := X_i$ be another space in $\mathcal{X}$. Since $S_0$ is initial, there is a map $\sigma_{0i} : S_0 \to S_i$. Fix an atom $u \in U$ and define the conditioned distribution $\pi_0(\,\cdot\,|u)$ on $S_0$ as the distribution supported on $\sigma_{0i}^{-1}(u)$ and given for every $s \in \sigma_{0i}^{-1}(u)$ by
$$\pi_0(s|u) := \frac{\pi_0(s)}{\pi_0\big(\sigma_{0i}^{-1}(u)\big)}.$$
Let $\pi(\,\cdot\,|u) \in \Delta\mathcal{S}$ be the distribution corresponding to $\pi_0(\,\cdot\,|u)$ under the isomorphism in (1). We define the conditioned $\boldsymbol{G}$-diagram as $\mathcal{X}|u := (\mathcal{S}, \pi(\,\cdot\,|u))$.
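For a two-space diagram whose initial space is a distribution on pairs, this operation specializes to ordinary conditioning: restrict to the fiber over an atom and renormalize. A minimal sketch (ours):

```python
def condition(joint, axis, u):
    """Restrict a distribution on pairs to the fiber over atom u of coordinate `axis`, renormalized."""
    fiber = {atoms: w for atoms, w in joint.items() if atoms[axis] == u}
    total = sum(fiber.values())
    return {atoms: w / total for atoms, w in fiber.items()}

Z = {("x", "u"): 0.3, ("y", "u"): 0.3, ("y", "v"): 0.4}
print(condition(Z, 1, "u"))  # {('x', 'u'): 0.5, ('y', 'u'): 0.5}
```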

2.4. The Slicing Lemma

In [1], we prove the so-called Slicing Lemma that allows us to estimate the intrinsic entropy distance between two diagrams in terms of distances between conditioned diagrams. Among the corollaries of the Slicing Lemma is the following inequality.
Proposition 1. 
Let $(\mathcal{X} \leftarrow \hat{\mathcal{X}} \to U^{\boldsymbol{G}}) \in \mathbf{Prob}\langle\boldsymbol{G}, \boldsymbol{\Lambda}_2\rangle$ be a fan of $\boldsymbol{G}$-diagrams of probability spaces and let $\mathcal{Y} \in \mathbf{Prob}\langle\boldsymbol{G}\rangle$ be another diagram. Then,
$$\mathbf{k}(\mathcal{X}, \mathcal{Y}) \le \int_U \mathbf{k}(\mathcal{X}|u,\, \mathcal{Y})\, dp(u) + 2\,[[\boldsymbol{G}]] \cdot \operatorname{Ent}(U).$$
The fan in the assumption of the proposition above can often be constructed in the following manner. Suppose $\mathcal{X}$ is a $\boldsymbol{G}$-diagram and $U := X_\iota$ is a space in it for some $\iota \in \boldsymbol{G}$. We can construct a fan $(\mathcal{X} \xleftarrow{f} \hat{\mathcal{X}} \xrightarrow{g} U^{\boldsymbol{G}}) \in \mathbf{Prob}\langle\boldsymbol{G}, \boldsymbol{\Lambda}_2\rangle$ by taking $\hat{X}_i$ to be the initial space of the (unique) minimal fan in $\mathcal{X}$ with terminal spaces $X_i$ and $U$, and $f_i$ and $g_i$ to be the left and right reductions in that fan, for every $i \in \boldsymbol{G}$.

2.5. Tropical Diagrams

A detailed discussion of the topics in this section can be found in [2].
The asymptotic entropy distance between two diagrams of the same combinatorial type is defined by
$$\boldsymbol{\kappa}(\mathcal{X}, \mathcal{Y}) := \lim_{n \to \infty} \frac{1}{n}\, \mathbf{k}(\mathcal{X}^{\otimes n}, \mathcal{Y}^{\otimes n}).$$
A tropical G -diagram is an equivalence class of certain sequences of G -diagrams of probability spaces. Below, we describe the type of sequences and the equivalence relation.
A function $\varphi : \mathbb{R}_{\ge 1} \to \mathbb{R}_{\ge 0}$ is called an admissible function if $\varphi$ is non-decreasing and there is a constant $D_\varphi$ such that for any $t \ge 1$:
$$t \cdot \int_t^\infty \frac{\varphi(s)}{s^2}\, ds \le D_\varphi \cdot \varphi(t).$$
An example of an admissible function is $\varphi(t) = t^\alpha$ for $\alpha \in [0, 1)$; indeed, in this case $t \int_t^\infty s^{\alpha - 2}\, ds = \frac{t^\alpha}{1 - \alpha}$, so one may take $D_\varphi = \frac{1}{1 - \alpha}$.
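The bound can also be checked numerically; the sketch below (ours) approximates $t \int_t^T \varphi(s)\,s^{-2}\,ds$ by a midpoint sum for $\varphi(t) = t^{1/2}$ and a large cutoff $T$:

```python
def admissibility_ratio(phi, t, T=1.0e6, steps=200_000):
    """Midpoint-rule estimate of t * integral_t^T phi(s)/s^2 ds, divided by phi(t)."""
    h = (T - t) / steps
    total = 0.0
    for k in range(steps):
        s = t + (k + 0.5) * h
        total += phi(s) / (s * s)
    return t * total * h / phi(t)

# For phi(t) = t**alpha the exact ratio tends to 1/(1 - alpha) as T grows.
print(admissibility_ratio(lambda s: s ** 0.5, t=10.0))  # ~ 2.0
```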
A sequence $\bar{\mathcal{X}} = (\mathcal{X}(n) : n \in \mathbb{N}_0)$ of diagrams of probability spaces will be called quasi-linear with the defect bounded by an admissible function $\varphi$ if for some $C > 0$ and all $m, n \in \mathbb{N}$, it satisfies
$$\boldsymbol{\kappa}\big(\mathcal{X}(n+m),\, \mathcal{X}(n) \otimes \mathcal{X}(m)\big) \le C \cdot \varphi(n+m).$$
For example, for a diagram $\mathcal{X}$, the sequence $\bar{\mathcal{X}} := (\mathcal{X}^{\otimes n} : n \in \mathbb{N}_0)$ is $\varphi$-quasi-linear for $\varphi \equiv 0$ (and hence for any admissible $\varphi$). Sequences with zero defect are called linear, and the space of all linear sequences in $\mathbf{Prob}\langle\boldsymbol{G}\rangle$ is denoted by $\mathrm{L}(\mathbf{Prob}\langle\boldsymbol{G}\rangle)$.
The asymptotic entropy distance between two $\varphi$-quasi-linear sequences $\bar{\mathcal{X}} = (\mathcal{X}(n) : n \in \mathbb{N}_0)$ and $\bar{\mathcal{Y}} = (\mathcal{Y}(n) : n \in \mathbb{N}_0)$ is defined to be
$$\boldsymbol{\kappa}(\bar{\mathcal{X}}, \bar{\mathcal{Y}}) := \lim_{n \to \infty} \frac{1}{n}\, \mathbf{k}\big(\mathcal{X}(n), \mathcal{Y}(n)\big),$$
and the sequences are called asymptotically equivalent if $\boldsymbol{\kappa}(\bar{\mathcal{X}}, \bar{\mathcal{Y}}) = 0$. The equivalence class of a sequence $\bar{\mathcal{X}}$ will be denoted by $[\mathcal{X}]$, and the totality of all such classes by $\mathbf{Prob}[\boldsymbol{G}]$. We have shown in [2] that the space of equivalence classes of $\varphi$-quasi-linear sequences does not depend on the choice of a non-zero admissible function $\varphi$.
The sum of two such equivalence classes is defined to be the equivalence class of the sequence obtained by tensor-multiplying representative sequences of the summands term-wise. In addition, there is a doubly transitive action of $\mathbb{R}_{\ge 0}$ on $\mathbf{Prob}[\boldsymbol{G}]$. In [2], the following theorem is proven.
Theorem 1. 
Let $\boldsymbol{G}$ be an indexing category. Then:
1. The space $\mathbf{Prob}[\boldsymbol{G}]$ does not depend on the choice of a positive admissible function $\varphi$, up to isometry.
2. The space $\mathbf{Prob}[\boldsymbol{G}]$ is metrically complete.
3. The map $\mathcal{X} \mapsto \bar{\mathcal{X}}$ is a $\boldsymbol{\kappa}$-$\boldsymbol{\kappa}$-isometric embedding. The space of linear sequences, i.e., the image of this map, is dense in $\mathbf{Prob}[\boldsymbol{G}]$.
4. There is a distance-preserving homomorphism from $\mathbf{Prob}[\boldsymbol{G}]$ into a Banach space $B$ whose image is a closed convex cone in $B$.
5. The entropy functional
$$\operatorname{Ent}_* : \mathbf{Prob}[\boldsymbol{G}] \to \mathbb{R}^{\boldsymbol{G}}, \qquad \big[(\mathcal{X}(n))_{n \in \mathbb{N}_0}\big] \mapsto \lim_{n \to \infty} \frac{1}{n} \operatorname{Ent}_* \mathcal{X}(n)$$
is a well-defined 1-Lipschitz linear map.

2.6. Asymptotic Equipartition Property for Diagrams

Among all $\boldsymbol{G}$-diagrams, there is a special class of maximally symmetric ones, which we call homogeneous; see below for the definition. Homogeneous diagrams come in very handy in many considerations, because their structure is easier to describe than that of general diagrams. We show below that, among the tropical diagrams, those that have homogeneous representatives are dense. This means, in particular, that when considering continuous functionals on the space of diagrams, it suffices to study them on the space of homogeneous diagrams.

2.6.1. Homogeneous Diagrams

A $\boldsymbol{G}$-diagram $\mathcal{X}$ is called homogeneous if the automorphism group $\operatorname{Aut}(\mathcal{X})$ acts transitively on every space in $\mathcal{X}$, by which we mean that the action is transitive on the support of each probability measure. Homogeneous probability spaces are isomorphic to uniform spaces; for more complex indexing categories, however, no such simple description is available.

2.6.2. Tropical Homogeneous Diagrams

The subcategory of all homogeneous $\boldsymbol{G}$-diagrams will be denoted $\mathbf{Prob}\langle\boldsymbol{G}\rangle^{\mathrm{h}}$, and we write $\mathbf{Prob}\langle\boldsymbol{G}\rangle^{\mathrm{h,m}}$ for the category of minimal homogeneous $\boldsymbol{G}$-diagrams. These spaces are invariant under the tensor product; thus, they are metric Abelian monoids, and the general "tropicalization" described in [2] can be performed. Passing to the tropical limit, we obtain spaces of tropical (minimal) homogeneous diagrams, which we denote by $\mathbf{Prob}[\boldsymbol{G}]^{\mathrm{h}}$ and $\mathbf{Prob}[\boldsymbol{G}]^{\mathrm{h,m}}$, respectively.

2.6.3. Asymptotic Equipartition Property

For an indexing category $\boldsymbol{G}$, denote by $[[\boldsymbol{G}]]$ the number of objects in $\boldsymbol{G}$. In [1], the following theorem is proven.
Theorem 2. 
Suppose $\mathcal{X} \in \mathbf{Prob}\langle\boldsymbol{G}\rangle$ is a $\boldsymbol{G}$-diagram of probability spaces for some fixed indexing category $\boldsymbol{G}$. Then, there exists a sequence $\bar{\mathcal{H}} = (\mathcal{H}_n)_{n=0}^\infty$ of homogeneous $\boldsymbol{G}$-diagrams such that
$$\frac{1}{n}\, \mathbf{k}(\mathcal{X}^{\otimes n}, \mathcal{H}_n) \le C\big(|X_0|, [[\boldsymbol{G}]]\big) \cdot \sqrt{\frac{\ln^3 n}{n}},$$
where $C(|X_0|, [[\boldsymbol{G}]])$ is a constant depending only on $|X_0|$ and $[[\boldsymbol{G}]]$.
The approximating sequence of homogeneous diagrams is evidently quasi-linear with the defect bounded by the admissible function
$$\varphi(t) := 2\,C\big(|X_0|, [[\boldsymbol{G}]]\big) \cdot t^{3/4} \;\ge\; 2\,C\big(|X_0|, [[\boldsymbol{G}]]\big) \cdot t^{1/2} \cdot \ln^{3/2} t.$$
Thus, Theorem 2 above states that $\mathrm{L}(\mathbf{Prob}\langle\boldsymbol{G}\rangle) \subset \overline{\mathbf{Prob}[\boldsymbol{G}]^{\mathrm{h}}}$. On the other hand, we have shown in [2] that the space of linear sequences $\mathrm{L}(\mathbf{Prob}\langle\boldsymbol{G}\rangle)$ is dense in $\mathbf{Prob}[\boldsymbol{G}]$. Combining the two statements, we obtain the following theorem.
Theorem 3. 
For any indexing category $\boldsymbol{G}$, the space $\mathbf{Prob}[\boldsymbol{G}]^{\mathrm{h}}$ is dense in $\mathbf{Prob}[\boldsymbol{G}]$. Similarly, the space $\mathbf{Prob}[\boldsymbol{G}]^{\mathrm{h,m}}$ is dense in $\mathbf{Prob}[\boldsymbol{G}]^{\mathrm{m}}$.

3. Conditioning of Tropical Diagrams

3.1. Motivation

Let $\mathcal{X} \in \mathbf{Prob}\langle\boldsymbol{G}\rangle$ be a $\boldsymbol{G}$-diagram of probability spaces containing a probability space $U = X_{i_0}$ indexed by an object $i_0 \in \boldsymbol{G}$.
Given an atom $u \in U$, we can define a conditioned diagram $\mathcal{X}|u$. If the diagram $\mathcal{X}$ is homogeneous, then the isomorphism class of $\mathcal{X}|u$ is independent of $u$, so that $(\mathcal{X}|u : u \in U)$ is a constant family. On the other hand, we have shown that powers of any diagram can be approximated by homogeneous diagrams, suggesting that in the tropical setting $[\mathcal{X}|U]$ should be a well-defined tropical diagram rather than a family. Below, we give a definition of the tropical conditioning operation and prove its consistency.

3.2. Classical-Tropical Conditioning

Here, we define the operation of conditioning of a classical diagram such that the result is a tropical diagram. Let $\mathcal{X}$ be a $\boldsymbol{G}$-diagram of probability spaces and $U$ a space in $\mathcal{X}$. We define the conditioning map
$$[\,\cdot\,|\,\cdot\,] : \mathbf{Prob}\langle\boldsymbol{G}\rangle \to \mathbf{Prob}[\boldsymbol{G}]$$
by conditioning $\mathcal{X}$ on the atoms $u \in U$ and averaging the corresponding tropical diagrams:
$$[\mathcal{X}|U] := \int_U \overline{(\mathcal{X}|u)}\; dp_U(u),$$
where $\overline{(\mathcal{X}|u)}$ is the tropical diagram represented by the linear sequence generated by $\mathcal{X}|u$; see Section 2.5. Note that the integral on the right-hand side is just a finite convex combination of tropical diagrams. Expanding all the definitions, we obtain for $[\mathcal{Y}] := [\mathcal{X}|U]$ the representative sequence
$$\mathcal{Y}(n) = \bigotimes_{u \in U} (\mathcal{X}|u)^{\otimes n \cdot p(u)}.$$

3.3. Properties

3.3.1. Conditioning of Homogeneous Diagrams

If the diagram $\mathcal{X}$ is homogeneous, then for any atom $u \in U$ with positive weight,
$$[\mathcal{X}|U] = \overline{(\mathcal{X}|u)}.$$

3.3.2. Entropy

By definition, the conditioned entropy is
$$\operatorname{Ent}_*(\mathcal{X}|U) := \int_U \operatorname{Ent}_*(\mathcal{X}|u)\, dp_U(u).$$
Now that $[\mathcal{X}|U]$ is a tropical diagram, the expression $\operatorname{Ent}_*(\mathcal{X}|U)$ can be interpreted in two a priori different ways: by the formula above, or as the entropy of the tropical diagram introduced in the previous subsection. Fortunately, its numeric value does not depend on the interpretation, since entropy is a linear functional on $\mathbf{Prob}[\boldsymbol{G}]$.
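Numerically, for the initial space of a two-space diagram this is the usual chain rule $\operatorname{Ent}(Z|U) = \operatorname{Ent}(Z) - \operatorname{Ent}(U)$; the sketch below (ours) reuses the entropy, marginal, and condition helpers introduced earlier:

```python
def conditioned_entropy(joint, axis):
    """Ent(Z | U) = sum over atoms u of p(u) * Ent(Z | u)."""
    pu = marginal(joint, axis)
    return sum(p * entropy(condition(joint, axis, u)) for u, p in pu.items())

Z = {("x", "u"): 0.3, ("y", "u"): 0.3, ("y", "v"): 0.4}
assert abs(conditioned_entropy(Z, 1) - (entropy(Z) - entropy(marginal(Z, 1)))) < 1e-9
```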

3.3.3. Additivity

If $\mathcal{X}$ and $\mathcal{Y}$ are two $\boldsymbol{G}$-diagrams with $U := X_\iota$ and $V := Y_\iota$ for some $\iota \in \boldsymbol{G}$, then
$$[(\mathcal{X} \otimes \mathcal{Y})\,|\,(U \otimes V)] = [\mathcal{X}|U] + [\mathcal{Y}|V].$$
Proof. 
$$[(\mathcal{X} \otimes \mathcal{Y})\,|\,(U \otimes V)] = \int_{U \otimes V} \overline{(\mathcal{X} \otimes \mathcal{Y})\,|\,(u,v)}\; dp(u)\, dp(v) = \int_{U \otimes V} \Big(\overline{(\mathcal{X}|u)} + \overline{(\mathcal{Y}|v)}\Big)\, dp(u)\, dp(v) = \int_U \overline{(\mathcal{X}|u)}\, dp(u) + \int_V \overline{(\mathcal{Y}|v)}\, dp(v) = [\mathcal{X}|U] + [\mathcal{Y}|V]. \qquad \square$$

3.3.4. Homogeneity

It follows that for any diagram $\mathcal{X}$, any space $U$ in $\mathcal{X}$, and any $n \in \mathbb{N}_0$,
$$[\mathcal{X}^{\otimes n}\,|\,U^{\otimes n}] = n \cdot [\mathcal{X}|U].$$

3.4. Continuity and Lipschitz Property

Proposition 2. 
Let $\boldsymbol{G}$ be an indexing category, let $\mathcal{X}, \mathcal{Y} \in \mathbf{Prob}\langle\boldsymbol{G}\rangle$ be two $\boldsymbol{G}$-diagrams, and let $U := X_\iota$ and $V := Y_\iota$ be two spaces in $\mathcal{X}$ and $\mathcal{Y}$, respectively, indexed by some $\iota \in \boldsymbol{G}$. Then,
$$\boldsymbol{\kappa}\big([\mathcal{X}|U], [\mathcal{Y}|V]\big) \le \big(2 \cdot [[\boldsymbol{G}]] + 1\big) \cdot \mathbf{k}(\mathcal{X}, \mathcal{Y}).$$
Using the homogeneity property of conditioning, Section 3.3.4, we can obtain the following stronger inequality.
Corollary 1. 
In the setting of Proposition 2, the following holds:
$$\boldsymbol{\kappa}\big([\mathcal{X}|U], [\mathcal{Y}|V]\big) \le \big(2 \cdot [[\boldsymbol{G}]] + 1\big) \cdot \boldsymbol{\kappa}(\mathcal{X}, \mathcal{Y}).$$
Before we prove Proposition 2, we will need some preparatory lemmas.
Lemma 1. 
Let $\mathcal{A}$ be a $\boldsymbol{G}$-diagram of probability spaces and $E$ a space in it. Let $q : E^{\otimes n} \to (\Delta E, \tau_n)$ be the empirical reduction. Then, for any $n \in \mathbb{N}$ and any $\bar{e}, \bar{e}' \in E^n$,
$$\mathbf{k}\big(\mathcal{A}^{\otimes n}|\bar{e},\; \mathcal{A}^{\otimes n}|\bar{e}'\big) \le n \cdot \|\operatorname{Ent}_*(\mathcal{A})\|_1 \cdot \|q(\bar{e}) - q(\bar{e}')\|_1.$$
Proof. 
To prove the lemma, we construct a coupling between $\mathcal{A}^{\otimes n}|\bar{e}$ and $\mathcal{A}^{\otimes n}|\bar{e}'$ in the following manner. Note that there exists a permutation $\sigma \in S_n$ such that
$$\big|\{i : e_i \ne e'_{\sigma i}\}\big| = \frac{n}{2} \cdot \|q(\bar{e}) - q(\bar{e}')\|_1.$$
Let
$$I = \{i : e_i = e'_{\sigma i}\}, \qquad \tilde{I} = \{i : e_i \ne e'_{\sigma i}\}.$$
Using that $|\tilde{I}| = \frac{n}{2} \cdot \|q(\bar{e}) - q(\bar{e}')\|_1$, we can estimate
$$\mathbf{k}\big(\mathcal{A}^{\otimes n}|\bar{e},\; \mathcal{A}^{\otimes n}|\bar{e}'\big) = \mathbf{k}\Big(\bigotimes_{i=1}^n (\mathcal{A}|e_i),\; \bigotimes_{i=1}^n (\mathcal{A}|e'_{\sigma i})\Big) \le \sum_{i \in I} \operatorname{kd}\big(\mathcal{A}|e_i = \mathcal{A}|e'_{\sigma i}\big) + \sum_{i \in \tilde{I}} \operatorname{kd}\big(\mathcal{A}|e_i \otimes \mathcal{A}|e'_{\sigma i}\big) \le n \cdot \|\operatorname{Ent}_*(\mathcal{A})\|_1 \cdot \|q(\bar{e}) - q(\bar{e}')\|_1,$$
where $\mathcal{A} = \mathcal{B}$ denotes the isomorphism coupling of two naturally isomorphic diagrams, while $\mathcal{A} \otimes \mathcal{B}$ denotes the “independence” coupling. □
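The combinatorial fact about the permutation $\sigma$ is elementary: the fewest mismatches achievable by permuting one sample against the other is $n$ minus the overlap of their letter counts, which equals $\frac{n}{2}\|q(\bar{e}) - q(\bar{e}')\|_1$. A quick check (our sketch):

```python
from collections import Counter

def l1_empirical(e1, e2):
    """||q(e1) - q(e2)||_1 between the two empirical distributions."""
    n = len(e1)
    c1, c2 = Counter(e1), Counter(e2)
    return sum(abs(c1[s] - c2[s]) for s in set(c1) | set(c2)) / n

def min_mismatches(e1, e2):
    """Fewest positions where e1 and a permutation of e2 disagree."""
    c1, c2 = Counter(e1), Counter(e2)
    return len(e1) - sum(min(c1[s], c2[s]) for s in set(c1) | set(c2))

e1, e2 = "aabbc", "abbcc"
assert abs(min_mismatches(e1, e2) - len(e1) / 2 * l1_empirical(e1, e2)) < 1e-9  # both equal 1
```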
Lemma 2. 
Let $\mathcal{A}$ be a $\boldsymbol{G}$-diagram of probability spaces and $E$ a space in $\mathcal{A}$. Then,
$$\int_{E^n} \mathbf{k}\big(\mathcal{A}^{\otimes n},\; \mathcal{A}^{\otimes n}|\bar{e}\big)\, dp(\bar{e}) \le 2n \cdot [[\boldsymbol{G}]] \cdot \operatorname{Ent}(E) + o(n).$$
Proof. 
First, we apply Proposition 1, slicing the first argument:
$$\int_{E^n} \mathbf{k}\big(\mathcal{A}^{\otimes n}, \mathcal{A}^{\otimes n}|\bar{e}'\big)\, dp(\bar{e}') \le \int_{E^n} \int_{E^n} \mathbf{k}\big(\mathcal{A}^{\otimes n}|\bar{e},\; \mathcal{A}^{\otimes n}|\bar{e}'\big)\, dp(\bar{e})\, dp(\bar{e}') + 2n \cdot [[\boldsymbol{G}]] \cdot \operatorname{Ent}(E).$$
We will now argue that the double integral on the right-hand side grows sub-linearly in $n$. We estimate it by applying Lemma 1 to the integrand:
$$\int_{E^n} \int_{E^n} \mathbf{k}\big(\mathcal{A}^{\otimes n}|\bar{e},\; \mathcal{A}^{\otimes n}|\bar{e}'\big)\, dp(\bar{e})\, dp(\bar{e}') \le \int_{E^n} \int_{E^n} n \cdot [[\boldsymbol{G}]] \cdot \|\operatorname{Ent}_*(\mathcal{A})\|_1 \cdot \|q(\bar{e}) - q(\bar{e}')\|_1\, dp(\bar{e})\, dp(\bar{e}') = n \cdot [[\boldsymbol{G}]] \cdot \|\operatorname{Ent}_*(\mathcal{A})\|_1 \cdot \int_{\Delta E} \int_{\Delta E} \|\pi - \pi'\|_1\, d\tau_n(\pi)\, d\tau_n(\pi') = o(n),$$
where the convergence to zero of the last double integral follows from Sanov’s theorem. □
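The sub-linear growth reflects the concentration of empirical distributions: the expected $\ell^1$-distance between two independent empirical distributions of size $n$ decays, roughly like $n^{-1/2}$ for a fixed finite alphabet. A Monte Carlo sketch (ours; no claim about constants):

```python
import random
from collections import Counter

def mean_l1_between_empiricals(p, n, trials=2000):
    """Estimate E ||q(e) - q(e')||_1 for two independent i.i.d. samples of size n from p."""
    atoms, weights = zip(*p.items())
    total = 0.0
    for _ in range(trials):
        c1 = Counter(random.choices(atoms, weights, k=n))
        c2 = Counter(random.choices(atoms, weights, k=n))
        total += sum(abs(c1[s] - c2[s]) for s in p) / n
    return total / trials

p = {"a": 0.5, "b": 0.3, "c": 0.2}
for n in (10, 100, 1000):
    print(n, mean_l1_between_empiricals(p, n))  # decreasing toward 0
```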
Corollary 2. 
Let $\mathcal{A}$ be a $\boldsymbol{G}$-diagram and $E$ a probability space included in $\mathcal{A}$. Then,
$$\boldsymbol{\kappa}\big(\overline{\mathcal{A}},\, [\mathcal{A}|E]\big) \le 2\,[[\boldsymbol{G}]] \cdot \operatorname{Ent}(E).$$
Proof. 
Let $n \in \mathbb{N}$. Then,
$$\boldsymbol{\kappa}\big(\overline{\mathcal{A}}, [\mathcal{A}|E]\big) = \frac{1}{n}\, \boldsymbol{\kappa}\big(\overline{\mathcal{A}^{\otimes n}},\, [\mathcal{A}^{\otimes n}|E^{\otimes n}]\big) = \frac{1}{n}\, \boldsymbol{\kappa}\Big(\overline{\mathcal{A}^{\otimes n}},\; \int_{E^n} \overline{\mathcal{A}^{\otimes n}|\bar{e}}\; dp(\bar{e})\Big) \le \frac{1}{n} \int_{E^n} \boldsymbol{\kappa}\big(\overline{\mathcal{A}^{\otimes n}},\; \overline{\mathcal{A}^{\otimes n}|\bar{e}}\big)\, dp(\bar{e}) \le \frac{1}{n} \int_{E^n} \mathbf{k}\big(\mathcal{A}^{\otimes n},\; \mathcal{A}^{\otimes n}|\bar{e}\big)\, dp(\bar{e}) \le 2 \cdot [[\boldsymbol{G}]] \cdot \operatorname{Ent}(E) + o(n^0),$$
where we used Lemma 2 and the fact that $\boldsymbol{\kappa} \le \mathbf{k}$ in the last line. We finish the proof by taking the limit $n \to \infty$. □
Proof of Proposition 2.
We start with a note on general terminology: a reduction $f : \mathcal{A} \to \mathcal{B}$ of probability spaces can also be considered as a fan $\mathcal{F} := (\mathcal{A} \xleftarrow{=} \mathcal{A} \xrightarrow{f} \mathcal{B})$. The entropy distance of $f$ is then
$$\operatorname{kd}(f) := \operatorname{kd}(\mathcal{F}) = \|\operatorname{Ent}_* \mathcal{A} - \operatorname{Ent}_* \mathcal{B}\|_1.$$
If the reduction $f$ is part of a bigger diagram also containing a space $U$, then the following inequality holds:
$$\int_U \operatorname{kd}(f|u)\, dp(u) \le \operatorname{kd}(f).$$
Let
$$\mathcal{K} = (\mathcal{X} \xleftarrow{f} \mathcal{Z} \xrightarrow{g} \mathcal{Y}) \in \mathbf{Prob}\langle\boldsymbol{G}, \boldsymbol{\Lambda}_2\rangle = \mathbf{Prob}\langle\boldsymbol{\Lambda}_2, \boldsymbol{G}\rangle$$
be an optimal coupling between $\mathcal{X}$ and $\mathcal{Y}$. It can also be viewed as a $\boldsymbol{G}$-diagram of fans, $\mathcal{K} = \{K_i\}_{i \in \boldsymbol{G}}$, each of which is a minimal coupling between $X_i$ and $Y_i$. Among them is the minimal fan $\mathcal{W} := K_\iota = (U \xleftarrow{f_\iota} W \xrightarrow{g_\iota} V)$.
We use the triangle inequality to bound the distance $\boldsymbol{\kappa}([\mathcal{X}|U], [\mathcal{Y}|V])$ by four summands as follows:
$$\boldsymbol{\kappa}\big([\mathcal{X}|U], [\mathcal{Y}|V]\big) \le \boldsymbol{\kappa}\big([\mathcal{X}|U], [\mathcal{Z}|U]\big) + \boldsymbol{\kappa}\big([\mathcal{Z}|U], [\mathcal{Z}|W]\big) + \boldsymbol{\kappa}\big([\mathcal{Z}|W], [\mathcal{Z}|V]\big) + \boldsymbol{\kappa}\big([\mathcal{Z}|V], [\mathcal{Y}|V]\big).$$
We will estimate each of the four summands separately. The bound for the first one is as follows:
$$\boldsymbol{\kappa}\big([\mathcal{X}|U], [\mathcal{Z}|U]\big) = \boldsymbol{\kappa}\Big(\int_U \overline{(\mathcal{X}|u)}\, dp(u),\; \int_U \overline{(\mathcal{Z}|u)}\, dp(u)\Big) \le \int_U \boldsymbol{\kappa}\big(\overline{(\mathcal{X}|u)}, \overline{(\mathcal{Z}|u)}\big)\, dp(u) \le \int_U \mathbf{k}\big(\mathcal{X}|u, \mathcal{Z}|u\big)\, dp(u) \le \int_U \operatorname{kd}(f|u)\, dp(u) = \sum_{i \in \boldsymbol{G}} \int_U \operatorname{kd}(f_i|u)\, dp(u) \le \sum_{i \in \boldsymbol{G}} \operatorname{kd}(f_i) = \operatorname{kd}(f).$$
An analogous calculation shows that
$$\boldsymbol{\kappa}\big([\mathcal{Z}|V], [\mathcal{Y}|V]\big) \le \operatorname{kd}(g).$$
To bound the second summand, we will use Corollary 2:
$$\boldsymbol{\kappa}\big([\mathcal{Z}|U], [\mathcal{Z}|W]\big) = \boldsymbol{\kappa}\Big(\int_U \overline{(\mathcal{Z}|u)}\, dp(u),\; \int_W \overline{(\mathcal{Z}|w)}\, dp(w)\Big) = \boldsymbol{\kappa}\Big(\int_U \overline{(\mathcal{Z}|u)}\, dp(u),\; \int_U \int_{W|u} \overline{(\mathcal{Z}|w)}\, dp(w|u)\, dp(u)\Big) \le \int_U \boldsymbol{\kappa}\Big(\overline{(\mathcal{Z}|u)},\; \int_{W|u} \overline{(\mathcal{Z}|w)}\, dp(w|u)\Big)\, dp(u).$$
We will now use Corollary 2 with $\mathcal{A} = \mathcal{Z}|u$ and $E = W|u$ to estimate the integrand. Then,
$$\boldsymbol{\kappa}\big([\mathcal{Z}|U], [\mathcal{Z}|W]\big) \le \int_U \boldsymbol{\kappa}\Big(\overline{(\mathcal{Z}|u)},\; \int_{W|u} \overline{(\mathcal{Z}|w)}\, dp(w|u)\Big)\, dp(u) \le 2\,[[\boldsymbol{G}]] \cdot \int_U \operatorname{Ent}(W|u)\, dp(u) = 2\,[[\boldsymbol{G}]] \cdot \operatorname{Ent}(W|U) \le 2\,[[\boldsymbol{G}]] \cdot \operatorname{kd}(f).$$
Similarly,
$$\boldsymbol{\kappa}\big([\mathcal{Z}|W], [\mathcal{Z}|V]\big) \le 2\,[[\boldsymbol{G}]] \cdot \operatorname{kd}(g).$$
Combining the estimates, we obtain
$$\boldsymbol{\kappa}\big([\mathcal{X}|U], [\mathcal{Y}|V]\big) \le \big(2\,[[\boldsymbol{G}]] + 1\big) \cdot \big(\operatorname{kd}(f) + \operatorname{kd}(g)\big) = \big(2\,[[\boldsymbol{G}]] + 1\big) \cdot \mathbf{k}(\mathcal{X}, \mathcal{Y}). \qquad \square$$

3.5. Tropical Conditioning

Let $[\mathcal{X}]$ be a tropical $\boldsymbol{G}$-diagram and $[U] = [X_\iota]$ for some $\iota \in \boldsymbol{G}$. Choose a representative $(\mathcal{X}(n))_{n \in \mathbb{N}_0}$ and denote $U(n) := X_\iota(n)$. We now define a conditioned diagram $[\mathcal{X}|U]$ by the following limit:
$$[\mathcal{X}|U] := \lim_{n \to \infty} \frac{1}{n}\, [\mathcal{X}(n)\,|\,U(n)].$$
Proposition 2 guarantees that the limit exists and is independent of the choice of representative. For a fixed $\iota \in \boldsymbol{G}$, conditioning is a linear Lipschitz map
$$[\,\cdot\,|\,\cdot_\iota\,] : \mathbf{Prob}[\boldsymbol{G}] \to \mathbf{Prob}[\boldsymbol{G}].$$

Author Contributions

Investigation, R.M. and J.W.P. All authors have read and agreed to the published version of the manuscript.

Funding

Open Access funding was provided by the Max Planck Society.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Matveev, R.; Portegies, J.W. Asymptotic dependency structure of multiple signals. Inf. Geom. 2018, 1, 237–285.
2. Matveev, R.; Portegies, J.W. Tropical diagrams of probability spaces. Inf. Geom. 2020, 3, 61–88.
3. Ay, N.; Bertschinger, N.; Der, R.; Güttler, F.; Olbrich, E. Predictive information and explorative behavior of autonomous robots. Eur. Phys. J. B 2008, 63, 329–339.
4. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying unique information. Entropy 2014, 16, 2161–2183.
5. Friston, K. The free-energy principle: A rough guide to the brain? Trends Cogn. Sci. 2009, 13, 293–301.
6. Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114.
7. Steudel, B.; Ay, N. Information-theoretic inference of common ancestors. Entropy 2015, 17, 2304–2327.
8. Dijk, S.G.V.; Polani, D. Informational constraints-driven organization in goal-directed behavior. Adv. Complex Syst. 2013, 16, 1350016.
9. Matveev, R.; Portegies, J.W. Tropical probability theory and an application to the entropic cone. Kybernetika 2020, 56, 1133–1153.
10. Kovačević, M.; Stanojević, I.; Šenk, V. On the hardness of entropy minimization and related problems. In Proceedings of the 2012 IEEE Information Theory Workshop, Lausanne, Switzerland, 3–7 September 2012; pp. 512–516.
11. Vidyasagar, M. A metric between probability distributions on finite sets of different cardinalities and applications to order reduction. IEEE Trans. Autom. Control 2012, 57, 2464–2477.