Conditioning in Tropical Probability Theory

We define a natural operation of conditioning of tropical diagrams of probability spaces and show that it is Lipschitz continuous with respect to the asymptotic entropy distance.


Introduction
In [MP18] and [MP19a] we initiated the study of tropical probability spaces and their diagrams: in [MP18] we endowed (commutative) diagrams of probability spaces with the intrinsic entropy distance, and in [MP19a] we defined tropical diagrams as points in the asymptotic cone of the resulting metric space. They are represented by certain sequences of diagrams of probability spaces.
We expect that tropical diagrams will be helpful in the study of information optimization problems, and we have indeed applied them to derive a dimension-reduction result for the shape of the entropic cone in [MP19b].
In the present article we introduce the notion of conditioning on a space in a tropical diagram and show that the operation is Lipschitz-continuous with respect to the asymptotic entropy distance.
It is a rather technical result, and we have therefore decided to treat it in this separate article, but it is an important ingredient in the theory, and in particular we need it for the dimension-reduction result mentioned before.
Given a tuple of finite-valued random variables (X_i)_{i=1}^{n} and a random variable Y, one may "condition" the collection (X_i) on Y. The result of this operation is a family of n-tuples of random variables, denoted (X_i|Y)_{i=1}^{n}, parameterized by those values of Y that have positive probability. Each tuple of random variables in this family is defined on a separate probability space.
When passing to the tropical setting the situation is different, in the sense that when we condition a tropical diagram [X] on a space [Y], the result is again a tropical diagram [X|Y] rather than a family. After recalling some preliminaries in Section 2, we describe the operation of conditioning and prove that the result depends in a Lipschitz way on the original diagram in Section 3.

Preliminaries
Our main objects of study are commutative diagrams of probability spaces and their tropical counterparts. In this section we briefly recall the main definitions and results.
2.1.1. Probability spaces. By a finite probability space we mean a set equipped with a probability measure whose support is finite. A reduction from one probability space to another is an equivalence class of measure-preserving maps, where two maps are considered equivalent if they coincide on a set of full measure. We call a point x in a probability space X = (X, p) an atom if it has positive weight, and we write x ∈ X to mean that x is an atom in X (as opposed to x ∈ X for points in the underlying set). For a probability space X we denote by |X| the cardinality of the support of the probability measure.
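As a concrete illustration of these conventions, here is a minimal Python sketch (the dictionary encoding and the helper names are ours, not from the text) of a finite probability space, its atoms, the support cardinality |X|, and the entropy used throughout:

```python
from math import log2

# A finite probability space as a mapping from points to weights.
# Points of weight zero may appear in the underlying set but are not atoms.
X = {"a": 0.5, "b": 0.5, "c": 0.0}

def atoms(space):
    """Points with strictly positive weight (written x ∈ X in the text)."""
    return {x: p for x, p in space.items() if p > 0}

def support_size(space):
    """The cardinality |X| of the support of the probability measure."""
    return len(atoms(space))

def entropy(space):
    """Shannon entropy in bits, summed over atoms only."""
    return -sum(p * log2(p) for p in atoms(space).values())

print(support_size(X))  # 2
print(entropy(X))       # 1.0
```

Note that the point "c" belongs to the underlying set but, having weight zero, is invisible to all of the invariants above.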
2.1.2. Indexing categories. To record the combinatorial structure of a commutative diagram of probability spaces and reductions we use an object that we call an indexing category. By an indexing category we mean a finite category G such that for any pair of objects i, j ∈ G there is at most one morphism between them either way. In addition, we will assume it satisfies one further property, which we describe after introducing some terminology. For a pair of objects i, j ∈ G such that there is a morphism γ_ij ∶ i → j, the object i will be called an ancestor of j and the object j will be called a descendant of i. The subcategory of all descendants of an object i ∈ G is called the ideal generated by i and will be denoted ⌈i⌉, while the subcategory consisting of all ancestors of i, together with all the morphisms between them, is called the co-ideal generated by i and is denoted ⌊i⌋. (The term filter is also used for co-ideal in the literature on lattices.) The additional property that an indexing category has to satisfy is that any pair of objects i, j ∈ G has a minimal common ancestor î; that is, î is an ancestor of both i and j, and any other common ancestor of the two is also an ancestor of î.
An equivalent formulation of the property above is the following: the intersection of the co-ideals generated by two objects i, j ∈ G is also a co-ideal generated by some object î ∈ G.
Any indexing category G is necessarily initial, meaning that it contains an initial object: an object i_0 such that G = ⌈i_0⌉.
A fan in a category is a pair of morphisms with the same domain. A fan (X ← Z → Y) is called minimal if for any other fan (X ← Z′ → Y) with the same terminal objects and any reduction Z → Z′ commuting with the legs of the two fans, the vertical arrow Z → Z′ must be an isomorphism.
For any pair of objects i, j in an indexing category G there exists a unique minimal fan (i ← î → j) in G.
2.1.3. Diagrams. We denote by Prob the category of finite probability spaces and reductions. For an indexing category G, a G-diagram of probability spaces is a functor from G to Prob, which we record as a collection X = {X_i; χ_ij} of probability spaces and reductions between them. A reduction f ∶ X → Y between two G-diagrams X = {X_i; χ_ij} and Y = {Y_i; υ_ij} is a natural transformation between the functors. It amounts to a collection of reductions f_i ∶ X_i → Y_i such that the big diagram consisting of all spaces X_i, Y_i and all morphisms χ_ij, υ_ij and f_i is commutative. The category of G-diagrams and reductions will be denoted Prob⟨G⟩. The construction of diagrams can be iterated: we may consider H-diagrams of G-diagrams and denote the corresponding category Prob⟨G⟩⟨H⟩ = Prob⟨G, H⟩. Every H-diagram of G-diagrams can also be considered as a G-diagram of H-diagrams, so there is a natural equivalence of categories Prob⟨G, H⟩ ≅ Prob⟨H, G⟩.
A G-diagram X will be called minimal if it maps minimal fans in G to minimal fans in the target category.The subspace of all minimal G-diagrams will be denoted Prob ⟨G⟩ m .In [MP18] we have shown that for any fan in Prob or in Prob ⟨G⟩ its minimization exists and is unique up to isomorphism.
2.1.4. Tensor product. The tensor product of two probability spaces X = (X, p) and Y = (Y, q) is their independent product, X ⊗ Y ∶= (X × Y, p ⊗ q). For two G-diagrams X = {X_i; χ_ij} and Y = {Y_i; υ_ij}, the tensor product X ⊗ Y ∶= {X_i ⊗ Y_i; χ_ij ⊗ υ_ij} is again a G-diagram.

2.1.5. Constant diagrams. Given an indexing category G and a probability space X we can form a constant diagram X_G that has all spaces equal to X and all reductions equal to the identity isomorphism. Sometimes, when such a constant diagram appears in a diagram together with other G-diagrams (as, for example, in a reduction X → X_G), we will write simply X in place of X_G.

2.1.6. Entropy. Evaluating entropy on every space in a G-diagram we obtain a tuple of non-negative numbers indexed by objects in G; thus entropy gives a map Ent_* ∶ Prob⟨G⟩ → R^G, where the target space R^G is the space of real-valued functions on the set of objects in G endowed with the ℓ¹-norm. Entropy is a homomorphism in that it satisfies Ent_*(X ⊗ Y) = Ent_*(X) + Ent_*(Y).

2.1.7. Entropy distance. Let G be an indexing category and K = (X ← Z → Y) be a fan of G-diagrams. We define the entropy distance of the fan by kd(K) ∶= ‖Ent_*(Z) − Ent_*(X)‖₁ + ‖Ent_*(Z) − Ent_*(Y)‖₁. The intrinsic entropy distance between two G-diagrams is defined to be the infimal entropy distance over all fans with terminal diagrams X and Y, k(X, Y) ∶= inf{kd(K) ∶ K = (X ← Z → Y)}. The intrinsic entropy distance was introduced in [KSŠ12, Vid12] for probability spaces.
In [MP18] it is shown that the infimum is attained, that the optimal fan is minimal, that k is a pseudo-distance which vanishes if and only if X and Y are isomorphic and that Ent * is a 1-Lipschitz linear functional with respect to k.
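In the simplest case of single probability spaces (G a one-point category), the fan distance is easy to compute by hand or by machine. The following Python sketch (our own illustration; all names are hypothetical) takes a two-fan whose initial space carries a joint distribution and whose legs are the coordinate projections, and evaluates kd(K) as the sum of the entropy defects along the two legs:

```python
from math import log2

def entropy(p):
    """Shannon entropy (bits) of a finitely supported distribution."""
    return -sum(w * log2(w) for w in p.values() if w > 0)

def marginal(joint, coord):
    """Push-forward of a joint distribution on pairs along a coordinate projection."""
    out = {}
    for pair, w in joint.items():
        out[pair[coord]] = out.get(pair[coord], 0.0) + w
    return out

def fan_entropy_distance(joint):
    """kd of the two-fan (X <- Z -> Y): Z carries the joint distribution,
    the legs are the coordinate projections.  Since entropy only decreases
    under reductions, kd equals 2*Ent(Z) - Ent(X) - Ent(Y)."""
    hz = entropy(joint)
    hx = entropy(marginal(joint, 0))
    hy = entropy(marginal(joint, 1))
    return (hz - hx) + (hz - hy)

# A perfectly correlated coupling of two fair coins has kd = 0,
# witnessing that the two marginal spaces are isomorphic (k = 0).
diagonal = {("h", "h"): 0.5, ("t", "t"): 0.5}
# The independent coupling of the same coins costs Ent(X) + Ent(Y) = 2 bits.
independent = {(a, b): 0.25 for a in "ht" for b in "ht"}
print(fan_entropy_distance(diagonal))     # 0.0
print(fan_entropy_distance(independent))  # 2.0
```

Both summands of kd are non-negative, and the infimum over couplings picks out the fan that correlates the two spaces as strongly as possible.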

2.2. Diagrams of sets, distributions and empirical reductions.
2.2.1. Distributions on sets. For a set S we denote by ∆S the collection of all finitely-supported probability distributions on S. For a pair of distributions π₁, π₂ ∈ ∆S we denote by ‖π₁ − π₂‖₁ the total variation distance between them.
For a map f ∶ S → S ′ between two sets we denote by f * ∶ ∆S → ∆S ′ the induced affine map (the map preserving convex combinations).
For n ∈ N define the empirical map q ∶ S^n → ∆S by the following assignment: for s = (s₁, . . ., s_n) ∈ S^n and A ⊂ S set q(s)(A) ∶= (1/n) ⋅ #{i ∶ s_i ∈ A}. For a finite probability space X = (S, p), the empirical distribution is the push-forward τ_n ∶= q_* p^{⊗n}. Thus q ∶ X^{⊗n} → (∆S, τ_n) is a reduction of finite probability spaces. The construction of the empirical reduction is functorial: for a reduction f ∶ X → Y between two probability spaces one has f_* ∘ q = q ∘ f^{×n}.

2.2.2. Distributions on diagrams of sets. Let Set denote the category of sets and surjective maps. For an indexing category G, we denote by Set⟨G⟩ the category of G-diagrams in Set. That is, objects in Set⟨G⟩ are commutative diagrams of sets indexed by G: the spaces in such a diagram are sets, and the arrows represent surjective maps, subject to commutativity relations.
For a diagram of sets S = {S_i; σ_ij} we define the space of distributions on the diagram S by ∆S ∶= {(π_i) ∈ ∏_i ∆S_i ∶ (σ_ij)_* π_i = π_j for all morphisms σ_ij in S}. Given such a diagram S and an element π = (π_i) ∈ ∆S we can construct a G-diagram of probability spaces (S, π) ∶= {(S_i, π_i); σ_ij}. Note that any G-diagram X of probability spaces has this form.
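The empirical map of Section 2.2.1 and its functoriality are easy to experiment with. The following Python sketch (our encoding, with exact rational weights; the names are hypothetical) builds q(s) from a sample and checks the relation f_* ∘ q = q ∘ f^{×n}:

```python
from fractions import Fraction

def empirical(s):
    """The empirical map q: S^n -> Delta(S); q(s)(A) = #{i : s_i in A} / n."""
    n = len(s)
    out = {}
    for x in s:
        out[x] = out.get(x, Fraction(0)) + Fraction(1, n)
    return out

def pushforward(dist, f):
    """The induced affine map f_*: Delta(S) -> Delta(S') for f given as a dict."""
    out = {}
    for x, w in dist.items():
        out[f[x]] = out.get(f[x], Fraction(0)) + w
    return out

print(empirical(("a", "a", "b")))  # {'a': Fraction(2, 3), 'b': Fraction(1, 3)}

# Functoriality of the empirical reduction: f_* . q = q . f^n.
f = {"a": 0, "b": 1}
sample = ("a", "b", "b", "a")
assert pushforward(empirical(sample), f) == empirical(tuple(f[x] for x in sample))
```

Exact `Fraction` arithmetic makes the functoriality check an equality of dictionaries rather than a floating-point comparison.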
2.3.1. The Slicing Lemma. In [MP18] we prove the so-called Slicing Lemma, which allows one to estimate the intrinsic entropy distance between two diagrams in terms of distances between conditioned diagrams. Among the corollaries of the Slicing Lemma is the following inequality.
The fan in the assumption of the proposition above can often be constructed in the following manner. Suppose X is a G-diagram and U = X_ι is a space in it for some ι ∈ G. We can construct a fan (X ← X̂ → U_G) ∈ Prob⟨G, Λ₂⟩, with left and right legs f and g, by assigning X̂_i to be the initial space of the (unique) minimal fan in X with terminal spaces X_i and U, and f_i and g_i to be the left and right reductions in that fan, for any i ∈ G.

2.4. Tropical Diagrams.
A detailed discussion of the topics in this section can be found in [MP19a].
The asymptotic entropy distance between two diagrams of the same combinatorial type is defined by κ(X, Y) ∶= lim_{n→∞} (1/n) ⋅ k(X^{⊗n}, Y^{⊗n}). A tropical G-diagram is an equivalence class of certain sequences of G-diagrams of probability spaces. Below we describe the type of sequences and the equivalence relation.
A function ϕ ∶ R_{≥1} → R_{≥0} is called an admissible function if ϕ is non-decreasing and there is a constant D_ϕ such that for any t ≥ 1, ∫_t^∞ ϕ(s)/s² ds ≤ D_ϕ ⋅ ϕ(t)/t. An example of an admissible function is ϕ(t) = t^α, for α ∈ [0, 1).
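Assuming the admissibility condition is read as ∫_t^∞ ϕ(s) s^{−2} ds ≤ D_ϕ ⋅ ϕ(t)/t, the example ϕ(t) = t^α admits a direct verification:

```latex
\[
\int_t^\infty \frac{\phi(s)}{s^2}\,ds
  = \int_t^\infty s^{\alpha-2}\,ds
  = \frac{t^{\alpha-1}}{1-\alpha}
  = \frac{1}{1-\alpha}\cdot\frac{\phi(t)}{t},
  \qquad \alpha \in [0,1),
\]
so one may take $D_\phi = \tfrac{1}{1-\alpha}$.
```

The convergence of the integral is exactly where the restriction α < 1 enters.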
A sequence X̄ = (X(n) ∶ n ∈ N₀) of diagrams of probability spaces will be called quasi-linear with defect bounded by an admissible function ϕ if it satisfies k(X(m + n), X(m) ⊗ X(n)) ≤ ϕ(m + n) for all m, n ∈ N₀. For example, for a diagram X the sequence →X ∶= (X^{⊗n} ∶ n ∈ N₀) is ϕ-quasi-linear for ϕ ≡ 0 (and for any admissible ϕ). Such sequences are called linear.
The asymptotic entropic distance between two quasi-linear sequences X̄ = (X(n)) and Ȳ = (Y(n)) is defined by κ(X̄, Ȳ) ∶= lim_{n→∞} (1/n) ⋅ k(X(n), Y(n)), and two sequences are called asymptotically equivalent if κ(X̄, Ȳ) = 0. The equivalence class of a sequence X̄ will be denoted [X̄], and the totality of all such classes Prob[G]. The sum of two equivalence classes is defined to be the equivalence class of the sequence obtained by tensor-multiplying representative sequences of the summands term-wise. In addition, there is an action of R_{≥0} on Prob[G] by rescaling. In [MP19a] the following theorem is proven.

Theorem 2.2.

Let G be an indexing category. Then the entropy functional Ent_* ∶ Prob[G] → R^G is a well-defined 1-Lipschitz linear map.
⊠

2.5. Asymptotic Equipartition Property for Diagrams. Among all G-diagrams there is a special class of maximally symmetric ones. We call such diagrams homogeneous; see below for the definition. Homogeneous diagrams come in very handy in many considerations, because their structure is easier to describe than that of general diagrams. We show below that among tropical diagrams, those that have homogeneous representatives are dense. This means, in particular, that when considering continuous functionals on the space of diagrams, it suffices to only look at homogeneous diagrams.

2.5.1. Homogeneous diagrams.
A G-diagram X is called homogeneous if the automorphism group Aut(X ) acts transitively on every space in X , by which we mean that the action is transitive on the support of the probability measure.Homogeneous probability spaces are isomorphic to uniform spaces.For more complex indexing categories this simple description is not sufficient.
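For a single probability space, homogeneity can be tested by brute force. The following Python sketch (our own illustration; the names are hypothetical) enumerates the measure-preserving bijections of the support and checks transitivity on atoms, confirming in this toy setting that homogeneous spaces are exactly the uniform ones:

```python
from itertools import permutations
from fractions import Fraction

def automorphisms(space):
    """Measure-preserving bijections of the support onto itself."""
    pts = [x for x, w in space.items() if w > 0]
    for perm in permutations(pts):
        if all(space[p] == space[q] for p, q in zip(pts, perm)):
            yield dict(zip(pts, perm))

def is_homogeneous(space):
    """Aut(X) acts transitively on the atoms of X."""
    pts = [x for x, w in space.items() if w > 0]
    reachable = {g[pts[0]] for g in automorphisms(space)}
    return reachable == set(pts)

uniform = {"a": Fraction(1, 3), "b": Fraction(1, 3), "c": Fraction(1, 3)}
skewed = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
assert is_homogeneous(uniform) and not is_homogeneous(skewed)
```

In the skewed example the atom of weight 1/2 is fixed by every automorphism, so the action cannot be transitive; only equal weights allow the full symmetric group to act.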
2.5.2.Tropical Homogeneous Diagrams.The subcategory of all homogeneous G-diagrams will be denoted Prob ⟨G⟩ h and we write Prob ⟨G⟩ h,m for the category of minimal homogeneous G-diagrams.These spaces are invariant under the tensor product, thus they are metric Abelian monoids and the general "tropicalization" described in [MP19a] can be performed.Passing to the tropical limit we obtain spaces of tropical (minimal) homogeneous diagrams, that we denote by Prob[G] h and Prob[G] h,m , respectively.

Asymptotic Equipartition Property. In [MP18] the following theorem is proven
Theorem 2.3.

Suppose X ∈ Prob⟨G⟩ is a G-diagram of probability spaces for some fixed indexing category G. Then there exists a sequence H̄ = (H(n) ∶ n ∈ N₀) of homogeneous G-diagrams such that (1/n) ⋅ k(X^{⊗n}, H(n)) → 0 as n → ∞. ⊠

The approximating sequence of homogeneous diagrams is evidently quasi-linear, with defect bounded by an admissible function. Thus, Theorem 2.3 above states that L(Prob⟨G⟩) ⊂ Prob[G]_h. On the other hand, we have shown in [MP19a] that the space of linear sequences L(Prob⟨G⟩) is dense in Prob[G]. Combining the two statements we obtain the following theorem.

For any indexing category G, the space Prob[G]_h of tropical diagrams admitting homogeneous representatives is dense in Prob[G]. ⊠

Conditioning of Tropical Diagrams
3.1. Motivation. Let X ∈ Prob⟨G⟩ be a G-diagram of probability spaces containing a probability space U = X_{i₀} indexed by an object i₀ ∈ G.
Given an atom u ∈ U we can define a conditioned diagram X|u. If the diagram X is homogeneous, then the isomorphism class of X|u is independent of u, so that X|u is a constant family. On the other hand, we have shown that powers of any diagram can be approximated by homogeneous diagrams, suggesting that in the tropical setting X|U should be a well-defined tropical diagram, rather than a family. Below we give the definition of the tropical conditioning operation and prove its consistency.
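The classical operation X|u, and the entropy averaged over the atoms of U, are easy to compute in the simplest case where the diagram consists of a single space X together with U. The following Python sketch (the dictionary encoding of the joint distribution and the helper names are ours) conditions on each atom of U and checks the chain rule Ent(X|U) = Ent(X, U) − Ent(U):

```python
from math import log2

def entropy(p):
    """Shannon entropy (bits) of a finitely supported distribution."""
    return -sum(w * log2(w) for w in p.values() if w > 0)

# Joint distribution of (X, U); conditioning X on an atom u of U
# renormalizes the slice p(., u).
joint = {("x1", "u1"): 0.25, ("x2", "u1"): 0.25, ("x1", "u2"): 0.5}

def marginal_u(joint):
    out = {}
    for (x, u), w in joint.items():
        out[u] = out.get(u, 0.0) + w
    return out

def condition_on(joint, u):
    """The conditioned space X|u, defined for atoms u (positive weight)."""
    pu = marginal_u(joint)[u]
    return {x: w / pu for (x, v), w in joint.items() if v == u}

pU = marginal_u(joint)
# Averaged conditional entropy: sum over atoms u of p(u) * Ent(X|u) ...
cond_ent = sum(pU[u] * entropy(condition_on(joint, u)) for u in pU)
# ... agrees with the chain rule Ent(X|U) = Ent(X,U) - Ent(U).
assert abs(cond_ent - (entropy(joint) - entropy(pU))) < 1e-12
print(cond_ent)  # 0.5
```

Here X|u1 is a fair coin (one bit) and X|u2 is deterministic (zero bits), so the average over U is half a bit, matching the chain rule.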

3.2. Classical-tropical conditioning.
Here we define the operation of conditioning of a classical diagram, such that the result is a tropical diagram. Let X be a G-diagram of probability spaces and U = (U, p) be a space in X. We define the conditioning map by conditioning X on the atoms u ∈ U and averaging the corresponding tropical diagrams: [X|U] ∶= ∫_U [→(X|u)] dp(u), where [→(X|u)] is the tropical diagram represented by the linear sequence generated by X|u, see Section 2.4. Note that the integral on the right-hand side is just a finite convex combination of tropical diagrams. Expanding all the definitions, we get for [Y] ∶= [X|U] the representative sequence Y(n) ∶= ⊗_{u ∈ U} (X|u)^{⊗⌊n⋅p(u)⌋}.

3.3.2. Entropy. Recall that earlier we defined the quantity Ent_*(X|U) ∶= ∫_U Ent_*(X|u) dp(u). Now that [X|U] is a tropical diagram, the expression Ent_*(X|U) can be interpreted in two, a priori different, ways: by the formula above and as the entropy of the object introduced in the previous subsection. Fortunately, its numeric value does not depend on the interpretation, since entropy is a linear functional on Prob[G].

Proposition 3.1. Let G be a complete poset category, X, Y ∈ Prob⟨G⟩ be two G-diagrams, and let U ∶= X_ι and V ∶= Y_ι be two spaces in X and Y, respectively, indexed by some ι ∈ G. Then the distance κ([X|U], [Y|V]) is bounded by a constant multiple of k(X, Y). Using the homogeneity property of conditioning, Section 3.3.4, we can obtain the following stronger inequality.

In the setting of Proposition 3.1 the following inequality holds. ⊠

Before we prove Proposition 3.1 we will need some preparatory lemmas.

Lemma 3.3.

Let A be a G-diagram of probability spaces and E be a space in it. Let q ∶ E^n → (∆E, τ_n) be the empirical reduction. Then for any n ∈ N and any ē, ē′ ∈ E^n

k(A^n|ē, A^n|ē′) ≤ n ⋅ ‖Ent_*(A)‖₁ ⋅ ‖q(ē) − q(ē′)‖₁. ⊠

Proof: To prove the lemma we construct a coupling between A^n|ē and A^n|ē′ in the following manner. Note that there exists a permutation σ ∈ S_n such that the set of indices I ∶= {i ∶ ē_i ≠ (σē′)_i} satisfies #I ≤ n ⋅ ‖q(ē) − q(ē′)‖₁. On the coordinates outside of I the two conditioned diagrams are naturally isomorphic and we use the isomorphism coupling, while on the coordinates in I we use the independence coupling. Here A =↔ B denotes the isomorphism coupling of two naturally isomorphic diagrams, while A ⊗↔ B denotes the "independence" coupling. ⊠

Lemma 3.4.

Let A be a G-diagram of probability spaces and E be a space in A. Then the following asymptotic estimate holds.

Proof: First we apply Proposition 2.1, slicing the first argument. We will argue now that the double integral on the right-hand side grows sublinearly in n. We estimate the double integral by applying Lemma 3.3 to the integrand; the convergence to zero of the last double integral then follows from Sanov's theorem. ⊠

Corollary 3.5.

Let A be a G-diagram and E a probability space included in A. In the chain of estimates we use Lemma 3.4 and the fact that κ ≤ k in the last line, and we finish the proof by taking the limit n → ∞. ⊠

Proof (of Proposition 3.1): We start with a note on general terminology: a reduction f ∶ A → B of probability spaces can also be considered as a fan (A ← A → B). If the reduction f is a part of a bigger diagram also containing the space U, then the following inequality holds. Let K = (X ← Z → Y) be an optimal coupling between X and Y. It can also be viewed as a G-diagram of two-fans, K = {K_i}_{i∈G}, each of which is a minimal coupling between X_i and Y_i. Among them is the minimal fan K_ι = (U ← Z_ι → V). We use the triangle inequality to bound the distance κ([X|U], [Y|V]) by four summands as follows.
We will estimate each of the four summands separately; the bound for the first one is obtained as follows.

Finally, suppose [X] is a tropical G-diagram and U = X_ι is a space in it for some ι ∈ G. Choose a representative (X(n) ∶ n ∈ N₀) and denote U(n) ∶= X_ι(n). We now define the conditioned diagram [X|U] by the limit [X|U] ∶= lim_{n→∞} (1/n) ⋅ [X(n)|U(n)]. Proposition 3.2 guarantees that the limit exists and is independent of the choice of representative. For a fixed ι ∈ G the conditioning is a linear Lipschitz map.

The statement of Theorem 2.2 is as follows.

(i) The space Prob[G] does not depend on the choice of a positive admissible function ϕ, up to isometry.

(ii) The space Prob[G] is metrically complete.

(iii) The map X ↦ →X is a κ-κ-isometric embedding. The space of linear sequences, i.e. the image of the map above, is dense in Prob[G].

(iv) There is a distance-preserving homomorphism from Prob[G] into a Banach space B, whose image is a closed convex cone in B.

(v) The entropy functional Ent_* extends to a well-defined 1-Lipschitz linear map on Prob[G].