*Entropy* **2015**, *17*(5), 3253-3318; doi:10.3390/e17053253

## Abstract

We propose that entropy is a universal co-homological class in a theory associated to a family of observable quantities and a family of probability distributions. Three cases are presented: (1) classical probabilities and random variables; (2) quantum probabilities and observable operators; (3) dynamic probabilities and observation trees. This gives rise to a new kind of topology for information processes, which accounts for the main information functions (entropy, mutual information at all orders, and Kullback–Leibler divergence) and generalizes them in several ways. The article is divided into two parts that can be read independently. In the first part, the introduction, we provide an overview of the results, some open questions, future results and lines of research, and discuss briefly the application to complex data. In the second part we give the complete definitions and proofs of Theorems A, C and E of the introduction, which show why entropy is the first homological invariant of a structure of information in four contexts: static classical or quantum probability, dynamics of classical or quantum strategies of observation of a finite system.

## 1. Introduction

#### 1.1. What is Information?

“What is information?” is a question that has received several answers according to the different problems investigated. The best known definition was given by Shannon [1], using random variables and a probability law, for the problem of optimal message compression. However, the first definition was given by Fisher, as a metric associated to a smooth family of probability distributions, for optimal discrimination by statistical tests; it is a limit of the Kullback–Leibler divergence, which was introduced to estimate the accuracy of a statistical model of empirical data, and which can also be viewed as a quantity of information. More generally Kolmogorov considered that the concept of information must precede probability theory (cf. [2]). However, Évariste Galois saw the application of group theory for discriminating solutions of an algebraic equation as a first step toward a general theory of ambiguity, which was developed further by Riemann, Picard, Vessiot, Lie, Poincaré and Cartan, for systems of differential equations; it is also a theory of information. In another direction René Thom claimed that information must have a topological content (see [3]); he gave the example of the unfolding of the coupling of two dynamical systems, but he had in mind the whole domain of algebraic or differential topology.

All these approaches have in common the definition of secondary objects, either functions, groups or homology cycles, for measuring in what sense a pair of objects departs from independence. For instance, in the case of Shannon, the mutual information is I(X; Y) = H(X) + H(Y) − H(X, Y), where H denotes the usual Gibbs entropy (H(X) = − Σ_{x} P(X = x) log_{2} P(X = x)), and for Galois it is the quotient set IGal(L_{1}; L_{2}|K) = (Gal(L_{1}|K) × Gal(L_{2}|K))/Gal(L|K), where L_{1}, L_{2} are two fields containing a field K in an algebraic closure Ω of K, where L is the field generated by L_{1} and L_{2} in Ω, and where Gal(L_{i}|K) (for i = ∅, 1, 2) denotes the group introduced by Galois, made of the field automorphisms of L_{i} fixing the elements of K.
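As a minimal illustration of Shannon's measure of departure from independence, the following sketch computes I(X; Y) = H(X) + H(Y) − H(X, Y) from a joint law given as a dictionary. The function names and the two test laws are ours, chosen only for illustration.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a law given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, axis):
    """Marginal law of one coordinate of a joint law {(x, y): probability}."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[axis]] += p
    return dict(out)


def mutual_information(joint):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(marginal(joint, 0)) + entropy(marginal(joint, 1)) - entropy(joint)


# Two illustrative laws on pairs of bits (hypothetical data):
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}  # X and Y independent
copied = {(0, 0): 0.5, (1, 1): 0.5}                           # Y is a copy of X

mi_independent = mutual_information(independent)
mi_copied = mutual_information(copied)
```

The first law gives a vanishing mutual information, while the second gives one bit, the full entropy of X.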

We suggest that all information quantities are of co-homological nature, in a setting which depends on a pair of categories (cf. [4,5]); one for the data on a system, like random variables or functions of solutions of an equation, and one for the parameters of this system, like probability laws or coefficients of equations; the first category generates an algebraic structure like a monoid, or more generally a monad (cf. [4]), and the second category generates a representation of this structure, as do for instance conditioning, or adding new numbers; then information quantities are co-cycles associated with this module.

We will see that, given a set of random variables on a finite set Ω and a simplicial subset of probabilities on Ω, the entropy appears as the only universal co-homology class of degree one. The higher mutual information functions that were defined by Shannon are co-cycles (or twisted co-cycles for even orders), and they correspond to higher homotopical constructions. In fact this description is equivalent to the theorem of Hu Kuo Ting [6], which gave a set theoretical interpretation of the mutual information decomposition of the total entropy of a system. Then we can use information co-cycles to describe forms of the information distribution among a set of random data; figures like ordinary links, or chains or Borromean links appear in this context, giving rise to a new kind of topology.

#### 1.2. Information Homology

Here we call random variables (r.v.) on a finite set Ω congruent when they define the same partition (recall that a partition of Ω is a family of disjoint non-empty subsets covering Ω, and that the partition associated to a r.v. X is the family of subsets Ω_{x} of Ω defined by the equations X(ω) = x); the join r.v. YZ, also denoted by (Y, Z), corresponds to the coarsest partition that is finer than both Y and Z. This defines a monoid structure on the set Π(Ω) of partitions of Ω, with 1 as a unit, and where each element is idempotent, i.e., ∀X, XX = X. An information category is a set $\mathcal{S}$ of r.v. such that, for any $Y,Z\in \mathcal{S}$ less fine than $U\in \mathcal{S}$, the join YZ belongs to $\mathcal{S}$, cf. [7]. An ordering on $\mathcal{S}$ is given by Y ≤ Z when Z refines Y, which also defines the morphisms Z → Y in the category $\mathcal{S}$. In what follows we always assume that 1 belongs to $\mathcal{S}$. The simplex ∆(Ω) is defined as the set of families of numbers {p_{ω}; ω ∊ Ω}, such that ∀ω, 0 ≤ p_{ω} ≤ 1 and Σ_{ω} p_{ω} = 1; it parameterizes all probability laws on Ω. We choose a simplicial sub-complex $\mathcal{P}$ in ∆(Ω), which is stable under all the conditioning operations by elements of $\mathcal{S}$. By definition, for N ∊ ℕ, an information N-cochain is a family of measurable functions of $P\in \mathcal{P}$, with values in ℝ or ℂ, indexed by the sequences (S_{1};…;S_{N}) in $\mathcal{S}$ bounded above by an element of $\mathcal{S}$, whose values depend only on the image law (S_{1}, …, S_{N})_{*}P. This condition is natural from a topos point of view, cf. [4]; we interpret it as a “locality” condition. Note that we write (S_{1}; …; S_{N}) for a sequence, because (S_{1}, …, S_{N}) designates the joint variable. For N = 0 this gives only the constants. We denote by ${\mathcal{C}}^{N}$ the vector space of N-cochains of information. The following formula corresponds to the averaged conditioning of Shannon [1]:

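$${S}_{0}.F\left({S}_{1};\dots ;{S}_{N};\mathbb{P}\right)=\sum _{j}\mathbb{P}\left({S}_{0}={v}_{j}\right)F\left({S}_{1};\dots ;{S}_{N};\mathbb{P}|{S}_{0}={v}_{j}\right),$$

where the sum is taken over all the values v_{j} of S_{0}, and the vertical bar is ordinary conditioning. It satisfies the associativity condition $\left({{S}^{\prime}}_{0}{S}_{0}\right).F={{S}^{\prime}}_{0}.\left({S}_{0}.F\right)$.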

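The coboundary operator δ is defined by

$$\delta F\left({S}_{0};\dots ;{S}_{N};\mathbb{P}\right)={S}_{0}.F\left({S}_{1};\dots ;{S}_{N};\mathbb{P}\right)+\sum _{i=0}^{N-1}{\left(-1\right)}^{i+1}F\left({S}_{0};\dots ;\left({S}_{i},{S}_{i+1}\right);\dots ;{S}_{N};\mathbb{P}\right)+{\left(-1\right)}^{N+1}F\left({S}_{0};\dots ;{S}_{N-1};\mathbb{P}\right).$$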

It corresponds to a standard non-homogeneous bar complex (cf. [5]). Another co-boundary operator on ${\mathcal{C}}^{N}$ is δ_{t} (t for twisted or trivial action or topological complex), which is defined by the above formula with the first term S_{0}.F(S_{1};…;S_{N}; ℙ) replaced by F(S_{1};…;S_{N}; ℙ). The corresponding co-cycles are defined by the equations δF = 0 or δ_{t}F = 0, respectively. We easily verify that δ ○ δ = 0 and δ_{t} ○ δ_{t} = 0; then the co-homology ${H}^{*}\left(\mathcal{S};\mathbb{P}\right)$, resp. ${H}_{t}^{*}\left(\mathcal{S};\mathbb{P}\right)$, is defined by taking co-cycles modulo the elements of the image of δ, resp. δ_{t}, called co-boundaries. The fact that classical entropy H(X; ℙ) = − Σ_{i} p_{i} log_{2} p_{i} is a 1-co-cycle is the fundamental equation H(X, Y) = H(X) + X.H(Y).
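The fundamental equation can be checked numerically. The sketch below is only an illustration under our own conventions: a law is a dictionary on a small Ω, a variable is a function on Ω, and the degree-one coboundary is taken to be (δH)(X; Y) = X.H(Y) − H(X, Y) + H(X), which is our reading of the bar formula in degree one (an assumption). The helper names and the law `P` are hypothetical.

```python
import math
from collections import defaultdict


def image_law(P, X):
    """Law of the variable X (a function on Omega) under P = {omega: probability}."""
    law = defaultdict(float)
    for w, p in P.items():
        law[X(w)] += p
    return dict(law)


def H(X, P):
    """Shannon entropy (bits) of the variable X under the law P."""
    return -sum(q * math.log2(q) for q in image_law(P, X).values() if q > 0)


def condition(P, X, x):
    """Conditional law P | (X = x)."""
    mass = sum(p for w, p in P.items() if X(w) == x)
    return {w: p / mass for w, p in P.items() if X(w) == x}


def act(X, F, P):
    """Averaged conditioning (X.F)(P) = sum_x P(X = x) * F(P | X = x)."""
    return sum(q * F(condition(P, X, x)) for x, q in image_law(P, X).items() if q > 0)


def joint(X, Y):
    """Joint variable (X, Y)."""
    return lambda w: (X(w), Y(w))


def delta_H(X, Y, P):
    """Assumed degree-one coboundary: X.H(Y) - H(X, Y) + H(X)."""
    return act(X, lambda Q: H(Y, Q), P) - H(joint(X, Y), P) + H(X, P)


def delta_t_H(X, Y, P):
    """Twisted version: the first term X.H(Y) is replaced by H(Y)."""
    return H(Y, P) - H(joint(X, Y), P) + H(X, P)


# Hypothetical law on Omega = {0, 1}^2 with two coordinate variables.
P = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
X = lambda w: w[0]
Y = lambda w: w[1]

residual = delta_H(X, Y, P)   # vanishes up to rounding: H is a 1-co-cycle
twisted = delta_t_H(X, Y, P)  # equals the mutual information I(X; Y)
```

With the trivial action, the same expression does not vanish; it returns H(X) + H(Y) − H(X, Y), the mutual information.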

**Theorem A.** (cf. Theorem 1 section 2.3, [7]): For the full simplex ∆(Ω), and if
$\mathcal{S}$ is the monoid generated by a set of at least two variables, such that each pair takes at least four values, then the information co-homology space of degree one is one-dimensional and generated by the classical entropy.

**Problem 1.** Compute the homology of higher degrees.

We conjecture that for binary variables it is zero, but that in general non-trivial classes appear, deduced from polylogarithms. This could require us to connect with the works of Dupont, Bloch, Goncharov, Elbaz-Vincent, Gangl et al. on motives (cf. [8]), which started from the discovery of Cathelineau (1988) that entropy appears in the computation of the degree one homology of the discrete group SL_{2} over ℂ with coefficients in the adjoint action (cf. [9]).

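Suppose $\mathcal{S}$ is the monoid generated by a finite family of partitions. The higher mutual informations were defined by Shannon as alternating sums:

$${I}_{N}\left({S}_{1};\dots ;{S}_{N};\mathbb{P}\right)=\sum _{k=1}^{N}{\left(-1\right)}^{k-1}\sum _{I\subset \left[N\right];\,\mathrm{card}\left(I\right)=k}H\left({S}_{I};\mathbb{P}\right),$$

where S_{I} denotes the join of the S_{i} such that i ∊ I. We have I_{1} = H and I_{2} = I is the usual mutual information: I(S; T) = H(S) + H(T) − H(S, T).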
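The alternating sum of joint entropies can be sketched as follows. This is a minimal illustration under our own conventions (variables are functions on a finite Ω, the law is a dictionary); the inclusion-exclusion form used in the code is the standard one and agrees with I_{1} = H and I_{2} = I. The XOR example is ours.

```python
import math
from collections import defaultdict
from itertools import combinations


def joint_entropy(P, variables):
    """Entropy (bits) of the join of the given variables under P = {omega: probability}."""
    law = defaultdict(float)
    for w, p in P.items():
        law[tuple(S(w) for S in variables)] += p
    return -sum(q * math.log2(q) for q in law.values() if q > 0)


def mutual_information_N(P, variables):
    """I_N = sum over k = 1..N of (-1)^(k-1) * sum over subsets I of size k of H(S_I)."""
    total = 0.0
    for k in range(1, len(variables) + 1):
        sign = (-1) ** (k - 1)
        for subset in combinations(variables, k):
            total += sign * joint_entropy(P, subset)
    return total


# Omega = pairs of independent fair bits; Z is their XOR (hypothetical example).
P = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
X = lambda w: w[0]
Y = lambda w: w[1]
Z = lambda w: w[0] ^ w[1]

i2_xy = mutual_information_N(P, [X, Y])
i3_xyz = mutual_information_N(P, [X, Y, Z])
```

In this example the three variables are pairwise independent, so every pairwise mutual information vanishes, while I_{3} is negative (minus one bit): the information is carried only by the triple.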

**Theorem B.** (cf. section 3, [7]): I_{2m} = δ_{t}δδ_{t}…δδ_{t}H, I_{2m+1} = −δδ_{t}δδ_{t}…δδ_{t}H, where there are m − 1 δ and m δ_{t} factors for I_{2m} and m δ and m δ_{t} factors for I_{2m+1}.

Thus odd information quantities are information co-cycles, because they are in the image of δ, and even information quantities are twisted (or topological) co-cycles, because they are in the image of δ_{t}.

In [7] we show that this description is equivalent to the theorem of Hu Kuo Ting (1962) [6], giving a set theoretical interpretation of the mutual information decomposition of the total entropy of a system: mutual information, join and averaged conditioning correspond respectively to intersection, union and difference A\B = A ⋂ B^{c}. In special cases we can interpret I_{N} as homotopical algebraic invariants. For instance for N = 3, suppose that I(X; Y) = I(Y; Z) = I(Z; X) = 0, then I_{3}(X; Y; Z) = −I ((X,Y); Z) can be defined as a Milnor invariant for links, generalized by Massey, as they are presented in [10] (cf. page 284), through the 3-ary obstruction to associativity of products in a subcomplex of a differential algebra, cf. [7]. The absolute minima of I_{3} correspond to Borromean links, interpreted as synergy, cf. [11,12].

#### 1.3. Extension to Quantum Information

Positive hermitian n × n-matrices ρ, normalized by Tr(ρ) = 1, are called densities of states (or density operators) and are considered as quantum probabilities on E = ℂ^{n}. Real quantum observables are n × n hermitian matrices, and, by definition, the amplitude, or expectation, of the observable Z in the state ρ is given by the formula $\mathbb{E}(Z)=Tr(Z\rho )$ (see e.g., [13]). Two real observables Y, Z are said to be congruent if their eigenspaces are the same; thus orthogonal decompositions of E are the quantum analogs of partitions. The join is well defined for commuting observables. An information structure **S** is given by a subset of observables, such that, if Y, Z have a common refined eigenspace decomposition in **S**, their join (Y, Z) belongs to **S**. We assume that {E} belongs to **S**. What plays the role of a probability functor is a map **Q** from **S** to sets of positive hermitian forms on E, which behaves naturally with respect to the quantum direct image, thus **Q** is a covariant functor. We define information N-cochains as for the classical case, starting with the numerical functions on the sets **Q**_{X}; X ∊ **S**, which behave naturally under direct images.

The restriction of a density ρ by an observable Y is
${\rho}_{Y}={\displaystyle {\sum}_{A}{E}_{A}^{*}}\rho {E}_{A}$, where the E_{A}’s are the spectral projectors of the observable Y. The functor **Q** is said to match **S** (or to be complete and minimal with respect to **S**) if, for each X ∊ **S**, the set **Q**_{X} is the set of all possible densities of the form ρ_{X}.

The action of a variable on the cochains space ${\mathcal{C}}_{\mathcal{Q}}^{*}$ is given by the quantum averaged conditioning:

From here we define coboundary operators δ_{q} and δ_{Qt} by the formula (22), then the notions of co-cycles, co-boundaries and co-homology classes follow. We have δ_{q} ○ δ_{q} = 0 and δ_{Qt} ○ δ_{Qt} = 0; cf. [7].

When the unitary group U_{n} acts transitively on **S** and **Q**, there is a notion of invariant cochains, forming a subcomplex of information cochains, and giving a more computable co-homology than the raw information co-homology. We call it the invariant information co-homology and denote it by ${H}_{U}^{*}\left(\mathbf{S};\mathbf{Q}\right)$.

The Von Neumann entropy of ρ is S(ρ) = 𝔼_{ρ}(−log_{2}(ρ)) = −Tr(ρ log_{2}(ρ)); it defines a 0-cochain S_{X} by restricting S to the sets **Q**_{X}. The classical entropy is $H\left(Y;\rho \right)=-{\displaystyle {\sum}_{A}Tr}\left({E}_{A}^{*}\rho {E}_{A}\right){\mathrm{log}}_{2}\left(Tr\left({E}_{A}^{*}\rho {E}_{A}\right)\right)$. Both these co-chains are invariant. It is well known that S_{(X,Y)}(ρ) = H(X; ρ) + X.S_{Y}(ρ) when X, Y commute, cf. [13]. In particular, by taking Y = 1_{E} we see that classical entropy measures the defect of equivariance of the quantum entropy, i.e., H(X; ρ) = S_{X}(ρ) − (X.S)(ρ). But using the case where X refines Y, we obtain that the entropy of Shannon is the co-boundary of (minus) the Von Neumann entropy.
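The relation H(X; ρ) = S_{X}(ρ) − (X.S)(ρ) can be illustrated on a small example. The sketch below makes several assumptions of ours: E = ℂ^{4} is split by X into two 2-dimensional eigenspaces, ρ is real symmetric, and (X.S)(ρ) is taken to be the average of the Von Neumann entropies of the compressed densities E_{A}ρE_{A}, each normalized by its trace. Only the two diagonal blocks of ρ enter the computation, so only those (hypothetical) blocks are specified.

```python
import math


def eigenvalues_2x2(block):
    """Eigenvalues of a real symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = block
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return [mean - radius, mean + radius]


def entropy_bits(weights):
    """-sum w log2 w over the strictly positive weights."""
    return -sum(w * math.log2(w) for w in weights if w > 0)


# Hypothetical diagonal blocks E_A rho E_A of a density rho on C^4 (total trace 1).
blocks = [
    [[0.30, 0.10], [0.10, 0.10]],  # trace 0.4
    [[0.35, 0.05], [0.05, 0.25]],  # trace 0.6
]
traces = [blk[0][0] + blk[1][1] for blk in blocks]

# Classical entropy H(X; rho) of the weights Tr(E_A rho E_A).
classical_entropy = entropy_bits(traces)

# Von Neumann entropy of the restriction rho_X = sum_A E_A rho E_A (block diagonal).
entropy_of_restriction = entropy_bits(
    [lam for blk in blocks for lam in eigenvalues_2x2(blk)]
)

# Assumed form of (X.S)(rho): sum_A p_A * S(E_A rho E_A / p_A).
averaged_conditional_entropy = sum(
    p * entropy_bits(eigenvalues_2x2([[x / p for x in row] for row in blk]))
    for p, blk in zip(traces, blocks)
)

gap = classical_entropy - (entropy_of_restriction - averaged_conditional_entropy)
```

Under these assumptions `gap` vanishes up to rounding, which is the block-diagonal form of the chain rule for the Von Neumann entropy.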

**Theorem C.** (cf. Theorem 3 section 4.3): For n ≥ 4 and when **S** is generated by at least two decompositions such that each pair has at least four subspaces, and when **Q** is matching **S**, the invariant co-homology
${H}_{U}^{1}$ of δ_{q} in degree one is zero, and the space
${H}_{U}^{0}$ is of dimension one. In particular, the only invariant 0-cochain such that δS = −H is the Von Neumann entropy.

(This statement, which will be proved below, corrects a similar statement which was made in the announcement [14].)

#### 1.4. Concavity and Convexity Properties of Information Quantities

The simplest classical information structure $\mathcal{S}$ is the monoid generated by a family of “elementary” binary variables S_{1},…,S_{n}. It is remarkable that in this case, the information functions I_{N,J} = I_{N}(S_{j_{1}};…;S_{j_{N}}) over all the subsets J = {j_{1},…,j_{N}} of [n] = {1,…, n}, different from [n] itself, give algebraically independent functions on the probability simplex ∆(Ω) of dimension 2^{n} − 1. They form coordinates on the quotient of ∆(Ω) by a finite group.

Let $\mathcal{L}$_{d} denote the Lie derivative with respect to d = (1,…,1) in the vector space ${\mathbb{R}}^{{2}^{n}}$, and ∆_{E} the Euclidean Laplace operator on ${\mathbb{R}}^{{2}^{n}}$; then ∆ = ∆_{E} − 2^{−n} $\mathcal{L}$_{d} ○ $\mathcal{L}$_{d} is the Laplace operator on the simplex ∆(Ω) defined by equating the sum of coordinates to 1.

**Theorem D.** (cf. [15]): On the affine simplex ∆(Ω) the functions I_{N,J} with N odd (resp. even) satisfy the inequality ∆I_{N} ≥ 0 (resp. ∆I_{N} ≤ 0).

In other terms, for N odd the I_{N,J} are super-harmonic, which is a kind of weak concavity, and for N even they are sub-harmonic, which is a kind of weak convexity. In particular, when N is even (resp. odd) I_{N,J} has no local maximum (resp. minimum) in the interior of ∆(Ω).

**Problem 2.** What can be said of the other critical points of I_{N,J}? What can be said of the restriction of one information function on the intersection of levels of other information functions? Information topology depends on the shape of these intersections and on the Morse theory for them.

#### 1.5. Monadic Cohomology of Information

Now we consider the category $\mathcal{S}^{*}$ of generalized ordered partitions of Ω over $\mathcal{S}$: they are sequences S = (E_{1},…,E_{m}) of subsets of Ω such that ⋃_{j}E_{j} = Ω and ${E}_{i}\cap {E}_{j}=\emptyset$ as soon as i ≠ j. The number m is called the degree of S. Note the important technical point that some of the sets E_{j} can be the empty set. In the same spirit we introduce generalized ordered orthogonal decompositions of E for the quantum case; but in this summary, for simplicity, we restrict ourselves to the classical case. Also, from now on in this summary, we omit the word “generalized” before “ordered”. A rooted tree decorated by $\mathcal{S}^{*}$ is an oriented finite planar tree Γ, with a marked initial vertex s_{0}, called the root of Γ, where each vertex s is equipped with an element F_{s} of $\mathcal{S}^{*}$, such that the edges issuing from s correspond to the values of F_{s}. When we want to indicate that we restrict to partitions less fine than a partition X we add an index X, as in ${\mathcal{S}}_{X}^{*}$.

The notation μ(m; n_{1},…,n_{m}) denotes the operation which associates to an ordered partition S of degree m and to m ordered partitions S_{i} of respective degrees n_{i}, the ordered partition that is obtained by cutting the pieces of S using the pieces of S_{i} and respecting the order. An evident unit element for this operation is the unique partition π_{0} of degree 1. The symbol μ_{m} denotes the collection of those operations for m fixed. The introduction of empty subsets in ordered partitions ensures that the result of μ(m; n_{1},…,n_{m}) is a partition of length n_{1} +… + n_{m}, thus the μ_{m} do define what is called an operad; cf. [10,16]. The axioms of unity, associativity and covariance for permutations are satisfied. See [10,16–18] for the definition of operads.
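The cutting operation can be sketched concretely. In the illustration below, which uses our own conventions, an ordered partition is a Python list of frozensets (empty pieces allowed), and `compose` is a hypothetical name for the operation that cuts the i-th piece of S by the pieces of the i-th partition, keeping the order.

```python
def compose(S, parts):
    """Cut each piece E_i of the ordered partition S by the pieces of parts[i].

    The result lists, in order, the intersections of E_i with the pieces of parts[i];
    its length is the sum of the degrees of the parts, and empty pieces are kept.
    """
    if len(S) != len(parts):
        raise ValueError("need exactly one ordered partition per piece of S")
    return [piece & sub for piece, T in zip(S, parts) for sub in T]


# Hypothetical example on Omega = {0, 1, 2, 3}.
omega = frozenset(range(4))
S = [frozenset({0, 1}), frozenset({2, 3})]                  # degree 2
T1 = [frozenset({0, 2}), frozenset({1, 3})]                 # degree 2
T2 = [frozenset({0, 1, 2}), frozenset({3}), frozenset()]    # degree 3, one empty piece

result = compose(S, [T1, T2])   # degree 2 + 3 = 5, last piece empty
pi0 = [omega]                   # the partition of degree 1, acting as a unit
```

Keeping the empty intersections is what makes the degree of the result equal to the sum of the degrees, independently of the particular partitions involved.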

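The most important algebraic object which is associated to an operad is a monad (cf. [4,16]), i.e., a functor $\mathcal{V}$ from a category $\mathcal{A}$ to itself, equipped with two natural transformations $\mu :\mathcal{V}\circ \mathcal{V}\to \mathcal{V}$ and $\eta :\mathbb{R}\to \mathcal{V}$, which satisfy the following axioms:

$$\mu \circ \left(\mathcal{V}\mu \right)=\mu \circ \left(\mu \mathcal{V}\right),\qquad \mu \circ \left(\mathcal{V}\eta \right)=\mu \circ \left(\eta \mathcal{V}\right)=\mathrm{Id}.$$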

In our situation, we can apply the Schur construction (cf. [16]) to the μ_{m} to get a monad: take for V the real vector space freely generated by $\mathcal{S}^{*}$; it is naturally graded, so it is the direct sum of spaces V(m); m ≥ 1 where the symmetric group ${\mathfrak{S}}_{m}$ acts naturally to the right, then introduce, for any real vector space W the real vector space $\mathcal{V}\left(W\right)={\oplus}_{m\ge 0}V\left(m\right){\otimes}_{{\mathfrak{S}}_{m}}{W}^{\otimes m}$; the Schur composition is defined by $\mathcal{V}\circ \mathcal{V}={\oplus}_{m\ge 0}V\left(m\right){\otimes}_{{\mathfrak{S}}_{m}}{\mathcal{V}}^{\otimes m}$. It is easy to verify that the collection (μ_{m}; m ∊ ℕ) defines a natural transformation $\mathcal{V}\circ \mathcal{V}\to \mathcal{V}$, and the trivial partition π_{0} defines a natural transformation $\eta :\mathcal{R}\to \mathcal{V}$, which satisfy the axioms of a monad.

Also we fix a functor of probability laws Q_{X} over the category $\mathcal{S}$. Let $\mathcal{M}$_{X}(m) be the vector space freely generated over ℝ by the symbols (P, i, m) where P belongs to Q_{X}, and 1 ≤ i ≤ m. In the last section of the second part we show how this space arises from the consideration of divided probabilities. This is apparent in the following definition of the right action of the operad $\mathcal{V}$ on the family ${\mathcal{M}}_{X}\left(m\right);m\in {\mathbb{N}}^{*}$: a sequence S_{1},…,S_{m} of ordered partitions in ${\mathcal{S}}_{X}^{*}$ acts on a generator (P, i, m) by giving the vector Σ_{j}p_{j}(P_{j}, (i, j), n), where p_{j} is the probability P(S_{i} = j) and P_{j} is the conditioned probability P|(S_{i} = j). We denote by θ_{m}((P, i, m), (S_{1},…,S_{m})) this vector.

Now we consider the Schur functor ${\mathcal{M}}_{X}\left(W\right)={\oplus}_{m}{\mathcal{M}}_{X}\left(m\right){\otimes}_{{\mathfrak{S}}_{m}}{W}^{\otimes m}$; the operations θ_{m} define a natural transformation $\theta :\mathcal{M}\circ \mathcal{V}\to \mathcal{M}$, which is an action to the right in the sense of monads, i.e., $\theta \circ \left(\mathcal{M}\mu \right)=\theta \circ \left(\theta \mathcal{V}\right)$; θ ○ ($\mathcal{M}$η) = Id. (We omit the index X for simplicity.)

Now we consider the bar resolution of $\mathcal{M}$: $\dots \to \mathcal{M}\circ {\mathcal{V}}^{\circ \left(k+1\right)}\to \mathcal{M}\circ {\mathcal{V}}^{\circ k}\to \dots $, as in Beck (triples,…) [19], and Fresse [16], with its simplicial structure deduced from θ and μ, and the complex of natural transformations of $\mathcal{V}$-right modules ${\mathcal{C}}^{*}\left(\mathcal{M}\right)=Ho{m}_{\mathcal{V}}\left(\mathcal{M}\circ {\mathcal{V}}^{\circ *},\mathcal{R}\right)$, where $\mathcal{R}$ is the trivial right module given by $\mathcal{R}\left(m\right)=\mathbb{R}$. As in the classical case, we restrict ourselves to co-chains that are measurable in the probability (P, i, m).

The co-boundary is defined by the Hochschild formula, extended by MacLane and Beck to monads (see Beck [19]):

The cochains are described by families of scalar measurable functions F_{X}(S_{1};…;S_{k}; (P, i, m)), where S_{1};…;S_{k} is a forest of m trees of level k labelled by ${\mathcal{S}}_{X}^{*}$, and where the value on (P, i, m) depends only on the tree ${S}_{1}^{i};{S}_{2}^{i};\dots ;{S}_{k}^{i}$.

We now impose the condition, called regularity, that ${F}_{X}\left({S}_{1};\dots ;{S}_{k};\left(P,i,m\right)\right)={F}_{X}\left({S}_{1}^{i};{S}_{2}^{i};\dots ;{S}_{k}^{i};P\right)$. The regular co-chains form a sub-complex ${C}_{r}^{*}\left(\mathcal{M}\right)$; by definition, its homology is the arborescent information co-homology.

The regular cochains of degree k are determined by their values for m = 1 and decorated trees of level k, where the co-boundary takes the form:

This gives co-homology groups ${H}_{\tau}^{*}\left(\mathcal{S},\mathcal{P}\right)$, τ for tree. The fact that entropy H(S_{*}ℙ) = H(S; ℙ) defines a 1-cocycle is a result of an equation of Faddeev, generalized by Baez, Fritz and Leinster [20], who gave another interpretation, based on the operad structure of the set of all finite probability laws. See also Marcolli and Thorngren [21].

**Theorem E.** (cf. Theorem 4 section 6.3, [22]): If Ω has more than four points, ${H}_{\tau}^{1}\left(\Pi \left(\mathrm{\Omega}\right),\mathrm{\Delta}\left(\mathrm{\Omega}\right)\right)$ is the one dimensional vector space generated by the entropy.

Another co-boundary δ_{t} on ${\mathcal{C}}_{r}^{*}\left(\mathcal{M}\right)$ corresponds to another right action of the monad ${\mathcal{V}}_{X}$, which is deduced from the maps θ_{t} that send (P, i, m) ⊗ S_{1} ⊗ … ⊗ S_{m} to the sum of the vectors (P, (i, j), n) for j = 1,…, n_{i} that are associated to the end branches of S_{i}. It gives a twisted version of information co-homology as we have done in the first paragraph. This allows us to define higher information quantities for strategies: for N = 2M + 1 odd, I_{τ,N} = −(δδ_{t})^{M} H, and for N = 2M + 2 even, I_{τ,N} = δ_{t}(δδ_{t})^{M} H.

This gives for N = 2, a notion of mutual information between a variable S of length m and a collection T of m variables T_{1},…,T_{m}:

When all the T_{i} are equal we recover the ordinary mutual information of Shannon plus a multiple of the entropy of T_{i}.

#### 1.6. The Forms of Information Strategies

A rooted tree Γ decorated by ${\mathcal{S}}_{*}$ can be seen as a strategy to discriminate between points in Ω. For each vertex s there is a minimal set of chained edges α_{1},…,α_{k} connecting s_{0} to s; the cardinal k is called the level of s; this chain defines a sequence (F_{0},v_{0}; F_{1},v_{1}; …; F_{k−1},v_{k−1}) of observables and values of them; then we can associate to s the subset Ω_{s} of Ω where each F_{j} takes the value v_{j}. At a given level k the sets Ω_{s} form a partition π_{k} of Ω; the first one π_{0} is the unit partition of length 1, and π_{l} is finer than π_{l−1} for any l. By recurrence over k it is easy to deduce from the orderings of the values of F_{s} an embedding in the Euclidean plane of the subtrees Γ(k) at level k such that the values of the variables issued from each vertex are oriented in the direct trigonometric sense, thus π_{k} has a canonical ordering ω_{k}. Remark that many branches of the tree give the empty set for Ω_{s} after some level; we call them dead branches. It is easy to prove that the set $\Pi {\left(\mathcal{S}\right)}_{*}$ of ordered partitions that can be obtained as a (π_{k},ω_{k}) for some tree Γ and some level k is closed under the natural ordered join operation, and, as $\Pi {\left(\mathcal{S}\right)}_{*}$ contains π_{0}, it forms a monoid, which contains the monoid $M\left({\mathcal{S}}_{*}\right)$ generated by ${\mathcal{S}}_{*}$.

Complete discrimination of Ω by ${\mathcal{S}}_{*}$ exists when the final partition of Ω by singletons is attainable as a π_{k}; optimal discrimination corresponds to minimal level k. When the set Ω is a subset of the set of words x_{1},…,x_{N} with letters x_{i} belonging to given sets M_{i} of respective cardinalities m_{i}, the problem of optimal discrimination by observation strategies Γ decorated by ${\mathcal{S}}_{*}$ is equivalent to a problem of minimal rewriting by words of type (F_{0},v_{0}), (F_{1},v_{1}), …, (F_{k},v_{k}); it is a variant of optimal coding, where the alphabet is given. The topology of the poset of discriminating strategies can be computed in terms of the free Lie algebra on Ω, cf. [16].

Probabilities ℙ in $\mathcal{P}$ correspond to a priori knowledge on Ω. In many problems $\mathcal{P}$ is reduced to one element, that is the uniform law. Let s be a vertex in a strategic tree Γ, and let ${\mathcal{P}}_{s}$ be the set of probability laws that are obtained by conditioning through the equations F_{i} = v_{i}; i = 0,…,k − 1 for a minimal chain leading from s_{0} to s. We can consider that the sets ${\mathcal{P}}_{s}$ for different s along a branch measure the evolution of knowledge when applying the strategy. The entropy H(F; ℙ_{s}) for F in ${\mathcal{S}}_{*}$ and ℙ_{s} in ${\mathcal{P}}_{s}$ gives a measure of the information we hope to obtain when applying F at s in the state ℙ_{s}. The maximum entropy algorithm consists in choosing at each vertex s a variable that has the maximal conditioned entropy H(F; ℙ_{s}).

**Theorem F.** (cf. [22]): To find one false piece of different weight among N pieces for N ≥ 3, when knowing the false piece is unique, with the minimal number of weighings, one can use the maximal entropy algorithm.
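To make the maximum entropy rule concrete, here is a sketch of its first step only, for a two-pan balance. The modelling is our assumption: the a priori law is uniform on the 2N states (which piece is false, and whether it is heavier or lighter), and a weighing compares k pieces against k others. The function names are hypothetical, and the sketch does not attempt the full strategy of Theorem F.

```python
import math


def first_weighing_entropy(n_pieces, k):
    """Entropy (bits) of the outcome of weighing k pieces against k others.

    Assumes a uniform law on the 2 * n_pieces states (false piece, heavier or lighter).
    """
    states = 2 * n_pieces
    tilt = 2 * k                 # states tilting the balance to the left (same for the right)
    balanced = states - 4 * k    # states where the false piece is not on the balance
    counts = [tilt, tilt, balanced]
    return -sum(c / states * math.log2(c / states) for c in counts if c > 0)


def best_first_weighing(n_pieces):
    """Number k of pieces per pan that maximizes the entropy of the first outcome."""
    return max(
        range(1, n_pieces // 2 + 1),
        key=lambda k: first_weighing_entropy(n_pieces, k),
    )


best_k = best_first_weighing(12)
best_entropy = first_weighing_entropy(12, best_k)
```

For twelve pieces the rule selects four pieces per pan, where the three outcomes are equally likely and the entropy reaches log_{2} 3.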

However, we have another measure of information, namely of the remaining ambiguity at s, obtained by taking for the Galois group G_{s} the set of permutations of Ω_{s} which respect globally the set ${\mathcal{P}}_{s}$ and the set of restrictions of elements of ${\mathcal{S}}_{*}$ to Ω_{s}, and which preserve one by one the equations F_{i} = v_{i}. Along branches of Γ this gives a decreasing sequence of groups, whose successive quotients measure the evolution of acquired information in an algebraic sense.

**Problem 3.** Generalize Theorem F. Can we use algorithms based on the Galoisian measure of information? Can we use higher information quantities associated to trees for optimal discrimination?

#### 1.7. Conclusion and Perspective

Concepts of algebraic topology were recently applied to information theory by several researchers. In particular, notions coming from category theory, homological algebra and differential geometry were used for revisiting the nature and scope of entropy, cf. for instance Baez et al. [20], Marcolli and Thorngren [21] and Gromov [23]. In the present note we interpreted entropy and Shannon information functions as co-cycles in a natural co-homology theory of information, based on categories of observables and complexes of probabilities. This allowed us to associate topological figures, like Borromean links, with particular configurations of mutual dependency of several observable quantities. Moreover we extended these results to a dynamical setting of system observation, and we connected probability evolutions with the measures of ambiguity given by Galois groups. All those results provide only the first steps toward a developed Information Topology. However, even at this preliminary stage, this theory can be applied to the study of the distribution and evolution of information in concrete physical and biological systems. This kind of approach has already proved its efficiency for detecting collective synergic dynamics in neural coding [12], in genetic expression [24], in cancer signatures [25], or in signaling pathways [26]. In particular, information topology could provide the principles accounting for the structure of information flows in biological systems and notably in the central nervous system of animals.

## 2. Classical Information Topos. Theorem One

#### 2.1. Information Structures and Probability Families

Let Ω be a finite set. The set Π(Ω) of all partitions of Ω constitutes a category with one arrow Y → Z from Y to Z when Y is finer than Z; we also say in this case that Y divides Z. In Π(Ω) we have an initial element, which is the partition by points, denoted ω, and a final element, which is Ω itself and is denoted by 1. The joint partition YZ or (Y, Z) of two partitions Y, Z of Ω is the coarsest partition that divides Y and Z, i.e., their gcd. For any X we get XX = X, ωX = ω and 1.X = X.

By definition an information structure $\mathcal{S}$ on Ω is a subset of Π(Ω), such that for any element X of $\mathcal{S}$, and any pair of elements Y, Z in $\mathcal{S}$ that X refines, the joint partition YZ also belongs to $\mathcal{S}$.

In addition we will always assume that the final partition 1 belongs to $\mathcal{S}$. In terms of observations, it means that at least something is a certitude.

Examples: start with a set Σ = {S_{i}; 1 ≤ i ≤ n} of partitions of Ω. For any subset I = {i_{1},…, i_{k}} of [n] = {1,…, n}, the joint (S_{i_{1}},…, S_{i_{k}}), also denoted S_{I}, divides each S_{i_{j}}. The set W = W(Σ) of all the S_{I}, when I describes the subsets of [n], is an information structure. It is even a commutative monoid, because any product of elements of W belongs to W, and the partition associated to Ω itself gives the identity element of W. The product S_{[n]} of all the S_{i} is maximal; it divides all the other elements. Like Π(Ω), the monoid W(Σ) is idempotent, i.e., for any X we have XX = X.

By definition, the faces of the abstract simplex ∆([n]) are the subsets of [n]; its vertices are the singletons. Thus the monoid W(Σ) can be identified with the first barycentric subdivision of the simplex ∆([n]).

Recall that a simplicial subcomplex of ∆([n]) is a subset of faces that contains all faces of any of its elements. Then any simplicial sub-complex K of ∆([n]) gives a simplicial information structure $\mathfrak{S}\left(K\right)$, embedded in W(Σ). In fact, if Y and Z are faces of a simplex X belonging to K, YZ is also a face in X, thus it belongs to K. The maximal faces Σ_{a}; a ∊ A of K correspond to the finest elements in $\mathfrak{S}\left(K\right)$; the vertices of a face Σ_{a} give a family of partitions, which generates a sub-monoid W_{a} = W(Σ_{a}) of W; it is a sub-information structure (full sub-category) of $\mathfrak{S}\left(K\right)$, having the same unit, but having its own initial element ω_{a}. These examples arise naturally when formalizing measurements if some obstructions or a priori decisions forbid a set of joint measurements.

Examples of this kind were considered by Han [27]; see also McGill [28].

**Example 1.** Ω has four elements (00), (01), (10), (11); the variable S_{1} (resp. S_{2}) is the projection pr_{1} (resp. pr_{2}), on E_{1} = E_{2} = {0, 1}; Σ is the set {S_{1}, S_{2}}. The monoid W(Σ) has four elements 1, S_{1}, S_{2}, S_{1} S_{2}. The partition S_{1}S_{2} = S_{2}S_{1} corresponds to the variable Id : Ω → Ω.

**Example 2.** Same Ω as before, with the same names for the elements, but we take all the partitions of Ω in $\mathcal{S}$. In addition to 1, S_{1}, S_{2} and S = S_{1}S_{2}, there is S_{3}, the last partition into two subsets of cardinal two, which can be represented by the sum of the indices: S_{3}(00) = 0, S_{3}(11) = 0, S_{3}(01) = 1, S_{3}(10) = 1; the four partitions Y_{ω}, for ω ∊ Ω, formed by a singleton {ω} and its complement; and finally the six partitions X_{μν} = Y_{μ}Y_{ν}, indexed by pairs of points in Ω satisfying μ < ν in the lexical order. The product of two distinct Y is an X, the product of two distinct X or two distinct S_{i} is S, the product of one Y and an S_{i} is an X, of one Y and an X is this X or S, of one S_{i} and an X is this X or S. In particular the monoid W is also generated by the three S_{i} and the four Y_{ω}; it is called the monoid of partitions of Ω, and the associative algebra $\Lambda \left(\mathcal{S}\right)$ of this monoid is called the partition algebra of Ω.

**Example 3.** Same Ω as before, that is Ω = ∆(4), with the notations of Example 2 for the partitions; but we choose as generating family the set ϒ of the four partitions Y_{μ}; μ ∊ Ω; the joint product of two such partitions is either a Y_{μ} (when they coincide) or an X_{μν} (when they are different). The monoid W(ϒ) has twelve elements.

**Example 4.** Ω has 8 elements, noted (000),…,(111), and we consider the family Σ of the three binary variables S_{1}, S_{2}, S_{3} given by the three projections. If we take all the joints, we have a monoid of eight elements. However, if we forbid the maximal face (S_{1}, S_{2}, S_{3}), we have a structure
$\mathcal{S}$ which is not a monoid; it is the set formed by 1, S_{1}, S_{2}, S_{3} and the three joint pairs (S_{1}, S_{2}), (S_{1}, S_{3}), (S_{2}, S_{3}).
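The element counts in Examples 1 to 4 can be checked by brute force. The following is a minimal sketch, assuming partitions are encoded as sets of blocks and the product is the common refinement; the helper names `partition_from_map`, `join` and `generated_monoid` are hypothetical and chosen only for this illustration.

```python
from itertools import product

def partition_from_map(omega, f):
    """Partition of omega into the fibers of the map f."""
    blocks = {}
    for w in omega:
        blocks.setdefault(f(w), set()).add(w)
    return frozenset(frozenset(b) for b in blocks.values())

def join(x, y):
    """Joint partition XY: the non-empty intersections of blocks."""
    return frozenset(a & b for a in x for b in y if a & b)

def generated_monoid(omega, generators):
    """Close the generators under join, starting from the trivial partition 1."""
    unit = frozenset([frozenset(omega)])
    monoid = {unit}
    frontier = [unit]
    while frontier:
        x = frontier.pop()
        for g in generators:
            xy = join(x, g)
            if xy not in monoid:
                monoid.add(xy)
                frontier.append(xy)
    return monoid

# Examples 1 to 3: Omega has four points.
omega2 = ["00", "01", "10", "11"]
s1 = partition_from_map(omega2, lambda w: w[0])
s2 = partition_from_map(omega2, lambda w: w[1])
s3 = partition_from_map(omega2, lambda w: (int(w[0]) + int(w[1])) % 2)
ys = [partition_from_map(omega2, lambda w, m=m: w == m) for m in omega2]

example1 = generated_monoid(omega2, [s1, s2])
example2 = generated_monoid(omega2, [s1, s2, s3] + ys)
example3 = generated_monoid(omega2, ys)

# Example 4: three binary projections on eight points.
omega3 = ["".join(bits) for bits in product("01", repeat=3)]
projections = [partition_from_map(omega3, lambda w, i=i: w[i]) for i in range(3)]
example4 = generated_monoid(omega3, projections)
finest = frozenset(frozenset([w]) for w in omega3)
example4_forbidden = example4 - {finest}  # the maximal face is forbidden

print(len(example1), len(example2), len(example3), len(example4), len(example4_forbidden))
```

In this encoding the forbidden structure of Example 4 keeps seven elements and is not closed under the product, since the join of a pair with the third projection is the removed finest partition.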

On the side of probabilities, we choose a Boolean algebra
$\mathcal{B}$ of sets in Ω, i.e., a subset
$\mathcal{B}$ of the set
$\mathbb{P}\left(\mathrm{\Omega}\right)$ of subsets of Ω that contains the empty set
$\emptyset$ and the full set Ω, and is closed under union and intersection. In this finite context, it is easy to prove that
$\mathcal{B}$ consists of all the unions of its minimal elements (called atoms). In this case, we will consider only information structures made of partitions each of whose elements belongs to
$\mathcal{B}$. Consequently we could replace everywhere Ω by the finite set
${\mathrm{\Omega}}_{\mathcal{B}}$ of the atoms of
$\mathcal{B}$, but we will see that several Boolean sub-algebras appear naturally in the process of observation, thus we prefer to mention the choice of
$\mathcal{B}$ at the beginning of observations. Then we consider the set
$\mathrm{\Delta}\left({\mathrm{\Omega}}_{\mathcal{B}}\right)$, or
$\mathrm{\Delta}\left(\mathcal{B}\right)$, of all probability laws on
$\left(\mathrm{\Omega},\mathcal{B}\right)$, i.e., all real functions p_{x} of the atoms x of
$\mathcal{B}$ (the points of
${\mathrm{\Omega}}_{\mathcal{B}}$), satisfying p_{x} ≥ 0 and Σ_{x} p_{x} = 1. We see that this set of probabilities is also a simplex ∆([N]), where N is the cardinality of
${\mathrm{\Omega}}_{\mathcal{B}}$.

As on the side of partitions, we will consider more generally any simplicial sub-complex $\mathcal{Q}$ of $\mathrm{\Delta}\left(\mathcal{B}\right)$, and call it a probability complex. In the appendix, we show that examples of this kind correspond to natural forbidding rules, which can express physical constraints on the observed system.

A partition Y which is measurable with respect to
$\mathcal{B}$ is made by elements Y_{i} for i = 1, …, m, belonging to
$\mathcal{B}$. Let P be an element of
$\mathrm{\Delta}\left(\mathcal{B}\right)$; the conditioning of P by the element Y_{i} is defined only if P(Y_{i}) ≠ 0, and given by the formula P(B|Y = y_{i}) = P(B ⋂ Y_{i})/P(Y_{i}). We will consider it as a probability on Ω equipped with
$\mathcal{B}$, not as a probability on Y_{i}. Remark that if P belongs to a simplicial family
$\mathcal{Q}$, the probability P(B|Y = y_{i}) is also contained in
$\mathcal{Q}$. In fact, if the smallest face of
$\mathcal{Q}$ which contains P is the simplex σ on the vertices x_{1},…,x_{k}, then the conditioning of P by Y_{i}, being equal to 0 for the other atoms x, belongs to a face of σ, which is in
$\mathcal{Q}$, because
$\mathcal{Q}$ is a complex.

For a probability family $\mathcal{Q}$, i.e., a set of probabilities on Ω, and a set of partitions $\mathcal{S}$, we say that $\mathcal{Q}$ and $\mathcal{S}$ are adapted to each other if the conditioning of every element of $\mathcal{Q}$ by every element of $\mathcal{S}$ belongs to $\mathcal{Q}$.

By definition, the algebra
${\mathcal{B}}_{Y}$ is the set of unions of elements of the partition Y. We can consider it as a Boolean algebra on Ω contained in
$\mathcal{B}$ or as Boolean algebra on the quotient set Ω/Y. The image Y_{*}Q of a probability Q for
$\mathcal{B}$ by the partition Y is the probability on Ω for the sub-algebra
${\mathcal{B}}_{Y}$, that is given by Y_{*}Q(t) = Q(t) for t ∊
${\mathcal{B}}_{Y}$. It is the forgetting operation, also frequently named marginalization by Y.
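Conditioning and marginalization can be illustrated on the four-point set of Example 1. This is a minimal sketch under stated assumptions: the function names `condition`, `marginal` and `support`, and the particular law `p`, are hypothetical choices made for the illustration.

```python
from fractions import Fraction

def condition(p, block):
    """Law P|(Y = y_i), kept as a law on Omega; needs P(block) != 0."""
    mass = sum(p[w] for w in block)
    if mass == 0:
        raise ValueError("conditioning by an event of probability zero")
    return {w: (p[w] / mass if w in block else Fraction(0)) for w in p}

def marginal(p, partition):
    """Image law Y_*P: the probability of each block of the partition Y."""
    return {block: sum(p[w] for w in block) for block in partition}

def support(p):
    """Atoms of non-zero probability, i.e. the smallest face containing p."""
    return frozenset(w for w in p if p[w] != 0)

# Hypothetical law on Omega = {00, 01, 10, 11}, lying in the face {00, 01, 10}.
p = {"00": Fraction(1, 2), "01": Fraction(1, 4),
     "10": Fraction(1, 4), "11": Fraction(0)}
s1 = [frozenset({"00", "01"}), frozenset({"10", "11"})]  # partition by the first digit

p_given_first_0 = condition(p, s1[0])
image = marginal(p, s1)

print(p_given_first_0)
print(image)
# The conditioned law stays in a face of the smallest face containing p.
print(support(p_given_first_0) <= support(p))
```

The last line reflects the remark made above: conditioning can only shrink the support, which is why a simplicial family is stable under conditioning.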

By definition, the set
${\mathcal{Q}}_{Y}$ is the image of Y_{*}. Let us prove that it is a simplicial sub-complex of
$\mathrm{\Delta}\left({\mathcal{B}}_{Y}\right)$: take a simplex σ of
$\mathcal{Q}$, denote its vertices by x_{1},…,x_{k}, note δ_{j} the Dirac mass of x_{j}, and look at the partition σ_{i} = Y_{i} ⋂ σ of σ induced by Y, then for all the x_{j} ∊ σ_{i} the images Y_{*} δ_{j} coincide. Let us denote this image by δ(Y, σ_{i}); it is an element of
${\mathcal{Q}}_{Y}$. For every law Q in σ, the image Y_{*}Q belongs to the simplex on the laws δ(Y, σ_{i}), and any point in this simplex belongs to
${\mathcal{Q}}_{Y}$. Q.E.D.

If X → Y is an arrow in $\Pi \left({\mathrm{\Omega}}_{\mathcal{B}}\right)$, the above argument shows that the map ${\mathcal{Q}}_{X}\to {\mathcal{Q}}_{Y}$ is a simplicial mapping.

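Conditioning by Y and marginalization by Y_{*} are related by the barycentric law (or theorem of total probability, Kolmogorov 1933 [29]): for any measurable set A in
$\mathcal{B}$ we have

$$P(A)=\sum_{i}P({Y}_{i})\,P(A|Y={y}_{i}),$$

where the sum runs over the indices i such that P(Y_{i}) ≠ 0.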
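The barycentric law can be checked exactly on a small case. The sketch below assumes a hypothetical law `p`, partition `y` and event `a` on the four-point set; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

def prob(p, event):
    """Probability of an event given as a set of atoms."""
    return sum(p[w] for w in event)

def conditional(p, event, block):
    """P(event | block), defined only when P(block) != 0."""
    return prob(p, event & block) / prob(p, block)

# Hypothetical data: a law on four atoms, a two-block partition, an event.
p = {"00": Fraction(1, 10), "01": Fraction(2, 10),
     "10": Fraction(3, 10), "11": Fraction(4, 10)}
y = [frozenset({"00", "01"}), frozenset({"10", "11"})]
a = frozenset({"01", "10"})

# Weighted average of the conditional probabilities, skipping null blocks.
total = sum(prob(p, block) * conditional(p, a, block)
            for block in y if prob(p, block) != 0)
print(total == prob(p, a))
```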

Remark that the notions of information structures and probability complexes extend to infinite sets; this is developed in paper [7].

In this context, we have a formula for any integrable function φ on Ω with respect to P:

Consider a finite set Ω, equipped with a Boolean algebra $\mathcal{B}$, a probability family $\mathcal{Q}$ for it and an information structure $\mathcal{S}$ adapted to $\mathcal{B}$.

For each object X in $\mathcal{S}$, the set ${\mathcal{S}}_{X}$ made by the partitions Y that are divided by X is a closed sub-category, possessing an internal law of monoid. The object X is initial. To any arrow X → Y is associated the inclusion ${\mathcal{S}}_{Y}\to {\mathcal{S}}_{X}$, thus we get a contra-variant functor from $\mathcal{S}$ to the category of monoids.

On the other side we have a natural co-variant functor of
$\mathcal{S}$ to the category of sets, which associates to each partition
$X\in \mathcal{S}$ the set
${\mathcal{Q}}_{X}$ of probability laws in the image of
$\mathcal{Q}$ on the quotient set Ω/X, and which associates to each arrow X → Y the surjection
${\mathcal{Q}}_{X}\to {\mathcal{Q}}_{Y}$ which is given by direct image P_{X} ↦ Y_{*}P_{X}. If
$\mathcal{Q}$ is simplicial the functor goes to the category of simplicial complexes.

**Definition 1.** For
$X\in \mathcal{S}$, the functional module ${\mathcal{F}}_{X}\left(\mathcal{Q}\right)$ is the real vector space of measurable functions on the space
${\mathcal{Q}}_{X}$; for each arrow of divisibility X → Y, we have an injective linear map f ↦ f^{Y}^{|X} from $\mathcal{F}$_{Y} to $\mathcal{F}$_{X} given by

In this manner, we obtain a contra-variant functor $\mathcal{F}$ from the category $\mathcal{S}$ to the category of real vector spaces.

If $\mathcal{Q}$ and $\mathcal{S}$ are adapted one to each other, the functor $\mathcal{F}$ admits a canonical action of the monoid functor $X\mapsto {\mathcal{S}}_{X}$, given by the average formula

To verify this is an action of monoid, we must verify that for any Z which divides Y, and any f ∊ $\mathcal{F}$_{Y}, we have, in $\mathcal{F}$_{X} the identity

But this results from the identity Y_{*}(P|(Z = z)) = (Y_{*}P)|(Z = z) due to Y_{*}P(Z = z) = P(Z = z). The arrows of direct images and the action of averaged conditioning satisfy the axiom of distributivity: if Y and Z divide X, but not necessarily Z divides Y, we have

**Proof.** The first identity comes from the fact that (Z,Y)_{*}(P|(Z = z)) = Y_{*}(P|(Z = z)); the second one follows from the fact that we have an action of the monoid ${\mathcal{S}}_{X}$.

As formula (12) is central to our work, we dwell on it a little and comment on its meaning, at least in this finite setting:

Let P ↦ f (P) be an element of $\mathcal{F}$_{X}, and Y be the goal of an arrow X → Y, we have

We will see, when discussing functions of several partitions, that this formula is due to Shannon and corresponds to conditional information.

**Lemma 1.** For any pair (Y, Z) of variables in
${\mathcal{S}}_{X}$, and any F for which the integrals converge, we have (Y,Z).F = Y.(Z.F).

**Proof.** We note p_{i} the probability that Y = y_{i}, π_{ij} the joint probability of (Y = y_{i}, Z = z_{j}), and q_{ij} the conditional probability of Z = z_{j} knowing that Y = y_{i}, then

**Remark 1.** In the general case, where Ω is not necessarily finite and
$\mathcal{B}$ is any sigma-algebra, Lemma 1 is a version of the Fubini theorem.
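In the finite setting, Lemma 1 can be verified exactly. The sketch below assumes that the action is the averaged conditioning, written here as (Y.f)(P) = Σ_i P(Y_i) f(P|Y_i) over the blocks of non-zero probability; the names `join`, `condition`, `act` and the test function `f` are hypothetical.

```python
from fractions import Fraction

def join(y, z):
    """Joint partition (Y, Z): the non-empty intersections of blocks."""
    return [a & b for a in y for b in z if a & b]

def condition(p, block):
    """Law P conditioned by the block, kept as a law on Omega."""
    mass = sum(p[w] for w in block)
    return {w: (p[w] / mass if w in block else Fraction(0)) for w in p}

def act(partition, f):
    """Averaged conditioning: returns the function P -> (Y.f)(P)."""
    def averaged(p):
        total = Fraction(0)
        for block in partition:
            mass = sum(p[w] for w in block)
            if mass != 0:  # conditioning is undefined on null blocks
                total += mass * f(condition(p, block))
        return total
    return averaged

def f(p):
    """Arbitrary test function of the law (a hypothetical choice)."""
    weights = {"00": 1, "01": 2, "10": 3, "11": 5}
    return sum(weights[w] * p[w] ** 2 for w in p)

p = {"00": Fraction(1, 10), "01": Fraction(2, 10),
     "10": Fraction(3, 10), "11": Fraction(4, 10)}
y = [frozenset({"00", "01"}), frozenset({"10", "11"})]
z = [frozenset({"00"}), frozenset({"01", "10", "11"})]

lhs = act(join(y, z), f)(p)    # (Y, Z).f
rhs = act(y, act(z, f))(p)     # Y.(Z.f)
print(lhs == rhs)
```

The identity holds because conditioning first by a block of Y and then by a block of Z is the same as conditioning by their intersection, and the weights multiply accordingly.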

Let us consider the category $\mathcal{S}$ equipped with the discrete topology, to get a site (cf. SGA [30]). Over a discrete site every presheaf is a sheaf. The contravariant functor $X\mapsto {\mathcal{S}}_{X}$ gives a structural sheaf of monoids, and by passing to the algebras ${\mathcal{A}}_{X}$ over ℝ which are generated by the (finite) monoids, we get a sheaf of rings, thus $\mathcal{S}$ becomes a ringed site. Moreover, by considering all contra-variant functors $X\mapsto {\mathbb{N}}_{X}$ from $\mathcal{S}$ to modules over the algebra functor $\mathcal{A}$, we obtain a ringed topos, that we name the information topos associated to $\mathrm{\Omega},\mathcal{B},\mathcal{S}$. This ringed topos concerns only the observables given by partitioning.

Now take into account a probability family
$\mathcal{Q}$ which is adapted to
$\mathcal{S}$, for instance a simplicial family; we obtain a functor X ↦ $\mathcal{Q}$_{X} translating the marginalization by the partitions, considered as observable quantities, and the conditioning by observables is translated by a special element X ↦ $\mathcal{F}$_{X} of the information topos.

In this way it is natural to expect that topos co-homology, as introduced by Grothendieck, Verdier and their collaborators (see SGA 4 [30]), captures the invariant structure of observation, and defines in this context what information is. This is the main outcome of our work.

As a consequence of Grothendieck’s article (Tohoku, 1957 [31]), a ringed topos possesses enough injective objects, i.e., any object is the sub-object of an injective object, moreover, up to isomorphism, there is a unique minimal injective object containing a given object, called its injective envelope (cf. Gabriel, seminaire Dubreil, exp. 17 [32]). Thus each object in the category
${\mathcal{D}}_{\mathcal{S}}$ of modules over a ringed site
$\mathcal{S}$ possesses a canonical injective resolution I_{*}(N); then the group
$Ex{t}_{\mathcal{D}}^{n}\left(M,N\right)$ can be defined as the homology of the complex
$Ho{m}_{\mathcal{D}}\left(M,{I}_{n}\left(N\right)\right)$. Those groups are denoted by H^{n}(M; N).

The “comparison theorem” (cf. Bourbaki, Alg.X Th1, p. 100 [33], or MacLane 1975, p. 261 [5]) asserts that, for any projective (resp. injective) resolution of M (resp. N) there exists a natural map of complexes between the resulting complex of homomorphisms and the above canonical complex, and that this map induces an isomorphism in co-homology.

In our context, we take for M the trivial constant module ${\mathcal{R}}_{\mathcal{S}}$ over $\mathcal{S}$, and we take for N the functional module $\mathcal{F}\left(\mathcal{Q}\right)$.

The existence of free resolutions of ${\mathcal{R}}_{\mathcal{S}}$ makes things easier to handle.

Hence we propose that the natural information quantities are classes in the co-homology groups $H*\left({\mathcal{R}}_{\mathcal{S}},\mathcal{F}\left(\mathcal{Q}\right)\right)$.

This is reminiscent of Galois co-homology (see SGA [30]), where M is also taken as the constant sheaf over the category of G-objects seen as a site.

In [7] we develop further this more geometric approach, by considering several resolutions. But in this paper, in order to be concrete, we will only focus on a more elementary approach, associated to a special resolution, called the non-homogeneous bar-resolution, which also leads to the general result. This is the object of the next section.

#### 2.2. Non-Homogeneous Information Co-Homology

For each integer m ≥ 0, and each object
$X\in \mathcal{S}$, we consider the real vector space S_{m}(X), freely generated by the m-uples of elements of the monoid
${\mathcal{S}}_{X}$, and we define C^{m}(X) as the real vector space of linear functions from S_{m}(X) to the space $\mathcal{F}$_{X} of measurable functions from
${\mathcal{Q}}_{X}$ to ℝ.

Then we define the set
${\mathcal{C}}^{m}$ of m-cochains as the set of collections F_{X} ∊ C^{m}(X) satisfying the following condition, named joint locality:

For each Y divided by X, when each variable X_{j} is divided by Y, we must have

Thus a co-chain F is a natural transformation from the functor S_{m} (X) from
$\mathcal{S}$ to the category of real vector spaces to the functor $\mathcal{F}$ of measurable functions on
${\mathcal{Q}}_{X}$. Hence, F is not an ordinary numerical function of probability laws ℙ and a set (X_{1},…,X_{m}) of m random variables, but we can speak of its value F_{X}(X_{1};…;X_{m}; ℙ) for each X in
$\mathcal{S}$. For X given the co-chains form a sub-vector space
${\mathcal{C}}^{m}\left(X\right)$ of C^{m}(X).

If we apply the condition to Y = (X_{1},…,X_{m}) we find that F(X_{1};…; X_{m}; ℙ) depends only on the direct image of ℙ by the joint variable of the X_{i}’s. This implies that, if F belongs to
${\mathcal{C}}^{m}\left(X\right)$, we have

Conversely, suppose that F satisfies condition (18) and consider X, Y two variables such that X divides Y, and that Y divides each X_{j}, and let P be a probability in
${\mathcal{Q}}_{X}$; then the joint variable Z = (X_{1},…,X_{m}) is divided by Y and X, thus we have Z_{*}P = Z_{*}(X_{*}P) = Z_{*}(Y_{*}P), and

Which proves that F belongs to ${\mathcal{C}}^{m}\left(X\right)$.

Let F be an element of ${\mathcal{C}}^{m}\left(X\right)$, and Y an element of ${\mathcal{S}}_{X}$; then we define

It follows from the equivalent condition (18) that Y.F also belongs to ${\mathcal{C}}^{m}\left(X\right)$.

Moreover, the proof of Lemma 1 applies and gives that, for any pair (Y, Z) of variables in ${\mathcal{S}}_{X}$, and any F in ${\mathcal{C}}^{m}(X)$, we have (Y, Z).F = Y.(Z.F).

Thus (1) defines an action of the semigroup ${\mathcal{S}}_{X}$ on the vector spaces ${\mathcal{C}}^{m}(X)$.

**Remark 2.** The operation of
${\mathcal{S}}_{X}$ can be rewritten more compactly by using integrals:

The differential δ for computing co-homology is given by the Eilenberg-MacLane formula (1943):

Since this formula corresponds to the standard inhomogeneous bar-resolution in the case of semi-groups and algebras (cf. MacLane p. 115 [4] and Cartan-Eilenberg pp. 174–175 [34]), we name δ the Hochschild co-boundary, as in the case of semi-groups and algebras.

Remark that a function F satisfying the joint locality condition (i.e., the hypothesis that F(Y_{1};…; Y_{m}; P) depends only on (Y_{1},…, Y_{m})_{*}P) has a co-boundary which is also jointly local, because the variables appearing in the definition are all joint variables of the Y_{j}. (This would not have been true for the stronger locality hypothesis asking that F depends only on the collection (Y_{j})_{*}P; j = 1,…,m.)

It is easy to verify that δ^{m} ○ δ^{m−1} = 0. We denote by Z^{m} the kernel of δ^{m} and by B^{m} the image of δ^{m−1}. The elements of Z^{m} are named m-cocycles, we consider them as information quantities, and the elements of B^{m} are m-coboundaries.
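The vanishing of δ ○ δ can be checked numerically in the lowest degrees. The sketch below is only an illustration under explicit assumptions: it takes the standard inhomogeneous Hochschild convention, (δg)(Y; P) = (Y.g)(P) − g(P) in degree 0 and (δF)(Y, Z; P) = (Y.F(Z; ·))(P) − F((Y, Z); P) + F(Y; P) in degree 1, uses the averaged conditioning as the action, and applies it to an arbitrary function `g` of the law, ignoring the locality requirement. All names are hypothetical.

```python
from fractions import Fraction

def join(y, z):
    """Joint partition (Y, Z): the non-empty intersections of blocks."""
    return [a & b for a in y for b in z if a & b]

def condition(p, block):
    """Law P conditioned by the block, kept as a law on Omega."""
    mass = sum(p[w] for w in block)
    return {w: (p[w] / mass if w in block else Fraction(0)) for w in p}

def act(partition, f):
    """Averaged conditioning P -> sum_i P(Y_i) f(P|Y_i) over non-null blocks."""
    def averaged(p):
        total = Fraction(0)
        for block in partition:
            mass = sum(p[w] for w in block)
            if mass != 0:
                total += mass * f(condition(p, block))
        return total
    return averaged

def delta0(g):
    """Assumed degree-0 co-boundary: (Y.g)(P) - g(P)."""
    return lambda y, p: act(y, g)(p) - g(p)

def delta1(F):
    """Assumed degree-1 co-boundary: Y.F(Z) - F((Y, Z)) + F(Y)."""
    return lambda y, z, p: (act(y, lambda q: F(z, q))(p)
                            - F(join(y, z), p) + F(y, p))

def g(p):
    """Arbitrary function of the law (hypothetical, not required to be local)."""
    weights = {"00": 1, "01": 2, "10": 3, "11": 5}
    return sum(weights[w] * p[w] ** 3 for w in p)

p = {"00": Fraction(1, 10), "01": Fraction(2, 10),
     "10": Fraction(3, 10), "11": Fraction(4, 10)}
y = [frozenset({"00", "01"}), frozenset({"10", "11"})]
z = [frozenset({"00", "10"}), frozenset({"01", "11"})]

double_coboundary = delta1(delta0(g))(y, z, p)
print(double_coboundary)
```

With these conventions the cancellation reduces to the identity (Y, Z).g = Y.(Z.g) of Lemma 1.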

**Definition 2.** For m ≥ 0, the quotient

The information co-homology satisfies functoriality properties:

Consider two pairs of information structures and probability families, $\left(\mathcal{S},\mathcal{Q}\right)$ and $\left({\mathcal{S}}^{\prime},{\mathcal{Q}}^{\prime}\right)$ on two sets Ω, Ω′ equipped with the σ-algebras $\mathcal{B},{\mathcal{B}}^{\prime}$ respectively, and φ a surjective measurable map from $(\mathrm{\Omega},\mathcal{B})$ to $({\mathrm{\Omega}}^{\prime},{\mathcal{B}}^{\prime})$, such that ${\phi}_{*}(\mathcal{Q})\subseteq {\mathcal{Q}}^{\prime}$ (i.e., ${\phi}_{*}(Q)\in {\mathcal{Q}}^{\prime}$ for every $Q\in \mathcal{Q}$), and such that $\mathcal{S}\subseteq {\phi}^{*}{\mathcal{S}}^{\prime}$ (i.e., $\forall X\in \mathcal{S},\exists {X}^{\prime}\in {\mathcal{S}}^{\prime},X={X}^{\prime}\circ \phi $); then we have the following construction:

**Proposition 1.** For each integer m ≥ 0, a natural linear map

**Proof.** First, remark that X_{j} = X′_{j} ○ φ = X″_{j} ○ φ implies
${{X}^{\prime}}_{j}={{X}^{\prime\prime}}_{j}$ because φ is surjective. As F′ is (jointly) local, the co-chain F = φ* (F′) is also (jointly) local. Finally, it is evident that the map F′ ↦ F commutes with the co-boundary operator. Therefore the proposition follows.

Another co-homological construction works in the reversed direction:

Consider two information structures $(\mathcal{S},\mathcal{Q})$ and $({\mathcal{S}}^{\prime},{\mathcal{Q}}^{\prime})$ on two sets Ω, Ω′ equipped with σ-algebras $\mathcal{B},{\mathcal{B}}^{\prime}$ respectively, and φ a measurable map from $(\mathrm{\Omega},\mathcal{B})$ to $({\mathrm{\Omega}}^{\prime},{\mathcal{B}}^{\prime})$, such that ${\mathcal{Q}}^{\prime}\subseteq {\phi}_{*}(\mathcal{Q})$ (i.e., $\forall {Q}^{\prime}\in {\mathcal{Q}}^{\prime},\exists Q\in \mathcal{Q},{Q}^{\prime}={\phi}_{*}(Q)$), and such that ${\phi}^{*}{\mathcal{S}}^{\prime}\subseteq \mathcal{S}$ (i.e., $\forall {X}^{\prime}\in {\mathcal{S}}^{\prime},{X}^{\prime}\circ \phi \in \mathcal{S}$); then the following result is true:

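**Proposition 2.** For each integer m ≥ 0, a natural linear map

$${\phi}_{*}:{H}^{m}\left(\mathcal{S};\mathcal{Q}\right)\to {H}^{m}\left({\mathcal{S}}^{\prime};{\mathcal{Q}}^{\prime}\right)$$

is defined by setting ${F}^{\prime}({{X}^{\prime}}_{1};\dots ;{{X}^{\prime}}_{m};{P}^{\prime})=F({{X}^{\prime}}_{1}\circ \phi ;\dots ;{{X}^{\prime}}_{m}\circ \phi ;P)$ for any P in $\mathcal{Q}$ such that P′ = φ_{*}(P).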

**Proof.** First, remark that, if Q also satisfies P′ = φ_{*}(Q), we have
$F({{X}^{\prime}}_{1}\circ \phi ;\dots ;{{X}^{\prime}}_{m}\circ \phi ;P)=F({{X}^{\prime}}_{1}\circ \phi ;\dots ;{{X}^{\prime}}_{m}\circ \phi ;Q)$. To establish that point, let us denote
${X}_{j}={{X}^{\prime}}_{j}\circ \phi ;j=1,\dots ,m$, and
${X}^{\prime}=({{X}^{\prime}}_{1},\dots ,{{X}^{\prime}}_{m})$, X= (X_{1},…,X_{m}) the joint variables; the quantity
$F({{X}^{\prime}}_{1}\circ \phi ;\dots ;{{X}^{\prime}}_{m}\circ \phi ;P)$ depends only on X_{*}P, but this law can be rewritten
${{X}^{\prime}}_{*}{P}^{\prime}$, which is also equal to X_{*}Q. In particular, if F is local, then F′ = φ_{*} F is local.

As it is evident that the map F ↦ F′ commutes with the co-boundary operator, the proposition follows.

Remark that this direction of functoriality uses the locality of co-cycles.

**Corollary 1.** In the case where
${\mathcal{Q}}^{\prime}={\phi}_{*}(\mathcal{Q})$ and
$\mathcal{S}={\phi}^{*}{\mathcal{S}}^{\prime}$, the maps φ* and φ_{*} in information co-homology are inverse to each other.

This is our formulation of the invariance of the information co-homology for equivalent information structures.

When m = 0, co-chains are functions f of P_{X} in
${\mathcal{Q}}_{X}$ such that f(Y_{*} P_{X}) = f (P_{X}) for any Y multiple of X (i.e., coarser than X). As we assume 1 belongs to
$\mathcal{S}$, and the set Q_{1} has only one element, f must be a constant. And every constant is a co-cycle, because

Consequently H^{0} is ℝ. This corresponds to the hypothesis
$1\in \mathcal{S}$, meaning connectedness of the category. If m components exist, we recover them in the same way and H^{0} is isomorphic to ℝ^{m}.

We now consider the case m = 1. From what precedes we know that there is no non-trivial co-boundary.

Non-homogeneous 1-cocycles of information are families of functions f_{X} (Y; P_{X}), measurable in the variable P in
$\mathcal{Q}$, labelled by elements
$Y\in {\mathcal{S}}_{X}$, which satisfy the locality condition, stating that each time we have Z → X → Y in
$\mathcal{S}$, we have

Remark that locality implies that it is sufficient to know the f_{Y}(Y; Y_{*}P) to recover f_{X}(Y; P) for every partition X in
$\mathcal{S}$ that divides Y.

It is in this sense that we frequently omit the index X in f_{X}.

Remark also that for any 1-cocycle f we have f (1; P) = 0.

In fact, the co-cycle equation tells that

More generally, for any X, and any value x_{i} of X, we have

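In fact, a special case of Equation (30), obtained by taking Y = X and using (X, X) = X, is

$$X.f(X;P)=\sum_{i}P(X={x}_{i})\,f(X;P|(X={x}_{i}))=0.$$

Applying this identity to the conditioned law P|(X = x_{i}), for which x_{i} is the only value of X with non-zero probability, we get f(X; P|(X = x_{i})) = 0, due to P ≥ 0. This generalizes f(1; P) = 0 for any P, because, for a probability conditioned by X = x_{i}, the partition X appears the same as 1, that is, a certainty.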

Remark also that for each pair of variables (X, Y), a 1-cocycle must satisfy the following symmetric relation:

#### 2.3. Entropy

Any multiple of the Shannon entropy is a non-homogeneous information co-cycle. Recall that the entropy H is defined for one partition X by the formula

$$H(X;\mathbb{P})=-\sum_{i}{p}_{i}\log {p}_{i},$$

where the p_{i} denote the values of ℙ on the elements of the partition X. In particular the function H depends only on X_{*}(ℙ), which is locality. The co-cycle equation expresses the fundamental property for an information quantity, written by Shannon:

$$H((X,Y);\mathbb{P})=H(X;\mathbb{P})+X.H(Y;\mathbb{P}).$$

Thus every constant multiple f = λH of H defines a co-cycle. Remark that the corresponding “homogeneous 1-cocycle” is the entropy variation:

This means that it satisfies the “invariance property”:

Note that the entropy variation H(X; P) − H(Y; P) exists in a wider range of conditions, i.e., when Ω is infinite, if the laws of X and Y are absolutely continuous with respect to a same probability law ℙ_{0}: we only have to replace the finite sum by the integral of the function −φ log φ where φ denotes the density with respect to ℙ_{0}. Changing the reference law ℙ_{0} changes the quantities H(X) and H(Y) by the same constant, thus does not change the variation H(X; P) − H(Y; P).
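The co-cycle property of the entropy, that is, Shannon's chain rule with the conditional term written as the averaged conditioning X.H(Y; ·), can be checked numerically. This is a minimal sketch; the helper names and the law `p` are hypothetical, and base-2 logarithms are an arbitrary choice since any multiple of H is again a co-cycle.

```python
import math

def join(x, y):
    """Joint partition (X, Y): the non-empty intersections of blocks."""
    return [a & b for a in x for b in y if a & b]

def condition(p, block):
    """Law P conditioned by the block, kept as a law on Omega."""
    mass = sum(p[w] for w in block)
    return {w: (p[w] / mass if w in block else 0.0) for w in p}

def act(partition, f):
    """Averaged conditioning P -> sum_i P(X_i) f(P|X_i) over non-null blocks."""
    def averaged(p):
        total = 0.0
        for block in partition:
            mass = sum(p[w] for w in block)
            if mass > 0.0:
                total += mass * f(condition(p, block))
        return total
    return averaged

def entropy(partition, p):
    """H(X; P) computed from the image law of P by the partition X."""
    masses = [sum(p[w] for w in block) for block in partition]
    return -sum(m * math.log2(m) for m in masses if m > 0.0)

p = {"00": 0.1, "01": 0.2, "10": 0.3, "11": 0.4}
x = [frozenset({"00", "01"}), frozenset({"10", "11"})]  # first digit
y = [frozenset({"00", "10"}), frozenset({"01", "11"})]  # second digit

lhs = entropy(join(x, y), p)
rhs = entropy(x, p) + act(x, lambda q: entropy(y, q))(p)
print(math.isclose(lhs, rhs))
```

Exchanging the roles of `x` and `y` gives the same joint entropy, which is the symmetric relation satisfied by every 1-cocycle.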

We will prove now that, for many simplicial structures $\mathcal{S}$, and sufficiently large adapted probability complexes $\mathcal{Q}$, any information co-homology class of degree one is a multiple of the entropy class.

In particular this would be true for $\mathcal{S}=W(\Sigma )$ and $\mathcal{Q}=\mathrm{\Delta}(\mathrm{\Omega})$, when Σ has more than two elements and Ω more than four elements, but this is also true in more refined situations, as we will see.

We assume that the functor of probabilities ${\mathcal{Q}}_{X}$ contains all the laws on Ω/X, when X belongs to $\mathcal{S}$. In such a case, by definition, we say that $\mathcal{Q}$ is complete with respect to $\mathcal{S}$.

Let us consider a probability law P in
$\mathcal{Q}$ and two partitions X, Y in the structure
$\mathcal{S}$, such that the joint XY belongs to
$\mathcal{S}$. We denote by Greek letters α,β,… the indices labelling the partition Y and by Latin letters k,l,… the indices of the partition X; the probability that X = ξ_{k}, Y = η_{α} is denoted p_{k,α}, then the probability of X = ξ_{k} is equal to p_{k} = Σ_{α} p_{k,α} and the probability of Y = η_{α} is equal to q_{α} = Σ_{k} p_{k,α}. To simplify the notations, let us write F = f(X; ℙ), G = f((Y, X); ℙ), H = f(Y; ℙ), F_{α} = f(X; ℙ|(Y = η_{α})), H_{k} = f(Y; ℙ|(X = ξ_{k})).

The Hochschild co-cycle equation gives

But we also have the relation obtained by exchanging X and Y, which gives

Suppose that p_{k,α} = 0 except when α = α_{1} and k = k_{2}, k_{3},…,k_{m} or α = α_{2} and k = k_{1}; we put
${p}_{{k}_{i},{\alpha}_{1}}={x}_{i}$; i = 2,…,m and
${p}_{{k}_{1},{\alpha}_{2}}={x}_{1}$, which implies that we have x_{1} + x_{2} +… + x_{m} = 1. Then Equation (33) implies that each term H in Equation (42) is zero, because only one value of the image law is non-zero, thus we can replace the only term G by
$F({p}_{{k}_{1}},\dots ,{p}_{{k}_{m}})$, and we get from Equation (41):

Only the term F for α_{1} subsists because the possible other one, for α_{2}, concerns a certainty.

Consequently, by imposing x_{2} = 1 − x_{1} = a, x_{3} =… = x_{m} = 0, we deduce the identity H (a, 1 − a, 0,…, 0) = F(1 − a, a, 0,…, 0). This gives a recurrence equation to calculate F from the binomial case:

That is due to the fact that F_{α_{1}} is a special case of F, thus independent of Y and α_{1}.

Then coming back to the co-cycle equation, we obtain in particular a functional equation for the binomial variables.

**Lemma 2.** With the notations of Example 1, Ω = {(00), (01), (10), (11)}, S_{1} (resp. S_{2}) the projection pr_{1} (resp. pr_{2}) on E_{1} = E_{2} = {0,1}, Σ = {S_{1}, S_{2}}; then the (measurable) information co-homology of degree one is generated by the entropy, i.e., there exists a constant C such that, for any X in
$W(\Sigma ),P\in \mathbb{P},f(X;P)=CH(X;P)$.

**Proof.** We consider a 1-cocycle f. We have f(1; P) = 0. Let us denote f_{i}(P) = f(S_{i}; P), and f_{ijk}(u) the function f(S_{i}; P|(S_{j} = k)), the variable u representing the probability of the first point in the fiber S_{j} = k in the lexicographic order. For each 2 × 2 table, P = (p_{00}, p_{01}, p_{10}, p_{11}), the symmetry formula (36) gives

Putting p_{10} = 0, p_{00} = u, p_{11} = v, p_{01} = 1 − u − v in this relation, we obtain the equation:

By hypothesis, f_{1}, f_{2} depend only on the image law by S_{1}, S_{2} respectively, thus, again by noting a binomial probability from the value of the first element in lexicographic order, we get

By equating u to 1 − v, we find that f_{1}(u) = f_{2}(u); then we arrive at the following functional equation for h = f_{1} = f_{2}:

This is the functional equation which was considered by Tverberg in 1958 [35]. As a result of the works of Tverberg [35], Kendall [36] and Lee (1964, [37]), (see also Kontsevich, 1995 [38]), it is known that every measurable solution of this equation is a multiple of the entropy function:

Consequently, for every m-uple (x_{1}, …, x_{m}) of real numbers such that x_{1} + … + x_{m} = 1,

$$F({x}_{1},\dots ,{x}_{m})=-C\sum_{i=1}^{m}{x}_{i}\log {x}_{i}.$$

The same is true for H and G with the appropriate number of variables.
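That the binary entropy solves a functional equation of this kind can be checked numerically. The sketch below assumes the equation has the standard form of the so-called fundamental equation of information theory, h(u) + (1 − u) h(v/(1 − u)) = h(v) + (1 − v) h(u/(1 − v)) for u, v < 1 with u + v ≤ 1; this form is an assumption of the illustration, as are the names `h` and `defect`.

```python
import math

def h(x):
    """Binary entropy -x log x - (1 - x) log(1 - x), with 0 log 0 = 0."""
    return -sum(t * math.log(t) for t in (x, 1.0 - x) if t > 0.0)

def defect(u, v):
    """Difference between the two sides of the assumed functional equation."""
    lhs = h(u) + (1.0 - u) * h(v / (1.0 - u))
    rhs = h(v) + (1.0 - v) * h(u / (1.0 - v))
    return lhs - rhs

# Sample points with u, v < 1 and u + v <= 1 (hypothetical choices).
samples = [(0.1, 0.2), (0.3, 0.3), (0.25, 0.7), (0.5, 0.5), (0.0, 0.4)]
worst = max(abs(defect(u, v)) for u, v in samples)
print(worst)
```

Both sides compute the entropy of the three-point law (u, v, 1 − u − v) by grouping two of the three outcomes in two different ways, which is why the defect vanishes up to rounding.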

A pair of variables X, Y, such that X, Y, (XY) belong to $\mathcal{S}$, is called an edge of $\mathcal{S}$; we say this edge is rich if X and Y contain at least two elements and (X, Y) at least four elements which cross the elements of X and Y, in such a manner that Lemma 2 applies if $\mathcal{Q}$ is complete. We say that $\mathcal{S}$ is connected if every pair of elements X, X′ in $\mathcal{S}$ can be joined by a sequence of edges. We say that $\mathcal{S}$ is sufficiently rich if each vertex belongs to at least one rich edge. By the recurrence Equation (100), these two conditions guarantee that the constant C which appears in Lemma 2 is the same for all rich edges. Then the same recurrence Equation (100) implies that the whole co-cycle is equal to CH. If $\mathcal{S}$ has m connected components, we necessarily get m independent constants.

Thus we have established the following result:

**Theorem 1.** For every connected structure of information
$\mathcal{S}$, which is sufficiently rich, and every set of probability
$\mathcal{Q}$, which is complete with respect to
$\mathcal{S}$, the information co-homology group of degree one is one-dimensional and generated by the classical entropy.

The theorem applies to rich simplicial complexes, in particular to the full simplex
$\mathcal{S}=W\left(\Sigma \right)$, which is generated by a family Σ of partitions S_{1}, …, S_{n}, when n ≥ 2, such that, for every i, at least one of the pairs (S_{i}, S_{j}) is rich.

Note that most of the axiomatic characterizations of entropy have used convexity, and recurrence over the dimension, see Khintchin [39], Baez et al. [20].

In our characterization, we assumed no symmetry hypothesis; symmetry was a consequence of co-homology. Moreover, we did not assume any stability property relating to a higher dimensional simplex; this was also a consequence of the homological definition.

There exists a notion of symmetric information co-homology:

The group of permutations
$\mathfrak{S}\left(\Omega ,\mathcal{B}\right)$, made by the permutations of Ω that respect the algebra
$\mathcal{B}$, acts naturally on the set of partitions Π(Ω); in fact, if X ∈ Π(Ω) is made by the subsets Ω_{1}, …, Ω_{k}, the partition σ^{∗}X is made by the subsets σ^{−1}(Ω_{1}), …, σ^{−1}(Ω_{k}), in such a manner that, if σ, τ are two permutations of Ω, we have τ^{∗}(σ^{∗}X) = (σ ○ τ)^{∗}X.
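The right-action rule can be checked on the four-point set. This is a minimal sketch in which permutations are encoded as dictionaries; the names `pullback` and `compose` and the two permutations are hypothetical choices. It also shows that the structure of Example 1 is not closed under this action.

```python
def pullback(sigma, partition):
    """sigma^*X: the partition made by the subsets sigma^{-1}(Omega_i)."""
    return frozenset(
        frozenset(w for w in sigma if sigma[w] in block) for block in partition
    )

def compose(sigma, tau):
    """The permutation sigma o tau, acting as w -> sigma(tau(w))."""
    return {w: sigma[tau[w]] for w in tau}

omega = ["00", "01", "10", "11"]
s1 = frozenset({frozenset({"00", "01"}), frozenset({"10", "11"})})
s2 = frozenset({frozenset({"00", "10"}), frozenset({"01", "11"})})

sigma = {"00": "00", "01": "11", "10": "10", "11": "01"}  # swaps 01 and 11
tau = {"00": "01", "01": "10", "10": "11", "11": "00"}    # cyclic shift

# tau^*(sigma^*X) compared with (sigma o tau)^*X.
right_action = pullback(tau, pullback(sigma, s1)) == pullback(compose(sigma, tau), s1)
print(right_action)

# The monoid W({S1, S2}) of Example 1: 1, S1, S2 and the finest partition.
unit = frozenset({frozenset(omega)})
finest = frozenset(frozenset({w}) for w in omega)
w_sigma = {unit, s1, s2, finest}
stays_inside = pullback(sigma, s1) in w_sigma
print(stays_inside)
```

Here σ^{∗}S_{1} is the partition {00, 11}, {01, 10}, which is not one of the four elements of the monoid, so this structure is not symmetric in the sense defined next.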

We say that a classical information structure
$\mathcal{S}$ on
$\left(\Omega ,\mathcal{B}\right)$ is symmetric if it is closed by the action of the group of permutations
$\mathfrak{S}\left(\Omega ,\mathcal{B}\right)$, i.e., if X ∈ $\mathcal{S}$ and σ ∈ $\mathfrak{S}\left(\Omega ,\mathcal{B}\right)$, the partition σ^{∗}X also belongs to
$\mathcal{S}$.

In the same way, we say that a probability functor
$\mathcal{Q}$ is symmetric, if it is stable under local permutations, i.e., if
$X\in \mathcal{S}$ and
$P\in {\mathcal{Q}}_{X}$, and if
$\sigma \in \mathfrak{S}\left(\Omega /X\right)$, then the probability law σ^{∗}P = P ○ σ on Ω/X also belongs to
${\mathcal{Q}}_{X}$.

Remark that we also have τ^{∗}σ^{∗}P = (σ ○ τ)^{∗}P. Thus the actions of symmetric groups are defined here on the right. However, we have actions to the left by taking σ_{∗} = (σ^{−1})^{∗}. For the essential role of symmetries in information theory, see the article of Gromov in this volume.

An m-cochain
${F}_{X}:{\mathcal{S}}^{m}\times {\mathcal{Q}}_{X}\to \mathbb{R}$ is said to be symmetric when, for every
$X\in \mathcal{S}$, every probability
$P\in {\mathcal{Q}}_{X}$, every collection of partitions Y_{1}, …, Y_{m} in
${\mathcal{S}}_{X}$, we have

It is evident that symmetric cochains form a subcomplex of the information cochain complex, i.e., the coboundary of a symmetric cochain is a symmetric cochain. Consequently we get a symmetric information co-homology, that we name ${H}_{\mathfrak{S}}^{*}\left(\mathcal{S};\mathcal{Q}\right)$.

In particular the entropy is a symmetric 1-cocycle.

The above proof of Theorem 1 applies to symmetric cocycles as well; thus, under the appropriate hypotheses of connectedness, richness and completeness for $\mathcal{S}$ and $\mathcal{Q}$, we have ${H}_{\mathfrak{S}}^{1}\left(\mathcal{S};\mathcal{Q}\right)=\mathbb{R}H$.

Remark that an equivalent way to look at symmetric information cochains, consists in enlarging the category
$\mathcal{S}$ in a “symmetric category”
${\mathcal{S}}^{\mathfrak{S}}$, by putting an arrow associated to each element
${\sigma}_{X}\in \mathfrak{S}\left(\Omega /X\right)$ from X to σ_{∗}X, and completing the category by composing the two kinds of arrows, division and permutation. In this case, the probability functor
$\mathcal{Q}$ must behave naturally with respect to permutation, which implies it is symmetric. Moreover, the natural notion of functional sheaf and local cochains are a symmetric sheaf and symmetric cochains.

#### 2.4. Appendix. Complex of Possible Events

In each concrete situation, physical constraints produce exclusion rules between possible events, which select a sub-complex $\mathcal{Q}$ in the full probability simplex $\mathbb{P}={\mathrm{\Delta}}_{N}$ on Ω. The aim of this appendix is to make this remark more precise.

Let A^{0}, A^{1}, A^{2}, A^{3}, … be the N + 1 vertices of the large simplex Δ_{N}; a point of Δ_{N} is interpreted as a probability ℙ on the set of these vertices; each vertex can be seen as an elementary event, and we will say that a general event A is possible for ℙ when ℙ(A) is different from zero. An event A is said to be impossible for ℙ in the other case, that is when ℙ(A) = 0.

The star S(A) of a vertex A of Δ_{N} is the complement of the face opposite to A, i.e., it is the set of probabilities P in Δ_{N} such that A is possible, i.e., has non-zero probability. The relative star S(A|K) of A in a subcomplex K is the intersection of the star of A with K.

We denote by F = (A, B, C, D, …) the face of Δ_{N} whose vertices are A, B, C, D, …. We denote by L(F) the set of points p in Δ_{N} such that at least one of the points A, B, C, D, … is impossible for p. This is also the union of the faces which are opposite to the vertices A, B, C, D, …. Then L(F) is a simplicial complex. The complement in F of the interior of F, i.e., the boundary of F, is the union of the intersections of F with all faces opposite to A, B, C, D, …; it is also the set of probabilities p in F such that at least one of the points A, B, C, D, … is impossible for p, thus it is equal to L(F) ∩ F. If G is a face containing F, the complex L(G) contains the complex L(F).

Let K be a simplicial complex contained in an N-simplex; then K is obtained by deleting from Δ_{N} a set E = E_{K} of open faces. Let
$\dot{F}=F\backslash \partial F$ be an element of E; then each face G of Δ_{N} containing F belongs to E, because K is a complex.

In this case K is contained in L(F). In fact L(F) is the smallest sub-complex of Δ_{N} which does not contain
$\dot{F}$. This can be proved as follows: if a point p in K makes every vertex of F possible, it belongs to a face G such that every vertex of F is a vertex of G, thus K contains G, which contains F. So, if K does not contain
$\dot{F}$, K is contained in L(F).

Let L = L_{K} be the intersection of the L(F), where F ranges over the faces in E_{K}. From what precedes we know that K is contained in L. However, every
$\dot{F}$ in E is included in the complementary set of L(F), thus it is included in the complementary set of L, which is the union of the complementary sets of the L(F). Consequently the complementary set of K is included in the complementary set of L. Then K = L.

This discussion establishes the following result:

**Theorem 2.** A subset K of the simplex Δ_{N} is a simplicial sub-complex if and only if it is defined by a finite number of constraints of the type: “for any p in K, the fact that A, B, C, … are possible for p implies that D is impossible for p”.

In other words, in more pictorial but also more ambiguous terms, every sub-complex K is defined by constraints of the type: “if A, B, C, … are simultaneously allowed it is excluded that D can happen”.

The statement of the theorem is just a rewriting of the discussion, using elementary propositional calculus: let K be a sub-complex of Δ_{N}; we have shown that K is the intersection of the L(F) where the open face
$\dot{F}$ is not in K, but if A, B, C, D, … denote the vertices of the face F, a point p belongs to L(F) if and only if “(A is impossible for p) or (B is impossible for p) or …”, and this sentence is equivalent to “if (A is possible for p) and (B is possible for p) and …, then (D is impossible for p)”. This results from the equivalence between “(P implies Q) is true” and “((not P) or Q) is true”. Conversely, any L(F) is a simplicial complex, hence every intersection of sets of the form L(F) is a simplicial complex too.
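The membership test behind Theorem 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: vertices are labelled by integers, a probability is a tuple of weights, and the helper names `support`, `in_L` and `in_complex` are hypothetical, not taken from the text.

```python
def support(p, tol=1e-12):
    """Set of vertices (indices) that are possible for the probability p."""
    return frozenset(i for i, x in enumerate(p) if x > tol)

def in_L(p, face):
    """p lies in L(F) iff at least one vertex of the face F is impossible for p."""
    return not set(face) <= support(p)

def in_complex(p, excluded_faces):
    """p lies in K, the intersection of the L(F) over the excluded faces F."""
    return all(in_L(p, face) for face in excluded_faces)

# Hypothetical example on the triangle with vertices 0, 1, 2.
# The constraint "0 and 1 possible implies 2 impossible" excludes the open face (0, 1, 2),
# so K is the boundary of the triangle.
excluded = [(0, 1, 2)]
boundary_point = (0.5, 0.5, 0.0)   # vertex 2 is impossible
interior_point = (0.2, 0.3, 0.5)   # all three vertices are possible

print(in_complex(boundary_point, excluded))
print(in_complex(interior_point, excluded))
```

Each excluded face contributes one disjunction "some vertex of F is impossible", which is the propositional form used in the proof.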

## 3. Higher Mutual Informations. A Sketch

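The topological co-boundary operator on C^{∗}, denoted by δ_{t}, is defined by the same formula as δ, except that the first term Y_{1}.F(Y_{2}; …; Y_{n}; ℙ) is replaced by the term F(Y_{2}; …; Y_{n}; ℙ) without Y_{1}:

$$\delta_{t}F(Y_{1};\dots;Y_{n};\mathbb{P}) = F(Y_{2};\dots;Y_{n};\mathbb{P}) + \sum_{i=1}^{n-1}(-1)^{i}F(Y_{1};\dots;(Y_{i},Y_{i+1});\dots;Y_{n};\mathbb{P}) + (-1)^{n}F(Y_{1};\dots;Y_{n-1};\mathbb{P}).$$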

It is the coboundary of the bar complex for the trivial module ${\mathcal{F}}_{t}$, which is the same as $\mathcal{F}$ except no conditioning appears, i.e., Y.F = F . Hence it is the ordinary simplicial co-homology of the complex S with local coefficients in $\mathcal{F}$.

Remark that this operator also preserves locality, because all the functions of ℙ which come in the development depend only on (Y_{2}, …, Y_{n})_{∗}ℙ, (Y_{1}, …, Y_{n})_{∗}ℙ and (Y_{1}, …, Y_{n−1})_{∗}ℙ.

By definition a topological cocycle of information is a cochain F that satisfies δ_{t}F = 0, and a topological co-boundary is an element in the image of δ_{t}.

It is easy to show that δ_{t} ○ δ_{t} = 0, which allows us to define a co-homology theory that we will name topological co-homology.

Now assume that the information structure
$\mathcal{S}$ is a set W (Σ) = Δ(n) generated by a family Σ of partitions S_{1}, …, S_{n}, where n ≥ 2.

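Higher mutual information quantities were defined by Hu Kuo Ting [6] (see also Yeung [40]), generalizing the Shannon mutual information:

$$I_{N}(S_{1};\dots;S_{N};\mathbb{P}) = \sum_{k=1}^{N}(-1)^{k-1}\sum_{I\subset [N];\ \mathrm{card}(I)=k} H(S_{I};\mathbb{P}),$$

S_{I} denoting the joint partition of the S_{i} such that i ∈ I. We also define I_{1} = H.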

The definition of I_{N} makes it evident that it is a symmetric function, invariant under all permutations of the partitions S_{1}, …, S_{N}.

For instance I_{2}(S; T) = H(S) + H(T) − H(S, T) is the usual mutual information.
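As a numerical illustration, the alternating sum of joint entropies can be evaluated directly. The sketch below assumes the usual inclusion-exclusion form of I_{N} (sum over non-empty subsets I of the S_{i}, with sign (−1)^{card(I)−1}), and represents a joint law as a dictionary from outcome tuples to probabilities; the function names are hypothetical.

```python
from itertools import combinations
from math import log2

def entropy(joint, idx):
    """Shannon entropy (in bits) of the marginal law on the variables listed in idx."""
    marginal = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

def mutual_information(joint, idx):
    """I_N over the variables in idx, as an alternating sum of joint entropies H(S_I)."""
    total = 0.0
    for k in range(1, len(idx) + 1):
        for subset in combinations(idx, k):
            total += (-1) ** (k - 1) * entropy(joint, subset)
    return total

# Hypothetical example: two independent fair bits and their XOR.
xor_law = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

i2 = mutual_information(xor_law, (0, 1))     # H(S) + H(T) - H(S, T)
i3 = mutual_information(xor_law, (0, 1, 2))
print(i2, i3)
```

In this example every pair of variables is independent, so I_{2} vanishes, while I_{3} is negative; this shows that higher mutual informations are not positive in general.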

It is easily seen that I_{2} = δ_{t}H. The following formula generalizes this remark to higher mutual informations of even orders:

And for odd mutual information we have

We deduce from here that higher mutual informations are co-boundaries for δ or δ_{t} according to whether their order is odd or even respectively.

The result which proves the two above formulas is the following:

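**Lemma 3.** Whether N is even or odd, we have

$$I_{N}((S_{0},S_{1});S_{2};\dots;S_{N};\mathbb{P}) = I_{N}(S_{0};S_{2};\dots;S_{N};\mathbb{P}) + S_{0}.I_{N}(S_{1};S_{2};\dots;S_{N};\mathbb{P}).$$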

This lemma can be proved by comparing the completely developed forms of the quantities. It seems to signify that, with respect to one variable, I_{N} satisfies the equation of information 1-cocycle, thus I_{N} seems to be a kind of “partial 1-cocycle”; however this is misleading, because the locality condition is not satisfied. In fact I_{N} is an N-cocycle, either for δ or for δ_{t}, depending on the parity of N.
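The “partial 1-cocycle” behaviour in one variable can be checked numerically in the smallest case N = 2, where it reduces to the chain rule for mutual information. The sketch below reads S_{0}.F as the mean of F over the laws conditioned on the values of S_{0}; this reading, the test law and all function names are assumptions made for illustration.

```python
from itertools import combinations
from math import log2

def entropy(joint, idx):
    """Shannon entropy (bits) of the marginal law on the variables listed in idx."""
    marginal = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

def info(joint, groups):
    """I_N of N joint variables; each group is a tuple of variable indices."""
    total = 0.0
    for k in range(1, len(groups) + 1):
        for chosen in combinations(groups, k):
            idx = sorted(set().union(*chosen))
            total += (-1) ** (k - 1) * entropy(joint, idx)
    return total

def mean_conditioned_info(joint, i, groups):
    """S_i.I_N: average of I_N over the laws conditioned on each value of variable i."""
    total = 0.0
    for value in {outcome[i] for outcome in joint}:
        mass = sum(p for o, p in joint.items() if o[i] == value)
        if mass > 0:
            conditional = {o: p / mass for o, p in joint.items() if o[i] == value}
            total += mass * info(conditional, groups)
    return total

# Hypothetical law on three binary variables (S0, S1, S2), weights 1..8 normalised.
outcomes = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
joint = {o: w / 36 for o, w in zip(outcomes, range(1, 9))}

lhs = info(joint, [(0, 1), (2,)])   # I_2((S0, S1); S2)
rhs = info(joint, [(0,), (2,)]) + mean_conditioned_info(joint, 0, [(1,), (2,)])
print(lhs, rhs)
```

The two printed values agree up to rounding, which is the N = 2 instance of the identity.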

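For any N-cochain F we have

$$(\delta - \delta_{t})F(S_{0};S_{1};\dots;S_{N};\mathbb{P}) = ((S_{0}-1)F)(S_{1};\dots;S_{N};\mathbb{P}),$$

where S_{0} − 1 denotes the sum of the two operators of mean conditioning and minus identity.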

That implies:

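**Remark 3.** Reciprocally the functions I_{N} decompose the entropy of the finest joint partition:

$$H(S_{1},\dots,S_{N};\mathbb{P}) = \sum_{k=1}^{N}(-1)^{k-1}\sum_{I\subset [N];\ \mathrm{card}(I)=k} I_{k}(S_{i_{1}};\dots;S_{i_{k}};\mathbb{P}).$$

For example, we have H(S, T) = I_{1}(S) + I_{1}(T) − I_{2}(S; T), and

$$H(S,T,U) = I_{1}(S)+I_{1}(T)+I_{1}(U)-I_{2}(S;T)-I_{2}(T;U)-I_{2}(S;U)+I_{3}(S;T;U).$$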

Let us also note the recurrence formula whose proof is left to the reader (cf. Cover and Thomas [41]):

## 4. Quantum Information and Projective Geometry

#### 4.1. Quantum Measure, Geometry of Abelian Conditioning

In finite dimensional quantum mechanics the role of the finite set Ω of atomic events is played by a complex vector space E of finite dimension.

In fact, to each set Ω, of cardinal N, is naturally associated a vector space of dimension N over ℂ, which is the space freely generated over ℂ by the elements of Ω. Then we can identify E with ℂ^{N}, the canonical basis being the points x of Ω. In this case the canonical positive hermitian metric on E corresponds to the quadratic mean: if f and g are elements of E, we have

Remark that, in the infinite dimensional situation, the space which would play the role of E is the space of L^{2} functions for a fixed probability P_{0}.

Probability laws ℙ, which are elements of the big simplex Δ(N), give other hermitian structures, the ones which are expressed by diagonal matrices, with positive coefficients, and trace equal to 1.

In the general quantum case, described by E, a quantum probability law is every positive non-zero hermitian product h. If a basis is chosen, h is described by an N × N-matrix ρ. In the physical literature, every such ρ is called a density of states; and it is considered as a full description of the physical states of the finite quantum system. Usually ρ is normalized by Tr(ρ) = 1.

Note that this condition on the trace has no meaning for a positive hermitian form h if no additional structure is given, for instance a non-degenerate form h_{0} of reference. Why is it so? Because a priori a hermitian form h on E is a map from E to
${\overline{E}}^{\ast}$, where ∗ denotes duality and bar denotes conjugation, the conjugate space
$\overline{E}$ being the same set E, with the same structure of vector space over the real numbers as E, but with structure of vector space over the complex numbers changed by changing the sign of the action of the imaginary unit i. The complexification of the real vector space H of hermitian forms is
$\mathrm{Hom}_{\mathbb{C}}(E,{\overline{E}}^{\ast})\cong {E}^{\ast}\otimes {\overline{E}}^{\ast}$. The space H is the set of fixed points of the ℂ-anti-linear map u ↦ ^{t}ū. A trace is defined for an endomorphism of the space E, as a linear invariant quantity on E^{*} ⊗ E. Here we could take the trace over ℝ, because E and
$\overline{E}$ are the same over ℝ, but the duality would be an obstacle, because even over the field ℝ, the spaces E and E^{*} cannot be identified, and there exists no linear invariant in E^{*} ⊗ E^{*}, even over ℝ. In fact, a non-degenerate positive h_{0} is one of the ways to identify E and
${\overline{E}}^{\ast}$. A basis is another way, also defining canonically a form h_{0}. More precisely, when h_{0} is given, every hermitian form h diagonalizes in an orthonormal basis for h_{0}, thus the whole spectrum of h makes sense, not only the trace.

This h_{0} is tacitly assumed in most presentations. However it is better to understand the consequences of this choice. In non-relativistic quantum mechanics, it is not too serious; however, in relativistic quantum mechanics, it is; for instance, considering the system of two states as a spinor on the Lorentz space of dimension 4, the choice of h_{0} is equivalent to the choice of a coordinate of time. See Penrose and Rindler [42].

A much less violent approach is to consider hermitian structures h up to multiplication by a strictly positive number. This would have the same effect as fixing the trace equal to one, without introducing any choice. In quantum mechanics only non-zero positive h are considered, not necessarily positive definite, but non-zero. This indicates that a good space of states is not the set H_{+} of all positive non-zero hermitian products but a convex part PH_{+} of the real projective space of real lines in the vector space H of hermitian forms. In this space, the complex projective space ℙ(E) of dimension N − 1 over ℂ is naturally embedded; its image consists of the rank one positive hermitian matrices of trace 1; these matrices correspond to the orthogonal projectors on one dimensional directions in E.

When a basis of E is chosen, particular elements of ℙ(E) are given by the generators of ℂ^{N}; they correspond to the Dirac distributions on classical states. We see here a point defended in particular by Von Neumann, that quantum states are projective objects not linear objects.

The classical random variables, i.e., the measurable functions on Ω with values in ℂ, are generalized in Quantum Mechanics by the operators in E; they are all the endomorphisms, i.e., any N × N-matrix, and they are named observables. Classical observables are recovered by diagonal matrices, their action on E corresponding to the multiplication of functions. Real valued variables are generalized by hermitian operators. Again this supposes that a special probability law h_{0} is given. If not, “to be hermitian” for an operator has no meaning. (What could have a meaning for an operator is to be diagonalizable over ℝ, which is something else.)

Then if h_{0} is chosen, the only difference between real observable and density of states is the absence of the positivity constraint.

By definition, the amplitude, or expectation, of the observable Z in the state ρ is the number given by the formula

$$\mathbb{E}_{\rho}(Z) = \mathrm{Tr}(Z\rho).$$

It is important to note that h_{0} plays a role in this formula. Consequently the definition of expectation requires fixing an h_{0}, not only a ρ. This imposes a departure from the relativistic case, which should not be surprising, since considerations in relativistic statistical physics show that the entropy, for instance, depends on the choice of a coordinate for time. Cf. Landau-Lifschitz, Fluid Mechanics, second edition [43].

The partitions of Ω associated to random variables are replaced in the quantum context by the spectral decompositions of the hermitian operators X. As h_{0} is given, this decomposition is given by a set of positive hermitian commuting projectors of sum equal to the identity. The additional data for recovering the operator X is one real eigenvalue for each projector. The underlying fact from linear algebra is that every hermitian matrix Z is diagonalizable in a unitary basis, which means that

$$Z = \sum_{j} z_{j}E_{j}, \qquad \sum_{j} E_{j} = \mathrm{Id},$$

where the numbers z_{j} are real, two by two different, and where the matrices E_{j} are hermitian projectors, which satisfy, for any j and k ≠ j,

$$E_{j}^{2} = E_{j}, \qquad E_{j}^{*} = E_{j}, \qquad E_{j}E_{k} = 0.$$

When the hermitian operator Z commutes with the canonical projectors on the axis of ℂ^{N}, its spectral measure gives an ordinary partition of the canonical basis, and we recover the classical situation.

Note that the extension of the notion of partition is given by any decomposition of the vector space E in orthogonal sum, not necessarily compatible with a chosen basis. Again this assumes a given positive definite h_{0}.

To generalize what we presented in the classical setting, quantum information theory must use only the spectral support of the decomposition, not the eigenvalues.

It would have been tempting to consider any decomposition of E in direct sum as a possible observable, however not every linear operator, or projective transformation, corresponds to such a decomposition, due to the existence of non-trivial nilpotent operators. What could be their role in quantum information? Moreover, the presence of h_{0} fully justifies the limitation to orthogonal decompositions.

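In the general case, hermitian but not necessarily diagonal, we define the probability of the elementary events Z = z_{j} by the following formula

$$P_{\rho}(Z = z_{j}) = \mathrm{Tr}(E_{j}^{*}\rho E_{j}).$$

And we define the conditional probability ρ|(Z = z_{j}) by the formula

$$\rho|(Z = z_{j}) = \frac{E_{j}^{*}\rho E_{j}}{\mathrm{Tr}(E_{j}^{*}\rho E_{j})}.$$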

One can notice that this definition can be extended to any projector, not necessarily hermitian. By definition, the conditioning of ρ by a projector Y is the matrix Y^{*}ρY, normalized to be of trace 1. However, here, as it is done in most of the texts on Quantum Mechanics, we will mostly restrict ourselves to the case of hermitian projectors, i.e., Y^{*} = Y.

**Remark 4.** What justifies these definitions of probability and conditioning? First they allow us to recover the classical notions when we restrict to diagonal densities and diagonal observables, i.e., when ρ is diagonal, real, positive, of trace 1, Z is diagonal, and the E_{j} are diagonal, in which case they give a partition of Ω. The mean of Z is its amplitude. The probability of the event Z = z_{j} is the sum of the probabilities p(ω) = ρ_{ωω} for ω in the image of E_{j}; this is the trace of ρE_{j}. Moreover, the conditioning by this event is the probability obtained by projection on this image, as prescribed by the above formula.
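The classical limit described in this remark can be checked on a small example. The sketch below uses NumPy and takes the probability of a projector Y to be Tr(Y^{*}ρY) and the conditioning to be Y^{*}ρY normalized to trace 1, as stated above for general projectors; the function names and the sample matrices are hypothetical.

```python
import numpy as np

def probability(rho, E):
    """Probability of the event carried by the projector E: Tr(E* rho E)."""
    return float(np.real(np.trace(E.conj().T @ rho @ E)))

def condition(rho, E):
    """Conditioning of rho by E: the matrix E* rho E, normalised to trace 1."""
    projected = E.conj().T @ rho @ E
    return projected / np.trace(projected)

# Classical case: diagonal density on Omega = {0, 1, 2}, event {0, 1}.
rho_classical = np.diag([0.5, 0.3, 0.2]).astype(complex)
E_event = np.diag([1.0, 1.0, 0.0]).astype(complex)
print(probability(rho_classical, E_event))          # p(0) + p(1)
print(np.real(np.diag(condition(rho_classical, E_event))))

# Pure state psi = (1, 1)/sqrt(2), projector on the first basis vector.
psi = np.array([1.0, 1.0]) / np.sqrt(2)
rho_pure = np.outer(psi, psi.conj())
E_first = np.diag([1.0, 0.0])
print(probability(rho_pure, E_first))
```

For the diagonal density the result is the ordinary conditional law p(ω)/ℙ(A) on the event, and zero outside it.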

Second, pure states are defined as rank one hermitian matrices. In this case ρ is the orthogonal projection on a vector ψ of norm equal to 1 (the finite dimensional version of the Schrodinger wave vector), the exact relation is

Let Z be any hermitian operator; the results of quantum experiments indicate that the probability of the event Z = z_{j}, for the state ψ, is equal to

But this quantity can also be written

Starting from this formula and the fact any ρ can be written as a classical mixture of commuting quantum pure states,

Moreover, physical experiments indicate that after the measurement of an observable Z, giving the quantity z_{j}, the system is reduced to the space E_{j}, and every pure state ψ is reduced to its projection E_{j}ψ, which is compatible with the above definition of conditioning for pure states. Here again, the general formula can be deduced by Equation (74). The division by the probability is achieved to normalize to a trace 1. Thus conditioning in general is given by orthogonal projection in E, and it corresponds to the operation of measurement.

However, as claimed in particular by Roger Balian [44], the fact that the decomposition in pure states is non-unique implies that pure states cannot be so pertinent for understanding quantum information.

**Definition 3.** The density of states associated to a given variable Z and a given density ρ is given by the sum:

$$\rho_{Z} = \sum_{j\in J} E_{j}^{*}\rho E_{j},$$

where (E_{j})_{j∈J} designates the spectral decomposition of Z, also named spectral measure of Z. Thus ρ_{Z} is usually seen as representing the density of states after the measurement of the variable Z. This formula is usually interpreted by saying that the statistical analysis of the repeated measurements of the observable Z transforms the density ρ into the density ρ_{Z}.

Remark that ρ_{Z} is better understood as being a collection of conditional probabilities ρ|(Z = z_{j}), indexed by j.
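A minimal NumPy sketch of the passage from ρ to ρ_{Z} follows. It assumes that ρ_{Z} is the sum of the blocks E_{j}ρE_{j} over the spectral projectors of Z, and it groups numerically equal eigenvalues by rounding; the rounding tolerance and the function names are choices made for this illustration only.

```python
import numpy as np

def spectral_projectors(Z, decimals=9):
    """Hermitian projectors E_j on the eigenspaces of Z, keyed by rounded eigenvalue."""
    eigenvalues, eigenvectors = np.linalg.eigh(Z)
    projectors = {}
    for value, vector in zip(np.round(eigenvalues, decimals), eigenvectors.T):
        rank_one = np.outer(vector, vector.conj())
        projectors[value] = projectors.get(value, 0) + rank_one
    return projectors

def measured_density(rho, Z):
    """rho_Z: sum over the spectral projectors E_j of Z of the blocks E_j rho E_j."""
    return sum(E @ rho @ E for E in spectral_projectors(Z).values())

# Hypothetical example: pure state on the first basis vector, observable exchanging the two axes.
rho = np.diag([1.0, 0.0])
Z = np.array([[0.0, 1.0], [1.0, 0.0]])
rho_Z = measured_density(rho, Z)
print(rho_Z)
```

In this example ρ_{Z} commutes with Z although ρ does not, and a degenerate observable such as the identity leaves ρ unchanged, which illustrates that only the spectral support of Z matters, not its eigenvalues.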

In quantum physics as in classical physics the symmetries, discrete and continuous, have always played a fundamental role. For example, in quantum mechanics, a fundamental principle is the unitarity of the evolution in time, which claims that the states evolve as ρ_{t} = U_{t}ρ and that the observables evolve as
${Z}_{t}={U}_{t}Z{U}_{t}^{-1}$, with U_{t} respecting the fundamental scalar product h_{0}. In fact, as we already mentioned, a deeper principle associates the choice of a time coordinate t to the choice of h_{0}, which gives birth to a unitary group U(E; h_{0}), isomorphic to U_{N}(ℂ). For stationary systems the family (U_{t})_{t}_{∈ℝ} forms a one parameter group, i.e., U_{t}_{+}_{s} = U_{t}U_{s} = U_{s}U_{t}, and there exists a hermitian generator H of U_{t} in the sense that U_{t} = exp(2π itH/h); by definition, this particular observable H is the energy, the most important observable. Even if we have a privileged basis, like Ω in the relation with classical probability, the consideration of another basis which makes the energy H diagonal is of great importance. In the stationary case, a symmetry of the dynamical system is defined as any unitary operator, which commutes with the energy H. The set of symmetries forms a Lie group G, a closed sub-group in U_{N}. The infinitesimal generators are considered as hermitian observables (obtained by multiplying the elements of the Lie algebra L(G) by i); in general they do not commute between themselves.
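The one parameter group property can be verified numerically for a small hermitian generator. The sketch below builds U_{t} = exp(2πitH/h) from the spectral decomposition of H; the units with h = 1, the sample matrix and the function name are assumptions made for illustration.

```python
import numpy as np

PLANCK = 1.0  # hypothetical units in which h = 1

def evolution(H, t):
    """U_t = exp(2*pi*i*t*H/h) for a hermitian matrix H, via its spectral decomposition."""
    energies, vectors = np.linalg.eigh(H)
    phases = np.exp(2j * np.pi * t * energies / PLANCK)
    # Multiply each eigenvector (column) by its phase, then return to the original basis.
    return (vectors * phases) @ vectors.conj().T

H = np.array([[1.0, 0.5], [0.5, -1.0]])   # hypothetical energy observable
U_t, U_s = evolution(H, 0.3), evolution(H, 0.7)

print(np.allclose(evolution(H, 1.0), U_t @ U_s))
print(np.allclose(U_t.conj().T @ U_t, np.eye(2)))
```

Working in a basis that diagonalizes H reduces the exponential to scalar phases, which is one concrete reason why the energy eigenbasis matters even when another basis is privileged.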

All these axioms extend to the infinite dimensional situation when E has the structure of a Hilbert space, but the spectral analysis of unbounded operators is more delicate and diverse than the analysis in finite dimension. Three kinds of spectrum appear: discrete, absolutely continuous and singular continuous. The symmetries need not form a Lie group in general, and so on.

In our simple case of elementary quantum probability, without fixed dynamics, the classical symmetries of the set of probabilities are given by the permutations of Ω, the vertices of Δ(N). They correspond to the unitary matrices which have one and only one non-zero element, equal to 1, in each row and each column. They do not diagonalize in the same basis because they do not commute, but they form a group
${\mathcal{S}}_{N}$. Another subgroup of U_{N} is natural for semi-classical study, it is the diagonal torus
${\mathbb{T}}^{N}$, its elements are the diagonal matrices with elements of modulus 1, they correspond to sets of angles. The group
${\mathcal{S}}_{N}$ normalizes the torus
${\mathbb{T}}^{N}$, i.e., for each permutation σ and each diagonal element Z, the matrix σZσ^{−1} is also diagonal; its elements are the same as the elements of Z but in a different order. The subgroup generated by
${\mathcal{S}}_{N}$ and
${\mathbb{T}}^{N}$ is the full normalizer of
${\mathbb{T}}^{N}$.
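The normalization of the torus by the permutation matrices is easy to check on an example. The following NumPy sketch uses a hypothetical permutation and hypothetical angles; `permutation_matrix` is an illustrative helper, not a library function.

```python
import numpy as np

def permutation_matrix(perm):
    """Unitary matrix sending the basis vector e_i to e_perm[i]."""
    n = len(perm)
    P = np.zeros((n, n))
    for i, j in enumerate(perm):
        P[j, i] = 1.0
    return P

angles = np.array([0.1, 1.2, 2.3])
Z = np.diag(np.exp(1j * angles))      # element of the diagonal torus
perm = [2, 0, 1]
sigma = permutation_matrix(perm)
conjugate = sigma @ Z @ sigma.T       # sigma^{-1} equals sigma^T for a permutation matrix

print(np.allclose(conjugate, np.diag(np.diag(conjugate))))
```

The conjugate is again diagonal, and its entry at position perm[i] is the entry of Z at position i, so the angles are only reordered.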

One of the strengths of the quantum theory, with respect to the classical theory, is that it gives a similar status to the states, the observables and the symmetries. States are hermitian forms, generalizing points in the sphere (or in the projective space) which are pure states, observables are hermitian operators, or better spectral decompositions, and symmetries are unitary operators, infinitesimal symmetries being anti-hermitian matrices.

All classical groups should appear in this framework. First, by choosing a special structure on E we restrict the linear group GL_{N}(ℂ) to an algebraic subgroup G_{ℂ}. For instance, by choosing a symmetric invertible bilinear form on E we obtain O_{N}(ℂ), or, when N is even, by choosing an antisymmetric invertible bilinear form on E we obtain Sp_{N}(ℂ). In each of these cases there exists a special maximal torus (formed by the complexification of a maximal abelian subgroup T of unitary operators in G_{ℂ}), and a Weyl group, which is the quotient of the normalizer N(T) by the torus T itself. This Weyl group generalizes the permutation group when more algebraic structures are given in addition to the linear structure. The compact group of symmetries is the intersection G of G_{ℂ} with U_{N}. In fact, given any compact Lie group G_{c}, and any faithful representation r_{c} of G_{c} in ℂ^{N}, we can restrict real observables to generators of elements in G_{c}, and general observables to complex combinations of these generators, which integrate in a reductive linear group G. The spectral decomposition corresponds to the restriction to parabolic sub-groups of G_{ℂ}. The densities of states are restricted to the Satake compactification of the symmetric space G_{ℂ}/G_{c} [45].

#### 4.2. Quantum Information Structures and Density Functors

To define information quantities in the quantum setting, we have a priori to consider families of operators (Y_{1}, Y_{2}, …, Y_{m}) as joint variables. However, the efforts made in Physics and Mathematics were not sufficient to attribute a clear probability to the joint events (Y_{1} = y_{1}, Y_{2} = y_{2}, …, Y_{m} = y_{m}), when Y_{1}, …, Y_{m} do not commute; we even suspect that this difficulty reveals a principle, that information requires a form of commutativity. Thus, in our study, we will adopt the convention that every time we consider joint observables, they do commute. Hence we will consider only collections of commuting hermitian observables; their natural amplitudes in a given state are vectors in ℝ^{m}. However we do not exclude the consideration in our theory of sequences (Y_{1}; …; Y_{m}) such that the Y_{i} do not commute.

A joint observable (Y_{1}, Y_{2}, …, Y_{m}) defines a linear decomposition of the total space E in direct orthogonal sum

$$E = \bigoplus_{\alpha\in A} E_{\alpha},$$

where E_{α}; α ∈ A is the collection of joint eigenspaces of the operators Y_{j}. Note that any orthogonal decomposition can be defined by a single operator.

Another manner to handle the joint variables is to consider linear families of commuting operators, i.e., linear maps from ℝ^{m} to End(E) whose values commute. Then assigning a probability number and performing probability conditioning can be seen as functorial operations.

In what follows we denote indifferently by E_{α} the subspace of E or the orthogonal projection on this subspace.

From the point of view of information, two sets of observables are equivalent if they give the same linear decomposition of E. We say that a decomposition E_{α}; α ∈ A refines a decomposition E′_{β}; β ∈ B, when each E′_{β} is a sum of spaces E_{α} for α in a subset A_{β} of A. In such a case, we say that E_{α}; α ∈ A divides E′_{β}; β ∈ B.

For instance, for commuting decompositions Y, Z it is possible to define the joint variable, as the least fine decomposition which is finer than Y and Z.
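Refinement and the joint of two commuting decompositions can be sketched with projector matrices. The code below assumes that a decomposition is given as a list of orthogonal projectors summing to the identity, and that the joint is formed by the non-zero products EF; the function names and the sample decompositions of ℂ^{3} are hypothetical.

```python
import numpy as np

def joint_decomposition(Y, Z, tol=1e-12):
    """Non-zero products E F for E in Y and F in Z; every E must commute with every F."""
    parts = []
    for E in Y:
        for F in Z:
            if not np.allclose(E @ F, F @ E):
                raise ValueError("the two decompositions do not commute")
            product = E @ F
            if np.linalg.norm(product) > tol:
                parts.append(product)
    return parts

def refines(X, Y):
    """X refines Y when each projector of X lies under some projector of Y (F E = E)."""
    return all(any(np.allclose(F @ E, E) for F in Y) for E in X)

# Two hypothetical decompositions of C^3, both diagonal in the canonical basis.
Y = [np.diag([1.0, 1.0, 0.0]), np.diag([0.0, 0.0, 1.0])]
Z = [np.diag([1.0, 0.0, 0.0]), np.diag([0.0, 1.0, 1.0])]
YZ = joint_decomposition(Y, Z)

print(len(YZ), refines(YZ, Y), refines(YZ, Z), refines(Y, Z))
```

Here the joint (Y, Z) has three one-dimensional pieces and divides both Y and Z, while Y and Z do not divide each other.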

We insist that only decompositions have a role in information study at this moment. We will see that the observation trees of the last section require us to consider a supplementary structure, which consists in an ordering of the factors in the decomposition.

An information structure on E is a set **S** of decompositions X of E in direct sum, such that when Y and Z are elements of **S** which refine X ∈ **S**, then Y, Z commute and the finer decomposition (Y, Z) they generate belongs to **S**. In this text, we will only consider orthogonal decompositions.

Remark: in fact, the necessity of this condition in the quantum context was the original motivation to introduce the definition of classical information structure, as exposed in the first section. This can be seen as a comfortable flexibility in the classical context, or as a step from classical to quantum information theory.

As in the classical case, an information structure gives a category, denoted by the letter **S**, whose objects are the elements of **S**, and whose arrows X → Y are given by the divisions X|Y between the decompositions in **S**.

In what follows we always assume that 1, which corresponds to the trivial partition E, belongs to **S**, and is a final object. If not we will not get a topos.

Note that we are not the first to use categories and topos to formulate quantum or classical probability. In particular Doring and Isham propose a reformulation of the whole quantum and classical physics by using topos theory, see [46] and references therein. This theory followed remarkable works of Isham, Butterfield and Hamilton, made between 1998 and 2002, and was further developed by Flori, Heunen, Landsman, Spitters, especially in the direction of a quantum logic. A common point between these works and our work is the consideration of sheaves over the category made by the partial ordering in commutative subalgebras. However, Doring et al. consider only the set of maximal algebras, and do not look at decompositions, i.e., they consider also the spectral values. In [46], Doring and Isham defined topos associated to quantum and classical probabilities. However, they focused on the definition of truth values in this context. For instance, in the classical setting, the topos they define is the topos of ordinary topological sheaves over the space (0, 1)_{L} which has for open sets the intervals ]0, r[ for 0 ≤ r ≤ 1, and particular points in their topos are given by arbitrary probability spaces, which is far from the objects we consider, because our classical topos are attached to sigma-algebras over a given set. In fact, our aim is more to develop a kind of geometry in this context, by using homological algebra, in the spirit of Artin, Grothendieck, Verdier, when they developed topos for studying the geometry of schemes.

**Example 5.** The most interesting structures **S** seem to be provided by the quantum generalization of the simplicial information structure in classical finite probability. A finite family of commuting decompositions Σ = {S_{1}, …, S_{n}} is given; they diagonalize in a common orthogonal basis, but it can happen that not all diagonal decompositions associated to the maximal torus belong to the set of joints W (Σ). In such a case a subgroup G_{Σ} appears, which corresponds to the stabilizer of the finest decomposition S_{[n]} = (S_{1}…S_{n}). This group is in general larger than a maximal torus of U_{N}; it is a product of unitary groups (corresponding to common eigenvalues of observables in W (Σ)), and it is named a Levi subgroup of the unitary group. In addition we consider a closed subgroup G in the group U(E; h_{0}) (which could be identified with U_{N}), and all the conjugates gY g^{−1} of elements of W (Σ) by elements of G; this gives a manifold of commutative observable families Σ_{g}; g ∈ G. More generally we could consider several families Σ_{γ}; γ ∈ Γ of commuting observables, where Γ is any set. It can happen that an element of Σ_{γ} is also an element of Σ_{λ} for λ ≠ γ. The family Γ ∗ Σ of the Σ_{γ} when γ describes the set Γ forms a quantum information structure. The elements of this structure are (perhaps ambiguously) parameterized by the product of an abstract simplex ∆(n) with the set Γ (in particular Γ = G for conjugated families).

A simplicial information structure is a subset of Γ ∗ Σ which corresponds to a family K_{γ} of simplicial sub-complexes of ∆(n). In the invariant case, when Γ = G, several restrictions could be useful, for instance using the structure of the manifold of the conjugation classes of G_{Σ} under G. The simplest case is given by taking the same complex K for all conjugates gΣg^{−1}. By definition this latter case is a simplicial invariant family of quantum observables.

An event associated to **S** is a subspace E_{A}, which is an element of one of the decompositions X ∈ **S**. For instance, if Y = (Y_{1}, …, Y_{m}), the joint event A = (Y_{1} = y_{1}, Y_{2} = y_{2}, …, Y_{m} = y_{m}) gives the space E_{A} which is the maximal vector subspace of E where A happens, i.e.,

We say that A is measurable for a decomposition Y whenever it is obtained by unions of elements of Y.

The role of the Boolean algebra
$\mathcal{B}$ introduced in the first section, could have been accounted here by a given decomposition **B** of E such that any decomposition in **S** is divided by **B**.

However this choice of **B** is too rigid, in particular it forbids invariance by the unitary group U(h_{0}). Thus we decided that a better analog of the Boolean algebra
$\mathcal{B}$ is the set U**B** of all decompositions that are deduced from a given **B** by unitary transformations.

On the side of density of states, i.e., quantum probabilities, we can consider a subspace **Q**_{1} of the space **P** = ℙ**H**_{+} of hermitian positive matrices modulo multiplication by a constant. Concretely, we identify the elements of **Q**_{1} with positive hermitian operators ρ such that Tr(ρ) = 1. The space **P** is naturally stratified by the rank of the form; the largest cell ℙ**H**_{++} corresponds to the non-degenerate forms; the smallest cells correspond to the rank one forms, which are called pure states in Quantum Mechanics.

We will only consider subsets **Q**_{1} of **P** which are adapted to **S**, i.e., which satisfy that if ρ belongs to **Q**_{1}, the conditioning of ρ by elements of **S** also belongs to **Q**_{1}. This means that **Q**_{1} is closed by orthogonal projections on all the elements E_{A} of the orthogonal decompositions X belonging to **S**. Note that a subset of **P** which is closed by all orthogonal projections is automatically adapted to any information category **S**.

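Recall that, if ρ is a density of states and E_{A} is an elementary event (i.e., a subspace of E), we define the conditioning of ρ by A by the hermitian matrix

$$\rho|A = \frac{E_{A}^{*}\rho E_{A}}{\mathrm{Tr}(E_{A}^{*}\rho E_{A})}.$$

And we define the probability of the event E_{A} for ρ as the trace:

$$P_{\rho}(A) = \mathrm{Tr}(E_{A}^{*}\rho E_{A}).$$

In the same manner we define the density of a joint observable by

$$\rho_{Y} = \sum_{A} E_{A}^{*}\rho E_{A},$$

where A ranges over the joint events of Y.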

A nice reference studying important examples is Paul-Andre Meyer, Quantum probability for probabilists [47].

If X is an orthogonal decomposition of E, we can associate to it a subset **Q**_{X} of **Q**_{1}, which contains at least all the forms ρ_{X} where ρ belongs to **Q**_{1}. The natural axiom that we assume for the function X ↦ **Q**_{X}, is that for each arrow of division X → Y , the set **Q**_{Y} contains the set **Q**_{X}; then we note Y_{∗} the injection from **Q**_{X} to **Q**_{Y} . The fact that **Q**_{X} is stable by conditioning by every element of a decomposition Y which is less fine than X is automatic; it follows from the fact that **Q**_{1} is adapted to **S**. We will use conditioning in this way.

In what follows we denote by the letter **Q** such a functor X ↦ **Q**_{X} from the category **S** to the category of quantum probabilities, with the arrows given by direct images. The set **Q**_{1} is the value of the functor **Q** for the certitude 1. We must recall that many choices are possible for the functor when **Q**_{1} is given; the two extremes being the functor **Q**^{max} where **Q**_{X} = **Q**_{1} for every X, and the functor **Q**^{min} where **Q**_{X} is restricted to the set of forms ρ_{X} where ρ describes **Q**_{1}; in this last case the elements of **Q**_{X} are positive hermitian forms on E, which are decomposed in blocks according to X.

From the physical point of view, **Q**^{min} appears to have more sense than **Q**^{max}, but we prefer to consider both of them.

A special probability functor, which will be noted **Q**^{can}(**S**), is canonically associated to a quantum information structure **S**:

**Definition 4.** The canonical density functor ${\mathbf{Q}}_{X}^{can}(\mathbf{S})$ is made by all positive hermitian forms matched to X, i.e., all the forms ρ_{X} when ρ describes P**H**_{+}.

It is equal to the functor **Q**^{min} associated to the full set **Q**_{1} = P**H**_{+}. When the context is clear, we will simply write **Q**^{can}.

An important difference appears between the quantum and the classical frameworks: if X divides Y, there exist more (quantum) probability laws in **Q**_{Y} than in **Q**_{X}, but there exist fewer classical laws at the place Y than at the place X, because classical laws are defined on smaller sigma-algebras.

In particular, the trivial partition has only one classical state, which is Tr(ρ) = 1, but it has the richest structure in terms of quantum laws, any hermitian positive form.

Let us consider the classical probabilities, i.e., the maps that associate the number P_{ρ}(A) to an event A; then, for an event which is measurable for Y, the law Y_{∗}ρ_{X} gives the same result as the law ρ_{X}.

Remark: This points to a generalized notion of direct image, which is a correspondence q_{X}Y_{∗} between **Q**_{X} and **Q**_{Y}, not a map: we say that the pair (ρ_{X}, ρ_{Y}) in **Q**_{X} × **Q**_{Y} belongs to q_{X}Y_{∗}, if for any event A which is measurable for Y, we have the equality of probabilities ${\mathbb{P}}_{{\rho}_{X}}(A)={\mathbb{P}}_{{\rho}_{Y}}(A)$.

Let us look at the relation of quantization, between a classical information structure and a quantum one:

Consider a maximal family of commuting observables $\mathcal{S}$ in the quantum information structure **S**, i.e., the full subcategory associated to an initial object X_{0}. This family is a classical information structure. Conversely, if we start with a classical information structure $\mathcal{S}$, made by partitions of a finite set Ω, we can always consider it as a quantum structure associated to the vector space E = ℂ^{Ω} freely generated over ℂ by the elements of Ω. Note that E comes with a canonical positive definite form h_{0}, and, to be interesting from the quantum point of view, it is better to extend $\mathcal{S}$ by applying to it all unitary transformations of E, generating a quantum structure $\mathbf{S}=U\mathcal{S}$.

**Remark 5.** Suppose that **S** is unitary invariant; we can define a larger category **S**^{U} by taking as arrows the isomorphisms of ordered decompositions, and closing by all compositions of arrows of **S** with them. Such an invariant extended category **S**^{U} is not far from being equivalent to the category ${\mathcal{S}}^{\mathfrak{S}}$, made by adding arrows for permutations of the sets Ω/X (cf. above section), from the point of view of category theory: let us work for a moment, as we will do in the last part of this paper, with ordered partitions of Ω, Ω being itself equipped with an order, and with ordered orthogonal decompositions of E. In this case we can associate to any ordered decomposition X = (E_{1}, …, E_{m}) of E the unique ordered partition of Ω compatible with the sequence of dimensions and the order of Ω. It gives a functor τ from **S** to $\mathcal{S}$ such that $\tau \circ \iota =I{d}_{\mathcal{S}}$, where ι denotes the inclusion of $\mathcal{S}$ in **S**. These two functors are extended, preserving this property, to the categories **S**^{U} and ${\mathcal{S}}^{\mathfrak{S}}$. In fact, the functor ι sends a permutation to the unitary map which acts by this permutation on the canonical basis, and the functor τ sends a unitary transformation g between X ∈ **S** and gXg^{∗} ∈ **S** to the permutation it induces on the orthogonal decompositions. Moreover, consider the map f which associates to any X ∈ **S**^{U} the unique morphism from the decomposition ι ◦ τ(X) to X; it is a natural transformation from the functor ι ◦ τ to the functor $I{d}_{{\mathbf{S}}^{U}}$, which is invertible; hence it defines an equivalence of categories between ${\mathcal{S}}^{\mathfrak{S}}$ and **S**^{U}. However, a big difference appears with probability functors.

Let **Q** be a quantum density functor adapted to **S**, and denote by ι^{∗}**Q** the composite functor on $\mathcal{S}$; we can consider the map $\mathcal{Q}$ which associates to $X\in \mathcal{S}$ the set of classical probabilities ℙ_{ρ} for ρ ∈ **Q**_{X}. If X divides Y, the fact that the direct image Y_{∗}ℙ_{ρ}, for ρ ∈ **Q**_{X}, coincides with the law ${\mathbb{P}}_{{Y}_{*}(\rho )}$ gives the following result:

**Lemma 4.** ρ ↦ ℙ_{ρ} is a natural transformation from the functor ι^{∗}**Q** to the functor $\mathcal{Q}$.

**Definition 5.** This natural transformation is called the Trace, and we denote by Tr_{X} its value in X, i.e., Tr_{X}(ρ) = ℙ_{ρ}, seen as a map from **Q**_{X} to ${\mathcal{Q}}_{X}$.

In general there is no natural transformation in the other direction, from ${\mathcal{Q}}_{X}$ to **Q**_{X}.

Note that the trace sends a unitary invariant functor to a symmetric functor.

#### 4.3. Quantum Information Homology

As in the classical case, we can consider the ringed site given by the category **S**, equipped with the sheaf of monoids {**S**_{X}; X ∈ **S**}. In the ringed topos of sheaves of **S**-modules, the choice of a probability functor **Q** generates remarkable elements in this topos, formed by the functional space **F** of measurable functions on **Q** with values in ℝ. The action of the monoid (or the generated ring) is given by averaged conditioning, and the arrows are given by transposition of direct images. Then, the quantum information co-homology is the topos co-homology:

However, as in the classical case, we can define directly the co-homology with a bar resolution of the constant sheaf, as follows:

A set of functions F_{X} of m observables Y_{1}, …, Y_{m} divided by X, and one density ρ, indexed by X ∈ **S**, is said to be local when, for any decomposition X dividing a decomposition Y, we have, for each ρ in **Q**_{X},

$${F}_{X}({Y}_{1};\dots ;{Y}_{m};\rho )={F}_{Y}({Y}_{1};\dots ;{Y}_{m};{Y}_{*}\rho ).$$

For m = 0 this equation expresses that the family F_{X} is an element of the topos.

For every m, a collection F_{X}, X ∈ **S** is a natural transformation F from a free functor **S**_{m} to the functor **F**.

Note that, in the quantum context, it is not true in general that locality is equivalent to the condition that the value F_{X}(Y_{1}; …; Y_{m}; ρ) depends only on the family of conditioned densities ${E}_{{A}_{i}}^{*}\rho {E}_{{A}_{i}};\ i=0,\dots ,m$, where A_{i} is one of the possible events defined by Y_{i}.

In fact it depends on the choice of **Q**; for instance it is false for a **Q**^{max}, but it is true for a **Q**^{min}.

The counter-example in the case of **Q**^{max} is given by a function F(ρ) which is independent of X. It is local (in the sense of topos that we adopt) but it is non-local in the apparently more natural sense, where local means depending only on ρ_{X}. It is important to keep this quantum particularity in mind for understanding the following discussion.

As in the classical case, the action of observables on local functions is given by the average of conditioning, in the manner of Shannon, but using the Von Neumann conditioning:

$$Y.F({Y}_{1};\dots ;{Y}_{m};\rho )=\sum_{A}Tr({E}_{A}^{*}\rho {E}_{A})\,F({Y}_{1};\dots ;{Y}_{m};\rho |A),$$

where the E_{A}’s are the spectral projectors of the bundle Y. In this definition there is no necessity to assume that Y commutes with the Y_{j}’s.

Recall that, when E^{∗}_{A}ρE_{A} is non-zero, ρ|A is equal to E^{∗}_{A}ρE_{A}/Tr(E_{A}^{∗}ρE_{A}), and satisfies the normalization condition that its trace equals one. When E^{∗}_{A}ρE_{A} is equal to zero, the factor Tr(E^{∗}_{A}ρE_{A}) is zero, and by convention the corresponding term in F is absent.

The proof of the Lemma 1 applies without significant change to prove that the above formula defines an action of the monoid functor **S**_{X}.
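The averaged conditioning can be sketched numerically as follows. This is a minimal illustration, assuming that the action has the Shannon-like form Σ_{A} Tr(E_{A}^{∗}ρE_{A}) F(ρ|A) and that the events A of Y are given by index sets of a fixed orthonormal basis; the names `conditioned` and `act` are hypothetical.

```python
def conditioned(rho, block):
    """Return (p, rho|A) for the event A given by an index set of the basis."""
    n = len(rho)
    cut = [[rho[i][j] if (i in block and j in block) else 0.0 for j in range(n)]
           for i in range(n)]                       # E_A rho E_A
    p = sum(cut[i][i] for i in range(n)).real       # Tr(E_A rho E_A)
    if p == 0:
        return 0.0, None
    return p, [[z / p for z in row] for row in cut]

def act(blocks, F, rho):
    """Averaged conditioning (Y.F)(rho) = sum_A Tr(E_A rho E_A) F(rho|A)."""
    total = 0.0
    for block in blocks:
        p, rho_a = conditioned(rho, block)
        if rho_a is not None:                       # zero-probability events are skipped
            total += p * F(rho_a)
    return total

def trace(m):
    return sum(m[i][i] for i in range(len(m))).real

rho = [[0.5, 0.5], [0.5, 0.5]]
Y = [{0}, {1}]
print(act(Y, trace, rho))                  # every rho|A has trace one
print(act(Y, lambda r: r[0][0], rho))      # average of the (0, 0) coefficient
```

Skipping the events of probability zero implements the convention that the corresponding term is absent.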

Then, the definition of co-homology is given exactly as we have done for the classical case, by introducing the Hochschild operator:
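For orientation, one standard way to write the non-homogeneous Hochschild operator is sketched below. The indexing is an assumption made here; it is chosen so that it reduces in degree zero to $\widehat{\delta}({S}_{X})(Y;\rho )=Y.{S}_{X}(\rho )-{S}_{X}(\rho )$, and so that in degree one the co-cycle condition reads f(X, Y; ρ) = f(X; ρ) + X.f(Y; ρ), the two forms used later in this section.

$$(\widehat{\delta}F)({Y}_{0};\dots ;{Y}_{m};\rho )={Y}_{0}.F({Y}_{1};\dots ;{Y}_{m};\rho )+\sum_{i=1}^{m}{(-1)}^{i}F({Y}_{0};\dots ;({Y}_{i-1},{Y}_{i});\dots ;{Y}_{m};\rho )+{(-1)}^{m+1}F({Y}_{0};\dots ;{Y}_{m-1};\rho )$$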

The Von-Neumann entropy is defined by the following formula

$$S(\rho )=-Tr(\rho \log \rho ).$$

For any density functor **Q** which is adapted to **S**, the Von-Neumann entropy defines a local 0-cochain, that we will call S_{X}, and is simply the restriction of S to the set **Q**_{X}. If ρ belongs to **Q**_{X} and if X divides Y , the law Y_{∗}ρ, which is the same hermitian form as ρ belongs to **Q**_{Y} by functoriality, thus S(Y_{∗}ρ) = S(ρ) is translated by S_{X}(ρ) = S_{Y} (Y_{∗}ρ). This 0-cochain will be simply named the Von Neumann entropy.

In the case of **Q**^{max}, S_{X} gives the same value at all places X. In the case of **Q**^{min} it coincides with S(ρ_{X}), where ρ_{X} denotes the restriction to the decomposition X.

Be careful: ρ ↦S(ρ_{X}) is not a local 0-cochain for **Q**^{max}. In fact in the case of **Q**^{max} we have the same set **Q** = **Q**_{X} for every place X, thus, if we take for X a strict divisor of Y and if we take a density ρ such that, for the restrictions of ρ, the spectrum of ρ_{Y} and ρ_{X} are different, then, in general, we do not have S_{X}(ρ) = S_{Y} (Y_{∗}ρ), even if, as it is the case in the quantum context, Y_{∗}ρ = ρ.
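The gap between S(ρ) and S(ρ_{X}) can be seen on the smallest example. The sketch below assumes E = ℂ^{2}, the usual definition S(ρ) = −Tr(ρ log ρ) with natural logarithm, and a closed-form computation of the eigenvalues of a 2 × 2 hermitian matrix; the function names are chosen here for illustration.

```python
import math

def eigenvalues_2x2(rho):
    """Eigenvalues of a 2x2 hermitian matrix, from the closed-form formula."""
    a, d = rho[0][0].real, rho[1][1].real
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(rho[0][1]) ** 2)
    return [mean - radius, mean + radius]

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log rho), with the convention 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in eigenvalues_2x2(rho) if x > 1e-15)

rho = [[0.5, 0.5], [0.5, 0.5]]      # pure state: spectrum {0, 1}
rho_X = [[0.5, 0.0], [0.0, 0.5]]    # its restriction to the finest decomposition X

print(von_neumann_entropy(rho), von_neumann_entropy(rho_X))
```

The pure state has zero entropy while its restriction has entropy log 2, so a function of ρ_{X} and a function of ρ cannot agree at every place when the same ρ is allowed everywhere.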

Note that in the case of **Q**^{max}, where every function of ρ independent of X is a cochain of degree zero, the particular functions which depend only on the spectrum of ρ are invariant under the action of the unitary group, and they are the only 0-cochains which are invariant by this group.

**Definition 6.** Suppose that **S** and **Q** are invariant by the unitary group, as is U**B**; we say that an m-cochain F is invariant if, for every X in **S** dividing Y_{1}, …, Y_{m} in **S**, every ρ in **Q**_{X} and every g in the group U(h_{0}), we have

$${F}_{g.X}(g.{Y}_{1};\dots ;g.{Y}_{m};g.\rho )={F}_{X}({Y}_{1};\dots ;{Y}_{m};\rho ),$$

where g.X = gXg^{∗}, g.Y_{i} = gY_{i}g^{∗}; i = 1, …, m and g.ρ = gρg^{∗}.

This is compatible with the naturality assumption (functoriality by direct images), because direct image is a covariant operation.

Note that conditioning is also covariant if we change all variables and laws coherently. Thus the action of the monoids **S**_{X} on cochains respects the invariance.

Then the coboundary $\widehat{\delta}$ preserves invariance. Thus the co-homology of the invariant co-chains is well defined. We call it the invariant information co-homology, and we will denote it by ${H}_{U}^{*}(\mathcal{S};\phantom{\rule{0.2em}{0ex}}\mathcal{Q})$, U for unitary.

Invariant co-chains form a subcomplex of ordinary cochains; hence we have a well defined map from ${H}_{U}^{*}(\mathcal{S};\mathcal{Q})$ to H^{∗}(**S**; **Q**).

The invariant 0-co-chains depend only on the spectrum of ρ in the sets **Q**_{X}.

The invariant co-homology is probably a more natural object from the point of view of Physics. It is also on this co-homology that we were able to obtain constructive results.

The classical entropy of the decomposition X = {E_{j}} and the quantum law ρ is

$$H(X;\rho )=-\sum_{j}Tr({E}_{j}^{*}\rho {E}_{j})\log Tr({E}_{j}^{*}\rho {E}_{j}).$$

In general it is not true that H(X; ρ) = H(Y ; Y_{∗}ρ) when X divides Y . Thus the Shannon (or Gibbs) entropy is not a local 0-cochain, but it is a local 1-cochain, i.e., if X → Y → Z we have

Moreover it is a spectral 1-cochain for any **Q**^{min}.

The following result is well known, cf. Nielsen and Chuang [13].

**Lemma 5.** Let X, Y be two commuting families of observables; we have

**Proof.** We denote by α, β, … the indices of the different values of X, by k, l, … the indices of the different values of Y, and by i, j, … the indices of a basis I_{k,α} of eigenvectors of the conditioned density ${\rho}_{k,\alpha}={E}_{k,\alpha}^{*}\rho {E}_{k,\alpha}$ constrained by the projectors E_{k,α} of the pair (Y, X). The probability ${p}_{k}={P}_{\rho}(X={\xi}_{k})$ is equal to the sum over i, α of the eigenvalues λ_{i,k,α} of ρ_{k,α}. We have

**Remark 6.** Taking X = 1, or any scalar matrix, the preceding Lemma 5 expresses the fact that classical entropy is a derived quantity measuring the defect of equivariance of the quantum entropy:

**Lemma 6.** For any X ∈ **S**, dividing Y ∈ **S** and ρ ∈ **Q**_{X},

**Proof.** This is exactly what Lemma 5 says in this particular case, because in this case (X, Y) = X, and, by definition, we have $\widehat{\delta}({S}_{X})(Y;\rho )=Y.{S}_{X}(\rho )-{S}_{X}(\rho )$.

To insist, we give a direct proof with fewer indices for this case:

The Lemma 6 says that (up to the sign) the Shannon entropy is the co-boundary of the Von-Neumann entropy. This implies that the Shannon entropy is a 1-co-cycle, as in the classical case, but now it gives zero in co-homology.

Note that the result is true for any **Q**, thus for **Q**^{min} and for **Q**^{max} as well.
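The statement that the Shannon entropy is, up to sign, the co-boundary of the Von-Neumann entropy can be checked numerically in a simple situation. The sketch below is restricted to a density ρ that is block-diagonal with respect to Y (as happens for **Q**^{min}), with two 2 × 2 blocs whose entries are hypothetical numbers; it compares Y.S(ρ) − S(ρ) with −H(Y; ρ), using natural logarithms.

```python
import math

def eig2(b):
    """Eigenvalues of a 2x2 hermitian matrix, by the closed-form formula."""
    a, d = b[0][0].real, b[1][1].real
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b[0][1]) ** 2)
    return [mean - radius, mean + radius]

def entropy(values):
    """-sum x log x over the strictly positive entries."""
    return -sum(x * math.log(x) for x in values if x > 1e-15)

# The two diagonal blocs E_A rho E_A of rho (illustrative values, total trace 1).
blocs = [
    [[0.3, 0.1], [0.1, 0.2]],
    [[0.4, 0.05j], [-0.05j, 0.1]],
]
p = [b[0][0].real + b[1][1].real for b in blocs]          # Tr(E_A rho E_A)

S_rho = entropy([lam for b in blocs for lam in eig2(b)])  # S(rho)
Y_S = sum(pa * entropy(eig2([[z / pa for z in row] for row in b]))
          for pa, b in zip(p, blocs))                     # (Y.S)(rho)
H = entropy(p)                                            # H(Y; rho)

print(Y_S - S_rho, -H)
```

The two printed numbers agree up to rounding, which is the identity Y.S(ρ) − S(ρ) = −H(Y; ρ) in this block-diagonal case.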

Consider a maximal observable X_{0} in **S**, i.e., a maximal set of commuting observables in **S**; the elements of this maximal partition form a finite set Ω_{0}. If **S** is invariant by the group U(E; h_{0}), all the maximal observables are deduced from X_{0} by applying a unitary base change. Suppose that the functor **Q** is invariant also; then we get automatically a symmetric classical structure of information $\mathcal{S}$ on Ω_{0}, given by the elements of **S** divided by X_{0}. And $\mathcal{S}$ is equipped with a symmetric classical functor of probability, given by the probability laws associated to the elements of $\mathcal{S}$.

Recall that we defined the trace from quantum probabilities to classical probabilities, by taking the classical ℙ_{ρ} for each ρ, and we noticed that the trace is compatible with invariance and symmetry by permutations.

**Definition 7.** To each classical co-chain F^{0} we can associate a quantum co-chain F = tr^{∗}F^{0} by putting

The following result is straightforward:

**Proposition 3.** (i) The trace of co-chains commutes with the co-boundaries, i.e., the map tr^{∗} defines a map from the classical information Hochschild complex to the quantum Hochschild complex; (ii) this map sends symmetric cochains to invariant cochains; it induces a natural map from the symmetric classical information co-homology ${H}_{\mathfrak{S}}^{*}(\mathcal{S},\mathcal{Q})$ to the invariant quantum information co-homology H_{U}^{∗}(**S**; **Q**).

The Lemma 6 says that the entropy class goes to zero.

**Remark 7.** In a preliminary version of these notes, we considered the expression s(X; ρ) = S(ρ_{X}) − S(ρ) and showed it satisfies formally the 1-cocycle equation. But we suppress this consideration now, because s is not local, thus it plays no interesting role in homology. For instance in **Q**^{min}, S(ρ_{X}) is local but S(ρ) is not and in **Q**^{max}, S(ρ) is local but S(ρ_{X}) is not.

**Definition 8.** In an information structure **S** we call edge a pair of decompositions (X, Y) such that X, Y and XY belong to **S**; we say that an edge is rich when both X and Y have at least two elements and XY cuts those two in four distinct subspaces of E. The structure **S** is connected if every two points are joined by a sequence of edges, and it is sufficiently rich when every point belongs to a rich edge. We assume a maximal set of subspaces U**B** is given in the Grassmannian of E, in such a way that the maximal elements X_{0} of **S** (i.e., initial in the category) are made by pieces in U**B**. The density functor **Q** is said complete with respect to **S** (or U**B**) if for every X, the set **Q**_{X} contains the positive hermitian forms on the blocs of X, that give scalar blocs ρ_{αβ} for two elements E_{α}, E_{β} of a maximal decomposition. (All that is simplified when we choose a basis, and take maximal commutative subalgebras of operators, but we want to be free to consider simplicial complexes.)

**Theorem 3.** (i) For any unitary invariant quantum information structure **S**, which is connected and sufficiently rich, and for the canonical invariant density functor **Q**^{can}(**S**) (i.e., the density functor which is minimal and complete with respect to **S**), the invariant information co-homology of degree one ${H}_{U}^{1}(\mathcal{S};\mathcal{Q})$ is zero. (ii) Under the same hypothesis, the invariant co-homology of degree zero has dimension one, and is generated by the constants. Then, up to an additive constant, the only invariant 0-cochain which has the Shannon entropy as co-boundary is (minus) the Von-Neumann entropy.

**Proof.** (I) Let X, Y be two orthogonal decompositions of E belonging to **S** such that (X, Y) belongs to **S**, and ρ an element of **Q**. We name ${A}_{{k}_{i}};\ i=1,\dots ,m$ the summands of X, and ${B}_{{\alpha}_{j}};\ j=1,\dots ,l$ the summands of Y; the projections ${E}_{{k}_{i}}\rho {E}_{{k}_{i}};\ i=1,\dots ,m$ resp. ${E}_{{\alpha}_{j}}\rho {E}_{{\alpha}_{j}};\ j=1,\dots ,l$ of ρ on the summands of X, resp. Y are denoted by ${\rho}_{{k}_{i}};\ i=1,\dots ,m$ and ${\rho}_{{\alpha}_{j}};\ j=1,\dots ,l$ respectively. The projections by the commutative products ${E}_{{k}_{i}}{E}_{{\alpha}_{j}}$ are denoted by ${\rho}_{{k}_{i},{\alpha}_{j}};\ i=1,\dots ,m,\ j=1,\dots ,l$.

Let f be a 1-cocycle; we write f(X; ρ) = F(ρ), f(Y; ρ) = H(ρ) and G(ρ) = f(X, Y; ρ). Note that in **Q**^{min}, F is a function of the ${\rho}_{{k}_{i}}$, H a function of the ${\rho}_{{\alpha}_{j}}$ and G a function of the ${\rho}_{{k}_{i},{\alpha}_{j}}$, but there is no necessity to assume this property; we can always consider these functions restricted to diagonal blocs, which are arbitrary due to the completeness hypothesis.

For any positive hermitian ρ′, we write ρ′|α, resp. ρ′|i the form conditioned by the event B_{α} resp. A_{i}. The co-cycle equation gives the two following equations, that are exchanged by permuting X and Y:

Now we consider a particular case, where the small blocs ρ_{k,α} are zero except for (k_{1}, α_{2}) and (k_{j}, α_{1}) for j = 2, …, m. We denote by h_{1} the form ${\rho}_{{k}_{1},{\alpha}_{2}}$ and by h_{i} the form ${\rho}_{{k}_{i},{\alpha}_{1}}$, for i = 2, …, m. Note that Tr(h_{1} + h_{2} + … + h_{m}) = 1.

(II) As in the classical case, it is a general fact that, for a 1-cocycle f and any variable Z, the value f(Z; ρ) is zero if ρ is zero outside one of the orthogonal summands C_{a} of Z; because the equation f_{X}(Z, Z; ρ) = f_{X}(Z; ρ) + Z.f_{X}(Z; ρ) implies Z.f_{X}(Z; ρ) = 0, and if ρ has only one non-zero factor ρ_{a}, we have

Therefore in the particular case that we consider, we get for any i that $H(({\rho}_{{\alpha}_{j}}|{k}_{i});\ j)=0$. Consequently Equation (96) equates the term in G with the term in F, and we can substitute this equality into the first equation. Denoting $1-{x}_{1}=Tr({\rho}_{{\alpha}_{1}})$, this gives

Now if we add the condition h_{3} = … = h_{m} = 0 we have F(0, h_{2}/(1−x_{1}), 0, …, 0) = 0 for the reason which eliminated the $H(({\rho}_{{\alpha}_{j}}|{k}_{i});\ j)$; thus we obtain

This constraint is strong enough to imply that both terms are functions of h_{1}, h_{2} only, and that of course they coincide as functions of these small blocs.

First this gives a recurrence equation, which, as in the classical case is able to reconstruct $F(({\rho}_{{k}_{i}});\phantom{\rule{0.2em}{0ex}}i=1,\phantom{\rule{0.2em}{0ex}}\dots ,\phantom{\rule{0.2em}{0ex}}m)$ from the case of two blocs:

(III) We are left with the study of two binary variables Y, Z, forming a rich edge.

The blocs of ρ adapted to the joint ZY are denoted by ρ_{00}, ρ_{01}, ρ_{10}, ρ_{11}, where the first index refers to Y and the second index refers to Z, but the blocs that are allowed for Y and Z are more numerous than four; there exist off-diagonal blocs, and their role will be important in our analysis. For Y we have matrices ${\rho}_{0}^{0}$ and ${\rho}_{1}^{0}$, and for Z we have matrices ${\rho}_{0}^{1}$ and ${\rho}_{1}^{1}$;

They are arranged in sixteen blocs for ρ, but some of them, marked with stars, cannot be seen from ρ_{Y} or ρ_{Z}:

Now the co-cycle equations are

The conditioning makes many blocs disappear. Then, by denoting with latin letters the corresponding traces, and taking into account explicitly the blocs that must count, the symmetrical identity gives, for any ρ, the following developed equation:

(IV) Now we appeal to the invariance hypothesis: let us apply a unitary transformation g which respects the two summands of Y but does not necessarily respect the summands of Z; if we replace Z by gZg^{∗}, and ρ by gρg^{∗}, the value of ${F}_{Y}({\rho}_{0}^{0},{\rho}_{1}^{0})$ does not change. Our claim is that the only functions F_{Y} which are compatible with Equation (106) for every ρ are functions of the traces of the blocs.

For the proof, we assume that all the blocs are zero except the eight blocs concerning Y . In this case, we see that the last function −F_{Y} of the right member, involves the eight blocs, but all the other functions involve only the four diagonal blocs. Thus our claim follows from the following result:

**Lemma 7.** A measurable function f on the set H of hermitian matrices which is invariant under conjugation by the unitary group U_{n} and invariant by the change of the coefficient a_{1n}, the farthest from the diagonal, is a function of the trace.

**Proof.** An invariant function for the adjoint representation is a function of the traces of the exterior powers Λ^{k}(ρ), but these traces are coefficients in the basis ${e}_{{i}_{1}}\wedge {e}_{{i}_{2}}\wedge \dots \wedge {e}_{{i}_{k}}$, and the elements divisible by e_{1} ∧ e_{n} cannot be neglected, as soon as k ≥ 2.

Therefore the co-cycle F_{Y}, F_{Z} comes from the image of tr^{*} in Proposition 3. Then the recurrence relation (100) implies that the same is true for the whole co-cycle F.

(V) To conclude the proof of (i), we appeal to Theorem 1, which states that the only non-zero cocycles in this context, connected and sufficiently rich, are multiples of the classical entropy. However, Lemma 6 says that the entropy is a co-boundary.

(VI) To prove (ii), we have to show that every 0-cocycle X ↦ f_{X}(ρ), which depends only on the spectrum of ρ, is a constant. We know that a spectral function is a measurable function φ(σ_{1}; σ_{2}; …) of the elementary symmetric functions ${\sigma}_{1}=\sum_{i}{\lambda}_{i},\ {\sigma}_{2}=\sum_{i<j}{\lambda}_{i}{\lambda}_{j},\dots$.

And, to be a 0-cocycle, f must verify, for every pair of decompositions, X → Y, the equation

Explicitly, if f_{X}(ρ) = φ_{X}(σ_{1}, σ_{2}, …),

By equating the two second members, taking λ_{01} = λ_{00} = 0, and varying λ_{10}, λ_{11}, we find that f(x, y) is the sum of a constant and a linear function.

At the end, f_{X} must be the sum of a constant and a linear function for every X. However, a linear symmetric function is a multiple of σ_{1}. As ρ is normalized by the condition Tr(ρ) = 1, only the constant survives.

**Remark 8.** In his book “Structure des Systemes Dynamiques”, J-M. Souriau [48] showed that the mass of a mechanical system is a degree one class of co-homology of the relativity group with values in its adjoint representation; this class being non-trivial for classical Mechanics, with the Galileo group, and becoming trivial for Einstein relativistic Mechanics, with the Lorentz-Poincare group. Even if we are conscious of the big difference with our construction, the above result shows the same thing happens for the entropy, but going from classical statistics to quantum statistics.

From the philosophical point of view, it is important to mention that the main difference between classical and quantum information co-homology in degree less than one is the fact that the certitude, 1, becomes highly non-trivial in the quantum context. This point is discussed in particular by Gabriel Catren [49]. In geometric quantization the first ingredient, discovered by Kirillov, Kostant and Souriau in the sixties, is a circular bundle over the phase space that allows a non-trivial representation of the constants. The second ingredient, also discovered by the same authors, is the necessity to choose a polarization, which corresponds to the choice of a maximal commutative Poisson sub-algebra of observable quantities. This second ingredient appears in our framework through the limitation of information categories to collections of commutative Boolean algebras, coming from the impossibility of defining manageable joints for arbitrary pairs of observables.

## 5. Product Structures, Kullback–Leibler Divergence, Quantum Version

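In this short section, we use both the homogeneous bar-complex and the non-homogeneous complex. A natural extension of the information co-cycles is to look at the measurable functions

$${F}_{X}({Y}_{0};\dots ;{Y}_{m};{P}_{0};{P}_{1},\dots ,{P}_{n})$$

of several probability laws P_{j} (or densities of states, respectively) on Ω (or E, respectively) belonging to the space ${\mathcal{Q}}_{X}$ that are absolutely continuous with respect to P_{0}, and of several decompositions Y_{i} less fine than X. To be homogeneous co-chains, these functions have to behave naturally under direct images Y_{∗}(P_{i}), and to satisfy the equivariance relation:

$${F}_{X}((Z,{Y}_{0});\dots ;(Z,{Y}_{m});{P}_{0};{P}_{1},\dots ,{P}_{n})=(Z.{F}_{X})({Y}_{0};\dots ;{Y}_{m};{P}_{0};{P}_{1},\dots ,{P}_{n})$$

for any Z belonging to the monoid ${\mathcal{S}}_{X}$ (resp. **S**_{X}), where

$$(Z.{F}_{X})({Y}_{0};\dots ;{Y}_{m};{P}_{0};{P}_{1},\dots ,{P}_{n})=\sum_{z}{P}_{0}(Z=z)\,{F}_{X}({Y}_{0};\dots ;{Y}_{m};{P}_{0}|(Z=z);{P}_{1}|(Z=z),\dots ,{P}_{n}|(Z=z)).$$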

Note that a special role is played by the law P_{0}, which justifies the comma notation.

The proof of the Lemma 1 in Section 2.1 extends without modification to show that this defines an action of semi-group.

Then we define the homogeneous co-boundary operator by

The co-cycles are the elements of the kernel of δ and the co-boundaries the elements of the image of δ (with a shift of degree). The co-homology groups are the quotients of the spaces of co-cycles by the spaces of co-boundaries.

This co-homology is the topos co-homology ${H}_{\mathfrak{S}}^{*}(\mathbb{R},{\mathcal{F}}_{n})$ of the module functor ${\mathcal{F}}_{n}$ of measurable functions of (n + 1)-tuples of probabilities, in the ringed topos $\mathcal{S}$ (resp. **S** in the quantum case).

There is also the non-homogeneous version: an m-cochain is a family of functions F_{X}(X_{1}; …; X_{m}; P_{0}; P_{1}, P_{2}, …, P_{n}) which behave naturally under direct images, without equivariance condition.

The co-boundary operator is modeled on the Hochschild operator; we define the non-homogeneous co-boundary operator by

Let us recall the definition of the Kullback–Leibler divergence (or relative entropy) between two classical probability laws P, Q on the same space Ω, in the finite case:

$$H(P;Q)=-\sum_{\omega \in \Omega}P(\omega )\log \frac{Q(\omega )}{P(\omega )}.$$

Over an infinite set, it is required that Q is absolutely continuous with respect to P with an L^{1}-density dQ/dP, and the definition is

$$H(P;Q)=-\int_{\Omega}\log \frac{dQ(\omega )}{dP(\omega )}\,dP(\omega ).$$

When dQ(ω)/dP (ω) = 0, the logarithm is −∞ and due to the sign minus, we get a contribution +∞ in H, thus, if this happens with probability non-zero for P the divergence is infinite positive. To get a finite number we must suppose also that P is absolutely continuous with respect to Q, i.e., P and Q are equivalent.

The analogous formula defines the quantum Kullback–Leibler divergence (or quantum relative entropy), cf. Nielsen-Chuang [13], between two densities of states ρ, σ on the same Hilbert space E, in the finite dimensional case:

These quantities are positive or zero, and they are zero only in the case of equality of the measures (resp. the densities of states). This is why they are frequently used as a measure of distance between two laws.
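The finite classical case can be sketched in a few lines. The code below assumes the sign convention described above, H(P; Q) = −Σ p log(q/p) with natural logarithm, returns +∞ when Q vanishes on an event of positive probability for P, and skips the events of probability zero for P; the function name `kl_divergence` is chosen here for illustration.

```python
import math

def kl_divergence(p, q):
    """H(P;Q) = -sum_i p_i log(q_i / p_i) for two finite laws given as lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue              # events of P-probability zero do not contribute
        if qi == 0:
            return math.inf       # Q vanishes where P does not
        total -= pi * math.log(qi / pi)
    return total

print(kl_divergence([0.5, 0.5], [0.5, 0.5]))   # equal laws
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))   # different laws
print(kl_divergence([0.5, 0.5], [1.0, 0.0]))   # P not absolutely continuous w.r.t. Q
```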

**Proposition 4.** The map which associates to X in $\mathcal{S}$, Y divided by X, and two laws P, Q the quantity H(Y_{∗}P; Y_{∗}Q) defines a non-homogeneous 1-cocycle, denoted H_{X}(Y; P; Q).

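**Proof.** As we already know that the classical Shannon entropy is a non-homogeneous 1-cocycle, it is sufficient to prove the Hochschild relation for the new function

$${H}_{m}(Y;P;Q)=-\sum_{i}{p}_{i}\log {q}_{i},$$

where p_{i} (resp. q_{i}) is the probability for P (resp. Q) of the event Y = x_{i}.

Let us denote by p_{ij} (resp. q_{ij}) the probability for P (resp. Q) of the event Y = x_{i}, Z = y_{j}, and by p^{j} (resp. q^{j}) the probability for P (resp. Q) of the event Z = y_{j}; then the probability ${p}_{i}^{j}$ (resp. ${q}_{i}^{j}$) of Y = x_{i} knowing that Z = y_{j} for P (resp. for Q) is equal to p_{ij}/p^{j} (resp. q_{ij}/q^{j}), and we have

$${H}_{m}((Z,Y);P;Q)=-\sum_{i,j}{p}_{ij}\log {q}_{ij}=-\sum_{j}{p}^{j}\log {q}^{j}-\sum_{j}{p}^{j}\sum_{i}{p}_{i}^{j}\log {q}_{i}^{j}.$$

The first term on the right is H_{m}(Z; P; Q) and the second is (Z.H_{m})(Y; P; Q), Q.E.D.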

This defines a homogeneous co-cycle for pairs of probability laws H_{X}(Y ; Z; P ; Q) = H_{X}(Y ; P ; Q)− H_{X}(Z; P ; Q), named Kullback-divergence variation.
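The 1-cocycle property of the divergence is the familiar chain rule, and it can be checked numerically. The sketch below uses two hypothetical joint laws P and Q of a pair (Y, Z) of binary variables, and compares the divergence of the joint laws with the divergence of the laws of Z plus the P-average of the divergences conditioned by the values of Z.

```python
import math

def kl(p, q):
    """H(P;Q) = -sum_i p_i log(q_i / p_i), skipping the terms with p_i = 0."""
    return -sum(pi * math.log(qi / pi) for pi, qi in zip(p, q) if pi > 0)

# Joint laws of (Y, Z): rows are the values of Y, columns the values of Z.
P = [[0.10, 0.30], [0.40, 0.20]]
Q = [[0.25, 0.25], [0.20, 0.30]]

def column(M, j):
    return [row[j] for row in M]

pZ = [sum(column(P, j)) for j in range(2)]    # law of Z under P
qZ = [sum(column(Q, j)) for j in range(2)]    # law of Z under Q

joint = kl([x for row in P for x in row], [x for row in Q for x in row])
marginal = kl(pZ, qZ)
conditional = sum(
    pZ[j] * kl([x / pZ[j] for x in column(P, j)],
               [x / qZ[j] for x in column(Q, j)])
    for j in range(2)
)
print(joint, marginal + conditional)
```

The averaging uses the law P only, which illustrates the special role of the first law in the action.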

In the quantum case, for two densities of states ρ, σ we define in the same manner a classical Kullback–Leibler divergence H_{X}(Y; ρ; σ) by the formula

$${H}_{X}(Y;\rho ;\sigma )=-\sum_{k}Tr({\rho}_{k})\log \frac{Tr({\sigma}_{k})}{Tr({\rho}_{k})},$$

where the sum runs over the spectral projectors E_{k} associated to Y and where ρ_{k} (resp. σ_{k}) denotes the matrix E_{k}^{∗}ρE_{k} (resp. E_{k}^{∗}σE_{k}). It is the Kullback–Leibler divergence of the classical laws associated to the direct images of ρ and σ respectively.

But in the case of quantum information theory, we can also define a quantum divergence, for any pair of densities of states (ρ, σ) in **Q**_{X},

**Lemma 8.** For any pair (X, Y) of commuting hermitian operators, such that Y divides X, the function S_{X} satisfies the relation

where H_{X} of two variables denotes the mixed entropy, defined by Equation (119).

**Proof.** As in the proof of Lemma 5, we denote by α, β, … (resp. k, l, …) the indices of the orthogonal decomposition Y (resp. X), and by i, j, … the indices of a basis φ_{i,k,α} of the space E_{k,α} made by eigenvectors of the matrix ${\rho}_{k,\alpha}={E}_{k,\alpha}^{*}\rho {E}_{k,\alpha}$ belonging to the joint operator (X, Y). In a general manner, if M is an endomorphism of E_{k,α} we denote by M_{i,k,α} the diagonal coefficient of index (i, k, α). The probability p_{k} (resp. q_{k}) for ρ (resp. σ) of the event X = ξ_{k} is equal to the sum over i, α of the eigenvalues λ_{i,k,α} of ρ_{k,α} (resp. µ_{i,k,α} of σ_{k,α}). And the restricted density ρ^{Y_{k}} (resp. σ^{Y_{k}}), conditioned by X = ξ_{k}, is the sum over α of ρ_{k,α} (resp. of σ_{k,α}) divided by p_{k} (resp. q_{k}). We have

As a corollary, with the argument proving Lemma 6 from Lemma 5, we obtain that the classical Kullback divergence is minus the co-boundary of the 0-cochain defined by the quantum divergence.

This shows that the generating function of all the co-cycles we have considered so far is the quantum 0-cochain for pairs S(ρ; σ) = −Tr(ρ log σ).

## 6. Structure of Observation of a Finite System

Up to now, the structures considered and the roles played by entropy can be seen as forming a kind of statics in information theory. The aim of this section is to indicate the corresponding elements of dynamics. This more dynamical study could be better adapted to the known role of entropy in the theory of dynamical systems, as defined by Kolmogorov and Sinai.

#### 6.1. Problems of Discrimination

The problem of optimal discrimination consists in separating the various states of a system by using, in the most economical manner, a family of observable quantities. One may also only want to detect a state satisfying a certain chosen property. A possible measure of the cost of discrimination is the number of steps before ending the process.

First, let us define more precisely what we mean by a system, a state, an observable quantity and a strategy for using observations. As before, for simplicity, the setting is finite sets.

The symbol [n] denotes the set {1, …, n}. We have n finite sets M_{i} of respective cardinalities m_{i}, and we consider the set M of sequences x_{1}, …, x_{n} where x_{i} belongs to M_{i}; by definition a system is a subset X of M and a state of the system is an element of X. The set of (classical) observable quantities is a (finite) subset A of the functions from X to ℝ.

A use of observables, named an observation strategy, is an oriented tree Γ, starting at its root, that is the smallest vertex, and such that each vertex is labelled by an element of A, and each arrow (naturally oriented edge) is labelled by a possible value of the observable at the initial vertex of the arrow.

For instance, if F_{0} marks the root s_{0}, it means that we aim to measure F_{0}(x) for the states; then the branches issued from s_{0} are indexed by the values v of F_{0}, and to each branch F_{0} = v corresponds a subset X_{v} of states, giving a partition of X. If F_{1,v} is the observable at the final vertex α_{v} of the branch F_{0} = v, the next step in the program is to evaluate F_{1,v}(x) for x ∈ X_{v}; then the branches issued from α_{v} correspond to the values w of F_{1,v} restricted to X_{v}, and so on.

For each vertex s in Γ we note ν(s) the number of edges that are necessary for joining s to the root s_{0}. The function ν with values in ℕ is called the level in the tree.

It can happen that a set X_{v} consists of one element only; in this case we decide to extend the tree to the next levels by a branch without bifurcation, for instance by labelling it with the same observable and the same value (any other labelling would do, together with its value on X_{v}). In this way, each level k gives a well-defined partition π_{k} of X.

The level k also defines a sub-tree Γ_{k} of Γ, such that its final branches bear π_{k}. This gives a sequence π_{0}, π_{1}, …, π_{l} of finer and finer partitions of X, i.e., a growing sequence of partitions (if the ordering on partitions is the opposite of the direction of the arrows in the information category Π(X)). The tree is said to be fully discriminant if the last partition π_{l}, which is the finest, is made of singletons.
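The passage from an observation strategy to the sequence π_{0}, π_{1}, …, π_{l} can be sketched in a few lines of Python. This is only an illustration under our own conventions: a strategy is assumed to be given as a hypothetical function `strategy(history)` returning the observable to apply after a sequence of observed values, and the names `levels` and `is_fully_discriminant` are not taken from the text.

```python
def levels(X, strategy, depth):
    """Return [pi_0, ..., pi_depth]. Each pi_k maps a history (the tuple of
    values observed along a branch) to the list of states compatible with it."""
    pi = {(): list(X)}
    out = [pi]
    for _ in range(depth):
        finer = {}
        for history, block in pi.items():
            F = strategy(history)          # observable labelling the vertex
            for x in block:
                # the edge is labelled by the observed value F(x)
                finer.setdefault(history + (F(x),), []).append(x)
        pi = finer
        out.append(pi)
    return out

def is_fully_discriminant(pi):
    """True when every piece of the partition is a singleton."""
    return all(len(block) == 1 for block in pi.values())

# Toy system (an assumption for the example): 8 states, observables = binary digits.
X = list(range(8))
strategy = lambda history: (lambda x, k=len(history): (x >> k) & 1)
pis = levels(X, strategy, 3)
print([len(p) for p in pis])   # [1, 2, 4, 8]
```

A piece reduced to a single state is simply carried to the next level without bifurcation, which matches the convention adopted above.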

The minimal number of steps that are necessary for separating the elements of X, or more modestly for detecting a certain part of the states, can be seen as a measure of the complexity of the system with respect to the observations A. A refined measure could take into account the cost of using a given observable, for instance the difficulty of computing its values.

Standard examples are furnished by weighting problems: in this case the states are mass distributions in n objects, and the allowed observables are weightings, which are functions of the form

$$F_{I,J}(x) = \sum_{i\in I} x_{i} - \sum_{j\in J} x_{j} \qquad (132)$$

We underline that such a function, which requires the choice of two disjoint subsets I and J in [n], makes use of the definition of M as a set of sequences, not as an abstract finite set.

The kind of problems we can ask in this framework were studied for instance in “Problemes plaisants et delectables qui se font par les nombres” from Bachet de Meziriac (1612, 1624) [50].

The starting point of our research in this direction was a particular classical problem pointed out to us by Guillaume Marrelec: given n objects ξ_{1}, …, ξ_{n}, if we know that m have the same mass and n − m have another common mass, how many measurements must be performed to separate the two groups and decide which is the heavier?

Even for m = 1 the solution is interesting, and follows a principle of choice by maximum of entropy. In the present text we only want to describe the general structures in relation to this kind of problem without developing a specific study, in particular we want to show that the co-homological nature of the entropy extends to a more dynamical context of discrimination in time.

**Remark 9.** The discrimination problem is connected with the coding problem. In fact a finite system X (as defined just above) is nothing else than a particular set of words of length n, where the letter appearing at place i belongs to an alphabet M_{i}. Distinguishing between different words with a set A of variables f is nothing else than rewriting the words x of X with symbols v_{f} (labelling the image f(X)). Determining the most economical manner to do that consists in finding the smallest maximal length l of words in the alphabet (f, v_{f}); f ∈ A, v_{f} ∈ f(X) translating all the words x in X. This translation, when it is possible, can be read on the branches of a fully discriminating rooted tree, associated to an optimal strategy, of minimal level l. The word that translates x is the sequence (F_{0}, v_{0}), (F_{1}, v_{1}), …, (F_{k}, v_{k}), k ≤ l, of the variables put on the vertices along the branch going from the root to x, and of the values of these variables put along the edges of this branch.

#### 6.2. Observation Trees. Galois Groups and Probability Knowledge

More generally, we consider as in the first part (resp. in the second part) a finite set Ω, equipped with a Boolean algebra $\mathcal{B}$ (resp. a finite dimensional complex vector space E equipped with a positive definite hermitian form h_{0} and a family of direct decompositions in linear spaces U**B**). In each situation we have a natural notion of observable quantity: in the case of Ω it is a partition Y compatible with $\mathcal{B}$ (i.e., less fine than $\mathcal{B}$) with a numbering of the parts by the integers 1, …, k if Y has k elements; in the case of E it is a decomposition Y compatible with U**B** (i.e., each summand is a direct sum of elements of one of the decompositions u**B**, for u ∈ U(h_{0})), with a numbering of the summands by the integers 1, …, k if Y has k elements. We also have a notion of probability: in the case of (Ω, Y) it is a classical probability law P_{Y} on the quotient set Ω/Y; in the case of (E, Y) it is a collection of non-negative hermitian forms h_{Y,i} on the summands of Y.

We will consider information structures, denoted by the symbol S, for both cases (which could be distinguished by the typography, $\mathcal{S}$ or **S**, if necessary): they are categories made of objects that are observables and arrows that are divisions, satisfying the condition that if X ∈ S divides Y and Z in S, then the joint (Y, Z) belongs to S.

We will also consider probability families adapted to these information structures; they form a covariant functor X ↦ Q_{X} (which can be typographically distinguished in the two cases by ${\mathcal{Q}}_{X}$ and **Q**_{X}) of direct images. When $\mathcal{S}$ is a classical subcategory of the quantum structure **S**, we suppose that we have a trace transformation from ι^{∗}**Q** to $\mathcal{Q}$, and if **S** and **Q** are unitary invariant, we recall that, thanks to the ordering, we have an equivalence of categories between **S**^{U} and $\mathcal{S}$, and a compatible morphism from the functional module ${\mathcal{F}}_{\mathcal{Q}}$ to the functional module ${\mathcal{F}}_{\mathbf{Q}}$.

Except for the new ingredient of orderings, these are familiar objects for our reader. The letter X will denote both cases Ω and E; then the letters S, B, Q will denote respectively $\mathcal{S}$, $\mathcal{B}$, $\mathcal{Q}$ or **S**, U**B**, **Q**. Be careful that now all observable quantities are ordered, either partitions or direct decompositions. We will always assume the compatibility condition between Q and S, meaning that every conditioning of P ∈ Q by an event associated to an element of S belongs to Q.

In addition we choose a subset A of observables in S, which play the role of allowed elementary observations.

We say that a bijection σ from Ω to itself, measurable for $\mathcal{B}$, respects a set of observables $\mathcal{A}$ if for any $Y\in \mathcal{A}$, there exists $Z\in \mathcal{A}$ such that Y ○ σ = Z. It means that σ establishes an ordered bijection between the pieces Y(i) and the pieces Z(i), i.e., x ∈ Z(i) if and only if σ(x) ∈ Y(i). In other words the permutation σ respects $\mathcal{A}$ when the map σ^{*} which associates the partition Y ○ σ to any partition Y, sends $\mathcal{A}$ into $\mathcal{A}$.

In the same way, we say that σ respects a family of probabilities $\mathcal{Q}$ if the associated map σ_{*} sends an element of $\mathcal{Q}$ to an element of $\mathcal{Q}$.

In the quantum case, with E, h_{0} and U**B**, we do the same by asking in addition that σ is a linear unitary automorphism of E.

**Definition 9.** If X, S, Q, B and A are given, the Galois group G_{0} is the set of permutations of X (resp. linear maps) that respect S, Q, B and A.

**Example 6.** Consider the system X associated to the simple classical weighting problem: states are parameterized by points with coordinates 0, 1 or −1 in the sphere S^{n−1} of radius 1 in ℝ^{n}, according to their weights, either normal, heavier or lighter. Thus in this case Ω = X possesses 2n points. The set A of elementary observables is given by the weighting operations F_{I,J}, Equation (132). For $\mathcal{S}$ we take the set $\mathcal{S}(A)$ of all ordered partitions π_{k} obtained by applications of discrimination trees labelled by A. And we consider only the uniform probability P_{0} on X; in $\mathcal{Q}$ this gives the images of this law by the elements of $\mathcal{S}$, and the conditioning by all the events associated to $\mathcal{S}$.

Then the Galois group G_{0} is the subgroup ${\mathcal{S}}_{n}\times {C}_{2}$ of ${\mathcal{S}}_{2n}$ made by the product of the permutation group of n symbols by the group changing the signs of all the x_{i} for i in [n].

Proof: the elements of ${\mathcal{S}}_{n}$ respect A, and the uniform law. Moreover if σ changes the sign of all the x_{i}, one can compensate the effect of σ on F_{I,J} by taking G_{I,J} = F_{J,I}, i.e., by exchanging the two sides of the balance.

To finish we have to show that permutations of X outside ${\mathcal{S}}_{n}\times {C}_{2}$ do not respect A. First, consider a permutation σ that does not respect the indices i. In this case there exists an index i ∈ [n] such that σ(i^{+}) and σ(i^{−}) are states associated to different coins, for instance σ(i^{+}) = j^{+} and σ(i^{−}) = k^{+}, with j ≠ k, or σ(i^{+}) = j^{+} and σ(i^{−}) = k^{−}, with j ≠ k. Two cases are possible: these states have the same mass, or they have opposite mass. In both cases let us consider a weighting F_{j,h}(x) = x_{j} − x_{h}, where h ≠ k; by applying σ^{*}F_{j,h} to x = i^{+} we find +1 (or −1), and by applying σ^{*}F_{j,h} to x = i^{−} we find 0. However, this cannot happen for a weighting, because for a weighting the change of i^{+} into i^{−} either has no effect or exchanges the results +1 and −1. Finally, consider a permutation σ that respects the indices but exchanges the signs of a subset I = {i_{1}, …, i_{k}}, with 0 < k < n. In this case let us consider a weighting F_{i,j}(x) = x_{i} − x_{j} with i ∈ I and j ∈ [n]\I. The function F_{i,j} ○ σ takes the value +1 for the states i^{−}, j^{−}, the value −1 for i^{+}, j^{+} and the value 0 for the other states. This cannot happen for any weighting, because this weighting must involve both i and j, but it cannot be F_{j,i}(x) = x_{j} − x_{i}, which takes the value −1 for j^{−}, and it cannot be F_{i,j}, which takes the value +1 for i^{+}.
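For small n the statement of Example 6 can be checked by brute force. The sketch below is a sanity check under explicit assumptions, not a proof: it takes n = 3, encodes a state as a pair `(coin, sign)`, restricts A to the weightings with one coin on each side of the balance (for n = 3 these are the only ones with equally many coins on each side), and tests only the condition on A, since every permutation preserves the uniform law. All names are ours.

```python
import math
from itertools import permutations

n = 3
# States: (coin, sign); sign = +1 if the odd coin is heavier, -1 if lighter.
states = [(coin, sign) for coin in range(n) for sign in (+1, -1)]

def weighting(i, j):
    """Values of F_{i,j}(x) = x_i - x_j on the 2n states, as a tuple."""
    return tuple(sign * ((coin == i) - (coin == j)) for coin, sign in states)

# Assumption: exactly one coin on each side of the balance.
A = {weighting(i, j) for i in range(n) for j in range(n) if i != j}

def respects(sigma):
    """True if F o sigma is again an element of A for every F in A."""
    return all(tuple(F[sigma[a]] for a in range(len(states))) in A for F in A)

galois = [s for s in permutations(range(len(states))) if respects(s)]
print(len(galois), "permutations respect A; 2 * n! =", 2 * math.factorial(n))
```

Under these assumptions the count agrees with the order 2 · n! of ${\mathcal{S}}_{n}\times {C}_{2}$; changing the sign of a single coin is rejected, while the global sign change and the permutations of coins are accepted.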

The probability laws we are considering express the beliefs in the initial knowledge on the system; in this case it is legitimate to consider that they constrain the initial Galois group G_{0}. This corresponds to the Jaynes principle [51,52].

We define in this framework the notion of observation tree adapted to a given subset A of S: it is a finite oriented rooted tree Γ where each vertex s is labelled by an observable F_{s} belonging to A and each arrow α beginning at s is labelled by an element F_{s}(i) of F_{s}. A priori we introduce as many branches as there exist elements in F_{s}. The disposition of the arrows in the trigonometric circular order embeds the tree Γ in the Euclidean plane, up to homotopy.

A branch γ in the tree Γ is a sequence α_{1}, …, α_{k} of oriented edges, such that, for each i, the initial extremity of α_{i+1} is the terminal extremity of α_{i}. Then α_{i+1} starts with the label F_{i} and ends with the label F_{i+1}. We will say that γ starts with the root if the initial extremity of α_{1} is the root s_{0}, with a label F_{0}.

For any edge α in Γ, there exists a unique branch γ(α) starting from the root, and abutting in α. Along this branch, the vertices are decorated with the variables F_{i}; i = 0, …, k and the edges are decorated with values v_{i} of these functions; we note

By definition, the set X(α) of states which are compatible with α is the subset of elements of X such that F_{0}(x) = v_{0}, …, F_{k−1}(x) = v_{k−1}.

At any level k the sets X(α) form a partition π_{k} of X.

**Definition 10.** We say that an observation tree Γ labelled by A is allowed by S, if every joint observable along each branch belongs to S.

We say simply allowed if there is no risk of confusion.

In what follows this restriction is imposed on all the trees considered. Of course if we start with the algebra of all ordered partitions this gives no restriction, but this would exclude the quantum case, where the best we can do is to take maximal commutative families.

**Definition 11.** Let α be an edge of Γ; we note $\mathcal{Q}\left(\alpha \right)$ the set of probability laws on X(α) which are obtained by conditioning by the values v_{0}, v_{1}, …, v_{k−1} of the observables F_{0}, F_{1}, …, F_{k−1} along the branch γ(α) starting in the root and ending with α.

**Definition 12.** The Galois group G(α) is the set of permutations of elements of X(α) that belong to G_{0}, preserve all the equations F_{i}(x) = v_{i} (resp. all the summands of the orthogonal decomposition F_{i} labelling the edges) and preserve the sets of probability Q(α) (resp. quantum probabilities).

We consider G(α) as embedded in G_{0} by fixing point by point all the elements of X outside X(α).

**Remark 10.** Let P be a probability law (either classical or quantum) on X, Φ = (F_{i}; i ∈ I) a collection of observables, and φ = (v_{i}; i ∈ I) a vector of possible values of Φ; the law P |(Φ = φ) obtained by conditioning P by the equations Φ(x) = φ, is defined only if the set X_{φ} of all solutions of the system of equations Φ(x) = φ has a non-zero probability p_{φ} = P (X_{φ}). It can be viewed either as a law on X_{φ}, or as a law on the whole X by taking the image by the inclusion of X_{φ} in X.
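In the classical case the conditioning of Remark 10 can be written down directly. The following sketch is only illustrative: a law is assumed to be stored as a dictionary from states to probabilities, and the function name `condition` is ours. It returns the law on X_{φ}, or `None` when p_{φ} = 0 and the conditioned law is undefined.

```python
from fractions import Fraction

def condition(P, observables, values):
    """P: dict state -> probability. Returns P | (Phi = phi) as a law on
    X_phi, or None when X_phi has probability zero (law undefined)."""
    X_phi = [x for x in P
             if all(F(x) == v for F, v in zip(observables, values))]
    p_phi = sum(P[x] for x in X_phi)
    if p_phi == 0:
        return None
    return {x: P[x] / p_phi for x in X_phi}

# Toy example (an assumption): uniform law on {0, ..., 5}, two observables.
P = {x: Fraction(1, 6) for x in range(6)}
F0 = lambda x: x % 2
F1 = lambda x: x % 3
print(condition(P, [F0], [0]))          # law on {0, 2, 4}
print(condition(P, [F0, F1], [0, 1]))   # law concentrated on the state 4
print(condition(P, [F0], [2]))          # None: no state satisfies F0 = 2
```

Extending the result by zero outside X_{φ} gives the same law viewed on the whole of X.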

**Definition 13.** The edge α is said to be Galoisian if the set of equations and probabilities that are invariant by G(α) coincide respectively with X(α) and $\mathcal{Q}\left(\alpha \right)$.

A tree Γ is said Galoisian when all its edges are Galoisian.

At each level k we define the group G_{k} which is the product of the groups G(α) for the free edges at level k; it is a subgroup of G_{0} preserving, piece by piece, the partition π_{k}.

Along the path γ the partition (or decomposition) π_{l}, l ≤ k of X is increasing (finer and finer) and the sequence of groups G_{l}, l ≤ k is decreasing.

Along a branch the sets X(α) are decreasing and the sequence of groups G_{0}, G(α_{1}), …, G(α_{k}) is decreasing. We propose that the quotient G(α_{i})/G(α_{i+1}) gives a measure of the Galoisian information gained by applying F_{i} and obtaining the value v_{i}.

On each set X(α) the images of the elements of the probability family $\mathcal{Q}$ form sets $\mathcal{Q}\left(\alpha \right)$ of probabilities on X(α).

This is why the group G(α) was also required to preserve the set $\mathcal{Q}\left(\alpha \right)$.

**Remark 11.** In terms of coding, introducing probabilities on the X(α) allows one to formulate the principle that it is more efficient to choose, after the edge α, the observation having the largest conditional entropy in Q(α). Under what circumstances this gives the optimal discrimination tree is a difficult problem, even if folklore admits it as a theorem. It is the problem of optimal coding.

By virtue of a theorem of Shannon, the minimal length is bounded below by the entropy of the law on X if this law is unique. We found that the principle works in a simple example of weighting (cf. paper 3 [22]).
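The greedy principle of Remark 11 is easy to experiment with on the weighting problem with one odd coin. The sketch below is an illustration under our own assumptions: states are pairs `(coin, sign)`, the law is uniform, and the allowed weightings put equally many coins on each side of the balance. It chooses at each edge a weighting of maximal outcome entropy on X(α) and compares the resulting depth with the counting bound log_3 |X|, which is the form taken by the entropy bound for a uniform law and observables with at most three values. Nothing here claims that the greedy tree is optimal in general.

```python
import math
from collections import defaultdict
from itertools import combinations

def weightings(n):
    """Weightings (I, J): disjoint subsets with |I| = |J| >= 1 (assumption)."""
    result = []
    for size in range(1, n // 2 + 1):
        for I in combinations(range(n), size):
            rest = [c for c in range(n) if c not in I]
            result.extend((I, J) for J in combinations(rest, size))
    return result

def split(states, w):
    """Partition the current states X(alpha) by the value of the weighting w."""
    I, J = w
    parts = defaultdict(list)
    for coin, sign in states:
        parts[sign * ((coin in I) - (coin in J))].append((coin, sign))
    return parts

def entropy(parts, total):
    """Entropy (in nats) of the outcome under the uniform law on the states."""
    return -sum(len(b) / total * math.log(len(b) / total) for b in parts.values())

def greedy_depth(states, ws):
    """Number of levels used by the greedy maximum-entropy strategy."""
    if len(states) <= 1:
        return 0
    best = max(ws, key=lambda w: entropy(split(states, w), len(states)))
    parts = split(states, best)
    if len(parts) == 1:
        raise ValueError("remaining states cannot be separated")
    return 1 + max(greedy_depth(block, ws) for block in parts.values())

n = 4
states = [(coin, sign) for coin in range(n) for sign in (+1, -1)]
depth = greedy_depth(states, weightings(n))
# Counting bound: a weighting has at most 3 outcomes, so 3**depth >= |X|.
lower_bound = math.ceil(math.log(len(states), 3))
print("greedy depth:", depth, "counting lower bound:", lower_bound)
```

The value n = 4 is kept small so that the exhaustive search over weightings stays instantaneous; larger n only changes the running time.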

Note however important differences between our approach and the traditional one for coding: for us A is given and $\mathcal{Q}$ is given; they correspond respectively to an a priori limitation of the possible codes (like a natural language), and to a set of possible states of a priori knowledge, for instance taking into account the Galois ambiguity in the system (Jaynes principle). All that is Bayesian in spirit.

**Definition 14.** We say that an observation tree Γ labelled by A is allowed by S and by X ∈ S, if it is allowed by S_{X}, which means that every joint observable along each branch is divided by X.

**Definition 15.** S(A) is the set of (ordered) observables π_{k} which can be obtained by allowed observation trees. For X ∈ S we note S_{X}(A) the set of (ordered) observables π_{k} which can be obtained by observation trees that are allowed by S and X.

**Lemma 9.** The joint product defines a structure of monoid on the set S_{X}(A).

**Proof.** Let Γ, Γ′ be two observation trees allowed by A, S and X ∈ S, of respective lengths k, k′, giving final decompositions S, S′. To establish the lemma we must show that the joint SS′ is obtained by a tree associated with A, allowed by S and X.

For that we just graft one copy of Γ′ on each free edge of Γ. This new tree ΓΓ′ is associated with A, and its final partition is clearly finer than S. It is also finer than S′, because at the end of any branch of ΓΓ′ we have an X(β) which is contained in the corresponding element of the final partition π_{k′}(Γ′). To finish the proof we have to show that each element of π_{k+k′}(ΓΓ′) is the intersection of one element of π_{k}(Γ) with one element of π_{k′}(Γ′), because we know that these observables are in S_{X}, which is a monoid by the definition of an information structure. But a complete branch γ.γ′ in ΓΓ′, going from the root to a terminal edge at level k + k′, corresponds to a word (F_{0}, v_{0}, F_{1}, v_{1}, …, F_{k−1}, v_{k−1}, ${{F}^{\prime}}_{0}$, ${{v}^{\prime}}_{0},\dots ,{{F}^{\prime}}_{{k}^{\prime}-1}$, ${{v}^{\prime}}_{{k}^{\prime}-1}$); thus the final set of the branch γ.γ′ is defined by the equations F_{i} = v_{i}; i = 0, …, k−1 and ${{F}^{\prime}}_{j}={{v}^{\prime}}_{j}$; j = 0, …, k′−1, and is the intersection of the sets respectively defined by the first and second groups of equations, which belong respectively to π_{k}(Γ) and π_{k′}(Γ′).

Then S(A) forms an information structure. In particular there is a unique maximal partition, initial element for each subcategory S_{X}(A) in the information structure S(A).

But on S(A) the operation of grafting, that we will describe now, is much richer than what we used in the above Lemma 9: we can graft an allowed tree on each free edge of an allowed tree, and this leads to a theory of operads and monads for information theory.

#### 6.3. Co-Homology of Observation Strategies

Remember that the elements of the partitions or decompositions Y we are considering, are now numbered by the ordered set {1, …, L(Y)}, where L(Y) is the number of elements in the partition, or the decomposition, also called its length. In particular we consider as different two partitions which are labelled differently by the integers. This was already taken into account in the definition of the Galois groups.

We define the multi-products µ(m; n_{1}, …, n_{m}) on the set of ordered partitions:

They are defined between a partition equipped with an ordering (π, ω) with m pieces and m ordered partitions (π_{1}, ω_{1}), …, (π_{m}, ω_{m}) of respective lengths n_{1}, …, n_{m}; the result is the ordered partition obtained by cutting each piece X_{i} of π by the corresponding decomposition π_{i} and renumbering the non-empty pieces by integers in the unique way compatible with the orderings ω, ω_{1}, …, ω_{m}. Observe the important fact that the result has in general fewer than n = n_{1} + … + n_{m} pieces. This is a strong departure from the usual multi-products (cf. P. May [17,53], Loday-Vallette [10]). We do not have an operad: when introducing the vector spaces V(m) generated by the decompositions of length m, we get filtered but not graded structures. However, a form of associativity and a neutral element are preserved, hence we propose to name this structure a filtered operad.
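For finite sets the multi-product is a one-line computation. The sketch below is illustrative only: an ordered partition is assumed to be stored as a Python list of `frozenset` pieces, the list order playing the role of ω, and the name `multi_product` is ours. The example also exhibits the loss of length noted above.

```python
def multi_product(pi, pis):
    """Cut each piece X_i of the ordered partition pi by the ordered
    partition pis[i], drop empty intersections, keep the induced order."""
    if len(pi) != len(pis):
        raise ValueError("need one ordered partition per piece of pi")
    return [X_i & E for X_i, pi_i in zip(pi, pis) for E in pi_i if X_i & E]

fs = frozenset
omega = fs(range(6))
pi = [fs({0, 1, 2}), fs({3, 4, 5})]            # m = 2 pieces
pi_1 = [fs({0, 1}), fs({2, 3, 4, 5})]          # n_1 = 2
pi_2 = [fs({0, 1, 2}), fs({3, 4}), fs({5})]    # n_2 = 3

product = multi_product(pi, [pi_1, pi_2])
print([sorted(E) for E in product])   # [[0, 1], [2], [3, 4], [5]]
```

Here n_{1} + n_{2} = 5 but the product has only 4 pieces, because the first piece of `pi_2` does not meet the second piece of `pi`. Taking the trivial partition `[omega]` for every π_{i} returns `pi` unchanged.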

There exists an evident unit to the right which is the unique decomposition of length 1.

The action of the symmetric group ${\mathfrak{S}}_{m}$ on the products is evident, and does not respect the length of the result. We will designate by µ_{m} the collection of products for the same length m.

The numbers m_{i} between 1 and n_{i} that count the pieces of the decomposition of the element X_{i} of π are functions m_{i}(π, ω, π_{i}, ω_{i}). There exists an increasing injection η_{i} : [m_{i}] → [n_{i}], which depends only on (π, ω, π_{i}, ω_{i}), telling which indices of (π_{i}, ω_{i}) survive in the product. These injections are integral parts of the structure of filtered operad. In particular, if we apply a permutation σ_{i} to [n_{i}], i.e., if we replace ω_{i} by ω_{i} ○ σ_{i}, the number can change.
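The injection η_{i} is simply the increasing list of the indices of π_{i} whose piece meets X_{i}. A minimal sketch, with the same list-of-`frozenset` convention for ordered partitions (our assumption) and a hypothetical helper name:

```python
def surviving_indices(X_i, pi_i):
    """1-based indices j in [n_i] whose piece of pi_i meets X_i; listed in
    increasing order they give the map eta_i : [m_i] -> [n_i]."""
    return [j for j, E in enumerate(pi_i, start=1) if X_i & E]

fs = frozenset
X_2 = fs({3, 4, 5})
pi_2 = [fs({0, 1, 2}), fs({3, 4}), fs({5})]

eta = surviving_indices(X_2, pi_2)
print(eta, "m_i =", len(eta))                        # [2, 3] m_i = 2
print(surviving_indices(X_2, list(reversed(pi_2))))  # [1, 2]
```

In this example, renumbering the pieces of π_{i} changes the image of η_{i} from {2, 3} to {1, 2}.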

The axioms of operadic unity and associativity, conveniently modified, are easy to verify (cf. [22]). The reference we follow here is Fresse “Basic concepts of operads” [16]. For unity nothing has to be modified. For associativity (Figure 1.3 in Fresse [16]), we modify by saying that if the (π_{i}, ω_{i}) of lengths n_{i}, for i between 1 and k, are composed from µ(n_{i}; ${n}_{i}^{1},\dots ,{n}_{i}^{{n}_{i}}$) with the n_{i}-tuples (…, $({\pi}_{i}^{j},{\omega}_{i}^{j})$, …) whose respective lengths are ${n}_{i}^{j}$, and if the result µ_{i} for each i has length (${m}_{i}^{1}+\dots +{m}_{i}^{{n}_{i}}$) where ${m}_{i}^{j}$ is a function of (π_{i}, ω_{i}) and $({\pi}_{i}^{j},{\omega}_{i}^{j})$, then the product of (π, ω) of length k with the µ_{i} is the same as the one we would have obtained by composing µ(k; n_{1}, …, n_{k})((π, ω); (π_{1}, ω_{1}), …) with the m = m_{1} + … + m_{k} ordered decompositions $({\pi}_{i}^{j},{\omega}_{i}^{j})$ for j belonging to the image of η_{i} : [m_{i}] → [n_{i}]. This result is more complicated to write than to prove, because it only expresses the associativity of the ordinary join of three partitions, from which the ordering follows.

Moreover, the first axiom concerning permutations (Figure 1.1 in Fresse [16]), can be modified, by considering only permutations of n_{i} letters which preserve the images of the maps η_{i}.

The second axiom, which concerns a permutation σ of k elements in π, and the inverse permutation of the partitions π_{i}, can be reformulated by saying that the effect of σ on the multiple product µ is the same as the effect of σ on the indices of the (π_{i}, ω_{i}). In other terms, the effect of σ on ω is compensated by the action of σ^{−1} on the indices of the (π_{i}, ω_{i}). One has to be careful, because the result of µ applied to (π, ω ○ σ) does not in general have the same length as µ applied to (π, ω). However the compensation implies that µ_{k} is well defined on the quotient of the set of sequences ((π, ω), (π_{1}, ω_{1}), …) by the diagonal action of ${\mathcal{S}}_{k}$, which permutes the k pieces of π and which permutes the indices i of the n_{i} in the other factors.

Geometrically, if the partition (π, ω) in S(A) is generated by an observation tree Γ with m ending edges and the partitions (π_{i}, ω_{i}); i = 1, …, m are generated by a collection of observation trees Γ_{i}; then the result of the application of µ(m; n_{1}, …, n_{m}) to (π, ω) and (π_{i}, ω_{i}); i = 1, …, m is generated by the observation tree that is obtained by grafting each Γ_{i} on the vertex number i. Drawing the planar trees associated to three successive sets of decompositions for two successive grafting operations helps to understand the associativity property.

The fact that in general this does not give a tree with n_{1} + … + n_{m} free edges, where n_{i} denotes the number of free edges of Γ_{i} comes from the possibility to find an empty set X(β) at some moment along a branch of the grafted tree; this we call a dead branch. It expresses the fact that the empty set is excluded from the elements of a partition in the classical context, and the zero space excluded from the orthogonal decomposition in the quantum context. When computing conditioned probabilities we encounter the same problem if a set X(β) at some place in a branch has measure zero.

The dead branches and the lack of grading cause a lot of difficulties for studying the operations µ_{m} algebraically, thus we introduce more flexible objects, which are the ordered partitions with empty parts of Ω, resp. the ordered orthogonal decompositions with zero summands of E: such a partition π^{*} (resp. decomposition) is a family (E_{1}, …, E_{m}) of disjoint subsets of Ω (resp. orthogonal subspaces of E), such that their union (resp. sum) is Ω (resp. E). The only difference with respect to ordered partitions, resp. decompositions, is that we accept to repeat ø (resp. 0) an arbitrarily large number of times. For brevity we will call these new objects generalized decompositions. The number m is called the degree of π^{*}. These objects are the natural results of applying rooted observation trees embedded in an oriented half plane.

The notions of adaptation to A, **S** and X in **S** concerning the trees apply to the generated generalized decompositions. The corresponding sets of generalized objects are written **S**^{*}(A) and ${\mathbf{S}}_{X}^{*}(A)$.

The multi-product µ(m; n_{1}, …, n_{m}) extends naturally to generalized decompositions, and in this case the degrees are respected, i.e., the result of this operation is a generalized decomposition of degree n_{1} + n_{2} + … + n_{m}.

Remark that we could write µ^{*}(m; n_{1}, …, n_{m}) for the multi-products extended to generalized decompositions; however we prefer to keep the same notation µ(m; n_{1}, …, n_{m}). This is justified by the following observation: to a generalized decomposition π^{*} is associated a unique ordered decomposition (π, ω), by forgetting the empty sets (resp. zero spaces) in the family, and the multi-product is compatible with this forgetful map. The gain of the extension is the easy construction of a monad that we now present.
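The effect of keeping the empty pieces can be seen on the classical finite case. In the following sketch (our conventions: a generalized decomposition is a list of `frozenset` pieces, possibly empty, and the names `multi_product_generalized` and `forget` are hypothetical) the degree of the product is exactly n_{1} + … + n_{m}, and forgetting the empty pieces gives back the ordered partition.

```python
def multi_product_generalized(pi, pis):
    """Same cutting operation as for ordered partitions, but empty pieces
    are kept, so the degree of the result is n_1 + ... + n_m."""
    if len(pi) != len(pis):
        raise ValueError("need one generalized decomposition per piece of pi")
    return [X_i & E for X_i, pi_i in zip(pi, pis) for E in pi_i]

def forget(pi_star):
    """Drop the empty pieces, keeping the order of the others."""
    return [E for E in pi_star if E]

fs = frozenset
pi = [fs({0, 1, 2}), fs({3, 4, 5})]
pi_1 = [fs({0, 1}), fs({2, 3, 4, 5})]
pi_2 = [fs({0, 1, 2}), fs({3, 4}), fs({5})]

star = multi_product_generalized(pi, [pi_1, pi_2])
print(len(star), [sorted(E) for E in star])   # 5 [[0, 1], [2], [], [3, 4], [5]]
print([sorted(E) for E in forget(star)])      # [[0, 1], [2], [3, 4], [5]]
```

Inserting an empty piece in `pi`, with any decomposition attached to it, adds only empty pieces to the product and therefore does not change the result after `forget`.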

The definition of operad was introduced by P. May [17] as the right tool for studying the homology of infinite loop spaces; then it was recognized as a fundamental tool for algebraic topology, and many other topics, see Loday and Vallette, Fresse.

We will encounter only “symmetric” operads.

The multiple products µ_{m} on generalized decompositions can be assembled in a structure of monad by using the standard Schur construction (cf. Loday and Vallette [10], or Fresse, “on partitions” [16]): For each X ∈ **S**, we introduce the real vector space V_{X} = V_{X}(A) freely generated by the set ${\mathcal{S}}_{X}^{*}(A)$ of generalized decompositions obtained by observation trees that are allowed by A, S and X; the length m defines a grading V_{X}(m) of V_{X}. We put V_{X}(0) = 0.

The maps µ_{m} generate m-linear maps from products of these spaces to themselves which respect the grading; these maps, also denoted by µ_{m}, are parameterized by the sets ${\mathcal{S}}_{X}^{*}(m)$, whose elements are the generalized decompositions of degree m which are divided by X:

The linear Schur functor from the category of real vector spaces to itself, is defined by the direct sum of symmetric co-invariants:
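For orientation, we recall the usual shape of a Schur functor in the operad literature (Fresse, Loday and Vallette). The display below is the textbook form, written with the notation of this section as an assumption on our part: W is a real vector space, and the symmetric group ${\mathfrak{S}}_{m}$ acts on V_{X}(m) by renumbering and on W^{⊗m} by permuting the factors.

$$\mathcal{V}_{X}(W) = \bigoplus_{m \geq 0} V_{X}(m) \otimes_{\mathfrak{S}_{m}} W^{\otimes m}$$

Since V_{X}(0) = 0, the sum effectively starts at m = 1.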

The composition of Schur functors is defined by

**Proposition 5.** For each X in S, the collection of operations µ_{m} defines a linear natural transformation of functors µ_{X} : V_{X} ◦ V_{X} → V_{X}; and the trivial partition defines a linear natural transformation of functors η_{X} : R → V_{X}, which satisfy the axioms of a monad (cf. Mac Lane “Categories for the Working Mathematician” 2nd ed. [4], and Alain Proute, Introduction a la Logique Categorique, 2013, Prepublications [54]):

**Proof.** The argument is the same as the argument given in Fresse (partitions …). The fact that the natural transformation µ_{X} is well defined on the quotient by the diagonal action of the symmetric group ${\mathcal{S}}_{m}$ on ${V}_{X}(m)\otimes {\otimes}_{i}{V}_{X}({n}_{i}){\otimes}_{{\mathcal{S}}_{{n}_{1},\dots ,{n}_{m}}}{W}^{\otimes s}$ comes from the verification of the symmetry axiom, and the properties of associativity and neutral element come from the verification of the corresponding axioms.

Moreover all these operations are natural for the functor of inclusion from the category S_{Y} to the category S_{X} of observables divided by Y and X respectively when X divides Y; therefore we have the following result:

**Proposition 6.** To each arrow X → Y in the category S is associated a natural transformation of functors ${\rho}_{X,Y}:{\mathcal{V}}_{Y}\to {\mathcal{V}}_{X}$, making a morphism of monads; this defines a contravariant functor $\mathcal{V}$ from the category S to the category of monads, that we name the arborescent structural sheaf of S and A.

Considering the discrete topology on S, we introduce the topos of sheaves of modules over the functor in monads $\mathcal{V}$, which we call the arborescent information topos associated to S and A.

As explained in Proute loc.cit. [54] a monad in a category $\mathcal{C}$ becomes a monoid in the category of endo-functors of $\mathcal{C}$, thus the topos we introduce is equivalent to an ordinary ringed topos.

The monad ${\mathcal{V}}_{X}$, and the contravariant monadic functor $\mathcal{V}$ on S, are better understood by considering trees, cf. Getzler-Jones [55], Ginzburg-Kapranov [56] and Fresse [16]; in our context we consider all observation trees labelled by elements of ${\mathbf{S}}_{X}^{*}(A)$: if Γ is an oriented rooted tree of level k, each vertex v of Γ gives birth to m_{v} edges; we define

The space V(Γ)(W) is the direct sum of spaces V_{X}(Γ_{Y})(W) associated to trees which are decorated by a subset Y in ${\mathbf{S}}_{X}^{*}(A)$, with one element Y_{v} of S_{X}(m) for each vertex v which gives birth to m_{v} edges.

Then the iterated functors ${\mathcal{V}}^{\circ k}=\mathcal{V}\circ \dots \circ \mathcal{V}$ for k ≥ 1 are the direct sums of the functors V(Γ) of level k. Remark that we could have worked directly with observation trees labelled by elements of A instead of working with generalized partitions; this would have given a strictly larger monad but equivalent results.

Associated to probability families we define now a right ${\mathcal{V}}_{X}$-module (in the terms of Fresse, Partitions, the term ${\mathcal{V}}_{X}$-algebra being reserved to a structure of left module on a constant functor).

For that we introduce the notion of divided probability.

**Definition 16.** A divided probability law of degree m is a sequence of triplets (p, P, U) = (p_{1}, P_{1}, U_{1}; …; p_{m}, P_{m}, U_{m}), where the p_{i}; i = 1, …, m are non-negative numbers of sum one, i.e., p_{1}+…+p_{m} = 1, where each P_{i}; i = 1, …, m is a classical (resp. quantum) probability law when the corresponding p_{i} is strictly positive, and a probability law or the empty set when the corresponding p_{i} is equal to 0, and where each U_{i}; i = 1, …, m is the support in X of P_{i}; moreover the U_{i} are assumed to be disjoint in the classical case (resp. orthogonal in the quantum case). The letter P will designate the probability p_{1}P_{1} +…+p_{m}P_{m}, where 0.∅ = 0 when it happens.
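In the classical case Definition 16 translates into a few checks. The sketch below is only an illustration under our own encoding: a law P_{i} is a dictionary from states to probabilities, the empty set is represented by `None`, exact fractions are used so that sums can be compared with 1, and the function names are hypothetical.

```python
from fractions import Fraction

def support(P_i):
    """Support U_i of P_i; the empty set when P_i is the empty set (None)."""
    if P_i is None:
        return frozenset()
    return frozenset(x for x, q in P_i.items() if q > 0)

def is_divided_probability(p, Ps):
    """p: weights p_1..p_m; Ps: laws P_1..P_m (dict state -> prob, or None)."""
    if len(p) != len(Ps) or any(q < 0 for q in p) or sum(p) != 1:
        return False
    for p_i, P_i in zip(p, Ps):
        if P_i is None:
            if p_i > 0:                 # a positive weight needs a genuine law
                return False
        elif sum(P_i.values()) != 1 or any(q < 0 for q in P_i.values()):
            return False
    U = [support(P_i) for P_i in Ps]
    return all(U[i].isdisjoint(U[j])
               for i in range(len(U)) for j in range(i + 1, len(U)))

def mixture(p, Ps):
    """The law P = p_1 P_1 + ... + p_m P_m, with 0 * (empty set) = 0."""
    P = {}
    for p_i, P_i in zip(p, Ps):
        if P_i is not None:
            for x, q in P_i.items():
                P[x] = P.get(x, 0) + p_i * q
    return P

p = [Fraction(1, 2), Fraction(0), Fraction(1, 2)]
Ps = [{"a": Fraction(1)}, None, {"b": Fraction(1, 3), "c": Fraction(2, 3)}]
print(is_divided_probability(p, Ps), mixture(p, Ps))
```

The second triplet of the example has weight 0 and carries the empty set, which the definition allows.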

The symbol $\mathcal{D}(m)$ designates the set of divided probabilities of degree m on X, and ${\mathcal{D}}_{X}(m)$ denotes the subset made with probability laws in Q_{X} adapted to a variable X.

The vector space generated by ${\mathcal{D}}_{X}(m)$ will be written ${\mathcal{L}}_{X}(m)$. We put ${\mathcal{L}}_{X}(0)=0$.

We also introduce the subspace $\mathcal{K}(m)$ of ${\mathcal{L}}_{X}(m)$ which is generated by two families of vectors in ${\mathcal{L}}_{X}(m)$:

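First, the vectors of type L, formed from divided probabilities that share the same sequence of laws (P_{1}, …, P_{m}) and the same supports (U_{1}, …, U_{m});

Second, the vectors of type D, formed from pairs of divided probabilities such that, for each index i with p_{i} > 0, we have ${P}_{i}={{P}^{\prime}}_{i}$, and consequently ${U}_{i}={{U}^{\prime}}_{i}$.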

Then we define the space of classes of divided probabilities as the quotient real vector space ${\mathcal{M}}_{X}(m)={\mathcal{L}}_{X}(m)/\mathcal{K}(m)$. In particular M_{X}(0) = 0, and M_{X}(1) is freely generated over ℝ by the elements of **Q**_{X}.

**Lemma 10.** The space ${\mathcal{M}}_{X}(m)$ is freely generated over ℝ by the vectors (∅, …, ∅, P_{i}, ∅, …, ∅) of length m, where at the rank i, P_{i} is an element of **Q**_{X}.

**Proof.** Let D = (p_{1}, P_{1}, U_{1}), …, (p_{m}, P_{m}, U_{m}) be a divided probability; we consider for each i between 1 and m the divided probability D_{i} […]; the vector D_{i} − (∅, …, ∅, P_{i}, ∅, …, ∅) is of type D, thus the particular vectors of Lemma 10 generate ${\mathcal{M}}_{X}(m)$.

Now we prove that, if a linear combination of r of these vectors belongs to ${\mathcal{K}}_{X}(m)$, the coefficients of this combination must all be equal to 0. We proceed by induction on r, the result being evident for r = 1. We can also suppose that at least two of the vectors involved have a non-empty element at the same place, which we can suppose to be i = 1. All vectors with p_{1} = 0 can be replaced by a vector where P_{1} = ∅ using an element of type D in ${\mathcal{K}}_{X}(m)$, so we can assume that at least one of the vectors has p_{1} strictly positive, i.e., equal to 1. Let us consider all these vectors D_{1}, …, D_{s}, for 2 ≤ s ≤ r; their other numbers p_{i}, for i > 1, are zero. The other vectors D_{j}, for j > s, have the coordinate p_{1} equal to zero. Let ∑_{j} λ_{j} D_{j} be the linear combination of length r belonging to ${\mathcal{K}}_{X}(m)$; this vector is a linear combination of vectors of type L and D. We can suppose that every λ_{j} is non-zero. Let us consider an element Q of **Q**_{X} which appears in at least one of the D_{j}, j ≤ s; this Q cannot appear in only one D_{j}, because the sum of the coefficients λ multiplied by the first coordinate p_{1} in front of any given Q in a vector of type L or D is zero. Thus we have at least two D_{j} with the same P_{1}. We can replace the sum of those with λ_{j} positive (resp. negative) by only one special vector of Lemma 10, using a sum of multiples of vectors of type L. Then we are left with the case of two vectors D_{1}, D_{2} having P_{1} = Q and such that λ_{1} + λ_{2} = 0, which means that λ_{1}D_{1} + λ_{2}D_{2} is a multiple of a vector of type D. Subtracting it, we can apply the induction hypothesis and conclude that the linear relation considered is trivial.

As a corollary, an equivalent definition of the spaces ${\mathcal{M}}_{X}(m)$ would be the real vector space freely generated by pairs (P, i) where P ∈ **Q**_{X} and i ∈ [m]. Such a vector, identified with (∅, …, P, …, ∅) in ${\mathcal{L}}_{X}(m)$, where only the place i is non-empty, will be named a simple vector of degree m.

Let S = (S_{1}, …, S_{m}) be a sequence of generalized decompositions in ${\mathbf{S}}_{X}^{*}(A)$, of respective degrees n_{1}, …, n_{m}, with n = n_{1} + … + n_{m}, and let (p, P, U) be an element of ${\mathcal{D}}_{X}(m)$. We define θ((p, P, U), S) as the following divided probability of degree n: if, for i = 1, …, m, the decomposition S_{i} is made of pieces ${E}_{i}^{{j}_{i}}$, where j_{i} varies between 1 and n_{i}, we take for ${p}_{i}^{{j}_{i}}$ the classical probability $\mathbb{P}({E}_{i}^{{j}_{i}}\cap {U}_{i})$; we take for ${P}_{i}^{{j}_{i}}$ the law P_{i} conditioned by the event S_{i} = j_{i}, which corresponds to ${E}_{i}^{{j}_{i}}$; and we take for ${U}_{i}^{{j}_{i}}$ the support of ${P}_{i}^{{j}_{i}}$. Then we order the obtained family of triples ${({p}_{i}^{{j}_{i}},{P}_{i}^{{j}_{i}},{U}_{i}^{{j}_{i}})}_{i=1,\dots ,m;{j}_{i}=1,\dots ,{n}_{i}}$ by the lexicographic ordering. It is easy to verify that the resulting sequence is a divided probability.
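The operation θ just described can be sketched for classical laws on a finite set. This is a minimal illustration, not the authors' implementation: the function name `theta`, the encoding of laws as dictionaries and of decompositions as ordered lists of sets, and the choice of returning `None` (the empty set) on branches of zero weight are all assumptions of the example.

```python
from fractions import Fraction

def theta(divided, decompositions):
    """Refine a divided probability by one ordered partition per component.

    divided:        list of (p_i, P_i), with P_i a dict outcome -> mass (or None)
    decompositions: decompositions[i] is the ordered list of pieces
                    E_i^1, ..., E_i^{n_i}, each piece being a set of outcomes
    Returns the triples (p_i^j, P_i^j, U_i^j) in lexicographic order (i, j).
    """
    result = []
    for (p, law), pieces in zip(divided, decompositions):
        for piece in pieces:
            mass = sum(w for x, w in law.items() if x in piece) if law else 0
            weight = p * mass  # probability of the piece inside the support U_i
            if weight > 0:
                # law P_i conditioned on the piece, and its support
                cond = {x: w / mass for x, w in law.items() if x in piece and w > 0}
                result.append((weight, cond, set(cond)))
            else:
                # branch of zero weight: we choose the empty set (assumption)
                result.append((weight, None, set()))
    return result

# Hypothetical example: degree m = 2, refined into degree n = 2 + 2 = 4.
F = Fraction
divided = [(F(1, 2), {"a": F(1, 3), "b": F(2, 3)}),
           (F(1, 2), {"c": F(1, 4), "d": F(3, 4)})]
decompositions = [[{"a"}, {"b"}], [{"c"}, {"d"}]]
refined = theta(divided, decompositions)
```

In this example the new external law is (1/6, 1/3, 1/8, 3/8), which again sums to one, as required of a divided probability of degree n.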

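Extending by linearity we get a linear map

$${\lambda}_{m}:{\mathcal{L}}_{X}\left(m\right)\otimes {V}_{X}\left({n}_{1}\right)\otimes \dots \otimes {V}_{X}\left({n}_{m}\right)\to {\mathcal{L}}_{X}\left({n}_{1}+\dots +{n}_{m}\right).$$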

By linearity, a vector of type L in ${\mathcal{L}}_{X}\left(m\right)$, tensorized with S_{1} ⊗ … ⊗ S_{m}, goes to a linear combination of vectors of type L in ${\mathcal{L}}_{X}\left(n\right)$. Moreover, if p_{i} = 0 for an index i in [m], all the ${p}_{i}^{{j}_{i}}$ are zero; thus a vector of type D goes to a vector of type D. Then the map λ_{m} sends the subspace ${\mathcal{K}}_{X}\left(m\right)\otimes {V}_{X}\left({n}_{1}\right)\otimes \dots \otimes {V}_{X}\left({n}_{m}\right)$ into the subspace ${\mathcal{K}}_{X}\left({n}_{1}+\dots +{n}_{m}\right)$; thus it defines a linear map

$${\theta}_{m}:{\mathcal{M}}_{X}\left(m\right)\otimes {V}_{X}\left({n}_{1}\right)\otimes \dots \otimes {V}_{X}\left({n}_{m}\right)\to {\mathcal{M}}_{X}\left({n}_{1}+\dots +{n}_{m}\right).$$

On a simple vector (P, i), the operation θ_{m} is independent of the S_{j} for j ≠ i.

Now we introduce the Schur functor ${\mathcal{M}}_{X}$ of symmetric co-invariant spaces ${\mathcal{M}}_{X}(W)={\oplus}_{m}{\mathcal{M}}_{X}\left(m\right){\otimes}_{{\mathcal{S}}_{m}}{W}^{\otimes m}$, from the category of real vector spaces to itself, associated to the $\mathcal{S}$-module ${\mathcal{M}}_{X}^{*}$ (cf. Loday and Valette [10], Fresse [16]) formed by the graded family ${\mathcal{M}}_{X}\left(m\right);m\in \mathbb{N}$.

Then the maps θ_{m} define a natural transformation of functors:

$$\theta :{\mathcal{M}}_{X}\circ {\mathcal{V}}_{X}\to {\mathcal{M}}_{X}.$$

In addition, this set of transformations behaves naturally with respect to X in the information category S. Note that it defines a co-variant functor, not a presheaf.

For simplicity, we will in general write θ, µ, $\mathcal{F}$, $\mathcal{V}$, … and not θ_{X}, µ_{X}, ${\mathcal{F}}_{X}$, ${\mathcal{V}}_{X}$, …, keeping in mind that this is an abuse of notation.

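Then the composite functor $\mathcal{M}\circ \mathcal{V}$ is given by

$$\mathcal{M}\circ \mathcal{V}(W)={\oplus}_{m}\mathcal{M}\left(m\right){\otimes}_{{\mathcal{S}}_{m}}{\left(\mathcal{V}(W)\right)}^{\otimes m}.$$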

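**Proposition 7.** The natural transformation θ defines a right action in the sense of monads, i.e., we have

$$\theta \circ (\theta \mathcal{V})=\theta \circ (\mathcal{M}\mu ),\qquad \theta \circ (\mathcal{M}\eta )={Id}_{\mathcal{M}},$$

where η denotes the unit of the monad $\mathcal{V}$.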

**Proof.** The proof is the same as for proposition 5, by using the associativity of conditioning, and the Bayes identity P (A ∩ B) = P (A|B)P (B).

Ginzburg and Kapranov [56] gave a construction of the (co)bar complex of an operad based on decorated trees. It is a graded complex of operads, with a differential operator of degree −1. The dual construction can be found in Getzler and Jones [55]; it gives a graded complex of co-operads with a differential operator of degree +1. The link with quasi-free co-operads and operads (Quillen’s construction) is developed by Fresse (in “partitions” [16]); in this article Fresse also shows that these constructions correspond to the simplicial bar construction for the monads (Mac Lane) and to the natural notions of derived functors in this context.

In our case, with two right modules, the easiest way is to use the bar construction of Beck (1967) [19], made more explicit by Fresse with decorated trees in the case of monads coming from operads.

A morphism from a right module $\mathcal{M}$ over $\mathcal{V}$ to a right module $\mathcal{R}$ over $\mathcal{V}$ is a natural transformation f from the first functor to the second such that $f\circ {\theta}_{M}={\theta}_{R}\circ f\mathcal{V}$.

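In what follows we will use the module $\mathcal{R}$ which comes from the functor of symmetric powers:

$$\mathcal{R}(W)={\oplus}_{m}\mathcal{R}\left(m\right){\otimes}_{{\mathcal{S}}_{m}}{W}^{\otimes m},\qquad \mathcal{R}\left(m\right)=\mathbb{R}.$$

The right action of ${\mathcal{V}}_{X}$ is given by the map

$$\mathcal{R}\left(m\right)\otimes {V}_{X}\left({n}_{1}\right)\otimes \dots \otimes {V}_{X}\left({n}_{m}\right)\to \mathcal{R}\left(n\right),\qquad n={n}_{1}+\dots +{n}_{m},$$

which sends 1 ⊗ (S_{1}, …, S_{m}) to 1 in $\mathcal{R}\left(n\right)=\mathbb{R}$.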

The axioms of a right module are easy to verify.

This $\mathcal{V}$-module $\mathcal{R}$ will play the dual role of the trivial module in the case of information structure co-homology.

Following Beck (Triples, Algebras, Cohomology, 1967, 2002 [19]), we consider the simplicial bar complex ${\mathcal{M}}_{X}\circ {\mathcal{V}}_{X}^{*}$ extending the right module $\mathcal{M}$ on $\mathcal{V}$ by the sequence of modules $\dots \to {\mathcal{M}}_{X}\circ {\mathcal{V}}_{X}^{\circ (k+1)}\to {\mathcal{M}}_{X}\circ {\mathcal{V}}_{X}^{\circ k}\to \dots $. Then we introduce the increasing complex ${C}^{\ast}\left({\mathcal{M}}_{X}\right)$ of measurable morphisms from ${\mathcal{M}}_{X}\circ {\mathcal{V}}_{X}^{*}$ to the symmetric right module $\mathcal{R}$.

For a given k ≥ 0, a morphism F from ${\mathcal{M}}_{X}\circ {\mathcal{V}}_{X}^{\circ k}$ to R is defined by a family of maps F (N) : ${\mathcal{M}}_{X}\circ {\mathcal{V}}_{X}^{\circ k}(N)\to \mathcal{R}(N)=\mathbb{R}$ for N ∈ ℕ.

This gives a family of measurable numerical functions of a divided probability law (p, P, U), of degree m ≤ N, indexed by forests having m component trees of height k and a total number of ending branches equal to N.

We denote such a family of functions by the symbol F_{X}(S_{1}; S_{2}; …; S_{k}; (p, P, U)), indexed by X in S, where S_{1}; …; S_{k} here designates the sets of decompositions present in the trees at each level from 1 to k.

First we remark that the compatibility with the right action of ${\mathcal{V}}_{X}$ imposes that, for any allowed set of variables S_{k+1}, we must have

By taking for S_{k} the collection (π_{0}, …, π_{0}), we deduce that F_{X} is independent of the last variable.

This has the effect of decreasing the degree in k by one, in order to respect the preceding conventions on information cochains; i.e., we set ${\mathcal{C}}^{k}({\mathcal{M}}_{X})=Hom({\mathcal{M}}_{X}\circ {\mathcal{V}}^{\circ (k+1)},\mathcal{R})$.

Secondly, as we are working with the quotient of the space generated by divided probabilities (p, P, U) by the space generated by linearity relations on the external law p, for (p, P, U) of degree m, we have

Moreover, from the definition of θ and the rule of composition of functors, for any m ≥ 1 and i ∈ [m], and any simple vector (Q, i, m), the value of F on any forest depends only on the tree component of index i; that we can summarize by the following identity:

**Definition 17.** An element of ${\mathcal{C}}^{k}\left({\mathcal{M}}_{X}\right)$ is said to be regular when, for each degree m and each index i between 1 and m, we have, for each ordered forest S_{1}; S_{2}; …; S_{k} of m trees and each probability Q,

Due to Equation (150), it follows that regular elements are determined by their values on trees and on ordinary, not divided, probabilities.

The adjective regular can be better interpreted as “local in the sense of observation trees”.

The vector space ${\mathcal{C}}_{X}^{k}\left(N\right)$ is generated by families of functions of divided probabilities F_{X}(S_{1}; S_{2}; …; S_{k}; (p, P, U)), indexed by X in S and forests S_{1}; …; S_{k} of level k. These families are supposed to be local with respect to X, which means that they are compatible with the direct image of probabilities under observables in S^{∗}.

**Remark 12.** As we showed in the static case, in the classical context, locality is equivalent to the fact that the values of the functions depend on ℙ through the direct images of ℙ by the joint of all the ordered observables which decorate the tree (the joint of the joints along branches); but this is not necessarily true in the quantum context, where it depends on **Q**. However it is true for **Q**^{min}, in particular **Q**^{can} which is the most natural choice.

The spaces ${\mathcal{C}}^{k}\left({M}_{X}\right)$ form a natural degree one complex:

The faces ${\delta}_{i}^{(k)};1\le i\le k$ are given by applying µ on $\mathcal{V}\circ \mathcal{V}$ at the places (i, i + 1); the last face ${\delta}_{k+1}^{(k)}$ consists in forgetting the last functor, the operation denoted by ϵ; and the zero face is given by the action θ. Then the boundary δ^{(k)} is the alternate sum of the operators ${\delta}_{i}^{\left(k\right)};0\le i\le k+1$: if F is a measurable morphism from $\mathcal{M}\circ {\mathcal{V}}^{\circ k}$ to ℝ, then

$${\delta}^{(k)}F=\sum _{i=0}^{k+1}{(-1)}^{i}F\circ {\delta}_{i}^{(k)}.$$

The zero face in the complex
${\mathcal{C}}_{X}^{*}$ corresponds to the right action of the monad V_{X} on divided probabilities; on regular cochains it is expressed by a generalization of the formula (20): if (P, i, m) is a simple vector of degree m and S_{0}; S_{1}; …; S_{k} a forest of level k + 1, with m component trees, then

where the tree considered is the one grafted on the branch j_{i} of the variable S_{0,i} at the place i in the collection S_{0}.

The formula (154) is compatible with the existence of dead branches.

Note that natural integers come into play under two different aspects: m is the internal monadic degree and counts the number of components, or the length of partitions; k is the height of the trees in the forest. The number k gives the degree in co-homology.

The coboundary δ of ${\mathcal{C}}^{\ast}$ is of degree +1 with respect to k and degree 0 with respect to m. For any m ∈ ℕ, the operator δ has the formula of the coboundary given by the simplicial structure associated to θ and µ:

We observe that locality is preserved by δ.

**Lemma 11.** If the transformation F is regular, then δF is regular; in other terms, the regular elements form a sub-complex ${\mathcal{C}}_{r}^{*}\left({\mathcal{M}}_{X}\right)$.

**Proof.** Let (P, i, m) be a simple vector and S_{0}; …; S_{k} a forest with m components; let us denote by
${S}_{0}^{j}$ the variable number j having degree n_{j}, and n = n_{1} + … + n_{m}; we have

The first term on the right is a combination of the image of F for the n simple vectors
$P.{S}_{0}^{i,{j}_{i}}$ of degree n = n_{1} + … + n_{m} which result from the division of (P, i, m) by
${S}_{0}^{i}$. If F is regular, this combination is the same as the combination of the simple vectors of degree n_{i} constituting the division of (P, i, m) by
${S}_{0}^{i}$, which gives the same result as the first term on the right in the formula

If F is regular, the term number l > 1 on the right of Equation (156) coincides with the corresponding term on the right of Equation (157).

Therefore the left-hand side of Equation (156) coincides with the left-hand side of Equation (157), which establishes the lemma.

We define ${\mathcal{C}}_{r}^{*}\left({\mathcal{M}}_{X}\right)$ as the sub-complex of regular vectors in ${\mathcal{C}}^{\ast}\left({\mathcal{M}}_{X}\right)$. Its elements are named tree information cochains or arborescent information cochains.

By definition, the tree information co-homology is the homology of this regular complex, considered as a sheaf of complexes over the category S(A), i.e., a contravariant functor. This corresponds to the topos information co-homology in the monadic context.

To recover the case of the ordinary algebra of partitions, and the formulas of the bar construction in the first sections of this article, we have to take the special case where all the decompositions of the same level coincide at every level of the forests. In this case, we can replace the quotient ${\mathcal{M}}_{X}$ by the modules of conditioning by a redefinition of the action on functions ${\mathcal{F}}_{X}$. However the notion of divided probabilities for observation trees and the definition of co-homology in the monadic context can be seen as the natural basis of information co-homology.

When k = 0, in the classical case, a cochain is a function f(ℙ); the locality condition says that it is a constant, and in this case it is a cocycle, because the fact that the sum of probabilities equals one implies f(ℙ) = f_{S}(ℙ). Then ${H}_{\tau}^{0}$ has dimension one.

When k = 0, in the quantum case, the spectral functions of ρ in the **Q**_{X} give invariant information co-chains. Among them the Von Neumann entropy is especially relevant because its co-boundary gives the classical entropy. However, only the constant function is an invariant zero degree co-cycle. Thus again ${H}_{U}^{0}$ has dimension one.

For k = 1, a cochain is given by a function F_{X}(S; P) such that, each time we have X → Y → S and the elements of Y refine S, we have F_{X}(S; P) = F_{Y}(S; Y_{∗}P). It is a cocycle when, for every collection S_{1}, …, S_{m} of m observables, where m is the length of S, we have

$${F}_{X}({\mu}_{m}(S,({S}_{1},\dots ,{S}_{m}));P)={F}_{X}(S;P)+\sum _{i=1}^{m}P(S=i)\,{F}_{X}({S}_{i};P|(S=i)).$$

Note that the partition µ_{m}(S, (S_{1}, …, S_{m})) is not the joint of S and the S_{i} for i ≥ 1, except when all the S_{i} coincide. It is thus remarkable that the ordinary entropy also satisfies this functional equation, which is finer than Shannon’s identity:

**Proposition 8.** The usual entropy H(S_{∗}ℙ) = H(S; ℙ) is an arborescent co-cycle.

**Proof.** By linearity on the module of divided probabilities ${\mathcal{M}}_{X}$, we can decompose the probability ℙ into the conditional probabilities ℙ|(S = s); thus we can restrict the proof of the proposition to the case where S = π_{0} is the trivial partition, i.e., m = 1.

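Let X_{i}; i = 1, …, m denote the elements of the partition associated to S_{0} and ${X}_{i}^{j};j=1,\dots ,{n}_{i}$ the pieces of the intersection of X_{i} with the elements of the partition associated to S_{i}; denote by p_{i} the probability of the event X_{i} and by ${p}_{i}^{j}$ the probability of the event ${X}_{i}^{j}$; since the ${p}_{i}^{j}$ sum to p_{i} over j, we have

$$-\sum _{i=1}^{m}\sum _{j=1}^{{n}_{i}}{p}_{i}^{j}\mathrm{log}\,{p}_{i}^{j}=-\sum _{i=1}^{m}{p}_{i}\mathrm{log}\,{p}_{i}-\sum _{i=1}^{m}{p}_{i}\sum _{j=1}^{{n}_{i}}\frac{{p}_{i}^{j}}{{p}_{i}}\mathrm{log}\frac{{p}_{i}^{j}}{{p}_{i}}.$$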

Q.E.D.
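The computation in this proof can be checked numerically. The sketch below is only an illustration under stated assumptions: the natural logarithm is used (the identity does not depend on the base), the names `entropy` and `tree_entropy_gap` are hypothetical, and the inputs are the masses p_i of the root pieces X_i together with the conditional masses p_i^j / p_i of the pieces X_i^j inside each X_i.

```python
import math

def entropy(probs):
    """Shannon entropy (natural logarithm) of a finite list of masses."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def tree_entropy_gap(root, branches):
    """Difference between the two sides of the tree identity for entropy.

    root:     masses p_i of the pieces X_i of the root partition
    branches: branches[i] lists the conditional masses p_i^j / p_i of the
              pieces X_i^j inside X_i (each list sums to one)
    """
    # masses p_i^j of the leaves of the tree
    leaves = [p * q for p, qs in zip(root, branches) for q in qs]
    lhs = entropy(leaves)
    # entropy of the root plus the averaged entropies of the grafted partitions
    rhs = entropy(root) + sum(p * entropy(qs) for p, qs in zip(root, branches))
    return lhs - rhs

# Hypothetical example: three root pieces refined by partitions of lengths 2, 3, 1.
gap = tree_entropy_gap([0.5, 0.3, 0.2], [[0.5, 0.5], [0.1, 0.2, 0.7], [1.0]])
```

Each root piece carries its own partition here, which is exactly the situation where the grafted partition differs from a joint of variables; the gap nevertheless vanishes up to floating-point rounding.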

This identity was discovered by Faddeev, Baez, Fritz and Leinster; see [20]. However, we propose that information homology explains its significance.
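The remark made before Proposition 8, that the partition µ_{m}(S, (S_{1}, …, S_{m})) is in general not the joint of S and the S_{i}, can be illustrated on a four-point set. The following sketch is an assumption-laden toy model: partitions are encoded as lists of Python sets, the names `graft` and `joint` are hypothetical, and empty intersections are simply dropped rather than kept as dead branches.

```python
from itertools import product

def graft(root, branches):
    """Tree partition: each root piece X_i is cut by its own partition S_i only."""
    return [x & e for x, part in zip(root, branches) for e in part if x & e]

def joint(*partitions):
    """Joint of several partitions of the same set (all non-empty intersections)."""
    pieces = []
    for combo in product(*partitions):
        cell = set.intersection(*combo)
        if cell:
            pieces.append(cell)
    return pieces

def as_set(pieces):
    """Forget the ordering, to compare partitions as sets of pieces."""
    return {frozenset(p) for p in pieces}

omega = {1, 2, 3, 4}
root = [{1, 2}, {3, 4}]

# Different partitions on the two branches: the tree and the joint differ.
mixed = [[omega], [{1}, {2, 3, 4}]]
tree_mixed = graft(root, mixed)            # pieces {1, 2} and {3, 4}
joint_mixed = joint(root, *mixed)          # pieces {1}, {2} and {3, 4}

# The same partition T on every branch: the tree coincides with the joint.
T = [{1, 3}, {2, 4}]
tree_equal = graft(root, [T, T])
joint_equal = joint(root, T, T)
```

In the first case the joint is strictly finer than the tree partition, because the partition attached to the second branch also cuts the first root piece; in the second case both constructions give the four singletons.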

When the category of quantum information **S**, the set A and the probability functor **Q** are invariant under the unitary group, and if we choose a classical full subcategory $\mathcal{S}$, there is a trace map from **Q** to $\mathcal{Q}$, which induces a morphism from the classical arborescent co-homology of $\mathcal{S}$, A and $\mathcal{Q}$ to the invariant quantum arborescent co-homology of **S**, A and **Q**.

As a corollary of Lemma 10 and Theorems 1 and 3, we obtain the following result:

**Theorem 4.** (i) both in the classical and the invariant quantum context, if S(A) is connected, sufficiently rich, and if Q is canonical, every 1-co-cycle is co-homologous to the entropy of Shannon; (ii) in the classical case H^{1}(
$\mathcal{S}$, A,
$\mathcal{Q}$) is the vector space of dimension 1 generated by the entropy; (iii) in the quantum case
${H}_{U}^{1}\left(\mathbf{S},A,\mathbf{Q}\right)=0$, and the only invariant 0-cochain which has for co-boundary the Shannon entropy is (minus) the Von-Neumann entropy.

#### 6.4. Arborescent Mutual Information

For k = 2, a cochain is given by a local function of a probability and a rooted decorated tree of level 2. It is a cocycle when the following functional equation is satisfied

where T is a collection of variables T_{1}, …, T_{m} of respective lengths n_{1}, …, n_{m} and U a collection of variables ${U}_{i,j}^{k}$ of respective lengths n_{i,j}, with i going from 1 to m, j going from 1 to n_{i} and k going from 1 to n_{i,j}; the notation U_{i} denotes the collection of variables ${U}_{i,j}^{k}$ of index i.

Our aim is to extend in the monadic context the topological action of the ordinary information structure on functions of probability used in the discussion of mutual information.

For that, we define another structure of
${\mathcal{V}}_{X}$ -right module on the functor
${\mathcal{M}}_{X}$ associated to probabilities, by defining the following map θ_{t}(m) from
${\mathcal{M}}_{X}\left(m\right)$ tensorized with V_{X}(n_{1})⊗…⊗V_{X}(n_{m}) to
${\mathcal{M}}_{X}\left(n\right)$, for n = n_{1} + … + n_{m}:

Remark that the generalized decompositions S_{j} are used only through the orders on their elements.

As for
$\mathcal{R}$, it is easy to verify that the collection of maps θ_{t}(m) defines a right action of the monad V_{X} on the Schur functor
${\mathcal{M}}_{X}$.

Then we consider, as before, the graded vector space ${C}^{\ast}\left({\mathcal{M}}_{X}\right)$ of homomorphisms of $\mathcal{V}$-modules from the functors $\mathcal{M}\circ {\mathcal{V}}^{\circ k};k\ge 0$ to the functor $\mathcal{R}$ which are measurable in the probabilities P. As before, on ${C}^{\ast}\left({\mathcal{M}}_{X}\right)$, we shift the degree by one, because of the independence with respect to the last stage of the forest, which follows from the trivial action on $\mathcal{R}$.

The topological coboundary operator δ_{t} is defined in every degree by the formula of the simplicial bar construction, as in Equation (153) for δ, but with θ_{t} replacing θ. It corresponds to the usual simplicial complex of the family
${\mathcal{V}}^{\circ k}$. A cochain is represented by a family of functions of probability laws F_{X}(S_{1}; …; S_{k}; (P, i, m)), where S_{1}; …; S_{k} denotes a forest with m trees of level k. The operator δ_{t} is given by

where n = n_{1} + … + n_{m} is the sum of the numbers of branches of the generalized decompositions ${S}_{0}^{i}$ for i = 1, …, m.

As for δ, a value F(S_{1}; …; S_{k}; (P, j, n)) depends only on the tree ${S}_{1}^{j};\dots ;{S}_{k}^{j}$ rooted at the place numbered by j in the forest S_{1}; …; S_{k}.

**Lemma 12.** The coboundary δ_{t} sends a regular cochain to a regular cochain.

**Proof.** Consider a simple vector (P, i, m) in $\mathcal{M}$_{X}(m) and a forest S_{0}; …; S_{k} with m components; we denote by
${S}_{0}^{j}$ the variable number j having degree n_{j}, and n = n_{1} + … + n_{m}, and we consider the formula (167).

If F is regular the first term on the right is the sum of the images by F for P and the n trees
${S}_{1}^{i,{j}_{i}}$ which result from the forgetting of the first branches
${S}_{0}^{j}$, and the other terms on the right are equal to the value of F for P and the tree rooted at i in S_{0}. On the other side for the tree
${S}_{0}^{i};\dots ;{S}_{k}^{i}$, if F is regular, we have

Thus δ_{t}F is regular.

Consequently we can restrict δ_{t} to the subcomplex
${\mathcal{C}}_{r}^{*}\left({N}_{X}\right)$, and name its homology the arborescent, or tree, topological information co-homology, written H_{τ,t}^{∗}(S^{∗}, A, Q).

We now propose to extend the notion of mutual information I(X; Y; ℙ) in such a way that it becomes a cocycle for this co-homology, as was the case for the Shannon mutual information in the ordinary topological information complex. We propose to adopt the formulas using δ and δ_{t}, as in the standard case:

**Definition 18.** Let H(T; (P, i, m)) denote the regular extension to forests of the usual entropy; then the mutual arborescent information between a partition S of length m and a collection T of m partitions T_{1}, …, T_{m} is defined by

$${I}_{\alpha}(S;T;\mathbb{P})={\delta}_{t}H(S;T;\mathbb{P}).$$

The identity δH = 0 implies

In the particular case where all the T_{i} are equal to a variable T, it gives

For $\mathcal{S}(A)$, the function I_{α} is an arborescent topological 2-cocycle.

It satisfies Equation (165), where ℙ replaces the conditional probabilities ℙ|(S = i) and where the factors ℙ(S = i) disappear. Remark that, in this manner, maximization of I_{α}(S; T; ℙ) entails maximization of the usual mutual information I(S; T; ℙ) and of the unconditioned entropies H(T_{i}; ℙ).

Pursuing the homological interpretation of higher mutual information quantities given by the Formulas (55) and (56), we suggest the following definition:

**Definition 19.** The mutual arborescent informations of higher orders are given by I_{α,N} = −(δδ_{t})^{M}H for N = 2M + 1 odd and by I_{α,N} = δ_{t}(δδ_{t})^{M} H for N = 2M + 2 even.

## Acknowledgments

We thank MaxEnt14 for the opportunity to present this research to the information science community. We thank Guillaume Marrelec for discussions and notably for his participation in the research of the last part on optimal discrimination. We thank Frederic Barbaresco, Alain Chenciner, Alain Proute and Juan-Pablo Vigneaux for discussions and comments on the manuscript. We thank the "Institut des Systemes complexes" (ISC-PIF), region Ile-de-France, and the Max Planck Institute for Mathematics in the Sciences for the financial support and hosting of P. Baudot.

## Author Contributions

Both authors contributed equally to the research; the second author wrote the manuscript. Both authors have read and approved the final manuscript.

## Conflicts of Interest

The authors declare no conflict of interest.

## References

1. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
2. Kolmogorov, A. Combinatorial foundations of information theory and the calculus of probabilities. Russ. Math. Surv. **1983**, 38.
3. Thom, R. Stabilité structurelle et morphogénèse, 2nd ed.; Dunod: Paris, France, 1977; in French.
4. Mac Lane, S. Categories for the Working Mathematician; Springer: Berlin/Heidelberg, Germany, 1998.
5. Mac Lane, S. Homology; Springer: Berlin/Heidelberg, Germany, 1975.
6. Hu, K.T. On the Amount of Information. Theory Probab. Appl. **1962**, 7, 439–447.
7. Baudot, P.; Bennequin, D. Information Topology I. in preparation.
8. Elbaz-Vincent, P.; Gangl, H. On poly(ana)logs I. Compos. Math. **2002**, 130, 161–214.
9. Cathelineau, J. Sur l’homologie de sl2 a coefficients dans l’action adjointe. Math. Scand. **1988**, 63, 51–86.
10. Loday, J.L.; Valette, B. Algebraic Operads; Springer: Berlin/Heidelberg, Germany, 2012.
11. Matsuda, H. Information theoretic characterization of frustrated systems. Physica A **2001**, 294, 180–190.
12. Brenner, N.; Strong, S.; Koberle, R.; Bialek, W. Synergy in a Neural Code. Neural Comput. **2000**, 12, 1531–1552.
13. Nielsen, M.; Chuang, I. Quantum Computation and Quantum Information; Cambridge University Press: Cambridge, UK, 2000.
14. Baudot, P.; Bennequin, D. Topological forms of information. AIP Conf. Proc. **2015**, 1641, 213–221.
15. Baudot, P.; Bennequin, D. Information Topology II. in preparation.
16. Fresse, B. Koszul duality of operads and homology of partition posets. Contemp. Math. Am. Math. Soc. **2004**, 346, 115–215.
17. May, J.P. The Geometry of Iterated Loop Spaces; Springer: Berlin/Heidelberg, Germany, 1972.
18. May, J.P. E∞ Ring Spaces and E∞ Ring Spectra; Springer: Berlin/Heidelberg, Germany, 1977.
19. Beck, J. Triples, Algebras and Cohomology. Ph.D. Thesis, Columbia University, New York, NY, USA, 1967.
20. Baez, J.; Fritz, T.; Leinster, T. A Characterization of Entropy in Terms of Information Loss. Entropy **2011**, 13, 1945–1957.
21. Marcolli, M.; Thorngren, R. Thermodynamic Semirings. **2011**, arXiv.
22. Baudot, P.; Bennequin, D. Information Topology III. in preparation.
23. Gromov, M. In a Search for a Structure, Part 1: On Entropy, 2013. Available online: http://www.ihes.fr/gromov/PDF/structre-serch-entropy-july5-2012.pdf (accessed on 6 May 2015).
24. Watkinson, J.; Liang, K.; Wang, X.; Zheng, T.; Anastassiou, D. Inference of Regulatory Gene Interactions from Expression Data Using Three-Way Mutual Information. Chall. Syst. Biol. Ann. N.Y. Acad. Sci. **2009**, 1158, 302–313.
25. Kim, H.; Watkinson, J.; Varadan, V.; Anastassiou, D. Multi-cancer computational analysis reveals invasion-associated variant of desmoplastic reaction involving INHBA, THBS2 and COL11A1. BMC Med. Genomics **2010**, 3.
26. Uda, S.; Saito, T.H.; Kudo, T.; Kokaji, T.; Tsuchiya, T.; Kubota, H.; Komori, Y.; ichi Ozaki, Y.; Kuroda, S. Robustness and Compensation of Information Transmission of Signaling Pathways. Science **2013**, 341, 558–561.
27. Han, T.S. Linear dependence structure of the entropy space. Inf. Control **1975**, 29, 337–368.
28. McGill, W. Multivariate information transmission. Psychometrika **1954**, 19, 97–116.
29. Kolmogorov, A.N. Grundbegriffe der Wahrscheinlichkeitsrechnung; Springer: Berlin/Heidelberg, Germany, 1933; in German.
30. Artin, M.; Grothendieck, A.; Verdier, J. Théorie des topos et cohomologie étale des schémas (SGA 4), Tome I, II, III; Springer: Berlin/Heidelberg, Germany; in French.
31. Grothendieck, A. Sur quelques points d’algèbre homologique, I. Tohoku Math. J. **1957**, 9, 119–221.
32. Gabriel, P. Objets injectifs dans les catégories abéliennes. Séminaire Dubreil. Algèbre et théorie des nombres 12, 1–32.
33. Bourbaki, N. Algèbre, chapitre 10, Algèbre homologique; Masson: Paris, France, 1980; in French.
34. Cartan, H.; Eilenberg, S. Homological Algebra; The Princeton University Press: Princeton, NJ, USA, 1956.
35. Tverberg, H. A new derivation of information function. Math. Scand. **1958**, 6, 297–298.
36. Kendall, D. Functional Equations in Information Theory. Z. Wahrscheinlichkeitstheorie **1964**, 2, 225–229.
37. Lee, P. On the Axioms of Information Theory. Ann. Math. Stat. **1964**, 35, 415–418.
38. Kontsevich, M. The 1+1/2 logarithm. Unpublished note, 1995. Reproduced in Elbaz-Vincent & Gangl, On poly(ana)logs I, Compositio Mathematica, 2002. e-print math.KT/0008089.
39. Khinchin, A. Mathematical Foundations of Information Theory; Silverman, R.A., Friedman, M.D., Translators; from two Russian articles in Uspekhi Matematicheskikh Nauk; Dover: New York, NY, USA, 1957; pp. 17–75.
40. Yeung, R. Information Theory and Network Coding; Springer: Berlin/Heidelberg, Germany, 2007.
41. Cover, T.M.; Thomas, J. Elements of Information Theory; Wiley: Weinheim, Germany, 1991.
42. Rindler, W.; Penrose, R. Spinors and Spacetime, 2nd ed.; Cambridge University Press: Cambridge, UK, 1986.
43. Landau, L.D.; Lifshitz, E.M. Fluid Mechanics, 2nd ed.; Volume 6 of a Course of Theoretical Physics; Pergamon Press, 1959.
44. Balian, R. Emergences in Quantum Measurement Processes. KronoScope **2013**, 13, 85–95.
45. Borel, A.; Ji, L. Compactifications of Symmetric and Locally Symmetric Spaces. In Unitary Representations and Compactifications of Symmetric Spaces; Springer: Berlin/Heidelberg, Germany, 2006.
46. Doering, A.; Isham, C. Classical and quantum probabilities as truth values. J. Math. Phys. **2012**, 53.
47. Meyer, P. Quantum Probability for Probabilists; Springer: Berlin, Germany, 1993.
48. Souriau, J. Structure des Systemes Dynamiques; Jacques Gabay: Paris, France, 1970; in French.
49. Catren, G. Towards a Group-Theoretical Interpretation of Mechanics. Philos. Sci. Arch. 2013. http://philsci-archive.pitt.edu/10116/.
50. Bachet, Claude-Gaspar. Problèmes plaisans et délectables, qui se font par les nombres; A. Blanchard: Paris, France, 1993; p. 1612, in French.
51. Jaynes, E.T. Information Theory and Statistical Mechanics. In Statistical Physics; Ford, K., Ed.; Benjamin: New York, NY, USA, 1963; p. 181.
52. Jaynes, E.T. Prior Probabilities. IEEE Trans. Syst. Sci. Cybern. **1968**, 4, 227–241.
53. Cohen, F.; Lada, T.; May, J. The Homology of Iterated Loop Spaces; Springer: Berlin, Germany, 1976.
54. Prouté, A. Introduction à la Logique Catégorique, 2013. Available online: www.logique.jussieu.fr/~alp/ (accessed on 6 May 2015).
55. Getzler, E.; Jones, J.D.S. Operads, homotopy algebra and iterated integrals for double loop spaces. **1994**, arXiv: hep-th/9403055v1.
56. Ginzburg, V.; Kapranov, M.M. Koszul duality for operads. Duke Math. J. **1994**, 76, 203–272.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).