# The Homological Nature of Entropy


## Abstract


## 1. Introduction

#### 1.1. What is Information?

_{x}P(X = x) ln_{2} P(X = x)), and for Galois it is the quotient set IGal(L_{1}; L_{2}|K) = (Gal(L_{1}|K) × Gal(L_{2}|K))/Gal(L|K), where L_{1}, L_{2} are two fields containing a field K in an algebraic closure Ω of K, where L is the field generated by L_{1} and L_{2} in Ω, and where Gal(L_{i}|K) (for i = ∅, 1, 2) denotes the group introduced by Galois, made by the field automorphisms of L_{i} fixing the elements of K.

#### 1.2. Information Homology

_{x} of Ω defined by the equations X(ω) = x); the join r.v. YZ, also denoted by (Y, Z), corresponds to the least fine partition that is finer than both Y and Z. This defines a monoid structure on the set Π(Ω) of partitions of Ω, with 1 as a unit, and where each element is idempotent, i.e., ∀X, XX = X. An information category is a set $\mathcal{S}$ of r.v. such that, for any $Y,Z\in \mathcal{S}$ less fine than $U\in \mathcal{S}$, the join YZ belongs to $\mathcal{S}$, cf. [7]. An ordering on $\mathcal{S}$ is given by Y ≤ Z when Z refines Y, which also defines the morphisms Z → Y in the category $\mathcal{S}$. In what follows we always assume that 1 belongs to $\mathcal{S}$. The simplex ∆(Ω) is defined as the set of families of numbers {p_{ω}; ω ∊ Ω}, such that ∀ω, 0 ≤ p_{ω} ≤ 1 and Σ_{ω} p_{ω} = 1; it parameterizes all probability laws on Ω. We choose a simplicial sub-complex $\mathcal{P}$ in ∆(Ω), which is stable under all the conditioning operations by elements of $\mathcal{S}$. By definition, for N ∊ ℕ, an information N-cochain is a family of measurable functions of $P\in \mathcal{P}$, with values in ℝ or ℂ, indexed by the sequences (S_{1};…;S_{N}) in $\mathcal{S}$ bounded above by an element of $\mathcal{S}$, whose values depend only on the image law (S_{1}, …, S_{N})_{*}P. This condition is natural from a topos point of view, cf. [4]; we interpret it as a “locality” condition. Note that we write (S_{1}; …; S_{N}) for a sequence, because (S_{1}, …, S_{N}) designates the joint variable. For N = 0 this gives only the constants. We denote by ${\mathcal{C}}^{N}$ the vector space of N-cochains of information. The following formula corresponds to the averaged conditioning of Shannon [1]:

_{0}, and the vertical bar is ordinary conditioning. It satisfies the associativity condition $\left({{S}^{\prime}}_{0}{S}_{0}\right).F={{S}^{\prime}}_{0}.\left({S}_{0}.F\right)$.

_{t} (t for twisted or trivial action or topological complex), that is defined by the above formula with the first term S_{0}.F(S_{1};…;S_{N}; ℙ) replaced by F(S_{1};…;S_{N}; ℙ). The corresponding co-cycles are defined by the equations δF = 0 or δ_{t}F = 0, respectively. We easily verify that δ ○ δ = 0 and δ_{t} ○ δ_{t} = 0; then co-homology ${H}^{*}\left(\mathcal{S};\mathbb{P}\right)$ resp. ${H}_{t}^{*}\left(\mathcal{S};\mathbb{P}\right)$ is defined by taking co-cycles modulo the elements of the image of δ resp. δ_{t}, called co-boundaries. The fact that classical entropy H(X; ℙ) = −Σ_{i} p_{i} log_{2} p_{i} is a 1-co-cycle is the fundamental equation H(X, Y) = H(X) + X.H(Y).
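The fundamental equation H(X, Y) = H(X) + X.H(Y) can be checked numerically. The following is a minimal sketch, assuming that the averaged conditioning X.F is the mean of F over the laws conditioned by the values of X, weighted by the probabilities of these values; the law `P` and all function names are hypothetical choices made for illustration.

```python
import math

def image_law(P, var):
    """Image law var_* P of a finite law P (dict: point -> probability)."""
    law = {}
    for w, p in P.items():
        law[var(w)] = law.get(var(w), 0.0) + p
    return law

def H(var, P):
    """Shannon entropy (in bits) of the variable var under the law P."""
    return -sum(p * math.log2(p) for p in image_law(P, var).values() if p > 0)

def condition(P, var, value):
    """Conditioned law P|(var = value); assumes P(var = value) > 0."""
    mass = sum(p for w, p in P.items() if var(w) == value)
    return {w: p / mass for w, p in P.items() if var(w) == value}

def averaged_conditioning(var0, F, P):
    """(var0.F)(P) = sum over x of P(var0 = x) * F(P|(var0 = x))  (assumed form)."""
    return sum(px * F(condition(P, var0, x))
               for x, px in image_law(P, var0).items() if px > 0)

# Hypothetical law on Omega = {0, 1} x {0, 1, 2}.
P = {(0, 0): 0.10, (0, 1): 0.25, (0, 2): 0.05,
     (1, 0): 0.30, (1, 1): 0.10, (1, 2): 0.20}
X = lambda w: w[0]
Y = lambda w: w[1]
XY = lambda w: w          # the joint variable (X, Y)

lhs = H(XY, P)
rhs = H(X, P) + averaged_conditioning(X, lambda Q: H(Y, Q), P)
print(lhs, rhs)           # the two values agree up to rounding
```

Replacing `H` by another function of the image law generally breaks the equality, which is the content of the co-cycle condition.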

**Theorem A.** (cf. Theorem 1 section 2.3, [7]): For the full simplex ∆(Ω), and if $\mathcal{S}$ is the monoid generated by a set of at least two variables, such that each pair takes at least four values, then the information co-homology space of degree one is one-dimensional and generated by the classical entropy.

**Problem 1.** Compute the homology of higher degrees.

_{2}over ℂ with coefficients in the adjoint action (cf. [9]).

_{I} denotes the join of the S_{i} such that i ∊ I. We have I_{1} = H and I_{2} = I is the usual mutual information: I(S; T) = H(S) + H(T) − H(S, T).

**Theorem B.** (cf. section 3, [7]): I_{2m} = δ_{t}δδ_{t}…δδ_{t}H, I_{2m+1} = −δδ_{t}δδ_{t}…δδ_{t}H, where there are m − 1 δ and m δ_{t} factors for I_{2m} and m δ and m δ_{t} factors for I_{2m+1}.

_{t}.

^{c}. In special cases we can interpret I_{N} as homotopical algebraic invariants. For instance for N = 3, suppose that I(X; Y) = I(Y; Z) = I(Z; X) = 0, then I_{3}(X; Y; Z) = −I((X,Y); Z) can be defined as a Milnor invariant for links, generalized by Massey, as they are presented in [10] (cf. page 284), through the 3-ary obstruction to associativity of products in a subcomplex of a differential algebra, cf. [7]. The absolute minima of I_{3} correspond to Borromean links, interpreted as synergy, cf. [11,12].
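A standard illustration of a negative I_{3} is given by two independent fair bits X, Y and their sum modulo two Z. The sketch below assumes that I_{N} is the alternating sum of the entropies H(S_{I}) over the non-empty subsets I, with sign (−1)^{|I|+1}, which agrees with I_{1} = H and I_{2} = I above; the function names are hypothetical.

```python
import math
from itertools import combinations

def joint_entropy(law, idx):
    """Entropy (in bits) of the joint variable made of the coordinates in idx."""
    marginal = {}
    for w, p in law.items():
        key = tuple(w[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)

def multivariate_information(law, n):
    """Assumed definition: I_n = sum over non-empty I of (-1)^(|I|+1) H(S_I)."""
    total = 0.0
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            total += (-1) ** (k + 1) * joint_entropy(law, idx)
    return total

# X, Y independent fair bits, Z = X xor Y.
xor_law = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}

# Pairwise mutual informations I(X;Y), I(Y;Z), I(Z;X).
pairwise = [joint_entropy(xor_law, (i,)) + joint_entropy(xor_law, (j,))
            - joint_entropy(xor_law, (i, j))
            for i, j in ((0, 1), (1, 2), (2, 0))]
i3 = multivariate_information(xor_law, 3)
print(pairwise, i3)   # pairwise informations vanish, I_3 equals -1 bit
```

Here the three variables are pairwise independent, and I_{3} = −I((X,Y); Z) = −1, since Z is determined by the pair (X, Y).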

#### 1.3. Extension to Quantum Information

^{n}. Real quantum observables are n × n hermitian matrices, and, by definition, the amplitude, or expectation, of the observable Z in the state ρ is given by the formula $\mathbb{E}(Z)=Tr(Z\rho)$ (see e.g., [13]). Two real observables Y, Z are said to be congruent if their eigenspaces are the same; thus orthogonal decompositions of E are the quantum analogs of partitions. The join is well defined for commuting observables. An information structure **S** is given by a subset of observables, such that, if Y, Z have a common refined eigenspace decomposition in **S**, their join (Y, Z) belongs to **S**. We assume that {E} belongs to **S**. What plays the role of a probability functor is a map **Q** from **S** to sets of positive hermitian forms on E, which behaves naturally with respect to the quantum direct image, thus **Q** is a covariant functor. We define information N-cochains as for the classical case, starting with the numerical functions on the sets **Q**_{X}; X ∊ **S**, which behave naturally under direct images.

_{A}’s are the spectral projectors of the observable Y. The functor **Q** is said to match **S** (or to be complete and minimal with respect to **S**) if, for each X ∊ **S**, the set **Q**_{X} is the set of all possible densities of the form ρ_{X}.

_{q} and δ_{Qt} by the formula (22), then the notions of co-cycles, co-boundaries and co-homology classes follow. We have δ_{q} ○ δ_{q} = 0 and δ_{Qt} ○ δ_{Qt} = 0; cf. [7].

_{n} acts transitively on **S** and **Q**, there is a notion of invariant cochains, forming a subcomplex of information cochains, and giving a more computable co-homology than the raw information co-homology. We call it the invariant information co-homology and denote it by ${H}_{U}^{*}\left(\mathbf{S};\mathbf{Q}\right)$.

_{2}(ρ)) = −(ρ log_{2}(ρ)); it defines a 0-cochain S_{Y} by restricting S to the sets **Q**_{X}. The classical entropy is $H\left(Y;\rho \right)=-{\displaystyle {\sum}_{A}Tr}\left({E}_{A}^{*}\rho {E}_{A}\right){\mathrm{log}}_{2}\left(Tr\left({E}_{A}^{*}\rho {E}_{A}\right)\right)$. Both these co-chains are invariant. It is well known that S_{(X,Y)}(ρ) = H(X; ρ) + X.S_{Y}(ρ) when X, Y commute, cf. [13]. In particular, by taking Y = 1_{E} we see that classical entropy measures the defect of equivariance of the quantum entropy, i.e., H(X; ρ) = S_{X}(ρ) − (X.S)(ρ). But using the case where X refines Y, we obtain that the entropy of Shannon is the co-boundary of (minus) the Von Neumann entropy.

**Theorem C.** (cf. Theorem 3 section 4.3): For n ≥ 4 and when **S** is generated by at least two decompositions such that each pair has at least four subspaces, and when **Q** is matching **S**, the invariant co-homology ${H}_{U}^{1}$ of δ_{q} in degree one is zero, and the space ${H}_{U}^{0}$ is of dimension one. In particular, the only invariant 0-cochain such that δS = −H is the Von Neumann entropy.

#### 1.4. Concavity and Convexity Properties of Information Quantities

_{1},…,S_{n}. It is remarkable that in this case, the information functions I_{N,J} = I_{N}(S_{j_{1}};…;S_{j_{N}}) over all the subsets J = {j_{1},…,j_{N}} of [n] = {1,…, n}, different from [n] itself, give algebraically independent functions on the probability simplex ∆(Ω) of dimension 2^{n} − 1. They form coordinates on the quotient of ∆(Ω) by a finite group.

_{d} denotes the Lie derivative with respect to d = (1,…,1) in the vector space ${\mathbb{R}}^{{2}^{n}}$, and ∆ the Euclidean Laplace operator on ${\mathbb{R}}^{{2}^{n}}$, then ∆ = ∆ − 2^{−n} ${\mathcal{L}}_{d}$ ○ ${\mathcal{L}}_{d}$ is the Laplace operator on the simplex ∆(Ω) defined by equating the sum of coordinates to 1.

**Theorem D.** (cf. [15]): On the affine simplex ∆(Ω) the functions I_{N,J} with N odd (resp. even) satisfy the inequality ∆I_{N} ≥ 0 (resp. ∆I_{N} ≤ 0).

_{N,J} are super-harmonic, which is a kind of weak concavity, and for N even they are sub-harmonic, which is a kind of weak convexity. In particular, when N is even (resp. odd) I_{N,J} has no local maximum (resp. minimum) in the interior of ∆(Ω).

**Problem 2.** What can be said of the other critical points of I_{N,J}? What can be said of the restriction of one information function on the intersection of levels of other information functions? Information topology depends on the shape of these intersections and on the Morse theory for them.

#### 1.5. Monadic Cohomology of Information

_{1},…,E_{m}) of subsets of Ω such that ⋃_{j} E_{j} = Ω and ${E}_{i}\cap {E}_{j}=\emptyset$ as soon as i ≠ j. The number m is named the degree of S. Note the important technical point that some of the sets E_{j} can be the empty set. In the same spirit we introduce generalized ordered orthogonal decompositions of E for the quantum case; but in this summary, for simplicity, we restrict ourselves to the classical case. Also, in this summary we omit the word “generalized” before “ordered”. A rooted tree decorated by ${\mathcal{S}}^{*}$ is an oriented finite planar tree Γ, with a marked initial vertex s_{0}, named the root of Γ, where each vertex s is equipped with an element F_{s} of ${\mathcal{S}}^{*}$, such that edges issued from s correspond to the values of F_{s}. When we want to mention that we restrict to partitions less fine than a partition X we put an index X, like in ${\mathcal{S}}_{X}^{*}$.

_{1},…,n_{m}) denotes the operation which associates to an ordered partition S of degree m and to m ordered partitions S_{i} of respective degrees n_{i}, the ordered partition that is obtained by cutting the pieces of S using the pieces of S_{i} and respecting the order. An evident unit element for this operation is the unique partition π_{0} of degree 1. The symbol μ_{m} denotes the collection of those operations for m fixed. The introduction of empty subsets in ordered partitions ensures that the result of μ(m; n_{1},…,n_{m}) is a partition of length n_{1} + … + n_{m}, thus the μ_{m} do define what is named an operad; cf. [10,16]. The axioms of unity, associativity and covariance for permutations are satisfied. See [10,16–18] for the definition of operads.

_{m} to get a monad: take for V the real vector space freely generated by ${\mathcal{S}}^{*}$; it is naturally graded, so it is the direct sum of spaces V(m); m ≥ 1, where the symmetric group ${\mathfrak{S}}_{m}$ acts naturally to the right; then introduce, for any real vector space W, the real vector space $\mathcal{V}\left(W\right)={\oplus}_{m\ge 0}V\left(m\right){\otimes}_{{\mathfrak{S}}_{m}}{W}^{\otimes m}$; the Schur composition is defined by $\mathcal{V}\circ \mathcal{V}={\oplus}_{m\ge 0}V\left(m\right){\otimes}_{{\mathfrak{S}}_{m}}{\mathcal{V}}^{\otimes m}$. It is easy to verify that the collection (μ_{m}; m ∊ ℕ) defines a natural transformation $\mathcal{V}\circ \mathcal{V}\to \mathcal{V}$, and the trivial partition π_{0} defines a natural transformation $\eta :\mathcal{R}\to \mathcal{V}$, which satisfy the axioms of a monad.

_{X} over the category $\mathcal{S}$: let ${\mathcal{M}}_{X}(m)$ be the vector space freely generated over ℝ by the symbols (P, i, m) where P belongs to Q_{X}, and 1 ≤ i ≤ m. In the last section of the second part we show how this space arises from the consideration of divided probabilities. This is apparent in the following definition of the right action of the operad $\mathcal{V}$ on the family ${\mathcal{M}}_{X}\left(m\right);m\in {\mathbb{N}}^{*}$: a sequence S_{1},…,S_{m} of ordered partitions in ${\mathcal{S}}_{X}^{*}$ acts on a generator (P, i, m) by giving the vector Σ_{j} p_{j}(P_{j}, (i, j), n) where p_{j} is the probability P(S_{i} = j) and P_{j} is the conditioned probability P|(S_{i} = j). We denote by θ_{m}((P, i, m), (S_{1},…,S_{m})) this vector.

_{m} define a natural transformation $\theta :\mathcal{M}\circ \mathcal{V}\to \mathcal{M}$, which is an action to the right in the sense of monads, i.e., $\theta \circ \left(\mathcal{F}\mu \right)=\theta \circ \left(\theta \mathcal{V}\right)$; θ ○ ($\mathcal{F}$η) = Id. (We omit the index X for simplicity.)

_{X}(S_{1};…;S_{k}; (P, i, m)), where S_{1};…;S_{k} is a forest of m trees of level k labelled by ${\mathcal{S}}_{X}^{*}$, and where the value on (P, i, m) depends only on the tree ${S}_{1}^{i};{S}_{2}^{i};\dots ;{S}_{k}^{i}$.

_{*}ℙ) = H(S; ℙ) defines a 1-cocycle is a result of an equation of Faddeev, generalized by Baez, Fritz and Leinster [20], who gave another interpretation, based on the operad structure of the set of all finite probability laws. See also Marcolli and Thorngren [21].

**Theorem E.** (cf. Theorem 4 section 6.3, [22]): If Ω has more than four points, ${H}_{\tau}^{1}\left(\Pi \left(\mathrm{\Omega}\right),\mathrm{\Delta}\left(\mathrm{\Omega}\right)\right)$ is the one-dimensional vector space generated by the entropy.

_{t} on ${\mathcal{C}}_{r}^{*}\left(\mathcal{M}\right)$ corresponds to another right action of the monad ${\mathcal{V}}_{X}$, which is deduced from the maps θ_{t} that send (P, i, m) ⊗ S_{1} ⊗ … ⊗ S_{m} to the sum of the vectors (P, (i, j), n) for j = 1,…, n_{i} that are associated to the end branches of S_{i}. It gives a twisted version of information co-homology as we have done in the first paragraph. This allows us to define higher information quantities for strategies: for N = 2M + 1 odd, I_{τ,N} = −(δδ_{t})^{M}H, and for N = 2M + 2 even, I_{τ,N} = δ_{t}(δδ_{t})^{M}H.

_{1},…,T_{m}:

_{i} are equal we recover the ordinary mutual information of Shannon plus a multiple of the entropy of T_{i}.

#### 1.6. The Forms of Information Strategies

_{1},…,α_{k} connecting s_{0} to s; the cardinal k is named the level of s; this chain defines a sequence (F_{0},v_{0}; F_{1},v_{1}; …; F_{k−1},v_{k−1}) of observables and values of them; then we can associate to s the subset Ω_{s} of Ω where each F_{j} takes the value v_{j}. At a given level k the sets Ω_{s} form a partition π_{k} of Ω; the first one π_{0} is the unit partition of length 1, and π_{l} is finer than π_{l−1} for any l. By recurrence over k it is easy to deduce from the orderings of the values of F_{s} an embedding in the Euclidean plane of the subtrees Γ(k) at level k such that the values of the variables issued from each vertex are oriented in the direct trigonometric sense, thus π_{k} has a canonical ordering ω_{k}. Remark that many branches of the tree give the empty set for Ω_{s} after some level; we name them dead branches. It is easy to prove that the set $\Pi {\left(\mathcal{S}\right)}_{*}$ of ordered partitions that can be obtained as a (π_{k},ω_{k}) for some tree Γ and some level k is closed under the natural ordered join operation, and, as $\Pi {\left(\mathcal{S}\right)}_{*}$ contains π_{0}, it forms a monoid, which contains the monoid $M\left({\mathcal{S}}_{*}\right)$ generated by ${\mathcal{S}}_{*}$.

_{k}; optimal discrimination corresponds to minimal level k. When the set Ω is a subset of the set of words x_{1},…,x_{N} with letters x_{i} belonging to given sets M_{i} of respective cardinalities m_{i}, the problem of optimal discrimination by observation strategies Γ decorated by ${\mathcal{S}}_{*}$ is equivalent to a problem of minimal rewriting by words of type (F_{0},v_{0}), (F_{1},v_{1}), …, (F_{k},v_{k}); it is a variant of optimal coding, where the alphabet is given. The topology of the poset of discriminating strategies can be computed in terms of the free Lie algebra on Ω, cf. [16].

_{i} = i = 0,…,k − 1 for a minimal chain leading from s_{0} to s. We can consider that the sets ${\mathcal{P}}_{s}$ for different s along a branch measure the evolution of knowledge when applying the strategy. The entropy H(F; ℙ_{s}) for F in ${\mathcal{S}}_{*}$ and ℙ_{s} in ${\mathcal{P}}_{s}$ gives a measure of the information we hope to obtain when applying F at s in the state ℙ_{s}. The maximum entropy algorithm consists in choosing at each vertex s a variable that has the maximal conditioned entropy H(F; ℙ_{s}).

**Theorem F.** (cf. [22]): To find one false piece of different weight among N pieces for N ≥ 3, when knowing the false piece is unique, by the minimal number of weighings, one can use the maximal entropy algorithm.
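To illustrate a single step of the maximal entropy algorithm, the sketch below computes the entropy of the outcome of a first weighing of k pieces against k others. It is a simplified model and not the full strategy: we assume that the false piece is either heavier or lighter, that the 2N resulting cases are equally likely, and that the scale has three outcomes; the function names are hypothetical.

```python
import math

def first_weighing_entropy(n_pieces, k):
    """Entropy (in bits) of the outcome of weighing k pieces against k others.

    Assumed model: exactly one false piece, heavier or lighter, so there are
    2 * n_pieces equally likely cases before the first weighing.
    """
    cases = 2 * n_pieces
    tilt = 2 * k                 # one given pan goes down: k heavy + k light cases
    balanced = cases - 4 * k     # the false piece is not on the scale
    probs = [tilt / cases, tilt / cases, balanced / cases]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def best_first_weighing(n_pieces):
    """Number k of pieces per pan that maximizes the entropy of the first weighing."""
    return max(range(1, n_pieces // 2 + 1),
               key=lambda k: first_weighing_entropy(n_pieces, k))

for k in range(1, 7):
    print(k, round(first_weighing_entropy(12, k), 4))
print(best_first_weighing(12))   # 4 pieces on each pan for N = 12
```

For N = 12 the maximum is reached for k = 4, where the three outcomes are equally likely and the entropy equals log_{2} 3.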

_{s} the set of permutations of Ω_{s} which respects globally the set ${\mathcal{P}}_{s}$ and the set of restrictions of elements of ${\mathcal{S}}_{*}$ to Ω_{s}, and which preserves one by one the equations F_{i} = v_{i}. Along branches of Γ this gives a decreasing sequence of groups, whose successive quotients measure the evolution of acquired information in an algebraic sense.

**Problem 3.** Generalize Theorem F. Can we use algorithms based on the Galoisian measure of information? Can we use higher information quantities associated to trees for optimal discrimination?

#### 1.7. Conclusion and Perspective

## 2. Classical Information Topos. Theorem One

#### 2.1. Information Structures and Probability Families

_{i}; 1 ≤ i ≤ n} of partitions of Ω. For any subset I = {i_{1},…, i_{k}} of [n] = {1,…, n}, the joint (S_{i_{1}},…, S_{i_{k}}), also denoted S_{I}, divides each S_{i_{j}}. The set W = W(Σ) of all the S_{I}, when I describes the subsets of [n], is an information structure. It is even a commutative monoid, because any product of elements of W belongs to W, and the partition associated to Ω itself gives the identity element of W. The product S_{[n]} of all the S_{i} is maximal; it divides all the other elements. Like Π(Ω), the monoid W(Σ) is idempotent, i.e., for any X we have XX = X.

_{a}; a ∊ A of K correspond to the finest elements in $\mathfrak{S}\left(K\right)$; the vertices of a face Σ_{a} give a family of partitions, which generates a sub-monoid W_{a} = W(Σ_{a}) of W; it is a sub-information structure (full sub-category) of $\mathfrak{S}\left(K\right)$, having the same unit, but having its own initial element ω_{a}. These examples arise naturally when formalizing measurements if some obstructions or a priori decisions forbid a set of joint measurements.

**Example 1.** Ω has four elements (00), (01), (10), (11); the variable S_{1} (resp. S_{2}) is the projection pr_{1} (resp. pr_{2}) on E_{1} = E_{2} = {0, 1}; Σ is the set {S_{1}, S_{2}}. The monoid W(Σ) has four elements 1, S_{1}, S_{2}, S_{1}S_{2}. The partition S_{1}S_{2} = S_{2}S_{1} corresponds to the variable Id : Ω → Ω.

**Example 2.** Same Ω as before, with the same names for the elements, but we take all the partitions of Ω in $\mathcal{S}$. In addition to 1, S_{1}, S_{2} and S = S_{1}S_{2}, there is S_{3}, the last partition in two subsets of cardinal two, which can be represented by the sum of the indices: S_{3}(00) = 0, S_{3}(11) = 0, S_{3}(01) = 1, S_{3}(10) = 1, the four partitions Y_{ω}, for ω ∊ Ω, formed by a singleton {ω} and its complement, and finally the six partitions X_{μν} = Y_{μ}Y_{ν}, indexed by pairs of points in Ω satisfying μ < ν in the lexical order. The product of two distinct Y is an X, the product of two distinct X or two distinct S_{i} is S, the product of one Y and an S_{i} is an X, of one Y and an X is this X or S, of one S_{i} and an X is this X or S. In particular the monoid W is also generated by the three S_{i} and the four Y_{ω}; it is called the monoid of partitions of Ω, and the associative algebra $\Lambda \left(\mathcal{S}\right)$ of this monoid is called the partition algebra of Ω.

**Example 3.** Same Ω as before, that is Ω = ∆(4), with the notations of Example 2 for the partitions; but we choose as generating family the set ϒ of the four partitions Y_{μ}; μ ∊ Ω; the joint product of two such partitions is either a Y_{μ} (when they coincide) or an X_{μν} (when they are different). The monoid W(ϒ) has twelve elements.
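The cardinalities stated in Examples 1 to 3 can be recovered by a direct enumeration. The following sketch represents a partition as a set of blocks and the product as the join (non-empty pairwise intersections of blocks); the helper names are hypothetical.

```python
OMEGA = ("00", "01", "10", "11")

def partition(blocks):
    """Build a hashable partition from an iterable of blocks."""
    return frozenset(frozenset(b) for b in blocks)

def join(p, q):
    """Product of two partitions: the coarsest partition finer than both."""
    return frozenset(a & b for a in p for b in q if a & b)

def generated_monoid(generators):
    """Close a family of partitions under the join, with the unit partition 1."""
    unit = partition([OMEGA])
    elements = {unit} | set(generators)
    while True:
        new = {join(p, q) for p in elements for q in elements} - elements
        if not new:
            return elements
        elements |= new

S1 = partition([["00", "01"], ["10", "11"]])   # first projection
S2 = partition([["00", "10"], ["01", "11"]])   # second projection
S3 = partition([["00", "11"], ["01", "10"]])   # sum of the indices
Y = [partition([[w], [v for v in OMEGA if v != w]]) for w in OMEGA]

print(len(generated_monoid([S1, S2])))          # Example 1: 4
print(len(generated_monoid([S1, S2, S3] + Y)))  # Example 2: 15
print(len(generated_monoid(Y)))                 # Example 3: 12
```

The value 15 is the number of all partitions of a set of four elements, and the twelve elements of W(ϒ) are 1, the four Y_{μ}, the six X_{μν} and S.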

**Example 4.** Ω has 8 elements, noted (000),…,(111), and we consider the family Σ of the three binary variables S_{1}, S_{2}, S_{3} given by the three projections. If we take all the joints, we have a monoid of eight elements. However, if we forbid the maximal face (S_{1}, S_{2}, S_{3}), we have a structure $\mathcal{S}$ which is not a monoid; it is the set formed by 1, S_{1}, S_{2}, S_{3} and the three joint pairs (S_{1}, S_{2}), (S_{1}, S_{3}), (S_{2}, S_{3}).

_{x} of the atoms x of $\mathcal{B}$ (the points of ${\mathrm{\Omega}}_{\mathcal{B}}$), satisfying p_{x} ≥ 0 and Σ_{x} p_{x} = 1. We see that this set of probabilities is also a simplex ∆([N]), where N is the cardinality of ${\mathrm{\Omega}}_{\mathcal{B}}$.

_{i} for i = 1, …, m, belonging to $\mathcal{B}$. Let P be an element of $\mathrm{\Delta}\left(\mathcal{B}\right)$; the conditioning of P by the element Y_{i} is defined only if P(Y_{i}) ≠ 0, and given by the formula P(B|Y = y_{i}) = P(B ⋂ Y_{i})/P(Y_{i}). We will consider it as a probability on Ω equipped with $\mathcal{B}$, not as a probability on Y_{i}. Remark that if P belongs to a simplicial family $\mathcal{Q}$, the probability P(B|Y = y_{i}) is also contained in $\mathcal{Q}$. In fact, if the smallest face of $\mathcal{Q}$ which contains P is the simplex σ on the vertices x_{1},…,x_{k}, then the conditioning of P by Y_{i}, being equal to 0 for the other atoms x, belongs to a face of σ, which is in $\mathcal{Q}$, because $\mathcal{Q}$ is a complex.

_{*}Q of a probability Q for $\mathcal{B}$ by the partition Y is the probability on Ω for the sub-algebra ${\mathcal{B}}_{Y}$, that is given by Y_{*}Q(t) = Q(t) for t ∊ ${\mathcal{B}}_{Y}$. It is the forgetting operation, also frequently named marginalization by Y.
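The two operations, conditioning by an element of a partition and direct image (marginalization), can be sketched on a finite Ω as follows. The law `P`, the partition `S1` and the function names are hypothetical choices for illustration.

```python
def condition(P, block):
    """P(. | block), kept as a law on the same Omega; needs P(block) != 0."""
    mass = sum(P[w] for w in block)
    if mass == 0:
        raise ValueError("conditioning on an event of probability zero")
    return {w: (P[w] / mass if w in block else 0.0) for w in P}

def direct_image(P, Y):
    """Y_* P: the law induced on the blocks of the partition Y."""
    return {block: sum(P[w] for w in block) for block in Y}

# A law whose support is a proper face of the simplex (the atom "11" has mass 0).
P = {"00": 0.5, "01": 0.25, "10": 0.25, "11": 0.0}
S1 = (frozenset({"00", "01"}), frozenset({"10", "11"}))

cond = condition(P, S1[0])
marg = direct_image(P, S1)
print(cond)   # the mass is concentrated on "00" and "01"
print(marg)   # probabilities 0.75 and 0.25 of the two blocks
```

The conditioned law vanishes on every atom where P vanishes, so it stays in a face of the smallest face containing P, as stated in the remark on simplicial families.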

_{*}. Let us prove that it is a simplicial sub-complex of $\mathrm{\Delta}\left({\mathcal{B}}_{Y}\right)$: take a simplex σ of $\mathcal{Q}$, denote its vertices by x_{1},…,x_{k}, note δ_{j} the Dirac mass of x_{j}, and look at the partition σ_{i} = Y_{i} ⋂ σ of σ induced by Y, then for all the x_{j} ∊ σ_{i} the images Y_{*}δ_{j} coincide. Let us denote this image by δ(Y, σ_{i}); it is an element of ${\mathcal{Q}}_{Y}$. For every law Q in σ, the image Y_{*}Q belongs to the simplex on the laws δ(Y, σ_{i}), and any point in this simplex belongs to ${\mathcal{Q}}_{Y}$. Q.E.D.

_{*} are related by the barycentric law (or theorem of total probability, Kolmogorov 1933 [29]): for any measurable set A in $\mathcal{B}$ we have

_{X} ↦ Y_{*}P_{X}. If $\mathcal{Q}$ is simplicial the functor goes to the category of simplicial complexes.

**Definition 1.** For $X\in \mathcal{S}$, the functional module ${\mathcal{F}}_{X}\left(\mathcal{Q}\right)$ is the real vector space of measurable functions on the space ${\mathcal{Q}}_{X}$; for each arrow of divisibility X → Y, we have an injective linear map f ↦ f^{Y|X} from ${\mathcal{F}}_{Y}$ to ${\mathcal{F}}_{X}$ given by

_{Y}, we have, in ${\mathcal{F}}_{X}$, the identity

_{*}(P|(Z = z)) = (Y_{*}P)|(Z = z) due to Y_{*}P(Z = z) = P(Z = z). The arrows of direct images and the action of averaged conditioning satisfy the axiom of distributivity: if Y and Z divide X, but not necessarily Z divides Y, we have

**Proof.** The first identity comes from the fact that (Z,Y)_{*}(P|(Z = z)) = Y_{*}(P|(Z = z)); the second one follows from the fact that we have an action of the monoid ${\mathcal{S}}_{X}$.

_{X}, and Y be the goal of an arrow X → Y, we have

**Lemma 1.** For any pair (Y, Z) of variables in ${\mathcal{S}}_{X}$, and any F for which the integrals converge, we have (Y,Z).F = Y.(Z.F).

**Proof.** We note p_{i} the probability that Y = y_{i}, π_{ij} the joint probability of (Y = y_{i}, Z = z_{j}), and q_{ij} the conditional probability of Z = z_{j} knowing that Y = y_{i}, then

**Remark 1.** In the general case, where Ω is not necessarily finite and $\mathcal{B}$ is any sigma-algebra, Lemma 1 is a version of the Fubini theorem.
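In the finite case, Lemma 1 can be checked numerically. The sketch below assumes that the action Y.F is the average of F over the laws conditioned by the values of Y, weighted by the probabilities p_{i} of these values; the test function `F` (the sum of squared probabilities) is an arbitrary hypothetical choice, as are the other names.

```python
def law_of(var, P):
    """Image law var_* P as a dict: value -> probability."""
    law = {}
    for w, p in P.items():
        law[var(w)] = law.get(var(w), 0.0) + p
    return law

def condition(P, var, value):
    """Conditioned law P|(var = value), restricted to the points where var = value."""
    mass = sum(p for w, p in P.items() if var(w) == value)
    return {w: p / mass for w, p in P.items() if var(w) == value}

def act(var, F):
    """Return var.F, the averaged conditioning of F by var (assumed form)."""
    def averaged(P):
        return sum(pv * F(condition(P, var, v))
                   for v, pv in law_of(var, P).items() if pv > 0)
    return averaged

# Hypothetical law on Omega = {0,1}^3 with weights 1/36, ..., 8/36.
points = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
P = {w: (k + 1) / 36 for k, w in enumerate(points)}

Y = lambda w: w[0]
Z = lambda w: w[1]
YZ = lambda w: (w[0], w[1])                    # the joint variable (Y, Z)
F = lambda Q: sum(q * q for q in Q.values())   # arbitrary function of the law

left = act(YZ, F)(P)            # ((Y, Z).F)(P)
right = act(Y, act(Z, F))(P)    # (Y.(Z.F))(P)
print(left, right)              # equal up to rounding
```

The equality reflects the relation π_{ij} = p_{i} q_{ij} between joint and conditional probabilities used in the proof.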

_{X} translating the marginalization by the partitions, considered as observable quantities, and the conditioning by observables is translated by a special element X ↦ ${\mathcal{F}}_{X}$ of the information topos.

_{*}(N); then the group $Ex{t}_{\mathcal{D}}^{n}\left(M,N\right)$ can be defined as the homology of the complex $Ho{m}_{\mathcal{D}}\left(M,{I}_{n}\left(N\right)\right)$. Those groups are denoted by H^{n}(M; N).

#### 2.2. Non-Homogeneous Information Co-Homology

_{m}(X), freely generated by the m-tuples of elements of the monoid ${\mathcal{S}}_{X}$, and we define C^{m}(X) as the real vector space of linear functions from S_{m}(X) to the space ${\mathcal{F}}_{X}$ of measurable functions from ${\mathcal{Q}}_{X}$ to ℝ.

_{X} ∊ C^{m}(X) satisfying the following condition, named joint locality:

_{j} is divided by Y, we must have

_{m}(X) from $\mathcal{S}$ to the category of real vector spaces to the functor $\mathcal{F}$ of measurable functions on ${\mathcal{Q}}_{X}$. Hence, F is not an ordinary numerical function of probability laws ℙ and a set (X_{1},…,X_{m}) of m random variables, but we can speak of its value F_{X}(X_{1};…;X_{m}; ℙ) for each X in $\mathcal{S}$. For X given the co-chains form a sub-vector space ${\mathcal{C}}^{m}\left(X\right)$ of C^{m}(X).

_{1},…,X_{m}) we find that F(X_{1};…; X_{m}; ℙ) depends only on the direct image of ℙ by the joint variable of the X_{i}’s. This implies that, if F belongs to ${\mathcal{C}}^{m}\left(X\right)$, we have

_{j}, and let P be a probability in ${\mathcal{Q}}_{X}$; then the joint variable Z = (X_{1},…,X_{m}) is divided by Y and X, thus we have Z_{*}P = Z_{*}(X_{*}P) = Z_{*}(Y_{*}P), and

**Remark 2.** The operation of ${\mathcal{S}}_{X}$ can be rewritten more compactly by using integrals:

_{1};…; Y_{m}; P) depends only on (Y_{1},…, Y_{m})_{*}P), has a co-boundary which is also jointly local, because the variables appearing in the definition are all joint variables of the Y_{j}. (This would not have been true for the stronger locality hypothesis asking that F depends only on the collection (Y_{j})_{*}P; j = 1,…,m.)

^{m} ○ δ^{m−1} = 0. We denote by Z^{m} the kernel of δ^{m} and by B^{m} the image of δ^{m−1}. The elements of Z^{m} are named m-cocycles, we consider them as information quantities, and the elements of B^{m} are m-coboundaries.

**Definition 2.** For m ≥ 0, the quotient

**Proposition 1.** For each integer m ≥ 0, a natural linear map

**Proof.** First, remark that X_{j} = X″_{j} ○ φ implies ${{X}^{\prime}}_{j}={{X}^{\prime \prime}}_{j}$ because φ is surjective. As F′ is (jointly) local, the co-chain F = φ*(F′) is also (jointly) local. Finally, it is evident that the map F′ ↦ F commutes with the co-boundary operator. Therefore the proposition follows.

**Proposition 2.** For each integer m ≥ 0, a natural linear map

_{*}(P).

**Proof.** First, remark that, if Q also satisfies P′ = φ_{*}(Q), we have $F({{X}^{\prime}}_{1}\circ \phi ;\dots ;{{X}^{\prime}}_{m}\circ \phi ;P)=F({{X}^{\prime}}_{1}\circ \phi ;\dots ;{{X}^{\prime}}_{m}\circ \phi ;Q)$. To establish that point, let us denote ${X}_{j}={{X}^{\prime}}_{j}\circ \phi ;j=1,\dots ,m$, and ${X}^{\prime}=({{X}^{\prime}}_{1},\dots ,{{X}^{\prime}}_{m})$, X = (X_{1},…,X_{m}) the joint variables; the quantity $F({{X}^{\prime}}_{1}\circ \phi ;\dots ;{{X}^{\prime}}_{m}\circ \phi ;P)$ depends only on X_{*}P, but this law can be rewritten ${{X}^{\prime}}_{*}{P}^{\prime}$, which is also equal to X_{*}Q. In particular, if F is local, then F′ = φ_{*}F is local.

**Corollary 1.** In the case where ${\mathcal{Q}}^{\prime}={\phi}_{*}(\mathcal{Q})$ and $\mathcal{S}={\phi}^{*}{\mathcal{S}}^{\prime}$, the maps φ* and φ_{*} in information co-homology are inverse of each other.

_{X} in ${\mathcal{Q}}_{X}$ such that f(Y_{*}P_{X}) = f(P_{X}) for any Y multiple of X (i.e., coarser than X). As we assume 1 belongs to $\mathcal{S}$, and the set Q_{1} has only one element, f must be a constant. And every constant is a co-cycle, because

^{0} is ℝ. This corresponds to the hypothesis $1\in \mathcal{S}$, meaning connectedness of the category. If m components exist, we recover them in the same way and H^{0} is isomorphic to ℝ^{m}.

_{X}(Y; P_{X}), measurable in the variable P in $\mathcal{Q}$, labelled by elements $Y\in {\mathcal{S}}_{X}$, which satisfies the locality condition, stating that each time we have Z → X → Y in $\mathcal{S}$, we have

_{Y}(Y; Y_{*}P) to recover f_{X}(Y; P) for all partitions X in $\mathcal{S}$ that divide Y.

_{X}.

_{i}of X, we have

_{i})) = 0, due to P ≥ 0. This generalizes f(1; P) = 0 for any P, because, for a probability conditioned by X = x_{i}, the partition X appears the same as 1, that is a certitude.

#### 2.3. Entropy

_{i} denotes the values of ℙ on the elements of the partition X. In particular the function H depends only on X_{*}(ℙ), which is locality. The co-cycle equation expresses the fundamental property for an information quantity, written by Shannon:

_{0}: we only have to replace the finite sum by the integral of the function −φ log φ where φ denotes the density with respect to ℙ_{0}. Changing the reference law ℙ_{0} changes the quantities H(X) and H(Y) by the same constant, thus does not change the variation H(X; P) − H(Y; P).

_{k}, Y = η_{α} is noted p_{k,α}, then the probability of X = ξ_{k} is equal to p_{k} = Σ_{α} p_{k,α} and the probability of Y = η_{α} is equal to q_{α} = Σ_{k} p_{k,α}. To simplify the notations, let us write F = f(X; ℙ), G = f((Y, X); ℙ), H = f(Y; ℙ), F_{α} = f(X; ℙ|(Y = η_{α})), H_{k} = f(Y; ℙ|(X = ξ_{k})).

_{k,α} = 0 except when α = α_{1} and k = k_{2}, k_{3},…,k_{m} or α = α_{2} and k = k_{1}; we put ${p}_{{k}_{i},{\alpha}_{1}}={x}_{i}$; i = 2,…,m and ${p}_{{k}_{1},{\alpha}_{2}}={x}_{1}$, which implies that we have x_{1} + x_{2} + … + x_{m} = 1. Then Equation (33) implies that each term H in Equation (42) is zero, because only one value of the image law is non-zero, thus we can replace the only term G by $F({p}_{{k}_{1}},\dots ,{p}_{{k}_{m}})$, and we get from Equation (41):

_{1} subsists because the possible other one, for α_{2}, concerns a certitude.

_{2} = 1 − x_{1} = a, x_{3} = … = x_{m} = 0, we deduce the identity H(a, 1 − a, 0,…, 0) = F(1 − a, a, 0,…, 0). This gives a recurrence equation to calculate F from the binomial case:

_{α_{1}} is a special case of F, thus independent of Y and α_{1}.

**Lemma 2.** With the notations of Example 1, Ω = {(00), (01), (10), (11)}, S_{1} (resp. S_{2}) the projection pr_{1} (resp. pr_{2}) on E_{1} = E_{2} = {0,1}, Σ = {S_{1}, S_{2}}; then the (measurable) information co-homology of degree one is generated by the entropy, i.e., there exists a constant C such that, for any X in $W(\Sigma )$ and any $P\in \mathbb{P}$, $f(X;P)=CH(X;P)$.

**Proof.** We consider a 1-cocycle f. We have f(1; P) = 0. Let us note f_{i}(P) = f(S_{i}; P), and f_{ijk}(u) the function f(S_{i}; P|(S_{j} = k)), the variable u representing the probability of the first point in the fiber S_{j} = k in the lexicographic order. For each 2 × 2 tableau P = (p_{00}, p_{01}, p_{10}, p_{11}), the symmetry formula (36) gives

_{10} = 0, p_{00} = u, p_{11} = v, p_{01} = 1 − u − v in this relation, we obtain the equation:

_{1}, f_{2} depend only on the image law by S_{1}, S_{2} respectively, thus, again by noting a binomial probability from the value of the first element in lexicographic order, we get

_{1}(u) = f_{2}(u); then we arrive at the following functional equation for h = f_{1} = f_{2}:

_{1}, …, x_{m}) of real numbers such that x_{1} + … + x_{m} = 1,

**Theorem 1.**For every connected structure of information $\mathcal{S}$, which is sufficiently rich, and every set of probability $\mathcal{Q}$, which is complete with respect to $\mathcal{S}$, the information co-homology group of degree one is one-dimensional and generated by the classical entropy.

_{1}, …, S_{n}, when n ≥ 2, such that, for every i, at least one of the pairs (S_{i}, S_{j}) is rich.

_{1}, …, Ω_{k}, the partition σ^{∗}X is made by the subsets σ^{−1}(Ω_{1}), …, σ^{−1}(Ω_{k}), in such a manner that, if σ, τ are two permutations of Ω, we have τ^{∗}(σ^{∗}X) = (σ ○ τ)^{∗}X.
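The right-action rule τ^{∗}(σ^{∗}X) = (σ ○ τ)^{∗}X can be checked on a small example. In the sketch below, permutations are encoded as dictionaries and partitions as sets of blocks; these encodings and the function names are illustrative choices.

```python
def pullback(sigma, partition):
    """sigma^* X: the blocks are the preimages sigma^{-1}(block).

    sigma is a dict sending each point of Omega to its image;
    partition is an iterable of blocks (iterables of points).
    """
    return frozenset(
        frozenset(w for w in sigma if sigma[w] in block)
        for block in map(frozenset, partition)
    )

def compose(sigma, tau):
    """(sigma o tau)(w) = sigma(tau(w))."""
    return {w: sigma[tau[w]] for w in tau}
```

Pulling back first by σ and then by τ gives the pullback by σ ○ τ, not by τ ○ σ, which is why the action is on the right.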

^{∗}X also belongs to $\mathcal{S}$.

^{∗}P = P ○ σ on Ω/X also belongs to ${\mathcal{Q}}_{X}$.

^{∗}σ^{∗}P = (σ ○ τ)^{∗}P). Thus the actions of symmetric groups are defined here on the right. However, we obtain actions on the left by taking σ_{∗} = (σ^{−1})^{∗}. For the essential role of symmetries in information theory, see the article of Gromov in this volume.

_{1}, …, Y_{m} in ${\mathcal{S}}_{X}$, we have

_{∗}X, and completing the category by composing the two kinds of arrows, division and permutation. In this case, the probability functor $\mathcal{Q}$ must behave naturally with respect to permutation, which implies that it is symmetric. Moreover, the natural notions of functional sheaf and local cochains are those of a symmetric sheaf and symmetric cochains.

#### 2.4. Appendix. Complex of Possible Events

^{0}, A^{1}, A^{2}, A^{3}, … the N + 1 vertices of the large simplex Δ_{N}, a point of Δ_{N} is interpreted as a probability ℙ on the set of these vertices; each vertex can be seen as an elementary event, and we will say that a general event A is possible for ℙ when ℙ(A) is different from zero. An event A is said to be impossible for ℙ in the other case, that is, when ℙ(A) = 0.

_{N} is the complementary set of the opposite face to A, i.e., it is the set of probabilities ℙ in Δ_{N} such that A is possible, i.e., has non-zero probability. The relative star S(A|K) of A in a sub-complex K is the intersection of the star of A with K.

_{N} whose vertices are A, B, C, D, …. We denote by L(F) the set of points p in Δ_{N} such that at least one of the points A, B, C, D, … is impossible for p. This is also the union of the faces which are opposite to the vertices A, B, C, D, …. Then L(F) is a simplicial complex. The complementary set in F of the interior of F, i.e., the boundary of F, is the union of the intersections of F with all faces opposite to A, B, C, D, …; it is also the set of probabilities p in F such that at least one of the points A, B, C, D, … is impossible for p, thus it is equal to L(F) ∩ F. If G is a face containing F, the complex L(G) contains the complex L(F).

_{N} a set E = E_{K} of open faces. Let $\dot{F}=F\backslash \partial F$ be an element of E; then each face G of Δ_{N} containing F belongs to E, because K is a complex.

_{N} which does not contain $\dot{F}$. This can be proved as follows: if p in K makes every vertex of F possible, it belongs to a face G such that every vertex of F is a vertex of G; thus K contains G, which contains F. So, if K does not contain $\dot{F}$, K is contained in L(F).

_{K} be the intersection of the L(F), where F ranges over the faces in E_{K}. From what precedes we know that K is contained in L. However, every $\dot{F}$ in E is included in the complementary set of L(F), thus it is included in the complementary set of L, which is the union of the complementary sets of the L(F). Consequently the complementary set of K is included in the complementary set of L. Then K = L.

**Theorem 2.** A subset K of the simplex Δ_{N} is a simplicial sub-complex if and only if it is defined by a finite number of constraints of the type: “for any p in K, the fact that A, B, C, … are possible for p implies that D is impossible for p”.

_{N}, we have shown that K is the intersection of the L(F) where the open face $\dot{F}$ is not in K; but if A, B, C, D, … denote the vertices of the face F, a point p belongs to L(F) if and only if “(A is impossible for p) or (B is impossible for p) or …”, and this sentence is equivalent to “if (A is possible for p) and (B is possible for p) and …, then (D is impossible for p)”. This results from the equivalence between “(P implies Q) is true” and “(not P or Q) is true”. Conversely, any L(F) is a simplicial complex, hence every intersection of sets of the form L(F) is a simplicial complex too.
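Theorem 2 can be tested combinatorially by identifying each point p with its support, i.e., the set of vertices that are possible for p, so that an open face is a non-empty set of vertices. The following sketch, under that identification, rebuilds a sub-complex K as the intersection of the L(F) for the open faces F absent from K. All names are illustrative.

```python
from itertools import combinations

def all_faces(vertices):
    """All non-empty subsets of the vertex set, one per open face."""
    vs = sorted(vertices)
    return {frozenset(c) for r in range(1, len(vs) + 1) for c in combinations(vs, r)}

def is_subcomplex(faces):
    """A set of faces is a sub-complex when it contains every non-empty sub-face."""
    return all(sub in faces for f in faces for sub in all_faces(f))

def L(F, vertices):
    """Faces on which at least one vertex of F is impossible (support misses part of F)."""
    return {G for G in all_faces(vertices) if not F <= G}

def from_constraints(K, vertices):
    """Intersection of L(F) over the open faces F that are absent from K."""
    result = all_faces(vertices)
    for F in all_faces(vertices) - K:
        result &= L(F, vertices)
    return result
```

For a genuine sub-complex the reconstruction returns K itself, while for a set of faces that is not closed under passing to sub-faces it does not.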

## 3. Higher Mutual Informations. A Sketch

^{∗}, denoted by δ_{t}, is defined by the same formula as δ, except that the first term Y_{1}.F(Y_{2}; …; Y_{n}; ℙ) is replaced by the term F(Y_{2}; …; Y_{n}; ℙ) without Y_{1}:

_{2}, …, Y_{n})_{∗}ℙ, (Y_{1}, …, Y_{n})_{∗}ℙ and (Y_{1}, …, Y_{n−1})_{∗}ℙ.

_{t}F = 0, and a topological co-boundary is an element in the image of δ_{t}.

_{t} ○ δ_{t} = 0, which allows us to define a co-homology theory that we will name topological co-homology.

_{1}, …, S_{n}, when n ≥ 2.

_{I} denoting the joint partition of the S_{i} such that i ∈ I. We also define I_{1} = H.

_{N} makes it evident that it is a symmetric function, invariant by all permutations of the partitions S_{1}, …, S_{N}.

_{2}(S; T) = H(S) + H(T) − H(S, T) is the usual mutual information.
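The mutual informations can be computed directly from a joint law. The sketch below assumes the usual inclusion–exclusion convention I_n = Σ_k (−1)^{k−1} Σ_{|I|=k} H(S_I), which reproduces I_1 = H and the formula for I_2 given above; the encoding of a law as a dictionary of outcome tuples and the function names are illustrative.

```python
import math
from itertools import combinations

def joint_entropy(law, indices):
    """Entropy (bits) of the variables at the given positions.

    law maps outcome tuples to probabilities.
    """
    marginal = {}
    for outcome, p in law.items():
        key = tuple(outcome[i] for i in indices)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)

def mutual_information(law, n):
    """I_n of the first n variables, as an alternating sum of joint entropies.

    Assumed convention: I_n = sum_k (-1)^(k-1) sum_{|I|=k} H(S_I).
    """
    return sum(
        (-1) ** (k - 1) * joint_entropy(law, subset)
        for k in range(1, n + 1)
        for subset in combinations(range(n), k)
    )
```

For two independent uniform bits together with their exclusive-or, the pairwise mutual information vanishes while the third-order one is negative, which shows that higher mutual informations need not be positive.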

_{2} = δ_{t}H. The following formula generalizes this remark to higher mutual informations of even orders:

_{t} according to whether their order is odd or even, respectively.

**Lemma 3.** Whether n is even or odd, we have

_{N} satisfies the equation of an information 1-cocycle, thus I_{N} seems to be a kind of “partial 1-cocycle”; however this is misleading, because the locality condition is not satisfied. In fact I_{N} is an N-cocycle, either for δ or for δ_{t}, depending on the parity of N.

_{0} − 1 denotes the sum of the two operators of mean conditioning and minus the identity.

**Remark 3.** Conversely, the functions I_{N} decompose the entropy of the finest joint partition:

_{1}(S) + I_{1}(T) − I_{2}(S; T), and

## 4. Quantum Information and Projective Geometry

#### 4.1. Quantum Measure, Geometry of Abelian Conditioning

^{N}, the canonical basis being the points x of Ω. In this case the canonical positive hermitian metric on E corresponds to the quadratic mean: if f and g are elements of E, we have

^{2} functions for a fixed probability P_{0}.

_{0} of reference. Why is it so? Because a priori a hermitian form h on E is a map from E to ${\overline{E}}^{\ast}$, where ∗ denotes duality and bar denotes conjugation, the conjugate space $\overline{E}$ being the same set E, with the same structure of vector space over the real numbers as E, but with the structure of vector space over the complex numbers changed by changing the sign of the action of the imaginary unit i. The complexification of the real vector space H of hermitian forms is $\mathrm{Hom}_{\mathbb{C}}(E,{\overline{E}}^{\ast})\cong {E}^{\ast}\otimes {\overline{E}}^{\ast}$. The space H is the set of fixed points of the ℂ-anti-linear map u ↦ ^{t}ū. A trace is defined for an endomorphism of the space E, as a linear invariant quantity on E^{*} ⊗ E. Here we could take the trace over ℝ, because E and $\overline{E}$ are the same over ℝ, but the duality would be an obstacle, because even over the field ℝ, the spaces E and E^{*} cannot be identified, and there exists no linear invariant in E^{*} ⊗ E^{*}, even over ℝ. In fact, a non-degenerate positive h_{0} is one of the ways to identify E and ${\overline{E}}^{\ast}$. A basis is another way, also defining canonically a form h_{0}. More precisely, when h_{0} is given, every hermitian form h diagonalizes in an orthonormal basis for h_{0}; thus the whole spectrum of h makes sense, not only the trace.

_{0} is tacitly assumed in most presentations. However, it is better to understand the consequences of this choice. In non-relativistic quantum mechanics, it is not too serious; in relativistic quantum mechanics, it is. For instance, considering the system of two states as a spinor on the Lorentz space of dimension 4, the choice of h_{0} is equivalent to the choice of a coordinate of time. See Penrose and Rindler [42].

_{+} of all positive non-zero hermitian products but a convex part PH_{+} of the real projective space of real lines in the vector space H of hermitian forms. In this space, the complex projective space ℙ(E) of dimension N − 1 over ℂ is naturally embedded; its image consists of the rank one positive hermitian matrices of trace 1; these matrices correspond to the orthogonal projectors on one-dimensional directions in E.

^{N}; they correspond to the Dirac distributions on classical states. We see here a point defended in particular by Von Neumann, that quantum states are projective objects not linear objects.

_{0} is given. If not, “to be hermitian” has no meaning for an operator. (What could have a meaning for an operator is to be diagonalizable over ℝ, which is something else.)

_{0} is chosen, the only difference between a real observable and a density of states is the absence of the positivity constraint.

_{0} plays a role in this formula. Consequently the definition of expectation requires fixing an h_{0}, not only a ρ. This imposes a departure from the relativistic case, which should not be surprising, since considerations in relativistic statistical physics show that the entropy, for instance, depends on the choice of a coordinate for time. Cf. Landau-Lifschitz, Fluid Mechanics, second edition [43].

_{0} is given, this decomposition is given by a set of positive hermitian commuting projectors of sum equal to the identity. The additional data for recovering the operator X is one real eigenvalue for each projector. The underlying fact from linear algebra is that every hermitian matrix is diagonalizable in a unitary basis, which means that

_{j} are real, pairwise distinct, and where the matrices E_{j} are hermitian projectors, which satisfy, for any j and k ≠ j,

^{N}, its spectral measure gives an ordinary partition of the canonical basis, and we recover the classical situation.

_{0}.

_{0} fully justifies the limitation to orthogonal decompositions.

_{j} by the following formula

_{j}) by the formula

^{*}ρY, normalized to be of trace 1. However, here, as is done in most texts on Quantum Mechanics, we will mostly restrict ourselves to the case of hermitian projectors, i.e., Y^{*} = Y.

**Remark 4.** What justifies these definitions of probability and conditioning? First, they allow us to recover the classical notions when we restrict to diagonal densities and diagonal observables, i.e., when ρ is diagonal, real, positive, of trace 1, Z is diagonal, and the E_{j} are diagonal, in which case they give a partition of Ω. The mean of Z is its amplitude. The probability of the event Z = z_{j} is the sum of the probabilities p(ω) = ρ_{ωω} for ω in the image of E_{j}; this is the trace of ρE_{j}. Moreover, the conditioning by this event is the probability obtained by projection on this image, as prescribed by the above formula.
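The two rules discussed in this remark, the probability Tr(ρE_{j}) and the conditioning E_{j}ρE_{j}/Tr(ρE_{j}) by a hermitian projector, can be sketched in a few lines. The sketch uses plain nested lists for matrices; the helper names are illustrative and not taken from the text.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))

def probability(rho, projector):
    """P_rho(Z = z_j) = Tr(rho E_j); real for hermitian rho and E_j."""
    return trace(matmul(rho, projector)).real

def condition(rho, projector):
    """rho | (Z = z_j) = E_j rho E_j / Tr(rho E_j), for a hermitian projector E_j."""
    p = probability(rho, projector)
    if p == 0:
        raise ValueError("cannot condition on an impossible event")
    compressed = matmul(matmul(projector, rho), projector)
    return [[entry / p for entry in row] for row in compressed]
```

With a diagonal ρ and diagonal projectors these functions reduce to the classical probability of a block of Ω and the classical conditional law, as the remark states.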

_{j}, for the state ψ, is equal to

_{j}, the system is reduced to the space E_{j}, and every pure state ψ is reduced to its projection E_{j}ψ, which is compatible with the above definition of conditioning for pure states. Here again, the general formula can be deduced from Equation (74). The division by the probability serves to normalize to trace 1. Thus conditioning in general is given by orthogonal projection in E, and it corresponds to the operation of measurement.

**Definition 3.** The density of states associated to a given variable Z and a given density ρ is given by the sum:

_{j})_{j∈J} designates the spectral decomposition of Z, also named the spectral measure of Z. Thus ρ_{Z} is usually seen as representing the density of states after the measurement of the variable Z. This formula is usually interpreted by saying that the statistical analysis of the repeated measurements of the observable Z transforms the density ρ into the density ρ_{Z}.

_{Z} is better understood as being a collection of conditional probabilities ρ|(Z = z_{j}), indexed by j.
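The sum in Definition 3 is not reproduced in the text above. The sketch below assumes the usual form ρ_{Z} = Σ_j E_{j}ρE_{j}, which is consistent with reading ρ_{Z} as the collection of the conditioned densities weighted by their probabilities; the function names are illustrative.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def density_after_measurement(rho, projectors):
    """rho_Z = sum_j E_j rho E_j (assumed form of the sum in Definition 3)."""
    n = len(rho)
    total = [[0] * n for _ in range(n)]
    for e in projectors:
        term = matmul(matmul(e, rho), e)
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total
```

Under this assumption the operation keeps the diagonal blocks of ρ with respect to the spectral decomposition of Z, removes the off-diagonal blocks, and preserves the trace.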

_{t} = U_{t}ρ and that the observables evolve as ${Z}_{t}={U}_{t}Z{U}_{t}^{-1}$, with U_{t} respecting the fundamental scalar product h_{0}. In fact, as we already mentioned, a deeper principle associates the choice of a time coordinate t to the choice of h_{0}, which gives birth to a unitary group U(E; h_{0}), isomorphic to U_{N}(ℂ). For stationary systems the family (U_{t})_{t∈ℝ} forms a one-parameter group, i.e., U_{t+s} = U_{t}U_{s} = U_{s}U_{t}, and there exists a hermitian generator H of U_{t} in the sense that U_{t} = exp(2πitH/h); by definition, this particular observable H is the energy, the most important observable. Even if we have a privileged basis, like Ω in the relation with classical probability, the consideration of another basis which makes the energy H diagonal is of great importance. In the stationary case, a symmetry of the dynamical system is defined as any unitary operator which commutes with the energy H. The set of symmetries forms a Lie group G, a closed sub-group in U_{N}. The infinitesimal generators are considered as hermitian observables (obtained by multiplying the elements of the Lie algebra L(G) by i); in general they do not commute with each other.

_{N} is natural for semi-classical study: it is the diagonal torus ${\mathbb{T}}^{N}$, whose elements are the diagonal matrices with elements of modulus 1; they correspond to sets of angles. The group ${\mathcal{S}}_{N}$ normalizes the torus ${\mathbb{T}}^{N}$, i.e., for each permutation σ and each diagonal element Z, the matrix σZσ^{−1} is also diagonal; its elements are the same as the elements of Z but in a different order. The subgroup generated by ${\mathcal{S}}_{N}$ and ${\mathbb{T}}^{N}$ is the full normalizer of ${\mathbb{T}}^{N}$.
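The statement that a permutation normalizes the diagonal torus can be checked on a 3 × 3 example. In the sketch below a permutation is encoded as a list sending index j to sigma[j]; the helper names are illustrative.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def permutation_matrix(sigma):
    """Matrix sending the basis vector e_j to e_{sigma[j]}."""
    n = len(sigma)
    return [[1 if sigma[j] == i else 0 for j in range(n)] for i in range(n)]

def conjugate_by_permutation(sigma, z):
    """Return sigma Z sigma^{-1}; a permutation matrix is inverted by transposition."""
    p = permutation_matrix(sigma)
    p_inv = [list(col) for col in zip(*p)]
    return matmul(matmul(p, z), p_inv)
```

The conjugated matrix is again diagonal, and the entry that sat at position j of Z is found at position sigma[j].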

_{N}(ℂ) to an algebraic subgroup G_{ℂ}. For instance, by choosing a symmetric invertible bilinear form on E we obtain O_{N}(ℂ), or, when N is even, by choosing an antisymmetric invertible bilinear form on E we obtain Sp_{N}(ℂ). In each of these cases there exists a special maximal torus (formed by the complexification of a maximal abelian subgroup T of unitary operators in G_{ℂ}), and a Weyl group, which is the quotient of the normalizer N(T) by the torus T itself. This Weyl group generalizes the permutation group when more algebraic structures are given in addition to the linear structure. The compact group of symmetries is the intersection G of G_{ℂ} with U_{N}. In fact, given any compact Lie group G_{c}, and any faithful representation r_{c} of G_{c} in ℂ^{N}, we can restrict real observables to generators of elements in G_{c}, and general observables to complex combinations of these generators, which integrate in a reductive linear group G. The spectral decomposition corresponds to the restriction to parabolic sub-groups of G_{ℂ}. The densities of states are restricted to the Satake compactification of the symmetric space G_{ℂ}/G_{c} [45].

#### 4.2. Quantum Information Structures and Density Functors

_{1}, Y_{2}, …, Y_{m}) as joint variables. However, the efforts made in Physics and Mathematics were not sufficient to attribute a clear probability to the joint events (Y_{1} = y_{1}, Y_{2} = y_{2}, …, Y_{m} = y_{m}), when Y_{1}, …, Y_{m} do not commute; we even suspect that this difficulty reveals a principle, that information requires a form of commutativity. Thus, in our study, we will adopt the convention that every time we consider joint observables, they do commute. Hence we will consider only collections of commuting hermitian observables; their natural amplitudes in a given state are vectors in ℝ^{m}. However, we do not exclude from our theory the consideration of sequences (Y_{1}; …; Y_{m}) such that the Y_{i} do not commute.

_{1}, Y_{2}, …, Y_{m}) define a linear decomposition of the total space E in direct orthogonal sum

_{α}; α ∈ A is the collection of joint eigenspaces of the operators Y_{j}. Note that any orthogonal decomposition can be defined by a single operator.

^{m} to End(E). Then assigning a probability number and performing probability conditioning can be seen as functorial operations.

_{α} the subspace of E or the orthogonal projection on this subspace.

_{α}; α ∈ A refines a decomposition E′_{β}; β ∈ B, when each E′_{β} is a sum of spaces E_{α} for α in a subset A_{β} of A. In such a case, we say that E_{α}; α ∈ A divides E′_{β}; β ∈ B.

**S** of decompositions X of E in direct sum, such that when Y and Z are elements of **S** which refine X ∈ **S**, then Y, Z commute and the finer decomposition (Y, Z) they generate belongs to **S**. In this text, we will only consider orthogonal decompositions.

**S**, whose objects are the elements of **S**, and whose arrows X → Y are given by the divisions X|Y between the decompositions in **S**.

**S**, and is a final object. If not, we will not get a topos.

_{L} which has for open sets the intervals ]0, r[ for 0 ≤ r ≤ 1, and particular points in their topos are given by arbitrary probabilized spaces, which is far from the objects we consider, because our classical topos are attached to sigma-algebras over a given set. In fact, our aim is more to develop a kind of geometry in this context, by using homological algebra, in the spirit of Artin, Grothendieck, Verdier, when they developed topos for studying the geometry of schemes.

**Example 5.** The most interesting structures **S** seem to be provided by the quantum generalization of the simplicial information structure in classical finite probability. A finite family of commuting decompositions Σ = {S_{1}, …, S_{n}} is given; they diagonalize in a common orthogonal basis, but it can happen that not all diagonal decompositions associated to the maximal torus belong to the set of joints W(Σ). In such a case a subgroup G_{Σ} appears, which corresponds to the stabilizer of the finest decomposition S_{[n]} = (S_{1}…S_{n}). This group is in general larger than a maximal torus of U_{N}; it is a product of unitary groups (corresponding to common eigenvalues of observables in W(Σ)), and it is named a Levi subgroup of the unitary group. In addition we consider a closed subgroup G in the group U(E; h_{0}) (which could be identified with U_{N}), and all the conjugates gYg^{−1} of elements of W(Σ) by elements of G; this gives a manifold of commutative observable families Σ_{g}; g ∈ G. More generally we could consider several families Σ_{γ}; γ ∈ Γ of commuting observables, where Γ is any set. It can happen that an element of Σ_{γ} is also an element of Σ_{λ} for λ ≠ γ. The family Γ ∗ Σ of the Σ_{γ}, when γ describes the set Γ, forms a quantum information structure. The elements of this structure are (perhaps ambiguously) parameterized by the product of an abstract simplex ∆(n) with the set Γ (in particular Γ = G for conjugated families).

_{γ} of simplicial sub-complexes of ∆(n). In the invariant case, when Γ = G, several restrictions could be useful, for instance using the structure of the manifold of the conjugation classes of G_{Σ} under G. The simplest case is given by taking the same complex K for all conjugates gΣg^{−1}. By definition this latter case is a simplicial invariant family of quantum observables.

**S** is a subspace E_{A}, which is an element of one of the decompositions X ∈ **S**. For instance, if Y = (Y_{1}, …, Y_{m}), the joint event A = (Y_{1} = y_{1}, Y_{2} = y_{2}, …, Y_{m} = y_{m}) gives the space E_{A} which is the maximal vector subspace of E where A happens, i.e.,

**B** of E such that any decomposition in **S** is divided by **B**.

**B** is too rigid; in particular it forbids invariance by the unitary group U(h_{0}). Thus we decided that a better analog of the Boolean algebra $\mathcal{B}$ is the set U**B** of all decompositions that are deduced from a given **B** by unitary transformations.

**Q**_{1} of the space **P** = ℙ**H**_{+} of hermitian positive matrices modulo multiplication by a constant. Concretely, we identify the elements of **Q**_{1} with positive hermitian operators ρ such that Tr ρ = 1. The space **P** is naturally stratified by the rank of the form; the largest cell ℙ**H**_{++} corresponds to the non-degenerate forms; the smallest cells correspond to the rank one forms, which are called pure states in Quantum Mechanics.

**Q**_{1} of **P** which are adapted to **S**, i.e., which satisfy that if ρ belongs to **Q**_{1}, the conditioning of ρ by elements of **S** also belongs to **Q**_{1}. This means that **Q**_{1} is closed by orthogonal projections on all the elements E_{A} of the orthogonal decompositions X belonging to **S**. Note that a subset of **P** which is closed by all orthogonal projections is automatically adapted to any information category **S**.

_{A} is an elementary event (i.e., a subspace of E), we define the conditioning of ρ by A by the hermitian matrix

_{A} for ρ as the trace:

**Q**_{X} of **Q**_{1}, which contains at least all the forms ρ_{X} where ρ belongs to **Q**_{1}. The natural axiom that we assume for the function X ↦ **Q**_{X} is that for each arrow of division X → Y, the set **Q**_{Y} contains the set **Q**_{X}; then we denote by Y_{∗} the injection from **Q**_{X} to **Q**_{Y}. The fact that **Q**_{X} is stable by conditioning by every element of a decomposition Y which is less fine than X is automatic; it follows from the fact that **Q**_{1} is adapted to **S**. We will use conditioning in this way.

**Q** such a functor X ↦ **Q**_{X} from the category **S** to the category of quantum probabilities, with the arrows given by direct images. The set **Q**_{1} is the value of the functor **Q** for the certitude 1. We must recall that many choices are possible for the functor when **Q**_{1} is given; the two extremes being the functor **Q**^{max} where **Q**_{X} = **Q**_{1} for every X, and the functor **Q**^{min} where **Q**_{X} is restricted to the set of forms ρ_{X} where ρ describes **Q**_{1}; in this last case the elements of **Q**_{X} are positive hermitian forms on E, which are decomposed in blocks according to X.

**Q**^{min} appears to make more sense than **Q**^{max}, but we prefer to consider both of them.

**Q**^{can}(**S**), is canonically associated to a quantum information structure **S**:

**Definition 4.** The canonical density functor ${\mathbf{Q}}_{X}^{can}(\mathbf{S})$ is made by all positive hermitian forms matched to X, i.e., all the forms ρ_{X} when ρ describes P**H**_{+}.

**Q**^{min} associated to the full set **Q**_{1} = P**H**_{+}. When the context is clear, we will simply write **Q**^{can}.

**Q**_{Y} than in **Q**_{X}, but there exist fewer classical laws at the place Y than at the place X, because classical laws are defined on smaller sigma-algebras.

_{ρ}(A) to an event A; then, for an event which is measurable for Y, the law Y_{∗}ρ_{X} gives the same result as the law ρ_{X}.

_{X}Y_{∗} between **Q**_{X} and **Q**_{Y}, not a map: we say that the pair (ρ_{X}, ρ_{Y}) in **Q**_{X} × **Q**_{Y} belongs to q_{X}Y_{∗} if, for any event which is measurable for Y, we have the equality of probabilities

**S**, i.e., the full subcategory associated to an initial object X_{0}. This family is a classical information structure. Conversely, if we start with a classical information structure $\mathcal{S}$, made by partitions of a finite set Ω, we can always consider it as a quantum structure associated to the vector space E = ℂ^{Ω} freely generated over ℂ by the elements of Ω. Note that E comes with a canonical positive definite form h_{0}, and, to be interesting from the quantum point of view, it is better to extend $\mathcal{S}$ by applying to it all unitary transformations of E, generating a quantum structure $\mathbf{S}=U\mathcal{S}$.

**Remark 5.** Suppose that **S** is unitary invariant; we can define a larger category **S**^{U} by taking as arrows the isomorphisms of ordered decompositions, and closing by all compositions of arrows of **S** with them. Such an invariant extended category **S**^{U} is not far from being equivalent to the category ${\mathcal{S}}^{\mathfrak{S}}$, made by adding arrows for permutations of the sets Ω/X (cf. above section), from the point of view of category theory: let us work for an instant, as we will do in the last part of this paper, with ordered partitions of Ω, itself equipped with an order, and ordered orthogonal decompositions of E. In this case we can associate to any ordered decomposition X = (E_{1}, …, E_{m}) of E the unique ordered partition of Ω compatible with the sequence of dimensions and the order of Ω. It gives a functor τ from **S** to $\mathcal{S}$ such that $\iota \circ \tau =I{d}_{\mathcal{S}}$, where ι denotes the inclusion of $\mathcal{S}$ in **S**. These two functors are extended, preserving this property, to the categories **S**^{U} and ${\mathcal{S}}^{\mathfrak{S}}$. In fact, the functor ι sends a permutation to the unitary map which acts by this permutation on the canonical basis, and the functor τ sends a unitary transformation g between X ∈ **S** and gXg^{∗} ∈ **S** to the permutation it induces on the orthogonal decompositions. Moreover, consider the map f which associates to any X ∈ **S**^{U} the unique morphism from the decomposition ι ◦ τ(X) to X; it is a natural transformation from the functor ι ◦ τ to the functor $I{d}_{{\mathcal{S}}^{U}}$, which is invertible; hence it defines an equivalence of categories between ${\mathcal{S}}^{\mathfrak{S}}$ and **S**^{U}. However, a big difference begins with probability functors.

**Q** be a quantum density functor adapted to **S**, and denote by ι^{∗}**Q** the composite functor on $\mathcal{S}$; we can consider the map Q which associates to $X\in \mathcal{S}$ the set of classical probabilities ℙ_{ρ} for ρ ∈ **Q**_{X}. If X divides Y, the fact that the direct image Y_{∗}ℙ(ρ) of ρ ∈ **Q**_{X} coincides with the law ${\mathbb{P}}_{{Y}_{*}(\rho )}$ gives the following result:

**Lemma 4.** ρ ↦ ℙ_{ρ} is a natural transformation from the functor ι^{∗}**Q** to the functor Q.

**Definition 5.** This natural transformation is called the Trace, and we denote by Tr_{X} its value in X, i.e., Tr_{X}(ρ) = ℙ_{ρ}, seen as a map from **Q**_{X} to ${\mathcal{Q}}_{X}$.

**Q**_{X}.

#### 4.3. Quantum Information Homology

**S**, equipped with the sheaf of monoids {**S**_{X}; X ∈ **S**}. In the ringed topos of sheaves of **S**-modules, the choice of a probability functor **Q** generates remarkable elements in this topos, formed by the functional space **F** of measurable functions on **Q** with values in ℝ, the action of the monoid (or the generated ring) being given by averaged conditioning, and the arrows being given by transposition of direct images. Then, the quantum information co-homology is the topos co-homology:

_{X} of m observables Y_{1}, …, Y_{m} divided by X, and one density ρ indexed by X ∈ **S**, is said to be local when, for any decomposition X dividing a decomposition Y, we have, for each ρ in **Q**_{X},

_{X} is an element of the topos.

_{X}, X ∈ **S** is a natural transformation F from a free functor **S**_{m} to the functor **F**.

_{X}(Y_{1}; …; Y_{n}; ρ) depends only on the family of conditioned densities $E_{A_i}^{*}\rho E_{A_i};\ i=0,\dots,m$, where A_{i} is one of the possible events defined by Y_{i}.

**Q**; for instance it is false for a **Q**^{max}, but it is true for a **Q**^{min}.

**Q**^{max} is given by a function F(ρ) which is independent of X. It is local (in the sense of topos that we adopt) but it is non-local in the apparently more natural sense that it depends only on ρ_{X}. It is important to keep this quantum particularity in mind for understanding the following discussion.

_{A}’s are the spectral projectors of the bundle Y. In this definition there is no necessity to assume that Y commutes with the Y_{j}’s.

_{A}^{∗}ρE_{A} is non-zero, ρ|A is equal to E_{A}^{∗}ρE_{A}/Tr(E_{A}^{∗}ρE_{A}), and verifies the normalization condition that the trace equals one. When E_{A}^{∗}ρE_{A} is equal to zero, the factor Tr(E_{A}^{∗}ρE_{A}) is zero; then by convention the corresponding term F is absent.

**S**_{X}.

**Q** which is adapted to **S**, the Von Neumann entropy defines a local 0-cochain, that we will call S_{X}, and which is simply the restriction of S to the set **Q**_{X}. If ρ belongs to **Q**_{X} and if X divides Y, the law Y_{∗}ρ, which is the same hermitian form as ρ, belongs to **Q**_{Y} by functoriality; thus S(Y_{∗}ρ) = S(ρ) is translated by S_{X}(ρ) = S_{Y}(Y_{∗}ρ). This 0-cochain will be simply named the Von Neumann entropy.

**Q**^{max}, S_{X} gives the same value at all places X. In the case of **Q**^{min} it coincides with S(ρ_{X}), where ρ_{X} denotes the restriction to the decomposition X.

_{X}) is not a local 0-cochain for **Q**^{max}. In fact, in the case of **Q**^{max} we have the same set **Q** = **Q**_{X} for every place X; thus, if we take for X a strict divisor of Y and a density ρ such that the spectra of the restrictions ρ_{Y} and ρ_{X} are different, then, in general, we do not have S_{X}(ρ) = S_{Y}(Y_{∗}ρ), even if, as is the case in the quantum context, Y_{∗}ρ = ρ.

**Q**^{max}, where every function of ρ independent of X is a cochain of degree zero, the particular functions which depend only on the spectrum of ρ are invariant under the action of the unitary group, and they are the only 0-cochains which are invariant by this group.
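The invariance of spectral functions under the action g.ρ = gρg^{∗} can be illustrated with the Von Neumann entropy of a 2 × 2 density. The sketch uses the closed-form eigenvalues of a 2 × 2 hermitian matrix; the restriction to dimension 2 and the function names are illustrative assumptions.

```python
import math

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def eigenvalues_2x2(rho):
    """Eigenvalues of a 2x2 hermitian matrix, by the closed-form formula."""
    a, d = rho[0][0].real, rho[1][1].real
    b = abs(rho[0][1])
    mean, radius = (a + d) / 2, math.hypot((a - d) / 2, b)
    return mean - radius, mean + radius

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho), computed from the spectrum."""
    return -sum(x * math.log2(x) for x in eigenvalues_2x2(rho) if x > 1e-15)

def conjugate(g, rho):
    """g.rho = g rho g^*, with g^* the conjugate transpose of g."""
    g_star = [[complex(g[j][i]).conjugate() for j in range(2)] for i in range(2)]
    return matmul(matmul(g, rho), g_star)
```

Conjugating a diagonal density by a rotation produces off-diagonal terms but leaves the spectrum, and hence the entropy, unchanged.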

**Definition 6.** Suppose that **S** and **Q** are invariant by the unitary group, as is U**B**; we say that an m-cochain F is invariant if, for every X in **S** dividing Y_{1}, …, Y_{m} in **S**, every ρ in **Q**_{X} and every g in the group U(h_{0}), we have

^{∗}, g.Y_{i} = gY_{i}g^{∗}; i = 1, …, m and g.ρ = gρg^{∗}.

**S**_{X} on cochains respects the invariance.

^{∗}(S; Q).

**Q**_{X}.

_{j}} and the quantum law ρ is

_{∗}ρ) when X divides Y. Thus the Shannon (or Gibbs) entropy is not a local 0-cochain, but it is a local 1-cochain, i.e., if X → Y → Z we have

**Q**^{min}.

**Lemma 5.** Let X, Y be two commuting families of observables; we have

**Proof.** We denote by α, β, … the indices of the different values of X, by k, l, … the indices of the different values of Y, and by i, j, … the indices of a basis I_{k,α} of eigenvectors of the conditioned density ${\rho}_{k,\alpha}={E}_{k,\alpha}^{*}\rho {E}_{k,\alpha}$ constrained by the projectors E_{k,α} of the pair (Y, X). The probability ${p}_{k}={P}_{\rho}(X={\xi}_{k})$ is equal to the sum over i, α of the eigenvalues λ_{i,k,α} of ρ_{k,α}. We have

**Remark 6.** Taking X = 1, or any scalar matrix, the preceding Lemma 5 expresses the fact that classical entropy is a derived quantity measuring the defect of equivariance of the quantum entropy:

**Lemma 6.** For any X ∈ **S** dividing Y ∈ **S**, and ρ ∈ **Q**_{X},

**Proof.** This is exactly what Lemma 5 says in this particular case, because in this case (X, Y) = X, and, by definition, we have $\widehat{\delta}({S}_{X})(Y;\rho )=Y.{S}_{X}(\rho )-{S}_{X}(\rho )$.
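The displayed statements of Lemmas 5 and 6 are not reproduced above, so the following sketch only illustrates the standard identity that we take to be at work when ρ is block-diagonal with respect to Y: writing p_j = Tr(E_{j}ρE_{j}), one has S(ρ) = H(Y; ℙ_{ρ}) + Σ_j p_j S(ρ|(Y = y_j)), i.e., Y.S(ρ) − S(ρ) = −H(Y; ℙ_{ρ}). The restriction to blocks of size 1 or 2 (so that eigenvalues have a closed form) and all names are assumptions.

```python
import math

def block_eigenvalues(block):
    """Eigenvalues of a 1x1 or 2x2 hermitian block (closed form)."""
    if len(block) == 1:
        return [block[0][0].real]
    a, d, b = block[0][0].real, block[1][1].real, abs(block[0][1])
    mean, radius = (a + d) / 2, math.hypot((a - d) / 2, b)
    return [mean - radius, mean + radius]

def h(values):
    """-sum x log2 x over the strictly positive values."""
    return -sum(x * math.log2(x) for x in values if x > 1e-15)

def entropy_terms(blocks):
    """For rho equal to the direct sum of the given unnormalized blocks,
    return (S(rho), H(Y; P_rho), Y.S(rho)), where Y.S is the mean of the
    entropies of the conditioned densities."""
    spectra = [block_eigenvalues(b) for b in blocks]
    probs = [sum(s) for s in spectra]              # p_j = Tr(E_j rho E_j)
    total = h([x for s in spectra for x in s])     # Von Neumann entropy of rho
    classical = h(probs)                           # Shannon entropy of Y
    mean_cond = sum(p * h([x / p for x in s])
                    for p, s in zip(probs, spectra) if p > 0)
    return total, classical, mean_cond
```

In this block-diagonal setting the difference between the mean conditioned entropy and the Von Neumann entropy is exactly minus the classical entropy of the law of Y.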

**Q**, thus for **Q**^{min} and for **Q**^{max} as well.

_{0} in **S**, i.e., a maximal set of commuting observables in **S**, the elements of this maximal partition form a finite set Ω_{0}. If **S** is invariant by the group U(E; h_{0}), all the maximal observables are deduced from X_{0} by applying a unitary base change. Suppose that the functor **Q** is also invariant; then we automatically get a symmetric classical structure of information $\mathcal{S}$ on Ω_{0}, given by the elements of **S** divided by X_{0}. And $\mathcal{S}$ is equipped with a symmetric classical functor of probability, given by the probability laws associated to the elements of $\mathcal{S}$.

_{ρ}for each ρ, and we noticed that the trace is compatible with invariance and symmetry by permutations.

**Definition 7.** To each classical co-chain $F^{0}$ we can associate a quantum co-chain $F = \mathrm{tr}^{*}F^{0}$ by putting

**Proposition 3.** (i) The trace of co-chains defines a map of the classical information Hochschild complex to the quantum one, which commutes with the co-boundaries, i.e., the map $\mathrm{tr}^{*}$ defines a map from the classical information Hochschild complex to the quantum Hochschild complex; (ii) this map sends symmetric cochains to invariant cochains; it induces a natural map from the symmetric classical information co-homology ${H}_{\mathfrak{S}}^{*}(\mathcal{S},\mathcal{Q})$ to the invariant quantum information co-homology $H_{U}^{*}(\mathbf{S};\mathbf{Q})$.

**Remark 7.** In a preliminary version of these notes, we considered the expression $s(X;\rho) = S(\rho_X) - S(\rho)$ and showed that it formally satisfies the 1-cocycle equation. But we suppress this consideration now, because s is not local, thus it plays no interesting role in homology. For instance in $\mathbf{Q}^{min}$, $S(\rho_X)$ is local but S(ρ) is not, and in $\mathbf{Q}^{max}$, S(ρ) is local but $S(\rho_X)$ is not.

**Definition 8.** In an information structure **S** we call edge a pair of decompositions (X, Y) such that X, Y and XY belong to **S**; we say that an edge is rich when both X and Y have at least two elements and XY cuts those two in four distinct subspaces of E. The structure **S** is connected if every two points are joined by a sequence of edges, and it is sufficiently rich when every point belongs to a rich edge. We assume a maximal set of subspaces U**B** is given in the Grassmannian of E, in such a way that the maximal elements $X_0$ of **S** (i.e., initial in the category) are made by pieces in U**B**. The density functor **Q** is said to be complete with respect to **S** (or U**B**) if for every X, the set $\mathbf{Q}_X$ contains the positive hermitian forms on the blocks of X that give scalar blocks $\rho_{\alpha\beta}$ for two elements $E_\alpha$, $E_\beta$ of a maximal decomposition. (All that is simplified when we choose a basis, and take maximal commutative subalgebras of operators, but we want to be free to consider simplicial complexes.)

**Theorem 3.** (i) For any unitary invariant quantum information structure **S**, which is connected and sufficiently rich, and for the canonical invariant density functor $\mathbf{Q}^{can}(\mathbf{S})$ (i.e., the density functor which is minimal and complete with respect to **S**), the invariant information co-homology of degree one ${H}_{U}^{1}(\mathcal{S};\mathcal{Q})$ is zero. (ii) Under the same hypothesis, the invariant co-homology of degree zero has dimension one, and is generated by the constants. Then, up to an additive constant, the only invariant 0-cochain which has the Shannon entropy as co-boundary is (minus) the von Neumann entropy.

**Proof.** (I) Let X, Y be two orthogonal decompositions of E belonging to **S** such that (X, Y) belongs to **S**, and ρ an element of **Q**. We name $A_{k_i};\ i=1,\dots,m$ the summands of X, and $B_{\alpha_j};\ j=1,\dots,l$ the summands of Y; the projections $E_{k_i}\rho E_{k_i};\ i=1,\dots,m$ resp. $E_{\alpha_j}\rho E_{\alpha_j};\ j=1,\dots,l$ of ρ on the summands of X, resp. Y, are denoted by $\rho_{k_i};\ i=1,\dots,m$ and $\rho_{\alpha_j};\ j=1,\dots,l$ respectively. The projections by the commutative products $E_{k_i}E_{\alpha_j}$ are denoted by $\rho_{k_i,\alpha_j};\ i=1,\dots,m,\ j=1,\dots,l$.

$\mathbf{Q}^{min}$, F is a function of the ${\rho}_{{k}_{i}}$, H a function of the ${\rho}_{{\alpha}_{j}}$ and G a function of the ${\rho}_{{k}_{i},{\alpha}_{j}}$, but there is no necessity to assume this property; we can always consider these functions restricted to diagonal blocks, which are arbitrary due to the completeness hypothesis.

_{α} resp. $A_i$. The co-cycle equation gives the following two equations, which are exchanged by permuting X and Y:

_{k,α} are zero except for $(k_1, \alpha_2)$ and $(k_j, \alpha_1)$ for j = 2, …, m. We denote by $h_1$ the form ${\rho}_{{k}_{1},{\alpha}_{2}}$ and by $h_i$ the form ${\rho}_{{k}_{i},{\alpha}_{1}}$, for i = 2, …, m. Remark that $\mathrm{Tr}(h_1 + h_2 + \dots + h_m) = 1$.

_{a} of Z; because the equation $f_X(Z, Z; \rho) = f_X(Z; \rho) + Z.f_X(Z; \rho)$ implies $Z.f_X(Z; \rho) = 0$, and if ρ has only one non-zero factor $\rho_a$, we have

_{3} = … = $h_m$ = 0 we have $F(0, h_2/(1-x_1), 0, \dots, 0) = 0$ for the reason which eliminated the $H(({\rho}_{{\alpha}_{j}}|{k}_{i});\ j)$; thus we obtain

_{1}, $h_2$ only, and that of course they coincide as functions of these small blocks.

_{00}, $\rho_{01}$, $\rho_{10}$, $\rho_{11}$, where the first index refers to Y and the second index refers to Z, but the blocks that are allowed for Y and Z are more numerous than four; there exist off-diagonal blocks, and their role will be important in our analysis. For Y we have matrices ${\rho}_{0}^{0}$ and ${\rho}_{1}^{0}$, and for Z we have matrices ${\rho}_{0}^{1}$ and ${\rho}_{1}^{1}$;

_{Y}or ρ

_{Z}:

^{∗}, and ρ by $g\rho g^{*}$, the value of ${F}_{Y}({\rho}_{0}^{0},{\rho}_{1}^{0})$ does not change. Our claim is that the only functions $F_Y$ which are compatible with Equation (106) for every ρ are functions of the traces of the blocks.

_{Y} of the right-hand side involves the eight blocks, but all the other functions involve only the four diagonal blocks. Thus our claim follows from the following result:

**Lemma 7.** A measurable function f on the set H of hermitian matrices which is invariant under conjugation by the unitary group $U_n$ and invariant by the change of the coefficient $a_{1n}$, the farthest from the diagonal, is a function of the trace.

**Proof.** An invariant function for the adjoint representation is a function of the traces of the exterior powers $\Lambda^{k}(\rho)$, but these traces are coefficients in the basis $e_{i_1} \wedge e_{i_2} \wedge \dots \wedge e_{i_k}$, and the elements divisible by $e_1 \wedge e_n$ cannot be neglected, as soon as k ≥ 2.

_{Y}, $F_Z$ comes from the image of $\mathrm{tr}^{*}$ in Proposition 3. Then the recurrence relation (100) implies that the same is true for the whole co-cycle F.

_{X}(ρ), which depends only on the spectrum of ρ, is a constant. We know that a spectral function is a measurable function $\varphi(\sigma_1; \sigma_2; \dots)$ of the elementary symmetric functions $\sigma_1 = \sum_i \lambda_i,\ \sigma_2 = \sum_{i<j} \lambda_i\lambda_j, \dots$.

_{X}(ρ) = φ

_{X}(σ

_{1}, σ

_{2}, …),

_{01} = $\lambda_{00}$ = 0, and varying $\lambda_{10}$, $\lambda_{11}$, we find that f(x, y) is the sum of a constant and a linear function.

_{X} must be the sum of a constant and a linear function for every X. However, a linear symmetric function is a multiple of $\sigma_1$. As ρ is normalized by the condition Tr(ρ) = 1, only the constant survives.

**Remark 8.** In his book “Structure des Systèmes Dynamiques”, J-M. Souriau [48] showed that the mass of a mechanical system is a degree one class of co-homology of the relativity group with values in its adjoint representation; this class being non-trivial for classical Mechanics, with the Galileo group, and becoming trivial for Einstein relativistic Mechanics, with the Lorentz-Poincaré group. Even if we are conscious of the big difference with our construction, the above result shows that the same thing happens for the entropy, but going from classical statistics to quantum statistics.

## 5. Product Structures, Kullback–Leibler Divergence, Quantum Version

_{j} (or density of states respectively) on Ω (or E respectively) belonging to the space ${\mathcal{Q}}_{X}$ that are absolutely continuous with respect to $P_0$, and several decompositions $Y_i$ less fine than X. To be homogeneous co-chains these functions have to behave naturally under direct image $Y_{*}(P_i)$, and to satisfy the equivariance relation:

_{X}(resp.

**S**

_{X}), where

_{0}, which justifies the comma notation.

_{X}($X_1$; …; $X_m$; $P_0$; $P_1, P_2, \dots, P_n$) which behave naturally under direct images, without equivariance condition.

^{1}-density dQ/dP , and the definition is

**Proposition 4.** The map which associates to X in $\mathcal{S}$, Y divided by X, and two laws P, Q the quantity $H(Y_{*}P; Y_{*}Q)$ defines a non-homogeneous 1-cocycle, denoted $H_X(Y; P; Q)$.

**Proof.**As we already know that the classical Shannon entropy is a non-homogeneous 1-cocycle, it is sufficient to prove the Hochschild relation for the new function

_{ij} (resp. $q_{ij}$) the probability for P (resp. Q) of the event $Y = x_i$, $Z = y_j$, and by $p^{j}$ (resp. $q^{j}$) the probability for P (resp. Q) of the event $Z = y_j$; then the probability $p_i^j$ (resp. ${q}_{i}^{j}$) of $Y = x_i$ knowing that $Z = y_j$ for P (resp. for Q) is equal to $p_{ij}/p^{j}$ (resp. $q_{ij}/q^{j}$), and we have

_{m}(Z; P; Q) and the second is $(Z.H_m)(Y; P; Q)$, Q.E.D.
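The Hochschild relation used in this proof can be checked numerically on a small example. The sketch below assumes the usual conventions, which are not restated in this passage: the mixed entropy is taken to be $H_m(P;Q) = -\sum_i p_i \ln q_i$, and the action of Z averages the conditioned values with the weights $p^{j}$. The two joint tables are arbitrary illustrative data.

```python
import math

def marginal_z(joint):
    """Law of Z from a table joint[i][j] = Prob(Y = x_i, Z = y_j)."""
    return [sum(row[j] for row in joint) for j in range(len(joint[0]))]

def flatten(joint):
    """Law of the joint variable (Y, Z) as a flat list."""
    return [x for row in joint for x in row]

def mixed_entropy(p, q):
    """Assumed form H_m(P; Q) = -sum_i p_i ln q_i (terms with p_i = 0 are dropped)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def conditioned_mixed_entropy(p_joint, q_joint):
    """Assumed form (Z.H_m)(Y; P; Q) = sum_j p^j H_m(P(Y | Z = y_j); Q(Y | Z = y_j))."""
    pz, qz = marginal_z(p_joint), marginal_z(q_joint)
    total = 0.0
    for j, (pj, qj) in enumerate(zip(pz, qz)):
        if pj == 0:
            continue  # conditioning on a null event contributes nothing
        p_cond = [row[j] / pj for row in p_joint]
        q_cond = [row[j] / qj for row in q_joint]
        total += pj * mixed_entropy(p_cond, q_cond)
    return total

# Hypothetical joint laws of (Y, Z) under P and under Q.
p_joint = [[0.10, 0.30], [0.25, 0.35]]
q_joint = [[0.20, 0.20], [0.40, 0.20]]

lhs = mixed_entropy(flatten(p_joint), flatten(q_joint))         # H_m(Y, Z; P; Q)
rhs = (mixed_entropy(marginal_z(p_joint), marginal_z(q_joint))  # H_m(Z; P; Q)
       + conditioned_mixed_entropy(p_joint, q_joint))           # (Z.H_m)(Y; P; Q)
```

The two sides agree up to rounding; subtracting the analogous relation for the Shannon entropy of P gives the corresponding relation for the Kullback–Leibler divergence.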

_{X}(Y; Z; P; Q) = $H_X(Y; P; Q) - H_X(Z; P; Q)$, named Kullback-divergence variation.

_{X}(Y ; ρ; σ) by the formula

_{k} associated to Y and where $\rho_k$ (resp. $\sigma_k$) denotes the matrix $E_k^{*}\rho E_k$ (resp. $E_k^{*}\sigma E_k$). It is the Kullback–Leibler divergence of the classical laws associated to the direct images of ρ and σ respectively.

**Q**

_{X},

**Lemma 8.** For any pair (X, Y) of commuting hermitian operators, such that Y divides X, the function $S_X$ satisfies the relation

_{X}of two variables denotes the mixed entropy, defined by Equation (119).

**Proof.** As in the proof of Lemma 4, we denote by α, β, … (resp. k, l, …) the indices of the orthogonal decomposition Y (resp. X), and by i, j, … the indices of a basis $\varphi_{i,k,\alpha}$ of the space $E_{k,\alpha}$ made by eigenvectors of the matrix $\rho_{k,\alpha} = E_{k,\alpha}^{*}\rho E_{k,\alpha}$ belonging to the joint operator (X, Y). In a general manner, if M is an endomorphism of $E_{k,\alpha}$ we denote by $M_{i,k,\alpha}$ the diagonal coefficient of index (i, k, α). The probability $p_k$ (resp. $q_k$) for ρ (resp. σ) of the event $X = \xi_k$ is equal to the sum over i, α of the eigenvalues $\lambda_{i,k,\alpha}$ of $\rho_{k,\alpha}$ (resp. $\mu_{i,k,\alpha}$ of $\sigma_{k,\alpha}$). And the restricted density $\rho^{Y_k}$ (resp. $\sigma^{Y_k}$), conditioned by $X = \xi_k$, is the sum over α of $\rho_{k,\alpha}$ (resp. of $\sigma_{k,\alpha}$) divided by $p_k$ (resp. $q_k$). We have

## 6. Structure of Observation of a Finite System

#### 6.1. Problems of Discrimination

_{i} of respective cardinalities $m_i$, and we consider the set M of sequences $x_1, \dots, x_n$ where $x_i$ belongs to $M_i$; by definition a system is a subset X of M and a state of the system is an element of X. The set of (classical) observable quantities is a (finite) subset A of the functions from X to ℝ.

_{0} marks the root $s_0$, it means that we aim to measure $F_0(x)$ for the states; then branches issued from $t_0$ are indexed by the values v of $F_0$, and to each branch $F_0 = v$ corresponds a subset $X_v$ of states, giving a partition of X. If $F_{1,v}$ is the observable at the final vertex $\alpha_v$ of the branch $F_0 = v$, the next step in the program is to evaluate $F_{1,v}(x)$ for $x \in X_v$; then branches issued from $\alpha_v$ correspond to values w of $F_{1,v}$ restricted to $X_v$, and so on.

_{0}. The function ν with values in ℕ is called the level in the tree.

_{v} consists of one element only; in this case we decide to extend the tree to the next levels by a branch without bifurcation, for instance by labelling with the same observable and the same value, but it could be any labelling, and its value on $X_v$. In such a way, each level k gives a well defined partition $\pi_k$ of X.

_{k} of Γ, such that its final branches are bearing $\pi_k$. This gives a sequence $\pi_0, \pi_1, \dots, \pi_l$ of finer and finer partitions of X, i.e., a growing sequence of partitions (if the ordering on partitions is the opposite of the sense of arrows in the information category Π(X)). The tree is said to be fully discriminant if the last partition $\pi_l$, which is the finest, is made of singletons.

_{1}, …, $\xi_n$, if we know that m have the same mass and n − m have another common mass, how many measurements must be performed to separate the two groups and decide which is the heavier?

**Remark 9.** The discrimination problem is connected with the coding problem. In fact a finite system X (as we defined it just before) is nothing else than a particular set of words of length n, where the letter appearing at place i belongs to an alphabet $M_i$. Distinguishing between different words with a set A of variables f is nothing else than rewriting the words x of X with symbols $v_f$ (labelling the image f(X)). Determining the most economical manner to do that amounts to finding the smallest maximal length l of words in the alphabet $(f, v_f)$; $f \in A$, $v_f \in f(X)$ translating all the words x in X. This translation, when it is possible, can be read on the branches of a fully discriminating rooted tree, associated to an optimal strategy, of minimal level l. The word that translates x is the sequence $(F_0, v_0), (F_1, v_1), \dots, (F_k, v_k)$, k ≤ l, of the variables put on the vertices along the branch going from 0 to x, and the values of these variables put along the edges of this branch.

#### 6.2. Observation Trees. Galois Groups and Probability Knowledge

_{0} and a family of direct decompositions in linear spaces U**B**). In each situation we have a natural notion of observable quantity: in the case of Ω it is a partition Y compatible with $\mathcal{B}$ (i.e., less fine than $\mathcal{B}$) with a numbering of the parts by the integers 1, …, k if Y has k elements; in the case of E it is a decomposition Y compatible with U**B** (i.e., each summand is a direct sum of elements of one of the decompositions u**B**; for $u \in U(h_0)$), with a numbering of the summands by the integers 1, …, k if Y has k elements. We also have a notion of probability: in the case of (Ω, Y) it is a classical probability law $P_Y$ on the quotient set Ω/Y; in the case of (E, Y) it is a collection of non-negative hermitian forms $h_{Y,i}$ on each summand of Y.

**S**, if necessary): they are categories made by objects that are observables and arrows that are divisions, satisfying the condition that if X ∈ S divides Y and Z in S, then the joint (Y, Z) belongs to S.

_{X} (which can be typographically distinguished in the two cases by ${\mathcal{Q}}_{X}$ and $\mathbf{Q}_X$) of direct images. When $\mathcal{S}$ is a classical subcategory of the quantum structure **S**, we suppose that we have a trace transformation from $\iota^{*}\mathbf{Q}$ to $\mathcal{Q}$, and if **S** and **Q** are unitary invariant, we recall that, thanks to the ordering, we have an equivalence of categories between $\mathbf{S}^{U}$ and $\mathcal{S}$, and a compatible morphism from the functional module ${\mathcal{F}}_{\mathcal{Q}}$ to the functional module ${\mathcal{F}}_{\mathbf{Q}}$.

**S**, U**B**, **Q**. Be careful that now all observable quantities are ordered, either partitions or direct decompositions. We will always assume the compatibility condition between Q and S, meaning that every conditioning of P ∈ Q by an event associated to an element of S belongs to Q.

^{*}which associates the partition Y ○ σ to any partition Y, sends $\mathcal{A}$ into $\mathcal{A}$.

_{*}sends an element of $\mathcal{Q}$ to an element of $\mathcal{Q}$.

_{0}and U

**B**, we do the same by asking in addition that σ is a linear unitary automorphism of E.

**Definition 9.** If X, S, Q, B and A are given, the Galois group $G_0$ is the set of permutations of X (resp. linear maps) that respect S, Q, B and A.

**Example 6.** Consider the system X associated to the simple classical weighing problem: states are parameterized by points with coordinates 0, 1 or −1 in the sphere $S^{n-1}$ of radius 1 in $\mathbb{R}^{n}$, according to their weights, either normal, heavier or lighter. Thus in this case Ω = X possesses 2n points. The set A of elementary observables is given by the weighing operations $F_{I,J}$, Equation (132). For $\mathcal{S}$ we take the set $\mathcal{S}(A)$ of all ordered partitions $\pi_k$ obtained by applications of discrimination trees labelled by A. And we consider only the uniform probability $P_0$ on X; in $\mathcal{Q}$ this gives the images of this law by the elements of $\mathcal{S}$, and the conditioning by all the events associated to $\mathcal{S}$.
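To make the example concrete, the following sketch enumerates the 2n states and computes the ordered partition induced by one weighing. The form assumed for $F_{I,J}$ (sum of the coordinates indexed by I minus the sum of those indexed by J) is our reading of the one-against-one case $F_{j,h}(x) = x_j - x_h$ used in the text; the function names are illustrative.

```python
from collections import defaultdict

def states(n):
    """The 2n states: exactly one coin i (0-based) is heavier (+1) or lighter (-1)."""
    result = []
    for i in range(n):
        for sign in (+1, -1):
            x = [0] * n
            x[i] = sign
            result.append(tuple(x))
    return result

def weighing(I, J):
    """Assumed form of F_{I,J}: coins of I on one pan, coins of J on the other."""
    def F(x):
        return sum(x[i] for i in I) - sum(x[j] for j in J)
    return F

def partition_by(observable, xs):
    """Ordered partition of xs, the blocks being numbered by increasing value."""
    blocks = defaultdict(list)
    for x in xs:
        blocks[observable(x)].append(x)
    return [blocks[v] for v in sorted(blocks)]

X = states(4)                       # 8 states for n = 4 coins
F = weighing(I=(0,), J=(1,))        # coin 0 against coin 1
pi_1 = partition_by(F, X)
block_sizes = [len(block) for block in pi_1]   # values -1, 0, +1 -> sizes [2, 4, 2]
```

The block of value 0 collects the four states in which neither of the two weighed coins is the abnormal one.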

_{0} is the subgroup ${\mathcal{S}}_{n}\times {C}_{2}$ of ${\mathcal{S}}_{2n}$ made by the product of the permutation group of n symbols by the group changing the signs of all the $x_i$ for i in [n].

_{i}, one can compensate the effect of σ on $F_{I,J}$ by taking $G_{I,J} = F_{J,I}$, i.e., by exchanging the two sides of the balance.

^{+}) and $\sigma(i^{-})$ are states associated to different coins, for instance $\sigma(i^{+}) = j^{+}$ and $\sigma(i^{-}) = k^{+}$, with j ≠ k, or $\sigma(i^{+}) = j^{+}$ and $\sigma(i^{-}) = k^{-}$, with j ≠ k. Two cases are possible: these states have the same mass, or they have opposite mass. In both cases let us consider a weighing $F_{j,h}(x) = x_j - x_h$, where h ≠ k; by applying $\sigma^{*}F_{j,h}$ to $x = \sigma(i^{+})$ we find +1 (or −1), and by applying $\sigma^{*}F_{j,h}$ to $x = \sigma(i^{-})$ we find 0. However, this cannot happen for a weighing, because for a weighing, either the change of $i^{+}$ into $i^{-}$ has no effect, or it exchanges the results +1 and −1. Finally, consider a permutation σ that respects the indices but exchanges the signs of a subset $I = \{i_1, \dots, i_k\}$, with 0 < k < n. In this case let us consider a weighing $F_{i,j}(x) = x_i - x_j$ with i ∈ I and j ∈ [n]\I; the function $F_{i,j} \circ \sigma$ takes the value +1 for the states $i^{-}$, $j^{-}$, the value −1 for $i^{+}$, $j^{+}$ and the value 0 for the other states, which cannot happen for any weighing, because this weighing must involve both i and j, but it cannot be $F_{j,i}(x) = x_j - x_i$, which takes the value −1 for $j^{-}$, and it cannot be $F_{i,j}$, which takes the value +1 for $i^{+}$.
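For a small number of coins the preceding argument can be confirmed by brute force. The sketch below takes n = 3 and, as a simplifying assumption, keeps only the one-against-one weighings $F_{i,j}(x) = x_i - x_j$ in A; the uniform law is preserved by every permutation, so only the condition on A is tested. The encoding of states as pairs (coin, sign) is illustrative.

```python
from itertools import permutations

n = 3
# State (c, s): coin c is the abnormal one, heavier if s = +1, lighter if s = -1.
X = [(c, s) for c in range(n) for s in (+1, -1)]

def weighing(i, j):
    """F_{i,j}(x) = x_i - x_j, tabulated on the 2n states of X."""
    return tuple(s if c == i else -s if c == j else 0 for (c, s) in X)

# Assumed set A of elementary observables: one coin against one coin.
A = {weighing(i, j) for i in range(n) for j in range(n) if i != j}

def pullback(F, sigma):
    """sigma^* F = F o sigma, sigma being a tuple of indices into X."""
    return tuple(F[sigma[k]] for k in range(len(X)))

# Permutations of the 2n states whose pullback sends A into A.
G0 = [sigma for sigma in permutations(range(len(X)))
      if all(pullback(F, sigma) in A for F in A)]

# The global sign change exchanges the two pans: it sends F_{i,j} to F_{j,i}.
flip_all = tuple(X.index((c, -s)) for (c, s) in X)
```

Among the 720 permutations of the six states, exactly 12 = 3! · 2 survive, in agreement with the group ${\mathcal{S}}_{n}\times {C}_{2}$.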

_{0}. This corresponds to the Jaynes principle [51,52].

_{s} belonging to A and each arrow α beginning at s is labelled by an element $F_s(i)$ of $F_s$. A priori we introduce as many branches as there exist elements in $F_s$. The disposition of the arrows in the trigonometric circular order ensures that the tree Γ is embedded in the Euclidean plane up to homotopy.

_{1}, …, $\alpha_k$ of oriented edges, such that, for each i, the initial extremity of $\alpha_{i+1}$ is the terminal extremity of $\alpha_i$. Then $\alpha_{i+1}$ starts with the label $F_i$ and ends with the label $F_{i+1}$. We will say that γ starts with the root if the initial extremity of $\alpha_1$ is the root $s_0$, with a label $F_0$.

_{i}; i = 0, …, F

_{k}and the edges are decorated with values v

_{i}of these functions; we note

_{0}(x) = $v_0$, …, $F_{k-1}(x) = v_{k-1}$.

_{k} of X.

**Definition 10.** We say that an observation tree Γ labelled by A is allowed by S if all the joint observables along each branch belong to S.

**Definition 11.** Let α be an edge of Γ; we denote by $\mathcal{Q}\left(\alpha \right)$ the set of probability laws on X(α) which are obtained by conditioning by the values $v_0, v_1, \dots, v_{k-1}$ of the observables $F_0, F_1, \dots, F_{k-1}$ along the branch γ(α) starting in the root and ending with α.

**Definition 12.** The Galois group G(α) is the set of permutations of elements of X(α) that belong to $G_0$, preserve all the equations $F_i(x) = v_i$ (resp. all the summands of the orthogonal decomposition $F_i$ labelling the edges) and preserve the sets of probability Q(α) (resp. quantum probabilities).

_{0}by fixing point by point all the elements of X outside X(α).

**Remark 10.** Let P be a probability law (either classical or quantum) on X, $\Phi = (F_i; i \in I)$ a collection of observables, and $\varphi = (v_i; i \in I)$ a vector of possible values of Φ; the law P|(Φ = φ) obtained by conditioning P by the equations Φ(x) = φ is defined only if the set $X_\varphi$ of all solutions of the system of equations Φ(x) = φ has a non-zero probability $p_\varphi = P(X_\varphi)$. It can be viewed either as a law on $X_\varphi$, or as a law on the whole X by taking the image by the inclusion of $X_\varphi$ in X.
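In the classical case this conditioning is elementary to compute. The sketch below represents a law as a dictionary from states to exact rational weights and returns `None` when $p_\varphi = 0$; the function name `condition` and the toy observables are illustrative assumptions.

```python
from fractions import Fraction

def condition(P, observables, values):
    """Return P | (Phi = phi) as a law on the whole of X, or None if p_phi = 0."""
    solutions = [x for x in P
                 if all(F(x) == v for F, v in zip(observables, values))]
    p_phi = sum(P[x] for x in solutions)
    if p_phi == 0:
        return None  # conditioning is undefined on an event of probability zero
    # Image of the conditioned law by the inclusion of X_phi in X.
    return {x: (P[x] / p_phi if x in solutions else Fraction(0)) for x in P}

P = {x: Fraction(1, 6) for x in range(6)}      # uniform law on X = {0, ..., 5}
parity = lambda x: x % 2
is_low = lambda x: int(x < 3)

cond = condition(P, [parity, is_low], [0, 1])  # x even and x < 3, i.e. X_phi = {0, 2}
undefined = condition(P, [parity], [7])        # no solution: p_phi = 0
```

With these data `cond` gives weight 1/2 to each of the states 0 and 2, and `undefined` is `None`.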

**Definition 13.** The edge α is said to be Galoisian if the set of equations and probabilities that are invariant by G(α) coincide respectively with X(α) and $\mathcal{Q}\left(\alpha \right)$.

_{k} which is the product of the groups G(α) for the free edges at level k; it is a subgroup of $G_0$ preserving element by element the pieces of the partition $\pi_k$.

_{l}, l ≤ k of X is increasing (finer and finer) and the sequence of groups $G_l$, l ≤ k is decreasing.

_{0}, $G(\alpha_1)$, …, $G(\alpha_k)$ is decreasing. We propose that the quotient $G(\alpha_{i+1})/G(\alpha_i)$ gives a measure of the Galoisian information gained by applying $F_i$ and obtaining the value $v_i$.

**Remark 11.** In terms of coding, introducing probabilities on the X(α) makes it possible to formulate the principle that it is more efficient to choose, after the edge α, the observation having the largest conditional entropy in Q(α). In what circumstances it gives the optimal discrimination tree is a difficult problem, even if the folklore admits that as a theorem. It is the problem of optimal coding.
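The principle of this remark can be sketched as a greedy procedure, under the assumption of a uniform law on the remaining states. The code only builds the tree suggested by the principle and returns its depth; it makes no claim of optimality, which the remark explicitly leaves open. The observables used (three binary digits and a deliberately poor test) are illustrative.

```python
import math
from collections import Counter

def entropy_of_observable(F, xs):
    """Shannon entropy of the image law of F under the uniform law on xs."""
    counts = Counter(F(x) for x in xs)
    n = len(xs)
    return -sum(c / n * math.log(c / n) for c in counts.values())

def greedy_tree_depth(xs, observables):
    """Depth of the tree that always measures the most entropic observable."""
    if len(xs) <= 1:
        return 0
    best = max(observables, key=lambda F: entropy_of_observable(F, xs))
    if entropy_of_observable(best, xs) == 0:
        raise ValueError("the observables cannot separate the remaining states")
    blocks = {}
    for x in xs:
        blocks.setdefault(best(x), []).append(x)
    # One measurement, then the deepest of the conditioned sub-problems.
    return 1 + max(greedy_tree_depth(b, observables) for b in blocks.values())

bits = [lambda x, k=k: (x >> k) & 1 for k in range(3)]
is_zero = lambda x: int(x == 0)      # a poorly informative observable
depth = greedy_tree_depth(list(range(8)), bits + [is_zero])
```

On these eight states the greedy rule never selects `is_zero` at the root, and the resulting tree is fully discriminant at level 3.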

**Definition 14.** We say that an observation tree Γ labelled by A is allowed by S and by X ∈ S if it is allowed by $S_X$, which means that every joint observable along each branch is divided by X.

**Definition 15.** S(A) is the set of (ordered) observables $\pi_k$ which can be obtained by allowed observation trees. For X ∈ S we denote by $S_X(A)$ the set of (ordered) observables $\pi_k$ which can be obtained by observation trees that are allowed by S and X.

**Lemma 9.** The joint product defines a structure of monoid on the set $S_X(A)$.

**Proof.**Let Γ, Γ′ be two observation trees allowed by A, S and X ∈ S, of respective lengths k, k′, giving final decompositions S, S′. To establish the lemma we must show that the joint SS′ is obtained by a tree associated with A, allowed by S and X.

_{k}′ (Γ′). To finish the proof we have to show that each element of $\pi_{k+k'}(\Gamma\Gamma')$ is the intersection of one element of $\pi_k(\Gamma)$ with one element of $\pi_{k'}(\Gamma')$, because we know these observables are in $S_X$, which is a monoid, by the definition of information structure. But a complete branch γ.γ′ in ΓΓ′, going from the root to a terminal edge at level k + k′, corresponds to a word $(F_0, v_0, F_1, v_1, \dots, F_{k-1}, v_{k-1}, F'_0, v'_0, \dots, F'_{k'-1}, v'_{k'-1})$, thus the final set of the branch γ.γ′ is defined by the equations $F_i = v_i$; i = 0, …, k−1 and $F'_j = v'_j$; j = 0, …, k′−1, and is the intersection of the sets respectively defined by the first and second groups of equations, that belong respectively to $\pi_k(\Gamma)$ and $\pi_{k'}(\Gamma')$.
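The argument above can be illustrated on a toy system. As a simplifying assumption, the sketch only treats trees that measure the same observable on every branch of a given level, so that a tree is just a list of functions; the observables chosen on X = {0, …, 11} are illustrative.

```python
def partition_from_branches(xs, observables):
    """Final partition of a tree measuring the same observables on every branch.

    Each piece is the solution set of F_0 = v_0, ..., F_{k-1} = v_{k-1}.
    """
    pieces = {}
    for x in xs:
        word = tuple(F(x) for F in observables)
        pieces.setdefault(word, set()).add(x)
    return [frozenset(p) for _, p in sorted(pieces.items())]

def join(pi, pi_prime):
    """Non-empty pairwise intersections of the pieces of two partitions."""
    return {a & b for a in pi for b in pi_prime if a & b}

X = range(12)
gamma = [lambda x: x % 2]                              # tree Γ of length k = 1
gamma_prime = [lambda x: x % 3, lambda x: x // 6]      # tree Γ' of length k' = 2

# The concatenated tree ΓΓ' measures the k + k' observables in sequence.
composite = partition_from_branches(X, gamma + gamma_prime)
expected = join(partition_from_branches(X, gamma),
                partition_from_branches(X, gamma_prime))
```

The pieces of `composite` coincide with the non-empty intersections collected in `expected`, which is the content of the last step of the proof.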

_{X}(A) in the information structure S(A).

#### 6.3. Co-Homology of Observation Strategies

_{1}, …, n

_{m}) on the set of ordered partitions:

_{1}, $\omega_1$), …, ($\pi_m$, $\omega_m$) of respective lengths $n_1, \dots, n_m$; the result is the ordered partition obtained by cutting each piece $X_i$ of π by the corresponding decomposition $\pi_i$ and renumbering the non-empty pieces by integers in the unique way compatible with the orderings ω, $\omega_1$, …, $\omega_m$. Observe the important fact that the result has in general fewer than $n = n_1 + \dots + n_m$ pieces. This introduces a strong departure from usual multi-products (cf. P. May [17,53], Loday-Vallette [10]). We do not have an operad: when introducing vector spaces V(m) generated by decompositions of length m, we get filtered but not graded structures. However a form of associativity and neutral element are preserved, hence we propose to name this structure a filtered operad.
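A minimal sketch of this multi-product in the classical case, with ordered partitions represented as lists of frozensets (the ordering being the list order). The function name `multi_product` and the example partitions are illustrative assumptions.

```python
def multi_product(pi, pis):
    """mu(m; n_1, ..., n_m): cut the i-th piece of pi by the ordered partition pis[i].

    Empty intersections are dropped and the surviving pieces keep the order
    induced by pi and by each pis[i], so the result may have fewer than
    n_1 + ... + n_m pieces.
    """
    assert len(pi) == len(pis)
    result = []
    for piece, sub in zip(pi, pis):
        for block in sub:
            cut = piece & block
            if cut:
                result.append(cut)
    return result

# Ordered partitions of {0, ..., 5}.
pi = [frozenset({0, 1, 2}), frozenset({3, 4, 5})]                  # m = 2
pi_1 = [frozenset({0, 1}), frozenset({2, 3, 4, 5})]                # n_1 = 2
pi_2 = [frozenset({0, 1, 2}), frozenset({3}), frozenset({4, 5})]   # n_2 = 3

product = multi_product(pi, [pi_1, pi_2])
```

Here the first block of `pi_2` does not meet the second piece of `pi`, so the product has four pieces instead of n_1 + n_2 = 5.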

_{m}the collection of products for the same length m.

_{i} between 1 and $n_i$ that counts the pieces of the decomposition of the element $X_i$ of π are functions $m_i(\pi, \omega, \pi_i, \omega_i)$. There exists an increasing injection $\eta_i : [m_i] \to [n_i]$, which depends only on $(\pi, \omega, \pi_i, \omega_i)$, telling what indices of $(\pi_i, \omega_i)$ survive in the product. These injections are integral parts of the structure of filtered operad. In particular, if we apply a permutation $\sigma_i$ to $[n_i]$, i.e., if we replace $\omega_i$ by $\omega_i \circ \sigma_i$, the number can change.

_{i}, $\omega_i$) of lengths $n_i$, for i between 1 and k, are composed from $\mu(n_i; n_i^1, \dots, n_i^{n_i})$ with the $n_i$-tuples $(\dots, (\pi_i^j, \omega_i^j), \dots)$ whose respective lengths are $n_i^j$, and if the result $\mu_i$ for each i has length $(m_i^1 + \dots + m_i^{n_i})$ where $m_i^j$ is a function of $(\pi_i, \omega_i)$ and $(\pi_i^j, \omega_i^j)$, then the product of (π, ω) of length k with the $\mu_i$ is the same as the one we would have obtained by composing $\mu(k; n_1, \dots, n_k)((\pi, \omega); (\pi_1, \omega_1), \dots)$ with the $m = m_1 + \dots + m_k$ ordered decompositions $(\pi_i^j, \omega_i^j)$ for j belonging to the image of $\eta_i : [m_i] \to [n_i]$. This result is more complicated to write than to prove, because it only expresses the associativity of the ordinary join of three partitions, from which ordering follows.

_{i}letters which preserve the images of the maps η

_{i}.

_{i} can be reformulated by telling that the effect of σ on the multiple product µ is the same as the effect of σ on the indices of the $(\pi_i, \omega_i)$. In other terms, the effect of σ on ω is compensated by the action of $\sigma^{-1}$ on the indices of the $(\pi_i, \omega_i)$. One has to be careful, because the result of µ applied to (π, ω ○ σ) has in general not the same length as µ applied to (π, ω). However the compensation implies that $\mu_k$ is well defined on the quotient of the set of sequences $((\pi, \omega), (\pi_1, \omega_1), \dots)$ by the diagonal action of ${\mathcal{S}}_{k}$, which permutes the k pieces of π and which permutes the indices i of the $n_i$ in the other factors.

_{i}, $\omega_i$); i = 1, …, m are generated by a collection of observation trees $\Gamma_i$; then the result of the application of $\mu(m; n_1, \dots, n_m)$ to (π, ω) and $(\pi_i, \omega_i)$; i = 1, …, m is generated by the observation tree that is obtained by grafting each $\Gamma_i$ on the vertex number i. Drawing the planar trees associated to three successive sets of decompositions for two successive grafting operations helps to understand the associativity property.

_{1} + … + $n_m$ free edges, where $n_i$ denotes the number of free edges of $\Gamma_i$, comes from the possibility to find an empty set X(β) at some moment along a branch of the grafted tree; this we call a dead branch. It expresses the fact that the empty set is excluded from the elements of a partition in the classical context, and the zero space excluded from the orthogonal decomposition in the quantum context. When computing conditioned probabilities we encounter the same problem if a set X(β) at some place in a branch has measure zero.

_{m}, thus we introduce more flexible objects, which are the ordered partitions with empty parts of Ω, resp. ordered orthogonal decompositions with zero summands of E: such a partition $\pi^{*}$ (resp. decomposition) is a family $(E_1, \dots, E_m)$ of disjoint subsets of Ω (resp. orthogonal subspaces of E), such that their union (resp. sum) is Ω (resp. E). The only difference with respect to ordered partitions, resp. decompositions, is that we accept to repeat ∅ (resp. 0) an arbitrarily high number of times. For brevity we will name these new objects generalized decompositions. The number m is named the degree of $\pi^{*}$. These objects are the natural results of applying rooted observation trees embedded in an oriented half plane.

**S** and X in **S** concerning the trees, apply to the generated generalized decompositions. The corresponding sets of generalized objects are written $\mathbf{S}^{*}(A)$ and ${\mathbf{S}}_{X}^{*}(A)$.

_{1}, …, $n_m$) extends naturally to generalized decompositions, and in this case the degrees are respected, i.e., the result of this operation is a generalized decomposition of degree $n_1 + n_2 + \dots + n_m$.

^{*}(m; $n_1, \dots, n_m$) for the multi-products extended to generalized decompositions, however we prefer to keep the same notation $\mu(m; n_1, \dots, n_m)$; this is justified by the following observation: to a generalized decomposition $\pi^{*}$ is associated a unique ordered decomposition (π, ω), by forgetting the empty sets (resp. zero spaces) in the family, and the multi-product is compatible with this forgetting map. The gain of the extension is the easy construction of a monad we expose now.
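The two facts just stated (degrees are respected on generalized decompositions, and forgetting the empty parts recovers the ordinary product) can be checked on a small classical example. The representation by lists of frozensets and the names `generalized_product` and `forget` are illustrative assumptions.

```python
def generalized_product(pi, pis):
    """Multi-product keeping the empty parts: the degree is always n_1 + ... + n_m."""
    return [piece & block for piece, sub in zip(pi, pis) for block in sub]

def forget(pi_star):
    """Ordered decomposition obtained by dropping the empty parts."""
    return [part for part in pi_star if part]

# Ordered partitions of {0, ..., 5}, viewed as generalized decompositions.
pi = [frozenset({0, 1, 2}), frozenset({3, 4, 5})]                  # degree m = 2
pi_1 = [frozenset({0, 1}), frozenset({2, 3, 4, 5})]                # degree n_1 = 2
pi_2 = [frozenset({0, 1, 2}), frozenset({3}), frozenset({4, 5})]   # degree n_2 = 3

pi_star = generalized_product(pi, [pi_1, pi_2])   # one empty part is kept
ordinary = forget(pi_star)                        # the filtered (shorter) result
```

The generalized product has degree 5 with an empty third part, while forgetting that part returns the four non-empty pieces of the ordinary product.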

_{m} on generalized decompositions can be assembled in a structure of monad by using the standard Schur construction (cf. Loday and Vallette [10], or Fresse, “on partitions” [16]): For each X ∈ **S**, we introduce the real vector space $V_X = V_X(A)$ freely generated by the set ${\mathcal{S}}_{X}^{*}(A)$ of generalized decompositions obtained by observation trees that are allowed by A, S and X; the length m defines a grading $V_X(m)$ of $V_X$. We put $V_X(0) = 0$.

_{m} generate m-linear applications from products of these spaces to themselves which respect the grading; these applications, also denoted by $\mu_m$, are parameterized by the sets ${\mathcal{S}}_{X}^{*}(m)$, whose elements are the generalized decompositions of degree m which are divided by X:

**Proposition 5.** For each X in S, the collection of operations $\mu_m$ defines a linear natural transformation of functors $\mu_X : V_X \circ V_X \to V_X$; and the trivial partition defines a linear natural transformation of functors $\eta_X : R \to V_X$, which satisfy the axioms of a monad (cf. MacLane “Categories for the Working Mathematician” 2nd ed. [4], and Alain Prouté, Introduction à la Logique Catégorique, 2013, Prépublications [54]):

**Proof.** The argument is the same as the argument given in Fresse (partitions …). The fact that the natural transformation $\mu_X$ is well defined on the quotient by the diagonal action of the symmetric group ${\mathcal{S}}_{m}$ on ${V}_{X}(m)\otimes {\otimes}_{i}{V}_{X}({n}_{i}){\otimes}_{{\mathcal{S}}_{{n}_{1},\dots ,{n}_{m}}}{W}^{\otimes s}$ comes from the verification of the symmetry axiom, and the properties of associativity and neutral element come from the verification of the corresponding axioms.

_{Y} to the category $S_X$ of observables divided by Y and X respectively when X divides Y; therefore we have the following result:

**Proposition 6.**To each arrow X → Y in the category S is associated a natural transformation of functors ${\rho}_{X,Y}:{\mathcal{V}}_{Y}\to {\mathcal{V}}_{X}$, making a morphism of monads; this defines a contravariant functor $\mathcal{V}$ from the category S to the category of monads, that we name the arborescent structural sheaf of S and A.

_{v} edges; we define _{X}(Γ_{Y})(W) associated to trees which are decorated by a subset Y in $\mathbf{S}_X^*(A)$, with one element $Y_v$ of $S_X(m)$ for each vertex v which gives birth to $m_v$ edges.

**Definition 16.** A divided probability law of degree m is a sequence of triplets $(p, P, U) = (p_1, P_1, U_1; \dots; p_m, P_m, U_m)$, where the $p_i$, i = 1, …, m, are non-negative numbers of sum one, i.e., $p_1 + \dots + p_m = 1$, where each $P_i$, i = 1, …, m, is a classical (resp. quantum) probability law when the corresponding $p_i$ is strictly positive, and a probability law or the empty set when the corresponding $p_i$ is equal to 0, and where each $U_i$, i = 1, …, m, is the support in X of $P_i$; moreover the $U_i$ are assumed to be orthogonal (resp. disjoint in the classical case). The letter P will designate the probability $p_1 P_1 + \dots + p_m P_m$, with the convention $0 \cdot \emptyset = 0$ whenever it occurs.
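As an illustration of Definition 16 in the classical case, the following minimal Python sketch encodes a divided probability law as a list of weighted pieces, checks the conditions of the definition, and recomputes the mixture $P = p_1 P_1 + \dots + p_m P_m$. The names `Piece`, `is_divided_law` and `mixture` are hypothetical helpers introduced only for this example; laws are assumed to be finite dictionaries from outcomes to probabilities, and `None` stands for the empty set ∅.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional

Law = Dict[Hashable, float]  # a classical law: outcome -> probability


@dataclass
class Piece:
    p: float             # weight p_i >= 0
    law: Optional[Law]   # P_i, or None standing for the empty set

    @property
    def support(self) -> frozenset:
        # U_i: the outcomes charged by P_i (empty when P_i is the empty set)
        if self.law is None:
            return frozenset()
        return frozenset(x for x, q in self.law.items() if q > 0)


def is_divided_law(pieces: List[Piece], tol: float = 1e-12) -> bool:
    """Check the conditions of Definition 16 (classical, finite case)."""
    if any(pc.p < 0 for pc in pieces):
        return False
    if abs(sum(pc.p for pc in pieces) - 1.0) > tol:
        return False
    seen = set()
    for pc in pieces:
        if pc.law is None:
            if pc.p > 0:          # the empty set is only allowed when p_i = 0
                return False
            continue
        if abs(sum(pc.law.values()) - 1.0) > tol:
            return False
        if seen & pc.support:     # supports must be pairwise disjoint
            return False
        seen |= pc.support
    return True


def mixture(pieces: List[Piece]) -> Law:
    """The law P = p_1 P_1 + ... + p_m P_m, with 0 * (empty set) = 0."""
    out: Law = {}
    for pc in pieces:
        if pc.law is None:
            continue
        for x, q in pc.law.items():
            out[x] = out.get(x, 0.0) + pc.p * q
    return out


example = [Piece(0.5, {"a": 1.0}), Piece(0.5, {"b": 0.25, "c": 0.75}), Piece(0.0, None)]
print(is_divided_law(example), mixture(example))
```

The quantum case, where supports are orthogonal subspaces rather than disjoint sets, is not covered by this sketch.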

_{X}adapted to a variable X.

_{1},…, P

_{m}) and the same supports (U

_{1}, …, U

_{m});

_{i}> 0 we have ${P}_{i}={{P}^{\prime}}_{i}$, and consequently ${U}_{i}={{U}^{\prime}}_{i}$.

_{X}(0) = 0, $\mathcal{M}_X(1)$ is freely generated over ℝ by the elements of $\mathbf{Q}_X$.

**Lemma 10.** The space $\mathcal{M}_X(m)$ is freely generated over ℝ by the vectors $(\emptyset, \dots, \emptyset, P_i, \emptyset, \dots, \emptyset)$ of length m, where at the rank i, $P_i$ is an element of $\mathbf{Q}_X$.

**Proof.** Let $D = (p_1, P_1, U_1), \dots, (p_m, P_m, U_m)$ be a divided probability; we consider for each i between 1 and m the divided probability

_{i} − $(\emptyset, \dots, \emptyset, P_i, \emptyset, \dots, \emptyset)$ is of type D, thus the particular vectors of Lemma 10 generate $\mathcal{M}_X(m)$.

_{1} = 0 can be replaced by a vector where $P_1 = \emptyset$ using an element of type D in $\mathcal{K}_X(m)$; we can then assume that at least one of the vectors has a $p_1$ strictly positive, i.e., equal to 1. Let us consider all these vectors $D_1, \dots, D_s$, for 2 ≤ s ≤ r; their other numbers $p_i$ for i > 1 are zero. The other vectors $D_j$, for j > s, have the coordinate $p_1$ equal to zero. Let $\sum_j \lambda_j D_j$ be the linear combination of length r belonging to $\mathcal{K}_X(m)$; this vector is a linear combination of vectors of type L and D. We can suppose that every $\lambda_j$ is non-zero. Let us consider an element Q of $\mathbf{Q}_X$ which appears in at least one of the $D_j$, j ≤ s; this Q cannot appear in only one $D_j$, because the sum of the coefficients λ multiplied by the first $p_1$ in front of any given Q in a vector L or D is zero. Thus we have at least two $D_j$ with the same $P_1$. We can replace the sum of those with $\lambda_j$ positive (resp. negative) by only one special vector of Lemma 10 using a sum of multiples of vectors of type L. Then we are left with the case of two vectors $D_1, D_2$ having $P_1 = Q$ such that $\lambda_1 + \lambda_2 = 0$, which means that $\lambda_1 D_1 + \lambda_2 D_2$ is a multiple of a vector of type D. Subtracting it, we can apply the induction hypothesis and conclude that the considered linear relation is trivial.

$\mathbf{Q}_X$ and i ∈ [m]. Such a vector, identified with $(\emptyset, \dots, P, \dots, \emptyset)$ in $\mathcal{L}_X(m)$, where only the place i is non-empty, will be named a simple vector of degree m.

Let $S = (S_1, \dots, S_m)$ be a sequence of generalized decompositions in $\mathbf{S}_X^*(A)$, of respective degrees $n_1, \dots, n_m$, with $n = n_1 + \dots + n_m$, and let (p, P, U) be an element of $\mathcal{D}_X(m)$; we define θ((p, P, U), S) as the following divided probability of degree n: if, for i = 1, …, m, the decomposition $S_i$ is made of pieces $E_i^{j_i}$, where $j_i$ varies between 1 and $n_i$, we take for $p_i^{j_i}$ the classical probability $\mathbb{P}(E_i^{j_i} \cap U_i)$; we take for $P_i^{j_i}$ the law $P_i$ conditioned by the event $S_i = j_i$ which corresponds to $E_i^{j_i}$; and we take for $U_i^{j_i}$ the support of $P_i^{j_i}$. Then we order the obtained family of triples $(p_i^{j_i}, P_i^{j_i}, U_i^{j_i})_{i=1,\dots,m;\, j_i=1,\dots,n_i}$ by the lexicographic ordering. It is easy to verify that the resulting sequence is a divided probability.
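The construction of θ can be sketched in the classical, finite case as follows. This is an illustrative reading of the paragraph above, not the authors' implementation: the function names `condition` and `theta` are hypothetical, a piece is a pair `(p_i, P_i)` with `None` standing for ∅, each decomposition $S_i$ is given as an ordered list of events $E_i^{j}$, and the weight $\mathbb{P}(E_i^{j} \cap U_i)$ is computed as $p_i \, P_i(E_i^{j})$, which assumes $\mathbb{P} = \sum_i p_i P_i$ with pairwise disjoint supports.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple

Law = Dict[Hashable, float]
Piece = Tuple[float, Optional[Law]]  # (p_i, P_i); None stands for the empty set


def condition(law: Law, event: Set[Hashable]) -> Tuple[float, Optional[Law]]:
    """Return (P(event), P conditioned on event); the law is None if P(event) = 0."""
    mass = sum(q for x, q in law.items() if x in event)
    if mass == 0:
        return 0.0, None
    return mass, {x: q / mass for x, q in law.items() if x in event and q > 0}


def theta(pieces: List[Piece], decomps: List[List[Set[Hashable]]]) -> List[Piece]:
    """Divide each piece (p_i, P_i) by its decomposition S_i = (E_i^1, ..., E_i^{n_i}).

    The output is listed in lexicographic order of (i, j), so its length is
    n_1 + ... + n_m.
    """
    if len(pieces) != len(decomps):
        raise ValueError("one decomposition is required per piece")
    out: List[Piece] = []
    for (p_i, law_i), blocks in zip(pieces, decomps):
        for event in blocks:
            if law_i is None:
                out.append((0.0, None))      # an empty piece stays empty
                continue
            mass, cond = condition(law_i, event)
            out.append((p_i * mass, cond))   # weight p_i * P_i(E_i^j)
    return out


pieces = [(0.5, {"a": 0.5, "b": 0.5}), (0.5, {"c": 1.0})]
decomps = [[{"a"}, {"b"}], [{"c"}]]
print(theta(pieces, decomps))
```

The supports $U_i^{j}$ are not stored separately here, since in the finite classical case they can be read off from the conditioned laws.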

_{1} ⊗ … ⊗ $S_m$ goes to a linear combination of vectors of type L in $\mathcal{L}_X(n)$. Moreover, if $p_i = 0$ for an index i in [m], all the $p_i^{j_i}$ are zero, thus a vector of type D goes to a vector of type D. Then the map $\lambda_m$ sends the subspace $\mathcal{K}_X(m) \otimes V_X(n_1) \otimes \dots \otimes V_X(n_m)$ into the subspace $\mathcal{K}_X(n_1 + \dots + n_m)$, thus it defines a linear map

_{m} is independent of the $S_j$ for j ≠ i.

_{m}define a natural transformation of functors:

_{X}, $\mu_X$, $\mathcal{F}_X$, $\mathcal{V}_X$, …, but we keep in mind that this is an abuse of language.

**Proposition 7.**The natural transformation θ defines a right action in the sense of monads, i.e., we have

**Proof.** The proof is the same as for Proposition 5, using the associativity of conditioning and the Bayes identity P(A ∩ B) = P(A|B)P(B).

_{1}, …, S

_{m}) to 1 in $\mathcal{R}\left(n\right)=\mathbb{R}$.

_{X}$(S_1; S_2; \dots; S_k; (p, P, U))$, indexed by X in S, where $S_1; \dots; S_k$ here designates the sets of decompositions present in the trees at each level from 1 to k.

_{k}

_{+1}we must have

_{k} the collection $(\pi_0, \dots, \pi_0)$, we deduce that $F_X$ is independent of the last variable.

**Definition 17.** An element of $\mathcal{C}^k(\mathcal{M}_X)$ is said to be regular when, for each degree m and each index i between 1 and m, we have, for each ordered forest $S_1; S_2; \dots; S_k$ of m trees, and each probability Q,

_{X}$(S_1; S_2; \dots; S_k; (p, P, U))$, indexed by X in S and forests $S_1; \dots; S_k$ of level k. These families are supposed to be local with respect to X, which means that they are compatible with the direct image of probabilities under observables in $S^*$.

**Remark 12.** As we showed in the static case, in the classical context, locality is equivalent to the fact that the values of the functions depend on ℙ through the direct images of ℙ by the joint of all the ordered observables which decorate the tree (the joint of the joints along branches); but this is not necessarily true in the quantum context, where it depends on $\mathbf{Q}$. However it is true for $\mathbf{Q}^{min}$, and in particular for $\mathbf{Q}^{can}$, which is the most natural choice.

^{(k)} is the alternating sum of the operators $\delta_i^{(k)}$, $0 \le i \le k+1$: if F is a measurable morphism from $\mathcal{M} \circ \mathcal{V}^{\circ k}$ to ℝ, then

_{X} on divided probabilities; on regular cochains it is expressed by a generalization of the formula (20): if (P, i, m) is a simple vector of degree m and $S_0; S_1; \dots; S_k$ a forest of level k + 1, with m component trees, then

_{i} grafted on the branch $j_i$ of the variable $S_{0,i}$ at the place i in the collection $S_0$.

**Lemma 11.** If the transformation F is regular, then δF is regular; in other terms, the regular elements form a sub-complex $\mathcal{C}_r^k(\mathcal{M}_X)$.

**Proof.** Let (P, i, m) be a simple vector and $S_0; \dots; S_k$ a forest with m components; let us denote by $S_0^j$ the variable number j having degree $n_j$, and $n = n_1 + \dots + n_m$; we have

_{1} + … + $n_m$ which result from the division of (P, i, m) by $S_0^i$. If F is regular, this combination is the same as the combination of the simple vectors of degree $n_i$ constituting the division of (P, i, m) by $S_0^i$, which gives the same result as the first term on the right in the formula

_{S}(ℙ). Then ${H}_{\tau}^{0}$ has dimension one.

$\mathbf{Q}_X$ gives invariant information co-chains. Among them the von Neumann entropy is especially relevant because its co-boundary gives the classical entropy. However, only the constant function is an invariant zero degree co-cycle. Thus again $H_U^0$ has dimension one.

_{X}(S; P), such that, each time we have X → Y → S and the elements of Y refine S, we have $F_X(S; P) = F_Y(S; Y_* P)$. It is a cocycle when, for every collection $S_1, \dots, S_m$ of m observables, where m is the length of S, we have

_{m}$(S, (S_1, \dots, S_m))$ is not the joint of S and the $S_i$ for i ≥ 1, except when all the $S_i$ coincide. It is thus remarkable that the ordinary entropy also satisfies this functional equation, which is finer than Shannon’s identity:

**Proposition 8.** The usual entropy $H(S_* \mathbb{P}) = H(S; \mathbb{P})$ is an arborescent co-cycle.

**Proof.** By linearity on the module of divided probabilities $\mathcal{M}_X$, we can decompose the probability ℙ into the conditional probabilities ℙ|(S = s); thus we can restrict the proof of the proposition to the case where $S = \pi_0$ is the trivial partition, i.e., m = 1.

Let $X_i$, i = 1, …, m, denote the elements of the partition associated to $S_0$ and $X_i^j$, j = 1, …, $n_i$, the pieces of the intersection of $X_i$ with the elements of the partition associated to $S_i$; denote by $p_i$ the probability of the event $X_i$ and by $p_i^j$ the probability of the event $X_i^j$; we have
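With the notation $p_i$ and $p_i^j$ just introduced, the property of the classical entropy that such a computation relies on is the well-known grouping (recursivity) identity $H\big((p_i^j)_{i,j}\big) = H\big((p_i)_i\big) + \sum_i p_i \, H\big((p_i^j / p_i)_j\big)$. The short Python sketch below checks this identity numerically on one example; the helper names `entropy` and `grouped_entropy` and the choice of base-2 logarithms are assumptions of the sketch (the identity holds in any base), and the numbers are arbitrary.

```python
import math
from typing import List


def entropy(probs: List[float]) -> float:
    """Shannon entropy in bits, with the convention 0 * log 0 = 0."""
    return -sum(q * math.log2(q) for q in probs if q > 0)


def grouped_entropy(blocks: List[List[float]]) -> float:
    """H((p_i)) + sum_i p_i * H((p_i^j / p_i)_j), where blocks[i][j] = p_i^j."""
    p = [sum(block) for block in blocks]          # p_i = sum_j p_i^j
    total = entropy(p)
    for p_i, block in zip(p, blocks):
        if p_i > 0:                               # blocks of mass zero contribute nothing
            total += p_i * entropy([q / p_i for q in block])
    return total


blocks = [[0.1, 0.2], [0.3, 0.15, 0.25]]          # masses p_i^j, summing to 1
flat = [q for block in blocks for q in block]     # the refined partition
print(entropy(flat), grouped_entropy(blocks))     # the two values agree up to rounding
```

The left-hand side is the entropy of the partition into the pieces $X_i^j$, and the right-hand side adds to the entropy of $S_0$ the entropies of the $S_i$ conditioned on the events $X_i$, weighted by $p_i$.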

$\mathbf{S}$, the set A and the probability functor $\mathbf{Q}$ are invariant under the unitary group, and if we choose a classical full subcategory $\mathcal{S}$, there is a trace map from $\mathbf{Q}$ to $\mathcal{Q}$, which induces a morphism from the classical arborescent co-homology of $\mathcal{S}$, A and $\mathcal{Q}$ to the invariant quantum arborescent co-homology of $\mathbf{S}$, A and $\mathbf{Q}$.

**Theorem 4.** (i) Both in the classical and the invariant quantum context, if S(A) is connected and sufficiently rich, and if Q is canonical, every 1-co-cycle is co-homologous to the entropy of Shannon; (ii) in the classical case $H^1(\mathcal{S}, A, \mathcal{Q})$ is the vector space of dimension 1 generated by the entropy; (iii) in the quantum case $H_U^1(\mathbf{S}, A, \mathbf{Q}) = 0$, and the only invariant 0-cochain which has for co-boundary the Shannon entropy is (minus) the von Neumann entropy.

#### 6.4. Arborescent Mutual Information

_{1}, …, $T_m$ of respective lengths $n_1, \dots, n_m$ and U a collection of variables $U_{i,j}^k$ of respective lengths $n_{i,j}$, with i going from 1 to m, j going from 1 to $n_i$ and k going from 1 to $n_{i,j}$; the notation $U_i$ denoting the collection of variables $U_{i,j}^k$ of index i.

_{t}(m) from $\mathcal{M}_X(m)$ tensorized with $V_X(n_1) \otimes \dots \otimes V_X(n_m)$ to $\mathcal{M}_X(n)$, for $n = n_1 + \dots + n_m$:

_{j}are used only through the orders on their elements.

_{t}(m) defines a right action of the monad $V_X$ on the Schur functor $\mathcal{M}_X$.

_{t} is defined in every degree by the formula of the simplicial bar construction, as in Equation (153) for δ, but with $\theta_t$ replacing θ. It corresponds to the usual simplicial complex of the family $\mathcal{V}^{\circ k}$. A cochain is represented by a family of functions of probability laws $F_X(S_1; \dots; S_k; (P, i, m))$, where $S_1; \dots; S_k$ denotes a forest with m trees of level k. The operator $\delta_t$ is given by

_{1} + … + $n_m$ is the sum of the numbers of branches of the generalized decompositions $S_0^i$ for i = 1, …, m.

_{1}; …; $S_k$; (P, j, n) depends only on the tree $S_1^j; \dots; S_k^j$ rooted at the place numbered by j in the forest $S_1; \dots; S_k$.

**Lemma 12.** The coboundary $\delta_t$ sends a regular cochain to a regular cochain.

**Proof.** Consider a simple vector (P, i, m) in $\mathcal{M}_X(m)$ and a forest $S_0; \dots; S_k$ with m components; we denote by $S_0^j$ the variable number j having degree $n_j$, and $n = n_1 + \dots + n_m$, and we consider the formula (167).

_{0}. On the other hand, for the tree $S_0^i; \dots; S_k^i$, if F is regular, we have

_{t} to the subcomplex $\mathcal{C}_r^*(N_X)$, and name its homology the arborescent, or tree, topological information co-homology, written $H_{\tau,t}^*(S^*, A, Q)$.

_{t}, as in the standard case:

**Definition 18.** Let H(T; (P, i, m)) denote the regular extension to forests of the usual entropy; then the mutual arborescent information between a partition S of length m and a collection T of m partitions $T_1, \dots, T_m$ is defined by

_{i}are equal to a variable T , it gives

_{α}is an arborescent topological 2-cocycle.

_{α}(S; T; ℙ) involves the maximization of the usual mutual information I(S; T; ℙ) and of the unconditioned entropies $H(T_i; \mathbb{P})$.

**Definition 19.** The mutual arborescent informations of higher orders are given by $I_{\alpha,N} = -(\delta\delta_t)^M H$ for N = 2M + 1 odd and by $I_{\alpha,N} = \delta_t(\delta\delta_t)^M H$ for N = 2M + 2 even.

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
2. Kolmogorov, A. Combinatorial foundations of information theory and the calculus of probabilities. Russ. Math. Surv. **1983**, 38.
3. Thom, R. Stabilité structurelle et morphogénèse, deuxième ed.; Dunod: Paris, France, 1977; in French.
4. Mac Lane, S. Categories for the Working Mathematician; Springer: Berlin/Heidelberg, Germany, 1998.
5. Mac Lane, S. Homology; Springer: Berlin/Heidelberg, Germany, 1975.
6. Hu, K.T. On the Amount of Information. Theory Probab. Appl. **1962**, 7, 439–447.
7. Baudot, P.; Bennequin, D. Information Topology I, in preparation.
8. Elbaz-Vincent, P.; Gangl, H. On poly(ana)logs I. Compos. Math. **2002**, 130, 161–214.
9. Cathelineau, J. Sur l’homologie de sl2 à coefficients dans l’action adjointe. Math. Scand. **1988**, 63, 51–86.
10. Loday, J.L.; Valette, B. Algebraic Operads; Springer: Berlin/Heidelberg, Germany, 2012.
11. Matsuda, H. Information theoretic characterization of frustrated systems. Physica A **2001**, 294, 180–190.
12. Brenner, N.; Strong, S.; Koberle, R.; Bialek, W. Synergy in a Neural Code. Neural Comput. **2000**, 12, 1531–1552.
13. Nielsen, M.; Chuang, I. Quantum Computation and Quantum Information; Cambridge University Press: Cambridge, UK, 2000.
14. Baudot, P.; Bennequin, D. Topological forms of information. AIP Conf. Proc. **2015**, 1641, 213–221.
15. Baudot, P.; Bennequin, D. Information Topology II, in preparation.
16. Fresse, B. Koszul duality of operads and homology of partition posets. Contemp. Math. Am. Math. Soc. **2004**, 346, 115–215.
17. May, J.P. The Geometry of Iterated Loop Spaces; Springer: Berlin/Heidelberg, Germany, 1972.
18. May, J.P. E∞ Ring Spaces and E∞ Ring Spectra; Springer: Berlin/Heidelberg, Germany, 1977.
19. Beck, J. Triples, Algebras and Cohomology. Ph.D. Thesis, Columbia University, New York, NY, USA, 1967.
20. Baez, J.; Fritz, T.; Leinster, T. A Characterization of Entropy in Terms of Information Loss. Entropy **2011**, 13, 1945–1957.
21. Marcolli, M.; Thorngren, R. Thermodynamic Semirings. **2011**, arXiv.
22. Baudot, P.; Bennequin, D. Information Topology III, in preparation.
23. Gromov, M. In a Search for a Structure, Part 1: On Entropy. 2013. Available online: http://www.ihes.fr/gromov/PDF/structre-serch-entropy-july5-2012.pdf (accessed on 6 May 2015).
24. Watkinson, J.; Liang, K.; Wang, X.; Zheng, T.; Anastassiou, D. Inference of Regulatory Gene Interactions from Expression Data Using Three-Way Mutual Information. Chall. Syst. Biol. Ann. N.Y. Acad. Sci. **2009**, 1158, 302–313.
25. Kim, H.; Watkinson, J.; Varadan, V.; Anastassiou, D. Multi-cancer computational analysis reveals invasion-associated variant of desmoplastic reaction involving INHBA, THBS2 and COL11A1. BMC Med. Genomics **2010**, 3.
26. Uda, S.; Saito, T.H.; Kudo, T.; Kokaji, T.; Tsuchiya, T.; Kubota, H.; Komori, Y.; ichi Ozaki, Y.; Kuroda, S. Robustness and Compensation of Information Transmission of Signaling Pathways. Science **2013**, 341, 558–561.
27. Han, T.S. Linear dependence structure of the entropy space. Inf. Control **1975**, 29, 337–368.
28. McGill, W. Multivariate information transmission. Psychometrika **1954**, 19, 97–116.
29. Kolmogorov, A.N. Grundbegriffe der Wahrscheinlichkeitsrechnung; Springer: Berlin/Heidelberg, Germany, 1933; in German.
30. Artin, M.; Grothendieck, A.; Verdier, J. Théorie des topos et cohomologie étale des schémas (SGA 4), Tome I, II, III; Springer: Berlin/Heidelberg, Germany; in French.
31. Grothendieck, A. Sur quelques points d’algèbre homologique, I. Tohoku Math. J. **1957**, 9, 119–221.
32. Gabriel, P. Objets injectifs dans les catégories abéliennes. Séminaire Dubreil. Algèbre et théorie des nombres 12, 1–32.
33. Bourbaki, N. Algèbre, chapitre 10, Algèbre homologique; Masson: Paris, France, 1980; in French.
34. Cartan, H.; Eilenberg, S. Homological Algebra; The Princeton University Press: Princeton, NJ, USA, 1956.
35. Tverberg, H. A new derivation of information function. Math. Scand. **1958**, 6, 297–298.
36. Kendall, D. Functional Equations in Information Theory. Z. Wahrscheinlichkeitstheorie **1964**, 2, 225–229.
37. Lee, P. On the Axioms of Information Theory. Ann. Math. Stat. **1964**, 35, 415–418.
38. Kontsevich, M. The 1+1/2 logarithm. Unpublished note. Reproduced in Elbaz-Vincent & Gangl, 2002 On poly(ana)logs I. Compositio Mathematica, 1995; e-print math.KT/0008089.
39. Khinchin, A. Mathematical Foundations of Information Theory; Dover: New York, NY, USA; Silverman, R.A.; Friedman, M.D., Translators; From two Russian articles in Uspekhi Matematicheskikh Nauk; 1957; pp. 17–75.
40. Yeung, R. Information Theory and Network Coding; Springer: Berlin/Heidelberg, Germany, 2007.
41. Cover, T.M.; Thomas, J. Elements of Information Theory; Wiley: Weinheim, Germany, 1991.
42. Rindler, W.; Penrose, R. Spinors and Spacetime, 2nd ed.; Cambridge University Press: Cambridge, UK, 1986.
43. Landau, L.D.; Lifshitz, E.M. Fluid Mechanics, 2nd ed.; Volume 6 of a Course of Theoretical Physics; Pergamon Press, 1959.
44. Balian, R. Emergences in Quantum Measurement Processes. KronoScope **2013**, 13, 85–95.
45. Borel, A.; Ji, L. Compactifications of Symmetric and Locally Symmetric Spaces. In Unitary Representations and Compactifications of Symmetric Spaces; Springer: Berlin/Heidelberg, Germany, 2006.
46. Doering, A.; Isham, C. Classical and quantum probabilities as truth values. J. Math. Phys. **2012**, 53.
47. Meyer, P. Quantum Probability for Probabilists; Springer: Berlin, Germany, 1993.
48. Souriau, J. Structure des Systemes Dynamiques; Jacques Gabay: Paris, France, 1970; in French.
49. Catren, G. Towards a Group-Theoretical Interpretation of Mechanics. Philos. Sci. Arch. 2013. http://philsci-archive.pitt.edu/10116/.
50. Bachet, Claude-Gaspar. Problèmes plaisans et délectables, qui se font par les nombres; A. Blanchard: Paris, France, 1993; p. 1612, in French.
51. Jaynes, E.T. Information Theory and Statistical Mechanics. In Statistical Physics; Ford, K., Ed.; Benjamin: New York, NY, USA, 1963; p. 181.
52. Jaynes, E.T. Prior Probabilities. IEEE Trans. Syst. Sci. Cybern. **1968**, 4, 227–241.
53. Cohen, F.; Lada, T.; May, J. The Homology of Iterated Loop Spaces; Springer: Berlin, Germany, 1976.
54. Prouté, A. Introduction à la Logique Catégorique. 2013. Available online: www.logique.jussieu.fr/~alp/ (accessed on 6 May 2015).
55. Getzler, E.; Jones, J.D.S. Operads, homotopy algebra and iterated integrals for double loop spaces. **1994**, arXiv: hep-th/9403055v1.
56. Ginzburg, V.; Kapranov, M.M. Koszul duality for operads. Duke Math. J. **1994**, 76, 203–272.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

Baudot, P.; Bennequin, D. The Homological Nature of Entropy. *Entropy* **2015**, *17*, 3253-3318.
https://doi.org/10.3390/e17053253
