Article

# Entropy as a Topological Operad Derivation

[email protected], Mountain View, CA 94043, USA
Entropy 2021, 23(9), 1195; https://doi.org/10.3390/e23091195
Received: 27 July 2021 / Revised: 27 August 2021 / Accepted: 7 September 2021 / Published: 9 September 2021

## Abstract

We share a small connection between information theory, algebra, and topology—namely, a correspondence between Shannon entropy and derivations of the operad of topological simplices. We begin with a brief review of operads and their representations with topological simplices and the real line as the main example. We then give a general definition for a derivation of an operad in any category with values in an abelian bimodule over the operad. The main result is that Shannon entropy defines a derivation of the operad of topological simplices, and that for every derivation of this operad there exists a point at which it is given by a constant multiple of Shannon entropy. We show this is compatible with, and relies heavily on, a well-known characterization of entropy given by Faddeev in 1956 and a recent variation given by Leinster.

## 1. Introduction

In this article, we describe a simple connection between information theory, algebra, and topology. To motivate the idea, consider the function $d : [ 0 , 1 ] → R$ defined by
$d(x) = \begin{cases} -x \log x & \text{if } x > 0, \\ 0 & \text{if } x = 0. \end{cases}$
This map satisfies an equation reminiscent of the Leibniz rule from calculus, $d(xy) = d(x)\,y + x\,d(y)$ for all $x, y \in [0, 1]$. In other words, $d$ is a nonlinear derivation [1] (Lemma 2.2.6). This derivation may also bring to mind the Shannon entropy of a probability distribution. Indeed, a probability distribution on a finite set $\{1, \ldots, n\}$ for $n \geq 1$ is a tuple of nonnegative real numbers $p = (p_1, \ldots, p_n)$ satisfying $\sum_{i=1}^{n} p_i = 1$, and the Shannon entropy of $p$ is defined to be
$H(p) = -\sum_{i=1}^{n} p_i \log p_i = \sum_{i=1}^{n} d(p_i).$
Although d is not linear, this may prompt one to wonder about settings in which Shannon entropy itself is a derivation. We describe one such setting below by showing a correspondence between Shannon entropy and derivations of the operad of topological simplices.
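As a quick numerical illustration of the two facts above, the following sketch checks the Leibniz-type rule for $d$ on a small grid and writes entropy as a sum of values of $d$. The function names and the grid are illustrative choices, and the natural logarithm is assumed.

```python
import math

def d(x: float) -> float:
    """d(x) = -x log x for x > 0, and d(0) = 0."""
    return -x * math.log(x) if x > 0 else 0.0

def shannon_entropy(p):
    """Shannon entropy (natural log), written as the sum of d over the coordinates."""
    return sum(d(pi) for pi in p)

# Check d(xy) = d(x) y + x d(y) on a small grid of points in [0, 1].
grid = [0.0, 0.1, 0.25, 0.5, 0.9, 1.0]
leibniz_holds = all(
    math.isclose(d(x * y), d(x) * y + x * d(y), abs_tol=1e-12)
    for x in grid
    for y in grid
)

# Entropy of a fair coin; this equals log 2.
fair_coin_entropy = shannon_entropy([0.5, 0.5])
```

A grid check is of course not a proof; the identity itself follows from $\log(xy) = \log x + \log y$.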

#### 1.1. Motivation

As evidenced by recent work, the intersection of information theory and algebraic topology is fertile ground. In 2015, tools of information cohomology were introduced in [2] by Baudot and Bennequin, who construct a certain cochain complex for which entropy represents the unique cocycle in degree 1. In the same year, Elbaz-Vincent and Gangl approached entropy from an algebraic perspective and showed that what are known as information functions of degree 1 behave “a lot like certain derivations” [3]. A few years prior in 2011, Baez, Fritz, and Leinster gave a category theoretical characterization of entropy in [4], which was recently extended to the quantum setting by Parzygnat in [5]. In preparation for that 2011 result, Baez remarked in the informal article [6] that entropy appears to behave similarly to a derivation in a certain operadic context, an observation we verify and make explicit below. Cohomological ideas are also explored in Mainiero’s recent work, where entropy is found to appear in the Euler characteristic of a particular cochain complex associated to a quantum state [7]. Upon taking inventory, one thus has the sense that entropy behaves somewhat like “d of something,” for some (co)boundary-like operator $d$. The present article is in this same vein. Notably, once a few simple definitions are in place, the mathematics is quite straightforward. Even so, we feel it is worth sharing if for no other reason than to provide a glimpse at yet another algebraic and topological facet of entropy.

#### 1.2. Background

To start, our work is based on a particular characterization of Shannon entropy that is compatible with an operadic viewpoint. Let $\Delta^n$ denote the standard topological $n$-simplex for $n \geq 0$,
$\Delta^n := \{ (p_0, p_1, \ldots, p_n) \in \mathbb{R}^{n+1} \mid 0 \leq p_i \leq 1 \text{ and } \sum_{i=0}^{n} p_i = 1 \},$
where the single point of $\Delta^0$ is the unique probability distribution on the one-point set. More generally, any probability distribution $p = (p_0, \ldots, p_n)$ on an $(n+1)$-element set is a point in $\Delta^n$. Given $n + 1$ probability distributions $q^i = (q^i_0, \ldots, q^i_{k_i}) \in \Delta^{k_i}$ where $i = 0, 1, \ldots, n$, they may be composed with $p$ simultaneously to obtain a point in $\Delta^{k_0 + k_1 + \cdots + k_n + n}$ denoted by
$p \circ (q^0, q^1, \ldots, q^n) := (p_0 q^0_0, \ldots, p_0 q^0_{k_0}, p_1 q^1_0, \ldots, p_1 q^1_{k_1}, \ldots, p_n q^n_0, \ldots, p_n q^n_{k_n}).$
As shown in [1] and reviewed below, this composition of probabilities finds a natural home in the language of operads. Furthermore, it plays a key role in a well-known 1956 characterization of Shannon entropy due to D. K. Faddeev [8]. A proof of a slight variation of Faddeev’s result was recently given by Leinster [1] (Theorem 2.5.1). That is the version we quote here.
Theorem 1 (Faddeev-Leinster). Let $\{ F : \Delta^n \to \mathbb{R} \}_{n \geq 0}$ be a sequence of functions. The following are equivalent:
1. the functions $F$ are continuous and satisfy
$F(p \circ (q^0, \ldots, q^n)) = F(p) + \sum_{i=0}^{n} p_i F(q^i) \qquad (1)$
where $n \geq 0$ and $p \in \Delta^n$ and $q^i \in \Delta^{k_i}$ with $k_0, k_1, \ldots, k_n \geq 0$;
2. $F = cH$ for some $c \in \mathbb{R}$.
To make the connection with derivations, let us introduce some notation. Given a probability distribution $p \in \Delta^n$ let $\bar{p} : \mathbb{R}^{n+1} \to \mathbb{R}$ denote the function that maps a point $x = (x_0, \ldots, x_n)$ to the standard inner product $\langle p, x \rangle = \sum_{i=0}^{n} p_i x_i$. Then, when $F = H$, Equation (1) may be rewritten as
$H(p \circ (q^0, \ldots, q^n)) = H(p) + \bar{p}(H(q^0), \ldots, H(q^n)). \qquad (2)$
This equation is one hint that entropy might be a derivation, although a “q” is notably absent from the first term on the right-hand side. As a further teaser, Baez explored an algebraic interpretation of Equation (2) in the informal article [6], where the reader is reminded that Shannon entropy is a derivative of the partition function of a probability distribution with respect to Boltzmann’s constant, considered as a formal parameter. In that article, Equation (2) follows in a few short lines from this computation. One is thus motivated to look for a general framework of operad derivations for which Equation (2) is an example. This is what we describe below.
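Equation (2) is easy to test numerically. The sketch below composes one distribution with several others and compares both sides of the equation; the helper names and the sample distributions are illustrative assumptions, and entropy is computed with the natural logarithm.

```python
import math

def entropy(p):
    """Shannon entropy with the convention 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def compose(p, qs):
    """Simultaneous composition: scale each q^i by p_i, then concatenate."""
    if len(p) != len(qs):
        raise ValueError("need exactly one distribution q^i per coordinate of p")
    return [pi * qij for pi, q in zip(p, qs) for qij in q]

p = [0.5, 0.3, 0.2]
qs = [[0.5, 0.5], [1.0], [0.1, 0.2, 0.7]]

composite = compose(p, qs)
lhs = entropy(composite)
# Right-hand side of Equation (2): H(p) plus the inner product of p with the H(q^i).
rhs = entropy(p) + sum(pi * entropy(q) for pi, q in zip(p, qs))
```

The composite has $2 + 1 + 3 = 6$ coordinates and is again a probability distribution, and `lhs` and `rhs` agree up to floating-point error.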
Section 2 reviews the definition of operads and representations of them. We will recall that the collection of topological simplices admits the structure of an operad as in [1] and that $\mathbb{R}$ gives rise to a representation of it. In Section 3, we define an abelian bimodule $M$ over any operad $O$ and the notion of a derivation of $O$ with values in $M$. With these definitions in place, Equation (2) will find a generalization in Proposition 1, and the main result will quickly follow.
Theorem 2.
Shannon entropy defines a derivation of the operad of topological simplices, and for every derivation of this operad there exists a point at which it is given by a constant multiple of Shannon entropy.

## 2. Background: Operads and Their Representations

In an introduction to operads, it is helpful to first think about algebras. An algebra $A$ is a vector space $V$ equipped with a bilinear map $\mu : V \times V \to V$ thought of as multiplication. Depending on whether $\mu$ satisfies a particular relation, the algebra will usually be described by an appropriate qualifier. For instance, if $\mu(v, w) = \mu(w, v)$ for all $v, w \in V$, then $A$ is called a commutative algebra; if $\mu(\mu(u, v), w) = \mu(u, \mu(v, w))$ for all $u, v, w \in V$, then $A$ is called an associative algebra, and so on. Behind each of these algebras is a particular operad that encodes the behavior of the multiplication map $\mu$. To motivate the formal definition, it is helpful to visualize $\mu$ as a planar binary rooted tree and more generally to imagine an arbitrary $n$-ary operation as a planar rooted tree with $n$ leaves. There is a natural way to compose such operations. For instance, when $f$ is a 3-ary operation and $g$ is a 4-ary operation, they may be composed to obtain a 6-ary operation by using the output of $g$ as one of the inputs of $f$ as illustrated in Figure 1. There $g$ has been grafted into the second leaf of the tree associated to $f$, and so we denote that choice with the subscript “$\circ_2$” in the figure. There are two other composites $f \circ_1 g$ and $f \circ_3 g$, which are not shown but are obtained similarly.
In general, there are $n$ ways to compose an $m$-ary operation with an $n$-ary operation, and the resulting operation will always have arity $m + n - 1$. This composition should further satisfy some sensible associativity and unital axioms, and the collection of all such operations with their compositions is called an operad. The concept has origins in category theory [9] and has been used extensively in algebraic topology and homotopy theory [10,11,12,13,14] with applications in physics as well [15,16]. Operads may be defined in any symmetric monoidal category, and for ease of exposition below, we will assume all categories $C$ are concrete (that is, all objects have underlying sets) so that we may refer to elements in a given object of $C$. Indeed, the main example to have in mind is the category of topological spaces.
Definition 1.
Let $C$ be a symmetric monoidal category with monoidal product $\otimes$. An operad in $C$ consists of a sequence of objects $\{ O(1), O(2), \ldots \}$ together with morphisms
$\circ_i : O(n) \otimes O(m) \to O(n + m - 1)$
in $C$ for all $n, m \geq 1$ and $1 \leq i \leq n$ and an operation $1 \in O(1)$ satisfying the following:
(i) [associativity] For all $p \in O(n)$ and $q \in O(m)$ and $r \in O(k)$,
$(p \circ_j q) \circ_i r = \begin{cases} (p \circ_i r) \circ_{j+k-1} q & \text{if } 1 \leq i \leq j - 1, \\ p \circ_j (q \circ_{i-j+1} r) & \text{if } j \leq i \leq j + m - 1, \\ (p \circ_{i-m+1} r) \circ_j q & \text{if } i \geq j + m. \end{cases}$
(ii) [identity] The operation $1 \in O(1)$ acts as an identity in the sense that
$1 \circ_1 p = p \circ_i 1 = p$
for all $p \in O(n)$ and $1 \leq i \leq n$.
The definition is conceptually simple despite its cumbersome appearance. For instance, Figure 2 illustrates the associativity requirements listed in item (i).
As mentioned above, one often thinks of the elements of $O(n)$ as abstract $n$-to-1 operations, and the morphisms $\circ_i$ specify a way to compose them. It is common to begin indexing the sequence of objects at $n = 0$ to account for 0-ary operations, but as we will soon see, our main example of an operad in Example 2 will have no 0-ary operations, and so our definition starts with $O(1)$. We do not consider an action of the symmetric group and so $O$ is sometimes called a non-symmetric operad, but we will simply call it an operad. In the special case when $C$ is the category of vector spaces with linear maps and $\otimes$ is the tensor product, $O$ is often called a linear operad. When it is the category $\mathsf{Top}$ of topological spaces with continuous maps and $\otimes$ is the Cartesian product, $O$ is often called a topological operad.
Example 1.
Given a set $X$, the endomorphism operad is $\mathrm{End}_X = \{ \mathrm{End}_X(1), \mathrm{End}_X(2), \ldots \}$ where $\mathrm{End}_X(n) := C(X^n, X)$ denotes the set of all functions from the $n$-fold Cartesian product $X^n$ to $X$. The unit operation in $\mathrm{End}_X(1)$ is the identity function $\mathrm{id}_X : X \to X$. If $f \in C(X^n, X)$ and $g \in C(X^m, X)$ are a pair of functions, then for each $i = 1, \ldots, n$ the composition $f \circ_i g$ is obtained by using the output of $g$ as the $i$th input of $f$. Explicitly, given $(x_1, \ldots, x_{n+m-1}) \in X^{n+m-1}$,
$(f \circ_i g)(x_1, \ldots, x_{n+m-1}) := f(x_1, \ldots, x_{i-1}, g(x_i, \ldots, x_{i+m-1}), x_{i+m}, \ldots, x_{n+m-1}).$
The simultaneous composition of several functions may also be considered. That is, given $n$ functions $g_i \in C(X^{k_i}, X)$ where $i = 1, \ldots, n$, they may be composed with $f$ simultaneously to obtain a new function $f \circ (g_1, \ldots, g_n) \in C(X^{k_1 + \cdots + k_n}, X)$, which is again defined by using the outputs of the $g_i$ as the inputs of $f$. Explicitly, given $(x_1, \ldots, x_{k_1 + \cdots + k_n}) \in X^{k_1 + \cdots + k_n}$, we have
$(f \circ (g_1, \ldots, g_n))(x_1, \ldots, x_{k_1 + \cdots + k_n}) = f(g_1(x_1, \ldots, x_{k_1}), \ldots, g_n(x_{k_1 + \cdots + k_{n-1} + 1}, \ldots, x_{k_1 + \cdots + k_n})).$
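The partial composition $f \circ_i g$ in the endomorphism operad can be sketched in a few lines. Here $X$ is taken to be the set of strings so that the result displays how the inputs are routed; the helper `partial_compose` and the sample operations are hypothetical names introduced only for this illustration.

```python
def partial_compose(f, n, g, m, i):
    """Return f ∘_i g, where f takes n arguments, g takes m, and 1 <= i <= n."""
    if not 1 <= i <= n:
        raise ValueError("i must satisfy 1 <= i <= n")

    def composite(*xs):
        if len(xs) != n + m - 1:
            raise TypeError(f"expected {n + m - 1} arguments, got {len(xs)}")
        # Feed the m inputs starting at position i into g ...
        inner = g(*xs[i - 1 : i - 1 + m])
        # ... and use that output as the i-th input of f.
        return f(*xs[: i - 1], inner, *xs[i - 1 + m :])

    return composite

# A 3-ary and a 4-ary operation on strings that record their inputs.
f = lambda a, b, c: f"f({a},{b},{c})"
g = lambda w, x, y, z: f"g({w},{x},{y},{z})"

f_2_g = partial_compose(f, 3, g, 4, 2)  # arity 3 + 4 - 1 = 6
result = f_2_g("x1", "x2", "x3", "x4", "x5", "x6")
```

Here `result` is the string `f(x1,g(x2,x3,x4,x5),x6)`, matching the grafting of $g$ into the second leaf of $f$.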
Example 2.
The simplices $\Delta^0, \Delta^1, \Delta^2, \ldots$ give rise to a topological operad called the operad of topological simplices $\Delta = \{ \Delta_1, \Delta_2, \ldots \}$ where $\Delta_n := \Delta^{n-1}$. The unit operation in $\Delta_1$ is the unique probability distribution on a one-point set. If $p = (p_1, \ldots, p_n) \in \Delta_n$ and $q = (q_1, \ldots, q_m) \in \Delta_m$ are probability distributions, then the composition $p \circ_i q$ is obtained by multiplying each of the $m$ coordinates of $q$ by $p_i$ and then replacing the $i$th coordinate of $p$ with the resulting $m$-tuple. Explicitly,
$p \circ_i q := (p_1, \ldots, p_{i-1}, p_i q_1, \ldots, p_i q_m, p_{i+1}, \ldots, p_n) \in \Delta_{n+m-1}.$
Equivalently, the distribution $p$ may be visualized as a planar tree with $n$ leaves labeled by the probabilities $p_1, \ldots, p_n$ and similarly for $q$. Then the composition $p \circ_i q$ is obtained by “painting” each of the leaves of $q$ with the probability $p_i$ and grafting the resulting tree into the $i$th leaf of $p$ as below. Notice the sum of the probabilities on the leaves of the composite tree is 1.
As an example, if $p = (\tfrac{1}{6}, \ldots, \tfrac{1}{6})$ represents the probability distribution of rolling a six-sided die and $q = (\tfrac{1}{2}, \tfrac{1}{2})$ is that of a fair coin toss, then $p \circ_3 q = (\tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{12}, \tfrac{1}{12}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6})$ is a point in $\Delta_7$, whose picture is shown on the left of Figure 3.
Further recall that if we have $n$ different distributions $q^i = (q^i_1, \ldots, q^i_{k_i}) \in \Delta_{k_i}$ where $i = 1, \ldots, n$, then we may compose them with $p$ simultaneously to obtain the following point in $\Delta_{k_1 + \cdots + k_n}$,
$p \circ (q^1, \ldots, q^n) = (p_1 q^1_1, \ldots, p_1 q^1_{k_1}, p_2 q^2_1, \ldots, p_2 q^2_{k_2}, \ldots, p_n q^n_1, \ldots, p_n q^n_{k_n}).$
This simultaneous composition is illustrated by the tree on the right in Figure 3.
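The die-and-coin example can be reproduced exactly with rational arithmetic. The following sketch implements $p \circ_i q$ and also checks one instance of the first associativity case of Definition 1; the helper name `compose_at` and the third distribution `r` are assumptions made for the illustration.

```python
from fractions import Fraction

def compose_at(p, i, q):
    """Return p ∘_i q: replace the i-th coordinate of p (1-indexed) by p_i * q."""
    if not 1 <= i <= len(p):
        raise ValueError("i must satisfy 1 <= i <= len(p)")
    return p[: i - 1] + tuple(p[i - 1] * qk for qk in q) + p[i:]

die = tuple(Fraction(1, 6) for _ in range(6))
coin = (Fraction(1, 2), Fraction(1, 2))

die_3_coin = compose_at(die, 3, coin)  # a 7-tuple, i.e. a point of Δ_7

# One instance of associativity with i = 1, j = 3 (so 1 <= i <= j - 1) and k = 2:
# (p ∘_j q) ∘_i r should equal (p ∘_i r) ∘_{j+k-1} q.
r = (Fraction(1, 3), Fraction(2, 3))
lhs = compose_at(compose_at(die, 3, coin), 1, r)
rhs = compose_at(compose_at(die, 1, r), 3 + len(r) - 1, coin)
```

Using `Fraction` avoids floating-point noise, so the composite can be compared with the tuple stated in the text by exact equality.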
Just as groups come to life when considering representations of them, so operads come to life when each abstract n-ary operation is mapped to a concrete n-ary operation on a particular object. This assignment is traditionally called an algebra of the operad, but we prefer the more descriptive name representation.
Definition 2.
Let $O$ be an operad in the category of sets. A representation of $O$, or an $O$-representation, is a set $X$ together with functions
$\varphi_n : O(n) \to \mathrm{End}_X(n) \quad \text{for } n \geq 1$
that respect the operad unit and compositions. That is, $\varphi_1(1) = 1$ and
$\varphi_{n+m-1}(p \circ_i q) = \varphi_n(p) \circ_i \varphi_m(q)$
for all $p \in O(n)$, $q \in O(m)$ and $1 \leq i \leq n$.
Importantly, one may also wish to define a representation of an operad in any symmetric monoidal category $C$ whenever “$\mathrm{End}_X(n)$” is in fact an object in $C$. It must consist of an object $X$ together with a family of morphisms $O(n) \to \mathrm{End}_X(n)$ in $C$ that are compatible with the operad unit and compositions. This holds, for instance, when the monoidal category $C$ is also closed, that is, when it is equipped with an internal hom functor that is compatible with the monoidal product. Monoidal closure, however, will not be required in our work, which primarily concerns the category $\mathsf{Top}$ of topological spaces. Indeed, the main example to have in mind is when $O = \Delta$ is the operad of simplices and $X = \mathbb{R}$ is the real line in $\mathsf{Top}$. In this case, we define $\mathrm{End}_{\mathbb{R}}(n) := \mathsf{Top}(\mathbb{R}^n, \mathbb{R})$ to be the space of continuous functions $\mathbb{R}^n \to \mathbb{R}$ equipped with the product topology. Now, consider the continuous maps $\varphi_n : \Delta_n \to \mathrm{End}_{\mathbb{R}}(n)$ given by $p \mapsto \varphi_n(p)$ where $\varphi_n(p)(x) := \langle p, x \rangle = \sum_{i=1}^{n} p_i x_i$ whenever $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. Then, it is simple to check that $\varphi_{n+m-1}(p \circ_i q) = \varphi_n(p) \circ_i \varphi_m(q)$ for all $p$, $q$, and $i$, and that $\varphi_1$ sends the unit of $\Delta_1$ to the identity function on $\mathbb{R}$, and so $\mathbb{R}$ is a representation of $\Delta$.
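The compatibility $\varphi_{n+m-1}(p \circ_i q) = \varphi_n(p) \circ_i \varphi_m(q)$ can be checked at a sample point. The sketch below is one such check, with hypothetical helper names and arbitrarily chosen values of $p$, $q$, $i$, and $x$.

```python
import math

def compose_at(p, i, q):
    """p ∘_i q for distributions stored as lists (i is 1-indexed)."""
    return p[: i - 1] + [p[i - 1] * qk for qk in q] + p[i:]

def phi(p):
    """phi(p) is the function x -> <p, x>."""
    def inner_product(x):
        if len(x) != len(p):
            raise ValueError("dimension mismatch")
        return sum(pi * xi for pi, xi in zip(p, x))
    return inner_product

p = [0.2, 0.5, 0.3]          # n = 3
q = [0.25, 0.75]             # m = 2
i = 2
x = [1.0, -2.0, 4.0, 0.5]    # a point of R^(n + m - 1)

# Left-hand side: compose first, then take the inner product with x.
lhs = phi(compose_at(p, i, q))(x)

# Right-hand side: phi(p) ∘_i phi(q) uses <q, (x_i, ..., x_{i+m-1})> as the
# i-th input of phi(p), as in the endomorphism operad.
m = len(q)
rhs = phi(p)(x[: i - 1] + [phi(q)(x[i - 1 : i - 1 + m])] + x[i - 1 + m :])
```

Both sides evaluate to approximately 1.6 for these inputs; the general identity follows because the inner product is linear in each block of coordinates.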

## 3. Derivations of the Operad of Simplices

With these basic definitions in hand, the present goal is to define a mapping $d$ out of the topological operad $\Delta$ that satisfies an appropriate version of the Leibniz rule,
$d(p \circ_i q) = dp \circ_i q + p \circ_i dq \qquad \text{(desideratum)} \qquad (3)$
for all $p \in \Delta_n$ and $q \in \Delta_m$ and for all $1 \leq i \leq n$. This desired equation suggests the codomain of $d$ should be a (bi)module over $\Delta$ that is, moreover, an abelian monoid. This motivates the following two definitions, the first of which is a slight generalization of that given by Markl in [15].
Definition 3.
Let $O = \{ O(1), O(2), \ldots \}$ be an operad in a symmetric monoidal category $C$. A bimodule over $O$, or simply an $O$-bimodule, is a collection of objects $M = \{ M(1), M(2), \ldots \}$ in $C$ together with morphisms
$\begin{aligned} \circ_i^L &: O(n) \otimes M(m) \to M(n + m - 1) && \text{(left composition)} \\ \circ_i^R &: M(n) \otimes O(m) \to M(n + m - 1) && \text{(right composition)} \end{aligned}$
in $C$ for each $1 \leq i \leq n$ such that whenever
$p \otimes q \otimes r \in M(n) \otimes O(m) \otimes O(k), \text{ or } O(n) \otimes M(m) \otimes O(k), \text{ or } O(n) \otimes O(m) \otimes M(k)$
the following holds:
$(p \circ_j q) \circ_i r = \begin{cases} (p \circ_i r) \circ_{j+k-1} q & \text{if } 1 \leq i \leq j - 1, \\ p \circ_j (q \circ_{i-j+1} r) & \text{if } j \leq i \leq j + m - 1, \\ (p \circ_{i-m+1} r) \circ_j q & \text{if } i \geq j + m. \end{cases} \qquad (4)$
The associativity requirements displayed in Equation (4)—and hence the intuition behind them—are completely analogous to those defining operads as illustrated in Figure 2. The only difference here is that one of the three operations may come from the bimodule rather than the operad. Here is the main example to have in mind.
Example 3.
As every algebra is a bimodule over itself, so every representation of $O$ is an $O$-bimodule in a straightforward way. Indeed, in the case of the topological operad of simplices, the maps comprising the $\Delta$-representation structure on $\mathbb{R}$ induce a $\Delta$-bimodule structure on $\mathrm{End}_{\mathbb{R}}$. However, we will make use of a slight variant of this bimodule structure. Right composition will be defined in the expected way, though left composition will not. Explicitly, we define the left and right composition maps
$\begin{aligned} \circ_i^L &: \Delta_n \times \mathsf{Top}(\mathbb{R}^m, \mathbb{R}) \longrightarrow \mathsf{Top}(\mathbb{R}^{n+m-1}, \mathbb{R}) \\ \circ_i^R &: \mathsf{Top}(\mathbb{R}^n, \mathbb{R}) \times \Delta_m \longrightarrow \mathsf{Top}(\mathbb{R}^{n+m-1}, \mathbb{R}) \end{aligned}$
as follows. Given a probability distribution $p \in \Delta_n$ and a continuous function $f : \mathbb{R}^m \to \mathbb{R}$, define left composition by $p \circ_i^L f := \bar{p} \circ (0, \ldots, 0, f, 0, \ldots, 0)$, where the composition on the right-hand side is defined as in the simultaneous composition in the endomorphism operad of $\mathbb{R}$ illustrated in Example 1, and where each 0 denotes the zero function $\mathbb{R} \to \mathbb{R}$. Here, recall that $\bar{p} : \mathbb{R}^n \to \mathbb{R}$ maps a point $x$ to the standard inner product $\langle p, x \rangle$ as introduced in Section 1. Unwinding this, left composition thus evaluates explicitly as $(p \circ_i^L f)(x_1, \ldots, x_{n+m-1}) = p_i f(x_i, \ldots, x_{i+m-1})$. In words, the value of the left composite $p \circ_i^L f : \mathbb{R}^{n+m-1} \to \mathbb{R}$ at a point $x$ is computed by evaluating $f$ at the $m$-subtuple of $x$ beginning at the $i$th coordinate and scaling that output by $p_i$. All other coordinates of $x$ are ignored. The picture to have in mind is that below, where the bold dots are imagined to be “plugs” that prevent the surplus coordinates from playing a role. In this picture, $n = 3$ and $m = 2$.
Given a probability distribution $q \in \Delta_m$ and a continuous function $g : \mathbb{R}^n \to \mathbb{R}$, define right composition by
$(g \circ_i^R q)(x_1, \ldots, x_{n+m-1}) := g\Big(x_1, \ldots, x_{i-1}, \sum_{k=1}^{m} q_k x_{i+k-1}, x_{i+m}, \ldots, x_{n+m-1}\Big).$
This may be understood visually as well. The value of the right composite $g \circ_i^R q : \mathbb{R}^{n+m-1} \to \mathbb{R}$ at a point $x$ is computed by taking the inner product of $q$ with the $m$-subtuple of $x$ beginning at the $i$th coordinate and using that number as the $i$th input of $g$, with all other coordinates of $x$ falling into place as in the picture below. There are no “plugs” in this instance since all coordinates of $x$ play a role.
These examples suggest the inner product notation is a convenient choice. Given $N \geq 1$ and $k \leq N$ and a point $x \in \mathbb{R}^N$, let $\mathbf{x}_{i,k} \in \mathbb{R}^k$ denote the $k$-subtuple of $x$ beginning at the $i$th coordinate:
$\mathbf{x}_{i,k} := (x_i, \ldots, x_{i+k-1}).$
Then given any point $x \in \mathbb{R}^{n+m-1}$, the left and right composition maps may be written more succinctly as
$\begin{aligned} (p \circ_i^L f)(x) &= p_i f(\mathbf{x}_{i,m}) \\ (g \circ_i^R q)(x) &= g(x_1, \ldots, x_{i-1}, \langle q, \mathbf{x}_{i,m} \rangle, x_{i+m}, \ldots, x_{n+m-1}). \end{aligned}$
We will use this notation below and will always write $\mathbf{x}_i$ in lieu of $\mathbf{x}_{i,m}$ since the context will make it clear that $\mathbf{x}_i$ must be an $m$-tuple. The boldface font is used to distinguish a tuple $\mathbf{x}_i$ from a real number $x_i$. Finally, note that the maps $\circ_i^L$ and $\circ_i^R$ are continuous since $f$ and $g$ are continuous, and moreover that the associativity requirements in Equation (4) are analogous to those illustrated in Figure 2, so it is straightforward to verify they are satisfied. In particular, the zero functions appearing in the definition of $\circ_i^L$ simplify the situation greatly. For instance, several of the associativity requirements follow from the simple fact that multiplying an input $x_i$ by a probability and then mapping the result to zero is the same as first mapping the input to zero and then multiplying that zero by a probability. So $\mathrm{End}_{\mathbb{R}}$ is indeed a $\Delta$-bimodule.
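The two composition maps of Example 3 can be sketched directly from their pointwise formulas. In the code below, functions on $\mathbb{R}^k$ take a Python list of length $k$; the helper names and the sample functions `f` and `g` are assumptions made for the illustration.

```python
import math

def left_compose(p, i, f, m):
    """p ∘_i^L f: x -> p_i * f(x_i, ..., x_{i+m-1}); all other coordinates are ignored."""
    n = len(p)
    def composite(x):
        if len(x) != n + m - 1:
            raise ValueError("expected a point of R^(n + m - 1)")
        return p[i - 1] * f(x[i - 1 : i - 1 + m])
    return composite

def right_compose(g, n, i, q):
    """g ∘_i^R q: replace the block x_i, ..., x_{i+m-1} by <q, block>, then apply g."""
    m = len(q)
    def composite(x):
        if len(x) != n + m - 1:
            raise ValueError("expected a point of R^(n + m - 1)")
        block = x[i - 1 : i - 1 + m]
        collapsed = sum(qk * xk for qk, xk in zip(q, block))
        return g(x[: i - 1] + [collapsed] + x[i - 1 + m :])
    return composite

p = [0.2, 0.5, 0.3]                        # a point of Δ_3
q = [0.25, 0.75]                           # a point of Δ_2
f = lambda y: y[0] * y[1]                  # a continuous function R^2 -> R
g = lambda y: y[0] + 2 * y[1] + 3 * y[2]   # a continuous function R^3 -> R
x = [1.0, -2.0, 4.0, 0.5]                  # a point of R^4

left_value = left_compose(p, 2, f, 2)(x)    # p_2 * f(x_2, x_3)
right_value = right_compose(g, 3, 2, q)(x)  # g(x_1, <q, (x_2, x_3)>, x_4)
```

For these inputs the left composite evaluates to $0.5 \cdot f(-2, 4) = -4$ and the right composite to $g(1, 2.5, 0.5) = 7.5$, which makes the asymmetry between the two maps visible: only the right composite depends on $x_1$ and $x_4$.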
Next, recall that the desired Leibniz rule in Equation (3) suggests the bimodule should be equipped with a notion of addition. This motivates the following definition.
Definition 4.
Let $O$ be an operad in a symmetric monoidal category $C$. An $O$-bimodule $M$ is an abelian $O$-bimodule if each $M(n)$ is an abelian monoid in $C$; that is, if for each $n = 1, 2, \ldots$ the following hold:
(i) [associativity, commutativity] there is a morphism $\mu_n : M(n) \times M(n) \to M(n)$ in $C$ such that $\mu_n(\mu_n(a, b), c) = \mu_n(a, \mu_n(b, c))$ and $\mu_n(a, b) = \mu_n(b, a)$ for all $a, b, c \in M(n)$,
(ii) [identity] there is an element $1 \in M(n)$ such that $\mu_n(1, a) = a = \mu_n(a, 1)$ for all $a \in M(n)$.
As the primary example, consider $\mathrm{End}_{\mathbb{R}}$ viewed as a $\Delta$-bimodule as described in Example 3. For each $n$, define $\mu_n : \mathrm{End}_{\mathbb{R}}(n) \times \mathrm{End}_{\mathbb{R}}(n) \to \mathrm{End}_{\mathbb{R}}(n)$ by pointwise addition, meaning that for each $f, g \in \mathrm{End}_{\mathbb{R}}(n)$ we have $\mu_n(f, g) = f + g$ where $(f + g)(x) := f(x) + g(x)$ for all $x \in \mathbb{R}^n$. The identity element in $\mathrm{End}_{\mathbb{R}}(n)$ is the constant map at zero. Moreover, each $\mu_n$ is continuous and inherits associativity and commutativity from $\mathbb{R}$. In this way, $\mathrm{End}_{\mathbb{R}}$ is an abelian $\Delta$-bimodule.
Remark 1.
Notice that the $\Delta$-bimodule composition maps $\circ_i^L$ and $\circ_i^R$ distribute over sums in the abelian $\Delta$-bimodule $\mathrm{End}_{\mathbb{R}}$. In other words, for all continuous functions $f, g \in \mathrm{End}_{\mathbb{R}}(n)$ and for all probability distributions $q \in \Delta_m$,
$(f + g) \circ_i^R q = f \circ_i^R q + g \circ_i^R q, \qquad 1 \leq i \leq n,$
and similarly for left composition $\circ_i^L$. This follows directly from pointwise addition.
With this setup in mind, our desideratum in Equation (3) is now realized in the following definition.
Definition 5.
Let $O$ be an operad in a category $C$ and let $M$ be an abelian $O$-bimodule. A derivation of $O$ valued in $M$ is a sequence of morphisms $\{ d_n : O(n) \to M(n) \}$ in $C$ satisfying
$d_{n+m-1}(p \circ_i q) = d_n p \circ_i^R q + p \circ_i^L d_m q \qquad (5)$
for all $p \in O(n)$, $q \in O(m)$ and for all $1 \leq i \leq n$.
In the special case when $O$ is a linear operad, this definition coincides with that given by Markl in [15]. In what follows, we omit the subscripts and simply write $d$ instead of $d_n$. Now, suppose $O = \Delta$ is the operad of topological simplices and $\mathrm{End}_{\mathbb{R}}$ is equipped with the structure of an abelian $\Delta$-bimodule given above. Here is the picture to have in mind for Equation (5):
On the right-hand side we have used the “plug” notation introduced in Example 3, which can also be understood explicitly by evaluating $d$ at a point $x \in \mathbb{R}^{n+m-1}$,
$\begin{aligned} d(p \circ_i q)(x) &= (dp \circ_i^R q)(x) + (p \circ_i^L dq)(x) \\ &= dp(x_1, \ldots, \langle q, \mathbf{x}_i \rangle, \ldots, x_{n+m-1}) + p_i\, dq(\mathbf{x}_i). \end{aligned}$
Of particular interest is the behavior of a derivation $\{ d : \Delta_n \to \mathrm{End}_{\mathbb{R}}(n) \}$ when it is applied to a simultaneous composition of probability distributions. A derivation applied to the composite $(p \circ_j q) \circ_i r$ for probability distributions $p \in \Delta_n$, $q \in \Delta_m$, and $r \in \Delta_k$ can be understood in a convenient picture when $q$ and $r$ are composed onto different leaves of $p$; that is, when $1 \leq i \leq j - 1$ or $i \geq j + m$. This follows straightforwardly from a repeated application of $d$. Indeed, by definition we have $d((p \circ_j q) \circ_i r) = d(p \circ_j q) \circ_i^R r + (p \circ_j q) \circ_i^L dr$, and by applying the Leibniz rule again to the first summand, this is equal to $(dp \circ_j^R q + p \circ_j^L dq) \circ_i^R r + (p \circ_j q) \circ_i^L dr$, which we can expand to obtain $(dp \circ_j^R q) \circ_i^R r + (p \circ_j^L dq) \circ_i^R r + (p \circ_j q) \circ_i^L dr$ since composition distributes over sums as noted in Remark 1. We will identify this function with the picture below in lieu of the cumbersome notation.
Importantly, the obvious generalization of the formula holds for any simultaneous composition $p \circ (q^1, \ldots, q^n)$ for any $p \in \Delta_n$ and $q^i \in \Delta_{k_i}$ where $i = 1, \ldots, n$. This again follows directly from repeated applications of Equation (5), as illustrated below.
This is summarized in the following proposition.
Proposition 1.
Let $p \in \Delta_n$ and $q^i \in \Delta_{k_i}$ for $n, k_1, \ldots, k_n \geq 1$ and let $\{ d : \Delta_n \to \mathrm{End}_{\mathbb{R}}(n) \}$ be a derivation of the operad of topological simplices. Then for any point $x \in \mathbb{R}^{k_1 + \cdots + k_n}$,
$d(p \circ (q^1, \ldots, q^n))(x) = dp(\langle q^1, \mathbf{x}_1 \rangle, \ldots, \langle q^n, \mathbf{x}_n \rangle) + \sum_{i=1}^{n} p_i\, dq^i(\mathbf{x}_i).$
Finally, the main result follows.
Theorem 3.
Shannon entropy defines a derivation of the operad of topological simplices, and for every derivation of this operad there exists a point at which it is given by a constant multiple of Shannon entropy.
Proof.
For each $n \geq 1$ define $d : \Delta_n \to \mathrm{End}_{\mathbb{R}}(n)$ by $p \mapsto dp$ where $dp(x) = H(p)$ is constant for all $x \in \mathbb{R}^n$. Then, $d$ is continuous since $H$ is continuous. Moreover, if $p = (p_1, \ldots, p_n) \in \Delta_n$ and $q = (q_1, \ldots, q_m) \in \Delta_m$ are probability distributions, then for any $x \in \mathbb{R}^{m+n-1}$ and $1 \leq i \leq n$, we have
$\begin{aligned} d(p \circ_i q)(x) &= H(p \circ_i q) \\ &= -\Big( \sum_{k=1}^{i-1} p_k \log p_k + p_i \sum_{k=1}^{m} q_k \log(p_i q_k) + \sum_{k=i+1}^{n} p_k \log p_k \Big) \\ &= -\Big( \sum_{k=1}^{i-1} p_k \log p_k + p_i \log p_i \sum_{k=1}^{m} q_k + p_i \sum_{k=1}^{m} q_k \log q_k + \sum_{k=i+1}^{n} p_k \log p_k \Big) \\ &= -\sum_{k=1}^{n} p_k \log p_k - p_i \sum_{k=1}^{m} q_k \log q_k \\ &= H(p) + p_i H(q) \\ &= (dp \circ_i^R q + p \circ_i^L dq)(x), \end{aligned}$
where the fourth line uses $\sum_{k=1}^{m} q_k = 1$, and the last line follows since $(dp \circ_i^R q)(x)$ is computed by evaluating the function $dp$ at some point, and this function is assumed to be constant at $H(p)$, while $(p \circ_i^L dq)(x) = p_i\, dq(\mathbf{x}_i) = p_i H(q)$.
Conversely, suppose $\{ d : \Delta_n \to \mathrm{End}_{\mathbb{R}}(n) \}$ is a derivation. For each $n \geq 1$ define a function $F : \Delta_n \to \mathbb{R}$ by $F(p) = dp(\mathbf{0})$ where $\mathbf{0} = (0, \ldots, 0) \in \mathbb{R}^n$. Then $F$ is continuous since $d$ is continuous, and Proposition 1 further implies that
$\begin{aligned} F(p \circ (q^1, \ldots, q^n)) &= d(p \circ (q^1, \ldots, q^n))(\mathbf{0}) \\ &= dp(\langle q^1, \mathbf{0}_1 \rangle, \ldots, \langle q^n, \mathbf{0}_n \rangle) + \sum_{i=1}^{n} p_i\, dq^i(\mathbf{0}_i) \\ &= dp(\mathbf{0}) + \sum_{i=1}^{n} p_i\, dq^i(\mathbf{0}) \\ &= F(p) + \sum_{i=1}^{n} p_i F(q^i). \end{aligned}$
From the Faddeev–Leinster result in Theorem 1, it follows that $dp(\mathbf{0}) = F(p) = cH(p)$ for some $c \in \mathbb{R}$. □
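The first half of the proof can be checked numerically. The sketch below takes $dp$ to be the constant function at $H(p)$ and compares both sides of Equation (5) at a sample point for every admissible $i$; the helper names and the sample data are illustrative assumptions, and the natural logarithm is used.

```python
import math

def entropy(p):
    """Shannon entropy with the convention 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def compose_at(p, i, q):
    """p ∘_i q for distributions stored as lists (i is 1-indexed)."""
    return p[: i - 1] + [p[i - 1] * qk for qk in q] + p[i:]

def d(p):
    """Candidate derivation: d p is the constant function on R^n with value H(p)."""
    return lambda x: entropy(p)

def leibniz_rhs(p, i, q, x):
    """(d p ∘_i^R q)(x) + (p ∘_i^L d q)(x) for a point x of R^(n + m - 1)."""
    m = len(q)
    block = x[i - 1 : i - 1 + m]
    collapsed = sum(qk * xk for qk, xk in zip(q, block))
    right_term = d(p)(x[: i - 1] + [collapsed] + x[i - 1 + m :])
    left_term = p[i - 1] * d(q)(block)
    return right_term + left_term

p = [0.2, 0.5, 0.3]
q = [0.25, 0.75]
x = [1.0, -2.0, 4.0, 0.5]

# Largest discrepancy between the two sides of Equation (5) over i = 1, ..., n.
max_gap = max(
    abs(d(compose_at(p, i, q))(x) - leibniz_rhs(p, i, q, x))
    for i in range(1, len(p) + 1)
)
```

Because $dp$ is constant here, the choice of `x` does not affect either side; the check is really confirming $H(p \circ_i q) = H(p) + p_i H(q)$ up to floating-point error.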
Notice that the important Equation (2) mentioned in the introduction is obtained as a corollary. Indeed, if for each $n \geq 1$ the map $d : \Delta_n \to \mathrm{End}_{\mathbb{R}}(n)$ is defined to be constant at entropy, $p \mapsto dp \equiv H(p)$, then $d$ is a derivation by Theorem 3 and so Proposition 1 yields the following by evaluating $d(p \circ (q^1, \ldots, q^n))$ at any point.
Corollary 1.
Let $p \in \Delta_n$ and $q^i \in \Delta_{k_i}$ with $1 \leq i \leq n$. Then
$H(p \circ (q^1, \ldots, q^n)) = H(p) + \sum_{i=1}^{n} p_i H(q^i).$
As a closing remark, Faddeev’s characterization of entropy in Theorem 1 can be reexpressed using the language of category theory and operads as in [1] (Theorem 12.3.1). We have omitted this language here but invite the reader to explore the full category theoretical story in Chapter 12 of Leinster’s book.

## Funding

This research received no external funding.


## Acknowledgments

I thank Darij Grinberg, Joey Hirsh, Tom Leinster, Jim Stasheff, and John Terilla for helpful discussions as well as the anonymous referees for their insightful feedback.

## Conflicts of Interest

The author declares no conflict of interest.

## References

1. Leinster, T. Entropy and Diversity: The Axiomatic Approach; Cambridge University Press: Cambridge, UK, 2021.
2. Baudot, P.; Bennequin, D. The Homological Nature of Entropy. Entropy 2015, 17, 3253–3318.
3. Elbaz-Vincent, P.; Gangl, H. Finite Polylogarithms, Their Multiple Analogues and the Shannon Entropy. In Lecture Notes in Computer Science; Nielsen, F., Barbaresco, F., Eds.; Geometric Science of Information. GSI 2015; Springer: Cham, Switzerland, 2015; Volume 9389, pp. 277–285.
4. Baez, J.C.; Fritz, T.; Leinster, T. A characterization of entropy in terms of information loss. Entropy 2011, 13, 1945–1957.
5. Parzygnat, A.J. A functorial characterization of von Neumann entropy. arXiv 2020, arXiv:2009.07125.
6. Baez, J.C. Entropy as a Functor. Blog Post. 2011. Available online: https://www.ncatlab.org/johnbaez/show/Entropy+as+a+functor (accessed on 15 July 2021).
7. Mainiero, T. Homological Tools for the Quantum Mechanic. arXiv 2019, arXiv:1901.02011.
8. Faddeev, D.K. On the concept of entropy of a finite probabilistic scheme. Uspekhi Mat. Nauk 1956, 11, 227–231. (In Russian)
9. Lambek, J. Deductive systems and categories. In Deductive systems and categories II. Standard constructions and closed categories; Hilton, P., Ed.; Category Theory, Homology Theory and their Applications, I (Battelle Institute Conference, Seattle, 1968); Springer: Berlin/Heidelberg, Germany, 1969; Volume 68.
10. May, J. The Geometry of Iterated Loop Spaces. In Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1972; Volume 271.
11. Boardman, J.M.; Vogt, R. Homotopy Invariant Algebraic Structures on Topological Spaces. In Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1973; Volume 347.
12. Loday, J.L.; Vallette, B. Algebraic Operads; Grundlehren der mathematischen Wissenschaften; Springer: Berlin/Heidelberg, Germany, 2012.
13. Vallette, B. Algebra + Homotopy = Operad. arXiv 2012, arXiv:1202.3245.
14. Stasheff, J. What is... an operad? Notices Amer. Math. Soc. 2004, 51, 630–631.
15. Markl, M. Models for operads. Commun. Algebra 1996, 24, 1471–1500.
16. Markl, M.; Shnider, S.; Stasheff, J. Operads in Algebra, Topology and Physics; Mathematical Surveys and Monographs, American Mathematical Society: Providence, RI, USA, 2002.
Figure 1. One of the three ways to compose a 4-ary operation g with a 3-ary operation f.
Figure 2. Associativity in an operad. (Left) First composing $q$ with $p$ and then $r$ is the same as first composing $r$ with $p$ and then $q$. The order in which this is performed does not matter. (Right) The same is true if $r$ appears to the right, rather than the left, of $q$. (Middle) Likewise, $r$ may first be composed with $q$ and their composite may then be composed with $p$, or $q$ may be first composed with $p$ followed by $r$. Again, the order does not matter.
Figure 3. (Left) A picture of the composition $p \circ_3 q$ when $p$ is the probability distribution associated to a six-sided die and $q$ is that of a fair coin toss. (Right) The simultaneous composition of $n$ probability distributions $q^i \in \Delta_{k_i}$ with a given $p \in \Delta_n$.

## Share and Cite

MDPI and ACS Style

Bradley, T.-D. Entropy as a Topological Operad Derivation. Entropy 2021, 23, 1195. https://doi.org/10.3390/e23091195
