# The Information Loss of a Stochastic Map


## Abstract


## 1. Introduction

- 0. Positivity: $K\left(f\right)\ge 0$ for all $(X,p)\stackrel{f}{\to}(Y,q)$. This says that the information loss associated with a deterministic process is always non-negative.
- 1. Functoriality: $K(g\circ f)=K\left(g\right)+K\left(f\right)$ for every composable pair $(f,g)$ of measure-preserving maps. This says that the information loss of two successive processes is the sum of the information losses associated with each process.
- 2. Convex Linearity: $K\left(\lambda f\oplus (1-\lambda )g\right)=\lambda K\left(f\right)+(1-\lambda )K\left(g\right)$ for all $\lambda \in (0,1)$. This says that the information loss associated with tossing a (possibly unfair) coin to decide between two processes is the corresponding weighted sum of their information losses.
- 3. Continuity: $K\left(f\right)$ is a continuous function of f. This says that the information loss does not change much under small perturbations (i.e., it is robust with respect to errors).

A **stochastic map** $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$ associates with every $x\in X$ a probability distribution ${f}_{x}$ on Y such that ${q}_{y}={\sum}_{x\in X}{f}_{yx}{p}_{x}$, where ${f}_{yx}$ is the distribution ${f}_{x}$ evaluated at $y\in Y$. In terms of information flow, the space $(X,p)$ may be thought of as a probability distribution on the set of inputs for a communication channel described by the stochastic matrix ${f}_{yx}$, while $(Y,q)$ is then thought of as the induced distribution on the set of outputs of the channel.
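In concrete terms, a stochastic map between finite sets is just a column-stochastic matrix, and the compatibility condition above is a matrix-vector product. A minimal sketch (our own illustration, with an arbitrary 2×2 channel, not taken from the text):

```python
import numpy as np

# A stochastic map f : X ⇝ Y between finite sets can be stored as a
# column-stochastic matrix: f[y, x] = f_{yx}, the probability of output
# y given input x, so every column sums to 1.
f = np.array([[0.9, 0.2],
              [0.1, 0.8]])

p = np.array([0.5, 0.5])  # probability measure on the inputs X

# The induced output distribution: q_y = Σ_x f_{yx} p_x.
q = f @ p

assert np.allclose(f.sum(axis=0), 1.0)  # each f_x is a probability measure
assert np.isclose(q.sum(), 1.0)         # q is again a probability measure
```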

We write $H\left(f\right|p)$ for the **conditional entropy** of $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$, given by the non-negative real number $H\left(f\right|p):={\sum}_{x\in X}{p}_{x}H\left({f}_{x}\right)=-{\sum}_{x\in X}{\sum}_{y\in Y}{p}_{x}{f}_{yx}\log \left({f}_{yx}\right)$.

The number $K\left(f\right):=H\left(p\right)-H\left(q\right)+H\left(f\right|p)$ is called the **conditional information loss** of $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$ (the same letter K is used here because it agrees with the Shannon entropy difference when f is deterministic). As $H\left(f\right|p)=0$ whenever f is deterministic, the conditional information loss restricts to the category of measure-preserving functions as the information loss functor of Baez, Fritz, and Leinster, while also satisfying conditions 0, 2, and 3 (i.e., positivity, convex linearity, and continuity) on the larger category of stochastic maps. However, conditional information loss is not functorial in general. While this may seem like a defect at first glance, we prove that no extension of the information loss functor remains functorial on the larger category of stochastic maps if the positivity axiom is to be preserved, thus retaining an interpretation as information loss. In spite of this, conditional information loss does satisfy a weakened form of functoriality, which we briefly describe now.
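These quantities can be computed directly. The following sketch (our own illustration, using natural logarithms and a toy 2×2 channel) also checks the reduction property 4(a), namely that K(f) equals the information loss of the projection onto Y under the joint distribution:

```python
import numpy as np

def H(p):
    """Shannon entropy of a probability vector (with 0 log 0 = 0)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def cond_entropy(f, p):
    """H(f|p) = Σ_x p_x H(f_x): the p-weighted entropy of the columns of f."""
    return sum(p[x] * H(f[:, x]) for x in range(len(p)))

def K(f, p):
    """Conditional information loss K(f) = H(p) - H(q) + H(f|p)."""
    q = f @ p
    return H(p) - H(q) + cond_entropy(f, p)

f = np.array([[0.9, 0.2],
              [0.1, 0.8]])
p = np.array([0.5, 0.5])
q = f @ p

joint = f * p  # ϑ(f)_{(x,y)} = f_{yx} p_x (p broadcasts over the columns)

assert K(f, p) >= 0                                  # positivity
assert np.isclose(K(f, p), H(joint.ravel()) - H(q))  # reduction: K(f) = K(π_Y)
```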

A composable pair $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)\stackrel{g}{\rightsquigarrow}(Z,r)$ is **a.e. coalescable** if and only if for every pair of elements $z\in Z$ and $x\in X$ for which ${r}_{z}>0$ and ${p}_{x}>0$, there exists a unique $y\in Y$ such that ${f}_{yx}>0$ and ${g}_{zy}>0$. Intuitively, this says that the information about the intermediate step can be recovered given knowledge of the input and output. In particular, if f is deterministic, then the pair $(f,g)$ is a.e. coalescable (for obvious reasons, since knowing x alone is enough to determine the intermediate value). However, there are many other situations where a pair could be a.e. coalescable without the maps being deterministic. With this definition in place (which we also generalize to the setting of arbitrary Markov categories), we replace functoriality with the following weaker condition.

- 1🟉. Semi-functoriality: $K(g\circ f)=K\left(g\right)+K\left(f\right)$ for every a.e. coalescable pair $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)\stackrel{g}{\rightsquigarrow}(Z,r)$ of stochastic maps. This says that the conditional information loss of two successive processes is the sum of the conditional information losses associated with each process provided that the information in the intermediate step can always be recovered.
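To see concretely why unrestricted functoriality must fail, one can discard the output of a noisy channel: the resulting pair is not a.e. coalescable, and the additivity defect is exactly the conditional entropy $H(f|p)$. A small numerical sketch (our own illustration, with natural logarithms):

```python
import numpy as np

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def cond_entropy(f, p):
    return sum(p[x] * H(f[:, x]) for x in range(len(p)))

def K(f, p):
    q = f @ p
    return H(p) - H(q) + cond_entropy(f, p)

f = np.array([[0.9, 0.2],      # a noisy channel X ⇝ Y
              [0.1, 0.8]])
p = np.array([0.5, 0.5])
q = f @ p
g = np.ones((1, 2))            # the discard map !_Y : Y ⇝ •

# Composition is matrix multiplication (Chapman–Kolmogorov).
defect = K(f, p) + K(g, q) - K(g @ f, p)

# The additivity defect equals the conditional entropy H(f|p), which is
# strictly positive since f is not a.e. deterministic.
assert np.isclose(defect, cond_entropy(f, p))
assert defect > 0
```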

The **bloom-shriek factorization** of f is given by the decomposition $f={\pi}_{Y}\circ {¡}_{f}$, where ${¡}_{f}:X\rightsquigarrow X\times Y$ is the **bloom** of f, whose value at ${x}^{\prime}$ is the probability measure on $X\times Y$ given by sending $(x,y)$ to ${\delta}_{{x}^{\prime}x}{f}_{y{x}^{\prime}}$, where ${\delta}_{{x}^{\prime}x}$ is the Kronecker delta. In other words, ${¡}_{f}$ records each of the probability measures ${f}_{x}$ on a copy of Y indexed by $x\in X$. A visualization of the bloom of f is given in Figure 1a. When one is given the additional data of probability measures p and q on X and Y, respectively, then Figure 1b illustrates the bloom-shriek factorization of f. From this point of view, ${¡}_{f}$ keeps track of the information encoded in both p and f, while the projection map ${\pi}_{Y}$ forgets, or loses, some of this information.
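The bloom can be written out explicitly as a stochastic matrix whose rows are indexed by pairs $(x,y)$. The following sketch (our own illustration) builds ${¡}_{f}$ for a 2×2 channel and checks both that the projection ${\pi}_{Y}$ recovers f and that the projection ${\pi}_{X}$ (the shriek) undoes the bloom:

```python
import numpy as np

f = np.array([[0.9, 0.2],
              [0.1, 0.8]])
nY, nX = f.shape

# Bloom ¡_f : X ⇝ X×Y,  (¡_f)_{(x,y), x'} = δ_{x'x} f_{yx'}.
# Rows are indexed by pairs (x, y), flattened as x*nY + y.
bloom = np.zeros((nX * nY, nX))
for xp in range(nX):
    for y in range(nY):
        bloom[xp * nY + y, xp] = f[y, xp]

# The projections X×Y → Y and X×Y → X as deterministic stochastic maps.
piY = np.zeros((nY, nX * nY))
piX = np.zeros((nX, nX * nY))
for x in range(nX):
    for y in range(nY):
        piY[y, x * nY + y] = 1.0
        piX[x, x * nY + y] = 1.0

assert np.allclose(piY @ bloom, f)           # bloom-shriek: f = π_Y ∘ ¡_f
assert np.allclose(piX @ bloom, np.eye(nX))  # the shriek undoes the bloom
```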

- 4(a). Reduction: $K\left(f\right)=K\left({\pi}_{Y}\right)$, where $f={\pi}_{Y}\circ {¡}_{f}$ is the bloom-shriek factorization of f. This says that the conditional information loss of f equals the information loss of the projection using the associated joint distribution on $X\times Y$.
- 4(b). Blooming: $K\left({¡}_{p}\right)=0$, where ${¡}_{p}$ is the unique map $(\bullet ,1)\rightsquigarrow (X,p)$ from a one-point probability space to $(X,p)$. This says that if a process begins with no prior information, then there is no information to be lost in the process.

A morphism $(Y,q)\stackrel{\overline{f}}{\rightsquigarrow}(X,p)$ satisfying ${\overline{f}}_{xy}{q}_{y}={f}_{yx}{p}_{x}$ is called a **Bayesian inverse** of f. The Bayesian inverse can be visualized using the bloom-shriek factorization because it itself has a bloom-shriek factorization $\overline{f}={\pi}_{X}\circ {¡}_{\overline{f}}$. This is obtained by finding the stochastic maps in the opposite direction of the arrows so that they reproduce the appropriate volumes of the water droplets.

- 4(c). Entropic Bayes’ Rule: $F\left(f\right)+F\left({¡}_{p}\right)=F\left(\overline{f}\right)+F\left({¡}_{q}\right)$ for all $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$. This is an information theoretic analogue of Bayes’ rule, which reads ${f}_{yx}{p}_{x}={\overline{f}}_{xy}{q}_{y}$ for all $x\in X$ and $y\in Y$, or in more traditional probabilistic notation $\mathbb{P}(y|x)\mathbb{P}(x)=\mathbb{P}(x|y)\mathbb{P}(y)$.
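For the conditional entropy $F=H(\cdot |\cdot )$, both sides of the entropic Bayes’ rule compute the entropy of the joint distribution $\vartheta(f)$, since $F({¡}_{p})=H(p)$. A numerical sketch (our own illustration, with natural logarithms):

```python
import numpy as np

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def cond_entropy(f, p):
    return sum(p[x] * H(f[:, x]) for x in range(len(p)))

f = np.array([[0.9, 0.2],
              [0.1, 0.8]])
p = np.array([0.5, 0.5])
q = f @ p

# Bayesian inverse: fbar[x, y] = p_x f_{yx} / q_y (columns indexed by y).
fbar = (f * p).T / q

# Both sides equal the entropy of the joint distribution ϑ(f).
lhs = cond_entropy(f, p) + H(p)      # F(f) + F(¡_p)
rhs = cond_entropy(fbar, q) + H(q)   # F(f̄) + F(¡_q)
assert np.isclose(lhs, rhs)
```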

## 2. Categories of Stochastic Maps

**Definition 1.** Given finite sets X and Y, a **stochastic map** $f:X\rightsquigarrow Y$ associates a probability measure ${f}_{x}$ on Y to every $x\in X$. If $f:X\rightsquigarrow Y$ is such that ${f}_{x}$ is a point-mass distribution for every $x\in X$, then f is said to be **deterministic**.

**Notation 1.**

**Definition 2.** Given a probability measure p on a finite set X, the subset ${N}_{p}:=\left\{x\in X:{p}_{x}=0\right\}$ is called the **nullspace** of p.

**Definition 3.** Let **FinStoch** be the category of stochastic maps between finite sets. Given a finite set X, the identity map of X in **FinStoch** corresponds to the identity function ${\mathrm{id}}_{X}:X\to X$. Given stochastic maps $f:X\rightsquigarrow Y$ and $g:Y\rightsquigarrow Z$, the composite $g\circ f:X\rightsquigarrow Z$ is given by the Chapman–Kolmogorov equation ${(g\circ f)}_{zx}:={\sum}_{y\in Y}{g}_{zy}{f}_{yx}.$
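In matrix terms, the Chapman–Kolmogorov equation says that composition in **FinStoch** is ordinary matrix multiplication of column-stochastic matrices. A quick sketch (our own illustration, with randomly generated channels):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_stochastic(n_out, n_in):
    """A random column-stochastic matrix, i.e. a stochastic map."""
    m = rng.random((n_out, n_in))
    return m / m.sum(axis=0)

f = random_stochastic(4, 3)   # f : X ⇝ Y with |X| = 3, |Y| = 4
g = random_stochastic(2, 4)   # g : Y ⇝ Z with |Z| = 2

# Chapman–Kolmogorov: (g∘f)_{zx} = Σ_y g_{zy} f_{yx} is the matrix product.
gf = g @ f

assert gf.shape == (2, 3)
assert np.allclose(gf.sum(axis=0), 1.0)   # the composite is again stochastic
```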

**Definition 4.** Given a finite set X, the **copy** of X is the diagonal embedding ${\mathsf{\Delta}}_{X}:X\to X\times X$, and the **discard** of X is the unique map from X to the terminal object $\bullet$ in **FinStoch**, which will be denoted by ${!}_{X}:X\to \bullet$. If Y is another finite set, the **swap map** is the map $\gamma :X\times Y\to Y\times X$ given by $(x,y)\mapsto (y,x)$. Given morphisms $f:X\rightsquigarrow {X}^{\prime}$ and $g:Y\rightsquigarrow {Y}^{\prime}$ in **FinStoch**, the **product** of f and g is the stochastic map $f\times g:X\times Y\rightsquigarrow {X}^{\prime}\times {Y}^{\prime}$ given by ${(f\times g)}_{({x}^{\prime},{y}^{\prime})(x,y)}:={f}_{{x}^{\prime}x}{g}_{{y}^{\prime}y}.$

This product equips **FinStoch** with the structure of a monoidal category. Together with the copy, discard, and swap maps, **FinStoch** is a Markov category [2,3].

**Definition 5.** Let $\mathbf{FinPS}$ (which stands for “**fin**ite **p**robabilities and **s**tochastic maps”) be the co-slice category $\bullet \downarrow \mathbf{FinStoch}$, i.e., the category whose objects are pairs $(X,p)$ consisting of a finite set X equipped with a probability measure p, and a morphism from $(X,p)$ to $(Y,q)$ is a stochastic map $X\stackrel{f}{\rightsquigarrow}Y$ such that ${q}_{y}={\sum}_{x\in X}{f}_{yx}{p}_{x}$ for all $y\in Y$. The subcategory of deterministic maps in $\mathbf{FinPS}$ will then be denoted by $\mathbf{FinPD}$ (which stands for “**fin**ite **p**robabilities and **d**eterministic maps”). A pair $(f,g)$ of morphisms in $\mathbf{FinPS}$ is said to be a **composable pair** iff $g\circ f$ exists.

**Remark 1.**

**Lemma 1.**

**Definition 6.** Given an object $(X,p)$ in $\mathbf{FinPS}$, the **shriek** and **bloom** of p are the unique maps to and from $(\bullet ,1)$, respectively, which will be denoted ${!}_{p}:(X,p)\to (\bullet ,1)$ and ${¡}_{p}:(\bullet ,1)\rightsquigarrow (X,p)$ (the former is deterministic, while the latter is stochastic). The underlying stochastic maps associated with ${!}_{p}$ and ${¡}_{p}$ are ${!}_{X}:X\to \bullet$ and $p:\bullet \rightsquigarrow X$, respectively.

**Example 1.**

**Definition 7.** Let $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$ be a morphism in $\mathbf{FinPS}$. The **joint distribution** associated with f is the probability measure $\vartheta \left(f\right):\bullet \rightsquigarrow X\times Y$ given by $\vartheta {\left(f\right)}_{(x,y)}={f}_{yx}{p}_{x}$.

**Definition 8.** Let $p:\bullet \rightsquigarrow X$ be a probability measure, and let $\left\{({Y}_{x},{q}^{x})\right\}_{x\in X}$ be an X-indexed collection of objects in $\mathbf{FinPS}$. The p-**weighted convex sum** ${\bigoplus}_{x\in X}{p}_{x}({Y}_{x},{q}^{x})$ is defined to be the set ${\coprod}_{x\in X}{Y}_{x}$ equipped with the probability measure ${\bigoplus}_{x\in X}{p}_{x}{q}^{x}$, whose value on $y\in {Y}_{x}$ is ${p}_{x}{q}_{y}^{x}$. Similarly, given an X-indexed collection of morphisms $({Y}_{x},{q}^{x})\stackrel{{Q}^{x}}{\rightsquigarrow}({Y}_{x}^{\prime},{q}^{\prime x})$, the p-**weighted convex sum** ${\bigoplus}_{x\in X}{p}_{x}{Q}^{x}:\left({\coprod}_{x\in X}{Y}_{x},{\bigoplus}_{x\in X}{p}_{x}{q}^{x}\right)\rightsquigarrow \left({\coprod}_{x\in X}{Y}_{x}^{\prime},{\bigoplus}_{x\in X}{p}_{x}{q}^{\prime x}\right)$ is given by ${Q}^{x}$ on the x-th summand and by zero between distinct summands.
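The weighted convex sum of maps is a block-diagonal stochastic matrix with respect to the disjoint-union decompositions. The following sketch (our own illustration, with natural logarithms) builds one and verifies the convex linearity of conditional information loss asserted in Proposition 2:

```python
import numpy as np

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def K(f, p):
    q = f @ p
    return H(p) - H(q) + sum(p[x] * H(f[:, x]) for x in range(len(p)))

# Two morphisms Q^0, Q^1 with their source distributions q^0, q^1.
Q = [np.array([[0.7, 0.3], [0.3, 0.7]]),
     np.array([[0.5, 0.1], [0.5, 0.9]])]
q = [np.array([0.4, 0.6]), np.array([0.2, 0.8])]
p = np.array([0.25, 0.75])   # the weights

# Probability ⊕_x p_x q^x on the disjoint union, and the block-diagonal
# stochastic map ⊕_x p_x Q^x.
r = np.concatenate([p[x] * q[x] for x in range(2)])
h = np.zeros((4, 4))
h[:2, :2] = Q[0]
h[2:, 2:] = Q[1]

# Convex linearity: K(⊕ p_x Q^x) = Σ_x p_x K(Q^x).
assert np.isclose(K(h, r), sum(p[x] * K(Q[x], q[x]) for x in range(2)))
```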

## 3. The Baez–Fritz–Leinster Characterization of Information Loss

**Definition 9.** An assignment F between categories $\mathcal{C}$ and $\mathcal{D}$ that is not necessarily functorial may be **covariant** or **contravariant** in the sense that for any morphism $a\stackrel{\gamma}{\to}b$ in $\mathcal{C}$, $F\left(\gamma \right)$ is a morphism from $F\left(a\right)$ to $F\left(b\right)$ or from $F\left(b\right)$ to $F\left(a\right)$, respectively. These are the only types of functions between categories we will consider in this work. As such, we abuse terminology and use the term functions for such assignments throughout. If M is a commutative monoid and $\mathbb{B}M$ denotes its one-object category, then every covariant function $\mathcal{C}\to \mathbb{B}M$ is also contravariant and vice-versa.

**Definition 10.** A sequence $\left\{({X}_{n},{p}_{n})\stackrel{{f}_{n}}{\rightsquigarrow}({Y}_{n},{q}_{n})\right\}$ of morphisms in $\mathbf{FinPS}$ **converges** to a morphism $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$ if and only if the following two conditions hold.

- (a) There exists an $N\in \mathbb{N}$ for which ${X}_{n}=X$ and ${Y}_{n}=Y$ for all $n\ge N$.
- (b) The following limits hold: $\underset{n\to \infty}{lim}{p}_{n}=p$ and $\underset{n\to \infty}{lim}{f}_{n}=f$ (note that these limits necessarily imply $\underset{n\to \infty}{lim}{q}_{n}=q$).

A function F on $\mathbf{FinPS}$ is **continuous** if and only if $\underset{n\to \infty}{lim}F\left({f}_{n}\right)=F\left(f\right)$ whenever $\left\{{f}_{n}\right\}$ is a sequence in $\mathbf{FinPS}$ converging to f.

**Remark 2.**

- (a) There exists an $N\in \mathbb{N}$ for which ${X}_{n}=X$, ${Y}_{n}=Y$, and ${f}_{n}=f$ for all $n\ge N$.
- (b) For $n\ge N$, one has $\underset{n\to \infty}{lim}{p}_{n}=p$.

**Definition 11.** A function F on $\mathbf{FinPS}$ is **convex linear** if and only if for all objects $(X,p)$ in $\mathbf{FinPS}$ and all X-indexed collections of morphisms ${Q}^{x}$, $F\left({\bigoplus}_{x\in X}{p}_{x}{Q}^{x}\right)={\sum}_{x\in X}{p}_{x}F\left({Q}^{x}\right)$.

**Definition 12.** A function F on $\mathbf{FinPS}$ is **functorial** if and only if it is in fact a functor, i.e., if and only if $F(g\circ f)=F\left(f\right)+F\left(g\right)$ for every composable pair $(f,g)$ in $\mathbf{FinPS}$.

**Definition 13.** Given a probability measure p on a finite set X, the **Shannon entropy** of p is given by $H\left(p\right):=-{\sum}_{x\in X}{p}_{x}\log \left({p}_{x}\right)$, with the convention $0\log \left(0\right)=0$.

**Definition 14.** Given a morphism $(X,p)\stackrel{f}{\to}(Y,q)$ in $\mathbf{FinPD}$, the difference $K\left(f\right):=H\left(p\right)-H\left(q\right)$ is called the **information loss** of f. Information loss defines a functor $K:\mathbf{FinPD}\to \mathbb{B}\mathbb{R}$, henceforth referred to as the **information loss functor** on $\mathbf{FinPD}$.

**Theorem 1.** Suppose $F:\mathbf{FinPD}\to \mathbb{B}{\mathbb{R}}_{\ge 0}$ is a function which satisfies the following conditions.

- 1. F is functorial.
- 2. F is convex linear.
- 3. F is continuous.

**Proposition 1.**

**Proof.**

## 4. Extending the Information Loss Functor

**Definition 15.** The **conditional information loss** of a morphism $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$ in $\mathbf{FinPS}$ is the real number given by $K\left(f\right):=H\left(p\right)-H\left(q\right)+H\left(f\right|p)$, where $H\left(f\right|p)$ denotes the **conditional entropy** of $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$.

**Proposition 2.** The conditional information loss K satisfies the following properties.

- (i) $K\left(f\right)\ge 0$.
- (ii) K restricted to $\mathbf{FinPD}$ agrees with the information loss functor (cf. Definition 14).
- (iii) K is convex linear.
- (iv) K is continuous.
- (v) Given $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$, then $K\left(f\right)=K\left({\pi}_{Y}\right)$, where $(X\times Y,\vartheta \left(f\right))\stackrel{{\pi}_{Y}}{\to}(Y,q)$ is the projection and $\vartheta \left(f\right)$ is the joint distribution (cf. Definition 7).

**Lemma 2.**

**Proof of Lemma 2.**

**Proof of Proposition 2.**

- (i) The non-negativity of K follows from Lemma 2 and the equality ${q}_{y}={\sum}_{{x}^{\prime}\in X}{f}_{y{x}^{\prime}}{p}_{{x}^{\prime}}\ge {f}_{yx}{p}_{x}$.
- (ii) This follows from the fact that $H\left(f\right|p)=0$ for all deterministic f.
- (iii) Let $p:\bullet \rightsquigarrow X$ be a probability measure, and let $({Y}_{x},{q}^{x})\stackrel{{Q}^{x}}{\rightsquigarrow}({Y}_{x}^{\prime},{q}^{\prime x})$ be a collection of morphisms in $\mathbf{FinPS}$ indexed by X. Then the p-weighted convex sum ${\bigoplus}_{x\in X}{p}_{x}{Q}^{x}$ is a morphism in $\mathbf{FinPS}$ of the form $(Z,r)\stackrel{h}{\rightsquigarrow}({Z}^{\prime},{r}^{\prime})$, where $Z:={\coprod}_{x\in X}{Y}_{x}$, ${Z}^{\prime}:={\coprod}_{x\in X}{Y}_{x}^{\prime}$, $h:={\bigoplus}_{x\in X}{p}_{x}{Q}^{x}$, $r:={\bigoplus}_{x\in X}{p}_{x}{q}^{x}$, and ${r}^{\prime}:={\bigoplus}_{x\in X}{p}_{x}{q}^{\prime x}$. Then
$$K\left(h\right)=-\sum _{z\in Z}{r}_{z}\log \left({r}_{z}\right)+\sum _{{z}^{\prime}\in {Z}^{\prime}}{r}_{{z}^{\prime}}^{\prime}\log \left({r}_{{z}^{\prime}}^{\prime}\right)-\sum _{z\in Z}\sum _{{z}^{\prime}\in {Z}^{\prime}}{r}_{z}{h}_{{z}^{\prime}z}\log \left({h}_{{z}^{\prime}z}\right),$$
and expanding these sums over the summands indexed by $x\in X$ reduces the last term to $-{\sum}_{x\in X}{\sum}_{{y}_{x}\in {Y}_{x}}{\sum}_{{y}_{x}^{\prime}\in {Y}_{x}^{\prime}}{p}_{x}{q}_{{y}_{x}}^{x}{Q}_{{y}_{x}^{\prime}{y}_{x}}^{x}\log \left({Q}_{{y}_{x}^{\prime}{y}_{x}}^{x}\right)$, so that $K\left(h\right)={\sum}_{x\in X}{p}_{x}K\left({Q}^{x}\right)$.
- (iv) Let $({X}^{\left(n\right)},{p}^{\left(n\right)})\stackrel{{f}^{\left(n\right)}}{\rightsquigarrow}({Y}^{\left(n\right)},{q}^{\left(n\right)})$ be a sequence (indexed by $n\in \mathbb{N}$) of probability-preserving stochastic maps such that ${X}^{\left(n\right)}=X$ and ${Y}^{\left(n\right)}=Y$ for large enough n, and where $\underset{n\to \infty}{lim}{f}^{\left(n\right)}=f$, $\underset{n\to \infty}{lim}{p}^{\left(n\right)}=p$, and $\underset{n\to \infty}{lim}{q}^{\left(n\right)}=q$. Then
$$\underset{n\to \infty}{lim}K\left({f}^{\left(n\right)}\right)=-\underset{n\to \infty}{lim}\sum _{x,y}{f}_{yx}^{\left(n\right)}{p}_{x}^{\left(n\right)}\log \left(\frac{{f}_{yx}^{\left(n\right)}{p}_{x}^{\left(n\right)}}{{\sum}_{{x}^{\prime}}{f}_{y{x}^{\prime}}^{\left(n\right)}{p}_{{x}^{\prime}}^{\left(n\right)}}\right)$$
- (v) This follows from
$$H\left(\vartheta \left(f\right)\right)=-\sum _{\begin{array}{c}x\in X\\ y\in Y\end{array}}{f}_{yx}{p}_{x}\log \left({f}_{yx}{p}_{x}\right)$$

**Remark 3.**

**Definition 16.** A function F on $\mathbf{FinPS}$ is **reductive** if and only if $F\left(f\right)=F\left({\pi}_{Y}\right)$ for every morphism $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$ in $\mathbf{FinPS}$ (cf. Proposition 2 item (v) for notation).

**Proposition 3.** Let $F:\mathbf{FinPS}\to \mathbb{B}{\mathbb{R}}_{\ge 0}$ be a function satisfying the following conditions.

- (i) F restricted to $\mathbf{FinPD}$ is functorial, convex linear, and continuous.
- (ii) F is reductive.

**Proof.**

## 5. Coalescable Morphisms and Semi-Functoriality

**Definition 17.** A function $Z\times X\stackrel{h}{\to}Y$ is a **mediator** for the composable pair $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)\stackrel{g}{\rightsquigarrow}(Z,r)$ in $\mathbf{FinPS}$ if and only if it satisfies the mediator condition below; dropping the measures yields the notion of a **strong mediator** for the composable pair $X\stackrel{f}{\rightsquigarrow}Y\stackrel{g}{\rightsquigarrow}Z$ in **FinStoch**.

**Remark 4.**

**Proposition 4.**

- (a) For every $x\in X\backslash {N}_{p}$ and $z\in Z$, there exists at most one $y\in Y$ such that ${g}_{zy}{f}_{yx}\ne 0$.
- (b) The pair $(f,g)$ admits a mediator $Z\times X\stackrel{h}{\to}Y$.
- (c) There exists a function $Z\times X\stackrel{h}{\to}Y$ such that
$${h}_{y(z,x)}{(g\circ f)}_{zx}{p}_{x}={g}_{zy}{f}_{yx}{p}_{x}\phantom{\rule{2.em}{0ex}}\forall (z,y,x)\in Z\times Y\times X.$$
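When f is deterministic, condition (c) can be checked by hand with $h(z,x):=f(x)$: knowing the input alone recovers the intermediate value. A small sketch (our own illustration):

```python
import numpy as np

# A deterministic f (each column a point mass) and an arbitrary stochastic g.
f = np.array([[1.0, 0.0],
              [0.0, 1.0]])         # f = id on a 2-element set, for simplicity
g = np.array([[0.6, 0.3],
              [0.4, 0.7]])
p = np.array([0.5, 0.5])
gf = g @ f

def h(z, x):
    """Mediator candidate: h(z, x) = f(x), ignoring z."""
    return int(np.argmax(f[:, x]))

# Check the mediator equation h_{y(z,x)} (g∘f)_{zx} p_x = g_{zy} f_{yx} p_x.
for z in range(2):
    for y in range(2):
        for x in range(2):
            lhs = (1.0 if h(z, x) == y else 0.0) * gf[z, x] * p[x]
            rhs = g[z, y] * f[y, x] * p[x]
            assert np.isclose(lhs, rhs)
```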

**Proof.**

**Theorem 2.** Let $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)\stackrel{g}{\rightsquigarrow}(Z,r)$ be a composable pair of morphisms in $\mathbf{FinPS}$. Then

**Lemma 3.**

**Proof of Lemma 3.**

**Lemma 4.**

**Proof of Lemma 4.**

**Proof of Theorem 2.**

**Corollary 1.** Let $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)\stackrel{g}{\rightsquigarrow}(Z,r)$ be a composable pair of morphisms in $\mathbf{FinPS}$. Then $K(g\circ f)=K\left(f\right)+K\left(g\right)$ if and only if there exists a mediator $Z\times X\stackrel{h}{\to}Y$ for the pair $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)\stackrel{g}{\rightsquigarrow}(Z,r)$.

**Proof.**

**Example 2.** Suppose f is **a.e. deterministic**, which means ${f}_{yx}={\delta}_{yf\left(x\right)}$ for all $x\in X\backslash {N}_{p}$ for some function f (abusing notation). In this case, the deviation from functoriality, (4), simplifies to

**Definition 18.** A composable pair of morphisms in $\mathbf{FinPS}$ is called **a.e. coalescable** if and only if $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)\stackrel{g}{\rightsquigarrow}(Z,r)$ admits a mediator $Z\times X\stackrel{h}{\to}Y$. Similarly, a pair $X\stackrel{f}{\rightsquigarrow}Y\stackrel{g}{\rightsquigarrow}Z$ of composable morphisms in **FinStoch** is called **coalescable** iff $X\stackrel{f}{\rightsquigarrow}Y\stackrel{g}{\rightsquigarrow}Z$ admits a strong mediator $Z\times X\stackrel{h}{\to}Y$.

**Remark 5.**

**Definition 19.** A function F on $\mathbf{FinPS}$ is **semi-functorial** iff $F(g\circ f)=F\left(g\right)+F\left(f\right)$ for every a.e. coalescable pair $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)\stackrel{g}{\rightsquigarrow}(Z,r)$ in $\mathbf{FinPS}$.

**Example 3.**

**Proposition 5.**

**Proof.**

**Lemma 5.**

- (i) $(W,s)\stackrel{e}{\to}(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$
- (ii) $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)\stackrel{g}{\to}(Z,r)$
- (iii) $(W,s)\stackrel{e}{\to}(X,p)\stackrel{g\circ f}{\rightsquigarrow}(Z,r)$
- (iv) $(W,s)\stackrel{f\circ e}{\rightsquigarrow}(Y,q)\stackrel{g}{\to}(Z,r)$

**Proof.**

## 6. Bayesian Inversion

**Definition 20.** Two stochastic maps $f,g:X\rightsquigarrow Y$ are said to be p-**almost everywhere equivalent** (or p-a.e. **equivalent**) if and only if ${f}_{yx}={g}_{yx}$ for every $x\in X$ with ${p}_{x}\ne 0$. In such a case, the p-a.e. equivalence of f and g will be denoted $f\underset{p}{=}g$.

**Theorem 3.** Let $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$ be a morphism in $\mathbf{FinPS}$. Then there exists a morphism $(Y,q)\stackrel{\overline{f}}{\rightsquigarrow}(X,p)$ such that ${\overline{f}}_{xy}{q}_{y}={f}_{yx}{p}_{x}$ for all $x\in X$ and $y\in Y$. Furthermore, for any other morphism $(Y,q)\stackrel{{\overline{f}}^{\prime}}{\rightsquigarrow}(X,p)$ satisfying this condition, $\overline{f}\underset{q}{=}{\overline{f}}^{\prime}$.

**Definition 21.** A morphism $\overline{f}$ as in Theorem 3 is called a **Bayesian inverse** of $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$. It follows that ${\overline{f}}_{xy}={p}_{x}{f}_{yx}/{q}_{y}$ for all $y\in Y$ with ${q}_{y}\ne 0$.
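The formula for the Bayesian inverse is a one-line computation on stochastic matrices. A sketch (our own illustration) verifying the defining property of Theorem 3:

```python
import numpy as np

f = np.array([[0.9, 0.2],
              [0.1, 0.8]])
p = np.array([0.5, 0.5])
q = f @ p

# Bayesian inverse: fbar[x, y] = p_x f_{yx} / q_y wherever q_y ≠ 0.
fbar = (f * p).T / q

# Defining property of Theorem 3: fbar_{xy} q_y = f_{yx} p_x for all x, y.
for x in range(2):
    for y in range(2):
        assert np.isclose(fbar[x, y] * q[y], f[y, x] * p[x])

assert np.allclose(fbar @ q, p)            # fbar is a morphism (Y,q) ⇝ (X,p)
assert np.allclose(fbar.sum(axis=0), 1.0)  # each fbar_y is a probability measure
```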

**Proposition 6.** The following statements hold.

- (i) Suppose $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$ and $(X,p)\stackrel{g}{\rightsquigarrow}(Y,q)$ are p-a.e. equivalent, and let $\overline{f}$ and $\overline{g}$ be Bayesian inverses of f and g, respectively. Then $\overline{f}\underset{q}{=}\overline{g}$.
- (ii) Given two morphisms $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$ and $(Y,q)\stackrel{g}{\rightsquigarrow}(X,p)$ in $\mathbf{FinPS}$, then f is a Bayesian inverse of g if and only if g is a Bayesian inverse of f.
- (iii) Let $(Y,q)\stackrel{\overline{f}}{\rightsquigarrow}(X,p)$ be a Bayesian inverse of $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$, and let $\gamma :X\times Y\to Y\times X$ be the swap map (as in Definition 4). Then $\vartheta \left(\overline{f}\right)=\gamma \circ \vartheta \left(f\right)$.
- (iv) Let $(f,g)$ be a composable pair of morphisms in $\mathbf{FinPS}$, and suppose $\overline{f}$ and $\overline{g}$ are Bayesian inverses of f and g respectively. Then $(\overline{g},\overline{f})$ is a composable pair, and $\overline{f}\circ \overline{g}$ is a Bayesian inverse of $g\circ f$.

**Proof.**

**Definition 22.** A function $\mathcal{B}$ on $\mathbf{FinPS}$ is called a **Bayesian inversion functor** if and only if $\mathcal{B}$ acts as the identity on objects and $\mathcal{B}\left(f\right)$ is a Bayesian inverse of f for all morphisms f in $\mathbf{FinPS}$.

**Remark 6.**

**Corollary 2.**

**Proposition 7.** K is **a.e. convex linear** in the sense that

**Proof.**

**Proposition 8.**

**Proof.**

**Proposition 9.**

**Proof.**

**Corollary 3.**

**Remark 7.**

## 7. Bloom-Shriek Factorization

**Definition 23.** Given a stochastic map $X\stackrel{f}{\rightsquigarrow}Y$, the **bloom of f** is the stochastic map $X\stackrel{{¡}_{f}}{\rightsquigarrow}X\times Y$ given by the composite $X\stackrel{{\mathsf{\Delta}}_{X}}{\rightsquigarrow}X\times X\stackrel{{\mathrm{id}}_{X}\times f}{\rightsquigarrow}X\times Y$, and the **shriek of f** is the deterministic map $X\times Y\stackrel{{!}_{f}}{\to}X$ given by the projection ${\pi}_{X}$.

**Proposition 10.** The following statements hold.

- (i) The composite $(X,p)\stackrel{{¡}_{f}}{\rightsquigarrow}(X\times Y,\vartheta \left(f\right))\stackrel{{!}_{f}}{\to}(X,p)$ is equal to the identity ${\mathrm{id}}_{X}$.
- (ii) The morphism f equals the composite $(X,p)\stackrel{{¡}_{f}}{\rightsquigarrow}(X\times Y,\vartheta \left(f\right))\stackrel{{!}_{\overline{f}}\circ \gamma}{\to}(Y,q)$, where $\overline{f}$ denotes any Bayesian inverse of f and $\gamma :X\times Y\to Y\times X$ is the swap map.
- (iii) The pair $(X,p)\stackrel{{¡}_{f}}{\rightsquigarrow}(X\times Y,\vartheta \left(f\right))\stackrel{{!}_{\overline{f}}\circ \gamma \equiv {\pi}_{Y}}{\to}(Y,q)$ is coalescable.

**Definition 24.** The decomposition of f in Proposition 10 item (ii) is called the **bloom-shriek factorization** of f.

**Proof of Proposition 10.**

**Definition 25.** A function F on $\mathbf{FinPS}$ is **invariant** if and only if for every triple of composable morphisms $(W,s)\stackrel{e}{\to}(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)\stackrel{g}{\to}(Z,r)$ such that e and g are isomorphisms, $F\left(f\right)=F(g\circ f\circ e)$.

**Lemma 6.**

**Proof.**

**Lemma 7.**

- (i) $F\left(f\right)=F\left({!}_{\overline{f}}\right)+F\left({¡}_{f}\right)$
- (ii) $F\left({¡}_{f}\right)={\sum}_{x\in X}{p}_{x}F\left({¡}_{{f}_{x}}\right)$
- (iii) $F\left({!}_{f}\right)={\sum}_{x\in X}{p}_{x}F\left({!}_{{f}_{x}}\right)$

**Proof.**

**Proposition 11.**

**Proof.**

## 8. An Intrinsic Characterization of Conditional Information Loss

**Theorem 4.** Suppose $F:\mathbf{FinPS}\to \mathbb{B}{\mathbb{R}}_{\ge 0}$ is a function which satisfies the following conditions.

- 1. F is semi-functorial.
- 2. F is convex linear.
- 3. F is continuous.
- 4. $F\left({¡}_{p}\right)=0$ for every probability distribution $\bullet \stackrel{p}{\rightsquigarrow}X$.

**Proof.**

**Remark 8.**

**Theorem 5.** Suppose $F:\mathbf{FinPS}\to \mathbb{B}{\mathbb{R}}_{\ge 0}$ is a function which satisfies the following conditions.

- 1. F is semi-functorial.
- 2. F is convex linear.
- 3. F is continuous.
- 4. $F\left({!}_{p}\right)=0$ for every probability distribution $\bullet \stackrel{p}{\rightsquigarrow}X$.

**Definition 26.** Given a function F on $\mathbf{FinPS}$ and a Bayesian inversion functor $\mathcal{B}$, the composite $F\circ \mathcal{B}$ is called the **Bayesian reflection** of F.

**Remark 9.**

**Lemma 8.**

**Proof of Lemma 8.**

**Lemma 9.**

**Proof of Lemma 9.**

**Lemma 10.**

**Proof of Lemma 10.**

**Proof of Theorem 5.**

## 9. A Bayesian Characterization of Conditional Entropy

**Definition 27.** A function F on $\mathbf{FinPS}$ satisfies the **entropic Bayes’ rule** if and only if $F\left(f\right)+F\left({¡}_{p}\right)=F\left(\overline{f}\right)+F\left({¡}_{q}\right)$ for every morphism $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$ in $\mathbf{FinPS}$, where $\overline{f}$ is any Bayesian inverse of f.

**Remark 10.**

**Theorem 6.** Suppose $F:\mathbf{FinPS}\to \mathbb{B}{\mathbb{R}}_{\ge 0}$ is a function satisfying the following conditions.

- 1. F is semi-functorial.
- 2. F is convex linear.
- 3. F is continuous.
- 4. F satisfies an entropic Bayes’ rule.

**Proof.**

**Remark 11.**

## 10. Concluding Remarks

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A. Correctable Codes and Conditional Information Loss

**Definition A1.** A morphism $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$ is said to be a **disintegration** of g (or of $(g,q,p)$ for clarity) if and only if $g\circ f\underset{p}{=}{\mathrm{id}}_{X}$.

**Lemma A1.**

**Theorem A1.**

**Proof.**

**Remark A1.** The vanishing of the conditional information loss is closely related to the correctability of classical codes. Our references for correctable codes include [12,13], though our particular emphasis on possibilistic maps instead of stochastic maps appears to be new. The correctability of classical codes does not require the datum of a stochastic map, but rather that of a possibilistic map. A **possibilistic map** (also called a **full relation**) from a finite set X to a finite set Y is an assignment f sending $x\in X$ to a nonempty subset ${f}_{x}\subseteq Y$. Such a map can also be viewed as a transition kernel $X\stackrel{f}{\rightsquigarrow}Y$ such that ${f}_{yx}\in \{0,1\}$ for all $x\in X$ and $y\in Y$ and for each $x\in X$ there exists a $y\in Y$ such that ${f}_{yx}=1$. A **classical code** is a tuple $(A,X,Y,E,N)$ consisting of finite sets $A,X,Y$, an inclusion $A\stackrel{E}{\hookrightarrow}X$ (the **encoding**), and a possibilistic map $X\stackrel{N}{\rightsquigarrow}Y$ (the **noise**). Such a classical code is **correctable** iff there exists a possibilistic map $Y\stackrel{D}{\rightsquigarrow}A$ (the **recovery map**) such that $D\circ N\circ E={\mathrm{id}}_{A}$.
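Possibilistic maps can be represented as Boolean matrices, with composition given by Boolean matrix multiplication. The following sketch (our own illustration, using the standard 3-bit repetition code with at most one bit flip as the noise) verifies correctability:

```python
import numpy as np

# Possibilistic maps as Boolean matrices: M[y, x] = True iff y ∈ f_x.
# Composition is Boolean matrix multiplication (OR of ANDs).
def compose(g, f):
    return (g.astype(int) @ f.astype(int)) > 0

def bits(x):
    return [(x >> i) & 1 for i in range(3)]

# 3-bit repetition code: A = {0, 1}, X = Y = {0, ..., 7}.
E = np.zeros((8, 2), dtype=bool)      # encoding A ↪ X: 0 ↦ 000, 1 ↦ 111
E[0b000, 0] = E[0b111, 1] = True

N = np.zeros((8, 8), dtype=bool)      # noise: flip at most one bit
for x in range(8):
    N[x, x] = True
    for i in range(3):
        N[x ^ (1 << i), x] = True

D = np.zeros((2, 8), dtype=bool)      # recovery map: majority vote
for y in range(8):
    D[int(sum(bits(y)) >= 2), y] = True

# Correctability: D ∘ N ∘ E = id_A.
assert np.array_equal(compose(D, compose(N, E)), np.eye(2, dtype=bool))
```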

## Appendix B. The Markov Category Setting

**Definition A2.** A **Markov category** is a symmetric monoidal category $(\mathcal{M},\otimes ,I)$, with $\otimes $ the tensor product and I the unit (associators and unitors are excluded from the notation), and where each object X in $\mathcal{M}$ is equipped with morphisms ${!}_{X}:X\to I$ and ${\mathsf{\Delta}}_{X}:X\to X\otimes X$, all satisfying the following conditions. A **state** on X is a morphism $I\stackrel{p}{\to}X$. **FinStoch** is a Markov category (cf. Section 2). Although the definitions and results that follow are stated for stochastic maps, many hold for arbitrary Markov categories as well.

**Definition A3.** Let $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$ be a morphism in $\mathbf{FinPS}$. The **joint distribution** associated with f is given by the following commutative diagram/string diagram equality:

**Proposition A1.** The composable pair $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)\stackrel{g}{\rightsquigarrow}(Z,r)$ in $\mathbf{FinPS}$ is a.e. coalescable if and only if there exists a deterministic morphism $Z\times X\stackrel{h}{\to}Y$ such that

**Proof.**

**Remark A2.** The **a.e. conditional** of F given Z is a morphism $Z\times X\stackrel{{F|}_{Z}}{\rightsquigarrow}Y$ such that

**Remark A3.**

**Example A1.** The mediator $Z\times X\stackrel{h}{\to}Y$ in this case may be given by

**Definition A4.** A **Bayesian inverse** of a morphism $(X,p)\stackrel{f}{\rightsquigarrow}(Y,q)$ in $\mathbf{FinPS}$ is a morphism $(Y,q)\stackrel{\overline{f}}{\rightsquigarrow}(X,p)$ such that the following diagram commutes/string diagram equality holds:

**Alternative proof of Proposition 8.**

**Definition A5.** Given a stochastic map $X\stackrel{f}{\rightsquigarrow}Y$, the **bloom** $X\stackrel{{¡}_{f}}{\rightsquigarrow}X\times Y$ and **shriek** $X\times Y\stackrel{{!}_{f}}{\to}X$ of f are given by

**Diagrammatic proof of Proposition 10.**

- (i) First,
- (ii) Secondly,
- (iii) Finally, set the mediator function $Y\times X\stackrel{h}{\to}X\times Y$ to be the swap map. Then

## References

- Baez, J.C.; Fritz, T.; Leinster, T. A characterization of entropy in terms of information loss. Entropy **2011**, 13, 1945–1957.
- Cho, K.; Jacobs, B. Disintegration and Bayesian inversion via string diagrams. Math. Struct. Comp. Sci. **2019**, 29, 938–971.
- Fritz, T. A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. Adv. Math. **2020**, 370, 107239.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing); Wiley-Interscience: Hoboken, NJ, USA, 2006.
- Gromov, M. Probability, Symmetry, Linearity. 2014. Available online: https://www.youtube.com/watch?v=aJAQVletzdY (accessed on 17 November 2020).
- Fritz, T. Probability and Statistics as a Theory of Information Flow. 2020. Available online: https://youtu.be/H4qbYPPcZU8 (accessed on 11 November 2020).
- Mac Lane, S. Categories for the Working Mathematician, 2nd ed.; Graduate Texts in Mathematics; Springer: New York, NY, USA, 1998; Volume 5.
- Parzygnat, A.J. A functorial characterization of von Neumann entropy. arXiv **2020**, arXiv:2009.07125.
- Fong, B. Causal Theories: A Categorical Perspective on Bayesian Networks. Master’s Thesis, University of Oxford, Oxford, UK, 2012.
- Parzygnat, A.J. Inverses, disintegrations, and Bayesian inversion in quantum Markov categories. arXiv **2020**, arXiv:2001.08375.
- Smithe, T.S.C. Bayesian Updates Compose Optically. arXiv **2020**, arXiv:2006.01631.
- Chang, H.H. An Introduction to Error-Correcting Codes: From Classical to Quantum. arXiv **2006**, arXiv:quant-ph/0602157.
- Knill, E.; Laflamme, R.; Ashikhmin, A.; Barnum, H.; Viola, L.; Zurek, W.H. Introduction to Quantum Error Correction. arXiv **2002**, arXiv:quant-ph/0207170.
- Selinger, P. A Survey of Graphical Languages for Monoidal Categories. Lect. Notes Phys. **2010**, 289–355.

**Figure 1.** A visualization of the bloom and the bloom-shriek factorization via water droplets, as inspired by Gromov [5]. The bloom of f splits each water droplet of volume 1 (an element of X) into several water droplets whose total volume equals 1. If X carries a probability measure p, then the initial volume of each water droplet is scaled by this probability, and the stochastic map splits the droplet according to the same proportions.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Fullwood, J.; Parzygnat, A.J.
The Information Loss of a Stochastic Map. *Entropy* **2021**, *23*, 1021.
https://doi.org/10.3390/e23081021
