# Coarse-Graining and the Blackwell Order


## Abstract


## 1. Introduction

**Theorem 1.** *The following two conditions are equivalent:*

1. *When the agent chooses $\kappa_2$ (and uses the decision rule that is optimal for $\kappa_2$), her expected utility is always at least as large as the expected utility when she chooses $\kappa_1$ (and uses the optimal decision rule for $\kappa_1$), independently of the utility function and of the distribution of the input $S$.*
2. *$\kappa_1$ is a garbling of $\kappa_2$.*
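The direction "garbling never increases achievable utility" can be checked numerically. The following sketch is illustrative only: the channels, the garbling matrix, and the input distribution are invented, and the optimal decision rule is computed by brute-force posterior maximization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical channel kappa2 (rows: states of S, columns: outputs; row-stochastic).
kappa2 = np.array([[0.8, 0.1, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.1, 0.1, 0.8]])

# A stochastic post-processing matrix lambda; kappa1 is the corresponding garbling.
lam = np.array([[0.6, 0.3, 0.1],
                [0.2, 0.6, 0.2],
                [0.1, 0.3, 0.6]])
kappa1 = kappa2 @ lam

p_s = np.array([0.5, 0.3, 0.2])  # an arbitrary input distribution

def optimal_expected_utility(kappa, p_s, u):
    """Maximal E[u(S, A)] over decision rules acting on the channel output.
    u[s, a] is the utility of action a in state s; the optimal rule picks,
    for each output x, the action maximizing the posterior-weighted utility."""
    joint = p_s[:, None] * kappa      # joint[s, x] = P(S = s, X = x)
    values = joint.T @ u              # values[x, a] = sum_s P(s, x) u[s, a]
    return values.max(axis=1).sum()

# kappa2 weakly dominates its garbling kappa1 for every sampled utility function.
for _ in range(1000):
    u = rng.normal(size=(3, 3))
    assert (optimal_expected_utility(kappa2, p_s, u)
            >= optimal_expected_utility(kappa1, p_s, u) - 1e-9)
```

The inequality holds because any decision rule for $\kappa_1$ can be simulated on $\kappa_2$'s output by first applying $\lambda$, and maximizing after averaging can only lose.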

- $S$ and $X_1$ are independent given $f(S)$.
- $(X_1 \leftarrow f(S)) \le (X_2 \leftarrow f(S))$.
- $(X_1 \leftarrow S) \nleq (X_2 \leftarrow S)$.
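Whether a relation such as $(X_1 \leftarrow S) \le (X_2 \leftarrow S)$ holds can be decided computationally. In general this is a linear feasibility problem; for a square invertible channel the candidate garbling matrix is unique and can simply be inspected. A minimal sketch with invented example channels:

```python
import numpy as np

def is_garbling_of(kappa1, kappa2, tol=1e-9):
    """Decide whether kappa1 = kappa2 @ g for some row-stochastic g >= 0,
    assuming kappa2 is square and invertible (then g is unique).
    In the general case this becomes a linear feasibility problem."""
    g = np.linalg.solve(kappa2, kappa1)
    # Since kappa2 and kappa1 are row-stochastic and kappa2 is invertible,
    # the rows of g automatically sum to 1; only nonnegativity can fail.
    return bool((g >= -tol).all())

kappa2 = np.array([[0.9, 0.1],
                   [0.3, 0.7]])
lam = np.array([[0.8, 0.2],
                [0.2, 0.8]])

print(is_garbling_of(kappa2 @ lam, kappa2))   # True: post-processing is a garbling
print(is_garbling_of(kappa2[::-1], kappa2))   # False: swapping the rows of kappa2 is not
```

The second call shows that even a deterministic, invertible manipulation of the input side can produce a channel that is not a garbling of the original.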

## 2. Pre-Garbling

**Example 1.**

**Lemma 1.**

**Proof.**

**Theorem 2.** *The following two conditions are equivalent:*

1. *Under the optimal decision rule, when the agent chooses $X_2$, her expected utility is always at least as large as the expected utility when she chooses $X_1$, independently of the utility function.*
2. *$(X_1 \leftarrow S) \le (X_2 \leftarrow S)$.*
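A small numerical illustration (channels and utilities invented for this sketch) of how pre-garbling interacts with the Blackwell order: permuting the input states of a channel is an invertible pre-garbling, yet each of the resulting pair of channels strictly outperforms the other for some utility function, so by the theorem above neither Blackwell-dominates the other.

```python
import numpy as np

kappa2 = np.array([[0.9, 0.1],
                   [0.3, 0.7]])
mu = np.array([[0., 1.],   # invertible pre-garbling: swap the two states of S
               [1., 0.]])
kappa1 = mu @ kappa2       # kappa1(x|s) = kappa2(x|sigma(s))

p_s = np.array([0.5, 0.5])  # uniform prior on S

def optimal_expected_utility(kappa, p_s, u):
    joint = p_s[:, None] * kappa        # joint[s, x] = P(S = s, X = x)
    return (joint.T @ u).max(axis=1).sum()

u_a = np.array([[2., 0.],   # utility favoring detection of state 0
                [0., 1.]])
u_b = np.array([[1., 0.],   # utility favoring detection of state 1
                [0., 2.]])

print(optimal_expected_utility(kappa2, p_s, u_a),
      optimal_expected_utility(kappa1, p_s, u_a))  # approx 1.25 vs 1.15: kappa2 wins
print(optimal_expected_utility(kappa1, p_s, u_b),
      optimal_expected_utility(kappa2, p_s, u_b))  # approx 1.25 vs 1.15: kappa1 wins
```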

## 3. Pre-Garbling by Coarse-Graining

**Proposition 1.** *There exist random variables $S, X_1, X_2$ and a function $f$ of $S$ such that:*

1. *$S$ and $X_1$ are independent given $f(S)$;*
2. *$(X_1 \leftarrow f(S)) < (X_2 \leftarrow f(S))$;*
3. *$(X_1 \leftarrow S) \nleq (X_2 \leftarrow S)$.*

**Proof of Proposition 1.**

- $S$ and $X_1$ are independent given $f(S)$.
- $(X_1 \leftarrow f(S)) \le (X_2 \leftarrow f(S))$.
- $(X_1 \leftarrow S) \nleq (X_2 \leftarrow S)$.

**Proposition 2.**

**Proof of Proposition 2.**

**Lemma 2.**

**Proof.**

**Lemma 3.**

**Proof.**

## 4. Examples

**Example 2.**

$(X_1, X_2)$. By symmetry, the joint distributions of the pairs $(f(S), X_1)$ and $(f(S), X_2)$ are identical, and so the two channels $X_1 \leftarrow f(S)$ and $X_2 \leftarrow f(S)$ are identical. In particular, $(X_1 \leftarrow f(S)) \le (X_2 \leftarrow f(S))$.

- It is easy to see that ${X}_{2}$ has more irrelevant information than ${X}_{1}$: namely, ${X}_{2}$ can determine relatively precisely when $S=0$. However, since $S=0$ gives no utility independent of the action, this information is not relevant. It is more difficult to understand why ${X}_{2}$ has less relevant information than ${X}_{1}$. Surprisingly, ${X}_{1}$ can determine more precisely when $S=1$: if $S=1$, then ${X}_{1}$ “detects this” (in the sense that ${X}_{1}$ chooses action 0) with probability $2/3$. For ${X}_{2}$, the same probability is only $1/3$.
- The conditional entropies of $S$ given $X_2$ are smaller than the conditional entropies of $S$ given $X_1$:
  $$\begin{aligned}
  H(S \mid X_1 = 0) &= \log(2), & H(S \mid X_1 = 1) &= \tfrac{3}{2}\log(2), \\
  H(S \mid X_2 = 0) &= 2\log(2) - \tfrac{3}{2}\log(3) \approx 0.4150375\,\log(2), & H(S \mid X_2 = 1) &= \log(2).
  \end{aligned}$$
- One can see in which sense $f\left(S\right)$ captures the relevant information for ${X}_{1}$, and indeed for the whole decision problem: knowing $f\left(S\right)$ is completely sufficient in order to receive the maximal utility for each state of S. However, when information is incomplete, it matters how the information about the different states of S is mixed, and two variables ${X}_{1},{X}_{2}$ that have the same joint distribution with $f\left(S\right)$ may perform differently. It is somewhat surprising that it is the random variable that has less information about S and that is conditionally independent of S given $f\left(S\right)$ which actually performs better.

**Example 3.**

## 5. Information Decomposition and Le Cam Deficiency

**Example 4.**

1. $X_1, X_2$ are independent;
2. $f(S) = \mathrm{And}(X_1, X_2)$, where $f$ is as in Example 2; and
3. $X_1$ is independent of $S$ given $f(S)$.

**Example 5.**

1. $X_2$ is a function of $S$, where the function is as in Example 3.
2. $X_1$ is independent of $S$ given $f(S)$.
3. The channels $X_1 \leftarrow f(S)$ and $X_2 \leftarrow f(S)$ are identical.

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

- Blackwell, D. Equivalent Comparisons of Experiments. *Ann. Math. Stat.* **1953**, *24*, 265–272.
- Torgersen, E. *Comparison of Statistical Experiments*; Cambridge University Press: New York, NY, USA, 1991.
- Le Cam, L. Comparison of Experiments—A Short Review. *Stat. Probab. Game Theory* **1996**, *30*, 127–138.
- Bergmans, P. Random coding theorem for broadcast channels with degraded components. *IEEE Trans. Inf. Theory* **1973**, *19*, 197–207.
- Cohen, J.; Kemperman, J.; Zbăganu, G. *Comparisons of Stochastic Matrices with Applications in Information Theory, Statistics, Economics, and Population Sciences*; Birkhäuser: Boston, MA, USA, 1998.
- Körner, J.; Marton, K. Comparison of two noisy channels. In *Topics in Information Theory*; Colloquia Mathematica Societatis János Bolyai: Keszthely, Hungary, 1975; Volume 16, pp. 411–423.
- Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying unique information. *Entropy* **2014**, *16*, 2161–2183.
- Rauh, J.; Banerjee, P.K.; Olbrich, E.; Jost, J.; Bertschinger, N. On extractable shared information. arXiv 2017, arXiv:1701.07805.
- Raginsky, M. Shannon meets Blackwell and Le Cam: Channels, codes, and statistical experiments. In Proceedings of the 2011 IEEE International Symposium on Information Theory, St. Petersburg, Russia, 31 July–5 August 2011; pp. 1220–1224.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Rauh, J.; Banerjee, P.K.; Olbrich, E.; Jost, J.; Bertschinger, N.; Wolpert, D.
Coarse-Graining and the Blackwell Order. *Entropy* **2017**, *19*, 527.
https://doi.org/10.3390/e19100527
