# Probability Mass Exclusions and the Directed Components of Mutual Information


## Abstract


## 1. Introduction

> "By successive selections a sequence of symbols is brought to the listener's attention. At each selection there are eliminated all of the other symbols which might have been chosen. As the selections proceed more and more possible symbol sequences are eliminated, and we say that the information becomes more precise."

Indeed, this interpretation led Hartley to derive the measure of information associated with a set of equally likely choices, which Shannon later generalised to account for unequally likely choices. Nevertheless, despite being used since the foundation of information theory, this interpretation of information lacks a formal characterisation in terms of the elimination of choice. Both Fano [3] and Ash [4] motivate the notion of information in this way, but go on to derive the measure without explicit reference to the restriction of choice. More specifically, their motivational examples consider a set of possible choices $\mathcal{X}$ modelled by a random variable X. Then, in alignment with Hartley's description, they consider information to be something which excludes possible choices x, with more eliminations corresponding to greater information. However, this approach does not capture the concept of information in its most general sense, since it cannot account for the information provided by partial eliminations, which merely reduce the likelihood of a choice x occurring. (Of course, despite motivating the notion of information in this way, both Fano and Ash provide Shannon's generalised measure of information, which can account for unequally likely choices.) Nonetheless, Section 2 of this paper generalises Hartley's interpretation by providing a formal characterisation of information in terms of probability mass exclusions.

## 2. Information and Eliminations

**Definition 1.** A probability mass exclusion induced by the event y from the random variable Y is the probability mass associated with the complementary event $\overline{y}$, i.e., $p(\overline{y})$.

**Definition 2.** For the joint event $xy$ from the random variables X and Y, an informative probability mass exclusion induced by the event y is the portion of the probability mass exclusion associated with the complementary event $\overline{x}$, i.e., $p(\overline{x},\overline{y})$.

**Definition 3.** For the joint event $xy$ from the random variables X and Y, a misinformative probability mass exclusion induced by the event y is the portion of the probability mass exclusion associated with the event x, i.e., $p(x,\overline{y})$.
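Definitions 1–3 can be read directly off a joint distribution table. The following sketch computes the total, informative, and misinformative exclusions for a joint event $xy$; the distribution and event names are illustrative, not taken from the paper.

```python
# Illustrative joint distribution P(X, Y), stored as {(x, y): mass}.
p = {
    ("x1", "y1"): 0.2, ("x1", "y2"): 0.1,
    ("x2", "y1"): 0.3, ("x2", "y2"): 0.4,
}

def exclusions(p, x, y):
    """Return the (total, informative, misinformative) probability mass
    exclusions induced by the event y for the joint event (x, y)."""
    total = sum(m for (_, yy), m in p.items() if yy != y)            # p(y-bar)
    informative = sum(m for (xx, yy), m in p.items()
                      if xx != x and yy != y)                        # p(x-bar, y-bar)
    misinformative = sum(m for (xx, yy), m in p.items()
                         if xx == x and yy != y)                     # p(x, y-bar)
    return total, informative, misinformative

total, inf_excl, mis_excl = exclusions(p, "x1", "y1")
print(total, inf_excl, mis_excl)  # 0.5 0.4 0.1
```

Note that the informative and misinformative portions always sum to the total exclusion, since $p(\overline{y}) = p(\overline{x},\overline{y}) + p(x,\overline{y})$.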

## 3. Information Decomposition and Probability Mass Exclusions

## 4. The Directed Components of Mutual Information

**Postulate 1.** The information provided by y about x can be decomposed into two non-negative components, such that $i(x;y) = i_{+}(y \to x) - i_{-}(y \to x)$.
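Here $i(x;y)$ is the pointwise mutual information, $i(x;y) = \log_2 \frac{p(x,y)}{p(x)\,p(y)}$, which can itself be negative. A small numerical illustration of the two pure cases from Figure 1 (the distributions below are invented for the example): a purely informative exclusion yields $i(x;y) > 0$, while a purely misinformative one yields $i(x;y) < 0$.

```python
from math import log2

def pmi(p, x, y):
    """Pointwise mutual information i(x;y) = log2 p(x,y) / (p(x) p(y))."""
    pxy = p[(x, y)]
    px = sum(m for (xx, _), m in p.items() if xx == x)
    py = sum(m for (_, yy), m in p.items() if yy == y)
    return log2(pxy / (px * py))

# Purely informative exclusion: p(x-bar, y-bar) > 0 and p(x, y-bar) = 0.
pure_inf = {("x1", "y1"): 0.25, ("x2", "y1"): 0.25, ("x2", "y2"): 0.5}
# Purely misinformative exclusion: p(x-bar, y-bar) = 0 and p(x, y-bar) > 0.
pure_mis = {("x1", "y1"): 0.25, ("x1", "y2"): 0.25, ("x2", "y1"): 0.5}

print(pmi(pure_inf, "x1", "y1") > 0)  # True
print(pmi(pure_mis, "x1", "y1") < 0)  # True
```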

**Postulate 2.** The functions $i_{+}(y \to x)$ and $i_{-}(y \to x)$ should satisfy the following conditions:

1. For all fixed $p(x,y)$ and $p(x,\overline{y})$, the function $i_{+}(y \to x)$ is a continuous, increasing function of $p(\overline{x},\overline{y})$.
2. For all fixed $p(\overline{x},y)$ and $p(\overline{x},\overline{y})$, the function $i_{-}(y \to x)$ is a continuous, increasing function of $p(x,\overline{y})$.
3. For all fixed $p(x,y)$ and $p(\overline{x},y)$, the functions $i_{+}(y \to x)$ and $i_{-}(y \to x)$ are increasing and decreasing functions of $p(\overline{x},\overline{y})$, respectively.

**Postulate 3.** An event cannot misinform about itself; hence $i_{+}(x \to x) = i(x;x) = h(x)$.

**Postulate 4.** The functions $i_{+}(y \to x)$ and $i_{-}(y \to x)$ satisfy a chain rule; i.e., $i_{\pm}(y,z \to x) = i_{\pm}(y \to x) + i_{\pm}(z \to x \mid y)$.
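The pointwise mutual information itself obeys the chain rule $i(x;y,z) = i(x;y) + i(x;z \mid y)$, and Postulate 4 asks the two directed components to decompose in the same manner. As a quick numerical check of the pointwise chain rule, on a randomly generated joint distribution (illustrative only, not from the paper):

```python
from math import log2
from itertools import product
import random

random.seed(1)
# Random joint distribution over (x, y, z) triples, normalised to sum to 1.
events = list(product("ab", "cd", "ef"))
w = [random.random() for _ in events]
tot = sum(w)
p = {e: wi / tot for e, wi in zip(events, w)}

def marg(cond):
    """Marginal probability of the events matching the (index, value) pairs."""
    return sum(m for e, m in p.items() if all(e[i] == v for i, v in cond))

x, y, z = "a", "c", "e"
i_x_yz = log2(marg([(0, x), (1, y), (2, z)])
              / (marg([(0, x)]) * marg([(1, y), (2, z)])))
i_x_y = log2(marg([(0, x), (1, y)]) / (marg([(0, x)]) * marg([(1, y)])))
# i(x; z | y) = log2 [ p(x,y,z) p(y) / (p(x,y) p(y,z)) ]
i_x_z_y = log2(marg([(0, x), (1, y), (2, z)]) * marg([(1, y)])
               / (marg([(0, x), (1, y)]) * marg([(1, y), (2, z)])))

print(abs(i_x_yz - (i_x_y + i_x_z_y)) < 1e-12)  # True
```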

**Theorem 1.**
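A decomposition consistent with Postulates 1–4 takes $i_{+}(y \to x) = h(y)$ and $i_{-}(y \to x) = h(y \mid x)$: both components are non-negative pointwise entropies, their difference recovers $i(x;y)$, and $i_{-}(x \to x) = h(x \mid x) = 0$ as Postulate 3 requires. A sketch verifying these properties on an illustrative distribution (the distribution is invented for the example):

```python
from math import log2

# Illustrative joint distribution P(X, Y); any strictly positive entries work.
p = {("x1", "y1"): 0.2, ("x1", "y2"): 0.1,
     ("x2", "y1"): 0.3, ("x2", "y2"): 0.4}

def marg_x(x): return sum(m for (xx, _), m in p.items() if xx == x)
def marg_y(y): return sum(m for (_, yy), m in p.items() if yy == y)

def decompose(x, y):
    """Candidate directed components: i+ = h(y), i- = h(y|x)."""
    i_plus = -log2(marg_y(y))                 # h(y)
    i_minus = -log2(p[(x, y)] / marg_x(x))    # h(y|x)
    return i_plus, i_minus

x, y = "x1", "y1"
i_plus, i_minus = decompose(x, y)
i_xy = log2(p[(x, y)] / (marg_x(x) * marg_y(y)))  # pointwise i(x;y)

print(i_plus >= 0 and i_minus >= 0)            # True
print(abs((i_plus - i_minus) - i_xy) < 1e-12)  # True
```

The non-negativity follows because $h(y) = -\log_2 p(y)$ and $h(y \mid x) = -\log_2 p(y \mid x)$ are negative logarithms of probabilities.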

## 5. Discussion

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A

**Lemma A1.**

**Proof.**

**Figure A1.** The probability mass diagram associated with (A12). Lemma A2 uses Postulates 3 and 4 to provide a solution for the purely informative case.

**Lemma A2.**

**Proof.**

**Lemma A3.**

**Proof.**

**Proof of Theorem 1.**

**Corollary A1.**

**Proof.**

**Corollary A2.**

**Proof.**

**Corollary A3.**

**Proof.**

**Corollary A4.**

**Proof.**

## References

- Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
- Hartley, R.V.L. Transmission of Information. Bell Syst. Tech. J. **1928**, 7, 535–563.
- Fano, R. Transmission of Information; The MIT Press: Cambridge, MA, USA, 1961.
- Ash, R. Information Theory; Interscience Tracts in Pure and Applied Mathematics; Interscience Publishers: Hoboken, NJ, USA, 1965.
- Lizier, J.T. Computation in Complex Systems. In The Local Information Dynamics of Distributed Computation in Complex Systems; Springer: Berlin/Heidelberg, Germany, 2013; pp. 13–52.
- Prokopenko, M.; Boschetti, F.; Ryan, A.J. An information-theoretic primer on complexity, self-organization, and emergence. Complexity **2008**, 15, 11–28.
- Lizier, J.T.; Bertschinger, N.; Jost, J.; Wibral, M. Information Decomposition of Target Effects from Multi-Source Interactions: Perspectives on Previous, Current and Future Work. Entropy **2018**, 20, 307.
- Williams, P.L.; Beer, R.D. Nonnegative decomposition of multivariate information. arXiv **2010**, arXiv:1004.2515.
- Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J. Shared information—New insights and problems in decomposing information in complex systems. In Proceedings of the European Conference on Complex Systems 2012; Springer: Cham, Switzerland, 2013; pp. 251–269.
- Harder, M.; Salge, C.; Polani, D. Bivariate measure of redundant information. Phys. Rev. E **2013**, 87, 012130.
- Griffith, V.; Koch, C. Quantifying Synergistic Mutual Information. In Guided Self-Organization: Inception; Prokopenko, M., Ed.; Springer: Berlin/Heidelberg, Germany, 2014; pp. 159–190.
- Finn, C.; Lizier, J.T. Pointwise Partial Information Decomposition Using the Specificity and Ambiguity Lattices. Entropy **2018**, 20, 297.
- Kakihara, Y. Abstract Methods in Information Theory; World Scientific: Singapore, 2016.

**Figure 1.** In probability mass diagrams, height represents the probability mass of each joint event from $\mathcal{X} \times \mathcal{Y}$, which must sum to 1. The leftmost diagram depicts the joint distribution $P(X,Y)$, while the central diagrams depict the joint distribution after the occurrence of the event $y \in \mathcal{Y}$ leads to exclusion of the probability mass associated with the complementary event $\overline{y}$. By convention, vertical and diagonal hatching represent informative and misinformative exclusions, respectively. The rightmost diagrams represent the conditional distribution after the remaining probability mass has been normalised. Top row: a purely informative probability mass exclusion, $p(\overline{x},\overline{y})>0$ and $p(x,\overline{y})=0$, leading to $p(x|y)>p(x)$ and hence $i(x;y)>0$. Middle row: a purely misinformative probability mass exclusion, $p(\overline{x},\overline{y})=0$ and $p(x,\overline{y})>0$, leading to $p(x|y)<p(x)$ and hence $i(x;y)<0$. Bottom row: the general case, $p(\overline{x},\overline{y})>0$ and $p(x,\overline{y})>0$. Whether $p(x|y)$ turns out to be greater or less than $p(x)$ depends on the size of both the informative and misinformative exclusions.

**Figure 2.** Top: probability mass diagram for $\mathcal{X} \times \mathcal{Y}$. Bottom: probability mass diagram for $\mathcal{X} \times \mathcal{Z}$. Note that the events $y_1$ and $z_1$ can induce different exclusions in $P(X)$ and yet still yield the same conditional distributions $P(X|y_1)=P(X|z_1)$, and hence provide the same amount of information $i(x_1;y_1)=i(x_1;z_1)$ about the event $x_1$.

**Figure 3.** Top: y and z simultaneously induce probability mass exclusions in $P(X)$, leading directly to $P(X|y,z)$. Middle: y could induce exclusions in $P(X)$ yielding $P(X|y)$, and then z could induce exclusions in $P(X|y)$ leading to $P(X|y,z)$. Bottom: the same as the middle, with the roles of y and z reversed.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Finn, C.; Lizier, J.T.
Probability Mass Exclusions and the Directed Components of Mutual Information. *Entropy* **2018**, *20*, 826.
https://doi.org/10.3390/e20110826
