
*Entropy* **2015**, *17*(7), 4644–4653; doi:10.3390/e17074644


## Abstract

We consider the problem of defining a measure of redundant information that quantifies how much common information two or more random variables specify about a target random variable. We discuss desired properties of such a measure, and propose new measures with some desirable properties.

## 1. Introduction

Many molecular and neurological systems involve multiple interacting factors affecting an outcome synergistically and/or redundantly. Attempts to shed light on issues such as population coding in neurons, or genetic contributions to a phenotype (e.g., eye color), have motivated various proposals to leverage principled information-theoretic measures for quantifying informational synergy and redundancy, e.g., [1–5]. In these settings, we are concerned with the statistics of how two (or more) random variables X_{1}, X_{2}, called predictors, jointly or separately specify/predict another random variable Y, called the target random variable. This focus on a target random variable is in contrast to Shannon's mutual information, which quantifies statistical dependence between two random variables, and to various notions of common information, e.g., [6–8].

The concepts of synergy and redundancy are based on several intuitive notions, e.g., positive informational synergy indicates that X_{1} and X_{2} act cooperatively or antagonistically to influence Y; positive redundancy indicates there is an aspect of Y that X_{1} and X_{2} can each separately predict. However, it has been challenging [9–12] to come up with precise information-theoretic definitions of synergy and redundancy that are consistent with all intuitively desired properties.

## 2. Background: Partial Information Decomposition

Partial Information Decomposition (PID) [13] defines the concepts of synergistic, redundant and unique information in terms of intersection information, I_{∩}({X_{1},…,X_{n}}:Y), which quantifies the common information that each of the n predictors X_{1},…,X_{n} conveys about a target random variable Y. An antichain lattice [14] of redundant, unique, and synergistic partial informations is built from the intersection information.

Partial information diagrams (PI-diagrams) extend Venn diagrams to represent synergy. A PI-diagram is composed of nonnegative partial information regions (PI-regions). Unlike the standard Venn entropy diagram, in which the sum of all regions is the joint entropy H(X_{1…n}, Y), in PI-diagrams the sum of all regions (i.e., the space of the PI-diagram) is the mutual information I(X_{1…n}:Y). PI-diagrams show how the mutual information I(X_{1…n}:Y) is distributed across subsets of the predictors. For example, in the PI-diagram for n = 2 (Figure 1): {1} denotes the unique information about Y that only X_{1} carries (likewise, {2} denotes the information only X_{2} carries); {1, 2} denotes the redundant information about Y that X_{1} as well as X_{2} carries; and {12} denotes the information about Y that is specified only by X_{1} and X_{2} synergistically, or jointly.

Each PI-region is either redundant, unique, or synergistic, but any combination of positive PI-regions may be possible. Per [13], for two predictors, the four partial informations are defined as follows: the redundant information is I_{∂}({1,2}:Y) = I_{∩}({X_{1}, X_{2}}:Y); the unique informations are I_{∂}({1}:Y) = I(X_{1}:Y) − I_{∩}({X_{1}, X_{2}}:Y) and I_{∂}({2}:Y) = I(X_{2}:Y) − I_{∩}({X_{1}, X_{2}}:Y); and the synergistic information is I_{∂}({12}:Y) = I(X_{1},X_{2}:Y) − I(X_{1}:Y) − I(X_{2}:Y) + I_{∩}({X_{1}, X_{2}}:Y).
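These four identities are simple bookkeeping once a candidate I_{∩} value is chosen. A minimal sketch (the function name and the illustrative numbers are ours, not from the paper):

```python
def pid_n2(i_x1y, i_x2y, i_x12y, i_cap):
    """Derive the four partial informations (n = 2) from the three mutual
    informations and a candidate intersection information i_cap."""
    redundant = i_cap                                   # I_d({1,2}:Y)
    unique1 = i_x1y - i_cap                             # I_d({1}:Y)
    unique2 = i_x2y - i_cap                             # I_d({2}:Y)
    synergy = i_x12y - unique1 - unique2 - redundant    # I_d({12}:Y)
    return redundant, unique1, unique2, synergy

# For instance, with I(X1:Y) = I(X2:Y) = 1 bit, I(X1,X2:Y) = 2 bits, and a
# candidate I_cap of 0 bits, the decomposition is two purely unique bits.
print(pid_n2(1.0, 1.0, 2.0, 0.0))  # (0.0, 1.0, 1.0, 0.0)
```

By construction, the four partial informations always sum to I(X_{1},X_{2}:Y).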

## 3. Desired I_{∩} properties and canonical examples

There are a number of intuitive properties, proposed in [5,9–13], that are considered desirable for the intersection information measure I_{∩} to satisfy:

**(S_{0}) Weak Symmetry:** I_{∩}({X_{1},…,X_{n}}:Y) is invariant under reordering of X_{1},…,X_{n}.

**(M_{0}) Weak Monotonicity:** I_{∩}({X_{1},…,X_{n}, Z}:Y) ≤ I_{∩}({X_{1},…,X_{n}}:Y), with equality if there exists X_{i} ∈ {X_{1},…,X_{n}} such that H(Z, X_{i}) = H(Z). Weak Monotonicity is a natural generalization of the monotonicity property from [13], and is inspired by the property of mutual information that if H(X|Z) = 0, then I(X:Y) ≤ I(Z:Y).

**(SR) Self-Redundancy:** I_{∩}({X_{1}}:Y) = I(X_{1}:Y). The intersection information a single predictor X_{1} conveys about the target Y is equal to the mutual information between X_{1} and the target Y.

**(M_{1}) Strong Monotonicity:** I_{∩}({X_{1},…,X_{n}, Z}:Y) ≤ I_{∩}({X_{1},…,X_{n}}:Y), with equality if there exists X_{i} ∈ {X_{1},…,X_{n}} such that I(Z, X_{i}:Y) = I(Z:Y). Strong Monotonicity captures more precisely what is meant by “redundant information”: it says explicitly that it is information about Y that must be redundant, not merely any redundancy among the predictors (as in Weak Monotonicity).

**(LP) Local Positivity:** For all n, the derived “partial informations” defined in [13] are nonnegative. This is equivalent to requiring that I_{∩} satisfy total monotonicity, a stronger form of supermodularity. For n = 2 this can be concretized as I_{∩}({X_{1}, X_{2}}:Y) ≥ I(X_{1}:X_{2}) − I(X_{1}:X_{2}|Y).

**(TM) Target Monotonicity:** If H(Y|Z) = 0, then I_{∩}({X_{1},…,X_{n}}:Y) ≤ I_{∩}({X_{1},…,X_{n}}:Z).

There are also a number of canonical examples for which one or more of the partial informations have intuitive values, which are considered desirable for the intersection information measure I_{∩} to attain.

**Example UNQ**, shown in Figure 2, is a canonical case of unique information, in which each predictor carries independent information about the target. Y has four equiprobable states: ab, aB, Ab, and AB. X_{1} uniquely specifies bit a/A, and X_{2} uniquely specifies bit b/B. Note that the states are named so as to highlight the two bits of unique information; it is equivalent to choose any four unique names for the four states.
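The quantities in UNQ can be checked numerically from the joint distribution. A short sketch (the `mi` helper and state encoding are ours, assuming the four equiprobable joint states described above):

```python
from math import log2

# Joint distribution for Example UNQ: four equiprobable states of (X1, X2, Y);
# X1 carries the a/A bit of Y and X2 carries the b/B bit.
joint = {("a", "b", "ab"): 0.25, ("a", "B", "aB"): 0.25,
         ("A", "b", "Ab"): 0.25, ("A", "B", "AB"): 0.25}

def mi(joint, left, right):
    """Mutual information I(L:R) in bits; left/right list tuple positions."""
    pl, pr, plr = {}, {}, {}
    for outcome, p in joint.items():
        l = tuple(outcome[i] for i in left)
        r = tuple(outcome[i] for i in right)
        pl[l] = pl.get(l, 0.0) + p
        pr[r] = pr.get(r, 0.0) + p
        plr[(l, r)] = plr.get((l, r), 0.0) + p
    return sum(p * log2(p / (pl[l] * pr[r])) for (l, r), p in plr.items())

i1, i2, i12 = mi(joint, [0], [2]), mi(joint, [1], [2]), mi(joint, [0, 1], [2])
print(i1, i2, i12)  # 1.0 1.0 2.0
```

Each predictor alone carries one bit about Y, and together they carry the full two bits, with no overlap in which bit each specifies.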

**Example RdnXor**, shown in Figure 3, is a canonical example of redundancy and synergy coexisting. The r/R bit is redundant, while the 0/1 bit of Y is synergistically specified as the XOR of the corresponding bits in X_{1} and X_{2}.
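RdnXor can be checked the same way; this sketch (our encoding of the figure's distribution, with each variable a pair of symbols) shows the mutual-information profile consistent with one redundant bit plus one synergistic bit:

```python
from itertools import product
from math import log2

# Example RDNXOR: X1 = (r, a), X2 = (r, b), Y = (r, a XOR b), with the eight
# (r, a, b) combinations equiprobable; the r/R bit is copied to all three.
joint = {((r, a), (r, b), (r, a ^ b)): 1 / 8
         for r, a, b in product("rR", [0, 1], [0, 1])}

def mi(joint, left, right):
    """Mutual information I(L:R) in bits; left/right list tuple positions."""
    pl, pr, plr = {}, {}, {}
    for outcome, p in joint.items():
        l = tuple(outcome[i] for i in left)
        r = tuple(outcome[i] for i in right)
        pl[l] = pl.get(l, 0.0) + p
        pr[r] = pr.get(r, 0.0) + p
        plr[(l, r)] = plr.get((l, r), 0.0) + p
    return sum(p * log2(p / (pl[l] * pr[r])) for (l, r), p in plr.items())

r1, r2, r12 = mi(joint, [0], [2]), mi(joint, [1], [2]), mi(joint, [0, 1], [2])
print(r1, r2, r12)  # 1.0 1.0 2.0
```

Each predictor alone conveys exactly the shared r/R bit (1 bit); only jointly do they also specify the XOR bit, giving 2 bits in total.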

**Example And**, shown in Figure 4, is an example where the relationship between X_{1}, X_{2} and Y is nonlinear, making the desired partial information values less intuitively obvious. Nevertheless, it is desired that the partial information values should be nonnegative.

**Example ImperfectRdn**, shown in Figure 5, is an example of “imperfect” or “lossy” correlation between the predictors, where it is intuitively desirable that the derived redundancy be positive. Given (LP), we can determine the desired decomposition analytically. First, I(X_{1},X_{2}:Y) = I(X_{1}:Y) = 1 bit; therefore, I(X_{2}:Y|X_{1}) = I(X_{1},X_{2}:Y) − I(X_{1}:Y) = 0 bits. This determines two of the partial informations: the synergistic information I_{∂}({12}:Y) and the unique information I_{∂}({2}:Y) are both zero. Then, the redundant information I_{∂}({1,2}:Y) = I(X_{2}:Y) − I_{∂}({2}:Y) = I(X_{2}:Y) = 0.99 bits. Having determined three of the partial informations, we compute the final unique information I_{∂}({1}:Y) = I(X_{1}:Y) − I_{∂}({1,2}:Y) = 1 − 0.99 = 0.01 bits.
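The figure is not reproduced here, so the following sketch assumes the usual ImperfectRdn distribution (Y copies X_{1}; X_{2} agrees with X_{1} except on a 0.001-probability event), which reproduces the numbers above; the distribution and helper are our assumptions:

```python
from math import log2

# Assumed IMPERFECTRDN joint distribution over (X1, X2, Y): Y = X1, and X2
# agrees with X1 except on a small-probability event, so Pr(X1=1, X2=0) > 0.
joint = {(0, 0, 0): 0.5, (1, 1, 1): 0.499, (1, 0, 1): 0.001}

def mi(joint, left, right):
    """Mutual information I(L:R) in bits; left/right list tuple positions."""
    pl, pr, plr = {}, {}, {}
    for outcome, p in joint.items():
        l = tuple(outcome[i] for i in left)
        r = tuple(outcome[i] for i in right)
        pl[l] = pl.get(l, 0.0) + p
        pr[r] = pr.get(r, 0.0) + p
        plr[(l, r)] = plr.get((l, r), 0.0) + p
    return sum(p * log2(p / (pl[l] * pr[r])) for (l, r), p in plr.items())

i1 = mi(joint, [0], [2])       # I(X1:Y)
i2 = mi(joint, [1], [2])       # I(X2:Y)
i12 = mi(joint, [0, 1], [2])   # I(X1,X2:Y)
print(round(i1, 3), round(i2, 3), round(i12, 3))  # 1.0 0.99 1.0
```

This confirms I(X_{1},X_{2}:Y) = I(X_{1}:Y) = 1 bit and I(X_{2}:Y) ≈ 0.99 bits, the inputs to the analytic decomposition above.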

## 4. Previous candidate measures

In [13], the authors propose to use the following quantity, I_{min}, as the intersection information measure:

I_{min}({X_{1},…,X_{n}}:Y) ≡ Σ_{y∈Y} Pr(y) min_{i} I(X_{i}:y), where I(X_{i}:y) = D_{KL}[ Pr(X_{i}|y) ∥ Pr(X_{i}) ]

and D_{KL} is the Kullback–Leibler divergence.

Though I_{min} is an intuitive and plausible choice for the intersection information, [9] showed that I_{min} has counterintuitive properties. In particular, I_{min} calculates one bit of redundant information for example UNQ (Figure 2). It does this because each input shares one bit of information with the output. However, it is quite clear that the shared informations are, in fact, different: X_{1} provides the low bit, while X_{2} provides the high bit. This led to the conclusion that I_{min} overestimates the ideal intersection information measure by focusing only on how much information the inputs provide about the output. Another way to understand why I_{min} overestimates redundancy in example UNQ is to imagine a hypothetical example where, for every state y ∈ Y, each predictor provides exactly one bit of unique information and there is no synergy or redundancy. I_{min} would calculate the redundancy as the minimum over both predictors, min[1, 1] = 1 bit. Therefore, I_{min} would report one bit of redundancy even though, by construction, there is no redundancy but merely two bits of unique information.
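The overestimate is easy to reproduce numerically. A sketch of I_{min} (our implementation of the formula above, with the UNQ encoding from earlier) applied to example UNQ:

```python
from math import log2

# Example UNQ again: four equiprobable (X1, X2, Y) states.
joint = {("a", "b", "ab"): 0.25, ("a", "B", "aB"): 0.25,
         ("A", "b", "Ab"): 0.25, ("A", "B", "AB"): 0.25}

def specific_info(joint, i, y):
    """Specific information I(X_i : y) = D_KL[ Pr(X_i|y) || Pr(X_i) ] in bits."""
    py = sum(p for o, p in joint.items() if o[2] == y)
    px, px_given_y = {}, {}
    for o, p in joint.items():
        px[o[i]] = px.get(o[i], 0.0) + p
        if o[2] == y:
            px_given_y[o[i]] = px_given_y.get(o[i], 0.0) + p / py
    return sum(q * log2(q / px[x]) for x, q in px_given_y.items())

# I_min = sum over y of Pr(y) * min_i I(X_i : y).
py = {}
for o, p in joint.items():
    py[o[2]] = py.get(o[2], 0.0) + p
i_min = sum(p * min(specific_info(joint, 0, y), specific_info(joint, 1, y))
            for y, p in py.items())
print(i_min)  # 1.0
```

For every state y, each predictor's specific information is 1 bit, so the pointwise minimum is 1 bit and I_{min} reports a full bit of “redundancy” despite UNQ having none.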

Another candidate measure of synergy, WholeMinusSum (WMS) [9,16], calculates zero synergy and redundancy for Example RDNXOR, as opposed to the intuitive value of one bit of redundancy and one bit of synergy.

## 5. New candidate measures

#### 5.1. The I_{Λ} measure

Based on [17], we can consider a candidate intersection information as the maximum mutual information I(Q:Y) that some random variable Q conveys about Y, subject to Q being a function of each predictor X_{1},…,X_{n}. After some algebra, this leads to

I_{Λ}({X_{1},…,X_{n}}:Y) ≡ max_{Q} I(Q:Y) subject to H(Q|X_{i}) = 0 ∀i ∈ {1,…,n}, (4)

which is achieved by the common random variable Q = X_{1} Λ ⋯ Λ X_{n} [6].
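For finite distributions with n = 2, the common random variable can be computed by merging predictor values that co-occur with positive probability; Q then labels the connected components. A sketch (our implementation, not code from the paper; the ImperfectRdn distribution used below is our assumption):

```python
from itertools import product
from math import log2

def common_rv_info(joint):
    """I_Lambda for n = 2 over a finite joint dict {(x1, x2, y): prob}: the
    common random variable Q labels the connected components of the bipartite
    graph linking co-occurring x1 and x2 values; returns I(Q:Y) in bits."""
    parent = {}

    def find(a):  # union-find with path halving
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for (x1, x2, _), p in joint.items():
        if p > 0:
            parent[find(("x1", x1))] = find(("x2", x2))  # merge co-occurring values

    pq, py, pqy = {}, {}, {}
    for (x1, _, y), p in joint.items():
        q = find(("x1", x1))  # Q = component label of x1 (equivalently of x2)
        pq[q] = pq.get(q, 0.0) + p
        py[y] = py.get(y, 0.0) + p
        pqy[(q, y)] = pqy.get((q, y), 0.0) + p
    return sum(p * log2(p / (pq[q] * py[y])) for (q, y), p in pqy.items())

# RDNXOR: the components are exactly the shared r/R bit, so I_Lambda = 1 bit.
rdnxor = {((r, a), (r, b), (r, a ^ b)): 1 / 8
          for r, a, b in product("rR", [0, 1], [0, 1])}
print(common_rv_info(rdnxor))  # 1.0

# Assumed IMPERFECTRDN distribution: the 0.001-probability disagreement merges
# everything into one component, so I_Lambda collapses to (numerically) zero.
imperfect = {(0, 0, 0): 0.5, (1, 1, 1): 0.499, (1, 0, 1): 0.001}
print(common_rv_info(imperfect))
```

The second call illustrates the discontinuity discussed below: a vanishingly small noise term destroys the deterministic correlation that I_{Λ} relies on.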

Example IMPERFECTRDN highlights the foremost shortcoming of I_{Λ}: I_{Λ} does not detect “imperfect” or “lossy” correlations between X_{1} and X_{2}. Instead, I_{Λ} calculates zero redundant information, i.e., I_{∩}({X_{1}, X_{2}}:Y) = 0 bits. This arises from Pr(X_{1} = 1, X_{2} = 0) > 0; if this probability were zero, IMPERFECTRDN would revert to being determined by the property (**SR**) and the (**M_{0}**) equality condition. Due to the nature of the common random variable, I_{Λ} only sees the “deterministic” correlations between X_{1} and X_{2}: add even an iota of noise between X_{1} and X_{2}, and I_{Λ} plummets to zero. This highlights a related issue with I_{Λ}: it is not continuous; an arbitrarily small change in the probability distribution can result in a discontinuous jump in the value of I_{Λ}.

Despite this, I_{Λ} is a useful stepping-stone: it captures what is inarguably redundant information (the common random variable). In addition, unlike earlier measures, I_{Λ} satisfies (**TM**).

#### 5.2. The I_{α} measure

Intuitively, we expect that if Q specifies only redundant information, then conditioning on any predictor X_{i} should vanquish all of the information Q conveys about Y. We take this intuition to its final conclusion and find it yields a tighter lower bound on I_{∩} than I_{Λ}. Moreover, I_{α} pleasantly takes the same form as I_{Λ}, but loosens the constraint in Equation (4) from H(Q|X_{i}) = 0 to H(Q|X_{i}) = H(Q|X_{i},Y), equivalently I(Q:Y|X_{i}) = 0:

I_{α}({X_{1},…,X_{n}}:Y) ≡ max_{Q} I(Q:Y) subject to I(Q:Y|X_{i}) = 0 ∀i ∈ {1,…,n}.

I_{α} also satisfies (TM). We can also show (see Lemmas 1 and 2 in Appendix A) that

I_{Λ}({X_{1},…,X_{n}}:Y) ≤ I_{α}({X_{1},…,X_{n}}:Y) ≤ I_{min}({X_{1},…,X_{n}}:Y).

While I_{α} attains the desired values on the previously defined canonical examples, we have found another example, SUBTLE, shown in Figure 6, for which I_{Λ} and I_{α} both calculate negative synergy. This example further complicates Example AND by making the predictors mutually dependent.

## 6. Conclusion

The main contribution of this paper is replacing (M_{0}) with (M_{1}), thus further constraining the space of acceptable I_{∩} measures. The complexity community aspires to eventually find a unique I_{∩} measure that satisfies a large portion of the desired properties, and any noncontroversial tightening of the space of possible I_{∩} measures, even (or especially) if obvious in hindsight, is immensely welcome.

As discussed in [12], I_{∩} measures fail (**LP**) if and only if they are too strict a measure of redundant information. Loosening the constraints on I_{Λ} yields I_{α} and achieves a nonnegative decomposition on example IMPERFECTRDN. A natural next step is to loosen the constraints on I_{α} until achieving a nonnegative decomposition for example SUBTLE. Alternatively, there is a very plausible measure of the “unique information” [9,15,18] that satisfies (**LP**) for n = 2 yet does not satisfy (**TM**). It seems that (**LP**) and (**TM**) may be incompatible, and it would be nice to prove this.

## Acknowledgments

We thank Jim Beck, Yaser Abu-Mostafa, Edwin Chong, Chris Ellison, and Ryan James for helpful discussions.

## A. Appendix

**Proof that I_{α} does not satisfy (LP).** By counter-example SUBTLE (Figure 6).

For I(Q:Y|X_{1}) = 0, Q must not distinguish between the states Y = 00 and Y = 01 (because X_{1} does not distinguish between these two states). This entails that Pr(Q|Y = 00) = Pr(Q|Y = 01). By symmetry, for I(Q:Y|X_{2}) = 0, Q must not distinguish between states Y = 01 and Y = 11. Altogether, this entails that Pr(Q|Y = 00) = Pr(Q|Y = 01) = Pr(Q|Y = 11), which then entails Pr(q|y_{i}) = Pr(q|y_{j}) ∀q ∈ Q, y_{i}, y_{j} ∈ Y, which is only achievable when Pr(q) = Pr(q|y) ∀q ∈ Q, y ∈ Y. This makes I(Q:Y) = 0; therefore, for example SUBTLE, I_{α}({X_{1}, X_{2}}:Y) = 0. □

**Lemma 1.** We have I_{Λ}({X_{1},…,X_{n}}:Y) ≤ I_{α}({X_{1},…,X_{n}}:Y).

**Proof.** We define the random variable Q′ = X_{1} Λ ⋯ Λ X_{n} and plug Q′ in for Q in the definition of I_{α}. This choice of Q′ satisfies the constraint I(Q′:Y|X_{i}) = 0 ∀i ∈ {1,…,n}. Therefore, Q′ is always a feasible choice for Q, and the maximization of I(Q:Y) in I_{α} must be at least as large as I(Q′:Y) = I_{Λ}({X_{1},…,X_{n}}:Y). □

**Lemma 2.** We have I_{α}({X_{1},…,X_{n}}:Y) ≤ I_{min}({X_{1},…,X_{n}}:Y).

**Proof.** For a given state y ∈ Y and two arbitrary random variables Q and X, given I(Q:y|X) = D_{KL}[ Pr(Q,X|y) ∥ Pr(Q|X) Pr(X|y) ] = 0, we show that I(Q:y) ≤ I(X:y).

Generalizing to n predictors X_{1},…,X_{n}, the above shows that the maximum of I(Q:y) under the constraints I(Q:y|X_{i}) = 0 will always be at most min_{i∈{1,…,n}} I(X_{i}:y), which completes the proof. □

**Lemma 3.** Measure I_{min} satisfies desired property Strong Monotonicity, (M_{1}).

**Proof.** Given H(Y|Z) = 0, the specific-surprise I(Z:y) attains its maximum value,

I(Z:y) = D_{KL}[ Pr(Z|y) ∥ Pr(Z) ] = log(1/Pr(y)).

Given that for an arbitrary random variable X_{i}, I(X_{i}:y) ≤ log(1/Pr(y)), and that I_{min} uses only min_{i} I(X_{i}:y), the minimum is invariant under adding any predictor Z such that H(Y|Z) = 0. Therefore, measure I_{min} satisfies property (M_{1}). □

## Author Contributions

Both authors contributed equally to this research. Both authors have read and approved the final manuscript.

## Conflicts of Interest

The authors declare no conflict of interest.

## References

1. Schneidman, E.; Bialek, W.; Berry, M.J. Synergy, redundancy, and independence in population codes. J. Neurosci. **2003**, 23, 11539–11553.
2. Narayanan, N.S.; Kimchi, E.Y.; Laubach, M. Redundancy and synergy of neuronal ensembles in motor cortex. J. Neurosci. **2005**, 25, 4207–4216.
3. Balduzzi, D.; Tononi, G. Integrated information in discrete dynamical systems: Motivation and theoretical framework. PLoS Comput. Biol. **2008**, 4, e1000091.
4. Anastassiou, D. Computational analysis of the synergy among multiple interacting genes. Mol. Syst. Biol. **2007**, 3, 83.
5. Lizier, J.T.; Flecker, B.; Williams, P.L. Towards a synergy-based approach to measuring information modification. In Proceedings of the 2013 IEEE Symposium on Artificial Life (ALIFE), Singapore, 16–19 April 2013; pp. 43–51.
6. Gács, P.; Körner, J. Common information is far less than mutual information. Probl. Control Inf. Theory **1973**, 2, 149–162.
7. Wyner, A.D. The common information of two dependent random variables. IEEE Trans. Inf. Theory **1975**, 21, 163–179.
8. Kumar, G.R.; Li, C.T.; El Gamal, A. Exact common information. In Proceedings of the 2014 IEEE International Symposium on Information Theory (ISIT), Honolulu, HI, USA, 29 June–4 July 2014; pp. 161–165.
9. Griffith, V.; Koch, C. Quantifying synergistic mutual information. In Guided Self-Organization: Inception; Emergence, Complexity and Computation Series, Volume 9; Springer: Berlin/Heidelberg, Germany, 2014; pp. 159–190.
10. Harder, M.; Salge, C.; Polani, D. Bivariate measure of redundant information. Phys. Rev. E **2013**, 87, 012130.
11. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J. Shared information—new insights and problems in decomposing information in complex systems. In Proceedings of the European Conference on Complex Systems 2012; Springer Proceedings in Complexity Series; Springer: Cham, Switzerland, 2013; pp. 251–269.
12. Griffith, V.; Chong, E.K.P.; James, R.G.; Ellison, C.J.; Crutchfield, J.P. Intersection information based on common randomness. Entropy **2014**, 16, 1985–2000.
13. Williams, P.L.; Beer, R.D. Nonnegative decomposition of multivariate information. **2010**, arXiv:1004.2515.
14. Weisstein, E.W. Antichain. Available online: http://mathworld.wolfram.com/Antichain.html (accessed on 29 June 2015).
15. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying unique information. Entropy **2014**, 16, 2161–2183.
16. Schneidman, E.; Still, S.; Berry, M.J.; Bialek, W. Network information and connected correlations. Phys. Rev. Lett. **2003**, 91, 238701–238705.
17. Wolf, S.; Wullschleger, J. Zero-error information and applications in cryptography. In Proceedings of the IEEE Information Theory Workshop, San Antonio, TX, USA, 24–29 October 2004; pp. 1–6.
18. Rauh, J.; Bertschinger, N.; Olbrich, E.; Jost, J. Reconsidering unique information: Towards a multivariate information decomposition. In Proceedings of the 2014 IEEE International Symposium on Information Theory (ISIT), Honolulu, HI, USA, 29 June–4 July 2014; pp. 2232–2236.

**Figure 1.** PI-diagrams for n = 2 predictors, showing the amount of redundant (yellow/bottom), unique (magenta/left and right), and synergistic (cyan/top) information with respect to the target Y.

**Figure 2.** Example UNQ. X_{1} and X_{2} each uniquely carry one bit of information about Y. I(X_{1},X_{2}:Y) = H(Y) = 2 bits.

**Figure 3.** Example RDNXOR. This is the canonical example of redundancy and synergy coexisting. I_{min} and I_{Λ} each reach the desired decomposition of one bit of redundancy and one bit of synergy. This example demonstrates I_{Λ} correctly extracting the embedded redundant bit within X_{1} and X_{2}.

**Figure 4.** Example AND. It is universally agreed that the redundant information is between [0, 0.311] bits. The most compelling argument, from [15], is for 0.311 bits of redundant information.

**Figure 5.** Example IMPERFECTRDN. I_{Λ} is blind to the noisy correlation between X_{1} and X_{2} and calculates zero redundant information. An ideal I_{∩} measure would detect that all of the information X_{2} specifies about Y is also specified by X_{1}, to calculate I_{∩}({X_{1}, X_{2}}:Y) = 0.99 bits.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).