Non-Negative Decomposition of Multivariate Information: From Minimum to Blackwell-Specific Information
Abstract
1. Introduction
1.1. Related Work
1.2. Contributions
- We propose a representation of distinct uncertainty and distinct information, which we use to demonstrate the unexpected behavior of the measure of Williams and Beer [1] (Section 2.2 and Section 3.1).
- We propose a non-negative decomposition of any f-information measure for an arbitrary number of discrete random variables that satisfies an inclusion–exclusion relation and provides a meaningful operational interpretation (Section 3.2, Section 3.3 and Section 3.5). The decomposition satisfies the original axioms of Williams and Beer [1] (Theorems 3 and 4) and inherits different properties from different information measures (Section 4).
- We demonstrate several transformations of the proposed decomposition: (i) We transform the cumulative measure between different decomposition lattices (Section 3.4). (ii) We demonstrate that the non-negative decomposition of f-information directly provides a non-negative decomposition of Rényi- and Bhattacharyya-information under a transformed inclusion–exclusion relation (Section 3.6).
2. Background
2.1. Blackwell and Zonogon Order
2.2. Partial Information Decomposition
- A source is a non-empty set of visible variables.
- An atom is a set of sources constructed by Equation (8); an enumeration sketch follows this list.
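For concreteness, the sketch below enumerates sources and atoms for a small number of visible variables, following the standard construction in which atoms are antichains of sources (no source contained in another). The function names are illustrative assumptions, not taken from the paper's implementation [30].

```python
from itertools import chain, combinations

def atoms(n):
    """Atoms for n visible variables: non-empty antichains of non-empty
    sources, i.e., collections of sources where no source contains another."""
    variables = range(1, n + 1)
    sources = [frozenset(c) for r in range(1, n + 1)
               for c in combinations(variables, r)]
    candidates = chain.from_iterable(combinations(sources, k)
                                     for k in range(1, len(sources) + 1))
    return [set(c) for c in candidates
            if not any(a < b for a in c for b in c)]

print(len(atoms(2)))  # 4 atoms: {1}{2}, {1}, {2}, {12}
print(len(atoms(3)))  # 18 atoms
```

This brute-force enumeration is exponential and only meant to illustrate the structure behind Equation (8).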
2.3. Information Measures
- f is convex on (0, ∞);
- f(1) = 0;
- f(x) is finite for all x > 0 (a computational sketch follows this list).
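Given such a generator, any of the f-divergences used later can be evaluated numerically. A minimal sketch, assuming finite alphabets, the convention 0·f(0/0) = 0, and strictly positive q wherever p is positive; the function names are ours:

```python
from math import log

def f_divergence(p, q, f):
    """Compute D_f(P||Q) = sum_x q(x) * f(p(x) / q(x)).

    Assumes q(x) > 0 wherever p(x) > 0 and uses the convention that
    terms with p(x) = q(x) = 0 contribute nothing.
    """
    return sum(qx * f(px / qx) for px, qx in zip(p, q) if qx > 0)

kl = lambda x: x * log(x) if x > 0 else 0.0   # Kullback-Leibler generator
tv = lambda x: 0.5 * abs(x - 1)               # total variation generator

p, q = [0.5, 0.5, 0.0], [0.25, 0.25, 0.5]
print(f_divergence(p, q, kl))  # log(2), approximately 0.693
print(f_divergence(p, q, tv))  # 0.5
```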
3. Decomposition Methodology
- In a similar manner to how Finn and Lizier [13] used probability mass exclusion to differentiate distinct information, we use Neyman–Pearson regions for each state of a target variable for the same purpose.
- We propose applying the concepts of lattice re-graduation discussed by Knuth [19] to PIDs to transform the decomposition of one information measure into another while maintaining its consistency.
3.1. Representing f-Information
- We define a function, as shown in Equation (28a), to quantify a vector.
- We define a target pointwise f-information function, as shown in Equation (28b), to quantify half the zonogon perimeter of the corresponding pointwise channel.
- The convexity of the function is shown separately in Lemma A1 of Appendix A.
- That the function scales linearly can be seen directly from Equation (28a).
- The triangle inequality is shown separately in Corollary A1 of Appendix A.
- A vector of slope one is quantified to zero, since f(1) = 0 is a requirement on the generator function of an f-divergence (Definition 16).
- The zero vector is quantified to zero by the convention of generator functions for an f-divergence (Definition 16).
- That the function maintains the ordering relation of the Blackwell order on binary-input channels is shown separately in Lemma A2 of Appendix A (Equation (29a)).
- The bottom element consists of a single vector of slope one, which is quantified to zero by Theorem 1 (Equation (29b)). The combination with Equation (29a) ensures non-negativity. A construction sketch follows this list.
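To make this construction concrete: the pointwise channel of a target state can be described by two conditional distributions, whose per-symbol likelihood vectors generate the zonogon. Sorting these vectors by decreasing likelihood ratio and accumulating them traces the Neyman–Pearson (ROC) boundary, and summing a perspective-style valuation over the generating vectors yields the f-divergence between the two distributions. This is a sketch of our reading of Equations (28a) and (28b); the normalization and all names are assumptions:

```python
import numpy as np

def zonogon_boundary(p_pos, p_neg):
    """Vertices of the upper zonogon boundary for generating vectors
    (p_pos[x], p_neg[x]): sort by decreasing likelihood ratio, accumulate."""
    ratio = [px / qx if qx > 0 else float("inf") for px, qx in zip(p_pos, p_neg)]
    order = np.argsort(ratio)[::-1]
    segments = np.array([(p_pos[i], p_neg[i]) for i in order])
    return np.vstack([[0.0, 0.0], np.cumsum(segments, axis=0)])

def pointwise_f_information(p_pos, p_neg, f):
    """Valuation nu(v) = v2 * f(v1 / v2) summed over all generating vectors;
    for the full channel this equals D_f(p_pos || p_neg)."""
    return sum(qx * f(px / qx) for px, qx in zip(p_pos, p_neg) if qx > 0)

p_pos, p_neg = [0.4, 0.6], [0.8, 0.2]   # P(X | T = t) and P(X | T != t)
print(zonogon_boundary(p_pos, p_neg))   # (0,0) -> (0.6,0.2) -> (1,1)
```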
3.2. Decomposing f-Information on the Synergy Lattice
- Axiom 1: The measure (Equation (33a)) is invariant to permuting the order of sources, since the join operator of the zonogon order is. Therefore, the derived measure also satisfies Axiom 1.
- Axiom 2: The monotonicity of both measures on the synergy lattice is shown separately as Corollary A2 in Appendix C.
- Non-negativity: The non-negativity of both measures is shown separately as Lemma A8 in Appendix C (a Möbius-inversion sketch for computing partial contributions follows this list).
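The partial contributions behind such cumulative measures follow by Möbius inversion on the lattice: each atom receives the cumulative value minus everything already accounted for in its strict down-set. A minimal sketch with a hypothetical two-source lattice; all values and names are illustrative only:

```python
def moebius_inverse(atoms, strictly_below, cumulative):
    """pi(a) = cumulative[a] - sum of pi(b) over all b strictly below a."""
    memo = {}
    def pi(a):
        if a not in memo:
            memo[a] = cumulative[a] - sum(pi(b) for b in strictly_below[a])
        return memo[a]
    return {a: pi(a) for a in atoms}

below = {"{1}{2}": [], "{1}": ["{1}{2}"], "{2}": ["{1}{2}"],
         "{12}": ["{1}{2}", "{1}", "{2}"]}
cum = {"{1}{2}": 0.25, "{1}": 0.5, "{2}": 0.25, "{12}": 1.0}
print(moebius_inverse(below, below, cum))
# {'{1}{2}': 0.25, '{1}': 0.25, '{2}': 0.0, '{12}': 0.5}
```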
3.3. Operational Interpretation
3.4. Decomposition Duality
3.5. Decomposing f-Information on the Redundancy Lattice
- Axiom 1: The measure is invariant to permuting the order of sources, since the join operator of the zonogon order is. Therefore, the derived measure also satisfies Axiom 1.
- Non-negativity: The non-negativity of the cumulative measure is obtained from Lemma 7 and Theorem 3, as shown in Equation (44). The non-negativity of the pointwise measure implies the non-negativity of the combined measure.
- Axiom 2: Since the cumulative measures correspond to the sum of partial contributions in their down-set, the non-negativity of partial information implies the monotonicity of the cumulative measures.
- Axiom 3*: For a single source, the measure equals f-information by definition (see Equation (30)). Therefore, it satisfies Axiom 3*.
3.6. Decomposing Rényi-Information
- Axiom 1: The transformed measure is invariant to permuting the order of sources, since the underlying f-information measure satisfies Axiom 1 (see Section 3.2).
- Axiom 2: The transformed measure satisfies monotonicity, since the underlying measure satisfies Axiom 2 (see Section 3.2) and the transformation function is monotonically increasing.
- Axiom 3*: Since the underlying measure satisfies Axiom 3* (see Section 3.2, Equations (45) and (47)), the transformed measure satisfies the self-redundancy axiom by definition, however, under a transformed operator.
- Non-negativity: The decomposition is non-negative, since the underlying decomposition is non-negative (see Section 3.2), the Möbius inverse is computed with transformed operators (Equation (48b)), and the transformation function preserves non-negativity. A sketch of the transformation follows this list.
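Concretely, for the Hellinger family the transformation to the Rényi scale is the monotone map g(x) = log(1 + (α − 1)x)/(α − 1), the standard relation between Hellinger and Rényi divergences. The sketch below shows this map together with a transformed subtraction of the kind we understand Equation (48b) to use; treat the operator form as our assumption:

```python
from math import exp, log

def g(x, alpha):
    """Map a Hellinger-divergence value to the Renyi scale (alpha > 0, != 1)."""
    return log(1 + (alpha - 1) * x) / (alpha - 1)

def g_inv(y, alpha):
    return (exp((alpha - 1) * y) - 1) / (alpha - 1)

def transformed_sub(a, b, alpha):
    """Subtract on the f-information scale, then map back: since g is
    monotonically increasing, the result is non-negative whenever a >= b."""
    return g(g_inv(a, alpha) - g_inv(b, alpha), alpha)

print(transformed_sub(g(3.0, 2.0), g(1.0, 2.0), 2.0))  # equals g(2.0, 2.0)
```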
4. Evaluation
4.1. Partial Information Decomposition
4.1.1. Comparison of Different f-Information Measures
4.1.2. The Special Case of Total Variation
- (a)
- (b) For a non-empty set of pointwise channels, pointwise total variation quantifies the join element as the maximum of its individual channels (Equation (51b)).
- (c) The loss measure quantifies the meet of a set of sources on the synergy lattice as their minimum (Equation (51c)). A numeric illustration follows this list.
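A small numeric illustration of this max rule, assuming each pointwise channel is given by its two conditional distributions and using the total variation generator f(x) = |x − 1|/2; the join element itself is not constructed explicitly, only quantified through the rule of Equation (51b):

```python
def pointwise_tv(p_pos, p_neg):
    """TV(P, Q) = 0.5 * sum_x |p(x) - q(x)| = max_A |P(A) - Q(A)|."""
    return 0.5 * sum(abs(px - qx) for px, qx in zip(p_pos, p_neg))

channels = [([0.4, 0.6], [0.8, 0.2]),
            ([0.7, 0.3], [0.1, 0.9])]
tvs = [pointwise_tv(p, q) for p, q in channels]
print(tvs, max(tvs))  # approximately [0.4, 0.6]; the join is quantified to 0.6
```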
4.2. Information Flow Analysis
5. Discussion
- All information measures in Section 2.3 are the expected value of the pointwise information (the quantification of the Neyman–Pearson region boundary) for an indicator variable of each target state. Therefore, we argue for acknowledging the “pointwise nature” [13] of these information measures and for decomposing them accordingly. A similar argument was made previously by Finn and Lizier [13] for the case of mutual information and motivated their proposed pointwise partial information decomposition.
- The Blackwell order does not form a lattice beyond indicator variables, since it does not provide a unique meet or join element [17]. However, from a pointwise perspective, informativity (Definition 2) provides a unique representation of union information. This enables separating the definitions of redundant, unique, and synergetic information from a specific information measure, which then serves only for their quantification. We interpret these observations as an indication that the Blackwell order should be used to decompose pointwise information based on indicator variables rather than to decompose the expected information based on the full target distribution.
- We can consider where the alternative approach would lead if we decomposed the expected information of the full target distribution using the Blackwell order: the decomposition would become identical to the method of Bertschinger et al. [9] and Griffith and Koch [10]. For bivariate examples (two source variables), this decomposition [9,10] is non-negative and satisfies an additional property (identity, proposed by Harder et al. [5]). However, the identity property is inconsistent [32] with the axioms of Williams and Beer [1] and non-negativity, which causes negative partial information when the approach is extended beyond two sources. The identity property also contradicts the conclusion that Finn and Lizier [13] drew from studying Kelly gambling, namely that “information should be regarded as redundant information, regardless of the independence of the information sources” ([13], p. 26). It also contradicts our interpretation of distinct information through distinct decision regions when predicting an indicator variable for some target state. We do not argue that this interpretation should apply to the concept of information in general, but acknowledge that this behavior seems present in the information measures studied in this work, and we construct their decomposition accordingly.
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Appendix A. Quantifying Zonogon Perimeters
Appendix B. Inclusion–Exclusion Inequality of Zonogons
- (a) For any subset with even cardinality, the function returns a subset of it (Equation (A5)).
- (b) The function that satisfies Equation (A5) has an inverse on its first argument.
- In the base case, the sets of even- and odd-cardinality subsets can be listed explicitly. We define the function to satisfy both required properties:
  - (a)
  - (b) The function is a bijection between the two sets of subsets and, therefore, has an inverse on its first argument (requirement of Equation (A6)).
- For the induction step, we define a function that satisfies both required properties. The subsets of even and odd cardinality can be expanded as shown in Equation (A7).

  We define the function as shown in Equation (A8), using the function and its inverse from the induction hypothesis. The function is defined for any subset, as can be seen from Equation (A7).

  Figure A1 provides an intuition for the definition: the outcome of the inner function determines whether the outer function maintains or breaks the mapping.

  Figure A1. Intuition for the definition of Equation (A8). The set can be divided into two parts; the definition of the function mirrors the inner mapping in one case (blue) and otherwise breaks it (orange).

  The function F as defined in Equation (A8) satisfies both requirements (Equations (A5) and (A6)) for any input:
- (a) To demonstrate that the function satisfies the subset relation of Equation (A5), we analyze the four cases for its return value, as defined in Equation (A8), individually:
  - Case 1: The relation holds, since the inner function always returns a subset of its input (Equation (A4)).
  - Case 2: The relation holds by the induction hypothesis.
  - Case 3: Since the input to the inner function is not the empty set, the function returns a singleton subset of its input (Equation (A4)). If the element in the singleton subset is unequal to q, then the result is a subset of the input.
  - Case 4: The relation holds trivially.
- (b) To demonstrate that the function has an inverse (Equation (A6)), we show that it is a bijection from the even-cardinality subsets to the odd-cardinality subsets. Since the function is defined for all elements and both sets have the same cardinality (Equation (A3)), it is sufficient to show that the function is injective.

  The return value has four cases, two of which return a set containing q (cases 1 and 4 in Equation (A8)), while the other two do not (cases 2 and 3 in Equation (A8)). Therefore, we have to show that the cases within each pair cannot coincide for any input:
  - Cases 2 and 3 in Equation (A8): If the return values of both cases were equal, so would be the underlying sets, which leads to a contradiction, since the conditions of cases 2 and 3 exclude each other. Hence, the return values of cases 2 and 3 are distinct.
  - Cases 1 and 4 in Equation (A8): If the return values of both cases were equal, so would be the underlying sets, which leads to a contradiction, since the conditions of cases 1 and 4 exclude each other. Hence, the return values of cases 1 and 4 are distinct.
Since the function is a bijection, there exists an inverse.
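The counting part of this argument is easy to check mechanically. The sketch below uses a simpler even-to-odd bijection (symmetric difference with a fixed element q); unlike the function F of Equation (A8), it does not satisfy the subset requirement of Equation (A5), which is precisely why the recursive construction above is needed:

```python
from itertools import combinations

def subsets(S, parity):
    """All subsets of S whose cardinality is even (parity 0) or odd (parity 1)."""
    return [frozenset(c) for k in range(len(S) + 1) if k % 2 == parity
            for c in combinations(S, k)]

def toggle(a, q):
    """Bijection between even- and odd-cardinality subsets: toggle q."""
    return a - {q} if q in a else a | {q}

S, q = {1, 2, 3}, 1
even, odd = subsets(S, 0), subsets(S, 1)
assert sorted(map(sorted, (toggle(a, q) for a in even))) == sorted(map(sorted, odd))
```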
Appendix C. Non-Negativity of Partial f-Information on the Synergy Lattice
Appendix C.1. Properties of the Loss Measure on the Synergy Lattice
- If the channel is the bottom element, then the implication holds for any other channel, since the bottom element is inferior (⊑) to any other channel.
- If the channel is not the bottom element, then the considered set is also non-empty.
Appendix C.2. The Non-Negativity of the Decomposition
Let the atom be distinct from the top element; then its cover set is non-empty. Additionally, we know that no atom in the cover set is the empty set, since the empty atom is the top element.

Since it will be required later, note that the inclusion–exclusion principle applied to a constant yields the constant itself, as shown in Equation (A16), since, without the empty set, there exists one more subset of odd cardinality than of even cardinality (see Equation (A3); the counting identity is written out after this paragraph).

We can re-write the Möbius inverse as shown in Equation (A17), where Equation (A17b) is obtained from ([23], p. 15).

We can construct an upper bound based on the cover set, as shown in Equation (A19). By the transitivity of Equations (A18b) and (A19d), we obtain Equation (A20). By Equations (A17) and (A20), we obtain the non-negativity of pointwise partial information, as shown in Equation (A21).
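For reference, the counting identity behind Equation (A16), written out in our notation:

```latex
% For a finite set A with |A| = n >= 1:
%   number of odd-cardinality subsets = 2^{n-1},
%   number of even-cardinality non-empty subsets = 2^{n-1} - 1.
\sum_{\emptyset \neq b \subseteq A} (-1)^{|b|+1} c
  = \Bigl(\sum_{k\,\mathrm{odd}} \binom{n}{k}
        - \sum_{\substack{k\,\mathrm{even} \\ k \ge 2}} \binom{n}{k}\Bigr) c
  = \bigl(2^{n-1} - (2^{n-1} - 1)\bigr) c
  = c .
```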
Appendix D. Mappings between Decomposition Lattices and Their Duality
The mapping satisfies the following two properties, which we prove in turn:
- Property 1: The n-ary Cartesian product provides all combinations of one variable from each source (Definition 28; a small sketch follows at the end of this appendix). By Definition 11 of equivalence (≃), we have to show that both elements are inferior to each other under the redundancy order:
  - First direction: We begin by expanding the definition of the redundancy order, as shown in Equation (A24), to highlight that it is sufficient to show the subset relation. To do so, we have to demonstrate that it is possible to select one variable from each source of the product to reconstruct each original source:
    - By definition, each source of the product contains one variable from each original source, and all variables of each original source can be found in some source of the product.
    - By selecting from each source of the product the variable that originated from the same original source, we can exactly reconstruct each original source.
    - Therefore, the subset relation holds, which implies the first direction.
  - Second direction: We begin by expanding the definition of the redundancy order (Equation (9)), as shown in Equation (A25), to highlight that we have to show that every source of the product is a super-set of some original source.

    For a proof by induction, the recursive definition shown in Equation (A26) highlights the relation of interest more clearly; the notation indicates the i-th variable of a source. That both functions are equivalent can be seen directly, since the recursion combines all possible choices of selecting one variable from each source, which is the definition of the n-ary Cartesian product.

    Induction on the cardinality of the collection of sources:
    - Hypothesis: It is impossible to choose one variable from each source of the product without selecting all variables of some original source.
    - Base case: The condition is satisfied, as shown in Equation (A28), since the first application turns each variable into its own source, and the second application recombines them.
    - Assume the induction hypothesis holds for collections of n sources.
    - For the induction step, consider a collection of n + 1 sources. From the recursive definition shown in Equation (A29), we can directly see all relevant options for choosing one element from each resulting source:
      - Case 1: From every resulting source, we choose the variable that was contributed by the new source. The resulting selection contains all variables of the new source.
      - Case 2: To avoid choosing all variables of the new source, we have to select a variable contributed by the remaining collection instead for some resulting source. By the induction hypothesis, choosing one variable from each set of the remaining collection leads to choosing all variables of some original source.
      - Hence, choosing one variable from each resulting set leads to choosing all variables of the new source or all variables of some original source.
      - Thus, the induction hypothesis holds for n + 1 sources.
  - As shown above, both directions hold, which implies the equivalence.
- Property 2: We first expand the definitions. Then, we view both implications separately:
  - Assume the left-hand side holds. Then, there exists a function that associates each source of the first collection with a source of the second.

    All sets contain one variable of each source; let a second function indicate this selection. Define a set using the two functions, as shown in Equation (A32); the first function is defined for all sources, and the selected element lies in the original source by Equation (A30).

    The constructed set is a subset of the considered source, and it can be constructed for each of them. This proves Equation (A33).
  - For the other direction, we show Equation (A34) and start with its simplification. The left-hand side states that, for some source, all sources contain an element that is not in it. Let us fix such a source and define a function returning this element.

    Then, we can define a set that selects one variable from each source; thus, it belongs to the Cartesian product by definition. All such sets must select one element from each source and, thus, contain one element that is not in the fixed source. This provides the required implication of Equation (A34).
The inverse mapping satisfies the following two properties:
- Property 1 is obtained from Definition 28: the first two cases revert each other, and the third case holds directly from the definitions.
- Property 2:
  - Case 1: The implication follows directly from the definitions.
  - Case 2: The implication follows directly from the definitions.
  - Case 3: In the remaining case, we distinguish two sub-cases:
    - If the element is the bottom element, the condition holds, since it is the minimal element of the lattice.
    - Otherwise, we start by simplifying the expression:
      - The left-hand side states that, for all considered elements, all sources must have at least one element that is not in the given set.
      - The right-hand side states that, for all considered elements, there exists a combination of one variable per source such that no element of the resulting collection is in the given set. This is possible if and only if all sources have at least one element that is not in the given set.

Therefore, both statements imply each other.
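A minimal sketch of the n-ary Cartesian product step behind Definition 28, as referenced in Property 1 above (all names are hypothetical): every combination of one variable per source becomes a source of the mapped atom.

```python
from itertools import product

def product_atom(atom):
    """Map an atom (a set of sources, each a set of variables) to the atom
    whose sources are all combinations of one variable per original source."""
    return frozenset(frozenset(choice) for choice in product(*atom))

atom = frozenset({frozenset({1, 2}), frozenset({3})})
print(sorted(map(sorted, product_atom(atom))))  # [[1, 3], [2, 3]]
```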
Appendix E. Scaling f-Information Does Not Affect Its Transformation
Appendix F. Decomposition Example Distributions
s₁ | s₂ | t | XOR | Unq | PwUnq | RdnErr | Tbc | AND | Generic
---|---|---|---|---|---|---|---|---|---
0 | 0 | 0 | 1/4 | 1/4 | 0 | 3/8 | 1/4 | 1/4 | 0.0625 |
0 | 0 | 1 | - | - | - | - | - | - | 0.3000 |
0 | 1 | 0 | - | 1/4 | 1/4 | 1/8 | - | 1/4 | 0.1875 |
0 | 1 | 1 | 1/4 | - | - | - | 1/4 | - | 0.1500 |
0 | 2 | 1 | - | - | 1/4 | - | - | - | - |
1 | 0 | 0 | - | - | 1/4 | - | - | 1/4 | 0.0375 |
1 | 0 | 1 | 1/4 | 1/4 | - | 1/8 | - | - | 0.0500 |
1 | 0 | 2 | - | - | - | - | 1/4 | - | - |
1 | 1 | 0 | 1/4 | - | - | - | - | - | 0.2125 |
1 | 1 | 1 | - | 1/4 | - | 3/8 | - | 1/4 | - |
1 | 1 | 3 | - | - | - | - | 1/4 | - | - |
2 | 0 | 1 | - | - | 1/4 | - | - | - | - |
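For experiments, the table can be loaded directly; a sketch for two of the columns, with the state ordering (s₁, s₂, t) inferred from the XOR/AND structure of the rows:

```python
from fractions import Fraction

quarter = Fraction(1, 4)
XOR = {(0, 0, 0): quarter, (0, 1, 1): quarter,
       (1, 0, 1): quarter, (1, 1, 0): quarter}
AND = {(0, 0, 0): quarter, (0, 1, 0): quarter,
       (1, 0, 0): quarter, (1, 1, 1): quarter}

assert sum(XOR.values()) == 1 and sum(AND.values()) == 1
assert all(t == (s1 ^ s2) for s1, s2, t in XOR)  # target is XOR of the sources
assert all(t == (s1 & s2) for s1, s2, t in AND)  # target is AND of the sources
```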
Appendix G. The Relation of Total Variation to the Zonogon Height
Appendix H. Information Flow Example Parameters and Visualization
References
- Williams, P.L.; Beer, R.D. Nonnegative Decomposition of Multivariate Information. arXiv 2010, arXiv:1004.2515.
- Lizier, J.T.; Bertschinger, N.; Jost, J.; Wibral, M. Information Decomposition of Target Effects from Multi-Source Interactions: Perspectives on Previous, Current and Future Work. Entropy 2018, 20, 307.
- Griffith, V.; Chong, E.K.P.; James, R.G.; Ellison, C.J.; Crutchfield, J.P. Intersection Information Based on Common Randomness. Entropy 2014, 16, 1985–2000.
- Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J. Shared Information—New Insights and Problems in Decomposing Information in Complex Systems. In Proceedings of the European Conference on Complex Systems 2012; Gilbert, T., Kirkilionis, M., Nicolis, G., Eds.; Springer: Cham, Switzerland, 2013; pp. 251–269.
- Harder, M.; Salge, C.; Polani, D. Bivariate measure of redundant information. Phys. Rev. E 2013, 87, 012130.
- Finn, C. A New Framework for Decomposing Multivariate Information. Ph.D. Thesis, University of Sydney, Darlington, NSW, Australia, 2019.
- Polyanskiy, Y.; Wu, Y. Information Theory: From Coding to Learning; Book Draft; Cambridge University Press: Cambridge, UK, 2022. Available online: https://people.lids.mit.edu/yp/homepage/data/itbook-2022.pdf (accessed on 13 May 2024).
- Mironov, I. Rényi Differential Privacy. In Proceedings of the 2017 IEEE 30th Computer Security Foundations Symposium (CSF), Santa Barbara, CA, USA, 21–25 August 2017; pp. 263–275.
- Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying Unique Information. Entropy 2014, 16, 2161–2183.
- Griffith, V.; Koch, C. Quantifying Synergistic Mutual Information. In Guided Self-Organization: Inception; Springer: Berlin/Heidelberg, Germany, 2014; pp. 159–190.
- Goodwell, A.E.; Kumar, P. Temporal information partitioning: Characterizing synergy, uniqueness, and redundancy in interacting environmental variables. Water Resour. Res. 2017, 53, 5920–5942.
- James, R.G.; Emenheiser, J.; Crutchfield, J.P. Unique information via dependency constraints. J. Phys. A Math. Theor. 2018, 52, 014002.
- Finn, C.; Lizier, J.T. Pointwise Partial Information Decomposition Using the Specificity and Ambiguity Lattices. Entropy 2018, 20, 297.
- Ince, R.A.A. Measuring Multivariate Redundant Information with Pointwise Common Change in Surprisal. Entropy 2017, 19, 318.
- Rosas, F.E.; Mediano, P.A.M.; Rassouli, B.; Barrett, A.B. An operational information decomposition via synergistic disclosure. J. Phys. A Math. Theor. 2020, 53, 485001.
- Kolchinsky, A. A Novel Approach to the Partial Information Decomposition. Entropy 2022, 24, 403.
- Bertschinger, N.; Rauh, J. The Blackwell relation defines no lattice. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014; pp. 2479–2483.
- Lizier, J.T.; Flecker, B.; Williams, P.L. Towards a synergy-based approach to measuring information modification. In Proceedings of the 2013 IEEE Symposium on Artificial Life (ALife), Singapore, 16–19 April 2013; pp. 43–51.
- Knuth, K.H. Lattices and Their Consistent Quantification. Ann. Phys. 2019, 531, 1700370.
- Mages, T.; Rohner, C. Decomposing and Tracing Mutual Information by Quantifying Reachable Decision Regions. Entropy 2023, 25, 1014.
- Blackwell, D. Equivalent comparisons of experiments. Ann. Math. Stat. 1953, 24, 265–272.
- Neyman, J.; Pearson, E.S. IX. On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. R. Soc. Lond. Ser. A Contain. Pap. Math. Phys. Character 1933, 231, 289–337.
- Chicharro, D.; Panzeri, S. Synergy and Redundancy in Dual Decompositions of Mutual Information Gain and Information Loss. Entropy 2017, 19, 71.
- Csiszár, I. On information-type measure of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar. 1967, 2, 299–318.
- Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, Berkeley, CA, USA, 20–30 July 1960; University of California Press: Berkeley, CA, USA, 1961; pp. 547–562.
- Sason, I.; Verdú, S. f-Divergence Inequalities. IEEE Trans. Inf. Theory 2016, 62, 5973–6006.
- Kailath, T. The divergence and Bhattacharyya distance measures in signal selection. IEEE Trans. Commun. Technol. 1967, 15, 52–60.
- Arikan, E. Channel Polarization: A Method for Constructing Capacity-Achieving Codes for Symmetric Binary-Input Memoryless Channels. IEEE Trans. Inf. Theory 2009, 55, 3051–3073.
- Bhattacharyya, A. On a measure of divergence between two statistical populations defined by their probability distribution. Bull. Calcutta Math. Soc. 1943, 35, 99–110.
- Mages, T.; Anastasiadi, E.; Rohner, C. Implementation: PID Blackwell Specific Information. 2024. Available online: https://github.com/uu-core/pid-blackwell-specific-information (accessed on 15 March 2024).
- Cardenas, A.; Baras, J.; Seamon, K. A framework for the evaluation of intrusion detection systems. In Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P’06), Berkeley, CA, USA, 21–24 May 2006; pp. 15–77.
- Rauh, J.; Bertschinger, N.; Olbrich, E.; Jost, J. Reconsidering unique information: Towards a multivariate information decomposition. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014; pp. 2232–2236.
- Bossomaier, T.; Barnett, L.; Harré, M.; Lizier, J.T. An Introduction to Transfer Entropy; Springer International Publishing: Cham, Switzerland, 2016.
 | Redundancy Order | Synergy Order
---|---|---
Ordering/equivalence | ≼ / ≃ | ⪯ / ≅
Join/meet | ⋎ / ⋏ | ∨ / ∧
Up-set/strict up-set | |
Down-set/strict down-set | |
Cover-set | |
Top/bottom | |
Notation | Name | Generator Function
---|---|---
KL | Kullback–Leibler (KL) divergence | f(x) = x log x
TV | Total Variation (TV) | f(x) = \|x − 1\| / 2
χ² | χ²-divergence | f(x) = (x − 1)²
H² | Squared Hellinger distance | f(x) = (1 − √x)²
LC | Le Cam distance | f(x) = (1 − x)² / (2x + 2)
JS | Jensen–Shannon divergence | f(x) = x log(2x / (x + 1)) + log(2 / (x + 1))
H_α | Hellinger divergence with α ∈ (0, 1) ∪ (1, ∞) | f(x) = (x^α − 1) / (α − 1)
α | α-divergence with α ∈ ℝ ∖ {0, 1} | f(x) = (x^α − αx − (1 − α)) / (α(α − 1))