Article

MAXENT3D_PID: An Estimator for the Maximum-Entropy Trivariate Partial Information Decomposition

by Abdullah Makkeh 1,*,†, Daniel Chicharro 2,†,‡, Dirk Oliver Theis 1,† and Raul Vicente 1,†
1 Institute of Computer Science, University of Tartu, 51014 Tartu, Estonia
2 Neural Computation Laboratory, Center for Neuroscience and Cognitive Systems@UniTn, Istituto Italiano di Tecnologia, 38068 Rovereto (TN), Italy
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
‡ Current address: Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA.
Entropy 2019, 21(9), 862; https://doi.org/10.3390/e21090862
Submission received: 28 June 2019 / Revised: 26 August 2019 / Accepted: 27 August 2019 / Published: 3 September 2019
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract:
Partial information decomposition (PID) separates the contributions of sources about a target into unique, redundant, and synergistic components of information. In essence, PID answers the question of “who knows what” of a system of random variables and hence has applications to a wide spectrum of fields ranging from social to biological sciences. The paper presents MaxEnt3D_Pid, an algorithm that computes the PID of three sources, based on a recently-proposed maximum entropy measure, using convex optimization (cone programming). We describe the algorithm and its associated software utilization and report the results of various experiments assessing its accuracy. Moreover, the paper shows that a hierarchy of bivariate and trivariate PID allows obtaining the finer quantities of the trivariate partial information measure.

1. Introduction: Motivation and Significance

The characterization of dependencies within complex multivariate systems helps to identify the mechanisms operating in the system and to understand their function. Recent work has developed methods to characterize multivariate interactions by separating n-variate dependencies for different orders n [1,2,3,4,5]. In particular, the work of Williams and Beer [6,7] introduced a framework, called partial information decomposition (PID), which quantifies whether different input variables provide redundant, unique, or synergistic information about an output variable when combined with other input variables. Intuitively, inputs are redundant if each individually carries information about the same aspects of the output. Information is unique if it is not carried by any other single variable (or group of variables), and synergistic information can only be retrieved by combining several inputs.
This information-theoretic approach to studying interactions has found many applications to complex systems such as gene networks (e.g., [8,9,10]), interactive agents (e.g., [11,12,13,14]), or neural processing (e.g., [15,16,17]). More generally, the nature of the information contained in the inputs determines the complexity of extracting it [18,19], how robust it is to disruptions of the system [20], or how the input dimensionality can be reduced without information loss [21,22].
Despite this great potential, the applicability of the PID framework has been hindered by the lack of agreement on the definition of a suitable measure of redundancy. In particular, Harder et al. [23] indicated that the original measure proposed by [6] only quantifies common amounts of information, instead of shared information that is qualitatively the same. A constellation of measures has been proposed to implement the PID (e.g., [23,24,25,26,27,28,29]), and core properties, such as requiring nonnegativity as a property of the measures, are still the subject of debate [29,30,31,32].
A widespread application of the PID framework has also been limited by the lack of multivariate implementations. Some of the proposed measures were only defined for the bivariate case [23,24,33]. Other multivariate measures allow negative components in the PID [26,29], which, although it may be adequate for a statistical characterization of dependencies, limits the interpretation of the information-theoretic quantities in terms of information communication [34]. At the level of local information, negativity can be regarded as misinformation and interpreted, for example, operationally in terms of changes in belief [35]. However, when information is considered in the context of communication [36], interpreting it as the number of messages that can be retrieved without error through a noisy channel requires nonnegativity; this is the case, for example, when assessing how information about a multidimensional sensory stimulus is represented across neurons, in particular in analyses of the information content of neural responses [17,37]. Among the PID measures proposed, the maximum entropy measures of Bertschinger et al. [24] have a preeminent role in the bivariate case because they provide bounds for any other measure consistent with a set of properties shared by many of the proposed measures. Motivated by this special role of the maximum entropy measures, Chicharro [38] extended the maximum entropy approach to measures of multivariate redundant information, which provide analogous bounds for the multivariate case. However, the work in [38] did not address their numerical implementation.
In this work, we present MaxEnt3D_Pid, a Python module that computes a trivariate information decomposition following the maximum entropy PID of [38] and exploits the connection with the bivariate decompositions associated with the trivariate ones [28]. This is, to our knowledge, the first available implementation of the maximum-entropy PID framework beyond the bivariate case [39,40,41,42], see Appendix B. This implementation is relevant for the theoretical development and practical use of the PID framework.
From a theoretical point of view, this implementation will provide the possibility to test the properties of the PID beyond the bivariate case. This is critical with regard to the nonnegativity property because, while nonnegativity is guaranteed in the bivariate case, for the multivariate case, it has been proven that negative terms can appear in the presence of deterministic dependencies [30,32,43]. However, the violation of nonnegativity has only been proven with isolated counterexamples, and it is not understood which properties of a system’s dependencies lead to negative PID measures.
From a practical point of view, the trivariate PID allows studying new types of distributed information that only appear beyond the bivariate case, such as information that is redundant for two inputs and unique with respect to a third [6]. This extension is significant both to study multivariate systems directly, as well as to be exploited for data analysis [21,44]. As mentioned above, the characterization of synergy and redundancy in multivariate systems is relevant for a broad range of fields that encompass social and biological systems. So far, the PID has particularly found applications in neuroscience (e.g., [17,37,45,46,47,48]). For data analysis, the quantification of multivariate redundancy can be applied to dimensionality reduction [22] or to better understand how representations emerge in neural networks during learning [49,50]. Altogether, this software promises to contribute significantly to the refinement of the information-theoretic tools it implements and also to foster its widespread application to analyze data from multivariate systems.

2. Models and Software

The section starts by briefly describing the mathematical model of the problem. It then discusses the architecture of MaxEnt3D_Pid and closes by explaining in detail how to use the software.

2.1. Maximum Entropy Decomposition Measure

Consider X, Y, and Z as the sources and T as the target of some system. Let P be the joint distribution of (T, X, Y, Z) and MI(T; S) be the mutual information of T and S, where S is any nonempty subset of (X, Y, Z). The PID decomposes MI(T; X, Y, Z) into finer parts, namely synergistic, unique, unique redundant, and redundant information. These finer parts respect certain identities [6], e.g., a subset of them sums up to MI(T; X) (all identities are explained in Appendix A and Appendix C). Following the maximum entropy approach [24], to obtain this decomposition, it is necessary to solve the following optimization problems:
min_{Q ∈ Δ_P} MI(T; X, Y, Z)        (1a)
min_{Q ∈ Δ_P} MI(T; X_1, X_2) for X_1, X_2 ∈ {X, Y, Z}        (1b)
where:
Δ_P = {Q ∈ Δ : Q(T, X) = P(T, X), Q(T, Y) = P(T, Y), Q(T, Z) = P(T, Z)}
and Δ is the set of all joint distributions of (T, X, Y, Z). The four minimization problems in Equation (1a,b) can be formulated as exponential cone programs, a special case of convex optimization. We refer the reader to [41] for a brief introduction to cone programs, in particular exponential cone programs. The full details of how to formulate (1a,b) as exponential cone programs and their convergence properties are given in [51] (Chapter 5).
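To make problem (1a) concrete, the following is a minimal sketch for tiny alphabets using a generic nonlinear solver rather than the exponential cone formulation that MaxEnt3D_Pid actually employs; the function name and calling convention here are illustrative only and not part of the package.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize

def min_mi_over_delta_p(P, dims):
    """Sketch of problem (1a): minimize MI(T; X, Y, Z) over Delta_P for small alphabets."""
    nt, nx, ny, nz = dims
    support = list(product(range(nt), range(nx), range(ny), range(nz)))
    p = np.array([P.get(k, 0.0) for k in support])
    P4 = p.reshape(dims)

    def mi(q):
        Q = np.clip(q, 1e-12, None).reshape(dims)
        Qt = Q.sum(axis=(1, 2, 3), keepdims=True)    # Q(t), fixed by the marginal constraints
        Qxyz = Q.sum(axis=0, keepdims=True)          # Q(x, y, z)
        return float((Q * np.log2(Q / (Qt * Qxyz))).sum())

    # Delta_P: the (T, X), (T, Y), and (T, Z) marginals of Q must equal those of P.
    cons = [{'type': 'eq',
             'fun': lambda q, a=axes, m=P4.sum(axis=axes):
                    (q.reshape(dims).sum(axis=a) - m).ravel()}
            for axes in ((2, 3), (1, 3), (1, 2))]

    res = minimize(mi, p, bounds=[(0.0, 1.0)] * p.size, constraints=cons, method='SLSQP')
    return res.fun    # estimate of min MI(T; X, Y, Z) over Delta_P
```

Such a generic formulation scales poorly with the alphabet sizes; the exponential cone programs solved through ECOS are what make the package practical for larger systems.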
MaxEnt3D_Pid on its own returns the synergistic information and the unique information collectively. In addition, with the help of the bivariate solver [39] (used in a specific way), the finer synergistic and unique information can also be extracted. Hence, the presented model obtains all the trivariate PID quantities. The full details for recovering the finer parts can be found in Appendix C and Appendix D.

2.2. Software Architecture and Functionality

MaxEnt3D_Pid is implemented in standard Python. The module uses the optimization software ECOS [52] to solve the several optimization problems needed to compute the trivariate PID. To install the module, first install the ECOS Python package and then download the files MAXENT3D_PID.py, TRIVARIATE_SYN.py, TRIVARIATE_UNQ.py, and TRIVARIATE_QP.py from the GitHub repository [53].
MaxEnt3D_Pid has two Python classes, Solve_w_ECOS and QP. Class Solve_w_ECOS receives the marginal distributions of (T, X), (T, Y), and (T, Z) as Python dictionaries. These distributions are used by the Solve_w_ECOS subclasses Opt_I and Opt_II to solve the optimization problems of Equations (1a) and (1b), respectively. The class QP is used to recover the solution of any of the optimization problems of Equation (1a,b) when Solve_w_ECOS fails to obtain a solution of good quality. Figure 1 gives an overview of how these two classes interact.

2.2.1. The Subclasses Opt_I and Opt_II

The subclasses Opt_I and Opt_II formulate the problems of Equation (1a,b), use ECOS to obtain the optimal values, and compute the violations of the optimality certificates. They return the optimal values and their optimality violations. These violations are quality measures of the obtained PID. Figure 1 describes this process within the class Solve_w_ECOS. Note that both subclasses Opt_I and Opt_II optimize conditional entropy functionals; however, the different number of arguments leads to differences in how the problems are fit into the cone program and how the optimal solution is retrieved, hence the need to split them into separate classes.

2.2.2. The Class QP

Class QP acts when Solve_w_ECOS returns values for a subset of the problems in Equation (1a,b) with high optimality violations. It improves the errant values by best-fitting them using quadratic programming, such that the PID identities (A12) are respected.
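The following is a hedged sketch of the idea behind this fitting step, assuming the identities are written as a linear system A q = b and that only equality constraints are enforced; the actual class solves its quadratic program through ECOS, and the names used here are illustrative.

```python
import numpy as np

def fit_to_identities(q0, A, b):
    """Project the estimates q0 onto the affine subspace A q = b (the PID identities),
    minimizing the squared correction ||q - q0||^2 via the KKT system.
    Assumes A has full row rank."""
    n, m = len(q0), A.shape[0]
    K = np.block([[np.eye(n), A.T],
                  [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([q0, b]))
    return sol[:n]    # corrected values satisfying the identities exactly
```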

2.3. Using MaxEnt3D_Pid

The process of computing the PID is packed in the function pid(). This function takes as input the distribution P of ( T , X , Y , Z ) via a Python dictionary where the tuples ( t , x , y , z ) are keys and their associated probability P ( t , x , y , z ) is the value of the key; see Figure 2. The function formulates and solves the problems of (1a,b) using Solve_w_ECOS and, if needed, uses QP to improve the solution. This function pid() returns a Python dictionary, explained in Table 1 and Table 2, containing the PID of ( T , X , Y , Z ) in addition to the optimality violations.
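As a minimal usage sketch (assuming pid() is imported from MAXENT3D_PID as in Figure 2; the distribution constructed below is only an example), the module can be called as follows:

```python
from itertools import product
import MAXENT3D_PID as m3d   # MAXENT3D_PID.py and its companion files must be on the path

# Example input: T copies the pair (X, Y) and ignores Z, with uniform binary inputs.
pdf = {}
for x, y, z in product((0, 1), repeat=3):
    pdf[((x, y), x, y, z)] = 1.0 / 8

returned = m3d.pid(pdf)   # dictionary with the trivariate PID and the optimality violations
print(returned)
```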
The function pid() has three other optional inputs. The first optional input is called parallel (the default value is parallel=’off’), which determines whether the computation is parallelized. If parallel=’off’, the computation is carried out sequentially, i.e., the four problems of Equation (1a,b) are formulated and solved one after the other, their optimality violations are computed consecutively, and then the final results are obtained. When parallel=’on’, the formulation of the four problems of Equation (1a,b) is done in parallel, the four problems are solved simultaneously, and finally the optimality violations along with the final results are computed in parallel. Thus, when parallel=’on’, there are three sequential steps: formulating the problems, solving them, and obtaining the final results, as opposed to parallel=’off’, which requires at least twelve sequential steps.
The second optional input is a dictionary that allows the user to tune the tolerances controlling the optimization routines of ECOS listed in Table 3.
In this dictionary, the user only sets the parameters that should be tuned. For example, to achieve high accuracy, the parameters abstol and reltol should be small (e.g., 10^{-12}) and the parameter max_iter should be high (e.g., 1000); Figure 3 shows how to modify the parameters. In this case, the solver will take longer to return the solution. For further details about parameter tuning, see [41].
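A sketch of such a call, continuing the snippet in Section 2.3, is shown below; the parameter names abstol, reltol, and max_iter come from Table 3, while the keyword used to pass the dictionary is a placeholder (pass it as shown in Figure 3).

```python
# Tighter tolerances and more interior-point iterations for ECOS (values are examples).
ecos_params = {'abstol': 1e-12,    # absolute tolerance on the duality gap
               'reltol': 1e-12,    # relative tolerance on the duality gap
               'max_iter': 1000}   # maximum number of ECOS iterations

# Hypothetical call combining the three optional inputs described in this section;
# 'solver_params' is a placeholder keyword, the actual argument is passed as in Figure 3.
returned = m3d.pid(pdf, parallel='on', output=1, solver_params=ecos_params)
```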
The third optional input is called output, and it controls what pid() will print on the user’s screen. This optional input is explained in Table 4.
Table 4. Description of the printing modes in the function pid().
  • 0 (default), Simple Mode: pid() prints its output (Python dictionary).
  • 1, Time Mode: In addition to what is printed when output=0, pid() prints a flag when it starts preparing the optimization problems in Equation (1a,b), the total time to create each problem, a flag when it calls ECOS, brief stats from ECOS for each problem after solving it (Figure 4), the total time for retrieving the results, the total time for computing the optimality violations, and the total time to store the results.
  • 2, Detailed Time Mode: In addition to what is printed when output=0, pid() prints, for each problem, the time of each major step of creating the model, brief stats from ECOS for each problem after solving it, the total time of each function used for retrieving the results, the time of each major step used to compute the optimality violations, the time of each function used to obtain the final results, and the total time to store the results.
  • 3, Detailed Optimization Mode: In addition to what is printed when output=1, pid() prints the detailed ECOS stats for each problem after solving it (Figure 5).

3. Illustrations

This section reports performance tests of MaxEnt3D_Pid on three types of instances. We describe each type of instance and show the corresponding results. The first two types, paradigmatic gates and Copy gates, are used as validation and memory tests. The last type, random probability distributions, is used to evaluate the accuracy and efficiency of MaxEnt3D_Pid in computing the trivariate partial information decomposition. More precisely, accuracy is evaluated as how close the values of UI(T; X ∖ Y, Z) and UI(T; Y ∖ X, Z) are to zero when Z has a considerably higher dimension, which is expected theoretically. Efficiency is reflected in how fast MaxEnt3D_Pid produces the results. The machine used has an Intel(R) Core(TM) i7-4790K CPU (four cores) and 16 GB of RAM. Only the computations for the last type were done using parallelization.

3.1. Paradigmatic Gates

As a first test, we used some trivariate PIDs that are known and have been studied previously [25]. These examples are the logic gates collected in Table 5. For these examples, the decomposition can be derived analytically, and thus, they serve to check the numerical estimations.
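As an illustration, the sketch below constructs one such gate, the trivariate XOR, T = X ⊕ Y ⊕ Z with uniform binary inputs (assuming it is among the gates of Table 5), whose decomposition is known analytically and can be compared with the numerical output; pid() is imported as in Section 2.3.

```python
from itertools import product
import MAXENT3D_PID as m3d

xor_pdf = {(x ^ y ^ z, x, y, z): 1.0 / 8 for x, y, z in product((0, 1), repeat=3)}
print(m3d.pid(xor_pdf))   # for this gate the whole bit of MI(T; X, Y, Z) is synergistic
```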

Testing

The test was implemented in test_gates.py. MaxEnt3D_Pid returns, for all gates, the same values as ([25], Table 1) up to a precision error of order 10^{-9}. The slowest solving time (not in parallel) was one millisecond.

3.2. Copy Gate

As a second test, we used the Copy gate example to examine the simulation of large systems. We simulated how the solver handled large systems in terms of speed and reliability. Reliability, in this context, is meant as the consistency of the measure on large systems and the degree to which the results can be trusted to be accurate enough.
The Copy gate is the mapping of (x, y, z), chosen uniformly at random, to (t, x, y, z), where t = (x, y, z). The size of the joint distribution of (T, X, Y, Z) scales as |X|^2 · |Y|^2 · |Z|^2, where (x, y, z) ∈ X × Y × Z. In our test, |X| = ℓ, |Y| = m, and |Z| = n, where ℓ, m, n ∈ {10, 20, …, 50}.
Since X, Y, and Z are independent, it is easy to see that the only nonzero quantities are UI(T; X_1 ∖ X_2, X_3) = H(X_1) for X_1, X_2, X_3 ∈ {X, Y, Z}.
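A sketch of how such an instance can be generated is given below; the helper name is hypothetical and not part of the package.

```python
from itertools import product

def copy_gate_pdf(l, m, n):
    """Joint distribution of the Copy gate: T = (X, Y, Z) with uniform independent inputs."""
    p = 1.0 / (l * m * n)
    return {((x, y, z), x, y, z): p for x, y, z in product(range(l), range(m), range(n))}

pdf = copy_gate_pdf(10, 10, 10)   # the smallest size tested here; |T| = 1000
# Only UI(T; X1 \ X2, X3) = H(X1) should be nonzero, e.g., H(X) = log2(10) bits.
```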

Testing

The test was implemented in test_copy_gate.py. The slowest solving time was less than 100 s, and the worst deviation from the actual values was 0.0001 % . For more details, see Table 6.

3.3. Random Probability Distributions

As a last example, we used joint distributions of (T, X, Y, Z) sampled uniformly at random over the probability space to test the accuracy of the solver. The sizes of T, X, and Y were fixed to two, whereas |Z| varied in {2, …, 14}. For each |Z|, 500 joint distributions of (T, X, Y, Z) were sampled.
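The sampling can be sketched as follows, drawing each joint distribution uniformly over the probability simplex via a flat Dirichlet; the helper name is hypothetical.

```python
import numpy as np
from itertools import product

def random_pdf(nz, rng):
    """One joint distribution of (T, X, Y, Z) with |T| = |X| = |Y| = 2 and |Z| = nz,
    drawn uniformly at random over the probability simplex."""
    shape = (2, 2, 2, nz)
    probs = rng.dirichlet(np.ones(int(np.prod(shape))))   # flat Dirichlet = uniform on the simplex
    keys = product(*(range(s) for s in shape))
    return dict(zip(keys, probs))

rng = np.random.default_rng(0)
pdfs = {nz: [random_pdf(nz, rng) for _ in range(500)] for nz in range(2, 15)}
```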

Testing

As |Z| increased, the average values of UI(T; X ∖ Y, Z) and of UI(T; Y ∖ X, Z) decreased, while that of UI(T; Z ∖ X, Y) increased. In Figure 6, the accuracy of the optimization is reflected in the low divergence from zero obtained for the unique information UI(T; X ∖ Y, Z) and UI(T; Y ∖ X, Z). In Figure 7, the running time shows a roughly constant trend, and the highest time recorded was 0.8 s.

3.4. Challenging Distributions

We also tested MaxEnt3D_Pid on uniformly randomly-sampled distributions, but with large sizes of T, X, Y, and Z. For each m, 500 joint distributions of (T, X, Y, Z) were sampled, where |T| = |X| = |Y| = |Z| = m and 2 ≤ m ≤ 19. The idea was to check, with large random distributions (not structured as in the case of the Copy gate), how stable the estimator is.

3.4.1. Testing

For m ≥ 5, some of the optimization problems (1a,b) did not converge due to numerical instabilities. This issue became frequent and significant for m ≥ 14; for example, 5% of the distributions had numerical problems in some of their optimization problems. We noticed that the solution returned by a non-convergent problem was feasible and at most a factor of 100 away from optimal. The feasibility of the returned solution suggested fitting it, along with the (optimal) solutions returned by the convergent problems, into the system of PID identities (A12), which reduces the optimality gap.

3.4.2. Recommendation

These challenging distributions have two main features, namely the high dimensionality of the quadruple (T, X, Y, Z) and a significant number of relatively small (almost null) probability masses along with a few concentrated probability masses. We suspect that these two features combined were the main reason for the convergence problems. Our approach was to use quadratic programming (class QP), which focuses on reducing the optimality gap and thus returns a PID close to the one that would be obtained in the absence of convergence problems.
Furthermore, we advise users to mitigate such distributions by dropping some of the points with almost null probability masses. Since the objective functions in (1a,b) are continuous and smooth (for full-support distributions) on Δ_P, the PID of the mitigated distribution is a good approximation of that of the original distribution. Although we did not test this ad hoc on MaxEnt3D_Pid, the same technique was applied to such instances for Broja_2PID ([51], Chapter 5).
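A sketch of this mitigation is given below (the helper is hypothetical); the threshold should be chosen relative to the scale of the smallest meaningful probability masses.

```python
def prune_pdf(pdf, eps=1e-10):
    """Drop near-null probability masses and renormalize before calling pid()."""
    kept = {k: v for k, v in pdf.items() if v > eps}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}
```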
We expect that for m ≥ 50 the solver will suffer severe numerical instabilities. We therefore recommend that users avoid overly fine discrete binning, which results in very large distributions.

3.4.3. Time Complexity

Theoretically, Makkeh et al. [39,51] showed that the worst-case time complexity for solving (1a) (the computationally hardest problem) is O(N^{3/2} log N), where N = |T × X × Y × Z|. Note that this complexity bound holds for the so-called barrier method, whereas ECOS uses the primal-dual Mehrotra predictor-corrector method [54], which does not have a theoretical complexity bound [55].

4. Summary and Discussion

In this work, we presented MaxEnt3D_Pid, a Python module that computes a trivariate decomposition based on the partial information decomposition (PID) framework of Williams and Beer [6], in particular following the maximum entropy PID of [38] and exploiting the connection with the bivariate decompositions associated with the trivariate one [28]. This is, to our knowledge, the first available implementation extending the maximum-entropy PID framework beyond the bivariate case [39,40,41,42].
The PID framework allows decomposing the information that a group of input variables has about a target variable into redundant, unique, and synergistic components. For the bivariate case, this results in decomposition with four components, quantifying the redundancy, synergy, and unique information of each of the two inputs. In the multivariate case, finer parts appear, which do not correspond to purely redundant or unique components. For example, the redundancy components of the multivariate decomposition can be interpreted based on local unfoldings when a new input is added, with each redundancy component unfolding into a component also redundant with the new variable and a component of unique redundancy with respect to it [38]. The PID analysis can qualitatively characterize the distribution of information beyond the standard mutual information measures [56] and has already been proven useful to study information in multivariate systems (e.g., [14,17,37,56,57,58,59,60,61,62]).
However, the definition of suited measures to quantify synergy and redundancy is still a subject of debate. From all the proposed PID measures, the maximum entropy measures by Bertschinger et al. [24] have a preeminent role in the bivariate case because they provide bounds to any other alternative measures that share fundamental properties related to the notions of redundancy and unique information. Chicharro [38] generalized the maximum entropy approach, proposing multivariate definitions of redundant information and showing that these measures implement the local unfolding of redundancy via hierarchically-related maximum entropy constraints. The package MaxEnt3D_Pid efficiently implemented the constrained information minimization operations involved in the calculation of the trivariate maximum-entropy PID decomposition. In Section 2, we described the architecture of the software, presented in detail the main function of the software that computes the PID along with its optional inputs, and described how to use it. In Section 3, we provided examples that verified that the software produced correct results on paradigmatic gates, simulated how the software scaled with large systems, and hinted to the accuracy of the software in estimating PID. In this section, we also presented challenging examples where the MaxEnt3D_PID core optimizer had convergence problems and discussed our technique to retrieve an approximate PID and some suggestions to avoid such anomalies.
The possibility to calculate a trivariate decomposition of the mutual information represents a qualitative extension of the PID framework that goes beyond an incremental extension of the bivariate case, both regarding its theoretical development and its applicability. From a theoretical point of view, regarding the maximum-entropy approach, the multivariate case requires the introduction of new types of constraints in the information minimization that do not appear in the bivariate case (Section 2 and [38]). More generally, the trivariate decomposition allows further studying one of the key unsolved issues in the PID formulation, namely the requirement of the nonnegativity of the PID measures in the multivariate case.
In particular, Harder et al. [23] indicated that the original measure proposed by [6] only quantified common amounts of information and required new properties for the PID measures, to quantify qualitatively and not quantitatively how information is distributed. However, for the multivariate case, these properties have been proven to be incompatible with guaranteeing nonnegativity, by using some counterexamples [30,32,43]. This led some subsequent proposals to define PID measures that either focus on the bivariate case [23,24] or do not require nonnegativity [26,29]. A multivariate formulation was desirable because the notions of synergy and redundancy are not restrained to the bivariate case, while nonnegativity is required for an interpretation of the measures in terms of information communication [34] and not only as a statistical description of the probability distributions. MaxEnt3D_Pid will allow systematically exploring when negative terms appear, beyond the currently-studied isolated counterexamples. Furthermore, it has been shown that in those counterexamples, the negative terms result from the criterion used to assign the information identity to different pieces of information when deterministic relations exist [32]. Therefore, a systematic analysis of the appearance of negative terms will provide a better understanding of how information identity is assigned when quantifying redundancy, which is fundamental to assess how the PID measures conform to the corresponding underlying concepts.
From a practical point of view, the trivariate decomposition allows studying qualitatively new types of distributed information, identifying finer parts of the information that the inputs have about the target, such as information that is redundant for two inputs and unique with respect to a third [6]. This is particularly useful when examining multivariate representations, such as the interactions between several genes [8,63] or characterizing the nature of coding in neural populations [64,65]. Furthermore, exploiting the connection between the bivariates and the trivariate decomposition due to the invariance of redundancy to context [28], MaxEnt3D_Pid also allows estimating the finer parts of the synergy component (Appendix D). This also offers a substantial extension in the applicability of the PID framework, in particular for the study of dynamical systems [66,67]. In particular, a question that requires a trivariate decomposition is how information transfer is distributed among multivariate dynamic processes. Information transfer is commonly quantified with the measure called transfer entropy [68,69,70,71,72], which calculates the conditional mutual information between the current state of a certain process Y and the past of another process X, given the past of Y and of any other processes Z that may also influence those two. In this case, by construction, the PID analysis should operate with three inputs corresponding to the pasts of X, Y, and Z. Transfer entropy is widely applied to study information flows between brain areas to characterize dynamic functional connectivity [73,74,75], and characterizing the synergy, redundancy, and unique information of these flows can provide further information about the degree of integration or segregation across brain areas [76].
More generally, the availability of software implementing the maximum entropy PID framework beyond the bivariate case promises to be useful in a wide range of fields in which interactions in multivariate systems are relevant, spanning the domain of social [12,77] and biological sciences [3,10,17,63]. Furthermore, the PID measures can also be used as a tool for data analysis and to characterize computational models. This comprises dimensionality reduction via synergy or redundancy minimization [19,22], the study of generative networks that emerge from information maximization constraints [78,79], or explaining the representations in deep networks [50].
The MaxEnt3D_Pid package presents several differences and advantages with respect to other software packages currently available to implement the PID framework. Regarding the maximum entropy approach, other packages only compute bivariate decompositions [39,40,41,42]. The dit package [42] also implements several other PID measures, including bivariate implementations for the measure of [23,27]. Among the multivariate decompositions, the ones using the measures I m i n  [6] or I M M I  [80] can readily be calculated with standard estimators of the mutual information. However, the former, as discussed above, only quantifies common amounts of information, while the latter is only valid for a certain type of data, namely multivariate Gaussian distributed. Software to estimate multivariate pointwise PIDs is also available [26,29,81]. However, as mentioned above, these measures by construction allow negative components, which may not be desirable for the interpretation of the decomposition, for example in the context of communication theory, and limits their applicability for data analysis in such regimes [22]. Altogether, MaxEnt3D_Pid is the first software that implements the mutual information PID framework via hierarchically-related maximum entropy constraints, extending the bivariate case by efficiently computing the trivariate PID measures.

Author Contributions

Conceptualization, A.M., D.C., D.O.T., and R.V.; formal analysis, A.M. and D.C.; funding acquisition, D.C., D.O.T., and R.V.; investigation, A.M. and D.O.T.; methodology, A.M.; project administration, R.V.; software, A.M.; supervision, D.O.T.; validation, R.V.; visualization, A.M.; writing, original draft, A.M. and D.C.; writing, review and editing, A.M., D.C., D.O.T., and R.V.

Funding

This research was supported by the Estonian Research Council, ETAG (Eesti Teadusagentuur), through the PUT Exploratory Grant #620. D.C. was supported by the Fondation Bertarelli. R.V. also thanks the financial support from ETAG through the personal research grant PUT1476. We also gratefully acknowledge funding by the European Regional Development Fund through the Estonian Center of Excellence in IT, EXCITE.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

Appendix A. Williams–Beer PID Framework

In order to decompose MI(T; S), where T is the target and S are the sources, Williams and Beer [6] defined a set of axioms leading to what is known as the redundancy lattice (Figure A1). These axioms and this lattice form the framework for partial information decomposition (PID) upon which all the existing definitions of PID are formulated.

Appendix A.1. Williams–Beer Axioms

Suppose that a source A is a subset of S and a collection α is a set of sources. A shorthand notation inspired by [38] will be used to represent collections of sources; for example, if the system is (T, X, Y, Z), then the collection of sources {{X, Y}, {X, Z}} will be denoted as XY.XZ. Williams and Beer [6] defined the following axioms that redundancy should comply with:
  • Symmetry (S): MI ( T ; α ) is invariant with respect to the order of the sources in the collection.
  • Self-redundancy (SR): The redundancy of a collection formed by a single source is equal to the mutual information of that source.
  • Monotonicity (M): Adding sources to a collection can only decrease the redundancy of the resulting collection, and redundancy is kept constant when adding a superset of any of the existing sources.

Appendix A.2. The Redundancy Lattice

Williams and Beer [6] defined a lattice formed from the collections of sources. They used (M) to define the partial ordering between the collections. The axiom (S) reflects the fact that each atom of the lattice will represent a partial information decomposition quantity. More importantly, not all the collections of sources will be considered as atoms since adding a superset of any source to the examined system does not change redundancy, i.e., (M). The set of collections of sources included in the lattice which will form its atoms is defined as:
A(S) = {α ⊆ P(S) ∖ {∅}, α ≠ ∅ : ∀ A_i, A_j ∈ α, A_i ⊄ A_j},
where P ( S ) is the power set of S . For this set of collections (atoms), the partial ordering relation that constructs the redundancy lattice is:
∀ α, β ∈ A(S), (α ⪯ β ⇔ ∀ B ∈ β, ∃ A ∈ α, A ⊆ B),
i.e., for two collections α and β, α ⪯ β if, for each source in β, there is a source in α that is a subset of that source. In Figure A1, the bivariate and trivariate redundancy lattices are shown.
Figure A1. (A) Bivariate and (B) trivariate redundancy lattices. Letters indicate the mapping of terms between the lattices.
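As a brute-force illustration (not part of the package), the following sketch enumerates the collections in A(S) defined above and reproduces the 4 atoms of the bivariate lattice and the 18 atoms of the trivariate lattice shown in Figure A1.

```python
from itertools import combinations

def lattice_atoms(sources):
    """Collections of non-empty sources in which no source is a proper subset of another."""
    nonempty = [frozenset(c) for r in range(1, len(sources) + 1)
                for c in combinations(sources, r)]
    atoms = []
    for r in range(1, len(nonempty) + 1):
        for collection in combinations(nonempty, r):
            if all(not a < b and not b < a for a, b in combinations(collection, 2)):
                atoms.append(collection)
    return atoms

print(len(lattice_atoms("XY")), len(lattice_atoms("XYZ")))   # prints: 4 18
```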

Appendix A.3. Defining PID over the Redundancy Lattice

The mutual information decomposition was constructed in [6] by implicitly defining partial information measures δ C ( T ; α ) associated with each node α of the redundancy lattice C (Figure A1), such that the redundancy measures are obtained as:
MI(T; α) = Σ_{β ⪯ α} δ_C(T; β),
where the sum runs over the set of collections lower than or equal to α in the partial ordering, and hence reachable by descending from α in the lattice C.

Appendix B. Bivariate Partial Information Decomposition

Let T be the target random variable, X and Y be the two source random variables, and P be the joint probability distribution of ( T , X , Y ) . The PID captures the synergistic, unique, and redundant information as follows:
  • The synergistic information between X and Y about T , namely CI ( T ; X : Y ) .
  • The redundant information of X and Y about T , namely SI ( T ; X , Y ) .
  • The unique information of X about T, namely UI(T; X ∖ Y).
  • The unique information of Y about T, namely UI(T; Y ∖ X).
This decomposition, using the Williams–Beer axioms, yields these identities:
MI(T; X, Y) = CI(T; X:Y) + SI(T; X, Y) + UI(T; X ∖ Y) + UI(T; Y ∖ X)
MI(T; X_i) = SI(T; X_i, X_j) + UI(T; X_i ∖ X_j) for all X_i, X_j ∈ {X, Y}.
Given the generic structure of the PID framework, the work in [24] (BROJA) defined PID measures considering the following polytope:
Δ_P = {Q ∈ Δ : Q(T, X) = P(T, X), Q(T, Y) = P(T, Y)},
where Δ is the set of all joint distributions of ( T , X , Y ) . The work in [24] (BROJA) used the maximum entropy decomposition over Δ P in order to quantify the above quantities. Moreover, BROJA assumed that the following assumptions hold.
Assumption A1 (Lemma 3[24]). On the bivariate redundancy lattice (Figure A1), the following assumptions must hold to quantify the PID
  • All partial information measures of the redundancy lattice are nonnegative.
  • The terms δ ( T ; X . Y ) , δ ( T ; X ) , and δ ( T ; Y ) , are constant on Δ P .
  • The synergistic term, namely δ ( T , X Y ) , vanishes on Δ P upon minimizing the mutual information MI ( T ; X , Y ) .
Under the above assumptions and using maximal entropy decomposition, BROJA defined the following optimization problems that compute the PID quantities.
CI˜(T; X:Y) = MI(T; X, Y) − min_{Q ∈ Δ_P} MI(T; X, Y)        (A6a)
UI˜(T; X_i ∖ X_j) = min_{Q ∈ Δ_P} MI(T; X_i, X_j) − min_{Q ∈ Δ_P} MI(T; X_j) for all X_i, X_j ∈ {X, Y}        (A6b)
SI˜(T; X, Y) = max_{Q ∈ Δ_P} CoI(T; X; Y)        (A6c)
where CoI(T; X; Y) is the co-information of T, X, and Y, defined as MI(T; X) − MI(T; X ∣ Y). Note that [38] proved that (A6c) is equivalent to:
SI˜(T; X, Y) = min_{Q ∈ Δ_P, CoI(T; X; Y) = 0} MI(T; X, Y) − min_{Q ∈ Δ_P} MI(T; X, Y).

Appendix B.1. Mutual Information over the Bivariate Redundancy Lattice

This subsection writes down some mutual information quantities in terms of the partial information measures of the redundancy lattice, using (A3). These formulas will be used in the following subsection to verify that the measures defined in (A6a–c) quantify the desired partial information quantities. MI(T; X, Y) is the sum of the partial information measures over all nodes of the redundancy lattice C, as follows:
MI ( T ; X , Y ) = δ ( T , X Y ) + δ ( T , X ) + δ ( T , Y ) + δ ( T , X . Y ) .
The mutual information of one source and the target is expressed as:
MI ( T ; X i ) = δ ( T , X i ) + δ ( T , X i . X j ) for X i , X j { X , Y } .
The mutual information of one source and the target conditioned on knowing the other source is expressed as:
MI(T; X_i ∣ X_j) = δ(T, X_iX_j) + δ(T, X_i) for all X_i, X_j ∈ {X, Y}.
The co-information CoI ( T ; X ; Y ) is expressed as:
CoI(T; X; Y) = δ(T, X.Y) − δ(T, XY).

Appendix B.2. Verification of BROJA Optimization

This subsection verifies that the measures defined in (A6a–c) quantify the desired partial information quantities under the maximum entropy decomposition principle. Assumption A1 implies that min_{Q ∈ Δ_P} δ(T, XY) = 0, min_{Q ∈ Δ_P} δ(T, X.Y) = δ(T, X.Y), min_{Q ∈ Δ_P} δ(T, X) = δ(T, X), and min_{Q ∈ Δ_P} δ(T, Y) = δ(T, Y). Therefore, it is easy to see that:
CI˜(T; X:Y) = MI(T; X, Y) − min_{Q ∈ Δ_P} MI(T; X, Y) = δ(T, XY)
UI˜(T; X ∖ Y) = min_{Q ∈ Δ_P} MI(T; X, Y) − min_{Q ∈ Δ_P} MI(T; Y) = δ(T, X)
UI˜(T; Y ∖ X) = min_{Q ∈ Δ_P} MI(T; X, Y) − min_{Q ∈ Δ_P} MI(T; X) = δ(T, Y).
Now, CoI ( T ; X ; Y ) = 0 implies that δ ( T , X Y ) = δ ( T , X . Y ) ; thus:
SI˜(T; X, Y) = min_{Q ∈ Δ_P, CoI(T; X; Y) = 0} MI(T; X, Y) − min_{Q ∈ Δ_P} MI(T; X, Y) = δ(T, X.Y).
Hence, under Assumption A1,
CI˜(T; X:Y) = CI(T; X:Y), UI˜(T; X ∖ Y) = UI(T; X ∖ Y), UI˜(T; Y ∖ X) = UI(T; Y ∖ X), SI˜(T; X, Y) = SI(T; X, Y).

Appendix C. Maximum Entropy Decomposition of Trivariate PID

Let T be the target random variable, X, Y, Z be the source random variables, and P be the joint probability distribution of (T, X, Y, Z). Chicharro [38], using maximum entropy, decomposed the mutual information MI(T; X, Y, Z) into synergistic, unique, unique redundant, and redundant information. In this decomposition,
  • the synergistic quantity, CI ˜ ( T ; X , Y , Z ) , captures the sum of all individual synergistic terms, namely δ ( T ; X Y Z ) + δ ( T ; X Y ) + δ ( T ; X Z ) + δ ( T ; Y Z ) + δ ( T ; X Y . X Z ) + δ ( T ; X Y . Y Z ) + δ ( T ; X Z . Y Z ) + δ ( T ; X Y . X Z . Y Z ) ,
  • the unique information, UI ˜ ( T ; X i X j , X k ) , captures the sum of the information that X i has about T solely, δ ( T ; X i ) , and the information X i knows redundantly with the synergy of ( X j , X k ) , δ ( T ; X i . X j X k ) for all X i , X j , X k { X , Y , Z } ,
  • the unique redundant information, UI ˜ ( T ; X i , X j X k ) , captures the actual unique information that X i and X j have redundantly about T , δ ( T ; X i . X j ) for all X i , X j , X k { X , Y , Z } ,
  • and the redundant information, SI ˜ ( T ; X , Y , Z ) captures the actual redundant information of X , Y , and Z about T , i.e, δ ( T ; X . Y . Z ) .
Using the Williams–Beer axioms, the decomposition yields these identities:
MI(T; X, Y, Z) = CI˜(T; X, Y, Z) + SI˜(T; X; Y; Z) + UI˜(T; X ∖ Y, Z) + UI˜(T; Y ∖ X, Z) + UI˜(T; Z ∖ X, Y) + UI˜(T; X, Y ∖ Z) + UI˜(T; X, Z ∖ Y) + UI˜(T; Y, Z ∖ X)
MI(T; X_i) = SI˜(T; X_i; X_j; X_k) + UI˜(T; X_i ∖ X_j, X_k) + UI˜(T; X_i, X_j ∖ X_k) + UI˜(T; X_i, X_k ∖ X_j) for all X_i, X_j, X_k ∈ {X, Y, Z}.        (A12)
These measures are defined over the polytope Δ_P = {Q ∈ Δ : Q(T, X) = P(T, X), Q(T, Y) = P(T, Y), Q(T, Z) = P(T, Z)}, where Δ is the set of all joint distributions of (T, X, Y, Z). The measure uses the maximum entropy decomposition over Δ_P in order to compute the above quantities. Moreover, the work in [38] made some assumptions on the partial information measures of the redundancy lattice.
Assumption A2 (Assumptions a.1 and a.2 in [38]). On the trivariate redundancy lattice (Figure A1), the following assumptions are made to quantify the PID
  • All partial information measures of the redundancy lattice are nonnegative.
  • The terms δ ( T ; X . Y . Z ) and δ ( T ; X i . X j ) for all X i , X j { X , Y , Z } are invariant on Δ P .
  • The summands δ ( T ; X i ) + δ ( T ; X i . X j X k ) for all X i , X j , X k { X , Y , Z } are invariant on Δ P .
  • The terms δ ( T ; X Y Z ) , δ ( T ; X Y . X Z . Y Z ) , δ ( T ; X i X j ) , δ ( T ; X i X j . X i X k ) , δ ( T ; X i ) , and  δ ( T ; X i . X j X k ) for all X i , X j , X k { X , Y , Z } are not constant on Δ P .
  • All synergistic terms, δ ( T ; X Y Z ) , δ ( T ; X Y . X Z . Y Z ) , δ ( T ; X i X j ) , and δ ( T ; X i X j . X i X k ) for all X i , X j , X k { X , Y , Z } vanish at the minimum over Δ P .
  • The partial information measures δ ( T ; X i . X j X k ) for all X i , X j , X k { X , Y , Z } vanish at the minimum over Δ P .
Under the above assumptions and using maximal entropy decomposition, the work in [38] defined the following optimization problems that compute the PID quantities.
CI˜(T; X, Y, Z) = MI(T; X, Y, Z) − min_{Q ∈ Δ_P} MI(T; X, Y, Z)        (A13a)
UI˜(T; X_i ∖ X_j, X_k) = min_{Q ∈ Δ_P} MI(T; X_i, X_j, X_k) − min_{Q ∈ Δ_P} MI(T; X_j, X_k) for all X_i, X_j, X_k ∈ {X, Y, Z}        (A13b)
UI˜(T; X_i, X_j ∖ X_k) = min_{Q ∈ Δ_P, CoI(T; X_i; X_j ∣ X_k) = 0} MI(T; X_i, X_j, X_k) − min_{Q ∈ Δ_P} MI(T; X_i, X_j, X_k) for all X_i, X_j, X_k ∈ {X, Y, Z}        (A13c)
SI˜(T; X, Y, Z) = min_{Q ∈ Δ_P, CoI(T; X; Y) = 0, CoI(T; X; Y ∣ Z) = 0, w(Q)} MI(T; X, Y, Z) − min_{Q ∈ Δ_P, w(Q), CoI(T; X; Y ∣ Z) = 0} MI(T; X, Y, Z),        (A13d)
where:
w(Q) := {Q ∈ Δ : MI(T; X, Y) = min_{Q′ ∈ Δ_P} MI(T; X, Y), MI(T; X, Z) = min_{Q′ ∈ Δ_P} MI(T; X, Z), MI(T; Y, Z) = min_{Q′ ∈ Δ_P} MI(T; Y, Z)}.

Mutual Information over the Trivariate Redundancy Lattice

This subsection writes down some mutual information quantities in terms of the trivariate redundancy lattice’s partial information measures using (A3). The verification that the optimization defined in (A13a–d) quantifies the desired partial information quantities was discussed in detail by [38] and so will be skipped. However, these formulas are needed later when discussing how to compute the individual PID terms using a hierarchy of BROJA and [38] PID decompositions. The mutual information quantities are in terms of redundancy lattice partial information measures.
MI ( T ; X , Y , Z ) will be the sum of the partial information measure on every node of the redundancy lattice C as follows.
MI ( T ; X , Y , Z ) = δ ( T , X Y Z ) + δ ( T , X Y ) + δ ( T , X Z ) + δ ( T , Y Z ) + δ ( T , X Y . X Z ) + δ ( T , X Y . Y Z ) + δ ( T , X Z . Y Z ) + δ ( T , X Y . X Z . Y Z ) + δ ( T , X ) + δ ( T , Y ) + δ ( T , Z ) + δ ( T , X . Y Z ) + δ ( T , Y . X Z ) + δ ( T , Z . X Y ) + δ ( T , X . Y ) + δ ( T , X . Z ) + δ ( T , Y . Z ) + δ ( T , X . Y . Z ) .
For all X i , X j , X k { X , Y , Z } , the mutual information of two sources (jointly) and the target is expressed as:
MI ( T ; X i , X j ) = δ ( T , X i X j ) + δ ( T , X i X j . X i X k ) + δ ( T , X i X j . X j X k ) + δ ( T , X i X j . X i X k . X j X k ) + δ ( T , X i ) + δ ( T , X j ) + δ ( T , X i . X j X k ) + δ ( T , X j . X i X k ) + δ ( T , X k . X i X j ) + δ ( T , X i . X j ) + δ ( T , X i . X k ) + δ ( T , X j . X k ) + δ ( T , X i . X j . X k ) .
For all X i , X j , X k { X , Y , Z } , the mutual information of one source and the target is as follows:
MI ( T ; X i ) = δ ( T , X i ) + δ ( T , X i . X j X k ) + δ ( T , X i . X j ) + δ ( T , X i . X k ) + δ ( T , X i . X j . X k ) .
For all X i , X j , X k { X , Y , Z } , the mutual information of two sources (jointly) and the target conditioned on knowing the other source is evaluated as:
MI ( T ; X i , X j X k ) = δ ( T , X i X j X k ) + δ ( T , X i X j ) + δ ( T , X i X k ) + δ ( T , X j X k ) + δ ( T , X i X j . X i X k ) + δ ( T , X i X j . X j X k ) + δ ( T , X i X k . X j X k ) + δ ( T , X i X j . X i X k . X j X k ) + δ ( T , X i ) + δ ( T , X j ) + δ ( T , X i . X j X k ) + δ ( T , X j . X i X k ) + δ ( T , X i . X j ) .
For all X i , X j , X k { X , Y , Z } , the mutual information of one source and the target conditioned on knowing only one of the other sources is written as:
MI ( T ; X i X j ) = δ ( T , X i X j ) + δ ( T , X i X j . X i X k ) + δ ( T , X i X j . X j X k ) + δ ( T , X i X j . X i X k . X j X k ) + δ ( T , X i ) + δ ( T , X i . X j X k ) + δ ( T , X k . X i X j ) + δ ( T , X i . X k ) .
For all X i , X j , X k { X , Y , Z } , the mutual information of one source and the target conditioned on knowing the other sources is:
MI ( T ; X i X j , X k ) = δ ( T , X i X j X k ) + δ ( T , X i X j ) + δ ( T , X i X k ) + δ ( T , X i X j . X i X k ) + δ ( T , X i ) .
For all X i , X j , X k { X , Y , Z } , the co-information of two sources and the target is expressed as:
CoI(T; X_i; X_j) = δ(T, X_i.X_j) + δ(T, X_i.X_j.X_k) − (δ(T, X_iX_j) + δ(T, X_iX_j.X_iX_k) + δ(T, X_iX_j.X_jX_k) + δ(T, X_iX_j.X_iX_k.X_jX_k) + δ(T, X_k.X_iX_j)).
For all X i , X j , X k { X , Y , Z } , the co-information of one source, two sources (jointly), and the target is as follows:
CoI(T; X_i; X_j, X_k) = δ(T, X_i.X_jX_k) + δ(T, X_i.X_j) + δ(T, X_i.X_k) + δ(T, X_i.X_j.X_k) − (δ(T, X_iX_jX_k) + δ(T, X_iX_j) + δ(T, X_iX_k) + δ(T, X_iX_j.X_iX_k)).
For all X i , X j , X k { X , Y , Z } , the co-information of two sources (jointly), two sources (jointly), and the target is evaluated as:
CoI(T; X_i, X_j; X_i, X_k) = δ(T, X_iX_j.X_iX_k) + δ(T, X_iX_j.X_iX_k.X_jX_k) + δ(T, X_i) + δ(T, X_k.X_iX_j) + δ(T, X_i.X_j) + δ(T, X_i.X_jX_k) + δ(T, X_j.X_iX_k) + δ(T, X_i.X_k) + δ(T, X_j.X_k) + δ(T, X_i.X_j.X_k) − δ(T, X_iX_jX_k) − δ(T, X_jX_k).
For all X i , X j , X k { X , Y , Z } , the co-information of two sources and the target conditioning on knowing the other source can be written as:
CoI(T; X_i; X_j ∣ X_k) = δ(T, X_iX_k.X_jX_k) + δ(T, X_iX_j.X_iX_k.X_jX_k) + δ(T, X_i.X_jX_k) + δ(T, X_j.X_iX_k) + δ(T, X_i.X_j) − δ(T, X_iX_jX_k) − δ(T, X_iX_j).

Appendix D. The Finer Quantities of Trivariate Maximum Entropy PID

In Appendix C, the maximum entropy decomposition for trivariate PID returns a synergistic term, which is the sum of all individual synergy quantities, and a unique term, which is the sum of unique and unique redundancy quantities. This section aims to show how to use maximum entropy decomposition for bivariate PID in order to obtain each individual synergy quantity, as well as each individual unique and unique redundancy quantity.
Let T be the target random variable, X, Y, Z be the source random variables, and P be the joint probability distribution of (T, X, Y, Z). Now, BROJA will be applied to some subsystems of (T, X, Y, Z), namely (T, (X_i, X_j), X_k) (one single source) and (T, (X_i, X_j), (X_i, X_k)) (two double sources) for all X_i, X_j, X_k ∈ {X, Y, Z}. Note that the pairs (X_i, X_j) and (X_i, X_k) are ordered alphabetically. Consider the following probability polytopes upon which the optimization will be carried out:
Δ_P = {Q ∈ Δ : Q(T, X) = P(T, X), Q(T, Y) = P(T, Y), Q(T, Z) = P(T, Z)}
Δ_P^{X_iX_j.X_k} = {Q ∈ Δ : Q(T, X_i, X_j) = P(T, X_i, X_j), Q(T, X_k) = P(T, X_k)}, where X_i ≠ X_j, X_i ≠ X_k, X_j ≠ X_k, for all X_i, X_j, X_k ∈ {X, Y, Z}
Δ_P^{X_iX_j.X_iX_k} = {Q ∈ Δ : Q(T, X_i, X_j) = P(T, X_i, X_j), Q(T, X_i, X_k) = P(T, X_i, X_k)}, where X_i ≠ X_j, X_i ≠ X_k, X_j ≠ X_k, for all X_i, X_j, X_k ∈ {X, Y, Z}.
Note that Δ_P ⊇ Δ_P^{X_iX_j.X_k} ⊇ Δ_P^{X_iX_j.X_iX_k} for all X_i, X_j, X_k ∈ {X, Y, Z}.

Appendix D.1. One Single Source Subsystems

These subsystems have the form (T, (X_i, X_j), X_k), where X_i, X_j, X_k ∈ {X, Y, Z}, X_i ≠ X_j ≠ X_k, and X_i ≠ X_k. Now, apply the BROJA decomposition to the subsystem (T, (X, Y), Z). Its four PID quantities are defined as follows:
CI˜(T; (X, Y), Z) = MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.Z}} MI(T; X, Y, Z)
UI˜(T; (X, Y) ∖ Z) = min_{Q ∈ Δ_P^{XY.Z}} MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.Z}} MI(T; Z)
UI˜(T; Z ∖ (X, Y)) = min_{Q ∈ Δ_P^{XY.Z}} MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.Z}} MI(T; X, Y)
SI˜(T; (X, Y), Z) = min_{Q ∈ Δ_P^{XY.Z}, CoI(T; X, Y; Z) = 0} MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.Z}} MI(T; X, Y, Z).
Note that the (X, Y) marginal distribution is fixed. This implies that the mutual information quantities MI(T; X, Y), MI(T; X ∣ Y), MI(T; Y ∣ X), and CoI(T; X; Y) are invariant over Δ_P^{XY.Z}. Therefore, the summands δ(T, XY) + δ(T, XY.XZ) + δ(T, XY.YZ) + δ(T, XY.XZ.YZ) + δ(T, Z.XY) are fixed. Moreover, from Assumption A2 and the fact that the (X, Y) marginal is fixed, the redundancy δ(T; Z.XY) is invariant over Δ_P^{XY.Z}. Thus, in addition to item 2 of Assumption A2, the following partial information measures are invariant over Δ_P^{XY.Z}:
  • δ ( T ; Z . X Y ) since the ( X , Y ) marginal is fixed.
  • δ ( T ; Z ) since MI ( T ; Z ) and δ ( T ; Z . X Y ) are invariant over Δ P X Y . Z .
  • δ ( T , X Y ) + δ ( T , X Y . X Z ) + δ ( T , X Y . Y Z ) + δ ( T , X Y . X Z . Y Z ) since CoI ( T ; X , Y ) and δ ( T ; Z . X Y ) are invariant over Δ P X Y . Z .
Thus, using Assumption A2 and the definition of MI ( T ; X , Y , Z ) over the redundancy lattice,
min Q Δ P X Y . Z MI ( T ; X , Y , Z ) = δ ( T , X Y ) + δ ( T , X Y . X Z ) + δ ( T , X Y . Y Z ) + δ ( T , X Y . X Z . Y Z ) + δ ( T , X ) + δ ( T , Y ) + δ ( T , Z ) + δ ( T , X . Y Z ) + δ ( T , Y . X Z ) + δ ( T , Z . X Y ) + δ ( T , X . Y ) + δ ( T , X . Z ) + δ ( T , Y . Z ) + δ ( T , X . Y . Z ) .
The synergy is evaluated as:
CI˜(T; (X, Y), Z) = MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.Z}} MI(T; X, Y, Z) = δ(T, XYZ) + δ(T, XZ) + δ(T, YZ) + δ(T, XZ.YZ).
The unique information of ( X , Y ) in terms of the redundancy lattice atoms is:
UI˜(T; (X, Y) ∖ Z) = min_{Q ∈ Δ_P^{XY.Z}} MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.Z}} MI(T; Z) = δ(T, XY) + δ(T, XY.XZ) + δ(T, XY.YZ) + δ(T, XY.XZ.YZ) + δ(T, X) + δ(T, Y) + δ(T, X.YZ) + δ(T, Y.XZ) + δ(T, X.Y).
The unique information of Z is written as:
UI˜(T; Z ∖ (X, Y)) = min_{Q ∈ Δ_P^{XY.Z}} MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.Z}} MI(T; X, Y) = δ(T, Z).
When CoI ( T ; X , Y ; Z ) = 0 , then δ ( T , Z . X Y ) + δ ( T , X . Z ) + δ ( T , Y . Z ) + δ ( T , X . Y . Z ) is equal to δ ( T , X Y Z ) + δ ( T , X Z ) + δ ( T , Y Z ) + δ ( T , X Z . Y Z ) .
Then, in terms of redundancy lattice atoms, the shared information of ( X , Y ) and Z is:
SI˜(T; (X, Y), Z) = min_{Q ∈ Δ_P^{XY.Z}, CoI(T; X, Y; Z) = 0} MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.Z}} MI(T; X, Y, Z) = δ(T, Z.XY) + δ(T, X.Z) + δ(T, Y.Z) + δ(T, X.Y.Z).
Hence, for all X i , X j , X k { X , Y , Z } , the BROJA decomposition of ( T , ( X i , X j ) , X k ) is:
CI ˜ ( T ; ( X i , X j ) , X k ) = δ ( T ; X i X j X k ) + δ ( T ; X i X k ) + δ ( T ; X j X k ) + δ ( T ; X i X k . X j X k ) UI ˜ ( T ; ( X i , X j ) X k ) = δ ( T ; X i ) + δ ( T ; X i . X j X k ) + δ ( T ; X j ) + δ ( T ; X j . X i X k ) + δ ( T ; X i . X j ) + δ ( T ; X i X j ) + δ ( T ; X i X j . X i X k ) + δ ( T ; X i X j . X j X k ) + δ ( T ; X i X j . X i X k . X j X k ) UI ˜ ( T ; X k X i , X j ) = δ ( T ; X k ) SI ˜ ( T ; X i , X j , X k ) = δ ( T ; X k . X i X j ) + δ ( T ; X i . X k ) + δ ( T ; X j . X k ) + δ ( T ; X i . X j . X k ) .

Appendix D.2. Two Double Source Subsystems

These subsystems have the form (T, (X_i, X_j), (X_i, X_k)), where X_i, X_j, X_k ∈ {X, Y, Z}, X_i ≠ X_j, X_i ≠ X_k, and X_j ≠ X_k. Now, apply the BROJA decomposition to the subsystem (T, (X, Y), (X, Z)). Its four PID quantities are defined as follows:
CI˜(T; (X, Y), (X, Z)) = MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.XZ}} MI(T; X, Y, Z)
UI˜(T; (X, Y) ∖ (X, Z)) = min_{Q ∈ Δ_P^{XY.XZ}} MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.XZ}} MI(T; X, Z)
UI˜(T; (X, Z) ∖ (X, Y)) = min_{Q ∈ Δ_P^{XY.XZ}} MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.XZ}} MI(T; X, Y)
SI˜(T; (X, Y), (X, Z)) = min_{Q ∈ Δ_P^{XY.XZ}, CoI(T; X, Y; X, Z) = 0} MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.XZ}} MI(T; X, Y, Z).
Note that the ( X , Y ) and ( X , Z ) marginal distributions are fixed. Then, MI ( T ; X 1 , X 2 ) , MI ( T ; X 1 X 2 ) , CoI ( T ; X 1 ; X 2 ) , and MI ( T ; X : Y , X : Z ) are invariant over Δ P X Y . X Z for ( X 1 , X 2 ) = ( X , Y ) and ( X 1 , X 2 ) = ( X , Z ) . Therefore, for  ( X 1 , X 2 , X 3 ) = ( X , Y , Z ) and ( X 1 , X 2 , X 3 ) = ( X , Z , Y ) , δ ( T , X 1 X 2 ) + δ ( T , X 1 X 2 . X 1 X 3 ) + δ ( T , X 1 X 2 . X 2 , X 3 ) + δ ( T , X 1 X 2 . X 1 X 3 . X 2 X 3 ) + δ ( T , X 3 . X 1 X 2 ) are fixed. However, from Assumption A2 and the two fixed ( X , Y ) and ( X , Z ) marginals, then the redundancies δ ( T ; Z . X Y ) and δ ( T ; Y . X Z ) are invariant over Δ P X Y . X Z . Therefore, in addition to 2 in Assumption A2, the following partial information measures are invariant Δ P X Y . X Z :
  • δ ( T ; X i . X X j ) , for all X i , X j { Y , Z } since the ( X , X j ) marginal is fixed.
  • δ ( T ; X Y . X Z ) + δ ( T , X Y . X Z . Y Z ) since MI ( T ; X : Y , X : Z ) and δ ( T ; Z . X Y ) are invariant.
  • δ ( T ; X i ) , for all X i , X j { Y , Z } since MI ( T ; X i ) and δ ( T ; X i . X X j ) are invariant over Δ P X Y . X Z .
  • δ ( T , X X i ) + δ ( T , X X i . X i X j ) , is invariant for all X i , X j { Y , Z } since δ ( T ; X X i . X X j ) + δ ( T , X X i . X X j . X i X j ) , and CoI ( T ; X ; X i ) , δ ( T ; X j . X X i ) are invariant over Δ P X Y . X Z .
Thus, using Assumption A2 and the definition of MI ( T ; X , Y , Z ) over the redundancy lattice,
min Q Δ P X Y . X Z MI ( T ; X , Y , Z ) = δ ( T , X Y ) + δ ( T , X Z ) + δ ( T , X Y . X Z ) + δ ( T , X Y . Y Z ) + δ ( T , X Z . Y Z ) + δ ( T , X Y . X Z . Y Z ) + δ ( T , X ) + δ ( T , Y ) + δ ( T , Z ) + δ ( T , X . Y Z ) + δ ( T , Y . X Z ) + δ ( T , Z . X Y ) + δ ( T , X . Y ) + δ ( T , X . Z ) + δ ( T , Y . Z ) + δ ( T , X . Y . Z ) .
The synergy is evaluated as:
CI˜(T; (X, Y), (X, Z)) = MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.XZ}} MI(T; X, Y, Z) = δ(T, XYZ) + δ(T, YZ).
The unique information of ( X , Y ) is expressed as:
UI˜(T; (X, Y) ∖ (X, Z)) = min_{Q ∈ Δ_P^{XY.XZ}} MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.XZ}} MI(T; X, Z) = δ(T, XY) + δ(T, XY.YZ) + δ(T, Y).
The unique information of ( X , Z ) is written as:
UI˜(T; (X, Z) ∖ (X, Y)) = min_{Q ∈ Δ_P^{XY.XZ}} MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.XZ}} MI(T; X, Y) = δ(T, XZ) + δ(T, XZ.YZ) + δ(T, Z).
When CoI ( T ; X , Y ; X , Z ) = 0 , then:
δ ( T , X Y Z ) + δ ( T , Y Z ) = δ ( T , X Y . X Z ) + δ ( T , X Y . X Z . Y Z ) + δ ( T , X ) + δ ( T , X . Y Z ) + δ ( T , Y . X Z ) + δ ( T , Z . X Y ) + δ ( T , X . Y ) + δ ( T , X . Z ) + δ ( T , Y . Z ) + δ ( T , X . Y . Z ) .
The shared information of X , Y and X , Z is evaluated as:
SI˜(T; (X, Y), (X, Z)) = min_{Q ∈ Δ_P^{XY.XZ}, CoI(T; X, Y; X, Z) = 0} MI(T; X, Y, Z) − min_{Q ∈ Δ_P^{XY.XZ}} MI(T; X, Y, Z) = δ(T, XY.XZ) + δ(T, XY.XZ.YZ) + δ(T, X) + δ(T, X.YZ) + δ(T, Y.XZ) + δ(T, Z.XY) + δ(T, X.Y) + δ(T, X.Z) + δ(T, Y.Z) + δ(T, X.Y.Z).
Hence, the BROJA decomposition of the subsystem (T, (X_i, X_j), (X_i, X_k)) is:
CI˜(T; (X_i, X_j), (X_i, X_k)) = δ(T; X_iX_jX_k) + δ(T; X_jX_k),
UI˜(T; (X_i, X_j) ∖ (X_i, X_k)) = δ(T; X_iX_j) + δ(T; X_iX_j.X_jX_k) + δ(T; X_j),
UI˜(T; (X_i, X_k) ∖ (X_i, X_j)) = δ(T; X_iX_k) + δ(T; X_iX_k.X_jX_k) + δ(T; X_k),
SI˜(T; (X_i, X_j), (X_i, X_k)) = δ(T; X_iX_j.X_iX_k) + δ(T; X_i) + δ(T; X_iX_j.X_iX_k.X_jX_k) + δ(T; X_i.X_jX_k) + δ(T; X_j.X_iX_k) + δ(T; X_k.X_iX_j) + δ(T; X_i.X_j) + δ(T; X_i.X_k) + δ(T; X_j.X_k) + δ(T; X_i.X_j.X_k).

Appendix D.3. Synergy of Three Double Source Systems

Consider a system of the form (T, (X, Y), (X, Z), (Y, Z)). The sources here are called composite, as they are compositions of the primary sources X, Y, and Z. For such a system, the PID measure of [38] based on the maximum-entropy decomposition (A13a–d) captures only the synergy of the composite sources and cannot capture other contributions, such as those involving unique or redundant information of the composite sources; hence, the optimization (A13a) is the only useful one here. Therefore, using (A13a), the optimization is taken over the polytope:
Δ_{P_{XY.XZ.YZ}} = { Q ∈ Δ ; Q(T, Xi, Xj) = P(T, Xi, Xj) for all Xi, Xj ∈ {X, Y, Z} }.
In this polytope, MI(T; Xi, Xj), CoI(T; Xi; Xj), and MI(T; XiXj) are invariant for all Xi, Xj ∈ {X, Y, Z}. Therefore, in addition to Assumption A2, the following partial information measures are invariant over Δ_{P_{XY.XZ.YZ}}:
  • δ(T; Xk.XiXj), for all Xi, Xj, Xk ∈ {X, Y, Z}, since the (Xi, Xj) marginal is fixed.
  • δ(T; XiXj.XiXk), for all Xi, Xj, Xk ∈ {X, Y, Z}, since the (Xi, Xj), (Xi, Xk), and (Xj, Xk) marginals are fixed.
  • δ(T; XY.XZ.YZ), since the (X, Y), (X, Z), and (Y, Z) marginals are fixed.
  • δ(T; Xi), for all Xi, Xj, Xk ∈ {X, Y, Z}, since MI(T; Xi) and δ(T; Xi.XjXk) are invariant over Δ_{P_{XY.XZ.YZ}}.
  • δ(T; XiXj), for all Xi, Xj, Xk ∈ {X, Y, Z}, since CoI(T; Xi; Xj), δ(T; XiXj.XiXk), δ(T; XiXj.XjXk), δ(T; XiXj.XiXk.XjXk), and δ(T; Xk.XiXj) are invariant over Δ_{P_{XY.XZ.YZ}}.
Hence, the only partial information measure that is not fixed is δ(T; XYZ), and:
min_{Q ∈ Δ_{P_{XY.XZ.YZ}}} MI(T; X, Y, Z) = δ(T; XY) + δ(T; XZ) + δ(T; YZ) + δ(T; XY.XZ) + δ(T; XY.YZ) + δ(T; XZ.YZ) + δ(T; XY.XZ.YZ) + δ(T; X) + δ(T; Y) + δ(T; Z) + δ(T; X.YZ) + δ(T; Y.XZ) + δ(T; Z.XY) + δ(T; X.Y) + δ(T; X.Z) + δ(T; Y.Z) + δ(T; X.Y.Z).
The synergy is evaluated as:
CI˜(T; (X, Y), (X, Z), (Y, Z)) = MI(T; X, Y, Z) − min_{Q ∈ Δ_{P_{XY.XZ.YZ}}} MI(T; X, Y, Z) = δ(T; XYZ).
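As an illustration of this optimization, the sketch below minimizes MI(T; X, Y, Z) over the polytope Δ_{P_{XY.XZ.YZ}} with a generic nonlinear solver. This is not the cone-programming formulation that MaxEnt3D_Pid passes to ECOS; it is only practical for small supports, the dictionary keys are assumed to be ordered as (t, x, y, z), and all function and variable names below are ours.

```python
# Illustrative sketch only (not MaxEnt3D_Pid's cone program): minimize MI(T; X, Y, Z)
# over all Q that share the (T, Xi, Xj) pairwise marginals with P.
import itertools
import numpy as np
from scipy.optimize import minimize

def min_mi_over_pairwise_marginals(p, Ts, Xs, Ys, Zs):
    """p: dict mapping (t, x, y, z) to a probability. Returns an estimate of
    min_{Q in Delta_{P_{XY.XZ.YZ}}} MI(T; X, Y, Z) in bits (small supports only)."""
    support = list(itertools.product(Ts, Xs, Ys, Zs))
    p_vec = np.array([p.get(s, 0.0) for s in support])

    def mi(q):
        q = np.clip(q, 1e-12, None)
        qt, qxyz = {}, {}
        for (t, x, y, z), qi in zip(support, q):
            qt[t] = qt.get(t, 0.0) + qi
            qxyz[(x, y, z)] = qxyz.get((x, y, z), 0.0) + qi
        return sum(qi * np.log2(qi / (qt[t] * qxyz[(x, y, z)]))
                   for (t, x, y, z), qi in zip(support, q))

    # Equality constraints: normalization and the (T,X,Y), (T,X,Z), (T,Y,Z) marginals.
    constraints = [{'type': 'eq', 'fun': lambda q: q.sum() - 1.0}]
    for keep in [(0, 1, 2), (0, 1, 3), (0, 2, 3)]:
        groups = {}
        for i, s in enumerate(support):
            groups.setdefault(tuple(s[k] for k in keep), []).append(i)
        for members in groups.values():
            target = p_vec[members].sum()
            constraints.append({'type': 'eq',
                                'fun': lambda q, m=list(members), t=target: q[m].sum() - t})

    res = minimize(mi, p_vec, bounds=[(0.0, 1.0)] * len(support),
                   constraints=constraints, method='SLSQP')
    return res.fun
```

The cone reformulation solved by ECOS scales to far larger supports than this generic SLSQP sketch; the sketch only makes the feasible set and the objective explicit.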

Appendix D.4. Computing the Finest Parts of the Trivariate PID

The values of δ(T; X), δ(T; Y), δ(T; Z), δ(T; X.YZ), δ(T; Y.XZ), and δ(T; Z.XY) are recovered from UI˜(T; Xk \ (Xi, Xj)) of the system (T, (Xi, Xj), Xk) and UI˜(T; Xk \ Xi, Xj) of the system (T, X, Y, Z), for all Xi, Xj, Xk ∈ {X, Y, Z}.
To recover the individual synergistic quantities, construct the following system of equations from the synergy of (T, X, Y, Z), (T, (X, Y), (X, Z), (Y, Z)), (T, (Xi, Xj), Xk), and (T, (Xi, Xj), (Xi, Xk)) for all Xi, Xj, Xk ∈ {X, Y, Z}:
CI˜(T; X, Y, Z) = δ(T; XYZ) + δ(T; XY.XZ.YZ) + δ(T; XY) + δ(T; XZ) + δ(T; YZ) + δ(T; XY.XZ) + δ(T; XY.YZ) + δ(T; XZ.YZ)
CI˜(T; (X, Y), Z) = δ(T; XYZ) + δ(T; XZ) + δ(T; YZ) + δ(T; XZ.YZ)
CI˜(T; (X, Z), Y) = δ(T; XYZ) + δ(T; XY) + δ(T; YZ) + δ(T; XY.YZ)
CI˜(T; (Y, Z), X) = δ(T; XYZ) + δ(T; XY) + δ(T; XZ) + δ(T; XY.XZ)
CI˜(T; (X, Y), (X, Z)) = δ(T; XYZ) + δ(T; YZ)
CI˜(T; (X, Y), (Y, Z)) = δ(T; XYZ) + δ(T; XZ)
CI˜(T; (X, Z), (Y, Z)) = δ(T; XYZ) + δ(T; XY)
CI˜(T; (X, Y), (X, Z), (Y, Z)) = δ(T; XYZ).
The hierarchy needed to compute the trivariate PID quantities is implemented in the script file test_trivariate_finer_parts.py.
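As a minimal sketch of the back-substitution that this triangular system admits, assuming the eight synergy values have already been computed (e.g., by MaxEnt3D_Pid); the dictionary keys below are illustrative and not part of the package's API:

```python
def finest_synergy_parts(ci):
    """Recover the individual synergistic terms from the eight synergies.

    ci is a dict with illustrative keys:
      'XYZ'                      : CI(T; X, Y, Z)
      'XY_Z', 'XZ_Y', 'YZ_X'     : CI(T; (Xi, Xj), Xk)
      'XY_XZ', 'XY_YZ', 'XZ_YZ'  : CI(T; (Xi, Xj), (Xi, Xk))
      'XY_XZ_YZ'                 : CI(T; (X, Y), (X, Z), (Y, Z))
    """
    d = {}
    d['XYZ'] = ci['XY_XZ_YZ']                       # last equation above
    d['YZ'] = ci['XY_XZ'] - d['XYZ']
    d['XZ'] = ci['XY_YZ'] - d['XYZ']
    d['XY'] = ci['XZ_YZ'] - d['XYZ']
    d['XZ.YZ'] = ci['XY_Z'] - d['XYZ'] - d['XZ'] - d['YZ']
    d['XY.YZ'] = ci['XZ_Y'] - d['XYZ'] - d['XY'] - d['YZ']
    d['XY.XZ'] = ci['YZ_X'] - d['XYZ'] - d['XY'] - d['XZ']
    d['XY.XZ.YZ'] = (ci['XYZ'] - d['XYZ'] - d['XY'] - d['XZ'] - d['YZ']
                     - d['XY.XZ'] - d['XY.YZ'] - d['XZ.YZ'])
    return d
```

The first seven assignments read the equations above from the bottom up; the final one uses CI˜(T; X, Y, Z) to isolate δ(T; XY.XZ.YZ).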

References

  1. Amari, S. Information Geometry on Hierarchy of Probability Distributions. IEEE Trans. Inf. Theory 2001, 47, 1701–1711. [Google Scholar] [CrossRef]
  2. Schneidman, E.; Still, S.; Berry, M.J.; Bialek, W. Network Information and Connected Correlations. Phys. Rev. Lett. 2003, 91, 238701. [Google Scholar] [CrossRef] [PubMed]
  3. Timme, N.; Alford, W.; Flecker, B.; Beggs, J.M. Synergy, Redundancy, and Multivariate Information Measures: An Experimentalist’s Perspective. J. Comput. Neurosci. 2014, 36, 119–140. [Google Scholar] [CrossRef] [PubMed]
  4. Olbrich, E.; Bertschinger, N.; Rauh, J. Information Decomposition and Synergy. Entropy 2015, 17, 3501–3517. [Google Scholar] [CrossRef] [Green Version]
  5. Perrone, P.; Ay, N. Hierarchical Quantification of Synergy in Channels. Front. Robot. AI 2016, 2, 35. [Google Scholar] [CrossRef]
  6. Williams, P.L.; Beer, R.D. Nonnegative Decomposition of Multivariate Information. arXiv 2010, arXiv:1004.2515. [Google Scholar]
  7. Williams, P.L. Information Dynamics: Its Theory and Application to Embodied Cognitive Systems. Ph.D. Thesis, Indiana University, Bloomington, IN, USA, 2011. [Google Scholar]
  8. Anastassiou, D. Computational Analysis of the Synergy among Multiple Interacting Genes. Mol. Syst. Biol. 2007, 3, 83. [Google Scholar] [CrossRef]
  9. Watkinson, J.; Liang, K.C.; Wang, X.; Zheng, T.; Anastassiou, D. Inference of Regulatory Gene Interactions from Expression Data Using Three-way Mutual Information. Ann. N. Y. Acad. Sci. 2009, 1158, 302–313. [Google Scholar] [CrossRef]
  10. Chatterjee, P.; Pal, N.R. Construction of Synergy Networks from Gene Expression Data Related to Disease. Gene 2016, 590, 250–262. [Google Scholar] [CrossRef]
  11. Katz, Y.; Tunstrøm, K.; Ioannou, C.C.; Huepe, C.; Couzin, I.D. Inferring the Structure and Dynamics of Interactions in Schooling Fish. Proc. Natl. Acad. Sci. USA 2011, 108, 18720–18725. [Google Scholar] [CrossRef]
  12. Flack, J.C. Multiple Time-scales and the Developmental Dynamics of Social Systems. Philos. Trans. R. Soc. B Biol. Sci. 2012, 367, 1802–1810. [Google Scholar] [CrossRef] [PubMed]
  13. Ay, N.; Der, R.; Prokopenko, M. Information-Driven Self-Organization: The Dynamical System Approach to Autonomous Robot Behavior. Theory Biosci. 2012, 131, 125–127. [Google Scholar] [CrossRef] [PubMed]
  14. Frey, S.; Albino, D.K.; Williams, P.L. Synergistic Information Processing Encrypts Strategic Reasoning in Poker. Cogn. Sci. 2018, 42, 1457–1476. [Google Scholar] [CrossRef]
  15. Marre, O.; El Boustani, S.; Frégnac, Y.; Destexhe, A. Prediction of Spatiotemporal Patterns of Neural Activity from Pairwise Correlations. Phys. Rev. Lett. 2009, 102, 138101. [Google Scholar] [CrossRef] [PubMed]
  16. Faes, L.; Marinazzo, D.; Nollo, G.; Porta, A. An Information-Theoretic Framework to Map the Spatiotemporal Dynamics of the Scalp Electroencephalogram. IEEE. Trans. Biomed. Eng. 2016, 63, 2488–2496. [Google Scholar] [CrossRef]
  17. Pica, G.; Piasini, E.; Safaai, H.; Runyan, C.A.; Diamond, M.E.; Fellin, T.; Kayser, C.; Harvey, C.D.; Panzeri, S. Quantifying How Much Sensory Information in a Neural Code is Relevant for Behavior. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  18. Latham, P.E.; Nirenberg, S. Synergy, Redundancy, and Independence in Population Codes, Revisited. J. Neurosci. 2005, 25, 5195–5206. [Google Scholar] [CrossRef]
  19. Ver Steeg, G.; Brekelmans, R.; Harutyunyan, H.; Galstyan, A. Disentangled Representations via Synergy Minimization. arXiv 2017, arXiv:1710.03839. [Google Scholar]
  20. Rauh, J.; Ay, N. Robustness, Canalyzing Functions and Systems Design. Theory Biosci. 2014, 133, 63–78. [Google Scholar] [CrossRef] [PubMed]
  21. Tishby, N.; Pereira, F.C.; Bialek, W. The Information Bottleneck Method. arXiv 2000, arXiv:physics/0004057. [Google Scholar]
  22. Banerjee, P.K.R.; Montúfar, G. The Variational Deficiency Bottleneck. arXiv 2018, arXiv:1810.11677. [Google Scholar]
  23. Harder, M.; Salge, C.; Polani, D. Bivariate Measure of Redundant Information. Phys. Rev. E 2013, 87, 012130. [Google Scholar] [CrossRef] [PubMed]
  24. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying Unique Information. Entropy 2014, 16, 2161–2183. [Google Scholar] [CrossRef] [Green Version]
  25. Griffith, V.; Koch, C. Quantifying Synergistic Mutual Information. In Guided Self-Organization: Inception; Springer: Berlin/Heidelberg, Germany, 2014; pp. 159–190. [Google Scholar] [Green Version]
  26. Ince, R.A.A. Measuring Multivariate Redundant Information with Pointwise Common Change in Surprisal. Entropy 2017, 19, 318. [Google Scholar] [CrossRef]
  27. James, R.G.; Emenheiser, J.; Crutchfield, J.P. Unique Information via Dependency Constraints. arXiv 2017, arXiv:1709.06653v1. [Google Scholar] [CrossRef]
  28. Chicharro, D.; Panzeri, S. Synergy and Redundancy in Dual Decompositions of Mutual Information Gain and Information Loss. Entropy 2017, 19, 71. [Google Scholar] [CrossRef]
  29. Finn, C.; Lizier, J.T. Pointwise Information Decomposition Using the Specificity and Ambiguity Lattices. arXiv 2018, arXiv:1801.09010v1. [Google Scholar] [CrossRef]
  30. Rauh, J. Secret Sharing and Shared Information. Entropy 2017, 19, 601. [Google Scholar] [CrossRef]
  31. James, R.G.; Emenheiser, J.; Crutchfield, J.P. A Perspective on Unique Information: Directionality, Intuitions, and Secret Key Agreement. arXiv 2017, arXiv:1808.08606. [Google Scholar]
  32. Chicharro, D.; Pica, G.; Panzeri, S. The Identity of Information: How Deterministic Dependencies Constrain Information Synergy and Redundancy. Entropy 2018, 20, 169. [Google Scholar] [CrossRef]
  33. Rauh, J.; Banerjee, P.K.; Olbrich, E.; Jost, J.; Bertschinger, N. On Extractable Shared Information. Entropy 2017, 19, 328. [Google Scholar] [CrossRef]
  34. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: New York, NY, USA, 2006. [Google Scholar]
  35. Wibral, M.; Lizier, J.T.; Priesemann, V. Bits from Brains for Biologically Inspired Computing. Front. Robot. AI 2015, 2, 5. [Google Scholar] [CrossRef]
  36. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  37. Pica, G.; Piasini, E.; Chicharro, D.; Panzeri, S. Invariant Components of Synergy, Redundancy, and Unique Information among Three Variables. Entropy 2017, 19, 451. [Google Scholar] [CrossRef]
  38. Chicharro, D. Quantifying Multivariate Redundancy with Maximum Entropy Decompositions of Mutual Information. arXiv 2017, arXiv:1708.03845v2. [Google Scholar]
  39. Makkeh, A.; Theis, D.O.; Vicente, R. Bivariate Partial Information Decomposition: The Optimization Perspective. Entropy 2017, 19, 530. [Google Scholar] [CrossRef]
  40. Banerjee, P.K.; Rauh, J.; Montúfar, G. Computing the Unique Information. arXiv 2018, arXiv:1709.07487v2. [Google Scholar]
  41. Makkeh, A.; Theis, D.O.; Vicente, R. Broja_2Pid: A Robust Estimator for Bivariate Partial Information Decomposition. Entropy 2018, 20, 271. [Google Scholar] [CrossRef]
  42. James, R.G.; Ellison, C.J.; Crutchfield, J.P. dit: A Python Package for Discrete Information Theory. J. Open Source Softw. 2018, 3, 738. [Google Scholar] [CrossRef]
  43. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J. Shared Information—New Insights and Problems in Decomposing Information in Complex Systems. In Proceedings of the European Conference on Complex Systems 2012; Springer: Cham, Switzerland, 2012; pp. 251–269. [Google Scholar]
  44. Burnham, K.P.; Anderson, D.R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd ed.; Springer: New York, NY, USA, 2002. [Google Scholar]
  45. Stramaglia, S.; Angelini, L.; Wu, G.; Cortes, J.M.; Faes, L.; Marinazzo, D. Synergetic and Redundant Information Flow Detected by Unnormalized Granger Causality: Application to Resting State fMRI. IEEE. Trans. Biomed. Eng. 2016, 63, 2518–2524. [Google Scholar] [Green Version]
  46. Wibral, M.; Finn, C.; Wollstadt, P.; Lizier, J.T.; Priesemann, V. Quantifying Information Modification in Developing Neural Networks via Partial Information Decomposition. Entropy 2017, 19, 494. [Google Scholar] [CrossRef]
  47. Ghazi-Zahedi, K.; Langer, C.; Ay, N. Morphological Computation: Synergy of Body and Brain. Entropy 2017, 19, 456. [Google Scholar] [CrossRef]
  48. Faes, L.; Marinazzo, D.; Stramaglia, S. Multiscale Information Decomposition: Exact Computation for Multivariate Gaussian Processes. Entropy 2017, 19, 408. [Google Scholar] [CrossRef]
  49. Tax, T.M.S.; Mediano, P.A.M.; Shanahan, M. The Partial Information Decomposition of Generative Neural Network Models. Entropy 2017, 19, 474. [Google Scholar] [CrossRef]
  50. Schwartz-Ziv, R.; Tishby, N. Opening the Black Box of Deep Neural Networks via Information. arXiv 2017, arXiv:1703.00810. [Google Scholar]
  51. Makkeh, A. Applications of Optimization in Some Complex Systems. Ph.D. Thesis, University of Tartu, Tartu, Estonia, 2018. [Google Scholar]
  52. Domahidi, A.; Chu, E.; Boyd, S. ECOS: An SOCP Solver for Embedded Systems. In Proceedings of the European Control Conference, Zurich, Switzerland, 17–19 July 2013; pp. 3071–3076. [Google Scholar]
  53. Makkeh, A.; Theis, D.O.; Vicente, R.; Chicharro, D. A Trivariate PID Estimator. 2018. Available online: https://github.com/Abzinger/MAXENT3D_PID (accessed on 21 June 2018).
  54. Mehrotra, S. On the implementation of a primal-dual interior point method. SIAM J. Optim. 1992, 2, 575–601. [Google Scholar] [CrossRef]
  55. Potra, F.A.; Wright, S.J. Interior-point methods. J. Comput. Appl. Math. 2000, 124, 281–302. [Google Scholar] [CrossRef] [Green Version]
  56. James, R.G.; Crutchfield, J.P. Multivariate Dependence Beyond Shannon Information. Entropy 2017, 19, 531. [Google Scholar] [CrossRef]
  57. Lizier, J.T.; Flecker, B.; Williams, P.L. Towards a Synergy-based Approach to Measuring Information Modification. In Proceedings of the 2013 IEEE Symposium on Artificial Life (ALife), Singapore, 16–19 April 2013; pp. 43–51. [Google Scholar]
  58. Wibral, M.; Priesemann, V.; Kay, J.W.; Lizier, J.T.; Phillips, W.A. Partial Information Decomposition as a Unified Approach to the Specification of Neural Goal Functions. Brain Cogn. 2015, 112, 25–38. [Google Scholar] [CrossRef]
  59. Banerjee, P.K.; Griffith, V. Synergy, Redundancy, and Common Information. arXiv 2015, arXiv:1509.03706v1. [Google Scholar]
  60. Kay, J.W.; Ince, R.A.A. Exact Partial Information Decompositions for Gaussian Systems Based on Dependency Constraints. arXiv 2018, arXiv:1803.02030v1. [Google Scholar] [CrossRef]
  61. Crosato, E.; Jiang, L.; Lecheval, V.; Lizier, J.T.; Wang, X.R.; Tichit, P.; Theraulaz, G.; Prokopenko, M. Informative and Misinformative Interactions in a School of Fish. Swarm Intell. 2018, 12, 283–305. [Google Scholar] [CrossRef]
  62. Sootla, S.; Theis, D.O.; Vicente, R. Analyzing Information Distribution in Complex Systems. Entropy 2017, 19, 636. [Google Scholar] [CrossRef]
  63. Erwin, D.H.; Davidson, E.H. The Evolution of Hierarchical Gene Regulatory Networks. Nat. Rev. Genet. 2009, 10, 141–148. [Google Scholar] [CrossRef] [PubMed]
  64. Olshausen, B.A.; Field, D.J. Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1? Vis. Res. 1997, 37, 3311–3325. [Google Scholar] [CrossRef]
  65. Palmer, S.E.; Marre, O.; Berry, M.J.; Bialek, W. Predictive Information in a Sensory Population. Proc. Natl. Acad. Sci. USA 2015, 112, 6908–6913. [Google Scholar] [CrossRef]
  66. Faes, L.; Kugiumtzis, D.; Nollo, G.; Jurysta, F.; Marinazzo, D. Estimating the Decomposition of Predictive Information in Multivariate Systems. Phys. Rev. E 2015, 91, 032904. [Google Scholar] [CrossRef]
  67. Chicharro, D.; Ledberg, A. Framework to Study Dynamic Dependencies in Networks of Interacting Processes. Phys. Rev. E 2012, 86, 041901. [Google Scholar] [CrossRef]
  68. Schreiber, T. Measuring Information Transfer. Phys. Rev. Lett. 2000, 85, 461–464. [Google Scholar] [CrossRef] [Green Version]
  69. Sun, J.; Cafaro, C.; Bollt, E. Identifying the coupling structure in complex systems through the optimal causation entropy principle. Entropy 2014, 16, 3416–3433. [Google Scholar] [CrossRef]
  70. Vicente, R.; Wibral, M.; Lindner, M.; Pipa, G. Transfer Entropy: A Model-free Measure of Effective Connectivity for the Neurosciences. J. Comput. Neurosci. 2011, 30, 45–67. [Google Scholar] [CrossRef]
  71. Hlaváčková-Schindler, K.; Paluš, M.; Vejmelka, M.; Bhattacharya, J. Causality detection based on information-theoretic approaches in time series analysis. Phys. Rep. 2007, 441, 1–46. [Google Scholar] [CrossRef]
  72. Vicente, R.; Wibral, M. Efficient estimation of information transfer. In Directed Information Measures in Neuroscience; Springer: Berlin/Heidelberg, Germany, 2014; pp. 37–58. [Google Scholar]
  73. Valdes-Sosa, P.A.; Roebroeck, A.; Daunizeau, J.; Friston, K. Effective Connectivity: Influence, Causality and Biophysical Modeling. NeuroImage 2011, 58, 339–361. [Google Scholar] [CrossRef] [PubMed]
  74. Wibral, M.; Vicente, R.; Lizier, J.T. Directed Information Measures in Neuroscience; Springer: Berlin/Heidelberg, Germany, 2014. [Google Scholar]
  75. Wibral, M.; Vicente, R.; Lindner, M. Transfer entropy in neuroscience. In Directed Information Measures in Neuroscience; Springer: Berlin/Heidelberg, Germany, 2014; pp. 3–36. [Google Scholar]
  76. Deco, G.; Tononi, G.; Boly, M.; Kringelbach, M.L. Rethinking Segregation and Integration: Contributions of Whole-brain Modelling. Nat. Rev. Neurosci. 2015, 16, 430–439. [Google Scholar] [CrossRef] [PubMed]
  77. Daniels, B.C.; Ellison, C.J.; Krakauer, D.C.; Flack, J.C. Quantifying Collectivity. Curr. Opin. Neurobiol. 2016, 37, 106–113. [Google Scholar] [CrossRef] [PubMed]
  78. Linsker, R. Local Synaptic Learning Rules Suffice to Maximize Mutual Information in a Linear Network. Neural Comput. 1992, 4, 691–702. [Google Scholar] [CrossRef]
  79. Bell, J.A.; Sejnowski, T.J. An Information Maximisation Approach to Blind Separation and Blind Deconvolution. Neural Comput. 1995, 7, 1129–1159. [Google Scholar] [CrossRef] [PubMed]
  80. Barrett, A.B. Exploration of Synergistic and Redundant Information Sharing in Static and Dynamical Gaussian Systems. Phys. Rev. E 2015, 91, 052802. [Google Scholar] [CrossRef]
  81. Lizier, J.T. JIDT: An information-theoretic toolkit for studying the dynamics of complex systems. Front. Robot. AI 2014, 1, 11. [Google Scholar] [CrossRef]
Figure 1. A flowchart describing the process of computing the trivariate PID via MaxEnt3D_Pid. It gives an overview of how pid() uses the classes Solve_w_ECOS and QP to compute the trivariate PID.
Figure 2. Using MaxEnt3D_Pid to compute the PID of the distribution obtained from the AndDuplicate gate (andDgate). The AndDuplicate gate evaluates T as the logical And of X and Y (X ∧ Y), while Z copies X.
Figure 3. Tuning the parameters of ECOS.
Figure 4. Brief stats from ECOS after solving Problem (1a).
Figure 5. Detailed stats from ECOS after solving Problem (1a).
Figure 6. The variation of the unique information, as the size of Z increases, for the random probability distributions described in Section 3.3. It shows that the value of the unique information of Z increases as the dimension of Z increases.
Figure 7. Box plots of the time taken by MaxEnt3D_Pid to compute the PID of random joint probability distributions of (T, X, Y, Z) for |T| = |X| = |Y| = 2 and different sizes of Z. For the sizes explored, the computational time shows a flat trend, and its variance is small.
Table 1. The keys of the trivariate PID quantities in the returned dictionary. Note that UI(T; Xi \ Xj, Xk) and UI(T; Xi, Xk \ Xj) refer to unique and unique redundant information for Xi, Xk, Xj ∈ {X, Y, Z}, CI(T; X, Y, Z) refers to synergistic information, and SI(T; X, Y, Z) refers to redundant or shared information.
Key      Value                  Key       Value
’UIX’    UI(T; X \ Y, Z)        ’UIYZ’    UI(T; Y, Z \ X)
’UIY’    UI(T; Y \ X, Z)        ’UIXZ’    UI(T; X, Z \ Y)
’UIZ’    UI(T; Z \ X, Y)        ’UIXY’    UI(T; X, Y \ Z)
’CI’     CI(T; X, Y, Z)         ’SI’      SI(T; X, Y, Z)
Table 2. The keys of optimality violations for each problem (Section 2.1a,b) in the returned dictionary.
Key             Value
’Num_Err_I’     Optimality violations of min_{Δ_P} MI(T; X, Y, Z)
’Num_Err_12’    Optimality violations of min_{Δ_P} MI(T; X, Y)
’Num_Err_13’    Optimality violations of min_{Δ_P} MI(T; X, Z)
’Num_Err_23’    Optimality violations of min_{Δ_P} MI(T; Y, Z)
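A minimal usage sketch of reading the keys from Tables 1 and 2, assuming pid() accepts the joint distribution as a Python dictionary mapping outcome tuples (t, x, y, z) to probabilities and returns the dictionary described in these tables; the import path and calling convention below are assumptions (modeled on the companion BROJA_2PID estimator) and should be checked against the repository [53].

```python
from MAXENT3D_PID import pid   # assumed import path; see the repository [53]

# AndDuplicate gate from Table 5 (cf. Figure 2): T = X and Y, Z = X, X and Y i.i.d. uniform bits.
P = {(x & y, x, y, x): 0.25 for x in (0, 1) for y in (0, 1)}

result = pid(P)                # assumed call; returns the dictionary of Tables 1 and 2

print("unique X  :", result['UIX'])
print("unique Y  :", result['UIY'])
print("unique Z  :", result['UIZ'])
print("synergy   :", result['CI'])
print("redundancy:", result['SI'])
# Optimality violations of the four optimization problems (Table 2):
for key in ('Num_Err_I', 'Num_Err_12', 'Num_Err_13', 'Num_Err_23'):
    print(key, result[key])
```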
Table 3. Parameters (tolerances) that govern the optimization in ECOS.
Parameter        Description                                      Default Value
feastol          primal/dual feasibility tolerance                10^−7
abstol           absolute tolerance on the duality gap            10^−6
reltol           relative tolerance on the duality gap            10^−6
feastol_inacc    primal/dual infeasibility relaxed tolerance      10^−3
abstol_inacc     absolute relaxed tolerance on the duality gap    10^−4
reltol_inacc     relaxed relative duality gap                     10^−4
max_iter         maximum number of iterations that ECOS does      100
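Continuing the sketch above, these tolerances might be tightened for a difficult instance as follows, assuming pid() forwards unrecognized keyword arguments to ECOS (as the bivariate BROJA_2PID estimator does); the exact parameter-passing mechanism is an assumption and should be checked against the repository [53].

```python
# Hypothetical call: tighten ECOS tolerances and allow more iterations.
solver_args = {
    'feastol': 1.0e-8,   # primal/dual feasibility tolerance (default 10^-7, Table 3)
    'abstol':  1.0e-8,   # absolute duality-gap tolerance    (default 10^-6)
    'reltol':  1.0e-8,   # relative duality-gap tolerance    (default 10^-6)
    'max_iter': 200,     # default 100
}
result = pid(P, **solver_args)   # P and pid as in the sketch above
```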
Table 5. Paradigmatic gates with a brief explanation of their operation, where ⊕ is the logical Xor and ∧ is the logical And.
Instance        Operation
XorDuplicate    T = X ⊕ Y; Z = X; X, Y i.i.d.
XorLoses        T = X ⊕ Y; Z = X ⊕ Y; X, Y i.i.d.
XorMultiCoal    T = U ⊕ V ⊕ W; X = (U, V), Y = (U, W), Z = (V, W); U, V, W i.i.d.
AndDuplicate    T = X ∧ Y; Z = X; X, Y i.i.d.
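The gates in Table 5 are straightforward to encode as joint distributions. A small sketch, with dictionary keys ordered as (t, x, y, z) (our convention here) and helper names of our choosing:

```python
import itertools

def xor_duplicate():
    # T = X xor Y, Z = X, with X, Y i.i.d. uniform bits.
    return {(x ^ y, x, y, x): 0.25 for x in (0, 1) for y in (0, 1)}

def xor_loses():
    # T = X xor Y and Z = X xor Y, with X, Y i.i.d. uniform bits.
    return {(x ^ y, x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}

def xor_multi_coal():
    # T = U xor V xor W; the sources are the coalitions X=(U,V), Y=(U,W), Z=(V,W).
    return {(u ^ v ^ w, (u, v), (u, w), (v, w)): 1.0 / 8
            for u, v, w in itertools.product((0, 1), repeat=3)}

def and_duplicate():
    # T = X and Y, Z = X, with X, Y i.i.d. uniform bits.
    return {(x & y, x, y, x): 0.25 for x in (0, 1) for y in (0, 1)}
```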
Table 6. Copy gate results. The results are divided into three sets, ordered increasingly w.r.t. the size of the joint distributions. Dimensions capture the unordered triplet (|X|, |Y|, |Z|), and the deviation is computed as the maximum over all PID quantities of 100·|r̃ − r|, where r̃ is the obtained PID quantity and r is the analytical PID quantity. Note that the theoretical results are either zero or log2(|S|), where S ∈ {X, Y, Z}.
Set 1
Dimensions    Time (s)   Deviation (%)
(10,10,10)    0.82       10^−7
(10,10,20)    1.06       10^−7
(10,10,30)    1.62       10^−7
(10,10,40)    2.08       10^−7
(10,20,20)    2.21       10^−7
(10,10,50)    2.61       10^−6
(10,20,30)    2.99       10^−6
(10,20,40)    4.11       10^−6
(20,20,20)    4.96       10^−6
(10,30,30)    4.43       10^−7
(10,20,50)    5.51       10^−7
(10,30,40)    6.51       10^−6

Set 2
Dimensions    Time (s)   Deviation (%)
(20,20,30)    7.53       10^−6
(10,30,50)    8.67       10^−6
(10,40,40)    8.68       10^−7
(20,20,40)    8.85       10^−7
(20,30,30)    11.41      10^−6
(10,40,50)    11.44      10^−6
(20,20,50)    11.34      10^−6
(20,30,40)    13.00      10^−6
(10,50,50)    16.37      10^−7
(30,30,30)    16.28      10^−7
(20,30,50)    17.24      10^−7
(20,40,40)    18.34      10^−5

Set 3
Dimensions    Time (s)   Deviation (%)
(30,30,40)    25.72      10^−6
(20,40,50)    24.32      10^−5
(30,30,50)    27.90      10^−6
(30,40,40)    29.85      10^−6
(20,50,50)    34.94      10^−6
(30,40,50)    47.40      10^−5
(40,40,40)    42.21      10^−5
(30,50,50)    55.60      10^−4
(40,40,50)    55.18      10^−5
(40,50,50)    89.58      10^−6
(50,50,50)    97.74      10^−5
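The deviation column can be reproduced along the following lines, assuming the Copy gate is T = (X, Y, Z) with independent, uniformly distributed sources (consistent with the analytical values quoted in the caption) and that pid() is called as in the sketches above; the helper names are ours.

```python
import itertools
import math

def copy_gate(nx, ny, nz):
    # T = (X, Y, Z) with X, Y, Z independent and uniform on ranges of the given sizes.
    p = 1.0 / (nx * ny * nz)
    return {((x, y, z), x, y, z): p
            for x, y, z in itertools.product(range(nx), range(ny), range(nz))}

def copy_gate_deviation(result, nx, ny, nz):
    """Maximum deviation (in %) of the estimated PID from the analytical Copy-gate PID."""
    analytic = {k: 0.0 for k in ('UIXY', 'UIXZ', 'UIYZ', 'CI', 'SI')}
    analytic.update({'UIX': math.log2(nx), 'UIY': math.log2(ny), 'UIZ': math.log2(nz)})
    return max(100.0 * abs(result[k] - v) for k, v in analytic.items())

# Hypothetical call, as sketched above:
# result = pid(copy_gate(10, 10, 10))
# print(copy_gate_deviation(result, 10, 10, 10))
```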
