Article

Orders between Channels and Implications for Partial Information Decomposition

by André F. C. Gomes * and Mário A. T. Figueiredo
Instituto de Telecomunicações and LUMLIS (Lisbon ELLIS Unit), Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisboa, Portugal
* Author to whom correspondence should be addressed.
Entropy 2023, 25(7), 975; https://doi.org/10.3390/e25070975
Submission received: 5 May 2023 / Revised: 21 June 2023 / Accepted: 22 June 2023 / Published: 25 June 2023
(This article belongs to the Special Issue Measures of Information III)

Abstract
The partial information decomposition (PID) framework is concerned with decomposing the information that a set of random variables has with respect to a target variable into three types of components: redundant, synergistic, and unique. Classical information theory alone does not provide a unique way to decompose information in this manner, and additional assumptions have to be made. Recently, Kolchinsky proposed a new general axiomatic approach to obtain measures of redundant information based on choosing an order relation between information sources (equivalently, order between communication channels). In this paper, we exploit this approach to introduce three new measures of redundant information (and the resulting decompositions) based on well-known preorders between channels, contributing to the enrichment of the PID landscape. We relate the new decompositions to existing ones, study several of their properties, and provide examples illustrating their novelty. As a side result, we prove that any preorder that satisfies Kolchinsky’s axioms yields a decomposition that meets the axioms originally introduced by Williams and Beer when they first proposed PID.

1. Introduction

Williams and Beer [1] proposed the partial information decomposition (PID) framework as a way to characterize or analyze the information that a set of random variables (often called sources) has about another variable (referred to as the target). PID is a useful tool for gathering insights and analyzing the way information is stored, modified, and transmitted within complex systems [2,3]. It has found applications in areas such as cryptography [4] and neuroscience [5,6], with many other potential use cases, such as in understanding how information flows function in gene regulatory networks [7], neural coding [8], financial markets [9], and network design [10].
Consider the simplest case, that of a three-variable joint distribution p ( y 1 , y 2 , t ) describing three random variables: two sources Y 1 and Y 2 and a target T. Notice that despite the names sources and target, there is no directionality assumption, either causal or otherwise. The goal of PID is to decompose the information that Y = ( Y 1 , Y 2 ) has about T into the sum of four non-negative quantities: the information that is present in both Y 1 and Y 2 , known as redundant information R; the information that only Y 1 (respectively, Y 2 ) has about T, known as unique information U 1 (respectively, U 2 ); and the synergistic information S that is present in the pair ( Y 1 , Y 2 ) , and is not present in either Y 1 or Y 2 alone. That is, in this case with two variables, the goal is to write
$$I(T;Y) = R + U_1 + U_2 + S,$$
where I(T;Y) is the mutual information between T and Y [11]. In this paper, mutual information always refers to Shannon's mutual information, which for two discrete variables X ∈ 𝒳 and Z ∈ 𝒵 is given by
$$I(X;Z) = \sum_{x \in \mathcal{X}} \sum_{z \in \mathcal{Z}} p(x,z)\,\log \frac{p(x,z)}{p(x)\,p(z)},$$
and satisfies the following well-known fundamental properties: I(X;Z) ≥ 0 and I(X;Z) = 0 ⟺ X ⊥ Z (X and Z are independent) [11].
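For concreteness, the following Python sketch (our own helper, not code from the paper) computes this quantity, in bits, from a joint probability table:

```python
import numpy as np

def mutual_information(p_xz):
    """Shannon mutual information I(X;Z) in bits from a joint table p_xz[x, z]."""
    p_xz = np.asarray(p_xz, dtype=float)
    p_x = p_xz.sum(axis=1, keepdims=True)   # marginal p(x)
    p_z = p_xz.sum(axis=0, keepdims=True)   # marginal p(z)
    mask = p_xz > 0                         # 0 log 0 = 0 convention
    return float(np.sum(p_xz[mask] * np.log2(p_xz[mask] / (p_x @ p_z)[mask])))

# Example: X = Z uniform on {0,1} gives I(X;Z) = 1 bit.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))  # -> 1.0
```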
Because unique information and redundancy satisfy the relationship U_i = I(T;Y_i) − R (for i ∈ {1,2}), it turns out that defining how to compute one of these quantities (R, U_i, or S) is enough to fully determine the others [1]. As the number of variables grows, the number of terms appearing in the PID of I(T;Y) grows super-exponentially [12]. Williams and Beer [1] suggested a set of axioms that a measure of redundancy should satisfy and proposed a measure of their own. These axioms have become known as the Williams–Beer axioms, although the measure they proposed has subsequently been criticized for not capturing informational content, only information size [13].
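To make this bookkeeping explicit, here is a hedged sketch — reusing mutual_information from the previous snippet, with our own function name — of how a chosen redundancy value R determines the remaining terms of (1) in the bivariate case:

```python
import numpy as np

def pid_from_redundancy(p_ty1y2, R):
    """Given a joint table p_ty1y2[t, y1, y2] and a redundancy value R,
    return (R, U1, U2, S) using U_i = I(T;Y_i) - R and Eq. (1)."""
    p = np.asarray(p_ty1y2, dtype=float)
    I_T_Y1 = mutual_information(p.sum(axis=2))                 # I(T; Y1)
    I_T_Y2 = mutual_information(p.sum(axis=1))                 # I(T; Y2)
    I_T_Y12 = mutual_information(p.reshape(p.shape[0], -1))    # I(T; (Y1, Y2))
    U1, U2 = I_T_Y1 - R, I_T_Y2 - R
    S = I_T_Y12 - R - U1 - U2
    return R, U1, U2, S
```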
Spawned by their initial work, other measures and axioms for information decomposition have been introduced; see, for example, the work by Bertschinger et al. [14], Griffith and Koch [15], and James et al. [16]. Beyond the Williams–Beer axioms, however, there is no consensus about which axioms a measure should satisfy or whether a given measure captures the information it is meant to capture. Debate continues today about which axioms a measure of redundant information ought to satisfy, and there is no general agreement on what makes for an appropriate PID [16,17,18,19,20].
Recently, Kolchinsky [21] suggested a new general approach to defining measures of redundant information, known as intersection information (II), the designation that we adopt hereinafter. The core of this approach is the choice of an order relation between information sources (random variables), which allows two sources to be compared in terms of how informative they are with respect to the target variable.
In this work, we use previously studied preorders between communication channels, which correspond to preorders between the corresponding output variables in terms of information content with respect to the input. Following Kolchinsky's approach, we show that these orders lead to the definition of new II measures. The rest of the paper is organized as follows. In Section 2 and Section 3, we review Kolchinsky's definition of an II measure and the degradation order. In Section 4, we describe a number of preorders between channels and then, based on the work by Korner and Marton [22] and Américo et al. [23], derive the resulting II measures and study their properties. Section 5 presents comments on the optimization problems involved in computing the proposed measures. In Section 6, we explore the relationships between the new II measures and previous PID approaches, and then apply the proposed II measures to several famous PID problems. Section 7 concludes the paper with suggestions for future work.

2. Kolchinsky’s Axioms and Intersection Information

Consider a set of n discrete random variables Y_1 ∈ 𝒴_1, …, Y_n ∈ 𝒴_n, called the source variables, and let T ∈ 𝒯 be the target variable (also discrete), with joint distribution (probability mass function) p(y_1, …, y_n, t). Let ⪯ denote some preorder between random variables that satisfies the following axioms, herein referred to as Kolchinsky's axioms [21]:
(i) Monotonicity of mutual information w.r.t. T: Y_i ⪯ Y_j ⟹ I(Y_i; T) ≤ I(Y_j; T).
(ii) Reflexivity: Y_i ⪯ Y_i for all Y_i.
(iii) For any Y_i, C ⪯ Y_i ⪯ (Y_1, …, Y_n), where C ∈ 𝒞 is any variable taking a constant value with probability one, i.e., with a distribution that is a delta function or, equivalently, such that 𝒞 is a singleton.
Kolchinsky [21] showed that such an order can be used to define an II measure via
$$I_{\cap}(Y_1, \dots, Y_n \to T) := \sup_{Q\,:\,Q \,\preceq\, Y_i,\ \forall i \in \{1,\dots,n\}} I(Q;T),$$
and we now show that this implies that the II measure in (2) satisfies the Williams–Beer axioms [1,2], establishing a strong connection between these formulations. Before stating and proving this result, we first recall the Williams–Beer axioms [2], where the definition of a source A i is that of a set of random variables, e.g., A 1 = { X 1 , X 2 } .
Definition 1.
Let A_1, …, A_r be an arbitrary number r ≥ 2 of sources. An intersection information measure I_∩ is said to satisfy the Williams–Beer axioms if it satisfies the following:
1. Symmetry: I_∩ is symmetric in the A_i.
2. Self-redundancy: I_∩(A_i) = I(A_i; T).
3. Monotonicity: I_∩(A_1, …, A_{r−1}, A_r) ≤ I_∩(A_1, …, A_{r−1}).
4. Equality for monotonicity: if A_{r−1} ⊆ A_r, then I_∩(A_1, …, A_{r−1}, A_r) = I_∩(A_1, …, A_{r−1}).
Theorem 1.
Let ⪯ be some preorder that satisfies Kolchinsky's axioms, and define its corresponding II measure as in (2). Then, this II measure satisfies the Williams–Beer axioms.
Proof. 
Symmetry and monotonicity follow trivially from the form of (2) (the definition of the supremum and the restriction set). Self-redundancy follows from the reflexivity of the preorder and the monotonicity of mutual information. Now, suppose A_{r−1} ⊆ A_r, and let Q be a solution of I_∩(A_1, …, A_{r−1}), implying that Q ⪯ A_{r−1}. Because A_{r−1} ⊆ A_r, the third Kolchinsky axiom and the transitivity of the preorder ⪯ guarantee that Q ⪯ A_{r−1} ⪯ A_r, meaning that Q is an admissible point of I_∩(A_1, …, A_r). Therefore, I_∩(A_1, …, A_{r−1}, A_r) ≥ I_∩(A_1, …, A_{r−1}), and monotonicity guarantees that I_∩(A_1, …, A_{r−1}, A_r) = I_∩(A_1, …, A_{r−1}).    □
In conclusion, every preorder relation that satisfies the set of axioms introduced by Kolchinsky [21] yields a valid II measure, in the sense that the measure satisfies the Williams–Beer axioms. Having a more informative relation ⪯ allows us to draw conclusions about information flowing from different sources, and allows for the construction of PID measures that are well defined for more than two sources. In the following, we omit “→ T” from the notation unless we need to refer to it explicitly, with the understanding that the target variable is always some arbitrary discrete random variable T.

3. Channels and the Degradation/Blackwell Order

From an information-theoretic perspective, given two discrete random variables X ∈ 𝒳 and Z ∈ 𝒵, the corresponding conditional distribution p(z|x) corresponds to a discrete memoryless channel with channel matrix K such that K[x,z] = p(z|x) [11]. This matrix is row-stochastic, i.e., K[x,z] ≥ 0 for any x ∈ 𝒳 and z ∈ 𝒵, and ∑_{z∈𝒵} K[x,z] = 1.
The comparison of different channels (equivalently, different stochastic matrices) is an object of study with many applications in different fields [24]. Such investigations address order relations between channels and their properties. One such order, named the degradation order (or Blackwell order) and defined next, was used by Kolchinsky to obtain a particular II measure [21].
Consider the distribution p(y_1, …, y_n, t) and the channels K^(i) between T and each Y_i; that is, K^(i) is a |𝒯| × |𝒴_i| row-stochastic matrix containing the conditional distribution p(y_i|t).
Definition 2.
We say that channel K^(i) is a degradation of channel K^(j), and write K^(i) ⪯_d K^(j) or Y_i ⪯_d Y_j, if there exists a channel K_U from Y_j to Y_i, i.e., a |𝒴_j| × |𝒴_i| row-stochastic matrix, such that K^(i) = K^(j) K_U.
Intuitively, consider two agents, one with access to Y i and the other with access to Y j . The agent with access to Y j has at least as much information about T as the one with access to Y i , as it has access to channel K U , which permits sampling from Y i conditionally on Y j  [19]. Blackwell [25] showed that this is equivalent to saying that, for whatever decision game where the goal is to predict T and for whatever utility function, the agent with access to Y i cannot do better on average than the agent with access to Y j .
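Definition 2 reduces checking Y_i ⪯_d Y_j to a linear feasibility problem in the entries of K_U. The sketch below (our own helper, not code from the paper) tests this with scipy.optimize.linprog:

```python
import numpy as np
from scipy.optimize import linprog

def is_degradation_of(K_i, K_j):
    """Return True if K_i = K_j @ K_U for some row-stochastic K_U (i.e., Y_i ⪯_d Y_j)."""
    K_i, K_j = np.asarray(K_i, float), np.asarray(K_j, float)
    n_i, n_j = K_i.shape[1], K_j.shape[1]
    # Unknown: vec(K_U) in row-major order, length n_j * n_i.
    A_eq_prod = np.kron(K_j, np.eye(n_i))                # enforces K_j @ K_U = K_i
    A_eq_rows = np.kron(np.eye(n_j), np.ones((1, n_i)))  # enforces unit row sums of K_U
    A_eq = np.vstack([A_eq_prod, A_eq_rows])
    b_eq = np.concatenate([K_i.ravel(), np.ones(n_j)])
    res = linprog(c=np.zeros(n_j * n_i), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * (n_j * n_i), method="highs")
    return bool(res.success)

# For the pair K^(3), K^(4) given in Section 6, this returns False: no garbling K_U exists.
```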
Based on the degradation/Blackwell order, Kolchinsky [21] introduced the degradation II measure by plugging the “⪯_d” order into (2):
$$I_{d}(Y_1, \dots, Y_n) := \sup_{Q\,:\,Q \,\preceq_d\, Y_i,\ \forall i \in \{1,\dots,n\}} I(Q;T).$$
As noted by Kolchinsky [21], this II measure has the following operational interpretation. Supposing that n = 2 and considering two agents, 1 and 2, with access to variables Y 1 and Y 2 , respectively, I d ( Y 1 , Y 2 ) is the maximum information that agent 1 (respectively 2) can have with respect to T without being able to do better than agent 2 (respectively 1) on any decision problem that involves guessing T. That is, the degradation II measure quantifies the existence of a dominating strategy for any guessing game.

4. Other Orders and Corresponding II Measures

4.1. The “Less Noisy” Order

Korner and Marton [22] introduced and studied preorders between channels with the same input. We follow most of their definitions, and change others when appropriate. We interchangeably write Y_1 ⪯ Y_2 to mean K^(1) ⪯ K^(2), where K^(1) and K^(2) are the channel matrices defined above.
Before introducing the next channel order, we need to review the notion of a Markov chain [11]. We say that three random variables X_1, X_2, and X_3 form a Markov chain, written X_1 — X_2 — X_3, if the equality p(x_1, x_3 | x_2) = p(x_1 | x_2) p(x_3 | x_2) holds, i.e., if X_1 and X_3 are conditionally independent given X_2. Of course, X_1 — X_2 — X_3 if and only if X_3 — X_2 — X_1.
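As a small illustration (our own helper, not from the paper), the Markov condition can be checked directly from a joint table:

```python
import numpy as np

def is_markov_chain(p_123, tol=1e-9):
    """Check X1 - X2 - X3, i.e., p(x1, x3 | x2) = p(x1 | x2) p(x3 | x2) whenever p(x2) > 0."""
    p = np.asarray(p_123, dtype=float)
    p2 = p.sum(axis=(0, 2))                      # p(x2)
    p12 = p.sum(axis=2)                          # p(x1, x2)
    p23 = p.sum(axis=0)                          # p(x2, x3)
    for x2 in np.flatnonzero(p2 > 0):
        joint = p[:, x2, :] / p2[x2]             # p(x1, x3 | x2)
        prod = np.outer(p12[:, x2], p23[x2, :]) / p2[x2] ** 2   # p(x1|x2) p(x3|x2)
        if not np.allclose(joint, prod, atol=tol):
            return False
    return True
```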
Definition 3.
We say that channel K^(2) is less noisy than channel K^(1), and write K^(1) ⪯_ln K^(2), if for any discrete random variable U with finite support (such that both U — T — Y_1 and U — T — Y_2 hold) we have I(U;Y_1) ≤ I(U;Y_2).
The less noisy order has been used primarily in network information theory to study the capacity regions of broadcast channels [26] and the secrecy capacity of wiretap and eavesdrop channels [27]. The secrecy capacity (C_S) is the maximum rate at which information can be transmitted over a communication channel while keeping the communication secure from eavesdroppers, that is, with zero information leakage [28,29]. It has been shown that C_S > 0 unless K^(2) ⪯_ln K^(1), where C_S is the secrecy capacity of the Wyner wiretap channel, with K^(2) as the main channel and K^(1) as the eavesdropper channel ([27], Corollary 17.11).
Plugging the less noisy order ⪯_ln into (2) yields a new II measure:
$$I_{ln}(Y_1, \dots, Y_n) := \sup_{Q\,:\,Q \,\preceq_{ln}\, Y_i,\ \forall i \in \{1,\dots,n\}} I(Q;T).$$
Intuitively, I_ln(Y_1, …, Y_n) is the most information that a channel K_Q can have about T while every channel K^(i), i = 1, …, n, is less noisy than K_Q — that is, a channel that still leads to a positive secrecy capacity when compared with any of the channels K^(i).

4.2. The “More Capable” Order

The next order we consider, termed “more capable”, has been used in calculating the capacity region of broadcast channels [30] and to help determine whether one system is more secure than another [31]; see the book by Cohen et al. [24] for more applications of the degradation, less noisy, and more capable orders.
Definition 4.
We say that channel K^(2) is more capable than K^(1), and write K^(1) ⪯_mc K^(2), if for any distribution p(t) we have I(T;Y_1) ≤ I(T;Y_2).
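Because the “more capable” relation must hold for every input distribution p(t), it can be refuted numerically by sampling the simplex; the sketch below (our own code and function names) searches for a counterexample and can only falsify, never certify, the relation:

```python
import numpy as np

def mi_from_channel(p_t, K):
    """I(T; output) in bits for input distribution p_t and row-stochastic channel K."""
    p_t, K = np.asarray(p_t, float), np.asarray(K, float)
    p_out = p_t @ K
    mask = (K > 0) & (p_t[:, None] > 0)
    ratio = K / np.where(p_out > 0, p_out, 1.0)
    return float(np.sum((p_t[:, None] * K)[mask] * np.log2(ratio[mask])))

def maybe_more_capable(K1, K2, trials=10000, seed=0):
    """Crude check of K1 ⪯_mc K2: look for a p(t) with I(T;Y1) > I(T;Y2)."""
    rng = np.random.default_rng(seed)
    dim = np.asarray(K1).shape[0]
    for _ in range(trials):
        p_t = rng.dirichlet(np.ones(dim))        # random point in the simplex
        if mi_from_channel(p_t, K1) > mi_from_channel(p_t, K2) + 1e-12:
            return False                         # counterexample found
    return True                                  # no counterexample found (not a proof)
```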
Inserting the “more capable” order into (2) leads to
$$I_{mc}(Y_1, \dots, Y_n) := \sup_{Q\,:\,Q \,\preceq_{mc}\, Y_i,\ \forall i \in \{1,\dots,n\}} I(Q;T),$$
that is, I_mc(Y_1, …, Y_n) is the information about T carried by the ‘largest’ (in the more capable sense) random variable Q that is no larger than any Y_i. Whereas under the degradation order it is guaranteed that, if Y_1 ⪯_d Y_2, agent 2 makes decisions that are at least as good on average as those of agent 1 in whatever decision game, under the “more capable” order such a guarantee is not available. However, we do have the guarantee that, if Y_1 ⪯_mc Y_2, then for any given distribution p(t) agent 2 has at least as much information about T as agent 1. This has an interventional meaning: if we intervene on variable T by changing its distribution p(t) in whichever way we see fit, we have I(Y_1;T) ≤ I(Y_2;T) (assuming that the distribution p(Y_1, …, Y_n, T) can be modeled as a set of channels from T to each Y_i); that is to say, I_mc(Y_1, …, Y_n) is the highest information that a channel K_Q can have about T such that, for any change in p(t), K_Q knows less about T than any Y_i, i = 1, …, n. Because PID is concerned with decomposing a distribution with fixed p(t), the “more capable” measure is concerned with the mechanism by which T generates Y_1, …, Y_n for any p(t), and not with the specific distribution p(t) yielded by p(Y_1, …, Y_n, T).
For the sake of completeness, we could additionally study the II measure that would result from the capacity order. Recall that the capacity of the channel from a variable X to another variable Z, which is only a function of the conditional distribution p ( z | x ) , is defined as [11]
$$C = \max_{p(x)} I(X;Z).$$
Definition 5.
We can write W ⪯_c V if the capacity of V is at least as large as the capacity of W.
Even though it is clear that W ⪯_mc V ⟹ W ⪯_c V, the ⪯_c order does not comply with the first of Kolchinsky's axioms, as the definition of capacity involves the choice of a particular marginal that achieves the maximum in (6), which may not coincide with the marginal corresponding to p(y_1, …, y_n, t). For this reason, we do not define an II measure based on it.
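For reference, the capacity in (6) can be computed with the Blahut–Arimoto algorithm; a minimal sketch (our own implementation, not code from the paper):

```python
import numpy as np

def blahut_arimoto(K, iters=1000, tol=1e-12):
    """Capacity (in bits) of a discrete memoryless channel K[x, z] = p(z|x)."""
    K = np.asarray(K, dtype=float)
    n_x = K.shape[0]
    p = np.full(n_x, 1.0 / n_x)                  # input distribution, start uniform

    def kl_rows(r):
        # D(K[x, :] || r) in bits for every input x, with the 0 log 0 = 0 convention
        return np.array([np.sum(K[x][K[x] > 0] * np.log2(K[x][K[x] > 0] / r[K[x] > 0]))
                         for x in range(n_x)])

    for _ in range(iters):
        r = p @ K                                # current output distribution
        new_p = p * np.exp2(kl_rows(r))          # Blahut-Arimoto update
        new_p /= new_p.sum()
        if np.max(np.abs(new_p - p)) < tol:
            p = new_p
            break
        p = new_p
    return float(p @ kl_rows(p @ K))             # capacity = max_p I(X;Z)

# Sanity check: binary symmetric channel with crossover 0.1 has capacity ≈ 0.531 bits.
print(round(blahut_arimoto([[0.9, 0.1], [0.1, 0.9]]), 3))
```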

4.3. The “Degradation/Supermodularity” Order

In order to introduce the last II measure, we follow the work and notation of Américo et al. [23]. Given two real vectors r and s of dimension n, let r ∨ s := (max(r_1, s_1), …, max(r_n, s_n)) and r ∧ s := (min(r_1, s_1), …, min(r_n, s_n)). Consider an arbitrary channel K, and let K_i be its ith column. From K, we may define a new channel, constructed column by column using the JoinMeet operator ∨∧_{i,j}. Column l of the new channel is defined, for i ≠ j, as
$$(\vee\!\wedge_{i,j} K)_l = \begin{cases} K_i \vee K_j, & \text{if } l = i,\\ K_i \wedge K_j, & \text{if } l = j,\\ K_l, & \text{otherwise.} \end{cases}$$
Américo et al. [23] used this operator to define the two new orders described below. Intuitively, the operator ∨∧_{i,j} makes the rows of the channel matrix more similar to each other: in every row, it places the maximum of the entries in columns i and j in column i, and their minimum in column j. In the following definitions, the subscript s stands for supermodularity, a concept we need not introduce in this work.
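A direct transcription of the operator into Python (the function name is ours); as a check, applying it to columns 1 and 2 (Python indices 0 and 1) of the channel K^(3) used later in Section 6 yields K^(4):

```python
import numpy as np

def join_meet(K, i, j):
    """Apply the JoinMeet operator to columns i and j of channel matrix K:
    column i receives the elementwise max, column j the elementwise min."""
    K = np.asarray(K, dtype=float)
    out = K.copy()
    out[:, i] = np.maximum(K[:, i], K[:, j])
    out[:, j] = np.minimum(K[:, i], K[:, j])
    return out   # rows still sum to 1, since max + min equals the sum of the two entries

K3 = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(join_meet(K3, 0, 1))   # -> [[1, 0], [1, 0], [0.5, 0.5]], which is K^(4)
```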
Definition 6.
We can write W ⪯_s V if there exists a finite collection of tuples (i_k, j_k), k = 1, …, m, such that W = ∨∧_{i_1,j_1}(∨∧_{i_2,j_2}(⋯(∨∧_{i_m,j_m} V))).
Definition 7.
We write W ⪯_ds V if there are m channels U^(1), …, U^(m) such that W ⪯_0 U^(1) ⪯_1 U^(2) ⪯_2 ⋯ ⪯_{m−1} U^(m) ⪯_m V, where each ⪯_i stands for either ⪯_d or ⪯_s. We call this the degradation/supermodularity order.
Using the “degradation/supermodularity” (ds) order, we can define the ds II measure as follows:
$$I_{ds}(Y_1, \dots, Y_n) := \sup_{Q\,:\,Q \,\preceq_{ds}\, Y_i,\ \forall i \in \{1,\dots,n\}} I(Q;T).$$
The ds order was recently introduced in the context of core-concave entropies [23]. Given a core-concave entropy H, the leakage about T through Y_1 is defined as I_H(T;Y_1) = H(T) − H(T|Y_1). In this work, we are mainly concerned with the Shannon entropy H; however, as we elaborate in the future work section at the end of this paper, PID may be applied to other core-concave entropies. Although the operational interpretation of the ds order is not yet clear, it has found applications in privacy/security contexts, as well as in finding the most secure deterministic channel (under certain constraints) [23].

4.4. Relations between Orders

Korner and Marton [22] proved that W ⪯_d V ⟹ W ⪯_ln V ⟹ W ⪯_mc V, and provided examples showing that the reverse implications do not hold in general. As Américo et al. [23] note, the degradation (⪯_d), supermodularity (⪯_s), and degradation/supermodularity (⪯_ds) orders are structural orders, in the sense that they depend only on the conditional probabilities defined by each channel. On the other hand, the less noisy and more capable orders are concerned with information measures resulting from different distributions. It is trivial to see (directly from the definition) that the degradation order implies the degradation/supermodularity order. In turn, Américo et al. [23] showed that the degradation/supermodularity order implies the more capable order. This set of implications is schematically depicted in Figure 1.
For any set of variables Y 1 , , Y n , T , these relations between the orders imply, via the corresponding definitions, that
$$I_{d}(Y_1, \dots, Y_n) \leq I_{ln}(Y_1, \dots, Y_n) \leq I_{mc}(Y_1, \dots, Y_n)$$
and
$$I_{d}(Y_1, \dots, Y_n) \leq I_{ds}(Y_1, \dots, Y_n) \leq I_{mc}(Y_1, \dots, Y_n).$$
These, in turn, imply the following result.
Theorem 2.
The preorders ⪯_ln, ⪯_mc, and ⪯_ds satisfy Kolchinsky's axioms.
Proof. 
Let i ∈ {1, …, n}. Because each of the introduced orders implies the more capable order, they all satisfy the axiom of monotonicity of mutual information. Axiom (ii) is trivially true, as reflexivity is guaranteed by the definition of a preorder. For axiom (iii), the rows of a channel corresponding to a variable C taking a constant value must all be the same (and yield zero mutual information with any target variable T), from which it is clear that C ⪯ Y_i for any Y_i under any of the introduced orders, per the definition of each order. To see that Y_i ⪯ Y = (Y_1, …, Y_n) for the less noisy and the more capable orders, recall that for any U such that U — T — Y_i and U — T — Y it is trivial that I(U;Y_i) ≤ I(U;Y); hence, Y_i ⪯_ln Y. A similar argument shows that Y_i ⪯_mc Y, as I(T;Y_i) ≤ I(T;Y) for every p(t). Finally, to see that Y_i ⪯_ds (Y_1, …, Y_n), note that Y_i ⪯_d (Y_1, …, Y_n) [21]; hence, Y_i ⪯_ds (Y_1, …, Y_n).    □

5. Optimization Problems

We now make some observations about the optimization problems involved in computing the introduced II measures. All of these problems seek to maximize I(Q;T) (under different constraints) as a function of the conditional distribution p(q|t), equivalently, with respect to the channel from T to Q, which we denote as K_Q := K_{Q|T}. For fixed p(t), as is the case in PID, I(Q;T) is a convex function of K_Q ([11], Theorem 2.7.4). As we will see, the admissible region of every problem is a compact set, and because I(Q;T) is a continuous function of the parameters of K_Q, the supremum is achieved; thus, we replace sup with max in what follows.
As noted by Kolchinsky [21], the computation of (3) can be rewritten as an optimization problem using auxiliary variables such that it involves only linear constraints, and because the objective function is convex, its maximum is attained at one of the vertices of the admissible region. The computation of the other measures, however, is not as simple, as shown in the following subsections.

5.1. The “Less Noisy” Order

To solve (4), we can use one of the necessary and sufficient conditions presented by Makur and Polyanskiy ([26], Theorem 1). For instance, let V and W be two channels with input T, and let Δ_{|𝒯|−1} be the probability simplex of the target T. Then, V ⪯_ln W if and only if the inequality
$$\chi^2\big(p(t)W \,\|\, q(t)W\big) \;\geq\; \chi^2\big(p(t)V \,\|\, q(t)V\big)$$
holds for any pair of distributions p(t), q(t) ∈ Δ_{|𝒯|−1}, where χ² denotes the χ²-distance between two vectors: for two vectors u and v of dimension n, χ²(u‖v) = ∑_{i=1}^{n} (u_i − v_i)²/v_i. Notice that p(t)W is the distribution of the output of channel W for input distribution p(t); thus, intuitively, the condition in (10) means that the two output distributions of the less noisy channel are more different from each other than those of the other channel. Hence, computing I_ln(Y_1, …, Y_n) can be formulated as solving the problem
$$\begin{aligned}
\max_{K_Q}\quad & I(Q;T)\\
\text{s.t.}\quad & K_Q \text{ is a stochastic matrix},\\
& \forall\, p(t), q(t) \in \Delta_{|\mathcal{T}|-1},\ \forall\, i \in \{1,\dots,n\}:\\
& \chi^2\big(p(t)K^{(i)} \,\|\, q(t)K^{(i)}\big) \geq \chi^2\big(p(t)K_Q \,\|\, q(t)K_Q\big).
\end{aligned}$$
Although the restriction set is convex, as the χ²-divergence is an f-divergence with convex f [27], the problem is intractable because it involves an infinite (uncountable) number of restrictions. It is possible to construct a finite set S, containing an arbitrary number of sampled distributions p(t) ∈ Δ_{|𝒯|−1}, and define the problem
$$\begin{aligned}
\max_{K_Q}\quad & I(Q;T)\\
\text{s.t.}\quad & K_Q \text{ is a stochastic matrix},\\
& \forall\, p(t), q(t) \in S,\ \forall\, i \in \{1,\dots,n\}:\\
& \chi^2\big(p(t)K^{(i)} \,\|\, q(t)K^{(i)}\big) \geq \chi^2\big(p(t)K_Q \,\|\, q(t)K_Q\big).
\end{aligned}$$
The above problem yields an upper bound on I l n ( Y 1 , , Y n ) .
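The condition in (10) also suggests a simple numerical test: sample pairs p(t), q(t) and look for a violation. A minimal sketch (our own helper functions, not from the paper; sampling can refute the less noisy relation but never certify it):

```python
import numpy as np

def chi2_div(u, v):
    """χ²(u || v) = Σ_i (u_i - v_i)² / v_i, taken as +inf if some v_i = 0 while u_i > 0."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    if np.any((v == 0) & (u > 0)):
        return float("inf")
    mask = v > 0
    return float(np.sum((u[mask] - v[mask]) ** 2 / v[mask]))

def refutes_less_noisy(V, W, p, q):
    """True if the pair (p, q) witnesses that V ⪯_ln W fails, i.e., χ²(pW||qW) < χ²(pV||qV)."""
    V, W, p, q = (np.asarray(a, float) for a in (V, W, p, q))
    return chi2_div(p @ W, q @ W) < chi2_div(p @ V, q @ V)

# Example used later in Section 6: with V = K^(4), W = K^(3), p = [0, 0, 1],
# q = [0.1, 0.1, 0.8], this returns True, so K^(4) is not ⪯_ln K^(3).
```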

5.2. The “More Capable” Order

To compute I m c ( Y 1 , , Y n ) , we can define the problem
$$\begin{aligned}
\max_{K_Q}\quad & I(Q;T)\\
\text{s.t.}\quad & K_Q \text{ is a stochastic matrix},\\
& \forall\, p(t) \in \Delta_{|\mathcal{T}|-1},\ \forall\, i \in \{1,\dots,n\}:\ I(Y_i;T) \geq I(Q;T),
\end{aligned}$$
which again leads to a convex restriction set, as I ( Q ; T ) is a convex function of K Q . We can discretize the problem in the same manner as above to obtain a tractable version
$$\begin{aligned}
\max_{K_Q}\quad & I(Q;T)\\
\text{s.t.}\quad & K_Q \text{ is a stochastic matrix},\\
& \forall\, p(t) \in S,\ \forall\, i \in \{1,\dots,n\}:\ I(Y_i;T) \geq I(Q;T),
\end{aligned}$$
which again yields an upper bound on I m c ( Y 1 , , Y n ) .
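Since the objective I(Q;T) is convex in K_Q, its maximum over the (convex) discretized feasible set sits at an extreme point, so generic convex-programming solvers do not apply directly. As a crude baseline — a sketch under our own assumptions, reusing mi_from_channel from the sketch in Section 4.2 — one can randomly sample candidate channels and keep the best one that passes the sampled constraints; this gives a heuristic estimate of the discretized problem, not a certified bound:

```python
import numpy as np

def approx_Imc_discretized(channels, pt0, n_q=3, n_samples=200, n_candidates=5000, seed=1):
    """Random-search heuristic for the discretized 'more capable' problem:
    maximize I(Q;T) at the PID distribution pt0 over random channels K_Q that satisfy
    I(Y_i;T) >= I(Q;T) for every sampled p(t) in S and every source channel."""
    rng = np.random.default_rng(seed)
    channels = [np.asarray(K, float) for K in channels]
    dim_t = channels[0].shape[0]
    S = [rng.dirichlet(np.ones(dim_t)) for _ in range(n_samples)] + [np.asarray(pt0, float)]
    best = 0.0                                   # the constant channel is always feasible
    for _ in range(n_candidates):
        K_Q = rng.dirichlet(np.ones(n_q), size=dim_t)     # random row-stochastic |T| x n_q
        if all(mi_from_channel(p, K) >= mi_from_channel(p, K_Q)
               for p in S for K in channels):
            best = max(best, mi_from_channel(pt0, K_Q))
    return best
```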

5.3. The “Degradation/Supermodularity” Order

The final introduced measure, I d s ( Y 1 , , Y n ) , is provided by
$$\begin{aligned}
\max_{K_Q}\quad & I(Q;T)\\
\text{s.t.}\quad & K_Q \text{ is a stochastic matrix},\quad \forall\, i:\ K_Q \preceq_{ds} K^{(i)}.
\end{aligned}$$
To the best of our knowledge, there is currently no known condition to check whether K_Q ⪯_ds K^(i).

6. Relation to Existing PID Measures

Griffith et al. [32] introduced a measure of II as
$$I_{\lhd}(Y_1, \dots, Y_n) := \max_{Q} I(Q;T), \quad \text{such that } \forall\, i:\ Q \lhd Y_i,$$
with the order relation ◃ defined by A ◃ B if A = f(B) for some deterministic function f; that is, I_◃ quantifies redundancy as the presence of deterministic relations between the input and the target. If Q is a solution of (15), then there exist functions f_1, …, f_n such that Q = f_i(Y_i), i = 1, …, n, which implies that for all i, T — Y_i — Q is a Markov chain. Therefore, Q is an admissible point of the optimization problem that defines I_d(Y_1, …, Y_n), and we have I_◃(Y_1, …, Y_n) ≤ I_d(Y_1, …, Y_n).
Barrett [33] introduced the so-called minimum mutual information (MMI) measure of bivariate redundancy as
$$I_{\mathrm{MMI}}(Y_1, Y_2) := \min\{ I(T;Y_1),\ I(T;Y_2) \}.$$
It turns out that if ( Y 1 , Y 2 ) is jointly Gaussian and T is univariate, then most of the introduced PIDs in the literature are equivalent to this measure [33]. Furthermore, as noted by Kolchinsky [21], it may be generalized to more than two sources:
$$I_{\mathrm{MMI}}(Y_1, \dots, Y_n) := \sup_{Q} I(Q;T), \quad \text{such that } \forall\, i:\ I(Q;T) \leq I(Y_i;T),$$
which allows us to trivially conclude that for any set of variables Y 1 , , Y n , T ,
$$I_{\lhd}(Y_1, \dots, Y_n) \leq I_{d}(Y_1, \dots, Y_n) \leq I_{mc}(Y_1, \dots, Y_n) \leq I_{\mathrm{MMI}}(Y_1, \dots, Y_n).$$
One of the appeals of measures of II as defined by Kolchinsky [21] is that the underlying preorder determines what counts as intersection (or redundant) information. For example, taking the degradation II measure in the n = 2 case, its solution Q satisfies T ⊥ Q | Y_1 and T ⊥ Q | Y_2; that is, if either Y_1 or Y_2 is known, then Q has no additional information about T. The same is not necessarily the case for the less noisy or the more capable II measures, where the solution Q may have additional information about T even when a source is known. However, the three proposed measures satisfy the property that any solution Q of the optimization problem satisfies
$$\forall\, i \in \{1,\dots,n\},\ \forall\, t \in S_T:\quad I(Y_i; T = t) \geq I(Q; T = t),$$
where S T is the support of T and I ( T = t ; Y i ) refers to the so-called specific information [1,34]. This means that, independent of the outcome of T, Q has less specific information about T = t than any source variable Y i . This can be seen by noting that any of the introduced orders imply the more capable order. This is not the case, for example, for I MMI , which is arguably one of the reasons why it has been criticized for depending only on the amount of information and not on its content [21]. As mentioned, there is not much consensus as to which properties a measure of II should satisfy. The three proposed measures for partial information decomposition do not satisfy the so-called Blackwell property [14,35]:
Definition 8.
An intersection information measure I_∩(Y_1, Y_2) is said to satisfy the Blackwell property if the equivalence Y_1 ⪯_d Y_2 ⟺ I_∩(Y_1, Y_2) = I(T;Y_1) holds.
This definition is equivalent to demanding that Y_1 ⪯_d Y_2 if and only if Y_1 has no unique information about T. Although the (⟹) implication holds for the three proposed measures, the reverse implication does not, as shown by specific examples presented by Korner and Marton [22], which we mention below. If we define the “more capable” property by replacing the degradation order with the more capable order in the original definition of the Blackwell property, then it is clear that measure I_k satisfies the k property, with k referring to any of the three introduced intersection information measures.
In PID, the identity property (IP) has been frequently studied [13]. For this property, let the target T be a copy of the source variables, that is, let T = (Y_1, Y_2). An II measure I_∩ is said to satisfy the IP if
$$I_{\cap}(Y_1, Y_2) = I(Y_1; Y_2).$$
Criticism has been levied against this proposal for being too restrictive [16,36]. A less strict property was introduced by [20] under the name independent identity property (IIP). If the target T is a copy of the input, an II measure is said to satisfy the IIP if
$$I(Y_1; Y_2) = 0 \;\Longrightarrow\; I_{\cap}(Y_1, Y_2) = 0.$$
Note that the IIP is implied by the IP, while the reverse does not hold. It turns out that all the introduced measures, as is the case for the degradation II measure, satisfy the IIP and not the IP, as we show later. This can be seen from (8) and (9), as well as from the fact that I_mc(Y_1, Y_2 → (Y_1, Y_2)) equals 0 if I(Y_1;Y_2) = 0, as we argue now. Consider the distribution where T is a copy of (Y_1, Y_2), as presented in Table 1.
We assume that each of the four events has non-zero probability. In this case, channels K ( 1 ) and K ( 2 ) are provided by
$$K^{(1)} = \begin{pmatrix} 1 & 0\\ 1 & 0\\ 0 & 1\\ 0 & 1 \end{pmatrix}, \qquad K^{(2)} = \begin{pmatrix} 1 & 0\\ 0 & 1\\ 1 & 0\\ 0 & 1 \end{pmatrix}.$$
Note that for any distribution p(t) = (p(0,0), p(0,1), p(1,0), p(1,1)), if p(1,0) = p(1,1) = 0, then I(T;Y_1) = 0, which implies that for any such distribution the solution Q of (12) must satisfy I(Q;T) = 0. Thus, the first and second rows of K_Q must be the same. The same is the case for any distribution p(t) with p(0,0) = p(0,1) = 0 (forcing the third and fourth rows to coincide); on the other hand, if p(0,0) = p(1,0) = 0 or p(1,1) = p(0,1) = 0, then I(T;Y_2) = 0, implying that I(Q;T) = 0 for such distributions as well. Hence, all rows of K_Q must be equal, that is, K_Q must be a channel satisfying Q ⊥ T, yielding I_mc(Y_1, Y_2 → (Y_1, Y_2)) = 0.
Now, recall the Gács–Körner common information [37], defined as
$$C(Y_1 \wedge Y_2) := \sup_{Q} H(Q) \quad \text{s.t.}\quad Q \lhd Y_1,\ Q \lhd Y_2.$$
We use a similar argument, while slightly changing the notation, to show the following result.
Theorem 3.
Let T = (X, Y) be a copy of the source variables; then, I_ln(X,Y) = I_ds(X,Y) = I_mc(X,Y) = C(X ∧ Y).
Proof. 
As shown by Kolchinsky [21], I_d(X,Y) = C(X ∧ Y). Thus, (8) implies that I_mc(X,Y) ≥ C(X ∧ Y). The proof is completed by showing that I_mc(X,Y) ≤ C(X ∧ Y). Construct the bipartite graph with vertex set 𝒳 ∪ 𝒴 and an edge (x,y) whenever p(x,y) > 0. Consider the set of maximally connected components MCC = {CC_1, …, CC_l}, for some l ≥ 1, where each CC_i refers to a maximal set of connected edges. Let CC_i, i ≤ l, be an arbitrary set in MCC. Suppose that the edges (x_1, y_1) and (x_1, y_2) (with y_1 ≠ y_2) are in CC_i. This means that the channels K_X := K_{X|T} and K_Y := K_{Y|T} have rows corresponding to the outcomes T = (x_1, y_1) and T = (x_1, y_2) of the form
$$K_X = \begin{pmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0\\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{pmatrix}, \qquad K_Y = \begin{pmatrix} 0 & \cdots & 0 & 1 & 0 & 0 & \cdots & 0\\ 0 & \cdots & 0 & 0 & 1 & 0 & \cdots & 0 \end{pmatrix},$$
where the single 1 falls in the same column of K_X (that of x_1) and in different columns of K_Y (those of y_1 and y_2).
Choosing p(t) = [0, …, 0, a, 1−a, 0, …, 0], that is, p(T = (x_1,y_1)) = a and p(T = (x_1,y_2)) = 1 − a, we have I(X;T) = 0 for all a ∈ [0,1], which implies that the solution Q must be such that I(Q;T) = 0 for all a ∈ [0,1] (from the definition of the more capable order); this, in turn, implies that the rows of K_Q corresponding to these two outcomes must be the same, to ensure that they yield I(Q;T) = 0 under this set of distributions. We may choose the values of those rows to be the same as the corresponding rows of K_X, that is, rows composed of zeros except for a single one whenever T = (x_1,y_1) or T = (x_1,y_2). On the other hand, if the edges (x_1,y_1) and (x_2,y_1) (with x_1 ≠ x_2) are in CC_i, the same argument leads to the conclusion that the rows of K_Q corresponding to the outcomes T = (x_1,y_1), T = (x_1,y_2), and T = (x_2,y_1) must be the same. Applying this argument to every edge in CC_i, we conclude that the rows of K_Q corresponding to outcomes (x,y) ∈ CC_i must all be the same. Using this argument for every set CC_1, …, CC_l implies that if two edges are in the same CC, the corresponding rows of K_Q must be the same; the rows may vary between different CCs, but within the same CC they must be identical.
We are left with the choice of appropriate rows of K_Q for each CC_i. Because I(Q;T) is maximized by a deterministic relation between Q and T, we choose, as suggested before, a row composed of zeros except for a single one for each CC_i, so that Q is a deterministic function of T. This admissible point Q satisfies Q = f_1(X) and Q = f_2(Y), since X and Y are themselves functions of T under the channel perspective. For this choice of rows, we have
$$I_{mc}(X,Y) \;=\; \sup_{\substack{Q:\; Q \preceq_{mc} X\\ \;\;\;\;\; Q \preceq_{mc} Y}} I(Q;T) \;\;\leq\;\; \sup_{\substack{Q:\; Q = f_1(X)\\ \;\;\;\;\; Q = f_2(Y)}} H(Q) \;=\; C(X \wedge Y),$$
where we have used the fact that I(Q;T) ≤ min{H(Q), H(T)} to conclude that I_mc(X,Y) ≤ C(X ∧ Y). Hence, I_ln(X,Y) = I_ds(X,Y) = I_mc(X,Y) = C(X ∧ Y) when T is a copy of the input.    □
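The connected-component construction used in the proof also gives a direct way to compute the Gács–Körner common information itself; a hedged sketch (our own code, using a simple union–find):

```python
import numpy as np

def gacs_korner(p_xy):
    """C(X ∧ Y): entropy (bits) of the connected components of the bipartite graph
    with an edge (x, y) whenever p(x, y) > 0."""
    p = np.asarray(p_xy, dtype=float)
    n_x, n_y = p.shape
    parent = list(range(n_x + n_y))              # union-find over x-nodes and y-nodes

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for x in range(n_x):
        for y in range(n_y):
            if p[x, y] > 0:
                parent[find(x)] = find(n_x + y)  # merge x with its y-neighbour

    comp_prob = {}
    for x in range(n_x):
        for y in range(n_y):
            if p[x, y] > 0:
                c = find(x)
                comp_prob[c] = comp_prob.get(c, 0.0) + p[x, y]
    probs = np.array(list(comp_prob.values()))
    return float(abs(np.sum(probs * np.log2(probs))))   # entropy of the component labels

# With independent, equiprobable Y1 and Y2 (all four cells positive), the graph is
# connected, so C(X ∧ Y) = 0 — matching the last row of Table 2.
print(gacs_korner(np.full((2, 2), 0.25)))   # -> 0.0
```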
Bertschinger et al. [14] suggested what later became known as the (*) assumption, which states that in the bivariate source case any sensible measure of unique information should only depend on K ( 1 ) , K ( 2 ) , and p ( t ) . It is not clear that this assumption should hold for every PID. It is trivial to see that all the introduced II measures satisfy the (*) assumption.
We conclude with several applications of the proposed measures to famous (bivariate) PID problems; the results are shown in Table 2. Due to the channel design in these problems, computation of the proposed measures is fairly trivial. We assume that the input variables are binary (taking values in { 0 , 1 } ), independent, and equiprobable.
We note that in these fairly simple toy distributions all of the introduced measures yield the same value. This is not surprising when the distribution p(t,y_1,y_2) yields K^(1) = K^(2), which implies that I(T;Y_1) = I(T;Y_2) = I_k(Y_1,Y_2), where k refers to any of the introduced preorders, as is the case in the T = Y_1 AND Y_2 and T = Y_1 + Y_2 examples. Less trivial examples lead to different values across the introduced measures. We present distributions showing that our three introduced measures lead to novel information decompositions by comparing them to the following existing measures: I_◃ from Griffith et al. [32], I_MMI from Barrett [33], I_WB from Williams and Beer [1], I_GH from Griffith and Ho [38], I_Ince from Ince [20], I_FL from Finn and Lizier [39], I_BROJA from Bertschinger et al. [14], I_Harder from Harder et al. [13], and I_dep from James et al. [16]. We used the dit package [40] to compute them, along with the code provided in [21]. Consider counterexample 1 from [22] with p = 0.25, ϵ = 0.2, δ = 0.1, given by
$$K^{(1)} = \begin{pmatrix} 0.25 & 0.75\\ 0.35 & 0.65 \end{pmatrix}, \qquad K^{(2)} = \begin{pmatrix} 0.675 & 0.325\\ 0.745 & 0.255 \end{pmatrix}.$$
These channels satisfy K^(2) ⪯_ln K^(1) and K^(2) ⋠_d K^(1), as shown by Korner and Marton [22]. This is thus an example satisfying I_ln(Y_1,Y_2) = I(T;Y_2) for a given distribution p(t). It is noteworthy that even though there is no degradation order between the two channels, we nonetheless have I_d(Y_1,Y_2) > 0, as there is some non-trivial channel K_Q that satisfies K_Q ⪯_d K^(1) and K_Q ⪯_d K^(2). In Table 3, we present the PIDs obtained under different measures after choosing p(t) = [0.4, 0.6] (which yields I(T;Y_2) ≈ 0.004) and assuming p(t,y_1,y_2) = p(t) p(y_1|t) p(y_2|t).
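For reproducibility, the quantities underlying Table 3 can be recomputed directly from the two channel matrices; a sketch reusing the mutual_information helper from Section 1 (variable names are ours):

```python
import numpy as np

K1 = np.array([[0.25, 0.75], [0.35, 0.65]])
K2 = np.array([[0.675, 0.325], [0.745, 0.255]])
p_t = np.array([0.4, 0.6])

# p(t, y1, y2) = p(t) p(y1|t) p(y2|t)
p_ty1y2 = p_t[:, None, None] * K1[:, :, None] * K2[:, None, :]

I_T_Y1 = mutual_information(p_ty1y2.sum(axis=2))   # I(T; Y1) ≈ 0.008
I_T_Y2 = mutual_information(p_ty1y2.sum(axis=1))   # I(T; Y2) ≈ 0.004
print(I_T_Y1, I_T_Y2, min(I_T_Y1, I_T_Y2))          # the minimum is I_MMI ≈ 0.004
```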
We write I_ds = * here, as we do not yet have a way to find the ‘largest’ Q such that Q ⪯_ds K^(1) and Q ⪯_ds K^(2) (see counterexample 2 from [22] for an example of channels K^(1), K^(2) that satisfy K^(2) ⪯_mc K^(1) but not K^(2) ⪯_ln K^(1), leading to different values of the proposed II measures). An example of channels K^(3), K^(4) that satisfy K^(4) ⪯_ds K^(3) but not K^(4) ⪯_d K^(3) is presented by Américo et al. ([23], page 10), given by
$$K^{(3)} = \begin{pmatrix} 1 & 0\\ 0 & 1\\ 0.5 & 0.5 \end{pmatrix}, \qquad K^{(4)} = \begin{pmatrix} 1 & 0\\ 1 & 0\\ 0.5 & 0.5 \end{pmatrix}.$$
There is no stochastic matrix K_U such that K^(4) = K^(3) K_U, while K^(4) ⪯_ds K^(3) holds because K^(4) = ∨∧_{1,2} K^(3). Using (10), it is possible to check whether there is any less noisy relation between the two channels. (Compute (10) with V = K^(4), W = K^(3), p(t) = [0, 0, 1], and q(t) = [0.1, 0.1, 0.8] to conclude that K^(4) ⋠_ln K^(3); then switch the roles of V and W and set p(t) = [0, 1, 0] and q(t) = [0.1, 0, 0.9] to conclude that K^(3) ⋠_ln K^(4).) We present the decomposition of p(t,y_3,y_4) = p(t) p(y_3|t) p(y_4|t) for the choice p(t) = [0.3, 0.3, 0.4] (which yields I(T;Y_4) ≈ 0.322) in Table 4.
We write I_ln = 0* because we conjecture, based on numerical experiments using (10), that the ‘largest’ channel K_Q satisfying K_Q ⪯_ln K^(3) and K_Q ⪯_ln K^(4) is a channel with I(Q;T) = 0. (We tested all 3 × 3 row-stochastic matrices with entries taking values in {0, 0.1, 0.2, …, 0.9, 1}, against all distributions p(t) and q(t) with entries taking values in the same set.)

7. Conclusions and Future Work

In this paper, we have introduced three new measures of intersection information for the partial information decomposition (PID) framework based on preorders between channels implied by the degradation/Blackwell order. The new measures were obtained from the orders by following the approach recently proposed by Kolchinsky [21]. The main contributions and conclusions of this paper can be summarized as follows:
  • We show that any measure of intersection information obtained from a preorder satisfying Kolchinsky's axioms [21] also satisfies the Williams–Beer axioms [1].
  • As a corollary of the previous result, the proposed measures satisfy the Williams–Beer axioms, and can be extended beyond two sources.
  • We demonstrate that if there is a degradation ordering between the sources, then the proposed measures coincide in their decompositions. Conversely, if there is no degradation ordering between the source variables (i.e., only a weaker ordering holds), the proposed measures can lead to novel, finer information decompositions.
  • We show that while the proposed measures do not satisfy the identity property (IP) [13], they do satisfy the independent identity property (IIP) [20].
  • We formulate the optimization problems that yield the proposed measures, and derive bounds by relating them to existing measures.
Finally, we believe that this paper opens several avenues for future research; thus, we point to several directions that could be pursued in upcoming work:
  • Investigating conditions to verify whether two channels K^(1) and K^(2) satisfy K^(1) ⪯_ds K^(2).
  • Kolchinsky [21] showed that when computing I_d(Y_1, …, Y_n), it is sufficient to consider variables Q with a support size of at most ∑_i |S_{Y_i}| − n + 1, which is a consequence of the admissible region of I_d(Y_1, …, Y_n) being a polytope. The same is not the case for the less noisy or the more capable measures; hence, it is not clear whether it is sufficient to consider Q with the same support size, which could represent a direction for future research.
  • Studying the conditions under which different intersection information measures are continuous.
  • Implementing the introduced measures by addressing their corresponding optimization problems.
  • Considering the usual PID framework, except that instead of decomposing I(T;Y) = H(Y) − H(Y|T), where H denotes the Shannon entropy, other mutual informations induced by different entropy measures could be considered, such as the guessing entropy [41] or the Tsallis entropy [42] (see the work of Américo et al. [23] for other core-concave entropies that may be decomposed under the introduced preorders, as these entropies are consistent with the introduced orders).
  • Another line for future work might be to define measures of union information using the introduced preorders, as suggested by Kolchinsky [21], and to study their properties.
  • As a more long-term research direction, it would be interesting to study how the approach taken in this paper can be extended to quantum information; the fact that partial quantum information can be negative might open up new possibilities or create novel difficulties [43].

Author Contributions

Conceptualization, A.F.C.G.; writing—original draft preparation, A.F.C.G.; writing—review and editing, A.F.C.G. and M.A.T.F.; supervision, M.A.T.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by: FCT—Fundação para a Ciência e a Tecnologia, under grants SFRH/BD/145472/2019 and UIDB/50008/2020; Instituto de Telecomunicações; and the Portuguese Recovery and Resilience Plan, project C645008882-00000055 (NextGenAI, Center for Responsible AI).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank Artemy Kolchinsky for helpful discussions and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Williams, P.; Beer, R. Nonnegative decomposition of multivariate information. arXiv 2010, arXiv:1004.2515.
2. Lizier, J.; Flecker, B.; Williams, P. Towards a synergy-based approach to measuring information modification. In Proceedings of the 2013 IEEE Symposium on Artificial Life (ALIFE), Singapore, 16–19 April 2013; pp. 43–51.
3. Wibral, M.; Finn, C.; Wollstadt, P.; Lizier, J.; Priesemann, V. Quantifying information modification in developing neural networks via partial information decomposition. Entropy 2017, 19, 494.
4. Rauh, J. Secret sharing and shared information. Entropy 2017, 19, 601.
5. Vicente, R.; Wibral, M.; Lindner, M.; Pipa, G. Transfer entropy—A model-free measure of effective connectivity for the neurosciences. J. Comput. Neurosci. 2011, 30, 45–67.
6. Ince, R.; Van Rijsbergen, N.; Thut, G.; Rousselet, G.; Gross, J.; Panzeri, S.; Schyns, P. Tracing the flow of perceptual features in an algorithmic brain network. Sci. Rep. 2015, 5, 17681.
7. Gates, A.; Rocha, L. Control of complex networks requires both structure and dynamics. Sci. Rep. 2016, 6, 24456.
8. Faber, S.; Timme, N.; Beggs, J.; Newman, E. Computation is concentrated in rich clubs of local cortical networks. Netw. Neurosci. 2019, 3, 384–404.
9. James, R.; Ayala, B.; Zakirov, B.; Crutchfield, J. Modes of information flow. arXiv 2018, arXiv:1808.06723.
10. Arellano-Valle, R.; Contreras-Reyes, J.; Genton, M. Shannon Entropy and Mutual Information for Multivariate Skew-Elliptical Distributions. Scand. J. Stat. 2013, 40, 42–62.
11. Cover, T. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 1999.
12. Gutknecht, A.; Wibral, M.; Makkeh, A. Bits and pieces: Understanding information decomposition from part-whole relationships and formal logic. Proc. R. Soc. A 2021, 477, 20210110.
13. Harder, M.; Salge, C.; Polani, D. Bivariate measure of redundant information. Phys. Rev. E 2013, 87, 012130.
14. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying unique information. Entropy 2014, 16, 2161–2183.
15. Griffith, V.; Koch, C. Quantifying synergistic mutual information. In Guided Self-Organization: Inception; Springer: Berlin/Heidelberg, Germany, 2014; pp. 159–190.
16. James, R.; Emenheiser, J.; Crutchfield, J. Unique information via dependency constraints. J. Phys. A Math. Theor. 2018, 52, 014002.
17. Chicharro, D.; Panzeri, S. Synergy and redundancy in dual decompositions of mutual information gain and information loss. Entropy 2017, 19, 71.
18. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J. Shared information—New insights and problems in decomposing information in complex systems. In Proceedings of the European Conference on Complex Systems 2012, Brussels, Belgium, 2–7 September 2012; Springer: Berlin/Heidelberg, Germany, 2013; pp. 251–269.
19. Rauh, J.; Banerjee, P.; Olbrich, E.; Jost, J.; Bertschinger, N.; Wolpert, D. Coarse-graining and the Blackwell order. Entropy 2017, 19, 527.
20. Ince, R. Measuring multivariate redundant information with pointwise common change in surprisal. Entropy 2017, 19, 318.
21. Kolchinsky, A. A Novel Approach to the Partial Information Decomposition. Entropy 2022, 24, 403.
22. Korner, J.; Marton, K. Comparison of two noisy channels. In Topics in Information Theory; Csiszár, I., Elias, P., Eds.; North-Holland Pub. Co.: Amsterdam, The Netherlands, 1977; pp. 411–423.
23. Américo, A.; Khouzani, A.; Malacaria, P. Channel-Supermodular Entropies: Order Theory and an Application to Query Anonymization. Entropy 2021, 24, 39.
24. Cohen, J.; Kempermann, J.; Zbaganu, G. Comparisons of Stochastic Matrices with Applications in Information Theory, Statistics, Economics and Population; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1998.
25. Blackwell, D. Equivalent comparisons of experiments. Ann. Math. Stat. 1953, 24, 265–272.
26. Makur, A.; Polyanskiy, Y. Less noisy domination by symmetric channels. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 2463–2467.
27. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Cambridge University Press: Cambridge, UK, 2011.
28. Wyner, A. The wire-tap channel. Bell Syst. Tech. J. 1975, 54, 1355–1387.
29. Bassi, G.; Piantanida, P.; Shamai, S. The secret key capacity of a class of noisy channels with correlated sources. Entropy 2019, 21, 732.
30. Gamal, A. The capacity of a class of broadcast channels. IEEE Trans. Inf. Theory 1979, 25, 166–169.
31. Clark, D.; Hunt, S.; Malacaria, P. Quantitative information flow, relations and polymorphic types. J. Log. Comput. 2005, 15, 181–199.
32. Griffith, V.; Chong, E.; James, R.; Ellison, C.; Crutchfield, J. Intersection information based on common randomness. Entropy 2014, 16, 1985–2000.
33. Barrett, A. Exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. Phys. Rev. E 2015, 91, 052802.
34. DeWeese, M.; Meister, M. How to measure the information gained from one symbol. Netw. Comput. Neural Syst. 1999, 10, 325.
35. Rauh, J.; Banerjee, P.; Olbrich, E.; Jost, J.; Bertschinger, N. On extractable shared information. Entropy 2017, 19, 328.
36. Rauh, J.; Bertschinger, N.; Olbrich, E.; Jost, J. Reconsidering unique information: Towards a multivariate information decomposition. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014; pp. 2232–2236.
37. Gács, P.; Körner, J. Common information is far less than mutual information. Probl. Control Inf. Theory 1973, 2, 149–162.
38. Griffith, V.; Ho, T. Quantifying redundant information in predicting a target random variable. Entropy 2015, 17, 4644–4653.
39. Finn, C.; Lizier, J. Pointwise Partial Information Decomposition Using the Specificity and Ambiguity Lattices. Entropy 2018, 20, 297.
40. James, R.; Ellison, C.; Crutchfield, J. “dit”: A Python package for discrete information theory. J. Open Source Softw. 2018, 3, 738.
41. Massey, J. Guessing and entropy. In Proceedings of the 1994 IEEE International Symposium on Information Theory, Trondheim, Norway, 27 June–1 July 1994; p. 204.
42. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
43. Horodecki, M.; Oppenheim, J.; Winter, A. Partial quantum information. Nature 2005, 436, 673–676.
Figure 1. Implications satisfied by the orders. The reverse implications do not hold in general.
Table 1. Copy distribution.

T         Y_1   Y_2   p(t, y_1, y_2)
(0, 0)    0     0     p(T = (0, 0))
(0, 1)    0     1     p(T = (0, 1))
(1, 0)    1     0     p(T = (1, 0))
(1, 1)    1     1     p(T = (1, 1))
Table 2. Application of the proposed measures to famous PID problems.

Target             I_◃    I_d    I_ln   I_ds   I_mc   I_MMI
T = Y_1 AND Y_2    0      0.311  0.311  0.311  0.311  0.311
T = Y_1 + Y_2      0      0.5    0.5    0.5    0.5    0.5
T = Y_1            0      0      0      0      0      0
T = (Y_1, Y_2)     0      0      0      0      0      1
Table 3. Different decompositions of p(t, y_1, y_2).

I_◃   I_d    I_ln   I_ds   I_mc   I_MMI  I_WB   I_GH   I_Ince  I_FL   I_BROJA  I_Harder  I_dep
0     0.002  0.004  *      0.004  0.004  0.004  0.002  0.003   0.047  0.003    0.004     0
Table 4. Different decompositions of p(t, y_3, y_4).

I_◃   I_d   I_ln   I_ds    I_mc   I_MMI  I_WB   I_GH   I_Ince  I_FL    I_BROJA  I_Harder  I_dep
0     0     0*     0.322   0.322  0.322  0.193  0      0       0.058   0        0         0
