Orders between Channels and Implications for Partial Information Decomposition

The partial information decomposition (PID) framework is concerned with decomposing the information that a set of random variables has with respect to a target variable into three types of components: redundant, synergistic, and unique. Classical information theory alone does not provide a unique way to decompose information in this manner, and additional assumptions have to be made. Recently, Kolchinsky proposed a new general axiomatic approach to obtain measures of redundant information based on choosing an order relation between information sources (equivalently, order between communication channels). In this paper, we exploit this approach to introduce three new measures of redundant information (and the resulting decompositions) based on well-known preorders between channels, contributing to the enrichment of the PID landscape. We relate the new decompositions to existing ones, study several of their properties, and provide examples illustrating their novelty. As a side result, we prove that any preorder that satisfies Kolchinsky’s axioms yields a decomposition that meets the axioms originally introduced by Williams and Beer when they first proposed PID.


1 Introduction
Williams and Beer [2010] proposed the partial information decomposition (PID) framework as a way to characterize, or analyze, the information that a set of random variables (often called sources) has about another variable (referred to as the target). PID is a useful tool for gathering insights into, and analyzing, the way information is stored, modified, and transmitted within complex systems [Lizier et al., 2013, Wibral et al., 2017]. It has found applications in areas such as cryptography [Rauh, 2017] and neuroscience [Vicente et al., 2011, Ince et al., 2015], with many other potential use cases, such as understanding how information flows in gene regulatory networks [Gates and Rocha, 2016], neural coding [Faber et al., 2019], financial markets [James et al., 2018a], and network design [Arellano-Valle et al., 2013].
Consider the simplest case: a joint distribution p(y_1, y_2, t) describing three random variables: two sources, Y_1 and Y_2, and a target T. Notice that, despite what the names sources and target might suggest, no directionality (causal or otherwise) is assumed. The goal of PID is to decompose the information that Y = (Y_1, Y_2) has about T into the sum of four non-negative quantities: the redundant information R, present in both Y_1 and Y_2; the unique information U_1 (respectively U_2), which only Y_1 (respectively Y_2) has about T; and the synergistic information S, present in the pair (Y_1, Y_2) but not in Y_1 or Y_2 alone. That is, in this case with two variables, the goal is to write

I(T; Y) = R + U_1 + U_2 + S,   (1)

where I(T; Y) is the mutual information between T and Y [Cover, 1999]. Because unique information and redundancy satisfy the relationship U_i = I(T; Y_i) − R (for i ∈ {1, 2}), defining how to compute one of these quantities (R, U_i, or S) is enough to fully determine the others [Williams and Beer, 2010]. As the number of variables grows, the number of terms appearing in the PID of I(T; Y) grows super-exponentially [Gutknecht et al., 2021]. Williams and Beer [2010] suggested a set of axioms that a measure of redundancy should satisfy, and proposed a measure of their own. Those axioms became known as the Williams-Beer axioms, whereas the measure they proposed has subsequently been criticized for capturing only the amount of information, not its content [Harder et al., 2013].
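To make this bookkeeping concrete, the following Python sketch (ours, purely illustrative) computes the four terms of (1) for the T = Y_1 AND Y_2 gate revisited in Section 6, using the minimum of the two source-target mutual informations as a stand-in redundancy R (the MMI measure of Barrett [2015], reviewed below); all function names are our own.

```python
import numpy as np
from collections import defaultdict

def mutual_information(joint):
    """I(A;B) in bits, for a joint distribution given as a 2-D array."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])).sum())

# Joint p(y1, y2, t) for T = Y1 AND Y2, with independent equiprobable inputs.
p = defaultdict(float)
for y1 in (0, 1):
    for y2 in (0, 1):
        p[(y1, y2, y1 & y2)] += 0.25

def pairwise_joint(f, g):
    """2-D joint distribution of the pair (f(y1,y2,t), g(y1,y2,t))."""
    table = defaultdict(float)
    for (y1, y2, t), mass in p.items():
        table[(f(y1, y2, t), g(y1, y2, t))] += mass
    rows = sorted({a for a, _ in table})
    cols = sorted({b for _, b in table})
    return np.array([[table[(a, b)] for b in cols] for a in rows])

I1 = mutual_information(pairwise_joint(lambda y1, y2, t: y1, lambda y1, y2, t: t))
I2 = mutual_information(pairwise_joint(lambda y1, y2, t: y2, lambda y1, y2, t: t))
I12 = mutual_information(pairwise_joint(lambda y1, y2, t: (y1, y2), lambda y1, y2, t: t))

R = min(I1, I2)            # one possible choice of redundancy: the MMI measure
U1, U2 = I1 - R, I2 - R    # unique informations, from U_i = I(T;Y_i) - R
S = I12 - R - U1 - U2      # synergy, from I(T;Y) = R + U1 + U2 + S
print(R, U1, U2, S)        # for AND: R ~ 0.311, U1 = U2 = 0, S = 0.5
```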
Spawned by that initial work, other measures and axioms for information decomposition have been introduced; see, for example, the work by Bertschinger et al. [2014], Griffith and Koch [2014], and James et al. [2018b]. Beyond the Williams-Beer axioms, there is no consensus about which axioms a measure should satisfy, or about whether a given measure captures the information that it should capture. Today, there is still debate about what axioms a measure of redundant information should satisfy, and there is no general agreement on what constitutes an appropriate PID [Chicharro and Panzeri, 2017, James et al., 2018b, Bertschinger et al., 2013, Rauh et al., 2017a, Ince, 2017].
Recently, Kolchinsky [2022] suggested a new general approach to defining measures of redundant information, also known as intersection information (II), the designation we adopt hereinafter. At the core of that approach is the choice of an order relation between information sources (random variables), which allows comparing two sources in terms of how informative they are with respect to the target variable.
In this work, we take previously studied preorders between communication channels, which correspond to preorders between the corresponding output variables in terms of information content with respect to the input. Following Kolchinsky's approach, we show that these orders lead to the definition of new II measures. The rest of the paper is organized as follows. In Sections 2 and 3, we review Kolchinsky's definition of an II measure and the degradation order. In Section 4, we describe some preorders between channels, based on the work by Korner and Marton [1977] and Américo et al. [2021], derive the resulting II measures, and study some of their properties. Section 5 presents and comments on the optimization problems involved in the computation of the proposed measures. In Section 6, we explore the relationships between the new II measures and previous PID approaches, and we apply the proposed measures to some well-known PID problems. Section 7 concludes the paper by pointing out some directions for future work.

2 Kolchinsky's Axioms and Intersection Information
Consider a set of n discrete random variables, Y_1 ∈ 𝒴_1, ..., Y_n ∈ 𝒴_n, called the source variables, and let T ∈ 𝒯 be the (also discrete) target variable, with joint distribution (probability mass function) p(y_1, ..., y_n, t). Let ⊴ denote some order between random variables that satisfies the following axioms, herein referred to as Kolchinsky's axioms [Kolchinsky, 2022]:

(i) monotonicity of mutual information: A ⊴ B implies I(A; T) ≤ I(B; T);
(ii) reflexivity: A ⊴ A, for any A;
(iii) C ⊴ A and A ⊴ (A, B), for any variables A and B,

where C ∈ 𝒞 is any variable taking a constant value with probability one, i.e., with a distribution that is a delta function, or such that 𝒞 is a singleton.
Kolchinsky [2022] showed that such an order can be used to define an II measure via

I_∩^⊴(Y_1, ..., Y_n) = sup_Q { I(Q; T) : Q ⊴ Y_i, i = 1, ..., n },   (2)

and we now show that, if ⊴ is a preorder, then the II measure in (2) satisfies the Williams-Beer axioms [Williams and Beer, 2010, Lizier et al., 2013], thus establishing a strong connection between these formulations. Recall that a relation is a preorder if it satisfies transitivity and reflexivity [Schröder, 2003]. Before stating and proving this result, we recall the Williams-Beer axioms [Lizier et al., 2013].
Definition 1. Let A_1, ..., A_r be an arbitrary number r ≥ 2 of sources. An intersection information measure I_∩ is said to satisfy the Williams-Beer axioms if it satisfies:

1. Symmetry: I_∩(A_1, ..., A_r) is symmetric in the A_i's.
2. Self-redundancy: I_∩(A_1) = I(A_1; T).
3. Monotonicity: I_∩(A_1, ..., A_r) ≤ I_∩(A_1, ..., A_{r−1}), with equality if A_{r−1} ⊆ A_r.
Theorem 1. Let ⊴ be a preorder that satisfies Kolchinsky's axioms, and define its corresponding II measure as in (2). Then this II measure satisfies the Williams-Beer axioms.
Proof. Symmetry and monotonicity follow trivially from the form of (2) (the definition of supremum and of the restriction set). Self-redundancy follows from the reflexivity of the preorder and the monotonicity of mutual information. Now suppose A_{r−1} ⊆ A_r, and let Q be a solution of I_∩(A_1, ..., A_{r−1}), which implies that Q ⊴ A_{r−1}. Since A_{r−1} ⊆ A_r, Kolchinsky's axiom (iii) and the transitivity of the preorder guarantee that Q ⊴ A_r; hence Q is admissible for I_∩(A_1, ..., A_r), yielding I_∩(A_1, ..., A_r) ≥ I_∩(A_1, ..., A_{r−1}), which, combined with monotonicity, gives equality. □

In conclusion, every preorder that satisfies the set of axioms introduced by Kolchinsky [2022] yields a valid II measure, in the sense that the measure satisfies the Williams-Beer axioms. Having a more informative order relation allows us to draw conclusions about the information flowing from different sources, and it allows the construction of PID measures that are well defined for more than two sources. In the following, we omit "→ T" from the notation (unless we need to refer to it explicitly), with the understanding that the target variable is always some arbitrary, discrete random variable T.
3 Channels and the Degradation/Blackwell Order

Given two discrete random variables X ∈ 𝒳 and Z ∈ 𝒵, the corresponding conditional distribution p(z|x) corresponds, from an information-theoretic perspective, to a discrete memoryless channel with channel matrix K, i.e., such that K[x, z] = p(z|x) [Cover, 1999]. This matrix is row-stochastic: K[x, z] ≥ 0, for any x ∈ 𝒳 and z ∈ 𝒵, and ∑_{z∈𝒵} K[x, z] = 1, for any x ∈ 𝒳. The comparison of different channels (equivalently, of different stochastic matrices) is an object of study with many applications in different fields [Cohen et al., 1998]; that study addresses order relations between channels and their properties. One such order, named the degradation order (or Blackwell order) and defined next, was used by Kolchinsky to obtain a particular II measure [Kolchinsky, 2022].
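For concreteness, here is a minimal sketch (our own illustration, not from the paper) of how a channel matrix is obtained from a joint distribution; the function name is hypothetical.

```python
import numpy as np

def channel_from_joint(p_ty):
    """Given a |T| x |Y| array with entries p(t, y), return the row-stochastic
    channel matrix K with K[t, y] = p(y | t)."""
    p_t = p_ty.sum(axis=1, keepdims=True)
    if np.any(p_t == 0):
        raise ValueError("every outcome of T must have positive probability")
    K = p_ty / p_t
    assert np.allclose(K.sum(axis=1), 1.0)  # rows sum to one
    return K

# Example: Y = Y1 for the T = Y1 AND Y2 gate with equiprobable inputs.
p_ty = np.array([[0.50, 0.25],   # p(t=0, y=0), p(t=0, y=1)
                 [0.00, 0.25]])  # p(t=1, y=0), p(t=1, y=1)
print(channel_from_joint(p_ty))  # [[2/3, 1/3], [0, 1]]
```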
Consider the distribution p(y_1, ..., y_n, t) and the channels K^(i) between T and each Y_i; that is, K^(i) is a |𝒯| × |𝒴_i| row-stochastic matrix with the conditional distribution p(y_i|t).

Definition 2. We say that channel K^(i) is a degradation of channel K^(j), and write K^(i) ⊴_d K^(j), if there exists a channel (row-stochastic matrix) K_U such that K^(i) = K^(j) K_U.

Intuitively, consider two agents, one with access to Y_i and the other with access to Y_j. The agent with access to Y_j has at least as much information about T as the one with access to Y_i, because it has access to channel K_U, which allows sampling from Y_i, conditionally on Y_j [Rauh et al., 2017a]. Blackwell [1953] showed that this is equivalent to saying that, for whatever decision game where the goal is to predict T, and for whatever utility function, the agent with access to Y_i cannot do better, on average, than the agent with access to Y_j.
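Checking Definition 2 amounts to a linear feasibility problem in the entries of K_U. A sketch using scipy (ours, not from the paper or from Kolchinsky's code):

```python
import numpy as np
from scipy.optimize import linprog

def is_degradation(K_i, K_j):
    """Return True iff K_i = K_j @ K_U for some row-stochastic K_U,
    i.e., iff K_i is a degradation of K_j (Blackwell order)."""
    n_t, m_j = K_j.shape
    m_i = K_i.shape[1]
    n_vars = m_j * m_i  # entries of K_U, flattened row-major

    A_eq, b_eq = [], []
    # Constraints K_j @ K_U == K_i, entry by entry.
    for t in range(n_t):
        for y in range(m_i):
            row = np.zeros(n_vars)
            for z in range(m_j):
                row[z * m_i + y] = K_j[t, z]
            A_eq.append(row)
            b_eq.append(K_i[t, y])
    # Constraints: each row of K_U sums to one.
    for z in range(m_j):
        row = np.zeros(n_vars)
        row[z * m_i:(z + 1) * m_i] = 1.0
        A_eq.append(row)
        b_eq.append(1.0)

    res = linprog(np.zeros(n_vars), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, 1)] * n_vars, method="highs")
    return res.success

K_perfect = np.eye(2)
K_noisy = np.array([[0.9, 0.1], [0.2, 0.8]])
print(is_degradation(K_noisy, K_perfect))  # True: any channel degrades identity
print(is_degradation(K_perfect, K_noisy))  # False: noise cannot be undone
```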
Based on the degradation/Blackwell order, Kolchinsky [2022] introduced the degradation II measure, obtained by plugging the "⊴_d" order into (2):

I_∩^d(Y_1, ..., Y_n) = sup_Q { I(Q; T) : Q ⊴_d Y_i, i = 1, ..., n }.   (3)

As noted by Kolchinsky [2022], this II measure has the following operational interpretation. Suppose n = 2 and consider agents 1 and 2, with access to variables Y_1 and Y_2, respectively. Then I_∩^d(Y_1, Y_2) is the maximum information that agent 1 (resp. 2) can have w.r.t. T without being able to do better than agent 2 (resp. 1) on any decision problem that involves guessing T. That is, the degradation II measure quantifies the existence of a dominating strategy for any guessing game.
4 Other Orders and Corresponding II Measures

4.1 The "Less Noisy" Order

Korner and Marton [1977] introduced and studied preorders between channels with the same input. We follow most of their definitions and change others when appropriate. We interchangeably write Y_1 ⊴ Y_2 to mean K^(1) ⊴ K^(2), where K^(1) and K^(2) are the channel matrices defined above.
Before introducing the next channel order, we need to review the notion of a Markov chain [Cover, 1999]. We say that three random variables X_1, X_2, X_3 form a Markov chain, and write X_1 → X_2 → X_3, if p(x_1, x_3 | x_2) = p(x_1 | x_2) p(x_3 | x_2), i.e., if X_1 and X_3 are conditionally independent, given X_2. Of course, X_1 → X_2 → X_3 holds if and only if X_3 → X_2 → X_1 holds.

Definition 3. We say that channel K^(2) is less noisy than channel K^(1), and write K^(1) ⊴_ln K^(2), if, for any discrete random variable U with finite support such that both U → T → Y_1 and U → T → Y_2 are Markov chains, we have I(U; Y_1) ≤ I(U; Y_2).

The less noisy order has been primarily used in network information theory to study the capacity regions of broadcast channels [Makur and Polyanskiy, 2017] and the secrecy capacity in the wiretap and eavesdropping channel problems [Csiszár and Körner, 2011]. Secrecy capacity (C_S) is the maximum rate at which information can be transmitted over a communication channel while keeping the communication secure from eavesdroppers, that is, with zero information leakage [Wyner, 1975, Bassi et al., 2019]. It has been shown that C_S > 0 unless K^(2) ⊴_ln K^(1), where C_S is the secrecy capacity of the Wyner wiretap channel with K^(2) as the main channel and K^(1) as the eavesdropper channel [Csiszár and Körner, 2011, Corollary 17.11].
Plugging the less noisy order ⊴_ln into (2) yields a new II measure:

I_∩^ln(Y_1, ..., Y_n) = sup_Q { I(Q; T) : Q ⊴_ln Y_i, i = 1, ..., n }.   (4)

Intuitively, I_∩^ln(Y_1, ..., Y_n) is the most information that a channel K_Q can have about T while every channel K^(i), i = 1, ..., n, is less noisy than K_Q, that is, while K_Q leads to a positive secrecy capacity when compared with any of the channels K^(i).

The "More Capable" Order
The next order we consider, termed "more capable", has been used in characterizing the capacity region of broadcast channels [Gamal, 1979] and in deciding whether one system is more secure than another [Clark et al., 2005]. See the book by Cohen et al. [1998] for more applications of the degradation, less noisy, and more capable orders.

Definition 4. We say that channel K^(2) is more capable than K^(1), and write K^(1) ⊴_mc K^(2), if, for any distribution p(t), we have I(T; Y_1) ≤ I(T; Y_2).

Inserting the "more capable" order into (2) leads to

I_∩^mc(Y_1, ..., Y_n) = sup_Q { I(Q; T) : Q ⊴_mc Y_i, i = 1, ..., n },   (5)

that is, I_∩^mc(Y_1, ..., Y_n) is the information that the 'largest' (in the more capable sense) variable Q that is no larger than any Y_i has w.r.t. T. Whereas under the degradation order it is guaranteed that, if Y_1 ⊴_d Y_2, then agent 2 will make better decisions, on average, in whatever decision game, under the "more capable" order such a guarantee is not available. We do, however, have the guarantee that, if Y_1 ⊴_mc Y_2, then for whatever distribution p(t), agent 2 will always have more information about T than agent 1. This has an interventional meaning: if we intervene on variable T by changing its distribution p(t) in whichever way we see fit, we still have I(Y_1; T) ≤ I(Y_2; T) (assuming that the distribution p(y_1, ..., y_n, t) can be modeled as a set of fixed channels from T to each Y_i). Whereas PID is concerned with decomposing a distribution with a fixed p(t), the "more capable" measure is concerned with the mechanism by which T generates Y_1, ..., Y_n, for any p(t), and not with the specific distribution p(t) induced by p(y_1, ..., y_n, t).
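Because the "more capable" condition quantifies over all p(t), it suggests a simple randomized falsification test (a sketch of ours): a violation at any sampled p(t) disproves K^(1) ⊴_mc K^(2), while the absence of violations is only suggestive, not a proof.

```python
import numpy as np

def mi_from_channel(p_t, K):
    """I(T;Y) in bits for input distribution p_t and channel K[t, y] = p(y|t)."""
    p_ty = p_t[:, None] * K
    p_y = p_ty.sum(axis=0, keepdims=True)
    mask = p_ty > 0
    return float((p_ty[mask] * np.log2(p_ty[mask] / (p_t[:, None] * p_y)[mask])).sum())

def no_mc_violation_found(K1, K2, n_samples=5000, seed=0):
    """Sample p(t) uniformly from the simplex; return False if some sample
    violates I(T;Y1) <= I(T;Y2), i.e., falsifies K1 <=_mc K2."""
    rng = np.random.default_rng(seed)
    n_t = K1.shape[0]
    for _ in range(n_samples):
        p_t = rng.dirichlet(np.ones(n_t))
        if mi_from_channel(p_t, K1) > mi_from_channel(p_t, K2) + 1e-12:
            return False
    return True
```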
For the sake of completeness, we could also consider the II measure that would result from the capacity order. Recall that the capacity of the channel from a variable X to another variable Z, which is only a function of the conditional distribution p(z|x), is defined as [Cover, 1999]

C = max_{p(x)} I(X; Z).   (6)

Definition 5. Write W ⊴_c V if the capacity of V is at least as large as the capacity of W.
Even though it is clear that W ⊴_mc V ⇒ W ⊴_c V, the ⊴_c order does not comply with the first of Kolchinsky's axioms (since the definition of capacity involves the choice of a particular input marginal that achieves the maximum in (6), which may not coincide with the marginal corresponding to p(y_1, ..., y_n, t)), which is why we do not define an II measure based on it.
4.3 The "Degradation/Supermodularity" Order

Consider an arbitrary channel K and let K_l denote its l-th column. From K, we may define a new channel, which we construct column by column using the JoinMeet operator ⊔⊓_{i,j}. For i ≠ j, column l of the new channel ⊔⊓_{i,j} K is defined as

(⊔⊓_{i,j} K)_l = K_i ∨ K_j, if l = i;  K_i ∧ K_j, if l = j;  K_l, otherwise,

where ∨ and ∧ denote element-wise maximum and minimum, respectively. Intuitively, the operator ⊔⊓_{i,j} makes the rows of the channel matrix more similar to each other, by putting in column i all the maxima and in column j all the minima between every pair of elements in columns i and j of every row. Américo et al. [2021] used this operator to define the following two new orders; in these definitions, the s stands for supermodularity, a concept we need not introduce in this work.

Definition 6. We write W ⊴_s V if there exists a finite collection of tuples (i_1, j_1), ..., (i_m, j_m) such that W = ⊔⊓_{i_m, j_m} ··· ⊔⊓_{i_1, j_1} V.

Definition 7. We write W ⊴_ds V if there exists a finite chain W = W_0 ⊴_{k_1} W_1 ⊴_{k_2} ··· ⊴_{k_m} W_m = V, where each ⊴_{k_l} stands for ⊴_d or ⊴_s. We call this the degradation/supermodularity order.
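The ⊔⊓_{i,j} operator is straightforward to implement; a minimal numpy sketch (ours) follows. Note that row sums are preserved, since max(a, b) + min(a, b) = a + b.

```python
import numpy as np

def join_meet(K, i, j):
    """Apply the JoinMeet operator: column i gets the row-wise maxima and
    column j the row-wise minima of columns i and j; other columns unchanged."""
    assert i != j
    out = K.copy()
    out[:, i] = np.maximum(K[:, i], K[:, j])
    out[:, j] = np.minimum(K[:, i], K[:, j])
    return out

K = np.array([[0.3, 0.5, 0.2],
              [0.6, 0.1, 0.3]])
K_new = join_meet(K, 0, 1)   # one supermodular step, so K_new <=_s K
print(K_new)                 # [[0.5, 0.3, 0.2], [0.6, 0.1, 0.3]]
print(K_new.sum(axis=1))     # rows still sum to 1
```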
Using the "degradation/supermodularity" (ds) order, we define the ds II measure as: The ds order was recently introduced in the context of core-concave entropies [Américo et al., 2021].Given a coreconcave entropy H, the leakage about T through Y 1 is defined as I H (T ; Y 1 ) = H(T ) − H(T |Y 1 ) .In this work, we are mainly concerned with Shannon's entropy H, but as we will elaborate in the future work section below, one may apply PID to other core-concave entropies.Although the operational interpretation of the ds order is not yet clear, it has found applications in privacy/security contexts and in finding the most secure deterministic channel (under some constraints) [Américo et al., 2021].

4.4 Relations Between Orders
Korner and Marton [1977] proved that W ⊴_d V ⇒ W ⊴_ln V ⇒ W ⊴_mc V, and gave examples showing that the reverse implications do not hold in general. As Américo et al. [2021] note, the degradation (⊴_d), supermodularity (⊴_s), and degradation/supermodularity (⊴_ds) orders are structural orders, in the sense that they depend only on the conditional probabilities defined by each channel. On the other hand, the less noisy and more capable orders are concerned with information measures resulting from different distributions. It is trivial to see (directly from the definition) that the degradation order implies the degradation/supermodularity order, and Américo et al. [2021] showed that the degradation/supermodularity order implies the more capable order. The set of implications just described is depicted schematically in Figure 1.
Figure 1: Implications satisfied by the orders.The reverse implications do not hold in general.
These relations between the orders imply, for any set of variables Y_1, ..., Y_n, T, via the corresponding definitions, that

I_∩^d(Y_1, ..., Y_n) ≤ I_∩^ln(Y_1, ..., Y_n) ≤ I_∩^mc(Y_1, ..., Y_n),   (8)

I_∩^d(Y_1, ..., Y_n) ≤ I_∩^ds(Y_1, ..., Y_n) ≤ I_∩^mc(Y_1, ..., Y_n).   (9)

These implications, in turn, imply the following result.

Theorem 2. The orders ⊴_d, ⊴_ln, ⊴_mc, and ⊴_ds are preorders satisfying Kolchinsky's axioms; consequently, each of them yields an II measure satisfying the Williams-Beer axioms.
Proof. Let i ∈ {1, ..., n}. Since any of the introduced orders implies the more capable order, all of them satisfy the axiom of monotonicity of mutual information. Axiom 2 holds trivially, since reflexivity is guaranteed by the definition of a preorder. As for axiom 3, the rows of a channel corresponding to a variable C taking a constant value must all be the same (and yield zero mutual information with any target variable T). From this, it is clear that C ⊴ Y_i holds, for any Y_i, under any of the introduced orders, by definition of each order. To see this for the less noisy and the more capable orders, recall that, for any admissible U and any p(t), I(U; C) = 0 ≤ I(U; Y_i) and I(C; T) = 0 ≤ I(Y_i; T), so the defining inequalities hold trivially. □

5 Optimization Problems
We now make some observations about the optimization problems defining the introduced II measures. All problems seek to maximize I(Q; T) (under different constraints) as a function of the conditional distribution p(q|t), equivalently, with respect to the channel from T to Q, which we denote by K_Q; recall that, for fixed p(t), I(Q; T) is a convex function of p(q|t) [Cover, 1999, Theorem 2.7.4]. As we will see, the admissible region of each problem is a compact set and, since I(Q; T) is a continuous function of the parameters of K_Q, the supremum is achieved; thus we replace sup with max.
As noted by Kolchinsky [2022], the computation of (3) can be rewritten, using auxiliary variables, as an optimization problem involving only linear constraints; since the objective function is convex, its maximum is attained at one of the vertices of the (polytope) admissible region. The computation of the other measures is not as simple, as shown in the following subsections.

The "Less Noisy" Order
To solve (4), we may use one of the necessary and sufficient conditions presented by Makur and Polyanskiy [2017, Theorem 1]. For instance, let V and W be two channels with input T, and let Δ_{|T|−1} be the probability simplex of distributions of the target T. Then, V ⊴_ln W if and only if, for any pair of distributions p(t), q(t) ∈ Δ_{|T|−1}, the inequality

χ²(p(t)W ∥ q(t)W) ≥ χ²(p(t)V ∥ q(t)V)   (10)

holds, where χ² denotes the χ²-divergence between two distributions. Notice that p(t)W is the distribution of the output of channel W for input distribution p(t); thus, intuitively, the condition in (10) means that two output distributions of the less noisy channel are more different from each other than the corresponding output distributions of the other channel. Hence, computing I_∩^ln(Y_1, ..., Y_n) can be formulated as solving the problem

max_{K_Q} I(Q; T) subject to χ²(p(t)K^(i) ∥ q(t)K^(i)) ≥ χ²(p(t)K_Q ∥ q(t)K_Q), ∀ p(t), q(t) ∈ Δ_{|T|−1}, i = 1, ..., n.   (11)

Although the restriction set is convex (since the χ²-divergence is an f-divergence, with f convex [Csiszár and Körner, 2011]), the problem is intractable, because it has an infinite (uncountable) number of restrictions. One may instead construct a finite set S of samples p(t) ∈ Δ_{|T|−1} and define the problem

max_{K_Q} I(Q; T) subject to χ²(p(t)K^(i) ∥ q(t)K^(i)) ≥ χ²(p(t)K_Q ∥ q(t)K_Q), ∀ p(t), q(t) ∈ S, i = 1, ..., n,   (12)

which yields an upper bound on I_∩^ln(Y_1, ..., Y_n).
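A sketch (ours) of how the sampled restriction set in (12) can be checked, given a finite list of simplex samples; the helper names are hypothetical:

```python
import numpy as np

def chi2(p, q):
    """Chi-squared divergence between two distribution vectors."""
    mask = q > 0
    if np.any(p[~mask] > 0):
        return np.inf  # p is not absolutely continuous w.r.t. q
    return float(((p[mask] - q[mask]) ** 2 / q[mask]).sum())

def sampled_ln_constraints_hold(K_Q, channels, samples, tol=1e-12):
    """Sampled version of (10)/(12): for every pair of sampled input
    distributions and every channel K^(i), require
    chi2(p K^(i) || q K^(i)) >= chi2(p K_Q || q K_Q)."""
    for a in range(len(samples)):
        for b in range(a + 1, len(samples)):
            p, q = samples[a], samples[b]
            rhs = chi2(p @ K_Q, q @ K_Q)
            if any(chi2(p @ K, q @ K) < rhs - tol for K in channels):
                return False
    return True

rng = np.random.default_rng(0)
samples = [rng.dirichlet(np.ones(2)) for _ in range(100)]
```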

The "More Capable" Order
To compute I_∩^mc(Y_1, ..., Y_n), we define the problem

max_{K_Q} I(Q; T) subject to I_p(Q; T) ≤ I_p(Y_i; T), ∀ p(t) ∈ Δ_{|T|−1}, i = 1, ..., n,   (13)

where I_p denotes mutual information computed under input distribution p(t). This also leads to a convex restriction set, because I(Q; T) is a convex function of K_Q. We discretize the problem in the same manner, over a finite sample set S, to obtain the tractable version

max_{K_Q} I(Q; T) subject to I_p(Q; T) ≤ I_p(Y_i; T), ∀ p(t) ∈ S, i = 1, ..., n,   (14)

which also yields an upper bound on I_∩^mc(Y_1, ..., Y_n).
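As a crude illustration (ours) of the discretized problem (14), the following random search evaluates feasible candidate channels K_Q; a serious implementation would instead exploit the convexity of the restriction set. Note that random search approaches the optimum of (14) from below, while that optimum itself upper-bounds I_∩^mc.

```python
import numpy as np

def mi_from_channel(p_t, K):
    """I(T;Y) in bits (same helper as in the sketch of Section 4)."""
    p_ty = p_t[:, None] * K
    p_y = p_ty.sum(axis=0, keepdims=True)
    mask = p_ty > 0
    return float((p_ty[mask] * np.log2(p_ty[mask] / (p_t[:, None] * p_y)[mask])).sum())

def mc_random_search(p_t, channels, support_q=4, iters=20000,
                     n_simplex=200, seed=1):
    """Random search over channels K_Q for the discretized problem (14):
    maximize I(Q;T) s.t. I_p(Q;T) <= min_i I_p(Y_i;T) for every sampled p."""
    rng = np.random.default_rng(seed)
    n_t = len(p_t)
    samples = [rng.dirichlet(np.ones(n_t)) for _ in range(n_simplex)]
    caps = [min(mi_from_channel(p, K) for K in channels) for p in samples]
    best = 0.0
    for _ in range(iters):
        K_Q = rng.dirichlet(np.ones(support_q), size=n_t)  # random channel
        if all(mi_from_channel(p, K_Q) <= c + 1e-12
               for p, c in zip(samples, caps)):
            best = max(best, mi_from_channel(p_t, K_Q))
    return best
```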

The "Degradation/Supermodularity" Order
The final introduced measure, I_∩^ds(Y_1, ..., Y_n), is given by

max_{K_Q} I(Q; T) subject to K_Q ⊴_ds K^(i), i = 1, ..., n.

To the best of our knowledge, there is currently no known condition to check whether two channels satisfy the ds order, other than the definition itself.

6 Relation with Existing PID Measures

Griffith et al. [2014] introduced a measure of II as

I_∩^G(Y_1, ..., Y_n) = sup_Q { I(Q; T) : Q ⊴_G Y_i, i = 1, ..., n },   (15)

with the order relation defined by A ⊴_G B if A = f(B), for some deterministic function f. That is, I_∩^G quantifies redundancy as the presence of deterministic relations between sources and target. If Q is a solution of (15), then there exist functions f_1, ..., f_n such that Q = f_i(Y_i), i = 1, ..., n, which implies that, for all i, T → Y_i → Q is a Markov chain. Therefore, Q is an admissible point of the optimization problem that defines I_∩^d(Y_1, ..., Y_n), and thus I_∩^G(Y_1, ..., Y_n) ≤ I_∩^d(Y_1, ..., Y_n).

Barrett [2015] introduced the so-called minimum mutual information (MMI) measure of bivariate redundancy as

I_∩^MMI(Y_1, Y_2) = min { I(Y_1; T), I(Y_2; T) }.

It turns out that, if (Y_1, Y_2) is jointly Gaussian and T is univariate, then most of the PIDs introduced in the literature are equivalent to this measure [Barrett, 2015]. As noted by Kolchinsky [2022], the MMI measure may be generalized to more than two sources, which allows us to trivially conclude that, for any set of variables Y_1, ..., Y_n, T,

I_∩^mc(Y_1, ..., Y_n) ≤ I_∩^MMI(Y_1, ..., Y_n) = min_i I(Y_i; T).

One of the appeals of measures of II as defined by Kolchinsky [2022] is that it is the underlying preorder that determines what counts as intersection, or redundant, information. For example, take the degradation II measure in the n = 2 case. Its solution Q satisfies I(Q; T | Y_i) = 0, for i ∈ {1, 2}: when Y_1 or Y_2 is known, Q carries no additional information about T. Such is not necessarily the case for the less noisy or the more capable II measures, that is, their solution Q may have additional information about T even when a source is known. However, the three proposed measures satisfy the following property: any solution Q of the optimization problem satisfies

∀ i ∈ {1, ..., n}, ∀ t ∈ S_T:  I(T = t; Q) ≤ I(T = t; Y_i),

where S_T is the support of T and I(T = t; Y_i) refers to the so-called specific information [Williams and Beer, 2010, DeWeese and Meister, 1999]. That is, independently of the outcome of T, Q has less specific information about T = t than any source variable Y_i. This can be seen by noting that all the introduced orders imply the more capable order. Such is not the case, for example, for I_∩^MMI, which is arguably one of the reasons why it has been criticized for depending only on the amount of information, and not on its content [Kolchinsky, 2022].

As mentioned, there is not much consensus as to which properties a measure of II should satisfy. The three proposed measures do not satisfy the so-called Blackwell property [Bertschinger et al., 2014, Rauh et al., 2017b], which demands that

Y_1 ⊴_d Y_2  ⟺  I_∩(Y_1, Y_2) = I(Y_1; T),

that is, that Y_1 ⊴_d Y_2 if and only if Y_1 has no unique information about T. Although the (⇒) implication holds for the three proposed measures, the reverse implication does not, as shown by specific examples presented by Korner and Marton [1977], which we mention below. If one defines the "more capable property" by replacing the degradation order with the more capable order in the definition of the Blackwell property, then it is clear that measure I_∩^k satisfies the "k property", with k referring to any of the three introduced intersection information measures.

Also often studied in PID is the identity property (IP) [Harder et al., 2013]. Let the target T be a copy of the source variables, that is, T = (Y_1, Y_2). An II measure I_∩ is said to satisfy the IP if

I_∩(Y_1, Y_2) = I(Y_1; Y_2).

This proposal was criticized for being too restrictive [Rauh et al., 2014, James et al., 2018b]. A less strict property was introduced by Ince [2017], under the name independent identity property (IIP): if the target T is a copy of the sources, an II measure is said to satisfy the IIP if

I(Y_1; Y_2) = 0  ⟹  I_∩(Y_1, Y_2) = 0.

Note that the IIP is implied by the IP, but the reverse does not hold. It turns out that all the introduced measures, just like the degradation II measure, satisfy the IIP, but not the IP, as we now show. This follows from (8), (9), and the fact that, when T is a copy of the sources, all the introduced measures coincide with the Gács-Körner common information (Theorem 3 below), as we argue now. Consider the distribution where T is a copy of (Y_1, Y_2), presented in Table 1.
Theorem 3. Let T = (X, Y) be a copy of the source variables. Then I_∩^k(X, Y) = C(X ∧ Y), for k ∈ {ln, mc, ds}, where C(X ∧ Y) denotes the Gács-Körner common information of X and Y.

Proof. As shown by Kolchinsky [2022], I_∩^d(X, Y) = C(X ∧ Y) when T = (X, Y) is a copy of the sources; by (8) and (9), the proof will therefore be complete by showing that I_∩^mc(X, Y) ≤ C(X ∧ Y). Construct the bipartite graph with vertex set 𝒳 ∪ 𝒴 and an edge (x, y) whenever p(x, y) > 0. Consider the set of maximally connected components MCC = {CC_1, ..., CC_l}, for some l ≥ 1, where each CC_i refers to a maximal set of connected edges. Let CC_i, i ≤ l, be an arbitrary element of MCC. Suppose the edges (x_1, y_1) and (x_1, y_2), with y_1 ≠ y_2, are in CC_i. This means that the channels K_X := K_{X|T} and K_Y := K_{Y|T} have rows, corresponding to the outcomes T = (x_1, y_1) and T = (x_1, y_2), of the following form: the two rows of K_X are identical (each places all of its mass on x_1), whereas the two rows of K_Y place all of their mass on y_1 and on y_2, respectively.
Choosing p(t) = [0, ..., 0, a, 1 − a, 0, ..., 0], that is, p(T = (x_1, y_1)) = a and p(T = (x_1, y_2)) = 1 − a, we have I(X; T) = 0 for all a ∈ [0, 1], which implies that the solution Q must satisfy I(Q; T) = 0 for all a ∈ [0, 1] (from the definition of the more capable order); this, in turn, implies that the rows of K_Q corresponding to these two outcomes must be the same, so that they yield I(Q; T) = 0 under this set of distributions. We may choose those rows to equal the corresponding rows of K_X, that is, rows composed of zeros except for a single position, whenever T = (x_1, y_1) or T = (x_1, y_2). On the other hand, if the edges (x_1, y_1) and (x_2, y_1), with x_1 ≠ x_2, are also in CC_i, the same argument leads to the conclusion that the rows of K_Q corresponding to the outcomes T = (x_1, y_1), T = (x_1, y_2), and T = (x_2, y_1) must all be the same. Applying this argument to every edge in CC_i, we conclude that the rows of K_Q corresponding to outcomes (x, y) ∈ CC_i must all be equal. Repeating the argument for every component CC_1, ..., CC_l shows that, if two edges are in the same component, the corresponding rows of K_Q must be the same; the rows may differ between components, but within a component they coincide.
We are left with the choice of appropriate rows of K_Q for each component CC_i. Since I(Q; T) is maximized by a deterministic relation between Q and T, and as suggested above, we choose, for each CC_i, a row composed of zeros except for a single position, so that Q is a deterministic function of T. This admissible point Q satisfies Q = f_1(X) and Q = f_2(Y), since X and Y are also functions of T under the channel perspective. For this choice of rows, we have I(Q; T) = H(Q) = C(X ∧ Y), where we have used the fact that the variable indexing the maximally connected components is exactly the Gács-Körner common random variable of X and Y. □

Bertschinger et al. [2014] suggested what later became known as the (*) assumption, which states that, in the bivariate source case, any sensible measure of unique information should depend only on K^(1), K^(2), and p(t). It is not clear that this assumption should hold for every PID. It is trivial to see that all the introduced II measures satisfy the (*) assumption.
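The construction in the proof is algorithmic: on a copy target, the optimal Q labels the connected components of the bipartite support graph, and C(X ∧ Y) is the entropy of that labelling. A small self-contained sketch (ours):

```python
import numpy as np

def gacs_korner_copy(p_xy):
    """Entropy (bits) of the connected-component labelling of the bipartite
    graph with an edge (x, y) whenever p(x, y) > 0; this is C(X ^ Y)."""
    n_x, n_y = p_xy.shape
    parent = list(range(n_x + n_y))  # union-find over {x's} + {y's}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for x in range(n_x):
        for y in range(n_y):
            if p_xy[x, y] > 0:
                parent[find(x)] = find(n_x + y)  # union x with y

    mass = {}  # probability of each component
    for x in range(n_x):
        for y in range(n_y):
            if p_xy[x, y] > 0:
                r = find(x)
                mass[r] = mass.get(r, 0.0) + p_xy[x, y]
    probs = np.array(list(mass.values()))
    return float(-(probs * np.log2(probs)).sum())

# Two components of mass 1/2 each: C(X ^ Y) = 1 bit, while I(X;Y) ~ 1.085
# bits for this distribution, illustrating the failure of the IP.
p = np.array([[0.2, 0.2, 0.0],
              [0.1, 0.0, 0.0],
              [0.0, 0.0, 0.5]])
print(gacs_korner_copy(p))  # 1.0
```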
We conclude with some applications of the proposed measures to well-known (bivariate) PID problems, with results shown in Table 2. Due to the design of the channels in these problems, the computation of the proposed measures is fairly simple. We assume the input variables are binary (taking values in {0, 1}), independent, and equiprobable.
We note that, in these fairly simple toy distributions, all the introduced measures yield the same value. This is not surprising: when the distribution p(t, y_1, y_2) yields K^(1) = K^(2), it follows that I_∩^k(Y_1, Y_2) = I(Y_1; T) = I(Y_2; T), where k refers to any of the introduced preorders, as is the case in the T = Y_1 AND Y_2 and T = Y_1 + Y_2 examples. Less trivial examples lead to different values across the introduced measures. We now present distributions showing that our three introduced measures lead to novel information decompositions, by comparing them with the following existing measures: I_∩^G from Griffith et al. [2014], I_∩^MMI from Barrett [2015], I_∩^WB from Williams and Beer [2010], I_∩^GH from Griffith and Ho [2015], I_∩^Ince from Ince [2017], I_∩^FL from Finn and Lizier [2018], and I_∩^BROJA [Bertschinger et al., 2014, James et al., 2018b]. We use the dit package [James et al., 2018c] to compute them, as well as the code provided by Kolchinsky [2022]. Consider counterexample 1 of Korner and Marton [1977], with p = 0.25, ε = 0.2, δ = 0.1, given by

K^(1) = [ 0.25 0.75 ; 0.35 0.65 ],   K^(2) = [ 0.675 0.325 ; 0.745 0.255 ].
These channels satisfy K^(2) ⊴_ln K^(1), but not K^(2) ⊴_d K^(1) [Korner and Marton, 1977]. This is thus an example that satisfies, for whatever distribution p(t), I(T; Y_2) ≤ I(T; Y_1). It is noteworthy that, even though there is no degradation order between the two channels, we still have I_∩^ln(Y_1, Y_2) = I(T; Y_2). We present various PIDs under the different measures, after choosing p(t) = [0.4, 0.6] (which yields I(T; Y_2) ≈ 0.004) and assuming p(t, y_1, y_2) = p(t) p(y_1|t) p(y_2|t). We write I_∩^ds = * because we do not yet have a way to find the 'largest' Q such that Q ⊴_ds K^(1) and Q ⊴_ds K^(2). See counterexample 2 of Korner and Marton [1977] for an example of channels K^(1), K^(2) that satisfy K^(2) ⊴_mc K^(1) but not K^(2) ⊴_ln K^(1), leading to different values of the proposed II measures. An example of channels K^(3), K^(4) that satisfy K^(4) ⊴_ds K^(3) but not K^(4) ⊴_d K^(3) can also be given: there is no stochastic matrix K_U such that K^(4) = K^(3) K_U, but K^(4) ⊴_ds K^(3) because K^(4) = ⊔⊓_{1,2} K^(3). Using (10), one may check that there is no less noisy relation between the two channels. We present the decomposition of p(t, y_3, y_4) = p(t) p(y_3|t) p(y_4|t), for the choice p(t) = [0.3, 0.3, 0.4] (which yields I(T; Y_4) ≈ 0.322), in Table 4. We write I_∩^ln = 0* because we conjecture, after some numerical experiments based on (10), that the 'largest' channel that is less noisy than both K^(3) and K^(4) is a channel satisfying I(Q; T) = 0.
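These claims can be probed numerically with the sketches from earlier sections (ours): the LP test finds no degradation relation in either direction between the counterexample channels, while the randomized test finds no violation of K^(2) ⊴_mc K^(1).

```python
import numpy as np

K1 = np.array([[0.25, 0.75], [0.35, 0.65]])
K2 = np.array([[0.675, 0.325], [0.745, 0.255]])

# Reusing is_degradation (Section 3) and no_mc_violation_found (Section 4):
print(is_degradation(K2, K1), is_degradation(K1, K2))  # False, False
print(no_mc_violation_found(K2, K1))                   # True (no violation found)
```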

7 Conclusion and Future Work
In this paper, we introduced three new measures of intersection information for the partial information decomposition (PID) framework, based on preorders between channels that are implied by the degradation/Blackwell order. The new measures were obtained from these orders by following the approach recently proposed by Kolchinsky [2022]. The main contributions and conclusions of the paper can be summarized as follows:

• We showed that a measure of intersection information generated by a preorder satisfying the axioms of Kolchinsky [2022] also satisfies the Williams-Beer axioms [Williams and Beer, 2010].
• As a corollary of the previous result, the proposed measures satisfy the Williams-Beer axioms and can be extended beyond two sources.
• We showed that, if there is a degradation ordering between the sources, the measures coincide in their decomposition; conversely, if there is no degradation ordering (only a weaker ordering) between the source variables, the proposed measures lead to novel, finer information decompositions.
• We showed that the proposed measures do not satisfy the identity property (IP) [Harder et al., 2013], but do satisfy the independent identity property (IIP) [Ince, 2017].
• We formulated the optimization problems that yield the proposed measures and derived bounds by relating them to existing measures.
Finally, we believe this paper opens several avenues for future research, of which we highlight the following:

• Investigating conditions to verify whether two channels, K^(1) and K^(2), satisfy K^(1) ⊴_ds K^(2).
• Kolchinsky [2022] showed that, when computing I_∩^d(Y_1, ..., Y_n), it is sufficient to consider variables Q with support size at most ∑_i |S_{Y_i}| − n + 1, as a consequence of the admissible region of I_∩^d(Y_1, ..., Y_n) being a polytope. Such is not the case with the less noisy or the more capable measures, hence it is not clear whether it suffices to consider Q with the same support size. This is a direction for future research.
• Studying under which conditions the different intersection information measures are continuous.
• Implementing the different introduced measures, by addressing the corresponding optimization problems.
• Considering the usual PID framework, but instead of decomposing I(T; Y) = H(Y) − H(Y|T), where H denotes Shannon's entropy, considering other mutual informations, induced by different entropy measures, such as the guessing entropy [Massey, 1994] or the Tsallis entropy [Tsallis, 1988]. See the work of Américo et al. [2021] for other core-concave entropies that may be decomposed under the introduced preorders, as these entropies are consistent with the introduced orders.
• Defining measures of union information based on the introduced preorders, as suggested by Kolchinsky [2022], and studying their properties.
• As a more long-term research direction, studying how the approach followed in this paper can be extended to quantum information, where the fact that partial quantum information can be negative may open possibilities or create difficulties [Horodecki et al., 2005].

Table 2: Application of the measures to famous PID problems.
References

S Faber, N Timme, J Beggs, and E Newman. Computation is concentrated in rich clubs of local cortical networks. Network Neuroscience, 3(2):384-404, 2019.
R James, B Ayala, B Zakirov, and J Crutchfield. Modes of information flow. arXiv preprint arXiv:1808.06723, 2018a.
R Arellano-Valle, J Contreras-Reyes, and M Genton. Shannon entropy and mutual information for multivariate skew-elliptical distributions. Scandinavian Journal of Statistics, 40(1):42-62, 2013.
T Cover. Elements of Information Theory. John Wiley & Sons, 1999.
A Gutknecht, M Wibral, and A Makkeh. Bits and pieces: Understanding information decomposition from part-whole relationships and formal logic. Proceedings of the Royal Society A, 477(2251):20210110, 2021.
M Harder, C Salge, and D Polani. Bivariate measure of redundant information. Physical Review E, 87(1):012130, 2013.
N Bertschinger, J Rauh, E Olbrich, J Jost, and N Ay. Quantifying unique information. Entropy, 16(4):2161-2183, 2014.
V Griffith and C Koch. Quantifying synergistic mutual information. In Guided Self-Organization: Inception, pages 159-190. Springer, 2014.
R James, J Emenheiser, and J Crutchfield. Unique information via dependency constraints. Journal of Physics A: Mathematical and Theoretical, 52(1):014002, 2018b.
D Chicharro and S Panzeri. Synergy and redundancy in dual decompositions of mutual information gain and information loss. Entropy, 19(2):71, 2017.
N Bertschinger, J Rauh, E Olbrich, and J Jost. Shared information: new insights and problems in decomposing information in complex systems. In Proceedings of the European Conference on Complex Systems 2012, pages 251-269. Springer, 2013.
J Rauh, P Banerjee, E Olbrich, J Jost, N Bertschinger, and D Wolpert. Coarse-graining and the Blackwell order. Entropy, 19(10):527, 2017a.
R Ince. Measuring multivariate redundant information with pointwise common change in surprisal. Entropy, 19(7):318, 2017.
A Kolchinsky. A novel approach to the partial information decomposition. Entropy, 24(3):403, 2022.
J Korner and K Marton. Comparison of two noisy channels. In Topics in Information Theory (I. Csiszár and P. Elias, Eds.), Amsterdam, The Netherlands, pages 411-423, 1977.
A Américo, A Khouzani, and P Malacaria. Channel-supermodular entropies: Order theory and an application to query anonymization. Entropy, 24(1):39, 2021.
B Schröder. Ordered Sets. Springer, 2003.
J Cohen, J Kempermann, and G Zbaganu. Comparisons of Stochastic Matrices with Applications in Information Theory, Statistics, Economics and Population. Springer Science & Business Media, 1998.
D Blackwell. Equivalent comparisons of experiments. The Annals of Mathematical Statistics, pages 265-272, 1953.
A Makur and Y Polyanskiy. Less noisy domination by symmetric channels. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 2463-2467. IEEE, 2017.
I Csiszár and J Körner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2011.
A Wyner. The wire-tap channel. Bell System Technical Journal, 54(8):1355-1387, 1975.
G Bassi, P Piantanida, and S Shamai. The secret key capacity of a class of noisy channels with correlated sources. Entropy, 21(8):732, 2019.
A Gamal. The capacity of a class of broadcast channels. IEEE Transactions on Information Theory, 25(2):166-169, 1979.
D Clark, S Hunt, and P Malacaria. Quantitative information flow, relations and polymorphic types. Journal of Logic and Computation, 15(2):181-199, 2005.
V Griffith, E Chong, R James, C Ellison, and J Crutchfield. Intersection information based on common randomness. Entropy, 16(4):1985-2000, 2014.
A Barrett. Exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. Physical Review E, 91(5):052802, 2015.
M DeWeese and M Meister. How to measure the information gained from one symbol. Network: Computation in Neural Systems, 10(4):325, 1999.