Article

Orders between Channels and Implications for Partial Information Decomposition

by André F. C. Gomes * and Mário A. T. Figueiredo
Instituto de Telecomunicações and LUMLIS (Lisbon ELLIS Unit), Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisboa, Portugal
* Author to whom correspondence should be addressed.
Entropy 2023, 25(7), 975; https://doi.org/10.3390/e25070975
Submission received: 5 May 2023 / Revised: 21 June 2023 / Accepted: 22 June 2023 / Published: 25 June 2023
(This article belongs to the Special Issue Measures of Information III)

Abstract
The partial information decomposition (PID) framework is concerned with decomposing the information that a set of random variables has with respect to a target variable into three types of components: redundant, synergistic, and unique. Classical information theory alone does not provide a unique way to decompose information in this manner, and additional assumptions have to be made. Recently, Kolchinsky proposed a new general axiomatic approach to obtain measures of redundant information based on choosing an order relation between information sources (equivalently, order between communication channels). In this paper, we exploit this approach to introduce three new measures of redundant information (and the resulting decompositions) based on well-known preorders between channels, contributing to the enrichment of the PID landscape. We relate the new decompositions to existing ones, study several of their properties, and provide examples illustrating their novelty. As a side result, we prove that any preorder that satisfies Kolchinsky’s axioms yields a decomposition that meets the axioms originally introduced by Williams and Beer when they first proposed PID.

1. Introduction

Williams and Beer [1] proposed the partial information decomposition (PID) framework as a way to characterize or analyze the information that a set of random variables (often called sources) has about another variable (referred to as the target). PID is a useful tool for gathering insights and analyzing the way information is stored, modified, and transmitted within complex systems [2,3]. It has found applications in areas such as cryptography [4] and neuroscience [5,6], with many other potential use cases, such as in understanding how information flows function in gene regulatory networks [7], neural coding [8], financial markets [9], and network design [10].
Consider the simplest case, that of a three-variable joint distribution p ( y 1 , y 2 , t ) describing three random variables: two sources Y 1 and Y 2 and a target T. Notice that despite the names sources and target, there is no directionality assumption, either causal or otherwise. The goal of PID is to decompose the information that Y = ( Y 1 , Y 2 ) has about T into the sum of four non-negative quantities: the information that is present in both Y 1 and Y 2 , known as redundant information R; the information that only Y 1 (respectively, Y 2 ) has about T, known as unique information U 1 (respectively, U 2 ); and the synergistic information S that is present in the pair ( Y 1 , Y 2 ) , and is not present in either Y 1 or Y 2 alone. That is, in this case with two variables, the goal is to write
$$I(T;Y) = R + U_1 + U_2 + S,$$
where I(T;Y) is the mutual information between T and Y [11]. In this paper, mutual information always refers to Shannon's mutual information, which for two discrete variables X ∈ 𝒳 and Z ∈ 𝒵 is given by
$$I(X;Z) = \sum_{x \in \mathcal{X}} \sum_{z \in \mathcal{Z}} p(x,z)\,\log \frac{p(x,z)}{p(x)\,p(z)},$$
and satisfies the following well-known fundamental properties: I(X;Z) ≥ 0 and I(X;Z) = 0 ⟺ X ⊥ Z (X and Z are independent) [11].
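For concreteness, the following Python sketch (our own helper, not code from the paper) computes this quantity, in bits, from a joint probability table:

```python
import numpy as np

def mutual_information(p_xz):
    """Shannon mutual information I(X;Z) in bits from a joint table p_xz[x, z]."""
    p_xz = np.asarray(p_xz, dtype=float)
    p_x = p_xz.sum(axis=1, keepdims=True)   # marginal p(x)
    p_z = p_xz.sum(axis=0, keepdims=True)   # marginal p(z)
    mask = p_xz > 0                         # 0 log 0 = 0 convention
    return float(np.sum(p_xz[mask] * np.log2(p_xz[mask] / (p_x @ p_z)[mask])))

# Example: X = Z uniform on {0,1} gives I(X;Z) = 1 bit.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))  # -> 1.0
```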
Because unique information and redundancy satisfy the relationship U_i = I(T;Y_i) − R (for i ∈ {1,2}), it turns out that defining how to compute one of these quantities (R, U_i, or S) is enough to fully determine the others [1]. As the number of variables grows, the number of terms appearing in the PID of I(T;Y) grows super-exponentially [12]. Williams and Beer [1] suggested a set of axioms that a measure of redundancy should satisfy and proposed a measure of their own. These axioms have become known as the Williams–Beer axioms, although the measure they proposed has subsequently been criticized for not capturing informational content, only information size [13].
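To make this bookkeeping explicit, here is a hedged sketch — reusing mutual_information from the previous snippet, with our own function name — of how a chosen redundancy value R determines the remaining terms of (1) in the bivariate case:

```python
import numpy as np

def pid_from_redundancy(p_ty1y2, R):
    """Given a joint table p_ty1y2[t, y1, y2] and a redundancy value R,
    return (R, U1, U2, S) using U_i = I(T;Y_i) - R and Eq. (1)."""
    p = np.asarray(p_ty1y2, dtype=float)
    I_T_Y1 = mutual_information(p.sum(axis=2))                 # I(T; Y1)
    I_T_Y2 = mutual_information(p.sum(axis=1))                 # I(T; Y2)
    I_T_Y12 = mutual_information(p.reshape(p.shape[0], -1))    # I(T; (Y1, Y2))
    U1, U2 = I_T_Y1 - R, I_T_Y2 - R
    S = I_T_Y12 - R - U1 - U2
    return R, U1, U2, S
```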
Spawned by their initial work, other measures and axioms for information decomposition have been introduced; see, for example, the work by Bertschinger et al. [14], Griffith and Koch [15], and James et al. [16]. Beyond the Williams–Beer axioms, however, there is no consensus about which axioms a measure should satisfy or whether a given measure captures the information it is meant to capture. Debate continues today about which axioms a measure of redundant information ought to satisfy, and there is no general agreement on what makes for an appropriate PID [16,17,18,19,20].
Recently, Kolchinsky [21] suggested a new general approach to defining measures of redundant information, known as intersection information (II), the designation that we adopt hereinafter. The core of this approach is the choice of an order relation between information sources (random variables), which allows two sources to be compared in terms of how informative they are with respect to the target variable.
In this work, we use previously studied preorders between communication channels, which correspond to preorders between the corresponding output variables in terms of information content with respect to the input. Following Kolchinsky's approach, we show that these orders lead to the definition of new II measures. The rest of the paper is organized as follows. In Section 2 and Section 3, we review Kolchinsky's definition of an II measure and the degradation order. In Section 4, we describe a number of preorders between channels and then, based on the work by Korner and Marton [22] and Américo et al. [23], derive the resulting II measures and study their properties. Section 5 presents comments on the optimization problems involved in computing the proposed measures. In Section 6, we explore the relationships between the new II measures and previous PID approaches, and then apply the proposed II measures to several famous PID problems. Section 7 concludes the paper with suggestions for future work.

2. Kolchinsky’s Axioms and Intersection Information

Consider a set of n discrete random variables Y_1 ∈ 𝒴_1, …, Y_n ∈ 𝒴_n, called the source variables, and let T ∈ 𝒯 be the target variable (also discrete), with joint distribution (probability mass function) p(y_1, …, y_n, t). Let ⪯ denote some preorder between random variables that satisfies the following axioms, herein referred to as Kolchinsky's axioms [21]:
(i) Monotonicity of mutual information w.r.t. T: Y_i ⪯ Y_j ⟹ I(Y_i; T) ≤ I(Y_j; T).
(ii) Reflexivity: Y_i ⪯ Y_i for all Y_i.
(iii) For any Y_i, C ⪯ Y_i ⪯ (Y_1, …, Y_n), where C ∈ 𝒞 is any variable taking a constant value with probability one, i.e., with a distribution that is a delta function or, equivalently, such that 𝒞 is a singleton.
Kolchinsky [21] showed that such an order can be used to define an II measure via
$$I_{\cap}(Y_1, \dots, Y_n \to T) := \sup_{Q\,:\,Q \,\preceq\, Y_i,\ \forall i \in \{1,\dots,n\}} I(Q;T),$$
and we now show that this implies that the II measure in (2) satisfies the Williams–Beer axioms [1,2], establishing a strong connection between these formulations. Before stating and proving this result, we first recall the Williams–Beer axioms [2], where the definition of a source A i is that of a set of random variables, e.g., A 1 = { X 1 , X 2 } .
Definition 1.
Let A_1, …, A_r be an arbitrary number r ≥ 2 of sources. An intersection information measure I_∩ is said to satisfy the Williams–Beer axioms if it satisfies the following:
1. Symmetry: I_∩ is symmetric in the A_i.
2. Self-redundancy: I_∩(A_i) = I(A_i; T).
3. Monotonicity: I_∩(A_1, …, A_{r−1}, A_r) ≤ I_∩(A_1, …, A_{r−1}).
4. Equality for monotonicity: if A_{r−1} ⊆ A_r, then I_∩(A_1, …, A_{r−1}, A_r) = I_∩(A_1, …, A_{r−1}).
Theorem 1.
Let ⪯ be some preorder that satisfies Kolchinsky's axioms, and define its corresponding II measure as in (2). Then, this II measure satisfies the Williams–Beer axioms.
Proof. 
Symmetry and monotonicity follow trivially from the form of (2) (the definition of the supremum and the restriction set). Self-redundancy follows from the reflexivity of the preorder and the monotonicity of mutual information. Now, suppose A_{r−1} ⊆ A_r, and let Q be a solution of I_∩(A_1, …, A_{r−1}), implying that Q ⪯ A_{r−1}. Because A_{r−1} ⊆ A_r, the third Kolchinsky axiom and the transitivity of the preorder ⪯ guarantee that Q ⪯ A_{r−1} ⪯ A_r, meaning that Q is an admissible point of I_∩(A_1, …, A_r). Therefore, I_∩(A_1, …, A_{r−1}, A_r) ≥ I_∩(A_1, …, A_{r−1}), and monotonicity guarantees that I_∩(A_1, …, A_{r−1}, A_r) = I_∩(A_1, …, A_{r−1}).    □
In conclusion, every preorder relation that satisfies the set of axioms introduced by Kolchinsky [21] yields a valid II measure, in the sense that the measure satisfies the Williams–Beer axioms. Having a more informative relation ⪯ allows us to draw conclusions about information flowing from different sources, and allows for the construction of PID measures that are well defined for more than two sources. In the following, we omit “→ T” from the notation unless we need to refer to it explicitly, with the understanding that the target variable is always some arbitrary discrete random variable T.

3. Channels and the Degradation/Blackwell Order

From an information-theoretic perspective, given two discrete random variables X ∈ 𝒳 and Z ∈ 𝒵, the corresponding conditional distribution p(z|x) corresponds to a discrete memoryless channel with channel matrix K such that K[x,z] = p(z|x) [11]. This matrix is row-stochastic, i.e., K[x,z] ≥ 0 for any x ∈ 𝒳 and z ∈ 𝒵, and ∑_{z∈𝒵} K[x,z] = 1.
The comparison of different channels (equivalently, different stochastic matrices) is an object of study with many applications in different fields [24]. Such investigations address order relations between channels and their properties. One such order, named the degradation order (or Blackwell order) and defined next, was used by Kolchinsky to obtain a particular II measure [21].
Consider the distribution p(y_1, …, y_n, t) and the channels K^(i) between T and each Y_i; that is, K^(i) is a |𝒯| × |𝒴_i| row-stochastic matrix containing the conditional distribution p(y_i|t).
Definition 2.
We say that channel K^(i) is a degradation of channel K^(j), and write K^(i) ⪯_d K^(j) or Y_i ⪯_d Y_j, if there exists a channel K_U from Y_j to Y_i, i.e., a |𝒴_j| × |𝒴_i| row-stochastic matrix, such that K^(i) = K^(j) K_U.
Intuitively, consider two agents, one with access to Y i and the other with access to Y j . The agent with access to Y j has at least as much information about T as the one with access to Y i , as it has access to channel K U , which permits sampling from Y i conditionally on Y j  [19]. Blackwell [25] showed that this is equivalent to saying that, for whatever decision game where the goal is to predict T and for whatever utility function, the agent with access to Y i cannot do better on average than the agent with access to Y j .
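Definition 2 reduces checking Y_i ⪯_d Y_j to a linear feasibility problem in the entries of K_U. The sketch below (our own helper, not code from the paper) tests this with scipy.optimize.linprog:

```python
import numpy as np
from scipy.optimize import linprog

def is_degradation_of(K_i, K_j):
    """Return True if K_i = K_j @ K_U for some row-stochastic K_U (i.e., Y_i ⪯_d Y_j)."""
    K_i, K_j = np.asarray(K_i, float), np.asarray(K_j, float)
    n_i, n_j = K_i.shape[1], K_j.shape[1]
    # Unknown: vec(K_U) in row-major order, length n_j * n_i.
    A_eq_prod = np.kron(K_j, np.eye(n_i))                # enforces K_j @ K_U = K_i
    A_eq_rows = np.kron(np.eye(n_j), np.ones((1, n_i)))  # enforces unit row sums of K_U
    A_eq = np.vstack([A_eq_prod, A_eq_rows])
    b_eq = np.concatenate([K_i.ravel(), np.ones(n_j)])
    res = linprog(c=np.zeros(n_j * n_i), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * (n_j * n_i), method="highs")
    return bool(res.success)

# For the pair K^(3), K^(4) given in Section 6, this returns False: no garbling K_U exists.
```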
Based on the degradation/Blackwell order, Kolchinsky [21] introduced the degradation II measure by plugging the “⪯_d” order into (2):
$$I_{d}(Y_1, \dots, Y_n) := \sup_{Q\,:\,Q \,\preceq_d\, Y_i,\ \forall i \in \{1,\dots,n\}} I(Q;T).$$
As noted by Kolchinsky [21], this II measure has the following operational interpretation. Supposing that n = 2 and considering two agents, 1 and 2, with access to variables Y 1 and Y 2 , respectively, I d ( Y 1 , Y 2 ) is the maximum information that agent 1 (respectively 2) can have with respect to T without being able to do better than agent 2 (respectively 1) on any decision problem that involves guessing T. That is, the degradation II measure quantifies the existence of a dominating strategy for any guessing game.

4. Other Orders and Corresponding II Measures

4.1. The “Less Noisy” Order

Korner and Marton [22] introduced and studied preorders between channels with the same input. We follow most of their definitions, and change others when appropriate. We interchangeably write Y_1 ⪯ Y_2 to mean K^(1) ⪯ K^(2), where K^(1) and K^(2) are the channel matrices defined above.
Before introducing the next channel order, we need to review the notion of a Markov chain [11]. We say that three random variables X_1, X_2, and X_3 form a Markov chain, written X_1 — X_2 — X_3, if the equality p(x_1, x_3 | x_2) = p(x_1 | x_2) p(x_3 | x_2) holds, i.e., if X_1 and X_3 are conditionally independent given X_2. Of course, X_1 — X_2 — X_3 if and only if X_3 — X_2 — X_1.
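As a small illustration (our own helper, not from the paper), the Markov condition can be checked directly from a joint table:

```python
import numpy as np

def is_markov_chain(p_123, tol=1e-9):
    """Check X1 - X2 - X3, i.e., p(x1, x3 | x2) = p(x1 | x2) p(x3 | x2) whenever p(x2) > 0."""
    p = np.asarray(p_123, dtype=float)
    p2 = p.sum(axis=(0, 2))                      # p(x2)
    p12 = p.sum(axis=2)                          # p(x1, x2)
    p23 = p.sum(axis=0)                          # p(x2, x3)
    for x2 in np.flatnonzero(p2 > 0):
        joint = p[:, x2, :] / p2[x2]             # p(x1, x3 | x2)
        prod = np.outer(p12[:, x2], p23[x2, :]) / p2[x2] ** 2   # p(x1|x2) p(x3|x2)
        if not np.allclose(joint, prod, atol=tol):
            return False
    return True
```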
Definition 3.
We say that channel K^(2) is less noisy than channel K^(1), and write K^(1) ⪯_ln K^(2), if for any discrete random variable U with finite support (such that both U — T — Y_1 and U — T — Y_2 hold) we have I(U;Y_1) ≤ I(U;Y_2).
The less noisy order has been used primarily in network information theory to study the capacity regions of broadcast channels [26] and the secrecy capacity of wiretap and eavesdrop channels [27]. The secrecy capacity (C_S) is the maximum rate at which information can be transmitted over a communication channel while keeping the communication secure from eavesdroppers, that is, with zero information leakage [28,29]. It has been shown that C_S > 0 unless K^(2) ⪯_ln K^(1), where C_S is the secrecy capacity of the Wyner wiretap channel, with K^(2) as the main channel and K^(1) as the eavesdropper channel ([27], Corollary 17.11).
Plugging the less noisy order ⪯_ln into (2) yields a new II measure:
$$I_{ln}(Y_1, \dots, Y_n) := \sup_{Q\,:\,Q \,\preceq_{ln}\, Y_i,\ \forall i \in \{1,\dots,n\}} I(Q;T).$$
Intuitively, I_ln(Y_1, …, Y_n) is the most information that a channel K_Q can have about T while every channel K^(i), i = 1, …, n, is less noisy than K_Q — that is, a channel that still leads to a positive secrecy capacity when compared with any of the channels K^(i).

4.2. The “More Capable” Order

The next order we consider, termed “more capable”, has been used in calculating the capacity region of broadcast channels [30] and to help determine whether one system is more secure than another [31]; see the book by Cohen et al. [24] for more applications of the degradation, less noisy, and more capable orders.
Definition 4.
We say that channel K^(2) is more capable than K^(1), and write K^(1) ⪯_mc K^(2), if for any distribution p(t) we have I(T;Y_1) ≤ I(T;Y_2).
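Because the “more capable” relation must hold for every input distribution p(t), it can be refuted numerically by sampling the simplex; the sketch below (our own code and function names) searches for a counterexample and can only falsify, never certify, the relation:

```python
import numpy as np

def mi_from_channel(p_t, K):
    """I(T; output) in bits for input distribution p_t and row-stochastic channel K."""
    p_t, K = np.asarray(p_t, float), np.asarray(K, float)
    p_out = p_t @ K
    mask = (K > 0) & (p_t[:, None] > 0)
    ratio = K / np.where(p_out > 0, p_out, 1.0)
    return float(np.sum((p_t[:, None] * K)[mask] * np.log2(ratio[mask])))

def maybe_more_capable(K1, K2, trials=10000, seed=0):
    """Crude check of K1 ⪯_mc K2: look for a p(t) with I(T;Y1) > I(T;Y2)."""
    rng = np.random.default_rng(seed)
    dim = np.asarray(K1).shape[0]
    for _ in range(trials):
        p_t = rng.dirichlet(np.ones(dim))        # random point in the simplex
        if mi_from_channel(p_t, K1) > mi_from_channel(p_t, K2) + 1e-12:
            return False                         # counterexample found
    return True                                  # no counterexample found (not a proof)
```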
Inserting the “more capable” order into (2) leads to
$$I_{mc}(Y_1, \dots, Y_n) := \sup_{Q\,:\,Q \,\preceq_{mc}\, Y_i,\ \forall i \in \{1,\dots,n\}} I(Q;T),$$
that is, I_mc(Y_1, …, Y_n) is the information about T carried by the ‘largest’ (in the more capable sense) random variable Q that is no larger than any Y_i. Whereas under the degradation order it is guaranteed that, if Y_1 ⪯_d Y_2, agent 2 makes decisions that are at least as good on average as those of agent 1 in whatever decision game, under the “more capable” order such a guarantee is not available. However, we do have the guarantee that, if Y_1 ⪯_mc Y_2, then for any given distribution p(t) agent 2 has at least as much information about T as agent 1. This has an interventional meaning: if we intervene on variable T by changing its distribution p(t) in whichever way we see fit, we have I(Y_1;T) ≤ I(Y_2;T) (assuming that the distribution p(Y_1, …, Y_n, T) can be modeled as a set of channels from T to each Y_i); that is to say, I_mc(Y_1, …, Y_n) is the highest information that a channel K_Q can have about T such that, for any change in p(t), K_Q knows less about T than any Y_i, i = 1, …, n. Because PID is concerned with decomposing a distribution with fixed p(t), the “more capable” measure is concerned with the mechanism by which T generates Y_1, …, Y_n for any p(t), and not with the specific distribution p(t) yielded by p(Y_1, …, Y_n, T).
For the sake of completeness, we could additionally study the II measure that would result from the capacity order. Recall that the capacity of the channel from a variable X to another variable Z, which is only a function of the conditional distribution p ( z | x ) , is defined as [11]
$$C = \max_{p(x)} I(X;Z).$$
Definition 5.
We can write W ⪯_c V if the capacity of V is at least as large as the capacity of W.
Even though it is clear that W ⪯_mc V ⟹ W ⪯_c V, the ⪯_c order does not comply with the first of Kolchinsky's axioms, as the definition of capacity involves the choice of a particular marginal that achieves the maximum in (6), which may not coincide with the marginal corresponding to p(y_1, …, y_n, t). For this reason, we do not define an II measure based on it.
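For reference, the capacity in (6) can be computed with the Blahut–Arimoto algorithm; a minimal sketch (our own implementation, not code from the paper):

```python
import numpy as np

def blahut_arimoto(K, iters=1000, tol=1e-12):
    """Capacity (in bits) of a discrete memoryless channel K[x, z] = p(z|x)."""
    K = np.asarray(K, dtype=float)
    n_x = K.shape[0]
    p = np.full(n_x, 1.0 / n_x)                  # input distribution, start uniform

    def kl_rows(r):
        # D(K[x, :] || r) in bits for every input x, with the 0 log 0 = 0 convention
        return np.array([np.sum(K[x][K[x] > 0] * np.log2(K[x][K[x] > 0] / r[K[x] > 0]))
                         for x in range(n_x)])

    for _ in range(iters):
        r = p @ K                                # current output distribution
        new_p = p * np.exp2(kl_rows(r))          # Blahut-Arimoto update
        new_p /= new_p.sum()
        if np.max(np.abs(new_p - p)) < tol:
            p = new_p
            break
        p = new_p
    return float(p @ kl_rows(p @ K))             # capacity = max_p I(X;Z)

# Sanity check: binary symmetric channel with crossover 0.1 has capacity ≈ 0.531 bits.
print(round(blahut_arimoto([[0.9, 0.1], [0.1, 0.9]]), 3))
```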

4.3. The “Degradation/Supermodularity” Order

In order to introduce the last II measure, we follow the work and notation of Américo et al. [23]. Given two real vectors r and s of dimension n, let r ∨ s := (max(r_1, s_1), …, max(r_n, s_n)) and r ∧ s := (min(r_1, s_1), …, min(r_n, s_n)). Consider an arbitrary channel K, and let K_i be its ith column. From K, we may define a new channel, constructed column by column using the JoinMeet operator ∨∧_{i,j}. Column l of the new channel is defined, for i ≠ j, as
$$(\vee\!\wedge_{i,j} K)_l = \begin{cases} K_i \vee K_j, & \text{if } l = i,\\ K_i \wedge K_j, & \text{if } l = j,\\ K_l, & \text{otherwise.} \end{cases}$$
Américo et al. [23] used this operator to define the two new orders described below. Intuitively, the operator ∨∧_{i,j} makes the rows of the channel matrix more similar to each other: in every row, it places the maximum of the entries in columns i and j in column i, and their minimum in column j. In the following definitions, the subscript s stands for supermodularity, a concept we need not introduce in this work.
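A direct transcription of the operator into Python (the function name is ours); as a check, applying it to columns 1 and 2 (Python indices 0 and 1) of the channel K^(3) used later in Section 6 yields K^(4):

```python
import numpy as np

def join_meet(K, i, j):
    """Apply the JoinMeet operator to columns i and j of channel matrix K:
    column i receives the elementwise max, column j the elementwise min."""
    K = np.asarray(K, dtype=float)
    out = K.copy()
    out[:, i] = np.maximum(K[:, i], K[:, j])
    out[:, j] = np.minimum(K[:, i], K[:, j])
    return out   # rows still sum to 1, since max + min equals the sum of the two entries

K3 = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(join_meet(K3, 0, 1))   # -> [[1, 0], [1, 0], [0.5, 0.5]], which is K^(4)
```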
Definition 6.
We can write W ⪯_s V if there exists a finite collection of tuples (i_k, j_k), k = 1, …, m, such that W = ∨∧_{i_1,j_1}(∨∧_{i_2,j_2}(⋯(∨∧_{i_m,j_m} V))).
Definition 7.
We write W ⪯_ds V if there are m channels U^(1), …, U^(m) such that W ⪯_0 U^(1) ⪯_1 U^(2) ⪯_2 ⋯ ⪯_{m−1} U^(m) ⪯_m V, where each ⪯_i stands for either ⪯_d or ⪯_s. We call this the degradation/supermodularity order.
Using the “degradation/supermodularity” (ds) order, we can define the ds II measure as follows:
$$I_{ds}(Y_1, \dots, Y_n) := \sup_{Q\,:\,Q \,\preceq_{ds}\, Y_i,\ \forall i \in \{1,\dots,n\}} I(Q;T).$$
The ds order was recently introduced in the context of core-concave entropies [23]. Given a core-concave entropy H, the leakage about T through Y_1 is defined as I_H(T;Y_1) = H(T) − H(T|Y_1). In this work, we are mainly concerned with the Shannon entropy H; however, as we elaborate in the future work section at the end of this paper, PID may be applied to other core-concave entropies. Although the operational interpretation of the ds order is not yet clear, it has found applications in privacy/security contexts, as well as in finding the most secure deterministic channel (under certain constraints) [23].

4.4. Relations between Orders

Korner and Marton [22] proved that W ⪯_d V ⟹ W ⪯_ln V ⟹ W ⪯_mc V, and provided examples showing that the reverse implications do not hold in general. As Américo et al. [23] note, the degradation (⪯_d), supermodularity (⪯_s), and degradation/supermodularity (⪯_ds) orders are structural orders, in the sense that they depend only on the conditional probabilities defined by each channel. On the other hand, the less noisy and more capable orders are concerned with information measures resulting from different distributions. It is trivial to see (directly from the definition) that the degradation order implies the degradation/supermodularity order. In turn, Américo et al. [23] showed that the degradation/supermodularity order implies the more capable order. This set of implications is schematically depicted in Figure 1.
For any set of variables Y 1 , , Y n , T , these relations between the orders imply, via the corresponding definitions, that
$$I_{d}(Y_1, \dots, Y_n) \leq I_{ln}(Y_1, \dots, Y_n) \leq I_{mc}(Y_1, \dots, Y_n)$$
and
$$I_{d}(Y_1, \dots, Y_n) \leq I_{ds}(Y_1, \dots, Y_n) \leq I_{mc}(Y_1, \dots, Y_n).$$
These, in turn, imply the following result.
Theorem 2.
The preorders ⪯_ln, ⪯_mc, and ⪯_ds satisfy Kolchinsky's axioms.
Proof. 
Let i ∈ {1, …, n}. Because each of the introduced orders implies the more capable order, they all satisfy the axiom of monotonicity of mutual information. Axiom (ii) is trivially true, as reflexivity is guaranteed by the definition of a preorder. For axiom (iii), the rows of a channel corresponding to a variable C taking a constant value must all be the same (and yield zero mutual information with any target variable T), from which it is clear that C ⪯ Y_i for any Y_i under any of the introduced orders, per the definition of each order. To see that Y_i ⪯ Y = (Y_1, …, Y_n) for the less noisy and the more capable orders, recall that for any U such that U — T — Y_i and U — T — Y it is trivial that I(U;Y_i) ≤ I(U;Y); hence, Y_i ⪯_ln Y. A similar argument shows that Y_i ⪯_mc Y, as I(T;Y_i) ≤ I(T;Y) for every p(t). Finally, to see that Y_i ⪯_ds (Y_1, …, Y_n), note that Y_i ⪯_d (Y_1, …, Y_n) [21]; hence, Y_i ⪯_ds (Y_1, …, Y_n).    □

5. Optimization Problems

We now make some observations about the optimization problems involved in computing the introduced II measures. All of these problems seek to maximize I(Q;T) (under different constraints) as a function of the conditional distribution p(q|t), equivalently, with respect to the channel from T to Q, which we denote as K_Q := K_{Q|T}. For fixed p(t), as is the case in PID, I(Q;T) is a convex function of K_Q ([11], Theorem 2.7.4). As we will see, the admissible region of every problem is a compact set, and because I(Q;T) is a continuous function of the parameters of K_Q, the supremum is achieved; thus, we replace sup with max in what follows.
As noted by Kolchinsky [21], the computation of (3) can be rewritten as an optimization problem using auxiliary variables such that it involves only linear constraints, and because the objective function is convex, its maximum is attained at one of the vertices of the admissible region. The computation of the other measures, however, is not as simple, as shown in the following subsections.

5.1. The “Less Noisy” Order

To solve (4), we can use one of the necessary and sufficient conditions presented by Makur and Polyanskiy ([26], Theorem 1). For instance, let V and W be two channels with input T, and let Δ_{|𝒯|−1} be the probability simplex of the target T. Then, V ⪯_ln W if and only if the inequality
$$\chi^2\big(p(t)W \,\|\, q(t)W\big) \;\geq\; \chi^2\big(p(t)V \,\|\, q(t)V\big)$$
holds for any pair of distributions p(t), q(t) ∈ Δ_{|𝒯|−1}, where χ² denotes the χ²-distance between two vectors: for two vectors u and v of dimension n, χ²(u‖v) = ∑_{i=1}^{n} (u_i − v_i)²/v_i. Notice that p(t)W is the distribution of the output of channel W for input distribution p(t); thus, intuitively, the condition in (10) means that the two output distributions of the less noisy channel are more different from each other than those of the other channel. Hence, computing I_ln(Y_1, …, Y_n) can be formulated as solving the problem
$$\begin{aligned}
\max_{K_Q}\quad & I(Q;T)\\
\text{s.t.}\quad & K_Q \text{ is a stochastic matrix},\\
& \forall\, p(t), q(t) \in \Delta_{|\mathcal{T}|-1},\ \forall\, i \in \{1,\dots,n\}:\\
& \chi^2\big(p(t)K^{(i)} \,\|\, q(t)K^{(i)}\big) \geq \chi^2\big(p(t)K_Q \,\|\, q(t)K_Q\big).
\end{aligned}$$
Although the restriction set is convex, as the χ²-divergence is an f-divergence with convex f [27], the problem is intractable because it involves an infinite (uncountable) number of restrictions. It is possible to construct a finite set S, containing an arbitrary number of sampled distributions p(t) ∈ Δ_{|𝒯|−1}, and define the problem
$$\begin{aligned}
\max_{K_Q}\quad & I(Q;T)\\
\text{s.t.}\quad & K_Q \text{ is a stochastic matrix},\\
& \forall\, p(t), q(t) \in S,\ \forall\, i \in \{1,\dots,n\}:\\
& \chi^2\big(p(t)K^{(i)} \,\|\, q(t)K^{(i)}\big) \geq \chi^2\big(p(t)K_Q \,\|\, q(t)K_Q\big).
\end{aligned}$$
The above problem yields an upper bound on I l n ( Y 1 , , Y n ) .
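The condition in (10) also suggests a simple numerical test: sample pairs p(t), q(t) and look for a violation. A minimal sketch (our own helper functions, not from the paper; sampling can refute the less noisy relation but never certify it):

```python
import numpy as np

def chi2_div(u, v):
    """χ²(u || v) = Σ_i (u_i - v_i)² / v_i, taken as +inf if some v_i = 0 while u_i > 0."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    if np.any((v == 0) & (u > 0)):
        return float("inf")
    mask = v > 0
    return float(np.sum((u[mask] - v[mask]) ** 2 / v[mask]))

def refutes_less_noisy(V, W, p, q):
    """True if the pair (p, q) witnesses that V ⪯_ln W fails, i.e., χ²(pW||qW) < χ²(pV||qV)."""
    V, W, p, q = (np.asarray(a, float) for a in (V, W, p, q))
    return chi2_div(p @ W, q @ W) < chi2_div(p @ V, q @ V)

# Example used later in Section 6: with V = K^(4), W = K^(3), p = [0, 0, 1],
# q = [0.1, 0.1, 0.8], this returns True, so K^(4) is not ⪯_ln K^(3).
```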

5.2. The “More Capable” Order

To compute I m c ( Y 1 , , Y n ) , we can define the problem
$$\begin{aligned}
\max_{K_Q}\quad & I(Q;T)\\
\text{s.t.}\quad & K_Q \text{ is a stochastic matrix},\\
& \forall\, p(t) \in \Delta_{|\mathcal{T}|-1},\ \forall\, i \in \{1,\dots,n\}:\ I(Y_i;T) \geq I(Q;T),
\end{aligned}$$
which again leads to a convex restriction set, as I ( Q ; T ) is a convex function of K Q . We can discretize the problem in the same manner as above to obtain a tractable version
$$\begin{aligned}
\max_{K_Q}\quad & I(Q;T)\\
\text{s.t.}\quad & K_Q \text{ is a stochastic matrix},\\
& \forall\, p(t) \in S,\ \forall\, i \in \{1,\dots,n\}:\ I(Y_i;T) \geq I(Q;T),
\end{aligned}$$
which again yields an upper bound on I m c ( Y 1 , , Y n ) .
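Since the objective I(Q;T) is convex in K_Q, its maximum over the (convex) discretized feasible set sits at an extreme point, so generic convex-programming solvers do not apply directly. As a crude baseline — a sketch under our own assumptions, reusing mi_from_channel from the sketch in Section 4.2 — one can randomly sample candidate channels and keep the best one that passes the sampled constraints; this gives a heuristic estimate of the discretized problem, not a certified bound:

```python
import numpy as np

def approx_Imc_discretized(channels, pt0, n_q=3, n_samples=200, n_candidates=5000, seed=1):
    """Random-search heuristic for the discretized 'more capable' problem:
    maximize I(Q;T) at the PID distribution pt0 over random channels K_Q that satisfy
    I(Y_i;T) >= I(Q;T) for every sampled p(t) in S and every source channel."""
    rng = np.random.default_rng(seed)
    channels = [np.asarray(K, float) for K in channels]
    dim_t = channels[0].shape[0]
    S = [rng.dirichlet(np.ones(dim_t)) for _ in range(n_samples)] + [np.asarray(pt0, float)]
    best = 0.0                                   # the constant channel is always feasible
    for _ in range(n_candidates):
        K_Q = rng.dirichlet(np.ones(n_q), size=dim_t)     # random row-stochastic |T| x n_q
        if all(mi_from_channel(p, K) >= mi_from_channel(p, K_Q)
               for p in S for K in channels):
            best = max(best, mi_from_channel(pt0, K_Q))
    return best
```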

5.3. The “Degradation/Supermodularity” Order

The final introduced measure, I d s ( Y 1 , , Y n ) , is provided by
$$\begin{aligned}
\max_{K_Q}\quad & I(Q;T)\\
\text{s.t.}\quad & K_Q \text{ is a stochastic matrix},\quad \forall\, i:\ K_Q \preceq_{ds} K^{(i)}.
\end{aligned}$$
To the best of our knowledge, there is currently no known condition to check whether K_Q ⪯_ds K^(i).

6. Relation to Existing PID Measures

Griffith et al. [32] introduced a measure of II as
$$I_{\lhd}(Y_1, \dots, Y_n) := \max_{Q} I(Q;T), \quad \text{such that } \forall\, i:\ Q \lhd Y_i,$$
with the order relation ◃ defined by A ◃ B if A = f(B) for some deterministic function f; that is, I_◃ quantifies redundancy as the presence of deterministic relations between the input and the target. If Q is a solution of (15), then there exist functions f_1, …, f_n such that Q = f_i(Y_i), i = 1, …, n, which implies that for all i, T — Y_i — Q is a Markov chain. Therefore, Q is an admissible point of the optimization problem that defines I_d(Y_1, …, Y_n), and we have I_◃(Y_1, …, Y_n) ≤ I_d(Y_1, …, Y_n).
Barrett [33] introduced the so-called minimum mutual information (MMI) measure of bivariate redundancy as
$$I_{\mathrm{MMI}}(Y_1, Y_2) := \min\{ I(T;Y_1),\ I(T;Y_2) \}.$$
It turns out that if ( Y 1 , Y 2 ) is jointly Gaussian and T is univariate, then most of the introduced PIDs in the literature are equivalent to this measure [33]. Furthermore, as noted by Kolchinsky [21], it may be generalized to more than two sources:
$$I_{\mathrm{MMI}}(Y_1, \dots, Y_n) := \sup_{Q} I(Q;T), \quad \text{such that } \forall\, i:\ I(Q;T) \leq I(Y_i;T),$$
which allows us to trivially conclude that for any set of variables Y 1 , , Y n , T ,
$$I_{\lhd}(Y_1, \dots, Y_n) \leq I_{d}(Y_1, \dots, Y_n) \leq I_{mc}(Y_1, \dots, Y_n) \leq I_{\mathrm{MMI}}(Y_1, \dots, Y_n).$$
One of the appeals of measures of II as defined by Kolchinsky [21] is that the underlying preorder determines what counts as intersection (or redundant) information. For example, taking the degradation II measure in the n = 2 case, its solution Q satisfies T ⊥ Q | Y_1 and T ⊥ Q | Y_2; that is, if either Y_1 or Y_2 is known, then Q has no additional information about T. The same is not necessarily the case for the less noisy or the more capable II measures, where the solution Q may have additional information about T even when a source is known. However, the three proposed measures satisfy the property that any solution Q of the optimization problem satisfies
$$\forall\, i \in \{1,\dots,n\},\ \forall\, t \in S_T:\quad I(Y_i; T = t) \geq I(Q; T = t),$$
where S T is the support of T and I ( T = t ; Y i ) refers to the so-called specific information [1,34]. This means that, independent of the outcome of T, Q has less specific information about T = t than any source variable Y i . This can be seen by noting that any of the introduced orders imply the more capable order. This is not the case, for example, for I MMI , which is arguably one of the reasons why it has been criticized for depending only on the amount of information and not on its content [21]. As mentioned, there is not much consensus as to which properties a measure of II should satisfy. The three proposed measures for partial information decomposition do not satisfy the so-called Blackwell property [14,35]:
Definition 8.
An intersection information measure I_∩(Y_1, Y_2) is said to satisfy the Blackwell property if the equivalence Y_1 ⪯_d Y_2 ⟺ I_∩(Y_1, Y_2) = I(T;Y_1) holds.
This definition is equivalent to demanding that Y_1 ⪯_d Y_2 if and only if Y_1 has no unique information about T. Although the (⟹) implication holds for the three proposed measures, the reverse implication does not, as shown by specific examples presented by Korner and Marton [22], which we mention below. If we define the “more capable” property by replacing the degradation order with the more capable order in the original definition of the Blackwell property, then it is clear that measure I_k satisfies the k property, with k referring to any of the three introduced intersection information measures.
In PID, the identity property (IP) has been frequently studied [13]. For this property, let the target T be a copy of the source variables, that is, let T = (Y_1, Y_2). An II measure I_∩ is said to satisfy the IP if
$$I_{\cap}(Y_1, Y_2) = I(Y_1; Y_2).$$
Criticism has been levied against this proposal for being too restrictive [16,36]. A less strict property was introduced by [20] under the name independent identity property (IIP). If the target T is a copy of the input, an II measure is said to satisfy the IIP if
$$I(Y_1; Y_2) = 0 \;\Longrightarrow\; I_{\cap}(Y_1, Y_2) = 0.$$
Note that the IIP is implied by the IP, while the reverse does not hold. It turns out that all the introduced measures, as is the case for the degradation II measure, satisfy the IIP and not the IP, as we show later. This can be seen from (8) and (9), as well as from the fact that I_mc(Y_1, Y_2 → (Y_1, Y_2)) equals 0 if I(Y_1;Y_2) = 0, as we argue now. Consider the distribution where T is a copy of (Y_1, Y_2), as presented in Table 1.
We assume that each of the four events has non-zero probability. In this case, channels K ( 1 ) and K ( 2 ) are provided by
$$K^{(1)} = \begin{pmatrix} 1 & 0\\ 1 & 0\\ 0 & 1\\ 0 & 1 \end{pmatrix}, \qquad K^{(2)} = \begin{pmatrix} 1 & 0\\ 0 & 1\\ 1 & 0\\ 0 & 1 \end{pmatrix}.$$
Note that for any distribution p(t) = (p(0,0), p(0,1), p(1,0), p(1,1)), if p(1,0) = p(1,1) = 0, then I(T;Y_1) = 0, which implies that for any such distribution the solution Q of (12) must satisfy I(Q;T) = 0. Thus, the first and second rows of K_Q must be the same. The same is the case for any distribution p(t) with p(0,0) = p(0,1) = 0 (forcing the third and fourth rows to coincide); on the other hand, if p(0,0) = p(1,0) = 0 or p(1,1) = p(0,1) = 0, then I(T;Y_2) = 0, implying that I(Q;T) = 0 for such distributions as well. Hence, all rows of K_Q must be equal, that is, K_Q must be a channel satisfying Q ⊥ T, yielding I_mc(Y_1, Y_2 → (Y_1, Y_2)) = 0.
Now, recall the Gács–Körner common information [37], defined as
$$C(Y_1 \wedge Y_2) := \sup_{Q} H(Q) \quad \text{s.t.}\quad Q \lhd Y_1,\ Q \lhd Y_2.$$
We use a similar argument, while slightly changing the notation, to show the following result.
Theorem 3.
Let T = (X, Y) be a copy of the source variables; then, I_ln(X,Y) = I_ds(X,Y) = I_mc(X,Y) = C(X ∧ Y).
Proof. 
As shown by Kolchinsky [21], I_d(X,Y) = C(X ∧ Y). Thus, (8) implies that I_mc(X,Y) ≥ C(X ∧ Y). The proof is completed by showing that I_mc(X,Y) ≤ C(X ∧ Y). Construct the bipartite graph with vertex set 𝒳 ∪ 𝒴 and an edge (x,y) whenever p(x,y) > 0. Consider the set of maximally connected components MCC = {CC_1, …, CC_l}, for some l ≥ 1, where each CC_i refers to a maximal set of connected edges. Let CC_i, i ≤ l, be an arbitrary set in MCC. Suppose that the edges (x_1, y_1) and (x_1, y_2) (with y_1 ≠ y_2) are in CC_i. This means that the channels K_X := K_{X|T} and K_Y := K_{Y|T} have rows corresponding to the outcomes T = (x_1, y_1) and T = (x_1, y_2) of the form
$$K_X = \begin{pmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0\\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{pmatrix}, \qquad K_Y = \begin{pmatrix} 0 & \cdots & 0 & 1 & 0 & 0 & \cdots & 0\\ 0 & \cdots & 0 & 0 & 1 & 0 & \cdots & 0 \end{pmatrix},$$
where the single 1 falls in the same column of K_X (that of x_1) and in different columns of K_Y (those of y_1 and y_2).
Choosing p(t) = [0, …, 0, a, 1−a, 0, …, 0], that is, p(T = (x_1,y_1)) = a and p(T = (x_1,y_2)) = 1 − a, we have I(X;T) = 0 for all a ∈ [0,1], which implies that the solution Q must be such that I(Q;T) = 0 for all a ∈ [0,1] (from the definition of the more capable order); this, in turn, implies that the rows of K_Q corresponding to these two outcomes must be the same, to ensure that they yield I(Q;T) = 0 under this set of distributions. We may choose the values of those rows to be the same as the corresponding rows of K_X, that is, rows composed of zeros except for a single one whenever T = (x_1,y_1) or T = (x_1,y_2). On the other hand, if the edges (x_1,y_1) and (x_2,y_1) (with x_1 ≠ x_2) are in CC_i, the same argument leads to the conclusion that the rows of K_Q corresponding to the outcomes T = (x_1,y_1), T = (x_1,y_2), and T = (x_2,y_1) must be the same. Applying this argument to every edge in CC_i, we conclude that the rows of K_Q corresponding to outcomes (x,y) ∈ CC_i must all be the same. Using this argument for every set CC_1, …, CC_l implies that if two edges are in the same CC, the corresponding rows of K_Q must be the same; the rows may vary between different CCs, but within the same CC they must be identical.
We are left with the choice of appropriate rows of K_Q for each CC_i. Because I(Q;T) is maximized by a deterministic relation between Q and T, we choose, as suggested before, a row composed of zeros except for a single one for each CC_i, so that Q is a deterministic function of T. This admissible point Q satisfies Q = f_1(X) and Q = f_2(Y), since X and Y are themselves functions of T under the channel perspective. For this choice of rows, we have
$$I_{mc}(X,Y) \;=\; \sup_{\substack{Q:\; Q \preceq_{mc} X\\ \;\;\;\;\; Q \preceq_{mc} Y}} I(Q;T) \;\;\leq\;\; \sup_{\substack{Q:\; Q = f_1(X)\\ \;\;\;\;\; Q = f_2(Y)}} H(Q) \;=\; C(X \wedge Y),$$
where we have used the fact that I(Q;T) ≤ min{H(Q), H(T)} to conclude that I_mc(X,Y) ≤ C(X ∧ Y). Hence, I_ln(X,Y) = I_ds(X,Y) = I_mc(X,Y) = C(X ∧ Y) when T is a copy of the input.    □
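The connected-component construction used in the proof also gives a direct way to compute the Gács–Körner common information itself; a hedged sketch (our own code, using a simple union–find):

```python
import numpy as np

def gacs_korner(p_xy):
    """C(X ∧ Y): entropy (bits) of the connected components of the bipartite graph
    with an edge (x, y) whenever p(x, y) > 0."""
    p = np.asarray(p_xy, dtype=float)
    n_x, n_y = p.shape
    parent = list(range(n_x + n_y))              # union-find over x-nodes and y-nodes

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for x in range(n_x):
        for y in range(n_y):
            if p[x, y] > 0:
                parent[find(x)] = find(n_x + y)  # merge x with its y-neighbour

    comp_prob = {}
    for x in range(n_x):
        for y in range(n_y):
            if p[x, y] > 0:
                c = find(x)
                comp_prob[c] = comp_prob.get(c, 0.0) + p[x, y]
    probs = np.array(list(comp_prob.values()))
    return float(abs(np.sum(probs * np.log2(probs))))   # entropy of the component labels

# With independent, equiprobable Y1 and Y2 (all four cells positive), the graph is
# connected, so C(X ∧ Y) = 0 — matching the last row of Table 2.
print(gacs_korner(np.full((2, 2), 0.25)))   # -> 0.0
```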
Bertschinger et al. [14] suggested what later became known as the (*) assumption, which states that in the bivariate source case any sensible measure of unique information should only depend on K ( 1 ) , K ( 2 ) , and p ( t ) . It is not clear that this assumption should hold for every PID. It is trivial to see that all the introduced II measures satisfy the (*) assumption.
We conclude with several applications of the proposed measures to famous (bivariate) PID problems; the results are shown in Table 2. Due to the channel design in these problems, computation of the proposed measures is fairly trivial. We assume that the input variables are binary (taking values in { 0 , 1 } ), independent, and equiprobable.
We note that in these fairly simple toy distributions all of the introduced measures yield the same value. This is not surprising when the distribution p(t,y_1,y_2) yields K^(1) = K^(2), which implies that I(T;Y_1) = I(T;Y_2) = I_k(Y_1,Y_2), where k refers to any of the introduced preorders, as is the case in the T = Y_1 AND Y_2 and T = Y_1 + Y_2 examples. Less trivial examples lead to different values across the introduced measures. We present distributions showing that our three introduced measures lead to novel information decompositions by comparing them to the following existing measures: I_◃ from Griffith et al. [32], I_MMI from Barrett [33], I_WB from Williams and Beer [1], I_GH from Griffith and Ho [38], I_Ince from Ince [20], I_FL from Finn and Lizier [39], I_BROJA from Bertschinger et al. [14], I_Harder from Harder et al. [13], and I_dep from James et al. [16]. We used the dit package [40] to compute them, along with the code provided in [21]. Consider counterexample 1 from [22] with p = 0.25, ϵ = 0.2, δ = 0.1, given by
$$K^{(1)} = \begin{pmatrix} 0.25 & 0.75\\ 0.35 & 0.65 \end{pmatrix}, \qquad K^{(2)} = \begin{pmatrix} 0.675 & 0.325\\ 0.745 & 0.255 \end{pmatrix}.$$
These channels satisfy K^(2) ⪯_ln K^(1) and K^(2) ⋠_d K^(1), as shown by Korner and Marton [22]. This is thus an example satisfying I_ln(Y_1,Y_2) = I(T;Y_2) for a given distribution p(t). It is noteworthy that even though there is no degradation order between the two channels, we nonetheless have I_d(Y_1,Y_2) > 0, as there is some non-trivial channel K_Q that satisfies K_Q ⪯_d K^(1) and K_Q ⪯_d K^(2). In Table 3, we present the PIDs obtained under different measures after choosing p(t) = [0.4, 0.6] (which yields I(T;Y_2) ≈ 0.004) and assuming p(t,y_1,y_2) = p(t) p(y_1|t) p(y_2|t).
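For reproducibility, the quantities underlying Table 3 can be recomputed directly from the two channel matrices; a sketch reusing the mutual_information helper from Section 1 (variable names are ours):

```python
import numpy as np

K1 = np.array([[0.25, 0.75], [0.35, 0.65]])
K2 = np.array([[0.675, 0.325], [0.745, 0.255]])
p_t = np.array([0.4, 0.6])

# p(t, y1, y2) = p(t) p(y1|t) p(y2|t)
p_ty1y2 = p_t[:, None, None] * K1[:, :, None] * K2[:, None, :]

I_T_Y1 = mutual_information(p_ty1y2.sum(axis=2))   # I(T; Y1) ≈ 0.008
I_T_Y2 = mutual_information(p_ty1y2.sum(axis=1))   # I(T; Y2) ≈ 0.004
print(I_T_Y1, I_T_Y2, min(I_T_Y1, I_T_Y2))          # the minimum is I_MMI ≈ 0.004
```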
We write I_ds = * here, as we do not yet have a way to find the ‘largest’ Q such that Q ⪯_ds K^(1) and Q ⪯_ds K^(2) (see counterexample 2 from [22] for an example of channels K^(1), K^(2) that satisfy K^(2) ⪯_mc K^(1) but not K^(2) ⪯_ln K^(1), leading to different values of the proposed II measures). An example of channels K^(3), K^(4) that satisfy K^(4) ⪯_ds K^(3) but not K^(4) ⪯_d K^(3) is presented by Américo et al. ([23], page 10), given by
$$K^{(3)} = \begin{pmatrix} 1 & 0\\ 0 & 1\\ 0.5 & 0.5 \end{pmatrix}, \qquad K^{(4)} = \begin{pmatrix} 1 & 0\\ 1 & 0\\ 0.5 & 0.5 \end{pmatrix}.$$
There is no stochastic matrix K_U such that K^(4) = K^(3) K_U, while K^(4) ⪯_ds K^(3) holds because K^(4) = ∨∧_{1,2} K^(3). Using (10), it is possible to check whether there is any less noisy relation between the two channels. (Compute (10) with V = K^(4), W = K^(3), p(t) = [0, 0, 1], and q(t) = [0.1, 0.1, 0.8] to conclude that K^(4) ⋠_ln K^(3); then switch the roles of V and W and set p(t) = [0, 1, 0] and q(t) = [0.1, 0, 0.9] to conclude that K^(3) ⋠_ln K^(4).) We present the decomposition of p(t,y_3,y_4) = p(t) p(y_3|t) p(y_4|t) for the choice p(t) = [0.3, 0.3, 0.4] (which yields I(T;Y_4) ≈ 0.322) in Table 4.
We write I_ln = 0* because we conjecture, based on numerical experiments using (10), that the ‘largest’ channel K_Q satisfying K_Q ⪯_ln K^(3) and K_Q ⪯_ln K^(4) is a channel with I(Q;T) = 0. (We tested all 3 × 3 row-stochastic matrices with entries taking values in {0, 0.1, 0.2, …, 0.9, 1}, against all distributions p(t) and q(t) with entries taking values in the same set.)

7. Conclusions and Future Work

In this paper, we have introduced three new measures of intersection information for the partial information decomposition (PID) framework based on preorders between channels implied by the degradation/Blackwell order. The new measures were obtained from the orders by following the approach recently proposed by Kolchinsky [21]. The main contributions and conclusions of this paper can be summarized as follows:
  • We show that any measure of intersection information obtained from a preorder satisfying Kolchinsky's axioms [21] also satisfies the Williams–Beer axioms [1].
  • As a corollary of the previous result, the proposed measures satisfy the Williams–Beer axioms, and can be extended beyond two sources.
  • We demonstrate that if there is a degradation ordering between the sources, then the proposed measures coincide in their decompositions. Conversely, if there is no degradation ordering between the source variables (i.e., only a weaker ordering holds), the proposed measures can lead to novel, finer information decompositions.
  • We show that while the proposed measures do not satisfy the identity property (IP) [13], they do satisfy the independent identity property (IIP) [20].
  • We formulate the optimization problems that yield the proposed measures, and derive bounds by relating them to existing measures.
Finally, we believe that this paper opens several avenues for future research; thus, we point to several directions that could be pursued in upcoming work:
  • Investigating conditions to verify whether two channels K^(1) and K^(2) satisfy K^(1) ⪯_ds K^(2).
  • Kolchinsky [21] showed that when computing I_d(Y_1, …, Y_n), it is sufficient to consider variables Q with a support size of at most ∑_i |S_{Y_i}| − n + 1, which is a consequence of the admissible region of I_d(Y_1, …, Y_n) being a polytope. The same is not the case for the less noisy or the more capable measures; hence, it is not clear whether it is sufficient to consider Q with the same support size, which could represent a direction for future research.
  • Studying the conditions under which different intersection information measures are continuous.
  • Implementing the introduced measures by addressing their corresponding optimization problems.
  • Considering the usual PID framework, except that instead of decomposing I(T;Y) = H(Y) − H(Y|T), where H denotes the Shannon entropy, other mutual informations induced by different entropy measures could be considered, such as the guessing entropy [41] or the Tsallis entropy [42] (see the work of Américo et al. [23] for other core-concave entropies that may be decomposed under the introduced preorders, as these entropies are consistent with the introduced orders).
  • Another line for future work might be to define measures of union information using the introduced preorders, as suggested by Kolchinsky [21], and to study their properties.
  • As a more long-term research direction, it would be interesting to study how the approach taken in this paper can be extended to quantum information; the fact that partial quantum information can be negative might open up new possibilities or create novel difficulties [43].

Author Contributions

Conceptualization, A.F.C.G.; writing—original draft preparation, A.F.C.G.; writing—review and editing, A.F.C.G. and M.A.T.F.; supervision, M.A.T.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by: FCT—Fundação para a Ciência e a Tecnologia, under grants SFRH/BD/145472/2019 and UIDB/50008/2020; Instituto de Telecomunicações; and the Portuguese Recovery and Resilience Plan, project C645008882-00000055 (NextGenAI, Center for Responsible AI).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank Artemy Kolchinsky for helpful discussions and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Williams, P.; Beer, R. Nonnegative decomposition of multivariate information. arXiv 2010, arXiv:1004.2515.
2. Lizier, J.; Flecker, B.; Williams, P. Towards a synergy-based approach to measuring information modification. In Proceedings of the 2013 IEEE Symposium on Artificial Life (ALIFE), Singapore, 16–19 April 2013; pp. 43–51.
3. Wibral, M.; Finn, C.; Wollstadt, P.; Lizier, J.; Priesemann, V. Quantifying information modification in developing neural networks via partial information decomposition. Entropy 2017, 19, 494.
4. Rauh, J. Secret sharing and shared information. Entropy 2017, 19, 601.
5. Vicente, R.; Wibral, M.; Lindner, M.; Pipa, G. Transfer entropy—A model-free measure of effective connectivity for the neurosciences. J. Comput. Neurosci. 2011, 30, 45–67.
6. Ince, R.; Van Rijsbergen, N.; Thut, G.; Rousselet, G.; Gross, J.; Panzeri, S.; Schyns, P. Tracing the flow of perceptual features in an algorithmic brain network. Sci. Rep. 2015, 5, 17681.
7. Gates, A.; Rocha, L. Control of complex networks requires both structure and dynamics. Sci. Rep. 2016, 6, 24456.
8. Faber, S.; Timme, N.; Beggs, J.; Newman, E. Computation is concentrated in rich clubs of local cortical networks. Netw. Neurosci. 2019, 3, 384–404.
9. James, R.; Ayala, B.; Zakirov, B.; Crutchfield, J. Modes of information flow. arXiv 2018, arXiv:1808.06723.
10. Arellano-Valle, R.; Contreras-Reyes, J.; Genton, M. Shannon Entropy and Mutual Information for Multivariate Skew-Elliptical Distributions. Scand. J. Stat. 2013, 40, 42–62.
11. Cover, T. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 1999.
12. Gutknecht, A.; Wibral, M.; Makkeh, A. Bits and pieces: Understanding information decomposition from part-whole relationships and formal logic. Proc. R. Soc. A 2021, 477, 20210110.
13. Harder, M.; Salge, C.; Polani, D. Bivariate measure of redundant information. Phys. Rev. E 2013, 87, 012130.
14. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying unique information. Entropy 2014, 16, 2161–2183.
15. Griffith, V.; Koch, C. Quantifying synergistic mutual information. In Guided Self-Organization: Inception; Springer: Berlin/Heidelberg, Germany, 2014; pp. 159–190.
16. James, R.; Emenheiser, J.; Crutchfield, J. Unique information via dependency constraints. J. Phys. A Math. Theor. 2018, 52, 014002.
17. Chicharro, D.; Panzeri, S. Synergy and redundancy in dual decompositions of mutual information gain and information loss. Entropy 2017, 19, 71.
18. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J. Shared information—New insights and problems in decomposing information in complex systems. In Proceedings of the European Conference on Complex Systems 2012, Brussels, Belgium, 2–7 September 2012; Springer: Berlin/Heidelberg, Germany, 2013; pp. 251–269.
19. Rauh, J.; Banerjee, P.; Olbrich, E.; Jost, J.; Bertschinger, N.; Wolpert, D. Coarse-graining and the Blackwell order. Entropy 2017, 19, 527.
20. Ince, R. Measuring multivariate redundant information with pointwise common change in surprisal. Entropy 2017, 19, 318.
21. Kolchinsky, A. A Novel Approach to the Partial Information Decomposition. Entropy 2022, 24, 403.
22. Korner, J.; Marton, K. Comparison of two noisy channels. In Topics in Information Theory; Csiszár, I., Elias, P., Eds.; North-Holland Pub. Co.: Amsterdam, The Netherlands, 1977; pp. 411–423.
23. Américo, A.; Khouzani, A.; Malacaria, P. Channel-Supermodular Entropies: Order Theory and an Application to Query Anonymization. Entropy 2021, 24, 39.
24. Cohen, J.; Kempermann, J.; Zbaganu, G. Comparisons of Stochastic Matrices with Applications in Information Theory, Statistics, Economics and Population; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1998.
25. Blackwell, D. Equivalent comparisons of experiments. Ann. Math. Stat. 1953, 24, 265–272.
26. Makur, A.; Polyanskiy, Y. Less noisy domination by symmetric channels. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 2463–2467.
27. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Cambridge University Press: Cambridge, UK, 2011.
28. Wyner, A. The wire-tap channel. Bell Syst. Tech. J. 1975, 54, 1355–1387.
29. Bassi, G.; Piantanida, P.; Shamai, S. The secret key capacity of a class of noisy channels with correlated sources. Entropy 2019, 21, 732.
30. Gamal, A. The capacity of a class of broadcast channels. IEEE Trans. Inf. Theory 1979, 25, 166–169.
31. Clark, D.; Hunt, S.; Malacaria, P. Quantitative information flow, relations and polymorphic types. J. Log. Comput. 2005, 15, 181–199.
32. Griffith, V.; Chong, E.; James, R.; Ellison, C.; Crutchfield, J. Intersection information based on common randomness. Entropy 2014, 16, 1985–2000.
33. Barrett, A. Exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. Phys. Rev. E 2015, 91, 052802.
34. DeWeese, M.; Meister, M. How to measure the information gained from one symbol. Netw. Comput. Neural Syst. 1999, 10, 325.
35. Rauh, J.; Banerjee, P.; Olbrich, E.; Jost, J.; Bertschinger, N. On extractable shared information. Entropy 2017, 19, 328.
36. Rauh, J.; Bertschinger, N.; Olbrich, E.; Jost, J. Reconsidering unique information: Towards a multivariate information decomposition. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014; pp. 2232–2236.
37. Gács, P.; Körner, J. Common information is far less than mutual information. Probl. Control Inf. Theory 1973, 2, 149–162.
38. Griffith, V.; Ho, T. Quantifying redundant information in predicting a target random variable. Entropy 2015, 17, 4644–4653.
39. Finn, C.; Lizier, J. Pointwise Partial Information Decomposition Using the Specificity and Ambiguity Lattices. Entropy 2018, 20, 297.
40. James, R.; Ellison, C.; Crutchfield, J. “dit”: A Python package for discrete information theory. J. Open Source Softw. 2018, 3, 738.
41. Massey, J. Guessing and entropy. In Proceedings of the 1994 IEEE International Symposium on Information Theory, Trondheim, Norway, 27 June–1 July 1994; p. 204.
42. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
43. Horodecki, M.; Oppenheim, J.; Winter, A. Partial quantum information. Nature 2005, 436, 673–676.
Figure 1. Implications satisfied by the orders. The reverse implications do not hold in general.
Table 1. Copy distribution.

T         Y_1   Y_2   p(t, y_1, y_2)
(0, 0)    0     0     p(T = (0, 0))
(0, 1)    0     1     p(T = (0, 1))
(1, 0)    1     0     p(T = (1, 0))
(1, 1)    1     1     p(T = (1, 1))
Table 2. Application of the proposed measures to famous PID problems.

Target             I_◃    I_d    I_ln   I_ds   I_mc   I_MMI
T = Y_1 AND Y_2    0      0.311  0.311  0.311  0.311  0.311
T = Y_1 + Y_2      0      0.5    0.5    0.5    0.5    0.5
T = Y_1            0      0      0      0      0      0
T = (Y_1, Y_2)     0      0      0      0      0      1
Table 3. Different decompositions of p(t, y_1, y_2).

I_◃   I_d    I_ln   I_ds   I_mc   I_MMI  I_WB   I_GH   I_Ince  I_FL   I_BROJA  I_Harder  I_dep
0     0.002  0.004  *      0.004  0.004  0.004  0.002  0.003   0.047  0.003    0.004     0
Table 4. Different decompositions of p(t, y_3, y_4).

I_◃   I_d   I_ln   I_ds    I_mc   I_MMI  I_WB   I_GH   I_Ince  I_FL    I_BROJA  I_Harder  I_dep
0     0     0*     0.322   0.322  0.322  0.193  0      0       0.058   0        0         0
