Article

Broadcast Channel Cooperative Gain: An Operational Interpretation of Partial Information Decomposition

1 Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77840, USA
2 Department of Electrical Engineering, Technion - Israel Institute of Technology, Haifa 3200003, Israel
* Author to whom correspondence should be addressed.
Entropy 2025, 27(3), 310; https://doi.org/10.3390/e27030310
Submission received: 15 February 2025 / Revised: 7 March 2025 / Accepted: 12 March 2025 / Published: 15 March 2025
(This article belongs to the Special Issue Semantic Information Theory)

Abstract

Partial information decomposition has recently found applications in biological signal processing and machine learning. Despite its impacts, the decomposition was introduced through an informal and heuristic route, and its exact operational meaning is unclear. In this work, we fill this gap by connecting partial information decomposition to the capacity of the broadcast channel, which has been well studied in the information theory literature. We show that the synergistic information in the decomposition can be rigorously interpreted as the cooperative gain, or a lower bound of this gain, on the corresponding broadcast channel. This interpretation can help practitioners to better explain and expand the applications of the partial information decomposition technique.

1. Introduction

Shannon’s mutual information has been widely accepted as a measure to gauge the amount of information that can be revealed by one random variable regarding another random variable. Partial information decomposition (PID) is an approach to refine and further decompose this fundamental quantity to explain the effect of interactions among several random variables. Recently, this approach has found applications in biological information processing [1,2,3,4,5,6,7] and machine learning [8,9,10].
There exist different approaches to decompose the total information [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27], but the general idea in the case with two observables is as follows: the total information revealed by X and Y regarding a third quantity T is the mutual information between (X, Y) and T, i.e., I(X,Y;T) (see [28]), and it needs to be decomposed into four non-negative parts:
  • The common (or redundancy) information in X and Y, regarding T;
  • The unique information in X, but not in Y, regarding T;
  • The unique information in Y, but not in X, regarding T;
  • The synergistic (or complementary) information in X and Y, regarding T, which becomes useful only when X and Y are combined, but is useless otherwise.
This decomposition helps to explain the effect of combining X and Y, or separately using X or Y, to infer T. For example, in multi-modality machine learning, X can represent one modality such as the vision image of the event, Y another modality such as the soundtrack of the event, and T the event’s label. In this case, we can use the amounts of synergistic and unique information to determine which multi-modality models will be the most effective in capturing the interactions and therefore are most likely to be accurate. In neural signal processing, the quantities can be used in a similar manner to interpret different biological signals.
One of the most influential notions of partial information decomposition was proposed by Bertschinger et al. [13], sometimes referred to as the BROJA PID (after the initials of the authors; we will simply refer to it as partial information decomposition, as it is the only decomposition we consider). The desirable properties of this definition were thoroughly studied in [13], but in terms of its operational meaning, only a qualitative justification in a decision-making setting was given. The intuition and motivation were that these quantities can help explain how rewards can be optimized via decision-making in such a setting, either using only X (or only Y) as the observations, or using X and Y jointly as the observations (a refinement was given in [29]). However, this qualitative justification is far from satisfactory, as it does not provide an exact quantitative interpretation and requires an artificially introduced utility function, which makes its meaning quite vague. In this work, we fill this gap by connecting PID to the well-studied broadcast channel in information theory [30,31,32], and show that there is indeed a quantitative connection between PID and the sum-rate capacity of the broadcast channel. More precisely, the synergistic information defined in [13] can be interpreted as either the cooperative gain or a lower bound on the cooperative gain in the broadcast channel, under a fixed signaling distribution.

2. Partial Information Decomposition

2.1. A Few Examples

Before formally introducing the PID defined in [13], let us consider a few simple settings to understand intuitively how the decomposed information should behave.
1. Common information dominant: X is a uniformly distributed Bernoulli random variable and Y = T = X. It is clear that I(X,Y;T) = 1. In this case, the common information should be one, since X and Y are exactly identical. The other components should all be 0, since there is no unique information in either X or Y regarding T, and there is no synergistic information when combining X and Y.
2. Synergistic information dominant: X and Y are uniformly distributed Bernoulli random variables independent of each other, and T = X ⊕ Y, where ⊕ is the XOR operation. Here, the synergistic information should be 1, and the other components should be 0. This is because X or Y alone does not reveal any information regarding T; since they are completely independent, they do not share any common information either, but their combination reveals the full information on T.
3. Component-wise decomposition: X = (X_1, X_2) and Y = (Y_1, Y_2), where X_1, X_2, Y_1, Y_2 are all uniformly distributed Bernoulli random variables, mutually independent of each other. Let T = (X_1, Y_1, X_2 ⊕ Y_2). Clearly, in this case, the common information is still 0, but the two kinds of unique information are both 1, and the synergistic information is also 1.

2.2. Partial Information Decomposition

We next introduce the formal definition of the PID given in [13]. Let X, Y, and T be three random variables on their respective alphabets 𝒳, 𝒴, and 𝒯, which follow the joint distribution P_{X,Y,T}. The total information between (X, Y) and T is their mutual information, i.e., I(T;X,Y); however, from here on, we shall write it as I_P(T;X,Y) to make the dependence on the distribution more explicit. As mentioned earlier, the decomposition first needs to satisfy the total information rule:
Total information:
I_P(T;X,Y) = I^{(C)}(T;X,Y) + I^{(U_X)}(T;X,Y) + I^{(U_Y)}(T;X,Y) + I^{(S)}(T;X,Y),   (1)
where the superscript C indicates common (or redundancy) information, U_X indicates unique information from X, U_Y indicates unique information from Y, and S indicates synergistic (or complementary) information. This condition simply states that the total information is the summation of the four parts. The decomposition also needs to satisfy the individual information rule:
Individual information:
I_P(T;X) = I^{(C)}(T;X,Y) + I^{(U_X)}(T;X,Y),
I_P(T;Y) = I^{(C)}(T;X,Y) + I^{(U_Y)}(T;X,Y).   (2)
This rule dictates that the information that can be revealed from X regarding T needs to be the summation of the unique information in X, and the common information in both X and Y. Note that the synergistic information is not included here, since it can only manifest when combining X and Y.
We now already have three linear equations for the four quantities in the decomposition, and the authors of [13] introduced the following definition of synergistic information:
I^{(S)}(T;X,Y) ≜ I_P(T;X,Y) − min_{Q ∈ 𝒬} I_Q(T;X,Y),   (3)
where the set of distributions 𝒬 is defined as
𝒬 = { Q ∈ Δ_{𝒳×𝒴×𝒯} : Q_{T,X} = P_{T,X}, Q_{T,Y} = P_{T,Y} },   (4)
where Δ_{𝒳×𝒴×𝒯} is the probability simplex on 𝒳×𝒴×𝒯; in other words, 𝒬 is the set of all joint distributions under which the pairwise marginals P_{T,X} and P_{T,Y} are preserved. As a consequence, this definition, together with (1) and (2), also specifies the values of the other three quantities I^{(C)}(T;X,Y), I^{(U_X)}(T;X,Y), and I^{(U_Y)}(T;X,Y). It can be easily verified that in the examples discussed earlier, this decomposition gives the quantities matching our expectations.
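For concreteness, solving (1) and (2) together with (3) for the remaining three quantities gives
I^{(C)}(T;X,Y) = I_P(T;X) + I_P(T;Y) − min_{Q ∈ 𝒬} I_Q(T;X,Y),
I^{(U_X)}(T;X,Y) = min_{Q ∈ 𝒬} I_Q(T;X,Y) − I_P(T;Y),
I^{(U_Y)}(T;X,Y) = min_{Q ∈ 𝒬} I_Q(T;X,Y) − I_P(T;X).
These expressions make it explicit that all three remaining components are determined once the minimization in (3) has been computed.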
It was shown in [13] that this PID definition enjoys several desirable properties, for example, the following: (1) All the quantities I^{(C)}(T;X,Y), I^{(U_X)}(T;X,Y), I^{(U_Y)}(T;X,Y), and I^{(S)}(T;X,Y) are non-negative. (2) They satisfy certain extremal relations, in the sense that the synergistic information I^{(S)}(T;X,Y) is the minimum among all possible definitions under an additional axiom that the unique information only depends on the two pairwise marginal distributions. (3) They satisfy the symmetry, self-redundancy, and monotonicity axioms stipulated in [11].
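To make the optimization in (3) and (4) concrete, below is a minimal numerical sketch of the synergistic information (our illustration, not code accompanying [13]); it assumes small finite alphabets, uses scipy's general-purpose SLSQP solver, and the function names are ours. For larger problems, dedicated BROJA PID solvers are preferable.

```python
# A minimal sketch (not the authors' code) of the synergy definition in (3)-(4):
# minimize I_Q(T; X, Y) over joint distributions Q that preserve the pairwise
# marginals P_{T,X} and P_{T,Y}, then subtract the minimum from I_P(T; X, Y).
import numpy as np
from scipy.optimize import minimize

def mutual_info_t_xy(q):
    """I_Q(T; X, Y) in bits, for a joint pmf q indexed as q[x, y, t]."""
    q = np.clip(q, 1e-12, None)
    q_xy = q.sum(axis=2, keepdims=True)       # Q(x, y)
    q_t = q.sum(axis=(0, 1), keepdims=True)   # Q(t)
    return float((q * np.log2(q / (q_xy * q_t))).sum())

def broja_synergy(p):
    """Synergistic information I^{(S)}(T; X, Y) for a joint pmf p[x, y, t]."""
    nx, ny, nt = p.shape
    p_tx, p_ty = p.sum(axis=1), p.sum(axis=0)     # P(x, t) and P(y, t)
    p_t = p.sum(axis=(0, 1))
    # Feasible starting point: the conditional-independence coupling P(x|t)P(y|t)P(t).
    q0 = p_tx[:, None, :] * p_ty[None, :, :] / np.clip(p_t, 1e-12, None)
    unpack = lambda v: v.reshape(nx, ny, nt)
    cons = [  # preserve both pairwise marginals (these also force the total mass to be 1)
        {'type': 'eq', 'fun': lambda v: unpack(v).sum(axis=1).ravel() - p_tx.ravel()},
        {'type': 'eq', 'fun': lambda v: unpack(v).sum(axis=0).ravel() - p_ty.ravel()},
    ]
    res = minimize(lambda v: mutual_info_t_xy(unpack(v)), q0.ravel(),
                   method='SLSQP', bounds=[(0.0, 1.0)] * p.size, constraints=cons)
    return mutual_info_t_xy(p) - res.fun

# XOR example from Section 2.1: T = X xor Y, with X and Y uniform and independent.
p_xor = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        p_xor[x, y, x ^ y] = 0.25
print(broja_synergy(p_xor))   # approximately 1.0 bit
```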

3. Broadcast Channels

The broadcast channel is a well-studied communication system in classic information theory [30], where a transmitter wishes to send two independent messages to two receivers through a channel with a single input signal that induces two separate output signals; see Figure 1a. To accomplish this goal, the transmitter must encode the messages in such a way that the individual messages can be decoded by the intended receivers. Clearly, if the first receiver is completely ignored, the transmitter can send more information to the second receiver, and vice versa. In other words, there is a tradeoff between the amounts of information that can be sent to the two receivers. One particularly important quantity here is the sum of the rates that can be supported on this channel, often referred to as the sum-rate capacity.
Next, we provide a rigorous definition of the broadcast channel capacity region, where the channel is allowed to be used multiple times, i.e., through a block code whose block length approaches infinity. A two-user broadcast channel is specified by an input alphabet 𝒯, two output alphabets 𝒳 and 𝒴, and a conditional probability distribution P_{X,Y|T} that gives the channel law for each symbol t ∈ 𝒯 and (x,y) ∈ 𝒳 × 𝒴. The alphabets are usually assumed to be finite, though the results usually hold under more general assumptions [33].
Definition 1.
For a blocklength n ∈ ℕ, let M_1 and M_2 be two positive integers. An (n, M_1, M_2)-code for the broadcast channel P_{X,Y|T} consists of the following:
  • Two message sets:
    ℳ_1 = {1, 2, …, M_1},   ℳ_2 = {1, 2, …, M_2}.
  • An encoding function
    f : ℳ_1 × ℳ_2 → 𝒯^n,
    which assigns to each pair of messages (m_1, m_2) a length-n input sequence T^n = (T_1, T_2, …, T_n).
  • Two decoding functions:
    g_1 : 𝒳^n → ℳ_1,   g_2 : 𝒴^n → ℳ_2,
    where M̂_1 = g_1(X^n) is the estimate of message M_1 by receiver 1, and M̂_2 = g_2(Y^n) is the estimate of message M_2 by receiver 2.
Definition 2.
For an (n, M_1, M_2)-code, let (M_1, M_2) be chosen uniformly from ℳ_1 × ℳ_2. The average probability of error is defined as
P_e^{(n)} = Pr{ (M̂_1, M̂_2) ≠ (M_1, M_2) }.
A sequence of codes (indexed by n) is said to have vanishing error probability if P_e^{(n)} → 0 as n → ∞.
Definition 3.
A rate pair (R_1, R_2) is said to be achievable for the broadcast channel P_{X,Y|T} if there exists a sequence of (n, M_1^{(n)}, M_2^{(n)})-codes with vanishing error probability such that
lim inf_{n→∞} (1/n) log M_1^{(n)} ≥ R_1,   lim inf_{n→∞} (1/n) log M_2^{(n)} ≥ R_2.
The capacity region 𝒞 of the two-user broadcast channel is the closure of the set of all achievable rate pairs. The sum-rate capacity is C_sum ≜ max_{(R_1,R_2) ∈ 𝒞} (R_1 + R_2).
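As a simple illustration of these definitions (an example we add here), consider the noiseless binary broadcast channel with X = Y = T, i.e., the channel underlying Example 1 of Section 2.1. Time-sharing the single noiseless bit between the two messages achieves any non-negative rate pair with R_1 + R_2 ≤ 1, and since the channel input carries at most one bit per use, no code can do better; hence
𝒞 = { (R_1, R_2) : R_1 ≥ 0, R_2 ≥ 0, R_1 + R_2 ≤ 1 },   C_sum = 1.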

4. Main Result: An Operational Interpretation of PID

In this section, we provide the main result of this work, which is an operational interpretation of partial information decomposition.

4.1. PID via Sato’s Outer Bound

Researchers in the information theory community have made numerous efforts to identify a computable characterization of the capacity region of general broadcast channels (see the textbook [33] for a historical summary); yet, at this time, a complete solution remains elusive. Nevertheless, significant progress has been made toward this goal. In particular, Sato [34] provided an outer bound for 𝒞, which can be specialized to yield an upper bound on the sum-rate capacity of the general broadcast channel as follows:
C_sum ≤ C_Sato ≜ min_{Q ∈ 𝒬′} max_{P_T} I_{P_T Q_{X,Y|T}}(X,Y;T),   (5)
where the set 𝒬′ is defined as
𝒬′ = { Q_{X,Y|T} ∈ Δ_{𝒳×𝒴|𝒯} : Q_{X|T} = P_{X|T}, Q_{Y|T} = P_{Y|T} },
i.e., the set of conditional distributions under which the marginal conditional distributions P_{X|T} and P_{Y|T} are preserved. The inner maximization is over the possible marginal distributions P_T of the random variable T on the alphabet 𝒯. This form already bears a certain similarity to (4). Note that for channels on general alphabets (i.e., when the optimization is not necessarily over a compact space), the maximization should be replaced by a supremum and the minimization by an infimum. Due to the min–max form, the meaning is not yet clear, but the max–min inequality (weak duality) implies that
min_{Q ∈ 𝒬′} max_{P_T} I_{P_T Q_{X,Y|T}}(X,Y;T) ≥ max_{P_T} min_{Q ∈ 𝒬′} I_{P_T Q_{X,Y|T}}(X,Y;T) = max_{P_T} min_{Q ∈ 𝒬} I_{Q_{X,Y,T}}(X,Y;T),
where the equality is by the definition of 𝒬′, and the set 𝒬 is exactly the one defined in (4) with Q_{X,T} = P_T Q_{X|T} and Q_{Y,T} = P_T Q_{Y|T}. The inner minimization of this form is exactly the same as the second term in (3). Though the max–min form does not yield a true upper bound on the sum-rate capacity, in the PID setting we consider, P_T is always fixed; therefore, the max–min and min–max forms are in fact equivalent in this setting. The equivalence in mathematical forms does not fully explain the significance of this connection, and we will need to consider Sato's bound more carefully. Let us define the following quantity for notational simplicity:
R_{P_T} ≜ min_{Q ∈ 𝒬′} I_{P_T Q_{X,Y|T}}(X,Y;T).
Sato's outer bound was derived using the following argument. For a point-to-point channel P_{Y|T} with a single input signal T and a single output signal Y, Shannon's channel coding theorem [28] states that the channel capacity is given by max_{P_T} I_{P_T P_{Y|T}}(Y;T). Moreover, for any fixed distribution P_T, the rate I_{P_T P_{Y|T}}(Y;T) is achievable, where the probability distribution P_T represents the statistical signaling pattern of the underlying codes. Turning our attention back to the broadcast channel with transition probability P_{X,Y|T}, if the receivers are allowed to cooperate fully and share the two output signals X and Y—i.e., they become a single virtual receiver (see Figure 1b)—then clearly the maximum achievable rate would be max_{P_T} I_{P_T P_{X,Y|T}}(X,Y;T). However, Sato further observed that the error probability of any code is the same on any broadcast channel Q_{X,Y|T} ∈ 𝒬′, even if the transition distribution Q_{X,Y|T} differs from the true broadcast channel transition probability P_{X,Y|T}. This is because the two channel outputs depend only on the marginal transition probabilities P_{X|T} and P_{Y|T}, respectively, and each decoder only uses its own channel output to decode. Therefore, we can obtain an upper bound by choosing the worst channel configuration Q ∈ 𝒬′, i.e., the outer minimization in (5).
With the interpretation of Sato's upper bound above, it becomes clear that R_{P_T} is essentially an upper bound on the sum rate of the broadcast channel when the receivers are not allowed to cooperate and the input signaling pattern is fixed to follow P_T. On the other hand, the quantity I_{P_T P_{X,Y|T}}(X,Y;T) is the rate that can be achieved by allowing the two receivers to fully cooperate, also with P_T being the input signaling pattern. In this sense, I^{(S)}(T;X,Y) defined in (3) is a lower bound on the difference between the sum rate with full cooperation and that without any cooperation, with the input signaling pattern following P_T.
This connection provides an operational interpretation of PID for general distributions. Essentially, the synergistic information can be viewed as a surrogate for the cooperative gain. When this lower bound is in fact also achievable, I^{(S)}(T;X,Y) is exactly equal to the cooperative gain. In the corresponding learning setting, it is the difference between what can be inferred about T by using both X and Y in a non-cooperative manner, and what can be inferred by using them jointly. This indeed matches our expectations for the synergistic information. In the next subsection, we consider a special case where Sato's bound is indeed achievable, and the lower bound mentioned above becomes exact.
In one sense, this operational interpretation is quite intuitive, as explained above; on the other hand, it is also quite surprising. For example, in broadcast channels, a more general setup allows the transmitter to also send a common message to both receivers [35,36], in addition to the two individual messages to the two respective receivers. It would appear plausible to expect this generalized setting to be more closely connected to the PID setting, with the common message related to the common information, yet this turns out not to be the case here. Moreover, a dual communication problem studied in information theory is the multiple access channel (see, e.g., [28]), where two transmitters wish to communicate to the same receiver simultaneously. The reader may wonder whether an operational meaning should instead be extracted from this channel, rather than from the broadcast channel. However, note that in the PID setting, we are inferring T from X and Y, which resembles the decoding process in the broadcast channel, not in the multiple access channel. Moreover, in the multiple access channel, the two transmitters' inputs are always independent when the transmitters cannot cooperate, and this does not match the PID setting under consideration. Another seemingly related problem studied in the information theory literature is the common information between two random variables [37,38]; however, for the PID defined in [13], this approach also does not yield a meaningful interpretation.

4.2. Gaussian MIMO Broadcast Channel and Gaussian PID

One setting where a full capacity region characterization is indeed known is the Gaussian multiple-input multiple-output (MIMO) broadcast channel [32,39]. In the two-user Gaussian MIMO broadcast channel, the channel transition probabilities P_{X|T} and P_{Y|T} are given, with T being the transmitter input variable, and X, Y the channel outputs at the two individual receivers. The channel is usually defined as follows:
X = H_X T + n_X,
Y = H_Y T + n_Y,
where H_X and H_Y are the two channel matrices, the additive noise vector n_X is independent of T, and similarly, n_Y is independent of T. For a fixed input signaling distribution P_T, the pairwise marginal distributions P_{T,X} and P_{T,Y} are fully specified. Conversely, for any joint distribution P_{T,X,Y} whose pairwise marginals P_{T,X} and P_{T,Y} are each jointly Gaussian, we can represent their relation in the form above via a Gram–Schmidt orthogonalization. Note that the joint distribution of (n_X, n_Y) is not fully specified here, as the noise vectors n_X and n_Y are not necessarily jointly Gaussian, but can be dependent in a more sophisticated manner. The standard Gaussian MIMO broadcast problem usually specifies the noises to be zero-mean Gaussian with certain fixed covariances, and there is also a covariance constraint on the transmitter's signaling T. The problem can be further simplified using certain linear transformations, as discussed below.
Let us assume Σ_{n_X} and Σ_{n_Y} are full rank for now (when they are not full rank, a limiting argument can be invoked to show that the same conclusion holds); in this case, it is without loss of generality to assume that Σ_{n_X} and Σ_{n_Y} are in fact identity matrices, since otherwise, we can perform receiver-side linear transforms to make them so, i.e., through transformations based on the eigenvalue decompositions of Σ_{n_X} and Σ_{n_Y}, respectively. For the same reason, we can assume Σ_T is an identity matrix, through a linear transformation at the transmitter. These reductions to independent noise and independent channel input are often referred to as the transmitter precoding and receiver precoding transformations in the communication literature; see, e.g., [39].
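The receiver-side reduction is a standard noise-whitening step; the following minimal numpy sketch (our illustration with an arbitrary made-up covariance, not code from [39]) shows a transform that maps a full-rank noise covariance to the identity.

```python
# Whitening sketch: find W with W @ Sigma @ W.T = I for a full-rank covariance,
# via the eigenvalue decomposition mentioned in the text.
import numpy as np

def whiten(Sigma):
    """Return W = Sigma^{-1/2} (symmetric inverse square root) for full-rank Sigma."""
    w, V = np.linalg.eigh(Sigma)
    return V @ np.diag(w ** -0.5) @ V.T

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
Sigma_nX = A @ A.T + np.eye(2)            # an arbitrary full-rank noise covariance
W = whiten(Sigma_nX)
print(np.allclose(W @ Sigma_nX @ W.T, np.eye(2)))   # True: the noise is now white
# Receiver 1 then works with W @ X = (W @ H_X) @ T + W @ n_X, an equivalent channel
# whose noise covariance is the identity; the transmitter-side step is analogous.
```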
For the two-user Gaussian broadcast channel, the worst channel configuration problem discussed above in the general setting is, with the simplification above, essentially the least favorable noise problem considered by Yu and Cioffi [39], where one needs to identify the joint distribution of n_X and n_Y that makes the channel the hardest to communicate over. It was shown in [39] that the least favorable noise problem can be recast as the following optimization problem:
minimize:   log ( |H Σ_T H^T + Σ_n| / |Σ_n| )
subject to:  Σ_{n_X} = I,
             Σ_{n_Y} = I,
             Σ_n ⪰ 0,
where H^T = [H_X^T, H_Y^T] and n = (n_X, n_Y), when H Σ_T H^T is nonsingular. It can be shown that this problem is convex.
Yu and Cioffi also showed that in this setting, Sato's upper bound is achievable—i.e., it is exactly the sum-rate capacity. Moreover, for any Gaussian-distributed input signaling P_T, the corresponding R_{P_T} can be achieved on this broadcast channel through a more sophisticated scheme known as dirty-paper coding [40]. Therefore, when the pairwise marginals P_{T,X} and P_{T,Y} are each jointly Gaussian, the synergistic information I^{(S)}(T;X,Y) is exactly the cooperative gain of the corresponding Gaussian broadcast channel using this specific input signaling pattern P_T. The connection in the Gaussian setting is of particular interest, given the practical importance of the Gaussian PID, which was thoroughly explored in [41].
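To make the least favorable noise problem concrete, here is a minimal scalar sketch (our illustration under assumed parameters, not code from [39] or [41]): each receiver has a single antenna, Σ_T = 1, the noises have unit variance, and their correlation ρ is the only free parameter (so Σ_n ⪰ 0 reduces to |ρ| ≤ 1); the channel gains h_x, h_y and the actual noise correlation rho_true are assumed values.

```python
# Scalar least favorable noise sketch: minimize the fully cooperative rate over
# admissible Gaussian noise couplings, then form the cooperative gain.
import numpy as np
from scipy.optimize import minimize_scalar

h_x, h_y = 1.0, 0.8      # assumed channel gains
rho_true = 0.0           # assumed actual noise correlation in P_{X,Y|T}

def coop_rate(rho):
    """I(X, Y; T) in bits for jointly Gaussian (T, X, Y) with noise correlation rho."""
    S = np.array([[h_x**2, h_x*h_y], [h_x*h_y, h_y**2]])   # H Sigma_T H^T with Sigma_T = 1
    Sn = np.array([[1.0, rho], [rho, 1.0]])                 # Sigma_n
    return 0.5 * np.log2(np.linalg.det(S + Sn) / np.linalg.det(Sn))

# The minimized value plays the role of R_{P_T} under this Gaussian restriction.
res = minimize_scalar(coop_rate, bounds=(-0.999, 0.999), method='bounded')
gain = coop_rate(rho_true) - res.fun   # cooperative gain for this Gaussian example
print(f"least favorable rho = {res.x:.3f}, cooperative gain = {gain:.3f} bits")
```

Per the discussion above, this difference matches the synergistic information for such a Gaussian example with the signaling pattern fixed.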

4.3. Revisiting the Examples

Let us now revisit the examples given earlier, and attempt to understand them in the broadcast channel setting:
1. Common information dominant: Here, the two receivers both observe the transmitted signal completely, and therefore, there is no difference even if they are allowed to cooperate; the cooperative gain is 0.
2. Synergistic information dominant: Clearly, here, T is a uniformly distributed Bernoulli random variable. When the two receivers are not allowed to cooperate, they cannot decode anything, because the channel T → X is completely noisy, and similarly for the channel T → Y. When the receivers are allowed to cooperate, then from X and Y we can completely recover T—i.e., the channel becomes noiseless—and the full information in T can be decoded. The cooperative gain is then one, matching the synergistic information.
3. Component-wise decomposition: Here, T has three uniformly distributed Bernoulli components, mutually independent of each other. In the channel T → X, the first component is noiseless, and the second component is completely noisy; this is similarly the case for T → Y. When the receivers are not allowed to cooperate, the sum rate on the channel is two. However, when the receivers are allowed to cooperate, the last component of T becomes useful (noiseless), and the total communication rate becomes three. Therefore, the cooperative gain is 3 − 2 = 1, equal to the synergistic information (a numerical cross-check follows this list).
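As a quick numerical cross-check (ours), the broja_synergy sketch from Section 2.2 can be applied to the component-wise example; the integer indexing of the alphabets below is our own choice.

```python
# Example 3: X = (X1, X2), Y = (Y1, Y2), T = (X1, Y1, X2 xor Y2), all bits uniform
# and independent. Indexing: x = 2*x1 + x2, y = 2*y1 + y2, t = 4*t1 + 2*t2 + t3.
# Assumes broja_synergy from the sketch in Section 2.2 is available.
import numpy as np

p3 = np.zeros((4, 4, 8))
for x1 in range(2):
    for x2 in range(2):
        for y1 in range(2):
            for y2 in range(2):
                p3[2*x1 + x2, 2*y1 + y2, 4*x1 + 2*y1 + (x2 ^ y2)] = 1.0 / 16
print(broja_synergy(p3))   # approximately 1.0 bit, matching the cooperative gain above
```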

5. Conclusions

We provide an operational interpretation of the PID given in [13] via a connection to the well-studied broadcast channel capacity in the information theory literature. The synergistic information is directly connected to the cooperative gain on the corresponding broadcast channel, being either exactly equal to it or a lower bound on it. This interpretation can help us better understand why such a decomposition can be used to guide model selection in machine learning and in the analysis of biological signals, and potentially to design new methods to utilize it.
We note that in the information theory literature, broadcast channels with cooperating receivers have been studied carefully [42], where the two receivers are allowed to communicate over rate-limited links. It may be of interest to consider the corresponding notion of partial information decomposition, where the synergistic information can be parametrized by the degree of cooperation between the receivers. Moreover, the concept of Rényi information has found applications in many fields [43], and the decomposition of Rényi information has not been explored; it may be beneficial to study how our approach can be applied to such decompositions.

Author Contributions

Conceptualization, C.T.; Formal analysis, C.T.; Investigation, S.S.; Writing—original draft, C.T.; Writing—review & editing, C.T. and S.S. All authors have read and agreed to the published version of the manuscript.

Funding

The work of S. Shamai was supported by the German Research Foundation (DFG) via the German-Israeli Project Cooperation (DIP) under Project SH 1937/1-1 and the Ollendorff Minerva Center of the Technion.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable.

Acknowledgments

C. Tian wishes to acknowledge his discussion with Paul Liang of the Massachusetts Institute of Technology regarding the application of PID in multimodality learning. This work was partially completed while C. Tian was on a faculty development leave at the Massachusetts Institute of Technology.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wibral, M.; Priesemann, V.; Kay, J.W.; Lizier, J.T.; Phillips, W.A. Partial information decomposition as a unified approach to the specification of neural goal functions. Brain Cogn. 2017, 112, 25–38. [Google Scholar] [CrossRef] [PubMed]
  2. Feldman, A.K.; Venkatesh, P.; Weber, D.J.; Grover, P. Information-theoretic tools to understand distributed source coding in neuroscience. IEEE J. Sel. Areas Inf. Theory 2024, 5, 509–519. [Google Scholar] [CrossRef]
  3. Wibral, M.; Finn, C.; Wollstadt, P.; Lizier, J.T.; Priesemann, V. Quantifying information modification in developing neural networks via partial information decomposition. Entropy 2017, 19, 494. [Google Scholar] [CrossRef]
  4. Ehrlich, D.A.; Schneider, A.C.; Priesemann, V.; Wibral, M.; Makkeh, A. A measure of the complexity of neural representations based on partial information decomposition. Trans. Mach. Learn. Res. 2023, 5, 2835–8856. [Google Scholar]
  5. Timme, N.; Alford, W.; Flecker, B.; Beggs, J.M. Synergy, redundancy, and multivariate information measures: An experimentalist’s perspective. J. Comput. Neurosci. 2014, 36, 119–140. [Google Scholar] [CrossRef]
  6. Stramaglia, S.; Cortes, J.M.; Marinazzo, D. Synergy and redundancy in the Granger causal analysis of dynamical networks. New J. Phys. 2014, 16, 105003. [Google Scholar] [CrossRef]
  7. Timme, N.M.; Ito, S.; Myroshnychenko, M.; Nigam, S.; Shimono, M.; Yeh, F.C.; Hottowy, P.; Litke, A.M.; Beggs, J.M. High-degree neurons feed cortical computations. PLoS Comput. Biol. 2016, 12, e1004858. [Google Scholar] [CrossRef]
  8. Liang, P.P.; Cheng, Y.; Fan, X.; Ling, C.K.; Nie, S.; Chen, R.; Deng, Z.; Allen, N.; Auerbach, R.; Mahmood, F.; et al. Quantifying & modeling multimodal interactions: An information decomposition framework. Adv. Neural Inf. Process. Syst. 2024, 36, 27351–27393. [Google Scholar]
  9. Liang, P.P.; Zadeh, A.; Morency, L.P. Foundations & trends in multimodal machine learning: Principles, challenges, and open questions. ACM Comput. Surv. 2024, 56, 1–42. [Google Scholar]
  10. Liang, P.P.; Deng, Z.; Ma, M.Q.; Zou, J.Y.; Morency, L.P.; Salakhutdinov, R. Factorized contrastive learning: Going beyond multi-view redundancy. Adv. Neural Inf. Process. Syst. 2024, 36, 32971–32998. [Google Scholar]
  11. Williams, P.L.; Beer, R.D. Nonnegative decomposition of multivariate information. arXiv 2010, arXiv:1004.2515. [Google Scholar]
  12. Schneidman, E.; Bialek, W.; Berry, M.J. Synergy, redundancy, and independence in population codes. J. Neurosci. 2003, 23, 11539–11553. [Google Scholar] [CrossRef] [PubMed]
  13. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying unique information. Entropy 2014, 16, 2161–2183. [Google Scholar] [CrossRef]
  14. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J. Shared information—New insights and problems in decomposing information in complex systems. In Proceedings of the European Conference on Complex Systems 2012, Agadir, Morocco, 5–6 November 2012; pp. 251–269. [Google Scholar]
  15. Griffith, V.; Koch, C. Quantifying synergistic mutual information. In Guided Self-Organization: Inception; Springer: Berlin/Heidelberg, Germany, 2014; pp. 159–190. [Google Scholar]
  16. Harder, M.; Salge, C.; Polani, D. Bivariate measure of redundant information. Phys. Rev. E—Stat. Nonlinear Soft Matter Phys. 2013, 87, 012130. [Google Scholar] [CrossRef]
  17. Olbrich, E.; Bertschinger, N.; Rauh, J. Information decomposition and synergy. Entropy 2015, 17, 3501–3517. [Google Scholar] [CrossRef]
  18. Porta, A.; Bari, V.; De Maria, B.; Takahashi, A.C.; Guzzetti, S.; Colombo, R.; Catai, A.M.; Raimondi, F.; Faes, L. Quantifying net synergy/redundancy of spontaneous variability regulation via predictability and transfer entropy decomposition frameworks. IEEE Trans. Biomed. Eng. 2017, 64, 2628–2638. [Google Scholar]
  19. Lizier, J.T.; Bertschinger, N.; Jost, J.; Wibral, M. Information decomposition of target effects from multi-source interactions: Perspectives on previous, current and future work. Entropy 2018, 20, 307. [Google Scholar] [CrossRef]
  20. Quax, R.; Har-Shemesh, O.; Sloot, P.M. Quantifying synergistic information using intermediate stochastic variables. Entropy 2017, 19, 85. [Google Scholar] [CrossRef]
  21. Barrett, A.B. Exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. Phys. Rev. E 2015, 91, 052802. [Google Scholar] [CrossRef]
  22. Chatterjee, P.; Pal, N.R. Construction of synergy networks from gene expression data related to disease. Gene 2016, 590, 250–262. [Google Scholar] [CrossRef]
  23. Rauh, J.; Banerjee, P.K.; Olbrich, E.; Jost, J.; Bertschinger, N. On extractable shared information. Entropy 2017, 19, 328. [Google Scholar] [CrossRef]
  24. Ince, R.A. Measuring multivariate redundant information with pointwise common change in surprisal. Entropy 2017, 19, 318. [Google Scholar] [CrossRef]
  25. Finn, C.; Lizier, J.T. Pointwise partial information decomposition using the specificity and ambiguity lattices. Entropy 2018, 20, 297. [Google Scholar] [CrossRef] [PubMed]
  26. James, R.G.; Emenheiser, J.; Crutchfield, J.P. Unique information via dependency constraints. J. Phys. A Math. Theor. 2018, 52, 014002. [Google Scholar] [CrossRef]
  27. Varley, T.F. Generalized decomposition of multivariate information. PLoS ONE 2024, 19, e0297128. [Google Scholar] [CrossRef]
  28. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 2006. [Google Scholar]
  29. Kolchinsky, A. A novel approach to the partial information decomposition. Entropy 2022, 24, 403. [Google Scholar] [CrossRef]
  30. Cover, T. Broadcast channels. IEEE Trans. Inf. Theory 1972, 18, 2–14. [Google Scholar] [CrossRef]
  31. Cover, T. An achievable rate region for the broadcast channel. IEEE Trans. Inf. Theory 1975, 21, 399–404. [Google Scholar] [CrossRef]
  32. Weingarten, H.; Steinberg, Y.; Shamai, S.S. The capacity region of the Gaussian multiple-input multiple-output broadcast channel. IEEE Trans. Inf. Theory 2006, 52, 3936–3964. [Google Scholar] [CrossRef]
  33. El Gamal, A.; Kim, Y.H. Network Information Theory; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
  34. Sato, H. An outer bound to the capacity region of broadcast channels (Corresp.). IEEE Trans. Inf. Theory 1978, 24, 374–377. [Google Scholar] [CrossRef]
  35. Geng, Y.; Nair, C. The capacity region of the two-receiver Gaussian vector broadcast channel with private and common messages. IEEE Trans. Inf. Theory 2014, 60, 2087–2104. [Google Scholar] [CrossRef]
  36. Tian, C. Latent capacity region: A case study on symmetric broadcast with common messages. IEEE Trans. Inf. Theory 2011, 57, 3273–3285. [Google Scholar] [CrossRef]
  37. Gács, P.; Körner, J. Common information is far less than mutual information. Probl. Control Inf. Theory 1973, 2, 149–162. [Google Scholar]
  38. Wyner, A. The common information of two dependent random variables. IEEE Trans. Inf. Theory 1975, 21, 163–179. [Google Scholar] [CrossRef]
  39. Yu, W.; Cioffi, J.M. Sum capacity of Gaussian vector broadcast channels. IEEE Trans. Inf. Theory 2004, 50, 1875–1892. [Google Scholar] [CrossRef]
  40. Costa, M. Writing on dirty paper (corresp.). IEEE Trans. Inf. Theory 1983, 29, 439–441. [Google Scholar] [CrossRef]
  41. Venkatesh, P.; Bennett, C.; Gale, S.; Ramirez, T.; Heller, G.; Durand, S.; Olsen, S.; Mihalas, S. Gaussian partial information decomposition: Bias correction and application to high-dimensional data. Adv. Neural Inf. Process. Syst. 2024, 36, 74602–74635. [Google Scholar]
  42. Dabora, R.; Servetto, S.D. Broadcast channels with cooperating decoders. IEEE Trans. Inf. Theory 2006, 52, 5438–5454. [Google Scholar] [CrossRef]
  43. Liese, F.; Vajda, I. On divergences and informations in statistics and information theory. IEEE Trans. Inf. Theory 2006, 52, 4394–4412. [Google Scholar] [CrossRef]
Figure 1. Standard broadcast channel vs. broadcast channel with receiver cooperation.