Non-Negative Decomposition of Multivariate Information: From Minimum to Blackwell-Specific Information

Partial information decompositions (PIDs) aim to categorize how a set of source variables provides information about a target variable redundantly, uniquely, or synergetically. The original proposal for such an analysis used a lattice-based approach and gained significant attention. However, finding a suitable underlying decomposition measure for an arbitrary number of discrete random variables remains an open research question. This work proposes a solution with a non-negative PID that satisfies an inclusion–exclusion relation for any f-information measure. The decomposition is constructed from a pointwise perspective of the target variable to take advantage of the equivalence between the Blackwell and the zonogon order in this setting. Zonogons are the Neyman–Pearson regions for an indicator variable of each target state, and f-information is the expected value of quantifying their boundaries. We prove that the proposed decomposition satisfies the desired axioms and guarantees non-negative partial information results. Moreover, we demonstrate how the obtained decomposition can be transformed between different decomposition lattices and that it directly provides a non-negative decomposition of Rényi-information under a transformed inclusion–exclusion relation. Finally, we highlight that the decomposition behaves differently depending on the information measure used and how it can be used for tracing partial information flows through Markov chains.


Introduction
From computer science to neuroscience, we can find the following problem: we would like to know information about a random variable T, called the target, which we cannot observe directly. However, we can obtain information about the target indirectly from another set of variables V = {V 1 , . . ., V n }. We can use information measures to quantify how much information any set of variables provides about the target. When doing so, we can identify the concept of redundancy: for example, if we have two identical variables V 1 = V 2 , then we can use one variable to predict the other and, thus, anything that this other variable can predict. Similarly, we can identify the concept of synergy: for example, if we have two independent variables and a target that corresponds to their XOR operation T = (V 1 XOR V 2 ), then both variables provide no advantage on their own for predicting the state of T, yet their combination fully determines it. Williams and Beer [1] suggested that it is possible to characterize information as visualized by the Venn diagram for two variables V = {V 1 , V 2 } in Figure 1a. This decomposition attributes the total information about the target to being redundant, synergetic, or unique to a particular variable. As indicated in Figure 1a by I(•, T), we can quantify three of the areas using information measures. However, this is insufficient to determine the four partial areas that represent the individual contributions. This necessitates extending an information measure to quantify either the amount of redundancy or the amount of synergy between a set of variables.
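As a concrete illustration of the XOR example above, the following sketch (in Python; the helper names `mutual_information` and `project` are our own, not from the paper) computes I(V 1 ; T), I(V 2 ; T), and I((V 1 , V 2 ); T) for T = V 1 XOR V 2, showing that each variable alone provides zero information while the pair fully determines the target:

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(X; T) in bits from a joint distribution {(x, t): p(x, t)}."""
    px, pt = defaultdict(float), defaultdict(float)
    for (x, t), p in joint.items():
        px[x] += p
        pt[t] += p
    return sum(p * math.log2(p / (px[x] * pt[t]))
               for (x, t), p in joint.items() if p > 0)

def project(dist, key):
    """Aggregate rows (v1, v2, t, p) into a joint distribution {(key, t): p}."""
    out = defaultdict(float)
    for v1, v2, t, p in dist:
        out[(key(v1, v2), t)] += p
    return out

# T = V1 XOR V2 with independent, uniform V1 and V2.
dist = [(v1, v2, v1 ^ v2, 0.25) for v1 in (0, 1) for v2 in (0, 1)]

print(mutual_information(project(dist, lambda a, b: a)))       # I(V1; T) = 0.0
print(mutual_information(project(dist, lambda a, b: b)))       # I(V2; T) = 0.0
print(mutual_information(project(dist, lambda a, b: (a, b))))  # I((V1,V2); T) = 1.0
```

Each variable alone leaves the target uniformly distributed, while the pair determines it fully, which is the prototypical synergy.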
Williams and Beer [1] first proposed a framework for Partial Information Decompositions (PIDs), which found favor with the community [2]. However, the proposed measure of redundancy was criticized for not distinguishing "the same information and the same amount of information" [3-6]. The proposal of Williams and Beer [1] focused specifically on mutual information. This work additionally studies the decomposition of any f-information or Rényi-information at discrete random variables. These measures have significance, among others, in parameter estimation, high-dimensional statistics, hypothesis testing, channel coding, data compression, and privacy analyses [7,8].

Figure 1 (caption): (b) Representation as redundancy lattice, where the redundancy measure I ∩ quantifies the information that is contained in all of its provided variables (inside their intersection). The ordering represents the expected subset relation of redundancy. (c) Representation as synergy lattice, where the loss measure I ∪ quantifies the information that is contained in neither of its provided variables (outside their union). (d) Information flow visualization: when having two partial information decompositions with respect to the same target variable, we can study how the partial information of one decomposition propagates into the next. We refer to this as information flow analysis of a Markov chain such as T → (A 1 , A 2 ) → (B 1 , B 2 ).

Related Work
Most of the literature focuses on the decomposition of mutual information. Here, many alternative measures have been proposed, but they cannot fully replace the original measure of Williams and Beer [1] since they do not provide non-negative results for any |V|: the special case of bivariate partial information decompositions (|V| = 2) has been well studied, and several non-negative decompositions for the framework of Williams and Beer [1] are known [5,9-12]. However, each of these decompositions provides negative partial information for |V| > 2. Further research [13-15] specifically aimed to define decompositions of mutual information for an arbitrary number of observable variables, but similarly obtained negative partial contributions, with the resulting difficulty of interpreting their results. Griffith et al. [3] studied the decomposition of zero-error information and obtained negative partial contributions. Kolchinsky [16] proposed a decomposition framework for an arbitrary number of observable variables that is applicable beyond Shannon information theory, but in which the partial contributions do not sum to the total amount.
In this work, we propose a decomposition measure for replacing the one presented by Williams and Beer [1] while maintaining its desired properties. To achieve this, we combine several concepts from the literature: we use the Blackwell order, a preorder of information channels, for the decomposition and for deriving its operational interpretation, similar to Bertschinger et al. [9] and Kolchinsky [16]. We use its special case for binary input channels, the zonogon order studied by Bertschinger and Rauh [17], to achieve non-negativity at an arbitrary number of variables and provide it with a practical meaning by highlighting its equivalence to the Neyman-Pearson (decision) region. To utilize this special case for a general decomposition, we use the concept of a target pointwise decomposition as demonstrated by Williams and Beer [1] and related to Lizier et al. [18], Finn and Lizier [13], and Ince [14]. Specifically, we use Neyman-Pearson regions of an indicator variable for each target state to define distinct information and quantify pointwise information from its boundary. This allows for the non-negative decomposition of an arbitrary number of variables, where the source and target variables can have an arbitrary finite number of states. Finally, we apply the concepts of measuring on lattices, discussed by Knuth [19], to transform a non-negative decomposition with an inclusion-exclusion relation from one information measure to another while maintaining the decomposition properties.
Remark 1. We use the term "target pointwise" or simply "pointwise" within this work to refer to the analysis of each target state individually. This differs from [13,14,18], who use the latter term for the analysis of all joint source-target realizations.

Contributions
In a recent work [20], we presented a decomposition of mutual information on the redundancy lattice (Figure 1b).This work aims to simplify, generalize, and extend these ideas to make the following contributions to the area of partial information decompositions:

• We propose a representation of distinct uncertainty and distinct information, which is used to demonstrate the unexpected behavior of the measure by Williams and Beer [1] (Sections 2.2 and 3.1).

• We propose a non-negative decomposition for any f-information measure at an arbitrary number of discrete random variables that satisfies an inclusion-exclusion relation and provides a meaningful operational interpretation (Sections 3.2, 3.3 and 3.5). The decomposition satisfies the original axioms of Williams and Beer [1] (Theorems 3 and 4) and obtains different properties from different information measures (Section 4).

• We demonstrate several transformations of the proposed decomposition: (i) we transform the cumulative measure between different decomposition lattices (Section 3.4); (ii) we demonstrate that the non-negative decomposition of f-information directly provides a non-negative decomposition of Rényi- and Bhattacharyya-information at a transformed inclusion-exclusion relation (Section 3.6).

Background
This section aims to provide the required background information and introduce the notation used. Section 2.1 discusses the Blackwell order and its special case at binary targets, the zonogon order, which will be used for operational interpretations and the representation of f-information for its decomposition. Section 2.2 discusses the PID framework of Williams and Beer [1] and the relation between a decomposition based on the redundancy lattice and one based on the synergy lattice. We also demonstrate the unintuitive behavior of the original decomposition measure, which will be resolved by our proposal in Section 3. Section 2.3 provides the considered definitions of f-information, Rényi-information, and Bhattacharyya-information for the later demonstration of transforming decomposition results between measures.

Notation 1 (Random variables and their distribution). We use the notation T (upper case) to represent a random variable, ranging over the event space T (calligraphic) containing events t ∈ T (lower case), and use the notation P T (P with subscript) to indicate its probability distribution. The same convention applies to other variables, such as a random variable S with events s ∈ S and distribution P S . We indicate the outer product of two probability distributions as P S ⊗ P T , which assigns the product of their marginals P S (s) • P T (t) to each event (s, t) of the Cartesian product S × T . Unless stated otherwise, we use the notation T, S, and V to represent random variables throughout this work.

Blackwell and Zonogon Order
Definition 1 (Channel). A channel µ = T → S from T to S represents a garbling of the input variable T, which results in variable S. Within this work, we represent an information channel µ as a (row) stochastic matrix, where each element is non-negative and all rows sum to one.
For the context of this work, we consider a variable S to be the observation of the output of an information channel T → S from the target variable T, such that the corresponding channel can be obtained from their conditional probability distribution, as shown in Equation (1), where T = {t 1 , . . ., t n } and S = {s 1 , . . ., s m }.
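As a minimal sketch of this construction (function and variable names are illustrative, not from the paper), the row-stochastic channel matrix of Equation (1) can be computed from a joint distribution by normalizing each row by the target marginal:

```python
def channel_from_joint(joint, t_states, s_states):
    """Row-stochastic channel mu = T -> S with mu[i][j] = P(S = s_j | T = t_i).
    Assumes every target state has positive probability."""
    rows = []
    for t in t_states:
        pt = sum(joint.get((t, s), 0.0) for s in s_states)  # marginal P(T = t)
        rows.append([joint.get((t, s), 0.0) / pt for s in s_states])
    return rows

# Hypothetical joint distribution P(T, S) for binary T and S.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
mu = channel_from_joint(joint, [0, 1], [0, 1])
print(mu)  # [[0.8, 0.2], [0.4, 0.6]] -- each row sums to one
```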
Notation 2 (Binary input channels). Throughout this work, we reserve the symbol κ for binary input channels, meaning κ denotes a stochastic matrix of dimension 2 × m. We use the notation ⃗ v ∈ κ to indicate a column of this matrix.
Definition 2 (More informative [17,21]). An information channel µ 1 = T → S 1 is more informative than another channel µ 2 = T → S 2 if, for any decision problem involving a set of actions a ∈ Ω and a reward function u : (Ω, T ) → R that depends on the chosen action and the state of the variable T, an agent with access to S 1 can always achieve an expected reward at least as high as another agent with access to S 2 .
Definition 3 (Blackwell order [17,21]). The Blackwell order is a preorder of channels. A channel µ 1 is Blackwell superior to channel µ 2 if we can pass its output through a second channel λ to obtain an equivalent channel to µ 2 , as shown in Equation (2).
Blackwell [21] showed that a channel is more informative if and only if it is Blackwell superior. Bertschinger and Rauh [17] showed that the Blackwell order does not form a lattice for channels µ = T → S if |T | > 2, since the ordering does not provide unique meet and join elements. However, binary target variables (|T | = 2) are a special case in which the Blackwell order is equivalent to the zonogon order (discussed next) and does form a lattice [17].

Definition 4 (Zonogon [17]). The zonogon Z(κ) of a binary input channel κ = T → S is defined using the Minkowski sum of the collection of vector segments, as shown in Equation (3). The zonogon Z(κ) can equivalently be defined as the image of the unit cube [0, 1] |S| under the linear map of κ.

The zonogon Z(κ) is a centrally symmetric convex polygon, and the set of vectors ⃗ v i ∈ κ spans its perimeter. Figure 2 shows an example of a binary input channel and its corresponding zonogon.
Definition 5 (Zonogon sum). The addition of two zonogons corresponds to their Minkowski sum, as shown in Equation (4).
Bertschinger and Rauh [17] showed that, for binary input channels, the zonogon order is equivalent to the Blackwell order and forms a lattice (Equation (5)). In the remaining work, we will only discuss binary input channels, such that the orderings of Definitions 2, 3, and 6 are equivalent and can be thought of as zonogons with a subset relation.
To obtain an interpretation of what a channel zonogon Z(κ) represents, we can consider a binary decision problem: we aim to predict the state t ∈ T of a binary target variable T using the output of channel κ = T → S. Any decision strategy λ ∈ [0, 1] |S|×2 for obtaining a binary prediction can be fully characterized by its resulting pair of True-Positive Rate (TPR) and False-Positive Rate (FPR), as shown in Equation (6). Therefore, a channel zonogon Z(κ) provides the set of all achievable (TPR, FPR)-pairs for a given channel κ [20,22]. This can also be seen from Equation (3), where the unit cube a ∈ [0, 1] |S| represents all possible first columns of the decision strategy λ. The first column of λ fully determines the second, since each row has to sum to one. As a result, κa provides the (TPR, FPR)-pair for the decision strategy λ = [ a (1−a) ], and the definition of Equation (3) yields all achievable (TPR, FPR)-pairs for predicting the state of a binary target variable. Since this will be helpful for operational interpretations, we label the axes of zonogon plots accordingly, as shown in Figure 2. The zonogon is the Neyman-Pearson region, and its perimeter corresponds to the vectors ⃗ v s i ∈ κ sorted by an increasing/decreasing slope for the lower/upper half, which results from the likelihood-ratio test. The zonogon, thus, represents the achievable (TPR, FPR)-pairs for predicting T while knowing S.
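The likelihood-ratio construction of the boundary described above can be sketched as follows; the channel numbers are hypothetical, and the code only traces the upper boundary by sorting the columns of κ by decreasing likelihood ratio and taking prefix sums:

```python
def np_upper_boundary(kappa):
    """Upper Neyman-Pearson boundary of a binary-input channel
    kappa = [[P(s | T = t), ...], [P(s | T != t), ...]] (2 x m).
    Columns sorted by decreasing likelihood ratio trace the boundary."""
    cols = list(zip(kappa[0], kappa[1]))
    cols.sort(key=lambda v: v[0] / v[1] if v[1] > 0 else float('inf'),
              reverse=True)
    pts, tpr, fpr = [(0.0, 0.0)], 0.0, 0.0
    for dt, df in cols:
        tpr, fpr = tpr + dt, fpr + df
        pts.append((fpr, tpr))  # x-axis: FPR, y-axis: TPR
    return pts

# Hypothetical channel with three output symbols.
kappa = [[0.6, 0.3, 0.1],
         [0.1, 0.3, 0.6]]
print([(round(x, 3), round(y, 3)) for x, y in np_upper_boundary(kappa)])
# [(0.0, 0.0), (0.1, 0.6), (0.4, 0.9), (1.0, 1.0)]
```

Every point below this curve (and above its point reflection) is reachable by a randomized decision strategy, which is exactly the zonogon of the channel.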
Definition 7 (Neyman-Pearson region [7] and decision regions). The Neyman-Pearson region for a binary decision problem is the set of achievable (TPR, FPR)-pairs and can be visualized as shown in Figure 2. The Neyman-Pearson regions underlie the zonogon order, and their boundary can be obtained from the likelihood-ratio test. We refer to subsets of the Neyman-Pearson region as reachable decision regions, or simply decision regions, and to the boundary as the zonogon perimeter.
Remark 2. Due to the zonogon symmetry, the diagram labels can be swapped (FPR on the x-axis/TPR on the y-axis), which changes the interpretation to aiming at a prediction for T ≠ t.
Notation 3 (Channel lattice). We use the notation κ 1 ⊓ κ 2 for the meet element of binary input channels under the Blackwell order and κ 1 ⊔ κ 2 for their join element. We use the notation ⊤ BW = [1 0; 0 1] (the 2×2 identity matrix) for the top element of binary input channels under the Blackwell order and ⊥ BW = [1; 1] (the all-one column) for the bottom element.
For binary input channels, the meet element of the Blackwell order corresponds to the zonogon intersection Z(κ 1 ⊓ κ 2 ) = Z(κ 1 ) ∩ Z(κ 2 ), and the join element corresponds to the convex hull of their union Z(κ 1 ⊔ κ 2 ) = Conv(Z(κ 1 ) ∪ Z(κ 2 )). Equation (7) describes this for an arbitrary number of channels.

Example 1. The remaining work only analyzes indicator variables, so we only need to consider the case |T | = 2, where all presented ordering relations of this section are equivalent and form a lattice. Figure 3a visualizes a channel κ = T → S with |S| = 3. We can use the observations of S to make a prediction about T. For example, we predict that T is in its first state with probability w 1 if S is in its first state, with probability w 2 if S is in its second state, and with probability w 3 if S is in its third state. These randomized decision strategies can be noted as the stochastic matrix λ shown in Figure 3a. The resulting TPR and FPR of this decision strategy are obtained from the weighted sum of these parameters (w 1 , w 2 , and w 3 ) with the vectors in κ. Each decision strategy corresponds to a point within the zonogon, since the probabilities are constrained by w 1 , w 2 , w 3 ∈ [0, 1] and the resulting zonogon is the Neyman-Pearson region.
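A minimal sketch of the join operation for binary input channels, under the assumption of small example channels (the numbers are hypothetical): the vertices of each zonogon are the prefix sums of the slope-sorted columns, and the join is the convex hull of the union of both vertex sets (Andrew's monotone chain):

```python
def zonogon_vertices(kappa):
    """Boundary vertices of Z(kappa): prefix sums of the columns sorted by
    decreasing slope (upper half) and increasing slope (lower half)."""
    cols = list(zip(kappa[0], kappa[1]))
    key = lambda v: v[0] / v[1] if v[1] > 0 else float('inf')

    def chain(order):
        pts, x, y = [(0.0, 0.0)], 0.0, 0.0
        for dt, df in order:
            x, y = x + df, y + dt
            pts.append((round(x, 12), round(y, 12)))  # collapse float noise
        return pts

    return chain(sorted(cols, key=key, reverse=True)) + chain(sorted(cols, key=key))

def convex_hull(points):
    """Andrew's monotone chain; returns only the extreme points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and ((h[-1][0] - h[-2][0]) * (p[1] - h[-2][1])
                                   - (h[-1][1] - h[-2][1]) * (p[0] - h[-2][0])) <= 0:
                h.pop()
            h.append(p)
        return h[:-1]

    return half(pts) + half(pts[::-1])

# Join of two hypothetical binary-input channels:
# Z(k1 join k2) = Conv(Z(k1) union Z(k2)).
k1 = [[0.9, 0.1], [0.3, 0.7]]
k2 = [[0.5, 0.5], [0.05, 0.95]]
join_hull = convex_hull(zonogon_vertices(k1) + zonogon_vertices(k2))
print(len(join_hull))  # number of extreme points of the join zonogon
```

The meet (zonogon intersection) would additionally require polygon clipping, which we omit here.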
Figure 3b visualizes an example of the discussed ordering relations, where all observable variables have two states: |S i | = 2 for i ∈ {1, 2, 3}. The zonogon/Neyman-Pearson region corresponding to variable S 3 is fully contained within the others (Z(κ 3 ) ⊆ Z(κ 1 ) and Z(κ 3 ) ⊆ Z(κ 2 )). Therefore, we can say that S 3 is Blackwell inferior (Definition 3) and less informative (Definition 2) than S 1 and S 2 about T. Practically, this means that we can construct an equivalent variable to S 3 by garbling S 1 or S 2 and that, for any sequence of actions based on S 3 and any reward function with dependence on T, we can achieve an expected reward at least as high by acting based on S 1 or S 2 instead. The variables S 1 and S 2 are incomparable under the zonogon order, Blackwell order, and informativity order, since the Neyman-Pearson region of one is not fully contained in the other.
The zonogon shown in Figure 3a corresponds to the join under the zonogon order, Blackwell order, and informativity order of S 1 and S 2 in Figure 3b about T. For binary targets, this join can be obtained directly from the convex hull of their Neyman-Pearson regions and corresponds to a valid joint distribution for (T, S 1 , S 2 ). All other joint distributions are either equivalent or superior to it. When doing this on indicator variables for |T | > 2, the obtained joint distributions for each t ∈ T may not combine into a specific valid overall joint distribution.

Partial Information Decomposition
The commonly used framework for PIDs was introduced by Williams and Beer [1]. A PID is computed with respect to a particular random variable that we would like to know information about, called the target, and tries to identify from which of the variables that we have access to, called visible variables, we obtain this information. Therefore, this section considers sets of variables that represent their joint distribution.

Notation 4. Throughout this work, we use the notation T for the target variable and V = {V 1 , . . ., V n } for the set of visible variables. We use the notation P(V) for the power set of V and P 1 (V) = P(V) \ ∅ for its power set without the empty set.

• A source S i ∈ P 1 (V) is a non-empty set of visible variables.

• An atom α ∈ A(V) is a set of sources constructed by Equation (8).
The filter used for obtaining the set of atoms (Equation (8)) removes sets that would be equivalent to other elements. This is required for obtaining a lattice from the following two ordering relations:

Definition 9 (Redundancy/gain lattice [1]). The redundancy lattice (A(V), ≼) is obtained by applying the ordering relation of Equation (9) to all atoms α, β ∈ A(V).
The redundancy lattice for three visible variables is visualized in Figure 4a. On this lattice, we can think of an atom as representing the information that can be obtained from all of its sources about the target T (their redundancy or informational intersection). For example, the atom α = {{V 1 , V 2 }, {V 1 , V 3 }} represents on the redundancy lattice the information that is contained in both (V 1 , V 2 ) and (V 1 , V 3 ) about T. Since both sources in α provide the information of V 1 , their redundancy contains at least this information, and the atom β = {{V 1 }} is considered its predecessor. Therefore, the ordering indicates an informational subset relation for the redundancy of atoms, and the information that is represented by an atom increases as we move up. The up-set of an atom α on the redundancy lattice indicates the information that is lost when losing all of its sources. Considering the example from above, if we lose access to {V 1 (or) V 2 } and {V 1 (or) V 3 }, then we lose access to all atoms in the up-set of α = {{V 1 , V 2 }, {V 1 , V 3 }}.

Figure 4 (caption): (a) The redundancy/gain lattice (A({V 1 , V 2 , V 3 }), ≼) based on the ordering of Equation (9) quantifies information present in all sources. The redundancy of all sources within an atom increases while moving up on the redundancy lattice. (b) A synergy/loss lattice (A({V 1 , V 2 , V 3 }), ⪯) based on the ordering of Equation (10) quantifies information present in neither source. On the synergy lattice, the information that is obtained from neither source of an atom increases while moving up.
The synergy lattice for three visible variables is visualized in Figure 4b. On this lattice, we can think of an atom as representing the information that is contained in neither of its sources (information outside their union). For example, the atom α = {{V 1 , V 2 }, {V 1 , V 3 }} represents on the synergy lattice the information that is obtained from neither (V 1 , V 2 ) nor (V 1 , V 3 ). The ordering again indicates their expected subset relation: the information that is obtained from neither {V 1 (and) V 2 } nor {V 1 (and) V 3 } is fully contained in the information that cannot be obtained from β = {{V 1 }}, and thus, α is a predecessor of β.
With an intuition for both ordering relations in mind, we can see how the filter in the construction of atoms (Equation (8)) removes sets that would be equivalent to another atom: the set {{V 1 , V 2 }, {V 1 }} is removed from the power set of sources since it would be equivalent to the atom {{V 1 }} under the ordering of the redundancy lattice and equivalent to the atom {{V 1 , V 2 }} under the ordering of the synergy lattice. Using Definition 11, one can similarly define the atoms of the decomposition lattices from the power set of sources without the equivalence relation.

Definition 11. We define equivalence relations for sets of sources under the redundancy and synergy order. Synergy order: we use the notation A { ∼ =} B to indicate that two sets of atoms are equal when comparing their contained atoms with respect to equivalence under the synergy order.
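The atom construction of Equation (8) and the redundancy ordering of Equation (9) can be sketched directly on small variable sets (illustrative code; `antichain_atoms` and `redundancy_leq` are our own helper names): sources are non-empty subsets of V, and atoms are the antichains among them:

```python
from itertools import combinations

def antichain_atoms(n):
    """Atoms A(V) for n visible variables: non-empty sets of non-empty
    sources in which no source contains another (the filter of Eq. (8))."""
    variables = range(1, n + 1)
    sources = [frozenset(c) for r in range(1, n + 1)
               for c in combinations(variables, r)]
    atoms = []
    for r in range(1, len(sources) + 1):
        for alpha in combinations(sources, r):
            # keep only antichains: no proper-subset pair among the sources
            if all(not a < b and not b < a
                   for a, b in combinations(alpha, 2)):
                atoms.append(set(alpha))
    return atoms

def redundancy_leq(alpha, beta):
    """alpha precedes beta on the redundancy lattice: every source of beta
    has some source of alpha as a subset."""
    return all(any(a <= b for a in alpha) for b in beta)

print(len(antichain_atoms(2)), len(antichain_atoms(3)))  # 4 18
```

The counts 4 and 18 match the known sizes of the decomposition lattices for two and three visible variables.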
Notation 5 (Redundancy/synergy lattices). We use the notation (A(V), ⋎, ⋏) for the join and meet operators on the redundancy lattice, and (A(V), ∨, ∧) for the join and meet operators on the synergy lattice. We use the notation ⊤ RL = {V} for the top and ⊥ RL = ∅ for the bottom atom on the redundancy lattice, and ⊤ SL = ∅ and ⊥ SL = {V} for the top and bottom atom on the synergy lattice. For an atom α on the redundancy lattice, we use the notation ↓ R α for its down-set, ↓R α for its strict down-set, ↑ R α for its up-set, ↑R α for its strict up-set, and α − R for its cover set. For an atom α on the synergy lattice, we use the notation ↓ S α for its down-set, ↓S α for its strict down-set, ↑ S α for its up-set, ↑S α for its strict up-set, and α − S for its cover set.
For convenience, Table 1 provides a summary of the notation used.
Table 1.Summary of the notation used for the redundancy and synergy lattice.

[Table 1 lists, for the redundancy order and the synergy order, the corresponding ordering/equivalence relations and lattice notation.]

The redundant, unique, or synergetic information (partial contributions) can be calculated based on either lattice. They are obtained by quantifying each atom of the redundancy or synergy lattice with a cumulative measure that increases as we move up in the lattice. The partial contributions are then obtained in a second step from a Möbius inverse.

Definition 12 ([Cumulative] redundancy measure [1]). A redundancy measure I ∩ (α; T) is a function that assigns a real value to each atom of the redundancy lattice. It is interpreted as a cumulative information measure that quantifies the redundancy between all sources S ∈ α of an atom α ∈ A(V) about the target T.

Definition 13 ([Cumulative] loss measure [23]). A loss measure I ∪ (α; T) is a function that assigns a real value to each atom of the synergy lattice. It is interpreted as a cumulative measure that quantifies the information about T that is provided by neither of the sources S ∈ α of an atom α ∈ A(V).
To ensure that a redundancy measure actually captures the desired concept of redundancy, Williams and Beer [1] defined three axioms that a measure I ∩ should satisfy. For the synergy lattice, we consider the equivalent axioms discussed by Chicharro and Panzeri [23]:

Axiom 1 (Symmetry [1,23]). Invariance in the order of sources (σ permuting the order of indices).

Axiom 2 (Monotonicity [1,23]). Additional sources can only decrease redundant information. Additional sources can only decrease the information that is in neither source.

Axiom 3 (Self-redundancy [1,23]). For a single source, redundancy equals mutual information. For a single source, the information loss equals the difference between the total available mutual information and the mutual information of the considered source with the target.
The first axiom states that an atom's redundancy and information loss should not depend on the order of its sources.The second axiom states that adding sources to an atom can only decrease the redundancy of all sources (redundancy lattice) and decrease the information from neither source (synergy lattice).The third axiom binds the measures to be consistent with mutual information and ensures that the bottom element of both lattices is quantified to zero.
Once a lattice with the corresponding cumulative measure (I ∩ /I ∪ ) is defined, we can use the Möbius inverse to compute the partial contribution of each atom. This partial information can be visualized as the partial areas in a Venn diagram (see Figure 1a) and corresponds to the desired redundant, unique, and synergetic contributions. However, the same atom represents different partial contributions on each lattice: as visualized for the case of two visible variables in Figure 1, the unique information of variable V 1 is represented by α = {{V 1 }} on the redundancy lattice and by β = {{V 2 }} on the synergy lattice.

Definition 14 (Partial information [1,23]). Partial information ∆I ∩ (α; T) and ∆I ∪ (α; T) corresponds to the Möbius inverse of the corresponding cumulative measure on the respective lattice.
Redundancy lattice: Synergy lattice:

Remark 3. Using the Möbius inverse for defining partial information enforces an inclusion-exclusion relation in that all partial information contributions have to sum to the corresponding cumulative measure. Kolchinsky [16] argues that an inclusion-exclusion relation should not be expected to hold for PIDs and proposes an alternative decomposition framework. In this case, the sum of partial contributions (unique/redundant/synergetic information) is no longer expected to sum to the total amount I(V; T).
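A generic Möbius inversion on a finite lattice can be sketched as a bottom-up recursion: each atom's partial information is its cumulative value minus the partial information already assigned to its strict down-set. The bivariate lattice and the cumulative values below are purely illustrative, not taken from the paper:

```python
def mobius_inverse(cumulative, leq):
    """Partial information via Mobius inversion on a finite lattice:
    delta(alpha) = I(alpha) - sum of delta over alpha's strict down-set."""
    delta = {}
    # visit atoms bottom-up: atoms with smaller down-sets first
    for alpha in sorted(cumulative,
                        key=lambda a: sum(leq(b, a) for b in cumulative)):
        delta[alpha] = cumulative[alpha] - sum(
            delta[b] for b in delta if leq(b, alpha) and b != alpha)
    return delta

# Hypothetical bivariate redundancy lattice for V = {V1, V2};
# atom names and cumulative values are chosen for illustration only.
order = {('{1}{2}', '{1}'), ('{1}{2}', '{2}'), ('{1}{2}', '{12}'),
         ('{1}', '{12}'), ('{2}', '{12}')}
leq = lambda a, b: a == b or (a, b) in order
I = {'{1}{2}': 0.2, '{1}': 0.5, '{2}': 0.4, '{12}': 1.0}
delta = mobius_inverse(I, leq)
print(delta)  # redundancy ~0.2, unique ~0.3 and ~0.2, synergy ~0.3
```

By construction, the four partial contributions sum back to the cumulative value of the top atom, which is exactly the inclusion-exclusion relation discussed above.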

Property 1 (Local positivity, non-negativity [1]). A partial information decomposition satisfies non-negativity or local positivity if its partial information contributions are always non-negative, as shown in Equation (13).

The non-negativity property is important if we assume an inclusion-exclusion relation, since it states that the unique, redundant, or synergetic information cannot be negative. If an atom α provides a negative partial contribution in the framework of Williams and Beer [1], then this may indicate that we over-counted some information in its down-set.

Remark 4. Several additional axioms and properties have been suggested since the original proposal of Williams and Beer [1], such as target monotonicity and the target chain rule [4]. However, this work will only consider the axioms and properties of Williams and Beer [1]. To the best of our knowledge, no other measure since the original proposal (discussed below) has been able to satisfy these properties for an arbitrary number of visible variables while ensuring an inclusion-exclusion relation for their partial contributions.

It is possible to convert between both representations due to a lattice duality:
Definition 15 (Lattice duality and dual decompositions [23]). Let C = (A(V)\{⊥ RL }, ≼) be a redundancy lattice with associated measure I ∩ , and let D = (A(V)\{⊥ SL }, ⪯) be a synergy lattice with measure I ∪ ; then, the two decompositions are said to be dual if and only if the down-set on one lattice corresponds to the up-set in the other, as shown in Equation (14).
Williams and Beer [1] proposed I min ∩ , as shown in Equation (15), to be used as a measure of redundancy and demonstrated that it satisfies the three required axioms and local positivity. They define redundancy (Equation (15b)) as the expected value of the minimum specific information (Equation (15a)).
Remark 5. Throughout this work, we use the term "target pointwise information" or simply "pointwise information" to refer to "specific information". This shall avoid confusion when naming their corresponding binary input channels in Section 3.
To the best of our knowledge, this measure is the only existing non-negative decomposition that satisfies all three axioms listed above for an arbitrary number of visible variables while providing an inclusion-exclusion relation of partial information. However, the measure I min ∩ can be criticized for not providing a notion of distinct information due to its use of a pointwise minimum (for each t ∈ T ) over the sources. This leads to the question of distinguishing "the same information and the same amount of information" [3-6]. We can use the definition through a pointwise minimum (Equation (15)) to construct examples of unexpected behavior: consider, for example, a uniform binary target variable T and two visible variables as the outputs of the channels visualized in Figure 5. The channels are constructed to be equivalent for both target states and provide access to distinct decision regions while ensuring constant pointwise information for all t ∈ T . Even though our ability to predict the target variable significantly depends on which of the two indicated channel outputs we observe (blue or green in Figure 5; incomparable informativity based on Definition 2), the measure I min ∩ concludes full redundancy between them. We think this behavior is undesired and, as discussed in the literature, caused by an underlying lack of distinguishing the same information. To resolve this issue, we will present a representation of f-information in Section 3.1, which allows the use of all (TPR, FPR)-pairs for each state of the target variable to represent a distinct notion of uncertainty.
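For reference, the measure of Equation (15) can be sketched directly: pointwise (specific) information per target state, followed by the expected pointwise minimum over sources. The sanity check with two identical copies of a fair bit (a hypothetical example) recovers full redundancy of 1 bit:

```python
import math
from collections import defaultdict

def specific_information(joint, t):
    """I(S; T = t) = sum_s p(s|t) * log2(p(t|s) / p(t)) for joint {(s, t): p}."""
    pt = sum(p for (s, tt), p in joint.items() if tt == t)
    ps = defaultdict(float)
    for (s, tt), p in joint.items():
        ps[s] += p
    return sum((p / pt) * math.log2((p / ps[s]) / pt)
               for (s, tt), p in joint.items() if tt == t and p > 0)

def i_min(joints, t_states, pt):
    """Williams-Beer redundancy: expected pointwise minimum over the sources,
    each source given as its own joint distribution with T."""
    return sum(pt[t] * min(specific_information(j, t) for j in joints)
               for t in t_states)

# Two identical copies of a fair bit: redundancy equals H(T) = 1 bit.
copy = {(0, 0): 0.5, (1, 1): 0.5}
print(i_min([copy, copy], [0, 1], {0: 0.5, 1: 0.5}))  # 1.0
```

Because only the per-state minimum amount enters the sum, two channels with equal pointwise information but incomparable decision regions are declared fully redundant, which is exactly the behavior criticized above.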
Figure 5 (caption): Example of the unexpected behavior of I min ∩ : the dashed isoline indicates the pairs (x, y) for which the channel κ(x, y) = T → V i results in pointwise information I(V i ; T = t) = 0.2 for all t ∈ T at a uniform binary target variable. Even though observing the output of either indicated example channel (blue/green) provides significantly different abilities for predicting the target variable state, the measure I min ∩ indicates full redundancy.

Information Measures
This section discusses two generalizations of mutual information at discrete random variables based on f-divergences and Rényi-divergences [24,25]. While mutual information has interpretational significance in channel coding and data compression, other f-divergences have significance in parameter estimation, high-dimensional statistics, and hypothesis testing ([7], p. 88), while Rényi-divergences can be found, among others, in privacy analysis [8]. Finally, we introduce Bhattacharyya-information for demonstrating that it is possible to chain decomposition transformations in Section 3.6. All definitions in this section only consider the case of discrete random variables (which is what we need in the context of this work).
Definition 16 (f-divergence [24]). Let f : (0, ∞) → R be a function that satisfies the following three properties: By convention, we understand that f(0) = lim_{z→0+} f(z) and 0 · f(0/0) = 0. For any such function f and two discrete probability distributions P and Q over the event space X, the f-divergence for discrete random variables is defined as shown in Equation (16).
Notation 6. Throughout this work, we reserve the name f for functions that satisfy the required properties of an f-divergence from Definition 16.
An f-divergence quantifies a notion of dissimilarity between two probability distributions P and Q. Key properties of f-divergences are their non-negativity, their invariance under bijective transformations, and the fact that they satisfy a data-processing inequality ([7], p. 89). A list of commonly used f-divergences is shown in Table 2. Notably, the continuation at a = 1 of both the Hellinger- and the α-divergence results in the KL-divergence [26].
The generator function of an f-divergence is not unique, since D_f̃ = D_f for f̃(z) = f(z) + c(z − 1) with any real constant c ∈ R ([7], p. 90f). As a result, the considered α-divergence is a linear scaling of the Hellinger divergence (D_{H_a} = a · D_{α=a}), as shown in Equation (17).
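This non-uniqueness is easy to check numerically. The following minimal sketch (with hypothetical distributions of our own choosing) verifies that adding c(z − 1) to the KL generator leaves the divergence unchanged, since the shift contributes c · Σ_x (P(x) − Q(x)) = 0:

```python
import math

def f_divergence(P, Q, f):
    # D_f(P || Q) = sum_x Q(x) * f(P(x)/Q(x)) for discrete distributions
    return sum(q * f(p / q) for p, q in zip(P, Q) if q > 0)

f_kl = lambda z: z * math.log(z) if z > 0 else 0.0   # KL generator
c = 2.7                                              # arbitrary constant
f_shifted = lambda z: f_kl(z) + c * (z - 1.0)        # generates the same divergence

P = [0.2, 0.5, 0.3]
Q = [0.4, 0.4, 0.2]
assert abs(f_divergence(P, Q, f_kl) - f_divergence(P, Q, f_shifted)) < 1e-12
```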
Table 2. Commonly used functions for f-divergences (columns: Notation, Name, Generator Function).
Definition 17 (f-information [7]). An f-information is defined based on an f-divergence between the joint distribution of two discrete random variables and the product of their marginals, as shown in Equation (18).
Definition 18 (f-entropy). A notion of f-entropy for a discrete random variable is obtained from the self-information of the variable, H_f(T) := I_f(T; T).

Notation 7. Using the KL-divergence results in the definition of mutual information and Shannon entropy. Therefore, we use the notation I_KL for mutual information (KL-information) and H_KL (KL-entropy) for the Shannon entropy.
The remaining part of this section defines Rényi- and Bhattacharyya-information to highlight that they can be represented as an invertible transformation of Hellinger-information. This will be used in Section 3.6 to transform the decomposition of Hellinger-information into a decomposition of Rényi- and Bhattacharyya-information.

Remark 6. We could similarly choose to represent the Rényi-divergence as a transformation of the α-divergence. A linear scaling of the considered f-divergence will, however, not affect our later results (see Section 3.6).
Definition 20 (Rényi-information). Rényi-information is defined analogously to f-information, as shown in Equation (20), and corresponds to an invertible transformation of Hellinger-information (I_{H_a}): I_{R_a}(S; T) := R_a(P_{(S,T)} ∥ P_S P_T). Finally, we consider the Bhattacharyya distance (Definition 21), which is equivalent to a linear scaling of a special case of the Rényi-divergence (Equation (21)) [26]. It is applied, among others, in signal processing [27] and coding theory [28]. The corresponding information measure (Equation (22)) is, like its distance, a scaling of a special case of Rényi-information.
Definition 21 (Bhattacharyya distance [29]). Let P and Q be two discrete probability distributions over the event space X; then, the Bhattacharyya distance is defined as shown in Equation (21).
Definition 22 (Bhattacharyya-information). Bhattacharyya-information is defined analogously to f-information, as shown in Equation (22).
Example 2. Consider the channel κ : T → S with T = {t1, t2} and S = {s1, s2}. While it will be discussed in more detail in Section 3.1, Equation (23) already indicates that f-information can be interpreted as the expected value of quantifying the boundary of the Neyman-Pearson region for an indicator variable of each target state t ∈ T. Each state of a source variable s ∈ S corresponds to one side/edge of this boundary, as discussed in Section 2.1 and visualized in Figure 2. Therefore, the sum over s ∈ S corresponds to quantifying each edge of the zonogon by some function, which is parameterized only by the distribution of the indicator variable for t. This function satisfies a triangle inequality (Corollary A1), and the total boundary is non-negative (Theorem 2, discussed later). Therefore, we can vaguely think of pointwise f-information as quantifying the length of the boundary of the Neyman-Pearson region, or zonogon perimeter, to give an oversimplified intuition.
Below is a stepwise computation of χ²-information (f(z) = (z − 1)²) on a small example from this interpretation for the setting of Equation (24).
Since |T| = 2, we compute the pointwise information for two indicator variables, as shown in Figure 6. Since each state s ∈ S corresponds to one edge of the zonogon, we compute them individually. Notice that the quantification of each vector v_s can be expressed as a function that is parameterized only by the distribution of the indicator variable. The total zonogon perimeter is quantified as the sum of its edges, which equals the pointwise information. In this particular case, we obtain 0.292653 for the total boundary on the indicator of t1 and 0.130068 for the total boundary on the indicator of t2. The expected information corresponds to the expected value of these pointwise quantifications and provides the final result (Equation (25)).
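To make this interpretation concrete, the sketch below uses a hypothetical joint distribution (not the numbers of Equation (24), which are not reproduced here; the helper names mirror r_f and i_f but the implementation is our own). It checks that χ²-information computed directly from the joint and the product of marginals equals the expected value over t ∈ T of the edge-wise zonogon quantification, where each edge of the pointwise channel is the pair (P(s | T = t), P(s | T ≠ t)):

```python
# Hypothetical joint distribution P(t, s) for illustration.
P = {
    ("t1", "s1"): 0.30, ("t1", "s2"): 0.10,
    ("t2", "s1"): 0.15, ("t2", "s2"): 0.45,
}
T = sorted({t for t, _ in P})
S = sorted({s for _, s in P})
p_t = {t: sum(P[(t, s)] for s in S) for t in T}
p_s = {s: sum(P[(t, s)] for t in T) for s in S}

f_chi2 = lambda z: (z - 1.0) ** 2  # generator of the chi^2-divergence

def r_f(p, tpr, fpr, f):
    # quantify one zonogon edge; the marginal mass of this source state is
    # m = p * TPR + (1 - p) * FPR, and the edge contributes m * f(TPR / m)
    m = p * tpr + (1.0 - p) * fpr
    return 0.0 if m == 0 else m * f(tpr / m)

def pointwise_info(t, f):
    # half the quantified zonogon perimeter for the indicator of state t
    p = p_t[t]
    edges = [(P[(t, s)] / p, (p_s[s] - P[(t, s)]) / (1.0 - p)) for s in S]
    return sum(r_f(p, x, y, f) for x, y in edges)

# f-information two ways: directly from joint vs. product of marginals,
# and as the expected pointwise quantification over the target states.
direct = sum(p_t[t] * p_s[s] * f_chi2(P[(t, s)] / (p_t[t] * p_s[s]))
             for t in T for s in S)
pointwise = sum(p_t[t] * pointwise_info(t, f_chi2) for t in T)
assert abs(direct - pointwise) < 1e-12
```

The identity holds because P(s) = P_T(t) · P(s | T = t) + (1 − P_T(t)) · P(s | T ≠ t), so summing the edge quantifications reproduces the usual double sum of the f-information.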

Decomposition Methodology
To construct a partial information decomposition in the framework of Williams and Beer [1], we only have to define a cumulative redundancy measure (I_∩) or a cumulative loss measure (I_∪). However, doing so requires a meaningful definition of when information is the same. Therefore, Section 3.1 presents an interpretation of f-information that enables a representation of distinct information. Specifically, we demonstrate that pointwise f-information for a target state t ∈ T corresponds to the Neyman-Pearson region of its indicator variable, which is quantified by its boundary (zonogon perimeter). This allows the interpretation that each distinct (TPR, FPR)-pair for predicting a state of the target variable provides a distinct notion of uncertainty. This interpretation of f-information is used in Section 3.2 to construct a partial information decomposition on the synergy lattice under the Blackwell order for each state t ∈ T individually. These individual decompositions are then combined into the final result. Therefore, we decompose specific information based on the Blackwell order rather than using its minimum, as Williams and Beer [1] did. The resulting operational interpretation is discussed in Section 3.3. Section 3.4 studies the relation between decomposition lattices to derive the dual-decomposition of any f-information on the redundancy lattice in Section 3.5 and prove its correctness. We use the obtained decomposition for any f-information in Section 3.6 to transform a Hellinger-information decomposition into a Rényi-information decomposition while maintaining its non-negativity and an inclusion-exclusion relation. To achieve the desired axioms and properties, we combine different aspects of the existing literature:

• Like Bertschinger et al. [9] and Kolchinsky [16], we base the decomposition on the Blackwell order and use this to obtain the operational interpretation of the decomposition.
• Like Williams and Beer [1], and related to Lizier et al. [18], Finn and Lizier [13], and Ince [14], we perform a decomposition from a pointwise perspective, but only for the target variable.
• In a similar manner to how Finn and Lizier [13] used probability mass exclusion to differentiate distinct information, we use Neyman-Pearson regions for each state of a target variable to differentiate distinct information.
• We propose applying the concepts of lattice re-graduation discussed by Knuth [19] to PIDs to transform the decomposition of one information measure into another while maintaining its consistency.

We extend Axiom 3 of Williams and Beer [1] as shown below, to allow binding any information measure to the decomposition.

Axiom 3* (Self-redundancy). For a single source, redundancy I_∩,* and information loss I_∪,* correspond to the information measure I_* as shown below:

Representing f-Information
We begin with an interpretation of f-information, for which we define a pointwise (indicator) variable π(T, t) that represents one state of the target variable (Equation (27a)) and construct its pointwise information channel (Definition 23). Then, we define a function r_f based on the generator function of an f-divergence for quantifying (half) the zonogon perimeter of each pointwise information channel (see Figure 2). These perimeter quantifications are pointwise f-information.

Definition 23 ([Target] pointwise binary input channel). We define a target pointwise binary input channel κ(S, T, t) from one state of the target variable t ∈ T to an information source S with event space S = {s1, . . ., sm}, as shown in Equation (27b).

• We define a function r_f, as shown in Equation (28a), to quantify a vector, where 0 ≤ p, x, y ≤ 1.
• We define a target pointwise f-information function i_f, as shown in Equation (28b), to quantify half the zonogon perimeter of the corresponding pointwise channel Z(κ(S, T, t)).
Theorem 1 (Properties of r_f). The function r_f (1) is convex in ⃗v, (2) scales linearly in ⃗v, (3) satisfies a triangle inequality in ⃗v, (4) quantifies any vector of slope one to zero, and (5) quantifies the zero vector to zero.

Proof.
1. The convexity of r_f(p, ⃗v) in ⃗v follows from the convexity of the generator function f (Definition 16).
2. That r_f(p, ℓ⃗v) = ℓ · r_f(p, ⃗v) scales linearly in ⃗v can be seen directly from Equation (28a).
3. The triangle inequality of r_f(p, ⃗v) in ⃗v is shown separately in Corollary A1 of Appendix A.
4. Any vector of slope one is quantified to zero, since f(1) = 0 for the generator function of an f-divergence.
5. The zero vector is quantified to zero, r_f(p, (0, 0)ᵀ) = 0 · f(0/0) = 0, by the convention of generator functions for an f-divergence (Definition 16).
The function r_f provides the following properties to the pointwise information measure i_f.

Theorem 2 (Properties of i_f). The pointwise information measure i_f (1) maintains the ordering relation of the Blackwell order for binary input channels and (2) is non-negative.

Proof.
1. That the function r_f maintains the ordering relation of the Blackwell order on binary input channels is shown separately in Lemma A2 of Appendix A (Equation (29a)).
2. The bottom element ⊥_BW of the Blackwell order consists of a single vector of slope one, which is quantified to zero by Theorem 1 (Equation (29b)). The combination with Equation (29a) ensures non-negativity.
An f-information corresponds to the expected value of the target pointwise f-information function defined above (Equation (30)). As a result, we can interpret f-information as the expected value of quantifying (half) the zonogon perimeters of the target pointwise channels κ(S, T, t).
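As a numerical sanity check of Theorem 2, the sketch below (hypothetical random channels and our own helper names, using the KL generator) verifies that merging two outputs of a binary-input channel, which is a garbling and hence a degradation in the Blackwell order, never increases the pointwise measure i_f, and that i_f stays non-negative:

```python
import math
import random

def r_f(p, tpr, fpr, f):
    # quantify one zonogon edge; convention 0 * f(0/0) = 0
    m = p * tpr + (1.0 - p) * fpr
    return 0.0 if m == 0 else m * f(tpr / m)

def i_f(p, channel, f):
    # channel: list of (P(s | T = t), P(s | T != t)) pairs, one per output s
    return sum(r_f(p, x, y, f) for x, y in channel)

f_kl = lambda z: z * math.log(z) if z > 0 else 0.0

random.seed(0)
for _ in range(100):
    xs = [random.random() for _ in range(3)]
    ys = [random.random() for _ in range(3)]
    kappa1 = [(x / sum(xs), y / sum(ys)) for x, y in zip(xs, ys)]
    # merge outputs s1 and s2: a deterministic post-processing of kappa1,
    # which lies below kappa1 in the Blackwell order
    kappa2 = [(kappa1[0][0] + kappa1[1][0], kappa1[0][1] + kappa1[1][1]), kappa1[2]]
    p = random.random()
    assert 0.0 <= i_f(p, kappa2, f_kl) <= i_f(p, kappa1, f_kl) + 1e-12
```

For the KL generator, the inequality per merge is exactly the log-sum inequality, matching the data-processing property of f-divergences cited above.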

Decomposing f-Information on the Synergy Lattice
With the representation of Section 3.1 in mind, we can define a non-negative partial information decomposition for a set of visible variables V = {V1, . . ., Vn} about a target variable T for any f-information. The decomposition is performed from a pointwise perspective, which means that we decompose the pointwise measure i_f on the synergy lattice (A(V), ⪯) for each t ∈ T. The pointwise synergy lattices are then combined using a weighted sum to obtain the decomposition of I_f.
We map each atom of the synergy lattice to the join of pointwise channels for its contained sources.
Definition 25 (From atoms to channels). We define the channel corresponding to an atom α ∈ A(V) as shown in Equation (31).
Lemma 1. For any sets of sources α, β ∈ P(P1(V)) and a target variable T with state t ∈ T, the function κ_⊔ maintains the ordering of the synergy lattice under the Blackwell order, as shown in Equation (32).
Lemma 1 is shown separately in Appendix C. The mapping from Definition 25 provides a lattice that can be quantified using pointwise f-information to construct a cumulative loss measure for its decomposition using the Möbius inverse.

Definition 26 ([Target] pointwise cumulative and partial loss measures). We define the target pointwise cumulative and partial loss functions as shown in Equations (33a) and (33b):

i_∪,f(α, T, t) := i_f(P_T(t), κ(V, T, t)) − i_f(P_T(t), κ_⊔(α, T, t))  (33a)

The combined cumulative and partial measures are the expected value of their corresponding pointwise measures. This corresponds to combining the pointwise decomposition lattices by a weighted sum.

Definition 27 (Combined cumulative and partial loss measures). The cumulative loss measure I_∪,f is defined by Equation (34) and the decomposition result ∆I_∪,f by Equation (35).
Theorem 3. The presented definitions for the pointwise and expected loss measures (i ∪, f and I ∪, f ) provide a non-negative PID on the synergy lattice with an inclusion-exclusion relation that satisfies Axioms 1, 2, and 3* for any f -information measure.

Proof.
• Axiom 1: The measure i_∪,f (Equation (33a)) is invariant to permuting the order of sources in α, since the join operator of the zonogon order (⊔_{S∈α}) is. Therefore, I_∪,f also satisfies Axiom 1.
• Axiom 2: The monotonicity of both i_∪,f and I_∪,f on the synergy lattice is shown separately as Corollary A2 in Appendix C.
• Axiom 3*: For a single source, i_∪,f equals the pointwise information loss by definition (see Equations (26), (28b), and (33a)). Therefore, I_∪,f satisfies Axiom 3*.

• Non-negativity: The non-negativity of ∆i_∪,f and ∆I_∪,f is shown separately as Lemma A8 in Appendix C.
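As an illustration of the loss measure, the following sketch (our own minimal implementation with hypothetical helper names; the authors' actual implementation is the one referenced in [30]) computes the pointwise Blackwell join as the upper convex hull of the channel zonogons and evaluates the cumulative loss i_∪,f at the atom {{V1}, {V2}} for the XOR example: the join of the two single-variable channels carries no information, so the entire pointwise KL-information of the joint channel is quantified as loss (synergy):

```python
import math

def r_f(p, tpr, fpr, f):
    # quantify one boundary edge; convention 0 * f(0/0) = 0
    m = p * tpr + (1.0 - p) * fpr
    return 0.0 if m == 0 else m * f(tpr / m)

def vertices(channel):
    # upper zonogon boundary in the (FPR, TPR) plane:
    # accumulate edges sorted by decreasing slope
    edges = sorted(channel, key=lambda e: -(e[0] / e[1]) if e[1] else -math.inf)
    pts, x, y = [(0.0, 0.0)], 0.0, 0.0
    for tpr, fpr in edges:
        x, y = x + fpr, y + tpr
        pts.append((x, y))
    return pts

def i_f_join(p, channels, f):
    # Blackwell join of pointwise channels: upper convex hull of all vertices
    pts = sorted({v for ch in channels for v in vertices(ch)})
    hull = []
    for q in pts:
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            if (ax - ox) * (q[1] - oy) - (ay - oy) * (q[0] - ox) >= 0:
                hull.pop()  # last point lies on or below the new upper edge
            else:
                break
        hull.append(q)
    return sum(r_f(p, b[1] - a[1], b[0] - a[0], f) for a, b in zip(hull, hull[1:]))

f_kl = lambda z: z * math.log(z) if z > 0 else 0.0

# XOR: T = V1 xor V2 with V1, V2 uniform iid; indicator state t has P_T(t) = 1/2.
# Channel columns are (P(s | T = t), P(s | T != t)).
chan_v1 = [(0.5, 0.5), (0.5, 0.5)]                              # uninformative alone
chan_v2 = [(0.5, 0.5), (0.5, 0.5)]
chan_joint = [(0.5, 0.0), (0.5, 0.0), (0.0, 0.5), (0.0, 0.5)]   # (V1, V2) jointly

p = 0.5
total = i_f_join(p, [chan_joint], f_kl)               # pointwise i_f of kappa(V, T, t)
loss = total - i_f_join(p, [chan_v1, chan_v2], f_kl)  # i_union,f at {{V1}, {V2}}
assert abs(total - math.log(2)) < 1e-12
assert abs(loss - math.log(2)) < 1e-12                # all information is synergetic
```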

Operational Interpretation
From a pointwise perspective (|T| = 2), there always exists a dependency between the sources for which the synergy of this state becomes zero. This dependence corresponds, by definition, to the join of their channels. This is helpful for the operational interpretation in the following paragraph since, individually, each pointwise synergy is fully volatile to the dependence between the sources. For |T| > 2, there may not exist a dependency between the sources for which the expected synergy becomes zero. However, each decision region that is quantified as synergetic becomes inaccessible at some dependence between the sources.
The decomposition obtains the operational interpretation that, if a variable provides pointwise unique information, then there exists a unique decision region for some t ∈ T that this variable provides access to. Moreover, if a set of variables provides synergetic information, then a decision region for some t ∈ T may become inaccessible if the dependence between the variables changes. Due to the equivalence of the zonogon and Blackwell order for binary input variables, these interpretations can also be transferred to a set of actions a ∈ Ω and a pointwise reward function u(a, π(T, t)), which only depends on one state of the target variable π(T, t) (see Section 2.1): if a variable provides unique information, then it provides an advantage for some set of actions and pointwise reward function, while synergy indicates that the advantage for some pointwise reward function is based on the dependence between variables.
The implication of the interpretation does not hold in the other direction, which we will also highlight in the example of I_∪,TV in Section 4.1. Finally, the definition of the Blackwell order through the chaining of channels (Equation (2)) highlights its suitability for tracing the flows of information in Markov chains (see Section 4.2).

Remark 7. The operational interpretation can be strengthened further, such that the implication between accessible regions and partial information holds in both directions, by revising Lemmas A1 and A2 with a strictly convex generator function to obtain κ1 ⊏ κ2 =⇒ i_f(p, κ1) < i_f(p, κ2).

Decomposition Duality
A non-negative decomposition on the synergy lattice raises the question of its dual-decomposition on the redundancy lattice. Unfortunately, the definition of decomposition duality (Definition 15 [23]) does not specify the mapping between atoms needed to easily construct dual-decompositions. Therefore, this section discusses how the redundancy and synergy lattice are related by identifying operators that transform one lattice into the other. This transformation can then be used to refine the definition of decomposition duality and, correspondingly, to transform the cumulative measure between lattices.

Definition 28. We define two functions: The function Ξ : P(P1(V)) → P(P1(V)) provides the atom with complement sources, and the function Ψ : P(P1(V)) → P(P1(V)) is the n-ary Cartesian product. We indicate the i-th source of an atom as α[i] and some variable within the i-th source as x_i.

Figure 7 visualizes the relations of the introduced operators to provide an intuition. Applying the function Ψ to all atoms is equal to reversing the redundancy order, while applying the function Ξ to all atoms is equal to swapping the ordering relation used (synergy/redundancy order). With these definitions in place, we can refine the definition of decomposition duality:

Lemma 4 (Decomposition duality). A redundancy- and synergy-based information decomposition is pointwise dual if, for all α ∈ A(V) \ {⊥RL}: A redundancy- and synergy-based information decomposition is dual if, for all α ∈ A(V) \ {⊥RL}:

The proof of Lemma 4 is shown separately in Appendix D. To convert a decomposition from the synergy lattice into its dual-decomposition on the redundancy lattice, the following relation is particularly useful. It states that, on the synergy lattice, all atoms are either in the up-set of Ξ(Ψ(α)) or in the down-set of an atom that corresponds to an individual source within α.

Lemma 5. For any atom α, every atom of the synergy lattice lies either in the up-set of Ξ(Ψ(α)) or in the down-set of an atom {S_a} corresponding to an individual source S_a ∈ α.

Proof. When expanding the definition of up- and down-sets, it can be seen directly from Lemma A9 that both sets provide an exclusive partitioning of all atoms.
Figure 8 summarizes and visualizes the required relations for the following transformation of the cumulative measure: (i) The bottom elements of all lattices are mapped to each other and quantified to zero. (ii) The function Ψ reverses the redundancy lattice (β ≃ Ψ(α) such that α ≃ Ψ(β)) to relate the down-set of α to the up-set of β while ignoring the bottom element. The function Ξ captures the relation between both orderings (α′ ≅ Ξ(β) such that β ≃ Ξ(α′)) to relate the up-set of β on the redundancy lattice to the up-set of α′ on the synergy lattice. This provides the desired mapping from the down-set of α on the redundancy lattice to the up-set of α′ on the synergy lattice for duality. Alternatively, we could first transform the down-set of α on the redundancy lattice to the down-set of β′ = Ξ(α) on the synergy lattice, then reverse the synergy order and obtain the same result. (iii) Lemma 5 states that all atoms on the synergy lattice are either in the up-set of Ξ(Ψ(α)) or in the down-set of {S_a} with S_a ∈ α. The example α = {S3, S12} is visualized in Figure 8, and we encourage the reader to consider another example, such as α = {S13}. (iv) Lemma 5 states that, on the synergy lattice, exactly those atoms that are not in the up-set of Ξ(Ψ({S3, S12})) must be in the down-set of either {S3} or {S12}.
With these relations in place, we can construct dual-decompositions and prove their correctness.

Lemma 6. The pointwise dual-decomposition for the redundancy lattice of a loss measure on the synergy lattice is defined by:

The proof of Lemma 6 is shown separately in Appendix D. This section discussed the relation between four decomposition lattices: the redundancy and synergy lattice, as well as their reversed counterparts. Additionally, we demonstrated how this relation can be used to transform a cumulative decomposition measure between them. Decomposition duality forces each lattice to be consistent with its set-theoretic interpretation. The function Ψ corresponds to taking the set-theoretic complement on the redundancy lattice and, thus, reflects on the cumulative measure by subtracting it from the top atom. The function Ξ corresponds to the relation between union and intersection and, thus, introduces an inclusion-exclusion principle between their cumulative measures.

Decomposing f-Information on the Redundancy Lattice
Using the results from Section 3.4, we can now convert the decomposition of Section 3.2 to the redundancy lattice. The conversion can be applied to either the expected or the pointwise measure. The partial contributions (∆i_∩,f and ∆I_∩,f) are obtained from the Möbius inverse.
Lemma 7 (Dual-decomposition on the redundancy lattice). The definitions of Equation (43) correspond to the dual-decomposition of Definition 26.
Proof. The duality of the pointwise measure is obtained from Lemma 6 and Definition 26.
The duality of the pointwise measure implies the duality of the combined measure.
The function i_f(P_T(t), κ_⊔(α, T, t)) quantifies the convex hull/Blackwell join of the Neyman-Pearson regions of its sources and represents a notion of pointwise union information about the target state t ∈ T. It is used in Equation (33a) to define a pointwise loss measure for the synergy lattice by subtracting it from the total information. As expected, we can see that the corresponding dual-decomposition on the redundancy lattice enforces an inclusion-exclusion relation between our notions of pointwise union information (i_f(P_T(t), κ_⊔(α, T, t))) and pointwise intersection information (i_∩,f(α, T, t)).
Theorem 4. The dual-decomposition as defined by Equation (43) provides a non-negative PID, which satisfies an inclusion-exclusion relation and the axioms of Williams and Beer [1] on the redundancy lattice for any f -information.

Proof.
• Axiom 1: The measure i_∩,f is invariant to permuting the order of sources in α, since the join operator of the zonogon order (⊔_{S∈α}) is. Therefore, I_∩,f also satisfies Axiom 1.

• Non-negativity: The non-negativity of ∆i_∩,f is obtained from Lemma 7 and Theorem 3, as shown in Equation (44). The non-negativity of the pointwise measure implies the non-negativity of the combined measure ∆I_∩,f.
• Axiom 2: Since the cumulative measures i_∩,f and I_∩,f correspond to the sum of partial contributions in their down-set, the non-negativity of partial information implies the monotonicity of the cumulative measures.
• Axiom 3*: For a single source, I_∩,f equals f-information by definition (see Equation (30)). Therefore, I_∩,f satisfies Axiom 3*.
The operational interpretation of Section 3.3 is maintained, since the partial contributions are identical between both lattices.

Remark 8. The definitions of Equations (34) and (43) satisfy the desired property of Bertschinger et al. [9], who argued that any sensible measure of unique and redundant information should only depend on the marginal distribution of sources.

Remark 9. As discussed before [20], it is possible to further split redundancy into two components by extracting the pointwise meet under the Blackwell order (zonogon intersection, first component). The second component of redundancy as defined above contains decision regions that are part of the convex hull, but not of the individual channel zonogons (discussed as shared information in [20]). By combining Equation (43) and Lemma A7, we obtain that both components of this split of redundancy are non-negative.

Decomposing Rényi-Information
Since Rényi-information is an invertible transformation of Hellinger-information and α-information, we argue that their decompositions should be consistent. We propose to view the decomposition of Rényi-information as a transformation from an f-information decomposition and demonstrate the approach by transferring the Hellinger-information decomposition to a Rényi-information decomposition. Then, we demonstrate that the result is invariant to a linear scaling of the considered f-information, such that the transformation from α-information provides identical results. The obtained Rényi-information decomposition is non-negative and satisfies the three axioms proposed by Williams and Beer [1] (see below). However, its inclusion-exclusion relation is based on a transformed addition operator. For transforming the decomposition, we consider Rényi-information to be a re-graduation of Hellinger-information, as shown in Equation (45).
To maintain consistency when transforming the measure, we also have to transform its operators ([19], p. 6 ff.):

Definition 29 (Addition of Rényi-information). We define the addition of Rényi-information ⊕_a with its corresponding inverse function ⊖_a by Equation (46).
x ⊕_a y := 1/(a−1) · log(e^{(a−1)x} + e^{(a−1)y} − 1),  x ⊖_a y := 1/(a−1) · log(e^{(a−1)x} − e^{(a−1)y} + 1)  (46)

To transform a decomposition of the synergy lattice, we define the cumulative loss measures as shown in Equation (47) and use the transformed operators when computing the Möbius inverse (Equation (48a)) to maintain consistency in the results (Equation (48b)).

Definition 30. The cumulative and partial Rényi-information loss measures are defined as transformations of the cumulative and partial Hellinger-information loss measures, as shown in Equations (47) and (48).

Remark 10. We show in Lemma A11 of Appendix E that re-scaling the original f-information does not affect the resulting decomposition or the transformed operators. Therefore, transforming a Hellinger-information decomposition or an α-information decomposition to a Rényi-information decomposition provides identical results.
The operational interpretation presented in Section 3.3 is similarly applicable to partial Rényi-information (∆I_∪,R_a, Equation (48b)), since the function v_a satisfies v_a(0) = 0 and x ≤ 0 =⇒ 0 ≤ v_a(x).
Theorem 5. The presented definitions for the cumulative loss measure I_∪,R_a provide a non-negative PID on the synergy lattice with an inclusion-exclusion relation under the transformed addition (Definition 29) that satisfies Axioms 1, 2, and 3* for any Rényi-information measure.

Proof.
• Axiom 3*: By the transformed definitions (Equations (45) and (47)), I_∪,R_a satisfies the self-redundancy axiom by definition, however, at a transformed operator.
• Non-negativity: The decomposition of I_∪,R_a is non-negative, since ∆I_∪,H_a is non-negative (see Section 3.2), the Möbius inverse is computed with the transformed operators (Equation (48b)), and the function v_a satisfies x ≤ 0 =⇒ 0 ≤ v_a(x).
Remark 11. To obtain an equivalent decomposition of Rényi-information on the redundancy lattice, we can correspondingly transform the dual-decomposition of Hellinger-information on the redundancy lattice, as shown in Equation (49). The resulting decomposition satisfies non-negativity, the axioms of Williams and Beer [1], and an inclusion-exclusion relation under the transformed operators (Definition 29) for the same reasons as described above for Theorem 4.
Remark 12. The relation between the redundancy and synergy lattice can be used for the definition of a bi-valuation [19] in calculations, as discussed in [20]. This is also possible for Rényi-information at transformed operators.
When taking the limit of Rényi-information for a → 1, we obtain mutual information (I_KL). Since mutual information is also an f-information, we expect its operators in the Möbius inverse to be ordinary addition. This is indeed the case, since lim_{a→1} (x ⊕_a y) = x + y (Equation (50)), and the measures will be consistent. Finally, the decomposition of Bhattacharyya-information can be obtained by re-scaling the decomposition of Rényi-information at a = 0.5, which causes another transform of the addition operator for the inclusion-exclusion relation.
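The transformed operators are easy to check numerically. The sketch below (our own helper names; the re-graduation g(h) = log(1 + (a−1)h)/(a−1) is the standard relation between the Hellinger and Rényi divergences and is assumed here as Equation (45)) verifies that adding Hellinger-information quantities corresponds to the transformed addition ⊕_a, that ⊖_a inverts it, and that ⊕_a approaches ordinary addition as a → 1:

```python
import math

def renyi_from_hellinger(h, a):
    # assumed re-graduation (Equation (45)): R_a = log(1 + (a-1) H_a) / (a-1)
    return math.log(1.0 + (a - 1.0) * h) / (a - 1.0)

def oplus(x, y, a):
    # transformed addition (Equation (46))
    return math.log(math.exp((a - 1.0) * x) + math.exp((a - 1.0) * y) - 1.0) / (a - 1.0)

def ominus(x, y, a):
    # transformed subtraction, the inverse of oplus
    return math.log(math.exp((a - 1.0) * x) - math.exp((a - 1.0) * y) + 1.0) / (a - 1.0)

a = 2.0
u, v = 0.4, 0.25  # two hypothetical Hellinger-information quantities
x, y = renyi_from_hellinger(u, a), renyi_from_hellinger(v, a)

# additivity of Hellinger terms maps exactly to the transformed addition,
# and ominus inverts oplus
assert abs(oplus(x, y, a) - renyi_from_hellinger(u + v, a)) < 1e-12
assert abs(ominus(oplus(x, y, a), y, a) - x) < 1e-12

# the limit a -> 1 recovers ordinary addition (Equation (50))
for a1 in (1.001, 1.0001):
    x1, y1 = renyi_from_hellinger(u, a1), renyi_from_hellinger(v, a1)
    assert abs(oplus(x1, y1, a1) - (x1 + y1)) < 1e-3
```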

Evaluation
A comparison of the proposed decomposition with other methods from the literature can be found in [20] for mutual information. Therefore, this section first compares different f-information measures on typical decomposition examples and discusses the special case of total variation (TV)-information to explain its distinct behavior. Since larger differences between measures appear in more complex scenarios, we then compare the measures by analyzing the information flows in a Markov chain. We provide the implementation used for both dual-decompositions of f-information and the examples used in this work in [30].

Comparison of Different f-Information Measures
We use the examples discussed by Finn and Lizier [13] to compare different f-information decompositions and add a generic example from [20]. All probability distributions used and their abbreviations can be found in Appendix F. We normalize the decomposition results to the f-entropy of the target variable for the visualization in Figure 9.

Figure 9. Decomposition results for the examples of [13,20]. The example name abbreviations are listed in Table A1. The measures behave mostly similarly, since the decompositions follow an identical structure. However, it can be seen that total variation attributes more information to being redundant than other measures and appears to behave differently in the generic example, since it does not attribute any partial information to the first variable or their synergy.
Since all results are based on the same framework, they behave similarly for examples that analyze a specific aspect of the decomposition function (XOR, Unq, PwUnq, RdnErr, Tbc, AND). However, it can be observed that the decomposition of total variation (TV) appears to differ from the others: (1) In all examples, total variation attributes more information to being redundant than other measures. (2) In the generic example, total variation is the only measure that does not attribute any information to being unique to variable one or synergetic. We discuss the case of total variation in Section 4.1.2 to explain its distinct behavior.
We visualize the zonogons for the generic example in Figure A2, which highlights that the implication of the operational interpretation does not hold in the other direction: the existence of partial information implies an advantage for the expected reward towards some state of the target variable, but an advantage for the expected reward towards some state of the target variable does not imply partial information, as the example of total variation shows.

The Special Case of Total Variation
The behavior of total variation appears different compared to other f-information measures (Figure 9). This is because total variation measures the perimeter of a zonogon such that the result corresponds to a linear scaling of the maximal (Euclidean) height h* that the zonogon reaches above the diagonal, as visualized in Figure 10.

Remark 13. From a cost perspective, the height h* can be interpreted as the performance evaluation of the optimal decision strategy (the point symmetric to P* in the lower zonogon half) for a prediction T̂ with minimal expected cost at the cost ratio (Cost(T̂ = t, T ≠ t) − Cost(T̂ = t, T = t)) / (Cost(T̂ ≠ t, T = t) − Cost(T̂ ≠ t, T ≠ t)) = (1 − P_T(t)) / P_T(t) (see Equation (8) of [31]) for each target state individually.

Lemma 8. (a) The pointwise total variation (i_TV) is a linear scaling of the maximal (Euclidean) height h* that the corresponding zonogon reaches above the diagonal, as visualized in Figure 10 (Equation (51a)). (b) For a non-empty set of pointwise channels A, pointwise total variation i_TV quantifies the join element as the maximum of its individual channels (Equation (51b)). (c) The loss measure i_∪,TV quantifies the meet of a set of sources on the synergy lattice as their minimum (Equation (51c)).
Proof. The proof of the first two statements (Equations (51a) and (51b)) is provided separately in Appendix G; they imply the third (Equation (51c)) by Definition 26.
Quantifying the meet element on the synergy lattice as the minimum has the following consequences for total variation: (1) It attributes a minimal amount of synergy and, therefore, more information to redundancy than other measures. (2) For each state of the target, at most one variable can provide unique information. In the case of |T| = 2, the pointwise channels are symmetric (see Equation (6)), such that the same variable provides the maximal zonogon height both times. This is the case in the generic example of Figure 9 and the reason why at most one variable can provide unique information in this setting. However, beyond binary targets (|T| > 2), both variables may provide unique information at the same time, since different sources can provide the maximal zonogon height for different target states (see the later example in Figure 11).
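Lemma 8(a) can be checked numerically: with f(z) = |z − 1|/2, each edge contributes (1 − p)|TPR_s − FPR_s|/2 to the edge-wise quantification, and the sum equals (1 − p) times the maximal vertical gap of the zonogon boundary above the diagonal, i.e. (1 − p) · √2 · h* in terms of the Euclidean height. The sketch below (random hypothetical channels, our own helper names) verifies this identity:

```python
import math
import random

def i_tv(p, channel):
    # pointwise total variation via the edge quantification with f(z) = |z - 1| / 2
    total = 0.0
    for tpr, fpr in channel:
        m = p * tpr + (1 - p) * fpr
        total += 0.0 if m == 0 else m * abs(tpr / m - 1.0) / 2.0
    return total

def max_height(channel):
    # maximal vertical gap of the zonogon boundary above the diagonal:
    # accumulate edges sorted by decreasing slope and track max(TPR - FPR)
    edges = sorted(channel, key=lambda e: -(e[0] / e[1]) if e[1] else -math.inf)
    gap, x, y = 0.0, 0.0, 0.0
    for tpr, fpr in edges:
        y, x = y + tpr, x + fpr
        gap = max(gap, y - x)
    return gap

random.seed(1)
for _ in range(200):
    xs = [random.random() for _ in range(4)]
    ys = [random.random() for _ in range(4)]
    chan = [(x / sum(xs), y / sum(ys)) for x, y in zip(xs, ys)]
    p = random.random()
    # Lemma 8(a): i_TV equals (1 - p) times the vertical gap, i.e. (1-p)*sqrt(2)*h*
    assert abs(i_tv(p, chan) - (1 - p) * max_height(chan)) < 1e-12
```

Since the point of maximal height lies on the convex hull, the hull of several zonogons attains the maximum of their individual heights, which is the content of Lemma 8(b).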
Analysis of the Markov chain information flow (Equation (A47)): visualized results for the information measures KL, TV, and χ2; the remaining results (H2-, LC-, and JS-information) can be found in Figure A3.

Remark 14. Using the pointwise minimum on the synergy lattice results in a structure similar to the proposed measure of Williams and Beer [1]. However, TV-information is based on a different pointwise measure i_TV, which displays this behavior (Equation (51b)), unlike pointwise KL-information. We visualize the results for KL-, TV-, and χ2-information in Figure 11, and the results for H2-, LC-, and JS-information in Figure A3 of Appendix H.
All results display the expected behavior: the information that M_i provides about M_3 increases for 1 ≤ i ≤ 3 and decreases for 3 ≤ i ≤ 5. The information flow results of KL-, H2-, LC-, and JS-information are conceptually similar. Their main differences appear in the rate at which the information decays and, therefore, in how much of the total information we can trace. In contrast, the results of TV- and χ2-information display different behavior, as shown in Figure 11: TV-information indicates significantly more redundancy, and χ2-information displays significantly more synergy, than the other measures. Additionally, the decomposition of TV-information contains fewer information flows. For example, it is the only analysis that does not show any information flow from M_2 into the unique contribution of Y_3 or from M_2 into the synergy of (X_3, Y_3). This demonstrates that the same decomposition method can obtain different behaviors from different f-divergences.

Discussion
Using the Blackwell order to construct pointwise lattices and to decompose pointwise information is motivated by the following three aspects:

• All information measures in Section 2.3 are the expected value of the pointwise information (the quantification of the Neyman-Pearson region boundary) for an indicator variable of each target state. Therefore, we argue for acknowledging the "pointwise nature" [13] of these information measures and for decomposing them accordingly. A similar argument was made previously by Finn and Lizier [13] for the case of mutual information and motivated their proposed pointwise partial information decomposition.

• The Blackwell order does not form a lattice beyond indicator variables, since it does not provide a unique meet or join element for |T| > 2 [17]. However, from a pointwise perspective, informativity (Definition 2) provides a unique representation of union information. This enables separating the definitions of redundant, unique, and synergetic information from a specific information measure, which then serves only for their quantification. We interpret these observations as an indication that the Blackwell order should be used to decompose pointwise information based on indicator variables rather than to decompose the expected information based on the full target distribution.

• We can consider where the alternative approach would lead if we decomposed the expected information of the full target distribution using the Blackwell order: the decomposition would become identical to the method of Bertschinger et al. [9] and Griffith and Koch [10]. For bivariate examples (|V| = 2), this decomposition [9,10] is non-negative and satisfies an additional property (identity, proposed by Harder et al. [5]). However, the identity property is inconsistent [32] with the axioms of Williams and Beer [1] and non-negativity for |V| > 2, which causes negative partial information when extending the approach to |V| > 2. The identity property also contradicts the conclusion that Finn and Lizier [13] drew from studying Kelly gambling, namely that "information should be regarded as redundant information, regardless of the independence of the information sources" ([13], p. 26). It also contradicts our interpretation of distinct information through distinct decision regions when predicting an indicator variable for some target state. We do not argue that this interpretation should apply to the concept of information in general, but we acknowledge that this behavior seems present in the information measures studied in this work, and we construct their decomposition accordingly.
Our critique of the decomposition measure of Williams and Beer [1] focuses on the implication that a less informative variable (Definition 2) about t ∈ T provides less pointwise information (I(S; T = t), Equation (15a)): κ(S_1, T, t) ⊑ κ(S_2, T, t) ⟹ I(S_1; T = t) ≤ I(S_2; T = t). This implication does not hold in the other direction: equal pointwise information does not imply equal informativity and, thus, does not imply redundancy.
We chose to define a notion of pointwise union information based on the join of the Blackwell order, since it leads to a meaningful operational interpretation: the convex hull of the pointwise Neyman-Pearson regions is always a subset of the Neyman-Pearson region of their joint distribution. Moreover, it is possible to construct joint distributions for which each individual decision region outside the convex hull becomes inaccessible, even if there may not exist one unique joint distribution at which all synergetic regions are lost simultaneously. This volatility due to the dependence between the variables appears suitable for a notion of synergy. Similarly, the resulting unique information appears suitable, since it ensures that a variable with unique information must provide access to some additional decision region. Finally, the obtained unique and redundant information is sensible [9], since it depends only on the marginal distributions with the target. The operational interpretation can be strengthened further, such that the implication between accessible regions and partial information holds in both directions, by revising Lemmas A1 and A2 with a strictly convex generator function.
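For a single indicator variable, this join construction can be sketched directly: build the upper zonogon boundaries of two channels and take their upper convex hull, which bounds the pointwise Neyman-Pearson region of the join. The example channels and helper names below are illustrative (our own), assuming channels with strictly positive columns:

```python
import numpy as np

def zonogon_vertices(a, b):
    """Upper-boundary vertices of Z(kappa): the segment vectors (b[s], a[s])
    are concatenated by decreasing likelihood ratio a[s]/b[s]."""
    order = sorted(range(len(a)), key=lambda s: -a[s] / b[s])
    pts = [(0.0, 0.0)]
    for s in order:
        x, y = pts[-1]
        pts.append((x + b[s], y + a[s]))
    return pts

def upper_hull(points):
    """Upper convex hull (Andrew's monotone chain), traversed left to right."""
    cross = lambda o, a, b: (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    hull = []
    for p in sorted(set(points)):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

# two hypothetical incomparable channels for the same indicator variable
k1 = (np.array([0.8, 0.2]), np.array([0.3, 0.7]))   # (P(S|t), P(S|not t))
k2 = (np.array([0.6, 0.4]), np.array([0.1, 0.9]))
v1, v2 = zonogon_vertices(*k1), zonogon_vertices(*k2)
join = upper_hull(v1 + v2)
hx, hy = zip(*join)
# every vertex of both zonogons lies on or below the joined boundary
inside = all(y <= np.interp(x, hx, hy) + 1e-12 for x, y in v1 + v2)
print(inside)  # True
```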
We perform the decomposition on a pointwise lattice using the Blackwell join, since it is possible to represent f-information as the expected value of quantifying the Neyman-Pearson region boundary (zonogon perimeter) for indicator variables (pointwise channels). Since the pointwise measures satisfy a triangle inequality, we mentioned the oversimplified intuition of pointwise f-information as the length of the zonogon perimeter. Correspondingly, if we identified an information measure that behaved more like the area of the zonogon (which could also maintain the ordering), then we would need to decompose it on a pointwise lattice using the Blackwell meet to achieve non-negativity. We assume that most information measures behave more like a quantification of the boundary length than of its area, since the boundary segments can be obtained directly from the conditional probability distribution and do not require an actual construction from the likelihood-ratio test.
In the literature, PIDs have been defined based on different ordering relations [16], the Blackwell order being only one of them. We think that this diversity is desirable, since each approach provides a different operational interpretation of redundancy and synergy. For this reason, we wonder whether a non-negative decomposition with the inclusion-exclusion relation could be obtained for other ordering relations by transferring them to a pointwise perspective or from mutual information to other information measures.
Studying the relations between different information measures for the same decomposition method may provide further insights into their properties, as demonstrated by the example of total variation in Section 4.2. The ability to decompose different information measures is also a necessity for applying the method in a variety of areas, since each information measure can then provide the operational meaning within its respective domain. To ensure consistency between related information measures, we allowed the re-definition of information addition, as demonstrated by the example of Rényi-information in Section 3.6, which also opens new possibilities for satisfying the inclusion-exclusion relation.
There is currently no universally accepted definition of conditional Rényi information. Assuming that I_Ra(T; S_i | S_j) should capture the information that S_i provides about T when the information from S_j is already known, one could propose that this quantity correspond to the respective partial information contributions (unique/synergetic) and, thus, to the definition of Equation (53).
With this in mind, it is also possible to define, model, decompose, and trace transfer entropy [33], which is used in the analysis of complex systems, for each presented information measure with the methodology of Section 4.2.
Finally, studying the corresponding definitions for continuous random variables and identifying suitable information measures for specific applications would be interesting directions for future work.

Conclusions
In this work, we demonstrated a non-negative PID in the framework of Williams and Beer [1] for any f-information, with a practical operational interpretation and the conversion of measures between decomposition lattices. We demonstrated that the decomposition of f-information can be used to obtain a non-negative decomposition of Rényi-information, for which we re-defined the addition to demonstrate that its results satisfy an inclusion-exclusion relation. Finally, we demonstrated how the proposed decomposition method can be used for tracing the flow of information through Markov chains and how the decomposition obtains different properties depending on the chosen information measure.
The case of a_i = 0 is handled by the convention that 0 · f(0/0) = 0. Therefore, we can assume that a_i ≠ 0 and use 0 ≤ b_1 ≤ 1 with b_2 = 1 − b_1 to apply the definition of convexity to the function f.

Lemma A2. For a constant p ∈ [0, 1], the function i_f maintains the ordering relation from the Blackwell order on binary input channels.

Equation (A11c) is parameterized by g_e, and the subsets are closed under set union. Therefore, we can combine all choices for g_e and g_o using the set-theoretic union as shown below.
For the notation, let m = 2^(|A|−1), and denote subsets of A with even cardinality by E_i ∈ L_e, where 1 ≤ i ≤ m. We use the last index for the empty set: E_m = ∅. The subsets of A with odd cardinality are correspondingly denoted O_i ∈ L_o. For clarity, we denote binary input channels from an even subset by λ ∈ E and binary input channels from an odd subset by ν ∈ O.
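The count m = 2^(|A|−1) used here can be verified with a small enumeration: a set with n elements has exactly 2^(n−1) subsets of even cardinality (including ∅) and 2^(n−1) of odd cardinality. A short check (helper name ours):

```python
from itertools import combinations

def subsets_by_parity(n):
    """Count subsets of an n-element set with even and odd cardinality."""
    even = sum(1 for k in range(0, n + 1, 2) for _ in combinations(range(n), k))
    odd = sum(1 for k in range(1, n + 1, 2) for _ in combinations(range(n), k))
    return even, odd

for n in range(1, 6):
    even, odd = subsets_by_parity(n)
    assert even == odd == 2 ** (n - 1)   # m = 2^(|A|-1) on each side
print("ok")
```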

Appendix C. Non-Negativity of Partial f-Information on the Synergy Lattice
The proof of non-negativity can be divided into three parts. First, we show that the loss measure maintains the ordering relation of the synergy lattice and how the quantification of a meet element i_∪,f(α ∧ β, T, t) can be computed. Second, we demonstrate how the inclusion-exclusion inequality of zonogons under the Minkowski sum from Appendix B relates pointwise information measures with respect to the Blackwell order. Finally, we combine these two results to show that an inclusion-exclusion relation using the convex hull of zonogons is greater than their intersection, and we obtain the non-negativity of the decomposition by transitivity.
Corollary A3. The cumulative pointwise loss of the meet of two atoms is equivalent to the cumulative pointwise loss of their union for any target variable T with state t ∈ T.

Proof. The result follows from Lemma A6 and Corollary A2.

Lemma A8 (Non-negativity on the synergy lattice). The decomposition of f-information is non-negative on the pointwise and combined synergy lattice for any target variable T with state t ∈ T:

∀α ∈ A(V): 0 ≤ ∆i_∪,f(α, T, t),    ∀α ∈ A(V): 0 ≤ ∆I_∪,f(α; T).
Proof. We show the non-negativity of pointwise partial information (∆i_∪,f(α, T, t)) in two cases. We write α_S^− for the cover set of α on the synergy lattice and abbreviate p = P_T(t).

1. Let α ∈ A(V) \ {⊥_SL}; then its cover set is non-empty (α_S^− ≠ ∅). Additionally, we know that no atom in the cover set is the empty set (∀β ∈ α_S^−: β ≠ ∅), since the empty atom is the top element (⊤_SL = ∅). Since it will be required later, note that the inclusion-exclusion principle applied to a constant yields the constant itself, as shown in Equation (A16), since, without the empty set, there exists one more subset of odd cardinality than of even cardinality (see Equation (A3)).
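The fact that the inclusion-exclusion principle of a constant is the constant itself follows from the alternating sum of binomial coefficients over non-empty subsets equaling one; a small numerical check (helper name ours):

```python
from itertools import combinations

def ie_constant(n, c):
    """Alternating inclusion-exclusion sum of a constant c over all
    non-empty subsets of an n-element set."""
    total = 0.0
    for k in range(1, n + 1):
        for _ in combinations(range(n), k):
            total += (-1) ** (k - 1) * c
    return total

for n in range(1, 6):
    assert abs(ie_constant(n, 2.5) - 2.5) < 1e-12
print("ok")
```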
Visualization of the zonogons from the generic example of [20] in state t = 0. The target variable T has two states; therefore, the zonogons of its second state are symmetric (second column of Equation (6)) and have identical heights.

Appendix G. The Relation of Total Variation to the Zonogon Height
Proof of Lemma 8(a) from Section 4.1.2: The pointwise total variation (i_TV) is a linear scaling of the maximal (Euclidean) height h* that the corresponding zonogon Z(κ) reaches above the diagonal, as visualized in Figure 10, for any 0 ≤ p ≤ 1.
Proof. The point of maximal height P* that a zonogon Z(κ) reaches above the diagonal is visualized in Figure 10 and can be obtained as shown in Equation (A45), where ∆v denotes the slope of vector v.
The maximal height (Euclidean distance) above the diagonal is calculated as shown in Equation (A46), where P* = (P*_x, P*_y). The pointwise total variation i_TV can be expressed as an invertible transformation of the maximal Euclidean zonogon height above the diagonal as shown below, where v = (v_x, v_y).

Figure 1. Partial information decomposition representations for two variables V = {V_1, V_2}. (a) Desired set-theoretic analogy: visualization of the desired intuition for multivariate information as a Venn diagram. (b) Representation as a redundancy lattice, where the redundancy measure I_∩ quantifies the information contained in all of its provided variables (inside their intersection). The ordering represents the expected subset relation of redundancy. (c) Representation as a synergy lattice, where the loss measure I_∪ quantifies the information contained in neither of its provided variables (outside their union). (d) Information flow visualization: given two partial information decompositions with respect to the same target variable, we can study how the partial information of one decomposition propagates into the next. We refer to this as the information flow analysis of a Markov chain such as T → (A_1, A_2) → (B_1, B_2).

Figure 3. Visualizations for Example 1, where |T| = 2. (a) A randomized decision strategy for predictions based on the channel κ : T → S can be represented by a |S| × 2 stochastic matrix λ. The first column of this decision matrix provides the weights for summing the columns of channel κ to determine the resulting prediction performance (TPR, FPR). Any decision strategy corresponds to a point in the zonogon. (b) All ordering relations presented in Section 2.1 are equivalent for binary targets and correspond to the subset relation of the visualized zonogons. The variable S_3 is less informative than both S_1 and S_2 with respect to T, and the variables S_1 and S_2 are incomparable. The channel shown in (a) is the Blackwell join of κ_1 and κ_2 in (b).

Figure 3b visualizes an example of the discussed ordering relations, where all observable variables have two states: |S_i| = 2 for i ∈ {1, 2, 3}. The zonogon/Neyman-Pearson region corresponding to variable S_3 is fully contained within the others (Z(κ_3) ⊆ Z(κ_1) and Z(κ_3) ⊆ Z(κ_2)). Therefore, we can say that S_3 is Blackwell inferior (Definition 3) to and less informative (Definition 2) than S_1 and S_2 about T. Practically, this means that we can construct a variable equivalent to S_3 by garbling S_1 or S_2, and that, for any sequence of actions based on S_3 and any reward function depending on T, we can achieve an expected reward at least as high by acting based on S_1 or S_2 instead. The variables S_1 and S_2 are incomparable under the zonogon order, Blackwell order, and informativity order, since the Neyman-Pearson region of one is not fully contained in the other. The zonogon shown in Figure 3a corresponds to the join under the zonogon order, Blackwell order, and informativity order of S_1 and S_2 in Figure 3b about T. For binary targets, this distribution can be obtained directly from the convex hull of their Neyman-Pearson regions and corresponds to a valid joint distribution for (T, S_1, S_2). All other joint distributions are either equivalent or superior to it. When doing this on indicator variables for |T| > 2, the obtained joint distributions for each t ∈ T may not combine into a valid overall joint distribution.

Figure 6. This example visualizes the computation of χ2-information by indicating its results on the zonogon representation of an indicator variable. (a) For the pointwise information of t_1, both vectors of the zonogon perimeter are quantified to the sum 0.292653. (b) For the pointwise information of t_2, both vectors of the zonogon perimeter are quantified to the sum 0.130068. The final χ2-information is their expected value I_χ2(S; T) = 0.4 · 0.292653 + 0.6 · 0.130068 = 0.195102.

Figure 7. Visualization of the functions Ψ and Ξ: applying the function Ψ is equal to reversing the redundancy order, and applying the function Ξ is equal to swapping the ordering relation used between the redundancy and synergy lattices.

Figure 8. Visualization of lattice duality and Lemma 5. We abbreviate the notation of sources within this figure by listing the contained visible variables as the source index (S_12 = {V_1, V_2}). (i) All bottom elements are mapped to each other and quantified to zero. (ii) To identify the dual for α = {S_3, S_12} from the redundancy lattice, we first apply the transformation Ψ(α) ≃ {S_13, S_23} and then Ξ(Ψ(α)) ≅ {S_1, S_2}. (iii) Ignoring the bottom elements, the down-set of α on the redundancy lattice corresponds to the up-set of Ξ(Ψ(α)) on the synergy lattice under duality (gray areas). (iv) Lemma 5 states that, on the synergy lattice, exactly those atoms that are not in the up-set of Ξ(Ψ({S_3, S_12})) must be in the down-set of either {S_3} or {S_12}.

Figure 9. Comparison of different f-information measures normalized to the f-entropy of the target variable. All distributions are shown in Appendix F and correspond to the examples of [13,20]. The example name abbreviations are listed in Table A1. The measures behave mostly similarly, since the decompositions follow an identical structure. However, it can be seen that total variation attributes more information to redundancy than the other measures and appears to behave differently in the generic example, since it does not attribute any partial information to the first variable or their synergy.

Figure 10. Visualization of the maximal (Euclidean) height h* at the point P* that a zonogon (blue) reaches above the diagonal.

(by Equation (28b))

Lemma A3. Consider two non-empty sets of binary input channels with equal cardinality (|A| = |B|) and a constant p ∈ [0, 1]. If the Minkowski sum of the zonogons of the channels in A is a subset of the Minkowski sum of the zonogons of the channels in B, then the sum of the pointwise information of the channels in A is less than the sum of the pointwise information of the channels in B, as shown in Equation (A1).

Appendix C.2. The Non-Negativity of the Decomposition

Lemma A7. Consider a non-empty set of binary input channels A ≠ ∅ and 0 ≤ p ≤ 1. Quantifying an inclusion-exclusion principle on the pointwise information of their Blackwell joins is larger than the pointwise information of their Blackwell meet, as shown in Equation (A14):

i_f(p, ⋀_{κ∈A} κ) ≤ Σ_{∅≠B⊆A} (−1)^(|B|−1) i_f(p, ⋁_{κ∈B} κ).    (A14)

Lemma A8 (Non-negativity on the synergy lattice).