Article

Algebraic Representations of Entropy and Fixed-Sign Information Quantities

by Keenan J. A. Down 1,2,* and Pedro A. M. Mediano 3,4
1 Department of Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, Mile End Road, Bethnal Green, London E1 4NS, UK
2 Department of Psychology, University of Cambridge, Downing Site, Downing Place, Cambridge CB2 3EB, UK
3 Department of Computing, Imperial College London, 180 Queen’s Gate, South Kensington, London SW7 2RH, UK
4 Division of Psychology and Language Sciences, University College London, 26 Bedford Way, London WC1H 0AP, UK
* Author to whom correspondence should be addressed.
Entropy 2025, 27(2), 151; https://doi.org/10.3390/e27020151
Submission received: 18 November 2024 / Revised: 21 January 2025 / Accepted: 28 January 2025 / Published: 1 February 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

Many information-theoretic quantities have corresponding representations in terms of sets. Many of these information quantities do not have a fixed sign—for example, the co-information can be both positive and negative. In previous work, we presented a signed measure space for entropy where the smallest sets (called atoms) all have fixed signs. In the present work, we demonstrate that these atoms have natural algebraic behaviour which can be expressed in terms of ideals (characterised here as upper sets), and we show that this behaviour allows us to make bounding arguments and describe many fixed-sign information quantity expressions. As an application, we give an algebraic proof that the only completely synergistic system of three finite variables X, Y and Z = f ( X , Y ) is the XOR gate.

1. Introduction

1.1. Information Decomposition

The Shannon entropy has many properties which intuitively mirror set-theoretic identities. The I-measure of Yeung, built on earlier work by Hu Kuo Ting, fleshed out the correspondence between expressions of information quantities and set-theoretic expressions via a formal symbolic substitution [1,2]. Occasionally, however, the I-measure can be seen to conflate qualitatively different behaviours. A classic example is given by the dyadic and triadic systems of James and Crutchfield [3], whose co-information signatures are identical despite their qualitatively different constructions. In that work, they note that ‘no standard Shannon-like information measure, and exceedingly few nonstandard methods, can distinguish the two’.
One approach for discerning between these two systems is Partial Information Decomposition (PID), which aims to decompose the mutual information between a series of random source variables $X_1, \ldots, X_n$ and a target variable T into parts [4,5,6,7,8,9]. These parts are combinations of redundant, unique, and synergistic contributions to $I(X_1, \ldots, X_n; T)$—which are all qualitatively distinct. The last of these pieces, the synergistic information, is information that is provided by multiple sources considered together but no source alone. For example, the perception of depth is severely hindered unless both eyes are recruited, and hence depth information is conferred synergistically. The co-information between three variables is, under the PID framework, the redundant (shared) information minus the synergistic information. For this reason, negative co-information is evidence that the system is exhibiting synergistic behaviour.
Beloved amongst the practitioners of PID is the XOR gate—one of the few examples of synergy on which there is nearly unanimous agreement. We will show in this work that the XOR gate is in fact the only system of three variables X , Y and Z = f ( X , Y ) which can have purely synergistic behaviour.
To accomplish this, we will leverage our previously introduced refined signed measure space Δ Ω , representing a collection of ‘atomic’ pieces of information [10,11]. This decomposition, built from the entropy loss when merging variable outcomes into coarser events (see for instance [12] for more context), allowed us to construct a Shannon-like measure which can discern between the dyadic and triadic systems of James and Crutchfield [3], which was demonstrated in [11]. This decomposition, coupled with the investigation of its algebraic properties in the present work, leads us to the capstone XOR result in Theorem 9.

1.2. Contributions

In the present work, we demonstrate that the constructed space Δ Ω has much natural algebraic behaviour when considered in tandem with the measure μ . We show that the structure of co-information, a standard information quantity [13], can be expressed algebraically inside of Δ Ω , and this description has very stable behaviour under the measure μ .
In Section 2, we present some background on the measure μ and recapitulate the main concepts introduced in [11]. From there, in Section 3, we develop the algebraic theory of this decomposition, introducing a new object, the ideal, and we show that it has some natural properties for simplifying representations of subsets in Δ Ω , highlighting how they can be used to generalise partitions in Ω .
After this, in Section 4, we show that the algebraic structure of these ideals plays uniquely well with the measure μ , allowing us to describe an algebraic property we call ‘strong fixed parity’, which, we show, corresponds to an information quantity having a fixed sign.
Lastly, in Section 5, we use these ideas to investigate the co-information between systems of variables, showing that this can be demonstrably fixed-sign in many cases. We finish with a result showing that the XOR gate is the only purely synergistic deterministic gate in three variables. That is, given finite discrete variables X and Y with Z = f(X, Y), the XOR is the only such system with negative co-information for all probability mass functions on X and Y.
To start, we give a brief recapitulation of the concepts introduced in the previous work [11] here as background. Proofs for all new results can be found in Appendix A.

2. Background

2.1. Background on the Measure

In previous work, we introduced the signed measure space ( Δ Ω , μ ) , where the σ -algebra is taken implicitly as the set of all subsets of Δ Ω . To start, we restate the definition of the space Δ Ω as was given in the first paper [10].
Definition 1. 
Let $(\Omega, \mathcal{F}, P)$ be a finite probability space where the σ-algebra $\mathcal{F}$ is given by all subsets of Ω. Then, we define the  complex  (or content) of Ω, written $\Delta\Omega$, to be the simplicial complex on all outcomes $\omega \in \Omega$, with the vertices removed:
$$\Delta\Omega = \bigcup_{k=2}^{|\Omega|} \binom{\Omega}{k} = \mathcal{P}(\Omega) \setminus \left( \{\{\omega\} : \omega \in \Omega\} \cup \{\emptyset\} \right),$$
where $\binom{\Omega}{k}$ is the set of all subsets of size k inside of Ω.
This space contains $2^{|\Omega|} - |\Omega| - 1$ elements (called atoms) for a given finite outcome space Ω.
Definition 2. 
Given a discrete outcome space Ω, an  atom  is a subset $S \subseteq \Omega$ where $|S| \geq 2$.
For a general set S, we use the notation $b_S$ for an atom, but where outcomes are explicitly labelled, e.g., $S = \{1, 2, 3\}$, we might write $b_{\{1,2,3\}}$, $b_{123}$ or simply 123 where this is clear from context.
In order to construct the signed measure space ( Δ Ω , μ ) , we must also define the measure. In the original work [10,11], this representation of the measure is given as a proposition. We give it here as the primary definition.
Definition 3. 
Let $T = \{p_1, \ldots, p_k\}$ be some subset of the probabilities of an atom $\{\omega_1, \ldots, \omega_n\}$. For clarity, we write
$$\sigma(T) = \sigma(p_1, \ldots, p_k) = (p_1 + \cdots + p_k)^{(p_1 + \cdots + p_k)}.$$
Taking all subsets of the atom $\{\omega_1, \ldots, \omega_n\}$ of size k, we write
$$A_k = \prod_{\substack{S \subseteq \{p_1, \ldots, p_n\} \\ |S| = k}} \sigma(S).$$
Then the measure on the atom is given by
$$\mu(p_1, \ldots, p_n) = \sum_{k=1}^{n} (-1)^{n-k} \log(A_k).$$
This definition arises from the perspective of entropy loss, which has appeared previously in the literature and has some natural advantages over the classical formulation of entropy [12]. The measure given here is constructed using two steps: firstly, by considering the entropy loss L when a number of outcomes ω 1 , , ω t are merged and treated as a single outcome; and secondly, by performing a Möbius inversion with L over the partially ordered set of subsets of Ω (ordered under inclusion).
Using the measure of loss L alone, while sufficient to derive a measure space (see [14]), does not possess sufficient resolution to capture all information quantities, missing quantities such as the mutual information and co-information. Incorporating the Möbius inversion breaks the construction into smaller pieces which multiple systems might share, creating an additive measure μ .
The entropy loss, as given immediately by a result of Baez et al. [12], is homogeneous of degree d when applied to the d-th Tsallis entropy [15]. By extension, the measure μ , which can be viewed as an alternating sum of the losses L, is also homogeneous of degree d when built on the d-th order Tsallis entropy. Moreover, the measure μ has some intriguing properties, which we shall briefly restate here. The interested reader should refer to the original works [10,11] for more detail.
Example 1. 
Let Ω = { ω 1 , ω 2 , ω 3 , ω 4 } with corresponding probabilities p 1 = 0.1 , p 2 = 0.2 , p 3 = 0.3 and p 4 = 0.4 . The atom b 12 , which we might also write simply as 12 or { 1 , 2 } , has corresponding measure
μ ( ω 1 , ω 2 ) = μ ( 0.1 , 0.2 ) = 0.275 bits .
The atom b 123 , meanwhile, has a negative sign. Using the method given above, this is given by
$$\mu(\omega_1, \omega_2, \omega_3) = \log_2 \frac{(0.1 + 0.2 + 0.3)^{(0.1 + 0.2 + 0.3)} \cdot 0.1^{0.1} \cdot 0.2^{0.2} \cdot 0.3^{0.3}}{(0.1 + 0.2)^{(0.1 + 0.2)} \cdot (0.1 + 0.3)^{(0.1 + 0.3)} \cdot (0.2 + 0.3)^{(0.2 + 0.3)}}$$
$$= \log_2 \frac{0.6^{0.6} \cdot 0.1^{0.1} \cdot 0.2^{0.2} \cdot 0.3^{0.3}}{0.3^{0.3} \cdot 0.4^{0.4} \cdot 0.5^{0.5}} = -0.210 \text{ bits}.$$
We will see in Theorem 1 that this change in sign is inevitable for certain atoms.
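For readers who wish to verify these values, the measure of Definition 3 can be computed directly from its alternating-sum form. The following is a minimal sketch (the function name, the choice of Python, and the rounding are our own, not part of the original construction), reproducing the two atoms of Example 1 in bits.

```python
import math
from itertools import combinations

def mu(*probs):
    """Signed measure of an atom (Definition 3), in bits.

    Uses log2(sigma(S)) = s * log2(s), where s is the sum of the
    probabilities in the subset S.
    """
    n = len(probs)
    total = 0.0
    for k in range(1, n + 1):
        # A_k is the product of sigma(S) over all size-k subsets S,
        # so log2(A_k) is the corresponding sum.
        log2_A_k = sum(sum(S) * math.log2(sum(S)) for S in combinations(probs, k))
        total += (-1) ** (n - k) * log2_A_k
    return total

print(f"{mu(0.1, 0.2):.3f}")       # 0.275  (the atom b_12)
print(f"{mu(0.1, 0.2, 0.3):.3f}")  # -0.210 (the atom b_123)
```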
Lemma 1. 
For $p_1, \ldots, p_n, x \in \mathbb{R}^+$ where $n \geq 0$, we have
$$\lim_{x \to 0} \mu(p_1, \ldots, p_n, x) = 0.$$
This lemma guarantees that the measure becomes null if any of the constituent probabilities are zero.
Lemma 2. 
Let $p_1, \ldots, p_{n-1}, x \in \mathbb{R}^+$ and let x vary. Then
$$\lim_{x \to \infty} |\mu(p_1, \ldots, p_{n-1}, x)| = |\mu(p_1, \ldots, p_{n-1})|.$$
This result shows that if one of the ‘probabilities’ tends to infinity, then the size of the entropy contribution tends towards that of an atom lying beneath it. Although discrete probabilities cannot tend to infinity, this result will prove useful for bounding arguments as part of Corollary 1 below.
Lastly, as a particularly intriguing property of the measure, its sign is known on all atoms of the partial order.
Theorem 1. 
Let $p_2, \ldots, p_n \in \mathbb{R}^+$ be a sequence of nonzero arguments for $n \geq 2$ and $m \geq 0$. Then
$$(-1)^{m+n} \frac{\partial^m \mu}{\partial x^m}(x, p_2, \ldots, p_n) \geq 0.$$
Setting m = 0, it becomes clear that the sign of the measure μ on a given atom $\omega_1, \ldots, \omega_n$ is dependent only on the number of outcomes n. The co-information, by contrast, is not a fixed-sign quantity in general. For example, given three random variables, the co-information can be positive, negative, or zero, depending on the underlying probabilities.
Coupled with Lemma 2, we have that μ varies monotonically between 0 and the magnitude of the atoms beneath it.
Corollary 1 
(Magnitude can only decrease). Let $p_1, \ldots, p_{n-1}, \tau \in \mathbb{R}^+ \setminus \{0\}$ for $n \geq 3$. Then
$$|\mu(p_1, \ldots, p_{n-1}, \tau)| < |\mu(p_1, \ldots, p_{n-1})|.$$
This corollary is intriguing in that it bounds the contribution to the entropy of an atom by all of the atoms which lie under it in the partial order. This can be thought of as the notion that ‘higher-order contributions to the entropy are bounded above by lower-order contributions to the entropy’.

2.2. Ideals in Ring Theory

It may be helpful for some readers to briefly introduce the notion of an ideal as it appears in the algebraic theory of rings, since we will introduce an analogous object in the next section. While the ideals introduced in this work are constructs inside of a lattice (rather than a ring), they are usually first introduced inside of rings, where their structure is intuitive. In addition, there are ways in which it might be natural to extend the definition given in the remainder of this work to an ideal in a ring. Thus, we have chosen to use the name ‘ideal’ rather than ‘order ideal’ (as might be more standard). The reader familiar with the algebraic theory of rings and ideals can confidently skip this subsection.
A ring is, broadly speaking, a set where there exist notions of addition, subtraction, and multiplication (though not division, in general). A standard example is the integers $\mathbb{Z}$ or the ring of polynomials in a single variable x with real coefficients, $\mathbb{R}[x]$.
Definition 4 
(Ideal in a ring). An ideal I over a (commutative) ring R is a subset $I \subseteq R$ such that I is a group under addition and is closed under multiplication by elements of R. That is, for any $x, y \in I$ and $r \in R$, we have
$$-x \in I,$$
$$x + y \in I,$$
$$rx \in I.$$
Note that because of the first and second requirements, every ring ideal also contains zero.
Ideals capture a notion of dependency between elements in the ring. The presence of one element in the ideal forces those ‘above’ the element to also be contained in the ideal (where the order can be described by multiplication/divisibility). Ideals also have some convenient properties.
Proposition 1. 
Let I , J be two ideals of a ring R. Then
$$I \cap J = \{x \in R : x \in I \text{ and } x \in J\}$$
$$I + J = \{x + y : x \in I,\ y \in J\}$$
are both ideals.
A classic example of an ideal in $\mathbb{Z}$ is $\langle n \rangle$, which is the set of all numbers which are divisible by an integer n. If a and b are elements of $\langle n \rangle$ (i.e., they are both divisible by n), then we certainly must have that $a + b \in \langle n \rangle$ (their sum is also divisible by n), and multiplying by any number $r \in \mathbb{Z}$ will force $ar \in \langle n \rangle$, as the factor of n is still present.
Ideals also play a large role in algebraic geometry, where polynomial rings are a natural point of study. In this scenario, the ideal $\langle f(x) \rangle \subseteq \mathbb{R}[x]$ is the set of all polynomials which contain f(x) as a factor. Equivalently, it is the set of polynomials which, given that f(x) = 0, must also be zero.
In much the same way that knowledge that n is divisible by 2 implies 3 n is divisible by 2, knowledge that two outcomes ω 1 , ω 2 are distinct automatically provides knowledge that some pair inside of ω 1 , ω 2 , ω 3 is distinct. This is the structure of dependency which we make use of when restating Definition 5 below.

3. An Algebraic Perspective on Entropy

3.1. Representing Information Quantities Inside Δ Ω

We briefly state a key result from our previous work [11], where we expressed the entropy associated to a random variable X in terms of a subset of Δ Ω .
Definition 5. 
Given a random variable X, we define the  content  $\Delta X$ inside of $\Delta\Omega$ to be the set of all atoms inside of $\Delta\Omega$ crossing a boundary in X. That is, if X corresponds to a partition $P_1, \ldots, P_n$, then
$$\Delta X = \{b_S : S \subseteq \Omega,\ \exists\, \omega_i, \omega_j \in S \text{ with } \omega_i \in P_k,\ \omega_j \in P_l \text{ such that } k \neq l\}.$$
Intuitively, this means that at least two of the outcomes in the atom $b_{\omega_1 \cdots \omega_n}$ correspond to distinct events in X, although possibly more. We will in general make use of Δ to represent the logarithmic decomposition functor from random variables and information quantities to their corresponding sets in $\Delta\Omega$. Note that we often write 123 to refer to $b_{\{1,2,3\}}$ for added readability.
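As a concrete illustration of Definition 5, the content Δ X can be enumerated mechanically from the partition of X: an atom belongs to Δ X exactly when its outcomes meet at least two different parts. A minimal sketch follows (the helper name and the use of Python are our own assumptions, not notation from the paper); the partition chosen is the variable X used in Example 3 below.

```python
from itertools import combinations

def content(omega, partition):
    """Delta_X: all atoms (subsets of Omega with size >= 2) whose outcomes
    fall in at least two different parts of the partition for X."""
    part_of = {w: i for i, part in enumerate(partition) for w in part}
    return {S for k in range(2, len(omega) + 1)
              for S in combinations(sorted(omega), k)
              if len({part_of[w] for w in S}) >= 2}

# Omega = {1, 2, 3} with X given by the partition {{1}, {2, 3}}.
atoms = content({1, 2, 3}, [{1}, {2, 3}])
print(sorted(atoms, key=lambda S: (len(S), S)))
# [(1, 2), (1, 3), (1, 2, 3)]  -- that is, Delta_X = {12, 13, 123}
```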
As expected, we have that μ ( Δ X ) = H ( X ) , and we concretise this in a theorem, which is taken from [11].
Theorem 2. 
Let R be a region on an I-diagram of variables $X_1, \ldots, X_r$ with Yeung’s I-measure. In particular, R is given by some set-theoretic expression in terms of the set variables $\tilde{X}_1, \ldots, \tilde{X}_r$ under some combination of unions, intersections and set differences.
Making the formal substitution
$$\tilde{X}_1, \tilde{X}_2, \ldots, \tilde{X}_r \longmapsto \Delta X_1, \Delta X_2, \ldots, \Delta X_r$$
to obtain an expression $\Delta R$, the content corresponding to the region R of the I-diagram, in terms of the $\Delta X_i$, we have
$$I(R) = \sum_{B \in \Delta R} \mu(B).$$
That is, the interior loss measure μ is consistent with Yeung’s I-measure.
For examples on how this measure can be interpreted geometrically, as well as all proofs of the above results, we refer the interested reader to the previous work [11], where we present figures and diagrams demonstrating the geometric and set-theoretic significance of the atoms of our construction.
Many questions about the underlying structure of this space remain to be answered. One peculiarity is that most atoms do not normally appear alone in information quantities. For example, given that an atom $\omega_1 \cdots \omega_n \in \Delta X$ appears in a content, we must also have that the atom $\omega_1 \cdots \omega_n \omega_{n+1} \in \Delta X$ appears in the same content, as the definition includes exactly those atoms which, as a set, cross a boundary in X. While all atoms have an interpretation of crossing boundaries in partitions, individual atoms, at first, do not seem to have much meaning without other atoms in context. Understanding the structural interrelationship between all atoms would allow for a better understanding of the relationship between different information measures.
In the rest of this section, we explore the structure of our decomposition in the language of posets and upper sets (or ideals) on those posets, which appear to provide the natural language for the analogous ‘molecules’ to our atoms. We begin by defining an order ≼ on our atoms before giving a definition for ideals in Δ Ω . From there, we will show that all co-information expressions correspond to ideals and vice versa. We finish this section by characterising the ideals which correspond to the entropy of a variable.

3.2. Ideals in Δ Ω

Definition 6. 
Let $b_{S_1}, b_{S_2} \in \Delta\Omega$ where $S_1, S_2 \subseteq \Omega$. We define a partial ordering ≼ on the set $\Delta\Omega$ by setting $b_{S_1} \preceq b_{S_2}$ whenever $S_1 \subseteq S_2$.
The following definition is taken from [16].
Definition 7. 
Given a (partially) ordered set P, a subset $J \subseteq P$ is called an  order ideal  (upper set,  up-set,  increasing set) if $J \neq \emptyset$ and, for all $x \in J$ and $y \in P$, we have $y \in J$ whenever $x \preceq y$. That is, J is non-empty and closed under ascending order.
Following standard language, we will say that an ideal J is  generated  by a collection of elements $g_1, \ldots, g_t$ if, for all $b \in J$, we have $g_i \preceq b$ for at least one $g_i \in \{g_1, \ldots, g_t\}$. We will write $J = \langle g_1, \ldots, g_t \rangle$.
That is to say, the ideal J is the set in $\Delta\Omega$ which contains $g_1, \ldots, g_t$ and all elements which lie above them in the order.
We note that we deviate from standard nomenclature in this case and refer to these upper sets simply as ‘ideals’. In classical order theory, ideals in lattices are down sets rather than upper sets and are subject to an additional constraint. In the current work, we use ideal in the order-ideal sense, as we expect that future work on Δ Ω as a ring might make this definition more intuitive.
We will concern ourselves later with the relationship between the generators of an ideal and the measure of the ideal itself. The following definition for the degree of an atom is taken from [11], which we then extend to ideals.
Definition 8. 
Let $b = \omega_1 \cdots \omega_d \in \Delta\Omega$. We define the  degree  of b to be the number of outcomes it contains. That is, $\deg(b) = d$.
Definition 9. 
We will call J a  degree n ideal  or  n-ideal  if it can be generated by purely degree n atoms.
One significant motivation for introducing the language of ideals is to simplify the description of the sets constructed by the decomposition. Rather than writing out the complete set of all atoms, it is often possible to write out the generators of the set as an ideal, vastly reducing the complexity of the notation. Much like ideals in ring theory, it is straightforward to describe the intersection and union of ideals using the generators alone. We introduce some notation and give a proposition to this effect.
Notation 1. 
To further the parallel to ideals in rings, it is sometimes useful to introduce multiplicative notation for the union of two atoms. That is, we will make use of the notation
$$b_{S \cup T} = b_S \cdot b_T.$$
For example, using the shorthand notation from before, we have 123 · 234 = 1234 .
Proposition 2. 
Let $G = \{g_1, \ldots, g_n\}$ be a set of generators for the ideal $I = \langle G \rangle = \langle g_1, \ldots, g_n \rangle$, and let $H = \{h_1, \ldots, h_m\}$ be a set of generators for the ideal $J = \langle H \rangle = \langle h_1, \ldots, h_m \rangle$, where I and J are ideals inside $\Delta\Omega$. Then,
$$I \cup J = \langle g_1, \ldots, g_n, h_1, \ldots, h_m \rangle = \langle b \mid b \in G \cup H \rangle,$$
$$I \cap J = \langle g_1 h_1, g_1 h_2, \ldots, g_n h_{m-1}, g_n h_m \rangle = \langle g h \mid g \in G,\ h \in H \rangle.$$
This formulation mimics the natural behaviour of ideals in rings.
Remark 1. 
It is occasionally convenient for notation to consider ideals generated by single outcomes ω even though we formerly excluded these singlets and the empty set from $\Delta\Omega$. We may alternate between including and excluding these atoms for algebraic simplicity. Recall that the singlet and empty atoms do not contribute to the entropy, so this choice of notation does not affect the measure (we note also that including these entities would endow $\Delta\Omega$ with the complete structure of a lattice, which, although currently not required, might be useful for future work).
Example 2. 
Let $\Omega = \{1, 2, 3, 4\}$. Then $\Delta\Omega = \{12, 13, 14, 23, 24, 34, 123, 124, 134, 234, 1234\}$. Inside of $\Delta\Omega$, an ideal consists of all atoms which ‘contain’ a generator. For example, the ideal $\langle 12, 13 \rangle = \{12, 13, 123, 124, 134, 1234\}$ is a 2-atom ideal, as it is generated by degree 2 atoms. All of the atoms in this ideal must contain either outcomes 1 and 2 or outcomes 1 and 3, or both.
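On small outcome spaces, the ideal of Example 2 and the generator calculus of Proposition 2 can be checked by brute force. The sketch below (helper names are our own) enumerates an ideal as the set of atoms containing at least one generator and verifies both identities of Proposition 2 for a pair of ideals in Δ Ω.

```python
from itertools import combinations

def atoms(omega):
    """All atoms of Delta_Omega: subsets of Omega with at least two outcomes."""
    return [frozenset(S) for k in range(2, len(omega) + 1)
            for S in combinations(sorted(omega), k)]

def ideal(omega, generators):
    """The upper set <g_1, ..., g_t>: atoms containing at least one generator."""
    gens = [frozenset(g) for g in generators]
    return {b for b in atoms(omega) if any(g <= b for g in gens)}

omega = {1, 2, 3, 4}

# Example 2: <12, 13> = {12, 13, 123, 124, 134, 1234}.
print(sorted(map(sorted, ideal(omega, [{1, 2}, {1, 3}]))))

# Proposition 2: unions and intersections of ideals via their generators.
I = ideal(omega, [{1, 2}])
J = ideal(omega, [{3, 4}])
assert I | J == ideal(omega, [{1, 2}, {3, 4}])   # union of the generator sets
assert I & J == ideal(omega, [{1, 2} | {3, 4}])  # pairwise 'products' g * h
print("Proposition 2 identities hold on this example")
```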

3.3. Representation of Quantities with Ideals

We should justify that these ideals are a natural object of study. As it turns out, all entropy expressions without multiplicity and without conditioning are given by ideals, which follows from the next lemma.
Lemma 3. 
Given any co-information $I(X_1; \ldots; X_t)$ (including entropy and mutual information), the corresponding content $\Delta X_1 \cap \cdots \cap \Delta X_t$ is an ideal.
Example 3. 
It is worth noting that ideals themselves do not, in general, have corresponding partitions, but every partition has a corresponding ideal. As an example for how to conceptualise these ‘sub-partitions,’ consider the system Ω = { 1 , 2 , 3 } where X has partition { { 1 } , { 2 , 3 } } and Y has partition { { 1 , 3 } , { 2 } } as per Figure 1.
We have that
Δ X = { 12 , 13 , 123 } and
Δ Y = { 12 , 23 , 123 } .
Taking the mutual information between these sets corresponds algebraically to the intersection $\Delta X \cap \Delta Y = \langle 12, 13 \rangle \cap \langle 12, 23 \rangle = \langle 12 \rangle = \{12, 123\}$.
We note that this upper set $\langle 12 \rangle$ corresponds to the ability to discern between 1 and 2, but not between 1 and 3, or 2 and 3. Moreover, the upper set $\langle 12 \rangle$, despite not representing a partition itself, gives the mutual information when measured, i.e., $\mu(\langle 12 \rangle) = I(X; Y)$.
That is to say, generalising from the language of partitions to the language of ideals has allowed us to properly describe mutual information—a quantity which partitions cannot in general represent.
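Theorem 2 and the intersection computed in Example 3 can also be checked numerically: summing μ over the atoms of Δ X ∩ Δ Y recovers the classical mutual information I(X; Y). The sketch below does this for the system of Example 3; the probability assignment is an arbitrary choice of ours, and the helper names are not notation from the paper.

```python
import math
from itertools import combinations

def mu(probs):
    """Signed measure of an atom (Definition 3), in bits."""
    n = len(probs)
    return sum((-1) ** (n - k) * sum(S) * math.log2(sum(S))
               for k in range(1, n + 1) for S in combinations(probs, k))

def content(omega, partition):
    """Delta_X: atoms (size >= 2) crossing a boundary in the partition."""
    part_of = {w: i for i, part in enumerate(partition) for w in part}
    return {S for k in range(2, len(omega) + 1)
              for S in combinations(sorted(omega), k)
              if len({part_of[w] for w in S}) >= 2}

def entropy(dist):
    return -sum(q * math.log2(q) for q in dist if q > 0)

# The system of Example 3 on Omega = {1, 2, 3}; probabilities are our own choice.
p = {1: 0.2, 2: 0.5, 3: 0.3}
X, Y = [{1}, {2, 3}], [{1, 3}, {2}]

lhs = sum(mu([p[w] for w in atom]) for atom in content(p, X) & content(p, Y))

# Classical I(X;Y) = H(X) + H(Y) - H(X,Y) for comparison.
pX = [sum(p[w] for w in part) for part in X]
pY = [sum(p[w] for w in part) for part in Y]
pXY = [sum(p[w] for w in Xi & Yj) for Xi in X for Yj in Y]
rhs = entropy(pX) + entropy(pY) - entropy(pXY)

print(f"{lhs:.6f}  {rhs:.6f}")  # both are approximately 0.236453
```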
As it turns out, the converse to Lemma 3 also holds, which we state here.
Theorem 3. 
Let Ω be a finite outcome space and let $\{X_a : a \in A\}$ be the collection of all possible random variables defined on Ω (indexed by A). Then, there is a one-to-one correspondence
$$\{\text{ideals in } \Delta\Omega\} \longleftrightarrow \{\text{possible co-informations on } \Omega\}$$
where the co-informations are given by $I(X_1; \ldots; X_j)$ for any number of arbitrary variables $X_1, \ldots, X_j$ defined on Ω.
This result tells us that for any valid co-information on some collection of variables defined on an outcome space Ω, there is a corresponding ideal in $\Delta\Omega$, and for every ideal in $\Delta\Omega$, there is a corresponding collection of variables which give the resulting co-information. As an immediate side effect of this result, we have an alternative derivation of Proposition 33 in [11]:
Corollary 2. 
Let Ω be a finite outcome space. Then, there is a one-to-one correspondence
$$\{\text{subsets of } \Delta\Omega\} \longleftrightarrow \{\text{entropy expressions without multiplicity on } \Omega\},$$
where by an ‘entropy expression without multiplicity’ we mean an expression of the form
$$\sum_{P \text{ partitioning } \Omega} n_P\, H(P)$$
for $n_P \in \mathbb{Z}$ where no region is double-counted in any I-diagram.
Note that for the purposes of these two results, we do not consider the singlet atoms { ω } or the empty set to be elements of Δ Ω , as they contribute no entropy (see [11] for more justification).
This result shows that with clever inclusion and exclusion, it is always possible to extract individual atoms as classical entropy expressions on variables in Δ Ω . That is, they form a natural basis for entropy expressions. As such, the atoms of Δ Ω are uniquely placed for a module-theoretic or vector-space perspective on information.
Since these atoms appear to be a natural basis for entropy expressions, if we count them without multiplicity, we are able to determine how many expressions for information exist without accounting for the same contribution multiple times. We give a corollary to this end.
Corollary 3. 
Given a finite outcome space Ω with $|\Omega| = n$, there are $2^{2^n - n - 1}$ possible classical entropy expressions without multiplicity.
Counting with multiplicity, we can see that the space of all entropy expressions on Ω is a free module over Z , where the atoms b form a very natural basis.
We now state a practical result which tells us, intuitively, exactly which ideals correspond to partitions and how we can find the generators of the ideal corresponding to a finite random variable X. For the purpose of this result, it is again useful to consider the singlets { ω } , but the resulting representation of Δ X will not contain them.
Theorem 4. 
Let X be a discrete random variable on the outcome space Ω, where X has corresponding partition $\{Q_t : t \in T\}$ for some indexing set T of parts $Q_t$. Then $\Delta X$ as an ideal is given by
$$\Delta X = \bigcup_{\substack{a, b \in T \\ a \neq b}} \langle \omega : \omega \in Q_a \rangle \cap \langle \omega : \omega \in Q_b \rangle.$$
We note in particular that in posets, the union of order ideals is equal to the order ideal with the union of their generators. Equivalently, we have
$$\Delta X = \bigcup_{t \in T} \langle \omega : \omega \in Q_t \rangle \cap \langle \omega : \omega \in Q_t^c \rangle = \bigcup_{t \in T} \langle \omega\bar{\omega} : \omega \in Q_t,\ \bar{\omega} \in Q_t^c \rangle.$$
In particular, Δ X as an ideal is generated by 2-atoms.
Example 4. 
Consider the outcome space Ω = { 1 , 2 , 3 , 4 } . Now, let X be the variable with partition { { 1 , 2 } , { 3 } , { 4 } } . Then, as an ideal, we have
$$\Delta X = (\langle 1, 2 \rangle \cap \langle 3 \rangle) \cup (\langle 1, 2 \rangle \cap \langle 4 \rangle) \cup (\langle 3 \rangle \cap \langle 4 \rangle) = \langle 13, 23 \rangle \cup \langle 14, 24 \rangle \cup \langle 34 \rangle = \langle 13, 23, 14, 24, 34 \rangle.$$
Corollary 4. 
Let $X_a$, $a \in A$, be a family of discrete variables on the outcome space Ω. Knowledge of how the 2-atoms are located among the $\Delta X_a$ is sufficient to describe how all other atoms are located.
Restating this, knowledge of the 2-atoms contained in each Δ X a is sufficient to deduce the presence of any atom in any set-theoretic expression constructed using the Δ X a .
We have now successfully described the structure of entropy through the algebraic lens of ideals in a poset and illustrated that ideals in this lattice correspond to co-informations, while other subsets of Δ Ω correspond to entropy expressions on Ω .
To illuminate the power of this flavour of the theory, in the next section, we shall see how these ideals interact with the measure μ and use our results to demonstrate that mutual information is always given by a degree 2 ideal. Not only this, but we will give a generalisation which bounds the degree of the generators for ideals representing the intersection of more than two variables. We then extend these techniques to explore ideals giving fixed-sign information quantities. This intriguing result will show that a surprising amount can be learned about an information quantity without much knowledge of the underlying probabilities.

4. Properties of the Measure on Ideals

We have now developed lots of language for discussing the ideals inside of the lattice Δ Ω . Moreover, having seen that co-information is perfectly described by these ideals, it would be a natural question to ask how the measure μ interacts with the ideal structure. In this section, we will demonstrate that the entropy contribution of an ideal can, much like atoms, be neatly categorised as either positive or negative in many cases, and we shall see that this provides various tools for constructing new bounds.
In this section, we begin by demonstrating that the mutual information is always given by a degree 2 ideal. To accomplish this, we shall need the following notion of restriction, which we shall utilise in the proofs to follow.
Definition 10. 
Let X be a random variable on a finite outcome space Ω, and let $S \subseteq \Omega$. We define the  restriction  of a collection of atoms $W \subseteq \Delta\Omega$ to S as
$$W\!\restriction_S = \{b_Q \in W : Q \subseteq S\}.$$
In particular, we will use the notation $\Delta X\!\restriction_S$ and $\langle \cdot \rangle\!\restriction_S$, or occasionally $\cdot\,|_S$, to construct contents and ideals inside of restrictions.
Restriction simply allows us to focus our attention on a subset of the atoms—in particular, those whose outcomes all belong to the restricting subset S. Note that given some subset $W \subseteq \Delta\Omega$, we have that $W\!\restriction_S \subseteq W$ rather than operating with an entirely new class of atoms.
One of the strengths of the measure μ beyond entropy alone is that μ is homogeneous and works across multiple scales. As such, every statement and piece of structure given here for $\Delta\Omega$ and ideals therein also applies to $\Delta\Omega\!\restriction_S$ for $S \subseteq \Omega$. We shall demonstrate that many problems exploring the intersection of ideals (and hence the intersection of entropies) can be much simplified by restricting. We proceed with the first result on mutual information, where we use this concept in the proof.
Theorem 5 
(Mutual information is a degree 2 ideal). Let X and Y be two random variables. Then, there exists a set of 2-atom generators $\{a_i b_i : a_i, b_i \in \Omega\}$ for $i = 1, \ldots, k$ such that
$$I(X; Y) = \mu(\langle a_1 b_1, \ldots, a_k b_k \rangle).$$
We have demonstrated something rather intriguing: mutual information looks a lot like a normal variable content in that it is always generated by degree 2 atoms, but the generators of mutual information do not need to correspond to a representable subset of Δ Ω . When working with ideals in general, one would expect that the intersection between generators of degree m and n would have degree bounded above by m + n , so it is rather surprising that the mutual information has this property.
Extending the investigation of the Gács–Körner common information in [11], the following can now be seen:
Corollary 5. 
The Gács–Körner common information is generated by degree 2 atoms, and the generating set is the largest subset of generators of the mutual information which is representable by some random variable (in [10] this property is referred to as ‘discernibility’).
This result confirms our natural intuition for selecting generators to construct a variable. We provide also a generalisation of Theorem 5 to co-information.
Theorem 6. 
Let Ω be the joint outcome space of M discrete variables $X_1, \ldots, X_M$. Then, the content of $I(X_1; \ldots; X_M)$ can be completely generated by atoms of degree at most M.
This result states that the degree of the generators of an ideal corresponding to some co-information is always bounded above by the number of variables. This result vastly reduces the search space of generators when studying the properties of co-information, and we make use of it in our study of fixed-parity systems in the next section.
Example 5. 
Consider the standard OR gate given by outcomes ( X , Y , Z = OR ( X , Y ) ) , which we label as follows:
X  Y  Z = OR(X, Y)  Outcome (ω)
0  0  0             1
0  1  1             2
1  0  1             3
1  1  1             4
Note that in this instance, we have
$$I(X; Y; Z) = \mu(\langle 14, 123 \rangle),$$
which is generated by at most degree 3 atoms, as expected. Note that we are discussing the structure of the OR gate without any mention of the probabilities. A diagram representing the structure is given in Figure 2. As expected, the degree of the generators is bounded above by 3.
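The generating set quoted in Example 5 can be recovered computationally: build the three contents from the outcome labelling above, intersect them, and keep only the minimal atoms. The sketch below (helper names are ours, not the paper's notation) prints the generators of the ideal ⟨14, 123⟩.

```python
from itertools import combinations

def content(omega, partition):
    """Delta_X: atoms (subsets of size >= 2) crossing a boundary in the partition."""
    part_of = {w: i for i, part in enumerate(partition) for w in part}
    return {frozenset(S)
            for k in range(2, len(omega) + 1)
            for S in combinations(sorted(omega), k)
            if len({part_of[w] for w in S}) >= 2}

def minimal_generators(ideal_atoms):
    """Atoms of the ideal that do not strictly contain another atom of the ideal."""
    return {b for b in ideal_atoms if not any(a < b for a in ideal_atoms)}

# OR gate labelling from Example 5: outcomes 1..4 <-> (X, Y) = (0,0), (0,1), (1,0), (1,1).
omega = {1, 2, 3, 4}
dX = content(omega, [{1, 2}, {3, 4}])   # X = 0 on {1, 2}, X = 1 on {3, 4}
dY = content(omega, [{1, 3}, {2, 4}])   # Y = 0 on {1, 3}, Y = 1 on {2, 4}
dZ = content(omega, [{1}, {2, 3, 4}])   # Z = OR(X, Y) is 0 only on outcome 1

gens = minimal_generators(dX & dY & dZ)
print(sorted(map(sorted, gens)))        # [[1, 2, 3], [1, 4]] -- the ideal <14, 123>
```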
Although this is an interesting representation of the structure of the co-information between random variables, we have not said much yet about the relationship between these ideals and their measures μ ( J ) . As it turns out, for certain classes of ideals, the sign of μ ( J ) is just as easy to characterise as the signs of the atoms themselves.
Lemma 4. 
Let $J = \langle \omega_1 \cdots \omega_d \rangle$ be an ideal generated by a single degree d atom with $P(\omega_1), \ldots, P(\omega_d) \neq 0$. Then, $(-1)^d \mu(J) > 0$.
This result is quite powerful, as it tells us that in certain scenarios, we can know the sign of the ideal and the information measure it represents without any knowledge of the probabilities. We will strengthen this result shortly to demonstrate that certain classes of ideals, which we call strongly fixed-parity ideals, have fixed-sign measures.
Definition 11. 
Let $J = \langle g_1, \ldots, g_j \rangle$ be an ideal. If $j = 1$, we say that J is  strongly fixed parity, and set the  parity  of J as $P(J) = (-1)^{\deg g_1}$.
Moreover, if $j \geq 2$, we shall say that J has  strongly fixed parity  if there is an expression
$$\mu(J) = P(J) \sum_{\alpha \in A} P(J_\alpha)\, \mu(J_\alpha)$$
for some finite collection of fixed-parity ideals $\{J_\alpha : \alpha \in A\}$, with the equality holding across all probability distributions P on Ω and some $P(J) \in \{-1, 1\}$, which we call the  parity  of J.
Lastly, we shall say that an ideal $J \subseteq \Delta\Omega$ is of  strongly mixed parity  if it has generators of both even and odd degree.
Example 6. 
The ideal $\langle 12, 23 \rangle$ is strongly fixed even parity as
$$\mu(\langle 12, 23 \rangle) = \mu(\langle 12 \rangle) + \mu(\langle 23 \rangle) - \mu(\langle 12 \rangle \cap \langle 23 \rangle) = \mu(\langle 12 \rangle) + \mu(\langle 23 \rangle) - \mu(\langle 123 \rangle).$$
The ideal $\langle 123 \rangle$ has strong negative (odd) parity, and the two degree 2 ideals have strong positive (even) parity. When composing the parities, the measure of the whole set is given by three positive parts, so it makes sense to call $\langle 12, 23 \rangle$ positive fixed-parity.
Theorem 7 
(Ideals of Strong Parity). Let J be an ideal in $\Delta\Omega$. If J is of strongly even parity, then $\mu(J) \geq 0$, and if J is of strongly odd parity, then $\mu(J) \leq 0$.
This result is most pleasant as it reflects what feels like a natural intuition for how these systems should behave. In particular, given a system of variables X 1 , , X n , defined by partitions on a finite outcome space Ω , information quantities which reflect strongly fixed-parity ideals have a predetermined sign for any underlying probability distribution over Ω .
In this section, we demonstrated that the algebraic construction of ideals in our poset from the previous section plays remarkably well with the measure μ of our construction, and we developed several tricks for manipulating expressions in Δ Ω . We build on this theory in the next section and apply it to the problem of finding purely synergistic systems of the form X, Y, and Z = f ( X , Y ) .

5. Fixed-Parity Systems

To motivate our investigation, we give here an example of how certain information quantities do not have fixed sign.
Example 7. 
Let X and Y be binary variables with Z = OR ( X , Y ) .
Recall that the co-information (also known as the interaction information [13,17]) is given by
$$I(X; Y; Z) = H(X) + H(Y) + H(Z) - H(X, Y) - H(X, Z) - H(Y, Z) + H(X, Y, Z).$$
In the case that $P(X = x, Y = y) = 0.25$ for all outcomes $(x, y) \in \Omega$, we have that the co-information is $I(X; Y; Z) \approx -0.19$ bits, being negative in this case. However, in the case that $P(X = 0, Y = 0) = P(X = 1, Y = 1) = 0.45$ and $P(X = 0, Y = 1) = P(X = 1, Y = 0) = 0.05$, then we have $I(X; Y; Z) \approx 0.52$ bits. That is, for this system (and many others), knowledge of the structure of the outcomes alone (that is, any prior knowledge that certain combinations of symbols have zero probability) is not sufficient to determine the sign of the co-information, and it depends upon the underlying probabilities of the system states. We shall see why this is the case for the OR gate in particular in this section.
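The two values quoted in this example are easy to reproduce directly from the co-information formula above; the sketch below (our own helper functions, written in Python) evaluates I(X; Y; Z) for the OR gate under both joint distributions.

```python
import math
from collections import defaultdict

def H(dist):
    """Shannon entropy (in bits) of a dictionary of probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def coinformation_or(p_xy):
    """I(X;Y;Z) for Z = OR(X, Y), given a joint p.m.f. on (X, Y)."""
    joint = {(x, y, int(x or y)): p for (x, y), p in p_xy.items()}
    def marg(*idx):
        out = defaultdict(float)
        for key, p in joint.items():
            out[tuple(key[i] for i in idx)] += p
        return out
    return (H(marg(0)) + H(marg(1)) + H(marg(2))
            - H(marg(0, 1)) - H(marg(0, 2)) - H(marg(1, 2))
            + H(joint))

uniform = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
skewed  = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(f"{coinformation_or(uniform):.2f}")  # -0.19 bits
print(f"{coinformation_or(skewed):.2f}")   #  0.52 bits
```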
We now give a definition to connect the algebraic perspective on ideals to information quantities.
Definition 12. 
Let $X_1, \ldots, X_n$ be a system of variables on a finite outcome space Ω. We say that the system $X_1, \ldots, X_n$ is a  fixed sign  or  fixed parity  system if the sign of
$$I(X_1; \ldots; X_n)$$
is fixed regardless of the underlying probability distribution. Similarly, we say that an entropy expression $E(X_1, \ldots, X_n)$ defined on Ω has  fixed parity  if its sign is always fixed regardless of the underlying probabilities in $X_1, \ldots, X_n$. We say that the system is  negative/odd fixed parity  if the co-information is always negative, and it is  positive/even fixed parity  if the co-information is always positive.
There is a natural dual question to be asked here: is it possible to have a fixed-parity system where the co-information is a strongly-mixed ideal—that is, generated by elements of varying degrees? The next theorem shows this is, in fact, impossible.
Theorem 8. 
A system of variables X 1 , , X n with I ( X 1 ; ; X n ) given by an ideal of strongly mixed-parity cannot have a fixed parity.
That is to say, being strongly-mixed (an algebraic property) implies mixed signs (a property of the measure on the set).
This theorem gives a partial converse to Theorem 7 in that it gives us a way to characterise some mixed-parity systems. Although we have not characterised every fixed-parity or mixed-parity system, we expect such a characterisation in terms of ideal properties (algebraic reasoning alone) should be possible. However, the tools we have already constructed are sufficient to show the main result of this paper.
Theorem 9. 
The only negative fixed-parity (always synergy-dominated) system given by two finite variables X , Y and a deterministic function Z = f ( X , Y ) is the XOR gate.
Of note is that the XOR gate is generated by the presence of three degree 3 atoms, as depicted in Figure 3. Each of these atoms generates the synergistic effect. While they all exhibit a change in X, Y and Z, the places where these changes are seen are distinct between the variables. There is no single degree 2 atom ω 1 ω 2 (the knowledge that ω 1 and ω 2 are distinct) which the three variables share.

6. Discussion

In this paper, we extended our results on the measure μ from [10,11] to an algebraic construction inside of the space $\Delta\Omega$. We demonstrated that in many cases, the study of ‘ideals’ (in the order-theoretic sense) inside of $\Delta\Omega$ simplifies bounding problems, and we showed that these ideals form a natural intermediate language between partitions and information quantities, with useful behaviour in tandem with the measure μ.
While in the present work, the issue of bounding is applied to the study of fixed-sign quantities, we expect these techniques can be used in multiple scenarios where bounding over all possible probabilities is required. Moreover, we expect that there is a stronger version of Theorem 7 which might be stated with weaker restrictions on the underlying ideals, and a full characterisation of all fixed-parity systems would be an insightful direction. Future work may develop this theory.
One particularly intriguing result given here is that the underlying cause of three-variable synergy appears to be easily characterisable by its geometry alone. The three-outcome ‘flower’ shape presented in Figure 3 is required for the existence of synergy in three variables, being the only generator of three-variable co-information which has a negative measure, with an intrinsically three-dimensional shape. We leave here the problem of classifying the generators of various orders open. While it is straightforward to see all such effects now for two or three variables, the case beyond three variables appears to be more opaque. In particular, we expect that there may be vastly many new information effects in four or more variables which cannot geometrically exist in simpler systems.
We applied the new algebraic results developed in this paper to show that the XOR gate is the only purely synergistic system (i.e., always possessing negative co-information) of finite variables X , Y and Z = f ( X , Y ) for a deterministic function f (see also [18]). In particular, we highlight that this was achieved with a purely algebraic proof that did not require any navigation of the space of probability distributions.
We hope this work might be applied to the problem of Partial Information Decomposition (PID) [3,5,6,8,19] and contribute to the widening body of knowledge in set-theoretic information theory.

Author Contributions

Conceptualisation, K.J.A.D.; writing—original draft preparation, K.J.A.D.; writing—review and editing, K.J.A.D. and P.A.M.M.; supervision, P.A.M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We would like to thank Dan Bor, Fernando Rosas and Abel Jansma for helpful input on the contents of this work and future applications.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proofs for Results

The proof for Theorem 2 and results in Section 2 can be found in [11], where we also give an alternative expression for Definition 3.
Proof of Proposition 2. 
Suppose $b \in I \cup J$. Then, either $b \succeq g$ or $b \succeq h$ for some $g \in G$ or $h \in H$. Hence, $b \in \langle g_1, \ldots, g_n, h_1, \ldots, h_m \rangle$ as needed. Conversely, if b is contained in $\langle g_1, \ldots, g_n, h_1, \ldots, h_m \rangle$, then $b \succeq b'$ for some $b' \in \{g_1, \ldots, g_n, h_1, \ldots, h_m\}$, so b must be contained in either I or J, so $b \in I \cup J$.
Suppose that $b \in I$ and $b \in J$. Then, there exist $g \in G$ and $h \in H$ with $b \succeq g$ and $b \succeq h$. Hence, $b \succeq gh$, so $b \in \langle g_1 h_1, \ldots, g_n h_m \rangle$. Conversely, if $b \in \langle g_1 h_1, \ldots, g_n h_m \rangle$, then there exists a generator $gh$ of $\langle g_1 h_1, \ldots, g_n h_m \rangle$ with $b \succeq gh$. Since $b \succeq gh$, we must have $b \succeq g$ and $b \succeq h$, so $b \in I$ and $b \in J$, so $b \in I \cap J$. □
Proof of Lemma 3. 
By the definition of content, any atom lying above an atom of $\Delta X_i$ also crosses a boundary in $X_i$, so each $\Delta X_i$ is an upper set and hence an ideal. Moreover, the intersection of ideals is an ideal, so the co-information $I(X_1; \ldots; X_t)$ must also correspond to an ideal. □
Proof of Theorem 3. 
Firstly, we note that all co-informations correspond to ideals by the previous result in Lemma 3. It suffices to show that all ideals in $\Delta\Omega$ correspond to some co-information. As such, given an ideal J, we need to find a collection of variables $X_1, \ldots, X_t$ where $I(X_1; \ldots; X_t) = \mu(J)$.
Ideals are determined by their generators, and we only need to consider sets of generators in which no generator is contained in another; otherwise, one of them is not needed as a generator. For each generator $g_i$, let $n_i = \deg g_i$. Let $S_i \subseteq \Omega$ be the set of elements in $g_i$. Then, consider the $2^{n_i} - 2$ variables given by the partitions
$$X_{ij} = \{(\Omega \setminus S_i) \cup Q_{ij},\; S_i \setminus Q_{ij}\}$$
for every non-empty proper subset $Q_{ij} \subsetneq S_i$, with $j \in \{1, \ldots, 2^{n_i} - 2\}$. Intuitively, these variables spread the elements in $g_i$ across two parts in every combination possible, so that the only boundary crossings guaranteed consistently across the entire collection $X_{ij}$ are those made by the $g_i$ atom and the atoms in $\langle g_i \rangle$. Then
$$\bigcap_{Q_{ij} \subsetneq S_i} \Delta X_{ij} = \langle g_i \rangle.$$
Here, we write $Q_{ij}$ under the intersection to symbolise that these partitions are taken together to obtain the atom $g_i$. Across the set of generators $g_1, \ldots, g_k$, we consider all possible products
$$Y_{j_1 j_2 \cdots j_k} = X_{1 j_1} \vee X_{2 j_2} \vee \cdots \vee X_{k j_k} = \bigvee_{1 \leq i \leq k} X_{i j_i},$$
where any combination of the $j_i$ with $1 \leq j_i \leq 2^{n_i} - 2$ can be taken. Note that we write $A \vee B$ to mean the coarsest partition which is finer than A and B. In practice, every variable corresponds to choosing one of the $Q_{ij}$ for each generator, so every single combination is represented as a variable. Then
$$\bigcap_{j_1, \ldots, j_k} \Delta Y_{j_1 j_2 \cdots j_k} = \langle g_1, \ldots, g_k \rangle$$
exactly, as any other generators will be removed, giving the result. □
Proof of Corollary 2. 
Given any atom b, we can consider the ideal $I = \langle b \rangle$, and then using conditioning (in the sense of a set difference of information), we may subtract the higher co-information $C = \langle \omega b : \omega \in \Omega,\ b \prec \omega b \rangle$, which is itself an ideal, as the unions of ideals in lattices are ideals. We have that $C \subseteq I$, meaning we may condition it out in order to obtain the expression $I \setminus C = \{b\}$. That is, b alone populates some region on an I-diagram between all variables $X_a$.
Taking any collection of atoms hence corresponds to a collection of regions on the maximal I-diagram provided they are not counted with multiplicity. □
Proof of Corollary 3. 
There are $2^n - n - 1$ elements in $\Delta\Omega$, as the points and the empty set do not contribute to the entropy, leaving $2^n - n - 1$ atoms, and hence $2^{2^n - n - 1}$ possible entropic expressions without multiplicity, including the zero expression. □
Proof of Theorem 4. 
The atoms of $\Delta X$ are precisely those atoms which cross a boundary in X; that is, they must contain a pair $\omega_a \omega_b$ for $\omega_a \in P_a$ and $\omega_b \in P_b$, where $P_a$ and $P_b$ are different parts in the partition. The ideal $\langle \omega_a \omega_b \rangle$ can be written as the intersection of the two prime ideals $\langle \omega_a \rangle$ and $\langle \omega_b \rangle$.
Since the union of ideals $I_1$ and $I_2$ is the ideal generated by the union of their generators in lattices, we have that $\Delta X$ must be the union across these parts and hence generated by 2-atoms.
The second expression follows quickly from the first; every outcome in $Q_a^c$ must lie in some other part $Q_b$ and vice versa. □
Proof of Corollary 4. 
All other atoms are described by whether or not they are contained in the intersection of the variable ideals Δ X a , which by the previous result are generated by degree 2 atoms. Hence, the knowledge of how these are distributed will describe the distribution of all other atoms. □
Proof of Theorem 5. 
We know that $\Delta X$ and $\Delta Y$ are both degree 2 ideals, as they are given by unions of intersections of prime degree 1 ideals, so their intersection $\Delta X \cap \Delta Y$ can have generators of at most degree 4. Hence, we need to demonstrate that beneath every degree 3 or degree 4 generator of $\Delta X \cap \Delta Y$ there lies a degree 2 generator.
We demonstrate that every degree 4 atom in $\Delta X \cap \Delta Y$ is contained in a degree 2 ideal. The argument for the degree 3 atoms is very straightforward and uses the same trick. Suppose that we have a degree 4 atom $\omega_1\omega_2\omega_3\omega_4$ which crosses a boundary in X and in Y. We may restrict to just these four outcomes $\omega_1, \omega_2, \omega_3, \omega_4$, on which the partition of X and the partition of Y must now also restrict to a partition.
Since $\omega_1\omega_2\omega_3\omega_4$ is contained in $\Delta X\!\restriction_{\{1,2,3,4\}}$ and $\Delta Y\!\restriction_{\{1,2,3,4\}}$, we must have that the local partition of X and the local partition of Y are non-trivial, so that $\omega_1\omega_2\omega_3\omega_4$ crosses a boundary in this partition.
Without loss of generality, the potential local partitions of any random variable Q, expressed as ideals $\Delta Q\!\restriction_{\{1,2,3,4\}}$ up to reordering of the $\omega_i$, are given by
$$\langle 12, 13, 14 \rangle, \quad \langle 13, 23, 14, 24 \rangle, \quad \langle 12, 13, 14, 23, 24 \rangle, \quad \langle 12, 13, 14, 23, 24, 34 \rangle.$$
In particular, the total number of possible degree 2 generators on four outcomes is $\binom{4}{2} = 6$, so the intersection $\Delta X\!\restriction_{\{1,2,3,4\}} \cap\, \Delta Y\!\restriction_{\{1,2,3,4\}}$ will, by the pigeonhole principle, contain a degree 2 atom unless both $\Delta X\!\restriction_{\{1,2,3,4\}}$ and $\Delta Y\!\restriction_{\{1,2,3,4\}}$ have at most three degree 2 atoms. Of the four possibilities above, only the first satisfies this condition, so both X and Y are of this form.
Without loss of generality, we assume $\Delta X\!\restriction_{\{1,2,3,4\}}$ is given by $\langle 12, 13, 14 \rangle$. The only possible degree 2 ideal which does not intersect with $\langle 12, 13, 14 \rangle$ is given by $\langle 23, 24, 34 \rangle$, so we should expect that $\Delta Y\!\restriction_{\{1,2,3,4\}} = \langle 23, 24, 34 \rangle$. However, this does not correspond to any partition on $\{1, 2, 3, 4\}$, as it does not contain a generator containing element 1. Thus, Y cannot have this form, so $\Delta X\!\restriction_{\{1,2,3,4\}}$ and $\Delta Y\!\restriction_{\{1,2,3,4\}}$ must intersect in a degree 2 element, and the degree 4 atom $\omega_1\omega_2\omega_3\omega_4$ is contained in a degree 2 ideal.
For any degree 3 atom, the argument is even simpler; the smallest possible ideal $\Delta X\!\restriction_{\{1,2,3\}}$ must have either 2 or 0 generators of degree 2 when restricted. If it had 0 generators, then $\{1, 2, 3\}$ could not cross a boundary in X, so it would not be present in $\Delta X \cap \Delta Y$. Hence, $\Delta X\!\restriction_{\{1,2,3\}}$ has at least two generators of degree 2, with the same being true for Y. As such, they must intersect with each other at a degree 2 atom by the pigeonhole principle, as the total number of possible generators is $\binom{3}{2} = 3$. □
Proof of Corollary 5. 
Using a result from [10], we have that $C_{\mathrm{GK}}(X; Y) = \operatorname{Rep}(\Delta X \cap \Delta Y)$ (the maximally representable subset inside of $\Delta X \cap \Delta Y$). We have now also shown that both $\Delta X \cap \Delta Y$ and $\operatorname{Rep}(\Delta X \cap \Delta Y)$ are degree 2 ideals. Hence, the generators of the representable subset must be a subset of the generators of the mutual information. □
Proof of Theorem 6. 
We proceed by induction on M by showing that the theoretical minimum number of generators must still be large enough to force an overlap. We have demonstrated in the previous theorem that the statement is true for $M = 2$. Suppose that the statement is true for $M - 1$; then the ideal corresponding to the co-information $I(X_1; \ldots; X_{M-1})$ for the first $M - 1$ variables has generators of degree at most $M - 1$. Multiplying the generators of $\Delta I(X_1; \ldots; X_{M-1})$ by the generators for $\Delta X_M$, we hence know that $\Delta I(X_1; \ldots; X_M)$ can be generated by atoms of at most degree $M + 1$. Hence, we need to show that any degree $M + 1$ atom is actually contained in a degree M ideal.
We will use a similar counting argument to Theorem 5. In particular, given a finite set of size k and two subsets of size $a_1$ and $a_2$, the minimum size of their intersection is given by $a_1 + a_2 - k$. Given three subsets, a minimum size for the intersection is then given by $(a_1 + a_2 - k) + a_3 - k$, and so on. Hence, given l subsets, the corresponding expression is
$$a_1 + \cdots + a_l - k(l - 1).$$
Suppose that $\omega_1 \cdots \omega_{M+1}$ is a degree $M + 1$ atom contained in the co-information $\Delta I(X_1, \ldots, X_M)$, which we need to demonstrate is contained in a degree M ideal. Restricting to $\{\omega_1, \ldots, \omega_{M+1}\}$, the minimum number of degree M atoms in $\Delta X_i\!\restriction_{\{1,\ldots,M+1\}}$ for any i is attained when $X_i$ corresponds locally to a partition of the form $\{\{\omega_i\}, \{\omega_i\}^c\}$ for some single $\omega_i \in \Omega$ [If this is not immediately clear, consider any partition of Ω—we could choose a coarser sub-partition into two parts, which must contain fewer 2-atoms, so minimising the number of 2-atoms overall is equivalent to finding the minimum number of degree 2 atoms in a partition of Ω into 2 parts. This is equivalent to minimising the value of $k \cdot (|\Omega| - k) = k|\Omega| - k^2$ for $0 < k < |\Omega|$, which happens at $k = 1$ or $k = |\Omega| - 1$].
Hence, there must be a minimum of $\binom{M}{M-1} = M$ degree M atoms in $\Delta X_i\!\restriction_{\{1,\ldots,M+1\}}$ (as we have already selected one outcome from the $M + 1$ available outcomes—now we must select the other $M - 1$ outcomes). The maximum size of the set of all possible degree M atoms in the restriction to $\{1, \ldots, M+1\}$ is $\binom{M+1}{M} = M + 1$.
Hence, taking the intersection of M variables, assuming a minimal number of degree M atoms, and using the expression in Equation (A6), we need only demonstrate that
$$M \cdot M - (M - 1) \cdot (M + 1) > 0,$$
which always evaluates to unity, showing that every degree $M + 1$ atom in the intersection lies above at least one degree M atom of the intersection and is hence contained in a degree M ideal, proving the result. □
Proof of Lemma 4. 
Let our ideal be $\langle \omega_1 \cdots \omega_k \rangle$. We will proceed by induction on the difference $d = |\Omega| - k$, arguing at each step that the measure of the upper set is monotonic in the probability of the last element $\omega_{k+d}$. We note that the sign of the upper set might only change if it contains additional outcomes, so provided we treat this carefully, we can also allow Ω to vary (via restriction) provided that it always contains $\omega_1, \ldots, \omega_k$.
As earlier, we will write $\langle \omega_1 \cdots \omega_k \rangle\!\restriction_S$ to illustrate that we are operating inside some restricting set S. These quasi-ideals are quite justified, as all of the previous results must still hold even if we assume the probabilities inside of S do not sum to 1. These atoms will still have the same measure regardless of the context S in which we find them.
For the first case $|\Omega| = k$, we note that $\langle \omega_1 \cdots \omega_k \rangle\!\restriction_{\{\omega_1, \ldots, \omega_k\}}$ consists of the single atom $\omega_1 \cdots \omega_k$. We will use the shorthand notation $\langle \omega_1, \ldots, \omega_k \rangle\!\restriction_{\{\omega_1, \ldots, \omega_n\}} = \langle \omega_1, \ldots, \omega_k \rangle_n$ for some simplicity. By Theorem 1, which characterises the sign of individual atoms, we have both that $(-1)^k \mu(\langle \omega_1 \cdots \omega_k \rangle_k) > 0$ for $P(\omega_k) \neq 0$ and that $\mu(\langle \omega_1 \cdots \omega_k \rangle_k)$ varies monotonically in $P(\omega_k)$ between 0 and $\mu(\omega_1, \ldots, \omega_{k-1})$. So, the claim is true for $d = 0$.
Now, suppose that $\mu(\langle \omega_1 \cdots \omega_k \rangle_{k+d})$ varies monotonically in $P(\omega_{k+d})$ between 0 and $\mu(\langle \omega_1 \cdots \omega_k \rangle_{k+d-1})$. Then, we first note that
$$\langle \omega_1 \cdots \omega_k \rangle\!\restriction_{\{\omega_1, \ldots, \omega_{k+d}, \omega_{k+d+1}\}} = \langle \omega_1 \cdots \omega_k \rangle\!\restriction_{\{\omega_1, \ldots, \omega_{k+d}\}} \;\cup\; \omega_{k+d+1} \cdot \langle \omega_1 \cdots \omega_k \rangle\!\restriction_{\{\omega_1, \ldots, \omega_{k+d}\}},$$
where we use the multiplicative notation to signify that $\omega_{k+d+1}$ is added as an outcome to all atoms in $\langle \omega_1 \cdots \omega_k \rangle$. For example,
$$4 \cdot \langle 12 \rangle\!\restriction_{\{1,2,3\}} = \{124, 1234\}.$$
Hence, we can view $\mu(\langle \omega_1 \cdots \omega_k \rangle)$ as a function of $P(\omega_{k+d+1}) = p_{k+d+1}$:
$$\mu(\langle \omega_1 \cdots \omega_k \rangle\!\restriction_{\{\omega_1, \ldots, \omega_{k+d+1}\}}) = \mu(\langle \omega_1 \cdots \omega_k \rangle\!\restriction_{\{\omega_1, \ldots, \omega_{k+d}\}}) + \mu(\omega_{k+d+1} \cdot \langle \omega_1 \cdots \omega_k \rangle\!\restriction_{\{\omega_1, \ldots, \omega_{k+d}\}}).$$
We now notice that the second term can be expressed
$$\mu(\omega_{k+d+1} \cdot \langle \omega_1 \cdots \omega_k \rangle\!\restriction_{\{\omega_1, \ldots, \omega_{k+d}\}}) = \mu(\langle \omega_1 \cdots \omega_k \omega_{k+d+1} \rangle\!\restriction_{\{\omega_1, \ldots, \omega_{k+d+1}\}}).$$
But now we can see that the difference between $k + 1$ and $k + d + 1$ is just d, so this reduces to the case for d. By assumption, we hence have that this ideal varies monotonically in $P(\omega_{k+d+1})$ between 0 and $\mu(\langle \omega_1 \cdots \omega_k \rangle_{k+d})$.
This means that the entire expression in Equation (A11) must vary monotonically between 0 and $\mu(\langle \omega_1 \cdots \omega_k \rangle)$ as a function of $P(\omega_{k+d+1})$.
Since we can construct each ideal by successively increasing d and this leaves the sign intact (note that no probability tends to infinity), the sign is left unchanged, proving the result. □
Proof of Theorem 7. 
Let $J = \langle g_1, \ldots, g_j \rangle$ be an ideal of strong even/positive parity, with the result for odd/negative parity following equivalently. By the definition of strong even parity, we must have
$$\mu(J) = \sum_{\alpha \in A} P(J_\alpha)\, \mu(J_\alpha).$$
Every strong fixed-parity ideal is defined in terms of a finite sum of ideals with one generator, so we may assume without loss of generality that the $J_\alpha$ have single generators.
By virtue of Lemma 4, we know that $P(J_\alpha) = \operatorname{sgn}(\mu(J_\alpha))$. Hence, taking the sum across all the $J_\alpha$ values, we have that all of the terms $P(J_\alpha)\, \mu(J_\alpha)$ must be positive. As all terms in the sum are positive, so too is $\mu(J)$. □
Proof of Theorem 8. 
We will first allow ourselves to consider probabilities not summing to one, demonstrating that the sign has a given parity, and then we shall scale appropriately using the homogeneity property of μ to obtain meaningful probabilities once more, while the parity shall be fixed.
Suppose J is a strongly mixed ideal. Then, J has an even degree generator g. We first send the probabilities of all outcomes in $\Omega \setminus g$ (treating g as a set) to 0. Then, we have
$$\mu(J) = \mu(g) > 0.$$
Summing the probabilities, we let $K = \sum_{\omega \in g} P(\omega)$. Then, we scale
$$0 < \frac{1}{K}\, \mu(\{P(\omega) : \omega \in g\}) = \mu\!\left(\left\{\frac{P(\omega)}{K} : \omega \in g\right\}\right),$$
where we now have $\sum_{\omega \in g} \frac{P(\omega)}{K} = 1$. Hence, we have found a set of probabilities where $\mu(J) > 0$.
Repeating the exercise for g of odd degree will similarly show that there are probabilities such that μ ( J ) < 0 . Hence, μ ( J ) can be either positive or negative given a strongly mixed-parity ideal, giving the result. □
Proof of Theorem 9. 
We begin by briefly demonstrating that Z = XOR(X, Y) has co-information Δ I(X; Y; Z) given by a strongly odd parity ideal. Given the outcomes
X  Y  Z  ω
0  0  0  1
0  1  1  2
1  0  1  3
1  1  0  4
we have that $\Delta X \cap \Delta Y \cap \Delta Z = \langle 123, 124, 134, 234 \rangle$. In this case, we have
$$\mu(\langle 123, 124, 134, 234 \rangle) = \mu(\langle 123, 124 \rangle) + \mu(\langle 134, 234 \rangle) - \mu(\langle 123, 124 \rangle \cap \langle 134, 234 \rangle) = \mu(\langle 123, 124 \rangle) + \mu(\langle 134, 234 \rangle) - \mu(\langle 1234 \rangle),$$
where now we also have
$$\mu(\langle 123, 124 \rangle) = \mu(\langle 123 \rangle) + \mu(\langle 124 \rangle) - \mu(\langle 1234 \rangle),$$
$$\mu(\langle 134, 234 \rangle) = \mu(\langle 134 \rangle) + \mu(\langle 234 \rangle) - \mu(\langle 1234 \rangle).$$
Working backwards, we see that $\langle 123, 124 \rangle$ and $\langle 134, 234 \rangle$ are negative (odd) fixed-parity ideals, so that $\Delta X \cap \Delta Y \cap \Delta Z$ in this case is a negative fixed-parity ideal.
To show that there are no other such deterministic functions on three variables, we start by considering the case where X and Y are both binary variables. In this case, we know that we can express all events on Z in terms of f ( X , Y ) on the four outcomes
X  Y  ω
0  0  1
0  1  2
1  0  3
1  1  4
In this case, we have $\Delta X \cap \Delta Y = \langle 14, 23 \rangle$, which is known to have positive measure as it reflects a mutual information. Similarly, either sub-ideal $\langle 14 \rangle$ or $\langle 23 \rangle$ alone will also have a positive measure, so we cannot have an ideal generated by degree 2 atoms alone. However, the ideal cannot have both even and odd generators (as then it would have mixed parity by Theorem 8), and it cannot have generators of degree more than 3 (by Theorem 6). Hence, the ideal must be exclusively generated by degree 3 atoms, and we must nullify these two degree 2 atoms.
Hence, we know that in order to have degree-3 atoms generating I(X; Y; Z), Z must take equal values on each of the outcome pairs {1, 4} and {2, 3}. Moreover, we cannot have f(0, 0) = f(0, 1) = f(1, 0) = f(1, 1), as then Z would be constant and I(X; Y; Z) = 0. Hence, we must have f(0, 0) = f(1, 1) = 0 and f(0, 1) = f(1, 0) = 1 (up to a relabelling of the outcomes of Z); that is, Z is the XOR gate.
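This conclusion can also be sanity-checked numerically. The sketch below (our own; the sampling scheme and tolerance are arbitrary choices) enumerates all 16 deterministic gates f : {0, 1}² → {0, 1} and evaluates the co-information under randomly sampled full-support input distributions. Only the XOR gate and its relabelling, XNOR, are never positive without being identically zero; gates such as AND and OR take both signs depending on the input distribution, reflecting their mixed-parity ideals.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def co_information(pxyz):
    h = entropy
    return (h(pxyz.sum(axis=(1, 2))) + h(pxyz.sum(axis=(0, 2))) + h(pxyz.sum(axis=(0, 1)))
            - h(pxyz.sum(axis=2)) - h(pxyz.sum(axis=1)) - h(pxyz.sum(axis=0))
            + h(pxyz))

for truth_table in product(range(2), repeat=4):        # values f(0,0), f(0,1), f(1,0), f(1,1)
    signs = set()
    for _ in range(500):
        pxy = rng.dirichlet(np.ones(4)).reshape(2, 2)  # random full-support input distribution
        pxyz = np.zeros((2, 2, 2))
        for (x, y), z in zip(product(range(2), repeat=2), truth_table):
            pxyz[x, y, z] = pxy[x, y]
        ci = co_information(pxyz)
        if abs(ci) > 1e-10:
            signs.add(np.sign(ci))
    if signs == {-1.0}:                                # never positive, not identically zero
        print("purely negative co-information:", truth_table)
# Expected output: (0, 1, 1, 0) and (1, 0, 0, 1), i.e. XOR and its relabelling XNOR.
```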
We now extend by induction to give the full result. Let N_X be the number of events in X and N_Y the number of events in Y. In the case where either N_X or N_Y is 1, that variable must be constant with zero entropy, so the co-information I(X; Y; Z) is trivially zero.
We have seen that in the case N_X = N_Y = 2, the only negative fixed-parity system of the form X, Y, f(X, Y) is the XOR gate. We consider the case N_X = 3 and N_Y = 2 to highlight the inductive argument. In this case, again, Z can be computed deterministically from X and Y, allowing us to use the same trick. Labelling outcomes, we have
X   Y   ω
0   0   1
0   1   2
1   0   3
1   1   4
2   0   5
2   1   6
In this case, we see that
Δ_X ∩ Δ_Y = ⟨14, 16, 23, 25, 36, 45⟩.
Again, this is a mutual information and hence positive. Thus, for the co-information to be a negative fixed-parity ideal, it must be generated purely by degree-3 atoms, so Z must remain unchanged across each of these pairs of outcomes. Assigning the symbol 0 to Z(0, 0), we can use the same trick as before and successively propagate this value across the pairs above, giving us the following chain:
Z(0, 0) = 0 ⟹ Z(1, 1) = 0, Z(2, 1) = 0 ⟹ Z(2, 0) = 0, Z(1, 0) = 0 ⟹ Z(0, 1) = 0.
That is to say, Z does not vary on Ω and the co-information is zero in this case.
We now suppose for the induction that there are no negative fixed-parity systems with N_X outcomes on X and N_Y outcomes on Y, so that Z must be the trivial (constant) variable. We will demonstrate that we can introduce an additional event to either X or Y and still obtain that Z is the trivial variable.
Without loss of generality, we increase N_X by one. This introduces N_Y additional outcomes. As we have demonstrated (without reference to the probabilities) that Z(ω_1) = 0 for every ω_1 ∈ S_1 = {1, …, N_X N_Y}, it suffices to show that for every further outcome ω_2 ∈ S_2 = {N_X N_Y + 1, …, (N_X + 1) N_Y}, there is an atom ω_1 ω_2 ∈ Δ_X ∩ Δ_Y with ω_1 ∈ S_1.
For each ω_2, we shall pick some ω_1 ∈ S_1 such that ω_1 ω_2 ∈ Δ_X ∩ Δ_Y. Using the ordering we have utilised so far and restricting to the bottom of the table, we may take X(ω_2) = N_X as a symbol and Y(ω_2) ∈ {0, …, N_Y − 1}.
Given ω_2, we may select ω_1 to be the outcome corresponding to X = X(ω_2) − 1 and Y = Y(ω_2) − 1, where in Y we perform arithmetic mod N_Y. X and Y must then both change when moving from ω_1 to ω_2. By the definition of content, this means that the atom ω_1 ω_2 is contained in Δ_X ∩ Δ_Y, as needed, so we must have Z(ω_2) = Z(ω_1) = 0, showing that Z is in fact the trivial variable and inductively giving the result. □
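The combinatorial heart of this induction is that, once we leave the 2 × 2 case, the 'both X and Y change' relation connects every outcome to every other, so the value assigned to one outcome propagates to all of them. The sketch below is our own reformulation of that step: it builds a graph on the N_X × N_Y outcome grid with an edge wherever both coordinates differ (exactly the degree-2 atoms of Δ_X ∩ Δ_Y) and checks connectivity for the small grids up to 5 × 5.

```python
from itertools import product

def z_forced_constant(n_x, n_y):
    """True if the 'both coordinates differ' graph on the n_x-by-n_y outcome
    grid is connected; connectivity forces Z = f(X, Y) to take a single value
    once the degree-2 atoms of the intersection must be nullified."""
    outcomes = list(product(range(n_x), range(n_y)))
    seen = {outcomes[0]}          # flood-fill from the first outcome
    frontier = [outcomes[0]]
    while frontier:
        x, y = frontier.pop()
        for u, v in outcomes:
            if u != x and v != y and (u, v) not in seen:
                seen.add((u, v))
                frontier.append((u, v))
    return len(seen) == len(outcomes)

for n_x, n_y in product(range(2, 6), repeat=2):
    print(n_x, n_y, z_forced_constant(n_x, n_y))
```

Only the 2 × 2 grid is disconnected, which is the gap that the XOR gate occupies; every larger grid forces Z to be constant, as in the inductive step.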

References

  1. Yeung, R.W. A new outlook on Shannon's information measures. IEEE Trans. Inf. Theory 1991, 37, 466–474.
  2. Ting, H.K. On the amount of information. Theory Probab. Its Appl. 1962, 7, 439–447.
  3. James, R.G.; Crutchfield, J.P. Multivariate dependence beyond Shannon information. Entropy 2017, 19, 531.
  4. Williams, P.L.; Beer, R.D. Nonnegative decomposition of multivariate information. arXiv 2010, arXiv:1004.2515.
  5. Kolchinsky, A. A novel approach to the partial information decomposition. Entropy 2022, 24, 403.
  6. Ince, R.A. Measuring multivariate redundant information with pointwise common change in surprisal. Entropy 2017, 19, 318.
  7. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying unique information. Entropy 2014, 16, 2161–2183.
  8. Rosas, F.E.; Mediano, P.A.; Rassouli, B.; Barrett, A.B. An operational information decomposition via synergistic disclosure. J. Phys. A Math. Theor. 2020, 53, 485001.
  9. Barrett, A.B. Exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. Phys. Rev. E 2015, 91, 052802.
  10. Down, K.J.; Mediano, P.A. A logarithmic decomposition for information. In Proceedings of the 2023 IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, 25–30 June 2023; pp. 150–155.
  11. Down, K.J.; Mediano, P.A. A logarithmic decomposition and a signed measure space for entropy. arXiv 2024, arXiv:2409.03732.
  12. Baez, J.C.; Fritz, T.; Leinster, T. A characterization of entropy in terms of information loss. Entropy 2011, 13, 1945–1957.
  13. Bell, A.J. The co-information lattice. In Proceedings of the Fifth International Workshop on Independent Component Analysis and Blind Signal Separation: ICA, Granada, Spain, 22–24 September 2004; Volume 2003.
  14. Campbell, L. Entropy as a measure. IEEE Trans. Inf. Theory 1965, 11, 112–114.
  15. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
  16. Davey, B.A.; Priestley, H.A. Introduction to Lattices and Order; Cambridge University Press: Cambridge, UK, 2002.
  17. McGill, W. Multivariate information transmission. Trans. IRE Prof. Group Inf. Theory 1954, 4, 93–111.
  18. Jansma, A. Higher-order interactions and their duals reveal synergy and logical dependence beyond Shannon-information. Entropy 2023, 25, 648.
  19. Mediano, P.A.; Rosas, F.E.; Luppi, A.I.; Carhart-Harris, R.L.; Bor, D.; Seth, A.K.; Barrett, A.B. Towards an extended taxonomy of information dynamics via integrated information decomposition. arXiv 2021, arXiv:2109.13186.
Figure 1. An outcome space Ω = {1, 2, 3} and two variables X and Y defined over Ω. In this case, the intersection of the contents Δ_X ∩ Δ_Y is given by the ideal ⟨12⟩. That is to say, I(X; Y) = μ(⟨12⟩). If the mutual information could be represented by a partition in this case, we would obtain something like the above intersection. This is, of course, impossible in the language of partitions but valid in ideals.
Figure 2. An I-diagram demonstrating the entropy structure for the OR gate. The shaded region corresponds to the ideal ⟨14, 123⟩. Note that in this case, the degree of the generators is bounded above by 3, as we have the intersection of 3 variables as per Theorem 6.
Figure 3. The XOR gate and the four subsets of outcomes which directly contribute to the negativity of the co-information. The presence of nonzero probabilities in one of these ‘flower-shaped’ patterns is required for any synergistic effect in a system of three binary outcomes.