Article

Information Inequalities for Five Random Variables

1 Alfréd Rényi Institute of Mathematics, 1053 Budapest, Hungary
2 Institute of Information Theory and Automation, CZ-182 00 Prague, Czech Republic
* Author to whom correspondence should be addressed.
Computation 2026, 14(2), 42; https://doi.org/10.3390/computation14020042
Submission received: 29 December 2025 / Revised: 22 January 2026 / Accepted: 27 January 2026 / Published: 2 February 2026
(This article belongs to the Section Computational Engineering)

Abstract

The entropic region is formed by the collection of the Shannon entropies of all subvectors of finitely many jointly distributed discrete random variables. For four or more variables, the structure of the entropic region is mostly unknown. We utilize a variant of the Maximum Entropy Method to obtain five-variable non-Shannon entropy inequalities, which delimit the five-variable entropy region. This method adds copies of some of the random variables in generations. A significant reduction in computational complexity, achieved through theoretical considerations and by harnessing the inherent symmetries, allowed us to calculate all five-variable non-Shannon inequalities provided by the first nine generations. Based on the results, we define two infinite collections of such inequalities and prove them to be entropy inequalities. We investigate downward-closed subsets of non-negative lattice points that parameterize these collections, and based on this, we develop an algorithm to enumerate all extremal inequalities. The discovered set of entropy inequalities is conjectured to characterize the applied method completely.
MSC: 05B35; 26A12; 52B12; 90C29; 94A17; 52B40; 90C27

1. Introduction

Many important mathematical problems can be reduced to the following question: does a collection of finite random variables exist such that the entropies of the variable subsets satisfy certain linear constraints? Examples include, but are not limited to, channel coding [1] and network coding in particular [2], estimating the efficiency of secret sharing schemes [3,4,5], questions about matroid representations [6], guessing games [7], extracting information from common strings in cryptography [8], additive combinatorics [9], and finding conditional independence inference rules [10].
The entropy function of finitely many discrete random variables $\langle \xi_i : i \in N \rangle$ indexed by the fixed finite set $N$ maps the non-empty subsets $I \subseteq N$ to the Shannon entropy $H(\xi_I)$ of the variable set $\xi_I = \langle \xi_i : i \in I \rangle$, see [11]. The entropy region, denoted by $\Gamma^*_N$, is the range of the entropy function; it is a part of the $(2^{|N|}-1)$-dimensional Euclidean space where the coordinates are labeled by non-empty subsets of $N$. Entropies are non-negative real numbers, and thus the entropy region lies in the non-negative orthant of this Euclidean space. It is delimited by a collection of homogeneous linear inequalities corresponding to the non-negativity of basic Shannon information measures [11]. Points satisfying all these inequalities form the Shannon bound, denoted by $\Gamma_N$.
N. Pippenger argued in [12] that linear inequalities bounding the entropic region $\Gamma^*_N$ encode the fundamental laws of Information Theory and determine the limits of information transmission and data compression. The long-standing problem of whether a linear information inequality can properly cut into the Shannon bound was settled in 1998 by Zhang and Yeung [13], who exhibited the first example of such a non-Shannon information inequality. Their discovery initiated intensive research. The phrase Copy Lemma was coined by Dougherty et al. [14] to describe the general method distilled from the original Zhang–Yeung construction. The Copy Lemma has been applied successfully to generate several hundred sporadic and a couple of infinite families of non-Shannon entropy inequalities for $\Gamma^*_4$, see [14,15,16]. A different method, utilizing an information-theoretic lemma attributed to Ahlswede and Körner [17], was proposed in [18]; later it was shown to be equivalent to a special case of the Copy Lemma [19].
Our method to obtain five-variable non-Shannon entropy inequalities is based on a more general paradigm of which the Copy Lemma is a special case [20]. Derived from the principle of maximum entropy [21], it is called MEM, short for Maximum Entropy Method. For more details, see Section 3.
Previous works on generating and applying non-Shannon entropy inequalities, such as [4,10,14,15,22,23], focused on the four-variable case, and only a few sporadic five-variable non-Shannon inequalities have been discovered, such as the MMRV inequality from [18]. This is the first work that provides a method that generates an infinite collection of non-Shannon bounds on the five-variable entropy region $\Gamma^*_5$. Compared to the four-variable case, there are significant challenges, both theoretical and computational. The four-variable entropy region $\Gamma^*_4$ sits in the 15-dimensional Euclidean space, while the five-variable region $\Gamma^*_5$ is 31-dimensional. The structure of the Shannon bound $\Gamma_4$ is well understood: it has 41 extremal directions, and only 6 of them have no entropic points. The entropy region $\Gamma^*_4$ has an inner polyhedral cone where it fills its Shannon bound, and has six isomorphic “protrusions” towards the six exceptional extremal directions, each protrusion surrounded by 15 hyperplanes of which 14 come from the Shannon bound [24]. Only the protrusions contribute to new entropy inequalities, and their dimension can be reduced to 10. Computational results about $\Gamma^*_4$ can be obtained by computing vertices and facets of numerous implicitly defined 10-dimensional polyhedra [22]. In contrast, the Shannon bound $\Gamma_5$ of the five-variable entropy region has 117,983 extremal directions [25], and for a few of them it is not even known whether they contain an entropic point. No structural reduction similar to the four-variable case is available, and it is not known whether such a reduction exists. Computations about $\Gamma^*_5$ can still be reduced to 25-dimensional polyhedral enumeration problems (although with a significantly larger number of constraints than in the four-variable case). The complexity of enumeration problems typically doubles when the dimension increases by one, making such high-dimensional enumeration problems practically intractable.
We overcome this computational difficulty by applying a particular variant of the Maximum Entropy Method. This variant, working in generations, first reduces the problem dimension from 31 to 19, and then, at each generation, adds extra copies of some of the random variables, increasing the problem dimension again. Theoretical considerations and harnessing the inherent symmetry allowed us to complete the associated polyhedral computations up to nine generations. The output was the complete list of five-variable non-Shannon inequalities provided by the first nine generations. Based on the experimental results, we define an infinite collection of five-variable inequalities that we prove are provided by this MEM variant (in particular, they are valid non-Shannon entropy inequalities), and we conjecture this collection to be complete; that is, no additional inequalities are yielded by this MEM variant. The collection of the inequalities is parametrized by finite, downward-closed subsets of the non-negative lattice points of the plane. Some of the inequalities in our collection are consequences of the others; those that are not are called extremal. We developed an incremental algorithm that enumerates, from generation to generation, the parameters yielding the extremal inequalities, in complete agreement with the computational results. The algorithm allowed us to significantly exceed the capabilities of polyhedral computation. While numerical instability prevented the completion of the polyhedral computation for the tenth generation, all extremal entropy inequalities were enumerated up to generation 60. Finally, we have examined the large-scale behavior of the extremal inequalities, and depicted how these inequalities delimit a three-dimensional cross-section of $\Gamma^*_5$.
The new five-variable non-Shannon inequalities can be applied to real-world problems. The most immediate application is in network coding. The new inequalities tighten the boundaries; they provide stricter and more accurate bounds on network capacity. In a network protocol they can assist in proving whether a targeted data rate is achievable or not [2].
Cloud storage services (like Google Drive or AWS S3) distribute data fragments across many nodes [26]. In case of failure, the node has to download the missing data from other nodes. The new five-variable inequalities can be used to determine the theoretical limits of storage efficiency for systems with more complex failure models or larger clusters.
In the realm of secret sharing, entropy inequalities provide lower bounds on the size of secrets [3,4,5]. To explore another facet of these problems, the new inequalities can prove that certain efficient schemes are impossible to realize. In complex datasets, it is important to distinguish between correlation and actual causation [27]. When an AI model analyzes data to build a causal graph, it can use entropy inequalities to rule out models that are information-theoretically impossible, narrowing the search space and improving accuracy.
In this paper lemmas, claims and theorems are arranged so that each is used in the same section only, typically right after they are stated and proved. Section 4 proves structural properties of the entropy region that are used to reduce the computational complexity of the polyhedral algorithms. The main theoretical results are stated and proved in Section 7. In Theorem 1 we prove a large collection of entropy inequalities parametrized by the downward-closed subsets of the non-negative lattice points. Lemmas estimating different entropy expressions are used in this section only. Unfortunately, many inequalities provided by Theorem 1 are consequences of the others. Claims and lemmas in Section 8 provide the theoretical foundation for our algorithm that selects and enumerates the extremal inequalities among them.
The remaining part of the paper is organized as follows. Notations are recalled in Section 2. Section 3 describes the special variant of the Maximum Entropy Method we apply to $\Gamma^*_5$. Section 4 discusses possible simplifications, including how symmetry can be utilized and how the MEM parameters were chosen. Section 5 describes the chosen coordinate systems, polyhedral computations, and their results. Section 6 presents the five-variable inequalities we obtained, paving the way for the definition of two infinite families of such inequalities in Section 7. Additional theoretical results, including the proof that inequalities in these families are indeed generated by MEM, are also presented in Section 7. Section 8 discusses methods that can recognize extremal inequalities, presents the incremental algorithm that enumerates the extremal inequalities for each MEM generation, describes the large-scale behavior of the new inequalities, and investigates the delimited part of the five-variable entropy region. Finally, Section 9 summarizes our work, lists open questions, and provides directions for further work.

2. Preliminaries

In this paper all sets are finite. Capital letters, such as $A$, $J$, $N$, etc., denote (finite) sets; elements of these sets are denoted by lower-case letters. The union sign and the curly brackets around singletons are frequently omitted; thus, $Nij$ denotes the set $N \cup \{i, j\}$. The difference of two sets is written as $A \setminus B$, or $A \setminus b$ if the second set is a singleton. The star in the union $A \cup^* B$ emphasizes that $A$ and $B$ are disjoint sets. A partition of $N$ is a collection of non-empty disjoint subsets of $N$ whose union equals $N$.
A discrete random variable $\xi$ takes its values from a finite set $\mathcal{X}$, called the alphabet. The probability that $\xi$ takes $x \in \mathcal{X}$ is denoted by $\Pr(\xi = x)$, or simply by $\Pr(x)$ when the random variable $\xi$ is clear from the context. Suppose $\xi$ is defined on the direct product $\mathcal{X} = \prod_{i \in N} \mathcal{X}_i$ for some finite set $N$, called the base set. For a non-empty $A \subseteq N$ the marginal $\xi_A$ is defined on the product alphabet $\mathcal{X}_A = \prod_{i \in A} \mathcal{X}_i$ so that the probability of $y \in \mathcal{X}_A$ is the sum of the probabilities of those $x \in \mathcal{X}$ whose projection $x_A$ to $\mathcal{X}_A$ equals $y$:
$$\Pr(y) = \sum \{ \Pr(x) : x_A = y \}.$$
To emphasize that $\xi$ is defined on a product space, we write $\xi = (\xi_i : i \in N)$, and say that the random variables $\xi_i$ are distributed jointly. The Shannon entropy of the distribution $\xi$ is defined as
$$H(\xi) = -\sum_{x \in \mathcal{X}} \Pr(x) \log \Pr(x)$$
with the convention that $0 \log 0 = 0$. If $\xi = (\xi_i : i \in N)$ is a joint distribution, then we write $H_\xi(A)$ for $H(\xi_A)$. The index $\xi$ is also dropped when it is clear from the context. By convention, $H(\emptyset) = 0$. The entropies $H_\xi(A)$ are arranged into a vector indexed by the non-empty subsets $A$ of $N$. This vector is the entropy profile of the distribution $\xi$. The collection of these $(2^{|N|}-1)$-dimensional vectors forms the entropy region, denoted by $\Gamma^*_N$. Elements of $\Gamma^*_N$ are considered interchangeably as vectors, as points in this Euclidean space, and as functions assigning non-negative real numbers to non-empty subsets of the base set $N$. For a gentle introduction to these notions of Information Theory, please consult [11].
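The definitions above can be made concrete with a small computation. The following is a minimal sketch, assuming a joint distribution given as a Python dictionary from outcome tuples to probabilities and entropies measured in bits (the base of the logarithm is not fixed in the text); the names `entropy`, `marginal`, `entropy_profile` and the XOR example are illustrative choices, not taken from the paper.

```python
import math
from itertools import combinations

def entropy(probabilities):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def marginal(dist, coords):
    """Marginal of a joint distribution on the coordinate positions `coords`.

    `dist` maps outcome tuples x to Pr(x); the result maps each projected
    tuple y to the sum of Pr(x) over all x whose projection equals y.
    """
    out = {}
    for x, p in dist.items():
        y = tuple(x[i] for i in coords)
        out[y] = out.get(y, 0.0) + p
    return out

def entropy_profile(dist, n):
    """Map every non-empty subset A of {0, ..., n-1} to H(xi_A)."""
    profile = {}
    for size in range(1, n + 1):
        for coords in combinations(range(n), size):
            profile[frozenset(coords)] = entropy(marginal(dist, coords).values())
    return profile

# Hypothetical example: two independent fair bits and their XOR.
xor_dist = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
xor_profile = entropy_profile(xor_dist, 3)
```

For three variables the profile has $2^3 - 1 = 7$ coordinates; in this example every single variable has entropy 1 bit and every pair (as well as the triple) has entropy 2 bits.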
Notions of conditional entropy, mutual information, and conditional mutual information from Information Theory are formally extended to the functional form of these vectors. If $f$ is any function on subsets of $N$, then for subsets $A, B, C, D$ of $N$ the following forms will be used as abbreviations:
$$
\begin{aligned}
f(A \mid B) &\overset{\text{def}}{=} f(A \cup B) - f(B),\\
f(A, B) &\overset{\text{def}}{=} f(A) + f(B) - f(A \cup B),\\
f(A, B \mid C) &\overset{\text{def}}{=} f(A \cup C) + f(B \cup C) - f(A \cup B \cup C) - f(C), \quad\text{and}\\
f[A, B, C, D] &\overset{\text{def}}{=} -f(A, B) + f(A, B \mid C) + f(A, B \mid D) + f(C, D).
\end{aligned}
$$
The first three expressions are called conditional entropy, mutual information, and conditional mutual information, respectively. The last line defines the Ingleton expression. An entropy function is not defined on the empty set; nevertheless, $f(\emptyset) = 0$ will be assumed whenever convenient. In particular, $f(A, B \mid \emptyset)$ and $f(A, B)$ are the same expressions. Frequently, when clear from the context, the function $f$ is omitted before the parenthesized expression. Additionally, if applied to singletons, the Ingleton expression is written without commas. An example is the inequality
$$[abcd] + (a, b \mid z) + (b, z \mid a) + (a, z \mid b) + 3\,(z \mid ab) \ge 0. \tag{3}$$
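To illustrate how these abbreviations unfold, the sketch below evaluates them for a set function given as a Python callable on frozensets. The helper names (`cond`, `mutual`, `ingleton`) and the test function, the rank function of the uniform matroid $U_{2,5}$, are assumptions made for this example only.

```python
def cond(f, A, B):
    """Conditional entropy form f(A | B) = f(A u B) - f(B)."""
    A, B = frozenset(A), frozenset(B)
    return f(A | B) - f(B)

def mutual(f, A, B, C=""):
    """f(A, B | C); with C empty this is the mutual information f(A, B)."""
    A, B, C = frozenset(A), frozenset(B), frozenset(C)
    return f(A | C) + f(B | C) - f(A | B | C) - f(C)

def ingleton(f, A, B, C, D):
    """Ingleton expression f[A, B, C, D]."""
    return (-mutual(f, A, B) + mutual(f, A, B, C)
            + mutual(f, A, B, D) + mutual(f, C, D))

def u25(A):
    """Illustrative set function: rank of the uniform matroid U_{2,5}; f(empty) = 0."""
    return min(len(A), 2)

# Left-hand side of the example inequality, evaluated on u25.
ingleton_value = ingleton(u25, "a", "b", "c", "d")
lhs = (ingleton_value
       + mutual(u25, "a", "b", "z")
       + mutual(u25, "b", "z", "a")
       + mutual(u25, "a", "z", "b")
       + 3 * cond(u25, "z", "ab"))
```

For this particular function the Ingleton expression evaluates to 2 and the whole left-hand side to 5, so the inequality holds with room to spare.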
Shannon inequalities state the non-negativity of the conditional entropy, mutual information, and conditional mutual information for all subsets A, B, C of the base set N. They are consequences of the unique minimal set of such inequalities, called basic Shannon inequalities, see [11], listed in (B1) and (B2) below:
(B1) $f(i \mid N \setminus i) \ge 0$ for all $i \in N$;
(B2) $f(a, b \mid K) \ge 0$ for all $K \subseteq N$ and different $a, b \in N \setminus K$, including $K = \emptyset$.
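The list (B1) and (B2) is small enough to enumerate mechanically. The sketch below generates the basic Shannon inequalities as coefficient dictionaries and uses them to test whether a set function (a dictionary keyed by non-empty frozensets) is a polymatroid. All names are hypothetical, and the tolerance parameter is an added convenience for floating-point inputs.

```python
from itertools import combinations

def powerset(items):
    """All subsets of `items` (the empty set included) as frozensets."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def basic_shannon(ground):
    """Yield each basic Shannon inequality as {subset: coefficient},
    meaning sum(coefficient * f(subset)) >= 0."""
    N = frozenset(ground)
    for i in sorted(N):                       # (B1): f(N) - f(N \ i) >= 0
        yield {N: 1, N - {i}: -1}
    for a, b in combinations(sorted(N), 2):   # (B2): f(a, b | K) >= 0
        for K in powerset(N - {a, b}):
            yield {K | {a}: 1, K | {b}: 1, K | {a, b}: -1, K: -1}

def value(f, A):
    """f(A) with the convention f(empty set) = 0."""
    return f[A] if A else 0

def is_polymatroid(f, ground, tol=1e-9):
    """True if f satisfies every basic Shannon inequality on `ground`."""
    return all(sum(c * value(f, A) for A, c in ineq.items()) >= -tol
               for ineq in basic_shannon(ground))

# Illustrative inputs: a matroid rank function and a non-submodular function.
uniform = {A: min(len(A), 2) for A in powerset("abcd") if A}
squares = {A: len(A) ** 2 for A in powerset("abcd") if A}
```

The number of inequalities is $|N| + \binom{|N|}{2} 2^{|N|-2}$, that is, 28 for four and 85 for five variables.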
The collection of all $(2^{|N|}-1)$-dimensional vectors (or points, or functions) that satisfy the Shannon inequalities is denoted by $\Gamma_N$. It is a natural outer bound for the entropy region $\Gamma^*_N$. $\Gamma_N$ is a pointed polyhedral cone [28]; its facets are the hyperplanes specified by the basic Shannon inequalities. Polymatroids are elements of $\Gamma_N$ written in functional form. A polymatroid is usually written as $(f, N)$, or just $f$, when we say that $f$ is on $N$. The polymatroid $f$ is entropic if it is in $\Gamma^*_N$, and almost entropic, or aent for short, if it is in the closure (in the usual Euclidean topology) of $\Gamma^*_N$. Linear inequalities valid for all polymatroids are consequences of the basic Shannon inequalities; an example is the inequality (3). A non-Shannon inequality is a homogeneous linear inequality that is valid for points of the entropic region but not for all points of the Shannon bound. Equivalently, the non-negative side of the hyperplane corresponding to such an inequality contains the complete entropy region, while it cuts properly into $\Gamma_N$.
The closure of the entropic region is a pointed convex full-dimensional cone [11], and only its boundary points can be non-entropic [29].
The polymatroid $(f, N)$ on the base set $N$ is linearly representable over the field $\mathbb{F}$, or $\mathbb{F}$-representable in short, if there is a finite-dimensional vector space $V$ over $\mathbb{F}$, and linear subspaces $V_i \subseteq V$ for $i \in N$, such that for all $I \subseteq N$, $f(I)$ is the dimension of the linear subspace spanned by $\bigcup_{i \in I} V_i$. Clearly, if both $(f, N)$ and $(g, N)$ are representable over the same field $\mathbb{F}$, then so is their sum $f + g$. The polymatroid $f$ is $\mathbb{F}$-linear if it is in the closure of the multiples of $\mathbb{F}$-representable polymatroids. By the previous remark, $\mathbb{F}$-linear polymatroids form a closed cone. Finally, $f$ is linear if it is $\mathbb{F}$-linear for some field $\mathbb{F}$.
Following a compactness argument, if $f$ is $\mathbb{F}$-representable, then it is representable over some finite field as well, see [30], meaning that the vector space $V$ is also finite. Taking the uniform distribution on $V$ provides the entropic polymatroid $(\log |V|)\, f$. Thus, linear polymatroids are also almost entropic.
Linear polymatroids on the base set $N$ with $|N| \le 5$ are $\mathbb{F}$-linear for every field $\mathbb{F}$, see [24,31]; this statement is not true in general. For $|N| \le 3$ every polymatroid is linear. For $N = \{a, b, c, d\}$ a polymatroid $f$ on $N$ is linear if and only if it satisfies the following six instances of the Ingleton inequality:
$$f[abcd] \ge 0, \quad f[acbd] \ge 0, \quad f[adbc] \ge 0, \quad f[bcad] \ge 0, \quad f[bdac] \ge 0, \quad f[cdab] \ge 0,$$
see [24]. Since the Ingleton expression is symmetric in the first two and in the last two arguments, these expressions cover all 24 permutations of N.
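As a quick illustration of this linearity test, the sketch below evaluates the six Ingleton instances on a small set function. The function `rank` is a hypothetical example chosen for this sketch (singletons have value 2, pairs 3 except $\{c,d\}$ with value 4, larger sets 4); one can check by hand that it satisfies the basic Shannon inequalities, so it is a polymatroid, yet it fails one Ingleton instance.

```python
def rank(A):
    """Hypothetical example polymatroid on {a, b, c, d} (not from the paper)."""
    A = frozenset(A)
    if A == frozenset("cd"):
        return 4
    return (0, 2, 3, 4, 4)[len(A)]

def mutual(f, A, B, C=""):
    """f(A, B | C); with C empty this is the mutual information f(A, B)."""
    A, B, C = frozenset(A), frozenset(B), frozenset(C)
    return f(A | C) + f(B | C) - f(A | B | C) - f(C)

def ingleton(f, a, b, c, d):
    """Ingleton expression f[a b c d]."""
    return (-mutual(f, a, b) + mutual(f, a, b, c)
            + mutual(f, a, b, d) + mutual(f, c, d))

# The six instances that cover all 24 permutations of {a, b, c, d}.
SIX_INSTANCES = ["abcd", "acbd", "adbc", "bcad", "bdac", "cdab"]

values = {word: ingleton(rank, *word) for word in SIX_INSTANCES}
violated = [word for word, v in values.items() if v < 0]
```

Here only the instance $[abcd]$ is negative (its value is $-1$), so by the criterion above this polymatroid is not linear.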
Finally, we recall notions of independence. Let $(f, N)$ be a polymatroid, and $X$, $Y_1$, …, $Y_k$ be disjoint subsets of $N$. $Y_1$ and $Y_2$ are independent in $f$ if $f(Y_1, Y_2) = 0$. The collection $Y_1, \dots, Y_k$ is completely independent in $f$ if for any two disjoint subsets $I$ and $J$ of the indices $\{1, 2, \dots, k\}$, $Y_I = \bigcup_{i \in I} Y_i$ and $Y_J$ are independent, or, equivalently, if
$$f(Y_1 \cup \dots \cup Y_k) = f(Y_1) + \dots + f(Y_k).$$
In this case we also have $f(Y_I) = \sum_{i \in I} f(Y_i)$ for every subset $I$ of the indices. The disjoint subsets $Y_1$ and $Y_2$ are conditionally independent over $X$ if $f(Y_1, Y_2 \mid X) = 0$; and $Y_1, \dots, Y_k$ are completely conditionally independent over $X$ if $Y_I$ and $Y_J$ are conditionally independent over $X$ for arbitrary disjoint subsets $I$ and $J$ of the indices. An equivalent condition is
$$f(Y_1 \cup \dots \cup Y_k \mid X) = f(Y_1 \mid X) + \dots + f(Y_k \mid X),$$
which similarly implies $f(Y_I \mid X) = \sum_{i \in I} f(Y_i \mid X)$ for every index set $I$.

3. The Maximum Entropy Method

In general terms, the principle of maximum entropy is easy to formulate: if a probability distribution is specified only partially, take the one with the largest entropy, see, e.g., [21]. In the particular case applied here “partial specification” means fixing some, but not all, marginal distributions. To be more concrete, suppose $\xi$ is distributed jointly on the base set $N$. Partition $N$ into three non-empty subsets as $N = Y \cup^* X \cup^* Z$. Take $n \ge 1$ disjoint copies of $Y$ and $m \ge 1$ disjoint copies of $Z$ to form the enlarged base set
$$N^* = Y_1 \cup^* \dots \cup^* Y_n \cup^* X \cup^* Z_1 \cup^* \dots \cup^* Z_m.$$
Consider the collection of those distributions $\xi^*$ on $N^*$ whose marginals on $Y_i X$ are equal to $\xi_{YX}$, and whose marginals on $X Z_j$ are equal to $\xi_{XZ}$. That is, the marginal of $\xi$ on $YX$ and the marginals of $\xi^*$ on all $Y_i X$ are the same, as are the marginal of $\xi$ on $XZ$ and the marginals of $\xi^*$ on all $X Z_j$. This collection of distributions is not empty, as one can take each $Y_i$ to be the same as $Y$, and each $Z_j$ to be the same as $Z$. The total entropy is a strictly concave function of the probability masses, and fixing certain marginals imposes linear constraints on those masses. Consequently, there is a unique optimal distribution $\xi^*$ with maximum total entropy, see [32]. Although structural properties of the maximum entropy distributions are mainly unknown, they are known to satisfy numerous conditional independencies. For this particular case, these are stated as Lemma 1 below.
Lemma 1.
In the distribution with maximum total entropy, the subsets $Y_1, \dots, Y_n$ and $Z_1, \dots, Z_m$ are completely conditionally independent over $X$.
Proof. 
If some of the conditional independence statements do not hold, then one can redefine the distribution keeping the specified marginals while increasing the total entropy. For details, see [20]. □
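The simplest instance of Lemma 1, with a single copy of each part, can be checked numerically. The sketch below assumes a joint distribution on triples $(y, x, z)$ stored as a dictionary; it builds the distribution $p(y,x)\,p(x,z)/p(x)$, which is the standard maximum entropy distribution with the given $YX$ and $XZ$ marginals, and confirms that $Y$ and $Z$ become conditionally independent over $X$. Function names and the example distribution are illustrative assumptions.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(dist, coords):
    """Marginal distribution on the coordinate positions `coords`."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

def max_entropy_glue(dist):
    """Maximum entropy distribution on (y, x, z) with the same (y, x) and
    (x, z) marginals as `dist`: p(y, x) * p(x, z) / p(x)."""
    p_yx = marginal(dist, (0, 1))
    p_xz = marginal(dist, (1, 2))
    p_x = marginal(dist, (1,))
    glued = {}
    for (y, x), a in p_yx.items():
        for (x2, z), b in p_xz.items():
            if x2 == x:
                glued[(y, x, z)] = a * b / p_x[(x,)]
    return glued

def cond_mutual_info(dist):
    """I(Y; Z | X) = H(YX) + H(XZ) - H(YXZ) - H(X)."""
    return (entropy(marginal(dist, (0, 1))) + entropy(marginal(dist, (1, 2)))
            - entropy(dist) - entropy(marginal(dist, (1,))))

# Hypothetical example: y and z are the same fair bit, x is constant.
example = {(0, 0, 0): 0.5, (1, 0, 1): 0.5}
glued = max_entropy_glue(example)
```

In this example the total entropy grows from 1 bit to 2 bits while both prescribed marginals stay unchanged, and the conditional mutual information drops from 1 to 0.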
Since identical distributions have identical entropy profiles, Lemma 1 immediately implies that an entropic polymatroid has an $n,m$-copy as defined below:
Definition 1.
Let $f$ be a polymatroid on $N$, and partition $N$ into three non-empty subsets as $N = Y \cup^* X \cup^* Z$. Let $Y_1, \dots, Y_n$ and $Z_1, \dots, Z_m$ be disjoint copies of $Y$ and $Z$, respectively. The polymatroid $f^*$ on the base set $N^* = Y_1 \cup^* \dots \cup^* Y_n \cup^* X \cup^* Z_1 \cup^* \dots \cup^* Z_m$ is an $n,m$-copy of $f$ if
(i) $f^*$ restricted to $Y_i X$ is isomorphic to $f{\restriction}YX$ for every $i \le n$,
(ii) $f^*$ restricted to $X Z_j$ is isomorphic to $f{\restriction}XZ$ for every $j \le m$,
(iii) the $n + m$ subsets $Y_1, \dots, Y_n, Z_1, \dots, Z_m$ are completely conditionally independent over $X$ in $f^*$.
The special version of the Maximum Entropy Method used in this paper is based on the fact that entropic polymatroids have $n,m$-copies. For fixed integers $n$ and $m$, polymatroids on $YXZ$ that have an $n,m$-copy form a polyhedral cone $C_{n,m}$. This is proved as Claim 1 below. The cone $C_{n,m}$ contains the complete entropy region $\Gamma^*_{YXZ}$, and is contained in the Shannon cone $\Gamma_{YXZ}$. Consequently, bounding facets of the cone $C_{n,m}$ that are not facets of the Shannon cone provide new entropy inequalities. This method is summarized as follows.
Maximum Entropy Method (special case). Fix the base set $N$ and the partition $N = Y \cup^* X \cup^* Z$. For $n, m \ge 1$ let $C_{n,m}$ be the polyhedral cone of those polymatroids on $N$ that have an $n,m$-copy. Compute all bounding facets of $C_{n,m}$ as homogeneous linear inequalities, and delete those which are consequences of the basic Shannon inequalities. The remaining inequalities form the maximal set of non-Shannon inequalities provided by the partition $Y \cup^* X \cup^* Z$ and the numbers $n$ and $m$.
Let us remark that while the maximum entropy extension is unique, the $n,m$-copy in Definition 1 is typically not, as the definition captures only a small part of the properties of the maximum entropy extension. The obtained entropy inequalities form the facets of a convex polyhedral cone; consequently, they are independent in the sense that none of them is a consequence of the others or the Shannon inequalities.
Next we prove that $C_{n,m}$ is indeed a polyhedral cone.
Claim 1.
Polymatroids $(f, N)$ with an $n,m$-copy form a polyhedral cone.
Proof. 
Consider the polymatroid $f$ as a $(2^{|N|}-1)$-dimensional vector indexed by the non-empty subsets of $N$. Write this vector as $(\mathbf{x}, \mathbf{u})$ where $\mathbf{x}$ of dimension $d_1$ contains those coordinates where the index $I$ is a subset of either $YX$ or $XZ$, and $\mathbf{u}$ of dimension $d_2$ contains the rest, namely those subsets that intersect both $Y$ and $Z$. Clearly, $d_1 + d_2 = 2^{|N|} - 1$. Similarly, let $\mathbf{y}$ be the vector formed from the values of the $n,m$-copy polymatroid $f^*$ as indexed by the subsets of $N^*$. The vector $\mathbf{y}$ has dimension $d_3 = 2^{|N^*|} - 1$. Now, $(f^*, N^*)$ is a polymatroid if the vector $\mathbf{y}$ satisfies all linear inequality constraints imposed by the basic Shannon inequalities in (B1) and (B2); and it is an $n,m$-copy of $f$ if, additionally, the composed vector $(\mathbf{x}, \mathbf{y})$ satisfies the equality constraints corresponding to conditions (i)–(iii) in Definition 1. Consequently, there exists a matrix $M$ with $d_1 + d_3$ columns, depending only on the partition $Y \cup^* X \cup^* Z$ and the numbers $n$ and $m$, so that $f$ has an $n,m$-copy if and only if there is a vector $\mathbf{y}$ satisfying $M \cdot (\mathbf{x}, \mathbf{y})^{T} \ge 0$. Similarly, $(f, N)$ is a polymatroid if, for another matrix $B$ with $d_1 + d_2$ columns expressing the basic Shannon inequalities for $YXZ$, we have $B \cdot (\mathbf{x}, \mathbf{u})^{T} \ge 0$. Thus the collection of polymatroids on $N$ that have an $n,m$-copy is the set
$$Q = \{ (\mathbf{x}, \mathbf{u}) \in \mathbb{R}^{d_1 + d_2} : B \cdot (\mathbf{x}, \mathbf{u})^{T} \ge 0, \text{ and } M \cdot (\mathbf{x}, \mathbf{y})^{T} \ge 0 \text{ for some } \mathbf{y} \in \mathbb{R}^{d_3} \}.$$
Here $M$ and $B$ are matrices with integer entries; these matrices depend only on $Y \cup^* X \cup^* Z$, $n$, and $m$. Since $Q$ is the intersection of a polyhedral cone and the projection of a polyhedral cone, it is also a polyhedral cone, as claimed. □
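To get a feeling for the sizes involved in this proof, the following sketch computes the three dimensions from the cardinalities of the parts. The closed forms follow directly from counting subsets; the function name and the sample partition sizes ($|Y| = 1$, $|X| = 2$, $|Z| = 2$) are illustrative assumptions, not parameters confirmed by the text.

```python
def copy_dimensions(y, x, z, n, m):
    """Return (d1, d2, d3) for |Y| = y, |X| = x, |Z| = z and an n,m-copy.

    d1: non-empty subsets of N contained in YX or in XZ
    d2: subsets of N that intersect both Y and Z
    d3: non-empty subsets of the enlarged base set N*
    """
    d1 = 2 ** (y + x) + 2 ** (x + z) - 2 ** x - 1
    d2 = (2 ** (y + x + z) - 1) - d1
    d3 = 2 ** (n * y + x + m * z) - 1
    return d1, d2, d3

# Illustrative five-variable split with one copy of each part.
d1, d2, d3 = copy_dimensions(1, 2, 2, 1, 1)

# Growth of the auxiliary dimension d3 when copies of Y are added.
d3_by_generation = [copy_dimensions(1, 2, 2, n, 1)[2] for n in range(1, 10)]
```

The auxiliary dimension doubles with every added single-element copy, which is why the unreduced projection problem becomes infeasible quickly.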
From the proof it is clear that the $\mathbf{u}$-part of $Q$ is constrained only by the basic Shannon inequalities encoded in the matrix $B$. Furthermore, constraints on $\mathbf{x}$ imposed by the first condition are contained in the second one. Thus, it suffices to consider the bounding facets of
$$Q^* = \{ \mathbf{x} \in \mathbb{R}^{d_1} : M \cdot (\mathbf{x}, \mathbf{y})^{T} \ge 0 \text{ for some } \mathbf{y} \in \mathbb{R}^{d_3} \} \tag{8}$$
for new entropy inequalities. This is because, due to the duality theorem of linear programming [28], facets of $Q$ are convex linear combinations of facets of $Q^*$ and facets corresponding to the basic Shannon inequalities for the base set $YXZ$.
Coordinates in $\mathbf{x}$ are indexed by subsets of $YX$ and $XZ$, so the inequalities provided by the bounding facets of $Q^*$ contain only elements of the restrictions $f{\restriction}YX$ and $f{\restriction}XZ$. We emphasize that these restrictions are not arbitrary polymatroids on $YX$ and $XZ$ with a common restriction on $X$, as they also have a common extension, namely $f$. Conditions ensuring the existence of such a common extension are assumed to hold, see [33], and they do not contribute towards the non-Shannon entropy inequalities we are searching for.

4. What to Compute? How to Compute?

As discussed in Section 3, the task of finding new non-Shannon entropy inequalities implied by the existence of an $n,m$-copy reduces to enumerating all facets of the polyhedral cone $Q^*$ defined in (8). However, without further reduction, this polyhedral computation is intractable even for small parameter values. Therefore, in this section we look at some general methods to reduce the complexity of the computation, and then discuss how the number of elements in the $Y \cup^* X \cup^* Z$ partition was chosen.

4.1. Tight and Modular Parts

Both the polyhedral region $\Gamma_N$ and the closure of the entropy region $\overline{\Gamma^*_N}$ decompose naturally into direct sums of modular and tight parts, see [23]. To discuss this result, let us first introduce some notation. For $i \in N$ define the function $r_i$ on the non-empty subsets $A$ of $N$ as
$$r_i : A \mapsto \begin{cases} 1 & \text{if } i \in A, \\ 0 & \text{otherwise.} \end{cases} \tag{9}$$
Non-negative multiples of $r_i$ are clearly entropic polymatroids; modular polymatroids are, by definition, the conic combinations of the vectors $\{ r_i : i \in N \}$. For a polymatroid $(f, N)$, a singleton $i \in N$ and a real number $\alpha \ge 0$, the function $f{\downarrow}^{\alpha}_{i}$ is defined on the non-empty subsets of $N$ as follows:
$$f{\downarrow}^{\alpha}_{i} : A \mapsto \min \{ f(A \cup i) - \alpha,\ f(A) \}.$$
When $\alpha$ is set to $f(i \mid N \setminus i)$, $f{\downarrow}^{\alpha}_{i}$ is denoted simply by $f{\downarrow}_{i}$. Note that for $i \notin A$ we have $f(A \cup i) - f(A) \ge f(N) - f(N \setminus i) = f(i \mid N \setminus i)$ by submodularity. Consequently, $f{\downarrow}_{i}$ can be written explicitly as
$$f{\downarrow}_{i}(A) = \begin{cases} f(A) - f(i \mid N \setminus i) & \text{if } i \in A, \\ f(A) & \text{if } i \notin A. \end{cases}$$
Therefore, $f = f{\downarrow}_{i} + f(i \mid N \setminus i)\, r_i$, where $r_i$ is the polymatroid defined in (9). The result of tightening $f$ at $i$ is the function $f{\downarrow}_{i}$. The tight part of $f$, denoted by $f{\downarrow}$, is the result of tightening $f$ at every element of its base set $N = \{ i_1, \dots, i_n \}$:
$$f{\downarrow} = (\cdots((f{\downarrow}_{i_1}){\downarrow}_{i_2})\cdots){\downarrow}_{i_n}.$$
This result is independent of the order in which the reductions are applied, which is also shown by the decomposition formula
$$f = f{\downarrow} + \sum_{i \in N} f(i \mid N \setminus i)\, r_i.$$
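The decomposition into a tight and a modular part is easy to carry out explicitly. The sketch below is a minimal illustration, assuming a set function stored as a dictionary keyed by non-empty frozensets; it reads the modular coefficients off the top of the lattice and subtracts them, following the decomposition formula. The names `decompose`, `base`, and `shift` are hypothetical.

```python
from itertools import combinations

def nonempty_subsets(ground):
    """All non-empty subsets of `ground` as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def decompose(f, ground):
    """Split f into (tight part, modular coefficients).

    The coefficient of element i is f(N) - f(N \\ i); the tight part is
    f(A) minus the sum of the coefficients of the elements of A.
    """
    N = frozenset(ground)
    value = lambda A: f[A] if A else 0
    modular = {i: value(N) - value(N - {i}) for i in N}
    tight = {A: f[A] - sum(modular[i] for i in A) for A in f}
    return tight, modular

# Illustrative input: a tight polymatroid plus a known modular shift.
GROUND = "abc"
base = {A: min(len(A), 2) for A in nonempty_subsets(GROUND)}
shift = {"a": 2, "b": 0, "c": 1}
f = {A: base[A] + sum(shift[i] for i in A) for A in base}

tight, modular = decompose(f, GROUND)
```

In this example the procedure recovers exactly the tight polymatroid `base` and the modular shift that was added to it.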
The proof of the following lemma can be found in [20] or [34]. In this paper only the first part of the lemma is needed, which can be verified by direct computation.
Lemma 2.
Let $0 \le \alpha \le f(i)$. If $f$ is a polymatroid, then $f{\downarrow}^{\alpha}_{i}$ is also a polymatroid. If, in addition, $f$ is almost entropic, then so is $f{\downarrow}^{\alpha}_{i}$. □
Accordingly, $f{\downarrow}$ (the tight part of $f$) is a polymatroid, and it is also almost entropic (aent) whenever $f$ is aent. The difference $f - f{\downarrow}$ is the modular part, and it is a modular polymatroid. This decomposition of $f$ into a tight and a modular part is unique, and both parts are aent if $f$ is aent.
The cone formed by the modular polymatroids over $N$ is $|N|$-dimensional, and is generated by the linearly independent vectors $\{ r_i : i \in N \}$. The cone of tight polymatroids is orthogonal to this (modular) cone, and so to every vector $r_i$, and is bounded by the hyperplanes corresponding to the basic Shannon inequalities in (B2). The cone of tight, almost entropic polymatroids is similarly orthogonal to the modular cone. A consequence of this decomposition is that linear bounds on the entropic cone also decompose into bounds on the tight part and bounds on the modular part, the latter being trivial, that is, a Shannon inequality. The normal $\mathbf{n}$ of a supporting hyperplane of the tight part is necessarily orthogonal to all vectors $r_i$, that is, the scalar products $\mathbf{n} \cdot r_i$ are zero. Consequently, if the normal has the coordinates $\mathbf{n} = \langle t_I : I \subseteq N \rangle$, then the sum $\sum \{ t_I : i \in I \}$ is zero for every $i \in N$. For this reason, these hyperplanes are called balanced. The tight component of any entropy inequality is balanced, and it is also an entropy inequality. This fact is equivalent to saying that every entropy inequality can be strengthened to become a balanced one, see [35].
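The balance condition can be checked mechanically once an inequality is expanded into coefficients on subsets. The sketch below expands conditional mutual information terms, combines them into the Ingleton expression, and tests whether the coefficients on the sets containing each element sum to zero. The helper names are assumptions made for illustration.

```python
from collections import defaultdict

def cmi_coeffs(A, B, C=""):
    """Coefficients of f(A, B | C) as {subset: coefficient}; f(empty) = 0."""
    A, B, C = frozenset(A), frozenset(B), frozenset(C)
    t = defaultdict(int)
    t[A | C] += 1
    t[B | C] += 1
    t[A | B | C] -= 1
    if C:
        t[C] -= 1
    return t

def combine(*terms):
    """Linear combination of coefficient dicts given as (scalar, dict) pairs."""
    total = defaultdict(int)
    for scale, t in terms:
        for subset, c in t.items():
            total[subset] += scale * c
    return {subset: c for subset, c in total.items() if c != 0}

def is_balanced(t, ground):
    """True if, for every element i, the coefficients on sets containing i sum to 0."""
    return all(sum(c for subset, c in t.items() if i in subset) == 0
               for i in ground)

# Ingleton expression [abcd] expanded into subset coefficients.
ingleton_t = combine((-1, cmi_coeffs("a", "b")),
                     (1, cmi_coeffs("a", "b", "c")),
                     (1, cmi_coeffs("a", "b", "d")),
                     (1, cmi_coeffs("c", "d")))

# Conditional entropy f(a | b) = f(ab) - f(b) for comparison.
cond_entropy_t = {frozenset("ab"): 1, frozenset("b"): -1}
```

The Ingleton expression is balanced, as is every conditional mutual information term, whereas the conditional entropy $f(a \mid b)$ is not: its coefficients on the sets containing $a$ sum to 1.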
From the above it follows that the facets of the cone $Q^*$ belong to two disjoint groups. There are $|N|$ (trivial, Shannon) facets that bound the modular part of $Q^*$, and the rest bound the tight part. The normal vectors of the facets in the second group are balanced, and only they can provide non-Shannon inequalities. Therefore, it suffices to consider only the tight part of $Q^*$. This part is generated by a smaller collection of polymatroids, has fewer dimensions, and so can be handled more efficiently.
Claim 2.
The tight part of $Q^*$ is generated by the $n,m$-copies of the polymatroids $f$ on $N = Y \cup^* X \cup^* Z$ that are (i) tight; (ii) satisfy $f(Y, Z \mid X) = 0$; and (iii) for all $y \in Y$, $f(y \mid YX \setminus y) = 0$, and for all $z \in Z$, $f(z \mid XZ \setminus z) = 0$.
Observe that the tightness of f at the elements of Y and Z follows from condition (iii) and submodularity; thus, (i) is relevant only for elements of X.
Proof. 
Let $f^*$ be an $n,m$-copy of $f$. In the definition of $Q^*$ only the values of $f{\restriction}YX$ and the values of $f{\restriction}XZ$ are used. Therefore, $f$ can be replaced with any other polymatroid that has the same restrictions. Such a polymatroid is $f^*{\restriction}Y_1 X Z_1$ by part (i) of Definition 1, which gives (ii). For (iii) let $y \in Y$, and $\alpha = f(y \mid YX \setminus y)$. Apply Lemma 2 to $f^*$ and all instances of $y$ in the copies $Y_i$ to get the new polymatroid $g^*$. Denoting the instance of $y$ in $Y_1$ by $y_1$, the lemma provides $g^*(y_1 \mid Y_1 X \setminus y_1) = 0$. In addition, $g^*$ is an $n,m$-copy of its restriction to $Y_1 X Z_1$. Since this restriction and the polymatroid $f$ differ only by a modular shift on subsets of $YX$ and $XZ$, their tight parts are the same. A similar reduction on elements of $Z$, and finally on elements of $X$, provides the statement. □
Using Claim 2, the number of columns in the constraint matrix $M$ in (8) can be significantly reduced. This is because, by the tightness of $f$, $f(A \cup i) = f(A)$ holds for many subsets $A$ of $N$ with few elements, and this equality implies $f(B \cup i) = f(B)$ for every $A \subseteq B \subseteq N$.

4.2. Symmetry

The inherent symmetry in the n , m -copy allows for another significant complexity reduction. Let π be one of the ( n ! m ! ) permutations of the base set N that permutes the subsets Y i and the subsets Z j independently. This permutation naturally extends to the subsets of N , and then to the polymatroids on N . The  n , m -copy f of f is symmetric if it is invariant for each such permutation π , that is, f ( A ) = ( π f ) ( A ) = f ( π A ) for all A N .
Claim 3.
f has an n , m -copy if and only if it has a symmetric n , m -copy.
Proof. 
If $f^*$ is an $n,m$-copy of $f$, then clearly so is $\pi f^*$. Since conditions (ii) and (iii) in Definition 1 are linear, they are also satisfied by the average of all such permutations of $f^*$, that is, by the polymatroid $g^* = (n!\,m!)^{-1} \sum_{\pi} \pi f^*$. Clearly, $g^*$ is a symmetric $n,m$-copy of $f$. □
Symmetry alone reduces the number of auxiliary variables in the definition of Q from exponential in n and m to polynomial in these parameters.

4.3. No New Inequality

In some cases, the computations required by the Maximum Entropy Method, as defined in Section 3, can be simplified further, or even completely avoided. The first claim of this subsection states that certain polymatroids do not contribute to new entropy inequalities.
Claim 4.
Suppose f is a polymatroid on N = Y X Z , and f restricted to X is modular. Then f has an n , m -copy for every n and m.
Proof. 
The statement follows from the following lemma by induction. □
Lemma 3.
Suppose that the polymatroids ( f 1 , Y X ) and ( f 2 , X Z ) have a common restriction on X which is modular. Then there is a polymatroid ( g , Y X Z ) that extends both f 1 and f 2 such that g ( Y , Z | X ) = 0 .
Proof. 
For $I \subseteq Y$, $J \subseteq X$ and $K \subseteq Z$ define
$$g(IJK) \overset{\text{def}}{=} \min_{L}\,\{\, f_1(IL) + f_2(LK) - f_1(L) : J \subseteq L \subseteq X \,\}.$$
Using the fact that $f_1{\restriction}X$ and $f_2{\restriction}X$ are isomorphic and modular, a simple calculation shows that $g$ is a polymatroid and satisfies the requirements. For details, consult [20,33] or [29]. □
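The gluing formula of Lemma 3 can be tried out by brute force on a toy instance. The sketch below is only an illustration: the functions `f1` and `f2` are hypothetical example data (not taken from the paper), the helper names are invented here, and the sets are chosen with $|X| = 1$ so that the common restriction on $X$ is automatically modular.

```python
from itertools import combinations

def subsets(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def glue(f1, f2, X, Y, Z):
    """g(I u J u K) = min over J <= L <= X of f1(I u L) + f2(L u K) - f1(L)."""
    g = {}
    for I in subsets(Y):
        for J in subsets(X):
            for K in subsets(Z):
                g[I | J | K] = min(f1[I | L] + f2[L | K] - f1[L]
                                   for L in subsets(X) if J <= L)
    return g

def is_polymatroid(g, ground):
    """Brute-force check of g(empty) = 0, monotonicity and submodularity."""
    S = subsets(ground)
    return (g[frozenset()] == 0
            and all(g[A] <= g[B] for A in S for B in S if A <= B)
            and all(g[A] + g[B] >= g[A | B] + g[A & B] for A in S for B in S))

fs = frozenset
X, Y, Z = fs("x"), fs("y"), fs("z")
# Hypothetical toy data: f1 lives on YX, f2 on XZ, and they agree on X.
f1 = {fs(): 0, fs("x"): 2, fs("y"): 2, fs("xy"): 3}
f2 = {fs(): 0, fs("x"): 2, fs("z"): 2, fs("xz"): 4}

g = glue(f1, f2, X, Y, Z)
# Conditional mutual information g(Y, Z | X); the lemma asserts it is zero.
independence = g[fs("xy")] + g[fs("xz")] - g[fs("xyz")] - g[fs("x")]
```

On this instance `g` extends both `f1` and `f2`, passes the polymatroid check, and `independence` evaluates to zero, in line with the statement of the lemma.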
If either Y or Z has a single element, then one does not need to look beyond n , 1 -copies.
Claim 5.
Suppose | Z | = 1 . Entropy inequalities generated by n , m -copies of polymatroids on Y X Z are also generated by n , 1 -copies.
Proof. 
We claim that the cone generated by the tight part of $n,m$-copies is the same as the cone generated by the $n,1$-copies. To prove this, let $f$ be a polymatroid on $YXZ$ that satisfies the conditions of Claim 2, and let $f^*$ be an $n,1$-copy of $f$ so that $f$ is identified with $f^*{\restriction}Y_1XZ_1$, where $Z_1$ has a single element $z$. Let $g^*$ be the polymatroid obtained when $m-1$ identical copies of $z$ are added to $f^*$. We claim that $g^*$ is an $n,m$-copy. The only condition that is not trivially satisfied is that the copies of $z$, say $z_1$ and $z_2$, are independent over $X$. Since $f(z\,|\,X) = 0$ by (iii) of Claim 2, we have $g^*(Xz_1) = g^*(Xz_2) = g^*(Xz_1z_2) = g^*(X)$, thus $g^*(z_1,z_2\,|\,X) = 0$. Since $g^*{\restriction}Y_1XZ_1$ and $f^*{\restriction}Y_1XZ_1$ are the same polymatroids, the $n,m$-cone is part of the $n,1$-cone, as claimed. □

4.4. Problem Parameters

By Claim 4, the Maximum Entropy Method does not yield new inequalities when $f{\restriction}X$ is modular. This is certainly the case when $|X| = 1$, so we must have $|X| \ge 2$. By Claim 5, if $|Y| = |Z| = 1$, then beyond the $1,1$-copy, no additional inequalities are generated. The smallest parameter setting in which new entropy inequalities are expected as the number of copies grows is $|Y| = 2$, $|X| = 2$, and $|Z| = 1$. We fix these sizes, as well as the labels of the members of each set, as
$$X = \{a,b\}, \quad Y = \{c,d\}, \quad\text{and}\quad Z = \{z\}.$$
Since | Z | = 1 , according to Claim 5, it suffices to consider n , 1 -copies only. To simplify the notation, the extra 1 will be dropped and we write n-copy instead. We also explicitly state the definition of the n-copy for this particular partition.
Definition 2.
Let $f$ be a polymatroid on $N = \{abcdz\}$, and let $n \ge 1$. The polymatroid $f^*$ on the base set $N^* = abz \cup \{c_id_i : 1 \le i \le n\}$ is an n-copy of $f$, if
(i) 
$f^*{\restriction}abz$ is isomorphic to $f{\restriction}abz$, and, for each $i \le n$, with the $c_i \leftrightarrow c$, $d_i \leftrightarrow d$ correspondences, $f^*{\restriction}abc_id_i$ is isomorphic to $f{\restriction}abcd$;
(ii) 
$\{c_id_i : i \le n\}$ and $z$ are completely conditionally independent over $ab$.
This special case of the Maximum Entropy Method provides new non-Shannon entropy inequalities based on the fact that entropic polymatroids on the 5-element base set $abcdz$ have an n-copy for each $n \ge 1$. The steps we follow are:
  • Fix the number of copies n, called a generation. Determine the generating matrix M of the cone Q as specified in Claim 1 using only polymatroids that satisfy the conditions of Claim 2.
  • The new inequalities are provided by the non-Shannon facets of the tight part of Q ; these facets can be computed using some polyhedral algorithm from the generating matrix M.

5. Computation

The cone $Q$ whose non-Shannon bounding facets provide the new entropy inequalities sits in the $d_1 = 19$-dimensional Euclidean space with coordinates indexed by the non-empty subsets of $YX = \{abcd\}$ and $XZ = \{abz\}$. Fix the number of copies to $n \ge 1$. This choice also fixes the dimension $d_3$ of the vector $y$. The definition of the polyhedral cone $Q$ from (8), with generating matrix $M$, is repeated here:
$$Q = \{\, x \in \mathbb{R}^{19} : M \cdot (x,y) \ge 0 \ \text{for some}\ y \in \mathbb{R}^{d_3} \,\}. \tag{16}$$
The modular part of $Q$ is 5-dimensional, and so its tight part sits in a 14-dimensional subspace of $\mathbb{R}^{19}$.
A structural property of the polymatroid region on the four-element set a b c d allows us to further reduce the complexity of the polyhedral computation required in step (2) above. This region has a central part and six permutationally equivalent “protrusions,” depending on the signs of the Ingleton expressions
f [ a b c d ] , f [ a c b d ] , f [ a d b c ] , f [ b c a d ] , f [ b d a c ] ,   and   f [ c d a b ] .
If all of them are non-negative, then the restriction f a b c d is a linear polymatroid; otherwise exactly one of these Ingleton expressions is negative, see e.g., [24]. Accordingly, the cone Q is cut into seven parts by these Ingleton hyperplanes: the central part where all Ingleton values are non-negative, and six other parts where exactly one of the expressions is negative. The facets of each part can be computed separately.
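As a small illustration of the sign pattern just described, the sketch below evaluates the six Ingleton expressions on one well-known four-element polymatroid. It assumes the convention $f[abcd] = -(a,b) + (a,b\,|\,c) + (a,b\,|\,d) + (c,d)$, which is the one consistent with the identities used later in the text; the rank values and the helper names `mi` and `ingleton` are chosen here for the example only.

```python
from itertools import combinations

def mi(f, A, B, C=""):
    """Conditional mutual information (A, B | C) = f(AC) + f(BC) - f(ABC) - f(C)."""
    key = frozenset
    return f[key(A + C)] + f[key(B + C)] - f[key(A + B + C)] - f[key(C)]

def ingleton(f, a, b, c, d):
    """Assumed convention: f[abcd] = -(a,b) + (a,b|c) + (a,b|d) + (c,d)."""
    return -mi(f, a, b) + mi(f, a, b, c) + mi(f, a, b, d) + mi(f, c, d)

# Example polymatroid on abcd: singletons have rank 2, pairs 3 (except cd, rank 4),
# triples and the full set have rank 4.
f = {frozenset(): 0}
for size in range(1, 5):
    for s in combinations("abcd", size):
        f[frozenset(s)] = {1: 2, 2: 3, 3: 4, 4: 4}[size]
f[frozenset("cd")] = 4

# The six expressions listed in the text, keyed by their argument order.
values = {p: ingleton(f, *p)
          for p in ["abcd", "acbd", "adbc", "bcad", "bdac", "cdab"]}
```

For this example exactly one of the six values is negative, namely the one for `"abcd"`, so the polymatroid lies in the corresponding protrusion rather than in the central part.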
Parts of $Q$ on the negative side of $[acbd]$, $[adbc]$, $[bcad]$, and $[bdac]$ are isomorphic because the swaps $a \leftrightarrow b$ and $c \leftrightarrow d$ are symmetries of $Q$. Therefore, it suffices to consider only one of them. The central part, where every Ingleton expression is non-negative, does not yield new inequalities. This follows from Lemma 4 below, as the elements of the central part are linear.
Lemma 4.
If $f$ restricted to $abcd$ is linear, then $f$ has an n-copy for all $n \ge 1$.
Proof. 
Since every polymatroid on three elements is linear, and linearly representable polymatroids on three or four elements are representable over any field, we can assume, after scaling and using continuity, that both $f{\restriction}abcd$ and $f{\restriction}abz$ are linearly representable over the same finite field $\mathbb{F}$. Denote the two representing vector spaces by $V^1$ and $V^2$, and consider the subspace arrangements $(V_a^1, V_b^1)$ and $(V_a^2, V_b^2)$ in the two vector spaces. Now $V_a^i$ and $V_b^i$ have dimensions $f(a)$ and $f(b)$, respectively, and their linear span has dimension $f(ab)$. Therefore, these arrangements are isomorphic, and $V^1$ and $V^2$ can be glued along the linear spans of $(V_a^1, V_b^1)$ and $(V_a^2, V_b^2)$. This gluing yields an $\mathbb{F}$-linear polymatroid $g$ that has the same restrictions on $abcd$ and on $abz$ as $f$ does. Since this $g$ is entropic, it has an n-copy for every $n \ge 1$. This n-copy is also an n-copy of $f$, as required. □
Consequently, up to the $a \leftrightarrow b$ and $c \leftrightarrow d$ symmetries, three mutually exclusive cases are left: $f[abcd] < 0$, $f[acbd] < 0$, and $f[cdab] < 0$. Using the homogeneity of $Q$, the (negative) Ingleton value can be set to $-1$, in effect taking a cross-section of $Q$ that has one fewer dimension. Facets of the part of $Q$ we are considering are also facets of these cross-sections; consequently, only facets of the cross-sections need to be computed. We consider these three cases separately in the subsections below.
The definition (16) of the cone Q uses the 19-dimensional coordinate system where the coordinates of the vector x are labeled by the non-empty subsets of a b c d and a b z . In all three cases we perform calculations in different coordinate systems that are chosen so that
  • The first coordinate is the Ingleton expression defining the cross-section;
  • The tight and modular parts of the cross-section have disjoint coordinates;
  • Apart from the Ingleton coordinate, other coordinates have non-negative values.
The first property allows the Ingleton value to be set explicitly. By the second property, the tight part of the cross-section can be separated by dropping some coordinates, and the third property potentially reduces the complexity of the polyhedral enumeration algorithm.

5.1. Case I

The cone $Q$ is intersected with the hyperplane $[abcd] = -1$. In this case we use the coordinate system
$$\begin{array}{ll}
C_1: & [abcd],\\
C_2\text{–}C_4: & (a,b\,|\,c),\ (a,c\,|\,b),\ (b,c\,|\,a),\\
C_5\text{–}C_7: & (a,b\,|\,d),\ (a,d\,|\,b),\ (b,d\,|\,a),\\
C_8\text{–}C_{11}: & (c,d\,|\,a),\ (c,d\,|\,b),\ (c,d),\ (a,b\,|\,cd),\\
C_{12}\text{–}C_{14}: & (a,b\,|\,z),\ (a,z\,|\,b),\ (b,z\,|\,a),\\
C_{15}\text{–}C_{19}: & (a\,|\,bcd),\ (b\,|\,acd),\ (c\,|\,abd),\ (d\,|\,abc),\ (z\,|\,ab).
\end{array}$$
Coordinates $C_{15}$–$C_{19}$ cover the modular part of $Q$. The tight part is spanned by the coordinate vectors $C_1$–$C_{14}$, and each of these vectors is orthogonal to the modular part. Let $\tilde P_1$ be the inverse of the matrix of this coordinate transformation, let the vector $p_1$ be the first row of $\tilde P_1$, and let $P_1$ be the submatrix formed from rows 2 to 14 of $\tilde P_1$. The coordinates of a vector $x \in \mathbb{R}^{19}$ in this coordinate system are $\tilde P_1 x$; in particular, the Ingleton value $f[abcd]$ is the scalar product $p_1 \cdot x$. Consequently, the tight part of the intersection of $Q$ and the hyperplane $[abcd] = -1$ in this coordinate system is
$$Q_1 = \{\, P_1 x : p_1 \cdot x = -1,\ \text{and}\ M \cdot (x,y) \ge 0 \ \text{for some}\ y \in \mathbb{R}^{d_3} \,\}. \tag{19}$$
Finding all facets of $Q_1$ determined by the matrices $M$ and $\tilde P_1$ is closely related to linear multiobjective optimization [36], and benefits significantly from working in the 13-dimensional target space [37] instead of the much larger, $d_3$-dimensional problem space. We have developed a variant of Benson’s inner approximation algorithm [22,38] which takes advantage of the additional special property that $Q_1$ lies in the non-negative orthant of the target space. Version 1.3 of the program is available on GitHub at https://github.com/csirmaz/information-inequalities-5 (accessed on 5 January 2026).
Table 1 shows the sizes of the generating matrix M, the  total number of facets and vertices (including extremal directions) of the cross-section Q 1 , and the running time of the vertex enumeration algorithm on a single-core desktop computer with an Intel® Core i5-4590 CPU @ 3.30 GHz processor and 8 GB of memory. The running time was taken up almost exclusively by the underlying LP solver. While the number of facets grows quite moderately with n, the number of vertices more than doubles at each generation. The matrix M, despite the numerous improvements, is highly degenerate, and numerical instability, originating from both the LP solver and the applied polyhedral algorithm, prevented the completion of the computation for larger values of n. The results of the computation are presented in Section 6.

5.2. Case II

The cone $Q$ is intersected with the hyperplane $[acbd] = -1$. The coordinate system is similar to the one used in Section 5.1. Base elements $b$ and $c$ are swapped in coordinates $C_1$–$C_{11}$, while the other coordinates remain unchanged. The tight part of the intersection, denoted by $Q_2$, is defined similarly with the same matrix $M$ but a different coordinate transformation matrix $\tilde P_2$, vector $p_2$, and submatrix $P_2$ as
$$Q_2 = \{\, P_2 x : p_2 \cdot x = -1,\ \text{and}\ M \cdot (x,y) \ge 0 \ \text{for some}\ y \in \mathbb{R}^{d_3} \,\}. \tag{20}$$
The problem size, number of facets and vertices, and the running time in seconds are summarized in Table 2. Both the number of facets and the number of vertices grow moderately. A plausible conjecture is that, in general, the number of facets is 2 n + 14 , and the number of vertices is 2 n 2 + 17 .
The running time is significantly shorter than in Section 5.1. It is explained by the fact that the polyhedral algorithm requires solving an LP instance for each vertex and each facet in the result, and those numbers are significantly smaller here. The generating matrix M is the same in both cases, implying that the problem size is the same. Numerical instability prevented completing the computation for n = 10 even in this case.

5.3. Case III

No new inequality is generated when the cone $Q$ is intersected with the hyperplane $[cdab] = -1$. This can be proved as follows. Since this intersection, denoted by $Q_3$, is an (unbounded) polyhedron, every polymatroid in $Q_3$ is a conic combination of its vertices and extremal directions. These vertices and extremal directions can be represented by certain extremal polymatroids. Conic combinations of polymatroids that have an n-copy also have an n-copy. Consequently, it suffices to show that these extremal polymatroids have an n-copy for all $n \ge 1$.
Changing the first 11 coordinates of the coordinate system used in Section 5.1 to
$$\begin{array}{ll}
C_1: & [cdab],\\
C_2\text{–}C_4: & (c,d\,|\,a),\ (a,c\,|\,d),\ (a,d\,|\,c),\\
C_5\text{–}C_7: & (b,c\,|\,d),\ (c,d\,|\,b),\ (b,d\,|\,c),\\
C_8\text{–}C_{11}: & (a,b\,|\,c),\ (a,b\,|\,d),\ (a,b),\ (c,d\,|\,ab),
\end{array}$$
and keeping the rest, the vertex enumeration algorithm used in the previous cases generated the vertices and extremal directions of the 13-dimensional tight part of $Q_3$. The computation showed that it is a pointed cone with a single vertex that has coordinates $C_2$–$C_{14}$ equal to zero (while $C_1 = -1$) and has 14 extremal directions, 12 of which are coordinate axes. Polymatroids representing the extremal directions are linear when restricted to the base set $abcd$ (they satisfy $f[cdab] = 0$; therefore, the other Ingleton values are also non-negative). Consequently, these polymatroids have an n-copy for all $n \ge 1$. Finally, the remaining polymatroid at the single vertex has $f(a,b) = 0$ (as the coordinate $C_{10}$ is zero), which means that $f{\restriction}ab$ is modular. By Claim 4 it also has an n-copy for all $n \ge 1$. This concludes the proof that no non-Shannon inequality is generated in this case.

6. Experimental Information Inequalities

For a fixed $n \ge 1$, the problem of extracting the set of non-Shannon inequalities that form the necessary and sufficient conditions for the existence of an n-copy of a polymatroid on the base set $abcdz$ was shown to be equivalent to determining all facets of a 13-dimensional polyhedral cone. The cone was cut into several pieces and the facets of each piece were computed for $n \le 9$. In this section we take a quick look at the computational results. Below, the symbols $Z$, $C$, $D$ denote the following entropy expressions:
$$Z \overset{\text{def}}{=} (a,z\,|\,b) + (b,z\,|\,a), \qquad C \overset{\text{def}}{=} (a,c\,|\,b) + (b,c\,|\,a), \qquad D \overset{\text{def}}{=} (a,d\,|\,b) + (b,d\,|\,a).$$

6.1. Case I

In the $[abcd] < 0$ case, facets of the polyhedron $Q_1$ from (19) include all 13 coordinate planes orthogonal to the coordinate axes $C_2$–$C_{14}$. These facets correspond to the non-negativity of the expression defining the coordinate. $Q_1$ has two additional Shannon facets, corresponding to the Shannon inequalities $(a,z) \ge 0$ and $(b,z) \ge 0$. The remaining facets determine the non-Shannon inequalities we are interested in. They come in three flavors:
$$(a,b\,|\,z) + \alpha_s [abcd] + \alpha_s Z + \beta_s C + \gamma_s D \ge 0, \tag{22}$$
$$(a,b\,|\,c) + \alpha_s [abcd] + (\alpha_s + \beta_s)\, C + \gamma_s D \ge 0, \tag{23}$$
$$(a,b\,|\,d) + \alpha_s [abcd] + \beta_s C + (\alpha_s + \gamma_s)\, D \ge 0, \tag{24}$$
where $\langle \alpha_s, \beta_s, \gamma_s \rangle$ are certain triplets of non-negative integers. For illustration, consider the $n = 3$ case. As reported in Table 1, for $n = 3$ the polyhedron $Q_1$ has 34 facets. These facets determine 15 Shannon inequalities, 11 inequalities of the form (22), and 4 inequalities of each of the forms (23) and (24). The $\langle \alpha, \beta, \gamma \rangle$ triplets appearing in (22) are listed in three columns in Table 3. Inequalities in (23) and (24) use triplets from the first column only; these are the triplets that also appear in the $n = 2$ generation.
In general, inequalities in (23) and (24) are consequences of (22) via replacing $z$ with $c$ and $d$, respectively. Since the copy $c_n$ of $c$ in an n-copy polymatroid $f^*$ can be considered to be the variable $z$ in the $(n-1)$-copy when $f^*$ is restricted to $N^* - \{d_n z\}$, inequalities valid for $(n-1)$-copy instances must hold in an n-copy with $z$ replaced by $c$, and, similarly, when $z$ is replaced by $d$. This property is confirmed by the computational results. Additionally, all inequalities not containing the variable $z$ turned out to be derived from the previous generation via the above substitutions. The main goal in Sections 7 and 8 is to obtain a general description of the triplets $\langle \alpha_s, \beta_s, \gamma_s \rangle$ occurring in (22).

6.2. Case II

Inequalities in the $[acbd] < 0$ case have a similar but significantly simpler structure. Facets of the n-copy cone $Q_2$ from (20) include the coordinate planes, the two Shannon facets $(a,z) \ge 0$ and $(b,z) \ge 0$ as above, and additional facets generating the inequalities
$$(a,b\,|\,z) + k\,[acbd] + k Z + \frac{k(k-1)}{2}\, C \ge 0, \qquad (a,b\,|\,c) + k\,[acbd] + \frac{(k+1)k}{2}\, C \ge 0$$
for $1 \le k \le n$ for the first, and $1 \le k \le n-1$ for the second set of inequalities. As noted in the $[abcd] < 0$ case, inequalities in the second set are instances of those in the first set from the previous generation, with $z$ replaced by $c$. When $z$ is replaced by $d$, the resulting inequality
$$(a,b\,|\,d) + k\,[acbd] + k D + \frac{k(k-1)}{2}\, C \ge 0$$
is Shannon as $[acbd] + D \ge 0$ holds in every polymatroid.

7. New Inequalities

In this section we define a set of $\langle \alpha, \beta, \gamma \rangle$ triplets, and prove that each of them gives rise to a non-Shannon inequality that must hold in polymatroids having an n-copy. These inequalities cover those that were discovered experimentally for $n \le 9$. We conjecture this set to be complete, that is, the applied Maximum Entropy Method yields no additional non-Shannon inequalities; in other words, if a polymatroid on 5 elements satisfies all these inequalities, then it has an n-copy for all $n$.

7.1. Case I

For notational convenience, $b(x,y)$, for binomial, denotes the function defined on $\mathbb{N} \times \mathbb{N}$ that satisfies the following recursive definition for positive integers $x$ and $y$:
$$b(0,0) = b(x,0) = b(0,y) = 1, \quad\text{and}\quad b(x,y) = b(x-1,y) + b(x,y-1).$$
Clearly, $b(x,y) = b(y,x) = \binom{x+y}{x}$. The following summation formulas will be used later.
Lemma 5.
For $x, y \in \mathbb{N}$ the following summation formulas hold:
$$\sum_{i \le x,\ j \le y} b(i,j) = b(x+1,y+1) - 1, \qquad \sum_{i \le x,\ j \le y} i\, b(i,j) = x\, b(x+1,y+1) - b(x,y+2) + 1.$$
Proof. 
Induction on $x$ shows that $\sum_{i \le x} b(i,y) = b(x,y+1)$, and also that
$$\sum_{i \le x} i\, b(i,y) = x\, b(x,y+1) - b(x-1,y+2).$$
Following this, induction on $y$ gives the desired results. □
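The two formulas of Lemma 5 are easy to confirm numerically for small arguments. The following sketch implements $b(x,y)$ directly from its recursive definition and compares both sides of each formula by brute force; the function and variable names are chosen here for illustration only.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def b(x, y):
    """b(x, 0) = b(0, y) = 1 and b(x, y) = b(x - 1, y) + b(x, y - 1)."""
    if x == 0 or y == 0:
        return 1
    return b(x - 1, y) + b(x, y - 1)

def plain_sum(x, y):
    """Left-hand side of the first formula: sum of b(i, j) over i <= x, j <= y."""
    return sum(b(i, j) for i in range(x + 1) for j in range(y + 1))

def weighted_sum(x, y):
    """Left-hand side of the second formula: sum of i * b(i, j)."""
    return sum(i * b(i, j) for i in range(x + 1) for j in range(y + 1))

RANGE = range(8)
is_binomial = all(b(x, y) == comb(x + y, x) for x in RANGE for y in RANGE)
first_ok = all(plain_sum(x, y) == b(x + 1, y + 1) - 1
               for x in RANGE for y in RANGE)
second_ok = all(weighted_sum(x, y) == x * b(x + 1, y + 1) - b(x, y + 2) + 1
                for x in RANGE for y in RANGE)
```

A finite check of this kind does not replace the induction, but it is a quick guard against sign or index slips when the formulas are reused.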
Definition 3.
The set $S \subseteq \mathbb{N} \times \mathbb{N}$ of pairs of non-negative integers is downward closed if $(i,j) \in S$ implies $(i',j') \in S$ for every non-negative $i' \le i$ and $j' \le j$. For $n \ge 1$ the diagonal set $D_n$ is
$$D_n \overset{\text{def}}{=} \{ (i,j) \in \mathbb{N} \times \mathbb{N} : i + j < n \}.$$
Clearly, $D_n$ is downward closed.
Definition 4.
For a finite, downward closed set $S \subseteq \mathbb{N} \times \mathbb{N}$, define the three-dimensional vector $v_S$ as
$$v_S = \langle \alpha_S, \beta_S, \gamma_S \rangle = \sum_{(i,j) \in S} b(i,j)\, \langle 1, i, j \rangle.$$
When $S$ is the empty set, define $v_\emptyset = \langle 0, 0, 0 \rangle$.
The diagonal $D_1$ has a single point, the origin, and the corresponding vector is $v_{D_1} = \langle 1, 0, 0 \rangle$. In general, the diagonal $D_n$ has $n(n+1)/2$ points, and the vector associated with $D_n$ is
$$v_{D_n} = \langle\, 2^n - 1,\ (n-2)\,2^{n-1} + 1,\ (n-2)\,2^{n-1} + 1 \,\rangle.$$
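The closed form for $v_{D_n}$ can be compared with a direct evaluation of Definition 4. The sketch below uses $b(i,j) = \binom{i+j}{i}$; the helper names `v` and `diagonal` are introduced here for illustration.

```python
from math import comb

def v(S):
    """v_S = sum over (i, j) in S of b(i, j) * (1, i, j), with b(i, j) = C(i + j, i)."""
    alpha = sum(comb(i + j, i) for i, j in S)
    beta = sum(i * comb(i + j, i) for i, j in S)
    gamma = sum(j * comb(i + j, i) for i, j in S)
    return (alpha, beta, gamma)

def diagonal(n):
    """The diagonal set D_n = {(i, j) : i + j < n}."""
    return {(i, j) for i in range(n) for j in range(n - i)}

def closed_form(n):
    """The vector stated in the text for v_{D_n}."""
    t = (n - 2) * 2 ** (n - 1) + 1
    return (2 ** n - 1, t, t)

sizes_ok = all(len(diagonal(n)) == n * (n + 1) // 2 for n in range(1, 12))
vectors_ok = all(v(diagonal(n)) == closed_form(n) for n in range(1, 12))
```

For instance, `v(diagonal(3))` evaluates to `(7, 5, 5)`.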
The following theorem provides a family of non-Shannon inequalities that covers all inequalities that were found experimentally in Section 6.1.
Theorem 1.
Let $f$ be a polymatroid on the base set $abcdz$ that has an n-copy over the $\{cd\} \cup \{ab\} \cup \{z\}$ partition. Then, for every downward closed set $S \subseteq D_n$, $f$ satisfies the inequality
$$(a,b\,|\,z) + \alpha_S \big( [abcd] + Z \big) + \beta_S\, C + \gamma_S\, D \ge 0. \tag{30}$$
Proof. 
Let $f^*$ be an n-copy of $f$ on the base set $N^* = \{abz\} \cup \{c_id_i : i \le n\}$. Using Claims 2 and 3, we can assume that
(i)
$f$ is isomorphic to $f^*{\restriction}abc_1d_1z$, and 
(ii)
$f^*$ is symmetric for all $n!$ permutations of the pairs $c_id_i$.
By (i) it suffices to show that $f^*$ satisfies all inequalities in (30). By (ii), permutationally equivalent subsets of $N^*$ have the same $f^*$-value. Below, $c^k$, etc., stands for $k$ elements chosen from $c_1, \dots, c_n$. Occasionally, $c_1$, $d_1$ will also be denoted by $c$ and $d$, and $c^{k+1}$ will be written as $c\,c^k$, letting $c = c_1$ be one of the chosen elements.
A representative element for the subset $A \subseteq N^*$ will be written as
$$B\, c^k d^\ell (cd)^m,$$
with $k + \ell + m \le n$, where $B$ is a (possibly empty) subset of $abz$; and from the $c_id_i$ pairs there are $k$ that intersect $A$ in $c_i$, there are $\ell$ pairs that intersect $A$ in $d_i$, and there are $m$ pairs that intersect $A$ in $c_id_i$. Only non-zero exponents will be presented.
The following inequality is denoted by $I(k,\ell)$:
$$[abcd] + Z + k\,C + \ell\,D \ \ge\ -(a,b\,|\,c^k d^\ell z) + (a,b\,|\,c^{k+1} d^\ell z) + (a,b\,|\,c^k d^{\ell+1} z).$$
By Lemma 9 below, this inequality holds for $f^*$ when $k + \ell < n$. Let $S \subseteq D_n$ be a downward closed set, and consider the following combination of the inequalities $I(k,\ell)$:
$$\sum_{(k,\ell) \in S} b(k,\ell)\, I(k,\ell).$$
On the left-hand side we have $\alpha_S$ copies of $[abcd]$ and $Z$, $\beta_S$ copies of $C$, and $\gamma_S$ copies of $D$. On the right-hand side the only remaining negative term is $-(a,b\,|\,z)$; all other negative terms cancel out as $b(k-1,\ell) + b(k,\ell-1) = b(k,\ell)$, and the positive terms that are left over are non-negative and can be dropped. Consequently, inequality (30) holds in $f^*$, as claimed. □
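The cancellation in the weighted sum can be traced mechanically. The sketch below records, for a given downward closed set $S$, the net coefficient of each term $(a,b\,|\,c^k d^\ell z)$ on the right-hand side of $\sum b(k,\ell)\, I(k,\ell)$; the function name and the example set are chosen here for illustration.

```python
from collections import defaultdict
from math import comb

def telescoped_rhs(S):
    """Net coefficient of (a,b | c^k d^l z) after summing b(k, l) * I(k, l) over S."""
    coef = defaultdict(int)
    for k, l in S:
        w = comb(k + l, k)        # the weight b(k, l)
        coef[(k, l)] -= w         # from -(a,b | c^k d^l z)
        coef[(k + 1, l)] += w     # from +(a,b | c^(k+1) d^l z)
        coef[(k, l + 1)] += w     # from +(a,b | c^k d^(l+1) z)
    return dict(coef)

# Example: a downward closed subset of D_3.
S = {(0, 0), (1, 0), (0, 1), (1, 1)}
coef = telescoped_rhs(S)
```

In this example the origin keeps the coefficient $-1$, every other point of $S$ cancels to zero, and the points just outside $S$ carry positive coefficients, which is exactly the pattern used in the proof.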
The rest of this section is devoted to the proof of Lemma 9 stating that $I(k,\ell)$ holds in $f^*$. We start with some simple inequalities about the copy polymatroid $f^*$. For ease of reading, we omit the function symbol $f^*$ together with its parentheses, writing, for example, $abz$ for $f^*(abz)$.
Lemma 6.
For non-negative integers $k$, $\ell$ with $k + \ell < n$ we have
$$a c^k d^{\ell+1} z \le adz + k\,(ac - a) + \ell\,(ad - a), \qquad b c^{k+1} d^{\ell} z \le bcz + k\,(bc - b) + \ell\,(bd - b).$$
Proof. 
The claims are clearly true for $k = \ell = 0$. Otherwise, use induction on $k$ and $\ell$, using
$$a c^{k+1} d^{\ell+1} z - a c^k d^{\ell+1} z = acX - aX \le ac - a, \qquad a c^k d^{\ell+2} z - a c^k d^{\ell+1} z = adY - aY \le ad - a,$$
for some subsets $X$ and $Y$ of $N^*$. The second inequality can be proved similarly. □
Lemma 7.
(i) 
If $k + \ell \le n$ then $ab\,c^k d^\ell z = abz + k\,(abc - ab) + \ell\,(abd - ab)$.
(ii) 
If $k + \ell < n$ then $ab\,(cd)\,c^k d^\ell z = abcd + (abz - ab) + k\,(abc - ab) + \ell\,(abd - ab)$.
Proof. 
Both statements follow from the fact that under the given conditions $cd$, $c^k$, $d^\ell$ and $z$ are conditionally independent over $ab$. □
Lemma 8.
(i) 
$(cd)\,c^k d^\ell z - (cd) \ge (abz - ab) + k\,(abc - ab) + \ell\,(abd - ab)$.
(ii) 
$bd\,c^k z - bd \ge (abz - ab) + k\,(abc - ab)$.
(iii) 
$a(cd)\,c^k z - a(cd) \ge (abz - ab) + k\,(abc - ab)$.
Proof. 
For the first inequality, $(cd)\,c^k d^\ell z - (cd) \ge ab\,(cd)\,c^k d^\ell z - abcd$ by submodularity. From here, apply Lemma 7 to get the required inequality. The other two inequalities can be proved in a similar way. □
Lemma 9.
For non-negative integers $k$, $\ell$ with $k + \ell < n$ the inequality $I(k,\ell)$ holds in $f^*$.
Proof. 
Recall that the inequality $I(k,\ell)$ is
$$[abcd] + Z + k\,C + \ell\,D \ \ge\ -(a,b\,|\,c^k d^\ell z) + (a,b\,|\,c^{k+1} d^\ell z) + (a,b\,|\,c^k d^{\ell+1} z).$$
Write the right-hand side as the sum $T_1 + T_2 + T_3 + T_4$ where the four terms are
$$T_1 = c^k d^\ell z - c^{k+1} d^\ell z - c^k d^{\ell+1} z, \tag{33}$$
$$T_2 = ab\,c^k d^\ell z - ab\,c^{k+1} d^\ell z - ab\,c^k d^{\ell+1} z, \tag{34}$$
$$T_3 = a c^{k+1} d^\ell z - a c^k d^\ell z + a c^k d^{\ell+1} z, \tag{35}$$
$$T_4 = b c^{k+1} d^\ell z - b c^k d^\ell z + b c^k d^{\ell+1} z. \tag{36}$$
We estimate each term separately. For (33) we have
$$T_1 = -(c,d\,|\,c^k d^\ell z) - (cd)\,c^k d^\ell z.$$
Here the first term is $\le 0$, and the second term can be bounded using part (i) of Lemma 8. Therefore,
$$T_1 \le -cd - (abz - ab) - k\,(abc - ab) - \ell\,(abd - ab). \tag{38}$$
Using part (i) of Lemma 7, the exact value of (34) can be computed as
$$T_2 = -abz - (k+1)\,(abc - ab) - (\ell+1)\,(abd - ab). \tag{39}$$
For (35) use $a c^{k+1} d^\ell z - a c^k d^\ell z = acX - aX \le ac - a$ and Lemma 6 to get
$$T_3 \le adz + (k+1)\,(ac - a) + \ell\,(ad - a). \tag{40}$$
Finally, to estimate (36) use the similar inequality $b c^k d^{\ell+1} z - b c^k d^\ell z \le bd - b$ and the second statement of Lemma 6 to get
$$T_4 \le bcz + k\,(bc - b) + (\ell+1)\,(bd - b). \tag{41}$$
The sum of the right-hand sides in the estimates (38)–(41) is
$$[abcd] + Z + k\,C + \ell\,D - (c,z\,|\,b) - (d,z\,|\,a).$$
This amount is at most the left-hand side of $I(k,\ell)$, proving the lemma. □
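The final summation step of the proof is a purely linear identity between values of the rank function on subsets of $abcdz$, so it can be checked mechanically. The sketch below represents each expression as a list of (coefficient, subset) pairs and compares the two sides for a range of $k$ and $\ell$. It assumes the convention $[abcd] = -(a,b) + (a,b\,|\,c) + (a,b\,|\,d) + (c,d)$ for the Ingleton expression; all helper names are invented for this illustration.

```python
def mi(A, B, C=""):
    """(A, B | C) = f(AC) + f(BC) - f(ABC) - f(C) as (coefficient, subset) pairs."""
    return [(1, A + C), (1, B + C), (-1, A + B + C), (-1, C)]

def scale(t, expr):
    return [(t * c, s) for c, s in expr]

def diff(A, B):
    """f(A) - f(B)."""
    return [(1, A), (-1, B)]

def collect(*exprs):
    """Add expressions and merge equal subsets; the empty set (value 0) is dropped."""
    out = {}
    for expr in exprs:
        for c, s in expr:
            if s:
                out[frozenset(s)] = out.get(frozenset(s), 0) + c
    return {s: c for s, c in out.items() if c != 0}

INGLETON = scale(-1, mi("a", "b")) + mi("a", "b", "c") + mi("a", "b", "d") + mi("c", "d")
Z = mi("a", "z", "b") + mi("b", "z", "a")
C = mi("a", "c", "b") + mi("b", "c", "a")
D = mi("a", "d", "b") + mi("b", "d", "a")

def estimates(k, l):
    """Sum of the right-hand sides of the bounds for T1, T2, T3 and T4."""
    return collect(
        [(-1, "cd")], scale(-1, diff("abz", "ab")),
        scale(-k, diff("abc", "ab")), scale(-l, diff("abd", "ab")),              # T1
        [(-1, "abz")], scale(-(k + 1), diff("abc", "ab")),
        scale(-(l + 1), diff("abd", "ab")),                                      # T2
        [(1, "adz")], scale(k + 1, diff("ac", "a")), scale(l, diff("ad", "a")),  # T3
        [(1, "bcz")], scale(k, diff("bc", "b")), scale(l + 1, diff("bd", "b")),  # T4
    )

def claimed_sum(k, l):
    """[abcd] + Z + k C + l D - (c,z|b) - (d,z|a)."""
    return collect(INGLETON, Z, scale(k, C), scale(l, D),
                   scale(-1, mi("c", "z", "b")), scale(-1, mi("d", "z", "a")))

identity_holds = all(estimates(k, l) == claimed_sum(k, l)
                     for k in range(6) for l in range(6))
```

The check confirms the bookkeeping only; the individual bounds on $T_1$ to $T_4$ still rest on Lemmas 6 to 8.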

7.2. Case II

The following theorem claims that in the case of [ a c b d ] < 0 inequalities experimentally found in Section 6.2 indeed hold for every n.
Theorem 2.
Let $f$ be a polymatroid on $abcdz$ that has an n-copy for the partition $\{cd\} \cup \{ab\} \cup \{z\}$. Then $f$ satisfies the following inequality for every $k \le n$:
$$(a,b\,|\,z) + k\,[acbd] + k Z + \frac{k(k-1)}{2}\, C \ge 0.$$
We give two proofs. The first one is similar to the proof of Theorem 1 and uses an inequality mimicking Lemma 9. The second proof is by induction and uses a technique that also recovers some of the inequalities covered in Theorem 1.
Proof 1 of Theorem 2.
The following inequality holds in $f^*$ for every $0 \le k < n$:
$$[acbd] + Z + k\,C \ \ge\ -(a,b\,|\,c^k z) + (a,b\,|\,c^{k+1} z). \tag{44}$$
Summing up this inequality from zero to $k-1$ gives the claim of Theorem 2, thus it suffices to prove (44). The natural approach of using induction on $k$ does not work. The reason for this is that the inequality
$$C \ \ge\ (a,b\,|\,c^k z) - 2\,(a,b\,|\,c^{k+1} z) + (a,b\,|\,c^{k+2} z),$$
required by the induction does not hold in general. Instead, we give a more involved reasoning, resembling the proof of Lemma 9. In (44) write $c^{k+1}$ as $c\,c^k$, and let $d$ be the element that forms a pair with this $c$. Adding the non-negative quantity $(b,d\,|\,c^k z) + (a,c\,|\,c^k d z)$ to the right-hand side of (44) and rearranging, we obtain the upper bound
$$(cd\,c^k z - c\,c^k z) + (ac\,c^k z - a\,c^k z) - (abc\,c^k z - ab\,c^k z) + bc\,c^k z + ad\,c^k z - bd\,c^k z - acd\,c^k z.$$
Each of the seven terms is estimated separately as follows.
$$cd\,c^k z - c\,c^k z \le cd - c, \tag{47}$$
$$ac\,c^k z - a\,c^k z \le ac - a, \tag{48}$$
$$abc\,c^k z - ab\,c^k z = abc - ab, \tag{49}$$
$$bc\,c^k z \le bcz + k\,(bc - b), \tag{50}$$
$$ad\,c^k z \le adz + k\,(ac - a), \tag{51}$$
$$-bd\,c^k z \le -bd - k\,(abc - ab) - (abz - ab), \tag{52}$$
$$-acd\,c^k z \le -acd - k\,(abc - ab) - (abz - ab). \tag{53}$$
Inequalities (47) and (48) follow from submodularity. Equation (49) expresses that $c$ and $c^k z$ are independent over $ab$. Inequalities (50) and (51) are in Lemma 6, while (52) and (53) are from Lemma 8. The sum of the right-hand sides of (47)–(53), with (49) taken with a negative sign, is
$$[acbd] + Z + k\,C - (c,z\,|\,b) - (d,z\,|\,a),$$
proving (44). □
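As with Lemma 9, the summation at the end of this proof is a linear identity that can be checked by machine. The sketch below adds up the right-hand sides of the seven estimates, with the third one subtracted because that term enters the upper bound with a minus sign, and compares the result with the displayed total. It assumes the convention $[acbd] = -(a,c) + (a,c\,|\,b) + (a,c\,|\,d) + (b,d)$; the helper names are invented for the illustration.

```python
def mi(A, B, C=""):
    """(A, B | C) = f(AC) + f(BC) - f(ABC) - f(C) as (coefficient, subset) pairs."""
    return [(1, A + C), (1, B + C), (-1, A + B + C), (-1, C)]

def scale(t, expr):
    return [(t * c, s) for c, s in expr]

def diff(A, B):
    """f(A) - f(B)."""
    return [(1, A), (-1, B)]

def collect(*exprs):
    """Add expressions and merge equal subsets; the empty set (value 0) is dropped."""
    out = {}
    for expr in exprs:
        for c, s in expr:
            if s:
                out[frozenset(s)] = out.get(frozenset(s), 0) + c
    return {s: c for s, c in out.items() if c != 0}

INGLETON_ACBD = scale(-1, mi("a", "c")) + mi("a", "c", "b") + mi("a", "c", "d") + mi("b", "d")
Z = mi("a", "z", "b") + mi("b", "z", "a")
C = mi("a", "c", "b") + mi("b", "c", "a")

def estimates(k):
    """Signed sum of the right-hand sides of the seven term-by-term estimates."""
    return collect(
        diff("cd", "c"),                                   # first term
        diff("ac", "a"),                                   # second term
        scale(-1, diff("abc", "ab")),                      # third term, subtracted
        [(1, "bcz")], scale(k, diff("bc", "b")),           # fourth term
        [(1, "adz")], scale(k, diff("ac", "a")),           # fifth term
        [(-1, "bd")], scale(-k, diff("abc", "ab")),
        scale(-1, diff("abz", "ab")),                      # sixth term
        [(-1, "acd")], scale(-k, diff("abc", "ab")),
        scale(-1, diff("abz", "ab")),                      # seventh term
    )

def claimed_sum(k):
    """[acbd] + Z + k C - (c,z|b) - (d,z|a)."""
    return collect(INGLETON_ACBD, Z, scale(k, C),
                   scale(-1, mi("c", "z", "b")), scale(-1, mi("d", "z", "a")))

identity_holds = all(estimates(k) == claimed_sum(k) for k in range(10))
```

Since $(c,z\,|\,b)$ and $(d,z\,|\,a)$ are non-negative, the total is at most $[acbd] + Z + k\,C$, which is the left-hand side of the inequality being proved.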
To describe the technique used in the second proof of Theorem 2, let $E_n$ denote the collection of all linear five-variable inequalities that are valid in every polymatroid on $abcdz$ that has an n-copy. Let $f^*$ be such an n-copy. Having $n$ instances of $cd$, one of the $c_id_i$ pairs can be singled out, and one of its elements can be renamed $z$. Restricting $f^*$ to these elements is an $(n-1)$-copy of $abcdz$ since $z$ and the remaining $n-1$ pairs are independent over $ab$. Therefore, $abcdz$ satisfies the inequalities in $E_{n-1}$. Let $E(a,b,c,d,z) \in E_{n-1}$ be such an inequality, marking the base elements explicitly. Then we have $E(a,b,c,d,c) \in E_n$, and also $E(a,b,c,d,d) \in E_n$. This fact has been observed and used in Section 6 to explain the coefficients in the obtained inequalities that do not contain the variable $z$.
Similarly to the above, the pairs $\{(c_iz, d_iz) : i \le n\}$ are isomorphic and are independent over the pair $(az, bz)$. Therefore, they form an $(n-1)$-copy of the polymatroid with base elements $\{az, bz, cz, dz, cz\}$. This means that the inequality $E(az, bz, cz, dz, cz)$ is also in $E_n$, and so is the inequality with $dz$ in the last position. For non-negative integers $\alpha$ and $\beta$ denote the following inequality by $J(\alpha,\beta)$:
$$J(\alpha,\beta) \overset{\text{def}}{=} (a,b\,|\,z) + \alpha\,[acbd] + \alpha Z + \beta\, C \ge 0.$$
Lemma 10.
Suppose $J(\alpha,\beta) \in E_{n-1}$. Then $J(\alpha+1, \alpha+\beta) \in E_n$.
Proof. 
The following Shannon inequalities hold in every polymatroid:
$$\begin{aligned}
(a,b\,|\,z) + [acbd] + Z &\ge (az, bz\,|\,cz) - 3\,(cd, z\,|\,ab),\\
[acbd] + Z &\ge [az, cz, bz, dz] - 3\,(cd, z\,|\,ab),\\
(a,c\,|\,b) &\ge (az, cz\,|\,bz) - (c, z\,|\,ab),\\
(b,c\,|\,a) &\ge (bz, cz\,|\,az) - (c, z\,|\,ab).
\end{aligned}$$
Since $cd$ and $z$ are conditionally independent over $ab$ in $f^*$, the last terms are zero. Taking the first inequality once, the second one $\alpha$ times, and the last two $(\alpha+\beta)$ times, the sum of the left-hand sides is $J(\alpha+1, \alpha+\beta)$, while the right-hand side is just $J(\alpha,\beta)$ for the $(az, bz, cz, dz, cz)$ base. Since this inequality is in $E_{n-1}$ by assumption, we have $J(\alpha+1, \alpha+\beta) \in E_n$, as claimed. □
Proof 2 of Theorem 2.
The inequality to be proved is $J\big(k, \binom{k}{2}\big)$. Use induction on $k$. For $k = 0$ it is a Shannon inequality, thus it holds in every polymatroid. For other values of $k$, Lemma 10 says that $J\big(k, \binom{k}{2}\big) \in E_{n-1}$ implies $J\big(k+1, \binom{k+1}{2}\big) \in E_n$, since $k + \binom{k}{2} = \binom{k+1}{2}$, concluding the induction step. □
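The induction can be mirrored by iterating the parameter map of Lemma 10. The short sketch below starts from the Shannon inequality $J(0,0)$ and applies $(\alpha,\beta) \mapsto (\alpha+1, \alpha+\beta)$ repeatedly; the function names are chosen here for illustration.

```python
from math import comb

def lemma10_step(alpha, beta):
    """Parameter map of Lemma 10: J(alpha, beta) yields J(alpha + 1, alpha + beta)."""
    return alpha + 1, alpha + beta

def iterate(k):
    """Apply the step k times, starting from the Shannon inequality J(0, 0)."""
    pair = (0, 0)
    for _ in range(k):
        pair = lemma10_step(*pair)
    return pair

# After k steps the parameters are (k, C(k, 2)), the coefficients in Theorem 2.
matches_theorem2 = all(iterate(k) == (k, comb(k, 2)) for k in range(20))
```

Each step uses one more generation of copies, which is why $k$ steps require $k \le n$.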
Lemma 10 remains valid if in the definition of J ( α , β ) the Ingleton expression [ a c b d ] is replaced by [ a b c d ] . Consequently, some, but not all, of the inequalities covered in Theorem 1 can be obtained by similar inductive reasoning.

8. The Minimal Set of Inequalities

Experimental results reported in Section 5 and discussed in Section 6 provided the complete list of five-variable non-Shannon entropy inequalities implied by the existence of an n-copy for $n \le 9$. Two families of non-Shannon inequalities, generalizing the ones found experimentally, were proven, in Theorem 1 and Theorem 2, respectively, to hold in every polymatroid with an n-copy. We conjecture that these families actually characterize those five-variable polymatroids that have an n-copy, so no further non-Shannon inequalities can be discovered by the version of the Maximum Entropy Method utilized in this paper.
In the $[acbd] < 0$ case the family of non-Shannon inequalities provided by Theorem 2 matches exactly the inequalities obtained experimentally for $n \le 9$.
In the $[abcd] < 0$ case the family provided by Theorem 1 is parametrized by the downward closed subsets $S$ of the diagonal set $D_n \subset \mathbb{N} \times \mathbb{N}$. Not all of the generated inequalities correspond to facets of the cone $Q_1$. While they are valid non-Shannon inequalities, some of them are consequences of others. Table 4 shows the downward closed subsets of $D_3$ as well as the corresponding $v_S = \langle \alpha_S, \beta_S, \gamma_S \rangle$ triplets from Definition 4. Two triplets, marked by ∗, are not in Table 3. The corresponding inequality
$$(a,b\,|\,z) + \alpha \big( [abcd] + Z \big) + \beta\, C + \gamma\, D \ge 0 \tag{56}$$
with $\alpha = 5$ and $\beta = \gamma = 3$ is the average of the inequalities obtained from the triplets numbered 6, 10 and 13; thus, it is a consequence of them. The main goal of this section is to obtain a description of those downward closed subsets of $D_n$ that generate facets of $Q_1$, that is, inequalities that are not consequences of the others.
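The triplets for $n = 3$ can be regenerated by enumerating the downward closed subsets of $D_3$. The sketch below encodes such a set by its non-increasing row lengths, which is one convenient representation chosen for this illustration; the order in which sets are produced is not meant to match the numbering of Table 4.

```python
from collections import Counter
from fractions import Fraction
from math import comb

def row_profiles(n, j=0, cap=None):
    """Yield tuples (r_0, ..., r_{n-1}) with r_0 >= r_1 >= ... and r_j <= n - j."""
    cap = n if cap is None else cap
    if j == n:
        yield ()
        return
    for r in range(min(cap, n - j) + 1):
        for rest in row_profiles(n, j + 1, r):
            yield (r,) + rest

def points(profile):
    """Row j of the set consists of the points (0, j), ..., (r_j - 1, j)."""
    return frozenset((i, j) for j, r in enumerate(profile) for i in range(r))

def v(S):
    """The triplet of Definition 4 with b(i, j) = C(i + j, i)."""
    return (sum(comb(i + j, i) for i, j in S),
            sum(i * comb(i + j, i) for i, j in S),
            sum(j * comb(i + j, i) for i, j in S))

sets = [points(p) for p in row_profiles(3)]          # includes the empty set
nonempty = [S for S in sets if S]
multiplicity = Counter(v(S) for S in nonempty)

# <5,3,3> as the average of three other triplets from the same family.
parts = [(6, 3, 5), (3, 1, 1), (6, 5, 3)]
average = tuple(sum(Fraction(x, 3) for x in col) for col in zip(*parts))
```

In this enumeration there are 13 non-empty sets, the triplet $\langle 5,3,3 \rangle$ arises from two different sets, and it equals the average of $\langle 6,3,5 \rangle$, $\langle 3,1,1 \rangle$ and $\langle 6,5,3 \rangle$, each of which is itself produced by a downward closed subset of $D_3$.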
Since the inequality (56) contains the fixed term $(a,b\,|\,z)$, and trivially holds true when $[abcd] + Z \ge 0$, it is a consequence of the inequalities obtained from the triplets $\{ \langle \alpha_i, \beta_i, \gamma_i \rangle : i \in I \}$ if there is a convex combination
$$\langle \alpha', \beta', \gamma' \rangle = \sum_{i \in I} \lambda_i \langle \alpha_i, \beta_i, \gamma_i \rangle, \quad\text{with } \lambda_i \ge 0 \text{ and } \sum_{i \in I} \lambda_i = 1,$$
such that $\alpha' \ge \alpha$, $\beta' \le \beta$, and $\gamma' \le \gamma$. Indeed, when $[abcd] + Z < 0$, a larger coefficient of this negative quantity and smaller coefficients of the non-negative quantities $C$ and $D$ give a stronger inequality. In this case we say that $\langle \alpha, \beta, \gamma \rangle$ is superseded by the set $\{ \langle \alpha_i, \beta_i, \gamma_i \rangle : i \in I \}$. If $v_S = \langle \alpha_S, \beta_S, \gamma_S \rangle$ is not superseded by other elements of this family, then $v_S$ is called extremal. Actually, by the above observation, extremal vectors are the vertices of the convex hull of the set of triplets $v_S$ as $S$ runs over the downward closed subsets of $D_n$. By Carathéodory’s theorem, see [28], $v_S$ is superseded if and only if it is (also) superseded by a set with at most three elements.
Lemma 11 below gives a necessary and sufficient condition for the vector $v_S$ to be superseded by a special three-element set. For a subset $S$ of $\mathbb{N} \times \mathbb{N}$ we write $S + (i,j)$ for adding the point $(i,j)$ to $S$, and $S - (i,j)$ for removing $(i,j)$ from $S$. In the first case it is tacitly assumed that $(i,j)$ is not in $S$, and in the second case that $(i,j) \in S$.
Lemma 11.
Let $i_1 < i_2 < i_3$ and $j_1 > j_2 > j_3$.
(i) $v_S$ is superseded by the vectors $\{v_{S+(i_1,j_1)},\ v_{S-(i_2,j_2)},\ v_{S+(i_3,j_3)}\}$ if and only if
$$\frac{j_2 - j_3}{i_3 - i_2} \ge \frac{j_1 - j_3}{i_3 - i_1}.$$
(ii) $v_S$ is superseded by $\{v_{S-(i_1,j_1)},\ v_{S+(i_2,j_2)},\ v_{S-(i_3,j_3)}\}$ if and only if
$$\frac{j_2 - j_3}{i_3 - i_2} \le \frac{j_1 - j_3}{i_3 - i_1}.$$
Proof. 
We prove (i) only; (ii) is similar. Let $b_1 = b(i_1, j_1)$, $b_2 = b(i_2, j_2)$ and $b_3 = b(i_3, j_3)$. Then, according to Definition 4,
$$\begin{aligned}
v_{S+(i_1,j_1)} &= \langle \alpha_S + b_1,\ \beta_S + i_1 b_1,\ \gamma_S + j_1 b_1 \rangle, \\
v_{S-(i_2,j_2)} &= \langle \alpha_S - b_2,\ \beta_S - i_2 b_2,\ \gamma_S - j_2 b_2 \rangle, \\
v_{S+(i_3,j_3)} &= \langle \alpha_S + b_3,\ \beta_S + i_3 b_3,\ \gamma_S + j_3 b_3 \rangle.
\end{aligned}$$
$v_S = \langle \alpha_S, \beta_S, \gamma_S \rangle$ is superseded by these vectors if there are non-negative numbers $\lambda_1, \lambda_2, \lambda_3$ with $\lambda_1 + \lambda_2 + \lambda_3 = 1$ such that
$$\begin{aligned}
\alpha_S &\le \lambda_1 (\alpha_S + b_1) + \lambda_2 (\alpha_S - b_2) + \lambda_3 (\alpha_S + b_3), \\
\beta_S &\ge \lambda_1 (\beta_S + i_1 b_1) + \lambda_2 (\beta_S - i_2 b_2) + \lambda_3 (\beta_S + i_3 b_3), \\
\gamma_S &\ge \lambda_1 (\gamma_S + j_1 b_1) + \lambda_2 (\gamma_S - j_2 b_2) + \lambda_3 (\gamma_S + j_3 b_3).
\end{aligned}$$
Since the sum of the $\lambda_i$'s is 1, this system is equivalent to
$$\begin{aligned}
\lambda_2 b_2 &\le \lambda_1 b_1 + \lambda_3 b_3, \\
i_2 \lambda_2 b_2 &\ge i_1 \lambda_1 b_1 + i_3 \lambda_3 b_3, \\
j_2 \lambda_2 b_2 &\ge j_1 \lambda_1 b_1 + j_3 \lambda_3 b_3.
\end{aligned}$$
Clearly, $\lambda_2$ must be strictly positive as $b_1$, $b_2$, and $b_3$ are positive. Introducing $\mu_1 = (\lambda_1 b_1)/(\lambda_2 b_2)$ and $\mu_3 = (\lambda_3 b_3)/(\lambda_2 b_2)$, this system is equivalent to
$$1 \le \mu_1 + \mu_3, \qquad i_2 \ge i_1 \mu_1 + i_3 \mu_3, \qquad j_2 \ge j_1 \mu_1 + j_3 \mu_3.$$
One can assume that the first inequality holds with an equality. Since $i_1 < i_2 < i_3$, the second inequality holds when $i_2$ is above the point which splits the interval $[i_1, i_3]$ in ratio $\mu_3$ to $\mu_1$. Similarly, $-j_1 < -j_2 < -j_3$ implies that the third inequality holds when $-j_2$ is below the point that splits $[-j_1, -j_3]$ in the same ratio. Thus, non-negative numbers $\mu_1$ and $\mu_3$ satisfying these three inequalities exist if and only if the proportion of $[i_2, i_3]$ in $[i_1, i_3]$ is not larger than the proportion of $[-j_2, -j_3]$ in the interval $[-j_1, -j_3]$, that is,
$$\frac{(-j_3) - (-j_2)}{(-j_3) - (-j_1)} \ge \frac{i_3 - i_2}{i_3 - i_1}.$$
This condition is equivalent to the one given in the claim. □
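As an illustration of part (i), the following sketch checks the slope condition and, when it holds, builds explicit weights $\lambda_1, \lambda_2, \lambda_3$ by taking the smallest admissible $\mu_1 = (i_3 - i_2)/(i_3 - i_1)$ with $\mu_1 + \mu_3 = 1$. It assumes $b(i,j) = \binom{i+j}{i}$ and the sum form of $v_S$ (Definition 4 is not restated here), and `pnp_weights` is a hypothetical helper name. The example set is the $2 \times 2$ square, subset 7 of Table 4.

```python
from fractions import Fraction
from math import comb

def b(i, j):
    # Assumed form of the weight b(i, j).
    return comb(i + j, i)

def triplet(S):
    """(alpha_S, beta_S, gamma_S) under the assumed form of Definition 4."""
    return (sum(b(i, j) for i, j in S),
            sum(i * b(i, j) for i, j in S),
            sum(j * b(i, j) for i, j in S))

def pnp_weights(p1, p2, p3):
    """Weights (lambda1, lambda2, lambda3) for Lemma 11(i), or None if the condition fails."""
    (i1, j1), (i2, j2), (i3, j3) = p1, p2, p3
    assert i1 < i2 < i3 and j1 > j2 > j3
    if Fraction(j2 - j3, i3 - i2) < Fraction(j1 - j3, i3 - i1):
        return None
    mu1 = Fraction(i3 - i2, i3 - i1)   # smallest admissible mu1
    mu3 = 1 - mu1
    b1, b2, b3 = b(*p1), b(*p2), b(*p3)
    lam2 = 1 / (1 + mu1 * b2 / b1 + mu3 * b2 / b3)
    return (mu1 * lam2 * b2 / b1, lam2, mu3 * lam2 * b2 / b3)

S = {(0, 0), (1, 0), (0, 1), (1, 1)}
p1, p2, p3 = (0, 2), (1, 1), (2, 0)
lam = pnp_weights(p1, p2, p3)
vectors = [triplet(S | {p1}), triplet(S - {p2}), triplet(S | {p3})]
combo = tuple(sum(l * v[k] for l, v in zip(lam, vectors)) for k in range(3))
alpha, beta, gamma = triplet(S)
print(lam, vectors, combo)
print(alpha <= combo[0] and beta >= combo[1] and gamma >= combo[2])
```

Here the three weights are equal and the combination reproduces $v_S$ exactly, so the square is superseded with equality in all three coordinates.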
Corollary 1.
Let $i_1 < i_2 < i_3$ and $j_1 > j_2 > j_3$. Assume both $(i_k, j_k)$ and $(i_k + 1, j_k - 1)$ are in $S$, while $(i_k, j_k + 1) \notin S$ and $(i_k + 1, j_k) \notin S$ for $k = 1, 2, 3$. When $j_3 = 0$ the condition with negative values is assumed to hold. If $(j_1 - j_3)/(i_3 - i_1)$ is not in the open interval
$$\left( \frac{j_2 - j_3}{(i_3 - i_2) + 1},\ \frac{j_2 - j_3}{(i_3 - i_2) - 1} \right),$$
then $v_S$ is superseded by vectors generated by one of the following two triplets:
$$\{S + (i_1 + 1, j_1),\ S - (i_2, j_2),\ S + (i_3 + 1, j_3)\}, \qquad \{S - (i_1, j_1),\ S + (i_2 + 1, j_2),\ S - (i_3, j_3)\}.$$
Proof. 
If the slope $(j_1 - j_3)/(i_3 - i_1)$ is less than or equal to the lower limit, then part (i) of Lemma 11 applies to the first triplet. When the slope is at or above the upper limit, then part (ii) of that lemma applies to the second triplet. □
A downward closed set $S \subseteq \mathbb{N} \times \mathbb{N}$ can be specified in two ways: either by a non-increasing sequence $S_{\mathrm{col}} = (c_0, c_1, \dots, c_k)$ specifying the maximal values in columns $0, \dots, k$, or by a non-increasing sequence $S_{\mathrm{row}} = (r_0, r_1, \dots, r_\ell)$ specifying the maximal values in rows $0, \dots, \ell$. It is easy to see that
$$(x, y) \in S \iff 0 \le y \le c_x \iff 0 \le x \le r_y.$$
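The two descriptions can be converted into each other mechanically. A minimal sketch, with illustrative function names and a set of lattice points as the underlying representation:

```python
def col_sequence(S):
    """c_x = largest y with (x, y) in S, for every column x that meets S."""
    width = max(x for x, _ in S) + 1
    return tuple(max(y for x2, y in S if x2 == x) for x in range(width))

def row_sequence(S):
    """r_y = largest x with (x, y) in S, for every row y that meets S."""
    height = max(y for _, y in S) + 1
    return tuple(max(x for x, y2 in S if y2 == y) for y in range(height))

def from_col_sequence(cols):
    """Rebuild the downward closed set from its column sequence."""
    return {(x, y) for x, c in enumerate(cols) for y in range(c + 1)}

S = {(0, 0), (1, 0), (2, 0), (0, 1)}
cols, rows = col_sequence(S), row_sequence(S)
print(cols, rows)

# Check the displayed equivalence on a small grid around S.
for x in range(4):
    for y in range(4):
        by_col = x < len(cols) and 0 <= y <= cols[x]
        by_row = y < len(rows) and 0 <= x <= rows[y]
        assert ((x, y) in S) == by_col == by_row
```

For this example the column sequence is (1, 0, 0) and the row sequence is (2, 0); the latter is strictly decreasing while the former is not.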
Corollary 2.
If the vector $v_S$ for some $S \subseteq D_n$ is not superseded by other vectors generated by subsets of $D_n$, then either the sequence $S_{\mathrm{col}}$ is strictly decreasing, or the sequence $S_{\mathrm{row}}$ is strictly decreasing.
Proof. 
If $S_{\mathrm{col}}$ is not strictly decreasing, then the upper bound of $S$ contains a horizontal segment of length at least 2. Similarly, if $S_{\mathrm{row}}$ is not strictly decreasing, then the right bound of $S$ contains a vertical segment of length at least 2; see Figure 1. Take such a horizontal and a vertical segment whose distance is minimal. Let the horizontal segment be in row $r$ between columns $c_1$ and $c_2$, and the vertical segment be in column $c$ between rows $r_1$ and $r_2$. The horizontal and vertical segments are connected by a (possibly empty) diagonal staircase. Depending on which segment comes first, there are two possible arrangements, as depicted in Figure 1.
In the first case $c_1 < c_2 < c$ and $r_1 < r_2 < r$; in the second case $c < c_1 < c_2$ and $r < r_1 < r_2$. Apply Lemma 11 to the marked points and observe that the modified downward closed sets are always subsets of $D_n$. In the first case
c 2 c 1 ( r + 1 ) r 1 c + 1 c 2 r 1 r ,
and in the second case
( c + 1 ) c ) r 1 r 2 1 c 2 ( c + 1 ) r r 1 .
Therefore, by Lemma 11, v S is superseded by the vectors generated by the indicated sets, proving the claim. □
By Corollary 2, the downward closed set corresponding to an extremal vertex is either a staircase with step heights 1 (when $S_{\mathrm{row}}$ is strictly decreasing), which we call horizontal, or the mirror image of such a staircase. The only configuration that belongs to both cases is the diagonal $D_n$. It will be more convenient to use the column sequence $(c_0, c_1, \dots, c_k)$ to represent horizontal staircases. Here $k \ge 0$ is the length of the staircase, also denoted by $\mathrm{len}(S)$. The last column size (height) is necessarily $c_k = 0$, and $c_i$ equals either $c_{i+1}$ or $c_{i+1} + 1$ for every $0 \le i < k$. In the rest of this section, all staircases, if not mentioned otherwise, are horizontal ones.
Definition 5.
The staircase $S$ is Positive-Negative-Positive (PNP)-reducible in $D_n$ if there are $i_1 < i_2 < i_3$ and $j_1 > j_2 > j_3$ such that $S_1 = S + (i_1, j_1)$, $S_2 = S - (i_2, j_2)$, and $S_3 = S + (i_3, j_3)$ are staircases in $D_n$ and $v_S$ is superseded by $\{v_{S_1}, v_{S_2}, v_{S_3}\}$. $S$ is PNP-irreducible if it is not PNP-reducible.
Negative-Positive-Negative (NPN)-reducibility and NPN-irreducibility are defined analogously, using the staircases $S - (i_1, j_1)$, $S + (i_2, j_2)$, and $S - (i_3, j_3)$, assuming that they are also subsets of $D_n$. $S$ is irreducible in $D_n$ if it is both PNP- and NPN-irreducible. Finally, let $\mathcal{S}_n$ be the collection of the irreducible staircases that are subsets of $D_n$.
By the remark at the beginning of this section, by Lemma 11, and by Corollary 2, extremal vertices are generated by elements of $\mathcal{S}_n$ and by their mirror images. We describe an incremental algorithm that generates the elements of the collection $\mathcal{S}_n$.
A horizontal staircase $S$ of length $n$ can be recovered from a unique horizontal staircase $S'$ of length $n - 1$ as follows. If $S'$ has the column sequence $(c_0, c_1, \dots, c_{n-1})$, then $S$ is defined by one of the column sequences
$$(c_0, c_1, \dots, c_{n-1}, 0) \quad \text{or} \quad (c_0 + 1, c_1 + 1, \dots, c_{n-1} + 1, 0),$$
depending on whether the last two elements of the column sequence of $S$ are equal.
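The parent-child relation between horizontal staircases can be sketched directly on column sequences. The function names below are illustrative; the code only implements the two extension rules stated above and their inverse, and does not test irreducibility.

```python
def children(cols):
    """The two horizontal staircases obtained from the column sequence `cols`."""
    appended = cols + (0,)                          # add a column of height zero
    raised = tuple(c + 1 for c in cols) + (0,)      # raise every column, then add one
    return [appended, raised]

def parent(cols):
    """Inverse of `children`: the unique staircase of length one less."""
    assert len(cols) >= 2 and cols[-1] == 0
    if cols[-2] == 0:
        # The last two entries are equal: a zero column was appended.
        return cols[:-1]
    # Otherwise every earlier column was raised by one.
    return tuple(c - 1 for c in cols[:-1])

def is_horizontal_staircase(cols):
    """Last column is 0 and consecutive columns differ by 0 or 1."""
    steps_ok = all(cols[i] - cols[i + 1] in (0, 1) for i in range(len(cols) - 1))
    return cols[-1] == 0 and steps_ok

def staircases(length):
    """All horizontal staircases of the given length, as column sequences."""
    level = [(0,)]
    for _ in range(length):
        level = [child for cols in level for child in children(cols)]
    return level

print(sorted(staircases(2)))
```

Each staircase of length $k$ has exactly two children, so there are $2^k$ horizontal staircases of length $k$ before any irreducibility filtering.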
Claim 6.
(i) Suppose $S$ has length $n$. $S$ is irreducible in $D_{n+1}$ if and only if it is irreducible in $D_m$ for any $m \ge n + 1$.
(ii) If $\mathrm{len}(S) = n$ and $S$ is irreducible in $D_n$, then $S'$ is irreducible in $D_{n-1}$.
(iii) If $S \in \mathcal{S}_n$ but $S \notin \mathcal{S}_{n+1}$, then $\mathrm{len}(S) = n$ and $S$ is PNP-reducible in $D_{n+1}$ with $i_3 = n + 1$ and $j_3 = 0$.
(iv) If $S' \in \mathcal{S}_{n-1}$ and $S \notin \mathcal{S}_n$, then either $S$ is NPN-reducible with $i_3 = n$ and $j_3 = 0$, or it is PNP-reducible with $i_3 = n - 1$ and $j_3 = 1$.
Proof. 
(i) is immediate from the definition as the staircases $S \pm (i, j)$ must be subsets of $D_{n+1}$.
(ii) Assume $S'$ is reducible in $D_{n-1}$, shown by the staircases $S'_1$, $S'_2$ and $S'_3$. Since they are in $D_{n-1}$, they can be lifted back to $S_1$, $S_2$, $S_3$ in $D_n$. According to Lemma 11 these staircases witness the reducibility of $S$.
(iii) If $S$ is reducible in $D_{n+1}$ but not in $D_n$, then $S + (i_3, j_3)$ is not in $D_n$, leading to the stated condition.
(iv) If $S$ is reducible in $D_n$ while $S'$ is not reducible in $D_{n-1}$, then the reduction must use $(i_3, j_3)$, which is in $D_n$ but not in $D_{n-1}$. If it is an NPN-reduction then it must use the newly added point $(n, 0)$; in other cases the reduction can be shifted back to $S'$. In the case of a PNP-reduction this additional point is $(n - 1, 1)$ (when extending the staircase by a column of height zero), or the reduction can be shifted back to $S'$ again. □
Based on Claim 6, the incremental algorithm, sketched as Algorithm 1, generates all horizontal irreducible staircases. The PNP- and NPN-irreducibility can be checked based on Lemma 11. The last point ( i 3 , j 3 ) is fixed, and the naïve implementation requires quadratic running time in len ( S ) . With some simple bookkeeping it can be reduced to a backward scanning of the column sequence, resulting in linear running time.
Using the algorithm we have computed the complete set of irreducible staircases up to n = 60 . The number of new staircases that remained irreducible in each subsequent generation matches the sequence A103116 in the Encyclopedia of Integer Sequences [39]:
$$\mathrm{remains}_n = \sum_{i \le n} (n - i + 1)\,\varphi(i),$$
where φ is Euler’s totient function, which suggests that the connection is based on the number of different slopes determined by the lattice points in a rectangle. Proving the equivalence of these two sequences is an intriguing open problem.
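The closed form is easy to evaluate. The sketch below computes its first few values with a naive totient; it only evaluates the formula and does not generate staircases, so it says nothing about the conjectured match itself.

```python
from math import gcd

def totient(i):
    """Euler's totient: the number of 1 <= k <= i coprime to i."""
    return sum(1 for k in range(1, i + 1) if gcd(k, i) == 1)

def remains(n):
    """Evaluate sum_{i <= n} (n - i + 1) * totient(i)."""
    return sum((n - i + 1) * totient(i) for i in range(1, n + 1))

print([remains(n) for n in range(1, 9)])
```

Equivalently, the sequence consists of the partial sums of the partial sums of the totient function.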
For better visualization, triplets $\langle \alpha_S, \beta_S, \gamma_S \rangle$ corresponding to these irreducible staircases are plotted as the three-dimensional points $\langle \beta/\alpha, \gamma/\alpha, \alpha \rangle$ using logarithmic scale for the third ($\alpha$) coordinate. The plot in Figure 2 contains all 126,981 extremal triplets in the range $\beta, \gamma \le 20\alpha$. Some of the plotted triplets appear as late as generation $n = 80$; later generations do not contribute to this part of the complete set. For comparison, some triplets in the 80th generation have values larger than $2^{85}$.
Algorithm 1: Generating irreducible staircases
[Pseudocode of Algorithm 1 is an image in the source and is not reproduced here.]
To explain the shape of the surface of extremal triplets plotted in Figure 2, we provide some heuristic reasoning. A consequence of Corollary 1 is that if the extremal triplet $v_S$ is computed from the staircase $S$, then the slopes determined by the step edges $(i, j) \in S$ (namely, points of $S$ where neither $(i + 1, j)$ nor $(i, j + 1)$ is in $S$) are almost equal. Consequently, on a large scale, extremal $v_S$ vectors are generated by the set of lattice points in right-angled triangles defined by the inequality
$$S(a, b) = \Bigl\{ (x, y) \in \mathbb{N} \times \mathbb{N} : \frac{x}{a} + \frac{y}{b} \le 1 \Bigr\}$$
for some positive values of $a$ and $b$. Since, by Lemma 5,
$$\sum_{i \le x,\ j \le y} b(i, j) \approx b(x + 1, y + 1), \quad \text{and} \quad \sum_{i \le x,\ j \le y} i\, b(i, j) \approx x\, b(x + 1, y + 1),$$
the vector $v_{S(a,b)}$ is well approximated by $b(x + 1, y + 1)\,\langle 1, x, y \rangle$, where $(x, y) \in S(a, b)$ is the point where $b(x + 1, y + 1)$ takes its maximal value. As the function $b(x, y)$ strictly increases in both coordinates, this maximum is taken on the boundary diagonal of the right-angled triangle $S(a, b)$ that has endpoints $(0, b)$ and $(a, 0)$. Using the Stirling formula $n! \approx \sqrt{2\pi n}\,(n/e)^n$, we have
$$b(x, y) = \frac{(x + y)!}{x!\, y!} \approx \frac{1}{\sqrt{2\pi}} \cdot \frac{(x + y)^{x + y + 1/2}}{x^{x + 1/2}\, y^{y + 1/2}}.$$
Introducing $\varphi(x) = (x + \tfrac{1}{2}) \log x$, we see that the logarithm of $b(x, y)$ is well approximated by the function
$$\theta(x, y) = \varphi(x + y) - \varphi(x) - \varphi(y) - \log \sqrt{2\pi}.$$
Using this approximation, the point $(u, v)$ is extremal in the triangle $S(a, b)$ if $(u, v)$ is on the boundary diagonal and $\theta(u + 1, v + 1)$ has zero derivative along this diagonal. For fixed $u$ and $v$ such positive $a$ and $b$ exist just in case the partial derivatives $\theta_x$ and $\theta_y$ are positive at $(u + 1, v + 1)$. By inspection, this condition is satisfied for every $(u, v)$. Consequently, if $\langle \alpha, \beta, \gamma \rangle$ is an extremal triplet, then choosing $u = \beta/\alpha$, $v = \gamma/\alpha$, we expect
$$\log \alpha \approx \theta(u + 1, v + 1),$$
and, conversely, for each $u$, $v$, with the choice $\log \alpha = \theta(u + 1, v + 1)$, $\beta = u\alpha$, and $\gamma = v\alpha$, we expect the triplet $\langle \alpha, \beta, \gamma \rangle$ to be extremal. For comparison, Figure 3 plots these triplets over the same range that was used in Figure 2.
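The quality of the Stirling-based approximation can be checked numerically. The sketch below compares $\log b(x, y)$, computed exactly from the binomial coefficient, with $\theta(x, y)$ for a few sample points; natural logarithms are assumed, and the sample points are arbitrary.

```python
from math import comb, log, pi, sqrt

def phi(x):
    """phi(x) = (x + 1/2) * log(x), with the natural logarithm."""
    return (x + 0.5) * log(x)

def theta(x, y):
    """Stirling-based approximation of log b(x, y)."""
    return phi(x + y) - phi(x) - phi(y) - log(sqrt(2 * pi))

for x, y in [(5, 5), (10, 10), (30, 10), (200, 100)]:
    exact = log(comb(x + y, x))
    print(x, y, round(exact, 4), round(theta(x, y), 4), round(theta(x, y) - exact, 4))
```

The difference shrinks as $x$ and $y$ grow, in line with the next-order term of the Stirling series.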
This approximation seems to slightly underestimate the real value of $\log \alpha$. For example, the extremal triplet obtained from the diagonal staircase $D_{n+1}$ is
$$\alpha = 2^{n+1} - 1, \quad \beta = (n - 1)\,2^n + 1, \quad \gamma = (n - 1)\,2^n + 1,$$
thus, $u = v = \beta/\alpha \approx (n - 1)/2$, and $\log \alpha \approx (n + 1) \log 2$. At the same time,
$$\theta(u + 1, v + 1) = (n + 1) \log 2 - \log \sqrt{n + 1} + O(1).$$
Extremal triplets on the two edges of the surface are specified by the totally flat, stairless staircases. These triplets are
$$\alpha = n, \quad \beta = n(n - 1)/2, \quad \gamma = 0$$
on one axis, and $\beta$, $\gamma$ swapped on the other. In this case, the $(u, v)$ pair is $((n - 1)/2, 0)$, and
$$\theta(u + 1, v + 1) = \log n + 1 - \log \sqrt{8\pi} + O(1/n),$$
which differs from the correct value by a constant only.
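Both closed forms can be confirmed by direct summation. The sketch assumes, as Definition 4 is not restated here, that $D_m = \{(i, j) : i + j < m\}$ and that $v_S$ sums $b(i,j) = \binom{i+j}{i}$ with weights $1$, $i$, and $j$; the helper names are illustrative.

```python
from math import comb

def triplet(S):
    """(alpha_S, beta_S, gamma_S) under the assumed form of Definition 4."""
    return (sum(comb(i + j, i) for i, j in S),
            sum(i * comb(i + j, i) for i, j in S),
            sum(j * comb(i + j, i) for i, j in S))

def diagonal(m):
    """The diagonal set D_m = {(i, j): i + j < m} (assumed form)."""
    return {(i, j) for i in range(m) for j in range(m - i)}

def flat(n):
    """The flat staircase {(0, 0), ..., (n - 1, 0)}."""
    return {(i, 0) for i in range(n)}

for n in range(1, 8):
    closed_diag = (2 ** (n + 1) - 1, (n - 1) * 2 ** n + 1, (n - 1) * 2 ** n + 1)
    closed_flat = (n, n * (n - 1) // 2, 0)
    print(n, triplet(diagonal(n + 1)) == closed_diag, triplet(flat(n)) == closed_flat)
```

For $n = 2$ the diagonal formula gives $\langle 7, 5, 5 \rangle$, the last row of Table 4.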
We have also looked at how the newly discovered entropy inequalities delimit the 5-variable entropy region. The triplet $\langle \alpha_S, \beta_S, \gamma_S \rangle$ yields the inequality
$$(a,b\,|\,z) + \alpha_S \bigl([abcd] + Z\bigr) + \beta_S C + \gamma_S D \ge 0.$$
Since the closure of the 5-variable entropy region is a pointed convex cone, one can normalize it by assuming $(a,b\,|\,z) = 1$. An equivalent view is to take the cross-section of $\overline{\Gamma_5}$ by this hyperplane. Consider the three-dimensional subspace spanned by the vectors
$$\begin{aligned}
\mathbf{x} &\overset{\mathrm{def}}{=} C = (a,c\,|\,b) + (b,c\,|\,a), \\
\mathbf{y} &\overset{\mathrm{def}}{=} D = (a,d\,|\,b) + (b,d\,|\,a), \\
\mathbf{z} &\overset{\mathrm{def}}{=} -\bigl([abcd] + Z\bigr) = -\bigl([abcd] + (a,z\,|\,b) + (b,z\,|\,a)\bigr);
\end{aligned}$$
observe that $\mathbf{z}$ is negated. Normalize the five-variable entropic function $f$ so that it satisfies $f(a,b\,|\,z) = 1$, then project it to this subspace. Use the scalar products $f \cdot \mathbf{x}$, $f \cdot \mathbf{y}$, $f \cdot \mathbf{z}$ as the projection coordinates. This three-dimensional cross-section of the five-variable entropy region is
$$\Delta = \bigl\{ \langle f \cdot \mathbf{x},\ f \cdot \mathbf{y},\ f \cdot \mathbf{z} \rangle \in \mathbb{R}^3 : f \in \overline{\Gamma_5} \text{ such that } f(a,b\,|\,z) = 1 \bigr\}.$$
Clearly, points in $\Delta$ have non-negative $x$ and $y$ coordinates, while the $z$ coordinate can take both positive and negative values. Since $\overline{\Gamma_5}$ is a closed convex cone, $\Delta$ is closed and convex. We concentrate on the part above the $xy$ plane:
$$\Delta^+ = \{ \langle x, y, z \rangle \in \Delta : z \ge 0 \}.$$
Shannon inequalities provide no restriction whatsoever on $\Delta^+$ as any non-negative coordinate triplet can be realized by some polymatroid. To show this, define $r_I$ for any subset $I$ of the ground set $abcdz$ as
$$r_I : A \mapsto \begin{cases} 1 & \text{if } I \cap A \ne \emptyset, \\ 0 & \text{otherwise.} \end{cases}$$
Let, moreover, $r$ be the function
$$r : A \mapsto \begin{cases} 2 & \text{if } |A| = 1, \\ 4 & \text{if } |A| \ge 3, \text{ or } A \text{ is one of } cd,\ cz,\ dz, \\ 3 & \text{otherwise,} \end{cases}$$
as $A$ runs over the non-empty subsets of $abcdz$. Both $r_I$ and $r$ are extremal rays of $\Gamma_5$, so they are polymatroids. For arbitrary non-negative numbers $x$, $y$, $z$ the linear combination $f = r_{abcd} + x\, r_{ac} + y\, r_{ad} + z\, r$ satisfies $f(a,b\,|\,z) = 1$ and has coordinates $\langle x, y, z \rangle$, providing the required polymatroid.
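The claimed coordinates of this linear combination can be verified mechanically. The sketch below rests on three assumptions that are not spelled out in this section: $r_I(A) = 1$ exactly when $A$ meets $I$; $(u, v\,|\,w)$ denotes $f(uw) + f(vw) - f(w) - f(uvw)$; and $[abcd]$ is the Ingleton expression $-(a,b) + (a,b\,|\,c) + (a,b\,|\,d) + (c,d)$. All function names are illustrative.

```python
from itertools import combinations

GROUND = "abcdz"
SUBSETS = [frozenset(c) for k in range(1, 6) for c in combinations(GROUND, k)]

def r_I(I):
    """r_I(A) = 1 if A meets I (assumed reading), 0 otherwise."""
    I = frozenset(I)
    return {A: (1 if A & I else 0) for A in SUBSETS}

def r_special():
    """The function r: 2 on singletons, 4 on large sets and on cd, cz, dz, else 3."""
    special_pairs = {frozenset("cd"), frozenset("cz"), frozenset("dz")}
    def value(A):
        if len(A) == 1:
            return 2
        if len(A) >= 3 or A in special_pairs:
            return 4
        return 3
    return {A: value(A) for A in SUBSETS}

def combine(*terms):
    """Linear combination of set functions given as (coefficient, function) pairs."""
    return {A: sum(coef * g[A] for coef, g in terms) for A in SUBSETS}

def h(f, s):
    """Value of f on the set of letters in s; the empty set has value 0."""
    return f[frozenset(s)] if s else 0

def cmi(f, u, v, w=""):
    """(u, v | w) = f(uw) + f(vw) - f(w) - f(uvw)."""
    return h(f, u + w) + h(f, v + w) - h(f, w) - h(f, u + v + w)

def ingleton(f):
    # Assumed form of [abcd].
    return (-cmi(f, "a", "b") + cmi(f, "a", "b", "c")
            + cmi(f, "a", "b", "d") + cmi(f, "c", "d"))

def coordinates(f):
    """The projection coordinates (C, D, -([abcd] + Z)) of f."""
    C = cmi(f, "a", "c", "b") + cmi(f, "b", "c", "a")
    D = cmi(f, "a", "d", "b") + cmi(f, "b", "d", "a")
    Z = cmi(f, "a", "z", "b") + cmi(f, "b", "z", "a")
    return (C, D, -(ingleton(f) + Z))

x, y, z = 3, 5, 7
f = combine((1, r_I("abcd")), (x, r_I("ac")), (y, r_I("ad")), (z, r_special()))
print(cmi(f, "a", "b", "z"), coordinates(f))
```

Since all quantities are linear in $f$, checking the four building blocks separately (as the test values do implicitly) is enough to cover every non-negative choice of $x$, $y$, $z$.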
Points of $\Delta^+$ with non-negative $x$ and $y$ coordinates and $z = 0$ are realized by linear polymatroids; thus, the complete non-negative quadrant of the $xy$ plane is a part of $\Delta^+$. Our first non-Shannon inequality, generated by the triplet $v_{D_1} = \langle 1, 0, 0 \rangle$, is
$$(a,b\,|\,z) + [abcd] + Z \ge 0.$$
This inequality immediately limits the region $\Delta^+$ to $z \le 1$; therefore, points in $\Delta^+$ have a height of at most 1.
Other extremal triplets provide additional linear constraints. Figure 4 illustrates the delimited part of the non-negative octant as viewed from the origin, and cut at x 2.5 and at y 2.5 . The pictured bound of Δ + is extended to larger values of x and y. Along the x and y axes, this bound approaches the x z and y z coordinate planes as the functions z = y and z = x , respectively. Along the x y diagonal, the limiting behavior toward the z axis is similar to the entropy function z = ( x + y ) log ( x + y ) . The corner point of the plateau z = 1 has coordinates 1 , 1 , 1 . The θ ( u + 1 , v + 1 ) estimate gives a smooth bound on Δ + , which is asymptotically tight along the x and y axes.

9. Conclusions

Structural properties of the entropy region of four or more variables are mostly unknown. This region is bounded by linear inequalities corresponding to the non-negativity of Shannon information measures. Finding additional entropy inequalities is, and remains, an intriguing open problem. Previous works on generating and applying such non-Shannon entropy inequalities focused mainly on the four-variable case [4,10,14,15], and only a few sporadic five-variable non-Shannon inequalities have been discovered [18]. This work provides infinitely many five-variable non-Shannon information inequalities by systematically exploring a special property of entropic vectors. Other works utilized the Copy Lemma, a method distilled from the original Zhang–Yeung construction by Dougherty et al. [14]. Our method is based on a different paradigm derived from the principle of maximum entropy and is a special case of the Maximum Entropy Method described in [20]. As proven in Lemma 1, the principle of maximum entropy implies that every entropic polymatroid has an $\langle n, m \rangle$-copy, which is a polymatroidal extension with special properties as defined in Definition 1. In Claim 1, we proved that polymatroids having $\langle n, m \rangle$-copies form a polyhedral cone and hinted at how its facets can be computed. Facet equations provide the potentially new non-Shannon entropy inequalities.
While the polyhedral computation presented in Claim 1 is numerically intractable even for small parameter values, the theoretical results of Section 4 allowed us to reduce this complexity significantly. Computational aspects of determining the facets of a high-dimensional cone are closely related to linear multi-objective optimization [22]. We have developed a specially tailored variant of Benson's inner approximation algorithm [22,38], which takes advantage of the special properties of this enumeration problem. Computational results are reported in Section 5 for generations $n \le 9$. Numerical instability, originating from both the underlying LP solver and the polyhedral algorithm, prevented the completion of the computation for larger values of $n$.
Non-Shannon inequalities obtained from these computations are discussed in Section 6. Based on these experimental results, two infinite families of five-variable inequalities were defined. The first family in Theorem 1 is parametrized by downward closed subsets of non-negative lattice points. The second family in Theorem 2 has a single positive integer parameter. Inequalities in both families are proved to hold for polymatroids on five elements that have an n-copy; consequently, they are all valid entropy inequalities. It is conjectured that they cover all inequalities that can be obtained by the applied method. In other words, if a polymatroid on five elements satisfies all these inequalities, then it has an n-copy for all n. This conjecture is left as an open problem. The computational results confirmed this conjecture up to n = 9 .
Inequalities in the first family are investigated in Section 8 in more detail. They are specified by triplets $\langle \alpha_S, \beta_S, \gamma_S \rangle$ determined by downward closed sets $S$ of non-negative lattice points as discussed in Definition 4. Such a triplet is extremal if the corresponding inequality is not a consequence of other inequalities from the same family. Extremal triplets are determined by a special collection of downward closed sets called irreducible staircases. Based on the theoretical results in Corollary 2 and Claim 6, an incremental algorithm, sketched as Algorithm 1, was used to generate irreducible staircases up to generation 60. The converse implication, valid for the computed cases, that triplets generated by irreducible staircases are extremal, is left as an open problem. Triplets $\langle \alpha_S, \beta_S, \gamma_S \rangle$ in the range $\beta_S, \gamma_S \le 20\alpha_S$, generated by irreducible staircases, are plotted in Figure 2. The number of new irreducible staircases that remained irreducible in the subsequent generation matches the sequence A103116 in the On-Line Encyclopedia of Integer Sequences [39]. It is an interesting open problem to prove the equality of these sequences.
To illustrate how the newly discovered entropy inequalities delimit the five-variable entropy region, entropy vectors were normalized to satisfy $(a,b\,|\,z) = 1$ and projected onto a three-dimensional subspace. Part of the projection in the non-negative octant is denoted by $\Delta^+$. The Shannon inequalities do not provide any restriction on this part. Figure 4 illustrates the bounds implied by the new inequalities. While the non-negative quadrant of the $xy$ plane is known to be part of $\Delta^+$, and $\Delta^+$ is known to contain points above that plane as well, it is an intriguing open problem whether our bound is, at least asymptotically, tight around the $x$ and $y$ axes. Showing that our bound is asymptotically tight at the zero point would amount to settling the long-standing open problem of whether the entropic region is semi-algebraic.

Author Contributions

Conceptualization, L.C. and E.P.C.; methodology, L.C. and E.P.C.; software, L.C. and E.P.C.; validation, L.C. and E.P.C.; formal analysis, L.C. and E.P.C.; investigation, L.C. and E.P.C.; resources, L.C. and E.P.C.; data curation, L.C. and E.P.C.; writing—original draft preparation, L.C. and E.P.C.; writing—review and editing, L.C. and E.P.C.; visualization, L.C. and E.P.C.; supervision, L.C. and E.P.C.; funding acquisition, L.C. All authors have read and agreed to the published version of the manuscript.

Funding

The research reported in this paper was partially supported by the ERC Advanced Grant ERMiD.

Data Availability Statement

The data presented in this study are openly available in GitHub at https://github.com/csirmaz/information-inequalities-5, accessed on 28 December 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Akadémiai Kiadó: New York, NY, USA; Budapest, Hungary, 1981.
  2. Yeung, R.W. Information Theory and Network Coding; Springer: New York, NY, USA, 2008.
  3. Beimel, A. Secret-sharing schemes: A survey. In Coding and Cryptology; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2011; Volume 6639, pp. 11–46.
  4. Beimel, A.; Orlov, I. Secret Sharing and Non-Shannon Information Inequalities. IEEE Trans. Inf. Theory 2011, 57, 5634–5649.
  5. Gürpınar, E.; Romashchenko, A. How to Use Undiscovered Information Inequalities: Direct Applications of the Copy Lemma. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Paris, France, 8–12 July 2019; pp. 1377–1381.
  6. Bamiloshin, M.; Ben-Efraim, A.; Farràs, O.; Padró, C. Common information, matroid representation, and secret sharing for matroid ports. Des. Codes Cryptogr. 2021, 89, 143–166.
  7. Martin, J.; Rombach, P. Guessing Numbers and Extremal Graph Theory. Electron. J. Comb. 2022, 29, P2.58.
  8. Groth, J.; Ostrovsky, R. Cryptography in the Multi-string Model. J. Cryptol. 2014, 27, 506–543.
  9. Madiman, M.; Marcus, A.W.; Tetali, P. Information-theoretic inequalities in additive combinatorics. In Proceedings of the IEEE Information Theory Workshop, Dublin, Ireland, 30 August–3 September 2010; pp. 1–4.
  10. Studený, M. Conditional independence structures over four discrete random variables revisited. IEEE Trans. Inf. Theory 2021, 67, 7030–7049.
  11. Yeung, R.W. A First Course in Information Theory; Kluwer Academic/Plenum Publishers: New York, NY, USA, 2002.
  12. Pippenger, N. What are the laws of information theory. In 1986 Special Problems on Communication and Computation Conference, Proceedings of the Tenth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, Prague, Czech Republic, 7–11 July 1986; Springer: Palo Alto, CA, USA, 1986.
  13. Zhang, Z.; Yeung, R.W. On characterization of entropy function via information inequalities. IEEE Trans. Inf. Theory 1998, 44, 1440–1452.
  14. Dougherty, R.; Freiling, C.; Zeger, K. Non-Shannon information inequalities in four random variables. arXiv 2011, arXiv:1104.3602.
  15. Csirmaz, L. Book inequalities. IEEE Trans. Inf. Theory 2014, 60, 6811–6818.
  16. Matúš, F. Infinitely many information inequalities. In Proceedings of the 2007 IEEE International Symposium on Information Theory, Nice, France, 24–29 June 2007; pp. 41–44.
  17. Ahlswede, R.; Gács, P.; Körner, J. Bounds on conditional probabilities with applications in multi-use communication. Z. Wahrscheinlichkeitstheorie Verwandte Geb. 1976, 34, 157–177.
  18. Makarychev, K.; Makarychev, Y.; Romashchenko, A.; Vereshchagin, N. A new class of non-Shannon-type inequalities for entropies. Commun. Inf. Syst. 2002, 2, 147–166.
  19. Kaced, T. Equivalence of two proof techniques for non-Shannon-type inequalities. In Proceedings of the IEEE International Symposium on Information Theory, Istanbul, Turkey, 7–12 July 2013; pp. 236–240.
  20. Csirmaz, L. Exploring the entropic region. arXiv 2025, arXiv:2509.12439.
  21. Guiasu, S.; Shenitzer, A. The principle of maximum entropy. Math. Intell. 1985, 7, 42–48.
  22. Csirmaz, L. Inner approximation algorithm for solving linear multiobjective optimization problems. Optimization 2020, 70, 1487–1511.
  23. Matúš, F.; Csirmaz, L. Entropy region and convolution. IEEE Trans. Inf. Theory 2016, 62, 6007–6018.
  24. Matúš, F.; Studený, M. Conditional Independences among Four Random Variables I. Comb. Probab. Comput. 1995, 4, 269–278.
  25. Studený, M.; Bouckaert, R.R.; Kočka, T. Extreme Supermodular Set Functions over Five Variables; Research Report N. 1977; Institute of Information Theory and Automation: Prague, Czech Republic, 2000.
  26. Mazumdar, S.; Seybold, D.; Kritikos, K.; Verginadis, Y. A survey on data storage and placement methodologies for Cloud-Big Data ecosystem. J. Big Data 2019, 6, 15.
  27. Huber, M. An introduction to causal discovery. Swiss J. Econ. Stat. 2024, 160, 14.
  28. Ziegler, G.M. Lectures on Polytopes; Graduate Texts in Mathematics; Springer: Berlin/Heidelberg, Germany, 1994; Volume 152.
  29. Matúš, F. Adhesivity of polymatroids. Discret. Math. 2007, 307, 2464–2477.
  30. Bell, J.; Funk, D.; Kim, D.D.; Mayhew, D. Effective Versions of Two Theorems of Rado. Q. J. Math. 2020, 71, 599–618.
  31. Dougherty, R.; Freiling, C.; Zeger, K. Linear rank inequalities on five or more variables. arXiv 2010, arXiv:0910.0284.
  32. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
  33. Csirmaz, L. One-adhesive polymatroids. Kybernetika 2020, 56, 886–902.
  34. Matúš, F. Two constructions on limits of entropy functions. IEEE Trans. Inf. Theory 2007, 53, 320–330.
  35. Chan, T.H. Balanced information inequalities. IEEE Trans. Inf. Theory 2003, 49, 3261–3267.
  36. Csirmaz, E.P.; Csirmaz, L. Enumerating Extremal Submodular Functions for n = 6. Mathematics 2024, 13, 97.
  37. Ehrgott, M.; Löhne, A.; Shao, L. A dual variant of Benson's 'outer approximation algorithm' for multiple objective linear programming. J. Glob. Optim. 2012, 52, 757–778.
  38. Löhne, A.; Weißing, B. The vector linear program solver Bensolve – notes on theoretical background. Eur. J. Oper. Res. 2017, 260, 807–813.
  39. OEIS Foundation Inc. The On-Line Encyclopedia of Integer Sequences; OEIS Foundation Inc.: Highland Park, NJ, USA, 2019; Available online: https://oeis.org/A103116 (accessed on 28 December 2025).
Figure 1. Closest horizontal and vertical boundary segments of the gray region are marked by the arrows. Red circles indicate points to be added and subtracted.
Figure 2. Extremal configurations. Colors indicate the α value.
Figure 3. The θ ( u + 1 , v + 1 ) function. Colors indicate the function value.
Figure 4. Delimiting the five-variable entropy region. Entropic points are on or below the indicated surface.
Table 1. Results for the case $[abcd] < 0$.
n    Rows     Columns   Facets   Vertices   Time (s, or [h:]mm:ss)
1    76       23        16       19         0.01
2    284      53        21       43         0.03
3    706      101       34       155        0.35
4    1416     171       63       675        3.54
5    2488     267       120      2171       38.25
6    3996     393       221      6275       5:24
7    6014     533       386      14,523     36:45
8    8616     751       635      31,379     2:59:17
9    11,876   991       1000     61,627     13:13:45
Table 2. Results for the case $[acbd] < 0$.
n    Rows     Columns   Facets   Vertices   Time (s, or mm:ss)
1    76       23        16       19         0.00
2    284      53        18       25         0.03
3    706      101       20       35         0.14
4    1416     171       22       49         0.65
5    2488     267       24       67         2.36
6    3996     393       26       89         7.37
7    6014     533       28       115        32.21
8    8616     751       30       145        1:12
9    11,876   991       32       179        5:01
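Table 3. Coefficient values for $n = 3$.
$\langle \alpha, \beta, \gamma \rangle$   $\langle \alpha, \beta, \gamma \rangle$   $\langle \alpha, \beta, \gamma \rangle$
$\langle 1, 0, 0 \rangle$   $\langle 3, 0, 3 \rangle$   $\langle 6, 3, 5 \rangle$
$\langle 2, 0, 1 \rangle$   $\langle 3, 3, 0 \rangle$   $\langle 6, 5, 3 \rangle$
$\langle 2, 1, 0 \rangle$   $\langle 4, 1, 3 \rangle$   $\langle 7, 5, 5 \rangle$
$\langle 3, 1, 1 \rangle$   $\langle 4, 3, 1 \rangle$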
Table 4. Downward closed subsets of $D_3$ and the corresponding triplets. Triplets marked by * are consequences of the others. [The pictures of the subsets are an image in the source and are not reproduced here.]
No.    Triplet
1      $\langle 1, 0, 0 \rangle$
2      $\langle 2, 1, 0 \rangle$
3      $\langle 3, 3, 0 \rangle$
4      $\langle 2, 0, 1 \rangle$
5      $\langle 3, 1, 1 \rangle$
6      $\langle 4, 3, 1 \rangle$
* 7    $\langle 5, 3, 3 \rangle$
8      $\langle 6, 5, 3 \rangle$
9      $\langle 3, 0, 3 \rangle$
10     $\langle 4, 1, 3 \rangle$
* 11   $\langle 5, 3, 3 \rangle$
12     $\langle 6, 3, 5 \rangle$
13     $\langle 7, 5, 5 \rangle$
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Csirmaz, L.; Csirmaz, E.P. Information Inequalities for Five Random Variables. Computation 2026, 14, 42. https://doi.org/10.3390/computation14020042
