1. Introduction
Many important mathematical problems can be reduced to the following question: does there exist a collection of random variables with finite alphabets such that the entropies of the variable subsets satisfy certain linear constraints? Examples include, but are not limited to, channel coding [1] and, in particular, network coding [2], estimating the efficiency of secret sharing schemes [3,4,5], questions about matroid representations [6], guessing games [7], extracting information from common strings in cryptography [8], additive combinatorics [9], and finding conditional independence inference rules [10].
The entropy function of finitely many discrete random variables $\xi_i$, indexed by the elements of the fixed finite set N, maps each non-empty subset $A\subseteq N$ to the Shannon entropy $H(\xi_A)$ of the variable set $\xi_A=\{\xi_i: i\in A\}$, see [11]. The entropy region, denoted by $\Gamma^*_N$, is the range of the entropy function; it is a part of the $(2^{|N|}-1)$-dimensional Euclidean space whose coordinates are labeled by the non-empty subsets of N. Entropies are non-negative real numbers, and thus the entropy region lies in the non-negative orthant of this Euclidean space. It satisfies a collection of homogeneous linear inequalities expressing the non-negativity of the basic Shannon information measures [11]. Points satisfying all these inequalities form the Shannon bound, denoted by $\Gamma_N$.
N. Pippenger argued in [12] that linear inequalities bounding the entropy region $\Gamma^*_N$ encode the fundamental laws of Information Theory and determine the limits of information transmission and data compression. The long-standing problem of whether a linear information inequality can properly cut into the Shannon bound was settled in 1998 by Zhang and Yeung [13], who exhibited the first example of such a non-Shannon information inequality. Their discovery initiated intensive research. The phrase Copy Lemma was coined by Dougherty et al. [14] to describe the general method distilled from the original Zhang–Yeung construction. The Copy Lemma has been applied successfully to generate several hundred sporadic non-Shannon entropy inequalities and a couple of infinite families of them for four random variables, see [14,15,16]. A different method, utilizing an information-theoretic lemma attributed to Ahlswede and Körner [17], was proposed in [18]; later it was shown to be equivalent to a special case of the Copy Lemma [19].
Our method to obtain five-variable non-Shannon entropy inequalities is based on a more general paradigm, of which the Copy Lemma is a special case [20]. Derived from the principle of maximum entropy [21], it is called MEM, short for Maximum Entropy Method. For more details, see Section 3.
Previous works on generating and applying non-Shannon entropy inequalities, such as [4,10,14,15,22,23], focused on the four-variable case, and only a few sporadic five-variable non-Shannon inequalities have been discovered, such as the MMRV inequality from [18]. This is the first work that provides a method generating an infinite collection of non-Shannon bounds on the five-variable entropy region $\Gamma^*_5$. Compared to the four-variable case, there are significant challenges, both theoretical and computational. The four-variable entropy region $\Gamma^*_4$ sits in the 15-dimensional Euclidean space, while the five-variable region $\Gamma^*_5$ is 31-dimensional. The structure of the four-variable Shannon bound $\Gamma_4$ is well understood: it has 41 extremal directions, and only 6 of them contain no entropic points. The entropy region $\Gamma^*_4$ has an inner polyhedral cone where it fills its Shannon bound, and has six isomorphic “protrusions” towards the six exceptional extremal directions, each protrusion surrounded by 15 hyperplanes, 14 of which come from the Shannon bound [24]. Only the protrusions contribute to new entropy inequalities, and their dimension can be reduced to 10. Computational results about $\Gamma^*_4$ can be obtained by computing vertices and facets of numerous implicitly defined 10-dimensional polyhedra [22]. In contrast, the Shannon bound $\Gamma_5$ of the five-variable entropy region has 117,983 extremal directions [25], and for a few of them it is not even known whether they contain an entropic point. No structural reduction similar to the four-variable case is available, and it is not known whether such a reduction exists. Computations about $\Gamma^*_5$ can still be reduced to 25-dimensional polyhedral enumeration problems (although with a significantly larger number of constraints than in the four-variable case). The complexity of enumeration problems typically doubles when the dimension increases by one, making such high-dimensional enumeration problems practically intractable.
We overcome this computational difficulty by applying a particular variant of the Maximum Entropy Method. This variant, working in generations, first reduces the problem dimension from 31 to 19, and then, at each generation, adds extra copies of some of the random variables, increasing the problem dimension again. Theoretical considerations and harnessing the inherent symmetry allowed us to complete the associated polyhedral computations up to nine generations. The output was the complete list of five-variable non-Shannon inequalities provided by the first nine generations. Based on the experimental results, we define an infinite collection of five-variable inequalities that we prove are provided by this MEM variant (in particular, they are valid non-Shannon entropy inequalities), and we conjecture this collection to be complete; that is, no additional inequalities are yielded by this MEM variant. The collection is parametrized by finite, downward closed subsets of the non-negative lattice points of the plane. Some of the inequalities in our collection are consequences of the others; those that are not are called extremal. We developed an incremental algorithm that enumerates, from generation to generation, the parameters yielding the extremal inequalities, in complete agreement with the computational results. The algorithm allowed us to go significantly beyond the capabilities of polyhedral computation: while numerical instability prevented the completion of the polyhedral computation for the tenth generation, all extremal entropy inequalities were enumerated up to generation 60. Finally, we examined the large-scale behavior of the extremal inequalities and depicted how these inequalities delimit a three-dimensional cross-section of the five-variable entropy region $\Gamma^*_5$.
The new five-variable non-Shannon inequalities can be applied to real-world problems. The most immediate application is in network coding, where they tighten the known outer bounds and thus may provide stricter bounds on network capacity. For a given network protocol, they can assist in proving that a targeted data rate is not achievable [2].
Cloud storage services (like Google Drive or AWS S3) distribute data fragments across many nodes [26]. When a node fails, a replacement node has to download the missing data from other nodes. The new five-variable inequalities may help to determine the theoretical limits of storage efficiency for systems with more complex failure models or larger clusters.
In the realm of secret sharing, entropy inequalities provide lower bounds on the size of the shares [3,4,5]. From a different angle, the new inequalities can prove that certain efficient schemes are impossible to realize. In complex datasets, it is important to distinguish between correlation and actual causation [27]. When an AI model analyzes data to build a causal graph, it can use entropy inequalities to rule out models that are information-theoretically impossible, narrowing the search space and improving accuracy.
In this paper, lemmas, claims and theorems are arranged so that, as a rule, each is used only in the section where it appears, typically right after it is stated and proved. Section 4 proves structural properties of the entropy region that are used to reduce the computational complexity of the polyhedral algorithms. The main theoretical results are stated and proved in Section 7. In Theorem 1 we prove a large collection of entropy inequalities parametrized by the downward closed subsets of the non-negative lattice points. Lemmas estimating different entropy expressions are used in that section only. Unfortunately, many inequalities provided by Theorem 1 are consequences of the others. Claims and lemmas in Section 8 provide the theoretical foundation for our algorithm that selects and enumerates the extremal inequalities among them.
The remaining part of the paper is organized as follows. Notations are recalled in Section 2. Section 3 describes the special variant of the Maximum Entropy Method we apply to the five-variable entropy region. Section 4 discusses possible simplifications, including how symmetry can be utilized and how the MEM parameters were chosen. Section 5 describes the chosen coordinate systems, polyhedral computations, and their results. Section 6 presents the five-variable inequalities we obtained, paving the way for Section 7, which defines two infinite families of such inequalities and presents additional theoretical results, including the proof that the inequalities in these families are indeed generated by the MEM. Section 8 discusses methods that recognize extremal inequalities and the incremental algorithm that enumerates the extremal inequalities for each MEM generation, describes the large-scale behavior of the new inequalities, and investigates the delimited part of the five-variable entropy region. Finally, Section 9 summarizes our work, lists open questions, and provides directions for further work.
2. Preliminaries
In this paper all sets are finite. Capital letters, such as A, J, N, etc., denote (finite) sets; elements of these sets are denoted by lower case letters. The union sign and the curly brackets around singletons are frequently omitted; thus, $ABc$ denotes the set $A\cup B\cup\{c\}$. The difference of two sets is written as $A\setminus B$, or $A\setminus b$ if the second set is a singleton. A star on the union sign, as in $A\cup^* B$, emphasizes that A and B are disjoint sets. A partition of N is a collection of non-empty, pairwise disjoint subsets of N whose union equals N.
A discrete random variable ξ takes its values from a finite set $\mathcal X$, called its alphabet. The probability that ξ takes the value $x\in\mathcal X$ is denoted by $\Pr_\xi(x)$, or simply by $\Pr(x)$ when the random variable ξ is clear from the context. Suppose ξ is defined on the direct product $\mathcal X_N=\prod_{i\in N}\mathcal X_i$ for some finite set N, called the base set. For a non-empty $A\subseteq N$ the marginal $\xi_A$ is defined on the product alphabet $\mathcal X_A=\prod_{i\in A}\mathcal X_i$ so that the probability of $y\in\mathcal X_A$ is the sum of the probabilities of those $x\in\mathcal X_N$ whose projection $x_A$ to $\mathcal X_A$ equals y:
$$\Pr_{\xi_A}(y)=\sum\{\Pr_\xi(x): x\in\mathcal X_N,\ x_A=y\}.$$
To emphasize that ξ is defined on a product space, we write $\xi=\langle\xi_i: i\in N\rangle$, and say that the random variables $\xi_i$ are distributed jointly. The Shannon entropy of the distribution ξ is defined as
$$H(\xi)=-\sum_{x\in\mathcal X}\Pr_\xi(x)\log\Pr_\xi(x),$$
with the convention that $0\log 0=0$. If ξ is a joint distribution, then we write $H_\xi(A)$ for $H(\xi_A)$. The index ξ is also dropped when it is clear from the context. By convention, $H_\xi(\emptyset)=0$. The entropies $H_\xi(A)$ are arranged into a vector indexed by the non-empty subsets A of N. This vector is the entropy profile of the distribution ξ. The collection of these $(2^{|N|}-1)$-dimensional vectors, as ξ ranges over all joint distributions on N, forms the entropy region, denoted by $\Gamma^*_N$. Elements of $\Gamma^*_N$ are considered interchangeably as vectors, as points in this Euclidean space, and as functions assigning non-negative real numbers to non-empty subsets of the base set N. For a gentle introduction to these notions of Information Theory, please consult [11].
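To make the definition concrete, the following Python sketch computes the entropy profile of a finite joint distribution. The function name `entropy_profile`, the indexing of the base set as 0, …, k−1, and the use of base-2 logarithms are our own choices for illustration; changing the base of the logarithm only rescales the profile.

```python
# Illustrative sketch; the function name is hypothetical, not from the paper's code.
from itertools import combinations
from math import log2


def entropy_profile(joint):
    """Return {frozenset(A): H(xi_A)} for all non-empty subsets A of the base set.

    `joint` maps outcome tuples to probabilities; coordinate i of a tuple is
    the value of xi_i, so the base set is {0, ..., k-1}. Entropies are in bits.
    """
    k = len(next(iter(joint)))
    profile = {}
    for size in range(1, k + 1):
        for A in combinations(range(k), size):
            # Marginal xi_A: add up the masses of outcomes with the same projection.
            marginal = {}
            for outcome, p in joint.items():
                y = tuple(outcome[i] for i in A)
                marginal[y] = marginal.get(y, 0.0) + p
            # Convention 0 log 0 = 0: zero masses are skipped.
            profile[frozenset(A)] = -sum(p * log2(p) for p in marginal.values() if p > 0)
    return profile


# Two independent fair bits and their XOR.
xor_joint = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
h = entropy_profile(xor_joint)
```

For two independent fair bits and their XOR, every single variable has entropy 1 and every pair, as well as the full triple, has entropy 2, so the profile is a vector with 7 coordinates.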
Notions of conditional entropy, mutual information, and conditional mutual information from Information Theory are formally extended to the functional form of these vectors. If f is any function on subsets of N, then for subsets A, B, C, D of N the following forms will be used as abbreviations:
$$f(A|B)=f(AB)-f(B),$$
$$f(A,B)=f(A)+f(B)-f(AB),$$
$$f(A,B|C)=f(AC)+f(BC)-f(ABC)-f(C),$$
$$f[A,B,C,D]=-f(A,B)+f(A,B|C)+f(A,B|D)+f(C,D).$$
The first three expressions are called conditional entropy, mutual information, and conditional mutual information, respectively. The last line defines the Ingleton expression. An entropy function is not defined on the empty set; nevertheless, $f(\emptyset)=0$ will be assumed whenever convenient. In particular, $f(A,B|\emptyset)$ and $f(A,B)$ are the same expressions.
f is omitted before the parenthesized expression. Additionally, if applied to singletons, the Ingleton expression is written without commas. An example is the inequality
Shannon inequalities state the non-negativity of the conditional entropy, mutual information, and conditional mutual information for all subsets A, B, C of the base set N. They are consequences of the unique minimal set of such inequalities, called basic Shannon inequalities, see [11], listed in (B1) and (B2) below:
- (B1) $f(i\,|\,N\setminus i)\ge 0$ for all $i\in N$;
- (B2) $f(i,j\,|\,K)\ge 0$ for all $K\subseteq N\setminus ij$ and different $i,j\in N$, including $K=\emptyset$.
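The basic inequalities (B1) and (B2) are easy to check mechanically. The sketch below, with hypothetical helper names, lists the violated instances for a set function stored as a dictionary keyed by frozensets. The example function on N = abcd (value 2 on singletons, 4 on the pair ab, 3 on the other pairs, and 4 on all larger sets) is chosen by us purely for illustration.

```python
# Illustrative sketch; all names and the example set function are hypothetical.
from itertools import combinations


def elemental_violations(f, N, tol=1e-9):
    """Return the basic Shannon inequalities (B1), (B2) violated by f.

    f maps non-empty frozensets of N to reals; f of the empty set is taken as 0.
    """
    N = frozenset(N)

    def val(A):
        return f[frozenset(A)] if A else 0.0

    bad = []
    for i in sorted(N):                      # (B1): f(i | N - i) >= 0
        if val(N) - val(N - {i}) < -tol:
            bad.append(("B1", i))
    for i, j in combinations(sorted(N), 2):  # (B2): f(i, j | K) >= 0
        rest = sorted(N - {i, j})
        for r in range(len(rest) + 1):
            for K in map(frozenset, combinations(rest, r)):
                cmi = val(K | {i}) + val(K | {j}) - val(K | {i, j}) - val(K)
                if cmi < -tol:
                    bad.append(("B2", i, j, K))
    return bad


def from_rank(N, rank):
    """Tabulate a set function given as a Python function on frozensets."""
    return {frozenset(A): rank(frozenset(A))
            for r in range(1, len(N) + 1) for A in combinations(N, r)}


def example_rank(A):
    """Hypothetical example: 2 on singletons, 4 on ab, 3 on other pairs, 4 above."""
    if len(A) == 1:
        return 2.0
    if len(A) == 2:
        return 4.0 if A == frozenset("ab") else 3.0
    return 4.0


f = from_rank("abcd", example_rank)
```

The example passes all checks, so it is a polymatroid; lowering its value on N from 4 to 3 violates exactly the four instances of (B1) and none of (B2).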
The collection of all $(2^{|N|}-1)$-dimensional vectors (or points, or functions) that satisfy the Shannon inequalities is denoted by $\Gamma_N$. It is a natural outer bound for the entropy region $\Gamma^*_N$. The set $\Gamma_N$ is a pointed polyhedral cone [28]; its facets are the hyperplanes specified by the basic Shannon inequalities.
Polymatroids are elements of the Shannon bound $\Gamma_N$ written in functional form. A polymatroid is usually written as $(f,N)$, or just f, in which case we say that f is on N. The polymatroid f is entropic if it is in the entropy region $\Gamma^*_N$, and almost entropic, or aent for short, if it is in the closure (in the usual Euclidean topology) of $\Gamma^*_N$. Linear inequalities valid for all polymatroids are consequences of the basic Shannon inequalities; an example is the inequality (3). A non-Shannon inequality is a homogeneous linear inequality that is valid for all points of the entropy region but not for all points of the Shannon bound. Equivalently, the non-negative side of the hyperplane corresponding to such an inequality contains the complete entropy region, while the hyperplane cuts properly into $\Gamma_N$.
The closure of the entropic region is a pointed convex full-dimensional cone [11], and only its boundary points can be non-entropic [29].
The polymatroid $(f,N)$ on the base set N is linearly representable over the field $\mathbb F$, or $\mathbb F$-representable for short, if there is a finite-dimensional vector space V over $\mathbb F$ and linear subspaces $V_i\subseteq V$ for $i\in N$ such that for all $A\subseteq N$, $f(A)$ is the dimension of the linear subspace spanned by $\{V_i: i\in A\}$. Clearly, if both f and g are $\mathbb F$-representable over the same field, then so is their sum $f+g$ (take the direct sum of the representing spaces). The polymatroid f is $\mathbb F$-linear if it is in the closure of the multiples of $\mathbb F$-representable polymatroids. By the previous remark, $\mathbb F$-linear polymatroids form a closed cone. Finally, f is linear if it is $\mathbb F$-linear for some field $\mathbb F$.
By a compactness argument, if f is $\mathbb F$-representable, then it is representable over some finite field as well, see [30], meaning that the vector space V is also finite. Taking the uniform distribution on V provides an entropic polymatroid that is a positive multiple of f, namely $(\log q)\,f$, where q is the size of the finite field. Thus, linear polymatroids are also almost entropic.
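The passage from a linear representation to an entropic polymatroid can be checked numerically. The sketch below is a simplification under stated assumptions: it works over a prime field GF(p), lets each subspace $V_i$ be spanned by a single vector $v_i$, draws x uniformly from GF(p)^d, and sets $\xi_i=\langle v_i,x\rangle$. Each $\xi_A$ is then uniform on a set of size $p^{\mathrm{rank}}$, so $H(\xi_A)=\mathrm{rank}\cdot\log_2 p$. The function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical, and each V_i is assumed
# to be spanned by one vector v_i over a prime field GF(p).
from itertools import combinations, product
from math import log2


def rank_mod_p(vectors, p):
    """Rank over GF(p), p prime, of a list of integer vectors (Gaussian elimination)."""
    rows = [[x % p for x in v] for v in vectors]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)  # inverse via Fermat's little theorem
        rows[rank] = [x * inv % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                c = rows[r][col]
                rows[r] = [(a - c * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def linear_entropy_profile(vectors, p):
    """Entropy profile (bits) of xi_i = <v_i, x> mod p for x uniform on GF(p)^d."""
    d, k = len(vectors[0]), len(vectors)
    total = p ** d
    profile = {}
    for r in range(1, k + 1):
        for A in combinations(range(k), r):
            counts = {}
            for x in product(range(p), repeat=d):
                key = tuple(sum(a * b for a, b in zip(vectors[i], x)) % p for i in A)
                counts[key] = counts.get(key, 0) + 1
            profile[A] = -sum(c / total * log2(c / total) for c in counts.values())
    return profile


# Three pairwise independent directions in GF(3)^2 (the uniform matroid U_{2,3}).
vecs = [(1, 0), (0, 1), (1, 1)]
prof = linear_entropy_profile(vecs, 3)
```

Every entry of `prof` equals the rank of the corresponding vectors times $\log_2 3$, which is the scaled polymatroid $(\log q)\,f$ with $q=3$.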
Linear polymatroids on the base set N with $|N|\le 4$ are $\mathbb F$-linear for every field $\mathbb F$, see [24,31]; this statement is not true in general. For $|N|\le 3$ every polymatroid is linear. For $|N|=4$, a polymatroid f on N is linear if and only if it satisfies the six instances of the Ingleton inequality obtained by applying the Ingleton expression to the elements of N in different orders, see [24]. Since the Ingleton expression is symmetric in the first two and in the last two arguments, these six instances cover all 24 permutations of N.
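The linearity criterion for four-element base sets translates directly into code. The sketch below (hypothetical names) evaluates the Ingleton expression $-f(a,b)+f(a,b|c)+f(a,b|d)+f(c,d)$ for each of the six choices of the first pair of arguments; both example set functions are our own.

```python
# Illustrative sketch; names and example set functions are hypothetical.
from itertools import combinations


def val(f, A):
    """f(A) for a set function stored on non-empty frozensets, with f(empty) = 0."""
    A = frozenset(A)
    return f[A] if A else 0.0


def cmi(f, A, B, C=frozenset()):
    """f(A,B|C) = f(AC) + f(BC) - f(ABC) - f(C); an empty C gives f(A,B)."""
    A, B, C = frozenset(A), frozenset(B), frozenset(C)
    return val(f, A | C) + val(f, B | C) - val(f, A | B | C) - val(f, C)


def ingleton(f, a, b, c, d):
    """Ingleton expression -f(a,b) + f(a,b|c) + f(a,b|d) + f(c,d) on singletons."""
    return (-cmi(f, {a}, {b}) + cmi(f, {a}, {b}, {c})
            + cmi(f, {a}, {b}, {d}) + cmi(f, {c}, {d}))


def ingleton_instances(f, N):
    """The six instances on a four-element N, one per choice of the first pair."""
    N = sorted(N)
    out = {}
    for a, b in combinations(N, 2):
        c, d = [x for x in N if x not in (a, b)]
        out[(a, b, c, d)] = ingleton(f, a, b, c, d)
    return out


def tabulate(N, rank):
    return {frozenset(A): rank(frozenset(A))
            for r in range(1, len(N) + 1) for A in combinations(N, r)}


def example_rank(A):
    """Hypothetical polymatroid: 2 on singletons, 4 on ab, 3 on other pairs, 4 above."""
    if len(A) == 1:
        return 2.0
    if len(A) == 2:
        return 4.0 if A == frozenset("ab") else 3.0
    return 4.0


u24 = tabulate("abcd", lambda A: float(min(len(A), 2)))  # uniform matroid U_{2,4}
bad = tabulate("abcd", example_rank)
```

For $U_{2,4}$, which is representable over GF(3), all six values equal 2. The second example satisfies (B1) and (B2), yet exactly one instance, the one with first pair cd, evaluates to −1, so by the criterion above it is not linear.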
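Finally, we recall notions of independence. Let $(f,N)$ be a polymatroid, and let $X, Y_1,\dots,Y_k$ be disjoint subsets of N; for a set I of indices write $Y_I=\bigcup_{i\in I}Y_i$. The sets $Y_1$ and $Y_2$ are independent in f if $f(Y_1)+f(Y_2)=f(Y_1Y_2)$. The collection $\{Y_1,\dots,Y_k\}$ is completely independent in f if for any two disjoint subsets I and J of the indices $\{1,\dots,k\}$, $Y_I$ and $Y_J$ are independent, or, equivalently, if
$$f(Y_1Y_2\dots Y_k)=f(Y_1)+f(Y_2)+\dots+f(Y_k).$$
In this case we also have $f(Y_I)=\sum_{i\in I}f(Y_i)$ for every subset I of the indices. The disjoint subsets $Y_1$ and $Y_2$ are conditionally independent over X if $f(XY_1)+f(XY_2)=f(XY_1Y_2)+f(X)$; and $Y_1,\dots,Y_k$ are completely conditionally independent over X if $Y_I$ and $Y_J$ are conditionally independent over X for arbitrary disjoint subsets I and J of the indices. An equivalent condition is
$$f(XY_1\dots Y_k)-f(X)=\sum_{i=1}^{k}\big(f(XY_i)-f(X)\big),$$
which similarly implies $f(XY_I)-f(X)=\sum_{i\in I}\big(f(XY_i)-f(X)\big)$ for every index set I.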
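The equivalent condition for complete conditional independence is a single equation and can be tested directly on a tabulated set function. In the sketch below the names and the two toy rank functions are our own; sets are passed as Python sets of element labels.

```python
# Illustrative sketch; names and toy rank functions are hypothetical.
from itertools import combinations


def cond(f, A, B):
    """f(A|B) = f(AB) - f(B) for a set function on frozensets, f(empty) = 0."""
    def val(S):
        S = frozenset(S)
        return f[S] if S else 0.0
    return val(set(A) | set(B)) - val(B)


def completely_cond_independent(f, X, parts, tol=1e-9):
    """Check f(Y_1...Y_k | X) == f(Y_1|X) + ... + f(Y_k|X)."""
    union = set().union(*parts)
    return abs(cond(f, union, X) - sum(cond(f, Y, X) for Y in parts)) <= tol


def tabulate(N, rank):
    return {frozenset(A): rank(A)
            for r in range(1, len(N) + 1) for A in combinations(N, r)}


free = tabulate("xuv", lambda A: float(len(A)))         # every element independent
u23 = tabulate("xuv", lambda A: float(min(len(A), 2)))  # uniform matroid U_{2,3}
```

In `free`, u and v are conditionally independent over x; in `u23` they are not, since $f(uv|x)=1$ while $f(u|x)+f(v|x)=2$.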
3. The Maximum Entropy Method
In general terms, the principle of maximum entropy is easy to formulate: if a probability distribution is specified only partially, take the one with the largest entropy, see, e.g., [21]. In the particular case applied here, “partial specification” means fixing some, but not all, marginal distributions. To be more concrete, suppose $\xi=\langle\xi_i: i\in N\rangle$ is distributed jointly on the base set N. Partition N into three non-empty subsets as $N=X\cup^*Y\cup^*Z$. Take n disjoint copies $Y_1,\dots,Y_n$ of Y and m disjoint copies $Z_1,\dots,Z_m$ of Z to form the enlarged base set
$$\hat N=X\cup^*Y_1\cup^*\dots\cup^*Y_n\cup^*Z_1\cup^*\dots\cup^*Z_m.$$
Consider the collection of those distributions η on $\hat N$ whose marginals on $XY_i$ are equal to $\xi_{XY}$, and whose marginals on $XZ_j$ are equal to $\xi_{XZ}$. That is, the marginal of ξ on XY and the marginals of η on all $XY_i$ are the same, and so are the marginal of ξ on XZ and the marginals of η on all $XZ_j$. This collection of distributions is not empty, as one can take each copy $Y_i$ to be identical to Y, and each copy $Z_j$ to be identical to Z. The total entropy is a strictly concave function of the probability masses, and fixing certain marginals imposes linear constraints on those masses; hence the admissible distributions form a non-empty compact convex set. Consequently, there is a unique optimal distribution $\eta^*$ with maximum total entropy, see [32]. Although structural properties of maximum entropy distributions are largely unknown, they are known to satisfy numerous conditional independencies. For this particular case, these are stated as Lemma 1 below.
Lemma 1. In the distribution with maximum total entropy, the subsets $Y_1,\dots,Y_n,Z_1,\dots,Z_m$ are completely conditionally independent over X.
Proof. If some of the conditional independence statements do not hold, then one can redefine the distribution, keeping the specified marginals while increasing the total entropy. For details, see [20]. □
Since identical distributions have identical entropy profiles, Lemma 1 immediately implies that an entropic polymatroid has an $(n,m)$-copy as defined below:
Definition 1. Let f be a polymatroid on N, and partition N into three non-empty subsets as $N=X\cup^*Y\cup^*Z$. Let $Y_1,\dots,Y_n$ and $Z_1,\dots,Z_m$ be disjoint copies of Y and Z, respectively. The polymatroid g on the base set $\hat N=XY_1\dots Y_nZ_1\dots Z_m$ is an $(n,m)$-copy of f if
- (i) g restricted to $XY_i$ is isomorphic to f restricted to XY for every $i=1,\dots,n$,
- (ii) g restricted to $XZ_j$ is isomorphic to f restricted to XZ for every $j=1,\dots,m$,
- (iii) the subsets $Y_1,\dots,Y_n,Z_1,\dots,Z_m$ are completely conditionally independent over X in g.
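For an entropic polymatroid, a copy in the sense of Definition 1 can be produced explicitly. Complete conditional independence over X together with the prescribed marginals on $XY_i$ and $XZ_j$ determines the joint distribution, namely $\eta(x,y_1,\dots,z_m)=p(x)\prod_i p(y_i|x)\prod_j p(z_j|x)$; since the maximum entropy extension satisfies both (Lemma 1), it is exactly this distribution. The sketch below builds it under the simplifying assumption, made only for illustration, that X, Y and Z each consist of a single variable; all names and the example distribution are hypothetical.

```python
# Illustrative sketch; assumes X, Y, Z are single variables (coordinates 0, 1, 2).
from itertools import product
from math import log2


def marginal(dist, coords):
    """Marginal of a distribution {outcome tuple: prob} on the given coordinates."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[c] for c in coords)
        out[key] = out.get(key, 0.0) + p
    return out


def H(dist, coords):
    """Entropy in bits of the marginal on `coords`."""
    return -sum(p * log2(p) for p in marginal(dist, coords).values() if p > 0)


def ci_extension(xi, n, m):
    """Extend xi on (X, Y, Z) to (X, Y_1..Y_n, Z_1..Z_m) with
    eta = p(x) * prod_i p(y_i | x) * prod_j p(z_j | x)."""
    px, pxy, pxz = marginal(xi, (0,)), marginal(xi, (0, 1)), marginal(xi, (0, 2))
    ys = sorted({y for (_, y) in pxy})
    zs = sorted({z for (_, z) in pxz})
    eta = {}
    for (x,), p in px.items():
        if p == 0:
            continue
        for yy in product(ys, repeat=n):
            for zz in product(zs, repeat=m):
                q = p
                for y in yy:
                    q *= pxy.get((x, y), 0.0) / p  # conditional p(y | x)
                for z in zz:
                    q *= pxz.get((x, z), 0.0) / p  # conditional p(z | x)
                if q > 0:
                    eta[(x,) + yy + zz] = q
    return eta


xi = {(0, 0, 0): 0.3, (0, 1, 1): 0.2, (1, 0, 1): 0.1, (1, 1, 0): 0.4}
eta = ci_extension(xi, n=2, m=1)  # coordinates of eta: x, y_1, y_2, z_1
```

Taking entropy profiles, the marginals of `eta` on $(x,y_i)$ and $(x,z_1)$ reproduce those of `xi`, giving conditions (i) and (ii), and $H(Y_1Y_2Z_1|X)$ splits as the sum of the individual conditional entropies, giving condition (iii).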
The special version of the Maximum Entropy Method used in this paper is based on the fact that entropic polymatroids have $(n,m)$-copies, that is, copies in the sense of Definition 1 with n copies of Y and m copies of Z. For fixed integers n and m, polymatroids on N that have an $(n,m)$-copy form a polyhedral cone $\mathcal C_{n,m}$; this is proved as Claim 1 below. The cone $\mathcal C_{n,m}$ contains the complete entropy region $\Gamma^*_N$ and is contained in the Shannon cone $\Gamma_N$. Consequently, bounding facets of $\mathcal C_{n,m}$ that are not facets of the Shannon cone provide new entropy inequalities. This method is summarized as follows.
Maximum Entropy Method (special case). Fix the base set N and the partition $N=X\cup^*Y\cup^*Z$. For $n,m\ge 1$ let $\mathcal C_{n,m}$ be the polyhedral cone of those polymatroids on N that have an $(n,m)$-copy. Compute all bounding facets of $\mathcal C_{n,m}$ as homogeneous linear inequalities, and delete those which are consequences of the basic Shannon inequalities. The remaining inequalities form the maximal set of non-Shannon inequalities provided by the partition and the numbers n and m.
Let us remark that while the maximum entropy extension is unique, the $(n,m)$-copy in Definition 1 typically is not, as the definition captures only a small part of the properties of the maximum entropy extension. The obtained entropy inequalities are facets of a polyhedral cone; consequently, they are independent in the sense that none of them is a consequence of the others or of the Shannon inequalities.
Next we prove that $\mathcal C_{n,m}$ is indeed a polyhedral cone.
Claim 1. Polymatroids with an $(n,m)$-copy form a polyhedral cone.
Proof. Consider the polymatroid f as a $(2^{|N|}-1)$-dimensional vector indexed by the non-empty subsets of N. Write this vector as $f=(f_1,f_2)$, where $f_1$, of dimension $d_1$, contains those coordinates where the index I is a subset of either XY or XZ, and $f_2$, of dimension $d_2$, contains the rest, namely those subsets that intersect both Y and Z. Clearly, $d_1+d_2=2^{|N|}-1$. Similarly, let g be the vector formed from the values of the $(n,m)$-copy polymatroid, indexed by the non-empty subsets of the enlarged base set $\hat N=XY_1\dots Y_nZ_1\dots Z_m$; it has dimension $2^{|\hat N|}-1$. Now, g is a polymatroid if it satisfies all linear inequality constraints imposed by the basic Shannon inequalities in (B1) and (B2), and it is an $(n,m)$-copy of f if, additionally, the composed vector $(f_1,g)$ satisfies the equality constraints corresponding to conditions (i)–(iii) in Definition 1. Consequently, there exists a matrix M with $d_1+2^{|\hat N|}-1$ columns, depending only on the partition $N=X\cup^*Y\cup^*Z$ and the numbers n and m, such that f has an $(n,m)$-copy if and only if there is a vector g satisfying $M(f_1,g)\ge 0$ (each equality constraint being written as two opposite inequalities). Similarly, f is a polymatroid if, for another matrix B with $2^{|N|}-1$ columns expressing the basic Shannon inequalities for N, we have $Bf\ge 0$. Thus the collection of polymatroids on N that have an $(n,m)$-copy is the set
$$\mathcal C_{n,m}=\{\,f=(f_1,f_2):\ Bf\ge 0\ \text{and}\ M(f_1,g)\ge 0\ \text{for some}\ g\,\}.$$
Here M and B are matrices with integer entries; these matrices depend only on the partition, n, and m. Since $\mathcal C_{n,m}$ is the intersection of a polyhedral cone and the projection of a polyhedral cone, it is also a polyhedral cone, as claimed. □
From the proof it is clear that the $f_2$-part of f is constrained only by the basic Shannon inequalities encoded in the matrix B. Furthermore, the constraints on $f_1$ imposed by the first condition are contained in the second one. Thus, it suffices to consider the bounding facets of the projection
$$\mathcal P_{n,m}=\{\,f_1:\ M(f_1,g)\ge 0\ \text{for some}\ g\,\}$$
for new entropy inequalities. This is because, by the duality theorem of linear programming [28], facets of $\mathcal C_{n,m}$ are convex linear combinations of facets of $\mathcal P_{n,m}$ and facets corresponding to the basic Shannon inequalities for the base set N.
Coordinates in $\mathcal P_{n,m}$ are indexed by subsets of XY and XZ, so the inequalities provided by the bounding facets of $\mathcal P_{n,m}$ involve only values of the restrictions $f|XY$ and $f|XZ$. We emphasize that these restrictions are not arbitrary polymatroids on XY and XZ with a common restriction on X, as they also have a common extension, namely f. Conditions ensuring the existence of such a common extension are assumed to hold, see [33], and they do not contribute towards the non-Shannon entropy inequalities we are searching for.
5. Computation
The cone whose non-Shannon bounding facets provide the new entropy inequalities sits in the 19-dimensional Euclidean space with coordinates indexed by the non-empty subsets of XY and XZ
. Fix the number of copies to
. This choice also fixes the dimension
of the vector
. The generating matrix M of the polyhedral cone from (8) is repeated here:
The modular part of this cone is 5-dimensional, and so its tight part sits in a 14-dimensional subspace of $\mathbb R^{19}$.
A structural property of the polymatroid region on the four-element set allows us to further reduce the complexity of the polyhedral computation required in step (2) above. This region has a central part and six permutationally equivalent “protrusions,” depending on the signs of its six Ingleton expressions. If all of them are non-negative, then the restriction of the polymatroid to this four-element set is linear; otherwise exactly one of these Ingleton expressions is negative, see, e.g., [24]. Accordingly, the cone is cut into seven parts by these Ingleton hyperplanes: the central part, where all Ingleton values are non-negative, and six other parts, where exactly one of the expressions is negative. The facets of each part can be computed separately.
Parts of on the negative side of , , , and are isomorphic because swapping and/or are symmetries of . Therefore, it suffices to consider only one of them. The central part, where every Ingleton expression is non-negative, does not yield new inequalities. This follows from Lemma 4 below, as the elements of the central part are linear.
Lemma 4. If f restricted to is linear, then f has an n-copy for all .
Proof. Since every polymatroid on three elements is linear, and linearly representable polymatroids on three or four elements are representable over any field, we can assume, after scaling and using continuity, that both and are -linearly representable over the same finite field . Denote the two representing vector spaces by and , and consider the subspace arrangements and in the two vector spaces. Now and have dimensions and , respectively, and their linear span has dimension . Therefore, these arrangements are isomorphic, and and can be glued along the linear span of and . This gluing yields an -linear polymatroid g that has the same restrictions on and on as f does. Since this g is entropic, it has an n-copy for every . This n-copy is also an n-copy of f, as required. □
Consequently, up to the and symmetries, three mutually exclusive cases are left: , , and . Using the homogeneity of , the Ingleton value can be set to , in effect taking a cross-section of that has one fewer dimension. Facets of the part of we are considering are also facets of these cross-sections; consequently, only facets of the cross-sections need to be computed. We consider these three cases separately in the subsections below.
The definition (16) of the cone uses the 19-dimensional coordinate system where the coordinates of the vector are labeled by the non-empty subsets of XY and XZ. In all three cases we perform calculations in different coordinate systems that are chosen so that
- the first coordinate is the Ingleton expression defining the cross-section;
- the tight and modular parts of the cross-section have disjoint coordinates;
- apart from the Ingleton coordinate, all other coordinates have non-negative values.
The first property allows us to set the Ingleton value explicitly. Based on the second property, the tight part of the cross-section can be separated by dropping some coordinates, and the third property potentially reduces the complexity of the polyhedral enumeration algorithm.
5.1. Case I
The cone
is intersected with the hyperplane
. In this case we use the coordinate system
Coordinates
cover the modular part of
. The tight part is spanned by the coordinate vectors
, and each of these vectors is orthogonal to the modular part. Let
be the inverse of the matrix of this coordinate transformation, and the vector
be the first row of
. Let
be the submatrix formed from rows 2 to 14 of
. Coordinates of the vector
in this coordinate system are
, and, in particular, the Ingleton value
is the scalar product
. Consequently, the tight part of the intersection of
and the hyperplane
in this coordinate system is
Finding all facets of the polyhedron determined by the matrix M and the submatrix introduced above is closely related to linear multiobjective optimization [36], and can benefit significantly from working in the 13-dimensional target space [37] instead of the significantly higher-dimensional problem space. We have developed a variant of Benson’s inner approximation algorithm [22,38] that takes advantage of the additional special property that the polyhedron lies in the non-negative orthant of the target space. The program, version 1.3, is available on GitHub at https://github.com/csirmaz/information-inequalities-5 (accessed on 5 January 2026).
Table 1 shows the sizes of the generating matrix M, the total number of facets and vertices (including extremal directions) of the cross-section, and the running time of the vertex enumeration algorithm on a single-core desktop computer with an Intel® Core™ i5-4590 CPU @ 3.30 GHz processor and 8 GB of memory. The running time was taken up almost exclusively by the underlying LP solver. While the number of facets grows quite moderately with n, the number of vertices more than doubles at each generation. The matrix M, despite the numerous improvements, is highly degenerate, and numerical instability, originating from both the LP solver and the applied polyhedral algorithm, prevented the completion of the computation for larger values of n. The results of the computation are presented in Section 6.
5.2. Case II
The cone
is intersected with the hyperplane
. The coordinate system is similar to the one used in
Section 5.1. Base elements
b and
c are swapped in coordinates
, while the other coordinates remain unchanged. The tight part of the intersection, denoted by
, is defined similarly with the same matrix
M but a different coordinate transformation matrix
, vector
, and submatrix
as
The problem size, number of facets and vertices, and the running time in seconds are summarized in
Table 2. Both the number of facets and the number of vertices grow moderately. A plausible conjecture is that, in general, the number of facets is
, and the number of vertices is
.
The running time is significantly shorter than in Section 5.1. This is explained by the fact that the polyhedral algorithm requires solving an LP instance for each vertex and each facet in the result, and those numbers are significantly smaller here. The generating matrix M is the same in both cases, implying that the problem size is the same. Numerical instability prevented completing the computation for larger values of n in this case as well.
5.3. Case III
No new inequality is generated when the cone is intersected with the hyperplane . This can be proved as follows. Since this intersection, denoted by , is an (unbounded) polyhedron, every polymatroid in is a conic combination of its vertices and extremal directions. These vertices and extremal directions can be represented by certain extremal polymatroids. Conic combinations of polymatroids that have an n-copy also have an n-copy. Consequently, it suffices to show that these extremal polymatroids have an n-copy for all .
Changing the first 11 coordinates of the coordinate system used in
Section 5.1 to
and keeping the rest, the vertex enumeration algorithm used in the previous cases generated the vertices and extremal directions of the 13-dimensional tight part of
. The computation showed that it is a pointed cone with a single vertex that has coordinates
equal to zero (while
) and has 14 extremal directions, 12 of which are coordinate axes. Polymatroids representing the extremal directions are linear when restricted to the base set
(they satisfy
; therefore, the other Ingleton values are also non-negative). Consequently, these polymatroids have an
n-copy for all
. Finally, the remaining polymatroid at the single vertex has
(as the coordinate
is zero), which means that
is modular. By Claim 4 it also has an
n-copy for all
. This concludes the proof that no non-Shannon inequality is generated in this case.
8. The Minimal Set of Inequalities
Experimental results reported in Section 5 and discussed in Section 6 provided the complete list of five-variable non-Shannon entropy inequalities implied by the existence of an n-copy for the values of n covered by the computations. Two families of non-Shannon inequalities, generalizing the ones found experimentally, were proven, in Theorem 1 and Theorem 2, respectively, to hold in every polymatroid with an n-copy. We conjecture that these families actually characterize those five-variable polymatroids that have an n-copy, so no further non-Shannon inequalities can be discovered by the version of the Maximum Entropy Method utilized in this paper.
In the case the family of non-Shannon inequalities provided by Theorem 2 matches exactly the inequalities obtained experimentally for .
In the
case the family provided by Theorem 1 is parametrized by the downward closed subsets
S of the diagonal set
. Not all of the generated inequalities correspond to facets of the cone
. While they are valid non-Shannon inequalities, some of them are consequences of others.
Table 4 shows the downward closed subsets of
as well as the corresponding
triplets from Definition 4. Two triplets, marked by ∗, are not in
Table 3. The corresponding inequality
with
and
is the average of the inequalities obtained from the triplets numbered 6, 10 and 13; thus, it is a consequence of them. The main goal of this Section is to obtain a description of those downward closed subsets of
that generate facets of
, that is, inequalities that are not consequences of the others.
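Before selecting the extremal ones, the candidate sets themselves can be enumerated. The exact definition of the diagonal set is given elsewhere in the paper; purely as an assumption for this sketch, we take the triangle $T_n=\{(i,j): i,j\ge 0,\ i+j<n\}$ and let the first coordinate index the columns. The code lists the downward closed subsets of $T_n$ by their column sizes and converts them into the two descriptions by column maxima and row maxima that appear later in this Section; all names are hypothetical.

```python
# Illustrative sketch; T_n and all names are hypothetical stand-ins.


def downward_closed_subsets(n):
    """All downward closed subsets of T_n = {(i, j) : i, j >= 0, i + j < n}.

    A set is encoded by its column sizes h_0 >= h_1 >= ... > 0, where column i
    holds the points (i, 0), ..., (i, h_i - 1); the empty tuple is the empty set.
    """
    found = []

    def extend(heights):
        found.append(tuple(heights))
        i = len(heights)
        # Column i may hold at most n - i points and no more than column i - 1.
        cap = n - i if i == 0 else min(heights[-1], n - i)
        for h in range(1, cap + 1):
            extend(heights + [h])

    extend([])
    return found


def points(heights):
    """The lattice points of the set encoded by its column sizes."""
    return {(i, j) for i, h in enumerate(heights) for j in range(h)}


def column_maxima(heights):
    """Maximal second coordinate in columns 0, 1, ... (non-increasing)."""
    return [h - 1 for h in heights]


def row_maxima(heights):
    """Maximal first coordinate in rows 0, 1, ... (the conjugate description)."""
    top = heights[0] if heights else 0
    return [max(i for i, h in enumerate(heights) if h > j) for j in range(top)]
```

For n = 1, 2, 3 the enumeration finds 2, 5 and 14 sets, the empty set included. The row maxima are the conjugate of the column maxima; for example, column maxima (2, 0) correspond to row maxima (1, 0, 0).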
Since the inequality (
56) contains the fixed term
, and trivially holds true when
, it is a consequence of the inequalities obtained from the triplets
if there is a convex combination
such that
,
, and
. In this case we say that
is
superseded by the set
. If
is not superseded by other elements of this family, then
is called
extremal. Actually, by the above observation, extremal vectors are the vertices of the convex hull of the set of triplets
as
S runs over the downward closed subset of
. By Carathéodory’s theorem, see [
28],
is superseded if and only if it is (also) superseded by a set with at most three elements.
Lemma 11 below gives a necessary and sufficient condition for the vector to be superseded by a special three-element set. For a subset S of we write for adding the point to S, and to remove from S. In the first case it is tacitly assumed that is not in S, and in the second case that .
Lemma 11. Let , and .
- (i) is superseded by the vectors if and only if
- (ii) is superseded by if and only if
Proof. We prove (i) only; the proof of (ii) is similar. Let
,
and
. Then, according to Definition 4,
is superseded by these vectors if there are non-negative numbers
,
,
with
such that
Since the sum of the
’s is 1, this system is equivalent to
Clearly,
must be strictly positive as
,
, and
are positive. Introducing
and
, this system is equivalent to
One can assume that the first inequality holds with equality. Since
, the second inequality holds when
is
above the point that splits the interval
in ratio
to
. Similarly,
implies that the third inequality holds when
is
below the point that splits
in the same ratio. Thus, non-negative numbers
and
satisfying these three inequalities exist if and only if the proportion of
in
is not larger than the proportion of
in the interval
, that is,
This condition is equivalent to the one given in the claim. □
Corollary 1. Let , and . Assume both and are in S, while and for . When the condition with negative values is assumed to hold. If is not in the open interval then is superseded by vectors generated by one of the following two triplets:
Proof. If the slope is less than or equal to the lower limit, then part (i) of Lemma 11 applies to the first triplet. If the slope is at or above the upper limit, then part (ii) of that lemma applies to the second triplet. □
A downward closed set
can be specified in two ways: either by a non-increasing sequence
specifying the maximal values in columns 0, …,
k, or by a non-increasing sequence
specifying the maximal values in rows 0, …,
ℓ. It is easy to see that
Corollary 2. If the vector for some is not superseded by other vectors generated by subsets of , then either the sequence is strictly decreasing, or the sequence is strictly decreasing.
Proof. If
is not strictly decreasing, then the upper boundary of
S contains a horizontal segment of length at least 2. Similarly, if
is not strictly decreasing, then the right boundary of
S contains a vertical segment of length at least 2; see
Figure 1. Take a horizontal and a vertical segment of this kind whose distance is minimal. Let the horizontal segment be in row
r between columns
and
, and the vertical segment be in column
c between rows
and
. The horizontal and vertical segments are connected by a (possibly empty) diagonal staircase. Depending on which segment comes first, there are two possible arrangements, as depicted in Figure 1.
In the first case
, and
; in the second case
and
. Apply Lemma 11 to the marked points and observe that the modified downward closed sets are always subsets of
. In the first case
and in the second case
Therefore, by Lemma 11,
is superseded by the vectors generated by the indicated sets, proving the claim. □
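The two descriptions of a downward closed set, and the necessary condition of Corollary 2, can be made concrete with a short Python sketch. It assumes that the column sequence lists, for columns 0, …, k, the largest y-value in that column, and derives the row sequence as the largest x-value in each row. The function names are hypothetical, and the sketch does not decide extremality; it only tests whether at least one of the two sequences is strictly decreasing.

```python
def rows_from_cols(cols):
    """Row sequence of S = {(x, y) : 0 <= y <= cols[x]}.

    Assumed convention: cols is non-increasing and cols[x] is the largest
    y-value in column x; entry y of the result is the largest x-value in row y.
    """
    return [max(x for x, h in enumerate(cols) if h >= y)
            for y in range(cols[0] + 1)]


def strictly_decreasing(seq):
    return all(a > b for a, b in zip(seq, seq[1:]))


def passes_corollary_2(cols):
    """Necessary condition for extremality: the column sequence or the
    row sequence must be strictly decreasing."""
    return strictly_decreasing(cols) or strictly_decreasing(rows_from_cols(cols))


print(rows_from_cols([2, 1, 1, 0]))  # [3, 2, 0]
print(passes_corollary_2([2, 2, 1]))  # False
```

For example, the column sequence (2, 1, 1, 0) has row sequence (3, 2, 0), so it passes the test, while (2, 2, 1) fails because neither sequence is strictly decreasing. Applying `rows_from_cols` twice returns the original sequence, mirroring the symmetry between the two descriptions.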
By Corollary 2, the downward closed set corresponding to an extremal vertex is either a staircase with step heights 1 (when is strictly decreasing), which we call horizontal, or the mirror image of such a staircase. The only configuration that belongs to both cases is the diagonal . It will be more convenient to use the column-sequence to represent horizontal staircases. Here is the length of the staircase, also denoted by . The last column size (height) is necessarily , and equals either or for every . In the rest of this section, all staircases, if not mentioned otherwise, are horizontal ones.
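With the same column convention, a horizontal staircase (one whose row sequence is strictly decreasing) can also be recognized from its column sequence alone. The Python sketch below assumes the reading that the last column has height 0 and that each column is either as high as the previous one or exactly one lower, and checks exhaustively on small cases that this agrees with the row-sequence definition. All helper names are hypothetical.

```python
from itertools import product


def rows_from_cols(cols):
    """Largest x-value in each row of S = {(x, y) : y <= cols[x]} (assumed convention)."""
    return [max(x for x, h in enumerate(cols) if h >= y)
            for y in range(cols[0] + 1)]


def is_horizontal(cols):
    """Horizontal staircase: the row sequence is strictly decreasing."""
    rows = rows_from_cols(cols)
    return all(a > b for a, b in zip(rows, rows[1:]))


def has_unit_steps(cols):
    """Last column has height 0, and each column is as high as the
    previous one or exactly one lower."""
    return cols[-1] == 0 and all(a - b in (0, 1) for a, b in zip(cols, cols[1:]))


def non_increasing(length, max_height):
    """All non-increasing sequences of the given length with entries in 0..max_height."""
    for seq in product(range(max_height + 1), repeat=length):
        if all(a >= b for a, b in zip(seq, seq[1:])):
            yield list(seq)


# Exhaustive check on small cases that the two characterizations agree.
for n in range(1, 6):
    for cols in non_increasing(n, 5):
        assert is_horizontal(cols) == has_unit_steps(cols)
```

Since each of the first n - 1 columns independently keeps or lowers the height by one and the last height is forced to be 0, this reading gives 2^(n-1) horizontal staircases of length n.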
Definition 5. The staircase S is Positive-Negative-Positive (PNP)-reducible in if there are and such that , , and are staircases in and is superseded by . S is PNP-irreducible if it is not PNP-reducible.
Negative-Positive-Negative (NPN)-reducibility and NPN-irreducibility are defined analogously, using staircases , , and , assuming that they are also subsets of . S is irreducible in if it is both PNP- and NPN-irreducible. Finally, let be the collection of the irreducible staircases that are subsets of .
By the remark at the beginning of this section, by Lemma 11, and by Corollary 2, extremal vertices are generated by elements of and by their mirror images. We describe an incremental algorithm that generates the elements of the collection .
A horizontal staircase
S of length
n can be recovered from a unique horizontal staircase
of length
as follows. If
has the column sequence
, then
S is defined by one of the column sequences
depending on whether the last two elements of the column sequence of
S are equal.
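The recovery rule above can be sketched as a pair of inverse maps. The code assumes the following reading of the rule, consistent with the surrounding discussion (horizontal staircases end in a zero-height column, and Claim 6 mentions extending a staircase by a column of height zero): a staircase of length n arises from its parent of length n - 1 either by appending a column of height zero (when the last two heights of S are equal) or by raising every column by one and then appending a column of height zero. The function names are hypothetical.

```python
def children(parent):
    """The two length-n horizontal staircases whose parent has length n - 1.

    Assumed rule: append a zero-height column, or raise every column by
    one and then append a zero-height column.
    """
    return [parent + [0], [h + 1 for h in parent] + [0]]


def parent_of(cols):
    """Inverse of `children`: if the last two heights are equal, drop the
    last column; otherwise drop it and lower every remaining column by one."""
    if len(cols) < 2:
        raise ValueError("a staircase of length 1 has no parent")
    if cols[-2] == cols[-1]:
        return cols[:-1]
    return [h - 1 for h in cols[:-1]]


def all_staircases(n):
    """All horizontal staircases of length n, grown from the single point [0]."""
    level = [[0]]
    for _ in range(n - 1):
        level = [child for p in level for child in children(p)]
    return level


print(all_staircases(3))
# [[0, 0, 0], [1, 1, 0], [1, 0, 0], [2, 1, 0]]
```

Under this assumption every horizontal staircase of length n has exactly one parent, so each generation doubles; the diagonal (n - 1, …, 1, 0) and the flat staircase (0, …, 0) appear in every generation.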
Claim 6.
- (i) Suppose S has length n. S is irreducible in if and only if it is irreducible in for any .
- (ii) If and S is irreducible in , then is irreducible in .
- (iii) If but , then and S is PNP-reducible in with and .
- (iv) If and , then either S is NPN-reducible with and , or it is PNP-reducible with and .
Proof. (i) is immediate from the definition as the staircases must be subsets of .
(ii) Assume is reducible in , as shown by the staircases , and . Since they are in , they can be lifted back to , , in . According to Lemma 11, these staircases witness the reducibility of S.
(iii) If S is reducible in but not in , then is not in , leading to the stated condition.
(iv) If S is reducible in while is not reducible in , then the reduction must use , which is in but not in . If it is an NPN-reduction then it must use the newly added point ; in other cases the reduction can be shifted back to . In the case of a PNP-reduction this additional point is (when extending the staircase by a column of height zero), or can be shifted back to again. □
Based on Claim 6, the incremental algorithm sketched as Algorithm 1 generates all horizontal irreducible staircases. PNP- and NPN-irreducibility can be checked using Lemma 11. The last point is fixed, and a naïve implementation requires quadratic running time in . With some simple bookkeeping, this can be reduced to a single backward scan of the column sequence, resulting in linear running time.
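A skeleton of the incremental generation in Python, following our reading of Claim 6: staircases accepted in generation n - 1 are re-tested in generation n, and new staircases of length n are generated only from accepted staircases of length n - 1. The irreducibility test is passed in as a hypothetical callback `is_irreducible(S, n)` standing in for the PNP/NPN checks based on Lemma 11, which this sketch does not implement. The extension rule is also an assumption: append a zero-height column, or raise all columns by one and then append a zero-height column.

```python
from typing import Callable, List

Staircase = List[int]


def children(parent: Staircase) -> List[Staircase]:
    """Assumed extension rule for horizontal staircases (column sequences)."""
    return [parent + [0], [h + 1 for h in parent] + [0]]


def incremental_generation(
    max_n: int, is_irreducible: Callable[[Staircase, int], bool]
) -> List[List[Staircase]]:
    """Return the accepted staircases of generations 1, ..., max_n.

    `is_irreducible(S, n)` is a hypothetical placeholder for the PNP/NPN
    tests of generation n; it is not implemented here.
    """
    current: List[Staircase] = [[0]]
    history = [current]
    for n in range(2, max_n + 1):
        # Earlier staircases may become reducible once generation n offers more candidates.
        kept = [s for s in current if is_irreducible(s, n)]
        # Per our reading of Claim 6 (ii), new length-n staircases only
        # come from accepted staircases of length n - 1.
        new = [c for s in current if len(s) == n - 1
               for c in children(s) if is_irreducible(c, n)]
        current = kept + new
        history.append(current)
    return history


everything = incremental_generation(5, lambda s, n: True)
print([len(level) for level in everything])  # [1, 3, 7, 15, 31]
```

With a callback that accepts everything, generation n contains all 2^n - 1 horizontal staircases of length at most n, which is a quick sanity check of the bookkeeping.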
Using this algorithm, we have computed the complete set of irreducible staircases up to
. The number of new staircases that remained irreducible in each subsequent generation matches the sequence A103116 in the On-Line Encyclopedia of Integer Sequences [39]:
where
is Euler’s totient function. This suggests that the connection is based on the number of distinct slopes determined by the lattice points in a rectangle. Proving the equivalence of these two sequences is an intriguing open problem.
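The remark about slopes can be illustrated with a classical identity: the number of distinct slopes q/p with 1 ≤ p, q ≤ n equals the number of coprime pairs in that square, namely 2(φ(1) + … + φ(n)) - 1. The Python sketch below checks this by brute force. It illustrates only the connection between the totient function and slopes; it does not compute A103116 and makes no claim about the formula displayed above. The function names are hypothetical.

```python
from math import gcd


def totient(n: int) -> int:
    """Euler's totient function via trial-division factorization."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result


def distinct_slopes(a: int, b: int) -> int:
    """Number of distinct slopes q/p with 1 <= p <= a and 1 <= q <= b,
    counted as reduced fractions."""
    return len({(q // gcd(p, q), p // gcd(p, q))
                for p in range(1, a + 1) for q in range(1, b + 1)})


for n in range(1, 8):
    print(n, distinct_slopes(n, n), 2 * sum(totient(k) for k in range(1, n + 1)) - 1)
```

The pair (1, 1) is the only one counted twice when summing φ over rows and columns, which explains the "- 1".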
For better visualization, triplets
corresponding to these irreducible staircases are plotted as the three-dimensional points
using a logarithmic scale for the third
coordinate. The plot in
Figure 2 contains all 126,981 extremal triplets in the range
. Some of the plotted triplets appear as late as generation
; later generations do not contribute to this part of the complete set. For comparison, some triplets in the 80th generation have values larger than
.
Algorithm 1: Generating irreducible staircases
To explain the shape of the surface of extremal triplets plotted in
Figure 2, we provide some heuristic reasoning. A consequence of Corollary 1 is that if the extremal triplet
is computed from the staircase
S, then the slopes determined by the step edges
(namely, points of
S where neither
nor
are in
S) are almost equal. Consequently, on a large scale, extremal
vectors are generated by the set of lattice points in right-angled triangles defined by the inequality
for some positive values of
a and
b. Since, by Lemma 5,
the vector
is well approximated by
, where
is the point where
takes its maximal value. As the function
strictly increases in both coordinates, this maximum is attained on the boundary diagonal of the right-angled triangle
that has endpoints
and
. Using Stirling’s formula
, we have
Introducing
, we see that the logarithm of
is well approximated by the function
Using this approximation, the point
is extremal in the triangle
if
is on the boundary diagonal and
has zero derivative along this diagonal. For fixed
u and
v, such positive
a and
b exist if and only if the partial derivatives
and
are positive at
. By inspection, this condition is satisfied for every
. Consequently, if
is an extremal triplet, then choosing
,
, we expect
and, conversely, for each
u,
v, with the choice
,
, and
, we expect the triplet
to be extremal. For comparison,
Figure 3 plots these triplets over the same range that was used in
Figure 2.
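For reference, the standard form of Stirling’s formula is log n! = n log n - n + ½ log(2πn) + O(1/n), and it yields the familiar estimate log C(n, k) ≈ n h(k/n), with h the binary entropy in nats. The Python sketch below checks both numerically using `math.lgamma`; the binomial check uses the textbook bounds e^{n h(k/n)}/(n + 1) ≤ C(n, k) ≤ e^{n h(k/n)}. It is a generic illustration of these standard estimates and makes no claim about the exact function approximated in the heuristic above.

```python
import math


def log_factorial(n: int) -> float:
    """Natural logarithm of n!, computed via the log-gamma function."""
    return math.lgamma(n + 1)


def stirling(n: int) -> float:
    """Stirling's approximation of log(n!) including the sqrt(2*pi*n) factor."""
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)


def log_binomial(n: int, k: int) -> float:
    return log_factorial(n) - log_factorial(k) - log_factorial(n - k)


def entropy(p: float) -> float:
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


for n in (1, 10, 100, 1000):
    # The difference is positive and shrinks roughly like 1/(12n).
    print(n, log_factorial(n) - stirling(n))

# Check 0 <= n h(k/n) - log C(n, k) <= log(n + 1) for every k.
n = 200
for k in range(n + 1):
    gap = n * entropy(k / n) - log_binomial(n, k)
    assert -1e-9 <= gap <= math.log(n + 1) + 1e-9
```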
This approximation seems to slightly underestimate the real value of
. For example, the extremal triplet obtained from the diagonal staircase
is
thus,
, and
. At the same time,
Extremal triplets on the two edges of the surface are specified by the totally flat, stairless staircases. These triplets are
on one axis, and
,
swapped on the other. In this case, the
pair is
, and
which differs from the correct value only by a constant.
We have also looked at how the newly discovered entropy inequalities delimit the 5-variable entropy region. The triplet
yields the inequality
Since the closure of the 5-variable entropy region is a pointed convex cone, one can normalize it by assuming
. An equivalent view is to take the cross-section of
by this hyperplane. Consider the three-dimensional subspace spanned by the vectors
observe that
is negated. Normalize the five-variable entropic function
f so that it satisfies
, then project it to this subspace. Use the scalar products
as the projection coordinates. This three-dimensional cross-section of the five-variable entropy region is
Clearly, points in
have non-negative
x and
y coordinates, while the
z coordinate can take both positive and negative values. Since
is a closed convex cone,
is closed and convex. We concentrate on the part above the
plane:
Shannon inequalities provide no restriction whatsoever on
as any non-negative coordinate triplet can be realized by some polymatroid. To show this, define
for any subset
I of the ground set
as
Let, moreover,
be the function
as
A runs over the non-empty subsets of
. Both
and
are extremal rays of
, so they are polymatroids. For arbitrary non-negative numbers
the linear combination
satisfies
and has coordinates
, providing the required polymatroid.
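Membership in the Shannon bound means being a polymatroid: a normalized, monotone, submodular set function. The brute-force Python sketch below checks these axioms on a five-element ground set and confirms that a non-negative combination of polymatroids is again one, which is the closing step of the argument above. The code does not reconstruct the two functions of the text; it uses rank functions of uniform matroids as hypothetical stand-ins, and all names are illustrative.

```python
from itertools import combinations


def all_subsets(ground):
    """All subsets of the ground set, as frozensets."""
    return [frozenset(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]


def is_polymatroid(f, ground, tol=1e-12):
    """Check f(empty) = 0, monotonicity and submodularity of a set function
    given as a dict from frozenset to number."""
    subsets = all_subsets(ground)
    if abs(f[frozenset()]) > tol:
        return False
    for a in subsets:
        for b in subsets:
            if a <= b and f[a] > f[b] + tol:
                return False  # not monotone
            if f[a] + f[b] < f[a | b] + f[a & b] - tol:
                return False  # not submodular
    return True


def uniform_rank(k, ground):
    """Rank function of the uniform matroid U_{k,n}: A -> min(|A|, k)."""
    return {s: min(len(s), k) for s in all_subsets(ground)}


ground = tuple(range(5))
u1, u3 = uniform_rank(1, ground), uniform_rank(3, ground)
mix = {s: 0.5 * u1[s] + 3 * u3[s] for s in u1}  # non-negative combination
print(is_polymatroid(mix, ground))  # True
```

The check is quadratic in the number of subsets (1024 pairs for five elements), which is negligible here; testing only elementary submodular inequalities would be the usual optimization for larger ground sets.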
Points of
with non-negative
x and
y coordinates and
are realized by linear polymatroids; thus, the complete non-negative quadrant of the
plane is a part of
. Our first non-Shannon inequality, generated by the triplet
, is
This inequality immediately limits the region
to
; therefore, points in
have height at most 1.
Other extremal triplets provide additional linear constraints.
Figure 4 illustrates the delimited part of the non-negative octant as viewed from the origin, and cut at
and at
. The pictured bound of
is extended to larger values of
x and
y. Along the
x and
y axes, this bound approaches the
and
coordinate planes as the functions
and
, respectively. Along the
diagonal, the limiting behavior toward the
z axis is similar to the entropy function
. The corner point of the plateau
has coordinates
. The
estimate gives a smooth bound on
, which is asymptotically tight along the
x and
y axes.
9. Conclusions
Structural properties of the entropy region of four or more variables are mostly unknown. This region is bounded by linear inequalities corresponding to the non-negativity of Shannon information measures. Finding additional entropy inequalities is, and remains, an intriguing open problem. Previous works on generating and applying such non-Shannon entropy inequalities focused mainly on the four-variable case [4,10,14,15], and only a few sporadic five-variable non-Shannon inequalities have been discovered [18]. This work provides infinitely many five-variable non-Shannon information inequalities by systematically exploring a special property of entropic vectors. Other works utilized the Copy Lemma, a method distilled from the original Zhang–Yeung construction by Dougherty et al. [14]. Our method is based on a different paradigm derived from the principle of maximum entropy and is a special case of the Maximum Entropy Method described in [20]. As proven in Lemma 1, the principle of maximum entropy implies that every entropic polymatroid has an
-copy, which is a polymatroidal extension with special properties as defined in Definition 1. In Claim 1, we proved that polymatroids having
-copies form a polyhedral cone and indicated how its facets can be computed. The facet equations provide potentially new non-Shannon entropy inequalities.
While the polyhedral computation presented in Claim 1 is computationally intractable even for small parameter values, the theoretical results of Section 4 allowed us to reduce this complexity significantly. Computational aspects of determining the facets of a high-dimensional cone are closely related to linear multi-objective optimization [22]. We have developed a specially tailored variant of Benson’s inner approximation algorithm [22,38], which takes advantage of the special properties of this enumeration problem. Computational results are reported in
Section 5 for generations
. Numerical instability, originating from both the underlying LP solver and the polyhedral algorithm, prevented the completion of the computation for larger values of
n.
Non-Shannon inequalities obtained from these computations are discussed in
Section 6. Based on these experimental results, two infinite families of five-variable inequalities were defined. The first family in Theorem 1 is parametrized by downward closed subsets of non-negative lattice points. The second family in Theorem 2 has a single positive integer parameter. Inequalities in both families are
proved to hold for polymatroids on five elements that have an
n-copy; consequently, they are all valid entropy inequalities. It is
conjectured that they cover all inequalities that can be obtained by the applied method. In other words, if a polymatroid on five elements satisfies all these inequalities, then it has an
n-copy for all
n. This conjecture is left as an open problem. The computational results confirmed this conjecture up to
.
Inequalities in the first family are investigated in
Section 8 in more detail. They are specified by triplets
determined by downward closed sets
S of non-negative lattice points as discussed in Definition 4. Such a triplet is
extremal if the corresponding inequality is not a consequence of other inequalities from the same family. Extremal triplets are determined by a special collection of downward closed sets called
irreducible staircases. Based on the theoretical results in Corollary 2 and Claim 6, an incremental algorithm, sketched as Algorithm 1, was used to generate irreducible staircases up to generation 60. The converse implication, namely that every triplet generated by an irreducible staircase is extremal, holds in all computed cases but is left as an open problem in general. Triplets (
in the range
, generated by irreducible staircases, are plotted in
Figure 2. The number of new irreducible staircases that remained irreducible in the subsequent generation matches the sequence A103116 in the On-Line Encyclopedia of Integer Sequences [39]. It is an interesting open problem to prove the equality of these sequences.
To illustrate how the newly discovered entropy inequalities delimit the five-variable entropy region, entropy vectors were normalized to satisfy
and projected onto a three-dimensional subspace. The part of this projection lying in the non-negative octant is denoted by
. The Shannon inequalities do not provide any restriction on this part.
Figure 4 illustrates the bounds implied by the new inequalities. It is known that the non-negative quadrant of the
plane is part of
, and that this region also contains points above that plane; it is an intriguing open problem whether our bound is, at least asymptotically, tight around the
x and
y axes. Showing that our bound is asymptotically tight at the zero point would amount to settling the long-standing open problem of whether the entropic region is semi-algebraic.