Non-linear Information Inequalities

We construct non-linear information inequalities from Matúš' infinite series of linear information inequalities. Each single non-linear inequality is sufficiently strong to prove that the closure of the set of all entropy functions is not polyhedral for four or more random variables, a fact that was already established using the series of linear inequalities. To the best of our knowledge, they are the first non-trivial examples of non-linear information inequalities.


Introduction
Information inequalities play a crucial role in the proofs of almost all source and channel coding converse theorems. Roughly speaking, these inequalities govern what is impossible in information theory. Among the information inequalities discovered to date, the best known are the Shannon-type inequalities, including the non-negativity of (conditional) entropies and of (conditional) mutual information. In [2], a non-Shannon information inequality (one that cannot be deduced from any set of Shannon-type inequalities) involving more than three random variables was discovered. Since then, many additional information inequalities have been discovered [4].
Apart from their application in proving converse coding theorems, information inequalities (either linear or non-linear) were shown to have a very close relation with inequalities involving the cardinality of a group and its subgroups [3]. Specifically, an information inequality is valid if and only if its group-theoretic counterpart (obtained by a mechanical substitution of symbols) is also valid. For example, the non-negativity of mutual information is equivalent to the group inequality |G_1||G_2| ≤ |G||G_1 ∩ G_2|, where G_1 and G_2 are subgroups of the group G.
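As an illustration, the group-theoretic counterpart of I(X_1; X_2) ≥ 0, in its standard form |G_1||G_2| ≤ |G||G_1 ∩ G_2|, can be checked directly for small groups. The following sketch is our own illustration (not from [3]); it enumerates the subgroups of a cyclic group Z_n and verifies the inequality for every pair:

```python
from itertools import combinations

def subgroups_of_zn(n):
    """All subgroups of the cyclic group Z_n: one subgroup of order d
    (the multiples of n/d) for each divisor d of n."""
    return [set(range(0, n, n // d)) for d in range(1, n + 1) if n % d == 0]

n = 12
G = set(range(n))
ok = all(
    len(G1) * len(G2) <= len(G) * len(G1 & G2)  # counterpart of I(X1; X2) >= 0
    for G1, G2 in combinations(subgroups_of_zn(n), 2)
)
print("subgroup inequality holds for all pairs in Z_12:", ok)
```

The inequality holds for arbitrary groups because |G_1||G_2| / |G_1 ∩ G_2| counts the elements of the product set G_1G_2, which is contained in G.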
Information inequalities are also the most common (perhaps even the only) tool for the characterization of entropy functions (see Definition 1 below). In fact, entropy functions and information inequalities are two sides of the same coin: a complete characterization of entropy functions requires complete knowledge of the set of all information inequalities.
The set of entropy functions on n random variables, Γ*_n, and its closure cl(Γ*_n) are of extreme importance, not only because of their relation to information inequalities [6], but also for determining the set of feasible multicast rates in communication networks employing network coding [5,7]. Furthermore, a determination of Γ*_n would resolve the implication problem of conditional independence (determining every conditional independence relation implied by a given set of conditional independence relationships). A simple and explicit characterization of Γ*_n and cl(Γ*_n) would therefore be very useful. Unfortunately, except in the case n < 4, such a characterization is still missing [1,2,4].
Recently, it was shown by Matúš that there are countably infinitely many information inequalities [1]. This result, summarized below in Section 2, implies that cl(Γ*_n) is not polyhedral. The main result of this paper is a set of non-linear inequalities, which we derive from Matúš' series in Section 3. To the best of our knowledge, these are the first examples of non-trivial non-linear information inequalities. We use a nonlinear inequality to deduce that the closure of the set of all entropy functions is not polyhedral, a fact previously proved in [1] using the infinite sequence of linear inequalities. Finally, in Section 4, we compare the series of linear inequalities and the proposed nonlinear inequality on a projection of cl(Γ*_n).

Background
Let the index set N = {1, 2, . . . , n} induce a real 2^n-dimensional Euclidean space F_n with coordinates indexed by the set of all subsets of N. Specifically, if g ∈ F_n, then its coordinates are denoted (g(α) : α ⊆ N). Consequently, points g ∈ F_n can be regarded as functions g : 2^N → R. The focus of this paper is the subset of F_n corresponding to (almost) entropic functions.
Definition 1 (Entropic function) A function g ∈ F_n is entropic if g(∅) = 0 and there exist discrete random variables X_1, . . . , X_n such that the joint entropy of {X_i : i ∈ α} is g(α) for all ∅ ≠ α ⊆ N. Furthermore, g is almost entropic if it is the limit of a sequence of entropic functions.
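To make Definition 1 concrete, the following sketch computes an entropic g from an explicit joint distribution; the helper names are ours, chosen for illustration.

```python
from itertools import chain, combinations
from math import log2

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return max(0.0, -sum(p * log2(p) for p in pmf.values() if p > 0))

def entropic_function(joint, n):
    """g(alpha) = joint entropy of {X_i : i in alpha} for every subset alpha
    of {1, ..., n}; `joint` maps tuples (x_1, ..., x_n) to probabilities."""
    subsets = chain.from_iterable(combinations(range(1, n + 1), r) for r in range(n + 1))
    g = {}
    for alpha in subsets:
        marginal = {}
        for x, p in joint.items():
            key = tuple(x[i - 1] for i in alpha)
            marginal[key] = marginal.get(key, 0.0) + p
        g[alpha] = entropy(marginal)
    return g

# X1 a fair bit and X2 = X1: every nonempty subset has entropy 1 bit.
g = entropic_function({(0, 0): 0.5, (1, 1): 0.5}, 2)
print(g)  # {(): 0.0, (1,): 1.0, (2,): 1.0, (1, 2): 1.0}
```

Each coordinate g(α) is obtained by marginalizing the joint distribution onto {X_i : i ∈ α} and taking its Shannon entropy, exactly as in the definition.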

Let Γ*_n be the set of all entropic functions. Its closure cl(Γ*_n) (i.e., the set of all almost entropic functions) is well known to be a closed, convex cone [6]. An important recent result with significant implications for cl(Γ*_n) is the series of linear information inequalities obtained by Matúš [1] (restated below in Theorem 1). Using this series, cl(Γ*_n) was proved to be non-polyhedral for n ≥ 4. This means that cl(Γ*_n) cannot be expressed as the intersection of any finite set of linear information inequalities.
Furthermore, for singletons i, j, k ∈ N, write g(ij|k) as shorthand for the conditional mutual information g(ik) + g(jk) − g(ijk) − g(k).
Theorem 1 (Matúš) Let s ∈ Z+, the set of positive integers, and let g ∈ cl(Γ*_n) be the entropy function of discrete random variables X_1, . . . , X_n; then g satisfies the linear inequality (3). Furthermore, assuming that X_5 = X_2, the inequality reduces to a simpler form. To the best of our knowledge, this is the only known result establishing the existence of infinitely many linear information inequalities. Reducing to cl(Γ*_4), the case s = 1 recovers the Zhang-Yeung inequality [2], and s = 2 yields an inequality of [4].
In the following, we will derive non-linear information inequalities from the sequence of linear inequalities (3).
Theorem 2 Let g ∈ F_n be such that b(g) ≤ 2a(g), and write Q(s; a, b, c) ≜ s²a + s(b − a) + c, so that (3) reads Q(s; a(g), b(g), c(g)) ≥ 0. Then g satisfies (3) for all nonnegative integers s if and only if a(g), c(g) ≥ 0 and

(b(g) − a(g))² − 4a(g)c(g) ≤ (2a(g) min(⌈w(g)⌉ − w(g), w(g) − ⌊w(g)⌋))²,    (6)

where w(g) ≜ (a(g) − b(g))/(2a(g)) when a(g) > 0, and the right-hand side of (6) is taken to be 0 when a(g) = 0.

Proof: To simplify notation, a(g), b(g) and c(g) will simply be denoted a, b and c. We first prove the only-if part. Assume that g satisfies (3) for all nonnegative integers s. Taking s = 0 gives c ≥ 0, and by Proposition 1, a ≥ 0. It remains to prove that (6) holds. Suppose first that a > 0. If the quadratic Q(s; a, b, c) has no distinct real roots in s, then clearly (b − a)² − 4ac ≤ 0 and the theorem holds. On the other hand, if Q(s; a, b, c) has distinct real roots, implying (b − a)² − 4ac > 0, then Q(s; a, b, c) is negative strictly between its roots and attains its minimum at s = w = −(b − a)/(2a), which is greater than −1/2 by the assumption b ≤ 2a. Since Q(s; a, b, c) ≥ 0 for all nonnegative integers s, both roots must lie between the consecutive integers ⌊w⌋ and ⌈w⌉, so the distance between the two roots can be at most 2 min(⌈w⌉ − w, w − ⌊w⌋). In other words, (6) holds. If on the other hand a = 0, then the assumption b ≤ 2a and Proposition 1 imply that b = 0. As such, (b − a)² − 4ac = 0 and (6) clearly holds. Hence, the only-if part of the theorem is proved.

Now we prove the if part. If a = 0, then (6) and the assumption b ≤ 2a imply that b = 0; the theorem then holds since c ≥ 0 by assumption. Now suppose a > 0 and b ≤ 2a. By a similar argument as before, (6) implies that either Q(s; a, b, c) has no distinct real roots, or its two real roots lie within the closed interval [⌊w⌋, ⌈w⌉]; since no integer lies strictly between ⌊w⌋ and ⌈w⌉, we have Q(s; a, b, c) ≥ 0 for every nonnegative integer s, or equivalently, s²a + s(b − a) + c ≥ 0, and hence the theorem is proved.
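The equivalence in Theorem 2 can be sanity-checked numerically at the level of the coefficient triple (a, b, c), without reference to any particular g. The sketch below is our own check: the rational grid, the bound s_max = 25 (which dominates every root of Q on this grid), and the function names are ours, and `condition_6` encodes our reading of (6) as the discriminant bound used in the proof.

```python
from fractions import Fraction as F
from math import floor, ceil

def Q(s, a, b, c):
    # The quadratic from Theorem 2: Q(s; a, b, c) = s^2 * a + s * (b - a) + c.
    return a * s * s + (b - a) * s + c

def satisfies_all(a, b, c, s_max=25):
    # Q(s) >= 0 for every nonnegative integer s; on the grid below any
    # violation already occurs at some s < s_max, so this check is exhaustive.
    return all(Q(s, a, b, c) >= 0 for s in range(s_max))

def condition_6(a, b, c):
    # Discriminant form of (6); when a = 0 it reads b^2 <= 0.
    disc = (b - a) ** 2 - 4 * a * c
    if a == 0:
        return disc <= 0
    w = (a - b) / (2 * a)                      # minimizer of Q(.; a, b, c)
    m = min(ceil(w) - w, w - floor(w))         # distance of w to nearest integer
    return disc <= (2 * a * m) ** 2

# Exhaustive comparison over an exact rational grid, restricted to b <= 2a.
grid = [F(x, 2) for x in range(-8, 9)]
ok = all(
    satisfies_all(a, b, c) == (a >= 0 and c >= 0 and condition_6(a, b, c))
    for a in grid for b in grid for c in grid if b <= 2 * a
)
print("Theorem 2 equivalence on grid:", ok)
```

Exact `Fraction` arithmetic avoids the floating-point boundary cases where the discriminant equals the right-hand side of (6).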
Theorem 2 shows that Matúš' series of linear inequalities is equivalent to the single non-linear inequality (6), under the conditions that b(g) ≤ 2a(g) and a(g), c(g) ≥ 0.
Clearly, a(g), c(g) ≥ 0 holds for all entropic g because of the non-negativity of conditional mutual information; imposing these two conditions therefore does not significantly weaken (6). If, on the other hand, b(g) ≤ 2a(g) does not hold, then b(g) − a(g) > a(g) ≥ 0, so every coefficient of the quadratic Q(s; a(g), b(g), c(g)) is nonnegative and Matúš' series of inequalities is implied by a(g), c(g) ≥ 0 alone. In that case, Matúš' inequalities are not of interest. Therefore, our proposed nonlinear inequality is essentially not much weaker than Matúš' ones.
While (6) is interesting in its own right, it is not so easy to work with.In the following, we shall consider a weaker form.
Corollary 1 (Quadratic information inequality) Suppose that g satisfies (3) for all nonnegative integers s and that b(g) ≤ 2a(g). Then

(b(g) − a(g))² − 4a(g)c(g) ≤ a(g)².    (7)
Proof: Since min(⌈w(g)⌉ − w(g), w(g) − ⌊w(g)⌋) ≤ 1/2, the corollary follows directly from Theorem 2.

Despite the fact that the above "quadratic" information inequality is a consequence of a series of linear inequalities, to the best of our knowledge it is the first non-trivial non-linear information inequality.

Implications of Corollary 1
In Proposition 1, we showed that Matúš' inequalities imply that if a(g) = 0, then b(g) ≥ 0. The same result can also be proved using the quadratic information inequality (7).
Indeed, when a(g) = 0, (7) reduces to b(g)² ≤ 0. Hence, if b(g) < 0, then (8) is violated, leading to a contradiction.
In [1], it was proved that the cone cl(Γ*_n) is not polyhedral for n ≥ 4. Ignoring the technical details, the idea of the proof is very simple. First, a sequence of entropic functions g_t was constructed such that (1) the sequence converges to g_0, and (2) it has a one-sided tangent ġ_{0+}, defined as lim_{t→0+} (g_t − g_0)/t. Clearly, if cl(Γ*_n) were polyhedral, then there would exist ε > 0 such that g_0 + εġ_{0+} is contained in cl(Γ*_n). It was then shown that for any ε > 0, the function g_0 + εġ_{0+} is not in cl(Γ*_n), because it violates (3) for sufficiently large s. Therefore, cl(Γ*_n) is not polyhedral, or equivalently, there are infinitely many information inequalities. In fact, we can also show that g_0 + εġ_{0+} violates the quadratic information inequality obtained in Corollary 1 for any positive ε. As such, (7) is sufficient to prove that cl(Γ*_n) is not polyhedral for n ≥ 4, and hence the following implication.

Implication 2
The quadratic inequality (7) is strong enough to imply that cl(Γ*_n) is not polyhedral.
Some nonlinear information inequalities are direct consequences of basic linear information inequalities (e.g., H(X)² I(X; Y) ≥ 0). Such inequalities are trivial in that they are obtained directly as nonlinear transformations of known linear inequalities. Our proposed quadratic inequality (7) is non-trivial, as proved in the following.
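For instance, any product of entropies and mutual informations is a trivially valid nonlinear inequality, simply because each factor is nonnegative. A minimal numeric check, with an example distribution of our own choosing:

```python
from math import log2

def H(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

# Joint pmf of (X, Y): X a fair bit, Y = X with prob 3/4, flipped with prob 1/4.
joint = {(0, 0): 0.375, (0, 1): 0.125, (1, 0): 0.125, (1, 1): 0.375}
px = {0: 0.5, 1: 0.5}
py = {0: 0.5, 1: 0.5}

mi = H(px) + H(py) - H(joint)   # I(X; Y) = H(X) + H(Y) - H(X, Y)
trivial_lhs = H(px) ** 2 * mi   # H(X)^2 * I(X; Y): product of nonnegative terms
assert trivial_lhs >= 0
print(round(trivial_lhs, 4))    # 0.1887
```

No new information about entropy functions is encoded here; the bound follows mechanically from H(X) ≥ 0 and I(X; Y) ≥ 0, which is exactly what makes such inequalities trivial.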

Implication 3
The quadratic inequality (7) is a non-linear inequality that cannot be implied by any finite number of linear information inequalities. Specifically, for any given finite set of valid linear information inequalities, there exists g ∈ F_n that satisfies all the given linear inequalities but does not satisfy (7).
Proof: Suppose we are given a finite set of valid linear information inequalities. Then the set of g ∈ F_n satisfying all these linear inequalities is polyhedral; in other words, it is obtained by taking the intersection of a finite number of half-spaces. For simplicity, this polyhedron will be denoted by Ψ.
We will once again use the sequence of entropic functions {g_t} constructed in [1]. Clearly, g_t ∈ Ψ for all t, since g_t ∈ Γ*_n and the given inequalities are valid for all entropic functions. Again, as Ψ is polyhedral, g ≜ g_0 + εġ_{0+} ∈ Ψ for sufficiently small ε > 0. In other words, g satisfies all the given linear inequalities. However, as explained earlier, g violates the quadratic inequality (7), and hence the theorem follows.
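The gap between any finite family and (7) can be illustrated at the coefficient level. For any S, the sketch below (our own construction, working purely with coefficient triples) builds (a, b, c) whose quadratic Q(s) = as² + (b − a)s + c is nonnegative at s = 0, 1, . . . , S, yet whose two roots, placed at S + 1/4 and S + 7/4, are more than 1 apart, so (7) fails. The substantial step in [1] — realizing such a violating point as the tangent perturbation g_0 + εġ_{0+} of entropic functions — is not reproduced here.

```python
from fractions import Fraction as F

def violating_triple(S):
    """A triple (a, b, c) whose quadratic Q(s) = a*s^2 + (b - a)*s + c is
    nonnegative at s = 0, 1, ..., S yet whose roots S + 1/4 and S + 7/4
    are more than 1 apart, so the quadratic inequality (7) fails."""
    r1, r2 = S + F(1, 4), S + F(7, 4)
    a = F(1)
    b = a - (r1 + r2)        # so that b - a = -(r1 + r2)
    c = r1 * r2
    return a, b, c

def Q(s, a, b, c):
    return a * s * s + (b - a) * s + c

S = 10
a, b, c = violating_triple(S)
assert b <= 2 * a
assert all(Q(s, a, b, c) >= 0 for s in range(S + 1))  # first S + 1 inequalities hold
assert (b - a) ** 2 - 4 * a * c > a ** 2              # ...but (7) is violated
assert Q(S + 1, a, b, c) < 0                          # the (S+1)-th inequality fails
print(a, b, c)  # 1 -21 1927/16
```

Because no integer falls strictly between the roots until s = S + 1, every inequality in the given finite family is satisfied while the discriminant bound of (7) is exceeded.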

Characterizing cl(Γ*_n) by projection
Although the set of almost entropic functions cl(Γ*_n) is a closed and convex cone, finding a complete characterization is an extremely difficult task. Therefore, instead of tackling the hard problem directly, it is sensible to consider a relatively simpler one: the characterization of a "projection" of cl(Γ*_n). This projection problem is easier because the dimension of a projection can be much smaller, making it easier to visualize and describe. Furthermore, its low dimensionality may also facilitate the use of numerical techniques to find an approximation of the projection.
In this section, we consider a particular projection and show how the inequalities obtained in the previous section can be expressed as equivalent inequalities on the proposed projection. This gives a better idea of what the projection looks like. First, we define the proposed projection Υ.
Proof: Since the set {(a(g), b(g) − a(g)) : g ∈ cl(Γ*_n)} is closed and convex, its cross-section (and any affine transform thereof) Υ is also closed and convex.
Since Υ is obtained by projecting cl(Γ*_n) onto a two-dimensional Euclidean space, any inequality satisfied by all points in Υ induces a corresponding information inequality. Specifically, we have the following proposition.
Proposition 2 Suppose that there exists k ≥ 0 defining the inequalities (11) and (12). Then (11) holds for all (u, v) ∈ Υ if and only if (12) holds for all g ∈ cl(Γ*_n). Similarly, (11) holds for all (u, v) ∈ Υ with v ≤ u if and only if (12) holds for all g ∈ cl(Γ*_n) with b(g) ≤ 2a(g).
Finally, the constrained counterpart follows from the fact that b(g) − a(g) ≤ a(g) if and only if b(g) ≤ 2a(g).
By Proposition 2, there is a mechanical way to rewrite inequalities for cl(Γ*_n) as inequalities for Υ, and vice versa. Therefore, we will abuse notation and call (11) and (12) equivalent. In the following, we rewrite the inequalities obtained in previous sections using Proposition 2.
Proposition 3 (Matúš' inequalities) When s is a positive integer, the inequality (3) is equivalent to a linear inequality on Υ.

Proof: A direct consequence of Proposition 2.

By optimizing the choice of s, we obtain a stronger piecewise linear inequality, which can be rewritten as follows.
Proof: A direct consequence of Propositions 2 and 3.
As we shall see in the following lemma, L_li(u) can be explicitly characterized.
Therefore, L_li(0) = sup_{s∈Z+} f(s, 0) = 0. Also, it is straightforward to prove that, for 0 < u ≤ 1/2, f(s, u) is a strictly concave function of s for s ≥ 1 and attains its maximum at the stated value of s. By solving a system of linear equations, we can then obtain the explicit piecewise linear form of L_li(u). Together with the fact that L_li(u) = −1 = f(1, u) for 1/2 ≤ u ≤ 1, the lemma follows.
Proposition 4 (Quadratic inequality) The quadratic inequality (7) (subject to b(g) ≤ 2a(g)) is equivalent to

(v + u)² + 2u² ≤ 2u    (18)

subject to v ≤ u.
To illustrate (18), we plot the curves (v + u)² + 2u² = 2u and v = u in Figure 1. From the proposition, if v ≤ u (i.e., the point (u, v) is below the dotted line), then (u, v) ∈ Υ implies that (u, v) is inside the ellipse.
Proposition 4 gives a nonlinear information inequality on Υ subject to the condition that v ≤ u. In the following theorem, we relax the inequality so as to remove this condition.
Proof: By Proposition 4, if (u, v) ∈ Υ with v ≤ u, then the claimed bound holds; the theorem then follows from Proposition 2.

In the next proposition, we show that the piecewise linear inequality v ≥ L_li(u) and the proposed nonlinear inequality v ≥ L_nl(u) coincide for a countably infinite number of values of u.
Proof: By definition, L_nl(0) = L_li(0) = 0, and the proposition holds in this case. Assume now that 0 < u ≤ 1. We first show that L_nl(u) = L_li(u) when u = 1/(1 + 2s²) for some nonnegative integer s. If s = 0, then u = 1 and L_nl(u) = L_li(u) = −1. On the other hand, if u = 1/(1 + 2s²) where s is a positive integer, then the equality is straightforward to verify.

Figure 2. Piecewise linear inequality and nonlinear inequality.
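The contact points can be verified exactly: at u = 1/(1 + 2s²) the quantity 2u(1 − u) is the square of a rational, so the lower branch of the ellipse (18), v = −u − √(2u(1 − u)), takes the rational value −(1 + 2s)/(1 + 2s²). The following sketch is our own check, with this closed form derived from (18) rather than quoted from the text:

```python
from fractions import Fraction as F

def touchpoint(s):
    """Candidate contact point u_s = 1/(1 + 2 s^2) paired with the lower
    branch of the ellipse (v + u)^2 + 2 u^2 = 2u; since
    2u(1 - u) = (2s / (1 + 2 s^2))^2, the branch value is rational."""
    u = F(1, 1 + 2 * s * s)
    v = -u - F(2 * s, 1 + 2 * s * s)
    return u, v

for s in range(50):
    u, v = touchpoint(s)
    assert (v + u) ** 2 + 2 * u ** 2 == 2 * u   # lies exactly on the ellipse (18)
    assert v == -F(1 + 2 * s, 1 + 2 * s * s)    # closed form -(1 + 2s)/(1 + 2s^2)
    assert v <= u                               # inside the constrained region
print("all touchpoints lie on the ellipse")
```

Exact rational arithmetic makes the on-the-ellipse test an equality rather than a floating-point approximation.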

Conclusion
In this paper, we constructed several piecewise linear and quadratic information inequalities from the series of information inequalities proved in [1]. Our proposed nonlinear inequality (6) was shown to be equivalent to the whole set of Matúš' linear inequalities; hence, we can replace all of Matúš' inequalities with our proposed one.
However, the inequality is not smooth and may not be easy to work with. Therefore, we relaxed it to quadratic inequalities. These quadratic inequalities are strong enough to show that the set of almost entropic functions is not polyhedral.
Certainly, the quadratic inequalities we obtained in (16) and (19) are a consequence of Matúš' linear inequalities. Yet, the non-linear inequalities have a much simpler form. Comparing the inequalities on projections of cl(Γ*_n), our figures suggest that these nonlinear inequalities are indeed fairly good approximations of the corresponding piecewise linear inequalities. Furthermore, they are of particular interest for several reasons. First, all of these inequalities are non-trivial and cannot be deduced from any finite number of linear information inequalities. To the best of our knowledge, they are the first non-trivial nonlinear information inequalities. Second, in some cases it is easier to work with a single nonlinear inequality than with an infinite number of linear ones. For example, in order to compute bounds on a capacity region (say, in a network coding problem), a characterization of cl(Γ*_n) may be needed as input to a computing system. Since cl(Γ*_n) is unknown, an outer bound will be used instead. Replacing a countably infinite number of linear inequalities with a single nonlinear inequality may greatly simplify the computation. Third, these nonlinear inequalities prompt new fundamental questions: are nonlinear information inequalities more fundamental than linear ones? Could the set cl(Γ*_n) be completely characterized by a finite number of nonlinear inequalities? If so, what would they look like?
As a final remark, Matúš' inequalities, and also all the non-linear inequalities we obtained, are "tighter" than the Shannon inequalities only in the region where b(g) ≤ 2a(g). When b(g) ≥ 2a(g), both are direct consequences of the non-negativity of conditional mutual information. This phenomenon seems to suggest that entropic functions are much more difficult to characterize in the region b(g) < 2a(g). An explanation for this phenomenon is still lacking.