Abstract
In this paper, we present a review of some of the recent progress made in characterising and understanding information inequalities, which are the fundamental physical laws in communications and compression. We will begin with the introduction of a geometric framework for information inequalities, followed by the first non-Shannon inequality proved by Zhang and Yeung in 1998 [1]. The discovery of this non-Shannon inequality was a breakthrough in the area and has led to the subsequent discovery of many more non-Shannon inequalities. We will also review the close relations between information inequalities and other research areas such as Kolmogorov complexity, determinantal inequalities, and group-theoretic inequalities. These relations have led to non-traditional techniques for proving information inequalities and, at the same time, have had an impact on those related areas through the introduction of information-theoretic tools.
1. Introduction
Information inequalities are the “physical laws” that characterise the fundamental limits in communications and compression. Probably the most well-known information inequalities are the nonnegativity of entropy and mutual information, dating back to Shannon [2]. They are indispensable in proving converse coding theorems and play a critical role in information theory.
To illustrate how inequalities are invoked to prove a converse, consider the following classical scenario: Alice aims to send a source message M to Bob in a hostile environment where the transmitted message may be eavesdropped on by a malicious adversary Eve. To ensure that Eve learns nothing about the source message M, Alice encrypts it into a transmitted message X using a private key K which is known only to Bob and herself. It is well known that in order to have perfect secrecy, the entropy of the key K must be at least as large as the entropy of the message M. Such a result can be proved by invoking a few information inequalities as follows:
where (a) is due to perfect secrecy (i.e., M and X are independent), (b) follows from the fact that M can be reconstructed from the key K and the encrypted message X, (c) follows from the nonnegativity of conditional entropy, and (d) is due to the nonnegativity of mutual information.
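For concreteness, one standard chain of steps consistent with the justifications (a)–(d) above can be written as follows (a reconstruction in standard notation, with H and I denoting entropy and mutual information; it is a sketch of the usual argument rather than a verbatim reproduction of the original display):

```latex
\begin{align*}
H(M) &\overset{(a)}{=} H(M \mid X) \\
     &\overset{(b)}{=} H(M \mid X) - H(M \mid X, K) \\
     &= I(M ; K \mid X) \\
     &= H(K \mid X) - H(K \mid M, X) \\
     &\overset{(c)}{\le} H(K \mid X) \\
     &= H(K) - I(K ; X) \\
     &\overset{(d)}{\le} H(K).
\end{align*}
```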
Besides their role in proving converse coding theorems, information inequalities are also shown to have close relations with inequalities for Kolmogorov complexities [3], group-theoretic inequalities [4], subspace rank inequalities [5], determinantal inequalities [6] and combinatorial inequalities [7]. Therefore, any new technique in characterising information inequalities will also have direct impact on these areas.
Despite their great importance, characterising information inequalities is not an easy task. It had been open for years whether there exist other information inequalities besides the nonnegativity of entropy and mutual information. No further information inequalities were found for fifty years, until [1] reported the first “non-Shannon” information inequality. The significance of that result lay not only in the inequality itself, but also in its construction. This particular construction has been the main ingredient in every non-Shannon inequality discovered since. Using this approach, new inequalities can be found mechanically [8], and there are in fact infinitely many such independent inequalities even when only four random variables are involved [9]. Despite this progress, however, a complete characterisation is still missing.
In this survey paper, we will review some of the major progress in the area of information inequalities. The organisation of the paper is as follows. In Section 3, we will first outline a geometric framework for information inequalities, based on which we will explain how a Shannon inequality can be proved mechanically. Then we will outline the proof of a non-Shannon inequality which was first proved in [1]. A geometric perspective on the proof will also be given. Next, Matúš’ series of information inequalities (and its relaxation) will be discussed.
In Section 4, we will consider several “equivalent frameworks” for information inequalities. The first and most natural one is the scenario in which the random variables are continuous. We will prove that information inequalities for discrete and continuous random variables are “essentially the same”. Then we will shift our focus to the one-to-one relations between information inequalities, inequalities for Kolmogorov complexity, group-theoretic inequalities and inequalities for box assignments. In Section 5, we will consider two constrained classes of information inequalities, subject respectively to the constraints that the random variables are induced by vector subspaces and that they are Gaussian. These constrained classes of information inequalities are equivalent to subspace rank inequalities and determinantal inequalities respectively.
2. Notations
Let be a finite set and be its power set. If n is understood implicitly, we will simply denote by . We define as the set of all real functions defined on . Hence, is a -dimensional Euclidean space. Elements in are called rank functions over . Let be nonempty sets and be n jointly distributed discrete random variables defined on respectively. For any , denotes the joint random variable defined over (the Cartesian product of for ). As an example, is the random variable . For simplicity, the parentheses in the subscript are usually omitted, i.e., is written as (or even simply ).
For a discrete random variable X, denotes the support of the probability distribution function of X. In other words,
The (discrete) entropy of X, denoted by , is defined as
where p is the probability distribution of X. We will also use the following conventions. An element and the singleton set containing it are not distinguished. For any set and subset , denotes the subset .
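As a small illustration (a minimal sketch using a hypothetical distribution), the discrete entropy can be computed directly from a probability mass function:

```python
import math

def entropy(pmf):
    """Discrete entropy H(X) = -sum_x p(x) log2 p(x), ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Hypothetical example: a biased binary source.
p_X = {0: 0.25, 1: 0.75}
print(entropy(p_X))  # ~0.811 bits
```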
3. A Framework for Information Inequalities
Let be a set of discrete random variables. It induces a rank function h which is defined as follows: For any ,
We call h the entropy function induced by . For any function h in , we define
If h is the entropy function induced by random variables , then is the conditional entropy and is the mutual information .
All entropy functions must satisfy the following polymatroidal axioms.
The second axiom (R2) corresponds to the fact that conditional entropy is nonnegative, and the third axiom (R3) corresponds to the fact that the conditional mutual information between and given is nonnegative.
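To make the construction concrete, the following sketch (using a hypothetical joint distribution and base-2 logarithms) computes the entropy function over all subsets of variables and numerically checks the polymatroidal axioms (R1)-(R3), read here as normalisation, monotonicity and submodularity:

```python
import math
from itertools import combinations

def joint_entropy(pmf, subset):
    """H(X_alpha): entropy of the marginal distribution on the coordinates in `subset`."""
    marginal = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in subset)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)

# Hypothetical joint distribution of (X1, X2): a binary source observed through a noisy channel.
pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
N = (0, 1)
h = {alpha: joint_entropy(pmf, alpha)
     for r in range(len(N) + 1) for alpha in combinations(N, r)}

# (R1): h of the empty set is zero.
assert abs(h[()]) < 1e-12
# (R2): monotonicity, h(alpha) <= h(beta) whenever alpha is a subset of beta.
assert all(h[a] <= h[b] + 1e-12 for a in h for b in h if set(a) <= set(b))
# (R3): submodularity, h(a) + h(b) >= h(a union b) + h(a intersect b).
for a in h:
    for b in h:
        u = tuple(sorted(set(a) | set(b)))
        i = tuple(sorted(set(a) & set(b)))
        assert h[a] + h[b] >= h[u] + h[i] - 1e-12
print(h)
```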
3.1. Geometric Framework
Characterisation of entropic functions is one of the most important and challenging problems in information theory. In the following, we will review the geometric framework proposed in [10], which has greatly simplified our understanding of information inequalities.
A function is called weakly entropic if there exists such that is entropic, and is called almost entropic if it is the limit of a sequence of weakly entropic functions. Let be the set of all entropic functions and be its closure. Then is a closed and convex cone, and in fact is the set of all almost entropic functions. Compared to , its closure is more manageable. In fact, for many applications, it is sufficient to consider . The following theorem shows that characterising all linear information inequalities is equivalent to characterising the set .
Theorem 1
(Yeung [10]) An information inequality is valid (i.e., holds for all discrete random variables) if and only if
Unfortunately, is still extremely difficult to characterise explicitly for . As we shall see, the cone is not polyhedral and hence cannot be defined by a finite number of linear inequalities. Theorem 1 offers a geometric perspective for understanding information inequalities. Based on the theorem, Yeung and Yan [11] wrote a software package called the Information-Theoretic Inequality Prover (ITIP) which can mechanically verify all Shannon inequalities.
The idea behind ITIP is very simple: Suppose we have a cone of such that . Consider an information inequality
Suppose one can verify that
Then by Theorem 1, the information inequality (8) will be valid. In other words, if the minimum of the following optimisation problem is nonnegative,
then the information inequality (8) is valid.
As is a cone (hence, for all and ), it suffices to test whether the origin is a global minimum of the above optimisation problem. Furthermore, as the optimisation problem is convex, the optimality of can be verified by checking the Karush–Kuhn–Tucker (KKT) conditions.
In ITIP, is chosen as the cone whose elements are all rank functions h that satisfy the polymatroidal axioms (R1)-(R3). By picking such a cone, ITIP can prove all inequalities that are implied by the three axioms (or equivalently, all Shannon inequalities).
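The following is a minimal sketch of the idea behind ITIP (it is not the actual ITIP software), for three random variables: the candidate inequality is minimised over the cone defined by the monotonicity and submodularity constraints using a linear program, and a bounded minimum of zero certifies that the inequality is implied by (R1)-(R3):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

n = 3
subsets = [frozenset(s) for r in range(1, n + 1)
           for s in itertools.combinations(range(1, n + 1), r)]
idx = {s: i for i, s in enumerate(subsets)}

def h_coeff(alpha):
    """Coefficient vector selecting h(alpha), with h(empty set) treated as 0."""
    v = np.zeros(len(subsets))
    if alpha:
        v[idx[frozenset(alpha)]] = 1.0
    return v

# Build the polymatroid (Shannon) cone: monotonicity and submodularity constraints, each of the form row . h >= 0.
rows = []
for a, b in itertools.product(subsets + [frozenset()], repeat=2):
    if a < b:                        # monotonicity: h(b) - h(a) >= 0
        rows.append(h_coeff(b) - h_coeff(a))
    rows.append(h_coeff(a) + h_coeff(b) - h_coeff(a | b) - h_coeff(a & b))

def shannon_provable(coeffs):
    """Check whether sum_alpha coeffs[alpha] * h(alpha) >= 0 is implied by the polymatroidal axioms."""
    c = sum(w * h_coeff(a) for a, w in coeffs.items())
    # linprog minimises c.x subject to A_ub.x <= b_ub; our cone constraints rows.x >= 0 become -rows.x <= 0.
    res = linprog(c, A_ub=-np.array(rows), b_ub=np.zeros(len(rows)), method="highs")
    return res.status == 0 and res.fun > -1e-9

# I(X1;X2|X3) >= 0 is a Shannon inequality ...
print(shannon_provable({(1, 3): 1, (2, 3): 1, (1, 2, 3): -1, (3,): -1}))   # True
# ... whereas I(X1;X2) <= 0 is not.
print(shannon_provable({(1, 2): 1, (1,): -1, (2,): -1}))                   # False
```

If the linear program is unbounded below, the candidate inequality is not implied by the Shannon inequalities alone, which is exactly the situation for the non-Shannon inequalities discussed next.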
3.2. Non-Shannon Inequalities
It has been an open question for many years whether there exist information inequalities that are not implied by Shannon’s information inequalities. This question was finally answered in [1] where non-Shannon type inequalities were constructed explicitly. The proof was based on the use of auxiliary random variables. This turns out to be a very powerful technique. In fact, all subsequently discovered non-Shannon type information inequalities are essentially proved by the same technique.
Theorem 2
(Non-Shannon inequality [1]) Let be random variables. Then
Or equivalently, if h is entropic, then
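Written out in a commonly used form (with the variable labels A, B, C, D chosen here for illustration), the Zhang–Yeung inequality reads:

```latex
I(A;B) + I(A;C,D) + 3\, I(C;D \mid A) + I(C;D \mid B) \;\ge\; 2\, I(C;D).
```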
The information inequality in Theorem 2 is a non-Shannon inequality because one can construct a rank function such that (1) h satisfies all the polymatroidal axioms (R1)-(R3) and (2) h violates the inequality (10).
To illustrate the technique in proving new inequalities, we will sketch the proof for Theorem 2. Further details can be found in [1,12].
Sketch of proof of Theorem 2:
Let h be the entropy function induced by a set of discrete random variables whose underlying distribution is p. Construct two auxiliary random variables and such that
It is easy to see that the marginals of and are the same. By invoking the basic Shannon inequalities (involving six random variables), we can prove that
Hence,
Similarly, we can also prove that
and consequently,
Again, by invoking only Shannon’s inequalities, it can be proved that
Combining (15) and (17), the theorem is proved. □
Remark: In the above proof of Theorem 2, the non-Shannon inequality is proved by invoking only a sequence of Shannon inequalities. This seems impossible at first glance, as by definition, non-Shannon inequalities are all inequalities that are not implied by Shannon inequalities. The trick, however, is to apply Shannon inequalities over a larger set of random variables.
Using the geometric framework obtained earlier, we will describe in the following a “geometric interpretation” of the proof of the non-Shannon inequality.
Consider a set such that . Let . We define as a function such that
for all . Similarly, for any subset of , is the following subset
Now, suppose that one can construct two cones and such that
- ;
- For any , there exists a such that . Or equivalently, .
From conditions 1 and 2, we have
Again, using Theorem 1, we can prove that an information inequality
is valid if
Equivalently, the inequality (18) is valid if the minimum of the following linear program is zero.
Remark: Instead of verifying whether an information inequality is valid or not, we can also use the Fourier-Motzkin elimination method to find all the linear inequalities that define the cone . Clearly, each such inequality corresponds to a valid information inequality over .
Now, we will revisit the non-Shannon inequality in Theorem 2. Let and . Given any random variables , construct two random variables and such that the probability distribution of is given by (11). Let g be the entropy function of and h be the entropy function of . Then it is easy to see that for all and ,
and .
Let
and (which is the set of all functions h that satisfy the polymatroidal axioms). Then clearly and . It can be numerically verified that the minimum of the linear program in (19) is zero when the information inequality is the non-Shannon inequality (9). Consequently, the non-Shannon inequality is indeed proved.
3.3. Non-Polyhedral Property
In the previous subsection, we discussed a promising technique for proving (or even discovering) new information inequalities. Using the same technique proposed in [1], more and more linear information inequalities have been discovered [8,13,14,15]. Later, in [9], Matúš obtained a countably infinite set of linear information inequalities for a set of four random variables. Using the same set of inequalities, Matúš further proved that is not polyhedral. In the following, we will review Matúš’ inequalities and their relaxation.
Remark: The non-polyhedral property of was later used in [16] to show that the set of achievable tuples of a network is in general also non-polyhedral. As a result, this proved that the Linear Programming bound is not tight in general.
Theorem 3 (Matúš)
Let and . Then
where for any distinct elements ,
While Matúš proved a series of linear information inequalities, it is sometimes difficult to use infinitely many inequalities at the same time. In [17], the series of Matúš’ inequalities was relaxed to a single non-linear inequality.
Remark: Using this single nonlinear inequality, it can be proved that the set of all almost entropic functions is not polyhedral.
Theorem 4
(Quadratic information inequality [17]) Let ,
If , then
and consequently,
Remark: Subject to the constraint that , the series of linear inequalities (20) is implied by the Shannon inequalities. Therefore, the constraint (i.e., ) imposed in Theorem 4 is not critical.
Conjecture 1
(20) holds for all . Consequently, if , then
4. Equivalent Frameworks
In the previous section, we described a framework for information inequalities for discrete random variables. We also demonstrated the common proof technique. In this section, we will construct several different frameworks which are “equivalent” or “almost equivalent” to the earlier one. These equivalence relations among different frameworks will turn out to be very useful in deriving new information-theoretic tools.
4.1. Differential Entropy
The previous framework for information inequalities assumes that all random variables are discrete. A very natural extension of the framework is thus to relax this restriction by allowing random variables to be continuous. To achieve this goal, we first need an analogue of discrete entropy for continuous random variables.
Definition 1 (Differential entropies)
Let be a set of continuous random variables such that are real-valued. For any , let be the density function of . Then the differential entropy of is defined as
Remark: For notational simplicity, we abuse notation by using to denote both discrete and differential entropy. However, the exact meaning should be clear from the context.
Discrete and differential entropies share similar and dissimilar properties. The main difference is that differential entropy can be negative, unlike discrete entropy. However, mutual information and its conditional counterpart (defined analogously to (7)) remain nonnegative. In fact, as we shall see, the sets of information inequalities for discrete and continuous random variables are almost the same.
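The following sketch illustrates numerically that differential entropy can be negative (the distributions used here are chosen purely for illustration):

```python
import math

# Differential entropy of Uniform(0, a) is log(a): negative whenever a < 1.
a = 0.1
h_uniform = math.log(a)                                    # ~ -2.30 nats

# Differential entropy of a Gaussian N(0, sigma^2) is 0.5 * log(2*pi*e*sigma^2).
sigma2 = 1.0
h_gauss = 0.5 * math.log(2 * math.pi * math.e * sigma2)    # ~ 1.42 nats

print(h_uniform, h_gauss)
```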
Definition 2 (Balanced inequalities)
An information inequality (for either discrete or continuous random variables) is called balanced if for all , .
For any information inequality or expression , its residual weight is defined as
Clearly, an information inequality is balanced if and only if for all .
Example 1
The residual weights of the information inequality are both equal to one. Hence, the inequality is not balanced.
For any information inequality , its balanced counterpart is the following inequality
which is balanced (as its name suggests).
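A minimal sketch, assuming an inequality of the form "sum over subsets alpha of b_alpha H(X_alpha) >= 0" is represented by a dictionary mapping subsets to coefficients (this representation is hypothetical), for computing residual weights and checking balancedness:

```python
def residual_weights(coeffs, n):
    """Residual weight of variable i: the sum of b_alpha over all subsets alpha containing i."""
    return {i: sum(b for alpha, b in coeffs.items() if i in alpha) for i in range(1, n + 1)}

def is_balanced(coeffs, n):
    """An inequality is balanced if every residual weight is zero."""
    return all(abs(r) < 1e-12 for r in residual_weights(coeffs, n).values())

# Hypothetical examples over two random variables X1, X2:
# H(X1, X2) >= 0  ->  residual weights (1, 1): not balanced.
print(residual_weights({(1, 2): 1}, 2), is_balanced({(1, 2): 1}, 2))
# I(X1; X2) >= 0, i.e. H(X1) + H(X2) - H(X1, X2) >= 0  ->  residual weights (0, 0): balanced.
print(residual_weights({(1,): 1, (2,): 1, (1, 2): -1}, 2),
      is_balanced({(1,): 1, (2,): 1, (1, 2): -1}, 2))
```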
Proposition 1
(Necessity and sufficiency of balanced inequalities [6]) An information inequality is a valid discrete information inequality if and only if
1. its residual weights for all n, and
2. its balanced counterpart is also valid.
Consequently, all valid discrete information inequalities are implied by the set of all valid balanced inequalities and the nonnegativity of (conditional) entropies.
It turns out that this set of balanced information inequalities also plays the same significant role for inequalities involving continuous random variables.
Theorem 5
(Equivalence [6]) All information inequalities for continuous random variables are balanced. Furthermore, a balanced information inequality
is valid for continuous random variables if and only if it is also valid for discrete random variables.
By Theorem 5, to characterise information inequalities, it is sufficient to consider only balanced information inequalities which are the same for either discrete or continuous random variables.
4.2. Inequalities for Kolmogorov Complexity
The second framework we will describe is quite different from the earlier information-theoretic frameworks. For information inequalities, the objects of interest are random variables. However, for the following Kolmogorov complexity framework, the objects of interest are deterministic strings instead.
To understand what Kolmogorov complexity is, let us consider the following example: Suppose that and are the following binary strings
The Kolmogorov complexity of a string x (denoted by ) is the minimal program length required to output that string [18]. In the above example, it is clear that the Kolmogorov complexity of is much smaller than that of (which is obtained by flipping a fair coin).
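Kolmogorov complexity itself is not computable, but compressed length is a rough practical proxy for it. The following sketch (an illustration only, not a definition) contrasts a highly structured string with a string of random coin flips:

```python
import os
import zlib

n = 10_000
structured = b"01" * (n // 2)                                   # a highly regular binary string
random_str = bytes((b & 1) + ord("0") for b in os.urandom(n))   # "coin flips" written as '0'/'1' characters

# Compressed length is an upper-bound-flavoured proxy for Kolmogorov complexity.
print(len(zlib.compress(structured, 9)))   # very small (a few dozen bytes)
print(len(zlib.compress(random_str, 9)))   # much larger (on the order of a kilobyte here)
```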
Although the objects of interest are different, [3] proved a surprising result that inequalities for Kolmogorov complexities and for entropies are essentially the same.
Theorem 6
(Equivalence [3]) An information inequality (for discrete random variables) is valid if and only if the corresponding Kolmogorov complexity inequality defined below
is also valid.
4.3. A Group-Theoretic Framework
Besides Kolmogorov complexities, information inequalities are also closely related to group-theoretic inequalities [4]. To understand their relation, we first illustrate how to construct a random variable from a subgroup.
Definition 3 (Group-theoretic construction of random variables)
Let G be a finite group and U be a random variable that takes value in G uniformly. In other words,
for all .
For any subgroup K of G, it partitions G into left (or right) cosets of K in G such that each coset has exactly elements. Note that each coset can be written as the following subset for some element
where ∘ is the binary group operator. Let be the collection of all left cosets of K in G. The subgroup K induces a random variable , which is defined as the random left coset of K in G that contains U. In fact, is equal to the following coset
Since U is uniformly distributed over G, we can easily prove that is uniformly distributed over and that
The above construction of a random variable from a subgroup can be extended naturally to multiple subgroups.
Theorem 7
(Group characterisable random variables [4]) Let G be a finite group and be a set of subgroups of G. For each , let be the random variable induced by the subgroup as defined above. Then for any ,
1. ,
2. ,
3. is uniformly distributed over its support. In other words, the value of the probability distribution function of is either zero or a constant.
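The sketch below (using S3 as a hypothetical example group, represented by permutation tuples) constructs the coset random variables for two subgroups and numerically checks the well-known identity that the joint entropy of the coset random variables indexed by a set alpha equals log(|G| / |G_alpha|), where G_alpha is the intersection of the corresponding subgroups:

```python
import math
from itertools import permutations

# Hypothetical example: G = S3, represented as tuples (images of 0, 1, 2).
G = list(permutations(range(3)))

def compose(p, q):
    """(p o q)(x) = p(q(x))."""
    return tuple(p[q[i]] for i in range(3))

# Two subgroups of S3 (chosen for illustration):
G1 = [p for p in G if p[2] == 2]                  # the permutations fixing the point 2
G2 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]            # the cyclic subgroup A3

def coset(K, u):
    """The left coset u o K, used as the value of the induced random variable."""
    return frozenset(compose(u, k) for k in K)

def H(alpha):
    """Joint entropy (in nats) of the coset random variables for the subgroups in `alpha`, with U uniform on G."""
    joint = {}
    for u in G:
        key = tuple(coset(K, u) for K in alpha)
        joint[key] = joint.get(key, 0.0) + 1 / len(G)
    return -sum(p * math.log(p) for p in joint.values())

inter = [p for p in G1 if p in G2]                 # G1 intersect G2 = {identity}
print(H([G1]), math.log(len(G) / len(G1)))         # both equal log 3
print(H([G1, G2]), math.log(len(G) / len(inter)))  # both equal log 6
```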
Definition 4
A function is called group characterisable if it is the entropy function of a set of random variables induced by a finite group G and its subgroups . Furthermore, h is
1. representable if are all vector spaces, and
2. abelian if G is abelian.
Clearly, random variables induced by a set of subgroups must satisfy all valid information inequalities. Therefore, we have the following theorem.
Theorem 8
(Group-theoretic inequalities [4]) Let
be a valid information inequality. Then for any finite group G and its subgroups , we have
or equivalently,
Theorem 8 shows that we can directly “translate” any information inequality into a group-theoretic inequality. A very surprising result proved in [4] was that the converse also holds.
Theorem 9
(Converse [4]) The information inequality (30) is valid if it is satisfied by all random variables induced by groups, or equivalently, if the group-theoretic inequality (32) is valid.
Theorems 8 and 9 imply that to prove an information inequality, it is necessary and sufficient to verify whether the inequality is satisfied by all random variables induced by groups. Later, we will further illustrate how to use the two theorems to derive a group-theoretic proof for information inequalities.
In the following, we will further show that many statistical properties of random variables induced by groups have analogous algebraic interpretations.
Lemma 1 (Properties of group induced random variables)
Suppose that is a set of random variables induced by a finite group G and its subgroups . Then
1. (Functional dependency) (i.e., is a function of ) if and only if . Hence, functional dependency is equivalent to the subset relation;
2. (Independency) if and only if
3. (Conditioning preserves group characterisation) for any fixed , the group and its subgroups for induce a set of random variables such that for all . In other words, for any group characterisable , let such that for all . Then g is also group characterisable.
Proposition 2
(Duality [19]) Let be a set of vector subspaces of over the finite field . Define the following subspace for :
Then, for any ,
Hence, if such that for all , then h is weakly representable.
Remark: While and W are both subspaces of V and , in general. If , then (defined as in (34)) is the orthogonal complement of .
Theorems 8 and 9 imply that proving an information inequality (30) is equivalent to proving a group-theoretic inequality (32). In the following, we will illustrate the idea by providing a group-theoretic proof for the nonnegativity of mutual information
Example 2 (Group-theoretic Proof)
Let G be a finite group and and be its subgroups. Let
where ∘ is the binary group operator. As S is a subset of , . With a simple counting argument (removing duplicates), it can easily be proved that
Therefore,
Finally, according to Theorems 8 and 9, the inequality (35) follows.
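Spelled out, with the entropies of the coset random variables written as log(|G|/|G_i|) (notation chosen here for illustration), the counting argument amounts to:

```latex
|S| \;=\; \frac{|G_1|\,|G_2|}{|G_1 \cap G_2|} \;\le\; |G|
\quad\Longleftrightarrow\quad
\log\frac{|G|}{|G_1|} + \log\frac{|G|}{|G_2|} \;\ge\; \log\frac{|G|}{|G_1 \cap G_2|},
```

and by the coset construction above, the right-hand side is exactly H(X1) + H(X2) ≥ H(X1, X2), i.e., the nonnegativity of I(X1; X2).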
It is worth mentioning that Theorems 8 and 9 also suggest an information-theoretic proof for group-theoretic inequalities. For example, the following information inequality
implies the following group-theoretic inequality
The meaning of this inequality and its implications in group theory are yet to be understood.
4.4. Combinatorial Perspective
Random variables that are induced by groups have many interesting properties. One such property is that they are quasi-uniform in nature.
Definition 5 (Quasi-uniform random variables)
A set of random variables is called quasi-uniform if for all , is uniformly distributed over its support . In other words,
Since is uniformly distributed for all , the entropy is thus equal to .
According to the Asymptotic Equipartition Property (AEP) [12], for a sufficiently long sequence of independent and identically distributed random variables, the set of typical sequences has a total probability close to one and the probability of each typical sequence is approximately the same. In a certain sense, quasi-uniform random variables possess a non-asymptotic equipartition property in that their probabilities are completely concentrated and uniformly distributed over their supports. As a result, quasi-uniform random variables can be fully characterised by their supports (because the probability distributions are uniform over the supports). This offers a combinatorial interpretation of quasi-uniform random variables. It turns out that this interpretation offers a combinatorial approach to proving information inequalities.
Definition 6 (Box assignment)
Let be nonempty finite sets and be their Cartesian product . A box assignment in is a nonempty subset of .
For any and , we define
Roughly speaking, is the set of elements in whose “-coordinate” is for . The set will be called the -layer of . Hence, contains all such that the -layer of is nonempty. We will call the α-projection of .
Definition 7 (Quasi-uniform box assignment)
A box assignment is called quasi-uniform if for any , the cardinality of is constant for all . We will denote this constant by for simplicity.
The following proposition shows that quasi-uniform box assignments and quasi-uniform random variables are in fact equivalent.
Proposition 3
(Equivalence [7]) Let be a set of quasi-uniform random variables and be its probability distribution’s support. Then is a quasi-uniform box assignment in . Furthermore, for all ,
Conversely, for any quasi-uniform box assignment , there exists a set of quasi-uniform random variables whose probability distribution’s support is indeed .
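The following sketch (with a hypothetical box assignment: two fair bits together with their parity) checks quasi-uniformity directly from Definition 7 and computes the entropies, in bits, of the induced quasi-uniform random variables as the logarithms of the projection sizes:

```python
import math
from itertools import combinations

def layers(A, alpha):
    """Group the points of the box assignment A by their alpha-coordinates (the alpha-layers)."""
    groups = {}
    for point in A:
        key = tuple(point[i] for i in alpha)
        groups.setdefault(key, []).append(point)
    return groups

def is_quasi_uniform(A, n):
    """A is quasi-uniform if, for every alpha, all nonempty alpha-layers have the same size."""
    for r in range(1, n + 1):
        for alpha in combinations(range(n), r):
            sizes = {len(v) for v in layers(A, alpha).values()}
            if len(sizes) > 1:
                return False
    return True

# Hypothetical example: the support {(x, y, x XOR y)} of two fair bits and their parity.
A = [(x, y, x ^ y) for x in (0, 1) for y in (0, 1)]
print(is_quasi_uniform(A, 3))                  # True
# Entropy (in bits) of the induced quasi-uniform random variables: log2 of the alpha-projection size.
for alpha in [(0,), (2,), (0, 1), (0, 1, 2)]:
    print(alpha, math.log2(len(layers(A, alpha))))   # 1, 1, 2, 2 bits
```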
As random variables induced by groups are quasi-uniform, by Theorems 8 and 9, we have the following combinatorial interpretation for information inequalities.
Theorem 10
(Combinatorial interpretation [7]) An information inequality
is valid if and only if the following box assignment inequality is valid
or equivalently,
for all quasi-uniform box assignments .
Again, in the following example, we will illustrate how to use the combinatorial interpretation to derive a “combinatorial proof” of an information inequality.
Example 3 (Combinatorial proof)
Let be a quasi-uniform box assignment in . Suppose . Then it is obvious that and . In other words, and consequently,
By Theorem 10, we prove that .
4.5. Coding Perspective
We can also view a box assignment as an error-correcting code such that is the set of all codewords. For each codeword , is the symbol to be transmitted across a channel. Taking this coding perspective, in the following a box assignment will simply be called a code. Also, a code is called a quasi-uniform code if is a quasi-uniform box assignment. Again, each quasi-uniform code induces a set of quasi-uniform random variables .
For any code (which is just a box assignment) and two codewords , the Hamming distance between codewords and is defined as
In addition, the minimum Hamming distance of the code is defined as
The minimum Hamming distance of a code characterises how strong the error-correcting capability of the code is. Specifically, a code with minimum Hamming distance d can correct up to ⌊(d−1)/2⌋ symbol errors.
Example 4
Let be a length-3 code containing only two codewords and . The minimum Hamming distance of this code is 3 and hence can correct any single symbol error. For instance, suppose the codeword is transmitted. If a symbol error occurs, the receiver will receive either , or . In any case, the receiver can always determine which symbol is erroneous (by using a bounded-distance decoder) and hence can correct it.
In addition to the minimum Hamming distance, in many cases, a code’s distance profile is also of great importance: Let be a code and c be a codeword in . The distance profile of centered at c is a set of integers where
In other words, is the number of codewords in whose Hamming distance to the centre codeword c is r.
The profile contains information about how likely a decoding error (i.e., the receiver decoding a wrong codeword) is to occur if the transmitted codeword is c. In general, the distance profile depends on the choice of c. A code is called distance-invariant if its distance profile is independent of c. Roughly speaking, a distance-invariant code is one where the probability of decoding error is the same for all transmitted codewords.
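A short sketch (using a hypothetical length-3 even-weight binary code) computing Hamming distances, the minimum distance and the distance profiles, and observing that all profiles coincide, i.e., the code is distance-invariant:

```python
from itertools import combinations

# Hypothetical quasi-uniform (in fact linear) code: all even-weight binary words of length 3.
C = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

def hamming(c1, c2):
    """Number of coordinates in which the two codewords differ."""
    return sum(a != b for a, b in zip(c1, c2))

def min_distance(code):
    return min(hamming(c1, c2) for c1, c2 in combinations(code, 2))

def distance_profile(code, centre):
    """B_r = number of codewords at Hamming distance r from the centre codeword."""
    profile = {}
    for c in code:
        r = hamming(centre, c)
        profile[r] = profile.get(r, 0) + 1
    return profile

print(min_distance(C))                               # 2
print({c: distance_profile(C, c) for c in C})        # identical profiles -> distance-invariant
```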
Theorem 11
(Distance invariance [20]) Quasi-uniform codes are distance-invariant.
Example 5 (Linear codes)
Let P be a parity check matrix (over a finite field ) and the code is defined by
Then is called a linear code. Note that for a linear code, if , then is also contained in . Linear codes are quasi-uniform codes and hence are also distance-invariant.
In the following, we will consider only quasi-uniform codes. For simplicity, we will assume without loss of generality that there is a zero-codeword (by renaming). Also, for any , we define the Hamming weight of the codeword c (denoted by ) as .
Definition 8 (Weight enumerator)
The weight enumerator of a quasi-uniform code C with length n is
where x and y are indeterminates, and . Using simple counting, it is easy to prove that
In many cases, it is more convenient to work with the weight enumerator than the distance profile. However, conceptually, they are equivalent (i.e., they can be uniquely obtained from each other). Clearly, the weight enumerator is uniquely determined by the code . However, what “structural property” of the code determines the weight enumerator? For example, suppose that we construct a new code from by exchanging the first and the second codeword symbols. It is obvious that this modification will not affect the weight enumerator. In other words, the ordering of the codeword symbols has no effect on the weight enumerator. The question therefore is: What property of a code has a direct effect on the weight enumerator?
To answer the question, let us return to the perspective that a quasi-uniform code is merely a quasi-uniform box assignment (and also its associated set of quasi-uniform random variables). These random variables have a simple interpretation here: Suppose a codeword is randomly and uniformly selected from . Then is the symbol in the random codeword C, i.e., .
Theorem 12
(Generalised Greene’s Theorem [20]) Let C be a quasi-uniform code and be its induced quasi-uniform random variables. Suppose that ρ is the entropy function of . In other words, . Then
Remark: Greene’s Theorem is a special case of Theorem 12 when the code is a linear code.
By Theorem 12, the weight enumerator (and hence also the error-correcting capability) of a quasi-uniform code depends only on the entropy function induced by the codeword symbol random variables. By exploiting the relation between the entropy function of a set of quasi-uniform random variables and the weight enumerator of the induced code, we open a new door to harnessing coding-theory results to derive new information-theoretic results.
Example 6
(Code-theoretic proof) Consider a set of quasi-uniform random variables which induces a length-2 quasi-uniform code C. By the Generalised Greene’s Theorem, the number of codewords with Hamming weight 1 is given by
As is nonnegative, (50) implies that
Finally, by Theorem 10 (or, to be precise, a variation of it), an information inequality holds if and only if it also holds for all quasi-uniform random variables. Consequently, we have proved that (51) holds for all random variables.
5. Constrained Information Inequalities
In previous sections, we considered general information inequalities, where no constraint is imposed on the choice of random variables. In the following, we will focus on two constrained classes of information inequalities: subspace rank inequalities and determinantal inequalities.
5.1. Rank Inequalities
Let be a set of vector subspaces over a field . A subspace rank inequality is an inequality about the rank or dimension of subspaces in the following form:
For example, it is straightforward to prove that
which is a direct consequence of the following identity
Subspace rank inequalities are in fact constrained information inequalities, subject to the constraint that the random variables are induced by vector subspaces over a field. Clearly, all valid information inequalities (including all Shannon inequalities) are subspace rank inequalities. For example, the subspace rank inequality (53) is indeed equivalent to the nonnegativity of mutual information. Besides these known unconstrained information inequalities, one of the most well-known subspace rank inequalities is the Ingleton inequality [21]. A recent work [22] proved that the Ingleton inequalities also include Shannon inequalities as special cases and determined the unique minimal set of Ingleton inequalities that imply all the others.
Theorem 13 (Ingleton inequality)
Suppose r is a representable polymatroid over . Then for every choice of subsets
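In a commonly used form (with the four subsets written here as A, B, C, D for illustration), the Ingleton inequality reads:

```latex
r(A) + r(B) + r(C \cup D) + r(A \cup B \cup C) + r(A \cup B \cup D)
\;\le\;
r(A \cup B) + r(A \cup C) + r(A \cup D) + r(B \cup C) + r(B \cup D).
```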
It had been open for years whether there exist subspace rank inequalities that are not implied by the Ingleton inequalities and the Shannon inequalities. It was not until recently that the question was finally answered. In [5], the insufficiency of the Ingleton inequalities to characterise all subspace rank inequalities was proved, and in [23,24], new subspace rank inequalities not implied by the Ingleton inequalities were explicitly constructed. In fact, the subspace rank inequalities for up to five variables have all been determined. However, a complete characterisation involving more than five variables is still missing. In the following, we will review some of the important results along this line of work.
Theorem 14
(Kinser [23]) Suppose and h is representable over . Then
Or equivalently,
Theorem 15
(Dougherty et al. [24]) Suppose and h is representable over . Then
Remark: In addition to the inequalities obtained in Theorem 15, the work [24] found all subspace rank inequalities in five variables (called the DFZ inequalities) as well as many more new inequalities in six variables.
Definition 9 (ϵ-truncation)
Let h be a polymatroid over and . Define g as follows where
Then g is called the ϵ-truncation of h.
Definition 10 (Truncation-preserving inequalities)
Let . A set of rank inequalities
is said to preserve truncation (or is truncation-preserving) if for any h satisfying all the inequalities in (60), its truncation also satisfies all the inequalities.
Proposition 4
(Chan et al. [5]) DFZ inequalities are truncation preserving.
Theorem 16
(Insufficiency of truncation preserving inequalities [5]) Let be the set of all subspace rank inequalities involving n variables (or subspaces). Then for sufficiently large n, is not truncation-preserving.
5.2. Determinantal Inequalities
Information inequalities for Gaussian random variables are another interesting class of information inequalities. As we shall see, they are equivalent to determinantal inequalities.
Definition 11 (Gaussian polymatroid)
Let h be a polymatroid over . It is called Gaussian if there exists a set of jointly Gaussian random variables with a covariance matrix and a partition of into n disjoint nonempty subsets such that for any ,
where for all . Furthermore, h is called weakly Gaussian if there exists such that is Gaussian, and almost Gaussian if h is the limit of a sequence of weakly Gaussian functions.
It is straightforward to prove that the weakly Gaussian property is closed under addition. In other words, if h and g are weakly Gaussian, then their sum is also weakly Gaussian. Furthermore, as with information inequalities for continuous random variables, if an inequality
holds for all Gaussian random variables [25], then it must be balanced. Therefore, in the following, we will only consider balanced information inequalities.
Let be a set of jointly Gaussian random variables with covariance matrix K, which is a positive definite matrix. Suppose is partitioned into n disjoint nonempty subsets . A very compelling property of a set of Gaussian random variables is that its entropy and the determinant of its covariance matrix are related as follows:
where is the principal submatrix of K obtained by deleting the rows and columns that are not indexed by β. Substituting (63) back into (62), the inequality (62) is satisfied by all Gaussian random variables if and only if
where . Since the inequality (62) is balanced,
for all . On the other hand,
Therefore, the inequality (62) holds for all Gaussian random variables if and only if the following determinantal inequality holds for all positive definite matrices K
or equivalently,
As a direct consequence, for any valid information inequality, we can use the above relation to derive a corresponding determinantal inequality. For example, the following well-known determinantal inequalities can all be proved using this “information-theoretic method”.
- (Hadamard inequality) Let K be a positive definite matrix. Then where is the diagonal entry of K (a numerical check of this inequality is sketched after this list). This inequality follows from the following information inequality
- (Szasz inequality) For any , This determinantal inequality follows from the following information inequality
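As a numerical sanity check (a sketch; the positive definite matrix below is randomly generated for illustration), the Hadamard inequality and its information-theoretic reading can be verified as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Generate a random positive definite matrix K = B B^T + I.
n = 5
B = rng.standard_normal((n, n))
K = B @ B.T + np.eye(n)

lhs = np.linalg.det(K)
rhs = np.prod(np.diag(K))
print(lhs <= rhs + 1e-9, lhs, rhs)   # Hadamard: det(K) <= product of the diagonal entries of K

# Equivalently, via the Gaussian entropy relation h(X_beta) = 0.5 * log((2*pi*e)^|beta| * det(K_beta)),
# Hadamard's inequality is the determinantal form of  sum_i h(X_i) >= h(X_1, ..., X_n).
h_joint = 0.5 * np.log((2 * np.pi * np.e) ** n * lhs)
h_marginals = sum(0.5 * np.log(2 * np.pi * np.e * K[i, i]) for i in range(n))
print(h_marginals >= h_joint - 1e-9)
```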
Finally, we conclude this section with the following open question: While a Gaussian polymatroid is clearly almost entropic, is it true that an almost entropic polymatroid is almost Gaussian? In other words, for any almost entropic polymatroid h, can we construct a sequence of Gaussian polymatroids such that
for some for all i.
6. Summary and Conclusions
In this paper, we have reviewed some of the recent progress in the characterisation of information inequalities. We began with a geometric framework for information inequalities which has simplified the understanding of information inequalities. We also reviewed how the first non-Shannon inequality was proved and highlighted the general idea behind the proof. Next, we studied the infinite series of inequalities over and considered a nonlinear relaxation of the series of inequalities.
We have also reviewed how information inequalities are related to Kolmogorov complexity inequalities, group-theoretic inequalities and inequalities for box assignments. Based on their relations, we demonstrated non-traditional approaches to proving information inequalities.
Finally, we investigated two constrained classes of information inequalities. The first class arises when the random variables are induced by vector subspaces. In this case, the constrained inequalities are equivalent to subspace rank inequalities. We showed that the Ingleton and DFZ inequalities are insufficient to characterise all subspace rank inequalities in general, as the set of all subspace rank inequalities is not truncation-preserving. The second constrained class arises when the random variables are Gaussian. We have shown that these constrained inequalities are in fact equivalent to determinantal inequalities.
As a final remark, we would like to emphasise that this survey paper does not aim to cover every aspect of information inequalities. In fact, there are many interesting pieces of work that we did not cover. For example, as pointed out by one of the reviewers, one very interesting area is the relation between convex body inequalities and information inequalities [26,27]. We strongly encourage interested readers to further explore those related areas.
Acknowledgements
This work was supported by the Australian Government under ARC grant DP1094571.
References
- Zhang, Z.; Yeung, R.W. On the characterization of entropy function via information inequalities. IEEE Trans. Inform. Theory 1998, 44, 1440–1452.
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423, 623–656.
- Hammer, D.; Romashchenko, A.; Shen, A.; Vereshchagin, N. Inequalities for Shannon entropy and Kolmogorov complexity. J. Comput. Syst. Sci. 2000, 60, 442–464.
- Chan, T.H.; Yeung, R.W. On a relation between information inequalities and group theory. IEEE Trans. Inform. Theory 2002, 48, 1992–1995.
- Chan, T.H.; Grant, A.; Kern, D. Novel technique in characterising representable polymatroids. IEEE Trans. Inform. Theory 2009, submitted for publication.
- Chan, T.H. Balanced information inequalities. IEEE Trans. Inform. Theory 2003, 49, 3261–3267.
- Chan, T.H. A combinatorial approach to information inequalities. Commun. Inform. Syst. 2001, 1, 1–14.
- Dougherty, R.; Freiling, C.; Zeger, K. Six new non-Shannon information inequalities. In Proceedings of the IEEE International Symposium on Information Theory, 2006; pp. 233–236.
- Matus, F. Infinitely many information inequalities. In Proceedings of ISIT 2007, Nice, France, June 2007.
- Yeung, R. A framework for linear information inequalities. IEEE Trans. Inform. Theory 1997, 43, 1924–1934.
- Yeung, R.; Yan, Y. Information Theoretic Inequality Prover. Available online: http://user-www.ie.cuhk.edu.hk/ITIP/ (accessed on 27 January 2011).
- Yeung, R. A First Course in Information Theory; Kluwer Academic/Plenum Publishers: New York, NY, USA, 2002.
- Yeung, R.W.; Zhang, Z. A class of non-Shannon-type information inequalities and their applications. Commun. Inform. Syst. 2001, 1, 87–100.
- Sason, I. Identification of new classes of non-Shannon type constrained information inequalities and their relation to finite groups. In Proceedings of the 2002 IEEE International Symposium on Information Theory, Lausanne, Switzerland, 30 June–5 July 2002.
- Makarychev, K.; Makarychev, Y.; Romashchenko, A.; Vereshchagin, N. A new class of non-Shannon-type inequalities for entropies. Commun. Inform. Syst. 2002, 2, 147–165.
- Chan, T.H.; Grant, A. Dualities between entropy functions and network codes. IEEE Trans. Inform. Theory 2008, 54, 4470–4487.
- Chan, T.; Grant, A. Non-linear information inequalities. Entropy 2008, 10, 765–775.
- Strictly speaking, the Kolmogorov complexity of a string depends on the chosen “computer model”. However, the choice of the computer model only affects the resulting Kolmogorov complexity up to an additive constant (because different computer models can emulate each other). Asymptotically, such a difference is not significant.
- Chan, T.H.; Grant, A. Linear programming bounds for network coding. IEEE Trans. Inform. Theory 2011, submitted for publication.
- Chan, T.H.; Grant, A.; Britz, T. Properties of quasi-uniform codes. In Proceedings of the 2010 IEEE International Symposium on Information Theory, Austin, TX, USA, June 2010.
- Ingleton inequalities are not valid information inequalities, as there exist almost entropic polymatroids violating the inequalities.
- Guille, L.; Chan, T.H.; Grant, A. The minimal set of Ingleton inequalities. IEEE Trans. Inform. Theory 2009.
- Kinser, R. New inequalities for subspace arrangements. J. Combin. Theory Ser. A 2010.
- Dougherty, R.; Freiling, C.; Zeger, K. Linear rank inequalities on five or more variables. arXiv preprint, 2009.
- Each Xi can be a vector of jointly distributed Gaussian random variables as defined in (61).
- Lutwak, E.; Yang, D.; Zhang, G. Cramer-Rao and moment-entropy inequalities for Renyi entropy and generalized Fisher information. IEEE Trans. Inform. Theory 2005, 51, 473–478.
- Lutwak, E.; Yang, D.; Zhang, G. Moment-entropy inequalities for a random vector. IEEE Trans. Inform. Theory 2007, 53, 1603–1607.
© 2011 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license http://creativecommons.org/licenses/by/3.0/.