Information Inequalities via Submodularity and a Problem in Extremal Graph Theory

The present paper offers, in its first part, a unified approach for the derivation of families of inequalities for set functions which satisfy sub/supermodularity properties. It applies this approach for the derivation of information inequalities with Shannon information measures. Connections of the considered approach to a generalized version of Shearer’s lemma, and other related results in the literature are considered. Some of the derived information inequalities are new, and also known results (such as a generalized version of Han’s inequality) are reproduced in a simple and unified way. In its second part, this paper applies the generalized Han’s inequality to analyze a problem in extremal graph theory. This problem is motivated and analyzed from the perspective of information theory, and the analysis leads to generalized and refined bounds. The two parts of this paper are meant to be independently accessible to the reader.


I. INTRODUCTION
Information measures and information inequalities are of fundamental importance and wide applicability in the study of feasibility and infeasibility results in information theory, while also offering very useful tools which serve to deal with interesting problems in various fields of mathematics [1], [2]. The characterization of information inequalities has been of interest for decades (see, e.g., [3], [4] and references therein), mainly triggered by their indispensable role in proving direct and converse results for channel coding and data compression for single and multi-user information systems. Information inequalities, which apply to classical and generalized information measures, have also demonstrated far-reaching consequences beyond the study of the coding theorems and fundamental limits of communication systems. One such remarkable example (among many) is the usefulness of information measures and information inequalities in providing information-theoretic proofs in the field of combinatorics and graph theory (see, e.g., [5]–[22]).
A basic property that is commonly used for the characterization of information inequalities relies on the nonnegativity of the (conditional and unconditional) Shannon entropy of discrete random variables, the nonnegativity of the (conditional and unconditional) relative entropy and the Shannon mutual information of general random variables, and the chain rules which hold for these classical information measures. A byproduct of these properties is the sub/supermodularity of some classical information measures, which also proves to be useful by taking advantage of the vast literature on sub/supermodular functions and polymatroids [22]–[31]. Another instrumental tool is a generalized version of Shearer's lemma, whose connections to the suggested approach, and to other results in the literature, are considered in Section III. Most of the results in Section III are proved in Section IV. Section V applies the generalized Han's inequality to a problem in extremal graph theory (Theorem 2). A byproduct of Theorem 2, which is of interest in its own right, is also analyzed in Section V (Theorem 3). The presentation and analysis in Section V are accessible to the reader, independently of the earlier material on information inequalities in Sections III and IV. Some additional proofs, mostly for making the paper self-contained or for suggesting an alternative proof, are relegated to the appendices (Appendices A and B).

II. PRELIMINARIES AND NOTATION
The present section provides essential notation and preliminary material for this paper.
• R denotes the set of real numbers, and R_+ denotes the set of nonnegative real numbers.
• ∅ denotes the empty set.

• 2^Ω ≜ {A : A ⊆ Ω} denotes the power set of a set Ω (i.e., the set of all subsets of Ω).
• T^c ≜ Ω \ T denotes the complement of a subset T in Ω.
• ½{E} denotes the indicator of the event E; it equals 1 if the event E occurs, and 0 otherwise.
• [n] ≜ {1, . . ., n} for every n ∈ N.
• X^n ≜ (X_1, . . ., X_n) denotes an n-dimensional random vector.
• X_S ≜ (X_i)_{i∈S} is a random vector for a nonempty subset S ⊆ [n]; if S = ∅, then X_S is an empty set, and conditioning on X_S is void.
• Let X be a discrete random variable that takes its values on a set 𝒳, and let P_X be the probability mass function (PMF) of X. The Shannon entropy of X is given by
  H(X) = −∑_{x∈𝒳} P_X(x) log P_X(x),   (1)
where, throughout this paper, we take all logarithms to base 2.
• The binary entropy function H_b : [0, 1] → [0, log 2] is given by
  H_b(p) = −p log p − (1 − p) log(1 − p),   p ∈ [0, 1],   (2)
where, by continuous extension, the convention 0 log 0 = 0 is used.
• Let X and Y be discrete random variables with a joint PMF P_{XY}, and a conditional PMF of X given Y denoted by P_{X|Y}. The conditional entropy of X given Y is defined as
  H(X|Y) = −∑_{x,y} P_{XY}(x, y) log P_{X|Y}(x|y),   (3)
and
  H(X, Y) = H(Y) + H(X|Y).   (4)
• The mutual information between X and Y is symmetric in X and Y, and it is given by
  I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X).   (5)
• The conditional mutual information between two random variables X and Y, given a third random variable Z, is symmetric in X and Y, and it is given by
  I(X; Y|Z) = H(X|Z) − H(X|Y, Z).   (6)
• For continuous random variables, the sums in (1) and (3) are replaced with integrals, and the PMFs are replaced with probability densities. The entropy of a continuous random variable is named differential entropy.
• For an n-dimensional random vector X^n, the entropy power of X^n is given by
  N(X^n) ≜ 2^{(2/n) h(X^n)},   (7)
where the base of the exponent is identical to the base of the logarithm in (1).
We rely on the following basic properties of the Shannon information measures:
• Conditioning cannot increase the entropy, i.e.,
  H(X|Y) ≤ H(X),   (8)
with equality in (8) if and only if X and Y are independent.
• Generalizing (4) to n-dimensional random vectors gives the chain rule
  H(X^n) = ∑_{i=1}^{n} H(X_i | X_1, . . ., X_{i−1}).   (9)
• The subadditivity property of the entropy is implied by (8) and (9):
  H(X^n) ≤ ∑_{i=1}^{n} H(X_i),   (10)
with equality in (10) if and only if X_1, . . ., X_n are independent random variables.
• Nonnegativity of the (conditional) mutual information: In light of (5) and (8), I(X; Y) ≥ 0 with equality if and only if X and Y are independent. More generally, I(X; Y|Z) ≥ 0 with equality if and only if X and Y are conditionally independent given Z.
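These basic properties are straightforward to verify numerically. The following Python sketch checks the chain rule in (9) and the subadditivity property in (10); the helper functions and the small joint PMF of three bits are illustrative assumptions, not objects from the paper.

```python
import math

def entropy(pmf):
    """Shannon entropy (base 2) of a PMF given as a dict value -> probability."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, coords):
    """Marginal PMF of the coordinates in `coords` from a joint PMF over tuples."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

# An arbitrarily chosen joint PMF of three correlated bits (X1, X2, X3).
joint = {(0, 0, 0): 0.3, (0, 1, 1): 0.2, (1, 0, 1): 0.25,
         (1, 1, 0): 0.15, (1, 1, 1): 0.1}

H123 = entropy(joint)
H1 = entropy(marginal(joint, (0,)))
H12 = entropy(marginal(joint, (0, 1)))

# Chain rule (9): H(X1,X2,X3) = H(X1) + H(X2|X1) + H(X3|X1,X2),
# where each conditional entropy is a difference of joint entropies.
chain = H1 + (H12 - H1) + (H123 - H12)
assert abs(chain - H123) < 1e-9

# Subadditivity (10): H(X1,X2,X3) <= H(X1) + H(X2) + H(X3).
H2 = entropy(marginal(joint, (1,)))
H3 = entropy(marginal(joint, (2,)))
assert H123 <= H1 + H2 + H3 + 1e-9
print(round(H123, 4), round(H1 + H2 + H3, 4))
```

The slack in the subadditivity check reflects the correlation among the three bits; it closes exactly when the bits are independent.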
Let Ω be a finite and non-empty set, and let f : 2^Ω → R be a real-valued set function (i.e., f is defined for all subsets of Ω). The following definitions are used.
Definition 1 (Sub/Supermodular function): The set function f : 2^Ω → R is submodular if
  f(S) + f(T) ≥ f(S ∪ T) + f(S ∩ T),   for all S, T ⊆ Ω,   (11)
and it is supermodular if −f is submodular. An identical characterization of submodularity is the diminishing return property (see, e.g., Proposition 2.2 in [23]), where a set function f : 2^Ω → R is submodular if and only if
  f(S ∪ {ω}) − f(S) ≥ f(T ∪ {ω}) − f(T),   for all S ⊆ T ⊆ Ω and ω ∈ Ω \ T.   (12)
This means that the larger the set is, the smaller is the increase in f when a new element is added.
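The diminishing return characterization can be tested numerically for the joint-entropy set function f(T) = H(X_T). The sketch below uses a randomly generated joint PMF of four bits (an illustrative assumption) and exhaustively verifies the inequality for all S ⊆ T and ω ∉ T.

```python
import itertools
import math
import random

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, coords):
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

random.seed(0)
atoms = list(itertools.product([0, 1], repeat=4))
weights = [random.random() for _ in atoms]
total = sum(weights)
joint = {a: w / total for a, w in zip(atoms, weights)}  # random joint PMF of 4 bits

def f(T):
    """The set function f(T) = H(X_T); by convention f(empty set) = 0."""
    return entropy(marginal(joint, tuple(sorted(T)))) if T else 0.0

ground = set(range(4))
all_subsets = [set(c) for r in range(5) for c in itertools.combinations(range(4), r)]

# Diminishing returns: f(S u {w}) - f(S) >= f(T u {w}) - f(T) for all S <= T, w not in T.
for T in all_subsets:
    for S in all_subsets:
        if not S <= T:
            continue
        for w in ground - T:
            assert f(S | {w}) - f(S) >= f(T | {w}) - f(T) - 1e-9
print("diminishing-return property verified for f(T) = H(X_T)")
```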
Definition 2 (Monotonic set function): A set function f : 2^Ω → R is monotonically increasing if S ⊆ T ⊆ Ω implies f(S) ≤ f(T). Likewise, f is monotonically decreasing if −f is monotonically increasing.
Definition 3 (Polymatroid, ground set and rank function): Let f : 2^Ω → R be a submodular and monotonically increasing set function with f(∅) = 0. The pair (Ω, f) is called a polymatroid, Ω is called a ground set, and f is called a rank function.

Proposition 1:
Let Ω be a finite and non-empty set, and let {X_ω}_{ω∈Ω} be a collection of discrete random variables. Then, the following holds:
a) The set function f : 2^Ω → R, given by
  f(T) ≜ H(X_T),   T ⊆ Ω,   (15)
is a rank function.
b) The set function f : 2^Ω → R, given by
  f(T) ≜ H(X_T | X_{T^c}),   T ⊆ Ω,   (16)
is supermodular, monotonically increasing, and f(∅) = 0.
c) The set function f : 2^Ω → R, given by
  f(T) ≜ I(X_T ; X_{T^c}),   T ⊆ Ω,   (17)
is submodular with f(∅) = 0, but f is not a rank function. The latter holds since the equality f(T) = f(T^c), for all T ⊆ Ω, implies that f is not a monotonic function.
d) Let U, V ⊆ Ω be disjoint subsets, and let the entries of the random vector X_V be conditionally independent given X_U. Then, the set function f : 2^V → R given by
  f(T) ≜ I(X_U ; X_T),   T ⊆ V,   (18)
is a rank function.
e) Let X_Ω = {X_ω}_{ω∈Ω} be independent random variables, and let the set function f : 2^Ω → R be given by
  f(T) ≜ H(∑_{ω∈T} X_ω),   T ⊆ Ω.   (19)
Then, f is a rank function.
The following proposition addresses the setting of general alphabets.
Proposition 2: For general alphabets, the set functions f in (15) and (17)–(19) are submodular, and the set function f in (16) is supermodular with f(∅) = 0. Moreover, the function in (18) remains a rank function, and the function in (19) remains monotonically increasing.
Proof: The sub/supermodularity properties in Proposition 1 are preserved due to the nonnegativity of the (conditional) mutual information. The monotonicity property of the functions in (18) and (19) is preserved also in the general alphabet setting due to (A.10) and (A.14c), and the mutual information in (18) is nonnegative.
Remark 1: In contrast to the entropy of discrete random variables, the differential entropy of continuous random variables is not functionally submodular in the sense of Lemma A.2 in [38].This refers to a different form of submodularity, which was needed by Tao [38] to prove sumset inequalities for the entropy of discrete random variables.A follow-up study in [39] by Kontoyiannis and Madiman required substantially new proof strategies for the derivation of sumset inequalities with the differential entropy of continuous random variables.The basic property which replaces the discrete functional submodularity is the data-processing property of mutual information [39].In the context of the present work, where the commonly used definition of submodularity is used (see Definition 1), the Shannon entropy of discrete random variables and the differential entropy of continuous random variables are both submodular set functions.
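For jointly Gaussian vectors, the differential entropy of a subvector is h(X_T) = ½ log((2πe)^{|T|} det Σ_T), so the submodularity of the differential entropy reduces to the submodularity of the log-determinant over principal submatrices. A minimal numerical check of this (assuming NumPy, with a randomly generated positive-definite covariance as an illustrative example) is:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 4
B = rng.normal(size=(n, n))
Sigma = B @ B.T + n * np.eye(n)  # randomly generated positive-definite covariance

def h(T):
    """Differential entropy (nats) of the Gaussian subvector indexed by T; h(empty) = 0."""
    T = sorted(T)
    if not T:
        return 0.0
    sub = Sigma[np.ix_(T, T)]
    return 0.5 * np.log((2 * np.pi * np.e) ** len(T) * np.linalg.det(sub))

subsets = [set(c) for r in range(n + 1) for c in itertools.combinations(range(n), r)]

# Submodularity: h(S) + h(T) >= h(S u T) + h(S n T) for all S, T (Definition 1).
for S in subsets:
    for T in subsets:
        assert h(S) + h(T) >= h(S | T) + h(S & T) - 1e-9
print("differential entropy is submodular on this Gaussian example")
```

For disjoint S and T, the inequality reduces to Fischer's inequality det Σ_{S∪T} ≤ det Σ_S det Σ_T for positive-definite matrices.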
We rely, in this paper, on the following standard terminology for graphs. An undirected graph G is an ordered pair G = (V, E), where V = V(G) is a set of elements, and E = E(G) is a set of 2-element subsets (pairs) of V. The elements of V are called the vertices of G, and the elements of E are called the edges of G. The number of vertices in a finite graph G is called the order of G, and the number of edges is called the size of G. Throughout this paper, we assume that the graph G is undirected and finite; it is also assumed to be a simple graph, i.e., it has no loops (no edge connects a vertex in G to itself) and there are no multiple edges which connect a pair of vertices in G. If e = {u, v} ∈ E(G), then the vertices u and v are the two ends of the edge e. The vertices u and v are adjacent (neighbors) if they are connected by an edge in G, i.e., if e = {u, v} ∈ E(G).

A. A New Methodology
The present subsection presents a new methodology for the derivation of families of inequalities for set functions, and in particular inequalities with information measures. The suggested methodology relies, to a large extent, on the notion of submodularity of set functions, and it is presented in the next theorem.
Theorem 1: Let Ω be a finite set with |Ω| = n. Let f : 2^Ω → R with f(∅) = 0, and g : R → R. Let the sequence {t_k^{(n)}}_{k=1}^{n} be given by
  t_k^{(n)} ≜ (1/\binom{n}{k}) ∑_{T⊆Ω: |T|=k} g(f(T)/k),   k ∈ [n].   (20)
a) If f is submodular, and g is monotonically increasing and convex, then the sequence {t_k^{(n)}}_{k=1}^{n} is monotonically decreasing, i.e.,
  t_1^{(n)} ≥ t_2^{(n)} ≥ . . . ≥ t_n^{(n)}.   (21)
In particular,
  g(f(Ω)/n) ≤ (1/n) ∑_{ω∈Ω} g(f({ω})).   (22)
b) If f is submodular, and g is monotonically decreasing and concave, then the sequence {t_k^{(n)}}_{k=1}^{n} is monotonically increasing.
c) If f is supermodular, and g is monotonically increasing and concave, then the sequence {t_k^{(n)}}_{k=1}^{n} is monotonically increasing.
d) If f is supermodular, and g is monotonically decreasing and convex, then the sequence {t_k^{(n)}}_{k=1}^{n} is monotonically decreasing.
Proof: See Section IV-A.
Corollary 1: Let Ω be a finite set with |Ω| = n, let f : 2^Ω → R, and let g : R → R be convex and monotonically increasing. If
• f is a rank function, and
• g(0) > 0, or there is ℓ ∈ N such that g(0) = . . . = g^{(ℓ−1)}(0) = 0 with g^{(ℓ)}(0) > 0,
then equality (23) holds. Furthermore, if the limit on the left-hand side of (24) exists, then (24) holds as a special case of (23).
Proof: See Section IV-B.
Further applications of Theorem 1 lead to the next corollary, which partially introduces some known results that have been proved on a case-by-case basis in [1, Theorems 17.6.1–17.6.3] and [2, Section 2]. In particular, the monotonicity properties of the sequences in (30), (32), (33) and (34) were proved in Theorems 1 and 2, and Corollaries 1 and 2 of [2]. Both known and new results are readily obtained here, in a unified way, from Theorem 1. The utility of one of these inequalities in extremal combinatorics is discussed in the continuation to this subsection (see Proposition 3), providing a natural generalization of a beautiful combinatorial result in [19, Section 3.2].
Corollary 3: Let {X_i}_{i=1}^{n} be random variables with finite entropies. Then, the following holds:
a) The sequences in (30) and (31) are monotonically decreasing in k. If {X_i}_{i=1}^{n} are independent, then also the sequence in (32) is monotonically decreasing in k.
b) The sequence in (33) is monotonically increasing in k.
c) For every r > 0, the sequences in (34)–(36) are monotonically decreasing in k. If {X_i}_{i=1}^{n} are independent, then also the sequence in (37) is monotonically decreasing in k.
Proof: The finite entropies of {X_i}_{i=1}^{n} assure that the entropies involved in the sequences (30)–(37) are finite. Item (a) follows from Theorem 1a, where the submodular set functions f which correspond to (30)–(32) are given in (15), (17) and (19), respectively, and g is the identity function on the real line. The identity k\binom{n}{k} = n\binom{n−1}{k−1} is used for (32). Item (b) follows from Theorem 1c, where f is the supermodular function in (16) and g is the identity function on the real line. We next prove Item (c). The sequence (34) is monotonically decreasing by Theorem 1a, where f is the submodular function in (15), and g : R → R is the monotonically increasing and convex function defined as g(x) = exp(2rx) for x ∈ R (with r > 0). The sequence (35) is monotonically decreasing by Theorem 1d, where f is the supermodular function in (16), and g : R → R is the monotonically decreasing and convex function defined as g(x) = exp(−rx) for x ∈ R. The sequence (36) is monotonically decreasing by Theorem 1a, where f is the submodular function in (17) and g is the monotonically increasing and convex function defined as g(x) = exp(rx) for x ∈ R. Finally, the sequence (37) is monotonically decreasing by Theorem 1a, where f is the submodular function in (19) and g is the monotonically increasing and convex function defined as g(x) = exp(2rx) for x ∈ R.
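The monotonicity of the sequence in (30) (Han's inequality, with g the identity function), and of an exponential variant in the spirit of Theorem 1a with g(x) = 2^{2rx}, can be verified numerically. The sketch below uses a randomly generated joint PMF of four bits; the PMF and the exact normalization of the displayed sequences are illustrative assumptions.

```python
import itertools
import math
import random

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, coords):
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

random.seed(7)
n = 4
atoms = list(itertools.product([0, 1], repeat=n))
weights = [random.random() for _ in atoms]
total = sum(weights)
joint = {a: w / total for a, w in zip(atoms, weights)}

def H(T):
    return entropy(marginal(joint, tuple(sorted(T))))

# Han-type sequence: t_k = average over k-subsets T of H(X_T)/k (g = identity),
# and an exponential variant with g(x) = 2^(2rx), r > 0 (Theorem 1a).
r = 0.8
t, s = [], []
for k in range(1, n + 1):
    subsets_k = list(itertools.combinations(range(n), k))
    t.append(sum(H(T) / k for T in subsets_k) / len(subsets_k))
    s.append(sum(2 ** (2 * r * H(T) / k) for T in subsets_k) / len(subsets_k))

assert all(t[i] >= t[i + 1] - 1e-9 for i in range(n - 1))  # monotonically decreasing
assert all(s[i] >= s[i + 1] - 1e-9 for i in range(n - 1))  # monotonically decreasing
print([round(x, 4) for x in t])
```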
Remark 2: From Proposition 2, since the proof of Corollary 3 only relies on the sub/supermodularity property of f, the random variables {X_i}_{i=1}^{n} do not need to be discrete in Corollary 3. In the reproduction of Han's inequality as an application of Corollary 2, the random variables {X_i}_{i=1}^{n} also do not need to be discrete, since f is not required to be nonnegative if α = 1 (only the submodularity of f in (15) is required, which holds due to Proposition 2).
The following result exemplifies the utility of the monotonicity result of the sequence (30) in extremal combinatorics. It also generalizes the result in Section 3.2 of [19] for an achievable upper bound on the cardinality of a finite set in the three-dimensional Euclidean space, expressed as a function of its number of projections on each of the planes XY, XZ and YZ. The next result provides an achievable upper bound on the cardinality of a finite set of points in an n-dimensional Euclidean space, expressed as a function of its number of projections on each of the k-dimensional Euclidean subspaces with an arbitrary k < n.
Proposition 3: Let P ⊆ R^n be a finite set of points in the n-dimensional Euclidean space with |P| = M. Let k ∈ [n − 1], and ℓ ≜ \binom{n}{k}. Let R_1, . . ., R_ℓ be the projections of P on each of the k-dimensional subspaces of R^n, and let M_j ≜ |R_j| for all j ∈ [ℓ]. Then,
  M ≤ (∏_{j=1}^{ℓ} M_j)^{n/(kℓ)}.   (38)
Let R ≜ (log M)/n, and R_j ≜ (log M_j)/k for all j ∈ [ℓ]. An equivalent form of (38) is given by the inequality
  R ≤ (1/ℓ) ∑_{j=1}^{ℓ} R_j.   (39)
Moreover, if M_1 = . . . = M_ℓ and M_1^{1/k} ∈ N, then (38) and (39) are satisfied with equality if P is a grid of points in R^n with M_1^{1/k} points on each dimension (so, M = M_1^{n/k}).
Proof: Let X^n be a random vector that is equiprobable on the set P, so
  H(X^n) = log M.   (40)
The sequence in (30) is monotonically decreasing, so h_n ≤ h_k, i.e.,
  (1/n) H(X^n) ≤ (1/(k\binom{n}{k})) ∑_{T⊆[n]: |T|=k} H(X_T).   (41)
Let S_1, . . ., S_ℓ be the k-subsets of the set [n], ordered in a way such that M_j is the cardinality of the projection of the set P on the k-dimensional subspace whose coordinates are the elements of the subset S_j. Then, (41) can be expressed in the form
  (1/n) H(X^n) ≤ (1/(kℓ)) ∑_{j=1}^{ℓ} H(X_{S_j}),   (42)
and also
  H(X_{S_j}) ≤ log M_j,   j ∈ [ℓ],   (43)
since the entropy of a random variable is upper bounded by the logarithm of the number of its possible values. Combining (40), (42) and (43) gives
  (log M)/n ≤ (1/(kℓ)) ∑_{j=1}^{ℓ} log M_j.   (44)
Exponentiating both sides of (44) gives (38). In addition, using the definitions of R and R_j gives (39) from (44). Finally, the sufficiency condition for equality in (38) or (39) can be easily verified, which is obtained if P is a grid of points in R^n with the same finite number of projections on each dimension.
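A small computational check of the projection bound in Proposition 3: for a grid, the bound holds with equality, and for an arbitrary subset of the grid it holds with slack. The choice of dimensions and of the random subset below is illustrative.

```python
import itertools
import math
import random

def projections_bound(P, n, k):
    """Return (|P|, (prod_j M_j)^(n/(k*ell))) for the k-dimensional projections of P."""
    subsets_k = list(itertools.combinations(range(n), k))
    Ms = [len({tuple(p[i] for i in S) for p in P}) for S in subsets_k]
    ell = len(subsets_k)
    return len(P), math.prod(Ms) ** (n / (k * ell))

random.seed(3)
n, k, m = 3, 2, 3

# Equality case: an m x m x m grid, so M = m^n and every projection has m^k points.
grid = set(itertools.product(range(m), repeat=n))
M, bound = projections_bound(grid, n, k)
assert abs(M - bound) < 1e-9

# A random subset of the grid: the bound still holds, generally with slack.
P = set(random.sample(sorted(grid), 10))
M, bound = projections_bound(P, n, k)
assert M <= bound + 1e-9
print(M, round(bound, 3))
```

For n = 3 and k = 2, this is the classical Loomis–Whitney setting of [19, Section 3.2]: M² is at most the product of the three planar projection sizes.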

B. Connections to a Generalized Version of Shearer's Lemma and Other Results in the Literature
The next proposition is a known generalized version of Shearer's lemma.
Proposition 4: Let Ω be a finite set, let {S_j}_{j=1}^{M} be a finite collection of subsets of Ω (with M ∈ N), and let f : 2^Ω → R be a set function.
a) If f is non-negative and submodular, and every element in Ω is included in at least d ≥ 1 of the subsets {S_j}_{j=1}^{M}, then
  d f(Ω) ≤ ∑_{j=1}^{M} f(S_j).
b) If f is a rank function, A ⊂ Ω, and every element in A is included in at least d ≥ 1 of the subsets {S_j}_{j=1}^{M}, then
  d f(A) ≤ ∑_{j=1}^{M} f(S_j ∩ A).
The first part of Proposition 4 was pointed out in Section 1.5 of [35], and the second part of Proposition 4 is a generalization of Remark 1 and inequality (47) in [20]. Appendix B provides a (somewhat different) proof of Proposition 4a, as well as a self-contained proof of Proposition 4b.
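Proposition 4a can be checked numerically for the entropy set function f(T) = H(X_T). In the sketch below, every element of a 4-element ground set is covered exactly d = 2 times by a cyclic collection of subsets; the joint PMF is a randomly generated illustrative assumption.

```python
import itertools
import math
import random

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, coords):
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

random.seed(11)
n = 4
atoms = list(itertools.product([0, 1], repeat=n))
weights = [random.random() for _ in atoms]
total = sum(weights)
joint = {a: w / total for a, w in zip(atoms, weights)}

def H(T):
    return entropy(marginal(joint, tuple(sorted(T)))) if T else 0.0

# A cyclic cover of Omega = {0,1,2,3}: every element lies in exactly d = 2 subsets.
cover = [(0, 1), (1, 2), (2, 3), (3, 0)]
d = 2

# Shearer-type inequality (Proposition 4a): d * H(X_Omega) <= sum_j H(X_{S_j}).
lhs = d * H(range(n))
rhs = sum(H(S) for S in cover)
assert lhs <= rhs + 1e-9
print(round(lhs, 4), round(rhs, 4))
```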
Let {X_i}_{i=1}^{n} be discrete random variables, and consider the set function f : 2^{[n]} → R given in (15), i.e., f(T) = H(X_T) for T ⊆ [n]. Since f is a rank function [25], Proposition 4 then specializes to Shearer's lemma [7] and to a modified version of this lemma in Remark 1 of [20].
In light of Item e) in Proposition 1 and Item b) of Proposition 4, Corollaries 4 and 5 are obtained as follows.
Corollary 4 is given in (48); in particular, it applies when every element i ∈ [n] is included in a fixed number d of the subsets {S_j}_{j=1}^{M}.
Remark 3: Inequality (48) is also a special case of [37, Theorem 2], and the two coincide if every element i ∈ [n] is included in a fixed number (d) of the subsets {S_j}_{j=1}^{M}. A specialization of Corollary 4 gives the next result.
Corollary 5: Let {X_i}_{i=1}^{n} be independent and discrete random variables with finite variances. Then, the following holds: a) inequalities (49) and (50) hold and, equivalently, so does (51), where (51) is in general looser than (50), with equivalence if {X_i}_{i=1}^{n} are i.i.d.; b) inequality (52) holds.
Proof: Every element i ∈ [n] is included in \binom{n−1}{k−1} of the k-element subsets of [n], which then gives (49) as a special case of (48). Alternatively, (49) follows from Corollary 3b. Exponentiating both sides of (49) gives (50). Inequality (51) is a loosened version of (50), which follows by invoking the AM–GM inequality (i.e., the geometric mean of nonnegative real numbers is less than or equal to their arithmetic mean, with equality between these two means if and only if these numbers are all equal).
The next remarks consider the information inequalities in Corollaries 3–5, in light of Theorem 1 here, and some known results in the literature.
Remark 4: Inequality (49) was derived by Madiman as a special case of Theorem 2 in [37].The proof of Corollary 5a shows that (49) can be also derived in two different ways as special cases of both Theorem 1a and Proposition 4a.
Remark 5: Inequality (51) can also be derived as a special case of Theorem 1a, where f is the rank function in (19), and g : R → R is given by g(x) ≜ exp(2nx) for all x ∈ R. It also follows from the monotonicity property in Corollary 3c.
Remark 6: The result in Theorem 8 of [31] is a special case of Theorem 1a here, which follows by taking the function g in Theorem 1a to be the identity function. The flexibility in selecting the function g in Theorem 1 makes it possible to obtain a larger collection of information inequalities. This is partly reflected in a comparison of Corollary 3 here with Corollary 9 of [31]. More specifically, the monotonicity properties in (30), (31) and (33) were obtained in Corollary 9 of [31], by relying on Theorem 8 of [31] and the sub/supermodularity properties of the considered Shannon information measures. It is noted, however, that the monotonicity results for the sequences (34)–(37) (Corollary 3c) are not implied by Theorem 8 of [31].
Remark 7: Inequality (52) forms a counterpart of an entropy power inequality by Artstein et al. (Theorem 3 of [40]), which applies to independent random variables {X_i}_{i=1}^{n} with finite variances. Inequality (50), and also its looser version in (51), form counterparts of the generalized inequality by Madiman and Barron (see inequality (4) in [41]).

IV. PROOFS
The present section provides proofs of most of the results in Section III.

A. Proof of Theorem 1
We prove Item a), and then readily prove Items b)–d). Define the auxiliary sequence
  f_k^{(n)} ≜ (1/\binom{n}{k}) ∑_{T⊆Ω: |T|=k} f(T),   k ∈ {0, 1, . . ., n},   (55)
averaging f over all k-element subsets of the n-element set Ω ≜ {ω_1, . . ., ω_n}. Let the permutation π : [n] → [n] give rise to the sets {ω_{π(1)}, . . ., ω_{π(k)}} and {ω_{π(1)}, . . ., ω_{π(k−1)}, ω_{π(k+1)}}, which are k-element subsets of Ω with k − 1 elements in common. Then, inequality (57) holds by the submodularity of f (by assumption), i.e., (58) holds. Averaging the terms on both sides of (58) over all the n! permutations π of [n] gives (59) and (60); in addition, f_0^{(n)} = 0 since by assumption f(∅) = 0. Combining (58)–(60) gives (61), which is rewritten as (62). Consequently, (63) follows, where equality (63a) holds since f_0^{(n)} = 0, and inequality (63d) holds by (62). The sequence {f_k^{(n)}/k}_{k=1}^{n} is therefore monotonically decreasing, and in particular (64) holds. We next prove (25) for α = 1, and then proceed to prove Theorem 1. By (64), inequality (65) holds, where (66) follows from (55). Combining (65) and (66) gives (67). Since there are n subsets T ⊆ Ω with |T| = n − 1, rearranging terms in (67) gives (25) for α = 1; it should be noted that, for α = 1, the set function f does not need to be nonnegative for (25) to hold (however, this is required for α > 1).
We next prove Item a). By (20), for k ∈ [n], equality (68) holds. Fix Ω_k ≜ {t_1, . . ., t_k} ⊆ Ω, and let f̃ : 2^{Ω_k} → R be the restriction of the function f to the subsets of Ω_k. Then, f̃ is a submodular set function with f̃(∅) = 0; similarly to (55), (65) and (66) with f replaced by f̃, and n replaced by k, the sequence {f̃_j^{(k)}/j}_{j=1}^{k} is monotonically decreasing, so (69) holds, where (70) holds. Combining (69) and (70) gives (71) and, since by assumption g is monotonically increasing, (72) follows. From (68) and (72), for all k ∈ [2 : n], inequalities (73) and (74) hold, where (74a) holds by invoking Jensen's inequality for the convex function g; (74b) holds since the term of the inner summation on the right-hand side of (74a) does not depend on t_i, so for every (k − 1)-element subset S = {t_1, . . ., t_{i−1}, t_{i+1}, . . ., t_k} ⊆ Ω, there are n − k + 1 possibilities to extend it by a single element (t_i) into a k-element subset T = {t_1, . . ., t_k} ⊆ Ω; (74e) is straightforward, and (74f) holds by the definition in (20). This proves Item a). Item b) follows from Item a), and similarly Item d) follows from Item c), by replacing g with −g. Item c) is verified next. If f is a supermodular set function with f(∅) = 0, then (57), (58), and (61)–(63) hold with flipped inequality signs. Hence, since g is monotonically increasing, inequalities (72) and (73) are reversed; finally, since g is also concave, (74) holds with a flipped inequality sign (by Jensen's inequality), which proves Item c).

B. Proof of Corollary 1
By assumption, f : 2^Ω → R is a rank function, which implies that 0 ≤ f(T) ≤ f(Ω) for every T ⊆ Ω. Since (by definition) f is submodular with f(∅) = 0, and (by assumption) the function g is convex and monotonically increasing, inequality (75) follows from (22). By the second assumption in Corollary 1, (76) holds for positive values of x that are sufficiently close to zero. In both cases, the stated conclusion follows. In light of (75) and (76), equality (77) holds. Combining (77) with the upper and lower bounds on the binomial coefficient in (78) gives equality (23). Equality (24) holds as a special case of (23), under the assumption that the corresponding limit exists.
We next prove Item b). The function f is (by assumption) a rank function, which yields its nonnegativity. Hence, the leftmost inequality in (27) holds by (82). The rightmost inequality in (27) also holds since f : 2^Ω → R is monotonically increasing, which yields f(T) ≤ f(Ω) for all T ⊆ Ω. For k ∈ [n] and α ≥ 0 (in particular, for α ≥ 1), inequality (84) holds since there are \binom{n}{k} k-element subsets T of the n-element set Ω, and every summand f^α(T) (with T ⊆ Ω) is upper bounded by f^α(Ω).

V. A PROBLEM IN EXTREMAL GRAPH THEORY
This section applies the generalization of Han's inequality in (28) to the following problem.

A. Problem Formulation
Let A ⊆ {−1, 1}^n, with n ∈ N, and let τ ∈ [n]. Let G = G_{A,τ} be an undirected simple graph with vertex set V(G) = A, where a pair of vertices in G is adjacent (i.e., connected by an edge) if and only if the two vertices are represented by vectors in A whose Hamming distance is less than or equal to τ:
  E(G) = {{x^n, y^n} : x^n, y^n ∈ A, 1 ≤ d_H(x^n, y^n) ≤ τ}.
The question is how large the size of G can be (i.e., how many edges it may have) as a function of the cardinality of the set A, and possibly also based on some basic properties of the set A. This problem and its related analysis generalize and refine, in a nontrivial way, the bound in Theorem 4.2 of [6], which applies to the special case where τ = 1. The motivation for this extension is considered next.
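For concreteness, the graph G_{A,τ} can be constructed by brute force for small n. The sketch below uses an illustrative choice of A as a 2-dimensional subcube of {−1, 1}^4 and counts the edges for τ = 1, 2.

```python
import itertools

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def edges(A, tau):
    """Edge set of G_{A,tau}: pairs of vectors in A at Hamming distance in [1, tau]."""
    return [(x, y) for x, y in itertools.combinations(sorted(A), 2)
            if 1 <= hamming(x, y) <= tau]

n = 4
cube = list(itertools.product([-1, 1], repeat=n))
# Illustrative choice of A: a 2-dimensional subcube (last two coordinates fixed to +1).
A = [x for x in cube if x[2] == 1 and x[3] == 1]

counts = {tau: len(edges(A, tau)) for tau in (1, 2)}
assert counts == {1: 4, 2: 6}  # 4 subcube edges; the 2 diagonals join at tau = 2
print(counts)
```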

B. Problem Motivation
Constrained coding is common in many data-recording systems and data-communication systems, where some sequences are more prone to error than others, and a constraint on the sequences that are allowed to be recorded or transmitted is imposed in order to reduce the likelihood of error. Given such a constraint, it is then necessary to encode arbitrary user sequences into sequences that obey the constraint.

For x^n ∈ {−1, 1}^n and integers 1 ≤ k_1 < . . . < k_d ≤ n, let x_{(k_1,...,k_d)} be a subvector of x^n of length n − d, obtained by dropping the bits of x^n in positions k_1, . . ., k_d; if d = n, then (k_1, . . ., k_n) = (1, . . ., n), and x_{(k_1,...,k_d)} is an empty vector. By the chain rule for the Shannon entropy, (90) holds, where equality (90c) holds by (86). Furthermore, let x^{(k_1,...,k_d)} denote the vector obtained from x^n when the bits of x^n in positions k_1, . . ., k_d are flipped (in contrast to x_{(k_1,...,k_d)}, where the bits of x^n in these positions are dropped), so x^{(k_1,...,k_d)} ∈ {−1, 1}^n and d_H(x^n, x^{(k_1,...,k_d)}) = d.
In general, it would be preferable to have the largest possible values of m_d and ℓ_d (i.e., those satisfying inequalities (92) and (94) with equality), for obtaining a better upper bound on the size of G (this point will be clarified in the sequel). If d = 1, then m_d = 2 and ℓ_d = 1 are the best possible constants (this holds by the definitions in (92) and (94), which can also be verified by the coincidence of the upper and lower bounds in (93) for d = 1, as well as those in (95)). If x^n ∈ A, then we distinguish between the following two cases:
• If x^{(k_1,...,k_d)} ∈ A, then the corresponding bound holds by the way that m_d is defined in (92), and since X^n is randomly selected to be equiprobable on the set A.
• If x^{(k_1,...,k_d)} ∉ A, then the corresponding bound holds by the way that ℓ_d is defined in (94), and since X^n is equiprobable on A.
(99) Equality holds in (99) if the minima on the RHS of (92) and (94) are attained by any element in these sets, and if (92) and (94) are satisfied with equality (i.e., m_d and ℓ_d are the maximal integers to satisfy inequalities (92) and (94) for the given set A). Hence, this equality holds in particular for d = 1, with the constants m_1 = 2 and ℓ_1 = 1.
The double sum in the first term on the RHS of (99) satisfies
  ∑_{(k_1,...,k_d): 1≤k_1<...<k_d≤n} ∑_{x^n} ½{x^n ∈ A, x^{(k_1,...,k_d)} ∈ A} = 2 |E_d(G)|,   (100)
since every pair of adjacent vertices in G that refers to vectors in A whose Hamming distance is equal to d is of the form x^n ∈ A and x^{(k_1,...,k_d)} ∈ A, and vice versa, and every edge {x^n, x^{(k_1,...,k_d)}} ∈ E_d(G) is counted twice in the double summation on the LHS of (100). For calculating the double sum in the second term on the RHS of (99), we first calculate the sum of these two double summations:
  ∑_{(k_1,...,k_d)} ∑_{x^n} ½{x^n ∈ A, x^{(k_1,...,k_d)} ∈ A} + ∑_{(k_1,...,k_d)} ∑_{x^n} ½{x^n ∈ A, x^{(k_1,...,k_d)} ∉ A} = ∑_{(k_1,...,k_d)} ∑_{x^n} ½{x^n ∈ A} = \binom{n}{d} |A|,   (101)
so, subtracting (100) from (101) gives
  ∑_{(k_1,...,k_d)} ∑_{x^n} ½{x^n ∈ A, x^{(k_1,...,k_d)} ∉ A} = \binom{n}{d} |A| − 2 |E_d(G)|.   (102)
Substituting (100) and (102) into the RHS of (99) gives (103) for all d ∈ [τ], with the same necessary and sufficient condition for equality in (103a) as in (99). (Recall that it is in particular an equality for d = 1, where in this case m_1 = 2 and ℓ_1 = 1.) By the generalized Han's inequality in (28), inequality (104) holds, where equality (104b) holds by (87). Combining (103) and (104) yields (105) and, by the identity \binom{n}{d} = (n/d)\binom{n−1}{d−1}, the upper bound in (106) follows. This upper bound is specialized, for d = 1, to Theorem 4.2 of [6] (where, by definition, m_1 = 2 and ℓ_1 = 1). This gives that the number of edges in G connecting pairs of vertices which refer to binary vectors in A whose Hamming distance is 1 from each other satisfies (107). It is possible to select, by default, the values of the integers m_d and ℓ_d to be equal to 2 and 1, respectively, independently of the value of d ∈ [τ]. It therefore follows that the upper bound in (106) can be loosened to the bound in (108). This shows that the bound in (108) generalizes the result in Theorem 4.2 of [6], based only on the knowledge of the cardinality of A.
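For d = 1, the resulting bound takes the classical form |E_1(G)| ≤ ½ |A| log_2 |A| (identifying this normalization with the form of Theorem 4.2 of [6] is an assumption here; the inequality itself is a standard consequence of Han's inequality). The following sketch verifies it by brute force on the 4-dimensional cube, with equality attained by subcubes.

```python
import itertools
import math

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def internal_edges(A):
    """Number of Hamming-distance-1 pairs with both endpoints in A."""
    return sum(1 for x, y in itertools.combinations(sorted(A), 2) if hamming(x, y) == 1)

n = 4
cube = list(itertools.product([-1, 1], repeat=n))

# Subcubes attain |E_1(G)| = (1/2) |A| log2 |A| with equality.
for k in range(1, n + 1):
    A = [x for x in cube if all(x[i] == 1 for i in range(k, n))]  # k-dimensional subcube
    assert abs(internal_edges(A) - 0.5 * len(A) * math.log2(len(A))) < 1e-9

# A non-subcube set: the bound holds with strict inequality.
A = [x for x in cube if sum(x) >= 2]
assert internal_edges(A) <= 0.5 * len(A) * math.log2(len(A)) + 1e-9
print("edge bound verified on the 4-dimensional cube")
```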
Furthermore, the bound in (108) can be tightened to the refined bound in (106) if the characterization of the set A allows one to assert values of m_d and ℓ_d that are larger than the trivial values of 2 and 1, respectively. In light of (88) and (108), the number of edges in the graph G satisfies (109) and, if τ ≤ (n+1)/2, then (110) follows. Indeed, the transition from (109) to (110) holds by an upper bound on the partial sum of binomial coefficients, which is asymptotically tight in the exponent of n (for sufficiently large values of n).

D. Comparison of Bounds
We next consider the tightness of the refined bound (106) and the loosened bound (108). Since A is a subset of the n-dimensional cube {−1, 1}^n, every point in A has at most \binom{n}{d} neighbors in A at Hamming distance d, so
  |E_d(G)| ≤ (1/2) \binom{n}{d} |A|.   (112)
Comparing the bound on the RHS of (106) with the trivial bound in (112) shows that the former is useful if and only if the condition in (115) is imposed. The latter also forms a necessary and sufficient condition for the usefulness of the looser bound on the RHS of (108) in comparison to (112).
Example 1: Suppose that the set A ⊆ {−1, 1}^n is characterized by the property that, for all d ∈ [τ] with a fixed integer τ ∈ [n], if x^n ∈ A and x^{(k_1,...,k_d)} ∈ A, then all vectors y^n ∈ {−1, 1}^n which coincide with x^n and x^{(k_1,...,k_d)} in their n − d agreed positions are also included in the set A. Then, for all d ∈ [τ], we get by definition that m_d = 2^d, which yields τ ≤ ⌊log_2 |A|⌋. Setting m_d = 2^d and the default value ℓ_d = 1 on the RHS of (106) gives the bound in (116). Unless A = {−1, 1}^n, the upper bound on the RHS of (116d) is strictly smaller than the trivial upper bound on the RHS of (112). This improvement is consistent with the satisfiability of the (necessary and sufficient) condition in (115), which is strictly satisfied in this case. On the other hand, the looser upper bound on the RHS of (108) gives a bound which is d times larger than the refined bound on the RHS of (116d) (since the latter is based on the exact value of m_d for the set A, rather than the default value of 2), and it is worse than the trivial bound if and only if |A| > 2^{\binom{n}{d}/d}. The latter finding is consistent with (115). This exemplifies the utility of the refined upper bound on the RHS of (106) in comparison to the bound on the RHS of (108), where the latter generalizes [6, Theorem 4.2] from the case where d = 1 to all d ∈ [n]. As explained above, this refinement is irrelevant in the special case where d = 1, though it proves to be useful in general for d ∈ [2 : n] (as exemplified here).
The following theorem summarizes the results of our analysis (so far) in the present section.
Theorem 2: Let A ⊆ {−1, 1}^n with n ∈ N, let τ ∈ [n], and let G = G_{A,τ} be the graph defined in Section V-A. For d ∈ [τ], let m_d and ℓ_d be integers that (preferably, the maximal possible values to) satisfy the requirements in (92) and (94), respectively. Then,
a) the number of edges in G connecting vertices at Hamming distance d satisfies the bound in (119);
b) if τ ≤ (n+1)/2, then the (overall) number of edges in G satisfies the bound in (120);
c) the refined upper bound on the RHS of (119) and the loosened upper bound on the RHS of (120) improve the trivial bound in (121) (see Section V-D).

E. Influence of Fixed-Size Subsets of Bits
The result in Theorem 4.2 of [6], which is generalized and refined in Theorem 2 here, is now applied to study the total influence of the n variables of an equiprobable random vector X^n ∈ {−1, 1}^n on a subset A ⊂ {−1, 1}^n. To this end, let X^(i) denote the vector obtained from X^n by flipping its bit at the i-th position. Then, the influence of the i-th variable is defined as
$$I_i(A) \triangleq \Pr\bigl( \mathbb{1}\{X^n \in A\} \neq \mathbb{1}\{X^{(i)} \in A\} \bigr),$$
and their total influence is defined to be the sum $I(A) \triangleq \sum_{i=1}^{n} I_i(A)$. As shown in Chapters 9 and 10 of [6], influences of subsets of the binary hypercube have far-reaching consequences in the study of threshold phenomena, and many other areas. As a corollary of (107), it is obtained in Theorem 4.3 of [6] that, for every subset A ⊂ {−1, 1}^n,
$$I(A) \geq 2 \Pr(A) \log_2 \frac{1}{\Pr(A)}, \qquad (124)$$
where Pr(A) ≜ P[X^n ∈ A] = |A|/2^n by the equiprobable distribution of X^n over {−1, 1}^n. In light of Theorem 2, the same approach that is used in Section 4.4 of [6] for the transition from (107) to (124) can also be used to obtain, as a corollary, a lower bound on the average total influence over all subsets of d variables. To this end, let k_1, …, k_d be integers such that 1 ≤ k_1 < … < k_d ≤ n, and let the influence of the variables in positions k_1, …, k_d be given by
$$I^{(k_1,\ldots,k_d)}(A) \triangleq \Pr\bigl( \mathbb{1}\{X^n \in A\} \neq \mathbb{1}\{X^{(k_1,\ldots,k_d)} \in A\} \bigr).$$
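The influence definitions above are easy to evaluate exhaustively for small n. The following sketch (with an arbitrary example set A) computes each I_i(A) by direct enumeration and checks the lower bound I(A) ≥ 2 Pr(A) log₂(1/Pr(A)) of [6, Theorem 4.3] (see (124)):

```python
# Influences by direct enumeration, and a check of the lower bound
# I(A) >= 2 Pr(A) log2(1/Pr(A)).  The set A is an arbitrary example.
from itertools import product
from math import log2

n = 4
cube = list(product((-1, 1), repeat=n))
A = {x for x in cube if sum(x) > 0}  # arbitrary example subset

def flip(x, i):
    # the vector X^(i): flip the bit at position i
    return x[:i] + (-x[i],) + x[i + 1:]

# I_i(A) = Pr[ 1{X in A} != 1{X^(i) in A} ] under the uniform distribution
influences = [sum((x in A) != (flip(x, i) in A) for x in cube) / 2 ** n
              for i in range(n)]
total = sum(influences)
p = len(A) / 2 ** n
assert total >= 2 * p * log2(1 / p) - 1e-12
```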
(125) Then, let the average influence of subsets of d variables be defined as
$$I^{(n,d)}(A) \triangleq \frac{1}{\binom{n}{d}} \sum_{1 \leq k_1 < \ldots < k_d \leq n} I^{(k_1,\ldots,k_d)}(A). \qquad (126)$$
Hence, (127) follows by (123) and (126). Next, let B^(n,d)(A) be the set of ordered pairs of sequences (x^n, y^n), where x^n, y^n ∈ {−1, 1}^n are of Hamming distance d from each other, with x^n ∈ A and y^n ∉ A. By the equiprobable distribution of X^n on {−1, 1}^n, we get (128), where G is introduced in Theorem 2; this is then specialized to the result in [6, Theorem 4.3] (see (124)). This gives the following result.

Theorem 3: Let X^n be an equiprobable random vector over the set {−1, 1}^n, let d ∈ [n] and A ⊂ {−1, 1}^n. Then, the average influence of subsets of d variables of X^n, as defined in (126), is lower bounded as follows:
$$I^{(n,d)}(A) \geq \frac{2 \Pr(A) \left( \log m_d - \frac{d}{n} \log |A| \right)}{\log \frac{m_d}{\ell_d}}. \qquad (133)$$

By the induction hypothesis, the first ℓ sets in this sequence can be transformed into a chain (in a finite number of steps) by a recursive process as above; this gives a chain of the form B″_1 ⊆ B″_2 ⊆ … ⊆ B″_{ℓ−1} ⊆ B″_ℓ. The first ℓ sets in (B.1) are all included in B′_ℓ, so every combination of unions and intersections of these ℓ sets is also included in B′_ℓ. Hence, the considered recursive process leads to a chain of the form
$$B''_1 \subseteq B''_2 \subseteq \ldots \subseteq B''_{\ell} \subseteq B'_{\ell} \cup B_{\ell+1}, \qquad (B.2)$$
where the last inclusion in (B.2) holds since B″_ℓ ⊆ B′_ℓ. The claim thus holds for ℓ + 1 if it holds for a given ℓ; since it holds for ℓ = 2, it therefore holds by mathematical induction for all integers ℓ ≥ 2.
We first prove Proposition 4a. Suppose that there is a permutation π : [M] → [M] such that S_π(1) ⊆ S_π(2) ⊆ … ⊆ S_π(M) is a chain. Since every element in Ω is included in at least d of these subsets, it should be included in (at least) the d largest sets of this chain, so S_π(j) = Ω for every j ∈ [M − d + 1 : M]. Now, consider the recursive process in Lemma B.1. Since the profile of the number of inclusions of the elements in Ω is preserved in each step of the recursive process in Lemma B.1, it follows that every element in Ω continues to belong to at least d sets in the chain which is obtained at the end of this recursive process. Moreover, in light of (B.4), in every step of the recursive process in Lemma B.1, the sum on the LHS of (B.4) cannot increase. Inequality (45) therefore finally follows from the earlier part of the proof for a chain (see (B.3)).
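The two invariants used in this proof — the preserved inclusion profile {deg(ω)} and the non-increasing sum in (B.4) — can be illustrated on a toy instance. In the sketch below, f(S) = √|S| is an illustrative submodular choice (a concave function of the cardinality is submodular), and the initial sets are arbitrary:

```python
# Illustration of the recursive process in Lemma B.1: replacing a pair of
# sets that are not related by inclusion with their intersection and union
# preserves the inclusion profile {deg(w)} and, by submodularity (B.4),
# never increases sum_j f(S_j).
from math import sqrt

def f(S):
    return sqrt(len(S))  # concave function of |S|, hence submodular

sets = [{1, 2}, {2, 3}, {1, 4}, {1, 2, 3, 4}]  # arbitrary example
omega = set().union(*sets)
deg = {w: sum(w in S for S in sets) for w in omega}
total = sum(f(S) for S in sets)

# Repeat until the sets form a chain under inclusion.
while True:
    pair = next(((i, j) for i in range(len(sets)) for j in range(i + 1, len(sets))
                 if not (sets[i] <= sets[j] or sets[j] <= sets[i])), None)
    if pair is None:
        break
    i, j = pair
    sets[i], sets[j] = sets[i] & sets[j], sets[i] | sets[j]
    assert {w: sum(w in S for S in sets) for w in omega} == deg  # profile preserved
    assert sum(f(S) for S in sets) <= total + 1e-12              # sum cannot increase
    total = sum(f(S) for S in sets)
```

The process terminates because each replacement strictly increases Σ|S_j|² while the profile stays fixed, so only finitely many steps are possible.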

$$H(X|Y) = -\sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}} P_{XY}(x, y) \log P_{X|Y}(x|y) \qquad (3a)$$
$$= \sum_{y \in \mathcal{Y}} P_Y(y) \, H(X|Y = y),$$
Inequalities (50) and (51) are consequently equivalent if {X_i}_{i=1}^n are i.i.d. random variables, and (52) is a specialized version of (50) and of the loosened inequality (51), obtained by setting k = n − 1.
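The identity between the two expressions for H(X|Y) can be confirmed numerically; the joint distribution below is an arbitrary example:

```python
# H(X|Y) computed two ways, matching the identity in (3a):
# -sum_{x,y} P_XY(x,y) log P_{X|Y}(x|y) = sum_y P_Y(y) H(X|Y=y).
from math import log2

P = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.4, (1, 1): 0.1}  # arbitrary P_XY
P_Y = {y: sum(p for (x, yy), p in P.items() if yy == y) for y in (0, 1)}

# direct form: -sum P_XY(x,y) log2 P_{X|Y}(x|y)
lhs = -sum(p * log2(p / P_Y[y]) for (x, y), p in P.items() if p > 0)

# averaged form: sum_y P_Y(y) H(X | Y = y)
rhs = 0.0
for y, py in P_Y.items():
    cond = {x: P[(x, y)] / py for x in (0, 1) if P[(x, y)] > 0}
    rhs += py * -sum(q * log2(q) for q in cond.values())

assert abs(lhs - rhs) < 1e-12
```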

Theorem 2: Let A ⊆ {−1, 1}^n, with n ∈ ℕ, and let τ ∈ [n]. Let G = (V(G), E(G)) be an undirected, simple graph with vertex set V(G) = A, and with edges connecting the pairs of vertices in G which are represented by vectors in A whose Hamming distance is less than or equal to τ. For d ∈ [τ], let E_d(G) be the set of edges in G which connect all pairs of vertices that are represented by vectors in A whose Hamming distance is equal to d (i.e., $|E(G)| = \sum_{d=1}^{\tau} |E_d(G)|$).
a) For d ∈ [τ], let the integers m_d ∈ [2 : min{2^d, |A|}] and ℓ_d ∈ [min{2^d − 1, |A| − 1}] satisfy the requirements in (92) and (94), respectively. Then,
$$|E_d(G)| \leq \frac{1}{2} \binom{n}{d} |A| \left( 1 - \frac{\log m_d - \frac{d}{n} \log |A|}{\log \frac{m_d}{\ell_d}} \right). \qquad (106)$$
b) A loosened bound, which only depends on the cardinality of the set A, is obtained by setting the default values m_d = 2 and ℓ_d = 1. It is then given by
$$|E_d(G)| \leq \frac{d}{2n} \binom{n}{d} |A| \log_2 |A|. \qquad (108)$$
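The graph G of Theorem 2 is straightforward to construct by brute force for small n. The sketch below (with an arbitrary example set A and τ = 2) builds the edge classes E_d(G) and checks the identity |E(G)| = Σ_d |E_d(G)|:

```python
# Constructing the graph G of Theorem 2 by brute force: vertices are the
# vectors in A, and E_d(G) collects the edges between vectors at Hamming
# distance exactly d, for d in [tau].  The set A is an arbitrary example.
from itertools import combinations, product

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

n, tau = 4, 2
cube = list(product((-1, 1), repeat=n))
A = [x for x in cube if x.count(1) % 2 == 0]  # arbitrary example subset

E = {d: [] for d in range(1, tau + 1)}
for x, y in combinations(A, 2):
    d = hamming(x, y)
    if d <= tau:
        E[d].append((x, y))

# |E(G)| = sum over d in [tau] of |E_d(G)|
num_edges = sum(len(E[d]) for d in E)
assert num_edges == sum(1 for x, y in combinations(A, 2) if hamming(x, y) <= tau)
```

Since the example set A consists of even-weight vectors, all pairwise Hamming distances are even, so E_1(G) is empty here.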
Here, G is the graph introduced in Theorem 2, and E_d(G) is the set of edges connecting the pairs of vertices in G which are represented by vectors in A of Hamming distance d. The multiplication by 2 on the RHS of (129) is because every edge whose two endpoints are in the set A is counted twice. Hence, by (106) and (129),
$$|B^{(n,d)}(A)| = \binom{n}{d} |A| - 2 |E_d(G)| \geq \frac{\binom{n}{d} |A| \left( \log m_d - \frac{d}{n} \log |A| \right)}{\log \frac{m_d}{\ell_d}}, \qquad (130d)$$
and the lower bound on the RHS of (130d) is positive if and only if |A| < (m_d)^{n/d} (see also (114)). This gives from (128) that the average influence of subsets of d variables satisfies
$$I^{(n,d)}(A) \geq \frac{|A|}{2^{n-1}} \cdot \frac{\log m_d - \frac{d}{n} \log |A|}{\log \frac{m_d}{\ell_d}}. \qquad (131c)$$
Setting d = 1 and the default values m_d = 2 and ℓ_d = 1 on the RHS of (131c) gives that the total influence of the n variables satisfies, for all A ⊆ {−1, 1}^n,
$$I(A) = n I^{(n,1)}(A) \geq 2 \Pr(A) \log_2 \frac{1}{\Pr(A)}. \qquad (132)$$

In (132) and (133), Pr(A) ≜ P[X^n ∈ A] = |A|/2^n, and the integers m_d and ℓ_d are introduced in Theorem 2. Similarly to the refined upper bound in Theorem 2, the lower bound on the RHS of (133) is informative (i.e., positive) if and only if |A| < (m_d)^{n/d}. The lower bound on the RHS of (133) can be loosened (by setting the default values m_d = 2 and ℓ_d = 1) to
$$I^{(n,d)}(A) \geq 2 \Pr(A) \left( \frac{d}{n} \log_2 \frac{1}{\Pr(A)} + 1 - d \right). \qquad (134)$$

In the recursive process where B′_ℓ and B_{ℓ+1} are replaced with their intersection and union, consider the sequence
$$B'_1, \ldots, B'_{\ell-1}, \; B'_{\ell} \cap B_{\ell+1}, \; B'_{\ell} \cup B_{\ell+1}. \qquad (B.1)$$
Due to the non-negativity of f, it follows that
$$\sum_{j=1}^{M} f(S_j) \geq \sum_{j=M-d+1}^{M} f(S_{\pi(j)}) \qquad (B.3a)$$
$$= d \, f(\Omega). \qquad (B.3b)$$
Otherwise, if we cannot get a chain by possibly permuting the subsets in the sequence {S_j}_{j=1}^M, consider a pair of subsets S_n and S_m that are not related by inclusion, and replace them with their intersection and union. By the submodularity of f,
$$\sum_{j=1}^{M} f(S_j) = \sum_{j \neq n,m} f(S_j) + f(S_n) + f(S_m) \qquad (B.4a)$$
$$\geq \sum_{j \neq n,m} f(S_j) + f(S_n \cap S_m) + f(S_n \cup S_m). \qquad (B.4b)$$
For all ω ∈ Ω, let deg(ω) be the number of indices j ∈ [M] such that ω ∈ S_j. By replacing S_n and S_m with S_n ∩ S_m and S_n ∪ S_m, the set of values {deg(ω)}_{ω∈Ω} stays unaffected (indeed, if ω ∈ S_n and ω ∈ S_m, then it belongs to their intersection and union; if ω belongs to only one of the sets S_n and S_m, then ω ∉ S_n ∩ S_m and ω ∈ S_n ∪ S_m; finally, if ω ∉ S_n and ω ∉ S_m, then it does not belong to their intersection and union).
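The loosened bound (134) can be checked exhaustively for small n. The sketch below computes I^(n,d)(A) by averaging over all d-subsets of positions (the example set A is arbitrary) and verifies the bound 2 Pr(A)·((d/n)·log₂(1/Pr(A)) + 1 − d) for every d; for larger d the RHS is negative, so the bound is trivially met there:

```python
# Check of the loosened bound (134):
#   I^(n,d)(A) >= 2 Pr(A) [ (d/n) log2(1/Pr(A)) + 1 - d ],
# with I^(n,d)(A) computed by direct enumeration over all d-subsets.
from itertools import combinations, product
from math import comb, log2

n = 4
cube = list(product((-1, 1), repeat=n))
A = {x for x in cube if sum(x) > 0}  # arbitrary example subset
p = len(A) / 2 ** n                   # Pr(A)

def flip(x, S):
    # the vector X^(k_1,...,k_d): flip the bits at the positions in S
    return tuple(-v if i in S else v for i, v in enumerate(x))

for d in range(1, n + 1):
    total = sum(sum((x in A) != (flip(x, S) in A) for x in cube) / 2 ** n
                for S in combinations(range(n), d))
    avg = total / comb(n, d)  # I^(n,d)(A), as defined in (126)
    assert avg >= 2 * p * ((d / n) * log2(1 / p) + 1 - d) - 1e-12
```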