A Generalized Information-Theoretic Approach for Bounding the Number of Independent Sets in Bipartite Graphs

This paper studies the problem of upper bounding the number of independent sets in a graph, expressed in terms of its degree distribution. For bipartite regular graphs, Kahn (2001) established a tight upper bound using an information-theoretic approach, and he also conjectured an upper bound for general graphs. His conjectured bound was recently proved by Sah et al. (2019), using different techniques not involving information theory. The main contribution of this work is the extension of Kahn’s information-theoretic proof technique to handle irregular bipartite graphs. In particular, when the bipartite graph is regular on one side, but may be irregular on the other, the extended entropy-based proof technique yields the same bound as was conjectured by Kahn (2001) and proved by Sah et al. (2019).


Introduction
The Shannon entropy and other classical information measures serve as powerful tools in various combinatorial and graph-theoretic applications (see, e.g., [1–20]), such as the method of types, applications of Shearer's lemma, sub- and supermodularity properties of information measures and their applications, entropy-based proofs of the Moore bound for irregular graphs, Bregman's theorem on the permanent of square matrices with binary entries, and a discrepancy theorem by Spencer. The enumeration of discrete structures that satisfy certain local constraints, and particularly the enumeration of independent sets in graphs, is of interest in discrete mathematics. Many important structures can be modeled by independent sets in a graph, i.e., subsets of vertices in a graph where no two of them are connected by an edge. For example, if a graph models some kind of incompatibility, then an independent set in this graph represents a mutually compatible collection. Upper bounding the number of independent sets in a regular graph was motivated in [21] by a conjecture which has several applications in combinatorial group theory. A survey paper on upper bounding the number of independent sets in graphs, along with some of their applications, is provided in [22]. The problem of counting independent sets in graphs has received significant attention in the literature on discrete mathematics over the last three decades, and also in the information theory literature [13,14].
A tight upper bound on the number of independent sets in finite undirected graphs was proved in [11] (2001) for the special setting of bipartite regular graphs, and it was conjectured there to extend to general (irregular) graphs (see Conjecture 4.2 in [11]). A decade later (2010), the bound was extended in [23] to regular graphs (that are not necessarily bipartite); a year later (2011), the conjecture was proved in [24] for graphs with a small maximal degree (up to 5). Finally, this conjecture was recently (2019) proved in general [25], by utilizing a new approach. The reader is referred to [26] for an announcement on the solution of this conjecture as a frustrating combinatorial problem for two decades, along with the history and ramifications of this problem, and some reflections of the authors on their work in [25].
The recently introduced proof of the conjecture for general undirected graphs [25] uses an induction on the number of vertices in a graph, and it obtains a recurrence inequality whose derivation involves some judicious applications of Hölder's inequality (see Sections 2 and 4 in [25]). The work in [25] proved, for the first time, Conjecture 4.2 in [11] by using an interesting approach which is unrelated to information theory. The possibility of generalizing the information-theoretic proof in [11] to irregular bipartite graphs was left in [25] as an open issue. It should be noted that a proof of Kahn's conjecture for irregular bipartite graphs readily extends to general undirected graphs (by invoking Zhao's inequality, see Lemma 3 in [24]).
The main contribution of this work is the extension of Kahn's information-theoretic proof technique for bipartite regular graphs [11] to handle irregular bipartite graphs. In particular, when the bipartite graph is regular on one side, but may be irregular on the other, the extended entropy-based proof technique yields the same bound that was conjectured by Kahn [11] and proved by Sah et al. [25].
The structure of the paper is as follows: Section 2 provides preliminaries and notations that are essential for the analysis in this paper. Section 3 explains (in more detail) the scientific merit and contributions of the present work; for the sake of causal presentation, we provide these explanations after Section 2. Sections 4 and 5 are the core of this work.

Preliminaries and Notation
In this section, we provide the notation and preliminary material which are essential for the presentation in this paper.

Notation and Basic Properties of the Entropy
The following notation is used in the present paper:
• $\mathbb{N} \triangleq \{1, 2, \ldots\}$ denotes the set of natural numbers;
• $X^n \triangleq (X_1, \ldots, X_n)$ denotes an $n$-dimensional random vector of discrete random variables, having a joint probability mass function (PMF) that is denoted by $P_{X^n}$;
• For every $n \in \mathbb{N}$, $[1:n] \triangleq \{1, \ldots, n\}$;
• $X_S \triangleq (X_i)_{i \in S}$ is a random vector for an arbitrary nonempty subset $S \subseteq [1:n]$; if $S = \emptyset$, then conditioning on $X_S$ is void. Note that $X^n = X_{[1:n]}$, though $X^n$ is a commonly used notation;
• $1\{E\}$ denotes the indicator of an event $E$; i.e., it is equal to 1 if this event is satisfied, and it is zero otherwise;
• Let $X$ be a discrete random variable that takes its values on a set $\mathcal{X}$, and let $P_X$ be the PMF of $X$. The Shannon entropy of $X$ is given by
$$H(X) \triangleq -\sum_{x \in \mathcal{X}} P_X(x) \log P_X(x), \tag{1}$$
where throughout this paper, we take all logarithms to base 2. In particular, if $X$ is a binary random variable with $P_X(1) = p \in [0,1]$, then
$$H(X) = H_b(p) \triangleq -p \log p - (1-p) \log(1-p), \tag{2}$$
where $H_b(\cdot)$ is the binary entropy function. By continuous extension, the convention $0 \log 0 = 0$ is used;
• Let $X$ and $Y$ be discrete random variables with a joint PMF $P_{XY}$, and a conditional PMF of $X$ given $Y$ that is denoted by $P_{X|Y}$. Let $X$ and $Y$ take their values in the sets $\mathcal{X}$ and $\mathcal{Y}$, respectively. The conditional entropy of $X$ given $Y$ is defined as
\begin{align}
H(X|Y) &\triangleq -\sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}} P_{XY}(x,y) \log P_{X|Y}(x|y) \tag{3} \\
&= \sum_{y \in \mathcal{Y}} P_Y(y) \, H(X|Y=y), \tag{4}
\end{align}
and
$$H(X|Y) = H(X,Y) - H(Y). \tag{5}$$
This paper relies on the following basic properties of the Shannon entropy:
• Entropies and conditional entropies of discrete random variables (or vectors) are nonnegative;
• If $\mathcal{X}$ is a finite set, then
$$H(X) \le \log |\mathcal{X}|, \tag{6}$$
with equality in (6) if and only if $X$ is equiprobable over the set $\mathcal{X}$;
• Conditioning cannot increase the entropy, i.e.,
$$H(X|Y) \le H(X), \tag{7}$$
with equality in (7) if and only if $X$ and $Y$ are independent;
• Generalizing (5) to $n$-dimensional random vectors gives the chain rule for the Shannon entropy:
\begin{align}
H(X^n) &= \sum_{i=1}^n H(X_i | X^{i-1}) \tag{8} \\
&= H(X_1) + H(X_2 | X_1) + \ldots + H(X_n | X_1, \ldots, X_{n-1}); \tag{9}
\end{align}
• The following subadditivity property of the entropy is implied by (7) and (9):
$$H(X^n) \le \sum_{i=1}^n H(X_i), \tag{10}$$
with equality in (10) if and only if $X_1, \ldots, X_n$ are independent random variables.
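As a numerical illustration of the chain rule and of the subadditivity property (10), the following minimal Python sketch evaluates both on a randomly drawn joint PMF of three binary random variables. The helper names (`entropy`, `marginal`, `conditional_entropy`) and the random PMF are assumptions made for this illustration only; they are not part of the paper.

```python
import itertools
import math
import random


def entropy(pmf):
    """Shannon entropy (in bits) of a PMF given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(pmf, coords):
    """Marginal PMF of the coordinates `coords` of a PMF over tuples."""
    out = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out


def conditional_entropy(pmf, target, given):
    """H(X_target | X_given), from the definition -sum P(x,y) log P(x|y)."""
    given = list(given)
    joint = marginal(pmf, given + list(target))
    cond = marginal(pmf, given)
    k = len(given)
    return -sum(p * math.log2(p / cond[key[:k]])
                for key, p in joint.items() if p > 0)


# Hypothetical example: a random joint PMF of n = 3 binary random variables.
n = 3
rng = random.Random(0)
weights = {x: rng.random() for x in itertools.product((0, 1), repeat=n)}
total = sum(weights.values())
pmf = {x: w / total for x, w in weights.items()}

h_joint = entropy(pmf)
# Chain rule: sum over i of H(X_i | X_1, ..., X_{i-1}) (0-based indices here).
chain_rule_sum = sum(conditional_entropy(pmf, [i], range(i)) for i in range(n))
# Subadditivity: sum of the marginal entropies.
sum_of_marginals = sum(entropy(marginal(pmf, [i])) for i in range(n))

print(f"H(X^n)               = {h_joint:.6f}")
print(f"sum H(X_i | X^(i-1)) = {chain_rule_sum:.6f}")
print(f"sum H(X_i)           = {sum_of_marginals:.6f}")
```

The first two printed values should agree up to floating-point error, and the third should be at least as large as the first.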

Shearer's Lemma
Shearer's lemma extends the subadditivity property (10) of the entropy. Due to its simplicity and usefulness in this paper (and elsewhere), we state and prove this lemma here.
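Proposition 1 (Shearer's Lemma). Let $X_1, \ldots, X_n$ be discrete random variables, and let $S_1, \ldots, S_m \subseteq [1:n]$ be subsets such that every element $i \in [1:n]$ is included in at least $k \ge 1$ of these subsets. Then,
$$k \, H(X^n) \le \sum_{j=1}^m H(X_{S_j}). \tag{11}$$
Proof. Let $S = \{i_1, \ldots, i_\ell\}$ with $1 \le i_1 < \ldots < i_\ell \le n$. By invoking the chain rule in this order,
$$H(X_S) = \sum_{t=1}^{\ell} H(X_{i_t} | X_{i_1}, \ldots, X_{i_{t-1}}) \ge \sum_{i \in S} H(X_i | X^{i-1}), \tag{12}$$
where the last inequality holds since an additional conditioning cannot increase the entropy (i.e., $H(X|Y,Z) \le H(X|Y)$ for all $X$, $Y$ and $Z$). By assumption, for $i \in [1:n]$,
$$\sum_{j=1}^m 1\{i \in S_j\} \ge k, \tag{13}$$
since the number of subsets $\{S_j\}_{j=1}^m$ that include $i$ as an element is at least $k$. Consequently, it follows that
\begin{align}
\sum_{j=1}^m H(X_{S_j}) &\ge \sum_{j=1}^m \sum_{i=1}^n 1\{i \in S_j\} \, H(X_i | X^{i-1}) \tag{14} \\
&= \sum_{i=1}^n \Bigl( \sum_{j=1}^m 1\{i \in S_j\} \Bigr) H(X_i | X^{i-1}) \tag{15} \\
&\ge k \sum_{i=1}^n H(X_i | X^{i-1}) \tag{16} \\
&= k \, H(X^n), \tag{17}
\end{align}
where (14) follows from (12); (15) holds by interchanging the order of summation; (16) holds by (13); (17) holds by the chain rule (8).
Remark 1. Inequality (11) remains valid when the sets $S_1, \ldots, S_m$ are not necessarily subsets of $[1:n]$ (with the random variables being indexed by a larger set), as long as every element $i \in [1:n]$ is included in at least $k \ge 1$ of them. Indeed, let $S'_j \triangleq S_j \cap [1:n]$ for $j \in [1:m]$. Then, the subsets $S'_1, \ldots, S'_m$ are included in $[1:n]$, and every element $i \in [1:n]$ continues to be included in at least $k \ge 1$ of these subsets. Hence, Proposition 1 can be applied to the subsets $S'_1, \ldots, S'_m$. By the monotonicity property of the entropy, the inclusion $S'_j \subseteq S_j$ implies that $H(X_{S'_j}) \le H(X_{S_j})$ for all $j \in [1:m]$, which then yields (11).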
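Shearer's lemma can be checked numerically on a small example. The sketch below takes the lemma in its standard form (every index is covered by at least k of the subsets, and k times the joint entropy is at most the sum of the entropies of the sub-vectors); the random PMF and the particular family of subsets are hypothetical choices made for this illustration.

```python
import itertools
import math
import random


def entropy(pmf):
    """Shannon entropy (in bits) of a PMF given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(pmf, coords):
    """Marginal PMF of the coordinates `coords` of a PMF over tuples."""
    out = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out


# Hypothetical example: a random joint PMF of n = 4 binary random variables.
n = 4
rng = random.Random(1)
weights = {x: rng.random() for x in itertools.product((0, 1), repeat=n)}
total = sum(weights.values())
pmf = {x: w / total for x, w in weights.items()}

# Hypothetical family of subsets of {0, ..., n-1} (0-based indices).
subsets = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
# k is the smallest number of subsets that cover any single index.
k = min(sum(i in s for s in subsets) for i in range(n))

lhs = k * entropy(pmf)
rhs = sum(entropy(marginal(pmf, s)) for s in subsets)
print(f"k = {k}: k*H(X^n) = {lhs:.4f} <= sum_j H(X_Sj) = {rhs:.4f}")
```

Here the indices 1 and 3 are each covered by exactly two subsets, so k = 2, while the indices 0 and 2 are covered three times; the lemma only requires the minimum coverage.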

Graphs, Independent Sets, and Tensor Products
Let G be an undirected graph, and let V(G) and E(G) denote, respectively, the sets of vertices and edges in G.
A graph G is called d-regular if the degree of all the vertices in V(G) is equal to d. Otherwise, if the graph G is not d-regular for some d ∈ N, then G is an irregular graph.
A graph G is called bipartite if it has two types of vertices, and the edges cannot connect vertices of the same type; we refer to the two types of vertices of a bipartite graph G as left and right vertices.
A graph $G$ is called complete if every vertex $v \in V(G)$ is connected to all the other vertices in $V(G) \setminus \{v\}$ (and not to itself); similarly, a bipartite graph is called complete if every vertex is connected to all the vertices of the other type in the graph. A complete $(d-1)$-regular graph is denoted by $K_d$, having a number of vertices $|V(K_d)| = d$, and a number of edges $|E(K_d)| = \tfrac{1}{2} d(d-1)$. Likewise, a complete $d$-regular bipartite graph is denoted by $K_{d,d}$, having a number of vertices $|V(K_{d,d})| = 2d$ (i.e., $d$ vertices of each of the two types), and a number of edges $|E(K_{d,d})| = d^2$.
An independent set of an undirected graph $G$ is a subset of its vertices such that no two vertices in this subset are adjacent (i.e., no two of them are joined by an edge). Let $\mathcal{I}(G)$ denote the set of all the independent sets in $G$, and let $|\mathcal{I}(G)|$ denote the number of independent sets in $G$. Similarly to [8,11,13,14,19,21–25] (and references therein), our work considers the question of how many independent sets $G$ can have.
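The tensor product $G \times H$ of two undirected graphs $G$ and $H$ is a graph such that the following holds:
• The vertex set of $G \times H$ is the Cartesian product $V(G) \times V(H)$;
• Two vertices $(g, h), (g', h') \in V(G \times H)$ are adjacent if and only if $g$ is adjacent to $g'$ in $G$, and $h$ is adjacent to $h'$ in $H$.
In general, the following identities hold:
$$|V(G \times H)| = |V(G)| \, |V(H)|, \qquad |E(G \times H)| = 2 \, |E(G)| \, |E(H)|.$$
By the definition of a complete graph $K_d$, the graph $K_2$ is specialized to two vertices that are connected by an edge. Let us label the two vertices in $K_2$ by 0 and 1. For a graph $G$, the tensor product $G \times K_2$ is a bipartite graph, called the bipartite double cover of $G$, where the set of vertices in $G \times K_2$ is given by
$$V(G \times K_2) = \bigl\{ (v, i) : v \in V(G), \; i \in \{0, 1\} \bigr\}, \tag{20}$$
and its set of edges is given by
$$E(G \times K_2) = \bigl\{ \{(u, 0), (v, 1)\} : (u, v) \in E(G) \bigr\}, \tag{21}$$
where $(u, v) \in E(G)$ if and only if $(v, u) \in E(G)$. This implies that the numbers of vertices and edges in $G \times K_2$ are doubled in comparison to their respective numbers in $G$; moreover, every edge in $G$, which connects a pair of vertices of specified degrees, is mapped onto two edges in $G \times K_2$, where each of these two edges connects a pair of vertices of the same specified degrees.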
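The stated properties of the bipartite double cover (doubling of the numbers of vertices and edges, preservation of degrees, and bipartiteness) can be checked on a small example. The following Python sketch is illustrative only; the function names and the example graph (a triangle with a pendant vertex) are assumptions of this illustration.

```python
def bipartite_double_cover(vertices, edges):
    """Return the vertices and edges of G x K_2 for an undirected graph G.

    Every edge {u, v} of G yields the two edges {(u,0),(v,1)} and {(v,0),(u,1)}.
    """
    cover_vertices = [(v, b) for v in vertices for b in (0, 1)]
    cover_edges = []
    for u, v in edges:
        cover_edges.append(((u, 0), (v, 1)))
        cover_edges.append(((v, 0), (u, 1)))
    return cover_vertices, cover_edges


def degrees(vertices, edges):
    """Degree of every vertex of an undirected graph given by an edge list."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


# Hypothetical example: a triangle {0, 1, 2} with a pendant vertex 3.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]

cover_vertices, cover_edges = bipartite_double_cover(vertices, edges)
deg_g = degrees(vertices, edges)
deg_cover = degrees(cover_vertices, cover_edges)

print(len(cover_vertices), "vertices and", len(cover_edges), "edges in G x K_2")
for (v, b), d in sorted(deg_cover.items()):
    print(f"vertex ({v},{b}) has degree {d}; vertex {v} in G has degree {deg_g[v]}")
```

Although the example graph contains an odd cycle, every edge of its double cover joins a vertex labeled 0 to a vertex labeled 1, so the cover is bipartite.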

Upper Bounds on the Number of Independent Sets
The present subsection introduces the results that are relevant to this paper. The next theorem provides a tight upper bound on the number of independent sets in bipartite regular graphs, and its derivation in [11] makes clever use of Shearer's lemma (Proposition 1).
Theorem 1 (Kahn 2001, [11]). If $G$ is a bipartite $d$-regular graph with $n$ vertices, then
$$|\mathcal{I}(G)| \le \bigl( 2^{d+1} - 1 \bigr)^{\frac{n}{2d}}. \tag{22}$$
Furthermore, if $n$ is an even multiple of $d$, then the upper bound in the right side of (22) is tight, and it is attained by a disjoint union of $\frac{n}{2d}$ complete $d$-regular bipartite graphs ($K_{d,d}$).
Kahn's result was later extended by Zhao [23] for a general d-regular graph via a brilliant combinatorial reduction to the setting of d-regular bipartite graphs, which proved Conjecture 4.1 in [11].
Theorem 2 (Zhao 2010, [23]). The upper bound on the number of independent sets in (22) continues to hold for all d-regular graphs with n vertices.
Recently, Sah et al. [25] proved Kahn's conjecture (Conjecture 4.2 in [11], which was made eighteen years earlier) for an upper bound on the number of independent sets in a general undirected graph with no isolated vertices. The proof in [25] is combinatorial, and it extends the result in Theorem 2 as follows.
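Theorem 3 (Sah et al. 2019, [25]). Let $G$ be an undirected graph with no isolated vertices, and let $d_v$ denote the degree of a vertex $v \in V(G)$. Then,
$$|\mathcal{I}(G)| \le \prod_{(u,v) \in E(G)} \bigl( 2^{d_u} + 2^{d_v} - 1 \bigr)^{\frac{1}{d_u d_v}}, \tag{23}$$
with equality if $G$ is a disjoint union of complete bipartite graphs.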
Let $K_{d_u, d_v}$ be a complete bipartite graph where the degrees of its left and right vertices are equal to $d_u$ and $d_v$, respectively. Then, the number of independent sets in such a complete bipartite graph is equal to
$$|\mathcal{I}(K_{d_u, d_v})| = 2^{d_u} + 2^{d_v} - 1, \tag{24}$$
since every subset of the left vertices, as well as every subset of the right vertices, forms an independent set of $K_{d_u, d_v}$; on the other hand, any subset which contains both left and right vertices is not an independent set (since the bipartite graph $K_{d_u, d_v}$ is complete). Note that the subtraction of 1 in the right side of (24) is because, in the counting of the number of subsets of left vertices ($2^{d_v}$) or right vertices ($2^{d_u}$), the empty set is counted twice. Hence, (23) can be rewritten in an equivalent form as
$$|\mathcal{I}(G)| \le \prod_{(u,v) \in E(G)} |\mathcal{I}(K_{d_u, d_v})|^{\frac{1}{d_u d_v}}. \tag{25}$$
Since $|E(K_{d_u, d_v})| = d_u d_v$, it follows that the bound in (23) (or (25)) is achieved by the complete bipartite graph $K_{d_u, d_v}$. More generally, the bound is achieved by a finite disjoint union of such complete bipartite graphs, since the number of independent sets in a disjoint union of graphs is equal to the product of the numbers of independent sets in these component graphs.
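The counting argument for complete bipartite graphs and the degree-based bound can be verified by brute force on small graphs. The sketch below assumes that the bound (23) is the product, over all edges (u, v), of (2^{d_u} + 2^{d_v} - 1)^{1/(d_u d_v)}, which is the form implied by (24) and (25); the function names and the two example graphs are hypothetical.

```python
import math


def count_independent_sets(n, edges):
    """Brute-force count of the independent sets (including the empty set)
    of a graph on the vertices 0, ..., n-1."""
    count = 0
    for mask in range(1 << n):
        if all(not ((mask >> u) & 1 and (mask >> v) & 1) for u, v in edges):
            count += 1
    return count


def degree_product_bound(n, edges):
    """Assumed form of (23): prod over edges of (2^du + 2^dv - 1)^(1/(du*dv))."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    log_bound = sum(
        math.log2(2 ** deg[u] + 2 ** deg[v] - 1) / (deg[u] * deg[v])
        for u, v in edges
    )
    return 2 ** log_bound


def complete_bipartite(a, b):
    """Complete bipartite graph with sides {0..a-1} and {a..a+b-1}."""
    return a + b, [(i, a + j) for i in range(a) for j in range(b)]


# Complete bipartite graph with sides of sizes 2 and 3: the bound is attained.
n_kb, edges_kb = complete_bipartite(2, 3)
count_kb = count_independent_sets(n_kb, edges_kb)
bound_kb = degree_product_bound(n_kb, edges_kb)
print(f"complete bipartite: count = {count_kb}, bound = {bound_kb:.6f}")

# Path on 4 vertices: the bound is not attained.
n_path, edges_path = 4, [(0, 1), (1, 2), (2, 3)]
count_path = count_independent_sets(n_path, edges_path)
bound_path = degree_product_bound(n_path, edges_path)
print(f"path on 4 vertices: count = {count_path}, bound = {bound_path:.6f}")
```

For the complete bipartite example, the count equals 2^2 + 2^3 - 1 = 11 and coincides with the bound; for the path, the count (8) is strictly below the bound.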
For the extension of the validity of Theorem 1 to Theorem 2, obtained by relaxing the requirement that the graph is bipartite, the following inequality was introduced by Zhao for every finite graph $G$ (Lemma 2.1 in [23]):
$$|\mathcal{I}(G)|^2 \le |\mathcal{I}(G \times K_2)|, \tag{26}$$
which relates the number of independent sets in a graph to the number of independent sets in the bipartite double cover of this graph. The transition from Theorem 1 to Theorem 2, as introduced in [23], is a one-line proof. Let $G$ be a $d$-regular graph with $n$ vertices; then $G \times K_2$ is a $d$-regular bipartite graph with $2n$ vertices. Hence, (22) and (26) give
$$|\mathcal{I}(G)|^2 \le |\mathcal{I}(G \times K_2)| \le \bigl( 2^{d+1} - 1 \bigr)^{\frac{2n}{2d}} = \bigl( 2^{d+1} - 1 \bigr)^{\frac{n}{d}}, \tag{27}$$
and taking the square root of the leftmost and rightmost sides of (27) implies that (22) continues to hold even when the regular graph $G$ is not necessarily bipartite.
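Zhao's inequality can be illustrated by brute force on a few small non-bipartite graphs. The sketch below takes the inequality in its standard form (the square of the number of independent sets of G is at most the number of independent sets of its bipartite double cover); the encoding of the vertex (v, b) as the integer v + b*n and the chosen graphs are assumptions of this illustration.

```python
def count_independent_sets(n, edges):
    """Brute-force count of the independent sets (including the empty set)
    of a graph on the vertices 0, ..., n-1."""
    count = 0
    for mask in range(1 << n):
        if all(not ((mask >> u) & 1 and (mask >> v) & 1) for u, v in edges):
            count += 1
    return count


def bipartite_double_cover(n, edges):
    """G x K_2 on 2n vertices; the vertex (v, b) is encoded as v + b * n."""
    cover = []
    for u, v in edges:
        cover.append((u, v + n))  # edge between (u, 0) and (v, 1)
        cover.append((v, u + n))  # edge between (v, 0) and (u, 1)
    return 2 * n, cover


# Hypothetical test graphs, all of them non-bipartite.
graphs = {
    "triangle": (3, [(0, 1), (1, 2), (0, 2)]),
    "5-cycle": (5, [(i, (i + 1) % 5) for i in range(5)]),
    "complete graph on 4 vertices": (4, [(i, j) for i in range(4)
                                         for j in range(i + 1, 4)]),
}

results = {}
for name, (n, edges) in graphs.items():
    i_g = count_independent_sets(n, edges)
    i_cover = count_independent_sets(*bipartite_double_cover(n, edges))
    results[name] = (i_g, i_cover)
    print(f"{name}: |I(G)|^2 = {i_g ** 2} <= |I(G x K_2)| = {i_cover}")
```

For the triangle, the double cover is a 6-cycle, so the comparison is between 4^2 = 16 and 18.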

Scientific Merit and Contributions of the Present Work
After introducing Shearer's lemma (Proposition 1) and Theorems 1-3, we address the scientific merit and contributions of the present work in more detail (in comparison to the introduction in Section 1).
Theorem 3 was recently proved in [25] (see also [26]) for general graphs, without relying on information theory. The motivation of our work is rooted in the following sentences from p. 174 in [25]: "Kahn's proof [11] of the bipartite case of Theorem 1 made clever use of Shearer's entropy inequality [2]. It remains unclear how to apply Shearer's inequality in a lossless way in the irregular case, despite previous attempts to do so, e.g., Section 3 in [13] and Section 5.C in [14]."
The present paper gives an information-theoretic proof of Theorem 3 in a setting where the bipartite graph is regular on one side (i.e., the vertices on the other side of the bipartite graph can be irregular, and have arbitrary degrees). Its contributions are as follows:
• Section 4 provides a (non-trivial) extension of the proof of (22), from regular bipartite graphs [11] to general bipartite graphs. This leads to an upper bound on the number of independent sets, which is in general looser than the bound in (23) (or its equivalent form in (25)). However, for the family of bipartite graphs that are regular on one side of the graph, our bound in (72) coincides with the bound in (23). The main deviation from Kahn's information-theoretic proof in [11] is that here we allow the bipartite graph to be irregular. This generalization is not trivial in the sense that it requires a more careful analysis and a slightly more complicated version of Shearer's lemma (see Remark 1). It should be noted, however, that the suggested proof follows the same recipe as Kahn's proof in [11], with some further complications that arise from the non-regularity of the bipartite graphs;
• A variant of the proof of Zhao's inequality (26) (see Section 2 in [23]) is provided in Section 5.
It is interesting to note that the observation that (23) can be extended from (undirected) bipartite graphs to general graphs, by utilizing (26), was made in Lemma 3 in [24]. However, a computer-assisted proof of (23) was restricted there to graphs whose maximal degrees are at most 5 (see Theorem 2 in [24]).

An Information-Theoretic Proof of Theorem 3 for a Family of Bipartite Graphs
The core of the proof of Theorem 3 is proving (23) for an undirected bipartite graph. We provide an extension of the entropy-based proof by Kahn [11] from bipartite d-regular graphs to general bipartite graphs, and then we prove (23) for the family of bipartite graphs that are regular on one side. As is explained in Section 3, the proof in the present section follows the same recipe of Kahn's proof in [11], with some complications that arise from the non-regularity of the bipartite graphs. The following proof deviates from the proof in [11] at its starting point, by a proper adaptation of the proof technique to the general setting of irregular bipartite graphs, followed by a slightly more complicated usage of Shearer's lemma and a more involved analysis.
First consider a general bipartite graph $G$ with a number of vertices $|V(G)| = n$, where none of its vertices is isolated. Label them by the elements of $[1:n]$. Let $L$ and $R$ be the sets of vertices of the two types in $V(G)$ (called, respectively, the left and right vertices in $G$), so $V(G) = L \cup R$ is a disjoint union. Let $D_L$ and $D_R$ be, respectively, the sets of all possible degrees of vertices in $L$ and $R$. For all $d \in D_L$, let $L_d$ be the set of vertices in $L$ with degree $d$, and let $R_d$ be the set of vertices in $R$ that are adjacent to vertices in $L_d$ (note that the vertices in $R_d$ are not necessarily those vertices in $R$ with degree $d$, so the definitions of $L_d$ and $R_d$ differ, i.e., they are not similar up to the replacement of left vertices of degree $d$ with right vertices of the same degree). Then,
$$L = \bigcup_{d \in D_L} L_d, \qquad R = \bigcup_{d \in D_L} R_d, \tag{28}$$
where the first equality in (28) is (by definition) a union of pairwise disjoint sets, and the second equality holds since none of the right vertices is isolated. Let $S \in \mathcal{I}(G)$ be an independent set in $G$, which is selected uniformly at random from $\mathcal{I}(G)$, and let $X^n \triangleq (X_1, \ldots, X_n)$ be given by
$$X_i \triangleq 1\{i \in S\}, \quad i \in [1:n], \tag{29}$$
so the binary random vector $X^n$ indicates which of the $n$ vertices in $V(G)$ belong to the randomly selected independent set $S$. Since $S$ is equiprobable in $\mathcal{I}(G)$, we have
$$H(X^n) = \log |\mathcal{I}(G)|. \tag{30}$$
Let $X_L = (X_i)_{i \in L}$ and $X_R = (X_i)_{i \in R}$ be used as a shorthand. Then,
\begin{align}
H(X^n) &= H(X_L, X_R) \tag{31} \\
&= H(X_L) + H(X_R | X_L) \tag{32} \\
&\le \sum_{d \in D_L} H(X_{L_d}) + H(X_R | X_L) \tag{33} \\
&\le \sum_{d \in D_L} H(X_{L_d}) + \sum_{d \in D_L} H(X_{R_d} | X_L) \tag{34} \\
&= \sum_{d \in D_L} \Bigl[ H(X_{L_d}) + H(X_{R_d} | X_L) \Bigr], \tag{35}
\end{align}
where inequalities (33) and (34) hold by the subadditivity of the entropy, and due to (28). It should be noted that although the first summand in the right side of (35) is an entropy of $X_{L_d}$, the conditioning on $X_L$ (rather than just on $X_{L_d}$) in the second term leads to a stronger upper bound on $H(X^n)$ (since $L_d \subseteq L$, and conditioning reduces the entropy). This is essential for the continuation of the proof (see (37)). We next upper bound the two summands in the right side of (35), starting with the conditional entropy. By invoking the subadditivity property of the entropy, for every $d \in D_L$,
$$H(X_{R_d} | X_L) \le \sum_{r \in R_d} H(X_r | X_L). \tag{36}$$
For every $r \in R_d$, let $\mathcal{N}(r)$ be the set of all the vertices that are adjacent to the vertex $r$.
Since the graph $G$ is bipartite, we have $\mathcal{N}(r) \subseteq L$ (but, in general, $\mathcal{N}(r) \not\subseteq L_d$), and consequently
$$H(X_r | X_L) \le H(X_r | X_{\mathcal{N}(r)}). \tag{37}$$
Combining (36) and (37) gives
$$H(X_{R_d} | X_L) \le \sum_{r \in R_d} H(X_r | X_{\mathcal{N}(r)}). \tag{38}$$
For $r \in R_d$, let
$$Q_r \triangleq 1\{S \cap \mathcal{N}(r) = \emptyset\} \tag{39}$$
be the indicator function of the event where none of the vertices that are adjacent (in $G$) to the vertex $r$ are included in the (randomly selected) independent set $S$. Then,
$$H(X_r | X_{\mathcal{N}(r)}) \le H(X_r | Q_r), \tag{40}$$
since the random vector $X_{\mathcal{N}(r)}$ indicates which of the indices $i \in \mathcal{N}(r)$ are included in $S$, whereas the binary random variable $Q_r$ only indicates if there is such an index. Consequently, (38) and (40) imply that
$$H(X_{R_d} | X_L) \le \sum_{r \in R_d} H(X_r | Q_r). \tag{41}$$
For the binary random variable $Q_r$, let
$$q_r \triangleq \Pr[Q_r = 1]. \tag{42}$$
By (39), $Q_r = 0$ if and only if $S \cap \mathcal{N}(r) \neq \emptyset$, which implies that $r \notin S$, since there is a vertex in $\mathcal{N}(r)$ that belongs to the independent set $S$. Therefore, if $Q_r = 0$, then $X_r = 0$ (see (29)), so
$$H(X_r | Q_r = 0) = 0. \tag{43}$$
If $Q_r = 1$, then $X_r \in \{0, 1\}$ and it is also equiprobable (the latter holds since given $Q_r = 1$, the independent set $S$ is uniformly distributed over all the independent sets in $\mathcal{I}(G)$ that do not include any neighbor of the vertex $r$, so the vertex $r$ can be either removed from or added to such an independent set, while still giving an independent set that does not include any neighbor of $r$). Hence,
$$H(X_r | Q_r = 1) = 1. \tag{44}$$
Hence, from (42) to (44),
$$H(X_r | Q_r) = (1 - q_r) \, H(X_r | Q_r = 0) + q_r \, H(X_r | Q_r = 1) = q_r, \tag{45}$$
and the combination of (41) and (45) yields
$$H(X_{R_d} | X_L) \le \sum_{r \in R_d} q_r. \tag{46}$$
We next upper bound $H(X_{L_d})$, which is the first summand in the right side of (35), and here Shearer's lemma (see Proposition 1) comes into the picture. Since, by definition, $R_d$ is the set of the vertices that are connected to the subset $L_d$ of the degree-$d$ vertices in $L$, and $\mathcal{N}(r)$ is the set of vertices in $L$ that are connected to a vertex $r \in R_d$ in the bipartite graph $G$, it follows that every vertex in $L_d$ belongs to at least $d$ of the subsets $\{\mathcal{N}(r)\}_{r \in R_d}$.
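Hence, by Shearer's lemma (in light of Remark 1, this also holds regardless of the fact that, for $r \in R_d$, the set $\mathcal{N}(r)$ is not necessarily a subset of $L_d$),
$$H(X_{L_d}) \le \frac{1}{d} \sum_{r \in R_d} H(X_{\mathcal{N}(r)}). \tag{47}$$
The binary random variable $Q_r$ is a deterministic function of the random vector $X_{\mathcal{N}(r)}$ since, from (29) and (39), $Q_r = 1$ if and only if all the entries of $X_{\mathcal{N}(r)}$ are equal to 0. Consequently, for all $r \in R_d$,
\begin{align}
H(X_{\mathcal{N}(r)}) &= H(X_{\mathcal{N}(r)}, Q_r) \tag{48} \\
&= H(Q_r) + H(X_{\mathcal{N}(r)} | Q_r) \tag{49} \\
&= H_b(q_r) + H(X_{\mathcal{N}(r)} | Q_r), \tag{50}
\end{align}
where the equality in (50) follows from (2) and (42). Next, from (42),
$$H(X_{\mathcal{N}(r)} | Q_r) = q_r \, H(X_{\mathcal{N}(r)} | Q_r = 1) + (1 - q_r) \, H(X_{\mathcal{N}(r)} | Q_r = 0). \tag{51}$$
If $Q_r = 1$, then $X_{\mathcal{N}(r)}$ is a vector of zeros, so
$$H(X_{\mathcal{N}(r)} | Q_r = 1) = 0. \tag{52}$$
Otherwise, if $Q_r = 0$, then $X_i = 1$ for at least one element $i \in \mathcal{N}(r)$; since $|\mathcal{N}(r)| = d_r$ is the degree of the vertex $r$ (by assumption, there are no multiple edges connecting any pair of vertices), it follows that the vector $X_{\mathcal{N}(r)} \in \{0, 1\}^{d_r}$ cannot be the zero vector, so
$$H(X_{\mathcal{N}(r)} | Q_r = 0) \le \log \bigl( 2^{d_r} - 1 \bigr). \tag{53}$$
Combining (48)-(53) gives
$$H(X_{\mathcal{N}(r)}) \le H_b(q_r) + (1 - q_r) \log \bigl( 2^{d_r} - 1 \bigr), \tag{54}$$
and, from (47) and (54), the following upper bound on the first summand in the right side of (35) holds:
$$H(X_{L_d}) \le \frac{1}{d} \sum_{r \in R_d} \Bigl[ H_b(q_r) + (1 - q_r) \log \bigl( 2^{d_r} - 1 \bigr) \Bigr]. \tag{55}$$
Consequently, combining (31)-(35), (46) and (55) implies that
\begin{align}
H(X^n) &\le \sum_{d \in D_L} \Bigl[ H(X_{L_d}) + H(X_{R_d} | X_L) \Bigr] \tag{56} \\
&\le \sum_{d \in D_L} \biggl\{ \frac{1}{d} \sum_{r \in R_d} \Bigl[ H_b(q_r) + (1 - q_r) \log \bigl( 2^{d_r} - 1 \bigr) \Bigr] + \sum_{r \in R_d} q_r \biggr\} \tag{57} \\
&= \sum_{d \in D_L} \sum_{r \in R_d} \biggl\{ \frac{1}{d} \Bigl[ H_b(q_r) + (1 - q_r) \log \bigl( 2^{d_r} - 1 \bigr) \Bigr] + q_r \biggr\} \tag{58} \\
&= \sum_{d \in D_L} \frac{1}{d} \sum_{r \in R_d} \Bigl[ H_b(q_r) + (1 - q_r) \log \bigl( 2^{d_r} - 1 \bigr) + d \, q_r \Bigr]. \tag{59}
\end{align}
Since $q_r \in [0, 1]$ for $r \in R_d$, we next maximize an auxiliary function $f_r : [0, 1] \to \mathbb{R}$, defined as
$$f_r(x) \triangleq H_b(x) + (1 - x) \log \bigl( 2^{d_r} - 1 \bigr) + d \, x, \quad x \in [0, 1], \tag{60}$$
in order to obtain an upper bound on the right side of (59) which is independent of $\{q_r\}$. By (2), setting the first derivative of the concave function $f_r(\cdot)$ to zero gives the equation
$$\log \frac{1 - x}{x} - \log \bigl( 2^{d_r} - 1 \bigr) + d = 0, \tag{61}$$
whose solution is given by
$$x = q_r^* \triangleq \frac{2^d}{2^d + 2^{d_r} - 1}. \tag{62}$$
Consequently, it follows from (56)-(60) and (62) that
\begin{align}
H(X^n) &\le \sum_{d \in D_L} \frac{1}{d} \sum_{r \in R_d} \max_{x \in [0,1]} f_r(x) \tag{63} \\
&= \sum_{d \in D_L} \frac{1}{d} \sum_{r \in R_d} f_r(q_r^*), \tag{64}
\end{align}
and the calculation of the term in the inner sum in the right side of (64) gives
\begin{align}
f_r(q_r^*) &= H_b(q_r^*) + (1 - q_r^*) \log \bigl( 2^{d_r} - 1 \bigr) + d \, q_r^* \tag{65} \\
&= -q_r^* \log q_r^* - (1 - q_r^*) \log (1 - q_r^*) + (1 - q_r^*) \log \bigl( 2^{d_r} - 1 \bigr) + d \, q_r^* \tag{66} \\
&= q_r^* \log \frac{2^d}{q_r^*} + (1 - q_r^*) \log \frac{2^{d_r} - 1}{1 - q_r^*} \notag \\
&= \log \bigl( 2^d + 2^{d_r} - 1 \bigr), \tag{70}
\end{align}
where (65) and (66) hold, respectively, by (60) and (2), and the last equality holds since, by (62), $2^d / q_r^* = (2^{d_r} - 1) / (1 - q_r^*) = 2^d + 2^{d_r} - 1$. Substituting the equality in (70) into the upper bound on the entropy in the right side of (64), together with (30), gives
$$\log |\mathcal{I}(G)| \le \sum_{d \in D_L} \frac{1}{d} \sum_{r \in R_d} \log \bigl( 2^d + 2^{d_r} - 1 \bigr), \tag{71}$$
which, by exponentiation of both sides of (71), gives
$$|\mathcal{I}(G)| \le \prod_{d \in D_L} \prod_{r \in R_d} \bigl( 2^d + 2^{d_r} - 1 \bigr)^{\frac{1}{d}}. \tag{72}$$
The upper bound in the right-hand side of (72) is, in general, looser than the bound in Theorem 3. Indeed, to clarify this point, let $\Gamma_{d,d'}$ denote the fraction of vertices in $R_d$ with a degree $d' \in D_R$.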
Then, where (73)
In light of this explanation, there is, however, an interesting case where the upper bound in the right side of (72) and the bound in Theorem 3 coincide. Let $G$ be a bipartite graph that is $d$-regular on one side (i.e., one type of its vertices has a fixed degree $d$, and the other type of vertices may be irregular with arbitrary degrees). Without any loss of generality, one can assume that the left vertices are $d$-regular (as, otherwise, the graph can be flipped without affecting its independent sets, and also the bound in Theorem 3 is symmetric in the degrees $d_u$ and $d_v$). In this setting, $L_d = L$ and $R_d = R$ (recall that, by assumption, there are no isolated vertices). Consequently, the right side of (72) is specialized to
$$\prod_{r \in R} \bigl( 2^d + 2^{d_r} - 1 \bigr)^{\frac{1}{d}}. \tag{76}$$
Since there are exactly $d_r$ edges connecting each vertex $r \in R$ with vertices in $L$, and (by the latter assumption) all of the left vertices in $L$ are of a fixed degree $d$, it follows that in this setting, the right side of (76) can be rewritten in the form
$$\prod_{(u,v) \in E(G)} \bigl( 2^{d_u} + 2^{d_v} - 1 \bigr)^{\frac{1}{d_u d_v}}, \tag{77}$$
which, indeed, shows that the right side of (72) and the bound in Theorem 3 coincide for bipartite graphs that are regular on one side of the graph (without restricting the other side).

A Variant of the Proof of Zhao's Inequality
This section suggests a variant of the proof of Zhao's Inequality in (26) (see Lemma 2.1 in [23]). Although it is somewhat different from the one in [23], this forms in essence a reformulation of Zhao's proof, which is provided here for completeness.
Let $G$ be a finite graph, and let $|V(G)| = n$. Label the vertices in the left and right sides of the bipartite graph $G \times K_2$ (i.e., the bipartite double cover of $G$) by $\{(i, 0)\}_{i=1}^n$ and $\{(i, 1)\}_{i=1}^n$, respectively. Choose, independently and uniformly at random, two independent sets $S_0, S_1 \in \mathcal{I}(G)$. For $i \in [1:n]$, let $X_i, Y_i \in \{0, 1\}$ be random variables defined as $X_i = 1$ if and only if $i \in S_0$, and $Y_i = 1$ if and only if $i \in S_1$. Then, by the statistical independence and equiprobable selection of the two independent sets from $\mathcal{I}(G)$, we have
\begin{align}
H(X^n, Y^n) &= H(X^n) + H(Y^n) \tag{79} \\
&= 2 \log |\mathcal{I}(G)|, \tag{80}
\end{align}
where (79) holds since the random vectors $X^n$ and $Y^n$ are statistically independent (by construction), and (80) holds since both $X^n$ and $Y^n$ have an equiprobable distribution over a set whose cardinality is $|\mathcal{I}(G)|$. Consider the following set of vertices in $G \times K_2$:
$$S \triangleq \bigl\{ (i, 0) : i \in S_0 \bigr\} \cup \bigl\{ (i, 1) : i \in S_1 \bigr\}.$$
The set $S$ is not necessarily an independent set in $G \times K_2$; indeed, $\{(i, 0), (j, 1)\} \in E(G \times K_2)$ for all $i \in S_0$ and $j \in S_1$ for which $(i, j) \in E(G)$ (see (21)). We next consider all $(i, j) \in E(G)$ such that $X_i = Y_j = 1$. To that end, fix an ordering of all the $2^n$ subsets of $V(G)$, and let $T \subseteq V(G)$ be the first subset in this particular ordering which includes exactly one endpoint of each edge $(i, j) \in E(G)$ for which $X_i = Y_j = 1$. Consider the following replacements:
• If $(i, 0) \in S$ and $i \in T$, then $(i, 0)$ is replaced by $(i, 1)$;
• Likewise, if $(j, 1) \in S$ and $j \in T$, then $(j, 1)$ is replaced by $(j, 0)$.
Let $\widetilde{S}$ be the set of new vertices after these possible replacements. Then, $\widetilde{S} \in \mathcal{I}(G \times K_2)$, since all pairs of vertices that are adjacent in $S$ are no longer connected in $\widetilde{S}$. Indeed, there is no way that after (say) a vertex $(i, 0)$ is replaced by $(i, 1)$, there is another replacement of a vertex $(j, 1)$ by $(j, 0)$, for some $j$ such that $(i, j) \in E(G)$; otherwise, that would mean that $T$ contains both $i$ and $j$, which is impossible by construction.
Similarly to the way in which $X^n, Y^n \in \{0, 1\}^n$ were defined, let $\widetilde{X}^n, \widetilde{Y}^n \in \{0, 1\}^n$ be defined such that, for all $i \in [1:n]$, $\widetilde{X}_i = 1$ if and only if $(i, 0) \in \widetilde{S}$, and $\widetilde{Y}_i = 1$ if and only if $(i, 1) \in \widetilde{S}$. The mapping from $(X^n, Y^n)$ to $(\widetilde{X}^n, \widetilde{Y}^n)$ is injective. Indeed, it is shown to be injective by finding all indices $(i, j) \in E(G)$ such that $\widetilde{X}_i = \widetilde{X}_j = 1$ or $\widetilde{Y}_i = \widetilde{Y}_j = 1$, finding the first subset $T \subseteq V(G)$ according to our previous fixed ordering of the $2^n$ subsets of $V(G)$ that includes exactly one endpoint of each such edge $(i, j) \in E(G)$, and performing the reverse operation to return to $X^n$ and $Y^n$ (e.g., if $(i, j) \in E(G)$, $\widetilde{X}_i = \widetilde{X}_j = 1$ and $i \in T$ while $j \notin T$, then $\widetilde{X}_i = 1$ is transformed back to $Y_i = 1$, and $\widetilde{X}_j = 1$ is transformed back to $X_j = 1$). Consequently, we get
\begin{align}
H(X^n, Y^n) &= H(\widetilde{X}^n, \widetilde{Y}^n) \tag{83} \\
&\le \log |\mathcal{I}(G \times K_2)|, \tag{84}
\end{align}
where (83) holds by the injectivity of the mapping from $(X^n, Y^n)$ to $(\widetilde{X}^n, \widetilde{Y}^n)$, and (84) holds since $\widetilde{S}$ is an independent set in $G \times K_2$, which implies that $(\widetilde{X}^n, \widetilde{Y}^n)$ can take (at most) $|\mathcal{I}(G \times K_2)|$ possible values (by definition, there is a one-to-one correspondence between $\widetilde{S}$ and $(\widetilde{X}^n, \widetilde{Y}^n)$). Combining (79), (80), (83) and (84) gives
$$2 \log |\mathcal{I}(G)| \le \log |\mathcal{I}(G \times K_2)|, \tag{85}$$
which gives (26) by exponentiation of both sides of (85).
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data sharing not applicable.
Acknowledgments: Correspondence with Ashwin Sah and Mehtaab Sawhney (both are currently mathematics graduate students at MIT), and the constructive comments in the review process are gratefully acknowledged.

Conflicts of Interest: The author declares no conflict of interest.