Article

An Information-Theoretic Proof of a Hypercontractive Inequality

Ehud Friedgut
Faculty of Math and Computer Science, Weizmann Institute of Science, Rehovot 7610001, Israel
Entropy 2024, 26(11), 966; https://doi.org/10.3390/e26110966
Submission received: 17 September 2024 / Revised: 4 November 2024 / Accepted: 7 November 2024 / Published: 11 November 2024
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

The famous hypercontractive estimate discovered independently by Gross, Bonami and Beckner has had a great impact on combinatorics and theoretical computer science since it was first used in this setting in a seminal paper by Kahn, Kalai and Linial. The usual proofs of this inequality begin with the two-point space, where some elementary calculus is used, and the result is then generalised immediately by introducing another dimension using submultiplicativity (Minkowski’s integral inequality). In this paper, we prove this inequality using information theory. We compare the entropy of a pair of correlated vectors in $\{0,1\}^n$ to their separate entropies, analysing them bit by bit (not as a figure of speech, but as the bits are revealed) using the chain rule of entropy.

1. Introduction

The inequality that we consider in this note is a two-function version of a famous hypercontractive inequality discovered, independently, by Gross [1], Bonami [2] and Beckner [3]. This inequality, first introduced to the combinatorial landscape in a seminal paper [4] by Kahn, Kalai and Linial (henceforth KKL), has become one of the cornerstones of the analytical approach to Boolean functions and theoretical computer science; see, e.g., [5,6,7,8,9,10], and many others. See chapter 16 of [11] for a historical background.
Let $\epsilon \in (0,1)$. We will be considering an operator $T_\epsilon$, which acts on real-valued functions on $\{0,1\}^n$. We consider two equivalent definitions of the operator: a spectral definition and a more probabilistic one. The first is given via the eigenfunctions and eigenvalues of the operator. The eigenfunctions are precisely the Walsh–Fourier variables $\{u_x : x \in \{0,1\}^n\}$, which form a complete orthonormal system under the inner product $\langle f, g \rangle := \mathbb{E}[f(x)g(x)]$, where the expectation of a function is defined by $\mathbb{E}[h(x)] = 2^{-n}\sum_x h(x)$. We recall the definition of these variables. For $x, y \in \{0,1\}^n$,
$$u_x(y) = (-1)^{\sum_i x_i y_i}.$$
Given a function $f : \{0,1\}^n \to \mathbb{R}$ and its unique Fourier expansion $f = \sum_x \hat{f}(x) u_x$, the action of $T_\epsilon$ on $f$ is defined by
$$T_\epsilon(f) = \sum_x \epsilon^{\sum_i x_i} \hat{f}(x) u_x.$$
This definition of $T_\epsilon$ stresses the fact that it “focuses” on the low-frequency part of the Fourier spectrum, an idea that was a crucial element in the KKL proof [4].
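To make the spectral definition concrete, here is a minimal numerical sketch (the function and variable names are illustrative, not from the paper): it computes the Walsh–Fourier coefficients of a function on $\{0,1\}^n$ by direct summation and applies $T_\epsilon$ by damping the coefficient of $u_x$ by $\epsilon^{\sum_i x_i}$.

```python
import itertools
import random

def walsh(x, y):
    # u_x(y) = (-1)^{sum_i x_i y_i}
    return (-1) ** sum(xi * yi for xi, yi in zip(x, y))

def noise_operator_spectral(f, n, eps):
    """Return T_eps(f) as a dict {y: value}, using the Fourier (spectral) definition."""
    pts = list(itertools.product((0, 1), repeat=n))
    # Fourier coefficients: f_hat(x) = E_y[f(y) u_x(y)] = 2^{-n} sum_y f(y) u_x(y)
    f_hat = {x: sum(f[y] * walsh(x, y) for y in pts) / 2 ** n for x in pts}
    # T_eps(f) = sum_x eps^{sum_i x_i} f_hat(x) u_x
    return {y: sum(eps ** sum(x) * f_hat[x] * walsh(x, y) for x in pts) for y in pts}

if __name__ == "__main__":
    n, eps = 3, 0.4
    random.seed(0)
    pts = list(itertools.product((0, 1), repeat=n))
    f = {y: random.random() for y in pts}
    print(noise_operator_spectral(f, n, eps)[(0, 0, 0)])
```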
For the other definition, let $X$ and $Y$ be random variables taking values in $\{0,1\}^n$. For any distribution of $X$, let $Y$ be such that for every coordinate $i$, independently, $Y_i$ is chosen so that $\Pr[X_i = Y_i] = \frac{1+\epsilon}{2}$. (Or, if one prefers the $\{-1,1\}^n$ setting, the restriction is $\mathbb{E}[X_i Y_i] = \epsilon$.) Note that if $X$ is chosen uniformly, then the marginal distribution of $Y$ is also uniform. We call such a pair $(X,Y)$ an $\epsilon$-correlated pair. Then, one can define
$$T_\epsilon(f)(x) = \mathbb{E}[f(Y) \mid X = x].$$
It is not hard to verify that these two definitions of $T_\epsilon$ are equivalent. The second definition, which is the one we will be working with in this paper, stresses the connection of this operator to random walks and isoperimetric inequalities, as it enables one to bound the probability that an $\epsilon$-correlated pair of random points $X, Y$ belongs to given sets. It also explains the fact (which will be made formal shortly) that $T_\epsilon(f)$ is smoother than $f$, as the operator is an averaging operator. Without further ado, here is the statement of the inequality:
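The equivalence of the two definitions can be checked directly on a Walsh function: under the probabilistic definition, $T_\epsilon(u_w)$ should come back as $\epsilon^{\sum_i w_i} u_w$. A small self-contained sketch (illustrative names only):

```python
import itertools

def noise_operator_prob(f, n, eps):
    """T_eps(f)(x) = E[f(Y) | X = x], where Pr[Y_i = X_i] = (1 + eps)/2 independently."""
    pts = list(itertools.product((0, 1), repeat=n))
    p_agree, p_differ = (1 + eps) / 2, (1 - eps) / 2
    def transition(x, y):
        prob = 1.0
        for xi, yi in zip(x, y):
            prob *= p_agree if xi == yi else p_differ
        return prob
    return {x: sum(transition(x, y) * f[y] for y in pts) for x in pts}

if __name__ == "__main__":
    n, eps, w = 3, 0.4, (1, 0, 1)
    pts = list(itertools.product((0, 1), repeat=n))
    u_w = {y: (-1) ** sum(wi * yi for wi, yi in zip(w, y)) for y in pts}
    T_uw = noise_operator_prob(u_w, n, eps)
    # The spectral definition predicts T_eps(u_w) = eps^{|w|} u_w; check it pointwise.
    assert all(abs(T_uw[x] - eps ** sum(w) * u_w[x]) < 1e-12 for x in pts)
    print("both definitions agree on u_w for w =", w)
```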
Theorem 1
([1,2,3]). Let $f : \{0,1\}^n \to \mathbb{R}_{\ge 0}$ and let $\epsilon \in [0,1]$. Then,
$$\|T_\epsilon(f)\|_2 \le \|f\|_{1+\epsilon^2}, \qquad (1)$$
where $\|g\|_p := (\mathbb{E}[|g|^p])^{1/p}$ and the expectation is taken with respect to the uniform measure.
Remarks:
  • There are various refinements of this inequality, dealing with non-uniform measures, the norm of $T_\epsilon$ as an operator from $L_p$ to $L_q$ for $q \ne 2$, products of a base space with more than two points, or a reverse inequality that deals with the case $p, q < 1$; see [10,12,13,14]. It would not be surprising if the method used in this note could be extended to cover such cases too.
  • It is not difficult to see that (1) is equivalent to the following: Let $f, g : \{0,1\}^n \to \mathbb{R}$; let $X$ be uniformly distributed on $\{0,1\}^n$; and let $X, Y$ be an $\epsilon$-correlated pair. Then,
    $$\mathbb{E}[f(X)\, g(Y)] \le \|f\|_{1+\epsilon}\, \|g\|_{1+\epsilon}. \qquad (2)$$
    This is the inequality proved in this paper (a brute-force numerical check of (1) and (2) for small $n$ is sketched after these remarks).
  • A major portion of the hypercontractive inequality’s applications deals with the case where f and g are Boolean functions. We will start our proof with this setting and then show how a small variation deals with the general case.
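The following minimal sketch (illustrative, feasible only for small $n$, not part of the argument) computes both sides of (1) and (2) exactly for random non-negative functions:

```python
import itertools
import random

def apply_noise(f, n, eps):
    """T_eps f(x) = E[f(Y) | X = x] for an eps-correlated pair (X, Y)."""
    pts = list(itertools.product((0, 1), repeat=n))
    pa, pd = (1 + eps) / 2, (1 - eps) / 2
    return {x: sum(f[y] * pa ** sum(xi == yi for xi, yi in zip(x, y))
                        * pd ** sum(xi != yi for xi, yi in zip(x, y)) for y in pts)
            for x in pts}

def norm(f, p, n):
    """||f||_p with respect to the uniform measure on {0,1}^n."""
    return (sum(abs(v) ** p for v in f.values()) / 2 ** n) ** (1 / p)

if __name__ == "__main__":
    n, eps = 3, 0.6
    random.seed(1)
    pts = list(itertools.product((0, 1), repeat=n))
    f = {x: random.random() for x in pts}
    g = {x: random.random() for x in pts}
    # (1): ||T_eps f||_2 <= ||f||_{1 + eps^2}
    Tf = apply_noise(f, n, eps)
    print("(1):", norm(Tf, 2, n), "<=", norm(f, 1 + eps ** 2, n))
    # (2): E[f(X) g(Y)] = <f, T_eps g> <= ||f||_{1+eps} ||g||_{1+eps}
    Tg = apply_noise(g, n, eps)
    lhs = sum(f[x] * Tg[x] for x in pts) / 2 ** n
    print("(2):", lhs, "<=", norm(f, 1 + eps, n) * norm(g, 1 + eps, n))
```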
In this paper, we apply an information-theoretic approach to proving (2) for Boolean functions, trying to analyse the pair ( X , Y ) as the coordinates of X and Y are revealed to us one by one. Since all known direct proofs of (1) use induction, it is not surprising that one should adopt such a sequential approach. The difference is that the usual proofs begin with the two-point space and proceed through induction using the submultiplicativity of the product operator and Minkowski’s integral inequality, whereas we use the chain rule of entropy, exposing the bits of the vectors in question one by one and comparing the amount of information about their joint distribution with the information captured by their marginal distributions. Fortunately, it turns out that regardless of the prefixes revealed so far, at every step, the conditional entropies obey the same inequality.
This is not the first application of entropy to this hypercontractive inequality. The connection between hypercontractivity-type inequalities and information theory was perhaps first addressed in [15]. In [16], the dual form of the hypercontractive inequality is proven for the case of comparing the 2-norm and the 4-norm of a low-degree polynomial acting on $\{0,1\}^n$. Blais and Tan [17] managed to improve this approach and, surprisingly, extract the precise optimal hypercontractive constant for comparing the 2-norm and the q-norm of such polynomials for all positive even integers q. Both these proofs analyse the Fourier space rather than the primal space and use no induction at all.
I will make one final remark regarding the proof in this paper. Although it is not difficult, it is probably, to date, the most involved proof of the inequality from a technical point of view. I believe that it is nonetheless worthwhile to add it to the list of existing proofs because it offers a new point of view which directly addresses the notion of studying the projections of the joint distribution of a pair of ϵ -correlated vectors.

2. Main Theorem

2.1. The Boolean Case

Theorem 2.
Let $\epsilon \in [0,1]$, and let $\mathcal{X}, \mathcal{Y} \subseteq \{0,1\}^n$ be nonempty. Let $X$ be uniformly distributed in $\{0,1\}^n$, and let $Y$ be such that for each $1 \le i \le n$, independently, $\Pr[X_i = Y_i] = \frac{1+\epsilon}{2}$. Then,
$$\mathbb{E}\big[\mathbf{1}_{\mathcal{X}}(X)\,\mathbf{1}_{\mathcal{Y}}(Y)\big] \le \big(\mu(\mathcal{X})\,\mu(\mathcal{Y})\big)^{\frac{1}{1+\epsilon}},$$
where μ denotes a uniform measure.
Proof. 
For $x, y \in \{0,1\}^n$, let $a(x,y)$ denote the number of coordinates on which $x$ and $y$ agree and $d(x,y)$ be the number of coordinates on which they differ. Then, the theorem is equivalent (by straightforward manipulation) to
$$\log\Big(\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} (1+\epsilon)^{a(x,y)} (1-\epsilon)^{d(x,y)}\Big) \le \frac{1}{1+\epsilon}\Big(2\epsilon n + \log(|\mathcal{X}|) + \log(|\mathcal{Y}|)\Big),$$
where all logs are base 2. As is usual in proofs using entropy, it suffices, through continuity, to treat the case where $\epsilon$ is rational. Let $s \le r$ be natural numbers such that $\frac{1+\epsilon}{2} = \frac{r}{r+s}$ and $\frac{1-\epsilon}{2} = \frac{s}{r+s}$. Then, (2) reduces to
$$\log\Big(\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} r^{a(x,y)} s^{d(x,y)}\Big) \le n\big(\log(r+s) - s/r\big) + \frac{r+s}{2r}\big(\log(|\mathcal{X}|) + \log(|\mathcal{Y}|)\big). \qquad (5)$$
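For a concrete sanity check of this reformulation, the sketch below (illustrative code; small $n$ only) picks integers $r \ge s$ and random non-empty subsets $\mathcal{X}, \mathcal{Y} \subseteq \{0,1\}^n$ and verifies the inequality by brute force; for example, $r = 3$, $s = 1$ corresponds to $\epsilon = 1/2$.

```python
import itertools
import math
import random

def check_reformulation(n, r, s, trials=200, seed=0):
    """Check log2(sum r^a s^d) <= n(log2(r+s) - s/r) + (r+s)/(2r) (log2|X| + log2|Y|)."""
    rng = random.Random(seed)
    pts = list(itertools.product((0, 1), repeat=n))
    for _ in range(trials):
        X = [p for p in pts if rng.random() < 0.5] or [rng.choice(pts)]
        Y = [p for p in pts if rng.random() < 0.5] or [rng.choice(pts)]
        total = sum(r ** sum(xi == yi for xi, yi in zip(x, y))
                    * s ** sum(xi != yi for xi, yi in zip(x, y))
                    for x in X for y in Y)
        lhs = math.log2(total)
        rhs = (n * (math.log2(r + s) - s / r)
               + (r + s) / (2 * r) * (math.log2(len(X)) + math.log2(len(Y))))
        assert lhs <= rhs + 1e-9, (lhs, rhs)
    return True

if __name__ == "__main__":
    print(check_reformulation(n=3, r=3, s=1))
```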
We will express the left-hand side of (5) as the entropy of a random variable and proceed to expand it according to the chain rule. First, let $A_{00}$, $A_{11}$, $A_{10}$ and $A_{01}$ be four disjoint sets with
$$|A_{00}| = |A_{11}| = r, \qquad |A_{01}| = |A_{10}| = s.$$
Next, let $(X, Y, Z)$ be a random variable which is distributed uniformly over all triples such that $X \in \mathcal{X}$, $Y \in \mathcal{Y}$, and for every $1 \le i \le n$, $Z_i \in A_{X_i Y_i}$. Clearly, $Z$ determines $X$ and $Y$ so that
$$H(X, Y, Z) = H(Z) = \log\Big(\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} r^{a(x,y)} s^{d(x,y)}\Big).$$
Next, for any vector $w$, we denote $w_{<i} := (w_1, \dots, w_{i-1})$. So, from the chain rule, we have
$$H(Z) = \sum_i H(Z_i \mid Z_{<i})$$
and
$$H(X) = \sum_i H(X_i \mid X_{<i}) \le \log(|\mathcal{X}|), \qquad H(Y) = \sum_i H(Y_i \mid Y_{<i}) \le \log(|\mathcal{Y}|).$$
Hence, it suffices to prove that
$$H(Z) \le \frac{r+s}{2r}\big(H(X) + H(Y)\big) + n\big(\log(r+s) - s/r\big). \qquad (6)$$
By noting that any fixed value of $Z_{<i}$ determines the values of $X_{<i}$ and $Y_{<i}$, we have
$$H(X_i \mid Z_{<i}) \le H(X_i \mid X_{<i}),$$
and
$$H(Y_i \mid Z_{<i}) \le H(Y_i \mid Y_{<i}),$$
so inequality (6) will follow from the claim below. □
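The following sketch (illustrative, tiny instance) spells out this bookkeeping numerically: it enumerates the support of $(X, Y, Z)$, confirms the chain-rule identity $H(Z) = \sum_i H(Z_i \mid Z_{<i})$, and confirms the conditioning inequality $H(X_i \mid Z_{<i}) \le H(X_i \mid X_{<i})$ coordinate by coordinate.

```python
import itertools
import math
from collections import Counter

def cond_entropy(support, target, given):
    """H(target(W) | given(W)) for W uniform over `support`, in bits."""
    N = len(support)
    joint = Counter((given(w), target(w)) for w in support)
    marg = Counter(given(w) for w in support)
    return -sum((c / N) * math.log2(c / marg[g]) for (g, _), c in joint.items())

def build_support(Xset, Yset, r, s):
    """All triples (x, y, z): coordinate i of z is a symbol (x_i, y_i, k), with k < r if x_i == y_i, else k < s."""
    support = []
    for x in Xset:
        for y in Yset:
            per_coord = [[(xi, yi, k) for k in range(r if xi == yi else s)]
                         for xi, yi in zip(x, y)]
            support.extend((x, y, z) for z in itertools.product(*per_coord))
    return support

if __name__ == "__main__":
    n, r, s = 2, 3, 1
    Xset = [(0, 0), (0, 1), (1, 1)]
    Yset = list(itertools.product((0, 1), repeat=n))
    sup = build_support(Xset, Yset, r, s)
    HZ = math.log2(len(sup))   # Z determines (X, Y), so H(Z) = log2 |support|
    chain = sum(cond_entropy(sup, target=lambda w, i=i: w[2][i],
                                  given=lambda w, i=i: w[2][:i]) for i in range(n))
    print(f"H(Z) = {HZ:.6f}, chain-rule sum = {chain:.6f}")
    for i in range(n):
        hxz = cond_entropy(sup, target=lambda w, i=i: w[0][i], given=lambda w, i=i: w[2][:i])
        hxx = cond_entropy(sup, target=lambda w, i=i: w[0][i], given=lambda w, i=i: w[0][:i])
        print(f"H(X_{i+1} | Z_<{i+1}) = {hxz:.4f} <= H(X_{i+1} | X_<{i+1}) = {hxx:.4f}")
```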
Claim 1.
Let $\mathrm{Past} = z_{<i}$ be any specific value that $Z_{<i}$ attains. Then,
$$H(Z_i \mid Z_{<i} = \mathrm{Past}) \le \frac{r+s}{2r}\Big(H(X_i \mid Z_{<i} = \mathrm{Past}) + H(Y_i \mid Z_{<i} = \mathrm{Past})\Big) + \log(r+s) - \frac{s}{r}.$$
Proof. 
We condition on $Z_{<i} = \mathrm{Past}$ throughout and, to lighten the notation, drop this dependency, so that, e.g., we write $H(X)$ and $H(Z \mid X)$ rather than $H(X \mid Z_{<i} = \mathrm{Past})$ and $H(Z \mid X, Z_{<i} = \mathrm{Past})$, etc. We also drop the index $i$ and write $X$ for $X_i$, etc. Using the new notation, we want to prove (for all integers $r \ge s \ge 0$) that
$$H(Z) \le \frac{r+s}{2r}\big(H(X) + H(Y)\big) + \log(r+s) - \frac{s}{r}.$$
Note that $H(Z) = H(X,Y) + H(Z \mid X, Y) = H(X,Y) + \log(r)\Pr[X = Y] + \log(s)\Pr[X \ne Y]$, so we need to prove that
$$\frac{r+s}{2r}\big(H(X) + H(Y)\big) - H(X,Y) - \log(r)\Pr[X = Y] - \log(s)\Pr[X \ne Y] + \log(r+s) - \frac{s}{r} \ge 0.$$
Since this expression is invariant when $r$ and $s$ are multiplied by any positive constant, we can set $r = 1$ and denote $\delta := s/r$. Next, for a joint distribution of $X$ and $Y$ over $\{0,1\}^2$ and $(i,j) \in \{0,1\}^2$, let $P_{ij} = \Pr[X = i, Y = j]$. Hence, we want to prove that
$$F_\delta\begin{pmatrix} P_{01} & P_{11} \\ P_{00} & P_{10} \end{pmatrix} := \frac{1+\delta}{2}\big(H(X) + H(Y)\big) - H(X,Y) - \log(\delta)\Pr[X \ne Y] + \log(1+\delta) - \delta \;\ge\; 0.$$
We know (and can check directly from the formula) that equality holds when $\mathcal{X} = \mathcal{Y} = \{0,1\}^n$, in which case we have
$$\begin{pmatrix} P_{01} & P_{11} \\ P_{00} & P_{10} \end{pmatrix} = \begin{pmatrix} \frac{\delta}{2+2\delta} & \frac{1}{2+2\delta} \\ \frac{1}{2+2\delta} & \frac{\delta}{2+2\delta} \end{pmatrix}.$$
We wish to show that this is a unique minimum. To simplify the notation (and save using indices), we denote
$$\begin{pmatrix} P_{01} & P_{11} \\ P_{00} & P_{10} \end{pmatrix} =: \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$
and attempt to minimise $F_\delta\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ under the constraints
$$a, b, c, d \ge 0 \quad \text{and} \quad a + b + c + d = 1.$$
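As a numerical sanity check of this minimisation (illustrative code, not part of the argument), one can evaluate $F_\delta$ on a grid over the simplex and compare the smallest value found with the value at the claimed minimiser $a = d = \frac{\delta}{2+2\delta}$, $b = c = \frac{1}{2+2\delta}$, where $F_\delta$ should vanish:

```python
import itertools
import math

def h(ps):
    """Shannon entropy (base 2) of a probability vector; 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

def F(delta, a, b, c, d):
    """F_delta for the joint law (P01, P11, P00, P10) = (a, b, c, d)."""
    HX = h([a + c, b + d])     # marginal of X
    HY = h([c + d, a + b])     # marginal of Y
    HXY = h([a, b, c, d])
    return ((1 + delta) / 2 * (HX + HY) - HXY
            - math.log2(delta) * (a + d) + math.log2(1 + delta) - delta)

if __name__ == "__main__":
    delta, steps = 0.5, 60
    grid = [i / steps for i in range(steps + 1)]
    best = min((F(delta, a, b, c, 1 - a - b - c), (a, b, c))
               for a, b, c in itertools.product(grid, repeat=3) if a + b + c <= 1)
    a_star, b_star = delta / (2 + 2 * delta), 1 / (2 + 2 * delta)
    print("smallest grid value:", best[0])
    print("F at claimed minimiser:", F(delta, a_star, b_star, b_star, a_star))
```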
Using Lagrange multipliers, we deduce (after some simple cancellations) that at a local minimum in the interior of the region in question, one must have
$$\frac{\delta}{a}\,\big[(a+b)(a+c)\big]^{(1+\delta)/2} = \qquad (9)$$
$$\frac{1}{b}\,\big[(a+b)(b+d)\big]^{(1+\delta)/2} = \qquad (10)$$
$$\frac{1}{c}\,\big[(a+c)(c+d)\big]^{(1+\delta)/2} = \qquad (11)$$
$$\frac{\delta}{d}\,\big[(c+d)(b+d)\big]^{(1+\delta)/2}. \qquad (12)$$
From the fact that (9) · (12) = (10) · (11), we obtain
$$a d = \delta^2\, b c. \qquad (13)$$
Next, we plug (13) and the restriction $a + b + c + d = 1$ into the equation (9) = (12). This yields
$$\Big(\frac{d}{a}\Big)^{(1-\delta)/2} = \bigg(\frac{1 + (\delta^{-2} - 1)\,a}{1 + (\delta^{-2} - 1)\,d}\bigg)^{(1+\delta)/2}. \qquad (14)$$
Note that for every fixed value of $b + c$, the value of $a + d$ is fixed, so letting $d$ grow from 0 to $1 - b - c$ as $a = 1 - b - c - d$ decreases from $1 - b - c$ to 0, we see that the left-hand side of (14) is increasing and the right-hand side is decreasing. Hence, there exists a single solution, which, by inspection, is $a = d$. □
We now know that $a = d$ and that $bc = \delta^{-2} a^2$ and $b + c = 1 - 2a$. This shows that $b$ and $c$ are the roots of the quadratic equation $X^2 - (1-2a)X + \delta^{-2} a^2 = 0$. Plugging these roots into the equation (10) = (11) and using $a = d$ yield the following equation for $a$:
$$\frac{\big[1 - 2a + S(a)\big]\,\big[1 - S(a)\big]^{1+\delta}}{\big[1 - 2a - S(a)\big]\,\big[1 + S(a)\big]^{1+\delta}} = 1, \qquad (15)$$
where $S(a)$ denotes $\sqrt{1 - 4a + 4(1 - \delta^{-2})a^2}$. Now, $a$ can take on values between 0 and $1/2$ as long as $S(a) \ge 0$, so the relevant range is $0 \le a \le \frac{\delta}{2+2\delta}$. When $a = \frac{\delta}{2+2\delta}$, as required, $S(a) = 0$ and Equation (15) clearly holds. An elementary calculation shows that for all $\delta \in [0,1]$, the left-hand side of (15) is a decreasing function of $a$ in the interval $\big[0, \frac{\delta}{2+2\delta}\big]$, so $(a, b, c, d)$ is determined, and there is a unique internal minimum in the region we are exploring. (A meticulous reader may check that the derivative of the left-hand side of (15) with respect to $a$ is
$$\frac{16\, a^2\, (1-\delta)\,\big[1 - S(a)\big]^{\delta}\,\big(2a(\delta+1) - \delta\big)}{\delta^3\, S(a)\,\big[1 + S(a)\big]^{\delta+2}\,\big[1 - 2a - S(a)\big]^2},$$
whose sign is determined by $2a(\delta+1) - \delta$, which is negative for all $a \in \big(0, \frac{\delta}{2+2\delta}\big)$.)
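The two assertions just made can be checked numerically; the sketch below (illustrative) evaluates the left-hand side of (15) on a grid in $\big(0, \frac{\delta}{2+2\delta}\big)$, confirms that it is decreasing, and compares a finite-difference derivative with the closed form displayed above.

```python
import math

def S(a, delta):
    return math.sqrt(1 - 4 * a + 4 * (1 - delta ** -2) * a * a)

def lhs15(a, delta):
    """Left-hand side of Equation (15)."""
    s = S(a, delta)
    return ((1 - 2 * a + s) * (1 - s) ** (1 + delta)
            / ((1 - 2 * a - s) * (1 + s) ** (1 + delta)))

def lhs15_derivative(a, delta):
    """Closed-form derivative of lhs15 with respect to a (the display above)."""
    s = S(a, delta)
    return (16 * a * a * (1 - delta) * (1 - s) ** delta * (2 * a * (delta + 1) - delta)
            / (delta ** 3 * s * (1 + s) ** (delta + 2) * (1 - 2 * a - s) ** 2))

if __name__ == "__main__":
    delta = 0.5
    a_max = delta / (2 + 2 * delta)
    grid = [a_max * k / 200 for k in range(1, 200)]
    vals = [lhs15(a, delta) for a in grid]
    print("decreasing on (0, a_max):", all(u > v for u, v in zip(vals, vals[1:])))
    a, step = 0.1, 1e-6
    fd = (lhs15(a + step, delta) - lhs15(a - step, delta)) / (2 * step)
    print(f"finite difference: {fd:.6f}   closed form: {lhs15_derivative(a, delta):.6f}")
```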
What about points on the boundary of the region? We claim that there can be no minima with negative values on the boundary. We deal with two cases: the case of $\begin{pmatrix} a & b \\ c & 0 \end{pmatrix}$ with $a > 0$ (and the different rotations of this) and the case of $\begin{pmatrix} 0 & b \\ c & 0 \end{pmatrix}$ and its rotation (note that $F_\delta\begin{pmatrix} 0 & y \\ z & 0 \end{pmatrix} \le F_\delta\begin{pmatrix} y & 0 \\ 0 & z \end{pmatrix}$).
In the first case, there is a nearby point in the interior of the region where the value of $F_\delta$ is smaller. This is because the derivative of
$$F_\delta\begin{pmatrix} a - t & b \\ c & t \end{pmatrix}$$
with respect to $t$ at $t = 0$ is minus infinity (as one of the summands being differentiated is $t \log(t)$). In the second case, the value of $F_\delta$ is non-negative,
$$F_\delta\begin{pmatrix} 0 & b \\ 1-b & 0 \end{pmatrix} = \delta\big(b \log(1/b) + (1-b)\log(1/(1-b))\big) + \log(1+\delta) - \delta \ge 0,$$
as $\log_2(1+\delta) \ge \delta$ for all $\delta \in [0,1]$.

2.2. The General (Non-Boolean) Case

The general case is actually a minor extension of the Boolean one, which follows by adding one more coordinate to each of the random variables X and Y.
Theorem 3.
Let $\epsilon \in (0,1)$, and let $f, g : \{0,1\}^n \to \mathbb{R}_{\ge 0}$. As before, let $X, Y$ be an $\epsilon$-correlated pair of random variables with $X$ uniformly distributed over $\{0,1\}^n$. Then,
$$\mathbb{E}[f(X)\, g(Y)] \le \|f\|_{1+\epsilon}\, \|g\|_{1+\epsilon}.$$
Proof. 
By continuity, it suffices to consider the case where all values of $f$ and $g$ are rational, and by homogeneity, we can clear common denominators and assume that they are integer-valued. Now, we wish to prove a slight extension of (5), namely
$$\log\Big(\sum_{x \in \{0,1\}^n} \sum_{y \in \{0,1\}^n} r^{a(x,y)}\, s^{d(x,y)}\, f(x)\, g(y)\Big) \le n\big(\log(r+s) - s/r\big) + \frac{r+s}{2r}\bigg(\log\Big(\sum_{x \in \{0,1\}^n} f(x)^{\frac{2r}{r+s}}\Big) + \log\Big(\sum_{y \in \{0,1\}^n} g(y)^{\frac{2r}{r+s}}\Big)\bigg). \qquad (17)$$
To this end, we define the random variables $X$, $Y$ and $Z$ as before, add two more integer-valued random variables $A$ and $B$, and take $(X, Y, Z, A, B)$ to be uniformly distributed, with $(X, Y, Z)$ as before and the additional constraint that $A \in \{1, \dots, f(X)\}$ and $B \in \{1, \dots, g(Y)\}$. Now, the left-hand side of (17) is precisely $H(Z, A, B) = H(Z) + H(A \mid Z) + H(B \mid Z) = H(Z) + \mathbb{E}_X[\log(f(X))] + \mathbb{E}_Y[\log(g(Y))]$. On the other hand, note that for any function $t$, it holds that
$$H(X) + \mathbb{E}_X[\log(t(X))] \le \log\Big(\sum_x t(x)\Big).$$
(This is the non-negativity of the Kullback–Leibler divergence (relative entropy) between the distribution of $X$ and the distribution proportional to $t$.) In particular,
$$\frac{r+s}{2r}\,H(X) + \mathbb{E}_X[\log(f(X))] \le \frac{r+s}{2r}\,\log\Big(\sum_{x \in \{0,1\}^n} f(x)^{\frac{2r}{r+s}}\Big)$$
and
$$\frac{r+s}{2r}\,H(Y) + \mathbb{E}_Y[\log(g(Y))] \le \frac{r+s}{2r}\,\log\Big(\sum_{y \in \{0,1\}^n} g(y)^{\frac{2r}{r+s}}\Big).$$
To complete the proof of Theorem 3, we just add $\mathbb{E}_X[\log(f(X))] + \mathbb{E}_Y[\log(g(Y))]$ to both sides of the main inequality that we proved in the Boolean case, namely
$$H(Z) \le \frac{r+s}{2r}\big(H(X) + H(Y)\big) + n\big(\log(r+s) - s/r\big),$$
and we are done. □
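The entropy bound $H(X) + \mathbb{E}_X[\log t(X)] \le \log \sum_x t(x)$ used above is simply $\mathbb{E}_p[\log_2(t/p)] \le \log_2 \sum_x t(x)$, an instance of Jensen's inequality (equivalently, non-negativity of the relevant Kullback–Leibler divergence). A small sketch (illustrative code) checks it on random instances:

```python
import math
import random

def check_entropy_bound(m=16, trials=1000, seed=0):
    """Check H(p) + E_p[log2 t] <= log2(sum_x t(x)) for random p and t on m points."""
    rng = random.Random(seed)
    for _ in range(trials):
        weights = [rng.random() for _ in range(m)]
        total = sum(weights)
        p = [w / total for w in weights]
        t = [rng.uniform(0.1, 10.0) for _ in range(m)]
        entropy = -sum(v * math.log2(v) for v in p if v > 0)
        lhs = entropy + sum(v * math.log2(tv) for v, tv in zip(p, t))
        rhs = math.log2(sum(t))
        assert lhs <= rhs + 1e-9, (lhs, rhs)
    return True

if __name__ == "__main__":
    print(check_entropy_bound())
```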

Funding

This research was supported in part by I.S.F. grant 0398246, BSF grant 2010247 and funding received from the MINERVA Stiftung with funds from the BMBF of the Federal Republic of Germany.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

I would like to thank David Ellis and Gideon Schechtman for our useful conversations and the anonymous referee for pointing out some inaccuracies, which, once fixed, led to a simplification of the proof. I would also like to thank Chandra Nair for spotting some errors in an early draft of this paper and for alerting me to the existence of two papers, [18,19], which also adopt an information-theoretic approach to hypercontractivity, along with [20,21], which address similar problems (and refer to an earlier arXiv version of this paper). Also, I wish to thank an additional referee who suggested the following as further reading: [22,23].

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Gross, L. Logarithmic Sobolev inequalities. Am. J. Math. 1975, 97, 1061–1083.
  2. Bonami, A. Étude des coefficients de Fourier des fonctions de Lp(G). Ann. Inst. Fourier 1970, 20, 335–402.
  3. Beckner, W. Inequalities in Fourier analysis. Ann. Math. 1975, 102, 159–182.
  4. Kahn, J.; Kalai, G.; Linial, N. The influence of variables on Boolean functions. In Proceedings of the 29th Annual Symposium on Foundations of Computer Science, White Plains, NY, USA, 24–26 October 1988; pp. 68–80.
  5. Benjamini, I.; Kalai, G.; Schramm, O. Noise sensitivity of Boolean functions and applications to percolation. Inst. Hautes Études Sci. Publ. Math. 1999, 90, 5–43.
  6. Dinur, I.; Friedgut, E.; Regev, O. Independent sets in graph powers are almost contained in juntas. Geom. Funct. Anal. 2008, 18, 77–97.
  7. Friedgut, E.; Kalai, G.; Naor, A. Boolean functions whose Fourier transform is concentrated on the first two levels. Adv. Appl. Math. 2002, 29, 427–437.
  8. Kindler, G.; O’Donnell, R. Gaussian noise sensitivity and Fourier tails. In Proceedings of the 26th Annual IEEE Conference on Computational Complexity, Porto, Portugal, 26–29 June 2012; pp. 137–147.
  9. Mossel, E. A quantitative Arrow theorem. Probab. Theory Relat. Fields 2012, 154, 49–88.
  10. Mossel, E.; O’Donnell, R.; Regev, O.; Steif, J.; Sudakov, B. Non-interactive correlation distillation, inhomogeneous Markov chains, and the reverse Bonami–Beckner inequality. Israel J. Math. 2006, 154, 299–336.
  11. O’Donnell, R. Analysis of Boolean Functions; Cambridge University Press: Cambridge, UK, 2014.
  12. Borell, C. Positivity improving operators and hypercontractivity. Math. Z. 1982, 180, 225–234.
  13. Oleszkiewicz, K. On a nonsymmetric version of the Khinchine–Kahane inequality. Prog. Probab. 2003, 56, 156–168.
  14. Wolff, P. Hypercontractivity of simple random variables. Studia Math. 2007, 180, 219–236.
  15. Ahlswede, R.; Gács, P. Spreading of sets in product spaces and hypercontraction of the Markov operator. Ann. Probab. 1976, 4, 925–939.
  16. Friedgut, E.; Rödl, V. Proof of a hypercontractive estimate via entropy. Israel J. Math. 2001, 125, 369–380.
  17. Blais, E.; Tan, L.Y. Hypercontractivity via the entropy method. Theory Comput. 2013, 9, 889–896.
  18. Nair, C. Equivalent formulations of hypercontractivity using information measures. In Proceedings of the International Zurich Seminar on Communications, Zurich, Switzerland, 12–14 March 2014.
  19. Carlen, E.; Cordero-Erausquin, D. Subadditivity of the entropy and its relation to Brascamp–Lieb type inequalities. Geom. Funct. Anal. 2009, 19, 373–405.
  20. Nair, C.; Wang, Y.N. Evaluating hypercontractivity parameters using information measures. In Proceedings of the 2016 IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, 10–15 July 2016; pp. 570–574.
  21. Nair, C.; Wang, Y.N. Reverse hypercontractivity region for the binary erasure channel. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 938–942.
  22. Contreras-Reyes, J.E. Mutual information matrix based on Rényi entropy and application. Nonlinear Dyn. 2022, 110, 623–633.
  23. Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 1991, 37, 145–151.