Article

Dual Loomis-Whitney Inequalities via Information Theory

Jing Hao 1 and Varun Jog 2,*
1 Department of Mathematics, University of Wisconsin-Madison, Madison, WI 53706, USA
2 Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
* Author to whom correspondence should be addressed.
Entropy 2019, 21(8), 809; https://doi.org/10.3390/e21080809
Submission received: 6 July 2019 / Revised: 16 August 2019 / Accepted: 16 August 2019 / Published: 18 August 2019
(This article belongs to the Special Issue Entropy and Information Inequalities)

Abstract

We establish lower bounds on the volume and the surface area of a geometric body using the sizes of its slices along different directions. In the first part of the paper, we derive volume bounds for convex bodies using generalized subadditivity properties of entropy combined with entropy bounds for log-concave random variables. In the second part, we investigate a new notion of Fisher information, which we call the $L^1$-Fisher information, and show that certain superadditivity properties of the $L^1$-Fisher information lead to lower bounds on the surface areas of polyconvex sets in terms of their slices.

1. Introduction

Tomography concerns reconstructing a probability density by synthesizing data collected along sections (or slices) of that density and is a problem of great significance in applied mathematics. Some popular applications of tomography in the field of medical imaging are computed tomography (CT), magnetic resonance imaging (MRI) and positron emission tomography (PET). In each of these, sectional data is obtained in a non-invasive manner using penetrating waves and images are generated using tomographic reconstruction algorithms. Geometric tomography is a term coined by Gardner [1] to describe an area of mathematics that deals with the retrieval of information about a geometric object from data about its sections, projections or both. Gardner notes that the term geometric is deliberately vague, since it may be used to study convex sets or polytopes as well as more general shapes such as star-shaped bodies, compact sets or even Borel sets.
An important problem in geometric tomography is estimating the size of a set using lower-dimensional sections or projections. Here, the projection of a geometric object refers to its shadow or orthogonal projection, as opposed to the marginal of a probability density. As detailed in Campi and Gronchi [2], this problem is relevant in a variety of settings, ranging from the microscopic study of biological tissues [3,4], to the study of fluid inclusions in minerals [5,6], to reconstructing the shapes of celestial bodies [7,8]. Various geometric inequalities provide bounds on the sizes of sets using lower-dimensional data pertaining to their projections and slices. The "size" of a set often refers to its volume, but it may also refer to other geometric properties such as its surface area or its mean width. A canonical example of an inequality that bounds the volume of a set using its orthogonal projections is the Loomis-Whitney inequality [9]. This inequality states that for any Borel measurable set $K \subseteq \mathbb{R}^n$,

$$ V_n(K) \le \prod_{i=1}^{n} V_{n-1}(P_{e_i^\perp} K)^{\frac{1}{n-1}}. \qquad (1) $$
Equality holds in (1) if and only if K is a box with sides parallel to the coordinate axes. The Loomis-Whitney inequality has been generalized and strengthened in a number of ways. Burago and Zalgaller [10] proved a version of (1) that considers projections of K onto all m-dimensional subspaces spanned by subsets of $\{e_1, \dots, e_n\}$. Bollobás and Thomason [11] proved the Box Theorem, which states that for every Borel set $K \subseteq \mathbb{R}^n$ there exists a box B such that $V_n(B) = V_n(K)$ and $V_m(P_S B) \le V_m(P_S K)$ for every m-dimensional coordinate subspace S. Ball [12] showed that the Loomis-Whitney inequality is closely related to the Brascamp-Lieb inequality [13,14] from functional analysis and generalized it to projections along subspaces that satisfy a certain geometric condition. Inequality (1) also has deep connections to additive combinatorics and information theory; some of these connections have been explored in Balister and Bollobás [15], Gyarmati et al. [16] and Madiman and Tetali [17].
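As a quick numerical illustration (ours, not the authors'), the following Python snippet checks inequality (1) in closed form for an axis-aligned box, where equality holds, and for the unit ball in $\mathbb{R}^3$, where the inequality is strict:

```python
import math

# Check (1) in closed form. For an axis-aligned box, equality holds; for the
# unit ball in R^3, the inequality is strict.
def loomis_whitney_rhs(projections):
    n = len(projections)
    return math.prod(p ** (1.0 / (n - 1)) for p in projections)

a, b, c = 2.0, 3.0, 5.0
print(a * b * c, loomis_whitney_rhs([b * c, a * c, a * b]))   # 30.0, 30.0
print(4 * math.pi / 3, loomis_whitney_rhs([math.pi] * 3))     # ~4.19, ~5.57
```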
A number of geometric inequalities also provide upper bounds for the surface area of a set using projections. Naturally, it is necessary to make some assumptions for such results, since one can easily conjure sets that have small projections while having a large surface area. Betke and McMullen [2,18] proved that, for compact convex bodies,

$$ V_{n-1}(\partial K) \le 2 \sum_{i=1}^{n} V_{n-1}(P_{e_i^\perp} K). \qquad (2) $$
Motivated by inequalities (1) and (2), Campi and Gronchi [2] investigated upper bounds for intrinsic volumes [19] of compact convex sets.
Inequalities (1) and (2) provide upper bounds, and a natural question of interest is developing analogous lower bounds. Lower bounds are obtained via reverse Loomis-Whitney inequalities or dual Loomis-Whitney inequalities: the former use projection information, whereas the latter use slice information, often along the coordinate axes. A canonical example of a dual Loomis-Whitney inequality is Meyer's inequality [20], which states that for a compact convex set $K \subseteq \mathbb{R}^n$, the following lower bound holds:

$$ V_n(K) \ge \left( \frac{n!}{n^n} \prod_{i=1}^{n} V_{n-1}(K \cap e_i^\perp) \right)^{\frac{1}{n-1}}, \qquad (3) $$
with equality if and only if K is a regular crosspolytope. Generalizations of Meyer's inequality have also been obtained recently in Li and Huang [21] and Liakopoulos [22]. Betke and McMullen [2,18] established a reverse Loomis-Whitney-type inequality for surface areas of compact convex sets:

$$ V_{n-1}(\partial K)^2 \ge 4 \sum_{i=1}^{n} V_{n-1}(P_{e_i^\perp} K)^2. \qquad (4) $$
Campi et al. [23] extended inequalities (3) and (4) for intrinsic volumes of certain convex sets.
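To see the equality case of (3) concretely, the following sketch (our addition) evaluates both sides for the standard crosspolytope $\{x : \sum_i |x_i| \le 1\}$, whose volume is $2^n/n!$ and whose coordinate slices are $(n-1)$-dimensional crosspolytopes of volume $2^{n-1}/(n-1)!$:

```python
import math

# Both sides of Meyer's inequality (3) for the crosspolytope B_1^n:
# V_n = 2^n / n!, and each coordinate slice is the crosspolytope B_1^{n-1}.
for n in range(2, 7):
    vol = 2**n / math.factorial(n)
    slice_vol = 2**(n - 1) / math.factorial(n - 1)
    rhs = ((math.factorial(n) / n**n) * slice_vol**n) ** (1.0 / (n - 1))
    print(n, vol, rhs)   # the two columns coincide (equality case)
```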
Our goal in this paper is to develop lower bounds on volumes and surface areas of geometric bodies that are most closely related to dual Loomis-Whitney inequalities; that is, inequalities that use slice-based information. The primary mathematical tools we use are entropy and information inequalities; namely, the Brascamp-Lieb inequality, entropy bounds for log-concave random variables and superadditivity properties of a suitable notion of Fisher information. Using information-theoretic tools allows our results to be quite general. For example, our volume bounds rely on maximal slices parallel to a set of subspaces and are valid for very general choices of subspaces. Our surface area bounds are valid for polyconvex sets, which are finite unions of compact convex sets. The drawback of using information-theoretic strategies is that the resulting bounds are not always tight; that is, equality may not be achieved by any geometric body. However, we show that in some cases our bounds are asymptotically tight as the dimension n tends to infinity, thus partly mitigating this drawback. Our main contributions are as follows:
  • Volume lower bounds: In Theorem 3, we establish a new lower bound on the volume of a compact convex set in terms of the sizes of its slices. Just as Ball [12] extended the Loomis-Whitney inequality to projections onto more general subspaces, our inequality also allows for slices parallel to subspaces that are not necessarily $e_i^\perp$. Another distinguishing feature of this bound is that, unlike classical dual Loomis-Whitney inequalities, the lower bound is in terms of maximal slices; that is, the largest slice parallel to a given subspace. The key ideas we use are the Brascamp-Lieb inequality and entropy bounds for log-concave random variables.
  • Surface area lower bounds: Theorem 7 contains our main result that provides lower bounds for surface areas. Unlike the volume bounds, the surface area bounds are valid for the larger class of polyconvex sets, which consists of finite unions of compact, convex sets. Moreover, the surface area lower bound is not simply in terms of the maximal slice; instead, this bound uses all available slices along a particular hyperplane direction. As in the volume bounds, the slices used may be parallel to general $(n-1)$-dimensional subspaces and not just $e_i^\perp$. The key idea is motivated by a superadditivity property of Fisher information established in Carlen [24]. Instead of the classical Fisher information, we develop superadditivity properties for a new notion of Fisher information which we call the $L^1$-Fisher information. This superadditivity property, when restricted to uniform distributions over convex bodies, yields the lower bound in Theorem 7.
In Section 2 we state and prove our volume lower bound and in Section 3 we state and prove our surface area bound. We conclude with some open problems and discussions in Section 4.
Notation: For $n \ge 1$, let $[n]$ denote the set $\{1, 2, \dots, n\}$. For $K \subseteq \mathbb{R}^n$ and any subspace $E \subseteq \mathbb{R}^n$, the orthogonal projection of K on E is denoted by $P_E K$. The standard basis vectors in $\mathbb{R}^n$ are denoted by $\{e_1, e_2, \dots, e_n\}$. We use the notation $V_r$ to denote the volume functional in $\mathbb{R}^r$. The boundary of K is denoted by $\partial K$ and its surface area is denoted by $V_{n-1}(\partial K)$. For a random variable X taking values in $\mathbb{R}^n$, the marginal of X along a subspace E is denoted by $P_E X$. In this paper, we shall consider random variables with bounded variances and whose densities lie in the convex set $\{ f \mid \int_{\mathbb{R}^n} f(x) \log(1 + f(x))\, dx < \infty \}$. The differential entropy of such random variables is well-defined and is given by

$$ h(X) = - \int_{\mathbb{R}^n} p_X(x) \log p_X(x)\, dx, $$

where $X \sim p_X$ is an $\mathbb{R}^n$-valued random variable. The Fisher information of a random variable X with a differentiable density $p_X$ is given by

$$ I(X) = \int_{\mathbb{R}^n} \| \nabla \log p_X(x) \|_2^2\, p_X(x)\, dx. $$

2. Volume Bounds

The connection between functional/information theoretic inequalities and geometric inequalities is well-known. In particular, the Brascamp-Lieb inequality has found several applications in geometry as detailed in Ball [13]. In the following section we briefly discuss the Brascamp-Lieb inequality and its relation to volume inequalities.

2.1. Background on the Brascamp-Lieb Inequality

We shall use the information theoretic form of the Brascamp-Lieb inequality, as found in Carlen et al. [25]:
Theorem 1.
(Brascamp-Lieb inequality). Let X be a random variable taking values in $\mathbb{R}^n$. Let $E_1, E_2, \dots, E_m \subseteq \mathbb{R}^n$ be subspaces with dimensions $r_1, r_2, \dots, r_m$ respectively, and let $c_1, c_2, \dots, c_m > 0$ be constants. Define

$$ M = \sup_X \left( h(X) - \sum_{j=1}^{m} c_j h(P_{E_j} X) \right), $$

and

$$ M_g = \sup_{X \in G} \left( h(X) - \sum_{j=1}^{m} c_j h(P_{E_j} X) \right), $$

where G is the set of all Gaussian random variables taking values in $\mathbb{R}^n$. Then $M = M_g$, and $M_g$ (and therefore M) is finite if and only if $\sum_{i=1}^{m} r_i c_i = n$ and, for all subspaces $V \subseteq \mathbb{R}^n$, we have $\dim(V) \le \sum_{i=1}^{m} c_i \dim(P_{E_i} V)$.
Throughout this paper, we assume that the $E_i$ and $c_i$ are such that $M < \infty$. As detailed in Bennett et al. [14], the Brascamp-Lieb inequality generalizes many popular inequalities such as Hölder's inequality, Young's convolution inequality and the Loomis-Whitney inequality. In particular, Ball [12] showed that the standard Loomis-Whitney inequality in (1) can be extended to settings where projections are obtained on more general subspaces:
Theorem 2
(Ball [12]). Let K be a closed and bounded set in $\mathbb{R}^n$. Let $E_i$ and $c_i$ for $i \in [m]$ and $M_g$ be as in Theorem 1. Let $P_{E_i} K$ be the projection of K onto the subspace $E_i$, and let the dimension of $E_i$ be $r_i$, for $i \in [m]$. Then the volume of K may be upper-bounded as follows:

$$ V_n(K) \le e^{M_g} \prod_{i=1}^{m} V_{r_i}(P_{E_i} K)^{c_i}. $$
Since we shall be using a similar idea in Section 2.2, we include a proof for completeness.
Proof. 
Consider a random variable X that is uniformly distributed on K; that is, $X \sim p_X = \mathrm{Unif}(K)$. Let $P_{E_i} X$ denote the random variable obtained by projecting X on $E_i$, or equivalently the marginal of X in the subspace $E_i$. Naturally, $\mathrm{supp}(P_{E_i} X) \subseteq P_{E_i} K$ and thus

$$ h(P_{E_i} X) \le \log V_{r_i}(P_{E_i} K), \quad \text{for } i \in [m]. $$

Substituting these inequalities in the Brascamp-Lieb inequality for X, we obtain

$$ h(X) = \log V_n(K) \le \sum_{i=1}^{m} c_i \log V_{r_i}(P_{E_i} K) + M_g. $$

Exponentiating both sides concludes the proof. □
To show that the Loomis-Whitney inequality is implied by Theorem 2, we set $E_i = e_i^\perp$ and $c_i = 1/(n-1)$ for $i \in [n]$, and use Szász's inequality or other tools from linear algebra [26] to show that the supremum below evaluates to 1:

$$ e^{M_g} = \sup_{K \succ 0} \frac{\det K}{\prod_{i=1}^{n} \left( \det K_i \right)^{\frac{1}{n-1}}}, $$

where $K_i$ denotes the principal submatrix of a positive definite matrix K obtained by deleting its i-th row and column. In general, Ball [12] showed that if the $E_i$ and $c_i$ satisfy what is called John's condition, that is, $\sum_{i=1}^{m} c_i P_{E_i} x = x$ for all $x \in \mathbb{R}^n$, then $M_g = 0$.
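John's condition is easy to test numerically: for one-dimensional subspaces it reads $\sum_j c_j u_j u_j^T = I$. The following sketch (our illustration, not from the paper) verifies it for three unit vectors at 120° in the plane with $c_j = 2/3$:

```python
import numpy as np

# John's condition sum_j c_j P_{u_j} = Id: for rank-one projections onto
# span(u_j), this is sum_j c_j u_j u_j^T = I.
def check_john(us, cs):
    n = us.shape[1]
    total = sum(c * np.outer(u, u) for u, c in zip(us, cs))
    return np.allclose(total, np.eye(n))

# Three unit vectors at 120 degrees in R^2 with c_j = 2/3.
angles = [0.0, 2 * np.pi / 3, 4 * np.pi / 3]
us = np.array([[np.cos(t), np.sin(t)] for t in angles])
print(check_john(us, [2.0 / 3] * 3))   # True, so M_g = 0 for this data
```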

2.2. Volume Bounds Using Slices

Providing lower bounds for volumes in terms of projections requires making additional assumptions on the set K. A simple counterexample is the $(n-1)$-dimensional sphere (a shell), which can have arbitrarily large projections onto lower-dimensional subspaces but has zero volume. Even for convex K, providing lower bounds using a finite number of projections fails. For example, given a finite collection of subspaces, we may consider any convex set supported on a random $(n-1)$-dimensional subspace of $\mathbb{R}^n$, which will (with high probability) have non-zero projections on all subspaces in the collection. Clearly, such a set has volume 0. Therefore, it makes sense to obtain lower bounds on volumes using slices instead of projections, as in Meyer's inequality (3).
Given a subspace $E_i$, the slice parallel to $E_i$ is not unambiguously defined, as it depends on translations of $E_i$. For this reason we consider the maximal slice; that is, the largest slice parallel to a given subspace. Note that although Meyer's inequality (3) is not stated in terms of maximal slices, it remains valid even if the right-hand side of inequality (3) is replaced by maximal slices parallel to $e_i^\perp$. This is because one can always choose the origin of the coordinate system such that the largest slice parallel to $e_i^\perp$ is $K \cap e_i^\perp$. However, when the subspaces are in a more general orientation, it is not always possible to select an origin that simultaneously maximizes the slices along all subspaces. Our main result is the following:
Theorem 3.
Let K be a compact convex body in $\mathbb{R}^n$. For $j \in [m]$, let $E_j \subseteq \mathbb{R}^n$ be subspaces with dimensions $r_j$, and let $c_j > 0$ be constants. Let $S_{\max}(j)$ be the largest slice of K by a translate of the subspace orthogonal to $E_j$; that is,

$$ S_{\max}(j) = \sup_{t \in E_j} V_{n - r_j}\left( K \cap (E_j^\perp + t) \right). $$

Then the following inequality holds:

$$ V_n(K) \ge \left( \frac{\prod_{j=1}^{m} S_{\max}(j)^{c_j}}{e^{n + M_g}} \right)^{\frac{1}{C-1}}, $$

where $C = \sum_{j=1}^{m} c_j$ and $M_g$ is the Brascamp-Lieb constant corresponding to $\{ E_j, c_j \}_{j \in [m]}$.
Proof. 
There are two main components in the proof. First, let X be a random variable that is uniformly distributed on K. The Brascamp-Lieb inequality yields the bound

$$ h(X) \le \sum_{j=1}^{m} c_j h(P_{E_j} X) + M_g. $$

When deriving upper bounds on volume, we employ the upper bound $h(P_{E_i} X) \le \log V_{r_i}(P_{E_i} K)$. Here, we employ a slightly different strategy. Note that X, being uniformly distributed on a convex set, is a log-concave random variable. Thus, any lower-dimensional marginal of X is also log-concave [27]. Furthermore, the entropy of a log-concave random variable is tightly controlled by the maximum value of its density: for a log-concave random variable Z taking values in $\mathbb{R}^n$ and distributed as $p_Z$, it was shown in Bobkov and Madiman [28] that

$$ \frac{1}{n} \log \frac{1}{\| p_Z \|_\infty} \le \frac{h(Z)}{n} \le \frac{1}{n} \log \frac{1}{\| p_Z \|_\infty} + 1, $$

where $\| p_Z \|_\infty$ is the largest value of the probability density $p_Z$. Define $Z_i := P_{E_i} X$. The key point to note is that $\| p_{Z_i} \|_\infty$ is given by the size of the largest slice of K by a translate of $E_i^\perp$, normalized by $V_n(K)$; that is, $\| p_{Z_i} \|_\infty = \frac{S_{\max}(i)}{V_n(K)}$. Thus, for $i \in [m]$,

$$ h(Z_i) \le r_i + \log \frac{1}{\| p_{Z_i} \|_\infty} = r_i + \log \frac{V_n(K)}{S_{\max}(i)}. $$

Substituting this in the Brascamp-Lieb bound, we obtain

$$ \log V_n(K) \le \sum_{j=1}^{m} \left( c_j r_j + c_j \log \frac{V_n(K)}{S_{\max}(j)} \right) + M_g = n + C \log V_n(K) - \sum_{j=1}^{m} c_j \log S_{\max}(j) + M_g. $$

Note that $C = \sum_{j=1}^{m} c_j > \sum_{j=1}^{m} c_j (r_j / n) = 1$, and thus we may rearrange and exponentiate to obtain

$$ V_n(K) \ge \left( \frac{\prod_{j=1}^{m} S_{\max}(j)^{c_j}}{e^{n + M_g}} \right)^{\frac{1}{C-1}}. $$
 □
It is instructive to compare Meyer's inequality to the bound obtained from Theorem 3 for the same choice of parameters. Substituting $M_g = 0$, $E_i = \mathrm{span}(e_i)$ and $c_i = 1$, Theorem 3 gives the bound

$$ V_n(K) \ge \left( \frac{\prod_{j=1}^{n} S_{\max}(j)}{e^n} \right)^{\frac{1}{n-1}}. $$
To compare with Meyer's inequality (3), first assume that the origin of the coordinate system is selected such that the intersection with $e_i^\perp$ corresponds to the maximal slice along $e_i^\perp$. With such a choice, we may simply compare the constants in the two inequalities. Observe that

$$ \left( \frac{n!}{n^n} \right)^{\frac{1}{n-1}} \ge \left( \frac{1}{e^n} \right)^{\frac{1}{n-1}}, $$

and thus Meyer's inequality (3) yields a tighter bound. However, Stirling's approximation implies that for large enough n the two constants are approximately the same. Thus, Theorem 3 yields an asymptotically tight result.
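The following snippet (ours) compares the two constants numerically; by Stirling, $n!/n^n \approx e^{-n} \sqrt{2\pi n}$, so the ratio of the constants is roughly $(2\pi n)^{1/(2(n-1))} \to 1$:

```python
import math

# Meyer constant (n!/n^n)^(1/(n-1)) versus the Theorem 3 constant e^(-n/(n-1)).
for n in [2, 5, 10, 50, 200]:
    meyer = (math.factorial(n) / n**n) ** (1.0 / (n - 1))
    thm3 = math.exp(-n / (n - 1))
    print(n, round(meyer, 4), round(thm3, 4), round(meyer / thm3, 4))
```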
Note that if the slices are not aligned along the coordinate axes, or if the slices have dimension smaller than $n - 1$, then Meyer's inequality (3) is not applicable but Theorem 3 continues to yield valid inequalities. An important special case is when there are more than n directions along which slices are available. If $u_1, u_2, \dots, u_m$ are unit vectors and constants $c_1, c_2, \dots, c_m$ satisfy John's condition [12], that is, $\sum_{j=1}^{m} c_j P_{u_j}(x) = x$ for all $x \in \mathbb{R}^n$, then Theorem 3 yields the bound

$$ V_n(K) \ge \left( \frac{\prod_{j=1}^{m} S_{\max}(j)^{c_j}}{e^n} \right)^{\frac{1}{n-1}}, \qquad (12) $$

where $S_{\max}(j)$ is the size of the largest slice by a hyperplane perpendicular to $u_j$. Note that if $\{ u_j, c_j \}$ satisfy John's condition, then so do $\{ u_j^\perp, c_j/(n-1) \}$. Applying the bound from Theorem 2 in this case yields

$$ V_n(K) \le \prod_{j=1}^{m} V_{n-1}(P_{u_j^\perp} K)^{c_j/(n-1)}, $$

which may be compared with inequality (12) by observing that $S_{\max}(j) \le V_{n-1}(P_{u_j^\perp} K)$.

3. Surface Area Bounds

The information-theoretic quantities of entropy and Fisher information are closely connected to the geometric quantities of volume and surface area, respectively. The surface area of $K \subseteq \mathbb{R}^n$ is defined as

$$ V_{n-1}(\partial K) = \lim_{\epsilon \to 0^+} \frac{V_n(K \oplus \epsilon B_n) - V_n(K)}{\epsilon}, $$

where $B_n$ is the Euclidean ball in $\mathbb{R}^n$ with unit radius and $\oplus$ refers to the Minkowski sum. The Fisher information of a random variable X satisfies a similar relation,

$$ \frac{I(X)}{2} = \lim_{\epsilon \to 0^+} \frac{h(X + \sqrt{\epsilon} Z) - h(X)}{\epsilon}, $$
where Z is a standard Gaussian random variable that is independent of X. Other well-known connections include the relation between the entropy of a random variable and the volume of its typical set [29,30], isoperimetric inequalities concerning Euclidean balls and Gaussian distributions and the observed similarity between the Brunn-Minkowski inequality and the entropy power inequality [31]. In Section 2, we used subadditivity of entropy as given by the Brascamp-Lieb inequality to develop volume bounds. To develop surface area bounds, it seems natural to use Fisher information inequalities and adapt them to geometric problems. In the following subsection, we discuss relevant Fisher-information inequalities.
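As a small sanity check (our illustration, using the Gaussian closed form $h(N(0,v)) = \frac{1}{2} \log(2\pi e v)$), the difference quotient above can be evaluated for $X \sim N(0, \sigma^2)$, where it should converge to $I(X)/2 = 1/(2\sigma^2)$:

```python
import math

# Difference quotient of h(X + sqrt(eps) Z) for X ~ N(0, sigma^2), using
# h(N(0, v)) = 0.5 * log(2 pi e v); it should approach 1/(2 sigma^2) = 0.25.
sigma2 = 2.0
h = lambda v: 0.5 * math.log(2 * math.pi * math.e * v)
for eps in [1e-1, 1e-3, 1e-5]:
    print(eps, (h(sigma2 + eps) - h(sigma2)) / eps)
```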

3.1. Superadditivity of Fisher Information

The Brascamp-Lieb subadditivity of entropy has a direct analog noted in Reference [25]. We focus on the case when the unit vectors $\{u_j\}$ and constants $\{c_j\}$ for $j \in [m]$ satisfy John's condition. The authors of Reference [25] provide an alternate proof of the Brascamp-Lieb inequality in this case by first showing a superadditivity property of Fisher information, which states that

$$ I(X) \ge \sum_{j=1}^{m} c_j I(P_{u_j} X). \qquad (15) $$
The Brascamp-Lieb inequality follows by integrating inequality (15) using the following identity, which holds for all random variables X taking values in $\mathbb{R}^n$ [32]:

$$ h(X) = \frac{n}{2} \log 2 \pi e - \int_{0}^{\infty} \left( I(X_t) - \frac{n}{1+t} \right) dt, $$

where $X_t = X + \sqrt{t} Z$ for a standard normal random variable Z that is independent of X. In particular, using this formula for inequality (15) yields the geometric Brascamp-Lieb inequality of Ball [33]:

$$ h(X) \le \sum_{j=1}^{m} c_j h(P_{u_j} X). $$
If $u_i = e_i$ and $c_i = 1$ for $i \in [n]$, then inequality (15) reduces to the standard superadditivity of Fisher information:

$$ I(X) \ge \sum_{i=1}^{n} I(X_i), \qquad (17) $$

where $X = (X_1, \dots, X_n)$.
In Section 2, we directly used the entropic Brascamp-Lieb inequality on random variables uniformly distributed over suitable sets $K \subseteq \mathbb{R}^n$. It is tempting to use inequality (15) to derive surface area bounds for geometric bodies. Unfortunately, directly substituting X to be uniform over $K \subseteq \mathbb{R}^n$ in inequality (15) does not lead to any useful bounds. This is because the left-hand side, namely $I(X)$, is $+\infty$, since the density of X is not differentiable. Thus, it is necessary to modify inequality (15) before we can apply it to geometric problems. A classical result concerning the superadditivity of Fisher information-like quantities is provided in Carlen [24]:
Theorem 4
(Theorem 2, [24]). For $p \in [1, \infty)$, let $f : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ be a function in $L^p(\mathbb{R}^m) \otimes W^{1,p}(\mathbb{R}^n)$. Define the marginal map M via

$$ G(y) = \left( \int_{\mathbb{R}^m} | f(x, y) |^p\, dx \right)^{1/p}, $$

denoted by $M f = G$. Then the following inequality holds:

$$ \int_{\mathbb{R}^n} | \nabla_y G(y) |^p\, dy \le \int_{\mathbb{R}^n} \int_{\mathbb{R}^m} | \nabla_y f(x, y) |^p\, dx\, dy. $$
Carlen [24] also established the (weak) differentiability of G and the continuity of M prior to proving Theorem 4, so the derivatives in its statement are well-defined. The notion of Fisher information we wish to use is essentially identical to the case $p = 1$ of Theorem 4. However, since our goal is to use this result for uniform densities over compact sets, we cannot apply Theorem 4 directly, since such densities do not satisfy the required assumptions. In particular, the (weak) partial derivatives of such densities are defined in terms of Dirac delta distributions, and so the densities do not lie in the Sobolev space $W^{1,1}(\mathbb{R}^n)$. To get around this, we redefine the $p = 1$ case as follows:
Definition 1.
Let $X = (X_1, \dots, X_n)$ be a random vector on $\mathbb{R}^n$ and let $f_X(\cdot)$ be its density function. For any unit vector $u \in \mathbb{R}^n$, define

$$ I_1(X)_u := \lim_{\epsilon \to 0^+} \int_{\mathbb{R}^n} \frac{| f_X(x) - f_X(x - \epsilon u) |}{\epsilon}\, dx, $$

given that the limit exists. Define the $L^1$-Fisher information of X as

$$ I_1(X) := \sum_{i=1}^{n} I_1(X)_{e_i}, $$

given that the right-hand side is well-defined. In particular, when X is a real-valued random variable,

$$ I_1(X) = \lim_{\epsilon \to 0^+} \int_{\mathbb{R}} \frac{| f_X(x) - f_X(x - \epsilon) |}{\epsilon}\, dx. $$
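The limit in Definition 1 is easy to approximate numerically. The sketch below (our illustration, with the standard Gaussian as an assumed test density) evaluates the finite-difference quotient for several $\epsilon$; by Corollary 2 below, the answer for a unimodal density should be $2 \| f \|_\infty = 2/\sqrt{2\pi} \approx 0.798$:

```python
import numpy as np

# Finite-difference approximation of Definition 1 for a standard Gaussian.
# Corollary 2 predicts I_1(X) = 2 ||f||_inf = 2 / sqrt(2 pi) ~ 0.7979.
x, dx = np.linspace(-10, 10, 2_000_001, retstep=True)
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
for eps in [1e-1, 1e-2, 1e-3]:
    s = int(round(eps / dx))              # shift f by eps = s * dx grid cells
    print(eps, np.sum(np.abs(f[s:] - f[:-s])) * dx / (s * dx))
```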
Our new definition is motivated by observing that Theorem 4 is essentially a data-processing result for $\phi$-divergences; specializing it to the total variation divergence yields our definition. To see this, consider real-valued random variables X and Y with a joint density $\tilde{f}(x, y)$. Let the marginal density of Y be $\tilde{G}(\cdot)$. For $\epsilon > 0$, consider the perturbed random variable $(X_\epsilon, Y_\epsilon) = (X, Y + \epsilon)$. Let the joint density of this perturbed random variable be $\tilde{f}_\epsilon$ and the marginal density of $Y_\epsilon$ be $\tilde{G}_\epsilon$. Recall that for every convex function $\phi$ satisfying $\phi(1) = 0$, it is possible to define the divergence $D_\phi(p \| q) = \int \phi\left( \frac{p(x)}{q(x)} \right) q(x)\, dx$ for two probability densities p and q. It is well-known that such divergences satisfy the data-processing inequality [34]; that is, if $X'$ and $Y'$ are obtained by passing X and Y through a common channel, then $D_\phi(X' \| Y') \le D_\phi(X \| Y)$. Applying this with the channel that marginalizes out the first coordinate, we obtain
$$ D_\phi(\tilde{f}_\epsilon \| \tilde{f}) \ge D_\phi(\tilde{G}_\epsilon \| \tilde{G}). \qquad (20) $$
Choosing $\phi(t) = | t - 1 |^p$ and using a Taylor expansion, it is easy to see that

$$ D_\phi(\tilde{f}_\epsilon \| \tilde{f}) = \int_{\mathbb{R}^2} \frac{| \tilde{f}(x, y) - \tilde{f}(x, y - \epsilon) |^p}{\tilde{f}(x, y)^{p-1}}\, dx\, dy = \epsilon^p \int_{\mathbb{R}^2} \frac{| \partial \tilde{f}(x, y) / \partial y |^p}{\tilde{f}(x, y)^{p-1}}\, dx\, dy + o(\epsilon^p), $$

and similarly,

$$ D_\phi(\tilde{G}_\epsilon \| \tilde{G}) = \int_{\mathbb{R}} \frac{| \tilde{G}(y) - \tilde{G}(y - \epsilon) |^p}{\tilde{G}(y)^{p-1}}\, dy = \epsilon^p \int_{\mathbb{R}} \frac{| d\tilde{G}(y) / dy |^p}{\tilde{G}(y)^{p-1}}\, dy + o(\epsilon^p). $$

Substituting in inequality (20), dividing by $\epsilon^p$ and taking the limit as $\epsilon \to 0$ yields

$$ \int_{\mathbb{R}^2} \frac{| \partial \tilde{f}(x, y) / \partial y |^p}{\tilde{f}(x, y)^{p-1}}\, dx\, dy \ge \int_{\mathbb{R}} \frac{| d\tilde{G}(y) / dy |^p}{\tilde{G}(y)^{p-1}}\, dy. $$
The above inequality is exactly equivalent to that in Theorem 4 using the substitutions $\tilde{G} = G^p$ and $\tilde{f} = f^p$. Although we focused on joint densities over $\mathbb{R} \times \mathbb{R}$, the same argument also goes through for random variables on $\mathbb{R}^m \times \mathbb{R}^n$.
Recall that Definition 1 redefines the case $p = 1$ of Theorem 4. Such redefinitions could indeed be given for $p > 1$ as well. However, the perturbation argument presented above makes it clear that if $p > 1$, the $\phi$-divergence between a random variable taking uniform values on some compact set and its perturbation will be $+\infty$, since their respective supports are mismatched. Thus, analogous definitions for $p > 1$ will not yield useful bounds for such distributions. Using Definition 1, we now establish superadditivity results for the $L^1$-Fisher information.
Lemma 1.
Let X be an $\mathbb{R}^n$-valued random variable with a smooth density $f_X(\cdot)$. Let $u \in \mathbb{R}^n$ be any unit vector, and define $X \cdot u$ to be the projection of X along u. Then the following inequality holds whenever both sides are well-defined:

$$ I_1(X \cdot u) \le I_1(X)_u. $$
Proof. 
Define the random variable $X_\epsilon := X + \epsilon u$. Then the density of $X_\epsilon$ satisfies

$$ f_{X_\epsilon}(x) = f_X(x - \epsilon u), $$

and is therefore a translation of $f_X$ along the direction u by a distance $\epsilon$. Using the data-processing inequality for the total variation distance, we obtain

$$ d_{TV}(X \cdot u, X_\epsilon \cdot u) \le d_{TV}(X, X_\epsilon), \qquad (24) $$

where $d_{TV}$ is the total variation divergence. Notice that $X_\epsilon \cdot u = X \cdot u + \epsilon$, and thus $f_{X_\epsilon \cdot u}(x) = f_{X \cdot u}(x - \epsilon)$. Dividing the left-hand side of inequality (24) by $\epsilon$ and taking the limit as $\epsilon \to 0^+$, we obtain

$$ \lim_{\epsilon \to 0^+} \frac{d_{TV}(X \cdot u, X_\epsilon \cdot u)}{\epsilon} = \frac{1}{2} \lim_{\epsilon \to 0^+} \int_{\mathbb{R}} \frac{| f_{X \cdot u}(x) - f_{X \cdot u}(x - \epsilon) |}{\epsilon}\, dx \overset{(a)}{=} \frac{1}{2} I_1(X \cdot u). $$

Here, equality (a) follows from the definition of $I_1(X \cdot u)$ and the assumption that it is well-defined. A similar calculation for the right-hand side of inequality (24) leads to

$$ \lim_{\epsilon \to 0^+} \frac{d_{TV}(X, X_\epsilon)}{\epsilon} = \frac{1}{2} \lim_{\epsilon \to 0^+} \int_{\mathbb{R}^n} \frac{| f_X(x) - f_X(x - \epsilon u) |}{\epsilon}\, dx \overset{(a)}{=} \frac{1}{2} I_1(X)_u. $$

The equality in (a) again follows from the definition of $I_1(X)_u$ and the assumption that it is well-defined. □
Our next result is a counterpart to the superadditivity property of Fisher information as in inequality (17).
Theorem 5.
Let $X = (X_1, \dots, X_n)$ be an $\mathbb{R}^n$-valued random variable. Then the following superadditivity property holds:

$$ \sum_{i=1}^{n} I_1(X_i) \le I_1(X). $$
Proof. 
Applying Lemma 1 to the unit vectors $e_1, \dots, e_n$ and summing, we obtain

$$ \sum_{i=1}^{n} I_1(X_i) \le \sum_{i=1}^{n} I_1(X)_{e_i} = I_1(X). $$
 □
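A quick numerical experiment (ours; a bivariate Gaussian is chosen purely for illustration) confirms Theorem 5: for smooth densities, $I_1(X)_{e_i} = \int | \partial f / \partial x_i |\, dx$, and each Gaussian marginal is unimodal, so $I_1(X_i) = 2/\sqrt{2\pi\sigma_i^2}$ by Corollary 2 below. Equality holds for independent coordinates, and the inequality is strict for correlated ones:

```python
import numpy as np

# Superadditivity check: for a smooth density, I_1(X)_{e_i} = int |df/dx_i| dx,
# and each Gaussian marginal is unimodal with I_1(X_i) = 2/sqrt(2 pi sigma_i^2).
rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])
xs = np.linspace(-8.0, 8.0, 1201)
X, Y = np.meshgrid(xs, xs, indexing="ij")
P = np.linalg.inv(cov)
f = np.exp(-(P[0, 0] * X**2 + 2 * P[0, 1] * X * Y + P[1, 1] * Y**2) / 2)
f /= 2 * np.pi * np.sqrt(np.linalg.det(cov))
dfdx, dfdy = np.gradient(f, xs, xs)
dA = (xs[1] - xs[0]) ** 2
i1_joint = (np.abs(dfdx).sum() + np.abs(dfdy).sum()) * dA
i1_marginals = 2 * 2 / np.sqrt(2 * np.pi)   # both marginals are N(0, 1)
print(i1_marginals, "<=", i1_joint)         # ~1.596 <= ~2.66 for rho = 0.8
```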

3.2. Surface Integral Form of the $L^1$-Fisher Information

If we consider a random variable X that takes values uniformly over a set $K \subseteq \mathbb{R}^n$, then the $L^1$-Fisher information superadditivity from Theorem 5 allows us to derive surface area inequalities once we observe two facts:
(a)
The $L^1$-Fisher information $I_1(X)$ is well-defined for X and is given by a surface integral over $\partial K$; and
(b)
The quantity $I_1(X)_{e_i}$ may be calculated exactly given the sizes of all slices of K by hyperplanes orthogonal to $e_i$, or may be lower-bounded using any finite number of such slices.
Establishing the surface integral result in part (a) requires making some assumptions on the shape of the geometric body. We focus on the class of polyconvex sets [19,35], which are defined as follows:
Definition 2.
A set $K \subseteq \mathbb{R}^n$ is called a polyconvex set if it can be written as $K = \cup_{i=1}^{m} C_i$, where $m < \infty$ and each $C_i$ is a compact, convex set in $\mathbb{R}^n$ with positive volume. Denote the set of polyconvex sets in $\mathbb{R}^n$ by $\mathcal{K}$.
In order to make our analysis tractable and rigorous, we first focus on polytopes and prove the polyconvex case by taking a limiting sequence of polytopes. Recall that a convex polytope is the convex hull of a finite set of points. A precise definition of the class of sets we work with is as follows:
Definition 3.
Define the set of polytopes, denoted by $\mathcal{P}$, to be all subsets of $\mathbb{R}^n$ such that every $K \in \mathcal{P}$ admits a representation $K = \cup_{j=1}^{m} P_j$, where $m > 0$ and each $P_j$ is a compact, convex polytope in $\mathbb{R}^n$ with positive volume, for $1 \le j \le m$.
In what follows, we make observations (a) and (b) precise.
Theorem 6.
Let X be uniformly distributed over a polytope $K \in \mathcal{P}$. Then the following equality holds:

$$ I_1(X) = \frac{1}{V_n(K)} \int_{\partial K} \| n(x) \|_1\, dS, \qquad (25) $$

where $n(x)$ is the unit normal vector at $x \in \partial K$.
Proof of Theorem 6.
The equality in (25) is not hard to see intuitively. Consider the set K and its perturbed version $K_\epsilon$, obtained by translating K in the direction of $e_i$ by $\epsilon$. The $L^1$ distance between the uniform distributions on K and $K_\epsilon$ is easily seen to be

$$ \frac{1}{V_n(K)} \left( V_n(K \cup K_\epsilon) - V_n(K \cap K_\epsilon) \right). $$

As shown in Figure 1, each small surface patch dS contributes $| n(x) \cdot e_i |\, \epsilon\, dS$ volume to $(K \cup K_\epsilon) \setminus (K \cap K_\epsilon)$, where $n(x)$ is the normal to the surface at dS. Summing over all such patches, dividing by $\epsilon$ and letting $\epsilon \to 0^+$ yields the desired conclusion. We make this proof rigorous with the aid of two lemmas:
Lemma 2
(Proof in Appendix A). Let X be uniformly distributed over a compact measurable set $K \subseteq \mathbb{R}^n$. If there exists an integer L such that the intersection of K with any straight line can be divided into at most L disjoint closed intervals, then

$$ I_1(X)_{e_i} = \int_{\mathbb{R}^{n-1}} \frac{2 N_i(\dots, \widehat{x_i}, \dots)}{V_n(K)}\, dx_1 \cdots \widehat{dx_i} \cdots dx_n, \qquad (26) $$

where $\widehat{x_i}$ stands for removing $x_i$ from the expression, and $N_i(\dots, \widehat{x_i}, \dots)$ is the number of disjoint closed intervals in the intersection of K with the line $\{ X_j = x_j,\ 1 \le j \le n,\ j \ne i \}$.
The above lemma does not require K to be a polytope. However, the surface integral Lemma 3 below uses this assumption.
Lemma 3
(Proof in Appendix B). Let X be uniform over a polytope $K \in \mathcal{P}$. Then

$$ \int_{\mathbb{R}^{n-1}} \frac{2 N_i(\dots, \widehat{x_i}, \dots)}{V_n(K)}\, dx_1 \cdots \widehat{dx_i} \cdots dx_n = \frac{1}{V_n(K)} \int_{\partial K} | n(x) \cdot e_i |\, dS, $$

where $n(x)$ is the normal vector at the point $x \in \partial K$ and dS is the surface area element.
Lemmas 2 and 3 immediately yield the desired conclusion, since $I_1(X) = \sum_{i=1}^{n} I_1(X)_{e_i}$ and $\| n(x) \|_1 = \sum_{i=1}^{n} | n(x) \cdot e_i |$. □
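As a sanity check of Theorem 6 (our observation, not an example from the paper), both sides can be computed in closed form for an axis-aligned box, and Theorem 5 holds with equality there:

```python
# Theorem 6 for the box [0,a1] x [0,a2] x [0,a3]: the two faces orthogonal to
# e_i have total area 2 * V / a_i and unit normals with ||n||_1 = 1, so
# (1/V) int_{dK} ||n||_1 dS = sum_i 2/a_i. Each marginal X_i is uniform on
# [0, a_i], so I_1(X_i) = 2/a_i by Corollary 1, and Theorem 5 is tight here.
a = (2.0, 3.0, 5.0)
V = a[0] * a[1] * a[2]
surface_side = sum(2 * (V / ai) / V for ai in a)
marginal_side = sum(2 / ai for ai in a)
print(surface_side, marginal_side)   # both equal 1.0 + 2/3 + 2/5
```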
Our goal now is to connect $I_1(X_i)$ to the sizes of the slices of K by hyperplanes orthogonal to $e_i$.

3.3. $L^1$-Fisher Information via Slices

Consider the marginal density of $X_1$, which we denote by $f_{X_1}$. It is easy to see that for each $x_1 \in \mathrm{supp}(f_{X_1})$, we have

$$ f_{X_1}(x_1) = \frac{V_{n-1}\left( K \cap (e_1^\perp + x_1 e_1) \right)}{V_n(K)}. $$
Thus, the distribution of $X_1$ is determined by the slices of K by hyperplanes orthogonal to $e_1$. Since Theorem 5 is expressed in terms of $I_1(X_i)$, where each $X_i$ is a real-valued random variable, we establish a closed-form expression for real-valued random variables in terms of their densities as follows:
Lemma 4
(Proof in Appendix C). Let X be a continuous real-valued random variable with density $f_X$. Suppose we can find $-\infty = a_0 < a_1 < \cdots < a_{M+1} = \infty$ such that (a) $f_X$ is continuous and monotonic on each open interval $(a_i, a_{i+1})$; and (b) the one-sided limits

$$ f(a_i^+) = \lim_{x \to a_i^+} f_X(x) \quad \text{for } i = 0, \dots, M, \quad \text{and} \quad f(a_i^-) = \lim_{x \to a_i^-} f_X(x) \quad \text{for } i = 1, \dots, M+1 $$

exist and are finite. Then

$$ I_1(X) = \sum_{i=0}^{M} | f(a_{i+1}^-) - f(a_i^+) | + \sum_{i=1}^{M} | f(a_i^+) - f(a_i^-) |. \qquad (28) $$
The first sum in (28) captures the change in function values over each monotonic interval, and the second sum captures the differences of the one-sided limits at the endpoints. The following two corollaries are immediate.
Corollary 1.
Let X be uniformly distributed on finitely many disjoint closed intervals; that is, there exist disjoint intervals $[a_i, b_i] \subseteq \mathbb{R}$ for $i \in [N]$ and $\tau \in \mathbb{R}$ such that

$$ f_X(x) = \begin{cases} \tau & x \in \cup_{i=1}^{N} [a_i, b_i], \\ 0 & \text{otherwise}. \end{cases} $$

Then $I_1(X) = 2 N \tau$.
Corollary 2.
Let X be a real-valued random variable with a unimodal, piecewise continuous density function $f_X$. Then the following equality holds:

$$ I_1(X) = 2 \| f \|_\infty. $$
Lemma 4 gives an explicit expression for $I_1$ when we know the whole profile of $f_X$. When $f_X$ is known only at certain values of x, we can still establish a lower bound on $I_1(X)$. Note that knowing $f_X$ at only certain values corresponds to knowing the sizes of slices at certain locations.
Corollary 3.
Let $X \sim f_X$, where $f_X$ is as in Lemma 4. If there exists a set

$$ S = \{ -\infty = \theta_0 < \theta_1 < \cdots < \theta_N < \theta_{N+1} = \infty \} $$

such that $f_X$ is continuous at each $\theta_i$ for $i \in [N]$, then

$$ I_1(X) \ge \sum_{i=0}^{N} | f(\theta_{i+1}) - f(\theta_i) |, $$

where we set $f(\theta_0) = f(\theta_{N+1}) = 0$.
Proof. 
We can find $T = \{ a_i \mid i = 0, \dots, M+1 \}$ with $a_0 = \theta_0 = -\infty$ and $a_{M+1} = \theta_{N+1} = +\infty$ satisfying the conditions in Lemma 4, so that

$$ I_1(X) = \sum_{i=0}^{M} | f_X(a_{i+1}^-) - f_X(a_i^+) | + \sum_{i=1}^{M} | f_X(a_i^+) - f_X(a_i^-) |. $$

Consider the set $S \cup T = \{ c_0, \dots, c_{L+1} \}$, which divides $\mathbb{R}$ into the subintervals

$$ (c_i, c_{i+1}) \quad \text{for } 0 \le i \le L. $$

We claim that

$$ I_1(X) = \sum_{i=0}^{L} | f_X(c_{i+1}^-) - f_X(c_i^+) | + \sum_{i=1}^{L} | f_X(c_i^+) - f_X(c_i^-) |. \qquad (31) $$

For the second sum, note that

$$ \sum_{i=1}^{L} | f_X(c_i^+) - f_X(c_i^-) | = \sum_{i=1}^{M} | f_X(a_i^+) - f_X(a_i^-) |, $$

since $f_X$ is assumed to be continuous at $\theta_i$ for $i \in [N]$. The points in $S \setminus T$ subdivide each of the intervals $(a_i, a_{i+1})$; that is, for each interval $(a_i, a_{i+1})$ we can find an index $j_0$ such that $a_i = c_{j_0} < c_{j_0+1} < \cdots < c_{j_0+r} < c_{j_0+r+1} = a_{i+1}$, and the monotonicity of the function over $(a_i, a_{i+1})$ gives

$$ | f_X(a_{i+1}^-) - f_X(a_i^+) | = \sum_{j=0}^{r} | f_X(c_{j_0+j+1}^-) - f_X(c_{j_0+j}^+) |. $$

Summing over all intervals yields equality (31). To conclude the proof, note that $f_X$ is not necessarily monotonic on the interval $(\theta_i, \theta_{i+1})$. Thus, if we have indices $\theta_i = c_{k_0} < \cdots < c_{k_0+s+1} = \theta_{i+1}$, the triangle inequality yields

$$ | f_X(\theta_{i+1}) - f_X(\theta_i) | \overset{(a)}{=} \left| \sum_{u=0}^{s} \left( f_X(c_{k_0+u+1}^-) - f_X(c_{k_0+u}^+) \right) + \sum_{u=1}^{s} \left( f_X(c_{k_0+u}^+) - f_X(c_{k_0+u}^-) \right) \right| \le \sum_{u=0}^{s} | f_X(c_{k_0+u+1}^-) - f_X(c_{k_0+u}^+) | + \sum_{u=1}^{s} | f_X(c_{k_0+u}^+) - f_X(c_{k_0+u}^-) |. $$

Here, equality (a) follows from the continuity of $f_X$ at the points in S, which makes the telescoping sum collapse to $f_X(\theta_{i+1}) - f_X(\theta_i)$. Summing over all intervals $(\theta_i, \theta_{i+1})$ for $0 \le i \le N$ and using equality (31), we conclude that

$$ I_1(X) \ge \sum_{i=0}^{N} | f(\theta_{i+1}) - f(\theta_i) |. $$
 □
Remark 1.
Suppose K is the union of two squares joined at a corner, as shown in Figure 2, and let X be uniformly distributed on K. Suppose also that the slice of K is known only at $\theta_1$. By direct calculation, $I_1(X \cdot e_1) = 2$, since $X \cdot e_1$ is uniform over $[0, 1]$. Notice that $f_{X \cdot e_1}(\theta_1) = 2$, and thus the bound from Corollary 3 is 4, which is larger than $I_1(X \cdot e_1)$. This reversal is due to the discontinuity of $f_{X \cdot e_1}$ at the sampled location $\theta_1$: the value $f_{X \cdot e_1}(\theta_1)$ equals neither the left limit nor the right limit at $\theta_1$. To avoid such scenarios, we require continuity of the density at the sampled points.
Corollary 3 shows that, under mild conditions, we can lower bound $I_1(X)$ even when only limited information is known about its density function.
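A minimal sketch of the Corollary 3 computation (ours; the triangular density is an assumed example, not one from the paper):

```python
# Corollary 3 from sampled density values: pad with f = 0 at +-infinity and
# sum absolute increments between consecutive continuity points.
def i1_lower_bound(sampled_values):
    vals = [0.0] + list(sampled_values) + [0.0]
    return sum(abs(b - a) for a, b in zip(vals, vals[1:]))

# Triangular density on [0, 2] with peak 1 at x = 1 (true I_1 = 2, Corollary 2):
print(i1_lower_bound([0.5, 0.5]))        # two samples -> bound 1.0
print(i1_lower_bound([0.5, 1.0, 0.5]))   # including the peak -> bound 2.0
```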

3.4. Procedure to Obtain Lower Bounds on the Surface Area

We first verify that the assumptions required by Lemma 4 are satisfied by the marginals of uniform densities over polytopes.
Lemma 5
(Proof in Appendix D). Suppose $X = (X_1, \dots, X_n)$ is uniformly distributed over a polytope $K \in \mathcal{P}$. Let u be any unit vector and let $f_{X \cdot u}$ be the marginal density of $X \cdot u$. Then $f_{X \cdot u}(\cdot)$ satisfies the conditions in Lemma 4.
Now suppose $X = (X_1, \dots, X_n)$ is uniformly distributed over a polytope K. Since K is a polytope, we may write $K = \cup_{i=1}^{m} P_i$, where each $P_i$ is a compact, convex polytope. Theorem 5 combined with Theorem 6 provides the lower bound

$$ \frac{1}{V_n(K)} \int_{\partial K} \| n(x) \|_1\, dS \ge \sum_{i=1}^{n} I_1(X_i). $$

To derive surface area bounds, notice that

$$ \sqrt{n} = \sqrt{n}\, \| n(x) \|_2 \ge \| n(x) \|_1, $$

and thus

$$ \frac{V_{n-1}(\partial K)}{V_n(K)} \ge \frac{1}{\sqrt{n}} \sum_{i=1}^{n} I_1(X_i). $$

Suppose we know the sizes of some finite number of slices by hyperplanes orthogonal to $e_i$, for $i \in [n]$. We may use Corollary 3 to obtain lower bounds $\frac{B_i}{V_n(K)}$ on $I_1(X_i)$ for each $i \in [n]$ from the available slice information. This leads to the lower bound

$$ \frac{V_{n-1}(\partial K)}{V_n(K)} \ge \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{B_i}{V_n(K)}, $$

and thereby we may conclude the lower bound

$$ V_{n-1}(\partial K) \ge \frac{1}{\sqrt{n}} \sum_{i=1}^{n} B_i. \qquad (34) $$
This is made rigorous in the following result, which may be considered to be our main result concerning surface areas.
Theorem 7.
Let K be a polyconvex set. For $i \in [n]$, suppose that we have $M_i \ge 0$ slices of K obtained by the hyperplanes $e_i^\perp + t_1^i e_i, \dots, e_i^\perp + t_{M_i}^i e_i$ (with $t_1^i < \cdots < t_{M_i}^i$), with sizes $\alpha_1^i, \dots, \alpha_{M_i}^i$. Then the surface area of K is lower-bounded as

$$ V_{n-1}(\partial K) \ge \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{j=0}^{M_i} | \alpha_j^i - \alpha_{j+1}^i |, \qquad (35) $$

where $\alpha_0^i = \alpha_{M_i+1}^i = 0$ for all $i \in [n]$.
Proof. 
Let K be a polyconvex set with a representation $K = \cup_{i=1}^{m} C_i$, where the $C_i$ are compact, convex sets. For each $C_i$, we construct a sequence of convex polytopes $\{ P_i^k \}$ which approximate $C_i$ from the outside. This means that $C_i \subseteq P_i^k$ for all $k \ge 1$ and $\lim_{k \to \infty} d(P_i^k, C_i) = 0$, where d is the Hausdorff metric. (This is easily achieved, for instance, by sampling the support function of $C_i$ uniformly at random and constructing the corresponding polytope.) Consider the sequence of polytopes $P^k = \cup_{i=1}^{m} P_i^k$. For each k, we would like to assert that inequality (35) holds for the polytope $P^k$; that is, we would like to lower bound $V_{n-1}(\partial P^k)$ using the slices of $P^k$ at $e_i^\perp + t_j^i e_i$ for $i \in [n]$ and $j \in [M_i]$. The only difficulty in applying Corollary 3 to obtain such a lower bound on $V_{n-1}(\partial P^k)$ is the continuity assumption, which requires that the marginal of the uniform density of $P^k$ along $e_i$, denoted by $f_{P^k \cdot e_i}$, be continuous at $t_j^i$ for all $i \in [n]$ and all $j \in [M_i]$. However, this is easily ensured by choosing an outer approximating polytope for each $C_i$ that has no face parallel to $e_i^\perp$ for any $i \in [n]$.
To complete the proof for K, we need to show that $\lim_{k \to \infty} V_{n-1}(\partial P^k) = V_{n-1}(\partial K)$ and $\lim_{k \to \infty} V_{n-1}((e_i^\perp + t_j^i e_i) \cap P^k) = V_{n-1}((e_i^\perp + t_j^i e_i) \cap K)$ for any $i \in [n]$ and any $j \in [M_i]$. To show this, we use the following lemma [36]:
Lemma 6
(Lemma 1, [36]). Let $K_1, \dots, K_m \subseteq \mathbb{R}^n$ be compact sets. For $i \in [m]$, let $\{ K_i^k \}$, $k \ge 1$, be a sequence of compact approximations converging to $K_i$ in the Hausdorff distance, such that $K_i \subseteq K_i^k$ for all $k \ge 1$. Then it holds that

$$ \lim_{k \to \infty} d\left( \bigcap_{i=1}^{m} K_i,\ \bigcap_{i=1}^{m} K_i^k \right) = 0. $$
Using Lemma 6, we observe that for any collection of indices $1 \le i_1 < \cdots < i_l \le m$, we must have $d(P_{i_1}^k \cap \cdots \cap P_{i_l}^k,\ C_{i_1} \cap \cdots \cap C_{i_l}) \to 0$ as $k \to \infty$. Since the surface area is continuous on compact convex sets with respect to the Hausdorff metric [19,35], we have the limit

$$ \lim_{k \to \infty} V_{n-1}\left( \partial ( P_{i_1}^k \cap \cdots \cap P_{i_l}^k ) \right) = V_{n-1}\left( \partial ( C_{i_1} \cap \cdots \cap C_{i_l} ) \right). \qquad (37) $$
Moreover, the surface area is a valuation on polyconvex sets [19,35], and thus the surface area of a union of convex sets is obtained using the inclusion-exclusion principle. In particular, the surface area of K is

$$ V_{n-1}(\partial K) = \sum_{i=1}^{m} V_{n-1}(\partial C_i) - \sum_{i_1 < i_2} V_{n-1}(\partial (C_{i_1} \cap C_{i_2})) + \cdots + (-1)^{m+1} V_{n-1}\left( \partial \left( \bigcap_{i=1}^{m} C_i \right) \right), \qquad (38) $$

and the surface area of $P^k$ is given by

$$ V_{n-1}(\partial P^k) = \sum_{i=1}^{m} V_{n-1}(\partial P_i^k) - \sum_{i_1 < i_2} V_{n-1}(\partial (P_{i_1}^k \cap P_{i_2}^k)) + \cdots + (-1)^{m+1} V_{n-1}\left( \partial \left( \bigcap_{i=1}^{m} P_i^k \right) \right). \qquad (39) $$
Using the limit in Equation (37), we may conclude that every single term in (39) converges to the corresponding term in (38), and so

$$ \lim_{k \to \infty} V_{n-1}(\partial P^k) = V_{n-1}(\partial K). $$
We now show that each slice of $P^k$ converges in size to the corresponding slice of K. Let H be some fixed hyperplane orthogonal to one of the coordinate axes. Since each $P_i^k$ may be replaced by the polytope $\cap_{l=1}^{k} P_i^l$, we can assume without loss of generality that for each $i \in [m]$ the sequence of polytopes approximating $C_i$ from the outside is monotonically decreasing; that is, $P_i^k \supseteq P_i^{k+1}$ for all $k \ge 1$. For any fixed compact convex set $L \subseteq H$, Lemma 6 yields

$$ d(L \cap P_i^k,\ L \cap C_i) \to 0, \qquad (41) $$

and thus the $(n-1)$-dimensional volumes of the two sets also converge. Picking L to be $P_i^1 \cap H$, we see that $L \cap P_i^k = H \cap P_i^k$ and $L \cap C_i = H \cap C_i$, and thus Equation (41) yields

$$ d(H \cap P_i^k,\ H \cap C_i) \to 0. $$

The sequence of sets $H \cap P_i^k$ for $k \ge 1$ is an outer approximation to $H \cap C_i$ that converges in the Hausdorff metric. Therefore, using Lemma 6,

$$ d\left( (H \cap P_{i_1}^k) \cap \cdots \cap (H \cap P_{i_l}^k),\ (H \cap C_{i_1}) \cap \cdots \cap (H \cap C_{i_l}) \right) \to 0. $$

Using the continuity of the volume functional,

$$ V_{n-1}\left( (H \cap P_{i_1}^k) \cap \cdots \cap (H \cap P_{i_l}^k) \right) \to V_{n-1}\left( (H \cap C_{i_1}) \cap \cdots \cap (H \cap C_{i_l}) \right). \qquad (44) $$

Now an argument identical to the one above shows that the $(n-1)$-dimensional volume of $H \cap K$ is obtained via the inclusion-exclusion principle applied to the convex sets $H \cap C_i$ for $i \in [m]$. Applying Equation (44) to all the terms in the inclusion-exclusion expression, we conclude that

$$ V_{n-1}(H \cap P^k) \to V_{n-1}(H \cap K). $$
This concludes the proof. □
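The bound (35) is straightforward to compute from slice data alone. The following sketch (our implementation of the formula, with the unit cube as an assumed example) illustrates it:

```python
import math

# Theorem 7: given ordered slice areas alpha_1^i, ..., alpha_{M_i}^i for each
# axis i, pad with zeros and sum absolute increments, then divide by sqrt(n).
def surface_area_lower_bound(slices_per_axis):
    n = len(slices_per_axis)
    total = 0.0
    for alphas in slices_per_axis:
        padded = [0.0] + list(alphas) + [0.0]
        total += sum(abs(b - a) for a, b in zip(padded, padded[1:]))
    return total / math.sqrt(n)

# Unit cube in R^3 with three interior slices per axis, each of area 1:
# the bound is 6/sqrt(3) ~ 3.46, versus the true surface area 6.
print(surface_area_lower_bound([[1.0, 1.0, 1.0]] * 3))
```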
Note that there is nothing restricting us to hyperplanes orthogonal to the coordinate directions $e_i$. For example, suppose we have slice information available via hyperplanes perpendicular to $u_1, \dots, u_m$ for some unit vectors $u_j$, $j \in [m]$. In this case, we have the inequality

$$ \frac{1}{V_n(K)} \int_{\partial K} \sum_{j=1}^{m} | n(x) \cdot u_j |\, dS \ge \sum_{j=1}^{m} I_1(X \cdot u_j). $$

Using the slice information, we may lower bound each $I_1(X \cdot u_j)$ via Corollary 3; suppose the resulting bound on the sum is $\frac{1}{V_n(K)} \sum_{j=1}^{m} B_j$. To arrive at a lower bound on the surface area, all we need is the smallest possible constant $C_n$ such that

$$ \sum_{j=1}^{m} | n(x) \cdot u_j | \le C_n $$

for all unit vectors $n(x)$. (This constant happened to be $\sqrt{n}$ when the $u_j$ were the coordinate vectors.) With such a constant, we may conclude

$$ V_{n-1}(\partial K) \ge \frac{\sum_{j=1}^{m} B_j}{C_n}. $$
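For a small number of directions, the constant $C_n$ can be computed exactly: $\sum_j | v \cdot u_j | = \max_{s \in \{\pm 1\}^m} v \cdot \sum_j s_j u_j$, so the maximum over unit vectors v is the largest Euclidean norm of a signed sum of the $u_j$. A sketch (ours, not from the paper):

```python
import itertools
import numpy as np

# C_n = max_{||v||=1} sum_j |v . u_j| equals the largest Euclidean norm of a
# signed sum of the u_j (2^m sign patterns, fine for small m).
def best_constant(us):
    us = np.asarray(us, dtype=float)
    return max(np.linalg.norm(np.dot(s, us))
               for s in itertools.product([-1.0, 1.0], repeat=len(us)))

print(best_constant(np.eye(3)))     # sqrt(3) for the coordinate directions
angles = [0.0, 2 * np.pi / 3, 4 * np.pi / 3]
print(best_constant([[np.cos(t), np.sin(t)] for t in angles]))   # 2.0
```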
In Appendix E, we work out the surface area lower bound from Theorem 7 for a particular example of a nonconvex (yet polyconvex) set.

4. Conclusions

In this paper, we provided two different families of geometric inequalities: (a) lower bounds on the volumes of convex sets using their slices, and (b) lower bounds on the surface areas of polyconvex sets using their slices. These inequalities were derived using information-theoretic tools. The volume bounds were obtained by using the Brascamp-Lieb subadditivity of entropy in conjunction with entropy bounds for log-concave random variables. Our main innovation in the surface area bounds is interpreting the superadditivity of Fisher information as a consequence of the data-processing inequality applied to perturbed random variables. With this interpretation, we show that using the total variation distance for data-processing allows us to derive superadditivity results for the $L^1$-Fisher information. Crucially, the $L^1$-Fisher information is well-defined even for non-smooth densities, and thus we are able to calculate it for uniform distributions over compact sets.
There are a number of future directions worth pursuing. One interesting question is whether the volume bounds can be tightened further using entropy bounds for log-concave random variables that depend not just on the maximum value of the density but also on the size of the support; note that this would mean knowing the largest slices as well as the sizes of the projections of a convex set. Another interesting question is characterizing the equality cases of the superadditivity of the $L^1$-Fisher information in Theorem 5, and thereby gaining a better understanding of when the resulting bounds provide meaningful estimates on the surface area of a geometric body.

Author Contributions

Both authors contributed equally to this work.

Funding

V.J. gratefully acknowledges the support by NSF through the grants CCF-1841190 and CCF-1907786, and by the Wisconsin Alumni Research Foundation (WARF) through the Fall Research Competition Awards.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Lemma 2

If $K \in \mathcal{P}$, the assumption in Lemma 2 is easily verified: K has finitely many faces $F_1, F_2, \dots, F_M$. For a line intersecting K in some closed intervals, one of two events can happen: either an interval lies within some face $F_j$, or an interval has endpoints marked by $F_{i_1}$ and $F_{i_2}$ for some $i_1, i_2 \in [M]$. The maximum number of intervals may therefore be loosely bounded by $L := M + \binom{M}{2}$, which is finite.
We show (26) for $i = 1$; the other cases can be proved in the same way. Since X is uniformly distributed over K,

$$ f_X(x) = \begin{cases} \frac{1}{V_n(K)} & x \in K, \\ 0 & \text{otherwise}. \end{cases} $$

Let

$$ F(x_2, \dots, x_n, \epsilon) = \int_{\mathbb{R}} \frac{| f_X(x_1, \dots, x_n) - f_X(x_1 - \epsilon, \dots, x_n) |}{\epsilon}\, dx_1. $$

We claim that there exists $g(x_2, \dots, x_n) \in L^1$ such that

$$ F(x_2, \dots, x_n, \epsilon) \le g(x_2, \dots, x_n). $$

This would allow us to use the dominated convergence theorem to conclude

$$ \lim_{\epsilon \to 0^+} \int_{\mathbb{R}^{n-1}} F(x_2, \dots, x_n, \epsilon)\, dx_2 \cdots dx_n = \int_{\mathbb{R}^{n-1}} \lim_{\epsilon \to 0^+} F(x_2, \dots, x_n, \epsilon)\, dx_2 \cdots dx_n. $$
Fix the coordinates $x_2, \dots, x_n$. If $(x_2, \dots, x_n) \notin P_{e_1^\perp}(K)$, we clearly have $F(x_2, \dots, x_n, \epsilon) = 0$. So let $(x_2, \dots, x_n) \in P_{e_1^\perp}(K)$. Since K intersects any straight line in at most L intervals, $f_X(x_1, \dots, x_n)$, viewed as a function of $x_1$, is constant on at most L line segments and 0 elsewhere. We can write it as $\sum_i f_i(x_1)$, where each $f_i$ is a constant function with value $\frac{1}{V_n(K)}$ on a small interval of $x_1$ and 0 elsewhere. Let $f_k$ be a function in this sum. We consider its contribution to $F(x_2, \dots, x_n, \epsilon)$ in the following two situations.
1.
The support of $f_k$ has length at least $\epsilon$; then

$$ \int_{\mathbb{R}} \frac{| f_k(x_1) - f_k(x_1 - \epsilon) |}{\epsilon}\, dx_1 = 2 \cdot \frac{1}{V_n(K)} \cdot \frac{\epsilon}{\epsilon} = \frac{2}{V_n(K)}. $$

See Figure A1.
Figure A1. Support ≥ ϵ.
2.
The support of $f_k$ has length $\epsilon' < \epsilon$; then

$$ \int_{\mathbb{R}} \frac{| f_k(x_1) - f_k(x_1 - \epsilon) |}{\epsilon}\, dx_1 = 2 \cdot \frac{1}{V_n(K)} \cdot \frac{\epsilon'}{\epsilon} \le \frac{2}{V_n(K)}. $$

See Figure A2.
Figure A2. Support < ϵ.
In both cases, we have

$$ \int_{\mathbb{R}} \frac{| f_k(x_1) - f_k(x_1 - \epsilon) |}{\epsilon}\, dx_1 \le \frac{2}{V_n(K)}. $$

Therefore,

$$ \int_{\mathbb{R}} \frac{| f_X(x_1, \dots, x_n) - f_X(x_1 - \epsilon, \dots, x_n) |}{\epsilon}\, dx_1 = \int_{\mathbb{R}} \frac{\left| \sum_i f_i(x_1) - \sum_i f_i(x_1 - \epsilon) \right|}{\epsilon}\, dx_1 \le \int_{\mathbb{R}} \sum_i \frac{| f_i(x_1) - f_i(x_1 - \epsilon) |}{\epsilon}\, dx_1 \le \sum_i \frac{2}{V_n(K)} \le \frac{2L}{V_n(K)}. $$
Let

$$ g(x_2, \dots, x_n) = \begin{cases} \frac{2L}{V_n(K)} & (x_2, \dots, x_n) \in P_{e_1^\perp}(K), \\ 0 & \text{otherwise}. \end{cases} $$

Then $F(x_2, \dots, x_n, \epsilon) \le g(x_2, \dots, x_n)$ and

$$ \int_{\mathbb{R}^{n-1}} g(x_2, \dots, x_n)\, dx_2 \cdots dx_n = \int_{P_{e_1^\perp}(K)} \frac{2L}{V_n(K)}\, dx_2 \cdots dx_n = \frac{2 L\, V_{n-1}(P_{e_1^\perp}(K))}{V_n(K)}, $$
which shows that g is integrable. Using the dominated convergence theorem, we obtain

$$ \lim_{\epsilon \to 0^+} \int_{\mathbb{R}^n} \frac{| f_X(x_1, \dots, x_n) - f_X(x_1 - \epsilon, \dots, x_n) |}{\epsilon}\, dx_1\, dx_2 \cdots dx_n = \lim_{\epsilon \to 0^+} \int_{\mathbb{R}^{n-1}} F(x_2, \dots, x_n, \epsilon)\, dx_2 \cdots dx_n = \int_{\mathbb{R}^{n-1}} \left( \lim_{\epsilon \to 0^+} \int_{\mathbb{R}} \frac{| f_X(x_1, \dots, x_n) - f_X(x_1 - \epsilon, \dots, x_n) |}{\epsilon}\, dx_1 \right) dx_2 \cdots dx_n. $$
Lastly, by Corollary 1,

$$ \lim_{\epsilon \to 0^+} \int_{\mathbb{R}} \frac{| f_X(x_1, \dots, x_n) - f_X(x_1 - \epsilon, \dots, x_n) |}{\epsilon}\, dx_1 = \frac{2 N(x_2, \dots, x_n)}{V_n(K)}. $$
This concludes the proof.

Appendix B. Proof of Lemma 3

Recall that we need to show

$$ \frac{1}{V_n(K)} \int_{\partial K} | n(x) \cdot e_i |\, dS = \int_{\mathbb{R}^{n-1}} \frac{2 N_i(\dots, \widehat{x_i}, \dots)}{V_n(K)}\, dx_1 \cdots \widehat{dx_i} \cdots dx_n. \qquad (A1) $$

Without loss of generality, let $i = 1$. Write $\partial K = \cup_{j=1}^{M} F_j$, where the $F_j$ are the faces of K, and let $n_j$ be the outward normal to $F_j$ for $j \in [M]$. We have the equality

$$ V_{n-1}(P_{e_1^\perp}(F_j)) = \int_{P_{e_1^\perp}(F_j)} dx_2\, dx_3 \cdots dx_n = | n_j \cdot e_1 |\, V_{n-1}(F_j). $$
Summing over all $F_j$, the left-hand side of (A1) is given by

$$ \frac{1}{V_n(K)} \int_{\partial K} | n(x) \cdot e_1 |\, dS = \frac{1}{V_n(K)} \sum_{j=1}^{M} | n_j \cdot e_1 |\, V_{n-1}(F_j) = \sum_{j=1}^{M} \frac{1}{V_n(K)} \int_{P_{e_1^\perp}(F_j)} dx_2\, dx_3 \cdots dx_n. $$

If $n_j \cdot e_1 = 0$ for some j, then $V_{n-1}(P_{e_1^\perp}(F_j)) = 0$, and such faces contribute zero to both sides. Without loss of generality, we may therefore assume $n_j \cdot e_1 \ne 0$ for all j. Let $\delta_{P_{e_1^\perp}(F_j)}(x_2, \dots, x_n)$ be the indicator function of $P_{e_1^\perp}(F_j)$; that is,

$$ \delta_{P_{e_1^\perp}(F_j)}(x_2, \dots, x_n) = \begin{cases} 1 & (x_2, \dots, x_n) \in P_{e_1^\perp}(F_j), \\ 0 & \text{otherwise}. \end{cases} $$
Then

$$ \sum_{j=1}^{M} \int_{P_{e_1^\perp}(F_j)} dx_2 \cdots dx_n = \sum_{j=1}^{M} \int_{\mathbb{R}^{n-1}} \delta_{P_{e_1^\perp}(F_j)}(x_2, \dots, x_n)\, dx_2 \cdots dx_n = \int_{\mathbb{R}^{n-1}} \sum_{j=1}^{M} \delta_{P_{e_1^\perp}(F_j)}(x_2, \dots, x_n)\, dx_2 \cdots dx_n. $$

For almost every $(x_2, \dots, x_n)$, there are exactly $2 N(x_2, \dots, x_n)$ many faces $F_j$ such that $(x_2, \dots, x_n) \in P_{e_1^\perp}(F_j)$, since each of the N intervals of the intersecting line contributes two endpoints. Therefore,

$$ \int_{\mathbb{R}^{n-1}} \sum_{j=1}^{M} \delta_{P_{e_1^\perp}(F_j)}(x_2, \dots, x_n)\, dx_2 \cdots dx_n = \int_{\mathbb{R}^{n-1}} 2 N(x_2, \dots, x_n)\, dx_2 \cdots dx_n, $$

which completes the proof.

Appendix C. Proof of Lemma 4

We claim that

$$ \lim_{\epsilon \to 0^+} \int_{a_i + \epsilon}^{a_{i+1}} \frac{| f_X(x) - f_X(x - \epsilon) |}{\epsilon}\, dx = | f_X(a_{i+1}^-) - f_X(a_i^+) |, \quad (i = 0, \dots, M), \qquad (A2) $$

and

$$ \lim_{\epsilon \to 0^+} \int_{a_i}^{a_i + \epsilon} \frac{| f_X(x) - f_X(x - \epsilon) |}{\epsilon}\, dx = | f_X(a_i^+) - f_X(a_i^-) |, \quad (i = 1, \dots, M). \qquad (A3) $$
If $f_X$ is increasing on $(a_i, a_{i+1})$, then

$$ \lim_{\epsilon \to 0^+} \int_{a_i + \epsilon}^{a_{i+1}} \frac{| f_X(x) - f_X(x - \epsilon) |}{\epsilon}\, dx = \lim_{\epsilon \to 0^+} \int_{a_i + \epsilon}^{a_{i+1}} \frac{f_X(x) - f_X(x - \epsilon)}{\epsilon}\, dx = \lim_{\epsilon \to 0^+} \left( \int_{a_i + \epsilon}^{a_{i+1}} \frac{f_X(x)}{\epsilon}\, dx - \int_{a_i}^{a_{i+1} - \epsilon} \frac{f_X(x)}{\epsilon}\, dx \right) = \lim_{\epsilon \to 0^+} \left( - \int_{a_i}^{a_i + \epsilon} \frac{f_X(x)}{\epsilon}\, dx + \int_{a_{i+1} - \epsilon}^{a_{i+1}} \frac{f_X(x)}{\epsilon}\, dx \right) = - f_X(a_i^+) + f_X(a_{i+1}^-). $$

The last equality holds since $\frac{1}{\epsilon} \int_{a_i}^{a_i + \epsilon} f_X(x)\, dx = f_X(\theta)$ for some $a_i < \theta < a_i + \epsilon$ by the mean value theorem, and this value approaches $f_X(a_i^+)$ as $\epsilon \to 0^+$. The same argument shows $\lim_{\epsilon \to 0^+} \int_{a_{i+1} - \epsilon}^{a_{i+1}} \frac{f_X(x)}{\epsilon}\, dx = f_X(a_{i+1}^-)$. Similarly, when $f_X$ is decreasing on $(a_i, a_{i+1})$, we have

$$ \lim_{\epsilon \to 0^+} \int_{a_i + \epsilon}^{a_{i+1}} \frac{| f_X(x) - f_X(x - \epsilon) |}{\epsilon}\, dx = | f_X(a_i^+) - f_X(a_{i+1}^-) |. $$

Therefore we have established (A2). Similarly, $\frac{1}{\epsilon} \int_{a_i}^{a_i + \epsilon} | f_X(x) - f_X(x - \epsilon) |\, dx = | f_X(\theta) - f_X(\theta - \epsilon) |$ for some $a_i < \theta < a_i + \epsilon$. Since $a_i - \epsilon < \theta - \epsilon < a_i$, this approaches $| f_X(a_i^+) - f_X(a_i^-) |$ as $\epsilon \to 0^+$, so we have also established (A3). Lastly,

$$ I_1(X) = \lim_{\epsilon \to 0^+} \int_{\mathbb{R}} \frac{| f_X(x) - f_X(x - \epsilon) |}{\epsilon}\, dx = \lim_{\epsilon \to 0^+} \left( \sum_{i=0}^{M} \int_{a_i + \epsilon}^{a_{i+1}} \frac{| f_X(x) - f_X(x - \epsilon) |}{\epsilon}\, dx + \sum_{i=1}^{M} \int_{a_i}^{a_i + \epsilon} \frac{| f_X(x) - f_X(x - \epsilon) |}{\epsilon}\, dx \right) = \sum_{i=0}^{M} | f_X(a_{i+1}^-) - f_X(a_i^+) | + \sum_{i=1}^{M} | f_X(a_i^+) - f_X(a_i^-) |. $$

Appendix D. Proof of Lemma 5

Without loss of generality, assume $u = e_1$. Let $K = \cup_{i=1}^{m} P_i$, where the $P_i$ are compact, convex polytopes. Denote the marginal along the $e_1$-axis of the density $f_X(\cdot)$ restricted to a set $C \subseteq K$ by $f_{C \cdot e_1}(\cdot)$. If C is a convex polytope, one may verify that $f_{C \cdot e_1}$ is log-concave and therefore a continuous function on some closed interval. Furthermore, if C is a compact, convex polytope, then we may triangulate C; that is, express $C = \cup_{i=1}^{r} T_i$, where the $T_i$ are n-dimensional compact simplices for $i \in [r]$ whose interiors partition the interior of C. Then $f_{C \cdot e_1} = \sum_{i=1}^{r} f_{T_i \cdot e_1}$. Each function in the summation is a polynomial of degree $(n-1)$ with a compact interval as its support in $\mathbb{R}$ [37]. Thus, $f_{C \cdot e_1}$ is a continuous function consisting of finitely many pieces such that $f_{C \cdot e_1}$ restricted to each piece is a polynomial of degree $(n-1)$. Note that the overall density $f_{K \cdot e_1} := f_{X \cdot e_1}(\cdot)$ is given via the inclusion-exclusion principle by

$$ f_{K \cdot e_1} = \sum_{i=1}^{m} f_{P_i \cdot e_1} - \sum_{i_1 < i_2} f_{(P_{i_1} \cap P_{i_2}) \cdot e_1} + \sum_{i_1 < i_2 < i_3} f_{(P_{i_1} \cap P_{i_2} \cap P_{i_3}) \cdot e_1} - \cdots $$

For each collection of indices $i_1, \dots, i_k$, the set $\cap_{j=1}^{k} P_{i_j}$ is a compact, convex polytope, possibly with zero volume; such sets do not contribute to the above sum, so we only consider the cases where the intersection has positive volume. The sum (or difference) of finitely many bounded continuous functions on closed intervals is easily seen to satisfy the following property: we may find finitely many points $-\infty = \gamma_0 < \gamma_1 < \cdots < \gamma_R < \gamma_{R+1} = +\infty$ such that the function is continuous on each open interval $(\gamma_i, \gamma_{i+1})$ and the left and right limits at the endpoints of each interval are finite. To verify the assumptions in Lemma 4, we simply check that on each interval $(\gamma_i, \gamma_{i+1})$ the function does not have infinitely many local optima. This is clearly true since, restricted to $(\gamma_i, \gamma_{i+1})$, the function is a piecewise polynomial of degree $(n-1)$. This proves the claim.
Remark A1.
Note that, in general, it is possible for the difference of log-concave functions to have infinitely many local optima. For example, let $f_1, f_2 : [-1, 1] \to \mathbb{R}$ with $f_1(x) = 2 - x^2$ and $f_2(x) = (2 - x^2) + e^{-1/x^2} \sin^2(1/x)$; both functions are concave and positive, and therefore log-concave. However, $f_2 - f_1 = e^{-1/x^2} \sin^2(1/x)$ has infinitely many local optima close to 0. The observation that the marginals in our case are piecewise polynomials is therefore necessary in the above argument.

Appendix E. Example

To illustrate (34), we consider the example of a cube with a hole in it, as in Figure A3. Note that this is a nonconvex set, but it is easily seen to be polyconvex. The density $f_{X_1}(x_1)$ can be computed using the areas of slices along the $x_1$-axis, and

$$ I_1(X_1) = | f_{X_1}(0^+) - f_{X_1}(0^-) | + | f_{X_1}(1^+) - f_{X_1}(1^-) | + | f_{X_1}(2^+) - f_{X_1}(2^-) | + | f_{X_1}(3^+) - f_{X_1}(3^-) | = \left| \tfrac{9}{26} - 0 \right| + \left| \tfrac{8}{26} - \tfrac{9}{26} \right| + \left| \tfrac{9}{26} - \tfrac{8}{26} \right| + \left| 0 - \tfrac{9}{26} \right| = \tfrac{20}{26}. $$

By symmetry,

$$ \frac{1}{\sqrt{3}} \left( I_1(X_1) + I_1(X_2) + I_1(X_3) \right) = \frac{60}{26 \sqrt{3}} \approx 1.33. $$

By direct calculation,

$$ \frac{V_{n-1}(\partial K)}{V_n(K)} = \frac{48}{26} \approx 1.85. $$
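These numbers are easy to reproduce from the slice profile alone; the following sketch (ours) recomputes $I_1(X_1)$ via Lemma 4 and the slice-based lower bound $\frac{1}{\sqrt{3}} \sum_i I_1(X_i)$:

```python
import math

# Slice profile of the marginal: f = 9/26 on (0,1), 8/26 on (1,2), 9/26 on (2,3).
steps = [0.0, 9 / 26, 8 / 26, 9 / 26, 0.0]
i1 = sum(abs(b - a) for a, b in zip(steps, steps[1:]))
print(i1)                            # 20/26 ~ 0.769, matching Lemma 4
print(3 * i1 / math.sqrt(3))         # 60/(26 sqrt(3)) ~ 1.33, the lower bound
```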
Figure A3. Cube with a hole.

References

1. Gardner, R. Geometric Tomography; Cambridge University Press: Cambridge, UK, 1995; Volume 58.
2. Campi, S.; Gronchi, P. Estimates of Loomis-Whitney type for intrinsic volumes. Adv. Appl. Math. 2011, 47, 545–561.
3. Wulfsohn, D.; Gundersen, H.; Jensen, V.; Nyengaard, J. Volume estimation from projections. J. Microsc. 2004, 215, 111–120.
4. Wulfsohn, D.; Nyengaard, J.; Gundersen, H.; Jensen, V. Stereology for Biosystems Engineering. Available online: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.497.404&rep=rep1&type=pdf (accessed on 16 August 2019).
5. Shepherd, T.; Rankin, A.; Alderton, D. A Practical Guide to Fluid Inclusion Studies; Blackie Academic & Professional: Los Angeles, CA, USA, 1985.
6. Bakker, R.; Diamond, L. Estimation of volume fractions of liquid and vapor phases in fluid inclusions, and definition of inclusion shapes. Am. Mineral. 2006, 91, 635–657.
7. Connelly, R.; Ostro, S. Ellipsoids and lightcurves. Geometriae Dedicata 1984, 17, 87–98.
8. Ostro, S.; Connelly, R. Convex profiles from asteroid lightcurves. Icarus 1984, 57, 443–463.
9. Loomis, L.; Whitney, H. An inequality related to the isoperimetric inequality. Bull. Am. Math. Soc. 1949, 55, 961–962.
10. Burago, Y.; Zalgaller, V. Geometric Inequalities; Springer Science & Business Media: Berlin, Germany, 2013; Volume 285.
11. Bollobás, B.; Thomason, A. Projections of bodies and hereditary properties of hypergraphs. Bull. Lond. Math. Soc. 1995, 27, 417–424.
12. Ball, K. Shadows of convex bodies. Trans. Am. Math. Soc. 1991, 327, 891–901.
13. Ball, K. Convex geometry and functional analysis. Handb. Geom. Banach Spaces 2001, 1, 161–194.
14. Bennett, J.; Carbery, A.; Christ, M.; Tao, T. The Brascamp–Lieb inequalities: Finiteness, structure and extremals. Geom. Funct. Anal. 2008, 17, 1343–1415.
15. Balister, P.; Bollobás, B. Projections, entropy and sumsets. Combinatorica 2012, 32, 125–141.
16. Gyarmati, K.; Matolcsi, M.; Ruzsa, I. A superadditivity and submultiplicativity property for cardinalities of sumsets. Combinatorica 2010, 30, 163–174.
17. Madiman, M.; Tetali, P. Information inequalities for joint distributions, with interpretations and applications. IEEE Trans. Inf. Theory 2010, 56, 2699–2713.
18. Betke, U.; McMullen, P. Estimating the sizes of convex bodies from projections. J. Lond. Math. Soc. 1983, 2, 525–538.
19. Schneider, R. Convex Bodies: The Brunn–Minkowski Theory; Number 151; Cambridge University Press: Cambridge, UK, 2014.
20. Meyer, M. A volume inequality concerning sections of convex sets. Bull. Lond. Math. Soc. 1988, 20, 151–155.
21. Li, A.J.; Huang, Q. The dual Loomis-Whitney inequality. Bull. Lond. Math. Soc. 2016, 48, 676–690.
22. Liakopoulos, D.M. Reverse Brascamp–Lieb inequality and the dual Bollobás–Thomason inequality. Archiv der Mathematik 2019, 112, 293–304.
23. Campi, S.; Gardner, R.; Gronchi, P. Reverse and dual Loomis-Whitney-type inequalities. Trans. Am. Math. Soc. 2016, 368, 5093–5124.
24. Carlen, E. Superadditivity of Fisher's information and logarithmic Sobolev inequalities. J. Funct. Anal. 1991, 101, 194–211.
25. Carlen, E.; Cordero-Erausquin, D. Subadditivity of the entropy and its relation to Brascamp–Lieb type inequalities. Geom. Funct. Anal. 2009, 19, 373–405.
26. Beckenbach, E.; Bellman, R. Inequalities; Springer Science & Business Media: Berlin, Germany, 2012; Volume 30.
27. Saumard, A.; Wellner, J. Log-concavity and strong log-concavity: A review. Stat. Surv. 2014, 8, 45.
28. Bobkov, S.; Madiman, M. The entropy per coordinate of a random vector is highly constrained under convexity conditions. IEEE Trans. Inf. Theory 2011, 57, 4940–4954.
29. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 2012.
30. Jog, V.; Anantharam, V. Intrinsic entropies of log-concave distributions. IEEE Trans. Inf. Theory 2017, 64, 93–108.
31. Dembo, A.; Cover, T.; Thomas, J. Information theoretic inequalities. IEEE Trans. Inf. Theory 1991, 37, 1501–1518.
32. Madiman, M.; Barron, A. Generalized entropy power inequalities and monotonicity properties of information. IEEE Trans. Inf. Theory 2007, 53, 2317–2329.
33. Ball, K. Volumes of sections of cubes and related problems. In Geometric Aspects of Functional Analysis; Springer: Berlin, Germany, 1989; pp. 251–260.
34. Sason, I.; Verdú, S. f-divergence inequalities. IEEE Trans. Inf. Theory 2016, 62, 5973–6006.
35. Klain, D.A.; Rota, G.C. Introduction to Geometric Probability; Cambridge University Press: Cambridge, UK, 1997.
36. Meschenmoser, D.; Spodarev, E. On the computation of intrinsic volumes. Preprint, 2012.
37. Lasserre, J. Volume of slices and sections of the simplex in closed form. Optim. Lett. 2015, 9, 1263–1269.
Figure 1. Perturbing a set by ϵ.
Figure 2. Uniform distribution over a union of squares.
