Projection to Mixture Families and Rate-Distortion Bounds with Power Distortion Measures

The explicit form of the rate-distortion function has rarely been obtained, except for a few cases where the Shannon lower bound coincides with the rate-distortion function for the entire range of the positive rate. From an information geometrical point of view, the evaluation of the rate-distortion function is achieved by a projection to the mixture family defined by the distortion measure. In this paper, we consider the β-th power distortion measure, and prove that the β-generalized Gaussian distribution is the only source that can make the Shannon lower bound tight at the minimum distortion level at zero rate. We demonstrate that the tightness of the Shannon lower bound for β = 1 (Laplacian source) and β = 2 (Gaussian source) yields upper bounds to the rate-distortion function of power distortion measures with a different power. These bounds evaluate from above the projection of the source distribution to the mixture family of the generalized Gaussian models. Applying similar arguments to ε-insensitive distortion measures, we consider the tightness of the Shannon lower bound and derive an upper bound to the distortion-rate function which is accurate at low rates.


Introduction
The rate-distortion function, R(D), shows the minimum achievable rate to reproduce source outputs with the expected distortion not exceeding D. The Shannon lower bound (SLB) has been used for evaluating R(D) [1,2]. The tightness of the SLB for the entire range of the positive rate identifies the entire R(D) for pairs of a source and distortion measure such as the Gaussian source with squared distortion [1], the Laplacian source with absolute magnitude distortion [2], and the gamma source with Itakura-Saito distortion [3]. However, such pairs are rare examples. In fact, for a fixed distortion measure, there exists only a single source that makes the SLB tight for all D, as we will prove in Section 2.3. The necessary and sufficient condition for the tightness of the SLB was first obtained for the squared distortion [4], discussed for a general difference distortion measure d [2], and recently described in terms of d-tilted information [5]. While these results consider the tightness of the SLB for each point of R(D) (i.e., for each D), we discuss the tightness for all D in this paper. More specifically, if we focus on the minimum distortion at zero rate (denoted by $D_{\max}$), the tightness of the SLB at $D_{\max}$ characterizes a condition between the source density and the distortion measure.
If the SLB is not tight, the explicit evaluation of the rate-distortion function has been obtained only in limited cases [6–9]. Little is inferred on the behavior of R(D) when the distortion measure is varied from a known case, since R(D) does not change continuously even if the distortion measure is modified continuously. Although the SLB is easily obtained for difference distortion measures, it is unknown how accurate the SLB is without the explicit evaluation, upper bound, or numerical calculation of the rate-distortion function.
In this paper, we consider the constrained optimization of the definition of R(D) from an information geometrical viewpoint [10]. More specifically, we show that it is equivalent to a projection of the source distribution to the mixture family defined by the distortion measure. If the source is included in the mixture family, the SLB is tight; if it is not tight, the gap between R(D) and its SLB evaluates the minimum Kullback-Leibler divergence from the source to the mixture family (Lemma 1). Then, using the bounds of the rate-distortion function of the β-th power difference distortion measure obtained in [11], we evaluate the projections of the source distribution to the mixture families associated with this distortion measure (Theorem 3).
Operational rate-distortion results have been obtained for the uniform scalar quantization of the generalized Gaussian source under the β-th power distortion measure [12,13]. We prove that only the β-generalized Gaussian distribution has the potential to be the source whose SLB is tight; that is, identical to the rate-distortion function for the entire range of positive rate. This fact brings knowledge on the tightness of the SLB of an ε-insensitive distortion measure, which is obtained by truncating the loss function near zero error [14–16]. The above result implies that the SLB is not tight if the source is the β-generalized Gaussian and the distortion has another power γ ≠ β. We demonstrate that even in such a case, a novel upper bound to R(D) can be derived from the condition for the tightness of the SLB. The fact that the Laplacian (β = 1) and the Gaussian (β = 2) sources have the tight SLB specifically derives a novel upper bound to R(D) of the γ (≠ β)-th power distortion measure, which has a constant gap from the SLB for all D. By the relationship between the SLB and the projection in the information geometry, the gap evaluates the projections of the β-generalized Gaussian source to the mixture families of γ-generalized Gaussian models. Extending the above argument to the ε-insensitive loss, we derive an upper bound to the distortion-rate function, which is tight in the limit of zero rate.

Rate-Distortion Function
Let X and Y be real-valued random variables of a source output and reconstruction, respectively. For the distortion measure between x and y, d(x, y), the rate-distortion function R(D) of the source X ∼ p(x) is defined by

$$R(D) = \inf_{q(y|x):\, E[d(X,Y)] \le D} I(q), \tag{1}$$

where

$$I(q) = \iint q(y|x)\,p(x) \log \frac{q(y|x)}{\int q(y|x')\,p(x')\,dx'}\,dx\,dy$$

is the mutual information and E denotes the expectation with respect to q(y|x)p(x). R(D) shows the minimum achievable rate R to reconstruct source outputs with average distortion not exceeding D under the distortion measure d [2,17]. The distortion-rate function, D(R), is the inverse function of the rate-distortion function.

If the conditional distribution $q_s(y|x)$ achieves the minimum of the following Lagrange function parameterized by s ≥ 0,

$$L(q) = I(q) + s\,E[d(X,Y)],$$

then the rate-distortion function is parametrically given by

$$R(D_s) = I(q_s), \qquad D_s = \iint q_s(y|x)\,p(x)\,d(x,y)\,dx\,dy.$$

The parameter s corresponds to the (negated) slope of the tangent of R(D) at $(D_s, R(D_s))$, and hence is referred to as the slope parameter [2]. Alternatively, the rate-distortion function is given by ([18], Theorem 4.5.1):

$$R(D) = \sup_{s \ge 0} \min_{q(y)} \left\{ E\left[-\log \int e^{-s d(X,y)} q(y)\,dy\right] - sD \right\}. \tag{2}$$

If the marginal reconstruction density $q_s(y)$ achieves the minimum above, the optimal conditional reconstruction distribution is given by

$$q_s(y|x) = \frac{e^{-s d(x,y)} q_s(y)}{\int e^{-s d(x,y)} q_s(y)\,dy} \tag{3}$$

(see, for example, [2,19]).

From the properties of the rate-distortion function R(D), we know that R(D) > 0 for $0 < D < D_{\max}$, where

$$D_{\max} = \inf_{y} \int p(x)\,d(x,y)\,dx, \tag{4}$$

and R(D) = 0 for $D \ge D_{\max}$ ([2], p. 90). Hence, $D_{\max} = \lim_{R \to 0} D(R)$.

Shannon Lower Bound
In this paper, we focus on difference distortion measures,

$$d(x,y) = \rho(x - y), \tag{5}$$

for which Shannon derived a general lower bound to the rate-distortion function [1] ([2], Chapter 4). Throughout this paper, we assume that the function ρ is nonnegative and satisfies $\int e^{-s\rho(z)}\,dz < \infty$ for all s > 0. It follows that

$$C_s = \int e^{-s\rho(z)}\,dz = \int e^{-s d(z,0)}\,dz = \int e^{-s d(x,\mu)}\,dx = \int e^{-s\rho(x-\mu)}\,dx \tag{6}$$

is finite and does not depend on µ. Let

$$K(p\,\|\,r) = \int p(x) \log \frac{p(x)}{r(x)}\,dx$$

denote the Kullback-Leibler divergence from p to r, which is non-negative and equal to zero if and only if p(x) = r(x) almost everywhere. We define the distribution

$$g_s(x) = \frac{1}{C_s} e^{-s\rho(x)}. \tag{7}$$

Then, the Shannon lower bound (SLB) is defined by

$$\underline{R}(D) = h(p) - h(g_s), \tag{8}$$

where h(p) is the differential entropy of the probability density p, and s is related to D by

$$D = \int \rho(x)\,g_s(x)\,dx. \tag{9}$$

The next lemma shows that the SLB is in fact a lower bound to the rate-distortion function and that the difference between them is lower bounded by the Kullback-Leibler divergence.

Lemma 1. For a source with probability density function p(x) and the difference distortion measure (5),

$$R(D) \ge \underline{R}(D) + \min_{q} K(p\,\|\,g_s * q) \ge \underline{R}(D),$$

where $\underline{R}(D)$ is the SLB, s and D are related to each other by (9), and

$$(g_s * q)(x) = \int g_s(x-y)\,q(y)\,dy$$

is the convolution between $g_s$ and q.

Proof. Let s be the slope parameter satisfying (9) for D. From (2), we have

$$R(D) \ge \min_{q} E\left[-\log \int e^{-s\rho(X-y)} q(y)\,dy\right] - sD = \min_{q}\left\{-\int p(x)\log (g_s * q)(x)\,dx\right\} - \log C_s - sD = h(p) - h(g_s) + \min_{q} K(p\,\|\,g_s * q),$$

where the last equality uses $h(g_s) = \log C_s + sD$, which follows from (7) and (9). This completes the proof.

In the information geometry, for a family of distributions M and a given distribution p, the distributions that achieve the minimums $\min_{r \in M} K(p\,\|\,r)$ and $\min_{r \in M} K(r\,\|\,p)$ are called the m-projection and e-projection of p to M, respectively [10]. As a family of distributions, the (M − 1)-dimensional mixture family spanned by $\{p_1(x), \ldots, p_M(x)\}$ is defined by

$$\left\{ \sum_{i=1}^{M} \theta_i p_i(x) \;\middle|\; \theta_i \ge 0,\ \sum_{i=1}^{M} \theta_i = 1 \right\}.$$

Hence, from the information geometrical viewpoint, the above lemma shows that the difference between R(D) and $\underline{R}(D)$ evaluates the m-projection $\min_{r \in M_s} K(p\,\|\,r)$ of the source distribution p to

$$M_s = \left\{ m_s(x) = (g_s * q)(x) \;\middle|\; q(y) \ge 0\ (\forall y),\ \int q(y)\,dy = 1 \right\},$$

the (infinite-dimensional) mixture family defined by $\{g_s(x - y) \mid y \in \mathbb{R}\}$.

It is also easy to see from the lemma that the SLB coincides with R(D) (that is, $R(D) = \underline{R}(D)$ holds in (8)) if and only if the source random variable X with density p(x) can be represented as the sum of two independent random variables, one of which is distributed according to the probability density function $g_s(x)$ in (7). This condition is referred to as the "backward channel" condition, and is equivalent to the fact that the integral equation

$$p(x) = \int g_s(x-y)\,q_s(y)\,dy \tag{11}$$

has a solution $q_s(y)$ which is a valid density function ([2], Chapter 4). This condition is also equivalent to the fact that $p \in M_s$.

Probability Density Achieving Tight SLB for All D
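The following theorem claims that for a difference distortion measure, there is at most one source density, up to a shift µ, for which the SLB $\underline{R}(D)$ is tight at $D = D_{\max}$.

Theorem 1. Assume that the source distribution has a finite $D_{\max}$ achieved by a reconstruction µ; that is, $E[d(X, \mu)] = D_{\max} < \infty$. The rate-distortion function is strictly greater than the SLB at $D = D_{\max}$, $R(D_{\max}) > \underline{R}(D_{\max})$, unless the following holds for the source density almost everywhere:

$$p(x) = g_{s^*}(x - \mu) = \frac{1}{C_{s^*}} e^{-s^*\rho(x-\mu)}, \tag{12}$$

where $C_s$ is defined in (6), and $s^*$ is determined by the relation

$$D_{\max} = \int \rho(x)\,g_{s^*}(x)\,dx. \tag{13}$$

Proof. Let Z be the random variable such that Z − µ has the density $g_{s^*}$; that is, Z has the density $g_{s^*}(x - \mu)$. As a functional of the source density p(x), the SLB at $D = D_{\max}$ is expressed as

$$\underline{R}(D_{\max}) = h(p) - h(g_{s^*}) = -\int p(x)\log p(x)\,dx + \int p(x)\log g_{s^*}(x-\mu)\,dx = -K\big(p\,\|\,g_{s^*}(\cdot - \mu)\big),$$

where the second equality uses $h(g_{s^*}) = \log C_{s^*} + s^* D_{\max} = -\int p(x)\log g_{s^*}(x-\mu)\,dx$. From the non-negativity of the divergence, $\underline{R}(D_{\max})$ is maximized to 0 only if $p(x) = g_{s^*}(x - \mu)$ holds almost everywhere. Since $R(D_{\max}) = 0$, the claim follows.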
The tightness of the SLB for each D characterizes the form of the backward channel p(x|y) as discussed for example in ( [5], Theorem 4).The above theorem focuses on D = D max and characterizes the relation between the form of the source density p(x) and the distortion measure.
The tightness of the SLB at $D = D_{\max}$ is relevant to the tightness for all $0 < D \le D_{\max}$. For some distortion measures (e.g., the squared and absolute distortion measures), the random variable $Z_s \sim g_s$ is decomposable into the sum of two independent random variables, $Z_s = Z_{s'} + N$, where $Z_{s'} \sim g_{s'}$ and N is some random variable, for any $s' > s$. The backward channel condition (11) means that in such a case, the tightness of the SLB at $D = D_{\max}$ implies the tightness of the SLB for all $0 < D < D_{\max}$. The condition (11) is closely related to the closure property with respect to convolution. If $g_s(x - y)$ is a kernel function associated with a reproducing kernel Hilbert space, such a closure property is studied in detail [20].
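The decomposability mentioned above can be checked numerically for the two named cases. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and the explicit form of N (a point mass at zero mixed with a Laplacian for the absolute distortion, a zero-mean Gaussian for the squared distortion) is an assumption derived here from the ratio of characteristic functions.

```python
import math

def laplace_cf(s, t):
    """Characteristic function of g_s(z) = (s/2) * exp(-s*|z|)."""
    return s * s / (s * s + t * t)

def laplace_residual_cf(s, s_prime, t):
    """Characteristic function of N in Z_s = Z_{s'} + N for s' > s (absolute distortion).

    Assumed form: N equals 0 with probability w = (s/s')**2 and is
    Laplacian with rate s with probability 1 - w.
    """
    w = (s / s_prime) ** 2
    return w + (1.0 - w) * laplace_cf(s, t)

def gaussian_residual_variance(s, s_prime):
    """Variance of N for the squared distortion, where g_s is N(0, 1/(2s))."""
    return 1.0 / (2.0 * s) - 1.0 / (2.0 * s_prime)

def max_factorization_error(s, s_prime, ts):
    """Largest deviation of cf(Z_s) from cf(Z_{s'}) * cf(N) over the grid ts."""
    return max(
        abs(laplace_cf(s, t) - laplace_cf(s_prime, t) * laplace_residual_cf(s, s_prime, t))
        for t in ts
    )

if __name__ == "__main__":
    grid = [0.1 * k for k in range(200)]
    print(max_factorization_error(1.0, 2.5, grid))
    print(gaussian_residual_variance(1.0, 2.0))
```

Because the mixture weight w lies in (0, 1) and the Gaussian variance is positive exactly when s' > s, N is a valid random variable in both cases, which is what the closure property requires.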

β-th Power Distortion Measure
We examine the rate-distortion trade-offs under the β-th power distortion measure

$$d_\beta(x,y) = |x - y|^\beta, \tag{14}$$

where β > 0 is a real exponent. In particular, β = 2 corresponds to the squared error criterion and β = 1 to the absolute one. The corresponding noise model given by (7) is

$$g_s(x) = \frac{\beta s^{1/\beta}}{2\Gamma(1/\beta)} \exp\left(-s|x|^\beta\right),$$

where Γ is the gamma function. This model is the β-th-order generalized Gaussian distribution including the Gaussian (β = 2) and the Laplace (β = 1) distributions as special cases. Its differential entropy is

$$h(g_s) = \log\frac{2\Gamma(1/\beta)}{\beta s^{1/\beta}} + \frac{1}{\beta}. \tag{15}$$

For a difference distortion measure, we can assume that the µ in Theorem 1 is zero without loss of generality. Thus, as a source, we assume the generalized Gaussian random variable with the density

$$p_\beta(x) = \frac{\beta(\alpha/\beta)^{1/\beta}}{2\Gamma(1/\beta)} \exp\left(-\frac{\alpha}{\beta}|x|^\beta\right), \tag{16}$$

where 0 < α < +∞, which is a versatile model for symmetric unimodal probability densities. Here the scaling factor α/β is chosen so that $E[|X|^\beta] = 1/\alpha$. The SLB for the source (16) with respect to the distortion measure (14) is

$$\underline{R}(D) = -\frac{1}{\beta}\log(\alpha D),$$

which follows from (8), (15), and the relation between the slope parameter s and the average distortion $D_s$ given by (9),

$$D_s = \frac{1}{\beta s}.$$

It is well known that when β = 2 (Gaussian source and the squared distortion measure), the SLB is tight; that is, $R(D) = \underline{R}(D)$ for all D [1,2,17]. The optimal reconstruction distribution minimizing (2) for this case is given by

$$q_s(y) = \sqrt{\frac{\alpha}{2\pi(1 - \alpha D)}}\, \exp\left(-\frac{\alpha y^2}{2(1 - \alpha D)}\right)$$

for 0 < D < 1/α. Additionally, when β = 1 (Laplacian source and the absolute distortion measure), the SLB is tight for all D ([2], Example 4.3.2.1), which is attained by

$$q_s(y) = \alpha^2 D^2\,\delta(y) + (1 - \alpha^2 D^2)\,\frac{\alpha}{2} e^{-\alpha|y|}$$

for 0 < D < 1/α, where δ is Dirac's delta function.
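As a sanity check on the quantities of this section, the sketch below numerically verifies that the generalized Gaussian noise model integrates to one, that its β-th absolute moment is 1/(βs), and that the SLB of the matched source reduces to −(1/β) log(αD). It is only an illustration under stated assumptions: the function names are hypothetical, the closed forms for the normalizing constant and the entropy are the standard ones for this family, all rates are in nats, and the integration uses a plain midpoint rule.

```python
import math

def gg_density(x, s, beta):
    """Noise model g_s(x) = beta*s**(1/beta) / (2*Gamma(1/beta)) * exp(-s*|x|**beta)."""
    norm = beta * s ** (1.0 / beta) / (2.0 * math.gamma(1.0 / beta))
    return norm * math.exp(-s * abs(x) ** beta)

def gg_entropy(s, beta):
    """Closed-form differential entropy of g_s in nats."""
    return math.log(2.0 * math.gamma(1.0 / beta) / (beta * s ** (1.0 / beta))) + 1.0 / beta

def midpoint_integral(f, lo, hi, n=100000):
    """Plain midpoint rule; adequate for these fast-decaying integrands."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def slb_matched(D, alpha, beta):
    """SLB h(p_beta) - h(g_s) with p_beta = g_{alpha/beta} and s = 1/(beta*D)."""
    return gg_entropy(alpha / beta, beta) - gg_entropy(1.0 / (beta * D), beta)

if __name__ == "__main__":
    s, beta = 0.8, 1.5
    mass = midpoint_integral(lambda x: gg_density(x, s, beta), -40.0, 40.0)
    moment = midpoint_integral(lambda x: abs(x) ** beta * gg_density(x, s, beta), -40.0, 40.0)
    print(mass, moment, 1.0 / (beta * s))
    print(slb_matched(0.3, 2.0, beta), -math.log(2.0 * 0.3) / beta)
```

The two printed pairs should agree up to the integration error, which confirms that D = 1/(βs) and that the entropy difference depends on the source only through αD.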

Tightness of the SLB
From Theorem 1, we immediately obtain the following corollary, which shows that the β-generalized Gaussian source is the only source that can make the SLB tight at D = D max under the β-th power distortion measure (14).
Corollary 1. Under the β-th power distortion measure (14), the rate-distortion function is strictly greater than the SLB at $D = D_{\max}$, $R(D_{\max}) > \underline{R}(D_{\max})$, unless the source distribution is the β-generalized Gaussian with the density (16).

In the case of β = 1, the rate-distortion function of the ε-insensitive distortion measure,

$$d_\varepsilon(x,y) = \rho_\varepsilon(x-y) = \max\{|x-y| - \varepsilon,\, 0\}, \tag{17}$$

was studied [16]. It was proved that a necessary condition for $R(D_s) = \underline{R}(D_s)$ at a slope parameter s is that $R(D_s) = \underline{R}(D_s)$ also holds for ε = 0. According to Theorem 1, this fact derives a contradiction if there is a source that makes the SLB of the distortion measure (17) tight at $D = D_{\max}$. Thus, we have the following corollary.

Corollary 2. Under the ε-insensitive distortion measure (17) with ε > 0, no source makes the SLB tight at $D = D_{\max}$.

Rate-Distortion Bounds for Mismatching Pairs
From Corollary 1, the SLB cannot be tight for all D if the distortion measure $d_\gamma$ has a different exponent γ from that of the source $p_\beta$ (i.e., γ ≠ β). In this section, we show that even in such a case, accurate upper and lower bounds to R(D) of Laplacian and Gaussian sources can be derived from the fact that $R(D) = \underline{R}(D)$ for β = 1 and β = 2.
We denote the rate-distortion function and bounds to it by indicating the parameters β and γ of the source and the distortion measure. More specifically, $R^{[\gamma]}_\beta(D)$ denotes the rate-distortion function for the source $p_\beta$ with respect to the distortion measure $d_\gamma$, and $\underline{R}^{[\gamma]}_\beta(D)$ and $\overline{R}^{[\gamma]}_\beta(D)$ denote its lower and upper bounds.
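We first prove the following lemma.

Lemma 2. If $R^{[\beta]}_\beta(D) = \underline{R}^{[\beta]}_\beta(D)$ is satisfied, then, for the optimal conditional reproduction distribution $q_s$ in (3) with s = 1/(βD),

$$E_{q_{1/(\beta D)}\,p_\beta}\big[|X - Y|^\gamma\big] = \frac{\Gamma((\gamma+1)/\beta)}{\Gamma(1/\beta)}\,(\beta D)^{\gamma/\beta} \tag{18}$$

holds for γ > 0, where $E_{q_{1/(\beta D)}\,p_\beta}$ denotes the expectation with respect to $q_{1/(\beta D)}(y|x)\,p_\beta(x)$.

Proof. $R^{[\beta]}_\beta(D) = \underline{R}^{[\beta]}_\beta(D)$ means that the backward channel condition (11) holds; that is, $q_s(y|x)\,p_\beta(x) = g_s(x-y)\,q_s(y)$ for the optimal reproduction distribution $q_s(y|x)$ in (3) and $q_s(y)$ minimizing (2), and D = 1/(βs). It follows that

$$E\big[|X-Y|^\gamma\big] = \iint |x-y|^\gamma g_s(x-y)\,q_s(y)\,dx\,dy = \int |z|^\gamma g_s(z)\,dz = \frac{\Gamma((\gamma+1)/\beta)}{\Gamma(1/\beta)}\,s^{-\gamma/\beta},$$

which equals the right-hand side of (18) for s = 1/(βD).

The above lemma implies that $q_{1/(\beta D)}(y|x)$ achieving $R^{[\beta]}_\beta(D)$ has the expected $d_\gamma$-distortion (18) for all D. Thus, we obtain the following upper bound to $R^{[\gamma]}_\beta(D)$:

$$\overline{R}^{[\gamma]}_\beta(D) = -\frac{1}{\gamma}\log D + \frac{1}{\gamma}\log\frac{\Gamma((\gamma+1)/\beta)}{\Gamma(1/\beta)} + \frac{1}{\beta}\log\frac{\beta}{\alpha}. \tag{19}$$

We also have the SLB for $R^{[\gamma]}_\beta(D)$:

$$\underline{R}^{[\gamma]}_\beta(D) = -\frac{1}{\gamma}\log(\gamma e D) + \frac{1}{\beta}\log\frac{\beta e}{\alpha} + \log\frac{\gamma\,\Gamma(1/\beta)}{\beta\,\Gamma(1/\gamma)}. \tag{20}$$

Therefore, we arrive at the following theorem.

Theorem 2. If $R^{[\beta]}_\beta(D) = \underline{R}^{[\beta]}_\beta(D)$ for all D, the rate-distortion function $R^{[\gamma]}_\beta(D)$ is bounded as

$$\underline{R}^{[\gamma]}_\beta(D) \le R^{[\gamma]}_\beta(D) \le \overline{R}^{[\gamma]}_\beta(D),$$

where the lower and upper bounds are given by (20) and (19). The left inequality becomes equality only for γ = β. The gap between the bounds is

$$\delta^{[\gamma]}_\beta = \overline{R}^{[\gamma]}_\beta(D) - \underline{R}^{[\gamma]}_\beta(D) = \frac{1}{\gamma}\log\frac{\gamma e\,\Gamma((\gamma+1)/\beta)}{\Gamma(1/\beta)} - \frac{1}{\beta} - \log\frac{\gamma\,\Gamma(1/\beta)}{\beta\,\Gamma(1/\gamma)}, \tag{21}$$

which is constant with respect to D. Furthermore, the upper bound is tight at

$$D = D_{\max} = \left(\frac{\beta}{\alpha}\right)^{\gamma/\beta}\frac{\Gamma((\gamma+1)/\beta)}{\Gamma(1/\beta)}.$$

Since the upper bound is tight at $D = D_{\max}$, it is the smallest upper bound that has a constant deviation from the SLB. In addition, the SLB is asymptotically tight in the limit D → 0 for the distortion measure $d_\gamma$ in general [2,21], and the condition for the asymptotic tightness has been weakened recently [22]. These facts suggest that the rate-distortion function $R^{[\gamma]}_\beta(D)$ is near the SLB at low distortion levels and then approaches the upper bound $\overline{R}^{[\gamma]}_\beta(D)$ as the average distortion D grows to $D_{\max}$. In terms of the distortion-rate function, the theorem also implies that the encoder $q_{1/(\beta D)}(y|x)$ designed for $d_\beta$-distortion has the loss in $d_\gamma$-distortion, due to the mismatch of the orders, at most by the constant factor $e^{\gamma\delta^{[\gamma]}_\beta}$.

From Lemma 1 in Section 2.2, by examining the correspondence between the slope parameter 0 < s < ∞ and the distortion level D, we obtain the next theorem, which evaluates the m-projection of the source to the mixture family

$$M^{[\gamma]}_s = \left\{ m_s(x) = (g^{[\gamma]}_s * q)(x) \;\middle|\; q(y) \ge 0\ (\forall y),\ \int q(y)\,dy = 1 \right\},$$

where $g^{[\gamma]}_s(x) \propto \exp(-s|x|^\gamma)$ is the noise model (7) for the distortion measure $d_\gamma$.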

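Theorem 3. Assume that $R^{[\beta]}_\beta(D) = \underline{R}^{[\beta]}_\beta(D)$ for all D, and let $s^*$ be the slope parameter corresponding to $D = D_{\max}$ through (9). For $s > s^*$, the m-projection of $p_\beta$ to $M^{[\gamma]}_s$ is upper bounded as

$$\min_{r \in M^{[\gamma]}_s} K(p_\beta\,\|\,r) \le \delta^{[\gamma]}_\beta,$$

where $\delta^{[\gamma]}_\beta$ is given by (21). For $0 < s \le s^*$, the m-projection is upper bounded as

$$\min_{r \in M^{[\gamma]}_s} K(p_\beta\,\|\,r) \le K\big(p_\beta\,\|\,g^{[\gamma]}_s\big). \tag{22}$$

Furthermore, the inequality (22) holds with equality for $0 < s \le s_{\min}$, where $s_{\min} = -\lim_{h \to -0} \frac{d}{dD} R^{[\gamma]}_\beta(D_{\max} + h)$ is the slope parameter of $R^{[\gamma]}_\beta(D)$ at $D = D_{\max}$.

If the upper bound $\overline{R}^{[\gamma]}_\beta$ is replaced by the asymptotically tight upper bound [2,21], asymptotically tighter bounds to the m-projection are obtained.

Proof. The first part of the theorem is a corollary of Theorem 2 and Lemma 1. The second part corresponds to the case of $D \ge D_{\max}$, since D monotonically decreases as s grows. Because q(y) = δ(y) yields $m_s = g^{[\gamma]}_s$, we have (22). It follows from (13) that $s^* = 1/(\gamma D_{\max})$. Since for $0 \le s \le s_{\min}$, the optimal reconstruction distribution is given by $q_s(y) = \delta(y)$, (22) holds with equality.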
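Since we know that the SLB is tight for all D if β = 1 or β = 2, we have the following corollaries.

Corollary 3. The rate-distortion function of the Laplacian source, $R^{[\gamma]}_1(D)$, is lower- and upper-bounded as

$$-\frac{1}{\gamma}\log(\gamma e D) + \log\frac{\gamma e}{\alpha\,\Gamma(1/\gamma)} \le R^{[\gamma]}_1(D) \le -\frac{1}{\gamma}\log D + \frac{1}{\gamma}\log\Gamma(\gamma+1) - \log\alpha.$$

Corollary 4. The rate-distortion function of the Gaussian source, $R^{[\gamma]}_2(D)$, is lower- and upper-bounded as

$$-\frac{1}{\gamma}\log(\gamma e D) + \frac{1}{2}\log\frac{2e}{\alpha} + \log\frac{\gamma\sqrt{\pi}}{2\,\Gamma(1/\gamma)} \le R^{[\gamma]}_2(D) \le -\frac{1}{\gamma}\log D + \frac{1}{\gamma}\log\frac{\Gamma((\gamma+1)/2)}{\sqrt{\pi}} + \frac{1}{2}\log\frac{2}{\alpha}.$$

Example 1. If we put γ = 1 in Corollary 4 (β = 2), we have

$$-\log D + \frac{1}{2}\log\frac{\pi}{2e\alpha} \le R^{[1]}_2(D) \le -\log D + \frac{1}{2}\log\frac{2}{\pi\alpha}. \tag{23}$$

The explicit evaluation of $R^{[1]}_2(D)$ is obtained through a parametric form using the slope parameter s [6]. While the explicit parametric form requires evaluations of the cumulative distribution function of the Gaussian distribution, the bounds in (23) demonstrate that it is well approximated by an elementary function of D. In fact, the gap between the upper and lower bounds is $\delta^{[1]}_2 = \frac{1}{2}\log\frac{4e}{\pi^2} \approx 0.048$ (nats).

[Figure 1: $R^{[1]}_2(D)$ [6] and its lower and upper bounds in (23).]

Example 2. If we put γ = 2 in Corollary 3 (β = 1), we have

$$-\frac{1}{2}\log D + \frac{1}{2}\log\frac{2e}{\pi\alpha^2} \le R^{[2]}_1(D) \le -\frac{1}{2}\log\frac{\alpha^2 D}{2}$$

for the Laplacian source and the squared distortion measure $d_2(x, y) = |x - y|^2$. The gap between the bounds is $\delta^{[2]}_1 = \frac{1}{2}\log\frac{\pi}{e} \approx 0.072$ (nats).

The upper bound in Theorem 2 implies the following:

Corollary 5. If $R^{[\gamma]}_\gamma(D) = \underline{R}^{[\gamma]}_\gamma(D)$ for all D, the γ-generalized Gaussian source has the greatest rate-distortion function among all β-generalized Gaussian sources with a fixed $D_{\max}$ under $d_\gamma$ for which Theorem 2 applies.

Proof. The upper bound (19) is expressed as $-(1/\gamma)\log(D/D_{\max})$, which is equal to the rate-distortion function of the γ-generalized Gaussian source under the γ-th power distortion measure if its SLB is tight for all D.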
The preceding corollary is well-known in the case of the squared distortion measure, while the Gaussian source has the largest rate-distortion function not only among all β-generalized Gaussian sources, but also among all the sources with a fixed variance ([2], Theorem 4.3.3).
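The bounds of this section are simple enough to evaluate directly. The sketch below is an illustration rather than the paper's code: it computes the upper bound in the form −(1/γ) log(D/D_max) and the SLB as an entropy difference, and checks that their gap is independent of D and α. The function names are hypothetical, the closed forms used for the entropy and for the γ-th absolute moment of the generalized Gaussian are assumptions stated in the docstrings, and all rates are in nats.

```python
import math

def gg_entropy(s, power):
    """Entropy (nats) of the density proportional to exp(-s*|x|**power)."""
    return math.log(2.0 * math.gamma(1.0 / power) / (power * s ** (1.0 / power))) + 1.0 / power

def d_max(alpha, beta, gam):
    """E|X|**gam for the source p_beta with scale alpha/beta (distortion at zero rate)."""
    return (beta / alpha) ** (gam / beta) * math.gamma((gam + 1.0) / beta) / math.gamma(1.0 / beta)

def upper_bound(D, alpha, beta, gam):
    """Upper bound written as -(1/gam) * log(D / D_max); meaningful for beta in {1, 2}."""
    return -math.log(D / d_max(alpha, beta, gam)) / gam

def shannon_lower_bound(D, alpha, beta, gam):
    """SLB h(p_beta) - h(g_s) with the gam-th power noise model and s = 1/(gam*D)."""
    return gg_entropy(alpha / beta, beta) - gg_entropy(1.0 / (gam * D), gam)

def gap(beta, gam, alpha=1.0, D=0.1):
    """Difference between the two bounds; alpha and D should not affect it."""
    return upper_bound(D, alpha, beta, gam) - shannon_lower_bound(D, alpha, beta, gam)

if __name__ == "__main__":
    # Gaussian source with absolute distortion, Laplacian source with squared distortion.
    for beta, gam in [(2.0, 1.0), (1.0, 2.0)]:
        nats = gap(beta, gam)
        print(beta, gam, nats, nats / math.log(2.0))  # gap in nats and in bits
```

Dividing by log 2 converts the gap to bits, which is convenient when comparing against rate-distortion curves plotted in bits.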

Distortion-Rate Bounds for ε-Insensitive Loss
As another example of a distortion measure that is not matching with the β-generalized Gaussian source in the sense of Theorem 1, we consider the following γ-th power ε-insensitive distortion measure generalizing (17),

$$d(x,y) = \rho_\varepsilon(x-y)^\gamma, \tag{24}$$

where $\rho_\varepsilon(z) = \max\{|z| - \varepsilon,\, 0\}$. Such distortion measures are used in support vector regression models [14,15]. In this section, we focus on the Laplacian source (β = 1), for which, similarly to Section 4, we can evaluate

$$E_{q_{1/D}\,p_1}\big[\rho_\varepsilon(X-Y)^\gamma\big] = \int \rho_\varepsilon(z)^\gamma g_s(z)\,dz = \Gamma(\gamma+1)\,D^\gamma e^{-\varepsilon/D},$$

where $g_s(z) = \frac{s}{2}e^{-s|z|}$ and s = 1/D. Such an explicit evaluation appears to be prohibitive for β ≠ 1. The above expected distortion is achievable by $q_{1/D}(y|x)$ with the rate R = −log(αD) since $R^{[1]}_1(D) = \underline{R}^{[1]}_1(D)$ holds for all D. Thus, we obtain the following upper bound, which is expressed by a closed form in the case of the distortion-rate function.

Theorem 4. The distortion-rate function $D^{[\gamma]}_\varepsilon(R)$ of the Laplacian source under the γ-th power ε-insensitive distortion measure (24) is upper-bounded as

$$D^{[\gamma]}_\varepsilon(R) \le \frac{\Gamma(\gamma+1)}{\alpha^\gamma}\exp\left(-\gamma R - \varepsilon\alpha e^{R}\right). \tag{25}$$

In addition, the upper bound is tight at R = 0; that is,

$$D^{[\gamma]}_\varepsilon(0) = D_{\max} = \frac{\Gamma(\gamma+1)}{\alpha^\gamma}e^{-\varepsilon\alpha}.$$

Upper and (Shannon) lower bounds which are accurate asymptotically as D → 0 have been obtained for the distortion measure (24) [16]. They are proved to have approximation error at most O(ε²) as D → 0. Combined with these bounds, the upper bound (25), being accurate at high distortion levels, provides a good approximation of the rate-distortion function for the entire range of D. This is demonstrated in Figure 2 for the case of ε = 0.1, γ = 1, and α = √2, where the upper bound (25) and that in [16] are referred to as low-rate and high-rate upper bounds because they are effective at low and high rates, respectively. Although the rate-distortion function of this case is still unknown, it lies between the upper bounds and the SLB. Hence, the figure implies that the SLB is accurate for all R, and the rate-distortion function is almost identified except for the region around R = 3 (bits) where there is a relatively large gap between the upper bounds and the SLB.
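The low-rate upper bound of this section can be evaluated in closed form. The following sketch is an illustration under stated assumptions, not the paper's implementation: it takes the rate R in nats, maps it to the absolute-error distortion exp(−R)/α achieved by the Laplacian-matched encoder, and evaluates Γ(γ+1) D^γ exp(−ε/D). The function names are hypothetical, and the numerical value of the zero-rate distortion assumes that the best constant reconstruction is y = 0, which holds by symmetry of the source.

```python
import math

def rho_eps(z, eps):
    """epsilon-insensitive loss max(|z| - eps, 0)."""
    return max(abs(z) - eps, 0.0)

def distortion_upper_bound(R, alpha, eps, gam):
    """Upper bound on the distortion-rate function at rate R (nats) for the Laplacian source."""
    d_abs = math.exp(-R) / alpha  # absolute-error distortion of the matched encoder at rate R
    return math.gamma(gam + 1.0) * d_abs ** gam * math.exp(-eps / d_abs)

def d_max_numeric(alpha, eps, gam, n=200000):
    """E[rho_eps(X)**gam] for p_1(x) = (alpha/2)*exp(-alpha*|x|), by the midpoint rule.

    Integrates over [eps, eps + 60/alpha] and doubles the result by symmetry.
    """
    lo, hi = eps, eps + 60.0 / alpha
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += rho_eps(x, eps) ** gam * 0.5 * alpha * math.exp(-alpha * x)
    return 2.0 * h * total

if __name__ == "__main__":
    alpha, eps, gam = math.sqrt(2.0), 0.1, 1.0  # the setting quoted for Figure 2
    print(distortion_upper_bound(0.0, alpha, eps, gam), d_max_numeric(alpha, eps, gam))
    for bits in (0.5, 1.0, 2.0, 3.0):
        print(bits, distortion_upper_bound(bits * math.log(2.0), alpha, eps, gam))
```

The first printed pair compares the bound at zero rate with a direct numerical evaluation of the zero-rate distortion; the loop then tabulates the bound at a few rates given in bits.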

Conclusions
We have shown that the generalized Gaussian distribution is the only source that can make the SLB tight for all D under the power distortion measure if the orders of the source and the distortion measure are matched. We have also derived an upper bound of the rate-distortion function for the cases when the orders are mismatched, which together with the SLB provides constant-width bounds sandwiching the rate-distortion function, and hence evaluates the m-projection of the source to the mixture family associated with the distortion measure. The derived bounds demonstrate the possibility that the condition for the tightness of the SLB implies knowledge on the behavior of the rate-distortion function of other distortion measures; for example, those defined by composition of functions. In fact, we have obtained an upper bound to the distortion-rate function of ε-insensitive distortion measures in the case of the Laplacian source. It is an important undertaking to investigate the geometric structure of the mixture family associated with the distortion measure and its relationship to the m-projection; that is, the optimal reconstruction distribution.