Entropy-Constrained Scalar Quantization with a Lossy-Compressed Bit

We consider the compression of a continuous real-valued source X using scalar quantizers and average squared error distortion D. Using lossless compression of the quantizer's output, Gish and Pierce showed that uniform quantization yields the smallest output entropy in the limit D → 0, resulting in a rate penalty of 0.255 bits/sample above the Shannon lower bound (SLB). We present a scalar quantization scheme, named lossy-bit entropy-constrained scalar quantization (Lb-ECSQ), that reduces the D → 0 gap to the SLB to 0.251 bits/sample by combining both lossless and binary lossy compression of the quantizer's output. We also study the low-resolution regime and show that Lb-ECSQ significantly outperforms ECSQ in the case of 1-bit quantization.


Introduction
Entropy-constrained scalar quantization (ECSQ) is a well-known compression scheme where a scalar quantizer q(•) is followed by a block lossless entropy-constrained encoder [1,2]. The two main quantities characterizing ECSQ are its distortion D and rate R. For a real-valued input source X, the most common distortion measure is the mean squared error between the source X and its reconstruction X̂. As the quantizer q(•) is followed by entropy coding, the rate R is usually defined as the entropy of the random variable at the output of the quantizer, denoted by q(X).
A natural design problem is how to design q(•) to achieve the lowest possible rate with distortion not greater than D. While this problem can be solved numerically with various quantizer optimization algorithms [3][4][5], closed-form solutions are only known when X follows an exponential [4] or uniform [6] distribution. The asymptotic limit D → 0 constitutes an exception, as it is well known that an infinite-level uniform quantizer is optimal for a broad class of source distributions [1,7,8]. Further, as D → 0, ECSQ with uniform quantization is only 0.255 bits above the Shannon lower bound (SLB) to the rate-distortion function R(D). The SLB tends to R(D) as D → 0, and is equal to R(D) for a Gaussian distributed source. Beyond scalar quantization, vector quantization (VQ) is the most common option to improve on ECSQ; i.e., to achieve rates closer to R(D) at the same distortion level [9].
In this communication, we introduce a scalar quantization scheme that, in the limit D → 0, reduces the gap to the Shannon lower bound to the rate-distortion function R(D) to 0.251 bits. Furthermore, we show that in the low-resolution regime (1-bit quantization), the proposed scheme can remarkably improve on ECSQ. The main idea of the proposed scheme is to encode the quantizer output by combining both lossless compression and binary lossy compression at a given Hamming distortion D_H, which offers an additional degree of freedom.
The compression scheme is straightforward, as we only need to expand ECSQ with an additional bit that encodes whether the source symbol was in the left half or the right half of the quantization region that contained it. In other words, this scheme codes the least significant quantization bit lossily, allowing a certain Hamming distortion D_H.
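A minimal sketch of the encoder split just described (the function name and the tie-breaking convention at interval edges are ours, for illustration only):

```python
import math

def quantize_with_lossy_bit(x, delta):
    """Return (n, b): n is the index of the uniform quantization interval of
    width delta containing x (kept lossless), and b flags whether x fell in
    the left (b = 1) or right (b = 0) half of that interval; b is the bit
    that Lb-ECSQ compresses lossily."""
    n = math.floor(x / delta)                      # x lies in [n*delta, (n+1)*delta)
    b = 1 if (x - n * delta) <= delta / 2 else 0   # left half -> 1
    return n, b
```

For example, with delta = 1, x = 2.3 lands in interval 2, left half, while x = 2.7 lands in interval 2, right half.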
We refer to the proposed method as lossy-bit ECSQ (Lb-ECSQ). Note that Lb-ECSQ contains ECSQ as a particular solution, as ECSQ is recovered when the allowed distortion at the least significant quantization bit is set to zero.
The Lb-ECSQ method resembles works in the field of source-channel coding, namely channel-optimized quantization [10]. Interestingly, when the output of a scalar quantizer is coded and transmitted over a very noisy channel, quantizers with a small number of levels (higher distortion) may yield better performance than those with a larger number of levels (lower distortion) [11]. Several works have addressed the design of scalar quantizers for noisy channels (e.g., [10][11][12]). All these works present conditions and algorithms to optimize the scalar quantizer given that it is followed by a noisy channel. This is similar to the Lb-ECSQ setup, where the lossy binary encoder behaves like a "noisy channel", with an important and critical difference: in our problem, the distortion introduced by the lossy encoder (the "error probability" of the channel) is a parameter to be optimized, and acts as an additional degree of freedom. Note also that we solely consider the problem of source coding of a continuous source; encoded symbols are transmitted error-free to the receiver, which aims at reconstructing the source.
We also study the low-resolution regime, in which we encode the source with only the lossy bit; namely, a 1-bit quantizer followed by a lossy entropy encoder. Results are distribution-dependent in the low-resolution regime, and we focus on the uniform and Gaussian distributions, which are interesting cases that show different behaviors. For example, in this low-resolution regime, the distortion can be reduced by 10% for a uniform distribution when we use 0.2 bits/sample.
In Section 2 of the paper, we review the analysis of ECSQ for an infinite-level uniform quantizer in the limit D → 0. The asymptotic analysis of Lb-ECSQ for the same quantizer and same limit is presented in Section 3. In Section 4, we move to the opposite limit and compare both scalar quantization schemes with 1-bit quantizers.

ECSQ and the Uniform Quantizer
Suppose that a source produces the sequence of independent and identically distributed (i.i.d.) real-valued random variables {X_k, k ∈ Z} according to the distribution p_X(x). A scalar quantizer is defined as a deterministic mapping q(•) from the source alphabet X ⊆ R to the reconstruction alphabet X̂, which is assumed to be countable. By Shannon's source coding theorem, q(X) can be losslessly described by a variable-length code whose expected length is roughly equal to its entropy H(q(X)). In ECSQ, this quantity constitutes the rate of the quantizer q(•). Additionally, the mean squared-error distortion incurred by the scalar quantizer q(•) is given by

D = E_X[(X − q(X))^2],    (1)

where E_X denotes that the expectation is computed w.r.t. the source distribution p_X(x). Consider the set of quantizers q(•) for which the squared distortion in Equation (1) is smaller than or equal to D ∈ R^+, and let R_s(D) be the smallest rate achievable among this set; more precisely,

R_s(D) = min_{q(•) : E_X[(X − q(X))^2] ≤ D} H(q(X)).    (2)

Under some constraints on the continuity and decay of p_X(x), Gish and Pierce showed that in the limit D → 0, R_s(D) can asymptotically be achieved by the infinite-level uniform quantizer, whose quantization regions partition the real line into intervals of equal length [1]. Further, they showed that

lim_{D→0} R_s(D) − R(D) = (1/2) log_2(2πe/12) ≈ 0.2546 bits,    (3)

where R(D) is the rate-distortion function of the source [13]. In the rest of this section, we briefly review the asymptotic analysis of ECSQ with uniform quantization, following the approach described in [7,8].
We will later rely on these intermediate results to analyze the Lb-ECSQ scheme for the uniform quantizer.
The following conditions are assumed for the source [7,8]: C1, p_X(x) log p_X(x) is integrable, ensuring that the differential entropy h(X) is well-defined and finite; and C2, the integer part of the source X has finite entropy, i.e., H(⌊X⌋) < ∞; otherwise, R(D) is infinite [14].
Denote the infinite-level uniform quantizer by q_u(•), and let δ be the interval length. For x ∈ R, we have

q_u(x) = Σ_n (n + 1/2) δ · 1[nδ < x ≤ (n + 1)δ],    (4)

where (n + 1/2)δ is the reconstruction value for interval n, and 1[•] denotes the indicator function.
We define the piecewise-constant probability density function p^(δ)_X(x) as follows:

p^(δ)_X(x) = p_n / δ  for nδ < x ≤ (n + 1)δ,    (5)

where

p_n ≜ ∫_{nδ}^{(n+1)δ} p_X(u) du    (6)

is the probability that X belongs to that interval, and Σ_n p_n = 1. To evaluate the squared error distortion, we first decompose E[(X − q_u(X))^2] as follows:

E[(X − q_u(X))^2] = ∫ (x − q_u(x))^2 p^(δ)_X(x) dx + ∫ (x − q_u(x))^2 (p_X(x) − p^(δ)_X(x)) dx.    (7)

As shown in [7,8], the absolute value of the second term in the above equation can be upper-bounded by (δ^2/4) ∫ |p^(δ)_X(x) − p_X(x)| dx, and this term vanishes as δ → 0 according to Lebesgue's differentiation theorem and Scheffé's lemma (Th. 16.12) [15]. Thus,

D = δ^2/12 + o(δ^2).    (8)

On the other hand, following [1], we express the entropy of the quantizer's output H(q_u(X)) as follows:

H(q_u(X)) = −Σ_n p_n log_2 p_n = −∫ p^(δ)_X(x) log_2 p^(δ)_X(x) dx − log_2 δ.

As shown in [16], the integral in the above expression converges to h(X) as δ → 0; hence,

H(q_u(X)) = h(X) − log_2 δ + o(1),    (9)

where o(1) refers to error terms that vanish as δ tends to zero. We conclude that the uniform quantizer achieves

R_u(D) = h(X) − (1/2) log_2(12D) + o(1)    (10)

bits per sample, where o(1) comprises error terms that vanish as D tends to zero. Further, for sources X satisfying conditions C1 and C2, the rate-distortion function R(D) can be approximated as [17]

R(D) = h(X) − (1/2) log_2(2πeD) + o(1).    (11)

Without the o(1) term, the right-hand side (RHS) of Equation (11) is referred to as the Shannon lower bound (SLB). By combining Equations (10) and (11), we obtain

lim_{D→0} R_u(D) − R(D) = (1/2) log_2(2πe/12) ≈ 0.2546 bits.    (12)
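A numerical sanity check of this high-resolution behavior for a standard Gaussian source (a sketch under our own discretization choices, not the paper's code): for a fine uniform quantizer, the distortion should be close to δ²/12 and the rate gap H(q_u(X)) − SLB close to (1/2)log2(2πe/12) ≈ 0.2546 bits.

```python
import math

def gauss_pdf(x):
    """Density of N(0, 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def uniform_quantizer_stats(delta, span=10.0, sub=64):
    """Output entropy (bits) and MSE distortion of the infinite-level uniform
    quantizer, approximated by midpoint-rule integration over [-span, span]."""
    H, D = 0.0, 0.0
    n = -int(round(span / delta))
    while n * delta < span:
        lo = n * delta
        p, d = 0.0, 0.0
        for k in range(sub):
            x = lo + (k + 0.5) * delta / sub
            w = gauss_pdf(x) * delta / sub          # probability mass of the sub-cell
            p += w
            d += w * (x - (lo + delta / 2)) ** 2    # squared error to the midpoint
        if p > 0:
            H -= p * math.log2(p)
        D += d
        n += 1
    return H, D

delta = 0.05
H, D = uniform_quantizer_stats(delta)
h_X = 0.5 * math.log2(2 * math.pi * math.e)         # differential entropy of N(0, 1)
slb = h_X - 0.5 * math.log2(2 * math.pi * math.e * D)
gap = H - slb
```

At δ = 0.05, D should land within a fraction of a percent of δ²/12, and the gap within about 0.01 of 0.2546 bits.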

Uniform Quantization with a Lossy-Compressed Bit
The above results show that, according to Equation (2), uniform quantizers are asymptotically optimal as the allowed distortion D vanishes. In the following, we present a simple scheme that, while maintaining the scalar uniform quantizer, reduces the gap to the rate-distortion function of the source below Equation (12). To this end, the quantizer's output is compressed using both lossless and lossy compression, and thus the compression rate is no longer measured by the entropy of the quantizer's output. Unlike in [1], we do not claim that uniform quantization is optimal according to the proposed definition of compression rate. Consider again the uniform quantizer q_u(•) with interval length δ. Given X and q_u(X), let b(X) be a binary random variable such that

b(x) = 1[nδ < x ≤ (n + 1/2)δ]  for x in interval n;    (13)

that is, b(X) = 1 if X lies in the left half of its quantization region, and b(X) = 0 otherwise.

Compression with a Lossy-Compressed Bit
Given the random variable (q_u(X), b(X)), we maintain the lossless variable-length encoder to compress q_u(X). Moreover, the binary random variable b(X) is lossy compressed with a certain Hamming distortion D_H, which is a free parameter to be tuned to minimize the squared error distortion. We refer to this compression scheme as ECSQ with a lossy-compressed bit (Lb-ECSQ).
We assume that lossy compression of b(X) at a Hamming distortion D_H is optimally done, achieving the rate-distortion function for a Bernoulli source with probability P_b ≜ P(b(X) = 1). While this assumption is somewhat unrealistic, our main goal in this paper is to analyze the fundamental limits of the proposed scheme, as one would do in ECSQ when assuming that the scalar quantizer's output is compressed at a rate equal to its entropy. For the actual implementation of Lb-ECSQ, practical schemes based on low-density generator-matrix (LDGM) codes [18] or lattice codes [19] could be investigated.
Under the assumption of optimal lossy binary compression, we define the Lb-ECSQ rate of the uniform quantizer q_u(•) as

R_Lb-u = H(q_u(X)) + R(D_H, P_b) = H(q_u(X)) + h_2(P_b) − h_2(D_H),    (14)

where with a slight abuse of notation we use R(D_H, P_b) to denote the rate-distortion function of a Bernoulli source with probability P_b, and h_2(•) is the binary entropy function. We are interested in evaluating Equation (14) in the limit δ → 0. In this regime, it is straightforward to show that for any source distribution p_X(x) satisfying C1, lim_{δ→0} P_b = 1/2. Using this result and Equation (9), we have

R_Lb-u = h(X) − log_2 δ + 1 − h_2(D_H) + o(1),    (15)

where o(1) comprises error terms that vanish as δ tends to zero. Observe that if we take D_H = 0 (i.e., lossless compression is used for both q_u(X) and b(X)), in the limit δ → 0 the rate R_Lb-u coincides with the entropy of the uniform quantizer in Equation (9) with half the interval length, i.e., δ′ = δ/2.
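A small helper (our naming, a sketch) for this rate, the lossless part plus the Bernoulli rate-distortion function of the lossy bit:

```python
import math

def h2(p):
    """Binary entropy function, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def lb_ecsq_rate(H_quantizer, P_b, D_H):
    """Lb-ECSQ rate: H(q_u(X)) for the lossless part, plus the Bernoulli
    rate-distortion function h2(P_b) - h2(D_H) for the lossy bit, clamped at
    zero (the rate-distortion function is zero once D_H >= min(P_b, 1 - P_b))."""
    return H_quantizer + max(h2(P_b) - h2(D_H), 0.0)
```

With P_b = 1/2 and D_H = 0 the rate is H(q_u(X)) + 1 bits, matching the entropy of a uniform quantizer with half the step; with D_H = 1/2 the lossy bit costs nothing.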

Reconstruction Values and Squared Distortion with a Lossy-Compressed Bit
Since q_u(X) is losslessly compressed, upon decompression it is recovered with no error. Let b̂(X) be a binary random variable representing the reconstructed value of b(X). Due to the lossy compression at a certain Hamming distortion, there exists a non-zero reconstruction error; namely, P(b̂(x) ≠ b(x) | X = x) > 0 for D_H > 0. Given the pair (q_u(X), b̂(X)), we compute the source reconstruction value X̂ as follows:

X̂ = nδ + c  if b̂(X) = 1,  and  X̂ = (n + 1)δ − c  if b̂(X) = 0,    (16)

where n is the interval index encoded by q_u(X), and c ∈ [0, δ/2] is a parameter that, along with D_H, will be optimized to minimize the squared error distortion D = E_{X,X̂}[(X − X̂)^2]. We note that the reconstruction rule in Equation (16) is possibly suboptimal.
Before evaluating D as a function of δ, D_H, and c, we first need to compute the error probabilities for the b(X) bit. Following [20] (Chapter 10), if (b(X), b̂(X)) are jointly distributed according to the binary symmetric channel shown in Figure 1, then the mutual information I(b(X); b̂(X)) coincides with the Bernoulli rate-distortion function R(D_H, P_b) = h_2(P_b) − h_2(D_H). Moreover, by using random coding, it is shown in [20] (Chapter 10) that there exist encoding/decoding schemes that asymptotically (in the block length) meet the input-output distribution in Figure 1. Consequently, under the assumption of optimal lossy compression of b(X) with prior probability P_b, we can compute the reconstruction error probabilities by applying Bayes' rule in Figure 1:

P(b̂(X) = 0 | b(X) = 1) = D_H (1 − P_b − D_H) / ((1 − 2D_H) P_b),    (17)

P(b̂(X) = 1 | b(X) = 0) = D_H (P_b − D_H) / ((1 − 2D_H)(1 − P_b)).    (18)

Figure 1. Binary symmetric channel model of the joint probability distribution between a Bernoulli source b(X) with prior probability P_b and its reconstruction b̂(X) after lossy compression at Hamming distortion D_H, assuming that the Bernoulli rate-distortion function is achieved.
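These posterior error probabilities can be sketched directly from the test-channel picture (our code, under the assumption that the backward channel from b̂ to b is a BSC(D_H) when the Bernoulli rate-distortion function is achieved):

```python
def reconstruction_error_probs(P_b, D_H):
    """Error probabilities of the lossy-compressed bit, assuming the
    rate-distortion-achieving test channel: the backward channel from the
    reconstruction b_hat to the source b is a BSC(D_H).
    Requires D_H < min(P_b, 1 - P_b) and D_H < 1/2."""
    q1 = (P_b - D_H) / (1 - 2 * D_H)   # P(b_hat = 1)
    # Bayes' rule on the backward BSC:
    p01 = D_H * (1 - q1) / P_b         # P(b_hat = 0 | b = 1)
    p10 = D_H * q1 / (1 - P_b)         # P(b_hat = 1 | b = 0)
    return p01, p10
```

Two consistency checks: at P_b = 1/2 both conditional error probabilities reduce to D_H, and for any P_b the average error P_b·p01 + (1 − P_b)·p10 equals the Hamming distortion D_H.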
Note that in the limit δ → 0, we have P_b → 1/2, and thus Equations (17) and (18) equal D_H.

Lemma 1. For any source X with a distribution p_X(x) that satisfies conditions C1 and C2, under the assumption that the binary random variable b(X) defined in Equation (13) is optimally lossy compressed at a Hamming distortion D_H, the squared error distortion of the reconstruction rule in Equation (16), with optimized c, satisfies

D = (δ^2/48)(1 + 12 D_H(1 − D_H)) + o(δ^2).    (19)
Proof. Assuming optimal lossy compression, in the limit δ → 0, b(X) is in error with probability D_H. Further, for any δ > 0, it is straightforward to check that both Equations (17) and (18) are upper bounded by D_H. Therefore, the squared error distortion can be computed as follows:

D ≤ ∫ ( ∫ (x − x̂)^2 p_{X̂|X=x}(x̂) dx̂ ) p_X(x) dx,    (20)

where p_{X̂|X=x}(x̂) is the conditional distribution of the reconstruction value for X = x, assuming that the reconstruction error probabilities are equal to D_H. Equality is achieved at δ = 0. According to Equation (16), p_{X̂|X=x}(x̂) can be expressed as follows: for nδ < x ≤ (n + 1/2)δ, then b(X) = 1, and hence

p_{X̂|X=x}(x̂) = (1 − D_H) 1[x̂ = nδ + c] + D_H 1[x̂ = (n + 1)δ − c],    (21)

and similarly, if (n + 1/2)δ < x ≤ (n + 1)δ, then b(x) = 0, and

p_{X̂|X=x}(x̂) = (1 − D_H) 1[x̂ = (n + 1)δ − c] + D_H 1[x̂ = nδ + c].    (22)

As in Equation (7), we expand the integral in Equation (20) using the piecewise-constant distribution p^(δ)_X(x),    (23)

where it can be checked that the absolute value of the second term is upper bounded by δ^2 ∫ |p^(δ)_X(x) − p_X(x)| dx, which vanishes as δ → 0. Using Equations (21) and (22), the first term in Equation (23) reads

(2/(3δ)) [ (1 − D_H)(r^3 − (r − δ/2)^3) + D_H((r + δ/2)^3 − r^3) ],    (24)

where r = δ/2 − c. The equality is obtained after straightforward manipulation. The latter expression is minimized if we choose r = (δ/4)(1 − 2D_H), i.e., c = (δ/4)(1 + 2D_H), Equation (19) being the corresponding distortion. Note that for D_H = 0, the reconstruction value is at the center of the corresponding half interval, c = δ/4. Conversely, if D_H > 0, the reconstruction point moves closer to the midpoint of the full interval, such that the distortion caused by an erroneous reconstruction of b(X) is reduced.
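A brute-force numerical check (our sketch, under the stated assumption that the bit error probability equals D_H) of the single-interval computation: for X uniform over one interval of length δ, reconstruction at nδ + c when the lossy bit decodes as "left" and (n + 1)δ − c when it decodes as "right".

```python
def interval_distortion(c, delta, D_H, grid=2000):
    """Mean squared error over one interval, by midpoint-rule integration."""
    total = 0.0
    for k in range(grid):
        x = (k + 0.5) * delta / grid
        correct = c if x <= delta / 2 else delta - c   # reconstruction if the bit is right
        wrong = delta - correct                        # reconstruction if the bit flips
        total += (1 - D_H) * (x - correct) ** 2 + D_H * (x - wrong) ** 2
    return total / grid

delta, D_H = 1.0, 0.1
# Scan c over [0, delta/2] in steps of 0.005 and keep the minimizer.
best_c = min(((interval_distortion(c / 200, delta, D_H), c / 200)
              for c in range(0, 101)))[1]
predicted_c = (delta / 4) * (1 + 2 * D_H)              # = 0.3 here
predicted_D = (delta ** 2 / 48) * (1 + 12 * D_H * (1 - D_H))
```

The scanned minimizer should agree with the closed-form c* up to the grid resolution, and the distortion at c* with the closed-form value.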

Asymptotic Gap to the Shannon Lower Bound
The following lemma jointly characterizes the rate R_Lb-u and the squared distortion D of the Lb-ECSQ scheme for the uniform quantizer q_u(•) in the limit D → 0.

Lemma 2. For any source X with a distribution p_X(x) that satisfies conditions C1 and C2, the uniform quantizer q_u(•) with interval length δ and Lb-ECSQ compression with quadratic distortion D = E_{X,X̂}[(X − X̂)^2] and Hamming distortion D_H of the bit b(X) achieves

R_Lb-u(D) = h(X) − (1/2) log_2(12D) + ∆(D_H) + o(1),    (27)

where o(1) comprises error terms that vanish as D tends to zero, and

∆(D_H) = (1/2) log_2(1 + 12 D_H(1 − D_H)) − h_2(D_H).    (28)

Proof. The proof is straightforward by combining Equations (15) and (19). More precisely, from Equation (19), we get that as δ → 0, the following equality holds:

log_2 δ = (1/2) log_2(48D) − (1/2) log_2(1 + 12 D_H(1 − D_H)) + o(1).

By plugging this equality into Equation (15), we get Equation (27), where ∆(D_H) in Equation (28) collects the terms that depend on D_H (see Figure 2).

Corollary 1. The uniform quantizer q_u(•) with interval length δ and Lb-ECSQ compression with quadratic distortion D achieves, in the limit D → 0, a gap to the SLB of (1/2) log_2(2πe/12) + ∆(D*_H) ≈ 0.251 bits, where D*_H denotes the minimizer of ∆(D_H).

Two-Level Quantization of a Uniform Source

We now move to the low-resolution regime, in which the source is encoded with only the lossy-compressed bit; i.e., a 1-bit quantizer q(•) with threshold α whose output bit is lossy compressed at Hamming distortion D_H. Since there is no lossless part, the Lb-ECSQ rate reduces to

R_Lb-q = h_2(P_b) − h_2(D_H),    (32)

which equals 1 − h_2(D_H) whenever the threshold is such that P_b = 1/2.

Lemma 3. Given the source X ∼ U[−δ/2, δ/2], the 1-bit quantizer q(•) with threshold α = 0 and Lb-ECSQ compression achieves a squared distortion

D = (δ^2/48)(1 + 12 D_H(1 − D_H)),    (37)

under the assumption that q(X) is optimally lossy compressed at a Hamming distortion D_H.
Proof. The proof is similar to that of Lemma 1, expanding E[(X − X̂)^2] over each quantization region as in Equation (24) and minimizing w.r.t. the reconstruction point c.
In Figure 3, we plot E_{X,X̂}[(X − X̂)^2] in Equation (37) vs. R_Lb-q in Equation (32) for δ = √12 as we vary D_H ∈ [0, 1/2] (blue curve with marker). Observe that Lb-ECSQ improves on ECSQ at all points except for D_H = 0 and D_H = 1/2, where, as we know, the two schemes must be equivalent. The Lb-ECSQ analysis proposed for α = 0 can be generalized to an arbitrary threshold α ∈ [−δ/2, δ/2], but simulations for α ≠ 0 using numerical optimization show that the obtained rate-distortion function coincides with the one computed for α = 0. This result is distribution-dependent, as shown next for the Gaussian source.
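To illustrate the Introduction's claim for the uniform source, the sketch below compares the two 1-bit schemes at a common rate of 0.2 bits/sample. The Lb-ECSQ point uses Lemma 3's expressions with δ = √12 (unit variance); the ECSQ baseline (a single-threshold quantizer with centroid reconstruction, R = h2(p), D = (δ²/12)(p³ + (1 − p)³)) is our own assumption, not the paper's exact reference curve.

```python
import math

def h2(p):
    """Binary entropy function, in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def inv_h2(y):
    """Inverse of h2 on (0, 1/2], by bisection (h2 is increasing there)."""
    lo, hi = 1e-12, 0.5
    for _ in range(200):
        mid = (lo + hi) / 2
        if h2(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

delta2 = 12.0            # delta^2, so the source variance delta^2/12 equals 1
target_R = 0.2           # bits/sample

D_H = inv_h2(1 - target_R)                            # Lb-ECSQ: R = 1 - h2(D_H)
D_lb = (delta2 / 48) * (1 + 12 * D_H * (1 - D_H))     # Lemma 3 distortion
p = inv_h2(target_R)                                  # baseline: R = h2(p)
D_ecsq = (delta2 / 12) * (p ** 3 + (1 - p) ** 3)      # centroid reconstruction
reduction = 1 - D_lb / D_ecsq
```

Under this baseline, the relative distortion reduction at 0.2 bits/sample comes out on the order of 10%, in line with the figure reported in the Introduction.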

Two-Level Quantization of a Gaussian Source
Now consider the same quantizer q(•) and X ∼ N(0, σ^2). Low-resolution ECSQ for a Gaussian input source was studied in [21], where the authors showed that the minimum rate is achieved by a quantizer whose unique threshold α goes either to −∞ or to ∞ as D → σ^2, and the two reconstruction points are the centroids of the quantization regions. The ECSQ rate-distortion function for this source is given by the following parametric curve in the threshold α (expressed in units of σ):

R(α) = h_2(Φ(α)),    D(α) = σ^2 (1 − φ(α)^2 / (Φ(α)(1 − Φ(α)))),

where Φ(α) is the cumulative distribution function of the standard Gaussian distribution and φ(α) its density. We now study the same Lb-ECSQ scheme analyzed before for the uniform source. First, we fix the quantizer threshold to α = 0 and define b(X) = 1 if X ≤ α, and zero otherwise. Note that the Lb-ECSQ rate is given in Equation (32).

Lemma 4. Given the source X ∼ N(0, σ^2), the quantizer q(•) with α = 0 and Lb-ECSQ compression achieves a squared distortion

D = σ^2 (1 − (2/π)(1 − 2D_H)^2)

for D_H ∈ [0, 1/2].
Proof.The proof is based on expanding E[(X − X) 2 ] as done for every quantization region in Equation ( 24) and minimizing w.r.t. the reconstruction point c.
In Figure 4, we plot the gap between the ECSQ and Lb-ECSQ rate-distortion functions for a 1-bit quantizer and the rate-distortion function of the source; i.e., R(D) = 0.5 log_2(σ^2/D). Observe that, unlike the case of a uniform source, for D/σ^2 → 1, Lb-ECSQ is slightly worse than ECSQ. As discussed before, in the ECSQ solution for a Gaussian input source, the threshold α goes to infinity in the limit D/σ^2 → 1 [21]. By fixing the threshold α to 0 in Lb-ECSQ, we prevent the scheme from approaching an equivalent solution. This can be tackled by generalizing the above equations to an arbitrary threshold α. While the methodology is equivalent, we have to rely on numerical optimization to find the optimal choice of α, c_1(α), and c_2(α) for each value of D_H. In this case, the bit error reconstruction probabilities take the form given in Equations (17) and (18). Additionally, for an arbitrary threshold α, b(X) is a Bernoulli source with probability p = Φ(α), and hence the compression rate is R_Lb-q = h_2(p) − h_2(D_H). A numerical optimization (gradient descent) procedure has been used to find the minimum distortion for each R_Lb-q. The results are shown in Figure 4, where we can see that Lb-ECSQ now performs equally to or better than ECSQ over the whole range.
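A closed-form check of the α = 0 Gaussian case (our algebra, sketched under Lemma 4's assumptions): with reconstruction points ±c and bit error probability D_H, the distortion is D(c) = σ² + c² − 2c(1 − 2D_H)E[X | X > 0], minimized at c* = (1 − 2D_H)E[X | X > 0], with E[X | X > 0] = σ√(2/π).

```python
import math

def gaussian_1bit_distortion(D_H, sigma=1.0):
    """Minimum distortion of 1-bit Lb-ECSQ for X ~ N(0, sigma^2) with
    threshold alpha = 0, and the optimal reconstruction offset c*."""
    m = sigma * math.sqrt(2 / math.pi)   # conditional mean E[X | X > 0]
    c_star = (1 - 2 * D_H) * m
    D = sigma ** 2 + c_star ** 2 - 2 * c_star * (1 - 2 * D_H) * m
    return D, c_star
```

D simplifies to σ²(1 − (2/π)(1 − 2D_H)²): D_H = 0 recovers the classic 1-bit Gaussian distortion σ²(1 − 2/π), and D_H = 1/2 gives D = σ², i.e., zero rate.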
In Figure 2, we plot ∆(D_H) for D_H ∈ [0, 1/2]. Observe that ∆(D_H) is equal to zero at D_H = 0 and D_H = 1/2. However, for small values of D_H, ∆(D_H) is actually smaller than zero, achieving its minimum at D*_H ≈ 3.2 × 10^−3.
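This behavior can be checked numerically (a sketch assuming the asymptotic expressions derived above): the Lb-ECSQ penalty over the SLB is (1/2)log2(2πe/12) + ∆(D_H), with ∆(D_H) = (1/2)log2(1 + 12D_H(1 − D_H)) − h2(D_H), and a grid search locates its minimum.

```python
import math

def h2(p):
    """Binary entropy function, in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def delta_gap(d):
    """Delta(D_H): the D_H-dependent part of the asymptotic gap to the SLB."""
    return 0.5 * math.log2(1 + 12 * d * (1 - d)) - h2(d)

grid = [i / 1e5 for i in range(1, 50001)]   # D_H in (0, 1/2]
d_star = min(grid, key=delta_gap)
gap = 0.5 * math.log2(2 * math.pi * math.e / 12) + delta_gap(d_star)
```

On this grid, the minimizer lands near D_H ≈ 3 × 10⁻³ and the resulting gap falls just below 0.251 bits, consistent with the value quoted in the abstract.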