
Refinements and Generalizations of the Shannon Lower Bound via Extensions of the Kraft Inequality

Neri Merhav
The Viterbi Faculty of Electrical and Computer Engineering, Technion—Israel Institute of Technology, Technion City, Haifa 3200003, Israel
Entropy 2026, 28(1), 76; https://doi.org/10.3390/e28010076
Submission received: 12 December 2025 / Revised: 31 December 2025 / Accepted: 7 January 2026 / Published: 9 January 2026
(This article belongs to the Special Issue Information Theory and Data Compression)

Abstract

We derive a few extended versions of the Kraft inequality for lossy compression, which pave the way to the derivation of several refinements and extensions of the well-known Shannon lower bound in a variety of instances of rate-distortion coding. These refinements and extensions include sharper bounds for one-to-one codes and D-semifaithful codes, a Shannon lower bound for distortion measures based on sliding-window functions, and an individual-sequence counterpart of the Shannon lower bound.

1. Introduction

The Shannon lower bound (SLB) is one of the most important analytic tools in rate-distortion theory because it provides a simple, explicit, and often very tight lower bound to the rate-distortion function for a wide class of sources and distortion measures; see, e.g., Subsection 3.4.1 of [1], Sections 4.3 and 4.6 of [2], and Problem 10.6 of [3]. Its significance lies in giving a universal benchmark that connects rate-distortion tradeoffs to the entropy or the differential entropy of the source (depending on whether the source has a discrete or continuous alphabet), thereby offering an intuitively transparent approximation in regimes where exact evaluation of the rate-distortion function is intractable. The SLB is particularly powerful at low distortion, where it frequently coincides with the exact rate-distortion function for smooth sources, and it serves as a foundation for many refinements and asymptotic approximations (e.g., high-resolution analysis, corrections for lattice quantizers, and recent non-asymptotic bounds). In fact, the literature contains reported results on the asymptotic tightness of the SLB in the limit of low distortion under fairly mild regularity conditions [4,5]. Because computing the rate-distortion function exactly is generally difficult, the SLB plays a central role in the analysis, design, and performance assessment of lossy compression schemes. More recent studies include the finite block-length regime [6,7] and further developments concerning the quadratic distortion function [8].
In [9], Campbell derived an extension of Kraft’s inequality that leads to the SLB in both the discrete and the continuous alphabet cases. However, his results were claimed to apply to rate-distortion codes whose distortion level is defined by the distance between the two most distant source-space vectors that are mapped to the same codeword, namely, the diameter (as opposed to the radius) of the distortion ball centered at the reproduction vector. In [10], a similar Kraft inequality was derived for D-semifaithful codes, namely, codes that incur per-letter distortion that never exceeds D in the usual sense. An additional benefit of the derivation in [10] was that it could also deliver an $O(\log n/n)$ redundancy term on top of the SLB, at least for certain distortion measures for which there is an explicit expression of the cardinality (or the volume, in the continuous case) of a ball of normalized radius D in the source vector space.
In this work, we propose a few other extended versions of Kraft’s inequality that together pave the way to several further refinements and generalizations of the SLB. In these extensions of Kraft’s inequality, the idea is to upper bound the summation (or the integral, in the continuous case) of an exponentiated negative linear combination of the code length and the distortion incurred by each and every vector in the source space. By contrasting an upper bound with a lower bound to this quantity, we obtain several refinements and extensions to the SLB, which apply to the following scenarios.
1. One-to-one codes. Instead of assuming that the reproduction vectors are represented by uniquely decodable (UD) codes, we relax this restriction and allow any one-to-one code at the level of $n$-vectors. The lower bound then becomes the SLB minus an $O(\log n/n)$ term, similarly to the lossless case derived by Rissanen [11].
2. D-semifaithful codes. Similarly to [10], we consider D-semifaithful codes, but here we allow several simultaneous distortion criteria. Using saddle-point integration, we show that the resulting lower bound is given by the SLB plus $\frac{k\log n}{2n}+o\left(\frac{1}{n}\right)$, where $k$ is the “effective number” of distortion constraints. The concrete meaning of this term will be clarified in Section 4.2 (see the paragraph that includes Equation (40)). Generally speaking, it means the remaining number of independent constraints after removing the redundant (inactive) ones. This $O(\log n/n)$ redundancy term stems from a more accurate evaluation of the volume of a ball of radius $nD$ around each source word, compared to the one obtained by Chernoff bounding techniques and by the method of types in the finite-alphabet case. Interestingly, the $O(\log n/n)$ behavior is also coherent with that of earlier derived lower bounds to the rate-distortion function for $n$-blocks in the finite-alphabet case; see, e.g., [12] and references therein.
3. Sliding-window distortion functions. In some applications, one might wish to shape the spectrum or the memory properties of the reconstruction error signal. This can be performed by imposing additional distortion constraints defined by additive functions that operate on two or more consecutive samples of the error signal in a sliding-window fashion. Our framework is capable of incorporating such distortion functions and allowing a derivation of the generalized SLB for this case.
4. Individual sequences and finite-state encoders. By developing a generalized Kraft inequality, similar (but not identical) to the one by Ziv and Lempel [13] for finite-state encoders, we also derive an individual-sequence counterpart of the SLB, where the source entropy term is replaced by the Lempel–Ziv complexity.
As is well known, the classical SLB can be obtained far more simply than via the Kraft inequality; in particular, it follows from a straightforward direct manipulation of the mutual information. But it should be emphasized that the point of this article is not a quest for a simpler proof of the SLB. The point is that the path that goes via the Kraft inequality leads to the above-mentioned extensions and refinements.
The outline of the remaining part of this article is as follows. In Section 2, we establish some notation conventions (Section 2.1) and provide elementary background on the SLB (Section 2.2). In Section 3, we present and prove our extended Kraft inequality in several variations. In Section 4, we derive corresponding lower bounds, first for UD lossless compression of the reproduction data, then for one-to-one compression thereof (Section 4.1), and finally for D-semifaithful codes (Section 4.2). In Section 5, we address the case of sliding-window distortion functions. In Section 6, we first provide some background on finite-state compression of individual sequences and the LZ algorithm (Section 6.1) and then derive an individual-sequence counterpart of the SLB (Section 6.2). Finally, in Section 7, we summarize and conclude this paper.

2. Notation Conventions and Background

2.1. Notation Conventions

Throughout this paper, scalar random variables (RVs) will be denoted by capital letters, their sample values will be denoted by the respective lower-case letters, and their alphabets will be denoted by the respective calligraphic letters. A similar convention will apply to random vectors and their sample values, which will be denoted by the same symbols superscripted by the dimension. Thus, for example, $U^n$ ($n$ a positive integer) will denote a random $n$-vector $(U_1,\ldots,U_n)$, and $u^n=(u_1,\ldots,u_n)$ is a specific vector value in $\mathcal{U}^n$, the $n$-th Cartesian power of $\mathcal{U}$, which is the alphabet of each component of $u^n$. In some of our derivations below, there will be a need to refer to multiple copies of the vector $u^n$. In such cases, in order to avoid cumbersome subscripts and superscripts for indexing, we will use the alternative notation $u$ for $u^n$, and then the various copies will be denoted by $u_1$, $u_2$, etc. Returning to the first notation method, $u_i^j$ and $U_i^j$, where $i$ and $j$ are integers with $i\le j$, will designate the segments $(u_i,\ldots,u_j)$ and $(U_i,\ldots,U_j)$, respectively, where for $i=1$, the subscript will be omitted (as above). For $i>j$, $u_i^j$ (or $U_i^j$) will be understood as the null string. Logarithms and exponents, throughout this paper, will be understood to be taken to the base 2 unless specified otherwise. The indicator function of an event $\mathcal{A}$ will be denoted by $\mathcal{I}\{\mathcal{A}\}$, i.e., $\mathcal{I}\{\mathcal{A}\}=1$ if $\mathcal{A}$ occurs and $\mathcal{I}\{\mathcal{A}\}=0$ if not.
Sources and probability distributions associated with them will be denoted generically by the letter $P$, subscripted by the name of the RV and its conditioning, if applicable, exactly as in ordinary textbook notation standards; e.g., $P_{U^n}(u^n)$ is the probability function of $U^n$ at the point $U^n=u^n$, $P_{X|U^n}(x|u^n)$ is the conditional probability of $X=x$ given $U^n=u^n$, and so on. Whenever clear from the context, these subscripts will be omitted. Information theoretic quantities, like entropies and mutual informations, will be denoted following the usual conventions of the information theory literature, e.g., $H(U^n)$, $I(S;U^n|V^n)$, and so on. The differential entropy of a continuous valued RV, $U^n$, will be denoted by $h(U^n)$. The expectation operator will be denoted by $E\{\cdot\}$ and the probability of an event $\mathcal{A}$ will be denoted by $\Pr\{\mathcal{A}\}$.
It should be noted that our derivations apply both to discrete-alphabet sources and to continuous-alphabet sources. To avoid repetition, we henceforth carry on under the assumption of a continuous-alphabet source, with the understanding that in the discrete-alphabet case, integrations over $\mathcal{U}^n$ should simply be replaced by summations.
Let $U^n=(U_1,U_2,\ldots,U_n)$ denote a source vector, drawn from a stochastic process, $P$, whose alphabet is $\mathcal{U}$. The source vector is compressed by a lossy fixed-to-variable (F-V) source code, defined by an encoder $\phi_n:\mathcal{U}^n\to\mathcal{B}_n\subseteq\{0,1\}^*$ and a decoder $\psi_n:\mathcal{B}_n\to\mathcal{V}^n\subseteq\mathcal{U}^n$, where $\mathcal{B}_n$ denotes a certain subset of the set of all binary variable-length strings, $\{0,1\}^*$. Without loss of generality (and optimality), it is assumed that the encoder $\phi_n$ can be viewed as a cascade of a reproduction encoder (vector quantizer) $\mathcal{U}^n\to\mathcal{V}^n$ and a uniquely decodable (UD) lossless code $\mathcal{V}^n\to\mathcal{B}_n$. In a certain part of our results, this unique decodability assumption will be partially relaxed to become the less demanding assumption of a one-to-one mapping. Let $L[\phi_n(u^n)]$ denote the length (in bits) of the compressed codeword, $\phi_n(u^n)$.
Assuming that $\mathcal{U}$ is a group with certain addition and subtraction operations (e.g., modulo-$K$ addition/subtraction for $\mathcal{U}=\{0,1,\ldots,K-1\}$ for a finite positive integer $K$, or ordinary addition/subtraction for $\mathcal{U}=\mathbb{R}$), we will focus on additive difference distortion measures, where the distortion between the source vector $u^n\in\mathcal{U}^n$ and the reproduction vector $v^n=\psi_n(\phi_n(u^n))\in\mathcal{V}^n$ will be given by
$$d(u^n,v^n)=\sum_{i=1}^n d(u_i,v_i)=\sum_{i=1}^n\rho(u_i-v_i),$$
where $\rho(z)$, $z\in\mathcal{U}$, is a certain non-negative function, which vanishes if and only if $z=0$, for example, $\rho(z)=|z|$, $\rho(z)=z^2$, etc. For the sake of convenience, we will sometimes denote $d(u^n,v^n)$ by $\rho(u^n-v^n)$, which for a given encoder–decoder pair is also $\rho(u^n-\psi_n(\phi_n(u^n)))$. Given an encoder–decoder pair, $(\phi_n,\psi_n)$, the expected distortion, $\sum_{i=1}^n E\{\rho(U_i-V_i)\}$, will be constrained to be less than or equal to $nD$, where $D>0$ designates the per-letter distortion level allowed. In certain parts of our derivations, a more restrictive pointwise distortion constraint will be imposed, i.e., $\rho(u^n-\psi_n(\phi_n(u^n)))\le nD$ for all $u^n\in\mathcal{U}^n$. In other parts, more than one difference distortion measure will play a role, and accordingly, more than one distortion constraint will be imposed. Given $k$ difference distortion functions, $\rho_j(\cdot)$, $j=1,2,\ldots,k$, we then require $E\{\rho_j(U^n-\psi_n(\phi_n(U^n)))\}\le nD_j$ (or $\max_{u^n\in\mathcal{U}^n}\rho_j(u^n-\psi_n(\phi_n(u^n)))\le nD_j$) for all $j=1,2,\ldots,k$. In another part of our results, we also allow distortion functions to be additive sliding-window functions operating on $m$ consecutive symbols of the difference $z^n=u^n-v^n$; i.e.,
$$\rho(z^n)=\sum_{i=m}^n\rho(z_{i-m+1}^i).$$
This corresponds to situations where we wish to shape not only the ‘intensity’ of the error signal, $z^n$, but also its memory properties, for example, the correlations between consecutive symbols of $z^n$. We will elaborate more on this in Section 5.
Similarly to $u^n$, for which we adopt the alternative notation $u$, the same will apply to $v^n$, which will also be denoted by $v$, along with its multiple copies $v_1$, $v_2$, and so on.

2.2. Background

For a continuous-alphabet memoryless source P, the SLB is given by
$$R(D)\ge R_{\mathrm{SLB}}(D)=h(U)-\Phi(D),$$
where $h(U)$ is the differential entropy of a single symbol $U$ and
$$\Phi(D)=\sup_{\{Z:\ E\{\rho(Z)\}\le D\}}h(Z)=\inf_{\beta\ge 0}\left[\beta D+\log\int_{\mathcal{U}}2^{-\beta\rho(z)}\,\mathrm{d}z\right],\qquad(4)$$
assuming that $\int_{\mathcal{U}}2^{-\beta\rho(z)}\,\mathrm{d}z<\infty$ for some $\beta>0$. The equivalence between the two expressions of $\Phi(D)$ can be easily shown using standard techniques. For the sake of completeness, we prove this equivalence in Appendix A (see also Section 4.3.1 in [1] and Sections 4.3 and 4.6 in [2]). The advantage of the second formula of $\Phi(D)$ is that it involves optimization over one parameter only, as opposed to variational calculus in the first formula. Clearly, if the source is discrete rather than continuous, the differential entropy, $h(U)$, should be replaced by the ordinary entropy, $H(U)$, and the integration over $\mathcal{U}$ should be replaced by summation, as indicated above.
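To make the second formula of $\Phi(D)$ concrete, here is a minimal numerical sketch (our own illustration, not part of the original derivations; it assumes NumPy and SciPy are available, and the function name, the distortion level, and the truncated integration range are ours). It performs the one-dimensional minimization over $\beta$ for the quadratic distortion $\rho(z)=z^2$ and compares the result with the known closed form $\frac{1}{2}\log(2\pi eD)$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def phi(D, rho, z_range=50.0):
    """Phi(D) = inf_{beta >= 0} [beta*D + log2 of integral of 2^{-beta*rho(z)} dz],
    evaluated by one-dimensional numerical minimization over beta."""
    def objective(beta):
        integral, _ = quad(lambda z: 2.0 ** (-beta * rho(z)), -z_range, z_range)
        return beta * D + np.log2(integral)
    res = minimize_scalar(objective, bounds=(1e-6, 1e3), method="bounded")
    return res.fun

D = 0.25
print(phi(D, lambda z: z ** 2))             # numerical evaluation
print(0.5 * np.log2(2 * np.pi * np.e * D))  # closed form for rho(z) = z^2
```

Both printed values are approximately 1.047 bits; indeed, for $\rho(z)=z^2$, the minimizing $\beta$ is $1/(2D\ln 2)$, and the optimal $Z$ in the first formula of $\Phi(D)$ is Gaussian with variance $D$.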
For a source with memory, the SLB is given by
$$R_n(D)=\min_{\{P_{V^n|U^n}:\ E\{\rho(U^n-V^n)\}\le nD\}}\frac{I(U^n;V^n)}{n}\ge\frac{h(U^n)}{n}-\Phi(D),$$
which in the limit of large $n$ becomes
$$R(D)\ge\lim_{n\to\infty}\frac{h(U^n)}{n}-\Phi(D)=\bar{h}(U)-\Phi(D),$$
where $\bar{h}(U)$ is the differential entropy rate of the source. In all cases, the function $\Phi(D)$ remains as in (4).

3. Extended Kraft Inequalities

For a given encoder–decoder pair, $(\phi_n,\psi_n)$, and parameters $\alpha>1$ and $\beta\ge 0$, define the extended Kraft integral (or the extended Kraft sum, in the discrete-alphabet case) as follows:
$$Z_n(\alpha,\beta)=\int_{\mathcal{U}^n}\exp_2\{-\alpha L[\phi_n(u^n)]-\beta\rho(u^n-\psi_n(\phi_n(u^n)))\}\,\mathrm{d}u^n.$$
Lemma 1.
Let $(\phi_n,\psi_n)$ induce a UD lossless compression of $v^n=\psi_n(\phi_n(u^n))$. Then, for every $\alpha>1$ and $\beta\ge 0$,
$$Z_n(\alpha,\beta)\le\left[\int_{\mathcal{U}}2^{-\beta\rho(z)}\,\mathrm{d}z\right]^n.$$
Proof of Lemma 1.
Let us examine the expression of $[Z_n(\alpha,\beta)]^k$ for an arbitrary positive integer $k$:
$$\begin{aligned}
[Z_n(\alpha,\beta)]^k&=\left[\int_{\mathcal{U}^n}\exp_2\{-\alpha L[\phi_n(u)]-\beta\rho(u-\psi_n(\phi_n(u)))\}\,\mathrm{d}u\right]^k\\
&=\int_{\mathcal{U}^n}\mathrm{d}u_1\cdots\int_{\mathcal{U}^n}\mathrm{d}u_k\,\exp_2\left\{-\alpha\sum_{i=1}^kL[\phi_n(u_i)]-\beta\sum_{i=1}^k\rho(u_i-\psi_n(\phi_n(u_i)))\right\}\\
&=\sum_{l=1}^{\infty}\ \sum_{\{\{v_i\}_{i=1}^k:\ \sum_{i=1}^kL(v_i)=l\}}\ \int_{\{u_1:\ \psi_n(\phi_n(u_1))=v_1\}}\mathrm{d}u_1\cdots\int_{\{u_k:\ \psi_n(\phi_n(u_k))=v_k\}}\mathrm{d}u_k\,\exp_2\left\{-\alpha l-\beta\sum_{i=1}^k\rho(u_i-v_i)\right\}\\
&\le\sum_{l=1}^{\infty}2^{-\alpha l}\ \sum_{\{\{v_i\}_{i=1}^k:\ \sum_{i=1}^kL(v_i)=l\}}\ \int_{\mathcal{U}^n}\mathrm{d}z_1\cdots\int_{\mathcal{U}^n}\mathrm{d}z_k\,\exp_2\left\{-\beta\sum_{i=1}^k\rho(z_i)\right\}\\
&\le\sum_{l=1}^{\infty}2^{-\alpha l}\cdot 2^l\cdot\left[\int_{\mathcal{U}^n}\mathrm{d}z^n\,\exp_2\{-\beta\rho(z^n)\}\right]^k\\
&=\left[\int_{\mathcal{U}^n}\mathrm{d}z^n\,\exp_2\{-\beta\rho(z^n)\}\right]^k\cdot\sum_{l=1}^{\infty}2^{-(\alpha-1)l}\\
&=\frac{\left[\int_{\mathcal{U}}\mathrm{d}z\,\exp_2\{-\beta\rho(z)\}\right]^{nk}}{2^{\alpha-1}-1},
\end{aligned}$$
where in the first inequality we transformed each integration variable $u_i$ into $z_i=u_i-v_i$ (a transformation with unit Jacobian) and expanded the integration domain from $\{u_i:\ \psi_n(\phi_n(u_i))=v_i\}$ to $\mathcal{U}^n$, and in the second inequality we used the fact that, by unique decodability, the number of combinations $\{v_i\}_{i=1}^k$ whose total code length is $l$ cannot exceed $2^l$. Consequently,
$$Z_n(\alpha,\beta)\le\frac{\left[\int_{\mathcal{U}}\mathrm{d}z\,\exp_2\{-\beta\rho(z)\}\right]^n}{(2^{\alpha-1}-1)^{1/k}},$$
which upon taking the limit $k\to\infty$ becomes
$$Z_n(\alpha,\beta)\le\left[\int_{\mathcal{U}}2^{-\beta\rho(z)}\,\mathrm{d}z\right]^n,$$
completing the proof of Lemma 1. □
For one-to-one codes, instead of taking the limit $k\to\infty$ in the proof of Lemma 1, we simply set $k=1$, since the one-to-one property is merely imposed at the level of a single $n$-block rather than at the level of concatenations of blocks, as in UD codes. This yields the following variation of Lemma 1.
Lemma 2.
Let $(\phi_n,\psi_n)$ induce a one-to-one mapping between $\phi_n(u^n)$ and $v^n=\psi_n(\phi_n(u^n))$. Then, for every $\alpha>1$ and $\beta\ge 0$,
$$Z_n^{1-1}(\alpha,\beta)\le\frac{\left[\int_{\mathcal{U}}2^{-\beta\rho(z)}\,\mathrm{d}z\right]^n}{2^{\alpha-1}-1},$$
where $Z_n^{1-1}(\alpha,\beta)$ is defined exactly as $Z_n(\alpha,\beta)$, except that it may apply to the larger class of one-to-one codes, rather than UD codes.
When $\alpha\in(1,2)$, the denominator, $2^{\alpha-1}-1$, is smaller than unity, and then the upper bound to $Z_n^{1-1}(\alpha,\beta)$ in Lemma 2 is larger than that of Lemma 1. Indeed, the interesting region for selecting the value of $\alpha$ is in the vicinity of unity, where $2^{\alpha-1}-1<1$.
Returning to the class of UD lossless encodings of $v^n=\psi_n(\phi_n(u^n))$, two additional variations of the above extended Kraft inequality can be considered. The first pertains to D-semifaithful codes, namely, codes for which the distortion is restricted to never exceed $nD$, pointwise, and not merely in expectation, that is, $\max_{u^n\in\mathcal{U}^n}\rho(u^n-\psi_n(\phi_n(u^n)))\le nD$, where $D$ is the allowed per-letter distortion. The second variation is associated with fixed-rate codes, i.e., codes for which $L[\phi_n(u^n)]=nR$ for all $u^n\in\mathcal{U}^n$, where $R>0$ is the allowed coding rate.
For D-semifaithful codes, let us redefine the integrand of the extended Kraft integral by replacing the term $\beta\rho(u^n-\psi_n(\phi_n(u^n)))$ in the exponent with the function $W(\rho(u^n-\psi_n(\phi_n(u^n)))-nD)$, where $W[\cdot]$ is the infinite well function (IWF),
$$W(t)=\begin{cases}0 & t\le 0\\ \infty & t>0\end{cases}\qquad(13)$$
This causes the integrand of the extended Kraft integral to vanish wherever the distortion constraint is violated, and then the extended Kraft integral becomes
$$Z_n^{D\mathrm{-sf}}(\alpha)=\int_{S_n(D)}\exp_2\{-\alpha L[\phi_n(u^n)]\}\,\mathrm{d}u^n,$$
where $S_n(D)=\{u^n:\ \rho(u^n-\psi_n(\phi_n(u^n)))\le nD\}$. In this case, a simple modification of the proof of Lemma 1 yields the following version of the extended Kraft inequality (see also [10]).
Lemma 3.
Let $(\phi_n,\psi_n)$ be a D-semifaithful code that comprises UD lossless encoding of $v^n=\psi_n(\phi_n(u^n))$. Then, for every $\alpha>1$,
$$Z_n^{D\mathrm{-sf}}(\alpha)\le\mathrm{Vol}\{z^n:\ \rho(z^n)\le nD\}.$$
Here too, if the UD property is replaced by the one-to-one property, then the right-hand side should be divided by $2^{\alpha-1}-1$.
Finally, for fixed-rate codes, let us replace the term $L[\phi_n(u^n)]$ of the Kraft integrand by $nR$ and return the second term therein to $\beta\rho(u^n-\psi_n(\phi_n(u^n)))$. By similar manipulations of the proof of Lemma 1, we find that for fixed-rate codes, the Kraft integral becomes
$$Z_n^{\mathrm{fr}}(\alpha,\beta)=\int_{\mathcal{U}^n}\exp_2\{-\alpha nR-\beta\rho(u^n-\psi_n(\phi_n(u^n)))\}\,\mathrm{d}u^n,$$
with the following version of the extended Kraft inequality.
Lemma 4.
Let $(\phi_n,\psi_n)$ be a rate-$R$ fixed-rate code. Then, for every $\alpha\ge 0$ and $\beta\ge 0$,
$$Z_n^{\mathrm{fr}}(\alpha,\beta)\le 2^{n(1-\alpha)R}\cdot\left[\int_{\mathcal{U}}2^{-\beta\rho(z)}\,\mathrm{d}z\right]^n.$$

4. Lower Bounds

To fix ideas, we first demonstrate how the classical SLB is obtained from Lemma 1. Consider an arbitrary encoder–decoder pair, $(\phi_n,\psi_n)$, in which the reproduction vector, $v^n=\psi_n(\phi_n(u^n))$, is losslessly compressed by a UD code. Then, following Lemma 1, we have the following chain of inequalities:
$$\begin{aligned}
\left[\int_{\mathcal{U}}2^{-\beta\rho(z)}\,\mathrm{d}z\right]^n&\stackrel{\mathrm{(a)}}{\ge}Z_n(\alpha,\beta)\\
&=\int_{\mathcal{U}^n}\exp_2\{-\alpha L[\phi_n(u^n)]-\beta\rho(u^n-\psi_n(\phi_n(u^n)))\}\,\mathrm{d}u^n\\
&=\int_{\mathcal{U}^n}P(u^n)\exp_2\{-\alpha L[\phi_n(u^n)]-\beta\rho(u^n-\psi_n(\phi_n(u^n)))-\log P(u^n)\}\,\mathrm{d}u^n\\
&=E\left\{\exp_2\{-\alpha L[\phi_n(U^n)]-\beta\rho(U^n-\psi_n(\phi_n(U^n)))-\log P(U^n)\}\right\}\\
&\stackrel{\mathrm{(b)}}{\ge}\exp_2\left\{-\alpha E\{L[\phi_n(U^n)]\}-\beta E\{\rho(U^n-\psi_n(\phi_n(U^n)))\}-E\{\log P(U^n)\}\right\}\\
&=\exp_2\left\{-\alpha E\{L[\phi_n(U^n)]\}-\beta E\{\rho(U^n-\psi_n(\phi_n(U^n)))\}+h(U^n)\right\},
\end{aligned}\qquad(18)$$
where (a) stems from Lemma 1 and (b) is due to Jensen’s inequality applied to the convex function $f(t)=2^{-t}$. It follows that
$$\alpha E\{L[\phi_n(U^n)]\}+\beta E\{\rho(U^n-\psi_n(\phi_n(U^n)))\}\ge h(U^n)-n\log\int_{\mathcal{U}}2^{-\beta\rho(z)}\,\mathrm{d}z.\qquad(19)$$
Since this holds true for every $\alpha>1$ while the right-hand side is independent of $\alpha$, we may take the infimum of the left-hand side over the range $\alpha>1$ to obtain, after normalization by $n$,
$$\frac{E\{L[\phi_n(U^n)]\}}{n}+\beta\cdot\frac{E\{\rho(U^n-\psi_n(\phi_n(U^n)))\}}{n}\ge\frac{h(U^n)}{n}-\log\int_{\mathcal{U}}2^{-\beta\rho(z)}\,\mathrm{d}z.\qquad(20)$$
At this point, there are several possible perspectives that can be adopted regarding Equation (20). The first is, of course, to view this as a lower bound to the Lagrangian of rate and distortion. The second is to impose an expected distortion constraint, $E\{\rho(U^n-\psi_n(\phi_n(U^n)))\}\le nD$, and then Equation (20) would yield a lower bound to the expected rate according to
$$\frac{E\{L[\phi_n(U^n)]\}}{n}\ge\frac{h(U^n)}{n}-\log\int_{\mathcal{U}}2^{-\beta\rho(z)}\,\mathrm{d}z-\beta D.$$
Since this holds true for every $\beta\ge 0$ while the left-hand side is independent of $\beta$, we may maximize the right-hand side over $\beta\ge 0$ to obtain
$$\frac{E\{L[\phi_n(U^n)]\}}{n}\ge\frac{h(U^n)}{n}-\inf_{\beta\ge 0}\left[\log\int_{\mathcal{U}}2^{-\beta\rho(z)}\,\mathrm{d}z+\beta D\right]=\frac{h(U^n)}{n}-\Phi(D),$$
thus recovering the classical SLB. In the case of multiple distortion constraints, say,
$$E\{\rho_j(U^n-\psi_n(\phi_n(U^n)))\}\le nD_j,\qquad j=1,2,\ldots,k,$$
the above rate lower bound continues to apply, provided that $\beta$, $D$, and $\rho(z)$ are redefined as $k$-dimensional vectors, $\beta=(\beta_1,\ldots,\beta_k)$, $D=(D_1,\ldots,D_k)$, and $\rho(z)=(\rho_1(z),\ldots,\rho_k(z))$, and accordingly, $\beta\cdot D$ and $\beta\cdot\rho(z)$ are understood to be inner products. The infimum over $\beta$ is then taken across $[0,\infty)^k$.
Returning to Equation (20) and to a single distortion criterion, we may alternatively apply a rate constraint, $E\{L[\phi_n(U^n)]\}\le nR$, and then obtain a lower bound to the expected distortion:
$$\frac{E\{\rho(U^n-\psi_n(\phi_n(U^n)))\}}{n}\ge\sup_{\beta\ge 0}\frac{1}{\beta}\left[\frac{h(U^n)}{n}-\log\int_{\mathcal{U}}2^{-\beta\rho(z)}\,\mathrm{d}z-R\right]=\sup_{\gamma\ge 0}\gamma\left[\frac{h(U^n)}{n}-\log\int_{\mathcal{U}}2^{-\rho(z)/\gamma}\,\mathrm{d}z-R\right],$$
which is the distortion-rate counterpart of the SLB.
Having established the classical SLB for general variable-rate codes with UD lossless compression of the reproduction vectors, we now carry on to derive some refinements and extensions associated with the other types of codes that we mentioned. The idea is to apply the same chain of inequalities as in (18) but to invoke Lemma 2, Lemma 3, or Lemma 4, according to the relevant class of codes, instead of Lemma 1. We next implement this plan for one-to-one codes and for D-semifaithful codes. The same methodology can be applied to fixed-rate codes, but we will not delve into this here.

4.1. One-to-One Codes

For one-to-one codes, we repeat the same derivation as in (18) by invoking Lemma 2 instead of Lemma 1. This yields the following modified version of Equation (19):
$$\alpha E\{L[\phi_n(U^n)]\}+\beta E\{\rho(U^n-\psi_n(\phi_n(U^n)))\}\ge h(U^n)-n\log\int_{\mathcal{U}}2^{-\beta\rho(z)}\,\mathrm{d}z+\log(2^{\alpha-1}-1).$$
Applying the average distortion constraint, optimizing over $\beta$, and normalizing by $n$, we obtain
$$\frac{E\{L[\phi_n(U^n)]\}}{n}\ge\sup_{\alpha>1}\left[\frac{h(U^n)/n-\Phi(D)}{\alpha}+\frac{\log(2^{\alpha-1}-1)}{\alpha n}\right].$$
The maximization of the right-hand side with respect to $\alpha$ does not seem to lend itself to a closed-form expression, but by selecting $\alpha=\alpha_n=1+\frac{c\log n}{n}$ ($c$ being an arbitrary positive constant), it is readily seen that the resulting lower bound becomes
$$\frac{E\{L[\phi_n(U^n)]\}}{n}\ge\frac{h(U^n)}{n}-\Phi(D)-O\left(\frac{\log n}{n}\right),$$
which is in agreement with the subtraction of $O((\log n)/n)$ below the entropy for lossless one-to-one codes [11]. This reduction of $O((\log n)/n)$ is due to the fact that the class of one-to-one codes is broader than the class of UD codes, and therefore the former codes are potentially more capable for a given finite $n$, albeit the difference fades away as $n$ grows.
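The order of this reduction can also be seen numerically. The following small sketch (our own illustration; R_slb stands for a hypothetical value of $h(U^n)/n-\Phi(D)$, so only the order of the deficit is meaningful) evaluates the gap of the lower bound below the SLB at $\alpha_n=1+c\log n/n$ and compares it with $(\log n)/n$:

```python
import numpy as np

def deficit(n, c=1.0, R_slb=1.0):
    """Gap between the SLB and the one-to-one lower bound evaluated at
    alpha_n = 1 + c*log2(n)/n; R_slb plays the role of h(U^n)/n - Phi(D)."""
    alpha = 1.0 + c * np.log2(n) / n
    bound = R_slb / alpha + np.log2(2.0 ** (alpha - 1.0) - 1.0) / (alpha * n)
    return R_slb - bound

for n in [10**2, 10**3, 10**4, 10**5]:
    print(n, deficit(n), np.log2(n) / n)   # both columns decay like (log n)/n
```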

4.2. D-Semifaithful Codes

For D-semifaithful codes with UD correspondence between the compressed bit-stream and the reproduction, we obtain from Lemma 3, combined with a derivation like (18),
$$E\{L[\phi_n(U^n)]\}\ge h(U^n)-\log\mathrm{Vol}\{S_n(D)\},\qquad(28)$$
which is in line with the results of [10], and the remaining issue becomes the assessment of the log-volume of $S_n(D)$, or an evaluation of a good upper bound to this quantity. One way to proceed is to apply a Chernoff bound, as was suggested by Campbell [9]. For $\mathcal{U}=\mathbb{R}$, this amounts to the following derivation:
$$\begin{aligned}
\log\mathrm{Vol}\{S_n(D)\}&=\log\mathrm{Vol}\{z:\ \rho(z)\le nD\}\\
&=\log\int_{\mathbb{R}^n}\mathcal{I}\{\rho(z)\le nD\}\,\mathrm{d}z\\
&\le\log\inf_{\beta\ge 0}\int_{\mathbb{R}^n}2^{\beta[nD-\rho(z)]}\,\mathrm{d}z\\
&=n\cdot\inf_{\beta\ge 0}\left[\beta D+\log\int_{\mathbb{R}}2^{-\beta\rho(z)}\,\mathrm{d}z\right]\\
&=n\Phi(D),
\end{aligned}\qquad(29)$$
and we are back to the ordinary SLB,
$$\frac{E\{L[\phi_n(U^n)]\}}{n}\ge\frac{h(U^n)}{n}-\Phi(D),$$
exactly as we had for UD lossless compression of the reproduction data, and once again, this derivation extends straightforwardly to the case of multiple simultaneous distortion constraints by considering β , D, and ρ ( · ) to be vectors rather than scalars, as mentioned before.
However, since the class of D-semifaithful codes is narrower (and hence more limited to a certain extent) than the class of codes that merely comply with an average distortion constraint, it is conceivable to expect a somewhat tighter (larger) lower bound. Indeed, this turns out to be the case if the log-volume of S n ( D ) is estimated using a more sophisticated analysis tool, namely, the saddle-point method (see, e.g., Chapter 5 in [14] and Chapter 3 in [15], as well as its extension to the multivariate case [16]).
To apply the saddle-point method, the idea is to represent the indicator function in the integrand of the second line in (29) as $\mathcal{I}\{\rho(z)\le nD\}=u(nD-\rho(z))$, where $u(t)$ is the unit step function, defined as
$$u(t)=\begin{cases}0 & t<0\\ 1 & t\ge 0\end{cases}$$
which in turn is represented as the inverse Laplace transform of the complex function $U(s)=\frac{1}{s}$, i.e.,
$$u(t)=\frac{1}{2\pi i}\int_{\mathrm{Re}\{s\}=c}\frac{e^{st}\,\mathrm{d}s}{s},$$
where $i=\sqrt{-1}$ and $c$ is an arbitrary positive real. It follows that
$$\begin{aligned}
\mathrm{Vol}\{S_n(D)\}&=\int_{\mathbb{R}^n}u(nD-\rho(z))\,\mathrm{d}z\\
&=\int_{\mathbb{R}^n}\mathrm{d}z\,\frac{1}{2\pi i}\int_{\mathrm{Re}\{s\}=c}e^{s(nD-\rho(z))}\,\frac{\mathrm{d}s}{s}\\
&=\frac{1}{2\pi i}\int_{\mathrm{Re}\{s\}=c}e^{snD}\,\frac{\mathrm{d}s}{s}\int_{\mathbb{R}^n}e^{-s\rho(z)}\,\mathrm{d}z\\
&=\frac{1}{2\pi i}\int_{\mathrm{Re}\{s\}=c}e^{snD}\,\frac{\mathrm{d}s}{s}\int_{\mathbb{R}^n}\exp\left\{-s\sum_{t=1}^n\rho(z_t)\right\}\mathrm{d}z\\
&=\frac{1}{2\pi i}\int_{\mathrm{Re}\{s\}=c}e^{snD}\,\frac{\mathrm{d}s}{s}\left[\int_{\mathbb{R}}e^{-s\rho(z)}\,\mathrm{d}z\right]^n\\
&=\frac{1}{2\pi i}\int_{\mathrm{Re}\{s\}=c}\frac{\mathrm{d}s}{s}\,\exp\left\{n\left[sD+\ln\int_{\mathbb{R}}e^{-s\rho(z)}\,\mathrm{d}z\right]\right\}.
\end{aligned}$$
This path integral in the complex plane complies with the general form $\int_A^Bg(s)e^{nf(s)}\,\mathrm{d}s$, which under certain regularity conditions can be approximated for large $n$ (see, e.g., Equation (5.7.2) in [14]) according to
$$\int_A^Bg(s)e^{nf(s)}\,\mathrm{d}s=e^{i\theta}\sqrt{\frac{2\pi}{n|f''(s^*)|}}\cdot g(s^*)e^{nf(s^*)}\cdot\left[1+O\left(\frac{1}{n}\right)\right],$$
provided that the functions $f$ and $g$ are independent of $n$ and analytic within some connected region $\mathcal{D}$ that includes $A$ and $B$ (which are also independent of $n$). Here, $s^*\in\mathcal{D}$ is a saddle-point, i.e., a point where $f'(s^*)=0$, $f''(s^*)>0$, and $g(s^*)\neq 0$. The angle $\theta$ is called the axis and is given by $\theta=(\pi-\arg\{f''(s^*)\})/2$. In our case, $f(s)=sD+\ln\int_{\mathbb{R}}e^{-s\rho(z)}\,\mathrm{d}z$, $g(s)=\frac{1}{s}$, $\theta=\frac{\pi}{2}$, and $s^*$ is the point at which $f'$ vanishes, which is assumed strictly positive. But this point of zero derivative is also the point that minimizes the convex function $f$ across the positive reals. It follows then that $\mathrm{Vol}\{S_n(D)\}$ can be approximated as follows:
$$\mathrm{Vol}\{S_n(D)\}=\frac{1}{s^*\sqrt{2\pi n|f''(s^*)|}}\cdot\exp\left\{n\left[s^*D+\ln\int_{\mathbb{R}}e^{-s^*\rho(z)}\,\mathrm{d}z\right]\right\}\cdot\left[1+O\left(\frac{1}{n}\right)\right].$$
As for the exponential factor, observe that
$$\begin{aligned}
\exp\left\{n\left[s^*D+\ln\int_{\mathbb{R}}e^{-s^*\rho(z)}\,\mathrm{d}z\right]\right\}&=\exp\left\{n\cdot\inf_{s\ge 0}\left[sD+\ln\int_{\mathbb{R}}e^{-s\rho(z)}\,\mathrm{d}z\right]\right\}\\
&=\exp_2\left\{n(\log_2e)\cdot\inf_{s\ge 0}\left[sD+\frac{1}{\log_2e}\cdot\log_2\int_{\mathbb{R}}2^{-s\rho(z)\log_2e}\,\mathrm{d}z\right]\right\}\\
&=\exp_2\left\{n\inf_{s\ge 0}\left[sD\log_2e+\log_2\int_{\mathbb{R}}2^{-s\rho(z)\log_2e}\,\mathrm{d}z\right]\right\}\\
&=\exp_2\left\{n\inf_{\beta\ge 0}\left[\beta D+\log_2\int_{\mathbb{R}}2^{-\beta\rho(z)}\,\mathrm{d}z\right]\right\}\\
&=2^{n\Phi(D)},
\end{aligned}$$
and so,
$$\mathrm{Vol}\{S_n(D)\}=\frac{2^{n\Phi(D)}}{s^*\sqrt{2\pi n|f''(s^*)|}}\cdot\left[1+O\left(\frac{1}{n}\right)\right],$$
which yields
$$\frac{\log\mathrm{Vol}\{S_n(D)\}}{n}=\Phi(D)-\frac{\log n}{2n}-O\left(\frac{1}{n}\right),$$
and then, following Equation (28), we end up with
$$\frac{E\{L[\phi_n(U^n)]\}}{n}\ge\frac{h(U^n)}{n}-\Phi(D)+\frac{\log n}{2n}+O\left(\frac{1}{n}\right).$$
We therefore observe that the SLB for D-semifaithful codes is given by the ordinary SLB plus a redundancy term whose leading term is $\frac{\log n}{2n}$. This is in agreement with the findings of [10], where the derivations corresponded to special cases for which simple geometric and/or combinatorial considerations facilitated the accurate characterization of $\log\mathrm{Vol}\{S_n(D)\}$, but here the conclusion is more general. Note that we could have been even more precise and also specified the $O\left(\frac{1}{n}\right)$ term to be $\frac{1}{n}\cdot\log\left[s^*\sqrt{2\pi|f''(s^*)|}\right]+O\left(\frac{1}{n^2}\right)$, but this is, of course, less important.
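For the special case of the quadratic distortion $\rho(z)=z^2$, the saddle-point approximation of $\mathrm{Vol}\{S_n(D)\}$ can be checked against the exact volume of the Euclidean ball $\{z:\ \sum_{t=1}^nz_t^2\le nD\}$. The following sketch is our own sanity check (not part of the derivation above; natural logarithms are used throughout); it relies on $f(s)=sD+\frac{1}{2}\ln(\pi/s)$, $s^*=1/(2D)$, $f(s^*)=\frac{1}{2}\ln(2\pi eD)$, and $f''(s^*)=2D^2$:

```python
import numpy as np
from scipy.special import gammaln

def log_vol_exact(n, D):
    """ln Vol of the n-ball {z : sum z_t^2 <= n*D}, i.e., of radius sqrt(n*D)."""
    return 0.5 * n * np.log(np.pi * n * D) - gammaln(0.5 * n + 1.0)

def log_vol_saddle(n, D):
    """Saddle-point approximation for rho(z) = z^2 (natural-log version)."""
    s_star = 1.0 / (2.0 * D)
    f_star = 0.5 * np.log(2.0 * np.pi * np.e * D)   # f(s*) = Phi(D) in nats
    f2 = 2.0 * D ** 2                               # f''(s*)
    return n * f_star - np.log(s_star * np.sqrt(2.0 * np.pi * n * f2))

for n in [10, 100, 1000]:
    print(n, log_vol_exact(n, 1.0), log_vol_saddle(n, 1.0))
```

Already at $n=10$, the two log-volumes differ only in the second decimal digit, and the gap shrinks like $O(1/n)$, consistently with the refined bound above.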
When $k$ simultaneous distortion constraints are imposed pointwise, i.e.,
$$\max_{u^n\in\mathbb{R}^n}\rho_j(u^n-\psi_n(\phi_n(u^n)))\le nD_j,\qquad j=1,2,\ldots,k,\qquad(40)$$
then using the multivariate saddle-point method [16], the above derivation extends with the vector version of the definition of $\Phi(D)$ replacing the scalar one, and the pre-exponential factor of the saddle-point approximation becomes
$$\frac{(2\pi/n)^{k/2}}{S^*\sqrt{\det\{\mathrm{Hess}\{f\}|_{s^*}\}}},$$
where $S^*$ is the product of the components of the $k$-dimensional vector $s^*$ — the point at which $\nabla f(s^*)=0$, where $f(s)=s\cdot D+\ln\int_{\mathbb{R}}e^{-s\cdot\rho(z)}\,\mathrm{d}z$ is defined such that $s$, $D$, and $\rho(\cdot)$ are $k$-dimensional vectors, with $s\cdot D$ and $s\cdot\rho(z)$ being inner products, as was defined before. Here, $\det\{\mathrm{Hess}\{f\}|_{s^*}\}$ is the determinant of the Hessian of $f$, computed at $s=s^*$. It is assumed, of course, that the $k\times k$ matrix $\mathrm{Hess}\{f\}|_{s^*}$ is non-singular; otherwise, there might be redundant (inactive) constraints, which should be removed from the calculation. In this case, the redundancy on top of the ordinary SLB is $\frac{k'\log n}{2n}+O\left(\frac{1}{n}\right)$, i.e.,
$$\frac{E\{L[\phi_n(U^n)]\}}{n}\ge\frac{h(U^n)}{n}-\Phi(D)+\frac{k'\log n}{2n}+O\left(\frac{1}{n}\right),$$
where $k'\le k$ is the effective dimension after the possible removal of redundant constraints.
For example, if $k=2$, $\rho_1(z)=|z|$, and $\rho_2(z)=z^2$, then obviously, $\left[\frac{1}{n}\sum_{t=1}^n|z_t|\right]^2$ cannot exceed $\frac{1}{n}\sum_{t=1}^nz_t^2$, and so, if $D_1\ge\sqrt{D_2}$, the constraint $\sum_{t=1}^n|z_t|\le nD_1$ is inactive (and hence removable) in the presence of the constraint $\sum_{t=1}^nz_t^2\le nD_2$. In this case, although $k=2$, the effective dimension is $k'=1$. Another obvious example of a superfluous distortion constraint occurs when there is a linear dependence. For instance, let $k=3$ and $\rho_3(z)=a\rho_1(z)+b\rho_2(z)$, where $a$ and $b$ are fixed positive reals. Then, whenever $D_3\ge aD_1+bD_2$, the third distortion constraint is redundant, and then $k'=2$ (or even less, if there are additional superfluous constraints). More generally, inactive constraints can be identified as those whose corresponding components of $s^*$ vanish.
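The identification of inactive constraints by vanishing components of the minimizer can also be observed numerically. The following sketch (our own construction; the distortion levels are arbitrary illustrative choices) minimizes the vector version of the $\Phi(D)$ objective over $\beta\in[0,\infty)^2$ for $\rho_1(z)=|z|$ and $\rho_2(z)=z^2$ with $D_1=D_2=1$, so that $D_1\ge\sqrt{D_2}$ and the first component of the minimizer should vanish:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

def vector_phi(D, rhos, z_range=50.0):
    """Vector version of Phi(D): minimize beta.D + log2 of the integral of
    2^{-beta.rho(z)} over beta >= 0 (componentwise)."""
    def objective(beta):
        integrand = lambda z: 2.0 ** (-sum(b * r(z) for b, r in zip(beta, rhos)))
        integral, _ = quad(integrand, -z_range, z_range)
        return float(np.dot(beta, D)) + np.log2(integral)
    res = minimize(objective, x0=np.ones(len(D)), bounds=[(0.0, None)] * len(D))
    return res.fun, res.x

value, beta_star = vector_phi([1.0, 1.0], [abs, lambda z: z * z])
print(value, beta_star)   # beta_star[0] ~ 0: the |z| constraint is inactive here
```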
Another type of situation where the effective dimension $k'$ is smaller than the formal dimension $k$ occurs when the dependence of $f(s)$ on some of the components of $s$ disappears in the first place. Consider, for example, the case where $k=2$, $\rho_1(z)=z^2$, and the other distortion constraint is $\max_{1\le t\le n}|z_t|\le A$; in other words, we impose both an average quadratic constraint and a peak-limited distortion constraint, with the motivation of avoiding large spikes in the reconstruction error signal. The peak-limited distortion constraint can be represented as an additive distortion constraint if we select $\rho_2(z)=W(|z|-A)$, where $W(\cdot)$ is the IWF defined in (13), and the value of $D_2$ can then be selected to be an arbitrary finite, non-negative real, say $D_2=0$. Here again $k'=1$, but, as said, this time it is because the function $f(s)$ depends only on one component of the vector $s=(s_1,s_2)$ to begin with. To see this, observe that
$$\begin{aligned}
f(s)&=s\cdot D+\ln\int_{\mathbb{R}}e^{-s\cdot\rho(z)}\,\mathrm{d}z\\
&=s_1D_1+s_2D_2+\ln\int_{\mathbb{R}}e^{-s_1\rho_1(z)-s_2\rho_2(z)}\,\mathrm{d}z\\
&=s_1D_1+s_2\cdot 0+\ln\int_{\mathbb{R}}e^{-s_1z^2-s_2W(|z|-A)}\,\mathrm{d}z\\
&=s_1D_1+\ln\int_{-A}^{A}e^{-s_1z^2}\,\mathrm{d}z\\
&=s_1D_1+\ln\left\{\sqrt{\frac{\pi}{s_1}}\cdot\left[1-2Q\left(A\sqrt{2s_1}\right)\right]\right\}\\
&=s_1D_1+\frac{1}{2}\ln\frac{\pi}{s_1}+\ln\left[1-2Q\left(A\sqrt{2s_1}\right)\right],
\end{aligned}$$
where $Q(\cdot)$ is the well-known Q-function, defined as
$$Q(t)=\int_t^{\infty}\frac{e^{-x^2/2}\,\mathrm{d}x}{\sqrt{2\pi}}.$$
Here, although formally there are $k=2$ distortion constraints, the saddle-point integration is over one complex variable only, and so the redundancy is $\frac{\log n}{2n}+O\left(\frac{1}{n}\right)$ on top of the ordinary SLB, $\frac{h(U^n)}{n}-\Phi(D)$.

5. Sliding-Window Distortion Constraints

Consider a situation where one wishes not only to control the ‘intensity’ of the reconstruction error signal, $\sum_{t=1}^n\rho(z_t)$, but also possibly to shape its ‘continuity’ or its ‘smoothness’ by imposing additional limitations, say, on the empirical autocorrelations of the error signal, $z^n=(z_1,z_2,\ldots,z_n)$, in order to suppress, for example, high frequencies, which might be disturbing for the human eye (in the case of images and video streams) or the human ear (in the case of audio signals). For instance, one might wish to constrain the error signal to obey the first-lag autocorrelation constraint, $\frac{1}{n}\sum_{t=2}^nz_tz_{t-1}\ge 0.95$ (and perhaps also further lags), in addition to the ordinary mean-square error constraint, say, $\frac{1}{n}\sum_{t=1}^nz_t^2\le 1$. More generally, consider the case where we impose $k$ additive constraints pertaining to sliding-window functions of the form
$$\sum_{t=m}^n\rho_j(z_{t-m+1}^t)\le nD_j,\qquad j=1,2,\ldots,k,$$
where $m$ is a positive integer that designates the size of the sliding window. For $m=1$, we are back to ordinary additive distortion constraints, where the single-letter distortion function $\rho_j$ operates on single symbols separately. If the sizes of the sliding window are different for the $k$ various constraints, we take $m$ to be the largest one. Accordingly, in the above example where $k=2$ and the constraints are $\frac{1}{n}\sum_{t=1}^nz_t^2\le 1$ and $\frac{1}{n}\sum_{t=2}^nz_tz_{t-1}\ge 0.95$, we have $\rho_1(z)=z^2$, $D_1=1$, $\rho_2(z_1,z_2)=-z_1\cdot z_2$, and $D_2=-0.95$. In this case, the sliding-window size is $m=2$, and the minus signs in the correlation constraint are due to the reversal of the direction of the inequality in $\frac{1}{n}\sum_{t=2}^nz_tz_{t-1}\ge 0.95$, as opposed to the direction of the inequalities in our distortion constraints in general.
How does the SLB extend to the case of sliding-window constraints of this type? In this section, we assume that m is fixed while n tends to infinity, and we focus merely on the main terms of the resulting SLB, disregarding the redundancy terms.
A straightforward extension of Lemma 1 and the subsequent derivation in Section 4, so as to apply to a set of $k$ sliding-window distortion constraints with window size $m$, yields the lower bound
$$E\{L[\phi_n(U^n)]\}\ge h(U^n)-\inf_{\beta\in[0,\infty)^k}\left\{\log\int_{\mathcal{U}^n}\exp_2\left\{-\beta\cdot\sum_{t=m}^n\rho(z_{t-m+1}^t)\right\}\mathrm{d}z^n+n\beta\cdot D\right\},$$
where $\beta=(\beta_1,\ldots,\beta_k)$, $D=(D_1,\ldots,D_k)$, $\rho(z_{t-m+1}^t)=(\rho_1(z_{t-m+1}^t),\ldots,\rho_k(z_{t-m+1}^t))$, and the dot operations are understood as inner products, as defined earlier.
The multi-dimensional integral
$$\int_{\mathcal{U}^n}\exp_2\left\{-\beta\cdot\sum_{t=m}^n\rho(z_{t-m+1}^t)\right\}\mathrm{d}z^n=\int_{\mathcal{U}^n}\prod_{t=m}^n\exp_2\{-\beta\cdot\rho(z_{t-m+1}^t)\}\,\mathrm{d}z^n$$
can be viewed as being obtained by iterated applications of a sliding-window integral operator whose kernel is given by $K_\beta(z_{t-m+1}^t)=\exp_2\{-\beta\cdot\rho(z_{t-m+1}^t)\}$, where the integration is over one component of $z^n$ at a time. Under certain regularity conditions, the value of this multi-dimensional integral grows exponentially with an exponential order of $[\lambda(\beta)]^n$, where $\lambda(\beta)$ is the spectral radius of the operator kernel, namely, the dominant eigenvalue, and then the asymptotic form of the SLB becomes
$$\liminf_{n\to\infty}\frac{E\{L[\phi_n(U^n)]\}}{n}\ge\bar{h}(U)-\inf_{\beta\in[0,\infty)^k}\left[\log\lambda(\beta)+\beta\cdot D\right].$$
There are several equivalent formulas for calculating the spectral radius, $\lambda(\beta)$, of a sliding-window kernel $K_\beta(\cdot)$ for a given $\beta$, for example, the Collatz–Wielandt formula [17,18] and the Donsker–Varadhan formula [19]. Both formulas are associated with calculus of variations. For details, see, e.g., Subsection 3.4 of [20]. In certain special cases, such as those that involve a symmetric kernel for $m=2$, $K_\beta(z,z')$, the Rayleigh quotient formula
$$\lambda(\beta)=\sup_g\frac{\int_{\mathcal{U}^2}g(z)K_\beta(z,z')g(z')\,\mathrm{d}z\,\mathrm{d}z'}{\int_{\mathcal{U}}g^2(z)\,\mathrm{d}z}$$
can also be used. Note that even in this relatively simple special case, the evaluation of $\lambda(\beta)$ is a problem of calculus of variations, and hence it is not trivial computationally. Clearly, in the finite-alphabet case, the operator kernel reduces to a finite-dimensional matrix, and then the spectral radius is simply the Perron–Frobenius eigenvalue (see, e.g., Theorem 8.2.11 of [21]).
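In the finite-alphabet case, the computation is indeed immediate. The following sketch (our own illustration; the ternary error alphabet and the parameter values are arbitrary choices, not taken from the analysis above) builds the $m=2$ kernel matrix for a quadratic term plus a first-lag correlation term and extracts its Perron–Frobenius eigenvalue:

```python
import numpy as np

def spectral_radius(alphabet, beta, rhos):
    """Dominant (Perron-Frobenius) eigenvalue of the m = 2 kernel matrix
    K_beta(z, z') = 2^{-sum_j beta_j * rho_j(z, z')} on a finite alphabet."""
    K = np.array([[2.0 ** (-sum(b * r(z, zp) for b, r in zip(beta, rhos)))
                   for zp in alphabet] for z in alphabet])
    return max(abs(np.linalg.eigvals(K)))   # positive matrix: top eigenvalue is real

alphabet = [-1, 0, 1]
rhos = [lambda z, zp: zp ** 2,    # quadratic term, charged once per symbol
        lambda z, zp: -z * zp]    # first-lag correlation term (note the minus sign)
lam = spectral_radius(alphabet, [0.5, 0.2], rhos)
print(np.log2(lam))               # log lambda(beta), as needed in the bound above
```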
As a simple example, consider the case $k=m=2$, $\rho_1(z)=z^2$, $D_1=D$, $\rho_2(z,z')=-z\cdot z'$, and $D_2=-\theta$, where $D>0$ and $\theta\in[0,1)$ are given parameters. Then, following the analogous derivation of Example 4 in [20], we find that the resulting SLB is given by
$$\liminf_{n\to\infty}\frac{E\{L[\phi_n(U^n)]\}}{n}\ge\bar{h}(U)-\frac{1}{2}\log\left[2\pi eD(1-\theta^2)\right],$$
or, in words, the rate penalty due to the additional autocorrelation constraint is $\frac{1}{2}\log\frac{1}{1-\theta^2}$ on top of the SLB pertaining to the ordinary quadratic distortion constraint, $\frac{h(U^n)}{n}-\frac{1}{2}\log(2\pi eD)$.

6. Individual Sequences and Finite-State Encoders

We conclude the technical part of this article by deriving an individual-sequence counterpart of the SLB for finite-alphabet, deterministic sequences, which is based on a variation of Ziv and Lempel’s generalized Kraft inequality (see Lemma 2 of [13]).
Generally speaking, our model for lossy compression of individual sequences is based on the following simple structure. Each source block of length $m$, $u^m\in\mathcal{U}^m$ ($m$ a positive integer), is first mapped by an arbitrary reproduction encoder (or vector quantizer) into a reproduction vector,
$$v^m=q(u^m)\in\mathcal{V}^m\subseteq\mathcal{U}^m,\qquad(50)$$
and then the concatenation of the resulting $m$-vectors, $\{v^m\}$, forming the sequence $v_1,v_2,\ldots$, is compressed losslessly by a finite-state encoder following the model of [13]. Note that, in general, the adoption of this structure does not harm generality and optimality, because the mapping between the reproduction vectors and the compressed bit-streams should be bijective (as otherwise additional distortion is injected in the compression phase), and so one may envision a scenario where the encoder holds a copy of the reproduction codebook and cascades it with lossless compression of the reproduction vectors.
To keep this article self-contained, we begin with some basic background on lossless compression of individual sequences by finite-state encoders and the 1978 Lempel–Ziv (LZ78) algorithm. Readers familiar with this background may safely skip Section 6.1 and move on directly to Section 6.2.

6.1. Background

Following the model of [13], consider a setting of lossless compression of $v^n$ on the basis of finite-state (FS) encoders. An FS encoder is defined by the set $E=(\mathcal{U},\mathcal{Y},\Sigma,f,g)$, where $\mathcal{U}$ is the finite alphabet of each symbol, $v_i$, which is the same as the source alphabet and whose size is $r$; $\mathcal{Y}$ is a finite collection of binary, variable-length strings, which is allowed to contain the empty string $\lambda$ (whose length is zero); $\Sigma$ is a set of $s$ states of the encoder; $f:\Sigma\times\mathcal{U}\to\mathcal{Y}$ is the output function; and $g:\Sigma\times\mathcal{U}\to\Sigma$ is the next-state function. Given an infinite input reproduction vector (obtained by concatenating infinitely many output vectors from the reproduction encoder), $v=(v_1,v_2,\ldots)$ with $v_i\in\mathcal{U}$, $i=1,2,\ldots$, the FS encoder $E$ produces an infinite output sequence, $y=(y_1,y_2,\ldots)$ with $y_i\in\mathcal{Y}$, henceforth referred to as the compressed bit-stream, while passing through a sequence of states $\sigma=(\sigma_1,\sigma_2,\ldots)$ with $\sigma_i\in\Sigma$. The encoder is governed recursively by the equations
$$y_i=f(\sigma_i,v_i),$$
$$\sigma_{i+1}=g(\sigma_i,v_i),$$
for $i=1,2,\ldots$, with a fixed initial state $\sigma_1$. If at any step $y_i=\lambda$, this is referred to as idling, as no output is generated, but only the state is kept updated in response to the input.
An encoder with $s$ states, henceforth called an $s$-state encoder, is one for which $|\Sigma|=s$. For the sake of simplicity, we adopt a few notation conventions from [13]: Given a segment of input symbols $v_i^j$ with $i\le j$ and an initial state $\sigma_i$, we use $f(\sigma_i,v_i^j)$ to denote the corresponding output segment $y_i^j$ produced by $E$. Similarly, $g(\sigma_i,v_i^j)$ will denote the final state $\sigma_{j+1}$ after processing the inputs $v_i^j$, beginning from state $\sigma_i$.
An FS encoder $E$ is called information lossless (IL) if, given any initial state $\sigma_i\in\Sigma$, any positive integer $n$, and any input string, $v_i^{i+n}$, the triple $(\sigma_i,f(\sigma_i,v_i^{i+n}),g(\sigma_i,v_i^{i+n}))$ uniquely determines the corresponding input string $v_i^{i+n}$.
The incremental parsing process used by the LZ78 algorithm is a sequential procedure for processing a finite-alphabet input $u^n$. At each step of this process, one determines the shortest string that has not yet occurred as a complete phrase in the current parsed set, with the possible exception of the last phrase, which might be incomplete. For example, applying this parsing method to the sequence
$$u^{15}=011010011000100$$
yields
$$0,1,10,100,11,00,01,00.$$
Let us denote by $c(u^n)$ the total number of phrases generated by this procedure, which are all distinct, with the possible exception of the last one (here, $c(u^{15})=8$). In addition, let $LZ(u^n)$ represent the length in bits of the binary string produced by the LZ78 encoding of $u^n$. By Theorem 2 in [13], the following inequality holds:
$$LZ(u^n)\le[c(u^n)+1]\log\{2r[c(u^n)+1]\},$$
which can easily be shown to be further upper bounded by
$$LZ(u^n)\le c(u^n)\log c(u^n)+n\cdot\epsilon_1(n),$$
where $\epsilon_1(n)$ tends to zero uniformly as $n\to\infty$. In words, the LZ78 code length for $u^n$ is upper bounded by an expression whose leading term is $c(u^n)\log c(u^n)$.
We shall refer to the quantity $c(u^n)\log c(u^n)$ as the unnormalized LZ complexity of $u^n$, to distinguish it from the normalized LZ complexity, defined as $\frac{c(u^n)\log c(u^n)}{n}$, which means the per-symbol LZ complexity.
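For concreteness, the following sketch (our own illustration) implements the incremental parsing procedure and reproduces the parse and phrase count of the example above:

```python
import math

def lz78_parse(u):
    """Incremental (LZ78) parsing: each new phrase is the shortest prefix of the
    remaining input not yet in the phrase set; the last phrase may be a repeat."""
    phrases, seen, start = [], set(), 0
    for i in range(1, len(u) + 1):
        if u[start:i] not in seen:
            seen.add(u[start:i])
            phrases.append(u[start:i])
            start = i
    if start < len(u):                 # trailing, possibly incomplete, phrase
        phrases.append(u[start:])
    return phrases

u = "011010011000100"
p = lz78_parse(u)
c = len(p)
print(p)                               # ['0', '1', '10', '100', '11', '00', '01', '00']
print(c, c * math.log2(c) / len(u))    # c(u^15) = 8; normalized LZ complexity
```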

6.2. SLB for Individual Sequences

The following lemma provides a variation of the generalized Kraft inequality of [13].
Lemma 5.
For every IL FS encoder with $s$ states, every $\alpha>1$, every $\beta\ge 0$, and every positive integer $l$, which is an integer multiple of $m$,
$$\sum_{\sigma\in\Sigma}\sum_{w^l\in\mathcal{U}^l}\exp_2\left\{-\alpha L[f(\sigma,q(w^l))]-\beta\rho(w^l-q(w^l))\right\}\le\frac{s^2\left[\sum_{z\in\mathcal{U}}2^{-\beta\rho(z)}\right]^l}{2^{\alpha-1}-1},$$
where $q(w^l)\triangleq q(w_1^m,w_{m+1}^{2m},\ldots,w_{l-m+1}^l)=(q(w_1^m),q(w_{m+1}^{2m}),\ldots,q(w_{l-m+1}^l))$, the latter being defined as in (50) for vectors in $\mathcal{U}^l$.
Proof. 
From the postulated IL property, it follows that, given $\sigma\in\Sigma$, there cannot be more than $s\cdot 2^k$ distinct vectors, $\{v^l\}$, such that $L[f(\sigma,v^l)]=k$, for every positive integer $k$. Therefore,
$$\begin{aligned}
&\sum_{\sigma\in\Sigma}\sum_{w^l\in\mathcal{U}^l}\exp_2\left\{-\alpha L[f(\sigma,q(w^l))]-\beta\rho(w^l-q(w^l))\right\}\\
&=\sum_{\sigma\in\Sigma}\sum_{k\ge 1}\ \sum_{\{v^l:\ L[f(\sigma,v^l)]=k\}}\ \sum_{\{w^l:\ q(w^l)=v^l\}}\exp_2\left\{-\alpha k-\beta\rho(w^l-q(w^l))\right\}\\
&\le\sum_{\sigma\in\Sigma}\sum_{k\ge 1}s\cdot 2^k\cdot 2^{-\alpha k}\sum_{z^l\in\mathcal{U}^l}2^{-\beta\rho(z^l)}\\
&=s^2\cdot\left[\sum_{z\in\mathcal{U}}2^{-\beta\rho(z)}\right]^l\cdot\sum_{k=1}^{\infty}2^{-(\alpha-1)k}\\
&=\frac{s^2\left[\sum_{z\in\mathcal{U}}2^{-\beta\rho(z)}\right]^l}{2^{\alpha-1}-1}.
\end{aligned}$$
This completes the proof of Lemma 5. □
Now, let $l$ divide $n$. For a given FS IL encoder $E$, a given $u^n\in\mathcal{U}^n$, and the associated state sequence $\sigma^n\in\Sigma^n$ (generated from $u^n$ by $E$ using the next-state function $g$ recursively), consider the joint empirical distribution
$$\hat{P}(\sigma,w^l)=\frac{l}{n}\sum_{i=0}^{n/l-1}\mathcal{I}\left\{\sigma_{il+1}=\sigma,\ u_{il+1}^{il+l}=w^l\right\},\qquad\sigma\in\Sigma,\ w^l\in\mathcal{U}^l.$$
Then, according to Lemma 5,
$$\begin{aligned}
\frac{s^2\left[\sum_{z\in\mathcal{U}}2^{-\beta\rho(z)}\right]^l}{2^{\alpha-1}-1}&\ge\sum_{\sigma\in\Sigma}\sum_{w^l\in\mathcal{U}^l}\exp_2\left\{-\alpha L[f(\sigma,q(w^l))]-\beta\rho(w^l-q(w^l))\right\}\\
&=\sum_{\sigma\in\Sigma}\sum_{w^l\in\mathcal{U}^l}\hat{P}(\sigma,w^l)\exp_2\left\{-\alpha L[f(\sigma,q(w^l))]-\beta\rho(w^l-q(w^l))-\log\hat{P}(\sigma,w^l)\right\}\\
&\ge\exp_2\left\{-\alpha\sum_{\sigma\in\Sigma}\sum_{w^l\in\mathcal{U}^l}\hat{P}(\sigma,w^l)L[f(\sigma,q(w^l))]-\beta\sum_{w^l\in\mathcal{U}^l}\hat{P}(w^l)\rho(w^l-q(w^l))+\hat{H}(S,U^l)\right\}\\
&=\exp_2\left\{-\alpha\cdot\frac{l}{n}\sum_{i=0}^{n/l-1}L[f(\sigma_{il+1},v_{il+1}^{il+l})]-\beta\cdot\frac{l}{n}\sum_{i=0}^{n/l-1}\rho\left(u_{il+1}^{il+l}-q(u_{il+1}^{il+l})\right)+\hat{H}(S,U^l)\right\}\\
&=\exp_2\left\{-\alpha\cdot\frac{l}{n}\sum_{t=1}^nL[f(\sigma_t,v_t)]-\beta\cdot\frac{l}{n}\sum_{t=1}^n\rho(u_t-v_t)+\hat{H}(S,U^l)\right\},
\end{aligned}$$
where $\hat{P}(w^l)=\sum_{\sigma\in\Sigma}\hat{P}(\sigma,w^l)$ and $\hat{H}(S,U^l)$ is the joint entropy of the auxiliary random variables $S\in\Sigma$ and $U^l\in\mathcal{U}^l$, induced by the empirical joint distribution $\hat{P}$; the second inequality is, again, an application of Jensen’s inequality. It follows then that
$$\begin{aligned}
\alpha\cdot\frac{1}{n}\sum_{t=1}^nL[f(\sigma_t,v_t)]+\beta\cdot\frac{1}{n}\sum_{t=1}^n\rho(u_t-v_t)&\ge\frac{\hat{H}(S,U^l)}{l}-\frac{\log(s^2)}{l}-\log\sum_{z\in\mathcal{U}}2^{-\beta\rho(z)}+\frac{\log(2^{\alpha-1}-1)}{l}\\
&\ge\frac{\hat{H}(U^l)}{l}-\log\sum_{z\in\mathcal{U}}2^{-\beta\rho(z)}-\frac{2\log s}{l}+\frac{\log(2^{\alpha-1}-1)}{l}\\
&\ge\frac{c(u^n)\log c(u^n)}{n}-\log\sum_{z\in\mathcal{U}}2^{-\beta\rho(z)}-\Delta_n(l)-\frac{2\log s}{l}+\frac{\log(2^{\alpha-1}-1)}{l},
\end{aligned}$$
where $\lim_{n\to\infty}\Delta_n(l)=\frac{1}{l}$, and the last step can be found in Equation (12) of [22] as well as references therein. By taking the limit $n\to\infty$, followed by the limit $l\to\infty$, the last three terms can be made arbitrarily small, whereas the first two terms serve as the individual-sequence counterpart of the right-hand side of Equation (20). Moving the distortion term to the right-hand side and optimizing over $\beta$, we obtain
$$\begin{aligned}
\alpha\cdot\frac{1}{n}\sum_{t=1}^nL[f(\sigma_t,v_t)]&\ge\frac{c(u^n)\log c(u^n)}{n}-\inf_{\beta\ge 0}\left[\beta\cdot\frac{1}{n}\sum_{t=1}^n\rho(u_t-v_t)+\log\sum_{z\in\mathcal{U}}2^{-\beta\rho(z)}\right]-\Delta_n(l)-\frac{2\log s}{l}+\frac{\log(2^{\alpha-1}-1)}{l}\\
&=\frac{c(u^n)\log c(u^n)}{n}-\Phi\left(\frac{1}{n}\sum_{t=1}^n\rho(u_t-v_t)\right)-\Delta_n(l)-\frac{2\log s}{l}+\frac{\log(2^{\alpha-1}-1)}{l}.
\end{aligned}$$
Let $L_{\max}=\max_{\sigma,v}L[f(\sigma,v)]$. Selecting $\alpha=1+\zeta_l$, where $\zeta_l$ tends to zero at a sub-exponential rate (say, $\zeta_l=1/l$), we have, on the one hand, the previous inequality and, on the other hand,
$$\alpha\cdot\frac{1}{n}\sum_{t=1}^nL[f(\sigma_t,v_t)]\le\frac{1}{n}\sum_{t=1}^nL[f(\sigma_t,v_t)]+\zeta_l\cdot L_{\max},$$
and so our main result in this section is the following:
$$\begin{aligned}
\frac{1}{n}\sum_{t=1}^nL[f(\sigma_t,v_t)]&\ge\frac{c(u^n)\log c(u^n)}{n}-\Phi\left(\frac{1}{n}\sum_{t=1}^n\rho(u_t-v_t)\right)-\Delta_n(l)-\frac{2\log s}{l}+\frac{\log(2^{\zeta_l}-1)}{l}-\zeta_l\cdot L_{\max}\\
&\ge\frac{c(u^n)\log c(u^n)}{n}-\Phi\left(\frac{1}{n}\sum_{t=1}^n\rho(u_t-v_t)\right)-\Delta_n(l)-\frac{2\log s}{l}+\frac{\log(\zeta_l\ln 2)}{l}-\zeta_l\cdot L_{\max},
\end{aligned}$$
with the last four terms tending to zero in the limit $n\to\infty$, followed by the limit $l\to\infty$.
To conclude, the individual-sequence counterpart of the SLB for finite-alphabet source sequences is essentially of the same form as the classical SLB of the probabilistic setting, except that the normalized entropy term is replaced by the normalized LZ complexity of the sequence, and the function $\Phi(\cdot)$ is calculated at the point of the actual distortion, $\frac{1}{n}\sum_{t=1}^n\rho(u_t-v_t)$.
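As a purely mechanical illustration of this conclusion (our own; at such a short sequence length, the vanishing redundancy terms are certainly not negligible, so the numerical value should not be taken seriously), the following sketch evaluates the two main terms of the bound for the binary parsing example of Section 6.1, using the fact that, for the binary alphabet with Hamming distortion, $\Phi(D)$ reduces to the binary entropy $h_b(D)$ for $D\le 1/2$:

```python
import math

def phi_binary_hamming(D):
    """Phi(D) for the binary alphabet with Hamming distortion:
    inf_{beta >= 0} [beta*D + log2(1 + 2^{-beta})] = h_b(D) for 0 < D <= 1/2,
    attained at beta = log2((1 - D)/D)."""
    beta = math.log2((1.0 - D) / D)
    return beta * D + math.log2(1.0 + 2.0 ** (-beta))

# Main terms of the individual-sequence SLB at an assumed actual per-letter
# distortion of 0.1, with the phrase count c(u^15) = 8 of the earlier example:
n, c, D_actual = 15, 8, 0.1
print(c * math.log2(c) / n - phi_binary_hamming(D_actual))
```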

7. Summary and Conclusions

In this article, we have derived several modified versions of the Kraft inequality for lossy compression, which allowed us to derive a few sharper forms and broader types of the classical SLB. This was performed in a unified framework that covers various classes of rate-distortion coding schemes and types of distortion constraints. Two classes of codes were considered: one-to-one codes and D-semifaithful codes (as well as codes that are both one-to-one and D-semifaithful). Among the additional types of distortion constraints considered are sliding-window constraints. Finally, we also derived an individual-sequence counterpart of the Shannon lower bound, which relies on an extended Kraft inequality for IL FS encoders.
It is pointed out that some of the above-mentioned refinements and extensions required us to invoke more advanced analysis tools than those traditionally used in Shannon theory, such as the saddle-point method (in the context of D-semifaithful codes) and the spectral theory of linear operators (in the context of sliding-window distortion constraints).

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A

Proof of the Second Equality in Equation (4).
Let us define the density function
$$g(z)=\frac{2^{-\beta\rho(z)}}{\int_{\mathcal{U}}2^{-\beta\rho(z')}\,\mathrm{d}z'}.$$
Then,
$$\begin{aligned}
\inf_{\beta\ge 0}\left[\log\int_{\mathcal{U}}2^{-\beta\rho(z)}\,\mathrm{d}z+\beta D\right]&=\inf_{\beta\ge 0}\left\{\sup_f\left[-D(f\|g)\right]+\log\int_{\mathcal{U}}2^{-\beta\rho(z)}\,\mathrm{d}z+\beta D\right\}\\
&=\inf_{\beta\ge 0}\sup_f\left[\int_{\mathcal{U}}\mathrm{d}z\,f(z)\log\frac{2^{-\beta\rho(z)}}{f(z)}+\beta D\right]\\
&=\inf_{\beta\ge 0}\sup_f\left\{-\int_{\mathcal{U}}\mathrm{d}z\,f(z)\log f(z)+\beta\left[D-\int_{\mathcal{U}}f(z)\rho(z)\,\mathrm{d}z\right]\right\}\\
&\stackrel{\mathrm{(a)}}{=}\sup_f\inf_{\beta\ge 0}\left\{-\int_{\mathcal{U}}\mathrm{d}z\,f(z)\log f(z)+\beta\left[D-\int_{\mathcal{U}}f(z)\rho(z)\,\mathrm{d}z\right]\right\}\\
&=\sup_f\inf_{\beta\ge 0}\left\{h(Z)+\beta\left[D-E\{\rho(Z)\}\right]\right\}\\
&=\sup_f\begin{cases}-\infty&E\{\rho(Z)\}>D\\h(Z)&E\{\rho(Z)\}\le D\end{cases}\\
&=\sup_{\{Z:\ E\{\rho(Z)\}\le D\}}h(Z),
\end{aligned}$$
where (a) is due to the fact that the objective function, $-\int_{\mathcal{U}}\mathrm{d}z\,f(z)\log f(z)+\beta[D-\int_{\mathcal{U}}f(z)\rho(z)\,\mathrm{d}z]$, is concave in $f$ and affine (and hence convex) in $\beta$, so that the order of the infimum and the supremum may be interchanged by the minimax theorem. □

References

  1. Berger, T. Rate Distortion Theory—A Mathematical Basis for Data Compression; Prentice-Hall: Englewood Cliffs, NJ, USA, 1971. [Google Scholar]
  2. Gray, R.M. Source Coding Theory; Kluwer Academic Publishers: Norwell, MA, USA, 1990. [Google Scholar]
  3. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 2006. [Google Scholar]
  4. Linder, T.; Zamir, R. On the asymptotic tightness of the Shannon lower bound. IEEE Trans. Inf. Theory 1994, 40, 2026–2031. [Google Scholar] [CrossRef]
  5. Koch, T. The Shannon lower bound is asymptotically tight. IEEE Trans. Inf. Theory 2016, 62, 6155–6161. [Google Scholar] [CrossRef]
  6. Kostina, V. Data compression with low distortion and finite blocklength. In Proceedings of the 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 29 September–2 October 2015; pp. 1127–1134. [Google Scholar]
  7. Kostina, V. When is Shannon’s lower bound tight at finite blocklength? In Proceedings of the 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 27–30 September 2016; pp. 982–989. [Google Scholar]
  8. Gastpar, M.; Sula, E. Shannon bounds for quadratic rate-distortion problems. IEEE J. Sel. Areas Inf. Theory 2024, 5, 597–608. [Google Scholar] [CrossRef]
  9. Campbell, L.L. Kraft inequality for decoding with respect to a fidelity criterion. IEEE Trans. Inf. Theory 1973, 19, 68–73. [Google Scholar] [CrossRef]
  10. Merhav, N. A comment on “A rate of convergence result for a universal D-semifaithful code”. IEEE Trans. Inf. Theory 1995, 41, 1200–1202. [Google Scholar] [CrossRef]
  11. Rissanen, J. Tight lower bounds for optimum code length. IEEE Trans. Inf. Theory 1982, 28, 348–349. [Google Scholar] [CrossRef]
  12. Zhang, Z.; Yang, E.-h.; Wei, V.K. The redundancy of source coding with a fidelity criterion. I. known statistics. IEEE Trans. Inf. Theory 1997, 43, 71–91. [Google Scholar] [CrossRef]
  13. Ziv, J.; Lempel, A. Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 1978, 24, 530–536. [Google Scholar] [CrossRef]
  14. de Bruijn, N.G. Asymptotic Methods in Analysis, 2nd ed.; Dover Publications: New York, NY, USA, 1981. [Google Scholar]
  15. Merhav, N.; Weinberger, N. A toolbox for refined information-theoretic analyses with applications. Found. Trends Commun. Inf. Theory 2025, 22, 1–184. [Google Scholar] [CrossRef]
  16. Neuschel, T. Apéry polynomials and the multivariate saddle point method. Constr. Approx. 2014, 40, 487–507. [Google Scholar]
  17. Collatz, L. Einschließungssatz für die charakteristischen Zahlen von Matrizen. Math. Z. 1942, 48, 221–226. [Google Scholar]
  18. Wielandt, H. Unzerlegbare, nicht negative Matrizen. Math. Z. 1950, 52, 642–648. [Google Scholar] [CrossRef]
  19. Donsker, M.D.; Varadhan, S.R.S. Asymptotic evaluation of certain Markov process expectations for large time, IV. Commun. Pure Appl. Math. 1983, 36, 183–212. [Google Scholar] [CrossRef]
  20. Merhav, N.; Shamai, S. Volume-based lower bounds to the capacity of the Gaussian channel under pointwise additive input constraints. arXiv 2025, arXiv:2510.04095. [Google Scholar]
  21. Horn, R.A.; Johnson, C.R. Matrix Analysis; Cambridge University Press: New York, NY, USA, 1985. [Google Scholar]
  22. Merhav, N. Universal Slepian-Wolf coding for individual sequences. IEEE Trans. Inf. Theory 2025, 71, 783–796. [Google Scholar] [CrossRef]