Article

Successive Refinement for Lossy Compression of Individual Sequences

by
Neri Merhav
The Viterbi Faculty of Electrical and Computer Engineering, Technion–Israel Institute of Technology, Technion City, Haifa 3200003, Israel
Entropy 2025, 27(4), 370; https://doi.org/10.3390/e27040370
Submission received: 23 February 2025 / Accepted: 28 March 2025 / Published: 31 March 2025
(This article belongs to the Collection Feature Papers in Information Theory)

Abstract

We consider the problem of successive-refinement coding for lossy compression of individual sequences, namely, compression in two stages, where in the first stage, a coarse description at a relatively low rate is sent from the encoder to the decoder, and in the second stage, an additional coding rate is allocated in order to refine the description and thereby improve the reproduction. Our main result establishes outer bounds (converse theorems) for the rate region when the encoders are limited to be finite-state machines, in the spirit of Ziv and Lempel's 1978 model. The matching achievability scheme is conceptually straightforward. We also consider the more general multiple description coding problem on a similar footing and propose achievability schemes that are analogous to the well-known El Gamal–Cover and Zhang–Berger achievability schemes for memoryless sources and additive distortion measures.

1. Introduction

The notion of successive refinement of information refers to systems in which the reconstruction of the source occurs in multiple stages. In these systems, a single encoder encodes the source and communicates with either one decoder or multiple decoders in a step-by-step manner. At each stage, the encoder transmits a portion of the source information to the corresponding decoder, which also has access to all previous transmissions. Each decoder uses all available transmissions to reconstruct the source, possibly incorporating additional side information. The quality of the reconstruction at each stage (or by each decoder) is evaluated based on a predefined distortion measure. One of the motivations of this hierarchical structure is to allow scalability that meets the available channel resources of the various users who receive the compressed information, or even to adapt to time-varying channel conditions of a single user. Several studies have addressed the successive refinement problem for probabilistic sources, most notably memoryless sources; see, e.g., [1], where necessary and sufficient conditions were given for simultaneously achieving the rate-distortion function at all stages; [2,3], where the rate-distortion region was fully characterized in general; and [4], where source coding error exponents were derived. In some later works, such as [5,6,7,8], successive refinement coding was also considered with the incorporation of side information. Successive-refinement coding is also an important special case of the so-called multiple description coding problem, which, in its simplest form, consists of two encoders that send two different individual descriptions of the source to two separate respective decoders (that do not cooperate with each other); the compressed bit-streams pertaining to those descriptions are also combined and sent to yet another decoder, whose role is to produce a better reconstruction than those of the two individual decoders. The problem of fully characterizing the rate-distortion region of the multiple description coding problem is open in its general form, and there are certain inner and outer bounds to this region; see Chapter 13 of [9] for some details and references therein.
In this paper, we focus on successive refinement of information in the context of individual sequences, namely, deterministic source sequences, as opposed to the traditional setting of random sequences governed by a certain probabilistic mechanism. In that sense, this work can be viewed as an additional step in the development of multiuser information theory for individual sequences, following a series of earlier works of this flavor, initiated by Ziv and Lempel in [10,11,12,13] and continued by others in many articles, such as [14,15,16,17,18,19]. In particular, we consider the problem of successive-refinement coding for lossy compression of individual sequences in two stages, where in the first stage, a coarse description at a relatively low rate is sent from the encoder to the decoder, and in the second stage, an additional coding rate is allocated in order to refine the description and thereby improve the reproduction. Our main results establish outer bounds (converse theorems) for the rate region with individual sequences, where we limit the encoders to be finite-state machines, as in [11,12,13] and others. The matching achievability scheme is conceptually straightforward, and so, we believe that the deeper and more interesting part of the contribution of this work lies in the converse theorems, namely, in the outer bounds. Our results are formulated and proved for two stages of coding, but their extension to any fixed number of stages is straightforward. These results can also be viewed as an extension of the fixed-distortion results of [14] to successive-refinement coding.
We also consider the more general multiple description coding problem and propose achievability schemes that are analogous to the well-known El Gamal–Cover [20] and Zhang–Berger [21] achievability schemes for memoryless sources and additive distortion measures. There is a clear parallelism between the rate expressions that we obtain and those of [20,21], including the terms that are associated with the gaps between the outer bound and the corresponding inner bounds.
The outline of the remaining part of this paper is as follows. In Section 2, we establish notation conventions and formulate the problem setting. In Section 3, we provide some general background on the LZ algorithm as well as on its conditional form. Section 4 is devoted to the above-mentioned successive refinement outer bound, and finally, in Section 5, we address the more general multiple description problem.

2. Notation Conventions and Problem Formulation

Throughout the paper, random variables will be denoted by capital letters, specific values they may take will be denoted by the corresponding lower case letters, and their alphabets will be denoted by calligraphic letters. Random vectors, their realizations, and their alphabets will be denoted, respectively, by capital letters, the corresponding lower case letters, and the corresponding calligraphic letters, all superscripted by their dimensions. More specifically, for a given positive integer $n$, the source vector $(x_1, x_2, \ldots, x_n)$, with components $x_i$, $i = 1, 2, \ldots, n$, from a finite alphabet $\mathcal{X}$, will be denoted by $x^n$. The set of all such $n$-vectors will be denoted by $\mathcal{X}^n$, which is the $n$-th order Cartesian power of the single-letter source alphabet $\mathcal{X}$. Likewise, reproduction vectors of length $n$, such as $(\hat{x}_1, \ldots, \hat{x}_n)$ and $(\tilde{x}_1, \ldots, \tilde{x}_n)$, with components $\hat{x}_i$ and $\tilde{x}_i$, $i = 1, \ldots, n$, from finite alphabets $\hat{\mathcal{X}}$ and $\tilde{\mathcal{X}}$, will be denoted by $\hat{x}^n \in \hat{\mathcal{X}}^n$ and $\tilde{x}^n \in \tilde{\mathcal{X}}^n$, respectively. An infinite source sequence $(x_1, x_2, \ldots)$ will be denoted by $x$. The cardinalities of $\mathcal{X}$, $\hat{\mathcal{X}}$, and $\tilde{\mathcal{X}}$ will be denoted by $\alpha$, $\beta$, and $\gamma$, respectively. The value $\alpha = \infty$ is allowed, to incorporate countably infinite and continuous source alphabets. By contrast, $\beta$ and $\gamma$ will always be finite.
For two positive integers $i$ and $j$, where $i \le j$, the notation $x_i^j$ will be used to denote the substring $(x_i, x_{i+1}, \ldots, x_j)$. For $i = 1$, the subscript '1' will be omitted, and so, the shorthand notation for $(x_1, x_2, \ldots, x_n)$ will be $x^n$, as mentioned before. Similar conventions will apply to other sequences. Probability distributions will be denoted by the letter $P$ with possible subscripts, depending on the context. The indicator function for an event $\mathcal{A}$ will be denoted by $\mathcal{I}\{\mathcal{A}\}$, that is, $\mathcal{I}\{\mathcal{A}\} = 1$ if $\mathcal{A}$ occurs, and $\mathcal{I}\{\mathcal{A}\} = 0$ if not. The logarithmic function, $\log x$, will be understood to be defined to the base 2. Logarithms to the base $e$ will be denoted by $\ln$. Let $d_1: \mathcal{X}^n \times \hat{\mathcal{X}}^n \to \mathbb{R}^+$ and $d_2: \mathcal{X}^n \times \tilde{\mathcal{X}}^n \to \mathbb{R}^+$ be two arbitrary distortion functions between source vectors, $x^n$, and corresponding reproduction vectors, $\hat{x}^n$ and $\tilde{x}^n$, respectively.
The successive refinement encoder model is as follows. It is composed of a cascade of two encoders: a reproduction encoder (namely, a vector quantizer) followed by a lossless encoder. The input to the reproduction encoder is the source sequence $x^n$, and the output is a pair of reproduction vectors, $(\hat{x}^n, \tilde{x}^n) \in \hat{\mathcal{X}}^n \times \tilde{\mathcal{X}}^n$, that obeys the distortion constraints, $d_1(x^n, \hat{x}^n) \le nD_1$ and $d_2(x^n, \tilde{x}^n) \le nD_2$, where $D_1 \ge 0$ and $D_2 \ge 0$ are prescribed normalized distortion levels. There are no particular restrictions imposed on the distortion functions. We denote
$$\mathcal{B}(x^n) = \left\{(\hat{x}^n, \tilde{x}^n) :\ d_1(x^n, \hat{x}^n) \le nD_1,\ d_2(x^n, \tilde{x}^n) \le nD_2\right\}.$$
The mapping $\mathcal{X}^n \to \hat{\mathcal{X}}^n \times \tilde{\mathcal{X}}^n$ employed by the reproduction encoder is arbitrary and unrestricted. The pair $(\hat{x}^n, \tilde{x}^n)$ serves as the input to the lossless encoder.
As for the lossless encoder, we follow the same modeling approach as in [11], but with a few adjustments that make it suitable for successive refinement. In particular, the lossless encoder is defined by a set
$$E = (\hat{\mathcal{X}}, \tilde{\mathcal{X}}, \mathcal{U}, \mathcal{V}, \mathcal{S}, \mathcal{Z}, f_1, f_2, g_1, g_2),$$
where $\hat{\mathcal{X}}$ and $\tilde{\mathcal{X}}$ are as before; $\mathcal{U}$ and $\mathcal{V}$ are two sets of variable-length binary strings, both of which include the empty string $\lambda$ of length zero; $\mathcal{S}$ and $\mathcal{Z}$ are two sets of states, each one containing $q$ states; $f_1: \mathcal{S} \times \hat{\mathcal{X}} \to \mathcal{U}$ and $f_2: \mathcal{Z} \times \hat{\mathcal{X}} \times \tilde{\mathcal{X}} \to \mathcal{V}$ are the encoder output functions; and finally, $g_1: \mathcal{S} \times \hat{\mathcal{X}} \to \mathcal{S}$ and $g_2: \mathcal{Z} \times \hat{\mathcal{X}} \times \tilde{\mathcal{X}} \to \mathcal{Z}$ are two next-state functions. When the lossless encoder is fed by a sequence of pairs $(\hat{x}_1, \tilde{x}_1), (\hat{x}_2, \tilde{x}_2), \ldots$, it outputs a corresponding sequence of pairs of binary strings, $(u_1, v_1), (u_2, v_2), \ldots$, according to the following recursive mechanism. For $t = 1, 2, \ldots$:
$$u_t = f_1(s_t, \hat{x}_t)$$
$$s_{t+1} = g_1(s_t, \hat{x}_t)$$
$$v_t = f_2(z_t, \hat{x}_t, \tilde{x}_t)$$
$$z_{t+1} = g_2(z_t, \hat{x}_t, \tilde{x}_t),$$
where the initial states, $s_1$ and $z_1$, are assumed to be arbitrary fixed members of $\mathcal{S}$ and $\mathcal{Z}$, respectively. Similarly as in [11], we adopt the extended notation of $f_1(s_1, \hat{x}^n)$ for $u^n$, $g_1(s_1, \hat{x}^n)$ for $s_{n+1}$, and similar notations associated with $f_2$ and $g_2$.
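To make the recursive mechanism concrete, the following is a minimal Python sketch (ours, for illustration only) of a run of the lossless encoder; the callables f1, g1, f2, g2 stand for the output and next-state functions of an arbitrary encoder $E$, and the two returned rate quantities correspond to $[\rho_E(\hat{x}^n)]_1$ and $[\rho_E(\hat{x}^n, \tilde{x}^n)]_2$ defined in the sequel:

```python
# Minimal sketch of the two-stage finite-state encoder recursion:
#   u_t = f1(s_t, xhat_t),  s_{t+1} = g1(s_t, xhat_t),
#   v_t = f2(z_t, xhat_t, xtilde_t),  z_{t+1} = g2(z_t, xhat_t, xtilde_t).
def run_encoder(xhat, xtilde, f1, g1, f2, g2, s1=0, z1=0):
    s, z = s1, z1                    # arbitrary fixed initial states
    u, v = [], []                    # output binary strings (may be empty, i.e., lambda)
    for xh, xt in zip(xhat, xtilde):
        u.append(f1(s, xh))          # first-stage output string
        s = g1(s, xh)                # first-stage next state
        v.append(f2(z, xh, xt))      # second-stage (refinement) output string
        z = g2(z, xh, xt)            # second-stage next state
    n = len(u)
    rho1 = sum(len(b) for b in u) / n                             # [rho_E]_1
    rho2 = (sum(len(b) for b in u) + sum(len(b) for b in v)) / n  # [rho_E]_2
    return u, v, rho1, rho2
```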
An encoder $E$ is said to be information lossless if for every positive integer $k$, the vector $(s_1, f_1(s_1, \hat{x}^k), g_1(s_1, \hat{x}^k))$ uniquely determines $\hat{x}^k$, and likewise,
$$(s_1, z_1, f_1(s_1, \hat{x}^k), f_2(z_1, \hat{x}^k, \tilde{x}^k), g_1(s_1, \hat{x}^k), g_2(z_1, \hat{x}^k, \tilde{x}^k))$$
uniquely determines $(\hat{x}^k, \tilde{x}^k)$. Let $\mathcal{E}(q)$ be the set of all information lossless encoders, $\{E\}$, with $|\mathcal{S}| \le q$ and $|\mathcal{Z}| \le q$.
Given a lossless encoder $E$ and a pair of inputs $(\hat{x}^n, \tilde{x}^n)$, we define
$$[\rho_E(\hat{x}^n)]_1 = \frac{L(u^n)}{n} = \frac{L[f_1(s_1, \hat{x}^n)]}{n}$$
and
$$[\rho_E(\hat{x}^n, \tilde{x}^n)]_2 = \frac{L(u^n) + L(v^n)}{n} = \frac{L[f_1(s_1, \hat{x}^n)] + L[f_2(z_1, \hat{x}^n, \tilde{x}^n)]}{n},$$
where $L(u^n) = \sum_{i=1}^n l(u_i)$, $l(u_i)$ being the length (in bits) of the binary string $u_i$, and similarly for $L(v^n)$. Recall that for the empty string, $\lambda$, we define $l(\lambda) = 0$. The achievable rate region for $x^n$ that is associated with $E$ is defined as
$$\mathcal{R}_E(x^n) = \bigcup_{(\hat{x}^n, \tilde{x}^n) \in \mathcal{B}(x^n)} \left\{(R_1, R_2) :\ R_1 \ge [\rho_E(\hat{x}^n)]_1,\ R_1 + R_2 \ge [\rho_E(\hat{x}^n, \tilde{x}^n)]_2\right\},$$
and the $q$-state achievable rate region for $x^n$ is defined as
$$\mathcal{R}_q(x^n) = \bigcup_{E \in \mathcal{E}(q)} \mathcal{R}_E(x^n).$$
The rationale behind the union operations in these definitions is that they are two-dimensional, set-theoretic analogues of the minimization operations over the appropriate encoders $\{E\}$ and reproduction vectors $\{\hat{x}^n\}$ that appear in the case of a single coding rate, as in [11,14], as can be noted from the simple relationship:
$$\left\{R :\ R \ge \min_{E \in \{\text{all } q\text{-state encoders}\}}\ \min_{\{\hat{x}^n :\ d(x^n, \hat{x}^n) \le nD\}} \rho_E(\hat{x}^n)\right\} = \bigcup_{E \in \{\text{all } q\text{-state encoders}\}}\ \bigcup_{\{\hat{x}^n :\ d(x^n, \hat{x}^n) \le nD\}} \left\{R :\ R \ge \rho_E(\hat{x}^n)\right\}$$
for a given generic distortion function $d$ and distortion level $D$.
For later use, we also define the joint empirical distribution of $\ell$-blocks of $(\hat{x}_{i\ell+1}^{i\ell+\ell}, \tilde{x}_{i\ell+1}^{i\ell+\ell})$, $i = 0, 1, \ldots, n/\ell - 1$, provided that $\ell$ divides $n$. Specifically, consider the empirical distribution, $\hat{P} = \{\hat{P}(\hat{x}^\ell, \tilde{x}^\ell),\ \hat{x}^\ell \in \hat{\mathcal{X}}^\ell,\ \tilde{x}^\ell \in \tilde{\mathcal{X}}^\ell\}$, of pairs of $\ell$-vectors, defined as
$$\hat{P}(\hat{x}^\ell, \tilde{x}^\ell) = \frac{\ell}{n}\sum_{i=0}^{n/\ell - 1} \mathcal{I}\left\{\hat{x}_{i\ell+1}^{i\ell+\ell} = \hat{x}^\ell,\ \tilde{x}_{i\ell+1}^{i\ell+\ell} = \tilde{x}^\ell\right\}, \quad \hat{x}^\ell \in \hat{\mathcal{X}}^\ell,\ \tilde{x}^\ell \in \tilde{\mathcal{X}}^\ell.$$
Let $H_\ell(\hat{X}^\ell, \tilde{X}^\ell)$ denote the joint empirical entropy of an auxiliary pair of random $\ell$-vectors, $(\hat{X}^\ell, \tilde{X}^\ell)$, induced by $\hat{P}$, that is,
$$H_\ell(\hat{X}^\ell, \tilde{X}^\ell) = -\sum_{(\hat{x}^\ell, \tilde{x}^\ell) \in \hat{\mathcal{X}}^\ell \times \tilde{\mathcal{X}}^\ell} \hat{P}(\hat{x}^\ell, \tilde{x}^\ell)\log\hat{P}(\hat{x}^\ell, \tilde{x}^\ell).$$
Accordingly, $H_\ell(\hat{X}^\ell)$ and $H_\ell(\tilde{X}^\ell | \hat{X}^\ell)$ will denote the corresponding marginal empirical entropy of $\hat{X}^\ell$ and the conditional empirical entropy of $\tilde{X}^\ell$ given $\hat{X}^\ell$.
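As an illustration of these definitions, the following Python sketch (ours, assuming that $\ell$ divides $n$) computes the empirical distribution of non-overlapping $\ell$-blocks and the three empirical entropies; the conditional entropy is obtained via the chain rule $H_\ell(\tilde{X}^\ell | \hat{X}^\ell) = H_\ell(\hat{X}^\ell, \tilde{X}^\ell) - H_\ell(\hat{X}^\ell)$:

```python
from collections import Counter
from math import log2

def block_entropies(xhat, xtilde, ell):
    """Empirical entropies of non-overlapping ell-blocks (logs to base 2)."""
    n = len(xhat)
    assert n == len(xtilde) and n % ell == 0, "ell must divide n"
    blocks = [(tuple(xhat[i:i + ell]), tuple(xtilde[i:i + ell]))
              for i in range(0, n, ell)]
    m = n // ell                                 # number of blocks, n / ell
    joint = Counter(blocks)                      # counts of (xhat-block, xtilde-block) pairs
    marg = Counter(b for (b, _) in blocks)       # counts of xhat-blocks alone
    H_joint = -sum(c / m * log2(c / m) for c in joint.values())
    H_marg = -sum(c / m * log2(c / m) for c in marg.values())
    return H_marg, H_joint - H_marg, H_joint     # H(Xhat), H(Xtilde | Xhat), joint
```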
Our objective is to provide inner and outer bounds to the achievable rate region and to show that they asymptotically coincide in the limit of large $n$, followed by a limit of large $q$, in analogy to the asymptotic regime of [11].

3. Background

Before the exposition of the main results and their proofs, we revisit key terms and details related to the 1978 version of the LZ algorithm, also known as the LZ78 algorithm [11], which is the central building block in this work. The incremental parsing procedure of the LZ78 algorithm is a sequential parsing process applied to a finite-alphabet input vector, $\hat{x}^n$. According to this procedure, each new phrase is the shortest string not encountered before as a parsed phrase, except for the potential incompleteness of the last phrase. For instance, the incremental parsing of the vector $\hat{x}^{15} = abbabaabbaaabaa$ results in $a, b, ba, baa, bb, aa, ab, aa$. Let $c(\hat{x}^n)$ denote the number of phrases in $\hat{x}^n$ resulting from the incremental parsing procedure (in the above example, $c(\hat{x}^{15}) = 8$). Furthermore, let $LZ(\hat{x}^n)$ denote the length of the LZ78 binary compressed code for $\hat{x}^n$. According to Theorem 2 of [11], the following inequality holds:
$$\begin{aligned}
LZ(\hat{x}^n) &\le [c(\hat{x}^n) + 1]\log\{2\beta[c(\hat{x}^n) + 1]\} \\
&= c(\hat{x}^n)\log[c(\hat{x}^n) + 1] + c(\hat{x}^n)\log(2\beta) + \log\{2\beta[c(\hat{x}^n) + 1]\} \\
&= c(\hat{x}^n)\log c(\hat{x}^n) + c(\hat{x}^n)\log\left(1 + \frac{1}{c(\hat{x}^n)}\right) + c(\hat{x}^n)\log(2\beta) + \log\{2\beta[c(\hat{x}^n) + 1]\} \\
&\le c(\hat{x}^n)\log c(\hat{x}^n) + \log e + \frac{n(\log\beta)\log(2\beta)}{(1 - \varepsilon_n)\log n} + \log[2\beta(n + 1)] \\
&= c(\hat{x}^n)\log c(\hat{x}^n) + n\cdot\epsilon(n),
\end{aligned}$$
where we remind the reader that $\beta$ is the cardinality of $\hat{\mathcal{X}}$, and where both $\varepsilon_n$ and $\epsilon(n)$ tend to zero as $n \to \infty$. In other words, the LZ code-length for $\hat{x}^n$ is upper bounded by an expression whose main term is $c(\hat{x}^n)\log c(\hat{x}^n)$. On the other hand, $c(\hat{x}^n)\log c(\hat{x}^n)$ is also known to be the main term of a lower bound (see Theorem 1 of [11]) to the shortest code-length attainable by any information lossless finite-state encoder with no more than $q$ states, provided that $\log(q^2)$ is very small compared to $\log c(\hat{x}^n)$. In view of these facts, we henceforth refer to $c(\hat{x}^n)\log c(\hat{x}^n)$ as the unnormalized LZ complexity of $\hat{x}^n$, whereas the normalized LZ complexity is defined as
$$\rho_{LZ}(\hat{x}^n) = \frac{c(\hat{x}^n)\log c(\hat{x}^n)}{n}.$$
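For concreteness, here is a short Python sketch (ours; it implements only the parsing rule and the complexity formula above, not the full LZ78 encoder, which also assigns binary codewords to the phrases):

```python
from math import log2

def lz78_phrases(x):
    """Incremental (LZ78) parsing: each new phrase is the shortest string not
    encountered before as a phrase; the last phrase may be an incomplete repeat."""
    phrases, seen, cur = [], set(), ()
    for symbol in x:
        cur = cur + (symbol,)
        if cur not in seen:          # shortest string not seen before: close the phrase
            seen.add(cur)
            phrases.append(cur)
            cur = ()
    if cur:                          # potentially incomplete last phrase
        phrases.append(cur)
    return phrases

def rho_lz(x):
    """Normalized LZ complexity c(x) log c(x) / n."""
    c = len(lz78_phrases(x))
    return c * log2(c) / len(x)

# Reproduces the example above: a, b, ba, baa, bb, aa, ab, aa (8 phrases).
print(["".join(p) for p in lz78_phrases("abbabaabbaaabaa")])
```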
A useful inequality that relates the empirical entropy of non-overlapping $\ell$-blocks of $\hat{x}^n$ (where $\ell$ divides $n$) to $\rho_{LZ}(\hat{x}^n)$ (see, for example, Equation (26) of [22]) is the following:
$$\frac{H_\ell(\hat{X}^\ell)}{\ell} \ge \rho_{LZ}(\hat{x}^n) - \frac{\log(4\beta^{2\ell})\log\beta}{(1 - \varepsilon_n)\log n} - \frac{\beta^{2\ell}\log(4\beta^{2\ell})}{n} - \frac{1}{\ell} = \rho_{LZ}(\hat{x}^n) - \delta_n(\ell).$$
It is obtained from the fact that the Shannon code for $\ell$-blocks can be implemented using a finite-state encoder with no more than $\beta^\ell$ states. Specifically, for a block code of length $\ell$ to be implemented by a finite-state machine, one defines the state at each time instant $i$ to be the contents of the input, starting at the beginning of the current block (at time $\ell\cdot\lfloor i/\ell\rfloor + 1$) and ending at time $i - 1$. The number of states for an input alphabet of size $\beta$ is then $\sum_{i=0}^{\ell-1}\beta^i = (\beta^\ell - 1)/(\beta - 1) < \beta^\ell$. Therefore, the code-length of this Shannon code must comply with the lower bound of Theorem 1 in [11]. Note that $\lim_{n\to\infty}\delta_n(\ell) = 1/\ell$, and so, $\lim_{\ell\to\infty}\lim_{n\to\infty}\delta_n(\ell) = 0$. Clearly, it is possible to let $\ell = \ell(n)$ increase with $n$ slowly enough such that $\delta_n(\ell(n)) \to 0$ as $n \to \infty$; in particular, $\ell(n)$ should be $o(\log n)$ for that purpose.
In [22], the notion of the LZ complexity was extended to incorporate finite-state lossless compression in the presence of side information, namely, the conditional version of the LZ complexity. Given $\hat{x}^n$ and $\tilde{x}^n$, let us apply the incremental parsing procedure of the LZ algorithm to the sequence of pairs $((\hat{x}_1, \tilde{x}_1), (\hat{x}_2, \tilde{x}_2), \ldots, (\hat{x}_n, \tilde{x}_n))$. As mentioned before, according to this procedure, all phrases are distinct, with a possible exception of the last phrase, which might be incomplete. Let $c(\hat{x}^n, \tilde{x}^n)$ denote the number of distinct phrases. As an example (taken from [22]), let $n = 6$ and consider the sequence pair $(\hat{x}^6, \tilde{x}^6)$ along with its joint incremental parsing as follows:
$$\hat{x}^6 = 0\,|\,1\,|\,01\,|\,01\,, \qquad \tilde{x}^6 = 0\,|\,1\,|\,00\,|\,01\,;$$
then $c(\hat{x}^6, \tilde{x}^6) = 4$. Let $c(\hat{x}^n)$ denote the resulting number of distinct phrases of $\hat{x}^n$ (which may differ from the number of phrases obtained in the individual parsing of $\hat{x}^n$ alone), and let $\hat{x}(l)$ denote the $l$-th distinct $\hat{x}$-phrase, $l = 1, 2, \ldots, c(\hat{x}^n)$. In the above example, $c(\hat{x}^6) = 3$. Denote by $c_l(\tilde{x}^n | \hat{x}^n)$ the number of occurrences of $\hat{x}(l)$ in the parsing of $\hat{x}^n$, or equivalently, the number of distinct $\tilde{x}$-phrases that jointly appear with $\hat{x}(l)$. Clearly, $\sum_{l=1}^{c(\hat{x}^n)} c_l(\tilde{x}^n | \hat{x}^n) = c(\hat{x}^n, \tilde{x}^n)$. In the above example, $\hat{x}(1) = 0$, $\hat{x}(2) = 1$, $\hat{x}(3) = 01$, $c_1(\tilde{x}^6 | \hat{x}^6) = c_2(\tilde{x}^6 | \hat{x}^6) = 1$, and $c_3(\tilde{x}^6 | \hat{x}^6) = 2$. Now, the conditional LZ complexity of $\tilde{x}^n$ given $\hat{x}^n$ is defined as
$$\rho_{LZ}(\tilde{x}^n | \hat{x}^n) = \frac{1}{n}\sum_{l=1}^{c(\hat{x}^n)} c_l(\tilde{x}^n | \hat{x}^n)\log c_l(\tilde{x}^n | \hat{x}^n).$$
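The following Python sketch (ours, reusing lz78_phrases from the sketch above) computes $\rho_{LZ}(\tilde{x}^n | \hat{x}^n)$ by jointly parsing the pair sequence and counting, for each distinct $\hat{x}$-phrase, the number of joint phrases in which it appears:

```python
from collections import Counter
from math import log2

def rho_lz_cond(xtilde, xhat):
    """Conditional LZ complexity rho_LZ(xtilde | xhat), per the definition above."""
    pairs = list(zip(xhat, xtilde))
    joint_phrases = lz78_phrases(pairs)      # joint incremental parsing
    # c_l: occurrences of each distinct xhat-phrase in the joint parsing
    c = Counter(tuple(xh for (xh, _) in phrase) for phrase in joint_phrases)
    return sum(cl * log2(cl) for cl in c.values()) / len(pairs)

# The example above: 4 joint phrases, c_l = (1, 1, 2), so
# rho_LZ(xtilde^6 | xhat^6) = (0 + 0 + 2 * log2(2)) / 6 = 1/3.
print(rho_lz_cond("010001", "010101"))
```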
In [22], it was shown that $\rho_{LZ}(\tilde{x}^n | \hat{x}^n)$ is the main term of the compression ratio achieved by the conditional version of the LZ algorithm for compressing $\tilde{x}^n$ in the presence of the side information $\hat{x}^n$, available to both encoder and decoder (see the compression scheme described in [22], and see also [15]). That is, the length function, $LZ(\tilde{x}^n | \hat{x}^n)$, of the coding scheme proposed therein is upper bounded (in parallel to (13)) by
$$LZ(\tilde{x}^n | \hat{x}^n) \le n\rho_{LZ}(\tilde{x}^n | \hat{x}^n) + n\hat{\epsilon}(n),$$
where $\hat{\epsilon}(n) = O\left(\frac{\log(\log n)}{\log n}\right)$ (see Equations (10) and (11) in [15]). On the other hand, analogously to Theorem 1 of [11], it was shown in [16] that $\rho_{LZ}(\tilde{x}^n | \hat{x}^n)$ is also the main term of a lower bound to the compression ratio that can be achieved by any finite-state encoder with side information at both ends, provided that the number of states is not too large, similarly as described above for the unconditional version.
The inequality (15) also extends to the conditional case as follows (see [16]):
$$\frac{H_\ell(\tilde{X}^\ell | \hat{X}^\ell)}{\ell} \ge \rho_{LZ}(\tilde{x}^n | \hat{x}^n) - \delta_n'(\ell),$$
where $\delta_n'(\ell)$ is the same as $\delta_n(\ell)$, except that $\beta$ therein is replaced by $\beta\gamma$, to accommodate the number of states associated with the conditional version of the aforementioned Shannon code applied to $\ell$-blocks. By the same token, we also have
$$\frac{H_\ell(\hat{X}^\ell, \tilde{X}^\ell)}{\ell} \ge \rho_{LZ}(\hat{x}^n, \tilde{x}^n) - \delta_n'(\ell).$$

4. The Outer Bound for Successive Refinement

Our main result for finite-state encoders is the following.
Theorem 1.
For every $x^n \in \mathcal{X}^n$,
$$\mathcal{R}_q(x^n) \subseteq \mathcal{R}_o(x^n) = \bigcup_{(\hat{x}^n, \tilde{x}^n) \in \mathcal{B}(x^n)} \mathcal{R}_{LZ}(\hat{x}^n, \tilde{x}^n),$$
where
$$\mathcal{R}_{LZ}(\hat{x}^n, \tilde{x}^n) = \left\{(R_1, R_2) :\ R_1 \ge \rho_{LZ}(\hat{x}^n) - \Delta_1(q, n),\ R_1 + R_2 \ge \rho_{LZ}(\hat{x}^n) + \rho_{LZ}(\tilde{x}^n | \hat{x}^n) - \Delta_2(q, n)\right\},$$
and where $\Delta_1(q, n)$ and $\Delta_2(q, n)$ are defined as
$$\Delta_1(q, n) = \frac{\log(4q^2)\log\beta}{(1 - \epsilon_n)\log n} + \frac{q^2\log(4q^2)}{n},$$
with $\epsilon_n \to 0$ as $n \to \infty$, and
$$\Delta_2(n, q) = \min_{\{\ell:\ \ell\ \text{divides}\ n\}}\left\{\delta_n(\ell) + \delta_n'(\ell) + \frac{1}{\ell}\log\left[q^4\left(1 + \log\left(1 + \frac{\beta^\ell\gamma^\ell}{q^4}\right)\right)\right]\right\}.$$
Discussion 1.
Several comments are now in order.
1.
Since both $\Delta_1(q, n)$ and $\Delta_2(q, n)$ tend to zero as $n \to \infty$ for fixed $q$, the asymptotic achievability of $\mathcal{R}_o(x^n)$ is conceptually straightforward: given an internal point, $(R_1, R_2) \in \mathcal{R}_o(x^n)$, there must be at least one pair $(\hat{x}^n, \tilde{x}^n) \in \mathcal{B}(x^n)$ such that $(R_1, R_2) \in \mathcal{R}_{LZ}(\hat{x}^n, \tilde{x}^n)$. Upon finding such a pair, proceed as follows. At the first stage, apply LZ78 compression to $\hat{x}^n$ at a coding rate of $R_1 = LZ(\hat{x}^n)/n$, which is only slightly above $\rho_{LZ}(\hat{x}^n)$ for large $n$, as discussed in Section 3. At the second stage, apply conditional LZ compression of $\tilde{x}^n$ given $\hat{x}^n$ as side information at both ends, at an incremental coding rate of $R_2 = LZ(\tilde{x}^n | \hat{x}^n)/n$, which is close to $\rho_{LZ}(\tilde{x}^n | \hat{x}^n)$, as also explained in Section 3; the total rate, $R_1 + R_2$, is then about $\rho_{LZ}(\hat{x}^n) + \rho_{LZ}(\tilde{x}^n | \hat{x}^n)$ (see also the sketch following this discussion). Similarly as in [11], there is still a certain gap between the achievability and the converse theorem, because the achievability requires encoders whose number of states is not small compared to $n$, whereas the converse is significant when $q$ is very small relative to $n$. As in [11], this gap can be closed in the asymptotic limit of large $q$ by partitioning the sequence into non-overlapping blocks and restarting the LZ compression mechanism in each block separately. We will address this point in detail later on.
2.
Considering that the achievability is conceptually straightforward, as explained in item no. 1 above, the interesting and deeper result is the converse theorem. Since the second-stage encoder receives both $\hat{x}^n$ and $\tilde{x}^n$ as inputs, it is immediate to lower bound the total coding rate, at the second stage, in terms of the joint compressibility of $(\hat{x}^n, \tilde{x}^n)$, namely, by $\rho_{LZ}(\hat{x}^n, \tilde{x}^n)$. But recall that the first-stage encoder must have already allocated a rate at least as large as $\rho_{LZ}(\hat{x}^n)$; hence, in order to meet a lower bound of $\rho_{LZ}(\hat{x}^n, \tilde{x}^n)$ on the total coding rate, the incremental rate, $R_2$, of the second stage must not exceed $\rho_{LZ}(\hat{x}^n, \tilde{x}^n) - \rho_{LZ}(\hat{x}^n)$, and there is no apparent way to achieve such a coding rate, as far as the author can see. Nonetheless, since we can also lower bound the total rate of both stages by $\rho_{LZ}(\hat{x}^n) + \rho_{LZ}(\tilde{x}^n | \hat{x}^n)$, the achievability becomes obvious, as said. This point is not trivial because there is no chain rule that applies to the LZ complexities of arbitrary finite sequences. The proof that $\rho_{LZ}(\hat{x}^n) + \rho_{LZ}(\tilde{x}^n | \hat{x}^n)$ also serves (essentially) as a lower bound requires a certain manipulation, which uses a generalized Kraft inequality and passes via empirical entropies, as can be seen in the proof.
3.
The choice of $\hat{x}^n$ exhibits a trade-off between the coding rate of the first stage and the incremental rate at the second stage: since $\hat{x}^n$ is both compressed at the first stage and serves as side information at the second stage, there might be a certain tension between selecting $\hat{x}^n$ to have a small $\rho_{LZ}(\hat{x}^n)$ and selecting it for a small $\rho_{LZ}(\tilde{x}^n | \hat{x}^n)$. Of course, an analogous tension exists also in successive refinement for memoryless sources [3]. The reproduction encoder must select $(\hat{x}^n, \tilde{x}^n) \in \mathcal{B}(x^n)$ that best compromises between these criteria.
4.
The results extend straightforwardly to any finite number of stages, where at each stage one applies conditional LZ compression of the current reproduction given all previous reproductions.
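To summarize item no. 1 above in code, the main rate terms of Theorem 1 can be estimated for any candidate pair $(\hat{x}^n, \tilde{x}^n) \in \mathcal{B}(x^n)$ by combining the sketches of Section 3 (this is an illustration only; the search for a good pair, performed by the reproduction encoder, is left unspecified):

```python
# Sketch: main rate terms of the two-stage scheme for a candidate pair,
# reusing rho_lz and rho_lz_cond from the sketches of Section 3.
def two_stage_rates(xhat, xtilde):
    R1 = rho_lz(xhat)                  # stage 1: LZ78 rate for the coarse description
    R2 = rho_lz_cond(xtilde, xhat)     # stage 2: conditional-LZ incremental rate
    return R1, R1 + R2                 # (first-stage rate, total rate), up to vanishing terms
```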
Proof of Theorem 1.
We begin with the first stage. By definition, if $(R_1, R_2) \in \mathcal{R}_q(x^n)$, then there must exist an encoder $E \in \mathcal{E}(q)$ and $(\hat{x}^n, \tilde{x}^n) \in \mathcal{B}(x^n)$ such that $R_1 \ge [\rho_E(\hat{x}^n)]_1$ and $R_1 + R_2 \ge [\rho_E(\hat{x}^n, \tilde{x}^n)]_2$. Now, according to Theorem 1 of [11]:
$$\begin{aligned}
[\rho_E(\hat{x}^n)]_1 &\ge \frac{c(\hat{x}^n) + q^2}{n}\cdot\log\frac{c(\hat{x}^n) + q^2}{4q^2} + \frac{2q^2}{n} \\
&> \frac{c(\hat{x}^n) + q^2}{n}\cdot\log[c(\hat{x}^n) + q^2] - \frac{c(\hat{x}^n) + q^2}{n}\log(4q^2) \\
&> \frac{c(\hat{x}^n)\log c(\hat{x}^n)}{n} - \frac{c(\hat{x}^n)\log(4q^2)}{n} - \frac{q^2\log(4q^2)}{n} \\
&\ge \frac{c(\hat{x}^n)\log c(\hat{x}^n)}{n} - \frac{\log(4q^2)\log\beta}{(1 - \epsilon_n)\log n} - \frac{q^2\log(4q^2)}{n} \\
&= \rho_{LZ}(\hat{x}^n) - \Delta_1(q, n),
\end{aligned}$$
where $\lim_{n\to\infty}\epsilon_n = 0$, and the last inequality is an application of Equation (6) in [11]. Since $R_1 \ge [\rho_E(\hat{x}^n)]_1$, it follows that
$$R_1 \ge \rho_{LZ}(\hat{x}^n) - \Delta_1(q, n).$$
Moving on to the combined encoder of both stages, consider the following. According to Lemma 2 of [11], and due to the postulated information losslessness, the combined encoder, which has $q^2$ states, must obey the following generalized Kraft inequality:
$$\sum_{(\hat{x}^\ell, \tilde{x}^\ell) \in \hat{\mathcal{X}}^\ell \times \tilde{\mathcal{X}}^\ell} 2^{-\left\{\min_{s \in \mathcal{S}} L[f_1(s, \hat{x}^\ell)] + \min_{z \in \mathcal{Z}} L[f_2(z, \hat{x}^\ell, \tilde{x}^\ell)]\right\}} \le q^4\left[1 + \log\left(1 + \frac{\beta^\ell\gamma^\ell}{q^4}\right)\right].$$
This implies that the description length at the output of this encoder is lower bounded as follows.
$$\begin{aligned}
n(R_1 + R_2) &\ge n[\rho_E(\hat{x}^n, \tilde{x}^n)]_2 = L(u^n) + L(v^n) \\
&= \sum_{t=1}^n \left\{L[f_1(s_t, \hat{x}_t)] + L[f_2(z_t, \hat{x}_t, \tilde{x}_t)]\right\} \\
&= \sum_{m=0}^{n/\ell - 1}\sum_{j=1}^{\ell} \left\{L[f_1(s_{m\ell+j}, \hat{x}_{m\ell+j})] + L[f_2(z_{m\ell+j}, \hat{x}_{m\ell+j}, \tilde{x}_{m\ell+j})]\right\} \\
&= \sum_{m=0}^{n/\ell - 1} \left\{L[f_1(s_{m\ell+1}, \hat{x}_{m\ell+1}^{m\ell+\ell})] + L[f_2(z_{m\ell+1}, \hat{x}_{m\ell+1}^{m\ell+\ell}, \tilde{x}_{m\ell+1}^{m\ell+\ell})]\right\} \\
&\ge \sum_{m=0}^{n/\ell - 1} \left\{\min_{s \in \mathcal{S}} L[f_1(s, \hat{x}_{m\ell+1}^{m\ell+\ell})] + \min_{z \in \mathcal{Z}} L[f_2(z, \hat{x}_{m\ell+1}^{m\ell+\ell}, \tilde{x}_{m\ell+1}^{m\ell+\ell})]\right\} \\
&= \frac{n}{\ell}\sum_{(\hat{x}^\ell, \tilde{x}^\ell) \in \hat{\mathcal{X}}^\ell \times \tilde{\mathcal{X}}^\ell} \hat{P}(\hat{x}^\ell, \tilde{x}^\ell)\cdot\left\{\min_{s \in \mathcal{S}} L[f_1(s, \hat{x}^\ell)] + \min_{z \in \mathcal{Z}} L[f_2(z, \hat{x}^\ell, \tilde{x}^\ell)]\right\},
\end{aligned}$$
and so,
$$R_1 + R_2 \ge \frac{1}{\ell}\sum_{(\hat{x}^\ell, \tilde{x}^\ell) \in \hat{\mathcal{X}}^\ell \times \tilde{\mathcal{X}}^\ell} \hat{P}(\hat{x}^\ell, \tilde{x}^\ell)\cdot\left\{\min_{s \in \mathcal{S}} L[f_1(s, \hat{x}^\ell)] + \min_{z \in \mathcal{Z}} L[f_2(z, \hat{x}^\ell, \tilde{x}^\ell)]\right\}.$$
Now, by the generalized Kraft inequality above,
$$\begin{aligned}
q^4\left[1 + \log\left(1 + \frac{\beta^\ell\gamma^\ell}{q^4}\right)\right] &\ge \sum_{(\hat{x}^\ell, \tilde{x}^\ell) \in \hat{\mathcal{X}}^\ell \times \tilde{\mathcal{X}}^\ell} 2^{-\left\{\min_{s \in \mathcal{S}} L[f_1(s, \hat{x}^\ell)] + \min_{z \in \mathcal{Z}} L[f_2(z, \hat{x}^\ell, \tilde{x}^\ell)]\right\}} \\
&= \sum_{(\hat{x}^\ell, \tilde{x}^\ell) \in \hat{\mathcal{X}}^\ell \times \tilde{\mathcal{X}}^\ell} \hat{P}(\hat{x}^\ell, \tilde{x}^\ell)\cdot 2^{-\left\{\min_{s \in \mathcal{S}} L[f_1(s, \hat{x}^\ell)] + \min_{z \in \mathcal{Z}} L[f_2(z, \hat{x}^\ell, \tilde{x}^\ell)] + \log\hat{P}(\hat{x}^\ell, \tilde{x}^\ell)\right\}} \\
&\ge 2^{-\sum_{(\hat{x}^\ell, \tilde{x}^\ell)} \hat{P}(\hat{x}^\ell, \tilde{x}^\ell)\left\{\min_{s \in \mathcal{S}} L[f_1(s, \hat{x}^\ell)] + \min_{z \in \mathcal{Z}} L[f_2(z, \hat{x}^\ell, \tilde{x}^\ell)]\right\} + H_\ell(\hat{X}^\ell, \tilde{X}^\ell)},
\end{aligned}$$
where the last inequality follows from the convexity of the exponential function and Jensen’s inequality. This yields
$$\log\left\{q^4\left[1 + \log\left(1 + \frac{\beta^\ell\gamma^\ell}{q^4}\right)\right]\right\} \ge H_\ell(\hat{X}^\ell, \tilde{X}^\ell) - \sum_{(\hat{x}^\ell, \tilde{x}^\ell) \in \hat{\mathcal{X}}^\ell \times \tilde{\mathcal{X}}^\ell} \hat{P}(\hat{x}^\ell, \tilde{x}^\ell)\cdot\left\{\min_{s \in \mathcal{S}} L[f_1(s, \hat{x}^\ell)] + \min_{z \in \mathcal{Z}} L[f_2(z, \hat{x}^\ell, \tilde{x}^\ell)]\right\},$$
implying that
$$\begin{aligned}
R_1 + R_2 &\ge \frac{L(u^n) + L(v^n)}{n} \ge \frac{1}{\ell}\sum_{(\hat{x}^\ell, \tilde{x}^\ell) \in \hat{\mathcal{X}}^\ell \times \tilde{\mathcal{X}}^\ell} \hat{P}(\hat{x}^\ell, \tilde{x}^\ell)\cdot\left\{\min_{s \in \mathcal{S}} L[f_1(s, \hat{x}^\ell)] + \min_{z \in \mathcal{Z}} L[f_2(z, \hat{x}^\ell, \tilde{x}^\ell)]\right\} \\
&\ge \frac{H_\ell(\hat{X}^\ell, \tilde{X}^\ell)}{\ell} - \frac{1}{\ell}\log\left\{q^4\left[1 + \log\left(1 + \frac{\beta^\ell\gamma^\ell}{q^4}\right)\right]\right\} \\
&= \frac{H_\ell(\hat{X}^\ell)}{\ell} + \frac{H_\ell(\tilde{X}^\ell | \hat{X}^\ell)}{\ell} - \frac{1}{\ell}\log\left\{q^4\left[1 + \log\left(1 + \frac{\beta^\ell\gamma^\ell}{q^4}\right)\right]\right\}.
\end{aligned}$$
Now, according to Equation (15),
$$\frac{H_\ell(\hat{X}^\ell)}{\ell} \ge \rho_{LZ}(\hat{x}^n) - \delta_n(\ell).$$
Similarly, according to Equation (19),
$$\frac{H_\ell(\tilde{X}^\ell | \hat{X}^\ell)}{\ell} \ge \rho_{LZ}(\tilde{x}^n | \hat{x}^n) - \delta_n'(\ell),$$
and so,
$$R_1 + R_2 \ge \rho_{LZ}(\hat{x}^n) + \rho_{LZ}(\tilde{x}^n | \hat{x}^n) - \delta_n(\ell) - \delta_n'(\ell) - \frac{1}{\ell}\log\left\{q^4\left[1 + \log\left(1 + \frac{\beta^\ell\gamma^\ell}{q^4}\right)\right]\right\}.$$
Maximizing this lower bound w.r.t. $\ell$ yields
$$R_1 + R_2 \ge \rho_{LZ}(\hat{x}^n) + \rho_{LZ}(\tilde{x}^n | \hat{x}^n) - \Delta_2(n, q),$$
where
$$\Delta_2(n, q) = \min_{\{\ell:\ \ell\ \text{divides}\ n\}}\left\{\delta_n(\ell) + \delta_n'(\ell) + \frac{1}{\ell}\log\left[q^4\left(1 + \log\left(1 + \frac{\beta^\ell\gamma^\ell}{q^4}\right)\right)\right]\right\}.$$
This completes the proof of Theorem 1. □
Referring to the last part of comment no. 1 in the discussion that follows Theorem 1, we now address the gap in terms of the number of states. For an infinite source sequence $x = (x_1, x_2, \ldots)$, we define the $q$-state achievable rate region for $x$ as
$$\mathcal{R}_q(x) = \bigcup_{m \ge 1}\bigcap_{n \ge m}\mathcal{R}_q(x^n),$$
and finally, the finite-state achievable rate region for $x$ is defined as
$$\mathcal{R}(x) = \bigcup_{q \ge 1}\mathcal{R}_q(x).$$
These definitions are two-dimensional counterparts of Equations (2)–(4) in [11], where the finite-state (lossless) compressibility of $x$ is defined in several steps. In particular, the union over intersections in the definition of $\mathcal{R}_q(x)$ is the set-theoretic analogue of the limit superior operation, and the union operation in the definition of $\mathcal{R}(x)$ is parallel to the limit $q \to \infty$.
Let $k$ be a positive integer that divides $n$, and consider the partition of $\hat{x}^n$ and $\tilde{x}^n$ into $n/k$ blocks of length $k$, i.e., $\hat{x}_{kt+1}^{kt+k} = (\hat{x}_{kt+1}, \hat{x}_{kt+2}, \ldots, \hat{x}_{kt+k})$ and $\tilde{x}_{kt+1}^{kt+k} = (\tilde{x}_{kt+1}, \tilde{x}_{kt+2}, \ldots, \tilde{x}_{kt+k})$, $t = 0, 1, \ldots, n/k - 1$. Next, define:
$$\begin{aligned}
\mathcal{R}^k(\hat{x}^n, \tilde{x}^n) = \Big\{(R_1, R_2) :\ & R_1 \ge \frac{k}{n}\sum_{t=0}^{n/k - 1}\rho_{LZ}(\hat{x}_{kt+1}^{kt+k}) - \Delta_1(q, k), \\
& R_1 + R_2 \ge \frac{k}{n}\sum_{t=0}^{n/k - 1}\left[\rho_{LZ}(\hat{x}_{kt+1}^{kt+k}) + \rho_{LZ}(\tilde{x}_{kt+1}^{kt+k} | \hat{x}_{kt+1}^{kt+k})\right] - \Delta_2(q, k)\Big\}.
\end{aligned}$$
Then, similarly as in Theorem 1,
$$\mathcal{R}_q(x^n) \subseteq \mathcal{R}_o^k(x^n) = \bigcup_{(\hat{x}^n, \tilde{x}^n) \in \mathcal{B}(x^n)} \mathcal{R}^k(\hat{x}^n, \tilde{x}^n),$$
and so, for every positive integer $N$:
$$\bigcap_{n \ge N}\mathcal{R}_q(x^n) \subseteq \bigcap_{n \ge N}\mathcal{R}_o^k(x^n),$$
implying that
$$\mathcal{R}_q(x) = \bigcup_{N \ge 1}\bigcap_{n \ge N}\mathcal{R}_q(x^n) \subseteq \bigcup_{N \ge 1}\bigcap_{n \ge N}\mathcal{R}_o^k(x^n) = \mathcal{R}_o^k(x).$$
Since this holds for every positive integer $k$, then
$$\mathcal{R}_q(x) \subseteq \bigcap_{K \ge 1}\bigcup_{k \ge K}\mathcal{R}_o^k(x) = \mathcal{R}_o(x),$$
and so,
$$\mathcal{R}(x) \subseteq \mathcal{R}_o(x),$$
which establishes an asymptotic version of the converse theorem.
As for the direct part, considering the fact that a block code of length $k$, operating on $k$-tuples of the two reconstruction vectors, can be implemented by a finite-state machine with no more than $(\beta\gamma)^k$ states, we have
$$\mathcal{R}_{(\beta\gamma)^k}(x^n) \supseteq \mathcal{R}_i^k(x^n) = \bigcup_{(\hat{x}^n, \tilde{x}^n) \in \mathcal{B}(x^n)} \mathcal{R}_+^k(\hat{x}^n, \tilde{x}^n),$$
where
$$\begin{aligned}
\mathcal{R}_+^k(\hat{x}^n, \tilde{x}^n) = \Big\{(R_1, R_2) :\ & R_1 \ge \frac{k}{n}\sum_{t=0}^{n/k - 1}\rho_{LZ}(\hat{x}_{kt+1}^{kt+k}) + O\left(\frac{1}{\log k}\right), \\
& R_1 + R_2 \ge \frac{k}{n}\sum_{t=0}^{n/k - 1}\left[\rho_{LZ}(\hat{x}_{kt+1}^{kt+k}) + \rho_{LZ}(\tilde{x}_{kt+1}^{kt+k} | \hat{x}_{kt+1}^{kt+k})\right] + O\left(\frac{\log(\log k)}{\log k}\right)\Big\}.
\end{aligned}$$
Consequently,
$$\mathcal{R}_{(\beta\gamma)^k}(x) = \bigcup_{N \ge 1}\bigcap_{n \ge N}\mathcal{R}_{(\beta\gamma)^k}(x^n) \supseteq \bigcup_{N \ge 1}\bigcap_{n \ge N}\mathcal{R}_i^k(x^n) = \mathcal{R}_i^k(x) \supseteq \bigcap_{K \ge k}\mathcal{R}_i^K(x),$$
and so,
$$\mathcal{R}(x) = \bigcup_{k \ge 1}\mathcal{R}_{(\beta\gamma)^k}(x) \supseteq \bigcup_{k \ge 1}\bigcap_{K \ge k}\mathcal{R}_i^K(x) = \mathcal{R}_i(x).$$
We have just proved the following theorem:
Theorem 2.
For every infinite individual sequence $x = (x_1, x_2, \ldots)$,
$$\mathcal{R}_o(x) \supseteq \mathcal{R}(x) \supseteq \mathcal{R}_i(x).$$
These inner and outer bounds are tight in the sense that the definitions of $\mathcal{R}_o(x)$ and $\mathcal{R}_i(x)$ are based on the same building blocks, and the only difference is in terms that tend to zero as $k \to \infty$.

5. Multiple Description Coding

Consider next the configuration that is associated with the multiple description problem (see, e.g., Chapter 13 of [9]), where the source is an individual sequence and the encoders are modeled as finite-state machines. In particular, there are two $q$-state encoders and three decoders, which are defined as follows. Encoders 1 and 2 are fed by $x^n$ and produce two reconstructions, $\hat{x}^n$ and $\tilde{x}^n$, with distortions $d_1(x^n, \hat{x}^n) \le nD_1$ and $d_2(x^n, \tilde{x}^n) \le nD_2$, respectively. Encoder 1 then compresses $\hat{x}^n$ losslessly and sends a compressed description to Decoder 1. Likewise, Encoder 2 does the same with $\tilde{x}^n$ and sends a compressed form to Decoder 2. There is no collaboration between Decoders 1 and 2. The third decoder, Decoder 0, receives both compressed descriptions and generates yet another reconstruction, $\grave{x}^n$, with distortion $d_0(x^n, \grave{x}^n) \le nD_0$.
Using the same technique as in the proof of Theorem 1, it is easy to prove the following outer bound to the achievable rate region:
$$\mathcal{R}_o(x^n) = \bigcup_{(\hat{x}^n, \tilde{x}^n, \grave{x}^n) \in \mathcal{B}(x^n)} \mathcal{R}_{LZ}(\hat{x}^n, \tilde{x}^n, \grave{x}^n),$$
where $\mathcal{B}(x^n)$ is redefined as
$$\mathcal{B}(x^n) = \left\{(\hat{x}^n, \tilde{x}^n, \grave{x}^n) :\ d_0(x^n, \grave{x}^n) \le nD_0,\ d_1(x^n, \hat{x}^n) \le nD_1,\ d_2(x^n, \tilde{x}^n) \le nD_2\right\},$$
and
$$\mathcal{R}_{LZ}(\hat{x}^n, \tilde{x}^n, \grave{x}^n) = \left\{(R_1, R_2) :\ R_1 \ge \rho_{LZ}(\hat{x}^n) - \Delta_1(q, n),\ R_2 \ge \rho_{LZ}(\tilde{x}^n) - \Delta_1(q, n),\ R_1 + R_2 \ge \rho_{LZ}(\grave{x}^n | \hat{x}^n, \tilde{x}^n) + \rho_{LZ}(\hat{x}^n, \tilde{x}^n) - \Delta_2(q, n)\right\},$$
with $\Delta_1(q, n)$ and $\Delta_2(q, n)$ defined similarly as before. The sum-rate inequality is obtained by considering that the two encoders together compress the triple $(\hat{x}^n, \tilde{x}^n, \grave{x}^n)$ losslessly, and so, the main term of the lower bound to $R_1 + R_2$ is the joint empirical entropy of $(\hat{x}^n, \tilde{x}^n, \grave{x}^n)$. The latter can be decomposed as the sum of the joint empirical entropy of $(\hat{x}^n, \tilde{x}^n)$ and the conditional empirical entropy of $\grave{x}^n$ given $(\hat{x}^n, \tilde{x}^n)$, which in turn are essentially further lower bounded by $\rho_{LZ}(\hat{x}^n, \tilde{x}^n)$ and $\rho_{LZ}(\grave{x}^n | \hat{x}^n, \tilde{x}^n)$, respectively.
We next present two inner bounds, where the first one is analogous to the El Gamal–Cover inner bound [20] and the second follows the same line of thought as that of the Zhang–Berger inner bound [21].
The former inner bound is given by
$$\mathcal{R}_i^{EGC}(x^n) = \bigcup_{(\hat{x}^n, \tilde{x}^n, \grave{x}^n) \in \mathcal{B}(x^n)} \mathcal{R}_i(\hat{x}^n, \tilde{x}^n, \grave{x}^n),$$
where
$$\mathcal{R}_i(\hat{x}^n, \tilde{x}^n, \grave{x}^n) = \left\{(R_1, R_2) :\ R_1 \ge \rho_{LZ}(\hat{x}^n) + \epsilon(n),\ R_2 \ge \rho_{LZ}(\tilde{x}^n) + \epsilon(n),\ R_1 + R_2 \ge \rho_{LZ}(\grave{x}^n | \hat{x}^n, \tilde{x}^n) + \rho_{LZ}(\hat{x}^n, \tilde{x}^n) + \hat{I}(\hat{x}^n; \tilde{x}^n) + \epsilon(n) + \hat{\epsilon}(n)\right\},$$
where $\epsilon(n)$ and $\hat{\epsilon}(n)$ are as in (13) and (18), respectively, and
$$\hat{I}(\hat{x}^n; \tilde{x}^n) = \rho_{LZ}(\hat{x}^n) + \rho_{LZ}(\tilde{x}^n) - \rho_{LZ}(\hat{x}^n, \tilde{x}^n).$$
The quantity $\hat{I}(\hat{x}^n; \tilde{x}^n)$ plays the role of an empirical mutual information between $\hat{x}^n$ and $\tilde{x}^n$, and it manifests the gap between the sum-rate inequalities of the inner bound and the outer bound, analogously to the mutual information term of the El Gamal–Cover achievable region.
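In terms of the sketches of Section 3, $\hat{I}(\hat{x}^n; \tilde{x}^n)$ can be estimated as follows (ours, reusing lz78_phrases and rho_lz; an illustrative computation, not part of the coding scheme itself):

```python
from math import log2

def rho_lz_joint(xhat, xtilde):
    """Joint LZ complexity of the pair sequence, c log c / n."""
    pairs = list(zip(xhat, xtilde))
    c = len(lz78_phrases(pairs))
    return c * log2(c) / len(pairs)

def empirical_mutual_info(xhat, xtilde):
    # I^(xhat; xtilde) = rho_LZ(xhat) + rho_LZ(xtilde) - rho_LZ(xhat, xtilde)
    return rho_lz(xhat) + rho_lz(xtilde) - rho_lz_joint(xhat, xtilde)
```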
The achievability of the above inner bound is as follows. Given an internal point in $\mathcal{R}_i^{EGC}(x^n)$, there must exist a reconstruction triple $(\hat{x}^n, \tilde{x}^n, \grave{x}^n)$ that meets the distortion constraints and the corresponding rate inequalities. The encoder applies individual LZ compression to both $\hat{x}^n$ and $\tilde{x}^n$ and sends the compressed versions, at rates $\rho_{LZ}(\hat{x}^n)$ and $\rho_{LZ}(\tilde{x}^n)$ (up to negligibly small terms for large $n$), to Decoder 1 and Decoder 2, respectively. It then applies conditional LZ compression of $\grave{x}^n$ given $(\hat{x}^n, \tilde{x}^n)$ at rate $\rho_{LZ}(\grave{x}^n | \hat{x}^n, \tilde{x}^n)$ (up to small terms), and splits this compressed bit-stream between Decoders 1 and 2 without violating their rate inequalities. The rate sum is then essentially
$$\rho_{LZ}(\grave{x}^n | \hat{x}^n, \tilde{x}^n) + \rho_{LZ}(\hat{x}^n) + \rho_{LZ}(\tilde{x}^n) = \rho_{LZ}(\grave{x}^n | \hat{x}^n, \tilde{x}^n) + \rho_{LZ}(\hat{x}^n, \tilde{x}^n) + \hat{I}(\hat{x}^n; \tilde{x}^n).$$
In the above description, we explained that the bit-stream associated with the conditional compression of $\grave{x}^n$ given $(\hat{x}^n, \tilde{x}^n)$ is split between Decoders 1 and 2 without violating the rate inequalities. This is always possible because of the following simple fact: given a point in the region $\mathcal{R} = \{(R_1, R_2) :\ R_1 \ge A,\ R_2 \ge B,\ R_1 + R_2 \ge A + B + C\}$, there must exist $0 \le D \le C$ such that $R_1 \ge A + D$ and $R_2 \ge B + (C - D)$. In particular, let $D = \min\{R_1 - A, C\} \ge 0$. If $D = C$, then $R_1 \ge A + C = A + D$ and $R_2 \ge B$ trivially; otherwise, $D = R_1 - A$, so that $R_1 = A + D$ and $R_2 \ge (A + B + C) - R_1 = B + C - D$. In our case, $A = \rho_{LZ}(\hat{x}^n)$, $B = \rho_{LZ}(\tilde{x}^n)$, and $C = \rho_{LZ}(\grave{x}^n | \hat{x}^n, \tilde{x}^n)$.
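A minimal sketch of this splitting rule (ours, with the clipping of $D$ at $C$ made explicit):

```python
def split_refinement(R1, R2, A, B, C):
    """Split the rate-C refinement stream: Decoder 1 gets D, Decoder 2 gets C - D.
    Valid whenever R1 >= A, R2 >= B, and R1 + R2 >= A + B + C."""
    assert R1 >= A and R2 >= B and R1 + R2 >= A + B + C
    D = min(R1 - A, C)      # clip at C, in case R1 exceeds A + C
    assert R1 >= A + D and R2 >= B + (C - D)
    return D, C - D
```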
The second achievability scheme, in the spirit of the Zhang–Berger scheme, is as follows. Here, in addition to $\hat{x}^n$, $\tilde{x}^n$, and $\grave{x}^n$, we also generate an auxiliary finite-alphabet sequence, $u^n$. The encoder applies LZ compression to $u^n$ and conditional LZ compression of $\hat{x}^n$ given $u^n$, and sends both bit-streams to Decoder 1. At the same time, it also applies conditional LZ compression of $\tilde{x}^n$ given $u^n$ and sends the compressed forms of $u^n$ and $\tilde{x}^n$ to Decoder 2. Finally, the encoder applies conditional LZ compression of $\grave{x}^n$ given $(\hat{x}^n, \tilde{x}^n, u^n)$ and splits the compressed bit-stream between Decoder 1 and Decoder 2 in a manner that meets the rate constraints. Thus,
$$R_1 \ge \rho_{LZ}(u^n) + \rho_{LZ}(\hat{x}^n | u^n) + \alpha\cdot\rho_{LZ}(\grave{x}^n | \hat{x}^n, \tilde{x}^n, u^n),$$
$$R_2 \ge \rho_{LZ}(u^n) + \rho_{LZ}(\tilde{x}^n | u^n) + (1 - \alpha)\cdot\rho_{LZ}(\grave{x}^n | \hat{x}^n, \tilde{x}^n, u^n),$$
where $\alpha \in [0, 1]$. Thus,
$$\begin{aligned}
R_1 + R_2 &\ge 2\rho_{LZ}(u^n) + \rho_{LZ}(\hat{x}^n | u^n) + \rho_{LZ}(\tilde{x}^n | u^n) + \rho_{LZ}(\grave{x}^n | \hat{x}^n, \tilde{x}^n, u^n) \\
&= 2\rho_{LZ}(u^n) + \rho_{LZ}(\hat{x}^n, \tilde{x}^n | u^n) + \rho_{LZ}(\grave{x}^n | \hat{x}^n, \tilde{x}^n, u^n) + \hat{I}(\hat{x}^n; \tilde{x}^n | u^n),
\end{aligned}$$
where
$$\hat{I}(\hat{x}^n; \tilde{x}^n | u^n) = \rho_{LZ}(\hat{x}^n | u^n) + \rho_{LZ}(\tilde{x}^n | u^n) - \rho_{LZ}(\hat{x}^n, \tilde{x}^n | u^n)$$
is analogous to a conditional mutual information. The first term in the rate sum is analogous to $2I(U; X)$, the sum of the second and the third is analogous to $I(X; \hat{X}, \tilde{X}, \grave{X} | U)$, and the last term is analogous to $I(\hat{X}; \tilde{X} | U)$ (see Theorem 13.4, p. 332 in [9]).

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Equitz, W.H.R.; Cover, T.M. Successive refinement of information. IEEE Trans. Inform. Theory 1991, 37, 269–275.
  2. Koshelev, V.N. On the divisibility of discrete sources with an additive single-letter distortion measure. Probl. Inf. Transm. (IPPI) 1994, 30, 27–43.
  3. Rimoldi, B. Successive refinement of information: Characterization of achievable rates. IEEE Trans. Inform. Theory 1994, 40, 253–259.
  4. Kanlis, A.; Narayan, P. Error exponents for successive refinement by partitioning. IEEE Trans. Inform. Theory 1996, 42, 275–282.
  5. Steinberg, Y.; Merhav, N. On successive refinement for the Wyner–Ziv problem. IEEE Trans. Inform. Theory 2004, 50, 1636–1654.
  6. Maor, A.; Merhav, N. On successive refinement with causal side information at the decoders. IEEE Trans. Inform. Theory 2008, 54, 332–343.
  7. Maor, A.; Merhav, N. On successive refinement for the Kaspi/Heegard–Berger problem. IEEE Trans. Inform. Theory 2010, 56, 3930–3945.
  8. Tian, C.; Diggavi, S.N. Multistage successive refinement for Wyner–Ziv source coding with degraded side informations. In Proceedings of the 2006 IEEE International Symposium on Information Theory (ISIT 2006), Seattle, WA, USA, 9–14 July 2006; pp. 1594–1598.
  9. El Gamal, A.; Kim, Y.-H. Network Information Theory; Cambridge University Press: Cambridge, UK, 2011.
  10. Lempel, A.; Ziv, J. On the complexity of finite sequences. IEEE Trans. Inform. Theory 1976, 22, 75–81.
  11. Ziv, J.; Lempel, A. Compression of individual sequences via variable-rate coding. IEEE Trans. Inform. Theory 1978, 24, 530–536.
  12. Ziv, J. Distortion-rate theory for individual sequences. IEEE Trans. Inform. Theory 1980, 26, 137–143.
  13. Ziv, J. Fixed-rate encoding of individual sequences with side information. IEEE Trans. Inform. Theory 1984, 30, 348–352.
  14. Yang, E.-H.; Kieffer, J.C. Simple universal lossy data compression schemes derived from the Lempel–Ziv algorithm. IEEE Trans. Inform. Theory 1996, 42, 239–245.
  15. Uyematsu, T.; Kuzuoka, S. Conditional Lempel–Ziv complexity and its application to source coding theorem with side information. IEICE Trans. Fundam. 2003, 86, 2615–2617.
  16. Merhav, N. Universal detection of messages via finite-state channels. IEEE Trans. Inform. Theory 2000, 46, 2242–2246.
  17. Merhav, N. A universal random coding ensemble for sample-wise lossy compression. Entropy 2023, 25, 1199.
  18. Merhav, N. Lossy compression of individual sequences revisited: Fundamental limits of finite-state encoders. Entropy 2024, 26, 116.
  19. Merhav, N. Universal Slepian–Wolf coding for individual sequences. IEEE Trans. Inform. Theory 2025, 71, 783–796.
  20. El Gamal, A.; Cover, T.M. Achievable rates for multiple descriptions. IEEE Trans. Inform. Theory 1982, 28, 851–857.
  21. Zhang, Z.; Berger, T. New results in binary multiple descriptions. IEEE Trans. Inform. Theory 1987, 33, 502–521.
  22. Ziv, J. Universal decoding for finite-state channels. IEEE Trans. Inform. Theory 1985, 31, 453–460.