1. Introduction
The notion of
successive refinement of information refers to systems in which the reconstruction of the source occurs in multiple stages. In these systems, a single encoder encodes the source and communicates with either one decoder or multiple decoders in a step-by-step manner. At each stage, the encoder transmits a portion of the source information to the corresponding decoder, which also has access to all previous transmissions. Each decoder uses all available transmissions to reconstruct the source, possibly incorporating additional side information. The quality of the reconstruction at each stage (or by each decoder) is evaluated according to a predefined distortion measure. One of the motivations for this hierarchical structure is to allow scalability that matches the available channel resources of the various users who receive the compressed information, or even to adapt to time-varying channel conditions of a single user. Several studies have addressed the successive refinement problem for probabilistic sources, most notably memoryless sources; see, e.g., [1], where necessary and sufficient conditions for simultaneously achieving the rate-distortion function at all stages were established, [2,3], where the rate-distortion region was fully characterized in general, and [4], where source coding error exponents were derived. In some later works, such as [5,6,7,8], successive refinement coding was also considered with the incorporation of side information. Successive-refinement coding is also an important special case of the so-called multiple description coding, which, in its simplest form, consists of two encoders that send two different individual descriptions of the source to two separate respective decoders (which do not cooperate with each other); the compressed bit-streams pertaining to those descriptions are also combined and sent to yet another decoder, whose role is to produce a better reconstruction than those of the two individual decoders. The problem of fully characterizing the rate-distortion region of multiple description coding is open in its general form, and there exist certain inner and outer bounds to this region; see Chapter 13 of [9] for details and references therein.
In this paper, we focus on successive refinement of information in the context of individual sequences, namely, deterministic source sequences, as opposed to the traditional setting of random sequences governed by a certain probabilistic mechanism. In that sense, this work can be viewed as an additional step in the development of multiuser information theory for individual sequences, following a series of earlier works of this flavor, initiated by Ziv and Lempel in [10,11,12,13] and continued by others in many articles, such as [14,15,16,17,18,19]. In particular, we consider the problem of successive-refinement coding for lossy compression of individual sequences in two stages, where in the first stage, a coarse description at a relatively low rate is sent from the encoder to the decoder, and in the second stage, an additional coding rate is allocated in order to refine the description and thereby improve the reproduction. Our main results establish outer bounds (converse theorems) for the rate region of individual sequences, where we limit the encoders to be finite-state machines, similarly as in [11,12,13] and others. The matching achievability scheme is conceptually straightforward, and so, we believe that the deeper and more interesting part of the contribution of this work lies in the converse theorems, namely, in the outer bounds. Our results are formulated and proved for two stages of coding, but their extension to any fixed number of stages is straightforward. These results can also be viewed as an extension of the fixed-distortion results of [14] to successive-refinement coding.
We also consider the more general multiple description coding problem and propose achievability schemes that are analogous to the well-known El Gamal–Cover [20] and Zhang–Berger [21] achievability schemes for memoryless sources and additive distortion measures. There is a clear parallelism between the rate expressions that we obtain and those of [20,21], including the terms that are associated with the gaps between the outer bound and the corresponding inner bounds.
The outline of the remaining part of this paper is as follows. In Section 2, we establish notation conventions and formulate the problem setting. In Section 3, we provide some general background on the LZ algorithm, as well as on its conditional form. Section 4 is devoted to the above-mentioned successive refinement outer bound, and finally, in Section 5, we address the more general multiple description problem.
2. Notation Conventions and Problem Formulation
Throughout the paper, random variables will be denoted by capital letters, specific values they may take will be denoted by the corresponding lower case letters, and their alphabets will be denoted by calligraphic letters. Random vectors, their realizations, and their alphabets will be denoted, respectively, by capital letters, the corresponding lower case letters, and the corresponding calligraphic letters, all superscripted by their dimensions. More specifically, for a given positive integer, $n$, the source vector, with components, $x_i$, $i = 1, \ldots, n$, from a finite alphabet, $\mathcal{X}$, will be denoted by $x^n = (x_1, \ldots, x_n)$. The set of all such $n$-vectors will be denoted by $\mathcal{X}^n$, which is the $n$-th order Cartesian power of the single-letter source alphabet, $\mathcal{X}$. Likewise, reproduction vectors of length $n$, such as $(\hat{x}_1, \ldots, \hat{x}_n)$ and $(\tilde{x}_1, \ldots, \tilde{x}_n)$, with components, $\hat{x}_i$ and $\tilde{x}_i$, $i = 1, \ldots, n$, from finite alphabets, $\hat{\mathcal{X}}$ and $\tilde{\mathcal{X}}$, will be denoted by $\hat{x}^n$ and $\tilde{x}^n$, respectively. An infinite source sequence will be denoted by $x$. The cardinalities of $\mathcal{X}$, $\hat{\mathcal{X}}$, and $\tilde{\mathcal{X}}$ will be denoted by $\alpha$, $\hat{\alpha}$, and $\tilde{\alpha}$, respectively. The value $\alpha = \infty$ is allowed, to incorporate countably infinite and continuous source alphabets. By contrast, $\hat{\alpha}$ and $\tilde{\alpha}$ will always be finite.
For two positive integers $i$ and $j$, where $i \le j$, the notation $x_i^j$ will be used to denote the substring $(x_i, x_{i+1}, \ldots, x_j)$. For $i = 1$, the subscript '1' will be omitted, and so, the shorthand notation of $x_1^j$ will be $x^j$, as mentioned before. Similar conventions will apply to other sequences. Probability distributions will be denoted by the letter $P$ with possible subscripts, depending on the context. The indicator function for an event $\mathcal{A}$ will be denoted by $\mathcal{I}\{\mathcal{A}\}$, that is, $\mathcal{I}\{\mathcal{A}\} = 1$ if $\mathcal{A}$ occurs, and $\mathcal{I}\{\mathcal{A}\} = 0$ if not. The logarithmic function, $\log$, will be understood to be defined to the base 2. Logarithms to the base $e$ will be denoted by $\ln$. Let $d_1(x^n, \hat{x}^n)$ and $d_2(x^n, \tilde{x}^n)$ be two arbitrary distortion functions between source vectors, $x^n$, and the corresponding reproduction vectors, $\hat{x}^n$ and $\tilde{x}^n$, respectively.
The successive refinement encoder model is as follows. It is composed of a cascade of two encoders: a reproduction encoder (namely, a vector quantizer) followed by a lossless encoder. The input to the reproduction encoder is the source sequence $x^n$ and the output is a pair of reproduction vectors, $(\hat{x}^n, \tilde{x}^n)$, that obeys the distortion constraints, $d_1(x^n, \hat{x}^n) \le nD_1$ and $d_2(x^n, \tilde{x}^n) \le nD_2$, where $D_1$ and $D_2$ are prescribed normalized distortion levels. There are no particular restrictions imposed on the distortion functions. We denote
$$\mathcal{B}(x^n, D_1, D_2) = \left\{(\hat{x}^n, \tilde{x}^n):\; d_1(x^n, \hat{x}^n) \le nD_1,\; d_2(x^n, \tilde{x}^n) \le nD_2\right\}.$$
The mapping $x^n \to (\hat{x}^n, \tilde{x}^n)$, employed by the reproduction encoder, is arbitrary and not limited. The pair $(\hat{x}^n, \tilde{x}^n)$ serves as an input to the lossless encoder.
As for the lossless encoder, we follow the same modeling approach as in [11], but with a few adjustments that make it suitable to successive refinement. In particular, the lossless encoder is defined by a set
$$E = \left\{\hat{\mathcal{X}}, \tilde{\mathcal{X}}, \mathcal{Y}, \tilde{\mathcal{Y}}, \mathcal{S}, \tilde{\mathcal{S}}, f, \tilde{f}, g, \tilde{g}\right\},$$
where $\hat{\mathcal{X}}$ and $\tilde{\mathcal{X}}$ are as before; $\mathcal{Y}$ and $\tilde{\mathcal{Y}}$ are two sets of variable-length binary strings, which both include the empty string $\lambda$ of length zero; $\mathcal{S}$ and $\tilde{\mathcal{S}}$ are two sets of states, each one containing $q$ states; $f: \mathcal{S} \times \hat{\mathcal{X}} \to \mathcal{Y}$ and $\tilde{f}: \tilde{\mathcal{S}} \times \hat{\mathcal{X}} \times \tilde{\mathcal{X}} \to \tilde{\mathcal{Y}}$ are the encoder output functions, and finally, $g: \mathcal{S} \times \hat{\mathcal{X}} \to \mathcal{S}$ and $\tilde{g}: \tilde{\mathcal{S}} \times \hat{\mathcal{X}} \times \tilde{\mathcal{X}} \to \tilde{\mathcal{S}}$ are two next-state functions. When the lossless encoder is fed by a sequence of pairs $(\hat{x}_1, \tilde{x}_1), (\hat{x}_2, \tilde{x}_2), \ldots$, it outputs a corresponding sequence of pairs of binary strings, $(y_1, \tilde{y}_1), (y_2, \tilde{y}_2), \ldots$, according to the following recursive mechanism. For $i = 1, 2, \ldots$:
$$y_i = f(s_i, \hat{x}_i), \qquad s_{i+1} = g(s_i, \hat{x}_i),$$
$$\tilde{y}_i = \tilde{f}(\tilde{s}_i, \hat{x}_i, \tilde{x}_i), \qquad \tilde{s}_{i+1} = \tilde{g}(\tilde{s}_i, \hat{x}_i, \tilde{x}_i),$$
where the initial states, $s_1$ and $\tilde{s}_1$, are assumed arbitrary fixed members of $\mathcal{S}$ and $\tilde{\mathcal{S}}$, respectively. Similarly as in [11], we adopt the extended notation of $f(s_1, \hat{x}^k)$ for the string $y^k = (y_1, \ldots, y_k)$, of $g(s_1, \hat{x}^k)$ for $s_{k+1}$, and similar notations associated with $\tilde{f}$ and $\tilde{g}$.
An encoder $E$ is said to be information lossless if, for every positive integer $k$, the vector $(s_1, y^k, s_{k+1})$ uniquely determines $\hat{x}^k$, and likewise, $(\tilde{s}_1, \hat{x}^k, \tilde{y}^k, \tilde{s}_{k+1})$ uniquely determines $\tilde{x}^k$. Let $\mathcal{E}(q)$ be the set of all information lossless encoders, $E$, with $|\mathcal{S}| \le q$ and $|\tilde{\mathcal{S}}| \le q$.
Given a lossless encoder $E$ and a pair of inputs $(\hat{x}^n, \tilde{x}^n)$, we define
$$\rho_1(E; \hat{x}^n) = \frac{1}{n}\sum_{i=1}^n \ell(y_i)$$
and
$$\rho_2(E; \hat{x}^n, \tilde{x}^n) = \frac{1}{n}\sum_{i=1}^n \ell(\tilde{y}_i),$$
where $\ell(y_i)$ is the length (in bits) of the binary string $y_i$, and similarly for $\ell(\tilde{y}_i)$. Recall that for the empty string, $\lambda$, we define $\ell(\lambda) = 0$. The achievable rate region for $(x^n, D_1, D_2)$ that is associated with $E$ is defined as
$$\mathcal{R}(x^n, D_1, D_2; E) = \bigcup_{(\hat{x}^n, \tilde{x}^n) \in \mathcal{B}(x^n, D_1, D_2)} \left\{(R_1, R_2):\; R_1 \ge \rho_1(E; \hat{x}^n),\; R_2 \ge \rho_2(E; \hat{x}^n, \tilde{x}^n)\right\},$$
and the $q$-state achievable rate region for $(x^n, D_1, D_2)$ is defined as
$$\mathcal{R}_q(x^n, D_1, D_2) = \bigcup_{E \in \mathcal{E}(q)} \mathcal{R}(x^n, D_1, D_2; E).$$
and the
q-state achievable rate region for
is defined as
The rationale behind the union operations in these definitions is that they are two-dimensional set-theoretic analogues of the minimization operations over the appropriate encoders
and reproduction vectors,
, that appear for a single coding rate, like in [
11,
14], as can be noted from the simple relationship:
for a given generic distortion function
d and distortion level
D.
For later use, we also define the joint empirical distribution of $\ell$-blocks of $(\hat{x}^n, \tilde{x}^n)$, provided that $\ell$ divides $n$. Specifically, consider the empirical distribution, $\hat{P} = \{\hat{P}(\hat{x}^\ell, \tilde{x}^\ell)\}$, of pairs of $\ell$-vectors, defined as
$$\hat{P}(\hat{x}^\ell, \tilde{x}^\ell) = \frac{\ell}{n}\sum_{i=0}^{n/\ell - 1} \mathcal{I}\left\{\hat{x}_{i\ell+1}^{i\ell+\ell} = \hat{x}^\ell,\; \tilde{x}_{i\ell+1}^{i\ell+\ell} = \tilde{x}^\ell\right\}.$$
Let $\hat{H}(\hat{X}^\ell, \tilde{X}^\ell)$ denote the joint empirical entropy of an auxiliary pair of random $\ell$-vectors, $(\hat{X}^\ell, \tilde{X}^\ell)$, induced by $\hat{P}$, that is,
$$\hat{H}(\hat{X}^\ell, \tilde{X}^\ell) = -\sum_{\hat{x}^\ell, \tilde{x}^\ell} \hat{P}(\hat{x}^\ell, \tilde{x}^\ell) \log \hat{P}(\hat{x}^\ell, \tilde{x}^\ell).$$
Accordingly, $\hat{H}(\hat{X}^\ell)$ and $\hat{H}(\tilde{X}^\ell|\hat{X}^\ell)$ will denote the corresponding marginal empirical entropy of $\hat{X}^\ell$ and the conditional empirical entropy of $\tilde{X}^\ell$ given $\hat{X}^\ell$, respectively.
Our objective is to provide inner and outer bounds to the achievable rate region and show that they asymptotically coincide in the limit of large $n$, followed by a limit of large $q$, in analogy to the asymptotic regime of [11].
3. Background
Before the exposition of the main results and their proofs, we revisit key terms and details related to the 1978 version of the LZ algorithm, also known as the LZ78 algorithm [11], which is the central building block of this work. The incremental parsing procedure of the LZ78 algorithm is a sequential parsing process applied to a finite-alphabet input vector, say $\hat{x}^n$. According to this procedure, each new phrase is the shortest string not encountered before as a parsed phrase, except for the potential incompleteness of the last phrase. For instance, the incremental parsing of the vector $\hat{x}^{10} = (0, 1, 0, 0, 1, 1, 0, 1, 1, 0)$ results in $(0;\, 1;\, 00;\, 11;\, 01;\, 10)$. Let $c(\hat{x}^n)$ denote the number of phrases in $\hat{x}^n$ resulting from the incremental parsing procedure (in the above example, $c(\hat{x}^{10}) = 6$). Furthermore, let $LZ(\hat{x}^n)$ denote the length of the LZ78 binary compressed code for $\hat{x}^n$. According to Theorem 2 of [11], the following inequality holds:
$$LZ(\hat{x}^n) \le [c(\hat{x}^n) + 1] \log\left\{2\hat{\alpha}[c(\hat{x}^n) + 1]\right\} \le c(\hat{x}^n)\log c(\hat{x}^n)\,[1 + \epsilon_1(n)] + n\epsilon_2(n),$$
where we remind that $\hat{\alpha}$ is the cardinality of $\hat{\mathcal{X}}$, and where both $\epsilon_1(n)$ and $\epsilon_2(n)$ tend to zero as $n \to \infty$. In other words, the LZ code-length for $\hat{x}^n$ is upper bounded by an expression whose main term is $c(\hat{x}^n)\log c(\hat{x}^n)$. On the other hand, $c(\hat{x}^n)\log c(\hat{x}^n)$ is also known to be the main term of a lower bound (see Theorem 1 of [11]) to the shortest code-length attainable by any information lossless finite-state encoder with no more than $q$ states, provided that $q$ is very small compared to $n$. In view of these facts, we henceforth refer to $c(\hat{x}^n)\log c(\hat{x}^n)$ as the unnormalized LZ complexity of $\hat{x}^n$, whereas the normalized LZ complexity is defined as
$$\rho_{\mathrm{LZ}}(\hat{x}^n) = \frac{c(\hat{x}^n)\log c(\hat{x}^n)}{n}.$$
A useful inequality that relates the empirical entropy of non-overlapping $\ell$-blocks of $\hat{x}^n$ (where $\ell$ divides $n$) and $\rho_{\mathrm{LZ}}(\hat{x}^n)$ (see, for example, Equation (26) of [22]) is the following:
$$\rho_{\mathrm{LZ}}(\hat{x}^n) \le \frac{\hat{H}(\hat{X}^\ell)}{\ell} + \Delta(\ell, n),$$
where $\Delta(\ell, n) \to 0$ as $n \to \infty$ for fixed $\ell$. It is obtained from the fact that the Shannon code for $\ell$-blocks can be implemented using a finite-state encoder with no more than $\hat{\alpha}^\ell$ states. Specifically, for a block code of length $\ell$ to be implemented by a finite-state machine, one defines the state at each time instant $i$ to be the contents of the input, starting at the beginning of the current block and ending at time $i - 1$. The number of states for an input alphabet of size $\hat{\alpha}$ is then $1 + \hat{\alpha} + \hat{\alpha}^2 + \cdots + \hat{\alpha}^{\ell - 1} \le \hat{\alpha}^\ell$. Therefore, the code-length of this Shannon code must comply with the lower bound of Theorem 1 in [11]. Note that $\Delta(\ell, n)$ depends on $\ell$ and $n$ only via $\hat{\alpha}^\ell$ and $n$, and so, it remains small as long as $\hat{\alpha}^\ell$ is small relative to $n$. Clearly, it is possible to let $\ell$ increase with $n$ slowly enough such that $\Delta(\ell, n) \to 0$ as $n \to \infty$; in particular, $\ell$ should be $o(\log n)$ for that purpose.
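To illustrate the incremental parsing procedure and the normalized LZ complexity, the phrase count $c(\hat{x}^n)$ and $\rho_{\mathrm{LZ}}(\hat{x}^n)$ can be computed as follows (a sketch of our own, not code from [11]; the names are ours):

```python
from math import log2

def lz78_phrases(x):
    """LZ78 incremental parsing: each new phrase is the shortest string not
    encountered before as a phrase; the last phrase may be incomplete."""
    phrases, seen, cur = [], set(), ()
    for symbol in x:
        cur += (symbol,)
        if cur not in seen:
            seen.add(cur)
            phrases.append(cur)
            cur = ()
    if cur:                      # possibly incomplete last phrase
        phrases.append(cur)
    return phrases

def rho_lz(x):
    """Normalized LZ complexity c(x) log c(x) / n."""
    c = len(lz78_phrases(x))
    return c * log2(c) / len(x)

# The example in the text: c((0,1,0,0,1,1,0,1,1,0)) = 6.
assert len(lz78_phrases((0, 1, 0, 0, 1, 1, 0, 1, 1, 0))) == 6
```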
In [22], the notion of the LZ complexity was extended to incorporate finite-state lossless compression in the presence of side information, namely, the conditional version of the LZ complexity. Given $\tilde{x}^n$ and $\hat{x}^n$, let us apply the incremental parsing procedure of the LZ algorithm to the sequence of pairs $((\tilde{x}_1, \hat{x}_1), (\tilde{x}_2, \hat{x}_2), \ldots, (\tilde{x}_n, \hat{x}_n))$. As mentioned before, according to this procedure, all phrases are distinct, with a possible exception of the last phrase, which might be incomplete. Let $c(\tilde{x}^n, \hat{x}^n)$ denote the number of distinct phrases. As an example (in the spirit of the one given in [22]), let $\hat{\alpha} = \tilde{\alpha} = 2$ and consider the sequence pair
$$\tilde{x}^8 = (0, 1, 0, 0, 1, 1, 0, 1), \qquad \hat{x}^8 = (0, 0, 1, 1, 0, 0, 1, 1),$$
along with its joint incremental parsing as follows:
$$\binom{\tilde{x}^8}{\hat{x}^8}:\quad \binom{0}{0},\; \binom{1}{0},\; \binom{0}{1},\; \binom{0\,1}{1\,0},\; \binom{1\,0}{0\,1},\; \binom{1}{1};$$
then $c(\tilde{x}^8, \hat{x}^8) = 6$. Let $\hat{c}(\hat{x}^n)$ denote the resulting number of distinct phrases of $\hat{x}^n$ (which may differ from $c(\hat{x}^n)$ in individual parsing of $\hat{x}^n$ alone), and let $\hat{x}(l)$ denote the $l$-th distinct $\hat{x}$–phrase, $l = 1, \ldots, \hat{c}(\hat{x}^n)$. In the above example, $\hat{c}(\hat{x}^8) = 4$. Denote by $c_l(\tilde{x}^n|\hat{x}^n)$ the number of occurrences of $\hat{x}(l)$ in the parsing of $\hat{x}^n$, or equivalently, the number of distinct $\tilde{x}$-phrases that jointly appear with $\hat{x}(l)$. Clearly, $\sum_{l=1}^{\hat{c}(\hat{x}^n)} c_l(\tilde{x}^n|\hat{x}^n) = c(\tilde{x}^n, \hat{x}^n)$. In the above example, the distinct $\hat{x}$-phrases are $0$, $1$, $10$, and $01$, with $c_1 = 2$, $c_2 = 2$, $c_3 = 1$, and $c_4 = 1$. Now, the conditional LZ complexity of $\tilde{x}^n$ given $\hat{x}^n$ is defined as
$$\rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n) = \frac{1}{n}\sum_{l=1}^{\hat{c}(\hat{x}^n)} c_l(\tilde{x}^n|\hat{x}^n) \log c_l(\tilde{x}^n|\hat{x}^n).$$
In [22], it was shown that $\rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n)$ is the main term of the compression ratio achieved by the conditional version of the LZ algorithm for compressing $\tilde{x}^n$ in the presence of the side information $\hat{x}^n$, available to both encoder and decoder—see the compression scheme described in [22] (see also [15]); i.e., the length function, $L(\tilde{x}^n|\hat{x}^n)$, of the coding scheme proposed therein is upper bounded (in parallel to (13)) by
$$L(\tilde{x}^n|\hat{x}^n) \le n\left[\rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n) + \tilde{\epsilon}(n)\right],$$
where $\tilde{\epsilon}(n) \to 0$ as $n \to \infty$ (see Equations (10) and (11) in [15]). On the other hand, analogously to Theorem 1 of [11], it was shown in [16] that $\rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n)$ is also the main term of a lower bound to the compression ratio that can be achieved by any finite-state encoder with side information at both ends, provided that the number of states is not too large, similarly as described above for the unconditional version.
The inequality (15) also extends to the conditional case as follows (see [16]):
$$\rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n) \le \frac{\hat{H}(\tilde{X}^\ell|\hat{X}^\ell)}{\ell} + \Delta'(\ell, n),$$
where $\Delta'(\ell, n)$ is the same as $\Delta(\ell, n)$, except that $\hat{\alpha}$ therein is replaced by $\hat{\alpha}\tilde{\alpha}$, to accommodate the number of states associated with the conditional version of the aforementioned Shannon code applied to $\ell$-blocks. By the same token, we also have
$$\rho_{\mathrm{LZ}}(\tilde{x}^n, \hat{x}^n) \le \frac{\hat{H}(\hat{X}^\ell, \tilde{X}^\ell)}{\ell} + \Delta'(\ell, n).$$
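In the same spirit, here is a minimal sketch (again, our own illustration, reusing the conventions of the previous snippet) of the joint parsing and of the conditional LZ complexity $\rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n)$:

```python
from collections import Counter
from math import log2

def rho_lz_conditional(x_tld, x_hat):
    """Conditional LZ complexity (1/n) sum_l c_l log c_l, where c_l is the
    number of joint phrases whose x_hat-part is the l-th distinct x_hat-phrase."""
    n = len(x_tld)
    # Joint incremental parsing of the sequence of (x_tld, x_hat) pairs.
    phrases, seen, cur = [], set(), ()
    for pair in zip(x_tld, x_hat):
        cur += (pair,)
        if cur not in seen:
            seen.add(cur)
            phrases.append(cur)
            cur = ()
    if cur:
        phrases.append(cur)
    # Count joint phrases by their x_hat-part; distinct joint phrases with the
    # same x_hat-part necessarily carry distinct x_tld-parts.
    counts = Counter(tuple(h for (_, h) in ph) for ph in phrases)
    return sum(c * log2(c) for c in counts.values()) / n

# The example in the text: c_1 = c_2 = 2 and c_3 = c_4 = 1, so the complexity
# is (2*log2(2) + 2*log2(2)) / 8 = 0.5 bits per symbol.
assert rho_lz_conditional((0,1,0,0,1,1,0,1), (0,0,1,1,0,0,1,1)) == 0.5
```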
4. The Outer Bound for Successive Refinement
Our main result for finite-state encoders is the following.
Theorem 1. For every positive integer $q$ and every $n$,
$$\mathcal{R}_q(x^n, D_1, D_2) \subseteq \mathcal{R}_{\mathrm{LZ}}(x^n, D_1, D_2),$$
where
$$\mathcal{R}_{\mathrm{LZ}}(x^n, D_1, D_2) = \bigcup_{(\hat{x}^n, \tilde{x}^n) \in \mathcal{B}(x^n, D_1, D_2)} \left\{(R_1, R_2):\; R_1 \ge \rho_{\mathrm{LZ}}(\hat{x}^n) - \epsilon_1(n, q),\; R_1 + R_2 \ge \rho_{\mathrm{LZ}}(\hat{x}^n) + \rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n) - \epsilon_2(n, q)\right\},$$
and where $\epsilon_1(n, q)$ and $\epsilon_2(n, q)$ tend to zero as $n \to \infty$ for every fixed $q$; their precise forms are given in the proof below.
Discussion 1. Several comments are now in order.
- 1.
Since both $\epsilon_1(n, q)$ and $\epsilon_2(n, q)$ tend to zero as $n \to \infty$ for fixed $q$, the asymptotic achievability of $\mathcal{R}_{\mathrm{LZ}}(x^n, D_1, D_2)$ is conceptually straightforward: Given an internal point, $(R_1, R_2)$, there must be at least one pair $(\hat{x}^n, \tilde{x}^n) \in \mathcal{B}(x^n, D_1, D_2)$ such that $R_1 \ge \rho_{\mathrm{LZ}}(\hat{x}^n)$ and $R_1 + R_2 \ge \rho_{\mathrm{LZ}}(\hat{x}^n) + \rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n)$. Upon finding such a pair, proceed as follows: At the first stage, apply LZ78 compression to $\hat{x}^n$, at a coding rate which is only slightly above $\rho_{\mathrm{LZ}}(\hat{x}^n)$ for large $n$, as discussed in Section 3. At the second stage, apply conditional LZ compression of $\tilde{x}^n$ given $\hat{x}^n$ as side information at both ends, at an incremental coding rate which is close to $\rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n)$, as also explained in Section 3; the total rate, $R_1 + R_2$, is then about $\rho_{\mathrm{LZ}}(\hat{x}^n) + \rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n)$ (see also the sketch that follows this list). Similarly as in [11], there is still a certain gap between the achievability and the converse theorem, because the achievability requires encoders whose number of states is not small compared to $n$, whereas the converse is significant when $q$ is very small relative to $n$. As in [11], this gap can be closed in the asymptotic limit of large $q$ by partitioning the sequence into non-overlapping blocks and restarting the LZ compression mechanism in each block separately. We will address this point in detail later on. - 2.
Considering that the achievability is conceptually straightforward, as explained in item no. 1 above, the interesting and deeper result is the converse theorem. Since the second-stage encoder receives both $\hat{x}^n$ and $\tilde{x}^n$ as inputs, it is immediate to lower bound the total coding rate at the second stage in terms of the joint compressibility of $(\hat{x}^n, \tilde{x}^n)$, namely, by $\rho_{\mathrm{LZ}}(\tilde{x}^n, \hat{x}^n)$. But recall that the first-stage encoder must have already allocated a rate at least as large as $\rho_{\mathrm{LZ}}(\hat{x}^n)$; hence, in order to meet a total-rate lower bound of $\rho_{\mathrm{LZ}}(\tilde{x}^n, \hat{x}^n)$, the incremental rate, $R_2$, of the second stage must not exceed $\rho_{\mathrm{LZ}}(\tilde{x}^n, \hat{x}^n) - \rho_{\mathrm{LZ}}(\hat{x}^n)$, and there is no apparent way to achieve such a coding rate, as far as the author can see. Nonetheless, since we can also lower bound the total rate of both stages by $\rho_{\mathrm{LZ}}(\hat{x}^n) + \rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n)$, the achievability becomes obvious, as said. This point is not trivial, because there is no chain rule that applies to the LZ complexities of arbitrary finite sequences. The proof that $\rho_{\mathrm{LZ}}(\hat{x}^n) + \rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n)$ also serves (essentially) as a lower bound therefore requires a certain manipulation, which uses a generalized Kraft inequality and passes via empirical entropies, as can be seen in the proof.
- 3.
The choice of the pair $(\hat{x}^n, \tilde{x}^n)$ exhibits a trade-off between the coding rate of the first stage and the incremental rate at the second stage, because $\hat{x}^n$ is both compressed at the first stage and serves as side information at the second stage; thus, there might be a certain tension between selecting $\hat{x}^n$ for having small $\rho_{\mathrm{LZ}}(\hat{x}^n)$ and selecting it for small $\rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n)$. Of course, an analogous tension exists also in successive refinement for memoryless sources [3]. The reproduction encoder must select a pair $(\hat{x}^n, \tilde{x}^n)$ that best compromises between these criteria. - 4.
The results extend straightforwardly to any finite number of stages, where at each stage one applies conditional LZ compression of the current reproduction given all previous reproductions.
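As promised in item 1 above, here is a small sketch (ours, building on the two functions from the snippets of Section 3) of the rate accounting of the two-stage scheme, with the vanishing redundancy terms ignored:

```python
def successive_refinement_rates(x_hat, x_tld):
    """Main rate terms of the two-stage scheme: stage 1 LZ78-compresses x_hat;
    stage 2 compresses x_tld with x_hat as side information at both ends."""
    r1 = rho_lz(x_hat)                     # first-stage rate, about rho_LZ(x_hat)
    r2 = rho_lz_conditional(x_tld, x_hat)  # incremental second-stage rate
    return r1, r1 + r2                     # (R1, total rate R1 + R2)
```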
Proof of Theorem 1. We begin with the first stage. By definition, if $(R_1, R_2) \in \mathcal{R}_q(x^n, D_1, D_2)$, then there must exist an encoder $E \in \mathcal{E}(q)$ and a pair $(\hat{x}^n, \tilde{x}^n) \in \mathcal{B}(x^n, D_1, D_2)$ such that $R_1 \ge \rho_1(E; \hat{x}^n)$ and $R_2 \ge \rho_2(E; \hat{x}^n, \tilde{x}^n)$. Now, according to Theorem 1 of [11]:
$$\rho_1(E; \hat{x}^n) \ge \frac{c(\hat{x}^n)\log c(\hat{x}^n)}{n} - \epsilon_1(n, q) = \rho_{\mathrm{LZ}}(\hat{x}^n) - \epsilon_1(n, q),$$
where $\epsilon_1(n, q) \to 0$ as $n \to \infty$ for fixed $q$, and the last inequality is an application of Equation (6) in [11]. Since $R_1 \ge \rho_1(E; \hat{x}^n)$, it follows that
$$R_1 \ge \rho_{\mathrm{LZ}}(\hat{x}^n) - \epsilon_1(n, q).$$
Moving on to the combined encoder of both stages, consider the following. According to Lemma 2 of [11] and due to the postulated information losslessness, the combined encoder, which has $q^2$ states, must obey the following generalized Kraft inequality:
$$\sum_{(\hat{x}^\ell, \tilde{x}^\ell) \in \hat{\mathcal{X}}^\ell \times \tilde{\mathcal{X}}^\ell} 2^{-L(\hat{x}^\ell, \tilde{x}^\ell)} \le q^4,$$
where $L(\hat{x}^\ell, \tilde{x}^\ell)$ denotes the number of output bits contributed by an $\ell$-block with content $(\hat{x}^\ell, \tilde{x}^\ell)$, minimized over the pairs of states at the beginning of the block. This implies that the description length at the output of this encoder is lower bounded as follows:
$$n\left[\rho_1(E; \hat{x}^n) + \rho_2(E; \hat{x}^n, \tilde{x}^n)\right] \ge \sum_{i=0}^{n/\ell - 1} L\left(\hat{x}_{i\ell+1}^{i\ell+\ell}, \tilde{x}_{i\ell+1}^{i\ell+\ell}\right),$$
and so,
$$\rho_1(E; \hat{x}^n) + \rho_2(E; \hat{x}^n, \tilde{x}^n) \ge \frac{1}{\ell}\cdot\boldsymbol{E}\left\{L(\hat{X}^\ell, \tilde{X}^\ell)\right\},$$
where $\boldsymbol{E}\{\cdot\}$ denotes expectation w.r.t. the empirical distribution $\hat{P}$. Now, by the generalized Kraft inequality above,
$$q^4 \ge \sum_{\hat{x}^\ell, \tilde{x}^\ell} 2^{-L(\hat{x}^\ell, \tilde{x}^\ell)} \ge \sum_{\{(\hat{x}^\ell, \tilde{x}^\ell):\; \hat{P}(\hat{x}^\ell, \tilde{x}^\ell) > 0\}} \hat{P}(\hat{x}^\ell, \tilde{x}^\ell)\, 2^{-[L(\hat{x}^\ell, \tilde{x}^\ell) + \log \hat{P}(\hat{x}^\ell, \tilde{x}^\ell)]} \ge 2^{-\boldsymbol{E}\{L(\hat{X}^\ell, \tilde{X}^\ell)\} + \hat{H}(\hat{X}^\ell, \tilde{X}^\ell)},$$
where the last inequality follows from the convexity of the exponential function and Jensen's inequality. This yields
$$\boldsymbol{E}\left\{L(\hat{X}^\ell, \tilde{X}^\ell)\right\} \ge \hat{H}(\hat{X}^\ell, \tilde{X}^\ell) - 4\log q,$$
implying that
$$\rho_1(E; \hat{x}^n) + \rho_2(E; \hat{x}^n, \tilde{x}^n) \ge \frac{\hat{H}(\hat{X}^\ell)}{\ell} + \frac{\hat{H}(\tilde{X}^\ell|\hat{X}^\ell)}{\ell} - \frac{4\log q}{\ell}.$$
Now, according to Equation (15),
$$\frac{\hat{H}(\hat{X}^\ell)}{\ell} \ge \rho_{\mathrm{LZ}}(\hat{x}^n) - \Delta(\ell, n).$$
Similarly, according to Equation (19),
$$\frac{\hat{H}(\tilde{X}^\ell|\hat{X}^\ell)}{\ell} \ge \rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n) - \Delta'(\ell, n),$$
and so,
$$R_1 + R_2 \ge \rho_{\mathrm{LZ}}(\hat{x}^n) + \rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n) - \Delta(\ell, n) - \Delta'(\ell, n) - \frac{4\log q}{\ell}.$$
Maximizing this lower bound w.r.t. $\ell$ yields
$$R_1 + R_2 \ge \rho_{\mathrm{LZ}}(\hat{x}^n) + \rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n) - \epsilon_2(n, q),$$
where
$$\epsilon_2(n, q) = \min_\ell\left[\Delta(\ell, n) + \Delta'(\ell, n) + \frac{4\log q}{\ell}\right].$$
This completes the proof of Theorem 1. □
Referring to the last part of comment no. 1 in the discussion that follows Theorem 1, we now address the gap in terms of the number of states. For an infinite source sequence $x$, we define the $q$-state achievable rate region for $x$ as
$$\mathcal{R}_q(x, D_1, D_2) = \bigcup_{n=1}^\infty \bigcap_{n'=n}^\infty \mathcal{R}_q(x^{n'}, D_1, D_2),$$
and finally, the finite-state achievable rate region for $x$ is defined as
$$\mathcal{R}(x, D_1, D_2) = \bigcup_{q=1}^\infty \mathcal{R}_q(x, D_1, D_2).$$
These definitions are two-dimensional counterparts of Equations (2)–(4) in [11], where the finite-state (lossless) compressibility of $x$ is defined in several steps. In particular, the union over intersections in the definition of $\mathcal{R}_q(x, D_1, D_2)$ is the set-theoretic analogue of the limit superior operation, and the union operation in the definition of $\mathcal{R}(x, D_1, D_2)$ is parallel to a limit of $q \to \infty$.
Let $k$ be a positive integer that divides $n$, and consider the partition of $\hat{x}^n$ and $\tilde{x}^n$ into $n/k$ non-overlapping blocks of length $k$, i.e., $\hat{x}_{ik+1}^{ik+k}$ and $\tilde{x}_{ik+1}^{ik+k}$, $i = 0, 1, \ldots, n/k - 1$. Next, define:
$$\bar{\rho}_{\mathrm{LZ}}^{(k)}(\hat{x}^n) = \frac{1}{n}\sum_{i=0}^{n/k - 1} c\left(\hat{x}_{ik+1}^{ik+k}\right)\log c\left(\hat{x}_{ik+1}^{ik+k}\right), \qquad \bar{\rho}_{\mathrm{LZ}}^{(k)}(\tilde{x}^n|\hat{x}^n) = \frac{k}{n}\sum_{i=0}^{n/k - 1} \rho_{\mathrm{LZ}}\left(\tilde{x}_{ik+1}^{ik+k}\,\middle|\,\hat{x}_{ik+1}^{ik+k}\right),$$
and let $\mathcal{R}_{\mathrm{LZ}}^{(k)}(x^n, D_1, D_2)$ be defined as $\mathcal{R}_{\mathrm{LZ}}(x^n, D_1, D_2)$, but with $\rho_{\mathrm{LZ}}(\hat{x}^n)$, $\rho_{\mathrm{LZ}}(\tilde{x}^n|\hat{x}^n)$, $\epsilon_1(n, q)$, and $\epsilon_2(n, q)$ replaced by $\bar{\rho}_{\mathrm{LZ}}^{(k)}(\hat{x}^n)$, $\bar{\rho}_{\mathrm{LZ}}^{(k)}(\tilde{x}^n|\hat{x}^n)$, $\epsilon_1(k, q)$, and $\epsilon_2(k, q)$, respectively. Then, similarly as in Theorem 1,
$$\mathcal{R}_q(x^n, D_1, D_2) \subseteq \mathcal{R}_{\mathrm{LZ}}^{(k)}(x^n, D_1, D_2),$$
and so, for every positive integer $N$ (with $n$ restricted to integer multiples of $k$):
$$\bigcap_{n \ge N} \mathcal{R}_q(x^n, D_1, D_2) \subseteq \bigcap_{n \ge N} \mathcal{R}_{\mathrm{LZ}}^{(k)}(x^n, D_1, D_2),$$
implying that
$$\mathcal{R}_q(x, D_1, D_2) \subseteq \bigcup_{N \ge 1}\bigcap_{n \ge N} \mathcal{R}_{\mathrm{LZ}}^{(k)}(x^n, D_1, D_2).$$
Since this holds for every positive integer $k$, then
$$\mathcal{R}_q(x, D_1, D_2) \subseteq \bigcap_{k \ge 1}\bigcup_{N \ge 1}\bigcap_{n \ge N} \mathcal{R}_{\mathrm{LZ}}^{(k)}(x^n, D_1, D_2),$$
and so,
$$\mathcal{R}(x, D_1, D_2) \subseteq \bigcup_{q \ge 1}\bigcap_{k \ge 1}\bigcup_{N \ge 1}\bigcap_{n \ge N} \mathcal{R}_{\mathrm{LZ}}^{(k)}(x^n, D_1, D_2) \stackrel{\triangle}{=} \mathcal{R}_o(x, D_1, D_2),$$
which establishes an asymptotic version of the converse theorem.
As for the direct part, considering the fact that a block code of length $k$, operating on $k$-tuples of the two reconstruction vectors, can be implemented by a finite-state machine with no more than $(\hat{\alpha}\tilde{\alpha})^k$ states, we have
$$\mathcal{R}_i(x, D_1, D_2) \stackrel{\triangle}{=} \bigcap_{k \ge 1}\bigcup_{N \ge 1}\bigcap_{n \ge N} \mathcal{R}_{\mathrm{LZ}}^{(k),+}(x^n, D_1, D_2) \subseteq \mathcal{R}(x, D_1, D_2),$$
where $\mathcal{R}_{\mathrm{LZ}}^{(k),+}(x^n, D_1, D_2)$ is defined as $\mathcal{R}_{\mathrm{LZ}}^{(k)}(x^n, D_1, D_2)$, except that the redundancy terms are added to, rather than subtracted from, the corresponding LZ complexity terms, and so, the inner and the outer bounds differ only in terms that vanish as $k \to \infty$.
We have just proved the following theorem:
Theorem 2. For every infinite individual sequence $x$,
$$\mathcal{R}_i(x, D_1, D_2) \subseteq \mathcal{R}(x, D_1, D_2) \subseteq \mathcal{R}_o(x, D_1, D_2).$$
These inner and outer bounds are tight in the sense that the definitions of $\mathcal{R}_i(x, D_1, D_2)$ and $\mathcal{R}_o(x, D_1, D_2)$ are based on the same building blocks, and the only difference between them is in terms that tend to zero as $k \to \infty$.
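The blocked quantities that appear in the bounds above can be illustrated by the following sketch (our own, reusing the earlier snippets): the parsing is restarted in each $k$-block, and the per-block complexities are averaged:

```python
def blocked_rho_lz(x_hat, x_tld, k):
    """Average per-block LZ complexities over non-overlapping k-blocks,
    restarting the (conditional) parsing in each block; k must divide n."""
    n = len(x_hat)
    assert len(x_tld) == n and n % k == 0
    starts = range(0, n, k)
    r1 = sum(k * rho_lz(x_hat[i:i + k]) for i in starts) / n
    r2 = sum(k * rho_lz_conditional(x_tld[i:i + k], x_hat[i:i + k])
             for i in starts) / n
    return r1, r2
```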
5. Multiple Description Coding
Consider next the configuration that is associated with the multiple description problem (see, e.g., Chapter 13 of [9]), where the source is an individual sequence and the encoders are modeled as finite-state machines. In particular, there are two $q$-state encoders and three decoders, which are defined as follows. Encoders 1 and 2 are fed by $x^n$ and produce two reconstructions, $\hat{x}^n$ and $\tilde{x}^n$, with distortions $d_1(x^n, \hat{x}^n) \le nD_1$ and $d_2(x^n, \tilde{x}^n) \le nD_2$, respectively. Encoder 1 then compresses $\hat{x}^n$ losslessly and sends a compressed description to Decoder 1. Likewise, Encoder 2 does the same with $\tilde{x}^n$ and sends a compressed form to Decoder 2. There is no collaboration between Decoders 1 and 2. The third decoder, Decoder 0, receives both compressed descriptions and generates yet another reconstruction, $x_0^n$, with distortion $d_0(x^n, x_0^n) \le nD_0$.
Using the same technique as in the proof of Theorem 1, it is easy to prove the following outer bound to the achievable rate region:
$$\mathcal{R}_q(x^n, D_0, D_1, D_2) \subseteq \bigcup_{(\hat{x}^n, \tilde{x}^n, x_0^n) \in \mathcal{B}(x^n, D_0, D_1, D_2)} \left\{(R_1, R_2):\; R_1 \ge \rho_{\mathrm{LZ}}(\hat{x}^n) - \epsilon_1(n, q),\; R_2 \ge \rho_{\mathrm{LZ}}(\tilde{x}^n) - \epsilon_1(n, q),\; R_1 + R_2 \ge \rho_{\mathrm{LZ}}(\hat{x}^n, \tilde{x}^n) + \rho_{\mathrm{LZ}}(x_0^n|\hat{x}^n, \tilde{x}^n) - \epsilon_3(n, q)\right\},$$
where $\mathcal{B}$ is redefined as
$$\mathcal{B}(x^n, D_0, D_1, D_2) = \left\{(\hat{x}^n, \tilde{x}^n, x_0^n):\; d_1(x^n, \hat{x}^n) \le nD_1,\; d_2(x^n, \tilde{x}^n) \le nD_2,\; d_0(x^n, x_0^n) \le nD_0\right\},$$
and with $\epsilon_1(n, q)$ and $\epsilon_3(n, q)$ being defined similarly as before. The sum-rate inequality is obtained by considering that the two encoders together compress losslessly the triple $(\hat{x}^n, \tilde{x}^n, x_0^n)$, and so, the main term of the lower bound to $R_1 + R_2$ is the joint empirical entropy of $(\hat{X}^\ell, \tilde{X}^\ell, X_0^\ell)$, which can be decomposed as the sum of the joint empirical entropy of $(\hat{X}^\ell, \tilde{X}^\ell)$ and the conditional empirical entropy of $X_0^\ell$ given $(\hat{X}^\ell, \tilde{X}^\ell)$, which in turn are essentially further lower bounded by $\rho_{\mathrm{LZ}}(\hat{x}^n, \tilde{x}^n)$ and $\rho_{\mathrm{LZ}}(x_0^n|\hat{x}^n, \tilde{x}^n)$, respectively.
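In summary, and suppressing all vanishing redundancy terms, the chain behind the sum-rate bound reads (a sketch of the argument, not a verbatim reproduction of it):
$$R_1 + R_2 \gtrsim \frac{\hat{H}(\hat{X}^\ell, \tilde{X}^\ell, X_0^\ell)}{\ell} = \frac{\hat{H}(\hat{X}^\ell, \tilde{X}^\ell)}{\ell} + \frac{\hat{H}(X_0^\ell|\hat{X}^\ell, \tilde{X}^\ell)}{\ell} \ge \rho_{\mathrm{LZ}}(\hat{x}^n, \tilde{x}^n) + \rho_{\mathrm{LZ}}(x_0^n|\hat{x}^n, \tilde{x}^n) - o(1).$$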
We next present two inner bounds, where the first one is analogous to the El Gamal–Cover inner bound [20], and the second follows the same line of thought as that of the Zhang–Berger inner bound [21]. The former inner bound is given by
$$\mathcal{R}_{\mathrm{in}}(x^n, D_0, D_1, D_2) = \bigcup_{(\hat{x}^n, \tilde{x}^n, x_0^n) \in \mathcal{B}(x^n, D_0, D_1, D_2)} \left\{(R_1, R_2):\; R_1 \ge \rho_{\mathrm{LZ}}(\hat{x}^n) + \epsilon(n),\; R_2 \ge \rho_{\mathrm{LZ}}(\tilde{x}^n) + \epsilon(n),\; R_1 + R_2 \ge \rho_{\mathrm{LZ}}(\hat{x}^n) + \rho_{\mathrm{LZ}}(\tilde{x}^n) + \rho_{\mathrm{LZ}}(x_0^n|\hat{x}^n, \tilde{x}^n) + 2\epsilon(n) + \tilde{\epsilon}(n)\right\},$$
where $\epsilon(n)$ and $\tilde{\epsilon}(n)$ are the redundancy terms of (13) and (18), respectively. The quantity $\rho_{\mathrm{LZ}}(\hat{x}^n) + \rho_{\mathrm{LZ}}(\tilde{x}^n) - \rho_{\mathrm{LZ}}(\hat{x}^n, \tilde{x}^n)$ plays the role of an empirical mutual information between $\hat{x}^n$ and $\tilde{x}^n$, which manifests the gap between the lower bounds in the sum-rate inequalities of the inner bound and the outer bound, analogously to the mutual information term of the El Gamal–Cover achievable region.
The achievability of the above inner bound is as follows. Given an internal point in $\mathcal{R}_{\mathrm{in}}(x^n, D_0, D_1, D_2)$, there must exist a reconstruction triple $(\hat{x}^n, \tilde{x}^n, x_0^n)$ that meets the distortion constraints and the corresponding rate inequalities. The encoder applies individual LZ compression to both $\hat{x}^n$ and $\tilde{x}^n$ and sends the compressed versions, at rates $\rho_{\mathrm{LZ}}(\hat{x}^n)$ and $\rho_{\mathrm{LZ}}(\tilde{x}^n)$ (up to negligibly small terms for large $n$), to Decoder 1 and Decoder 2, respectively. It then applies conditional LZ compression of $x_0^n$ given $(\hat{x}^n, \tilde{x}^n)$ at rate $\rho_{\mathrm{LZ}}(x_0^n|\hat{x}^n, \tilde{x}^n)$ (up to small terms), and splits this compressed bit-stream between Decoders 1 and 2 without violating their rate inequalities. The rate sum is then essentially
$$R_1 + R_2 = \rho_{\mathrm{LZ}}(\hat{x}^n) + \rho_{\mathrm{LZ}}(\tilde{x}^n) + \rho_{\mathrm{LZ}}(x_0^n|\hat{x}^n, \tilde{x}^n).$$
In the above description, we explained that the bit-stream associated with the conditional compression of $x_0^n$ given $(\hat{x}^n, \tilde{x}^n)$ is split between Decoders 1 and 2 without violating the rate inequalities. This is always possible because of the following simple fact: Given an internal point $(r_1, r_2)$ in the region $\{(r_1, r_2):\; r_1 \ge a_1,\; r_2 \ge a_2,\; r_1 + r_2 \ge a_1 + a_2 + b\}$, with $b \ge 0$, there must exist $\Delta_1, \Delta_2 \ge 0$ such that $\Delta_1 + \Delta_2 = b$, $r_1 \ge a_1 + \Delta_1$, and $r_2 \ge a_2 + \Delta_2$. In particular, let $\Delta_1 = \min\{r_1 - a_1, b\}$ and $\Delta_2 = b - \Delta_1$. Then, $\Delta_1 \ge 0$ and $\Delta_2 \ge 0$, and both rate inequalities are satisfied. In our case, $a_1 = \rho_{\mathrm{LZ}}(\hat{x}^n)$, $a_2 = \rho_{\mathrm{LZ}}(\tilde{x}^n)$, and $b = \rho_{\mathrm{LZ}}(x_0^n|\hat{x}^n, \tilde{x}^n)$.
The second achievability scheme, in the spirit of the Zhang–Berger scheme, is as follows. Here, in addition to $\hat{x}^n$, $\tilde{x}^n$, and $x_0^n$, we also generate an auxiliary finite-alphabet sequence, $u^n$. The encoder applies LZ compression to $u^n$ and conditional LZ compression of $\hat{x}^n$ given $u^n$, and sends both bit-streams to Decoder 1. At the same time, it also applies conditional LZ compression of $\tilde{x}^n$ given $u^n$ and sends the compressed forms of $u^n$ and $\tilde{x}^n$ to Decoder 2. Finally, the encoder applies conditional LZ compression of $x_0^n$ given $(u^n, \hat{x}^n, \tilde{x}^n)$ and splits the compressed bit-stream between Decoder 1 and Decoder 2 in a manner that meets the rate constraints. Thus,
$$R_1 \ge \rho_{\mathrm{LZ}}(u^n) + \rho_{\mathrm{LZ}}(\hat{x}^n|u^n) + \Delta_1, \qquad R_2 \ge \rho_{\mathrm{LZ}}(u^n) + \rho_{\mathrm{LZ}}(\tilde{x}^n|u^n) + \Delta_2,$$
where $\Delta_1 + \Delta_2 = \rho_{\mathrm{LZ}}(x_0^n|u^n, \hat{x}^n, \tilde{x}^n)$. Thus,
$$R_1 + R_2 \ge 2\rho_{\mathrm{LZ}}(u^n) + \rho_{\mathrm{LZ}}(\hat{x}^n|u^n) + \rho_{\mathrm{LZ}}(\tilde{x}^n|u^n) + \rho_{\mathrm{LZ}}(x_0^n|u^n, \hat{x}^n, \tilde{x}^n),$$
where $\rho_{\mathrm{LZ}}(\hat{x}^n|u^n) + \rho_{\mathrm{LZ}}(\tilde{x}^n|u^n) - \rho_{\mathrm{LZ}}(\hat{x}^n, \tilde{x}^n|u^n)$ is analogous to conditional mutual information. The first term in the rate sum is analogous to $2I(X; U)$, the sum of the second and the third is analogous to the conditional mutual information terms that involve the two side reconstructions given $U$, and the last term is analogous to the conditional mutual information term that is associated with the central reconstruction (see Theorem 13.4, p. 332 in [9]).
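Finally, the rate accounting of the Zhang–Berger-style scheme can be sketched in the same manner as before (our own illustration, reusing the earlier snippets; the side-information sequences are combined by symbol-wise pairing, and all redundancy terms are ignored):

```python
def zhang_berger_style_rates(u, x_hat, x_tld, x0):
    """Main rate terms of the ZB-style scheme: both decoders receive u; each
    also receives its own reconstruction conditioned on u; the refinement
    stream for x0 is split between the two decoders."""
    side_all = tuple(zip(u, x_hat, x_tld))   # combined side information for x0
    r_common = rho_lz(u)
    r1 = r_common + rho_lz_conditional(x_hat, u)
    r2 = r_common + rho_lz_conditional(x_tld, u)
    r_refine = rho_lz_conditional(x0, side_all)  # to be split between decoders
    return r1, r2, r1 + r2 + r_refine
```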