1. Introduction
Kraft’s inequality plays a pivotal role in information theory. It provides a complete and elegant characterization of the feasibility of variable-length uniquely decodable (UD) codes by imposing a simple constraint on codeword lengths. In 1949, Kraft [
1] introduced this inequality for prefix codes, establishing a condition on codeword lengths necessary for prefix decodability. Seven years later, McMillan [
2] generalized this to UD codes, leading to the Kraft–McMillan inequality, which is widely used in information theory, first and foremost, to furnish a necessary and sufficient condition for the existence of a UD code with a given code-length function, and thereby also to prove the converse to the lossless source coding theorem, asserting that no UD source code can yield a coding rate below the entropy rate of the source. Once this necessary and sufficient condition is satisfied, there exists not only a general UD code, but also more specifically, a prefix code with that length function. Beyond its immediate operational meaning, Kraft’s inequality underlies many fundamental principles in lossless compression, such as the equivalence between lossless source coding and probability assignment. In general, its importance stems from the fact that it connects combinatorial properties of codes with analytical bounds in a precise and tractable manner. Classical treatments can be found in standard texts such as [
3,
4].
When memory is introduced into the encoder, however, the classical Kraft inequality (CKI) no longer applies directly. Finite-state (FS) encoders constitute a natural and widely studied model for compression with memory, arising in universal source coding, individual-sequence coding, and FS prediction. In this setting, the encoder’s output depends not only on the current source symbol, but also on an internal state that evolves over time in a manner that depends on past inputs. As a result, the set of admissible codeword length assignments is no longer characterized by a single scalar inequality, and the extension of Kraft’s condition becomes substantially more subtle.
Significant progress in this direction was made by Ziv and Lempel [
5], who derived a generalized Kraft inequality (GKI) for information-lossless (IL) FS encoders by considering blocks over large super-alphabets, see Lemma 2 in [
5]. When reading Ziv and Lempel’s article, the reader might get the impression that their GKI was established merely as an auxiliary result needed on the way to proving that the FS compressibility of a sequence is lower bounded by its asymptotic empirical entropy. Their focus was not on the Kraft inequality in its own right. Consequently, their formulation of Kraft’s inequality suffers from two main limitations: (i) it does not reduce exactly to the CKI when the encoder has merely one state, and (ii) it is based on super-alphabet extensions to long blocks rather than being formulated in a single-letter manner, i.e., at the level at which the encoder is defined in the first place. More precisely, while the inequality remains valid even for short block lengths, it yields tight results only asymptotically for long blocks. Nevertheless, a direct, state-level generalization of Kraft’s inequality that mirrors the simplicity and sharpness of the classical result has remained elusive.
In this paper, we present several new forms of GKIs for IL FS encoders. Our approach associates with every given IL FS encoder a nonnegative matrix, termed the Kraft matrix, whose entries are determined by the encoder’s single-symbol output lengths and state transitions. We show that information losslessness imposes a spectral-radius constraint on this matrix, which serves as a natural analog of Kraft’s inequality. Unlike Ziv and Lempel’s GKI mentioned above, this inequality, as well as its several equivalent forms presented herein, reduces exactly to the CKI in the single-state case and avoids the use of super-alphabet extensions.
We then further refine the analysis for irreducible FS encoders, where the Perron–Frobenius theory yields stronger, uniform bounds on matrix powers. These results lead to transparent lower bounds on achievable compression rates for both stochastic sources and individual sequences. In addition, we extend the framework to settings with side information (SI) available at both the encoder and decoder, where the relevant constraint is expressed in terms of the joint spectral radius (JSR) of a finite set of Kraft matrices [
6]. This extension clarifies the structural limitations imposed by SI and highlights the role of common sub-invariant vectors. Finally, another extension is associated with lossy source coding in the spirit of those of [
7,
8,
9].
Overall, the proposed framework provides a unified and exact characterization of feasibility conditions for FS encoders, sharpening existing results and offering new tools for the analysis of compression and prediction under finite-memory constraints.
The outline of the remaining part of this article is as follows. In
Section 2, we establish notation conventions, define the setting, and provide some background on the GKI of Ziv and Lempel. In
Section 3, we present our basic GKI, asserting that the spectral radius of the Kraft matrix must not exceed unity for an IL FS encoder. Stronger and more explicit statements are then provided for irreducible encoders in
Section 4. In
Section 5, we apply the GKI of
Section 4 to obtain converse bounds on compression and prediction of irreducible machines, both in the probabilistic setting and for individual sequences. Finally, in
Section 6, we extend the GKI to the case of availability of SI, and in
Section 7, we extend it to the lossy case.
2. Notation, Setting and Background
Throughout this paper, scalar random variables (RVs) will be denoted by capital letters, their sample values will be denoted by the respective lower case letters, and their alphabets will be denoted by the respective calligraphic letters. A similar convention will apply to random vectors and their sample values, which will be denoted with the same symbols superscripted by the dimension. Thus, for example, $X^n$ ($n$ – positive integer) will denote a random $n$-vector $(X_1,\ldots,X_n)$, and $x^n=(x_1,\ldots,x_n)$ is a specific vector value in $\mathcal{X}^n$, the $n$-th Cartesian power of $\mathcal{X}$, which is the alphabet of each component of $X^n$. For two positive integers, $i$ and $j$, where $i\le j$, $x_i^j$ and $X_i^j$ will designate segments $(x_i,\ldots,x_j)$ and $(X_i,\ldots,X_j)$, respectively, where for $i=1$, the subscript will be omitted (as above). For $i>j$, $x_i^j$ (or $X_i^j$) will be understood as the null string. An infinite sequence $(x_1,x_2,\ldots)$ will be denoted by x. Logarithms and exponents, throughout this paper, will be understood to be taken to the base 2 unless specified otherwise. The indicator function of an event $\mathcal{E}$ will be denoted by $\mathcal{I}\{\mathcal{E}\}$, i.e., $\mathcal{I}\{\mathcal{E}\}=1$ if $\mathcal{E}$ occurs and $\mathcal{I}\{\mathcal{E}\}=0$ if not.
Following the FS encoding model of [5], an FS encoder is defined by the quintuple, $E=(\mathcal{X},\mathcal{Y},\mathcal{Z},f,g)$, whose five ingredients are defined as follows:
$\mathcal{X}$ is the finite alphabet of each symbol of the source sequence to be compressed. The cardinality of $\mathcal{X}$ will be denoted by $\alpha$.
$\mathcal{Y}$ is a finite collection of binary variable-length strings, which is allowed to contain the empty string, denoted ‘null’ (whose length is zero);
$\mathcal{Z}$ is a finite set of s states of the encoder;
$f:\mathcal{Z}\times\mathcal{X}\to\mathcal{Y}$ is the output function, and
$g:\mathcal{Z}\times\mathcal{X}\to\mathcal{Z}$ is the next-state function.
Given an infinite source sequence to be compressed, $x=(x_1,x_2,\ldots)$, with $x_i\in\mathcal{X}$ for all $i$, the FS encoder E produces an infinite output sequence, $y=(y_1,y_2,\ldots)$, with $y_i\in\mathcal{Y}$, forming the compressed bit-stream, while passing through a sequence of states $z=(z_1,z_2,\ldots)$, with $z_i\in\mathcal{Z}$, $i=1,2,\ldots$. The encoder is governed by the recursive equations:
$$y_i=f(z_i,x_i),\qquad z_{i+1}=g(z_i,x_i),$$
for $i=1,2,\ldots$, with a fixed initial state $z_1$. If at any step $y_i=\text{null}$, this is referred to as idling, as no output is generated, but only the state evolves in response to the input. At each time instant i, the encoder emits $l(z_i,x_i)$ bits, where $l(z,x)$ denotes the length of the string $f(z,x)$, and it is understood that $l(z,x)=0$ whenever $f(z,x)=\text{null}$.
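To make the recursion above concrete, here is a minimal Python sketch of the encoding loop; the function name `run_encoder` and the toy single-state tables `f` and `g` are our own illustration (not part of the formal model), with the single-state case corresponding to an ordinary UD code:

```python
# Minimal sketch of the FS encoder recursion y_i = f(z_i, x_i), z_{i+1} = g(z_i, x_i).
# The tables f and g below are illustrative placeholders, not taken from the paper.
def run_encoder(f, g, x_seq, z1):
    """Return the concatenated bit-stream and the visited state sequence."""
    z = z1
    bits, states = "", [z1]
    for x in x_seq:
        y = f[(z, x)]      # output string; may be "" (the 'null' string, i.e., idling)
        z = g[(z, x)]      # next state
        bits += y
        states.append(z)
    return bits, states

# Toy single-state example: the prefix (hence UD) code 0 -> '0', 1 -> '10'.
f = {("A", 0): "0", ("A", 1): "10"}
g = {("A", 0): "A", ("A", 1): "A"}
bits, states = run_encoder(f, g, [1, 0, 1], "A")
print(bits)    # '10010'
```

With only one state, the state sequence is constant and the encoder reduces to symbol-by-symbol encoding with a fixed codebook, as in the classical setting.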
Remark 1. The null string option (which also appears in [5]) is motivated by the wish to allow the encoder to “idle” for certain combinations of inputs and states rather than “forcing” it to output compressed bits at each and every time instant. This idling option allows considerably more flexibility, and sometimes it is even necessary. For example, even when considering a simple block code as an example of an FS encoder (as will be done in the sequel in Example 1), the encoder, in general, can output nothing before having read the entire input block. So formally, if the block length is k, the encoder idles for the first $k-1$ time instants of each block, and only upon reading the last input symbol does it produce the compressed codeword for that block.
An encoder with s states, henceforth called an s-state encoder, is one for which $|\mathcal{Z}|=s$. For the sake of simplicity, we adopt a few notation conventions from [5]: given a segment of input symbols $x_i^j$, where i and j are positive integers with $i\le j$, and an initial state $z_i=z$, we use $f(z,x_i^j)$ to denote the corresponding output segment $y_i^j$ produced by E. Similarly, $g(z,x_i^j)$ will denote the final state $z_{j+1}$ after processing the inputs $x_i^j$, beginning from state z. Thus, in response to an input $x^n$ with initial state $z_1=z$, the encoder produces a compressed bit string of length $L(x^n)=\sum_{i=1}^n l(z_i,x_i)$ bits.
Definition 1. An FS encoder E is called IL if, for any initial state $z_1$, any positive integer n, and any input string $x^n\in\mathcal{X}^n$, the triplet $\big(z_1,\,f(z_1,x^n),\,g(z_1,x^n)\big)$ uniquely determines the corresponding input string $x^n$.
Remark 2. The IL property can be considered as the FS counterpart of the notion of unique decodability for ordinary single-state codes (that is, codes with no memory). Indeed, every UD code can be viewed as a single-state IL encoder. But in general, an FS code is not necessarily prefix-free, because the codewords emitted at each time instant may depend on the internal state, which carries additional information. It should be stressed that the IL property (required for each and every i and n) is attributed merely to the encoder; it has nothing necessarily to do with the mode of operation of the decoder. In particular, one may wonder why the final state, $z_{n+1}$, plays a role in the ‘reconstruction’ of $x^n$ as defined in Definition 1. The final state is needed because an FS encoder can “carry information forward” in its state instead of emitting it immediately. Consequently, without knowing the final state, some of the input information may still be stored in the encoder’s memory rather than in the emitted bits. It is easy to see this even in the above-mentioned simple example of a UD block code when viewed as an instance of an FS encoder: one can verify that the IL property holds in this case, and the reconstruction of the input according to Definition 1 indeed requires the final state in general (for details, the reader is referred to the discussion between Equations (15) and (16) in [10]). In Lemma 2 of [
5], Ziv and Lempel presented a GKI for IL FS encoders. It asserts that for every IL encoder with
s states and every positive integer
ℓ,
where we remind the reader that
$\alpha$ is the alphabet size of the input sequence to be compressed. Ziv and Lempel’s GKI was a perfect tool for their purpose of proving that the compression ratio achieved by an IL FS encoder cannot be smaller than the asymptotic empirical entropy rate (defined in [
5]) for any infinite source sequence
x. However, when examined for finitely long sequences, and from the perspective of serving as a necessary condition for information losslessness, this inequality suffers from two main weaknesses.
It does not exactly recover the CKI for the special case,
, as in that case, the right-hand side (r.h.s.) becomes
. Moreover, even if
, the right-hand side (r.h.s.), which is
, is even larger than 2 for every
. On a related note, a close inspection of the proof of Lemma 2 in [
5] reveals that the inequality in Equation (
3) is actually a strict inequality (<); in other words, this inequality is always loose.
It is significant only upon an extension from single symbols to the super-alphabet of ℓ-strings for large ℓ, unlike the ordinary Kraft inequality, which is asserted at the same level at which the code is defined. For example, the CKI for a code defined at the level of single symbols of $\mathcal{X}$ is asserted at that level, i.e., $\sum_{x\in\mathcal{X}}2^{-l(x)}\le 1$.
Our objective in this work is first and foremost to establish another GKI for IL FS encoders that is free of the above-mentioned drawbacks. In other words, for the case $s=1$, it recovers the traditional Kraft inequality exactly, and it is posed at the single-letter level without recourse to alphabet extensions. The latter property enables one to verify relatively easily whether this inequality holds in a given situation.
Our first proposed GKI serves as the basis for our subsequent derivations. Having derived it, we then confine attention to the subclass of irreducible IL FS encoders, namely, FS encoders for which every state can be reached from every state in a finite number of steps. For this important subclass of encoders, we provide several alternative formulations of the GKI and a stronger upper bound on the growth rate of the Kraft sum as a function of the block length. Again, all these forms are smooth extensions of the CKI in the sense that in the special case $s=1$, they degenerate to the CKI. Finally, we consider extensions in two directions (one at a time): the first is the case where SI is available to both encoder and decoder, and the second is the case of lossy compression.
3. The Basic Generalized Kraft Inequality
For a given IL FS encoder E with s states, let us define an $s\times s$ Kraft matrix K, whose $(z,z')$ entry is given by
$$K_{z,z'}=\sum_{x\in\mathcal{X}:\ g(z,x)=z'}2^{-l(z,x)},$$
where the summation over an empty set is understood as zero. Since K is a non-negative matrix, then according to Theorem 8.3.1 in [11], the spectral radius of K, $\rho(K)$, is an eigenvalue of K. (We remind the reader that the spectral radius is the maximum absolute value (magnitude) of the eigenvalues of a square matrix.)
Our first form of a GKI is the following.
Theorem 1. If an FS encoder E with Kraft matrix K is IL, then $\rho(K)\le 1$.
As can be seen, this GKI has the two desired properties we mentioned above: (i) in the single-state case, $s=1$, the matrix K degenerates to the scalar $\sum_{x\in\mathcal{X}}2^{-l(x)}$, and the condition $\rho(K)\le 1$ becomes the CKI, $\sum_{x\in\mathcal{X}}2^{-l(x)}\le 1$; (ii) it is posed at the single-letter level, without recourse to super-alphabet extensions.
The first property sets the stage for establishing the condition $\rho(K)\le 1$ as a necessary condition for information losslessness of a given FS encoder, in analogy to the fact that the ordinary Kraft inequality is a necessary (and sufficient) condition for unique decodability in the case $s=1$. Owing to the second property, since there is no involvement of summations over super-alphabets of long vectors, this condition is relatively easy to check, similarly to the CKI, which is a necessary condition for the unique decodability property of ordinary lossless source codes.
Proof. The proof follows in the footsteps of Karush [
12]. Let
. For every positive integer
ℓ, the
entry of the
ℓ-th order power,
, is given by
where in the first line,
and
, and the inequality is due to the postulated IL property (as
z and
are fixed). Alternatively, we can also bound
by
using the same considerations as in the proof of Lemma 2 in [
5], except that the factor
is missing since
z and
are fixed. The choice of which is better between these two bounds depends, of course, on
. In any case, both expressions are essentially linear in
ℓ. Continuing with the first bound, it follows that
Let
be a column vector of dimension
s whose entries are all zero except the entry corresponding to state
z, which is 1, and let
1 denote the all-one column vector of dimension
s. Then, Equation (
7) can be rewritten as
To prove that $\rho(K)\le 1$, we proceed by contradiction. Assume, to the contrary, that $\rho(K)>1$. Since
K has non-negative entries, the Perron–Frobenius theorem (see again Theorem 8.3.1 in [
11]) guarantees that the right eigenvector
v corresponding to
has non-negative components and at least one strictly positive component. Since
has strictly positive components, there exists a constant
such that
component-wise. Multiplying by
from the left and using the non-negativity of
K, we obtain
Taking the
z-th component yields
For any index
z with
, the r.h.s. grows exponentially in
ℓ since
, but this contradicts Equation (
8) which establishes an upper bound that grows only linearly in
ℓ. Therefore the postulate $\rho(K)>1$ cannot hold true, and we conclude that $\rho(K)\le 1$, which completes the proof. □
Since $\rho(K^\ell)=[\rho(K)]^\ell$, it is clear that for every natural ℓ, $\rho(K^\ell)\le 1$. In other words, the spectral radius of $K^\ell$ is also never larger than unity, which extends our GKI to super-alphabets; again, this is a smooth extension that degenerates to the CKI for $s=1$.
Example 1. Consider a binary source sequence and a block code of length 2, which maps the source strings 00, 01, 10, and 11, into 0, 10, 110, and 111, respectively. This code can be implemented by an FS encoder with $s=3$ states, labeled ‘S’, ‘O’, and ‘I’, using the following functions, f and g (see also Figure 1):
$$f(\mathrm{S},0)=\text{null},\quad f(\mathrm{S},1)=11,\quad f(\mathrm{O},0)=0,\quad f(\mathrm{O},1)=10,\quad f(\mathrm{I},0)=0,\quad f(\mathrm{I},1)=1,$$
and
$$g(\mathrm{S},0)=\mathrm{O},\quad g(\mathrm{S},1)=\mathrm{I},\quad g(\mathrm{O},0)=g(\mathrm{O},1)=g(\mathrm{I},0)=g(\mathrm{I},1)=\mathrm{S}.$$
State ‘S’ designates the start of a block. State ‘O’ remembers that the first input of the block was ‘0’ and state ‘I’ remembers that the first input was ‘1’. Upon moving to state ‘I’, the encoder can already output ‘11’, because the entire codeword will be either ‘110’ or ‘111’ if the first source symbol is ‘1’, so the first two coded bits are ‘11’ in either case. After state ‘I’, the encoder can complete the codeword according to the second input in the block. After state ‘O’, outputs are generated only upon receiving the second symbol. After both states ‘O’ and ‘I’, the encoder must return to state ‘S’ in order to start the next block. The corresponding Kraft matrix (with row and column indexing in the order (S,O,I)) is given by:
$$K=\begin{pmatrix} 0 & 1 & 1/4 \\ 3/4 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix},$$
whose eigenvalues are 1, 0, and $-1$, and so the spectral radius is $\rho(K)=1$. As can be seen, the sums of the second and third rows do not exceed unity, so when the initial state is either ‘O’ or ‘I’, the Kraft sum does not exceed 1. On the other hand, the Kraft sum corresponding to the first row (pertaining to ‘S’) exceeds unity. This demonstrates an important observation: the model of a general IL FS encoder is broader than that of an FS encoder which, at every given state, implements a certain prefix (or UD) code for the variety of incoming symbols. For $K^2$, we find that
$$K^2=\begin{pmatrix} 1 & 0 & 0 \\ 0 & 3/4 & 3/16 \\ 0 & 1 & 1/4 \end{pmatrix},$$
whose eigenvalues are 0, 1, and 1. Here, the sums of the first and the second rows do not exceed unity, so when the initial state is either ‘S’ or ‘O’, the Kraft sum does not exceed 1. On the other hand, the Kraft sum corresponding to the third row exceeds unity, and so, the above comment with regard to K applies here too.
This concludes Example 1.
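The claims of Example 1 are easy to check numerically. The following sketch (our own illustration, using exact rational arithmetic from Python's standard library) reproduces the row sums discussed above and confirms that $K^3=K$, so every eigenvalue of K satisfies $t^3=t$, i.e., lies in $\{0,1,-1\}$, whence $\rho(K)=1$:

```python
# Numerical check of Example 1 (plain Python, exact rational arithmetic).
from fractions import Fraction as F

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Kraft matrix of Example 1, rows/columns ordered (S, O, I):
# K[z][z'] = sum over inputs x with g(z, x) = z' of 2^(-l(z, x)).
K = [[F(0),    F(1), F(1, 4)],
     [F(3, 4), F(0), F(0)],
     [F(1),    F(0), F(0)]]

K2 = matmul(K, K)
K3 = matmul(K2, K)

print([sum(row) for row in K])    # row sums of K  : [5/4, 3/4, 1]
print([sum(row) for row in K2])   # row sums of K^2: [1, 15/16, 5/4]
print(K3 == K)                    # True: K^3 = K, so the eigenvalues solve
                                  # t^3 = t and lie in {0, 1, -1}; rho(K) = 1
```

The identity $K^3=K$ also reflects the period-2 structure of the block encoder: after two steps the machine is back at the start of a block.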
Figure 1. State transition diagram of the encoder in Example 1. The various state transitions are labeled in the form $x/y$, where x denotes the input and y denotes the output.
Earlier, we said that $\rho(K)\le 1$ is a necessary condition for a given code with next-state function g and code-lengths $\{l(z,x)\}$ to be IL. One might naturally wonder whether it is also a sufficient condition. This question is open in general, but we have two comments related to this issue.
The first is that the answer is obviously affirmative for the subclass of IL encoders that satisfies the CKI for each and every state, i.e., $\sum_{x\in\mathcal{X}}2^{-l(z,x)}\le 1$ for every $z\in\mathcal{Z}$: simply construct a separate prefix code with length function $l(z,\cdot)$ for each $z\in\mathcal{Z}$. However, in general, an IL code does not necessarily satisfy the ordinary Kraft inequality for each z. Indeed, in Example 1, the sum of the first row of K is larger than 1.
The second comment is that we can give an affirmative answer at the level of longer blocks. Let
be an arbitrary initial state and consider the lengths,
. Then, as we have seen in (
6):
where the factor of
s stems from taking the sum of
over
. Equivalently,
and so, there exists a prefix code with lengths
, which are only slightly longer than those of the original code. Here, the additional
term is a header that conveys the final state, $g(z,x^\ell)$, to the decoder.
4. Irreducible FS Encoders
IL FS encoders for which the next-state function g allows transition from every state to every state within a finite number of steps are henceforth referred to as irreducible FS encoders. Equivalently, defining the adjacency matrix A such that $A_{z,z'}=1$ whenever there exists $x\in\mathcal{X}$ such that $g(z,x)=z'$ and $A_{z,z'}=0$ otherwise, an IL FS encoder is irreducible if the matrix A is irreducible. Likewise, an IL FS encoder is irreducible if the matrix K is irreducible. For an irreducible FS encoder, the shortest path from every state z to every other state lasts no longer than $s-1$ steps, because any longer path must visit a certain state at least twice, meaning that this path contains a loop starting and ending at that state, which can be eliminated. Clearly, the encoder of Example 1 is irreducible.
Intuitively, it makes sense to use irreducible encoders, because for reducible ones, once the machine leaves a certain subset of transient states, it can never return, and so, effectively, reducible encoders eventually use a smaller number of states. Specifically, given a reducible machine and an infinite individual sequence x, suppose the machine starts at a transient state. Then, there are two possibilities: either the machine quits the subset of transient states after finite time, or it stays in that subset forever. In the former case, the transient states are in use for finite time only and then never used again. In the latter case, the recurrent states are never used. In either case, asymptotically, only a subset of the available states is used, and so, effectively, the number of states actually used is smaller than s. Let $\mathcal{Z}_\infty$ denote the set of states visited infinitely many times along the sequence. This set is necessarily closed and induces a strongly connected subgraph. Consequently, the asymptotic behavior of the encoder along the given sequence is governed entirely by its restriction to $\mathcal{Z}_\infty$, which constitutes an irreducible FS encoder with strictly fewer than s states. Therefore, reducible encoders cannot offer asymptotic advantages over irreducible ones, even for individual sequences.
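Whether a given Kraft (or adjacency) matrix is irreducible can be checked mechanically, since irreducibility is equivalent to strong connectivity of the underlying transition graph. A minimal sketch (our own illustration; the helper `is_irreducible` is hypothetical, not from the paper):

```python
# Irreducibility check for a nonnegative matrix: the underlying directed graph
# must be strongly connected, i.e., every state reaches every state.
def is_irreducible(A):
    s = len(A)
    # reach[i][j] = True iff j is reachable from i (i itself counts as reachable)
    reach = [[bool(A[i][j]) or i == j for j in range(s)] for i in range(s)]
    for _ in range(s - 1):  # s-1 propagation rounds suffice (paths need <= s-1 steps)
        reach = [[any(reach[i][k] and reach[k][j] for k in range(s))
                  for j in range(s)] for i in range(s)]
    return all(all(row) for row in reach)

# Kraft matrix of Example 1 (order S, O, I): every state reaches every state.
K = [[0,    1, 0.25],
     [0.75, 0, 0],
     [1,    0, 0]]
print(is_irreducible(K))                 # True
print(is_irreducible([[1, 0], [1, 1]]))  # False: state 0 cannot reach state 1
```

The bound of $s-1$ propagation rounds mirrors the observation above that shortest paths between states need no more than $s-1$ steps.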
Assume next that the next-state function
g induces an irreducible matrix
, where
ℓ is an arbitrary positive integer. Since
is non-negative and irreducible, the Collatz–Wielandt formulas [
13,
14] for the spectral radius of
hold true. These are given by
where
w is an
s-dimensional column vector and
is the set of all such vectors with non-negative components not all of which are zero. These lead to the two following GKIs:
and
The first formulation can be simplified at the price of a possible loss of tightness, by selecting
w to be the all-one vector and thereby bounding
from below. This results in the conclusion that an IL FS encoder always satisfies yet another GKI:
In words, for every given irreducible FS encoder,
, and for every natural
ℓ, there is at least one initial state,
, for which the Kraft sum does not exceed unity; but again, not all states must satisfy this condition (as we saw in Example 1, the Kraft sum exceeds unity when the initial state is ‘S’). All these are also smooth extensions of the CKI in the sense that for $s=1$
we are back to the CKI.
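The Collatz–Wielandt bounds are straightforward to evaluate numerically. The sketch below (our own illustration) computes, for Example 1's Kraft matrix, the bracket on $\rho(K)$ obtained from the all-one vector (i.e., the row sums), and shows that the bracket collapses to the exact value $\rho(K)=1$ when w is taken to be the right Perron eigenvector, which for this matrix is $(1, 3/4, 1)$:

```python
# Collatz-Wielandt bounds: for an irreducible nonnegative K and any strictly
# positive vector w,  min_z (Kw)_z / w_z  <=  rho(K)  <=  max_z (Kw)_z / w_z.
def cw_bounds(K, w):
    n = len(w)
    Kw = [sum(K[i][j] * w[j] for j in range(n)) for i in range(n)]
    ratios = [Kw[i] / w[i] for i in range(n)]
    return min(ratios), max(ratios)

K = [[0,    1, 0.25],   # Kraft matrix of Example 1 (rho(K) = 1)
     [0.75, 0, 0],
     [1,    0, 0]]

lo, hi = cw_bounds(K, [1.0, 1.0, 1.0])   # all-one vector: bounds are the row sums
print(lo, hi)                            # 0.75 1.25 -> 0.75 <= rho(K) <= 1.25

lo, hi = cw_bounds(K, [1.0, 0.75, 1.0])  # right Perron eigenvector of this K
print(lo, hi)                            # 1.0 1.0   -> rho(K) = 1 exactly
```

With the all-one vector, the lower bound is the minimal Kraft sum over states (here 3/4), which is exactly the simplified GKI discussed above.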
But there is an even stronger GKI that applies to irreducible encoders. It asserts that in the irreducible case,
does not even grow linearly as in (
6), but is rather bounded by a constant, independent of
n. For
$s=1$, this constant is 1, again in agreement with the CKI.
Theorem 2. Let K be an irreducible Kraft matrix. Then, for all and for every natural n,Consequently, for every ,and Proof. It is sufficient to prove the first inequality, as the two other ones will follow trivially by a summation over
and then also over
, respectively. Since
K is non-negative and irreducible, the Perron–Frobenius theorem applies. This theorem asserts that the spectral radius,
, is positive and simple, with left and right eigenvectors,
u and
v, respectively, that have only strictly positive components. In Theorem 1 we have already proved that
. Assume first that
. Then,
, or, equivalently,
Since all terms are non-negative, the left-hand side is lower bounded by
for any
. This implies for every
Let
and
be achievers of
and
, respectively. Then, for every
,
Since
K is irreducible and since
and
are distinct, there exists a path of length
from
to
, say,
such that
Since all positive entries of
K are at least as large as
, this product is at least as large as
. It follows then that
Now,
which implies that
for every
. This completes the proof for the case
. The case
is obtained from the case
by simply defining
and using the fact that all non-negative entries of
are lower bounded by
. Since
is also irreducible and since
, we now have
But
, and so,
This completes the proof of Theorem 2. □
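Theorem 2's constant bound can be observed numerically for the irreducible Kraft matrix of Example 1: the row sums of $K^n$ (i.e., the Kraft sums at block length n) remain bounded by a constant for all n, rather than growing with n. A small sketch (our own illustration; the bound 5/4 is specific to this matrix, not a general constant):

```python
# Row sums of K^n stay bounded for an irreducible K with rho(K) = 1,
# in agreement with Theorem 2 (here all entries are exact dyadic floats).
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

K = [[0,    1, 0.25],   # irreducible Kraft matrix of Example 1
     [0.75, 0, 0],
     [1,    0, 0]]

P = K
max_row_sum = 0.0
for n in range(1, 101):          # scan row sums of K^n for n = 1, ..., 100
    max_row_sum = max(max_row_sum, max(sum(row) for row in P))
    P = matmul(P, K)
print(max_row_sum)               # 1.25: bounded by a constant, independent of n
```

By contrast, the generic bound used in the proof of Theorem 1 only guarantees linear growth in n; irreducibility is what buys the constant bound.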
5. Converse Bounds Derived from the GKI
In this section, we demonstrate how the GKI of
Section 4 can be used to obtain lower bounds on the performance of irreducible machines in compression and in prediction problems. For compression, both probabilistic sources and individual sequences are considered. For prediction, only the individual sequence version is presented, but the probabilistic counterpart can also be derived straightforwardly using the same ideas.
5.1. Compression of Probabilistic Sources
Let
be a joint probability distribution of random variables
Z and
. Then,
where the inequality follows from Jensen’s inequality and the convexity of the exponential function. By taking logarithms of both sides, rearranging terms, and normalizing by
ℓ, we get
and if the source
P is stationary,
can be further lower bounded by
, to obtain
Since this bound applies to every positive integer
ℓ, we may maximize the lower bound over
ℓ, and obtain
We see that thanks to Theorem 2, the vanishing term subtracted from the entropy decays at the rate of $1/\ell$, as opposed to the $(\log\ell)/\ell$
rate that stems from Lemma 2 of [
5] as well as from the more general inequality of
, that is obtained when reducible machines are allowed.
5.2. Compression of Individual Sequences
In the context of individual sequences, we can arrive at an analogous lower bound, provided that we define a shift-invariant empirical distribution. Specifically, let
be a given individual sequence, let
ℓ be a positive integer smaller than
n, and let
be a given initial state of the encoder. We assume that
$x^n$ is cyclic with respect to (w.r.t.)
g in the sense that
. If this is not the case, consider an extension of
by concatenating a suffix
such that the extended sequence would be cyclic w.r.t.
g. Since
g is assumed irreducible, this is always possible and the length
m of the extension need not be larger than
. To avoid cumbersome notation, we redefine
to be the sequence after the cyclic extension (if needed), and we shall keep in mind that this cyclic extension adds no more than
bits to the compressed description, or equivalently,
to the compression ratio, and so, this extra rate should be subtracted back upon returning to the original sequence before the cyclic extension. For every
and
, let
where ⊕ denotes modulo-
n addition. Next, define the empirical distribution
Now,
where
is the empirical conditional entropy derived from the shift-invariant distribution
. The last inequality follows from a similar derivation as in the probabilistic case considered above, except that the earlier distribution
P is now replaced by the empirical one,
, and therefore the corresponding entropies are replaced by their empirical counterparts. Using the fact that this is true for every natural
and returning to the original sequence before the cyclic extension, we find that
Furthermore, invoking Ziv’s inequality (see Equation (13.125) in [
4]), this can be further lower bounded in terms of the LZ complexity. Specifically, according to Equation (13.125) in [
4], for every Markov source,
, of order
and every
,
where
is the maximum number of distinct phrases whose concatenation forms
, and where
tends to zero at the rate of
for every fixed
ℓ. By minimizing the r.h.s. w.r.t.
, we get
and so,
The minimizing
ℓ can be found to be proportional to
, but the dominant term of
remains of the order of
.
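For concreteness, the phrase count appearing in Ziv's inequality can be illustrated by the LZ78 incremental parsing rule, which splits the sequence into phrases, each being the shortest prefix of the remaining sequence not yet seen as a phrase. The following sketch is our own illustration (not code from [4] or [5]), and the helper name `lz78_phrase_count` is hypothetical:

```python
def lz78_phrase_count(x):
    """Number of phrases in the LZ78 incremental parsing of the sequence x.
    Each phrase is the shortest prefix of the remaining sequence that has not
    yet appeared as a phrase (the last phrase may be a repeat)."""
    phrases = set()
    count = 0
    current = ""
    for symbol in x:
        current += str(symbol)
        if current not in phrases:
            phrases.add(current)
            count += 1
            current = ""
    if current:          # leftover (possibly repeated) final phrase
        count += 1
    return count

# '1011010100010' parses as 1|0|11|01|010|00|10 -> 7 phrases
print(lz78_phrase_count("1011010100010"))   # 7
```

The phrase count grows at most as $O(n/\log n)$, which is what makes the $c(x^n)\log c(x^n)$ term in LZ-complexity bounds comparable to the code length.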
5.3. Prediction of Individual Sequences
We next derive a lower bound on the prediction error of any FS predictor that is based on an irreducible FS machine. The idea is to leverage the compression lower bound to induce a lower bound on prediction, by considering an FS encoder that is based on FS prediction and encoding of the prediction error (predictive coding)—see
Figure 2.
Consider an FS predictor with
q states, defined by the following recursion, for
where
,
,
, is a corresponding infinite state sequence, whose alphabet,
, is a finite set of states of cardinality
q, and
,
,
, is the resulting predictor output sequence. Without loss of generality, the initial state,
, and the initial prediction,
, are assumed fixed members,
and
, respectively. Here,
is the predictor output function and
is the next-state function.
It is assumed that
is a group with well-defined addition and subtraction operations. For example, if
then it is natural to equip
with addition and subtraction modulo
. Let
denote a given loss function. Then, the performance of a predictor across the time range,
is measured in terms of the time-average,
Given an arbitrary irreducible FS predictor
as defined above, consider the auxiliary conditional probability distribution,
where
Define also the function
Now, define
where
and
are arbitrary members of
and
, respectively, such that
, and
are generated from
as in (
42).
Let
k divide
n and consider the lossless compression of
in blocks of length
k,
,
, by using the Shannon code, whose length function for a vector
is
. This is equivalent to predictive coding, where the prediction error signal,
is compressed losslessly under a model of a memoryless source with a marginal
(see
Figure 2 for illustration). In this case, since the ceiling operation is carried over
k-blocks, and there are
such
k-blocks, the upper bound to
becomes
On the other hand, the corresponding encoder of
Figure 2 can be viewed as an encoder with
states, where
, since this is the number of combinations of a state of the
q-state predictor and a state of the lossless block encoder, whose number of states is
. Thus,
where it should be kept in mind that
is expected to grow linearly with
k. Thus, by comparing the upper bound and the lower bound to
, we have
or, equivalently,
Maximizing the r.h.s. over
, we get
The bound is meaningful if
and
, so that the two subtracted terms in the argument of the function
are small compared to the main term,
. It is tight essentially for sequences of the form
,
, where
is typical to an i.i.d. source and where the marginal empirical distribution of each
is close to
for some
.
6. GKI in the Presence of Side Information
We will now discuss briefly an extension of the GKI for IL FS encoders in the case where SI is available at both the encoder and the decoder. The resulting condition is expressed in terms of the joint spectral radius (JSR) of a finite set of nonnegative matrices indexed by the various side-information symbols. We identify verifiable sufficient conditions for subexponential growth of Kraft sums and discuss the limitations inherent in the presence of SI.
Let
be the source alphabet as before and let
denote the finite alphabet of the SI sequence,
, whose symbols are synchronized with the corresponding source symbols. As before, let
be the finite set of states with
. An FS encoder with SI is specified by an output function
, (
being defined as a subset of
, similarly as before) and a next-state function
. Given an initial state,
, a source sequence,
, and a SI sequence,
, the encoder implements the equations:
for
, and the total code-length produced by the encoder after
n steps is
Definition 2. An FS encoder is said to be information-lossless with side information if, for every n, the quadruple consisting of the initial state $z_1$, the SI sequence $w^n$, the output sequence $y^n$, and the final state $z_{n+1}$ uniquely determines $x^n$.
For each SI symbol, $w\in\mathcal{W}$, define the corresponding Kraft matrix $K_w$, whose $(z,z')$ entry is
$$[K_w]_{z,z'}=\sum_{x\in\mathcal{X}:\ g(z,w,x)=z'}2^{-l(z,w,x)},$$
where $l(z,w,x)$ denotes the length of $f(z,w,x)$. Each $K_w$ is a nonnegative $s\times s$ matrix. For a given SI sequence, $w^n=(w_1,\ldots,w_n)$, define the product matrix
$$K_{w^n}=K_{w_1}K_{w_2}\cdots K_{w_n}.$$
Now, let $\mathcal{K}=\{K_w:\ w\in\mathcal{W}\}$. The growth rate of the Kraft products, $K_{w^n}$, over arbitrary SI sequences, $w^n$, is governed by the JSR of $\mathcal{K}$, which is defined as follows.
Definition 3. The JSR of $\mathcal{K}$ is defined as
$$\rho(\mathcal{K})=\lim_{n\to\infty}\ \max_{w^n\in\mathcal{W}^n}\big\|K_{w^n}\big\|^{1/n},$$
where $\|\cdot\|$ is any matrix norm. It is a classical result that this limit exists and is independent of the chosen norm. The GKI in the presence of SI can be formulated as follows.
Theorem 3. For an IL FS encoder with SI,
$$\rho(\mathcal{A}) \le 1.$$
Proof. Fix an arbitrary SI sequence, $w^n$, and a pair of states, $z, z' \in \mathcal{S}$. The $(z, z')$ entry of $A(w^n)$ is given by
$$[A(w^n)]_{z,z'} = \sum_{x^n:~z_1 = z,~z_{n+1} = z'} 2^{-L(x^n)}.$$
Since the encoder is IL for the fixed sequence $w^n$, the mapping between $x^n$ and $y^n$ is injective over all paths from $z$ to $z'$. Grouping sequences according to their total code-length (similarly as before) and using a standard counting argument yield a linear upper bound (in $n$) on each matrix entry of $A(w^n)$, uniformly over $w^n$. Exponential growth of $A(w^n)$ is therefore impossible, and the JSR must satisfy $\rho(\mathcal{A}) \le 1$. □
The following proposition can sometimes help. For example, if $\mathcal{A}$ satisfies the condition of Proposition 1, the Kraft sum is less than or equal to unity for every initial state and every SI sequence. In such a case, one can simply design a separate prefix code for every combination of initial state and SI sequence.
Proposition 1. If there exists a vector $v$ with strictly positive components such that $A_w v \le v$ (component-wise) for every $w \in \mathcal{W}$, then $A(w^n) v \le v$ for every SI sequence $w^n$ and every $n$, and hence the family $\{A(w^n)\}$ is uniformly bounded.
Proof. The claim follows by induction on $n$. Since $v$ has strictly positive components, uniform boundedness of all the products implies $\rho(\mathcal{A}) \le 1$. □
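The hypothesis of Proposition 1, namely a strictly positive vector $v$ with $A_w v \le v$ component-wise for every SI symbol $w$, is straightforward to verify numerically. The sketch below checks it for a pair of toy Kraft matrices chosen by us for illustration (every row sums to at most 1, so the all-ones vector works):

```python
# Checking the sufficient condition of Proposition 1: a strictly positive
# vector v such that A_w v <= v component-wise for every matrix A_w.

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def subinvariant(mats, v, tol=1e-12):
    """True iff A v <= v component-wise for every matrix A in mats."""
    assert all(x > 0 for x in v), "v must have strictly positive components"
    return all(all(y <= x + tol for y, x in zip(mat_vec(A, v), v))
               for A in mats)

A0 = [[0.5, 0.25], [0.25, 0.5]]   # toy Kraft matrix for SI symbol 0
A1 = [[0.25, 0.5], [0.5, 0.25]]   # toy Kraft matrix for SI symbol 1
print(subinvariant([A0, A1], [1.0, 1.0]))   # prints True: row sums are 0.75
```

When the condition holds, every product $A(w^n)$ satisfies $A(w^n) v \le v$, so all Kraft sums stay bounded by a constant independent of $n$ and of the SI sequence.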
In contrast to the case without SI, bounding the spectral radius of each individual Kraft matrix $A_w$ is necessary but insufficient to control the growth rate of arbitrary products. In other words, even if $\rho(A_w) \le 1$ for every $w$ individually, the JSR may exceed unity and, in fact, may be arbitrarily large. As an example, let $\epsilon$ be an arbitrarily small positive real and consider the matrices
$$A_1 = \begin{pmatrix} \epsilon & 1/\epsilon \\ 0 & \epsilon \end{pmatrix} \qquad \mbox{and} \qquad A_2 = A_1^{\mathrm{T}} = \begin{pmatrix} \epsilon & 0 \\ 1/\epsilon & \epsilon \end{pmatrix}.$$
While $\rho(A_1) = \rho(A_2) = \epsilon$, which is arbitrarily small, it turns out that
$$\rho(\{A_1, A_2\}) \ge \sqrt{\rho(A_1 A_2)} \ge \frac{1}{\epsilon},$$
which is, accordingly, arbitrarily large. The JSR is therefore the correct quantity governing feasibility.
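As a numerical illustration of this phenomenon, the following sketch uses the standard pair $A_1 = [[\epsilon, 1/\epsilon], [0, \epsilon]]$ and $A_2 = A_1^{\mathrm{T}}$ (our choice, for illustration): both spectral radii equal $\epsilon$, yet the length-2 product $A_1 A_2$ already certifies that the JSR is at least $1/\epsilon$.

```python
# Individual spectral radii can be tiny while the JSR is huge: the periodic
# sequence A1, A2, A1, A2, ... yields JSR >= sqrt(rho(A1 A2)) >= 1/eps.

from math import sqrt

def spec_rad2(P):   # Perron eigenvalue of a 2x2 nonnegative matrix
    a, b, c, d = P[0][0], P[0][1], P[1][0], P[1][1]
    return ((a + d) + sqrt((a - d) ** 2 + 4 * b * c)) / 2

eps = 1e-3
A1 = [[eps, 1 / eps], [0.0, eps]]
A2 = [[eps, 0.0], [1 / eps, eps]]          # A2 = transpose of A1

# Individually, both spectral radii equal eps (arbitrarily small)...
print(spec_rad2(A1), spec_rad2(A2))        # prints 0.001 0.001

# ...yet rho(A1 A2) is about 1/eps^2, so the JSR is at least about 1/eps.
P = [[sum(A1[i][k] * A2[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
jsr_lower = sqrt(spec_rad2(P))
print(jsr_lower)                           # a little above 1/eps = 1000
```

The off-diagonal entries of the two matrices feed each other under multiplication, which is exactly why no per-matrix condition can capture the growth of arbitrary products.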
Exact computation of the JSR is undecidable in general, even for nonnegative rational matrices. Consequently, the above result should be interpreted as a structural constraint rather than a computational criterion. Nonetheless, there is a plethora of upper and lower bounds on the JSR. Also, as mentioned earlier, the existence of a common positive sub-invariant vector provides a meaningful and verifiable sufficient condition for subexponential growth.
7. GKI for Lossy Compression
For lossy compression, we adopt a simple encoder model, where each source vector
is first mapped into a reproduction vector
within distortion
and then
is losslessly compressed by an IL FS encoder with
s states exactly as before. The latter may work at the level of single letters or at the level of
ℓ-blocks. Let us define
and let
. Now,
and so,
entry-wise. Now,
has all the properties that we have proved for the lossless case; it is just defined on the super-alphabet of
ℓ-blocks. Since
, we readily have:
Inequality (
62) can be viewed as the FS analog of very similar earlier results derived in [
7,
8,
9], for lossy
D-semifaithful codes combined with UD codes, i.e., codes that consist of a cascade of a reproduction encoder (within distortion
D as above) followed by UD lossless compression of the resulting reproduction vector. In those earlier articles, the main result was a generalized Kraft inequality, where the Kraft sum (or integral, in the continuous case) is upper bounded by the volume of a ball of normalized radius
D in terms of the distortion measure, which, in essence, is exactly
.
For additive distortion measures, the quantity
can be estimated using the method of types [
15], or the Chernoff bound, or saddle-point integration [
16,
17]. If the method of types is used, then
is upper bounded by
, where
Thus, the corresponding GKI reads
But this bound is tight only in terms of the exponential order as a function of
ℓ and hence is meaningful mainly for very large
ℓ. For example, if the source and the reproduction vectors are binary, and the Hamming distortion measure is adopted, then it turns out that
But we can say somewhat more in this case: here,
is simply the cardinality of a Hamming sphere of radius
, which, upon careful analysis (see, for example, [
17]), can be shown to be
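Independently of the precise asymptotic constant, the Hamming-sphere cardinality is easy to compute exactly and to compare against its exponential order $2^{\ell h(D)}$, where $h(\cdot)$ is the binary entropy function. A short illustrative sketch (the numbers are ours, not from the paper):

```python
# Cardinality of a Hamming sphere of radius l*D around a binary l-vector,
# |S(l, D)| = sum_{i <= l*D} C(l, i), compared with the method-of-types
# exponent 2^(l*h(D)), where h is the binary entropy function.

from math import comb, log2, floor

def hamming_sphere(l, D):
    """Number of binary l-vectors within Hamming distance l*D of a given one."""
    return sum(comb(l, i) for i in range(floor(l * D) + 1))

def h2(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

l, D = 20, 0.3
exact = hamming_sphere(l, D)
print(exact, 2 ** (l * h2(D)))   # exact count vs its exponential-order estimate
```

Even at $\ell = 20$ the exact count is a constant factor below $2^{\ell h(D)}$, consistent with the sub-exponential correction terms mentioned above.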
In [
9], more general results are available, including the case of multiple simultaneous distortion constraints.