Refinements and Extensions of Ziv’s Model of Perfect Secrecy for Individual Sequences

We refine and extend Ziv’s model and results regarding perfectly secure encryption of individual sequences. According to this model, the encrypter and the legitimate decrypter share a common secret key that is not shared with the unauthorized eavesdropper. The eavesdropper is aware of the encryption scheme and has some prior knowledge concerning the individual plaintext source sequence. This prior knowledge, combined with the cryptogram, is harnessed by the eavesdropper, who implements a finite-state machine as a mechanism for accepting or rejecting attempted guesses of the plaintext source. The encryption is considered perfectly secure if the cryptogram does not provide any new information to the eavesdropper that may enhance their knowledge concerning the plaintext beyond their prior knowledge. Ziv has shown that the key rate needed for perfect secrecy is essentially lower bounded by the finite-state compressibility of the plaintext sequence, a bound that is clearly asymptotically attained through Lempel–Ziv compression followed by one-time pad encryption. In this work, we consider some more general classes of finite-state eavesdroppers and derive the respective lower bounds on the key rates needed for perfect secrecy. These bounds are tighter and more refined than Ziv’s bound, and they are attained using encryption schemes that are based on different universal lossless compression schemes. We also extend our findings to the case where side information is available to the eavesdropper and the legitimate decrypter but may or may not be available to the encrypter.

To the best of the author's knowledge, there are only two exceptions to this prevailing paradigm, documented in an unpublished memorandum by Ziv [32] and a subsequent work [33]. Ziv's memorandum presents a unique approach wherein the plaintext source, to be encrypted using a secret key, is treated as an individual sequence. The encrypter is modeled as a general block encoder, while the eavesdropper employs a finite-state machine (FSM) as a message discriminator. That memorandum postulates that the eavesdropper possesses certain prior knowledge about the plaintext, expressed as a set of "acceptable messages", hereafter referred to as the acceptance set. In other words, before observing the cryptogram, the eavesdropper's uncertainty about the plaintext sequence is that it could be any member of this set of acceptable messages.
This assumption about prior knowledge available to the eavesdropper is fairly realistic in real life. Consider, for example, the case where the plaintext alphabet is the Latin alphabet, but the eavesdropper furthermore knows that the plaintext must be a piece of text in the Italian language.
In this case, her prior knowledge, first and foremost, allows her to reject every candidate string of symbols that includes the letters 'j', 'k', 'w', 'x' and 'y', which are not used in Italian. Another example, common to English and some other languages, is that the letter 'q' must be followed by 'u'. In the same spirit, some additional rules of grammar can be invoked, like limitations on the number of successive consonant (or vowel) letters in a word, a limitation on the length of a word, and so on. Now, according to Ziv's approach, perfectly secure encryption amounts to a situation where the presence of the cryptogram does not reduce the uncertainty associated with the acceptance set. In other words, having intercepted the cryptogram, the eavesdropper learns nothing about the plaintext that she did not know before. The size of the acceptance set can be thought of as a quantifier of the level of uncertainty: a larger set implies greater uncertainty. The aforementioned FSM is used to discriminate between acceptable and unacceptable strings of plaintext symbols that can be obtained by examining various key bit sequences. Accordingly, perfect security amounts to maintaining the size of the acceptance set, and consequently the uncertainty level, unchanged in the presence of the cryptogram. The principal finding in Ziv's work is that the asymptotic key rate required for perfectly secure encryption, according to this definition, cannot be lower (up to asymptotically vanishing terms) than the Lempel-Ziv (LZ) complexity of the plaintext source [10].
Clearly, this lower bound is asymptotically achieved by employing one-time pad encryption (that is, bit-by-bit XOR with key bits) of the bit-stream obtained from LZ data compression of the plaintext source, mirroring Shannon's classical probabilistic result which asserts that the minimum required key rate equals the entropy rate of the source.
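To make this construction concrete, here is a small Python sketch (illustrative only, and not the exact scheme of [32]): it parses the plaintext with LZ78, encodes each phrase as an index/innovation-bit pair, and XORs the resulting bit stream with fresh key bits. The function names and the particular bit-level encoding are our own choices.

```python
import secrets

def lz78_compress(bits):
    """LZ78 parsing of a binary string into (phrase-index, next-bit) pairs,
    returned as a single bit string (an illustrative encoding)."""
    dictionary = {"": 0}
    out = []
    phrase = ""
    for b in bits:
        if phrase + b in dictionary:
            phrase += b
        else:
            idx = dictionary[phrase]
            width = max(1, (len(dictionary) - 1).bit_length())
            out.append(format(idx, f"0{width}b") + b)   # index + innovation bit
            dictionary[phrase + b] = len(dictionary)
            phrase = ""
    if phrase:  # flush a final, possibly previously seen, phrase
        width = max(1, (len(dictionary) - 1).bit_length())
        out.append(format(dictionary[phrase], f"0{width}b"))
    return "".join(out)

def one_time_pad(bits, key_bits):
    """Bit-by-bit XOR with as many fresh key bits as there are payload bits."""
    return "".join(str(int(b) ^ int(k)) for b, k in zip(bits, key_bits))

x = "01" * 200                       # a highly compressible plaintext, n = 400
compressed = lz78_compress(x)
key = "".join(str(secrets.randbelow(2)) for _ in range(len(compressed)))
cryptogram = one_time_pad(compressed, key)
assert one_time_pad(cryptogram, key) == compressed   # XOR with the key decrypts
print(len(x), len(compressed))       # key rate = len(compressed)/len(x) bits/symbol
```

The number of consumed key bits equals the LZ78 code length, which is the sense in which the scheme attains the lower bound of the previous paragraph.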
In the subsequent work [33], the concept of perfect secrecy for individual sequences was approached differently. Instead of a finite-state eavesdropper with predefined knowledge, it is assumed that the encrypter can be realized by an FSM which is sequentially fed by the plaintext source and random key bits. A notion of "finite-state encryptability" is introduced (in the spirit of the analogous finite-state compressibility of [10]), which designates the minimum key rate that must be consumed by any finite-state encrypter such that the probability law of the cryptogram would be independent of the plaintext input, and hence be perfectly secure. Among the main results of [33], it is asserted and proved that the finite-state encryptability of an individual sequence is essentially bounded from below by its finite-state compressibility, a bound which is, once again, attained asymptotically by LZ compression followed by one-time pad encryption.
In this work, we revisit Ziv's approach to perfect secrecy for individual sequences [32]. After presenting his paradigm in detail, we proceed to refine and generalize his findings in certain aspects. First, we consider several more general classes of finite-state discriminators that can be employed by the eavesdropper. These lead to tight lower bounds on the minimum key rate to be consumed by the encrypter, which are matched by encryption schemes based on other universal data compression schemes. The resulting gaps between the lower bounds and the corresponding upper bounds (i.e., the redundancy rates) converge to zero faster. Among these more general classes of finite-state machines, we will consider finite-state machines that are equipped with counters, as well as periodically time-varying finite-state machines with counters. Another direction of generalizing Ziv's findings is the incorporation of side information (SI) that is available both at the eavesdropper and the legitimate decrypter, but may or may not be available at the encrypter as well.
The outline of this article is as follows. In Section 2, we formulate the model setting, establish the notation, and provide a more detailed background on Ziv's model and results in [32]. In Section 3, which is the main section of this article, we present the refinements and extensions to other types of FSMs, including FSMs with counters (Subsection 3.1), shift-register FSMs with counters (Subsection 3.2), and periodically time-varying FSMs with counters (Subsection 3.3). Finally, in Section 4, we further extend some of our findings to the case where SI is available at both the legitimate decrypter and the eavesdropper, but not necessarily at the encrypter.

Formulation, Notation and Background
Consider the following version of Shannon's cipher system model, adapted to the encryption of individual sequences, as proposed by Ziv [32]. An individual (deterministic) plaintext sequence, x = (x_0, ..., x_{n-1}) (n being a positive integer), is encrypted using a random key, K, whose entropy is H(K), to generate a cryptogram, W = T(x, K), where the mapping T(., K) is invertible given K; namely, x can be reconstructed by the legitimate decoder, who has access to K, by applying the inverse function, x = T^{-1}(W, K). The plaintext symbols, x_i, i = 0, 1, 2, ..., n - 1, take on values in a finite alphabet, X, of size α. Thus, x is a member of X^n, the n-th Cartesian power of X, whose cardinality is α^n. Without essential loss of generality, we assume that K is a uniformly distributed random variable taking on values in a set K whose cardinality is 2^{H(K)}. Sometimes, it may be convenient to consider K to be the set {0, 1, ..., 2^{H(K)} - 1}. A specific realization of the key, K, will be denoted by k.
An eavesdropper, who knows the mapping T, but not the realization of the key, K, is in the quest of learning as much as possible about x upon observing W. It is assumed that the eavesdropper also has some prior knowledge about x, even before observing W. In particular, the eavesdropper knows that the plaintext source string x must be a member of a certain subset of X^n, denoted A_n, which is referred to as the acceptance set.
Ziv models the eavesdropper as a cascade of a guessing decrypter and a finite-state message discriminator, which work together as follows. At each step, the eavesdropper examines a certain key, k ∈ K, by generating an estimated plaintext, x̃ = T^{-1}(W, k), and then feeding x̃ into the message discriminator to examine whether or not x̃ ∈ A_n. If the answer is affirmative, x̃ is accepted as a candidate; otherwise, it is rejected. Upon completing this step, the eavesdropper moves on to the next key, k + 1, and repeats the same process, etc. The message discriminator is modeled as a finite-state machine, which implements the following recursive equations for i = 0, 1, 2, ..., n - 1: u_i = f(z_i, x_i), z_{i+1} = g(z_i, x_i), where z_0, z_1, z_2, ..., z_{n-1} is a sequence of states, z_i ∈ S, i = 0, 1, 2, ..., n - 1, S being a set of s states (and with the initial state, z_0, a fixed member of S), u_0, u_1, ..., u_{n-1} is a binary output sequence, f : S × X → {0, 1} is the output function, and g : S × X → S is the next-state function.
If u_0, u_1, u_2, ..., u_{n-1} is the all-zero sequence, x is accepted, namely, x ∈ A_n; otherwise, namely, as soon as some u_i = 1, x is rejected. In other words, A_n is defined to be the set of all {x} for which the response of the finite-state discriminator is the all-zero sequence, u = (0, 0, ..., 0).
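As a minimal illustration of this accept/reject mechanism, the following Python sketch (function names are ours) implements the recursion u_i = f(z_i, x_i), z_{i+1} = g(z_i, x_i) for a three-state machine that forbids more than two successive zeroes, in the spirit of Example 1 below.

```python
def run_discriminator(x, f, g, z0=0):
    """Finite-state discriminator: u_i = f(z_i, x_i), z_{i+1} = g(z_i, x_i).
    Accept iff the output sequence u is all-zero."""
    z = z0
    for xi in x:
        if f(z, xi) == 1:
            return False          # reject as soon as some u_i = 1
        z = g(z, xi)
    return True

# Three-state machine rejecting runs of more than two zeroes:
# the state counts the length of the current run of zeroes (capped at 2).
g = lambda z, x: 0 if x == 1 else min(z + 1, 2)
f = lambda z, x: 1 if (z == 2 and x == 0) else 0

print(run_discriminator([0, 1, 0, 0, 1], f, g))   # True: accepted
print(run_discriminator([1, 0, 0, 0, 1], f, g))   # False: run of three zeroes
```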
Example 1. Let X = {0, 1} and suppose that membership of x in A_n forbids the appearance of more than two successive zeroes. Then, a simple discriminator can detect a violation of this rule using the finite-state machine defined by S = {0, 1, 2}, where the state counts the length of the current run of zeroes (capped at 2), the next-state function is g(z, 1) = 0 and g(z, 0) = min{z + 1, 2}, and the output function is f(z, x) = 1 only for (z, x) = (2, 0). More generally, consider the set of binary sequences that comply with the so-called (d, k)-constraints, well known from the literature of magnetic recording (see, e.g., [34] and references therein), namely, binary sequences where the runs of successive zeroes must be of length at least d and at most k, where d and k (not to be confused with the notation of the encryption key) are positive integers with d ≤ k. We shall return to this example later. ✷

Ziv defines perfect secrecy for individual sequences as a situation where, even upon observing W, the eavesdropper's uncertainty about x is not reduced. In mathematical language, let us denote by A_n(W) the set of acceptable plaintexts that remain possible given the cryptogram, i.e., A_n(W) = {T^{-1}(W, k), k ∈ K} ∩ A_n, and then perfect secrecy is defined as a situation where A_n(W) = A_n, or, equivalently, |A_n(W)| = |A_n|. To demonstrate these concepts, consider the following example.
Example 2. Let n = 4, let the plaintext x be a binary 4-vector, and consider a one-time pad encrypter, W = x ⊕ k, where ⊕ denotes bit-wise XOR (modulo-2 addition). Let the set K of all 8 possible key strings be given by K = {1111, 1000, 1100, 1001, 0000, 0111, 0011, 0110}. Obviously, the decryption is given by x = W ⊕ k. Following Example 1, suppose that A_4 is the set of all binary vectors of length n = 4 that do not contain runs of more than two zeroes. There are only 3 binary vectors of length 4 that contain a succession of more than 2 (i.e., 3 or 4) zeroes, namely, (0000), (1000), and (0001). Thus, |A_4| = 2^4 - 3 = 13. On the other hand, for any given cryptogram W, the set {W ⊕ k, k ∈ K} contains at most |K| = 8 < 13 acceptable candidates, which means that this encryption system is not perfectly secure. The reason is that the key space, K, is not large enough. ✷

Clearly, the best one can do in the quest of minimizing H(K), without compromising perfect secrecy, is to design the encrypter in such a way that {T^{-1}(W, k), k ∈ K} = A_n for every cryptogram W that can possibly be obtained from some combination of x and k. Conceptually, this can be obtained by mapping A_n onto the set of all binary sequences of length H(K) = log |A_n| by means of a fixed-rate data compression scheme and applying one-time pad encryption to the compressed sequence. Here, and throughout the sequel, we neglect integer length constraints associated with large numbers, and so, H(K) is assumed to be an integer without essential loss of generality or optimality.

Remark 1. For readers familiar with the concepts and the terminology of coding for systems with (d, k) constraints (other readers may skip this remark without loss of continuity), it is insightful to revisit the second part of Example 1: if A_n is the set of binary n-sequences that satisfy a certain (d, k) constraint, then optimal encryption for A_n pertains to compression using the inverse mapping of a channel encoder for the same (d, k) constraint, namely, the corresponding channel decoder, followed by one-time pad encryption. The minimum key rate needed is then equal to the capacity of the constrained system, which can be calculated either algebraically, as the logarithm of the Perron-Frobenius eigenvalue of the state adjacency matrix of the state transition diagram, or probabilistically, as the maximum entropy among all stationary Markov chains that are supported by the corresponding state transition graph [34]. ✷

Ziv's main result in [32] is that for a finite-state discriminator, if x ∈ A_n, the cardinality of A_n cannot be exponentially smaller than 2^{LZ(x)} (see Appendix for the proof), where LZ(x) is the length (in bits) of the compressed version of x using the 1978 version of the Lempel-Ziv algorithm (the LZ78 algorithm) [10]. Consequently, the key rate needed in order to completely encrypt all members of A_n is lower bounded by R = H(K)/n ≥ (1/n) log |A_n| ≥ LZ(x)/n - ǫ_n, where ǫ_n is a positive sequence tending to zero as n → ∞ at the rate of log(log n)/log n, and where, here and throughout the sequel, the notation |E|, for a finite set E, designates the cardinality of E. The first inequality of (13) follows from (8). Obviously, this bound is essentially attained by LZ78 compression of x, followed by one-time pad encryption using LZ(x) key bits. As can be seen, the gap, ǫ_n, between the upper bound and the lower bound on R is O(log(log n)/log n), which tends to zero rather slowly.
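The four-bit example above can be verified by brute force; the following Python sketch (with our own variable names) enumerates the acceptance set and the set of acceptable candidates that survive after the eavesdropper observes a cryptogram.

```python
from itertools import product

n = 4
def acceptable(x):
    """Membership in A_4: no run of more than two zeroes."""
    run = 0
    for b in x:
        run = run + 1 if b == 0 else 0
        if run > 2:
            return False
    return True

A = [x for x in product([0, 1], repeat=n) if acceptable(x)]
K = [(1,1,1,1), (1,0,0,0), (1,1,0,0), (1,0,0,1),
     (0,0,0,0), (0,1,1,1), (0,0,1,1), (0,1,1,0)]   # the 8 keys of the example

x = (1, 0, 1, 1)                                   # an acceptable plaintext
w = tuple(a ^ b for a, b in zip(x, K[3]))          # one-time pad cryptogram
# acceptable candidates the eavesdropper cannot rule out after seeing w:
survivors = {tuple(a ^ b for a, b in zip(w, k)) for k in K} & set(A)
print(len(A), len(survivors))   # 13 vs. at most 8: not perfectly secure
```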
Remark 2. For an infinite sequence x = (x_0, x_1, x_2, ...), asymptotic results are obtained in [32] by a two-stage limit: first, consider a finite sequence of total length m • n, which is divided into m non-overlapping n-blocks, where the above-described mechanism is applied to each n-block separately.
The asymptotic minimum key rate is obtained by a double limit superior, which is taken first for m → ∞ for a given n, and then for n → ∞. In this work, we will have in mind a similar double limit, but we shall not mention it explicitly at every relevant occasion. Instead, we will focus on the behavior of a single n-block.

More General Finite-State Discriminators
This section, which is the main section of this article, is devoted to describing several more general classes of finite-state discriminators, along with derivations of their respective, more refined bounds.

FSMs with Counters
While Ziv's model of a finite-state discriminator is adequate for rejecting sequences with certain forbidden patterns (like a succession of more than two zeroes in the above examples), it is not sufficient to handle situations like the following one. Suppose that the encrypter applies a universal lossless compression algorithm for memoryless sources, followed by one-time pad encryption. Suppose also that the universal compression scheme is a two-part code, where the first part encodes the index of the type class of x, using a number of bits proportional to log n, and the second part represents the index of x within the type class, assuming that the encoder and the decoder have agreed on some ordering ahead of time. In this case, the length of the cryptogram (in bits), which is equal to the length of the compressed data, is about n·Ĥ(x), up to lower-order terms, where Ĥ(x) is the empirical entropy of x (see, e.g., [5] and references therein). The eavesdropper, being aware of the encryption scheme, observes the length L of the cryptogram, and immediately concludes that x must be a sequence whose empirical entropy is about L/n, henceforth denoted H_0. In other words, in this case, A_n is the set of all sequences in X^n whose empirical entropy equals H_0. Therefore, every sequence whose empirical distribution pertains to an empirical entropy different from H_0 should be rejected. To this end, our discriminator should be able to gather empirical statistics, namely, to count occurrences of symbols (or, more generally, combinations of symbols and states), and not just to detect a forbidden pattern that might have occurred just once in x.
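The inference made by the eavesdropper can be illustrated numerically; the sketch below (our own illustrative code, not part of the formal model) computes the empirical entropy Ĥ(x) that determines the cryptogram length up to lower-order terms.

```python
from collections import Counter
from math import log2

def empirical_entropy(x):
    """Ĥ(x): entropy of the empirical (first-order) distribution of symbols in x."""
    n = len(x)
    return -sum((c / n) * log2(c / n) for c in Counter(x).values())

x = "abab" * 50           # n = 200, empirical distribution (1/2, 1/2)
H = empirical_entropy(x)
print(H)                  # 1.0 bit per symbol
# A two-part code spends about n*H bits on the index within the type class,
# plus O(log n) bits for the type itself; observing the total length L,
# the eavesdropper infers H_0 ≈ L/n and rejects any candidate sequence
# whose empirical entropy differs from H_0.
```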
This motivates us to broaden the class of finite-state discriminators to be considered, in the following fashion. We consider discriminators that consist of a next-state function, as before, but instead of the binary output function, f, of [32], these discriminators are equipped with a set of α • s counters that count the number of joint occurrences of all (x, z) ∈ X × S for i = 0, 1, 2, ..., n - 1, i.e., n(x, z) = Σ_{i=0}^{n-1} I{x_i = x, z_i = z}, where I{A}, for a generic event A, denotes its indicator function, namely, I{A} = 1 if A is true and I{A} = 0 if not. A sequence x is accepted (resp. rejected) if the matrix of counts, {n(x, z), x ∈ X, z ∈ S}, satisfies (resp. violates) a certain condition. In the example of the previous paragraph, a sequence is accepted if -Σ_{x∈X} (n(x)/n) log(n(x)/n) = H_0, where n(x) = Σ_{z∈S} n(x, z). Ziv's discriminator model is clearly a special case of this model: let (x⋆, z⋆) be any combination of input and state such that f(z⋆, x⋆) = 1 in Ziv's model (that is, a "forbidden" combination). Then, in terms of the proposed extended model, a sequence is rejected whenever n(x⋆, z⋆) ≥ 1. Clearly, since x ∈ A_n, and since membership in A_n depends solely on the matrix of counts, {n(x, z), x ∈ X, z ∈ S}, it follows that all sequences that share the same counts as those of x must also be members of A_n. The set of all sequences of length n with the same counts, {n(x, z), x ∈ X, z ∈ S}, as those of x, is called the finite-state type class w.r.t. the FSM g (see also [5]), and it is denoted by T_g(x). Since A_n ⊇ T_g(x), |A_n| ≥ |T_g(x)|. It is proved in [5] (Lemma 3 therein) that if n(x, z) ≥ n(z)δ(n) for every (x, z) ∈ X × S, where δ(n) > 0 may even be a vanishing sequence, then |T_g(x)| is of the exponential order of 2^{n·Ĥ(X|Z)}, where Ĥ(X|Z) = Σ_{x∈X} Σ_{z∈S} (n(x, z)/n) · log(n(z)/n(x, z)), with n(z) = Σ_{x∈X} n(x, z), and with the conventions that 0 log 0 ≜ 0 and 0/0 ≜ 0. It therefore follows that the key rate needed for perfect secrecy is essentially lower bounded by Ĥ(X|Z). This lower bound can be asymptotically attained by an encrypter that applies universal lossless compression for finite-state sources with the next-state function g, followed by one-time pad encryption. This universal lossless compression scheme is based on a conceptually simple extension of the abovementioned universal scheme for the class of memoryless sources: one applies a two-part code, where the first part encodes the index of the type class w.r.t. g (using a number of bits proportional to log n), and the second part encodes the index of the location of x within T_g(x), according to a predefined order agreed upon between the encoder and the decoder (see [5] for more details).
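For concreteness, the following Python sketch (our own illustrative code) accumulates the joint counts n(x, z) driven by a given next-state function g and evaluates the empirical conditional entropy Ĥ(X|Z) defined above.

```python
from collections import Counter
from math import log2

def fs_counts(x, g, z0=0):
    """Joint counts n(x, z) of symbol/state pairs driven by next-state function g."""
    counts, z = Counter(), z0
    for xi in x:
        counts[(xi, z)] += 1
        z = g(z, xi)
    return counts

def cond_empirical_entropy(counts):
    """Ĥ(X|Z) = Σ_{x,z} (n(x,z)/n) · log( n(z) / n(x,z) )."""
    n = sum(counts.values())
    nz = Counter()
    for (xi, z), c in counts.items():
        nz[z] += c
    return sum((c / n) * log2(nz[z] / c) for (xi, z), c in counts.items())

g = lambda z, x: x                    # state = previous symbol (1-bit shift register)
x = [0, 1] * 100                      # strictly alternating sequence
counts = fs_counts(x, g, z0=1)        # start as if preceded by a 1
print(cond_empirical_entropy(counts)) # 0.0: x is deterministic given its past
```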
More precisely, the compression ratio that can be achieved, which is also the key rate consumed, is upper bounded by Ĥ(X|Z) + (s(α-1)/2)·(log n)/n (see Section III of [5]), which tells us that the gap between the lower bound and the achievability is proportional to (log n)/n, decaying much faster than the aforementioned O(log(log n)/log n) convergence rate of the gap in Ziv's approach. Moreover, for most sequences in most type classes, {T_g(x)}, the coding rate (which is also the key rate in one-time pad encryption) of Ĥ(X|Z) + (s(α-1)/2)·(log n)/n is smaller than LZ(x)/n, since the former quantity is essentially also a lower bound on the compression ratio of any lossless compression scheme for most individual sequences in T_g(x), for almost all such type classes [5, Theorem 1]. The converse inequality between Ĥ(X|Z) and LZ(x)/n, which follows from Ziv's inequality (see [35, Lemma 13.5.5] and [36]), again holds up to an O(log(log n)/log n) term.
Remark 3. To put Ziv's result in perspective, one may wish to envision a class of discriminators defined by a given dictionary of c distinct 'words', which are the c phrases in Ziv's derivation [32] (see also the Appendix). A possible definition of such a discriminator is that it accepts only n-sequences of plaintext that are formed by concatenating words from the given dictionary, allowing repetitions.
These are no longer finite-state discriminators. In this case, Ziv's derivation is applicable under the minor modification that the various phrases should be classified only in terms of their length, but without further partition according to initial and final states (z and z′ in the derivation of the Appendix). This is equivalent to assuming that s = 1, and accordingly, the term 2c log s in the last equation of the Appendix should be omitted. Of course, the resulting bound can still be matched by LZ78 compression followed by one-time pad encryption.

Shift-Register Machines with Counters
Since the eavesdropper naturally does not cooperate with the encrypter, the latter might not know the particular FSM, g, used by the former, and therefore, it is instructive to derive key-rate bounds that are independent of g. To this end, consider the following. Given x and a fixed positive integer ℓ (ℓ ≪ n), let us observe the more refined joint empirical distribution of (ℓ+1)-tuples of plaintext symbols together with the accompanying states, as well as all partial marginalizations derived from this distribution, where the indices are taken modulo n, so as to create a periodic extension of x (hence redefining z_0 = g(z_{n-1}, x_{n-1})). Accordingly, the previously defined empirical conditional entropy, Ĥ(X|Z), is now denoted Ĥ(X_1|Z_1), which is also equal to Ĥ(X_2|Z_2), etc., due to the inherent shift-invariance property of the empirical joint distribution extracted under the periodic extension of x. Consider now the following chain of inequalities: Σ_{j=0}^{ℓ} [Ĥ(X_j|X_0, ..., X_{j-1}) - Ĥ(X_j|Z_j)] ≤ Σ_{j=0}^{ℓ} [Ĥ(X_j|X_0, ..., X_{j-1}) - Ĥ(X_j|Z_0, X_0, ..., X_{j-1})] = Î(Z_0; X_0, ..., X_ℓ), where Î(•; •) denotes empirical mutual information, and where the term corresponding to j = 0 should be understood to be [Ĥ(X_0) - Ĥ(X_0|Z_0)]. The first inequality follows because, given (X_0, ..., X_{j-1}, Z_0), one can reconstruct Z_1, Z_2, ..., Z_j by j recursive applications of the next-state function, g. Therefore, since Î(Z_0; X_0, ..., X_ℓ) ≤ log s, since Ĥ(X_j|X_0, ..., X_{j-1}) is non-increasing in j, and since Ĥ(X_j|Z_j) = Ĥ(X_1|Z_1) for all j, we have (ℓ+1)·[Ĥ(X_ℓ|X_0, ..., X_{ℓ-1}) - Ĥ(X_1|Z_1)] ≤ log s. Equivalently, Ĥ(X_1|Z_1) ≥ Ĥ(X_ℓ|X_0, ..., X_{ℓ-1}) - (log s)/(ℓ+1), and so, combining this with eqs. (20) and (21), we get a lower bound on the key rate in terms of Ĥ(X_ℓ|X_0, ..., X_{ℓ-1}) alone. The advantage of this inequality is its independence of the arbitrary next-state function g.
In fact, we have actually replaced the arbitrary FSM, g, by a particular FSM, the shift-register FSM, whose state is z_i = (x_{i-ℓ}, x_{i-ℓ+1}, ..., x_{i-1}), at the cost of a gap of (log s)/(ℓ+1), which can be kept arbitrarily small if the length of the shift register, ℓ, is sufficiently large compared to the memory size, log s, of g.
Remark 4. The fact that Ĥ(X_ℓ|X_0, ..., X_{ℓ-1}) cannot be much larger than Ĥ(X_ℓ|Z_ℓ) for large enough ℓ actually suggests that whatever the state of any FSM g can possibly "remember" from the past of x is essentially captured by the recent past, and not by the remote past. While this is not surprising in the context of the probabilistic setting, especially if the underlying random process is ergodic and hence has a vanishing memory of the remote past, this finding is not quite trivial when it comes to arbitrary individual sequences. ✷

Returning to our derivations, in view of the first three lines of (13), the key rate needed for perfect secrecy is lower bounded by Ĥ(X_ℓ|X_0, ..., X_{ℓ-1}) - (log s)/(ℓ+1), and since this holds for any ℓ in some fixed range 1 ≤ ℓ ≤ l, we may maximize the bound over ℓ in this range. Note that if x is an "ℓ_0-th order Markovian sequence", in the sense that Ĥ(X_ℓ|X_0, ..., X_{ℓ-1}) is almost fixed for all ℓ_0 ≤ ℓ ≤ l with l ≫ log s, then ℓ_0 is the preferred choice for ℓ, as it essentially captures the best attainable key rate.
This lower bound can be asymptotically attained by universal lossless compression for ℓ-th order Markov types [37], [38], [39, Section VII.A], followed by one-time pad encryption, where the achieved rate is Ĥ(X_ℓ|X_0, ..., X_{ℓ-1}) plus a redundancy term proportional to (log n)/n. In this case, A_n is the ℓ-th order Markov type class of x, and the finite-state discriminator of the eavesdropper is a shift-register machine that checks whether the ℓ-th order Markov type of each candidate x has the matching empirical conditional entropy of order ℓ.
In this result, there is a compatibility between the converse bound and the achievability bound, in the sense that both are given in terms of FSMs with a fixed number of states that does not grow with n. Among all possible finite-state machines, we have actually singled out the shift-register machine universally, at the cost of a controllable asymptotic gap of (log s)/(ℓ+1), but otherwise, the bound is explicit and it is clear how to approach it. If we wish to keep this gap below a given ǫ > 0, then we select ℓ = ⌈(log s)/ǫ⌉ - 1. In this sense, it is another refinement of Ziv's result.
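The quantity Ĥ(X_ℓ|X_0, ..., X_{ℓ-1}), computed under the periodic extension of x, can be evaluated directly; the following Python sketch (our own illustrative code) does so and exhibits its monotonicity in ℓ on a simple periodic sequence.

```python
from collections import Counter
from math import log2

def markov_cond_entropy(x, ell):
    """Ĥ(X_ℓ | X_0, ..., X_{ℓ-1}) from the cyclic (mod-n) extension of x."""
    n = len(x)
    joint, ctx = Counter(), Counter()
    for i in range(n):
        c = tuple(x[(i + j) % n] for j in range(ell))   # length-ℓ context
        s = x[(i + ell) % n]                             # next symbol
        joint[(c, s)] += 1
        ctx[c] += 1
    return sum((m / n) * log2(ctx[c] / m) for (c, s), m in joint.items())

x = [0, 1, 1] * 60                 # a period-3 individual sequence
for ell in range(4):
    print(ell, markov_cond_entropy(x, ell))
# the conditional entropy is non-increasing in ℓ and hits 0 once the
# shift register is long enough to resolve the period (here ℓ = 2)
```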

Periodically Time-Varying FSMs with Counters
So far, we have considered time-invariant FSMs, where the function g remains fixed over time.
We now expand the scope to the class of discriminators implementable by periodically time-varying FSMs, defined as follows: z_{i+1} = g(z_i, x_i, i mod l), i = 0, 1, 2, ..., where l is a positive integer that designates the period of the time-varying finite-state machine. Conceptually, this is not really more general than the ordinary, time-invariant FSM defined earlier, because the state of the modulo-l clock can be considered part of the entire state; in other words, this is a time-invariant FSM with s • l states, indexed by the ordered pair (z_i, i mod l). The reason it makes sense to distinguish between the state z_i and the state of the clock is that the clock does not store any information regarding past input data. This is to say that, in the context of time-varying finite-state machines, we distinguish between the amount of memory of past input (log s bits) and the period l. Both parameters manifest the richness of the class of machines, but in different manners. Indeed, the parameters s and l will play very different and separate roles in the converse bound derived below.
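A periodically time-varying FSM is straightforward to simulate; the sketch below (with an arbitrary, purely illustrative pair of next-state functions of our own choosing) runs the recursion z_{i+1} = g(z_i, x_i, i mod l) and makes explicit that the pair (z_i, i mod l) evolves as a time-invariant state.

```python
def run_periodic_fsm(x, g_list, z0=0):
    """Periodically time-varying FSM: z_{i+1} = g(z_i, x_i, i mod l),
    where g_list[l'] holds the next-state function used at phase l'."""
    period, z, states = len(g_list), z0, []
    for i, xi in enumerate(x):
        states.append(z)
        z = g_list[i % period](z, xi)
    return states

# period-2 machine over binary inputs: at even times the state tracks the
# input symbol, at odd times it is reset to 0 (purely illustrative choice)
g_even = lambda z, x: x
g_odd = lambda z, x: 0
print(run_periodic_fsm([1, 1, 0, 1], [g_even, g_odd]))   # → [0, 1, 0, 0]
# equivalently, a time-invariant FSM over the product state (z, i mod l)
```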
Remark 5. Clearly, the earlier considered time-invariant finite-state machine is obtained as the special case pertaining to l = 1, or to the case where l is arbitrary, but the next-state functions, g(•, •, 0), g(•, •, 1), ..., g(•, •, l - 1), are all identical. ✷

First, observe that a periodic FSM with period l can be viewed as a time-invariant FSM at the level of l-blocks, {x_{il}, x_{il+1}, ..., x_{il+l-1}}, i = 0, 1, ..., and hence also at the level of ℓ-blocks, where ℓ is an integer multiple of l. Accordingly, let ℓ be an arbitrary integer multiple of l, but at the same time, assume that ℓ divides n. Denote m = n/ℓ, and define the counts m(z, z′, x^ℓ) = Σ_{i=0}^{m-1} I{z_{iℓ} = z, z_{(i+1)ℓ} = z′, x_{iℓ}^{iℓ+ℓ-1} = x^ℓ}, for z, z′ ∈ S and x^ℓ ∈ X^ℓ. In fact, there is a certain redundancy in this definition, because z′ is a deterministic function of (z, x^ℓ), obtained by ℓ recursive applications of the (time-varying) next-state function g. Concretely, m(z, z′, x^ℓ) = m(z, x^ℓ) iff z′ matches (z, x^ℓ), and m(z, z′, x^ℓ) = 0 otherwise. Nonetheless, we adopt this definition for the sake of clarity of the combinatorial derivation to be carried out shortly. In particular, our derivation will be based on grouping together all {x^ℓ} which, for a given z, yield the same z′. Accordingly, we also denote m(z, z′) = Σ_{x^ℓ∈X^ℓ} m(z, z′, x^ℓ).
Suppose that the acceptance/rejection criterion that defines A_n is based on the counts, {m(z, z′, x^ℓ), z, z′ ∈ S, x^ℓ ∈ X^ℓ}; then the smallest A_n that contains x is the type class of x pertaining to {m(z, z′, x^ℓ), z, z′ ∈ S, x^ℓ ∈ X^ℓ}. The various sequences in this type class are obtained by permuting distinct ℓ-tuples, {x_{iℓ}^{iℓ+ℓ-1}, i = 0, 1, ..., m - 1}, that begin at the same state, z, and end at the same state, z′. Let us define the empirical distribution P̂(z, z′, x^ℓ) = m(z, z′, x^ℓ)/m, the joint entropy Ĥ(Z, Z′, X^ℓ) associated with it, and the conditional entropy Ĥ(X^ℓ|Z, Z′) = Ĥ(Z, Z′, X^ℓ) - Ĥ(Z, Z′), where Ĥ(Z, Z′) is the marginal empirical entropy of (Z, Z′). Then, using the method of types [39], the cardinality of this type class is of the exponential order of 2^{m·Ĥ(X^ℓ|Z,Z′)}, and so, the key rate needed for perfect secrecy is essentially lower bounded by (1/ℓ)·Ĥ(X^ℓ|Z, Z′) ≥ (1/ℓ)·[Ĥ(X^ℓ) - 2 log s], which, for ℓ ≫ 2 log s, can be essentially matched by universal compression for block-memoryless sources and one-time pad encryption, using exactly the same ideas as before. Once again, we have derived a lower bound that is free of dependence on the particular FSM, g. Recall that ℓ divides n and that it is also a multiple of l, but otherwise, ℓ is arbitrary.
Clearly, in view of Remark 5, the above lower bound applies also to the case of a time-invariant FSM, g, but then there would be some mismatch between the upper and lower bounds, because to achieve the lower bound, one must gather more detailed empirical statistics, namely, empirical statistics of blocks together with states, rather than just single symbols with states.

Side Information
Some of the results presented in the previous sections extend to the case where side information (SI) is available at the legitimate decrypter and at the eavesdropper. In principle, it may or may not be available to the encrypter, and we consider first the case where it is available. We assume that an SI sequence, y = (y_0, ..., y_{n-1}), is fed into the discriminator along with each examined plaintext candidate, and a derivation parallel to that of Section 2 shows that the key rate needed for perfect secrecy is lower bounded by u(x|y)/n - ǫ_n, where u(x|y) denotes the length (in bits) of the conditional LZ encoding of x given y, and where the last inequality follows from [10] (see also [35, Lemma 13.5.3]), with ǫ_n = O(log(log n)/log n).
This lower bound can be asymptotically attained by the conditional version of the LZ algorithm (see [40] and [41]), followed by one-time pad encryption. If y is unavailable at the encrypter, x can still be compressed into about u(x|y) bits before the one-time pad encryption, using Slepian-Wolf encoding [35, Section 15.4], and reconstructed (with high probability) after decryption using a universal decoder that employs u(x|y) as a decoding metric; see [42].
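Although the conditional LZ complexity u(x|y) itself is more intricate, a simple first-order stand-in, the conditional empirical entropy Ĥ(X|Y), illustrates how SI reduces the compression rate, and hence the required key rate. The sketch below is our own illustrative computation, not the conditional LZ algorithm of [40], [41].

```python
from collections import Counter
from math import log2

def cond_entropy_given_si(x, y):
    """First-order proxy for the rate with side information:
    Ĥ(X|Y) = Σ_{x,y} (n(x,y)/n) · log( n(y) / n(x,y) )."""
    n = len(x)
    nxy = Counter(zip(x, y))          # joint counts n(x, y)
    ny = Counter(y)                   # marginal counts n(y)
    return sum((c / n) * log2(ny[b] / c) for (a, b), c in nxy.items())

y = "ababababab" * 20                 # SI sequence, length 200
x = y.upper()                         # plaintext fully determined by the SI
print(cond_entropy_given_si(x, y))    # 0.0 bits/symbol: given y, x is known
```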
Unfortunately, and somewhat surprisingly, a direct extension of the results of Subsections 3.1 and 3.2 to the case with SI turns out to be rather elusive. The reason is the lack of a single-letter formula for the exponential growth rate of the cardinality of a finite-state conditional type class of x-vectors that is defined by joint counts of the form n(x, y, z) = Σ_{i=1}^{n} I{x_i = x, y_i = y, z_i = z}, even in the simplest special case where z_i = x_{i-1}. In a nutshell, this quantity depends on y in a complicated manner, which cannot be represented in terms of an empirical distribution of fixed dimension that does not grow with n. However, we can obtain at least a lower bound if we treat these cases as special cases of the periodically time-varying FSMs with counters, in view of Remark 5.
Finally, consider the class of discriminators implementable by periodically time-varying FSMs with SI, defined as before, except that the next-state function is fed by the SI symbol as well: z_{i+1} = g(z_i, x_i, y_i, i mod l). Repeating the arguments of Subsection 3.3, we obtain the chain of inequalities

Σ_{l=1}^{c(y)} c_l(x|y) log c_l(x|y) - Σ_{l=1}^{c(y)} c_l(x, y) H(Z_l, Z′_l)
≥ Σ_{l=1}^{c(y)} c_l(x|y) log c_l(x|y) - 2 Σ_{l=1}^{c(y)} c_l(x, y) log s
= Σ_{l=1}^{c(y)} c_l(x|y) log c_l(x|y) - 2c(x, y) log s
≥ Σ_{l=1}^{c(y)} c_l(x|y) log c_l(x|y) - (2n log s)/((1 - ǫ_n) log n).

Hence we may maximize this lower bound w.r.t. ℓ subject to these constraints. Alternatively, we may rewrite the lower bound as