Article

Universal Encryption of Individual Sequences Under Maximal Information Leakage

Neri Merhav
The Viterbi Faculty of Electrical and Computer Engineering, Technion—Israel Institute of Technology, Technion City, Haifa 3200003, Israel
Entropy 2025, 27(6), 551; https://doi.org/10.3390/e27060551
Submission received: 30 April 2025 / Revised: 22 May 2025 / Accepted: 23 May 2025 / Published: 24 May 2025
(This article belongs to the Special Issue Information Theory and Data Compression)

Abstract

We consider the Shannon cipher system in the framework of individual sequences and finite-state encrypters under the metric of maximal information leakage. A lower bound and an asymptotically matching upper bound on the leakage are derived, which lead to the conclusion that asymptotically minimum leakage can be attained by Lempel–Ziv compression followed by one-time pad encryption of the compressed bitstream.

1. Introduction

The information-theoretic approach that combines individual-sequence modeling with finite-state encoders and decoders has been extensively developed, representing a notable departure from the conventional reliance on probabilistic models traditionally used in source and channel modeling. This paradigm shift has gained traction across multiple areas of information theory, including lossless and lossy source coding [1,2,3,4,5,6]; source/channel simulation [7]; hypothesis testing [8,9]; prediction and decision making [10,11,12]; filtering [13]; and even error correction coding [14,15,16]. A concise overview of this expanding body of work can be found in [17], though these citations represent only a small fraction of the broader literature. In sharp contrast, the domain of information-theoretic security has remained largely anchored in probabilistic methods, from Shannon’s foundational contributions [18] to more contemporary developments [19,20,21,22,23]. While these examples are far from exhaustive, they underscore the field’s persistent adherence to probabilistic frameworks.
To the author’s knowledge, only two significant departures from the dominant probabilistic approach in information-theoretic security exist: one is an unpublished technical report by Ziv [24] and the other is a subsequent development documented in [25]. In his report, Ziv proposes a novel framework in which the plaintext, destined for encryption using a secret key, is modeled as an arbitrary individual sequence. Within this setup, the encrypter functions as a general block encoder, while the eavesdropper is equipped with a finite-state machine (FSM) designed to distinguish between potential candidates for estimating the plaintext. A central aspect of Ziv’s model is the assumption that the eavesdropper possesses partial prior knowledge about the plaintext, formalized as a set of “acceptable messages”, which he defines as the acceptance set. Before intercepting the ciphertext, the eavesdropper’s uncertainty is characterized by the possibility that the plaintext could be any member of this set. Encryption is deemed perfectly secure if the ciphertext provides no additional information, i.e., if it does not reduce the size of the acceptance set, and thus renders the eavesdropper’s uncertainty unchanged. Accordingly, the size of the acceptance set quantifies uncertainty: the larger the set, the less the eavesdropper knows. The FSM attempts to rule out unacceptable sequences by testing them across different key sequences. Perfect secrecy is thus defined by the ciphertext’s inability to eliminate any members of the acceptance set. Ziv’s key result is that the minimum asymptotic key rate required for perfect secrecy, under this definition, is lower-bounded by the Lempel–Ziv (LZ) complexity of the plaintext sequence [6]. Furthermore, this lower bound is asymptotically tight. Perfect security can be achieved by applying a one-time pad (bitwise XOR with key bits) to the LZ-compressed version of the plaintext. This mirrors Shannon’s classical finding that the key rate must match the entropy rate of the source. More recent work [26] has expanded and sharpened Ziv’s original ideas in several important respects.
The subsequent work [25] offers a different take on the modeling approach and on achieving perfect secrecy for individual sequences. Instead of focusing on a finite-state eavesdropper with predefined knowledge, this approach models the encrypter itself as a finite-state machine (FSM), which processes both the plaintext and a stream of random key bits in a sequential manner. Central to this framework is the introduction of a new notion called finite-state encryptability, in the footsteps of finite-state compressibility introduced in [6]. Finite-state encryptability is defined as the minimum key rate required by any FSM-based encrypter to ensure that a particular measure of normalized empirical mutual information between the plaintext and ciphertext converges to zero as the block length increases. One of the main theoretical results in [25] asserts that the finite-state encryptability of a given individual sequence is lower-bounded by its finite-state compressibility. Stated differently, no finite-state encrypter can use a key rate below the sequence compressibility without compromising security, as defined in this setting. Once again, this lower bound is not merely theoretical; it can be asymptotically achieved through the same two-step process that was mentioned above: first compressing the plaintext using the Lempel–Ziv (LZ) algorithm and then applying one-time pad encryption to the resulting compressed bitstream. This mirrors earlier results in both compression and security, illustrating a deep connection between the two domains.
In this paper, we adopt the same model setting as in [25], but with a different security metric: the maximum leakage of information, which was first introduced by Issa, Wagner, and Kamath in [27] and then further explored in several more recent works, including [28,29,30,31,32], among others. This metric is closely related to, and similarly motivated by, the earlier security measure proposed in [33], which defines security as a scenario where the correct decoding exponent of the plaintext is not improved by the availability of the ciphertext, compared to that of blind guessing. For more details, see the last paragraph of Section 2.2.1. The maximum leakage metric is defined in a more general form and has a relatively straightforward expression, as demonstrated in [27] and further clarified in the following sections. As will be discussed in the sequel, the maximum leakage metric is particularly well suited for the individual-sequence setting considered here, as it is weakly dependent on the probability distribution of the plaintext, depending only on its support.
We derive both a lower bound and an asymptotically matching upper bound on the leakage, leading yet again to the conclusion that asymptotically optimal performance can be achieved by applying LZ compression followed by one-time pad encryption of the compressed bitstream. Thus, considering the above-mentioned earlier works, refs. [24,25,26], one of the messages of this work is that one-time pad encryption on top of LZ compression forms an asymptotically optimal cipher system from many aspects. Therefore, we believe that the deeper and more interesting contribution of this work is the converse theorem (Theorem 1 in the sequel) and its proof, asserting that the key rate that must be consumed to encrypt an individual sequence cannot be much smaller than the LZ complexity of the sequence minus the allowed normalized maximal information leakage.
This paper is structured as follows: In Section 2, we establish notation conventions, provide some necessary background, and formulate the problem studied in this work. In Section 3, we assert the main results and discuss them. Finally, in Section 4, we prove Theorem 1, which is the converse theorem.

2. Notation Conventions, Background, and Problem Formulation

2.1. Notation Conventions

In this paper, we adopt the following notation rules: Scalar random variables (RVs) are represented using uppercase letters, while their realizations (sample values) are denoted by the corresponding lowercase letters. The sets of possible values (alphabets) for these variables are indicated using calligraphic letters. This notation extends naturally to random vectors and their realizations. Specifically, an $n$-dimensional random vector will be denoted by appending a superscript indicating the dimension to the scalar symbol. For instance, $A^n$ ($n$ a positive integer) refers to the random vector $(A_1, \ldots, A_n)$, and $a^n = (a_1, \ldots, a_n)$ denotes a particular instance of this vector, belonging to the set $\mathcal{A}^n$, the $n$-fold Cartesian product of the alphabet $\mathcal{A}$. Segment notations, such as $A_i^j$ and $a_i^j$, are used to represent the substrings $(A_i, \ldots, A_j)$ and $(a_i, \ldots, a_j)$, respectively, for integers $i \le j$. When $i = 1$, the subscript is dropped for brevity. If $i > j$, these notations are interpreted as representing the empty string. Additionally, for any real number $u$, the notation $[u]_+$ denotes $\max\{0, u\}$. Unless otherwise noted, all logarithms and exponential functions throughout this paper are taken to base 2.
Throughout this article, information sources and channels will be generically represented by the letter $P$, following standard textbook notation. Subscripts will indicate the relevant random variables and any sort of conditioning, when applicable. For example, $P_{X^n}(x^n)$ denotes the probability mass function of the random vector $X^n$ evaluated at $x^n$, while $P_{Y^n|X^n}(y^n|x^n)$ represents the conditional probability of $Y^n = y^n$ given $X^n = x^n$. These subscripts may be omitted when the meaning is clear from the context. Information-theoretic functionals, such as entropy, mutual information, and related quantities, will be expressed using standard symbols and conventions widely adopted in the information theory literature. In the remainder of this work, the symbol $x^n = (x_1, \ldots, x_n)$ will refer to a specific input sequence intended for encryption. Each element $x_i$, $i = 1, 2, \ldots, n$, belongs to a finite input alphabet $\mathcal{X}$, whose size is denoted by $\alpha$.

2.2. Background

Before presenting the main results and their proofs, let us revisit key terms and facts related to the notion of maximal information leakage and to the 1978 version of the LZ algorithm, also known as the LZ78 algorithm [6], which is the central building block of this work.

2.2.1. Maximal Leakage of Information

As mentioned in the introduction, we adopt the maximal leakage [27] as our secrecy metric. For a probabilistic plaintext source, the maximal leakage from a secret random variable $X$, distributed according to $\{P_X(x),\, x \in \mathcal{X}\}$, to another random variable $Y$, available to an adversary, and which is conditionally distributed given $X = x$ according to $\{P_{Y|X}(y|x),\, x \in \mathcal{X},\, y \in \mathcal{Y}\}$, is defined as
$$L(X \to Y) = \sup_{U - X - Y - \hat{U}} \log \frac{\Pr\{\hat{U} = U\}}{\max_{u \in \mathcal{U}} P_U(u)}, \tag{1}$$
where the supremum is over all finite-alphabet random variables $U$ and $\hat{U}$ that obey the Markov structure $U - X - Y - \hat{U}$. In other words, it is the maximum possible difference between the logarithm of the probability of correctly guessing some (possibly randomized) function of $X$ based on $Y$ and the logarithm of the probability of guessing it blindly.
In Theorem 1 of [27], it was asserted and proved that the leakage can be calculated relatively easily using the following formula:
$$L(X \to Y) = \log \sum_{y \in \mathcal{Y}} \max_{\{x:\, P_X(x) > 0\}} P_{Y|X}(y|x). \tag{2}$$
Clearly, if $P_{Y|X}(y|x)$ is independent of $x$ for all $y \in \mathcal{Y}$, then $L(X \to Y) = 0$, which is the case of perfect secrecy. In general, the smaller $L(X \to Y)$ is, the more secure the system. In [27], it is shown that the maximal leakage has many interesting properties; one of them is that it satisfies a data processing inequality (see Lemma 1 of [27]). It is also shown in Section III of [27] that the maximal leakage has several additional operational meanings beyond the original one explained above.
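For concreteness, here is a short numerical sketch (ours, not part of the original analysis) that evaluates Formula (2) for a channel specified as a row-stochastic matrix; the function name and the example channels are illustrative assumptions.

```python
import numpy as np

def maximal_leakage(P_Y_given_X):
    """Maximal leakage L(X -> Y) = log2 sum_y max_x P(y|x), as in Formula (2).
    Rows are indexed by x, columns by y; each row is a conditional PMF,
    and full support is assumed for X."""
    return np.log2(P_Y_given_X.max(axis=0).sum())

# A binary symmetric channel with crossover 0.1 leaks log2(1.8) ~ 0.848 bits:
bsc = np.array([[0.9, 0.1],
                [0.1, 0.9]])
print(maximal_leakage(bsc))

# A channel whose output is independent of the input leaks nothing:
print(maximal_leakage(np.array([[0.5, 0.5],
                                [0.5, 0.5]])))   # 0.0
```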
Note that the dependence on the distribution of the secret random variable, $P_X$, is rather weak, as it enters only through its support. When passing from single variables to vectors of length $n$, $L(X^n \to Y^n)$ is defined in the same manner, except that $x$, $y$, $\mathcal{X}$, $\mathcal{Y}$, $P_X(\cdot)$, and $P_{Y|X}(\cdot|\cdot)$ are replaced by $x^n$, $y^n$, $\mathcal{X}^n$, $\mathcal{Y}^n$, $P_{X^n}(\cdot)$, and $P_{Y^n|X^n}(\cdot|\cdot)$, respectively. The weak dependence of $L(X^n \to Y^n)$ on $P_{X^n}$ makes it natural to use when $P_{X^n}$ is uncertain, completely unknown, or even non-existent, as in the individual-sequence setting considered here. Accordingly, we adopt the simple definition
$$L(x^n \to Y^n) = \log \sum_{y^n \in \mathcal{Y}^n} \max_{x^n \in \mathcal{X}^n} P_{Y^n|X^n}(y^n|x^n),$$
corresponding to the full support $\mathcal{X}^n$ for $x^n$, which accounts for a worst-case approach. The operational significance of maximal information leakage in our setting can then be understood in two ways: (i) Considering the definition (1), it allows arbitrary probability distributions (without any assumed structure) on $x^n$, including those that place almost all their mass on a single (unknown) arbitrary sequence, in the spirit of the individual-sequence setting considered here. (ii) Referring to Formula (2), it is evident that the leakage vanishes whenever $P_{Y^n|X^n}(y^n|x^n)$ is independent of $x^n$, which is an indisputable characterization of perfect secrecy in the individual-sequence setting too, where no distribution is assumed on $x^n$.
As mentioned in the introduction, a somewhat different security metric was proposed in [33], but it is intimately related to the maximal information leakage considered here. In [33], the idea was to define a system as secure if the probability of guessing $X$ correctly is essentially the same whether $Y$ is present or absent. More precisely, if $X$ and $Y$ are random vectors of dimension $n$, then a system is considered secure if the correct decoding exponent of $X$ in the presence of $Y$ is the same as if $Y$ were absent. Specifically, the probability of correctly decoding $X$ based on $Y$ is
$$P_c = \sum_y \max_x P_{XY}(x, y),$$
which is closely related to
$$2^{L(X \to Y)} = \sum_y \max_x P_{Y|X}(y|x) = |\mathcal{X}| \cdot \sum_y \max_x \frac{P_{Y|X}(y|x)}{|\mathcal{X}|} = |\mathcal{X}| \cdot \sum_y \max_x P_X(x) P_{Y|X}(y|x) = \frac{\sum_y \max_x P_X(x) P_{Y|X}(y|x)}{1/|\mathcal{X}|} = \frac{P_c^{\mathrm{i}}}{P_c^{\mathrm{u}}},$$
where $P_X(\cdot)$ is understood to designate the uniform distribution across $\mathcal{X}$; accordingly, $P_c^{\mathrm{i}}$ stands for the probability of correct decoding of a uniformly distributed $X$ by an informed observer, namely, one that has access to $Y$, whereas $P_c^{\mathrm{u}} = 1/|\mathcal{X}|$ denotes the probability of correctly guessing the value of $X$ blindly (in the absence of $Y$).
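The identity $2^{L(X \to Y)} = P_c^{\mathrm{i}}/P_c^{\mathrm{u}}$ is easy to verify numerically. The following sketch (with an arbitrary, randomly drawn channel, chosen only for illustration) checks it under a uniform $P_X$:

```python
import numpy as np

rng = np.random.default_rng(0)
# A random channel with 4 inputs and 5 outputs; rows are conditional PMFs P(y|x).
P = rng.random((4, 5))
P /= P.sum(axis=1, keepdims=True)

leakage = np.log2(P.max(axis=0).sum())        # L(X -> Y), Formula (2)
Pc_u = 1 / P.shape[0]                         # blind guessing under uniform X
Pc_i = (P / P.shape[0]).max(axis=0).sum()     # informed guessing under uniform X
print(np.isclose(2 ** leakage, Pc_i / Pc_u))  # True
```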

2.2.2. Lempel–Ziv Parsing

The incremental parsing process of the LZ78 algorithm is a sequential procedure applied to an input vector $x^n$ over a finite alphabet. In this process, each new phrase is the shortest substring that has not appeared before as a complete parsed phrase, except possibly for the final (incomplete) phrase. For instance, applying incremental parsing to the sequence $x^{15} = \text{abbabaabbaaabaa}$ yields a, b, ba, baa, bb, aa, ab, aa. Let $c(x^n)$ designate the total number of phrases formed from $x^n$ by the incremental parsing process (in the example above, $c(x^{15}) = 8$). Also, let $LZ(x^n)$ stand for the length of the LZ78 binary compressed representation of $x^n$. Theorem 2 of [6] easily leads to the following inequality:
$$LZ(x^n) \le c(x^n)\log c(x^n) + n \cdot \epsilon(n),$$
where $\epsilon(n)$ tends to zero as $n \to \infty$. In other words, the LZ code-length for $x^n$ cannot exceed an expression whose dominant term is $c(x^n)\log c(x^n)$. On the other hand, it turns out that $c(x^n)\log c(x^n)$ is also the dominant term of a lower bound (see Theorem 1 of [6]) on the minimum code-length attainable by any information lossless finite-state encoder with no more than $s$ states, provided that $\log(s^2)$ is very small compared to $\log c(x^n)$. In view of these facts, we will be referring to $c(x^n)\log c(x^n)$ as the unnormalized LZ complexity of $x^n$, whereas the normalized LZ complexity will be defined as
$$\rho_{\mathrm{LZ}}(x^n) = \frac{c(x^n)\log c(x^n)}{n}.$$
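The parsing rule and the quantities $c(x^n)$ and $\rho_{\mathrm{LZ}}(x^n)$ are easy to reproduce in code. The sketch below implements only the incremental parsing step of LZ78 (not the full encoder) and recovers the example above:

```python
from math import log2

def lz78_parse(s):
    """LZ78 incremental parsing: each phrase is the shortest substring not
    yet seen as a complete phrase; the final phrase may be a repeat."""
    phrases, seen, cur = [], set(), ""
    for ch in s:
        cur += ch
        if cur not in seen:
            seen.add(cur)
            phrases.append(cur)
            cur = ""
    if cur:                       # leftover (possibly repeated) final phrase
        phrases.append(cur)
    return phrases

x = "abbabaabbaaabaa"
phrases = lz78_parse(x)
c = len(phrases)
print(phrases)                    # ['a', 'b', 'ba', 'baa', 'bb', 'aa', 'ab', 'aa']
print(c, c * log2(c) / len(x))    # c(x^15) = 8, rho_LZ(x^15) = 24/15 = 1.6
```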

2.3. Problem Formulation

Following the approach in [25], we adopt a finite-state model for encryption, described by the sextuple
$$E = (\mathcal{X}, \mathcal{Y}, \mathcal{Z}, f, g, \Delta),$$
where $\mathcal{X}$ is a finite input alphabet of cardinality $\alpha = |\mathcal{X}|$; $\mathcal{Y}$ is a finite collection of binary strings of variable length, possibly including the empty string $\lambda$ (of zero length); $\mathcal{Z}$ is a finite set representing the internal states of the encrypter; $f: \mathcal{Z} \times \mathcal{X} \times \{0,1\}^* \to \mathcal{Y}$ is the output function; $g: \mathcal{Z} \times \mathcal{X} \to \mathcal{Z}$ is the state transition function; and $\Delta: \mathcal{Z} \times \mathcal{X} \to \{0, 1, 2, \ldots\}$ indicates the number of key bits used at each step. The encrypter $E$ processes two infinite input sequences: a plaintext sequence $\boldsymbol{x} = x_1, x_2, \ldots$, where each $x_i \in \mathcal{X}$, and a key sequence $\boldsymbol{u} = u_1, u_2, \ldots$, where each $u_i \in \{0, 1\}$. Given these inputs, the encrypter generates an infinite ciphertext sequence $\boldsymbol{y} = y_1, y_2, \ldots$, with each $y_i \in \mathcal{Y}$, while simultaneously transitioning through a corresponding state sequence $\boldsymbol{z} = z_1, z_2, \ldots$, where each $z_i \in \mathcal{Z}$. The evolution of these sequences is governed by the following recursive equations, applied iteratively for each time step $i = 1, 2, \ldots$:
$$t_i = t_{i-1} + \Delta(z_i, x_i), \quad t_0 = 0,$$
$$k_i = (u_{t_{i-1}+1}, u_{t_{i-1}+2}, \ldots, u_{t_i}),$$
$$y_i = f(z_i, x_i, k_i),$$
$$z_{i+1} = g(z_i, x_i).$$
Here, the initial state of the encrypter, $z_1$, is fixed and will be referred to as $z$ throughout. It is understood that when $\Delta(z_i, x_i) = 0$, the encrypter uses no key bits at step $i$; in this case, the key fragment $k_i$ is defined to be the empty string $\lambda$. Similarly, if the output $y_i = \lambda$, then no output is generated at that step; the system effectively idles, meaning only the internal state changes in response to the input. In summary, the operation at time $i$ proceeds as follows: the encrypter is in state $z_i$; it receives the input symbol $x_i$; it consumes the next $\Delta(z_i, x_i)$ unused key bits of $\boldsymbol{u}$, forming $k_i$; it generates the output $y_i$ (which may be empty); and it updates its state to $z_{i+1}$ according to $g$.
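To make the recursions concrete, here is a minimal Python sketch of the encrypter loop. The specific choices of $f$, $g$, and $\Delta$ in the demo are hypothetical toy functions (a single-state, bitwise one-time pad), not part of the model definition:

```python
def run_encrypter(x, u, f, g, delta, z_init):
    """Drive the finite-state encrypter recursions: at each step, consume
    delta(z, x) fresh key bits, emit an output string (possibly empty),
    and update the state from the plaintext alone."""
    y, z, t = [], z_init, 0
    for xi in x:
        d = delta(z, xi)
        ki = u[t:t + d]           # the next unused key bits, k_i
        t += d
        y.append(f(z, xi, ki))    # y_i (may be the empty string)
        z = g(z, xi)              # z_{i+1} depends on x_i only
    return y, t                   # ciphertext and total key bits consumed

# A toy single-state instance: one key bit per binary plaintext symbol,
# with the output being the XOR of the two.
y, used = run_encrypter(
    x=[1, 0, 1, 1], u=[0, 1, 1, 0],
    f=lambda z, x, k: str(x ^ k[0]),
    g=lambda z, x: 0,
    delta=lambda z, x: 1,
    z_init=0)
print(y, used)   # ['1', '1', '0', '1'] 4, i.e., key rate sigma_E = 4/4 = 1
```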
Remark 1. 
Note that the evolution of the state variable $z_i$ depends solely on the source inputs $\{x_i\}$ and is independent of the key bits. This design choice reflects the intended role of $z_i$, which is to retain memory of the source sequence $x^n$, allowing the encrypter to exploit empirical correlations and repetitive patterns within the plaintext. In contrast, maintaining memory of past key bits, which are assumed to be independent and identically distributed (i.i.d.), offers no practical benefit and is therefore omitted. Moreover, the model can be naturally extended to include two state variables: one that evolves based only on the source sequence $\{x_i\}$ (as in the current setup) and another that evolves based on both $\{x_i\}$ and the consumed key bits $\{k_i\}$. In such a framework, the first state variable would continue to govern the update of the index $t_i$, while the second could influence the output function, allowing for more expressive or adaptive encryption mechanisms.
An encrypter with $s$ states, or an $s$-state encrypter, $E$, is one with $|\mathcal{Z}| = s$. It is assumed that the plaintext sequence $\boldsymbol{x}$ is deterministic (i.e., an individual sequence), whereas the key sequence $\boldsymbol{u}$ is purely random, i.e., for every positive integer $n$, $P_{U^n}(u^n) = 2^{-n}$.
A few additional notation conventions will be convenient: by $f(z_i, x_i^j, k_i^j)$ ($i \le j$), we refer to the vector $y_i^j$ produced by $E$ in response to the inputs $x_i^j$ and $k_i^j$ when the initial state is $z_i$. Similarly, $g(z_i, x_i^j)$ will designate the resulting state $z_{j+1}$, and $\Delta(z_i, x_i^j)$ will designate $\sum_{\ell=i}^{j} \Delta(z_\ell, x_\ell)$ under the same circumstances.
As explained in Section 2.2, we adopt the maximal leakage of information as our security metric, given by
$$L(x^n \to Y^n) = \log \sum_{y^n} \max_{x^n \in \mathcal{X}^n} P_{Y^n|X^n}(y^n|x^n).$$
An encryption system $E$ is said to be perfectly secure if $L(x^n \to Y^n) = 0$ for every positive integer $n$. If $L(x^n \to Y^n)/n \to 0$ as $n \to \infty$, we say that the encryption system is asymptotically secure.
An encrypter is referred to as information lossless (IL) if, for every $z_i \in \mathcal{Z}$, every sufficiently large $n$, and all pairs $(x_i^{i+n}, k_i^{i+n})$, the quadruple $(z_i, k_i^{i+n}, f(z_i, x_i^{i+n}, k_i^{i+n}), g(z_i, x_i^{i+n}))$ uniquely determines $x_i^{i+n}$. Given an encrypter $E$ and an input string $x^n$, the encryption key rate of $x^n$ with respect to $E$ is defined as
$$\sigma_E(x^n) = \frac{\ell(k^n)}{n} = \frac{1}{n}\sum_{i=1}^{n} \ell(k_i),$$
where $\ell(k_i) = \Delta(z_i, x_i)$ is the length of the binary string $k_i$ and $\ell(k^n) = \sum_{i=1}^{n} \ell(k_i)$ is the total length of $k^n$.
Remark 2. 
It is worth noting that the definition of information losslessness used here is more relaxed and, thus, more general than the one given in [6]. In [6], the requirement must hold for every positive integer $n$, whereas in the present context, it is only required to hold for all sufficiently large $n$. The absence of information losslessness in the stricter sense of [6] does not contradict the ability of the legitimate decoder to reconstruct the source. Rather, it implies that reconstructing $x_i^{i+n}$ may require more than just the tuple $(z_i, y_i^{i+n}, k_i^{i+n}, z_{i+n+1})$; for example, some additional data from times later than $i + n + 1$ may be needed.
The set of all perfectly secure IL encrypters $\{E\}$ with no more than $s$ states will be denoted by $\mathcal{E}(s)$. The minimum of $\sigma_E(x^n)$ over all encrypters in $\mathcal{E}(s)$ will be denoted by $\sigma_s(x^n)$, i.e.,
$$\sigma_s(x^n) = \min_{E \in \mathcal{E}(s)} \sigma_E(x^n).$$
Finally, let
$$\sigma_s(\boldsymbol{x}) = \limsup_{n \to \infty} \sigma_s(x^n),$$
and define the finite-state encryptability of $\boldsymbol{x}$ as
$$\sigma(\boldsymbol{x}) = \lim_{s \to \infty} \sigma_s(\boldsymbol{x}).$$
Our purpose is to characterize these quantities and to point out how they can be achieved in principle.

3. Main Results

Our converse theorem, whose proof appears in Section 4, is the following:
Theorem 1. 
For every information lossless encrypter $E$ with no more than $s$ states,
$$L(x^n \to Y^n) \ge n\left[\max_{x^n \in \mathcal{X}^n}\{\rho_{\mathrm{LZ}}(x^n) - \sigma_E(x^n)\} - \delta_s(n) - \frac{(\alpha s - 1)\log(n+1)}{n} - \frac{\log s}{n}\right]_+,$$
where $\delta_s(n) = O\left(\frac{\log(\log n)}{\log n}\right)$ for every fixed $s$. Equivalently, if $0 \le L(x^n \to Y^n) \le n\lambda$ for some given constant $\lambda \ge 0$, then for every $x^n \in \mathcal{X}^n$ and every information lossless encrypter $E \in \mathcal{E}(s)$,
$$\sigma_E(x^n) \ge \rho_{\mathrm{LZ}}(x^n) - \lambda - \delta_s(n) - \frac{(\alpha s - 1)\log(n+1)}{n} - \frac{\log s}{n}.$$
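To get a feel for the residual terms in the bound, the following sketch (ours; the values of $n$, $\alpha$, and $s$ are arbitrary choices) evaluates the vanishing correction terms of Theorem 1, with the constants hidden in $\delta_s(n)$ omitted:

```python
from math import log, log2

def correction_terms(n, alpha, s):
    """The vanishing corrections of Theorem 1: (alpha*s - 1)*log2(n+1)/n,
    log2(s)/n, and the O(log(log n)/log n) decay rate of delta_s(n)."""
    return ((alpha * s - 1) * log2(n + 1) / n,
            log2(s) / n,
            log2(log(n)) / log2(n))

for n in (10**3, 10**6, 10**9):
    print(n, correction_terms(n, alpha=2, s=4))
# The first two terms die out quickly; the delta_s(n) rate decays only
# logarithmically, which is why it dominates the redundancy.
```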
As for achievability, consider first an arbitrary lossless compression scheme that compresses $x^n$ at a compression ratio of $\rho(x^n) = L(x^n)/n$, and then applies one-time pad encryption to $[L(x^n) - n\lambda]_+$ of the compressed bits. Let $y^n$ denote the resulting (partially) encrypted compressed representation of $x^n$. Since the $[L(x^n) - n\lambda]_+$ key bits are purely random, the probability of any $y^n$ that can be obtained from some $x^n$ is exactly $2^{-[L(x^n) - n\lambda]_+}$, and zero if $y^n$ cannot be obtained from any $x^n$. In other words, $\max_{x^n} P_{Y^n|X^n}(y^n|x^n) \le 2^{-[L(x^n) - n\lambda]_+}$. Obviously, the length of $y^n$, denoted $L(y^n)$, is equal to $L(x^n)$. Therefore, denoting $L_{\max} = \max_{x^n \in \mathcal{X}^n} L(x^n)$ and using the fact that there are no more than $2^\ell$ binary strings of length $\ell$, we have the following:
$$\exp_2\{L(x^n \to Y^n)\} = \sum_{y^n \in \mathcal{Y}^n} \max_{x^n} P_{Y^n|X^n}(y^n|x^n) = \sum_{\ell=1}^{L_{\max}} \sum_{\{y^n:\, L(y^n) = \ell\}} \max_{x^n} P_{Y^n|X^n}(y^n|x^n) \le \sum_{\ell=1}^{L_{\max}} \sum_{\{y^n:\, L(y^n) = \ell\}} 2^{-[\ell - n\lambda]_+} \le \sum_{\ell=1}^{L_{\max}} 2^{\ell} \cdot 2^{-[\ell - n\lambda]_+} \le \sum_{\ell=1}^{L_{\max}} 2^{\ell} \cdot 2^{-(\ell - n\lambda)} = L_{\max} \cdot 2^{n\lambda},$$
and, therefore,
$$L(x^n \to Y^n) \le n\lambda + \log L_{\max}.$$
If $L_{\max} = O(n)$, then the dominant term is clearly $n\lambda$.
Remark 3. 
The condition $L_{\max} = O(n)$ is always easy to satisfy via a minor modification of any given compression scheme (if it does not satisfy the condition in the first place). First, test whether $L(x^n) < n\log\alpha$ or $L(x^n) \ge n\log\alpha$. If $L(x^n) < n\log\alpha$, add a header bit ‘0’ before the compressed representation of $x^n$; otherwise, add a header bit ‘1’ followed by the uncompressed binary representation of $x^n$, using $n\log\alpha$ bits. The resulting code-length would then be $L'(x^n) = \min\{L(x^n), n\log\alpha\} + 1$ bits.
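The achievability scheme is straightforward to prototype. In the sketch below, zlib merely stands in for the LZ78 encoder, bit counts are rounded to whole bytes, and the function name and parameters are ours, so this is only a rough illustration of "compress, then one-time pad all but $n\lambda$ of the compressed bits":

```python
import secrets
import zlib
from math import ceil

def encrypt_with_leakage_budget(x: bytes, n: int, lam: float):
    """Compress x^n losslessly, then one-time pad the first
    [L(x^n) - n*lam]_+ compressed bits; the rest stay in the clear."""
    comp = zlib.compress(x)
    L = 8 * len(comp)                         # code length in bits
    keyed_bits = max(L - int(n * lam), 0)     # [L(x^n) - n*lam]_+
    keyed_bytes = ceil(keyed_bits / 8)
    key = secrets.token_bytes(keyed_bytes)    # purely random key bits
    cipher = bytes(c ^ k for c, k in zip(comp, key)) + comp[keyed_bytes:]
    return cipher, key, L

x = b"abbabaabbaaabaa" * 100
cipher, key, L = encrypt_with_leakage_budget(x, n=len(x), lam=0.05)
print(L, 8 * len(key))   # code length vs. key bits actually consumed
```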
If the compression scheme is chosen to be the LZ78 algorithm, then
$$\sigma_E(x^n) \le \rho_{\mathrm{LZ}}(x^n) - \lambda + O\left(\frac{\log\log n}{\log n}\right),$$
which essentially meets the converse bound of Theorem 1. We have therefore proved the following direct theorem:
Theorem 2. 
Given $\lambda \ge 0$, there exists a universal encrypter that satisfies
$$L(x^n \to Y^n) \le n\lambda + \log n + O(1),$$
and, for every $x^n \in \mathcal{X}^n$,
$$\sigma_E(x^n) \le \rho_{\mathrm{LZ}}(x^n) - \lambda + O\left(\frac{\log\log n}{\log n}\right).$$
Discussion. A few comments are now in order.
  • We established both a lower bound and an asymptotically matching upper bound on the information leakage, leading once again to the conclusion that asymptotically optimal performance can be achieved by applying Lempel–Ziv (LZ) compression followed by one-time pad encryption of the compressed bitstream. As discussed in the introduction, together with earlier works, such as [24,25,26], this reinforces the message that one-time pad encryption applied after LZ compression yields an asymptotically optimal cipher system in several important respects. That said, we believe the deeper and more significant contribution of this work lies in the converse theorem (Theorem 1), which shows that the key rate required to securely encrypt an individual sequence cannot be substantially smaller than its LZ complexity minus the permitted normalized maximal information leakage, no matter what encryption strategy is employed.
  • As in [6], there is formally a certain gap between the converse theorem and the achievability scheme in its basic form, when examined from the viewpoint of the number of states, $s$, relative to $n$. While $s$ should be small relative to $n$ for the lower bound to be essentially $\rho_{\mathrm{LZ}}(x^n)$ (see Section 2.2 above), the number of states actually needed to implement LZ78 compression for a sequence of length $n$ is basically exponential in $n$. In [6], the gap is closed in the limit $s \to \infty$ (after taking the limit $n \to \infty$) by subdividing the sequence into blocks and restarting the LZ algorithm at the beginning of every block. A similar comment applies here too, in the double limit of achieving $\sigma(\boldsymbol{x})$.
  • As discussed in [26] in a somewhat different context, as an alternative to the use of the LZ78 algorithm, it can be shown that asymptotically optimal performance can also be attained by a universal compression scheme for the class of $k$-th order Markov sources, where $k$ is chosen to be sufficiently large. In this case, $\rho_{\mathrm{LZ}}(x^n)$ in Theorems 1 and 2 should be replaced by the empirical entropy of order $k$, and some redundancy terms should be modified. However, one of these redundancy terms is $\frac{\log s}{k+1}$, which means that in order to compete with the best encrypter with $s$ states, $k$ must be chosen to be significantly larger than $\log s$, so as to make this term reasonably small.
  • It is speculated that it may not be difficult to extend our findings in several directions, including lossy reconstruction, the presence of side information at either party, the combination of both, and successive refinement systems in accordance with [32]. Other potentially interesting extensions concern broadening the scope of the FSM model to larger classes of machines, including FSMs with counters, shift-register machines with counters, and periodically time-varying FSMs with counters, as was carried out in Section III of [26]. Some of these directions will be explored in future studies.

4. Proof of Theorem 1

First, observe that
$$\sigma_E(x^n) = \frac{1}{n}\sum_{i=1}^{n} \Delta(z_i, x_i) = \sum_{x,z} \hat{P}(x, z)\,\Delta(z, x),$$
where $\hat{P} = \{\hat{P}(x, z),\, x \in \mathcal{X},\, z \in \mathcal{Z}\}$ is the joint empirical distribution of $(x, z)$ derived from $(x^n, z^n)$. It is therefore seen that $\sigma_E(x^n)$ depends on $x^n$ only via $\hat{P}$. Accordingly, in the sequel, we will also use the alternative notation $\sigma_E(\hat{P})$ when we wish to emphasize the dependence on $\hat{P}$. Let $T(x^n)$ denote the set of sequences $\tilde{x}^n \in \mathcal{X}^n$ which, together with their associated state sequences, share the same empirical PMF $\hat{P}$ as that of $x^n$ along with its state sequence. As with $\sigma_E(\cdot)$, we also denote it by $T(\hat{P})$. In the sequel, we will make use of the inequality
$$\log|T(x^n)| \ge n[\rho_{\mathrm{LZ}}(x^n) - \delta_s(n)], \tag{23}$$
where $\delta_s(n) \to 0$ as $n \to \infty$ for fixed $s$, at the rate of $\frac{\log(\log n)}{\log n}$. The proof of Equation (23), which appears in various forms and variations in earlier papers (see, e.g., [34]), is provided in Appendix A for the sake of completeness (see also the related Ziv’s inequality in Lemma 13.5.5 of [35]).
For later use, we also define the following sets:
$$\mathcal{P}(y^n) = \{\hat{P}:\, T(\hat{P}) \cap f^{-1}(y^n) \ne \emptyset\},$$
$$\mathcal{Y}(\hat{P}) = \{y^n:\, T(\hat{P}) \cap f^{-1}(y^n) \ne \emptyset\},$$
where
$$f^{-1}(y^n) = \{x^n:\, f(z, x^n, k^n) = y^n \text{ for some } k^n \in \{0,1\}^{n\sigma_E(x^n)}\}.$$
In other words, $\mathcal{P}(y^n)$ is the set of all type classes of plaintext sequences of which some members can be mapped to $y^n$ by some key bit strings, whereas $\mathcal{Y}(\hat{P})$ is the set of ciphertext sequences that can be obtained from some member of the type class $T(\hat{P})$ and some key bit string. Now, observe that
$$|\mathcal{Y}(\hat{P})| \ge \left|\{y^n:\, y^n = f(z, x^n, 0^{n\sigma_E(x^n)}) \text{ for some } x^n \in T(\hat{P})\}\right| \ge \max_{z' \in \mathcal{Z}} \left|\{y^n:\, y^n = f(z, x^n, 0^{n\sigma_E(x^n)}) \text{ and } g(z, x^n) = z' \text{ for some } x^n \in T(\hat{P})\}\right| = \max_{z' \in \mathcal{Z}} |\mathcal{Y}_{z'}(\hat{P})| \ge \frac{|T(\hat{P})|}{s}, \tag{27}$$
where the last inequality follows from the following consideration: let $x^n$ exhaust all members of $T(\hat{P})$, and for each such $x^n$, let $y^n = f(z, x^n, 0^{n\sigma_E(x^n)})$. Now, for every $z' \in \mathcal{Z}$, let $T_{z'}(\hat{P})$ denote the subset of $T(\hat{P})$ for which $z_{n+1} = g(z, x^n) = z'$, and recall that we have already defined $\mathcal{Y}_{z'}(\hat{P})$ to denote the set of corresponding output sequences $\{y^n\}$. Obviously, since $\{T_{z'}(\hat{P})\}_{z' \in \mathcal{Z}}$ form a partition of $T(\hat{P})$, there must exist some $z' = z^*$ with $|T_{z^*}(\hat{P})| \ge |T(\hat{P})|/s$. Therefore,
$$\max_{z'} |\mathcal{Y}_{z'}(\hat{P})| \ge |\mathcal{Y}_{z^*}(\hat{P})| = |T_{z^*}(\hat{P})| \ge \frac{|T(\hat{P})|}{s},$$
where the equality holds since the mapping between $x^n$ and $y^n$ is one-to-one given that $k^n = 0^{n\sigma_E(x^n)}$, $z_1 = z$, and $z_{n+1} = z^*$, by the postulated information losslessness, provided that $n$ is sufficiently large as required. Now, let $\mathcal{Y}^n_+$ denote the set of all $y^n \in \mathcal{Y}^n$ for which $P_{Y^n|X^n}(y^n|x^n) > 0$ for some $x^n \in \mathcal{X}^n$. Then,
$$\begin{aligned}
\exp_2\{L(x^n \to Y^n)\} &= \sum_{y^n \in \mathcal{Y}^n_+} \max_{x^n} P_{Y^n|X^n}(y^n|x^n) \\
&= \sum_{y^n \in \mathcal{Y}^n_+} \max_{x^n \in f^{-1}(y^n)} P_{Y^n|X^n}(y^n|x^n) \\
&= \sum_{y^n \in \mathcal{Y}^n_+} \max_{\hat{P} \in \mathcal{P}(y^n)}\, \max_{x^n \in f^{-1}(y^n) \cap T(\hat{P})} P_{Y^n|X^n}(y^n|x^n) \\
&\stackrel{\mathrm{(a)}}{\ge} \sum_{y^n \in \mathcal{Y}^n_+} \max_{\hat{P} \in \mathcal{P}(y^n)} 2^{-n\sigma_E(\hat{P})} \\
&\ge \frac{1}{M_n} \sum_{y^n \in \mathcal{Y}^n_+} \sum_{\hat{P} \in \mathcal{P}(y^n)} 2^{-n\sigma_E(\hat{P})} \\
&= \frac{1}{M_n} \sum_{\hat{P}} \sum_{y^n \in \mathcal{Y}(\hat{P})} 2^{-n\sigma_E(\hat{P})} \\
&= \frac{1}{M_n} \sum_{\hat{P}} |\mathcal{Y}(\hat{P})| \cdot 2^{-n\sigma_E(\hat{P})} \\
&\stackrel{\mathrm{(b)}}{\ge} \frac{1}{M_n s} \sum_{\hat{P}} |T(\hat{P})| \cdot 2^{-n\sigma_E(\hat{P})} \\
&\ge \frac{1}{M_n s} \cdot \max_{x^n \in \mathcal{X}^n} |T(x^n)| \cdot 2^{-n\sigma_E(x^n)} \\
&\stackrel{\mathrm{(c)}}{\ge} \frac{1}{M_n s} \cdot \max_{x^n \in \mathcal{X}^n} 2^{n[\rho_{\mathrm{LZ}}(x^n) - \delta_s(n)]} \cdot 2^{-n\sigma_E(x^n)} \\
&= \exp_2\left\{n \cdot \max_{x^n \in \mathcal{X}^n}\left[\rho_{\mathrm{LZ}}(x^n) - \sigma_E(x^n) - \delta_s(n) - \frac{\log M_n}{n} - \frac{\log s}{n}\right]\right\} \\
&\ge \exp_2\left\{n \cdot \max_{x^n \in \mathcal{X}^n}\left[\rho_{\mathrm{LZ}}(x^n) - \sigma_E(x^n) - \delta_s(n) - \frac{(\alpha s - 1)\log(n+1)}{n} - \frac{\log s}{n}\right]\right\},
\end{aligned}$$
where, in (a), we used the fact that $P_{Y^n|X^n}(y^n|x^n) > 0$ implies $P_{Y^n|X^n}(y^n|x^n) \ge 2^{-n\sigma_E(x^n)}$ (because $P_{Y^n|X^n}(y^n|x^n) > 0$ implies that there is at least one $k^n \in \{0,1\}^{n\sigma_E(x^n)}$ such that $f(z, x^n, k^n) = y^n$, and the probability of each such $k^n$ is $2^{-n\sigma_E(x^n)}$), and where $M_n$ is the number of different type classes $\{\hat{P}\}$, which is upper-bounded by $(n+1)^{\alpha s - 1}$. In (b), we used Equation (27); in (c), we used Equation (23). Finally, the operator $[\cdot]_+$ that appears in the assertion of Theorem 1 is due to the additional trivial lower bound $L(x^n \to Y^n) \ge 0$. This completes the proof of Theorem 1.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A. Proof of Equation (23)

Consider the LZ78 incremental parsing procedure applied to $x^n$, and let $c_{\ell z z'}$, $\ell \in \mathbb{N}$, $z, z' \in \mathcal{Z}$, denote the number of phrases of length $\ell$ that start at state $z$ and end at state $z'$. Clearly, $\sum_{\ell, z, z'} c_{\ell z z'} = c(x^n)$, for which we will use the shorthand notation $c$ in this appendix.
Given that $x^n \in T(\hat{P})$, one can generate other members of $T(\hat{P})$ by permuting phrases of the same length that start at the same state and end at the same state. Thus, $|T(\hat{P})| \ge \prod_{\ell, z, z'} (c_{\ell z z'}!)$, and so,
$$\begin{aligned}
\log|T(\hat{P})| &\ge \sum_{\ell, z, z'} \log(c_{\ell z z'}!) \\
&\ge \sum_{\ell, z, z'} c_{\ell z z'} \log\frac{c_{\ell z z'}}{e} \\
&= \sum_{\ell, z, z'} c_{\ell z z'} \log c_{\ell z z'} - c\log e \\
&= c\left[\sum_{\ell, z, z'} \frac{c_{\ell z z'}}{c} \log\frac{c_{\ell z z'}}{c} + \log c\right] - c\log e \\
&= c\log c - cH(L, Z, Z') - c\log e,
\end{aligned}$$
where $H(L, Z, Z')$ is the joint entropy of the auxiliary random variables $L$, $Z$, and $Z'$, jointly distributed according to the distribution $\pi(\ell, z, z') = c_{\ell z z'}/c$, $\ell \in \mathbb{N}$, $z, z' \in \mathcal{Z}$. To further bound $\log|T(\hat{P})|$ from below, we now derive an upper bound on $H(L, Z, Z')$:
$$\begin{aligned}
H(L, Z, Z') &\le H(L) + H(Z) + H(Z') \\
&\le H(L) + 2\log s \\
&\le (1 + \mathbb{E}L)\log(1 + \mathbb{E}L) - (\mathbb{E}L)\log(\mathbb{E}L) + 2\log s \\
&= \left(1 + \frac{n}{c}\right)\log\left(1 + \frac{n}{c}\right) - \frac{n}{c}\log\frac{n}{c} + 2\log s \\
&= \frac{n}{c}\log\left(1 + \frac{c}{n}\right) + \log\left(\frac{n}{c} + 1\right) + 2\log s \\
&\le \log\left(\frac{n}{c} + 1\right) + \log(s^2 e),
\end{aligned}$$
where the third inequality is due to Lemma 13.5.4 of [35], the subsequent equality is due to the relation $\mathbb{E}L = \sum_{\ell, z, z'} \ell\, c_{\ell z z'}/c = n/c$, and the last inequality is due to an application of the inequality $\log(1 + u) \le u\log e$, which holds for all $u > -1$. It follows that
$$\begin{aligned}
\frac{\log|T(\hat{P})|}{n} &\ge \rho_{\mathrm{LZ}}(x^n) - \frac{c}{n}\log\left(\frac{n}{c} + 1\right) - \frac{c}{n}\log(s^2 e^2) \\
&= \rho_{\mathrm{LZ}}(x^n) - \frac{c}{n}\log\frac{n}{c} - \frac{c}{n}\log\left(1 + \frac{c}{n}\right) - \frac{c}{n}\log(s^2 e^2) \\
&\ge \rho_{\mathrm{LZ}}(x^n) - \frac{c}{n}\log\frac{n}{c} - \left(\frac{c}{n}\right)^2\log e - \frac{c}{n}\log(s^2 e^2) \\
&= \rho_{\mathrm{LZ}}(x^n) - \delta_s(n),
\end{aligned}$$
where
$$\delta_s(n) = \frac{c}{n}\log\frac{n}{c} + \left(\frac{c}{n}\right)^2\log e + \frac{c}{n}\log(s^2 e^2).$$
Since $c \le \frac{n\log\alpha}{\log n}(1 + o(1))$ (see Equation (6) of [6], as well as Lemma 13.5.3 of [35] and the references therein), the second and third terms of $\delta_s(n)$ are bounded by $O(1/\log^2 n)$ and $O(1/\log n)$, respectively. The first term of $\delta_s(n)$ is upper-bounded by $O\left(\frac{\log(\log n)}{\log n}\right)$ (see Equation (13.124) of [35]).

References

  1. Kieffer, J.C.; Yang, E.-H. Sequential Codes, Lossless Compression of Individual Sequences, and Kolmogorov Complexity; Technical Report 1993–3; Information Theory Research Group, University of Minnesota: Minneapolis, MN, USA, 1993.
  2. Yang, E.-H.; Kieffer, J.C. Simple universal lossy data compression schemes derived from the Lempel–Ziv algorithm. IEEE Trans. Inform. Theory 1996, 42, 239–245.
  3. Ziv, J. Coding theorems for individual sequences. IEEE Trans. Inform. Theory 1978, 24, 405–412.
  4. Ziv, J. Distortion–rate theory for individual sequences. IEEE Trans. Inform. Theory 1980, 26, 137–143.
  5. Ziv, J. Fixed-rate encoding of individual sequences with side information. IEEE Trans. Inform. Theory 1984, 30, 348–452.
  6. Ziv, J.; Lempel, A. Compression of individual sequences via variable-rate coding. IEEE Trans. Inform. Theory 1978, 24, 530–536.
  7. Seroussi, G. On universal types. IEEE Trans. Inform. Theory 2006, 52, 171–189.
  8. Ziv, J. Compression, tests for randomness, and estimating the statistical model of an individual sequence. In Sequences: Combinatorics, Compression, Security, and Transmission; Capocelli, R.M., Ed.; Springer-Verlag: New York, NY, USA, 1990; pp. 366–373.
  9. Ziv, J.; Merhav, N. A measure of relative entropy between individual sequences with application to universal classification. IEEE Trans. Inform. Theory 1993, 39, 1270–1279.
  10. Feder, M.; Merhav, N.; Gutman, M. Universal prediction of individual sequences. IEEE Trans. Inform. Theory 1992, 38, 1258–1270.
  11. Haussler, D.; Kivinen, J.; Warmuth, M.K. Sequential prediction of individual sequences under general loss functions. IEEE Trans. Inform. Theory 1998, 44, 1906–1925.
  12. Weissman, T.; Merhav, N.; Somekh-Baruch, A. Twofold universal prediction schemes for achieving the finite-state predictability of a noisy individual binary sequence. IEEE Trans. Inform. Theory 2001, 47, 1849–1866.
  13. Weissman, T.; Ordentlich, E.; Seroussi, G.; Verdú, S.; Weinberger, M.J. Universal denoising: Known channel. IEEE Trans. Inform. Theory 2005, 51, 5–28.
  14. Lomnitz, Y.; Feder, M. Universal communication over individual channels. IEEE Trans. Inform. Theory 2011, 57, 7333–7358.
  15. Lomnitz, Y.; Feder, M. Universal communication—Part I: Modulo additive channels. IEEE Trans. Inform. Theory 2013, 59, 5488–5510.
  16. Shayevitz, O.; Feder, M. Communicating using feedback over a binary channel with arbitrary noise sequence. In Proceedings of the International Symposium on Information Theory, ISIT 2005, Adelaide, Australia, 4–9 September 2005; pp. 1516–1520.
  17. Merhav, N. On Jacob Ziv’s individual-sequence approach to information theory. IEEE BITS Inf. Theory Mag. 2025, submitted.
  18. Shannon, C.E. Communication theory of secrecy systems. Bell Syst. Tech. J. 1949, 28, 656–715.
  19. Hellman, M.E. An extension of the Shannon theory approach to cryptography. IEEE Trans. Inform. Theory 1977, 23, 289–294.
  20. Lempel, A. Cryptology in transition. Comput. Surv. 1979, 11, 285–303.
  21. Liang, Y.; Poor, H.V.; Shamai (Shitz), S. Information theoretic security. Found. Trends Commun. Inf. Theory 2009, 5, 355–580.
  22. Massey, J.L. An introduction to contemporary cryptology. Proc. IEEE 1988, 76, 533–549.
  23. Yamamoto, H. Information theory in cryptology. IEICE Trans. 1991, E74, 2456–2464.
  24. Ziv, J. Perfect secrecy for individual sequences. 1978, unpublished work.
  25. Merhav, N. Perfectly secure encryption of individual sequences. IEEE Trans. Inform. Theory 2013, 59, 1302–1310.
  26. Merhav, N. Refinements and extensions of Ziv’s model of perfect secrecy for individual sequences. Entropy 2024, 26, 503.
  27. Issa, I.; Wagner, A.B.; Kamath, S. An operational approach to information leakage. IEEE Trans. Inform. Theory 2020, 66, 1625–1657.
  28. Bloch, M.; Günlü, O.; Yener, A.; Oggier, F.; Poor, H.V.; Sankar, L.; Schaefer, R.F. An overview of information-theoretic security and privacy: Metrics, limits and applications. IEEE J. Sel. Areas Inform. Theory 2021, 2, 5–22.
  29. Esposito, A.R.; Gastpar, M.; Issa, I. Generalization error bounds via Rényi-, f-divergences and maximal leakage. IEEE Trans. Inform. Theory 2021, 67, 4986–5004.
  30. Kurri, G.R.; Sankar, L.; Kosut, O. An operational approach to information leakage via generalized gain functions. IEEE Trans. Inform. Theory 2024, 70, 1349–1375.
  31. Saeidian, S.; Cervia, G.; Oechtering, T.J.; Skoglund, M. Pointwise maximal leakage. IEEE Trans. Inform. Theory 2023, 69, 8054–8080.
  32. Wu, Z.; Bai, L.; Zhou, L. Successive refinement of Shannon cipher system under maximal leakage. IEEE Trans. Inform. Theory 2025, 71, 1487–1503.
  33. Merhav, N. A large-deviations notion of perfect secrecy. IEEE Trans. Inform. Theory 2003, 49, 506–508.
  34. Plotnik, E.; Weinberger, M.J.; Ziv, J. Upper bounds on the probability of sequences emitted by finite-state sources and on the redundancy of the Lempel–Ziv algorithm. IEEE Trans. Inform. Theory 1992, 38, 66–72.
  35. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006.
