Trade-offs between Error Exponents and Excess-Rate Exponents of Typical Slepian–Wolf Codes

Typical random codes (TRCs) in a communication scenario of source coding with side information at the decoder are the main subject of this work. We study the semi-deterministic code ensemble, which is a certain variant of the ordinary random binning code ensemble. In this code ensemble, the relatively small type classes of the source are deterministically partitioned into the available bins in a one-to-one manner. As a consequence, the error probability decreases dramatically. The random binning error exponent and the error exponent of the TRCs are derived and proved to be equal to one another in a few important special cases. We show that the performance under optimal decoding can also be attained by certain universal decoders, e.g., the stochastic likelihood decoder with an empirical entropy metric. Moreover, we discuss the trade-offs between the error exponent and the excess-rate exponent for the typical random semi-deterministic code and characterize its optimal rate function. We show that for any pair of correlated information sources, both the error probability and the excess-rate probability vanish exponentially as the blocklength tends to infinity.


Introduction
As is well known, the random coding error exponent is defined by E_r(R) = lim_{n→∞} −(1/n) log E[P_e(C_n)], where R is the coding rate, P_e(C_n) is the error probability of a codebook C_n, and the expectation is with respect to (w.r.t.) the randomness of C_n across the ensemble of codes. The error exponent of the typical random code (TRC) is defined as [12] E_trc(R) = lim_{n→∞} −(1/n) E[log P_e(C_n)]. We believe that the error exponent of the TRC is the more relevant performance metric, as it captures the most likely error exponent of a randomly selected code, as opposed to the random coding error exponent, which, at relatively low coding rates, is dominated by the relatively poor codes of the ensemble rather than by the channel noise. In addition, since in random coding analysis the code is selected at random and then remains fixed, it seems reasonable to study the performance of the very code that was chosen, instead of directly considering the ensemble performance.
To the best of our knowledge, not much is known about TRCs. In [3], Barg and Forney considered TRCs with independently and identically distributed codewords, as well as typical linear codes, for the special case of the binary symmetric channel with maximum likelihood (ML) decoding. It was shown there that, in a certain range of low rates, E_trc(R) lies between E_r(R) and the expurgated exponent, E_ex(R). In [16], Nazari et al. provided bounds on the error exponents of TRCs for both discrete memoryless channels (DMCs) and multiple-access channels. In a recent article by Merhav [12], an exact single-letter expression was derived for the error exponent of typical, random, fixed composition codes over DMCs, under a wide class of (stochastic) decoders, collectively referred to as the generalized likelihood decoder (GLD).
Lately, Merhav has studied error exponents of TRCs for the colored Gaussian channel [13], typical random trellis codes [14], and a Lagrange-dual lower bound to the TRC exponent [15].
Large deviations around the TRC exponent were studied in [19].
While originally defined for pure channel coding [3], [12], [16], the notion of TRCs has natural analogues in other settings as well, such as source coding with side information at the decoder [17]. Typical random Slepian–Wolf (SW) codes for both fixed-rate and variable-rate binning are the main theme of this work. The random coding error exponent of SW coding, based on fixed-rate (FR) random binning, was first addressed by Gallager in [8], and later improved by the expurgated bounds of [1] and [5]. Variable-rate (VR) SW coding received less attention in the literature; VR codes under an average rate constraint were studied in [4] and shown to outperform FR codes in terms of error exponents. Optimum trade-offs between the error exponent and the excess-rate exponent in VR coding were analyzed in [20].
We begin our study by providing a single-letter expression for the error exponent of the TRC for FR random binning. In fact, since SW coding and ordinary channel coding are two sides of the same coin, the techniques and proof ideas developed in [12] have been useful here too. While the optimal FR code at R = log|U| (denoting by |U| the cardinality of the source alphabet) is a one-to-one mapping between the source sequence set U^n and the e^{nR} bins, the TRC deviates from this optimal code significantly, and has a finite error exponent as long as R < 2 log|U|. This phenomenon has an intimate relation to the birthday paradox in probability theory. Nevertheless, this analysis of the FR code provides us with a much better understanding of the random binning mechanism, and it paves the way to finding other classes of codes that perform better than the FR code. This is one of the objectives of this work.
Moving forward, we discuss the trade-offs between the error exponent and the excess-rate exponent for a typical random VR code, similarly to [20], but with a different notion of the excess-rate event, which takes into account the available side information. We provide an expression for the optimal rate function that guarantees a required level for the error exponent of the typical random VR code. We show that upon relaxing the required excess-rate exponent, the resulting error exponent is strictly higher than in FR random binning. Furthermore, we find that for a class of information sources satisfying a certain condition, the typical random VR code attains both exponentially vanishing error and excess-rate probabilities.
It turns out that both the FR and VR ensembles suffer from an intrinsic deficiency, caused by statistical fluctuations in the sizes of the bins that are populated by the relatively small type classes of the source. This fundamental problem of the ordinary ensembles is alleviated in a newly proposed VR ensemble: the semi-deterministic (SD) code ensemble. In this code ensemble, these source type classes are deterministically partitioned into the available bins in a one-to-one manner. As a consequence, the error probability decreases dramatically. The main results concerning the SD code are the following: 1. The random binning error exponent and the error exponent of the TRC are derived and proved to be equal to one another in a few important special cases, which include the matched likelihood decoder, the MAP decoder, and the universal minimum entropy decoder. To the best of our knowledge, this phenomenon has not been observed elsewhere before, since the TRC exponent usually improves upon the random coding exponent. As a byproduct, we are able to provide a relatively simple expression for the TRC exponent, in contrast to the two analogous results for the FR and VR codes.
2. We show that the error exponent of the TRC under MAP decoding is also attained by two universal decoders: the minimum entropy decoder and the stochastic entropy decoder, which is a GLD with an empirical conditional entropy metric. As far as we know, this result is the first of its kind; in many other scenarios, the random coding bound is attained also by universal decoders, but here we find that the TRC exponent is also universally achievable. Moreover, while the likelihood decoder and the MAP decoder have similar error exponents [9], here we prove a similar result for two universal decoders (one stochastic and one deterministic) that share the same decoding metric.
3. We derive the trade-off functions between the error exponent and the excess-rate exponent for a typical random SD code and show that they may be strictly better than the trade-off functions of the ordinary VR code. In some cases, the excess-rate exponent of the SD code may reach a strictly positive plateau, while the excess-rate exponent of the ordinary VR code eventually reaches zero. Furthermore, under a strict requirement on the excess-rate exponent, which is equivalent to an FR code, the error exponent of the SD code reaches infinity at R = log|U|, while the TRC exponent of ordinary VR coding reaches infinity only at R = 2 log|U|.
The remaining part of the paper is organized as follows. In Section 2, we establish notation conventions. In Section 3, we formalize the model, the three coding techniques, the main objectives of this work, and we review some background. In Sections 4, 5, and 6, we provide and discuss the main results concerning the fixed-rate, the variable-rate, and the semi-deterministic ensembles, respectively.

Notation Conventions
Throughout the paper, random variables will be denoted by capital letters, realizations will be denoted by the corresponding lower case letters, and their alphabets will be denoted by calligraphic letters. Random vectors and their realizations will be denoted, respectively, by boldface capital and lower case letters. Their alphabets will be superscripted by their dimensions. Sources and channels will be subscripted by the names of the relevant random variables/vectors and their conditionings, whenever applicable, following the standard notation conventions, e.g., Q_U, Q_{V|U}, and so on. When there is no room for ambiguity, these subscripts will be omitted. For a generic joint distribution Q_{UV} = {Q_{UV}(u, v), u ∈ U, v ∈ V}, which will often be abbreviated by Q, information measures will be denoted in the conventional manner, but with a subscript Q; that is, H_Q(U) is the entropy of U, I_Q(U; V) is the mutual information between U and V, and so on. Logarithms are taken to the natural base. The probability of an event E will be denoted by P{E}, and the expectation operator w.r.t. a probability distribution Q will be denoted by E_Q[·], where the subscript will often be omitted. For two positive sequences, {a_n} and {b_n}, the notation a_n ≐ b_n will stand for equality in the exponential scale, that is, lim_{n→∞} (1/n) log(a_n/b_n) = 0. Similarly, a_n ·≤ b_n means that lim sup_{n→∞} (1/n) log(a_n/b_n) ≤ 0, and so on. The indicator function of an event E will be denoted by 1{E}. The notation [x]_+ will stand for max{0, x}.
The empirical distribution of a sequence u ∈ U^n, which will be denoted by P̂_u, is the vector of relative frequencies, P̂_u(u), of each symbol u ∈ U in u. The type class of u ∈ U^n, denoted T(u), is the set of all vectors u′ with P̂_{u′} = P̂_u. When we wish to emphasize the dependence of the type class on the empirical distribution P̂, we will denote it by T(P̂). The set of all types of vectors of length n over U will be denoted by P_n(U), and the set of all possible types over U will be denoted by P(U) ≜ ∪_{n=1}^∞ P_n(U). Information measures associated with empirical distributions will be denoted with 'hats' and will be subscripted by the sequences from which they are induced. For example, the entropy associated with P̂_u, which is the empirical entropy of u, will be denoted by Ĥ_u(U). Similar conventions will apply to the joint empirical distribution, the joint type class, the conditional empirical distributions, and the conditional type classes associated with pairs (and multiples) of sequences of length n. Accordingly, P̂_{uv} would be the joint empirical distribution of (u, v) = {(u_i, v_i)}_{i=1}^n, T(P̂_{uv}) will denote the joint type class of (u, v), T(P̂_{u|v}|v) will stand for the conditional type class of u given v, Ĥ_{uv}(U|V) will be the empirical conditional entropy, and so on. Likewise, when we wish to emphasize the dependence of empirical information measures upon a given empirical distribution Q, we denote them using the subscript Q, as described above.
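To make these conventions concrete, here is a small illustrative Python sketch (not part of the formal development; the function names are ours) that computes P̂_u, the empirical entropy Ĥ_u(U), and the empirical conditional entropy Ĥ_uv(U|V), all in nats:

```python
from collections import Counter
from math import log

def empirical_dist(u):
    """Empirical distribution P-hat_u: relative frequency of each symbol in u."""
    n = len(u)
    return {a: c / n for a, c in Counter(u).items()}

def empirical_entropy(u):
    """Empirical entropy H-hat_u(U), in nats (natural logarithms)."""
    return -sum(p * log(p) for p in empirical_dist(u).values())

def empirical_cond_entropy(u, v):
    """Empirical conditional entropy H-hat_uv(U|V) = H-hat(U,V) - H-hat(V)."""
    return empirical_entropy(list(zip(u, v))) - empirical_entropy(v)

# Example: all four (u_i, v_i) pairs are distinct, so
# H-hat(U,V) = log 4 and H-hat(V) = log 2, giving H-hat(U|V) = log 2.
u = (0, 0, 1, 1)
v = (0, 1, 0, 1)
```

The same routines extend verbatim to any finite alphabet, since only relative frequencies are used.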

Problem Formulation
Let (U, V) = {(U_i, V_i)}_{i=1}^n be n independent copies of a pair of random variables, (U, V) ∼ P_{UV}, taking on values in finite alphabets, U and V, respectively. The vector U will designate the source vector to be encoded and the vector V will serve as correlated side information, available to the decoder. We now distinguish between three different classes of codes: 1. Fixed-rate (FR) binning: When a given realization u = (u_1, . . . , u_n) ∈ U^n of the finite alphabet source vector U is fed into the system, it is encoded into one out of M = e^{nR} bins, denoted by B(u), selected independently at random for every member of U^n. Furthermore, the type index of u is also transmitted to the decoder, but it requires only a negligible extra rate when n is large enough. The entire binning code for source sequences of block-length n, i.e., the set {B(u)}_{u∈U^n}, is denoted by B_n. Here, R > 0 is referred to as the binning rate. The decoder estimates u based on the bin index B(u), the type index T(u), and the side information sequence v, which is a realization of V.
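The FR binning mechanism just described can be sketched in a few lines of Python (an illustration only; the exhaustive enumeration of U^n is feasible only for toy alphabets and block-lengths):

```python
import itertools
import random

def fr_binning(n, alphabet, num_bins, rng):
    """Fixed-rate random binning: every u in U^n is assigned a bin index
    B(u), drawn uniformly and independently from {0, ..., num_bins - 1}.
    Here num_bins plays the role of M = e^{nR}."""
    return {u: rng.randrange(num_bins)
            for u in itertools.product(alphabet, repeat=n)}

# The encoder output for a source vector u is the pair
# (bin index B(u), type index of u); the decoder sees it together with v.
bins = fr_binning(3, (0, 1), 4, random.Random(0))
```

Note that the bin assignments are drawn once and then fixed; this is the codebook B_n whose performance the TRC analysis studies.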
The optimal (MAP) decoder estimates u, using the bin index B(u), the type index T(u), and the SI vector v = (v_1, . . . , v_n), according to û = argmax_{u′ ∈ B(u) ∩ T(u)} P(u′|v). As in [11], [12], we consider here the GLD. The GLD estimates u stochastically, using the bin index B(u), the type index T(u), and the SI sequence v, according to the following posterior: P(û = u′ | B(u), T(u), v) = exp{n f(P̂_{u′v})} / Σ_{ũ ∈ B(u) ∩ T(u)} exp{n f(P̂_{ũv})}, for u′ ∈ B(u) ∩ T(u), where P̂_{uv} is the empirical distribution of (u, v) and f(·) is a given continuous, real valued functional of this empirical distribution. The GLD provides a unified framework which covers several important special cases, e.g., matched decoding, mismatched decoding, MAP decoding, and universal decoding (similarly to the α-decoders described in [5]). A more detailed discussion is given in [11].
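As an illustration of the GLD, the following toy Python sketch samples the estimate among the candidates in the bin/type intersection with probability proportional to exp{n f(·)}. For concreteness we take the empirical conditional entropy metric f = −β Ĥ(U|V); this choice, and the helper names, are ours:

```python
import math
import random
from collections import Counter

def cond_emp_entropy(u, v):
    """Empirical conditional entropy H-hat_uv(U|V), in nats."""
    n = len(u)
    h = lambda cnt: -sum(k / n * math.log(k / n) for k in cnt.values())
    return h(Counter(zip(u, v))) - h(Counter(v))

def gld_decode(candidates, v, f, rng):
    """Stochastic GLD: output u' from the candidate list (the bin/type
    intersection) with probability proportional to exp{n * f(u', v)}."""
    n = len(v)
    weights = [math.exp(n * f(u, v)) for u in candidates]
    r = rng.random() * sum(weights)
    for u, w in zip(candidates, weights):
        r -= w
        if r <= 0:
            return u
    return candidates[-1]

# With f = -beta * H-hat(U|V) and large beta, the GLD behaves like the
# deterministic minimum conditional entropy decoder.
beta = 50.0
f = lambda u, v: -beta * cond_emp_entropy(u, v)
```

Replacing f by the empirical log-likelihood recovers (stochastic) matched decoding, and letting β → ∞ yields a deterministic universal decoder.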
The probability of error is the probability of the event {Û ≠ U}. For a given binning code B_n, the probability of error is given by P_{e,FR}(B_n) = P{Û ≠ U | B_n}. The random binning error exponent is defined by E_{r,FR}(R) = lim_{n→∞} −(1/n) log E[P_{e,FR}(B_n)]. We wish to derive a single-letter expression for the error exponent of the TRC, which is E_{trc,FR}(R) = lim_{n→∞} −(1/n) E[log P_{e,FR}(B_n)], and then to study some of its basic properties.
2. Variable-rate (VR) binning: We assume that the coding rate is no longer fixed for every u ∈ U^n, but depends on its empirical distribution. Let us denote the rate function by R(P̂_u). In this manner, for every type Q_U ∈ P_n(U), all the source sequences in the type class T(Q_U) are randomly partitioned into e^{nR(Q_U)} bins. Every source sequence is encoded by its bin index, prefixed by its type index.
The probability of error is defined similarly to (5) and will be denoted by P_{e,VR}(B_n). For a given rate function, the random binning error exponent and the error exponent of the typical random VR code are defined similarly to (6) and (7), respectively, but with P_{e,FR}(B_n) replaced by P_{e,VR}(B_n). These will be denoted by E_{r,VR}(R(·)) and E_{trc,VR}(R(·)). One possibility for defining the excess-rate probability, which has already been extensively studied in [20], is P{R(P̂_U) ≥ R}, where R is some target rate. Due to the existence of the available side information at the decoder, it makes sense to require a target rate which depends on the pair (u, v). Hence, we define the alternative excess-rate probability as P{R(P̂_U) ≥ Ĥ_{UV}(U|V) + ∆}, where ∆ > 0 is a redundancy threshold. Respectively, the excess-rate exponent is defined as E_{er}(R(·), ∆) = lim_{n→∞} −(1/n) log P{R(P̂_U) ≥ Ĥ_{UV}(U|V) + ∆}. Now, the main mission is to derive the trade-off between the error exponent and the excess-rate exponent for the typical random VR code, and furthermore, to characterize the optimal rate function that attains a prescribed value for the error exponent of the typical random VR code. 3. Semi-deterministic (SD) binning: This code ensemble is a refinement of the ordinary VR code, which is sensitive to the order between H_Q(U) and R(Q_U). For types with H_Q(U) ≥ R(Q_U), i.e., type classes which are exponentially larger than the number of available bins, we just randomly assign each source sequence to one out of the e^{nR(Q_U)} bins. Otherwise, for types with H_Q(U) < R(Q_U), we deterministically assign each member of T(Q_U) to a different bin, which provides a one-to-one mapping. This way, all type classes with H_Q(U) < R(Q_U) do not affect the probability of error, which is now denoted by P_{e,SD}(B_n) and is governed solely by the type classes with H_Q(U) ≥ R(Q_U). We derive the random binning exponent of this ensemble, which is denoted by E_{r,SD}(R(·)), and compare it to E_{trc,SD}(R(·)), the TRC exponent of the same ensemble.
They are defined similarly to (6) and (7), respectively, but with P_{e,FR}(B_n) replaced by P_{e,SD}(B_n). As for the VR code ensemble defined above, we analyze the trade-off between the error and excess-rate exponents and compare it to the same trade-off for the VR code.
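The SD binning rule can be sketched as follows (an illustrative brute-force enumeration over U^n; here the hypothetical rate_fn maps a representative sequence, i.e., its type, to R(Q_U)):

```python
import itertools
import math
import random
from collections import Counter

def emp_entropy(u):
    """Empirical entropy of u, in nats."""
    n = len(u)
    return -sum(k / n * math.log(k / n) for k in Counter(u).values())

def sd_binning(n, alphabet, rate_fn, rng):
    """Semi-deterministic binning: small type classes (H_Q(U) < R(Q_U))
    are enumerated into distinct bins, one sequence per bin; the remaining
    type classes are partitioned at random, as in ordinary VR binning."""
    type_classes = {}
    for u in itertools.product(alphabet, repeat=n):
        key = tuple(sorted(Counter(u).items()))
        type_classes.setdefault(key, []).append(u)
    bins = {}
    for members in type_classes.values():
        R = rate_fn(members[0])
        num_bins = max(1, round(math.exp(n * R)))
        if emp_entropy(members[0]) < R:
            # |T(Q_U)| <= e^{n H_Q(U)} < e^{n R}: one-to-one assignment.
            for i, u in enumerate(members):
                bins[u] = i
        else:
            for u in members:
                bins[u] = rng.randrange(num_bins)
    return bins
```

With the constant rate function R(·) = log 2 and n = 4, the binary type class with a single '1' has empirical entropy ≈ 0.562 < log 2 ≈ 0.693, so its four members land in four distinct bins deterministically.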

Background
In pure channel coding, Merhav [12] derived a single-letter expression for the error exponent of the typical random fixed composition code. In order to present the main result of [12], we first define a few quantities, among them Γ(Q_{XX′}, R), defined as a minimization in which D(Q_{Y|X} ∥ W | Q_X) denotes the conditional divergence between Q_{Y|X} and W, averaged by Q_X.
Under the above defined quantities, the error exponent of the TRC is given by [12] E_trc(R, Q_X) = min{·}, where E_r(Q_U, P_{V|U}, S) and E_ex(Q_U, P_{V|U}, S) are, respectively, the random coding and expurgated exponents associated with the channel P_{V|U} and a fixed composition code of rate S, all of whose codewords belong to T(Q_U).

Fixed-Rate Binning
In order to characterize the error exponent of the TRC, we define the set Q = {Q_{UU′} : Q_U = Q_{U′}}, along with a few related quantities, and then the exponent function E^fr_trc(R), given by a minimization over Q. Then, the following theorem is proved in Appendix A.
Theorem 1. The error exponent of the TRC in the FR ensemble is given by

Discussion
Note that the expression of E^fr_trc(R) strongly resembles the error exponent of the TRC in channel coding (13). The constraint H_Q(U, U′) ≥ R in (21) is analogous to the constraint I_Q(X; X′) ≤ 2R in the minimization of (13). The origin of H_Q(U, U′) ≥ R is the following. Define the type class enumerator N(Q_{UU′}) as the number of pairs of distinct source sequences, assigned to the same bin, whose joint empirical distribution is Q_{UU′}; i.e., the enumerator N(Q_{UU′}) is given by a sum of dependent binary random variables. For a sum N of independent binary random variables, ordinary tools from large deviations theory (e.g., the Chernoff bound) can be invoked for assessing the exponential moments E[N^s], s ≥ 0, or the large deviations rate function of P{N ≥ e^{nσ}}, σ ∈ ℝ. For sums of dependent binary random variables, like N(Q_{UU′}) in the current problem, this can no longer be done by the same techniques, and it requires more advanced tools (see, e.g., [12]–[15]).
As expected, one can easily prove that E fr trc (R) is at least as large as E fr r (R) (14). In [2, Proposition 3], we present a somewhat stronger result, asserting that at relatively high binning rates, E fr trc (R) is strictly larger than E fr r (R). Furthermore, we provide an explicit expression for the rate above which the functions differ.
We may also obtain some intuitive meaning of the term α(R, Q_U, Q_V) by considering the special case of MAP decoding, which corresponds to f(Q) = β E_Q[log P(U, V)] with β → ∞. An analogous explanation was given in [12] for the α(R, Q_Y) term of channel coding, by considering the special case of ML decoding. Following some basic manipulations for very large β, (21) becomes a minimization over the set Q(R). The third constraint in Q(R) designates the event that an incorrect source sequence (represented by U′) is assigned a log-likelihood score larger than that of the correct source sequence (represented by U), as well as those of all other source sequences (represented by the term a(R, Q_U, Q_V)). The term a(R, Q_U, Q_V) designates the typical value (attained with extremely high probability) of the highest log-likelihood score among all the remaining incorrect source sequences. A more comprehensive discussion on this point can be found in [12, Sec. 4].
Although the optimal binning code at R = log|U| is a one-to-one mapping between the source sequence set U^n and the e^{nR} bins, typical random binning deviates from this optimal code with relatively high probability. We ask ourselves the following question: at different binning rates, what is the probability of drawing a one-to-one code, such that each bin contains no more than one source sequence? Obviously, if R < log|U|, then it follows from the pigeonhole principle that this probability equals zero. Hence, in what follows, assume that R ≥ log|U|.
Consider the FR random binning mechanism of Subsection 3.1 and let {1, 2, . . . , e^{nR}} be the set of bins. Then, we would like to derive the probability that the randomly drawn code is one-to-one. It turns out that a very sharp phase transition occurs at R = 2 log|U|. At rates below 2 log|U|, the probability of drawing a one-to-one code converges to zero double-exponentially fast, while at rates above 2 log|U|, this probability tends to one. This is in agreement with the simple fact that for any R > 2 log|U|, E^fr_trc(R) = ∞, which immediately follows from the constraint H_Q(U, U′) ≥ R in (21). This phenomenon is intimately related to the birthday paradox (or the birthday problem) in probability theory. The birthday paradox concerns the probability that, among n randomly chosen people, at least two will have the same birth date.
More generally, it states that if one draws n times from a set of N equiprobable possibilities, then the first repetition will typically occur when n is on the order of √N.
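This phase transition is easy to check numerically: throwing m = |U|^n sequences into M = e^{nR} bins, the probability of no collision is ∏_{i=0}^{m−1}(1 − i/M) ≈ exp{−m²/(2M)}, which collapses double-exponentially for R < 2 log|U| and tends to one for R > 2 log|U|. A small sketch (the parameters are illustrative):

```python
import math
import random

def prob_one_to_one(m, M):
    """Exact probability that m independent uniform throws into M bins
    produce no collision: prod_{i=0}^{m-1} (1 - i/M) ~ exp(-m^2 / (2M))."""
    p = 1.0
    for i in range(m):
        p *= 1.0 - i / M
    return p

def prob_one_to_one_mc(m, M, trials, rng):
    """Monte Carlo estimate of the same probability."""
    hits = 0
    for _ in range(trials):
        seen = set()
        for _ in range(m):
            b = rng.randrange(M)
            if b in seen:
                break
            seen.add(b)
        else:
            hits += 1
    return hits / trials

# |U| = 2, n = 8, i.e. m = 256 source sequences:
#   R = log|U|    -> M = 256:     essentially no chance of a one-to-one code;
#   R = 2 log|U|  -> M = 256**2:  probability ~ exp(-1/2), i.e. about 0.6.
```

At the critical rate R = 2 log|U| the exponent m²/(2M) is Θ(1), which is exactly the birthday-paradox threshold n ≈ √N.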
Recall that [H(U|V), log|U|] is the relevant range of binning rates in SW coding. We expect E^fr_trc(R) to improve upon E^fr_r(R) at relatively high rates, just below log|U|, in analogy to channel coding, where there is a strict inequality between E_trc(R, Q_X) in (13) and the random coding bound in some range of low coding rates. Unfortunately, this is not the case in FR coding, since even for rates near log|U|, the event that exponentially many bins contain a few source vectors of the same type class is not negligible. While there exists a strong analogy between channel coding and SW coding (see, e.g., eqs. (14)–(15)), it seems to break down for TRCs. For example, in the expurgated SW code, each bin contains exactly the same number of source vectors, while for the typical random FR code, the sizes of the bins fluctuate. In the extreme case of type classes with H_Q(U) < R, it follows from the discussion above that many bins may be populated by a few source vectors, while many others may still be empty. These type classes dominate the error event, and hence the deficiency found in the FR code may be circumvented by treating them differently. This will be done in two steps. First, we pass from FR codes to VR codes.
This improves upon FR codes, but VR codes still suffer from a similar deficiency. Second, we improve further by passing to SD codes.

Variable-Rate Binning
In order to characterize the error exponent of the TRCs of the VR ensemble defined in Subsection 3.1, we first define a few quantities and the corresponding exponent function. Then, we have the following.
Theorem 2. Let R(·) be any rate function. Then, for VR coding: Proof. The proof follows exactly the same lines as in the proof of Theorem 1 (Appendix A), except for one main modification: for a given source vector u ∈ T (Q U ), the code has a fixed composition with a rate of R(Q U ).
In order to characterize the excess-rate exponent, define the following exponent function: Then, we have the following.
Theorem 3. Fix ∆ > 0 and let R(·) be any rate function. Then, for VR coding: Proof. The excess-rate probability is given by: which proves the desired result.

Constrained Excess-Rate Exponent
Next, we would like to study the trade-off between the threshold ∆, the error exponent, and the excess-rate exponent. One possible way to explore this trade-off is to require the excess-rate exponent to exceed some value E r > 0, to solve E er (R(·), ∆) ≥ E r for an upper bound on the rate function R(Q U ), and then to substitute this upper bound back into the error exponent in (35) to give an expression for E e (E r , ∆). As for the first step of this procedure, we have the following result, which is proved in Appendix C.
This means that we have a dichotomy between two kinds of source types. Each type class associated with an empirical distribution that is relatively close to the source distribution, i.e., with D(Q_{UV} ∥ P_{UV}) ≤ E_r for some Q_{V|U}, is partitioned into e^{nJ(Q_U, E_r, ∆)} bins, while the remaining type classes, those relatively distant from P_U, are encoded by a one-to-one mapping. Two extreme cases should be considered here. First, when E_r is relatively small, only the types closest to P_U are encoded with a rate approximately H_P(U|V) + ∆, which can be made arbitrarily close to the SW limit [17], and each atypical source sequence is allocated n log_2|U| bits. This coding scheme is the one related to VR coding with an average rate constraint, like the one discussed in [4]. Second, when E_r is extremely large, each type class is encoded into exp{n∆} bins, which is equivalent to FR coding.
Upon substituting the optimal rate function of Theorem 4 back into (35) and using the fact that E_e(·) is monotonically increasing, we find the trade-off function E_e(E_r, ∆) for the typical random VR code. Let Q̃_{V|U} and Q*_{V|U} be the respective minimizers of problems similar to (42) and (43), with the only difference that the divergence constraint is omitted. For large enough E_r, E_e(E_r, ∆) reaches a plateau and is the lowest possible. This follows from the fact that the stringent requirement on the excess rate forces the encoder to encode each type class Q_U at its target rate ∆, so that all type classes affect the error event.
E_e(E_r, ∆) is a monotonically non-increasing function of E_r. The reason is that as E_r decreases, more and more type classes are encoded with n log_2|U| bits, and hence do not contribute to the error event. When E_r = 0, necessarily Q_U = P_U, only the typical set is encoded, and E_e(0, ∆) is the highest possible. In this case, J(Q_U) = H_P(U|V) + ∆, and the constraint set in (43) becomes empty when ∆ > 2H_P(U) − H_P(U|V), in which case E_e(0, ∆) = ∞.

Constrained Error Exponent
An alternative option for studying the trade-off between the threshold ∆, the error exponent, and the excess-rate exponent is to require the error exponent to exceed some value E_e > 0, to solve E_e(R(·)) ≥ E_e in order to extract a lower bound on the rate function R(Q_U), and then to substitute this lower bound back into the excess-rate exponent in (37) to provide an expression for E_er(E_e, ∆). The main drawback of this option stems from the relatively cumbersome expression of E_e(R(·)) in (35), which is given by a nested optimization problem, such that extraction of the optimal rate function yields an unmanageable expression (like the lower bound in Eq. (E.16) in the proof of Theorem 6), and we have to compromise in some places in order to obtain an analytically reasonable result. Thus, the lower bound on the rate function given in Theorem 6 below may not be the lowest possible for all E_e values.
As a matter of fact, any lower bound on the reliability function of VR coding can serve as a basis for deriving a lower bound on the rate function. Recall that the exact error exponent of VR random binning is given by [20, eq. (34)]. Relying on this exponential error bound, the following bound on R(Q_U) is obtained, which is proved in Appendix D.
The dependence of G(Q_U, E_e) on E_e is as follows. For any given Q_U, let Q̃_{V|U} be the corresponding minimizer. Then, as long as E_e < D(Q_U × Q̃_{V|U} ∥ P_{UV}), the constraint set in (45) is empty, and R(Q_U) can be as low as −∞, which practically means that in this range, the entire type class T(Q_U) can be totally ignored, while still achieving P_e ≈ e^{−nE_e}. Only for the unique type Q_U = P_U does G(P_U, E_e) > 0 hold for all E_e ≥ 0, and specifically, we find that G(P_U, 0) = H_P(U|V). For sufficiently large E_e, the maximization in (45) reaches its unconstrained optimum, and G(Q_U, E_e) increases without bound in an affine fashion. Of course, the fact that the proposed lower bound of Theorem 5 is unbounded suggests that it cannot be optimal at high E_e values, since by allocating exp{2n log|U|} bins to each source type class, one can attain an infinite error exponent, as we already saw before. Thus, in the sequel, we propose a lower bound on the rate function which is not optimal at relatively low E_e values (because of the compromise in our derivation), but improves upon the bound provided in Theorem 5 at relatively high E_e values. The following result is proved in Appendix E.
Theorem 6. Let E_e > 0 be fixed. Then, for the matched likelihood decoder, the requirement E_e(R(·)) ≥ E_e translates into the lower bound R(Q_U) ≥ F(Q_U, E_e), defined in (47). As long as E_e < B_{Q*}(U, U′), the clipping operator in (47) is inactive, and F(Q_U, E_e) increases in an affine manner; for larger E_e, the clipping operator in (47) becomes active, and F(Q_U, E_e) is given by (50), which is a concave, monotonically non-decreasing function of E_e. When E_e is large enough, the constraint set in (50) no longer depends on E_e and F(Q_U, E_e) reaches a plateau at a level of 2H_Q(U). In this range of relatively high error exponents, each type class is encoded at a rate equal to twice its empirical entropy. This result agrees with previous findings in this work, i.e., if the binning rate of each type class is double its exponential size, then the entire code will be one-to-one with very high probability.
Comparing analytically the lower bounds of Theorems 5 and 6 seems to be complicated.
Hence, we demonstrate some of the above discussed characteristics of F(Q_U, E_e) and G(Q_U, E_e) via a numerical example. Consider the case of a double binary source with alphabets U = V = {0, 1} and joint probabilities given by P_UV(0, 0) = 0.8, P_UV(0, 1) = 0.05, P_UV(1, 0) = 0, and P_UV(1, 1) = 0.15. Fig. 1 displays the two lower bounds on R(Q_U), F(Q_U, E_e) and G(Q_U, E_e), for this source and the specific type Q_U(0) = Q_U(1) = 1/2.
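As a side computation (not part of the paper's derivation), the SW limit H_P(U|V) for this example source, i.e., the lower end of the relevant binning-rate range, is easily evaluated:

```python
from math import log

# Joint probabilities of the example source
P = {(0, 0): 0.8, (0, 1): 0.05, (1, 0): 0.0, (1, 1): 0.15}

def cond_entropy(P):
    """H(U|V) = H(U,V) - H(V), in nats."""
    PV = {}
    for (u, v), p in P.items():
        PV[v] = PV.get(v, 0.0) + p
    H = lambda ps: -sum(p * log(p) for p in ps if p > 0)
    return H(P.values()) - H(PV.values())

# Given V = 0, U = 0 with certainty (since P(1,0) = 0), so only V = 1
# contributes uncertainty: H(U|V) = 0.2 * h(0.25) ~ 0.1125 nats.
```

So any rate function satisfying R(P_U) > 0.1125 nats on the typical type is, in principle, compatible with reliable decoding for this source.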
As can be seen, the gap between these two bounds is rather considerable at relatively high E_e values, which means that typical VR codes require significantly lower rates in order to achieve the same target error exponent. Furthermore, F(Q_U, E_e) reaches a plateau while G(Q_U, E_e) grows without bound. In the range where both bounds are affine, they appear to be equal, a fact we conjecture to be true in general, although we were not able to prove equality between (46) and (49). We also conjecture that the gap in the range of low E_e values is due to the compromise we made in the proof of Theorem 6. This example is quite representative, in the sense that other examples yielded qualitatively similar results.
In order to attain a target error exponent E_e, both F(Q_U, E_e) and G(Q_U, E_e) are legitimate lower bounds on R(Q_U), and hence so is the minimum between them. Let us denote this minimum by Ω(Q_U, E_e). Upon substituting Ω(Q_U, E_e) back into (37) and using the fact that E_er(·, ∆) is monotonically non-increasing, we find the resulting trade-off function. Since Ω(Q_U, E_e) is monotonically non-decreasing in E_e for every Q_U, E_er(E_e, ∆) is monotonically non-increasing in E_e, which is not very surprising. The dependence of E_er(E_e, ∆) on E_e and ∆ is as follows. At E_e = 0, recall that Ω(Q_U, 0) = −∞ for any Q_U ≠ P_U, while Ω(P_U, 0) = H_P(U|V). Thus, E_er(0, ∆) = 0 as long as ∆ = 0, and it follows from the monotonicity that the constraint set is empty as long as E_e < E*_e(∆), and then E_er(E_e, ∆) = ∞ in this range.¹ In the other extreme of very large E_e, Ω(Q_U, E_e) reaches a plateau at a level of 2H_Q(U), due to the behavior of F(Q_U, E_e).

Semi-Deterministic Binning
In order to present some of the results in this section, we make a few more definitions. The minimum conditional entropy (MCE) decoder estimates u, using the bin index B(u) and the SI vector v, according to û = argmin_{u′ ∈ B(u)} Ĥ_{u′v}(U|V). The stochastic conditional entropy (SCE) decoder estimates u according to the following posterior distribution: P(û = u′ | B(u), v) = exp{−n Ĥ_{u′v}(U|V)} / Σ_{ũ ∈ B(u)} exp{−n Ĥ_{ũv}(U|V)}, for u′ ∈ B(u). (53)
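The two decoders just defined can be rendered side by side in a toy Python sketch (illustrative only; the decoder input is simply the list of source sequences in the received bin):

```python
import math
import random
from collections import Counter

def cond_emp_entropy(u, v):
    """Empirical conditional entropy H-hat_uv(U|V), in nats."""
    n = len(u)
    h = lambda cnt: -sum(k / n * math.log(k / n) for k in cnt.values())
    return h(Counter(zip(u, v))) - h(Counter(v))

def mce_decode(bin_members, v):
    """MCE decoder: deterministic argmin of the empirical conditional entropy."""
    return min(bin_members, key=lambda u: cond_emp_entropy(u, v))

def sce_decode(bin_members, v, rng):
    """SCE decoder: sample u' with probability proportional to
    exp{-n * H-hat_{u'v}(U|V)} (a GLD with the same metric)."""
    n = len(v)
    w = [math.exp(-n * cond_emp_entropy(u, v)) for u in bin_members]
    r = rng.random() * sum(w)
    for u, wi in zip(bin_members, w):
        r -= wi
        if r <= 0:
            return u
    return bin_members[-1]
```

Both decoders use only the empirical statistics of (u′, v), so neither requires knowledge of P_{UV}; this is exactly the sense in which they are universal.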

Error Exponents and Universal Decoding
First, we provide random binning error exponents, which generalize (44) to this newly defined ensemble. Define the exponent functions E^sd_r,GLD(R(·)) and E^sd_r,MAP(R(·)), each given by a minimization problem. Then, consider the following result, which is proved in Appendix F.
Theorem 7. Let R(·) be any rate function. Then, for the SD code, E r,SD (R(·)) is given by
1. E sd r,GLD (R(·)) for the GLD,
2. E sd r,MAP (R(·)) for the MAP and the MCE decoders.
Next, we provide a single-letter expression for the error exponent of the TRCs in this ensemble. Let γ(R(·), Q U , Q V ), Ψ(R(·), Q U U ′ V ), and Λ(Q U U ′ , R(Q U )) be defined as in (32)-(34). Define the exponent function E sd trc,GLD (R(·)). Then, we have the following.
Proof. The proof follows exactly the same lines as the modified proof of Theorem 1 (Appendix A) for Theorem 2, except for one main modification: when we introduce the type class enumerator and sum over joint types, the summation set over {Q U U ′ } becomes restricted, due to the indicator function in (9). Afterwards, the analysis of the type class enumerator yields a constraint that becomes redundant and is thus omitted.
It is possible to make an analytic comparison between (55) and (57). This result is quite surprising at first glance, since one would expect the error exponent of the TRC to be strictly better than the random binning error exponent. We conjecture that this phenomenon is due to the fact that now, part of the source type classes are deterministically partitioned into bins in a one-to-one fashion, and hence do not affect the probability of error. Previously, these relatively "thin" type classes dominated the error probability at relatively high binning rates, but now, since they are encoded deterministically into the bins, other mechanisms dominate the error event, such as the channel noise (between u and v) or the random binning of the type classes with H Q (U ) ≥ R(Q U ). The result of the second part of Theorem 9 is also nontrivial, since it establishes an equality between the error exponent of the TRC and the random binning error exponent, but now for a universal decoder.
Concerning universal decoding, it is already known [6, Exercise 3.1.6], [20] that the random binning error exponents under optimal MAP decoding in both the FR and VR codes, given by (14) and (44), respectively, are also attained by the MCE decoder. Furthermore, a similar result for the SD code has been proved here in Theorem 7. A natural question is whether the error exponent of the TRC is also universally attainable, or only a fraction of it. The following result, which is proved in Appendix H, provides a positive answer to this question.
The result of Theorem 10 asserts that the error exponent of the typical random SD code is not affected when the optimal MAP decoder is replaced by a certain universal decoder, which need not even be deterministic. We conjecture that similar results hold for the FR and the VR ensembles as well, at least for the MCE decoder, although we were not able to provide a proper proof.
In comparison, in channel coding, numerical evidence shows that the error exponent of the typical random fixed composition code (given in (13)) is the same under the ML and the maximum mutual information decoders, while, on the other hand, the GLD based on an empirical mutual information metric attains a strictly lower exponent.

Optimal Trade-off Functions
Following the first point of Theorem 9, let us denote the error exponent of the TRC under MAP decoding by E sd e (·). Upon substituting the optimal rate function of Theorem 4 back into (56) and (57) and using the fact that E sd e (·) is monotonically increasing, we find that the trade-off function is given by or, alternatively, where J(Q U ) = J(Q U , E r , ∆) is given in (42).
The qualitative dependencies of E sd e (E r , ∆) and E e (E r , ∆) (43) on E r are very similar. Quantitatively, the constraint set in (63) is a subset of the constraint set in (43), which implies that E sd e (E r , ∆) ≥ E e (E r , ∆). Referring to the dependence on ∆ in the extreme case of a very large E r , for which J(Q U , E r , ∆) = ∆, we find that the difference between E sd e (E r , ∆) and E e (E r , ∆) is quite dramatic; while E e (E r , ∆) is finite as long as ∆ ≤ 2 log |U |, E sd e (E r , ∆) = ∞ for any ∆ > log |U |. We also demonstrate the difference between E sd e (E r , ∆) and E e (E r , ∆) by referring to a numerical example. Consider once more the case of a double binary source, now with joint probabilities given by P U V (0, 0) = 0.75, P U V (0, 1) = 0.1, P U V (1, 0) = 0, and P U V (1, 1) = 0.15.
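As a numerical sanity check on this example, the following Python sketch computes H P (U ) and I P (U ; V ) for the joint distribution above, together with the thresholds quoted in the discussion of Figs. 3-5 (all quantities in nats).

```python
import math

# Joint distribution of the binary example: keys are (u, v) pairs
P = {(0, 0): 0.75, (0, 1): 0.10, (1, 0): 0.0, (1, 1): 0.15}

P_U = {u: sum(p for (a, b), p in P.items() if a == u) for u in (0, 1)}
P_V = {v: sum(p for (a, b), p in P.items() if b == v) for v in (0, 1)}

def H(dist):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Mutual information I(U;V) = sum_{u,v} P(u,v) log [P(u,v) / (P(u)P(v))]
I_UV = sum(p * math.log(p / (P_U[a] * P_V[b]))
           for (a, b), p in P.items() if p > 0)

H_U = H(P_U)                 # ≈ 0.423
print(I_UV)                  # ≈ 0.254
print(I_UV + H_U)            # ≈ 0.677
print(math.log(2))           # ≈ 0.693
```

These reproduce the thresholds I P (U ; V ) ≈ 0.254 and I P (U ; V ) + H P (U ) ≈ 0.677 cited below, as well as log 2 ≈ 0.693 for |U | = 2.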
Graphs of the trade-off functions E sd e (E r , ∆) and E e (E r , ∆) as a function of ∆ in the extreme case of E r = ∞ are presented in Fig. 3. As can be seen in Fig. 3, E sd e (E r , ∆) and E e (E r , ∆) reach infinity at target thresholds log 2 ≈ 0.693 and 2 log 2 ≈ 1.386, respectively. We also provide an expression for the opposite trade-off function, where the error exponent is constrained to a given threshold. As in Subsection 5.2, the first step will be to solve E sd e (R(·)) ≥ E e in order to extract a lower bound on the rate function R(Q U ), and then to substitute this lower bound back into the excess-rate exponent in (37) to provide an expression for E sd er (E e , ∆).
Then, relying on the bound of (56), we have the following lower bound on R(Q U ), which is proved in Appendix I.
Theorem 11. Let E e > 0 be fixed. Then, the requirement E sd e (R(·)) ≥ E e implies that Upon substituting K(Q U , E e ) back into (37) and using the fact that E r (·, ∆) is monotonically non-increasing, we find that the trade-off function is given by The qualitative dependencies of E sd er (E e , ∆) and E er (E e , ∆) (51) on E e and ∆ are very similar; hence, we only mention some quantitative points where they differ from one another. In the extreme case of a very large E e , the resulting excess-rate exponent is a monotonically non-decreasing function of ∆. We conclude that the trade-off for the typical random SD code is strictly higher than the trade-off for the ordinary VR code.
Moreover, as long as ∆ ∈ [I P (U ; V ), I P (U ; V ) + H P (U )], E sd er (E e , ∆) reaches a strictly positive plateau, while E er (E e , ∆) reaches zero; i.e., in this range, both codes attain P e ≈ 0, but only the typical random SD code achieves an exponentially vanishing overflow probability.
We demonstrate the difference between E sd er (E e , ∆) and E er (E e , ∆) by referring to the same numerical example as in Fig. 3. Graphs of the trade-off functions E sd er (E e , ∆) and E er (E e , ∆) as functions of E e in the case of ∆ = 0.3 are presented in Fig. 4. This value of ∆ is chosen to be between the thresholds I P (U ; V ) ≈ 0.254 and I P (U ; V ) + H P (U ) ≈ 0.677, as can be seen in Fig. 4. It may also be interesting to make a connection to the expurgated bound of the FR code in the SW model, which is given by (15). Making an analytic comparison between E fr ex (R) and E sd e (∞, ∆) is rather difficult; thus, we examined these two exponent functions via a numerical example similar to that of Fig. 3. As mentioned before, in the special case of E r = ∞, the rate function is given by the threshold ∆; hence we choose ∆ = R in order to have a fair comparison. Graphs of the functions E fr ex (R) and E sd e (∞, R) are presented in Fig. 5.
As can be seen in Fig. 5, both E fr ex (R) and E sd e (∞, R) reach infinity at the rate log 2 ≈ 0.693. For relatively high binning rates, E fr ex (R) is strictly higher than E sd e (∞, R), which can be explained in the following way. Referring to the analogy between SW coding and channel coding, one can think of each bin as containing a channel code. In general, a channel code behaves well if it does not contain pairs of relatively "close" codewords. Since we randomly assign the source vectors into the bins (even if the populations of the bins are totally equal, which can be attained by randomly partitioning each type class into exp{nR} subsets), it is reasonable to assume that some bins will contain relatively bad codebooks. In contrast, in the expurgated SW code [5], each type class T (Q U ) is partitioned into exp{nR} "balanced" subsets in some sense (referring to the enumerators N (Q U U ′ ) in (23), they are equally populated in all of the bins), such that the codebooks contained in the bins have approximately equal error probabilities. Moreover, we conclude from (15) that each bin contains a codebook of the quality of an expurgated channel code. This code is certainly better than the TRCs in the SD ensemble.
In channel coding, it is known [18] that the random Gilbert-Varshamov ensemble has an exact random coding error exponent which is as high as the maximum between (16) and (17).
In SW source coding, on the other hand, it seems to be a more challenging problem to define an ensemble such that the error exponent of its TRCs is as high as E fr ex (R) of (15). Since the gap between E fr ex (R) and E sd e (∞, R) is not necessarily very significant, as can be seen in Fig. 5, we conclude that the SD ensemble may be more attractive, because the amount of computation needed to draw a code from it is much lower than the amount of computation required to construct an expurgated SW code. In addition, it is important to note that the probability of drawing a SD code with an exponent much lower than E sd e (∞, R) decays exponentially fast to zero, in analogy to the result in pure channel coding [19].

Appendix A Proof of Theorem 1

Following the error probability in (5), we have that where We have the following large deviations result concerning Z u (v), which is proved in Appendix B.

Lemma 1. Let ǫ > 0 be arbitrarily small. Then,

where (A.7) is due to Lemma 1, (A.9) follows from the method of types and the definition in (20), and in (A.10) we used the definition of N (Q U U ′ ) in (23). Therefore, our next task is to evaluate the 1/ρ-th moment of N (Q U U ′ ). Let us define For a given ρ > 1, let s ∈ [1, ρ]. Then, we obtain (A.16), and so, After optimizing over s, we get which gives, after raising to the ρ-th power, and, in the limit of ρ tending to infinity, Continuing now from (A.11),

Upper Bound on the Error Exponent
Consider a joint distribution Q U U ′ that satisfies H Q (U, U ′ ) > R, and define the event Consider the following: Let us use the shorthand notations I(u, u ′ ) = 𝟙{B(u ′ ) = B(u)}, K = |T (Q U U ′ )|, and p = e −nR . Concerning the variance of N (Q U U ′ ), we have the following and hence, which decays to zero, since we have assumed that H Q (U, U ′ ) > R. Furthermore, if H Q (U, U ′ ) > R + ǫ, then P{E(Q U U ′ )} tends to zero at least as fast as e −nǫ . Now, for a given ǫ > 0 and a given joint type Q U U ′ V such that H Q (U, U ′ ) > R + ǫ, let us define where (u, u ′ ) in the expression |T (Q V |U U ′ |u, u ′ )| should be understood as any pair of source sequences in T (Q U U ′ ). Next, we define We start by proving that P{G n } → 1 as n → ∞, or, equivalently, that P{G c n } → 0 as n → ∞. Now, The last summation contains a polynomial number of terms; if we prove that the summand tends to zero exponentially with n, then P{G c n } → 0 as n → ∞. The first term in the summand, P{E(Q U U ′ )}, has already been proved to be upper bounded by e −nǫ . Concerning the second term of the summand, we have the following where N (Q U V ) is the number of source sequences within B(u), other than u and u ′ , that fall in the conditional type class T (Q U |V |v). This is a binomial random variable with e nH Q (U |V ) − 2 trials and a success rate of the exponential order of e −nR , and hence, P{Z uu ′ (v) > e n[α(R−2ǫ,Q U ,Q V )+ǫ] } ≤ e −2nǫ , which provides which proves that P{G n } → 1 as n → ∞. Now, for a given B n ∈ G n (Q U U ′ V ), we define the set as well as Then, by definition, for any B n ∈ G n (Q U U ′ V ), where we have used the fact that T (Q V |U U ′ |u, u ′ ) has exponentially the same cardinality for all (u, u ′ ) ∈ T (Q U U ′ ). Wrapping it all up, we get First, note that Thus, taking the randomness of {B(u)} u∈U n into account, Now, N (T (Q U |V |v), B(u)) is a binomial random variable with |T (Q U |V |v)| ≐ e nH Q (U |V ) trials and a success rate of the exponential order of e −nR . We prove that, by the very definition of the function α(R + ǫ, P u , P v ), there must exist some conditional distribution To show that, we assume conversely, i.e., that for every conditional distribution Q U |V ∈ S(P u , P v ), which defines Writing it slightly differently, for every Q U |V ∈ S(P u , P v ) there exists some real number t ∈ [0, 1] such that or, equivalently, which is a contradiction. Let the conditional distribution Q * U |V be as defined above. Then, where, in the second inequality, we invoked the decreasing monotonicity of the function f (t) = (1 + t)e −t for t ≥ 0. Finally, we get that
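The concentration step above relies only on the first two moments of a binomial enumerator: for N ∼ Binomial(K, p) with K ≐ e nH Q (U,U ′ ) trials and success rate p = e −nR , one has Var(N )/(E N ) 2 = (1 − p)/(Kp) ≈ e −n[H Q (U,U ′ )−R] , which vanishes whenever H Q (U, U ′ ) > R, and Chebyshev's inequality then gives the desired concentration. A minimal numeric sketch (the values of n, H, and R below are hypothetical):

```python
import math

def relative_variance(n, H, R):
    """(Var N) / (E N)^2 for N ~ Binomial(K, p) with K = e^{nH}, p = e^{-nR}.
    Equals (1 - p) / (K p), which is roughly e^{-n(H - R)} when H > R."""
    K = math.exp(n * H)
    p = math.exp(-n * R)
    return (1 - p) / (K * p)

# Chebyshev: P{|N - EN| >= delta * EN} <= relative_variance / delta^2,
# so the enumerator concentrates around its mean whenever H > R.
for n in (20, 40, 80):
    print(n, relative_variance(n, H=0.6, R=0.4))
```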

Appendix C Proof of Theorem 4
We start by writing the expression in (37) in a slightly different way, using the identity min {Q: g(Q)≤0} f (Q) = min Q sup s≥0 {f (Q) + s · g(Q)}. We obtain or, or, or that, for any Q U ∈ P(U ), with the understanding that a minimum over an empty set equals infinity.

Appendix D Proof of Theorem 5
It follows from the identity [A] + = max µ∈[0,1] µA that (44) can also be written as E vr r (R(·)) = min such that the requirement E vr r (R(·)) ≥ E e is equivalent to or, or that, for any Q U ∈ P(U ), and the proof is complete.
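For completeness, the identity invoked above can be verified directly: µA is linear in µ, so the maximum over µ ∈ [0, 1] is attained at an endpoint of the interval.

```latex
\max_{\mu\in[0,1]} \mu A =
\begin{cases}
A, & A \ge 0 \quad (\mu^\ast = 1),\\[2pt]
0, & A < 0 \quad (\mu^\ast = 0),
\end{cases}
\qquad\text{hence}\qquad
\max_{\mu\in[0,1]} \mu A = \max\{A,\,0\} = [A]_+ .
```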

Appendix E Proof of Theorem 6
We start by writing the expressions in (32), (33), and (35) in a slightly different way. First, (32) can be written as where (E.6) follows from the identity min {Q: g(Q)≤0} f (Q) = min Q sup s≥0 {f (Q) + s · g(Q)}. As for (33), we have where Now, we want to solve the inequality E e (R(·)) ≥ E e and arrive at a lower bound on the rate function R(Q U ).
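As a side remark, the identity behind (E.6) follows from a standard Lagrangian argument (assuming the feasible set {Q : g(Q) ≤ 0} is nonempty): the inner supremum acts as an indicator of the constraint.

```latex
\sup_{s \ge 0}\left\{ f(Q) + s\cdot g(Q) \right\}
= \begin{cases}
f(Q), & g(Q) \le 0,\\[2pt]
+\infty, & g(Q) > 0,
\end{cases}
\qquad\Longrightarrow\qquad
\min_{\{Q:\, g(Q) \le 0\}} f(Q)
= \min_{Q}\, \sup_{s \ge 0}\left\{ f(Q) + s\cdot g(Q) \right\}.
```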
Requiring that (E.12) be greater than or equal to E e is equivalent to which is the same as ∃µ ∈ [0, 1], ∃τ ∈ [0, 1], ∃{QŨ |V : QŨ = Q U }, ∀θ ≥ 1 : or which is, in turn, equivalent to requiring that, for any Q U ∈ P(U ), It turns out that the expression in (E.16) is relatively cumbersome and cannot be recast into a simpler form; hence, at this point, we must compromise on the tightness of the lower bound on the rate function. Note that all the maximizations on the right-hand side of (E.16) are mandatory, but the minimizations are not, such that we may choose any value we like and still obtain a valid lower bound on the rate function. Having said that, let us choose τ = 0, which saves us the need to maximize over θ ≥ 1 and to minimize over {QŨ |V : QŨ = Q U }.
We get the following weakened lower bound on the rate function: At this point, we assume that f (Q U V ) = E Q [log P (U, V )], and so where (E.19) holds because the objective in (E.18) is concave in Q V |U U ′ and affine in µ. Now, the minimization over {Q V |U U ′ } can be carried out as follows: It is relatively easy to prove that the resulting function is minimized at µ * = 1/2. Using the definition of B Q (U, U ′ ) in (48), we finally arrive at which completes the proof.

Appendix F Proof of Theorem 7
We have that

Step 1: Averaging Over the Random Code

We first condition on the true source sequences (U = u, V = v) and take the expectation only w.r.t. the random binning. We get such that the probability in (F.6) is governed by a binomial random variable with a number of trials of the exponential order of e nH Q (U ′ |V ) and probability of success e −nR(Q U ) , and so, Finally, we have that thus,

Step 2: Averaging Over U and V

Notice that the exponent function E(u, v) depends on (u, v) only via the empirical distribution P uv . Averaging over the source and the SI sequences now provides which proves the first point of Theorem 7.
Step 3: Moving from Stochastic to Deterministic Decoding

In order to transform the GLD into the general deterministic decoder of (F.26), we just have to multiply f (·) by β ≥ 0 and then let β → ∞. We find that the overall error exponent of the semi-deterministic variable-rate code with the general deterministic decoder of (F.26) is given by

Appendix G Proof of Theorem 9
By definition of the error exponents, it follows that E sd trc,GLD (R(·)) ≥ E sd r,GLD (R(·)). We now prove the other direction. The expression in (57) can also be written as We upper-bound the minimum in (G.5) by decreasing the feasible set: we add to Q the constraint that U ↔ V ↔ U ′ form a Markov chain in that order, and denote the new feasible set by Q̃. We get that E sd trc,GLD (R(·)) is upper bounded as in (G.6). On the other hand, if the maximum is given by γ(R(Q U ), Q U , Q V ), let Q * = Q * U |V be the maximizer in (G.6); then, which completes the proof of the theorem.