Multi-Class Cost-Constrained Random Coding for Correlated Sources over the Multiple-Access Channel

This paper studies a generalized version of the multi-class cost-constrained random-coding ensemble with multiple auxiliary costs for the transmission of N correlated sources over an N-user multiple-access channel. For each user, the set of messages is partitioned into classes and codebooks are generated according to a distribution depending on the class index of the source message and under the constraint that the codewords satisfy a set of cost functions. Proper choices of the cost functions recover different coding schemes, including message-dependent and message-independent versions of independent and identically distributed, independent conditionally distributed, constant-composition and conditional constant-composition ensembles. The transmissibility region of the scheme is related to the Cover-El Gamal-Salehi region. A related family of correlated-source Gallager source exponent functions is also studied. The achievable exponents are compared for correlated and independent sources, both numerically and analytically.


Introduction
In information theory, the fundamental problem of communication over a channel is studied from two complementary perspectives. First, one characterizes the transmissibility conditions, namely the circumstances under which the error probability asymptotically vanishes as the blocklength goes to infinity. Second, one describes by means of error exponents the speed at which this error probability vanishes; the larger the exponent, the faster the error probability tends to zero. Since finding an exact expression for the error probability is very difficult, a large body of work has investigated upper and lower bounds on the average error probability, or equivalently lower and upper bounds on the error exponent. In point-to-point, that is, single-user communication, separate source-channel random coding [1,2], possibly with expurgation [1] (Eq. 5.7.10), yields lower bounds on the error exponent. In contrast, finding an upper bound on the error exponent satisfied by every code is more challenging. Generally, the hypothesis-testing method [3] is employed to derive upper bounds on the error exponent. Two well-known upper bounds on the error exponent are the sphere-packing exponent [4] and the minimum-distance exponent [5]. In fact, for rates greater than the critical rate [1] (Sec. 5.6), the random-coding and sphere-packing bounds coincide, while the expurgated and minimum-distance bounds coincide at rate zero.
For point-to-point communication, it was shown in ref. [1] (Prob. 5.16) that joint source-channel coding leads in general to a larger exponent than separate source-channel coding.

For σ = N, u_N and U_N denote the ordered vector of source messages for all users and the Cartesian product of all the source alphabets, respectively. The sources are memoryless and are characterized by the joint probability distribution P_N,

P_N(u_N) = ∏_{t=1}^{n} P_N(u_{N,t}),   (1)

and by the joint symbol probability distribution P_N. The source-message and symbol marginal distributions of user ν ∈ N are denoted by P_ν and P_ν, respectively. Assuming that the sources are independent, the marginal distributions induce new joint (mismatched) probability distributions for sets of users σ ∈ 2^N. The induced independent-message and -symbol probabilities, denoted by P^ind_σ and P^ind_σ, are given by P^ind_σ(u_σ) = ∏_{ν∈σ} P_ν(u_ν), and similarly for P^ind_σ. Each user ν has an encoder that maps, without cooperation with the other users, the source message u_ν onto a codeword x_ν(u_ν), also of length n and with symbols drawn from the alphabet X_ν. We denote the codebook of user ν by C^n_ν. We denote by x_σ ∈ X^n_σ the vector of codewords for all users in a set σ ∈ 2^N. All users simultaneously send these codewords over a discrete memoryless multiple-access channel with output alphabet Y. The symbolwise transition probability is denoted by W, and the channel is characterized by the conditional probability distribution W(y|x_N) = ∏_{t=1}^{n} W(y_t|x_{N,t}), where y is the received sequence of length n. Based on y, a joint decoder estimates all transmitted source messages u_N according to the maximum a posteriori criterion:

û_N = arg max_{u_N ∈ U^n_N} P_N(u_N) W(y|x_N(u_N)),

where U^n_N denotes the set of all possible source messages u_N. An error occurs if the decoded messages û_N differ from the transmitted u_N; we refer to û_N ≠ u_N as an error event.
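As a concrete illustration of the MAP rule above, the following minimal sketch decodes by brute force over all message pairs for a toy two-user example. The source law, codebooks, and XOR-through-BSC channel below are hypothetical choices made only for illustration; they are not the paper's model.

```python
import itertools

# Toy two-user setup (hypothetical, n = 3): joint source law and codebooks.
P = {(0, 0): 0.6, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.2}
C1 = {0: (0, 0, 0), 1: (1, 1, 1)}  # codebook of user 1
C2 = {0: (0, 0, 0), 1: (1, 1, 1)}  # codebook of user 2

def W(y, x1, x2, eps=0.1):
    """Memoryless channel: Y_t = X1_t XOR X2_t observed through a BSC(eps)."""
    p = 1.0
    for yt, a, b in zip(y, x1, x2):
        p *= (1 - eps) if yt == (a ^ b) else eps
    return p

def map_decode(y):
    """Joint MAP rule: maximize P_N(u_N) * W(y | x_N(u_N)) over all message pairs."""
    return max(itertools.product(C1, C2),
               key=lambda u: P[u] * W(y, C1[u[0]], C2[u[1]]))
```

For y = (0, 0, 0), the product P(0, 0) · W(y|x_1(0), x_2(0)) dominates and the decoder returns (0, 0).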
The error probability for a given set of codebooks, P_e(C^n_N), is thus given by

P_e(C^n_N) = ∑_{u_N} P_N(u_N) Pr[û_N ≠ u_N | u_N transmitted].

In our analysis, it will prove convenient to split the error event into 2^N − 1 distinct types of error events, indexed by the non-empty subsets in the power set of the user indices, 2^N \ ∅; for example, τ ∈ {{1}, {2}, {1, 2}} for N = 2. More precisely, the error event of type τ corresponds to the conditions û_ν ≠ u_ν for all ν ∈ τ and û_ν = u_ν for all ν ∈ τ^c, where τ^c is the complement of τ in the set of user indices N. We are interested in the asymptotics of the error probability for sufficiently large n, namely whether the error probability vanishes and how fast it tends to zero. The sources U_N are said to be transmissible over the channel if there exists a sequence of codebooks C^n_N such that lim_{n→∞} P_e(C^n_N) = 0. To characterize the speed at which the error probability vanishes, we use the notion of exponent. An exponent E is said to be achievable if there exists a sequence of codebooks such that

lim inf_{n→∞} −(1/n) log P_e(C^n_N) ≥ E.

Source transmissibility and error-exponent achievability are typically studied by means of random coding. With random coding, one generates and studies sequences of ensembles of codebooks whose codewords are randomly drawn from a distribution Q_ν(x_ν|u_ν) independently for each user; as indicated by the notation, this distribution may possibly depend on the source message u_ν. The random-coding probability distribution for the channel input combined for all users is given by Q_N(x_N|u_N) = ∏_{ν∈N} Q_ν(x_ν|u_ν). The use of random coding allows us to study how the error probability averaged over the ensemble, denoted by P̄_e, vanishes as n grows. More importantly, it shows the existence of good codes in the ensemble whose error probability vanishes.
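The partition of the error event into 2^N − 1 types can be sketched directly: an error event is classified by the set τ of users decoded incorrectly. The helper names below are illustrative, not from the paper.

```python
from itertools import combinations

def error_types(N):
    """The 2^N - 1 non-empty subsets tau of the user indices {1, ..., N}."""
    users = range(1, N + 1)
    return [set(c) for r in range(1, N + 1) for c in combinations(users, r)]

def error_type(u, u_hat):
    """Type tau of an error event: the set of users whose message was decoded
    incorrectly (empty set means no error occurred)."""
    return {nu for nu, (a, b) in enumerate(zip(u, u_hat), start=1) if a != b}
```

For N = 2 this enumerates exactly {{1}, {2}, {1, 2}}, matching the example in the text.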
For the point-to-point and the multiple-access channels, a number of such random-coding ensembles have been studied in the literature, as reviewed in the following section, where we also present a multi-class cost-constrained ensemble subsuming all these ensembles and characterize its achievable exponent and transmissibility region.

Summary of Notation Used in the Paper
Sets are usually denoted by calligraphic upper-case letters, e.g., X, and the n-fold Cartesian product of X is denoted by X^n. The cardinality of a set such as X is denoted by |X|. The indicator function, representing, for example, an error event or that an element x belongs to a set X, is denoted by 1{x ∈ X}.
The number of users is denoted by N and user indices are typically represented by ν. The set of all users is denoted by N. The power set of all subsets of N is denoted by 2^N and the complement of a subset σ ∈ 2^N is denoted by σ^c; sets in the power set of users are denoted by Greek letters, for example, τ and σ. The number of source-message classes and of cost functions for user ν are respectively denoted by K_ν and L_ν; the sets of such classes and functions are respectively denoted by K_ν and L_ν. Indices for source classes and cost functions are typically denoted by i_ν and ℓ_ν, respectively.
Subscripts and superscripts in a quantity A may represent sets of user indices σ. Depending on the context, the quantity represents a list or a suitable product of variables for all elements in the set σ; for instance, x_σ = (x_1, x_2) for σ = {1, 2}. If the quantity is a probability distribution, its value for σ represents the probability distribution of the corresponding sequence. If the quantity is a set, its value for σ is the Cartesian product, for example, U_σ = U_1 × U_2 for σ = {1, 2}. If σ = ∅, then A_σ = 0. If σ is a singleton, for example, σ = {2}, we simply write A_2. We denote by [A_{σ_1}, A_{σ_2}] the operation that merges and sorts two lists A_{σ_1} and A_{σ_2}, with σ_1 ∩ σ_2 = ∅, into an ordered list containing all users in the union σ_1 ∪ σ_2. For sets of user indices, we denote such a merging operation by [σ_1, σ_2], and we have [σ, σ^c] = N. Scalar random variables are denoted by capital letters, for example, X, and lowercase letters represent a particular realisation, for example, x ∈ X. Capital bold letters denote random vectors or sequences, for example, X, while lowercase bold letters, for example, x ∈ X^n, denote deterministic vectors or sequences. Probability distributions for vectors or sequences, typically of length n (resp. for symbols), are represented by text-style letters, for example, P, Q, W (resp. math-style letters, for example, P, Q, W). Sequence symbols are usually affixed a subscript to indicate a user index; the t-th symbol in the sequence x_ν is denoted by x_{ν,t}.
The source-symbol distribution for user ν is denoted by P_ν(u_ν). The joint distribution for users σ is denoted by P_σ(u_σ); the joint distribution computed as if the sources were independent is denoted by P^ind_σ(u_σ). The conditional source distribution for users σ_1 given another set σ_2 is denoted by P_{σ_1|σ_2}(u_{σ_1}|u_{σ_2}). Vector or sequence distributions are defined analogously with P replaced by P. Channel input distributions are denoted by Q_ν(x_ν), Q^{i_ν}_ν(x_ν), or Q^{i_ν}_{ν,u_ν}(x_ν), where i_ν denotes the class index of the source message and Q_{ν,u_ν}(x_ν) is a shorthand for the conditional distribution Q_ν(x_ν|u_ν). Cost functions are similarly denoted by a_ν(x_ν), a^{i_ν}_ν(x_ν), or a^{i_ν}_{ν,u_ν}(x_ν). Vector or sequence distributions are defined analogously with Q or a respectively replaced by Q or a. The conditional distribution for the channel-output symbol (resp. sequence) is denoted by W(y|x_N) (resp. W(y|x_N)).

Review of Random-Coding Ensembles
The simplest and oldest random-coding ensemble is the independent, identically distributed (iid) ensemble [1,12,17,27], where the symbols x_{ν,t} in all codewords x_ν of a given user ν are generated independently according to the same input distribution Q_ν(x_{ν,t}) for all source messages u_ν. Throughout the paper, we shall identify ensembles by hyphenated acronyms, where the first part indicates the possible dependence of the codeword on the source message and the second part describes the generation of symbols in a codeword. This first ensemble is thus the message-independent iid (mi-iid) ensemble, since codewords have the same distribution for all source messages, and symbols are independent of each other and of the source-message symbols. For the mi-iid ensemble, the random-coding distribution is given by Q_ν(x_ν) = ∏_{t=1}^{n} Q_ν(x_{ν,t}). In the message-independent, independent-conditionally-distributed (mi-icd) ensemble, the codewords x_ν of user ν are generated with symbols drawn according to a set of |U_ν| conditional probability distributions Q_{ν,u_ν}(x_ν) ≜ Q_ν(x_ν|u_ν); the distribution of each symbol depends on the corresponding source symbol, but not otherwise on the full message u_ν. To this end, let I_{u_ν}(u_ν) denote the set of positions where the symbol u_ν ∈ U_ν appears in the sequence u_ν, namely I_{u_ν}(u_ν) = {t : u_{ν,t} = u_ν}. Within each subsequence of u_ν where u_{ν,t} = u_ν, represented by x_ν(I_{u_ν}(u_ν)), symbols are drawn independently according to Q_{ν,u_ν}(x_ν). For this mi-icd ensemble, codewords are generated according to Q_ν(x_ν|u_ν) = ∏_{u_ν∈U_ν} ∏_{t∈I_{u_ν}(u_ν)} Q_{ν,u_ν}(x_{ν,t}). Compared to the mi-iid ensemble, the mi-icd ensemble can lead to a larger transmissible region for the multiple-access channel with correlated sources [11,21]. An example of the generation of three codewords x^{(1)}_ν, x^{(2)}_ν and x^{(3)}_ν in the mi-icd ensemble is shown in Figure 1, for a given source sequence u_ν = (α, β, β, γ, β, γ, γ, α, β, α) with source alphabet U = {α, β, γ}.
To generate each codeword x_ν with alphabet X = {a, c, e}, three subcodewords x_ν(I_α(u_ν)), x_ν(I_β(u_ν)) and x_ν(I_γ(u_ν)) are independently generated with i.i.d. distributions Q_{ν,α} = (1/3, 1/3, 1/3), Q_{ν,β} = (1/2, 1/4, 1/4) and Q_{ν,γ} = (1/3, 2/3, 0), respectively. Symbols generated according to Q_{ν,α}, Q_{ν,β} and Q_{ν,γ} are respectively represented as green circles, blue boxes and red diamonds in the figure. In the example, I_α(u_ν) = {1, 8, 10}, I_β(u_ν) = {2, 3, 5, 9} and I_γ(u_ν) = {4, 6, 7}. For instance, the subcodeword x^{(1)}_ν(I_γ(u_ν)) has three symbols, each generated independently from Q_{ν,γ}, leading to the red-diamond symbols x^{(1)}_ν(I_γ(u_ν)) = (a, a, a).
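The mi-icd generation rule in the example above can be sketched as follows, using ASCII stand-ins 'a', 'b', 'g' for the source symbols α, β, γ. Drawing each channel symbol x_{ν,t} from Q_{ν,u_{ν,t}} in sequence order is equivalent, by independence, to generating the three subcodewords separately; the seeded draws below are illustrative and do not reproduce the particular codewords shown in Figure 1.

```python
import random

def positions(u_seq, symbol):
    """I_u(u_nu): 1-based positions where `symbol` occurs in the source sequence."""
    return [t for t, s in enumerate(u_seq, start=1) if s == symbol]

def generate_icd_codeword(u_seq, Q, X, rng):
    """Draw x_nu symbolwise: x_{nu,t} ~ Q_{nu, u_{nu,t}}, independently over t."""
    return tuple(rng.choices(X, weights=Q[s])[0] for s in u_seq)

# Source sequence and conditional distributions from the example in the text.
u = ('a', 'b', 'b', 'g', 'b', 'g', 'g', 'a', 'b', 'a')
Q = {'a': [1/3, 1/3, 1/3], 'b': [1/2, 1/4, 1/4], 'g': [1/3, 2/3, 0]}
X = ['a', 'c', 'e']
```

Note that Q_{ν,γ} puts zero mass on 'e', so positions in I_γ(u_ν) can never carry that symbol.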

Generalized Multi-Class Cost-Constrained Ensemble
Motivated by the ensembles listed in the previous section, and inspired by refs. [1] (Ch. 7) and [26] (Sec. II), we study a generalized message-dependent multi-class cost-constrained random-coding ensemble with multiple auxiliary costs.
For each user, we partition the set of source messages into K_ν disjoint classes with thresholds on the message probabilities as in Equation (12). Let the source message be in the i_ν-th class, that is, i_ν(u_ν) = i_ν. Given the source message u_ν and the source symbol u_ν, we consider the subsequence indexed by I_{u_ν}(u_ν), where I_{u_ν}(u_ν) is defined in Equation (9), and we denote the corresponding source subsequence and subcodeword by u_ν(I_{u_ν}(u_ν)) and x_ν(I_{u_ν}(u_ν)), respectively. For each user ν, class index i_ν, and source-message symbol u_ν, the subcodeword x_ν(I_{u_ν}(u_ν)) is drawn according to a symbolwise i.i.d. distribution Q^{i_ν}_{ν,u_ν}(x_ν), conditioned on a set of cost constraints being satisfied. We consider L_ν additive cost functions a^{i_ν,ℓ_ν}_{ν,u_ν}(x_ν), ℓ_ν ∈ L_ν = {1, . . . , L_ν}. The total cost of the subcodeword x_ν(I_{u_ν}(u_ν)) is given by the sum of the symbol costs, namely a^{i_ν,ℓ_ν}_{ν,u_ν}(x_ν(I_{u_ν}(u_ν))) = ∑_{t∈I_{u_ν}(u_ν)} a^{i_ν,ℓ_ν}_{ν,u_ν}(x_{ν,t}). We assume that the average cost φ^{i_ν,ℓ_ν}_{ν,u_ν} under the conditional distribution Q^{i_ν}_{ν,u_ν} is zero: φ^{i_ν,ℓ_ν}_{ν,u_ν} = ∑_{x_ν} Q^{i_ν}_{ν,u_ν}(x_ν) a^{i_ν,ℓ_ν}_{ν,u_ν}(x_ν) = 0. Finally, fix some parameters δ_ν > 0 and let D^{i_ν}_ν be the set of codewords whose constituent subcodewords have empirical cost close to the zero mean for all cost functions and source symbols, i.e., D^{i_ν}_ν = {x_ν : |a^{i_ν,ℓ_ν}_{ν,u_ν}(x_ν(I_{u_ν}(u_ν)))| ≤ δ_ν, ∀ℓ_ν ∈ L_ν, u_ν ∈ U_ν}. Codewords x_ν are the combination of the subcodewords x_ν(I_{u_ν}(u_ν)) placed at the respective positions in I_{u_ν}(u_ν). For this multi-class cost-constrained ensemble, the random-coding distribution is thus given by Q_ν(x_ν|u_ν) = Ξ_ν^{-1} ∏_{u_ν∈U_ν} ∏_{t∈I_{u_ν}(u_ν)} Q^{i_ν}_{ν,u_ν}(x_{ν,t}) 1{x_ν ∈ D^{i_ν}_ν}, where Ξ_ν is a normalizing constant and the class index i_ν = i_ν(u_ν) is determined by the source message. The multi-class cost-constrained ensemble subsumes all the ensembles described in Section 3.1. First of all, the iid and icd ensembles are recovered by setting L_ν = 0 and choosing the appropriate number of classes K_ν and random-coding distributions Q_ν, Q_{ν,u_ν}, Q^{i_ν}_ν and Q^{i_ν}_{ν,u_ν}. In all these cases, the set D^{i_ν}_ν includes all generated codewords and the normalizing constant is Ξ_ν = 1.
To recover the constant-composition ensembles, for which constraints force the subcodewords to belong to some type set T^n_ν(Q_ν) or T^{|I_{u_ν}(u_ν)|}_ν(Q^{i_ν}_{ν,u_ν}), for each of the K_ν classes of user ν we set δ_ν < 1, L_ν = |X_ν|, and bijectively map the channel input symbols to cost-function indices ℓ_ν(x_ν), so that each cost function controls the number of occurrences of one input symbol, as in Equation (26). In case the ensemble does not depend on either i_ν or u_ν, these symbols are dropped from Equation (26). For example, for the md-cc ensemble, we have a^{i_ν}_{ν,ℓ_ν}(x_ν) = 1{x_ν = ℓ_ν} − Q^{i_ν}_ν(ℓ_ν), identifying each input symbol with its index. In addition, the codeword set D^{i_ν}_{ν,u_ν} in Equation (23) simplifies to the set of sequences whose empirical distribution is δ_ν/n-close to Q^{i_ν}_ν, which is the same as T^n(Q^{i_ν}_ν) given a version of Equation (15) where Q_ν may depend on i_ν. Again, choosing the right number of classes K_ν and random-coding distributions Q_ν, Q_{ν,u_ν}, Q^{i_ν}_ν, and Q^{i_ν}_{ν,u_ν} recovers the various constant-composition ensembles. By construction, the set D^{i_ν}_{ν,u_ν} includes only the (sub)codewords with empirical distribution close to Q_ν, Q_{ν,u_ν}, Q^{i_ν}_ν, or Q^{i_ν}_{ν,u_ν}, respectively, and the normalizing constant Ξ_ν is the probability of the corresponding type set (or product thereof). As an example, for the md-ccc ensemble, choosing the cost functions in Equation (26) with Q^{i_ν}_ν replaced by Q^{i_ν}_{ν,u_ν} yields a cost-constraint set equivalent to Equation (17).
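The mechanism by which the cost constraints enforce a near-constant composition can be sketched by rejection sampling: draw i.i.d. from Q and keep the draw only if every total cost lies within δ of its zero mean. The function names and parameters below are illustrative, not the paper's notation.

```python
import random

def cc_costs(Q):
    """One cost function per input symbol l: a_l(x) = 1{x = l} - Q(l).
    Each has zero mean under Q, as required of the auxiliary costs."""
    return {l: (lambda x, l=l: (1.0 if x == l else 0.0) - Q[l]) for l in Q}

def draw_constrained(Q, n, delta, rng, max_tries=10_000):
    """Rejection sampling into the constraint set D: draw x i.i.d. from Q and
    keep it only if every total cost satisfies |sum_t a_l(x_t)| <= delta,
    which forces the empirical composition of x to be (delta/n)-close to Q."""
    symbols, weights = zip(*Q.items())
    costs = cc_costs(Q)
    for _ in range(max_tries):
        x = rng.choices(symbols, weights=weights, k=n)
        if all(abs(sum(a(xt) for xt in x)) <= delta for a in costs.values()):
            return tuple(x)
    raise RuntimeError("no draw met the cost constraints")
```

With the indicator-minus-probability costs above, the accepted sequences are exactly those whose symbol counts deviate from n·Q(l) by at most δ, i.e., an approximate type set.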

Exponent for the Generalized Multi-Class Cost-Constrained Ensemble
Theorem 1. For the transmission of N correlated memoryless sources with joint distribution P_N, where N = {1, 2, . . . , N}, over a memoryless multiple-access channel with input x_N and transition probability W(y|x_N), consider a random-coding multi-class cost-constrained ensemble where source messages for each user ν ∈ N are allocated, depending on their probabilities, into K_ν classes with thresholds {γ_{ν,0}, γ_{ν,1}, . . . , γ_{ν,K_ν}}, as in Equation (12), and encoded onto codewords randomly generated with a distribution Q^{i_ν}_ν(x_ν|u_ν) that depends on the source message according to Equation (24), through symbol distributions Q^{i_ν}_{ν,u_ν} that possibly depend on the source-message class index i_ν and source symbol u_ν, and L_ν cost functions a^{i_ν,ℓ_ν}_{ν,u_ν}, ℓ_ν ∈ {1, 2, . . . , L_ν}. This random-coding ensemble attains the exponent E^cost in Equation (30), where the Gallager function in Equation (31) and the functions Λ^{i_σ}_σ(u_σ) and R^{i_σ}_{σ,u_σ}(x_σ), respectively given by Equations (32) and (33), implicitly depend on the set of optimization parameters λ^{L,U}_N.

Proof. This result is proved in Appendix A.
The random-coding exponent in Equation (30) depends on the partitioning of the source-message set into classes, the channel input distributions, and the codeword cost-constraint functions. The best possible generalized cost-constrained exponent is obtained by optimizing over the multi-class partitioning, the cost constraints and the input distributions. We briefly discuss the optimization w.r.t. the thresholds of the source-message partitioning in Appendix B. In the next section, we provide some numerical examples where we compute the optimal exponents for either independent or correlated sources, and find that the optimal number of classes is two. In ref. [31] (Sec. 3.2.1.1), we provide some indications of why this optimality of only two classes is harder to establish in multi-user scenarios, compared to the single-user case. In the next section, we use Equations (31) and (30) to respectively obtain the source and channel Gallager functions of the various ensembles in Section 3.1 and rank their achievable exponents and transmissibility regions.

Gallager Functions for Correlated Sources
In this section, we evaluate the generalized Gallager function of the multi-class cost-constrained ensemble in Equation (31) for the various ensembles described in Section 3.1. Where possible, we relate this Gallager function to the well-known [1] correlated-source and channel Gallager functions, respectively given by

E_{s,σ}(ρ, P_N) = log ∑_{u_{σ^c}} ( ∑_{u_σ} P_N([u_σ, u_{σ^c}])^{1/(1+ρ)} )^{1+ρ},   (34)

E_0(ρ, Q, W) = −log ∑_y ( ∑_x Q(x) W(y|x)^{1/(1+ρ)} )^{1+ρ},   (35)

where σ ∈ 2^N. Using that [u_σ, u_{σ^c}] = u_N, the standard Gallager source function is given by E_s(ρ, P_N) = E_{s,N}(ρ, P_N), with N = {1, . . . , N} the set of user indices. For the simple mi-iid ensemble, with only one source class and no cost constraints, K_ν = 1 and L_ν = 0 for all ν ∈ N, and Λ^{i_σ}_σ(u_σ) = R^{i_σ}_{σ,u_σ}(x_σ) = 1 for all σ ∈ 2^N. With no statistical dependency between messages and codewords, the Gallager function takes the form in Equation (36). Isolating the summations over u_{τ^c} and u_τ, we can split the Gallager function into a channel part and a source part, E_0(ρ, Q_τ, V_τ) − E_{s,τ}(ρ, P_N), where V_τ(x_{τ^c}, y|x_τ) = Q_{τ^c}(x_{τ^c}) W(y|x_N) is the transition probability of a channel with input x_τ and output (x_{τ^c}, y).
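The two building blocks just introduced are easy to evaluate numerically. The sketch below implements the single-user (σ = N, N = 1) versions of Equations (34) and (35) and a grid search over ρ for the resulting split Gallager function; the function names and the grid-search resolution are illustrative choices.

```python
import math

def E0(rho, Q, W):
    """Gallager channel function:
       E_0(rho, Q, W) = -log sum_y ( sum_x Q(x) W(y|x)^{1/(1+rho)} )^{1+rho}."""
    num_y = len(next(iter(W.values())))
    total = sum(
        sum(Q[x] * W[x][y] ** (1.0 / (1.0 + rho)) for x in Q) ** (1.0 + rho)
        for y in range(num_y))
    return -math.log(total)

def Es(rho, P):
    """Gallager source function (single user):
       E_s(rho, P) = (1+rho) log sum_u P(u)^{1/(1+rho)}."""
    return (1.0 + rho) * math.log(sum(p ** (1.0 / (1.0 + rho)) for p in P))

def jscc_exponent(P, Q, W, steps=200):
    """Single-user analogue of the split Gallager function:
       grid-search max over rho in [0, 1] of E_0(rho, Q, W) - E_s(rho, P)."""
    return max(E0(i / steps, Q, W) - Es(i / steps, P)
               for i in range(steps + 1))
```

At ρ = 0 both functions vanish, so the exponent is positive exactly when the difference has positive slope at the origin, consistent with the transmissibility discussion later in the paper.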
For the mi-icd ensemble, we have a similar set-up as for the mi-iid ensemble, where Q_{ν,u_ν}(x_ν) may now depend on u_ν. In this case, the Gallager function E^{mi-icd}_τ(·) is given by Equation (38). As the summations over u_{τ^c} and u_τ are not independent from the rest, the Gallager function does not split into source and channel functions unless the sources are independent, in which case one can find an mi-iid ensemble with a tilted unconditional input distribution and identical exponent. To this end, for a given conditional input distribution Q_{ν,u_ν} and parameter ρ, we define a tilted unconditional input distribution Q_{ν,ρ} as in Equation (39). From this definition we obtain the identity in Equation (40). Substituting this identity into Equation (38) and rearranging the result, we obtain the Gallager function for independent sources in Equation (41). For the md-iid and md-icd ensembles, there are K_ν source classes per user and no cost constraints, i.e., L_ν = 0 and R^{i_σ}_{σ,u_σ}(x_σ) = 1. As the summations over u_{τ^c} and u_τ are now independent from the rest, the Gallager function splits as in Equation (44). The maximization w.r.t. λ^{L,U}_N in Equation (30) only affects the second term on the r.h.s. of Equation (44), since the function Λ^{i_N}_N only appears in the source part of the exponent. In Appendix C, we discuss the properties of Equation (45) after the maximization w.r.t. λ^{L,U}_N as a function of ρ, and establish some connections to the Gallager source function (34) and to the source functions for the single-user md-iid ensemble in ref. [8].
The Gallager functions for the constant-composition ensembles differ from the ones considered so far in the presence of L_ν = |X_ν| cost functions a^{i_ν,ℓ_ν}_{ν,u_ν}(x_ν), given in Equation (26), for each input distribution Q^{i_ν}_{ν,u_ν}(x_ν). These cost functions appear in the Gallager functions through the factors R^{i_σ}_{σ,u_σ}(x_σ), for σ ∈ {τ, τ^c}, that multiply each appearance of Q^{i_σ}_{σ,u_σ}(x_σ) in the function, and through their associated optimization parameters r^{ℓ_N}_{N,u_N}. The expressions of the Gallager functions for these constant-composition ensembles can be easily inferred from this observation, so we focus on the factor R^{i_σ}_{σ,u_σ}(x_σ) itself.
For the mi-cc and md-cc ensembles, the cost functions a^{i_ν,ℓ_ν}_{ν,u_ν}(x_ν), the factor R^{i_σ}_{σ,u_σ}(x_σ), and the associated optimization parameters r^{ℓ_ν}_{ν,u_ν} are independent of u_ν; we thus write a^{i_ν,ℓ_ν}_ν(x_ν), R^{i_σ}_σ(x_σ), and r^{ℓ_ν}_ν. The expressions in Equations (26) and (33) for L_ν = |X_ν| give Equation (46). The exponent in Equation (46) can be evaluated as in Equation (47), where we have defined a function α^{i_ν}_{τ,ν}(x_ν) that depends on τ and i_ν through the optimization parameters r^{ℓ_ν}_ν. One can easily verify that α^{i_ν}_{τ,ν} has zero mean. At this point, the parameters r^{ℓ_ν}_ν may be replaced by the equivalent real-valued functions α^{i_ν}_{τ,ν}(x_ν). We obtain the mi-cc Gallager function E^{mi-cc}_τ(·) by setting i_N = 1 and λ^{L,U}_N = 0 in Equation (31), where we split the Gallager function into channel and source terms in analogy to Equation (37). In ref. [31] (Eq. (4.49)), the md-cc ensemble was studied for N = 2 users in both the primal and dual domains. The md-cc Gallager function E^{md-cc}_τ(·) for N users is obtained by combining the derivation of Equation (50) with that of Equation (44). As in previous cases, the exponent is obtained after maximization over α^{i_ν}_{τ,ν}. The cost functions and parameters r^{ℓ_ν}_{ν,u_ν} for the mi-ccc and md-ccc ensembles do depend on u_ν. In analogy to Equation (48), we define a zero-mean function α^{i_ν}_{τ,ν,u_ν}(x_ν) for the md-ccc ensemble, and similarly β_{τ,ν,u_ν}(x_ν) for the mi-ccc ensemble. The Gallager function for the mi-ccc ensemble E^{mi-ccc}_τ(·) is obtained by combining the derivations of Equation (50) and of Equation (38). Similarly, for the md-ccc ensemble, and in agreement with the 2-user case studied in ref. [31] (Eq. (4.45)), combining the derivations of Equations (50) and (43) yields Equation (54).

Transmissibility
We may obtain the transmissibility conditions from the achievable exponents derived in Section 4.1, following the random-coding method described in ref. [1] (Th. 5.6.4). The analysis extends the transmissibility condition for joint source-channel coding in ref. [1] (Prob. 5.16) to account for the statistical dependency of the codeword on the source message in the multi-user set-up. As mentioned above, the source U_N is transmissible over the channel W if there exists a sequence of codes with vanishing error probability, or equivalently, with strictly positive achievable error exponent E^cost in Equation (30). As an example, we present the derivation for the mi-icd ensemble, where the class and cost functions in Equations (32) and (33) are trivial. For the mi-icd case, and similarly to Gallager's single-user analysis, the achievable exponent is strictly positive, namely max_{ρ_τ} E^{mi-icd}_τ(ρ_τ, ·) > 0, as long as the slope of the E^{mi-icd}_τ(ρ, ·) function is strictly positive at ρ = 0, that is, (56). Taking the derivative with respect to ρ on both sides of Equation (38), after some algebraic manipulations, we find that (56) is equivalent to (57). We next write the expression on the left-hand side of the inequality (57) in terms of entropy and mutual information. We denote by H(P) the entropy of a source with distribution P [32] (Eq. (2.1)) and by I(Q, W) the mutual information of a channel W with input distribution Q [32] (Eq. (2.28)). For σ ∈ 2^N, we define a channel input distribution Q_{τ|σ}, conditioned on the source messages u_σ, as in Equation (58). The transmissibility condition (57) can then be compactly expressed as in Equation (59): the conditional entropy of the sources in τ given those in τ^c must be smaller than the corresponding conditional mutual information. As it stands, Q_{τ^c|τ^c} is "transparent", as it cancels inside the fraction, and the channel law may also be written as Q_{τ^c|τ^c}W, removing the conditioning in the mutual information. With N = {1, 2} in Equation (59), we recover the achievable Cover-El Gamal-Salehi region [11] (Eq. (3)).
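The entropy-versus-mutual-information comparison underlying condition (59) can be sketched numerically in the single-user special case, where the condition reduces to H(U) < I(X; Y). The helper names below are illustrative, and the channel in the usage note is a hypothetical binary symmetric channel.

```python
import math

def H(P):
    """Entropy (in nats) of a distribution given as a list of probabilities."""
    return -sum(p * math.log(p) for p in P if p > 0)

def I(Q, W):
    """Mutual information I(X;Y) (in nats) of channel W (rows W[x]) with input Q."""
    num_y = len(W[0])
    PY = [sum(Q[x] * W[x][y] for x in range(len(Q))) for y in range(num_y)]
    return sum(Q[x] * W[x][y] * math.log(W[x][y] / PY[y])
               for x in range(len(Q)) for y in range(num_y) if W[x][y] > 0)

def transmissible(P, Q, W):
    """Single-user analogue of condition (59): the source entropy must be
    strictly smaller than the mutual information of the channel."""
    return H(P) < I(Q, W)
```

For a BSC(0.1) with uniform input, I(Q, W) = log 2 − H_b(0.1) ≈ 0.368 nats, so a heavily biased source is transmissible while a uniform binary source is not.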

Numerical Examples
In this section, we present two simple examples showing that the exponent of the md-iid ensemble with only two classes (and associated input distributions) per user can be larger than that of the mi-iid ensemble. First, we consider two correlated discrete memoryless sources, N = 2 and N = {1, 2}, with alphabet U_ν = {0, 1} for both users ν ∈ N, and probability distribution P_N(u_1, u_2) given in matrix form in Equation (60). The sources are sent over a discrete memoryless multiple-access channel with input alphabets X_1 = X_2 = {1, 2, 3, 4, 5, 6} and output alphabet Y = {1, 2, 3, 4}. The channel transition probabilities are given by a 36 × 4 matrix W, such that W(·|x_1, x_2) is row x_1 + 6(x_2 − 1), as in Equation (61), where the 6 × 4 submatrices W_ℓ, ℓ = 1, . . . , 6, are given as follows. First, the submatrix W_1 corresponds to the point-to-point channel discussed in ref. [8] (Sec. IV.C), given in Equation (62) for k_1 = 0.045 and k_2 = 0.01. Let the m-th row of matrix W_1 be denoted by W_1(m). The matrix W_2 (resp. W_3) is a 6 × 4 matrix whose rows are all W_1(5) (resp. W_1(6)). The matrices W_4, W_5 and W_6 are given in Equation (63). The optimal achievable exponent [8] (Sec. IV.C) for the single-user channel W_1 in Equation (62) is related to two different distributions Q and Q†, given in vector form in Equations (64) and (65), with Q† = (1/4, 1/4, 1/4, 1/4, 0, 0).
We let each user employ these distributions in the md-iid ensemble, with input distribution in Equation (13) according to the source-message partitioning in Equation (12) with K_ν = 2 classes per user and thresholds γ_N = (γ_1, γ_2). Since we consider two input distributions for each user, the channel Gallager function max_{ρ∈[0,1]} E_0(ρ, Q^{i_τ}_τ, W Q^{i_{τ^c}}_{τ^c}) is not concave in ρ [8]. To find the md-iid exponent E^{md-iid}, we optimize over the class thresholds following the method in Appendix B with the Gallager function in Equation (44), exploit the properties of the source function in Equation (45) derived in Appendix C, and also find the optimal assignment of input distributions Q^{i_ν}_ν for each ν ∈ {1, 2}. In our setting, we have four possible assignments, given in Equations (66)-(69). We start our numerical discussion by assessing which of the four possible assignments in Equations (66)-(69) leads to a higher error exponent. For each possible pair of thresholds (γ_1, γ_2), we numerically calculate the optimal assignment Ω*(γ_N), given by Equation (70), and the corresponding achievable error exponent E^{md-iid}(γ_N), given by Equation (71), where the exponent function E^{i_N}_τ(γ_N) is given in Equation (A55). Figures 2 and 3 respectively show Ω*(γ_N) and E^cost(γ_N) for the valid range of γ_N. For most pairs of thresholds (γ_1, γ_2), assignments Ω_1 and Ω_3 lead to the highest exponent among the possible assignments, while assignments Ω_2 and Ω_4 are optimal only in a marginal region. Using this information, combined with the values of the achievable exponents in Figure 3, we determine the message-dependent exponent as in Equation (72). In this example, we obtained the achievable exponent E^{md-iid} = 0.2611, corresponding to the input distribution assignment Ω_1 in Equation (66) and optimal source-message partitioning γ_1 = 0.8469 and γ_2 = 0.6581. The optimal point γ*_N is shown by a white (black) bullet in Figure 2 (Figure 3). Alternatively, we may first optimize over γ_N and then over the assignments Ω_j.
To do so, we solve the system of equations (A58) in Appendix B to numerically determine the optimal thresholds γ*_N, and compute the exponent E^cost(Ω_j) as in Equation (73), where the exponent function E^{i_N}_τ(γ_N) is given in Equation (A55). We provide in Table 1 the values of the optimal thresholds γ*_N and exponents E^{i_N}_τ(γ*_N) under the different assignments Ω_j, for the three types of error τ and the four possible user classes i_N. For each assignment, the minimum over i_N and τ, as in Equation (73), is highlighted in gray, leading to the exponent E^cost(Ω_j). The message-dependent exponent is then obtained as in Equation (74), recovering the error exponent E^{md-iid} = 0.2611 for input distribution assignment Ω_1 obtained with the previous method in Equation (71).
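The outer optimization over input-distribution assignments described above can be sketched as an enumerate-and-argmax loop. The exponent values in the usage snippet are made-up placeholders for illustration only; they are not the paper's results.

```python
from itertools import product

def assignments(dists):
    """All |dists|^2 input-distribution assignments (Q_1, Q_2) for two users;
    with two candidate distributions these are the four Omegas of Eqs. (66)-(69)."""
    return list(product(dists, repeat=2))

def best_assignment(exponent, dists):
    """Omega* = argmax over the assignments of the supplied achievable-exponent
    callable, mirroring the selection rule in Equation (70)."""
    return max(assignments(dists), key=exponent)

# Hypothetical exponent values, for illustration only (labels 'Q'/'Qd' stand
# in for the two candidate distributions Q and Q-dagger).
toy = {('Q', 'Q'): 0.26, ('Q', 'Qd'): 0.20, ('Qd', 'Q'): 0.25, ('Qd', 'Qd'): 0.22}
```

In practice, `exponent` would evaluate the minimum over error types τ and classes i_N of E^{i_N}_τ at the optimized thresholds, as in Equation (73).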
In the second example, we consider the transmission of two independent discrete memoryless sources with identical source alphabets U_ν = {0, 1} and distributions induced by the marginals of Equation (60), given by P_1(0) = 0.01 and P_2(0) = 0.001. These sources are transmitted over the multiple-access channel with transition probability given by Equation (61), and are encoded using the md-iid ensemble with the input distribution assignments Ω_j in Equations (66)-(69). Following the same steps as in the correlated-sources case, in Table 2 we calculate the optimal thresholds γ*_N and exponents E^{i_N}_τ(γ*_N) for the possible input distribution assignments and determine the exponent of the md-iid ensemble using Equations (73) and (74). In this case, the optimal assignment is again Ω_1, with optimal source-message partitioning specified by the thresholds γ_1 = 0.8779 and γ_2 = 0.6933, achieving an exponent of E^{md-iid} = 0.2458, slightly smaller than that of the correlated sources.

Table 1. Correlated-sources optimal thresholds γ*_N and exponents E^{i_N}_τ(γ*_N) in Equation (73) for assignments Ω_j in Equations (66)-(69). For each assignment, the minimum over i_N and τ is highlighted in gray.

For the sake of completeness and for comparison purposes, we also calculate the exponent for the mi-iid ensemble described in Equation (8). In the absence of message dependence, for a given assignment Ω_j, the mi-iid exponent is given by Equation (75), where the exponent function E_τ is given by E_τ = max_ρ E^{mi-iid}_τ(ρ, P_N, Q_N, W), and E^{mi-iid}_τ is the Gallager function in Equation (37), described in the previous subsection. For both the correlated and independent sources described above, Table 3 presents the achievable exponents E_τ for each type of error τ and input distribution assignment (Q_1, Q_2), where Q_1 and Q_2 are either of Q and Q† in Equations (64) and (65). In our numerical example for correlated sources, the assignment with the highest exponent is (Q_1, Q_2) = (Q†, Q), giving an exponent of E^{mi-iid} = 0.2503, slightly smaller than that of the md-iid ensemble. In contrast, the mi-iid exponent for independent sources, according to the second part of Table 3, is found to be E^{mi-iid} = 0.2367 with input distribution (Q_1, Q_2) = (Q, Q†). In this case, the md-iid exponent E^{md-iid} is around 4% larger than the mi-iid one; this situation is in contrast with point-to-point communication, where the gain in exponent achieved by an ensemble with two distributions is typically smaller, for example, 1% in ref. [8]. Hence, message-dependent random coding with two class distributions, compared to iid random coding, may lead to a higher error-exponent gain in the MAC than in point-to-point communication.

Table 3. Mi-iid exponents E_τ in Equation (75) for two correlated and two independent sources vs. several input distribution assignments (Q_1, Q_2). For each assignment, the minimum over τ is highlighted in gray.

Comparison of the Random-Coding Achievable Error Exponents
From the numerical results presented in Section 4.3, as well as from refs. [8,20,28,31], we observe that message-dependent ensembles generally attain a larger exponent than their message-independent counterparts. We now compare the random-coding exponents for the ensembles presented in Section 3.1, whose Gallager functions were obtained in Section 4.1.
For independent sources, we found in Equation (42) that, for a given conditional input distribution Q_{ν,u_ν}(x_ν) and ρ, there exists an iid distribution Q_{ν,ρ}, given by Equation (39), with an identical Gallager function. Thus, the mi-iid and mi-icd ensembles attain the same exponent after maximization over the input distributions. Similarly, we conclude that the md-iid and md-icd ensembles attain the same exponent.
In ref. [31] (Prop. 2.9), it was proved that for point-to-point communication the exponent of the mi-ccc ensemble may be lower than that of the mi-cc ensemble. The same steps actually prove the same result for the MAC with independent sources. Thus, for the MAC with independent sources, the ordering in Equation (76) holds, and E_md-cc is thus the largest among the ensembles in Section 3.1 for an arbitrary input distribution. As discussed in ref. [29] (Th. 4), for optimal input distributions both E_md-cc and E_md-iid may coincide. Concerning the optimal partitioning into message classes, for point-to-point communication it is known that partitioning the source-message set into two classes is sufficient to attain the optimal error exponent [8,31] (Prop. 2.7). However, the proof of ref. [31] (Prop. 2.7) cannot be easily generalized to the MAC with independent sources. At the same time, we could not find an example showing that assigning more than two input distributions leads to a larger exponent. Hence, finding the number of input distributions sufficient to attain the optimal message-dependent exponent remains an open problem.
The comparisons in Equations (76) and (77) for correlated sources require, in general, more sophisticated machinery, and we consider here two simple cases. For the message-dependent md-icd and md-ccc ensembles, a comparison based on Equation (44) yields E_md-iid ≤ E_md-cc. Put together, for correlated sources the ordering in Equation (77) holds, suggesting that, as in the case of single-user communication, the use of constant-composition input distributions may lead to higher exponents than symbol-wise independent distributions when transmitting correlated sources over the MAC. Summarizing, proper choices of the cost functions recover the different coding schemes considered in Section 3.1, including message-dependent and message-independent versions of iid, independent conditionally distributed, constant-composition, and conditional constant-composition ensembles. Thanks to the flexibility of the generalized cost-constrained random-coding ensemble, the achievable exponents of the various ensembles can be compared and ranked, both numerically and analytically.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1
We start by bounding the average error probability over the generalized cost-constrained ensemble, P̄_e. Counting ties as errors, the random-coding union bound [2] (Th. 16) for joint source-channel coding is given by Equation (A1), where Q_N(x_N|u_N) is given by Equation (7), with every user using the generalized cost-constrained input distribution Q_ν(x_ν|u_ν) in Equation (24), and where x̂_N has the same distribution as x_N but conditioned on û_N rather than u_N, i.e., Q_N(x̂_N|û_N). The summation over û_N ≠ u_N can be split into 2^N − 1 distinct types of error events, indexed by the non-empty subsets in the power set of the user indices 2^N \ ∅, e.g., τ ∈ {{1}, {2}, {1, 2}} for N = 2, such that û_τc = u_τc and û_ν ≠ u_ν for all ν ∈ τ. Since min{1, a + b} ≤ min{1, a} + min{1, b}, we bound P̄_e as in Equation (A2), where P̄^τ_e is in turn given by Equation (A3), in which the inner probability is computed according to the distribution Q_τ(x̂_τ|û_τ), including only the users in the set τ, as x̂_τc = x_τc. We recall that [x_τc, x̂_τ] is the sorted merger of the channel inputs for the users in the sets τc and τ, in this case x_τc and x̂_τ, respectively.
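As a concrete illustration of this splitting (not part of the original derivation), the error-event types can be enumerated as the non-empty subsets τ of the user index set {1, . . . , N}:

```python
from itertools import combinations

def error_event_types(N):
    """All 2**N - 1 non-empty subsets of the user indices {1, ..., N}."""
    users = range(1, N + 1)
    return [set(c) for r in range(1, N + 1) for c in combinations(users, r)]

# For N = 2 this recovers tau in {{1}, {2}, {1, 2}}, as in the text.
```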
Next, we split the summation over u_N in Equation (A3) into classes i_N ∈ K_N defined by Equation (12), summing then over the messages belonging to the Cartesian product of the sets A^{i_N}_N. We note that codewords are generated according to distributions that depend on the class index of the source messages. Let D^{i_N}_{N,u_N} be the Cartesian product of the sets of codewords D^{i_ν}_{ν,u_ν} in Equation (23) for ν = 1, 2, . . . , N, and define the quantity in Equation (A4), where Q^{i_ν}_ν(x_ν|u_ν) is given by either Equation (24) or Equation (25). Then, the double outer summation of Equation (A3) over u_N and x_N can be written as in Equation (A6), where we split the summations over u_N and x_N into separate summations over u_τc and u_τ, and similarly with x_τc and x_τ, with the corresponding rearrangements in the probabilities, and wrote the term Q^{i_τc}_τc(x_τc|u_τc) in a way similar to Equation (A4). The inner summation of Equation (A3) can be split in an analogous manner based on the classes to which û_τ belongs, now indexed by the variable j_τ ∈ K_τ. We apply this fact together with Markov's inequality, with a parameter s ≥ 0 that implicitly depends on the error-event type τ and the indices i_τc, i_τ, and j_τ.
We bound the inner summation of Equation (A3) as in Equation (A8), where we also used that P_N(û_N) = P_τc(û_τc) P_{τ|τc}(û_τ|û_τc) = P_τc(u_τc) P_{τ|τc}(û_τ|u_τc) to rewrite the message probabilities. Inserting Equations (A6) and (A8) into Equation (A3), and using the inequality min{1, A} ≤ A^ρ, valid for A ≥ 0 and ρ ∈ [0, 1], we further bound P̄^τ_e as in Equation (A10), where, after some minor rearrangements, P̄^{τ,j_τ}_{e,i_τc i_τ} is in turn given by Equation (A11). Note that, for some conveniently chosen variables z_0 and z_{i_τ}, sets Z_0 and Z_{i_τ}, and functions f_0(z_0) and f^s_{i_τ}(z_0, z_{i_τ}), with i_τ ∈ K_τ, we can express P̄^{τ,j_τ}_{e,i_τc i_τ} as in Equation (A17). Next, putting Equation (A17) back into Equation (A10) and using an inequality, proved in Appendix A.1, valid for A_i ≥ 0 and 0 ≤ s_i ≤ 1, in the double summation over i_τ and j_τ in Equation (A10), the following upper bound holds, where we have moved the optimization over ρ̂_τ inside the summation over i_τ and renamed ρ̂_τ as ρ, with the dependence on the index i_τ kept implicit. Moreover, the expression for P̄^τ_{e,i_τc i_τ} is in fact given by P̄^{τ,j_τ}_{e,i_τc i_τ} in Equation (A11) after setting i_τ = j_τ, s = 1/(1+ρ), and rearranging terms, that is, Equation (A20). It remains to factorize Equation (A20) into a product of symbol distributions in order to obtain a single-letter expression for the exponent. We start by upper bounding the summations over the input messages u_τc and u_τ. For a list of users σ with corresponding messages u_σ, a list of class indices i_σ, and some function p^{i_σ}_σ(u_σ), we have Equation (A21), where we used the definition of the message sets A^{i_σ}_σ in Equation (12) and the identity in Equation (A22). Using the upper bound in Equation (A23), valid for a, b, c > 0 with λ_L, λ_U ≥ 0, together with the fact that the source-message classes are defined separately for each user, so that the source-message probabilities can be expressed in terms of P^ind_σ(u_σ) = ∏_{ν∈σ} P_ν(u_ν) similarly to Equation (2), we upper bound the r. h. s.
of Equation (A21) as in Equation (A24), where we jointly wrote λ^L_σ and λ^U_σ as λ^{L,U}_σ. Defining the function in Equation (A25) and taking into account that the sources are memoryless, we obtain that the summations w. r. t. the source messages u_τ and u_τc in Equation (A20) are upper bounded as in Equation (A26), respectively for σ = τ and σ = τc. We proceed in a similar manner for the summations w. r. t. the codewords x_τ and x_τc in Equation (A20). For a list of users σ and some function q^{i_σ}_σ(u_σ, x_σ), implicitly defined, the summation over channel codewords x_σ ∈ D^{i_σ}_{σ,u_σ} can be upper bounded as in Equations (A27)-(A29): we used Equation (A22) in Equation (A27); in Equation (A28), the fact that the codeword ensembles are defined separately for each user, together with the definition of the ensemble cost constraints in Equation (23) and the subcodewords x_ν I_{u_ν}(u_ν); and, in Equation (A29), a variant of Equation (A23), proved in Appendix A.2, valid for r ∈ R and r̄ ≥ 0, applied to each indicator function of Equation (A28), after which we combined the product of exponentials over σ into a single exponential using the list notation. We continue by rewriting the double product over υ_σ and σ in Equation (A29) as follows: in Equation (A31) we wrote the cost function in terms of the symbol costs, and in Equation (A32) we rearranged terms and introduced a factor β_σ and a function R^{i_σ}_{σ,u_σ}(x_σ), both depending on the list {r_σ σ υ_σ}, respectively given by Equations (A33) and (A34), for both user lists σ = τ and σ = τc. We now combine Equations (A26) and (A35) for σ = τ to bound the summation inside the parenthesis in Equation (A20) as in Equation (A36), where we expressed the distribution Q^{i_τ}_τ(x_τ, u_τ) in terms of the symbol-wise iid distribution Q^{i_τ}_{τ,u_τ}(x), as in Equation (25).
Since both the source and the channel are memoryless, we may now factorize and rearrange the expression in Equation (A36) into single-letter, symbol-wise factors, as in Equation (A37), where, for a list of users σ, the function g^{i_σ}_σ(u_σc, x_σc, y) is defined in Equation (A38). Although not made explicit, the function g^{i_τ}_τ(u_τc, x_τc, y) in Equation (A37) depends on several optimization parameters, namely ρ, λ^{L,U}_τ, r_τ τ u_τ, and r̄_τ τ u_τ, which in turn depend on the error-event type τ and the class indices i_τ and i_τc.
Again, we use Equations (A26) and (A35) for σ = τc and the fact that the source is memoryless to upper bound the summation outside the parenthesis in Equation (A20) as in Equation (A40). With this definition, we can rewrite Equation (A40) in a compact manner as in Equation (A42), where we have also combined the complementary lists i_τc and i_τ into i_N, and similarly for the remaining quantities. Finally, substituting Equation (A42) into Equation (A10) and then back into Equation (A2), taking (minus) the logarithm of the bound on P̄_e, dividing the result by n, and taking the limit as n → ∞, we obtain a lower bound to the exponent of the generalized cost-constrained ensemble, E^cost_KL, where we have used that, as n → ∞, the quantities 2K_τ, β_τc Ξ_τc, and (β_τ Ξ_τ)^{1+ρ} are subexponential in the blocklength n and do not contribute to the exponent; accordingly, we removed r̄_τc τc u_τc and r_τ τ u_τ from the optimization parameter list, and finally used that the exponential decay of the error probability in Equation (A2) is dominated by the worst error type τ and the worst class assignment i_N. It will prove convenient to express the exponent in terms of a Gallager function; using the functions given by Equations (A25) and (A34), we may express it as in Equation (A45), or equivalently in the alternative form of Equation (A46), where we have moved the product inside the parenthesis and merged the terms in τ and τc as done above, and redefined the optimization parameters, divided by 1 + ρ, as λ^L_τc, λ^U_τc, and r_τc τc u_τc, respectively.

In this section we find some conditions describing the optimal partitioning of the source-message set into classes for the optimization of the exponent in Equation (30). For simplicity, let each user ν ∈ N have two classes, K_ν = 2.
From the class definition in Equation (12) with K_ν = 2, we have that γ_ν,2 = 0 and γ_ν,0 = 1, so we only need to find one optimal threshold γ_ν,1 for each user, which we redefine as γ_ν. When optimizing the exponent in Equation (30), one of the parameters λ^L_N or λ^U_N is zero for each i_N, as the corresponding constraint is absent. For each γ_N, we have a minimization over 2^N assignments i_N. Following the same steps as in refs. [31] (Sec. 4.1.2) and [31] (Lemma 4.3), we find that E^{i_N}_τ(γ_N), defined with some abuse of notation as the exponent for a fixed class assignment i_N, is a non-decreasing (resp. non-increasing) function of γ_ν for i_N = [i_ν, i_νc] with i_ν = 1 (resp. i_ν = 2), irrespective of the values of i_νc and of ν. For the sake of completeness, we present an independent proof of this fact here. Let i_ν = 1 and τ be arbitrary. Using Equation (31), the function E^{i_N}_τ(γ_N, ρ) has the form −log(∑_z f_1(z)/γ_ν^{λ^L_ν}) for some function f_1(z), as all the thresholds in γ_N are independent of each other, regardless of the value of i_νc. Since λ^L_ν ≥ 0, the function E^{i_N}_τ(γ_N, ρ) in Equation (A55) is non-decreasing with respect to γ_ν. When i_ν = 2, this function has the form −log(∑_z f_2(z) γ_ν^{λ^U_ν}) for some f_2(z), and is therefore non-increasing. This behavior does not change after maximizing over ρ. As the minimum of monotonic functions is monotonic, the function E^{i_N}_τ(γ_N) is non-decreasing (non-increasing) with respect to γ_ν when i_ν = 1 (i_ν = 2).
Therefore, the optimal γ_ν satisfies Equation (A58) whenever a solution in [0, 1] exists; otherwise, we have γ_ν = 0 or γ_ν = 1. Since Equation (A58) holds for any ν, evaluating it for each ν gives a system of equations for the computation of the optimal thresholds.
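Numerically, the monotonicity established above suggests a simple bisection for each threshold. The sketch below is a hypothetical illustration: it assumes that, for fixed values of the other users' thresholds, the per-user exponent is the minimum of a non-decreasing branch f (assignments with i_ν = 1) and a non-increasing branch g (i_ν = 2), so that the optimal γ_ν either equalizes the two branches, as in Equation (A58), or sits at an endpoint of [0, 1].

```python
def optimal_threshold(f, g, tol=1e-9):
    """Maximizer of min(f, g) on [0, 1] for non-decreasing f and
    non-increasing g: the crossing point, or a boundary point."""
    lo, hi = 0.0, 1.0
    if f(lo) >= g(lo):   # f dominates everywhere: min(f, g) = g, max at 0
        return lo
    if f(hi) <= g(hi):   # g dominates everywhere: min(f, g) = f, max at 1
        return hi
    while hi - lo > tol:  # bisect to the crossing f(gamma) = g(gamma)
        mid = 0.5 * (lo + hi)
        if f(mid) < g(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In the multi-user setting described above, one such equation per user would be iterated, reflecting the coupling between thresholds noted in the text.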
In ref. [31] (Sec. 3.2.1.1), we give a graphical interpretation of the solutions to Equation (A58) and outline the relevant differences with the single-user case. We observe a strong coupling between the exponent and the thresholds that prevents us from finding the optimal number of classes, suggesting that, unlike in the single-user case, two classes might not be sufficient.
The classes in Equations (A59) and (A60) are specified using a single threshold γ_ν for each user ν ∈ {1, 2}. With some abuse of notation, we include the optimization w. r. t. λ^{L,U}_N and make explicit the dependence on the thresholds γ_N in the expression of the source function E^{i_N}_{s,τ} in Equation (45), namely Equation (A61). For i_ν = 1, the set A^1_ν(γ_ν) in Equation (A59) has no upper threshold; hence, the optimal parameter λ^U_ν in this case is λ̄^U_ν = 0. Similarly, for i_ν = 2, we obtain that λ̄^L_ν = 0. As a consequence, and without any loss of generality, we define λ_ν = λ^L_ν for i_ν = 1 and λ_ν = λ^U_ν for i_ν = 2, and further simplify Equation (A61) to the optimization problem in Equation (A62), where we also used the definition of the functions Λ^{i_σ}_σ in Equation (32) with σ = {1, 2}. We recall that P_1 and P_2 are the marginal distributions for users ν = 1 and ν = 2, respectively, and that the indices i_ν ∈ {1, 2} indicate that user ν transmits a source message selected from the class A^{i_ν}_ν(γ_ν) in Equations (A59) and (A60). It can be shown that the objective function on the r. h. s. of Equation (A62) is convex w. r. t. both λ_1 and λ_2. Hence, the minimizers λ̄_1 and λ̄_2 in the source function E^{i_N}_{s,τ}(ρ, P_N, γ_N) are respectively given by λ̄_1 = max{λ*_1, 0} and λ̄_2 = max{λ*_2, 0}, where λ*_1 and λ*_2 are the unique solutions obtained by setting the partial derivatives of the r. h. s. of Equation (A62) to zero. Two special cases can be obtained from Equation (A62).
The first case is when γ_ν = 1 for ν ∈ {1, 2}, implying that no message partitioning occurs whatsoever. In such a case, we have that λ̄_1 = λ̄_2 = 0, and Equation (A62) reduces to the joint source-channel coding source function for correlated sources in Equation (34), i.e., Equation (A63). The second case is that of independent sources. Substituting P_N = P_1 P_2 into Equation (A62), after some algebra, we obtain that E^{i_N}_{s,τ}(ρ, P_N, γ_N) can be split into two terms, as in Equation (A64), where we defined the function E^i_s(ρ, P, γ) as in Equation (A65), for an arbitrary class index i ∈ {1, 2}, source distribution P, and threshold γ. First, we find that the unique solution obtained by setting the derivative of the r. h. s. of Equation (A65) to zero, denoted as λ*, is implicitly given by Equation (A66), where we made a convenient change of variable to α*. Although not made explicit, λ* depends on (i, P, ρ, γ). When λ* < 0, we have that λ̄ = max(0, λ*) = 0, implying that Equation (A65) simplifies accordingly, where E_s(ρ, P) is the Gallager source function E_s(ρ, P) = (1 + ρ) log ∑_u P(u)^{1/(1+ρ)} (A70). Otherwise, when λ̄ = λ* ≥ 0, a regime given by the reverse inequality, we may substitute λ = λ* into the objective function in Equation (A65) to obtain Equation (A72), written in terms of α*. Using Equation (A66) in Equation (A72) to replace log(γ), we get Equation (A73). After some algebra, we are able to express the former equation in terms of the derivative E_s'(ρ, P) of the E_s-function in Equation (A70) and the E_s-function itself, as E^i_s(ρ, P, γ) = E_s(α*, P) + (ρ − α*) E_s'(α*, P).
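These closed-form expressions can be checked numerically. The sketch below implements the Gallager source function of Equation (A70), assuming its standard form E_s(ρ, P) = (1 + ρ) log ∑_u P(u)^{1/(1+ρ)} with natural logarithms (the exponents in the extracted display were lost), its derivative via a finite difference, and the tangent construction E_s(α*, P) + (ρ − α*) E_s'(α*, P); by convexity of E_s in ρ, the tangent line lower-bounds the function.

```python
import numpy as np

def E_s(rho, P):
    """Gallager source function (natural log), standard form assumed."""
    P = np.asarray(P, dtype=float)
    return (1.0 + rho) * np.log(np.sum(P ** (1.0 / (1.0 + rho))))

def E_s_prime(rho, P, h=1e-6):
    """Derivative of E_s w.r.t. rho via a central finite difference."""
    return (E_s(rho + h, P) - E_s(rho - h, P)) / (2.0 * h)

def tangent(rho, alpha, P):
    """Straight line tangent to E_s at alpha, evaluated at rho."""
    return E_s(alpha, P) + (rho - alpha) * E_s_prime(alpha, P)

# Sanity checks: E_s(0, P) = 0 for any P; a uniform source on M symbols
# gives E_s(rho) = rho * log(M); the tangent lies below E_s (convexity).
```

For independent sources, this is exactly the straight-line behavior of the source functions discussed below; for correlated sources, the tangent object is a curve rather than a line.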
Once E^i_s(ρ, P, γ) in Equation (A65) is fully characterized, we may now discuss the correlated-sources source function E^{i_N}_{s,τ}(ρ, P_N, γ_N) in Equation (A64) in terms of the error type τ. We start with the third error type, τ = {1, 2}, for which, since τc = ∅, we have that E^{i_N}_{s,τ}(ρ, P_N, γ_N) = E^{i_1}_s(ρ, P_1, γ_1) + E^{i_2}_s(ρ, P_2, γ_2), namely the superposition of two E^i_s functions as in Equations (A76) and (A77), one for each user. For the remainder of this appendix, we consider the more informative error types τ = {1} and τ = {2} for the four possible pairs of class indices i_1 and i_2 in Equations (A59) and (A60), since in this case E^{i_N}_{s,τ} in Equation (A64) is either directly an E_s(ρ, P_τ) function or the straight line tangent to it, in both cases shifted by a constant term given by E^{i_τc}_s(0, P_τc, γ_τc). Figure A1 shows the family of E^{i_N}_{s,τ} source functions for independent and correlated sources, respectively, as functions of ρ, with P_N given by Equation (60) and τ = {1}. For independent sources, we observe that the source functions E^{1,1}_{s,τ} and E^{2,1}_{s,τ} follow the solid blue line depicting E_s(ρ, P_τ) as in Equation (A70) over a certain interval of ρ, and then take the tangent line beyond it. A similar behavior is observed for the source functions E^{1,2}_{s,τ} and E^{2,2}_{s,τ}, which in this case follow or are tangent to the solid black line, namely the solid blue Gallager source function shifted by the constant E^{i_τc}_s(0, P_τc, γ_τc) as in Equation (A64). For correlated sources, the source functions E^{1,1}_{s,τ} and E^{2,1}_{s,τ} follow the generalized Gallager source function given by Equation (A63) over a certain interval but, unlike for independent sources, beyond that interval they are not straight lines but curves tangent to E_s,τ. Some intuition about this fact can be gained from the primal form of the source function E^{i_N}_{s,τ}.
Consider, for instance, the source function E^{2,1}_{s,τ} in Figure A1 for correlated sources, for which i_1 = 2 and i_2 = 1. The primal form of this source function can be obtained as a constrained optimization problem w. r. t. some auxiliary joint distribution P̃_N. The interval in ρ where E^{2,1}_{s,τ} does not follow E_s,τ in the dual form (approximately ρ ≤ 0.5 in the figure) corresponds to the case where only one of the two constraints on the auxiliary distribution P̃_N is actually active in the primal form, the constraint being ∑_{u_N} P̃_N(u_N) log P_ν(u_ν) = log(γ_ν). This implies that, unlike the case of independent sources, where each source has its own constrained auxiliary distribution P̃_1 and P̃_2, for correlated sources the joint auxiliary distribution P̃_N is not fully constrained but is the union of joint distributions with one constrained marginal distribution. This partial constraint manifests itself as a curve in ρ, rather than a straight line, in the dual form. A similar behavior is observed for E^{1,2}_{s,τ} and E^{2,2}_{s,τ}: instead of following the source function for joint source-channel coding in Equation (A63) over some intervals of ρ, they follow the curve corresponding to Equation (A62) when the constraint for one source is not active, i.e., λ̄_νc = 0.