Appendix A.1. Proof of Theorem 1
(i)
Converse. Consider Fano’s inequality
$H(W|{Y}^{n})\le 1+{P}_{e}^{(n)}{log}_{2}M\stackrel{\mathsf{\Delta}}{=}F$. Since
$W\to ({X}_{1}^{n},{X}_{2}^{n})\to {Y}^{n}$ forms a Markov chain, we have that
Moreover, from the Markovity of
$(W,{W}_{12})\to ({X}_{1}^{n},{X}_{2}^{n})\to {Y}^{n}$, we obtain that
In the above derivations, the random variable
Q is uniformly distributed on
$[1:n]$ and
$Pr\{{X}_{1}={x}_{1},{X}_{2}={x}_{2}\}=\frac{1}{n}{\sum}_{i=1}^{n}Pr\{{X}_{1i}={x}_{1},{X}_{2i}={x}_{2}\}$ for
${x}_{1}\in {\mathcal{X}}_{1},{x}_{2}\in {\mathcal{X}}_{2}$. If now both
$\delta \to 0$ and
$n\to \infty $, we obtain that
for some distribution
$p({x}_{1},{x}_{2},y)=p({x}_{1},{x}_{2})p(y|{x}_{1},{x}_{2})$, for every achievable rate
R. This concludes the converse for the discrete memoryless two-encoder case.
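The displayed equations of this converse are not reproduced above; for reference, they follow the standard single-letterization chain, sketched here (a reconstruction consistent with the definitions of F and Q above, not the verbatim original):

```latex
% Standard converse chain (sketch): Fano, data processing over the
% Markov chain W -> (X_1^n, X_2^n) -> Y^n, memorylessness, and the
% time-sharing variable Q uniform on [1:n].
\begin{align*}
n R = H(W) &= I(W; Y^n) + H(W \mid Y^n) \\
           &\le I(X_1^n, X_2^n; Y^n) + F \\
           &\le \sum_{i=1}^{n} I(X_{1i}, X_{2i}; Y_i) + F \\
           &= n\, I(X_{1Q}, X_{2Q}; Y_Q \mid Q) + F
            \le n\, I(X_1, X_2; Y) + F .
\end{align*}
```

Dividing by n and letting $n\to \infty $ (so that $F/n\to 0$ when ${P}_{e}^{(n)}\to 0$) then yields the single-letter bound.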
(ii) Achievability. We prove that if the message rate satisfies $(1/n){log}_{2}M<{C}_{B}({C}_{12})$ for a given fronthaul capacity ${C}_{12}$, then the message error probability ${P}_{e}^{(n)}$ approaches zero as the codeword length n increases. Our coding method is based on superposition coding.
Codebook Generation: First fix a joint probability distribution $\{p({x}_{1},{x}_{2}),{x}_{1}\in {\mathcal{X}}_{1},{x}_{2}\in {\mathcal{X}}_{2}\}$. This distribution determines ${p}_{{X}_{2}}({x}_{2})={\sum}_{{x}_{1}\in {\mathcal{X}}_{1}}p({x}_{1},{x}_{2})$ and ${p}_{{X}_{1}|{X}_{2}}({x}_{1}|{x}_{2})=p({x}_{1},{x}_{2})/{p}_{{X}_{2}}({x}_{2})$ for ${x}_{2}$ with ${p}_{{X}_{2}}({x}_{2})>0$. Now generate at random ${M}_{2}$ i.i.d. sequences ${x}_{2}^{n}\in {{\mathcal{X}}_{2}}^{n}$ of length n, each drawn according to $Pr\{{X}_{2}^{n}={x}_{2}^{n}\}={\prod}_{i=1}^{n}{p}_{{X}_{2}}({x}_{2i})$ and index these sequences as ${x}_{2}^{n}({w}_{2})$ as an inner code, where ${w}_{2}\in [1:{M}_{2}]$. Then, for each such ${x}_{2}^{n}({w}_{2})$, generate ${M}_{1}$ sequences ${x}_{1}^{n}({w}_{1},{w}_{2})$ drawn according to $Pr\{{X}_{1}^{n}={x}_{1}^{n}|{X}_{2}^{n}={x}_{2}^{n}({w}_{2})\}={\prod}_{i=1}^{n}{p}_{{X}_{1}|{X}_{2}}({x}_{1i}|{x}_{2i}({w}_{2}))$ in an i.i.d. fashion as an outer code, where ${w}_{1}\in [1:{M}_{1}]$. The resulting codebook is revealed to both encoders and to the decoder.
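The two-stage construction above (inner code drawn from ${p}_{{X}_{2}}$, outer code superposed via ${p}_{{X}_{1}|{X}_{2}}$) can be sketched in a few lines; the function name, alphabets, and pmf below are illustrative, not from the paper:

```python
import random

def generate_superposition_codebook(n, M1, M2, p_joint, X1, X2, seed=0):
    """Toy superposition codebook: inner code x2^n(w2) drawn i.i.d. from
    p(x2); outer code x1^n(w1, w2) drawn i.i.d. from p(x1 | x2) along
    the chosen inner codeword. p_joint maps (x1, x2) -> probability."""
    rng = random.Random(seed)
    # Marginal p(x2) and conditional p(x1 | x2) induced by p(x1, x2).
    p_x2 = {b: sum(p_joint[(a, b)] for a in X1) for b in X2}
    p_x1_given_x2 = {b: [p_joint[(a, b)] / p_x2[b] for a in X1]
                     for b in X2 if p_x2[b] > 0}
    # Inner code: M2 i.i.d. length-n sequences from p(x2).
    inner = [tuple(rng.choices(X2, weights=[p_x2[b] for b in X2], k=n))
             for _ in range(M2)]
    # Outer code: for each inner codeword, M1 sequences drawn
    # symbol-by-symbol from p(x1 | x2i) (the superposition step).
    outer = {}
    for w2, x2n in enumerate(inner):
        for w1 in range(M1):
            outer[(w1, w2)] = tuple(
                rng.choices(X1, weights=p_x1_given_x2[b], k=1)[0]
                for b in x2n)
    return inner, outer

# Toy example: binary alphabets, n = 6, M1 = 2, M2 = 3.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
inner, outer = generate_superposition_codebook(6, 2, 3, p, [0, 1], [0, 1])
```

Encoding the split message $({w}_{1},{w}_{2})$ then amounts to encoder 2 sending `inner[w2]` while encoder 1 sends `outer[(w1, w2)]`.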
Encoding: Split the message W, uniformly distributed on $[1:M]$, into $({W}_{1},{W}_{2})$ with $M={M}_{1}\times {M}_{2}$. The first part ${W}_{1}$, uniformly distributed on $[1:{M}_{1}]$, is transmitted by encoder 1 alone; the second part ${W}_{2}$, uniformly distributed on $[1:{M}_{2}]$ and conveyed to encoder 2 via ${W}_{12}$, is transmitted by both encoders cooperatively. Hence, when $({W}_{1},{W}_{2})=({w}_{1},{w}_{2})$, encoder 2 sends ${x}_{2}^{n}({w}_{2})$ while encoder 1 inputs ${x}_{1}^{n}({w}_{1},{w}_{2})$ into the MAC.
Decoding: Let
$\u03f5>0$. Based on the observed channel output sequence
${y}^{n}$, the decoder finds the message pair
$({w}_{1},{w}_{2})$ such that
where
${\mathcal{A}}_{\u03f5}^{(n)}({X}_{1}{X}_{2}Y)$ denotes the set of jointly
$\u03f5$-typical sequences; see Cover and Thomas [
18]. If no such pair can be found, or if more than one such pair exists, an error is declared.
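For concreteness, weak joint $\u03f5$-typicality in the sense of Cover and Thomas can be tested by comparing, for every non-empty subset of the three sequences, the empirical log-likelihood rate against the entropy of the corresponding marginal. A minimal sketch (function names are illustrative):

```python
import itertools
import math

def entropy(pmf):
    """Entropy in bits of a pmf given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(p_joint, idx):
    """Marginal of p_joint (keyed by tuples) on the coordinates idx."""
    out = {}
    for sym, p in p_joint.items():
        key = tuple(sym[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def jointly_typical(seqs, p_joint, eps):
    """Weak joint eps-typicality: for every non-empty subset S of the
    coordinates, -(1/n) log2 p_S(projected sequence) must be within
    eps of H(p_S)."""
    n, m = len(seqs[0]), len(seqs)
    for r in range(1, m + 1):
        for idx in itertools.combinations(range(m), r):
            pmf = marginal(p_joint, idx)
            ll = 0.0
            for t in range(n):
                p = pmf.get(tuple(seqs[i][t] for i in idx), 0.0)
                if p == 0.0:
                    return False          # impossible symbol: not typical
                ll -= math.log2(p)
            if abs(ll / n - entropy(pmf)) > eps:
                return False
    return True

# Under the uniform joint pmf, every sequence triple is typical.
p_uniform = {(a, b, c): 0.125
             for a in (0, 1) for b in (0, 1) for c in (0, 1)}
ok = jointly_typical(((0, 1, 0, 1), (1, 1, 0, 0), (0, 0, 1, 1)),
                     p_uniform, 0.01)
```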
Probability of Error: Due to symmetry, the average probability of error equals the probability of error for an arbitrary message
$w\in \{1,\dots ,{2}^{nR}\}$. Hence, without loss of generality, we assume
$W=w=({w}_{1},{w}_{2})$. Thus, we have
where
Due to the Asymptotic Equipartition Property (AEP), it can be shown that
for all
n large enough. Moreover
and
Now as long as
${P}_{e}^{(n)}\le 2\u03f5$ for all
n large enough. Therefore we take
then both (
9) and (
A10) are satisfied. Note that this implies that
If we now let
$\u03f5\to 0$, the achievability part of Theorem 1 is established.
Appendix A.2. Proof of Theorem 3
The proof is a generalization of the proof for the two-encoder setting. Consider a simplified block diagram of the
N-encoder setting as shown in
Figure A1. Now, consider a cut of the fronthaul link between
${X}_{m}$ and
${X}_{m+1}$ for any given
$m\in [1:N-1]$ such that the nodes in the network are separated into the two sets
$\{{\underline{X}}_{1}^{m}\}$ and
$\{{\underline{X}}_{m+1}^{N},Y\}$.
Figure A1.
Simplified illustration of the MAC for the N-encoder setting.
(i)
Converse. Consider the Markovity of
$W\to ({X}_{1}^{n},{X}_{2}^{n},$ $\dots ,{X}_{N}^{n})\to {Y}^{n}$. By applying Fano’s inequality, we first have
Then, considering the cut between
${X}_{m}$ and
${X}_{m+1}$, we have that
Note that the above result is valid for any
m in
$[1:N-1]$. Thus, by letting
$n\to \infty $, the converse follows.
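In summary, the cut between ${X}_{m}$ and ${X}_{m+1}$ is crossed by the fronthaul link of capacity ${C}_{m,m+1}$ and by the channel contribution of the inputs on the message side of the cut; a sketch of the resulting bound (a reconstruction in the spirit of the cut-set argument, since the displayed equations are not reproduced above) is:

```latex
% Cut-set sketch: for each m in [1:N-1], the rate across the cut is
% bounded by the fronthaul capacity C_{m,m+1} plus the conditional
% mutual information carried by X_1, ..., X_m directly over the MAC.
\begin{equation*}
R \le \min\!\Big\{\, I(X_1, \dots, X_N; Y),\;
      \min_{m \in [1:N-1]} \big[\, I(X_1, \dots, X_m; Y \mid X_{m+1}, \dots, X_N)
      + C_{m,m+1} \,\big] \Big\}.
\end{equation*}
```

This reduces to the two-encoder bound when $N=2$, where the single cut contributes $I({X}_{1};Y|{X}_{2})+{C}_{12}$.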
(ii)
Achievability. First consider the message
W that can be represented by
N independent messages as
$W={\{{W}_{i}\}}_{i=1}^{N}$, where each
${W}_{i}$ is uniformly distributed on
$[1:{M}_{i}]$ with
${\prod}_{i=1}^{N}{M}_{i}=M$. Then, given the linear topology of the encoders, we distribute
${\{{W}_{i}\}}_{i=1}^{N}$ into the network in the manner illustrated by
Figure A2, i.e., for the link between any
${X}_{m}$ and
${X}_{m+1}$, the fronthaul message
${W}_{m,m+1}$ conveys corresponding messages
${\{{W}_{i}\}}_{i=m}^{N}$. Therefore, for a fixed distribution
$p({x}_{1},{x}_{2},\dots ,{x}_{N})$ and corresponding marginals, we can first generate
${M}_{N}$ i.i.d.
n-sequences
${x}_{N}^{n}({w}_{N})$ with
${w}_{N}\in [1:{M}_{N}]$ according to
$Pr({X}_{N}^{n}={x}_{N}^{n})={\prod}_{i=1}^{n}{p}_{{X}_{N}}({x}_{Ni})$ and then for each
${x}_{N}^{n}({w}_{N})$ generate
${M}_{N-1}$ i.i.d.
n-sequences
${x}_{N-1}^{n}({w}_{N-1},{w}_{N})$ with
${w}_{N-1}\in [1:{M}_{N-1}]$ according to
$Pr({X}_{N-1}^{n}={x}_{N-1}^{n}|{X}_{N}^{n}={x}_{N}^{n}({w}_{N}))={\prod}_{i=1}^{n}{p}_{{X}_{N-1}|{X}_{N}}({x}_{N-1\phantom{\rule{0.166667em}{0ex}}i}|{x}_{Ni}({w}_{N}))$ and so on. In this way, an
N-layer superposition codebook is generated and revealed at both encoders and decoder.
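The layered construction can be sketched recursively: the outermost layer is drawn from the marginal of ${X}_{N}$, and each inner layer m is drawn symbol-by-symbol from the conditional of the joint pmf given the already-drawn outer layers. A toy sketch (the data layout and function names are illustrative; conditioning each layer on all outer layers is one way to read "and so on"):

```python
import random

def marg(p_joint, fixed):
    """Sum p_joint over all coordinates not fixed in `fixed`
    (fixed maps coordinate index -> required symbol)."""
    return sum(p for sym, p in p_joint.items()
               if all(sym[i] == v for i, v in fixed.items()))

def nlayer_codebook(n, Ms, p_joint, alphabets, seed=0):
    """N-layer superposition codebook. Ms[m] is the message count of
    layer m+1; p_joint maps N-tuples to probabilities. Returns
    books[m][(w_{m+1}, ..., w_N)] = length-n codeword of encoder m+1."""
    N = len(Ms)
    rng = random.Random(seed)
    books = [dict() for _ in range(N)]
    # Outermost layer: i.i.d. from the marginal of coordinate N-1.
    wN_weights = [marg(p_joint, {N - 1: a}) for a in alphabets[N - 1]]
    for wN in range(Ms[N - 1]):
        books[N - 1][(wN,)] = tuple(
            rng.choices(alphabets[N - 1], weights=wN_weights, k=n))
    # Inner layers N-1, ..., 1: condition on the outer layers' symbols.
    for m in range(N - 2, -1, -1):
        for outer_idx in list(books[m + 1]):
            for wm in range(Ms[m]):
                word = []
                for t in range(n):
                    fixed = {j: books[j][outer_idx[j - m - 1:]][t]
                             for j in range(m + 1, N)}
                    weights = [marg(p_joint, {**fixed, m: a})
                               for a in alphabets[m]]
                    word.append(
                        rng.choices(alphabets[m], weights=weights, k=1)[0])
                books[m][(wm,) + outer_idx] = tuple(word)
    return books

# Toy example: N = 3 binary encoders, n = 4, two messages per layer.
p3 = {(a, b, c): 0.125 for a in (0, 1) for b in (0, 1) for c in (0, 1)}
books = nlayer_codebook(4, [2, 2, 2], p3, [[0, 1], [0, 1], [0, 1]])
```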
Figure A2.
N-layer superposition coding message structure.
Thus, for sending a message
w, the encoders transmit the sequences
${\{{x}_{m}^{n}({w}_{m},{w}_{m+1},\dots ,{w}_{N})\}}_{m=1}^{N}$ over the MAC. At the decoder, a unique message tuple
$({w}_{1},{w}_{2},\dots ,{w}_{N})$ is found by simultaneous typicality decoding, as in the two-encoder case. A similar probability-of-error analysis shows that, as long as
we can have
${P}_{e}^{(n)}\le 2\u03f5$ for all sufficiently large
n and any
$\u03f5>0$. By further considering
${M}_{N}\le {2}^{n{C}_{N-1,N}}$,
${M}_{N-1}{M}_{N}\le {2}^{n{C}_{N-2,N-1}}$,
…, and
${\prod}_{m=2}^{N}{M}_{m}\le {2}^{n{C}_{12}}$, we can subsequently take
Finally, observe that (again subsequently)
which establishes the achievability for
$n\to \infty $ and
$\u03f5\to 0$.
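In rate terms, the chain ${M}_{N}\le {2}^{n{C}_{N-1,N}}$, ${M}_{N-1}{M}_{N}\le {2}^{n{C}_{N-2,N-1}}$, …, says that each fronthaul link must carry the total rate of all message layers conveyed over it. A small feasibility check matching those product constraints (the values and function name are illustrative, not from the paper):

```python
def fronthaul_feasible(rates, caps):
    """rates[i]: per-symbol rate (1/n) log2 M_{i+1} of layer i+1
    (i = 0..N-1); caps[m]: fronthaul capacity C_{m+1,m+2} of the link
    between encoders m+1 and m+2 (m = 0..N-2). Link (m+1, m+2) must
    carry the messages W_{m+2}, ..., W_N, so its constraint is
    sum(rates[m+1:]) <= caps[m], i.e. M_{m+2} ... M_N <= 2^{n C}."""
    return all(sum(rates[m + 1:]) <= caps[m] for m in range(len(caps)))

# N = 3 layers with rates 0.5, 0.3, 0.2 bits/symbol; both links feasible.
ok = fronthaul_feasible([0.5, 0.3, 0.2], [0.6, 0.25])
```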
Appendix A.4. Proof of Proposition 2
Three steps are taken in the proof. In step (1) we show that an active compound mode
$\langle j,k\rangle $ achieves the capacity if
$LB\le UB$ is satisfied. In step (2) we show that using any two separated active modes (all other modes being inactive) does not achieve the capacity. In step (3) we show that the exact solutions (
54) and (
55) follow.
Step (1) Note that function
g given in (
42) is convex-∩ when
$\lambda \le \frac{1}{k-1}$ if the largest activated mode is
k. So, we set the partial derivatives of
g with respect to
${\{{\beta}_{i}\}}_{i=1}^{N}$ according to the Kuhn–Tucker conditions, see ([
23], eqn.4.4.10 and eqn.4.4.11), when the active
compound mode $\langle j,k\rangle $ achieves the capacity. By considering (
42) in nats, the partial derivative of function
g with respect to
${\beta}_{i}$ is
where
$i\in [1:N]$.
Step (1.1) Firstly, by only considering that
compound mode $\langle j,k\rangle $ is active, i.e., all
${\{{\beta}_{i}\}}_{i=j}^{k}$ are nonzero, while the other
${\{{\beta}_{i}\}}_{i=1}^{j-1}$ and
${\{{\beta}_{i}\}}_{i=k+1}^{N}$ are zeros, the partial derivative can be expressed as
For simplicity, we denote
Now, consider that the partial derivatives corresponding to
$i\in [j:k]$ should all be identical to some value
$\mu $, i.e.,
to find the capacity solution in terms of the optimal distribution
$\underline{\beta}$. Note that
For the case of
$k>j$, we can recursively evaluate the equalities in (
A23) as
$\partial g/\partial {\beta}_{i}=\partial g/\partial {\beta}_{i-1}$ by taking
i from
k to
$j+1$ in a descending order with the use of (
A21). In this way, we obtain
Now, based on (
A24) and (
A25), we can derive expressions of
${\{{\beta}_{i}\}}_{i=j}^{k}$ by considering two scenarios.
Scenario 1: Consider
$k\ge j+2$, i.e., at least three consecutive modes are active. By taking
$i=j$ and
$i=j+1$, (
A25) can be used twice to obtain the equality
$\frac{j}{D(j)}=\frac{j+2}{D(j+1)}$ that gives the relation
For
$k>j+2$, expression (
A25) allows us to further obtain
by taking
i in the order of
$j+1$ to
$k-1$, which results in the important relation
Therefore, by applying relation (
A26), we can express
$D(k)$ as
which is valid for the case of
$k=j+2$ as well. So, by setting (
A25) equal to (
A24) with
$i=j$ as
and substituting
$D(k)$ in (
A29), we can first derive the power of modes from
$j+1$ to
$k-1$ as
According to (
A26), we can then obtain the power of the first mode as
Furthermore, owing to
${\sum}_{i=j}^{k}{\beta}_{i}=1$, we can finally represent the power of the last mode as
Now, applying the total power constraint, we should have
By substituting (
A33), (
A31), and (
A32), the corresponding slope
$\lambda $ should simultaneously satisfy
Since
$k>j+1$, it is easy to see that the lower bound of
$\lambda $ is
$\frac{1}{j(2+(j+1)P)}$. For the upper bound, if
$\frac{1}{2(k-1)}>\frac{jP}{2+jP+{j}^{2}P}$, it leads to
$P<\frac{2}{2jk-{j}^{2}-3j}$. Due to
$j\le k-2$, such
P results in
which contradicts the lower bound. Therefore, for the
compound mode to exist, the slope must lie in the range
Scenario 2: Consider
$k=j+1$, i.e., the
compound mode only consists of two
modes. So, setting (
A24) equal to (
A25) gives the relation
By considering
${\beta}_{j}+{\beta}_{j+1}=1$, we obtain
and the same expression for
${\beta}_{j}P$ as in (
A32). If such a
compound mode gives the optimal solution, the condition
$0<{\beta}_{j}P<P$ must also be satisfied, which gives
$\lambda \in (\frac{1}{j(2+(1+j)P)},\frac{1}{2j})$, consistent with the result of (
A37).
Step (1.2) Secondly, consider that the derivative
$\partial g/\partial {\beta}_{k+i}$ for
$\forall i\in [1:N-k]$ with
$k<N$ should be less than
$\mu $ as denoted in Step (1.1). Since
${\{{\beta}_{k+i}\}}_{i=1}^{N-k}=0$, we have
which directly gives
Step (1.3) Finally, consider that the derivative
$\partial g/\partial {\beta}_{j-i}$ for
$\forall i\in [1:j-1]$ with
$j>1$ should be less than
$\mu $ as well. Since
${\{{\beta}_{j-i}\}}_{i=1}^{j-1}=0$, we have
Similarly, by upper bounding this derivative by
$\mu $ given in (
A24), we can have
where the summation can be computed by using relation of (
A25) as
By further incorporating
$\mu $ given by (
A24) and
$D(k)$ given by (
A29) into (
A44), the inequality becomes
By substituting (
A31) and (
A33) for
${\beta}_{j+1}P$ and
${\beta}_{k}P$, it can be easily shown that
where
$i\ge 1$ is used in the last bounding step.
Note that Steps (1.2) and (1.3) and the resulting bounds on
$\lambda $ also cover the case of
$j=k$, i.e., only one mode is active and achieves the capacity. Now, by considering the ranges given by (
A37), (
A42), and (
A47), the slope bounds in (
52) and (
53) follow.
Step (2) Assume that the capacity can also be achieved by only activating any two separated modes
${j}^{\prime}$ and
${k}^{\prime}$, where
${j}^{\prime}\in [1:{k}^{\prime}-1]$ and
${k}^{\prime}\in [{j}^{\prime}+1,N]$. Then, the Kuhn–Tucker condition requires
$\partial g/\partial {\beta}_{{k}^{\prime}}=\partial g/\partial {\beta}_{{j}^{\prime}}$, which results in the relation
Note that in
$D({k}^{\prime})$ only
${\beta}_{{k}^{\prime}}$ and
${\beta}_{{j}^{\prime}}$ are nonzero. Moreover, for
$\forall i\in [1:{k}^{\prime}-{j}^{\prime}-1]$, we have
By substituting the relation (
A48) into the above derivative, it can be shown that
where the last step is due to
${k}^{\prime}-i>{j}^{\prime}$. This result contradicts the Kuhn–Tucker condition, demonstrating that only compound modes achieve the capacity.
Step (3) For a slope
$\lambda $ in the range (
52) and (
53), the obtained optimal
$\underline{\beta}$ from Step (1.1) can be used to evaluate (
38) and (
39) directly. Thus, (
54) and (
55) follow, respectively.
Appendix A.6. Proof of Lemma 2
By following the method in [
20], we define
$f(x)=x-ln(x+1)$ and prove
for
$x\ge 0$. Once this is done, one can follow the procedure applied in ([
20], Thm. 1) and the lower bound (
61) follows.
Now, we convert (
A51) into an equivalent problem. First, by substituting
$f(x)$ in (
A51), we need to prove
By denoting
$t:=x+1$ with
$t\ge 1$ and noting that
${x}^{3}$ and
${x}^{\frac{1}{3}}$ are monotonically increasing in
x for
$x\ge 0$, showing (
A52) is equivalent to showing
Letting
$v(t):=2t-2-lnt-{(lnt)}^{3}$, we can focus on showing
since
$v(1)=0$. Considering
$t\ge 1$, we finally convert proving (
A51) to demonstrating
To do so, we take the first derivative of
$\psi (t)$ and set it to zero to have
By solving (
A56), we can evaluate the local maxima and/or local minima of
$\psi (t)$. It can be seen that (
A56) has only two real roots for
$t\ge 1$, which are
where
${W}_{0}(\cdot )$ is the principal branch of the Lambert
W function defined over
$[-{e}^{-1},\infty )$. Based on the property of the Lambert
W function,
$1<{t}_{1}<{t}_{2}$. Therefore, since
$\psi (1)=0$, we distinguish two scenarios:
Scenario 1: if $\psi ({t}_{1})>0$, we must have $\psi ({t}_{2})<\psi ({t}_{1})$, i.e., ${t}_{1}$ is a local maximum and ${t}_{2}$ a local minimum;
Scenario 2: if $\psi ({t}_{1})<0$, we must have $\psi ({t}_{2})>\psi ({t}_{1})$, i.e., ${t}_{1}$ is a global minimum and ${t}_{2}$ a maximum.
Hence, if we can prove both
$\psi ({t}_{1})>0$ and
$\psi ({t}_{2})>0$, we can conclude (
A55) and thus (
A51). We show this in what follows.
By setting
${t}^{\star}\in \{{t}_{1},{t}_{2}\}$,
$\psi ({t}^{\star})$ can be expressed as
The condition
$\psi ({t}^{\star})>0$ is equivalent to
${({t}^{\star})}^{2}-6{t}^{\star}+6<0$, which requires that the range
be satisfied. For
${t}_{2}$, by applying the bounds given in ([
20], Thm. 1), namely
$-1-\sqrt{2x}-x<{W}_{-1}(-{e}^{-x-1})<-1-\sqrt{2x}-\frac{2}{3}x$, it is easily obtained that
$4.53<{t}_{2}<4.62$, which is in the range (
A59). So,
$\psi ({t}_{2})>0$ follows. For
${t}_{1}$, we apply an upper bound on
${W}_{0}(\cdot )$ given in ([
24], Thm. 2.3), which is
for
$x>-{e}^{-1}$ and
$y>{e}^{-1}$. By taking
$y=0.5$,
${t}_{1}$ can be bounded as
${t}_{1}>1.8$. By also considering
${t}_{1}<{t}_{2}$,
${t}_{1}$ must lie in the range (
A59) as well, which completes showing
$\psi ({t}_{1})>0$.
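Although the definition of $\psi $ is not reproduced above, the stated properties ($\psi (1)=0$, critical points expressed through the Lambert W function with ${t}_{1}\approx 1.86$ and ${t}_{2}\approx 4.54$, and the quadratic condition ${({t}^{\star})}^{2}-6{t}^{\star}+6<0$) are all consistent with the candidate form $\psi (t)=2t-2-3{(lnt)}^{2}$. Under that assumption, the key claims, together with $v(t)\ge 0$, can be checked numerically:

```python
import math

def psi(t):
    # Assumed form of psi, inferred from psi(1) = 0 and the stated
    # critical points; psi'(t) = 2 - 6 ln(t)/t vanishes where t = 3 ln t.
    return 2 * t - 2 - 3 * math.log(t) ** 2

def v(t):
    # v(t) := 2t - 2 - ln t - (ln t)^3, as defined in the proof.
    return 2 * t - 2 - math.log(t) - math.log(t) ** 3

def bisect(f, lo, hi, iters=200):
    """Bisection for a root of f on [lo, hi], assuming a sign change."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == (flo > 0):
            lo, flo = mid, f(mid)
        else:
            hi = mid
    return 0.5 * (lo + hi)

# The two stationary points of psi: t = 3 ln t has roots near 1.86 and
# 4.54, matching t1 = -3 W0(-1/3) and t2 = -3 W_{-1}(-1/3).
t1 = bisect(lambda t: t - 3 * math.log(t), 1.0, 2.5)
t2 = bisect(lambda t: t - 3 * math.log(t), 3.0, 6.0)

lo_root, hi_root = 3 - math.sqrt(3), 3 + math.sqrt(3)  # roots of t^2-6t+6
assert 1.8 < t1 < 1.9 and 4.53 < t2 < 4.62
assert lo_root < t1 < t2 < hi_root        # hence psi(t1), psi(t2) > 0
assert psi(t1) > 0 and psi(t2) > 0
assert min(v(1 + 0.001 * k) for k in range(200000)) >= 0  # v >= 0 on a grid
```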
Numerical evaluation of the bound (
61) verifies the proof as illustrated in
Figure A3, where the bound in [
20] is also plotted for reference.
Figure A3.
The bounds on the Lambert function ${W}_{-1}(-{e}^{-(x+1)})$.