On the Downlink Capacity of Cell-Free Massive MIMO with Constrained Fronthaul Capacity

We investigate the downlink of a cell-free massive multiple-input multiple-output system in which all access points (APs) are connected in a linear-topology fronthaul with constrained capacity and send a common message to a single receiver. By modeling the system as an extension of the multiple-access channel with partially cooperating encoders, we derive the channel capacity of the two-AP setting and then extend the results to arbitrary N-AP scenarios. By developing a cooperating-mode concept, we investigate the optimal cooperation among the encoders (APs) when both the total fronthaul capacity and the total transmit power are constrained. We demonstrate that achieving capacity requires a water-pouring distribution of the total available fronthaul capacity over the fronthaul links. Our study reveals that a linear growth of the total fronthaul capacity results in a logarithmic growth of the beamforming capacity. Moreover, even if the number of APs were unlimited, only a finite number of them need to be activated, and we derive an expression for this number.


Introduction
Recently, cell-free massive multiple-input multiple-output (mMIMO) has been considered a key technology for beyond-5G networks. In such user-centric transmission systems, a large number of distributed access points (APs) are connected to one central processing unit (CPU) via fronthaul links and cooperate phase-coherently to serve a small number of users over a wide area in the same time-frequency resource using time-division operation. Compared to cell-based collocated mMIMO solutions, this technology improves energy and spectral efficiency and enhances immunity to shadow fading without extra signal processing burden. We refer to [1-3] and the references therein for a general overview of current developments in cell-free mMIMO.
Effectively utilizing fronthaul resources is of critical importance for deploying a scalable cell-free mMIMO system. Considering the downlink, for instance, simple distributed conjugate beamforming is optimal, as shown in [1]. However, it already requires a large amount of information exchange over the fronthaul links, since all the APs need to know the message that is to be transmitted. A star-topology fronthaul, where each AP is individually connected to a CPU, was originally modeled and has been widely studied, see, e.g., [4][5][6] and the references therein. Currently, a serial fronthaul connecting APs in a linear topology is being considered to achieve an architecture that is cost-efficient in both deployment and maintenance [3]. A novel and promising technique relying on a linear topology is the radio stripe system, where multiple APs are embedded in a cable/strip; see [3,7] for details. Such radio stripes can be easily and invisibly deployed indoors or outdoors in existing constructions to enable numerous new applications [8].
The focus of prior work on cell-free mMIMO was on developing wireless signaling techniques. In this paper, we study from an information-theoretic perspective the downlink of the cell-free mMIMO system shown in Figure 1, where single-antenna APs are connected in a linear topology with constrained fronthaul capacities to communicate with one single-antenna terminal receiver (Rx). The considered multiple-input single-output (MISO) setup forms a distributed massive beamforming system and can be formulated as a multiple-access channel (MAC) with limited fronthaul capacity, where the fronthaul capacity is defined as the maximal amount of information that can be reliably sent per MAC channel use [9]. By investigating the channel capacity of such a MAC, we reveal essential relations between the three most fundamental resources of the system, i.e., the total available number of APs ($N$), the total transmit power ($P$), and the total available fronthaul capacity ($C_B$). Specifically, in the current cell-free mMIMO literature, the only configuration of APs that is considered is the one where full cooperation (full beamforming) is realized and the same information is shared by all involved APs. In that case, for a real-valued Gaussian MISO channel with $N$ APs and unity channel gains, the maximum downlink rate is given by the channel capacity

$$C_{\mathrm{full}} := \tfrac{1}{2}\log_2(1 + N \cdot \mathrm{SNR}) \quad \text{bits/channel use}, \tag{1}$$

where $\mathrm{SNR}$ is the received signal-to-noise ratio (SNR) if only one AP is active with all available transmit power assigned to it. Full cooperation requires a total fronthaul capacity of $C^{B}_{\mathrm{full}} := (N-1)\,C_{\mathrm{full}}$ among the $N$ APs. In this work, we focus on the case where the available fronthaul capacity is not large enough to support full cooperation of the APs, and we investigate the achievable downlink rates when the fronthaul resources for communication between the APs are constrained. We call this setting partial beamforming, since $C_B < C^{B}_{\mathrm{full}}$.
We derive the channel capacity and the optimal cooperation strategies among APs for given total available $P$ and $N$.
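As a quick numerical illustration of (1) and of the fronthaul budget that full cooperation would need, the following Python sketch evaluates both quantities. The function names and the example values (N = 10, SNR = 21) are illustrative assumptions rather than part of the system description.

```python
import math

def full_cooperation_capacity(n_aps: int, snr: float) -> float:
    """Capacity (1) in bits/channel use: 0.5 * log2(1 + N * SNR)."""
    return 0.5 * math.log2(1 + n_aps * snr)

def full_cooperation_fronthaul(n_aps: int, snr: float) -> float:
    """Total fronthaul capacity needed for full cooperation: (N - 1) * C_full."""
    return (n_aps - 1) * full_cooperation_capacity(n_aps, snr)

if __name__ == "__main__":
    n_aps, snr = 10, 21.0  # example values chosen for illustration only
    print("C_full   =", full_cooperation_capacity(n_aps, snr))
    print("C_full^B =", full_cooperation_fronthaul(n_aps, snr))
```

The capacity grows only logarithmically in N, while the fronthaul needed for full cooperation grows roughly like N log N, which is why a constrained total fronthaul budget is the regime of interest here.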

Related Work
We can model the studied system as a special extension of the multiple-access channel (MAC) with partially cooperating encoders studied by Willems [9]. In particular, we can generalize the system setup in [9] to a network of encoders by considering only one source but employing an arbitrary number of encoders, namely APs, connected via unidirectional conferences. Since the fronthaul links can be treated as separate channels that are orthogonal to the beamforming MAC, our setup might also be viewed as an extension of a special case of the orthogonal-component relay channel due to El Gamal and Zahedi [10], which is generalized to relay networks by Ghabeli and Aref in [11]. In addition, if only two APs are considered, our study is also strongly related to the multiple-access diamond channel studied in [12,13]. Moreover, the two-AP setup looks very similar to the semi-deterministic relay channel [14]. Furthermore, it is worth noting that in our system all APs cooperatively send one message to the receiver at the same time. In this sense, our channel setting is "noncausal", which relates it to the relay-with-delay channel studied in [15].

Contributions and Organization
By investigating the MAC with limited fronthaul capacity in both the discrete and the Gaussian channel cases, the main findings of our work are as follows:

• The channel capacity is found for an arbitrary number of APs for both the discrete channel and the Gaussian channel with constrained transmit power, where the total fronthaul capacity and the total number of APs are limited.

• When numerous APs are engaged, a linear growth of the total fronthaul capacity results in a logarithmic growth of the channel (beamforming) capacity.

• A concept of cooperating modes is developed to demonstrate the optimal cooperation among APs to achieve capacity based on superposition coding.

• When the channel capacity is only limited by the fronthaul capacity, the number of required APs is quasi-linear in the available fronthaul capacity, even if the number of APs were unlimited.

• A new and sharp lower bound on the Lambert-W function is derived for computing the number of required APs for a given total fronthaul constraint.
In the rest of this paper, the system model is first presented in Section 2. In Section 3, we start by investigating a two-AP setting with one fronthaul link. This setting serves as a baseline system in which the cooperating-mode concept is developed. In Section 4, the study is extended to the case where an arbitrary number of APs is engaged, and the behavior and exact solution of the channel capacity are derived. In Section 5, the number of APs required to leverage limited fronthaul resources is derived for the case where the number of available APs is unlimited. Finally, the conclusion and final remarks can be found in Section 6. Detailed proofs and derivations of the presented results are collected in Appendix A. Part of the material in this paper was presented in [16].

Notation
Throughout the paper, capital letters, e.g., $X$, denote random variables, and their realizations are denoted by small letters, e.g., $x$. The probability mass or density function of $X$ is denoted by $p_X(x)$ or simply $p(x)$. The expectation of $X$ is denoted by $E[X]$. The entropy of $X$ is denoted by $H(X)$ and the differential entropy by $h(X)$. The mutual information between $X$ and $Y$ is denoted by $I(X; Y)$. The consecutive integer range from $i$ to $j$ with $i \le j$ is denoted by $[i : j]$. In addition, a set of elements $x_m$ with index $m$ in the range from $i$ to $j$ is denoted as $\{x_m\}_{m=i}^{j}$.

System Model
The investigated system is modeled as in Figure 2, where the CPU is denoted as the source, the APs as encoders, and the receiver as the decoder. As plotted, one-directional fronthaul links connect $N$ adjacent encoders that simultaneously send a uniformly distributed message $W \in [1 : M]$ to a decoder (receiver). We focus on the fronthaul resource usage among all encoders. The discrete memoryless MAC denoted by $(\mathcal{X}_1 \times \mathcal{X}_2 \times \dots \times \mathcal{X}_N,\ p(y|x_1, x_2, \dots, x_N),\ \mathcal{Y},\ \{C_{m,m+1}\}_{m=1}^{N-1})$ consists of input alphabets $\{\mathcal{X}_m\}_{m=1}^{N}$, an output alphabet $\mathcal{Y}$, a transition probability distribution $p(y|x_1, x_2, \dots, x_N)$, and a set of fronthaul capacity constraints $\{C_{m,m+1}\}_{m=1}^{N-1}$ between the $N$ encoders. Before the beginning of each block of $n$ channel uses, (partial) information about the generated message $W$ is first shared among the $N$ encoders. Let $W_{m,m+1} \in [1 : M_{m,m+1}]$ for $m \in [1 : N-1]$ be the message sent over the fronthaul link from encoder $m$ to encoder $m+1$. Then, the encoders map the messages $W$ and $\{W_{m,m+1}\}_{m=1}^{N-1}$ into codewords $\{x_m^n\}_{m=1}^{N}$ by means of the corresponding encoding functions $\{e_m(\cdot)\}_{m=1}^{N}$. Meanwhile, the generated fronthaul messages should satisfy $\frac{1}{n}\log_2 M_{m,m+1} \le C_{m,m+1}$.
As presented, the fronthaul link capacity $C_{m,m+1} \ge 0$ is defined as the maximal amount of information that can be reliably sent per channel use of the MAC over the link from encoder $m$ to encoder $m+1$. At the decoder, a deterministic decoding function $d : \mathcal{Y}^n \to [1 : M]$ is applied to obtain the message estimate $\hat{W}$ based on the channel output $y^n$. We define the average probability of error at the decoder as $P_e := \Pr\{\hat{W} \neq W\}$. We say that a rate $R$ is achievable with given fronthaul capacities $\{C_{m,m+1}\}_{m=1}^{N-1}$ if there exist $N$ encoders and a corresponding decoder such that $\frac{1}{n}\log_2 M \ge R - \delta$ and $P_e \le \delta$ for all $\delta > 0$ and large enough $n$. The channel capacity $C$ (of the MAC) as a function of the fronthaul capacities is defined as the supremum of all achievable rates under the given fronthaul constraints. Eventually, we will be interested only in a constraint on the sum of the fronthaul capacities $C_B$, which is defined as

$$C_B := \sum_{m=1}^{N-1} C_{m,m+1}. \tag{5}$$

To interpret the capacity results for partial beamforming, we focus on MACs with additive white Gaussian noise. At the output of the Gaussian MAC, the decoder receives

$$Y_i = \sum_{m=1}^{N} X_{mi} + Z_i \tag{6}$$

at time $i$, where $X_{mi}$ is the symbol transmitted by encoder $m$ and $Z_i$ is independent and identically distributed (i.i.d.) Gaussian noise at the decoder for all $i \in [1 : n]$. For each individual encoder $m \in [1 : N]$, the transmit power constraint is

$$\frac{1}{n}\sum_{i=1}^{n} x_{mi}^2 \le P_m \tag{7}$$

for $P_m \ge 0$. The total transmit power is then limited as

$$\sum_{m=1}^{N} P_m \le P. \tag{8}$$

Without loss of generality, we assume that $Z_i \sim \mathcal{N}(0, 1)$. Therefore, the transmit SNR can be directly represented by the total constrained transmit power $P$.

Two-Encoder Result
We first investigate the simplest system setting where only two encoders are involved. The MAC is now denoted by $(\mathcal{X}_1 \times \mathcal{X}_2,\ p(y|x_1, x_2),\ \mathcal{Y},\ C_{12})$. The fronthaul message $W_{12} \in [1 : M_{12}]$ must satisfy the constraint $\frac{1}{n}\log_2 M_{12} \le C_{12}$, where $C_{12}$ equals the total fronthaul capacity $C_B$ in this case. The underlying Gaussian MAC is given by

$$Y_i = X_{1i} + X_{2i} + Z_i. \tag{10}$$

Although this two-encoder setting can be considered a special case of related work (see the discussion later), we provide here the capacity proofs for both discrete and Gaussian MACs. The applied approach carries over to the N-encoder setting that is investigated in Section 4.
In the following, the channel capacity as a function of the fronthaul capacity is first obtained for the discrete memoryless MAC. Then, we derive capacity results for the Gaussian case with total transmit power constraint. Within this study, a so-called cooperating mode concept is developed that will be very useful to provide cooperation insights among encoders when more of them are engaged.

Discrete Channel
First, consider the discrete channel setup.
Theorem 1. For the discrete memoryless channel $p(y|x_1, x_2)$, the channel capacity $C$ as a function of the fronthaul capacity $C_{12}$ is given by

$$C(C_{12}) = \max_{p(x_1, x_2)} \min\{\, I(X_1, X_2; Y),\ I(X_1; Y|X_2) + C_{12} \,\},$$

where the distribution $p(x_1, x_2, y) = p(x_1, x_2)\,p(y|x_1, x_2)$ is determined by the input distribution $p(x_1, x_2)$.
The detailed proof is provided in Appendix A.1, where the converse is based on the Markov chains $W \to (X_1^n, X_2^n) \to Y^n$ and $(W, W_{12}) \to (X_1^n, X_2^n) \to Y^n$, and the achievability is based on superposition coding. For the achievability, the source splits the message $W$ into two parts $(W_1, W_{12})$ and delivers the index $W_{12}$ over the fronthaul link to encoder 2, which maps $W_{12}$ into the inner codeword, while encoder 1 at the source encodes $W_1$ into an outer codeword that is superimposed on the inner codeword. Although this coding scheme is simple, the cooperating-mode concept that is important for studying the multi-encoder setup will be developed from this superposition scheme, as discussed later.
Remark 1. By viewing the two-encoder setting as a special case of ([9], Figure 1) in which only one source and one conference link are deployed, we obtain Theorem 1 by letting the common message $U = X_2$ and the conference capacity $C_{21} = 0$ in ([9], Thm.). Note that the achievability in [9], which is based on binning, then reduces to superposition coding.

Remark 2. By viewing the two-encoder setting as a special case of the multiple-access diamond channel, where one source connects to two encoders (relays) via two separate noiseless links, see [12,13], Theorem 1 can also be obtained by letting $C_1 = \infty$, the common message $V = X_2$, and the common message rate $R_0 = C_2$ (or $C_2 = \infty$, $V = X_1$, $R_0 = C_1$) in ([12], Thm. 2). Note that the achievability based on superposition and Marton coding in [12,13] then reduces to superposition coding only.

Gaussian Channel
Now we consider the Gaussian MAC of the two-encoder channel setting given by (10) with total power constraint $P$, i.e., $P_1 + P_2 \le P$. This first leads to the following result.
Theorem 2. The channel capacity $C(C_{12}, P)$ of the two-encoder Gaussian MAC is

$$C(C_{12}, P) = \max_{0 \le \beta \le 1} \min\left\{ \tfrac{1}{2}\log_2\big(1 + (1+\beta)P\big),\ \tfrac{1}{2}\log_2\big(1 + (1-\beta)P\big) + C_{12} \right\}. \tag{13}$$

The proof adapts the discrete-channel version given in Appendix A.1 by taking into account the transmit power constraints and the Gaussian channel noise.
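Proof. (i) Converse. First note that, without loss of generality (and without violating the power constraints), we may assume that $E[X_{1i}] = E[X_{2i}] = 0$ for all $i \in [1 : n]$. If we define $(X_1, X_2, Y)$ as the random triple with density $p_{X_1,X_2,Y}(x_1, x_2, y) = \frac{1}{n}\sum_{i=1}^{n} p_{X_{1i},X_{2i},Y_i}(x_1, x_2, y)$, then the converse in Appendix A.1 shows that the rate is upper bounded by the two terms of Theorem 1 evaluated for this triple, where the random variables $X_1$ and $X_2$ have zero mean and satisfy the power constraints. First consider the random pair $(X_1, X_2)$. By applying the Cholesky factorization ([17], Thm. 4.2.7) to the covariance matrix of $[X_2, X_1]^T$, the assignment

$$X_2 = \alpha_{22} S_2, \qquad X_1 = \alpha_{21} S_2 + \alpha_1 S_1$$

can be obtained, where $S_1$ and $S_2$ are uncorrelated with zero means and unit variances. Next, observe that taking $\alpha_{21} = \alpha_{22} = (\alpha_{21} + \alpha_{22})/2 = \alpha_2$ does not affect $I(X_2; Y) = I(S_2; Y)$ and $I(X_1; Y|X_2) = I(S_1; Y|S_2)$, but minimizes the total transmit power for fixed $\alpha_{21} + \alpha_{22}$. Therefore, we only need to consider the assignment

$$X_2 = \alpha_2 S_2, \qquad X_1 = \alpha_2 S_2 + \alpha_1 S_1. \tag{19}$$

Now we take $\alpha_2^2 = P_2$, $\alpha_2^2 + \alpha_1^2 = P_1$, and $2\alpha_2^2 + \alpha_1^2 = P$. By denoting

$$\beta := \frac{2\alpha_2^2}{P} \tag{20}$$

in $[0, 1]$, we further have

$$\alpha_2^2 = \frac{\beta P}{2}, \qquad \alpha_1^2 = (1-\beta)P. \tag{21}$$

Taking the signal assignment (19) and the power assignment (21) gives

$$I(X_1, X_2; Y) \overset{(a)}{\le} \tfrac{1}{2}\log_2\big(1 + (1+\beta)P\big), \qquad I(X_1; Y|X_2) \overset{(b)}{\le} \tfrac{1}{2}\log_2\big(1 + (1-\beta)P\big),$$

where (a) and (b) follow from the maximum differential entropy theorem, see ([18], Thm. 8.6.5).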
(ii) Achievability. Take the assignment (19) with $S_1 \sim \mathcal{N}(0, 1)$ and $S_2 \sim \mathcal{N}(0, 1)$. Using the power assignment (21) directly gives $I(X_1, X_2; Y) = \tfrac{1}{2}\log_2(1 + (1+\beta)P)$ and $I(X_1; Y|X_2) = \tfrac{1}{2}\log_2(1 + (1-\beta)P)$. The rest of the proof follows by first establishing a coding theorem for the discrete memoryless channel with input cost (power constraint). The step from discrete to Gaussian channels is justified by the relation between differential entropy and discrete entropy, see, e.g., ([18], Thm. 9.3.1).

Now, by optimizing over $\beta$ in (13), we can further express $C$ as a function of only the total transmit power $P$ and the total fronthaul capacity $C_B$, which is $C_{12}$ for this two-encoder setup.

Corollary 1. The channel capacity $C(C_B, P)$ of the two-encoder Gaussian MAC under the total transmit power constraint can be expressed as

$$C(C_B, P) = \begin{cases} \tfrac{1}{2}\log_2(1 + 2P), & C_B > \tfrac{1}{2}\log_2(1 + 2P), \\ \tfrac{1}{2}\log_2(1 + P) + \tfrac{1}{2}\log_2\dfrac{2}{1 + 2^{-2C_B}}, & 0 \le C_B \le \tfrac{1}{2}\log_2(1 + 2P). \end{cases} \tag{26}$$

Proof. The two logarithms on the RHS of (13) are monotonically increasing and decreasing in $\beta$, respectively, and are equal at $\beta = 0$. Hence, we can set

$$\tfrac{1}{2}\log_2\big(1 + (1+\beta)P\big) = \tfrac{1}{2}\log_2\big(1 + (1-\beta)P\big) + C_B$$

to obtain the $\beta$ that maximizes $C$ for all $C_B \in [0, \tfrac{1}{2}\log_2(1 + 2P)]$ as

$$\beta = \frac{(2^{2C_B} - 1)(1 + P)}{(2^{2C_B} + 1)\,P}. \tag{28}$$

This results in the second capacity expression in (26). If $C_B > \tfrac{1}{2}\log_2(1 + 2P)$, the second term in (13) is larger than the first term for any $\beta$. This corresponds to the situation where $C_B$ is large enough and the transmission over the MAC is the bottleneck of the network. In this case, $C$ remains at its global maximum.
Note that, for $C_B < \tfrac{1}{2}\log_2(1 + 2P)$, the first term of the capacity result (26) is the channel capacity with no beamforming, and the second term directly represents the partial beamforming gain, which is independent of the transmit power $P$ and grows only as the fronthaul capacity increases. In other words, the partial beamforming gain increases at the same rate regardless of the transmit power $P$.
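The two-encoder result can be checked numerically. The sketch below assumes that the max-min expression of Theorem 2 has the two terms $\tfrac{1}{2}\log_2(1+(1+\beta)P)$ and $\tfrac{1}{2}\log_2(1+(1-\beta)P) + C_{12}$, as follows from the signal assignment (19) with power split $\beta$, and that equalizing them yields the closed form used in `capacity_closed_form`. Both expressions are reconstructions from the surrounding derivation, and all function names are hypothetical.

```python
import math

def two_encoder_rate(beta: float, c12: float, p: float) -> float:
    """Inner minimum of the max-min expression for a given power split beta."""
    full_term = 0.5 * math.log2(1 + (1 + beta) * p)
    link_term = 0.5 * math.log2(1 + (1 - beta) * p) + c12
    return min(full_term, link_term)

def capacity_grid(c12: float, p: float, steps: int = 20000) -> float:
    """Brute-force maximization over beta on a uniform grid in [0, 1]."""
    return max(two_encoder_rate(i / steps, c12, p) for i in range(steps + 1))

def capacity_closed_form(c_b: float, p: float) -> float:
    """Assumed closed form: no-beamforming capacity plus a P-independent gain."""
    c_full = 0.5 * math.log2(1 + 2 * p)
    if c_b >= c_full:
        return c_full  # the MAC itself is the bottleneck
    gain = 0.5 * math.log2(2 / (1 + 2 ** (-2 * c_b)))
    return 0.5 * math.log2(1 + p) + gain

if __name__ == "__main__":
    for p in (1.0, 21.0):
        for c_b in (0.0, 0.5, 1.0, 5.0):
            print(p, c_b, capacity_grid(c_b, p), capacity_closed_form(c_b, p))
```

The grid search and the closed form should agree up to the grid resolution, and the gain term is the same for every $P$, in line with the observation that the partial beamforming gain does not depend on the transmit power.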

Cooperating Modes
Based on assignment (19), which possesses a superposition structure, we can naturally define the following two cooperating modes to describe the optimal, capacity-achieving cooperation between the encoders.
• mode 1: Sending a private message given by $\alpha_1 S_1$ from encoder 1;
• mode 2: Coherently sending a common message given by $\alpha_2 S_2$ from encoder 2 and encoder 1.
According to (20), the parameter $\beta$ represents the fraction of the total transmit power assigned to mode 2, while $1 - \beta$ represents the remaining fraction assigned to mode 1. Note that the $\beta$ given by (28) should be used to achieve the capacity. Now consider the cooperation scenarios of the two encoders depending on the available $C_B$. If $C_B = 0$, the transmission reduces to point-to-point communication: only mode 1 is active and encoder 2 is inactive. If $C_B \ge C_{\mathrm{full}}$, full cooperation can be achieved by activating mode 2 only. For $C_B \in (0, C_{\mathrm{full}})$, the two encoders cooperate to achieve the partial beamforming capacity by activating both cooperating modes. Figure 3 illustrates how the cooperating modes are activated and deactivated at the encoders as $C_B$ increases from 0 to $C_{\mathrm{full}}$. For the two-encoder setting, the evolution of the modes with the available $C_B$ looks straightforward. Nevertheless, it will be shown that this cooperating-mode interpretation provides clear insight into how to leverage the available encoders under given total fronthaul and transmit power constraints, where the optimal cooperation is not trivial as the number of encoders grows large.
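The mode activation described above can be sketched in a few lines of Python. The expression for the optimal power fraction is an assumption obtained by equalizing the two terms $\tfrac{1}{2}\log_2(1+(1+\beta)P)$ and $\tfrac{1}{2}\log_2(1+(1-\beta)P) + C_B$, and the helper names are hypothetical.

```python
import math

def optimal_beta(c_b: float, p: float) -> float:
    """Fraction of total power assigned to mode 2 (assumed closed form)."""
    c_full = 0.5 * math.log2(1 + 2 * p)
    if c_b >= c_full:
        return 1.0  # full cooperation: all power in the common message
    g = 2.0 ** (2 * c_b)
    return (g - 1) * (1 + p) / ((g + 1) * p)

def mode_powers(c_b: float, p: float) -> tuple:
    """Return (power of mode 1, power of mode 2)."""
    beta = optimal_beta(c_b, p)
    return (1 - beta) * p, beta * p

def active_modes(c_b: float, p: float) -> list:
    """List the cooperating modes that carry nonzero power."""
    beta = optimal_beta(c_b, p)
    modes = []
    if beta < 1.0:
        modes.append(1)  # private message from encoder 1
    if beta > 0.0:
        modes.append(2)  # common message sent coherently by both encoders
    return modes

if __name__ == "__main__":
    for c_b in (0.0, 0.5, 10.0):
        print(c_b, active_modes(c_b, 5.0), mode_powers(c_b, 5.0))
```

For $P = 5$ the three example budgets correspond to no cooperation, partial beamforming, and full cooperation, respectively.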

N-Encoder Result
Based on the investigation of the two-encoder setting, we extend the study to the system model with an arbitrary number $N \ge 2$ of encoders. The parameter $N$ can in principle be any large integer, so that distributed massive beamforming is obtained. The investigation focuses on the Gaussian MAC under the constraints on the total fronthaul capacity $C_B$ and the total transmit power $P$, which are defined by (5) and (8), respectively. Before addressing the exact capacity solution for arbitrary $N$, we first derive bounds on $C(C_B, P)$ that describe the general behavior of the channel capacity $C$ as a function of the total fronthaul capacity $C_B$. The obtained result indicates that a linear growth of $C$ requires an exponential growth of $C_B$. By using the compound mode, the exact capacity solution together with the optimal cooperation among the encoders is then derived. The results show that the distributed beamforming system works most efficiently in its fronthaul-capacity-limited regime. As a result, we finally consider the case where encoders are always available to be activated as needed to leverage the entire fronthaul resource.

Discrete Channel
For simplicity, let the tuple $X_l^m \triangleq (X_l, X_{l+1}, \dots, X_m)$ be the collection of ordered transmitted random variables that are generated at encoder $l$ to encoder $m$ with $l \le m$ for one channel use. In addition, let $C_b \triangleq \{C_{j,j+1}\}_{j=1}^{N-1}$ be the collection of the corresponding fronthaul capacities.
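Theorem 3. For the discrete memoryless N-encoder setting with $N \ge 2$, the channel capacity $C$ of the channel $p(y|x_1, x_2, \dots, x_N)$ as a function of the fronthaul capacities $C_b$ is

$$C(C_b) = \max_{p(x_1, \dots, x_N)} \min\Big\{ I(X_1^N; Y),\ \min_{m \in [1 : N-1]} \big[ I(X_1^m; Y | X_{m+1}^N) + C_{m,m+1} \big] \Big\}. \tag{29}$$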
A sketch of the proof is given in Appendix A.2. As shown in the achievability part, the capacity is achieved by applying an N-layer superposition coding among the encoders, which naturally matches the studied linear topology.

Gaussian Channel under Total Fronthaul Constraint
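By considering the total power constraint and separate fronthaul constraints, we first have the following result.
Theorem 4. For the N-encoder Gaussian setting with the total transmit power constraint $P$, the channel capacity $C$ as a function of the fronthaul capacities $C_b$ is

$$C(C_b, P) = \max_{\beta} \min\Big\{ \tfrac{1}{2}\log_2\Big(1 + P\sum_{l=1}^{N} l\beta_l\Big),\ \min_{m \in [1 : N-1]} \Big[ \tfrac{1}{2}\log_2\Big(1 + P\sum_{l=1}^{m} l\beta_l\Big) + C_{m,m+1} \Big] \Big\}, \tag{30}$$

where $\beta = (\beta_1, \beta_2, \dots, \beta_N)^T$ is a probability vector.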
Proof. Similar to the proof for the two-encoder setting, the generic signal assignment $X_m = \sum_{l=m}^{N} \alpha_{ml} S_l$ can be used at each encoder $m \in [1 : N]$, where $\{S_l\}_{l=1}^{N}$ are uncorrelated and have zero mean and unit variance. Again, we can further apply the special signal assignment

$$X_m = \sum_{l=m}^{N} \alpha_l S_l \tag{32}$$

to minimize the total transmit power without affecting the dependency of the different signals $\{S_l\}_{l=1}^{N}$ at the decoder, which determines the beamforming capacity. In this way, the transmit power allocated to signal $S_l$ can be expressed as $l\,\alpha_l^2 = \beta_l P$ such that $\sum_{l=1}^{N} \beta_l = 1$. Thus, for the converse, we can use the assignment (32) to evaluate (29), and the mutual informations on the RHS are bounded as given by (30). Then, for the achievability, the result follows by letting $S_l \sim \mathcal{N}(0, 1)$.
The proof shows that all the transmitted signals at the encoders should form a Markov chain $X_N \to X_{N-1} \to \dots \to X_1$. Again, since the signal $\alpha_l S_l$ represents the common message used at the first $l$ encoders, we say that cooperating mode $l$ is active if the signal $S_l$ is generated and sent, and there can be $N$ cooperating modes in total for this N-encoder setting. Now, we can solve the optimization problem

$$C(C_B, P) = \max_{C_b:\ \sum_{m=1}^{N-1} C_{m,m+1} \le C_B} C(C_b, P),$$

where $C(C_b, P)$ is given by (30), to investigate the total-power-limited capacity $C$ under the constraint on the total fronthaul capacity $C_B$ for a given $P$. To do so, we first prove the following lemma. Note that the full-cooperation capacity is now $C_{\mathrm{full}}(N) = \tfrac{1}{2}\log_2(1 + NP)$ when $N$ encoders are used. For simplicity, we denote the mutual informations as

$$I_m := \tfrac{1}{2}\log_2\Big(1 + P\sum_{l=1}^{m} l\beta_l\Big)$$

for any $m \in [1 : N]$.
Lemma 1. For the N-encoder setting with any given $C_B \le (N-1)\,C_{\mathrm{full}}(N)$, a power distribution $\beta$ can only be optimal if all the terms on the RHS of (30) are equal.
The proof is given in Appendix A.3.

Remark 3. Lemma 1 indicates that an asymmetric distribution of $C_B$ over the fronthaul links is optimal. This result will be further demonstrated after the capacity result is derived.
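Based on the reduced set of $\beta$ given by Lemma 1, we make the terms on the RHS of (30) equal and have

$$C_{m,m+1} = \tfrac{1}{2}\log_2\frac{1 + P\sum_{l=1}^{N} l\beta_l}{1 + P\sum_{l=1}^{m} l\beta_l}, \qquad m \in [1 : N-1]. \tag{37}$$

Thus, the channel capacity and the required total fronthaul capacity can now be represented in terms of the power allocation vector $\beta$ as

$$C = \tfrac{1}{2}\log_2\Big(1 + P\sum_{l=1}^{N} l\beta_l\Big) \tag{38}$$

and

$$C_B = \sum_{m=1}^{N-1} \tfrac{1}{2}\log_2\frac{1 + P\sum_{l=1}^{N} l\beta_l}{1 + P\sum_{l=1}^{m} l\beta_l}. \tag{39}$$

From (38) and (39), we have the following theorem.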
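To make the equalization step concrete, the following Python sketch evaluates the capacity, the per-link fronthaul capacities, and their sum for a given power allocation vector. It assumes that mode $l$ contributes received power $l\beta_l P$ (it is sent coherently by the first $l$ encoders), so that the rate terms are $\tfrac{1}{2}\log_2(1 + P\sum_{l \le m} l\beta_l)$. This is a reconstruction from the signal assignment, and the function names are hypothetical.

```python
import math

def mode_rates(beta, p):
    """Rate terms 0.5*log2(1 + P * sum_{l<=m} l*beta_l) for m = 1..N."""
    rates, acc = [], 0.0
    for l, b in enumerate(beta, start=1):
        acc += l * b  # mode l is sent coherently by l encoders
        rates.append(0.5 * math.log2(1 + p * acc))
    return rates

def fronthaul_allocation(beta, p):
    """Return (capacity, per-link fronthaul capacities, total fronthaul)."""
    rates = mode_rates(beta, p)
    capacity = rates[-1]
    # Link m -> m+1 must carry everything the encoders m+1..N need to know.
    links = [capacity - r for r in rates[:-1]]
    return capacity, links, sum(links)

if __name__ == "__main__":
    beta = [0.25, 0.25, 0.25, 0.25]  # example allocation over four modes
    cap, links, total = fronthaul_allocation(beta, 21.0)
    print("C =", cap)
    print("C_{m,m+1} =", links)
    print("C_B =", total)
```

Because the partial sums grow with $m$, the link capacities are non-increasing along the chain, which illustrates the asymmetric distribution noted in Remark 3. With all power in the last mode, every link carries the full rate and the total equals $(N-1)\,C_{\mathrm{full}}(N)$.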

Theorem 5. For the N-encoder Gaussian channel under the total power constraint $P$ and the total fronthaul constraint $C_B$, the channel capacity is given by the maximum of (38) over all probability vectors $\beta$ for which the required total fronthaul capacity (39) does not exceed $C_B$.

To evaluate the channel capacity, we only need to maximize the function $C - \lambda\,C_B$, with $C$ and $C_B$ expressed in $\beta$ as in (38) and (39) and with $\lambda$ a Lagrange multiplier, under the constraint that $\beta$ is a probability vector, to derive the solution $C(C_B)$ for the general N-encoder case. Note that the parameter $\lambda$ is the slope of $C(C_B)$. However, before working out the exact solution of this optimization problem, we first derive general bounds on $C(C_B, P)$ to reveal the capacity behavior of the studied distributed beamforming.
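For a small number of encoders, the optimization can be approximated by brute force, which is useful as a sanity check before deriving the exact solution. The sketch below does this for $N = 3$ on a grid over the probability simplex. It assumes that the capacity and the required total fronthaul capacity for a given $\beta$ are $\tfrac{1}{2}\log_2(1 + P\sum_{l} l\beta_l)$ and the sum over $m$ of the gaps to the partial-sum terms, respectively. The grid resolution and all names are illustrative choices, not the method used in this paper.

```python
import math

def capacity_and_fronthaul(beta, p):
    """Capacity and required total fronthaul for a power allocation beta."""
    acc, rates = 0.0, []
    for l, b in enumerate(beta, start=1):
        acc += l * b
        rates.append(0.5 * math.log2(1 + p * acc))
    c = rates[-1]
    return c, sum(c - r for r in rates[:-1])

def capacity_grid_3(c_b_budget, p, steps=100):
    """Grid search over the 3-mode simplex under a total fronthaul budget."""
    best = 0.0
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            beta = (i / steps, j / steps, (steps - i - j) / steps)
            c, c_b = capacity_and_fronthaul(beta, p)
            if c_b <= c_b_budget + 1e-12 and c > best:
                best = c
    return best

if __name__ == "__main__":
    for budget in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(budget, capacity_grid_3(budget, 5.0))
```

With a zero budget only the first mode is feasible and the result is the point-to-point capacity, while a sufficiently large budget returns the full-cooperation capacity of three encoders. In between, the grid search only yields a lower approximation of the capacity curve.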

Capacity Behavior Bounds
To obtain a simple but meaningful insight into the relation between $C$ and the constrained $C_B$ and $P$ for an arbitrary $N$, we propose an upper bound and a lower bound on the channel capacity, which lead to the following conclusion.
Property 1. For any fixed total transmit power $P$ and number of encoders $N$, a linear growth of the total fronthaul capacity $C_B$ results in a logarithmic growth of the channel capacity, as $C$ can be bounded as

$$C \le \tfrac{1}{2}\log_2(1 + P) + \tfrac{1}{2}\log_2(1 + 2\ln 2 \cdot C_B) \tag{43}$$

and

$$C > \tfrac{1}{2}\log_2\big(1 + (2\ln 2 \cdot P\,C_B)^{2/3}\big). \tag{44}$$
(1) Upper bound. By considering $L \in [1 : N]$ as a random variable with distribution $\beta$, the capacity (38) can be expressed as $C = \tfrac{1}{2}\log_2(1 + \mu_L P)$, where $\mu_L \triangleq E[L]$. By applying Jensen's inequality, the corresponding fronthaul capacity $C_B$ in (39) can be lower bounded as

$$C_B \ge \frac{(\mu_L^2 - \mu_L)\,P}{2\ln 2 \cdot (\mu_L + \mu_L P)} = \frac{(\mu_L - 1)\,P}{2\ln 2 \cdot (1 + P)},$$

which results in $\mu_L \le 1 + 2\ln 2 \cdot \big(\tfrac{1+P}{P}\big)\,C_B$ and thus (43).

(2) Lower bound. Consider time-sharing of the rates obtained by using only one cooperating mode. Hence, the channel capacity should be larger than or equal to an achievable rate $R = \tfrac{1}{2}\log_2(1 + kP)$, where $k$ is the number of activated encoders corresponding to the required total fronthaul capacity $C_B = \tfrac{k-1}{2}\log_2(1 + kP)$ that achieves $R$. By applying $\ln x \le \frac{x-1}{\sqrt{x}}$ for $x \ge 1$, see, e.g., ([19], Section 3.6.15), we have $\ln(1 + kP) \le \frac{kP}{\sqrt{1 + kP}} < \sqrt{kP}$, which gives an upper bound on $C_B$ as

$$C_B < \frac{k^{3/2}\sqrt{P}}{2\ln 2}.$$

Therefore, we have $k^3 > (2\ln 2 \cdot C_B)^2 / P$, which directly gives (44).
The upper bound (43) and lower bound (44) thus indicate the logarithmic behavior of $C$ in $C_B$. Figure 4 illustrates these two bounds for the 10-encoder Gaussian setting with total transmit power $P = 21$. The exact capacity solution derived shortly is plotted as well for comparison, showing that the bounds describe the capacity behavior.
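The two bounds are easy to evaluate numerically. The following Python sketch computes (43) and (44) and compares them with the single-mode operating points used in the lower-bound argument, i.e., full cooperation of $k$ encoders with rate $\tfrac{1}{2}\log_2(1 + kP)$ and total fronthaul capacity $(k-1)$ times that rate. The function names and the chosen values of $P$ and $k$ are illustrative assumptions.

```python
import math

def upper_bound(c_b: float, p: float) -> float:
    """Upper bound (43) on the capacity."""
    return 0.5 * math.log2(1 + p) + 0.5 * math.log2(1 + 2 * math.log(2) * c_b)

def lower_bound(c_b: float, p: float) -> float:
    """Lower bound (44) on the capacity."""
    return 0.5 * math.log2(1 + (2 * math.log(2) * p * c_b) ** (2.0 / 3.0))

def single_mode_point(k: int, p: float) -> tuple:
    """(C_B, R) when k encoders fully cooperate using only mode k."""
    rate = 0.5 * math.log2(1 + k * p)
    return (k - 1) * rate, rate

if __name__ == "__main__":
    p = 21.0
    for k in range(1, 11):
        c_b, rate = single_mode_point(k, p)
        print(k, round(c_b, 3), round(lower_bound(c_b, p), 3),
              round(rate, 3), round(upper_bound(c_b, p), 3))
```

At each printed point the single-mode rate lies between the two bounds, and all three quantities grow only logarithmically while $C_B$ grows roughly linearly in $k$.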

Compound Mode and Exact Solution
In what follows, we evaluate the channel capacity given in Theorem 5 by defining a compound mode $\langle j, k \rangle$ as the collection of all consecutive cooperating modes between and including modes $j, k \in [1 : N]$ with $j \le k$. A compound mode $\langle j, k \rangle$ is referred to as active if all $\{\beta_l\}_{l=j}^{k}$ are nonzero and the other elements of $\beta$ are zero. Note that using a single mode is a special case of a compound mode. By denoting $b(j) \triangleq \frac{1}{j(2 + (j+1)P)}$, we have the following results.

Corollary 2. For an N-encoder Gaussian setting where $C_B \le (N-1)\,C_{\mathrm{full}}$ with a fixed transmit power $P$, if there is a compound mode $\langle j, k \rangle$ such that $\mathrm{UB} \ge \mathrm{LB}$, then the channel capacity corresponding to the slope $\lambda \in [\mathrm{LB}, \mathrm{UB}]$ is achieved, and only achieved, by using that compound mode, which gives the capacity (54) and the corresponding total fronthaul capacity (55).
The proof is given in Appendix A.4.

Remark 4. The proof in Appendix A.4 shows that if the compound mode $\langle j, k \rangle$ achieves the capacity and $k \ge j + 2$, the modes in $[j+1 : k-1]$ should all be assigned the same amount of power in the optimal setting.

Remark 5. By rewriting (55) and comparing it to (54), we can also represent $C_B$ in terms of $C$ as (56) for a certain slope $\lambda \in [\mathrm{LB}, \mathrm{UB}]$.

Modes Selection for Capacity Achieving
The results in Corollary 2 state how the capacity is achieved and expressed over a certain range of $\lambda$. To further elaborate how exactly to use the cooperating modes from no cooperation to full cooperation, we develop a procedure that activates the modes efficiently. It is based on the following result, which identifies the valid compound modes, i.e., the ones that result in capacity, in terms of a power penalty.
where $\lceil \cdot \rceil$ is the ceiling function. If (57) is satisfied, compound modes $\langle j', k' \rangle$ with $j' > j$ and $k' < k$ do not achieve the capacity.
The proof is given in Appendix A.5. The power condition (57) indicates that a compound mode needs a certain transmit power to be supported in order to be optimal. On the other hand, some compound modes can never be optimal if the transmit power is too large. Now, note that $C$ is monotonically increasing in $C_B$ owing to the nonnegative slope $\lambda$, and monotonically decreasing in $\lambda$ according to (54) when $j$ and $k$ are fixed. This shows that, to achieve the capacity, compound modes should be activated in such a way that the corresponding slope range varies from large to small as $C_B$ increases. Therefore, based on the results in Corollary 3, an algorithm results for computing $C$ and $C_B$ over $C_B \in [0, (N-1)\,C_{\mathrm{full}}]$ by activating valid compound modes sequentially.
Algorithm 1 represents the cooperating strategy among the encoders. It reveals that cooperating modes should be activated one by one to form new compound modes as $C_B$ increases. At a certain point in the growth of $C_B$, the first mode dies, i.e., it is deactivated owing to the limited $P$ or $N$. As $C_B$ increases further, the lower modes die one by one until full cooperation is obtained. Figure 5 plots $C(C_B)$, obtained by applying Algorithm 1, over the full range $C_B \in [0, (N-1)\,C_{\mathrm{full}}]$ for $P = 1$ and $P = 21$. Different numbers of available encoders are considered. In the plot, each color segment represents the corresponding activated compound mode. In addition, the pentagram markers label the points where a lower mode has to be deactivated (dies) because of the power penalty (57) or because all $N$ encoders are used up, namely the operations in line 13 and line 9 of the algorithm, respectively. It is shown that for low SNR, i.e., $P = 1$, the modes die fast due to the small power. On the other hand, for large SNR, i.e., $P = 21$, the larger the number of available encoders, the more slowly the modes die, such that a higher capacity can be achieved (consider the curves for 2, 3, 4, and 5 encoders).
Algorithm 1 Compute $C$ and $C_B$ from no cooperation to full cooperation
Initialize: $j \leftarrow 1$ and $k \leftarrow 1$
Ensure: $1 \le j \le k \le N$
1: while $j < N$ do
2:   if Power condition (57) is satisfied then
3:     $\lambda \leftarrow [\mathrm{LB}, \mathrm{UB}]$

Property 2. Although probably the most natural strategy, the way of applying modes in the lower-bound proof of Property 1, i.e., time-sharing full cooperation of a small number of encoders, is not optimal in general. However, it is close to optimal when the SNR is small, as the compound modes that achieve capacity then reduce to single modes.
To visualize the evolution of each mode from no beamforming to total beamforming, we can illustrate the power allocation for each cooperating mode as $C_B$ increases. By incorporating the calculation of $\beta$ (given in Appendix A.4) and (37) into Algorithm 1, Figures 6 and 7 show the evolution of the modes' power for the 10-encoder setting with $P = 5$ and $P = 21$, respectively. It is shown that the first mode dies faster when $P$ is relatively small. Interestingly, the figures also show that once $C_B$ is large enough to approach total beamforming, the last mode dominates as all other modes vanish.
Moreover, we can also describe the cooperation of the encoders by showing the optimal distribution of $C_B$ over the fronthaul links. Figure 8 illustrates this distribution for the 10-encoder setting, where the bolder curves are for $P = 5$ and the lighter curves are for $P = 21$. In each case, the fronthaul capacity curves $C_{m,m+1}$ for $m = 1$ to $m = 9$ are located from left to right in the plot. This result further demonstrates the asymmetric water-pouring assignment of $C_B$ over the fronthaul links, see Remark 3.

$\langle 1, k \rangle$ Mode and Capacity Regimes
Consider the case where a $\langle 1, k \rangle$ mode achieves the capacity for $k \le N$. In this case, the growth rate of $C(C_B)$ is independent of $P$ and $N$; see the expression of $C_B$ in (56) with $j = 1$. Therefore, we say that the system works in a fronthaul-capacity-limited regime when a $\langle 1, k \rangle$ mode is used. We are interested in this regime because $C(C_B)$ achieves its fastest growth rate there regardless of $P$ and $N$. As $C_B$ increases further, the first mode dies due to either limited $P$ or limited $N$. We then say that the system works in a power-limited regime or an encoder-limited regime, respectively. When the system is in either of these two regimes, the growth of $C(C_B)$ slows down compared to the fronthaul-capacity-limited regime. This is due to the discontinuities of the slope $\lambda$; see the derived upper and lower bounds of $\lambda$. The following result shows how to determine in which regime the system works for given $C_B$, $P$, and $N$.

Property 3. For given P and N, if (58) holds, the capacity growth is limited by P, and the system works in a fronthaul-capacity-limited regime if (59) holds; otherwise, it works in a power-limited regime. On the other hand, if P > N − 2, the capacity growth is limited by N, and the system works in a fronthaul-capacity-limited regime if (60) holds; otherwise, it works in an encoder-number-limited regime.
Proof. Dying modes hamper the growth of C in $C_B$. Consider first the case in which the first mode of the compound mode 1, k dies because of constrained P, not N. In this case, P must satisfy (58), which follows from the lower bound of (57). Consequently, at the moment just after the first mode dies, i.e., when compound mode 2, k is active, we have j = 2 and k ≈ P + 1 by taking the upper bound of (57). This j, k setting results in $\lambda = \frac{1}{2+2P}$, so that (59) is obtained by evaluating (55). Similarly, when the compound mode 1, N can be supported by P, the first mode dies because no new encoders can be used. At the moment the first mode dies, i.e., while compound mode 1, N is still active, we have j = 1 and k = N, which also results in $\lambda = \frac{1}{2+2P}$. Hence, (60) follows.
Figure 9 plots $C(C_B)$ for the 10-encoder setting with P = 5 and P = 21, where the regime separations are marked at the points where the first mode dies for both powers. It illustrates that in the fronthaul-capacity-limited regime, C has the highest growth rate regardless of its initial value (point-to-point communication). In the next subsection, we focus on a system working in the fronthaul-capacity-limited regime.

Infinitely Many Encoders
Consider designing a practical system in which $C_B$ and P are critical resources while many encoders may be available, as in the radio stripe system. Based on the preceding analysis, the system should be operated in its fronthaul-capacity-limited regime whenever possible, since the fronthaul capacity is then maximally utilized. Hence, we determine the number of encoders that need to be activated for a given $C_B$, assuming that infinitely many encoders are available and the system is purely fronthaul constrained.
Directly solving (55) or (56) for k, the highest active mode and hence the number of required encoders, is not trivial. To obtain an accurate approximation, we first need the following lemma, whose proof follows the outline in [20] and is given in Appendix A.6.

Lemma 2. The non-principal branch of the Lambert W function, $W_{-1}(\cdot)$, defined on the interval $[-e^{-1}, 0)$, see [21], can be bounded as in (61).
Remark 6. Figure A3 shows that for all x ≥ 0.5, the lower bound (61) is much tighter than $W_{-1}(-e^{-(x+1)}) \ge -x - \sqrt{2x} - 1$ given in [20], which is, to the best of our knowledge, the tightest bound on $W_{-1}(\cdot)$ reported in the literature so far.
Property 4. When the system works in the fronthaul-limited regime, the number of required encoders k is quasi-linear in the available fronthaul capacity as given in (62), where $C_B$ is in nats per channel use and $\lfloor\cdot\rfloor$ is the floor function.
Proof. As the system works in the fronthaul-limited regime, compound mode 1, k exists. Thus, $C_B$ is given by (56) in nats. To upper bound k, we lower bound $C_B$ by taking $\lambda = \frac{1}{2(k-1)}$, see the bound (52), which gives a chain of inequalities in which step (a) follows by applying $(k-1)! \le e\,(k-1)^{k-\frac{1}{2}} e^{-k+1}$, derived from ([22], 6.1.38), and step (b) follows from the fact that $\left(\frac{k}{k-1}\right)^{k}$ is monotonically decreasing in k and tends to e as k → ∞. Solving the resulting inequality for k and applying (61) gives (66). Finally, since the number of encoders is an integer, (62) follows.
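The two elementary facts used in steps (a) and (b) of the proof can be checked numerically. The following is a minimal sketch, not part of the original derivation; the function names and the tested range of k are illustrative assumptions.

```python
import math


def factorial_upper_bound(n: int) -> float:
    """Right-hand side of n! <= e * n**(n + 1/2) * exp(-n), used with n = k - 1."""
    return math.e * n ** (n + 0.5) * math.exp(-n)


def ratio_power(k: int) -> float:
    """The quantity (k / (k - 1))**k, defined for k >= 2."""
    return (k / (k - 1)) ** k


def check_proof_facts(k_max: int = 60) -> bool:
    """Check both facts for k = 2, ..., k_max (k_max is an arbitrary choice)."""
    for k in range(2, k_max + 1):
        n = k - 1
        # Step (a): factorial bound. A tiny relative slack absorbs rounding,
        # because the bound holds with equality at n = 1.
        if math.factorial(n) > factorial_upper_bound(n) * (1.0 + 1e-12):
            return False
        # Step (b): (k/(k-1))**k decreases in k and stays above e.
        if not (ratio_power(k) > ratio_power(k + 1) > math.e):
            return False
    return True
```

Calling `check_proof_facts()` only confirms the inequalities on a finite range; it does not replace the analytical argument.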
In Figure 10, the bound on k given by (66) and the actual number of encoders that need to be activated are plotted as functions of $C_B$. The figure shows that the derived result is sufficiently accurate.
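The reference bound from [20] quoted in Remark 6 can be compared with the exact value of $W_{-1}$ numerically. The sketch below is illustrative only: it evaluates $W_{-1}(-e^{-(x+1)})$ with a Newton iteration (an assumed solver, not the method of the paper) and does not evaluate the bound (61) itself. The function names are hypothetical.

```python
import math


def lambert_w_minus1_at(x: float, tol: float = 1e-13, max_iter: int = 100) -> float:
    """Return W_{-1}(-exp(-(x + 1))) for x >= 0.

    With w = -t and t >= 1, the defining equation w * exp(w) = -exp(-(x + 1))
    is equivalent to t - ln(t) = x + 1, which is solved by Newton's method.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return -1.0  # branch point: W(-1/e) = -1
    # Start at the magnitude of the bound from [20]; it lies on or to the
    # right of the root, so the Newton iterates decrease monotonically.
    t = 1.0 + math.sqrt(2.0 * x) + x
    for _ in range(max_iter):
        step = (t - math.log(t) - (x + 1.0)) / (1.0 - 1.0 / t)
        t -= step
        if abs(step) <= tol * t:
            break
    return -t


def lower_bound_ref20(x: float) -> float:
    """Lower bound -x - sqrt(2x) - 1 on W_{-1}(-exp(-(x + 1))) quoted from [20]."""
    return -x - math.sqrt(2.0 * x) - 1.0
```

Evaluating both functions on a grid of x values gives the gap that the tighter bound (61) is meant to close.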

Concluding Remarks
In this paper, the downlink of a cell-free mMIMO system in which multiple APs connected in a linear fronthaul topology serve a single receiver was studied to reveal the relations between three fundamental network resources, namely the total fronthaul capacity $C_B$, the total transmit power P, and the number of available APs N. Specifically, we focused on partial distributed beamforming, where the total available fronthaul capacity is not enough to support full cooperation between all APs, i.e., beamforming. By formulating the problem as a MAC with multiple encoders linked in a feed-and-forward setting, we derived the channel capacity as a function of the total fronthaul capacity for both discrete and Gaussian channels. The derivation started with two encoders, and the analysis was then extended to multiple encoders. It was demonstrated that capacity is achieved by multi-layer superposition coding, from which the concept of cooperating modes was developed for the Gaussian channel. This cooperating mode technique leads to optimal cooperation among encoders. Bounds on the capacity for the N-encoder setting demonstrated that the channel capacity grows logarithmically in $C_B$ for a fixed P. The exact capacity solution shows that the capacity is achieved if and only if certain compound modes are used. An algorithm was derived for computing which compound modes should be activated as a function of $C_B$, which grows from zero to the value at which full beamforming is obtained. We demonstrated that $C_B$ should be water-poured over the fronthaul links to obtain optimality. Finally, by considering the case where infinitely many encoders are available, we showed that the number of required encoders is quasi-linear in the available total fronthaul capacity when the system is purely constrained by fronthaul resources.
Future directions include extending the results to channels whose links do not have unit gain, as assumed here, and considering multiple receivers. Another interesting direction is the equivalent uplink case.
Author Contributions: All authors conceived the problem and solution. P.Z. wrote the paper. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proofs and Derivations
Appendix A.1. Proof of Theorem 1
(i) Converse. Since $W \to (X_1^n, X_2^n) \to Y^n$ forms a Markov chain, a first bound on the rate follows. Moreover, from the Markovity of $(W, W_{12}) \to (X_1^n, X_2^n) \to Y^n$, a second bound is obtained. In these derivations, the random variable Q is uniformly distributed on [1 : n], and the bounds hold for some distribution $p(x_1, x_2, y) = p(x_1, x_2)\,p(y|x_1, x_2)$, for every achievable rate R. This concludes the converse for the discrete memoryless two-encoder case.
(ii) Achievability. We prove that if the message rate $(1/n)\log_2 M < C_B(C_{12})$ for a given fronthaul capacity $C_{12}$, the message error probability $P_e^{(n)}$ approaches zero as the codeword length n increases. Our coding method is based on superposition.
Encoding: Split the message W, which is uniformly distributed on [1 : M], into $(W_1, W_2)$ with $M = M_1 \times M_2$. The first part $W_1$, uniformly distributed on $[1 : M_1]$, is transmitted by encoder 1. The second part $W_2$, uniformly distributed on $[1 : M_2]$ and conveyed to encoder 2 by $W_{12}$, is transmitted by the two encoders cooperatively. Hence, when $(W_1, W_2) = (w_1, w_2)$, encoder 2 sends $x_2^n(w_2)$ while encoder 1 inputs $x_1^n(w_1, w_2)$ into the MAC.
Decoding: Let ε > 0. Based on the observed channel output sequence $y^n$, the decoder finds the message pair $(w_1, w_2)$ such that $(x_1^n(w_1, w_2), x_2^n(w_2), y^n) \in A_\epsilon^{(n)}(X_1 X_2 Y)$, where $A_\epsilon^{(n)}(X_1 X_2 Y)$ is the set of jointly ε-typical sequences, see Cover and Thomas [18]. If no such pair can be found, or if there is more than one such pair, an error is declared.
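The message split used in the encoding step can be illustrated with a short sketch. It assumes 0-based message indices (the text uses [1 : M]) and hypothetical function names; the rate check reflects the requirement that $W_2$ be conveyed over a fronthaul link of capacity $C_{12}$ bits per channel use within n channel uses.

```python
import math
from typing import Tuple


def split_message(w: int, m1: int, m2: int) -> Tuple[int, int]:
    """Map w in [0, m1*m2) to (w1, w2) with w1 in [0, m1) and w2 in [0, m2)."""
    if not 0 <= w < m1 * m2:
        raise ValueError("message index out of range")
    return divmod(w, m2)  # w = w1 * m2 + w2


def merge_message(w1: int, w2: int, m2: int) -> int:
    """Inverse of split_message."""
    return w1 * m2 + w2


def fronthaul_rate_ok(m2: int, n: int, c12: float) -> bool:
    """True if W2 fits on the fronthaul link: log2(M2) <= n * C12."""
    return math.log2(m2) <= n * c12
```

For example, with $M_1 = 3$ and $M_2 = 4$, the index w = 7 splits into $(w_1, w_2) = (1, 3)$; encoder 2 only needs $w_2$, whereas encoder 1 uses both parts.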
Probability of Error: Due to symmetry, the average probability of error equals the probability of error for an arbitrary message $w \in \{1, \ldots, 2^{nR}\}$. Hence, without loss of generality, we assume $W = w = (w_1, w_2)$. The error probability can then be bounded using the Asymptotic Equipartition Property (AEP) for all n large enough. Now, as long as $M_1 \le 2^{n(I(X_1;Y|X_2)-5\epsilon)}$ and the companion rate condition hold, $P_e^{(n)} \le 2\epsilon$ for all n large enough. Therefore, we take $\log_2 M_2 = \min\{n(I(X_2;Y)+\epsilon),\, nC_{12}\}$, so that both (9) and (A10) are satisfied. Letting ε → 0 then establishes the achievability part of Theorem 1.

Appendix A.2. Proof of Theorem 3
The proof generalizes that of the two-encoder setting. Consider a simplified block diagram of the N-encoder setting as shown in Figure A1. Now, consider a cut of the fronthaul link between $X_m$ and $X_{m+1}$ for any given m ∈ [1 : N − 1], such that the nodes in the network are separated into the two sets $\{X_1^m\}$ and $\{X_{m+1}^N, Y\}$.
(i) Converse. Consider the Markovity of $W \to (X_1^n, X_2^n, \ldots, X_N^n) \to Y^n$. Applying Fano's inequality gives a first bound on the rate. Then, considering the cut between $X_m$ and $X_{m+1}$, we have that $\log_2 M \le \log_2 M_{m,m+1} + I(X_1^n, X_2^n, \ldots, X_m^n; Y^n \mid X_{m+1}^n, \ldots, X_N^n) + F$. Note that the above result is valid for any m in [1 : N − 1]. Thus, by letting n → ∞, the converse follows.
(ii) Achievability. First, consider that the message W can be represented by N independent messages as $W = (W_1, W_2, \ldots, W_N)$. Then, given the linear topology of the encoders, we distribute $\{W_i\}_{i=1}^{N}$ into the network in the manner illustrated by Figure A2, i.e., for the link between any $X_m$ and $X_{m+1}$, the fronthaul message $W_{m,m+1}$ conveys the corresponding messages $\{W_i\}_{i=m+1}^{N}$. Therefore, for a fixed distribution $p(x_1, x_2, \ldots, x_N)$ and the corresponding marginals, we first generate $M_N$ i.i.d. n-sequences $x_N^n(w_N)$ with $w_N \in [1, M_N]$ according to $\Pr(X_N^n = x_N^n) = \prod_{i=1}^{n} p_{X_N}(x_{Ni})$, and then, for each $x_N^n(w_N)$, generate $M_{N-1}$ i.i.d. n-sequences $x_{N-1}^n(w_{N-1}, w_N)$ with $w_{N-1} \in [1, M_{N-1}]$ according to $\Pr(X_{N-1}^n = x_{N-1}^n \mid X_N^n = x_N^n(w_N)) = \prod_{i=1}^{n} p_{X_{N-1}|X_N}(x_{N-1,i} \mid x_{Ni}(w_N))$, and so on. In this way, an N-layer superposition codebook is generated and revealed to all encoders and the decoder.
Figure A2. N-layer superposition coding message structure.
Thus, to send a message w, the encoders transmit the sequences $\{x_m^n(w_m, w_{m+1}, \ldots, w_N)\}_{m=1}^{N}$ over the MAC. At the decoder, a unique message tuple $(w_1, w_2, \ldots, w_N)$ is found by simultaneous typicality decoding, as for the two-encoder case. A similar probability of error analysis shows that, as long as the message sizes are suitably bounded, $P_e^{(n)} \le 2\epsilon$ for all sufficiently large n and any ε > 0. By further considering $M_N \le 2^{nC_{N-1,N}}$, $M_{N-1}M_N \le 2^{nC_{N-2,N-1}}$, ..., and $\prod_{m=2}^{N} M_m \le 2^{nC_{12}}$, we can subsequently choose the message sizes accordingly, which establishes the achievability for n → ∞ and ε → 0.
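The layered codebook construction in the achievability proof can be sketched as follows. This is a toy illustration under loud assumptions: a binary alphabet, a uniform distribution for the top layer, and each lower layer generated by flipping the bits of the layer above with probability `flip_prob` (a hypothetical Markov choice for the conditional distributions). Message indices are 0-based, and all names are illustrative.

```python
import random
from typing import Dict, List, Sequence, Tuple

Codebook = Dict[Tuple[int, ...], List[int]]


def build_layers(sizes: Sequence[int], n: int, flip_prob: float = 0.1,
                 seed: int = 0) -> List[Codebook]:
    """Build an N-layer superposition codebook for sizes = [M_1, ..., M_N].

    layers[m - 1] maps (w_m, ..., w_N) to the length-n codeword of encoder m.
    """
    rng = random.Random(seed)
    num = len(sizes)
    layers: List[Codebook] = [{} for _ in range(num)]
    # Top layer: encoder N, indexed by w_N only (assumed uniform i.i.d. bits).
    for w in range(sizes[-1]):
        layers[-1][(w,)] = [rng.randint(0, 1) for _ in range(n)]
    # Lower layers: for each codeword of encoder m + 1, draw M_m codewords of
    # encoder m conditionally on it (assumed bit-flip conditional law).
    for m in range(num - 2, -1, -1):
        for upper_idx, upper_word in layers[m + 1].items():
            for w in range(sizes[m]):
                layers[m][(w,) + upper_idx] = [
                    bit ^ int(rng.random() < flip_prob) for bit in upper_word
                ]
    return layers


def encode(layers: List[Codebook], message: Sequence[int]) -> List[List[int]]:
    """Codewords sent by encoders 1..N for message (w_1, ..., w_N)."""
    return [layers[m][tuple(message[m:])] for m in range(len(layers))]
```

Encoder m only needs the sub-tuple $(w_m, \ldots, w_N)$, which matches the constraint that the link into encoder m carries the messages of encoder m and of all encoders after it.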
where j is the smallest index such that $C_{m,m+1} = 0$ for all m ∈ [j − 1, N − 1], and $L(C_B, \beta) = I_m + C_{m,m+1}$ for any m such that $C_{m,m+1} \neq 0$, namely the 'water level'. Once the water-filling is performed, we fix the corresponding distribution of $C_B$. For Case (a), where C = L, we can now decrease $\{\beta_m\}_{m=j}^{N}$ to increase $\beta_1$ such that L is increased. For Case (b), where $C = I_N$, we can look for an m < N with $\beta_m > 0$ such that, by decreasing $\beta_m$, $\beta_N$ increases and $I_N$ increases as well. Therefore, any β satisfying Case (a) or Case (b) is not optimal. The equality of the terms on the RHS of (30) is thus necessary.
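The water-filling step referred to above can be sketched generically. In this minimal example, the values $I_m$ are treated as fixed inputs (in the paper they depend on β), and the function name and argument names are hypothetical. The routine finds the water level L with $\sum_m \max(L - I_m, 0) = C_B$ and returns the resulting link capacities $C_{m,m+1} = \max(L - I_m, 0)$.

```python
from typing import List, Sequence, Tuple


def water_fill(floors: Sequence[float], total: float) -> Tuple[float, List[float]]:
    """Distribute `total` over links with floor values `floors` (the I_m).

    Returns (water_level, allocation) with allocation[m] = max(level - floors[m], 0)
    and sum(allocation) == total up to rounding.
    """
    if total < 0 or len(floors) == 0:
        raise ValueError("need a non-negative total and at least one link")
    ordered = sorted(floors)
    prefix = 0.0
    level = ordered[0]
    for count, floor in enumerate(ordered, start=1):
        prefix += floor
        # Level obtained if exactly the `count` lowest floors are covered.
        level = (total + prefix) / count
        # Stop once the level does not reach the next higher floor.
        if count == len(ordered) or level <= ordered[count]:
            break
    return level, [max(level - f, 0.0) for f in floors]
```

With increasing floors, the links left with zero capacity form a tail of the chain, consistent with the role of the index j in the proof.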
Appendix A.4. Proof of Proposition 2
The proof takes three steps. In Step (1), we show that an active compound mode j, k achieves the capacity if LB ≤ UB is satisfied. In Step (2), we show that using any two separated active modes (with all other modes inactive) does not achieve the capacity. In Step (3), we show that the exact solutions (54) and (55) result.
Step (1) Note that the function g given in (42) is convex-∩ when $\lambda \le \frac{1}{k-1}$ if the largest activated mode is k. So, we set the partial derivatives of g with respect to $\{\beta_i\}_{i=1}^{N}$ according to the Kuhn-Tucker conditions, see ([23], Eqs. 4.4.10 and 4.4.11), when the active compound mode j, k achieves the capacity. By considering (42) in nats, the partial derivative of g with respect to $\beta_i$ can be computed for every i ∈ [1 : N].
Step (1.1) Firstly, suppose that only compound mode j, k is active, i.e., all $\{\beta_i\}_{i=j}^{k}$ are nonzero, while the other $\{\beta_i\}_{i=1}^{j-1}$ and $\{\beta_i\}_{i=k+1}^{N}$ are zero. The partial derivative then simplifies, and for brevity we use the shorthand $D(\cdot)$. Now, the partial derivatives corresponding to i ∈ [j : k] should all be identical to some value µ, as stated in (A23), to find the capacity solution in terms of the optimal distribution of β. For the case of k > j, we can recursively evaluate the equalities in (A23) as $\partial g/\partial\beta_i = \partial g/\partial\beta_{i-1}$ by taking i from k down to j + 1, with the use of (A21), which yields (A25). Now, based on (A24) and (A25), we can derive expressions for $\{\beta_i\}_{i=j}^{k}$ by considering two scenarios.
Scenario 1: Consider k ≥ j + 2, i.e., at least three consecutive modes are active. By taking i = j and i = j + 1, (A25) can be used twice to obtain the equality $\frac{j}{D(j)} = \frac{j+2}{D(j+1)}$, which gives the relation (A26). For k > j + 2, expression (A25) allows us to further obtain, by taking i in the order of j + 1 to k − 1, an interesting and important relation. Therefore, by applying relation (A26), we can express D(k) as in (A29), which is valid for the case of k = j + 2 as well. So, by setting (A25) equal to (A24) with i = j and substituting D(k) from (A29), we can first derive the power of modes j + 1 to k − 1 as (A31). According to (A26), we can then obtain the power of the first mode as (A32). Furthermore, owing to $\sum_{i=j}^{k}\beta_i = 1$, we can finally represent the power of the last mode as (A33). Now, applying the total power constraint, we should have (A34). By substituting (A33), (A31), and (A32), the corresponding slope λ should simultaneously satisfy
$$0 < \lambda < \frac{jP}{2 + jP + j^2 P}, \qquad \frac{1}{j(2 + (j+1)P)} < \lambda < \frac{1}{2j}. \qquad \text{(A35)}$$
Since k > j + 1, it is easy to see that the lower bound of λ is $\frac{1}{j(2+(j+1)P)}$. For the upper bound, if $\frac{1}{2(k-1)} > \frac{jP}{2+jP+j^2P}$, it leads to $P < \frac{2}{2jk - j^2 - 3j}$. Due to j ≤ k − 2, such P results in $j(2 + (j+1)P) < 2j + \frac{2(j+1)}{2k - j - 3} \le 2(k-1)$, which contradicts the lower bound. Therefore, for the compound mode to exist, the slope should be in the range $\frac{1}{j(2 + (j+1)P)} < \lambda < \frac{1}{2(k-1)}$.
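The algebra behind the contradiction argument above is compact, so the intermediate steps are spelled out here as a worked sketch. It uses only the quantities appearing in the text and the assumption j ≤ k − 2 stated there, which gives $2k - j - 3 \ge j + 1 > 0$.

```latex
\begin{align*}
\frac{1}{2(k-1)} > \frac{jP}{2 + jP + j^{2}P}
  &\iff 2 + jP + j^{2}P > 2(k-1)\,jP \\
  &\iff 2 > jP\,(2k - j - 3) \\
  &\iff P < \frac{2}{2jk - j^{2} - 3j}. \\[4pt]
\text{Then}\quad j\bigl(2 + (j+1)P\bigr)
  &< 2j + \frac{2(j+1)}{2k - j - 3}
   \;\le\; 2j + 2 \;\le\; 2(k-1), \\
\text{so that}\quad \frac{1}{j\bigl(2 + (j+1)P\bigr)} &> \frac{1}{2(k-1)}.
\end{align*}
```

The last line says that the lower bound on λ would exceed the upper bound $\frac{1}{2(k-1)}$, so no admissible slope would remain; this is the contradiction invoked in the proof.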
By substituting (A31) and (A33) for $\beta_{j+1}P$ and $\beta_k P$, the required inequality can be easily shown, where i ≥ 1 is applied in the bounding for the last step. Note that Steps (1.2) and (1.3) and the resulting bounds of λ also cover the case of j = k, i.e., only one mode is active and achieves the capacity. Now, by considering the ranges given by (A37), (A42), and (A47), the slope bounds in (52) and (53) result.
Step (2) Assume that the capacity can also be achieved by activating only two separated modes j and k, where j ∈ [1 : k − 1] and k ∈ [j + 1, N]. Then, the Kuhn-Tucker condition requires $\partial g/\partial\beta_k = \partial g/\partial\beta_j$, which results in the relation (A48). Note that in D(k) only $\beta_k$ and $\beta_j$ are nonzero. Moreover, for all i ∈ [1 : k − j − 1], the partial derivative for the intermediate mode can be evaluated. By substituting the relation (A48) into this derivative, an inequality is obtained whose last step is due to k − i > j. This result contradicts the Kuhn-Tucker condition, which demonstrates that only compound modes achieve the capacity.