Energy Efficiency Optimization for Downlink Cloud RAN with Limited Fronthaul Capacity

In the downlink cloud radio access network (C-RAN), fronthaul compression has been developed to combat the performance bottleneck caused by capacity-limited fronthaul links. Nevertheless, state-of-the-art fronthaul compression designs that target spectral efficiency are ill-suited to energy efficiency (EE) maximization, especially under the requirements of large-scale implementation. Therefore, this paper develops a low-complexity algorithm with closed-form solutions for the EE maximization problem in a downlink C-RAN with limited fronthaul capacity. To solve this non-trivial problem, we first derive an optimal solution using a branch-and-bound approach to provide a performance benchmark. Then, by transforming the original problem into a parametric subtractive form, we propose a low-complexity two-layer decentralized (TLD) algorithm. Specifically, a bisection search is used in the outer layer, while in the inner layer we propose an alternating direction method of multipliers algorithm that finds a closed-form solution in parallel with guaranteed convergence. Simulation results demonstrate that the TLD algorithm achieves a near-optimal solution, and its EE is much higher than that of the spectral efficiency maximization approach. Furthermore, the optimal and TLD algorithms are extended to cope with channel errors. The results show that the robust algorithms maintain robust performance when perfect channel state information is unavailable.


Introduction
To meet the demands of rapidly growing data traffic and numbers of mobile terminals, the fifth generation (5G) wireless network [1] faces challenges in terms of system capacity, energy consumption, and so on. The cloud radio access network (C-RAN) [2,3], which has emerged as a promising solution for reducing both capital and operating expenditures, is expected to be an effective approach to fulfilling these requirements. In a C-RAN, a central unit (CU), or baseband unit (BBU) pool, connects all the deployed low-power base stations (BSs) via finite-capacity fronthaul links that allow joint signal processing and transmission. Despite the various attractive advantages brought by C-RANs [4][5][6], such as joint beamforming and centralized encoding and decoding, the high capacity requirement of the fronthaul links is the performance bottleneck for large-scale implementation. Therefore, in practical C-RANs, data sharing [7][8][9][10][11][12][13] and fronthaul compression [14][15][16][17][18][19][20][21][22] are recognized as two promising approaches to overcome the significant impact of the constrained fronthaul on spectral efficiency (SE) and energy efficiency (EE) (bit-per-Joule) [23][24][25]. The data sharing strategy [8][9][10][11][12][13] reduces the fronthaul load by limiting the user data delivered to each BS (each BS serves only a small subset of the mobile users (MUs)). In the latter approach, the CU computes the precoded signals intended for each BS, and then the signals are quantized and sent to the BSs over the fronthaul links.
The main contributions of this paper are summarized as follows. Firstly, we formulate the EE maximization (EEmax) problem of jointly designing the beamformers and quantization noises under a limited power budget and capacity-limited fronthaul links as a non-convex fractional programming problem. We first derive an optimal solution method based on the branch-and-bound (BnB) technique [32,33] to solve the EEmax problem globally.
Specifically, the BnB algorithm computes upper and lower bounds on the optimal EE and prunes the regions that cannot contain the optimal solution. The algorithm terminates when the gap between the upper and lower bounds is smaller than a predefined accuracy.
Secondly, to reduce the computational complexity of the optimal algorithm and facilitate decentralized implementation, we transform the problem into a parametric subtractive form and propose a two-layer decentralized (TLD) algorithm to solve the equivalent subtractive problem. Specifically, a one-dimensional search is used to find the EE in the outer layer, and a decentralized algorithm based on the alternating direction method of multipliers (ADMM) is proposed to solve a subproblem in the inner layer. The proposed algorithm obtains closed-form solutions in parallel with guaranteed convergence.
Thirdly, considering the imperfect channel state information (CSI) obtained in practical C-RANs, robust versions of the optimal and TLD algorithms for the EEmax problem are proposed to characterize the performance degradation caused by CSI errors. In particular, the robust optimal algorithm also provides a performance benchmark, and the robust TLD algorithm also admits closed-form solutions computed in parallel.
Finally, we validate the effectiveness of the proposed algorithms through extensive simulations. The results demonstrate that both the optimal and TLD algorithms converge, and that the TLD algorithm achieves a near-optimal EE that is much higher than that of the SE maximization (SEmax) approach. Numerical results also show that the EE performance is sensitive to channel errors, and a smaller channel error leads to a higher EE.
The remainder of the paper is organized as follows. In Section 2, the system model and problem formulation are presented. Section 3 describes the proposed optimal algorithm with perfect CSI. Section 4 presents the TLD algorithm with perfect CSI. For the imperfect CSI case, the robust optimal and TLD algorithms are presented in Section 5. The simulation results are given in Section 6. Finally, we conclude this paper in Section 7.
Notations: We use $\mathbb{C}$ to denote the set of complex numbers, and $\mathbb{C}^{M\times N}$ to denote the set of all $M\times N$ matrices with complex entries. Boldface uppercase and lowercase letters denote matrices and vectors, respectively. $\mathbf{X}^{-1}$, $\mathbf{X}^{H}$ and $\mathrm{Tr}(\mathbf{X})$ represent the matrix inverse, Hermitian transpose and trace, respectively, and $(\cdot)^{T}$ and $(\cdot)^{*}$ denote the transpose and complex conjugate. $\|\mathbf{x}\|$ represents the Euclidean norm of a vector $\mathbf{x}$. $\mathbb{E}[\cdot]$ is the expectation operator, and $\mathrm{diag}(x_1,\cdots,x_L)$ represents a diagonal matrix with diagonal elements $\{x_1,\cdots,x_L\}$. For a complex number $x$, $|x|$ is its modulus, and $\mathrm{Re}\{x\}$ and $\mathrm{Im}\{x\}$ are its real and imaginary parts. "s.t." stands for "subject to".

System Model
We consider a downlink C-RAN with $L$ single-antenna BSs and $K$ single-antenna MUs. The CU connects all the BSs via fronthaul links, and the capacity of the $l$-th link is limited to $C_l$, $l=1,\cdots,L$. We assume that the CU has access to global CSI. The data symbol of the $k$-th MU, denoted by $s_k$, is complex Gaussian with zero mean and unit variance. Denote by $x_l=\sum_{k=1}^{K}w_{kl}s_k$ the beamformed complex signal computed at the CU for BS $l$, where $w_{kl}$ is the beamforming coefficient from BS $l$ to MU $k$. To reduce the capacity requirements on the fronthaul network, the signals are compressed before being forwarded to the corresponding BSs via the finite-capacity fronthaul links. According to [18,20], the compression procedure is modeled as a test channel, $\tilde{x}_l=x_l+e_l,\ \forall l$, where $e_l$ is the quantization noise, and $x_l$ and $\tilde{x}_l$ are the input and output of the test channel, respectively. The received signal at MU $k$ is given by
$$y_k=\sum_{l=1}^{L}h_{kl}\tilde{x}_l+n_k=\mathbf{h}_k^{T}\mathbf{w}_k s_k+\sum_{i\neq k}\mathbf{h}_k^{T}\mathbf{w}_i s_i+\mathbf{h}_k^{T}\mathbf{e}+n_k,$$
where $h_{kl}\in\mathbb{C}$ is the channel from BS $l$ to MU $k$, and $n_k$ is the additive Gaussian noise at MU $k$ with zero mean and variance $\sigma^2$. When single-user detection is employed at each MU, the received signal-to-interference-plus-noise ratio (SINR) at MU $k$ is
$$\mathrm{SINR}_k=\frac{|\mathbf{h}_k^{T}\mathbf{w}_k|^2}{\sum_{i\neq k}|\mathbf{h}_k^{T}\mathbf{w}_i|^2+\mathbf{h}_k^{T}\mathbf{Q}\mathbf{h}_k^{*}+\sigma^2},$$
where $\mathbf{h}_k=[h_{k1},\cdots,h_{kL}]^T$ and $\mathbf{w}_k=[w_{k1},\cdots,w_{kL}]^T$ are the aggregated channel and beamforming vectors from all BSs to MU $k$, respectively, and $\mathbf{Q}=\mathbb{E}[\mathbf{e}\mathbf{e}^H]$, with entries $q_{lj}$, is the covariance matrix of $\mathbf{e}=[e_1,\cdots,e_L]^T$. Multivariate compression, in which the quantization noises of different BSs are correlated ($\mathbb{E}[e_le_j^{*}]\neq 0$ for $l\neq j$), is also possible and has been studied in [20]. In this paper, we consider point-to-point fronthaul compression, i.e., $\mathbb{E}[e_le_j^{*}]=0$ for $l\neq j$, so that $\mathbf{Q}$ is diagonal, and we let $q_l=q_{ll}$, $l=1,\cdots,L$.
Considering an ideal vector quantizer, the quantization noise level $q_l$ and the fronthaul capacity $C_l$ of the $l$-th fronthaul link satisfy the following constraint [12,15]:
$$\log_2\Big(1+\frac{\sum_{k=1}^{K}|w_{kl}|^2}{q_l}\Big)\le C_l,\quad\forall l.\tag{4}$$
The transmit power of BS $l$ is constrained by $\mathbb{E}[|\tilde{x}_l|^2]\le P_l^{\max}$, where $P_l^{\max}$ is the maximum transmit power of BS $l$. The transmit power of BS $l$, denoted by $p_l$, consists of the quantization noise power $q_l$ and the data transmission power $p_l^t$, i.e., $p_l=p_l^t+q_l=\sum_{k=1}^{K}|w_{kl}|^2+q_l$. Hence, the per-BS power constraint reads $p_l\le P_l^{\max}$, $\forall l$.
The network power consumption of the C-RAN consists of the BS transmit power and the relative fronthaul network power. In this paper, we adopt the power consumption model of C-RAN in [4]:
$$P_{\mathrm{tot}}=\sum_{l=1}^{L}\frac{p_l}{\eta_l}+P_c,$$
where $P_c=\sum_{l=1}^{L}P_l^c$ is the total relative fronthaul link power consumption [4], and $P_l^c\ge 0$ is the relative power consumption of fronthaul link $l$, i.e., the power saved when both the fronthaul link and the corresponding BS are switched off. $\eta_l$ ($0<\eta_l\le 1$) is the drain efficiency of the power amplifier of BS $l$. We assume that all the BSs have the same drain efficiency, i.e., $\eta_l=\eta$, $\forall l$. Since BS switch on/off is not considered in this paper, $P_c$ is a nonnegative constant, which we refer to as the static power for brevity. We point out that, based on the results obtained by the proposed algorithms, BS selection (i.e., deciding which BSs to switch off) can be readily incorporated by ordering the BSs and applying the bisection search in [4] to further improve EE. However, this is outside the scope of this paper.

Problem Formulation
To balance the sum rate and the total power consumption, we consider the EEmax problem. The joint design of the beamformers and the quantization noises in the presence of fronthaul compression can be formulated as
$$\mathcal{P}_0:\ \max_{\mathbf{w},\mathbf{q}}\ \frac{\sum_{k=1}^{K}R_k}{P_{\mathrm{tot}}}\tag{7a}$$
$$\text{s.t.}\ \sum_{k=1}^{K}|w_{kl}|^2+q_l\le P_l^{\max},\ \forall l,\tag{7b}$$
$$\qquad q_l\ge\beta_l\sum_{k=1}^{K}|w_{kl}|^2,\ \forall l,\tag{7c}$$
where $\mathbf{w}=[\mathbf{w}_1,\cdots,\mathbf{w}_K]$ and $\mathbf{q}=[q_1,\cdots,q_L]$ are the collections of beamforming vectors and quantization noises, respectively, and $R_k=\log_2(1+\mathrm{SINR}_k)$ is the achievable data rate of MU $k$. (7b) is the BS transmit power constraint, and (7c) is a reformulation of the fronthaul capacity constraint (4) with $\beta_l=1/(2^{C_l}-1)$. Due to the non-convex $R_k$ and the fractional objective function in (7a), $\mathcal{P}_0$ is an NP-hard problem, and it is challenging to find its global optimum. In the following, we first present an optimal approach and then propose a TLD solution framework.
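As a concrete check of the model, the following Python sketch evaluates the objective (7a) for given beamformers. It assumes the received-signal convention $y_k=\sum_l h_{kl}\tilde x_l+n_k$ (so the signal and interference terms use $\sum_l h_{kl}w_{il}$), sets $q_l$ to the smallest value allowed by (7c), uses $P_{\mathrm{tot}}=\sum_l p_l/\eta+P_c$, and stores the channels and beamformers in hypothetical arrays `H[k, l]` and `W[l, k]`. The numbers in the usage example (η = 0.35, P_c = 2 W, P^max = 1 W, σ² = 0.01) are illustrative only.

```python
import numpy as np

def ee_objective(H, W, C, p_max, eta, p_c, sigma2):
    """EE of P0 for given beamformers; H[k, l] = h_kl, W[l, k] = w_kl (assumed layout)."""
    beta = 1.0 / (2.0 ** C - 1.0)                  # beta_l = 1 / (2^{C_l} - 1)
    p_data = np.sum(np.abs(W) ** 2, axis=1)        # p_l^t = sum_k |w_kl|^2
    q = beta * p_data                              # smallest q_l allowed by (7c)
    p = p_data + q                                 # p_l = p_l^t + q_l
    if np.any(p > p_max):
        raise ValueError("per-BS power constraint (7b) is violated")
    G = H @ W                                      # G[k, i] = sum_l h_kl w_il
    signal = np.abs(np.diag(G)) ** 2
    interference = np.sum(np.abs(G) ** 2, axis=1) - signal
    qn = (np.abs(H) ** 2) @ q                      # sum_l q_l |h_kl|^2
    sinr = signal / (interference + qn + sigma2)
    rates = np.log2(1.0 + sinr)                    # R_k in bit/s/Hz
    p_tot = np.sum(p) / eta + p_c                  # network power model
    return np.sum(rates) / p_tot, rates, q

# Usage with random (hypothetical) channels and beamformers
rng = np.random.default_rng(0)
L, K = 5, 4
H = (rng.standard_normal((K, L)) + 1j * rng.standard_normal((K, L))) / np.sqrt(2)
W = 0.1 * (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K)))
ee, rates, q = ee_objective(H, W, C=5.0, p_max=1.0, eta=0.35, p_c=2.0, sigma2=1e-2)
```

With (7c) met with equality, the fronthaul constraint (4) is tight: $\log_2(1+p_l^t/q_l)=C_l$ for every BS.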

Optimal Algorithm with Perfect CSI
In this section, we propose a globally optimal algorithm based on the branch-and-bound method [32] to solve problem $\mathcal{P}_0$. The essential idea of the proposed algorithm is the following equivalent transformation:
$$\mathcal{P}_1:\ \max_{\mathbf{w},\mathbf{q},\mathbf{t}}\ f(\mathbf{t})=t_{K+1}\sum_{k=1}^{K}t_k\tag{8a}$$
$$\text{s.t.}\ \mathrm{SINR}_k\ge 2^{t_k}-1,\ \forall k,\tag{8b}$$
$$\qquad P_{\mathrm{tot}}\le\frac{1}{t_{K+1}},\tag{8c}$$
$$\qquad (7b),\ (7c),$$
where $\mathbf{t}=[t_1,\cdots,t_{K+1}]^T$ are the introduced auxiliary variables, and (8b) is the transformation of $\log_2(1+\mathrm{SINR}_k)\ge t_k$.
Problems $\mathcal{P}_0$ and $\mathcal{P}_1$ are equivalent because constraints (8b) and (8c) hold with equality at the optimum. Although $\mathcal{P}_1$ is more tractable than $\mathcal{P}_0$, it is still hard to solve due to the coupling between $\mathbf{w}_k$ and $\mathbf{t}$. It is observed that increasing any $t_k$ within the feasible set of $\mathcal{P}_1$ yields a better objective value. This motivates us to use the monotonic optimization framework in [32], namely the optimal BnB algorithm, to solve problem $\mathcal{P}_1$.

BnB Algorithm
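To solve $\mathcal{P}_1$, we first denote the feasible set of the variables $\mathbf{t}$ by $\Xi$, i.e., $\Xi=\{\mathbf{t}\mid\text{constraints of }\mathcal{P}_1\}$. Denote by $\underline{\mathbf{t}}=[\underline{t}_1,\cdots,\underline{t}_{K+1}]^T$ and $\bar{\mathbf{t}}=[\bar{t}_1,\cdots,\bar{t}_{K+1}]^T$ the aggregated lower and upper bounds of $\mathbf{t}$. The interval $\underline{\mathbf{t}}\le\mathbf{t}\le\bar{\mathbf{t}}$ indicates that each element of $\mathbf{t}$ is bounded by its lower and upper bounds.
The objective function $f(\mathbf{t})$ in $\mathcal{P}_1$ is monotonically increasing over the box $\underline{\mathbf{t}}\le\mathbf{t}\le\bar{\mathbf{t}}$. In particular, an upper bound on $t_k$ is obtained by ignoring the interference and quantization noise and using the maximum transmit power, i.e., $t_k\le\log_2\big(1+\frac{\|\mathbf{h}_k\|^2}{\sigma^2}\sum_{l=1}^{L}P_l^{\max}\big)=\bar{t}_k$, and the lower bound of $t_k$ is $\underline{t}_k=0$, $k=1,\cdots,K$. Similarly, $t_{K+1}$ satisfies $\underline{t}_{K+1}\le t_{K+1}\le\bar{t}_{K+1}$, where $\bar{t}_{K+1}=1/P_c$ and $\underline{t}_{K+1}=1/\big(\sum_{l=1}^{L}P_l^{\max}/\eta+P_c\big)$. Hence, the feasible set $\Xi$ is contained in the box $\Phi=\{\mathbf{t}\mid\underline{\mathbf{t}}\le\mathbf{t}\le\bar{\mathbf{t}}\}$. For a given $\mathbf{t}\in\Phi$, problem $\mathcal{P}_1$ reduces to the feasibility problem
$$\mathcal{P}_2:\ \text{find}\ \mathbf{w}_1,\cdots,\mathbf{w}_K,\ q_1,\cdots,q_L\quad\text{s.t.}\ (7b),\ (7c),\ (8b),\ (8c).\tag{9}$$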
Obviously, when $\mathbf{t}=\underline{\mathbf{t}}=\big[0,\cdots,0,1/\big(\sum_{l=1}^{L}P_l^{\max}/\eta+P_c\big)\big]^T$, problem $\mathcal{P}_2$ is feasible (e.g., $\mathbf{w}=\mathbf{0}$ and $\mathbf{q}=\mathbf{0}$ satisfy all of its constraints), but this point is not optimal, because a nonzero sum rate can be achieved with the maximum transmit power. In this paper, we customize the BnB algorithm to solve problem $\mathcal{P}_1$ globally. The BnB algorithm divides the box $\Phi$ into smaller boxes and cuts off the boxes that cannot contain an optimal solution; it converges to a globally optimal solution, within a prescribed tolerance, after a finite number of iterations. Since $\mathcal{P}_2$ is non-convex due to the SINR constraint (8b), we first recast this constraint in convex form. Letting $\gamma_k=2^{t_k}-1$, (8b) can be equivalently rewritten as [7,33]
$$\sqrt{1+\frac{1}{\gamma_k}}\;\mathbf{h}_k^{T}\mathbf{w}_k\ge\Big\|\big[\mathbf{h}_k^{T}\mathbf{w}_1,\cdots,\mathbf{h}_k^{T}\mathbf{w}_K,\ \sqrt{q_1}|h_{k1}|,\cdots,\sqrt{q_L}|h_{kL}|,\ \sigma\big]\Big\|,\ \forall k,\tag{10a}$$
$$\mathrm{Im}\{\mathbf{h}_k^{T}\mathbf{w}_k\}=0,\ \forall k.\tag{10b}$$
In the above formulation, (10a) is a second-order cone (SOC) constraint. Constraint (10b) is imposed without loss of generality, because a phase rotation of the beamformers does not affect the objective of the problem [25,33].
Moreover, (8c) can be rewritten as
$$\frac{1}{\eta}\sum_{l=1}^{L}\Big(\sum_{k=1}^{K}|w_{kl}|^2+q_l\Big)\le\frac{1}{t_{K+1}}-P_c.\tag{11}$$
Then, problem $\mathcal{P}_2$ becomes an SOCP feasibility problem, which can be solved efficiently. In the proposed BnB algorithm, the bounding functions $\phi_{ub}(\Phi)$ and $\phi_{lb}(\Phi)$ provide the upper and lower bounds of $f(\mathbf{t})$ over a box $\Phi\triangleq\{\mathbf{t}\mid t_{k,\min}\le t_k\le t_{k,\max},\ \forall k\}$, where $t_{k,\min}$ and $t_{k,\max}$ denote the end points of the $k$-th edge of $\Phi$. We denote by $\mathcal{V}_i$ the collection of all boxes created at iteration $i$. The workflow of the BnB algorithm for obtaining the globally optimal solution is presented in Algorithm 1.
1: Check the feasibility of problem $\mathcal{P}_2$ with the given $\mathbf{t}$. If it is infeasible, exit; otherwise, go to step 2.
Branch $\Phi_i$ into two smaller boxes $\Phi_{\mathrm{I}}$ and $\Phi_{\mathrm{II}}$ using bisection along the longest edge

Remark 1.
According to [34], the convergence of Algorithm 1 is guaranteed by the monotonicity of $f(\mathbf{t})$. The main step of Algorithm 1 is to delete the boxes that cannot contain the optimal solution. This step, referred to as pruning, leaves progressively smaller boxes that contain the optimal solution; therefore, step 7 ensures the convergence of Algorithm 1. The corresponding optimal EE is $f(\mathbf{t}^{\star})=U_i$, the optimal data rate of MU $k$ is $t_k^{\star}$, $k=1,\cdots,K$, and the network power consumption is $1/t_{K+1}^{\star}$. Algorithm 1 gives an optimal solution to problem $\mathcal{P}_1$ (equivalently, to problem $\mathcal{P}_0$) when the tolerance $\tau$ is sufficiently small, and thus provides a performance benchmark for suboptimal algorithms. However, its computational complexity is generally very high. An improved box-reduction approach was proposed in [33,34] to reduce the search time, but we use the basic BnB approach in this paper for simplicity.
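A minimal Python sketch of a basic monotonic BnB in the spirit of Algorithm 1 is given below, under explicit assumptions. `feasible(t)` is a hypothetical oracle that, in the setting of this paper, would solve the SOCP feasibility problem $\mathcal{P}_2$. The box upper bound is $f$ at the upper corner and lower bounds come from feasible lower corners; these are the standard choices for monotonic optimization, not necessarily the exact bounding functions used here. Branching bisects the longest edge. To run without a conic solver, a toy normal set $\{\mathbf t\ge 0: t_1+t_2\le 1\}$ with $f(\mathbf t)=t_1t_2$ stands in for $\mathcal{P}_2$.

```python
import heapq

def monotonic_bnb(f, feasible, t_lo, t_hi, tol=1e-3, max_iter=100_000):
    """Basic BnB for max f(t) over a normal set {t: feasible(t)} inside [t_lo, t_hi].

    Assumes f is increasing in every coordinate and feasibility is monotone
    (if t is feasible, every t' <= t is feasible).
    """
    t_lo, t_hi = list(t_lo), list(t_hi)
    if not feasible(t_lo):
        return float("-inf"), None             # empty feasible set
    if feasible(t_hi):
        return f(t_hi), t_hi                   # whole box is feasible
    best_val, best_t = f(t_lo), t_lo           # incumbent = global lower bound
    heap = [(-f(t_hi), t_lo, t_hi)]            # max-heap on box upper bound f(t_max)
    for _ in range(max_iter):
        if not heap or -heap[0][0] - best_val <= tol:
            break                              # upper bound within tol of incumbent
        _, lo, hi = heapq.heappop(heap)
        j = max(range(len(lo)), key=lambda i: hi[i] - lo[i])   # longest edge
        mid = 0.5 * (lo[j] + hi[j])
        for a, b in ((lo[j], mid), (mid, hi[j])):
            clo, chi = list(lo), list(hi)
            clo[j], chi[j] = a, b
            if not feasible(clo):
                continue                       # prune: box holds no feasible point
            if feasible(chi):                  # fully feasible box: best point is chi
                if f(chi) > best_val:
                    best_val, best_t = f(chi), chi
                continue
            if f(clo) > best_val:
                best_val, best_t = f(clo), clo
            if f(chi) - best_val > tol:        # keep only boxes that may improve
                heapq.heappush(heap, (-f(chi), clo, chi))
    return best_val, best_t

# Toy stand-in for P2: maximize t1 * t2 subject to t1 + t2 <= 1 (optimum 0.25)
val, t_best = monotonic_bnb(lambda t: t[0] * t[1],
                            lambda t: t[0] + t[1] <= 1.0,
                            [0.0, 0.0], [1.0, 1.0])
```

Because the heap is ordered by the box upper bound, the top of the heap is always the global upper bound, which gives the same stopping rule as the upper/lower-bound gap test described for Algorithm 1.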

Decentralized Algorithm with Perfect CSI
In this section, we first transform the original problem into an equivalent subtractive form using Dinkelbach's method. Then, by exploiting the equivalence between the achievable data rate and the MSE, we propose an ADMM algorithm that solves a QCQP subproblem with closed-form solutions computed in parallel.

Equivalent Optimization Problem
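Problem $\mathcal{P}_0$ is a nonlinear fractional programming problem and can be transformed using Dinkelbach's method [27]. Denoting the optimal EE of problem $\mathcal{P}_0$ by $\alpha^{\mathrm{opt}}$, we have
$$\alpha^{\mathrm{opt}}=\max_{\mathbf{w},\mathbf{q}}\frac{R(\mathbf{w},\mathbf{q})}{P_{\mathrm{tot}}(\mathbf{w},\mathbf{q})}=\frac{R^{\mathrm{opt}}}{P_{\mathrm{tot}}^{\mathrm{opt}}},$$
where the maximization is subject to (7b) and (7c), $P_{\mathrm{tot}}^{\mathrm{opt}}$ is the optimal total power consumption, $R=\sum_{k=1}^{K}R_k$ is the sum rate, and $R^{\mathrm{opt}}=\sum_{k=1}^{K}R_k^{\mathrm{opt}}$ is its value at the optimum.
According to [29], the optimal EE $\alpha^{\mathrm{opt}}$ is achieved if and only if
$$\max_{(\mathbf{w},\mathbf{q})\in\mathcal{D}}\ R(\mathbf{w},\mathbf{q})-\alpha^{\mathrm{opt}}P_{\mathrm{tot}}(\mathbf{w},\mathbf{q})=R^{\mathrm{opt}}-\alpha^{\mathrm{opt}}P_{\mathrm{tot}}^{\mathrm{opt}}=0,$$
where $\mathcal{D}=\{(\mathbf{w},\mathbf{q})\mid(7b),(7c)\}$ is the feasible region of problem $\mathcal{P}_0$.
Thus, based on the theoretical results in [25,30,33], problem $\mathcal{P}_0$ is transformed into the following parametric programming problem:
$$\mathcal{P}_3:\ G(\alpha)=\max_{\mathbf{w},\mathbf{q}}\ \sum_{k=1}^{K}R_k-\alpha P_{\mathrm{tot}}\tag{16a}$$
$$\text{s.t.}\ (7b),\ (7c).$$
If problem $\mathcal{P}_3$ is solved optimally with $G(\alpha)=0$, problem $\mathcal{P}_0$ is solved optimally. If $\mathcal{P}_3$ cannot be solved optimally, $\mathcal{P}_0$ can still be addressed by solving a sequence of instances of $\mathcal{P}_3$, but the optimality of the resulting solution is then no longer guaranteed. A detailed analysis is provided in the next subsection.
$G(\alpha)$ is monotonically decreasing in $\alpha$. Therefore, the root of $G(\alpha)=0$ can be located efficiently by a bisection search, summarized in Algorithm 2 [25,30].
A good initial interval for $\alpha$ reduces the search time of Algorithm 2. Here, we initialize the interval $\alpha_{\min}\le\alpha\le\alpha_{\max}$. Intuitively, $\alpha$ is lower bounded by $\alpha_{\min}=0$, which corresponds to a zero sum rate. The upper bound $\alpha_{\max}$ is obtained by ignoring the interference and quantization noise and using the maximum transmit power in $R_k$, while ignoring the transmit power and quantization noise in $P_{\mathrm{tot}}$. Specifically, $R_k\le\log_2\big(1+\frac{\|\mathbf{h}_k\|^2}{\sigma^2}\sum_{l=1}^{L}P_l^{\max}\big)=R_{k,\max}$ and $P_{\mathrm{tot}}\ge P_{\mathrm{tot},\min}=P_c$. Therefore, $\alpha_{\max}=\sum_{k=1}^{K}R_{k,\max}/P_{\mathrm{tot},\min}$, and $[\alpha_{\min},\alpha_{\max}]=\big[0,\sum_{k=1}^{K}R_{k,\max}/P_c\big]$.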
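The bisection in Algorithm 2 can be sketched as follows, assuming access to an oracle `G(alpha)` that returns the (ideally optimal) value of $\mathcal{P}_3$. To keep the sketch self-contained, the oracle is a hypothetical single-BS, single-MU toy without fronthaul compression, for which the maximizer of $R-\alpha P_{\mathrm{tot}}$ has the closed form $p^\star=\eta/(\alpha\ln 2)-1/g$ clipped to $[0,P^{\max}]$; the gain `g` and all numbers are illustrative. The initial interval follows the bounds above, i.e., $[0,R_{\max}/P_c]$.

```python
import math

def bisection_ee(G, alpha_min, alpha_max, tol=1e-6, max_iter=200):
    """Root of the decreasing function G(alpha) on [alpha_min, alpha_max]."""
    lo, hi = alpha_min, alpha_max
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if G(mid) > 0.0:
            lo = mid        # EE level mid is still achievable
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical toy instance: one BS, one MU, no fronthaul quantization
g, p_max, eta, p_c = 10.0, 1.0, 0.35, 2.0

def rate(p):
    return math.log2(1.0 + g * p)

def p_tot(p):
    return p / eta + p_c

def G_toy(alpha):
    # closed-form maximizer of rate(p) - alpha * p_tot(p) over [0, p_max]
    if alpha <= 0.0:
        p = p_max           # alpha = 0 corresponds to sum-rate maximization
    else:
        p = min(max(eta / (alpha * math.log(2.0)) - 1.0 / g, 0.0), p_max)
    return rate(p) - alpha * p_tot(p)

alpha_max = rate(p_max) / p_c           # R_max / P_c, as in the bound above
ee = bisection_ee(G_toy, 0.0, alpha_max)
```

At the returned $\alpha$, $G(\alpha)\approx 0$, and $\alpha$ matches the largest ratio $R/P_{\mathrm{tot}}$ over $[0,P^{\max}]$, which is exactly the Dinkelbach optimality condition stated above.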

Decentralized Algorithm for Subproblem P 3
The key step in finding the quantization noises and the beamformers in Algorithm 2 lies in solving subproblem $\mathcal{P}_3$. The main difficulty arises from the non-convex $R_k$ in the objective function (16a). Fortunately, by extending the equivalence between the sum-rate maximization (SRmax) problem and the MMSE problem [31,35], $R_k$ in problem $\mathcal{P}_3$ can be reformulated into a tractable form.
Specifically, $R_k$ admits the variational representation (17) in terms of an auxiliary weight $\rho_k\in\mathbb{R}$ associated with MU $k$ and a receive coefficient $u_k$, where $e_k\in\mathbb{R}$ is the MSE of MU $k$, given by
$$e_k=\mathbb{E}\big[|u_k^{*}y_k-s_k|^2\big].\tag{18}$$
The proof of the equality in (17) is based on the first-order optimality conditions [31] and is omitted here for brevity. Problem $\mathcal{P}_3$ can then be recast as problem $\mathcal{P}_4$. Problem $\mathcal{P}_4$ is not jointly convex in $\{\mathbf{w},\mathbf{u},\mathbf{q},\boldsymbol{\rho}\}$, but it is convex in $\{\mathbf{w},\mathbf{q}\}$ for given $\{\mathbf{u},\boldsymbol{\rho}\}$, and in each of $\mathbf{u}$ and $\boldsymbol{\rho}$ when the remaining variables are fixed. Thus, with fixed $\{\mathbf{w}_k\}$ and $\{q_l\}$, the optimal receive coefficient is the MMSE receiver
$$u_k=\frac{\mathbf{h}_k^{T}\mathbf{w}_k}{\sum_{i=1}^{K}|\mathbf{h}_k^{T}\mathbf{w}_i|^2+\sum_{l=1}^{L}q_l|h_{kl}|^2+\sigma^2},\tag{20}$$
and the optimal weight is $\rho_k=1/e_k$, where $e_k$ is the resulting minimum MSE of MU $k$. With fixed $\{u_k\}$ and $\{\rho_k\}$, the optimal $\{\mathbf{w}_k\}$ and $\{q_l\}$ can be obtained by solving the following quadratically constrained quadratic programming (QCQP) problem $\mathcal{P}_5$.
Problem $\mathcal{P}_5$ is convex with respect to $\mathbf{w}$ and $\mathbf{q}$, and can therefore be solved centrally by standard tools such as CVX [36]. By simply replacing the SINR and per-BS maximum transmit power constraints as in [20], the same alternating optimization can also be adopted to solve $\mathcal{P}_0$ with multivariate compression, because the replacement does not affect the convexity of the subproblems. However, the interior-point method solves $\mathcal{P}_5$ with high computational complexity and does not reveal the structure of the solution. Moreover, it is implementation-intensive for large-scale C-RANs because the beamformers and quantization noises are computed centrally at the CU. Unlike the beamforming design problem in multicell systems [30], the beamformers in our problem are coupled among BSs, which makes Lagrangian-based decomposition inapplicable to $\mathcal{P}_5$. To this end, we propose an ADMM-based approach that solves $\mathcal{P}_5$ optimally with closed-form solutions computed in parallel.
In particular, in $\mathcal{P}_5$, constraints (7b) and (7c) provide upper and lower bounds on $\mathbf{q}$, respectively. Hence, the constraint $\beta_l p_l^t\le P_l^{\max}-p_l^t$ must be satisfied, which can be rearranged as
$$(1+\beta_l)\sum_{k=1}^{K}|w_{kl}|^2\le P_l^{\max},\quad\forall l.\tag{22}$$
Since the objective function of $\mathcal{P}_5$ is monotonically decreasing in $\mathbf{q}$, the inequality constraint (7c) can be replaced with an equality, i.e., $q_l=\beta_l\sum_{k=1}^{K}|w_{kl}|^2$. Defining a new beamforming vector for BS $l$ as $\tilde{\mathbf{w}}_l=[w_{1l},\cdots,w_{Kl}]^T$, we have $q_l=\beta_l\|\tilde{\mathbf{w}}_l\|^2$. As a result, problem $\mathcal{P}_5$ is equivalent to the following problem $\mathcal{P}_6$ in the single set of variables $\mathbf{w}$.
where $\mu_l=\sum_{k=1}^{K}\rho_k|u_k|^2\beta_l|h_{kl}|^2+\frac{\alpha}{\eta}(1+\beta_l)$. The objective function in (23a) contains two parts that are functions of different arrangements of the variables, i.e., $\mathbf{w}_k$ and $\tilde{\mathbf{w}}_l$. Therefore, problem $\mathcal{P}_6$ is not a standard group least absolute shrinkage and selection operator (LASSO) problem [37], and existing algorithms for group LASSO problems are not directly applicable. Fortunately, $\mathcal{P}_6$ has a special structure that can be exploited by the ADMM algorithm. To account for the difference between $\mathbf{w}_k$ and $\tilde{\mathbf{w}}_l$, we first introduce a copy $\tilde{\mathbf{z}}_l$ of $\tilde{\mathbf{w}}_l$ and define $\mathbf{z}=[\tilde{\mathbf{z}}_1^T,\cdots,\tilde{\mathbf{z}}_L^T]^T$. Problem $\mathcal{P}_6$ can then be equivalently expressed as problem $\mathcal{P}_7$, and its partial augmented Lagrangian function is formed, where $\mathbf{y}=[\tilde{\mathbf{y}}_1^T,\cdots,\tilde{\mathbf{y}}_L^T]^T$ with $\tilde{\mathbf{y}}_l=[y_{1l},\cdots,y_{Kl}]^T$ is the vector of Lagrangian dual variables for the equality constraint (24c), and $c>0$ is a penalty parameter.
The idea of ADMM is to update each block of variables while fixing the others. Specifically, the variables of the ADMM algorithm are updated as follows.
By fixing $\mathbf{w}^{(m)}$ and $\mathbf{y}^{(m)}$ at the $m$-th iteration, $\mathbf{z}^{(m+1)}$ at the $(m+1)$-th iteration is updated by solving the convex problem (26). Problem (26) can be solved in parallel: it decomposes into $L$ independent per-BS subproblems (27), which can be solved simultaneously at the CU. The Karush-Kuhn-Tucker (KKT) conditions of problem (27) involve $\theta_l$, the optimal Lagrangian multiplier associated with the power constraint, the optimum $\tilde{\mathbf{z}}_l^{\star}$ of $\tilde{\mathbf{z}}_l$, and $\mathbf{s}_l$, defined as $\mathbf{s}_l=\tilde{\mathbf{w}}_l$. If $\theta_l=0$, we have $\tilde{\mathbf{z}}_l^{\star}=\frac{c\,\mathbf{s}_l}{2\mu_l+c}$ under the condition $(1+\beta_l)\|\tilde{\mathbf{z}}_l^{\star}\|^2\le P_l^{\max}$, i.e., $\|\mathbf{s}_l\|^2\le\frac{(2\mu_l+c)^2P_l^{\max}}{c^2(1+\beta_l)}$. To update $\mathbf{w}^{(m+1)}$ with fixed $\{\mathbf{y}^{(m)},\mathbf{z}^{(m+1)}\}$, we solve problem (33). Since $\mathbf{w}_k=[w_{k1},\cdots,w_{kL}]^T$ and $\tilde{\mathbf{w}}_l=[w_{1l},\cdots,w_{Kl}]^T$, $\forall l,k$, are two arrangements of the same coefficients $\{w_{kl}\}$, the terms of problem (33) can be regrouped per MU. Then, problem (33) decomposes into $K$ subproblems (34), one per MU, which can be solved in parallel.
By setting the derivative of (34) with respect to $\mathbf{w}_k$ to zero, we obtain the optimal $\mathbf{w}_k$ in the $(m+1)$-th iteration in the closed form (35). Using the relationship between $\mathbf{w}_k$ and $\tilde{\mathbf{w}}_l$, the per-BS vectors $\tilde{\mathbf{w}}_l^{(m+1)}$ follow directly, and the multipliers are then updated by (36). The decentralized algorithm for solving $\mathcal{P}_3$ is summarized in Algorithm 3, whose convergence is guaranteed by Theorem 1.
Proof. The proof is based on the convergence of the alternating optimization method and the ADMM algorithm. With initialized $\{\mathbf{w}^{(0)},\mathbf{z}^{(0)},\mathbf{y}^{(0)}\}$, the inner loop of Algorithm 3 (steps 7 to 12) converges to an optimal solution of $\mathcal{P}_5$ owing to the convergence of the ADMM algorithm, whose proof can be found in [38]. On the other hand, the outer loop of Algorithm 3 converges to a stationary point of subproblem $\mathcal{P}_3$ owing to the convergence property of the block coordinate descent algorithm [31,35]. According to [25,30], for an arbitrary $\alpha$, the objective (16a) of problem $\mathcal{P}_3$ is non-decreasing over the iterations of the outer loop of Algorithm 3. Therefore, Algorithm 3 is guaranteed to converge to a stationary point of problem $\mathcal{P}_3$, which completes the proof.
Denote by $\alpha^{\mathrm{opt}}$ the true optimal EE of problem $\mathcal{P}_0$, and by $\alpha'$ the suboptimal value returned by Algorithm 2 when Algorithm 3 is used to solve problem $\mathcal{P}_3$. Since Algorithm 3 only converges to a stationary point of $\mathcal{P}_3$, the objective value it attains at $\alpha'$ is a lower bound on $G(\alpha')$; hence, when this value equals zero, we have $G(\alpha')\ge 0$. Moreover, since $G(\alpha)$ is monotonically decreasing, the true optimum $\alpha^{\mathrm{opt}}$, which satisfies $G(\alpha^{\mathrm{opt}})=0$, must be no smaller than $\alpha'$, i.e., $\alpha^{\mathrm{opt}}\ge\alpha'$. Therefore, the EE returned by Algorithm 2 with Algorithm 3 is no larger than the optimal EE of problem $\mathcal{P}_0$. Simulation results will demonstrate that the obtained solution is very close to the optimal one, which verifies the effectiveness of the proposed algorithm numerically.
Combining Algorithms 2 and 3, problem $\mathcal{P}_0$ can be solved efficiently by the proposed TLD algorithm. Since the bisection interval is halved at every iteration, the bisection procedure always terminates; however, the returned $\alpha$ may not satisfy $G(\alpha)=0$ exactly.
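The rate-MSE step used in the outer loop of Algorithm 3 (MMSE receiver, then $\rho_k=1/e_k$) can be sanity-checked numerically. The sketch below assumes the conventions $y_k=\sum_l h_{kl}\tilde x_l+n_k$ and $e_k=\mathbb{E}[|u_k^{*}y_k-s_k|^2]$, hypothetical array layouts `H[k, l]` and `W[l, k]`, and works in natural-log units: it checks the standard WMMSE-type identity $\ln(1+\mathrm{SINR}_k)=\max_{\rho_k,u_k}(\ln\rho_k-\rho_ke_k+1)$ at the MMSE solution (dividing by $\ln 2$ gives rates in bits). It is an illustration of the identity, not the exact form of (17).

```python
import numpy as np

def mmse_terms(H, W, q, sigma2):
    """SINR, MMSE receive coefficient and MSE; H[k, l] = h_kl, W[l, k] = w_kl."""
    G = H @ W                                   # G[k, i] = h_k^T w_i
    b = np.diag(G)                              # desired-signal gains h_k^T w_k
    A = np.sum(np.abs(G) ** 2, axis=1) + (np.abs(H) ** 2) @ q + sigma2   # E|y_k|^2
    sinr = np.abs(b) ** 2 / (A - np.abs(b) ** 2)
    u = b / A                                   # MMSE receiver, cf. (20)
    e = np.abs(u) ** 2 * A - 2 * np.real(np.conj(u) * b) + 1.0   # MSE (18)
    return sinr, u, e

rng = np.random.default_rng(0)
L, K, sigma2 = 5, 4, 1e-2
H = (rng.standard_normal((K, L)) + 1j * rng.standard_normal((K, L))) / np.sqrt(2)
W = 0.3 * (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K)))
q = np.sum(np.abs(W) ** 2, axis=1) / (2 ** 5 - 1)     # (7c) with equality, C_l = 5
sinr, u, e = mmse_terms(H, W, q, sigma2)
rho = 1.0 / e                                          # optimal weight rho_k = 1/e_k
lhs = np.log(rho) - rho * e + 1.0                      # surrogate at the optimum
print(np.allclose(lhs, np.log1p(sinr)))                # surrogate equals ln(1 + SINR_k)
```

At the MMSE receiver the minimum MSE is $e_k=1/(1+\mathrm{SINR}_k)$, so the surrogate with $\rho_k=1/e_k$ reproduces the rate exactly, while any other weight gives a smaller value.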

Parallelized Implementation
Since problem (26) decomposes into $L$ subproblems, $\mathbf{z}^{(m+1)}$ in step 8 of Algorithm 3 is updated in parallel in closed form. Similarly, the beamformers $\mathbf{w}^{(m+1)}$ and multipliers $\mathbf{y}^{(m+1)}$ are updated in parallel in closed form using (35) and (36). Therefore, closed-form expressions are available for the optimal beamformers, the optimal receive filters and the auxiliary variables in Algorithm 3, and they provide some insight into the EEmax problem. For example, if $\tilde{\mathbf{w}}_l=\mathbf{0}$ (no transmit power is consumed by BS $l$), BS $l$ can be switched off to save static power and improve EE.
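The parallel structure of the inner ADMM loop can be sketched in Python as follows. This is a sketch under stated assumptions, not the exact updates (31), (35) and (36): the first part of the objective of $\mathcal{P}_6$ is assumed to have the WMMSE-type form $\sum_k\mathbf{w}_k^H\mathbf{A}_k\mathbf{w}_k-2\mathrm{Re}\{\mathbf{b}_k^H\mathbf{w}_k\}$ with Hermitian positive-definite $\mathbf{A}_k$ (hypothetical inputs `A`, `b`); the augmented Lagrangian is assumed to use $\mathrm{Re}\{\mathbf{y}^H(\mathbf{w}-\mathbf{z})\}+\frac{c}{2}\|\mathbf{w}-\mathbf{z}\|^2$, which makes $\mathbf{s}_l=\tilde{\mathbf{w}}_l+\tilde{\mathbf{y}}_l/c$; and the per-BS power constraint (22) is imposed on the copy $\tilde{\mathbf{z}}_l$. Under these assumptions, the $\theta_l=0$ case gives $c\,\mathbf{s}_l/(2\mu_l+c)$ as in the text, the active case radially scales $\mathbf{s}_l$ onto the power boundary, and the per-MU update solves $(2\mathbf{A}_k+c\mathbf{I})\mathbf{w}_k=2\mathbf{b}_k-\mathbf{y}_k+c\mathbf{z}_k$.

```python
import numpy as np

def admm_p6(A, b, mu, beta, p_max, c=1.0, iters=3000):
    """ADMM sketch for a problem with the structure of P6/P7 (hypothetical inputs).

    Minimizes sum_k [w_k^H A_k w_k - 2 Re(b_k^H w_k)] + sum_l mu_l ||z_l||^2
    s.t. (1 + beta_l) ||z_l||^2 <= p_max_l and w = z. Row k of W is w_k (per MU),
    column l of Z is z_l (per BS).
    """
    K, L = b.shape
    W = np.zeros((K, L), dtype=complex)
    Z = np.zeros((K, L), dtype=complex)
    Y = np.zeros((K, L), dtype=complex)          # dual variables of w = z
    radius = np.sqrt(p_max / (1.0 + beta))       # per-BS limit on ||z_l||
    eye = np.eye(L)
    for _ in range(iters):
        # z-update: L independent per-BS problems in closed form
        S = W + Y / c                            # s_l under the assumed sign convention
        Z = c * S / (2.0 * mu + c)               # unconstrained minimizer (theta_l = 0)
        norms = np.linalg.norm(Z, axis=0)
        Z = Z * np.minimum(1.0, radius / np.maximum(norms, 1e-300))  # active constraint
        # w-update: K independent per-MU linear systems
        for k in range(K):
            W[k] = np.linalg.solve(2.0 * A[k] + c * eye, 2.0 * b[k] - Y[k] + c * Z[k])
        # dual update
        Y = Y + c * (W - Z)
    return W, Z

rng = np.random.default_rng(1)
K, L = 4, 5
M = rng.standard_normal((K, L, L)) + 1j * rng.standard_normal((K, L, L))
A = M @ np.conj(np.transpose(M, (0, 2, 1))) / L + 0.5 * np.eye(L)   # Hermitian PD
b = rng.standard_normal((K, L)) + 1j * rng.standard_normal((K, L))
mu = rng.uniform(0.1, 1.0, L)
beta = np.full(L, 1.0 / (2.0 ** 5 - 1.0))

# Loose power limits: constraint inactive, so compare with the centralized solution
W, Z = admm_p6(A, b, mu, beta, np.full(L, 1e6))
W_ref = np.stack([np.linalg.solve(A[k] + np.diag(mu), b[k]) for k in range(K)])

# Tight power limits: the per-BS constraint becomes active
W2, Z2 = admm_p6(A, b, mu, beta, np.full(L, 0.05))
```

Each z-update touches only one column (one BS) and each w-update only one row (one MU), which is what allows both steps to run in parallel; with loose power limits the iterates match the centralized solution $(\mathbf{A}_k+\mathrm{diag}(\boldsymbol{\mu}))^{-1}\mathbf{b}_k$.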

Complexity Analysis
According to the algorithm flow, the computational complexity of Algorithm 3 consists of two parts. In Algorithm 3, the complexity of computing (20) arises from the matrix inversion of the receive beamforming, i.e., $O(L^3)$. When the interior-point method is adopted to solve $\mathcal{P}_5$, the complexity is $O((LK)^{3.5})$ [39]. In this case, to serve $K$ MUs, the overall complexity is of the order $O(KL^3+(LK)^{3.5})$. In contrast, when the ADMM algorithm is applied to solve $\mathcal{P}_5$, the main cost is the matrix inversion in step 9 of Algorithm 3, where the transmit beamformers are computed by (35). Thus, the overall complexity of Algorithm 3 is of the order $O(KL^3+KL^3)=O(KL^3)$. For the outer layer, simulation results will show that it converges rapidly (in about 5 iterations). Therefore, the proposed TLD algorithm is computationally efficient.
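To make the comparison concrete, the order-of-magnitude terms stated above can be evaluated for a few network sizes. This Python sketch ignores constant factors and iteration counts, so the ratios only indicate how the gap scales with $L$ and $K$, not actual run times.

```python
def order_counts(L, K):
    """Dominant terms from the complexity analysis (constants and iterations ignored)."""
    interior_point = K * L ** 3 + (L * K) ** 3.5   # receive filters + interior-point P5
    admm = K * L ** 3 + K * L ** 3                 # receive filters + ADMM w-updates
    return interior_point, admm

for L, K in [(5, 4), (10, 8), (20, 16)]:
    ip, ad = order_counts(L, K)
    print(f"L={L:2d}, K={K:2d}: interior-point / ADMM ~ {ip / ad:.1f}")
```

For the simulated network size ($L=5$, $K=4$) the ratio is already above 30, and it grows quickly because the interior-point term scales with $(LK)^{3.5}$ rather than $KL^3$.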

Generalization to the Multi-Antenna System
Although a single antenna is assumed at each BS and each MU in the above discussion, the proposed TLD algorithm can be generalized to scenarios with multi-antenna BSs and single-antenna MUs. When each BS is equipped with $N$ antennas, one only needs to replace the scalar channel coefficient $h_{kl}$ from BS $l$ to MU $k$ with a vector $\mathbf{h}_{kl}\in\mathbb{C}^{N\times 1}$. Similarly, the transmit beamforming coefficient $w_{kl}$ from BS $l$ to MU $k$ is replaced with a vector $\mathbf{w}_{kl}\in\mathbb{C}^{N\times 1}$, while the receive coefficient $u_k$ of each single-antenna MU remains a scalar. The quantization noise $e_l$ is replaced by $\mathbf{e}_l\in\mathbb{C}^{N\times 1}$, with covariance matrix $\mathbb{E}[\mathbf{e}_l\mathbf{e}_l^H]=\mathrm{diag}(q_{l1},\cdots,q_{lN})$. If $q_l=q_{l1}=\cdots=q_{lN}$, one only needs to replace $q_l$ in (7b) and (7c) with $Nq_l$. Therefore, the proposed TLD algorithm can be extended to the EEmax problem with multi-antenna BSs and single-antenna MUs, albeit with some additional effort. For C-RANs with multi-antenna BSs and multi-antenna MUs, the weighted minimum MSE algorithm in [31] might provide insights on how to apply the proposed algorithm.

Robust Algorithms with Imperfect CSI
In practical C-RANs, the obtained CSI is imperfect due to limited feedback [40], partial CSI [41] or estimation errors. According to [42][43][44], CSI imperfection has a significant impact on system performance. Therefore, we extend the algorithms proposed in the previous two sections to solve the robust EEmax problem in the presence of imperfect CSI.
The same system model as in Section 2 is considered. Since the path loss and the log-normal shadowing can be estimated accurately, the imperfection usually comes from the uncertain small-scale fading. Thus, different from the worst-case design [42,43], the Gaussian model in [44] is adopted to characterize the channel imperfection. The actual channel from BS $l$ to MU $k$ is expressed as
$$h_{kl}=g_{kl}(\hat{h}_{kl}+\Delta_{kl}),$$
where $g_{kl}=GL(d_{kl})\varphi_{kl}$ is the channel gain consisting of the antenna gain $G$, the path loss $L(d_{kl})$ at distance $d_{kl}$ (in km) and the log-normal shadowing $\varphi_{kl}$, and $\hat{h}_{kl}$ and $\Delta_{kl}$ are the estimated channel and the channel uncertainty from BS $l$ to MU $k$, respectively. The $\Delta_{kl}$ are assumed to be independent and identically distributed (i.i.d.) zero-mean circularly symmetric complex Gaussian (ZMCSCG) random variables with variance $\sigma_e^2$. The channel from all the BSs to MU $k$ is $\mathbf{h}_k=\mathbf{g}_k(\hat{\mathbf{h}}_k+\boldsymbol{\Delta}_k)$, $\forall k$, where $\mathbf{g}_k=\mathrm{diag}(g_{k1},\cdots,g_{kL})$, and $\hat{\mathbf{h}}_k$ and $\boldsymbol{\Delta}_k$ collect $\hat{h}_{kl}$ and $\Delta_{kl}$, respectively. Let $\bar{\mathbf{h}}_k=\mathbf{g}_k\hat{\mathbf{h}}_k$, $\bar{h}_{kl}=g_{kl}\hat{h}_{kl}$ and $\mathbf{D}_k=\mathrm{diag}(|g_{k1}|^2,\cdots,|g_{kL}|^2)$. The resulting received SINR at MU $k$ involves the term $\Upsilon=\sigma_e^2\mathrm{Tr}(\mathbf{D}_k)\sum_{i=1}^{K}\|\mathbf{w}_i\|^2+\sigma_e^2\sum_{l=1}^{L}q_l$, which denotes the interference caused by the uncertain part of the CSI. The first term of $\Upsilon$ accounts for the interference caused by the CSI error on both the intended signal and the signals of the other MUs.
The robust EEmax problem has the same form as problem $\mathcal{P}_0$, with $h_{kl}$ replaced by the imperfect channel model given above. The details of the extended algorithms for solving the robust problem with imperfect CSI are given in the following two subsections.

Robust Optimal Algorithm
To apply the optimal algorithm under imperfect CSI, we only need to reformulate (8b), because the CSI is involved only in (8b). In particular, (8b) is recast as (39). Following the same procedure as in Algorithm 1, one only needs to replace constraint (10a) in problem $\mathcal{P}_2$ with (39a). In this case, the computational complexity is the same as that of Algorithm 1 (both are very high but provide optimal solutions).

Robust TLD Algorithm
Since the CSI is involved only in $R_k$, the TLD algorithm can also solve the robust EEmax problem after some transformations. In particular, the outer-layer iteration of the robust TLD algorithm follows the procedure of Algorithm 2. To solve the robust subproblem $\mathcal{P}_3$ in the inner layer, Algorithm 3 cannot be applied directly. Fortunately, with the same transformations as in Section 4.2, the corresponding MMSE receiver $\tilde{u}_k$ and the MSE of the $k$-th MU can be derived. Then, with fixed $\{\tilde{u}_k\}$ and $\{\rho_k\}$, the robust counterpart of $\mathcal{P}_5$ with respect to $\{\mathbf{w}_k\}$ and $\{q_l\}$ becomes problem $\mathcal{P}_8$, subject to (7b) and (7c).
We further reformulate problem $\mathcal{P}_8$ as problem $\mathcal{P}_9$. Since problems $\mathcal{P}_6$ and $\mathcal{P}_9$ have the same form, the ADMM algorithm in the inner layer of Algorithm 3 can be used to solve $\mathcal{P}_9$. In particular, $\mathbf{z}$ is updated by (31), $\{\mathbf{w}_k\}$ is updated using a closed-form expression in which $\tilde{\mathbf{A}}_k$ and $\tilde{\mathbf{b}}_k$ are given in (44) and (45), respectively, and the multipliers $\mathbf{y}$ are updated by (36). The procedure of the robust TLD algorithm is similar to that of the TLD algorithm and is omitted here for brevity. It can be readily verified that the robust TLD algorithm converges, has the same computational complexity as the non-robust TLD algorithm, and also achieves a suboptimal solution with closed-form expressions computed in parallel.

Simulation Results and Discussions
In this section, we evaluate the performance of the proposed algorithms via Monte-Carlo simulation. We consider a downlink C-RAN with $L=5$ single-antenna BSs and $K=4$ single-antenna MUs, where one BS is located at the centre of a circular region and the other four BSs are equally spaced on a circle of radius 0.5 km, as shown in Figure 1. The four MUs are randomly deployed in the circle following a uniform distribution. We set the maximum transmit power $P_m=P_l^{\max}$ and the fronthaul capacity $C=C_l$ for all BSs. The convergence tolerances are set identically for all the proposed algorithms. Unless otherwise specified, the other simulation parameters are listed in Table 1, and all the simulation results are averaged over 50 independent MU placements, each with a single random channel realization. To verify the effectiveness of the proposed algorithms, we consider two baseline algorithms for comparison:
SRmax algorithm: The proposed TLD algorithm is adopted to solve the SRmax problem by simply setting $\alpha=0$.
DC algorithm: We also modify the algorithm proposed in [22] for comparison. Specifically, the original problem is first written in epigraph form to obtain a tractable formulation, and the non-convex constraints are then convexified by first-order Taylor expansion. The approximated problem is solved iteratively, and the solution converges to a KKT point [22]. The computational complexity of the DC algorithm using the interior-point method is $O(I_{\max}L^{6.5}K^{3.5}N^{6.5})$, where $I_{\max}$ is the maximum number of iterations, which is much higher than that of the TLD algorithm.
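The simulation layout described above can be generated as follows. This Python sketch rests on stated assumptions: the four outer BSs are placed 90 degrees apart on the circle of radius 0.5 km, and the MUs are drawn uniformly over the area of the same disc (hence the square root applied to the radius). Channel generation (path loss, shadowing and the other Table 1 parameters) is omitted because those parameters are not given here.

```python
import numpy as np

def deploy(num_mu=4, radius_km=0.5, rng=None):
    """Return BS positions (5 x 2), MU positions (num_mu x 2) and distances d[k, l] in km."""
    rng = np.random.default_rng() if rng is None else rng
    angles = np.arange(4) * np.pi / 2                   # four outer BSs, 90 degrees apart
    outer = radius_km * np.column_stack([np.cos(angles), np.sin(angles)])
    bs = np.vstack([[0.0, 0.0], outer])                 # centre BS first
    r = radius_km * np.sqrt(rng.uniform(0.0, 1.0, num_mu))   # uniform over the disc area
    phi = rng.uniform(0.0, 2 * np.pi, num_mu)
    mu = np.column_stack([r * np.cos(phi), r * np.sin(phi)])
    d = np.linalg.norm(mu[:, None, :] - bs[None, :, :], axis=2)
    return bs, mu, d

bs, mu, d = deploy(rng=np.random.default_rng(0))
```

Calling `deploy` 50 times with a shared generator yields 50 independent MU placements; in practice a minimum BS-MU distance may also be needed so that a distance-based path-loss model stays finite.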

Non-Robust Performance
We first investigate the convergence behavior of the proposed algorithms over a typical random channel realization with $P_c = 2$ W, $P_m = 30$ dBm, and C = 5 bits/s/Hz, as presented in Figure 2. Figure 2a shows that the outer layer of TLD (Algorithm 2) converges to a near-optimal value very quickly (in about 5 iterations). Meanwhile, Figure 2b shows that, at the 2nd iteration of Algorithm 2, the objective function of problem (16) in Algorithm 3 increases and converges to a stationary point in fewer than 30 iterations. Thus, the proposed TLD algorithm converges rapidly. In contrast, the optimal algorithm converges much more slowly than the TLD algorithm, although the gap between its upper and lower bounds shrinks rapidly during the first iterations because a large number of infeasible boxes are removed.
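The outer-layer bisection can be illustrated on a toy problem whose inner maximization has a closed form. The minimal sketch below assumes a hypothetical single-BS, single-MU link with channel gain g, rate $\log_2(1 + g p)$, and network power $p/\eta + P_c$; this power model, the efficiency η, and all function names are assumptions, not the model in (7a). For a given α, it maximizes the subtractive objective $R - \alpha P$. The optimal EE is the root of that maximum, so bisection raises α when the maximum is positive and lowers it otherwise. Setting α = 0 reduces the inner problem to pure rate maximization, mirroring how the SRmax baseline is obtained.

```python
import math

def inner_max(alpha, g, p_max, p_c, eta):
    """Maximize log2(1 + g*p) - alpha*(p/eta + p_c) over 0 <= p <= p_max.

    Returns the maximum value and the maximizing power.
    """
    if alpha <= 0:
        # alpha = 0 is pure rate maximization: full power is optimal
        p = p_max
    else:
        # stationary point of the concave objective, clipped to [0, p_max]
        p = min(max(eta / (alpha * math.log(2)) - 1.0 / g, 0.0), p_max)
    rate = math.log2(1 + g * p)
    power = p / eta + p_c
    return rate - alpha * power, p


def bisection_ee(g, p_max, p_c, eta=1.0, tol=1e-9):
    """Bisection on alpha for the subtractive form; returns (max EE, power)."""
    # EE = rate/power <= max rate / p_c, so this bracket contains the optimum.
    lo, hi = 0.0, math.log2(1 + g * p_max) / p_c
    while hi - lo > tol:
        alpha = 0.5 * (lo + hi)
        value, _ = inner_max(alpha, g, p_max, p_c, eta)
        if value > 0:
            lo = alpha  # some p achieves EE > alpha, so the optimum lies above
        else:
            hi = alpha  # no p beats alpha, so the optimum lies at or below
    _, p_opt = inner_max(lo, g, p_max, p_c, eta)
    return lo, p_opt


# toy case with an interior optimum: p = e - 1 and EE = 1 / (e * ln 2)
ee, p = bisection_ee(g=1.0, p_max=100.0, p_c=1.0)
```

The sketch only mirrors the outer bisection logic; in the TLD algorithm, the closed-form inner step above is replaced by the ADMM iterations of Algorithm 3.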
Then, we explore the effect of fronthaul capacity on the performance in terms of EE, sum rate, and transmit power, shown in Figures 3-5, respectively. Figure 3 compares the EE of the four algorithms under different fronthaul capacities. As can be seen from Figure 3, the TLD and DC algorithms achieve EE close to that of the optimal algorithm, with TLD comparable to DC, and both outperform SRmax by a factor of about three in the middle-to-high fronthaul capacity region. Since the network power is penalized in (7a) and the sum rate and power are jointly optimized, the EE is improved compared with SRmax. Figures 4 and 5 explain this: relative to SRmax, TLD saves about 95% of the transmit power while its sum rate decreases by only about 50%, resulting in a higher EE. It is also observed from Figure 3 that the EE increases with the fronthaul capacity in the low fronthaul capacity region and then gradually converges to a constant in the middle-to-high region; a similar trend can be found in Figure 4. According to the expression of the quantization noise (QN), i.e., $q_l = \frac{1}{2^{C}-1}\sum_{k=1}^{K}|w_{kl}|^2$, the QN decreases exponentially with increasing fronthaul capacity (shown in Figure 5). As a result, in the low fronthaul capacity region, the SINR is limited by the high QN, leading to a low sum rate. Moreover, the corresponding data transmit power (DTP) in Figure 5 increases very slowly from the low to the high fronthaul capacity region. This is because, as the fronthaul capacity increases, more power is spent on DTP and QN to improve the sum rate. However, the additional network power consumed makes the EE increase only gradually.
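The quantization-noise expression $q_l = \frac{1}{2^{C}-1}\sum_{k=1}^{K}|w_{kl}|^2$ can be evaluated directly. The minimal sketch below assumes that `w_l` collects the K beamforming coefficients $w_{1l}, \dots, w_{Kl}$ of BS l and that C is given in bits/s/Hz; the function name and the sample coefficients are hypothetical. Since $(2^{C+1}-1)/(2^{C}-1) \to 2$, each additional bit/s/Hz of fronthaul capacity roughly halves $q_l$ for a fixed beamformer, consistent with the exponential decrease of QN described for Figure 5.

```python
import numpy as np

def quantization_noise(w_l, C):
    """q_l = sum_k |w_kl|^2 / (2^C - 1) for fronthaul capacity C > 0 (bits/s/Hz)."""
    return np.sum(np.abs(w_l) ** 2) / (2.0 ** C - 1.0)

# usage: hypothetical beamforming coefficients of one BS for K = 2 MUs
w_l = np.array([0.3 + 0.4j, 0.5])
for C in range(1, 6):
    print(C, quantization_noise(w_l, C))
```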
It should be noted that the optimal algorithm provides only the total network power rather than the DTP and QN separately; therefore, only the transmit power (the sum of DTP and QN) of the optimal algorithm is shown in Figure 5. In summary, Figures 3-5 illustrate that the TLD and optimal algorithms balance the sum rate and power consumption, yielding a much higher EE than SRmax. Figure 6 compares the EE with respect to the maximum transmit power $P_m$ for a given fronthaul capacity C = 5 bits/s/Hz. It is observed that TLD and DC significantly outperform SRmax in terms of EE in the middle-to-high transmit power region (≥18 dBm). This is because, by jointly optimizing the sum rate and network power, the relative increase in sum rate slightly exceeds the relative increase in power consumption, so the EE increases gradually. For SRmax, by contrast, the inter-MU interference prevents its sum rate gain from compensating for the additional network power, making its EE worse than that of TLD and DC when $P_m \geq 18$ dBm. It is also observed that the optimal, TLD, and DC algorithms achieve EE comparable to SRmax at low maximum transmit power (≤18 dBm), which suggests that in this region, transmitting at the maximum power yields a comparable sum rate gain for the optimal, EEmax, and SRmax algorithms. To investigate further, we also test the impact of static power consumption on the EE of the proposed algorithms, as shown in Figure 7. At low to middle static power ($P_c \leq 6$ W), the EE of the optimal, DC, and TLD algorithms decreases significantly as the static power $P_c$ grows. This is because the static power dominates the network power in this region while the transmit power is also optimized, leading to a rapid decrease of EE.
In contrast, since SRmax usually transmits at the maximum power, which accounts for a large share of the network power for $P_c$ ranging from 2 to 10 W, the EE of SRmax decreases more gently. Consequently, the TLD algorithm achieves a higher EE than SRmax at all the static powers considered. The effect of the number of MUs on the EE is also investigated with C = 5 bits/s/Hz, $P_m = 30$ dBm, and $P_c = 2$ W, as shown in Figure 8. Figure 8 shows that, as more MUs are served, the EE of the TLD and DC algorithms increases significantly and exhibits a clear advantage over SRmax, e.g., exceeding SRmax by more than 300%. Intuitively, the data rate of each MU decreases as the number of served MUs grows, owing to the increased inter-MU interference. The sum rate gain is large when the number of served MUs is small, but the sum rate increases much more slowly when that number is large. This is attributed to the rapid growth of inter-MU interference when many MUs are served, which reduces the data rate of each MU and yields only a slight increase in the sum rate of all MUs. Besides, more transmit power is needed to achieve the sum rate gain when serving more MUs, which further limits the EE improvement. It should also be noted that, as the number of MUs increases further, the EE of all four algorithms would first saturate and then decrease. This is because the sum rate loss caused by the increased interference becomes much larger than the data rate contributed by the newly added MUs.

Robust Performance
The EE performance of the proposed robust TLD algorithm is investigated over channels with different channel errors, and the results are shown in Figure 9. Similar to the non-robust design, each point in Figure 9 is averaged over 100 channel realizations. In the simulation, $\sigma_e^2 = 0$ means that perfect CSI is available at the CU. Since the previous subsection showed that SRmax has a worse EE than TLD (which is near optimal), we only plot the EE curves of the robust TLD algorithm under different CSI errors. Owing to the high complexity of the DC algorithm, we do not compare it with TLD in this subsection. Figure 9 shows that, as the channel error increases from $\sigma_e^2 = 0$ to $\sigma_e^2 = 10^{-2}$, the EE decreases significantly, especially in the middle-to-high fronthaul capacity region. In the low fronthaul capacity region, according to (38), the received SINR is only slightly affected by the channel errors since the resulting increase in interference is very small. In the middle-to-high fronthaul capacity region, however, the channel error impairs the received SINR, so a lower sum rate is obtained and more transmit power is consumed to achieve the same sum rate under a larger channel error. As a result, the sum rate gain cannot compensate for the additional network power caused by the channel error, resulting in a lower EE. In summary, the EE is sensitive to channel errors, particularly when the error is large.

Conclusions
In this paper, we have studied the EEmax problem with fronthaul compression in a downlink C-RAN, formulated as a non-convex fractional programming problem. We have proposed an optimal algorithm based on the BnB method to provide a performance benchmark. Further, a near-optimal TLD algorithm has been proposed by combining a bisection search with an ADMM method. Both proposed algorithms are guaranteed to converge, and TLD obtains its solution in closed form in a parallel manner. Simulation results showed that the TLD algorithm converges quickly to a near-optimal solution and achieves a much higher EE than the SRmax algorithm. The results further indicated that EE can be improved by increasing the fronthaul link capacity or by optimizing the network power. Numerical results also demonstrated that the robust TLD algorithm provides robust performance without perfect CSI, although its performance is sensitive to the channel error.