Optimization of MIMO Systems Capacity Using Large Random Matrix Methods

: This paper provides a comprehensive introduction of large random matrix methods for input covariance matrix optimization of mutual information of MIMO systems. It is ﬁrst recalled informally how large system approximations of mutual information can be derived. Then, the optimization of the approximations is discussed, and important methodological points that are not necessarily covered by the existing literature are addressed, including the strict concavity of the approximation, the structure of the argument of its maximum, the accuracy of the large system approach with regard to the number of antennas, or the justiﬁcation of iterative water-ﬁlling optimization algorithms. While the existing papers have developed methods adapted to a speciﬁc model, this contribution tries to provide a uniﬁed view of the large system approximation approach.


Introduction
It is now recognized that multiple-input multiple-output (MIMO) systems have the potential to increase the capacity of wireless digital communication systems.If t and r denote the number of transmit and receive antennas, the channel capacity gain can be in fact of the order of min(t, r) under certain (often rather unrealistic) assumptions [1].Although the use of MIMO systems begins to be normalized, a number of related theoretical issues are still to be solved.This paper addresses the design of input covariance matrices chosen in order to maximize the (Gaussian) mutual information of the system.This problem was extensively addressed in the past under various assumptions on the channel state information.In this paper, we assume that the channel is known at the receiver side, but that only its probability distribution, which is assumed complex Gaussian, is available at the transmitter side, a context recognized to be realistic in wireless communications, as the channels encountered are mostly fast fading whereas their statistics are slow fading.In this context of fast fading channels, we need to consider the expectation of the mutual information w.r.t. the channel probability distribution instead of the mutual information itself.The optimization of this average mutual information is of course more difficult than the optimization of the mutual information itself, because the analytical expressions of the cost function and of its first-and second-order derivatives are extremely complicated.The optimal input covariance matrix thus cannot be expressed in closed form, and one has to resort to descent algorithms in which the gradient and the Hessian of the cost function have to be evaluated using computationally intensive Monte-Carlo simulations (see e.g., [2] in the context of Rician channels).We note that for certain channel distributions (Rayleigh bi-correlated and uncorrelated Rician channels), the singular vectors of the optimal precoder have closed form expressions, which, of course, simplifies the optimization algorithms (see e.g., [3][4][5]).
Recently, it was proposed to use large random matrix tools to approximate the average mutual information by a large system approximation, and to optimize the approximation versus the input covariance matrix.This optimization problem is easier because the approximation can be very often nearly expressed in closed form so that its maximization does not need Monte-Carlo simulations.Large system approximations of average mutual information were derived in the past using the replica method (see e.g., [6] for bi-correlated MIMO Rayleigh channels, [7] for frequency selective Rayleigh channels, [8] for bi-correlated MIMO Ricean channels, [9] for jointly correlated Rician channel) or rigorous random matrix methods ( [10] for bi-correlated MIMO Rayleigh channels, [11] for bi-correlated MIMO Ricean channels, [12] for frequency selective Rayleigh channels, [13] for multiple access Rayleigh MIMO channels).Optimization algorithms of the large system approximation were first proposed in [14] in the context of bi-correlated MIMO Rayleigh channels.In [11,12] the authors addressed in more details the Rician bi-correlated channels and the frequency selective Rayleigh channels respectively, while [15] considered the Rician channel with interference.In [9] the authors considered the case of jointly correlated Rician channel.Finally, in the context of multiple access Rayleigh MIMO channels, the authors in [13] studied the multiple users input covariance matrices optimization of the related large system approximation.In [16] the authors generalized this analysis in the context of the multiple access jointly correlated Rician MIMO channels.Interestingly, the mutual information provided by the optimization of the approximation appears numerically to be quite close from the true optimum mutual information, even for realistic values of r and t.These observations were confirmed by the theoretical results of [11,12], which showed that the relative error is a O 1 t 2 term.In this paper, we give a comprehensive introduction to the optimization of large system approximations of average mutual information of MIMO channels.We note that the main results of this paper were already presented in [11,12].However, the focus of [11,12] is on the detailed derivation of the large system approximations, while the present paper concentrates on their optimization.In particular, we emphasize on methodological issues that are often partially-if at all-addressed.Another benefit of this paper is to provide, as much as possible, a unified view of the optimization of the large system approximation in the context of various kinds of MIMO channel, whereas the above referenced papers develop tools that seem specific to each particular model.Generally speaking, the writing style of this paper is rather informal, and hopefully easier to understand.The interested reader may refer to [11,12] for a rigorous presentation.

Optimization of the Average Mutual Information
We consider a flat fading MIMO system equipped with t transmit antennas and r receive antennas.In this context, the received observation at time n is a r-variate signal y(n) given by: where x(n) represents the t-dimensional signal sent at time n at the transmitter side, and where H is the Channel matrix H is assumed to be a complex Gaussian random matrix (Matrix H is said complex Gaussian if the rt-dimensional vector h = vec(H) is complex Gaussian, i.e., the probability distribution of (Re(h whose current value is known at the receiver side and whose probability distribution is known at the transmitter side.If Q denotes the covariance matrix of x(n) normalized in such a way that 1  t Tr(Q) = 1, the (Gaussian) average mutual information of model ( 1) is given by [1]: where E H represents the mathematical expectation operator w.r.t. the distribution of matrix H.
In the following, we denote by C the cone of all non-negative (positive semi-definite) matrices Q, and by C 1 the set of all matrices Q ∈ C for which 1  t Tr(Q) = 1.The ergodic capacity of the channel model ( 1) is defined as the maximum of I(Q) over the set C 1 , and is known to represent the maximum rate that can be transmitted reliably when H is known at the receiver and when its probability distribution is known at the transmitter side.In particular, using as input covariance matrix the argument of the maximum of I(Q) allows to transmit reliably at the largest possible rate.This explains why the maximization of I(Q) over the set C 1 is important.
Function Q → I(Q) is concave, and under some additional assumptions, strictly concave on the convex set C 1 .Therefore ( [17]), the maximization of I(Q) over C 1 has a unique argument, which is denoted Q * in the following.In general, Q * has no closed form expression. Therefore, its evaluation needs the use of descent algorithms which, due to the concavity of I and the convexity of C 1 , are convergent.In order to implement such algorithms, it is necessary to be able to evaluate I and its first and second derivatives at each point.As the analytical expressions of these functions are in general extremely complicated, the simplest solution consists in using Monte-Carlo evaluations.This, of course, leads to computationally intensive algorithms.

Large System Approximation of
In this paragraph, we first introduce informally how it is possible to derive a large system approximation of the mutual information I(I) corresponding to the case where the covariance matrix of the transmitted signal is Q = I.In order to simplify the notations, we denote I(I) by I.
Background on the behaviour of the eigenvalue distribution of HH H Let H r,t be a r × t complex Gaussian random matrix representing the channel matrix of a MIMO system ( ) tends to have a deterministic behaviour when the dimensions r and t converge to +∞ in such a way that t r → c, where 0 < c < +∞.In order to shorten the notations, t → +∞ stands in the following for r and t converge to +∞ in such a way that t r → c.This is equivalent to saying that it exists deterministic probability measures µ r,t (which depend on the dimensions r and t) for which almost surely when t → +∞ for each well chosen test function φ.Sometimes, the measures µ r,t also converge towards a probability distribution µ * , which does not depend on the values of (r, t) but only on c = lim t r .In this case, the eigenvalue distribution converges towards µ * , which is called the limit eigenvalue distribution H r,t H H r,t .In order to illustrate the above phenomenon, we consider the simplest situation in which the entries of H r,t are independent identically distributed (i.i.d.) zero mean random variables of variance 1 t , and where c ≤ 1.In this case, the measure µ r,t is absolutely continuous, and its density p r,t (λ) is given by: The corresponding distribution is called the Marčenko-Pastur distribution, and was introduced for the first time in [18].In order to simplify the notations, we no longer mention the dimensions r and t in the following, and denote H r,t , μr,t , µ r,t by H, μ, µ respectively.In order to establish that μ has a deterministic behaviour when t → +∞, it is well known that it is sufficient to establish that (3) holds for φ(λ) = 1 λ+σ 2 for each σ 2 > 0 (see e.g., [19,20]).In order to reformulate (3) for φ(λ) = 1 λ+σ 2 , we introduce the Stieltjes transform s ν of a positive measure ν carried by R + , which is the function defined for each z ∈ C − R − as: Equation (3) for φ(λ) = 1 λ+σ 2 for each σ 2 > 0 is thus equivalent to: for each σ 2 > 0 because In order to prove (5), a possible approach is to establish that s μ(σ 2 ) has the same asymptotic behaviour as a deterministic quantity which appears to coincide with the value at point σ 2 of the Stieltjes transform of a certain probability measure µ carried by R + .For this, it is useful to introduce the resolvent of matrix HH H defined as the r × r matrix valued function S defined for each σ 2 > 0 by: It is important to notice that Under suitable hypotheses on H, the random entries of S(σ 2 ) turn out to have the same behaviour as the entries of a deterministic r × r matrix T(σ 2 ), which, in particular, implies that The entries of matrix T(σ 2 ) can be expressed in closed form in terms of the (unique) positive solutions (δ j (σ 2 )) j=1,...,n , ( δi (σ 2 )) i=1,...,m of a nonlinear system of n + m equations depending on r, t, σ 2 , and the statistical properties of H.This nonlinear system of equations is called in [21] the canonical equations associated to the random matrix model H.The number of unknowns m + n depends on r, t and the statistics of H.Then, it is shown that function σ 2 → 1 r TrT(σ 2 ) coincides with the Stieltjes transform of a certain probability measure µ, which, in turn, establishes that (5) holds for each σ 2 .It is also useful to notice that the entries of matrix S(σ 2 ) defined as have the same behaviour as the entries of a t × t deterministic matrix T(σ 2 ).These entries can be evaluated in terms of the solutions of the above-mentioned canonical equations.
We also mention that it is often possible to evaluate the accuracy of the approximations.In the context of a number of complex Gaussian random matrix models, it holds that Large system approximation of I We finally consider the problem of approximating and thus coincides with a functional of the eigenvalues of HH H .It can therefore be approximated by a deterministic term that can very often be expressed in an explicit way.In order to explain this, we use the following well-known identity: As 1 r TrS(ω 2 ) can be approximated by 1 r TrT(ω 2 ), it can be shown that But, in the context of the various usual MIMO models, it turns out that where δ(σ 2 ) = (δ 1 (σ 2 ), . . ., δ n (σ 2 )) T and δ(σ 2 ) = ( δ1 (σ 2 ), . . ., δm (σ 2 )) T represent the solutions of the nonlinear system of m + n equations governing the entries of T(σ 2 ) and where g is a well chosen function that is expressed in closed form.Taking the mathematical expectation of (12), and using that the error between E 1 r TrS(ω 2 ) and 1 r TrT(ω 2 ) is a O 1 t 2 term for each ω 2 (see Equation ( 10)), it can be shown that or equivalently where the large system approximation I of I is defined by: As I and I scale linearly with t when t increases, Ref (14) implies that the relative error between I and its large system approximation is a O 1 t 2 term, and thus decreases rather fast towards 0 when t increases.This explains partially why large system approximation of mutual information are very accurate even for moderate number of transmit and receive antennas.
Examples In order to illustrate these results, we provide the expressions of T, T and I in the context of useful random matrix models.We first consider the case where H can be written as: where the entries of X are complex Gaussian i.i.d.random variables with zero mean and variance 1 and where C and C are r × r and t × t deterministic positive matrices such that Then, it can be shown that the nonlinear system of equations: has a unique pair of positive solutions denoted (δ(σ 2 ), δ(σ 2 )).Matrices T(σ 2 ) and T(σ 2 ) introduced above are given by: The two canonical equations defining (δ(σ 2 ), δ(σ 2 )) can therefore be written as: The corresponding large system approximation I of I is given by: so that function g defined by Equation ( 13) is given by g(σ 2 , δ, δ) = t σ 2 δ δ.It is useful to justify the above expression of I because the proof below extends in the context of other models.For this, it is necessary to establish that Equation ( 13) holds for the above function g.It can be shown that the right-hand side of ( 23) converges towards 0 when σ 2 → +∞.Therefore, it remains to check that the derivative w.r.t.σ 2 of the right-hand side of ( 23) coincides with − r 1 σ 2 − 1 r TrT(σ 2 ) .We introduce the following function V (σ 2 , κ, κ) of three variables: which is obtained by replacing (δ(σ 2 ), δ(σ 2 )) with the fixed parameters (κ, κ) into the expression of the right-hand side of (23).A very important property of function V is: In fact, it is straightforward that and vanishes because δ(σ 2 ) satisfies the canonical Equation (22).We obtain similarly that (26) holds because δ(σ 2 ) is the solution of the canonical Equation (21).In sum, it appears that (25,26) are equivalent to the canonical Equations 19 and 20.The derivative w.r.t.σ 2 of the right-hand side of ( 23) is given by: and thus coincides with −t δ(σ 2 ) δ(σ 2 ) by (25,26).Using (21,22) as well as the expressions (19,20) of T(σ 2 ) and T(σ 2 ), this is easily seen to be equal to − r 1 Hence, both sides of ( 13) have the same derivative w.r.t.σ 2 .Moreover, both sides converge towards 0 when σ 2 → +∞, thus Equation ( 13) is verified for g(σ 2 , δ, δ) = t σ 2 δ δ.This, together with (12), establishes that (23) holds.
We now consider the case where channel H is given as: where matrices (X l ) l=1,...,L are mutually independent complex Gaussian random matrices with i.i.d.entries of mean 0 and variance 1 and where C l and Cl are r × r and t × t deterministic positive matrices such that sup for l = 1, . . ., L. It is important to notice that the parameter L is independent of r and t, and thus does not scale with the number of antennas.As shown in [7,12], model ( 27) is useful in the context of multipath channels with independent paths.Matrices T and T are given by: where the (δ l (σ 2 ), δl (σ 2 )) l=1,...,L are the unique positive solutions of the system of 2L equations: for l = 1, . . ., L. The large system approximation of I is this time given by [7,12]: and thus corresponds to Equation ( 14) when g(σ 2 , δ, δ) = t σ 2 δ T δ = t σ 2 L l=1 δ l δl .(33) is proved in the same way as (23) by introducing the function of 2L + 1 variables V (σ 2 , κ, κ) obtained by replacing into the right-hand side of (33) functions δ(σ 2 ), δ(σ 2 ) by fixed vectors κ and κ, and by observing that the canonical Equations (31, 32) are equivalent to: We now consider a third random matrix model modeling bi-correlated Rician channels.We assume that H can be written as: where the entries of X are complex Gaussian i.i.d.random variables with zero mean and variance 1 and where C and C are r × r and t × t deterministic positive matrices such that Matrix A is a deterministic r × t matrix satisfying: This time, matrices T(σ 2 ) and T(σ 2 ) are given by: where (δ(σ 2 ), δ(σ 2 )) satisfy the same canonical Equations (21,22) as in the case of the Rayleigh bi-correlated MIMO channels, i.e., It can be shown as previously that the corresponding large system approximation I of I is now given by [11,22]: and corresponds to Equation ( 14) when g(σ 2 , δ, δ) is equal to: A slightly more general non-zero mean MIMO model is: where • represents the Schur-Hadamard product, where D represents a r × t matrix such that D i,j ≥ 0 and sup r,t D < +∞, and where U and V are deterministic unitary r × r and t × t matrices.A and X have the same properties as in the context of model ( 36).This MIMO channel model was introduced in [23], studied in the large system regime in the zero mean case in [10] and in the non-zero mean case in [24] and, using the replica method, more recently in [9,16] in the context of the MIMO multi-access channel.As we are essentially interested by the functionals of the eigenvalues of HH H , it is possible to assume without restrictions that U and V are reduced to I r and I t respectively.In this case, matrices T and T are given by: where ∆(σ 2 ) = Diag(δ 1 (σ 2 ), . . ., δ t (σ 2 )) and ∆(σ 2 ) = Diag( δ1 (σ 2 ), . . ., δr (σ 2 )) are two diagonal t × t and r × r matrices whose elements are the positive solutions of the equations: for i = 1, . . ., r, and j = 1, . . ., t, where E j = Diag(D 2 1,j , . . ., D 2 r,j ) and Ẽi = Diag(D 2 i,1 , . . ., D 2 i,t ).The large system approximation I is now given by: (50) and corresponds to Equation ( 14) for a suitable function g(σ 2 , δ, δ).We finally consider the model representing the MIMO multiple access channel addressed in [13].H is now a r × Lt matrix given by: where the various matrices (C l , Cl , X l ) l=1,...,L satisfy the same conditions as in the context of model (27).

The General Case Q = I
In order to address the general case Q = I, it is sufficient to exchange random matrix model H by random matrix model HQ 1/2 , and to evaluate the large system approximation I(Q) of the corresponding average mutual information I(Q).This is straightforward if the right multiplication of H by Q 1/2 leads to a random matrix model for which the large system approximations are easy to evaluate.This turns out to be the case in the context of models (15, 27 and 36) and also in the context of model (51) if Lt × Lt matrix Q is block diagonal, i.e., Q = Diag(Q 1 , . . ., Q L ), a condition that fortunately holds in the multiuser precoding schemes presented in [13].For these models, H and HQ 1/2 belong to the same class of random matrix model and it is then sufficient to replace: • for model (15), matrix C by Q 1/2 CQ 1/2 , • for model (27), matrices ( Cl ) l=1,...,L by (Q 1/2 Cl Q 1/2 ) l=1,...,L , • for model (36), matrices A and C by AQ 1/2 and Q 1/2 CQ 1/2 respectively, • for model (51), matrices ( Cl ) l=1,...,L by (Q in the various equations defining the large system approximations of I (note that the various matrices C are multiplied from both sides by Q 1/2 as they are covariance matrices; e.g., the matrix C is the transmit covariance matrix of channel H, therefore the transmit covariance matrix of the channel HQ 1/2 is Q 1/2 CQ 1/2 ).In the context of model (45) the right multiplication by Q 1/2 unfortunately modifies the structure of the model.Using the replica trick [9] nevertheless derived a large system approximation I(Q) of I(Q) (see also [16] in the case of the corresponding multi-user MIMO channel).It is however not clear that the optimization of I(Q) completely fits with the unified presentation of the present paper.Therefore, we will not elaborate on the model (45) in the following.
In the context of models (15, 27 and 36), matrices T and T as well as solutions of the canonical equations δ, δ still depend on σ 2 , but also on the covariance matrix Q.As the dependence versus σ 2 does not play any role in the following, we now denote T, T, δ and δ by T(Q), T(Q), δ(Q) and δ(Q) respectively.It is easily seen that for the above 3 models the large system approximation of I(Q) can be written as: where G(δ, δ) is a certain t × t positive matrix valued function given in closed form, and i(δ, δ) is a function also given in closed form.As previously, it is useful to introduce the function V(Q, κ, κ) defined by: and which corresponds to the expression (58) of I(Q) but in which (δ(Q), δ(Q)) is replaced by the fixed parameter (κ, κ).In others words, I(Q) can be written as: In the context of models (15, 27 and 36), it holds that for each pair (i, j).As previously, these relations follow directly from the canonical equations verified by the components of δ(Q) and δ(Q).It is important to notice that the boundedness assumptions (28,38) imply that I(Q) represents a valid large system approximation of I(Q) provided that the mean and the covariance matrices associated to model HQ 1/2 satisfy such assumptions.For this, it is sufficient to assume that matrix Q satisfies: In particular, property As explained below, this induces some technical difficulties to justify the relevance of the approach consisting in replacing the optimization of I(Q) by the optimization of I(Q).
We finally mention that in the context of model ( 51), if Q is block-diagonal, it holds that for certain matrix valued functions (G l ) l=1,...,L .In [13], it is proposed to optimize (64) w.r.t.matrices (Q l ) l=1,...,L under the constraints 1 t Tr(Q l ) = 1 for each l = 1, . . ., L. As the formulation of this problem slightly differs from the optimization of I in models (15, 27 and 36), we will not discuss this issue in the next sections.However, the results presented below can easily be adapted to the context of model (51).

Concavity of the Large System Approximation
As the general approach of this paper is to optimize I(Q) w.r.t.Q, it is crucial to be able to prove that Q → I(Q) is a strictly concave function defined on C 1 .We point out that this important point was not addressed in [7,14,15].While the strict concavity needs rather intricate arguments, specific to each model, the concavity of I in the context of (15, 27 and 36) can be derived in a unified way.For this, we use the scheme proposed in [11,12], and introduce for each integer m an augmented mr × mt-dimensional model H (m) of the same kind in which the various covariance matrices and mean matrices are exchanged by their Kronecker product by matrix I m (the new matrices are denoted ), and in which the mutually independent i.i.d.r×t complex Gaussian random matrices (X l ) l=1,...,L are replaced by mutually independent i.i.d.mr × mt complex Gaussian random matrices (X (m) l ) l=1,...,L .If Q is a t × t covariance matrix defined on the usual cone C, we consider the function I m defined on C by: When m → +∞ while r and t remain fixed, the dimensions mr and mt of matrix H (m) both converge to +∞ in such a way that mt mr = t r remains constant.For each model, the mutual information I m (Q) has a large system approximation I m (Q ⊗ I m ), which can be expressed explicitly in terms of Q and of the solutions of the associated canonical equations denoted δ (m) (Q ⊗ I m ) and δ(m) (Q ⊗ I m ).However, looking at the canonical equations characterizing models (15, 27 and 36), it is easy to check that δ (m) (Q⊗ I m ) and δ(m) (Q ⊗ I m ) coincide with the solutions δ(Q) and δ(Q) of the canonical equation associated to the non-augmented model, and that for each m.Moreover, it is clear that As the pointwise limit of concave functions is still concave, (67) implies that I is concave on C.
As mentioned above, the strict concavity of the large system approximation is much more difficult, and was proved in [11,12] in the context of the models (15, 27 and 36).Therefore, function I is maximized on C 1 by a unique matrix denoted Q * in the following.The optimization of I in place of I is justified if it is possible to establish that the optimum mutual information I(Q * ) has the same asymptotic behaviour as I(Q * ) when t increases.As function I is an approximation of function I, it seems reasonable to expect that the argument of the maximum of both functions behave similarly, and thus that I(Q * ) − I(Q * ) → 0 when t → +∞.We however recall that I(Q) is a valid approximation of I(Q) provided that sup t Q < +∞.It is therefore intuitive that if Q * and/or Q * do not satisfy this condition, then the approach consisting in approximating Q * by Q * may not be successful.Fortunately, the following result was established in [11,12] in the context of models (15, 27 and 36).
Theorem 1 In the context of models (15, 27 and 36), under certain extra assumptions on the second order statistics of H, it holds that (1) This result is proved in [11,12], where the above-mentioned technical extra assumptions are specified (they are related to the infima over the considered matrix dimension of the second order statistics smallest eigenvalue-these infima need to be greater than 0).The proofs of items (1) and (2) are not obvious, and the interested reader may refer to [11,12] for more details.We here only justify that item (3) holds if (1) and (2) are verified.We write that As t , which, in particular, implies that item (3) holds.

Structure of
In the context of model (15), it is well known that the eigenvectors of the optimal matrix Q * coincide with the eigenvectors of matrix C. Models (27,36) are more complicated, and the structure of Q * is unknown.The asymptotic approach presented in this paper allows to obtain insights on matrix Q * , and thus on an approximate capacity achieving covariance matrix.In order to present the corresponding result, we recall that in the context of models (27,36), I(Q) is given by (58).We denote by δ * and δ * the vectors δ(Q * ) and δ(Q * ), and by G * the matrix G(δ * , δ * ).Then, the following result holds.
Theorem 2 Matrix Q * coincides with the maximizer of the following classical optimization problem: Before giving the proof, we provide some comments on Theorem 2. It is first important to notice that this result cannot in practice be used in order to evaluate matrix Q * because the parameters δ * and δ * , which depend on Q * , are of course unknown.However, Theorem 2 is useful in that it gives some insights on the structure of the nearly capacity achieving covariance matrix Q * .More precisely, it is well known that the eigenvectors of the maximizer of (69) coincide with the eigenvectors of G * and that its eigenvalues are solutions of a water-filling algorithm based on the eigenvalues of G * .It is useful to express matrix G * in the context of models (27,36).In the first case (27), G * is equal to: If model (27) reduces to the Rayleigh bi-correlated MIMO channel, i.e., L = 1, one obtains that the eigenvectors of Q * coincide with the eigenvectors of the transmit correlation matrix, a result consistent with what is known on Q * in this context.Interestingly, if L > 1, we obtain that the eigenvectors of the approximate capacity achieving covariance matrix Q * coincide with the eigenvectors of a weighted sum of the covariance matrices ( Cl ) l=1,...,L .Moreover, the eigenvalues of Q * are the solutions of a classical water-filling algorithm based on the eigenvalues of L l=1 δ l, * Cl .In the context of model (36), matrix G * is given by: In the case of an uncorrelated Rician channel (C and C are both reduced to I), matrix G * is a linear combination of I and of matrix A H A. The eigenvectors of Q * thus coincide with the right singular vectors of A, a result in accordance with the structure of Q * in this context (see [5]).If C = I, the eigenvectors of Q * coincide with the eigenvectors of a linear combination of C and A H A, which is an intuitively pleasant result.However, in the general case where C = I, this is no longer true, and a linear combination of C and the surprising matrix A H (I + δ * C) −1 A comes into play.
Proof of Theorem 2. We first introduce some classical concepts and results needed for the optimization of Q → I(Q).
Definition 1 Let φ be a function defined on the convex set C 1 .Let P, Q ∈ C 1 .Then φ is said to be differentiable in the Gâteaux sense (or Gâteaux differentiable) at point Q in the direction P − Q if the following limit exists: Proposition 1 Let φ : C 1 → R be a strictly concave function.Then, • Q opt is the unique maximizer of φ on C 1 if and only if it verifies: This proposition is standard (see for example Chapter 2 of [25]).
In order to prove Theorem 2, we apply the above proposition to function I.For this, we use function V introduced in Equation (59).It is clear that maximizing over The proof then relies on the observation hereafter proven that, for each P ∈ C 1 , Assuming ( 71) is verified, it holds that the left-hand side of (71) is negative for each P because Q * maximizes I over C 1 .Therefore, we obtain that V (Q * , δ * , δ * ), P − Q * ≤ 0 for each matrix P ∈ C 1 .
As the function Q → V(Q, δ * , δ * ) is strictly concave on C 1 , its unique maximizer on C 1 coincides with Q * .It now remains to prove (71).We first mention that functions Q → δ(Q) and Q → δ(Q) are Gâteaux differentiable (this is proved in [11,12] in the context of models (27,36)).Consider P and Q ∈ C 1 .Then, (60) implies that Using relations (61, 62) and letting Q = Q * in (72) yields, as expected, We mention that a number of authors used, at least implicitly, the result stated in Theorem 2, but sometimes with partial or non-convincing justifications.We hope that the above proof is more convincing.

Optimization Algorithms of I(Q)
The most straightforward way to optimize the strictly concave function Q → I(Q) on C 1 consists in implementing a descent algorithm.The constraint Q ≥ 0 may be taken into account by using, e.g., barrier interior point methods ( [26]).However, an alternative tempting choice consists in proposing an iterative water-filling algorithm based on the observation that for fixed (κ, κ), the optimization w.r.t.Q of Q → V(Q, κ, κ) can be achieved by a classical water-filling.On the other hand, for fixed Q, the solutions of the canonical equations (δ(Q), δ(Q)) can be easily evaluated.Therefore, a tempting choice is to adapt Q and (δ(Q), δ(Q)) separately.In order to present the corresponding algorithms, we assume that the canonical equations defining (δ(Q), δ(Q)) can be written as: In the simulations featured in Figure 1(a) (respectively in Figure 1(b)), we consider a MIMO system with r = t = 4 (respectively r = t = 8).In both figures, we have represented the average mutual information I(I t ) (i.e., without optimization), the optimal average mutual information I(Q * ) (i.e., with an input covariance matrix maximizing the approximation I) obtained by the iterative water-filling introduced in [11,12] (also called frozen water-filling [15]) and the same quantity I(Q * ) obtained this time by the water-filling introduced by [15].All the mutual information are evaluated by Monte-Carlo simulations with 20 000 channel realizations.All the values given by iterative algorithms are calculated with a relative precision of 10 −3 .The average mutual information I(Q * ) obtained with the Vu-Paulraj direct approach [2] is also represented as a reference.
The algorithm stops when the desired precision is obtained, or, as the Newton algorithm requires heavy Monte-Carlo simulations for the evaluation of the gradient and of the Hessian of I(Q), when the number of iterations of the outer loop reaches a given number N max .As in [2], we chose N max = 10, µ = 100, 20 000 trials for the Monte-Carlo simulations, and we started with k barrier = 1 100 .Both Figure 1(a) and Figure 1(b) show that maximizing I(Q) over the input covariance leads to significant improvement of I(Q).All the approaches provide the same results: the two water-filling procedures presented in this section and the Vu-Paulraj's direct approach all produce the same results, confirming the stated results.
Nonetheless, the execution times differ considerably from one algorithm to another.The indirect approaches are computationally much more efficient: in Vu-Paulraj's algorithm, the evaluation of the gradient and of the Hessian of I(Q) needs heavy Monte-Carlo simulations, thus slowing down drastically the calculations.Besides, the water-filling introduced by [8] is computationally slightly more efficient than the so-called frozen water-filling.Instead of two nested iterative loops (the outer to optimize the input covariance matrix Q, the inner to solve the canonical equations), there is only one loop performing both calculations.For the present case with r = t = 8, an average of 39.7 fixed point iterations is needed for the frozen water-filling to converge, whereas the water-filling introduced by [8] needs only 7.9 fixed point iterations on average.Table 2 illustrates these comparisons; for the three algorithms the average execution time to obtain the input covariance matrix is given, on a 2.93 GHz Intel Core 2 Duo CPU with 4 GB of RAM, for both r = t = 4 and r = t = 8.Frozen Water-filling 17.3ms 20.1ms Water-filling of [8] 5.10ms 7.13ms

Concluding Remarks
In this paper, we have given a comprehensive introduction to the evaluation of the capacity achieving covariance matrices of MIMO channels using large system approximation approaches.When the MIMO channel is available at the receiver and its probability distribution is available at the transmitter, the direct maximization of the mutual information w.r.t. the input covariance matrix leads to high complexity algorithms using intensive Monte-Carlo simulations.An alternative approach consists in approximating the mutual information in the large number of antennas regime and optimizing the approximation.We have recalled how it is possible to derive the large system approximations in the context of a number of MIMO channels, and have addressed their optimization.We have emphasized on important methodological points that are not necessarily covered by the existing literature, such as the strict concavity of the approximation, the structure of the argument of its maximum, the accuracy of the large system approach w.r.t. the number of antennas, and the justification of iterative water-filling optimization algorithms.We finally mention that the mutual information considered in this paper allow to obtain insights on the performances of MIMO systems equipped with maximum likelihood receivers.As these receivers may be complex to implement, reduced complexity schemes such as the minimum mean squares (MMSE) receivers are often considered.In this context, the relevant mutual information to be optimized versus the input precoder is different, and is defined as the sum over the transmit antennas of terms E(log(1 + β j )) where β j represents the output MMSE signal to interference plus noise ratio associated to the stream sent by antenna j.The corresponding optimization problems are more difficult because they are not convex as in the context of the present paper.However, high accuracy large system approximation techniques can still be developed (see e.g., [28,29]), and provide both insights on the optimal precoders and low complexity maximization algorithms.The case of Rayleigh bi-correlated MIMO channels was recently addressed in [30] where the structure of the optimal precoders was analyzed.However, an important work remains to be done in the context of the other usual MIMO channels.

( 1 )
and (2) hold, I(Q * ) − I(Q * ) and I(Q * ) − I(Q * ) are both O 1 t terms.Moreover, I(Q * ) − I(Q * ) and I(Q * ) − I(Q * ) are both positive terms by the very definition of Q * and Q * .Therefore, both terms are O 1
t ) (no optimization) I(Q * ) (waterfilling) I(Q * ) (frozen waterfilling) I(Q * ) (Vu-Paulraj optimization) (a) r = t = 4 t ) (no optimization) I(Q * ) (waterfilling) I(Q * ) (frozen waterfilling)I(Q * ) (Vu-Paulraj optimization) (b) r = t = 8Vu-Paulraj's algorithm is a direct approach to optimize I(Q).It is composed of two nested iterative loops.The inner loop evaluates Q (n) * = argmax {I(Q) + k barrier log detQ} thanks to the Newton algorithm with the constraint 1 t Tr Q = 1, for a given value of k barrier and a given starting point Q (n) 0 .Maximizing I(Q) + k barrier log detQ instead of I(Q) ensures that Q remains positive semi-definite through the steps of the Newton algorithm; this is the so-called barrier interior-point method.The outer loop then decreases k barrier by a certain constant factor µ and gives the inner loop the next starting point Q

Table 2 .
Average execution time.r = t = 4 r = t = 8 Vu-Paulraj 503s 1877s we indicate that H is a r × t matrix by denoting H by H r,t ), and consider its associated Gram matrix defined by H r,t H H r,t .Under certain assumptions on the probability distribution of H r,t (see the examples below), it appears that the empirical eigenvalue distribution μr,t of H r,t H H r,t defined as 1