Improved Massive MIMO RZF Precoding Algorithm Based on Truncated Kapteyn Series Expansion

In order to reduce the computational complexity of the inverse matrix in the regularized zero-forcing (RZF) precoding algorithm, this paper expands and approximates the inverse matrix based on the truncated Kapteyn series expansion and the corresponding low-complexity RZF precoding algorithm is obtained. In addition, the expansion coefficients of the truncated Kapteyn series in our proposed algorithm are optimized, leading to further improvement of the convergence speed of the precoding algorithm under the premise of the same computational complexity as the traditional RZF precoding. Moreover, the computational complexity and the downlink channel performance in terms of the average achievable rate of the proposed RZF precoding algorithm and other RZF precoding algorithms with typical truncated series expansion approaches are analyzed, and further evaluated by numerical simulations in a large-scale single-cell multiple-input-multiple-output (MIMO) system. Simulation results show that the proposed improved RZF precoding algorithm based on the truncated Kapteyn series expansion performs better than other compared algorithms while keeping low computational complexity.


Introduction
Recently, due to the rapid development of wireless communication technology, the demand for data rate, service quality and number of users has multiplied, and research on large-scale MIMO, a key technology in wireless communication, has deepened both domestically and overseas [1]. In a massive MIMO system, the RZF precoding algorithm can obtain approximately optimal linear precoding performance by increasing the number of antennas at the base station (BS). However, the complexity of hardware implementation and signal processing increases accordingly. When the ratio of the number of BS antennas to the number of user antennas is greater than 10, the complexity of RZF precoding is dominated by the matrix inversion, whose dimension is determined by the number of users, even though the number of BS transmit antennas is very large [2]. The dimension of the matrix to be inverted is also large in a massive MIMO system relative to a conventional MIMO system. Direct inversion of this large-dimensional matrix therefore incurs a very high computational complexity when RZF precoding is performed in massive MIMO systems.
The idea of approximate computation has been applied in many ways. For example, the articles [3][4][5][6] use coprime array interpolation to estimate source parameters, the direction-of-arrival and so on. Related research on large-scale MIMO systems is also plentiful. The authors of [7] present a wide choice of low-complexity sub-optimal rules for channel-aware decision fusion over MIMO channels which efficiently exploit large-array benefits. In [8], a diagonal band Newton iteration (DBNI) method is proposed. The contribution of [9] is to solve the decentralized multi-sensor estimation problem under uniform power allocation in massive MIMO mobile communications. Channel-aware (through pilot-based channel estimation) decision fusion over massive MIMO has been analyzed in [10].
Common methods for exact matrix inversion include the QR decomposition based on the Gram-Schmidt method [11,12] and the Gauss-Jordan elimination method [13], but they require a large number of calculations. Many researchers have therefore studied how to reduce the complexity of inverting high-dimensional matrices. The underlying idea of [14] is to carry out an approximate matrix inversion using a small number of Neumann-series terms, on the basis of which a novel VLSI architecture is proposed. A low-complexity precoding scheme based on matrix polynomials for downlink large-scale MIMO systems is presented in [15]. The RZF precoding algorithms based on the truncated Neumann series expansion [16] and the truncated Taylor series expansion [17] are the most common truncation methods. However, these algorithms do not consider the influence of the factor of each order in the series expansion, and their convergence is poor.
In this paper, the Kapteyn series polynomial is used to represent the inverse of the high-dimensional matrix in the RZF precoding algorithm; that is, we use the Kapteyn series to precode the signal. The inverse matrix is expanded by the Kapteyn series and truncated, keeping the first N terms of the polynomial. The constraint conditions for the series convergence are established, and the value of N and the corresponding order factors are studied. When the total transmitted power is constant and the channel state information is known, the correspondence between the signal-to-interference-plus-noise ratio (SINR) and the polynomial coefficients is established. For a fixed N, the optimal polynomial coefficients are sought to maximize the SINR, so as to obtain the low-complexity precoding matrix.
The rest of this paper is organized as follows. Section 2 describes the system model. Section 3 proposes an improved RZF precoding based on the truncated Kapteyn series expansion to approximate the inverse matrix in RZF precoding and optimizes the expansion coefficients. The computational complexity of the proposed algorithm is then calculated and compared with that of the traditional RZF precoding. Section 4 presents simulation results that compare the channel performance of the proposed algorithm with that of RZF precoding based on other truncated series expansions. Section 5 concludes this paper.
We summarize the notation used throughout the paper. Boldface uppercase letters denote matrices and boldface lowercase letters denote column vectors. In addition, $(\cdot)^H$, $(\cdot)^T$ and $\mathrm{tr}(\cdot)$ are the Hermitian transpose, transpose and trace of a matrix, respectively. Moreover, $(\cdot)^*$ is the conjugate operator and $\|\cdot\|_2$ represents the $\ell_2$ norm of a matrix. Finally, $\mathcal{CN}(\mathbf{m}, \mathbf{\Phi})$ denotes the circularly symmetric complex Gaussian distribution with mean vector $\mathbf{m}$ and covariance matrix $\mathbf{\Phi}$.

System Model
A fading downlink channel is considered in a massive MIMO system consisting of a BS with M transmitting antennas and K single-antenna users. The channel matrix $\mathbf{H} \in \mathbb{C}^{M \times K}$ is given by
$$\mathbf{H} = [\mathbf{h}_1, \ldots, \mathbf{h}_K], \qquad (1)$$
where $\mathbf{h}_k$ is the channel fading coefficient vector between the k-th user and the BS, whose elements follow a zero-mean Gaussian distribution.
The estimated channel matrix obtained by the BS after acquiring the channel state information (CSI) is defined as $\hat{\mathbf{H}} \in \mathbb{C}^{M \times K}$, which can also be expressed as
$$\hat{\mathbf{H}} = [\hat{\mathbf{h}}_1, \ldots, \hat{\mathbf{h}}_K], \qquad (2)$$
where $\hat{\mathbf{h}}_k \in \mathbb{C}^{M \times 1}$ is the estimated channel vector between the BS and the k-th user, whose elements are assumed to follow a Gaussian distribution, i.e., $\hat{\mathbf{h}}_k \sim \mathcal{CN}(\mathbf{0}_{M \times 1}, \mathbf{\Phi})$, where $\mathbf{\Phi} \in \mathbb{C}^{M \times M}$ is the channel covariance matrix. However, the BS may not be able to obtain the CSI accurately in a real channel environment. Hence we model the channel estimate [18] as follows:
$$\hat{\mathbf{h}}_k = \sqrt{1 - \tau^2}\, \mathbf{h}_k + \tau\, \mathbf{n}_k, \qquad (3)$$
where $\mathbf{h}_k$ is the real channel vector, and $\mathbf{n}_k$ is the channel estimation noise, which follows the Gaussian distribution $\mathbf{n}_k \sim \mathcal{CN}(0, \sigma^2)$ and is independent of, and identically distributed with, $\mathbf{h}_k$. In (3), $\tau \in [0, 1]$ is the channel estimation parameter; when $\tau = 0$, the estimated channel matrix equals the real one.
The signal received by the k-th user terminal (UT) can be expressed as
$$y_k = \mathbf{h}_k^H \mathbf{x} + z_k, \qquad (4)$$
where $z_k$ denotes the additive white Gaussian noise at the k-th user and $\mathbf{x}$ is the transmitted signal from the BS. Let the signal sent by the BS to the K users be $\mathbf{s} = [s_1, \ldots, s_K]^T \sim \mathcal{CN}(\mathbf{0}_{K \times 1}, \mathbf{I}_K)$, which is precoded to reduce the multi-user interference [19], yielding
$$\mathbf{x} = \mathbf{G}\mathbf{s} = \sum_{k=1}^{K} \mathbf{g}_k s_k, \qquad (5)$$
where $\mathbf{G} = [\mathbf{g}_1, \ldots, \mathbf{g}_K] \in \mathbb{C}^{M \times K}$ is the precoding matrix.
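The system model above can be illustrated with a minimal NumPy sketch. It is only an illustration under stated assumptions: the channel covariance is taken as the identity, the channel estimate is assumed to take the Gauss-Markov form $\hat{\mathbf{h}}_k = \sqrt{1-\tau^2}\,\mathbf{h}_k + \tau\,\mathbf{n}_k$ that is common for this parameterization, and the function names (`complex_gaussian`, `simulate_downlink`, `received_signal`) are hypothetical helpers introduced here.

```python
import numpy as np

def complex_gaussian(rng, shape, var=1.0):
    """Draw i.i.d. circularly symmetric complex Gaussian samples CN(0, var)."""
    scale = np.sqrt(var / 2.0)
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def simulate_downlink(M, K, tau, seed=0):
    """Return the true channel H and its estimate H_hat (both M x K)."""
    rng = np.random.default_rng(seed)
    H = complex_gaussian(rng, (M, K))   # columns are the user channels h_k
    N = complex_gaussian(rng, (M, K))   # estimation noise n_k, independent of h_k
    # Assumed estimate model: tau = 0 reproduces the true channel exactly.
    H_hat = np.sqrt(1.0 - tau**2) * H + tau * N
    return H, H_hat

def received_signal(H, G, s, noise_var, rng):
    """Return y with y_k = h_k^H x + z_k, where x = G s."""
    x = G @ s
    z = complex_gaussian(rng, H.shape[1], noise_var)
    return H.conj().T @ x + z
```

With `tau=0.0` the estimate coincides with the true channel, while larger values of `tau` blend in more estimation noise.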

Using Kapteyn Series To Estimate The Inverse Matrix In RZF Precoding
This section formulates the RZF precoding matrix based on the truncated Kapteyn series expansion and optimizes the polynomial coefficients. Then the computational complexities of several RZF precoding algorithms are analyzed and compared.

Truncated Kapteyn Series Expansion Algorithm
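To start, let us define the RZF precoding matrix as [20]
$$\mathbf{G}_{RZF} = \beta \hat{\mathbf{H}} \left( \hat{\mathbf{H}}^H \hat{\mathbf{H}} + \xi \mathbf{I}_K \right)^{-1}, \qquad (6)$$
where $\beta$ is set to ensure that $\mathbf{G}_{RZF}$ satisfies the power constraint $\mathrm{tr}(\mathbf{G}\mathbf{G}^H) = K$. In Equation (6), the scalar regularization coefficient $\xi$ can be selected as shown in [20]. Denote $\mathbf{X} = \hat{\mathbf{H}}^H \hat{\mathbf{H}} + \xi \mathbf{I}_K$. Let the eigenvalue of the matrix $\mathbf{I} - \alpha \mathbf{X}$ be $\lambda$, which satisfies the following inequality:
$$|\lambda| < 1. \qquad (7)$$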
Then, according to the definition of the Kapteyn series [21], if the eigenvalues of the positive definite matrix $\mathbf{X}$ satisfy $|\lambda_m(\mathbf{X})| < 1$, we obtain the expansion in Equation (8), in which $J_n(n\mathbf{X})$ is the Bessel function of the first kind.
Rewriting (8) in summation form yields Equation (9). Similarly, for any positive definite matrix $\mathbf{X}$ and $0 < \alpha < 2 / \max_n \lambda_n(\mathbf{X})$, all eigenvalues of $\mathbf{I} - \alpha \mathbf{X}$ are less than 1 in magnitude, so the matrix polynomial based on the Kapteyn series expansion can be obtained via Equation (9). The inverse of the matrix $\mathbf{X}$ can then be expressed as in Equation (10) [22,23].
Equation (10) can be rewritten as Equation (11). Substituting Equation (11) into Equation (6), we obtain Equation (12). Thus, by truncating Equation (12) and taking the first N terms, we obtain the RZF precoding matrix with the truncated Kapteyn polynomial expansion [24], given in Equation (13). Note that fl_n(X) is the sum of infinitely many terms. Hence it is also necessary to truncate fl_n(X) to further reduce the computational complexity. When only the first term of fl_n(X) is taken, we obtain Equation (14), which results in the corresponding RZF precoding matrix with the truncated Kapteyn polynomial expansion, shown in Equation (15). Using the binomial theorem $(a + b)^n = \sum_{m=0}^{n} C_n^m a^{n-m} b^m$ to expand Equation (15), we obtain Equation (16), which is finally expressed as Equation (17). Similarly, when the first two terms of fl_n(X) are taken, as in Equation (18), the corresponding RZF precoding matrix with the truncated Kapteyn polynomial expansion is given by Equation (19). Equation (19) is further simplified to Equation (20). Using the binomial theorem again to expand Equation (20), we obtain Equation (21). Expanding it and adding the k-th column of each term, it can be further manipulated to obtain Equation (22).
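To make the idea of replacing the exact inverse by a truncated matrix polynomial concrete, the following is a minimal NumPy sketch. It does not implement the Kapteyn expansion, whose coefficients are not reproduced here; as a stand-in it uses the truncated Neumann series $\mathbf{X}^{-1} \approx \alpha \sum_{n=0}^{N-1} (\mathbf{I} - \alpha \mathbf{X})^n$, which converges under the same condition $0 < \alpha < 2/\lambda_{\max}(\mathbf{X})$. The function names and the default choice of $\alpha$ are assumptions made for illustration.

```python
import numpy as np

def rzf_precoder(H_hat, xi):
    """Exact RZF precoder, scaled by beta so that tr(G G^H) = K."""
    K = H_hat.shape[1]
    X = H_hat.conj().T @ H_hat + xi * np.eye(K)
    G = H_hat @ np.linalg.inv(X)
    beta = np.sqrt(K / np.trace(G @ G.conj().T).real)
    return beta * G

def truncated_neumann_inverse(X, N, alpha=None):
    """Approximate X^{-1} by alpha * sum_{n=0}^{N-1} (I - alpha X)^n.

    X is assumed Hermitian positive definite. The default alpha is a
    hypothetical choice that lies inside (0, 2 / lambda_max).
    """
    K = X.shape[0]
    if alpha is None:
        lam = np.linalg.eigvalsh(X)
        alpha = 2.0 / (lam.max() + lam.min())
    R = np.eye(K) - alpha * X          # all eigenvalues of R lie in (-1, 1)
    term = np.eye(K, dtype=complex)    # current power R^n, starting at n = 0
    acc = np.zeros((K, K), dtype=complex)
    for _ in range(N):
        acc += term
        term = term @ R
    return alpha * acc
```

Increasing `N` reduces the approximation error geometrically, at a rate set by the largest eigenvalue magnitude of $\mathbf{I} - \alpha \mathbf{X}$, which is why the choice of $\alpha$ and of the per-order coefficients matters for a fixed truncation order.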

Polynomial Coefficients Optimization
Let us consider Equation (22) and examine its polynomial coefficients by denoting them as $\phi_n$, $n = 0, \ldots, N+1$. Then Equation (22) is expressed as
$$\mathbf{G} = \sum_{n=0}^{N+1} \phi_n \hat{\mathbf{H}} \left( \hat{\mathbf{H}}^H \hat{\mathbf{H}} \right)^n. \qquad (23)$$
It is observed that the N + 2 degrees of freedom in Equation (23) can be optimized. Specifically, the N + 2 coefficients $\Psi = [\phi_0 \ldots \phi_{N+1}]^H$ are optimized to improve the signal reception performance at the users, as described below.
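In the RZF precoding, the SINR at the k-th user is given by Equation (24). Our objective is to maximize the SINR at the users. Firstly, according to Equations (23) and (24), we obtain
$$\mathbf{h}_k^H \mathbf{G} \mathbf{e}_k \mathbf{e}_k^H \mathbf{G}^H \mathbf{h}_k = \Psi^H \mathbf{A}_k \Psi, \qquad (25)$$
$$\mathbf{h}_k^H \mathbf{G} \mathbf{G}^H \mathbf{h}_k = \Psi^H \mathbf{B}_k \Psi, \qquad (26)$$
$$\mathrm{tr}(\mathbf{G}\mathbf{G}^H) = \Psi^H \mathbf{C} \Psi, \qquad (27)$$
where the matrices $\mathbf{A}_k$, $\mathbf{B}_k$ and $\mathbf{C}$ depend on the channel matrices. Proof. See Appendix A.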
Hence, according to Equations (25)-(27), we formulate the optimization problem in Equation (28), whose solution gives the optimal value $\Psi_{opt}$ of the coefficients $\Psi$. However, the matrices $\mathbf{A}_k$, $\mathbf{B}_k$ and $\mathbf{C}$ in Equation (28) vary with the channel matrix $\mathbf{H}$. In a large-scale MIMO system, the channel matrix behaves almost deterministically. Hence, when both the number of transmitting antennas at the BS and the number of single-antenna users tend towards infinity, the limits of the matrices $\mathbf{A}_k$, $\mathbf{B}_k$ and $\mathbf{C}$ can be obtained by random matrix theory, and the optimal value $\Psi_{opt}$ is then obtained [25]. Specifically, let the limits of $\mathbf{A}_k$, $\mathbf{B}_k$ and $\mathbf{C}$ be $\bar{\mathbf{A}}_k$, $\bar{\mathbf{B}}_k$ and $\bar{\mathbf{C}}$, respectively. Replacing the matrices in Equation (28) with these limits gives Equation (29). Further, since the matrices $\mathbf{A}_k$, $\mathbf{B}_k$ and $\mathbf{C}$ are all real symmetric matrices, simplifying Equation (29) leads to the objective function in Equation (30). Note that for any real symmetric matrix $\mathbf{R}$, we have $\mathbf{R} = (\mathbf{R}^{1/2})^H \mathbf{R}^{1/2}$. Hence Equation (30) can be further manipulated into Equation (31). Defining the auxiliary vector Ø as a linear transformation of $\Psi$ and substituting it into Equation (31) yields Equation (32). The maximum value of the objective function in Equation (32) is the maximum eigenvalue of the matrix appearing in that objective. Thus, let the maximum eigenvalue of this matrix be $\lambda_{\max}$ and the corresponding eigenvector be $\mathbf{b}$; then Ø = k$\mathbf{b}$, where k is a positive constant.
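According to the definition of Ø, we obtain Equation (33). Substituting Equation (33) into $\Psi^H \mathbf{C} \Psi = K$, we obtain Equation (34). Solving Equation (34) gives the value of k. Therefore, the final solution $\Psi_{opt}$ is given by Equation (36).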
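The eigenvector-based solution described above can be sketched generically. The example below maximizes a ratio of quadratic forms $\Psi^H \mathbf{A} \Psi / \Psi^H \mathbf{B} \Psi$ subject to $\Psi^H \mathbf{C} \Psi = K$ by whitening with $\mathbf{B}^{-1/2}$, taking the dominant eigenvector, and rescaling to meet the power constraint. The matrices `A`, `B` and `C` are placeholders: the exact matrices entering the objective of the paper are not reproduced here, so this is a sketch of the technique rather than of the specific problem.

```python
import numpy as np

def inv_sqrt_pd(B):
    """Inverse square root of a Hermitian positive definite matrix."""
    w, V = np.linalg.eigh(B)
    return (V / np.sqrt(w)) @ V.conj().T

def maximize_ratio(A, B, C, K):
    """Maximize psi^H A psi / psi^H B psi subject to psi^H C psi = K.

    A is Hermitian, B and C are Hermitian positive definite (placeholders).
    Returns the scaled maximizer psi and the maximum value of the ratio.
    """
    B_mh = inv_sqrt_pd(B)
    W = B_mh @ A @ B_mh
    W = (W + W.conj().T) / 2.0           # remove round-off asymmetry
    w, V = np.linalg.eigh(W)             # eigenvalues in ascending order
    lam_max, b = w[-1], V[:, -1]
    direction = B_mh @ b                 # undo the whitening
    # Scale factor k chosen so that the power constraint holds.
    k = np.sqrt(K / np.real(direction.conj() @ C @ direction))
    return k * direction, lam_max
```

Because the ratio is invariant to scaling of $\Psi$, the eigenvector fixes only the direction and the constraint fixes the scale, which mirrors the two-step structure of the derivation.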

Analysis of Computational Complexity
A thorough analysis of the computational complexity of RZF precoding based on the truncated Kapteyn series expansion is presented in this section, where the computational complexity was measured in terms of the total number of multiplication and addition operations that are needed for determining the polynomial.The complexity of the traditional RZF precoding algorithm was also evaluated for comparison.
Let us examine the computational complexity of the traditional RZF precoding algorithm first.Assume that T data is the number of time slots during which the base station downlink transmits information to the user in each coherence period.
The traditional RZF precoding matrix is $\mathbf{G}_{RZF} = \beta \hat{\mathbf{H}} (\hat{\mathbf{H}}^H \hat{\mathbf{H}} + \xi \mathbf{I}_K)^{-1}$. Assume that $\hat{\mathbf{H}}$ and $\xi \mathbf{I}_K$ are given and can be obtained without any calculations. Then, the precoding matrix is computed in the following steps: (1) one matrix-matrix multiplication for $\hat{\mathbf{H}}^H \hat{\mathbf{H}}$; (2) one matrix-matrix addition for $\hat{\mathbf{H}}^H \hat{\mathbf{H}} + \xi \mathbf{I}_K$; (3) one K-order matrix inversion for $(\hat{\mathbf{H}}^H \hat{\mathbf{H}} + \xi \mathbf{I}_K)^{-1}$; (4) one matrix-matrix multiplication for $\hat{\mathbf{H}} (\hat{\mathbf{H}}^H \hat{\mathbf{H}} + \xi \mathbf{I}_K)^{-1}$; and (5) one scalar-matrix multiplication by $\beta$, which takes MK operations. Hence the total number of calculations for a traditional RZF precoding matrix is $K^3 + 4K^2M$. In addition, once the RZF precoding matrix is obtained, the precoded signal needs to be calculated by $\mathbf{x} = \mathbf{G}\mathbf{s}$, which requires $2KM - M$ calculations per time slot. Therefore, the computational complexity of the traditional RZF precoding algorithm in a coherence period is summarized as $K^3 + 4K^2M + T_{data}(2KM - M)$. From the above discussion, it can be observed that the K-order matrix inversion in the traditional RZF precoding algorithm requires a very large number of calculations, which increases dramatically with the number of users.
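As a quick numerical check of these counts, the sketch below evaluates the operation count of the traditional RZF precoder over one coherence period. It assumes that the precoding matrix is computed once per coherence period ($K^3 + 4K^2M$ operations) and that $\mathbf{x} = \mathbf{G}\mathbf{s}$ is evaluated once per data time slot ($2KM - M$ operations each); the function name `rzf_operation_count` is a hypothetical helper.

```python
def rzf_operation_count(M, K, T_data):
    """Operation count of traditional RZF precoding in one coherence period.

    Assumes one precoder computation per coherence period and one
    matrix-vector product x = G s per data time slot.
    """
    precoder_cost = K**3 + 4 * K**2 * M   # matrix products plus K-order inversion
    per_slot_cost = 2 * K * M - M         # x = G s for one transmitted vector
    return precoder_cost + T_data * per_slot_cost
```

For example, with the default simulation setting M = 256 and K = 32, the precoder itself costs 1,081,344 operations, and each data slot adds 16,128 more.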
We are now in a position to examine the computational complexity of RZF precoding based on the truncated Kapteyn series expansion. Let us denote the channel-independent terms of Equation (22) by $\Omega_n$, so that $\mathbf{G}_{Kapteyn\_2}$ can be rewritten in terms of $\Omega_n$. Because $\Omega_n$ is independent of the channel matrix $\mathbf{H}$, it can be calculated and stored in advance, and it consumes no additional calculations when computing the RZF precoding in the coherence period. The pre-coded signal from the BS to the users is expressed through the intermediate vectors $\hat{\mathbf{s}}_n$, where n > 0. The calculation of $\hat{\mathbf{s}}_n$ is counted as $4MK - M - K$ operations, and the computational complexity of $\mathbf{G}_{Kapteyn\_2}$ is given by Equation (39). Obviously, the computational complexity in Equation (39) increases with the truncation order of fl_n(X). The larger the truncation order of fl_n(X), the closer the RZF precoding with the truncated Kapteyn polynomial expansion converges to the one without truncation. Numerical simulation results in Section 4 show that good convergence can be achieved when the truncation order of fl_n(X) is two or above. When this truncation order is greater than two, the impact of the truncation on the convergence of the Kapteyn series is reduced. Therefore, this paper only analyzes the case where the truncation order of fl_n(X) is 2, which gives the greatest reduction of the computational complexity.
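The per-term count $4MK - M - K$ is what one obtains when the precoded signal is built from matrix-vector products only, without ever forming the precoding matrix. The sketch below illustrates this under the assumption that the precoder has the polynomial form $\mathbf{G} = \sum_n \phi_n \hat{\mathbf{H}} (\hat{\mathbf{H}}^H \hat{\mathbf{H}})^n$ with given coefficients `phi`; the function name and the recursion variable `t` are hypothetical.

```python
import numpy as np

def precode_polynomial(H_hat, s, phi):
    """Return x = sum_n phi[n] * H_hat (H_hat^H H_hat)^n s without forming G."""
    t = s.astype(complex)                  # t_0 = s
    acc = phi[0] * t
    for coeff in phi[1:]:
        # One product with H_hat (M(2K-1) operations) and one with H_hat^H
        # (K(2M-1) operations): 4MK - M - K operations per additional term.
        t = H_hat.conj().T @ (H_hat @ t)   # t_n = (H_hat^H H_hat) t_{n-1}
        acc = acc + coeff * t
    return H_hat @ acc
```

Each additional polynomial term therefore costs two matrix-vector products instead of a matrix-matrix product, which is where the complexity saving over explicit inversion comes from when K is large.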
Moreover, it can be observed from Equation (36) that the optimal coefficients are independent of the actual values of the channel matrix elements. Hence the optimal coefficients can be calculated in advance once the truncation order, the number of BS transmitting antennas and the number of users are known. No extra computation is added during the precoding of the transmitted signal, so the RZF precoding with optimized Kapteyn coefficients has the same computational complexity as the proposed Kapteyn algorithm without coefficient optimization.
Finally, the computational complexities of the RZF precoding algorithms based on the truncated Neumann series expansion and the truncated Taylor series expansion are obtained in a similar way to that of the truncated Kapteyn series expansion. The computational complexities of all the aforementioned precoding algorithms are listed in Table 1 for comparison.
Table 1 shows that the proposed RZF precoding with truncated Kapteyn series expansion significantly reduces the computational complexity compared to the traditional RZF precoding algorithm. For convenience, Figure 1 compares the computational complexity of the five algorithms for a varying number of transmitting antennas. From Figure 1, we can see that the computational complexity of the proposed algorithm is much lower than that of the truncated Neumann series expansion algorithm, although it is slightly higher than that of the truncated Taylor series expansion algorithm. In Section 4, we compare the performance of the five algorithms, including the proposed one, and show that the improved RZF precoding algorithm based on the truncated Kapteyn series achieves the best average achievable rate among the compared algorithms at the same computational complexity.


Simulation Results and Analysis
Numerical simulations were conducted to evaluate the channel performance of the precoding algorithms listed in Table 1. Unless otherwise specified, the noise variance was 1, the transmit power satisfied $\mathrm{tr}(\mathbf{G}\mathbf{G}^H) = K$, the simulation model was a large-scale single-cell MIMO system, and the default numbers of BS transmitting antennas and users were M = 256 and K = 32, respectively. The channel correlation matrix was modelled as in [26], with the parameter a = 0.1.
The channel performance was evaluated by the average achievable data rate r at the UTs. This rate is further averaged over the values obtained for different channel realizations in the simulations.
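A minimal sketch of how such a rate could be evaluated for one channel realization is given below. It assumes the standard per-user SINR, with signal power $|\mathbf{h}_k^H \mathbf{g}_k|^2$, interference $\sum_{j \neq k} |\mathbf{h}_k^H \mathbf{g}_j|^2$ and noise power `noise_var`, and it assumes that the average achievable rate is the mean of $\log_2(1 + \mathrm{SINR}_k)$ over the users; both the rate definition and the function names are assumptions made for illustration.

```python
import numpy as np

def sinr_per_user(H, G, noise_var=1.0):
    """Per-user SINR for true channel H (M x K) and precoder G (M x K)."""
    E = H.conj().T @ G                 # E[k, j] = h_k^H g_j
    power = np.abs(E) ** 2
    signal = np.diag(power)            # |h_k^H g_k|^2
    interference = power.sum(axis=1) - signal
    return signal / (interference + noise_var)

def average_rate(H, G, noise_var=1.0):
    """Assumed rate metric: mean over users of log2(1 + SINR_k)."""
    return np.mean(np.log2(1.0 + sinr_per_user(H, G, noise_var)))
```

In a simulation, `average_rate` would be evaluated for many independent channel realizations and the results averaged.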
Figures 2-5 illustrate the average achievable rate versus the SINR of the transmitted signal to the users.The performance of the traditional RZF precoding algorithm was considered as the "perfect" value for comparison.
From Figure 2, it can be observed that under a well estimated channel condition (lower τ), all the compared RZF precoding algorithms perform better than the corresponding ones under a poorly estimated channel condition (higher τ). For a given channel estimation quality, the RZF precoding algorithm based on the truncated Kapteyn series expansion performed much better than the ones based on the Taylor series expansion and the Neumann series expansion, while the performance of the latter two was nearly identical. Especially when the channel estimation was very imperfect, the performance of the proposed RZF precoding based on the truncated Kapteyn series expansion was very close to that of the traditional RZF precoding algorithm.
Figure 3 shows that the channel performance of the precoding algorithms with all compared truncated series expansions approaches that of the "perfect" traditional RZF precoding when the truncation order N is increased (e.g., from 2 to 4 in the simulations). Again, it shows that the proposed RZF precoding algorithm achieves better channel performance than the other truncation algorithms for the various values of N.
Furthermore, the effect of coefficient optimization was evaluated for the RZF precoding algorithm based on the truncated Kapteyn series expansion. Figure 4 illustrates the channel performance at the users when the coefficient optimization was performed in RZF precoding. From Figure 4, it can be seen that the channel performance of the proposed algorithm with coefficient optimization was always better than that without optimization. As the SINR increased, the performance gap between the two cases widened. When the truncation order was 2, the performance of the proposed RZF precoding algorithm with coefficient optimization was almost the same as that of the traditional RZF algorithm.

Figures 2-5: average achievable rate versus signal-to-noise ratio [dB].
Finally, we end this section by comparing the channel performance of the Kapteyn algorithm and the Taylor algorithm at equal computational complexity. Specifically, consider the RZF precoding matrix based on the truncated Taylor series expansion [17]. According to Table 1, the computational complexity of the improved Kapteyn algorithm with truncation order N is the same as that of the Taylor algorithm with truncation order N + 2. Figure 5 compares the average achievable rate obtained by the proposed improved Kapteyn algorithm when N = 2 or N = 3 with the one obtained by the Taylor algorithm with N = 4 or N = 5. This observation implies that, at the same computational complexity, the improved Kapteyn algorithm performs much better than the Taylor algorithm.

Conclusions
In this paper, we propose an improved RZF precoding algorithm based on the truncated Kapteyn series expansion and optimize its expansion coefficients for fast precoding convergence. As the SINR at the users increases, the average achievable rate obtained with the proposed coefficient optimization of the truncated Kapteyn series expansion is much higher than that of the truncated Kapteyn series expansion without coefficient optimization. Compared with the direct matrix inversion required in RZF precoding, the computational complexity of the proposed algorithm is significantly reduced, and it does not increase with coefficient optimization. Numerical results show that the downlink channel performance of the proposed algorithm, in terms of the average achievable rate at the users, is better than that of the truncated Neumann series and Taylor series expansions in a large-scale single-cell MIMO system. Furthermore, at the same complexity, the performance of the improved RZF precoding algorithm is the best among the compared algorithms. Future work could extend the proposed optimization to signal detection in massive MIMO systems in order to reduce the computational complexity.

Table 1. Computational complexity comparison of five algorithms.
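Appendix A
Proof. Substituting (23) into $\mathbf{h}_k^H \mathbf{G} \mathbf{e}_k \mathbf{e}_k^H \mathbf{G}^H \mathbf{h}_k$ and defining $\mathbf{P} = [\hat{\mathbf{H}}^H \mathbf{h}_k, (\hat{\mathbf{H}}^H \hat{\mathbf{H}}) \hat{\mathbf{H}}^H \mathbf{h}_k, \ldots, (\hat{\mathbf{H}}^H \hat{\mathbf{H}})^{N+1} \hat{\mathbf{H}}^H \mathbf{h}_k]$, we have $\mathbf{h}_k^H \mathbf{G} \mathbf{e}_k \mathbf{e}_k^H \mathbf{G}^H \mathbf{h}_k = \Psi^H \mathbf{P}^H \mathbf{e}_k \mathbf{e}_k^H \mathbf{P} \Psi$, where the entries of $\mathbf{A}_k = \mathbf{P}^H \mathbf{e}_k \mathbf{e}_k^H \mathbf{P}$ are indexed by $l = 0, \ldots, N+1$ and $m = 0, \ldots, N+1$. Therefore we have $\mathbf{h}_k^H \mathbf{G} \mathbf{e}_k \mathbf{e}_k^H \mathbf{G}^H \mathbf{h}_k = \Psi^H \mathbf{A}_k \Psi$ and (25) results. Similarly, substituting (23) into $\mathbf{h}_k^H \mathbf{G} \mathbf{G}^H \mathbf{h}_k$, we have $\mathbf{h}_k^H \mathbf{G} \mathbf{G}^H \mathbf{h}_k = \Psi^H \mathbf{B}_k \Psi$, where $\mathbf{B}_k = \mathbf{P}^H \mathbf{P}$ and $[\mathbf{B}_k]_{l,m} = \mathbf{h}_k^H (\hat{\mathbf{H}} \hat{\mathbf{H}}^H)^{l+m+1} \mathbf{h}_k$. Hence (26) results.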