An Improved Jacobi-Based Detector for Massive MIMO Systems

Abstract: Massive multiple-input-multiple-output (MIMO) is one of the key technologies in the fifth generation (5G) cellular communication systems. For uplink massive MIMO systems, typical linear detection such as minimum mean square error (MMSE) achieves near-optimal performance. Because it requires a direct matrix inversion, however, the MMSE detection algorithm becomes computationally very expensive, especially when the number of users is large. To achieve high detection accuracy while reducing the computational complexity in massive MIMO systems, we propose an improved Jacobi iterative algorithm that accelerates the convergence rate of the signal detection process. Specifically, the steepest descent (SD) method is utilized to obtain an efficient search direction. Then, the whole-correction method is applied to update the iterative process. As a result, the fast convergence and the low computational complexity of the proposed Jacobi-based algorithm are obtained and proved. Simulation results also demonstrate that the proposed algorithm outperforms the conventional algorithms in terms of the bit error rate (BER) and achieves detection accuracy close to that of the typical MMSE detector with only a small number of iterations.


Introduction
Massive multiple-input-multiple-output (MIMO) is an emerging technology that is promising for wireless sensor networks (WSNs) [1][2][3] and the fifth generation (5G) wireless communications [4]. In such systems, the base station (BS) is equipped with hundreds of antennas serving tens of single-antenna users in the same frequency band [5][6][7]. As a result, the massive MIMO system is capable of achieving higher multiplexing and diversity gains than the conventional small-scale MIMO system [8,9].
However, due to the increasing dimension, signal detection in the uplink becomes a challenge in massive MIMO systems [10]. According to [11], the maximum likelihood (ML) detector is optimal. However, its substantially high computational complexity makes it a bottleneck for massive MIMO systems. Therefore, assorted detection algorithms have been proposed. Some nonlinear detection algorithms, such as the sphere decoder and its variants, have been proposed to achieve near-optimal performance with lower complexity [12,13]. However, their complexity is still unfavorable when the system dimension is large or the modulation order is high. Thus, one can resort to linear detection algorithms, such as the minimum mean square error (MMSE) detector, which offers near-optimal detection accuracy [14]. However, the inversion of the large-dimensional matrices required by MMSE results in a prohibitively high complexity.
Given this challenge, low-complexity approximate matrix inversion has drawn substantial attention. The typical algorithms can be divided into two categories. The first category is based on series expansion, such as the Neumann series expansion (NSE) detection algorithm [15]. The NSE algorithm uses the first few terms of the series expansion to approximate the matrix inverse. However, its performance suffers a significant loss as the massive MIMO system scales up, and only a marginal reduction in complexity can be achieved.
The second category is based on iterative algorithms for solving linear equations. These algorithms convert the matrix inversion into the solution of a linear system. The Gauss-Seidel (GS) iterative algorithm is a typical example; it uses the most up-to-date values at each iteration, leading to a faster convergence rate and lower complexity than the NSE detection algorithm [16]. Furthermore, the symmetric successive over-relaxation (SSOR) algorithm [17] can also be applied, in which different weights are introduced into the iteration structure. To enhance the robustness of the SSOR algorithm, the optimal weight parameter is determined by exploiting channel characteristics, yielding fast convergence and high accuracy. Unfortunately, each iteration of the GS and SSOR calculations is serial in nature, since each component of the new iterate depends on all of the previously calculated components. By contrast, the Jacobi algorithm is a simple iterative algorithm with high parallelism [18]. Some improved Jacobi-based algorithms have been proposed, one of which is the damped Jacobi (DJ) algorithm [19]. The DJ algorithm converges well when the number of antennas at the BS is far greater than the number of user antennas. However, in uplink communications, an increase in the number of users means that this requirement of the DJ algorithm is no longer satisfied, resulting in a severe performance degradation.
Motivated by the above concerns, we focus on further improving the convergence rate and accuracy of the Jacobi algorithm while keeping the complexity low. Based on the Jacobi algorithm, we invoke the steepest descent (SD) method [20] and the whole-correction method [21] to modify the iteration process. The major contributions of our work are as follows:
(1) The SD method [20] is incorporated into the Jacobi algorithm to improve the convergence rate. As the SD method provides an efficient search direction, the combined SD and Jacobi algorithm is capable of satisfying the convergence condition. Furthermore, compared to the conventional Jacobi algorithm, the combined algorithm accommodates different numbers of users more efficiently.
(2) When the iterative approximate solution is close to the exact solution, the SD method no longer provides an efficient search direction, so the whole-correction method [21] is employed to further improve the convergence. By exploiting the approximate solutions already obtained, the whole-correction method attains a more accurate solution.
The remainder of the paper is structured as follows. We briefly introduce the system model and review previously proposed approaches in Section 2. In Section 3, a hybrid Jacobi-based iterative method is proposed, and its convergence and computational complexity are analyzed. In Section 4, simulation results are presented. Finally, the paper is concluded in Section 5.
Notation: In this paper, bold-face lower-case letters represent column vectors (e.g., $\mathbf{a}$), while bold-face upper-case letters refer to matrices (e.g., $\mathbf{A}$). For a matrix $\mathbf{A}$, the superscripts in $\mathbf{A}^T$, $\mathbf{A}^H$, $\mathbf{A}^{-1}$ and $\mathbf{A}^{+}$ indicate the transpose, the Hermitian transpose, the inverse and the Moore-Penrose inverse of $\mathbf{A}$, respectively. Furthermore, $\mathbf{I}_K$ denotes the $K \times K$ identity matrix, $\langle \mathbf{a}, \mathbf{b} \rangle$ denotes the inner product of the vectors $\mathbf{a}$ and $\mathbf{b}$, and $\|\cdot\|$ stands for the Euclidean norm of a vector.

System Model
Consider an uplink large-scale MIMO system in which the BS is equipped with $N_r$ receiving antennas serving $N_t$ single-antenna users, with $N_r \gg N_t$ [22]. The received signal vector at the BS, denoted by $\mathbf{y} = [y_1, y_2, \cdots, y_{N_r}]^T$, is expressed as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{1}$$

where $\mathbf{x} = [x_1, x_2, \cdots, x_{N_t}]^T$ denotes the transmitted signal vector and $x_i \in \Omega$ is the transmitted signal from the $i$th user, with $\Omega$ being a complex signal set obtained by modulating and mapping the source message $s$. $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ represents a flat-fading channel matrix whose entries are complex-valued, independent and identically distributed (i.i.d.) Gaussian variables following $\mathcal{CN}(0, 1)$, and $\mathbf{n} = [n_1, n_2, \cdots, n_{N_r}]^T$ is the complex-valued additive white Gaussian noise vector with zero mean and variance $\sigma^2$. The purpose of MIMO signal detection is to estimate the transmitted vector $\mathbf{x}$ from the channel matrix $\mathbf{H}$ and the received signal vector $\mathbf{y}$. The structure of the signal detection is illustrated in Figure 1.

MMSE Detector
The MMSE detector achieves near-optimal performance in massive MIMO systems. The estimated symbol vector $\hat{\mathbf{x}}$ obtained by the MMSE detector is given by

$$\hat{\mathbf{x}} = \left(\mathbf{H}^H\mathbf{H} + \sigma^2\mathbf{I}_{N_t}\right)^{-1}\mathbf{H}^H\mathbf{y} = \mathbf{A}^{-1}\mathbf{b}. \tag{2}$$

In Equation (2), $\mathbf{A} = \mathbf{H}^H\mathbf{H} + \sigma^2\mathbf{I}_{N_t}$ is a Hermitian positive definite matrix and $\mathbf{b} = \mathbf{H}^H\mathbf{y}$ is the output of the matched filter. The computational complexity of the direct matrix inversion $\mathbf{A}^{-1}$ is on the order of $O(N_t^3)$, which incurs a heavy computational burden in massive MIMO systems when a large-dimensional matrix must be inverted for a large number of users.
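As an illustration of Equation (2), the following minimal sketch forms $\mathbf{A}$ and $\mathbf{b}$ and solves $\mathbf{A}\hat{\mathbf{x}} = \mathbf{b}$ directly with Gaussian elimination, which is the $O(N_t^3)$ baseline that the iterative methods aim to avoid. The function names (`mmse_system`, `solve`) and the toy channel values are assumptions made for this example only; they do not come from the paper.

```python
# Pure-Python sketch (hypothetical helper names) of exact MMSE detection.

def hermitian_transpose(m):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[m[i][j].conjugate() for i in range(len(m))] for j in range(len(m[0]))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def mmse_system(h, y, sigma2):
    """Form A = H^H H + sigma^2 I and the matched-filter output b = H^H y."""
    hh = hermitian_transpose(h)
    a = matmul(hh, h)
    for i in range(len(a)):
        a[i][i] += sigma2
    return a, matvec(hh, y)

def solve(a, b):
    """Direct solve by Gaussian elimination with partial pivoting, O(n^3)."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

# Toy example (assumed values): N_r = 3 receive antennas, N_t = 2 users, no noise.
H = [[1 + 1j, 0.5], [0.2j, 1 - 1j], [0.3, 0.1 + 0.4j]]
x_true = [1 + 1j, -1 + 1j]
y = matvec(H, x_true)
A, b = mmse_system(H, y, sigma2=0.1)
x_hat = solve(A, b)
```

Because of the regularization term $\sigma^2\mathbf{I}_{N_t}$, `x_hat` is close to but not identical with `x_true` even in this noise-free toy case.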

Conventional Jacobi Algorithm
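To avoid the direct inversion of $\mathbf{A}$, we can convert the MMSE detection into solving the linear equation

$$\mathbf{A}\hat{\mathbf{x}} = \mathbf{b}. \tag{3}$$

In the Jacobi algorithm [18], the matrix $\mathbf{A}$ is decomposed as $\mathbf{A} = \mathbf{D} + \mathbf{E}$, where $\mathbf{D}$ and $\mathbf{E}$ are the diagonal and off-diagonal parts of $\mathbf{A}$, respectively. Then, the iterative solution of the Jacobi algorithm is given by

$$\hat{\mathbf{x}}^{(k)} = \mathbf{D}^{-1}\left(\mathbf{b} - \mathbf{E}\hat{\mathbf{x}}^{(k-1)}\right), \tag{4}$$

where $k$ denotes the iteration index. Since the convergence rate of the iteration depends strongly on the initial setting, we take the initial solution as $\hat{\mathbf{x}}^{(0)} = \left(\mathbf{D}^{-1} - \mathbf{D}^{-1}\mathbf{E}\mathbf{D}^{-1}\right)\mathbf{b}$ in this work [23].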
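The conventional Jacobi iteration with the initial solution above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `jacobi_detect` and the small Hermitian, diagonally dominant test matrix are hypothetical.

```python
# Pure-Python sketch of the conventional Jacobi detector (hypothetical names).

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def jacobi_detect(a, b, iterations):
    """Iterate x <- D^-1 (b - E x), starting from x0 = (D^-1 - D^-1 E D^-1) b."""
    n = len(b)
    d_inv = [1 / a[i][i] for i in range(n)]

    def off_diag_times(v):
        # E v, where E is the off-diagonal part of A.
        return [sum(a[i][j] * v[j] for j in range(n) if j != i) for i in range(n)]

    # Initial solution: t = D^-1 b, then x0 = t - D^-1 (E t).
    t = [d_inv[i] * b[i] for i in range(n)]
    et = off_diag_times(t)
    x = [t[i] - d_inv[i] * et[i] for i in range(n)]

    for _ in range(iterations):
        ex = off_diag_times(x)
        x = [d_inv[i] * (b[i] - ex[i]) for i in range(n)]  # every entry can be updated in parallel
    return x

# Assumed toy system: Hermitian and strictly diagonally dominant.
A = [[4, 1j, 0], [-1j, 4, 1], [0, 1, 4]]
b = [1 + 1j, 2, -1j]
x = jacobi_detect(A, b, iterations=3)
```

Each new entry depends only on the previous iterate, which is the source of the parallelism noted in the Introduction. The chosen initial solution is what one Jacobi step produces when started from $\mathbf{D}^{-1}\mathbf{b}$.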

Proposed Algorithm for MIMO Detection
In this section, an improved Jacobi algorithm is proposed for signal detection in uplink massive MIMO systems. Specifically, the proposed algorithm is divided into two stages. In the first stage, the SD method is applied to provide an efficient search direction for the Jacobi algorithm. In the second stage, the whole-correction method is utilized to further improve the convergence rate. Then, the fast convergence of the proposed algorithm is proved, followed by an analysis of its complexity.

Improved Jacobi Algorithm
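To achieve fast convergence of the estimate of $\mathbf{x}$, the SD method is applied to provide an efficient search direction for the Jacobi algorithm [20]; the result is called the SD-Jacobi iteration in this work. The combined iteration consists of two steps:
(1) During the first half of the iteration, the SD method is used to compute the symbol vector as

$$\hat{\mathbf{x}}^{(k+1/2)} = \hat{\mathbf{x}}^{(k)} + \alpha_k \mathbf{r}^{(k)}, \tag{5}$$

where $\mathbf{r}^{(k)} = \mathbf{b} - \mathbf{A}\hat{\mathbf{x}}^{(k)}$ is the residual vector and $\alpha_k$ represents the variable step size. The step size $\alpha_k$, which depends on $\mathbf{r}^{(k)}$, is determined by

$$\alpha_k = \frac{(\mathbf{r}^{(k)})^H \mathbf{r}^{(k)}}{(\mathbf{r}^{(k)})^H \mathbf{A}\, \mathbf{r}^{(k)}}. \tag{6}$$

(2) During the second half of the iteration, a Jacobi iteration is conducted, which is given by

$$\hat{\mathbf{x}}^{(k+1)} = \hat{\mathbf{x}}^{(k+1/2)} + \mathbf{D}^{-1}\mathbf{r}^{(k+1/2)}, \tag{7}$$

where $\mathbf{r}^{(k+1/2)} = \mathbf{b} - \mathbf{A}\hat{\mathbf{x}}^{(k+1/2)}$. Substituting Equation (5) into Equation (7) yields

$$\hat{\mathbf{x}}^{(k+1)} = \hat{\mathbf{x}}^{(k)} + \alpha_k \mathbf{r}^{(k)} + \mathbf{D}^{-1}\left(\mathbf{r}^{(k)} - \alpha_k \mathbf{p}^{(k)}\right), \tag{8}$$

where $\mathbf{p}^{(k)} = \mathbf{A}\mathbf{r}^{(k)}$. Equation (8) shows that, at each iteration, the SD step improves the convergence rate of the Jacobi iteration. However, when the iterative estimate is close to the solution, the SD method no longer provides an efficient search direction. Hence, a whole-correction method is subsequently applied to continue improving the convergence of the SD-Jacobi iteration, as introduced below.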
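A single SD-Jacobi iteration, that is, a steepest descent half-step followed by a Jacobi half-step, can be sketched in combined form as below. The update written here is $\hat{\mathbf{x}} + \alpha\mathbf{r} + \mathbf{D}^{-1}(\mathbf{r} - \alpha\mathbf{A}\mathbf{r})$ with $\alpha = \mathbf{r}^H\mathbf{r} / \mathbf{r}^H\mathbf{A}\mathbf{r}$, which follows from the description above; the function name `sd_jacobi_step`, the zero starting vector and the test system are assumptions made for illustration.

```python
# Pure-Python sketch of one combined SD-Jacobi iteration (hypothetical names).

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def inner(u, v):
    """Inner product u^H v for complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def sd_jacobi_step(a, b, x):
    n = len(b)
    ax = matvec(a, x)
    r = [b[i] - ax[i] for i in range(n)]   # residual r = b - A x
    p = matvec(a, r)                       # p = A r
    rp = inner(r, p).real                  # r^H A r is real and positive for Hermitian positive definite A
    alpha = inner(r, r).real / rp if rp > 0 else 0.0  # guard against a zero residual
    # SD half-step (x + alpha r) followed by the Jacobi half-step, in combined form.
    return [x[i] + alpha * r[i] + (r[i] - alpha * p[i]) / a[i][i] for i in range(n)]

# Assumed toy system: Hermitian and strictly diagonally dominant.
A = [[4, 1j, 0], [-1j, 4, 1], [0, 1, 4]]
b = [1 + 1j, 2, -1j]
x = [0j, 0j, 0j]
for _ in range(20):
    x = sd_jacobi_step(A, b, x)
```

The sketch recomputes the residual from scratch in every call for clarity; an implementation concerned with complexity would reuse residuals between iterations instead.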
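The whole-correction method is designed to improve the convergence of iterative solutions of a linear system [21]. It assigns different weights to a series of approximate solutions from the iterations to obtain an improved solution, as presented below.
Let $\{\hat{\mathbf{x}}_1, \hat{\mathbf{x}}_2, \cdots, \hat{\mathbf{x}}_m\}$ denote a set of approximate solutions from the iterations, where $m$ is the number of selected solutions, which is discussed later. Assigning different weights to the iterative solutions, the vector $\tilde{\mathbf{x}}$, which represents the updated $\hat{\mathbf{x}}^{(k)}$, can be expressed as

$$\tilde{\mathbf{x}} = \sum_{i=1}^{m} a_i \hat{\mathbf{x}}_i, \quad \sum_{i=1}^{m} a_i = 1, \tag{9}$$

where the weight $a_i$ denotes the contribution ratio of $\hat{\mathbf{x}}_i$ to $\tilde{\mathbf{x}}$.
The whole-correction method leads to a more accurate solution of the vector $\mathbf{x}$ from the approximate solution set $\{\hat{\mathbf{x}}_i\}$. Therefore, the weights should be chosen such that the updated solution $\tilde{\mathbf{x}}$ minimizes the norm of its residual $\mathbf{r} = \mathbf{b} - \mathbf{A}\tilde{\mathbf{x}}$. We are now in a position to determine the weights $a_i$. Select $q \in \{1, 2, \cdots, m\}$ arbitrarily. Then, we can obtain

$$a_q = 1 - \sum_{i \neq q} a_i. \tag{10}$$

The residual vector corresponding to $\hat{\mathbf{x}}_q$ is $\mathbf{r}_q = \mathbf{b} - \mathbf{A}\hat{\mathbf{x}}_q$. Thus, the residual vector of the updated $\tilde{\mathbf{x}}$ is rewritten as

$$\mathbf{r} = \sum_{i=1}^{m} a_i \mathbf{r}_i = \mathbf{r}_q + \sum_{i \neq q} a_i\left(\mathbf{r}_i - \mathbf{r}_q\right). \tag{11}$$

In this case, the optimization problem can be rewritten as

$$\min_{\mathbf{a}} \left\| \mathbf{r}_q + \mathbf{R}\mathbf{a} \right\|, \tag{12}$$

where $\mathbf{R} = [\mathbf{r}_1 - \mathbf{r}_q, \cdots, \mathbf{r}_{q-1} - \mathbf{r}_q, \mathbf{r}_{q+1} - \mathbf{r}_q, \cdots, \mathbf{r}_m - \mathbf{r}_q]$ and $\mathbf{a} = [a_1, \cdots, a_{q-1}, a_{q+1}, \cdots, a_m]^T$. The minimum norm solution of the optimization problem in Equation (12) is given by

$$\mathbf{a} = -\mathbf{R}^{+}\mathbf{r}_q. \tag{13}$$

Therefore, an improved solution $\tilde{\mathbf{x}}$ can be achieved once the contribution ratios are obtained. It is verified later that this improved solution is at least as accurate, in terms of residual norm, as any of the individual approximate solutions. Based on the above discussion, the pseudo code of the proposed algorithm is presented in Algorithm 1, where the algorithm terminates at the $K$-th iteration. Note that an important issue of the proposed algorithm is the choice of the number of approximate solutions $m$, which strongly influences the convergence rate and the computational complexity of the algorithm. In the next subsection, we consider the value of $m$ from two aspects.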
Algorithm 1: Proposed Improved Jacobi Iterative Algorithm
Step 1) Compute the initial approximate solution of the transmitted signal $\mathbf{x}$ and mark it as $\hat{\mathbf{x}}^{(0)}$.
Step 2) (if $m > 2$) Compute $m - 2$ approximate solutions of $\mathbf{x}$ using Equation (7), so that enough intermediate solutions are available for the whole-correction method.
    For $k = 1 : m - 2$
        Use SD-Jacobi iteration to obtain $\hat{\mathbf{x}}^{(k)}$;
    End for
Step 3) Employ the whole-correction method to update the approximate solution.
    For $k = m - 1 : K$
        Use SD-Jacobi iteration to obtain $\hat{\mathbf{x}}^{(k)}$;
        Obtain the updated $\hat{\mathbf{x}}^{(k)}$ by the whole-correction method;
    End for

Parameter Selection
In general, existing iterative MIMO detection algorithms aim at a small number of iterations. In this case, the number of approximate solutions obtained by the iterations is also limited. Thus, in this part, we only consider $m = 2$, 3 and 4. Firstly, we focus on the complexity of calculating the correction coefficients. In each iteration, we need to compute the correction coefficients $a_1, \cdots, a_m$, which can be written as

$$[a_1, \cdots, a_{q-1}, a_{q+1}, \cdots, a_m]^T = -\mathbf{R}^{+}\mathbf{r}_q, \quad a_q = 1 - \sum_{i \neq q} a_i, \tag{14}$$

where $q \in \{1, \cdots, m\}$ and the columns of $\mathbf{R}$ are the residual differences $\mathbf{r}_i - \mathbf{r}_q$, $i \neq q$. For illustrative purposes, we take $q = 1$, and Equation (14) can be rewritten as

$$[a_2, \cdots, a_m]^T = -[\mathbf{r}_2 - \mathbf{r}_1, \cdots, \mathbf{r}_m - \mathbf{r}_1]^{+}\mathbf{r}_1, \quad a_1 = 1 - \sum_{i=2}^{m} a_i. \tag{15}$$

Case 1: for $m > 2$, a Moore-Penrose inverse must be calculated to obtain the correction coefficients, and the required number of complex-valued multiplications is $O(mN_t^2)$ [24].
Case 2: when $m = 2$, the Moore-Penrose inverse of the single column $\mathbf{r}_2 - \mathbf{r}_1$ is $(\mathbf{r}_2 - \mathbf{r}_1)^H / \|\mathbf{r}_2 - \mathbf{r}_1\|^2$, so the correction coefficients $a_1$ and $a_2$ can be rewritten as

$$a_1 = \frac{(\mathbf{r}_2 - \mathbf{r}_1)^H \mathbf{r}_2}{\|\mathbf{r}_2 - \mathbf{r}_1\|^2}, \quad a_2 = 1 - a_1. \tag{16}$$

The calculation of the correction coefficients then requires two vector multiplications and one norm calculation, with complexity $O(N_t)$. The second case requires far fewer computations than the first case. Therefore, in terms of computational complexity, $m = 2$ is more advantageous for the iteration process than $m = 3$ and $m = 4$.
Next, the detector performance is shown in terms of the bit error rate (BER) over a Rayleigh-fading channel to illustrate the effect of the value of $m$. We consider an uplink massive MIMO scenario with a $128 \times 32$ antenna configuration and 64-quadrature amplitude modulation (QAM). In addition, the number of iterations is set as $K = 3$.
Figure 2 and Table 1 show that the proposed algorithm with $m = 3$ and $m = 4$ does not yield an appreciable performance gain with respect to the case $m = 2$. We can observe that the proposed algorithm with $m = 3$ outperforms the proposed algorithm with $m = 2$ by only 0.12 dB at BER $= 10^{-5}$, while its complexity is $3N_t$ times higher. The proposed algorithm with $m = 4$ even has the same BER as the proposed algorithm with $m = 2$. As a consequence, we choose $m = 2$. In Algorithm 2, the pseudo code of the proposed algorithm with $m = 2$ is presented, where the algorithm terminates at the $K$-th iteration.
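The $m = 2$ variant chosen above can be sketched end to end as follows. This is a minimal illustration assembled from the steps described in this section (initial solution, SD-Jacobi candidate, two-term whole correction with $a_1 = (\mathbf{r}_2 - \mathbf{r}_1)^H\mathbf{r}_2 / \|\mathbf{r}_2 - \mathbf{r}_1\|^2$ and $a_2 = 1 - a_1$); it is not the authors' code, and the function name `improved_jacobi_m2` and the test system are assumptions.

```python
# Pure-Python sketch of the proposed detector with m = 2 (hypothetical names).

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def inner(u, v):
    """Inner product u^H v for complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def improved_jacobi_m2(a, b, num_iter):
    n = len(b)
    d_inv = [1 / a[i][i] for i in range(n)]
    # Initial solution x0 = (D^-1 - D^-1 E D^-1) b, using E t = A t - D t.
    t = [d_inv[i] * b[i] for i in range(n)]
    at = matvec(a, t)
    x1 = [t[i] - d_inv[i] * (at[i] - a[i][i] * t[i]) for i in range(n)]
    ax = matvec(a, x1)
    r1 = [b[i] - ax[i] for i in range(n)]
    for _ in range(num_iter):
        # SD-Jacobi candidate x2 built from the current solution x1 and residual r1.
        p = matvec(a, r1)
        rp = inner(r1, p).real
        alpha = inner(r1, r1).real / rp if rp > 0 else 0.0
        x2 = [x1[i] + alpha * r1[i] + d_inv[i] * (r1[i] - alpha * p[i]) for i in range(n)]
        ax2 = matvec(a, x2)
        r2 = [b[i] - ax2[i] for i in range(n)]
        # Whole correction with m = 2: weights minimizing ||a1 r1 + a2 r2||.
        d = [r2[i] - r1[i] for i in range(n)]
        dd = inner(d, d).real
        a1 = inner(d, r2) / dd if dd > 0 else 0.0
        a2 = 1 - a1
        x1 = [a1 * x1[i] + a2 * x2[i] for i in range(n)]
        # Because a1 + a2 = 1, the residual of the corrected solution is a1 r1 + a2 r2.
        r1 = [a1 * r1[i] + a2 * r2[i] for i in range(n)]
    return x1

# Assumed toy system: Hermitian and strictly diagonally dominant.
A = [[4, 1j, 0], [-1j, 4, 1], [0, 1, 4]]
b = [1 + 1j, 2, -1j]
x_hat = improved_jacobi_m2(A, b, num_iter=3)
```

The sketch uses two matrix-vector products per iteration, which matches the per-iteration leading-order cost discussed in the complexity analysis; reusing the corrected residual avoids a third product.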

Convergence Proof
Taking the iterative process into account, the overall convergence analysis is split into two parts. In the first part, we study the convergence of the combined SD and Jacobi iterative process and compare it with the conventional Jacobi algorithm.
Theorem 1. A necessary and sufficient condition for the iterative algorithm $\mathbf{x}^{(k)} = \mathbf{B}\mathbf{x}^{(k-1)} + \mathbf{c}$ to be convergent for all initial vectors $\mathbf{x}^{(0)}$ is that $\rho(\mathbf{B}) < 1$, where $\rho(\mathbf{B})$ denotes the spectral radius of $\mathbf{B}$ [25].
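As the theorem shows, the convergence speed of the iterative algorithm is closely related to the spectral radius of the iteration matrix. Therefore, we first derive the iteration matrix of the combined algorithm. The iterative process of the combined algorithm can be written as

$$\hat{\mathbf{x}}^{(k)} = \mathbf{B}\hat{\mathbf{x}}^{(k-1)} + \mathbf{c}, \tag{17}$$

where $\mathbf{B} = \left(\mathbf{I} - \mathbf{D}^{-1}\mathbf{A}\right)\left(\mathbf{I} - \alpha_{k-1}\mathbf{A}\right)$ is considered as the iteration matrix and $\mathbf{c} = \left(\alpha_{k-1}\mathbf{I} + \mathbf{D}^{-1}\left(\mathbf{I} - \alpha_{k-1}\mathbf{A}\right)\right)\mathbf{b}$. Afterward, we analyze the spectral radius of $\mathbf{B}$, hereinafter $\rho(\mathbf{B})$:

$$\rho(\mathbf{B}) = \rho\left(\left(\mathbf{I} - \mathbf{D}^{-1}\mathbf{A}\right)\left(\mathbf{I} - \alpha_{k-1}\mathbf{A}\right)\right). \tag{18}$$

Since the MMSE filtering matrix $\mathbf{A}$ is positive definite, its diagonal part $\mathbf{D}$ is also positive definite. Accordingly, the norms of $\mathbf{I} - \mathbf{D}^{-1}\mathbf{A}$ and $\mathbf{I} - \alpha_{k-1}\mathbf{A}$, $k = 1 : K$, are both less than one [26]. Furthermore, we get $\rho\left(\mathbf{I} - \mathbf{D}^{-1}\mathbf{A}\right) < 1$ and $\rho\left(\mathbf{I} - \alpha_{k-1}\mathbf{A}\right) < 1$, respectively. Thus, $\rho(\mathbf{B}) < 1$ follows. According to Theorem 1, we can assert that the iteration of the combined algorithm is convergent.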
Algorithm 2: Proposed Improved Jacobi Iterative Algorithm with m = 2
    ...
    if $k = 1$: the value of $\mathbf{r}^{(0)}$ is obtained in the initialization;
    else: $\mathbf{r}^{(k-1)} = a_1\mathbf{r}_1 + a_2\mathbf{r}_2$, where the values of $\mathbf{r}_1$, $\mathbf{r}_2$ are obtained in the previous iteration;
    $\mathbf{p}^{(k-1)} = \mathbf{A}\mathbf{r}^{(k-1)}$;
    ...
    $a_2 = 1 - a_1$;
    $\hat{\mathbf{x}}^{(k)} = a_1\hat{\mathbf{x}}_1 + a_2\hat{\mathbf{x}}_2$;
    end for

In addition, the spectral radius of the Jacobi iteration matrix is $\rho\left(\mathbf{I} - \mathbf{D}^{-1}\mathbf{A}\right)$. Comparing Equation (18) with the spectral radius of the conventional Jacobi algorithm, we obtain the following inequality:

$$\rho(\mathbf{B}) < \rho\left(\mathbf{I} - \mathbf{D}^{-1}\mathbf{A}\right). \tag{19}$$

It is worth pointing out that the smaller the spectral radius of the iteration matrix, the faster the convergence of the scheme. Therefore, we can infer that the proposed algorithm converges faster than the conventional Jacobi algorithm.
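In the second part, we prove the convergence of the whole-correction method [21]. Suppose that the residuals corresponding to $\hat{\mathbf{x}}_1$, $\hat{\mathbf{x}}_2$ and $\tilde{\mathbf{x}}$ are $\mathbf{r}_1$, $\mathbf{r}_2$ and $\mathbf{r}$, respectively. Regard $\mathbf{r}_1$, $\mathbf{r}_2$ and $\mathbf{r}$ as space vectors that start at the origin of the coordinate system and end at the points $\mathbf{r}_1$, $\mathbf{r}_2$ and $\mathbf{r}$. Select $i, k \in \{1, 2\}$ with $i \neq k$; for ease of presentation, let $i = 1$ and $k = 2$. Then, with the weights of Equation (16), we get

$$(\mathbf{r}_2 - \mathbf{r}_1)^H \mathbf{r} = 0.$$

Thus, we can conclude that $\mathbf{r}$ is perpendicular to the hyperplane determined by the points $\mathbf{r}_1$ and $\mathbf{r}_2$, which means

$$\|\mathbf{r}\| \leq \min\left(\|\mathbf{r}_1\|, \|\mathbf{r}_2\|\right).$$

As a consequence, the whole-correction step never increases the residual norm, and the proposed algorithm exhibits good convergence performance.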
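The geometric argument for $m = 2$ can be checked numerically with a short sketch. It computes $a_1 = (\mathbf{r}_2 - \mathbf{r}_1)^H\mathbf{r}_2 / \|\mathbf{r}_2 - \mathbf{r}_1\|^2$ and $a_2 = 1 - a_1$, then confirms that the combined residual is orthogonal to $\mathbf{r}_2 - \mathbf{r}_1$ and no longer than either input residual. The function name `whole_correction_m2` and the two trial solutions are assumptions chosen for the example.

```python
# Pure-Python check of the two-term whole correction (hypothetical names).

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def inner(u, v):
    """Inner product u^H v for complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def norm(v):
    return inner(v, v).real ** 0.5

def whole_correction_m2(x1, r1, x2, r2):
    """Combine two approximate solutions so that the residual norm is minimized."""
    d = [v - u for u, v in zip(r1, r2)]          # d = r2 - r1
    dd = inner(d, d).real
    a1 = inner(d, r2) / dd if dd > 0 else 0.0    # equal residuals: keep x2
    a2 = 1 - a1
    x = [a1 * u + a2 * v for u, v in zip(x1, x2)]
    r = [a1 * u + a2 * v for u, v in zip(r1, r2)]  # valid because a1 + a2 = 1
    return x, r, a1, a2

# Assumed toy system and two rough approximate solutions.
A = [[4, 1j, 0], [-1j, 4, 1], [0, 1, 4]]
b = [1 + 1j, 2, -1j]
x1 = [0j, 0j, 0j]
r1 = list(b)                                     # residual of the zero vector
x2 = [b[i] / A[i][i] for i in range(3)]          # D^-1 b
ax2 = matvec(A, x2)
r2 = [b[i] - ax2[i] for i in range(3)]
x, r, a1, a2 = whole_correction_m2(x1, r1, x2, r2)
d = [v - u for u, v in zip(r1, r2)]
```

Note that the weights may be complex and need not lie in $[0, 1]$; only their sum is constrained to one.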

Computational Complexity Analysis
The computational complexity is an essential aspect in evaluating detection algorithms for massive MIMO systems. In this part, we compute the complexity of each step of the proposed algorithm and compare the proposed algorithm with the conventional algorithms. Since all the algorithms considered need to calculate the matrix $\mathbf{A} = \mathbf{H}^H\mathbf{H} + \sigma^2\mathbf{I}_{N_t}$ and the matched-filter output $\mathbf{b}$, we concentrate only on the complexity of the remaining parts. In addition, the complexity is defined as the required number of complex multiplications.
(1) Compute $\hat{\mathbf{x}}^{(0)}$: As the computation of $\hat{\mathbf{x}}^{(0)}$ involves a multiplication of an $N_t \times N_t$ diagonal matrix with an $N_t \times N_t$ off-diagonal matrix and a multiplication of an $N_t \times N_t$ off-diagonal matrix with an $N_t \times 1$ vector, the complexity of the initial solution is $2N_t^2$.
(2) Compute $\mathbf{r}^{(k)}$: The computation of $\mathbf{r}^{(k)}$ requires two multiplications of a scalar with an $N_t \times 1$ vector, yielding a total complexity of $2N_t$.
(3) Compute $\hat{\mathbf{x}}_2$: In Algorithm 2, the computation of $\hat{\mathbf{x}}_2$ includes a multiplication of the $N_t \times N_t$ matrix $\mathbf{A}$ with the $N_t \times 1$ vector $\mathbf{r}$ (giving $\mathbf{p}$), two inner products of $N_t \times 1$ vectors (giving $\alpha_{k-1}$), two multiplications of a scalar with an $N_t \times 1$ vector, and a multiplication of an $N_t \times N_t$ diagonal matrix with an $N_t \times 1$ vector. In conclusion, the complexity of this step is $N_t^2 + 5N_t$.
(4) Compute $\hat{\mathbf{x}}$: Having obtained $\hat{\mathbf{x}}_1$ and $\hat{\mathbf{x}}_2$, the computation of $\hat{\mathbf{x}}$ includes a multiplication of the $N_t \times N_t$ matrix $\mathbf{A}$ with the $N_t \times 1$ vector $\hat{\mathbf{x}}_2$ (giving $\mathbf{r}_2$), one multiplication of two $N_t \times 1$ vectors (giving $a_1$) and two multiplications of a scalar with an $N_t \times 1$ vector. In this step, the total complexity is $N_t^2 + 5N_t$.
In summary, summing the steps above, the proposed algorithm requires on the order of $(2K + 2)N_t^2$ complex multiplications for $K$ iterations: $2N_t^2$ for the initial solution and $2N_t^2$ per iteration from steps (3) and (4), plus terms linear in $N_t$. From Table 2 and Figure 3, since the number of iterations is usually relatively small, the proposed algorithm has approximately the same low computational complexity as the conventional GS and SSOR algorithms. Although the DJ algorithm requires the fewest calculations among these algorithms, it requires many more iterations to achieve the same accuracy as the proposed algorithm. Additionally, for any number of iterations, the proposed algorithm reduces the computational complexity of MMSE detection from $O(N_t^3)$ to $O(N_t^2)$. This advantage becomes especially pronounced when the number of users is large.

As a consequence, the proposed Jacobi-based algorithm outperforms the conventional algorithms in massive MIMO systems. Figure 6 reveals that the advantages of the proposed Jacobi-based algorithm become more pronounced as the number of users increases. Note that the number of receiving antennas at the BS is set to 128 and SNR = 10 dB is applied to the channel. As seen in Figure 6, when the number of single-antenna users increases, the performance of all the algorithms suffers a non-negligible degradation. The proposed algorithm can still achieve almost the same BER performance as the MMSE algorithm with a small number of iterations (i.e., $K = 3$), regardless of the number of users. In addition, the GS and SSOR algorithms require more iterations (i.e., $K = 4$) to approach the BER performance of the MMSE algorithm.

Finally, for realistic MIMO systems, spatial correlation plays an important role in the BER performance. Thus, we consider the influence of channel correlation on the BER. The exponential correlation model is formulated in [27], and $r$ represents the correlation coefficient. Figure 7 shows that the convergence performance of all the algorithms degrades under strong channel correlation. Furthermore, it is worth noting that the proposed algorithm can still converge to the MMSE algorithm with a small number of iterations (i.e., $K = 3$ when $r = 0.1$ and $K = 4$ when $r = 0.4$). When $r = 0.1$, to ensure the performance of the final approximation, the required complexity of the proposed Jacobi-based algorithm, the GS algorithm and the SSOR algorithm is $O(8N_t^2)$ (i.e., $K = 3$), $O(9N_t^2)$ (i.e., $K = 4$) and $O(10N_t^2)$ (i.e., $K = 4$), respectively. Similarly, when $r = 0.4$, the required complexity of the proposed Jacobi-based algorithm, the GS algorithm and the SSOR algorithm is $O(10N_t^2)$ (i.e., $K = 4$), $O(11N_t^2)$ (i.e., $K = 6$) and $O(14N_t^2)$ (i.e., $K = 6$), respectively. Consequently, the proposed algorithm still enjoys lower complexity than the GS and SSOR algorithms in approaching the performance of the MMSE algorithm under strong channel correlation.
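The leading-order counts quoted above for the proposed algorithm, $O(8N_t^2)$ at $K = 3$ and $O(10N_t^2)$ at $K = 4$, are consistent with $2N_t^2$ multiplications for the initial solution plus $2N_t^2$ per iteration. The sketch below tallies only this quadratic term and compares it with the cubic order of direct inversion. The function names are hypothetical, and the terms linear in $N_t$ are deliberately omitted, so this is an order-of-magnitude aid rather than an exact count.

```python
# Leading-order multiplication counts (hypothetical helper names).

def proposed_leading_multiplications(n_t, k):
    """Quadratic term only: 2*N_t^2 for initialization plus 2*N_t^2 per iteration."""
    return (2 * k + 2) * n_t ** 2

def direct_inversion_order(n_t):
    """Cubic order of direct matrix inversion, constant factor ignored."""
    return n_t ** 3

n_t = 32  # number of users in the 128 x 32 configuration
counts = {k: proposed_leading_multiplications(n_t, k) for k in (3, 4)}
ratio = direct_inversion_order(n_t) / counts[3]  # equals n_t / 8 when K = 3
```

For $N_t = 32$ and $K = 3$ the ratio is 4, and it grows linearly with the number of users, which is the sense in which the advantage is larger for many users.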

Conclusions
In this paper, we propose an improved Jacobi iterative algorithm for signal detection in massive MIMO systems. The performance improvement is achieved through the fast convergence of the iteration process, which is obtained by integrating the SD method and the whole-correction method into the traditional Jacobi iterative algorithm. We evaluate the performance of the proposed algorithm by theoretical analysis and Monte Carlo simulations. Based on the theoretical analysis, the proposed algorithm has been shown to have good convergence and low computational complexity. Simulation results show that the proposed Jacobi-based algorithm performs better than the conventional algorithms. In particular, with a small number of iterations, the improved Jacobi-based algorithm performs almost the same as the MMSE algorithm in terms of BER, while achieving a much lower computational complexity. Future work includes joint channel estimation and signal detection based on the proposed algorithm in massive MIMO systems and the related hardware design for practical systems.

Figure 2. Bit error rate (BER) curves of the proposed signal detection algorithm with different values of $m$, where $K = 3$.

Figure 6. BER curves with different numbers of users.

Figure 7. BER curves with different values of correlated magnitude.

Table 1. Performance comparison for different $m$. Columns: $m$ value; signal-to-noise ratio (target BER of $10^{-5}$); computational complexity of the Moore-Penrose inverse.

Bit error rate (BER) curves of the damped Jacobi (DJ) algorithm and the proposed Jacobi-based algorithm.