Low-Complexity Soft-Output Signal Detection Based on Improved Kaczmarz Iteration Algorithm for Uplink Massive MIMO System

For multi-user uplink massive multiple-input multiple-output (MIMO) systems, the linear signal detection algorithm based on the minimum mean square error (MMSE) criterion achieves nearly optimal performance, provided that the number of antennas at the base station is asymptotically large. However, it requires a matrix inversion whose complexity becomes prohibitively high as the number of users grows. This paper proposes a low-complexity soft-output signal detection algorithm based on an improved Kaczmarz method, which circumvents the matrix inversion and thus reduces the complexity by an order of magnitude. Meanwhile, an optimal relaxation parameter is introduced to further accelerate the convergence of the proposed algorithm, and two methods of calculating the log-likelihood ratios (LLRs) for channel decoding, an exact one and an approximate one, are obtained as well. Analysis and simulations verify that the proposed algorithm outperforms various typical low-complexity signal detection algorithms. The proposed algorithm converges rapidly and achieves performance quite close to that of the MMSE algorithm with only a small number of iterations.


Introduction
Massive multiple-input multiple-output (MIMO) technology dramatically expands the capacity of wireless communication systems without increasing the system bandwidth or transmit power, and effectively eases the conflict between limited spectrum resources and the rapid growth in capacity demand. It has therefore become one of the most promising technologies for 5G systems [1][2][3][4]. Equipped with up to hundreds of antennas at the base station (BS), a massive MIMO system simultaneously serves multiple single-antenna users, resulting in an order-of-magnitude improvement in the spectral efficiency and energy efficiency of wireless systems [5,6].
Apart from its salient technical merits, a massive MIMO system faces various challenging problems in practice, one of which is multi-user signal detection in the uplink. Compared with conventional MIMO systems, the large number of antennas markedly intensifies multi-user interference in the uplink and considerably increases the implementation complexity. Theoretically, the maximum likelihood (ML) algorithm is the optimal solution for signal detection in MIMO systems, but it is prohibitively burdensome to implement in practical applications. Since the computational complexity of the ML algorithm grows exponentially with the number of antennas and the modulation order of the baseband signal [6,7], it is infeasible for massive MIMO systems. When reduced or fixed complexity is the primary design objective, the tabu search (TS) algorithm [8] and the fixed-complexity sphere decoding (FSD) algorithm [9] have been proposed to achieve near-optimal ML detection performance, but their complexity remains unaffordable for large-scale MIMO configurations with high modulation orders.
Benefiting from the large number of antennas in massive MIMO systems, linear signal detection algorithms such as zero forcing (ZF) and minimum mean square error (MMSE) have been shown to achieve nearly optimal detection performance, at the cost of a high-dimensional matrix inversion with complexity $O(K^3)$ [6], where K is the number of users simultaneously transmitting in the uplink. In recent years, various low-complexity signal detection algorithms based on the MMSE criterion have been proposed for massive MIMO systems. In our previous work, we investigated a variety of such MMSE criterion-based algorithms and presented a comparative study [10]; the key to reducing their complexity is to find a solution that avoids the high-dimensional matrix inversion.
To the best of the authors' knowledge, MMSE criterion-based low-complexity signal detection algorithms can be broadly divided into three types: approximate matrix inversion algorithms (AMIA), iterative approaches for solving linear equations (IASLE), and matrix gradient search methods (MGSM). First, AMIA algorithms approximate the matrix inversion required by MMSE detection, for example by Neumann series expansion or Newton iteration [11][12][13]. The approximation accuracy depends on the number of Neumann terms or Newton iterations, and the complexity becomes high when many terms or iterations are needed for satisfactory performance. Second, by an entirely different mechanism, IASLE methods avoid the matrix inversion by solving the system of linear equations [14][15][16][17], so that the transmitted multi-user signal vector is estimated directly and the high-dimensional matrix inversion is circumvented. Third, following the same idea as IASLE, MGSM methods obtain the solution of the equations by matrix gradient search, which also bypasses the explicit matrix inversion and saves considerable computation [18,19]. In terms of system performance, AMIA algorithms are usually inferior to IASLE and MGSM algorithms, and when the number of terms or iterations is large, their complexity approaches $O(K^3)$ again. IASLE and MGSM algorithms, in turn, may suffer serious performance degradation or even fail to operate properly when certain properties of the weighting matrix are not guaranteed, for instance, when it is not symmetric positive definite and strictly diagonally dominant. New types of algorithms are needed to overcome these drawbacks.
The typical MMSE criterion-based multi-user signal detection algorithms discussed above are compared in Table 1.

Table 1. Minimum mean square error (MMSE) criterion-based low-complexity signal detection algorithms.

Category | Algorithm | Comparisons
AMIA | Neumann series approximation [12]; Newton iteration [13] | • Relatively poor performance; • High complexity for a large number of Neumann terms or Newton iterations.
IASLE | Richardson [14]; Gauss-Seidel [15]; Jacobi [16]; Successive over-relaxation [17] | • Direct estimation of the transmitted signal vector via equation solving; • Symmetric positive definite filtering matrix required; • Diagonally dominant filtering matrix required; • Failure or deteriorated performance when the above conditions are not guaranteed.
MGSM | Conjugate gradient [18]; Steepest descent [19] | • Relatively high complexity in gradient update; • Same requirements on the filtering matrix as the iterative methods.

In this paper, to obtain an easy-to-implement multi-user signal detection scheme for the uplink massive MIMO system, we propose a soft-output detection algorithm based on the Kaczmarz iteration [20][21][22]. In our previous work, the Kaczmarz algorithm was used as a matrix-inverse approximation method to implement MMSE criterion-based signal detection with reduced complexity [22]. By circumventing the high-dimensional matrix inversion, we realized a simplified detection scheme that acquires the transmitted signal vector by solving linear equations. To further improve the system performance and accelerate convergence, an optimal relaxation parameter is introduced into the proposed Kaczmarz algorithm. More specifically, the proposed improved Kaczmarz algorithm belongs to the IASLE category of low-complexity MMSE criterion-based detection algorithms and combines the iterative approach for solving linear equations with the conventional Kaczmarz algorithm. Based on the output of the improved Kaczmarz algorithm, the theoretical log-likelihood ratios (LLRs) of the user bit streams are derived, and an approximate method of estimating the LLRs for channel decoding is presented as well. Simulation results verify that the proposed algorithm outperforms the typical algorithms listed in Table 1 in terms of bit error rate (BER), with significantly lower computational complexity. Compared with the Kaczmarz algorithm, the improved Kaczmarz algorithm yields much better BER performance for the same number of iterations.
Additionally, the proposed improved Kaczmarz algorithm converges rapidly in operation and achieves its performance quite close to that of the MMSE algorithm with only a small number of iterations.
The rest of this paper is organized as follows. In Section 2, we describe the system model. In Section 3, the Kaczmarz iteration based signal detection is introduced. In Section 4, we propose the improved Kaczmarz algorithm based soft output signal detection. In Section 5, simulation and analysis are presented. Finally, Section 6 concludes the paper.
Notation: Lower-case and upper-case boldface letters denote column vectors and matrices, respectively. The superscripts $(\cdot)^T$, $(\cdot)^H$, and $(\cdot)^{-1}$ stand for the transpose, conjugate transpose, and inverse of a matrix, respectively. The operators $\|\cdot\|$, $E[\cdot]$, and $\langle\cdot,\cdot\rangle$ denote the vector/matrix norm, the statistical expectation of a given argument, and the inner product of two vectors, respectively. $\mathbf{I}_K$ is the $K \times K$ identity matrix.

System Model
An uplink massive MIMO system is considered, where the BS is equipped with N antennas and K single-antenna users are located within its coverage area ($N \gg K$). The bit stream of each user is first encoded by a channel encoder and then mapped to constellation points in the set $\mathcal{C}$. The constellation symbol vector $\mathbf{s} = [s_1, s_2, \cdots, s_K]^T \in \mathcal{C}^K$ contains the transmit signals of the K users, and it is assumed that $E\{|s_k|^2\} = E_s$, $k \in \{1, 2, \cdots, K\}$, where $E_s$ is the average power of the user signals. The received signal at the BS can then be expressed as

$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n}, \tag{1}$$

where $\mathbf{y} = [y_1, y_2, \cdots, y_N]^T$ is the received signal vector at the BS, $\mathbf{n}$ is the white Gaussian noise vector whose entries have zero mean and variance $\sigma_0^2$, and $\mathbf{H} = [\mathbf{h}_1, \mathbf{h}_2, \cdots, \mathbf{h}_K]$ is the $N \times K$ channel matrix whose entry $[\mathbf{H}]_{nk}$ is the channel coefficient between the k-th user and the n-th BS antenna.
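To make the system model concrete, the following NumPy sketch draws one realization of the received signal. The i.i.d. CN(0, 1) Rayleigh channel, the uniformly drawn unit-power 16-QAM symbols ($E_s = 1$), and the helper names `qam16_constellation` and `uplink_model` are illustrative assumptions, not the authors' simulator.

```python
import numpy as np


def qam16_constellation():
    """16-QAM points scaled to unit average power (E_s = 1)."""
    levels = np.array([-3.0, -1.0, 1.0, 3.0])
    points = (levels[:, None] + 1j * levels[None, :]).ravel()
    return points / np.sqrt(10.0)


def uplink_model(N, K, sigma2, rng):
    """Draw one realization of y = H s + n (hypothetical helper).

    Assumptions: H has i.i.d. CN(0, 1) Rayleigh entries, s is drawn
    uniformly from 16-QAM, and n has i.i.d. CN(0, sigma2) entries.
    """
    H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2.0)
    s = rng.choice(qam16_constellation(), size=K)
    n = np.sqrt(sigma2 / 2.0) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return H, s, H @ s + n


rng = np.random.default_rng(0)
H, s, y = uplink_model(N=64, K=16, sigma2=0.1, rng=rng)
print(H.shape, s.shape, y.shape)  # (64, 16) (16,) (64,)
```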

MMSE Detection
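If linear detection under the MMSE criterion is used, the estimate of the transmitted signal at the BS, $\hat{\mathbf{s}} = [\hat{s}_1, \hat{s}_2, \cdots, \hat{s}_K]^T$, is

$$\hat{\mathbf{s}} = \mathbf{W}^{-1}\hat{\mathbf{y}}, \tag{2}$$

where $\hat{\mathbf{y}} = \mathbf{H}^H\mathbf{y}$ is the matched-filter output, $\mathbf{W} = \mathbf{G} + \frac{\sigma_0^2}{E_s}\mathbf{I}_K$ is the MMSE filtering matrix, and $\mathbf{G} = \mathbf{H}^H\mathbf{H}$ is the Gram matrix.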
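A minimal sketch of the MMSE detector follows, assuming $E_s = 1$ by default and QPSK test symbols for illustration; `mmse_detect` is a hypothetical helper. It is the $O(K^3)$ reference that the low-complexity detectors try to approach.

```python
import numpy as np


def mmse_detect(H, y, sigma2, Es=1.0):
    """MMSE estimate s_hat = W^{-1} y_hat.

    W = H^H H + (sigma2 / Es) I_K and y_hat = H^H y. The linear system is
    solved with np.linalg.solve rather than by forming W^{-1}, but the cost
    is still O(K^3), which is what the low-complexity detectors avoid.
    """
    K = H.shape[1]
    G = H.conj().T @ H                    # Gram matrix G = H^H H
    W = G + (sigma2 / Es) * np.eye(K)     # MMSE filtering matrix
    y_hat = H.conj().T @ y                # matched-filter output
    return np.linalg.solve(W, y_hat), W, y_hat


rng = np.random.default_rng(0)
N, K = 64, 16
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2.0)
s = (rng.choice([-1.0, 1.0], K) + 1j * rng.choice([-1.0, 1.0], K)) / np.sqrt(2.0)
s_hat, W, y_hat = mmse_detect(H, H @ s, sigma2=1e-6)   # noiseless y, tiny regularization
print(np.max(np.abs(s_hat - s)))  # very small: s_hat is close to s here
```

With noise-free input and negligible regularization, the MMSE estimate essentially reproduces the transmitted symbols; with realistic noise it trades residual interference against noise enhancement.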

LLRs Generation
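Based on the MMSE weighting, the estimated signal can be written as

$$\hat{\mathbf{s}} = \mathbf{W}^{-1}\mathbf{G}\mathbf{s} + \mathbf{W}^{-1}\mathbf{H}^H\mathbf{n} = \mathbf{U}\mathbf{s} + \mathbf{W}^{-1}\mathbf{H}^H\mathbf{n},$$

where $\mathbf{U} = \mathbf{W}^{-1}\mathbf{G}$ represents the channel matrix after equalization. The estimate for the k-th user can be expressed as

$$\hat{s}_k = \mu_k s_k + p_k,$$

where $\mu_k = [\mathbf{U}]_{kk} = U_{kk}$ is the equivalent channel gain after equalization and $p_k = \sum_{j \neq k} U_{kj}s_j + [\mathbf{W}^{-1}\mathbf{H}^H\mathbf{n}]_k$ denotes the noise-plus-interference (NPI) of the k-th user. The variance $v_k^2$ of $p_k$ is computed from $\mathbf{U}$ and from $E_{kk}$, the k-th diagonal element of a matrix $\mathbf{E}$ that, like $\mathbf{U}$, is obtained from $\mathbf{W}^{-1}$. The LLR of the b-th bit of the k-th user's symbol is then obtained as

$$L_{k,b}(\hat{s}_k) = \gamma_k\left(\min_{s \in \mathcal{C}_b^0}\left|\frac{\hat{s}_k}{\mu_k} - s\right|^2 - \min_{s \in \mathcal{C}_b^1}\left|\frac{\hat{s}_k}{\mu_k} - s\right|^2\right), \tag{6}$$

where the coefficient $\gamma_k = \mu_k^2/v_k^2$ is the signal-to-interference-plus-noise ratio (SINR) of the k-th user, and $\mathcal{C}_b^0$ and $\mathcal{C}_b^1$ are the subsets of $\mathcal{C}$ whose symbols have the b-th bit equal to 0 and 1, respectively.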
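The following sketch evaluates per-bit LLRs for one user, assuming the SINR-weighted max-log form in which $\gamma_k = \mu_k^2/v_k^2$ multiplies the difference of the minimum squared distances over $\mathcal{C}_b^0$ and $\mathcal{C}_b^1$. The helper `maxlog_llr`, the QPSK constellation, and its bit labelling are illustrative assumptions. $\mu_k$ and $v_k^2$ are taken as inputs, since computing them cheaply is exactly what the exact and approximate methods below address.

```python
import numpy as np


def maxlog_llr(s_hat_k, mu_k, v2_k, points, labels):
    """Per-bit LLRs of one user in the SINR-weighted max-log form (assumed).

    points : (M,) complex constellation C
    labels : (M, B) array with the 0/1 bit label of each point
    Returns gamma * (min_{C_b^0} |z - s|^2 - min_{C_b^1} |z - s|^2) per bit,
    with z = s_hat_k / mu_k and gamma = mu_k^2 / v2_k. With this sign,
    positive values favour bit 1.
    """
    z = s_hat_k / mu_k
    gamma = mu_k ** 2 / v2_k
    d2 = np.abs(z - points) ** 2            # squared distance to every point
    llr = np.empty(labels.shape[1])
    for b in range(labels.shape[1]):
        d0 = d2[labels[:, b] == 0].min()    # closest point whose bit b is 0
        d1 = d2[labels[:, b] == 1].min()    # closest point whose bit b is 1
        llr[b] = gamma * (d0 - d1)
    return llr


# QPSK example: bit 0 <-> sign of the real part, bit 1 <-> sign of the imaginary part
points = np.array([-1 - 1j, -1 + 1j, 1 - 1j, 1 + 1j]) / np.sqrt(2)
labels = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(maxlog_llr((1 + 1j) / np.sqrt(2), mu_k=1.0, v2_k=0.5, points=points, labels=labels))  # [4. 4.]
```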
From the above analysis, it is easy to see that the calculation of $\mathbf{F}$, $\mathbf{U}$, and $\mathbf{E}$ requires $\mathbf{W}^{-1}$ first, which leads to a high complexity of $O(K^3)$. To circumvent this costly matrix inversion, a low-complexity MMSE soft-output detection scheme is proposed in this paper.

Kaczmarz Iteration Based Signal Detection
Since IASLE methods usually outperform AMIA algorithms, we consider another typical IASLE algorithm, the Kaczmarz algorithm, and treat signal detection as the task of solving a system of linear equations. The Kaczmarz algorithm is widely used in various fields; in computed tomography, for example, it is known as the algebraic reconstruction technique (ART) [23]. It is an iterative method for solving a large-scale over-determined linear system $\mathbf{A}\mathbf{x} = \mathbf{b}$, where $\mathbf{A}$ is an $N \times K$ matrix, $\mathbf{x}$ is the $K \times 1$ vector to be determined, and $\mathbf{b}$ is the $N \times 1$ measurement vector. In the Kaczmarz iteration, the rows $\mathbf{a}_k$ of $\mathbf{A}$ are traversed periodically. In each inner step of the t-th outer iteration, the solution of the previous inner iteration, $\hat{\mathbf{x}}_{t,k-1}$, is orthogonally projected onto the hyperplane $\mathbf{a}_k\mathbf{x} = b_k$ associated with the row vector $\mathbf{a}_k$. Given an initial solution $\hat{\mathbf{x}}_0$ of $\mathbf{A}\mathbf{x} = \mathbf{b}$, the t-th iteration of the Kaczmarz algorithm can be expressed as

$$\hat{\mathbf{x}}_{t,k} = \hat{\mathbf{x}}_{t,k-1} + \frac{b_k - \mathbf{a}_k\hat{\mathbf{x}}_{t,k-1}}{\|\mathbf{a}_k\|^2}\mathbf{a}_k^H, \quad k = 1, 2, \cdots, N, \quad t = 1, 2, \cdots, T_{Iter}, \tag{7}$$

where t and k index the outer and inner iterations, respectively, $\hat{\mathbf{x}}_{t,0} = \hat{\mathbf{x}}_{t-1,N}$ (with $\hat{\mathbf{x}}_{1,0} = \hat{\mathbf{x}}_0$), $T_{Iter}$ is the predetermined maximum number of iterations, and $\|\mathbf{a}_k\|$ is the norm of $\mathbf{a}_k$. Applying the Kaczmarz algorithm to detect the transmitted signal $\hat{\mathbf{s}}$ from the linear equation $\mathbf{W}\hat{\mathbf{s}} = \hat{\mathbf{y}}$ in massive MIMO systems [22], we obtain the estimate

$$\hat{\mathbf{s}}_{t,k} = \hat{\mathbf{s}}_{t,k-1} + \frac{\hat{y}_k - \mathbf{w}_k\hat{\mathbf{s}}_{t,k-1}}{\|\mathbf{w}_k\|^2}\mathbf{w}_k^H, \quad k = 1, 2, \cdots, K, \tag{8}$$

where $\mathbf{w}_k$ is the k-th row vector of $\mathbf{W}$ and the initial solution $\hat{\mathbf{s}}_0$ is usually set to the zero vector. Details of the Kaczmarz algorithm-based signal detection are given in Algorithm 1.
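As an illustration of the cyclic Kaczmarz iteration described above, the sketch below projects the iterate onto one row hyperplane at a time, visiting the rows in natural order within each outer iteration. The function name `kaczmarz` and the toy system are assumptions chosen for illustration.

```python
import numpy as np


def kaczmarz(A, b, x0, n_sweeps):
    """Cyclic Kaczmarz iteration for A x = b (hypothetical helper).

    Each outer iteration (sweep) visits the rows a_k of A in natural order
    and projects the iterate onto the hyperplane {x : a_k x = b_k}.
    """
    x = np.array(x0, dtype=np.result_type(A, b, float))
    row_norm2 = np.sum(np.abs(A) ** 2, axis=1)        # ||a_k||^2
    for _ in range(n_sweeps):                          # outer index t
        for k in range(A.shape[0]):                    # inner index k
            residual = b[k] - A[k] @ x
            x = x + (residual / row_norm2[k]) * A[k].conj()
    return x


A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x_true = np.array([1.0, 2.0])
print(kaczmarz(A, A @ x_true, x0=np.zeros(2), n_sweeps=10))  # [1. 2.]
```

Only one row of the system is touched per inner step, so each step costs O(K) operations and no matrix is ever inverted.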

Improved Kaczmarz Algorithm Based Soft Output Signal Detection
Since the Kaczmarz algorithm is a suboptimal solution compared with MMSE signal detection, its performance can be improved. We propose an improved Kaczmarz algorithm that uses a traversal scheme based on norm ordering, together with a relaxation parameter, to enhance the signal detection performance. Specifically, the improvement consists of the following three steps.
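Step-1: Norm ordering. The squared row norms $\|\mathbf{w}_k\|^2$ are sorted in descending order, giving the subscript set $\psi = \{\kappa_1, \kappa_2, \cdots, \kappa_K\}$, a permutation of the index set $\{1, 2, \cdots, K\}$ with $\psi(k) = \kappa_k$; the entries of $\psi$ are then selected sequentially for traversal. Equation (8) can thus be expressed as

$$\hat{\mathbf{s}}_{t,k} = \hat{\mathbf{s}}_{t,k-1} + \frac{\hat{y}_{\psi(k)} - \mathbf{w}_{\psi(k)}\hat{\mathbf{s}}_{t,k-1}}{\|\mathbf{w}_{\psi(k)}\|^2}\mathbf{w}_{\psi(k)}^H. \tag{9}$$

Step-2: Introducing the relaxation parameter. A relaxation parameter is introduced into the iterations to accelerate convergence, and Equation (9) is modified as

$$\hat{\mathbf{s}}_{t,k} = \hat{\mathbf{s}}_{t,k-1} + \omega_{t,k}\frac{\hat{y}_{\psi(k)} - \mathbf{w}_{\psi(k)}\hat{\mathbf{s}}_{t,k-1}}{\|\mathbf{w}_{\psi(k)}\|^2}\mathbf{w}_{\psi(k)}^H, \tag{10}$$

where $\omega_{t,k}$ denotes the relaxation parameter of the (t, k)-th iteration.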
Step-3: Finding the optimal relaxation parameter. The convergence speed of the Kaczmarz algorithm depends mainly on the condition number of $\mathbf{W}$, and the algorithm converges when $\omega_{t,k}$ is fixed to a constant $\omega \in (0, 2)$ for all t and k [24]. Equation (10) then simplifies to

$$\hat{\mathbf{s}}_{t,k} = \hat{\mathbf{s}}_{t,k-1} + \omega\frac{\hat{y}_{\psi(k)} - \mathbf{w}_{\psi(k)}\hat{\mathbf{s}}_{t,k-1}}{\|\mathbf{w}_{\psi(k)}\|^2}\mathbf{w}_{\psi(k)}^H. \tag{11}$$

For simplicity, ω is usually set to 1, which recovers Equation (9). However, experiments show heuristically that the optimal relaxation parameter can be set as $\omega_{opt} = 1 + \lambda_{min}/\lambda_{max}$, where $\lambda_{min}$ and $\lambda_{max}$ are the minimum and maximum eigenvalues of $\mathbf{W}$, respectively.
In the uplink of a massive MIMO system, as the numbers of BS antennas and users grow, the eigenvalues of $\mathbf{W}$ follow the Marchenko-Pastur distribution, and $\lambda_{min}$ and $\lambda_{max}$ asymptotically converge to [2]

$$\lambda_{min} \to N\left(1 - \frac{1}{\sqrt{\alpha}}\right)^2, \quad \lambda_{max} \to N\left(1 + \frac{1}{\sqrt{\alpha}}\right)^2, \tag{12}$$

where $\alpha = N/K$. The optimal relaxation parameter, as a function of α, is therefore

$$\omega_{opt} = 1 + \left(\frac{\sqrt{\alpha} - 1}{\sqrt{\alpha} + 1}\right)^2, \tag{13}$$

from which it is easy to see that $\omega_{opt} \in (0, 2)$. Based on this analysis, we further demonstrate via simulations that $\omega_{opt}$ lies in a very narrow range and can be set as a constant determined only by the system configuration. A detailed description of the improved Kaczmarz algorithm-based soft-output signal detection is presented in Algorithm 2.
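The sketch below evaluates $\omega_{opt} = 1 + \lambda_{min}/\lambda_{max}$ as a function of $\alpha = N/K$, assuming the standard Marchenko-Pastur edges $\lambda_{min} \approx N(1 - 1/\sqrt{\alpha})^2$ and $\lambda_{max} \approx N(1 + 1/\sqrt{\alpha})^2$ for unit-variance i.i.d. channel entries and neglecting the $(\sigma_0^2/E_s)\mathbf{I}_K$ term of $\mathbf{W}$. `omega_opt` is a hypothetical helper name.

```python
import numpy as np


def omega_opt(alpha):
    """omega_opt = 1 + lambda_min / lambda_max for alpha = N / K > 1 (assumed model).

    Uses the Marchenko-Pastur edges lambda_min ~ N (1 - 1/sqrt(alpha))^2 and
    lambda_max ~ N (1 + 1/sqrt(alpha))^2; the factor N cancels in the ratio,
    and the (sigma^2 / E_s) I_K term of W is neglected.
    """
    r = (np.sqrt(alpha) - 1.0) / (np.sqrt(alpha) + 1.0)
    return 1.0 + r ** 2


for N, K in [(64, 16), (128, 16)]:
    print((N, K), round(float(omega_opt(N / K)), 4))
# (64, 16) 1.1111
# (128, 16) 1.2281
```

For the (128, 16) configuration, this asymptotic value falls inside the 1.2 to 1.35 range observed in Figure 6, which is consistent with treating $\omega_{opt}$ as a configuration-dependent constant.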

Initial Estimation
In massive MIMO systems, the columns of $\mathbf{H}$ are asymptotically orthogonal, so $\mathbf{G} = \mathbf{H}^H\mathbf{H}$ is positive definite and diagonally dominant. Owing to the channel hardening phenomenon [25], $\mathbf{G} \approx N\mathbf{I}_K$. This property allows $\mathbf{D}^{-1}$ to approximate $\mathbf{W}^{-1}$ with small error, where $\mathbf{D}$ is the diagonal matrix formed from the diagonal elements of $\mathbf{W}$, so the initial solution of Equation (11) can be set as

$$\hat{\mathbf{s}}_0 = \mathbf{D}^{-1}\hat{\mathbf{y}}. \tag{14}$$
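Putting Step-1 to Step-3 together with the diagonal initial solution $\mathbf{D}^{-1}\hat{\mathbf{y}}$, a minimal NumPy sketch of the improved Kaczmarz detector could look as follows. The QPSK test symbols, $E_s = 1$, the fixed random seed, and the helper name `improved_kaczmarz` are assumptions; the exact MMSE solution is computed only to measure the error.

```python
import numpy as np


def improved_kaczmarz(W, y_hat, n_iter, omega):
    """Norm-ordered, relaxed Kaczmarz solve of W s = y_hat (sketch).

    Starts from D^{-1} y_hat, visits rows in descending order of ||w_k||^2,
    and uses a constant relaxation omega in every projection.
    Returns the iterate after each outer iteration (entry 0 is the start).
    """
    row_norm2 = np.sum(np.abs(W) ** 2, axis=1)
    psi = np.argsort(-row_norm2)           # traversal order psi(1), ..., psi(K)
    x = y_hat / np.diag(W)                 # initial solution D^{-1} y_hat
    history = [x.copy()]
    for _ in range(n_iter):
        for k in psi:
            residual = y_hat[k] - W[k] @ x
            x = x + omega * (residual / row_norm2[k]) * W[k].conj()
        history.append(x.copy())
    return history


rng = np.random.default_rng(0)
N, K, sigma2 = 128, 16, 0.1
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2.0)
s = (rng.choice([-1.0, 1.0], K) + 1j * rng.choice([-1.0, 1.0], K)) / np.sqrt(2.0)
y = H @ s + np.sqrt(sigma2 / 2.0) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
W = H.conj().T @ H + sigma2 * np.eye(K)        # E_s = 1
y_hat = H.conj().T @ y
s_mmse = np.linalg.solve(W, y_hat)              # O(K^3) reference only
alpha = N / K
omega = 1.0 + ((np.sqrt(alpha) - 1.0) / (np.sqrt(alpha) + 1.0)) ** 2
errors = [np.linalg.norm(x - s_mmse) / np.linalg.norm(s_mmse)
          for x in improved_kaczmarz(W, y_hat, n_iter=4, omega=omega)]
print(np.round(errors, 4))  # relative distance to the MMSE solution, t = 0..4
```

Because each relaxed projection with $\omega \in (0, 2)$ cannot increase the distance to the exact solution of a consistent square system, the printed errors are non-increasing; each outer iteration costs $O(K^2)$ operations.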

Exact Method
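As the iterations proceed, the solution vector $\hat{\mathbf{s}}_{t,k}$ gradually converges to the solution of $\mathbf{W}\hat{\mathbf{s}} = \hat{\mathbf{y}}$, so each iteration has a corresponding matrix $\hat{\mathbf{W}}^{-1}_{t,k}$ satisfying $\hat{\mathbf{s}}_{t,k} = \hat{\mathbf{W}}^{-1}_{t,k}\hat{\mathbf{y}}$. The most straightforward way of computing the LLRs is to use the Kaczmarz algorithm to estimate $\hat{\mathbf{W}}^{-1}_{t,k}$ after each iteration. Combining Equations (2) and (11), the estimate of the transmit vector at each iteration can be computed as

$$\hat{\mathbf{s}}_{t,k} = \hat{\mathbf{W}}^{-1}_{t,k}\hat{\mathbf{y}} = \left[\hat{\mathbf{W}}^{-1}_{t,k-1} + \frac{\omega}{\|\mathbf{w}_{\psi(k)}\|^2}\mathbf{w}_{\psi(k)}^H\left(\mathbf{e}^T_{\psi(k)} - \mathbf{w}_{\psi(k)}\hat{\mathbf{W}}^{-1}_{t,k-1}\right)\right]\hat{\mathbf{y}}, \tag{15}$$

where $\mathbf{e}^T_{\psi(k)}$ is the ψ(k)-th unit row vector, whose only non-zero element is the ψ(k)-th one. For the ψ(k)-th element of $\hat{\mathbf{s}}_{t,k}$, we have

$$[\hat{\mathbf{s}}_{t,k}]_{\psi(k)} = [\hat{\mathbf{W}}^{-1}_{t,k}]_{\psi(k),:}\hat{\mathbf{y}}, \tag{16}$$

where $[\cdot]_{\psi(k),:}$ denotes the ψ(k)-th row, and the ψ(k)-th row vector of $\hat{\mathbf{W}}^{-1}_{t,k}$ can be expressed as

$$[\hat{\mathbf{W}}^{-1}_{t,k}]_{\psi(k),:} = [\hat{\mathbf{W}}^{-1}_{t,k-1}]_{\psi(k),:} + \frac{\omega W_{\psi(k)\psi(k)}}{\|\mathbf{w}_{\psi(k)}\|^2}\left(\mathbf{e}^T_{\psi(k)} - \mathbf{w}_{\psi(k)}\hat{\mathbf{W}}^{-1}_{t,k-1}\right), \tag{17}$$

with $W_{\psi(k)\psi(k)}$ the ψ(k)-th diagonal element of $\mathbf{W}$ (real, since $\mathbf{W}$ is Hermitian). Therefore, $\hat{\mathbf{W}}^{-1}_{t,k}$ can be obtained from the above, starting from $\hat{\mathbf{W}}^{-1}_{0,0} = \mathbf{D}^{-1}$. The corresponding channel gain and NPI variance can then be derived (Equations (18) and (19)), and the LLRs for channel decoding are obtained by substituting Equations (18) and (19) into Equation (6).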
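To illustrate the exact method, the sketch below runs the relaxed, norm-ordered iteration and, in parallel, maintains the matrix $\hat{\mathbf{W}}^{-1}_{t,k}$ through the rank-one update obtained by writing $\hat{\mathbf{s}}_{t,k} = \hat{\mathbf{W}}^{-1}_{t,k}\hat{\mathbf{y}}$ inside the Kaczmarz step, i.e. $\mathbf{X} \leftarrow \mathbf{X} + (\omega/\|\mathbf{w}\|^2)\mathbf{w}^H(\mathbf{e}^T - \mathbf{w}\mathbf{X})$. The helper name `kaczmarz_with_inverse` and the test data are hypothetical.

```python
import numpy as np


def kaczmarz_with_inverse(W, y_hat, n_iter, omega):
    """Relaxed, norm-ordered Kaczmarz iteration that also tracks W_hat^{-1} (sketch).

    X starts at D^{-1}; every inner step applies the rank-one update
        X <- X + (omega / ||w||^2) w^H (e^T - w X),  w = row psi(k) of W,
    which keeps s = X y_hat. One update costs O(K^2), i.e. O(K^3) per sweep.
    """
    K = W.shape[0]
    row_norm2 = np.sum(np.abs(W) ** 2, axis=1)
    psi = np.argsort(-row_norm2)
    X = np.diag(1.0 / np.diag(W))          # W_hat^{-1}_{0,0} = D^{-1}
    s = X @ y_hat                           # initial solution D^{-1} y_hat
    eye = np.eye(K)
    for _ in range(n_iter):
        for k in psi:
            w = W[k]
            s = s + omega * ((y_hat[k] - w @ s) / row_norm2[k]) * w.conj()
            X = X + (omega / row_norm2[k]) * np.outer(w.conj(), eye[k] - w @ X)
    return s, X


rng = np.random.default_rng(1)
N, K = 64, 8
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2.0)
W = H.conj().T @ H + 0.1 * np.eye(K)
y_hat = H.conj().T @ (rng.standard_normal(N) + 1j * rng.standard_normal(N))
s, X = kaczmarz_with_inverse(W, y_hat, n_iter=3, omega=1.2)
print(np.allclose(X @ y_hat, s))  # True: the tracked matrix reproduces the iterate
```

Since $\mathbf{X} - \mathbf{W}^{-1}$ is multiplied by the contraction $(\mathbf{I} - \omega\mathbf{w}^H\mathbf{w}/\|\mathbf{w}\|^2)$ at every step, the tracked matrix also moves toward $\mathbf{W}^{-1}$, but the $O(K^2)$ cost per inner step is what pushes the exact method back to $O(K^3)$ per outer iteration.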

Approximated Method
The exact method computes the LLRs precisely and yields the theoretically optimal BER performance. However, as Equation (17) shows, updating $\hat{\mathbf{W}}^{-1}_{t,k}$ requires a vector-matrix multiplication and addition at every iteration, which raises the overall complexity to $O(K^3)$ again. To solve this problem, an approximate method of calculating the LLRs is proposed that completely avoids computing the matrix inverse. Since $\mathbf{W}^{-1}$ is diagonally dominant in uplink massive MIMO systems, it can be replaced by the diagonal matrix $\mathbf{D}^{-1}$ with tolerable error. The approximate channel gain and NPI variance are then obtained non-iteratively (Equations (20) and (21)), and the LLRs for channel decoding follow, at much lower complexity, by substituting Equations (20) and (21) into Equation (6).
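For the approximate method, the sketch below compares the exact gain $\mu_k = [\mathbf{W}^{-1}\mathbf{G}]_{kk}$ with the value obtained by substituting $\mathbf{D}^{-1}$ for $\mathbf{W}^{-1}$, i.e. $G_{kk}/W_{kk}$. This diagonal form is an illustrative reading of the substitution, not necessarily the exact form of Equations (20) and (21), and the NPI-variance approximation is not reconstructed here; `channel_gains` is a hypothetical helper.

```python
import numpy as np


def channel_gains(H, sigma2, Es=1.0):
    """Exact mu_k = [W^{-1} G]_kk versus the assumed diagonal form G_kk / W_kk.

    The approximation simply replaces W^{-1} by D^{-1} in U = W^{-1} G.
    """
    K = H.shape[1]
    G = H.conj().T @ H
    W = G + (sigma2 / Es) * np.eye(K)
    mu_exact = np.real(np.diag(np.linalg.solve(W, G)))    # needs O(K^3) work
    mu_diag = np.real(np.diag(G)) / np.real(np.diag(W))   # O(K) once G is known
    return mu_exact, mu_diag


rng = np.random.default_rng(0)
H = (rng.standard_normal((128, 16)) + 1j * rng.standard_normal((128, 16))) / np.sqrt(2.0)
mu_exact, mu_diag = channel_gains(H, sigma2=0.1)
print(np.max(np.abs(mu_exact - mu_diag)))  # small for N = 128, K = 16
```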

Computational Complexity
The computational complexity is measured by the number of real-valued multiplications. Since all algorithms considered in this paper need to calculate the filtering matrix $\mathbf{W}$ and the matched-filter output $\hat{\mathbf{y}}$, and the LLRs are computed with the proposed approximate method, only the parts in which the algorithms differ are analyzed and compared. The complexity of the improved Kaczmarz algorithm comes mainly from iteratively estimating $\hat{\mathbf{s}}$. In each inner iteration, computing the inner product of the two vectors in Equation (10) and updating $\hat{\mathbf{s}}_{t,k}$ each require 2K real multiplications. With the additional operations for updating intermediate values, one outer iteration requires $2K(4K + 2) = 8K^2 + 4K$ real multiplications in total. The complexity of the other algorithms investigated in the simulations can be found in Reference [10]. Figure 1 compares the computational complexity of the MMSE, Neumann [12], and Kaczmarz algorithms: when the number of iterations is small, the Kaczmarz algorithm requires significantly fewer multiplications than the other two. Although the Neumann series expansion algorithm requires slightly lower complexity than the Kaczmarz algorithm with two expansion terms, its BER performance is far from satisfactory in practice, as shown in Figure 2. Moreover, with four expansion terms, the complexity of the Neumann algorithm even exceeds that of the MMSE algorithm.

Figure 1. Comparison of computational complexity (the number of real-valued multiplications).
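The per-iteration count above can be tabulated with a small helper. Reading $2K(4K + 2)$ as 2K inner steps of $4K + 2$ real multiplications each is an interpretation (it would match a 2K-dimensional real-valued equivalent model), and `kaczmarz_real_mults` is a hypothetical name.

```python
def kaczmarz_real_mults(K, t):
    """Real multiplications for t outer iterations of the improved Kaczmarz detector.

    Interprets 2K(4K + 2) as 2K inner steps of 4K + 2 multiplications each
    (an assumption); W, y_hat and the LLR stage are excluded, as in the text.
    """
    inner_steps = 2 * K
    mults_per_step = 4 * K + 2
    return t * inner_steps * mults_per_step


for t in (1, 2, 3, 4):
    print(t, kaczmarz_real_mults(16, t))
# 1 2112
# 2 4224
# 3 6336
# 4 8448
```

The count grows linearly in t and quadratically in K, which is the $O(K^2)$ behaviour claimed for any fixed number of iterations.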

BER Performance
In the Monte-Carlo simulations, Rayleigh fading channels are assumed. Convolutional coding with code rate 1/2 is used for channel coding, and 16-QAM is chosen for modulation. The average transmit power of each user is set to 1. The parameter t denotes both the number of iterations and the number of expansion terms of the Neumann algorithm. Figures 2 and 3 give the BER performance of the algorithms discussed in this paper for the antenna configuration (N, K) = (64, 16). Figure 2 presents the BER performance of the Neumann [12], conjugate gradient (CG) [18], Kaczmarz, and improved Kaczmarz algorithms, where the BER of the improved Kaczmarz algorithm is obtained with the approximate LLR computation. The improved Kaczmarz algorithm performs best among all the low-complexity signal detection algorithms for the same number of iterations or expansion terms. In Figure 3, diagonal band Newton iteration (DBNI) with bandwidth E = 2 [13], the joint steepest descent and Jacobi method (JC) [16], and the improved Kaczmarz algorithm are compared. As expected, the improved Kaczmarz algorithm outperforms all the other algorithms for the same number of iterations, and its performance is sufficiently close to that of the conventional MMSE algorithm.

Figures 4 and 5 give the BER performance for the antenna configuration (128, 16). In Figure 4, the improved Kaczmarz algorithm is again verified to be superior to the other typical low-complexity signal detection algorithms. Meanwhile, as the ratio K/N decreases from (64, 16) in Figures 2 and 3 to (128, 16) in Figures 4 and 5, the improved Kaczmarz algorithm approaches the performance of the MMSE algorithm faster and more closely. Figure 5 shows the BER performance of the improved Kaczmarz algorithm with different numbers of iterations; for all SNR values considered, the performance becomes stable after only three to four iterations.
To find the optimal relaxation parameter $\omega_{opt}$, Figure 6 shows the impact of different relaxation parameters on the BER performance for the antenna configuration (128, 16) and t = 4. For all SNR values considered, $\omega_{opt}$ is found to lie between 1.2 and 1.35. It is therefore largely independent of the SNR and can be set as a constant.

Figure 6. Impact of the relaxation parameter on BER (128 × 16).

Conclusions
In this paper, a low-complexity soft-output signal detection algorithm based on the Kaczmarz iteration is proposed. The algorithm is tailored for uplink massive MIMO systems to avoid the high-dimensional matrix inversion required by the MMSE criterion. The improved Kaczmarz algorithm estimates the transmitted signal by iteratively solving the linear equations, thereby circumventing the matrix inversion. As a result, the complexity is significantly reduced from $O(K^3)$ to $O(K^2)$. Meanwhile, an optimal relaxation parameter is introduced into the improved Kaczmarz algorithm to further accelerate convergence and enhance the BER performance. Simulation results verify that the proposed algorithm outperforms various conventional signal detection algorithms with approximate matrix inversion in terms of BER and computational complexity. The improved Kaczmarz algorithm converges rapidly and achieves performance quite close to that of the MMSE algorithm with only a small number of iterations, and its complexity order remains $O(K^2)$ for any number of iterations. The algorithm can therefore serve as a low-complexity candidate scheme for signal detection in uplink massive MIMO systems.

Conflicts of Interest:
The authors declare no conflict of interest.