A Robust Hybrid Iterative Linear Detector for Massive MIMO Uplink Systems

Abstract: Fifth-generation (5G) communication systems have been commercially introduced by several mobile operators, with sub-6 GHz bands forming the backbone of 5G networks. Large-scale multiple-input multiple-output (MIMO), or massive MIMO (mMIMO), technology plays a major part in securing a high data rate, high spectral efficiency, and quality of service (QoS), and it could also play a major role in beyond-5G systems. A massive number of antennas demands advanced signal processing to detect and equalize the signal. However, optimal detectors, such as the maximum likelihood (ML) and maximum a posteriori (MAP) detectors, are impractical to implement because of their extremely high complexity. Sub-optimal solutions have therefore been introduced to balance performance against computational complexity. In this paper, a robust, low-complexity joint detection algorithm is proposed based on the Jacobi (JA) and Gauss–Seidel (GS) methods. In such iterative methods, the performance, complexity, and convergence rate depend strongly on the initial vector. The initial solution proposed here exploits the benefits of a stair matrix to obtain a fast convergence rate, high performance, and low complexity. Numerical results show that the proposed algorithm achieves high accuracy and relieves the computational complexity even when the BS-to-user-antenna ratio (BUAR) is small.


Introduction
Over the last few years, telecommunication systems have developed rapidly to offer data-oriented services.
In fifth-generation (5G) wireless communication systems, massive multiple-input multiple-output (mMIMO) is used to attain a high data rate, reliability, robustness, energy efficiency, and spectral efficiency [1][2][3]. The mMIMO technology is also a strong candidate for beyond-5G systems. However, computational complexity remains a challenge in implementing an mMIMO system, where a large number of antennas is deployed at the base station (BS) to serve many users in a single cell [4]. The complexity increases exponentially with the number of antenna elements. The design of an efficient low-complexity detector for the mMIMO uplink (UL) therefore requires sophisticated signal processing. It is well known that the maximum likelihood (ML) detector obtains the lowest bit-error-rate (BER) but requires a very high computational complexity, which is undesirable in implementation. For instance, in an mMIMO system with 64 transmit antennas, the ML detector exhaustively searches $1.84 \times 10^{19}$ candidate solutions to find the optimal result, which is not practical [5]. In addition, the computational complexity of the ML detector increases markedly with the number of users. Sub-optimal detectors have been proposed to attain a reasonable trade-off between performance and complexity; a comprehensive survey can be found in [6].
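The size of the exhaustive ML search can be checked with a few lines of arithmetic. The sketch below assumes that the quoted figure corresponds to binary symbols on each of the 64 transmit antennas, since $2^{64} \approx 1.84 \times 10^{19}$; the modulation order is not stated in the text, and the function name `ml_search_space` is purely illustrative.

```python
def ml_search_space(constellation_size: int, num_streams: int) -> int:
    """Number of candidate symbol vectors an exhaustive ML search must score."""
    return constellation_size ** num_streams


# Binary symbols on 64 transmit antennas (assumed; this reproduces the quoted figure).
binary_64 = ml_search_space(2, 64)
print(f"{binary_64:.2e}")  # 1.84e+19

# Hypothetical contrast: 64QAM with 30 single-antenna users gives 64**30 candidates.
qam64_30 = ml_search_space(64, 30)
print(qam64_30 > binary_64)  # True
```

The count grows exponentially in the number of streams, which is why the exhaustive search becomes impractical as users are added.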
Nonlinear algorithms such as the K-best [6] and sphere decoding (SD) [7] have been proposed for MIMO, but they include a QR decomposition that increases the required signal processing operations, so the computational complexity grows unfavorably with the system size. The situation is worse when the K-best and SD algorithms are used in mMIMO systems because of the large system size. Successive interference cancellation (SIC) cannot achieve a quasi-optimal performance close to that of the ML-based detector [5]. Other nonlinear algorithms have also been proposed, such as the expectation propagation detector and the message passing detector [8]. However, nonlinear algorithms require a large hardware area and high power in a real implementation. In addition, the belief propagation (BP)-based detector performs poorly when the BS-to-user-antenna ratio (BUAR) is close to 1 [9]. It is also hard to obtain the optimal damping factor, and convergence may not be achievable under certain circumstances. In [10], a detection algorithm based on BP and neural networks has been proposed. Unfortunately, its computational complexity increases greatly with the number of layers and neurons. The approximate message passing (AMP) and generalized approximate message passing (GAMP) algorithms are not robust, and they perform poorly when the channel matrix is not a zero-mean independent and identically distributed (i.i.d) sub-Gaussian matrix [11].
Linear detection methods have been proposed to achieve a quasi-optimal performance and to relieve part of the computational burden in mMIMO systems. They work properly when the number of users is small. For instance, the minimum mean square error (MMSE) detector achieves a near-optimal performance when the BUAR is large enough [1]. Unfortunately, the MMSE algorithm includes a complicated matrix inversion, which dominates the computational complexity, particularly when the number of antennas is large. The complexity of the MMSE algorithm is $O(K^3)$, where $K$ is the number of users. Alternative detection algorithms, such as the MMSE-parallel-interference-cancellation (MMSE-PIC), have been presented to achieve a satisfactory performance. However, the complexity of the MMSE-PIC algorithm is still $O(K^3)$ because of the matrix inversion [12]. The channel hardening property of the mMIMO channel can be exploited to relieve the computational complexity by replacing the matrix inversion with matrix-vector multiplications. The truncated Neumann series expansion (NSE) approach is an example of such a replacement, and it achieves a significant performance enhancement when the BUAR is large. The convergence rate of the NSE method is slow, however. Another method, the Newton iteration (NI), has therefore been proposed, and it converges faster than the NSE method. The performance of the NSE and NI methods degrades significantly when the BUAR is small, i.e., BUAR < 5 [13]. In other words, the NSE and NI methods are limited to mMIMO systems in which the number of user terminals is comparatively small. Therefore, iterative methods have been proposed that avoid the matrix inversion by formulating the detection problem as a set of linear equations, whose solution is then refined iteratively.
It is well known that iterative methods, such as the successive over relaxation (SOR), the Gauss-Seidel (GS), the Jacobi (JA), and the Richardson (RI) methods, approach the MMSE performance with low complexity, i.e., $O(K^2)$. The performance and the complexity of the RI and SOR methods depend on a relaxation parameter ($\omega$). If $\omega = 1$, the SOR method reduces to the GS method. The GS method achieves the MMSE performance when the BUAR is small. The JA method can be implemented in parallel. However, the performance-complexity profile and the convergence rate of iterative methods depend strongly on the initial solution, which therefore has to be selected carefully. In the literature, the initial solution is usually selected based on the diagonal of the equalization matrix. More recently, a stair matrix has been used to compute the initial solution, which yields a considerable performance enhancement as well as a fast convergence rate [14][15][16].
In this paper, a robust low-complexity joint detection algorithm is proposed based on the JA and GS methods to achieve the MMSE performance, where the initial solution is selected based on the properties of the stair matrix. The convergence rate of the proposed algorithm is fast even when the BUAR is small. This paper is organized as follows. Section 2 presents the fundamentals and the background of the proposed detection algorithm. Section 3 describes the iterative methods, whereas Section 3.6 presents the proposed hybrid low-complexity detection algorithm. Section 4 presents the complexity analysis of the proposed algorithm. Section 5 presents the performance and complexity profiles of the proposed algorithm, together with a comparison with up-to-date methods. Finally, Section 6 concludes the paper.

Background
The matrix inversion operation is undesirable in mMIMO systems because it greatly inflates the computational complexity. To reduce the complexity, iterative methods have been proposed to estimate the signal iteratively. These methods fall into two groups: methods that approximate the matrix inversion and methods that avoid the matrix inversion. Approximate matrix inversion methods, such as the NS method and the NI method, approximate the inverse of the equalization matrix before estimating the received signal. In the NS method, the matrix inversion is converted into a matrix-vector multiplication, which reduces the performance. In [17], a weighted NS (WNS) detection technique is proposed to minimize the error between the exact matrix inversion and the WNS-based matrix inversion, where online learning is used to obtain the optimal weights. The technique improves the performance when the BUAR is large, i.e., BUAR = 128/16 = 8 and BUAR = 128/32 = 4. However, approximate matrix inversion methods require $O(K^3)$ operations, where $K$ is the number of user terminals. In [18], modified detection algorithms are proposed based on the NI method, where the computational complexity is reduced to $O(KN)$, with $N$ the number of antennas at the BS side. In [19], a tridiagonal matrix and a modified NSE are used to reduce the complexity of the detection algorithm. The performance of that algorithm is investigated on a Xilinx Virtex-7 XC7VX690T FPGA, and it is high when the BUAR is large, i.e., BUAR = 128/16 = 8. However, approximate matrix inversion methods require a large number of iterations to achieve a satisfactory performance, which increases the computational complexity. On the other hand, methods that avoid the matrix inversion, such as the GS, SOR, JA, and RI methods, refine the signal over the iterations until the best estimate is obtained.
This approach usually achieves a higher performance and a lower complexity than the first approach, and its complexity is usually $O(K^2)$ [20]. In [21,22], it is shown that detectors based on the GS, SOR, JA, and RI methods achieve high performance and low complexity if the BUAR is large. However, they suffer a considerable performance loss when the BUAR is small. In [23], a detector based on the SOR method has been implemented on a XILINX VIRTEX-7 XC7V690T FPGA for an 8 × 64 mMIMO system (BUAR = 64/8 = 8). Adaptive and non-adaptive SOR detectors have been proposed in which the relaxation parameter is selected using different approaches [24]. These detectors attained a satisfactory performance and low complexity for an 8 × 128 mMIMO system. In [25], a detector based on the JA method has been proposed using a decentralized feedforward initialization approach, where the mMIMO system is decomposed into multiple smaller subsystems. That algorithm obtained a good performance when the BUAR is large, i.e., BUAR = 128/8 = 16 and BUAR = 256/8 = 32. The RI method and the Chebyshev acceleration technique have also been exploited for detection in mMIMO systems. These techniques achieved an acceptable performance when the BUAR is large, i.e., BUAR = 600/150 = 4 and BUAR = 160/32 = 5 [25]. It is also well known that initialization has a great impact on achieving high performance and low complexity [15]. In [16], the impact of a stair matrix on algorithms based on iterative methods has been investigated and compared with detection algorithms based on a diagonal matrix. It is shown that using a stair matrix helps to obtain a satisfactory performance within a small number of iterations (low complexity). In [14], a stair matrix is exploited to initialize the GS, SOR, NI, and RI methods, and a high performance is achieved when the BUAR is large (BUAR = 128/16 = 8).
It is also shown that using a stair matrix contributes to a high convergence rate.
In the literature, most of the existing detection techniques have been tested and investigated when the BUAR is large, and thus they achieve high performance. In [16], several detectors based on a stair matrix and the GS, SOR, JA, NI, and RI methods have been studied. They achieved a good performance-complexity profile when the BUAR is large (BUAR = 256/32 = 8), but this is not the case when the BUAR is small. In such a case, sophisticated signal processing is required to select the optimal solution.

System Model
We consider an uncoded mMIMO system in which the BS is equipped with $N$ antennas to serve $K$ single-antenna users simultaneously in a single cell, where $K < N$. The $K \times 1$ vector $\mathbf{x}$ represents the data transmitted by all users, and the $N \times 1$ vector $\mathbf{y}$ represents the data received at the BS. The received vector $\mathbf{y}$ is corrupted by the channel and by the noise $\mathbf{w}$. The entries of the channel matrix $\mathbf{H}$ are independent and identically distributed (i.i.d) with zero mean and unit variance. The mMIMO model is given as
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}. \tag{1}$$
In the MMSE algorithm, the equalization matrix $\mathbf{A}$ is used to estimate the signal as
$$\hat{\mathbf{x}} = \mathbf{A}^{-1}\mathbf{y}_{\mathrm{MF}}, \tag{2}$$
and
$$\mathbf{A} = \mathbf{H}^{H}\mathbf{H} + \sigma^{2}\mathbf{I}_{K}, \tag{3}$$
and
$$\mathbf{y}_{\mathrm{MF}} = \mathbf{H}^{H}\mathbf{y}, \tag{4}$$
where $\sigma^{2}$ is the noise variance and $\mathbf{I}_{K}$ is the $K \times K$ identity matrix. The Gram matrix is $\mathbf{G} = \mathbf{H}^{H}\mathbf{H}$. A direct computation of $\mathbf{A}^{-1}$ requires $O(K^3)$ operations. However, the literature is rich with methods that avoid the direct and exact matrix inversion, which relieves the burden of high computational complexity.
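The model and the exact MMSE estimate can be sketched in NumPy as follows. This is an illustrative sketch, not the simulation code behind the reported results: the QPSK symbols, the noise-variance convention `sigma2 = 10 ** (-snr_db / 10)`, and the function names `simulate_uplink` and `mmse_detect` are all assumptions made for the example.

```python
import numpy as np


def simulate_uplink(N, K, snr_db, rng):
    """Draw one realisation of y = Hx + w with i.i.d. CN(0, 1) channel entries."""
    H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
    # Unit-energy QPSK symbols (assumed here for brevity).
    bits = rng.integers(0, 2, size=(K, 2))
    x = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
    # Assumed convention: noise variance relative to unit symbol energy.
    sigma2 = 10 ** (-snr_db / 10)
    w = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return H, x, H @ x + w, sigma2


def mmse_detect(H, y, sigma2):
    """Exact MMSE estimate; the linear solve costs O(K^3) operations."""
    K = H.shape[1]
    A = H.conj().T @ H + sigma2 * np.eye(K)  # equalization matrix G + sigma^2 I_K
    y_mf = H.conj().T @ y                    # matched-filter output H^H y
    return np.linalg.solve(A, y_mf), A, y_mf


rng = np.random.default_rng(0)
H, x, y, sigma2 = simulate_uplink(N=128, K=16, snr_db=30.0, rng=rng)
x_hat, A, y_mf = mmse_detect(H, y, sigma2)
print(np.max(np.abs(x_hat - x)))  # small residual at high SNR
```

The matrix `A` and the vector `y_mf` returned here are the inputs that the iterative detectors discussed next operate on.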

Neumann Series
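The NS method approximates the matrix inverse iteratively by representing it as a sum of an infinite number of terms, which is truncated in practice. The computational complexity therefore increases with the number of iterations ($n$) and terms. In this approximation method, the equalization matrix is decomposed as $\mathbf{A} = \mathbf{D} + \mathbf{E}$, where $\mathbf{D}$ is a diagonal matrix and $\mathbf{E}$ is the off-diagonal part, and $\mathbf{A}^{-1}$ can be iteratively approximated and refined as
$$\mathbf{A}^{-1}_{(n)} = \sum_{i=0}^{n}\left(-\mathbf{D}^{-1}\mathbf{E}\right)^{i}\mathbf{D}^{-1}. \tag{5}$$
The initial matrix inverse $\mathbf{A}^{-1}_{(0)}$ is usually selected as $\mathbf{D}^{-1}$ and refined iteratively [14]. The approximation converges to $\mathbf{A}^{-1}$ if $\left\|\mathbf{I} - \mathbf{A}\mathbf{A}^{-1}_{(0)}\right\| < 1$. The estimated signal is then written as $\hat{\mathbf{x}} = \mathbf{A}^{-1}_{(n)}\mathbf{y}_{\mathrm{MF}}$.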
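A truncated Neumann series can be sketched as below, under the assumption that the series takes the standard form $\sum_{i=0}^{n}(-\mathbf{D}^{-1}\mathbf{E})^{i}\mathbf{D}^{-1}$ with $\mathbf{D}$ the diagonal of $\mathbf{A}$ and $\mathbf{E} = \mathbf{A} - \mathbf{D}$, so that $n = 0$ returns $\mathbf{D}^{-1}$. The function name `neumann_inverse` is illustrative.

```python
import numpy as np


def neumann_inverse(A, n):
    """Approximate A^{-1} by D^{-1} plus n further Neumann-series terms."""
    d_inv = 1.0 / np.diag(A)           # D^{-1}, stored as a vector
    E = A - np.diag(np.diag(A))        # off-diagonal part of A
    M = -d_inv[:, None] * E            # -D^{-1} E
    term = np.diag(d_inv)              # i = 0 term, i.e. D^{-1}
    approx = term.copy()
    for _ in range(n):
        term = M @ term                # next power of (-D^{-1} E), times D^{-1}
        approx += term
    return approx
```

Each extra term costs a matrix-matrix product when the inverse is formed explicitly, which is why only a few terms are affordable. The sketch makes no attempt to check the convergence condition; with a small BUAR the powers of $\mathbf{D}^{-1}\mathbf{E}$ may fail to decay.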

Gauss-Seidel
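The GS method is an efficient iterative method and a special case of the SOR method. Let $\mathbf{A} = \mathbf{D} + \mathbf{L} + \mathbf{U}$, where $\mathbf{D}$ is the diagonal part, $\mathbf{L}$ is the strictly lower triangular part, and $\mathbf{U}$ is the strictly upper triangular part of $\mathbf{A}$. Signal estimation using the SOR method is written as
$$\hat{\mathbf{x}}^{(n)} = \left(\tfrac{1}{\omega}\mathbf{D} + \mathbf{L}\right)^{-1}\left[\mathbf{y}_{\mathrm{MF}} + \left(\left(\tfrac{1}{\omega} - 1\right)\mathbf{D} - \mathbf{U}\right)\hat{\mathbf{x}}^{(n-1)}\right], \tag{6}$$
where $\omega$ is the relaxation parameter. When $\omega = 1$ in Equation (6), the GS iterative method is obtained and the estimated signal can be written as
$$\hat{\mathbf{x}}^{(n)} = \left(\mathbf{D} + \mathbf{L}\right)^{-1}\left(\mathbf{y}_{\mathrm{MF}} - \mathbf{U}\hat{\mathbf{x}}^{(n-1)}\right). \tag{7}$$
Generally, the GS method has a fast convergence rate. The initial estimate is set as $\hat{\mathbf{x}}^{(0)} = \mathbf{D}^{-1}\mathbf{y}_{\mathrm{MF}}$ and is refined iteratively.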
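The relation between SOR and GS can be illustrated with a short NumPy sketch. It assumes the standard splitting of $\mathbf{A}$ into its diagonal, strictly lower, and strictly upper parts, and it uses a general-purpose solver for the triangular system for brevity (a hardware implementation would use forward substitution). The function name `sor_detect` and its defaults are illustrative.

```python
import numpy as np


def sor_detect(A, y_mf, omega=1.0, n_iter=3, x0=None):
    """SOR iterations for A x = y_mf; omega = 1 gives the Gauss-Seidel method."""
    D = np.diag(np.diag(A))
    L = np.tril(A, -1)                       # strictly lower triangular part
    U = np.triu(A, 1)                        # strictly upper triangular part
    # Diagonal-based initial estimate unless the caller supplies one.
    x = y_mf / np.diag(A) if x0 is None else x0.copy()
    lhs = D / omega + L                      # lower triangular system matrix
    rhs_mat = (1.0 / omega - 1.0) * D - U
    for _ in range(n_iter):
        x = np.linalg.solve(lhs, y_mf + rhs_mat @ x)
    return x
```

Passing `x0` allows a different initial solution, for example one derived from a stair matrix, to be substituted for the diagonal-based default.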

Jacobi
The JA method is another iterative method, in which the estimate of the signal is written as
$$\hat{\mathbf{x}}^{(n)} = \mathbf{D}^{-1}\left(\mathbf{y}_{\mathrm{MF}} + \left(\mathbf{D} - \mathbf{A}\right)\hat{\mathbf{x}}^{(n-1)}\right). \tag{8}$$
The JA method converges more slowly than the GS method. However, its computational complexity is lower than that of the GS method, and it is easy to implement in parallel.
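A minimal sketch of the Jacobi update follows, assuming the standard form $\hat{\mathbf{x}}^{(n)} = \mathbf{D}^{-1}(\mathbf{y}_{\mathrm{MF}} + (\mathbf{D} - \mathbf{A})\hat{\mathbf{x}}^{(n-1)})$ with a diagonal-based initial estimate. The name `jacobi_detect` is illustrative.

```python
import numpy as np


def jacobi_detect(A, y_mf, n_iter=3, x0=None):
    """Jacobi iterations for A x = y_mf."""
    d = np.diag(A)
    x = y_mf / d if x0 is None else x0.copy()
    for _ in range(n_iter):
        # (D - A) x equals d * x - A @ x; every entry of the new estimate
        # depends only on the previous estimate, so entries update in parallel.
        x = (y_mf + d * x - A @ x) / d
    return x
```

Because no entry of the new estimate depends on another entry of the same iteration, the update vectorizes directly, in contrast to the sequential triangular solve in the GS method.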

A Stair Matrix
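A stair matrix $\mathbf{S}$ is a tridiagonal matrix that has one of the two forms listed below. In the stair matrix, the off-diagonal elements on either the odd or the even rows are zero [15]:
(I) $s_{i,i-1} = 0$ and $s_{i,i+1} = 0$ for every odd row $i$;
(II) $s_{i,i-1} = 0$ and $s_{i,i+1} = 0$ for every even row $i$.
For instance, a stair matrix of size 6 × 6 and of form (I) can be written as
$$\mathbf{S} = \begin{bmatrix} s_{11} & 0 & 0 & 0 & 0 & 0 \\ s_{21} & s_{22} & s_{23} & 0 & 0 & 0 \\ 0 & 0 & s_{33} & 0 & 0 & 0 \\ 0 & 0 & s_{43} & s_{44} & s_{45} & 0 \\ 0 & 0 & 0 & 0 & s_{55} & 0 \\ 0 & 0 & 0 & 0 & s_{65} & s_{66} \end{bmatrix}.$$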
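The structure can be made concrete with a short sketch that extracts a stair matrix from a square matrix by keeping the diagonal everywhere and the two neighbouring entries on alternate rows. The closed-form inverse $\mathbf{S}^{-1} = \mathbf{D}^{-1}(2\mathbf{D} - \mathbf{S})\mathbf{D}^{-1}$ used below is not stated in the text; it follows from the structure, because the off-diagonal part $\mathbf{T} = \mathbf{S} - \mathbf{D}$ satisfies $\mathbf{T}\mathbf{D}^{-1}\mathbf{T} = \mathbf{0}$. The function names and the `zero_rows` argument are illustrative.

```python
import numpy as np


def stair_matrix(A, zero_rows="odd"):
    """Stair part of A; zero_rows names the 1-based rows with zero off-diagonals."""
    K = A.shape[0]
    S = np.diag(np.diag(A)).astype(A.dtype)
    # 0-based index of the first row that keeps its two neighbours.
    start = 1 if zero_rows == "odd" else 0
    for i in range(start, K, 2):
        if i > 0:
            S[i, i - 1] = A[i, i - 1]
        if i < K - 1:
            S[i, i + 1] = A[i, i + 1]
    return S


def stair_inverse(S):
    """Closed-form inverse D^{-1} (2D - S) D^{-1}, valid for stair matrices only."""
    d = np.diag(S)
    d_inv = 1.0 / d
    return d_inv[:, None] * (2.0 * np.diag(d) - S) * d_inv[None, :]
```

The closed form needs only element-wise divisions and multiplications, which is consistent with the low cost attributed to inverting a stair matrix.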

Proposed Method
The initialization of a detection method affects its performance-complexity profile as well as its convergence rate. The proposed method exploits the benefits of the low-complexity JA and GS methods. In addition, a stair matrix ($\mathbf{S}$) is used to guarantee a fast convergence rate. Most methods in the literature use the diagonal matrix, which does not guarantee a satisfactory convergence rate. Figure 1 illustrates the block diagram of the proposed detector, and Figure 2 presents the flowchart of its two stages: the initialization and the final estimation. Unlike the work in [16], where detectors are based on the individual GS, SOR, JA, NI, and RI methods, this paper proposes a hybrid JA-GS detector that is initialized using a stair matrix. The proposed detector obtains a satisfactory balance between performance and complexity even when the BUAR is small (BUAR < 5).
The initial solution is first computed based on the stair matrix as
$$\hat{\mathbf{x}}^{(0)} = \mathbf{S}^{-1}\mathbf{y}_{\mathrm{MF}}.$$
Then, the first iteration of the JA method is conducted as
$$\hat{\mathbf{x}}^{(1)} = \mathbf{D}^{-1}\left(\mathbf{y}_{\mathrm{MF}} + \left(\mathbf{D} - \mathbf{A}\right)\hat{\mathbf{x}}^{(0)}\right).$$
In the next step, the estimate is computed and iteratively refined based on the GS method as shown in (7), where n = 2. Algorithm 1 illustrates the proposed detection algorithm using the JA and GS methods and the stair matrix.
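The two stages described above can be sketched as follows. This is one reading of the procedure, not a transcription of Algorithm 1: it assumes the stair matrix is taken from $\mathbf{A}$ with zero off-diagonals on the odd rows, that the stair-based initial solution counts as iteration 0 and the JA step as iteration 1, and that GS sweeps run from iteration 2 up to `n_iter`. The function name `hybrid_ja_gs_detect` is illustrative.

```python
import numpy as np


def hybrid_ja_gs_detect(A, y_mf, n_iter=2):
    """Stair-matrix initialisation, one Jacobi step, then Gauss-Seidel sweeps."""
    K = A.shape[0]
    d = np.diag(A)

    # Stage 1 (initialisation): x0 = S^{-1} y_mf with S the stair part of A.
    S = np.diag(d).astype(A.dtype)
    for i in range(1, K, 2):                 # 0-based rows that keep neighbours
        S[i, i - 1] = A[i, i - 1]
        if i + 1 < K:
            S[i, i + 1] = A[i, i + 1]
    x = np.linalg.solve(S, y_mf)             # a closed-form inverse could replace this

    # Iteration n = 1: a single Jacobi update.
    if n_iter >= 1:
        x = (y_mf + d * x - A @ x) / d

    # Stage 2 (final estimation): Gauss-Seidel sweeps for n = 2, ..., n_iter.
    lower = np.tril(A)                       # D + L
    upper = np.triu(A, 1)                    # U
    for _ in range(2, n_iter + 1):
        x = np.linalg.solve(lower, y_mf - upper @ x)
    return x
```

With `n_iter=2` the sketch performs the initialization, one Jacobi step, and one GS sweep. Larger values only add GS sweeps, which converge for a Hermitian positive definite $\mathbf{A}$ regardless of the starting point.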

Complexity Analysis
It is well known that the complexity is largely determined by the number of mathematical operations performed, such as multiplications and divisions. A large number of iterations also raises the computational complexity, and the selection of the initial vector affects it as well. The inverse of a stair matrix ($\mathbf{S}^{-1}$) requires $K$ real divisions and $3(K - 1)$ real multiplications. The JA method initialized based on a stair matrix requires $n(4K^2 - 2K)$ real multiplications. The GS method requires $4nK^2$ real multiplications. The proposed algorithm therefore requires $K^2(1 + 4n) + K - 3$ real multiplications. Moreover, the proposed algorithm needs only a small number of iterations, which reduces the computational complexity. The number of multiplications is given in Table 1.
Table 1. Complexity of the NS, GS, JA, and proposed algorithm.
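The multiplication counts above are easy to evaluate numerically. The sketch below simply encodes the stated expressions; the function names are illustrative, and the choice K = 60 is an inference made because it reproduces the counts of 32,457 and 43,200 quoted in the numerical results for the proposed algorithm at n = 2 and the GS method at n = 3.

```python
def stair_inverse_mults(K: int) -> int:
    """Real multiplications for inverting a K x K stair matrix."""
    return 3 * (K - 1)


def jacobi_stair_mults(K: int, n: int) -> int:
    """Real multiplications for n Jacobi iterations with stair initialisation."""
    return n * (4 * K * K - 2 * K)


def gs_mults(K: int, n: int) -> int:
    """Real multiplications for n Gauss-Seidel iterations."""
    return 4 * n * K * K


def proposed_mults(K: int, n: int) -> int:
    """Real multiplications for the proposed hybrid detector."""
    return K * K * (1 + 4 * n) + K - 3


K = 60  # inferred number of users
print(proposed_mults(K, 2))  # 32457
print(gs_mults(K, 3))        # 43200
```

At these settings the proposed detector needs roughly a quarter fewer multiplications than the GS method, and the saving comes entirely from reaching the target in fewer iterations.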

Numerical Results
Simulation results are presented in this section, together with a comparison between the proposed algorithm and recently introduced mMIMO UL detectors, to validate the proposed algorithm. The results are obtained using MATLAB for various numbers of iterations and for BUAR = 160/30 = 5.33, 160/40 = 4, 160/50 = 3.2, and 160/60 = 2.67. Results and comparisons are given as bit-error-rate (BER) versus signal-to-noise ratio (SNR). The computational complexity is compared in terms of the number of multiplications. A 64QAM modulation scheme and i.i.d Gaussian channels are used.
Figure 3 shows the BER of the proposed algorithm, which uses joint JA and GS with stair-matrix initialization, and of other up-to-date methods for the 30 × 160 antenna configuration (BUAR = 5.33). It is clear that the proposed algorithm considerably speeds up the convergence compared with existing methods. The proposed algorithm achieved the MMSE performance at n = 2, whereas the traditional GS, JA, and NS methods require extra iterations, which increases the computational complexity. The proposed algorithm achieved BER = $10^{-4}$ at SNR = 18 dB, whereas the GS method reached the same target at SNR = 20 dB.
Figure 5 shows the performance comparison for the 50 × 160 antenna configuration (BUAR = 3.2). The proposed algorithm outperforms the GS method. For instance, at SNR = 25 dB, the BER is $10^{-3}$ for the proposed algorithm and $10^{-2}$ for the GS method. It is also clear that the JA and NS methods are not desirable when the BUAR is small.
Unlike the GS method, the proposed algorithm can achieve the target performance within a small number of iterations. The JA and NS methods are not practical when BUAR < 5.
For instance, BER = $10^{-3}$ can be achieved at n = 2 using the proposed algorithm and at n = 3 using the GS method. The proposed algorithm then requires 32,457 multiplications, whereas the GS method requires 43,200. This shows the advantage of the proposed algorithm over the traditional iterative method. Figure 9 compares the proposed method and the GS method in achieving BER = $10^{-3}$. It is clear that the GS method requires a larger number of iterations, and therefore more multiplications and a higher complexity.

Conclusions
This paper proposes a robust hybrid iterative linear detector for mMIMO systems in which the BUAR is relatively small. The proposed detector combines the advantages of iterative methods, namely the JA and GS methods. In addition, a stair matrix structure is exploited to initialize the proposed detector. Numerical results show that the proposed detector obtained the target performance with the lowest complexity. The JA and NS methods are not suitable for detector design when the BUAR is small.
The use of hybrid iterative methods could be extended to propose efficient hybrid detectors such as the JA-SOR, JA-RI, GS-RI, and RI-SOR. In addition, the aforementioned hybrid detectors could be initialized based on the stair matrix structure. The performance-complexity profile of such detectors has to be tested in realistic radio channels, e.g., using the QUAsi Deterministic RadIo channel GenerAtor (QuaDRiGa) package.