Matrix inversion is undesirable in mMIMO systems because it greatly inflates the computational complexity. To reduce this complexity, iterative methods have been proposed to estimate the transmitted signal. These iterative methods can be categorized into two groups: approximate matrix inversion methods and methods that avoid matrix inversion. Approximate matrix inversion methods, such as the NS and NI methods, approximate the inverse of the equalization matrix before the transmitted signal is estimated. In the NS method, the matrix inversion is converted into matrix-vector multiplications, at the cost of some performance loss. In [17], a weighted NS (WNS) detection technique is proposed to minimize the error between the exact matrix inversion and the WNS-based matrix inversion, where online learning is used to obtain the optimal weights. The proposed technique improves the performance when the BUAR is large, i.e., BUAR $=\frac{128}{16}=8$ and BUAR $=\frac{128}{32}=4$. However, approximate matrix inversion methods require $\mathcal{O}\left({K}^{3}\right)$ operations, where $K$ is the number of user terminals. In [18], modified detection algorithms based on the NI method are proposed, in which the computational complexity is reduced to $\mathcal{O}\left(KN\right)$, where $N$ is the number of antennas at the BS. In [19], a tridiagonal matrix and a modified NSE are used to reduce the complexity of the detection algorithm. The performance of the proposed algorithm is investigated on a Xilinx Virtex-7 XC7VX690T FPGA, and it achieves high performance when the BUAR is large, i.e., BUAR $=\frac{128}{16}=8$. However, approximate matrix inversion methods require a large number of iterations to achieve satisfactory performance, which increases the computational complexity.

On the other hand, methods that avoid matrix inversion, such as the GS, SOR, JA, and RI methods, refine the signal estimate through iterations until the best estimate is obtained. This approach usually achieves higher performance and lower complexity than the first one, with a complexity that is typically $\mathcal{O}\left({K}^{2}\right)$ [20]. In [21,22], it is shown that detectors based on the GS, SOR, JA, and RI methods achieve high performance and low complexity when the BUAR is large. However, they suffer a considerable performance loss when the BUAR is small. In [23], a detector based on the SOR method is implemented on a Xilinx Virtex-7 XC7VX690T FPGA for an $8\times 64$ mMIMO system (BUAR $=\frac{64}{8}=8$). Two SOR detectors, one adaptive and one non-adaptive, are proposed in [24], where the relaxation parameter is selected using different approaches; both attain satisfactory performance and low complexity for an $8\times 128$ mMIMO system. In [25], a detector based on the JA method is proposed using a decentralized feedforward initialization approach, in which the mMIMO system is decomposed into multiple smaller subsystems. The proposed algorithm obtains good performance when the BUAR is large, i.e., BUAR $=\frac{128}{8}=16$ and BUAR $=\frac{256}{8}=32$. The RI method and the Chebyshev acceleration technique have also been exploited for mMIMO detection, and the resulting techniques achieve acceptable performance when the BUAR is large, i.e., BUAR $=\frac{600}{150}=4$ and BUAR $=\frac{160}{32}=5$ [25].

It is also well known that initialization has a great impact on achieving high performance and low complexity [15]. In [16], the impact of a stair matrix on detection algorithms based on iterative methods is investigated and compared with that of diagonal-matrix-based detection algorithms. It is shown that using a stair matrix helps achieve satisfactory performance within a small number of iterations (i.e., with low complexity). In [14], a stair matrix is exploited to initialize the GS, SOR, NI, and RI methods, and high performance is achieved when the BUAR is large (BUAR $=\frac{128}{16}=8$). It is also shown that using a stair matrix contributes positively to a high convergence rate.
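To make the distinction between the two families concrete, the Python sketch below contrasts a truncated NS approximation of the inverse with GS sweeps that solve the same linear system without forming any inverse. It assumes a linear MMSE formulation with equalization matrix $\mathbf{A}=\mathbf{H}^{H}\mathbf{H}+\sigma^{2}\mathbf{I}$, an i.i.d. Rayleigh channel, QPSK symbols, and a noise variance of $10^{-\mathrm{SNR}/10}$; these modeling choices and the function names `mmse_system`, `neumann_inverse`, and `gauss_seidel` are hypothetical and are not taken from the cited works. In this sketch, each extra NS term costs a $K\times K$ matrix product, i.e., $\mathcal{O}(K^{3})$, while each GS sweep costs $\mathcal{O}(K^{2})$, in line with the complexity orders quoted above.

```python
import numpy as np


def mmse_system(n_ant, n_users, snr_db, rng):
    """Toy uplink MMSE system A x = b (hypothetical i.i.d. Rayleigh, QPSK setup)."""
    # Assumed channel model: i.i.d. CN(0, 1) entries, N antennas x K users.
    H = (rng.standard_normal((n_ant, n_users))
         + 1j * rng.standard_normal((n_ant, n_users))) / np.sqrt(2)
    # Unit-energy QPSK symbols, one per user terminal.
    s = (rng.choice([-1.0, 1.0], n_users)
         + 1j * rng.choice([-1.0, 1.0], n_users)) / np.sqrt(2)
    noise_var = 10 ** (-snr_db / 10)  # assumed SNR definition
    noise = np.sqrt(noise_var / 2) * (rng.standard_normal(n_ant)
                                      + 1j * rng.standard_normal(n_ant))
    y = H @ s + noise
    A = H.conj().T @ H + noise_var * np.eye(n_users)  # MMSE equalization matrix
    b = H.conj().T @ y                                 # matched-filter output
    return A, b


def neumann_inverse(A, n_terms):
    """Approximate A^{-1} by sum_{n < n_terms} (I - D^{-1} A)^n D^{-1}, D = diag(A)."""
    D_inv = np.diag(1.0 / np.diag(A))
    E = np.eye(A.shape[0]) - D_inv @ A  # NS iteration matrix
    term = D_inv.copy()
    approx = D_inv.copy()
    for _ in range(n_terms - 1):
        term = E @ term  # K x K matrix product: O(K^3) per extra term
        approx = approx + term
    return approx


def gauss_seidel(A, b, n_iter, x0=None):
    """Solve A x = b with GS sweeps, never forming A^{-1} (O(K^2) per sweep)."""
    K = A.shape[0]
    x = np.zeros(K, dtype=complex) if x0 is None else x0.astype(complex)
    for _ in range(n_iter):
        for i in range(K):
            # Entries x[:i] are already updated in this sweep; x[i+1:] are not.
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - sigma) / A[i, i]
    return x


rng = np.random.default_rng(0)
A, b = mmse_system(n_ant=128, n_users=16, snr_db=10.0, rng=rng)  # BUAR = 8
x_exact = np.linalg.solve(A, b)


def rel_err(x):
    return np.linalg.norm(x - x_exact) / np.linalg.norm(x_exact)


x_ns = neumann_inverse(A, n_terms=3) @ b                # approximate inverse
x_gs = gauss_seidel(A, b, n_iter=3, x0=b / np.diag(A))  # diagonal initialization
print(f"NS (3 terms): {rel_err(x_ns):.2e}, GS (3 sweeps): {rel_err(x_gs):.2e}")
```

The diagonal initialization $\mathbf{D}^{-1}\mathbf{H}^{H}\mathbf{y}$ used for GS here is only one simple choice; as noted above, the initialization (e.g., stair-matrix-based) can strongly affect how many sweeps are needed.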

In the literature, most of the existing detection techniques have been tested and investigated when the BUAR is large, a regime in which they achieve high performance. In [16], several detectors based on a stair matrix and the GS, SOR, JA, NI, and RI methods have been studied. They achieved a good performance–complexity profile when the BUAR is large (BUAR $=\frac{256}{32}=8$). However, this is not the case when the BUAR is small, in which case sophisticated signal processing is required to select the optimal solution.
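The dependence on the BUAR can be illustrated with a rough numerical check. The sketch below, under the same hypothetical i.i.d. Rayleigh assumption with $\mathbf{A}=\mathbf{H}^{H}\mathbf{H}+\sigma^{2}\mathbf{I}$ and an assumed $\sigma^{2}=0.1$, computes the spectral radius of $\mathbf{I}-\mathbf{D}^{-1}\mathbf{A}$, which governs the convergence of both the NS expansion and the JA iteration, and of the GS iteration matrix $-(\mathbf{D}+\mathbf{L})^{-1}\mathbf{U}$, where $\mathbf{D}$, $\mathbf{L}$, and $\mathbf{U}$ are the diagonal, strictly lower, and strictly upper parts of $\mathbf{A}$. A spectral radius at or above 1 means that the plain NS/JA iteration does not converge. The helper names are illustrative only.

```python
import numpy as np


def gram_matrix(n_ant, n_users, noise_var, rng):
    """MMSE matrix A = H^H H + noise_var * I for an assumed i.i.d. Rayleigh channel."""
    H = (rng.standard_normal((n_ant, n_users))
         + 1j * rng.standard_normal((n_ant, n_users))) / np.sqrt(2)
    return H.conj().T @ H + noise_var * np.eye(n_users)


def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))


def jacobi_radius(A):
    """rho(I - D^{-1} A): governs both the NS series and the JA iteration."""
    D_inv = np.diag(1.0 / np.diag(A))
    return spectral_radius(np.eye(A.shape[0]) - D_inv @ A)


def gauss_seidel_radius(A):
    """rho(-(D + L)^{-1} U) for the splitting A = L + D + U."""
    lower = np.tril(A)       # D + L
    upper = np.triu(A, k=1)  # U
    return spectral_radius(-np.linalg.solve(lower, upper))


rng = np.random.default_rng(1)
for n_ant, n_users in [(256, 16), (128, 16), (128, 32), (128, 64)]:
    A = gram_matrix(n_ant, n_users, noise_var=0.1, rng=rng)
    print(f"BUAR = {n_ant // n_users:2d}: "
          f"rho_NS/JA = {jacobi_radius(A):.2f}, rho_GS = {gauss_seidel_radius(A):.2f}")
```

As a rough large-system heuristic (from the Marchenko-Pastur law, ignoring finite-size effects), $\rho(\mathbf{I}-\mathbf{D}^{-1}\mathbf{A})\approx 2\sqrt{K/N}+K/N$, which exceeds 1 once the BUAR drops below about 5.8. GS always converges for a Hermitian positive definite $\mathbf{A}$, but its spectral radius typically moves toward 1 as the BUAR shrinks, so more sweeps are needed.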