Comparative Analysis of Data Detection Techniques for 5G Massive MIMO Systems

Massive multiple-input multiple-output (MIMO) is a backbone technology in fifth-generation (5G) and beyond 5G (B5G) networks. It improves performance, energy efficiency, and spectral efficiency. Unfortunately, a massive number of antennas requires sophisticated processing to detect the transmitted signal. Although the maximum likelihood (ML) detector is optimal, its computational complexity is high, and hence it is not hardware-friendly. In addition, conventional linear detectors, such as the minimum mean square error (MMSE) detector, involve a matrix inversion, which also incurs a high computational complexity. As an alternative, the approximate message passing (AMP) algorithm has been proposed for data detection in the massive MIMO uplink (UL). Although the AMP algorithm converges extremely fast, its convergence is not guaranteed. A good initialization influences the convergence rate and substantially affects both the performance and the complexity. In this paper, we exploit several matrix-inversion-free methods, namely, the successive over-relaxation (SOR), the Gauss–Seidel (GS), and the Jacobi (JA) methods, to initialize the AMP-based massive MIMO UL detector. In other words, hybrid detectors that combine AMP with JA, SOR, and GS through an efficient initialization are proposed. Numerical results show that the proposed detectors achieve a significant performance enhancement and a large reduction in computational complexity.


Introduction
In the past three decades, rapid growth in data traffic has been witnessed, with the average wireless transmission rate doubling every 18 months. Data traffic has grown more than eight-fold between 2015 and 2020 and is expected to grow more than twenty-fold by 2022 [1]. Nowadays, the fifth-generation (5G) wireless communication system is being deployed worldwide by several mobile carriers to achieve a high data rate and a satisfactory quality of service (QoS) [2]. Figure 1 illustrates the deployment map of commercial 5G networks worldwide. One of the key target scenarios of 5G and beyond 5G (B5G) is massive machine-type communications (mMTC), where a large number of sensor-like devices are employed and produce sporadic traffic [4][5][6]. The sub-6 GHz band will remain the spine of future 5G and B5G wireless networks. To achieve a high data rate (up to 20 Gbps), high spectral efficiency (100 bps/Hz), and high mobility (up to 1000 km/h), several technologies are considered as the backbone of 5G networks, such as massive multiple-input multiple-output (MIMO), millimeter wave (mmWave) [7,8], the internet of things (IoT) [9], ultra-dense networks (UDNs), visible light communications (VLC), optical wireless communications (OWC), and device-to-device (D2D) communications [10]. The IoT connects billions of sensors, devices, and machines to each other and lets them exchange data among themselves. It includes countless physical components embedded with sensors, actuators, and radio frequency identification (RFID) tags and enables their cooperation and data exchange through IoT protocols [11]. D2D communication is an emerging technology to relieve congestion in communication networks. It reduces the latency between users and hence boosts the reliability of the communication link.
It facilitates data exchange among devices without the base station (BS) or access point: data can be transferred directly among connected devices without interaction with external devices. For instance, D2D communication is considered in the third generation partnership project (3GPP) standards to enhance the network's connectivity. Due to the huge number of connected devices and the volume of data exchanged, the demand for radio spectrum is ever-increasing. Spectrum sharing (SS) is a significant technology to enhance the spectrum utilization efficiency [12]. In addition, OWC technologies can ease the spectrum crisis. For instance, visible light communication (VLC) technology can offer high-speed, high-data-rate, and highly secure communication over unregulated bandwidth for domestic applications [13]. The UDN approach coordinates the deployment of dense small cells in areas with very high traffic intensity. It offers distinct advantages in increasing the spectrum reuse and reducing the path loss, and it thereby eases the capacity crunch and improves coverage. To acquire a huge bandwidth, and thus higher data rates (gigabits per second), the mmWave spectrum is exploited. In addition, a highly directive radiation beam is required to maximize the transfer of energy to the desired user [14]. Massive MIMO is one of the crucial and mature technologies in the 5G wireless communication system for both the downlink (DL) and the uplink (UL) [15]. In massive MIMO, a massive number of antennas is deployed at the base station (BS) to serve many single-antenna users simultaneously. Massive MIMO improves the capacity, the throughput, the reliability, the diversity gain, the performance gain, and the spectral efficiency [16]. In practice, the bottleneck in obtaining the full advantage of the massive MIMO technology lies in realizing an efficient precoder, an efficient detector, and accurate channel estimation.
However, a BS with a large number of antennas puts heavy pressure on the signal processing, which must deliver a high performance with a low number of computations. An accurate and instantaneous estimation of the channel state information (CSI) or channel impulse response (CIR) is required at the BS. In addition, the DL data are precoded to concentrate the spatial data streams at the user's location [17]. In the UL, an efficient detector is required to estimate the transmitted signal, and the main issue with utilizing a large number of antennas is the high detection complexity involved. The maximum likelihood (ML) detector is considered the optimum detector; however, it is prohibitively complex because it relies on an exhaustive search [18,19].
Researchers have put tremendous effort into achieving a satisfactory trade-off between performance and computational complexity. A substantial literature on detection schemes for the massive MIMO UL has appeared, and comprehensive surveys are presented in [18,20]. In [21], the computational complexity of linear detection mechanisms based on the QR, Cholesky, and LDL decomposition algorithms is presented for different massive MIMO configurations. Other detectors, such as the sphere decoding (SD) detector, require the QR decomposition, which increases the computational complexity. Therefore, most existing detectors need refinement to meet the implementation demands of low computational complexity and high performance, in particular under complicated environments. The approximate message passing (AMP) algorithm and iterative matrix inversion methods are popular schemes to achieve a satisfactory performance and partially relieve the burden of computational complexity. The first scheme, the AMP algorithm, was initially proposed in compressed sensing (CS) for signal reconstruction, for solving the selection problem, and for state evolution analysis [22]. The AMP algorithm is attractive in terms of realization and implementation when the system size is large. Therefore, the utilization of the AMP algorithm has been extended to linear estimation for massive MIMO systems, precoding, and multi-user detection [22]. The convergence rate of the AMP algorithm is good, but convergence is not guaranteed, especially in highly loaded systems. Damping is one remedy: it slows down the updates of the estimates and eases convergence. Another remedy is an appropriate initialization, which affects the convergence rate and substantially impacts both the performance and the complexity [23]. The second scheme of detection in massive MIMO comprises matrix-inversion-free methods to estimate the transmitted signal. In other words, such methods avoid an explicit matrix inversion.
Instead of matrix-matrix multiplications, matrix-vector products are utilized, and the diagonally dominant property of the equalization matrix is exploited. Therefore, a detector based on matrix-inversion-free methods often has a low computational complexity [24]. We note that there is still significant room for fundamental research contributions in data detection for large-scale MIMO systems.
In this paper, we propose efficient initialization methods of the AMP algorithm for base station (BS) detectors using matrix-inversion-free methods, namely, the successive over-relaxation (SOR), the Gauss–Seidel (GS), and the Jacobi (JA) methods. The diagonally dominant property of the equalization matrix and the SOR, GS, and JA methods are exploited to compute the initial solution of the AMP-based detector. In other words, the SOR, GS, and JA methods are utilized to initialize the AMP-based detector effectively.

Background
We consider a multi-user MIMO uplink in which a BS with $N$ antennas serves $K$ single-antenna users, where $N \gg K$. In this regime ($N \gg K$), the channel hardening phenomenon becomes dominant, and it is utilized to cancel the small-scale fading characteristics [25]. Symbols are transmitted individually by the $K$ users, so that a symbol vector $\mathbf{x} = [x_1, x_2, \ldots, x_K]^T$ is transmitted by all users. At the BS, a vector $\mathbf{y} = [y_1, y_2, \ldots, y_N]^T$ is received as
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}, \tag{1}$$
where $\mathbf{H}$ is the $N \times K$ channel matrix and $\mathbf{w}$ is the $N \times 1$ noise vector.
The model in (1) is widely used in detection techniques to estimate the transmitted signal $\mathbf{x}$. Perfect channel estimation is assumed at the BS. The matrix-inversion-free methods operate on the equalization matrix $\mathbf{A}$, which is expressed as
$$\mathbf{A} = \mathbf{G} + \sigma^2 \mathbf{I}_K, \tag{2}$$
where $\sigma^2$, $\mathbf{I}_K$, and $\mathbf{G} = \mathbf{H}^H\mathbf{H}$ are the noise variance, the $K \times K$ identity matrix, and the Gram matrix (Gramian), respectively.
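The quantities in (1) and (2) can be illustrated with a short NumPy sketch. The helper names `make_uplink` and `equalization_matrix`, the CN(0, 1/N) channel normalization, the QPSK symbols (used instead of 64QAM for brevity), and the SNR definition are all assumptions made for this illustration and are not taken from the paper.

```python
import numpy as np


def make_uplink(N=128, K=32, snr_db=20.0, seed=0):
    """Draw one uplink realisation y = Hx + w (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Assumption: i.i.d. CN(0, 1/N) channel entries, so that E[H^H H] = I_K.
    H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
    # Assumption: unit-energy QPSK symbols stand in for the paper's 64QAM.
    bits = rng.integers(0, 2, size=(K, 2))
    x = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
    # Assumption: SNR is defined as symbol energy over noise variance.
    sigma2 = 10.0 ** (-snr_db / 10.0)
    w = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return H, x, H @ x + w, sigma2


def equalization_matrix(H, y, sigma2):
    """Return A = G + sigma^2 I_K from (2) and the matched-filter output H^H y."""
    G = H.conj().T @ H                       # Gram matrix
    A = G + sigma2 * np.eye(H.shape[1])
    y_mf = H.conj().T @ y
    return A, y_mf


H, x, y, sigma2 = make_uplink()
A, y_mf = equalization_matrix(H, y, sigma2)
x_mmse = np.linalg.solve(A, y_mf)            # reference solution via an explicit solve

# Compare the typical size of diagonal and off-diagonal entries of A.
off_diag_mask = ~np.eye(A.shape[0], dtype=bool)
diag_mean = np.mean(np.abs(np.diag(A)))
off_mean = np.mean(np.abs(A[off_diag_mask]))
print(f"mean |diagonal| = {diag_mean:.3f}, mean |off-diagonal| = {off_mean:.3f}")
```

Under these assumptions, the diagonal entries of A are on average much larger than the individual off-diagonal entries, which is the property the iterative methods below rely on.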

Successive over Relaxation
The key property of the equalization matrix in (2) is that it is symmetric (Hermitian) positive definite; it is also diagonally dominant, which is the basis of the SOR detector to achieve the MMSE performance with a low computational complexity [26,27]. The SOR detector is an iterative method that avoids a high-dimensional matrix inversion with the support of a relaxation parameter ($\omega$) and the matrix splitting $\mathbf{A} = \mathbf{D} + \mathbf{U} + \mathbf{L}$. The transmitted signal vector is estimated as
$$\hat{\mathbf{x}}^{(n+1)} = \left(\tfrac{1}{\omega}\mathbf{D} + \mathbf{L}\right)^{-1}\left[\mathbf{y}_{\mathrm{MF}} + \left(\left(\tfrac{1}{\omega} - 1\right)\mathbf{D} - \mathbf{U}\right)\hat{\mathbf{x}}^{(n)}\right], \tag{3}$$
where $\mathbf{y}_{\mathrm{MF}} = \mathbf{H}^H\mathbf{y}$, and $\mathbf{D}$, $\mathbf{U}$, and $\mathbf{L}$ are the diagonal, the strictly upper triangular, and the strictly lower triangular parts of $\mathbf{A}$, respectively. The bit-error-rate (BER) performance and the convergence rate of the SOR detector depend substantially on the value of $\omega$. In general, the performance of the SOR detector is close to the optimum, and the complexity is relatively low. Unfortunately, it suffers from a high performance loss in high channel correlation scenarios [28]. In addition, it is always difficult to obtain the optimal value of $\omega$ [29].
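A minimal sketch of the SOR iteration follows, written for the splitting A = D + U + L with a zero starting vector. The function name `sor_detect`, the value ω = 1.1, the toy channel, and the use of `np.linalg.solve` in place of a dedicated forward substitution are assumptions of this illustration.

```python
import numpy as np


def sor_detect(A, y_mf, omega=1.1, n_iter=3):
    """Run n_iter SOR sweeps on A x = y_mf, starting from the zero vector."""
    D = np.diag(np.diag(A))
    L = np.tril(A, k=-1)                     # strictly lower triangular part
    U = np.triu(A, k=1)                      # strictly upper triangular part
    lhs = D / omega + L                      # lower triangular system matrix
    rhs_mat = (1.0 / omega - 1.0) * D - U
    x = np.zeros_like(y_mf)
    for _ in range(n_iter):
        # Forward substitution would suffice; np.linalg.solve keeps the sketch short.
        x = np.linalg.solve(lhs, y_mf + rhs_mat @ x)
    return x


# Toy 32 x 128 system (assumed CN(0, 1/N) channel, unit-energy QPSK, sigma^2 = 0.01).
rng = np.random.default_rng(1)
N, K, sigma2 = 128, 32, 0.01
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
x_true = (rng.choice([-1.0, 1.0], size=K) + 1j * rng.choice([-1.0, 1.0], size=K)) / np.sqrt(2)
w = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = H @ x_true + w

A = H.conj().T @ H + sigma2 * np.eye(K)
y_mf = H.conj().T @ y
x_ref = np.linalg.solve(A, y_mf)             # MMSE-type reference solution

for n in (1, 2, 3, 5):
    err = np.linalg.norm(sor_detect(A, y_mf, n_iter=n) - x_ref) / np.linalg.norm(x_ref)
    print(f"n = {n}: relative distance to the reference = {err:.2e}")
```

Because A is Hermitian positive definite, the SOR iteration converges for any 0 < ω < 2; how quickly it does so depends on ω and on the channel, as noted above.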

Gauss-Seidel Detector
Unlike the SOR method, the GS detector is free of $\omega$. However, it likewise depends on matrix splitting to attain a good BER performance and a low complexity. The signal estimate is refined as
$$\hat{\mathbf{x}}^{(n+1)} = (\mathbf{D} + \mathbf{L})^{-1}\left[\mathbf{y}_{\mathrm{MF}} - \mathbf{U}\hat{\mathbf{x}}^{(n)}\right]. \tag{4}$$
Compared to the SOR detector, the GS detector converges more slowly because the SOR detector has the aid of $\omega$ [30]. In addition, both methods need only a few iterations, but the inverse of a triangular matrix is needed [29]. In [31], an efficient VLSI architecture is proposed for the GS detector.
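The GS refinement can be sketched in the same way. The function name `gs_detect`, the zero starting vector, and the toy channel are assumptions of this illustration; the triangular solve is again delegated to `np.linalg.solve` for brevity.

```python
import numpy as np


def gs_detect(A, y_mf, n_iter=3):
    """Run n_iter Gauss-Seidel sweeps on A x = y_mf, starting from the zero vector."""
    lower = np.tril(A)                       # D + L (diagonal plus strictly lower part)
    U = np.triu(A, k=1)                      # strictly upper triangular part
    x = np.zeros_like(y_mf)
    for _ in range(n_iter):
        # Solving with the lower triangular matrix D + L avoids inverting A itself.
        x = np.linalg.solve(lower, y_mf - U @ x)
    return x


# Toy 32 x 128 system (assumed CN(0, 1/N) channel, unit-energy QPSK, sigma^2 = 0.01).
rng = np.random.default_rng(2)
N, K, sigma2 = 128, 32, 0.01
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
x_true = (rng.choice([-1.0, 1.0], size=K) + 1j * rng.choice([-1.0, 1.0], size=K)) / np.sqrt(2)
w = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = H @ x_true + w

A = H.conj().T @ H + sigma2 * np.eye(K)
y_mf = H.conj().T @ y
x_ref = np.linalg.solve(A, y_mf)             # MMSE-type reference solution

for n in (1, 2, 3, 5):
    err = np.linalg.norm(gs_detect(A, y_mf, n_iter=n) - x_ref) / np.linalg.norm(x_ref)
    print(f"n = {n}: relative distance to the reference = {err:.2e}")
```

The GS update is the SOR update with ω = 1, so the same code path could be shared in an implementation; it is kept separate here only to mirror the text.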

Jacobi Method
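The JA method is a simple iterative method in which $\mathbf{D}$ plays the main role in equalizing the signal as
$$\hat{\mathbf{x}}^{(n+1)} = \mathbf{D}^{-1}\left[\mathbf{y}_{\mathrm{MF}} - (\mathbf{L} + \mathbf{U})\hat{\mathbf{x}}^{(n)}\right]. \tag{5}$$
The JA iteration can obtain a near-optimal performance but requires a large number of iterations [29]. The complexity required by the JA method is lower than the complexity required by the SOR and GS methods.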
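A sketch of the Jacobi iteration is given below. The function name `jacobi_detect` and the toy channel are assumptions. The example uses a 32 × 256 configuration because the Jacobi iteration only converges when the spectral radius of D⁻¹(L + U) is below one, which is not guaranteed for every antenna-to-user ratio.

```python
import numpy as np


def jacobi_detect(A, y_mf, n_iter=5):
    """Run n_iter Jacobi sweeps on A x = y_mf, starting from the zero vector."""
    d = np.diag(A)                           # diagonal of A; D^{-1} is an elementwise division
    off = A - np.diag(d)                     # L + U
    x = np.zeros_like(y_mf)
    for _ in range(n_iter):
        x = (y_mf - off @ x) / d
    return x


# Toy 32 x 256 system (assumed CN(0, 1/N) channel, unit-energy QPSK, sigma^2 = 0.01).
rng = np.random.default_rng(3)
N, K, sigma2 = 256, 32, 0.01
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
x_true = (rng.choice([-1.0, 1.0], size=K) + 1j * rng.choice([-1.0, 1.0], size=K)) / np.sqrt(2)
w = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = H @ x_true + w

A = H.conj().T @ H + sigma2 * np.eye(K)
y_mf = H.conj().T @ y
x_ref = np.linalg.solve(A, y_mf)             # MMSE-type reference solution

# Spectral radius of the Jacobi iteration matrix; values below 1 mean convergence.
rho = np.max(np.abs(np.linalg.eigvals((A - np.diag(np.diag(A))) / np.diag(A)[:, None])))
print(f"spectral radius of D^-1 (L + U) = {rho:.3f}")

for n in (1, 5, 10, 20):
    err = np.linalg.norm(jacobi_detect(A, y_mf, n_iter=n) - x_ref) / np.linalg.norm(x_ref)
    print(f"n = {n}: relative distance to the reference = {err:.2e}")
```

Each sweep needs only one matrix-vector product and K divisions, which is consistent with the lower per-iteration cost stated above.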

Approximate Message Passing
Message passing detection (MPD) is an approach for data detection in massive MIMO in which the Gram matrix calculation is simplified based on the channel hardening theory. It is also matrix-inversion-free, and it is reported to achieve a better BER performance than the MMSE detector [32]. In [33], the convergence analysis of the MPD algorithm is presented, and it is shown that the algorithm involves many exponential computations. Therefore, the MPD is undesirable in massive MIMO because of its high complexity. Unlike the MPD, where messages are associated with edges of the factor graph, the AMP associates them with nodes. Therefore, the number of messages is significantly reduced, and so is the computational complexity [34]. It is noteworthy that the AMP algorithm is developed from the conventional belief propagation (BP) approach [35]. In the AMP algorithm, iterative thresholding is utilized to estimate the transmitted signal by minimizing the residual error in each successive iteration through the updates (6)-(8), where $\mathbf{z}^{(n+1)}$ is the residual. A common choice of the initial values is $\hat{\mathbf{x}}^{(0)} = [0 \cdots 0]$, $\alpha^{(0)} = \sigma^2$, and $\mathbf{z}^{(0)} = [0 \cdots 0]$. It is noteworthy that the AMP algorithm can be implemented with matrix-vector products, which leads to a low complexity. In [36], the performance of the AMP algorithm is evaluated and justified. In [32], a hardware implementation of the MPD algorithm is proposed in which approximate updating and serial message updating schemes are utilized to reduce the computational complexity of the MPD algorithm. The resulting algorithm is free of division and exponential operations.
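To make the structure of an AMP iteration concrete, the sketch below uses a common Gaussian-prior form of AMP: a linear shrinkage step followed by a residual update with an Onsager correction term. It is not a reproduction of the updates (6)-(8); the function name `amp_gaussian`, the variable names, the CN(0, 1/N) channel normalization, the unit symbol energy, and the starting values are assumptions of this illustration.

```python
import numpy as np


def amp_gaussian(H, y, sigma2, n_iter=10, Es=1.0):
    """Gaussian-prior AMP sketch with two matrix-vector products per iteration."""
    N, K = H.shape
    beta = K / N                              # system load
    x = np.zeros(K, dtype=complex)            # zero initial estimate
    r = y - H @ x                             # initial residual (assumed start)
    tau = beta * Es / sigma2                  # initial variance term (assumed start)
    for _ in range(n_iter):
        s2 = sigma2 * (1.0 + tau)             # effective noise variance
        z = x + H.conj().T @ r                # matrix-vector product 1
        x = Es / (Es + s2) * z                # linear (Gaussian-prior) shrinkage
        tau_next = beta * Es * s2 / (sigma2 * (Es + s2))
        # Matrix-vector product 2, plus the Onsager correction on the old residual.
        r = y - H @ x + tau_next / (1.0 + tau) * r
        tau = tau_next
    return x


# Toy 32 x 256 system (assumed CN(0, 1/N) channel, unit-energy QPSK, sigma^2 = 0.01).
rng = np.random.default_rng(4)
N, K, sigma2 = 256, 32, 0.01
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
x_true = (rng.choice([-1.0, 1.0], size=K) + 1j * rng.choice([-1.0, 1.0], size=K)) / np.sqrt(2)
w = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = H @ x_true + w

x_ref = np.linalg.solve(H.conj().T @ H + sigma2 * np.eye(K), H.conj().T @ y)
for n in (1, 2, 3, 5, 10):
    err = np.linalg.norm(amp_gaussian(H, y, sigma2, n_iter=n) - x_ref) / np.linalg.norm(x_ref)
    print(f"n = {n}: relative distance to the MMSE solution = {err:.2e}")
```

With a Gaussian prior, the fixed point of this recursion coincides with the MMSE solution, which makes it a convenient reference for studying how the starting point affects the number of iterations. A detector for 64QAM would typically replace the linear shrinkage step with a constellation-aware denoiser.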

Proposed Methods
The equalization matrix ($\mathbf{A}$) is diagonally dominant, and utilizing the diagonal matrix ($\mathbf{D}$) leads to a good convergence. In this paper, we exploit the diagonally dominant property of $\mathbf{A}$ to initialize the AMP-based detector instead of using a zero vector. In other words, $\hat{\mathbf{x}}^{(0)}$ is computed based on $\mathbf{D}$. Then, low-complexity iterative methods are utilized to estimate the signal $\hat{\mathbf{x}}^{(1)}$. Figure 2 shows the general block diagram of the proposed detector. Details and the mathematical description are given below.
Algorithms 1-3 show the proposed iterative detectors based on the AMP algorithm and the SOR, GS, and JA methods. Each of them contains the initialization, the iterations, and the final estimation. The main difference between the three of them is the initialization part.

Algorithm 1: Initialize the AMP using the SOR method
Input: $\mathbf{y}$, $\mathbf{H}$, $\sigma^2$, $n$, $\omega$
Output: Estimated signal $\hat{\mathbf{x}}$
Initialization: compute $\hat{\mathbf{x}}^{(0)}$ based on $\mathbf{D}$, then compute $\hat{\mathbf{x}}^{(1)}$ using the SOR method
Iteration:
for j = 2 : 1 : n
    Apply the AMP algorithm using (6)-(8)
end
Return $\hat{\mathbf{x}}$.
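The overall flow of Algorithms 1-3 can be sketched as follows. Several details are assumptions of this illustration rather than statements from the paper: the starting point is taken as D⁻¹ y_MF, a single SOR, GS, or JA step produces the first estimate, the AMP stage is a Gaussian-prior stand-in for (6)-(8), and the AMP residual and variance term after the warm start are set in the simple way marked in the comments. All function names are hypothetical.

```python
import numpy as np


def first_iterate(A, y_mf, method="GS", omega=1.1):
    """x^(0) from D only, then x^(1) by one JA, GS, or SOR step (assumed reading)."""
    d = np.diag(A)
    D, L, U = np.diag(d), np.tril(A, k=-1), np.triu(A, k=1)
    x0 = y_mf / d                                     # assumed x^(0) = D^{-1} y_MF
    if method == "JA":
        return (y_mf - (L + U) @ x0) / d
    if method == "GS":
        return np.linalg.solve(D + L, y_mf - U @ x0)
    if method == "SOR":
        return np.linalg.solve(D / omega + L, y_mf + ((1.0 / omega - 1.0) * D - U) @ x0)
    raise ValueError(f"unknown method: {method}")


def amp_iterations(H, y, sigma2, x_start, n_iter, Es=1.0):
    """Gaussian-prior AMP updates (stand-in for (6)-(8)) started from x_start."""
    N, K = H.shape
    beta = K / N
    # Assumption: the scalar variance term starts at its data-independent fixed point.
    tau = beta * Es / sigma2
    for _ in range(100):
        s2 = sigma2 * (1.0 + tau)
        tau = beta * Es * s2 / (sigma2 * (Es + s2))
    x = x_start.astype(complex)
    r = y - H @ x                                     # assumption: plain residual at the warm start
    for _ in range(n_iter):
        s2 = sigma2 * (1.0 + tau)
        x = Es / (Es + s2) * (x + H.conj().T @ r)
        tau_next = beta * Es * s2 / (sigma2 * (Es + s2))
        r = y - H @ x + tau_next / (1.0 + tau) * r
        tau = tau_next
    return x


def hybrid_detect(H, y, sigma2, n=3, method="GS", omega=1.1):
    """Initialization by one iterative step, then AMP for j = 2, ..., n."""
    A = H.conj().T @ H + sigma2 * np.eye(H.shape[1])
    y_mf = H.conj().T @ y
    x1 = first_iterate(A, y_mf, method, omega)
    return amp_iterations(H, y, sigma2, x1, n_iter=n - 1)


# Toy 32 x 256 system (assumed CN(0, 1/N) channel, unit-energy QPSK, sigma^2 = 0.01).
rng = np.random.default_rng(5)
N, K, sigma2 = 256, 32, 0.01
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
x_true = (rng.choice([-1.0, 1.0], size=K) + 1j * rng.choice([-1.0, 1.0], size=K)) / np.sqrt(2)
y = H @ x_true + np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

x_ref = np.linalg.solve(H.conj().T @ H + sigma2 * np.eye(K), H.conj().T @ y)
for method in ("JA", "GS", "SOR"):
    for n in (2, 3):
        err = np.linalg.norm(hybrid_detect(H, y, sigma2, n, method) - x_ref) / np.linalg.norm(x_ref)
        print(f"AMP-{method}, n = {n}: relative distance to the MMSE solution = {err:.2e}")
```

Only the `first_iterate` branch differs between the three variants, which mirrors the remark that the algorithms differ mainly in their initialization part. How the warm start is best coupled to the AMP state is a design choice, and other choices than the one sketched here are possible.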

Complexity Analysis
Multiplications and divisions are the dominant operations in the computational complexity. The AMP, SOR, GS, and JA methods can be implemented with matrix-vector multiplications. The AMP algorithm requires only two matrix-vector multiplications per iteration and is therefore implementation-friendly. The computation of $\mathbf{D}^{-1}$ requires $K$ real divisions. The numbers of multiplications required to perform the first iteration in the SOR, GS, and JA methods are $4K^2 + 4K$, $4K^2$, and $4K^2 - 2K$, respectively. Although the initialization increases the number of multiplications, this increment is trivial because only a small number of iterations is then required to obtain the target performance. Table 1 lists the number of multiplications required by each detection algorithm.
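As a quick numerical illustration of the first-iteration counts stated above, the following sketch evaluates the three expressions for a given number of users. The function name `first_iteration_multiplications` is hypothetical, and only the formulas quoted in the text are used.

```python
def first_iteration_multiplications(K):
    """Multiplications for the first SOR, GS, and JA iteration, as stated in the text."""
    return {
        "SOR": 4 * K**2 + 4 * K,
        "GS": 4 * K**2,
        "JA": 4 * K**2 - 2 * K,
    }


counts = first_iteration_multiplications(32)
for name, value in counts.items():
    print(f"{name}: {value} multiplications")
```

For K = 32 users, the expressions evaluate to 4224 (SOR), 4096 (GS), and 4032 (JA) multiplications, so the three initializations differ by less than 5% in cost.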

Numerical Results
In this paper, we utilize independent and identically distributed (i.i.d.) Gaussian channels, MIMO sizes of 32 × 128 and 32 × 256, and 64QAM modulation, and several iterations are considered to attain the target performance. The performance of the proposed initialized AMP-based detector is presented as BER versus the signal-to-noise ratio (SNR). Figure 3 illustrates the performance of the AMP detector initialized using the SOR, GS, and JA methods when n = 2 and the MIMO size is 32 × 128. The AMP detector initialized by the JA, GS, and SOR methods outperforms the conventional AMP-based detector. In other words, a large number of iterations is required to achieve the target performance if an efficient initialization is absent. For instance, at SNR = 30 dB, the BER is close to $10^{-3}$ and $10^{-2}$ for the proposed AMP-GS and the conventional AMP, respectively. When n = 3, the performance of the proposed detectors improves significantly, and it is still better than that of the original AMP detector (Figure 4). For instance, at SNR = 26 dB, the BER is $10^{-4}$ and $10^{-3}$ for the initialized AMP-based detectors (AMP-GS and AMP-SOR) and the AMP-based detector without initialization, respectively. It is also clear that all proposed initialization methods achieve a better performance than the original AMP-based detector. To study the impact of the MIMO size on the proposed algorithms, the performance for a 32 × 256 MIMO size is also presented. Figure 5 presents the performance of the 32 × 256 MIMO system when n = 2. The proposed initialized detectors outperform the original AMP detector. The BER reaches $10^{-4}$ at SNR = 24 dB for the proposed detectors and at SNR = 26 dB for the AMP detector. It is also clear that extra iterations are required for the AMP to achieve the MMSE performance. Figure 6 compares the performance-complexity profile of the proposed detectors and the original AMP detector in attaining BER = $10^{-4}$.
The initialized AMP-based detector achieves the target BER with a small number of iterations, and hence with a low computational complexity. The AMP-Jacobi (AMP-JA)-based detector achieves the target performance with the lowest computational complexity. In contrast, the AMP-based detector without initialization requires a high number of iterations, and thus a high complexity, to achieve the target performance. As shown in Section 2, the iterative methods (JA, GS, and SOR) require a small number of iterations to achieve a high performance.

Conclusions
In this paper, matrix-inversion-free methods have been exploited to initialize the AMP detector. Compared with the traditional AMP detector, the initialized detectors attain the target performance with a small number of multiplications and hence a low computational complexity. The detector based on the AMP and GS methods has the best performance over several iterations. However, the computational complexity of the AMP-JA is the lowest. In addition, the AMP-SOR and the AMP-GS have an extremely low complexity compared to the AMP algorithm. It is also shown that an efficient initialization impacts the performance and the number of iterations, and hence the computational complexity.
Recent studies show that machine learning can provide a significant improvement in image recognition and wireless communications. Recently, there has been a remarkable trend to exploit machine learning in the design of large-scale MIMO detectors, where the most expensive process is the "learning", which can be conducted off-line, although there are also many attempts to perform it online. Therefore, the proposed AMP-JA, AMP-GS, and AMP-SOR could be combined with machine learning approaches to achieve a remarkably low computational complexity and a high performance.