*Electronics* **2018**, *7*(12), 382; https://doi.org/10.3390/electronics7120382

Article

Computationally Efficient Channel Estimation in 5G Massive Multiple-Input Multiple-Output Systems

^{1} Department of Electrical Engineering, University of Engineering & Technology, Peshawar P.O.B. 814, Pakistan

^{2} School of Electrical Engineering, University of Ulsan, Ulsan 44610, Korea

^{*} Author to whom correspondence should be addressed.

Received: 9 November 2018 / Accepted: 28 November 2018 / Published: 3 December 2018

## Abstract

Traditional channel estimation algorithms such as minimum mean square error (MMSE) are widely used in massive multiple-input multiple-output (MIMO) systems, but require a matrix inversion operation and an enormous number of computations, which results in high computational complexity and makes them impractical to implement. To overcome the matrix inversion problem, we propose a computationally efficient hybrid steepest descent Gauss–Seidel (SDGS) joint detection algorithm, which directly estimates the user’s transmitted symbol vector and converges quickly to an ideal estimation value within a few simple iterations. Moreover, signal detection performance was further improved by utilizing the bit log-likelihood ratio (LLR) for soft channel decoding. Simulation results showed that the proposed algorithm had better channel estimation performance, improving signal detection by 31.68% while reducing complexity by 45.72% compared with the existing algorithms.

Keywords: 5G; massive MIMO; computational efficiency; precoding algorithms; channel estimation

## 1. Introduction

Multiple-input multiple-output (MIMO) technology is becoming increasingly mature, especially when combined with orthogonal frequency division multiplexing (OFDM) [1,2,3,4,5], and has been successfully applied in multiple wireless communications fields such as Long-Term Evolution (LTE) and LTE-Advanced. However, traditional MIMO technology can only achieve a $4\times 4$ or $8\times 8$ scale system [6], which makes it difficult to meet the explosive growth in mobile data services. Therefore, in recent years, massive MIMO has been proposed based on traditional MIMO technology [7]. Massive MIMO systems configure up to hundreds of antennas at the base station to serve multiple single-antenna end-users simultaneously [8], which can improve spectrum utilization and power utilization in wireless communications systems by two to three orders of magnitude [9,10,11]. Massive MIMO has become one of the most promising enabling technologies and one of the hottest research directions in 5G [12]. The maximum likelihood (ML) algorithm is the optimal massive MIMO detection algorithm, but its computational complexity increases exponentially with the number of system antennas and the modulation order of baseband signals, which makes a fast and effective implementation difficult in practical applications. Linear detection methods, such as the zero-forcing (ZF) algorithm and the minimum mean square error (MMSE) algorithm, can achieve near-optimal detection performance in massive MIMO systems. Their complexity is greatly reduced compared with that of the ML algorithm, but they introduce a high-dimensional matrix inversion operation, so a low-cost and efficient engineering implementation is still a problem to be solved.
To address this problem, many simplified algorithms based on the MMSE detection scheme have been proposed in recent years. They can be roughly divided into three types: series expansion approximation methods [13,14], iterative approximation methods [15,16], and gradient-based search methods for an approximate solution [17,18,19,20]. The authors in [13] proposed using the Neumann series expansion to approximate the inverse of the MMSE filter matrix, but when the number of expansion terms was increased $(i>2)$, the computational complexity was still high, equal to or even exceeding that of the exact MMSE. Reducing the complexity of the filter matrix inversion in this way also costs a large degree of detection performance. The authors in [14] applied the Newton algorithm derived from the first-order Taylor series expansion (similar to the Neumann series expansion) to massive MIMO signal detection, and used the iterative method to improve the estimation accuracy of the MMSE filter matrix inverse. However, in terms of detection performance and computational complexity, the algorithm based on the Newton iteration offered no clear advantage. Both series expansion-based algorithms above must first approximate the inverse matrix and then use it to estimate the signal vector sent by the user. In contrast, some iterative algorithms based on solving linear equations, such as the Richardson iterative (RI) algorithm [15] and the successive over-relaxation (SOR) algorithm [16], exploit the fact that the MMSE filter matrix is symmetric positive definite. By solving linear equations, they directly estimate the transmitted vector, thus avoiding the inversion of high-dimensional matrices.

The RI and SOR algorithms mentioned above have lower computational complexity at a fixed number of iterations, but RI converges slowly and requires more iterations to meet a given detection performance requirement. Although the detection performance of SOR is very good, its internal iterative structure means the algorithm cannot be implemented in parallel in practical applications. The third type of algorithm is mainly designed and implemented based on the idea of a matrix gradient, including the conjugate-gradient (CG) method [17] and the steepest descent (SD) method [18]. This type of algorithm uses a matrix gradient search and avoids the high-dimensional matrix inversion problem. Compared to the series expansion methods, the CG and SD algorithms greatly improve detection performance, but calculating the matrix gradient after each iteration also increases complexity.

In this paper, a low-complexity joint detection algorithm is proposed. The SD algorithm has a good convergence direction at the beginning of the iteration, so the low-complexity Gauss–Seidel (GS) algorithm mentioned in [19] is combined with the SD method (the combination is called SDGS). SD provides an effective search direction for the GS iterations, which speeds up convergence and improves the detection performance. Furthermore, to apply the algorithm to soft-output detection, an approximate calculation method is given for the bit log-likelihood ratio (LLR) at the channel decoder input. A good compromise between detection performance and computational complexity is achieved.

The rest of this paper is organized as follows: Section 2 discusses the system model and analytical derivations, Section 3 explains signal detection, while Section 4 explains the mixed iterative algorithm and the proposed algorithm. Section 5 provides the simulation results, while Section 6 concludes the paper.

## 2. System Model

The research object considered in this paper was the uplink for a massive MIMO system consisting of a base station equipped with $\mathit{N}$ antennas and $\mathit{K}$ single-antenna users where $\mathit{N}\gg \mathit{K}$, as shown in Figure 1. Let ${\mathit{s}}_{\mathit{c}}={\left[{\mathit{s}}_{\mathbf{1}},{\mathit{s}}_{\mathbf{2}},\dots ,{\mathit{s}}_{\mathit{K}}\right]}^{\mathit{T}}$ denote the $\mathit{K}\times \mathbf{1}$ dimensional symbol vector sent by all users simultaneously, where ${\mathit{s}}_{\mathit{k}}\in \mathit{\epsilon}$ was the transmitted symbol from the $\mathit{k}th$ user, and $\mathit{\epsilon}$ was the modulation symbol set.

Let ${\mathit{H}}_{\mathit{c}}\in {\mathbb{C}}^{\mathit{N}\times \mathit{K}}$ represent the Rayleigh fading channel matrix; then, the $\mathit{N}\times \mathbf{1}$ dimensional signal vector received by the base station could be recorded as:

$${\mathit{y}}_{\mathit{c}}={\mathit{H}}_{\mathit{c}}{\mathit{s}}_{\mathit{c}}+{\mathit{n}}_{\mathit{c}}\tag{1}$$

where ${\mathit{n}}_{\mathit{c}}$ represented an $\mathit{N}\times \mathbf{1}$ dimensional additive white Gaussian noise (AWGN) vector with a mean of 0 and a covariance matrix of ${\mathit{\sigma}}^{\mathbf{2}}{\mathit{I}}_{\mathit{N}}$. Converting the complex model of Equation (1) into an equivalent real model gave:

$$\mathit{y}=\mathit{H}\mathit{s}+\mathit{n}\tag{2}$$

where $\mathit{s}\in {\mathbb{R}}^{\mathbf{2}\mathit{K}}$, $\mathit{H}\in {\mathbb{R}}^{\mathbf{2}\mathit{N}\times \mathbf{2}\mathit{K}}$, $\mathit{y}\in {\mathbb{R}}^{\mathbf{2}\mathit{N}}$, and $\mathit{n}\in {\mathbb{R}}^{\mathbf{2}\mathit{N}}$, which were:

$$\mathit{H}=\left[\begin{array}{cc}\Re \left({\mathit{H}}_{\mathit{c}}\right)& -\Im \left({\mathit{H}}_{\mathit{c}}\right)\\ \Im \left({\mathit{H}}_{\mathit{c}}\right)& \Re \left({\mathit{H}}_{\mathit{c}}\right)\end{array}\right],\quad \mathit{y}=\left[\begin{array}{c}\Re \left({\mathit{y}}_{\mathit{c}}\right)\\ \Im \left({\mathit{y}}_{\mathit{c}}\right)\end{array}\right],\quad \mathit{s}=\left[\begin{array}{c}\Re \left({\mathit{s}}_{\mathit{c}}\right)\\ \Im \left({\mathit{s}}_{\mathit{c}}\right)\end{array}\right],\quad \mathit{n}=\left[\begin{array}{c}\Re \left({\mathit{n}}_{\mathit{c}}\right)\\ \Im \left({\mathit{n}}_{\mathit{c}}\right)\end{array}\right]\tag{3}$$

Here, $\Re (\cdot)$ and $\Im (\cdot)$ indicated the real part and imaginary part, respectively.
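The complex-to-real conversion of Equation (3) can be checked numerically. The following is a minimal sketch in plain Python, assuming a toy noiseless channel with arbitrary values; the helper names `to_real_model` and `matvec` are hypothetical and not part of the original work.

```python
def to_real_model(H_c, y_c):
    """Stack real and imaginary parts as in Equation (3).

    H_c: N x K list of lists of complex numbers.
    y_c: length-N list of complex numbers.
    Returns the 2N x 2K real matrix H and the length-2N real vector y.
    """
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H_c]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H_c]
    y = [v.real for v in y_c] + [v.imag for v in y_c]
    return top + bottom, y


def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Toy noiseless example with N = 3 antennas and K = 2 users (arbitrary values).
H_c = [[1 + 2j, 0.5 - 1j],
       [-0.3 + 0.7j, 2 + 0j],
       [0.1 - 0.4j, -1 + 1j]]
s_c = [1 + 1j, -1 + 1j]
y_c = matvec(H_c, s_c)  # complex model, Equation (1) without noise

H, y = to_real_model(H_c, y_c)
s = [v.real for v in s_c] + [v.imag for v in s_c]
y_from_real_model = matvec(H, s)  # real model, Equation (2) without noise
```

Both models describe the same received signal, so `y` and `y_from_real_model` should agree up to floating-point rounding.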

#### 2.1. Minimum Mean Square Error Signal Detection

The main task of signal detection was to accurately determine user transmission vector $s$ at the base station from received signal vector $y$. The transmitted signal vector $\widehat{s}$ detected by the MMSE algorithm could be expressed as:

$$\widehat{s}={\left({H}^{H}H+{\sigma}^{2}{I}_{2K}\right)}^{-1}{H}^{H}y={W}^{-1}\widehat{y}\tag{4}$$

where $\widehat{y}={H}^{H}y$. The filter matrix $W$ of the MMSE detector could be expressed as:

$$W=G+{\sigma}^{2}{I}_{2K}\tag{5}$$

where $G={H}^{H}H$ was the Gram matrix. In massive MIMO systems, the computational complexity of computing ${W}^{-1}$ is $O\left({K}^{3}\right)$, which makes the MMSE algorithm very costly to implement.
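As a reference point for the low-complexity methods discussed later, the exact MMSE estimate of Equation (4) can be sketched as follows. This is an illustrative sketch only: it solves $W\widehat{s}=\widehat{y}$ by Gaussian elimination instead of forming ${W}^{-1}$ explicitly (the cost is cubic either way), the matrix values are arbitrary, and the function names `solve` and `mmse_detect` are hypothetical.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def mmse_detect(H, y, sigma2):
    """Return (s_hat, W, y_hat) for the real-valued model y = H s + n."""
    Ht = [list(col) for col in zip(*H)]  # transpose (H is real here)
    G = [[sum(a * b for a, b in zip(ri, rj)) for rj in Ht] for ri in Ht]  # Gram matrix
    n = len(G)
    W = [[G[i][j] + (sigma2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    y_hat = matvec(Ht, y)  # matched-filter output
    return solve(W, y_hat), W, y_hat


# Toy real-valued example (arbitrary values, noiseless observation).
H = [[1.0, 0.2], [0.1, 1.5], [-0.4, 0.3], [0.7, -0.6]]
s = [1.0, -1.0]
y = matvec(H, s)
s_hat, W, y_hat = mmse_detect(H, y, sigma2=0.1)
```

With `sigma2=0.0` and a noiseless observation, the same routine reduces to zero forcing and should return the transmitted vector.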

#### 2.2. Log Likelihood Ratio Calculation

Various channel coding techniques are commonly employed in wireless communication systems to improve their error performance, since channel reliability can be used to improve system stability. Conventional MIMO system signal detection generally uses a hard decision method to directly execute symbol decisions on the estimated value of the user-transmitted signal vector, i.e., $\widehat{s}$ in Equation (4). In order to output the soft detection information to the back end of the detector, after the MMSE detector estimates $\widehat{s}$, the LLR soft information used for channel decoding could be calculated with the following method. First, we needed to restore the estimated $\widehat{s}$ and the calculated ${W}^{-1}$ to the equivalent complex field to get ${\widehat{s}}_{c}$ and ${W}_{c}^{-1}$. Let $U={W}_{c}^{-1}{G}_{c}={W}_{c}^{-1}{H}_{c}^{H}{H}_{c}$ denote the equalized channel matrix. The equalized signals obtained by the MMSE filter matrix could be obtained from Equations (2) and (4) as follows:

$$\begin{aligned}{\widehat{s}}_{c}&={W}_{c}^{-1}{G}_{c}{s}_{c}+{W}_{c}^{-1}{H}_{c}^{H}{n}_{c}\\ &=U{s}_{c}+{W}_{c}^{-1}{H}_{c}^{H}{n}_{c}\end{aligned}\tag{6}$$

Then, the estimated value of the symbol transmitted by the $i\mathrm{th}$ user is ${\widehat{s}}_{c,i}={\mu}_{i}{s}_{c,i}+{e}_{i}$, where ${\mu}_{i}={\left[U\right]}_{ii}={U}_{ii}$ represented the effective channel gain after equalization, and ${e}_{i}$ represented the noise plus interference (NPI) term contained in ${\widehat{s}}_{c,i}$. The NPI variance was ${v}_{i}^{2}={\displaystyle \sum}_{j\ne i}^{K}{\left|{U}_{ji}\right|}^{2}+{E}_{ii}{\sigma}^{2}$, where ${U}_{ji}$ and ${E}_{ii}$ represented the $\left(j,i\right)\mathrm{th}$ element of the matrix $U$ and the $i\mathrm{th}$ diagonal element of the matrix $E$, respectively, with $E={W}_{c}^{-1}{H}_{c}^{H}{\left({W}_{c}^{-1}{H}_{c}^{H}\right)}^{H}={W}_{c}^{-1}{G}_{c}{W}_{c}^{-1}$. Using the max-log approximation given in [11], the LLR ${L}_{i,b}$ corresponding to the $b\mathrm{th}$ bit transmitted by the $i\mathrm{th}$ user was expressed as:

$${L}_{i,b}={{\rm Y}}_{i}\left(\underset{a\in {O}_{b}^{0}}{\mathrm{min}}{\left|\frac{{\widehat{s}}_{c,i}}{{\mu}_{i}}-a\right|}^{2}-\underset{{a}^{\prime}\in {O}_{b}^{1}}{\mathrm{min}}{\left|\frac{{\widehat{s}}_{c,i}}{{\mu}_{i}}-{a}^{\prime}\right|}^{2}\right)\tag{7}$$

where ${{\rm Y}}_{i}={\mu}_{i}^{2}/{v}_{i}^{2}$ represented the signal-to-interference plus noise ratio (SINR), and ${O}_{b}^{0}$ and ${O}_{b}^{1}$ represented the modulation symbol sets with the $b\mathrm{th}$ bit being 0 and 1, respectively.
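The max-log LLR of Equation (7) can be illustrated for a single user. The sketch below assumes a Gray-mapped QPSK constellation, which is a hypothetical choice for illustration (the modulation used in the simulations is not restated here), and the function name `max_log_llrs` is likewise hypothetical.

```python
import math

# Hypothetical Gray-mapped QPSK: bit pair (b0, b1) -> unit-energy symbol.
QPSK = {
    (0, 0): complex(1, 1) / math.sqrt(2),
    (0, 1): complex(1, -1) / math.sqrt(2),
    (1, 0): complex(-1, 1) / math.sqrt(2),
    (1, 1): complex(-1, -1) / math.sqrt(2),
}


def max_log_llrs(s_hat, mu, sinr, constellation=QPSK):
    """Evaluate Equation (7) for every bit position of one user's symbol.

    s_hat: equalized complex symbol estimate of the user.
    mu:    effective channel gain after equalization.
    sinr:  post-equalization SINR, mu**2 / v**2.
    """
    z = s_hat / mu  # remove the equalization bias
    num_bits = len(next(iter(constellation)))
    llrs = []
    for b in range(num_bits):
        # Squared distance to the nearest symbol whose b-th bit is 0 or 1.
        d0 = min(abs(z - a) ** 2 for bits, a in constellation.items() if bits[b] == 0)
        d1 = min(abs(z - a) ** 2 for bits, a in constellation.items() if bits[b] == 1)
        llrs.append(sinr * (d0 - d1))
    return llrs


# An estimate close to the symbol that carries bits (1, 0).
llrs = max_log_llrs(s_hat=0.9 * complex(-0.6, 0.8), mu=0.9, sinr=10.0)
```

With the sign convention of Equation (7), a positive value means the nearest symbol with that bit equal to 1 is closer than the nearest symbol with that bit equal to 0, so the example yields a positive LLR for the first bit and a negative LLR for the second.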

## 3. Low Complexity Signal Detection

#### 3.1. Neumann Series Expansion

In a massive MIMO system, the MMSE signal detection algorithm involves a high-dimensional matrix inversion, ${W}^{-1}$, with a computational complexity of $O\left({K}^{3}\right)$. In order to reduce the computational complexity of ${W}^{-1}$, the authors in [11] proposed using Neumann series expansion to approximate the matrix inverse. When an invertible matrix $X$ approximates $W$ and satisfies

$$\underset{n\to \infty}{\mathrm{lim}}{\left(I-{X}^{-1}W\right)}^{n}=0\tag{8}$$

then ${W}^{-1}$ can be expressed by the Neumann series as

$${W}^{-1}={\mathsf{\Sigma}}_{n=0}^{\infty}{\left({X}^{-1}\left(X-W\right)\right)}^{n}{X}^{-1}\tag{9}$$

Decompose the matrix as $W=D+E$, where $D$ is the diagonal part of $W$, and $E$ is the hollow (off-diagonal) matrix corresponding to $W$. Since the number of antennas equipped at the base station was much larger than the number of single-antenna users $\left(N\gg K\right)$, matrix $W$ is diagonally dominant [3]; that is, $W\approx D$. Substituting $D$ for $X$ in Equation (9) gives:

$${W}^{-1}={\mathsf{\Sigma}}_{n=0}^{\infty}{\left(-{D}^{-1}E\right)}^{n}{D}^{-1}\tag{10}$$

When $\underset{n\to \infty}{\mathrm{lim}}{\left(-{D}^{-1}E\right)}^{n}=0$, the series in Equation (10) converges. If we keep only the first $i$ terms of the Neumann series, we get:

$${W}_{i}^{-1}={\mathsf{\Sigma}}_{n=0}^{i-1}{\left(-{D}^{-1}E\right)}^{n}{D}^{-1}\tag{11}$$

When the value of $i$ is small, the Neumann series expansion can approximate ${W}^{-1}$ with lower complexity. For example, when $i=2$, ${W}_{2}^{-1}={D}^{-1}-{D}^{-1}E{D}^{-1}$, which has a complexity of only $O\left({K}^{2}\right)$.
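The truncated expansion of Equation (11) can be tried on a small diagonally dominant matrix. The following is a minimal sketch with arbitrary matrix values; `neumann_inverse` and `inverse_error` are hypothetical helper names, and the error measure (largest absolute entry of ${W}_{i}^{-1}W-I$) is chosen only for illustration.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def neumann_inverse(W, terms):
    """Approximate W^{-1} with the first `terms` terms of Equation (11)."""
    n = len(W)
    d_inv = [1.0 / W[k][k] for k in range(n)]
    D_inv = [[d_inv[r] if r == c else 0.0 for c in range(n)] for r in range(n)]
    # T = -D^{-1} E, where E is W with its diagonal set to zero.
    T = [[0.0 if r == c else -d_inv[r] * W[r][c] for c in range(n)] for r in range(n)]
    power = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]  # T^0
    approx = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        term = matmul(power, D_inv)  # T^n D^{-1}
        approx = [[a + t for a, t in zip(ra, rt)] for ra, rt in zip(approx, term)]
        power = matmul(power, T)
    return approx


def inverse_error(W, W_inv_approx):
    """Largest absolute entry of (W_inv_approx @ W - I)."""
    n = len(W)
    P = matmul(W_inv_approx, W)
    return max(abs(P[r][c] - (1.0 if r == c else 0.0))
               for r in range(n) for c in range(n))


# Small symmetric, diagonally dominant example (arbitrary values).
W = [[4.0, 1.0, 0.5],
     [1.0, 5.0, 1.0],
     [0.5, 1.0, 6.0]]
errors = [inverse_error(W, neumann_inverse(W, i)) for i in (1, 2, 3)]
```

For this matrix the error shrinks as more terms are kept, at the price of one extra matrix product per term, which is what drives the complexity back toward that of exact inversion.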

#### 3.2. Gauss–Seidel Algorithm

In the Neumann series expansion algorithm, when the number of expansion terms $i\ge 3$, the computational complexity is still $O\left({K}^{3}\right)$, which equals or even exceeds the complexity of the exact inverse calculation of the MMSE filter matrix. Unlike the Neumann series expansion, which approximates ${W}^{-1}$, the GS algorithm [19] can solve N-dimensional linear equations of the form $Ax=b$ without inverting the matrix, where matrix $A$ is an $N\times N$ dimensional symmetric positive definite matrix, $x$ is the $N\times 1$ dimensional solution vector, and $b$ is the $N\times 1$ dimensional measurement vector. Decomposing matrix $A$ into a diagonal matrix, ${D}_{A}$, a strictly lower triangular matrix, ${L}_{A}$, and a strictly upper triangular matrix, ${L}_{A}^{H}$, the GS algorithm can estimate $x$ by the following iterative method:

$${\widehat{x}}^{\left(i\right)}={\left({D}_{A}+{L}_{A}\right)}^{-1}\left(b-{L}_{A}^{H}{\widehat{x}}^{\left(i-1\right)}\right)\tag{12}$$

where $i=1,2,\dots $ is the iteration index of the GS algorithm. In a massive MIMO system, when the number of base station antennas is much larger than the number of single-antenna users $\left(N\gg K\right)$, the individual column vectors of the uplink channel matrix $H$ are asymptotically orthogonal [20], and $W=G+{\sigma}^{2}{I}_{2K}$ is a symmetric positive definite matrix. Similarly, $W$ can be decomposed into:

$$W=\left(D+L+{L}^{H}\right)\tag{13}$$

where $D$, $L$, and ${L}^{H}$ are the diagonal part, the strictly lower triangular part, and the strictly upper triangular part of $W$, respectively. The GS algorithm thus avoids inverting the high-dimensional matrix and directly estimates the transmitted signal vector $\widehat{s}$:

$${\widehat{s}}^{\left(i\right)}={\left(D+L\right)}^{-1}\left(\widehat{y}-{L}^{H}{\widehat{s}}^{\left(i-1\right)}\right)\tag{14}$$

where ${\widehat{s}}^{\left(0\right)}$ represents the initial solution and is usually set to a zero vector.
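The GS update of Equation (14) never forms ${\left(D+L\right)}^{-1}$; it can be carried out entry by entry with in-place updates. A minimal sketch, assuming a small symmetric positive definite matrix with arbitrary values (the function name `gauss_seidel` and the optional `s0` argument are hypothetical):

```python
def gauss_seidel(W, y_hat, iterations, s0=None):
    """Run GS iterations of Equation (14) on W s = y_hat.

    Updating s in place is equivalent to applying (D + L)^{-1}: entries
    with index below m already hold the new iterate, while entries above
    m still hold the previous one (the L^H part).
    """
    n = len(y_hat)
    s = [0.0] * n if s0 is None else list(s0)  # zero initial solution by default
    for _ in range(iterations):
        for m in range(n):
            lower = sum(W[m][k] * s[k] for k in range(m))         # new entries
            upper = sum(W[m][k] * s[k] for k in range(m + 1, n))  # old entries
            s[m] = (y_hat[m] - lower - upper) / W[m][m]
    return s


# Small symmetric positive definite example (arbitrary values).
W = [[4.0, 1.0, 0.5],
     [1.0, 5.0, 1.0],
     [0.5, 1.0, 6.0]]
s_true = [1.0, -1.0, 1.0]
y_hat = [sum(w * s for w, s in zip(row, s_true)) for row in W]


def max_error(s):
    return max(abs(a - b) for a, b in zip(s, s_true))


err_1 = max_error(gauss_seidel(W, y_hat, 1))
err_2 = max_error(gauss_seidel(W, y_hat, 2))
err_30 = max_error(gauss_seidel(W, y_hat, 30))
```

Each iteration costs on the order of the number of entries of $W$, and for this example the error drops with every additional iteration.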

## 4. Proposed Algorithm

#### 4.1. Hybrid Iterative Algorithm Structure

The SD algorithm based on matrix gradient search can have a good convergence direction at the beginning of the iteration [18], while the GS iterative algorithm has lower complexity. Therefore, using the above characteristics, this paper proposed a hybrid iteration of the joint SD and GS algorithm. The joint algorithm (called the SDGS algorithm) speeds up convergence without increasing the complexity, and achieves error performance close to that of MMSE detection with exact matrix inversion. The steps are listed in the SDGS Algorithm.

**SDGS Algorithm**

**Step 1:** For the diagonal approximation’s initial value setting, Equation (4) can be converted to $W\widehat{s}=\widehat{y}$; $W$ is a symmetric positive definite matrix and a diagonally dominant matrix, so ${W}^{-1}$ is also a diagonally dominant matrix.

**Step 2:** Determine the initial solution using ${D}^{-1}$ instead of ${W}^{-1}$:

$${s}^{\left(0\right)}={D}^{-1}\widehat{y}\tag{15}$$

Since $D$ is a diagonal matrix, calculating ${D}^{-1}$ requires only low complexity. The value ${s}^{\left(0\right)}$ from Equation (15) is used as the initial value of the SD algorithm.

**Step 3:** The first iteration is performed by the SD algorithm and the second by the GS algorithm. The result of the second (GS) iteration can be expressed as:

$$\begin{aligned}{s}^{\left(2\right)}&={\left(D+L\right)}^{-1}\left(\widehat{y}-{L}^{H}{s}^{\left(1\right)}\right)={\left(D+L\right)}^{-1}\left[\left(\left(D+L\right)-W\right){s}^{\left(1\right)}+\widehat{y}\right]\\ &={s}^{\left(1\right)}+{\left(D+L\right)}^{-1}\left(\widehat{y}-W{s}^{\left(1\right)}\right)={s}^{\left(1\right)}+{\left(D+L\right)}^{-1}{r}^{\left(1\right)}\end{aligned}\tag{16}$$

where

$${r}^{\left(1\right)}=\widehat{y}-W\left({s}^{\left(0\right)}+u{r}^{\left(0\right)}\right)=\widehat{y}-W{s}^{\left(0\right)}-uW{r}^{\left(0\right)}={r}^{\left(0\right)}-u{p}^{\left(0\right)};\quad u=\frac{{\left({r}^{\left(0\right)}\right)}^{H}{r}^{\left(0\right)}}{{\left({p}^{\left(0\right)}\right)}^{H}{r}^{\left(0\right)}}\tag{17}$$

with ${r}^{\left(0\right)}=\widehat{y}-W{s}^{\left(0\right)}$ and ${p}^{\left(0\right)}=W{r}^{\left(0\right)}$.

**Step 4:** Combine the single SD and GS iterations into one hybrid iteration by substituting Equation (17) and ${s}^{\left(1\right)}={s}^{\left(0\right)}+u{r}^{\left(0\right)}$ into Equation (16):

$${s}^{\left(2\right)}={s}^{\left(0\right)}+u{r}^{\left(0\right)}+{\left(D+L\right)}^{-1}\left({r}^{\left(0\right)}-u{p}^{\left(0\right)}\right)\tag{18}$$

Equation (18) represents the first two iterations; update the mixed iteration value ${\widehat{s}}^{\left(1\right)}={s}^{\left(2\right)}$, and then perform the next GS iteration.

**Step 5:** Continue with GS iterations according to Equation (14); the ideal estimated value ${\widehat{s}}^{\left(i\right)}$ of the transmitted signal vector $s$ can be obtained by setting an appropriate number of iterations, $i$:

$${\widehat{s}}^{\left(i\right)}={\left(D+L\right)}^{-1}\left(\widehat{y}-{L}^{H}{\widehat{s}}^{\left(i-1\right)}\right)\tag{19}$$

Then, ${\widehat{s}}^{\left(i\right)}$ is converted back to the complex domain for the subsequent soft decision. In this way, the hybrid iterative algorithm can converge very quickly after a small number of iterations.
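Steps 1 to 5 can be sketched end to end as follows. This is an illustrative reading of Equations (15) to (19), not the authors' implementation: the matrix values are arbitrary, the function names are hypothetical, and the convention that `iterations` counts the hybrid iteration as the first one is an assumption.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def forward_solve(W, rhs):
    """Solve (D + L) x = rhs, where D + L is the lower triangle of W."""
    x = []
    for m in range(len(rhs)):
        acc = sum(W[m][k] * x[k] for k in range(m))
        x.append((rhs[m] - acc) / W[m][m])
    return x


def sdgs_detect(W, y_hat, iterations):
    """Hybrid SD + GS estimate of s in W s = y_hat (iterations >= 1)."""
    n = len(y_hat)
    # Step 2: diagonal initial solution, Equation (15).
    s0 = [y_hat[m] / W[m][m] for m in range(n)]
    # Steps 3-4: one SD step merged with one GS step, Equations (17)-(18).
    r0 = [b - v for b, v in zip(y_hat, matvec(W, s0))]
    p0 = matvec(W, r0)
    denom = dot(p0, r0)
    u = dot(r0, r0) / denom if denom > 0 else 0.0  # guard: s0 already exact
    r1 = [r - u * p for r, p in zip(r0, p0)]
    correction = forward_solve(W, r1)
    s = [a + u * b + c for a, b, c in zip(s0, r0, correction)]
    # Step 5: remaining plain GS iterations, Equation (19).
    for _ in range(iterations - 1):
        rhs = [y_hat[m] - sum(W[m][k] * s[k] for k in range(m + 1, n))
               for m in range(n)]
        s = forward_solve(W, rhs)
    return s


# Small symmetric positive definite example (arbitrary values).
W = [[4.0, 1.0, 0.5],
     [1.0, 5.0, 1.0],
     [0.5, 1.0, 6.0]]
s_true = [1.0, -1.0, 1.0]
y_hat = matvec(W, s_true)

err_first = max(abs(a - b) for a, b in zip(sdgs_detect(W, y_hat, 1), s_true))
err_final = max(abs(a - b) for a, b in zip(sdgs_detect(W, y_hat, 30), s_true))
```

In this toy case the single hybrid iteration already lands close to the true vector, and the subsequent GS iterations refine it further.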

#### 4.2. Approximate Log-Likelihood Ratio Calculation

The low-complexity MMSE signal detection algorithms described in [13,14,15,16] directly estimate the transmitted signal vector $\widehat{\mathit{s}}$ without calculating ${\mathit{W}}^{-\mathbf{1}}$. The exact calculation of the LLR for the channel decoder input, described in Section 2.2, uses the exact matrix inverse ${\mathit{W}}^{-\mathbf{1}}$. From Equation (7), it is not difficult to see that calculating the LLR ${\mathit{L}}_{\mathit{i},\mathit{b}}$ of the $\mathit{b}th$ bit transmitted by the $\mathit{i}th$ user requires the inverse ${\mathit{W}}^{-\mathbf{1}}$ of the MMSE detector filter matrix $\mathit{W}$ again, in order to calculate the SINR of the $\mathit{i}th$ user. To avoid this, we use the diagonal dominance of $\mathit{W}$ to replace ${\mathit{W}}^{-\mathbf{1}}$ with ${\mathit{D}}^{-\mathbf{1}}$, that is, ${\tilde{\mathit{W}}}^{-\mathbf{1}}\approx {\mathit{D}}^{-\mathbf{1}}$, and then convert it to the complex domain to get ${\tilde{\mathit{W}}}_{\mathit{c}}^{-\mathbf{1}}$. The approximate channel gain and NPI variance are then expressed as:

$${\tilde{\mathit{\mu}}}_{\mathit{i}}={\tilde{\mathit{U}}}_{\mathit{i}\mathit{i}}\tag{20}$$

$${\tilde{\mathit{v}}}_{\mathit{i}}^{\mathbf{2}}={\displaystyle \sum}_{\mathit{j}\ne \mathit{i}}^{\mathit{K}}{\left|{\tilde{\mathit{U}}}_{\mathit{j}\mathit{i}}\right|}^{\mathbf{2}}+{\tilde{\mathit{E}}}_{\mathit{i}\mathit{i}}{\mathit{\sigma}}^{\mathbf{2}}\tag{21}$$

where $\tilde{\mathit{U}}\approx {\tilde{\mathit{W}}}_{\mathit{c}}^{-\mathbf{1}}{\mathit{G}}_{\mathit{c}}={\mathit{D}}_{\mathit{c}}^{-\mathbf{1}}{\mathit{G}}_{\mathit{c}}$, and $\tilde{\mathit{E}}\approx {\tilde{\mathit{W}}}_{\mathit{c}}^{-\mathbf{1}}{\mathit{G}}_{\mathit{c}}{\tilde{\mathit{W}}}_{\mathit{c}}^{-\mathbf{1}}=\tilde{\mathit{U}}{\tilde{\mathit{W}}}_{\mathit{c}}^{-\mathbf{1}}=\tilde{\mathit{U}}{\mathit{D}}_{\mathit{c}}^{-\mathbf{1}}$, so we can calculate ${\mathit{{\rm Y}}}_{\mathit{i}}={\tilde{\mathit{\mu}}}_{\mathit{i}}^{\mathbf{2}}/{\tilde{\mathit{v}}}_{\mathit{i}}^{\mathbf{2}}$.
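The diagonal approximation of Equations (20) and (21) can be sketched directly in the complex domain. The example below is a minimal illustration with an arbitrary toy channel; the function names `gram` and `approx_sinr` are hypothetical, and ${\mathit{D}}_{\mathit{c}}$ is taken as the diagonal of ${\mathit{G}}_{\mathit{c}}+{\mathit{\sigma}}^{\mathbf{2}}\mathit{I}$.

```python
def gram(H_c):
    """Complex Gram matrix G_c = H_c^H H_c for an N x K channel."""
    K = len(H_c[0])
    return [[sum(row[i].conjugate() * row[j] for row in H_c) for j in range(K)]
            for i in range(K)]


def approx_sinr(H_c, sigma2):
    """Per-user SINR from Equations (20) and (21) with W_c^{-1} ~ D_c^{-1}."""
    G = gram(H_c)
    K = len(G)
    d_inv = [1.0 / (G[i][i].real + sigma2) for i in range(K)]  # diagonal of D_c^{-1}
    U = [[d_inv[i] * G[i][j] for j in range(K)] for i in range(K)]  # U~ = D_c^{-1} G_c
    sinr = []
    for i in range(K):
        mu = U[i][i].real                  # approximate channel gain, Equation (20)
        e_ii = (U[i][i] * d_inv[i]).real   # i-th diagonal entry of E~ = U~ D_c^{-1}
        v2 = sum(abs(U[j][i]) ** 2 for j in range(K) if j != i) + e_ii * sigma2
        sinr.append(mu * mu / v2)          # Equation (21) in the denominator
    return sinr


# Toy channel with N = 4 antennas and K = 2 users (arbitrary values).
H_c = [[1 + 1j, 0.2j],
       [0.5, 1 - 0.5j],
       [-0.3j, 0.8],
       [1, -0.1 + 0.4j]]
sinr = approx_sinr(H_c, sigma2=0.1)
```

Only the diagonal of $\tilde{\mathit{E}}$ is needed, which is why this step is cheap. When the channel columns are exactly orthogonal, the interference sum vanishes and the expression reduces to the $i$th diagonal entry of ${\mathit{G}}_{\mathit{c}}$ divided by ${\mathit{\sigma}}^{\mathbf{2}}$.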

#### 4.3. Complexity Analysis

The computational complexity of the SDGS detection algorithm proposed in this paper was analyzed in terms of the number of real multiplications required. Since all linear MMSE detection algorithms and the proposed algorithm must calculate the filter matrix, $\mathit{W}=\mathit{G}+{\mathit{\sigma}}^{\mathbf{2}}{\mathit{I}}_{\mathbf{2}\mathit{K}}$, and the matched filter signal, $\widehat{\mathit{y}}={\mathit{H}}^{\mathit{H}}\mathit{y}$, only the remaining computation was analyzed, which consists of the following three parts.

#### 4.3.1. Initial Value and First Iteration Calculation

Equation (15) requires $\mathbf{2}\mathit{K}$ multiplications. The first iteration mainly calculates ${\mathit{r}}^{\left(\mathbf{0}\right)}=\widehat{\mathit{y}}-\mathit{W}{\mathit{s}}^{\left(\mathbf{0}\right)}$, ${\mathit{p}}^{\left(\mathbf{0}\right)}=\mathit{W}{\mathit{r}}^{\left(\mathbf{0}\right)}$, and the scalar $\mathit{u}$, which require $\mathbf{4}{\mathit{K}}^{\mathbf{2}}$, $\mathbf{4}{\mathit{K}}^{\mathbf{2}}$, and $\mathbf{4}\mathit{K}$ multiplications, respectively. Combining the first iteration of Equation (18), a total of $\left(\mathbf{2}{\mathit{K}}^{\mathbf{2}}+\mathbf{10}\mathit{K}\right)$ multiplications are required.

#### 4.3.2. GS Iteration

Equation (19) can be expressed as $\left(\mathit{D}+\mathit{L}\right){\widehat{\mathit{s}}}^{\left(\mathit{i}\right)}=\widehat{\mathit{y}}-{\mathit{L}}^{\mathit{H}}{\widehat{\mathit{s}}}^{\left(\mathit{i}-\mathbf{1}\right)}=\mathit{c}$. For the $\mathit{i}th$ iteration, the calculation of ${\widehat{\mathit{s}}}^{\left(\mathit{i}\right)}$ mainly comes from the following two steps. First, to obtain $\mathit{c}$, the $\mathbf{2}\mathit{K}\times \mathbf{2}\mathit{K}$ strictly upper triangular matrix ${\mathit{L}}^{\mathit{H}}$ and the $\mathbf{2}\mathit{K}\times \mathbf{1}$ vector ${\widehat{\mathit{s}}}^{\left(\mathit{i}-\mathbf{1}\right)}$ are multiplied, which requires $\left(\mathbf{2}{\mathit{K}}^{\mathbf{2}}-\mathit{K}\right)$ multiplications. Second, in Equation (19), the $\mathit{m}th$ element, ${\widehat{\mathit{s}}}_{\mathit{m}}^{\left(\mathit{i}\right)}$, can be expressed as:

$${\widehat{s}}_{m}^{\left(i\right)}=\begin{cases}\dfrac{{c}_{1}}{{L}_{11}}, & m=1\\ \dfrac{{c}_{m}-{\displaystyle \sum}_{k=1}^{m-1}{\widehat{s}}_{k}^{\left(i\right)}{L}_{mk}}{{L}_{mm}}, & m=2,\dots ,2K\end{cases}\tag{22}$$

where ${\mathit{c}}_{\mathit{m}}$ represents the $\mathit{m}th$ element of $\mathit{c}$, and ${\mathit{L}}_{\mathit{m}\mathit{k}}$ represents the element in the $\mathit{m}th$ row and $\mathit{k}th$ column of the lower triangular matrix $\left(\mathit{D}+\mathit{L}\right)$. When $\mathit{m}=\mathbf{1}$, it is obvious that ${\widehat{\mathit{s}}}_{\mathbf{1}}^{\left(\mathit{i}\right)}$ requires $\mathbf{2}\mathit{K}$ multiplications, and all ${\widehat{\mathit{s}}}_{\mathit{m}}^{\left(\mathit{i}\right)}\left(\mathit{m}=\mathbf{2},\dots ,\mathbf{2}\mathit{K}\right)$ require $\left(\mathbf{2}{\mathit{K}}^{\mathbf{2}}-\mathit{K}\right)$ multiplications, so a total of $\mathbf{2}{\mathit{K}}^{\mathbf{2}}$ multiplications are required for each iteration.

#### 4.3.3. LLR Calculation

The computational complexity of this part mainly came from the calculation of the effective channel gain and the NPI variance after equalization. It can be seen from Equations (20) and (21) that all the elements of the matrix $\tilde{\mathit{U}}$ and the diagonal elements of the matrix $\tilde{\mathit{E}}$ need to be calculated. The former requires $\mathbf{2}{\mathit{K}}^{\mathbf{2}}$ multiplications, while the latter only requires $\mathbf{2}\mathit{K}$ multiplications. Therefore, a total of $\left(\mathbf{2}{\mathit{K}}^{\mathbf{2}}+\mathbf{2}\mathit{K}\right)$ multiplications were required for this step.

In summary, the total complexity required to apply the joint iterations to the soft decision was $\mathbf{2}{\mathit{K}}^{\mathbf{2}}\left(\mathit{i}+\mathbf{2}\right)+\mathbf{12}\mathit{K}\mathit{i}$, which reduced the computational complexity by an order of magnitude compared to the traditional MMSE algorithm. For a fixed number of iterations, the complexity was kept at $\mathit{O}\left({\mathit{K}}^{\mathbf{2}}\right)$. In addition, considering the application scenarios of hard decision detection, Table 1 gave a comparison of the computational complexity of the four detection algorithms.

## 5. Simulation Results

We used Matlab (R2017a, Mathworks, Natick, MA, USA) for analysis and experimentation. In order to verify the soft and hard detection performance of the SDGS algorithm proposed in this paper, this section presents Monte Carlo simulation results. The main simulation parameters are listed in Table 2.

Figure 2 compares the bit error rate (BER) of the Neumann series (NS) expansion, the conjugate gradient (CG) detection algorithm, the Gauss–Seidel iterative detection algorithm, the MMSE exact inversion detection algorithm, and the proposed SDGS joint algorithm under different antenna configurations. The decision mode is a hard decision; that is, the estimated signal vector $\widehat{s}$ is directly judged. The simulation results showed that the detection performance of the various algorithms improved with the number of iterations or the number of terms in the Neumann series expansion. For example, with $i=2$ iterations, the BER performance of the SDGS algorithm was much better than that of the Neumann series with 2 expansion terms. Comparing Figure 2a,b shows that as the ratio of the number of base station antennas to the number of users $\left(N/K\right)$ increased, the BER performance of the various algorithms improved greatly. For example, to reach a BER of ${10}^{-3}$, the MMSE algorithm and the proposed algorithm required an SNR of about $13\mathrm{dB}$ when the antenna configuration was $64\times 16$, and only $8\mathrm{dB}$ when the configuration was $128\times 16$.

Figure 3 shows the soft decision simulation results for the two antenna configurations. The BER of the NS, CG, and GS iterative detection algorithms, the MMSE exact inversion detection algorithm, and the SDGS joint algorithm were compared. We set the system’s convolutional code rate to 1/2, and the LLR calculation used the approximate calculation method described in this paper. Simulation results showed that no matter what kind of MMSE receiver was used, the detection performance of the soft decision was much better than that of the hard decision. For example, to reach a BER of ${10}^{-4}$ for the same antenna configuration, the MMSE algorithm and the proposed algorithm required an SNR of $10\mathrm{dB}$ for hard decisions and only $5\mathrm{dB}$ for soft decisions. In addition, for the same number of iterations, the BER performance of the SDGS algorithm proposed in this paper was better than that of the other three simplified algorithms, and after a few iterations, its detection performance quickly approached that of the ideal MMSE filter matrix inversion.

Figure 4 shows the hard decision BER comparison of the proposed SDGS algorithm with NS, CG, GS, and MMSE under a high fading scenario with a $128\times 16$ antenna configuration. As can be seen from Figure 4, the BER of the proposed SDGS algorithm was better and followed the MMSE performance with increasing SNR and number of iterations. Moreover, due to the high fading impact on the SNR, there was a gap between the proposed SDGS algorithm and the MMSE algorithm at high SNR levels.

Figure 5 shows the BER comparison of the proposed SDGS algorithm with NS, CG, GS, and MMSE under a low fading level and a $128\times 16$ antenna configuration. It can be seen from Figure 5 that all the algorithms showed lower BER and better performance compared with the hard decision BER performance in Figure 2a and Figure 4. Therefore, to keep the system performance at a suitable level, the fading level and the number of iterations should be considered, since both have an obvious impact on the system’s overall performance. Furthermore, the BER performance of the proposed SDGS algorithm in Figure 5 was close to that of MMSE, which indicated that the SDGS algorithm performed better at the low fading level.

## 6. Conclusions

Signal detection methods based on MMSE filtering are widely used in massive MIMO systems, but the high complexity of matrix inversion makes them difficult to implement in practical applications. Some approximate inversion methods, such as Neumann series expansion, reduce the detection complexity, but at the cost of a large loss in detection performance; others avoid the complex matrix inversion and directly estimate the signal vector. Although their computational complexity is reduced by orders of magnitude, their detection performance needs to be improved. Based on the MMSE criterion, this paper proposes a low-complexity, hybrid, iterative SDGS joint detection algorithm, which directly estimates the user’s transmitted symbol vector and can quickly converge to obtain an ideal estimation value with a few simple iterations. The matrix inversion operation is avoided, and algorithm complexity is kept at $O\left({K}^{2}\right)$. In addition, in order to make full use of soft information, the algorithm is applied to the soft decision, and an approximate calculation method of the LLR for channel decoding is given, which further improves the signal detection performance. Theoretical derivation and simulation results show that the SDGS algorithm can be used as one of the most effective solutions for signal detection in massive MIMO systems.

## Author Contributions

I.K. conceived and designed the presented idea and developed the theory, performed the simulations, and wrote the paper. M.H.Z. analyzed the research and performed experimentations, and M.A. provided extensive technical support throughout the research. S.K. provided extensive support in the theoretical analysis and provided funding support.

## Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF-2016R1D1A1B03934653).

## Conflicts of Interest

The authors declare no conflicts of interest.

## References

- Chang, Y.K.; Ueng, F.B.; Jhang, Y.W. Turbo MIMO-OFDM Receiver in Time-Varying Channels. KSII Trans. Internet Inf. Syst. **2018**, 12, 3704–3724.
- Tseng, S.M.; Chen, Y.F. Average PSNR Optimized Cross Layer User Grouping and Resource Allocation for Uplink MU-MIMO OFDMA Video Communications. IEEE Access **2018**, 6, 50559–50571.
- Ghosh, A.; Ratasuk, R.; Mondal, B.; Mangalvedhe, N.; Thomas, T. LTE-advanced: Next-generation wireless broadband technology. IEEE Wirel. Commun. **2010**, 17, 10–22.
- Khan, I.; Zafar, M.H.; Jan, M.T.; Lloret, J.; Basheri, M.; Singh, D. Spectral and Energy Efficient Low-Overhead Uplink and Downlink Channel Estimation for 5G Massive MIMO Systems. Entropy **2018**, 20, 92.
- Rusek, F.; Persson, D.; Lau, B.K.; Larsson, E.G.; Marzetta, T.L.; Edfors, O.; Tufvesson, F. Scaling up MIMO: Opportunities and challenges with very large arrays. IEEE Signal Process. Mag. **2012**, 30, 40–60.
- Larsson, E.G.; Edfors, O.; Tufvesson, F.; Marzetta, T.L. Massive MIMO for next-generation wireless systems. IEEE Commun. Mag. **2014**, 52, 186–195.
- Marzetta, T.L. Noncooperative cellular wireless with unlimited numbers of base station antennas. IEEE Trans. Wirel. Commun. **2010**, 9, 3590–3600.
- Khan, I.; Singh, D. Efficient compressive sensing based sparse channel estimation for 5G massive MIMO systems. AEU-Int. J. Electron. Commun. **2018**, 89, 181–190.
- Ngo, H.Q.; Larsson, E.G.; Marzetta, T.L. Energy and spectral efficiency of very large multiuser MIMO systems. IEEE Trans. Commun. **2011**, 61, 1436–1449.
- Dai, L.; Wang, Z.; Yang, Z. Spectrally efficient time-frequency training OFDM for mobile large-scale MIMO systems. IEEE J. Sel. Areas Commun. **2013**, 31, 251–263.
- Benmimoune, M.; Driouch, E.; Ajib, W. Joint antenna selection in grouping in Massive MIMO Systems. In Proceedings of the 10th International Symposium on Communication Systems, Networks and Digital Signal Processing (CSNDSP), Prague, Czech Republic, 20–22 July 2016; pp. 1–6.
- Khan, I. A Robust Signal Detection Scheme for 5G Massive Multiuser MIMO Systems. IEEE TVT. **2018**, 67, 9567–9604.
- Wu, M.; Yin, B.; Wang, G.; Dick, C.; Cavallaro, J.R.; Studer, C. Large-scale MIMO detection for 3GPP LTE: Algorithms and FPGA implementations. IEEE J. Sel. Top. Signal Process. **2014**, 85, 916–929.
- Tang, C.; Liu, C.; Yuan, L.; Xing, Z. High precision low complexity matrix inversion based on Newton iteration for data detection in the massive MIMO. IEEE Commun. Lett. **2016**, 20, 490–493.
- Gao, X.; Dai, L.; Yuen, C.; Zhang, Y. Low-complexity MMSE signal detection based on Richardson method for large-scale MIMO systems. In Proceedings of the Vehicular Technology Conference, Vancouver, BC, Canada, 14–17 September 2014; pp. 1–5.
- Gao, X.; Dai, L.; Hu, Y.; Wang, Z.; Wang, Z. Matrix inversion-less signal detection using SOR method for uplink large-scale MIMO systems. In Proceedings of the IEEE Global Communications Conference, Qingdao, China, 8–12 December 2014; pp. 3291–3295.
- Hu, Y.; Wang, Z.; Gaol, X.; Ning, J. Low-complexity signal detection using the CG method for uplink large-scale MIMO systems. In Proceedings of the IEEE International Conference on Communications, Macau, China, 19–21 November 2014; pp. 477–481.
- Hageman, L.A. The Iterative Solution for Large Linear Systems; Academic Press: Manhattan, NY, USA, 1971; pp. 9–22.
- Dai, L.; Gao, X.; Su, X.; Han, S.; Chih-Lin, I.; Wang, Z. Low-complexity soft-output signal detection based on a Gauss-Seidel method for uplink multiuser large-scale MIMO systems. IEEE Trans. Veh. Technol. **2014**, 64, 4839–4845.
- Wu, M.; Yin, B.; Vosoughi, A.; Studer, C.; Cavallaro, J.R.; Dick, C. Approximate matrix inversion for high-throughput data detection in the large-scale MIMO uplink. In Proceedings of the International Symposium on Circuits & Systems, Beijing, China, 19–23 May 2013; pp. 12155–12158.

**Figure 2.** Hard decision bit error rate (BER) performance comparison. (**a**) Analysis at $64\times 16$ antenna configuration; (**b**) Analysis at $128\times 16$ antenna configuration.

**Figure 3.** Soft decision bit error rate (BER) performance comparison. (**a**) Analysis at $64\times 16$ antenna configuration; (**b**) Analysis at $128\times 16$ antenna configuration.

**Figure 4.** Hard decision bit error rate (BER) performance comparison with high fading level at $128\times 16$ antenna configuration.

**Figure 5.** Soft decision bit error rate (BER) performance comparison with low fading level at $128\times 16$ antenna configuration.

Algorithm | Real Multiplications | Performance |
---|---|---|
NS [11] | $\left(8{K}^{3}-8{K}^{2}+2K\right)\left(i+2\right)+\left(4{K}^{2}-4K\right)(i1)$ | General |
CG [15] | $4{K}^{2}\left(i+1\right)+10Ki$ | Better |
GS [17] | $4{K}^{2}i+2K$ | Better |
SDGS | $2{K}^{2}\left(i+2\right)+12Ki$ | Optimal |
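To make the expressions in the table concrete, the short sketch below evaluates the CG, GS, and SDGS rows for a given number of users $K$ and iteration count $i$. The function name `real_multiplications` and the sample values are assumptions chosen for illustration; the formulas are copied directly from the table.

```python
def real_multiplications(K, i):
    """Evaluate the real-multiplication counts listed in the table.

    K is the number of users and i the number of iterations.
    """
    return {
        "CG": 4 * K**2 * (i + 1) + 10 * K * i,
        "GS": 4 * K**2 * i + 2 * K,
        "SDGS": 2 * K**2 * (i + 2) + 12 * K * i,
    }


if __name__ == "__main__":
    # K = 16 matches the simulated user count; the iteration range is illustrative.
    for i in range(2, 6):
        print(i, real_multiplications(16, i))
```

Because the SDGS count grows by $2{K}^{2}+12K$ per extra iteration while the CG and GS counts grow by roughly $4{K}^{2}$, tabulating a few values of $i$ shows how the relative cost shifts as the iteration count increases.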

Parameter | Value |
---|---|
Number of transmitter antennas $\left(N\right)$ | 64–128 |
Number of receiver antennas $\left(K\right)$ | 16 |
Baseband modulation mode | 16 QAM |
Signal-to-noise ratio (SNR) | 10–12 dB |
Code rate | ½ |
Channel characteristics | Uncorrelated Rayleigh fading |
Number of iterations $\left(i\right)$ | 1000 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).