An Enhanced Precoder for Multi User Multiple-Input Multiple-Output Downlink Systems

Abstract: As user demand for higher data rates has grown, wireless communication systems have aimed to offer high throughput. Various techniques that promise high performance have therefore been developed, such as massive multiple-input multiple-output (MIMO). However, the need to implement a very large base station (BS) antenna array and the loss of reliability as the number of users increases are major obstacles. To mitigate these problems, this paper proposes an adaptive precoder that improves throughput and bit error rate (BER) performance in order to achieve the desired data rate in multi user (MU) MIMO downlink systems with a practical BS antenna array (up to 16 antennas). The proposed scheme is optimized with a modified minimum mean square error (MMSE) criterion to improve BER gain, and it reduces the number of data streams to obtain diversity gain at low signal to noise ratio (SNR). Simulation results show that the proposed scheme improves both BER and throughput performance.


Introduction
In the future, wireless communication systems are expected to offer various services, such as virtual reality and big data applications, and large amounts of data will travel over the network. Wireless communication systems will therefore need to provide multiple users with high reliability and throughput [1]. However, limited bandwidth makes wireless resources scarce. The multiple-input multiple-output (MIMO) system is therefore attractive because of its spectral efficiency and high throughput [2][3][4][5][6][7]. The multi user (MU) MIMO system was also introduced to provide high capacity for multiple user equipments (UEs) [8,9]. The MU MIMO system applies the concept of space division multiple access (SDMA) [10]. In a co-channel wireless system, each UE generates inter-user interference (IUI) to other UEs, and the challenge is to cancel the IUI that disrupts communication. The base station (BS) therefore estimates channel state information (CSI) and applies a precoding technique to mitigate IUI. Precoding techniques are divided into nonlinear and linear techniques. Nonlinear techniques [11][12][13] provide high performance but are impractical because of their high complexity, whereas linear techniques offer a feasible compromise.
The block diagonalization (BD) precoder is a typical linear precoder [14]. BD is well known as a precoder that carries out perfect nulling through singular value decomposition (SVD) operations. In an environment where channel correlation is high, the BD technique can eliminate IUI through SVD. Because of this advantage, improved BD precoders have been proposed in many previous studies. For instance, the author in [15] proposed a low-complexity BD precoder. However, although the received signal benefits from zero IUI, a BD precoder cannot guarantee good bit error rate (BER) performance because the noise component is disregarded. Also, the BD precoder spends most of its spatial resources on achieving perfect nulling, so poor diversity gain is obtained. Therefore, a regularized block diagonalization (RBD) precoder and a generalized block diagonalization (GBD) precoder have been proposed in [16,17]. The RBD precoder is considered attractive because taking the noise power into account leaves considerable room for performance improvement. For this reason, the author in [18] proposed an improved precoder in terms of capacity performance. However, the RBD precoder still consumes transmit antenna resources to mitigate IUI. The GBD precoder, on the other hand, offers additional diversity gain but is not optimized with the minimum mean square error (MMSE) criterion. Consequently, the GBD scheme cannot achieve additional BER gain. In other words, the RBD precoder improves BER performance by mitigating the noise effect but obtains no additional diversity gain, whereas the GBD precoder obtains diversity gain by adjusting the number of transmitted data streams but does not guarantee additional BER gain. Therefore, a scheme that obtains diversity gain and simultaneously mitigates the noise component is required to meet the increased throughput demand.
Many attempts have been made to increase throughput performance in MU MIMO. For example, the author in [19] proposed a precoder that improves throughput by combining BD with non-orthogonal multiple access (NOMA). However, this precoder uses multiple power domains, so a comparison with a conventional precoder that uses a single power domain is not appropriate. Also, this precoder does not obtain maximum throughput at all signal to noise ratios (SNRs). Thus, a precoder that uses a single power domain and maximizes throughput performance is needed.
This paper proposes a modified GBD (MGBD) precoder which provides improved BER gain by satisfying a modified MMSE criterion and additional diversity gain by adopting a partial nulling technique. The modified MMSE criterion is designed to minimize the Frobenius norm of the non-diagonal components of the equivalent channel, which cause IUI. The partial nulling scheme obtains diversity gain by utilizing degrees of freedom. The proposed scheme operates like a GBD precoder, which retains some IUI to increase diversity gain and uses postcoding to eliminate the remaining IUI. The proposed scheme is also optimized through the modified MMSE criterion, like the RBD precoder. Therefore, the MGBD precoder provides better BER performance than the GBD precoder and, by adjusting the number of streams, guarantees higher throughput than the RBD precoder in an unfavorable channel state. Additionally, this paper proposes an adaptive precoder in order to maximize throughput at all SNRs. The adaptive algorithm performs precoding by adjusting the partial nulling subset according to SNR. In other words, this precoder increases the partial nulling space at high SNRs for high throughput and decreases the partial nulling space at low SNRs for improved BER performance. Thus, UEs in the MU MIMO downlink system obtain maximum throughput and BER performance.
This paper is organized as follows. Section 2 presents the MU MIMO downlink system model. Section 3 explains conventional precoding schemes, such as RBD and GBD. In Section 4, the MGBD precoder and the adaptive MGBD algorithm are proposed. The simulation results for BER performance and throughput are presented in Section 5, which also provides the computational complexity of MGBD and RBD. Finally, Section 6 gives brief conclusions.

Multi User MIMO Downlink System Model
This paper considers a multi user MIMO downlink system model with $N_T$ transmit antennas at the base station and a total of $N_R$ receive antennas, as shown in Figure 1. In this system, the i-th UE has $N_{R_i}$ receive antennas and $N_R = \sum_{j=1}^{K} N_{R_j}$ is the total number of receive antennas, where $N_R \le N_T$. The $r$ data streams are transmitted from the base station to $K$ separate UEs via a complex Rayleigh flat fading channel. The BS precodes a data signal based on CSI to weaken IUI and transmits the precoded signal. Each UE receives a signal with mitigated IUI and postcodes the received signal to eliminate residual IUI.
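In this system model, the received signal vector $Y = [y_1^T\ y_2^T\ \cdots\ y_K^T]^T \in \mathbb{C}^{r \times 1}$ is represented as follows,
$$Y = G(H\tilde{X} + W),$$
where $\tilde{X} \in \mathbb{C}^{N_T \times 1}$ is the transmitted signal vector expressed by $\tilde{X} = FX$ and $W = [w_1^T\ w_2^T\ \cdots\ w_K^T]^T \in \mathbb{C}^{N_R \times 1}$ is the additive white Gaussian noise (AWGN) vector with independent and identically distributed (i.i.d.) components of variance $\sigma_W^2$ and zero mean.
$X = [x_1^T\ x_2^T\ \cdots\ x_K^T]^T \in \mathbb{C}^{r \times 1}$ and $F = [F_1\ F_2\ \cdots\ F_K] \in \mathbb{C}^{N_T \times r}$ are the data signal vector, which is uniformly distributed with $E[x_i x_i^H] = I_{r_i}$, and the precoding matrix, respectively. $r_i$ is the number of data streams for the i-th UE and $r$ is defined as follows,
$$r = \sum_{i=1}^{K} r_i.$$
Also, it is assumed that $E[\|\tilde{X}\|^2] = P_T$ and that $W$ is independent of the transmitted signal. The joint channel matrix is represented as
$$H = [H_1^T\ H_2^T\ \cdots\ H_K^T]^T \in \mathbb{C}^{N_R \times N_T}.$$
The decoding matrix $G \in \mathbb{C}^{r \times N_R}$, which is a block diagonal matrix of the postcoding matrices for all UEs, is defined as follows,
$$G = \mathrm{diag}(G_1, G_2, \ldots, G_K).$$
Then, the received signal of the i-th UE $y_i$ can be written as follows,
$$y_i = G_i H_i F_i x_i + G_i H_i \sum_{j=1, j \ne i}^{K} F_j x_j + G_i w_i,$$
where $H_i \in \mathbb{C}^{N_{R_i} \times N_T}$, $F_i \in \mathbb{C}^{N_T \times r_i}$, $x_i \in \mathbb{C}^{r_i \times 1}$ and $w_i \in \mathbb{C}^{N_{R_i} \times 1}$ are the Rayleigh flat fading channel matrix, precoding matrix, data signal vector and AWGN vector for the i-th UE, respectively. The first term is the desired signal, which is interfered with by the IUI signal in the second term. The equivalent single user (SU) model of the i-th UE is represented in Figure 2.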

Conventional Precoding Schemes
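In this section, the conventional BD precoder and modified BD precoders are introduced. Before the detailed description, $\tilde{H}_i$, $L$ and $\tilde{L}_i$ are defined as follows,
$$\tilde{H}_i = [H_1^T\ \cdots\ H_{i-1}^T\ H_{i+1}^T\ \cdots\ H_K^T]^T, \quad L = \mathrm{rank}(H), \quad \tilde{L}_i = \mathrm{rank}(\tilde{H}_i).$$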

Conventional BD Precoder
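The defining property of the conventional BD precoder is that the IUI signal is completely eliminated by multiplying the data signal by the precoding matrix. Thus, the precoding matrix consists of the null space of the channel matrix of the other UEs. The conventional BD precoder obtains vectors orthogonal to the IUI signal from the SVD of $\tilde{H}_i$, which is given by,
$$\tilde{H}_i = \tilde{U}_i \tilde{\Sigma}_i [\tilde{V}_i^{(1)}\ \tilde{V}_i^{(0)}]^H,$$
where $\tilde{V}_i^{(1)}$ holds the right singular vectors associated with the nonzero singular values and $\tilde{V}_i^{(0)}$ holds the right singular vectors associated with the zero singular values, which span the null space of $\tilde{H}_i$. The BD precoder uses $\tilde{V}_i^{(0)}$ as the precoding matrix $F^{BD}_i$. Consequently, the IUI signal is completely eliminated and the received signal $y_i$ can be expressed as follows,
$$y_i = G_i H_i F^{BD}_i x_i + G_i w_i.$$
By using the BD precoder, the complex MU MIMO channel turns into parallel SU MIMO channels due to zero IUI. However, the BD algorithm wastes spatial resources on accomplishing perfect nulling and does not obtain additional BER gain because it disregards the noise element.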
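The null-space construction described above can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions, not the simulation code used in this paper: the helper names `crandn` and `bd_precoders`, the rank tolerance and the random seed are hypothetical choices, and the antenna configuration simply mirrors the 3-UE, 9-antenna case considered later.

```python
import numpy as np

def crandn(rng, rows, cols):
    """i.i.d. zero-mean, unit-variance complex Gaussian entries (Rayleigh flat fading)."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

def bd_precoders(H_list, tol=1e-10):
    """Null-space precoder for each UE, built from the SVD of the other UEs' channels."""
    precoders = []
    for i in range(len(H_list)):
        # H_tilde: channels of every UE except the i-th, stacked row-wise.
        H_tilde = np.vstack([H for j, H in enumerate(H_list) if j != i])
        _, s, Vh = np.linalg.svd(H_tilde, full_matrices=True)
        rank = int(np.sum(s > tol * s[0]))
        # Rows of Vh beyond the rank are the conjugated null-space vectors.
        precoders.append(Vh[rank:].conj().T)
    return precoders

rng = np.random.default_rng(0)
K, N_Ri, N_T = 3, 3, 9  # 3 UEs, 3 receive antennas each, 9 transmit antennas
H_list = [crandn(rng, N_Ri, N_T) for _ in range(K)]
F_list = bd_precoders(H_list)

# Largest leakage of any UE's precoded signal into any other UE's channel.
max_leak = max(np.linalg.norm(H_list[j] @ F_list[i])
               for i in range(K) for j in range(K) if j != i)
print(F_list[0].shape, max_leak)
```

In this configuration each precoder has only 3 columns, because 6 of the 9 transmit dimensions are spent on nulling the other two UEs. This is the spatial cost of perfect nulling mentioned above.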

RBD Precoder
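The RBD scheme proposed in [16] was designed to optimize the BD scheme under the MMSE criterion. The author in [16] divides the precoding matrix into two parts as follows,
$$F^{RBD} = \gamma^{RBD} F^{RBD}_{MU} F^{RBD}_{SU},$$
where $\gamma^{RBD}$ is the scaling factor for RBD, $F^{RBD}_{MU}$ converts the MU channel into parallel SU channels under the MMSE criterion and $F^{RBD}_{SU}$ optimizes performance for the SU channels. The MMSE criterion for obtaining $F^{RBD}_{MU}$ is stated in Equation (11). However, the RBD algorithm does not solve Equation (11) to find $F^{RBD}_{MU}$. Instead, $F^{RBD}_{MU}$ is optimized with a modified MMSE criterion that minimizes IUI. The expression for the modified MMSE criterion is as follows,
$$F^{RBD}_{MU} = \arg\min_{F^{RBD}_{MU}} E\left\{ \sum_{i=1}^{K} \|\tilde{H}_i F^{RBD}_{MU,i}\|_F^2 + \frac{\|W\|^2}{(\gamma^{RBD})^2} \right\}. \tag{12}$$
$F^{RBD}_{MU,i}$ satisfying Equation (12) is as follows [16,20],
$$F^{RBD}_{MU,i} = \tilde{V}_i \left( \tilde{\Sigma}_i^T \tilde{\Sigma}_i + \frac{N_R \sigma_W^2}{P_T} I_{N_T} \right)^{-1/2},$$
where $\tilde{\Sigma}_i$ and $\tilde{V}_i$ are the singular value matrix and the full right singular vector matrix from the SVD of $\tilde{H}_i$, and $\sigma_W^2$ and $P_T$ are the noise power and the transmit power. The channel $HF^{RBD}_{MU}$ is thereby changed to parallel SU MIMO channels. The SVD of the i-th of these channels is given by
$$H_i F^{RBD}_{MU,i} = \bar{U}_i \bar{\Sigma}_i [\bar{V}_i^{(1)}\ \bar{V}_i^{(0)}]^H,$$
where $\bar{U}_i$, $\bar{\Sigma}_i$, $\bar{V}_i^{(1)}$ and $\bar{V}_i^{(0)}$ are the left singular vector matrix, the singular value matrix, the right singular vector matrix associated with the nonzero singular values and the right singular vector matrix associated with the zero singular values. The solution for performance optimization of SU MIMO is as follows [16,20],
$$F^{RBD}_{SU,i} = \bar{V}_i^{(1)}.$$
Consequently, the i-th UE precoding matrix for RBD is given by,
$$F^{RBD}_i = \gamma^{RBD} F^{RBD}_{MU,i} F^{RBD}_{SU,i}.$$
Although the RBD scheme provides improved performance compared to the BD scheme, the RBD precoder, like the BD precoder, does not offer additional diversity gain.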
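As a rough numerical illustration of the regularized first stage, the sketch below assumes the closed form commonly given for RBD, $F_{MU,i} = \tilde{V}_i(\tilde{\Sigma}_i^T\tilde{\Sigma}_i + \alpha I)^{-1/2}$ with $\alpha = N_R\sigma_W^2/P_T$. The function name `rbd_mu_precoder`, the noise variance and the seed are hypothetical, and the second stage and the scaling factor are omitted.

```python
import numpy as np

def rbd_mu_precoder(H_list, i, noise_var, P_T):
    """First-stage RBD precoder for UE i (assumed closed form, requires noise_var > 0)."""
    N_R = sum(H.shape[0] for H in H_list)   # total number of receive antennas
    N_T = H_list[0].shape[1]                # number of transmit antennas
    # Channels of all UEs except the i-th, stacked row-wise.
    H_tilde = np.vstack([H for j, H in enumerate(H_list) if j != i])
    _, s, Vh = np.linalg.svd(H_tilde, full_matrices=True)
    # Diagonal of Sigma^T Sigma, zero-padded to length N_T.
    sigma_sq = np.zeros(N_T)
    sigma_sq[:s.size] = s ** 2
    alpha = N_R * noise_var / P_T           # regularization term
    # V_tilde @ diag((sigma_sq + alpha)^(-1/2)): scale each column of V_tilde.
    return Vh.conj().T * (sigma_sq + alpha) ** -0.5

rng = np.random.default_rng(1)
H_list = [(rng.standard_normal((3, 9)) + 1j * rng.standard_normal((3, 9))) / np.sqrt(2)
          for _ in range(3)]
noise_var, P_T = 0.1, 1.0
F_mu = rbd_mu_precoder(H_list, 0, noise_var, P_T)

# Residual leakage of UE 0's precoder into the other UEs' channels.
H_tilde_0 = np.vstack(H_list[1:])
leak_sq = np.linalg.norm(H_tilde_0 @ F_mu) ** 2
print(F_mu.shape, leak_sq)
```

Unlike the BD precoder, the leakage here is not forced to zero. Each interference direction with singular value $\sigma_k$ is attenuated by $\sigma_k/\sqrt{\sigma_k^2+\alpha}$, so the precoder leaves more residual IUI in a direction when the noise power dominates that direction.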

GBD Precoder
The key point of the GBD scheme proposed in [17] is partial nulling, which increases the degrees of freedom of a wireless communication system [21]. Therefore, the GBD precoder does not completely eliminate the channels of other UEs, and the remaining IUI is removed by postcoding.
Let $D_i$ denote the matrix of dominant components for the i-th UE and $\tilde{D}_i$ the matrix of dominant components for the other UEs, excluding the i-th UE. Then, $D_i$ is projected onto the orthogonal space of $\tilde{D}_i$ by using the complementary projection in order to mitigate IUI. The vectors in $D_i$ that are orthogonal to $\tilde{D}_i$ are obtained by Equation (17) as the transmit weight matrix $P_i$, and the GBD precoder uses $P_i$ as the precoding matrix $F^{GBD}_i$ for the i-th UE. The GBD precoder suppresses the remaining IUI by utilizing the postcoding matrix $G^{GBD}$. In the resulting channel, $\Delta$ and $0$ denote a matrix with small-valued elements and a matrix with zero elements, respectively, and these blocks determine the effective channel $H_{eff}$. Consequently, the number of singular values of the desired channel is reduced from $N_{R_i}$ to $r_i$ and the IUI channel converges to zero.
The GBD scheme increases the diversity gain of the total system but is not optimized with the MMSE criterion. Therefore, the goal of the proposed scheme is a precoder that is optimized with the modified MMSE criterion by considering the noise entries and that also provides diversity gain.
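The complementary projection used for partial nulling can be sketched as follows. This is a hedged illustration rather than a reproduction of Equation (17): it assumes that the dominant components $D_i$ are the $r_i$ right singular vectors of $H_i$ with the largest singular values, and the helper names `dominant_right_vectors` and `gbd_weights` are hypothetical. At least two UEs are assumed.

```python
import numpy as np

def dominant_right_vectors(H, r):
    """Assumed D_i: the r right singular vectors of H with the largest singular values."""
    _, _, Vh = np.linalg.svd(H, full_matrices=False)
    return Vh[:r].conj().T  # shape (N_T, r)

def gbd_weights(H_list, r_list):
    """Project each D_i onto the orthogonal complement of the other UEs' dominant vectors."""
    D = [dominant_right_vectors(H, r) for H, r in zip(H_list, r_list)]
    P = []
    for i in range(len(D)):
        D_tilde = np.hstack([D[j] for j in range(len(D)) if j != i])
        # Least-squares coefficients of D_i in the span of D_tilde.
        coeffs, *_ = np.linalg.lstsq(D_tilde, D[i], rcond=None)
        # Complementary projection: remove the part of D_i lying in span(D_tilde).
        P.append(D[i] - D_tilde @ coeffs)
    return D, P

rng = np.random.default_rng(2)
H_list = [(rng.standard_normal((3, 9)) + 1j * rng.standard_normal((3, 9))) / np.sqrt(2)
          for _ in range(3)]
D, P = gbd_weights(H_list, r_list=[2, 2, 2])

# P_0 is orthogonal to the dominant vectors of UEs 1 and 2, not to their full channels.
D_tilde_0 = np.hstack(D[1:])
residual = np.linalg.norm(D_tilde_0.conj().T @ P[0])
print(P[0].shape, residual)
```

Because only the dominant directions of the other UEs are nulled, some interference remains along their weaker directions. That remainder is the residual IUI the postcoder is expected to remove.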

Modified GBD Precoder
The MGBD precoder transmits a signal with some IUI to increase the degrees of freedom, and the residual IUI is mitigated by postcoding. Therefore, the postcoding matrix $G_i$ for the proposed scheme consists of left singular vectors of $H_i$. The precoding matrix for the proposed scheme, on the other hand, is divided into two parts as follows,
$$F = \gamma F_{MU} F_{SU},$$
where $\gamma$ is the scaling factor, $F_{MU}$ is the precoding matrix that achieves MMSE and $F_{SU}$ is the precoding matrix that optimizes the SU MIMO system. $F_{MU}$ is determined by Equation (23), which minimizes the mean square error (MSE). The criterion is expressed in terms of $\tilde{T}_i$,
$$\tilde{T}_i = [T_1^T\ \cdots\ T_{i-1}^T\ T_{i+1}^T\ \cdots\ T_K^T]^T,$$
where $T_i$ is the product of $G_i$ and $H_i$,
$$T_i = G_i H_i,$$
and $U_i$, $\Sigma_i$ and $V_i$ are the left singular vector matrix, singular value matrix and right singular vector matrix of $H_i$. $F_{MU}$ is obtained by solving Equation (23), as derived in Appendix A. Although remaining IUI exists, the channel can be regarded as parallel SU channels after precoding with $F_{MU}$. The SVD of the channel precoded with $F_{MU}$ yields the left singular vector matrix $\bar{U}_i$, the singular value matrix $\bar{\Sigma}_i$ and the right singular vector matrix $\bar{V}_i$. $\bar{V}_i$ is used as $F_{SU,i}$ [16,20], and the precoding matrix for the i-th UE in the proposed scheme, Equation (28), is
$$F_i = \gamma F_{MU,i} \bar{V}_i.$$
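The postcoding step and the equivalent channel $T_i = G_i H_i$ can be checked numerically with the sketch below. It assumes that $G_i$ is formed from the conjugate transpose of the $r_i$ dominant left singular vectors of $H_i$, which the text implies but does not spell out. The function name `postcoder_and_T` and the stream count are hypothetical.

```python
import numpy as np

def postcoder_and_T(H, r):
    """Assumed postcoder G (r dominant left singular vectors, conjugate-transposed) and T = G H."""
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    G = U[:, :r].conj().T   # shape (r, N_Ri)
    T = G @ H               # shape (r, N_T)
    return G, T, s, Vh

rng = np.random.default_rng(3)
H_list = [(rng.standard_normal((3, 9)) + 1j * rng.standard_normal((3, 9))) / np.sqrt(2)
          for _ in range(3)]
r_i = 2  # streams per UE, fewer than the 3 receive antennas
results = [postcoder_and_T(H, r_i) for H in H_list]
T_list = [res[1] for res in results]

# Equivalent interference channel seen by UE 0: the T_j of all other UEs, stacked.
T_tilde_0 = np.vstack(T_list[1:])

# T_0 keeps only the r_i strongest singular directions of H_0.
G0, T0, s0, Vh0 = results[0]
print(T_tilde_0.shape, np.allclose(T0, s0[:r_i, None] * Vh0[:r_i]))
```

With 2 streams per UE, the interference channel has 4 rows instead of the 6 rows of the stacked physical channels. This shrinkage is where the extra degrees of freedom of partial nulling come from.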
The rest of this subsection discusses the effect of the relationship between $r$ and the rank of $H$. In the following classification, both cases assume that the number of data streams for each UE is equal to $r_i$ and that each UE has $N_{R_i}$ receive antennas.

Case 1
When $r$ is equal to the rank of $H$ ($L = K r_i$), the MGBD precoder operates like the RBD precoder. In this case, the full rank of $H$ is consumed to increase the data rate. Therefore, the system does not obtain additional diversity gain.

Case 2
When $r$ is lower than the rank of $H$ ($L > K r_i$), the MGBD precoder uses the remaining rank $L - r$ to obtain extra degrees of freedom. In this case, part of the rank of $H$ is consumed to increase diversity gain. On the other hand, because there is a trade-off between data rate and diversity gain, the data rate of the system decreases. The throughput performance depends closely on both data rate and diversity gain. Therefore, the throughput performance can be optimized by suitably adjusting $r_i$. Based on this concept, this paper additionally proposes the adaptive MGBD precoder.

Adaptive MGBD Precoder
The adaptive MGBD precoder adjusts the number of data streams of each UE according to the received SNR of that UE. The algorithm for the adaptive MGBD precoder is shown in Figure 3. The BS is assumed to know the received SNR of each UE, and $\theta_i$ denotes the received SNR of the i-th UE known to the BS. Also, it is assumed that each UE has $N_{R_i}$ receive antennas and that the BS has a threshold table for $\bar{\theta}_1, \bar{\theta}_2, \cdots, \bar{\theta}_{N_{R_i}-1}$. The threshold table indicates the number of data streams that provides maximum throughput. First, the adaptive MGBD precoder compares the received SNR $\theta_1$ of the first UE with the threshold SNR $\bar{\theta}_1$, which is obtained beforehand through training data. Then, the adaptive MGBD precoder successively compares $\theta_1$ with the other thresholds and selects the suitable $r_1$. Therefore, the BS transmits the $r_1$ data streams that maximize throughput at $\theta_1$. The adaptive MGBD precoder repeats these steps from the second UE to the K-th UE. Afterward, it calculates Equation (28) by using $r_1, r_2, \cdots, r_K$ in order to obtain the precoding matrix. Through these steps, the proposed scheme achieves maximum throughput within the limited SNR of each UE.
The initial threshold values can be obtained through mathematical analysis or the Monte Carlo method. However, with the former method it is very difficult to estimate threshold values for all channel environments, and the approximations made for the number of data streams can lead to a large error [22]. Thus, in this paper, the SNR points at which the number of data streams is switched are obtained by comparing the results of a preliminary simulation. The preliminary simulation is performed with training data that is used only to obtain the initial threshold values. Then, the BS stores these SNR points in the form of a threshold table.
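The stream selection step described above amounts to a lookup in a sorted threshold table, which can be sketched as follows. The threshold values and received SNRs below are made up purely for illustration and are not taken from the simulations in this paper. The function name `select_streams` is hypothetical, and the sketch assumes that a higher received SNR maps to a larger number of streams.

```python
from bisect import bisect_right

def select_streams(snr_db, thresholds_db):
    """Return the number of streams for one UE.

    thresholds_db must be sorted in ascending order and holds N_Ri - 1 switching
    points, so the result lies between 1 and N_Ri.
    """
    # Count how many thresholds the received SNR has reached, then add one stream.
    return 1 + bisect_right(thresholds_db, snr_db)

# Hypothetical threshold table for UEs with 3 receive antennas (2 switching points).
threshold_table_db = [12.0, 24.0]

# Hypothetical received SNRs (in dB) of 3 UEs as known to the BS.
received_snr_db = [5.0, 15.0, 30.0]

streams = [select_streams(theta, threshold_table_db) for theta in received_snr_db]
print(streams)  # [1, 2, 3]
```

The per-UE stream counts obtained this way would then be passed to the precoder computation. A binary search is used here only for compactness, and comparing against the thresholds one by one, as the text describes, gives the same result.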

Simulation Results
This section analyzes the BER and throughput performance of the proposed scheme. In the first subsection, the MGBD precoder is compared with the RBD and GBD precoders. In the second subsection, the throughput performance of the adaptive MGBD and adaptive GBD precoders is presented as a function of SNR. The simulation parameters are shown in Table 1. All simulations are performed in MATLAB, and the BER performance is measured by the Monte Carlo method. The simulation results are obtained by averaging a large number of results for each SNR to obtain reliable BER values. Additionally, the simulation considers a Rayleigh flat fading channel environment in which the channel components are i.i.d. with zero mean and unit variance. Also, it is assumed that the number of data streams is the same for all UEs and that the CSI is estimated perfectly at the BS and all UEs.

MGBD Precoder
The MGBD precoder is compared to the conventional precoders in terms of BER and throughput performances.
Figures 4 and 5 show the BER performance of the MGBD, RBD and GBD precoders as a function of SNR. Figure 4 shows the BER performance when each of the 3 UEs has 3 receive antennas and the BS has 9 transmit antennas, and Figure 5 shows the BER performance when each of the 4 UEs has 3 receive antennas and the BS has 16 transmit antennas. In Figures 4 and 5, the BER performance of the proposed scheme is better than that of the GBD precoder at all SNRs and in all cases except for minimum data streams (r = 3 or r = 4). Compared to GBD, the performance gain of the MGBD precoder is about 3 dB (r = 6) to 15 dB (full streams) in Figure 4. In Figure 5, the MGBD precoder obtains a gain of about 2 dB (r = 8) to 5 dB (full streams) over the GBD precoder. This advantage is obtained by optimizing with a modified MMSE criterion like Equation (12). If the BS transmits the minimum number of data streams, the extra BER gain provided by the modified MMSE criterion is negligible in comparison with the diversity gain. Thus, the BER performance of the MGBD precoder equals that of the GBD precoder when the BS transmits the minimum number of data streams. The degrees of freedom of the MGBD and GBD precoders increase as the number of data streams decreases. Therefore, the BER performance is best when the BS transmits 1 data stream to each UE and worst when the BS transmits full data streams by consuming the full rank of the channel. In the latter case, the MGBD precoder does not obtain extra diversity gain; thus, the BER performance of the proposed scheme is the same as that of the RBD precoder. The BER gap between the MGBD and GBD precoders is smaller in Figure 5 than in Figure 4. Because extra BER gain is obtained by increasing the number of transmit antennas relative to the number of receive antennas, the BER gap between the two precoders decreases.
Figures 6 and 7 show the normalized throughput performance of the MGBD, RBD and GBD precoders as a function of SNR. To show the superiority of the proposed scheme, its throughput performance is also compared with Tomlinson-Harashima precoding (THP), which is well known as a suboptimal precoder in the MU-MIMO system [23]. Both figures show that the peak value of the normalized throughput increases as the number of data streams increases. However, the SNR required to reach the peak value also rises as the number of data streams increases. First, in both cases, THP reaches maximum throughput rapidly when the BS transmits data with full streams. However, the MGBD precoder outperforms THP in throughput when the BS transmits fewer than the maximum number of streams (r = 3, r = 6 in Figure 6 and r = 4, r = 8 in Figure 7). Also, the GBD obtains more gain than the THP in minimum streams, and the gap between these precoders decreases as the number of streams reduces. These results arise because the MGBD and GBD precoders obtain more diversity gain than THP through partial nulling. Additionally, the throughput gap between MGBD and THP is large when the SNR is low, since the proposed scheme obtains additional SNR gain by mitigating the noise component.
The MGBD precoder provides improved throughput performance compared to the GBD precoder at all SNRs and in all cases, except for minimum data streams (r = 3 or r = 4). In more detail, the MGBD precoder provides a gain of about 3 dB (r = 6) over the GBD precoder when each of the 3 UEs has 3 receive antennas and the BS has 9 transmit antennas, according to Figure 6. Compared to GBD, the performance gain of the MGBD precoder is about 1 dB (r = 8) to 4 dB (full streams) when each of the 4 UEs has 3 receive antennas and the BS has 16 transmit antennas in Figure 7. The improvement in throughput performance can be predicted from Figures 4 and 5. In short, the throughput performance improves when the BER performance improves. Like the BER performance, the throughput performance of the proposed scheme is the same as that of the RBD precoder when the BS transmits full data streams and the same as that of the GBD precoder when the BS transmits the minimum number of data streams. If the BS transmits full streams in Figure 6, the throughput is poor because the BER performance is insufficient to achieve reasonable throughput. In other words, although the RBD precoder transmits a signal with the maximum number of data streams, the peak throughput of RBD reaches only 33% of the peak value of the MGBD precoder (r = 6) and 60% of the peak value of the MGBD precoder (r = 3) when the SNR is less than 30 dB.

Adaptive MGBD Precoder
In this subsection, the throughput performance of the adaptive MGBD precoder is compared with that of the adaptive GBD precoder. The algorithm for the adaptive GBD precoder is the same as in Figure 3 and uses Equation (17) in order to obtain the precoding matrix. The dotted line in Figure 8 is the throughput performance when each of the 3 UEs has 3 receive antennas and the BS has 9 transmit antennas, and the solid line in Figure 8 is the throughput performance when each of the 4 UEs has 3 receive antennas and the BS has 16 transmit antennas. Because the BER performance of the MGBD precoder is better than that of the GBD precoder, the SNR at which the adaptive MGBD precoder switches the number of data streams is lower than that of the adaptive GBD precoder. This implies that the MGBD precoder requires a lower SNR than the GBD precoder to achieve the desired throughput. Also, the throughput gap between the adaptive MGBD and adaptive GBD precoders increases as the SNR grows. In other words, the throughput gain of the MGBD precoder over GBD grows as the number of data streams increases. Although the RBD precoder provides the same peak value as MGBD, the RBD precoder does not obtain any throughput at low SNR. For these reasons, unlike other schemes, the MGBD precoder maximizes throughput at all SNRs.

Computational Complexity Analysis
This subsection compares the computational complexity of the RBD, GBD and MGBD precoders.
It is assumed that one complex multiplication requires four real multiplications. The processes for obtaining the precoding matrix for the RBD and MGBD precoders are similar, and most of their computational complexity is caused by the SVD operation. Therefore, this discussion only calculates the computational complexity of the SVD in the cases of the RBD and MGBD precoders. On the other hand, most of the computational complexity for GBD is caused not only by the SVD but also by Equation (17). Thus, the complexity calculation for GBD considers the SVD operation and Equation (17). Table 2 shows the computational complexity of two methods of SVD operation when the dimensions of a complex matrix are M × N [24]. The MGBD, RBD and GBD precoders require three, two and one SVD operations, respectively. The computational complexity for all precoders is presented in Tables 3-5. Figure 9 compares the computational complexity when each of the 3 UEs has 3 receive antennas and the BS has 9 transmit antennas. In this case, the number of multiplications for the MGBD precoder is lower than that of the RBD precoder, except for full streams (3 streams per UE). In more detail, the computational complexity of the MGBD precoder is about 40% (r = 3) and 10% (r = 6) lower than that of RBD when the BS transmits a data signal with fewer than full streams. In contrast, the number of multiplications for MGBD is 50% higher than that of RBD when the BS transmits a data signal with full streams. The complexity can be reduced by setting $G_i$ to the unit matrix when the BS transmits full streams. When $G_i$ is changed to the unit matrix, the MGBD precoder obtains the precoding matrix $F$ by the same method as the RBD precoder. On the other hand, the computational complexity of MGBD is always higher than that of GBD. Since the GBD precoder is not optimized with the MMSE criterion, its computational complexity is low but its performance is degraded.
Since MGBD and GBD eliminate the remaining IUI by postcoding, additional multiplications are required. The number of additional multiplications for both precoders is the same as the first row in Table 5. Although the computational complexity of MGBD postcoding is independent of the number of data streams, the extra SVD operation is not needed, as with the RBD precoder, when $G_i$ is set to the unit matrix because the UE receives full streams.
Table 3. Computational complexity for a regularized block diagonalization (RBD) precoder.
Table 4. Computational complexity for a generalized block diagonalization (GBD) precoder.
Table 5. Computational complexity for the MGBD precoder.
Figure 9. Comparison of computational complexity when each of the 3 UEs has 3 receive antennas and the BS has 9 transmit antennas.

Conclusions
In this paper, an MGBD precoder is proposed that is optimized with a modified MMSE criterion and simultaneously performs partial nulling. Because the noise component is additionally considered, the simulation results show that the proposed scheme provides improved BER and throughput performance compared to GBD. Unlike RBD, the proposed scheme obtains diversity gain by reducing the number of data streams. These advantages allow the MGBD precoder to obtain maximum throughput at all SNRs by adaptively changing the number of data streams according to the channel state. In exchange for this performance improvement, the proposed scheme has higher complexity than GBD and, when full streams are transmitted, than RBD. Although the number of multiplications for the MGBD precoder is higher than that of the RBD precoder when the BS transmits full streams, the MGBD precoder can reduce system complexity by setting the postcoding matrix to a unit matrix. Also, an adaptive MGBD precoder whose throughput performance is maximized at all SNRs is proposed. The conventional RBD requires high SNR in order to obtain appropriate throughput, and the throughput performance of conventional GBD is lower than that of the proposed scheme. Additionally, the throughput gap between the adaptive MGBD and adaptive GBD precoders increases as the number of data streams grows. Thus, the proposed scheme reduces the SNR required to obtain the desired throughput. These advantages are valuable in wireless communication systems, because high throughput performance is achieved under unfavorable channel conditions.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Derivation of F MU
This section gives the proof that $F_{MU}$ is optimized with the modified MMSE criterion. $F_{MU}$ is designed so that diversity gain is preserved and the MSE is minimized. The postcoding matrix removes the remaining IUI, and $F_{MU}$ is obtained from Equation (23), which can be rewritten as Equation (A1). The power of the transmitted signal vector is constrained to $P_T$, and $F_{SU,i}$ is designed as a unitary matrix. Therefore, the scaling factor $\gamma$ satisfies Equation (A3). Substituting Equation (A3) into Equation (A1) gives a reformulated criterion. The solution of Equation (A5) is obtained as in [20] (Appendix). Consequently, $F_{MU,i}$ takes the form stated in the Modified GBD Precoder section.
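The step from the power constraint to the scaling factor can be written out as below. This is a sketch of the reasoning under stated assumptions, not a reproduction of Equation (A3): it assumes that the data streams are uncorrelated with unit power, so that $E[XX^H] = I_r$, and that the blocks of $F_{SU}$ are square unitary matrices, so that $F_{SU}F_{SU}^H = I$.

```latex
\begin{aligned}
E\left[\|\tilde{X}\|^2\right]
  &= E\left[\|\gamma F_{MU} F_{SU} X\|^2\right] \\
  &= \gamma^2 \,\mathrm{tr}\!\left(F_{MU} F_{SU}\, E\left[X X^H\right] F_{SU}^H F_{MU}^H\right) \\
  &= \gamma^2 \,\mathrm{tr}\!\left(F_{MU} F_{MU}^H\right) = P_T
\quad\Longrightarrow\quad
\gamma = \sqrt{\frac{P_T}{\mathrm{tr}\!\left(F_{MU} F_{MU}^H\right)}} .
\end{aligned}
```

Under these assumptions the scaling factor depends only on $F_{MU}$, so it can be eliminated from the MSE criterion by substitution, leaving a problem in $F_{MU}$ alone.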