Partial Nulling Regularized Block Diagonalization Using Unfair Channel Selection for Post-Coding with Low-Complexity

Multi-user multiple input multiple output (MU-MIMO) systems satisfy a number of requirements for 5G mobile communication. Inter-user interference (IUI), which is an inevitable problem in MU-MIMO systems, becomes controllable when a precoding scheme is used. The proposed precoding scheme is built on regularized block diagonalization (RBD) and applies the partial nulling concept, which deliberately leaves part of the IUI. The IUI is left by choosing row vectors of the channel matrix that are not nullified, and leaving it yields diversity gain. Since the criterion for choosing the row vectors is the power of the channel, the number of selected row vectors can differ from device to device, i.e., the selection can be unfair. Owing to the diversity gain, the proposed scheme achieves better bit error rate (BER) performance with lower computational complexity than RBD at the same data rate. When the number of reduced data streams is not enough for most devices to achieve diversity gain, the proposed scheme outperforms generalized block diagonalization (GBD). The complexity at the receiver is also lower than that of GBD because the IUI is removed in a simpler way.


Introduction
Multiple input multiple output (MIMO) systems are key components of future wireless communication because they provide high data rates over a limited frequency resource. Multi-user (MU) MIMO systems have been studied as promising techniques for 5G that achieve spatial multiplexing gain to increase throughput and spatial diversity to improve reliability [1][2][3][4][5][6]. MU-MIMO systems are already used in Wi-Fi routers: they are available in IEEE 802.11ac and 802.11ax. In [7,8], the performance of MU-MIMO is shown in actual indoor and outdoor environments. MU-MIMO systems are expected to provide significant multiplexing and diversity gains while resolving some of the issues associated with conventional single-user (SU) MIMO. Namely, MU-MIMO brings robustness with respect to multipath richness, allows for compact antenna spacing at the base station (BS), and, crucially, yields the diversity and multiplexing gains without the need for multiple-antenna user terminals [9]. When MU-MIMO systems are adopted, a system with dozens of single-antenna devices can be treated like a system with a single device that has many antennas, so the base station can use its multiple antennas efficiently for each device.
When the proposed scheme is used, better bit error rate (BER) and throughput performance can be achieved with low complexity. The complexity of the proposed scheme at the receiver is lower than that of GBD and MGBD. In addition, the proposed scheme performs better than GBD and MGBD when the number of data streams is large enough. Low complexity at the receiver is a great benefit because computation is a heavier burden for the receiver than for the base station. Therefore, the proposed scheme is more practical than GBD and MGBD in terms of receiver complexity. The computational complexity of the proposed scheme at the transmitter is also lower than that of RBD and MGBD.
The proposed scheme nevertheless achieves better performance with this low complexity at the transmitter. Thus, the proposed scheme is highly efficient compared to conventional schemes with respect to both performance and complexity.

System Model and Conventional Precoding Schemes
An MU-MIMO broadcasting system composed of one base station with $N_T$ transmit antennas and $K$ users, shown in Figure 1, is considered. A Rayleigh flat fading channel is assumed and the $k$th user has $N_k$ receive antennas. The total number of receive antennas is $N_R = \sum_{k=1}^{K} N_k$. The transmit signal vector for the $k$th user is $x_k \in \mathbb{C}^{L_k \times 1}$, where $L_k\ (\le N_k)$ is the number of data streams of the $k$th user.
The total number of data streams is $L_{total} = \sum_{k=1}^{K} L_k$ and cannot be greater than $N_R$. The received signal for the $k$th device can be expressed as follows,
$$y_k = H_k P_k x_k + H_k \sum_{j=1,\, j \neq k}^{K} P_j x_j + n_k, \qquad (1)$$
where $H_k \in \mathbb{C}^{N_k \times N_T}$, $P_k \in \mathbb{C}^{N_T \times L_k}$, $j$ and $n_k \in \mathbb{C}^{N_k \times 1}$ are the channel matrix of the $k$th user, the precoding matrix for the $k$th user, the index of the users other than the $k$th user and the additive white Gaussian noise vector of the $k$th user, which has zero mean and variance $\sigma_n^2$, respectively. The stacked received signal vector of all users, $y \in \mathbb{C}^{N_R \times 1}$, can be expressed as follows,
$$y = \tilde{H} x + n, \qquad (2)$$
where $x$ and $n$ stack the transmit signal vectors and the noise vectors of all users and $\tilde{H} \in \mathbb{C}^{N_R \times L_{total}}$ is the effective channel. The effective channel $\tilde{H}$ is obtained as follows,
$$\tilde{H} = \begin{bmatrix} H_1 P_1 & H_1 P_2 & \cdots & H_1 P_K \\ H_2 P_1 & H_2 P_2 & \cdots & H_2 P_K \\ \vdots & \vdots & \ddots & \vdots \\ H_K P_1 & H_K P_2 & \cdots & H_K P_K \end{bmatrix}. \qquad (3)$$
Diagonal components in the effective channel matrix are MIMO channels for each user and off-diagonal components express the IUI.
If unit noise variance is assumed, the capacity region can be written as in (4), where $Q_k = E[x_k x_k^H]$ and $P_k = \mathrm{Tr}(Q_k)$ are the covariance matrix of the signal vector for the $k$th user and the power allocated to the $k$th user, respectively.

Regularized Block Diagonalization
RBD [10,20,21] suppresses IUI perfectly by using the null space of the channels of all other users, like BD, and the null space is obtained from the singular value decomposition (SVD). Specifically, the SVD of an $m \times n$ real or complex matrix $M$ is a factorization of the form $U \Sigma V^H$, where $U$ is an $m \times m$ real or complex unitary matrix, $\Sigma$ is an $m \times n$ rectangular diagonal matrix with non-negative real numbers on the diagonal, and $V$ is an $n \times n$ real or complex unitary matrix. The exact calculation method and examples of the SVD are given on pages 367 to 370 of [24]. In contrast with BD, the RBD scheme takes the noise term into account. Therefore, RBD performs better than BD precoding when noise is the dominant factor. The RBD precoder is expressed in (5).
To form the precoding matrix of the $k$th user, the channel matrix $\bar{H}_k \in \mathbb{C}^{\bar{N}_k \times N_T}$, which is composed of the channel matrices of all users except the $k$th user, is denoted as follows,
$$\bar{H}_k = \begin{bmatrix} H_1^T & \cdots & H_{k-1}^T & H_{k+1}^T & \cdots & H_K^T \end{bmatrix}^T, \qquad (6)$$
where $\bar{N}_k$ is the number of receive antennas of all devices except the $k$th device. The channel matrix $\bar{H}_k$ is expressed by using the SVD as follows,
$$\bar{H}_k = \bar{U}_k \bar{\Sigma}_k \bar{V}_k^H, \qquad (7)$$
where $\bar{U}_k \in \mathbb{C}^{\bar{N}_k \times \bar{N}_k}$, the diagonal elements of $\bar{\Sigma}_k \in \mathbb{C}^{\bar{N}_k \times N_T}$, $\bar{V}_k \in \mathbb{C}^{N_T \times N_T}$ and $(\cdot)^H$ are the set of left singular vectors, the singular values, the set of right singular vectors and the Hermitian operator, respectively. The first precoder $P^a_k$ can be expressed as follows,
$$P^a_k = \bar{V}_k \left( \bar{\Sigma}_k^H \bar{\Sigma}_k + \alpha I \right)^{-1/2}, \qquad (8)$$
where $\alpha$ is the ratio of the total noise power to the total transmit power, i.e., $\alpha = N_R \sigma_n^2 / P_{total}$, and $I$ is the identity matrix. The SVD of the effective channel for the $k$th user is obtained as follows,
$$H_k P^a_k = \tilde{U}_k \tilde{\Sigma}_k \begin{bmatrix} \tilde{V}_k^{(1)} & \tilde{V}_k^{(0)} \end{bmatrix}^H, \qquad (9)$$
where $\tilde{U}_k$, the diagonal elements of $\tilde{\Sigma}_k$, $\tilde{V}_k^{(1)}$ and $\tilde{V}_k^{(0)}$ indicate the set of left singular vectors, the singular values, the set of right singular vectors corresponding to non-zero singular values and the set of right singular vectors corresponding to zero singular values, respectively. The second precoder $P^b_k$ is obtained from this decomposition in (10). Finally, the precoding matrix of the RBD for the $k$th user is written in (11). The RBD scheme eliminates IUI perfectly.
RBD performs better than BD precoding because it takes the noise term into account. However, transmit diversity gain cannot be obtained since all transmit antenna resources are used to form a precoder that eliminates the channel components of other users.
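The two-stage construction described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the function name `rbd_precoder`, the choice of the second precoder as the `L_k` dominant right singular vectors of $H_k P^a_k$, and the unit-norm power scaling are assumptions made only for this example.

```python
import numpy as np

def rbd_precoder(H_list, k, L_k, alpha):
    """Sketch of an RBD-style precoder for user k (second stage is an assumption)."""
    # Stack the channels of all users except user k (H-bar_k in the text).
    H_bar = np.vstack([H for j, H in enumerate(H_list) if j != k])
    n_t = H_bar.shape[1]
    # Full SVD so that V spans the whole N_T-dimensional transmit space.
    _, s, Vh = np.linalg.svd(H_bar, full_matrices=True)
    s2 = np.zeros(n_t)
    s2[: s.size] = s ** 2                              # diagonal of Sigma^H Sigma
    P_a = Vh.conj().T @ np.diag((s2 + alpha) ** -0.5)  # first precoder
    # Assumed second precoder: dominant right singular vectors of H_k P_a.
    _, _, Vh_eff = np.linalg.svd(H_list[k] @ P_a, full_matrices=True)
    P = P_a @ Vh_eff.conj().T[:, :L_k]
    return P / np.linalg.norm(P)                       # assumed unit-norm scaling

rng = np.random.default_rng(0)
K, N_k, N_T = 4, 2, 8
H_list = [(rng.standard_normal((N_k, N_T)) + 1j * rng.standard_normal((N_k, N_T))) / np.sqrt(2)
          for _ in range(K)]
P_0 = rbd_precoder(H_list, k=0, L_k=N_k, alpha=1e-8)
desired = np.linalg.norm(H_list[0] @ P_0)              # gain towards user 0
leakage = np.linalg.norm(np.vstack(H_list[1:]) @ P_0)  # IUI caused at the other users
```

With a very small `alpha` the leakage towards the other users is negligible compared with the desired gain, which matches the near block-diagonal behaviour described in this section; a larger `alpha` trades some leakage for noise robustness.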

Generalized Block Diagonalization
When IUI is removed perfectly, all degrees of freedom are consumed by transmit beamforming. GBD [22] does not consume all degrees of freedom to eliminate IUI; instead, it acquires transmit diversity gain by removing IUI imperfectly. Since the GBD scheme leaves intended IUI rather than eliminating all IUI, extra degrees of freedom equal in number to the removed data streams are left. Transmit diversity gain is obtained because the data streams are transmitted with more transmit antennas than data streams. The intended IUI is eliminated at the receiver by using the extra degrees of freedom. When the GBD precoder is used, the transmit antenna resources are thus exploited not only to eliminate IUI partially but also to acquire transmit diversity gain.
Thus, (3) can be expressed as follows,
$$\tilde{H} = \begin{bmatrix} H_1 P_1 & \Delta_{1,2} & \cdots & \Delta_{1,K} \\ \Delta_{2,1} & H_2 P_2 & \cdots & \Delta_{2,K} \\ \vdots & \vdots & \ddots & \vdots \\ \Delta_{K,1} & \Delta_{K,2} & \cdots & H_K P_K \end{bmatrix}, \qquad (12)$$
where $\Delta_{k,j} = H_k P_j$ indicates the remaining IUI that is formed by using the partial nulling concept and reducing the total number of data streams. The received signal of the $k$th user is expressed as follows,
$$y_k = H_k P_k x_k + \sum_{j=1,\, j \neq k}^{K} \Delta_{k,j} x_j + n_k. \qquad (13)$$
The remaining IUI can be perfectly removed by post-coding at the receiver, which spends the extra degrees of freedom obtained by reducing the number of data streams and using partial nulling.
The SVD of the channel matrix for the $k$th user is written as follows,
$$H_k = U_k \Sigma_k V_k^H, \qquad (14)$$
where $U_k$, the diagonal elements of $\Sigma_k$ and $V_k$ are the set of left singular vectors, the singular values and the set of right singular vectors, respectively, and $u_{k,j}$ and $v_{k,j}$ are the $j$th left singular vector in $U_k$ and the $j$th right singular vector in $V_k$. The case of $L_k = N_k - 1$ is considered first for simplicity. Since the number of data streams is one less than the number of receive antennas, the receiver can spend one degree of freedom to suppress the IUI. To build a proper $\Delta_{k,j}$ that can be suppressed by using one degree of freedom, every column vector in $\Delta_{k,j}$ needs to be parallel to a certain vector $q$. The reasonable choice for $q$ is the $N_k$th left singular vector $u_{k,N_k}$, which corresponds to the minimum singular value, since the intended IUI and the loss of channel gain then become minimal. Thus, the precoding matrix needs to be formed by using the dominant channel components, excluding the $N_k$th right singular vector $v_{k,N_k}$, so that all column vectors of $\Delta_{k,j}$ are parallel to $u_{k,N_k}$.
Next, the general case of $L_k = N_k - l_k$ is considered, where $l_k$ is the number of reduced data streams for the $k$th device. The column vectors of $\Delta_{k,j}$ may then lie in the span of $l_k$ vectors, namely the left singular vectors from $u_{k,N_k-l_k+1}$ to $u_{k,N_k}$. The set of the dominant channel components, excluding the right singular vectors that correspond to these $l_k$ vectors, is defined in (15). The channel component matrix of all devices except the $k$th device is constructed in (16), and the precoding matrix is obtained from it by the complementary projection in (17). Since the remaining IUI needs to be suppressed, a zero-forcing spatial filter is used as the post-coder at the receiver. The post-coder $R_k$ of the $k$th device is given in (18), and the received signal of the $k$th device after post-coding is written in (19). Since the column space of $\Delta_{k,j}$ in (13) belongs to the null space of $R_k$, the IUI is perfectly removed by post-coding.
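The null-space property used in this argument can be checked with a small numerical sketch. The post-coder below is a hypothetical choice made for illustration (the $L_k$ dominant left singular vectors of $H_k$, conjugate-transposed); the text only states that a zero-forcing spatial filter is used, so this is one filter with the stated property rather than the definition from [22]. The function name `zf_postcoder` and the toy dimensions are likewise assumptions.

```python
import numpy as np

def zf_postcoder(H_k, l_k):
    """Hypothetical post-coder: keep the L_k = N_k - l_k dominant left singular vectors."""
    U, _, _ = np.linalg.svd(H_k)
    L_k = H_k.shape[0] - l_k
    # Returns the post-coder and the l_k directions in which the IUI is allowed to lie.
    return U[:, :L_k].conj().T, U[:, L_k:]

rng = np.random.default_rng(1)
H_k = rng.standard_normal((3, 6)) + 1j * rng.standard_normal((3, 6))
R_k, U_iui = zf_postcoder(H_k, l_k=1)
# Remaining IUI whose columns are all parallel to the weakest left singular vector.
delta = U_iui @ (rng.standard_normal((1, 4)) + 1j * rng.standard_normal((1, 4)))
residual = np.linalg.norm(R_k @ delta)   # IUI left after post-coding
```

Because the columns of a unitary matrix are mutually orthogonal, `residual` is zero up to rounding, while the filter output keeps $L_k$ dimensions for the desired streams.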
When the diversity gain cannot be obtained, the GBD scheme performs worse than RBD precoding. The GBD scheme achieves performance enhancement by reducing the number of data streams. Thus, the GBD scheme is not suitable when the number of reduced data streams is not enough for most devices to achieve diversity gain. In other words, the GBD scheme cannot avoid the degradation of data rate caused by reducing the number of data streams to achieve better performance. Since the GBD scheme leaves IUI, post-coding at the users is essential to eliminate the remaining IUI. This post-coding is a considerable burden to the receiver because the receiver must carry out an additional operation.

Modified Generalized Block Diagonalization
A recent scheme related to GBD, the modified generalized block diagonalization (MGBD), was proposed in [23] and improves on the performance of GBD. Most conventional schemes use all antenna resources to remove IUI perfectly, whereas MGBD, like GBD, uses part of the antenna resources to obtain diversity gain. Since the MGBD also uses the partial nulling concept to achieve diversity gain, it is chosen as the benchmark scheme. The difference between the GBD and MGBD is that the MGBD is optimized with a modified minimum mean square error (MMSE) criterion in order to improve the BER. The modified MMSE criterion is designed to minimize the Frobenius norm of the off-diagonal components of the equivalent channel, which cause IUI.
The precoding matrix of MGBD is presented in (20). To calculate $P^{MU}_k$, the effective channel, which is the product of the post-coder in (18) and the channel matrix of the $k$th user, is used. The effective channel is expressed as follows,
$$T_k = R_k H_k. \qquad (21)$$
The effective channels of all users except the $k$th user are stacked as follows,
$$\bar{T}_k = \begin{bmatrix} T_1^T & \cdots & T_{k-1}^T & T_{k+1}^T & \cdots & T_K^T \end{bmatrix}^T. \qquad (22)$$
The SVD of $\bar{T}_k$ is written in (23) and yields the set of left singular vectors, the singular values and the set of right singular vectors. The first precoder $P^{MU}_k$ is expressed in (24). Since MGBD uses the effective channel for making the precoding matrix, MGBD can achieve performance enhancement compared to GBD. The SVD of the channel precoded with $P^{MU}_k$ is written in (25) and yields the set of left singular vectors $\hat{U}_k$, the singular values on the diagonal of $\hat{\Sigma}_k$, the set of right singular vectors corresponding to non-zero singular values and the set of right singular vectors corresponding to zero singular values. The second precoder $P^{SU}_k$ is given in (26), and the final precoding matrix of MGBD is expressed in (27). The residual IUI is removed at the receiver in exactly the same way as in GBD, i.e., by multiplying by the post-coder.
The process for obtaining the precoding matrix of the MGBD is very similar to that of RBD. Therefore, when the number of data streams is equal to the number of transmit antennas, the precoding matrix of the MGBD is the same as that of the RBD, which means that the MGBD with full data streams performs the same as the RBD. However, when the number of data streams is reduced, the MGBD can obtain transmit diversity gain and thus performs better than RBD. MGBD needs one more SVD operation to obtain the post-coder, which is used for forming the first precoder of the MGBD. By using the effective channel matrix in the first precoder, the MGBD takes the noise term into account and therefore performs better than the GBD. However, since the MGBD needs three SVD operations to form the final precoding matrix, its computational complexity at the base station is much higher than that of RBD and GBD. In addition, since the residual IUI must be removed at the receiver, the computational complexity at the receiver is higher than that of the RBD.
Figure 2 is the flowchart of the proposed scheme. The proposed scheme is based on the RBD precoder and applies the partial nulling concept at the same time. Thus, when none of the users can obtain diversity gain, the proposed scheme has the same precoder as RBD. The proposed scheme adopts the partial nulling concept by selecting, according to the power of the channel, the rows of the channel matrix that are not nullified. It does not remove the IUI corresponding to the rows with relatively low power, which saves transmit antenna resources. The number of rows selected not to be nullified is the same as the number of removed data streams. Since the number of data streams is reduced and is less than the number of transmit antennas, transmit diversity gain is achieved.
The proposed scheme performs better than RBD owing to the transmit diversity gain. It can also achieve better performance than GBD and MGBD when the number of data streams is only slightly reduced. The remaining IUI is removed by consuming the extra degrees of freedom at the receiver. However, the way of removing IUI in the proposed scheme is simpler than applying a post-coder at the user. Since no extra multiplication is needed at the receiver, the proposed scheme has the same receiver complexity as RBD even though it has to remove IUI at the receiver. In other words, the complexity of the proposed scheme at the receiver is lower than that of GBD and MGBD, which have to multiply by a post-coder to remove IUI at the receiver.

Proposed Partial Nulling RBD
The case of $L_{total} = N_R - 1$ is considered first for simplicity. When the channel from the $i$th transmit antenna to the $j$th receive antenna is $h_{j,i}$, the power of the channel to the $j$th receive antenna is expressed as follows,
$$P_j = \sum_{i=1}^{N_T} |h_{j,i}|^2. \qquad (28)$$
The reasonable choice for the row that is not nullified is the row with the smallest power, which minimizes the loss of channel gain. When the index of the receive antenna with the smallest channel power is defined as $m$, the $m$th row of the channel matrix, which has the smallest power $P_m$, is chosen as the row that is not nullified. Suppose that the $m$th receive antenna belongs to the $q$th device.
With $m_q = m - \sum_{k=1}^{q-1} N_k$, the $m_q$th row of the channel matrix for the $q$th device is not nullified. In other words, the number of data streams for the $q$th device is decreased to $N_q - 1$. The channel matrix of the $q$th device without the $m_q$th row is defined in (29); it stacks the rows $h_{q,1}, \ldots, h_{q,m_q-1}, h_{q,m_q+1}, \ldots, h_{q,N_q}$, where $h_{i,j}$ is the $j$th row vector of the channel matrix for the $i$th device. When $k = q$, the channel matrix composed of the channel matrices of all devices except the $k$th device is denoted as in (30); the un-nullified row belongs to the $q$th device itself, so no row is removed. When $k \neq q$, the row vector with the smallest power is excluded from this channel matrix.
Thus, the reduced channel matrix is denoted as in (31). Since the number of data streams is one less than the number of receive antennas, exactly one row vector is chosen to be excluded.
Next, the general case of $L_{total} = N_R - l_{total}$ is considered, where $l_{total}\ (< N_R)$ is the total number of reduced data streams. In this case, since the number of data streams is $l_{total}$ less than the number of receive antennas, $l_{total}$ row vectors are chosen not to be nullified. These are the $l_{total}$ row vectors with the smallest power. Since the criterion for choosing the row vectors is the power of the channel, the number of selected row vectors can differ from device to device, i.e., the selection can be unfair. However, by selecting the low-power row vectors unfairly and thereby minimizing the loss of channel gain, the proposed scheme can achieve a large performance improvement. The $l_{total}$ row vectors that are not nullified are omitted from the channel matrix that is used to form the precoder of each device.
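The power-based selection can be illustrated with a short sketch. It assumes the row power of (28) is the sum of squared magnitudes along a row; the function name `select_unnulled_rows` and the toy channel values are hypothetical and chosen only to make the unfairness visible.

```python
def select_unnulled_rows(H_list, l_total):
    """Return (device, local_row) pairs of the l_total weakest rows, ranked by row power."""
    powers = []
    for dev, H in enumerate(H_list):
        for row_idx, row in enumerate(H):
            # Row power: sum of squared magnitudes of the channel coefficients.
            powers.append((sum(abs(h) ** 2 for h in row), dev, row_idx))
    powers.sort(key=lambda t: t[0])
    return [(dev, row_idx) for _, dev, row_idx in powers[:l_total]]

# Toy channel: 3 devices, 2 receive antennas each, 2 transmit antennas.
H_list = [
    [[1 + 1j, 2 + 0j], [0.1 + 0j, 0.2j]],           # device 0: second row is weak
    [[1 + 0j, 1 + 0j], [2 + 0j, 1j]],               # device 1: both rows are strong
    [[0.3 + 0j, 0.1 + 0j], [0.2 + 0j, 0.2 + 0j]],   # device 2: both rows are weak
]
selected = select_unnulled_rows(H_list, l_total=3)
per_device = [sum(1 for dev, _ in selected if dev == d) for d in range(len(H_list))]
```

In this toy case device 2 gives up two rows, device 0 gives up one and device 1 gives up none, which is the unfair allocation described above.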
The precoder of the proposed scheme is formed like the RBD precoder, but from the reduced channel matrix, which does not contain the row vectors with relatively small power. The precoder of the proposed scheme is expressed in (32). The reduced channel matrix is decomposed by the SVD in (33) into the set of left singular vectors, the singular values and the set of right singular vectors. Since the reduced channel matrix has fewer rows than $\bar{H}_k$ because of the excluded row vectors, the SVD operation in (33) has lower computational complexity. The first precoder $P^{\alpha}_k$ is written in (34), which also has low computational complexity because the matrix of singular values is smaller, like the reduced channel matrix. The effective channel is decomposed as in (35), where $\ddot{U}_k$, the diagonal elements of $\ddot{\Sigma}_k$ and $\ddot{V}_k$ are the left singular vectors, the singular values and the right singular vectors, respectively. The precoder of the proposed scheme for the $k$th device is written in (36). Since the precoder does not null the row vectors that are excluded from the reduced channel matrix, IUI that cannot be nullified by the precoder is left. In other words, the IUI remains in the channels corresponding to the selected row vectors. The proposed scheme obtains diversity gain by leaving this IUI.
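A small numerical sketch can make the effect of the excluded row visible. It is not the authors' code: `regularized_precoder` is a hypothetical helper that applies an RBD-style two-stage construction to whatever "other users" matrix it is given, and the second stage (dominant right singular vectors of the effective channel) and the unit-norm scaling are assumptions.

```python
import numpy as np

def regularized_precoder(H_bar, H_own, L_k, alpha):
    """RBD-style two-stage precoder built from an arbitrary 'other users' matrix."""
    n_t = H_bar.shape[1]
    _, s, Vh = np.linalg.svd(H_bar, full_matrices=True)
    s2 = np.zeros(n_t)
    s2[: s.size] = s ** 2
    P_a = Vh.conj().T @ np.diag((s2 + alpha) ** -0.5)   # first precoder
    _, _, Vh_eff = np.linalg.svd(H_own @ P_a, full_matrices=True)
    P = P_a @ Vh_eff.conj().T[:, :L_k]                  # assumed second stage
    return P / np.linalg.norm(P)

rng = np.random.default_rng(2)
K, N_k, N_T = 4, 2, 8
H = (rng.standard_normal((K * N_k, N_T)) + 1j * rng.standard_normal((K * N_k, N_T))) / np.sqrt(2)
m = int(np.argmin(np.sum(np.abs(H) ** 2, axis=1)))   # weakest receive antenna (row power)
q = m // N_k                                         # device that owns row m
k = (q + 1) % K                                      # some other device, k != q
own = list(range(k * N_k, (k + 1) * N_k))
# Reduced matrix: all rows of the other devices except the un-nullified row m.
others = [r for r in range(K * N_k) if r not in own and r != m]
P_k = regularized_precoder(H[others], H[own], L_k=N_k, alpha=1e-8)
leak_nulled = np.linalg.norm(H[others] @ P_k)   # IUI on the rows that were nulled
leak_left = np.linalg.norm(H[m] @ P_k)          # IUI deliberately left on row m
```

The reduced matrix has five rows instead of six, so the SVD is smaller and the precoder of device `k` has one more free dimension. The interference on the nulled rows is negligible, while the interference on row `m` is not, which is the remaining IUI the receiver has to discard.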
The received signal of the $k$th device can be expressed as follows,
$$y_k = \begin{bmatrix} y_{k,1} & y_{k,2} & \cdots & y_{k,N_k} \end{bmatrix}^T, \qquad (37)$$
where $y_{i,j}$ is the received signal at the $j$th antenna of the $i$th device. The case of $L_{total} = N_R - 1$ is considered first for simplicity. When the $m_q$th row vector of the channel matrix for the $q$th device is excluded from the channel matrix used to form the precoders, $y_{q,m_q}$ is the signal received through the channel with remaining IUI. The IUI can be removed perfectly by excluding $y_{q,m_q}$ from $y_q$. The received signal after the IUI is removed is written as follows,
$$y_q = \begin{bmatrix} y_{q,1} & \cdots & y_{q,m_q-1} & y_{q,m_q+1} & \cdots & y_{q,N_q} \end{bmatrix}^T. \qquad (38)$$
For the general case of $L_{total} = N_R - l_{total}$, the received signal of the $k$th device after IUI removal is obtained by excluding the received signals that correspond to the selected row vectors of the $k$th device.
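The receiver-side step amounts to discarding samples, which the following minimal sketch illustrates. The function name `drop_unnulled_antennas` and the sample values are hypothetical, and indices are 0-based in the code.

```python
def drop_unnulled_antennas(y_k, unnulled_rows):
    """Keep only the received samples whose channel rows were nulled at the transmitter."""
    skip = set(unnulled_rows)
    return [y for idx, y in enumerate(y_k) if idx not in skip]

y_q = [0.7 - 0.2j, 1.5 + 0.9j, -0.4 + 0.1j]   # toy received vector with N_q = 3
# The second antenna (m_q = 2 in the 1-based notation of the text) carries the IUI.
y_q_clean = drop_unnulled_antennas(y_q, unnulled_rows=[1])
```

No multiplication is involved, which is why this step adds no flops at the receiver, in contrast to multiplying by a post-coder.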
The advantage of the proposed scheme is that diversity gain is achieved by reducing the total number of data streams. In addition, since no multiplication by a post-coder is needed, the computational complexity at the receiver is low. However, since the proposed scheme excludes whole row vectors from the channel, the loss of channel gain can grow rapidly with the number of excluded row vectors. Therefore, the limitation of the proposed scheme is a potentially large loss of channel gain.

Computational Complexity Analysis
The GBD and MGBD both eliminate IUI by multiplying by the post-coder. Therefore, the complexities of GBD and MGBD at the receiver are exactly the same. In contrast, since the proposed scheme removes IUI in a simpler way that does not use a post-coder, it has low complexity at the receiver. Even though the proposed scheme has to remove IUI at the receiver, its receiver complexity is the same as that of the RBD. Since operations at the receiver are a heavier burden than operations at the base station, the lower complexity at the receiver is a great advantage.
Since the complexity of the proposed scheme at the receiver is the same as that of RBD, the complexities of the proposed scheme and RBD are compared at the base station. RBD and the proposed scheme require two SVD operations to decompose the channel matrix, whereas the MGBD requires three. The additional SVD in the MGBD is used for making the post-coder. Because this makes the complexity of the MGBD at the transmitter very high, it is also calculated and compared with that of the proposed scheme. The computational complexity of the SVD grows with the dimension of the matrix that is decomposed. Since the dimension of the channel matrix decomposed in the proposed scheme is decreased by excluding row vectors, the proposed scheme has lower complexity at the transmitter than RBD. The numbers of flops of the conventional schemes and the proposed scheme are calculated according to [19] to measure computational complexity. A real multiplication followed by an addition needs 2 flops. A complex multiplication followed by an addition needs 8 flops.
The complexities of each operation for each scheme are shown in Tables 1-4, where $\bar{N}_k = N_R - N_k$ and $\bar{L}_k = L_{total} - L_k$. The complexities of the conventional schemes and the proposed scheme are derived deterministically. For example, according to [19], the complexity of the SVD of an $m \times n$ complex matrix, such as (7) and (9) in Table 2, is $32(nm^2 + 2m^3)$. Since (7) is the SVD of an $\bar{N}_k \times N_T$ complex matrix, the complexity of (7) for each user is $32(N_T \bar{N}_k^2 + 2\bar{N}_k^3)$. Similarly, (9) is the SVD of an $N_k \times N_T$ complex matrix, so the complexity of (9) for each user is $32(N_T N_k^2 + 2N_k^3)$.
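As a worked example of this flop count, the following sketch evaluates the SVD cost for one user. It assumes the expression from [19] reads $32(nm^2 + 2m^3)$ for an $m \times n$ complex matrix; the function name `svd_flops` and the antenna configuration ($N_T = N_R = 8$, $N_k = 2$, one un-nullified row) are illustrative choices.

```python
def svd_flops(m, n):
    """Flop count 32(n m^2 + 2 m^3) for the SVD of an m x n complex matrix (assumed reading of [19])."""
    return 32 * (n * m ** 2 + 2 * m ** 3)

N_T, N_R, N_k = 8, 8, 2
N_bar = N_R - N_k                               # rows of the matrix decomposed by RBD
rbd_first_svd = svd_flops(N_bar, N_T)           # 6 x 8 matrix
reduced_first_svd = svd_flops(N_bar - 1, N_T)   # 5 x 8 matrix, one row left un-nulled
```

Under these assumptions the first SVD costs 23040 flops for RBD and 14400 flops for the reduced matrix, which shows how removing a single row lowers the dominant term.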
In (8), the complexity of $(\bar{\Sigma}_k^H \bar{\Sigma}_k + \alpha I)^{-1/2}$ is $K(18N_T + \bar{N}_i)$ according to [19]. Since the dimensions of $\bar{V}_k$ and $(\bar{\Sigma}_k^H \bar{\Sigma}_k + \alpha I)^{-1/2}$ are $N_T \times N_T$, the complexity of the multiplication between $\bar{V}_k$ and $(\bar{\Sigma}_k^H \bar{\Sigma}_k + \alpha I)^{-1/2}$ is $8KN_T^3$. Therefore, the total complexity of (8) can be expressed as $K(18N_T + \bar{N}_i) + 8KN_T^3$.
The complexities at the receiver are shown in Table 1, where $R_k$ and $y_k$ are the post-coder and the received signal of the $k$th user, respectively. GBD and MGBD need the extra operation in (19) for multiplying by the post-coder at the receiver. In contrast, since the RBD precoder eliminates IUI perfectly at the transmitter, no extra operation is needed at the receiver to remove remaining IUI. The proposed scheme also needs no extra multiplications, because it removes the remaining IUI by excluding the received signals that carry it. Tables 2-4 show the complexities at the base station of the RBD, MGBD and the proposed scheme, respectively. The parameters in Tables 2-4 are explained in Sections 2.1, 2.3 and 3, respectively. The proposed scheme needs an extra operation, requiring a small number of flops, to obtain the power of the channel in (28). However, the complexities of the operations in (33) and (34) are low since the channel matrix used to form the precoder of the proposed scheme has a lower dimension. Therefore, the complexity of the proposed scheme is lower than that of the RBD. Since MGBD requires three SVD operations, in (14), (23) and (25), its complexity is very high, and the proposed scheme therefore has lower complexity at the base station than MGBD.
In terms of time complexity, when the data size is $N$, the time complexity of every scheme can be written as
$$\frac{N \times (\text{number of total operations})}{\text{modulation order} \times \text{code rate} \times L_k} = O(N).$$
In addition, since the modulation order increases as $L_k$ is reduced, the product (modulation order × code rate × $L_k$) is constant regardless of the scheme. This means that the numbers of total operations shown in Tables 2-4 determine the time complexity.

Table 1. Computational complexity of post-coding at receiver.
RBD: None
Proposed scheme: None

Table 2. Computational complexity of each operation for regularized block diagonalization (RBD) at transmitter.
Table 3. Computational complexity of each operation for modified generalized block diagonalization (MGBD) at transmitter.
Table 4. Computational complexity of each operation for proposed scheme at transmitter.
Columns of Tables 2-4: Label, Operations, Flops.
Since the proposed scheme and RBD need no extra operation at the receiver, both are efficient in terms of computational complexity at the receiver. Thus, the computational complexities of RBD and the proposed scheme at the transmitter are compared in order to find the more efficient scheme. The MGBD is also included in the comparison since it has very high complexity at the base station. The computational complexities of RBD, MGBD and the proposed scheme are shown in Figure 3. The number of data streams for RBD is $N_T$, and the number of data streams for MGBD and the proposed scheme is $N_T/2$, which is reduced to obtain diversity gain. The complexities of some operations in the proposed scheme are lower than in RBD since the channel matrix used for these operations has a smaller dimension. The proposed scheme needs an extra operation to calculate the power of the channel. However, the decrease in complexity due to the lower dimension of the channel matrix is larger than the complexity of the extra operation. Therefore, the total complexity of the proposed scheme is lower than that of the RBD. The MGBD has the highest complexity at the transmitter since three SVD operations are required to obtain the precoding matrix. The complexities of the RBD and MGBD increase rapidly with the number of transmit antennas. Even though the differences in computational complexity look small from 8 to 12 antennas, the complexity of the proposed scheme is much lower than the complexities of the conventional schemes. In addition, since the number of total operations for the proposed scheme is lower than for RBD and MGBD in Figure 3, the time complexity of the proposed scheme is also lower than that of RBD and MGBD.

Simulation Environment
The performance of the proposed scheme is evaluated and compared with the conventional schemes. The simulation was conducted using Matlab. Following the Monte Carlo method, the mean of repeated simulation results is used to evaluate performance. Table 5 shows the simulation parameters used in this paper. The simulation parameters are taken from IEEE 802.11ac. For simplicity, only the blocks of the IEEE 802.11ac process that affect the performance comparison are simulated. Since IEEE 802.11ac simulation parameters are used, the proposed scheme can be applied to an actual IEEE 802.11ac environment. The Rayleigh fading channel, which is frequently used for wireless communication systems, is adopted, as in many recent studies [2,6,23]. Perfect channel estimation is assumed. Since an orthogonal frequency division multiplexing (OFDM) system is adopted, each subcarrier goes through a flat fading channel. All elements of the channel matrices are independent complex Gaussian random variables with zero mean and unit variance. A system with the same number of transmit antennas and receive antennas is considered. The number of transmit and receive antennas is 8 or 16. The number of antennas of each user is fixed to 2 to consider the case that accommodates numerous devices. Therefore, when the number of transmit and receive antennas is 8 or 16, there are 4 users or 8 users, respectively. The modulation used for RBD is quadrature phase shift keying (QPSK). 16 quadrature amplitude modulation (16QAM) and 64QAM are used for the GBD, MGBD and the proposed scheme. When the proposed scheme is compared with RBD, the modulations of RBD and the proposed scheme are QPSK and 16QAM, respectively, to achieve the same data rate. When the proposed scheme is compared with GBD and MGBD, 16QAM and 64QAM are used to present simulation results with various parameters.
The FFT size is 128. A convolutional code is used for channel coding. When QPSK or 16QAM is used, the code rate is 1/2. When 64QAM is used, the code rate is 2/3. The total number of data streams for RBD is fixed at $L_{total} = N_R$. When the proposed scheme is compared with RBD, the total number of data streams for the proposed scheme is fixed at $L_{total} = N_R/2$. Thus, the proposed scheme uses a higher modulation level so that RBD and the proposed scheme have the same data rate. Since the GBD and the MGBD are schemes that reduce the data streams to obtain diversity gain, the proposed scheme is compared with GBD and MGBD in Section 5.3, where GBD, MGBD and the proposed scheme are evaluated with various numbers of data streams. The number of transmit data streams is limited to values that are more than 2/3 of the number of receive antennas since a case with few data streams is impractical in terms of data rate.
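The equal-data-rate setting can be verified with simple arithmetic. The sketch below assumes the rate per subcarrier use is (bits per symbol) × (code rate) × (number of data streams), with 2 bits per QPSK symbol and 4 bits per 16QAM symbol; the helper name `info_bits_per_use` is hypothetical.

```python
from fractions import Fraction

def info_bits_per_use(bits_per_symbol, code_rate, streams):
    """Information bits carried per subcarrier use across all data streams."""
    return bits_per_symbol * code_rate * streams

N_R = 8
rbd = info_bits_per_use(2, Fraction(1, 2), N_R)            # QPSK, rate 1/2, N_R streams
proposed = info_bits_per_use(4, Fraction(1, 2), N_R // 2)  # 16QAM, rate 1/2, N_R/2 streams
```

Both configurations carry 8 information bits per subcarrier use for $N_R = 8$, so halving the number of streams is exactly compensated by doubling the bits per symbol.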

Performance Comparison of RBD and Proposed Scheme
The enhancement of the BER performance which is obtained by diversity gain and normalized throughput of the proposed scheme are shown in Figures 4 and 5, respectively. The number of data streams for RBD is N R and QPSK is used at RBD. In proposed scheme, N R 2 data streams are transmitted and 16QAM is used to achieve same data rate with RBD. In other words, since the number of data streams for the proposed scheme is reduced to the half number of the data streams for RBD to obtain the diversity gain, modulation level is increased to 16QAM. The number of data streams for the proposed scheme is 4 in Figures 4 and 5.  The BER performances are shown in Figure 4 with N T = N R = 8. The proposed scheme has better performance compared to RBD in spite of the higher modulation level at the proposed scheme. Since the proposed scheme can achieve the transmit diversity gain by reducing the number of total data streams, the BER of the proposed scheme is better than RBD in Figure 4. The proposed scheme has especially steep slope from 5dB to 20dB due to diversity gain. In a short range with small transmit power, the proposed scheme shows lower BER performance than RBD due to high modulation level. The BER performance of the proposed scheme decreases gradually from 20dB.
The normalized throughput performances are shown in Figure 5 with $N_T = N_R = 8$. The throughput is normalized by the maximum throughput. Because the proposed scheme reduces the number of data streams to achieve transmit diversity, its modulation level is raised to 16QAM to keep the same data rate as RBD. Figure 5 shows that the maximum throughput of the proposed scheme and of RBD is the same even though the proposed scheme transmits fewer data streams. Since the proposed scheme has better BER performance thanks to transmit diversity, it reaches the maximum throughput at a lower transmit power. In other words, the proposed scheme, which also has lower computational complexity, reaches the maximum throughput at a lower transmit power than RBD.
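The rate matching between RBD and the proposed scheme can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that the data rate per subcarrier use is the product of the number of streams, the bits per modulation symbol, and the code rate; the function name `info_bits_per_channel_use` is introduced here only for illustration.

```python
from fractions import Fraction

# Bits carried by one symbol of each modulation used in the simulations.
BITS_PER_SYMBOL = {"QPSK": 2, "16QAM": 4, "64QAM": 6}


def info_bits_per_channel_use(num_streams, modulation, code_rate):
    """Information bits per subcarrier use: streams x bits/symbol x code rate.

    Illustrative helper (an assumption of this sketch, not taken from the paper).
    """
    return num_streams * BITS_PER_SYMBOL[modulation] * code_rate


n_r = 8  # number of receive antennas, as in Figures 4 and 5

# RBD: N_R streams, QPSK, code rate 1/2.
rbd_rate = info_bits_per_channel_use(n_r, "QPSK", Fraction(1, 2))
# Proposed scheme: N_R/2 streams, 16QAM, code rate 1/2.
proposed_rate = info_bits_per_channel_use(n_r // 2, "16QAM", Fraction(1, 2))

print(rbd_rate, proposed_rate)  # both schemes carry the same number of bits
```

Halving the number of streams while doubling the bits per symbol leaves the product unchanged, which is why the two schemes share the same maximum throughput in Figure 5.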

Performance Comparison of GBD, MGBD and Proposed Scheme
The BER and normalized throughput performances of GBD, MGBD and the proposed scheme are shown in Figures 6-8. For $N_T = N_R = 8$, 6 to 8 data streams are considered; for $N_T = N_R = 16$, 12 to 16 data streams are considered. Cases in which the number of data streams is less than 2/3 of the number of receive antennas are inefficient, since the loss of data rate is severe compared to the transmit diversity gain. Therefore, only cases in which the number of data streams exceeds 2/3 of the number of receive antennas are considered.
The BER performances of GBD, MGBD and the proposed scheme are shown in Figure 6 with $N_T = N_R = 8$ and in Figure 7 with $N_T = N_R = 16$. Every scheme performs better with 16QAM than with 64QAM. When the number of data streams equals the number of transmit antennas (Figures 6a and 7a), MGBD and the proposed scheme cannot obtain diversity gain, and their precoders are the same as the RBD precoder. Therefore, the BER performances of MGBD and the proposed scheme with full data streams are exactly the same in Figures 6a and 7a. When the number of data streams is reduced, the BER performance of the proposed scheme improves, as it does for GBD and MGBD. GBD with 6 data streams in Figure 6c achieves the same performance as the proposed scheme with 7 data streams in Figure 6b. That is, at equal BER performance, GBD loses more data rate than the proposed scheme. The performance difference between GBD and the proposed scheme is largest with full data streams and shrinks as the number of data streams is reduced. In other words, GBD gains more from each reduction in the number of data streams than the proposed scheme does.
The reason lies in what each scheme leaves un-nulled. In the proposed scheme, a whole row vector of the channel matrix remains as IUI when the number of reduced data streams is one. In GBD, by contrast, only the channel component corresponding to the minimum singular value, obtained by decomposing the channel matrix, remains. Even though the proposed scheme chooses the row vector with the minimum power, its loss of channel gain is larger than that of GBD, which leaves only a channel component. As the number of reduced data streams increases, the number of whole row vectors that remain in the proposed scheme also grows.
In other words, since the loss of channel gain in the proposed scheme grows with the number of remaining row vectors, GBD can achieve better performance once the number of reduced data streams is large enough for most devices to obtain diversity gain. However, GBD suffers a large throughput loss in the regime where it outperforms the proposed scheme. With 6 and 12 data streams, MGBD performs better than the proposed scheme at high transmit power. This does not mean that the BER of the proposed scheme is poor at high transmit power: it is worse only relative to MGBD, and the throughput performance of the proposed scheme remains good.
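The claim that leaving a whole row costs more channel gain than leaving a singular component can be illustrated with a standard linear-algebra bound. The notation here is an assumption made for illustration and is not taken from the paper: $\mathbf{H}_k$ is the channel matrix of device $k$ with $N_k \le N_T$ rows, $\mathbf{h}_{k,i}$ is its $i$-th row, $\mathbf{e}_i$ is the $i$-th unit vector, and $\sigma_{\min}$ is the smallest of the $N_k$ singular values.

$$
\sigma_{\min}^{2}(\mathbf{H}_k)
= \min_{\|\mathbf{x}\| = 1} \left\| \mathbf{H}_k^{H} \mathbf{x} \right\|^{2}
\le \left\| \mathbf{H}_k^{H} \mathbf{e}_i \right\|^{2}
= \left\| \mathbf{h}_{k,i} \right\|^{2}
\quad \text{for every } i = 1, \dots, N_k .
$$

Under these assumptions, the power contained in the weakest singular component never exceeds the power of the weakest row, which is consistent with the larger loss of channel gain observed for the proposed scheme. The bound compares the two quantities for a single device only; it does not model the full precoder of either scheme.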
Figure 6 panels: (a) BER with 8 data streams, (b) BER with 7 data streams, (c) BER with 6 data streams.
The normalized throughput performances of GBD, MGBD and the proposed scheme are shown in Figure 8 with $N_T = N_R = 8$ and $N_T = N_R = 16$. With 8 transmit and receive antennas (Figure 8a,b), the throughput is normalized by the maximum data rate achieved with 8 data streams. With 16 transmit and receive antennas (Figure 8c,d), the throughput is normalized by the maximum data rate achieved with 16 data streams. When GBD, MGBD and the proposed scheme use full data streams and 16QAM, the maximum data rate with full data streams cannot be achieved because the BER performance is insufficient. When the number of data streams is reduced and 16QAM is used, the corresponding data rate can be achieved thanks to the diversity gain. However, with 64QAM in Figure 8b,d, the data rate cannot be achieved regardless of the number of data streams: the modulation level is too high, so the BER performance is poor even when the number of data streams is reduced. Neither the conventional schemes nor the proposed scheme is suited to such high-order modulation. When the number of data streams equals the number of transmit antennas, the proposed scheme and MGBD have the same precoder as RBD. Therefore, the throughput performances of MGBD and the proposed scheme with full data streams are exactly the same. With 7 data streams in Figure 8a and 14 data streams in Figure 8c, the proposed scheme reaches the data rate corresponding to 7 and 14 data streams by obtaining diversity gain. MGBD also reaches this data rate, but it needs more transmit power to do so.
With the smallest number of data streams in Figure 8a,c, all schemes reach the corresponding data rate by obtaining sufficient diversity gain. In Figures 6c and 7c, MGBD with 6 and 12 data streams has better BER performance than the proposed scheme at high transmit power. However, Figure 8a,c shows that the throughput performances of MGBD and the proposed scheme at high transmit power are the same. In addition, the proposed scheme reaches the data rate corresponding to the smallest number of data streams at a lower transmit power than MGBD. When the number of data streams is lower than 2/3 of the number of receive antennas, GBD and MGBD achieve better BER performance. However, the data rate reached with so few data streams is also lower than 2/3 of the maximum data rate with full data streams. In other words, the throughput is inefficient when the number of data streams is reduced too much. Thus, the proposed scheme is more efficient than GBD and MGBD in terms of throughput, since it performs better than GBD and MGBD when the number of data streams is only slightly reduced.
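The trade-off between stream reduction and data rate can be made concrete with a small calculation. This is a rough sketch, assuming that every stream uses the same modulation and code rate, so that the highest normalized throughput reachable with a given number of streams is simply that number divided by the number of receive antennas. The stream counts 5 and 10 are hypothetical examples below the 2/3 threshold and are not simulated in the paper.

```python
from fractions import Fraction

THRESHOLD = Fraction(2, 3)  # minimum useful fraction of streams, per the text


def throughput_ceiling(num_streams, n_r):
    """Highest normalized throughput reachable with num_streams of n_r streams.

    Assumes identical modulation and code rate on every stream (sketch only).
    """
    return Fraction(num_streams, n_r)


# (number of receive antennas, stream counts to inspect)
cases = ((8, (8, 7, 6, 5)), (16, (16, 14, 12, 10)))

for n_r, stream_counts in cases:
    for streams in stream_counts:
        ceiling = throughput_ceiling(streams, n_r)
        status = "above 2/3" if ceiling > THRESHOLD else "below 2/3"
        print(f"N_R={n_r:2d} streams={streams:2d} ceiling={float(ceiling):.3f} ({status})")
```

In this sketch, 6 of 8 and 12 of 16 streams cap the normalized throughput at 0.75, which stays above the threshold, while the hypothetical 5 of 8 and 10 of 16 cap it at 0.625, which falls below it.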

Conclusions
MU-MIMO systems are a key component of future wireless communication because they provide a high data rate over a limited frequency resource. In the MU-MIMO broadcast channel, IUI inevitably occurs at each device. Therefore, the base station has to use a precoding scheme to reduce IUI.
In this paper, a precoding scheme that is based on RBD and uses the partial nulling concept is proposed for a Rayleigh fading channel, which is frequently used to model wireless communication systems. The proposed scheme obtains diversity gain through partial nulling, that is, by excluding some row vectors from the channel matrix that is nullified. Excluding the row vectors with low power reduces the dimension of the channel matrix used to form the precoder. When the number of data streams is smaller than the number of receive antennas, the proposed scheme achieves a performance enhancement due to diversity gain and does not need any extra operation at the receiver.
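The row selection step summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sizes (8 transmit antennas, 8 receive antennas, 2 antennas per device), the helper names `row_power` and `select_unnulled_rows`, and the global selection across all devices are assumptions made here to show why the selection can be unfair between devices.

```python
import random


def row_power(row):
    """Power of one channel row: sum of squared magnitudes of its entries."""
    return sum(abs(h) ** 2 for h in row)


def select_unnulled_rows(channel, num_reduced):
    """Indices of the num_reduced lowest-power rows, left out of the nulling."""
    order = sorted(range(len(channel)), key=lambda i: row_power(channel[i]))
    return sorted(order[:num_reduced])


random.seed(0)
n_t, n_r, antennas_per_device = 8, 8, 2  # hypothetical sizes

# Rayleigh fading: i.i.d. complex Gaussian entries with unit variance.
sigma = 0.5 ** 0.5
channel = [
    [complex(random.gauss(0, sigma), random.gauss(0, sigma)) for _ in range(n_t)]
    for _ in range(n_r)
]

# Reduce the number of data streams by two, so two rows are excluded.
excluded = select_unnulled_rows(channel, num_reduced=2)

# Count excluded rows per device; the counts need not be equal ("unfair").
per_device = [0] * (n_r // antennas_per_device)
for idx in excluded:
    per_device[idx // antennas_per_device] += 1

print("excluded rows:", excluded)
print("excluded rows per device:", per_device)
```

Because the rows are ranked by power over the whole channel matrix, one device may lose several rows while another loses none, and the matrix used to form the precoder shrinks by the number of excluded rows.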
The Monte Carlo simulation results indicate that, at the same data rate, the proposed scheme performs better than RBD because it obtains diversity gain. At a BER of $10^{-3}$, the proposed scheme has a 10 dB gain. Since the proposed scheme uses a channel matrix of lower dimension to form the precoder, its computational complexity is lower than that of RBD. When the number of reduced data streams is not enough for most devices to achieve diversity gain, the proposed scheme performs better than GBD and MGBD. For example, GBD needs to reduce the number of data streams to 6 to match the performance of the proposed scheme with 7 data streams. GBD and MGBD cannot avoid inefficient operations at the receiver. In addition, since MGBD needs three SVD operations, its complexity at the base station is higher than that of the proposed scheme. With its improved performance and low complexity, the proposed scheme has potential for practical application in MU-MIMO systems.
Because the proposed scheme chooses whole row vectors, the SVD of the channel for each device is not needed and the remaining IUI can be removed at the receiver in a simple way. However, because the chosen row vectors are left out entirely, the loss of channel gain in the proposed scheme is greater than in GBD and MGBD. Therefore, when the number of data streams is less than 2/3 of the number of receive antennas, GBD and MGBD perform better than the proposed scheme. Further study is thus needed to decrease the loss of channel gain in the proposed scheme when the number of data streams is less than 2/3 of the number of receive antennas. In other words, a simple way to leave IUI without a large loss of channel gain has to be found in order to obtain a performance enhancement with low complexity.