An Efficient Estimation of the Number of Optimal Iterations for GS Pre-coding in Downlink Massive MIMO Systems

Abstract: This paper proposes, for the first time, an estimation scheme for the number of optimal iterations of Gauss–Seidel (GS) pre-coding in downlink massive multiple input multiple output (MIMO) systems. The number of iterations in GS pre-coding is one of the key parameters and should be estimated accurately prior to signal transmission in the downlink system. For efficient estimation without deriving a closed-form solution for the GS pre-coding symbols, the proposed scheme uses a relative method, which calculates the normalized Euclidean distance (NED) between consecutive GS solutions and exploits the monotonically decreasing behavior of the GS solutions. Additionally, an efficient initial solution for the GS pre-coding is proposed as a two-term Neumann series (NS) based on the stair matrix, which improves the estimation accuracy and accelerates the convergence of the GS solution. The evaluated estimation performances verify high accuracy in downlink massive MIMO systems, even at low loading factors. In addition, the additional complexity for estimating the number of optimal iterations is nearly negligible.


Introduction
Multi-user (MU) massive multiple input multiple output (MIMO) is one of the core techniques for achieving high error performance and spectral efficiency in wireless communication systems without additional bandwidth or transmit power. Compared to traditional MU-MIMO systems, which use only two, four, or eight antennas, massive MIMO systems make the fading channels asymptotically orthogonal, since the base station (BS) has tens to hundreds of antennas [1–4]. Various performance aspects of massive MIMO systems were studied in [5–9]. Achievable user rate and throughput were analyzed in a multi-hop massive MIMO system [5], in a dual-polarized massive MIMO system [6], and in a massive MIMO system with transceiver hardware impairments [7]. Additionally, pilot allocation methods for enhanced channel estimation were studied in [8,9]. The orthogonal frequency division multiplexing (OFDM) technique is robust in frequency selective fading channels. In OFDM, each narrowband subchannel experiences a frequency flat channel since OFDM uses a cyclic prefix (CP). With the CP, the effect of the fading channel appears as a multiplication at the receiver. Like 4G long term evolution (LTE), 5G new radio (NR) uses CP-OFDM as the transmission waveform, with various numerologies. Therefore, the combination of massive MIMO and OFDM gives a tremendous data rate. At the BS in the massive MIMO system, the transmit symbols are preprocessed to eliminate inter-user interference in the downlink system, and the received symbols are decoded to eliminate inter-antenna interference in the uplink system. Linear zero-forcing (ZF) is a suboptimal pre-coding or detection scheme, since the massive MIMO system has a high loading factor (α), which is defined as the ratio of the number of BS antennas to the number of total users.
In [1,2], the ergodic spectral efficiency for the ZF is analyzed in the massive MIMO system, and suboptimal performance is proven in both downlink and uplink systems. However, the ZF calculates the inversion of the gram matrix, which requires $O(K^3)$ multiplications when the number of total users is K. This complexity is prohibitive in the massive MIMO system since dozens of active users should be supported. Thus, several pre-coding and detection schemes for reducing the complexity of the ZF were surveyed in [10–13]. Among the various schemes, ref. [12] showed that the Gauss–Seidel (GS) method requires very low transmit power for obtaining a target error performance compared to other schemes such as the Neumann series (NS), the Jacobi method, the Richardson method, and optimized coordinate descent. In [14–25], pre-coding and detection schemes based on the GS were studied in massive MIMO systems. The GS achieves high error performance using an iterative method and requires $O(K^2)$ multiplications, since the algorithm requires only matrix-vector multiplications. For efficient signal transmission, the authors in [14] applied the GS pre-coding in the downlink system for the first time and proved that the GS has a $\sqrt{2}$ times higher convergence rate than the NS. For reducing the complexity and error propagation of the GS, the authors in [15] proposed a reduced signal dimension using interference cancellation and an effective decoding scheme using the grey region in the uplink system. For accelerating the convergence rate of the GS, the authors in [16–19] proposed the band matrix and the stair matrix as an initial matrix, respectively, in the uplink system. For better error performance of the GS, the authors in [20–23] applied the soft-output detector in the uplink system.
Finally, for efficient hardware implementation of the GS, the authors in [24] proposed a low-complexity inversion of the triangular matrix in the uplink system, and the authors in [25] proposed a parallel decoding scheme in the uplink system. According to several surveys, the common problems in [14–25] are summarized as follows: First, the number of optimal iterations, which is one of the important parameters in the GS, is not estimated prior to signal transmission in the downlink system or signal detection in the uplink system. Second, all studies except for [15] evaluated the performances of the GS only in high α systems such as α = 4, α = 8, or α = 16. Third, all studies except for [14] considered the GS only in uplink systems.
The GS updates the present symbol using the past symbol at each iteration, and the error performance improves as the number of iterations increases. Therefore, the number of iterations is a very important parameter in the GS and should be decided accurately prior to signal transmission in the downlink system or signal detection in the uplink system, so that the target error performance is obtained with minimum complexity. Let the number of required iterations for obtaining the target error performance be $\hat{i}$. The complexity of the GS is high when the number of iterations is larger than $\hat{i}$, such as $\hat{i} + 1, \hat{i} + 2, \cdots$. On the other hand, the error performance of the GS is degraded when the number of iterations is lower than $\hat{i}$, such as $\hat{i} - 1, \hat{i} - 2, \cdots, 1$, and this causes an error floor where the error performance is not improved despite infinite transmit power. However, none of the GS studies in [14–25] considered a prior estimation of the number of iterations; they obtained the number of optimal iterations by measuring error performances with Monte-Carlo simulations.
In addition to the problem of prior estimation, the authors in [14,16–25] considered massive MIMO systems with high α values. The useful properties of massive MIMO systems, such as the channel hardening effect and favorable propagation, become more remarkable as α increases. However, massive MIMO systems with low α should be researched for increasing cell capacity, since the number of BS antennas cannot increase indefinitely. Additionally, the GS solution converges to the ZF solution regardless of the initial solution [26] when the number of iterations is sufficient, since the gram matrix is Hermitian positive definite [14]. Finally, in [15–25], the GS was used in uplink systems since the GS was originally developed for solving linear equations, and uplink signal detection is modeled as a linear equation problem at the BS. However, it should also be noted that the GS performs well in the downlink of a mobile communication system. Thus, this paper proposes an estimation scheme for the number of iterations of the GS pre-coding in the downlink massive MIMO-OFDM system for optimal error performance. The optimal error performance denotes the error performance of the ZF pre-coding, and this paper uses the ZF pre-coding as a benchmark. In addition, values of α from two to five are considered, since most existing studies set α to 4, 8, or 16, and the simulation results show that the proposed estimation scheme has a high accuracy even in low α systems.
This paper is organized as follows: Section 2 describes the system model for downlink MU massive MIMO-OFDM. Sections 3 and 4 deal with the conventional GS pre-coding and the proposed estimation of the number of iterations. Section 5 presents graphical performance evaluations of the estimation scheme proposed in Section 4. Finally, Section 6 provides a conclusion and directions for future work.

Downlink Massive MIMO-OFDM System Model
This paper considers the $N_t \times N_u$ downlink MU massive MIMO-OFDM system in Figure 1. The number of transmit antennas at the BS is $N_t$, and the number of total users, each of which has a single antenna, is $N_u$ ($N_t > N_u$). At the transmitter, the modulated symbols are pre-coded by using one of the various pre-coding schemes mentioned in [10], and the OFDM symbols, which include the CP, go through wireless frequency selective channels. The channel matrix $\mathbf{H}$ between all transmit antennas and total users is as follows,

$$\mathbf{H} = [\mathbf{h}_1\ \mathbf{h}_2\ \cdots\ \mathbf{h}_{N_u}]^T, \tag{1}$$

where the n-th entry in $\mathbf{h}_m = [h_{m1}\ h_{m2}\ \cdots\ h_{mN_t}]^T$ is an independent and identically distributed (i.i.d.) complex baseband Rayleigh fading coefficient with zero mean and unit variance from the n-th transmit antenna to the m-th user. The subcarrier index is omitted for simple notation without loss of generality. At the m-th user, the received symbol $y_m$ after the fast Fourier transform (FFT) is as follows,

$$y_m = \sqrt{P}\,\frac{\mathbf{h}_m^T \mathbf{g}_m}{\|\mathbf{G}\|_F}\,x_m + \sqrt{P} \sum_{n=1,\, n \neq m}^{N_u} \frac{\mathbf{h}_m^T \mathbf{g}_n}{\|\mathbf{G}\|_F}\,x_n + z_m, \tag{2}$$

where P is the downlink transmit power, $\mathbf{G}$ is the $N_t \times N_u$ pre-coding matrix, $\mathbf{g}_n$ is the n-th column vector of $\mathbf{G}$, $x_m$ is the m-th complex transmit symbol with zero mean and unit variance, and $z_m$ is the m-th complex additive white Gaussian noise (AWGN) with zero mean and unit variance. Finally, $\|\cdot\|_F$ is the Frobenius norm operator.
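As a small illustration of the channel model, the following sketch draws an i.i.d. Rayleigh fading channel with zero-mean, unit-variance complex entries and checks the empirical entry power. The function name `rayleigh_channel`, the seed, and the 100 × 20 size are assumptions chosen for illustration; they are not taken from the paper.

```python
import random

def rayleigh_channel(n_users, n_tx, rng):
    """Return an n_users x n_tx list of rows with i.i.d. CN(0, 1) entries.

    Row m plays the role of h_m^T in the system model.
    """
    sigma = 0.5 ** 0.5  # real and imaginary parts each carry variance 1/2
    return [[complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
             for _ in range(n_tx)]
            for _ in range(n_users)]

rng = random.Random(2020)   # fixed seed so the sketch is repeatable
N_t, N_u = 100, 20          # assumed sizes
H = rayleigh_channel(N_u, N_t, rng)

alpha = N_t / N_u           # loading factor
mean_power = sum(abs(h) ** 2 for row in H for h in row) / (N_t * N_u)
print(f"alpha = {alpha}, empirical E[|h|^2] = {mean_power:.3f}")
```

With 2000 entries, the empirical power should lie close to the unit variance assumed in the model.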

Conventional GS Pre-coding
For expressing the conventional GS pre-coding, the gram matrix $\mathbf{W}$ is represented using (1) as follows,

$$\mathbf{W} = \mathbf{H}\mathbf{H}^H. \tag{3}$$

In (3), $\mathbf{W}$ is Hermitian positive definite and can be decomposed as follows,

$$\mathbf{W} = \mathbf{D} + \mathbf{L} + \mathbf{U}, \tag{4}$$

where $\mathbf{D}$, $\mathbf{L}$, and $\mathbf{U}$ are the diagonal, strictly lower triangular, and strictly upper triangular parts of $\mathbf{W}$, respectively. With the number of iterations i, the i-th GS solution $\mathbf{s}^{(i)}$ is calculated as follows,

$$\mathbf{s}^{(i)} = (\mathbf{D} + \mathbf{L})^{-1}\left(\mathbf{x} - \mathbf{U}\mathbf{s}^{(i-1)}\right), \quad i = 1, 2, \cdots, \tag{5}$$

where $\mathbf{x} = [x_1\ x_2\ \cdots\ x_{N_u}]^T$ is the $N_u \times 1$ column vector of the modulated symbols and $\mathbf{s}^{(0)}$ is an initial solution, which is defined in various ways according to the system requirement. The final downlink transmit symbol vector $\mathbf{s}$ in (6) is calculated by applying the matched filter and the normalization factor β to the GS solution, where β prevents the pre-coding from changing the transmit power. Again, the BS does not know the number of iterations required for obtaining the target error performance, and it should be estimated accurately prior to signal transmission.
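The GS iteration can be sketched in a few lines. The example below is a minimal illustration, assuming the standard Gauss-Seidel splitting of the gram matrix with forward substitution, a zero-vector initial solution, unit-power QPSK symbols, and a 128 × 8 system; the helper names (`gram`, `gs_sweep`, `residual_norm`), the sizes, and the seed are hypothetical and not taken from the paper.

```python
import random

def gram(H):
    """Gram matrix W = H H^H for H given as a list of rows h_m^T."""
    return [[sum(a * b.conjugate() for a, b in zip(hm, hk)) for hk in H]
            for hm in H]

def gs_sweep(W, x, s_prev):
    """One GS iteration: forward substitution for (D + L) s = x - U s_prev."""
    n = len(x)
    s = [0j] * n
    for m in range(n):
        lower = sum(W[m][k] * s[k] for k in range(m))              # L part, new values
        upper = sum(W[m][k] * s_prev[k] for k in range(m + 1, n))  # U part, old values
        s[m] = (x[m] - lower - upper) / W[m][m]
    return s

def residual_norm(W, x, s):
    """Euclidean norm of x - W s; it vanishes at the ZF solution W^{-1} x."""
    r = [xm - sum(w * sk for w, sk in zip(row, s)) for row, xm in zip(W, x)]
    return sum(abs(v) ** 2 for v in r) ** 0.5

rng = random.Random(1)
N_t, N_u = 128, 8                       # assumed sizes (alpha = 16)
sigma = 0.5 ** 0.5
H = [[complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(N_t)]
     for _ in range(N_u)]
W = gram(H)
x = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / 2 ** 0.5
     for _ in range(N_u)]               # unit-power QPSK symbols (assumption)

s = [0j] * N_u                          # zero-vector initial solution
res_start = residual_norm(W, x, s)
for i in range(1, 51):
    s = gs_sweep(W, x, s)
res_end = residual_norm(W, x, s)
print(res_start, res_end)
```

Because the gram matrix is Hermitian positive definite, the residual shrinks toward zero as the sweeps proceed, which is the convergence to the ZF solution that the text relies on.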

Methodology
The main studies in [14–25] deal with the enhancement of error performance or the reduction of complexity in the GS. The number of iterations is a very important parameter, which should be decided accurately prior to signal transmission in the downlink system. An insufficient number of iterations causes an error floor, as in the results of [14–22,24,25]. Thus, this paper proposes an estimation scheme for the minimum number of iterations in the GS pre-coding for optimal error performance. Depending on the system requirements, a system may only need the minimum number of iterations that satisfies the target error performance, even if an error floor remains. However, this paper considers the minimum number of iterations only for optimal error performance without the error floor. For performance enhancement, a design of the initial solution is proposed. Additionally, for low-complexity estimation, complex mathematical analysis is not considered. Finally, for performance evaluation of the proposed estimation scheme, the number of estimated iterations is compared with the number of optimal iterations, which is calculated in advance with Monte-Carlo simulation based on the system in (2).

Detailed Expression for Proposed Scheme
For estimating the number of optimal iterations, the proposed scheme deals with an initial solution $\mathbf{s}^{(0)}$ and the normalized Euclidean distance (NED) between $\mathbf{s}^{(i)}$ and $\mathbf{s}^{(i-1)}$.
The GS solution $\mathbf{s}^{(i)}$ converges to the ZF solution $\mathbf{s}_{ZF}$ regardless of the initial solution. $\mathbf{s}_{ZF}$ is calculated as follows,

$$\mathbf{s}_{ZF} = \mathbf{W}^{-1}\mathbf{x}. \tag{7}$$

However, the initial solution is an important parameter for accelerating the convergence rate of the GS pre-coding. One of the widely used initial solutions in the GS is the zeros vector, since it is not complex. The authors in [14,23] used the zeros vector as an initial solution. However, the convergence rate is very slow, since the diagonal entries of $\mathbf{W}$ are approximately $N_t$ in the massive MIMO system. For efficient usage of the massive MIMO system, the authors in [15,22,25] used an initial solution $\mathbf{s}^{(0)}_{\mathrm{Diag}}$ as follows,

$$\mathbf{s}^{(0)}_{\mathrm{Diag}} = \mathbf{D}^{-1}\mathbf{x}. \tag{8}$$

$\mathbf{s}^{(0)}_{\mathrm{Diag}}$ is a robust initial solution when the diagonal dominance of $\mathbf{W}$ is large. In [27], the authors proved that $\mathbf{W}$ is a fully diagonally dominant matrix when α is larger than 5.83. Therefore, the authors in [14,16–20,22,23] evaluated performances in massive MIMO systems where α is larger than or equal to six. In addition, $\mathbf{s}^{(0)}_{\mathrm{Diag}}$ is not a robust initial solution when α is less than 5.83. For accelerating the convergence rate of the GS solution, the authors in [20,21,24] used an initial solution $\mathbf{s}^{(0)}_{\mathrm{NS,Diag}}$ using a two-term NS based on $\mathbf{D}$ as follows,

$$\mathbf{s}^{(0)}_{\mathrm{NS,Diag}} = \left(\mathbf{I}_{N_u} - \mathbf{D}^{-1}(\mathbf{L} + \mathbf{U})\right)\mathbf{D}^{-1}\mathbf{x}, \tag{9}$$

where $\mathbf{I}_m$ is the $m \times m$ identity matrix. Therefore, the authors in [21,24] evaluated performances in massive MIMO systems where α is larger than or equal to four, unlike the authors in [14,16–20].
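To make the difference between these initial solutions concrete, the sketch below compares the residual norm of x − W s for the zeros vector, the diagonal initial solution, and a two-term NS based on D. It assumes the usual two-term Neumann form (I − D⁻¹(L + U))D⁻¹x, a 128 × 8 system (α = 16, where the diagonal dominance is strong), and unit-power QPSK symbols; the helper names, sizes, and seed are hypothetical.

```python
import random

def gram(H):
    """Gram matrix W = H H^H for H given as a list of rows."""
    return [[sum(a * b.conjugate() for a, b in zip(hm, hk)) for hk in H]
            for hm in H]

def residual_norm(W, x, s):
    """Euclidean norm of x - W s."""
    r = [xm - sum(w * sk for w, sk in zip(row, s)) for row, xm in zip(W, x)]
    return sum(abs(v) ** 2 for v in r) ** 0.5

def initial_solutions(W, x):
    """Return the zeros vector, D^{-1} x, and the two-term NS solution based on D."""
    n = len(x)
    zero = [0j] * n
    diag = [x[m] / W[m][m] for m in range(n)]                  # D^{-1} x
    # (L + U) D^{-1} x touches only the off-diagonal entries of W
    off = [sum(W[m][k] * diag[k] for k in range(n) if k != m) for m in range(n)]
    ns_diag = [diag[m] - off[m] / W[m][m] for m in range(n)]   # (I - D^{-1}(L+U)) D^{-1} x
    return zero, diag, ns_diag

rng = random.Random(3)
N_t, N_u = 128, 8                      # assumed sizes (alpha = 16)
sigma = 0.5 ** 0.5
H = [[complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(N_t)]
     for _ in range(N_u)]
W = gram(H)
x = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / 2 ** 0.5
     for _ in range(N_u)]

zero, diag, ns_diag = initial_solutions(W, x)
r_zero, r_diag, r_ns = (residual_norm(W, x, s) for s in (zero, diag, ns_diag))
print(r_zero, r_diag, r_ns)
```

In this high α setting the residuals are expected to shrink from the zeros vector to the diagonal solution to the two-term NS; at low α the ordering is not guaranteed, which is the motivation for a more robust initial matrix.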
Recently, the stair matrix has emerged as an initial matrix, since it is more robust than the diagonal matrix at the cost of slightly more multiplications. The stair matrix $\mathbf{S}$, a tridiagonal matrix taken from $\mathbf{W}$ with entries $s_{m,k}$, satisfies one of the following conditions:

$$\text{(I)}\ s_{m,m-1} = 0,\ s_{m,m+1} = 0,\ m = 1, 3, \cdots, 2\lfloor (N_u + 1)/2 \rfloor - 1; \qquad \text{(II)}\ s_{m,m-1} = 0,\ s_{m,m+1} = 0,\ m = 2, 4, \cdots, 2\lfloor N_u/2 \rfloor, \tag{10}$$

where $\lfloor \cdot \rfloor$ is the floor operator. The authors in [17–19] used an initial solution $\mathbf{s}^{(0)}_{\mathrm{Stair}}$ for accelerating the convergence rate of the GS solution as follows,

$$\mathbf{s}^{(0)}_{\mathrm{Stair}} = \mathbf{S}^{-1}\mathbf{x}. \tag{11}$$

However, in [14–25], an initial solution in the form of a two-term NS based on the stair matrix was not tried in the GS. In addition, a robust initial solution is required for a high accuracy of estimation and fast convergence of the GS solution in massive MIMO systems where α is less than 5.83. Therefore, this paper uses an initial solution $\mathbf{s}^{(0)}_{\mathrm{NS,Stair}}$ for improving the accuracy of estimation and accelerating the convergence rate of the GS solution compared to the existing studies as follows,

$$\mathbf{s}^{(0)}_{\mathrm{NS,Stair}} = \left(\mathbf{I}_{N_u} - \mathbf{S}^{-1}(\mathbf{W} - \mathbf{S})\right)\mathbf{S}^{-1}\mathbf{x}. \tag{12}$$

$\mathbf{s}^{(i)}$ converges to $\mathbf{s}_{ZF}$ as the number of iterations increases. Figure 2 shows the GS pre-coding symbol $s^{(i)}_m$ with respect to i for an initial solution based on (a) the stair matrix $\mathbf{S}$ and (b) the diagonal matrix $\mathbf{D}$. The Euclidean distance (ED) between $s^{(i)}_m$ and $s_{m,ZF}$ decreases as i increases. Therefore, the ED is a monotonically decreasing function with respect to i. Finally, the convergence of $s^{(i)}_m$ accelerates when α increases, since the diagonal dominance of $\mathbf{W}$ is proportional to α. Additionally, the convergence rate for case (a) is faster than for case (b), since $\mathbf{S}$ is closer to $\mathbf{W}$ than $\mathbf{D}$. However, $s^{(i)}_m$ in the 100 × 50 system does not perfectly converge to $s_{m,ZF}$ in either case (a) or (b), since the maximum number of iterations is insufficient.

For estimating the number of optimal iterations, one can consider the absolute method, which is a mathematical analysis for obtaining closed-form solutions of $E[\mathbf{s}^{(i)}]$. However, the closed-form approach has several disadvantages: First, the mathematical analysis is very complex, since (5) includes an iterative method with respect to i. Second, the closed-form solution should be analyzed according to $\mathbf{s}^{(0)}$. Third, the main advantages of massive MIMO systems, such as the channel hardening effect and favorable propagation, gradually become invalid as α decreases, which causes severe errors for several random matrix theories. Therefore, this paper proposes the relative method using the NED, which is a relative value of the ED, for estimating the number of optimal iterations. The i-th NED is calculated at each i, and $\hat{i}$ is estimated as the first i at which the NED is lower than a threshold, which should be set to a small number for optimal error performance before the algorithm is performed. The proposed estimation scheme, which uses the relative method, does not require complex analysis, and the algorithm does not vary with the initial solution or α. For general expressions, the ED $e^{(i)}$ is calculated as in (13).

Appl. Sci. 2020, 10, 8735

The proposed scheme uses the initial solution $\mathbf{s}^{(0)}_{\mathrm{NS,Stair}}$ for improving the estimation accuracy and accelerating the convergence rate of the GS solution. After the calculation of $\mathbf{s}^{(0)}$, the proposed algorithm is initiated with i = 1 and the first NED is calculated as in (14), where the first NED is the same as the ED $e^{(1)}$, since $\mathbf{s}^{(0)}$ is obtained without the GS pre-coding and the ED between $\mathbf{s}^{(0)}$ and $\mathbf{s}_{ZF}$ is large at low α, like the results shown in Figure 2. However, the result $e^{(1)}$ is used in the next loop for the normalization factor.
The second NED is calculated in (15) as the ratio of the ED $e^{(2)}$ to the normalization factor $\gamma^{(2)}$, where $\gamma^{(i)}$ is the i-th normalization factor, calculated in (16) as the summation from $e^{(1)}$ to $e^{(i)}$, except for $\gamma^{(1)}$. The NED, represented as this ratio, stands for the relative expression of the ED, since $\gamma^{(i)}$ is the summation of the EDs from $e^{(1)}$ to $e^{(i)}$. Then, the second NED is compared with a threshold η. The number of optimal iterations $\hat{i}$ is estimated as two when the second NED is lower than η. Otherwise, the next loop for i = 3 is performed, since $\mathbf{s}^{(i)}$ has not converged to $\mathbf{s}_{ZF}$. In this way, the i-th NED is calculated in (17). The algorithm ends when the $\hat{i}$-th NED is lower than η, and the number of estimated iterations is $\hat{i}$. Finally, the GS pre-coding symbol vector $\mathbf{s}$ in (6) is generated by applying the matched filter to the GS solution $\mathbf{s}^{(\hat{i})}$.

Table 1 represents the number of required multiplications for the conventional GS pre-coding, the GS pre-coding based on the proposed estimation scheme, and the ZF pre-coding. In Table 1, only the number of multiplications is considered, since the main operation is multiplication. The GS pre-coding based on the proposed estimation scheme has nearly the same complexity as conventional GS pre-coding, since the only additional operations are the calculations of $e^{(i)}$, $i = 1, 2, \cdots, \hat{i}$, which require only $\hat{i} N_u$ multiplications; the order of this complexity with respect to $N_u$ is merely one.
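The stopping rule can be illustrated on a plain sequence of EDs. The sketch below assumes that the NED for i ≥ 2 is the ratio of e(i) to the running sum of e(1), …, e(i), as described in the text; the function name `estimate_iterations` and the geometric ED sequence are hypothetical and serve only as an illustration.

```python
def estimate_iterations(eds, eta=0.01):
    """Return the first i >= 2 whose NED e(i) / (e(1) + ... + e(i)) is below eta.

    eds[i - 1] holds the ED e(i). Returns None if no index qualifies.
    """
    gamma = 0.0
    for i, e in enumerate(eds, start=1):
        gamma += e                       # running normalization factor
        if i >= 2 and gamma > 0 and e / gamma < eta:
            return i
    return None

# Assumed EDs that halve at every iteration: 1, 1/2, 1/4, ...
eds = [0.5 ** k for k in range(12)]
i_hat = estimate_iterations(eds)
print(i_hat)  # 7, since 0.015625 / 1.984375 < 0.01 while 0.03125 / 1.96875 > 0.01
```

A faster-decaying ED sequence, as expected at higher α or with a better initial solution, reaches the threshold at a smaller i.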
However, the GS pre-coding based on the proposed estimation scheme has a greater complexity than the ZF when the number of estimated iterations is high. Massive MIMO systems in which α is low and an inappropriate initial matrix is used require a high number of iterations for optimal error performance. Therefore, the maximum number of iterations $i_{Max}$ should be limited in advance. The difference in the number of multiplications between the ZF pre-coding and the GS pre-coding based on the proposed estimation scheme is defined as $C_D = C_{ZF} - C_{Pro}$ and is calculated in (18). In (18), $i_{Max}$ is initialized as the maximum i that satisfies the condition $C_D > 0$ for a given $N_u$. Table 2 represents $i_{Max}$ with respect to the number of users when $N_t$ is 100 and the initial solution is $\mathbf{s}^{(0)}_{\mathrm{NS,Stair}}$. From (18), $i_{Max}$ is calculated as $i_{Max} = N_u - 4$.

Table 1. The number of required multiplications for conventional GS pre-coding, GS pre-coding based on the proposed estimation scheme, and ZF pre-coding.

Scheme The Number of Multiplications
GS pre-coding without estimation ($C_{GS}$)
GS pre-coding based on proposed estimation scheme ($C_{Pro}$)
ZF pre-coding ($C_{ZF}$)

The proposed algorithm initializes $i_{Max}$, and the algorithm outputs $i_{Max}$ when the number of optimal iterations is not estimated by the $(i_{Max} - 1)$-th loop. Algorithm 1 represents the algorithm for the GS pre-coding based on the proposed estimation scheme. For another initial matrix, only $i_{Max}$ and $\mathbf{s}^{(0)}$ are varied. In Table 3, $\beta = N_u/(N_t - N_u)$ is an approximate normalization factor for the transmit power, obtained using the distribution of the Wishart matrix in [30], for which $E[\mathrm{tr}(\mathbf{W}^{-1})] = N_u/(N_t - N_u)$, where $\mathrm{tr}(\cdot)$ is the trace operator. β agrees with $\mathrm{tr}(\mathbf{W}^{-1})$ when α is larger than or equal to 2 [14]. Finally, the proposed scheme sets η to 0.01, so that the proposed algorithm outputs $\hat{i}$ when the NED between consecutive GS solutions is less than 1%. η = 0.01 corresponds to 1% since the total summation of the NEDs is one. However, this threshold is not an optimal value, and slight errors between the GS solutions based on the optimal and the estimated numbers of iterations occur due to the threshold. These errors almost vanish after digital demodulation. The calculation of an optimal threshold requires further study in future work.

Algorithm 1. An algorithm for GS pre-coding using the proposed estimation scheme.
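The loop described for Algorithm 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' listing: the ED is taken as the Euclidean norm of the difference between consecutive GS solutions, the NED as the ratio of the ED to the running sum of EDs, the initial solution is the simpler diagonal one rather than the NS-stair solution, and the final normalization by β is omitted. The cap i_Max = N_u − 4 and the threshold η = 0.01 are the values quoted in the text; all function names, the seed, and the QPSK symbols are hypothetical.

```python
import random

def gram(H):
    """Gram matrix W = H H^H for H given as a list of rows h_m^T."""
    return [[sum(a * b.conjugate() for a, b in zip(hm, hk)) for hk in H]
            for hm in H]

def gs_sweep(W, x, s_prev):
    """One GS iteration: forward substitution for (D + L) s = x - U s_prev."""
    n = len(x)
    s = [0j] * n
    for m in range(n):
        lower = sum(W[m][k] * s[k] for k in range(m))
        upper = sum(W[m][k] * s_prev[k] for k in range(m + 1, n))
        s[m] = (x[m] - lower - upper) / W[m][m]
    return s

def residual_norm(W, x, s):
    """Euclidean norm of x - W s."""
    r = [xm - sum(w * sk for w, sk in zip(row, s)) for row, xm in zip(W, x)]
    return sum(abs(v) ** 2 for v in r) ** 0.5

def gs_with_estimation(W, x, s0, eta=0.01, i_max=None):
    """Return (i_hat, s), where i_hat is the first i >= 2 with NED < eta."""
    if i_max is None:
        i_max = max(len(x) - 4, 2)       # i_Max = N_u - 4, as quoted in the text
    s_prev, gamma = s0, 0.0
    for i in range(1, i_max + 1):
        s = gs_sweep(W, x, s_prev)
        e = sum(abs(a - b) ** 2 for a, b in zip(s, s_prev)) ** 0.5  # assumed ED
        gamma += e                       # assumed normalization: running sum of EDs
        if i >= 2 and (gamma == 0.0 or e / gamma < eta):
            return i, s
        s_prev = s
    return i_max, s_prev                 # cap reached: output i_Max

rng = random.Random(7)
N_t, N_u = 100, 20                       # one of the evaluated system sizes
sigma = 0.5 ** 0.5
H = [[complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(N_t)]
     for _ in range(N_u)]
W = gram(H)
x = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / 2 ** 0.5
     for _ in range(N_u)]
s0 = [x[m] / W[m][m] for m in range(N_u)]    # diagonal initial solution D^{-1} x
i_hat, s_hat = gs_with_estimation(W, x, s0)
# Matched filter H^H applied to the GS solution (normalization by beta omitted).
t = [sum(H[m][n].conjugate() * s_hat[m] for m in range(N_u)) for n in range(N_t)]
print(i_hat, residual_norm(W, x, s_hat))
```

Only the initial solution and the cap would change for another initial matrix, which mirrors the remark that only $i_{Max}$ and $\mathbf{s}^{(0)}$ are varied.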

Performance Evaluations
For performance evaluations of the proposed estimation scheme, the bit error rate (BER) performances and the complexity are measured. However, for a better understanding of the BER performances, the average magnitude of the diagonal dominance of $\mathbf{W}$ and the mean square error (MSE) performances between the ZF pre-coding symbols and the GS pre-coding symbols with respect to various initial solutions are shown in advance. In addition, the NEDs with respect to the number of users are shown for clear visualization of the results in Figure 2. Table 3 represents the parameters used for performance evaluations. In Table 3, the uplink transmit power for channel estimation is set 3 dB lower than the downlink transmit power P.
The proposed scheme uses $\mathbf{s}^{(0)}_{\mathrm{NS,Stair}}$ as the initial solution. However, additional performances are evaluated by using the initial solution $\mathbf{s}^{(0)}_{\mathrm{Diag}}$, which is used in [15,22,25], to show the high accuracy of the proposed estimation scheme.
The average magnitude for the diagonal dominance of W is calculated as follows, where w mk is the entry for the m-th row and the k-th column of W. The average magnitude for the diagonal dominance of W decreases as the number of users increases since the law of large numbers is weak in a low α system. Figure 3 shows the MSE performances between the ZF pre-coding symbols and the GS pre-coding symbols with respect to various initial solutions in (8), (9), (11), and (12). An initial solution based on the two term NS has a better MSE performance than the initial solution without the NS since the NS accelerates the inversion of the initial matrix to W −1 . However, the MSE performances for an initial solution based on the two term NS are converged to the MSE performances for an initial solution without the NS since the diagonal dominance of W decreases as α is decreased. In addition, an initial solution based on the stair matrix has a better MSE performance than an initial solution based on the diagonal matrix since the stair matrix includes non-diagonal entries additionally unlike the diagonal matrix.
numbers is weak in a low α system. Figure 3 shows the MSE performances between the ZF pre-coding symbols and the GS precoding symbols with respect to various initial solutions in (8), (9), (11), and (12). An initial solution based on the two term NS has a better MSE performance than the initial solution without the NS since the NS accelerates the inversion of the initial matrix to 1 − W . However, the MSE performances for an initial solution based on the two term NS are converged to the MSE performances for an initial solution without the NS since the diagonal dominance of W decreases as α is decreased. In addition, an initial solution based on the stair matrix has a better MSE performance than an initial solution based on the diagonal matrix since the stair matrix includes non-diagonal entries additionally unlike the diagonal matrix.    NS,Stair as an initial solution is lower than the NED using s (0) Diag as an initial solution since the stair matrix includes additional non-diagonal entries of W and NS is applied. In addition, the NED decreases as the number of iterations increases since the diagonal dominance of W is proportional to α.
numbers is weak in a low α system. Figure 3 shows the MSE performances between the ZF pre-coding symbols and the GS precoding symbols with respect to various initial solutions in (8), (9), (11), and (12). An initial solution based on the two term NS has a better MSE performance than the initial solution without the NS since the NS accelerates the inversion of the initial matrix to 1 − W . However, the MSE performances for an initial solution based on the two term NS are converged to the MSE performances for an initial solution without the NS since the diagonal dominance of W decreases as α is decreased. In addition, an initial solution based on the stair matrix has a better MSE performance than an initial solution based on the diagonal matrix since the stair matrix includes non-diagonal entries additionally unlike the diagonal matrix.    Figure 5 shows the BER performances for the GS pre-coding, the Richardson pre-coding-which is one of popular low-complexity ZF methods in [11][12][13]-and the ZF pre-coding in 100 × 50 systems when an initial solution is s (0) NS,Stair and the 16-quadrature amplitude modulation (QAM) is used. The Richardson solution is perfectly approximated to the ZF solution in any α system like the GS pre-coding. In Figure 5, parameters L G and L R are the number of iterations in the GS pre-coding and the Richardson pre-coding, respectively. The importance of estimation of the number of optimal iterations is shown, where the GS pre-coding and the Richardson pre-coding have a lower BER performance than the ZF pre-coding since the number of iterations are insufficient. Figure 5 shows the BER performances for the GS pre-coding, the Richardson pre-coding-which is one of popular low-complexity ZF methods in [11][12][13]-and the ZF pre-coding in 100 50 × systems when an initial solution is ( ) 0 NS,Stair s and the 16-quadrature amplitude modulation (QAM) is used. 
The Richardson solution is perfectly approximated to the ZF solution in any α system like the GS precoding. In Figure 5, parameters G L and R L are the number of iterations in the GS pre-coding and the Richardson pre-coding, respectively. The importance of estimation of the number of optimal iterations is shown, where the GS pre-coding and the Richardson pre-coding have a lower BER performance than the ZF pre-coding since the number of iterations are insufficient.  Figure 6 shows the BER performances for the conventional GS pre-coding, the Richardson precoding, and the ZF pre-coding in a 100 50 × system with respect to distance between the BS and users when an initial solution is ( ) where W P is total transmit power at the BS, R P is path loss, and 0 N is the thermal noise power of the user. R P is calculated as follows: 4 20 log 10 log 10 10 , where c f is carrier frequency, c is speed of light, n is path loss exponent, and t G is the transmit antenna gain, respectively. Table 4 represents the used simulation parameters in Figure 6. Like the results in Figure 5, the GS pre-coding and the Richardson pre-coding have lower BER performances than the ZF pre-coding since the number of iterations is not enough. Again, an estimation of the number of optimal iterations is important in the iterative pre-coding method.  Figure 6 shows the BER performances for the conventional GS pre-coding, the Richardson pre-coding, and the ZF pre-coding in a 100 × 50 system with respect to distance between the BS and users when an initial solution is s (0) NS,Stair and the 16-QAM is used. For measuring the BER performances, the signal to noise ratio (SNR) value of user S d is calculated according to given distance d as follows: where P W is total transmit power at the BS, P R is path loss, and N 0 is the thermal noise power of the user. 
$P_R$ is calculated as follows:

$$P_R = 20\log_{10}\left(\frac{4\pi f_c}{c}\right) + 10\,n\log_{10}(d) - G_t,$$

where $f_c$ is the carrier frequency, $c$ is the speed of light, $n$ is the path loss exponent, and $G_t$ is the transmit antenna gain. Table 4 lists the simulation parameters used in Figure 6. As in Figure 5, the GS pre-coding and the Richardson pre-coding have worse BER performance than the ZF pre-coding since the numbers of iterations are not sufficient. Again, estimating the number of optimal iterations is important in iterative pre-coding methods.

Table 4. The simulation parameters used in Figure 6.

Parameter Value
$P_W$ 37 dBm (6 dB less than the maximum total radiated power in the 5G new radio (NR) standards)

Figure 7 shows the BER performances for the GS pre-coding based on the number of optimal iterations, the GS pre-coding based on the proposed estimation scheme, and the ZF pre-coding in 100 × 20, 100 × 30, 100 × 40, and 100 × 50 systems when the initial solution is $s^{(0)}_{\mathrm{NS,Stair}}$ and 16-QAM is used. Table 5 lists the number of optimal iterations $i_{opt}$ for the GS pre-coding when the initial solutions are $s^{(0)}_{\mathrm{NS,Stair}}$ and $s^{(0)}_{\mathrm{Diag}}$. Here, $i_{opt}$ is defined as the minimum number of iterations beyond which the GS solution no longer varies. It is obtained in advance by simulations, in which the authors find the $i_{opt}$ at which the GS solution in (6) has the same value as the ZF solution. However, $i_{opt}$ can differ according to the channel environment, and it is used merely as a benchmark for the proposed estimation scheme. In Table 5, the initial solution $s^{(0)}_{\mathrm{Diag}}$ requires more iterations than the initial solution $s^{(0)}_{\mathrm{NS,Stair}}$. In Figure 7, $i_{Avg}$ is defined as the average of all estimated $\hat{i}$ values over 100,000 Monte-Carlo simulation runs. The $i_{Avg}$ is 3.08, 4.84, 7.78, and 12.27 in 100 × 20, 100 × 30, 100 × 40, and 100 × 50 systems, respectively. The proposed scheme estimates the number of iterations and terminates the algorithm when the NED between consecutive GS solutions is less than 1%. The estimated results agree with the NEDs in Figure 4 for the stair matrix with the two term NS. In Figure 4, the number of iterations at which the NED reaches $10^{-2}$, expressed as a natural number, is about 3, 5, 8, and 12 in 100 × 20, 100 × 30, 100 × 40, and 100 × 50 systems, respectively.
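The stopping rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the NED is assumed here to be the squared Euclidean distance between consecutive GS solutions divided by the squared norm of the newer solution, a simple diagonal initial solution stands in for the stair-matrix NS initial solution, and the function names (`random_channel`, `gram`, `ned`, `gs_with_ned_stop`) are hypothetical.

```python
import math
import random


def random_channel(n_users, n_tx, seed=0):
    """i.i.d. Rayleigh channel: n_users rows of n_tx unit-variance complex gains."""
    rng = random.Random(seed)
    sigma = math.sqrt(0.5)
    return [[complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n_tx)]
            for _ in range(n_users)]


def gram(H):
    """Gram matrix W = H H^H (n_users x n_users)."""
    return [[sum(a * b.conjugate() for a, b in zip(ri, rj)) for rj in H] for ri in H]


def ned(new, old):
    """Assumed NED: ||new - old||^2 / ||new||^2 (the paper's exact form may differ)."""
    num = sum(abs(a - b) ** 2 for a, b in zip(new, old))
    den = sum(abs(a) ** 2 for a in new)
    return num / den


def gs_with_ned_stop(W, s, threshold=1e-2, max_iter=100):
    """Solve W t = s by Gauss-Seidel sweeps; stop when the NED drops below threshold.

    Returns the solution and the number of sweeps used, which plays the
    role of the estimated number of iterations.
    """
    n = len(s)
    # Assumption: diagonal initial solution D^{-1} s, not the stair-matrix NS one.
    t = [s[k] / W[k][k] for k in range(n)]
    for i in range(1, max_iter + 1):
        prev = list(t)
        for k in range(n):
            # In-place update: entries j < k are already new, entries j > k are old.
            acc = sum(W[k][j] * t[j] for j in range(n) if j != k)
            t[k] = (s[k] - acc) / W[k][k]
        if ned(t, prev) < threshold:
            return t, i
    return t, max_iter


if __name__ == "__main__":
    H = random_channel(n_users=20, n_tx=100, seed=1)
    rng = random.Random(2)
    levels = (-3, -1, 1, 3)  # 16-QAM amplitude levels per axis
    s = [complex(rng.choice(levels), rng.choice(levels)) for _ in range(20)]
    t, i_hat = gs_with_ned_stop(gram(H), s)
    print("estimated number of iterations:", i_hat)
```

Because only consecutive solutions are compared, the rule needs neither the ZF solution nor a closed-form expression for the GS symbols. The iteration counts it produces depend on the assumed NED form and initial solution, so they should not be expected to match the reported $i_{Avg}$ values.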
For clear comparison, the estimation error between $i_{opt}$ and $i_{Avg}$ for a given α is defined as $\delta_\alpha$ as follows:

$$\delta_\alpha = \left| i_{opt} - i_{Avg} \right|.$$

The calculated $\delta_5$, $\delta_{3.3}$, $\delta_{2.5}$, and $\delta_2$ are 0.08, 0.16, 0.22, and 0.27, respectively. All are less than one, so they do not cause serious BER loss compared to the ZF pre-coding. However, the accuracy is reduced as the number of users increases, since the diagonal dominance of $W$ weakens as α decreases.

Figure 8 shows the BER performances for the GS pre-coding based on the number of optimal iterations, the GS pre-coding based on the proposed estimation scheme, and the ZF pre-coding in 100 × 20, 100 × 30, 100 × 40, and 100 × 50 systems when the initial solution is $s^{(0)}_{\mathrm{Diag}}$ and 16-QAM is used. In Figure 8, the calculated $\delta_5$, $\delta_{3.3}$, $\delta_{2.5}$, and $\delta_2$ are 0.38, 0.64, 0.74, and 1.21, respectively. The estimation errors are higher than those in Figure 7 since this initial solution is not robust in low α systems, which have poor MSE performances (see Figure 3). Specifically, $\delta_2$ is larger than one, which causes BER loss compared to the ZF pre-coding. Therefore, the proposed scheme uses $s^{(0)}_{\mathrm{NS,Stair}}$ as the initial solution. [...] is an initial solution is $iN_u^2 + N_tN_u + N_u$, which is slightly higher than $C_{\mathrm{Pro}}$ in Table 1.

Figure 9 shows the BER performances for the GS pre-coding based on the number of optimal iterations, the GS pre-coding based on the proposed estimation scheme, and the ZF pre-coding in 100 × 20, 100 × 30, 100 × 40, and 100 × 50 systems when the initial solution is $s^{(0)}_{\mathrm{NS,Stair}}$ and 64-QAM is used. In Figure 9, the calculated $\delta_5$, $\delta_{3.3}$, $\delta_{2.5}$, and $\delta_2$ are 0.11, 0.14, 0.23, and 0.28, respectively.
The estimation errors are similar to the results in Figure 7 since a different modulation only changes the SNR gain and does not affect the estimation of the number of iterations.

Figure 10 shows the BER performances for the GS pre-coding based on the number of optimal iterations, the GS pre-coding based on the proposed estimation scheme, and the ZF pre-coding in 256 × 64 (α = 4) and 256 × 128 (α = 2) systems when the initial solution is $s^{(0)}_{\mathrm{NS,Stair}}$ and 16-QAM is used. In Figure 10, the calculated $\delta_4$ and $\delta_2$ are 0.07 and 0.28, respectively. The estimation errors are similar to the results in Figure 7 since the behavior of the massive MIMO system depends mainly on α, so the estimation performance does not vary with the absolute numbers of transmit antennas and users.
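As a quick arithmetic check of the estimation errors reported for Figure 7, the sketch below assumes that $\delta_\alpha$ is the absolute difference between $i_{opt}$ and $i_{Avg}$. The $i_{opt}$ values of 3, 5, 8, and 12 are not quoted from Table 5. They are assumed here because they are the natural numbers nearest to the reported $i_{Avg}$ values and match the iteration counts read from Figure 4. The function and variable names are hypothetical.

```python
def estimation_error(i_opt, i_avg):
    """Assumed definition: delta_alpha = |i_opt - i_avg|."""
    return abs(i_opt - i_avg)


# alpha -> average estimated number of iterations reported for Figure 7
I_AVG = {5: 3.08, 3.3: 4.84, 2.5: 7.78, 2: 12.27}

# alpha -> optimal number of iterations (assumption, not quoted from Table 5)
I_OPT_ASSUMED = {5: 3, 3.3: 5, 2.5: 8, 2: 12}


def errors_by_alpha():
    """Estimation error per loading factor, rounded to two decimals."""
    return {a: round(estimation_error(I_OPT_ASSUMED[a], I_AVG[a]), 2) for a in I_AVG}


if __name__ == "__main__":
    for alpha, delta in errors_by_alpha().items():
        print(f"alpha = {alpha}: delta = {delta}")
```

Under these assumptions the sketch reproduces the reported values of 0.08, 0.16, 0.22, and 0.27 for α = 5, 3.3, 2.5, and 2.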
Figure 11 shows the BER performances for the GS pre-coding based on the number of optimal iterations, the GS pre-coding based on the proposed estimation scheme, and the ZF pre-coding in 100 × 20, 100 × 30, 100 × 40, and 100 × 50 systems when the initial solution is $s^{(0)}_{\mathrm{NS,Stair}}$ and 16-QAM is used in Rician fading. In Figure 11, the line of sight (LoS) component is applied to the first path of the eight-path fading channel, and the K factor, which denotes the power ratio of the LoS path to the non-LoS (NLoS) paths, is set to 5 dB. Because the LoS path is applied, the GS pre-coding requires more iterations to obtain the same BER performance than in Figure 7, where the fading channel is modeled as Rayleigh fading. The calculated $\delta_5$, $\delta_{3.3}$, $\delta_{2.5}$, and $\delta_2$ are 0.13, 0.18, 0.47, and 0.75, respectively. All are less than one, and the accuracy of estimation is not seriously reduced compared to the results in Figure 7 since the proposed estimation scheme uses a relative method which calculates the NED.

Figure 12 shows the number of multiplications required for the GS pre-coding based on the proposed estimation scheme when the initial solution is $s^{(0)}_{\mathrm{NS,Stair}}$. In Figure 12, the values of $i_{Avg}$ are the results in Figure 7. The number of multiplications for the GS pre-coding based on the proposed estimation scheme increases slightly as $i_{Avg}$ increases. In addition, it is only slightly higher than the number of multiplications for the conventional GS pre-coding, since the additional complexity of the proposed scheme is only $iN_u$ multiplications for calculating the NEDs.
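The size of the NED overhead can be made concrete with a short sketch. It takes the $iN_u$ extra multiplications stated above and compares them with $iN_u^2$, which is assumed here to be the cost of the GS sweeps themselves (the leading term of the multiplication count quoted earlier in this section). The full expression for $C_{\mathrm{Pro}}$ in Table 1 is not reproduced, and the function names are hypothetical.

```python
def ned_overhead(i_avg, n_users):
    """Extra multiplications for the NED checks: i * N_u."""
    return i_avg * n_users


def gs_sweep_cost(i_avg, n_users):
    """Assumed cost of the GS sweeps alone: i * N_u^2 multiplications."""
    return i_avg * n_users ** 2


def overhead_ratio(i_avg, n_users):
    """NED overhead relative to the assumed sweep cost (reduces to 1 / N_u)."""
    return ned_overhead(i_avg, n_users) / gs_sweep_cost(i_avg, n_users)


# number of users -> i_Avg reported for the 100-antenna systems in Figure 7
SYSTEMS = {20: 3.08, 30: 4.84, 40: 7.78, 50: 12.27}

if __name__ == "__main__":
    for n_users, i_avg in SYSTEMS.items():
        extra = ned_overhead(i_avg, n_users)
        ratio = overhead_ratio(i_avg, n_users)
        print(f"N_u = {n_users}: {extra:.1f} extra multiplications, ratio {ratio:.3f}")
```

The ratio reduces to $1/N_u$, so under this assumption the overhead is 5% of the sweep cost for 20 users and 2% for 50 users, which is consistent with the slightly higher multiplication count described for Figure 12.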

Conclusions
The proposed scheme calculates the NEDs with respect to the number of iterations to achieve optimal error performance in downlink massive MIMO systems. To distinguish the proposed scheme from earlier work, this paper considers massive MIMO systems where α is greater than or equal to two, a range that has not been studied for the GS pre-coding. The proposed scheme uses the two term NS based on the stair matrix as an initial solution for the first time in the GS pre-coding, which improves the accuracy of estimation and accelerates the convergence rate of the GS pre-coding compared to the existing initial solutions. The proposed scheme estimates the number of optimal iterations with high accuracy, without requiring a closed-form solution for the GS pre-coding symbols. Additionally, the added complexity for the GS pre-coding based on the proposed estimation scheme is very low. However, the accuracy of the proposed estimation scheme decreases as α decreases. In low α systems such as α = 2, an initial solution that is more robust than the two term NS based on the stair matrix and an optimal adaptive threshold are required. Thus, our future work will focus on the design of an adaptive optimal initial solution and threshold according to a given α.