A Low Complexity Channel Estimation and Detection for Massive MIMO Using SC-FDE

Abstract: 5G Communications will support millimeter waves (mm-Wave), alongside the conventional centimeter waves, which will enable much higher throughputs and facilitate the employment of hundreds or thousands of antenna elements, commonly referred to as massive Multiple Input–Multiple Output (MIMO) systems. This article proposes and studies an efficient low complexity receiver that jointly performs channel estimation based on superimposed pilots, and data detection, optimized for massive MIMO (m-MIMO). Superimposed pilots suppress the overheads associated with channel estimation based on conventional pilot symbols, which tends to be more demanding in the case of m-MIMO, leading to a reduction in spectral efficiency. On the other hand, MIMO systems tend to be associated with an increase of complexity and increase of signal processing, with an exponential increase with the number of transmit and receive antennas. A reduction of complexity is obtained with the use of the two proposed algorithms. These algorithms reduce the complexity but present the disadvantage that they generate a certain level of interference. In this article, we consider an iterative receiver that performs the channel estimation using superimposed pilots and data detection, while mitigating the interference associated with the proposed algorithms, leading to a performance very close to that obtained with conventional pilots, but without the corresponding loss in the spectral efficiency.


Introduction
Massive Multiple-Input-Multiple-Output (MIMO) and millimeter-wave (mm-Wave) communications are two key techniques that will support the improvements of 5G (Fifth Generation) communications, such as improved spectral efficiency and network capacity [1,2]. Such a combination of massive MIMO (m-MIMO) with mm-Wave [3] is being utilized by other systems, such as IEEE 802.11ad [4], using bands around 60 GHz [5].
For small cells, dedicated to increasing the capacity in a small geographic area of 5G communications, mm-Wave communications will be used [6]. Under these conditions, the area throughput will be augmented by increasing the bandwidth of the communication channels. However, it is well known that there is a very large path loss in the mm-Wave spectrum. Therefore, in order to mitigate such path loss, MIMO techniques such as beamforming will be implemented to guarantee a large array gain at the reception, so that the signal-to-noise ratio (SNR) of the received signal is acceptable. It should also be noted that due to the large path-loss in mm-Wave, the interference is lower, and the cell density can be increased by reducing the inter-base station distance, which leads to capacity gains.
The algorithms proposed in this article are based on the Maximum Ratio Combiner (MRC) [15] or on the Equal Gain Combiner (EGC) [10], and therefore, the computation requirements are kept at a lower level. The advantage of these algorithms lies in the fact that the computation of the pseudo-inverse of the channel is avoided, reducing the complexity. Nevertheless, since such algorithms are non-optimum, a certain interference is generated in the detection process. This is mitigated by incorporating an iterative receiver that performs the following functions:
1. Channel Estimation using superimposed pilots;
2. Data Detection;
3. Mitigation of the interference generated in the data detection process, using an Iterative Block-Decision Feedback Equalization (IB-DFE) technique [20].
This article presents and studies an iterative receiver based on the MRC/EGC algorithms that jointly performs channel estimation and data detection (including interference mitigation), optimized for m-MIMO and using SC-FDE signals.
This article is organized as follows: Section 2 describes the system and signal characterization for m-MIMO using SC-FDE transmissions; Section 3 describes the channel estimation using multiplexed or superimposed pilots; Section 4 analyzes the performance results and Section 5 concludes the article.
Telecom 2020, 1, 3

System and Signal Characterization
A MIMO scenario is assumed (Figure 1), with T transmit antennas and R receive antennas, using SC-FDE signals with Quadrature Phase Shift Keying (QPSK) modulation.
We consider an N-length time-domain block signal to be transmitted, of the form $\{x_n; n = 0, 1, \ldots, N-1\}$. After removing the cyclic prefix, and assuming a cyclic prefix longer than the overall channel impulse response of each channel, the received frequency-domain signal is

$$Y_k = H_k X_k + N_k, \qquad (1)$$

where $H_k$ denotes the channel frequency response for the k-th subcarrier (which is assumed invariant during the transmission of a given block), i.e., $\{H_k; k = 0, 1, \ldots, N-1\} = \mathrm{DFT}\{h_n; n = 0, 1, \ldots, N-1\}$. Moreover, $N_k$ is the frequency-domain block channel noise for that subcarrier. The received time-domain signal can be obtained from (1) as $\{y_n; n = 0, 1, \ldots, N-1\} = \mathrm{IDFT}\{Y_k; k = 0, 1, \ldots, N-1\}$ [15].
Assuming the conventional linear FDE for SC schemes, the post-processing comes, [...] where $\beta$ [...]. As expected, [...]. This means that the received symbol corresponds to the transmitted one, multiplied by $|H_k|^2$, which can be viewed as a gradient, with a noise factor $\beta_k^{(2)}$, and added to the noise $N_k^{eq}$. In addition, we define $\alpha$ as [...] (4). $N_k^{eq}$ denotes the equivalent noise for detection purposes, with [...].
This article focuses on the scenario with T data streams, where $R \gg T$. The t-th antenna has a block of N data symbols $\{x_n^{(t)}; n = 0, 1, \ldots, N-1\}$ to send. At the BS, the received block associated to the r-th receive antenna is represented by $\{y_n^{(r)}; n = 0, 1, \ldots, N-1\}$. As with other SC-FDE schemes, a cyclic prefix longer than the maximum overall channel impulse response is appended to each transmitted block and removed at the receiver. In this case, the corresponding frequency-domain block $\{Y_k^{(r)}; k = 0, 1, \ldots, N-1\}$ satisfies

$$\mathbf{Y}_k = [Y_k^{(1)}, \ldots, Y_k^{(R)}]^T = \mathbf{H}_k \mathbf{X}_k + \mathbf{N}_k,$$

where $\mathbf{H}_k$ denotes the $R \times T$ channel matrix for the k-th frequency, with (r, t)-th element $H_k^{(t,r)}$. The transmitted symbols are $\mathbf{X}_k = [X_k^{(1)}, \ldots, X_k^{(T)}]^T$. Let us consider the frequency-domain estimated data symbols $\tilde{\mathbf{X}}_k = [\tilde{X}_k^{(1)}, \ldots, \tilde{X}_k^{(T)}]^T$. Assuming a non-iterative receiver, we have

$$\tilde{\mathbf{X}}_k = \mathbf{B}_k \mathbf{Y}_k,$$

where $\mathbf{B}_k$ is defined as follows:
1. For the ZF receiver: $\mathbf{B}_k = (\mathbf{H}_k^H \mathbf{H}_k)^{-1} \mathbf{H}_k^H$.
2. For the MRC receiver: $\mathbf{B}_k = \mathbf{H}_k^H$.
3. For the EGC receiver: $\mathbf{B}_k = \exp\{j \arg(\mathbf{H}_k^H)\}$, with the exponential and the argument taken element by element.
A disadvantage of the ZF is the need to compute the pseudo-inverse of the channel matrix for each frequency component, which requires a high processing power. This article mitigates this limitation by using the MRC and EGC algorithms.
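To make the three choices of $\mathbf{B}_k$ concrete, the following NumPy sketch builds the ZF, MRC and EGC matrices for a single subcarrier and compares the diagonal and off-diagonal entries of $\mathbf{B}_k \mathbf{H}_k$. It assumes the standard forms of these receivers (pseudo-inverse, conjugate transpose, and element-wise phase of the conjugate transpose) and an i.i.d. Rayleigh channel stored as an R × T array. The function names are illustrative and are not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
T, R = 4, 32  # transmit / receive antennas (the 4 x 32 case used in the article)

# One subcarrier of an i.i.d. Rayleigh channel, stored as an R x T matrix
# (row r, column t), with unit average power per coefficient (assumption).
H = (rng.standard_normal((R, T)) + 1j * rng.standard_normal((R, T))) / np.sqrt(2)

def zf_matrix(H):
    # Zero forcing: pseudo-inverse (H^H H)^-1 H^H, one T x T solve per subcarrier.
    Hh = H.conj().T
    return np.linalg.solve(Hh @ H, Hh)

def mrc_matrix(H):
    # Maximum ratio combining: plain conjugate transpose, no inversion.
    return H.conj().T

def egc_matrix(H):
    # Equal gain combining: keep only the phase of each entry of H^H.
    return np.exp(1j * np.angle(H.conj().T))

for name, B in [("ZF", zf_matrix(H)), ("MRC", mrc_matrix(H)), ("EGC", egc_matrix(H))]:
    G = B @ H                                   # T x T effective channel seen by the data
    diag = np.abs(np.diag(G)).mean()
    off = np.abs(G - np.diag(np.diag(G))).sum() / (T * (T - 1))
    print(f"{name}: mean |diag| = {diag:.2f}, mean |off-diag| = {off:.2f}")
```

For the ZF the product is the identity, so the off-diagonal terms vanish at the cost of the matrix inversion. For the MRC and EGC the off-diagonal terms are non-zero but small relative to the diagonal when R is much larger than T, which is the residual interference that an interference canceller has to remove.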
For m-MIMO, with $R \gg 1$ and small correlation between the channels associated with different transmitting and receiving antennas, the elements outside the main diagonal of $\mathbf{A}_k^H \mathbf{H}_k$ are much lower than the ones at its diagonal, where $\mathbf{A}_k = \mathbf{H}_k$ for the MRC and $[\mathbf{A}_k]_{r,t} = \exp\{j \arg(H_k^{(t,r)})\}$ for the EGC, i.e., $\mathbf{B}_k = \mathbf{A}_k^H$ in both cases. Based on $\mathbf{A}_k^H \mathbf{H}_k$, we can implement the MRC or EGC in the frequency domain. Nevertheless, for moderate values of T/R, the residual interference can still present a certain level. In order to mitigate this, we implement the iterative receiver (interference canceller) depicted in Figure 2, as

$$\tilde{\mathbf{X}}_k = \mathbf{B}_k \mathbf{Y}_k - \mathbf{C}_k \bar{\mathbf{X}}_k,$$

where the frequency-domain estimated data symbols are $\tilde{\mathbf{X}}_k = [\tilde{X}_k^{(1)}, \ldots, \tilde{X}_k^{(T)}]^T$. The interference cancellation matrix $\mathbf{C}_k$ can be computed by

$$\mathbf{C}_k = \mathbf{A}_k^H \mathbf{H}_k - \mathbf{I},$$

where $\mathbf{I}$ is a $T \times T$ identity matrix. This interference canceller is implemented using $\{\bar{X}_k; k = 0, 1, \ldots, N-1\}$, with $\bar{X}_k$ denoting the frequency-domain average values conditioned to the FDE output for the previous iteration [11], with $\{\bar{X}_k\} = \mathrm{DFT}\{\bar{x}_n\}$. Note that $\bar{x}_n$ can be obtained as defined in [15]. For the first iteration, there is no information about the transmitted symbols and $\bar{X}_k = 0$.
It is worth noting that an increase in the number of transmitting antennas results in an increase of the symbol rate. Moreover, increasing the number of receiving antennas results in an increase of diversity and, as a result, in performance improvement. Apart from the EGC and MRC, the Matched Filter Bound (MFB) curve is a way to measure the channel modeled by the sum of delayed and independently Rayleigh-fading rays [15].
For a massive MIMO scheme with a large number of antennas at the receiver side, this technique does not require special pilot design. Nevertheless, a problem arises for a large number of transmit antennas with uncorrelated channels, since we should have orthogonal pilots for the different transmit antennas. This means switching off all antennas but one, in a successive way.
The proposed technique is generic and valid for all frequencies. It is particularly suitable for mm-Wave associated with massive MIMO, because the smaller wavelength allows smaller antennas and packing more antennas in a given device. The main difference when mm-Wave is employed lies in the type of channel, which, in general, has a lower number of multipath components.
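The iterative interference canceller of this section can be sketched numerically as follows. This is a minimal illustration under several assumptions that are not taken from the article: hard QPSK decisions replace the soft average values of [15], the combiner is normalised by the average diagonal of $\mathbf{A}_k^H \mathbf{H}_k$ so that the identity in the cancellation matrix is meaningful, the noise variance is an arbitrary value, and all function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, R, L = 64, 4, 32, 16        # block length, tx antennas, rx antennas, channel taps
sigma2_N = 1.0                    # noise variance per antenna and subcarrier (assumed)
n_iter = 4

def crandn(*shape):
    # Circular complex Gaussian samples with unit variance.
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def qpsk_hard(x):
    # Hard QPSK decisions, used here in place of the soft average values.
    return (np.sign(x.real) + 1j * np.sign(x.imag)) / np.sqrt(2)

# Time-domain QPSK blocks x[n, t] and their frequency-domain versions X[k, t].
x = qpsk_hard(crandn(N, T))
X = np.fft.fft(x, axis=0, norm="ortho")

# Multipath Rayleigh channel: H[k] is the R x T matrix of subcarrier k.
H = np.fft.fft(crandn(L, R, T) / np.sqrt(L), n=N, axis=0)
Y = np.einsum("krt,kt->kr", H, X) + np.sqrt(sigma2_N) * crandn(N, R)

A = H                                          # MRC; EGC would use np.exp(1j * np.angle(H))
AH = A.conj().transpose(0, 2, 1)               # A_k^H, shape (N, T, R)
gain = np.einsum("ktt->t", AH @ H).real / N    # average diagonal of A_k^H H_k per stream
B = AH / gain[None, :, None]                   # normalised combiner (assumption)
C = B @ H - np.eye(T)                          # interference cancellation matrix

X_bar = np.zeros((N, T), dtype=complex)        # no data information in the first iteration
errors = []
for _ in range(n_iter):
    X_tilde = np.einsum("ktr,kr->kt", B, Y) - np.einsum("kts,ks->kt", C, X_bar)
    x_hat = qpsk_hard(np.fft.ifft(X_tilde, axis=0, norm="ortho"))
    X_bar = np.fft.fft(x_hat, axis=0, norm="ortho")
    errors.append(int(np.sum(x_hat.real != x.real) + np.sum(x_hat.imag != x.imag)))

print("bit errors per iteration:", errors)
```

When the decisions fed back are correct, the cancellation term removes both the inter-stream interference and the per-subcarrier gain fluctuation, leaving only the filtered noise. No matrix inversion is needed at any point, which is the complexity argument made for the MRC and EGC.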

Channel Estimation
Let us first assume the use of conventional pilots, i.e., there is no data overlapping in the training block. Conventional pilots or training sequences comprise the periodic transmission of known symbols utilized by the receiver to compute the channel parameters that are required for equalization purposes. In this scenario, the channel frequency response is estimated by using [11]: [...] (13), where $Y_k^{(r)}$ denotes the signal at the r-th receive antenna (r = 1, 2, ..., R), and $X_k^{(t)P}$ denotes the pilot transmitted by the t-th transmit antenna (t = 1, 2, ..., T). Moreover, $\sigma_P^2$ denotes the power of the pilots, i.e., training sequences (P stands for the pilot). It is assumed that [21] [...]. If the pilots associated to different transmit antennas are orthogonal (e.g., by using disjoint sets of subcarriers for different antennas), then there is no interference between antennas when estimating the corresponding channels. In this case, we have [...]. Since the channel impulse response is shorter than the cyclic prefix (which is just a fraction of the block duration), we can employ the enhanced channel estimates as [13] [...] (16), where $w_n = 1$ if the n-th time-domain sample is inside the cyclic prefix and 0 otherwise. In this case, the SNR at the channel estimates is improved by a factor $T/T_G$, with $T$ and $T_G$ denoting the duration of the useful part of the block and the cyclic prefix, respectively.
Let us consider now the use of superimposed pilots, i.e., $X_k \neq 0$ for the subcarriers with pilots. Superimposed pilots suppress the overheads associated with channel estimation based on conventional pilot symbols; these overheads tend to be more demanding in the case of m-MIMO and lead to a reduction in spectral efficiency. In the following, we will assume that [21] [...], where $\sigma_D^2$ stands for the power of the data. Let us assume a frame with $N_T$ time-domain blocks, each with N subcarriers.
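The time-domain windowing behind the enhanced channel estimates of (16) can be illustrated with a short numerical sketch. It assumes that the enhancement amounts to taking the IDFT of a raw frequency-domain estimate, keeping only the samples inside the cyclic prefix through $w_n$, and taking the DFT again; the raw estimate is emulated here by adding noise to the true channel, and all names and numerical values other than N = 256 and the 16 paths are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N, N_G, L = 256, 32, 16   # block length, cyclic prefix length in samples, channel taps
sigma2 = 0.1              # error variance of the raw frequency-domain estimate (assumed)

# True channel: L equal-power Rayleigh taps, all inside the cyclic prefix.
h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2 * L)
H = np.fft.fft(h, n=N)

# Stand-in for the pilot-based estimate: true channel plus white estimation noise.
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
H_raw = H + noise

# w_n = 1 inside the cyclic prefix and 0 otherwise.
w = np.zeros(N)
w[:N_G] = 1.0
H_enh = np.fft.fft(w * np.fft.ifft(H_raw))    # window in time, return to frequency

mse_raw = np.mean(np.abs(H_raw - H) ** 2)
mse_enh = np.mean(np.abs(H_enh - H) ** 2)
print(f"MSE raw = {mse_raw:.4f}, enhanced = {mse_enh:.4f}, ratio = {mse_raw / mse_enh:.1f}")
```

Because the channel impulse response lies entirely inside the window, the channel itself is untouched, while only a fraction $N_G/N$ of the noise samples survives. The ratio of the two mean squared errors should therefore be close to $N/N_G = T/T_G$, which is 8 for these values.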
If the cyclic prefix of each FFT block has $N_G = N T_G / T$ samples, we will need $N_G$ equally spaced frequency-domain pilots for the channel estimation. For pilot spacings in time and frequency $\Delta N_T$ and $\Delta N_F$, respectively, the total number of pilots in the frame is given by

$$N_P^{Frame} = \frac{N N_T}{\Delta N_T \, \Delta N_F}.$$

This means that we have a pilot multiplicity or redundancy of

$$N_R = \frac{N_P^{Frame}}{N_G}.$$

To avoid significant performance degradation due to channel estimation errors, the SNR associated with the channel estimation, given by $SNR_{est} \approx N_R \sigma_P^2 / \sigma_D^2$, should be much higher than the SNR for the data, $SNR_{data} = \sigma_D^2 / \sigma_N^2$.
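As a worked example of these expressions, the sketch below evaluates $N_G$, $N_P^{Frame}$, $N_R$ and the resulting $SNR_{est}$ for the block parameters used in the simulations (N = 256, a useful block duration of 1 µs and a cyclic prefix of 0.125 µs, with a pilot on every subcarrier of every block). The frame length of $N_T = 5$ blocks is an illustrative assumption, and the function name is hypothetical.

```python
import math

def pilot_redundancy(N, N_T, T_useful, T_G, dN_T=1, dN_F=1):
    # N_G: cyclic prefix length in samples, i.e. pilots needed per channel estimate.
    N_G = N * T_G / T_useful
    # Total number of pilots in a frame of N_T blocks with the given spacings.
    N_frame_P = N * N_T / (dN_T * dN_F)
    # Pilot multiplicity (redundancy).
    return N_G, N_frame_P, N_frame_P / N_G

N_G, N_frame_P, N_R = pilot_redundancy(N=256, N_T=5, T_useful=1.0, T_G=0.125)

def snr_est_db(N_R, pilot_power_db):
    # SNR_est ~ N_R * sigma_P^2 / sigma_D^2, with the pilot power given relative to the data.
    return 10 * math.log10(N_R * 10 ** (pilot_power_db / 10))

print(N_G, N_frame_P, N_R)                       # 32.0 1280.0 40.0
print(round(snr_est_db(N_R, 0.0), 2))            # 16.02
print(round(snr_est_db(N_R, -3.0), 2))           # 13.02
```

Under these assumptions a redundancy of 40 gives an estimation SNR of about 16 dB for pilots at 0 dB, and lowering the pilot power by 3 dB lowers it by the same amount. Whether this is "much higher" than the data SNR depends on the operating point, which is why longer averaging windows help at high $E_b/N_0$.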
In this scenario, assuming superimposed pilots (pilots added to data), the data will represent interference to the channel estimation obtained with the pilots, and the pilots will represent interference to the channel estimation obtained with the data, which leads to a degradation of performance. This can be mitigated by employing pilots with relatively low power and averaging the pilots over a large number of blocks, so as to obtain accurate channel estimates (the window size should be such that the channel is constant within it). Since different data blocks are uncorrelated and the data symbols usually have zero mean, this approach tends to be very efficient.
Therefore, the channel is initially estimated from the pilots, and then we remove the interference (pilots) from the received signal to detect the data symbols.
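The effect of averaging over several blocks can be checked with a small single-antenna sketch. It assumes that the same constant-modulus pilot sequence is superimposed on every block, that the channel is constant over the averaging window, and that the channel coefficients are drawn independently per subcarrier for simplicity; these choices, the power values and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 256                                        # subcarriers per block
sigma2_D, sigma2_P, sigma2_N = 1.0, 1.0, 0.1   # data, pilot (0 dB) and noise powers (assumed)

def crandn(*shape):
    # Circular complex Gaussian samples with unit variance.
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def qpsk_blocks(n_blocks):
    # Frequency-domain data: DFT of random time-domain QPSK blocks (SC-FDE).
    bits = rng.choice([-1.0, 1.0], size=(2, n_blocks, N))
    x = (bits[0] + 1j * bits[1]) / np.sqrt(2)
    return np.sqrt(sigma2_D) * np.fft.fft(x, axis=1, norm="ortho")

H = crandn(N)                                               # channel, constant over the window
P = np.sqrt(sigma2_P) * np.exp(2j * np.pi * rng.random(N))  # constant-modulus pilots

def pilot_estimate_mse(n_blocks):
    X = qpsk_blocks(n_blocks)
    Y = H * (X + P) + np.sqrt(sigma2_N) * crandn(n_blocks, N)
    # Average the received blocks, then correlate with the known pilots.
    H_est = Y.mean(axis=0) * np.conj(P) / sigma2_P
    return np.mean(np.abs(H_est - H) ** 2)

mse = {n: pilot_estimate_mse(n) for n in (1, 5, 10)}
for n, value in mse.items():
    print(f"{n:2d} blocks: MSE = {value:.3f}")
```

Since the data symbols have zero mean and are uncorrelated between blocks, their contribution to the averaged signal shrinks in proportion to the number of blocks, and so does the noise, while the pilot contribution is preserved.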
The procedure for obtaining the channel estimates iterates over the following three steps (see the indexes in the chains of Figure 3):
1. Obtain the channel estimates from the pilots (if this is not the first iteration, the data estimated in the previous iteration is first removed from the received signal);
2. Obtain the channel estimates from the data, after removing the pilots (which represent interference) from the received signal;
3. Combine the channel estimates obtained from steps 1 and 2 to improve the estimate of the channel, and repeat the process.

For the first iteration (step 1), an initial channel estimation is obtained by correlating the received signal $Y_k^{(r)Rx}$ with the pilots, by using Equations (13) and then (16). As described above, if this is not the first iteration, then the data symbols are removed from the received blocks, as [...], and the channel estimation is calculated by using Equations (13) and (16). For the second iteration (step 2), the pilots are removed from the received blocks, as [...], and the average values of the data symbols will be used as pilots for obtaining the channel frequency response estimate, as in [11] [...] (22). Since using $\alpha = 0$ might lead to noise enhancement effects in the channel estimates when $|X_k^{(i)}|^2$ is small, we will consider $\alpha$ as defined in (4). If we have moderate to high SNR, then [...] and we could use $\alpha = 0$. Note that we employ pilots with relatively low power, and we average the pilots over a large number of blocks so as to obtain accurate channel estimates. The computation of Equation (22) corresponds to an improved channel estimate obtained with the data estimated from the previous iteration. This is very effective because the data symbols usually have zero mean and different data blocks are uncorrelated. Naturally, there are limitations on the length of this averaging window, since the channel should be constant within it.
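The role of $\alpha$ in the data-aided estimate can be illustrated with a sketch that assumes a regularised division of the received signal by the data reference, which may differ in its details from Equation (22). The value used for $\alpha$ (the noise variance, for a unit-power channel), the independent channel coefficients per subcarrier and the function name are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(4)
N, sigma2_N = 1024, 0.1

def crandn(n, var=1.0):
    # Circular complex Gaussian samples with the given variance.
    return np.sqrt(var / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

def data_aided_estimate(Y, X_ref, alpha):
    # Regularised division: alpha = 0 is a plain division by the data reference.
    return Y * np.conj(X_ref) / (alpha + np.abs(X_ref) ** 2)

# Frequency-domain SC-FDE data: its magnitude fluctuates from subcarrier to subcarrier.
x = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)
X = np.fft.fft(x, norm="ortho")

H = crandn(N)                       # unit-power channel coefficients
Y = H * X + crandn(N, sigma2_N)     # pilots assumed already removed

mse = {a: np.mean(np.abs(data_aided_estimate(Y, X, a) - H) ** 2) for a in (0.0, sigma2_N)}
print({a: round(float(v), 3) for a, v in mse.items()})
```

With $\alpha = 0$ the noise is divided by $|X_k|^2$, so the subcarriers where the frequency-domain data happens to be small dominate the error. A small positive $\alpha$ bounds that amplification, and at high SNR the two estimates become nearly identical.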
Finally, following step 3, the channel estimates obtained from the pilots ($\hat{H}_k^P$) and from the data ($\hat{H}_k^D$) can be combined to provide the normalized channel estimates with minimum error variance [14], as defined by [...], where [...].
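A combination with minimum error variance can be sketched as follows, assuming that the two estimates are unbiased, that their errors are independent, and that the error variances are known. Under those assumptions the weights are inversely proportional to the error variances; this is the standard result and may differ in notation or normalization from the expression in [14]. The function name and the numerical values are hypothetical.

```python
import numpy as np

def combine_min_variance(H_P, H_D, var_P, var_D):
    # Weight each estimate by the error variance of the other one.
    w_P = var_D / (var_P + var_D)
    return w_P * H_P + (1.0 - w_P) * H_D

rng = np.random.default_rng(3)
N = 4096
var_P, var_D = 0.2, 0.05            # error variances of the two estimates (assumed)

def crandn(n, var=1.0):
    # Circular complex Gaussian samples with the given variance.
    return np.sqrt(var / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

H = crandn(N)
H_P = H + crandn(N, var_P)          # pilot-based estimate
H_D = H + crandn(N, var_D)          # data-aided estimate
H_C = combine_min_variance(H_P, H_D, var_P, var_D)

mse_P, mse_D, mse_C = (np.mean(np.abs(e - H) ** 2) for e in (H_P, H_D, H_C))
print(f"pilots: {mse_P:.3f}, data: {mse_D:.3f}, combined: {mse_C:.3f}")
```

The combined error variance is $\sigma_P^2 \sigma_D^2 / (\sigma_P^2 + \sigma_D^2)$ in this setting, here 0.04, which is lower than either individual variance. This is why combining the pilot-based and data-aided estimates at each iteration improves on using either of them alone.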

Performance Results
This section analyses the Bit Error Rate (BER) performance obtained with m-MIMO, after the estimation of the channel parameters using superimposed pilots. The SC-FDE transmission technique is considered. Monte Carlo simulations were employed to measure the performance of the proposed m-MIMO system, using QPSK modulation and with a block length of N = 256 symbols (similar results were observed for other values of N, provided that $N \gg 1$). A Rayleigh fading channel with 16 equal power paths was assumed (invariant during the block duration). The BER is evaluated as a function of $E_b/N_0$, where $N_0$ is the one-sided power spectral density of the noise and $E_b$ is the energy of the transmitted bits.
Without loss of generality, it is assumed that there is a pilot for each subcarrier of each block of the frame (and for each transmit antenna), i.e., $\Delta N_F = \Delta N_T = 1$, leading to $N_P^{Frame} = N N_T$ and a pilot multiplicity or redundancy of $N_R = N_P^{Frame}/N_G = N N_T / N_G$. The duration of the useful part of the blocks (N symbols) is 1 µs and the cyclic prefix has a duration of 0.125 µs.
A transmitter with linear power amplification is assumed, as well as perfect synchronization. Figure 4 shows the performance obtained with ideal channel estimation versus conventional pilots. A 4 × 32 MIMO is assumed, with 4 iterations of the interference cancellation.
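The simulation parameters above can be turned into a small setup sketch. The article does not state the path delays or the exact mapping between $E_b/N_0$ and the noise variance, so symbol-spaced delays, unit symbol energy and $E_b = E_s/2$ for QPSK are assumptions here, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
N, L = 256, 16                                  # block length and number of equal-power paths
T_useful_us, T_G_us = 1.0, 0.125                # useful block and cyclic prefix durations
N_G = int(round(N * T_G_us / T_useful_us))      # cyclic prefix length in samples

def rayleigh_channel(n_paths):
    # Equal-power, independent Rayleigh-fading taps; symbol-spaced delays are assumed.
    taps = rng.standard_normal(n_paths) + 1j * rng.standard_normal(n_paths)
    return taps / np.sqrt(2 * n_paths)

def noise_variance(ebn0_db, bits_per_symbol=2, symbol_energy=1.0):
    # Complex noise variance for a given Eb/N0, with Eb = Es / bits_per_symbol.
    return symbol_energy / (bits_per_symbol * 10 ** (ebn0_db / 10))

h = rayleigh_channel(L)
H = np.fft.fft(h, n=N)                          # channel frequency response of one link

# The average channel power over many realisations should be close to one.
avg_power = np.mean([np.sum(np.abs(rayleigh_channel(L)) ** 2) for _ in range(2000)])
print(N_G, L <= N_G, noise_variance(0.0), round(float(avg_power), 2))
```

With these values the cyclic prefix spans 32 samples, which is longer than the 16-tap channel, so the condition that the cyclic prefix exceeds the channel impulse response is met under the symbol-spaced assumption.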

Channel Estimation with Conventional Pilots
Although achieving performances with conventional pilots very close to those obtained with ideal channel estimation, the conventional pilots present the disadvantage that they require reserving a part of the bandwidth for the periodic transmission of pilots or training sequence, used for data detection and equalization.
As can be seen, the MRC achieves a performance very close to the MFB. Note that the MFB consists of a performance lower bound, corresponding to a curve that quantifies the channel modeled by the sum of delayed and independently Rayleigh-fading rays. Under ideal channel conditions, the ZF is the scheme that follows the MRC in terms of performance, and the worst performance is achieved by the EGC. It is worth noting that the advantage of both MRC and EGC lies in their simplicity, as compared to the ZF: whereas the ZF requires the computation of the inverse of the channel matrix for each frequency component, the MRC and EGC do not. Moreover, it is known that the ZF presents noise enhancement when utilized in post-processing mode, which is the current situation; this is the reason why the MRC performs better than the ZF.
It is worth noting that the proposed technique applies to any constellation. The only difference relies on an adaptation of the interference cancellation receiver. Naturally, the sensitivity to channel estimation errors (and other impairments) increases with the constellation size. However, the main conclusions remain valid [21][22][23]. To avoid having duplication, only QPSK results are shown in this article.
As compared to OFDM, SC-FDE allows lowering the E b /N 0 values required for a certain BER, but the relative positions of curves do not change significantly.
Figure 5 shows the performance results for 4 × 32 MIMO with conventional pilots and different iterations of the interference cancellation. Even with 1 iteration, the MRC performs better than the ZF. As before, although with lower processing complexity, the EGC performs worse than the ZF. Note that, with MRC and with EGC, a certain level of interference is generated, because these receivers are not optimum. In order to mitigate such interference, the proposed receiver incorporates an interference cancellation.
As can be viewed from Figure 5, the best performances, of both MRC and EGC, are achieved with four iterations. Nevertheless, the performances obtained with four iterations are close to those obtained with three iterations. A degradation of performance is also observed when a single iteration is assumed. Using the interference cancellation associated with the MRC and EGC, one can choose between three or four iterations, as the difference is residual. Beyond four iterations, the performance improvement was almost negligible (not shown in Figure 5). The best performance is achieved by the MRC with 4 iterations.
Figure 6 compares the performance of the 4 × 32 MIMO against 4 × 256 MIMO and 16 × 256 MIMO. As expected, due to the higher level of receive diversity, the 4 × 256 MIMO performs better than the 4 × 32 MIMO. Nevertheless, it is noticeable that 16 × 256 MIMO performs worse than 4 × 256 MIMO. This occurs because the 16 × 256 MIMO comprises 16 parallel flows of symbols, while 4 × 256 MIMO comprises only 4 parallel flows of data, and therefore 16 × 256 MIMO corresponds to more data being transmitted, but also to more interference. This is valid for the MRC, EGC and ZF. It can be viewed that the results obtained with the MRC, for the 4 × 256 MIMO, are almost superimposed with the MFB curve.

Channel Estimation with Superimposed Pilots
For the purpose of channel estimation, this section considers superimposed or implicit pilots, i.e., pilots added to the data. The advantage of this approach lies in the fact that there is no need to reserve a certain bandwidth to send pilots or training sequences. Nevertheless, using superimposed pilots, the data represents interference to the channel estimate obtained with the pilots, and the pilots represent interference to the channel estimate obtained with the data, which leads to a degradation of performance. This can be mitigated by employing pilots with relatively low power and averaging the pilots over a large number of blocks, so as to obtain accurate channel estimates (the window size should be such that the channel is constant within it). We propose the receiver shown in Figure 3, which consists of the following phases:

1. Perform an initial channel estimation using the pilots (added to the data).
2. Remove the pilots (that represent interference), detect the data, and perform an initial channel estimate using the data.
3. Combine the channel estimate obtained from step 1 with that of step 2.
4. Repeat steps 1 to 3 iteratively, with the new improved channel estimates, removing the estimated data from the received signal before the pilot-based estimation of step 1.

Figure 7 considers the performance results obtained with superimposed pilots compared with those obtained with conventional pilots. The superimposed pilots consider a pilots power of 0 dB (pilots with the same power as the data), with a block length of five blocks of data symbols used to estimate the channel, and four iterations of the iterative channel estimator. As can be viewed, although the performances obtained with conventional pilots are slightly better than those obtained with superimposed pilots, the differences are residual. Nevertheless, superimposed pilots present the advantage of not having to reserve a certain bandwidth for the purpose of estimating the channel. Regardless of whether superimposed or conventional pilots are employed, the best overall performance is obtained with the MRC, followed by the ZF and, finally, by the EGC. Once again, the ZF performs worse than the MRC because the former presents noise enhancement.
Figure 8 presents the performance results obtained with superimposed pilots, with two versus four iterations of the channel estimator, with a pilots power of 0 dB.
As previously described, two iterations simply consider one initial estimation from the pilots and another estimation from the data (after removing the pilots). These two estimates are combined, and the result is used to detect the data. In the case of four iterations of the channel estimator, this process is repeated, i.e., a third iteration considers the channel estimate from the pilots (after removing the data) and a fourth iteration considers the channel estimate obtained from the data (after removal of the pilots). In this scenario, the estimates obtained in the third and fourth iterations are combined, and the result is used to perform data detection. The interference between pilots and data is mitigated by employing pilots with relatively low power and averaging the pilots over a large number of blocks, so as to obtain accurate channel estimates (the window size should be such that the channel is constant within it). In this case, we have assumed a block of five symbols.
As expected, the performance results with four iterations are always better than those obtained with two iterations. This is valid for the MRC, ZF and EGC. As before, the best overall performance is obtained with the MRC (with 4 iterations). It is worth noting that, beyond 4 iterations, the performance improvement is residual (not shown). Therefore, the proposed receivers and channel estimators should employ 4 iterations to achieve good performance.
Figure 9 considers the performance results with superimposed pilots, assuming a pilots power of 0 dB versus −3 dB, using a block length of 5 data symbols. It is worth noting that 0 dB corresponds to using the same power for the pilots and for the data, whereas −3 dB corresponds to using half of the power for the pilots, as compared to the data power. Regardless of the receiver employed, the best overall performance is always achieved with a pilots power of 0 dB. It is worth noting that simulations with a pilots power of 3 dB were implemented (not shown in this article), but the best performance was still obtained with 0 dB. By using −3 dB of pilots power, a small degradation of performance can be observed, which is valid for the MRC, ZF and EGC.
Figure 10 shows the performance results for the 4 × 32 MIMO with superimposed pilots, with a block length of five versus 10 data symbols. The performance obtained with 10 data symbols is always slightly better than that obtained with five data symbols. This holds for MRC, ZF and EGC alike. Nevertheless, this comes at the cost of much higher processing power, as 10 data blocks require processing twice as much data as five data blocks. It is worth noting that performance degrades when fewer than five data blocks are employed (simulations carried out, but not shown in this article). Therefore, considering the differences in performance, one can conclude that five data blocks is a good tradeoff between performance and processing complexity. As before, the best overall performance is obtained with the MRC, followed by the ZF and, finally, by the EGC.

Conclusions
This article proposes and studies an efficient, low-complexity receiver that jointly performs channel estimation based on superimposed pilots and data detection, optimized for massive MIMO. 5G communications will support mm-Wave, which will enable much higher throughputs and facilitate the employment of massive MIMO.
The use of superimposed pilots avoids the overheads associated with channel estimation based on conventional pilot symbols, which tend to be more demanding in the case of m-MIMO; this technique therefore improves spectral efficiency.
The proposed receiver uses two low-complexity algorithms: the MRC and the EGC. These algorithms are compared with the ZF. The advantage of the MRC and EGC lies in the fact that the ZF requires the computation of the pseudo-inverse of the channel matrix for each frequency component, whereas MRC/EGC do not, which keeps the complexity requirements low.
It was shown that the MRC algorithm implemented in the proposed iterative receiver, which performs channel estimation using superimposed pilots and data detection for m-MIMO, achieves a performance very close to the MFB after just a few iterations. It was also shown that the MRC achieves better performance than the ZF, even with reduced complexity, because the ZF introduces noise enhancement.
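The complexity argument can be made concrete with a minimal Python sketch of the three combiners for a single frequency component. It assumes the usual textbook forms, which are not restated in this section: MRC applies the conjugate transpose of the channel matrix, EGC applies the conjugate transpose of a matrix that keeps only the phases of the channel coefficients, and ZF applies the pseudo-inverse. The function names and the toy 3 × 2 channel are hypothetical, normalization and the iterative interference cancellation are omitted, and the ZF routine is written only for two transmit antennas so that the inverse has a closed form.

```python
import cmath


def hermitian_times(matrix, vector):
    """Return matrix^H applied to vector (matrix is R x T, vector has length R)."""
    n_tx = len(matrix[0])
    return [
        sum(row[t].conjugate() * y for row, y in zip(matrix, vector))
        for t in range(n_tx)
    ]


def mrc(h, y):
    """Assumed MRC combiner: H^H y, no matrix inversion."""
    return hermitian_times(h, y)


def egc(h, y):
    """Assumed EGC combiner: only the phases of the entries of H are used."""
    phases = [[cmath.exp(1j * cmath.phase(entry)) for entry in row] for row in h]
    return hermitian_times(phases, y)


def zf_two_tx(h, y):
    """ZF for two transmit antennas: (H^H H)^-1 H^H y with a closed-form 2x2 inverse."""
    m0, m1 = hermitian_times(h, y)
    a = sum(abs(row[0]) ** 2 for row in h)             # Gram matrix entry (0, 0)
    d = sum(abs(row[1]) ** 2 for row in h)             # Gram matrix entry (1, 1)
    b = sum(row[0].conjugate() * row[1] for row in h)  # Gram matrix entry (0, 1)
    det = a * d - abs(b) ** 2
    return [(d * m0 - b * m1) / det, (-b.conjugate() * m0 + a * m1) / det]


# Toy noise-free example: three receive antennas, two transmit antennas.
h = [[1 + 0j, 1j], [1j, 1 + 0j], [1 + 0j, -1 + 0j]]
x = [1 + 1j, 1 - 1j]
y = [sum(h_rt * x_t for h_rt, x_t in zip(row, x)) for row in h]

x_zf = zf_two_tx(h, y)
x_mrc = mrc(h, y)
x_egc = egc(h, y)
print(x_zf, x_mrc, x_egc)
```

In this noise-free toy case the ZF output recovers the transmitted symbols, while the MRC output equals the Gram matrix applied to them and therefore still contains cross-stream interference, which is the kind of residual interference the iterative receiver is meant to mitigate. MRC and EGC need only one matrix-vector product per frequency component, whereas ZF additionally needs a matrix inversion whose cost grows with the number of transmit antennas.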
Author Contributions: All authors contributed equally to the article. All authors have read and agreed to the published version of the manuscript.
Funding: This work is funded by FCT/MCTES through national funds and, when applicable, co-funded by EU funds under the project UIDB/EEA/50008/2020.