Two-Step Multiuser Equalization for Hybrid mmWave Massive MIMO GFDM Systems

Abstract: Although millimeter-wave (mmWave) and massive multiple input multiple output (mMIMO) can be considered as promising technologies for future mobile communications (beyond 5G or 6G), some hardware limitations restrict their applicability. The hybrid analog-digital architecture has been introduced as a possible solution to avoid such issues. In this paper, we propose a two-step hybrid multi-user (MU) equalizer combined with a low complexity hybrid precoder for wideband mmWave mMIMO systems, as well as a semi-analytical approach to evaluate its performance. The new digital non-orthogonal multi carrier modulation scheme generalized frequency division multiplexing (GFDM) is considered owing to its efficient performance in terms of achieving higher spectral efficiency, better control of out-of-band (OOB) emissions, and lower peak to average power ratio (PAPR) when compared with the orthogonal frequency division multiplexing (OFDM) access technique. First, a low complexity analog precoder is applied on the transmitter side. Then, at the base station (BS), the analog coefficients of the hybrid equalizer are obtained by minimizing the mean square error (MSE) between the hybrid approach and the full digital counterpart. For the digital part, zero-forcing (ZF) is used to cancel the MU interference not mitigated by the analog part. The performance results show that the performance gap of the proposed hybrid scheme to the full digital counterpart reduces as the number of radio frequency (RF) chains increases. Moreover, the theoretical curves almost overlap with the simulated ones, which shows that the semi-analytical approach is quite accurate.


Introduction
The number of users in telecommunication systems has grown rapidly in recent years, which has driven work on the next generations of wireless networks in order to meet the requirements of higher data rates, reliable services, robustness, and spectral efficiency. For that reason, the key enabling technologies predicted for the evolution of mobile technologies towards beyond 5G and 6G include the following: (1) the use of millimeter-wave (mmWave) frequency bands, as the conventional sub-6 GHz frequency spectrum, used by cellular communications networks, is currently saturated; and (2) the use of a massive number of antennas, called massive multiple input multiple output (MIMO) (mMIMO) or large-scale MIMO [1]. The implementation of mMIMO is very desirable at the smaller wavelengths of the mmWave frequencies, as it can improve the link reliability [2] and mitigate the antenna array size restrictions, making it possible to arrange more antennas in a confined space [3,4].
It is of paramount importance to look forward to the features of the beamforming techniques introduced by mmWave mMIMO, which has an important and attractive role in combating the limitation of high path loss (PL) [5,6]. Although the use of mmWave mMIMO is very attractive, the high cost and power consumption of mixed-signal components can be considered the main drawbacks to achieve full digital systems, that is, systems where each antenna is connected with a fully dedicated radio frequency (RF) chain [7,8]. A simple approach, so-called full analog beamforming with only phase shifters, has been proposed to overcome the large number of RF chains issue [9]. However, the full analog signal processing is just used for single-stream transmission, and its performance is limited by constraints on the amplitude and quantized phase shifters [10]. Then, a new architecture was addressed in [11], so-called hybrid analog-digital architecture, where the number of RF chains is much lower than the number of antennas in order to cope with the hardware constraints. In this approach, part of the signal processing is implemented in the analog domain, while a lower complexity processing is left to the digital domain.
Concerning wireless channel access and sharing, digital communication technologies have become more sophisticated and reliable, as they can operate with higher spectral efficiency. One of the most relevant specifications in the physical layer of a wireless system is the technology used to access and share the wireless channel. For this reason, communications systems integrate multi carrier transmission technologies, and orthogonal frequency division multiplexing (OFDM) is implemented in 4G (LTE downlink) systems owing to its ability to achieve high data rate transmission [12]. Although OFDM is suitable for transmitting signals over multipath propagation scenarios, where the channel response exhibits frequency selective fading, the large signal peaks formed by the random sum of the subcarrier phases result in a high peak to average power ratio (PAPR). Therefore, a large power back-off is required for the power amplifiers; otherwise, non-linear distortions will degrade the system performance [12]. Moreover, OFDM presents relatively high levels of out-of-band (OOB) emissions, caused by the rectangular pulse shaping filter used in the transmitter, which results in strong interference into neighboring frequency bands [13]. That is why other waveform candidates were evaluated and proposed as alternatives to OFDM to overcome such problems, such as constant envelope OFDM (CE-OFDM), filter bank multi carrier (FBMC), universal filtered multi carrier (UFMC), and generalized frequency division multiplexing (GFDM). These modulation schemes are suitable for future systems (beyond 5G/6G), and GFDM can be considered the most flexible non-orthogonal multi carrier transmission scheme among them [14].

Previous Works on Hybrid Architectures
In recent years, some works have been proposed for narrowband hybrid analog-digital systems [15][16][17][18][19][20][21][22]. In [15], the spatially scattering structure of mmWave channels was explored to formulate the hybrid RF/baseband beamforming scheme as a sparse reconstruction signal recovery problem. Then, the principle of orthogonal matching pursuit (OMP) algorithm is applied to develop the algorithms that accurately approximate optimal unconstrained precoders and combiners as a linear combination of beam steering vectors. To handle the inter-user interference, a hybrid MU algorithm was proposed for the uplink in [16]. This algorithm is based on the Gram-Schmidt method to compute the analog vectors, and then the minimum mean square error (MMSE) beamforming is performed over the effective channel. The authors in [17] proposed an efficient iterative algorithm for a single antenna user terminals (UTs) system to overcome the multiple access interference obtained in the scatterer-share circumstance, based on the joint hybrid analog-digital precoder and combiner design for mmWave by exploiting the duality of the uplink and downlink MU-MIMO channels. The OMP algorithm was employed for selecting the analog precoder-combiner pair to increase the channel gain, while the MMSE approach was used to get the digital combiner. A hybrid precoder for the downlink MU-MIMO system based on angles of Electronics 2020, 9, 1220 3 of 19 departure (AoDs) in frequency division duplex (FDD) systems was proposed in [18]. First, the beam steering and zero-interference conditions are assumed, and then a decomposition is performed in the analog and digital parts without losses, as these parts are jointly designed. The authors of [19] proposed beamforming methods, where the phase shifters are replaced by switches. In the first method a sub-connected phase shifter network is combined with a full connected switch architecture to reduce the number of the phase shifters. 
Then, the full connected switches are replaced by a sub-connected switch network with the aim of simplifying the switch network. A hybrid beamforming for optimizing beam control vectors based on a low complexity codebook and signal to interference and noise ratio (SINR) maximization was proposed in [20]. In the analog part, an optimization algorithm is used to alternately optimize the vectors of the receiver and transmitter. Then, the digital part is computed using the equivalent channel and singular value decomposition (SVD). The authors of [21] proposed a low-complexity hybrid beamforming scheme, considering an alternating MMSE algorithm on the basis of the traditional MMSE algorithm to minimize the MSE of the transmitted and received signals, using the orthogonality of the digital matrix and the idea of an iterative update. In order to maximize the energy efficiency of narrowband mmWave MIMO interference channels involving internet of things (IoT) devices, the authors in [22] designed a low-complexity two-stage hybrid transceiver, considering both perfect and imperfect channel state information (CSI).
Although the previous solutions focused on hybrid beamforming approaches for narrowband systems, wideband communications are expected in future systems (beyond 5G or 6G). Therefore, solutions for hybrid mmWave mMIMO wideband systems are of paramount importance. Some works addressed hybrid full connected architectures [23][24][25]. In [23], a precoding algorithm is proposed and efficient hybrid codebooks are developed in order to maximize the mutual information that can be obtained for any particular RF codebook. For each subcarrier, different digital precoding is performed in the frequency domain, while the analog precoder remains constant over the subcarriers. In [24], hybrid precoders for downlink space division multi access (SDMA) and orthogonal frequency division multi access (OFDMA) systems were addressed. The main purpose of the proposed low complexity hybrid precoder is to minimize the total transmit power of the BS, under data rate requirements of users and coverage constraint of signaling broadcasting. For this solution, iterative optimizations between analog and digital precoders are not required. In [25], a fully iterative solution was proposed, where both the analog and digital parts are computed iteratively. This approach introduced optimal performance in terms of removing the residual MU and inter-symbol interferences (ISIs). The previous schemes require a high number of phase shifters and connections.
To solve the issues of complexity and power consumption of hybrid full connected architectures, some approaches were addressed for hybrid sub-connected architectures in [26,27], but with losses in terms of performance. In the sub-connected architectures, each RF chain is only connected to a sub-set of antennas. The authors of [26] designed an efficient MU linear equalizer combined with full analog precoder for uplink single carrier-frequency division multiple access (SC-FDMA) systems. They considered single RF low complexity UTs using an efficient full analog precoder based on the knowledge of partial CSI. Then, at BS, the hybrid equalizer is optimized by minimizing the average BER. Finally, a hybrid MU equalizer based on a dynamic subarray antennas structure was proposed in [27]. In the first step, the analog part of equalizer is designed, where a dynamic antenna mapping algorithm is applied to connect each RF chain to antennas, and the phase of each phase shifter is computed. The analog equalizer is constant over the iterations and subcarriers owing to the hardware constraints. Then, the digital equalizer is computed iteratively over the subcarriers based on the iterative block decision feedback equalization (IB-DFE) principle.

Previous Works on GFDM
The principle of GFDM, namely dividing the input bit stream into several subcarriers and several subsymbols and applying the impulse response of an appropriate pulse shaping filter circularly to each subcarrier, is the main factor in reducing the OOB emissions and PAPR experienced by OFDM [28]. On the other hand, the cyclic pulse shaping filters lead to a loss of orthogonality between the subcarriers, and thus adjacent subcarriers may interfere, causing inter-carrier interference (ICI). This leads to a degradation in the performance of the GFDM system [14]. As the pulse shaping filter has a clear impact on the performance of GFDM, there are many studies in the literature about this topic [28][29][30][31][32][33]. In [28], the widely used filter in GFDM systems is the raised cosine (RC) filter, which is considered a non-causal ideal filter. In practice, this type of filter increases the OOB signal radiation and, thus, the system delay. For that reason, in [29], different types of improved Nyquist pulse shaping filters are used in combination with the GFDM ZF receiver. The study showed that the proposed filter has lower OOB emissions and a better symbol error rate (SER) performance compared with the root-raised cosine (RRC) filter in the case of 16-quadrature amplitude modulation (16-QAM) transmission over an additive white Gaussian noise (AWGN) channel. In [30], the OOB emissions and SER of a ramp-based pulse shaping filter for GFDM are compared with those of the RRC and Xia filters over an AWGN channel. The ramp-based filter has better SER performance and lower OOB emissions than the RRC and Xia pulses, while the Xia filter outperforms RRC in terms of OOB radiation. In [31], the authors proposed a new pulse shaping filter obtained from the linear combination of two pulse shapes. The proposed pulse resulted in lower OOB emissions and better resistance to ISI than the RRC filter, owing to its vertical and horizontal pulse sharpness.
Another solution was presented in [32], where the quadratic programming (QP) approach is applied as a newly designed pulse shaping filter to reduce the OOB emissions. Moreover, the authors in [33] presented how to modify the GFDM modulation technique to give the same FBMC performance. Therefore, the study introduced the BER performance of GFDM and FBMC schemes under three different channel models: AWGN, time-invariant frequency-selective, and time-variant frequency-selective. This modification led to achieving a linear filtering behavior that can provide an improvement in the OOB emissions.

Contributions
In this paper, we propose for the uplink a hybrid analog-digital MU equalizer for wideband mmWave mMIMO systems using GFDM as an access technique. To the best of our knowledge, hybrid MU equalizer schemes specifically designed for mmWave mMIMO GFDM systems have not been addressed in the literature yet. Given the hardware limitations of these systems, our design options are as follows:

• The use of GFDM, which deals with multi-path effects as the OFDM scheme does, but with reduced OOB emissions and PAPR.

• The use of low-complexity UTs, employing a phase shifter network to perform the analog precoding. Two types of analog precoders are used based on the levels of CSI knowledge at the UTs: random and AoD-based precoders.

• The use of a hybrid analog-digital receiver structure, because having one dedicated RF chain per antenna would be impractical owing to hardware costs and power consumption.

• It is assumed that the analog coefficients are constant over the subcarriers owing to hardware constraints, because, if the analog equalizer is designed as a frequency selective filter, additional hardware would be needed. This assumption is followed by most of the previous works on hybrid beamforming [23][24][25].
The main contributions of this paper are as follows:

• The analog coefficients are derived by minimizing the MSE between the hybrid approach and the full digital counterpart. They are computed by selecting a set of vectors from a dictionary based on the array response vectors of the channel.

• The ZF criterion is considered at the digital part on a per subcarrier basis to remove the interference not mitigated on the analog part.

• A semi-analytical, yet accurate approach for obtaining the performance of the proposed hybrid GFDM system is also proposed.

The remainder of this paper is organized as follows. Section 2 describes the system model considered in this work. Sections 3 and 4 present the design of the transmitter and receiver, respectively. Finally, the main performance results are shown in Section 5 and the conclusions are drawn in Section 6.

Notations
Boldface capital letters denote matrices and boldface lowercase letters denote column vectors. The function diag(a) corresponds to a diagonal matrix A with diagonal entries equal to vector a, and diag(A) gives a diagonal matrix with entries equal to the diagonal entries of the matrix A. The operations $(.)^*$, $(.)^T$, and $(.)^H$ represent the conjugate, transpose, and Hermitian transpose of a matrix, respectively. $\{\alpha_l\}_{l=1}^{L}$ denotes an L length sequence. The vector a = [a q ] 1≤q≤Q 1 ∈ C Q 1 Q 2 and the

System Model
In this manuscript, we consider an uplink mmWave mMIMO hybrid analog-digital system that uses GFDM as the access technique. The GFDM block has K subcarriers, M timeslots, and N samples per symbol (N = KL), where L is the oversampling factor. This system is designed for U UTs sharing the same radio resources. Each user transmits one data stream per subcarrier, and has $N_{tx}$ transmitting antennas and a single RF chain. The base station is equipped with a number of RF chains ($N_{RF}^{rx}$) lower than or equal to the number of receiving antennas ($N_{rx}$), that is, $N_{RF}^{rx} \le N_{rx}$. The channel follows a clustered model with $N_{cl}$ clusters and $N_{ray}$ paths per cluster, as discussed in [25]. $\mathbf{H}_{u,d}$ denotes the channel of the uth user in the time domain, with $E[\|\mathbf{H}_{u,d}\|_F^2] = N_{rx}N_{tx}$, and the remaining variables are as follows:

• $\rho_{PL}$ is the path loss (PL) between the UTs and the BS;

• $\alpha_{q,r}^{u}$ is the complex path gain at the rth ray of the qth scattering cluster;

• $p_{rc}(.)$ is the pulse shaping filter function, where $T_s$, $\tau_q^u$, and $\tau_{q,r}^u$ are the sampling interval, the time delay of the qth scattering cluster, and the relative time delay, respectively;

• $\mathbf{a}_{tx,u}(\theta_q^u - \vartheta_{q,r}^u)$ denotes the normalized transmitting array response vector with the AoD $\theta_q^u$ and the relative angle of departure $\vartheta_{q,r}^u$ at the rth ray of the qth scattering cluster;

• $\mathbf{a}_{rx,u}(\phi_q^u - \varphi_{q,r}^u)$ denotes the normalized receiving array response vector with the angle of arrival (AoA) $\phi_q^u$ and the relative angle of arrival $\varphi_{q,r}^u$ at the rth ray of the qth scattering cluster.
For a uniform linear array (ULA), the normalized array response vector is [15]

$$\mathbf{a}_{ULA}(\phi) = \frac{1}{\sqrt{N_{ant}}}\left[1, e^{j\frac{2\pi}{\lambda}d\sin(\phi)}, \ldots, e^{j(N_{ant}-1)\frac{2\pi}{\lambda}d\sin(\phi)}\right]^T, \quad (3)$$

where λ is the wavelength, d is the inter-element spacing, and $N_{ant}$ is the number of antenna elements in the array. The path delays have a uniform distribution in $[0, DT_s]$, where $DT_s$ denotes the maximum channel ray delay with significant power and, for the angles, the random distribution presented in [23] is assumed. The power of $\alpha_{q,r}^{u}$ has a decay equal to $\beta_{q_1,q_2}$ from the $q_1$th cluster to the $q_2$th cluster, and a decay equal to $\beta_{q,r_1,r_2}$ from the $r_1$th ray to the $r_2$th ray of the qth cluster.
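As a minimal sketch of the ULA response described above, the following NumPy function evaluates the normalized array response vector for a given angle. The function name `ula_response` and the default half-wavelength spacing (`d_over_lambda=0.5`) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def ula_response(phi, n_ant, d_over_lambda=0.5):
    """Normalized ULA array response vector for an angle phi in radians.

    d_over_lambda is the inter-element spacing d divided by the wavelength
    (half-wavelength spacing is an assumption of this sketch).
    """
    n = np.arange(n_ant)
    phase = 2.0 * np.pi * d_over_lambda * n * np.sin(phi)
    return np.exp(1j * phase) / np.sqrt(n_ant)

a = ula_response(np.deg2rad(30.0), n_ant=8)
print(np.linalg.norm(a))  # unit norm by construction
```

Every entry has squared magnitude $1/N_{ant}$, which is the same constant-amplitude property later required of the phase-shifter-based analog precoder and equalizer.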

Transmitter Design
This section introduces the proposed GFDM transmitter. First, the transmitter model is presented, followed by two analog precoders based on different levels of CSI knowledge. In the first one, we assume that the UT has no CSI knowledge, while in the second one, partial CSI knowledge (the average AoD of each cluster) is assumed.

Transmitter Model
The block diagram of the transmitter side is based on the low complexity modulation scheme GFDM and has two main parts, digital and analog, as depicted in Figure 1. In the digital part, the input bit stream of each user is first mapped into a data vector $\mathbf{d}_u \in \mathbb{C}^{KM}$, composed of the per-subcarrier vectors $\mathbf{d}_{u,k} \in \mathbb{C}^{M}$. Each vector is spread on the K subcarriers and M time slots. The conversion from the time to the frequency domain is performed by applying the DFT matrix $\mathbf{Z}_M \in \mathbb{C}^{M \times M}$ to each $\mathbf{d}_{u,k}$, where the result $\mathbf{Z}_M\mathbf{d}_{u,k}$ represents the data at subcarrier k in the frequency domain [34]. The transformed vector $\mathbf{Z}_M\mathbf{d}_{u,k}$ is passed through the second frequency domain processing block, which contains three stages. These stages correspond to the three blocks in Figure 1 (up-sampling, filtering, and up-conversion). Each operation may be represented by a matrix, thus the signal at the output of each block is equal to the signal at the input multiplied by the corresponding matrix. First, the repetition matrix $\mathbf{R}^{(L)}$, built from identity matrices $\mathbf{I}_M$, is applied to the obtained frequency samples in order to duplicate the samples L times. Then, for each subcarrier, the diagonal pulse shaping filter matrix $\boldsymbol{\Gamma}_{tx} = \mathrm{diag}(\mathbf{Z}_{LM}\mathbf{g}) \in \mathbb{C}^{LM \times LM}$, which contains the frequency samples of the filter $\mathbf{g}$, is applied. Finally, the permutation matrix $\mathbf{P}^{(k)} \in \mathbb{C}^{NM \times LM}$ is applied to the kth subcarrier signal to circularly up-convert it from base-band to band-pass.

It is of paramount importance to mention that the transmitter model of GFDM can be represented by a single matrix in order to facilitate the application of standard receiver methods such as matched filter (MF), ZF, and minimum mean square error (MMSE). In this case, the matrix representation of Equation (5) can be rewritten as Equation (6), in terms of a single GFDM modulation matrix $\mathbf{A}$. Finally, the transmitted signal $\mathbf{x}_{u,l} \in \mathbb{C}^{N_{tx}}$ is obtained after applying the analog precoder $\mathbf{f}_{a,u} \in \mathbb{C}^{N_{tx}}$, as $\mathbf{x}_{u,l} = \mathbf{f}_{a,u}c_{u,l}$. The analog precoder is implemented with phase shifters only and, because of the hardware constraints, all its entries have the same amplitude, that is, $|\mathbf{f}_{a,u}(i)|^2 = N_{tx}^{-1}$. For the analog precoder, we address two cases of CSI availability at the transmitter side. First, we propose a random precoder where each UT has no access to CSI. Then, on the basis of [23], we suppose that the UT has access to partial CSI, that is, only the average AoD of each channel cluster.
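To illustrate the idea of collecting the whole GFDM modulator in a single matrix, the sketch below builds the textbook time-domain GFDM modulation matrix, in which each column is the prototype filter circularly shifted to a subsymbol and modulated to a subcarrier. This is an illustrative stand-in under stated assumptions: it uses critical sampling (K samples per subsymbol, no oversampling factor L), it is not the paper's frequency-domain chain with $\mathbf{R}^{(L)}$, $\boldsymbol{\Gamma}_{tx}$, and $\mathbf{P}^{(k)}$, and the function name `gfdm_modulation_matrix` is hypothetical.

```python
import numpy as np

def gfdm_modulation_matrix(g, K, M):
    """Time-domain GFDM modulation matrix of size (K*M) x (K*M).

    Column k*M + m holds the prototype filter g, circularly shifted by
    m*K samples (subsymbol m) and modulated to subcarrier k.
    Assumption of this sketch: critically sampled block, no oversampling.
    """
    n_samples = K * M
    g = np.asarray(g, dtype=complex)
    if g.shape != (n_samples,):
        raise ValueError("prototype filter must have K*M samples")
    n = np.arange(n_samples)
    A = np.empty((n_samples, K * M), dtype=complex)
    for k in range(K):
        carrier = np.exp(1j * 2.0 * np.pi * k * n / K)
        for m in range(M):
            A[:, k * M + m] = np.roll(g, m * K) * carrier
    return A

# Sanity check with a rectangular prototype filter, for which the block
# reduces to M consecutive OFDM symbols and A becomes unitary.
K, M = 4, 3
g_rect = np.zeros(K * M)
g_rect[:K] = 1.0 / np.sqrt(K)
A = gfdm_modulation_matrix(g_rect, K, M)

rng = np.random.default_rng(0)
d = (rng.choice([-1.0, 1.0], K * M) + 1j * rng.choice([-1.0, 1.0], K * M)) / np.sqrt(2.0)
x = A @ d  # one GFDM block in the time domain
```

With a non-rectangular circular filter the columns are no longer orthogonal, which is the loss of orthogonality between subcarriers mentioned earlier, and the matched filter receiver then corresponds to applying the Hermitian transpose of the same matrix.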

No CSI-Based Precoder
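The transmitter here has no access to the CSI, which means that the analog precoder vector of the uth UT is generated randomly according to

$$\mathbf{f}_{a,u} = \frac{1}{\sqrt{N_{tx}}}\left[e^{j2\pi\phi_1^u}, \ldots, e^{j2\pi\phi_{N_{tx}}^u}\right]^T,$$

noting that $\phi_n^u \in [0, 1]$, where $n \in \{1, \ldots, N_{tx}\}$ and $u \in \{1, \ldots, U\}$, are i.i.d. uniform random variables.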
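The random precoder can be sketched in a few lines, assuming the phases are $2\pi\phi_n^u$ with $\phi_n^u$ uniform in [0, 1] and each entry scaled by $N_{tx}^{-1/2}$ so that the phase-shifter constraint holds. The function name `random_analog_precoder` and the seed are illustrative choices.

```python
import numpy as np

def random_analog_precoder(n_tx, rng):
    """Phase-only precoder with i.i.d. uniform phases (no CSI at the UT)."""
    phi = rng.uniform(0.0, 1.0, size=n_tx)
    return np.exp(1j * 2.0 * np.pi * phi) / np.sqrt(n_tx)

rng = np.random.default_rng(0)
f = random_analog_precoder(8, rng)
print(np.abs(f) ** 2)  # every entry has squared magnitude 1/N_tx
```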

Partial CSI-Based Precoder
The main purpose of this approach is to maximize the power of the desired user and steer the beam to the dominant direction of the channel. Therefore, based on the channel model described in Section 2, first we must build the following matrix:

$$\mathbf{A}_{tx,u} = \left[\mathbf{a}_{tx,u}(\phi_0^{tx,u}), \ldots, \mathbf{a}_{tx,u}(\phi_{N_{cl}-1}^{tx,u})\right],$$

where $\mathbf{a}_{tx,u}(\phi_q^{tx,u})$, $q = 0, \ldots, N_{cl} - 1$, is given by (3) and $\phi_q^{tx,u}$ is the average AoD of the qth cluster. Then, the eigenvalue decomposition $\mathbf{A}_{tx,u}\mathbf{A}_{tx,u}^H = \boldsymbol{\Lambda}_{tx,u}\boldsymbol{\Sigma}_{tx,u}\boldsymbol{\Lambda}_{tx,u}^H$ is computed. Finally, the dominant channel direction is selected for use in the analog precoder.
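The steps above can be sketched as follows. The mapping from the dominant eigenvector to the phase-shifter coefficients is not recoverable from the text here, so this sketch assumes that only the phases of the dominant eigenvector are kept and that each entry is scaled by $N_{tx}^{-1/2}$. The helper `ula_response`, the half-wavelength spacing, and the function name `aod_analog_precoder` are likewise assumptions.

```python
import numpy as np

def ula_response(phi, n_ant, d_over_lambda=0.5):
    """Normalized ULA response (half-wavelength spacing is an assumption)."""
    n = np.arange(n_ant)
    return np.exp(1j * 2.0 * np.pi * d_over_lambda * n * np.sin(phi)) / np.sqrt(n_ant)

def aod_analog_precoder(mean_aods, n_tx, d_over_lambda=0.5):
    """Phase-only precoder steered to the dominant direction of the clusters."""
    # Stack one array response per cluster: N_tx x N_cl
    A_tx = np.column_stack([ula_response(p, n_tx, d_over_lambda) for p in mean_aods])
    # Eigendecomposition of the Hermitian matrix A_tx A_tx^H (ascending eigenvalues)
    _, eigvecs = np.linalg.eigh(A_tx @ A_tx.conj().T)
    dominant = eigvecs[:, -1]
    # Assumption: keep only the phases to satisfy the constant-amplitude constraint
    return np.exp(1j * np.angle(dominant)) / np.sqrt(n_tx)

f = aod_analog_precoder(np.deg2rad([-20.0, 10.0, 45.0]), n_tx=8)
single = aod_analog_precoder([0.3], n_tx=8)
a_single = ula_response(0.3, 8)
```

For a single cluster the dominant eigenvector is the array response itself up to a global phase, so the precoder reduces to plain beam steering towards that cluster.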

Receiver Design
This section presents the proposed receiver scheme. First, the receiver model is introduced, and then an expression to compute the bit error rate (BER) is derived. After that, the hybrid analog-digital equalizer is designed. A decoupled transmitter-receiver design is assumed for formulating the optimization problem and computing the equalizer matrices. For the analog part, the equalizer elements are selected from a dictionary based on the channel array response vectors. The selection procedure considers as a metric the weighted error between the hybrid and the full digital equalizer matrices. Because of hardware constraints, we assume that the analog equalizer is constant over the subcarriers, as in [23,27]. Then, the digital part is computed on a per subcarrier basis, over the effective channel, to remove the interference ignored on the analog part.

Receiver Model
The receiver model of the proposed hybrid analog-digital receiver is described in this section. First, let us consider the unequalized received signal, given by

$$\mathbf{y}_l = \sum_{u=1}^{U}\mathbf{H}_{u,l}\mathbf{x}_{u,l} + \mathbf{n}_l = \mathbf{H}_l\mathbf{c}_l + \mathbf{n}_l,$$

where $\mathbf{x}_{u,l} \in \mathbb{C}^{N_{tx}}$ is the transmitted signal of the uth user, $\mathbf{n}_l \in \mathbb{C}^{N_{rx}}$ is a vector of zero-mean additive white Gaussian noise (AWGN) with variance $\sigma_n^2$, $\mathbf{H}_l = [\mathbf{H}_{1,l}\mathbf{f}_{a,1}, \ldots, \mathbf{H}_{U,l}\mathbf{f}_{a,U}] \in \mathbb{C}^{N_{rx} \times U}$ is the equivalent channel, and $\mathbf{c}_l = [c_{1,l}, \ldots, c_{U,l}]^T \in \mathbb{C}^{U}$. From Equation (6) we can obtain the time domain transmitted signal in terms of $\mathbf{C} = [\mathbf{c}_1, \ldots, \mathbf{c}_U] \in \mathbb{C}^{NM \times U}$ and $\mathbf{D} = [\mathbf{d}_1, \ldots, \mathbf{d}_U] \in \mathbb{C}^{KM \times U}$. It can be shown that the matrix of the unequalized received signal in the frequency domain, $\mathbf{Y} = [\mathbf{y}_0, \ldots, \mathbf{y}_{NM-1}] \in \mathbb{C}^{N_{rx} \times NM}$, can be expressed using the noise matrix $\mathbf{N} = [\mathbf{n}_0, \ldots, \mathbf{n}_{NM-1}] \in \mathbb{C}^{N_{rx} \times NM}$ and the selection matrices $\mathbf{E}_l = \mathrm{diag}([0, 0, \ldots, 1, \ldots, 0, 0]) \in \mathbb{C}^{NM \times NM}$, which are matrices of zeros with a 1 in the lth position of the diagonal. Combining Equations (11), (13), and (14) gives the frequency domain received signal, whose time domain counterpart is $\mathbf{Y}(\mathbf{Z}_{NM}^H)^T$.

The block diagram of the receiver is shown in Figure 2; as in the transmitter, it has an analog part and a digital part. First, the analog equalizer $\mathbf{W}_a \in \mathbb{C}^{N_{rx} \times N_{RF}^{rx}}$ is applied to the received signal, where $\mathbf{W}_a$ is composed only of phase shifters, so all its entries must have the same amplitude, that is, $|\mathbf{W}_a(i,j)|^2 = N_{rx}^{-1}$. After that, the resulting signal passes through the RF chains in order to be processed in the digital domain. This signal is converted to the frequency domain by the DFT. Then, the digital equalizer $\mathbf{W}_{d,l} \in \mathbb{C}^{U \times N_{RF}^{rx}}$ is applied to obtain the equalized received signal.

After that, the GFDM operations are the same as those performed at the transmitter (Figure 1), but in reverse order and with conjugate transposed matrices. Therefore, the transpose of the permutation matrix $\mathbf{P}^{(k)}$ is applied on the kth subcarrier to perform the circular down-conversion to zero frequency. Then, the receiver filter, with only LM filter coefficients, is applied. Thus, down-sampling by a factor L, represented by $(\mathbf{R}^{(L)})^T$, is needed to get M samples that match the transmitted data in terms of the number of symbols on the kth subcarrier. Finally, the resulting signal is converted to the time domain by applying the IDFT matrix $\mathbf{Z}_M^H$. All these operations can be represented by $\mathbf{A}^H$, the Hermitian transpose of the single matrix used in the transmitter, which corresponds to the GFDM matched filter equalizer. In this case, the estimate of the data sequence, $\hat{\mathbf{D}}^T$, is obtained after the de-mapping process. Equations (16)-(19) can be combined into a single expression equivalent to Equation (19), referred to as Equation (20).

Semi-Analytical Performance Approximation
From $\mathbf{A}\mathbf{A}^T = \mathbf{I}_{NM}$ and $\sum_{l=0}^{NM-1}\mathbf{E}_{l+1} = \mathbf{I}_{NM}$, $\mathbf{D}^T$ can be rewritten in a form directly comparable with Equation (20), which gives Equation (21). From Equation (20) and Equation (21), we obtain the error $\Delta_D = \hat{\mathbf{D}}^T - \mathbf{D}^T$ in Equation (22), where we can identify two contributions: (1) the residual ISI, and (2) the part corresponding to the channel noise. From Equation (22), the mean square error (MSE) can be derived, where $\mathbf{W}_{d,l} = [\mathbf{w}_{d,1,l}, \ldots, \mathbf{w}_{d,U,l}]^T$ and $\mathbf{e}_u \in \mathbb{C}^{U}$ is a unit vector with entry u equal to one, while all others are zeros. From Equation (24), we can obtain a semi-analytical BER approximation for an M-QAM constellation with Gray mapping [35], expressed in terms of the Q-function Q(.).
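As a point of reference for the last step, the sketch below implements the Q-function and the standard textbook BER approximation for square M-QAM with Gray mapping over an AWGN channel. It is not the paper's MSE-based expression, whose exact form is not reproduced here; the function names and the use of $E_b/N_0$ as the argument are assumptions of this sketch.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mqam_ber_approx(ebn0_linear, M):
    """Textbook BER approximation for square M-QAM with Gray mapping (AWGN)."""
    k = math.log2(M)  # bits per symbol
    arg = math.sqrt(3.0 * k * ebn0_linear / (M - 1))
    return (4.0 / k) * (1.0 - 1.0 / math.sqrt(M)) * q_function(arg)

ebn0 = 10.0 ** (8.0 / 10.0)  # 8 dB
print(mqam_ber_approx(ebn0, 4), mqam_ber_approx(ebn0, 16))
```

For M = 4 the expression collapses to $Q(\sqrt{2E_b/N_0})$, the familiar QPSK result, which gives a quick consistency check on the constants.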

Hybrid Analog-Digital Equalizer
In this section the two-step equalizer is proposed, where, in the first step, we compute the analog equalizer, while in the second one, we compute the digital part. The analog equalizer is computed to minimize the error between hybrid and full digital schemes. Owing to hardware constraints, the analog equalizer is constant over the subcarriers. Finally, for the digital part, a ZF approach is explored, to remove the interference that was neglected in the analog part. This means the inversion of only low-dimensional matrices, and the digital equalizer is computed on a per subcarrier basis.

Optimization Problem
A general optimization problem for a hybrid receiver can be formulated as in Equation (26), where $\mathcal{W}_a$ denotes the set of feasible analog coefficients. As shown in Appendix A, for the particular case wherein a maximal ratio combining (MRC) approach is adopted for the analog equalizer, Equation (26) simplifies to Equation (27), whose solution gives the analog equalizer matrix.

Analog Part of the Hybrid Equalizer
For the analog part, the hybrid MU linear equalizer is optimized by minimizing the mean square error between the hybrid approach and the full digital counterpart. In the analog equalizer, the MU interference is not considered, and only the power of the desired users is maximized, that is, an MRC-based approach is implemented in the analog part, as we can see in the optimization problem in Equation (27). For that, we consider a sequential procedure wherein we select one equalizer vector at a time, up to a total of $N_{RF}^{rx}$ selected vectors. Let $\mathbf{w}_{a,p} \in \mathbb{C}^{N_{rx}}$ be the equalizer vector of the pth RF chain, such that $\mathbf{W}_{a,p} = [\mathbf{w}_{a,1}, \ldots, \mathbf{w}_{a,p}]$. From Equation (27), the optimization problem for step p simplifies to Equation (29), a minimization over $\mathbf{w}_{a,p} \in \mathcal{F}_{a,p}$, where $\mathcal{F}_{a,p}$ denotes the set of feasible analog vectors for step p. From the Karush-Kuhn-Tucker (KKT) conditions, we can obtain the vector $\mathbf{w}_{d,l,p}$ and, replacing it in the objective function of Equation (29), we obtain Equation (30). In Equation (30) we can identify two terms: the first one is independent of $\mathbf{w}_{a,p}$, and the second one is a correlation involving $\mathbf{w}_{a,p}$. Then, Equation (29) is equivalent to the maximization of the second term. We consider a dictionary-based approach, where $\mathbf{w}_{a,p}$ is selected from a dictionary $\mathbf{A}_{rx}$ built from the array response vectors of the channel. The optimization problem in Equation (29) then reduces to finding the index $n_{opt,p}$ of the best dictionary column, so that $\mathbf{w}_{a,p} = \mathbf{A}_{rx}^{(n_{opt,p})}$.

The proposed algorithm is summarized in Algorithm 1. First, we initialize the analog and residue matrices, and then there is a loop where, in each iteration, we select one vector until a total of $N_{RF}^{rx}$ vectors is reached. In the loop, the vector is first selected according to the criterion of line 4, and it is added to the set of selected vectors in line 5. After that, a temporary digital matrix and a normalization constant are computed in lines 6 and 7, respectively. Finally, in line 8, we update the residue matrix.
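The loop structure described above (select a dictionary column by correlation with a residue, compute a temporary digital matrix, normalize, update the residue) can be sketched with a generic greedy, OMP-style selection. This is not a reproduction of Algorithm 1: the exact metric and weighting of the paper are not available here, so the sketch assumes a plain least-squares fit of a full digital target `W_fd` by `W_a @ W_d`, and all names are hypothetical.

```python
import numpy as np

def greedy_analog_selection(W_fd, dictionary, n_rf):
    """Pick n_rf dictionary columns that best approximate W_fd (greedy sketch)."""
    residue = W_fd / np.linalg.norm(W_fd)
    selected = []
    W_d = None
    for _ in range(n_rf):
        # Correlation of every dictionary column with the current residue
        corr = dictionary.conj().T @ residue
        metric = np.sum(np.abs(corr) ** 2, axis=1)
        if selected:
            metric[selected] = -np.inf  # never pick the same column twice
        selected.append(int(np.argmax(metric)))
        W_a = dictionary[:, selected]
        # Temporary digital matrix: least-squares fit of the target
        W_d, *_ = np.linalg.lstsq(W_a, W_fd, rcond=None)
        residue = W_fd - W_a @ W_d
        norm = np.linalg.norm(residue)
        if norm < 1e-12:
            break  # target already reproduced exactly
        residue = residue / norm  # normalization step
    return dictionary[:, selected], W_d, selected

# Toy check: an orthonormal DFT dictionary (ULA responses on a uniform grid)
# and a target that lies in the span of four of its columns.
n_rx = 16
n = np.arange(n_rx)
dictionary = np.exp(1j * 2.0 * np.pi * np.outer(n, n) / n_rx) / np.sqrt(n_rx)
rng = np.random.default_rng(1)
coeffs = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
W_fd = dictionary[:, [2, 7, 11, 13]] @ coeffs
W_a, W_d, idx = greedy_analog_selection(W_fd, dictionary, n_rf=4)
```

Because every dictionary column is an array response with constant-modulus entries, the selected analog matrix automatically satisfies the phase-shifter constraint. Note that the paper applies the digital equalizer as a U × $N_{RF}^{rx}$ matrix on the left, whereas this sketch uses the transposed factorization for brevity.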

Digital Part of the Hybrid Equalizer
In the digital part, we consider the ZF detector, with the aim of cancelling the interference neglected in the analog part. In this way, the interference is only explicitly taken into account in the digital part, which avoids the inversion of high-dimension matrices, as the needed matrix inversions are computationally much less demanding. Therefore, we assume that the digital equalizer is given by

$$\mathbf{W}_{d,l} = \left(\mathbf{H}_{eq,l}^H\mathbf{H}_{eq,l}\right)^{-1}\mathbf{H}_{eq,l}^H,$$

where $\mathbf{H}_{eq,l} = (\mathbf{W}_a)^H\mathbf{H}_l$.
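A minimal sketch of the per-subcarrier ZF step is given below, assuming the standard left pseudo-inverse of the effective channel, which only requires solving a U × U system. The random phase-only `W_a` and Gaussian `H_l` are placeholders for illustration, not the paper's channel model, and the function name is hypothetical.

```python
import numpy as np

def zf_digital_equalizer(W_a, H_l):
    """ZF equalizer over the effective channel H_eq = W_a^H H_l (U x N_RF)."""
    H_eq = W_a.conj().T @ H_l      # N_RF x U effective channel
    gram = H_eq.conj().T @ H_eq    # U x U matrix, the only one inverted
    return np.linalg.solve(gram, H_eq.conj().T)

rng = np.random.default_rng(2)
n_rx, n_rf, n_users = 32, 8, 4
# Placeholder analog equalizer: random phases with amplitude 1/sqrt(N_rx)
W_a = np.exp(1j * 2.0 * np.pi * rng.uniform(size=(n_rx, n_rf))) / np.sqrt(n_rx)
# Placeholder equivalent channel for one subcarrier
H_l = (rng.standard_normal((n_rx, n_users))
       + 1j * rng.standard_normal((n_rx, n_users))) / np.sqrt(2.0)
W_d = zf_digital_equalizer(W_a, H_l)
```

Applying `W_d @ W_a.conj().T` to the noiseless received signal returns the transmitted symbols of all users without MU interference, provided the number of RF chains is at least the number of users and the effective channel has full column rank.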

Complexity Computation
In this section, we perform the complexity analysis of the proposed hybrid equalizer. This evaluation is divided into two parts, the computation of the analog equalizer (Algorithm 1) and the computation of the digital equalizer.
As we can see in Algorithm 1, metric (33) is computed, whose complexity results from product of a matrix of size U × N rx by another matrix of size N rx × N cl N ray U. Therefore, the complexity of (33) is O(N cl N ray N rx U 2 ). For each RF chain, the metric is computed once, and then the complexity of Algorithm 1 is O(N cl N ray N rx N RF rx U 2 ). Additionally, the digital equalizer is computed based on the inversion of a U × U matrix, whose complexity is O(U 3 ). Therefore, we conclude that O(U 3 + N cl N ray N rx N RF rx U 2 ) is the total complexity computation of the proposed hybrid equalizer. The complexity of the analog precoder is O(N 2 tx ).

Performance Results
This section shows the performance of the mmWave mMIMO hybrid analog-digital system using GFDM modulation. The results are evaluated through the BER as a function of $E_b/N_0$, where $E_b$ is the average bit energy and $N_0$ is the one-sided noise power spectral density (PSD). We assume that the average $E_b/N_0 = \sigma_u^2/(2\sigma_n^2) = \sigma_n^{-2}/2$ is the same for all users, where $\sigma_1^2 = \ldots = \sigma_U^2 = 1$. The simulation parameters used in this work are listed in Table 1. Figure 3 shows the comparison between the proposed hybrid analog-digital scheme and the full digital one, for two types of precoders: (a) random-based precoder and (b) AoD-based precoder. The number of transmitting antennas is $N_{tx} = 8$ and the number of receiving antennas is $N_{rx} = 32$, for different numbers of RF chains, $N_{RF}^{rx} = 4, 8, 16, 32$. For the random precoding (case (a)), we can observe that the best performance among the addressed hybrid cases is obtained for $N_{RF}^{rx} = 16$, with a gap of only around 0.5 dB at a BER target of $10^{-3}$ compared with the full digital curve. Furthermore, it can be noticed that the gap from $N_{RF}^{rx} = 16$ to $N_{RF}^{rx} = 8$ is much lower than the gap between $N_{RF}^{rx} = 16$ and $N_{RF}^{rx} = 4$, especially in the high SNR region. Better performance results are obtained in case (b), where AoD precoding is applied because partial CSI is available at the transmitter side. For both cases, as expected, the performance tends to the one obtained with the full digital approach when the number of RF chains increases.
For further illustration, Figure 4 directly compares the two discussed cases for $N_{rx}^{RF} = 8$. The partial-CSI (AoD-based) case outperforms the random precoding approach by about 4 dB at BER = $10^{-3}$, which is expected because the channel information available at the terminals is quite limited.
Similarly, Figure 5 compares the BER of the hybrid and the full digital schemes for the two types of precoders, (a) random-based precoder and (b) AoD-based precoder, where $N_{tx} = 16$, $N_{rx} = 64$, and $N_{rx}^{RF} = 4, 8, 12, 16, 64$. Even with the larger number of transmitting and receiving antennas, in both cases the diversity order increases with the number of RF chains, which moves the performance toward the full digital curve. Nevertheless, the improvement is more pronounced with the AoD-based precoder (case (b)): owing to the diversity gain, the slope of the BER curve increases more than in case (a).
Figure 6 compares the BER performance of the proposed hybrid scheme using the AoD-based precoder with the full digital system, for $N_{tx} = 16$, $N_{rx} = 128$, and different numbers of RF chains, $N_{rx}^{RF} = 4, 8, 16$, and $32$. These results show that increasing the number of receiving antennas clearly improves the BER performance with the AoD precoder, and that the hybrid curves approach the full digital curve as the number of RF chains increases.
In Figure 7, we present the semi-analytical curves for the AoD-based precoder case, with $N_{tx} = 16$ and $N_{rx} = 64$, presenting results for both $N_{rx}^{RF} = 4$ and 8. The theoretical curve totally overlaps with the simulated one when we have 8 or 12 RF chains, while there is a slight gap between the theoretical and simulated curves in the case of $N_{rx}^{RF} = 4$.
Finally, in Figure 8, the impact of the path loss and shadowing effects is evaluated. As we can see, the BER performance improves as the number of RF chains increases, as was observed for the scenarios without path loss and shadowing effects. Therefore, approximately the same conclusions can be drawn as before.

Conclusions
In this paper, we considered the design of a hybrid analog-digital MU equalizer with a simple analog precoder scheme for mmWave mMIMO GFDM-based systems. The GFDM waveform was adopted to overcome the high PAPR and OOB emissions of OFDM. At the transmitter, a phase shifter network is used as a simple analog precoder, and two types of precoders are considered depending on the CSI knowledge available at the transmitter: random-based and AoD-based precoders. At the receiver, a hybrid analog-digital equalizer was designed to efficiently separate users. The analog coefficients were obtained by minimizing the MSE between the hybrid structure and the full digital counterpart, while the GFDM structure and ZF are used in the digital part. The performance results showed that the performance gap between the hybrid equalizer and the full digital scheme decreases as the number of RF chains increases.