Multi-User Linear Equalizer and Precoder Scheme for Hybrid Sub-Connected Wideband Systems

Millimeter waves and massive multiple-input multiple-output (MIMO) are two promising key technologies for meeting the high data-rate demands of the next mobile communication generation. Due to hardware limitations, these systems employ hybrid analog–digital architectures. Nonetheless, most works on hybrid architectures focus on narrowband channels, whereas millimeter-wave systems are expected to be wideband. Moreover, due to hardware constraints, a sub-connected architecture is more feasible than a fully connected one. Therefore, the aim of this paper is to design a sub-connected hybrid analog–digital multi-user linear equalizer combined with an analog precoder to efficiently remove the multi-user interference. We consider low-complexity user terminals employing pure analog precoders, computed with the knowledge of a quantized version of the average angles of departure of each cluster. At the base station, the hybrid multi-user linear equalizer is optimized by using the bit-error-rate (BER) as a metric over all the subcarriers. The analog-domain hardware constraints, together with the assumption of an analog equalizer that is flat over the subcarriers, considerably increase the complexity of the corresponding optimization problem. To simplify the problem, the merit function is first upper bounded. Then, by leveraging the specific properties of the resulting problem, we show that the analog equalizer may be computed iteratively over the radio frequency (RF) chains by assigning the users to the RF chains in an interleaved fashion. The proposed hybrid sub-connected scheme is compared with its fully connected counterpart.


Introduction
Mobile data traffic has increased over the years, and the next generation (5G) is a response to this demand for higher data rates [1,2]. It is expected that 5G will achieve a minimum data rate of 1 Gb/s, 5 Gb/s for high-mobility users, and 50 Gb/s for pedestrian users [3,4]. The frequency spectrum used by mobile communications is currently saturated, so it is necessary to find other spectrum bands for mobile applications. In this context, the millimeter wave (mmW) band, with its huge available bandwidth [5] and wavelengths from 1 to 10 millimeters [6], can be an interesting solution. Another key technology for future-generation communications is massive multiple-input multiple-output (mMIMO), which enables higher data rates and better energy efficiency (EE) than the previous generation, 4G Long Term Evolution (LTE) [7].
The combination of mmW with mMIMO enables the use of large antenna arrays at the base station (BS) and at the user terminals (UTs) [8]. Besides improving EE, massive MIMO, also known as large-scale antenna systems (LSAS), can also improve the spectral efficiency (SE) of mobile communication systems [9]. As discussed in [10], SE is independent of the number of antennas employed at the BS and grows with the number of radio frequency (RF) chains. It is well known that, with a large number of antennas, it is not feasible to have one dedicated RF chain per antenna. Consequently, a fully digital beamforming (BF) architecture is not realistic due to its high cost and power consumption [9,11]. On the other hand, a system that works only in the analog domain, employing fully analog BF, is not a practical alternative either. This is because only quantized phase shifters are available and their amplitudes are constrained. As a result, the fully analog architecture is normally limited to single-stream transmission [12]. One possible solution to overcome these limitations is hybrid digital and analog BF, where the signal is processed at both the analog and the digital levels. Compared with the fully digital architecture, the performance of the hybrid solution is limited by the number of RF chains. However, it is possible to design efficient signal processing schemes that achieve a performance close to that of the fully digital counterpart [13].

Previous Work on Fully Connected Architectures
Some fully connected hybrid beamforming architectures for narrowband single-user systems were discussed in [14–16]. The work presented in [14] considered transmit precoding and receive combining for mmW mMIMO. There, the spatial structure of mmW channels was exploited to formulate the precoding/combining design as a sparse reconstruction problem. In [15], an iterative turbo-like algorithm was proposed to find a near-optimal pair of analog precoder and combiner. A matrix decomposition method that converts any existing precoder/combiner design for the fully digital scheme into an analog–digital precoder/combiner for the hybrid architecture was addressed in [16]. Approaches for narrowband multi-user systems were also considered in [17–22]. A limited-feedback hybrid analog–digital precoding/combining scheme for multi-user systems was addressed in [17]. A heuristic hybrid BF design was addressed in [18], achieving a performance close to fully digital BF with low-resolution phase shifters. A hybrid analog–digital precoding/combining multi-user system based on the mean-squared error (MSE) was proposed in [19]. The authors of [20] designed an iterative hybrid analog–digital equalizer that efficiently removes the multi-user interference. In [21], an iterative precoder and combiner design was proposed by exploiting the duality of the uplink and downlink multi-user MIMO channels. In [22], a hybrid beamforming system based on a dual-polarized antenna array was proposed for single-user systems. Hybrid architectures for single-user and multi-user mmW mMIMO wideband systems were considered in [23–25]. Precoding solutions with codebook design for limited-feedback spatial multiplexing in single-user wideband mmW systems were developed in [23]. For the multi-user case, statistical MIMO orthogonal frequency division multiplexing (OFDM) beamformers without instantaneous channel information were designed in [24]. There, the beams were formed using the dominant eigenvectors to select the main directions. In [25], a downlink MIMO-OFDM hybrid multi-user precoder based on the vector quantization concept was proposed, where the total transmit power was minimized.

Previous Work on Sub-Connected Architectures
The previous works mainly focused on fully connected architectures. However, sub-connected architectures, where each RF chain is connected only to a subset of the available antennas, are better suited for practical applications due to their lower complexity. Narrowband fixed sub-connected hybrid architectures for single-user systems were addressed in [26,27]. The authors of [26] proposed a two-layer optimization method jointly exploiting the interference alignment and fractional programming principles. First, the analog precoder and combiner were optimized via the alternating-direction optimization method. Then, the precoder and combiner were optimized based on an effective MIMO channel coefficient. In [27], two analog precoder schemes were developed for high and low signal-to-noise ratio (SNR) conditions. For multi-user sub-connected narrowband architectures, some approaches were also proposed in [28–32]. In [28], the total achievable rate optimization problem with nonconvex constraints was decomposed into a series of sub-rate optimization problems, one for each sub-antenna array. A successive interference cancelation (SIC)-based hybrid precoder was then proposed. A low-complexity hybrid precoding and combining design was discussed in [29]. There, a virtual path was used to maximize the channel gain, and zero-forcing precoding based on the effective channel was then applied to manage the interference. The scheme proposed in [30] efficiently controlled the multi-user interference by sequentially computing the analog part of the equalizer over the RF chains, using a dictionary obtained from the array response vectors. In [31], a Gram–Schmidt (GS)-based antenna selection (AS) algorithm was used to obtain an appropriate antenna subset for the overlapped, interlaced, and dynamic architectures. Solutions for wideband sub-connected hybrid architectures were also considered in [32,33]. In [32], fully connected, fixed sub-connected, and dynamic sub-connected OFDM single-user hybrid precoders were designed to maximize the sum rate. Precoding techniques for multi-user downlink massive mmW MIMO-OFDM systems were proposed in [33].
A unified heuristic design for both fully connected and sub-connected hybrid structures was developed by maximizing the overall spectral efficiency.

Main Contributions
The previous works mainly considered sub-connected hybrid architectures for single-user and multi-user narrowband systems. Works on wideband sub-connected hybrid architectures are very scarce in the literature, and they mainly focus on downlink solutions with OFDM modulation. To the best of our knowledge, sub-connected hybrid approaches for the uplink of multi-user wideband mmW massive MIMO systems have yet to be addressed in the literature. Therefore, in this paper we aim to fill this gap. We design an efficient hybrid multi-user equalizer combined with a pure analog precoder for sub-connected uplink mmW massive MIMO single-carrier frequency division multiple access (SC-FDMA) systems. We consider single-RF UTs employing a low-complexity, yet efficient, analog precoder based on partial channel state information (CSI), i.e., only a quantized version of the average angle of departure (AoD) of each cluster is known. The hybrid multi-user linear equalizer employed at the BS is optimized by using the BER as a metric over all the subcarriers. We assume that the digital part of the equalizer is computed on a per-subcarrier basis, while the analog part is constant over the subcarriers. The analog-domain hardware constraints considerably increase the complexity of the corresponding optimization problem. To simplify it, the merit function is first upper bounded. Then, by leveraging the specific properties of the resulting problem, we show that the analog equalizer may be computed iteratively over the RF chains. This is done by assigning the users to the RF chains in an interleaved fashion and using a dictionary built from the array response vectors. The results show that the performance penalty of the sub-connected multi-user equalizer relative to the fully connected counterpart decreases as the number of RF chains increases.

Organization and Notation
This paper is organized as follows. Section 2 describes the transmitter, channel, and receiver system model. In Section 3, the analog precoder employed at each UT is described, while in Section 4 the sub-connected hybrid analog–digital multi-user equalizer is derived. In Section 5, the main performance results are presented. Finally, the conclusions are discussed in Section 6.
The following notation is used in this paper: boldface uppercase letters, boldface lowercase letters, and italic letters denote matrices, vectors, and scalars, respectively. The operations $(\cdot)^T$, $(\cdot)^H$, $(\cdot)^*$, and $\mathrm{tr}(\cdot)$ represent the transpose, the Hermitian, the conjugate, and the trace of a matrix, respectively. The operator $\mathrm{diag}(\mathbf{A})$ corresponds to the diagonal entries of the matrix $\mathbf{A}$. The identity matrix of size $N \times N$ is denoted $\mathbf{I}_N$. $\mathbb{E}[\cdot]$ and $\{\alpha_l\}_{l=1}^{L}$ represent the expectation operator and an $L$-length sequence, respectively. $|a|$ denotes the absolute value of $a$. $[\mathbf{A}]_{n,l}$ represents the entry in the $n$th row and $l$th column of the matrix $\mathbf{A}$. The indices $t$, $k$, and $u$ represent the time domain, the subcarrier in the frequency domain, and the user terminal, respectively.

System Model
In this section, we describe the transmitter, the channel model, and the receiver for the considered uplink massive MIMO mmW SC-FDMA system.

Transmitter Description
We assume $U$ UTs sharing the same radio resources, each equipped with $N_{tx}$ transmit antennas and a single RF chain. Figure 1 presents the general schematic of the $u$th user terminal. First, the frequency-domain $N_c$-length sequence $\{c_{u,k}\}$ is obtained as the discrete Fourier transform (DFT) of the time-domain sequence $\{s_{u,t}\}$. After that, the frequency-domain data are interleaved and mapped to the OFDM symbol. To simplify the formulation, we assume that $N_s = N_c$, which means that only a single $N_c$-length block is considered, and we assume the identity mapping. Therefore, the frequency-domain sequence $\{c_{u,k}\}_{k=0}^{N_c-1}$ is the DFT of the full time-domain sequence $\{s_{u,t}\}_{t=0}^{N_c-1}$. After the inverse DFT and the insertion of the cyclic prefix (CP), an analog precoder $\mathbf{f}_{a,u} \in \mathbb{C}^{N_{tx}}$ is employed. Due to hardware constraints, we only consider analog phase shifters, which force all coefficients of the precoder to have equal magnitude, i.e., $|f_{a,u}(n)|^2 = 1/N_{tx}$. Furthermore, the precoder is assumed to be constant over the subcarriers. Therefore, the discrete transmit complex baseband signal $\mathbf{x}_{u,k} \in \mathbb{C}^{N_{tx}}$ of the $u$th user at subcarrier $k$ can be represented as
$$\mathbf{x}_{u,k} = \mathbf{f}_{a,u}\,c_{u,k}, \quad (1)$$
where $c_{u,k} \in \mathbb{C}$. The design of the analog precoder coefficients is presented in the next section. We assume that the number of users does not exceed the number of RF chains ($N_{RF}$) at the receiver, i.e., $U \le N_{RF}$.
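A minimal Python sketch (standard library only) of the per-subcarrier transmit model $\mathbf{x}_{u,k} = \mathbf{f}_{a,u}c_{u,k}$ may help fix the notation. It assumes a unitary DFT, which the text does not specify, an arbitrary QPSK block, and illustrative precoder phases. The helper names `dft` and `precode` are hypothetical. The inverse DFT and CP insertion are omitted because the model is written per subcarrier.

```python
import cmath
import math


def dft(seq):
    """Unitary N-point DFT: c_k = (1/sqrt(N)) * sum_t s_t * exp(-j 2 pi k t / N)."""
    n = len(seq)
    return [sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(seq)) / math.sqrt(n)
            for k in range(n)]


def precode(f_a, c_k):
    """x_{u,k} = f_{a,u} * c_{u,k}: the same analog precoder is used on every subcarrier."""
    return [f * c_k for f in f_a]


N_c, N_tx = 8, 4
# Illustrative QPSK block s_{u,t} with unit-energy symbols.
qpsk = [complex(1, 1), complex(1, -1), complex(-1, 1), complex(-1, -1)]
s_u = [qpsk[(3 * t) % 4] / math.sqrt(2) for t in range(N_c)]
c_u = dft(s_u)                                   # frequency-domain symbols c_{u,k}
# Illustrative phases; equal magnitudes enforce |f_{a,u}(n)|^2 = 1/N_tx.
f_a = [cmath.exp(1j * 0.3 * n) / math.sqrt(N_tx) for n in range(N_tx)]
x_u = [precode(f_a, c) for c in c_u]             # x_{u,k}, k = 0, ..., N_c - 1
```

The precoder has unit norm, so precoding preserves the per-subcarrier energy $|c_{u,k}|^2$. The unitary DFT also preserves the total block energy.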

Figure 1. Schematic of the $u$th user terminal transmitter. CP is cyclic prefix, RF is radio frequency, and FFT and IFFT are the fast Fourier transform and inverse fast Fourier transform, respectively.

Channel Model Description
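We assume a channel given by the sum of the contributions of $N_{cl}$ clusters, each one contributing $N_{ray}$ propagation paths. The considered delay-$d$ MIMO channel matrix of the $u$th user can be written as
$$\mathbf{H}_{u,d} = \sqrt{\frac{N_{rx}N_{tx}}{\rho_{PL}}}\sum_{q=1}^{N_{cl}}\sum_{l=1}^{N_{ray}}\alpha^{u}_{q,l}\,p_{rc}\left(dT_s-\tau^{u}_{q}-\tau^{u}_{q,l}\right)\mathbf{a}_{rx,u}\left(\varphi^{u}_{q}-\phi^{u}_{q,l}\right)\mathbf{a}^{H}_{tx,u}\left(\theta^{u}_{q}-\vartheta^{u}_{q,l}\right), \quad (2)$$
and the corresponding frequency-domain channel matrix $\mathbf{H}_{u,k} \in \mathbb{C}^{N_{rx}\times N_{tx}}$ of the $u$th user at the $k$th subcarrier is given by
$$\mathbf{H}_{u,k} = \sum_{d=0}^{D-1}\mathbf{H}_{u,d}\,e^{-j\frac{2\pi k}{N_c}d}, \quad (3)$$
where $N_{rx}$ represents the number of receive antennas, $\rho_{PL}$ represents the path loss between the transmitter and the receiver, and $\alpha^{u}_{q,l}$ is the complex path gain of the $l$th ray in the $q$th scattering cluster. A raised-cosine filter is adopted for the pulse-shaping function $p_{rc}(\cdot)$ for $T_s$-spaced signaling, as in [22]. The $q$th cluster has a time delay $\tau^{u}_{q}$, an angle of departure $\theta^{u}_{q}$, and an angle of arrival $\varphi^{u}_{q}$. Each ray $l$ of the $q$th cluster has a relative time delay $\tau^{u}_{q,l}$, a relative angle of departure $\vartheta^{u}_{q,l}$, and a relative angle of arrival $\phi^{u}_{q,l}$. The path delays are uniformly distributed in $[0, DT_s]$, where $D$ denotes the length of the CP. The angles follow the random distribution mentioned in [22], such that $\mathbb{E}[\|\mathbf{H}_{u,d}\|_F^2] = N_{rx}N_{tx}$. Finally, the vectors $\mathbf{a}_{rx,u}$ and $\mathbf{a}_{tx,u}$ represent the normalized receive and transmit array response vectors, respectively. For an $N$-element uniform linear array (ULA), the array response vector can be given by
$$\mathbf{a}(\theta) = \frac{1}{\sqrt{N}}\left[1, e^{jkp\sin(\theta)}, \ldots, e^{j(N-1)kp\sin(\theta)}\right]^T, \quad (4)$$
where $k = 2\pi/\lambda$, $\lambda$ is the wavelength, and $p$ is the inter-element spacing. The channel matrix of the $u$th user can also be expressed as
$$\mathbf{H}_{u,k} = \mathbf{A}_{rx,u}\boldsymbol{\Delta}_{u,k}\mathbf{A}^{H}_{tx,u}, \quad (5)$$
where $\boldsymbol{\Delta}_{u,k}$ is a diagonal matrix whose entry $(q,l)$ corresponds to the path gain of the $l$th ray in the $q$th scattering cluster. The matrices $\mathbf{A}_{tx,u} = [\mathbf{a}_{tx,u}(\theta^{u}_{1}-\vartheta^{u}_{1,1}), \ldots, \mathbf{a}_{tx,u}(\theta^{u}_{N_{cl}}-\vartheta^{u}_{N_{cl},N_{ray}})]$ and $\mathbf{A}_{rx,u} = [\mathbf{a}_{rx,u}(\varphi^{u}_{1}-\phi^{u}_{1,1}), \ldots, \mathbf{a}_{rx,u}(\varphi^{u}_{N_{cl}}-\phi^{u}_{N_{cl},N_{ray}})]$ hold the transmit and receive array response vectors of the $u$th user, respectively.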
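The normalized ULA response and the tap-to-subcarrier transform can be sketched in Python as follows. The sketch assumes the standard DFT relation $\mathbf{H}_{u,k} = \sum_d \mathbf{H}_{u,d}e^{-j2\pi kd/N_c}$, half-wavelength spacing by default, and matrices stored as lists of rows. The names `ula_response` and `freq_channel` are hypothetical. The single-ray tap is illustrative and is not drawn from the clustered model.

```python
import cmath
import math


def ula_response(n_ant, angle, spacing_wl=0.5):
    """Normalized ULA response (1/sqrt(N)) [1, e^{j k p sin(angle)}, ..., e^{j (N-1) k p sin(angle)}].
    Since k = 2*pi/lambda, k*p = 2*pi*spacing_wl when p is given in wavelengths."""
    kp = 2 * math.pi * spacing_wl
    return [cmath.exp(1j * n * kp * math.sin(angle)) / math.sqrt(n_ant) for n in range(n_ant)]


def freq_channel(taps, k, n_c):
    """H_{u,k} = sum_d H_{u,d} exp(-j 2 pi k d / N_c); taps[d] is an N_rx x N_tx list of rows."""
    rows, cols = len(taps[0]), len(taps[0][0])
    h = [[0j] * cols for _ in range(rows)]
    for d, h_d in enumerate(taps):
        phase = cmath.exp(-2j * math.pi * k * d / n_c)
        for r in range(rows):
            for c in range(cols):
                h[r][c] += h_d[r][c] * phase
    return h


# Illustrative single-ray tap alpha * a_rx a_tx^H (angles and gain chosen arbitrarily).
a_rx = ula_response(4, math.radians(20))
a_tx = ula_response(2, math.radians(-10))
alpha = 0.8 + 0.1j
tap = [[alpha * ar * at.conjugate() for at in a_tx] for ar in a_rx]
zero = [[0j] * 2 for _ in range(4)]
H_flat = freq_channel([tap], k=3, n_c=64)            # tap at d = 0: identical on every subcarrier
H_delayed = freq_channel([zero, tap], k=16, n_c=64)  # tap at d = 1: rotated by exp(-j pi/2) = -j
```

A tap at delay $d$ only adds the linear phase $e^{-j2\pi kd/N_c}$ across subcarriers. This is why delay spread within the CP makes the channel frequency selective.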

Receiver Description
At the receiver, we consider a hybrid analog–digital sub-connected architecture, where each RF chain is connected to a group of $R = N_{rx}/N_{RF}$ antennas, with $N_{RF}$ the number of RF chains, as represented in Figure 2. We assume that the number of RF chains does not exceed the number of receive antennas, i.e., $N_{RF} \le N_{rx}$.
The frequency-domain received signal at the $k$th subcarrier, $\mathbf{y}_k \in \mathbb{C}^{N_{rx}}$, can be written as
$$\mathbf{y}_k = \sum_{u=1}^{U}\mathbf{H}_{u,k}\mathbf{x}_{u,k} + \mathbf{n}_k = \sum_{u=1}^{U}\mathbf{H}_{u,k}\mathbf{f}_{a,u}c_{u,k} + \mathbf{n}_k, \quad (6)$$
where $\mathbf{n}_k \in \mathbb{C}^{N_{rx}}$ is zero-mean Gaussian noise with variance $\sigma_n^2$, $\mathbf{x}_{u,k}$ is the discrete transmit complex baseband signal of the $u$th user at subcarrier $k$, and $\mathbf{f}_{a,u}$ is its analog precoder. We consider a sub-connected hybrid analog–digital multi-user equalizer to efficiently separate the users, as shown in Figure 2. Initially, the signal is processed through the phase shifters modeled by the vector $\mathbf{w}_{a,r} \in \mathbb{C}^{R}$, whose elements all have equal magnitude, $|w_{a,r}(n)|^2 = 1/N_{rx}$. The overall analog matrix $\mathbf{W}_a \in \mathbb{C}^{N_{rx}\times N_{RF}}$ represents the connection between each subset of $N_{rx}/N_{RF}$ antennas and the corresponding RF chain. It has a block diagonal structure
$$\mathbf{W}_a = \mathrm{diag}\left(\mathbf{w}_{a,1}, \ldots, \mathbf{w}_{a,r}, \ldots, \mathbf{w}_{a,N_{RF}}\right), \quad r = 1, \ldots, N_{RF}. \quad (7)$$
As for the analog precoder, we also assume that the analog part of the equalizer is constant over all subcarriers.
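The block-diagonal structure in (7) can be illustrated with a short Python sketch. The helper names and the random phases are hypothetical. The sketch shows that, with $|w_{a,r}(n)|^2 = 1/N_{rx}$ and disjoint antenna groups, the columns of $\mathbf{W}_a$ are mutually orthogonal and $\mathbf{W}^H_a\mathbf{W}_a = (R/N_{rx})\mathbf{I}_{N_{RF}} = \mathbf{I}_{N_{RF}}/N_{RF}$.

```python
import cmath
import math
import random


def block_diag_analog(phases, n_rx):
    """W_a = diag(w_{a,1}, ..., w_{a,N_RF}) as an N_rx x N_RF list of rows.
    phases[r] holds the R = N_rx / N_RF phase-shifter angles of RF chain r;
    every non-zero entry has magnitude 1/sqrt(N_rx)."""
    n_rf = len(phases)
    r_ant = n_rx // n_rf
    w_a = [[0j] * n_rf for _ in range(n_rx)]
    for r, chain in enumerate(phases):
        if len(chain) != r_ant:
            raise ValueError("each RF chain needs exactly R phases")
        for n, p in enumerate(chain):
            w_a[r * r_ant + n][r] = cmath.exp(1j * p) / math.sqrt(n_rx)
    return w_a


def gram(w):
    """W^H W for a complex matrix stored as a list of rows."""
    n_cols = len(w[0])
    return [[sum(row[i].conjugate() * row[j] for row in w) for j in range(n_cols)]
            for i in range(n_cols)]


rng = random.Random(0)
N_rx, N_RF = 16, 8                     # R = 2 antennas per RF chain
phases = [[rng.uniform(-math.pi, math.pi) for _ in range(N_rx // N_RF)] for _ in range(N_RF)]
W_a = block_diag_analog(phases, N_rx)
G = gram(W_a)                          # expected: (R / N_rx) I = (1 / N_RF) I
```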
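After that, the CP is removed on each RF chain, and the signal is moved to the frequency domain by applying the DFT. Then, the samples of each subcarrier pass through the digital part of the equalizer, modeled by the matrix $\mathbf{W}_{d,k} \in \mathbb{C}^{N_{RF}\times U}$. Therefore, the signal at the output of the analog and digital equalizer can be written as
$$\tilde{\mathbf{c}}_k = \mathbf{W}^H_{d,k}\mathbf{W}^H_{a}\mathbf{y}_k, \quad (8)$$
where $\tilde{\mathbf{c}}_k = [\tilde{c}_{1,k}, \ldots, \tilde{c}_{U,k}]^T$ collects the estimates of the frequency-domain symbols of the $U$ users at subcarrier $k$.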

Analog Precoder Design
In this section, we design a low-complexity analog precoder to be employed at the transmitters. These precoders are computed based on partial CSI, i.e., only a quantized version of the average AoD $\theta^{u}_{q}$, $q = 1, \ldots, N_{cl}$, of each cluster is used. These angles are estimated at the receiver, quantized as
$$\hat{\theta}^{u}_{q} = f_Q\left(\theta^{u}_{q}\right), \quad (9)$$
and then sent to the transmitters. In this paper, for the sake of simplicity, we consider uniform quantizers, i.e., the function $f_Q$ has $2^n$ levels ($n$ is the number of quantization bits) equally spaced between the clipping levels $-A_m$ and $A_m$. With the knowledge of these quantized angles, user $u$ starts by computing the correlation matrix $\mathbf{R}_u = \hat{\mathbf{A}}_{tx,u}\hat{\mathbf{A}}^H_{tx,u}$, where the overall matrix $\hat{\mathbf{A}}_{tx,u} \in \mathbb{C}^{N_{tx}\times N_{cl}}$ is given as
$$\hat{\mathbf{A}}_{tx,u} = \left[\mathbf{a}_{tx,u}(\hat{\theta}^{u}_{1}), \ldots, \mathbf{a}_{tx,u}(\hat{\theta}^{u}_{N_{cl}})\right], \quad (10)$$
with $\mathbf{a}_{tx,u}(\hat{\theta}^{u}_{q})$ computed from
$$\mathbf{a}_{tx,u}(\hat{\theta}^{u}_{q}) = \frac{1}{\sqrt{N_{tx}}}\left[1, e^{jkp\sin(\hat{\theta}^{u}_{q})}, \ldots, e^{j(N_{tx}-1)kp\sin(\hat{\theta}^{u}_{q})}\right]^T. \quad (11)$$
To compute the analog precoders, we first apply the eigenvalue decomposition to the correlation matrix, $\mathbf{R}_u = \mathbf{U}_{tx,u}\boldsymbol{\Lambda}_{tx,u}\mathbf{U}^H_{tx,u}$. Here, $\boldsymbol{\Lambda}_{tx,u}$ is a diagonal matrix whose elements are the eigenvalues of $\mathbf{R}_u$, and $\mathbf{U}_{tx,u}$ is a square matrix whose $i$th column is the $i$th eigenvector of $\mathbf{R}_u$. Finally, entry $n_{tx}$ of the proposed analog precoder of the $u$th user is set as
$$f_{a,u}(n_{tx}) = \frac{1}{\sqrt{N_{tx}}}e^{j\arg\left(\mathbf{U}_{tx,u}(n_{tx},1)\right)}, \quad (12)$$
where $\arg(a)$ denotes the argument of the complex number $a$, and $\mathbf{U}_{tx,u}(n_{tx},1)$, $n_{tx} = 1, \ldots, N_{tx}$, represents entry $n_{tx}$ of the eigenvector corresponding to the largest eigenvalue of $\mathbf{R}_u$. Hence, the beam follows the best channel direction, improving the transmit/receive link reliability.
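The precoder construction can be sketched in Python under a few stated assumptions. The quantizer places its $2^n$ levels so that both clipping levels $\pm A_m$ are themselves levels, which is one possible reading of "equally spaced between". Half-wavelength spacing is used, as in the simulations. Power iteration stands in for the full eigenvalue decomposition, since only the dominant eigenvector is needed; this is an implementation choice that assumes a unique largest eigenvalue. All function names are hypothetical.

```python
import cmath
import math


def quantize(theta, n_bits, a_m):
    """Uniform quantizer f_Q: 2**n_bits levels equally spaced on [-a_m, a_m].
    Assumption: both clipping levels are themselves quantization levels."""
    step = 2 * a_m / (2 ** n_bits - 1)
    clipped = max(-a_m, min(a_m, theta))
    return -a_m + round((clipped + a_m) / step) * step


def ula(n, angle, spacing_wl=0.5):
    """Normalized ULA response, with k*p = 2*pi*spacing_wl (half-wavelength by default)."""
    kp = 2 * math.pi * spacing_wl
    return [cmath.exp(1j * i * kp * math.sin(angle)) / math.sqrt(n) for i in range(n)]


def analog_precoder(quantized_aods, n_tx, iters=100):
    """f_{a,u}(n) = exp(j arg(v(n))) / sqrt(N_tx), with v the dominant eigenvector of
    R_u = A A^H built from the quantized AoDs (found here by power iteration)."""
    cols = [ula(n_tx, th) for th in quantized_aods]
    r_u = [[sum(c[i] * c[j].conjugate() for c in cols) for j in range(n_tx)] for i in range(n_tx)]
    v = [1.0 + 0j] * n_tx
    for _ in range(iters):
        v = [sum(r_u[i][j] * v[j] for j in range(n_tx)) for i in range(n_tx)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        v = [x / norm for x in v]
    return [cmath.exp(1j * cmath.phase(x)) / math.sqrt(n_tx) for x in v]


N_tx = 8
theta_hat = quantize(0.3, n_bits=4, a_m=math.pi / 2)
f_a = analog_precoder([theta_hat], N_tx)   # single cluster: beam steered to theta_hat
a = ula(N_tx, theta_hat)
```

With a single cluster, $\mathbf{R}_u$ has rank one, and the precoder reduces to the array response at $\hat{\theta}^u_1$ up to a common phase.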

Multi-User Equalizer Design
In this section, we design a hybrid analog–digital sub-connected equalizer for multi-user mmW mMIMO to be employed at the receiver. A decoupled transmitter–receiver design is assumed in this paper, since the joint optimization problem is a very complex task. The overall analog matrix $\mathbf{W}_a$ defined in (7) and the digital matrices $\{\mathbf{W}_{d,k}\}_{k=0}^{N_c-1}$ are optimized by minimizing the BER, which is equivalent to minimizing the MSE.

Problem Formulation
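It can be shown that the digital part of the equalizer that minimizes the BER is given by
$$\mathbf{W}_{d,k} = \left(\mathbf{W}^H_a\mathbf{H}_{eq,k}\mathbf{H}^H_{eq,k}\mathbf{W}_a + \sigma_n^2\mathbf{W}^H_a\mathbf{W}_a\right)^{-1}\mathbf{W}^H_a\mathbf{H}_{eq,k}, \quad (13)$$
since it maximizes the overall signal-to-interference-plus-noise ratio (SINR) of the $u$th user at time $t$, $\mathrm{SINR}_{u,t}$, i.e., the SINR relative to the data symbol $s_{u,t}$ [23]. In (13), $\mathbf{H}_{eq,k} = [\mathbf{h}_{eq,1,k}, \ldots, \mathbf{h}_{eq,U,k}]$ collects the equivalent channels $\mathbf{h}_{eq,u,k} = \mathbf{H}_{u,k}\mathbf{f}_{a,u}$ of the $U$ users at subcarrier $k$.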
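Let us now describe the method to compute the analog part of the considered sub-connected architecture. By using the matrix inversion lemma [34], the overall analog–digital equalizer matrix simplifies to
$$\mathbf{W}_a\mathbf{W}_{d,k} = \mathbf{U}\mathbf{U}^H\mathbf{H}_{eq,k}\left(\mathbf{H}^H_{eq,k}\mathbf{U}\mathbf{U}^H\mathbf{H}_{eq,k} + \sigma_n^2\mathbf{I}_U\right)^{-1}, \quad (14)$$
where $\mathbf{H}_{eq,k} = [\mathbf{h}_{eq,1,k}, \ldots, \mathbf{h}_{eq,U,k}]$ and $\mathbf{U}\mathbf{U}^H = \mathbf{W}_a(\mathbf{W}^H_a\mathbf{W}_a)^{-1}\mathbf{W}^H_a$, i.e., the columns of $\mathbf{U}$ form an orthonormal basis for the column space of $\mathbf{W}_a$. Assuming quadrature phase shift keying (QPSK) constellations, for simplicity and without loss of generality, the average BER can be written as
$$\mathrm{BER} = \frac{1}{UN_c}\sum_{u=1}^{U}\sum_{t=0}^{N_c-1}Q\left(\sqrt{\mathrm{SINR}_{u,t}}\right), \quad (15)$$
where $Q$ represents the well-known Q-function and $\mathrm{SINR}_{u,t}$ is given by
$$\mathrm{SINR}_{u,t} = \left(\frac{1}{N_c}\sum_{k=0}^{N_c-1}\left[\left(\mathbf{I}_U + \sigma_n^{-2}\mathbf{H}^H_{eq,k}\mathbf{U}\mathbf{U}^H\mathbf{H}_{eq,k}\right)^{-1}\right]_{u,u}\right)^{-1} - 1, \quad (16)$$
where $[\mathbf{A}]_{u,u}$ represents the entry in the $u$th row and $u$th column of the matrix $\mathbf{A}$. From (16), we see that $\mathrm{SINR}_{u,t}$ is independent of the time index. Thus, (15) simplifies to
$$\mathrm{BER} = \frac{1}{U}\sum_{u=1}^{U}Q\left(\sqrt{\mathrm{SINR}_u}\right), \quad (17)$$
with $\mathrm{SINR}_u = \mathrm{SINR}_{u,0} = \mathrm{SINR}_{u,1} = \cdots = \mathrm{SINR}_{u,N_c-1}$. The optimization problem to compute the analog part of the equalizer may be mathematically formulated as
$$(\mathbf{W}_a)_{opt} = \arg\min_{\mathbf{W}_a\in\mathcal{W}_a}\ \frac{1}{U}\sum_{u=1}^{U}Q\left(\sqrt{\mathrm{SINR}_u}\right), \quad (18)$$
where $\mathcal{W}_a = \{\mathbf{W}_a : \mathbf{W}_a = \mathrm{diag}[\mathbf{w}_{a,1}, \ldots, \mathbf{w}_{a,r}, \ldots, \mathbf{w}_{a,N_{RF}}],\ |w_{a,r}(n)|^2 = 1/N_{rx}\}$ denotes the feasible set for the analog equalizer.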
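The BER expressions can be evaluated with a short Python sketch. It assumes an SC-FDMA SINR of the form $\mathrm{SINR}_u = \left(\frac{1}{N_c}\sum_k \mathrm{MSE}_{u,k}\right)^{-1} - 1$, where $\mathrm{MSE}_{u,k}$ is the $u$th diagonal entry of the per-subcarrier MMSE error matrix. The MSE values below are hypothetical inputs, not simulation results. The sketch also checks the exponential bound $Q(x) \le \frac{1}{2}e^{-x^2/2}$ that the derivation relies on. The function names are hypothetical; `math.erfc` is from the Python standard library.

```python
import math


def q_func(x):
    """Gaussian Q-function, Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def sinr_from_mse(mse_per_subcarrier):
    """Assumed SC-FDMA form: SINR_u = 1 / mean_k(MSE_{u,k}) - 1, identical for every time index t."""
    mean_mse = sum(mse_per_subcarrier) / len(mse_per_subcarrier)
    return 1.0 / mean_mse - 1.0


def average_ber(sinrs):
    """QPSK average BER over the users: (1/U) * sum_u Q(sqrt(SINR_u))."""
    return sum(q_func(math.sqrt(s)) for s in sinrs) / len(sinrs)


def ber_exp_bound(sinrs):
    """Exponential bound Q(x) <= 0.5 exp(-x^2 / 2) applied per user."""
    return sum(0.5 * math.exp(-s / 2) for s in sinrs) / len(sinrs)


# Hypothetical per-subcarrier MSEs for U = 2 users (N_c = 4 here, for brevity).
mse = [[0.02, 0.04, 0.06, 0.08], [0.05, 0.10, 0.20, 0.05]]
sinrs = [sinr_from_mse(m) for m in mse]
ber = average_ber(sinrs)
bound = ber_exp_bound(sinrs)
# Sanity check: a flat channel with SNR 10 has MSE 1/(1 + 10), giving SINR 10.
flat_sinr = sinr_from_mse([1 / (1 + 10.0)] * 8)
```

Averaging the MSE over the subcarriers before inverting is what makes a single SINR value apply to every time-domain symbol of the block.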

Proposed Method Derivation
Due to the non-convex nature of the merit function and of the constraint in problem (18), it is difficult, or even impossible, to obtain an analytical solution to this optimization problem. Moreover, as we consider a multi-user scenario, the resulting average BER is the average of the per-user BERs, which makes the problem even harder to solve. Hence, instead of an exact solution to problem (18), we derive in the following an algorithm that obtains an approximate solution.
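Using the exponential upper bound of the Q-function, $Q(x) \le \frac{1}{2}e^{-x^2/2}$, the average BER in (17) satisfies
$$\mathrm{BER} \le \frac{1}{2U}\sum_{u=1}^{U}e^{-\mathrm{SINR}_u/2}. \quad (19)$$
Replacing (16) in (19) and after some mathematical manipulations, we obtain an upper bound on the average BER. From this bound, an approximate solution to the optimization problem (18) may be obtained by solving a simplified optimization problem, where $\mathbf{h}_{eq,u,k} \in \mathbb{C}^{N_{rx}}$ represents the equivalent channel of user $u$ at subcarrier $k$. To solve it, we propose to compute the matrix $\mathbf{W}_a$ iteratively, column by column, which in practice corresponds to iteratively adding RF chains to the receiver. Let the matrix $\mathbf{W}^{(i)}_a$ and the vector $\mathbf{w}^{(i)}_a$ denote the first $i$ columns and the $i$th column of $\mathbf{W}_a$, respectively. Then we can build $\mathbf{W}^{(i)}_a = [\mathbf{W}^{(i-1)}_a, \mathbf{w}^{(i)}_a]$ iteratively instead of computing the overall matrix $\mathbf{W}_a$ at once, and the optimization problem is modified accordingly, yielding problem (22). From the definition of $\mathbf{W}^{(i)}_a$ and the Gram–Schmidt orthogonalization, the orthonormal basis $\mathbf{U}^{(i)}$ is obtained by appending the normalized vector $\mathbf{P}^{(i-1)}\mathbf{w}^{(i)}_a$ to $\mathbf{U}^{(i-1)}$. Here, $\mathbf{P}^{(i-1)} = \mathbf{I}_{N_{rx}} - \mathbf{U}^{(i-1)}\mathbf{U}^{(i-1)H}$ projects onto the orthogonal complement of the columns of $\mathbf{U}^{(i-1)}$. Since $\mathbf{W}^{(i)}_a$ is a block diagonal matrix, $\mathbf{P}^{(i)}$ and $\mathbf{U}^{(i)}$ are also block diagonal. Therefore, the term in the denominator of the merit function of the optimization problem (22) can be simplified to
$$\mathbf{h}^H_{eq,u,k}\mathbf{U}^{(i)}\mathbf{U}^{(i)H}\mathbf{h}_{eq,u,k} = \mathbf{h}^H_{eq,u,k}\mathbf{U}^{(i-1)}\mathbf{U}^{(i-1)H}\mathbf{h}_{eq,u,k} + \frac{\mathbf{h}^H_{eq,u,k}\mathbf{P}^{(i-1)}\mathbf{w}^{(i)}_a\mathbf{w}^{(i)H}_a\mathbf{P}^{(i-1)H}\mathbf{h}_{eq,u,k}}{\mathbf{w}^{(i)H}_a\mathbf{P}^{(i-1)}\mathbf{w}^{(i)}_a}. \quad (23)$$
Notice that the first term in (23) is constant at iteration $i$, since it does not depend on the vector $\mathbf{w}^{(i)}_a$, and it is equal to zero in the first iteration.
Electronics 2019, 8, 436
Furthermore, since $\mathbf{U}^{(i)}\mathbf{U}^{(i)H}$ is an orthogonal projection, the sum of the two terms in (23) cannot exceed $\|\mathbf{h}_{eq,u,k}\|^2$, with equality if $\mathbf{h}_{eq,u,k}$ lies in the column space of $\mathbf{U}^{(i)}$. Therefore, if one of the terms of (23) is large, the other one must be small, and an approximation of the merit function follows. Nonetheless, the $U$ user channels are independent. Thus, if $\mathbf{w}^{(i)}_a$ leads to a large value of the term $\mathbf{h}^H_{eq,u,k}\mathbf{P}^{(i-1)}\mathbf{w}^{(i)}_a\mathbf{w}^{(i)H}_a\mathbf{P}^{(i-1)H}\mathbf{h}_{eq,u,k}$ for user $u = u_i$, then, with high probability, this term will be small for the other users $u \ne u_i$. For this reason, $S_{u,k}$ may be approximated accordingly, and the optimization problem (22) can be approximated by a problem that only involves user $u_i$. In that problem, the auxiliary term that does not depend on $\mathbf{w}^{(i)}_a$ is constant, since it depends only on the channels $\mathbf{h}_{eq,u,k}$ and on the matrix $\mathbf{U}^{(i-1)}$ computed in the previous iteration. The optimization variable is restricted to $\mathcal{W}_{a,u_i}$, the set of admissible $i$th columns of the elements of $\mathcal{W}_a$.
In spite of the previous simplifications, the optimization problem is still non-convex and hard to solve due to the constraint $\mathbf{w}^{(i)}_a \in \mathcal{W}_{a,u_i}$. To simplify it further, we replace the set $\mathcal{W}_{a,u_i}$ by the codebook $\mathcal{F}_{a,u_i} = \mathbf{D}_{u_i}\mathbf{A}_{rx,u_i}$. Here, $\mathbf{D}_{u_i}$ is a block diagonal matrix whose blocks are all zero except the one associated with the $i$th RF chain, which is equal to the identity matrix. In other words, the elements of the codebook are the normalized receive array response vectors restricted to that block. This leads to the simpler optimization problem (29).
A criterion is needed to associate a given user with iteration $i$. We propose to perform this association using the mapping $u_i = ((i-1) \bmod U) + 1$, i.e., the $U$ users are interleaved along the iterations. For example, for $N_{RF} = 4$ and $U = 2$, we have $u_1 = 1$, $u_2 = 2$, $u_3 = 1$, and $u_4 = 2$.
The procedure to obtain the analog part of the equalizer matrix is presented in Algorithm 1 and can be summarized as follows. First, we start with user 1 and $\mathbf{U}^{(0)} = \mathbf{0}$ (line 1) and compute the corresponding projection matrix (line 4). After that, we compute the merit function of the optimization problem (29) for each element of the codebook $\mathcal{F}_{a,u_1}$ (lines 5–9). The vector $\mathbf{w}^{(1)}_a$ is set equal to the element of $\mathcal{F}_{a,u_1}$ that yields the lowest merit-function value (lines 10–11). Then, the normalized column vector $\mathbf{P}^{(0)}\mathbf{w}^{(1)}_a$ is appended to $\mathbf{U}^{(0)}$ to form $\mathbf{U}^{(1)}$ (line 12). With $\mathbf{U}^{(1)}$, the same procedure is repeated for the remaining iterations, following the user-to-iteration mapping $u_i = ((i-1) \bmod U) + 1$.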
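The interleaved user-to-iteration mapping and the masked array-response codebook can be sketched in Python as follows. It assumes 1-based iteration and user indices and half-wavelength spacing. It also assumes that the non-zero block of $\mathbf{D}_{u_i}$ is the group of $R = N_{rx}/N_{RF}$ antennas wired to RF chain $i$. The function names and ray angles are hypothetical. Each candidate column satisfies $|w(n)|^2 = 1/N_{rx}$ on its block, so the codebook stays inside the feasible set.

```python
import cmath
import math


def user_for_iteration(i, n_users):
    """Interleaved mapping for 1-based i and users: u_i = ((i - 1) mod U) + 1."""
    return (i - 1) % n_users + 1


def codebook_for_chain(i, n_rx, n_rf, rx_angles, spacing_wl=0.5):
    """Candidate columns D_{u_i} a_rx(angle): normalized N_rx-element ULA responses for the
    ray angles of user u_i, with all entries outside the i-th block of R = N_rx / N_RF
    antennas zeroed. Assumption: the non-zero block is the one wired to RF chain i."""
    r_ant = n_rx // n_rf
    first, last = (i - 1) * r_ant, i * r_ant
    kp = 2 * math.pi * spacing_wl
    return [[cmath.exp(1j * n * kp * math.sin(ang)) / math.sqrt(n_rx) if first <= n < last else 0j
             for n in range(n_rx)]
            for ang in rx_angles]


mapping = [user_for_iteration(i, 2) for i in range(1, 5)]   # N_RF = 4, U = 2
book = codebook_for_chain(3, n_rx=16, n_rf=4, rx_angles=[0.1, -0.4, 0.7])
```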
To solve the optimization problem (29), a correlation term involving the projection matrix must be computed for all elements of the selected codebook $\mathcal{F}_{a,u_i}$. Notice that $\mathbf{P}^{(i)}\mathbf{P}^{(i)} = \mathbf{P}^{(i)}$, since $\mathbf{P}^{(i)}$ is an idempotent matrix. Nonetheless, the Gram–Schmidt procedure may lead to a loss of orthogonality among the vectors [35]. We therefore use a Gram–Schmidt algorithm with reorthogonalization, which amounts to applying the projection matrix $\mathbf{P}^{(i)}$ twice, i.e., using $(\mathbf{P}^{(i)})^2$ instead of $\mathbf{P}^{(i)}$.
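The Gram–Schmidt update with reorthogonalization can be sketched in Python. The sketch assumes $\mathbf{P}^{(i-1)} = \mathbf{I} - \mathbf{U}^{(i-1)}\mathbf{U}^{(i-1)H}$. It checks numerically that the captured energy $\mathbf{h}^H\mathbf{U}^{(i)}\mathbf{U}^{(i)H}\mathbf{h}$ equals the previous-iteration term plus $|\mathbf{h}^H\mathbf{P}^{(i-1)}\mathbf{w}^{(i)}_a|^2/(\mathbf{w}^{(i)H}_a\mathbf{P}^{(i-1)}\mathbf{w}^{(i)}_a)$, and that it never exceeds $\|\mathbf{h}\|^2$. It uses dense random vectors rather than block-diagonal ones, and all names are hypothetical.

```python
import math
import random


def inner(a, b):
    """a^H b for complex vectors stored as lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def project_out(basis, w):
    """P w with P = I - U U^H, where `basis` lists the orthonormal columns of U."""
    v = list(w)
    for q in basis:
        c = inner(q, v)
        v = [vi - c * qi for vi, qi in zip(v, q)]
    return v


def add_column(basis, w):
    """Gram-Schmidt step with reorthogonalization: project twice, then normalize."""
    v = project_out(basis, project_out(basis, w))
    norm = math.sqrt(inner(v, v).real)
    return basis + [[x / norm for x in v]]


def captured_energy(basis, h):
    """h^H U U^H h = sum_j |q_j^H h|^2."""
    return sum(abs(inner(q, h)) ** 2 for q in basis)


def rand_vec(rng, n):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


rng = random.Random(1)
h, w1, w2 = rand_vec(rng, 6), rand_vec(rng, 6), rand_vec(rng, 6)
U1 = add_column([], w1)              # U^(1)
U2 = add_column(U1, w2)              # U^(2)
pw2 = project_out(U1, w2)            # P^(1) w^(2)
increment = abs(inner(h, pw2)) ** 2 / inner(pw2, pw2).real
```

Projecting twice is the "twice is enough" reorthogonalization. In exact arithmetic it changes nothing, because $\mathbf{P}$ is idempotent, but in floating point it restores orthogonality lost in the first pass.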
Algorithm 1: The proposed analog–digital multi-user linear equalizer algorithm for the sub-connected architecture
Analog part of the equalizer
9: end for
10: $(q, l) = \arg\min\ \mathbf{f}^{(i)}$
11: $\mathbf{w}$ …

Complexity Analysis
The steps presented in Algorithm 1 describe the procedure to obtain the analog and digital parts of the proposed equalizer. In the following, the computational complexity of Algorithm 1 is analyzed. The projection matrix at line 4 may be computed with $O(R^2)$ complexity, with $R = N_{rx}/N_{RF}$, since the matrix $\mathbf{U}^{(i-1)}$ is block diagonal and each of its columns has only $R$ non-zero elements. To obtain the vector $\mathbf{r}_{i,k}$ at line 7, the vector $\mathbf{A}^H_{rx,u_i}\mathbf{D}_{u_i}\mathbf{P}^{(i-1)H}\mathbf{h}_{eq,u_i,k}$ must be calculated. Both $\mathbf{D}_{u_i}$ and $\mathbf{P}^{(i-1)}$ are block diagonal with $N_{RF}$ blocks of size $R \times R$, and all blocks of $\mathbf{D}_{u_i}$ are zero except one, which is equal to the identity matrix. Hence, the product $\mathbf{D}_{u_i}\mathbf{P}^{(i-1)H}\mathbf{h}_{eq,u_i,k}$ requires $O(R^2)$ operations. The resulting vector has only $R$ non-zero elements, and $\mathbf{A}_{rx,u_i}$ is an $N_{rx} \times N_{cl}N_{ray}$ matrix. Thus, $\mathbf{A}^H_{rx,u_i}\mathbf{D}_{u_i}\mathbf{P}^{(i-1)H}\mathbf{h}_{eq,u_i,k}$ may be computed with $O(R^2 + N_{cl}N_{ray}R)$ complexity. As the computation in line 7 is repeated for all $N_c$ subcarriers, the overall complexity of lines 6–9 is $O(N_c(R^2 + N_{cl}N_{ray}R))$. Line 10 requires the computation of the minimum of the vector $\mathbf{f}^{(i)} \in \mathbb{R}^{N_{cl}N_{ray}}$, whose complexity is $O(N_{cl}N_{ray})$. In line 11, we compute the vector $\mathbf{w}^{(i)}_a$ with complexity $O(R^2)$, and the complexity of line 12 is identical to that of line 11. As these steps must be repeated for all RF chains, the overall complexity of the analog part is $N_{RF}$ times the individual complexities described above. The digital equalizer must be computed for all subcarriers, and each computation requires the inversion of an $N_{RF} \times N_{RF}$ matrix, resulting in a complexity of $O(N_cN_{RF}^3)$. Therefore, the computational complexity of the proposed algorithm is linear in the number of subcarriers, quadratic in the number of antennas per RF chain, and cubic in the number of RF chains. The fully connected architecture would require a complexity that scales quadratically with the number of antennas, and for the fully digital architecture the complexity is a cubic function of the number of antennas.
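To make the scaling concrete, the per-line terms can be summed into an indicative operation count. This is a minimal sketch that assumes the line-7 cost $O(R^2 + N_{cl}N_{ray}R)$ per subcarrier, drops all constants, and ignores the cost of forming the merit function itself. The function name is hypothetical, and the parameters are those of the performance-results section.

```python
def proposed_cost(n_c, n_rx, n_rf, n_cl, n_ray):
    """Order-of-growth operation count (constants dropped) assembled from the per-line terms:
    per RF chain: R^2 (line 4) + N_c (R^2 + N_cl N_ray R) (lines 6-9)
                  + N_cl N_ray (line 10) + 2 R^2 (lines 11-12);
    plus N_c N_RF^3 for the per-subcarrier N_RF x N_RF inversions of the digital part."""
    r = n_rx // n_rf
    per_chain = r ** 2 + n_c * (r ** 2 + n_cl * n_ray * r) + n_cl * n_ray + 2 * r ** 2
    return n_rf * per_chain + n_c * n_rf ** 3


# N_c = 64, N_rx = 16, N_cl = 5, N_ray = 3, for the three RF-chain counts considered.
costs = {n_rf: proposed_cost(64, 16, n_rf, 5, 3) for n_rf in (2, 8, 16)}
```

For these values the count grows from about $2.4\times10^4$ ($N_{RF} = 2$) to about $2.8\times10^5$ ($N_{RF} = 16$). The cubic $N_cN_{RF}^3$ term of the digital part dominates as $N_{RF}$ grows.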

Performance Results
In this section, we evaluate the performance of the proposed multi-user linear equalizer and precoder scheme designed for hybrid sub-connected wideband mmW systems.
The carrier frequency was set to 72 GHz. For each user, the clustered wideband channel model discussed in Section 2 was considered, with $N_{cl} = 5$ clusters, all with the same average power, such that $\mathbb{E}[\|\mathbf{H}_{u,d}\|_F^2] = N_{rx}N_{tx}$, each contributing $N_{ray} = 3$ propagation paths. The path delays were uniformly distributed in the CP interval. We considered a ULA with antenna element spacing set to half a wavelength, but it should be emphasized that the schemes proposed in this paper can be applied to any antenna array. The azimuth angles of departure and arrival followed a Laplacian distribution, as in [20], with an angle spread of 10° at both the transmitter and the receiver. QPSK modulation and perfect synchronization were assumed, and the CSI was assumed to be known at the receiver side. At the transmitter, only a quantized version of the average angle of departure of each cluster was known. We assumed $N_c = 64$ subcarriers, and the CP was set to a quarter of the number of subcarriers, i.e., $D = N_c/4 = 16$. The Monte Carlo simulations ran over 100,000 SC-FDMA blocks. The performance metric was the average BER, presented as a function of $E_b/N_0$, where $E_b$ is the average bit energy and $N_0$ is the one-sided noise power spectral density. We considered that the average $E_b/N_0$ was identical for all users and given by $E_b/N_0 = 1/(2\sigma_n^2)$. Each transmitter had a single RF chain and was equipped with $N_{tx} = 8$ antennas. At the receiver side, a sub-connected architecture was assumed, where each RF chain was connected to a group of $R = N_{rx}/N_{RF}$ antennas, with $N_{rx} = 16$ antennas. The results were compared with the fully connected counterpart. This counterpart can be obtained from the proposed scheme by relaxing the optimization constraints, since each RF chain is then connected to all antennas. The main simulation parameters are presented in Table 1.
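The derived simulation quantities follow directly from the stated parameters. This short Python sketch computes the noise variance from $E_b/N_0 = 1/(2\sigma_n^2)$, with $E_b/N_0$ given in dB, and the CP length. It also computes the antennas per RF chain $R = N_{rx}/N_{RF}$ for the RF-chain counts used in the results. The helper name and the chosen $E_b/N_0$ grid are hypothetical.

```python
def noise_variance(eb_n0_db):
    """sigma_n^2 from E_b/N_0 = 1 / (2 sigma_n^2), with E_b/N_0 given in dB."""
    return 1.0 / (2.0 * 10 ** (eb_n0_db / 10))


N_c = 64
D = N_c // 4                                                 # CP length: a quarter of the subcarriers
N_rx = 16
R_per_chain = {n_rf: N_rx // n_rf for n_rf in (2, 8, 16)}    # R = N_rx / N_RF
sigmas = {snr_db: noise_variance(snr_db) for snr_db in (0, 5, 10)}
```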
Figure 3 depicts the results for the proposed hybrid sub-connected multi-user equalizer with the analog precoder for two, four, and eight users. In this figure, perfect knowledge of the average AoD of each cluster was assumed. Eight RF chains were also assumed, which means that each one was connected to two antennas. As can be seen in Figure 3, the performance of both the sub-connected and the fully connected approaches improved as the number of users decreased. This was expected, since the multi-user equalizer had to deal with less interference and the available degrees of freedom could be used to provide more diversity. We can also observe a performance penalty of approximately 2 dB of the sub-connected approach relative to the fully connected one at a target BER of $10^{-3}$, independently of the number of users. This is because the fully connected architecture has more connections than the sub-connected one, so, as expected, it performs better. The worst performance, for both the fully and sub-connected approaches, was obtained for the full-load case, i.e., when the number of users was equal to the number of RF chains, $U = N_{RF}$.
In Figure 4 we present results for different numbers of RF chains and two users. As the number of antennas $R$ connected to each RF chain was reduced, the penalty relative to the fully connected approach decreased. We observed penalties of approximately 3 dB, 2 dB, and 0 dB for $(R = 8, N_{RF} = 2)$, $(R = 2, N_{RF} = 8)$, and $(R = 1, N_{RF} = 16)$, respectively, at a BER of $10^{-3}$. This happens because more RF chains, and consequently fewer antennas per chain, increase the number of available degrees of freedom of the sub-connected architecture. For the extreme case of $R = 1$ (one RF chain per antenna), the curve obtained for the sub-connected approach approximately overlapped the one obtained for the fully connected approach.
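The overlap at $R = 1$ can be checked directly. With one antenna per RF chain, the sub-connected analog equalizer is a diagonal matrix of unit-modulus phase shifts, which is invertible, so a per-subcarrier digital stage can absorb it and reproduce any linear equalizer $\mathbf{G}_k$ that a fully digital receiver (and hence a fully connected one) could apply, which is consistent with the two curves nearly coinciding. The NumPy sketch below verifies this numerically; the matrix names and random draws are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
U, N_RX = 2, 16

# R = 1: each RF chain owns one antenna, so the analog stage is a diagonal
# matrix of unit-modulus phase shifts, flat over all subcarriers.
W_a = np.diag(np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, N_RX)))

# An arbitrary per-subcarrier linear equalizer G_k (U x N_rx) ...
G_k = rng.standard_normal((U, N_RX)) + 1j * rng.standard_normal((U, N_RX))

# ... is reproduced exactly by the digital stage W_d,k = G_k W_a^{-1},
# where W_a^{-1} = W_a^H because the diagonal entries have unit modulus.
W_d_k = G_k @ W_a.conj().T
```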
As seen in Figures 5 and 6, we evaluated the impact of imperfect knowledge of the average AoD at the transmitter side. To compute the analog precoders, we assumed knowledge of only a quantized version of the average AoD of each cluster, as discussed in Section 3. We present results for $n \in \{2, 4, 6\}$ quantization bits. Figures 5 and 6 depict the results for two and eight users, respectively. As expected, increasing the number of quantization bits improved the performance of the proposed sub-connected scheme, which approached that achieved with perfect knowledge of the average AoD ($n = \infty$) in both cases, $U = 2$ and $U = 8$. With fewer quantization bits, the performance degraded relative to the perfect-knowledge curve. In Figure 5, we observe a performance penalty at a BER of $10^{-3}$ of approximately 5 dB, 1.5 dB, and 0 dB for $n = 2$, 4, and 6, respectively. This means that a very limited number of bits for the quantization of the average AoD of each cluster is enough to achieve performance close to the perfect case. Since mmW channels are usually sparse, i.e., composed of only a few clusters, the amount of information that needs to be fed back from the BS to the UTs is small.
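Section 3 defines the actual quantizer; as a stand-in, the Python sketch below assumes a uniform $n$-bit quantizer over $[-\pi/2, \pi/2)$ with midpoint reconstruction (our assumption, not necessarily the paper's scheme). Under this assumption, the worst-case angle error is $\pi/2^{n+1}$, and the feedback per user is $n$ bits per cluster, i.e., $n N_{cl}$ bits.

```python
import math


def quantize_angle(theta: float, n_bits: int,
                   lo: float = -math.pi / 2, hi: float = math.pi / 2) -> float:
    """Assumed uniform n-bit quantizer on [lo, hi) with midpoint reconstruction."""
    levels = 2 ** n_bits
    step = (hi - lo) / levels
    idx = int((theta - lo) // step)
    idx = min(max(idx, 0), levels - 1)   # clip angles that fall on the range edges
    return lo + (idx + 0.5) * step


def feedback_bits(n_bits: int, n_clusters: int = 5) -> int:
    """Bits fed back per user: one quantized average AoD per cluster."""
    return n_bits * n_clusters
```

For $n = 2$, 4, and 6, the worst-case errors of this assumed quantizer are 22.5°, about 5.6°, and about 1.4°, respectively, which can be compared with the 10° angle spread used in the simulations; with $N_{cl} = 5$, $n = 4$ amounts to 20 bits of feedback per user.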

Conclusions
In this paper, we proposed an analog precoder combined with an efficient hybrid analog-digital multi-user equalizer for sub-connected mmW massive MIMO SC-FDMA systems. At the UT, we proposed a low complexity, pure analog precoder that requires knowledge of only a quantized version of the average AoD of each cluster. At the BS, a hybrid analog-digital multi-user equalizer was developed for a sub-connected architecture. The analog part was assumed to be constant over all subcarriers, while the digital part was computed on a per-subcarrier basis. We considered a minimum MSE-based equalizer for the digital part, while the analog part was optimized using the average BER over all subcarriers as the metric. To simplify the optimization problem at hand, the merit function was first upper bounded; then, owing to the properties of the resulting problem, we showed that the analog part of the hybrid equalizer may be computed iteratively over the RF chains by assigning the users to the RF chains in an interleaved fashion.
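The iterative, per-RF-chain structure can be illustrated as follows. The Python sketch assumes that "interleaved" means RF chain $r$ serves user $r \bmod U$ and that each chain owns a contiguous group of $R$ antennas. The per-chain rule used here (co-phasing the group with the assigned user's channel) is a hypothetical placeholder, not the BER-bound optimization derived in the paper.

```python
import cmath


def interleaved_assignment(n_rf: int, n_users: int) -> list[int]:
    """Hypothetical interleaved map: RF chain r serves user r mod U (needs N_RF >= U)."""
    if n_users > n_rf:
        raise ValueError("need at least as many RF chains as users")
    return [r % n_users for r in range(n_rf)]


def analog_rows_per_chain(h: list[list[complex]], n_rf: int):
    """Build the sub-connected analog rows one RF chain at a time.

    h[u][i] is a single (e.g. subcarrier-averaged) channel coefficient of
    user u at receive antenna i; this input and the co-phasing rule are
    placeholders for the paper's per-chain optimization.
    """
    n_users, n_rx = len(h), len(h[0])
    R = n_rx // n_rf
    users = interleaved_assignment(n_rf, n_users)
    rows = []
    for r, u in enumerate(users):           # iterate over the RF chains
        group = range(r * R, (r + 1) * R)   # antennas wired to chain r
        rows.append([cmath.exp(-1j * cmath.phase(h[u][i])) for i in group])
    return users, rows
```

For example, `interleaved_assignment(8, 2)` returns `[0, 1, 0, 1, 0, 1, 0, 1]`, so each user is served by four of the eight RF chains; in the full-load case $N_{RF} = U$, each chain serves a distinct user.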
The numerical results show that the proposed wideband hybrid multi-user linear equalizer is quite efficient at removing the multi-user interference, and its performance tends to that of the fully connected counterpart as the number of RF chains increases. Furthermore, only a few bits are required for the quantization of the average AoD of each cluster to obtain performance close to the perfect case. The small performance gap between the proposed sub-connected approach and the fully connected one, together with its lower complexity, makes it a very interesting choice for practical systems.

where $\{s_{u,t}\}_{t=(r-1)N_s}^{rN_s-1}$ represents the $r$th data block. Then, this time-domain sequence is moved to the frequency domain, and the resulting sequence, denoted by $\{c_{u,k}\}$, is the discrete Fourier transform (DFT) of the time-domain sequence $s_{u,t}$. After that, the frequency-domain data is interleaved and mapped to the OFDM symbol. To simplify the formulation, we assume that $N_s = N_c$, which means that only a single $N_c$-length block is considered, and we assume the identity mapping. After the cyclic prefix (CP) is inserted, an analog precoder $\mathbf{f}_u^{a} \in \mathbb{C}^{N_{tx}}$ is employed. Due to the hardware constraints, we only consider analog phase shifters, which force all coefficients of the precoder to have equal magnitude.
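The transmitter chain just described can be sketched in Python with NumPy. The sketch assumes a unitary DFT, the identity subcarrier mapping with $N_s = N_c$, and, purely for illustration, a single-angle half-wavelength ULA steering vector normalized by $\sqrt{N_{tx}}$ as the equal-magnitude precoder. The paper's precoder is built from the quantized cluster AoDs, so the steering choice, the 20° angle, and the function names are our assumptions.

```python
import numpy as np


def ula_steering(theta: float, n: int) -> np.ndarray:
    """Half-wavelength ULA response exp(j*pi*m*sin(theta)), m = 0..n-1."""
    return np.exp(1j * np.pi * np.arange(n) * np.sin(theta))


def sc_fdma_transmit(s_u: np.ndarray, f_u: np.ndarray, cp_len: int) -> np.ndarray:
    """Return the (N_tx, N_c + cp_len) antenna signals for one N_c-length block."""
    c_u = np.fft.fft(s_u, norm="ortho")          # DFT spreading -> c_{u,k}
    x_u = np.fft.ifft(c_u, norm="ortho")         # identity mapping + IDFT
    x_cp = np.concatenate([x_u[-cp_len:], x_u])  # prepend the cyclic prefix
    return np.outer(f_u, x_cp)                   # analog precoder: one phase per antenna


N_TX, N_C, D = 8, 64, 16
rng = np.random.default_rng(0)
s_u = (rng.choice([-1, 1], N_C) + 1j * rng.choice([-1, 1], N_C)) / np.sqrt(2)  # QPSK
f_u = ula_steering(np.deg2rad(20.0), N_TX) / np.sqrt(N_TX)  # equal-magnitude entries
X = sc_fdma_transmit(s_u, f_u, D)
```

A useful sanity check follows from the identity mapping: the IDFT undoes the DFT, so after CP removal each antenna carries a phase-shifted copy of the original block $s_{u,t}$, which is what gives SC-FDMA its single-carrier character.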

Figure 1. Schematic of the $u$th user terminal transmitter. CP is the cyclic prefix, RF is radio frequency, and FFT and IFFT are the fast Fourier transform and inverse fast Fourier transform, respectively.
receive and transmit array response vectors, respectively.
$N_{RF}$ is the number of RF chains, as represented in Figure 2. We assume that the number of RF chains does not exceed the number of receive antennas, $N_{RF} \le N_{rx}$.

Figure 2. Schematic of the receiver.

represents the overall equivalent channel between the $U$ users and the receiver, and $\mathbf{c}_k \in \mathbb{C}^{U}$ denotes the frequency-domain transmitted signal of all users at the $k$th subcarrier. Finally, the equalized signals are demapped and moved to the time domain using the inverse DFT, yielding the estimates $\{\hat{s}_{u,t}\}_{t=0}^{N_c-1}$ of the $u$th user's transmitted $N_c$-length data block $\{s_{u,t}\}_{t=0}^{N_c-1}$.
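A per-subcarrier version of this receiver can be sketched in Python with NumPy. The analog equalizer $\mathbf{W}_a$ is held fixed across subcarriers, as in the text, and the digital stage is a generic linear MMSE filter for unit-power symbols that accounts for the noise coloring introduced by $\mathbf{W}_a$; the paper's exact digital and analog expressions may differ, and the array layout and function name are assumptions.

```python
import numpy as np


def hybrid_equalize(Y: np.ndarray, H: np.ndarray, W_a: np.ndarray,
                    sigma2: float) -> np.ndarray:
    """Hybrid linear equalization of one SC-FDMA block.

    Y:   (N_c, N_rx) received samples per subcarrier, y_k = H_k c_k + n_k
    H:   (N_c, N_rx, U) equivalent channels (UT analog precoders included)
    W_a: (N_RF, N_rx) analog equalizer, the same on every subcarrier
    Returns the (U, N_c) time-domain estimates of the users' data blocks.
    """
    n_c, _, n_users = H.shape
    C_hat = np.empty((n_users, n_c), dtype=complex)
    R_n = sigma2 * (W_a @ W_a.conj().T)          # noise covariance after the analog stage
    for k in range(n_c):
        H_e = W_a @ H[k]                         # (N_RF, U) effective channel
        A = H_e @ H_e.conj().T + R_n             # Hermitian, positive definite
        W_d = np.linalg.solve(A, H_e).conj().T   # LMMSE digital stage: H_e^H A^{-1}
        C_hat[:, k] = W_d @ (W_a @ Y[k])         # equalized frequency-domain symbols
    return np.fft.ifft(C_hat, axis=1, norm="ortho")  # inverse DFT per user
```

In a nearly noiseless check (very small $\sigma_n^2$) with $U \le N_{RF}$, the returned block matches the transmitted QPSK block to within a small tolerance.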

Figure 4. Performance of the proposed hybrid sub-connected schemes for different numbers of RF chains, U = 2.

Figure 5. Performance of the proposed hybrid sub-connected scheme for different numbers of quantization bits of the average AoD, U = 2.


Figure 6.Performance of the proposed hybrid sub-connected scheme for different numbers of quantization bits of the average AoD, U = 8.