Low Complexity Angular-Domain Detection for the Uplink of Multi-User mmWave Massive MIMO Systems

Abstract: To balance system performance against hardware cost, millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems have been regarded as an enabling technology for the fifth generation of mobile communication systems (5G). This paper considers a low-complexity angular-domain compressing based detection (ACD) for uplink multi-user mmWave massive MIMO systems, which involves hybrid analog and digital processing. In analog processing, we perform angular-domain compression on the received signal by exploiting the sparsity of the mmWave channel to reduce the dimension of the signal space. In digital processing, the proposed ACD scheme works well with zero forcing (ZF), maximum ratio combining (MRC) and minimum mean square error (MMSE) detection schemes. The performance of the proposed ACD scheme is analyzed in terms of achievable rates, energy efficiency and computational complexity. Simulations show that, compared with existing works, the proposed ACD scheme not only reduces the computational complexity by more than 50%, but also improves the system's achievable rates and energy efficiency.

Hybrid beamforming (HBF) is advocated in [17-20], which uses a large number of phase shifters to implement analog processing and compensate for the high path loss in the mmWave bands. It uses a small number of RF chains for digital beamforming to perform advanced multiplexing/multi-user processing. Hybrid beamforming has been shown to achieve nearly the same spectral efficiency as full-digital beamforming [21].
To address the challenge of using limited transmit/receive RF chains, different schemes have been studied in the literature [19,20,22-25]. A practical hybrid beamforming algorithm is proposed in [19], which, however, performs well only in a few special cases. In other cases, there is a significant gap between the achievable rates of the proposed scheme and those of the full-digital beamforming scheme. In [20], a low-complexity HBF algorithm is proposed for uplink multi-user mmWave MIMO, which handles the inter-user interference in both analog and digital processing. However, this work uses a single-path channel model, which is not practical. The authors of [22] propose a fixed beamforming and channel state information (CSI)-based precoding (FBCP) scheme, which can achieve performance similar to that of full-digital beamforming with reduced complexity. The work of [23] takes advantage of the sparsity of mmWave channels and approximates the optimal unconstrained precoder using a low-dimensional basis representation that can be efficiently implemented in hardware. In [24], the relationship between the number of RF chains and the number of data streams is studied. It is proved that when the number of RF chains is twice the number of data streams, the performance of full-digital beamforming can be achieved by a hybrid design. In [25], the capacity optimization problem of sub-array hybrid beamforming is considered, and an iterative method based on successive interference cancellation (SIC) is proposed. The achievable rates are shown to be close to optimal when the number of sub-arrays is small. The works in [22-25] focus on single-user hybrid beamforming and do not consider scenarios with multiple users.
One of the main challenges of the mmWave massive MIMO system is the processing complexity in uplink detection and downlink precoding at the base station, due to the large dimension of the channel matrix. The authors of [26] show the use of zero forcing (ZF) in massive MIMO could approach near-optimal performance. However, the computational complexity of ZF grows significantly with the size of the channel matrix. The authors in [27] present a framework for physically-accurate computational modeling and analysis of continuous aperture phased (CAP) MIMO, and report measurement results on a discrete lens array (DLA) based prototype for multimode line-of-sight communication. Similarly, according to the norm values of the beam space channel vector, an interference-aware beam selection method is proposed in [28]. In [29], three related beam selection schemes are introduced, and the computational complexity is greatly reduced. Although these works manage to reduce the complexity, the inter-user interference is not well-handled and thus the performance is not satisfactory in most cases.
To address the inter-user interference, a hybrid analog/digital processing scheme is obtained with an alternating minimization algorithm in [30]. The work in [31] uses a selective algorithm to generate a precoding matrix, which requires iterative calculation of the analog and digital processing matrices. However, the complexity of the schemes in [30,31] grows significantly as the number of antennas or iterations increases. In [32], a low-complexity hybrid precoding scheme is developed, which directly takes the phase of the conjugate-transposed channel matrix as the analog processing matrix, and then applies ZF for digital processing. A hybrid analog/digital processing scheme is proposed in [33] to optimize the energy efficiency in the downlink of mmWave non-orthogonal multiple access (NOMA) systems. In [34], a downlink massive multi-user MIMO system is considered, and a hybrid block diagonalization scheme is proposed to approach the system capacity of the traditional block diagonalization processing method. The works in [32-34] all consider the downlink of mmWave massive MIMO systems.
Regarding angular-domain compression, several low-complexity detection schemes for mmWave massive MIMO systems have recently been proposed in [35,36], which take advantage of channel sparsity and angular-domain signal processing. The authors of [35,36] propose the first angular-domain massive MIMO detection scheme, which transforms the received signal into the angular domain by the Fast Fourier Transform (FFT) and selects the strongest angles to reduce the signal dimension. ZF is considered for digital baseband processing. Although the complexity is reduced compared with the hybrid schemes in [37], the performance is limited because the angular resolution of the FFT is fixed. In [38], the authors study the performance of angular-domain mmWave MIMO non-orthogonal multiple access systems in the presence of angular estimation error. A closed-form expression for the achievable rate is presented, based on which an asymptotic approximation is derived. The authors of [39] propose an angle-based beamforming scheme to reduce the feedback overhead. In [40], the authors propose an angular-domain hybrid precoding and channel tracking method by exploring the spatial features of the mmWave massive MIMO channel. In [41], the authors propose a novel angular-domain peak to average power ratio (PAPR) reduction technique in massive MIMO orthogonal frequency division multiplexing systems.
In this paper, we consider low-complexity angular-domain compressing based detection (ACD) for the uplink of multi-user mmWave MIMO systems. In contrast to previous works, we convert the received signal into the angular domain and reformulate the channel model, and then select the dominant angles that contain most of the signal power for angular-domain processing. The selected dominant angles are used to reduce the dimension of the effective channel, which greatly reduces the computational complexity. In baseband processing, we evaluate the ACD scheme combined with ZF, maximum ratio combining (MRC) and minimum mean square error (MMSE) detection. The computational complexity is analyzed, and the total number of floating-point operations (FLOPs) is used to compare the proposed ACD scheme with those in [35,36]. Simulation results show that the proposed ACD scheme significantly outperforms its counterparts in [35,36] while the complexity is reduced by over 50%.
The main contributions of this work are summarized as follows:
• A low-complexity angular-domain compressing based detection is proposed, which is superior to the existing angular-domain detection algorithms in [35,36] in terms of achievable rate, energy efficiency, and computational complexity;
• The schemes in [35,36] have a complexity of order $O(M^2)$, where $M$ is the number of antennas. In contrast, the computational complexity of the proposed ACD scheme is of order $O(K^2)$, where $K$ is the number of users. It is not affected by the number of antennas, so it is more suitable for mmWave massive MIMO systems;
• The proposed ACD algorithm uses ZF, MRC and MMSE detection methods for baseband detection processing. The achievable rates and energy efficiency are analyzed, and the superiority of the proposed ACD algorithm is verified through simulations.
The rest of the paper is organized as follows. Section 2 introduces the system model. In Section 3, the proposed low-complexity angular-domain detection scheme is described. Performance analysis is provided in Section 4. The numerical results are presented in Section 5 and conclusions are drawn in Section 6.
Notation: Boldface lower and upper case symbols represent vectors and matrices, respectively. $(\cdot)^*$, $(\cdot)^T$ and $(\cdot)^H$ denote the conjugate, the transpose and the Hermitian transpose of a vector or a matrix, respectively. $E\{\cdot\}$ denotes the expectation operator. $\mathbf{I}_M$ is the $M$-dimensional identity matrix. $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix. $\mathrm{diag}(\mathbf{X})$ denotes the diagonal matrix with the same diagonal elements as $\mathbf{X}$. $\mathrm{var}(\cdot)$ denotes the variance of a random variable. $\mathbf{R}_{ss} \triangleq E\{\mathbf{s}\mathbf{s}^H\}$ denotes the covariance of the random vector $\mathbf{s}$.

System Model
Consider the uplink of a multi-user mmWave massive MIMO system consisting of a BS that has an array of $M$ antennas and serves $K$ ($K < M$) single-antenna user terminals simultaneously. We consider a hybrid angular and baseband processing structure as depicted in Figure 1. The received signal first goes through a phase shifter network for angular-domain processing, which reduces the signal dimension from $M$ to $\bar{M}$. After signal space compression in the angular domain, baseband processing is carried out to recover the transmitted signal. The received $M$-dimensional signal $\mathbf{y}$ at the BS can be expressed as
$$\mathbf{y} = \sqrt{p_u}\,\mathbf{H}\mathbf{x} + \mathbf{n}, \tag{1}$$
where $\mathbf{H}$ represents the $M \times K$ block fading channel matrix between the BS and all the users; $\mathbf{x}$ denotes the $K \times 1$ vector of symbols transmitted by all users, which satisfies $E\{\mathbf{x}\mathbf{x}^H\} = \mathbf{I}$; $p_u$ is the average transmitted power of each user; and $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$ is the additive white Gaussian noise vector. In order to model the mmWave propagation environment, we consider the widely used multipath channel model, where each propagation path between the BS and a user is related to a scatterer in the environment. We assume that the scatterers seen by different users are independent, and this model is referred to as the independent multipath channel model [42]. Channel state information as well as AoAs can be obtained by the compressed sensing based channel estimation schemes in [43,44]. The channel vector of the $k$-th user is given by
$$\mathbf{h}_k = \sum_{l=1}^{L_k} \alpha_{kl}\,\mathbf{a}(\theta_{kl}), \tag{2}$$
where $L_k$ is the number of propagation paths, which can be determined by compressed sensing channel estimation; $\alpha_{kl} \sim \mathcal{CN}(0, 1)$ is the complex gain of the $l$-th path; and $\theta_{kl} \in [0, 2\pi]$ denotes the angle of arrival (AoA) of the $l$-th path. Assume that the BS is equipped with a uniform linear array (ULA). The steering vector $\mathbf{a}(\theta)$ can then be modeled as
$$\mathbf{a}(\theta) = \frac{1}{\sqrt{M}}\left[1,\ e^{-j2\pi\Delta\sin\theta},\ \ldots,\ e^{-j2\pi(M-1)\Delta\sin\theta}\right]^T, \tag{3}$$
where $\Delta$ is the antenna spacing normalized by the carrier wavelength $\lambda$.
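The multipath channel model above can be illustrated with a short sketch. It assumes a unit-norm ULA steering vector with phase term $-2\pi m \Delta \sin\theta$ and half-wavelength spacing; this sign and normalization convention, and the function names `steering_vector`, `complex_gaussian` and `user_channel`, are assumptions made for illustration only.

```python
import cmath
import math
import random


def steering_vector(theta, num_antennas, spacing=0.5):
    """Unit-norm ULA steering vector a(theta); spacing is in wavelengths (assumed form)."""
    phase_step = -2.0 * math.pi * spacing * math.sin(theta)
    scale = 1.0 / math.sqrt(num_antennas)
    return [scale * cmath.exp(1j * phase_step * m) for m in range(num_antennas)]


def complex_gaussian(rng):
    """Draw one CN(0, 1) sample: real and imaginary parts each have variance 1/2."""
    std = math.sqrt(0.5)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def user_channel(num_antennas, num_paths, rng):
    """Return (h_k, gains, angles) with h_k = sum_l alpha_kl * a(theta_kl)."""
    gains = [complex_gaussian(rng) for _ in range(num_paths)]
    angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_paths)]
    h = [0j] * num_antennas
    for alpha, theta in zip(gains, angles):
        a = steering_vector(theta, num_antennas)
        h = [h_m + alpha * a_m for h_m, a_m in zip(h, a)]
    return h, gains, angles


# Example: one user with L = 5 paths seen by an M = 64 antenna array.
rng = random.Random(0)
h_k, gains, angles = user_channel(num_antennas=64, num_paths=5, rng=rng)
```

Stacking the vectors returned for all $K$ users column by column gives the $M \times K$ channel matrix used in the received-signal model.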
Since there is a large number of antennas at the BS, it is very computationally expensive to recover x from Equation (1). In this work, we aim to detect x in the angular domain with reduced complexity and improved performance compared with existing schemes in [35,36].

Low-Complexity Detection Based on Angular-Domain Compressing
In this section, we propose a low-complexity ACD scheme. We first reformulate the channel model in the angular domain and then select the dominant angles that contain most of the signal power for the design of the analog processing matrix. The proposed ACD scheme can work with baseband processing schemes such as ZF, MRC and MMSE. This work differs from those in [35,36], where the angular resolution is fixed by the FFT, which limits the performance. Moreover, if the number of antennas increases, the dimension of the FFT matrix also grows, resulting in increased computational complexity. In contrast, we remove the FFT and devise an ACD scheme based on an angular-domain channel model. We propose to choose the angles with strong received signal power and to check the correlation between the selected angles so that the resulting effective channel matrix is full rank.

Angular-Domain Compression
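Let $\mathbf{V}_F \in \mathbb{C}^{\bar{M} \times M}$ represent the angular-domain compressing matrix that reduces the dimension of the received signal space to $\bar{M}$ by taking advantage of the sparsity of mmWave channels. Because the compressed signal needs to support the data streams of $K$ users, while the number of RF chains should be reduced to simplify the baseband processing, the value of $\bar{M}$ should satisfy $K \le \bar{M} < M$. The signal $\mathbf{y}_R$ after compression is thus given as
$$\mathbf{y}_R = \mathbf{V}_F\mathbf{y} = \sqrt{p_u}\,\mathbf{V}_F\mathbf{H}\mathbf{x} + \mathbf{V}_F\mathbf{n}. \tag{4}$$
To reformulate the channel model in the angular domain, denote
$$\mathbf{H} = \mathbf{A}\mathbf{C}, \quad \mathbf{A} = \left[\mathbf{a}(\theta_{11}), \ldots, \mathbf{a}(\theta_{1L}), \ldots, \mathbf{a}(\theta_{K1}), \ldots, \mathbf{a}(\theta_{KL})\right], \quad \mathbf{C} = \mathrm{blkdiag}(\mathbf{c}_1, \ldots, \mathbf{c}_K), \quad \mathbf{c}_k = \left[\alpha_{k1}, \ldots, \alpha_{kL}\right]^T, \tag{5}$$
in which $\mathbf{C}$ is a $KL \times K$ block diagonal matrix, and $\mathbf{c}_k$ contains all path gains of the $k$-th user; $\mathbf{A}$ is of dimension $M \times KL$. Due to the sparsity of the mmWave channel, $\mathbf{H}$ is also sparse in the angular domain, and the received signal power concentrates at a few angles of arrival. Therefore, we propose signal space reduction in the angular domain to reduce the computational complexity. The proposed angular-domain signal compression contains three steps. Let us first compute $\mathbf{A}^H\mathbf{A}$ as
$$\mathbf{A}^H\mathbf{A} = \begin{bmatrix} \mathbf{a}(\theta_{11})^H\mathbf{a}(\theta_{11}) & \cdots & \mathbf{a}(\theta_{11})^H\mathbf{a}(\theta_{KL}) \\ \vdots & \ddots & \vdots \\ \mathbf{a}(\theta_{KL})^H\mathbf{a}(\theta_{11}) & \cdots & \mathbf{a}(\theta_{KL})^H\mathbf{a}(\theta_{KL}) \end{bmatrix},$$
where $\mathbf{a}(\theta_{ij})^H\mathbf{a}(\theta_{mn})$ is the correlation coefficient between the steering vectors corresponding to the $j$-th path of user $i$ and the $n$-th path of user $m$. Next, the product $\mathbf{A}^H\mathbf{A}\mathbf{C} = \mathbf{A}^H\mathbf{H}$ is computed. Finally, we aim to reduce the number of effective AoAs from $KL$ to $\bar{M}$ and design the angular-domain compressing matrix $\mathbf{V}_F$. To find the most significant AoAs, the received signal power is taken into consideration. Denote $\mathbf{v}_{sum} \in \mathbb{R}^{KL \times 1}$ as the row-wise summation of the absolute values of the elements of $\mathbf{A}^H\mathbf{A}\mathbf{C}$. The $i$-th ($i = kL - L + l$) element of $\mathbf{v}_{sum}$ represents the received signal power on the $l$-th AoA of user $k$, i.e., $\theta_{kl}$. The angles corresponding to the $\bar{M}$ maximum values of $\mathbf{v}_{sum}$ are selected. Denote these selected angles as $\theta_1, \ldots, \theta_{\bar{M}}$; $\mathbf{V}_F$ is then designed as
$$\mathbf{V}_F = \left[\mathbf{a}(\theta_1), \ldots, \mathbf{a}(\theta_{\bar{M}})\right]^H.$$
To prevent two selected angles from being so close that $\mathbf{V}_F$ is not full row rank, an auxiliary parameter $\alpha_{\lim} \in (0, 1)$ is introduced. The absolute value of the correlation coefficient between any two selected angles should be less than $\alpha_{\lim}$.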
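To see why a correlation threshold is useful, the following worked expression may help. It assumes a unit-norm ULA steering vector with entries $e^{-j2\pi m\Delta\sin\theta}/\sqrt{M}$, $m = 0, \ldots, M-1$; this convention and the shorthand $\delta$ are assumptions of the sketch. Under that assumption, the correlation between two steering vectors reduces to a Dirichlet-kernel form:

```latex
\mathbf{a}(\theta_1)^H \mathbf{a}(\theta_2)
  = \frac{1}{M}\sum_{m=0}^{M-1} e^{-j 2\pi m \Delta \delta}
  = \frac{1}{M}\, e^{-j\pi (M-1)\Delta\delta}\,
    \frac{\sin(\pi M \Delta \delta)}{\sin(\pi \Delta \delta)},
\qquad \delta = \sin\theta_2 - \sin\theta_1,
\\[4pt]
\left|\mathbf{a}(\theta_1)^H \mathbf{a}(\theta_2)\right|
  = \left|\frac{\sin(\pi M \Delta \delta)}{M \sin(\pi \Delta \delta)}\right|
  \;\longrightarrow\; 1 \quad \text{as } \delta \to 0 .
```

Two nearly identical angles therefore give almost parallel rows in the compressing matrix, which is the situation a threshold strictly below 1 rules out.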
The detailed description of the proposed ACD scheme is summarized in Algorithm 1. Figure 2 shows the procedure of the proposed ACD algorithm.
Algorithm 1 (steps 12 to 16):
12: Compare the correlation between all valid angles and the angle corresponding to $m$.
13: If the correlation is less than $\alpha_{\lim}$, then the angle corresponding to $m$ is recorded as a valid angle.
14: Use the $m$-th row of $\mathbf{A}^H$ to design the next row of $\mathbf{V}_F$.
15: $v_{sum}(m) \leftarrow 0$.
16: until $z(\bar{M}) = \text{null}$.
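The selection loop can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a unit-norm ULA steering vector with half-wavelength spacing, it visits candidates in decreasing order of $\mathbf{v}_{sum}$ (equivalent to repeatedly taking the maximum and zeroing it), and the names `select_angles`, `inner` and `reduced_dim` are hypothetical.

```python
import cmath
import math


def steering_vector(theta, num_antennas, spacing=0.5):
    """Unit-norm ULA steering vector (assumed form; spacing in wavelengths)."""
    step = -2.0 * math.pi * spacing * math.sin(theta)
    scale = 1.0 / math.sqrt(num_antennas)
    return [scale * cmath.exp(1j * step * m) for m in range(num_antennas)]


def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def select_angles(angles, gains, num_antennas, reduced_dim, alpha_lim):
    """angles[k][l], gains[k][l]: AoA and path gain of path l of user k.

    Returns (selected flat indices, V_F as a list of rows).
    """
    flat_angles = [theta for user in angles for theta in user]
    A = [steering_vector(theta, num_antennas) for theta in flat_angles]  # columns of A
    gram = [[inner(a_i, a_j) for a_j in A] for a_i in A]  # A^H A

    # v_sum[i] = sum over users k of |sum_l (A^H A)[i, (k, l)] * alpha_kl|,
    # i.e. the row sums of |A^H A C| with C block diagonal.
    v_sum = []
    for row in gram:
        total, col = 0.0, 0
        for user_gains in gains:
            entry = sum(row[col + l] * g for l, g in enumerate(user_gains))
            total += abs(entry)
            col += len(user_gains)
        v_sum.append(total)

    selected = []
    for m in sorted(range(len(v_sum)), key=lambda i: v_sum[i], reverse=True):
        # Keep angle m only if it is weakly correlated with every valid angle.
        if all(abs(gram[m][s]) < alpha_lim for s in selected):
            selected.append(m)
        if len(selected) == reduced_dim:
            break

    # Each row of V_F is a row of A^H, i.e. a conjugated steering vector.
    V_F = [[x.conjugate() for x in A[m]] for m in selected]
    return selected, V_F
```

With two users whose first two paths arrive from almost the same direction, for example `angles = [[0.1, 0.1001], [1.0, 2.0]]`, only one of the two near-duplicate angles survives the threshold test, so the rows of the resulting matrix stay well separated.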

Baseband Processing
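Based on Equation (4), any baseband processing scheme can be applied. Let $\mathbf{V}_B \in \mathbb{C}^{K \times \bar{M}}$ be the baseband processing matrix, where $\bar{M}$ is the compressed dimension. By taking the angular-domain compressing matrix $\mathbf{V}_F$ into account, the estimate of the transmitted signal in baseband is given as
$$\hat{\mathbf{x}} = \mathbf{V}_B\mathbf{y}_R = \sqrt{p_u}\,\mathbf{V}_B\mathbf{V}_F\mathbf{H}\mathbf{x} + \mathbf{V}_B\mathbf{V}_F\mathbf{n}, \tag{6}$$
where $\mathbf{y}_R = \mathbf{V}_F\mathbf{y}$ is the compressed received signal. Denote $\tilde{\mathbf{H}} = \mathbf{V}_F\mathbf{H}$ as the effective channel in the angular domain. We introduce three linear detection schemes for designing $\mathbf{V}_B$ under the effective channel $\tilde{\mathbf{H}}$, i.e., ZF, MRC and MMSE.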
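ZF detection eliminates interference between users, and the detection matrix is thus given by
$$\mathbf{V}_B^{\mathrm{ZF}} = \left(\tilde{\mathbf{H}}^H\tilde{\mathbf{H}}\right)^{-1}\tilde{\mathbf{H}}^H, \tag{7}$$
where $\tilde{\mathbf{H}} = \mathbf{V}_F\mathbf{H}$ is the effective channel after angular-domain compression. MRC detection maximizes the received signal to noise ratio (SNR), and we have
$$\mathbf{V}_B^{\mathrm{MRC}} = \tilde{\mathbf{H}}^H. \tag{8}$$
The MMSE detection method minimizes the mean square error between the estimated signal and the transmitted signal, which is given by
$$\mathbf{V}_B^{\mathrm{MMSE}} = \sqrt{p_u}\,\tilde{\mathbf{H}}^H\left(p_u\tilde{\mathbf{H}}\tilde{\mathbf{H}}^H + \mathbf{V}_F\mathbf{V}_F^H\right)^{-1}, \tag{10}$$
where $\mathbf{V}_F\mathbf{V}_F^H$ is the covariance matrix of the compressed noise $\mathbf{V}_F\mathbf{n}$. Substituting Equations (7), (8) and (10) into (6) gives the estimate of the desired signal.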
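As a brief check of the claim that ZF removes inter-user interference, the following worked step assumes the standard pseudo-inverse form of the ZF detector and an effective channel $\tilde{\mathbf{H}} = \mathbf{V}_F\mathbf{H}$ with full column rank; both are assumptions of this sketch.

```latex
\mathbf{V}_B^{\mathrm{ZF}}\tilde{\mathbf{H}}
  = \left(\tilde{\mathbf{H}}^H\tilde{\mathbf{H}}\right)^{-1}
    \tilde{\mathbf{H}}^H\tilde{\mathbf{H}}
  = \mathbf{I}_K
\quad\Longrightarrow\quad
\hat{\mathbf{x}}
  = \sqrt{p_u}\,\mathbf{V}_B^{\mathrm{ZF}}\tilde{\mathbf{H}}\,\mathbf{x}
    + \mathbf{V}_B^{\mathrm{ZF}}\mathbf{V}_F\mathbf{n}
  = \sqrt{p_u}\,\mathbf{x}
    + \mathbf{V}_B^{\mathrm{ZF}}\mathbf{V}_F\mathbf{n}.
```

Each user's estimate then contains only its own symbol plus filtered noise, which is why the full-rank condition on the effective channel matters for the angle selection.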

Performance Analysis
In this section, we analyze the performance of the proposed ACD scheme with ZF, MRC and MMSE in terms of achievable rates and computational complexity.

Uplink Achievable Rates
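From Equation (6), the detected signal of the $k$-th user is given by
$$\hat{x}_k = \sqrt{p_u}\,\mathbf{V}_{B,k}\mathbf{V}_F\mathbf{H}_k x_k + \sqrt{p_u}\sum_{i=1, i \ne k}^{K}\mathbf{V}_{B,k}\mathbf{V}_F\mathbf{H}_i x_i + \mathbf{V}_{B,k}\mathbf{V}_F\mathbf{n}, \tag{11}$$
where $\mathbf{V}_{B,k}$ is the $k$-th row of $\mathbf{V}_B$ and $\mathbf{H}_k$ is the $k$-th column of $\mathbf{H}$.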
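To derive the signal-to-interference-plus-noise ratio (SINR) of each user, we first derive the received signal power as
$$S_k = p_u\left|\mathbf{V}_{B,k}\mathbf{V}_F\mathbf{H}_k\right|^2. \tag{12}$$
Similarly, the interference power is obtained as
$$I_k = p_u\sum_{i=1, i \ne k}^{K}\left|\mathbf{V}_{B,k}\mathbf{V}_F\mathbf{H}_i\right|^2, \tag{13}$$
and the noise power is derived as
$$N_k = \left\|\mathbf{V}_{B,k}\mathbf{V}_F\right\|^2. \tag{14}$$
Combining the results in Equations (12)-(14), the received SINR of user $k$ is obtained as
$$\mathrm{SINR}_k = \frac{p_u\left|\mathbf{V}_{B,k}\mathbf{V}_F\mathbf{H}_k\right|^2}{p_u\sum_{i \ne k}\left|\mathbf{V}_{B,k}\mathbf{V}_F\mathbf{H}_i\right|^2 + \left\|\mathbf{V}_{B,k}\mathbf{V}_F\right\|^2}. \tag{15}$$
Note that Equation (15) is still a random variable because it is a function of the channel matrix. By taking the expectation over $\mathbf{H}$, we derive the uplink achievable rate of the $k$-th user as
$$R_k = E\left\{\log_2\left(1 + \mathrm{SINR}_k\right)\right\}, \tag{16}$$
and an accurate approximation of $R_k$ has been provided in [45] and is described as
$$R_k \approx \log_2\left(1 + \frac{E\{S_k\}}{E\{I_k\} + E\{N_k\}}\right). \tag{17}$$
The uplink sum rate of all the users is thus given by
$$R_{sum} = \sum_{k=1}^{K} R_k. \tag{18}$$
Note that the expectation in Equation (17) is very difficult to derive for MRC, ZF and MMSE due to the complicated model of $\mathbf{H}$. In Section 5, we carry out extensive numerical simulations to show the superiority of the proposed ACD over its counterparts in the literature.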
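For a single channel realization, the per-user SINR and the resulting sum rate can be evaluated as in the sketch below. It assumes the SINR is the ratio of the desired signal power to the interference-plus-noise power with unit-variance symbols and noise; the function names and the input layout (`gain[k][i]` for the scalar $\mathbf{V}_{B,k}\mathbf{V}_F\mathbf{H}_i$, `noise_gain[k]` for $\|\mathbf{V}_{B,k}\mathbf{V}_F\|^2$) are hypothetical.

```python
import math


def sinr_per_user(gain, noise_gain, p_u):
    """Per-user SINR for one channel realization.

    gain[k][i]    : complex scalar V_{B,k} V_F H_i
    noise_gain[k] : real scalar ||V_{B,k} V_F||^2
    p_u           : transmit power of each user
    """
    sinrs = []
    for k, row in enumerate(gain):
        signal = p_u * abs(row[k]) ** 2
        interference = p_u * sum(abs(g) ** 2 for i, g in enumerate(row) if i != k)
        sinrs.append(signal / (interference + noise_gain[k]))
    return sinrs


def sum_rate(sinrs):
    """Instantaneous sum rate in bit/s/Hz for the given per-user SINRs."""
    return sum(math.log2(1.0 + s) for s in sinrs)
```

Averaging `sum_rate` over many independent channel realizations gives a Monte Carlo estimate of the ergodic sum rate, which is how the expectation can be handled numerically when no closed form is available.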

Computational Complexity
In this subsection, we analyze the complexity of the proposed ACD scheme and provide a comparison with existing angular-domain schemes. The total number of FLOPs involved in the algorithm is derived. The complexity of some basic operations is provided in [46]. We consider multiplication and addition operations, and each real-valued multiplication or addition counts for 1 FLOP. Each complex-valued multiplication and addition counts for 6 FLOPs and 2 FLOPs, respectively. For basic matrix operations, the addition of two $N \times K$ real matrices requires $NK$ FLOPs, while that of complex matrices requires $2NK$ FLOPs. The multiplication of an $N \times K$ and a $K \times M$ real matrix requires $NM(2K - 1)$ FLOPs, while that of complex matrices requires $NM(8K - 2)$ FLOPs.
Since comparison and transposition operations are trivial compared with matrix operations, they are omitted in the following analysis. The complexity of the proposed ACD scheme is calculated as follows:
• Calculation of the conjugate-symmetric matrix $\mathbf{A}^H\mathbf{A}$. The multiplication of a $KL \times M$ and an $M \times KL$ complex matrix requires computing $\frac{1}{2}(1 + KL - 1)(KL - 1) + KL$ complex-valued multiplications, so the complexity is $3K^2L^2 + 3KL$ FLOPs;
• Calculation of $\mathbf{A}^H\mathbf{A}\mathbf{C}$. The multiplication of a $KL \times KL$ and a $KL \times K$ complex matrix requires computing $KL \times K \times L$ complex-valued multiplications and $KL \times K \times (L - 1)$ complex-valued additions, so the complexity is $8K^2L^2 - 2K^2L$ FLOPs;
• Calculation of $\mathbf{v}_{sum}$, which sums the $KL \times K$ real matrix by row and requires $KL(K - 1)$ FLOPs.
Therefore, the total number of FLOPs of ACD is $11K^2L^2 - K^2L + 2KL$. The proposed ACD scheme thus has a complexity of order $O(K^2L^2)$, i.e., quadratic in both $K$ and $L$. In mmWave massive MIMO systems, due to the sparsity of the channel, the value of $L$ can be determined by compressed sensing based channel estimation. Therefore, the complexity of AoA estimation is not high compared with conventional schemes. Compared with $K$, the value of $L$ is small, so $L$ only slightly affects the complexity of the proposed ACD scheme, which mainly depends on $K$. In the ACD scheme, we transform the signal processing from the antenna domain to the angular domain, so that the complexity is related to the number of selected angles and does not depend on the number of antennas, and thus the complexity is significantly reduced.
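The FLOP count above can be tallied per step with a small helper. This is only a bookkeeping sketch of the three terms stated in the text; the function name `acd_flops` and the dictionary keys are hypothetical.

```python
def acd_flops(num_users, num_paths):
    """FLOP count of the ACD angle-selection stage for K users with L paths each."""
    K, L = num_users, num_paths
    gram = 3 * K**2 * L**2 + 3 * K * L         # A^H A (conjugate-symmetric)
    product = 8 * K**2 * L**2 - 2 * K**2 * L   # A^H A C with block-diagonal C
    row_sums = K * L * (K - 1)                 # row-wise sums forming v_sum
    return {
        "gram": gram,
        "product": product,
        "row_sums": row_sums,
        "total": gram + product + row_sums,
    }


# Example with the setting used in the complexity comparison (K = 50, L = 5).
flops = acd_flops(50, 5)
```

For $K = 50$ and $L = 5$, the three terms are 188,250, 475,000 and 12,250 FLOPs, which add up to 675,500 and agree with $11K^2L^2 - K^2L + 2KL$.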
The computational complexity of the proposed ACD scheme and that in [35,36] is summarized in Table 1 and also illustrated in Figure 3, where $M = 64$, $L = 5$, and $K = 50$. It can be seen that the complexity of the schemes in [35,36] is almost the same and increases significantly with the number of users. In contrast, the complexity of the proposed ACD scheme increases much more slowly with $K$. When $K = 50$, the complexity of the proposed ACD is only around 50% of that of [35,36]. As will be shown in the numerical results, ACD has significantly better performance in terms of achievable rates with much lower complexity. The proposed ACD scheme achieves a favorable trade-off between complexity and performance compared with the previous work in [35,36].
Table 1. Complexity comparison (number of FLOPs) between the scheme in [35] and the proposed ACD.
Figure 3. Computational complexity comparison between the proposed ACD and those in [35,36].

Numerical Results
In this section, we present numerical results to compare the performance of the proposed ACD scheme and that in [35,36] in terms of the sum data rate. In all simulations, the number of paths is $L = 5$ and the carrier frequency is $f = 2.6$ GHz [47].
The simulation results for the uplink achievable rates of the three schemes with ZF detection are given in Figure 4, where $M = 64$, $K = 10$ and the compressed dimension is $\bar{M} = 10$ or $16$. It can be seen that the achievable rates of all schemes improve with the SNR. Among all the curves, the proposed ACD scheme exhibits the best performance. With $\bar{M} = 10$, ACD even outperforms the scheme in [35] with $\bar{M} = 16$. The proposed ACD has a gain of around 3 dB and 16 dB over the schemes in [35,36], respectively.
Figure 5 compares the three schemes under MMSE, ZF and MRC detection. It can be seen that the proposed ACD scheme shows the best performance. For the ACD and the scheme in [36], the curves of MMSE and ZF tend to coincide, because the $\mathbf{V}_F\mathbf{V}_F^H$ term in Equation (10) is too small to make a difference for MMSE. It is worth noting that ACD with MRC outperforms the scheme in [35] with MMSE and ZF.
Figure 6 shows the simulation results for the impact of channel estimation error on the uplink achievable rate, where $K = 10$, $M = 64$ and $\bar{M} = 16$; $e_{kl} \sim \mathcal{CN}(0, 0.04)$ is the estimation error of $\alpha_{kl}$; and $w_{kl} \sim \mathcal{CN}(0, 0.01)$ is the estimation error of $\theta_{kl}$. Under the channel estimation error, the uplink sum rate of all schemes decreases slightly, and the proposed ACD scheme still performs the best.
Figure 6. Uplink sum rate of the three schemes versus signal to noise ratio (SNR).
Figure 7 shows the uplink sum rate with respect to the number of antennas, where $\bar{M} = 16$ and $K = 10$. It can be seen that ACD performs the best. As the number of antennas increases, the uplink sum rate of ACD increases first and then stabilizes. However, the uplink sum rate of the scheme in [35] decreases with more antennas.
The reason may be that as the number of antennas increases, the received beam is subdivided in the angular domain. However, the scheme in [35] only selects the beams that fulfill the beam power criterion. Therefore, it is possible that multiple similar beams are selected, so that the number of effective beams is much smaller than the compressed dimension $\bar{M}$ and the performance is reduced consequently. The constraint introduced in ACD on the correlation between beams with different angles avoids these highly correlated angles and thus yields better performance.
It is worth noting that the impact of the number of antennas on the AoA resolution is reflected in the AoA estimation. In general, with more antennas the angular resolution will increase in MIMO beamforming or precoding techniques. This resolution may be proportional to $1/M$. However, this work does not study the AoA estimation, but uses the estimated AoAs for detection following the proposed ACD scheme. Therefore, once the AoAs are determined, the number of antennas does not affect the algorithm anymore.
Figure 8 shows the uplink sum rate with respect to the number of users, where $\bar{M} = 20$ and $M = 64$. It can be seen that the uplink sum rate of all the schemes with MMSE and ZF improves with the number of users. However, the uplink sum rate decreases significantly when $\bar{M} = K$ due to the inter-angle correlation.
Figure 9 shows the relationship between the uplink sum rate and $L$, where $K = 10$, $\bar{M} = 12$, and $M = 64$. As $L$ increases, the sum rate of the proposed ACD scheme declines, but its performance is still the best among all schemes. This is because as $L$ increases, the number of AoAs also increases, so the signal power of each user is more dispersed in the angular domain.
However, the number of selectable angles is fixed, so the SNR of the corresponding signal decreases, resulting in performance loss. It should be noted that in the mmWave channel, $L$ will not be too large, so the performance advantage of the proposed ACD scheme remains clear.
Figure 10 illustrates the energy efficiency performance of different schemes, where the compressed dimension is $\bar{M} = 16$, $M = 64$ and $K = 10$. The energy efficiency is calculated from the uplink sum rate and the power consumption, where $B$ is the bandwidth, $P_{LNA}$ is the power consumption of the low noise amplifier (LNA), and $P_{PS}$ is the power consumption of the phase shifter. The typical values for $B$, $P_{LNA}$ and $P_{PS}$ are chosen to be 20 MHz, 5.4 mW and 2 mW according to [48-50]. As shown in Figure 10, all the curves of MMSE and ZF increase with the SNR. The proposed ACD scheme with MMSE and ZF performs the best, while MRC is the worst. This is because by using MMSE and ZF, ACD is able to balance both the power consumption and the data rates, thus resulting in improved energy efficiency.
Figure 10. Energy efficiency versus the SNR.

Conclusions
In this work, we have proposed a low-complexity uplink angular-domain compressing based detection for mmWave massive MIMO systems. The proposed ACD scheme not only reduces the computational complexity, but also exhibits improved performance compared with existing works in [35,36]. Performance analysis in terms of achievable rates, energy efficiency and computational complexity is given. Simulation results show that ACD with MRC/ZF/MMSE significantly outperforms the schemes in [35,36] while the complexity is greatly reduced. Future work could consider low-complexity angular-domain channel estimation and angular-domain detection for mmWave massive MIMO systems with multi-antenna users.
Author Contributions: Conceptualization, X.X., W.Z. and Y.F.; data curation, X.X. and Y.F.; writing-original draft preparation, X.X., W.Z., X.B. and J.X. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.