On Matrix Completion-Based Channel Estimators for Massive MIMO Systems

: Large-scale symmetric arrays such as uniform linear arrays (ULA) have been widely used in wireless communications for improving spectrum efﬁciency and reliability. Channel state information (CSI) is critical for optimizing massive multiple-input multiple-output(MIMO)-based wireless communication systems. The acquisition of CSI for massive MIMO faces challenges such as training shortage and high computational complexity. For millimeter wave MIMO systems, the low-rankness of the channel can be utilized to address the challenge of training shortage. In this paper, we compared several channel estimation schemes based on matrix completion (MC) for symmetrical arrays. Performance and computational complexity are discussed and compared. By comparing the performance in different scenarios, we concluded that the generalized conditional gradient with alternating minimization (GCG-Alt) estimator provided a low-cost, robust solution, while the alternating direction method of multipliers (ADMM)-based hybrid methods achieved the best performance when the array response was perfectly known.


Introduction
Millimeter-wave (mmWave) wireless communications have drawn great attention in the industry and academia [1] thanks to the large bandwidth available in the 30-300 GHz band.To compensate for the significant path loss in this band and also thanks to the short wavelength, massive MIMO have been suggested for mmWave systems.In particular, large-scale symmetric antenna arrays, such as the uniform linear arrays (ULAs) and uniform planer arrays (UPAs) have been extensively considered for transmitters and receivers due to their neat structures and high gains for directional transmissions [2][3][4].However, the coherence time in the millimeter-wave system is suggested to be short and as the number of antennas increases, the complexity of channel estimation increases.Therefore, it is challenging to acquire instantaneous channel state information (CSI) for a mmWave massive MIMO.
MmWave channels are often dominated by a small number of propagation paths, indicating that the channel is sparse in the angular domain [5].The channel matrix can be expressed in terms of dictionary matrices, which are formed by the transmitting and receiving array response vectors, and path gains.Compressive sensing (CS) [6] can then be applied to search for the dominant paths [7][8][9][10][11][12][13]. Different measurement matrices can be used by choosing the precoders and combiners, as well as various recovery algorithms such as orthogonal matching pursuit (OMP) [7,14] and the adaptive CS [8][9][10] can be applied.In general, the above mentioned CS schemes require knowledge of the array response, which depends on the array geometry and calibration of the antenna arrays.Such knowledge can be inaccurate when there are unknown hardware impairments, e.g., due to phase and gain errors, and imperfect calibration of the antenna arrays.
In the meantime, a small number of propagation paths also indicate that the channel is low-rank [15] and can be depicted in low-dimensional subspace.Furthermore, such low-rankness is independent of the array response and calibration errors.In [16,17], both the sparsity and low-rank property of the mmWave channel are exploited to enhance CS-based channel estimators.In [16], a two-stage estimator is proposed, where the low-rankness is exploited at the first stage while sparsity in the angular domain is exploited at the second stage.In [17], the improved alternating direction method of multipliers (ADMM) method [18] is applied to exploit the low-rankness and sparsity while at the same time enhancing the performance.These estimators, however, still require knowledge of the array response vectors.They can still suffer from performance loss when there are uncertainties in the array response.
To achieve robust channel estimation, matrix completion (MC) methods exploiting only the low-rank property of the mmWave channel have recently been proposed [19,20].The analysis in [21] shows that the rank of the channel matrix is generally very low, which is usually much smaller than the antenna dimension.In [19], the singular value projection (SVP) algorithm [22] is adopted to solve the mmWave channel estimation problem.Later, the GCG-Alt method is developed in [20] by applying the generalized conditional gradient (GCG) framework [23] together with the alternating minimum (AltMin) method [24].There are also other widely-studied MC algorithms that can be used for mmWave channel estimation, such as the singular value thresholding algorithm (SVT) [25] and the fixed point continuation algorithm (FPC) [26].In this paper, we discuss several mmWave channel estimators based on MC, focusing on their performance and complexity comparisons with alternative methods based on CS.We aim to examine the pros and cons for several MC estimators and the factors that influence their performances.
The rest of this paper is organized as follows.Section 2 introduces the mmWave MIMO channel model and formulates the channel estimation problem.Section 3 introduces channel estimators based on MC, including their detailed implementation.Section 4 presents simulation results, in terms of the mean squared error (MSE) and computational complexity.Section 5 concludes the paper.
Notation: A T , A * , and A H denote transpose, conjugate, and conjugate transpose of matrix A, respectively.A 1 denotes the l 1 norm.I and 0 represent the identity matrix and zero matrix/vector, respectively.A ⊗ B and A B denote the Kronecker product and the Hadamard product, respectively.tr{A} is the trace of A and A, C = tr A H C denotes the inner product of matrices A and C. E[•] denotes the statistical expectation and abs(•) represents taking element-wise absolute value.∇ denotes the gradient of a function.For a matrix A ∈ C M×N , vec(A) ∈ C MN×1 denotes the vectorization of A and vec −1 (A) ∈ C M×N denotes the inverse of vectorization.R(•) and I(•) denote the real and imaginary part of a number or vector, respectively.CN a, b 2 denotes complex Gaussian distribution with mean a and variance b 2 .

System Model
Consider a point-to-point, switch-based mmWave hybrid MIMO system, with the receiver at the mobile station (MS) shown in Figure 1.For simplicity and clarity, this paper assumes switch-based mmWave systems to investigate MC-based channel estimators.MC-based estimators can also be applied to phase shifter-based mmWave systems, when the hybrid precoders/combiners are properly designed, as shown in [20].Therefore, the discussion in this paper can be easily extended to phase shifter-based mmWave systems.At the receiver, each of the N MS antennas is equipped with a switch used to select one of the N RF RF chains.The base station (BS) has the same structure with N BS antennas and N RF RF chains.Assume that N s data streams are transmitted, with N s ≤ N RF ≤ min(N BS , N MS ) [14,27].The switching operation can be represented as a precoder F, where the nonzero entries indicate the entries of the channel matrix that are sampled.A symbol s ∈ C N s ×1 with E[ss H ] = 1 N s I is precoded, resulting in the transmitted signal x = Fs.We consider a narrow-band flat fading channel whose channel matrix H ∈ C N MS ×N BS satisfies E H 2 F = N MS N BS .The received signal is expressed as: where ρ indicates the average received power and n ∈ C N MS ×1 is a noise vector with i.i.d.entries distributed as CN (0, σ 2 n ).Applying a combiner W to the received signal at the MS, the processed received signal is given by: In the switch-based system, the combiner W has a similar structure with the precoder F. Following [8], the ray-clustering model of H is given as: where C ∼ max{Poisson(λ), 1} is the number of clusters with λ as the mean of the Poisson distribution, and R is the number of rays in each cluster.The complex small-scale fading gain on the r-th ray of the c-th cluster is g cr with g cr ∼ CN (0, γ c ), where γ c is the sub-power on the c-th cluster.In Equation (3), a MS (φ MS cr ) and a BS (φ BS cr ) represent the array response vector for the receiver and transmitter, respectively, where φ MS cr and φ BS cr represent the corresponding azimuth AoA and AoD, which follow the Laplacian distribution [28].
Considering a uniform linear array (ULA) with distance between adjacent antennas being d, the array response is given by: where λ c is the wavelength of the carrier wave.The array response a MS (φ MS cr ) is constructed in the same manner as a BS (φ BS cr ).

Compressive Sensing-Based Channel Estimation
It has been shown that, without considering quantization errors, the mmWave channel estimation problem can be formulated as a sparse recovery problem by modeling the channel as [29][30][31]: where ] is a unitary dictionary matrix when N 1 = N MS and it is an overcomplete dictionary matrix when N 1 > N MS , and Each column in A BS and A MS consists of a predefined array response vector.H v ∈ C N 1 ×N 2 is a sparse matrix with only L non-zero values, with each of its non-zero values corresponding to the complex gain of a channel path.Vectorization of the channel matrix (5) produces: where matrix.Sparse recovery schemes can then be used to estimate the channel, which transforms the task of estimating H to estimating the non-zero coefficients in x.
A widely used method to estimate x from the the received signal is orthogonal matching pursuit (OMP) [7,14].Using OMP, L path directions from the N 1 N 2 candidates in the dictionary are determined.The computational complexity of the OMP method is approximately O(NLN 1 N 2 ), where N is the length of the received signal.In general, a larger dictionary leads to better performance but also higher computational complexity.
The above mentioned CS approach uses a discretized approximation of the channel.It may suffer from the off-grid issue if the physical propagation paths are off the assumed grid of the angles.In this case, the number of non-zero entries in the beamspace channel H v may not be exactly equal to L, leading to a power leakage.Another challenge is that the knowledge of the array response is required, which may be imperfect in practice due to unknown hardware impairments and imperfect calibrations.

Matrix Completion-Based Channel Estimation
In this section, we introduce MC-based estimation methods for the mmWave channel by exploiting the low-rankness of the channel matrix.By appropriately choosing the training scheme with proper precoders and combiners, the received signal provides noisy observations of a subset of the entries of H: where H = H + N is the perturbed channel matrix, N is a noise matrix, Ω denotes a sample domain, and [ H] i,j is the (i, j)th entry of H. Define p = N/(N BS N MS ) as the sampling density, where N is the total number of samples observed.It is discussed in [19] that when the mmWave channel matrix has strong non-coherent characteristics, it can be recovered from a subset samples of the channel matrix.We can thus formulate the channel estimation problem as a low-rank matrix completion problem as: where δ 2 n is the error tolerance parameter and the sampling operator P Ω (•) is defined as: where [ H] i,j denotes the (i, j)-th entry of H.The sampling operator P Ω significantly influences the performance of the algorithm [32].Bernoulli and uniform sampling models are proposed in [32] and a uniform spatial sampling model (USS), which improves the performance, is proposed in [33].
With USS, N/N BS samples are taken for each column of the target matrix.

MC Estimators
In the following, we discuss several MC estimators that can be used to solve the problem in Equation (8).

SVT Estimator
Before presenting the SVT algorithm, let us define the matrix shrinkage operator: where Σ X denotes the singular value matrix of X and S τ (Σ X ) is the element-wise shrinkage operator: where τ is the threshold.The SVT algorithm [25] can be applied to provide a heuristic solution to Equation ( 8), which consists of two major steps, where τ > 0, δ is a step size, and k = 1, 2, . ... The iteration is stopped when a stopping criterion is met or a maximum number of iterations J SVT is reached.Singular value decomposition (SVD) is required at each iteration.Some comments regarding the implementation of the SVT algorithm to the mmWave channel estimation problem are as follows: • The threshold is set as τ = 5 √ N MS N BS following [25]; The stepsize is set as δ = 1.2/p; • Assuming the initialization X 0 = 0, H k = 0 for a small k < k 0 .As such, X k = kδP Ω ( H), k = 1, . . ., k 0 .The algorithm can begin with computing H k 0 to save work.From [25], the integer k 0 is determined by SVT Estimator is shown in below Algorithm 1: end if 7: end for 8: return H = H k From [25], SVT is effective for completing large matrices with low ranks.Its performance degrades as the rank increases.The computational complexity of the SVT algorithm mainly arises from Step 3.

FPC Estimator
The FPC algorithm [26] reformulates the MC problem using the nuclear norm, which is the summation of the singular values, min where µ > 0 is the regularization parameter.The algorithm consists of two steps similar to SVT: where the threshold of the singular value thresholding operator is set as a variable δµ m rather than a fixed value as that in SVT.A continuous strategy [34] is used to accelerate the convergence by adapting µ m .The details are presented in Algorithm 2. Some comments are as below: • We can set the step size δ ∈ 0, 2/λ max P Ω ( H) H P Ω ( H) according to [26], where λ max is the maximum eigenvalue; • µ m+1 decreases as where M, determined by (µ final , η µ ), controls the step size and the estimation accuracy, µ final is small (e.g., µ final = 0.01), and the parameter 0 < η µ < 1 determines the decreasing rate for consecutive µ m .

Algorithm 2 FPC Estimator
Require: P Ω ( H), , J FPC , δ, µ final , and η µ 1: Initialization: H 0 = 0, m = 0, µ m = P Ω ( H) for k = 1 : 6: end for 10: end while The main computational cost of the FPC algorithm is in Step 6 of Algorithm 2 due to the SVD.In addition, the FPC algorithm needs to choose the step size δ by calculating the maximum eigenvalue of P Ω ( H) H P Ω ( H) and has a higher computational complexity per iteration than that of the SVT algorithm.

SVP Estimator
The SVP algorithm [22] is based on the projected gradients and is detailed in Algorithm 3.This algorithm requires that the rank L of the channel matrix to be known.The step size can be chosen empirically as η = 1/(1 + δ 0 )p with 0 < δ 0 < 1/3.The stopping criterion is based on the norm of the difference in the sampled channel matrix, where the small threshold 0 < < 1/2 can be set such as = 10 −3 .The SVP algorithm also needs to calculate the SVD in Step 4, which is the most computationally expensive step of the algorithm.

Algorithm 3 SVP Estimator
Require: P Ω ( H), L, η, 1: Initialization: H 0 = 0, t = 0 2: while P Ω ( Compute the L principal singular vectors of X t+1 : U L , Σ L , V L . 5: In [20], a generalized conditional gradient framework with alternating minimization (GCG-Alt) is developed for the MC problem.The nuclear norm is used to promote low-rankness as: where µ > 0 is a regularization factor.Let: The gradient direction of the kth iteration of f ( H) [23]: where (u k−1 , v k−1 ) is the top singular vector pair of ∇ f H k−1 = P Ω H k−1 − H which can be found iteratively.The channel matrix is updated as: where η k ∈ [0, 1] is the step size and θ k is adaptively chosen.By using a property of nuclear norm [20], the optimization problem can be reformulated as: where U ∈ C N MS × r and V ∈ C N BS × r with r being the rank of H. Alternating minimization can then be used to optimize Equation (19).The details of solving the alternate minimization problem can be found in [20].The overall algorithm is summarized in Algorithm 4.

Algorithm 4 GCG-Alt Estimator
Require: P Ω ( H), µ, , a 1: Compute the top singular vector pair 6: h Ω = vec P Ω ( H) 8: : 13: while i k > a do 14: Find V i k that minimizes Equation (19) given U = U i k ; 16: Find U i k that minimizes Equation (19) given end while 19: 21: The above MC methods have different computational complexities.SVT, SVP, and FPC all need SVD, which can be implemented using the PROPACK [35] based on the iterative Lanczos bidiagonalization algorithm with partial reorthogonalization.The FPC has a higher complexity as SVD is repeated for different values of µ m .The GCG-Alt has the least complexity as the full SVD is not required.SVP is effective for large matrix completion problems with very low ranks, while the FPC, SVT, and GCG-Alt estimators allow higher ranks.The SVP estimator requires rank knowledge, while the FPC and GCG-Alt estimators implicitly determine the rank through choosing regularization parameters or thresholds.

MC-Based Hybrid Estimators
Next we discuss two MC-based hybrid methods that jointly exploit the sparsity and low-rankness of the channel.

ADMM Estimator
In [17], the low-rankness of H and the sparsity of the beamspace channel H v are jointly exploited and an ADMM method is proposed.Leveraging the side information that H has a sparse virtual representation given by Equation ( 5), the channel estimation problem is formulated following [36] as: min (20) where the nuclear norm and l 1 -norm together with the regularization parameters τ L and τ S are used to promote low-rankness and sparsity, respectively.The above problem is then reformulated by incorporating the constraints as penalty terms as: min where E ∈ C N MS ×N BS and C are two auxiliary matrix variables.This problem is then solved by using ADMM which involves the iterative updates of the variables and Lagrangian multipliers.
The augmented Lagrangian function of Equation ( 21) is given by: where Z 1 and Z 2 ∈ C N MS ×N BS are dual variables (the Lagrange multipliers) and t > 0 is the step size.
The estimator is summarized in Algorithm 5, where: • z i denotes the vectorization of Z i , and similarly for other variables; where Ω * ∈ {0, 1} N MS ×N BS is composed of N ones and N BS N MS − N zeros, the value 1 indicates the position of a sample from the channel matrix, and [Ω * ] i denotes the i-th row of Ω * , and I is the N MS × N MS matrix that the value at its (i, i)-th position is 1 and the remaining position is 0 [17];

•
The parameters in Equation ( 20) are chosen empirically as τ L = t P Ω ( H) where σ 2 n is the noise power.
The computational cost of the ADMM algorithm is mainly due to the SVD in Step 3 and the matrix operations in Steps 4, 5, 6, and 7.

Two-Stage Estimator
The two-stage estimator [16] also exploits the sparsity and low-rankness of the channel matrix.In the first stage, MC is applied to provide a denoised channel estimation based on the low rankness of the channel matrix and then the second stage employs CS to refine the estimation based on the array response and the virtual representation of the channel matrix.
The SVT algorithm [25] introduced above solves the low-rank matrix completion problem: The fast iterative shrinkage threshold algorithm (FISTA) proposed by [37] is used to solve the sparse vector recovery problem: The estimator is summarized in Algorithm 6, where: • The parameters of the first stage, SVT, is the same with Algorithm 1; The complexity of SVT in the first stage has been anlyzed.The cost of the FISTA algorithm mainly consists of applying the sensing matrix in Step 4, which has a complexity of O(N 2 BS N 2 MS ).

Numerical Results
Consider switch-based MIMO systems over a mmWave channel at 90 GHz.When not otherwise specified, the number of clusters C ∼ max(Poisson(1.8),1), and the number of rays R ∼ U [1,20] (the total number of paths is L = CR); the AoDs and AoAs follow Laplace distributions with a standard deviation of 15 • [38]; the sub-power of the clusters γ c = 1; and the ULA at the MS and BS has N MS = 32 antennas and N BS = 128 antennas, respectively, and N RF = 4 RF chains.Here, the CS method based on OMP is compared with MC-based estimators.The parameter settings are as follows: • OMP: The unitary dictionary is set with N 1 = N MS = 32 and N 2 = N BS = 128.The stopping threshold is set as OMP = 0.025σ 2 , 0.05σ 2 , 0.075σ 2 , 0.1σ 2 , 0.125σ 2 , 0. The normalized MSE (NMSE) is defined as: where H is an estimate of the channel matrix H.We next show the performance with different number of channel paths.Here, we assume the number of clusters C = 1, the numbers of rays R is changed from 1 to 22, and so the total number of channel paths L = R. From Figure 4, as the numbers of paths L increases, the NMSE of different algorithms degrades.The SVP algorithm is the most sensitive to the number of paths while the OMP estimator is the most robust.The sampling ratio is also critical for performance.From Figure 5, the NMSE improves substantially with increasing sampling ratio.It is noted that CS-based schemes such as the OMP and the two-stage algorithms are more advantageous when the sampling ratio is low.

NMSE Comparison When There Are Hardware Impairments
In practice, it is inevitable to have impairments of the antenna elements, which are typically time-varying, e.g., due to temperature changes or hardware aging [39].Therefore, the array response may be severely impacted.Due to mechanical reasons and uncertainty regarding the precise position of the antenna phase center, the actual antenna position may deviate from the assumed ideal array shape.Following [20], we define the gain and phase error vector at the BS as: where ω i represents the phase errors and β i denotes the amplitude gain of each antenna element.The gain and phase error vector e MS at the MS is similar to e BS .Considering such errors, the received signal in Equation ( 2) can be expressed as: where E MS is a diagonal matrix with e MS as the diagonal elements, and E BS is defined similarly.We carry out simulations to examine the robustness of the different approaches when there are phase and gain errors in the array response.Those errors are assumed to be uniformly distributed within certain range and characterized by the level of phase and gain errors, respectively, following [20].It is found that MC estimators were not affected by the phase or gain errors since the estimators were independent of the array response vectors.The sparsity-based methods use array response which depends on the phase and gain information, and can result in poor channel estimation when unknown gain and phase errors are present.These were validated by simulation experiments.From Figures 6  and 7 where NMSE achieved with different levels of phase and gain errors are compared, the MC estimators based on SVP, FPC, GCG-Alt, and SVT, were insensitive to phase errors or gain errors.The estimators exploiting CS, such as the one based on OMP and hybrid methods (the ADMM and two-stage estimators), were more sensitive to phase errors or gain errors.

Computational Complexity
Finally, we compare the computational complexity of different algorithms in Figures 8 and 9.It can be seen that in general, the hybrid estimators (the ADMM and two-stage estimators) exhibited higher complexity because they both involved SVD and the application of the sensing matrices were of large sizes.In particular, the ADMM algorithm had the highest complexity.The MC estimators had moderate complexity that did not vary significantly with the PNRs and the FPC exhibited higher complexity than SVT and SVP as more SVD operations were required.The GCG-Alt algorithm had the lowest complexity.This was because the GCG-Alt algorithm had a fast convergence rate and also avoided the SVD operations used in the other MC algorithms such as SVP, SVT, and FPC.The computational complexity of the OMP estimator increased with the PNR because the numbers of paths recovered by the OMP increased as the PNR increased.

Performance with Line of Sight (LoS) Propagation
In the above, we have focused on the comparison for the channels where the different paths had the same average power.In practical applications, there may exist line of sight (LoS) propagation, where a path contributes a significant portion of the power gain [40,41].In this case, the channel model can be modified from Equation (3) to: g cr a MS (φ MS cr )a H BS (φ BS cr ) + βa MS (φ MS LoS )a H BS (φ BS LoS ).
In our simulations we set g cr ∼ CN (0, 0.5) and β = √ C/2.The NMSE results are shown in Figure 10.With a LoS path which dominates the power gain, the effective rank of the channel matrix may be reduced.The various channel estimators considered are more effective in this case, leading to lower NMSE as compared with the case where the LoS path is absent.The results also demonstrate the effectiveness of the MC-based methods in realistic scenarios.

Conclusions
In this paper, we compared the performance of several MC-based channel estimators for mmWave massive MIMO systems.It was observed that the hybrid ADMM algorithm exhibited the best performance in general, which jointly exploited the low rank property of the channel matrix and the sparsity in the angular domain.However, it also exhibited the highest complexity among the estimators compared.The MC-based estimators (using GCG-Alt, SVT, SVP, or FPC) were robust against array impairments as they did not rely on array response vectors.Among them the GCG-Alt estimator exhibited the lowest complexity, better performance, and provided a competitive solution when the arrays were not perfectly calibrated.
In this work, we considered a point-to-point mmWave system.The estimators could also be applied to multiuser systems when orthogonal training schemes are deployed.The comparison of these methods when nonorthogonal training is used would be an interesting study for future work.

Figure 2
Figure 2 compares different estimators in terms of NMSE with different PNRs, which is defined as PNR = 10 log 10( ρ σ 2 n ), when N MS = 32, N BS = 128, and N RF = 4.The SVP algorithm performed worse than others.The ADMM algorithm performed the best, and the NMSE of the ADMM estimator was at least 5 dB better than other estimators.The NMSE with N BS = 64, N MS = 64 and N RF = 8 is

Figure 2 .
Figure 2. NMSE (normalized mean square error) of the channel estimation in the ULA (uniform linear arrays) system with N BS = 128, N MS = 32, N RF = 4, the sampling ratio p = 0.375, different PNRs (pilot-to-noise ratios), and without array impairments.

Figure 4 .
Figure 4. NMSE of the channel estimation in the ULA system with N BS = 128, N MS = 32, N RF = 4, p = 0.375, PNR = 20 dB, different number of paths, and without array impairments.

Figure 5 .
Figure 5. NMSE of the channel estimation in the ULA system with N BS = 128, N MS = 32, N RF = 4, PNR = 20dB, different sample ratios, and without array impairments.

Figure 6 .
Figure 6.NMSE of the channel estimation in the ULA system with N BS = 128, N MS = 32, N RF = 4, p = 0.375, PNR = 20 dB, and different levels of phase errors.

Figure 7 .
Figure 7. NMSE of the channel estimation in the ULA system with N BS = 128, N MS = 32, N RF = 4, p = 0.375, PNR = 20 dB, and different levels of gain errors .

Figure 10 .
Figure 10.NMSE of the channel estimation in the ULA system with N BS = 128, N MS = 32, N RF = 4, the sampling ratio p = 0.375, different PNRs, and without array impairments.