Channel Covariance Matrix Estimation via Dimension Reduction for Hybrid MIMO MmWave Communication Systems

Hybrid massive MIMO structures with lower hardware complexity and power consumption have been considered as potential candidates for millimeter wave (mmWave) communications. Channel covariance information can be used for designing transmitter precoders, receiver combiners, channel estimators, etc. However, hybrid structures allow only a lower-dimensional signal to be observed, which adds difficulties for channel covariance matrix estimation. In this paper, we formulate the channel covariance estimation as a structured low-rank matrix sensing problem via Kronecker product expansion and use a low-complexity algorithm to solve this problem. Numerical results with uniform linear arrays (ULA) and uniform squared planar arrays (USPA) are provided to demonstrate the effectiveness of our proposed method.


Introduction
Millimeter wave (mmWave) communications are promising for future-generation wireless communications for their advantages such as large bandwidths, narrow beams, and secure transmissions [1,2]. Large-scale multiple-input multiple-output (MIMO) hybrid structures equipped with only a few RF chains have generated great interests for mmWave systems due to their low complexity and near-optimal performance [3,4]. Precoders and combiners must be carefully designed to exploit the potential of large-scale MIMO hybrid systems, e.g., to achieve high data transmission rates. They can be designed based on the instantaneous channel matrix [5,6], which may be estimated by using channel estimation techniques [3,7]. However, the instantaneous channel can vary fast [8], especially at mmWave frequencies [9,10], and the precoder/combiner has to be redesigned once the instantaneous channel changes [5,6].
Although the instantaneous mmWave channel can change very fast, the long-term channel statistics, e.g., the angular power spectrum, can be stationary for tens to hundreds of coherence blocks [9]. Recently, the channel covariance information has been utilized to design the analog precoders/combiners [9,11], which remain fixed when the covariance matrix is unchanged. The effective digital system has a reduced dimensionality, which greatly reduces the cost of acquiring the instantaneous channel state information (CSI) and simplifies the optimization of the digital precoders and combiners. The channel covariance matrix should be firstly estimated to realize the designs in [9,11]. With large antenna arrays, the channel covariance matrix has a large dimensionality, which demands a large number of observations to be used when traditional covariance matrix estimators are adopted. Meanwhile, the hybrid structure only allows a reduced number of observations to be acquired at the receiver, which makes the task of channel covariance estimation challenging.
In order to address this challenge, [10,12] propose several compressive sensing (CS) based channel covariance estimators, which explore the relations between the angle of departure (AoD)/angle of arrival (AoA) and the channel covariance matrix. Their methods need a dictionary for searching the AoD/AoA, and the resulting performance improves when the resolution of the dictionary becomes higher. However, high-resolution dictionary yields high computational complexity. Moreover, these CS-based estimators require the number of paths in the channel to be known a priori. In [13], an analytical expression of the channel covariance matrix is derived and computed through the information obtained from one instantaneous channel realization, which can be estimated from low-dimensional observations. In [14], the covariance matrices of vector channels are estimated by solving a subspace estimation problem leveraging their low-rank property. Also, tensor decomposition has been used for dimension reduction for the mmWave channel estimation problem [15,16]. It has been recently used for channel covariance estimation in frequency-selective channels [17], where the channel is represented as a low-rank third-order tensor in terms of factor matrices. The channel covariance matrix is obtained from the estimated factor matrices. The methods of [13,14,17] focus on vector channels, which may not be directly applicable to matrix channels where both the transmitter and receiver employ multiple antennas.
In this paper, we investigate the mmWave channel covariance matrix estimation problem for hybrid mmWave communication systems that are equipped with uniform linear arrays (ULA) or uniform square planar arrays (USPA). Both the transmitter and the receiver have multiple antennas. The main contributions are as follows: 1. We show that the mmWave MIMO channel covariance matrix follows a Kronecker product expansion model [18]. Following [18][19][20], we show that this model can be used for reducing the effective dimension of the large-dimensional channel covariance matrices in mmWave MIMO systems. We further show that permutation can reduce the rank of the mmWave channel covariance matrix, which admits an expression of the summation of vector outer products. We thus formulate the channel covariance matrix estimation problem as a low-rank matrix sensing problem. 2. Although the aforementioned low-rank matrix sensing problem has a smaller size than the original problem, the complexity can still be high when the numbers of the transmitter/receiver antennas are large. In order to reduce the complexity, we further exploit the structures of the ULA or USPA to reduce the dimensionality of the problem and formulate the problem as a structured low-rank matrix sensing problem. We adapt the recently proposed generalized conditional gradient and alternating minimization (GCG-Alt) algorithm [21], which has low computational complexity, to find the solution. Numerical results with ULA and USPA suggest that our proposed estimator is effective in estimating the mmWave channel covariance matrix.
The rest of this paper is organized as follows. We introduce the spatial channel model and the hybrid system in Section 2. In Section 3, we formulate the channel covariance estimation problem as a structured low-rank matrix sensing problem and present the solution. We show the simulation results in Section 4 and conclude the paper in Section 5.
Notations: Bold uppercase A denotes a matrix and bold lowercase a denotes a column vector. A * , A T , and A H denote the conjugate, transpose, and conjugate transpose of matrix A, respectively. a(i) denotes the i-th element of vector a. [A] a:b,: denotes the submatrix of A made of its a-th to b-th rows.
[A] a:b,c:d denotes the submatrix of A defined by its a-th to b-th rows and c-th to d-th columns. A F and A * are the Frobenius norm and the nuclear norm of A. For A ∈ C M×N , vec(A) ∈ C MN×1 is a column vector obtained through the vectorization of A and vec −1 (A) ∈ C M×N is a matrix obtained by the inverse of vectorization. For matrices A and B, A ⊗ B denotes the Kronecker product of A and B. CN (a, b 2 ) represents complex Gaussian distribution with mean a and variance b 2 . U (a, b) represents uniform distribution with support [a, b].

Spatial Channel Model
Consider point-to-point mmWave transmissions, where the transmitter has N t antennas and the receiver has N r antennas. We assume the following spatial channel [22]: where K is the number of clusters, and L is the number of rays within each cluster. As reported in [22], the number of clusters is often small, e.g., K = 1, 2, but the number of rays inside each cluster can be large, e.g., L = 30. a r (φ r kl , θ r kl ) and a t (φ t kl , θ t kl ) are the array response vectors at the receiver and transmitter, respectively, where φ r kl , θ r kl , φ t kl , and θ t kl are the azimuth AoA, elevation AoA, azimuth AoD, and elevation AoD on the l-th ray of the k-th cluster, respectively. These angles can be characterized by cluster center angles and angular spreads: Each cluster covers a range of angles and the angular spread describes the span of each cluster. The angular spread in the mmWave propagation environment is considered to be small [13]. Measurements of the angular spread taken in the urban area of New York City are presented in [22] in terms of the root-mean-square (rms) of all the measurements. At the carrier frequency f c = 28 GHz, example angular spreads of 15.5 • , 6 • , 10.2 • , and 0 • are reported for φ r kl , θ r kl , φ t kl , and θ t kl , respectively. The small-scale fading coefficient g kl is assumed complex Gaussian, i.e., g kl ∼ CN (0, γ 2 k ), where γ 2 k is the fraction power of the k-th cluster [22] (Equation (7)). As discussed in [9], though the small-scale fading gains {g kl } change fast, the AoDs/AoAs and γ 2 k may remain stationary over tens to hundreds of coherence blocks. Assume that {g kl , ∀k, ∀l} are mutually independent, then channel covariance matrix can be modeled as and T r kl a r (φ r kl , θ r kl )a H r (φ r kl , θ r kl ) ∈ C N r ×N r .
Note that Equation (2) is the same as the channel covariance expression in [9] when L = 1. In the following, we first present our proposed covariance matrix estimation method for systems equipped with the ULA and then discuss its adaptation to systems that adopt the USPA.
For the ULA, the array responses a t (φ t kl , θ t kl ) and a r (φ r kl , θ r kl ) are independent of the elevation angles. They can thus be abbreviated as a t (φ t kl ) and a r (φ r kl ). For an N a -element ULA with distance d between adjacent antennas, the array response is where λ c is the carrier wavelength and N a = N t or N r is the number of antennas at the transmitter or receiver. Accordingly, T t kl of Equation (3) and T r kl of Equation (4) become and respectively, which are Toeplitz-Hermitian. Since the Kronecker product of two Toeplitz-Hermitian matrices is block-Toeplitz-Hermitian [23], the channel covariance matrix R defined in Equation (2) is block-Toeplitz-Hermitian. We next discuss the hybrid system. We assume phase shifter-based hybrid transceivers [21] shown in Figure 1, where the antennas and analog phase shifters at the transmitter or receiver are fully connected. Assume that there are K t N t radio frequency (RF) chains at the transmitter and K r N r RF chains at the receiver. For single-stream transmissions with one symbol s transmitted, the received signal is written as where W and f are the receiving processing matrix and transmitting processing vector, respectively, and n is the noise vector. Up to K r digital symbols can be observed at the receiver after each transmission. In hybrid transceivers, we have W = W RF W BB and f = F RF f BB , where W RF and F RF are the analog combiner and precoder, respectively, and W BB and f BB are the digital combiner and precoder, respectively. In addition, due to the constraints of the phase shifters in the RF combiner and precoder, the entries in W RF and F RF have constant modulus. Note that using single-stream transmissions during the channel training avoids the interferences caused by transmitting multiple symbols simultaneously, and this has been widely considered [3,4,7].
When N t and N r are large, the dimension of the channel covariance matrix R is large. In this case, estimating R can be difficult when only a small number of observations available, which is typical in the hybrid system.
From Equation (2), R follows the Kronecker product expansion model [18]. In the following, we explore this property and the block-Toeplitz-Hermitian structure of R to reduce the dimensionality of the problem of estimating R, and formulate the channel covariance matrix estimation problem as a structured low-rank matrix sensing problem.

Rank Reduction By Permutation
and respectively, where 1 ≤ l ≤ L and 1 ≤ k ≤ K. Then R of Equation (2) can be written compactly as where the summation involves KL terms. Note that T t kl and T r kl are Toeplitz-Hermitian. Denote the following N r × N r submatrix of R as by stacking each submatrix R mn into a row vector as Then based on the Kronecker product expansion property [23,24], R p can be written as a sum of vector outer products Note that if we have R p , we can obtain R as P −1 (R p ). From Equation (12), we can see that the column space of R p is spanned by {t t kl } and the row space of R p is spanned by {t r kl }. Recall that t t kl = vec(T t kl ) and by using the relation between T t kl and the transmitter array response a t (φ t kl ) shown in Equations (5) and (8), t t kl can be written as where φ t kl is the azimuth AoD. We can see that t t kl consists of the array response vector a t (φ t kl ) and the column space of R p is determined by the set As introduced earlier, small angular spreads are observed in the mmWave propagation environment, which indicates that the AoDs inside a cluster are closely spaced and their corresponding array response vectors are highly correlated. Therefore, for the k-th cluster, though the number of rays L inside can be large, the space spanned by {a * t (φ t kl ) ⊗ a t (φ t kl ), 1 ≤ l ≤ L} may be well approximated by a low-rank space. In addition, since the number of clusters K is generally small (e.g., K = 1 or 2), both C t and R can be low-rank. This is similar to the low-rankness of the mmWave channel H, which has been validated by the experimental and simulation results in [22]. The low-rank property of R p can be shown numerically. Denote by r ch the rank of R p or R, and let σ 1 > σ 2 > . . . > σ r ch be the singular values of R p or R. We may use to measure the energy captured by a rank-r sub approximation of R p or R, where r sub is the rank of the subspace of R p or R. Figure 2 shows an example of a ULA system with K ∈ {1, 2, 3, 4}, L = 30, N t = 64, and N r = 16. The covariance matrix R and its permuted version R p have sizes of 1024 × 1024 and 4096 × 256, respectively, which shows that R p is a taller matrix. The horizontal AoDs where the center angles φ t k are distributed uniformly in [0, 2π] and separated by at least one angular where υ r h = 15.5 • . The cluster powers are generated following [22] (Table I). It can be seen from Figure 2 that for capturing a majority of the total energy, e.g., with p e = 0.95, 0.99, the required r sub for R p is generally much smaller than min(N 2 t , N 2 r , KL) and is also much smaller than that for R. In the following, we use r p as the r sub of R p and r R as the r sub of R for a certain p e . As such, R p may be well approximated as a rank-r p matrix. One may use low-rank matrix recovery methods, e.g., matrix completion methods, to estimate the best rank-r p approximation of R p from a small amount of observations. However, when N t and N r are large, which is the case in mmWave communications, the number of parameters required by the rank-r p approximation of R p , i.e., (N 2 t + N 2 t ) × r p , is still large. Therefore, estimating the subspaces of R p can be computationally expensive.

Dimension Reduction by Exploiting the Toeplitz-Hermitian Structure
Recall that R is block-Toeplitz-Hermitian and R p = P (R) is a permutation of R. From Equations (12) and (13), we can see that R p is also specially structured: R p is the summation of the outer products of t t kl and t r kl , where t t kl and t r kl are the vectorizations of Toeplitz-Hermitian matrices T t kl and T r kl , respectively. Since the Toeplitz-Hermitian matrix T t kl ∈ C N 2 t ×N 2 t is determined by its first column and first row (its first row is the conjugate transpose of its first column), we can represent t t kl in terms of the entries in the first column and first row of T t kl . We can represent t r kl in the same way. Therefore, the total numbers of unknowns in t t kl and t r kl are 2N t − 1 and 2N r − 1, respectively. Then we can reduce the problem size of (N 2 In the following, we show how the problem size can be reduced.
First, let us use an example with N t = 3 to illustrate the structure of t t kl . The array response Then according to Equation (13), we have We can see that all the 9 elements in t t kl can be represented by the elements in a t (φ t kl ) and a * t (φ t kl ). Now construct a vector then t t kl can be rewritten as In fact, a kl (4) = (a kl (2)) * and a kl (5) = (a kl (3)) * . Therefore, t t kl can be expressed as a product of a weight matrix and a vector a kl . Furthermore, the weight matrix depends only on the structure of the antenna array and is independent of the path angles.
We then have t t kl = Γ u a kl , and t r kl where being a vector whose i-th entry is 1 and other entries are zero. Γ v is constructed similarly as Γ u , and Γ u and Γ v are both full-rank. This is because Γ u and Γ v consist of 1's and 0's, and there is only one 1 in each row of Γ u and Γ v . Therefore, Equation (12) can be rewritten as where C = ∑ K k=1 ∑ L l=1 a kl b T kl . As shown above, R p is approximately low-rank. Since the fixed weight matrices Γ u and Γ v are full-rank, C is approximately low-rank. Hence estimating a low-rank approximation of R p is equivalent to estimating a low-rank approximation of C. Note that the size of C ∈ C (2N t −1)×(2N r −1) is much smaller than the size of R p ∈ C N 2 t ×N 2 r , and this can greatly reduce the complexity of the problem.

Training
We assume that the channel matrix H remains static during a snapshot and suppose we have T snapshots. For different snapshots, we assume that the AoAs/AoDs and the fraction power γ 2 k remain unchanged, but the small-scale fading gain g kl ∼ CN (0, γ 2 k ) can change [9]. Suppose the transmitter sends out S training beams during each snapshot. For the s-th training beam of the t-th snapshot, we employ the transmitting vector f t,s ∈ C N t and the receiving matrix W t,s ∈ C N r ×K r . Therefore, in each snapshot, after the transmitter sends out S training beams, the receiver receives SK r symbols, and the sampling ratio is SK r /N r N t . We design f t,s and W t,s and their corresponding F RF , f BB , W RF , and W BB realizations for the hybrid structure according to the training scheme in [21] (Section III.D). For the s-th training beam of the t-th snapshot, the received signal is where n t,s is the noise vector and H t is the channel matrix at snapshot t. Without loss of generality, assume identical training symbols s = √ P. By setting f t,s 2 F = 1, the total transmitting power is f t,s s 2 F = P, and the pilot-to-noise ratio (PNR) is defined as where the noise is assumed to be an additive white Gaussian noise (AWGN) with variance σ 2 . In the t-th snapshot and after the transmitter sends out all the S training beams, stack the received signals as . . . where . . .
Suppose the trainings are the same for different snapshots, i.e., f 1,s = f 2,s = . . . = f T,s = f s and W 1,s = W 2,s = . . . = W T,s = W s . We then have and where Σ and Σ n represent the covariance matrices of the received signal y t and the noise n t , respectively. After T snapshots, we can compute the dimension-reduced sample covariance matrix (SCM) of y t as We permute S into S p ∈ C S 2 ×K 2 r in a similar procedure as R is permuted into R p .

Low-Rank Matrix Sensing Problem
We can now formulate the channel covariance estimation problem as a low-rank matrix sensing problem [25]: min where R p is the estimate of R p , A : C N 2 is an appropriate linear map, and ζ 2 is a constant to account for the fitting error. Replacing R p with Equation (18), we can reformulate Equation (26) as min where C is the estimate of C.
In general, problem Equation (27) is a nonconvex optimization problem and difficult to solve. In this paper, we solve the relaxed version of problem Equation (27) [26]: where and µ > 0 is a regularization coefficient. After some manipulations, we have A(Γ u CΓ T v ) = Qvec( C), where and Q ∈ C S 2 K 2 r ×(2N t −1)(2N r −1) . The direct evaluation of C * , which is the nuclear norm (i.e., the summation of the singular values) of C, is computationally expensive. Following [21], C * can be written as Therefore, finding a C to minimize the objective function in Equation (28) becomes finding a pair of (U, V) to minimize A similar low-rank recovery problem is recently studied in [21] for instantaneous mmWave channel estimation, where a training scheme is designed such that the channel can be estimated by solving a matrix completion (MC) problem. A generalized conditional gradient and alternating minimization (GCG-Alt) algorithm is developed, which is shown to be able to provide accurate low-rank solutions at low complexity. In this work, we adapt the GCG-Alt algorithm to solve Equation (32) for our covariance matrix estimation problem. In the following, we discuss the key steps of the GCG-Alt algorithm for solving Equation (32) and refer readers to [21] for more detailed treatments.
The GCG-Alt algorithm consists of a relaxed GCG algorithm and an AltMin algorithm. Let C k−1 be the solution to C at the (k − 1)-th GCG iteration. The relaxed GCG algorithm first produces an output where Z k is the outer product of the top singular vector pair of vec −1 (−∇ f ( C k )). The calculations of vec −1 (−∇ f ( C k )) and the parameter θ k here are different from those in [21]. For problem Equation (32), and the parameter θ k as where q z k = Qvec(Z k ) and R(·) denotes the real part of a number. Since Then the obtained U k and V k are used as the initial input of the AltMin algorithm, i.e., U 0 k ← U k , V 0 k ← V k . After I a iterations of the AltMin algorithm, update U k = U I a k and V k = V I a k . For completeness, we summarize the GCG-Alt algorithm in Algorithm 1. After obtaining C, we have R r p = Γ u CΓ T u and R = P −1 ( R r p ). (u k , v k ) ← singular vector pair of Z k 5: η k ← 2/(k + 1) and determine θ k using Equation (34) 7: : 10: while i k > a do 11: i = i + 1 12: update U i k and V i k via the AltMin algorithm [21] 13: 17: end while 18: Output:

Computational Complexity
Define a flop as an operation of real-valued numbers. Let M = SK r be the number of received symbols during each snapshot. Following the computational complexity analysis in [21], the computational complexity of the GCG-Alt estimator is about 8r est (I a r est + I a + 1) where I a is the number of iterations of the AltMin algorithm and r est is the estimated rank of C by the GCG-Alt estimator. Later in Section 4, we show the computational complexity of the GCG-Alt estimator with specific examples.

Extension to the USPA System
We now follow the same process introduced in Section III. A-D to estimate the channel covariance matrix for USPA systems. To account for the different array structure of the USPA, the weight matrices of Equation (17) are redesigned. For a √ N a × √ N a USPA placed on the yz plane with distance d between adjacent antennas, the array response is where is the array response along the y axis and is the array response along the z axis. We design the weight matrices by examining the structure of T t kl defined in Equation (3) which is written as T where a t y (φ t kl , θ t kl ) and a t z (θ t kl ) are the transmitter array response vectors along the y axis and z axis, respectively. Note that T t kl is block-Toeplitz-Hermitian. Let we can verify that and T y kl and T z kl are Toeplitz-Hermitian. Then for the USPA, T t kl of Equation (8) can be written as In Section III. B, we have expressed t t kl = vec −1 (T t kl ), where T t kl of Equation (8) is Toeplitz-Hermitian matrix, in terms of a weight matrix and a vector. We have similar expressions for the vectorizations of the Toeplitz-Hermitian matrices T y kl and T z kl . Let where Γ y ∈ C N t ×(2 √ N t −1) is the weight matrix and a y kl ∈ C (2 √ N t −1)×1 , and and Γ By exploring the matrix vectorization process, we have where is the weight matrix and is a vector. Then for the USPA system, Γ u of Equation (17) becomes Equation (41) and Γ v of Equation (17) is constructed similarly as Equation (41); the sizes of vectors a kl and b kl of Equation (17) have changed: a kl ∈ C (2 √ N t −1) 2 ×1 and b kl ∈ C (2 √ N r −1) 2 ×1 , and consequently, the size for matrix C of Equation (18) has changed: C ∈ C (2 After obtaining the weight matrices, we can follow the process in Section III. C-D to estimate C and then have the channel covariance matrix estimated as R = P −1 (Γ u CΓ T v ).

Simulations
We now evaluate the performance of our proposed design for fully connected hybrid transceivers with the ULA and USPA.

The ULA System
We assume a carrier frequency of f c = 28 GHz. For the ULA system, N t = 64, N r = 16, K t = 16, and K r = 4. The number of clusters K ∈ {1, 2}, and there are L = 30 rays in each cluster. The horizontal AoDs and AoAs are generated as Equation (15) with υ t h = 10.2 • and as Equation (16) with υ r h = 15.5 • , respectively. The cluster powers are generated following [22] (Table I). We compare the GCG-Alt estimator with the DCOMP estimator in [10], which has varying receiving processing matrices W t,s and transmitting processing vectors f t,s during training and has the best performance among other estimators in [10]. The DCOMP estimator needs a dictionary matrix with G t grid points that is associated with AoD and a dictionary matrix with G r grid points that is associated with AoA. Let L p be the number of paths in the channel, the DCOMP estimator assumes that L p is known. For the DCOMP estimator, we set G t = 2N t = 128, G r = 2N r = 32, and L p = r R . Based on Figure 2, for p e = 0.99, r R = 18 and 24 for K = 1 and 2, respectively. For the GCG-Alt estimator, we set µ = σ 2 , = 0.003, and a = 0.1. The performance metric η [10] is used to measure how close the subspace of R is to the subspace of R, where M ∈ C N t N r ×r R and M ∈ C N t N r ×r R are the singular vector matrices of R and R, respectively. We also use the average of the normalized mean square error to measure their performance. We set PNR = 10 dB and the number of training beams S = 32, and compare the GCG-Alt estimator with the DCOMP estimator under different T. With S = 32 per snapshot, the sampling ratio at each snapshot is SK r /N r N t = 12.5%. The comparison result shown in Figure 3 suggests that when the sampling ratio per snapshot is 12.5%, our proposed estimator requires fewer snapshots to obtain an R whose subspace is close to that of R, as compared to the DCOMP estimator. The NMSE result shown in Figure 4 suggests that our proposed GCG-Alt estimator can obtain a more accurate covariance matrix estimate.    We also compare the computational complexity of the GCG-Alt estimator and the DCOMP estimator. The computational complexity of the DCOMP estimator is about 8TL p G t G r (M 2 + M) flops, where M = SK r . For the GCG-Alt estimator, based on our observations, the number of iterations of the AltMin algorithm I a ≤ 2, the estimated rank r est ≈ 4 when K = 1 and r est ≈ 5 when K = 2. Figure 5 shows the comparison results with different T. We can see that the computational complexity of the GCG-Alt estimator is lower than that of the DCOMP estimator. Also, the computational complexity of the GCG-Alt estimator does not increase as T increases. This is because we use S p ∈ C S 2 ×K 2 r , which is the permutation of the SCM of y t shown in Equation (25), and its size is irrelevant to T. Then we set the number of snapshots T = 40, and compare the GCG-Alt estimator with the DCOMP estimator under different S. The result shown in Figure 6 suggests that when T = 40, the GCG-Alt estimator can obtain a more accurate subspace estimation than the DCOMP estimator when the number of training beams S ≥ 24 per snapshot. Note that S = 24 corresponds to a sampling ratio of 9.375% per snapshot.  The GCG-Alt estimator explores both the Kronecker structure and the block-Toeplitz-Hermitian structure of R while the DCOMP estimator only considers the Hermitian structure of R, so the GCG-Alt estimator can reach an accurate subspace estimation of R with fewer snapshots. We use the same training for different snapshots while the DCOMP estimator uses different trainings per snapshot (i.e., varying W t,s and f t,s ). When S is small (e.g., S ≤ 16), the DCOMP estimator outperforms the GCG-Alt estimator. However, the GCG-Alt estimator performs better when S becomes larger (e.g., S ≥ 24). Note that for the DCOMP estimator, estimating more paths (i.e., L p is large) yields better performance, but its computational complexity also increases.

The USPA System
We next consider the system with the USPA at the transmitter and receiver. The parameters f c , K, L, φ t kl , and φ r kl are assumed the same as in the ULA system. The transmitter has an 8 × 8 USPA (i.e., N t = 64) and K t = 16 RF chains, and the receiver has a 4 × 4 USPA (i.e., N r = 16) and K r = 4 RF chains. We assume the elevation AoD angular spread υ t v = 0 • and the elevation AoA angular spread υ r v = 6 • based on the measurement results in [22]. The elevation AoDs and AoAs are distributed as with the elevation center angles θ t k and θ r k being generated in the same manner as the azimuth center angles in the ULA system. For the DCOMP estimator, we set The parameters L p , µ, , and a for the GCG-Alt estimator and DCOMP estimator are the same as in the ULA system.
We set PNR = 10 dB. The performance comparison with S = 32 under different T is shown in Figure 7 and the performance comparison with T = 40 under different S is shown in Figure 8. We can see that both of the GCG-Alt estimator and the DCOMP estimator achieve higher η for the USPA system. One reason for this is that the USPA system has lower resolution than the ULA system in the azimuth direction even though they have the same number of transmitter and receiver antennas. For the USPA system, the azimuth AoD is resolved by an √ N t = 8-element antenna array and the azimuth AoA is resolved by a √ N r = 4-element antenna array; while for the ULA system, the azimuth AoD is resolved by a N t = 64-element antenna array and the azimuth AoA is resolved by a N r = 16-element antenna array. Therefore, for the same angular spread, the USPA system resolves fewer paths than the ULA system, which results in a lower rank.    We also show the effects of angular spreads on the performance of the estimators. We set υ t v = 0 • , υ r v = 6 • , K = 1, PNR = 10 dB, S = 16, and T = 16. The estimators' performance under different angular spreads for the azimuth AoD/AoA (i.e., different υ t h and υ r h ) shown in Figure 9 suggests that the estimators achieve lower η when υ t h and υ r h are larger.

Conclusions
We have formulated the channel covariance estimation problem for hybrid mmWave systems as a structured low-rank matrix sensing problem by exploiting Kronecker product expansion and the structures of the ULA/USPA. The formulated problem has a reduced dimensionality and is solved by using a low-complexity GCG-Alt algorithm. The computational complexity analysis and numerical results suggest that our proposed method is effective in estimating the mmWave channel covariance matrix.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: