Precise Channel Estimation Approach for a mmWave MIMO System

: Channel estimation is a formidable challenge in mmWave Multiple Input Multiple Output (MIMO) systems due to the large number of antennas. Therefore, compressed sensing (CS) techniques are used to exploit channel sparsity at mmWave frequencies to calculate fewer dominant paths in mmWave channels. However, conventional CS techniques require a higher training overhead for e ﬃ cient recovery. In this paper, an e ﬃ cient extended alternation direction method of multipliers (Ex-ADMM) is proposed for mmWave channel estimation. In the proposed scheme, a joint optimization problem is formulated to exploit low rank and channel sparsity individually in the antenna domain. Moreover, a relaxation factor is introduced which improves the proposed algorithm’s convergence. Simulation experiments illustrate that the proposed algorithm converges at lower Normalized Mean Squared Error (NMSE) with improved spectral e ﬃ ciency. The proposed algorithm also ameliorates NMSE performance at low, mid and high Signal to Noise (SNR) ranges.


Introduction
In accordance with recent research trends, millimeter-wave (mmWave) communication has been found to be a potential candidate for next-generation Wireless Local Area Network (WLAN) and 5G cellular systems [1]. mmWave has a large amount of unused bandwidth, which can provide higher frequencies, higher spectral efficiency, and a relevant rise in channel capacity and support to connect millions of devices with high reliability and low latency [2][3][4][5][6][7]. However, the poor scattering nature of the mmWave channel causes path loss problems; as a result, fewer dominant paths remain in the mmWave channel [8][9][10]. Therefore, the channel matrix can be reconstructed by exploiting information from the remaining paths. In order to reconstruct the mmWave channel, the few dominant paths remaining in the channel matrix have to be estimated at the receiver's end. Conventional channel estimators, such as least-square (LS) estimators and maximum likelihood (ML) estimators, require a higher training overhead, which is not possible for any hybrid mmWave MIMO system. Considering the hardware architecture complexities of mmWave MIMO systems, channel estimation becomes a difficult task. In the literature, two types of approaches are used for mmWave channel estimation. In the first approach, the mmWave channel is estimated by exploiting the sparsity of the channel matrix in the virtual beamspace domain, whereas, in the second approach, the estimation is performed by exploiting the low rank properties of the channel matrix in the antenna domain. In [11][12][13][14], a compressive sensing (CS)-based channel estimation approach was used for a mmWave MIMO system. The basic idea behind this approach is based on the technique in which estimators have to search for a pair angle in a predefined dictionary matrix, which further depends upon the training information. Such estimators exploit sparsity in a predefined dictionary matrix. For high resolution, another asymmetric approach was proposed in [15]. Under this method, atomic norm is used to formulate the angles of arrival (AoAs)/angles of departure (AoDs), finding problems. A beam codebook design-based approach was proposed in [16], in which, based on the information in static dictionaries, a codebook is designed. All the aforementioned techniques are CS-based approaches. For the case of high-dimensional training information, CS-based estimators suffer from high computational load, which can further reduce the spectral efficiency. A matrix completion (MC)-based approach was proposed in [17,18], in which the estimator exploits sparsity and low-rank properties of mmWave channels by two independent procedures.
In this paper, an extended alternating direction method of multipliers (Ex-ADMM) mmWave massive MIMO channel estimation technique is proposed for more accurate estimation rates. mmWave exhibits some important and peculiar effects over the wireless channel, i.e., the channel experiences an unspecified amount of spread over the angular domain due to its easy scattering nature [19]. As a result, a jointly sparse and low-rank channel matrix in the angular domain can be obtained. This the main motivation behind the formulation of a joint optimization problem for efficient recovery of the channel matrix proposed in this paper. ADMM was recently proposed in [20], has received attention for its easy implementation. Fundamentally, ADMM can exploit both the sparsity and low rank properties of any data matrix. Researchers have found that, in many real-world applications, additional information in a data matrix (known as side information) helps complete the matrix with few entries to obtain more accurate estimations. Therefore, the use of side information for matrix completion and factorization has been introduced in various research areas, like statistical signal processing, image processing, statistical learning, computer vision and so on [21][22][23][24]. Side information is very useful to describe the row and column entries of any matrix, and it has been further shown to reduce the complexity of completing a matrix [25,26]. In this work, side information theory is used to obtain the optimal solution for a joint optimization problem. The main contributions of this work are as follows:

1.
A joint optimization estimation problem for a mmWave massive MIMO system based on the Ex-ADMM algorithm is formulated. With the help of side information in matrix completion theory, a training procedure compatible with the hybrid beamforming (HBF) structure, leveraging the low rank and sparsity in angular domains, is designed. In Ex-ADMM, the joint optimization problem is further divided into several subproblems, which are then solved individually.

2.
An Ex-ADMM is proposed for better estimation of the channel matrix. This algorithm is originally derived from [18,[26][27][28], and it is more efficient than the other iterative algorithms described in the results section. In the proposed Ex-ADMM algorithm, a nuclear norm is used for the low-rank approximations of the channel matrix, and l 1 − norm is used to enforce the sparsity, consecutively. In addition, a relaxation factor is introduced to enhance the system performance.
Simulation results demonstrate the performance of the proposed Ex-ADMM algorithm, along with orthogonal matching pursuit (OMP) [16], two-stage sparse representation (TSSR) [29], vector approximate message passing (VAMP) [30] and the alternating direction method of multipliers (ADMM) [18] in terms of normalized mean squared error (NMSE), achievable spectral efficiency (ASE) and convergence. This paper is organized as follows. The mmWave channel model is presented in Section 2. In Section 3, a mmWave channel estimation problem based on Ex-ADMM is formulated. Section 4 provides the training procedure details for the proposed Ex-ADMM. A detailed description of the proposed algorithm is provided in Section 5. Section 6 illustrates the computational complexities of proposed algorithm with respect to OMP, TSSR, VAMP and ADMM. Simulation results and discussion, describing the superiority of proposed algorithm over the benchmark algorithms in terms of NMSE, ASE and convergence, are provided in Section 7. Finally, the conclusion is presented in Section 8.
Notation: We use the following notation throughout the paper. α, a, and A denote a scalar, a vector and a matrix, respectively. A T , A H and A * indicate A's transpose, conjugate transpose and conjugate, respectively. (.) F , (.) * and (.) 1 indicate the Frobenius norm, nuclear norm (sum of all singular values), and l 1 − norm, respectively. The operands • and ⊗ represent matrix Hadamard and Kronecker products. vec(.) and unvec(.) represent the concatenation of columns of the matrix into vectors and the reverse operation, respectively. Expected value is represented by E{.}. A ∈ {0, 1} M×N indicates that A's elements are taken independently with equal probability from the binary set {0,1}.

mmWave Channel Model
Let us consider a N R × N T mmWave massive MIMO system based on the recent models described in [5][6][7]. This system is equipped with N T transmitters at the base station (BS) and N R receivers at the mobile station (MS), N S parallel data streams, and radio frequency (RF) chains, such that N RF ≤ min(N T , N R ) at the transmitter and receiver. At the transmitter, N RF chains are present, such that N S < N RF < N T . The mmWave MIMO system shown in Figure 1 is a combination of two consecutive joint segments: a digital MIMO baseband F BB ∈ C N RF ×N S and analog RF precoder F RF ∈ C N T ×N RF . Similarly, at the receiver, the signal is processed by the consecutive joint segments of the RF combiner W RF ∈ C N R ×N RF and baseband combiner W BB ∈ C N RF ×N S , respectively. In order to get the closest estimation, the transmitter employs N Beam T ≤ N T pilot beam patterns, denoted as {f a ∈ C N T ×1 : f a 2 2 = 1}. Meanwhile, at the receiver end, the receiver employs N Beam R ≤ N R pilot beam patterns, denoted as {w b ∈ C N R ×1 : w b 2 2 = 1} [16], where a and b are the transmitter's training precoding vector and receiver's training combining vector, respectively. For enabling communication, an analog HBF [11] is used, which opens a path for the transmitter to apply a baseband precoder by steering the beam to the receiver. The b-th received vector for the a-th transmitted beam is determined by Appl. Sci. 2020, 10, 4397 3 of 18 Notation: We use the following notation throughout the paper. α, a, and A denote a scalar, a vector and a matrix, respectively.

mmWave Channel Model
Let us consider a N × N mmWave massive MIMO system based on the recent models described in [5][6][7]. This system is equipped with N transmitters at the base station (BS) and N receivers at the mobile station (MS), N parallel data streams, and radio frequency (RF) chains, such that N ≤ min(N , N ) at the transmitter and receiver. At the transmitter, N chains are present, such that N < N < N . The mmWave MIMO system shown in Figure 1 is a combination of two consecutive joint segments: a digital MIMO baseband and analog RF precoder F ∈ ℂ × . Similarly, at the receiver, the signal is processed by the consecutive joint segments of the RF combiner W ∈ ℂ × and baseband combiner W ∈ ℂ × , respectively. In order to get the closest estimation, the transmitter employs N ≤ N pilot beam patterns, denoted as { ∈ ℂ × : ‖ ‖ = 1}. Meanwhile, at the receiver end, the receiver employs N ≤ N pilot beam patterns, denoted as {w ∈ ℂ × : ‖w ‖ = 1} [16], where a and b are the transmitter's training precoding vector and receiver's training combining vector, respectively. For enabling communication, an analog HBF [11] is used, which opens a path for the transmitter to apply a baseband precoder by steering the beam to the receiver. The b-th received vector for the a-th transmitted beam is determined by Here, x a is the pilot vector symbol emitted from the transmitter, v is the observation noise, with zero mean and σ 2 v K N R variance, i.e., CN 0, σ 2 v K N R , and H ∈ C N R ×N T is the channel matrix of the mmWave MIMO system. The received vector y in Equation (1), which is expressed as ], can be written in generalized form as follows: Here, the combiner noise vector Therefore, the received signal matrix Y can be expressed as Here, the received signal matrix Y y 1 , . . . , y N Beam Sci. 2020, 10, 4397 T are independent and identically distributed (I.I.D) complex additive white gaussian noise (AWGN), with zero mean and σ 2 q variance CN 0, σ 2 q . In this paper, we assume pilot symbols are identical, so X = where P t is the average transmitted pilot power [12,16]. According to the hybrid mmWave MIMO structure, the transmitted and received matrices are decomposed at the transmitter and receiver ends, such that F = F BB F RF , and W = W RF W BB , respectively.
Hence, Equation (3) of the received signal matrices Y can be re-written as Here, F RF ∈ C N T ×N T and W RF ∈ C N R ×N R are the transmitted and received beamforming matrices, respectively. F BB ∈ C N T ×N Beam T and W BB ∈ C N R ×N Beam R are the transmitted and received baseband processing matrices, respectively. W is the combiner, such that W ∈ {0, 1} N R , and F is the precoder, such that F ∈ {0, 1} N T .
We adopted the geometric virtual model (GV) of mmWave MIMO system described in [12,17]. According to the GV model, channel H is described as Here, L p is the total number of propagation paths, α l is the complex channel gain of the l-th path, which can be obtained from the random complex gaussian distributions, and CN 0, R ∈ C N R are the array response vectors (ARV) at the transmitters and receivers, respectively. Φ (l) R are the elevation and azimuth AoA and AoD angles at the transmitters and receivers, respectively. These angles are generated by Laplacian distributions, whose means are uniformly distributed over (0,2π). The ARV of a uniform linear array (ULA) [31,32] is described by a(θ) = 1 Here, λ is the wavelength, d is the antenna spacing and θ is the even function of the ARV. The channel model described in Equation (5) can be further written in matrix form (described in a virtual beamspace representation model [33,34]) as follows: Here, A R ∈ C N R ×N R and A T ∈ C N T ×N T are the unitary matrices with the ARV of the receivers and transmitters [33] By using matrix properties, we can say that A H R A R = K N R and A H T A T = K N T , with I N being the N × N identity matrix. In Equation (6), Z is the sparse matrix (which contains few virtual channel gains of greater amplitude) with dimensions Z ∈ C N R ×N T .

mmWave Channel Estimation Problem Formulation
In this section, a channel estimation problem for a mmWave MIMO system is formulated. Without loss of generality, matrix decomposition methods are used for the completion of a low-rank matrix with the help of partially observed data [35,36]. By following the given conditions, Equation (6) Here, the side information of A T and A R is used to find the missing values of matrix H. Originally, H = A R DA H T is the decomposed form of Equation (6). Here, the sum of the singular values of a matrix is obtained by imposing a nuclear norm, which also represents the tightest convex lower bound for any matrix. In this way, the low rank property of matrix H is exhibited by the nuclear norm, however, l 1 − norm bounds D to enforce the sparsity. In contrast, the weighting factors Γ H and Γ D , which depend upon the number of propagation paths, are always considered as positive integers [35].

Proposed Extended ADMM (Ex-ADMM)-Based Channel Estimation Scheme
In the following section, the proposed Ex-ADMM is described in detail. From Section 3, Equation (7) is clearly a convex function with two objectives. Therefore, there are many possible ways to get the global optimal solution. However, the best ways to solve the convex problem are first-order methods, which only require the first-order information of the optimization problem. Since these methods are computationally expensive, alternating optimization techniques (AOTs) [26] are used to obtain the optimal solution to a convex optimization problem, due to their less complex structure and easy handling. ADMM [20] is one very popular and efficient AOT. To solve the convex optimization problem described in Equation (7), an extension of ADMM [28] known as extended ADMM (Ex-ADMM) is proposed for the channel estimation of a mmWave MIMO system.
In order to solve the optimization problem described in Equation (7) by Ex-ADMM, we reformulate the optimization problem, and two auxiliary matrices, I ∈ C N R ×N T and J I − A R DA H T , are introduced. As a result, the reformulated optimization problem can be re-written as, minmize H,I,D,J The first term in the optimization problem described in Equation (8) is the side information of the matrix, i.e., A R and A T , along with the virtual channel gain in matrix Z. The second term is the information on subsampled virtual channel gain. The third term in the optimization problem is described by Equation (8), and considers the discretization error, and the fourth term is the AWGN noise. Afterwards, Equation (8) can be re-written with consideration of the augmented Lagrangian function (ALF) as follows: Appl. Sci. 2020, 10, 4397 6 of 19 Here, P 1 and P 2 ∈ C N R ×N T are Lagrange multipliers. β is the step size of Ex-ADMM and β > 0. For the next iteration, the Ex-ADMM can split Equation (9) into six sub-equations and solve it alternatively, i.e., In Equations (10)-(15), H (l+1) is involved in every step; henceforth, H (l+1) is known as an intermediate variable. In contrast, variables I, D and J are known as essential variables, and Lagrange multipliers P 1 and P 2 are recognized as dual variables. Another ADMM algorithm mentioned in [18] is also able to solve the optimization problem in (9) but the proposed Ex-ADMM algorithm converges at a lower NMSE as compared to the ADMM and provides better ASE performance as well.
In general, ADMM updates its order in an arbitrary way, i.e., H → I → D → J → P 1 → P 2 . In contrast, the Ex-ADMM updates its order in a cyclical way, i.e., H → P 1 → P 2 → I → D → J . In the proposed Ex-ADMM, reordering of the ADMM is only done to ensure the channel matrix H satisfies the first-order optimality conditions [39]; therefore, right after the updating of H (l+1) , dual variables P 1 and P 2 are updated and then, at last, the essential variables I, D and J are updated. A relaxation technique, along with a relaxation parameter, is used to relax the essential variables [41,42]. Consequently, for the relaxation of the essential variables, we can assume that the cyclical order of dual and essential variables in Equations (11)-(15) is a block variable, i.e., z (l) = P , . Thus, the final relaxed variable, z (l+1) = P 1 (l+1) , P 2 (l+1) , I (l+1) , D (l+1) , J (l+1) , can be generated as Here, roughly speaking, the tilde (∼) variables (i.e., P 1 (l+1) , P 2 (l+1) , I , D and J ) are auxiliary variables, and they can be updated using the main variables (i.e., P (l+1) 1 , P (l+1) 2 , I (l+1) , D (l+1) and J (l+1) ), and these variables can be obtained using the Equations (11)-(15) as follows: Therefore, the final relaxed variable can be expressed as From the above discussion, it is clear that the Equations (10) and (21) have several advantages: i.e., the optimization problem described in Equation (9) turns into six individual subproblems, and these can be solved further without any strict conditions. Hence, the global optimal solution can be derived effortlessly. This helps reduce the computational complexity and storage requirements. For the next iteration, the values of the subproblems described in Equations (10) and (21) have to be updated. In order to do so, first of all, the closed-form solutions of Equations (10)-(15) need to be obtained. The advantages of Ex-ADMM over ADMM are described further in Section 5.

Procedure for Updating H (l+1)
In order to obtain the closed-form solution for H, reformulate L A to L 1 and consider all terms with respect to H in Equation (9). Keeping only the terms that are related to it, Here, Equation (22) is the solution of Equation (10). To get the closed-form solution, a singular value thresholding (SVT) operator [43] is implemented on Equation (22): Here, U ∈ C N r ×r and V ∈ C N r ×r are the side singular vector matrices of the matrices (I (l) − 1 β P (l) 1 ) and h i µ i − Γ β , respectively, where Γ is the SVT threshold operator and µ i denotes the r singular values.

Procedure for Updating I (l+1)
To update I (l+1) , reformulate L A to L 2 , and, to get the closed-form solutions for I, consider all terms of I in Equation (9) and differentiate with respect to I: where K is the identity matrix, A i exhibits the k-th row, and E kk is derived by inserting unit values in the N R × N R zero matrix at its (k,k)-th position and B A * T ⊗ A R . Therefore, the closed-form solution of i for the (l + 1) iteration is To get the final solution, unvectorized i (l+1) , i.e.,

Procedure for Updating D (l+1)
To get the solution for D, reformulate L A to L 3 and consider all terms corresponding to D in Equation (9). Keeping only the terms that are related to it, Here, A R and A T are the unitary matrices. To get the closed-form solution of L 3 , Equation (28) is transformed into the standard least absolute shrinkage and selection operator (LASSO) problem [44]. In order to do this, vectorization is performed on Equation (28): Let us assume V (l+1) = A H R A T 1 β (P (l+1) 2 − J (l+1) + I (l+1) and v (l+1) = vec V (l+1) . Therefore, Equation (29) can be written as Thus, to obtain the estimate of d in Equation (30), a soft thresholding operator is imposed for (l + 1) iterations:

Procedure for Updating J (l+1)
To update J (l+1) , reformulate L A to L 4 , and, to obtain the closed-form solution for J, take all terms corresponding to J in Equation (9) and differentiate with respect to J: By setting L 4 to zero, i.e., L 4 = 0, Hence, for iteration (l + 1), The dual variables P 1 and P 2 can be directly updated using the H, I, D and J variables.

Algorithm Description
Algorithm 1 demonstrates the proposed Ex-ADMM-based mmWave channel estimation scheme, which is originally derived from the ADMM described in [18,28]. In the proposed Ex-ADMM algorithm, first of all, the parameters H (0) = P (0) = 0 are initialized [18]. The main objective of the proposed algorithm is to update H (l+1) (in step 2). This step is the most crucial task, which is done by deriving the Lagrangian L A to L 1, and then implementing the SVT operator on Equation (22). In order to efficiently update H (l+1) , the SVT operator computes the µ i singular values of h i . At every instant, the proposed algorithm is required to run the subsampled version of H as an input, which is known as H Ψ . For the training procedure, the mmWave MIMO model detailed in Section 2, inspired from [12,45], has been adopted. For the general case, let us assume that only a single set of transmitter and receiver antennas are operational at each illustration of t. According to the adopted hardware structure described in Section 2, the transmitter's training precoding vectors and the receiver's training combining vectors have non-zero (unit) values at their respective ij-th position in matrix Ψ. At any t-th training illustration, followed by the matrix Ψ, the subsampled matrix H Ψ also has the estimated non-zero values in the ij-th positions. In contrast, the length of the training symbols is equal to the position of the non-zero elements, i.e., T = M and M N R N T . This factor is the stopping criterion of the proposed Ex-ADMM algorithm. In step (3) of the proposed Ex-ADMM algorithm, to maintain the cyclical order, dual variables P 1 (l+1) and P 2 (l+1) are updated first, with the help of Equations (16) and (17). The values of the dual variables P 1 and P 2 can be directly calculated using Equations (23), (26), (31) and (34).
In step (4), the first subproblem, I , has been updated using Equation (18). Steps (5) and (6)  , using Equations (19) and (20), , D (l+1) and J (l+1) ) have been updated in step (7) using Equation (21)). The relaxation factor γ provides better NMSE performance and a slightly improved convergence rate for all training lengths. Although Ex-ADMM is better than the other benchmark algorithms described in this paper, there is a major drawback regarding the knowledge produced during its implementation. When the proposed Ex-ADMM algorithm is implemented to solve sparse optimization problems on I (l+1) , D (l+1) , J (l+1) , P ), (P (l) 1 and P 1 (l) ) and (P (l) 2 and P 2 (l) ), respectively. The most expedient way to overcome this problem is, when the proposed algorithm reaches stopping criterion, to run only the cyclical part of the algorithm for one final iteration to ensure the sparsity (i.e., step 8).

Require : Subsampled matrix H Ψ , side information matrices A R and A T , and the set of indices of observed entries in Ψ Input
: H Ψ , Ψ, A R , A T , ρ, γ, Γ H , Γ Z and I max Output : Estimated output channel matrixĤ = H (I max ) Initialization: : Update H (l+1) by using Equation (23).

Step 9
: end do Step10 : end for
For the general case, TSSR is much faster than any one-stage method. The computational complexity of TSSR is dependent upon the calculation of its maximum diagonal elements and the smaller number of off-diagonal elements in its sparse matrix. The complexity order of the TSSR algorithm is O(nlog(n)), [46], where n is the full column rank of the targeted sparse matrix.
In VAMP, the complexity is dominated by the matrix vector multiplication. The computational complexity of VAMP is the order of O(N R N T Llog(N R N T L)) [47], where N R and N T is the number of receivers and transmitters, respectively, and L is number of channel paths or the sparsity level of the channel.
In OMP, the computational complexities depend upon the sparsity of the dictionary matrix and the number of grids. Mathematically, this can be expressed as O LlnG 2 . Here, L is the number of paths or the sparsity level of the channel and G is the grid of the dictionary matrix in which the AoAs and AoDs are distributed uniformly [16]. The computational complexity of ADMM reflects the number of iterations, I max , and the number of transmitting and receiving antennas, i.e., N T and N R , respectively. In ADMM, appropriately, matrix factorization can be performed offline, and only the matrix vector product has to be calculated online. This reduces the computational complexities of the ADMM in a significant way. The complexity of ADMM can be expressed as O N 2 R N T [18], where N R and N T are the numbers of receivers and transmitters, respectively. The proposed Ex-ADMM algorithm is originally derived from ADMM. Therefore, the complexity of the proposed Ex-ADMM algorithm remains the same as that of ADMM, i.e., O N 2 R N T .

Simulation Results and Discussion
In this section, the numerical results are explained to showcase the accomplishment and performance of the proposed algorithm (Ex-ADMM) with different standard algorithms.

System Model
The HBF system architecture was adopted as described in [45]. A total of 64 transmitting and receiving antennas, i.e., N T = N R = 64, were considered at the BS and MS. The antennas were assumed to be in a ULA configuration. Laplacian distributions with 55 • standard deviation were used to produce the Azimuth angle of the AoAs and AoDs. Phase shifters had the quantized phase, whereas spacing between antennas was determined by d = λ 2 .

Channel Model
A mmWave channel with two paths and two clusters is considered here. Over the range of [0,2π], AoAs and AoDs were uniformly distributed. The noise is complex additive white gaussian noise (AWGN), with zero mean and σ 2 q variance. A signal-to-noise Ratio (SNR) of 30 dB, mathematically defined as SNR σ −2 q , was used for simulation. The frequency of the mmWave channel model used for simulation was 90GHz.

Simulation Environment
In this section, we interpret the results of a comparison between the proposed Ex-ADMM algorithm and the TSSR, OMP, VAMP and ADMM algorithms, in terms of ASE, NMSE, the convergence of Ex-ADMM with respect to ADMM and SVT, and the effect of NMSE on the proposed Ex-ADMM, TSSR, OMP, VAMP and ADMM algorithms for multiple paths. An average of 100 independent iterations and 100 Monte Carlo realizations were considered for the simulations [48]. Three different training symbol lengths were considered for training purposes, i.e., T = 400, 800 and 1200.

Results and Discussions
OMP determines the sparsity of the channel or dictionary matrix; VAMP also uses this concept with a few differences. On the other hand, TSSR individually exploits the low rank property of the channel matrix along with the sparsity. At first, an SVT operator was implemented to recapture the channel matrix H. The SVT threshold operator Γ = β H ψ , while β = 3M N R N T was fixed to exploit the low rank property. Channel sparsity depends upon the number of paths in the mmWave channel, i.e., the sparsity of the channel is fixed with the number of propagation paths. For the case of the Ex-ADMM algorithm, the weighting factor of the channel matrix is Γ H = β H ψ , where β is set to be 0.005, and the weighting factor with respect to the sparse matrix D is obtained by . Under the aforementioned constraints, an analysis was conducted, and results are explained as follows.

Comparison of ASE
In Figure 2a-c, the ASE (in bits/sec/Hz) at the SNR points for OMP, TSSR, VAMP, ADMM and the proposed Ex-ADMM algorithm with perfect CSI is shown. The following expression is used to calculate the ASE [49,50],

HH H
The performance of the OMP in terms of ASE at low-to-mid SNR points is moderate, and it gets worse at mid-to-high SNR points as the training symbols' length is increased, i.e., T < 400. As is clear from Figure 2a, for all SNR points, VAMP performs very poorly for smaller training symbol lengths; i.e., T < 400. When the training symbol length is increased, i.e., T ≥ 800, the performance of VAMP increases significantly, as depicted in Figure 2b,c. It was found that, for different T values, the performance of TSSR is very bad at almost every SNR point relative to OMP, TSSR, VAMP, ADMM and Ex-ADMM. ADMM, at all SNR points, performed better than OMP, TSSR and VAMP for all training symbol lengths. In the case of Ex-ADMM, for T = 400, Ex-ADMM outperformed the OMP, TSSR, VAMP and ADMM algorithms at all SNR points. For T = 800 and 1200, at low-to-mid SNR points, ADMM and Ex-ADMM performed similarly, but, as the SNR range increased from mid to high, Ex-ADMM outperformed ADMM. Moreover, for high training symbol lengths, Ex-ADMM was very close to achieving the perfect CSI.
i.e., the sparsity of the channel is fixed with the number of propagation paths. For the case of the Ex-ADMM algorithm, the weighting factor of the channel matrix is Γ = β H , where β is set to be 0.005, and the weighting factor with respect to the sparse matrix D is . Under the aforementioned constraints, an analysis was conducted, and The performance of the OMP in terms of ASE at low-to-mid SNR points is moderate, and it gets worse at mid-to-high SNR points as the training symbols' length is increased, i.e., T < 400. As is clear from Figure 2a, for all SNR points, VAMP performs very poorly for smaller training symbol lengths; i.e., T< 400. When the training symbol length is increased, i.e., T ≥ 800, the performance of VAMP increases significantly, as depicted in Figure 2b,c. It was found that, for different T values, the

Comparison of NMSE
The performance of OMP, TSSR, VAMP, ADMM and the proposed Ex-ADMM algorithm in terms of NMSE, at different SNR points with respect to different training symbol lengths T, is depicted in Figure 3a-c. The following relation is used to calculate NMSE: whereĤ is the estimated channel matrix and H is the true channel matrix.Ĥ is determined by H (I max ) . The performance of OMP at different SNR points is almost constant for all training symbol lengths, and, due to its huge dictionary matrix, does not change for any increment of T. Thus, OMP may suffer from a discretization problem. One can therefore say that the OMP is not capable enough to recover small training symbols.
Appl. Sci. 2020, 10, 4397 13 of 18 The performance of OMP, TSSR, VAMP, ADMM and the proposed Ex-ADMM algorithm in terms of NMSE, at different SNR points with respect to different training symbol lengths T, is depicted in Figure 3a-c. The following relation is used to calculate NMSE: where H is the estimated channel matrix and H is the true channel matrix. H is determined by H ( ) . The performance of OMP at different SNR points is almost constant for all training symbol lengths, and, due to its huge dictionary matrix, does not change for any increment of T. Thus, OMP may suffer from a discretization problem. One can therefore say that the OMP is not capable enough to recover small training symbols.  On the other side, it was found that VAMP exhibited poor performance at T < 800, but, as depicted in Figure 3b,c, the performance of VAMP increases as the training symbol length increases. VAMP is used to calculate statical information on sparse signals, which is impossible in case of small training symbol lengths, but, at low-to-mid SNR points, as the training symbol length increases to T > 800, it exhibits a significant improvement in NMSE performance compared to other algorithms. TSSR is used for two-stage estimation. In the simulation results, TSSR is not capable of recovering the estimated values for low and high training lengths. TSSR exploits the low rank and sparsity of any channel matrix individually. In contrast, ADMM has the capacity to exploits low rank and sparsity together, with the channel matrix for any training symbol length. On the other side, it was found that VAMP exhibited poor performance at T < 800, but, as depicted in Figure 3b,c, the performance of VAMP increases as the training symbol length increases. VAMP is used to calculate statical information on sparse signals, which is impossible in case of small training symbol lengths, but, at low-to-mid SNR points, as the training symbol length increases to T > 800, it exhibits a significant improvement in NMSE performance compared to other algorithms. TSSR is used for two-stage estimation. In the simulation results, TSSR is not capable of recovering the estimated values for low and high training lengths. TSSR exploits the low rank and sparsity of any channel matrix individually. In contrast, ADMM has the capacity to exploits low rank and sparsity together, with the channel matrix for any training symbol length.

(a)
Estimation, in the proposed Ex-ADMM algorithm, is done by SVT implemented on H, which exploits the low rank property of H and l 1 − norm, enforced on the submatrix to ensure the sparsity. Estimation of H ψ becomes noisier, as the proposed Ex-ADMM algorithm and ADMM lack array gain, but the noise is not severe enough to cause a major effect on the estimated values. The proposed Ex-ADMM algorithm performed better than OMP, TSSR, VAMP and ADMM in terms of NMSE at different SNR points with respect to different training symbol lengths.

Comparison of Convergence
In Figure 4a-c, three values of γ, i.e., γ = 0.5, 1 and 1.5 are used for simulation to demonstrate the effect of the relaxation factor on convergence with respect to the NMSE of the proposed Ex-ADMM algorithm. As the value of γ increases from 0.5 to 1 and then to 1.5, the convergence of proposed algorithm gets slightly better and converges to smaller NMSE values for all training symbol lengths (i.e., T = 400, 800 and 1200) as compared to ADMM and SVT. Effects on NMSE for different algorithms of multiple paths are shown in Figure 4d. Due to the poor scattering nature of mmWave, the performance of all algorithms gets worse with escalation of the number of paths, L P . However, the performance of the proposed algorithm is still better than that of the others.
VAMP is used to calculate statical information on sparse signals, which is impossible in case of small training symbol lengths, but, at low-to-mid SNR points, as the training symbol length increases to T > 800, it exhibits a significant improvement in NMSE performance compared to other algorithms. TSSR is used for two-stage estimation. In the simulation results, TSSR is not capable of recovering the estimated values for low and high training lengths. TSSR exploits the low rank and sparsity of any channel matrix individually. In contrast, ADMM has the capacity to exploits low rank and sparsity together, with the channel matrix for any training symbol length.

Conclusions
In this paper, an extended version of ADMM (Ex-ADMM) was proposed for mmWave channel estimation. In the proposed scheme, a joint optimization problem was formulated, exploiting the sparsity and low rank properties of the channel matrix. The proposed Ex-ADMM algorithm exploits both the properties of targeted optimization problems by breaking them into several subproblems. These subproblems are then solved effortlessly, by acquiring their closed-form solutions independently. A relaxation factor was introduced to converge the proposed algorithm to smaller NMSE values for all training symbol lengths. Comprehensive simulation experiments were performed to validate the performance of Ex-ADMM. The proposed Ex-ADMM algorithm outperformed other benchmark algorithms in terms of ASE, NMSE and convergence rate.