Variational Sparse Bayesian Learning for Estimation of Gaussian Mixture Distributed Wireless Channels

In this paper, variational sparse Bayesian learning is utilized to estimate the multipath parameters for wireless channels. Due to its flexibility to fit any probability density function (PDF), the Gaussian mixture model (GMM) is introduced to represent the complicated fading phenomena in various communication scenarios. First, the expectation-maximization (EM) algorithm is applied to the parameter initialization. Then, the variational update scheme is proposed and implemented for the channel parameters’ posterior PDF approximation. Finally, in order to prevent the derived channel model from overfitting, an effective pruning criterion is designed to eliminate the virtual multipath components. The numerical results show that the proposed method outperforms the variational Bayesian scheme with Gaussian prior in terms of root mean squared error (RMSE) and selection accuracy of model order.


Introduction
Channel modeling plays a pivotal role in the design and evaluation of wireless communication networks. In the fifth-generation (5G) mobile communication systems, the diversity of the communication scenarios leads to the urgent need for advanced channel modeling techniques with higher accuracy and lower implementation complexity. In the existing channel measurement campaigns, the channel sounding technique based on pseudo-random (PN) sequences has widely been adopted due to its higher processing gain and delay resolution. As for the estimation of the multipath characteristics such as power delay profile, Doppler power spectrum, and frequency response from the obtained channel responses, super-resolution algorithms such as space-alternating generalized expectationmaximization (SAGE) [1] and subtractive deconvolution technique (CLEAN algorithm) [2] are widely applied. However, since the SAGE and CLEAN algorithms are based on the maximum likelihood criterion, the selection of model order, i.e., the number of multipath, is a nontrivial task. The estimated order of the channel model tends to be overestimated because of the contamination of noise. The unnecessary "virtual paths" will not only affect the accuracy of the channel coefficients estimation but also increase the computational cost of the algorithm. To solve this problem, different techniques such as negative log-evidence (NLE) [3] and Bayesian information criterion (BIC) [4] have been proposed to estimate the model order. However, the estimation performance of these model estimation schemes is far from ideal, especially when considering the low signal-to-noise ratio scenarios [5]. Since considerable computation is needed to find the optimal model order value, the existing techniques also suffer from prohibitive complexity.
Due to the fact that the multipath components (MPC) are sparse in channel impulse responses (CIR) [6], the joint estimation problem of channel parameters and channel model order can be addressed by sparse Bayesian learning (SBL). SBL is a typical machine learning Entropy 2021, 23, 1268 2 of 13 algorithm that was first proposed by Tipping in 2001 [7] and has been developed into an important branch of compressed sensing reconstruction algorithms. The basic idea of SBL is to introduce the sparsity parameter to establish a hierarchical Bayesian model. By defining the priors on the sparse vector, SBL can automatically determine the position and amount of the non-zero elements in the reconstructed signal without pre-specifying the sparsity number. SBL is a promising method for the joint estimation of channel model order and the channel response coefficients. However, even after the hierarchical Bayesian model is established, the posterior PDF cannot be directly derived due to an intractable integral of the joint probability density. Thus, the variational inference is introduced to obtain an approximation of the parameter posterior in the SBL framework. This method, also known as variational sparse Bayesian learning (VSBL) [8], has been widely applied in online spectrum estimation [9], orthogonal time-frequency space (OTFS) detector [10], multi-user joint decoding [11], joint signal detection [12], and high-resolution radar imaging [13], etc. For channel parameter estimation, the idea of complete hidden data is introduced to speed up the convergence in [5]. The simulation result shows that the VSBL scheme outperforms the BIC criterion in both root mean squared error (RMSE) and model order selection. In [14], the delay of MPCs is modeled as the Poisson distribution, which incorporates the Fast VSBL based channel estimation as a priori information and significantly improves the bit error ratio (BER) performance.
Several channel measurement campaigns conducted in high-speed railways [15], airport grounds [16,17], and inland rivers [18] have shown that Weibull and Nakagamim distributions are more suitable for modeling channel characteristics compared with Rayleigh and Rice distribution. This phenomenon originates from the uneven distribution of scatterers in the mobile communication environment. However, in most VSBL models, the channel taps were assumed to be Gaussian distributed [19,20], which is not appropriate for parameter learning. In this paper, the Gaussian mixture model (GMM) is adopted to describe the statistical characteristics of the MPCs. A GMM is linearly composed of K Gaussian PDFs with corresponding means and covariances. By using sufficient Gaussian components, the GMM can be flexible to approximate any given PDF [21], which is suitable for establishing various fading channels models.
The contributions of this paper are summarized as follows: • The Gaussian mixture model as a powerful method for sparse parameter learning for wireless channels is introduced to the estimation problem of wireless channel parameters under the VSBL framework. The flexibility of GMM is capable of describing the statistical characteristics of both theoretical general channels and complex real-world channels.

•
A new variational Bayesian inference scheme for the Gaussian mixture model (VB-GMM) is developed based on multiple channel observations. The corresponding graphical model is given, and the closed-form updates of the model variables are derived. By setting a pruning criterion on the sparsity priors, the joint estimation of channel parameters and model order is achieved with low complexity.

•
The simulation results demonstrate that the performance of VB-GMM is superior to the existing algorithms in terms of the estimation error, the convergence rate, and the model order selection accuracy in most non-Gaussian channels.
Throughout the paper, we use the following notations. (·) H and (·) −1 denote the Hermitian transpose and the inverse of matrices, respectively. The expression CN (µ, Σ) represents the multivariate complex Gaussian distribution with mean vector µ and covariance matrix Σ. Ga(α|a, b) denotes the Gamma distribution p(α) = b a Γ(α) · α a−1 e −bα , α > 0 in which a, b and Γ(·) represent shape parameter, scale parameter and Gamma function, respectively. We use calligraphic uppercase letters to represent sets, e.g., I denotes the index set of MPCs and I(l) is the complement set of l ∈ I. The remainder of the paper is organized as follows. In Section 2, the signal model is introduced, and the graphical model of VB-GMM is presented. In Section 3, the derivations of the VB-GMM estimators are given Entropy 2021, 23, 1268 3 of 13 followed by the initialization algorithm and the pruning criterion. Finally, in Section 4, the numerical results are presented to demonstrate the effectiveness of the proposed scheme.

Signal Model
Consider a single-input-single-output (SISO) wireless channel measurement system, the sounding signal is denoted as s(t) = ∑ , in which the pseudo-random sequence {p i }, i = 0, . . . , N C − 1 has N C chips and the shaped pulse b(t) has the duration T c . Assume that N S copies of s(t) are transmitted periodically to obtain the time-varying channel responses. After passing through the multipath channel, the signal at the receiver is made of L duplications of s(t) with additive Gaussian white noise η(t), and the received signal corresponding to the n-th s(t) can be expressed as where w ln is the complex coefficient of the l-th multipath and θ ln denotes the dispersion parameter vector including multipath delay, DoA, DoD, etc. We define y n = [y n (0), y n (T s ), . . . , y n ((D − 1)T s )] T as the discrete samples of the n-th received signal in which T s is the sampling interval at the receiver and D represents the number of discrete samples, i.e., DT s = N c T c . The discrete vector y n is considered as an observation of the measured channel. In our proposed scheme, an Bayesian probabilistic model is established to infer the posterior probabilities of the channel parameters {w ln , θ ln , L} based on multiple observations. Since the proposed algorithm is developed based on time-varying channels, it is crucial to determine the wide-sense stationary (WSS) region in which the statistical characteristics of channel parameters are assumed invariant. For vehicle-to-vehicle channel, the spatial distance of 20~40 times the wavelength is considered to be WSS [22]. For convenience, we denote the multiple observation matrix as y = [y 1 , y 2 , . . . , y N ] D×N , where N represents the number of channel responses obtained in a WSS region. If the spatial distance of the WSS is chosen properly according to the criterion, the dispersion parameter vector is assumed to be constant for the observations, i.e., θ l1 = . . . = θ lN = θ l , n = 1, . . . , N. Thus Equation (1) can be rewritten in a discrete form as: w ln s(θ l ) + η (2) in which η~CN (0,Σ) is the Gaussian white noise vector.
In the SAGE algorithm, the complete parameter set Ω is divided into several subsets Ω sub so that the joint estimation for Ω is decomposed into optimizations for Ω sub in a sequential manner. Let x sub denote the admissible hidden data with respect to Ω sub , when the equation p(y|x sub , Ω ) = p y x sub , Ω sub p(x sub |Ω ) is satisfied, the likelihood corresponding to the estimation monotonically increases and may converge to the local maximum [23]. In this paper, the concept of admissible hidden data x l is also adopted. Consider an MPC l we have x l = w l s(θ l ) + η l in which η l represents the noise that the l-th MPC contribute to the total noise η with E η l η H l = Σ l = βΣ, 0 ≤ β ≤ 1, and the likelihood p(x l |θ l , w l ) = CN (x l ; w l s(θ l ), Σ l ) . Then the covariance of the noise that is not related to the l-th component is ∆ l = (1 − β)Σ, which can also affect the estimation of x l when observations are obtained and fixed. The noise covariance has a significant impact on the estimation results of channel parameters. Since the length of the channel sounding frame is generally much longer than the channel delay spread, most of the channel response samples are considered to be measurement noise. In this paper, the noise covariance is computed by the tail of CIRs.

The GMM Model of Channel Coefficients
Denote the N channel coefficients of the l-th multipath as w l = [w l1 , w l2 , . . . , w lN ] 1×N . Assume that w l obeys the Gaussian mixture distribution with K components: where π l = [π l1 , π l2 , . . . , π lK ], ∑ K k=1 π lk = 1 is defined as mixing coefficients. µ lk and Λ lk represent the complex mean and covariance matrix of the k-th Gaussian distribution, respectively. For w l , each sample can only be derived from one component of the Gaussian mixture distribution. Therefore, the discrete hidden variable c l is introduced. c l is a 0-1 matrix in which c lnk = 1 means the channel coefficient w ln corresponds to the k-th Gaussian component. Given the mixing coefficients, the probability of hidden variables takes a product form: For the variable π l a Dirichlet distribution prior is applied: i.e., γ l1 = γ l2 = · · · = γ lK . The Dirichlet distribution is introduced here as the conjugate prior of the polynomial likelihood function p(c l |π l ), which means the posterior of c l has the same form of Dirichlet distribution. In variational inference, the application of conjugate priors makes it convenient to merely update the parameters of the density functions with their form unchanged. The graphical model for the variational Bayesian-based GMM is presented in Figure 1. Denote the parameter set as Ω = {θ, w, µ, c, α, π} the joint probability density can be rewritten according to the structure of the graph: in which The sparsity prior of l-th path p(w l |α l , c l , µ l ) is a key to restrain the gain of MPCs and implement model order selection. The sparsity prior is usually chosen as Laplace [24] or Gaussian distributed [7]. The Laplace prior leads to a l 1 -type sparsity and Gaussian prior a l 2 -type sparsity [5]. In the proposed scheme each component of the GMM is specified with an independent Gaussian sparsity prior, i.e., p(w ln |α lk ; c lnk = 1 ) = CN µ lk , α −1 lk . Thus consider all the observations for the l-th path, we define The nonnegative sparsity parameter α lk is inversely proportional to the width of the Gaussian function. Consider a situation that α lk → ∞ , the Gaussian function will Entropy 2021, 23, 1268 5 of 13 be concentrated at the mean value µ lk , which renders the MAP estimations of channel coefficients to be constant. Due to the time-varying nature of wireless channels, the channel coefficients cannot be a constant value over time, unless they are concentrated at the value of zero, i.e., the l-th MPC is considered to be the measurement noise and should be removed. By setting a large threshold for α l , the irrelevant MPCs are pruned and the sparsity is obtained. The prior of the sparsity parameter p(α lk ) is set to be a Gamma distribution p(α lk ) = Ga(α lk |a lk , b lk ) with a non-informative hyperprior for all branches, i.e., a lk = b lk = 10 −7 , k =1, 2, . . . , K. Such choice is proper when no prior information is available, and it has been widely applied in various Bayesian hierarchical models, e.g., [5,7,8].
Thus consider all the observations for the l -th path, we define ( ) The nonnegative sparsity parameter lk α is inversely proportional to the width of the Gaussian function. Consider a situation that → ∞ lk α , the Gaussian function will be concentrated at the mean value lk μ , which renders the MAP estimations of channel coefficients to be constant. Due to the time-varying nature of wireless channels, the channel coefficients cannot be a constant value over time, unless they are concentrated at the value of zero, i.e., the l -th MPC is considered to be the measurement noise and should be removed. By setting a large threshold for α l , the irrelevant MPCs are pruned and the sparsity is obtained. The prior of the sparsity parameter ( ) lk p α is set to be a Gamma distri- with a non-informative hyperprior for all branches, i.e., Such choice is proper when no prior information is available, and it has been widely applied in various Bayesian hierarchical models, e.g., [5,7,8].

GMM-Based Variational Bayesian Learning
In this section, the VSBL is applied to the estimation problem of the Gaussian mixture channel model. First, the basic principle of variational Bayesian inference is introduced. Then the update expression of each variable in the graphical model is derived. Lastly, the appropriate initialization algorithm and corresponding parameter settings are given.

Variational Bayesian Inference
Consider a probabilistic model defined by observed variables  , hidden variable sets  , and the corresponding joint probability density ( ) , p   . Our target is to obtain the posterior ( ) ( ) ( ) However, the computation of the marginal likeli- Therefore, a PDF with a simpler structure is introduced to approximate the posterior distribution, i.e.,

GMM-Based Variational Bayesian Learning
In this section, the VSBL is applied to the estimation problem of the Gaussian mixture channel model. First, the basic principle of variational Bayesian inference is introduced. Then the update expression of each variable in the graphical model is derived. Lastly, the appropriate initialization algorithm and corresponding parameter settings are given.

Variational Bayesian Inference
Consider a probabilistic model defined by observed variables O, hidden variable sets H, and the corresponding joint probability density p(H, O). Our target is to obtain the posterior p(H|O ) = p(H, O)/p(O). However, the computation of the marginal likelihood function p(O) is intractable since it involves a multi-dimensional and complicated integral p(H, O)dH. Therefore, a PDF with a simpler structure is introduced to approximate the posterior distribution, i.e., q(H) ≈ p(H|O ) (11) where q(H) is the variational posterior distribution (VPD). In the variational Bayesian framework, the Kullback-Leibler (KL) divergence is defined as To obtain a deterministic approximation of the posterior, (12) should be minimized, and the optimal VPD q * (H) is achieved only if KL(q p ) = 0. Using Bayes theorem to replace p(H|O ), it makes the derivation of VPD a tractable optimization problem: For simplicity, we adopt the mean-field theory and assume the joint PDF factorizes as For most of the variables including {w, α, c, π, x}, (13) is optimized over the exponential family of distributions. In this case, the KL divergence vanishes, and the optimal solution of (13) is assumed to be achievable. For µ and θ, the optimization on q s (H s ) is restricted on Dirac measures to obtain point estimates. By optimizing (13), the solution that minimizes the KL divergence is: In (15), E q(x) (·) denotes the expectation with respect to the distribution q(x),MB{H s } represents the Markov blanket [25] of the variable H s . In the Bayesian network, the Markov Blanket of a node is the set of the nodes comprised of the node's parents, its children, and its co-parents. I.e., MB{H s } is the minimal set around H s which makes H s independent of the rest of the variables.

Update of the Estimation Expressions
As discussed in Section 2, a hierarchical Bayesian model is set up to find sparse MPCs and infer the parameter sets Ω = {θ, w, µ, c} from the measurements y. A detailed derivation of the VPDs for each factor is given below. The VPDs can be updated in any order during the iteration process.

1.
Estimation of q(x l ): From the graphical model in Figure 1 and (15), the VPD of x l is expressed as q(x l ) ∝ exp E q(w I )q(θ I ) (ln{p(x l |y, θ I , w I )}) , in which p(x l |y, θ I , w I ) = p(x l |θ l , w l )p(x l |y, θ I(l) , w I(l) ) In (16), I is the set of all MPCs. Note that scattering parameters θ l should be determined for a specific MPC, thus, we define q(θ l ) to be a point estimation, i.e., q(θ l ) = δ(θ l −θ l ), whereθ l denotes the estimated value for θ l in a former update iteration. Based on the Gaussian noise assumption in Section 2.1, we have p(x l |y, θ I(l) , w I(l) ) = CN (x l |y − ∑ p∈I (l) w p s(θ p ), ∆ l ) By substituting p(x l |θ l , w l ) and (17) into (16), the admissible hidden data of the n-th observation obeys a Gaussian distribution with the mean and covariance matrix as follows:

3.
Estimation of q(w l ): From the graph the Markov blanket of w l is {x l , α l , θ l , c l , µ l }, thus By integrating the factors in MB{w l }, the VPD can be written as q(w l ) = ∏ ∏ ∏ N n=1 q(w ln )

6.
Estimation of q(π l ): The only variable related to π l is c l , thus, q(π l ) ∝ p(π l ) exp(E q(c l ) {ln p(c l |π l )}). By substituting (5) and (6) into (15) we obtain which indicates that the VPD also satisfies the Dirichlet distribution. The parameter of the VPD can be updated byγ

7.
Estimation of q(µ l ): The Markov blanket of µ l is MB{µ l } = {w l , c l , α l }, so that p(µ l ) ∝ p(µ l ) exp E q(w l ,c l ,α l ) ln{p(w l |α l , c l , µ l )} For simplicity, the prior of the mean vector µ l is set to be flat, and a delta VPD q(µ l ) = δ(µ l −μ l ) is also applied. By maximizing p(µ l ), a point estimation can be obtained as follows:μ

Initialization Algorithm
The initialization algorithm is proposed to infer the initial variational variables, including the preliminary estimation of the parameter set Ω and the GMM model parameters. As shown in Algorithm 1, the estimation for multipath parameters w l and θ l is obtained by cyclically finding and removing the corresponding template function s(θ l ) from the remaining observations. The estimation process stops when the number of multipath reaches the preset amount L 0 . This method will inevitably result in duplicated estimations with the same scatter parameter vectors. Therefore, the channel taps with the same θ l are checked and combined to be one. After obtaining the multipath coefficients w l , we use the expectation-maximization (EM) algorithm [21] to estimate the initial parameters of K Gaussian clusters. In the E-step, the probability that w ln is classified as the k-th Gaussian component, i.e., p(c lnk = 1|w ln ) , is computed by the relative weight of the Gaussian likelihoods. In the M-step, µ lk and Λ lk are updated by the statistics of observations. When the EM process converges, we obtain r lnk = p(c lnk = 1|w ln ) . Notice that when initializing the EM algorithm, improper mean and variance values may cause the estimation of p(c lnk = 1|w ln ) failing to converge. Thus in this paper, the K-means algorithm [26] is applied to estimate the initial mean and variance of each cluster.

Pruning and Convergence Condition
A close examination of the sparsity prior (10) for the l-th multipath reveals that the sparsity parameter α lk controls the width of the Gaussian prior. When α lk → ∞ the cluster will be concentrated around the center which tends to be zero for a fading channel. Thus, large values of α l will make the component "irrelevant" and be removed from the estimation. In the proposed method, we set ∑ K k=1 α lk > TH p as a pruning condition where TH p is a preset threshold. In order to halt the iterative variational process, another preset threshold ∆ 0 is used to check whether the estimation converges. Define the update rate for the i-th iteration The iteration process stops when ∆ (i) ≤ ∆ 0 . The general flow chart of the VB-GMM algorithm is shown in Figure 2.

Computational Complexity
The computational complexity of the proposed algorithm is discussed in this section. For the benchmark comparison, we introduce the Gaussian prior-based variational Bayesian estimation scheme proposed in [5] (VB-G). The complex multiplication times (CMT) is used as the comparison metric for the evaluation of the computational complexity in VB-G and VB-GMM. Note that the multiplication of a L M × complex matrix and a M N × complex matrix demands the CMT of M L N × × . And the CMT required by computing the inversion of a N N × complex matrix is 3 4N [27]. The computational complexity of each procedure is given in Table 1, in which 1 4 T T represents the number of iterations taken by the variational updates in VB-G, the EM process, the K-means algorithm, and the variational updates of the VB-GMM, respectively. Compared with VB-G, the initialization process of VB-GMM has greater complexity due to an iterative estimation for GMM parameters, in which the computational complexity of the K-means algorithm is ( ) 3 T NK  [28]. In the variational inference procedure, the coefficient of the main complexity term 3 ND in VB-GMM is much smaller, resulting in the running time of about 1/5 of the VB-G in numerical simulations.

Procedure
VB-G VB-GMM

Computational Complexity
The computational complexity of the proposed algorithm is discussed in this section. For the benchmark comparison, we introduce the Gaussian prior-based variational Bayesian estimation scheme proposed in [5] (VB-G). The complex multiplication times (CMT) is used as the comparison metric for the evaluation of the computational complexity in VB-G and VB-GMM. Note that the multiplication of a L × M complex matrix and a M × N complex matrix demands the CMT of M × L × N. And the CMT required by computing the inversion of a N × N complex matrix is 4N 3 [27]. The computational complexity of each procedure is given in Table 1, in which T 1~T4 represents the number of iterations taken by the variational updates in VB-G, the EM process, the K-means algorithm, and the variational updates of the VB-GMM, respectively. Compared with VB-G, the initialization process of VB-GMM has greater complexity due to an iterative estimation for GMM parameters, in which the computational complexity of the K-means algorithm is O(T 3 NK) [28]. In the variational inference procedure, the coefficient of the main complexity term ND 3 in VB-GMM is much smaller, resulting in the running time of about 1/5 of the VB-G in numerical simulations.

Simulation Results
In the numerical simulation, the setup of the sounding parameters is given below. We use the constant amplitude zero auto-correlation sequence as the sounding sequence with 256 chips, and the interval is T c = 5 µs. The sample rate at the receiver is set as R s = 1 MHz which implies that the number of discrete samples per CIR is D = 1280. The number of observations per WSS region is assumed as N = 500. For the multipath channel, a 4-tap model is adopted in which the mean power of each tap is set as 0, −3, −5, −10 dB, and the multipath delay equals 10, 12, 15, 20 µs, respectively. For testing the performance of VB-GMM in various complex environments, three different fading types are considered. In the first scenario, the coefficients of the MPC are generated from Gaussian mixture distributions with K = 3. The real and imaginary parts of Gaussian means, covariances, and the mixing coefficients are randomly selected from the interval [−10, 10],[0, 1] and [0, 1], respectively. In the second scenario, all four multipaths are modeled as Weibull distribution with the shape parameter set as 3, 4, 5, 6, respectively. In the third scenario, we define the channel taps to be Nakagami-m distributed, and the m-parameters are set as 2, 3, 4, 5, respectively, to simulate different fading depths.
The numerical setup for the VB-GMM algorithm is as follows. The preset multipath amount is L 0 = 20, and the pruning threshold is set as TH p = 10 6 for all SNR conditions. In the VBE-step, the initial parameters are: α l = 0, β = 1 and a flat prior probability p(θ l ). In the VBM-step, the initial values of the parameters are: a lk = b lk = 10 −7 , γ lk = 1/K. Note that all the numerical simulations are based on over 5000 observations, and the iteration process is terminated when the update rate ∆ (i) is smaller than 0.001. Figure 3a shows a scatter plot of the estimated Gaussian mixture channel taps with the noise level SNR = −2 dB, in which the circles represent cluster centroids and the dots represent the estimated channel taps. The graph indicates that when the channel coefficients are Gaussian mixture distributed, the proposed VB-GMM is capable of classifying channel tap samples that come from different Gaussian components. Meanwhile, the cluster centroids can be precisely located. In Figure 3b,c, we examined the classification capability of the proposed algorithm when the channel taps are assumed to be Weibull and Nakagami distributed, respectively. As shown in Figure 3b,c, the channel coefficients in Weibull and Nakagami scenarios with random phases are clearly clustered into three regions, which indicates that the proposed method can learn and fit the distribution of random channels. Figure 4a represents the RMSE between the synthetic and reconstructed CIRs versus SNR in different scenarios. Notice that the VB-G scheme adopts a fixed sensitivity level of SNR = 10 dB. For low SNR of −5 dB, the VB-GMM exhibits comparable performance with the VB-G. As SNR increases, the VB-GMM clearly outperforms the VB-G in all three scenarios due to the flexibility in fitting non-Gaussian data. The reason is that for VB-SAGE the estimation errors of θ l , i.e., the delay of MPCs in part of the observations will result in a significant RMSE between the synthetic and reconstructed channel responses. While for VB-GMM, the multipath observations make the estimate of θ l more robust because θ l is assumed invariant in the WSS region. Figure 4b shows the curves of the estimated mean model order versus SNR. In the SNR range of −5 dB to 4 dB, the value of sparsity parameters tends to diverge, resulting in an obvious suppression effect on the MPCs. Under most SNR conditions, the VB-GMM shows accurate model selection for the 4-tap multipath channel, while the VB-G has a positive model order bias. Because for some of the observations if θ l is wrongly estimated, there will be residual in the received signal after removing the template function, which leads to the falsely identified components. From the results of Figures 3 and 4, the proposed method demonstrates its superiority in terms of model complexity and estimation accuracy. In Figure 5 we demonstrate curves of RMSEs that vary with the increasing iteration number in different scenarios for an SNR of 10 dB. The proposed method performs a similar convergence rate to the Gaussian prior-based scheme with smaller RMSEs in all the scenarios.
iteration process is terminated when the update rate ( ) i Δ is smaller than 0.001. Figure 3a shows a scatter plot of the estimated Gaussian mixture channel taps with the noise level = − SNR 2 dB , in which the circles represent cluster centroids and the dots represent the estimated channel taps. The graph indicates that when the channel coefficients are Gaussian mixture distributed, the proposed VB-GMM is capable of classifying channel tap samples that come from different Gaussian components. Meanwhile, the cluster centroids can be precisely located. In Figure 3b,c, we examined the classification capability of the proposed algorithm when the channel taps are assumed to be Weibull and Nakagami distributed, respectively. As shown in Figure 3b,c, the channel coefficients in Weibull and Nakagami scenarios with random phases are clearly clustered into three regions, which indicates that the proposed method can learn and fit the distribution of random channels.  Figure 4a represents the RMSE between the synthetic and reconstructed CIRs versus SNR in different scenarios. Notice that the VB-G scheme adopts a fixed sensitivity level of . For low SNR of −5 dB, the VB-GMM exhibits comparable performance with the VB-G. As SNR increases, the VB-GMM clearly outperforms the VB-G in all three scenarios due to the flexibility in fitting non-Gaussian data. The reason is that for VB-SAGE the estimation errors of l θ , i.e., the delay of MPCs in part of the observations will result in a significant RMSE between the synthetic and reconstructed channel responses. While for VB-GMM, the multipath observations make the estimate of l θ more robust because l θ is assumed invariant in the WSS region. Figure 4b shows the curves of the estimated mean model order versus SNR. In the SNR range of −5 dB to 4 dB, the value of sparsity parameters tends to diverge, resulting in an obvious suppression effect on the MPCs. Under most SNR conditions, the VB-GMM shows accurate model selection for the 4-tap multipath channel, while the VB-G has a positive model order bias. Because for some of the observations if l θ is wrongly estimated, there will be residual in the received signal after removing the template function, which leads to the falsely identified components. From the results of Figures 3 and 4, the proposed method demonstrates its superiority in terms of model complexity and estimation accuracy. In Figure 5 we demonstrate curves of RMSEs that vary with the increasing iteration number in different scenarios for an SNR of 10 dB. The proposed method performs a similar convergence rate to the Gaussian priorbased scheme with smaller RMSEs in all the scenarios.    . For low SNR of −5 dB, the VB-GMM exhibits comparable performance with the VB-G. As SNR increases, the VB-GMM clearly outperforms the VB-G in all three scenarios due to the flexibility in fitting non-Gaussian data. The reason is that for VB-SAGE the estimation errors of l θ , i.e., the delay of MPCs in part of the observations will result in a significant RMSE between the synthetic and reconstructed channel responses. While for VB-GMM, the multipath observations make the estimate of l θ more robust because l θ is assumed invariant in the WSS region. Figure 4b shows the curves of the estimated mean model order versus SNR. In the SNR range of −5 dB to 4 dB, the value of sparsity parameters tends to diverge, resulting in an obvious suppression effect on the MPCs. Under most SNR conditions, the VB-GMM shows accurate model selection for the 4-tap multipath channel, while the VB-G has a positive model order bias. Because for some of the observations if l θ is wrongly estimated, there will be residual in the received signal after removing the template function, which leads to the falsely identified components. From the results of Figures 3 and 4, the proposed method demonstrates its superiority in terms of model complexity and estimation accuracy. In Figure 5 we demonstrate curves of RMSEs that vary with the increasing iteration number in different scenarios for an SNR of 10 dB. The proposed method performs a similar convergence rate to the Gaussian priorbased scheme with smaller RMSEs in all the scenarios.    To study the influence of the number of Gaussian components K, we generated over 30,000 channel responses from the Weibull and Nakagami channels and the performance of the proposed VB-GMM is given in Figure 6. In the simulation, the SNR is fixed at 10 dB. As depicted in Figure 6a, the RMSEs of the reconstructed channel responses have no significant change as K increases in both Weibull and Nakagami channels. From Figure 6b, it can be seen that the estimated model orders are accurate when K ≥ 2. Consider that a large K cannot bring considerate improvement and may increase the complexity of GMM, we propose to set K = 3 for most channel conditions. To study the influence of the number of Gaussian components K , we generated over 30,000 channel responses from the Weibull and Nakagami channels and the performance of the proposed VB-GMM is given in Figure 6. In the simulation, the SNR is fixed at 10 dB. As depicted in Figure 6a, the RMSEs of the reconstructed channel responses have no significant change as K increases in both Weibull and Nakagami channels. From Figure  6b, it can be seen that the estimated model orders are accurate when 2 K ≥ . Consider that a large K cannot bring considerate improvement and may increase the complexity of GMM, we propose to set =3 K for most channel conditions.

Conclusions
In this paper, we presented a variational Bayesian-based channel parameter estimation approach for Gaussian mixture distributed wireless channels, which combined the variational Bayesian estimation [5] and the variational GMM training [21]. Compared with the existing Bayesian estimation schemes, the proposed scheme can exploit the sparse solutions of the multipath channel and represent the complicated distribution patterns by the application of GMMs. Experiment results showed that the VB-GMM method can obtain the channel estimates with a small reconstruction RMSE and determine the optimal number of MPCs through a fixed pruning criterion under various SNR conditions. For future works, we will further investigate factors that may affect the algorithm performance in terms of the sparsity and the implementation complexity.

Conclusions
In this paper, we presented a variational Bayesian-based channel parameter estimation approach for Gaussian mixture distributed wireless channels, which combined the variational Bayesian estimation [5] and the variational GMM training [21]. Compared with the existing Bayesian estimation schemes, the proposed scheme can exploit the sparse solutions of the multipath channel and represent the complicated distribution patterns by the application of GMMs. Experiment results showed that the VB-GMM method can obtain the channel estimates with a small reconstruction RMSE and determine the optimal number of MPCs through a fixed pruning criterion under various SNR conditions. For future works, we will further investigate factors that may affect the algorithm performance in terms of the sparsity and the implementation complexity.