Downlink Channel Estimation in Massive Multiple-Input Multiple-Output with Correlated Sparsity by Overcomplete Dictionary and Bayesian Inference

Lu, Wei; Wang, Yongliang; Wen, Xiaoqiao; Peng, Shixin; Zhong, Liang

doi:10.3390/electronics8050473

Open AccessArticle

Downlink Channel Estimation in Massive Multiple-Input Multiple-Output with Correlated Sparsity by Overcomplete Dictionary and Bayesian Inference

by

Wei Lu

^1,*

,

Yongliang Wang

¹,

Xiaoqiao Wen

¹,

Shixin Peng

² and

Liang Zhong

³

¹

Air Force Early Warning Academy, Wuhan 430019, China

²

National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China

³

Department of communication system, China University of Geoscience, Wuhan 430074, China

^*

Author to whom correspondence should be addressed.

Electronics 2019, 8(5), 473; https://doi.org/10.3390/electronics8050473

Submission received: 24 March 2019 / Revised: 19 April 2019 / Accepted: 24 April 2019 / Published: 28 April 2019

(This article belongs to the Special Issue Massive MIMO Systems)

Download

Browse Figures

Versions Notes

Abstract

:

We exploited the temporal correlation of channels in the angular domain for the downlink channel estimation in a massive multiple-input multiple-output (MIMO) system. Based on the slow time-varying channel supports in the angular domain, we combined the channel support information of the downlink angular channel in the previous timeslot into the channel estimation in the current timeslot. A downlink channel estimation method based on variational Bayesian inference (VBI) and overcomplete dictionary was proposed, in which the support prior information of the previous timeslot was merged into the VBI for the channel estimation in the current timeslot. Meanwhile the VBI was discussed for a complex value in our system model, and the structural sparsity was utilized in the Bayesian inference. The Bayesian Cramér–Rao bound for the channel estimation mean square error (MSE) was also given out. Compared with other algorithms, the proposed algorithm with overcomplete dictionary achieved a better performance in terms of channel estimation MSE in simulations.

Keywords:

massive MIMO; channel estimation; Bayesian inference; overcomplete dictionary

1. Introduction

Massive multiple-input multiple-output (MIMO) is the key technology for next generation wireless communication. The large number of antennas enable high spectrum efficiency and lower power consumption [1]. To get these benefits, the base station (BS) needs to acquire the channel stated information (CSI) for uplink and downlink. Pilot-based channel estimation is widely used in wireless communication systems. In the time division duplex (TDD) system, the channel reciprocity is used to get the CSI by only estimating the uplink channel at BS. In the frequency division duplex (FDD) system, the channel reciprocity cannot be used directly. In FDD massive MIMO system it is challenging to get the downlink CSI with the conventional feedback scheme. In the conventional feedback scheme each user estimates its channel and then feeds back the estimated CSI to the BS. The pilot and feedback overheads are high for massive MIMO, since they are scaling linearly with the number of antennas. Hence, it is important to design an efficient downlink channel estimation and feedback scheme for a FDD massive MIMO system.

By exploiting the sparsity in massive MIMO channel, compressed sensing (CS) was applied in the channel estimation and feedback. The users could feed the compressed training measurements back to the BS, and an orthogonal matching pursuit (OMP) was used for downlink CSI recovery in [2]. In [3] the modified basis pursuit (MBP) was proposed by utilizing the partial priori signal support information to improve the recovery performance. In [4] the support information of a signal in the discrete fourier transform domain was incorporated into the weighted l₁ minimization approach for CS recovery, which could reduce the number of measurements by the size of the known part of support. In [5] a three-level weighting scheme based on the support information was used for the weighted l₁ minimization and the simulation results showed superiority. In [6] we exploited the reciprocity between uplink and downlink channels in the angular domain, and diagnosed the supports of the downlink channel from the estimated uplink channel, and proposed a weighted subspace pursuit (SP) channel estimation algorithm for FDD massive MIMO. It can be seen that CS was effective in the channel estimation for massive MIMO.

However, most of these algorithms need the sparsity level in the estimation algorithm, which is not practical in engineering scenarios. The Bayesian framework can be applied to the compressive channel estimation. In [7], Bayesian estimation of sparse massive MIMO channel was developed in which neighboring antennas shared among each other their information about the channel support. In [8] a variational expectation maximization strategy was used for massive MIMO channel estimation, and a Gaussian mixture prior model was designed to capture the individual sparsity for each channel and the joint sparsity among users. In [9] a sparse Bayesian learning algorithm was proposed for FDD massive MIMO channel estimation with arbitrary 2D-array. By the Bayesian framework in compressive channel estimation the sparsity level is unnecessary, and it has relatively better recovery performance. Additionally, there exists angular reciprocity in massive MIMO. For example, the channel covariance matrices for uplink and downlink are reconstructed by making use of the angle reciprocity between uplink and downlink channels in [10]. Hence it is promising to apply the angular reciprocity and Bayesian framework in the compressive massive MIMO channel estimation.

Additionally, there exists angular reciprocity in the FDD massive MIMO. There is also time correlation of channels. In [11] a differential compressive feedback in FDD massive MIMO was proposed based on the channel impulses response (CIR) between timeslots, which were slow time-varying and sparse, and the differential CIR between two CIRs in adjacent timeslots was sparse. Inspired by the sparsity in the angular domain and time correlation of channels, the correlated angular sparsity can also be exploited for massive MIMO channel estimation.

In this paper we proposed a downlink channel estimation in a TDD/FDD massive MIMO system. The timeslots were divided into groups. In each group the estimated channel support information of the previous timeslot was utilized by the following timeslot. The correlated angular sparsity between timeslots in the downlink channel was utilized in the Bayesian inference for channel recovery. We transformed the complex sparse vector to the real sparse vector recovery by Bayesian inference, and the structural sparsity of the transformed real sparse vector was utilized. Meanwhile, the prior support information from the estimated channel in the previous timeslot was made use of in modeling the hidden hyperparameters in the Bayesian model. A Bayesian Cramér–Rao bound analysis is presented, and simulations are given out to verify the performance of the proposed algorithm. The main contributions were as follows: (1) a group-based channel estimation scheme was proposed, in which previous estimated channel support information was used as the priori information in the following timeslot due to the sparsity correlation; (2) priori information was merged into the Bayesian inference algorithm for channel recovery; (3) the Bayesian Cramér–Rao bound for the channel estimation mean square error (MSE) was analyzed.

The system model is illustrated in Section 2, while the proposed channel estimation algorithm based on Bayesian inference is presented in Section 3. The Bayesian Cramér–Rao bound (BCRB) for the channel estimation of mean square error (MSE) is given out in Section 4. Simulations and conclusions are presented in Section 5 and Section 6.

In the paper, we used the following notations. Scalars, vectors and matrices were denoted by lower-case, boldface lower-case and boldface upper-case symbols. The probability density function of a given random variable was denoted by p(·). Gamma(x|a, b) was the Gamma probability density function (PDF) with shape parameters a and b for x, while Normal(x|c, d) was the Gaussian PDF with parameters mean c and variance d for x. Γ(·) was the Gamma function, and ln(·) was the logarithm function. Tr(·) stood for the trace operator. 𝔼_a(·) denoted the expectation operation with the PDF of variable a.

2. System Model

We considered a massive MIMO TDD/FDD system with a single user, and assumed that the BS was equipped with N antennas and the user terminal (UT) had a single antenna. For the downlink channel estimation in the massive MIMO system, the BS transmitted the pilots to UT. The UT received the pilots and fed back the received signal to the BS directly. The received signal

y^{d} (t)

at the UT in the t-th timeslot was written as

y^{d} (t) = \sqrt{ρ^{d}} A h^{d} (t) + n^{d} (t)

(1)

where

h^{d} (t) \in ℂ^{N \times 1}

is the downlink channel,

A \in ℂ^{T_{d} \times N}

is the downlink pilots,

T_{d}

is the pilot length,

ρ^{d}

is the downlink received power,

n^{d} \in ℂ^{T_{d} \times 1}

is the received noise with each element to be i.i.d Gaussian with mean 0 and variance

σ^{2},

y^{d} (t) \in ℂ^{T_{d} \times 1}

is the received signal at UT.

Since in the massive MIMO there existed sparsity, when

D^{d} \in ℂ^{N \times M}

was the channel dictionary for downlink channel which could be unitary dictionary or overcomplete dictionary (

M > N

, their column vector had the form of steering vector with a different sampling angle),

h_{a}^{d} (t)

was the sparse representation with

h^{d} (t) = D^{d} h_{a}^{d} (t)

. In this paper we applied the overcomplete dictionary to present the sparse angular channel to get a better recovery performance. In the downlink channel estimation, we needed to obtain

{\hat{h}}_{a}^{d} (t)

the estimated downlink channel in the angular domain in the t-th timeslot.

By utilizing the sparse channel representation we then had

y^{d} (t) = \sqrt{ρ^{d}} A D^{d} h_{a}^{d} (t) + n^{d} (t)

(2)

For simplicity, the timeslot mark is omitted in the following equations. Since

y^{d} (t)

,

h_{a}^{d} (t)

, and

n^{d} (t)

are complex number vectors, we could rewrite Equation (2) into real number vectors as

[\begin{matrix} Re (y^{d}) \\ Im (y^{d}) \end{matrix}] = [\begin{matrix} Re (\sqrt{ρ^{d}} A D^{d}) & - Im (\sqrt{ρ^{d}} A D^{d}) \\ Im (\sqrt{ρ^{d}} A D^{d}) & Re (\sqrt{ρ^{d}} A D^{d}) \end{matrix}] [\begin{matrix} Re (h_{a}^{d} (t)) \\ Im (h_{a}^{d} (t)) \end{matrix}] + [\begin{matrix} Re (n^{d} (t)) \\ Im (n^{d} (t)) \end{matrix}]

(3)

where Re(·) and Im(·) denote the real and imaginary parts respectively. For simplicity, we rewrote Equation (3) as

\bar{y} = \bar{A} \bar{h} + \bar{n}

(4)

where

\bar{y} = [\begin{matrix} Re (y^{d}) \\ Im (y^{d}) \end{matrix}]

,

\bar{A} = [\begin{matrix} Re (\sqrt{ρ^{d}} A D^{d}) & - Im (\sqrt{ρ^{d}} A D^{d}) \\ Im (\sqrt{ρ^{d}} A D^{d}) & Re (\sqrt{ρ^{d}} A D^{d}) \end{matrix}]

,

\bar{h} = [\begin{matrix} Re (h_{a}^{d} (t)) \\ Im (h_{a}^{d} (t)) \end{matrix}]

and

\bar{n} = [\begin{matrix} Re (n^{d} (t)) \\ Im (n^{d} (t)) \end{matrix}]

.

On the other hand, we considered the meaning of sparse angular channel representation

h_{a}^{d} (t)

. If the transmission angles were allocated exactly at the sampling points in the channel dictionary

D^{d}

, then the corresponding coefficient in the

h_{a}^{d} (t)

was nonzero. If the path number was smaller than the antenna number, then

h_{a}^{d} (t)

was sparse. However, there was leakage effect induced by dictionary mismatch which will have deteriorated the sparsity of the angular channel representation [12]. When the movement velocity of UT was not very high, e.g., v = 12 km/h, and the typical timeslot duration τ = 0.5 ms, the movement distance of UT in one timeslot was 0.017 m. When the distance of UT and BS was 200 m, the angle change for the line of sight (LoS) transmission in one timeslot was 0.0049° which was much smaller than the sampling interval in the dictionary. For the non-LoS (NLoS) transmission, the angle change was also small which is discussed in Section 4.1. Hence the transmission angle change between two timeslots is very small if the transmission environment doesn’t change dramatically, and there is correlation in the angular channel sparsity between adjacent timeslots. In other words, the information regarding the estimated angular channel in the previous timeslot could be utilized in the current channel estimation.

It was proven that the prior support information could improve the channel recovery performance [3,4,5,6]. Hence in this paper we made use of the prior support information from the previous timeslot to improve the Bayesian channel estimation. In the following section we have discussed how to merge the prior information into the Bayesian inference algorithm for channel estimation.

3. Proposed Algorithm

We designed a three-layer hierarchical graphical model as shown in Figure 1. In the first layer,

\bar{h}

was assigned a Gaussian prior distribution

p (\bar{h} | α) = \prod_{i = 1}^{2 N} p ({\bar{h}}_{i} | α_{i})

(5)

where

{\bar{h}}_{i}

and

α_{i}

are the i-th entry in

\bar{h}

and

α

respectively,

p ({\bar{h}}_{i} | α_{i}) = N o r m a l ({\bar{h}}_{i} | 0, α_{i})

and

α_{i}

is the inverse variance of the Gaussian distribution_. When

{\bar{h}}_{i}

is close to 0, then

α_{i}

is very large, and vice versa.

In the second layer, we assumed a Gamma distribution as hyperpriors over the hyperparameters

α_{i}

, and it can be presented as

p (α) = \prod_{i = 1}^{2 N} G a m m a (α_{i} | a_{i}, b_{i})

(6)

where Gamma(·) is the Gamma PDF, and the parameters a_i and b_i characterize the shape of Gamma PDF. For fixed a_i, the larger b_i is, the smaller

α_{i}

is; then

{\bar{h}}_{i}

tends to be nonzero. In the sparse Bayesian learning a_i and b_i were set to be very small for non-informative hyperprior over

α_{i}

[13].

In our model, we set a_i to be constant with a predefined value, and we modeled b_i as random parameters. In Figure 1 it could be found that the entries of

\bar{y}

were divided into two sets by their indices, i.e.,

{S, S_{+ N}}

and

{S, S_{+ N}}^{c}

, where S was the set with channel support indices from the previous timeslot, and

S_{+ N}

was the set with each index in S added by N, since we converted the complex system model to the real system model as Equation (3)

. {S, S_{+ N}}^{c}

was the complementary set of

{S, S_{+ N}}

. For example, in the (t − 1)-th timeslot, the positions of nonzero entries or called supports in

h_{a}^{d} (t - 1)

were S = {4, 5, 6}, then

S_{+ N} = {4 + N, 5 + N, 6 + N}

. The probable supports for

h_{a}^{d} (t)

in the current t-th timeslot can be assumed to be the same as those for previous (t − 1)-th timeslot for simplicity. On the other hand, we could have also diagnosed the probable channel supports further by taking the angle deviation and leakage effects into consideration. In this paper we adopted the support diagnosis algorithm, and the details can be found in [6].

For

{\bar{y}}_{j}

,

j \in {S, S_{+ N}}

, we employed a Gamma distribution over the hyperparameters b_i in the third layer as

Gamma (b_{i} | c, d) = Γ {(c)}^{- 1} d^{c} b_{i}^{c - 1} e^{- d b_{i}}

(7)

where c and d characterize the shape of Gamma PDF. By the system model and assumptions for massive MIMO, we could use a Bayesian inference to perform the sparse channel recovery.

According to the standard Bayesian inference [14], let

z ≜ {\bar{h}, α, b}

, we have

\begin{array}{l} \ln p (z_{i}) & = E_{z_{i}, i \neq j} [\ln p (\bar{y}, z)] + constant \\ \propto E_{z_{i}, i \neq j} [\ln p (\bar{y}, z)] \end{array}

(8)

where constant is a constant used for

p (z_{i})

normalization,

p (\bar{y}, z)

is the joint pdf for

\bar{h}

and

z

, and z_i can be

\bar{h}, α, and b

. We have

p (\bar{y}, z) = p (z | \bar{y}) p (\bar{y})

. We assume

p (z | \bar{y})

posterior independence among the hidden variables

z

, then

p (z | \bar{y}) \approx p (z)

, and

p (z)

is the product of PDF of

\bar{h}, α,

and

b

.

In order to make use of the prior support information from the previous timeslot and the structure sparsity in Equation (4), we needed to make some modifications to the standard Bayesian inference. The main considerations for the modifications were as follows:

(I) Since we rewrote Equation (2) as Equation (4), if

h_{a, i}^{d}

was nonzero, then

{\bar{h}}_{i}

and

{\bar{h}}_{i + N}

were nonzero simultaneously. Hence it was wise to assume that b_i and b_i+N were the same;

(II) In the standard Bayesian learning a_i and b_i were set to be very small for non-informative hyperprior over

α_{i}

. This assumption was valid if no prior information was provided. If the prior support information was available, such as that the support information of the previous timeslot could be used for channel estimation in the coming timeslot by sparsity correlation, it was wise to assume that the supports between adjacent timeslots were partially common. If the i-th element in the angular channel vector was nonzero, then the hyperparameter b_i and b_i+N tended to be variables rather than to be fixed small numbers, which meant only for the indices from the prior support set S the third layer prior model was adopted.

It can be seen that the consideration (II) was similar to [15]. However, our proposed algorithm was extended for a complex number system and the structure sparsity was considered. However, on the other hand, the overcomplete dictionary was adopted in our algorithm.

The proposed uplink-aided downlink channel estimation based on Bayesian inference was as follows:

(i) Update of p(

\bar{h}

)

According to Equation (8), by ignoring the terms which are independent of

\bar{h}

, we have

\begin{array}{l} \ln p (\bar{h}) & \propto E_{α, b} [\ln p (\bar{y} | \bar{h}) + \ln p (\bar{h} | α) + \ln p (b)] \\ \propto E_{α, b} [\ln p (\bar{y} | \bar{h}) + \ln p (\bar{h} | α)] \\ = \frac{- 1}{2 σ^{2}} {(\bar{y} - \bar{A} \bar{h})}^{T} (\bar{y} - \bar{A} \bar{h}) - \frac{1}{2} {\bar{h}}^{T} Λ \bar{h} \end{array}

(9)

where

Λ = diag {E_{α} [α_{i}]}

,

σ^{2}

is the noise variance in the system model, the vectors b and α are comprised by b_i and α_i respectively. Since

p (\bar{y} | \bar{h})

and

p (\bar{h} | α)

are a Gaussian distribution, then

p (\bar{h})

follows a Gaussian distribution with the mean

μ

and covariance

ϕ

given by

μ = \frac{1}{σ^{2}} Φ {\bar{A}}^{T} \bar{y}

(10)

Φ = (\frac{1}{σ^{2}} {\bar{A}}^{T} \bar{A} + Λ)

(11)

(ii) Update of p(

α

)

According to Equation (8), by ignoring the terms which are independent of

α

, we have

\begin{array}{l} \ln p (α) & \propto E_{\bar{h}, b} [\ln p (\bar{y} | \bar{h}) + \ln p (\bar{h} | α) + \ln p (α | a, b) + \ln p (b)] \\ \propto E_{\bar{h}, b} [\ln p (\bar{h} | α) + \ln p (α | a, b)] \\ = \sum_{i = 1}^{2 N} E_{\bar{h}, b} {(a_{i} - 0.5) \ln α_{i} - (0.5 {\bar{h}}_{i}^{2} + b_{i}) α_{i}} \\ = \sum_{i \in {S, S_{+ N}}} E_{\bar{h}, b} {(a_{i} + 0.5 - 1) \ln α_{i} - (0.5 {\bar{h}}_{i}^{2} + b_{i}) α_{i}} + \\ \sum_{i \in {S, S_{+ N}}^{c}} E_{\bar{h}, b} {(a - 0.5) \ln α_{i} - (0.5 {\bar{h}}_{i}^{2} + b_{i}) α_{i}} \\ = \sum_{i \in {S, S_{+ N}}} {(a_{i} + 0.5 - 1) \ln α_{i} - (\frac{E_{\bar{h}, b} (b_{i} + b_{i + N})}{2} + \frac{E_{\bar{h}, b} ({\bar{h}}_{i}^{2} + {\bar{h}}_{i + N}^{2})}{4}) α_{i}} + \\ \sum_{i \in {S, S_{+ N}}^{c}} {(a_{i} + 0.5 - 1) \ln α_{i} - (b_{i} + \frac{E_{\bar{h}, b} ({\bar{h}}_{i}^{2} + {\bar{h}}_{i + N}^{2})}{4}) α_{i}} \end{array}

(12)

where S is the estimated support set from the previous timeslot. Since the complex system model was converted in Equation (4). By (II),

S_{+ N} ≜ {s_{i} + N}

was also the support set in the converted system model in Equation (4). For

i \in {S, S_{+ N}}

,

b_{i}

is variable number,

b_{i}

and

b_{i + N}

were assumed to be the same, we used

0.5 E_{\bar{h}, b} (b_{i} + b_{i + N})

to present

E_{\bar{h}, b} (b_{i})

. The same assumption was applied to

{\bar{h}}_{i}

and

{\bar{h}}_{i + N}

with

E_{\bar{h}, b} ({\bar{h}}_{i}^{2}) = 0.5 E_{\bar{h}, b} ({\bar{h}}_{i}^{2} + {\bar{h}}_{i + N}^{2})

. In this way the structural sparsity was utilized.

Since

p (α | a, b)

is the Gamma distribution and

p (\bar{h} | α)

is the Gaussian distribution,

p (α)

is the Gamma distribution. Then

p (α_{i})

is also the Gamma distribution with the updated parameters

{\tilde{a}}_{i}

and

{\tilde{b}}_{i}

given by

{\tilde{a}}_{i} = a_{i} + 0.5

(13)

{\tilde{b}}_{i} = {\begin{matrix} \frac{E_{\bar{h}, b} (b_{i} + b_{i + N})}{2} + \frac{E_{\bar{h}, b} ({\bar{h}}_{i}^{2} + {\bar{h}}_{i + N}^{2})}{4}, i \in {S, S_{+ N}} \\ b_{i} + \frac{E_{\bar{h}, b} ({\bar{h}}_{i}^{2} + {\bar{h}}_{i + N}^{2})}{4}, i \in {S, S_{+ N}}^{c} \end{matrix}

(14)

(iii) Update of

p (b_{{S, S_{+ N}}})

According to Equation (8), by ignoring the terms which are independent of b, we have

\begin{array}{l} \ln p (b_{{S, S_{+ N}}}) & \propto E_{α, \bar{h}} [\ln p (\bar{y} | \bar{h}) + \ln p (\bar{h} | α) + \ln p (α | a, b) + \ln p (b | c, d)] \\ \propto E_{α, \bar{h}} [\ln p (α | a, b) + \ln p (b | c, d)] \\ = \sum_{_{i \in {S, S_{+ N}}}} {- b_{i} E_{α} (α_{i}) + (c_{i} - 1) \ln b_{i} - d_{i} b_{i}} \end{array}

(15)

where

b_{{S, S_{+ N}}}

is comprised by the entries indicated by

{S, S_{+ N}}

in

b

. In (15) the

α

, a, b, c, d are also comprised by their indicated

{S, S_{+ N}}

, the subscript

{S, S_{+ N}}

is omitted for simplicity. As shown in Figure 1,

b_{{S, S_{+ N}}}

was modelled as a Gamma distribution. Since

p (α_{i} | a_{i}, b_{i})

and

p (b_{i} | c_{i}, d_{i})

were a Gamma distribution,

p ({\tilde{b}}_{{S, S_{+ N}}})

was Gamma(

{\tilde{b}}_{i \in {S, S_{+ M}}} | {\tilde{c}}_{i}, {\tilde{d}}_{i}

), and the updated

{\tilde{c}}_{i}

and

{\tilde{d}}_{i}

were given by

{\tilde{c}}_{i} = a_{i} + c_{i}

(16)

{\tilde{d}}_{i} = d_{i} + E_{α} (α_{i})

(17)

Then the Bayesian inference for the channel estimation was executed iteratively among (i), (ii), and (iii). The details of the algorithm are summarized in step 3 of Algorithm 1. When the estimated channel vector

\bar{h'}

was recovered, we needed to convert it to the complex vector

h_{a}^{d}

according to Equation (3).

Algorithm 1 Downlink channel estimation with variational inference algorithm and overcomplete dictionary.

Input:

\bar{A}, \bar{y}, σ^{2}

Output:

\bar{h'}

Divide the timeslots into groups, and with each group comprised by t_g timeslots.
For the first timeslot in the group, use variational Bayesian inference (VBI) for channel estimation, and obtain the angular channel supports.
For the rest of the timeslots in the group, utilize the support information from the previous timeslot for channel estimation one by one. The recovery algorithm in each timeslot is as follows:
3.1.
Initialize $α$ , a, b, c, d.
3.2.
$μ = \frac{1}{σ^{2}} ϕ {\bar{A}}^{T} \bar{y}, ϕ = (\frac{1}{σ^{2}} {\bar{A}}^{T} \bar{A} + Λ)$ , $E_{\bar{h}, b} ({\bar{h}}_{i}^{2}) = μ_{i}^{2} + ϕ_{i, i}$ , where $Λ = diag {E_{α} [α_{i}]}$ , $μ_{i}$ is the i-th entry in $μ$ , and $ϕ_{i, i}$ is the i-th diagonal entry in $ϕ$ .
3.3.
Update ${\tilde{a}}_{i}$ and ${\tilde{b}}_{i}$ according to Equations (13) and (14) in (ii) ( ${\tilde{a}}_{i}$ and ${\tilde{b}}_{i}$ are the updated $a_{i}$ and $b_{i}$ , and $a_{i}$ and $b_{i}$ are the results from last iteration); then according to the property of the Gamma distribution variable, $E_{α} (α_{i}) = {\tilde{a}}_{i}$ / ${\tilde{b}}_{i}$ .
3.4.
Update $\tilde{c}$ and $\tilde{d}$ according to Equations (16) and (17) in (iii) ( $\tilde{c}$ and $\tilde{d}$ are the updated $c$ and $d$ , and $c$ and $d$ are the results from last iteration); then according to the property of the Gamma distribution variable, $E_{α} ({\tilde{b}}_{i}) = \tilde{c} / \tilde{d}$ .
3.5.
Go to step 3.2 until stop criteria meets.
3.6.
Then $\bar{h'} = μ$ .
Go back to step 2 for a new group of timeslots.

In a practical massive MIMO system, the transmission environment may change suddenly, in this way the correlation of sparsity between adjacent timeslots will deteriorate, and the previous channel support information cannot be utilized. On the other hand, the error will accumulate if the previous channel support information is utilized timeslot by timeslot. Hence, the initialization is important for the robustness and efficiency of the algorithm. As shown in Figure 2 divided the timeslots into groups, and each group was comprised of several timeslots. During the channel estimation for each group, the VBI was used for the channel estimation in the first timeslot, and then the proposed algorithm was executed for the remaining timeslots in which the channel support information of the previous timeslot was made use of by the current timeslot. This procedure is detailed in steps 1, 2 and 4 in Algorithm 1.

4. Discussion

4.1. Sparsity Correlation Analysis

The UT movement distance was very small when the velocity of UT was small and the timeslot was 0.5 ms. The reflector for the transmission was static during the UT moving between timeslots. The ellipse geometry channel model is shown in Figure 3. The line of sight (LoS) distance between BS and UT was d_Los, the non-LoS (NLoS) distance by reflector between BS and UT was d_NLoS, and the UT movement distance in one timeslot was d_Δ. If the transmission path was still reflected by the same reflector as shown in Figure 3, the maximum and minimum NLoS distances from BS to UT between timeslots were d_NLoS + d_Δ and d_NLoS − d_Δ. The transmission angle change was Δ_θ. The distance between the reflector and BS was d₁. By some mathematical manipulations shown in Appendix A, we got

Δ_{θ} \approx \frac{2 d_{Δ} (d_{N L o S} - d_{1})}{2 d_{1} d_{L o S}} \frac{1}{\sqrt{1 - \cos^{2} θ}}

(18)

In order to illustrate the angle change Δ_θ during one timeslot, we assumed that d_NLoS was 800 m, the velocity of UT was 14.4 km/h, and the typical timeslot duration τ = 0.5 ms, then the movement distance of UT in one timeslot was 0.02 m. By changing the distance between BS and reflector, as shown in Figure 4, the angle change was not more than 0.025°. It should be noted that when the LoS distance and the d_NLoS were fixed, BS and reflector distance could not be arbitrary vales due to triangle inequality. Hence, the angle of arrival or departure changed slowly and then there was sparsity correlation among the angular channels for adjacent timeslots.

4.2. Bayesian Cramér-Rao Bound Analysis

In this section we have discussed the Bayesian Cramér–Rao bound (BCRB) for the channel estimation with the proposed algorithm. Let

z ≜ {\bar{h}, σ}

, then the BCRB for the channel vector

\bar{h}

is given by the inverse of the Fisher information matrix

J

as:

J = E_{z} {- \frac{\partial^{2} \log p (\bar{y}, z)}{\partial z_{i} \partial z_{j}}}

(19)

According to the system model in Section 2,

\bar{h}, σ

are independent, the Fisher information matrix

J

is block diagonal. We can rewrite

p (\bar{y}, z)

as

p (\bar{y}, z) = p (\bar{y} | z) p (\bar{h} | α) p (α | b) p (b)

(20)

Then the BCRB on the MSE of the estimated channel vector

\bar{h'}

is given by

E {{‖ \bar{h^{'}} - \bar{h} ‖}^{2}} \geq t r (J_{{\bar{h}}_{i} {\bar{h}}_{j}}^{- 1})

(21)

where

J_{{\bar{h}}_{i}}_{{\bar{h}}_{j}} = E_{z} {- \frac{\partial^{2} \log p (\bar{y}, z)}{\partial {\bar{h}}_{i} \partial {\bar{h}}_{j}}}

is the fisher information sub-matrix. Thus, we can obtain the Bayesian Cramér–Rao bound of the minimum mean square error for the estimated channel

\bar{h'}

as shown in Proposition 1.

Proposition 1.

The BCRB of MSE for the channel estimation

\bar{h'}

is represented as

E {{‖ \bar{h^{'}} - \bar{h} ‖}^{2}} \geq t r ({(d i a g (E (\frac{1}{α_{i}})) + \frac{1}{σ^{2}} {\bar{A}}^{T} \bar{A})}^{- 1}) = \sum_{i \in S} \frac{1}{\frac{1 + c}{a d_{i}} + \frac{λ_{i}}{σ}} + \sum_{i \notin S} \frac{1}{\frac{b_{i}}{a} + \frac{λ_{i}}{σ}}

(22)

where S is the diagnosed support set,

λ_{i}

is the eigenvalues of

{\bar{A}}^{T} \bar{A}

, and

{\bar{A}}^{T} \bar{A} \in ℝ^{2 N \times 2 N}

, and a,

b_{i}

, c, and

d_{i}

are the parameters in the Bayesian model inFigure 1. When

T_{d}, M \to \infty

and

\frac{T_{d}}{M} = β

, according to the random matrix theory, we have

\begin{array}{l} E {{(\bar{h} - \hat{\bar{h}})}^{H} (\bar{h} - \hat{\bar{h}})} & \geq | S | \cdot \frac{1}{| S |} \sum_{i \in S} \frac{1}{\frac{1 + c}{a \min (d)} + + \frac{λ_{i}}{σ}} + (N - | S |) \cdot \frac{1}{(N - | S |)} \sum_{i \notin S} \frac{1}{\frac{\max (b)}{a} + \frac{λ_{i}}{σ}} \\ \to | S | \frac{a \min (d)}{1 + c} (1 - \frac{F (s n r_{1}, β)}{4 β s n r_{1}}) + (N - | S |) \frac{a}{\max (b)} (1 - \frac{F (s n r_{2}, β)}{4 β s n r_{2}}) \end{array}

(23)

where

s n r_{1} = \frac{a \min (d)}{(1 + c) σ}

,

s n r_{2} = \frac{a}{σ \max (b)}

,

F (x, z) = {(\sqrt{x {(1 + \sqrt{z})}^{2} + 1} - \sqrt{x {(1 - \sqrt{z})}^{2} + 1})}^{2}

,

\min (d)

and

\max (b)

are the minimum and maximum entries in d and b.

The proof of proposition 1 is presented in Appendix B. From proposition 1, we can see that the MSE lower bound is related to the priori support size

| S |

,

(1 + c) / \min (d)

and

\max (b)

for the massive MIMO channel estimation.

5. Simulations

In the simulation, the support diagnosis algorithm in [6] was adopted, and we assumed that the transmission angle change between timeslots was within 1 degree. The pilot length was 50, and antenna number at the BS was 100. The channel was generated according to the spatial model as defined in 3GPP TR25.996. We compared our proposed algorithm with a unitary dictionary with a size of 100 and the overcomplete dictionary with a size of 150, 200, and 250, and compared this with a Bayesian sparse learning (SL) [16], weighted subspace pursuit (WSP) [6], weighted l₁ minimization (W-l₁ min) [5], weighted iteratively reweighted least square(W-IRLS), IRLS [17], compressive sampling matched pursuit (COSAMP) in [11], and l₁ minimization (l₁ min) [18].

In order to evaluate the channel estimation performance, we used a normalized mean-square error (MSE) between true and estimated channel vectors as follows:

M S E = \frac{1}{T} \sum_{T} \frac{{‖ {\hat{h}}^{d} - h^{d} ‖}^{2}}{{‖ h^{d} ‖}^{2}}

(24)

where T is the number of trials,

{\hat{h}}^{d}

and

h^{d}

are the estimated and original channel vector, respectively for each trial. In the simulations the trial number T was 250.

In Figure 5 the overcomplete dictionary size was 150 in the proposed algorithm. It could be seen that when the unitary dictionary was used, our proposed algorithm outperformed WSP, COSAMP and IRLS, but was a little worse than W-l₁ with a small gap. However, when the overcomplete dictionary was used, our proposed algorithm outperformed other algorithms, but almost had the same performance as SL with a little performance improvement which could be seen in the zoomed-in subfigure. The overcomplete dictionary in the proposed algorithm can dramatically improve the MSE performance due to the fact that there are more atoms in the overcomplete dictionary than in the unitary dictionary which can improve the sparsity in the angular channel; however, it doesn’t mean that the larger the overcomplete dictionary size is, the better performance it has, which is shown in Figure 6.

We compared the performance of the proposed algorithm with different dictionary sizes in Figure 6. It can be seen that in the high SNR region the performance improved when an overcomplete dictionary was used, but the MSE performance gain did not improve when increasing the dictionary size. For example, the algorithm with a dictionary size of 150 had a relatively better performance than with a dictionary size of 100. However, the performances with a dictionary size of 200 and 250 almost gave the same trends as that with a dictionary size of 150. This was because the larger dictionary would induce angel ambiguity because the correlation of atoms increased. Hence, in the practical engineering, the dictionary size is not recommended to be very large. A large dictionary size is computationally expensive and the benefit is limited. It also should be noted that in the low SNR region the MSE performance with a larger dictionary size did not always do better than those with a smaller dictionary size. For example, when the SNR was 0 dB, they hadsimilar performance. The reason was that in the low SNR region the estimated channel support of the previous timeslot was not accurate enough, and on the other hand larger dictionary size would have deteriorated the dictionary incoherence.

We compared the runtime and convergence performance of the proposed algorithm with a different dictionary size in Figure 7. The relative error was defined as the ratio of the difference of adjacent iteration results to the previous iteration result. It can be seen that the proposed algorithm with dictionary size 150 converged fast than with a dictionary size of 100. However, the improvement had its price, and the runtime for the proposed algorithm with dictionary size 150 was longer which meant that the computational complexity was higher with a larger dictionary size. Based on the simulation results shown in Figure 6 and Figure 7, when the antenna at BS is 100, the dictionary size is recommended to be set at 150 or so to balance the performance improvement and computation complexity.

6. Conclusions

In this paper we proposed a downlink channel estimation algorithm based on overcomplete dictionary and variational Bayesian inference. We converted the complex system model to a real model and exploited the correlation of angular channel sparsity in adjacent timeslots. In the algorithm we divided the timeslots into groups and made use of the channel support information of the previous timeslot to the channel estimation in the current timeslot within each group. The sparsity correlation and Bayesian Cramér–Rao bound for the MSE of channel estimation was analyzed. Compared with other recovery algorithms, such as WSP, IRLS, WIRLS, l₁ min, W-l₁ min and COSAMP, our proposed algorithm with overcomplete dictionary had a relatively better performance. Moderate overcomplete dictionary can improve the MSE performance of channel estimation to balance the computational complexity and performance gain.

Author Contributions

Conceptualization and methodology, W.L.; validation, X.W., S.P. and L.Z.; formal analysis, W.L.; writing—original draft preparation, W.L.; supervision, Y.W.

Funding

This work was supported in part by the National Science Foundation of China (No.61601509 and 61601334), the China Postdoctoral Science Foundation Grant (No.2016M603045 and 2018M632889) and the self-determined research funds of CCNU(CCNU18QN007) from the colleges basic research and operation of MOE.

Acknowledgments

Thank you to the reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Proof of angle change with UT movement.

According to the cosine law, we have

\cos θ = \frac{d_{1}^{2} + d_{L o S}^{2} - {(d_{N L o S} - d_{1})}^{2}}{2 d_{1} d_{L o S}},

(A1)

\cos (θ \pm Δ_{θ}) = \frac{d_{1}^{2} + d_{L o S}^{2} - {(d_{N L o S} \pm d_{Δ} - d_{1})}^{2}}{2 d_{1} d_{L o S}} .

(A2)

Then we can get

θ \pm Δ_{θ} = \arccos (\frac{d_{1}^{2} + d_{L o S}^{2} - {(d_{N L o S} - d_{1})}^{2} - d_{Δ}^{2} \mp 2 d_{Δ} (d_{N L o S} - d_{1})}{2 d_{1} d_{L o S}}) .

(A3)

Since

d_{Δ}^{2}

is very small compared with

d_{1}

and

d_{L o S}

, by the first-order approximation we have

\begin{array}{l} θ \pm Δ_{θ} & \approx \arccos (\frac{d_{1}^{2} + d_{L o S}^{2} - {(d_{N L o S} - d_{1})}^{2} \mp 2 d_{Δ} (d_{N L o S} - d_{1})}{2 d_{1} d_{L o S}}) \\ \approx a r c \cos (\frac{d_{1}^{2} + d_{L o S}^{2} - {(d_{N L o S} - d_{1})}^{2}}{2 d_{1} d_{L o S}}) \pm \frac{2 d_{Δ} (d_{N L o S} - d_{1})}{2 d_{1} d_{L o S}} \frac{1}{\sqrt{1 - \cos^{2} θ}} \\ = θ \pm \frac{2 d_{Δ} (d_{N L o S} - d_{1})}{2 d_{1} d_{L o S}} \frac{1}{\sqrt{1 - \cos^{2} θ}} \end{array} .

(A4)

Then we have

Δ_{θ} \approx \frac{2 d_{Δ} (d_{N L o S} - d_{1})}{2 d_{1} d_{L o S}} \frac{1}{\sqrt{1 - \cos^{2} θ}} .

(A5)

□

Appendix B

Proof of Proposition 1.

Let

z ≜ {\bar{h}, σ}

, the we have

E_{z} {(z - \hat{z}) {(z - \hat{z})}^{T}} \geq J^{- 1} .

(A6)

Since

\bar{h}, σ

are independent, the Fisher information matrix

J

is block diagonal, and can be presented as

J = [\begin{matrix} J_{\bar{h}, \bar{h}} & 0 \\ 0 & J_{σ, σ} \end{matrix}] .

(A7)

Then the inverse of matrix

J

is

J^{- 1} = [\begin{matrix} J_{\bar{h}, \bar{h}}^{- 1} & 0 \\ 0 & J_{σ, σ}^{- 1} \end{matrix}] .

(A8)

Because

p (\bar{y}, z) = p (\bar{y} | z) p (\bar{h} | α) p (α | b) p (b) p (σ)

, we have

\begin{array}{l} J & = E_{z} {- \frac{\partial^{2} \log p (\bar{y}, z)}{\partial z_{i} \partial z_{j}}} \\ = E_{z} {- \frac{\partial^{2} \log p (\bar{y} | z)}{\partial z_{i} \partial z_{j}}} + E_{z} {- \frac{\partial^{2} \log p (\bar{h} | α)}{\partial z_{i} \partial z_{j}}} + E_{z} {- \frac{\partial^{2} \log p (α | b)}{\partial z_{i} \partial z_{j}}} + \\ E_{z} {- \frac{\partial^{2} \log p (b)}{\partial z_{i} \partial z_{j}}} + E_{z} {- \frac{\partial^{2} \log p (σ)}{\partial z_{i} \partial z_{j}}} \end{array}

(A9)

Since we mainly focus on the MSE of

\bar{h}

, we only need to analyze

J_{\bar{h}, \bar{h}}

. We discuss the above formula part by part as follows:

1): Let $J_{\bar{h}, \bar{h}} (\bar{y}) = E_{z} {- \frac{\partial^{2} \log p (\bar{y} | z)}{\partial z_{i} \partial z_{j}}}$ , according to the Bayesian model in Figure 1, we have

$p (\bar{y} | z) ~ N o r m a l (\bar{y} | \bar{A} \bar{h}, σ I) then J_{\bar{h}, \bar{h}} (\bar{y}) = E_{z} {\frac{{\bar{A}}^{T} \bar{A}}{σ}} = \frac{{\bar{A}}^{T} \bar{A}}{σ} .$
2): Let $J_{\bar{h}, \bar{h}} (\bar{h}) = E_{z} {- \frac{\partial^{2} \log p (\bar{h} | α)}{\partial z_{i} \partial z_{j}}}$ , and we have $P (\bar{h} | α) = \prod_{i = 1}^{2 N} N o r m a l ({\bar{h}}_{i} | 0, α_{i})$ , then we get $J_{\bar{h}, \bar{h}} (\bar{h}) = E_{z} {\frac{1}{α_{i}}}$ .
3): Because $E_{z} {- \frac{\partial^{2} \log p (α | b)}{\partial z_{i} \partial z_{j}}}$ , $E_{z} {- \frac{\partial^{2} \log p (b)}{\partial z_{i} \partial z_{j}}}$ and $E_{z} {- \frac{\partial^{2} \log p (σ)}{\partial z_{i} \partial z_{j}}}$ are independent with $\bar{h}$ , they are all 0.
Then in summary, we get $J_{\bar{h}, \bar{h}} = d i a g (E (\frac{1}{α_{i}})) + \frac{1}{σ^{2}} {\bar{A}}^{T} \bar{A}$ .

Since the priori support set information is used in our proposed algorithm, a three-layer model is constructed for the elements belonging to the priori support set, and a two-layer model is used for the elements not belonging to the priori support set, so

E_{Z} {\frac{1}{α_{i}}}

has different expressions for the two cases.

E_{Z} {\frac{1}{α_{i}}}

in the two cases are discussed as follows:

1): When i belongs to the priori support set, according to the three-layer graph model we have

$p (α) = \prod_{i = 1}^{2 N} Gamma (α_{i} | a, b_{i}),$

(A10)

$p (b_{i}) = Gamma (b_{i} | c, d_{i}) = Γ {(c)}^{- 1} d_{i}^{c} b_{i}^{c - 1} e^{- d_{i} b_{i}} .$

(A11)

Then we get

$\begin{array}{l} p (α_{i}) & = \int_{0}^{\infty} p (α_{i} | b_{i}) p (b_{i}) d b_{i} \\ = \int_{0}^{\infty} Γ {(a)}^{- 1} b_{i}^{a} α_{i}^{a - 1} e^{- b_{i} α_{i}} Γ {(c)}^{- 1} d_{i}^{c} b_{i}^{c - 1} e^{- d_{i} b_{i}} d b_{i} \\ = Γ {(a)}^{- 1} Γ {(c)}^{- 1} α_{i}^{a - 1} d_{i}^{c} \frac{Γ (a + c)}{{(a_{i} + d_{i})}^{a + c}} \end{array} .$

(A12)

Accordingly, we have

$\begin{array}{l} E {\frac{1}{α_{i}}} & = \int_{0}^{\infty} \frac{1}{α_{i}} Γ {(a)}^{- 1} Γ {(c)}^{- 1} α_{i}^{a - 1} d_{i}^{c} \frac{Γ (a + c)}{{(a_{i} + d_{i})}^{a + c}} d α_{i} \\ = \int_{0}^{\infty} \frac{1}{α_{i}} \frac{Γ (a + c)}{Γ (a) Γ (c)} {(\frac{α_{i}}{d_{1}})}^{a - 1} {(\frac{α_{i}}{d_{1}} + 1)}^{- a - c} d \frac{α_{i}}{d_{1}} \end{array},$

(A13)

where $\frac{Γ (a + c)}{Γ (a) Γ (c)} {(\frac{α_{i}}{d_{i}})}^{a - 1} {(\frac{α_{i}}{d_{i}} + 1)}^{- a - c}$ satisfies the probability density function of Beta prime distribution.
According to the properties of the Beta prime distribution, when $- a < - 1 < c$ , we have

$\begin{array}{l} E {{(\bar{h} - \hat{\bar{h}})}^{H} (\bar{h} - \hat{\bar{h}})} & \geq | S | \cdot \frac{1}{| S |} \sum_{i \in S} \frac{1}{\frac{1 + c}{a \min (d)} + + \frac{λ_{i}}{σ}} + (N - | S |) \cdot \frac{1}{(N - | S |)} \sum_{i \notin S} \frac{1}{\frac{\max (b)}{a} + \frac{λ_{i}}{σ}} \\ \to | S | \frac{a \min (d)}{1 + c} (1 - \frac{F (s n r_{1}, β)}{4 β s n r_{1}}) + (N - | S |) \frac{a}{\max (b)} (1 - \frac{F (s n r_{2}, β)}{4 β s n r_{2}}) \end{array} .$

(A14)
2): When i does not belong to the priori support set, according to the high-order moment properties for the general gamma distribution, we have

$E {\frac{1}{α_{i}}} = \frac{b_{i}}{a} .$

(A15)

Then in summary, we have

$E {{‖ \bar{h^{'}} - \bar{h} ‖}^{2}} \geq t r ({(d i a g (E (\frac{1}{α_{i}})) + \frac{1}{σ^{2}} {\bar{A}}^{T} \bar{A})}^{- 1}) = \sum_{i \in S} \frac{1}{\frac{1 + c}{a d_{i}} + \frac{λ_{i}}{σ}} + \sum_{i \notin S} \frac{1}{\frac{b_{i}}{a} + \frac{λ_{i}}{σ}},$

(A16)

where S is the diagnosed support set, $λ_{i}$ is the eigenvalues of ${\bar{A}}^{T} \bar{A}$ ,and ${\bar{A}}^{T} \bar{A} \in ℝ^{2 M \times 2 M}$ .

When overcomplete dictionary is as

D^{d} = {\frac{1}{\sqrt{N}} e^{- j \frac{2 π}{M} k n}}_{n, k}

,

k \in {1, \dots, M}, n \in {1, \dots, N}

, and A is Gaussian random matrix with each element is mean 0 and variance

\frac{1}{T_{d}}

, then

A D^{d}

is complex Gaussian random matrix. Then

\bar{A}

is Gaussian random matrix with mean 0 and variance

\frac{ρ^{d}}{2 T_{d}}

.

According to the random matrix theory, for

N \times K

dimensional random matrix H with each element is independent and is variable with mean 0 and variance 1/N, when

K, N \to \infty

and

\frac{K}{N} \to β

,then the empirical distribution of eigenvalues of

H^{T} H

converges almost surely as

f_{β} (x) = {(1 - \frac{1}{β})}^{+} δ (x) + \frac{\sqrt{{(x - a)}^{+} {(b - x)}^{+}}}{2 π β x}

, where

{(x)}^{+} = \max (0, x)

,

a = {(1 - \sqrt{β})}^{2}

,

b = {(1 + \sqrt{β})}^{2}

.

Since

\bar{A} \in ℝ^{2 T_{d} \times 2 M}

, and its element is Gaussian random variable with mean 0 and variance

\frac{ρ^{d}}{2 T_{d}}

By applying the above results for the empirical distribution of eigenvalues of

H^{T} H

, when

T_{d}, M \to \infty

and

\frac{T_{d}}{M} = β

, the empirical distribution of eigenvalues

λ

of

{\bar{A}}^{T} \bar{A}

converges almost surely as

f_{β} (λ) = {(1 - \frac{1}{β})}^{+} δ (λ) + \frac{\sqrt{{(λ - a')}^{+} {(b' - λ)}^{+}}}{2 π β λ \sqrt{ρ^{d}}}

(A17)

where

a' = \sqrt{ρ^{d}} {(1 - \sqrt{β})}^{2}

,

b' = \sqrt{ρ^{d}} {(1 + \sqrt{β})}^{2}

. When s

, M \to \infty

and

\frac{s}{M} = μ

, we have

\begin{array}{l} E {{(\bar{h} - \hat{\bar{h}})}^{H} (\bar{h} - \hat{\bar{h}})} & \geq | S | \cdot \frac{1}{| S |} \sum_{i \in S} \frac{1}{\frac{1 + c}{a \min (d)} + + \frac{λ_{i}}{σ}} + (N - | S |) \cdot \frac{1}{(N - | S |)} \sum_{i \notin S} \frac{1}{\frac{\max (b)}{a} + \frac{λ_{i}}{σ}} \\ \to | S | \frac{a \min (d)}{1 + c} (1 - \frac{F (s n r_{1}, β)}{4 β s n r_{1}}) + (N - | S |) \frac{a}{\max (b)} (1 - \frac{F (s n r_{2}, β)}{4 β s n r_{2}}) \end{array},

(A18)

where

s n r_{1} = \frac{a \min (d)}{(1 + c) σ}

,

s n r_{2} = \frac{a}{σ \max (b)}

and

F (x, z) = {(\sqrt{x {(1 + \sqrt{z})}^{2} + 1} - \sqrt{x {(1 - \sqrt{z})}^{2} + 1})}^{2}

.

Then the proofs are complete. □

References

Lu, L.; Li, G.Y.; Swindlehurst, A.L.; Ashikhmin, A.; Zhang, R. An overview of Massive MIMO: Benefits and challenges. IEEE J. Sel. Top. Signal Process. 2014, 8, 742–758. [Google Scholar] [CrossRef]
Rao, X.; Lau, V.K.N. Distributed compressive CSIT estimation and feedback for FDD multi-user massive MIMO systems. IEEE Trans. Signal Process. 2014, 12, 3261–3271. [Google Scholar]
Vaswani, N.; Lu, W. Modified-CS: Modifying compressive sensing for problems with partially known support. IEEE Trans. Signal Process. 2010, 9, 4595–4607. [Google Scholar] [CrossRef]
Borries, R.V.; Miosso, C.; Potes, C. Compressed sensing using priori information. In Proceedings of the 2nd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive, St. Thomas, VI, USA, 12–14 December 2007; pp. 121–124. [Google Scholar]
Tseng, C.C.; Wu, J.Y.; Lee, T.S. Enhanced compressive downlink CSI Recovery for FDD Massive MIMO systems using weighted Block l₁ minimization. IEEE Trans. Commun. 2016, 3, 1055–1066. [Google Scholar] [CrossRef]
Lu, W.; Wang, Y.; Fang, Q.; Peng, S. Downlink compressive channel estimation with support diagnosis in FDD massive MIMO. J. Wirel. Commun. Netw. 2018, 115, 1–12. [Google Scholar] [CrossRef]
Masood, M.; Afify, L.H.; Al-Naffouri, T.Y. Efficient coordinated recovery of sparse channels in massive MIMO. IEEE Trans. Signal Process. 2015, 1, 104–118. [Google Scholar] [CrossRef]
Cheng, X.; Sun, J.; Li, S. Channel estimation for FDD multi-user massive MIMO: A variational Bayesian inference-based approach. IEEE Trans. Wirel. Commun. 2017, 11, 7590–7602. [Google Scholar] [CrossRef]
Dai, J.; Liu, A.; Lau, V.K.N. FDD massive MIMO channel estimation with arbitrary 2D-Array Geometry. IEEE Trans. Signal Process. 2018, 10, 2584–2599. [Google Scholar] [CrossRef]
Xie, H.; Gao, F.; Jin, S.; Fang, J.; Liang, Y. Channel Estimation for TDD/FDD Massive MIMO Systems with Channel Covariance computing. IEEE Trans. Wirel. Commun. 2018, 6, 4206–4218. [Google Scholar] [CrossRef]
Shen, W.; Dai, L.; Shi, Y.; Shim, B.; Wang, Z. Joint Channel Training and Feedback for FDD Massive MIMO Systems. IEEE Trans. Veh. Technol. 2016, 10, 8762–8767. [Google Scholar] [CrossRef]
Tauböck, G.; Hlawatsch, F.; Eiwen, D.; Rauhut, H. Compressive Estimation of Doubly Selective Channels in Multicarrier Systems: Leakage Effects and Sparsity-Enhancing Processing. IEEE J. Sel. Top. Signal Process. 2010, 2, 255–271. [Google Scholar] [CrossRef]
Tipping, M.E. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 2001, 1, 211–244. [Google Scholar]
Christopher, M. Bishop. Bayesian Linear Regression. In Pattern Recognition and Machine Learning; Springer-Verlag: Heidelberg, Germany, 2006; pp. 152–160. [Google Scholar]
Fang, J.; Shen, Y.; Li, F.; Li, H.; Chen, Z. Support knowledge-aided sparse Bayesian learning for compressed sensing. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, Brisbane, Australia, 19–24 April 2015; pp. 3786–3790. [Google Scholar]
Ji, S.; Xue, Y.; Carin, L. Bayesian compressive sensing. IEEE Trans. Signal Process. 2008, 6, 2346–2356. [Google Scholar] [CrossRef]
Lu, W.; Wang, Y.; Fang, Q.; Peng, S. Compressive Channel Estimation Based on Weighted IRLS in FDD Massive MIMO. Wirel. Pers. Commun. 2018, 2, 1–10. [Google Scholar] [CrossRef]
Candes, E.J.; Romberg, J.K.; Tao, T. Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 2006, 8, 1207–1223. [Google Scholar] [CrossRef]

Figure 1. Graphical model for the channel estimation with Bayesian inference. The nodes with double circle, single circle and square correspond to the observed data, hidden variables and parameters, respectively.

Figure 2. Channel estimations by group. Each block represents one timeslot, and the block filled with grey is the timeslot with variational Bayesian inference (VBI) for the channel estimation, while the blank blocks are the timeslots with the proposed algorithm for channel estimation.

Figure 3. Ellipse geometry channel model for line of sight (LoS) and non-LoS (NLoS) transmission.

Figure 4. Transmission angle change during one timeslot with a different LoS distance and different distances between the base station (BS) and reflector.

Figure 5. Comparisons of channel estimation mean square error (MSE) for different algorithms.

Figure 6. Comparisons of channel estimation of MSE for the proposed algorithm with different dictionary sizes.

Figure 7. Comparisons of runtime and convergence performances of the proposed algorithm with orthogonal dictionary (size is 100) and overcomplete dictionary (size is 150).

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lu, W.; Wang, Y.; Wen, X.; Peng, S.; Zhong, L. Downlink Channel Estimation in Massive Multiple-Input Multiple-Output with Correlated Sparsity by Overcomplete Dictionary and Bayesian Inference. Electronics 2019, 8, 473. https://doi.org/10.3390/electronics8050473

AMA Style

Lu W, Wang Y, Wen X, Peng S, Zhong L. Downlink Channel Estimation in Massive Multiple-Input Multiple-Output with Correlated Sparsity by Overcomplete Dictionary and Bayesian Inference. Electronics. 2019; 8(5):473. https://doi.org/10.3390/electronics8050473

Chicago/Turabian Style

Lu, Wei, Yongliang Wang, Xiaoqiao Wen, Shixin Peng, and Liang Zhong. 2019. "Downlink Channel Estimation in Massive Multiple-Input Multiple-Output with Correlated Sparsity by Overcomplete Dictionary and Bayesian Inference" Electronics 8, no. 5: 473. https://doi.org/10.3390/electronics8050473

APA Style

Lu, W., Wang, Y., Wen, X., Peng, S., & Zhong, L. (2019). Downlink Channel Estimation in Massive Multiple-Input Multiple-Output with Correlated Sparsity by Overcomplete Dictionary and Bayesian Inference. Electronics, 8(5), 473. https://doi.org/10.3390/electronics8050473

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Downlink Channel Estimation in Massive Multiple-Input Multiple-Output with Correlated Sparsity by Overcomplete Dictionary and Bayesian Inference

Abstract

1. Introduction

2. System Model

3. Proposed Algorithm

4. Discussion

4.1. Sparsity Correlation Analysis

4.2. Bayesian Cramér-Rao Bound Analysis

5. Simulations

6. Conclusions

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

Appendix A

Appendix B

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI