
Downlink Channel Estimation in Massive Multiple-Input Multiple-Output with Correlated Sparsity by Overcomplete Dictionary and Bayesian Inference

1 Air Force Early Warning Academy, Wuhan 430019, China
2 National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China
3 Department of Communication System, China University of Geosciences, Wuhan 430074, China
* Author to whom correspondence should be addressed.
Electronics 2019, 8(5), 473; https://doi.org/10.3390/electronics8050473
Submission received: 24 March 2019 / Revised: 19 April 2019 / Accepted: 24 April 2019 / Published: 28 April 2019
(This article belongs to the Special Issue Massive MIMO Systems)

Abstract: We exploited the temporal correlation of channels in the angular domain for downlink channel estimation in a massive multiple-input multiple-output (MIMO) system. Based on the slowly time-varying channel supports in the angular domain, we combined the support information of the downlink angular channel in the previous timeslot into the channel estimation in the current timeslot. A downlink channel estimation method based on variational Bayesian inference (VBI) and an overcomplete dictionary was proposed, in which the support prior information of the previous timeslot was merged into the VBI for channel estimation in the current timeslot. Meanwhile, the VBI was derived for complex values in our system model, and the structural sparsity was utilized in the Bayesian inference. The Bayesian Cramér–Rao bound for the channel estimation mean square error (MSE) was also derived. Compared with other algorithms, the proposed algorithm with an overcomplete dictionary achieved better channel estimation MSE performance in simulations.

1. Introduction

Massive multiple-input multiple-output (MIMO) is a key technology for next-generation wireless communication. The large number of antennas enables high spectral efficiency and lower power consumption [1]. To obtain these benefits, the base station (BS) needs to acquire the channel state information (CSI) for the uplink and downlink. Pilot-based channel estimation is widely used in wireless communication systems. In a time division duplex (TDD) system, channel reciprocity is used to obtain the CSI by estimating only the uplink channel at the BS. In a frequency division duplex (FDD) system, channel reciprocity cannot be used directly, and it is challenging to obtain the downlink CSI with the conventional feedback scheme, in which each user estimates its channel and feeds the estimated CSI back to the BS. The pilot and feedback overheads are high for massive MIMO, since they scale linearly with the number of antennas. Hence, it is important to design an efficient downlink channel estimation and feedback scheme for an FDD massive MIMO system.
By exploiting the sparsity of the massive MIMO channel, compressed sensing (CS) has been applied to channel estimation and feedback. The users can feed the compressed training measurements back to the BS, and orthogonal matching pursuit (OMP) was used for downlink CSI recovery in [2]. In [3] the modified basis pursuit (MBP) was proposed, utilizing partial prior signal support information to improve the recovery performance. In [4] the support information of a signal in the discrete Fourier transform domain was incorporated into a weighted l1-minimization approach for CS recovery, which reduces the number of required measurements by the size of the known part of the support. In [5] a three-level weighting scheme based on the support information was used for weighted l1 minimization, and the simulation results showed its superiority. In [6] we exploited the reciprocity between uplink and downlink channels in the angular domain, diagnosed the supports of the downlink channel from the estimated uplink channel, and proposed a weighted subspace pursuit (SP) channel estimation algorithm for FDD massive MIMO. It can be seen that CS is effective for channel estimation in massive MIMO.
However, most of these algorithms require the sparsity level as an input, which is not practical in engineering scenarios. The Bayesian framework can be applied to compressive channel estimation instead. In [7], Bayesian estimation of the sparse massive MIMO channel was developed, in which neighboring antennas share their information about the channel support with each other. In [8] a variational expectation-maximization strategy was used for massive MIMO channel estimation, with a Gaussian mixture prior designed to capture the individual sparsity of each channel and the joint sparsity among users. In [9] a sparse Bayesian learning algorithm was proposed for FDD massive MIMO channel estimation with an arbitrary 2D array. With the Bayesian framework, the sparsity level is not needed and the recovery performance is relatively better. Additionally, there exists angular reciprocity in massive MIMO; for example, in [10] the channel covariance matrices for uplink and downlink were reconstructed by making use of the angle reciprocity between uplink and downlink channels. Hence, it is promising to apply angular reciprocity and the Bayesian framework to compressive massive MIMO channel estimation.
Besides angular reciprocity, FDD massive MIMO channels also exhibit temporal correlation. In [11] a differential compressive feedback scheme for FDD massive MIMO was proposed based on the channel impulse responses (CIRs) between timeslots, which are slowly time-varying and sparse, so that the differential CIR between two adjacent timeslots is sparse. Inspired by the sparsity in the angular domain and the temporal correlation of channels, the correlated angular sparsity can likewise be exploited for massive MIMO channel estimation.
In this paper we propose a downlink channel estimation scheme for a TDD/FDD massive MIMO system. The timeslots are divided into groups, and within each group the estimated channel support information of the previous timeslot is utilized by the following timeslot. The correlated angular sparsity of the downlink channel between timeslots is exploited in the Bayesian inference for channel recovery. We transform the complex sparse-vector recovery into a real sparse-vector recovery by Bayesian inference and utilize the structural sparsity of the transformed real sparse vector. Meanwhile, the prior support information from the estimated channel of the previous timeslot is used in modeling the hidden hyperparameters of the Bayesian model. A Bayesian Cramér–Rao bound analysis is presented, and simulations verify the performance of the proposed algorithm. The main contributions are as follows: (1) a group-based channel estimation scheme is proposed, in which the previously estimated channel support information is used as prior information in the following timeslot thanks to the sparsity correlation; (2) this prior information is merged into the Bayesian inference algorithm for channel recovery; (3) the Bayesian Cramér–Rao bound on the channel estimation mean square error (MSE) is analyzed.
The system model is described in Section 2, and the proposed channel estimation algorithm based on Bayesian inference is presented in Section 3. The Bayesian Cramér–Rao bound (BCRB) on the channel estimation MSE is derived in Section 4. Simulations and conclusions are presented in Section 5 and Section 6.
In the paper, we used the following notations. Scalars, vectors and matrices were denoted by lower-case, boldface lower-case and boldface upper-case symbols. The probability density function of a given random variable was denoted by p(·). Gamma(x|a, b) was the Gamma probability density function (PDF) with shape parameters a and b for x, while Normal(x|c, d) was the Gaussian PDF with parameters mean c and variance d for x. Γ(·) was the Gamma function, and ln(·) was the logarithm function. Tr(·) stood for the trace operator. 𝔼a(·) denoted the expectation operation with the PDF of variable a.

2. System Model

We considered a massive MIMO TDD/FDD system with a single user, and assumed that the BS was equipped with $N$ antennas while the user terminal (UT) had a single antenna. For the downlink channel estimation, the BS transmitted pilots to the UT; the UT received the pilots and fed the received signal back to the BS directly. The received signal $\mathbf{y}_d(t)$ at the UT in the $t$-th timeslot is written as
$$\mathbf{y}_d(t) = \rho_d \mathbf{A}\,\mathbf{h}_d(t) + \mathbf{n}_d(t) \qquad (1)$$
where $\mathbf{h}_d(t) \in \mathbb{C}^{N\times 1}$ is the downlink channel, $\mathbf{A} \in \mathbb{C}^{T_d\times N}$ is the downlink pilot matrix, $T_d$ is the pilot length, $\rho_d$ is the downlink received power, $\mathbf{n}_d \in \mathbb{C}^{T_d\times 1}$ is the received noise with i.i.d. Gaussian entries of mean 0 and variance $\sigma^2$, and $\mathbf{y}_d(t) \in \mathbb{C}^{T_d\times 1}$ is the received signal at the UT.
Massive MIMO channels admit a sparse representation. Let $\mathbf{D}_d \in \mathbb{C}^{N\times M}$ be the channel dictionary for the downlink channel, which can be a unitary dictionary or an overcomplete dictionary ($M > N$); its column vectors have the form of steering vectors with different sampling angles. Then $\mathbf{h}_a^d(t)$ is the sparse representation satisfying $\mathbf{h}_d(t) = \mathbf{D}_d \mathbf{h}_a^d(t)$. In this paper we apply an overcomplete dictionary to represent the sparse angular channel and obtain better recovery performance. In the downlink channel estimation, we need to obtain $\hat{\mathbf{h}}_a^d(t)$, the estimated downlink channel in the angular domain in the $t$-th timeslot.
By utilizing the sparse channel representation we then had
$$\mathbf{y}_d(t) = \rho_d \mathbf{A}\mathbf{D}_d \mathbf{h}_a^d(t) + \mathbf{n}_d(t) \qquad (2)$$
For simplicity, the timeslot index is omitted in the following equations. Since $\mathbf{y}_d(t)$, $\mathbf{h}_a^d(t)$, and $\mathbf{n}_d(t)$ are complex vectors, we can rewrite Equation (2) in terms of real vectors as
$$\begin{bmatrix} \operatorname{Re}(\mathbf{y}_d) \\ \operatorname{Im}(\mathbf{y}_d) \end{bmatrix} = \begin{bmatrix} \operatorname{Re}(\rho_d \mathbf{A}\mathbf{D}_d) & -\operatorname{Im}(\rho_d \mathbf{A}\mathbf{D}_d) \\ \operatorname{Im}(\rho_d \mathbf{A}\mathbf{D}_d) & \operatorname{Re}(\rho_d \mathbf{A}\mathbf{D}_d) \end{bmatrix} \begin{bmatrix} \operatorname{Re}(\mathbf{h}_a^d(t)) \\ \operatorname{Im}(\mathbf{h}_a^d(t)) \end{bmatrix} + \begin{bmatrix} \operatorname{Re}(\mathbf{n}_d(t)) \\ \operatorname{Im}(\mathbf{n}_d(t)) \end{bmatrix} \qquad (3)$$
where Re(·) and Im(·) denote the real and imaginary parts respectively. For simplicity, we rewrote Equation (3) as
$$\bar{\mathbf{y}} = \bar{\mathbf{A}}\bar{\mathbf{h}} + \bar{\mathbf{n}} \qquad (4)$$
where $\bar{\mathbf{y}} = \begin{bmatrix}\operatorname{Re}(\mathbf{y}_d) \\ \operatorname{Im}(\mathbf{y}_d)\end{bmatrix}$, $\bar{\mathbf{A}} = \begin{bmatrix}\operatorname{Re}(\rho_d\mathbf{A}\mathbf{D}_d) & -\operatorname{Im}(\rho_d\mathbf{A}\mathbf{D}_d) \\ \operatorname{Im}(\rho_d\mathbf{A}\mathbf{D}_d) & \operatorname{Re}(\rho_d\mathbf{A}\mathbf{D}_d)\end{bmatrix}$, $\bar{\mathbf{h}} = \begin{bmatrix}\operatorname{Re}(\mathbf{h}_a^d(t)) \\ \operatorname{Im}(\mathbf{h}_a^d(t))\end{bmatrix}$, and $\bar{\mathbf{n}} = \begin{bmatrix}\operatorname{Re}(\mathbf{n}_d(t)) \\ \operatorname{Im}(\mathbf{n}_d(t))\end{bmatrix}$.
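As a concrete check of the real-valued stacking in Equations (3) and (4), the following sketch builds a toy instance of the model (the pilot matrix, steering-vector dictionary, and channel supports here are illustrative assumptions, not the paper's simulation setup, and $\rho_d = 1$ is taken for simplicity) and verifies that the stacked real system reproduces the complex one:

```python
import numpy as np

rng = np.random.default_rng(0)
Td, N, M = 50, 100, 150          # pilot length, BS antennas, dictionary size (Section 5 values)

# Illustrative pilot matrix A and overcomplete steering-vector dictionary Dd
A = (rng.standard_normal((Td, N)) + 1j * rng.standard_normal((Td, N))) / np.sqrt(2 * Td)
angles = np.pi * np.arange(M) / M
Dd = np.exp(-1j * np.pi * np.outer(np.arange(N), np.cos(angles))) / np.sqrt(N)

# Sparse angular channel with an assumed support {4, 5, 6}
h_ad = np.zeros(M, dtype=complex)
h_ad[[4, 5, 6]] = rng.standard_normal(3) + 1j * rng.standard_normal(3)

Phi_c = A @ Dd                   # effective complex measurement matrix
y = Phi_c @ h_ad                 # noiseless received pilots, Eq. (2)

# Real stacking of Eq. (3): [Re; Im] measurements and the 2x2 block matrix
y_bar = np.concatenate([y.real, y.imag])
A_bar = np.block([[Phi_c.real, -Phi_c.imag],
                  [Phi_c.imag,  Phi_c.real]])
h_bar = np.concatenate([h_ad.real, h_ad.imag])

# The stacked real model of Eq. (4) reproduces the complex measurements
assert np.allclose(y_bar, A_bar @ h_bar)
```

Note the minus sign in the top-right block: it is what makes $\operatorname{Re}(\mathbf{\Phi}\mathbf{h}) = \operatorname{Re}(\mathbf{\Phi})\operatorname{Re}(\mathbf{h}) - \operatorname{Im}(\mathbf{\Phi})\operatorname{Im}(\mathbf{h})$ come out correctly.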
On the other hand, consider the meaning of the sparse angular channel representation $\mathbf{h}_a^d(t)$. If a transmission angle falls exactly on a sampling point of the channel dictionary $\mathbf{D}_d$, the corresponding coefficient of $\mathbf{h}_a^d(t)$ is nonzero. If the number of paths is smaller than the number of antennas, $\mathbf{h}_a^d(t)$ is sparse. However, the leakage effect induced by dictionary mismatch deteriorates the sparsity of the angular representation [12]. When the movement velocity of the UT is not very high, e.g., v = 12 km/h, and the typical timeslot duration is τ = 0.5 ms, the movement distance of the UT in one timeslot was 0.017 m. When the distance between the UT and the BS is 200 m, the angle change of the line-of-sight (LoS) path in one timeslot is 0.0049°, much smaller than the sampling interval of the dictionary. For non-LoS (NLoS) paths the angle change is also small, as discussed in Section 4.1. Hence, the transmission angle change between two timeslots is very small as long as the transmission environment does not change dramatically, and the angular channel sparsity is correlated between adjacent timeslots. In other words, the support information of the estimated angular channel in the previous timeslot can be utilized in the current channel estimation.
It has been shown that prior support information can improve channel recovery performance [3,4,5,6]. Hence, in this paper we make use of the prior support information from the previous timeslot to improve the Bayesian channel estimation. The following section discusses how to merge this prior information into the Bayesian inference algorithm for channel estimation.

3. Proposed Algorithm

We designed a three-layer hierarchical graphical model as shown in Figure 1. In the first layer, h ¯ was assigned a Gaussian prior distribution
$$p(\bar{\mathbf{h}}|\boldsymbol{\alpha}) = \prod_{i=1}^{2N} p(\bar{h}_i|\alpha_i) \qquad (5)$$
where $\bar{h}_i$ and $\alpha_i$ are the $i$-th entries of $\bar{\mathbf{h}}$ and $\boldsymbol{\alpha}$ respectively, $p(\bar{h}_i|\alpha_i) = \mathrm{Normal}(\bar{h}_i|0, \alpha_i)$, and $\alpha_i$ is the inverse variance of the Gaussian distribution. When $\bar{h}_i$ is close to 0, $\alpha_i$ is very large, and vice versa.
In the second layer, we assumed Gamma hyperpriors over the hyperparameters $\alpha_i$:
$$p(\boldsymbol{\alpha}) = \prod_{i=1}^{2N} \mathrm{Gamma}(\alpha_i|a_i, b_i) \qquad (6)$$
where $\mathrm{Gamma}(\cdot)$ is the Gamma PDF, and the parameters $a_i$ and $b_i$ characterize its shape. For fixed $a_i$, the larger $b_i$ is, the smaller $\alpha_i$ is, and $\bar{h}_i$ then tends to be nonzero. In sparse Bayesian learning, $a_i$ and $b_i$ are set to be very small to give a non-informative hyperprior over $\alpha_i$ [13].
In our model, we set $a_i$ to be constant with a predefined value, and we modeled the $b_i$ as random parameters. As shown in Figure 1, the entries of $\bar{\mathbf{h}}$ were divided into two sets by their indices, i.e., $\{S, S+N\}$ and $\{S, S+N\}^c$, where $S$ is the set of channel support indices from the previous timeslot, and $S+N$ is the set with each index in $S$ increased by $N$, since the complex system model was converted to the real model in Equation (3). $\{S, S+N\}^c$ is the complement of $\{S, S+N\}$. For example, if in the $(t-1)$-th timeslot the positions of the nonzero entries (the supports) of $\mathbf{h}_a^d(t-1)$ were $S = \{4, 5, 6\}$, then $S+N = \{4+N, 5+N, 6+N\}$. For simplicity, the probable supports of $\mathbf{h}_a^d(t)$ in the current $t$-th timeslot can be assumed to be the same as in the previous $(t-1)$-th timeslot. Alternatively, the probable channel supports can be diagnosed further by taking the angle deviation and leakage effects into consideration. In this paper we adopted the support diagnosis algorithm, whose details can be found in [6].
For the indices $i \in \{S, S+N\}$, we employed a Gamma distribution over the hyperparameters $b_i$ in the third layer:
$$\mathrm{Gamma}(b_i|c,d) = \Gamma(c)^{-1}\, d^{c}\, b_i^{c-1}\, e^{-d b_i} \qquad (7)$$
where $c$ and $d$ characterize the shape of the Gamma PDF. Given this system model and these assumptions for massive MIMO, we can use Bayesian inference to perform the sparse channel recovery.
According to the standard Bayesian inference [14], let $\mathbf{z} \triangleq \{\bar{\mathbf{h}}, \boldsymbol{\alpha}, \mathbf{b}\}$; then we have
$$\ln p(z_j) = \mathbb{E}_{z_i,\, i\neq j}[\ln p(\bar{\mathbf{y}}, \mathbf{z})] + \mathrm{const} \propto \mathbb{E}_{z_i,\, i\neq j}[\ln p(\bar{\mathbf{y}}, \mathbf{z})] \qquad (8)$$
where const is a normalization constant for $p(z_j)$, $p(\bar{\mathbf{y}}, \mathbf{z})$ is the joint PDF of $\bar{\mathbf{y}}$ and $\mathbf{z}$, and $z_j$ ranges over $\bar{\mathbf{h}}$, $\boldsymbol{\alpha}$, and $\mathbf{b}$. We have $p(\bar{\mathbf{y}}, \mathbf{z}) = p(\mathbf{z}|\bar{\mathbf{y}})\, p(\bar{\mathbf{y}})$. We assume posterior independence among the hidden variables $\mathbf{z}$, so that $p(\mathbf{z}|\bar{\mathbf{y}})$ factorizes into the product of the PDFs of $\bar{\mathbf{h}}$, $\boldsymbol{\alpha}$, and $\mathbf{b}$.
In order to make use of the prior support information from the previous timeslot and the structure sparsity in Equation (4), we needed to make some modifications to the standard Bayesian inference. The main considerations for the modifications were as follows:
(I) Since we rewrote Equation (2) as Equation (4), if $h_{a,i}^d$ is nonzero, then $\bar{h}_i$ and $\bar{h}_{i+N}$ are nonzero simultaneously. Hence it is sensible to assume that $b_i$ and $b_{i+N}$ are the same;
(II) In standard sparse Bayesian learning, $a_i$ and $b_i$ are set to be very small to give a non-informative hyperprior over $\alpha_i$. This assumption is valid when no prior information is available. When prior support information is available, as when the support information of the previous timeslot can be reused thanks to the sparsity correlation, it is reasonable to assume that the supports of adjacent timeslots are partially common. If the $i$-th element of the angular channel vector is nonzero, then the hyperparameters $b_i$ and $b_{i+N}$ are treated as random variables rather than fixed small numbers; that is, the third-layer prior model is adopted only for the indices in the prior support set $S$.
Consideration (II) is similar to [15]; however, our proposed algorithm is extended to a complex-valued system, exploits the structural sparsity, and adopts an overcomplete dictionary.
The proposed uplink-aided downlink channel estimation based on Bayesian inference was as follows:
(i) Update of p( h ¯ )
According to Equation (8), by ignoring the terms that are independent of $\bar{\mathbf{h}}$, we have
$$\ln p(\bar{\mathbf{h}}) \propto \mathbb{E}_{\boldsymbol{\alpha},\mathbf{b}}[\ln p(\bar{\mathbf{y}}|\bar{\mathbf{h}}) + \ln p(\bar{\mathbf{h}}|\boldsymbol{\alpha})] = -\frac{1}{2\sigma^2}(\bar{\mathbf{y}} - \bar{\mathbf{A}}\bar{\mathbf{h}})^T(\bar{\mathbf{y}} - \bar{\mathbf{A}}\bar{\mathbf{h}}) - \frac{1}{2}\bar{\mathbf{h}}^T\boldsymbol{\Lambda}\bar{\mathbf{h}} \qquad (9)$$
where $\boldsymbol{\Lambda} = \operatorname{diag}\{\mathbb{E}_{\boldsymbol{\alpha}}[\alpha_i]\}$, $\sigma^2$ is the noise variance in the system model, and the vectors $\mathbf{b}$ and $\boldsymbol{\alpha}$ consist of the $b_i$ and $\alpha_i$ respectively. Since $p(\bar{\mathbf{y}}|\bar{\mathbf{h}})$ and $p(\bar{\mathbf{h}}|\boldsymbol{\alpha})$ are Gaussian, $p(\bar{\mathbf{h}})$ follows a Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Phi}$ given by
$$\boldsymbol{\mu} = \frac{1}{\sigma^2}\boldsymbol{\Phi}\bar{\mathbf{A}}^T\bar{\mathbf{y}} \qquad (10)$$
$$\boldsymbol{\Phi} = \left(\frac{1}{\sigma^2}\bar{\mathbf{A}}^T\bar{\mathbf{A}} + \boldsymbol{\Lambda}\right)^{-1} \qquad (11)$$
(ii) Update of p( α )
According to Equation (8), by ignoring the terms that are independent of $\boldsymbol{\alpha}$, we have
$$\begin{aligned}
\ln p(\boldsymbol{\alpha}) &\propto \mathbb{E}_{\bar{\mathbf{h}},\mathbf{b}}\big[\ln p(\bar{\mathbf{h}}|\boldsymbol{\alpha}) + \ln p(\boldsymbol{\alpha}|\mathbf{a},\mathbf{b})\big] \\
&= \sum_{i=1}^{2N}\mathbb{E}_{\bar{\mathbf{h}},\mathbf{b}}\big\{(a_i + 0.5 - 1)\ln\alpha_i - (0.5\,\bar{h}_i^2 + b_i)\alpha_i\big\} \\
&= \sum_{i\in\{S,S+N\}}\Big\{(a_i + 0.5 - 1)\ln\alpha_i - \Big(\tfrac{\mathbb{E}_{\bar{\mathbf{h}},\mathbf{b}}(b_i + b_{i+N})}{2} + \tfrac{\mathbb{E}_{\bar{\mathbf{h}},\mathbf{b}}(\bar{h}_i^2 + \bar{h}_{i+N}^2)}{4}\Big)\alpha_i\Big\} \\
&\quad + \sum_{i\in\{S,S+N\}^c}\Big\{(a_i + 0.5 - 1)\ln\alpha_i - \Big(b_i + \tfrac{\mathbb{E}_{\bar{\mathbf{h}},\mathbf{b}}(\bar{h}_i^2 + \bar{h}_{i+N}^2)}{4}\Big)\alpha_i\Big\}
\end{aligned} \qquad (12)$$
where $S$ is the support set estimated in the previous timeslot. Since the complex system model was converted to the real model in Equation (4), by (II) the set $S+N \triangleq \{s_i + N\}$ is also a support set in the converted model. For $i\in\{S, S+N\}$, $b_i$ is a random variable; $b_i$ and $b_{i+N}$ were assumed to be equal, so we used $0.5\,\mathbb{E}_{\bar{\mathbf{h}},\mathbf{b}}(b_i + b_{i+N})$ in place of $\mathbb{E}_{\bar{\mathbf{h}},\mathbf{b}}(b_i)$. The same assumption was applied to $\bar{h}_i$ and $\bar{h}_{i+N}$, with $\mathbb{E}_{\bar{\mathbf{h}},\mathbf{b}}(\bar{h}_i^2) = 0.5\,\mathbb{E}_{\bar{\mathbf{h}},\mathbf{b}}(\bar{h}_i^2 + \bar{h}_{i+N}^2)$. In this way the structural sparsity was utilized.
Since $p(\boldsymbol{\alpha}|\mathbf{a},\mathbf{b})$ is a Gamma distribution and $p(\bar{\mathbf{h}}|\boldsymbol{\alpha})$ is Gaussian, $p(\boldsymbol{\alpha})$ is a Gamma distribution; each $p(\alpha_i)$ is Gamma with the updated parameters $\tilde{a}_i$ and $\tilde{b}_i$ given by
$$\tilde{a}_i = a_i + 0.5 \qquad (13)$$
$$\tilde{b}_i = \begin{cases} \dfrac{\mathbb{E}(b_i + b_{i+N})}{2} + \dfrac{\mathbb{E}(\bar{h}_i^2 + \bar{h}_{i+N}^2)}{4}, & i\in\{S, S+N\} \\[4pt] b_i + \dfrac{\mathbb{E}(\bar{h}_i^2 + \bar{h}_{i+N}^2)}{4}, & i\in\{S, S+N\}^c \end{cases} \qquad (14)$$
(iii) Update of $p(\mathbf{b}_{\{S,S+N\}})$
According to Equation (8), by ignoring the terms that are independent of $\mathbf{b}$, we have
$$\ln p(\mathbf{b}_{\{S,S+N\}}) \propto \mathbb{E}_{\boldsymbol{\alpha},\bar{\mathbf{h}}}\big[\ln p(\boldsymbol{\alpha}|\mathbf{a},\mathbf{b}) + \ln p(\mathbf{b}|\mathbf{c},\mathbf{d})\big] = \sum_{i\in\{S,S+N\}}\big\{a_i\ln b_i - b_i\,\mathbb{E}_{\boldsymbol{\alpha}}(\alpha_i) + (c_i - 1)\ln b_i - d_i b_i\big\} \qquad (15)$$
where $\mathbf{b}_{\{S,S+N\}}$ consists of the entries of $\mathbf{b}$ indexed by $\{S,S+N\}$. In (15), $\boldsymbol{\alpha}$, $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, and $\mathbf{d}$ are likewise restricted to the indices in $\{S,S+N\}$; the subscript is omitted for simplicity. As shown in Figure 1, $\mathbf{b}_{\{S,S+N\}}$ was modelled with a Gamma distribution. Since $p(\alpha_i|a_i,b_i)$ and $p(b_i|c_i,d_i)$ are Gamma distributions, the updated posterior is $\mathrm{Gamma}(b_i|\tilde{c}_i, \tilde{d}_i)$, with $\tilde{c}_i$ and $\tilde{d}_i$ given by
$$\tilde{c}_i = a_i + c_i \qquad (16)$$
$$\tilde{d}_i = d_i + \mathbb{E}_{\boldsymbol{\alpha}}(\alpha_i) \qquad (17)$$
The Bayesian inference for the channel estimation is then executed iteratively among (i), (ii), and (iii). The details are summarized in step 3 of Algorithm 1. Once the estimated channel vector $\bar{\mathbf{h}}$ is recovered, it is converted back to the complex vector $\mathbf{h}_a^d$ according to Equation (3).
Algorithm 1 Downlink channel estimation with variational inference algorithm and overcomplete dictionary.
Input: A ¯ , y ¯ , σ 2
Output: h ¯
  • Step 1. Divide the timeslots into groups, with each group comprising $t_g$ timeslots.
  • Step 2. For the first timeslot in the group, use variational Bayesian inference (VBI) for channel estimation, and obtain the angular channel supports.
  • Step 3. For the remaining timeslots in the group, utilize the support information from the previous timeslot for channel estimation, one timeslot at a time. The recovery algorithm in each timeslot is as follows:
    3.1. Initialize $\boldsymbol{\alpha}$, $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, $\mathbf{d}$.
    3.2. Compute $\boldsymbol{\mu} = \frac{1}{\sigma^2}\boldsymbol{\Phi}\bar{\mathbf{A}}^T\bar{\mathbf{y}}$, $\boldsymbol{\Phi} = \big(\frac{1}{\sigma^2}\bar{\mathbf{A}}^T\bar{\mathbf{A}} + \boldsymbol{\Lambda}\big)^{-1}$, and $\mathbb{E}(\bar{h}_i^2) = \mu_i^2 + \Phi_{i,i}$, where $\boldsymbol{\Lambda} = \operatorname{diag}\{\mathbb{E}_{\boldsymbol{\alpha}}[\alpha_i]\}$, $\mu_i$ is the $i$-th entry of $\boldsymbol{\mu}$, and $\Phi_{i,i}$ is the $i$-th diagonal entry of $\boldsymbol{\Phi}$.
    3.3. Update $\tilde{a}_i$ and $\tilde{b}_i$ according to Equations (13) and (14) in (ii) ($a_i$ and $b_i$ are the results from the last iteration); then, by the mean of a Gamma-distributed variable, $\mathbb{E}_{\boldsymbol{\alpha}}(\alpha_i) = \tilde{a}_i/\tilde{b}_i$.
    3.4. Update $\tilde{c}_i$ and $\tilde{d}_i$ according to Equations (16) and (17) in (iii) ($c_i$ and $d_i$ are the results from the last iteration); then $\mathbb{E}(b_i) = \tilde{c}_i/\tilde{d}_i$.
    3.5. Return to step 3.2 until the stopping criterion is met.
    3.6. Output $\bar{\mathbf{h}} = \boldsymbol{\mu}$.
  • Step 4. Go back to step 2 for a new group of timeslots.
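The per-timeslot recovery in step 3 of Algorithm 1 can be sketched as follows. The hyperparameter values, problem sizes, and the toy usage at the end are illustrative assumptions; the function only mirrors the update order 3.1–3.6 and the paired-index averaging of the structural sparsity. Following the closed-form posterior parameters, the shape update of Equation (16) is computed from the fixed prior shape `c0` rather than compounded across iterations:

```python
import numpy as np

def vbi_with_prior_support(A_bar, y_bar, sigma2, S, N,
                           a0=1e-6, b0=1e-6, c0=1e-6, d0=1e-6,
                           max_iter=200, tol=1e-4):
    """Sketch of step 3 of Algorithm 1 (a0..d0 are illustrative, not from the paper).

    A_bar: (2*Td, 2*N) stacked real measurement matrix, y_bar: (2*Td,) measurements,
    S: prior channel-support indices (each < N) from the previous timeslot."""
    S = np.asarray(S, dtype=int)
    prior = np.zeros(2 * N, dtype=bool)
    prior[S] = prior[S + N] = True          # paired indices, consideration (I)

    a = np.full(2 * N, a0)
    b = np.full(2 * N, b0)
    c_par = np.full(2 * N, c0)
    d_par = np.full(2 * N, d0)
    E_alpha = np.ones(2 * N)
    mu = np.zeros(2 * N)

    for _ in range(max_iter):
        mu_old = mu
        # step 3.2: Gaussian posterior over h_bar, Eqs. (10)-(11)
        Phi = np.linalg.inv(A_bar.T @ A_bar / sigma2 + np.diag(E_alpha))
        mu = Phi @ A_bar.T @ y_bar / sigma2
        E_h2 = mu ** 2 + np.diag(Phi)
        pair_h2 = np.tile(0.5 * (E_h2[:N] + E_h2[N:]), 2)   # E(h_i^2 + h_{i+N}^2)/2

        # step 3.3: Gamma posterior over alpha, Eqs. (13)-(14)
        E_b = c_par / d_par                                 # E[b_i] on the prior support
        pair_b = np.tile(0.5 * (E_b[:N] + E_b[N:]), 2)
        b_t = np.where(prior, pair_b + pair_h2 / 2.0, b + pair_h2 / 2.0)
        E_alpha = (a + 0.5) / b_t

        # step 3.4: Gamma posterior over b on the prior support, Eqs. (16)-(17)
        c_par[prior] = a[prior] + c0
        d_par[prior] = d0 + E_alpha[prior]

        # step 3.5: relative-change stopping criterion
        if np.linalg.norm(mu - mu_old) <= tol * max(np.linalg.norm(mu_old), 1e-12):
            break
    return mu                                               # step 3.6

# toy usage: the true support matches the prior support S from the "previous" timeslot
rng = np.random.default_rng(3)
N, Td = 16, 12
A_bar = rng.standard_normal((2 * Td, 2 * N)) / np.sqrt(2 * Td)
S = np.array([3, 8])
h_bar = np.zeros(2 * N)
h_bar[S] = [1.0, -1.5]
h_bar[S + N] = [0.5, 2.0]
y_bar = A_bar @ h_bar + 1e-2 * rng.standard_normal(2 * Td)
h_hat = vbi_with_prior_support(A_bar, y_bar, 1e-4, S, N)
```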
In a practical massive MIMO system the transmission environment may change suddenly; in that case the sparsity correlation between adjacent timeslots deteriorates and the previous channel support information cannot be utilized. Moreover, errors accumulate if the previous channel support information is utilized timeslot by timeslot. Hence, initialization is important for the robustness and efficiency of the algorithm. As shown in Figure 2, the timeslots were divided into groups, each comprising several timeslots. During the channel estimation for each group, VBI was used for the channel estimation in the first timeslot, and the proposed algorithm was then executed for the remaining timeslots, in which the channel support information of the previous timeslot was used by the current timeslot. This procedure is detailed in steps 1, 2 and 4 of Algorithm 1.

4. Discussion

4.1. Sparsity Correlation Analysis

The UT movement distance is very small when the velocity of the UT is small and the timeslot is 0.5 ms, and the reflector is static while the UT moves between timeslots. The ellipse geometry channel model is shown in Figure 3. The line-of-sight (LoS) distance between the BS and the UT is $d_{LoS}$, the non-LoS (NLoS) distance via the reflector is $d_{NLoS}$, and the UT movement distance in one timeslot is $d_\Delta$. If the transmission path is still reflected by the same reflector, as shown in Figure 3, the maximum and minimum NLoS distances from the BS to the UT between timeslots are $d_{NLoS} + d_\Delta$ and $d_{NLoS} - d_\Delta$. The transmission angle change is $\Delta\theta$, and the distance between the reflector and the BS is $d_1$. By the mathematical manipulations shown in Appendix A, we obtained
$$\Delta\theta \approx \frac{2 d_\Delta (d_{NLoS} - d_1)}{2 d_1 d_{LoS}}\cdot\frac{1}{\sqrt{1 - \cos^2\theta}} \qquad (18)$$
To illustrate the angle change $\Delta\theta$ during one timeslot, we assumed that $d_{NLoS}$ was 800 m, the velocity of the UT was 14.4 km/h, and the typical timeslot duration was τ = 0.5 ms, so the movement distance of the UT in one timeslot was 0.02 m. By varying the distance between the BS and the reflector, as shown in Figure 4, the angle change was no more than 0.025°. It should be noted that when the LoS distance and $d_{NLoS}$ were fixed, the BS–reflector distance could not take arbitrary values, due to the triangle inequality. Hence, the angle of arrival or departure changes slowly, and there is sparsity correlation among the angular channels of adjacent timeslots.
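The bound in Equation (18) can be evaluated numerically for the scenario above. In this sketch, $d_{LoS}$ and the BS–reflector distances $d_1$ are illustrative choices satisfying the triangle inequality (they are not values stated in the paper), and $\theta$ follows from the cosine law of Appendix A:

```python
import numpy as np

# Section 4.1 scenario: d_NLoS = 800 m, UT movement d_delta = 0.02 m per timeslot
d_NLoS, d_delta, d_LoS = 800.0, 0.02, 600.0   # d_LoS is an illustrative choice

def delta_theta_deg(d1):
    """Angle change bound of Eq. (18) in degrees for a BS-reflector distance d1."""
    cos_t = (d1**2 + d_LoS**2 - (d_NLoS - d1)**2) / (2 * d1 * d_LoS)  # cosine law
    dtheta = (2 * d_delta * (d_NLoS - d1) / (2 * d1 * d_LoS)) / np.sqrt(1 - cos_t**2)
    return np.degrees(dtheta)

for d1 in (200.0, 300.0, 400.0):
    # all values come out well below the 0.025 deg reported in Figure 4
    print(f"d1 = {d1:5.0f} m -> delta_theta = {delta_theta_deg(d1):.4f} deg")
```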

4.2. Bayesian Cramér-Rao Bound Analysis

In this section we discuss the Bayesian Cramér–Rao bound (BCRB) for the channel estimation with the proposed algorithm. Let $\mathbf{z} \triangleq \{\bar{\mathbf{h}}, \sigma\}$; the BCRB for the channel vector $\bar{\mathbf{h}}$ is given by the inverse of the Fisher information matrix $\mathbf{J}$ with entries
$$J_{ij} = \mathbb{E}_{\mathbf{z}}\left\{-\frac{\partial^2 \log p(\bar{\mathbf{y}}, \mathbf{z})}{\partial z_i\, \partial z_j}\right\} \qquad (19)$$
According to the system model in Section 2, $\bar{\mathbf{h}}$ and $\sigma$ are independent, so the Fisher information matrix $\mathbf{J}$ is block diagonal. We can factor $p(\bar{\mathbf{y}}, \mathbf{z})$ as
$$p(\bar{\mathbf{y}}, \mathbf{z}) = p(\bar{\mathbf{y}}|\mathbf{z})\, p(\bar{\mathbf{h}}|\boldsymbol{\alpha})\, p(\boldsymbol{\alpha}|\mathbf{b})\, p(\mathbf{b}) \qquad (20)$$
Then the BCRB on the MSE of the estimated channel vector $\bar{\mathbf{h}}$ is given by
$$\mathbb{E}\{\|\bar{\mathbf{h}} - \hat{\bar{\mathbf{h}}}\|^2\} \geq \operatorname{tr}\big(\mathbf{J}_{\bar{\mathbf{h}},\bar{\mathbf{h}}}^{-1}\big) \qquad (21)$$
where $\mathbf{J}_{\bar{\mathbf{h}},\bar{\mathbf{h}}} = \mathbb{E}_{\mathbf{z}}\{-\partial^2\log p(\bar{\mathbf{y}},\mathbf{z})/\partial\bar{h}_i\,\partial\bar{h}_j\}$ is the Fisher information sub-matrix. Thus, we can obtain the Bayesian Cramér–Rao bound on the minimum mean square error of the estimated channel $\bar{\mathbf{h}}$, as shown in Proposition 1.
Proposition 1.
The BCRB of MSE for the channel estimation h ¯ is represented as
$$\mathbb{E}\{\|\bar{\mathbf{h}} - \hat{\bar{\mathbf{h}}}\|^2\} \geq \operatorname{tr}\left(\left(\operatorname{diag}\left(\mathbb{E}\left(\frac{1}{\alpha_i}\right)\right) + \frac{1}{\sigma^2}\bar{\mathbf{A}}^T\bar{\mathbf{A}}\right)^{-1}\right) = \sum_{i\in S}\frac{1}{\frac{1+c}{a d_i} + \frac{\lambda_i}{\sigma}} + \sum_{i\notin S}\frac{1}{\frac{b_i}{a} + \frac{\lambda_i}{\sigma}}$$
where $S$ is the diagnosed support set, the $\lambda_i$ are the eigenvalues of $\bar{\mathbf{A}}^T\bar{\mathbf{A}} \in \mathbb{R}^{2N\times 2N}$, and $a$, $b_i$, $c$, and $d_i$ are the parameters of the Bayesian model in Figure 1. When $T_d, M \to \infty$ with $T_d/M = \beta$, according to random matrix theory we have
$$\begin{aligned}
\mathbb{E}\{(\bar{\mathbf{h}} - \hat{\bar{\mathbf{h}}})^H(\bar{\mathbf{h}} - \hat{\bar{\mathbf{h}}})\} &\geq |S|\cdot\frac{1}{|S|}\sum_{i\in S}\frac{1}{\frac{1+c}{a\min(\mathbf{d})} + \frac{\lambda_i}{\sigma}} + (N - |S|)\cdot\frac{1}{N - |S|}\sum_{i\notin S}\frac{1}{\frac{\max(\mathbf{b})}{a} + \frac{\lambda_i}{\sigma}} \\
&\approx |S|\,\frac{a\min(\mathbf{d})}{1+c}\left(1 - \frac{\mathcal{F}(snr_1, \beta)}{4\beta\, snr_1}\right) + (N - |S|)\,\frac{a}{\max(\mathbf{b})}\left(1 - \frac{\mathcal{F}(snr_2, \beta)}{4\beta\, snr_2}\right)
\end{aligned}$$
where $snr_1 = \dfrac{a\min(\mathbf{d})}{(1+c)\sigma}$, $snr_2 = \dfrac{a}{\sigma\max(\mathbf{b})}$, $\mathcal{F}(x,z) = \left(\sqrt{x(1+\sqrt{z})^2 + 1} - \sqrt{x(1-\sqrt{z})^2 + 1}\right)^2$, and $\min(\mathbf{d})$ and $\max(\mathbf{b})$ are the minimum and maximum entries of $\mathbf{d}$ and $\mathbf{b}$.
The proof of Proposition 1 is presented in Appendix B. From Proposition 1 we can see that the MSE lower bound depends on the prior support size $|S|$, on $(1+c)/\min(\mathbf{d})$, and on $\max(\mathbf{b})$ for the massive MIMO channel estimation.

5. Simulations

In the simulations, the support diagnosis algorithm in [6] was adopted, and we assumed that the transmission angle change between timeslots was within 1 degree. The pilot length was 50, and the number of antennas at the BS was 100. The channel was generated according to the spatial channel model defined in 3GPP TR 25.996. We compared our proposed algorithm with a unitary dictionary of size 100 and with overcomplete dictionaries of sizes 150, 200, and 250, against Bayesian sparse learning (SL) [16], weighted subspace pursuit (WSP) [6], weighted l1 minimization (W-l1 min) [5], weighted iteratively reweighted least squares (W-IRLS), IRLS [17], compressive sampling matching pursuit (CoSaMP) [11], and l1 minimization (l1 min) [18].
In order to evaluate the channel estimation performance, we used a normalized mean-square error (MSE) between true and estimated channel vectors as follows:
$$MSE = \frac{1}{T}\sum_{t=1}^{T}\frac{\|\hat{\mathbf{h}}_d - \mathbf{h}_d\|^2}{\|\mathbf{h}_d\|^2}$$
where $T$ is the number of trials, and $\hat{\mathbf{h}}_d$ and $\mathbf{h}_d$ are the estimated and true channel vectors for each trial. In the simulations the number of trials $T$ was 250.
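The metric above translates directly into code. The helper below is a small sketch (the array shapes, with one channel vector per row, are an assumption of this example):

```python
import numpy as np

def normalized_mse(h_est, h_true):
    """Normalized MSE averaged over trials: mean over t of ||h_hat - h||^2 / ||h||^2.

    h_est, h_true: arrays of shape (T, N), one complex channel vector per trial."""
    num = np.sum(np.abs(h_est - h_true) ** 2, axis=1)
    den = np.sum(np.abs(h_true) ** 2, axis=1)
    return float(np.mean(num / den))

# sanity checks: a perfect estimate gives 0, an all-zero estimate gives 1
h_true = (np.arange(12).reshape(3, 4) + 1.0) * (1 + 1j)
mse_perfect = normalized_mse(h_true, h_true)
mse_zero = normalized_mse(np.zeros_like(h_true), h_true)
```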
In Figure 5 the overcomplete dictionary size in the proposed algorithm was 150. When the unitary dictionary was used, our proposed algorithm outperformed WSP, CoSaMP and IRLS, but was slightly worse than W-l1 min. When the overcomplete dictionary was used, our proposed algorithm outperformed the other algorithms, performing marginally better than SL, as can be seen in the zoomed-in subfigure. The overcomplete dictionary dramatically improves the MSE performance because it contains more atoms than the unitary dictionary and thus yields a sparser angular channel representation. However, this does not mean that a larger overcomplete dictionary always performs better, as shown in Figure 6.
We compared the performance of the proposed algorithm with different dictionary sizes in Figure 6. In the high-SNR region the performance improved when an overcomplete dictionary was used, but the MSE gain did not keep growing with the dictionary size. For example, the algorithm with a dictionary size of 150 performed noticeably better than with a size of 100, whereas the performances with sizes 200 and 250 showed almost the same trends as with size 150. This is because a larger dictionary induces angle ambiguity, since the correlation between atoms increases. Hence, in practical engineering the dictionary size should not be very large: a large dictionary is computationally expensive and the benefit is limited. It should also be noted that in the low-SNR region a larger dictionary did not always outperform a smaller one; for example, at an SNR of 0 dB they had similar performance. The reason is that in the low-SNR region the estimated channel support of the previous timeslot was not accurate enough, and a larger dictionary size degraded the dictionary incoherence.
We compared the runtime and convergence of the proposed algorithm with different dictionary sizes in Figure 7. The relative error was defined as the ratio of the difference between adjacent iteration results to the previous iteration result. The proposed algorithm with a dictionary size of 150 converged faster than with a size of 100. However, the improvement had its price: the runtime with a dictionary size of 150 was longer, meaning that the computational complexity is higher with a larger dictionary. Based on the simulation results in Figure 6 and Figure 7, when the BS has 100 antennas a dictionary size of around 150 is recommended to balance the performance improvement and the computational complexity.

6. Conclusions

In this paper we proposed a downlink channel estimation algorithm based on an overcomplete dictionary and variational Bayesian inference. We converted the complex system model to a real one and exploited the correlation of angular channel sparsity in adjacent timeslots. In the algorithm the timeslots were divided into groups, and within each group the channel support information of the previous timeslot was used in the channel estimation of the current timeslot. The sparsity correlation and the Bayesian Cramér–Rao bound on the channel estimation MSE were analyzed. Compared with other recovery algorithms, such as WSP, IRLS, W-IRLS, l1 min, W-l1 min and CoSaMP, our proposed algorithm with an overcomplete dictionary achieved relatively better performance. A moderately overcomplete dictionary can improve the MSE performance of channel estimation while balancing computational complexity and performance gain.

Author Contributions

Conceptualization and methodology, W.L.; validation, X.W., S.P. and L.Z.; formal analysis, W.L.; writing—original draft preparation, W.L.; supervision, Y.W.

Funding

This work was supported in part by the National Science Foundation of China (Nos. 61601509 and 61601334), the China Postdoctoral Science Foundation (Nos. 2016M603045 and 2018M632889), and the self-determined research funds of CCNU (CCNU18QN007) from the colleges' basic research and operation fund of the MOE.

Acknowledgments

We thank the reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Proof of angle change with UT movement.
 
According to the cosine law, we have
$$\cos\theta = \frac{d_1^2 + d_{LoS}^2 - (d_{NLoS} - d_1)^2}{2 d_1 d_{LoS}},$$
$$\cos(\theta \pm \Delta\theta) = \frac{d_1^2 + d_{LoS}^2 - (d_{NLoS} \pm d_\Delta - d_1)^2}{2 d_1 d_{LoS}}.$$
Then we can get
$$\theta \pm \Delta\theta = \arccos\left(\frac{d_1^2 + d_{LoS}^2 - (d_{NLoS} - d_1)^2 - d_\Delta^2 \mp 2 d_\Delta (d_{NLoS} - d_1)}{2 d_1 d_{LoS}}\right).$$
Since $d_\Delta^2$ is very small compared with $d_1$ and $d_{LoS}$, by the first-order approximation we have
$$\theta \pm \Delta\theta \approx \arccos\left(\frac{d_1^2 + d_{LoS}^2 - (d_{NLoS} - d_1)^2 \mp 2 d_\Delta (d_{NLoS} - d_1)}{2 d_1 d_{LoS}}\right) \approx \arccos\left(\frac{d_1^2 + d_{LoS}^2 - (d_{NLoS} - d_1)^2}{2 d_1 d_{LoS}}\right) \pm \frac{2 d_\Delta (d_{NLoS} - d_1)}{2 d_1 d_{LoS}} \cdot \frac{1}{\sqrt{1 - \cos^2\theta}} = \theta \pm \frac{2 d_\Delta (d_{NLoS} - d_1)}{2 d_1 d_{LoS}} \cdot \frac{1}{\sqrt{1 - \cos^2\theta}}.$$
Then we have
$$\Delta\theta \approx \frac{2 d_\Delta (d_{NLoS} - d_1)}{2 d_1 d_{LoS}} \cdot \frac{1}{\sqrt{1 - \cos^2\theta}}.$$

Appendix B

Proof of Proposition 1.
 
Let $z \triangleq \{\bar{h}, \sigma\}$; then we have
$$E_z\{(z - \hat{z})(z - \hat{z})^T\} \succeq J^{-1}.$$
Since $\bar{h}$ and $\sigma$ are independent, the Fisher information matrix $J$ is block diagonal, and can be presented as
$$J = \begin{bmatrix} J_{\bar{h},\bar{h}} & 0 \\ 0 & J_{\sigma,\sigma} \end{bmatrix}.$$
Then the inverse of matrix $J$ is
$$J^{-1} = \begin{bmatrix} J_{\bar{h},\bar{h}}^{-1} & 0 \\ 0 & J_{\sigma,\sigma}^{-1} \end{bmatrix}.$$
Because $p(\bar{y}, z) = p(\bar{y}|z)\, p(\bar{h}|\alpha)\, p(\alpha|b)\, p(b)\, p(\sigma)$, we have
$$J = -E_z\left\{\frac{\partial^2 \log p(\bar{y}, z)}{\partial z_i \partial z_j}\right\} = -E_z\left\{\frac{\partial^2 \log p(\bar{y}|z)}{\partial z_i \partial z_j}\right\} - E_z\left\{\frac{\partial^2 \log p(\bar{h}|\alpha)}{\partial z_i \partial z_j}\right\} - E_z\left\{\frac{\partial^2 \log p(\alpha|b)}{\partial z_i \partial z_j}\right\} - E_z\left\{\frac{\partial^2 \log p(b)}{\partial z_i \partial z_j}\right\} - E_z\left\{\frac{\partial^2 \log p(\sigma)}{\partial z_i \partial z_j}\right\}.$$
Since we mainly focus on the MSE of h ¯ , we only need to analyze J h ¯ , h ¯ . We discuss the above formula part by part as follows:
1) Let $J_{\bar{h},\bar{h}}(\bar{y}) = -E_z\left\{\frac{\partial^2 \log p(\bar{y}|z)}{\partial \bar{h}_i \partial \bar{h}_j}\right\}$. According to the Bayesian model in Figure 1, we have $p(\bar{y}|z) \sim \mathrm{Normal}(\bar{y} \,|\, \bar{A}\bar{h}, \sigma I)$, and then $J_{\bar{h},\bar{h}}(\bar{y}) = E_z\left\{\frac{\bar{A}^T \bar{A}}{\sigma}\right\} = \frac{\bar{A}^T \bar{A}}{\sigma}$.
2) Let $J_{\bar{h},\bar{h}}(\bar{h}) = -E_z\left\{\frac{\partial^2 \log p(\bar{h}|\alpha)}{\partial \bar{h}_i \partial \bar{h}_j}\right\}$. Since $p(\bar{h}|\alpha) = \prod_{i=1}^{2N} \mathrm{Normal}(\bar{h}_i \,|\, 0, \alpha_i)$, we get $J_{\bar{h},\bar{h}}(\bar{h}) = \mathrm{diag}\left(E_z\left\{\frac{1}{\alpha_i}\right\}\right)$.
3) Because $\log p(\alpha|b)$, $\log p(b)$ and $\log p(\sigma)$ do not depend on $\bar{h}$, their contributions to $J_{\bar{h},\bar{h}}$ are all 0.
Then in summary, we get $J_{\bar{h},\bar{h}} = \mathrm{diag}\left(E\left\{\frac{1}{\alpha_i}\right\}\right) + \frac{1}{\sigma}\bar{A}^T\bar{A}$.
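In the special case where the $\alpha_i$ are fixed and known (so that $E\{1/\alpha_i\}$ reduces to $1/\alpha_i$), the model is linear-Gaussian and the posterior-mean estimator attains the Bayesian Cramér–Rao bound exactly, which gives a quick numerical sanity check of this Fisher information. The dimensions and parameter values below are assumed for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, sigma = 40, 80, 0.01            # measurements, coefficients, noise variance
alpha = rng.uniform(0.5, 2.0, N)      # fixed per-coefficient prior variances (assumed known)
A = rng.normal(0.0, 1.0 / np.sqrt(M), (M, N))

# Bayesian Fisher information J = diag(1/alpha) + A^T A / sigma and its CRB
J = np.diag(1.0 / alpha) + A.T @ A / sigma
Sigma = np.linalg.inv(J)              # posterior covariance equals J^{-1} here
bcrb = np.trace(Sigma)

# Monte-Carlo MSE of the posterior-mean estimator; it should match tr(J^{-1})
trials, mse = 2000, 0.0
for _ in range(trials):
    h = rng.normal(0.0, np.sqrt(alpha))
    y = A @ h + rng.normal(0.0, np.sqrt(sigma), M)
    h_hat = Sigma @ (A.T @ y) / sigma  # posterior mean J^{-1} A^T y / sigma
    mse += np.sum((h - h_hat) ** 2)
mse /= trials
```

With the hierarchical prior of the paper the expectation $E\{1/\alpha_i\}$ replaces $1/\alpha_i$, but the structure of $J_{\bar{h},\bar{h}}$ is the same.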
Since the prior support set information is used in our proposed algorithm, a three-layer model is constructed for the elements belonging to the prior support set and a two-layer model for the elements outside it, so $E_z\left\{\frac{1}{\alpha_i}\right\}$ has a different expression in each case. The two cases are discussed as follows:
1) When $i$ belongs to the prior support set, according to the three-layer graphical model we have
$$p(\alpha) = \prod_{i=1}^{2N} \mathrm{Gamma}(\alpha_i \,|\, a, b_i),$$
$$p(b_i) = \mathrm{Gamma}(b_i \,|\, c, d_i) = \Gamma(c)^{-1} d_i^c\, b_i^{c-1} e^{-d_i b_i}.$$
Then we get
$$p(\alpha_i) = \int_0^\infty p(\alpha_i | b_i)\, p(b_i)\, db_i = \int_0^\infty \Gamma(a)^{-1} b_i^a \alpha_i^{a-1} e^{-b_i \alpha_i}\, \Gamma(c)^{-1} d_i^c b_i^{c-1} e^{-d_i b_i}\, db_i = \Gamma(a)^{-1} \Gamma(c)^{-1} \alpha_i^{a-1} d_i^c\, \frac{\Gamma(a+c)}{(\alpha_i + d_i)^{a+c}}.$$
Accordingly, we have
$$E\left\{\frac{1}{\alpha_i}\right\} = \int_0^\infty \frac{1}{\alpha_i}\, \Gamma(a)^{-1} \Gamma(c)^{-1} \alpha_i^{a-1} d_i^c\, \frac{\Gamma(a+c)}{(\alpha_i + d_i)^{a+c}}\, d\alpha_i = \int_0^\infty \frac{1}{\alpha_i}\, \frac{\Gamma(a+c)}{\Gamma(a)\Gamma(c)} \left(\frac{\alpha_i}{d_i}\right)^{a-1} \left(\frac{\alpha_i}{d_i} + 1\right)^{-a-c} d\frac{\alpha_i}{d_i},$$
where $\frac{\Gamma(a+c)}{\Gamma(a)\Gamma(c)} \left(\frac{\alpha_i}{d_i}\right)^{a-1} \left(\frac{\alpha_i}{d_i} + 1\right)^{-a-c}$ is the probability density function of the Beta prime distribution.
According to the properties of the Beta prime distribution, when $a < 1 < c$, we have
$$E\left\{\frac{1}{\alpha_i}\right\} = \frac{1+c}{a\, d_i}.$$
2) When $i$ does not belong to the prior support set, according to the moment properties of the Gamma distribution, we have
$$E\left\{\frac{1}{\alpha_i}\right\} = \frac{b_i}{a}.$$
Then in summary, we have
$$E\{\|\bar{h} - \hat{\bar{h}}\|^2\} \ge \mathrm{tr}\left(\left(\mathrm{diag}\left(E\left\{\frac{1}{\alpha_i}\right\}\right) + \frac{1}{\sigma}\bar{A}^T\bar{A}\right)^{-1}\right) = \sum_{i \in S} \frac{1}{\frac{1+c}{a d_i} + \frac{\lambda_i}{\sigma}} + \sum_{i \notin S} \frac{1}{\frac{b_i}{a} + \frac{\lambda_i}{\sigma}},$$
where $S$ is the diagnosed support set, $\lambda_i$ are the eigenvalues of $\bar{A}^T\bar{A}$, and $\bar{A}^T\bar{A} \in \mathbb{R}^{2M \times 2M}$.
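The two-layer marginal used in case 1) can be verified by simulation: drawing $b_i \sim \mathrm{Gamma}(c, d_i)$ and then $\alpha_i \sim \mathrm{Gamma}(a, b_i)$ (rate parametrization) should make $\alpha_i / d_i$ follow a Beta prime distribution, whose mean is $a/(c-1)$ for $c > 1$. The hyperparameter values below are assumed for illustration, chosen in the regime $a < 1 < c$:

```python
import numpy as np

rng = np.random.default_rng(1)
a, c, d = 0.5, 3.0, 1.5     # assumed hyperparameters with a < 1 < c
n = 200_000

# hierarchical (three-layer) model of case 1), rate parametrization:
b = rng.gamma(shape=c, scale=1.0 / d, size=n)   # b_i ~ Gamma(c, d)
alpha = rng.gamma(shape=a, scale=1.0 / b)       # alpha_i | b_i ~ Gamma(a, b_i)

# marginally, alpha_i / d should be Beta prime(a, c) with mean a / (c - 1)
x = alpha / d
mean_emp, mean_theory = x.mean(), a / (c - 1)
```

This checks only the Beta prime marginal itself; the negative moment $E\{1/\alpha_i\}$ quoted above depends on the hyperparameter regime and is taken from the paper's derivation.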
When the overcomplete dictionary is $D_d = \left\{\frac{1}{\sqrt{N}} e^{j\frac{2\pi}{M}kn}\right\}_{n,k}$, $k \in \{1, \ldots, M\}$, $n \in \{1, \ldots, N\}$, and $A$ is a Gaussian random matrix whose elements have mean 0 and variance $\frac{1}{T_d}$, then $A D_d$ is a complex Gaussian random matrix, and $\bar{A}$ is a Gaussian random matrix with mean 0 and variance $\frac{\rho_d}{2 T_d}$.
According to random matrix theory, for an $N \times K$ random matrix $H$ whose elements are independent with mean 0 and variance $1/N$, when $K \to \infty$, $N \to \infty$ and $\frac{K}{N} \to \beta$, the empirical distribution of the eigenvalues of $H^T H$ converges almost surely to $f_\beta(x) = \left(1 - \frac{1}{\beta}\right)^+ \delta(x) + \frac{\sqrt{(x - a)^+ (b - x)^+}}{2\pi\beta x}$, where $(x)^+ = \max(0, x)$, $a = (1 - \sqrt{\beta})^2$ and $b = (1 + \sqrt{\beta})^2$.
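This Marchenko–Pastur law is easy to check empirically; the matrix dimensions below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 2000, 1000                      # aspect ratio beta = K / N = 0.5
H = rng.normal(0.0, 1.0 / np.sqrt(N), (N, K))
eigs = np.linalg.eigvalsh(H.T @ H)

beta = K / N
edge_lo, edge_hi = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2

# the spectrum should concentrate on [edge_lo, edge_hi], with mean eigenvalue 1
frac_inside = np.mean((eigs > edge_lo - 0.1) & (eigs < edge_hi + 0.1))
mean_eig = eigs.mean()
```

Since $\beta < 1$ here there is no point mass at zero; for $\beta > 1$ a fraction $1 - 1/\beta$ of the eigenvalues would sit at the origin, which is the rank-deficient case relevant to the overcomplete dictionary.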
Since A ¯ 2 T d × 2 M , and its element is Gaussian random variable with mean 0 and variance ρ d 2 T d By applying the above results for the empirical distribution of eigenvalues of H T H , when T d , M and T d M = β , the empirical distribution of eigenvalues λ of A ¯ T A ¯ converges almost surely as
f β ( λ ) = ( 1 1 β ) + δ ( λ ) + ( λ a ) + ( b λ ) + 2 π β λ ρ d
where a = ρ d ( 1 β ) 2 , b = ρ d ( 1 + β ) 2 . When s , M and s M = μ , we have
E { ( h ¯ h ¯ ^ ) H ( h ¯ h ¯ ^ ) } | S | 1 | S | i S 1 1 + c a min ( d ) + + λ i σ + ( N | S | ) 1 ( N | S | ) i S 1 max ( b ) a + λ i σ | S | a min ( d ) 1 + c ( 1 F ( s n r 1 , β ) 4 β s n r 1 ) + ( N | S | ) a max ( b ) ( 1 F ( s n r 2 , β ) 4 β s n r 2 ) ,
where s n r 1 = a min ( d ) ( 1 + c ) σ , s n r 2 = a σ max ( b ) and F ( x , z ) = ( x ( 1 + z ) 2 + 1 x ( 1 z ) 2 + 1 ) 2 .
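The closed-form step uses the known Marchenko–Pastur average $\frac{1}{K}\sum_i \frac{1}{1 + snr\,\lambda_i} \to 1 - \frac{F(snr, \beta)}{4\beta\, snr}$, which can be checked numerically; the dimension, $\beta = 1$ and $snr = 1$ below are assumed illustrative values:

```python
import numpy as np

def F(x, z):
    # F-function appearing in the closed-form bound
    return (np.sqrt(x * (1 + np.sqrt(z)) ** 2 + 1)
            - np.sqrt(x * (1 - np.sqrt(z)) ** 2 + 1)) ** 2

rng = np.random.default_rng(3)
N, beta, snr = 1500, 1.0, 1.0
H = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
eigs = np.linalg.eigvalsh(H.T @ H)

empirical = np.mean(1.0 / (1.0 + snr * eigs))          # (1/K) sum 1/(1 + snr*lambda_i)
closed_form = 1.0 - F(snr, beta) / (4.0 * beta * snr)  # = (sqrt(5)-1)/2 for snr = beta = 1
```

The empirical spectral average and the closed form agree to within sampling error, which is the replacement performed in the last inequality above.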
Then the proof is complete. □

Figure 1. Graphical model for the channel estimation with Bayesian inference. The nodes with double circle, single circle and square correspond to the observed data, hidden variables and parameters, respectively.
Figure 2. Channel estimations by group. Each block represents one timeslot, and the block filled with grey is the timeslot with variational Bayesian inference (VBI) for the channel estimation, while the blank blocks are the timeslots with the proposed algorithm for channel estimation.
Figure 3. Ellipse geometry channel model for line of sight (LoS) and non-LoS (NLoS) transmission.
Figure 4. Transmission angle change during one timeslot with a different LoS distance and different distances between the base station (BS) and reflector.
Figure 5. Comparisons of channel estimation mean square error (MSE) for different algorithms.
Figure 6. Comparisons of channel estimation MSE for the proposed algorithm with different dictionary sizes.
Figure 7. Comparisons of runtime and convergence performances of the proposed algorithm with orthogonal dictionary (size is 100) and overcomplete dictionary (size is 150).
