Spectral and Energy Efficient Low-Overhead Uplink and Downlink Channel Estimation for 5G Massive MIMO Systems

Uplink and Downlink channel estimation in massive Multiple Input Multiple Output (MIMO) systems is an intricate issue because of the increasing channel matrix dimensions. The channel feedback overhead using traditional codebook schemes is very large, which consumes more bandwidth and decreases the overall system efficiency. The purpose of this paper is to decrease the channel estimation overhead by taking the advantage of sparse attributes and also to optimize the Energy Efficiency (EE) of the system. To cope with this issue, we propose a novel approach by using Compressed-Sensing (CS), Block Iterative-Support-Detection (Block-ISD), Angle-of-Departure (AoD) and Structured Compressive Sampling Matching Pursuit (S-CoSaMP) algorithms to reduce the channel estimation overhead and compare them with the traditional algorithms. The CS uses temporal-correlation of time-varying channels to produce Differential-Channel Impulse Response (DCIR) among two CIRs that are adjacent in time-slots. DCIR has greater sparsity than the conventional CIRs as it can be easily compressed. The Block-ISD uses spatial-correlation of the channels to obtain the block-sparsity which results in lower pilot-overhead. AoD quantizes the channels whose path-AoDs variation is slower than path-gains and such information is utilized for reducing the overhead. S-CoSaMP deploys structured-sparsity to obtain reliable Channel-State-Information (CSI). MATLAB simulation results show that the proposed CS based algorithms reduce the feedback and pilot-overhead by a significant percentage and also improve the system capacity as compared with the traditional algorithms. Moreover, the EE level increases with increasing Base Station (BS) density, UE density and lowering hardware impairments level.


Introduction
Massive MIMO is an upcoming technology for the future 5G Wireless Communication Systems [1]. It is a physical layer (PHY-L) envisioned technology that will resolve various physical layer issues in modern cellular networks. It has attracted substantial research interest from various communities for assessing its various merits. It has the ability to provide improved Spectral Efficiency (SE) and better than the conventional channel-statistics based method as it can track the channel-vectors more efficient. The S-CoSaMP scheme also provides a clue to reduce the channel overhead efficiently. Dense antenna-arrays provides more correlated signals at the BS. As the BS-antennas are closely-spaced which causes same path-delays, so the CIRs of each antenna has a common support vector which is deployed by S-CoSaMP to decrease the undesired overhead. S-CoSaMP recovers the CIR from the reduced pilots. This technique can further decrease the UL/DL feedback overhead. This scheme uses SCS-joint UL/DL mechanism for channel estimation. It also eliminates the need for channel estimation at each UE and then feedback to BS. UEs directly feedback their pilots to BS where CS-algorithms recover the corresponding CIR. It uses a joint mechanism which improves the system performance as compared to the conventional CS-algorithms that performs channel training and feedback separately.
The rest of the paper is organized as follows: Section 2 describes the literature review of the paper which discusses the related background work. Section 3 discusses the proposed system model with all necessary theoretical as well as mathematical explanation. Section 4 provides the simulation results which show a comparison between the proposed and the conventional algorithms in terms of various parameters. Finally, Section 5 provides our conclusions and future work.
Notations: Lowercase and upper-case letters denote vectors and matrices, respectively; (·) T , (·) H , and (·) −1 denote the transpose, conjugate transpose, and inverse of a matrix, respectively; φ represents the sensing matrix, γ denotes the feedback overhead, I K denotes the identity matrix of K × K dimension; ρ is the correlation coefficient, α h i is the support element of CIR h i and µ is the channel sparsity.

Literature Review
The recent research activities indicate that the main goals of the 5G mobile communications networks are to accomplish 1000× times the system-capacity and 10× times the Spectral Efficiency (SE), Energy Efficiency (EE) and data-rate, and 25× times the average-cell-throughput [9]. The demand for high data-rate and link-reliability is increasing exponentially due to the increasing number of new generation of devices such as smart-phones, tablets and notebooks, etc. The electromagnetic spectrum is a precious but limited commodity, so, to utilize such spectrum efficiently and to cope with the demands, the high-level perspective technology of massive MIMO is deployed. Massive MIMO is defined as a system that deploys a large number of smaller array antennas at the BS which significantly improve the beamforming and system capacity and can simultaneously provide service to a large number of users (UEs) [10]. It has various merits over the conventional MIMO. First, it uses a large number of antennas at the BS due to which the simplest coherent-combiner and linear-precoder can be used for signal processing such MF or ZF. Second, by increasing the number of antennas increases the system capacity substantially using the channel-reciprocity features and without increasing feedback-overhead. Third, the reduced power benefits in the UL/DL provide the feasibility to shrink the cell-size which can be used in micro and pico-cells. Massive MIMO Technology is used by various wireless communication standards and companies such as the Long-Term-Evolution-Advanced (LTE-A), due to its substantial potential both in channel-capacity and route-reliability. It uses hundreds of antennas at the BS in various geometrical shapes in the form of arrays. Increasing the number of antennas assist to direct the radiated energy towards much smaller-regions, which improves the throughput and EE. It also provides aggressive-spatial multiplexing. It has the ability to decrease the transmitter (Tx) power and also enhances the Spectrum Efficiency (SE) by 10 times and more. Also, low-power devices are deployed for service purpose, which is also a plus point for massive MIMO as it is low-cost and easily manufactured from solid-state devices.
The performance characteristics of the wireless system are not solely based on the transmitter (Tx) and receiver (Rx), but are also dependent upon the wireless channel under consideration. The wireless channel is very complicated and unpredictable. It is the key element to understand the development of high-performance, improved Quality-of-Service (QoS) and bandwidth-efficient wireless communication technology. In Massive MIMO Systems, accurate UL and DL CSI are necessary for channel estimation [5]. In [3], it is illustrated that in propagation-environments, the MIMO-channels shows a sparse-structure in Channel Impulse Response (CIR). To reap the advantages of massive MIMO, accurate CSI is mandatory for each Tx-Rx pair. As there is a large number of antennas used by massive MIMO, therefore the channels estimations associated with such antennas will also be large which is a tedious task because of the enormous increasing dimensions of the channel matrix. Many channel estimation techniques for CIR are proposed in the related research literature which can be classified into superimposed [11], training [12], pilot [13], blind [14], and semi-blind [15] techniques, respectively.
The CS paradigm is getting much research attention as it has the ability to recover the unknown signals from just a small number of measurements. It uses very few samples following the Nyquist sampling rate. It deploys the non-zero elements of a sparse-signal for the recovery operation. It also performs accurate estimation of system-parameters with fewer-pilots, therefore it also increases the bandwidth (BW) efficiency. The classical conventional CS-paradigms need pre-known information of channel-sparsity, which is practically unknown in system analysis. Furthermore, for reliable and accurate CIR retrieving, the CS-sensing or sampling matrix must satisfy the isometry property.
The previous research performed for channel estimation did not perform UL/DL channel estimation jointly using CS [16][17][18][19], differential CIR [20], Block-ISD [21], AoD [22] and S-CoSaMP [23]. Therefore, we propose an efficient, low-overhead UL/DL joint channel feedback and pilot-based channel estimation which gives an optimum comparison with the conventional schemes such as direct-CS, classical Block-ISD, BP, LS, CoSaMP, Kalman filter, J-OMP and conventional codebook mechanisms respectively. The simulation results provide a clear understanding of the improvement caused by the proposed algorithms in terms of the mean square error (MSE), data-rate, per-user rate and feedback bits.

System Model
The proposed system is modeled using N t transmitter antennas, K receiving antennas such that (N t >> K), SNR, channel length, channel sparsity, slot-interval, and various relevant parameters to obtain the required results.

CS-Differential Channel Feedback
The proposed CS algorithm for Uplink (UL) differential channel feedback is depicted in Figure 1. The CIR channel feedback is performed at particular time-slots. The direct compressed sensing (DCS) method does not perform the differential operation and directly compresses the CIR with sensing-matrix. But the CS differential algorithm applies differential operation for generating more strong and sparse Differential CIR (DCIR) than the DCS CIR. DCIR is then compressed to Compressed CIR (CCIR) using the sensing-matrix and it is known to base-station (BS) and User (UE). Such CCIR is fed back to BS. The CS Recovery algorithm gets back the DCIR from the CCIR accurately.
where: N: Number of BS-antennas and L: Maximum channel Length.
The Sparse CIR is represented by a supportive-vector p (k) n and an amplitude-vector a where p (k) n (l) ∈ {0, 1}, a (k) n ∈ C and "o" indicates the Hadamard-product.

Support-Vector
n having lth elements and T-vectors can be represented by a first-order Markov process that represent it by two transition-probabilities: and: where, p 10 is the transition probability from 0 (at time slot k) to 1 (at time slot k + 1). The corresponding distribution for initial time-slot (k = 1) is: For the steady-state Markov process: For all k, n. Where, µ denotes the channel sparsity. From the above equations, only p 01 and µ are important to represent the Markov process as p 10 can be obtained from p 01 and µ respectively.

Amplitude-Vector
It is represented by a first-order auto-regressive process as [24]: where ω (k) : is the noise-vector whose elements are independent identically distributed (i.i.d) with zero mean Gaussian variables which follow CN 0, σ 2 ω distribution and does not depend on a (k−1) n , ρ: is the correlation-coefficient which is obtained by the Bessel-function of zero order as follows: It is determined by taking the difference among two CIRs at the previous at current time-slots, which is the required sparse-signal. Mathematically: By putting the corresponding values from (2) and (7) in (9) we get: The above process is graphically illustrated in Figure 2, in which the previous CIR, current CIR and the differential CIR (DCIR) are shown. It is clear from the figure that the DCIR has more sparsity and can be more compressed than the other two CIRs.
where ѡ ( ) : is the noise-vector whose elements are independent identically distributed (i.i.d) with zero mean Gaussian variables which follow ℂN(0, 2 ) distribution and does not depend on ( −1) , ρ: is the correlation-coefficient which is obtained by the Bessel-function of zero order as follows: : Maximal Doppler-frequency and ∶ Time-interval of slot.

Differential CIR (DCIR)
It is determined by taking the difference among two CIRs at the previous at current time-slots, which is the required sparse-signal. Mathematically: By putting the corresponding values from (2) and (7) in (9) we get: The above process is graphically illustrated in Figure 2, in which the previous CIR, current CIR and the differential CIR (DCIR) are shown. It is clear from the figure that the DCIR has more sparsity and can be more compressed than the other two CIRs.

Compression CIR (CCIR)
The ∆ℎ ( ) is compressed by the sensing matrix є ℂ with ≪ and convert it into a lower level measurement-vector as follows: y: compressed channel-feedback to BS.

Compression CIR (CCIR)
The ∆h (k) n is compressed by the sensing matrix φ ∈ C MxL with M L and convert it into a lower level measurement-vector as follows: y: compressed channel-feedback to BS.

Feedback-Overhead
It is expressed as a ratio of channel compression w.r.t channel length. This ratio should be small enough for efficient channel-feedback and recovery process. Mathematically: where M L so the ratio is less than 1 (100%) and it is added as extra overhead bits to the feedback channel attributes.

CS-Recovery Algorithm
The desired ∆h (k) n is recovered by the BS from the noise accumulated signal if φ fulfills the isometric-property [25]. Mathematically: where n represents the feedback-channel noise having the same attributes as the noise-vector ω in (7) and follows CN 0, σ 2 n .

Flowchart
The flowchart in Figure 3 describes the major steps involved in the CS-differential channel estimation and feedback process. First, the system parameters are initialized which are mentioned in Table 1. Then the differential operation is performed to produce DCIR. The sensing matrix then performs compressing operation for CCIR and that is feedback to BS for recovering. The condition checks whether ϕ obey isometry, if yes then the required CIR is obtained to complete the desired channel-estimation process. It is expressed as a ratio of channel compression w.r.t channel length. This ratio should be small enough for efficient channel-feedback and recovery process. Mathematically: where ≪ so the ratio is less than 1 (100%) and it is added as extra overhead bits to the feedback channel attributes.

CS-Recovery Algorithm
The desired ∆ℎ ( ) is recovered by the BS from the noise accumulated signal if fulfills the isometric-property [25]. Mathematically: where n represents the feedback-channel noise having the same attributes as the noise-vector ѡ in (7) and follows ℂN(0, 2 ).

Flowchart
The flowchart in Figure 3 describes the major steps involved in the CS-differential channel estimation and feedback process. First, the system parameters are initialized which are mentioned in Table 1. Then the differential operation is performed to produce DCIR. The sensing matrix then performs compressing operation for CCIR and that is feedback to BS for recovering. The condition checks whether φ obey isometry, if yes then the required CIR is obtained to complete the desired channel-estimation process.     Orthogonal Frequency Division Multiplexing (OFDM) scheme is employed to determine the number of pilots easily in frequency-domain [26][27][28]. Mathematically, the received pilots-signals at the MS or UE is expressed as:

Block-ISD Pilot Feedback
where j = [1, 2, . . . , N] : Is the index of subcarrier-set that are chosen randomly and allocate to pilots-signals, y j : Measurements performed for CIR recovery; P i = diag(p i ): Is the diagonal matrix of the pilot-vector p i at the corresponding ith Tx antenna, p i ∈ C px1 here p represents the number of pilot-signals; F L ∈ C MxL is the sub-matrix which contains the Discrete Fourier Transform (DFT) column matrix of L columns of order M × M; M is the symbol-length of OFDM; (F L ) j is the sub-matrix T is the CIR between the ith BS antenna and single-user Rx antenna; L is the maximum channel-length; n j = [n 1 , n 2 , . . . , n p ] T is the channel noise-vector, having the same characteristics as discussed in Equation (13).
Entropy 2018, 20, x FOR PEER REVIEW 8 of 23 Figure 4 shows the proposed block-ISD based channel-feedback schemes. The same number of antennas are deployed for downlink channel estimation as in uplink channel estimation process. Orthogonal Frequency Division Multiplexing (OFDM) scheme is employed to determine the number of pilots easily in frequency-domain [26][27][28]. Mathematically, the received pilots-signals at the MS or UE is expressed as: For the sake of convenience, (14) can also be written as:

Creating Block-ISD CIR
For the recovery process, the CIR (h) is required which is extracted from the noise-accumulated signal. The spatial-correlation operation is performed on various CIRs of the considered antennas. The CIRs between the BS and UE of each antenna exhibit analogous properties such as the Time of Arrival (ToA), so they share a common-supporting element as follows [29]: where ℎ = { ∶ ℎ ( ) ≠ 0} is the support of ℎ . For the sake of convenience, (14) can also be written as: T is the cumulative CIR from the N T number of antennas.

Creating Block-ISD CIR
For the recovery process, the CIR (h) is required which is extracted from the noise-accumulated signal. The spatial-correlation operation is performed on various CIRs of the considered antennas. The CIRs between the BS and UE of each antenna exhibit analogous properties such as the Time of Arrival (ToA), so they share a common-supporting element as follows [29]: where Figure 5 illustrates the production of block-sparse equivalent CIR (s) from the considered CIRs which has more sparsity. The green, yellow and orange colors of CIRs (h) represents the non-zero elements and that is also the common support elements among them from which the desired sparse-CIR is constructed.  Figure 5 illustrates the production of block-sparse equivalent CIR (s) from the considered CIRs which has more sparsity. The green, yellow and orange colors of CIRs (h) represents the non-zero elements and that is also the common support elements among them from which the desired sparse-CIR is constructed. The equivalent CIR (s) is given as: which is obtained from the following equation: where = 1, 2, … , ; = 1, 2, … , From (18), the zero and non-zero elements can be grouped separately for feasibility if s is decomposed into L-blocks where each block contains -elements. Therefore, the supportive elements of s (i.e., ) can be updated collectively using the basic pursuit (BP) technique. To represent the new sensing-matrix ѱ with more-sparse attributes in terms of ф in (15) and (18) it is written as: The equivalent CIR (s) is given as: which is obtained from the following equation: where l = 1, 2, . . . , L; n t = 1, 2, . . . , N T . From (18), the zero and non-zero elements can be grouped separately for feasibility if s is decomposed into L-blocks where each block contains N T -elements. Therefore, the supportive elements of s (i.e., α s ) can be updated collectively using the basic pursuit (BP) technique. To represent the new sensing-matrix ψ with more-sparse attributes in terms of φ in (15) and (18) it is written as: and the corresponding estimated channel from (15) can be written as: where y j Is the received vector of order p × 1; s is the require sparse CIR with the order N T L × 1.

AoD-Adaptive Subspace Codebook Algorithm
We consider the same number of BS and UE antennas as in CS and Block-ISD techniques. It is also based on CS algorithms. To obtain the DL channel-vector, we employ classical narrowband ray-based mmWave channel model which is given by [30,31]: where h j is the DL channel-vector for jth user, that is also required by the BS for precoding and power allocation; P j is the number of resolvable paths from BS to jth UE; G j,i is the i.i.d complex-gain of the i-th propagation-path of the j-th UE having zero mean and unit variance; θ j,i is the i-th path-AoD of the j-th UE; a θ j,i ∈ C Nx1 is the steering-vector representing the antenna-response of the i-th path of the j-th UE.
To obtain a θ j,i , let us suppose that the BS antennas are uniform linear-arrays (ULA), therefore: where d is the spacing or separation between antenna array elements at the BS; λ Is the wavelength of the carrier frequency.
For the sake of convenience, we write (21) in matrix form as: where A j = a θ j,1 , a θ j,2 , . . . , a θ j, P j ∈ C NxP j ; G j = g j,1 , g j,2 , g j,3 , . . . , g j,P j H ∈ C P j x1 . The channel-vectors of all K-users are concatenated and represented as follows: The channel quantization is performed by the codebook: where 2 b is the N-dimensional different column-vectors of unit-magnitude, and b is the number of bits required for feedback. The quantization of the jth user channel-vector h j to a quantization vector is given by: where F j is the quantization index. It is calculated by the following equation: where h j is the direction of the concerned channel which can be obtained from the following relationship: The quantization vector is transmitted and F j can be feedback from the jth UE to BS by employing the feedback bits b. When the BS receive such b-bits and its associated F j , it can produce the feedback channel-vector which is given by:ȟ Entropy 2018, 20, 92

of 23
The feedback channel-vectors of all K-users are concatenated and represented as follows: The per-user rate is obtained from (30) using zero-forcing (ZF) linear precoding technique. ZF has the ability to attain near-optimal performance with lower complexity when the BS antennas approach infinity [32]. The signal that is transmitted after ZF precoding is determined by: where P t is the transmit power; s = [s 1 , s 2 , s 3 , . . . , s K ] ∈ C K×1 is a set of signals dedicated to the K-users with unit normalized-power; = 1 , 2 , 3 , . . . , K ∈ C N×K is the ZF precoding-matrix which contains K-different unit-magnitude precoding-vectors i of order N × 1. To obtain the precoding vectors, let us represent the aggregated channel vector as follows: Therefore, the precoding-vectors can be obtained as the normalized ith column of η as given below: The received-signal for the kth user can be expressed as: By putting (31) in (34) we get: where n k is the Gaussian noise-vector at the kth user having zero-mean and unit-variance. Therefore, the signal-to-interference noise ratio (SINR) at the kth user is given by: The per-user data rate R is then calculated by: Putting (35) in (36), we get: In (38), R is dependent upon -matrix which in turn depends on theȞ-matrix quality. The quantization-vectors are accurately placed on the normalized-channel subspace which is pictorially depicted in Figure 6.  The path-AoD ( , ) in (29) is mainly dependent on the nearby hurdles which are around the BS and may not change physically their place in a lot of time as compare to the coherence-time. On the other hand, for the path-gain ( , ) a resolvable-path is created by a cluster comprised of scatters which is nearby the kth user and it contains a number of unresolvable-paths. Therefore, the kth user sees a path-gain that depends on a number of unresolvable-paths. It is clear that the path-gain has much quicker variation than the path-AoDs which vary much slowly. The coherence-time of the angle in which the path-AoDs are considered static and the channel-vector is present in the channelsubspace, is much longer as compare to the channel coherence-time.
Due to joint UL/DL channel estimation benefit, the quantized AoDs can be produced at the BS as well as users. They can also create the steering-matrix. Therefore, the proposed AoD-subspace based codebook is written as: where, ℤ є ℂ ×1 is the traditional-codebook quantization-vector. The feedback bits b is determined as follows: where SNR is computed at the Rx side which is expressed by: It is clear from (41) that the slope of the b-bits ( − 1) is directly proportional to SNR and the rate-gap remains constant. It is also clear that as is very much smaller than , so the proposed AoD-codebook algorithm can substantially reduce the size of codebook and the overhead.

S-CoSaMP Algorithm
The conventional UL/DL schemes is a two-step process. In the first step, DL CSI is estimated at the UE and in the second step, the CSI is feedback in UL to BS. These two steps processes are individually optimized which is shown in Figure 7a. To point out the loopholes associated with such technique, we propose a novel approach that performs channel UL and DL feedback jointly. The proposed scheme process is depicted in Figure 7b. In this algorithm, the user feedbacks the pilots to BS without channel-estimation. The BS then extracts the required CIR using the CS algorithm. This is the main advantage of this algorithm as it provides freedom to users from channel-estimation which is a tedious task. Also, the performance of channel-feedback is also enhanced. The path-AoD (θ j,i ) in (29) is mainly dependent on the nearby hurdles which are around the BS and may not change physically their place in a lot of time as compare to the coherence-time. On the other hand, for the path-gain G j,i a resolvable-path is created by a cluster comprised of scatters which is nearby the kth user and it contains a number of unresolvable-paths. Therefore, the kth user sees a path-gain that depends on a number of unresolvable-paths. It is clear that the path-gain has much quicker variation than the path-AoDs which vary much slowly. The coherence-time of the angle in which the path-AoDs are considered static and the channel-vector is present in the channel-subspace, is much longer as compare to the channel coherence-time.
Due to joint UL/DL channel estimation benefit, the quantized AoDs can be produced at the BS as well as users. They can also create the steering-matrix. Therefore, the proposed AoD-subspace based codebook is written as: where, Z i ∈ C k×1 is the traditional-codebook quantization-vector. The feedback bits b is determined as follows: where SNR is computed at the Rx side which is expressed by: It is clear from (41) that the slope of the b-bits (Q − 1) is directly proportional to SNR and the rate-gap remains constant. It is also clear that as Q is very much smaller than N, so the proposed AoD-codebook algorithm can substantially reduce the size of codebook and the overhead.

S-CoSaMP Algorithm
The conventional UL/DL schemes is a two-step process. In the first step, DL CSI is estimated at the UE and in the second step, the CSI is feedback in UL to BS. These two steps processes are individually optimized which is shown in Figure 7a. To point out the loopholes associated with such technique, we propose a novel approach that performs channel UL and DL feedback jointly. The proposed scheme process is depicted in Figure 7b. In this algorithm, the user feedbacks the pilots to BS without channel-estimation. The BS then extracts the required CIR using the CS algorithm. This is the main advantage of this algorithm as it provides freedom to users from channel-estimation which is a tedious task. Also, the performance of channel-feedback is also enhanced. The procedure of obtaining the DCIR for UL is the same as explained in subsection A of CS differential CIR. To obtain the received pilots at the BS, we modify (15) as follows: We try to use the temporal-correlation of channels by calculating the differential-pilots (D-Pilots) at the BS in two consecutive time-slots k − 1 and k respectively as [33]: By putting (42) in (43) we get: where ∆ℎ ( ) = ℎ ( ) − ℎ ( −1) , which is the same as (9). We know that ℎ ( ) and ℎ ( −1) have sparse structure so the ∆ℎ ( ) also have sparse nature. So, to reap these sparse benefits, we employ the S-CoSaMP algorithm which has low-complexity and robustness to recover the CIR for LS technique which is given by [34]: This holds true when the CIR has sparse nature. The CoSaMP technique recovers the CIR in the initial time-slot from the pilots received at the BS. It also recovers the DCIR from the D-pilots in the succeeding time-slots. It is also more beneficial than the conventional CoSaMP algorithm as it updates each support of CIR. The conventional CoSaMP does not take into account the sparse structure of CIR. Whereas, the proposed S-CoSaMP utilizes the sparsity inherent in the CIR.

Energy Efficiency Maximization
In this section, we analyze the Uplink energy efficiency (EE) in terms of the number of BSs, the number of users per cell, the density of BS antennas and the pilot reuse factor respectively. The closed form of analytical expressions is derived to provide insights into the energy optimization of small cell based massive MIMO systems. The transmit power of user equipment i in j cell can be expressed by: where is the coefficient of power control. The average transmit power for such UE can then be represented by: The procedure of obtaining the DCIR for UL is the same as explained in subsection A of CS differential CIR. To obtain the received pilots at the BS, we modify (15) as follows: We try to use the temporal-correlation of channels by calculating the differential-pilots (D-Pilots) at the BS in two consecutive time-slots k − 1 and k respectively as [33]: By putting (42) in (43) we get: where ∆h (k) = h (k) − h (k−1) , which is the same as (9). We know that h (k) and h (k−1) have sparse structure so the ∆h (k) also have sparse nature. So, to reap these sparse benefits, we employ the S-CoSaMP algorithm which has low-complexity and robustness to recover the CIR for LS technique which is given by [34]: This holds true when the CIR has sparse nature. The CoSaMP technique recovers the CIR in the initial time-slot from the pilots received at the BS. It also recovers the DCIR from the D-pilots in the succeeding time-slots. It is also more beneficial than the conventional CoSaMP algorithm as it updates each support of CIR. The conventional CoSaMP does not take into account the sparse structure of CIR. Whereas, the proposed S-CoSaMP utilizes the sparsity inherent in the CIR.

Energy Efficiency Maximization
In this section, we analyze the Uplink energy efficiency (EE) in terms of the number of BSs, the number of users per cell, the density of BS antennas and the pilot reuse factor respectively. The closed form of analytical expressions is derived to provide insights into the energy optimization where ρ is the coefficient of power control. The average transmit power for such UE can then be represented by: Therefore, by using (47), we can calculate the optimized EE for the proposed massive MIMO system by through the statistical power control policy [35].

Simulation Results
The simulation results are provided to verify our theoretical analysis. Table 1 provides the simulation parameters for the proposed algorithms which are utilized to obtain the simulation results in MATLAB. The sub-space pursuit (SP) technique is used for the recovery process which is more robust to noise and have low complexity [36].  Figure 8 shows the comparison results for the uplink (UL) channel-feedback for massive MIMO system using the direct compressed-sensing (DCS) and the proposed CS differential techniques. The simulations are performed using two compression ratios (γ) for CIRs which are 25% and 50% respectively. The results clearly reveal that the DCS scheme fails for 25% compression ratio as its overhead is lesser than required and the gap between this and the proposed CS differential algorithm gets larger and its NMSE does not reduce as required, while the proposed scheme performs much better than former scheme. For the second compression ratio (50%), the proposed scheme still show better performance as compared to DCS. It also got 6-dB gain over DCS. The proposed CS differential channel feedback algorithm using 25% overhead have similar performance as the DCS algorithm that uses 50% overhead. Its mean that the proposed algorithm decreases the channel-feedback overhead from 50% to 25%, and the net reduction is 25%.    Figure 10 illustrates the number of feedback bits b required to maintain a constant rate-gap between the ideal CSI and the proposed practical AoD-adaptive subspace codebook. We can see that the rate gap is limited to within 0.12 bps/Hz. The graph is linear, therefore, the number of feedback  Figure 9 shows the NMSE performance comparison of the classical ISD and the proposed Block-ISD algorithms. The results clearly show that the proposed Block-ISD performs better than the classical-ISD and BP algorithms due to more inherent sparsity. The exact LS is a reference for comparison of our results. The number of pilots for LS and other scheme is p =LN t = 32 × 130 = 4160 which is far more than the proposed pilots of 640. By taking the percentage between these two pilots signals (4160−640) 4160 × 100 = 84.6%, it is evident that a significant reduction in the channel-overhead is obtained by using Block-ISD algorithm for channel-estimation of massive MIMO systems.    Figure 10 illustrates the number of feedback bits b required to maintain a constant rate-gap between the ideal CSI and the proposed practical AoD-adaptive subspace codebook. We can see that the rate gap is limited to within 0.12 bps/Hz. The graph is linear, therefore, the number of feedback  bits required is linearly proportional to the number of paths. By comparing this simulation result with our theoretical formulation in (39), it is clearly validating it. Figure 10. Feedback-bits comparison for maintaining a data-rate gap between the perfect-CSI and proposed AoD-codebook. Figure 11 compares the required feedback bits versus the SNR between the ideal scenario of perfect-CSI and the proposed scheme with limited channel-feedback. We observe that the required number of feedback bits vary linearly with the SNR under consideration. Figure 11. Comparison of the required number of feedback-bits with increasing the SNR between the ideal CSI and the proposed AoD-codebook schemes. Figure 12 shows the per-user data rate comparison of the ideal condition, conventional channel statistics-based codebook and the proposed AoD-adaptive subspace codebook with perfect path-AoDs. The feedback bits b is constant for the quantized channel feedback algorithms. It is clear from the results that the per-user rate gap between the ideal perfect-CSI and the proposed AoD-codebook is at a constant level when the Rx SNR increases. From the Figure, the rate gap at 0 dB SNR is approximately 0.18 bps/Hz, whereas, at 15 dB SNR, the proposed AoD-codebook overlaps on the Figure 10. Feedback-bits comparison for maintaining a data-rate gap between the perfect-CSI and proposed AoD-codebook. Figure 11 compares the required feedback bits versus the SNR between the ideal scenario of perfect-CSI and the proposed scheme with limited channel-feedback. We observe that the required number of feedback bits vary linearly with the SNR under consideration. bits required is linearly proportional to the number of paths. By comparing this simulation result with our theoretical formulation in (39), it is clearly validating it. Figure 10. Feedback-bits comparison for maintaining a data-rate gap between the perfect-CSI and proposed AoD-codebook. Figure 11 compares the required feedback bits versus the SNR between the ideal scenario of perfect-CSI and the proposed scheme with limited channel-feedback. We observe that the required number of feedback bits vary linearly with the SNR under consideration. Figure 11. Comparison of the required number of feedback-bits with increasing the SNR between the ideal CSI and the proposed AoD-codebook schemes. Figure 12 shows the per-user data rate comparison of the ideal condition, conventional channel statistics-based codebook and the proposed AoD-adaptive subspace codebook with perfect path-AoDs. The feedback bits b is constant for the quantized channel feedback algorithms. It is clear from the results that the per-user rate gap between the ideal perfect-CSI and the proposed AoD-codebook is at a constant level when the Rx SNR increases. From the Figure, the rate gap at 0 dB SNR is approximately 0.18 bps/Hz, whereas, at 15 dB SNR, the proposed AoD-codebook overlaps on the Figure 11. Comparison of the required number of feedback-bits with increasing the SNR between the ideal CSI and the proposed AoD-codebook schemes. Figure 12 shows the per-user data rate comparison of the ideal condition, conventional channel statistics-based codebook and the proposed AoD-adaptive subspace codebook with perfect path-AoDs. The feedback bits b is constant for the quantized channel feedback algorithms. It is clear from the results that the per-user rate gap between the ideal perfect-CSI and the proposed AoD-codebook is at a constant level when the Rx SNR increases. From the Figure, the rate gap at 0 dB SNR is approximately 0.18 bps/Hz, whereas, at 15 dB SNR, the proposed AoD-codebook overlaps on the ideal case. This validates our theoretical analysis. The rate gap between the ideal CSI and the conventional-codebook increases with increasing the SNR, which is undesired in practice. Also, the proposed AoD-codebook performs much better than the conventional-codebook. ideal case. This validates our theoretical analysis. The rate gap between the ideal CSI and the conventional-codebook increases with increasing the SNR, which is undesired in practice. Also, the proposed AoD-codebook performs much better than the conventional-codebook. Figure 12. Comparison of the per-user rate of the ideal perfect-CSI and the proposed scheme with quantized-channel feedback in terms of SNR. Figure 13 shows the NMSE comparison of the proposed S-CoSaMP and the conventional algorithms. The conventional algorithms are represented by dotted-lines while the joint algorithms are represented by solid lines. It is clear from the Figure that the S-CoSaMP based differential joint scheme performs best as compare to other algorithms as it deploys temporal-correlation and sparsity. The proposed algorithm has the least NMSE which means reduced feedback-overhead as compared to the conventional schemes. On the second best is the S-CoSaMP joint algorithm which performs better than the conventional schemes. The JOMP and CoSaMP joint schemes have the same performance as their common support vectors are not satisfied. The Kalman Filter response is not better as it requires the autocorrelation information of the channel. We also observe that the NMSE gap between the conventional algorithms and the proposed algorithms increases with increasing SNR. Thus, it also indicates the dominance of our proposed algorithms. Figure 14 compares the sum-rate between the ideal perfect-CSI, S-CoSaMP, and the conventional CoSaMP algorithms. We consider two compression ratios (γ) for comparing their performance evaluation. The perfect-CSI case is used for reference purpose with which we compare our proposed results. It is clear from the Figure that our proposed S-CoSaMP based joint differential feedback shows better performance and it is operating near the ideal case of perfect-CSI at γ = 70%. The proposed S-CoSaMP at γ = 40% perform better than the conventional CoSaMP which is operating with the overhead of γ = 70% and they have approximately the same data-rate. The conventional CoSaMP fails at the overhead of γ = 40% because the rate gap between this and the ideal CSI gets much larger which is undesired in practice. It shows that our proposed algorithm provides 30% reduction in overhead and improves the channel feedback efficiency as compared to the conventional algorithm in attaining the same sum-rate.  The proposed algorithm has the least NMSE which means reduced feedback-overhead as compared to the conventional schemes. On the second best is the S-CoSaMP joint algorithm which performs better than the conventional schemes. The JOMP and CoSaMP joint schemes have the same performance as their common support vectors are not satisfied. The Kalman Filter response is not better as it requires the autocorrelation information of the channel. We also observe that the NMSE gap between the conventional algorithms and the proposed algorithms increases with increasing SNR. Thus, it also indicates the dominance of our proposed algorithms. Figure 14 compares the sum-rate between the ideal perfect-CSI, S-CoSaMP, and the conventional CoSaMP algorithms. We consider two compression ratios (γ) for comparing their performance evaluation. The perfect-CSI case is used for reference purpose with which we compare our proposed results. It is clear from the Figure that our proposed S-CoSaMP based joint differential feedback shows better performance and it is operating near the ideal case of perfect-CSI at γ = 70%. The proposed S-CoSaMP at γ = 40% perform better than the conventional CoSaMP which is operating with the overhead of γ = 70% and they have approximately the same data-rate. The conventional CoSaMP fails at the overhead of γ = 40% because the rate gap between this and the ideal CSI gets much larger which is undesired in practice. It shows that our proposed algorithm provides 30% reduction in overhead and improves the channel feedback efficiency as compared to the conventional algorithm in attaining the same sum-rate.      Figure 15 shows the energy efficiency of the proposed massive MIMO system against the density of BSs. It is clear from the Figure that the EE gap between the upper and lower bounds for all three signal-to-interference ratios (SINRs) ζ is very small and it also decreases as SINR increases. The analytical expression of the SINR is given by: where is the level of hardware impairment, β is the optimization variable and ∂ is the power level coefficient respectively. Moreover, the EE is significantly improved with increasing the BS density. This means that by deploying smalls cells, the EE greatly improves which makes it an attractive solution to such issue. Furthermore, the EE decreases as the SINR increases which make a constraint on the EE and for this purpose, a target SINR should be specified from a practical perspective to obtain the required EE. where is the level of hardware impairment, is the optimization variable and is the power level coefficient respectively. Moreover, the EE is significantly improved with increasing the BS density. This means that by deploying smalls cells, the EE greatly improves which makes it an attractive solution to such issue. Furthermore, the EE decreases as the SINR increases which make a constraint on the EE and for this purpose, a target SINR should be specified from a practical perspective to obtain the required EE.  Figure 16 illustrates the EE against the UE density for the average SINR ( ) of 3 dB respectively. It is compared with two references usage cases, such as single-input multiple outputs (SIMO) and MIMO for performance evaluation. It is clear from the results that the EE becomes independent as the UE density increases to a large level. We can see that the EE saturation levels occurs at UE density of 100 and above respectively for the optimized and MIMO cases. The saturation for SIMO case occurs at UE of 2 and above. Moreover, we can observe that the proposed system shows optimized performance than the other two competing cases which makes it suitable for practical usage scenarios of small cells dense networks.   Figure 16 illustrates the EE against the UE density for the average SINR (ζ) of 3 dB respectively. It is compared with two references usage cases, such as single-input multiple outputs (SIMO) and MIMO for performance evaluation. It is clear from the results that the EE becomes independent as the UE density increases to a large level. We can see that the EE saturation levels occurs at UE density of 100 and above respectively for the optimized and MIMO cases. The saturation for SIMO case occurs at UE of 2 and above. Moreover, we can observe that the proposed system shows optimized performance than the other two competing cases which makes it suitable for practical usage scenarios of small cells dense networks. where is the level of hardware impairment, is the optimization variable and is the power level coefficient respectively. Moreover, the EE is significantly improved with increasing the BS density. This means that by deploying smalls cells, the EE greatly improves which makes it an attractive solution to such issue. Furthermore, the EE decreases as the SINR increases which make a constraint on the EE and for this purpose, a target SINR should be specified from a practical perspective to obtain the required EE.  Figure 16 illustrates the EE against the UE density for the average SINR ( ) of 3 dB respectively. It is compared with two references usage cases, such as single-input multiple outputs (SIMO) and MIMO for performance evaluation. It is clear from the results that the EE becomes independent as the UE density increases to a large level. We can see that the EE saturation levels occurs at UE density of 100 and above respectively for the optimized and MIMO cases. The saturation for SIMO case occurs at UE of 2 and above. Moreover, we can observe that the proposed system shows optimized performance than the other two competing cases which makes it suitable for practical usage scenarios of small cells dense networks.    Figure 17 shows the EE levels against the level of hardware impairments for three different SINR levels. It is clear from the results that the EE decreases with increasing hardware impairments. It is due to the decay of the desired signal power.
Entropy 2018, 20, x FOR PEER REVIEW 20 of 23 Figure 17 shows the EE levels against the level of hardware impairments for three different SINR levels. It is clear from the results that the EE decreases with increasing hardware impairments. It is due to the decay of the desired signal power. The loss can be high when the SINR increases to a large level which means that the hardware impairment substantially affects the EE in high SINR levels. Moreover, the EE loss is negligible for hardware impairment of level of 0.1 and below. Therefore, it can be concluded that for the moderate hardware impairment level, the effect on the system under consideration is negligible.

Conclusions
This paper scrutinizes the feedback and pilot overhead issues of uplink and downlink channel estimation in massive MIMO systems. It also investigates the EE evaluation in terms of BS density, UE density, and hardware impairment levels respectively. For the UL-channel estimation, it is concluded that by using temporal-correlation of massive MIMO time-varying channels which produces the compression-efficient differential CIR has 25% less overhead requirement or in other words, 25% more effective as compare to the direct CS (DCS) algorithm. It is also shown that the proposed CS algorithm has improved NMSE than the DCS algorithm. For the DL-channel estimation, the proposed Block-ISD scheme shows enhanced performance as compare to classical-ISD and BP schemes. It utilizes spatial-correlation of massive MIMO channels which produces block-sparse CIR that have more compression-efficiency and due to such mechanism, the pilot-overhead is decreased by 84.6%. In other words, the efficiency of the Block-ISD is 84.6% more than the classical-ISD or BP techniques. The AoD-adaptive subspace codebook provides a substantial reduction in the training and feedback overhead as well as the codebook size which results in improved data-rate and bandwidth-efficiency. It uses the idea that the time-based variation in channels path-AoDs is much slower than path-gains. The proposed results are so close to the theoretical analysis in terms of peruser data-rate and feedback bits. As we have proved that the feedback bits vary linearly with the number of paths P rather than the whole N-dimensional space which provides freedom from the large BS antenna-based feedback bits scaling. The S-CoSaMP algorithm is used for joint channel training and feedback which is based on CS that further reduces the channel overhead. It has reduced The loss can be high when the SINR increases to a large level which means that the hardware impairment substantially affects the EE in high SINR levels. Moreover, the EE loss is negligible for hardware impairment of level of 0.1 and below. Therefore, it can be concluded that for the moderate hardware impairment level, the effect on the system under consideration is negligible.

Conclusions
This paper scrutinizes the feedback and pilot overhead issues of uplink and downlink channel estimation in massive MIMO systems. It also investigates the EE evaluation in terms of BS density, UE density, and hardware impairment levels respectively. For the UL-channel estimation, it is concluded that by using temporal-correlation of massive MIMO time-varying channels which produces the compression-efficient differential CIR has 25% less overhead requirement or in other words, 25% more effective as compare to the direct CS (DCS) algorithm. It is also shown that the proposed CS algorithm has improved NMSE than the DCS algorithm. For the DL-channel estimation, the proposed Block-ISD scheme shows enhanced performance as compare to classical-ISD and BP schemes. It utilizes spatial-correlation of massive MIMO channels which produces block-sparse CIR that have more compression-efficiency and due to such mechanism, the pilot-overhead is decreased by 84.6%. In other words, the efficiency of the Block-ISD is 84.6% more than the classical-ISD or BP techniques. The AoD-adaptive subspace codebook provides a substantial reduction in the training and feedback overhead as well as the codebook size which results in improved data-rate and bandwidth-efficiency. It uses the idea that the time-based variation in channels path-AoDs is much slower than path-gains. The proposed results are so close to the theoretical analysis in terms of per-user data-rate and feedback bits. As we have proved that the feedback bits vary linearly with the number of paths P rather than the whole N-dimensional space which provides freedom from the large BS antenna-based feedback bits scaling. The S-CoSaMP algorithm is used for joint channel training and feedback which is based on CS that further reduces the channel overhead. It has reduced the joint UL/DL overhead by 30%. It also provides the benefit of getting rid of the burden on the UE which estimate their channel and then feedback to BS. The UE directly feedback their pilots to BS and then the CS algorithm obtain the required CSI at the BS. The EE levels of the proposed system outperform the SIMO and MIMO usage cases which makes it suitable for small cells dense networks operation. Also, the EE levels are optimum in upper and lower bounds which are defined using three different SINR level of 1, 3 and 7, respectively. The EE gap between both bounds is close enough which decreases further with increasing SINR levels. We also conclude that the EE level decreases as the SINR levels increases which provide information to specify a target SINR for each considered usage case in practical perspective. Therefore, the EE increases with increasing BS density, UE density, and lower hardware impairment levels respectively. Our future work is focused on reducing the channel overhead for which we will utilize the spatial-correlation of CSI from various UEs at the BS. Also, the quantization error of AoD in terms of feedback bits will be evaluated and analyzed.