A User Cooperation Approach for Interference Cancellation in FDD Massive MIMO Systems

Abstract: The performance of a massive multi-user Multiple-Input Multiple-Output (MIMO) system operating in Frequency Division Duplex (FDD) mode degrades severely under imperfect Channel State Information (CSI). One of the main challenges in acquiring sufficiently accurate CSI at the transmitter is the enormous CSI feedback overhead. In this paper, a novel interference cancellation strategy is proposed to alleviate this overhead. The concept of a device-to-device based interference cancellation strategy was hinted at in some prior works but has not been fully exploited in multi-user MIMO systems, especially when the number of antennas becomes large. Hence, this paper aims to exploit the potential of User Equipment (UE) cooperation to reduce the dependency of the precoder at the transmitter on the accuracy of CSI. To do so, adjacent UEs that experience correlated CSI are clustered into the same group and jointly adjust their receive antenna combining weight vectors to maximize the channel vector orthogonality. Simulation results show that the proposed strategy reduces the dependency of system performance on the accuracy of CSI feedback; moreover, compared to the conventional limited feedback strategy, a larger number of antennas can be deployed at the transmitter.


Introduction
Massive Multiple-Input Multiple-Output (MIMO) is a promising technology for the Fifth Generation (5G) of cellular networks [1][2][3]. In Massive-MIMO, it is assumed that the number of transmit antennas is 10 times larger than the number of receive antennas [1]. Such a massive number of antennas provides significant multiplexing and diversity gains and allows a larger number of users to be served in parallel [2]. Theory anticipates that, in rich scattering environments, increasing the number of transmit antennas yields a greater data rate without increasing bandwidth [3]. That is because adding antennas offers additional degrees of freedom in the wireless channel, beyond the time and frequency dimensions, which can be used to carry more data [4]. Moreover, Massive-MIMO reduces intra- and inter-cell interference by narrowing and focusing the radiated energy toward the intended user's direction [3].
Although Massive-MIMO can provide numerous advantages over other prominent technologies, it is still involved with technical challenges and constraints that should be solved before its realization. Recent works [5,6] addressed the practical challenges of implementing Massive-MIMO in Time Division Duplex (TDD) and Frequency Division Duplex (FDD) modes. Among the addressed problems, the issue the proposed schemes could reduce the complexity associated with the computation of singular value decomposition in conventional methods.
In [22], the authors proposed a compressive feedback beamforming scheme based on extracting the angular and spatial domain sparsity of Massive-MIMO channels. In this method, UE generates compressed feedback content using the angle-domain sparsity of channel and then conveys it back to the BS. Although the simulation results demonstrated the potential of the proposed method to achieve considerable performance gain over conventional feedback schemes, the scheme is still involved with technical limitations such as CSI feedback accuracy, CSI feedback overhead, and a limited number of transmit antennas.
The work in [23] proposed a protocol for fast fading FDD Massive-MIMO systems. The approach relies on Device-to-Device (D2D) cooperation to avoid instantaneous CSI feedback. In this protocol, UEs cooperate to form a virtual MIMO system, while it is assumed that the BS is aware of the downlink CSI as well as the UE cooperation. Nevertheless, the assumption that the transmitter is aware of CSI in advance is unrealistic. The authors in [24] proposed a projection-based differential feedback protocol to reduce the number of feedback bits for Massive-MIMO. In this protocol, the difference between the original and predicted vectors is projected, and quantization is performed in a lower-dimensional subspace. The strategy exploits both the temporal and spatial correlation of the Massive-MIMO channel. However, the employed prediction method increases the system complexity, especially in the presence of a large number of antennas.
The authors in [25] proposed a new metric for the UE scheduling to increase the multi-user diversity gain in the FDD Massive-MIMO system. The metric was derived based on an approximate lower bound of expected Signal to Interference plus Noise Ratio (SINR) value. Additionally, the authors proposed a new user grouping algorithm to reduce inter-group interference. This work assumed that the BS has perfect knowledge of the channel covariance matrix, while in reality, acquiring even a rough estimation of CSI in an FDD Massive-MIMO system is highly challenging.
Chen et al. in [26] proposed a precoding scheme for the Massive-MIMO system. The scheme is based on spatial coordination and exploits the null-space of user signals: to avoid interference, the BS precodes the signal within the null-space of the victim UE's channel. This scheme suffers from a prohibitive feedback requirement and encoding complexity, especially when the number of transmit antennas is massive. The authors in [27] proposed a CSI feedback strategy for the FDD Massive-MIMO system. In this strategy, UEs are grouped into two classes according to their channel covariance. To do so, a Signal to Leakage-plus-Noise Ratio (SLNR)-based precoder was developed for handling the instantaneous and statistical CSI. The authors claimed that the proposed strategy could outperform the conventional limited feedback strategy under a limited number of feedback bits.
As addressed, the aforementioned interference cancellation strategies mainly suffer from high computational complexity and CSI feedback overhead, which is due to the utilization of a conventional cell-centric interference cancellation strategy. To effectively cancel interference with conventional limited feedback techniques, precise cooperation between transmitter and receiver is required, which results in several technical challenges. One of the main challenges is the use of complex optimization techniques at the receiver to jointly maximize the channel gain and minimize interference leakage. This is because each UE must estimate its weight vector on its own, due to the non-cooperative nature of the channel. Another technical challenge is the computation of the effective channel direction at the receiver side, where the accuracy of the channel direction directly affects the performance of the precoder at the transmitter side. The design of the codebook for quantizing CSI and exploiting the spatial dimensions of the channel matrix is also highly challenging, especially as the scale of the system increases, and it directly impacts the CSI feedback load and CSI accuracy. The last technical challenge is the computational complexity of the precoder at the transmitter: the complexity grows with the channel dimension, while the performance is limited by the accuracy of the quantized version of the estimated effective channel.
Electronics 2020, 9, 1679 4 of 18 This paper aims to propose a novel interference cancellation strategy for the FDD large-scale MU-MIMO system. To do so, instead of assigning the task of interference cancellation to the BS, like conventional limited feedback strategies, the main task is assigned to the UE. In this strategy, adjacent active UE shares knowledge of its estimated downlink channel to jointly adjust the receive antenna combiner weight vector, which reduces the complexity of receiver combiner computation, as well as channel direction computation. To reduce the dependency of downlink transmission performance on CSI's accuracy, the computation method of the receive antenna combiner weight vector is developed in such a way that maximizes the effective channel gain, which consequently reduces the precoder computational complexity, as well as CSI feedback overhead.
We would like to emphasize that the proposed strategy should not be viewed as a complete scheme to cancel interference in a Massive-MIMO system. Instead, it is a general concept for changing the limited feedback interference cancellation strategy from the conventional cell-centric to a user-centric. Therefore, the downlink channel estimation, quantization, and precoding methods are not restricted to the utilized schemes, and any of the existing methods can be employed.
The key contributions of this paper are summarized as follows: The performance of the developed interference cancellation strategy is validated by link-level simulation, and the system throughput performance is evaluated under CAS and DAS configurations. The simulation results show that the proposed strategy can be effectively applied to the MU-MIMO systems, especially when the network is congested, and a large number of transmit antennas is utilized.
The rest of this paper is organized as follows. Section 2 presents the system model and the limited feedback technique for the MU-MIMO system. Section 3 proposes different phases of the proposed User Cooperative Interference Cancellation (UCIC) strategy. Section 4 validates the performance of the developed interference cancellation strategy in different scenarios, and Section 5 concludes the paper.
Notation: we use uppercase $A$ for matrices, lowercase $a$ for vectors, and calligraphic $\mathcal{A}$ for sets. The $\ell_2$ norm of a vector is denoted as $\|a\|$, and $\|A\|$ denotes the Frobenius norm of a matrix. The size of a set is given by $|\mathcal{A}|$. $A^{-1}$ is the inverse of a matrix. The Moore-Penrose pseudo-inverse is $A^{\dagger}$. $A \in \mathbb{C}^{m \times n}$ means $A$ is a complex-valued matrix of size $m \times n$. $a \sim \mathcal{CN}(m, R)$ denotes a complex-valued circularly symmetric Gaussian random vector $a$ with mean $m$ and covariance $R$. $A^T$ denotes the transpose of a matrix. $A^*$ denotes the complex conjugate of a matrix, and $A^H$ is the conjugate transpose. $E\{a\}$ is the expected value of a vector. The operator $\mathrm{diag}(a)$ creates a diagonal matrix whose main diagonal entries equal $a$.

System Model
This section presents the system model and the conventional limited feedback scheme for multi-user MIMO downlink transmission. We consider a single BS equipped with $m$ transmit antennas and $K$ UEs, each equipped with $n$ receive antennas. Linear multi-user precoding is considered; therefore, the input-output relationship of the $k$th user can be given as
$$y_k = H_k^H f_k x_k + H_k^H \sum_{i \in \mathcal{K} \setminus \{k\}} f_i x_i + n_k, \tag{1}$$
where $y_k \in \mathbb{C}^{n \times 1}$ is the perturbed received signal vector, $H_k \in \mathbb{C}^{m \times n}$ and $n_k \in \mathbb{C}^{n \times 1}$ are the channel matrix and the noise of the $k$th user, and $x_k$ is the transmitted symbol, which is independently generated by the channel encoders with statistical power $E\{|x_k|^2\} = 1$. $f_k \in \mathbb{C}^{m \times 1}$ is the precoding vector. The focus of this study is on the ZF precoding technique, where the transmitter sends a single stream per UE. It is assumed that $n_k \sim \mathcal{CN}(0, \sigma_n^2 I_n)$ is Additive White Gaussian Noise (AWGN), and $\mathcal{K} \subseteq \{1, \ldots, K\}$ is the set of selected users that are served in parallel over a given time-frequency resource.
At the receiver side, each user applies a linear receive filter, termed the receive antenna combiner weight vector, $w_k$, to equalize its channel. The symbol of the $k$th user, $\hat{s}_k$, can be estimated by applying $w_k$ to the perturbed received vector $y_k$, as
$$\hat{s}_k = w_k^H y_k. \tag{2}$$
According to the limited feedback technique, interference cancellation requires a collaboration between the precoder, $f_k$ (at the transmitter), and the receive antenna combiner, $w_k$ (at the receiver). Therefore, $w_k$ and $f_k$ should be computed in such a way that interference between users is avoided.
Generally, CSI can be divided into two categories: (i) Channel Direction Indicator (CDI), which is mainly utilized for computing the precoder matrix and (ii) Channel Quality Indicator (CQI), which is mainly used for UE-scheduling, power allocation, and modulation and coding-rate adaptation.
To obtain the CDI, the UE first computes its effective channel direction as $\tilde{h}_k = h_k / \|h_k\|$, where $h_k$ is the effective channel vector, a concatenation of the channel matrix and the receive filter, $h_k = H_k w_k$. Note that $\tilde{h}_k$ contains essential channel statistics for interference cancellation, and this information depends on the design criteria of the receive filter. After that, to save uplink bandwidth, the UE quantizes $\tilde{h}_k$ to a unit-norm vector, $\hat{h}_k$, by utilizing a quantization codebook, $\Omega$.
The CQI can be obtained from the SINR value at the receiver side. The exact value of the SINR is formulated in [28] in terms of the following quantities: $p_k$ is the transmission power assigned to the $k$th user; $\theta_k$ is the angle between the effective channel direction and the quantized channel direction; $\tilde{g}_k = g_k / \|g_k\|$, where $g_k$ is the $k$th column of the right pseudo-inverse of the quantized channel matrix; and $e_k$ is the quantization error. After that, to save uplink bandwidth, the UE quantizes the SINR by applying a quantization function, $\Phi$, as $\mathrm{CQI}_k = \Phi(\mathrm{SINR}_k)$.
In the next step, each UE feeds back its quantized CDI and CQI to the BS through the uplink channel. At the BS, the ZF precoder utilizes the CSI for multi-user transmission. The ZF precoder vector for the $k$th UE is computed in such a way that $f_k$ is orthogonal to the quantized effective channel vectors of the other users, $\hat{h}_i$ for $i \in \mathcal{K} \setminus \{k\}$.

User Cooperative Interference Cancellation Strategy
This section describes the different phases of the proposed interference cancellation strategy, which is termed the User Cooperative Interference Cancellation (UCIC) strategy. It is well known that, to effectively cancel interference between users, an appropriate collaboration between the precoder at the transmitter and the receive filter at the receiver is required. Conventionally, the main task of interference cancellation is assigned to the transmitter; hence, the accuracy of the downlink CSI at the BS plays a vital role in achieving the system capacity. To reduce the dependency of system performance on CSI feedback accuracy, the UCIC strategy is proposed. Figure 1 visualizes the proposed strategy.
The UCIC strategy is developed based on the proximity of UEs. In UCIC, UEs in the same vicinity cooperate in canceling interference. First, each UE estimates its downlink channel and then, by utilizing an unlicensed frequency band, broadcasts its estimated channel to the other active UEs located in its vicinity. In the next step, UEs that are located close to each other and experience roughly the same channel characteristics are grouped into a cluster. In each cluster, one UE is selected as the Head of clustered UE (H-UE). Then, the H-UE, knowing its cluster mates' CSI, jointly adjusts the receive antenna combiner weight vectors of itself and its cluster mates. Then, only the H-UE of each cluster feeds back the mutual CSI (CQI) to the BS. Therefore, instead of each UE feeding back its CSI individually, only the H-UE feeds back the mutual CSI, and the other UEs feed back only their CDI, which results in a lower CSI feedback overhead. Finally, the CSI feedback information at the BS is utilized for canceling the residual interference between UEs, as well as for UE scheduling, power and resource allocation, adjusting the MIMO mode, and selecting the modulation and coding scheme. The following describes the different phases of the developed strategy in more detail.

Downlink Channel Estimation
As the first phase, each UE individually estimates its desired downlink channel. To do so, the Minimum Mean Squared Error (MMSE) channel estimation technique is utilized. The downlink channel is learned through known pilots that are transmitted by the BS. The training sequence is denoted as $\Psi := [\psi_1; \cdots; \psi_m]$, with $\psi_i$ for $i = 1, \cdots, m$, and is normalized as $\Psi \Psi^H = I_m$. The received training signal at user $k$ is given in (7), where $p$ is the average transmit power during the pilot training period. By applying the complex conjugate of $\psi_k$ to (7), the observed channel $\hat{H}_k$ in (8) is obtained, in which $\frac{1}{\sqrt{p}} n_k \psi_k^*$ is the noise during pilot training. By applying the MMSE scheme to (8), the channel of the $k$th user can be estimated, where $R_k$ is the eigenspace representation of the channel matrix. For more details about the channel estimation, readers are referred to [29].
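The estimation phase above can be illustrated with a short sketch. It is not the paper's equations (7) to (10), which did not survive in this text; it assumes orthonormal pilots ($\Psi \Psi^H = I_m$), i.i.d. $\mathcal{CN}(0, 1)$ channel entries (so the MMSE filter reduces to a scalar shrinkage), and a known noise variance. The function name `estimate_channel` and the receive-by-transmit array layout are hypothetical choices made for illustration.

```python
import numpy as np

def estimate_channel(H, pilots, p, noise_var, rng):
    """Pilot-based observation followed by a scalar MMSE shrinkage.

    H         : true channel, shape (n, m) = (receive, transmit); illustrative layout
    pilots    : orthonormal training matrix Psi, shape (m, m), Psi @ Psi^H = I
    p         : pilot transmit power
    noise_var : receiver noise variance (assumed known)
    """
    n, m = H.shape
    noise = np.sqrt(noise_var / 2) * (
        rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
    )
    Y = np.sqrt(p) * H @ pilots + noise          # received training signal
    H_obs = Y @ pilots.conj().T / np.sqrt(p)     # de-spread: H + noise / sqrt(p)
    # Assumption: i.i.d. CN(0, 1) entries, so MMSE is a shrinkage by 1 / (1 + noise_var / p).
    H_mmse = H_obs / (1.0 + noise_var / p)
    return H_obs, H_mmse

rng = np.random.default_rng(0)
m, n = 8, 2
pilots = np.fft.fft(np.eye(m)) / np.sqrt(m)      # unitary DFT pilots
H = (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)
H_obs, H_mmse = estimate_channel(H, pilots, p=10.0, noise_var=1.0, rng=rng)
```

With a correlated channel, the scalar shrinkage would be replaced by a filter built from the covariance $R_k$; the sketch only shows the pilot de-spreading step and the role of the pilot power $p$.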

CSI Broadcasting
After each piece of UE estimates its desired downlink channel, it is time for the UE to share its knowledge of the estimated channel with other activated UE. For this purpose, UE can utilize an unlicensed frequency band and broadcast its H i based on D2D short-range communication.

Correlated UE Clustering and H-UE Selection
In this stage, thanks to the H i broadcasting, UE can identify H i of other UE in its vicinity. Generally, this knowledge can be categorized into two groups: (i) knowledge about large-scale fading, and (ii) knowledge about small-scale fading. It is known that small scale fading is uncorrelated between the components of the channel matrix, while it can be assumed that proximate UE can experience correlated large scale fading between pieces of equipment, such as path loss and shadow fading. Thus, it can be assumed that CQI can be almost similar between adjacent UE. Therefore, correlated UE can be grouped into one cluster.
For UE clustering, the K-means clustering algorithm is utilized, which partitions the UEs into different groups based on the similarity of their channel covariance matrices. To do so, the coefficients of the Rayleigh channel are modeled as $h_k \sim \mathcal{CN}(0, R_k)$, where the coefficients are mutually independent across the UEs, and $R_k$ is given as
$$R_k = U_k \Lambda_k U_k^H, \tag{11}$$
where $U_k$ is an $m \times r_k$ matrix of the eigenvectors that correspond to the $r_k$ non-zero eigenvalues of $R_k$, and $\Lambda_k$ is a diagonal $r_k \times r_k$ matrix composed of the non-zero eigenvalues of $R_k$. The Karhunen-Loeve representation of $h_k$ is given as
$$h_k = U_k \Lambda_k^{1/2} v_k, \tag{12}$$
where the entries of $v_k$ are i.i.d. $\sim \mathcal{CN}(0, 1)$. The K-means algorithm uses the chordal distance between the covariance eigenspaces, i.e., $\{U_k^* : k = 1, \ldots, K\}$, as
$$d_C(U_k, U_m) = \left\| U_k U_k^H - U_m U_m^H \right\|_F^2, \tag{13}$$
where $U_m$ is the mean of $g$ unitary matrices, as
$$U_m = \mathrm{eig}_p\!\left[ \frac{1}{g} \sum_{k=1}^{g} U_k U_k^H \right], \tag{14}$$
where $\mathrm{eig}_p[X]$ is the unitary matrix formed by the $p$ dominant eigenvectors of $X$. The pseudo-code of the clustering decision making process is presented in Algorithm 1.
Algorithm 1.
for each iteration
Step 1: Compute the chordal distance in (13) for every k and g
Step 2: Assign correlated UE into similar groups
Step 3: Update U m in (14)
Step 4: Randomly select a UE as H-UE
end for
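The clustering steps can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the squared Frobenius form of the chordal distance and a projector-average group mean, and it replaces random initialization with a deterministic farthest-point initialization so that the example is reproducible. The names `chordal_distance`, `dominant_eigvecs`, and `cluster_ues` are hypothetical.

```python
import numpy as np

def chordal_distance(U_a, U_b):
    """Squared chordal distance between the subspaces spanned by U_a and U_b."""
    return np.linalg.norm(U_a @ U_a.conj().T - U_b @ U_b.conj().T, "fro") ** 2

def dominant_eigvecs(X, p):
    """Matrix of the p dominant eigenvectors of a Hermitian matrix X."""
    _, vecs = np.linalg.eigh(X)          # eigenvalues in ascending order
    return vecs[:, ::-1][:, :p]

def cluster_ues(U_list, n_groups, p, n_iter=20, seed=0):
    """K-means on covariance eigenspaces; each U in U_list needs at least p columns."""
    rng = np.random.default_rng(seed)
    K = len(U_list)
    # Assumption: farthest-point initialization instead of a random one.
    centres = [U_list[0][:, :p]]
    while len(centres) < n_groups:
        dists = [min(chordal_distance(U[:, :p], C) for C in centres) for U in U_list]
        centres.append(U_list[int(np.argmax(dists))][:, :p])
    labels = None
    for _ in range(n_iter):
        # Steps 1-2: distance to every group mean, then nearest-group assignment.
        new_labels = np.array(
            [int(np.argmin([chordal_distance(U[:, :p], C) for C in centres])) for U in U_list]
        )
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: update each group mean from the averaged projectors.
        for g in range(n_groups):
            members = [U_list[k][:, :p] for k in range(K) if labels[k] == g]
            if members:
                avg = sum(U @ U.conj().T for U in members) / len(members)
                centres[g] = dominant_eigvecs(avg, p)
    # Step 4: randomly select one H-UE per non-empty group.
    heads = {
        g: int(rng.choice(np.flatnonzero(labels == g)))
        for g in range(n_groups) if np.any(labels == g)
    }
    return labels, heads
```

The function returns one group label per UE and a dictionary mapping each group to the index of its randomly chosen H-UE.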

Jointly Adjusting Receive Antenna Combiner Weight Vector
At this stage, the receive antenna combiner weight vectors, $w_k$, are jointly computed and adjusted. To do so, first, the H-UE computes its own receive antenna combiner weight vector, $w_{\text{H-UE}}$, based on the MET technique [30], to maximize its effective channel gain as where it is assumed that the singular values σ According to (20), the H-UE computes the $w_k$ of its group mates in such a way that they are orthogonal to its effective channel, by projecting its effective channel onto the effective channel of UE $k$. It should be noted that, to compute the CDI, knowledge of the effective channel direction, $\tilde{h}_k$, is needed, which can be computed as
$$\tilde{h}_k = \frac{h_k}{\|h_k\|}, \tag{21}$$
where $h_k$ is the effective channel vector, a concatenation of the channel matrix and the receive filter, as
$$h_k = H_k w_k. \tag{22}$$
It should be noted that $\tilde{h}_k$ contains essential channel statistics for residual interference cancellation at the BS; hence, the design criteria of the receive filter play an important role in enhancing system performance. Additionally, it is well known that adjacent UEs are the main victims of inter-user interference. Therefore, by jointly adjusting the receive antenna combiner vectors, more useful CDI can be fed back to the BS, for both inter-user interference and inter-sector interference cancellation.
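The two combiner computations can be illustrated under explicit assumptions, since the paper's equations (15) to (20) are not available in this text. The sketch takes the MET combiner to be the dominant right singular vector of $H_k$ (the unit-norm $w$ that maximizes $\|H_k w\|$), and it chooses a group mate's combiner as the highest-gain unit vector whose effective channel is orthogonal to the H-UE's effective channel. The second rule is a stand-in chosen for illustration, not the projection rule of (20); `met_combiner` and `orthogonal_combiner` are hypothetical names.

```python
import numpy as np

def met_combiner(H):
    """Unit-norm w maximizing ||H w||: dominant right singular vector of H (m x n)."""
    _, _, Vh = np.linalg.svd(H, full_matrices=False)
    return Vh[0].conj()

def orthogonal_combiner(H_mate, h_head):
    """Highest-gain unit-norm w with (H_mate w) orthogonal to h_head.

    Assumption: illustrative stand-in for the paper's rule; requires n >= 2.
    """
    a = H_mate.conj().T @ h_head                 # h_head^H H_mate w equals a^H w
    a_unit = a / np.linalg.norm(a)
    # Projector onto the orthogonal complement of a.
    P = np.eye(len(a)) - np.outer(a_unit, a_unit.conj())
    _, _, Vh = np.linalg.svd(H_mate @ P, full_matrices=False)
    w = P @ Vh[0].conj()
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
m, n = 8, 2
H_head = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
H_mate = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

w_head = met_combiner(H_head)
h_head = H_head @ w_head                         # effective channel of the H-UE
w_mate = orthogonal_combiner(H_mate, h_head)
h_mate = H_mate @ w_mate                         # effective channel of the group mate
```

With $n = 2$ receive antennas, as in the simulation setup, the orthogonality constraint leaves a single direction for the mate's combiner, so orthogonality is bought at the cost of some effective channel gain.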

CSI Computation, Quantization, and Feedback
As described earlier, only the H-UE computes its own CQI, based on the expected value of the SINR, as given in (23), where $p_{\text{H-UE}}$ is the transmission power assigned to the H-UE, and $\theta_{\text{H-UE}}$ is the angle between the effective channel direction and the quantized channel direction. The H-UE computes and adjusts the receive antenna combiner vector for itself as well as for its group mates. Hence, the H-UE must inform its group mates of their $w_k$, which can be done by broadcasting $w_k$ via the same procedure described in the CSI Broadcasting phase. Afterward, each UE knows its own $w_k$ and, therefore, computes its effective channel direction according to (22), as well as its CDI as
$$\mathrm{CDI}_k := \Omega(\tilde{h}_k), \tag{24}$$
where $\Omega$ is a quantization function. For this purpose, the Random Vector Quantization (RVQ) method [31] is utilized, which is known for its advantages in CSI comparison. After this, each UE individually feeds back its CDI to the BS.
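The RVQ step can be sketched as follows, assuming a codebook of $2^B$ random unit-norm vectors and a selection rule that picks the codeword with the largest absolute inner product with the effective channel direction. The selection rule is the common choice for RVQ and is an assumption here; `rvq_codebook` and `quantize_direction` are hypothetical names.

```python
import numpy as np

def rvq_codebook(m, bits, rng):
    """2**bits random unit-norm codewords of length m (one per row)."""
    C = rng.standard_normal((2 ** bits, m)) + 1j * rng.standard_normal((2 ** bits, m))
    return C / np.linalg.norm(C, axis=1, keepdims=True)

def quantize_direction(h_eff, codebook):
    """Return (index, codeword, cos(theta)) for the codeword closest to h_eff's direction."""
    h_dir = h_eff / np.linalg.norm(h_eff)        # effective channel direction
    corr = np.abs(codebook.conj() @ h_dir)       # |c_i^H h_dir| for every codeword
    idx = int(np.argmax(corr))
    return idx, codebook[idx], float(corr[idx])

rng = np.random.default_rng(0)
codebook = rvq_codebook(m=8, bits=4, rng=rng)
h_eff = rng.standard_normal(8) + 1j * rng.standard_normal(8)
idx, h_hat, cos_theta = quantize_direction(h_eff, codebook)
```

Only `idx` needs to be fed back, which costs `bits` bits per report; `cos_theta` is the cosine of the angle between the effective and quantized channel directions that enters the CQI computation.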

Base Station Procedures
The BS utilizes the received CDIs of all participating UEs to compute the precoder matrix. In this regard, the ZF precoder is utilized. The design criterion of the ZF precoder is the orthogonalization of the quantized effective channel vectors of the UEs, i.e., $\hat{h}_i^H f_k = 0$ for all $i \in \mathcal{K} \setminus \{k\}$. Moreover, the BS utilizes the CQIs of the H-UEs for UE scheduling, resource allocation, and modulation and coding rate adaptation.
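A minimal sketch of a ZF precoder built from quantized CDI is given below. It assumes that the precoding vectors are the normalized columns of the right pseudo-inverse of the stacked quantized channel matrix, in line with the definition of $g_k$ in the system model; equal unit power per stream and the function name `zf_precoder` are assumptions.

```python
import numpy as np

def zf_precoder(H_hat):
    """Zero-forcing precoder from quantized effective channel directions.

    H_hat : (m, K) matrix whose k-th column is the quantized direction of UE k.
    Returns F of shape (m, K) with unit-norm columns and h_hat_i^H f_k = 0 for i != k.
    """
    G = np.linalg.pinv(H_hat.conj().T)           # right pseudo-inverse: H_hat^H G = I_K
    return G / np.linalg.norm(G, axis=0, keepdims=True)

rng = np.random.default_rng(0)
m, K = 8, 4
H_hat = rng.standard_normal((m, K)) + 1j * rng.standard_normal((m, K))
H_hat /= np.linalg.norm(H_hat, axis=0, keepdims=True)
F = zf_precoder(H_hat)
leakage = H_hat.conj().T @ F                     # diagonal up to numerical precision
```

Interference is nulled only with respect to the quantized directions; the residual interference seen by a UE depends on the angle between its true and quantized effective channel directions.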

Simulation Setup
To evaluate the performance of the developed UCIC strategy, four scenarios are considered for the simulation. In Figure 2, the four scenarios are visualized. A single-cell multi-user simulation scenario is considered, where one BS with a hexagonal grid architecture is employed, and out-of-cell interference is ignored. Generally, six users are distributed over the entire cell area, each of which is equipped with two antennas. In scenarios with DAS architecture (Scenarios III and IV), the central BS is covered with R uniformly distributed Radio Remote Units (RRUs), which are located equiangularly on a ring with a radius of (2/3) × cell radius. The total number of transmit antennas is uniformly distributed among the central BS and the RRUs. For more details about the DAS architecture, readers are referred to [31,32].
The results are based on link-level simulation, where the PHY procedures of single-cell multi-user scenarios are considered. For this purpose, version 1.3 of the Vienna link-level simulator [33] is used. Notice that the precoder and codebook quantization methods used in the simulator are based on the LTE and LTE-A specifications. Thus, in the simulations, the scale of the considered MIMO system is limited to 52 antennas. That is because increasing the number of transmit antennas dramatically increases the computational complexity of the codebook, which results in an unreasonable simulation run time. Table 1 presents the simulation parameters under consideration.
At the transmitter, a greedy scheduler algorithm [28] is employed to assign resource blocks to the users. The scheduling decision is made according to the CQI feedback in (23). The ZF precoding technique, as described in 3.6, is employed at the transmitter side. The precoder uses the CDI feedback in (24) to compute the precoding matrix. The signaling transmission is assumed to be error-free.
To model the channel, a narrow-band multi-scattering directional channel model [34] is used, where $N_p$ is the number of paths, $\alpha_{p,k}$ is the complex-valued amplitude of the $p$th path, and $a_{t,k}$ and $a_{r,k}$ are the antenna array response vectors at the transmitter and receiver, respectively. $(\phi_p^{(t,k)}, \theta_p^{(t,k)})$ and $(\phi_p^{(r,k)}, \theta_p^{(r,k)})$ are the azimuth and elevation angles at the transmitter and receiver, respectively.
In a large-scale MIMO system, the number of transmit antennas is much larger than the number of paths; therefore, when $N_p$ grows large, the considered channel can be represented as a Rayleigh fading model. Therefore, the CDI feedback of one subcarrier is adequate to set the same precoder for the whole bandwidth. The temporal evolution of the small-scale channel matrix is determined by a correlated block-fading model, and the channel is assumed to be invariable during the subframe interval of 1 ms. Hence, the period of the CSI feedback is set to 1 ms.
A two-dimensional horizontally aligned uniform linear antenna array (ULA) is utilized in the simulation. In its array response vector, $x \in \{r, t\}$, $g_e(\phi)$ is the complex-valued antenna-element gain pattern, and $d_h$ is the inter antenna-element spacing in multiples of the wavelength $\lambda$. In the simulation, it is assumed that $g_e(\phi) = 1$, $d_h = \lambda/2$, and $\phi$ is uniformly distributed, $\phi \in [-\pi, \pi]$. For more details about the channel modeling and the antenna specification, readers are referred to [15].
To obtain the throughput, the achievable mutual information introduced in [33] is used, as given in (29), where $N_{tot}$ is the total number of usable subcarriers, $\mathcal{F}$ is the set of precoding matrices, $I_N$ is an identity matrix of size $N$, and $\sigma_n^2$ is the energy of the noise and interference at the receiver. $B_{sub}$ is the bandwidth of a subcarrier, and $Z$ is a factor accounting for the system losses due to the transmission of the reference symbols and the cyclic prefix.
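The channel and array model can be illustrated with a short sketch. Because the corresponding equations are not available in this text, the sketch assumes the common half-wavelength ULA response $[1, e^{j 2\pi d_h \sin\phi}, \ldots]^T$ with unit element gain, ignores elevation, and normalizes the sum of paths by $1/\sqrt{N_p}$; these choices, and the names `ula_response` and `multipath_channel`, are assumptions and not the paper's exact model.

```python
import numpy as np

def ula_response(num_elems, phi, d_h=0.5, gain=1.0):
    """ULA response for azimuth phi; d_h is the element spacing in wavelengths."""
    idx = np.arange(num_elems)
    return gain * np.exp(1j * 2 * np.pi * d_h * idx * np.sin(phi))

def multipath_channel(m, n, num_paths, rng):
    """Narrow-band channel of shape (m, n) built as a sum of num_paths rank-one paths."""
    H = np.zeros((m, n), dtype=complex)
    for _ in range(num_paths):
        alpha = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
        phi_t, phi_r = rng.uniform(-np.pi, np.pi, size=2)   # departure / arrival azimuth
        H += alpha * np.outer(ula_response(m, phi_t), ula_response(n, phi_r).conj())
    return H / np.sqrt(num_paths)                           # assumed normalization

rng = np.random.default_rng(0)
H_k = multipath_channel(m=8, n=2, num_paths=20, rng=rng)
```

As the number of paths grows, the entries of `H_k` approach complex Gaussian statistics, which is the sense in which the text treats the channel as Rayleigh fading.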

Results and Discussion
This section presents the simulation results of the aforementioned four scenarios. Figure 3 shows the system throughput of Scenario I in terms of 0.95 Empirical Cumulative Distribution Function (ECDF), where different numbers of transmit antennas and CSI feedback bits are considered. The results reveal that by increasing the number of transmit antennas, the system throughput increases until a saturation level and after that, the system performance starts to degrade. The reason for this behavior is because, in the presence of a larger number of antennas, BS requires more accurate CSI to exploit the multiplexing gain and adapt downlink transmission with current channel conditions. Moreover, pilot training sequences scales linearly with the number of transmit antennas. Thus, by increasing the number of antennas, pilot sequences will occupy more portions of the downlink channel, and less portions will remain for data transmission. The result comparison reveals that by assigning a larger number of feedback bits, the system throughput can be improved considerably, where more accurate CSI can be delivered to the BS. The gap between system (29) where N tot is the total number of usable subcarriers, Electronics 2020, 9, x FOR PEER REVIEW

Scenario I: Conventional Limited Feedback Technique and CAS Configuration
where N p is the number of paths, α p,k is the complex-valued amplitude of pth path are the antenna array response vectors at the transmitter and receiver, respectively. φ p (r,k) , θ p r,k are the azimuth and elevation angles at the transmitter and receiver, respec In a large-scale MIMO system, the number of transmit antennas is much larger th of paths; therefore, when N p grows large, the considered channel can be represented fading model. Therefore, the CDI feedback of one subcarrier is adequate to set the sam the whole bandwidth. The temporal evolution of the small-scale channel matrix is de correlated block-fading model, and the channel is assumed to be invariable during interval of 1 ms. Hence, the period of the CSI feedback is set as 1 ms.
A two-dimensional horizontally aligned uniform linear antenna array (ULA) [...] simulation, where the array response vector can be represented as [...], where $x \in \{r, t\}$, $g_e(\phi)$ is the complex-valued antenna-element gain pattern, and $d_h$ is the element spacing in multiples of the wavelength $\lambda$. In the simulation, it is assumed that $d_h = \lambda/2$, and $\phi$ is uniformly distributed, $\phi \in [-\pi, \pi]$. For more details about the channel [...] and the antenna specification, readers are referred to [15].
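As a rough illustration of the array response described above, the following Python sketch builds the response vector of a uniform linear array with half-wavelength spacing. The function name `ula_response`, the unit element gain, and the broadside (sine) phase convention are assumptions made for this sketch; they are not taken from the paper, whose exact expression and antenna specification are given in [15].

```python
import cmath
import math


def ula_response(num_elements, phi, spacing_wavelengths=0.5, element_gain=1.0):
    """Response vector of a uniform linear array (hypothetical helper).

    num_elements: number of antenna elements in the array
    phi: angle in radians, measured from broadside (assumed convention)
    spacing_wavelengths: element spacing d_h in multiples of the wavelength
    element_gain: antenna-element gain g_e(phi); 1.0 assumes isotropic elements
    """
    # Phase progression between adjacent elements for a plane wave at angle phi.
    phase_step = 2.0 * math.pi * spacing_wavelengths * math.sin(phi)
    return [element_gain * cmath.exp(1j * phase_step * n) for n in range(num_elements)]


if __name__ == "__main__":
    # Example: an 8-element array observed at 30 degrees from broadside.
    for weight in ula_response(8, math.radians(30.0)):
        print(f"{weight.real:+.3f} {weight.imag:+.3f}j")
```

With isotropic elements every entry has unit magnitude, so only the phase progression across the array carries the angular information.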
To achieve the throughput, the achievable mutual information introduced in [...] is used, as given in (29), where $N_{tot}$ is the total number of usable subcarriers, $\mathcal{F}$ is the set of precoding matrices, $I_N$ is an identity matrix of size $N$, and $\sigma_n^2$ is the energy of the noise and interference at the receiver. $B_{sub}$ is the bandwidth of a subcarrier, and $Z$ is a factor of the system losses due to the transmission of the reference symbols and the cyclic prefix.

Results and Discussion
This section presents the simulation results of the aforementioned four scenarios. Figure 3 shows the system throughput of Scenario I in terms of the 0.95 Empirical Cumulative Distribution Function (ECDF), where different numbers of transmit antennas and CSI feedback bits are considered. The results reveal that, as the number of transmit antennas increases, the system throughput increases up to a saturation level, after which the system performance starts to degrade. The reason for this behavior is that, with a larger number of antennas, the BS requires more accurate CSI to exploit the multiplexing gain and adapt the downlink transmission to the current channel conditions. Moreover, the pilot training sequences scale linearly with the number of transmit antennas. Thus, as the number of antennas increases, the pilot sequences occupy a larger portion of the downlink channel, and a smaller portion remains for data transmission. The comparison of the results reveals that assigning a larger number of feedback bits improves the system throughput considerably, since more accurate CSI can be delivered to the BS. The gap between the system throughputs reveals that the maximum number of transmit antennas that can be implemented at the transmitter depends strongly on the accuracy of CSI and is limited by the CSI feedback overhead.

Figure 4 demonstrates the system throughput of Scenario II, where different numbers of transmit antennas and CSI feedback bits are considered. The results reveal that, by utilizing the UCIC strategy, more transmit antennas can be deployed at the BS compared to the conventional limited feedback strategy. Under the considered conditions, the maximum number of transmit antennas is limited to 20, 24, and 28 antennas for 8, 12, and 16 feedback bits, respectively. The reason that UCIC outperforms the conventional limited feedback technique is that it jointly computes the receive antenna combining weight vectors of the clustered UE to orthogonalize their effective channels. Moreover, the results show that, by utilizing the UCIC strategy, the gap between the system performances is reduced, which can be interpreted as a reduction in the dependency of the BS on the accuracy of CSI.

Scenario III: Conventional Limited Feedback Technique and DAS Configuration
Figure 5 shows the system throughput of Scenario III, where different numbers of transmit antennas, at the central BS and RRUs, and CSI feedback bits are considered. As the results show, by applying the DAS configuration to the considered system, the performance gaps are reduced compared to Scenario I, which can be interpreted as a lower dependency of system performance on the accuracy of CSI feedback in a large-scale MIMO system with a distributed antenna deployment configuration. Moreover, the results show that, by distributing the transmit antennas, not only can the system throughput be enhanced (versus the CAS configuration), but the maximum number of transmit antennas can also be significantly increased. Under the considered conditions, the maximum number of antennas is limited to 28, 36, and 36 antennas for 8, 12, and 16 feedback bits, respectively. The main reason for this behavior is the lower dependency of DAS on the accuracy of CSI. In DAS, the transmit antennas are distributed throughout the cell, which results in a shorter access distance between the UE and the BS. Hence, lower power is required for downlink transmission, which results in lower interference, and at a lower interference level (higher SINR level), less accurate CSI feedback can provide more system throughput compared to the CAS configuration.

Scenario IV: UCIC Strategy and DAS Configuration
Figure 6 shows the system throughput of Scenario IV, where different numbers of transmit antennas and CSI feedback bits are considered. As the results illustrate, by applying the UCIC strategy to the distributed Massive-MIMO system, the largest number of transmit antennas can be effectively implemented at the transmitter compared to the other three scenarios, namely 28, 36, and 40 antennas for 8, 12, and 16 feedback bits, respectively. Moreover, the results show a larger reduction of the system throughput gap than in Scenario III. This is because, by integrating the DAS configuration with the UCIC strategy, the dependency of the BS on the accuracy of CSI feedback is minimized. In addition, this integration is robust against imperfect CSI: the BS can exploit more multiplexing gain and effectively mitigate interference with less CSI feedback, resulting in higher system throughput.
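The throughput figures above are all reported at the 0.95 level of the ECDF. The sketch below shows one way such a value could be read from a set of simulated throughput samples. It assumes that "0.95 ECDF" denotes the smallest sample at which the empirical distribution reaches 0.95; the function names and the sample values are hypothetical and do not come from the paper.

```python
def ecdf(samples, x):
    """Empirical CDF: fraction of samples less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)


def ecdf_level(samples, level=0.95):
    """Smallest sample whose empirical CDF value is at least `level`."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    total = len(ordered)
    for rank, value in enumerate(ordered, start=1):
        # rank / total is the ECDF evaluated at this order statistic.
        if rank / total >= level:
            return value
    return ordered[-1]


if __name__ == "__main__":
    # Illustrative throughput samples only (arbitrary units, not simulation data).
    throughput = [float(v) for v in range(1, 101)]
    print(ecdf_level(throughput, 0.95))
    print(ecdf(throughput, 95.0))
```

Other quantile conventions (for example, interpolating between neighboring samples) would give slightly different values for small sample sets.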

Performance Comparison
The following compares the performance of the aforementioned four scenarios in terms of (i) the amount of CSI feedback reduction, (ii) the maximum number of effective transmit antennas, and (iii) the system throughput. In the UCIC strategy, the amount of CSI feedback overhead reduction depends on (i) the number of active UE in a cell, (ii) the cluster size of correlated UE, and (iii) the portion of CSI feedback bits assigned to CDI and CQI. In this regard, the amount of feedback overhead reduction can be calculated according to the following developed equation [...], where $K_{tot}$, $K_n$, and $K_H$ are the total number of active UE, the number of non-clustered UE, and the number of H-UE, respectively. $B_{CSI}$, $B_{CDI}$, and $B_{CQI}$ are the numbers of CSI, CDI, and CQI feedback bits assigned to each piece of UE, respectively, where $B_{CSI} = B_{CDI} + B_{CQI}$. The partitioning of the CSI bits depends on the considered system and related parameters. Hence, let us assume that all active UE is clustered, 12 bits are assigned for CSI feedback, and the codebook size is 256. Therefore, 8 bits are needed for CDI feedback ($2^8 = 256$), and 4 bits are available for CQI feedback. Thus, according to (21), Figure 7 demonstrates the CSI feedback overhead reduction achievable by the UCIC strategy for different numbers of UE and cluster sizes. As can be seen from the results, the CSI feedback overhead reduction achieved by the UCIC strategy is between 17 and 33 percent, depending on the ratio between the number of active UE and the cluster size of correlated UE.

Table 2 presents the maximum number of effective transmit antennas that can be deployed in the four considered scenarios, where different numbers of feedback bits are assigned. The comparison between the results of Scenarios I and II reveals that the UCIC strategy can increase the number of antennas by up to 25%. This is because, by jointly adjusting the receive antenna weight vector, the orthogonality between the downlink channels of adjacent UE can be enhanced, and consequently, more multiplexing gain can be exploited. The results of Scenarios III and IV show that, by distributing the transmit antennas throughout the cell, a larger number of transmit antennas can be implemented in the FDD Massive-MIMO system compared to the CAS configuration. Moreover, comparing the results of Scenarios I and IV shows that integrating the UCIC strategy with the DAS technique and applying it to the Massive-MIMO system yields up to a 75% increase in the maximum number of transmit antennas.

Figure 8 compares the system throughputs of the aforementioned four scenarios, where 8 bits are assigned for CSI feedback. The results show that, by utilizing the UCIC strategy, a larger number of antennas can be implemented at the transmitter, and a larger system throughput can be achieved. To analyze the results in more detail, Table 3 presents the average system throughput improvements between the considered scenarios, where different CSI feedback bits are considered. The average system throughput improvement is calculated according to the following developed equation [...], where $L_{UCIC}$ and $L_{LF}$ represent the 0.95 ECDF of the system throughput with UCIC and with the conventional limited feedback technique, respectively, and $\kappa$ is the index of the maximum number of transmit antennas that can be deployed, $\kappa \in \{1, \ldots, 12\}$, for the considered numbers of transmit antennas $\{8, 12, 16, \ldots, 52\}$, respectively.
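The bit partitioning and the antenna indexing used in this comparison reduce to simple arithmetic, sketched below. The helper names are hypothetical; the only inputs are the values stated in the text (a 256-entry codebook, 12 CSI bits, and antenna counts from 8 to 52 in steps of 4). The sketch does not reproduce the paper's overhead-reduction or throughput-improvement equations.

```python
def cdi_bits(codebook_size):
    """Bits needed to index every codeword of a power-of-two codebook."""
    if codebook_size < 1:
        raise ValueError("codebook size must be positive")
    bits = codebook_size.bit_length() - 1
    if 2 ** bits != codebook_size:
        raise ValueError("codebook size must be a power of two")
    return bits


def cqi_bits(csi_bits, codebook_size):
    """Bits left for CQI once CDI is indexed, using B_CSI = B_CDI + B_CQI."""
    remaining = csi_bits - cdi_bits(codebook_size)
    if remaining < 0:
        raise ValueError("not enough CSI bits to index the codebook")
    return remaining


def antennas_for_index(kappa, first=8, step=4, count=12):
    """Number of transmit antennas for index kappa in {1, ..., count}."""
    if not 1 <= kappa <= count:
        raise ValueError("kappa is outside the considered range")
    return first + step * (kappa - 1)


if __name__ == "__main__":
    print(cdi_bits(256), cqi_bits(12, 256))
    print([antennas_for_index(k) for k in range(1, 13)])
```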
To conclude the results, the proposed UCIC strategy can: (i) reduce the CSI feedback overhead by up to 33%, (ii) enhance the maximum number of transmit antennas by up to 25%, and (iii) improve the system throughput by up to 5.7%. Moreover, by integrating UCIC with DAS and applying it to the Massive-MIMO system: (i) system performance can be improved by 35.5%, and (ii) the maximum number of antennas can be increased by 75%, compared to the centralized Massive-MIMO system with the conventional limited feedback technique.

Conclusions
In this paper, a new interference cancellation strategy was designed and developed to reduce the CSI feedback overhead in the FDD Massive-MIMO system. The strategy is based on user device cooperation and consists of two phases. In the first phase, an algorithm was developed to cluster the correlated UE into groups. In the second phase, based on the CSI shared by the UE, a joint receive antenna combining method was developed. The performance of the Massive-MIMO system with centralized and distributed antenna configurations was investigated, where the conventional limited feedback strategy and the developed UCIC strategy were applied to the system. The simulation results revealed that, by integrating the UCIC strategy with the DAS configuration, not only can a larger number of effective transmit antennas be implemented in the Massive-MIMO system, but a lower CSI feedback overhead can also be achieved for interference cancellation. We believe that the results of this paper are significant for 5G systems, since the strategy offers a solution to the problem of deploying a large number of antennas in FDD Massive-MIMO systems. For future work, it would be interesting to consider the effect of channel aging on the performance of the developed strategy. Moreover, it would be beneficial to investigate the optimal cluster size and the impact of imperfect UE clustering on the system performance.