Spatio-Radio Resource Management and Hybrid Beamforming for Limited Feedback Massive MIMO Systems



Introduction
The scarcity of available frequency bands for wireless communications has led to the inclusion of millimeter wave (mmWave) frequencies in cellular communications. This has opened the door to massive multiple-input multiple-output (MIMO) systems: at such high transmission frequencies, a large number of antennas can be fabricated in a small form factor. The mmWave band has inherent hindrances, such as high path loss and absorption loss. It is known that the advantages of MIMO systems (spatial multiplexing or diversity gain) scale up with the number of antennas. In summary, one can enjoy the benefits of the large bandwidth available at mmWave frequencies by combating the high path and absorption losses with massive MIMO directional beamforming. Future mmWave massive MIMO-based cellular networks will be as shown in Figure 1. Due to the high path loss on one hand and the high directional gain on the other, inter-cell interference and cell boundaries will become meaningless. The fixed-size cell boundaries of traditional cellular networks will probably no longer exist in future mmWave massive MIMO systems. Narrow beams can serve distant user equipment (UE) without interfering with other UEs, provided that there is no obstacle between the base station (BS) and the intended UE, whereas a closely located UE may be deprived of a connection due to obstacles. The cost of massive MIMO is the excessive feedback overhead for channel estimation, along with the hardware complexity of an increased number of radio-frequency (RF) chains. The feedback overhead has been tackled separately for frequency division duplex (FDD) and time division duplex (TDD) systems. In FDD systems, uplink channel estimation incurs less overhead than downlink channel estimation because, generally, the number of transmit antennas $N_t$ is much larger than the number of users $K$ and the number of receive antennas per user $n_k$ ($N_t \gg K$ and $N_t \gg n_k$). The most common technique to reduce the downlink channel estimation overhead is joint spatial division multiplexing (JSDM) [1]. JSDM uses two-stage precoding: user grouping based on second-order channel statistics (covariance), followed by traditional MU-MIMO linear precoding (zero-forcing) for inter-user interference mitigation based on the low-dimensional effective channel. In TDD, only uplink channel estimation is performed, and the downlink channel estimates are obtained as the transpose of the uplink channel using the channel reciprocity principle. TDD massive MIMO systems suffer from pilot contamination when the BS receives non-orthogonal pilot signals from neighboring cells. This pilot contamination degrades the channel estimation and hence affects both uplink combining and downlink precoding.
In traditional MIMO systems, a separate RF chain (analog-to-digital converter/digital-to-analog converter, serial-to-parallel/parallel-to-serial converter, up/down converter, etc.) is required for each antenna, but the high power consumption makes this infeasible for massive MIMO systems. Hybrid beamforming resolves this problem by dividing the precoding/combining into baseband digital processing and RF analog processing. Hybrid precoding and combining offer extra degrees of freedom in the space domain with a large number of antennas and analog beamforming [2]. Hybrid beamforming can be realized by using MU-MIMO precoding as the baseband digital precoding and pre-beamforming based on statistical channel state information (CSI) as the RF analog precoding. This limited feedback configuration (limited because only average CSI is fed back) is particularly suited to massive MIMO mmWave systems with a large number of antennas but a relatively small number of RF chains [3]. It has been shown [4] that covariance-based limited feedback works well for mmWave massive MIMO systems, where the number of users is small with respect to the number of BS antennas and the channels are formed by a few multipath components (MPCs) with small angular spread.
Limited work has been done on joint multiuser massive MIMO resource allocation and hybrid beamforming design. Although mmWave massive MIMO systems have the potential for a tremendous increase in spectral efficiency, the cost and power consumption of power-hungry RF chains (analog-to-digital converter (ADC)/digital-to-analog converter (DAC), parallel-to-serial converter, serial-to-parallel converter, up converter/down converter) make it impractical to build a complete RF chain for each antenna. A promising solution to this problem is hybrid beamforming, where the precoder at the transmitter is divided into two parts: an analog precoder and a digital precoder. The analog precoder (usually a network of phase shifters) at the RF stage reduces the number of RF chains required for the digital precoder. In order to configure these precoders, the transmitter requires channel state information in the form of uplink feedback from users, but with a massive number of antennas this feedback becomes a huge load on the wireless uplink, especially in FDD mode. JSDM [4] is a technique used to reduce the feedback overhead. It uses slowly varying average channel statistics to implement the analog precoder; the digital precoder is then realized using a low-dimensional effective channel.

Several variants of JSDM have been proposed so far. Li et al. [5] generalize the JSDM scheme to support non-orthogonal virtual sectorization with multiple RF chains at both link ends. Their scheme uses the Kronecker channel model to decouple the transmit and receive beamforming. Under this channel, the analog beamformer is obtained by stacking the strongest eigenbeams of the channel covariance matrix, and the digital beamformer is based on a weighted minimum mean squared error (MMSE) criterion with the effective channel. However, the Kronecker model does not characterize the mmWave channel, where the transmitter and receiver have coupling effects due to highly directional transmission. In [6], the authors apply JSDM using a geometrical channel model and find the hybrid precoder and combiner at the transmitter and receivers, respectively. Hybrid beamforming with switches (HBwS) has been introduced in [7], where an $L \times N_t$ analog beamformer is controlled by $N_{RF} \times L$ switches driven by instantaneous CSI. Here, $N_t$ is the number of transmit antennas, $N_{RF}$ is the number of RF chains, and $N_{RF} < L < N_t$. Another switch-based analog beamforming scheme is proposed in [8], but it requires instantaneous CSI for both the switching network and the phase shifter network. It also sets $L = N_t$. The JSDM implementation also requires training in the downlink to estimate the channel covariance matrix. Most of the work assumes that the CSI is known at both ends. In [9], the authors consider the joint optimization of the training resource allocation and the channel-statistics-based analog beamformer design by using user-centric virtual sectorization. There are different structures for the phase shifter-based analog beamformer, namely, fully connected, sub-connected, and dynamically connected [10]. Park [11] investigates JSDM with these analog beamformer architectures. The dynamic architecture gives better results at the cost of added complexity. In [12], the authors propose a hybrid beamforming method with a unified analog beamformer obtained by subspace construction (SC) based on partial CSI in a massive MIMO OFDM system. In [13], a statistical CSI-based analog beamformer uses regularized block diagonalization to mitigate the inter-group interference, and an instantaneous CSI-based digital beamformer uses the weighted MMSE to suppress intra-group interference. Jiang et al. [14] jointly optimize the user selection and beam selection during analog beamforming design. They use a Lyapunov-drift optimization framework to obtain the optimal solution. Their work focuses only on the design of the statistical CSI-based analog precoder and user/beam selection.

Our previous work [15] on resource allocation for transmit beamforming develops digital and analog precoders that maximize the sum rate under constraints on the total power and the desired number of RF chains. The provided solutions require full instantaneous CSI at the transmitter and receiver, which, in the case of massive MIMO, entails a large number of pilot transmissions in the downlink and channel information feedback in the uplink. In this work, we exploit channel similarities by grouping the users (with K-means machine learning) based on their location information. A low-complexity DFT matrix-based analog precoder is derived using statistical CSI. This greatly reduces the feedback overhead for the design of the zero-forcing digital precoder.
Machine learning (ML) applications for the physical layer of wireless communication systems have been widely reported in [16]. Most of the conventional transmitter and receiver blocks can be replaced by an ML-based autoencoder, as suggested by the authors. The large number of antennas in massive MIMO leads to the challenging issue of channel estimation in mmWave communications. A common practice in TDD massive MIMO systems is to use channel reciprocity to obtain the downlink CSI from uplink channel estimates. However, in FDD, channel reciprocity is not applicable and downlink CSI estimation is very difficult. Uplink (user to base-station) channel estimation is known to be hampered by the pilot contamination effect: the quality of the channel estimates is deteriorated by the mutual interference caused by non-orthogonal pilots. In [17], a supervised learning-based pilot decontamination scheme for the massive MIMO uplink is reported. In the proposed ML-based solution, the users' locations in all cells and the pilot assignments serve as the input features and output labels, respectively. In [18], a deep learning network, CsiNet, is used to learn the CSI-to-codeword transformation at the users' terminals (a codebook approach is usually adopted to reduce the feedback overhead) and the inverse transformation at the base station. The authors of [19] suggest learning-based antenna selection for massive MIMO systems. It uses multiclass K-nearest neighbor (K-NN) and support vector machine (SVM) classifiers for data-driven optimal antenna selection. Wang et al. [20] employ K-NN supervised learning for the allocation of $N$ beams among $K$ users. In [21], a reinforcement learning-based framework for radio resource management in radio access networks has been proposed. In our previous work [22], we used neural networks to reduce the execution time of the computationally intensive resource allocation part of the joint resource allocation and hybrid beamforming design in [15]. In this work, however, we use a K-means-based unsupervised machine learning scheme to group the users based on their spatial locations. To the best of our knowledge, there is no research work that jointly considers spatio-radio resource management and hybrid beamforming in massive MIMO systems.
In this work, we use spatial channel covariance matrices for the analog beamforming design. We also consider the mapping of users to RF beams. This mapping requires channel state information and a search over all possible beam combinations at the base station. The complexity of this search is exponential in the number of users [23]. Because of this exponential increase in complexity, we use DFT-based eigenmode beams with RF switches.
Contribution: In this paper, we develop joint spatio-radio resource and hybrid precoding algorithms for limited feedback wideband massive MIMO systems. The contributions of this paper are summarized as follows.

• First, we consider the problem of joint hybrid precoder design with limited feedback and user-beam selection to maximize the sum proportional rate under the total power constraint. The formulated mixed integer programming problem is then transformed into a relaxed convex optimization problem.
• Second, a low-complexity suboptimal solution is provided for the optimization problem. The algorithm generates the analog beamforming matrix, the digital beamforming matrix, and the set of users in each group. The DFT/eigenmodes-based analog beamforming is formed using limited statistical CSI feedback from the users. Then, the digital precoder design with user selection is done iteratively.
• Finally, we develop a K-means-based unsupervised machine learning scheme for user grouping. These user groups are used to form the limited feedback (statistical channel state information) based analog beamforming matrix. The proposed machine learning-based analog beamforming, together with zero-forcing digital precoding and user scheduling, gives better performance than the DFT/eigenmodes-based solution.
The rest of the paper is organized as follows. The system, signal, and channel models, along with the problem formulation, are described in Section 2. Section 3 introduces the relaxed convex transformation of the formulated mixed integer optimization problem. A suboptimal solution to the joint resource allocation and hybrid beamforming problem, based on eigenmodes and the discrete Fourier transform, is given in Section 4. Section 5 proposes machine learning-based user grouping and beam selection for the joint optimization problem. Simulation results are given in Section 6, followed by the conclusions in Section 7.
Notations: Bold upper and lower case letters denote matrices and vectors, respectively. The notations $X^{-1}$, $X^{\dagger}$, $X^{T}$, $X^{H}$, and $\mathrm{tr}(X)$ denote the inverse, pseudo-inverse, transpose, Hermitian transpose, and trace of a matrix $X$. $\mathrm{vec}\{\cdot\}$ is the vectorization operator, $\mathrm{diag}\{x_1, ..., x_n\}$ is a diagonal matrix, and $\otimes$ is the Kronecker product. $\|\cdot\|_F$ denotes the Frobenius norm. The $n \times n$ identity matrix is denoted by $I_n$. $E\{\cdot\}$ represents the expectation with respect to the random variable within the brackets.

System Model
Consider an FDD MU-MIMO downlink system where a base station (BS) with $N_t$ antennas is located at the cell center and transmits to $K$ single-antenna users, as shown in Figure 2. There are $G$ groups of users, indexed by $g \in \mathcal{G} = \{1, ..., G\}$, and group $g$ contains $K_g$ users. Assume that the BS and users have knowledge of the channel. We consider multi-carrier OFDM transmission with a narrow-band block-fading channel. The BS antennas are arranged in a uniform linear array (ULA) configuration. The information signal block at the input of the BS transceiver is $S \in \mathbb{C}^{K \times N_f}$, with one row per user $k$ and one column per subchannel $n$, where $N_f$ and $N_s$ are the number of subchannels and the number of symbols per subchannel, respectively. In subchannel $n$, the information symbol vector is $s_n \in \mathbb{C}^{N_s \times 1}$. We assume $N_s = K$, and the transmit signal in subchannel $n$ satisfies $E\{s_n s_n^H\} = \frac{P_n}{K} I_K$, where $P_n = P_T/N_f$ is the transmit power per subchannel and $P_T$ is the total transmit power of the BS.

The transmit signal $X$ is obtained from $F^{B} S$, where $F^{B} \in \mathbb{C}^{N_t \times N_s}$ is the precoding matrix. Hybrid beamforming divides the precoding matrix into a baseband digital precoding matrix $F^{DB} \in \mathbb{C}^{N_{RF} \times N_s}$ and an RF analog precoding matrix $F^{AB} \in \mathbb{C}^{N_t \times N_{RF}}$, where $N_{RF}$ is the number of RF chains, as shown in Figure 3. The transmit signal $X \in \mathbb{C}^{N_t \times N_f}$ is therefore given by

$$X = F^{AB} F^{DB} S.$$

The precoding matrix must also satisfy the transmit power constraint. The transmit signal in subchannel $n$ is $x_n \in \mathbb{C}^{N_t \times 1}$. Thus, the received signal vector $y_n \in \mathbb{C}^{K \times 1}$ at the $K$ users in subchannel $n$ is given by

$$y_n = H_n^H x_n + w_n,$$

where $H_n = [h_{1,n}, ..., h_{K,n}]$, with $h_{k,n}$ being the channel vector from the BS to user $k$ in subchannel $n$, $x_n = F^{AB} F^{DB}_n s_n$, and $w_n \sim \mathcal{CN}(0, \sigma^2 I_K)$ is the additive white Gaussian noise (AWGN) in subchannel $n$ at the users. The RF beamforming $F^{AB}$ is performed in the time domain and the same beamforming is applied to all subchannels, whereas the digital beamforming $F^{DB}_n$ is performed in the frequency domain on a per-subchannel basis [11].

In the $n$th subchannel, the $j$th UE receives the sum of all signals transmitted for the $K$ UEs over its channel, where $h_{j,n}$ is the $N_t \times 1$ channel vector. We denote the rank of the channel matrix $H_{j,n}$ by $r_{j,n}$, where $0 \le r_{j,n} \le \min(K, N_t)$, $\forall n$. Each UE $k$ collects a $1 \times N_f$ received signal across the subchannels. Combining the signals for all UEs in a $K$-dimensional received signal vector $y = [y_1, ..., y_K]^H$, we obtain the system equation in matrix form, with $Y, W \in \mathbb{C}^{K \times N_f}$.
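The dimensions in this signal model are easy to mix up, so the following minimal NumPy sketch traces one subchannel from symbols to received vector. It is an illustration only: the array sizes, the random phase-shifter analog precoder, the placeholder digital precoder, and the normalization $\|F^{AB}F^{DB}\|_F^2 = N_s$ are all assumptions made here, not values or choices taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N_t, N_RF, K = 64, 8, 4      # antennas, RF chains, users (illustrative sizes)
N_s = K                      # one stream per user, as assumed in the text
P_n, sigma2 = 1.0, 0.1       # per-subchannel power and noise variance (assumed)

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Analog precoder: constant-modulus entries, as a phase-shifter network would give.
F_AB = np.exp(1j * rng.uniform(0, 2 * np.pi, (N_t, N_RF))) / np.sqrt(N_t)

# Digital precoder: random placeholder, scaled so that ||F_AB F_DB||_F^2 = N_s
# (an assumed normalization for this sketch).
F_DB = crandn(N_RF, N_s)
F_DB *= np.sqrt(N_s) / np.linalg.norm(F_AB @ F_DB, "fro")

H = crandn(N_t, K)                      # column k is the channel vector h_k
s = np.sqrt(P_n / K) * crandn(N_s, 1)   # E{s s^H} = (P_n / K) I
w = np.sqrt(sigma2) * crandn(K, 1)      # AWGN at the K users

x = F_AB @ F_DB @ s                     # transmit vector, N_t x 1
y = H.conj().T @ x + w                  # received vector, K x 1
```

Note the order of the product: the digital precoder acts first on the $N_s$ streams and the analog precoder then maps the $N_{RF}$ RF-chain outputs to the $N_t$ antennas.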

Channel Model
Generally, massive MIMO channel models are categorized into two types: (i) analytical models and (ii) physical models [4]. Analytical models are commonly used for the theoretical analysis of wireless communication systems. The most commonly used analytical model is the Kronecker channel model. It is a correlation-based model that characterizes the MIMO channel matrix in terms of separate transmit-side and receive-side spatial correlation matrices [24]. Under this separability assumption, the channel matrix $H$ simplifies to the Kronecker model,

$$H = R_{rx}^{1/2}\, K\, R_{tx}^{1/2}, \tag{12}$$

where $K$ is an i.i.d. unit-variance MIMO channel matrix with $\mathcal{CN}(0, 1)$ entries, and $R_{tx}$ and $R_{rx}$ are the transmit and receive correlation matrices, respectively, defined as in [24].

Physical models explicitly model wave propagation parameters such as the complex amplitude, direction of departure (DoD), direction of arrival (DoA), and delay of an MPC [24,25]. MmWave propagation leads to limited spatial scattering due to the high free-space path loss. In addition, the large, tightly packed antenna arrays lead to high levels of antenna correlation. The sparse scattering and the spatial correlation between antennas make many of the commonly used statistical fading distributions inaccurate for mmWave channel modeling. Therefore, we use the extended Saleh-Valenzuela model, which accurately describes the mathematical structure present in mmWave channels [26,27]. For simplicity, we assume that each scattering cluster around the transmitter and receiver contributes a single propagation path [28].
In general, the mmWave MIMO channel matrix between the BS with $N_t$ transmit antennas and a user $k$ with $n_r$ receive antennas in subchannel $n$ can be modeled as a double directional channel,

$$H_{k,n} = \sqrt{\frac{N_t n_r}{\rho_{k,n} L}} \sum_{l} \alpha_{k,n,l}\, a(\phi_{k,n,l})\, b^H(\theta_{k,n,l}), \tag{14}$$

where $L$ is the total number of multipaths, $\alpha_{k,n,l}$ is the complex gain of the $l$th path, i.i.d. $\mathcal{CN}(0, 1)$, and $\rho_{k,n}$ is the distance-dependent path loss between the BS and user $k$ [29]. The LOS path is included with $l = 0$. Moreover, $a$ and $b$ are the receive and transmit steering vectors, respectively. The variables $\phi_{k,n,l} \in [0, 2\pi)$ and $\theta_{k,n,l} \in [0, 2\pi)$ are the $l$th path's azimuth angles of arrival and departure (boresight angles in the receive array and transmit array), respectively. The steering vectors and their elements are defined in terms of the following quantities: $\lambda$ is the wavelength, $\omega = 2\pi/\lambda$, $\tau_i$ is the beamforming delay, and $d_t$ and $d_r$ are the antenna spacings at the transmitter and receiver, respectively.
The channel matrix in (14) can also be written in a more compact form as

$$H_{k,n} = \nu\, A_{k,n}\, \Sigma_{k,n}\, B_{k,n}^H,$$

where $\nu = \sqrt{N_t n_r / (\rho L)}$, the matrices $A_{k,n}$ and $B_{k,n}$ consist of the stacked steering vectors of the AoAs and AoDs, respectively, and $\Sigma_{k,n}$ is the diagonal matrix of normalized path gains $\bar{\alpha}_{k,n,l}$ such that $E\{\bar{\alpha}_{k,n,l}\} = 0$ and $E\{\bar{\alpha}^2_{k,n,l}\} = 1$. Substituting (20) in (13) and averaging over the small-scale fading, we get the transmit and receive correlation matrices for user $k$ in subchannel $n$. For mmWave massive MIMO systems with a large number of antennas, the steering vectors are asymptotically orthogonal to each other [6]. Moreover, in mmWave massive MIMO, acquisition of the instantaneous full CSI is not practical. Instead, an average CSI in terms of $A_{k,n}$, $B_{k,n}$, and $\Sigma_{k,n}$ is a practical basis for the beamforming design, because the coherence time of the channel statistics is of the order of a few seconds or more, compared with the order of milliseconds for the small-scale fading [6].
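To make the geometric model concrete, here is a small sketch that builds a channel in the compact form $\nu A \Sigma B^H$ and checks numerically that steering vectors for two distinct angles decorrelate as the array grows. The steering-vector expression is an assumption (a textbook half-wavelength ULA without the beamforming delay term), and the function names `ula_steering`, `geometric_channel`, and `cross_corr` are hypothetical helpers introduced for this illustration.

```python
import numpy as np

def ula_steering(n_ant, angle, d_over_lambda=0.5):
    """Unit-norm ULA steering vector (assumed textbook form, no delay term)."""
    idx = np.arange(n_ant)
    phase = 2 * np.pi * d_over_lambda * idx * np.sin(angle)
    return np.exp(1j * phase) / np.sqrt(n_ant)

def geometric_channel(n_r, n_t, aoa, aod, gains, rho=1.0):
    """n_r x n_t channel nu * A * diag(gains) * B^H for L single-path clusters."""
    L = len(gains)
    A = np.stack([ula_steering(n_r, p) for p in aoa], axis=1)   # n_r x L
    B = np.stack([ula_steering(n_t, t) for t in aod], axis=1)   # n_t x L
    nu = np.sqrt(n_t * n_r / (rho * L))
    return nu * A @ np.diag(gains) @ B.conj().T

def cross_corr(n_t, theta_1=0.3, theta_2=0.9):
    """|b(theta_1)^H b(theta_2)| for two fixed, distinct departure angles."""
    return abs(np.vdot(ula_steering(n_t, theta_1), ula_steering(n_t, theta_2)))

H = geometric_channel(2, 16, aoa=[0.1, 0.2], aod=[0.3, 0.4],
                      gains=np.array([1.0, 1.0j]))
```

With only $L$ paths the channel has rank at most $L$, which is the sparsity that the statistical (average) CSI description exploits.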

Problem Formulation
Hybrid beamforming divides the beamforming matrix into two parts: a covariance-based pre-beamforming matrix $F^{AB}$ realized by analog beamformers, and a reduced-dimension MU-MIMO digital precoder based on the effective channel $H^H F^{AB}$ (omitting the subchannel subscript for simplicity). We assume that the $K$ users are divided into $G$ groups such that group $g$ contains $K_g$ users. Since the users are near ground level and surrounded by scatterers, whereas the elevated base station is free of scatterers, we assume the one-ring model [1], and all users in group $g$ experience the same azimuth center angle ($\theta_g$) and angular spread ($\Delta_g$). In this case, $R_{rx} = I$ in (12); therefore, the channel covariance matrix $R_g$ of each user in group $g$ follows [30], and its eigenvalue decomposition gives

$$R_g = U_g \Lambda_g U_g^H,$$

where $U_g \in \mathbb{C}^{N_t \times r_g}$ is a tall matrix with orthonormal columns ($U_g^H U_g = I$) comprising the eigenvectors of $R_g$, and $\Lambda_g \in \mathbb{R}^{r_g \times r_g}$ is a diagonal matrix with the $r_g$ nonzero positive eigenvalues along the diagonal. The $(i, j)$th element of the covariance matrix $R_g$ represents the correlation between the channel coefficients of antenna elements $i$ and $j$; it depends on $d$, the distance between the antenna elements of the ULA, and $\lambda$, the wavelength of the carrier frequency.
Using the Karhunen-Loeve representation, the channel vector of user $k$ in group $g$ is given as

$$h_{g_k} = U_g \Lambda_g^{1/2} z_{g_k} = U_g \bar{h}_{g_k},$$

where $z_{g_k} \in \mathbb{C}^{r_g \times 1} \sim \mathcal{CN}(0, I_{r_g})$ and $\bar{h}_{g_k} = \Lambda_g^{1/2} z_{g_k}$ is the beam-domain channel. For large $N_t$, $U_g$ tends to a discrete Fourier transform (DFT) matrix. Each column of $U_g$ represents one angle-of-departure (AoD) direction, i.e., a beam.
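The eigen-structure described above can be explored numerically. The sketch below is an illustration under stated assumptions: the covariance entries use the standard one-ring ULA expression from the JSDM literature, $[R_g]_{i,j} = \frac{1}{2\Delta_g}\int_{-\Delta_g}^{\Delta_g} e^{-j 2\pi \frac{d}{\lambda}(i-j)\sin(\alpha+\theta_g)}\,d\alpha$, approximated by averaging over a uniform angle grid; this expression, the 99% energy threshold for the dominant eigenvalues, and the helper names `one_ring_cov` and `kl_channel` are choices made here rather than details confirmed by the text.

```python
import numpy as np

def one_ring_cov(n_t, theta, delta, d_over_lambda=0.5, n_grid=2001):
    """One-ring ULA covariance, integral approximated by a uniform angle grid."""
    alpha = np.linspace(-delta, delta, n_grid)
    idx = np.arange(n_t)
    # Column m holds the array response for angle theta + alpha[m].
    V = np.exp(-2j * np.pi * d_over_lambda * np.outer(idx, np.sin(alpha + theta)))
    return (V @ V.conj().T) / n_grid

def kl_channel(R, rng, energy=0.99):
    """Draw h = U Lambda^{1/2} z using only the dominant eigenmodes of R."""
    lam, U = np.linalg.eigh(R)              # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]          # reorder to descending
    lam = np.clip(lam, 0.0, None)           # remove tiny negative round-off
    # Smallest number of eigenvalues holding the requested energy fraction.
    r = int(np.searchsorted(np.cumsum(lam) / lam.sum(), energy) + 1)
    z = (rng.standard_normal(r) + 1j * rng.standard_normal(r)) / np.sqrt(2)
    return U[:, :r] @ (np.sqrt(lam[:r]) * z), r

R = one_ring_cov(64, theta=np.pi / 6, delta=np.deg2rad(10))
h, r = kl_channel(R, np.random.default_rng(1))
```

For a narrow angular spread the effective rank `r` is much smaller than $N_t$, which is why a handful of beams (columns of $U_g$, or DFT columns for large arrays) can carry almost all of a group's channel energy.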
Alternatively, when only $\bar{r}_g \le r_g$ eigenvalues are dominant, the channel matrix can be written in the corresponding reduced-rank form ([13], Equation (5)).

The limited feedback-based hybrid beamforming consists of an analog pre-beamforming matrix $F^{AB}_g \in \mathbb{C}^{N_t \times N_{RF,g}}$, responsible for spatial group formation and inter-group interference mitigation, and a digital multiuser precoding matrix $F^{DB}_g \in \mathbb{C}^{N_{RF,g} \times S_g}$ for spatial multiplexing inside the group and inter-user interference mitigation. Here, $N_{RF,g}$ is the number of RF chains for group $g$ such that $S_g < N_{RF,g} < \bar{r}_g$, and $S_g$ is the number of multi-carrier information symbol vectors for group $g$, with $N_{RF} = \sum_{g=1}^{G} N_{RF,g}$ and $S = \sum_{g=1}^{G} S_g$. The overall analog pre-beamforming matrix $F^{AB} \in \mathbb{C}^{N_t \times N_{RF}}$ is given by $F^{AB} = [F^{AB}_1, ..., F^{AB}_G]$, the overall digital beamforming matrix $F^{DB} \in \mathbb{C}^{N_{RF} \times N_s}$ is the block diagonal matrix $F^{DB} = \mathrm{diag}\{F^{DB}_1, ..., F^{DB}_G\}$, and the overall channel matrix is $H = [H_1, ..., H_G]$, where the channel matrix of group $g$ is defined as $H_g = [h_{g_1}, ..., h_{g_{K_g}}]$. The analog pre-beamforming $F^{AB}_g$ is based on the slowly varying channel covariance matrix $R_g$ and can be implemented by the DFT matrix (when $N_t$ is large), whereas the digital beamformer $F^{DB}_g$ is based on the instantaneous channel information of the reduced-dimension effective channel $H_g^H F^{AB}_g$. The overall effective channel is $H^H F^{AB}$, whose $(g, g')$ block is $H_g^H F^{AB}_{g'}$.

The excessive pilot transmission in the downlink and feedback in the uplink of an FDD system can be reduced by sending only the group-wise channel estimates based on average CSI in the uplink. This is accomplished by using the diagonal blocks $H_g^H F^{AB}_g$, of size $K_g \times N_{RF,g}$ for $g = 1, ..., G$, as the feedback information. The analog pre-beamforming is designed in such a way that the other blocks of the matrix in (32) satisfy $H_g^H F^{AB}_{g'} \approx 0$ for all $g' \neq g$. This group-wise division creates virtual sectors, with each group corresponding to one virtual sector [30].
The RF beamformer $F^{AB}$, which is based on second-order channel statistics, remains the same across multiple coherence blocks. This gives the effective instantaneous channel between the BS and user $k$ in group $g$ as $h_{n,g_k,\mathrm{eff}} = (F^{AB}_g)^H h_{n,g_k}$, with $h_{n,g_k,\mathrm{eff}} \in \mathbb{C}^{N_{RF,g} \times 1}$. Channel statistics-based CSI therefore significantly reduces the feedback overhead for each user; with instantaneous CSI, each user would otherwise have to send an $N_t \times 1$ channel estimate on the uplink channel. Using (13), the covariance of the effective channel $h_{n,g_k,\mathrm{eff}}$ is $(F^{AB}_g)^H R_g F^{AB}_g$. The analog beamformer consists of columns of the DFT matrix, which can be easily implemented by a phase shifter network. Therefore, $F^{AB}_{n,g}$ can be obtained by the eigenvalue decomposition of the channel covariance matrix.

With group-wise hybrid beamforming, the received signal of user $k$ in group $g$ in subchannel $n$ comprises the desired signal, the inter-user interference from the other users in group $g$, the inter-group interference from the other groups, and noise. The received signal to interference and noise ratio (SINR) at user $k$ in group $g$ and subchannel $n$ follows from these terms. The spectral efficiency of user $k$ in group $g$ and subchannel $n$ is expressed in terms of this SINR and the binary variable $\Psi_{g_k,n}$, which is equal to 1 if user $k$ is selected in group $g$ in subchannel $n$. In order to achieve a balanced tradeoff between throughput and fairness [32], we use proportional fairness (PF) based throughput maximization. We define a per-user proportional fairness metric in terms of $\bar{R}_{g_k,n}(t)$, the average throughput (moving average) over a past window of length $T_w = 1/\alpha$ [33].

The large number of antennas in massive MIMO systems enables the use of the eigenmodes of the channel covariance matrix, i.e., $B_{k,n}$ comprises columns of the DFT matrix [6]. DFT-based beams with $N_t = 16$ and $N_t = 64$ are shown in Figure 4a,b, respectively. The beam steering matrix $B_{k,n}$ consists of selected columns of the $N_t \times N_t$ DFT matrix $\Delta_{N_t}$ such that $B_{k,n} = \Delta_{N_t} \Upsilon_n$, where $\Delta_{N_t}$ consists of all eigenmodes and $\Upsilon_n$ is an $N_t \times r_R$ binary beam selection matrix, with $r_R$ the rank of the channel covariance matrix. The selection matrix $\Upsilon_n \in \{0, 1\}^{N_t \times r_R}$ has exactly one 1 in each column and at most one 1 in each row, such that $\sum_i [\Upsilon_n]_{i,j} = 1$, $\forall j$.

Now we formulate our optimization problem for joint spatio-radio resource allocation and precoder design, (42), with the objective of maximizing the PF sum-rate utility function over the analog and digital precoders and the binary user and beam selection variables, subject to the total power constraint and the binary selection constraints. This optimization problem is a mixed integer programming (MIP) problem with coupling between the digital and RF precoders in the power constraint. This MIP problem is NP-hard [14].
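The beam selection structure $B = \Delta_{N_t}\Upsilon$ can be sketched in a few lines. The ranking rule used here, keeping the DFT columns that capture the most covariance energy $[\Delta^H R\, \Delta]_{ii}$, is an assumed heuristic for illustration and is not claimed to be the selection rule of the optimization problem; `dft_matrix` and `select_beams` are hypothetical helper names.

```python
import numpy as np

def dft_matrix(n):
    """Unitary n x n DFT matrix; each column is one fixed beam."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

def select_beams(R, n_beams):
    """Return (B, Upsilon) with B = Delta @ Upsilon for the strongest DFT beams."""
    n_t = R.shape[0]
    Delta = dft_matrix(n_t)
    # Energy of the covariance along each DFT beam: diag(Delta^H R Delta).
    power = np.real(np.einsum("ij,jk,ki->i", Delta.conj().T, R, Delta))
    chosen = np.sort(np.argsort(power)[::-1][:n_beams])
    # Binary selection matrix: one 1 per column, at most one 1 per row.
    Upsilon = np.zeros((n_t, n_beams), dtype=int)
    Upsilon[chosen, np.arange(n_beams)] = 1
    return Delta @ Upsilon, Upsilon

# Toy covariance whose energy lies entirely on DFT beam 5.
a = dft_matrix(16)[:, 5]
B, Upsilon = select_beams(np.outer(a, a.conj()), n_beams=3)
```

Because every entry of `B` has the same modulus, the selected beams can be realized with phase shifters and RF switches, which is the practical appeal of the DFT codebook.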

Relaxed-Convex Transformation
Though the above MIP optimization problem is NP-hard, it can be transformed into a relaxed convex optimization problem by (i) relaxing the binary integer constraints to real numbers between 0 and 1 [14], and (ii) decoupling the digital and analog precoders. For the decoupling, we use the change of variables $F^{DB}_n = \big((F^{AB})^H F^{AB}\big)^{-1/2} \tilde{F}^{DB}_n$, where $\tilde{F}^{DB}_n$ is the equivalent digital precoder [34]. The problem in (42) can thus be rewritten in terms of $\tilde{F}^{DB}_n$, with a power constraint that no longer couples the two precoders. For a given RF precoder $F^{AB}$ and perfect CSI at the base station, the digital precoder can be obtained by conventional MU-MIMO techniques, e.g., zero-forcing and block diagonalization [15].
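The reason this change of variables decouples the precoders can be checked with a short derivation. It is a sketch under two assumptions: $F^{AB}$ has full column rank, so that the inverse square root exists, and the coupled power term has the form $\|F^{AB} F^{DB}_n\|_F^2$.

```latex
\begin{aligned}
\big\|F^{AB} F^{DB}_n\big\|_F^2
  &= \operatorname{tr}\!\Big( (F^{DB}_n)^H (F^{AB})^H F^{AB} F^{DB}_n \Big) \\
  &= \operatorname{tr}\!\Big( (\tilde{F}^{DB}_n)^H
       \big((F^{AB})^H F^{AB}\big)^{-1/2} (F^{AB})^H F^{AB}
       \big((F^{AB})^H F^{AB}\big)^{-1/2} \tilde{F}^{DB}_n \Big) \\
  &= \operatorname{tr}\!\Big( (\tilde{F}^{DB}_n)^H \tilde{F}^{DB}_n \Big)
   = \big\|\tilde{F}^{DB}_n\big\|_F^2 .
\end{aligned}
```

The second line uses the fact that the inverse square root of a Hermitian positive definite matrix is itself Hermitian. The transmit power then depends only on the equivalent digital precoder, whatever analog precoder is chosen.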
For the digital precoder, we adopt the ZF precoder so that there is no multiuser interference among the users in each group. The beamforming vector of user $k$ is chosen to be orthogonal to the effective channel vectors of all the other users in the group. Zero-forcing is a suboptimal but low-complexity approach within the class of linear precoders. The ZF precoder is asymptotically optimal among all downlink beamforming techniques in the high-SNR region. It guarantees high spectral efficiency for large-scale antenna systems with low-complexity linear processing [35]. For $N_t \gg N_r$, it has been shown that zero-forcing beamforming can achieve up to 98% of the non-linear dirty paper coding (DPC) capacity [36].

In order to make this paper self-contained, we describe block diagonalization briefly. Since the digital precoder is used to mitigate the multiuser interference within a group and all groups are independent, we omit the subscript $g$. First, we consider downlink transmission over one subchannel $n$ in the general case of a BS with $N_t$ antennas and $K_n$ users with $n_k$ antennas each, such that $\sum_{k=1}^{K} n_k = N_r$. The downlink channel on subchannel $n$ is expressed as an $N_r \times N_t$ matrix obtained by stacking the channel matrices of all users. For user $k$, we define the $(N_r - n_k) \times N_t$ channel matrix $H_{k,n,\mathrm{eff}}$ by stacking the channel matrices of all users except user $k$. Let the rank of $H_{k,n,\mathrm{eff}}$ be denoted by $r_{k,n}$; then the null space of $H_{k,n,\mathrm{eff}}$ has dimension $N_t - r_{k,n} \ge n_k$. Performing the SVD of this matrix in subchannel $n$ leads to

$$H_{k,n,\mathrm{eff}} = U_{k,n} \Sigma_{k,n} V_{k,n}^H = U_{k,n} \Sigma_{k,n} \big[V^{(1)}_{k,n} \;\; V^{(0)}_{k,n}\big]^H, \tag{46}$$

where $U_{k,n}$ and $V_{k,n}$ are unitary matrices. The columns of $U_{k,n}$ are the left singular vectors of $H_{k,n,\mathrm{eff}}$, the columns of $V_{k,n}$ are the right singular vectors of $H_{k,n,\mathrm{eff}}$, and $\Sigma_{k,n}$ is a diagonal matrix whose diagonal entries are the singular values of $H_{k,n,\mathrm{eff}}$. In the last equality of (46), $V^{(1)}_{k,n}$ holds the first $r_{k,n}$ right singular vectors of $H_{k,n,\mathrm{eff}}$, and $V^{(0)}_{k,n}$ contains the $N_t - r_{k,n}$ right singular vectors that lie in the null space of $H_{k,n,\mathrm{eff}}$. The columns of $V^{(0)}_{k,n}$ are best suited for the beamforming matrix $F^{DB}_{k,n}$ of user $k$, because they produce zero interference at the other UEs.

Usually $V^{(0)}_{k,n}$ contains more than $n_k$ columns; therefore, we use linear combinations of the columns of $V^{(0)}_{k,n}$ to form at most $n_k$ columns. These combinations are obtained from a second SVD, (47), of user $k$'s own channel projected onto that null space, $H_{k,n} V^{(0)}_{k,n}$, keeping the right singular vectors associated with its nonzero singular values. The transmit beamforming matrix that maximizes the throughput of user $k$ without any inter-user interference is the product of $V^{(0)}_{k,n}$ and these singular vectors, and it has orthonormal columns. The transmit digital beamforming matrix for subchannel $n$ collects the per-user beamformers for $1 \le k \le K_n$, scaled by $P_n$, a block diagonal matrix whose elements scale the power allocated to each interference-free virtual subchannel for all UEs. The receive combining matrix for this user is $U_{k,n}$ [37].
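The two SVD steps of block diagonalization can be sketched as follows. This is a minimal illustration without power loading: the function name `bd_nullspace_precoders`, the rank tolerance, and the toy dimensions are assumptions made for this example.

```python
import numpy as np

def bd_nullspace_precoders(H_list, tol=1e-10):
    """Per-user block diagonalization precoders.

    H_list[k] is the n_k x N_t downlink channel of user k. The returned F_k
    has orthonormal columns and satisfies H_j @ F_k = 0 for every j != k.
    """
    precoders = []
    for k, H_k in enumerate(H_list):
        # Stack the channels of all users except k.
        H_others = np.vstack([H for j, H in enumerate(H_list) if j != k])
        # First SVD: trailing right singular vectors span the null space.
        _, s, Vh = np.linalg.svd(H_others)
        rank = int(np.sum(s > tol * s.max()))
        V0 = Vh[rank:].conj().T                      # N_t x (N_t - rank)
        # Second SVD on user k's own channel seen through the null space.
        _, s2, Vh2 = np.linalg.svd(H_k @ V0)
        r_k = int(np.sum(s2 > tol * s2.max()))
        precoders.append(V0 @ Vh2[:r_k].conj().T)    # N_t x r_k
    return precoders

rng = np.random.default_rng(0)
H_list = [rng.standard_normal((2, 8)) + 1j * rng.standard_normal((2, 8))
          for _ in range(3)]
F = bd_nullspace_precoders(H_list)
```

Since each `F[k]` lives in the null space of the other users' stacked channel, the overall effective channel becomes block diagonal and each user sees only its own streams.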
In the case of single-antenna users, complete diagonalization is achieved entirely at the BS by channel inversion, i.e., $F^{DB}_n = \beta_n (H^H_{n,\mathrm{eff}})^{\dagger}$, where $(H^H_{n,\mathrm{eff}})^{\dagger}$ is the pseudo-inverse of $H^H_{n,\mathrm{eff}}$ [38] and $\beta_n$ is a normalization factor chosen to satisfy the power constraint. Using the definition of the pseudo-inverse, the precoder takes the form of a (regularized) channel inversion, where the regularization parameter is 0 for ZF precoding and, for regularized ZF, is set from $N_s$, $N_{RF}$, and $\eta = P_{T,n}/\sigma^2$. Lastly, introducing the group subscript again, the SINR of user $g_k$ and the PF sum rate are computed with this precoder.
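A compact sketch of the channel-inversion precoder for single-antenna users is given below. It assumes that the power constraint is applied to the digital precoder alone, $\|F\|_F^2 = P$ (as it would be for the equivalent precoder after the change of variables), and it exposes the regularization as a free parameter `eps` rather than the paper's specific value; `zf_precoder` is a hypothetical helper name.

```python
import numpy as np

def zf_precoder(H_eff, P, eps=0.0):
    """(Regularized) zero-forcing precoder for single-antenna users.

    H_eff : N_RF x K matrix whose columns are the users' effective channels,
            so that the users observe H_eff^H @ F.
    eps   : regularization parameter; 0 gives plain ZF (pseudo-inverse).
    Returns the scaled precoder F (with ||F||_F^2 = P) and the factor beta.
    """
    K = H_eff.shape[1]
    gram = H_eff.conj().T @ H_eff + eps * np.eye(K)
    F_unscaled = H_eff @ np.linalg.inv(gram)     # equals pinv(H_eff^H) if eps=0
    beta = np.sqrt(P / np.linalg.norm(F_unscaled, "fro") ** 2)
    return beta * F_unscaled, beta

rng = np.random.default_rng(0)
H_eff = (rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))) / np.sqrt(2)
F, beta = zf_precoder(H_eff, P=2.0)
effective = H_eff.conj().T @ F                   # beta * I for plain ZF
```

With `eps = 0` the effective channel is a scaled identity, so every user receives its own stream with gain `beta` and no inter-user interference; a positive `eps` trades a small amount of residual interference for a larger `beta` at low SNR.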

Suboptimal Solution
Joint optimization of the analog and digital beamformers is challenging because the two are designed from different channel information. Hybrid beamforming methods consider decoupled designs of analog and digital beamforming to reduce the complexity of joint optimization, but the main challenge remains the use of different channel information. To approximate the optimal solution to this mixed integer programming problem, we summarize our proposed algorithm below. The analog precoder is formed by selecting $K_g$ columns of the DFT matrix of eigenvectors of the channel covariance matrix $R_g$ of user group $g$ in (41) so as to minimize the inter-group interference $I_g$. To solve the MIP problem, we divide the solution into two parts.
In the first part, we obtain the analog precoder from the selected columns of the DFT matrix that maximize the PF sum rate. The inherent benefit of the DFT matrix is its constant modulus, which enables the use of analog phase shifters and RF switches to realize the analog beamforming. In the second part, for the given analog precoder, intra-group user scheduling is performed and a ZF digital precoder is designed to maximize the sum-rate utility function. Decoupling the design of the analog and digital precoders makes the solution suboptimal but tractable [34]. The joint hybrid beamforming and user scheduling Algorithm 1 takes $K_g$, $N_f$, $N_{RF}$, $N_t$, and $K$ as inputs. It generates the analog beamforming matrix $F^{AB}$, the digital beamforming matrix $F^{DB}$, and the set of users in each group. The first part of the algorithm (line 9 to line 19) forms the DFT/eigenmodes-based analog beamforming using limited statistical CSI feedback from the users. Beam and user pairing within each group takes place in this part of the algorithm. The while loop at line 12 executes until all the binary combinations in $N_t \times N_t$ are exhausted, under the condition that each column contains exactly one binary 1 and the total number of 1s is equal to the number of streams (or the number of RF chains). The second part (starting from line 39) assigns the radio resources to users to maximize the utility function.
Algorithm 1. Joint Resource Allocation and Hybrid Beamforming Design Algorithm.
    while k ≤ K_g do
23:     Compute
        k++

Algorithm 1 is illustrated in the flowchart in Figure 5.
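The proportional fairness idea used in the scheduling part can be illustrated in isolation. The sketch below is not Algorithm 1; it assumes the common PF rule in which users are ranked by the ratio of instantaneous rate to average throughput, and the average is updated by an exponential moving average with window $T_w = 1/\alpha$. The helper names `pf_select` and `pf_update` are hypothetical.

```python
import numpy as np

def pf_select(inst_rate, avg_rate, n_select):
    """Indices of the n_select users with the largest PF metric R / R_bar."""
    metric = inst_rate / np.maximum(avg_rate, 1e-12)   # guard against R_bar = 0
    return np.argsort(metric)[::-1][:n_select]

def pf_update(avg_rate, inst_rate, selected, alpha):
    """Moving-average throughput update; unscheduled users are served rate 0."""
    served = np.zeros_like(avg_rate)
    served[selected] = inst_rate[selected]
    return (1 - alpha) * avg_rate + alpha * served

inst = np.array([4.0, 2.0, 1.0])    # achievable rates in the current slot
avg = np.array([8.0, 1.0, 1.0])     # past average throughputs
chosen = pf_select(inst, avg, n_select=1)
new_avg = pf_update(avg, inst, chosen, alpha=0.5)
```

In this toy case the user with the highest instantaneous rate is not scheduled, because its past average is already large; this is the throughput-fairness tradeoff that the PF utility encodes.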

Machine Learning: K-Means Based Optimal Users Grouping for Analog Beamforming
In this section, we use a machine learning technique to group the users. The DFT-based fixed switched beams are then used to realize the analog beamforming matrix. The joint users scheduling and hybrid beamforming architecture with ML-based users grouping is shown in Figure 6.

Machine learning algorithms can broadly be divided into two main categories, namely supervised learning and unsupervised learning algorithms. The former class of algorithms learns by training on labeled input examples, called the training dataset, $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), (x^{(3)}, y^{(3)}), \ldots, (x^{(m)}, y^{(m)})\}$, where the $i$th example $(x^{(i)}, y^{(i)})$ consists of the $i$th instance of the feature vector $x^{(i)}$ and the corresponding label $y^{(i)}$. Given a labeled training dataset, these algorithms try to find the decision boundary that separates the positive and negative labeled examples by fitting a hypothesis to the input dataset.

Unsupervised machine learning algorithms, on the other hand, are given an unlabeled input dataset. These algorithms are used for extracting information or features from the dataset. These features might be related, but not confined, to the underlying structures or patterns in the input data, relationships among data items, grouping/clustering of data items, etc. Discovered features are meant to provide a deeper insight into the input dataset that can subsequently be exploited for achieving specific goals.

Clustering algorithms form an important part of unsupervised learning, where the input examples are grouped into two or more separate clusters based on some features. The K-Means (KM) algorithm is probably the most popular clustering algorithm. It is an iterative algorithm that starts with a set of initial centroids given to it as input. During each iteration, it performs the following two steps.

1. Assign Cluster: For every user, the algorithm computes the distance between the user and every centroid. The user is then associated to the cluster with the closest centroid. During this step, a user might change its association from one cluster to another one.

2. Recompute centroids: Once all users have been associated to their respective clusters, the new position of the centroid for every cluster is then calculated.
Figure 7a depicts how the cluster centroids keep moving across iterations until the system stabilizes for an example network consisting of thirty users being grouped in five clusters.The system becomes stable in only five iterations and the final cluster layout is shown in Figure 7b.
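The two steps above can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the function names are hypothetical, users are represented as coordinate tuples, distance is Euclidean, and an empty cluster simply keeps its previous centroid.

```python
import math

def assign_clusters(users, centroids):
    """Step 1: index of the closest centroid for every user."""
    return [min(range(len(centroids)), key=lambda c: math.dist(u, centroids[c]))
            for u in users]

def recompute_centroids(users, assignment, centroids):
    """Step 2: move each centroid to the mean location of its members."""
    updated = []
    for c, old in enumerate(centroids):
        members = [u for u, a in zip(users, assignment) if a == c]
        if members:
            updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
        else:
            updated.append(old)  # assumption: an empty cluster keeps its centroid
    return updated

def k_means(users, centroids, max_iter=100, tol=1e-9):
    """Alternate the two steps until the centroids stop moving."""
    assignment = []
    for _ in range(max_iter):
        assignment = assign_clusters(users, centroids)
        updated = recompute_centroids(users, assignment, centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids, assignment

# Four users forming two obvious groups, with two initial centroids.
users = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final_centroids, final_assignment = k_means(users, [(0.0, 0.0), (10.0, 10.0)])
```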
Let us define the following notations to be used later in this section.
$K$ = Total number of clusters being formed.
$x^{(i)}$ = Location coordinates of user $u^{(i)}$. In our case, $x^{(i)} \in \mathbb{R}^2$.
$c^{(i)}$ = Cluster to which the user $u^{(i)}$ is currently associated.
$\mu_{c^{(i)}}$ = Centroid of the cluster to which the user $u^{(i)}$ is currently associated.
The cost function $J$ is defined in terms of the squared distance between each user location $x^{(i)}$ and the centroid $\mu_{c^{(i)}}$ of its cluster, and the optimization objective is to minimize $J$ over the cluster assignments $c^{(1)}, \ldots, c^{(m)}$ and the centroid positions.
In this section, we use the KM algorithm for optimal clustering of the $m$ users competing for resources in a particular cell. The clustering is performed based on their geographic locations; thus our input dataset $\{u^{(1)}, u^{(2)}, u^{(3)}, \ldots, u^{(m)}\}$ has $m$ vectors $u^{(i)}$, $1 \le i \le m$, each consisting of the location coordinates of the $i$th user. For the sake of simplicity, we assume these users are deployed in a two-dimensional area, i.e., a plane, so each $u^{(i)}$ is an ordered pair of location coordinates. Our clustering algorithm is summarized in Algorithm 2.
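For reference, the standard K-means distortion can be written in the notation defined above as follows. This is a sketch under assumptions: the $1/m$ normalization and the symbols $\mu_1, \ldots, \mu_K$ for the individual cluster centroids are conventions chosen here and may differ from those of the original formulation. A constant factor does not change the minimizer.

```latex
J\bigl(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K\bigr)
  = \frac{1}{m}\sum_{i=1}^{m}\bigl\lVert x^{(i)}-\mu_{c^{(i)}}\bigr\rVert^{2},
\qquad
\min_{\substack{c^{(1)},\dots,c^{(m)}\\ \mu_1,\dots,\mu_K}}
  J\bigl(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K\bigr).
```

The assignment step minimizes $J$ over the $c^{(i)}$ with the centroids held fixed, and the centroid update minimizes $J$ over the $\mu_k$ with the assignments held fixed, so $J$ never increases from one iteration to the next.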
The proposed algorithm takes the location coordinates of $m$ users as input. It also takes two numbers, $min_k$ and $max_k$, as additional inputs. The algorithm outputs the best number of clusters, $k$, such that $min_k \le k \le max_k$, and the corresponding members of each cluster. It starts with $k = min_k$ and randomly selects $k$ user locations as the initial centroids (line 6). It assigns the closest centroid to each user (line 8) and then computes new centroids by calculating the center (average location) of all nodes in each cluster (line 11). In effect, the centroids keep moving in successive iterations. It repeats these two steps until the change in centroid positions is zero or negligible. We repeat the test $max_t$ times with a new set of randomly chosen initial centroids every time. During every test, the discovered centroids, the corresponding centroid assignment to users, and the cost are saved (lines 14-16) for later comparison. After running the loop $max_t$ times, we select and store the best $k$ centroids resulting from the test with the lowest cost and discard the rest (lines 19-21).
The same is repeated for the next value of $k$, i.e., $k = k + 1$, until $k > max_k$. At the end we have $cnt = max_k - min_k + 1$ vectors $\mu_k$, one for each value of $k$, with the corresponding assignment vectors $a_k$ and costs $c_k$. Finally, we choose the vector $\mu$ having the lowest cost and the corresponding assignment vector $a$ among the $cnt$ stored cases. This gives the best number of clusters and the corresponding centroids that the algorithm found.
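The search over $k$ with random restarts can be sketched as below. This is an illustrative reading of the description above, not a transcription of Algorithm 2: the function names, the seeded random generator, the exact-equality convergence test, and the mean-squared-distance cost are assumptions.

```python
import math
import random

def run_k_means(users, centroids, max_iter=100):
    """Plain K-means from the given initial centroids (users are coordinate tuples)."""
    assignment = []
    for _ in range(max_iter):
        assignment = [min(range(len(centroids)), key=lambda c: math.dist(u, centroids[c]))
                      for u in users]
        updated = []
        for c, old in enumerate(centroids):
            members = [u for u, a in zip(users, assignment) if a == c]
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else old)
        if updated == centroids:  # centroids stopped moving
            break
        centroids = updated
    return centroids, assignment

def cost(users, centroids, assignment):
    """Mean squared distance between each user and its assigned centroid."""
    return sum(math.dist(u, centroids[a]) ** 2
               for u, a in zip(users, assignment)) / len(users)

def best_clustering(users, min_k, max_k, max_t, seed=0):
    """For each k keep the lowest-cost run of max_t restarts, then pick the best k."""
    rng = random.Random(seed)
    best_per_k = {}
    for k in range(min_k, max_k + 1):
        for _ in range(max_t):
            initial = rng.sample(users, k)  # k user locations as initial centroids
            centroids, assignment = run_k_means(users, initial)
            c = cost(users, centroids, assignment)
            if k not in best_per_k or c < best_per_k[k][0]:
                best_per_k[k] = (c, centroids, assignment)
    best_k = min(best_per_k, key=lambda k: best_per_k[k][0])
    return best_k, best_per_k

users = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
best_k, stored = best_clustering(users, min_k=1, max_k=3, max_t=5)
```

Because the distortion generally does not increase as $k$ grows, a pure lowest-cost rule tends to favor values near $max_k$, so the choice of the bounds $min_k$ and $max_k$ matters in practice.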
$a^{(t)} = (a^{(1)}, a^{(2)}, a^{(3)}, \ldots, a^{(m)})$
After the group formation, the BS sends this information to all users, which use it to form the reduced average statistical CSI. For example, a user in a group of 5 needs to send the average statistical CSI in only 1/5 of the regular feedback intervals.

Simulation Results
Consider the downlink of a multiuser massive MIMO single cell with three 120-degree sectors. We neglect inter-sector interference and focus on a single 120-degree sector served by a ULA of $N_t = 64$ isotropic antennas at the BS. The users grouping forms virtual sectors inside the 120-degree physical sector.
In the simulation, the results are obtained by averaging over 100 drops. In each drop, we randomly generate the spatial correlation matrices $R_g$. For each realization of the spatial correlation matrix $R_g$, we simulate 1000 realizations of the instantaneous channel $H$.
The joint spatio-radio scheduling and hybrid precoder scheme first forms the user groups and then selects the beams that maximize the sum-rate through the downlink training process. It then calculates the ZF-based digital precoder using the low-dimensional effective channel feedback from the users.
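As a minimal sketch of the second stage, the zero-forcing precoder for a low-dimensional effective channel can be computed as the right pseudo-inverse $F = H^H (H H^H)^{-1}$. The function names, the $K \times N_{RF}$ layout of the effective channel (with $K \le N_{RF}$), and the total-power normalization are assumptions made for illustration; the example matrix is arbitrary.

```python
def conj_transpose(A):
    """Hermitian transpose of a matrix stored as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting for a square matrix."""
    n = len(M)
    aug = [list(map(complex, row)) + [1 + 0j if i == j else 0j for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def zf_precoder(H_eff):
    """Right pseudo-inverse H^H (H H^H)^{-1} of a K x N_RF effective channel."""
    Hh = conj_transpose(H_eff)
    return matmul(Hh, inverse(matmul(H_eff, Hh)))

def normalize_power(F, total_power=1.0):
    """Scale the precoder so that its squared Frobenius norm equals total_power."""
    energy = sum(abs(v) ** 2 for row in F for v in row)
    scale = (total_power / energy) ** 0.5
    return [[scale * v for v in row] for row in F]

# Two users, three RF chains (arbitrary example values).
H_eff = [[1 + 1j, 0.5 + 0j, -0.2j],
         [0.3 + 0j, 1 - 0.5j, 0.7 + 0j]]
F_zf = zf_precoder(H_eff)
product = matmul(H_eff, F_zf)  # close to the 2 x 2 identity before normalization
```

Before normalization the product of the effective channel and the precoder is the identity, which is the sense in which intra-group interference is forced to zero. The feedback needed for this stage has dimension $K \times N_{RF}$ rather than $K \times N_t$.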
Figure 8 shows the CDF of the non-zero eigenvalues of the channel covariance matrix. Notice that approximately 50% of the non-zero eigenvalues are close to zero. The sum-rate increases with the number of groups at the cost of increased feedback overhead, as shown in Figure 9. Using the machine learning technique of Section 5, we can obtain the optimal number of groups from the channel covariance feedback. This results in an increased sum-rate with substantially reduced feedback. The optimal $G = 3$ gives a 27.6% increase in sum-rate compared to $G = 1$ and a 62.5% decrease in feedback overhead compared to $G = 8$.

A performance comparison of the ML-based users grouping with previous work cannot be provided because there is no previous work that uses an ML-based technique to reduce the CSI feedback overhead in massive MIMO systems. Many papers use users grouping in massive MIMO hybrid beamforming [3,5,39,40], but they do not utilize ML-based users grouping. Therefore, we have compared our proposed solution with two benchmarks: full-CSI ($G = K$) and coarse-CSI ($G = 1$).

Figure 10 shows the sum-rate versus the number of users at 10 dB SNR. For a fixed number of groups $G = 3$, increasing the number of users increases the number of users per group. Because the number of groups is fixed, the feedback overhead remains constant. The sum-rate increases with the number of users because we assumed $N_{RF} = N_s = K$. If we fix the number of RF chains to some hardware limit, the sum-rate will saturate at a specific number of users. It can be seen in Figure 10 that increasing the number of users per group decreases the slope of the sum-rate for the limited-CSI schemes. This decrease is due to the increase in intra-group interference.
The sum-rate also depends on the number of RF chains, but this dependence is not linear, as shown in Figure 11. This figure shows the sum-rate variation with the number of RF chains $N_{RF}$ when $N_s = 8$, $K = 8$, $N_t = 64$, and SNR = 10 dB. The sum-rate increases with the number of RF chains because more RF chains yield a better-conditioned effective channel matrix. It can be seen that the spectral efficiency does not increase monotonically with $N_{RF}$ and saturates at $N_{RF} = N_t$, where hybrid precoding reduces to pure digital precoding. The increase in spectral efficiency with the number of RF chains comes at the cost of higher-dimensional effective channel feedback overhead and higher power consumption in the RF chains.
The spectral efficiency of the proposed scheme also varies with the number of transmit antennas, as shown in Figure 12. In the figure, $N_{RF} = N_s = K = 8$, SNR = 10 dB, and the BS has a ULA of 16, 64, 128, or 256 antennas. The performance gain increases with the number of transmit antennas because a larger antenna array increases the resolution of the transmit beams (as also depicted in Figure 4) and hence decreases the potential for inter-beam interference.
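The link between array size and beam resolution can be checked numerically. The sketch below measures the half-power width of a DFT beam in sine-angle units for different array sizes. It assumes a half-wavelength-spaced ULA and uses hypothetical function names; it is an illustration of the beam-narrowing effect, not the simulation setup of this paper.

```python
import cmath
import math

def array_gain(n_t, beam_idx, sin_theta):
    """Normalized power gain |b^H a|^2 of DFT beam beam_idx towards sin_theta.

    Assumes a half-wavelength ULA; the gain is 1 at the beam centre.
    """
    acc = 0j
    for m in range(n_t):
        beam = cmath.exp(-2j * math.pi * beam_idx * m / n_t)
        steer = cmath.exp(-1j * math.pi * m * sin_theta)
        acc += beam.conjugate() * steer
    return abs(acc) ** 2 / n_t ** 2

def half_power_width(n_t, beam_idx=0, step=1e-4):
    """Numerically measured -3 dB main-lobe width in sine-angle units."""
    center = 2.0 * beam_idx / n_t
    offset = 0.0
    while array_gain(n_t, beam_idx, center + offset) >= 0.5:
        offset += step
    return 2.0 * offset

# Main-lobe width for the array sizes considered in Figure 12.
widths = {n_t: half_power_width(n_t) for n_t in (16, 64, 128, 256)}
```

The measured width shrinks roughly in proportion to $1/N_t$, so a larger array places less energy in the directions served by neighboring beams.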
In general, the spectral efficiency is a function of SNR. At SNR = 10 dB, our ML-based users grouping and hybrid beamforming scheme gives a 27.6% higher sum-rate at the cost of 33.3% extra feedback overhead compared to the coarse-CSI case ($G = 1$). Compared to the full-CSI case ($G = K$), the proposed scheme reduces the feedback by 62.5% at the cost of a 25.2% reduction in sum-rate.

Conclusions
This paper studied limited feedback two-stage hybrid beamforming for decomposing the precoding matrix at the base station. The huge channel state information feedback of massive MIMO has been reduced by channel covariance-based RF precoding and beam selection. The well-known regularized block diagonalization can mitigate the inter-group interference, but it requires substantial feedback. We used a K-means-based unsupervised machine learning technique for users grouping, together with channel covariance-based eigenmodes/discrete Fourier transforms, to reduce the feedback overhead, and we designed a simplified analog precoder. The digital precoder is designed with joint optimization of the intra-group user utility function. It has been shown that the eigenmodes-based analog precoder design reduces the feedback overhead by more than 50%. The spatio-radio resource scheduling and limited feedback-based hybrid precoding increases the sum-rate by 27.6% compared to the one-group case at the cost of 33.3% extra feedback overhead, and reduces the feedback overhead by 62.5% at the cost of a 25.2% reduction in sum-rate compared to full CSI feedback.

Figure 4. DFT-based beams in a 120-degree sector. (a) DFT-based beams in a 120-degree sector with $N_t = 16$; (b) DFT-based beams in a 120-degree sector with $N_t = 64$.

1: Inputs
2: $K_g$: Number of users per group
3: $N_f$: Number of subchannels
4: $K_n$: Number of UEs to be scheduled in subchannel $n$
5: $N_{RF}$: Number of RF chains
6: $N_t$: Number of transmit antennas
7: Initialization

Figure 6. Joint users scheduling and hybrid beamforming architecture with ML-based users grouping.

Figure 7. Change in position of centroids as the K-Means clustering algorithm progresses. (a) shows the transition of the cluster centroids up to iteration 5, whereas (b) shows only the final stable state after iteration 5. In the figures, crosses represent cluster centroids and colored circles represent the users associated with the same group or cluster.

Figure 8. CDF of non-zero eigenvalues of channel covariance matrix $R_g$ for $N_t = 64$.

Figure 10. Sum-rate vs. number of users with different numbers of groups and CSI, SNR = 10 dB.
The small-scale fading at user $k$ in subchannel $n$ in multipath component (MPC) $l$ is given by $\alpha_{k,n,l}$ with zero mean and variance $\sigma^2_{k,n,l}$. Assume that each MPC is i.i.d. such that $\sum_{l=1}^{L} \sigma^2_{k,n,l} = 1$. We can express the channel model in (19) as

∀n
{Average CSI-based Beam Selection and RF Precoder Design}
9: Evaluate channel covariance matrix $R_g$ from CSI at users: $R_g = E\{h_{g_k} h_{g_k}^H\}$
10: Find transmit steering $B_{k,n}$ by eigenvector-based unitary matrix $U_g$ of $R_g$ or DFT matrix $\Delta_{N_t}$.
11: while $g \le G$ do
12: while $\mathrm{tr}(\Upsilon_g) \le N_{RF,g}$, set $\{[\Upsilon_g]_{i,i}, \ldots, [\Upsilon_g]_{N_t,N_t}\} = \{0, 1\}$ do
g
{Instantaneous CSI-based Users Selection and Digital Precoder Design}
20: while $n \le N_f$ do
21: while $b \le N_{RF,g}$ do
22: