1. Introduction
The scarcity of available spectrum for wireless communications has led to the inclusion of millimeter-wave (mmWave) frequencies in cellular communications. This has opened the door to massive multiple-input multiple-output (MIMO) systems: because of the short wavelengths at these frequencies, a large number of antennas can be fabricated in a small form factor. The mmWave band has inherent drawbacks, such as high pathloss and absorption loss. It is known that the advantages of MIMO systems (spatial multiplexing or diversity gain) scale up with the number of antennas. In summary, one can exploit the large bandwidth available at mmWave frequencies by combating the high path and absorption losses with massive MIMO directional beamforming. Future mmWave massive MIMO-based cellular networks will be as shown in
Figure 1. Due to the high pathloss on the one hand and the high directional gain on the other, inter-cell interference and cell boundaries will become meaningless. The fixed-size cell boundaries of traditional cellular networks will probably no longer exist in future mmWave massive MIMO systems. Narrow beams can serve a distant user equipment (UE) without interfering with other UEs, provided that there is no obstacle between the base station (BS) and the intended UE, whereas a closely located UE may be deprived of a connection because of obstacles.
The cost of massive MIMO comes in the form of excessive feedback overhead for channel estimation and increased hardware complexity, since the number of radio-frequency (RF) chains grows with the number of antennas. The feedback overhead has been tackled separately for frequency division duplex (FDD) and time division duplex (TDD) systems. In FDD systems, uplink channel estimation incurs less overhead than downlink channel estimation because, in general, the number of transmit antennas
${N}_{t}$ is larger than the number of users
K, and the number of receive antennas per user
${n}_{k}$ (
${N}_{t}\gg K$ and
${N}_{t}\gg {n}_{k}$). The most common technique to reduce the downlink channel estimation overhead is joint spatial division multiplexing (JSDM) [
1]. JSDM uses two-stage precoding: user grouping based on second-order channel statistics (covariance), followed by traditional MU-MIMO linear precoding (zero-forcing) on the low-dimensional effective channel to mitigate inter-user interference. In TDD, only uplink channel estimation is performed, and the downlink channel estimates are obtained as the transpose of the uplink channel using the channel reciprocity principle. TDD massive MIMO systems suffer from pilot contamination when the BS receives non-orthogonal pilot signals from neighboring cells. Pilot contamination degrades the channel estimates and hence affects both uplink combining and downlink precoding.
In traditional MIMO systems, a separate RF chain (analog-to-digital converter/digital-to-analog converter, serial-to-parallel/parallel-to-serial converter, up/down converter, etc.) is required for each antenna, but the high power consumption makes this infeasible for massive MIMO systems. Hybrid beamforming resolves this problem by dividing the precoding/combining into baseband digital processing and RF analog processing. Hybrid precoding and combining offer extra degrees of freedom in the space domain with a large number of antennas and analog beamforming [
2]. Hybrid beamforming can be realized by using MU-MIMO precoding as the baseband digital precoder and pre-beamforming based on statistical channel state information (CSI) as the RF analog precoder. This limited-feedback configuration (limited because only average CSI is needed) is particularly suited to massive MIMO mmWave systems with a large number of antennas but a relatively small number of RF chains [
3]. It has been shown [
4] that covariance-based limited feedback works well for mmWave massive MIMO systems in which the number of users is small relative to the number of BS antennas and the channels are formed by a few multipath components (MPCs) with small angular spread.
Limited work has been done on joint multiuser massive MIMO resource allocation and hybrid beamforming design. Although mmWave massive MIMO systems have the potential to increase spectral efficiency tremendously, the cost and power consumption of power-hungry RF chains (analog-to-digital converter (ADC)/digital-to-analog converter (DAC), parallel-to-serial converter, serial-to-parallel converter, up converter/down converter) make it impractical to build a complete RF chain for each antenna. A promising solution to this problem is hybrid beamforming, where the precoder at the transmitter is divided into two parts: an analog precoder and a digital precoder. The analog precoder (usually a network of phase shifters) at the RF stage reduces the number of RF chains required for the digital precoder. To configure these precoders, the transmitter requires channel state information in the form of uplink feedback from the users, but with a massive number of antennas this feedback becomes a heavy load on the wireless uplink, especially in FDD mode. JSDM [
4] is a technique used to reduce the feedback overhead. It uses slowly varying average channel statistics to implement the analog precoder; the digital precoder is then realized using a low-dimensional effective channel. Several variants of JSDM have been proposed. Li et al. [
5] generalize the JSDM scheme to support non-orthogonal virtual sectorization with multiple RF chains at both link ends. Their scheme uses the Kronecker channel model to decouple the transmit and receive beamforming. Under this model, the analog beamformer is obtained by stacking the strongest eigenbeams of the channel covariance matrix, and the digital beamformer is based on weighted minimum mean squared error (MMSE) with the effective channel. However, the Kronecker model does not characterize the mmWave channel, where the transmitter and receiver are coupled because of highly directional transmission. In [
6], the authors apply JSDM using a geometrical channel model and find hybrid precoder and combiner at transmitter and receivers, respectively. Hybrid beamforming with switches (HBwS) has been introduced in [
7], where an
$L\times {N}_{t}$ analog beamformer is controlled by an
${N}_{RF}\times L$ network of switches driven by instantaneous CSI. Here,
${N}_{t}$ is the number of transmit antennas,
${N}_{RF}$ is the number of RF chains, and
${N}_{RF}<L<{N}_{t}$. Another switchbased analog beamforming is proposed in [
8], but it requires instantaneous CSI for both the switching network and the phase-shifter network, and it uses
$L={N}_{t}$. JSDM implementation also requires downlink training to estimate the channel covariance matrix. Most existing work assumes that the CSI is known at both ends. In [
9], the authors consider the joint optimization of training resource allocation and channel-statistics-based analog beamformer design using user-centric virtual sectorization. There are different structures for the phase-shifter-based analog beamformer, namely fully connected, sub-connected, and dynamically connected [
10]. Park [
11] investigates JSDM with these analog beamformer architectures. The dynamic architecture gives better results at the cost of added complexity. In [
12], the authors propose a hybrid beamforming method with a unified analog beamformer obtained by subspace construction (SC) based on partial CSI in a massive MIMO OFDM system. In [
13], a statistical-CSI-based analog beamformer uses regularized block diagonalization to mitigate inter-group interference, and an instantaneous-CSI-based digital beamformer uses weighted MMSE to suppress intra-group interference. Jiang et al. [
14] jointly optimize user selection and beam selection during the analog beamforming design. They use a Lyapunov-drift optimization framework to obtain the optimal solution. Their work focuses only on the design of the statistical-CSI-based analog precoder and on user/beam selection. Our previous work [
15] on resource allocation for transmit beamforming develops digital and analog precoders that maximize the sum rate under constraints on the total power and the number of RF chains. The solutions provided there require full instantaneous CSI at the transmitter and receiver, which, in the case of massive MIMO, entails a large number of pilot transmissions in the downlink and channel information feedback in the uplink. In this work, we exploit channel similarities by grouping the users with K-means clustering based on their location information. A low-complexity DFT-matrix-based analog precoder is derived using statistical CSI. This greatly reduces the feedback overhead for the design of the zero-forcing digital precoder.
Machine learning (ML) applications for the physical layer of wireless communication systems have been widely reported in [
16]. Most of the conventional transmitter and receiver blocks can be replaced by an ML-based autoencoder, as suggested by the authors. The large number of antennas in massive MIMO makes channel estimation a challenging issue in mmWave communications. A common practice in TDD massive MIMO systems is to use channel reciprocity to obtain the downlink CSI from the uplink channel estimates. In FDD, however, channel reciprocity is not applicable and downlink CSI estimation is very difficult. Uplink (user to base station) channel estimation is known to be hampered by the pilot contamination effect: the quality of the channel estimates is deteriorated by the mutual interference caused by the non-orthogonal pilots received in a cell. In [
17], a supervised learning-based pilot decontamination scheme for the massive MIMO uplink is reported. In the proposed ML-based solution, the users’ locations in all cells and the pilot assignments serve as the input features and output labels, respectively. In [
18], a deep learning network, CsiNet, is used to learn the CSI-to-codeword transformation at the users’ terminals (a codebook approach is usually adopted to reduce the feedback overhead) and the inverse transformation at the base station. The authors of [
19] suggest learning-based antenna selection for massive MIMO systems, using multiclass KNN and support vector machine (SVM) classifiers for data-driven optimal antenna selection. Wang et al. [
20] employ K-nearest neighbor
(KNN) supervised learning to allocate
N beams among
K users. In [
21], a reinforcement-learning-based framework for radio resource management in radio access networks has been proposed. In our previous work [
22], we used neural networks to reduce the execution time of the computationally intensive resource allocation part of the joint resource allocation and hybrid beamforming design in [
15]. In this work, by contrast, we use a K-means-based unsupervised machine learning scheme to group the users according to their spatial locations. To the best of our knowledge, no existing work jointly considers spatio–radio resource management and hybrid beamforming in massive MIMO systems.
In this work, we use spatial channel covariance matrices for the analog beamforming design. We also consider the user-to-RF-beam mapping. This mapping requires channel state information and a search over all possible beam combinations at the base station. The complexity of this search is exponential in the number of users [
23]. Because of this exponential increase in complexity, we use DFT-based eigenmode beams with RF switches.
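To give a feel for how quickly the exhaustive user-to-beam search grows, the following sketch counts the candidate mappings. It assumes, purely for illustration, that each user is assigned one distinct beam out of `num_beams`; the function name and the example sizes are hypothetical and not taken from the paper.

```python
import math

def count_beam_assignments(num_beams: int, num_users: int) -> int:
    """Number of ways to give each user a distinct beam (ordered selection)."""
    # math.perm(n, k) = n! / (n - k)!, available from Python 3.8
    return math.perm(num_beams, num_users)

if __name__ == "__main__":
    # With 64 candidate beams the count grows roughly like 64**K.
    for num_users in (2, 4, 8):
        print(num_users, count_beam_assignments(64, num_users))
```

Each additional user multiplies the search space by roughly the number of remaining beams, which is what makes a reduced-complexity beam selection attractive.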
Contribution: In this paper, we develop joint spatio–radio resource allocation and hybrid precoding algorithms for limited-feedback wideband massive MIMO systems. The contributions of this paper are summarized as follows.
First, we consider the problem of joint hybrid precoder design with limited feedback and user-beam selection to maximize the sum proportional rate under a total power constraint. The formulated mixed integer programming problem is then transformed into a relaxed convex optimization problem.
Second, a low-complexity suboptimal solution is provided for the optimization problem. The algorithm generates the analog beamforming matrix, the digital beamforming matrix, and the set of users in each group. The DFT/eigenmode-based analog beamformer is formed using limited statistical CSI feedback from the users. Then, the digital precoder design with user selection is carried out iteratively.
Finally, we develop a K-means-based unsupervised machine learning scheme for user grouping. These user groups are used to form the limited-feedback (statistical channel state information) based analog beamforming matrix. The proposed machine-learning-based analog beamforming, together with zero-forcing digital precoding and user scheduling, gives better performance than the DFT/eigenmode-based solution.
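The location-based user grouping mentioned in the last contribution can be illustrated with a plain K-means (Lloyd) iteration. This is only a minimal sketch under stated assumptions: user positions are 2-D coordinates, the number of groups is fixed in advance, and a deterministic farthest-point initialisation is used for reproducibility. The function name `kmeans_group_users` is hypothetical, and the paper's own algorithm may differ in its features and initialisation.

```python
import numpy as np

def kmeans_group_users(positions, num_groups, num_iters=50):
    """Group users by position with Lloyd's K-means; returns (labels, centroids)."""
    positions = np.asarray(positions, dtype=float)
    # Farthest-point initialisation (an illustrative, deterministic choice).
    centroids = [positions[0]]
    for _ in range(1, num_groups):
        d2 = np.min([np.sum((positions - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(positions[np.argmax(d2)])
    centroids = np.array(centroids)

    for _ in range(num_iters):
        # Assignment step: each user joins the nearest centroid.
        dists = np.linalg.norm(positions[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: move each centroid to the mean of its members.
        new_centroids = centroids.copy()
        for g in range(num_groups):
            members = positions[labels == g]
            if len(members) > 0:
                new_centroids[g] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

Users that share a label would then share one statistical-CSI-based analog beamformer, which is the role the groups play in the limited-feedback design.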
The rest of the paper is organized as follows. The system, signal, and channel models, along with the problem formulation, are described in
Section 2.
Section 3 introduces the relaxed-convex transformation of the formulated mixed integer optimization problem. A suboptimal solution to the joint resource allocation and hybrid beamforming problem, based on eigenmodes and the discrete Fourier transform, is given in
Section 4.
Section 5 proposes machine-learning-based user grouping and beam selection for the joint optimization problem. Simulation results are given in
Section 6, followed by the conclusions in
Section 7.
Notations: Bold upper- and lower-case letters denote matrices and vectors, respectively. The notations ${\mathbf{X}}^{-1}$, ${\mathbf{X}}^{\dagger}$, ${\mathbf{X}}^{T}$, ${\mathbf{X}}^{H}$, and $tr\left(\mathbf{X}\right)$ denote the inverse, pseudoinverse, transpose, Hermitian transpose, and trace of a matrix $\mathbf{X}$. $vec\{\cdot\}$ is the vectorization operator, $diag\{{x}_{1},\dots ,{x}_{n}\}$ is a diagonal matrix, and ⊗ is the Kronecker product. ${\parallel \cdot \parallel}_{F}$ denotes the Frobenius norm. The $n\times n$ identity matrix is denoted by ${\mathbf{I}}_{n}$. $\mathbb{E}\{\cdot\}$ represents the expectation with respect to the random variable within the brackets.
2. System Model
Consider an FDD MU-MIMO downlink system where a base station (BS) with
${N}_{t}$ antennas is located at the cell center and transmits to
K single-antenna users as shown in
Figure 2. There are
G groups of users such that the group
$g\in \mathcal{G}=\{1,\dots ,G\}$. Each group contains
${K}_{g}$ users.
Assume that the BS and the users have knowledge of the channel. We consider multicarrier OFDM transmission over a narrowband block-fading channel. The BS is equipped with
${N}_{t}$ antennas in a uniform linear array (ULA) configuration. The information signal block
$\mathbf{S}\in {\mathbb{C}}^{K\times {N}_{f}}$ at the input of the BS transceiver for the user
k is given as
and for the subchannel
n,
where
${N}_{f}$ and
${N}_{s}$ are the number of subchannels and the number of symbols per subchannel, respectively. In a subchannel
n, the information symbol vector is
${\mathbf{s}}_{n}\in {\mathbb{C}}^{{N}_{s}\times 1}$. We assume
${N}_{s}=K$, such that the transmit signal per subchannel
n satisfies
$\mathbb{E}\left\{{\mathbf{s}}_{n}{\mathbf{s}}_{n}^{H}\right\}=\frac{{P}_{n}}{K}{\mathbf{I}}_{K}$, where
${P}_{n}={P}_{T}/{N}_{f}$ is the transmit power per subchannel and
${P}_{T}$ is the total transmit power of the BS. The transmit signal matrix
$\mathbf{X}$ is obtained from
${\mathbf{F}}^{B}\mathbf{S}$, where
${\mathbf{F}}^{B}\in {\mathbb{C}}^{{N}_{t}\times {N}_{s}}$ is the precoding matrix. The hybrid beamforming divides the precoding matrix into baseband digital precoding matrix
${\mathbf{F}}^{DB}\in {\mathbb{C}}^{{N}_{RF}\times {N}_{s}}$ and RF analog precoding matrix
${\mathbf{F}}^{AB}\in {\mathbb{C}}^{{N}_{t}\times {N}_{RF}}$, where
${N}_{RF}$ is the number of RF chains as shown in
Figure 3. The transmit signal
$\mathbf{X}\in {\mathbb{C}}^{{N}_{t}\times {N}_{f}}$ is given by
Also, the precoding matrix must satisfy
since
$\mathbb{E}\left\{\mathbf{S}{\mathbf{S}}^{H}\right\}=\frac{{P}_{T}}{{N}_{f}K}{\mathbf{I}}_{K{N}_{f}}$, therefore,
The transmit signal in subchannel
n is
${\mathbf{x}}_{n}\in {\mathbb{C}}^{{N}_{t}\times 1}$. Thus, the received signal vector
${\mathbf{y}}_{n}\in {\mathbb{C}}^{K\times 1}$ at
K users in subchannel
n is given by
where
${\mathbf{H}}_{n}\triangleq [{\mathbf{h}}_{1,n},\dots ,{\mathbf{h}}_{K,n}]\in {\mathbb{C}}^{{N}_{t}\times K}$ is the channel matrix with
${\mathbf{h}}_{k,n}={[{h}_{1,k},\dots ,{h}_{{N}_{t},k}]}^{H}$ being the channel vector from BS to user
k in subchannel
n,
${\mathbf{x}}_{n}={\mathbf{F}}^{AB}{\mathbf{F}}_{n}^{DB}{\mathbf{s}}_{n}$, and
${\mathbf{w}}_{n}\sim \mathcal{CN}(\mathbf{0},{\sigma}^{2}{\mathbf{I}}_{K})$ is the additive white Gaussian noise (AWGN) in subchannel
n at the users. The RF beamforming
${\mathbf{F}}^{AB}$ is performed in the time domain and the same beamforming is applied on all subchannels, whereas the digital beamforming
${\mathbf{F}}_{n}^{DB}$ is performed in the frequency domain on a per-subchannel basis [
11]. In the
${n}^{th}$ subchannel, the
${j}^{th}$ UE receives the sum of all transmitted signals for
K UEs over its MIMO channel
${\mathbf{H}}_{j,n}$ as
where
${\mathbf{h}}_{j,n}$ is the
${N}_{t}\times 1$ channel vector. We denote the rank of the channel matrix
${\mathbf{H}}_{j,n}$ by
${r}_{j,n}$, where
$0\le {r}_{j,n}\le min(K,{N}_{t}),\phantom{\rule{0.166667em}{0ex}}\forall n$. In matrix form, the above equation is given as
The
$1\times {N}_{f}$ received signal at the
${k}^{th}$ UE is given by
Combining the signals for all UEs in a
K dimensional received signal vector
$\mathbf{y}={[{\mathbf{y}}_{1},\dots ,{\mathbf{y}}_{K}]}^{H}$, we get the system equation as
where
$\mathbf{Y},\mathbf{W}\in {\mathbb{C}}^{K\times {N}_{f}}$.
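The signal model of this section can be traced numerically with a short sketch: one analog precoder shared by all subchannels, one digital precoder per subchannel, and the received vector formed from the conjugate-transposed channel plus noise. The matrix sizes, the random phase-shifter and Gaussian draws, and the normalisation of the hybrid precoder to a squared Frobenius norm of $K$ are all illustrative assumptions, as are the helper names `crandn` and `simulate_downlink`.

```python
import numpy as np

def crandn(rng, *shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def simulate_downlink(N_t=32, N_RF=8, K=4, N_f=16, P_T=1.0, sigma2=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Analog precoder: unit-modulus (phase-shifter) entries, same on every subchannel.
    F_AB = np.exp(1j * rng.uniform(0, 2 * np.pi, (N_t, N_RF))) / np.sqrt(N_t)
    Y = np.zeros((K, N_f), dtype=complex)
    precoder_norms = []
    for n in range(N_f):
        H_n = crandn(rng, N_t, K)        # placeholder channel, columns h_{k,n}
        F_DB_n = crandn(rng, N_RF, K)    # placeholder digital precoder (N_s = K)
        # Assumed normalisation: ||F_AB F_DB_n||_F^2 = K, so E||x_n||^2 = P_T / N_f.
        F_DB_n *= np.sqrt(K) / np.linalg.norm(F_AB @ F_DB_n, "fro")
        s_n = np.sqrt(P_T / (N_f * K)) * crandn(rng, K)
        x_n = F_AB @ F_DB_n @ s_n        # analog stage applied after the digital stage
        w_n = np.sqrt(sigma2) * crandn(rng, K)
        Y[:, n] = H_n.conj().T @ x_n + w_n
        precoder_norms.append(np.linalg.norm(F_AB @ F_DB_n, "fro") ** 2)
    return Y, np.array(precoder_norms)
```

The dimensions make the ordering of the two stages explicit: the $N_{RF}\times N_s$ digital precoder acts on the symbols first, and the $N_t\times N_{RF}$ analog precoder maps the RF-chain outputs to the antennas.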
2.1. Channel Model
Generally, massive MIMO channel models are categorized into two types: (i) analytical models and (ii) physical models [
4]. Analytical models are commonly used for the theoretical analysis of wireless communication systems. The most commonly used analytical model is the Kronecker channel model. It is a correlation-based model that characterizes the MIMO channel matrix in terms of separate transmit-side and receive-side spatial correlation matrices [
24],
under the above assumptions, the channel model
$\mathbf{H}$ is simplified to Kronecker model,
where
$\mathbf{K}\sim \mathcal{CN}(0,1)$ is an i.i.d. unit variance MIMO channel matrix,
${\mathbf{R}}_{tx}$ and
${\mathbf{R}}_{rx}$ are the transmit and receive correlation matrices, respectively. The transmit and receive correlation matrices are given as [
24],
The physical models explicitly model wave propagation parameters like the complex amplitude, DoD, DoA, and delay of an MPC [
24,
25]. MmWave propagation leads to limited spatial scattering due to the high free-space pathloss. In addition, the large, tightly packed antenna arrays lead to high levels of antenna correlation. The sparse scattering and the spatial correlation between antennas make many of the commonly used statistical fading distributions inaccurate for mmWave channel modeling. Therefore, we use the extended Saleh-Valenzuela model, which accurately describes the mathematical structure present in mmWave channels [
26,
27]. For simplicity, we assume that each scattering cluster around the transmitter and receiver contributes a single propagation path [
28].
In general, the mmWave MIMO channel matrix between the BS with
${N}_{t}$ transmit antennas and a user
k with
${n}_{r}$ receive antennas in subchannel
n, can be modeled as a double-directional channel,
where
L is the total number of multipaths,
${\alpha}_{k,n,l}$ is the complex gain of the
${l}^{th}$ path with i.i.d.
$\mathcal{C}\mathcal{N}(0,1)$, and
${\rho}_{k,n}$ is the distance dependent pathloss between the BS and user
k[
29]. The LOS path is included with
$l=0$. Moreover,
$\mathbf{a}$ and
$\mathbf{b}$ are the receive and transmit steering vectors, respectively. The variables
${\varphi}_{k,n,l}\in [0,2\pi )$ and
${\theta}_{k,n,l}\in [0,2\pi )$ are the
${l}^{th}$ path’s azimuth angles (boresight angles in the receive array and transmit array) of arrival and departure, respectively. The steering vectors are given by
The elements of transmit and receive steering vectors are given by
where
$\lambda $ is the wavelength,
$\omega =\frac{2\pi}{\lambda}$,
${\tau}_{i}$ is the beamforming delay, and
${d}_{t}$ and
${d}_{r}$ are the antenna spacing at the transmitter and receiver, respectively.
The channel matrix in (
14) can also be written in more compact form as
where
$\nu =\frac{{N}_{t}{n}_{r}}{\rho L}$ and,
${\mathbf{A}}_{k,n}$ and
${\mathbf{B}}_{k,n}$ consist of stacked steering vectors of AoA and AoD, respectively, i.e.,
${\mathbf{A}}_{k,n}=[\mathbf{a}\left({\varphi}_{k,n,1}\right),\mathbf{a}\left({\varphi}_{k,n,2}\right),\dots ,\mathbf{a}\left({\varphi}_{k,n,L}\right)]$ and
${\mathbf{B}}_{k,n}=[\mathbf{b}\left({\theta}_{k,n,1}\right),\mathbf{b}\left({\theta}_{k,n,2}\right),\dots ,\mathbf{b}\left({\theta}_{k,n,L}\right)]$. The matrix
${\mathbf{D}}_{k,n}$ is a diagonal matrix, given as
${\mathbf{D}}_{k,n}=diag\{{\alpha}_{k,n,1},{\alpha}_{k,n,2},\dots ,{\alpha}_{k,n,L}\}$. The small scale fading at user
k in subchannel
n in multipath component (MPC) is given by
${\alpha}_{k,n,l}$ with zero mean and variance
${\sigma}_{k,n,l}^{2}$. Assume that each MPC is i.i.d. such that
${\sum}_{l=1}^{L}{\sigma}_{k,n,l}^{2}=1$. We can express the channel model in (
19) as
where
${\mathsf{\Sigma}}_{k,n}=diag\{{\sigma}_{k,n,1}^{2},{\sigma}_{k,n,2}^{2}\dots ,{\sigma}_{k,n,L}^{2}\}$ and
${\overline{\mathbf{D}}}_{k,n}=diag\{{\overline{\alpha}}_{k,n,1},{\overline{\alpha}}_{k,n,2}\dots ,{\overline{\alpha}}_{k,n,L}\}$ with
${\overline{\alpha}}_{k,n,l}=\frac{{\alpha}_{k,n,l}}{{\sigma}_{k,n,l}}$ such that
$E\left\{{\overline{\alpha}}_{k,n,l}\right\}=0$ and
$E\left\{{\overline{\alpha}}_{k,n,l}^{2}\right\}=1$.
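A small numerical sketch can make the relation between the path-wise channel and its compact matrix form concrete. It assumes the common textbook ULA steering vector with half-wavelength spacing and no beamforming delay, and it applies the scaling $\sqrt{\nu}$ with $\nu ={N}_{t}{n}_{r}/(\rho L)$ to the product of the stacked steering matrices and the diagonal gain matrix; both are assumptions, since the exact expressions are not reproduced here. The function names are hypothetical.

```python
import numpy as np

def steering(angle, n_ant, spacing=0.5):
    """Unit-norm ULA steering vector; `spacing` is in wavelengths (assumed form)."""
    idx = np.arange(n_ant)
    return np.exp(1j * 2 * np.pi * spacing * idx * np.sin(angle)) / np.sqrt(n_ant)

def mmwave_channel(n_t, n_r, n_paths, rho=1.0, rng=None):
    """Return H (n_r x n_t) and the factors A, D, B and nu used to build it."""
    rng = np.random.default_rng() if rng is None else rng
    phi = rng.uniform(0, 2 * np.pi, n_paths)     # angles of arrival
    theta = rng.uniform(0, 2 * np.pi, n_paths)   # angles of departure
    alpha = (rng.standard_normal(n_paths) + 1j * rng.standard_normal(n_paths)) / np.sqrt(2)
    A = np.stack([steering(p, n_r) for p in phi], axis=1)     # n_r x L receive steering
    B = np.stack([steering(t, n_t) for t in theta], axis=1)   # n_t x L transmit steering
    D = np.diag(alpha)                                        # L x L path gains
    nu = n_t * n_r / (rho * n_paths)
    H = np.sqrt(nu) * A @ D @ B.conj().T
    return H, A, D, B, nu
```

Because the channel is a sum of $L$ rank-one terms, its rank cannot exceed the number of paths, which is the sparsity that the limited-feedback design relies on.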
Substituting (
20) in (
13) and averaging over small scale fading, we get the transmit and receive correlation matrices for user
k in the subchannel
n as
For mmWave massive MIMO systems with large number of antennas, the steering vectors are asymptotically orthogonal to each other [
6]:
Moreover, in mmWave massive MIMO, acquisition of the instantaneous full CSI is not practical. Instead, an average CSI in terms of
$\left[{\mathbf{A}}_{k,n}\right]$,
$\left[{\mathbf{B}}_{k,n}\right]$, and
$\left[{\mathsf{\Sigma}}_{k,n}\right]$ is a practical basis for the beamforming design, because the channel statistics remain valid for a few seconds or more, whereas small-scale fading changes on the order of milliseconds [
6].
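The asymptotic orthogonality of steering vectors can be checked numerically. The sketch below assumes unit-norm ULA steering vectors with half-wavelength spacing (a textbook form, not necessarily the exact one used in this paper) and evaluates the magnitude of the inner product between two fixed, distinct directions as the array grows.

```python
import numpy as np

def steering(angle, n_ant, spacing=0.5):
    """Unit-norm ULA steering vector; `spacing` is in wavelengths (assumed form)."""
    idx = np.arange(n_ant)
    return np.exp(1j * 2 * np.pi * spacing * idx * np.sin(angle)) / np.sqrt(n_ant)

def steering_correlation(n_ant, angle_1, angle_2):
    """Magnitude of the inner product between two steering vectors."""
    return abs(np.vdot(steering(angle_1, n_ant), steering(angle_2, n_ant)))

if __name__ == "__main__":
    a1, a2 = np.deg2rad(10.0), np.deg2rad(30.0)
    for n_ant in (16, 64, 256):
        print(n_ant, steering_correlation(n_ant, a1, a2))
```

For fixed distinct angles the correlation is bounded by a term proportional to $1/N$, so it vanishes as the number of antennas grows, although it does not decrease monotonically.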
2.2. Problem Formulation
Hybrid beamforming divides the beamforming matrix into two parts: the covariance-based pre-beamforming matrix
${\mathbf{F}}^{AB}$, realized by analog beamformers, and the reduced-dimension MU-MIMO digital precoder based on the effective channel
${\mathbf{H}}^{H}{\mathbf{F}}^{AB}$ (omitting the subchannel subscript for simplicity). We assume that
K users are divided into
G groups, such that, the group
g contains
${K}_{g}$ users. Since the users are near ground level and surrounded by scatterers, whereas the elevated base station is scatterer-free, we assume the one-ring model [
1] and all users in group
g experience the same azimuth center angle (
${\theta}_{g}$) and angular spread (
${\Delta}_{g}$). In this case,
${\mathbf{R}}_{rx}=\mathbf{I}$ in (
12), therefore, the channel covariance matrix of each user in group
g is given by [
30]
for which the eigenvalue decomposition gives
where
${\mathbf{U}}_{g}\in {\mathbb{C}}^{{N}_{t}\times {r}_{g}}$ is a tall semi-unitary matrix (
${\mathbf{U}}_{g}^{H}{\mathbf{U}}_{g}={\mathbf{I}}_{{r}_{g}}$) whose columns are the eigenvectors of
${\mathbf{R}}_{g}$ and
${\mathsf{\Lambda}}_{g}\in {\mathbb{R}}^{{r}_{g}\times {r}_{g}}$ is a diagonal matrix with
${r}_{g}$ positive eigenvalues along the diagonal. The
$(i,j)$th element of the covariance matrix
${\mathbf{R}}_{g}$ represents the correlation between the channel coefficients of antenna elements
i and
j as
where
d is the distance between antenna elements of ULA and
$\lambda $ is the carrier wavelength. Using the Karhunen-Loève representation, the channel vector of user
k in group
g is given as
where
${\mathbf{z}}_{g}\in {\mathbb{C}}^{{r}_{g}\times 1}$ is distributed as $\mathcal{CN}(\mathbf{0},{\mathbf{I}}_{{r}_{g}})$ and
${\tilde{\mathbf{h}}}_{{g}_{k}}$ is the beam-domain channel. For large
${N}_{t}$,
${\mathbf{U}}_{g}$ tends to the discrete Fourier transform (DFT) matrix
${\Delta}_{{N}_{t}}\in {\mathbb{C}}^{{N}_{t}\times {N}_{t}}$ [
31]. Each column of
${\mathbf{U}}_{g}$ represents one direction of angleofdeparture (AoD), i.e., a
beam.
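The chain from group covariance to eigenmodes to a channel realisation can be sketched numerically. The covariance below is obtained by averaging outer products of ULA array responses over a uniform grid of angles in $[{\theta}_{g}-{\Delta}_{g},{\theta}_{g}+{\Delta}_{g}]$, which is one common way of writing the one-ring model and is assumed here because the closed-form entry is not reproduced. The 95% energy threshold used to pick the dominant eigenmodes and all function names are likewise illustrative.

```python
import numpy as np

def one_ring_covariance(n_t, theta, delta, spacing=0.5, n_grid=401):
    """Approximate one-ring covariance by averaging over a uniform angle grid."""
    angles = theta + np.linspace(-delta, delta, n_grid)
    idx = np.arange(n_t)[:, None]
    A = np.exp(-1j * 2 * np.pi * spacing * idx * np.sin(angles)[None, :])
    return (A @ A.conj().T) / n_grid

def dominant_eigenmodes(R, energy=0.95):
    """Eigenvectors/eigenvalues capturing the requested fraction of trace(R)."""
    vals, vecs = np.linalg.eigh(R)                 # ascending order for Hermitian R
    order = np.argsort(vals)[::-1]
    vals, vecs = np.clip(vals[order], 0.0, None), vecs[:, order]
    r = int(np.searchsorted(np.cumsum(vals) / np.sum(vals), energy) + 1)
    return vecs[:, :r], vals[:r]

def sample_channel(U, lam, rng):
    """Karhunen-Loeve draw h = U diag(lam)^(1/2) z with z ~ CN(0, I_r)."""
    r = len(lam)
    z = (rng.standard_normal(r) + 1j * rng.standard_normal(r)) / np.sqrt(2)
    return U @ (np.sqrt(lam) * z)
```

With a narrow angular spread only a handful of eigenmodes carry almost all of the energy, so the random vector ${\mathbf{z}}_{g}$ is far shorter than the ${N}_{t}$-dimensional channel it generates.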
Alternatively, when the number of dominant eigenvalues is
${\widehat{r}}_{g}\le {r}_{g}$, the channel matrix can be written as ([
13], Equation (5))
The limited-feedback-based hybrid beamforming consists of the analog pre-beamforming matrix
${\mathbf{F}}_{g}^{AB}\in {\mathbb{C}}^{{N}_{t}\times {N}_{RF,g}}$, responsible for spatial group formation and inter-group interference mitigation, and the digital multiuser precoding matrix
${\mathbf{F}}_{g}^{DB}\in {\mathbb{C}}^{{N}_{RF,g}\times {S}_{g}}$ for spatial multiplexing inside the group and inter-user interference mitigation. Here,
${N}_{RF,g}$ is the number of RF chains for group
g such that
${S}_{g}<{N}_{RF,g}<{\widehat{r}}_{g}$ and
${S}_{g}$ is the number of multicarrier information symbols vectors for group
g with
${N}_{RF}={\sum}_{g=1}^{G}{N}_{RF,g}$ and
$S={\sum}_{g=1}^{G}{S}_{g}$. The overall analog pre-beamforming matrix
${\mathbf{F}}^{AB}\in {\mathbb{C}}^{{N}_{t}\times {N}_{RF}}$ is given by
and the overall digital beamforming matrix
${\mathbf{F}}^{DB}\in {\mathbb{C}}^{{N}_{RF}\times {N}_{s}}$ is given by
and the overall channel matrix
where the channel matrix of group
g is defined as
${\mathbf{H}}_{g}\triangleq [{\mathbf{h}}_{{g}_{1}},\dots ,{\mathbf{h}}_{{g}_{{K}_{g}}}]$.
The analog pre-beamforming
${\mathbf{F}}_{g}^{AB}$ is based on the slowly varying channel covariance matrix
${\mathbf{R}}_{g}$ and can be implemented by the DFT matrix (when
${N}_{t}$ is large), whereas, the digital beamformer
${\mathbf{F}}_{g}^{DB}$ is based on the instantaneous channel information of the reduced dimension effective channel
${\mathbf{H}}_{g}^{H}{\mathbf{F}}_{g}^{AB}$. The overall effective channel is given by
The excessive pilot transmission in the downlink and feedback in the uplink of an FDD system can be reduced by sending only the group-wise, average-CSI-based channel estimates in the uplink. This is accomplished by using the diagonal blocks
${\mathbf{H}}_{g}^{H}{\mathbf{F}}_{g}^{AB}$ as feedback information, each of size
${K}_{g}\times {N}_{RF,g}$ for
$g=1,\dots G$. The analog pre-beamforming is designed in such a way that the off-diagonal blocks of the matrix in (
32) satisfy
${\mathbf{H}}_{g}^{H}{\mathbf{F}}_{{g}^{\prime}}^{AB}\approx 0$ for all
${g}^{\prime}\ne g$. This group-wise division creates virtual sectors, with each group corresponding to one virtual sector [
30].
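The near-zero off-diagonal blocks of the effective channel can be illustrated by measuring how much of one group's channel energy is captured by another group's analog beams. The sketch assumes two groups with well-separated angular supports, a one-ring covariance approximated on an angle grid, and analog beams taken as the dominant eigenvectors of each group covariance; the angles, the number of beams and the function names are illustrative choices rather than the paper's settings.

```python
import numpy as np

def one_ring_covariance(n_t, theta, delta, spacing=0.5, n_grid=401):
    """Approximate one-ring covariance by averaging over a uniform angle grid."""
    angles = theta + np.linspace(-delta, delta, n_grid)
    idx = np.arange(n_t)[:, None]
    A = np.exp(-1j * 2 * np.pi * spacing * idx * np.sin(angles)[None, :])
    return (A @ A.conj().T) / n_grid

def top_eigenvectors(R, r):
    """The r eigenvectors of Hermitian R with the largest eigenvalues."""
    vals, vecs = np.linalg.eigh(R)
    return vecs[:, np.argsort(vals)[::-1][:r]]

def captured_energy(R_g, F):
    """Fraction of the average channel energy of group g seen through beams F."""
    return np.real(np.trace(F.conj().T @ R_g @ F)) / np.real(np.trace(R_g))

if __name__ == "__main__":
    R_1 = one_ring_covariance(64, np.deg2rad(-40.0), np.deg2rad(5.0))
    R_2 = one_ring_covariance(64, np.deg2rad(40.0), np.deg2rad(5.0))
    F_1, F_2 = top_eigenvectors(R_1, 4), top_eigenvectors(R_2, 4)
    print("own beams:", captured_energy(R_1, F_1))
    print("other group's beams:", captured_energy(R_1, F_2))
```

When the groups overlap in angle the leakage grows, which is why the grouping step matters for the virtual-sector interpretation.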
The second-order channel-statistics-based RF beamformer
${\mathbf{F}}^{AB}$ remains the same across multiple coherence blocks which gives the effective instantaneous channel between BS and user
k as
with
${\mathbf{h}}_{n,{g}_{k},eff}\in {\mathbb{C}}^{{N}_{RF,g}\times 1}$. Therefore, channel-statistics-based CSI significantly reduces the feedback overhead for each user; with instantaneous CSI, each user would otherwise have to send a channel estimate of size
${N}_{t}\times 1$ on the uplink channel. The covariance of the effective channel
${\mathbf{h}}_{n,{g}_{k},eff}^{H}$ is given by using (
13) as,
The analog beamformer consists of columns of the DFT matrix, which can be easily implemented by a phase-shifter network. Therefore,
${\mathbf{F}}_{n,g}^{AB}$ can be obtained by eigenvalue decomposition of the channel covariance matrix. With group-wise hybrid beamforming, the received signal
${\mathbf{y}}_{g,n}$ for group
g in subchannel
n becomes
and the received signal of user
k in group
g in subchannel
n is given by
The received signal-to-interference-plus-noise ratio (SINR) at user
k in group
g and subchannel
n is given by
The spectral efficiency of user
k in group
g and subchannel
n is expressed as
where
${\mathsf{\Psi}}_{{g}_{k},n}$ is the binary variable such that it is equal to 1 if user
k is selected in group
g in the subchannel
n. To achieve a balanced tradeoff between throughput and fairness [
32], we use proportional fairness (PF) based throughput maximization. We define the per-user proportional fairness metric as
where
${\overline{R}}_{{g}_{k},n}\left(t\right)$ is average throughput (moving average) over a past window of length
${T}_{w}=1/\alpha $ [
33], as
The large number of antennas in massive MIMO systems enables the use of the eigenmodes of the channel covariance matrix, i.e.,
${\mathbf{B}}_{k,n}$ comprises columns of the DFT matrix [
6]. DFT-based beams with
${N}_{t}=16$ and
${N}_{t}=64$ are shown in
Figure 4a,b, respectively.
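The DFT beams referred to above can be generated and inspected with a few lines. The sketch assumes a half-wavelength ULA and a unit-norm array response, so that column $m$ of the ${N}_{t}\times {N}_{t}$ DFT matrix points at $\sin \theta =2m/{N}_{t}$ for $m<{N}_{t}/2$; the function names are hypothetical.

```python
import numpy as np

def dft_matrix(n_t):
    """Unitary n_t x n_t DFT matrix; each column is one candidate beam."""
    n = np.arange(n_t)
    return np.exp(1j * 2 * np.pi * np.outer(n, n) / n_t) / np.sqrt(n_t)

def beam_gain(beam, sin_theta, spacing=0.5):
    """Normalised power gain of `beam` towards the direction with sine `sin_theta`."""
    n = np.arange(len(beam))
    a = np.exp(1j * 2 * np.pi * spacing * n * sin_theta) / np.sqrt(len(beam))
    return abs(np.vdot(a, beam)) ** 2

if __name__ == "__main__":
    for n_t in (16, 64):
        F = dft_matrix(n_t)
        grid = np.linspace(-1, 1, 2001)
        pattern = np.array([beam_gain(F[:, 2], s) for s in grid])
        # The main lobe (null to null) spans 4 / n_t in sin(theta) for this spacing.
        print(n_t, "peak at sin(theta) =", grid[np.argmax(pattern)])
```

Each beam has unit gain in its own direction and exact nulls in the directions of the other DFT beams, and the main lobe narrows in proportion to $1/{N}_{t}$, consistent with the narrower beams for ${N}_{t}=64$.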
The beam steering matrix
${\mathbf{B}}_{k,n}$ consists of selected columns of
${N}_{t}\times {N}_{t}$ DFT matrix
${\Delta}_{{N}_{t}}$ such that
where
${\Delta}_{{N}_{t}}=[{\mathbf{b}}_{1},{\mathbf{b}}_{2},\dots ,{\mathbf{b}}_{{N}_{t}}]$ contains all eigenmodes and
${\mathbf{Y}}_{n}$ is an
${N}_{t}\times {r}_{R}$ binary beam selection matrix, where
${r}_{R}$ is the rank of the channel covariance matrix. The selection matrix
${\mathbf{Y}}_{n}\in {\{0,1\}}^{{N}_{t}\times {r}_{R}}$ has exactly one nonzero entry in each column and at most one in each row, such that
${\sum}_{i}{\left[{\mathbf{Y}}_{n}\right]}_{i,j}=1\phantom{\rule{1.em}{0ex}}\forall j$. We now formulate the optimization problem for joint spatio–radio resource allocation and precoder design, with the objective of maximizing the utility function, as
The above optimization problem is a mixed integer programming (MIP) problem with coupling between the digital and RF precoders in the power constraint. This MIP problem is NP-hard [
14].
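The proportional-fairness weighting that enters the utility can be sketched as follows. The average throughput is assumed to follow the usual exponential moving average with forgetting factor $\alpha =1/{T}_{w}$, and the PF metric is taken as the ratio of instantaneous to average rate; these are the standard textbook forms and are stated here as assumptions. All function names are hypothetical.

```python
def update_average_rate(avg_rate, inst_rate, alpha):
    """Exponential moving average over an effective window of 1 / alpha slots."""
    return (1.0 - alpha) * avg_rate + alpha * inst_rate

def pf_metric(inst_rate, avg_rate, eps=1e-9):
    """Proportional-fairness metric: instantaneous rate relative to past average."""
    return inst_rate / max(avg_rate, eps)

def select_user(inst_rates, avg_rates):
    """Index of the user with the largest PF metric in the current slot."""
    metrics = [pf_metric(r, a) for r, a in zip(inst_rates, avg_rates)]
    return max(range(len(metrics)), key=metrics.__getitem__)

if __name__ == "__main__":
    # User 1 has the lower instantaneous rate but has been served less so far.
    print(select_user([4.0, 3.0], [4.0, 1.0]))
```

A user with a modest instantaneous rate can still be scheduled when its past average is low, which is how the metric trades throughput against fairness.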
3. Relaxed-Convex Transformation
Although the above MIP optimization problem is NP-hard, it can be transformed into a relaxed convex optimization problem by (i) relaxing the binary integer constraints to real numbers between 0 and 1 [
14], and (ii) decoupling the digital and analog precoders. For decoupling, we use the change of variables
${\mathbf{F}}_{n}^{DB}={\left({{\mathbf{F}}^{AB}}^{H}{\mathbf{F}}^{AB}\right)}^{-\frac{1}{2}}{\tilde{\mathbf{F}}}_{n}^{DB}$, where
${\tilde{\mathbf{F}}}_{n}^{DB}$ is the
equivalent digital precoder [
34]. Thus, the problem in (
42) can be written as
For a given RF precoder
${\mathbf{F}}^{AB}$ and perfect CSI at the base station, the digital precoder can be obtained by conventional MU-MIMO techniques, e.g.,
zero-forcing and
block diagonalization [
15].
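For a fixed analog precoder, a zero-forcing digital precoder can be sketched as the pseudoinverse of the effective channel. The example assumes single-antenna users, an effective channel ${\mathbf{H}}^{H}{\mathbf{F}}^{AB}$ with full row rank, and a normalisation of the hybrid precoder to a squared Frobenius norm of $K$; the function name and the normalisation are illustrative choices.

```python
import numpy as np

def zf_digital_precoder(H, F_AB):
    """ZF precoder on the effective channel.

    H    : N_t x K matrix whose columns are the user channel vectors
    F_AB : N_t x N_RF analog precoder
    Returns (F_DB, H_eff) with F_DB of size N_RF x K.
    """
    H_eff = H.conj().T @ F_AB                 # K x N_RF effective channel
    F_DB = np.linalg.pinv(H_eff)              # right inverse when H_eff has full row rank
    # Assumed normalisation: ||F_AB F_DB||_F^2 = K.
    F_DB = F_DB * np.sqrt(H.shape[1]) / np.linalg.norm(F_AB @ F_DB, "fro")
    return F_DB, H_eff
```

The product of the effective channel and the returned precoder is a scaled identity, so every user sees its own stream and no inter-user interference; the price is a power penalty when the effective channel is ill-conditioned.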
For the digital precoder, we adopt the ZF precoder so that there is no multiuser interference among the users in each group. The beamforming vector of user k is chosen to be orthogonal to the effective channel vectors of all the other users in the group. Zero-forcing is a suboptimal but low-complexity approach within the class of linear precoders. The ZF precoder is asymptotically optimal among all downlink beamforming techniques in the high-SNR region. It guarantees high spectral efficiency for large-scale antenna arrays with low-complexity linear processing [
35]. For
${N}_{t}\gg {N}_{r}$, it has been shown that zero-forcing beamforming can achieve up to
$98\%$ of the nonlinear dirty paper coding (DPC) capacity [
36]. To make this paper self-contained, we briefly describe block diagonalization. Since the digital precoder is used to mitigate the multiuser interference within a group and all groups are independent, we omit the subscript
g. First we consider the downlink transmission over one subchannel
n with the general case of BS with
${N}_{t}$ antennas and
${K}_{n}$ users with
${n}_{k}$ antennas each, such that
${\sum}_{k=1}^{K}{n}_{k}={N}_{r}$. The downlink channel on the subchannel
n is expressed as
${N}_{r}\times {N}_{t}$ matrix,
For user
k, we define the following
$({N}_{r}-{n}_{k})\times {N}_{t}$ channel matrix
Let the rank of ${\mathbf{H}}_{k,n,eff}^{\prime}$ be denoted by ${r}_{k,n}^{\prime}$; then the nullspace of ${\mathbf{H}}_{k,n,eff}^{\prime}$ has dimension ${N}_{t}-{r}_{k,n}^{\prime}\ge {n}_{k}$. Performing the SVD of each user’s channel matrix ${\mathbf{H}}_{k,n,eff}^{\prime}$ in subchannel n leads to the following
where ${\mathbf{U}}_{k,n}^{\prime}$ and ${\mathbf{V}}_{k,n}^{\prime}$ are unitary matrices. The columns of ${\mathbf{U}}_{k,n}^{\prime}$ are the left singular vectors of ${\mathbf{H}}_{k,n,eff}^{\prime}$, the columns of ${\mathbf{V}}_{k,n}^{\prime}$ are the right singular vectors of ${\mathbf{H}}_{k,n,eff}^{\prime}$, and ${\mathsf{\Sigma}}_{k,n}^{\prime}$ is a diagonal matrix whose diagonal entries are the singular values of ${\mathbf{H}}_{k,n,eff}^{\prime}$. In the last equality of (46), ${\mathbf{V}}_{k,n}^{{}^{\prime}\left(1\right)}$ holds the first ${r}_{k,n}^{{}^{\prime}}$ right singular vectors of ${\mathbf{H}}_{k,n,eff}^{\prime}$ and ${\mathbf{V}}_{k,n}^{{}^{\prime}\left(0\right)}$ contains the remaining ${N}_{t}-{r}_{k,n}^{{}^{\prime}}$ right singular vectors of ${\mathbf{H}}_{k,n,eff}^{\prime}$, which span the nullspace of ${\mathbf{H}}_{k,n,eff}^{\prime}$. The columns of ${\mathbf{V}}_{k,n}^{{}^{\prime}\left(0\right)}$ are well suited for the beamforming matrix ${\mathbf{F}}_{k,n}^{DB}$ of user k, because they cause zero interference at the other UEs. Usually, ${\mathbf{V}}_{k,n}^{{}^{\prime}\left(0\right)}$ has more than ${n}_{k}$ columns; therefore, we use linear combinations of the columns of ${\mathbf{V}}_{k,n}^{{}^{\prime}\left(0\right)}$ to form at most ${n}_{k}$ columns.
where ${\mathbf{H}}_{k,n,eff}{\mathbf{V}}_{k,n}^{{}^{\prime}\left(0\right)}$ is the effective channel of user k restricted to the interference-free subspace spanned by the columns of ${\mathbf{V}}_{k,n}^{{}^{\prime}\left(0\right)}$. The right-hand side of the equation is the SVD of ${\mathbf{H}}_{k,n,eff}{\mathbf{V}}_{k,n}^{{}^{\prime}\left(0\right)}$, where ${\mathsf{\Sigma}}_{k,n}$ is an ${r}_{k,n}\times {r}_{k,n}$ diagonal matrix and ${\mathbf{V}}_{k,n}^{\left(1\right)}$ contains the ${r}_{k,n}$ right singular vectors of ${\mathbf{H}}_{k,n,eff}{\mathbf{V}}_{k,n}^{{}^{\prime}\left(0\right)}$ associated with nonzero singular values. Equation (47) can also be written as,
The transmit beamforming matrix that maximizes the throughput of user k without any inter-user interference is obtained as,
The transmit digital beamforming matrix for subchannel n is defined as
where ${{\mathbf{F}}_{k,n}^{B}}^{H}{\mathbf{F}}_{k,n}^{B}=\mathbf{I},\phantom{\rule{1.em}{0ex}}1\le k\le {K}_{n}$, and ${\mathbf{P}}_{n}$ is a block diagonal matrix whose elements scale the power allocated to each interference-free virtual subchannel for all UEs. The receive combining matrix for user k is ${\mathbf{U}}_{k,n}$ [37].
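As a rough illustration of the block diagonalization steps above, the following NumPy sketch computes, for each user, a nullspace basis of the other users' stacked channels and then the dominant right singular vectors of the resulting interference-free channel. It is a minimal sketch under assumptions: the function name `block_diagonalization`, the list input `H_list` (one ${n}_{k}\times {N}_{t}$ effective channel per user on a single subchannel), and the rank tolerance `tol` are hypothetical, and the power allocation ${\mathbf{P}}_{n}$ is omitted.

```python
import numpy as np

def block_diagonalization(H_list, tol=1e-10):
    """Return per-user precoders and combiners (hypothetical helper).

    H_list[k] is the (n_k x N_t) effective channel of user k.
    """
    precoders, combiners = [], []
    for k, H_k in enumerate(H_list):
        # Stack the channels of all other users: (N_r - n_k) x N_t.
        H_others = np.vstack([H for j, H in enumerate(H_list) if j != k])
        # Full SVD: rows of Vh beyond the rank span the nullspace of H_others.
        _, s, Vh = np.linalg.svd(H_others, full_matrices=True)
        rank = int(np.sum(s > tol * s[0]))
        V0 = Vh[rank:].conj().T                 # N_t x (N_t - rank)
        # SVD of the interference-free equivalent channel of user k.
        U_k, s_k, Vh_k = np.linalg.svd(H_k @ V0, full_matrices=False)
        r_k = int(np.sum(s_k > tol * s_k[0]))
        V1 = Vh_k[:r_k].conj().T                # (N_t - rank) x r_k
        precoders.append(V0 @ V1)               # orthonormal columns
        combiners.append(U_k[:, :r_k])
    return precoders, combiners
```

With three users of two antennas each and ${N}_{t}=8$, every returned precoder has shape 8 x 2, and the product of any other user's channel with it is numerically zero.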
In the case of single-antenna users, complete diagonalization is achieved entirely at the BS by channel inversion, i.e., ${\mathbf{F}}_{n}^{DB}={\left({\mathbf{H}}_{n,eff}^{H}\right)}^{\dagger}$, where ${\left({\mathbf{H}}_{n,eff}^{H}\right)}^{\dagger}$ is the pseudoinverse of ${\mathbf{H}}_{n,eff}^{H}$ [38].
where ${\beta}_{n}$ is a normalization factor chosen to satisfy the power constraint and is given by
Using the definition of the pseudoinverse, we get,
where $\varrho $ is the regularization parameter, $\varrho =0$ for ZF precoding and $\varrho =\frac{{N}_{s}}{{N}_{RF}\eta}$ for regularized ZF, with $\eta ={P}_{T,n}/{\sigma}^{2}$. Lastly, introducing the group subscript again, the SINR of user ${g}_{k}$ is given by
and the PF sum rate is calculated as
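For single-antenna users, the regularized channel inversion described above can be sketched numerically as follows. This is an illustrative sketch rather than the exact expressions of this section: the names `rzf_precoder` and `sinr_per_user`, the convention that the columns of `H_eff` are the users' effective channels, the normalization of the precoder to a total power `total_power`, and the textbook SINR with equal power per stream are all assumptions. The PF weighting is not modelled.

```python
import numpy as np

def rzf_precoder(H_eff, total_power, rho=0.0):
    """Regularized channel inversion (hypothetical helper).

    H_eff : (N_RF x K) matrix whose k-th column is the effective channel of
            user k, so the noiseless received vector is H_eff^H F s
            (assumed convention).
    rho   : regularization parameter; rho = 0 gives plain zero-forcing.
    """
    K = H_eff.shape[1]
    gram = H_eff.conj().T @ H_eff + rho * np.eye(K)
    F = H_eff @ np.linalg.inv(gram)      # equals pinv(H_eff^H) when rho = 0
    # Scale so that the squared Frobenius norm equals the power budget.
    beta = np.sqrt(total_power / np.trace(F @ F.conj().T).real)
    return beta * F

def sinr_per_user(H_eff, F, noise_var):
    """Textbook SINR with one stream per user (assumed form)."""
    G = np.abs(H_eff.conj().T @ F) ** 2  # G[k, j]: power of stream j at user k
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    return signal / (interference + noise_var)
```

With `rho=0.0`, the product `H_eff.conj().T @ F` is diagonal, so the interference term vanishes. An unweighted sum rate can then be obtained as `np.sum(np.log2(1 + sinr_per_user(H_eff, F, noise_var)))`.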
5. Machine Learning: K-Means Based Optimal Users Grouping for Analog Beamforming
In this section, we use a machine learning technique to group the users. Then, DFT-based fixed switched beams are used to realize the analog beamforming matrix. The joint users scheduling and hybrid beamforming architecture with ML-based users grouping is shown in Figure 6.
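To make the notion of DFT-based fixed switched beams concrete, the sketch below builds an ${N}_{t}\times {N}_{t}$ DFT codebook whose columns are constant-modulus candidate beams and picks a subset of columns as the analog beamformer. The function names, the unit-norm scaling, and the example beam indices are illustrative assumptions; the rule by which beams are actually selected is not reproduced here.

```python
import numpy as np

def dft_codebook(n_t):
    """n_t x n_t DFT matrix with unit-norm, constant-modulus columns."""
    idx = np.arange(n_t)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n_t) / np.sqrt(n_t)

def analog_beamformer(codebook, beam_indices):
    """Select the given columns (beams) of the codebook."""
    return codebook[:, list(beam_indices)]

codebook = dft_codebook(64)
F_AB = analog_beamformer(codebook, [0, 5, 12])  # hypothetical beam choice
```

Because the DFT columns are mutually orthogonal, any subset of distinct beams gives an analog beamformer with orthonormal columns.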
Machine learning algorithms can broadly be divided into two main categories, namely supervised and unsupervised learning algorithms. The former class of algorithms learns by training on labeled input examples, called the training dataset, $\{({x}^{\left(1\right)},{y}^{\left(1\right)}),({x}^{\left(2\right)},{y}^{\left(2\right)}),({x}^{\left(3\right)},{y}^{\left(3\right)}),\dots ,({x}^{\left(m\right)},{y}^{\left(m\right)})\}$, where the ${i}^{th}$ example $({x}^{\left(i\right)},{y}^{\left(i\right)})$ consists of the ${i}^{th}$ instance of the feature vector ${x}^{\left(i\right)}$ and the corresponding label ${y}^{\left(i\right)}$. Given a labeled training dataset, these algorithms try to find the decision boundary that separates the positive and negative labeled examples by fitting a hypothesis to the input dataset. Unsupervised machine learning algorithms, on the other hand, are given an unlabeled input dataset. These algorithms are used for extracting information or features from the dataset. These features might be related, but not confined, to the underlying structures or patterns in the input data, relationships among data items, grouping/clustering of data items, etc. Discovered features are meant to provide a deeper insight into the input dataset that can subsequently be exploited for achieving specific goals. Clustering algorithms form an important part of unsupervised learning, where the input examples are grouped into two or more separate clusters based on some features. The K-Means (KM) algorithm is probably the most popular clustering algorithm. It is an iterative algorithm that starts with a set of initial centroids given to it as input. During each iteration, it performs the following two steps.
Assign cluster: For every user, the algorithm computes the distance between the user and every centroid. The user is then associated with the cluster that has the closest centroid. During this step, a user might change its association from one cluster to another.
Recompute centroids: Once all users have been associated with their respective clusters, the new centroid position of every cluster is calculated.
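The two steps can be sketched in NumPy as follows. The function names are hypothetical, user locations are assumed to be rows of an (m, 2) array, and, as a choice of this sketch only, a cluster that receives no users keeps its previous centroid.

```python
import numpy as np

def assign_clusters(users, centroids):
    """Assign cluster: index of the closest centroid for every user."""
    # Pairwise Euclidean distances, shape (m, k).
    dist = np.linalg.norm(users[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(dist, axis=1)

def recompute_centroids(users, labels, centroids):
    """Recompute centroids: mean location of the users in each cluster."""
    new_centroids = centroids.copy()
    for j in range(len(centroids)):
        members = users[labels == j]
        if len(members) > 0:              # empty cluster keeps its centroid
            new_centroids[j] = members.mean(axis=0)
    return new_centroids
```

Calling the two functions alternately until the centroids stop moving gives one run of the KM algorithm.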
Figure 7a depicts how the cluster centroids keep moving across iterations until the system stabilizes, for an example network of thirty users grouped into five clusters. The system becomes stable in only five iterations, and the final cluster layout is shown in Figure 7b.
Let us define the following notations to be used later in this section.
Now the cost function J can be defined as
with the following optimization objective function.
It may be pointed out that Equation (57) allows us to compare multiple clustering layouts based on their cost and select the one with the lowest cost. The above optimization objective function constitutes a nonconvex and NP-hard problem because it has many possible local minima and an integer optimization variable ${c}^{\left(i\right)}$. The KM algorithm heuristically optimizes this function by alternating minimization. It iterates between the two steps (assign cluster and recompute centroids) described above.
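As an illustration of how two clustering layouts can be compared by cost, the sketch below evaluates the usual K-Means distortion, i.e., the mean squared distance between each user and its assigned centroid. That this matches the cost function J of this section is an assumption, and the function name `kmeans_cost` and the toy coordinates are hypothetical.

```python
import numpy as np

def kmeans_cost(users, labels, centroids):
    """Mean squared distance between each user and its assigned centroid."""
    diffs = users - centroids[labels]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

# Four users on the corners of a 10 x 2 rectangle (toy data).
users = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

# Layout A groups the users by their horizontal position.
cost_a = kmeans_cost(users, np.array([0, 0, 1, 1]),
                     np.array([[0.0, 1.0], [10.0, 1.0]]))
# Layout B groups them by their vertical position.
cost_b = kmeans_cost(users, np.array([0, 1, 0, 1]),
                     np.array([[5.0, 0.0], [5.0, 2.0]]))
```

Layout A has the lower cost, so it would be the one retained when the two layouts are compared.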
In this section, we use the KM algorithm for optimal clustering of m users competing for resources in a particular cell. The clustering is performed based on their geographic location; thus, our input dataset $\{{u}^{\left(1\right)},{u}^{\left(2\right)},{u}^{\left(3\right)},\dots ,{u}^{\left(m\right)}\}$ has m vectors ${u}^{\left(i\right)},1\le i\le m$, each consisting of the location coordinates of the ith user. For the sake of simplicity, we assume these users are deployed in a two-dimensional area, i.e., a plane, so that ${u}^{\left(i\right)}=({x}_{1}^{\left(i\right)},{x}_{2}^{\left(i\right)})$ is an ordered pair of location coordinates. Our clustering algorithm is summarized in Algorithm 2.
The proposed algorithm takes the location coordinates of m users as input. It also takes two numbers, $mi{n}_{k}$ and $ma{x}_{k}$, as additional input. The algorithm outputs the best number of clusters, k, such that $mi{n}_{k}\le k\le ma{x}_{k}$, and the corresponding members of each cluster. It starts with $k=mi{n}_{k}$ and randomly selects k user locations as the initial centroids (line 6). It assigns the closest centroid to each user (line 8) and then computes new centroids by calculating the center/average location of all nodes in each cluster (line 11). In effect, the centroid locations keep moving in successive iterations. It repeats the above two steps until the change in centroid positions is zero or negligible. We repeat the test $ma{x}_{t}$ times, with a new set of randomly chosen initial centroids every time. During every test, the discovered centroids, the corresponding centroid assignment to users, and the cost are saved (lines 14–16) for later comparison. After running the loop $ma{x}_{t}$ times, we select and store the best k centroids, resulting from the test with the lowest cost, while discarding the rest (lines 19–21). The same is repeated for the next value of k, i.e., $k=k+1$, until $k>ma{x}_{k}$. At the end, we have $cnt=ma{x}_{k}-mi{n}_{k}+1$ vectors ${\mu}_{k}$, one for each value of k, with the corresponding assignment vector ${\mathbf{a}}_{k}$ and cost ${c}_{k}$. Finally, we choose the vector $\mu $ having the lowest cost and the corresponding assignment vector $\mathbf{a}$ among the $cnt$ stored cases. This gives the best number of clusters and the corresponding centroids found by the algorithm.
Algorithm 2 K-Means based users grouping algorithm.
 1: $cnt=0$
 2: for $k=mi{n}_{k}:ma{x}_{k}$ do
 3:   $cnt=cnt+1$
 4:   for $t=1:ma{x}_{t}$ do
 5:     repeat
 6:       Randomly choose initial k centroids ${\mu}_{1},{\mu}_{2},{\mu}_{3},\dots ,{\mu}_{k}$
 7:       for $i=1:m$ do
 8:         ${a}^{\left(i\right)}=j,\phantom{\rule{1.em}{0ex}}1\le j\le k$, such that ${\mu}_{j}$ is the centroid closest to ${u}^{\left(i\right)}$
 9:       end for
10:       for $l=1:k$ do
11:         ${\mu}_{l}=$ mean of all users/points ${u}^{\left(i\right)}$ assigned to the $l$th centroid
12:       end for
13:     until converges
14:     ${\mu}^{\left(t\right)}=({\mu}_{1},{\mu}_{2},{\mu}_{3},\dots ,{\mu}_{k})$
15:     ${\mathbf{a}}^{\left(t\right)}=({a}^{\left(1\right)},{a}^{\left(2\right)},{a}^{\left(3\right)},\dots ,{a}^{\left(m\right)})$
16:     ${c}^{\left(t\right)}=cost({\mu}_{1},{\mu}_{2},{\mu}_{3},\dots ,{\mu}_{k})$
17:   end for
18:   $idx=\arg\min \{{c}^{\left(t\right)},1\le t\le ma{x}_{t}\}$
19:   ${\mathit{\mu}}_{\mathbf{k}}^{\left(k\right)}={\mathit{\mu}}^{\left(idx\right)},1\le idx\le ma{x}_{t}$
20:   ${\mathbf{a}}_{\mathbf{k}}^{\left(k\right)}={\mathbf{a}}^{\left(idx\right)},1\le idx\le ma{x}_{t}$
21:   ${c}_{k}^{\left(k\right)}={c}^{\left(idx\right)},1\le idx\le ma{x}_{t}$
22: end for
23: $index=\arg\min \{{c}_{k}^{\left(k\right)},1\le k\le cnt\}$
24: $\mathit{\mu}={\mathit{\mu}}_{\mathbf{k}}^{\left(index\right)},1\le index\le cnt$
25: $\mathbf{a}={\mathbf{a}}_{\mathbf{k}}^{\left(index\right)},1\le index\le cnt$
26: $n=index$
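A compact Python sketch of Algorithm 2 is given below for illustration. It follows the textual description, in which the initial centroids are drawn once per test and the assign/recompute steps are then repeated until convergence. The names `kmeans_once` and `group_users`, the iteration cap `max_iter`, the convergence tolerance `tol`, the random seed, and the use of the mean squared distance as the cost are assumptions of this sketch.

```python
import numpy as np

def kmeans_once(users, k, rng, max_iter=100, tol=1e-9):
    """One K-Means test from a random start; returns (centroids, labels, cost)."""
    # Initial centroids: k distinct user locations chosen at random.
    centroids = users[rng.choice(len(users), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        dist = np.linalg.norm(users[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)                  # assign cluster
        new_centroids = centroids.copy()
        for j in range(k):                            # recompute centroids
            if np.any(labels == j):
                new_centroids[j] = users[labels == j].mean(axis=0)
        moved = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if moved <= tol:
            break
    # Final assignment and cost for the converged centroids.
    dist = np.linalg.norm(users[:, None, :] - centroids[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    cost = float(np.mean(dist[np.arange(len(users)), labels] ** 2))
    return centroids, labels, cost

def group_users(users, min_k, max_k, max_t, seed=0):
    """Keep the lowest-cost test for each k, then the lowest cost over all k."""
    rng = np.random.default_rng(seed)
    best = None
    for k in range(min_k, max_k + 1):
        tests = [kmeans_once(users, k, rng) for _ in range(max_t)]
        candidate = min(tests, key=lambda result: result[2])
        if best is None or candidate[2] < best[2]:
            best = candidate
    return best
```

For an (m, 2) array `users`, a call such as `centroids, labels, cost = group_users(users, min_k=2, max_k=8, max_t=10)` returns the selected centroids, the cluster index of each user, and the corresponding cost. The number of groups is `len(centroids)`.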

After group formation, the BS sends this information to all users, and the users use it to form reduced average statistical CSI. For example, a user in a group of 5 needs to send the average statistical CSI only after $1/5$ of the regular feedback interval time.
6. Simulation Results
Consider the downlink of a multiuser massive MIMO single cell with three 120-degree sectors. We neglect inter-sector interference and focus on a single 120-degree sector served by a ULA of ${N}_{t}=64$ isotropic antennas at the BS. The users grouping forms virtual sectors inside the 120-degree physical sector.
In the simulations, the results are obtained by averaging over 100 drops. In each drop, we randomly generate spatial correlation matrices ${\mathbf{R}}_{g}$. For each realization of the spatial correlation matrix ${\mathbf{R}}_{g}$, we simulate 1000 realizations of the instantaneous channel $\mathbf{H}$.
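The inner loop of such a Monte Carlo setup, drawing many instantaneous channels for one correlation matrix, might look like the sketch below. It assumes the common correlated Rayleigh model in which each channel vector is zero-mean complex Gaussian with covariance $\mathbf{R}$. The function name `sample_channels` and the toy 2 x 2 covariance are hypothetical, and the way ${\mathbf{R}}_{g}$ itself is generated in each drop is not modelled.

```python
import numpy as np

def sample_channels(R, n_samples, rng):
    """Draw n_samples vectors h ~ CN(0, R) as the columns of the result."""
    eigval, eigvec = np.linalg.eigh(R)
    eigval = np.clip(eigval, 0.0, None)       # guard against tiny negatives
    root = eigvec * np.sqrt(eigval)           # root @ root^H equals R
    n = R.shape[0]
    # Unit-variance circularly symmetric complex Gaussian samples.
    w = (rng.standard_normal((n, n_samples))
         + 1j * rng.standard_normal((n, n_samples))) / np.sqrt(2)
    return root @ w

rng = np.random.default_rng(0)
R = np.array([[1.0, 0.5], [0.5, 1.0]])        # toy correlation matrix
H = sample_channels(R, 1000, rng)             # 1000 realizations for one R
```

Averaging a performance metric over the columns of `H`, and then over drops, mirrors the two-level averaging described above.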
The joint spatio–radio scheduling and hybrid precoder scheme first forms the user groups and then selects the beams that maximize the sum-rate through the downlink training process. Secondly, it calculates the ZF-based digital precoder using low-dimensional effective channel feedback from the users.
Figure 8 shows the CDF of the nonzero eigenvalues of the channel covariance matrix. Notice that approximately $50\%$ of the nonzero eigenvalues are close to zero. The sum-rate increases as the number of groups increases, at the cost of increased feedback overhead, as shown in Figure 9. Using the machine learning technique in Section 5, we can get the optimal number of groups from channel covariance feedback. This results in an increased sum-rate with substantially reduced feedback. The optimal $G=3$ gives a $27.6\%$ increase in sum-rate compared to $G=1$ and a $62.5\%$ decrease in feedback overhead compared to $G=8$. A performance comparison of ML-based users grouping with previous work cannot be provided because there is no previous work that uses an ML-based technique to reduce the CSI feedback overhead in massive MIMO systems. Many papers use users grouping in massive MIMO hybrid beamforming [3,5,39,40], but they do not utilize ML-based users grouping. Therefore, we compare our proposed solution with two benchmarks: full-CSI ($G=K$) and coarse-CSI ($G=1$).
Figure 10 shows the sum-rate versus the number of users at $10$ dB SNR. For a fixed number of groups $G=3$, increasing the number of users increases the number of users per group. Due to the fixed number of groups, the feedback overhead remains constant. The sum-rate increases with the number of users because we assumed ${N}_{RF}={N}_{s}=K$. If we fix the number of RF chains to some hardware limit, then the sum-rate will saturate at a specific number of users. It can be seen in Figure 10 that increasing the number of users per group decreases the slope of the sum-rate for limited-CSI schemes. This decrease is due to the increase in intra-group interference.
The sum-rate also depends on the number of RF chains, but this dependence is not linear, as shown in Figure 11. This figure shows the variation of the sum-rate with the number of RF chains ${N}_{RF}$ when ${N}_{s}=8$, $K=8$, ${N}_{t}=64$, and $SNR=10$ dB. The sum-rate increases with the number of RF chains because more RF chains yield a better-conditioned effective channel matrix. It can be seen that the spectral efficiency does not increase monotonically with ${N}_{RF}$ and saturates at ${N}_{RF}={N}_{t}$, where hybrid precoding reduces to pure digital precoding. The increase in spectral efficiency with the number of RF chains comes at the cost of higher-dimensional effective channel feedback overhead and higher power consumption in the RF chains.
The spectral efficiency of the proposed scheme also varies with the number of transmit antennas, as shown in Figure 12. In the figure, ${N}_{RF}={N}_{s}=K=8$, $SNR=10$ dB, and the BS has a ULA of 16, 64, 128 or 256 antennas. The performance gain increases with the number of transmit antennas because a larger antenna array increases the resolution of the transmit beams (as also depicted in Figure 4) and, hence, decreases the potential for inter-beam interference.
In general, the spectral efficiency is a function of SNR. For $SNR=10$ dB, our ML-based users grouping and hybrid beamforming scheme gives a $27.6\%$ higher sum-rate at the cost of $33.3\%$ extra feedback overhead compared to the coarse-CSI case (G = 1). Compared to the full-CSI case (G = K), our proposed scheme reduces the feedback by $62.5\%$ at the cost of a $25.2\%$ reduction in sum-rate.