Spectrum Allocation and User Scheduling Based on Combinatorial Multi-Armed Bandit for 5G Massive MIMO

As a key 5G technology, massive multiple-input multiple-output (MIMO) can effectively improve system capacity and reduce latency. This paper proposes a user scheduling and spectrum allocation method based on combinatorial multi-armed bandit (CMAB) for a massive MIMO system. Compared with traditional methods, the proposed CMAB-based method can avoid channel estimation for all users, significantly reduce pilot overhead, and improve spectral efficiency. Specifically, the proposed method is a two-stage method; in the first stage, we transform the user scheduling problem into a CMAB problem, with each user being referred to as a base arm and the energy of the channel being considered a reward. A linear upper confidence bound (UCB) arm selection algorithm is proposed. It is proved that the proposed user scheduling algorithm experiences logarithmic regret over time. In the second stage, by grouping the statistical channel state information (CSI), such that the statistical CSI of the users in the angular domain in different groups is approximately orthogonal, we are able to select one user in each group and allocate a subcarrier to the selected users, so that the channels of users on each subcarrier are approximately orthogonal, which can reduce the inter-user interference and improve the spectral efficiency. The simulation results validate that the proposed method has a high spectral efficiency.


Introduction
The development of new applications requires that the data rate be significantly increased.However, a significant increase in data traffic represents a huge challenge to the scarce spectrum resources [1,2].Massive MIMO has become a key technology in 5G communication system, which greatly improves the system capacity and reduces latency by equipping base stations (BSs) and/or users with a large number of antennas [3,4].Millimeter wave technology has received significant attention due to its large bandwidth.The combination of millimeter wave and massive MIMO can effectively improve the bandwidth and spectrum utilization of the system [5,6].On the one hand, the millimeter wavelength is very short, which is conducive to the deployment of a large number of antennas in the system.On the other hand, antenna arrays in massive MIMO systems can bring greater freedom to the system, and using simple linear precoding at the BS can enable the system to achieve a high transmission performance [7,8].Thus, massive MIMO can compensate for the propagation loss of millimeter wave channels.Therefore, millimeter wave technology is very suitable for massive MIMO systems.In order to fully utilize the gains brought by millimeter wave massive MIMO, it is usually necessary to obtain the channel state information (CSI).
If instantaneous CSI is not available, then non-coherent (NC) techniques are considered key solutions for massive MIMO systems [9][10][11][12].The designs of NC techniques for massive MIMO systems are mainly focused on energy detection [9,10] or phase detection [11,12].Manolakos et al. proposed a soft-output detector where the transmitter modulated information during symbol amplitude determination, and the receiver obtained the average received energy [9].Ngo et al. calculated the marginal posterior of each transmitted signal using the channel distribution and proposed a novel soft-output multi-user detector [10].Baeza et al. proposed a new constellation design for a multi-user noncoherent large-scale single-input multiple-output uplink system based on M-ary differential phaseshift keying [11].Additionally, non-orthogonal constellation-based schemes to reduce the complexity and increase the spectral efficiency were proposed in [12].If instantaneous CSI is available, then massive MIMO systems are said to be coherent and have high spectral efficiency due to the available instantaneous CSI [10].In such scenarios, the symbols are usually from a scalar constellation, such as the quadrature amplitude modulation.In this paper, we consider coherent massive MIMO systems.
The massive MIMO system can serve multiple users on a single time and frequency resource.There have been many studies on the design of transmission schemes for singlecarrier massive MIMO systems.Due to the large bandwidth provided by millimeter waves, multiple subcarriers can be utilized to transmit information.How to allocate subcarriers to maximize the spectral efficiency of the system is worth studying.In orthogonal frequency-division multiplexing (OFDM) systems, there are multiple subcarriers to which the spectrum should be allocated.A simple way to achieve this is to randomly allocate the subcarriers to the users.However, this random scheme achieves a relatively low spectral efficiency.Zhang et al. considered a subcarrier allocation method in which the user moves, and each user is allocated one subcarrier [13].In order to further improve the system's spectral efficiency, Anand et al. proposed allocating multiple subcarriers to each user to ensure service quality [14].However, the above studies did not consider the user scheduling problem.
The number of users that a BS can serve in one slot is limited, the system cannot provide services for all candidate users when the number of users is large.Therefore, the user scheduling scheme plays a crucial role in the overall performance of the system [15].In massive MIMO systems, a simple scheduling scheme is a random scheduling scheme; it randomly selects multiple users for communication.This method does not consider channel information and, therefore, demonstrates poor performance.The authors in [16] proposed a sub-optimal user scheduling algorithm, which performs QR decomposition on the channel and schedules based on the results of QR decomposition, not only reducing the computational complexity but also improving the spectral efficiency.Shehata et al. proposed an iterative method to select terminals in turn by using the eigenvalue and condition number of the channel matrix between the BS and terminals [17].Bu et al. considered the user scheduling and spectral allocation problem in massive MIMO systems, and proposed a reinforcement learning-based algorithm to improve the spectral efficiency [18].However, all of these methods require the instantaneous CSI of the terminal.When there are many terminals, it is difficult to obtain the instantaneous CSI of all terminals simultaneously, meaning that these methods are no longer applicable.In addition, due to the slower variation of the statistical CSI compared to instantaneous CSI, user scheduling through statistical CSI can avoid obtaining instantaneous CSI, reduce pilot overhead, and improve spectral efficiency [19].However, the estimation of statistical CSI also requires a large overhead.When the terminal moves fast, the changes in statistical CSI will also become fast, causing the system performance to deteriorate.
This paper proposes a user scheduling and spectrum allocation scheme based on combinatorial multi-armed bandit (CMAB) for a multi-user millimeter wave massive MIMO OFDM system.The proposed CMAB-based scheme avoids the need for channel estimation for all users, which can reduce pilot overhead and improve spectral efficiency.Our approach utilizes the multi-path channel model in millimeter waves [20].The proposed scheme is a two-stage scheme.In the first stage, we transform the user scheduling problem into a CMAB problem, and the upper confidence bound (UCB) strategy is used to train and obtain the scheduled users to avoid channel estimation for each user and to improve the spectral efficiency.Each user is regarded as a base arm in the CMAB problem, and multiple users are regarded as the super arm.To improve the system's spectral efficiency, we aim to maximize the receiving energy.A linear UCB algorithm with low computational complexity is proposed.By setting the UCB value of each base arm, the UCB value of the super arm consists of a linear combination of the UCB value of each base arm, and we select the super arm to maximize the UCB value and the received energy.In addition, it is theoretically proven that the regret of the proposed scheme grows logarithmically over time.In the second stage, the instantaneous CSI estimated in previous slots is used to calculate the statistical CSI, and subcarrier allocation is performed based on the statistical CSI.By grouping the statistical CSI, such that the statistical CSI between different groups is approximately orthogonal, we are able to select one user in each group and allocate one subcarrier to the selected users, resulting in the users in each subcarrier being approximately orthogonal and, thus, improving the spectral efficiency.The simulation results show that the proposed method can significantly improve the spectral efficiency compared with previous methods.

System Model
Consider a millimeter wave uplink massive MIMO OFDM system in a single cell with multiple users, where the BS is equipped with a uniform linear array of antennas serving multiple single-antenna users.The communication bandwidth B is divided into F subcarriers with subcarrier spacing B F .This paper adopts a digital combiner design scheme at the BS [21].Let h k, f ,t be the channel vector between the BS and user k in the t-th slot at the f -th subcarrier and s k, f ,t be the transmitted signal from user k.Then the received signal at the BS at the f -th subcarrier is where W f ,t ∈ C M×1 is the digital combiner in the t-th slot at the f -th subcarrier, and n f ,t is the noise.Each element in n f ,t follows a Gaussian distribution whose mean is zero and variance is σ 2 .
In this paper, we consider a multi-path channel model in millimeter wave communications, and the channel is narrowband time-varying [13].The channel vector between the BS and user k is given by [21] where L k, f ,t is the number of paths of user k at the f -th subcarrier, is the steering vector, and the i-th element of . Usually, the statistical CSI remains unchanged for a long time.Suppose that the statistical CSI remains unchanged in T slots; this means that the angles θ k, f ,l,t remain unchanged in T slots, and the covariance σ 2 k, f ,t also remains unchanged.Then, in T slots, the channel vector in (2) can be transformed into where ρ t (θ k,l ) ∼ CN 0, σ 2 k I , and In the user scheduling problem, the conventional schemes design the user scheduling based on the instantaneous CSI.Then, it is necessary for the BS to obtain the instantaneous CSI of all users in real time.Due to the large number of users, it takes a lot of time to estimate the channel, which will lead to a decrease in the system performance.This paper proposes a method based on multi-armed bandit for user scheduling and proposes a subcarrier allocation method based on statistical CSI.In the process of spectrum allocation and user scheduling, this paper assigns each user a subcarrier and serves U users on each subcarrier.Key notations used in this work are listed in Table 1.

Notations Parameters
B Bandwidth

F
The number of subcarriers The channel vector between the BS and user k in slot t at subcarrier f The transmitted signal from user k in slot t at subcarrier f The digital combiner in slot t at subcarrier f The number of paths of user k at the f -th subcarrier The angle of the l-th path of user k in slot t at subcarrier f ρ The complex path gain

a(•)
The steering vector The variance of the Gaussian noise The power gain of user k in slot t at subcarrier f The power gain of user k The angle of the l-th path of user k in the t-th slot

A i
The action The value of the action in slot t

R i
The reward in slot i The set of selected users in slot t at subcarrier f The channel matrix of the selected users in slot t at subcarrier f

A t
The set of users selected in slot t

X f
The pilot signal at subcarrier f

Y f
The received signal at subcarrier f The set of users selected in slot t ūA t The UCB value of the super arm A t Ri,t The mean reward of action i slot t

D f
The discrete Fourier transform matrix at subcarrier f The channel vector ĥk, f ,t in the angular domain

Notations Parameters R a k
The statistical CSI on the angular domain The chord distance between X and Y U X (U Y ) The eigenvectors corresponding to the non-zero eigenvalues of X(Y)

The Description of the CMAB
The MAB problem [22] revolves around selecting actions with the goal of maximizing rewards and expectations for the selected actions.After each action is executed, a reward is obtained, and the expectation of the reward is called the value of the action.The value of the action at the t-th slot is Q t (A i ), which can be calculated as where R i denotes the reward of the action A i in the i-th slot.When the action A i at the i-th slot is chosen, then Then, it is possible to evaluate the quality Q t (A i ) of actions and develop strategies for selecting actions, including greedy strategies, UCB strategies, etc. [22].Unlike the MAB problem, in the CMAB problem [23], multiple actions need to be selected each time, and the reward expectation for multiple actions needs to be maximized.Each action is called a base arm, while the multiple actions selected each time are called a super arm.A super arm consists of multiple base arms.

The User Scheduling Problem Formulation
In each time slot, we select UF users among K users for communication.Each subcarrier serves U users, and there are F subcarriers in total.The purpose is to enable the selected users to obtain a large spectral efficiency.The spectral efficiency is related to the signal-to-interference-plus-noise ratio (SINR).The received energy is composed of the energy of the received useful signal and the energy of the Gaussian noise.By maximizing the received energy, the energy of the received useful signal will be increased, resulting in improved SINR and spectral efficiency.It should be noted that the received energy is affected by the channel energy.The larger the channel energy, the greater the received energy.Therefore, in order to improve the spectral efficiency of the system, in this paper, we hope that the channel energy of the selected UF users is large.Assuming that the set of selected users in the t-th slot at the subcarrier f is A f ,t , let H f (t) be the channel matrix of the selected users in the t-th slot at the subcarrier f .We define H f (t) 2 F as the equivalent channel energy of the selected users.The problem of maximizing the channel energy can be formulated as max In the above problem, due to the large number of users, it is difficult to estimate the instantaneous CSI of all users simultaneously.Without the CSI of all terminals, traditional optimization-based scheduling schemes are not feasible.This paper adopts a method based on a CMAB to select users.
We first transform the user scheduling problem into a CMAB problem.We define K users as K base arms of the CMAB problem, denoted as 1, 2, • • • , K. Each user (base arm) has a channel energy.As shown in Section 2, the CSI of user k corresponding to the subcarrier f at the t-th slot is h k, f ,t , and let h k, f ,t 2 2 be the reward of the base arm k.In each time slot, the selected users on all subcarriers are marked as a super arm, which is denoted as A t .At this point, the reward for the super arm A t is We select the users in each time slot based on a strategy to maximize the total reward.Assuming that the set of users selected for the t-th slot is A t , and the set of the users selected on the subcarrier f is A f ,t , we have A t = f A f ,t .
When the users scheduled in each subcarrier are obtained, the coherent transmission scheme is shown as follows.Due to the lack of interference between signals on different subcarriers, when estimating the channel, the pilot signal sent by each user on the subcarrier f to the base station is X f .Therefore, the signal received at the base station on the subcarrier where N f (t) is the noise matrix in the t-th slot at subcarrier f , and each element of N f (t) is a Gaussian noise with zero mean and the variance is σ 2 .Then we use the zero forcing criterion to estimate the instantaneous CSI.By multiplying (7) with X H f , we can obtain Then the estimation of H f (t After the channel estimation is obtained, the MMSE detection operation is adopted for the channels on each subcarrier, so as to reduce the interference and improve the spectral efficiency of the system.When using minimum mean square error (MMSE) detection, the combining matrix where (•) † is the pseudo-inverse operation.The spectral efficiency of the system is where w f ,u is the u-th column of W f (t), h f ,u (t) is the u-th column of H f (t), and σ 2 is the variance of the Gaussian noise.

Linear UCB User Scheduling Algorithm
From above, it can be seen that the channels of each user cannot be obtained simultaneously, so traditional optimization methods using the statistical CSI cannot be used.We propose the CMAB method for user scheduling.For the above model, it can be seen that in the multi-path channel model, the reward of user k (basic arm k) is where is a Chi-squared distributed random variable with the degree of freedom 1, and it also follows a sub-exponential distribution with parameter σ 2 k,l , 4σ k,l [24].According to the characteristics of sub-exponential distribution, 2 is also a sub-exponential distribution with parameter From Section 2, σ 2 k,l is normalized in [0, 1].Due to the large number of user combinations, the number of super arms grows exponentially with the number of users.In order to reduce the complexity of the algorithm-considering that the UCB strategy can provide a better performance-we propose a linear UCB strategy.This strategy has a complexity that is linear with the total number of users.Since the reward for the super arm is the linear sum of each combination arm, in the linear UCB strategy, the UCB value of the combination arm is set to the sum of the UCB values of the base arm.Considering that the reward of each arm obeys sub-exponential distribution, the UCB value of each arm is defined as where c = max σ 2 k, f ,l , m i,t denotes the number of the action i that has been selected, and Ri,t = Ri,t−1 m i,t−1 +r i,t−1 m i,t−1 +1 denotes the mean reward.From σ 2 l,k ∈ [0, 1], we have c = 1.Then we define the UCB value of the super arm A t as It can be seen that the UCB values of each super arm can be calculated based on the UCB values of the base arm, greatly reducing the computational complexity.In the t-th slot, we select A t to maximize the UCB value of the super arm, and the problem can be described as max The above problem is an unconstrained discrete optimization problem.In order to obtain the optimal solution, we calculate the UCB values for all actions, and then sort the UCB value u i (t).Finally, the users corresponding to the FU-largest UCB values are selected for communication.The user scheduling algorithm is shown in Algorithm 1.

Algorithm 1: The CMAB-based user scheduling algorithm
Input: m i,t−1 , Ri,t−1 ; 1: Use (12) to calculate the UCB values of all the base arms; 2: Sort the UCB values of all the base arms; 3: Select the users corresponding to the FU-largest UCB values; the set of these users are A t ; 4: For each action a ∈ A t , set m a,t = m a,t−1 + 1 and Ra,t = Ra,t−1 m a,t−1 +r a,t−1 m a,t−1 +1 ; Output: A t .

Theorem 1. The regret of the proposed linear UCB scheduling algorithm is upper-bounded by O(ln t).
Proof.Firstly, we define a variable B i,t to record the number of times the base arm i has been selected in the t-th slot.Considering the t-th slot, if the optimal combination arm A * is selected, the value B i,t will not change.If a non-optimal super arm A t is selected, then B i,t is added by 1 in the t-th slot, where i = min j∈A t m j,t .It is easy to prove that in the t-th slot, the number of times a non-optimal combination arm is selected is . Then, we define a variable V i,t .If the value B i,t at the t-th slot is increased by 1, V i,t is 1.If the value B i,t remains unchanged, then V i,t is set to 0. Let l be a positive integer, then we have where 1{a} is a function.If event a is true, then 1{a} = 1.When event a is false, then 1{a} = 0.If V i,p = 1, a non-optimal action is selected, and where From |A * | = U , A p+1 = U, and l ≤ B i,p ≤ m i,p , we have If Rg j ,t j + v g j ,t j , then one of the following three events must be satisfied. where R g j and R i is the mean reward of the action i.For the event ε 1 , we have follows a sub-exponential function with parameter Using the Hoeffding inequality, we have From the above, we have c h j ,t j ≥ c 2K(M+1) ln t m h j ,t j , and then e Similarly, we have , and Then we have and Then we can obtain 2 .Meanwhile, we can obtain Thus, Then, we have P{E 3} = 0. Note that Thus, (Z i,t ) ≤ l + 1 + π 2 3 M = O(ln t), and then where ∆ max = max a ∆ a .

Spectrum Allocation Algorithm Based on Statistical CSI Grouping
After obtaining the scheduled users, it is necessary to further allocate the subcarriers to the scheduled users.The goal of subcarrier allocation is to maximize the spectral efficiency.We hope that the user channels on the same subcarrier are approximately orthogonal to each other.If the channels between users are approximately orthogonal, then the inter-user interference will be low, and the spectral efficiency of the users will increase.
Since the instantaneous CSI cannot be obtained, we use the statistical CSI to allocate the subcarriers.Due to the estimation of each user's channel in the previous slots, the statistical CSI of the users can be calculated based on the instantaneous CSI in the previous slots.Using the statistical CSI to allocate the subcarriers greatly reduces the user's overhead and improves the spectral efficiency.Suppose the estimation of user k's channel vector in the t-th slot at subcarrier f is ĥk, f ,t .If this is the case, then its angular domain expression can be obtained as where D f is the discrete Fourier transform matrix at the f -th subcarrier.This is due to the fact that one user's channel vector is different on the different subcarriers but the angular domain representation of one user's channel vector on the different subcarriers are the same.In order to better group the statistical CSI, the calculation method for the statistical CSI on the angular domain (CSIA) is defined as After obtaining the CSIA of each user, the next step is to design a subcarrier allocation method.When allocating subcarriers, it is desired that the user channels on each subcarrier are approximately orthogonal.As a result, the serving users are divided into groups, where the channel vectors in different groups are approximately orthogonal.We select one user in each group to obtain one user as the service user on a subcarrier; thus, the users on each subcarrier are more orthogonal, which can lead to higher system spectral efficiency.Considering that the K-means method has good performance in user grouping, this article uses the K-means method to group user statistics into CSI groups.Before using the K-means algorithm, we define a chord distance between X and Y as where U X and U Y denote the eigenvectors corresponding to the non-zero eigenvalues of X and Y, respectively.Then we use an improved K-means algorithm to group the statistical CSI.The difference between the improved K-means algorithm and the K-means algorithm is that the number of users in each group in our algorithm is set to U. Compute the chordal distance d c (R a i , R a l ) between all users with the selected U users for l = 1:U do 6: Select F − 1 users from the remaining users that are closest to u l and the index set of these users is S l ; Select one user from each group and allocate these users to the f -th subcarrier, the index sets of these users are A f ,t , f = 1, 2, • • • , F; 12: end for Output: A f ,t .

Simulation Results
We validate the performance of the proposed method through simulation results in this section.Considering the millimeter wave communication scenario, it is assumed that there are a total of K mobile users in the cell.The base station is equipped with M antennas, and the maximum number P max of paths is set to 4. The number of subcarriers is F = 128.The path angles of each user are randomly distributed between [0, 180 • ].The subcarrier spacing is 30 kHz, the number of symbols in each slot is 14, and the slot time is 0.5 ms.Due to the mobile users, the statistical CSI coherence time length is set to 1000 slots (1000 time slots correspond to a range of 5 m at a speed of 10 m/s).After 1000 time slots, the statistical CSI of the users change; that is, the path angle and gain of each user change.We compare the proposed CMAB-based method with statistical CSI-based methods [9], random spectrum allocation, and user scheduling methods.In the method based on statistical CSI, it is assumed that the number of time slots used to estimate statistical CSI is 200, and user scheduling is carried out in the next 800 time slots.
Figures 1-3 show the spectral efficiency of the proposed method under different signalto-noise ratios (SNRs).The number of antennas at BS in Figures 1 and 2 is 64, and the number of antennas at BS in Figure 3 is 128.The total number of users is K = 2000.The number of scheduled users in Figure 1 at each subcarrier is 4; that is, the total number of scheduled users in Figure 1 is 512.The number of scheduled users at each subcarrier in Figures 2 and 3 is 8. Figures 1-3 show that the CMAB-based method proposed in this paper has a higher spectral efficiency than the statistical CS-based method and the random method.For the scheduling method based on statistical CSI, estimating the channel covariance matrix (CCM) incurs a large pilot overhead, leading to poor average spectral efficiency.The method proposed in this paper does not estimate the statistical CSI and can improve the spectral efficiency.In addition, compared to the random scheduling scheme, the proposed CMAB-based scheme accounts for partial CSI, providing a better channel quality and, thus, improving the spectral efficiency of the system.Figure 4 depicts the average spectral efficiency under different numbers of scheduled users per subcarrier.The number of BS antennas is set to 64, and the signal-to-noise ratio is set to 25 dB.The total number of users is 6000.From Figure 4, it can be observed that the method based on CMAB proposed in this paper performs better than the method based on statistical CSI and the random method under a different number of users.In addition, as the number of users increases, the spectral efficiency of all algorithms will increase first and then decrease.When the number of users is small, the inter-user interference is small, and the spectral efficiency will increase as the number of users increases due to user gain.When the number of users is large, the interference between users will increase.Because the number of users is large, user interference will play a major role, and the spectral efficiency of the system will decrease due to the large inter-user interference.Therefore, when the number of users is large, the spectral efficiency of the system will decrease with the increasing number of users.Figure 5 shows the spectral efficiency of the proposed CMAB-based user scheduling and subcarrier allocation scheme each time slot.The number of base station antennas is 64, and the signal-to-noise ratio is set to 25 dB.The total number of terminals is 2000, and the user numbers scheduled per subcarrier are four and eight.The spectral efficiency of the proposed CMAB scheme increases first and then converges.The convergence speed when the number of users is four is faster than the convergence speed when the number of users is right.This is because the smaller the number of users, the less time needed to explore.When the number of terminals is large, the spectral efficiency of the system is improved due to user gain, which corresponds to Figure 3; that is, eight users per subcarrier have higher spectral efficiency than four users per subcarrier.

Conclusions
In this paper, we propose a spectrum allocation and user scheduling scheme based on CMAB for millimeter wave massive MIMO systems.This scheme uses CMAB to schedule the users and uses the statistical CSI to allocate the subcarriers, which can effectively improve the system's spectral efficiency.Simulation results show that the proposed CMAB-based scheme can effectively improve the spectral efficiency compared with the existing schemes.In future work, we will consider downlink massive MIMO systems.Considering that channel reciprocity does not hold in frequency-division duplex (FDD) massive MIMO systems, a downlink CSI cannot be obtained using the uplink pilot.Thus, designing a transmission scheme for downlink in FDD massive MIMO systems is more challenging.Moreover, it should be noted that a hybrid architecture (including the digital and analogy parts) can reduce hardware complexity; due to this, we will design a user scheduling and spectrum allocation method in a downlink FDD massive MIMO system with hybrid architecture.

7 :
Calculate the center point R a l of the statistical CSIA of the users in S l ;
we can obtain e