Joint Beam-Forming, User Clustering and Power Allocation for MIMO-NOMA Systems

In this paper, we consider the optimal resource allocation problem for multiple-input multiple-output non-orthogonal multiple access (MIMO-NOMA) systems, which comprises beam-forming, user clustering and power allocation. Users are divided into different clusters, and the users in the same cluster are served by the same beam vector. Inter-cluster orthogonality is guaranteed by multi-user detection (MUD). We propose a three-step framework to solve this multi-dimensional resource allocation problem. In step 1, we propose a beam-forming algorithm for a given user cluster. Specifically, fractional transmitting power control (FTPC) is applied for intra-cluster power allocation. The beam-forming problem is then transformed into an unconstrained one, and the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) method is applied to obtain a locally optimal solution. In step 2, user clustering is optimized. Both channel differences and channel correlations are involved in the design of user clustering. By assigning different weights to the two factors, we produce multiple candidate clustering schemes. Beam-forming is then performed for each candidate clustering scheme with the proposed algorithm to compare their performance. In step 3, power allocation is further optimized based on the resulting user clustering and beam-forming schemes. Specifically, it is formulated as a difference of convex (DC) programming problem, which is solved by successive convex approximation (SCA) with strong robustness. Simulation results show that the proposed scheme can effectively improve spectral efficiency (SE) and edge users' data rates.


Introduction
Traditional orthogonal multiple access (OMA) has reached a bottleneck, since the limited spectrum resources cannot meet the ever-growing demand for mobile data traffic. As an alternative, non-orthogonal multiple access (NOMA) has attracted considerable attention since it allows multiple users to occupy the same spectrum resource simultaneously. According to NOMA protocols, users are divided into different clusters based on their channel characteristics, and the signals of the users in the same cluster are transmitted over the same time-frequency resource [1]. In each cluster, the channel differences among users should be large enough for successive interference cancellation (SIC) to succeed [2][3][4]. Moreover, weak users can be compensated in the power allocation process, which not only improves edge users' performance, but also helps to distinguish multiplexed users in the power domain [5][6][7][8].
Moreover, multiple-input multiple-output (MIMO) is a promising technique for multiplying the spectral efficiency (SE) gain [9][10][11]. In massive MIMO systems, beam-forming can effectively improve SE based on spatial diversity [12]. Conventionally, a specific beam vector is designed for each user. The interference among multiple users can be eliminated when the number of antennas is greater than that of users. Specifically, the beam vector of each user can be set orthogonal to the channel vectors of the others based on the zero-forcing beam-forming (ZF-BF) algorithm [13,14].

Our Contributions
In MIMO-NOMA systems, SIC is of great significance in reducing intra-cluster interference. When a user decodes the signals of others based on SIC, past research has tended to set a lower bound on the received signal-to-interference-plus-noise ratio (SINR) to ensure that the decoding process succeeds. In [23], the optimal power allocation scheme for the downlink of a NOMA system was obtained based on Karush-Kuhn-Tucker (KKT) conditions with an SINR bound of 0.3. Additionally, [9] jointly optimized power allocation and beam-forming for MIMO-NOMA with an SINR bound less than 0.5. Unfortunately, most related works fail to obtain a feasible solution when the SINR bound is greater than 1, which keeps the received SINR at users relatively low and, in turn, decreases system reliability. One explanation is that most related works consider the joint optimization of power allocation and beam-forming. The scale of the resulting problem is relatively large, which makes it challenging for optimization tools to obtain a feasible solution. To address this issue, we decompose the multi-dimensional resource allocation problem into three sub-problems. The scale of each sub-problem is relatively small, which helps to obtain a feasible solution with strong robustness.
Moreover, most existing works only consider channel differences when determining the clustering scheme for MIMO-NOMA. However, since the users in the same cluster are served by the same beam vector, their channel correlations should be relatively high to fully exploit the advantages of MIMO. The clustering criterion in [9][10][11][12] was to make the channel differences among multiplexed users as large as possible, which neglected channel correlations and was not directly related to the ultimate system performance. In this paper, both channel correlations and channel differences are involved in the design of user clustering. By assigning different weights to the two factors, we produce multiple candidate clustering schemes. Beam-forming is then performed for each candidate clustering scheme to compare their performance, so that the clustering scheme finally selected is the candidate with the highest SE.
In addition, some related literature only considers the resource allocation problem for two-user clusters. In this paper, the size of each cluster is not fixed, which makes the proposed scheme more practical. The main contributions of this paper are summarized as follows:
1. We present a system model for MIMO-NOMA. Multiple users are divided into different clusters and the size of each cluster is not fixed. The users in the same cluster are served by the same beam vector. Each user is assumed to detect signals based on a specific receiving coefficient to ensure inter-cluster orthogonality. Moreover, SIC is applied at users to alleviate intra-cluster interference.
2. We propose a three-step framework to solve the multi-dimensional resource allocation problem. In step 1, a beam-forming algorithm is proposed to obtain the beam vector for a given user cluster. Specifically, fractional transmitting power control (FTPC) is applied to perform intra-cluster power allocation. The beam-forming problem is then transformed into an unconstrained one, and the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method is applied to obtain a local optimum with low complexity.
3. In step 2, user clustering is further considered, based on the proposed beam-forming algorithm. For each user k, we define a utility function to describe its preference for each cluster n. The utility function consists of two terms, which depict the channel differences and correlations between user k and the existing users in cluster n, respectively. A relative weight is introduced to balance the tradeoff between channel differences and correlations. Based on the utility function, user k is assigned to its favorite cluster. In this paper, the relative weight is obtained by particle swarm optimization (PSO). PSO simultaneously produces multiple candidate values for the relative weight, each corresponding to a candidate clustering scheme. Beam-forming is then performed for each candidate clustering scheme with the proposed algorithm to compare their performance, so that the clustering scheme finally selected is the candidate with the highest SE.
4. In step 3, power allocation is further optimized based on the optimal user clustering and beam-forming schemes. It is formulated as a difference of convex (DC) programming problem by exploiting the structure of the objective function, and is solved by successive convex approximation (SCA) within a limited number of iterations. We evaluate the performance of the proposed scheme and some other existing schemes to illustrate the significance of the proposed scheme.
The rest of the paper is organized as follows: Section 2 presents the system model for MIMO-NOMA and further provides a mathematical expression of the optimal resource allocation problem. Section 3 introduces more details about the proposed beam-forming algorithm. Sections 4 and 5 introduce the user clustering and power allocation schemes, respectively. The performance of the proposed scheme is evaluated in Section 6. Section 7 concludes this paper.

System Model
Consider a single-cell downlink MIMO-NOMA system, in which one base station (BS) equipped with $N$ antennas serves $K$ single-antenna users. Let $U = \{1, 2, \ldots, K\}$ denote the set of users. Without loss of generality, the users are indexed in descending order of channel gains, i.e., $|h_1|^2 > |h_2|^2 > \ldots > |h_K|^2$, where $h_k \in \mathbb{C}^{N \times 1}$ ($k \in \{1, 2, \ldots, K\}$) denotes the channel vector of user $k$. The $K$ users are divided into $S$ different clusters. Let $U_n = \{i_n(1), i_n(2), \ldots, i_n(m_n)\}$ ($n \in \{1, 2, \ldots, S\}$) denote the set of users assigned to cluster $n$, where $m_n$ denotes the size of $U_n$, and $i_n(l)$ ($l \in \{1, 2, \ldots, m_n\}$) denotes the index of the $l$-th user in $U_n$. Specifically, the users in $U_n$ are sorted in ascending order of their indexes, i.e., $i_n(1) < i_n(2) < \ldots < i_n(m_n)$. The superposed signal at the BS is given by

$$x = \sum_{n=1}^{S} w_n \sum_{l=1}^{m_n} \sqrt{p_{i_n(l)}}\, s_{i_n(l)}, \qquad (1)$$

where $w_n \in \mathbb{C}^{N \times 1}$ denotes the beam vector of cluster $n$, and $s_{i_n(l)}$ and $p_{i_n(l)}$ denote the signal and power of user $i_n(l)$, respectively. Assume the beam vector of each cluster has constant modulus (CM) elements. The received signal at user $i_n(l)$ is given by

$$y_{i_n(l)} = h_{i_n(l)}^H w_n \sqrt{p_{i_n(l)}}\, s_{i_n(l)} + \underbrace{h_{i_n(l)}^H \sum_{q \neq n} w_q \sum_{j=1}^{m_q} \sqrt{p_{i_q(j)}}\, s_{i_q(j)}}_{\text{inter-cluster interference}} + \underbrace{h_{i_n(l)}^H w_n \sum_{j \neq l} \sqrt{p_{i_n(j)}}\, s_{i_n(j)}}_{\text{intra-cluster interference}} + \omega_{i_n(l)}, \qquad (2)$$

where $h_{i_n(l)}$ denotes the channel vector of user $i_n(l)$. The first term in (2) represents the received desired signal. The second and third terms represent the inter-cluster and intra-cluster interference, respectively. The noise term $\omega_{i_n(l)}$ is zero-mean complex additive white Gaussian noise (AWGN) with variance $\sigma^2$. One can observe that the received interference can be considerable. To solve this problem, each user is assumed to detect signals via a specific receiving coefficient, given by

$$\alpha_{i_n(l)} = v_{i_n(l)}^H h_{i_n(l)}, \qquad (3)$$

where $\alpha_{i_n(l)}$ denotes the receiving coefficient of user $i_n(l)$ and $v_{i_n(l)} \in \mathbb{C}^{N \times 1}$. Then, the received signal at user $i_n(l)$ can be re-written as

$$\bar{y}_{i_n(l)} = \alpha_{i_n(l)}\, y_{i_n(l)} = v_{i_n(l)}^H H_{i_n(l)} w_n \sqrt{p_{i_n(l)}}\, s_{i_n(l)} + v_{i_n(l)}^H H_{i_n(l)} \sum_{q \neq n} w_q \sum_{j=1}^{m_q} \sqrt{p_{i_q(j)}}\, s_{i_q(j)} + v_{i_n(l)}^H H_{i_n(l)} w_n \sum_{j \neq l} \sqrt{p_{i_n(j)}}\, s_{i_n(j)} + \alpha_{i_n(l)}\, \omega_{i_n(l)}, \qquad (4)$$

where $H_{i_n(l)} = h_{i_n(l)} h_{i_n(l)}^H$.
For user $i_n(l)$, the interference from cluster $q$ ($q \neq n$) can be eliminated when the following condition is satisfied:

$$v_{i_n(l)}^H H_{i_n(l)} w_q = 0. \qquad (5)$$

Let $\bar{h}^{q}_{n,l} = H_{i_n(l)} w_q$, and let $\bar{H}_{n,l} = [\bar{h}^{1}_{n,l}, \bar{h}^{2}_{n,l}, \ldots, \bar{h}^{n-1}_{n,l}, \bar{h}^{n+1}_{n,l}, \ldots, \bar{h}^{S}_{n,l}]$. For user $i_n(l)$, the inter-cluster interference can be totally eliminated by setting $v_{i_n(l)}$ as the left singular vector of $\bar{H}_{n,l}$ corresponding to the zero singular value. It is worth noting that there is a constraint for this operation, i.e., $N \geq S - 1$. In addition, $\alpha_{i_n(l)}$ should be normalized to ensure that it does not bring extra SE gains, i.e., $v_{i_n(l)}^H h_{i_n(l)} = 1$. Accordingly, the received signal at user $i_n(l)$ can be transformed into

$$\bar{y}_{i_n(l)} = h_{i_n(l)}^H w_n \sqrt{p_{i_n(l)}}\, s_{i_n(l)} + h_{i_n(l)}^H w_n \sum_{j \neq l} \sqrt{p_{i_n(j)}}\, s_{i_n(j)} + \omega_{i_n(l)}. \qquad (6)$$

Moreover, SIC is performed at users to further reduce intra-cluster interference. According to SIC, in each cluster, a user can decode the signals of the others with poorer channel conditions. Conventionally, since $U_n$ is a sorted sequence, user $i_n(b)$ ($b \in \{1, 2, \ldots, m_n\} \setminus \{l\}$) can decode the signals of user $i_n(l)$ if and only if $b < l$. However, in a MIMO-NOMA system, users' channel gains depend not only on the physical environment but also on the beams, i.e., the decoding priority is not fixed and is subject to the beam-forming scheme. Specifically, user $i_n(b)$ can decode the signals of user $i_n(l)$ when the following condition is satisfied:

$$|h_{i_n(b)}^H w_n|^2 > |h_{i_n(l)}^H w_n|^2. \qquad (7)$$

Beam-forming thus affects the decoding order by adjusting users' effective channel gains. Accordingly, we introduce a decoding indicator $\lambda_n^{b,l}$ to depict whether or not user $i_n(b)$ can decode the signals of user $i_n(l)$, given by

$$\lambda_n^{b,l} = \frac{1}{2}\left[1 + \mathrm{sign}\left(|h_{i_n(b)}^H w_n|^2 - |h_{i_n(l)}^H w_n|^2\right)\right]. \qquad (8)$$

Here, the sign function is introduced to denote the decoding priority; it returns 1 when its input is positive, and −1 otherwise. From (8), if user $i_n(b)$ can decode the signals of user $i_n(l)$, $\lambda_n^{b,l} = 1$; otherwise, $\lambda_n^{b,l} = 0$.
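The role of the beam in the decoding order can be illustrated with a small numerical sketch. It assumes that the effective channel gain of a user is $|h^H w_n|^2$ and that the decoding indicator equals $(1 + \mathrm{sign}(\cdot))/2$ applied to the difference of effective gains; the function names and the two-antenna channels below are hypothetical and chosen only for illustration.

```python
import math

def effective_gain(h, w):
    """Return |h^H w|^2 for a channel h and a beam w (lists of complex numbers)."""
    inner = sum(hc.conjugate() * wc for hc, wc in zip(h, w))
    return abs(inner) ** 2

def decoding_indicator(h_b, h_l, w):
    """Return 1 if user b can decode user l's signal under beam w, else 0."""
    diff = effective_gain(h_b, w) - effective_gain(h_l, w)
    sign = 1 if diff > 0 else -1  # sign convention: -1 for non-positive inputs
    return (1 + sign) // 2

# Hypothetical two-antenna channels: user b has the larger channel norm.
h_b = [1.5 + 0j, 1.5 + 0j]
h_l = [1.0 + 0j, -1.0 + 0j]

# Two constant-modulus beams with different phase patterns.
scale = 1.0 / math.sqrt(2.0)
w_first = [scale, scale]
w_second = [scale, -scale]

lam_first = decoding_indicator(h_b, h_l, w_first)
lam_second = decoding_indicator(h_b, h_l, w_second)
print(lam_first, lam_second)
```

With these toy channels, the user with the larger channel norm can decode the other user's signal under the first beam but not under the second, which is the effect the decoding indicator is meant to capture.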
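When $\lambda_n^{b,l} = 1$, there is an implicit power constraint, given by

$$\lambda_n^{b,l}\left(p_{i_n(l)} - p_{i_n(b)}\right) \geq 0. \qquad (9)$$

From (9), when $\lambda_n^{b,l} = 1$, the power of user $i_n(l)$ should be larger than that of user $i_n(b)$ so that the signal of $i_n(l)$ is more easily detected. The received SINR at user $i_n(l)$ can be expressed as

$$\mathrm{SINR}_{i_n(l)} = \frac{p_{i_n(l)}\, g_{i_n(l)}}{1 + g_{i_n(l)} \sum_{b \neq l} \lambda_n^{b,l}\, p_{i_n(b)}}, \qquad (10)$$

where $g_{i_n(l)} = |h_{i_n(l)}^H w_n|^2 / \sigma^2$ denotes the normalized channel gain of user $i_n(l)$. Based on the discussion above, the joint optimization of user clustering, beam-forming and power allocation is formulated as problem (11).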

Beam-Forming Algorithm for a Given User Cluster
Problem (11) considers the joint optimization of user clustering, beam-forming and power allocation for MIMO-NOMA, which is challenging to solve in polynomial time. Owing to the orthogonality among different clusters, in this section we first consider the beam-forming problem for a given user cluster (the optimal user clustering and power allocation schemes are discussed in Sections 4 and 5, respectively). Without loss of generality, we assign the first $m$ users of $U$ to cluster $n$, i.e., $i_n(l) = l$, $\forall l = 1, 2, \ldots, m$. The beam-forming problem for cluster $n$ is formulated as problem (12), where $P_n = \{p_l, \forall l = 1, 2, \ldots, m\}$ denotes the power allocation scheme for the $m$ considered users. For the sake of simplicity, the SINR constraint is omitted here and will be considered in Section 5. Each cluster is assumed to have the same power budget, denoted by $P_{tot}/S$. Due to (12e), the beam vector can be represented as $w_n = \frac{1}{\sqrt{N}}\left(e^{j\varphi_1}, e^{j\varphi_2}, \ldots, e^{j\varphi_N}\right)^T$, where $\varphi_c$ denotes the phase of the $c$-th element in $w_n$. The beam vector is obtained once the phases of its elements are determined. Inspired by this observation, we treat $\Phi = [\varphi_1, \varphi_2, \ldots, \varphi_N]$ as the optimization variables. Based on the perfect square formula, users' normalized channel gains can be expressed in terms of $\Phi$ (for the details of derivation, see Appendix A):

$$g_l = \frac{1}{N\sigma^2}\left[\sum_{c=1}^{N} \kappa_{l,c}^2 + 2\sum_{c=1}^{N-1}\sum_{d=c+1}^{N} \kappa_{l,c}\,\kappa_{l,d}\cos\big((\varphi_c - \phi_{l,c}) - (\varphi_d - \phi_{l,d})\big)\right], \qquad (13)$$

where $\kappa_{l,c}$ and $\phi_{l,c}$ denote the amplitude and phase of the $c$-th element in $h_l$, respectively. However, problem (12) is still difficult to solve due to (12c) and (12d). To simplify (12), we first produce a feasible solution for $P_n$ and then maximize (12a) by optimizing $\Phi$.
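As a sanity check on expressing the normalized channel gain in terms of the phases, the sketch below evaluates the gain in two ways: directly as $|h_l^H w_n|^2/\sigma^2$ with complex arithmetic, and through the expansion into squared amplitudes plus pairwise cosine terms that the perfect square formula yields. It assumes a constant-modulus beam with elements $e^{j\varphi_c}/\sqrt{N}$; the function names and the numerical values are illustrative assumptions only.

```python
import cmath
import math

def gain_direct(kappa, chan_phase, beam_phase, noise_power=1.0):
    """Evaluate |h^H w|^2 / noise_power with complex arithmetic."""
    n = len(beam_phase)
    inner = sum(
        k * cmath.exp(-1j * a) * cmath.exp(1j * p) / math.sqrt(n)
        for k, a, p in zip(kappa, chan_phase, beam_phase)
    )
    return abs(inner) ** 2 / noise_power

def gain_expanded(kappa, chan_phase, beam_phase, noise_power=1.0):
    """Same quantity written with squared amplitudes and pairwise cosines."""
    n = len(beam_phase)
    total = sum(k * k for k in kappa)
    for c in range(n):
        for d in range(c + 1, n):
            angle = (beam_phase[c] - chan_phase[c]) - (beam_phase[d] - chan_phase[d])
            total += 2.0 * kappa[c] * kappa[d] * math.cos(angle)
    return total / (n * noise_power)

# Hypothetical three-antenna channel (amplitudes and phases) and beam phases.
kappa = [1.0, 0.5, 2.0]
chan_phase = [0.3, -1.1, 2.0]
beam_phase = [0.0, 0.7, -0.4]

g_direct = gain_direct(kappa, chan_phase, beam_phase)
g_expanded = gain_expanded(kappa, chan_phase, beam_phase)
# Aligning every beam phase with the channel phase maximizes a single user's gain.
g_aligned = gain_expanded(kappa, chan_phase, chan_phase)
print(g_direct, g_expanded, g_aligned)
```

The two evaluations agree up to rounding, and the aligned case reduces to $(\sum_c \kappa_{l,c})^2 / (N\sigma^2)$, which is the largest gain a single user can obtain from a constant-modulus beam.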
Specifically, the power allocation scheme is obtained based on FTPC, i.e., the transmit power of user $l$ is given by

$$p_l = \frac{P_{tot}}{S} \cdot \frac{g_l^{-\gamma}}{\sum_{j=1}^{m} g_j^{-\gamma}}, \qquad (14)$$

where $\gamma$ denotes the decay factor. With FTPC, constraint (12d) always holds since $\sum_l p_l = P_{tot}/S$. Moreover, $\gamma$ determines the correlation between users' channel gains and transmitting power. When $\gamma = 0$, transmitting power is unrelated to the normalized gains, i.e., each user has the same transmitting power. As $\gamma$ increases, $p_l$ and $g_l$ become negatively correlated, which is consistent with (12c) and thus makes (14) a feasible solution.
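The behavior of FTPC described above can be sketched in a few lines. The sketch assumes the commonly used form in which each user's share of the cluster budget is proportional to its normalized gain raised to the power $-\gamma$; the function name and the example gains are hypothetical.

```python
def ftpc_power(gains, cluster_budget, decay):
    """Split cluster_budget among users in proportion to gain ** (-decay)."""
    weights = [g ** (-decay) for g in gains]
    total = sum(weights)
    return [cluster_budget * w / total for w in weights]

# Hypothetical normalized gains of three users, strongest first.
gains = [8.0, 2.0, 0.5]

equal_split = ftpc_power(gains, cluster_budget=1.0, decay=0.0)
compensated = ftpc_power(gains, cluster_budget=1.0, decay=0.2)
print(equal_split)
print(compensated)
```

With a decay factor of zero the budget is split equally, while a positive decay factor gives weaker users more power; in both cases the powers sum to the cluster budget.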
Accordingly, problem (12) can be transformed into problem (15). However, (15) is still challenging to solve since the sign function in (12b) is non-differentiable. To address this, we produce an approximation of $\lambda_n^{b,l}$, denoted by $\bar{\lambda}_n^{b,l}$ and given by (16), in which the sign function is replaced by the Sigmoid function, which is first-order differentiable. Since the output of (16) ranges from zero to one, we consider (16) as the probability that user $b$ successfully decodes the signals of user $l$.
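A minimal sketch of such a smooth surrogate is given below. It assumes that the Sigmoid is applied to the difference of the two users' normalized gains; the `steepness` parameter is a hypothetical addition (not taken from the text) that controls how closely the surrogate follows the hard indicator.

```python
import math

def hard_indicator(gain_b, gain_l):
    """Sign-based indicator: 1 if user b has the larger gain, else 0."""
    return 1 if gain_b - gain_l > 0 else 0

def soft_indicator(gain_b, gain_l, steepness=1.0):
    """Sigmoid of the gain difference, written to avoid overflow in exp."""
    z = steepness * (gain_b - gain_l)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

for gain_b, gain_l in [(5.0, 1.0), (2.0, 2.0), (1.0, 5.0)]:
    print(gain_b, gain_l, hard_indicator(gain_b, gain_l), soft_indicator(gain_b, gain_l))
```

The surrogate stays strictly between zero and one, which is what allows it to be read as a decoding probability and differentiated with respect to the beam phases.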
Then, (15) can be re-written as problem (17), and the partial derivatives of its objective with respect to the phases in $\Phi$ can be derived. Since problem (17) is a differentiable unconstrained problem, the quasi-Newton method L-BFGS can be applied to solve it within a limited number of iterations. In each iteration, L-BFGS produces an update direction for $\Phi$ based on the information from the last $T$ iterations. Once the update direction is determined, the Armijo rule is applied to obtain a proper step size. More details are given in Algorithm 1.

Algorithm 1 Beam-forming Algorithm for a Given Cluster
Require: $U_n$
Ensure: $\Phi$
1: Initialize $T$, $\eta$.
2: $Y_n = \emptyset$, $S_n = \emptyset$, $R_n = \emptyset$.
3: Randomly initialize $\Phi$.
4: $g_{pre}$ ← the gradient of $f$ at $\Phi$.
5: Calculate the update direction $y = -g_{pre}$.
6: Obtain the step size $\mu$ based on the Armijo rule.
7: $y \leftarrow \mu y$, $\Phi \leftarrow \Phi + y$.
8: $g_{cur}$ ← the gradient of $f$ at $\Phi$.
9: $s \leftarrow g_{cur} - g_{pre}$.
10: $\rho \leftarrow y^H s$.
11: while $|g_{cur}| \geq \eta$ do
12:   $g_{pre} \leftarrow g_{cur}$.
13:   Insert $y$ into $Y_n$.
14:   Insert $s$ into $S_n$.
15:   Insert $\rho$ into $R_n$.
16:   $L$ ← the number of elements in $Y_n$.
17:   if $L > T$ then
18:     Pop the first element in $Y_n$.
19:     Pop the first element in $S_n$.
20:     Pop the first element in $R_n$.
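The overall structure of an L-BFGS iteration with Armijo backtracking can be sketched as follows. This is a generic textbook-style implementation (two-loop recursion over a bounded history of update pairs), not a transcription of Algorithm 1; the toy objective, which maximizes a single user's gain over the beam phases, and all names and constants are assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lbfgs_direction(grad, steps, grad_diffs):
    """Two-loop recursion: descent direction from the stored update pairs."""
    q = list(grad)
    history = []
    for step, diff in zip(reversed(steps), reversed(grad_diffs)):
        rho = 1.0 / dot(step, diff)
        alpha = rho * dot(step, q)
        q = [qi - alpha * di for qi, di in zip(q, diff)]
        history.append((alpha, rho, step, diff))
    if steps:  # initial Hessian scaling from the most recent pair
        scale = dot(steps[-1], grad_diffs[-1]) / dot(grad_diffs[-1], grad_diffs[-1])
        q = [scale * qi for qi in q]
    for alpha, rho, step, diff in reversed(history):
        beta = rho * dot(diff, q)
        q = [qi + (alpha - beta) * si for qi, si in zip(q, step)]
    return [-qi for qi in q]

def armijo_step(f, x, grad, direction, shrink=0.5, c=1e-4, max_halvings=40):
    """Backtrack until the Armijo sufficient-decrease condition holds."""
    fx, slope, mu = f(x), dot(grad, direction), 1.0
    for _ in range(max_halvings):
        if f([xi + mu * di for xi, di in zip(x, direction)]) <= fx + c * mu * slope:
            break
        mu *= shrink
    return mu

def lbfgs_minimize(f, grad_f, x, memory=5, tol=1e-6, max_iter=200):
    steps, grad_diffs = [], []
    g = grad_f(x)
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        d = lbfgs_direction(g, steps, grad_diffs)
        mu = armijo_step(f, x, g, d)
        x_new = [xi + mu * di for xi, di in zip(x, d)]
        g_new = grad_f(x_new)
        step = [a - b for a, b in zip(x_new, x)]
        diff = [a - b for a, b in zip(g_new, g)]
        if dot(step, diff) > 1e-14:  # keep pairs with positive curvature only
            steps.append(step)
            grad_diffs.append(diff)
            if len(steps) > memory:  # drop the oldest pair
                steps.pop(0)
                grad_diffs.pop(0)
        x, g = x_new, g_new
    return x

# Toy problem (assumption): maximize one user's gain over the beam phases.
KAPPA = [1.0, 0.5, 2.0]
CHAN_PHASE = [0.3, -1.1, 2.0]

def _sums(phi):
    re = sum(k * math.cos(p - a) for k, a, p in zip(KAPPA, CHAN_PHASE, phi))
    im = sum(k * math.sin(p - a) for k, a, p in zip(KAPPA, CHAN_PHASE, phi))
    return re, im

def neg_gain(phi):
    re, im = _sums(phi)
    return -(re * re + im * im) / len(phi)

def neg_gain_grad(phi):
    re, im = _sums(phi)
    n = len(phi)
    return [-(2.0 * k / n) * (im * math.cos(p - a) - re * math.sin(p - a))
            for k, a, p in zip(KAPPA, CHAN_PHASE, phi)]

phi_opt = lbfgs_minimize(neg_gain, neg_gain_grad, [0.0, 0.0, 0.0])
best_gain = -neg_gain(phi_opt)
print(best_gain)
```

The history is capped at `memory` pairs, mirroring the role of $T$ in the text, and pairs with non-positive curvature are skipped so that the computed direction remains a descent direction under a backtracking-only line search.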

User Clustering for MIMO-NOMA System
In this section, optimal user clustering is considered based on Algorithm 1. In each cluster, the channel differences among multiplexed users should be large enough for SIC to succeed. Moreover, since the users in the same cluster are served by the same beam vector, their channel correlations should also be emphasized to fully exploit the advantages of MIMO. Accordingly, the clustering scheme is obtained by taking both factors into account.
Due to SIC, the strongest user in each cluster is in fact served by OMA, and it can achieve good performance with less power when its channel gain is relatively large. Therefore, the first $S$ users of $U$ are assigned to $S$ different clusters, respectively. Owing to their high channel gains, these users can achieve good performance with less power, which in turn leaves more of the power budget for the others. After the $S$ clusters are initialized, the remaining users in $U$ successively select a suitable cluster to join. For each user $k$, we define a utility function $u_k(n)$ that describes user $k$'s preference for cluster $n$. The first term of $u_k(n)$ depicts the channel correlations between user $k$ and the existing users in cluster $n$. The second term is Jain's fairness index, which measures the channel difference between user $k$ and the existing users in cluster $n$. Specifically, the second term ranges from $\frac{1}{m_n + 1}$ to 1 and decreases as the channel difference gets larger. $\theta$ denotes the relative weight between the two terms. Based on the utility function, user $k$ is assigned to its favorite cluster $n_k$, given by $n_k = \arg\max_{n} u_k(n)$. Clusters reject users only when condition (11f) is not met. For each cluster $n$, when $m_n = M + 1$, cluster $n$ should reject one user to satisfy the size constraint. Accordingly, we can produce multiple candidate user sets for cluster $n$ by removing any single user from $U_n$. Based on Algorithm 1, beam-forming is performed for each candidate user set to compare their performance, and the best user set for cluster $n$ is obtained accordingly.
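The range of the fairness term can be checked with a short sketch. It uses the standard definition of Jain's fairness index, $(\sum_i x_i)^2 / (n \sum_i x_i^2)$, and assumes that the index is evaluated on the channel gains of the candidate user together with the existing cluster members; the function name and the sample gains are hypothetical.

```python
def jain_index(values):
    """Jain's fairness index: 1 for equal values, 1/n when one value dominates."""
    n = len(values)
    return sum(values) ** 2 / (n * sum(v * v for v in values))

# Hypothetical channel gains: existing cluster members plus the candidate user.
similar_gains = [2.0, 1.0]
distinct_gains = [10.0, 1.0]

print(jain_index(similar_gains))   # closer to 1: small channel difference
print(jain_index(distinct_gains))  # closer to 1/2: large channel difference
```

For a cluster with $m_n$ existing users plus one candidate, the index is evaluated on $m_n + 1$ values, which is why its lower limit is $1/(m_n + 1)$.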
One can observe that the relative weight is of great significance in steering the ultimate clustering scheme. When $\theta$ is relatively small, channel correlations play a decisive role in the clustering process. As $\theta$ increases, channel differences, in turn, become the controlling factor of the ultimate clustering scheme. With any given $\theta$, user clustering can be performed based on Algorithm 2. Then, Algorithm 1 can be applied to obtain a beam-forming scheme, and the corresponding achievable SE is denoted by $w(\theta)$.

Algorithm 2
4: Sort the clusters based on user $j$'s preference.
5: Denote the sorted sequence by $\Omega_j$.
6: while $\Omega_j \neq \emptyset$ do
7: $n_j$ ← the first cluster in $\Omega_j$.
8: Insert user $j$ into $U_{n_j}$.
9: $NUM$ ← the number of users in $U_{n_j}$.
10: if $NUM \leq M$ then
11: Break.
14: Remove the $i$-th user from $U_{n_j}$.
15: Obtain the optimal beam vector for cluster $n_j$ by Algorithm 1.
16: $\varepsilon_i$ ← the sum rate of the users in $U_{n_j}$.
17: Insert the removed user into its original position.
20: $x$ ← the index of the $\tilde{i}$-th user in $U_{n_j}$.
21: Remove the $\tilde{i}$-th user from $U_{n_j}$.

In this section, the optimal $\theta$ is obtained by PSO through multiple iterations. In each iteration, PSO produces multiple candidate values for $\theta$, each corresponding to a candidate clustering scheme. Based on Algorithm 1, beam-forming is performed for each candidate clustering scheme to compare their performance. We then select the one with the maximum SE as the optimal clustering scheme, and its corresponding relative weight is the optimal $\theta$ obtained by PSO. More details are described in Algorithm 3. Note that the random variable $\delta$ in step (13) denotes the step size, which is a real number ranging from 0 to 1.

Algorithm 3
Initialize position $\theta_g$ for particle $g$.
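The search over the relative weight can be sketched with a generic one-dimensional particle swarm. This is a standard PSO variant with inertia and two attraction terms, not a transcription of Algorithm 3; `toy_se` is a hypothetical stand-in for $w(\theta)$, which in the actual scheme would require running the clustering and beam-forming algorithms for each candidate weight, and all parameter values are assumptions.

```python
import random

def pso_maximize(objective, lower, upper, particles=8, iterations=30,
                 inertia=0.6, pull_personal=1.5, pull_global=1.5, seed=0):
    """Maximize a scalar objective over [lower, upper] with a basic PSO."""
    rng = random.Random(seed)
    pos = [rng.uniform(lower, upper) for _ in range(particles)]
    vel = [0.0] * particles
    best_pos = list(pos)
    best_val = [objective(p) for p in pos]
    g = max(range(particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g], best_val[g]
    for _ in range(iterations):
        for i in range(particles):
            # Velocity mixes inertia, attraction to the particle's own best
            # position, and attraction to the best position of the swarm.
            vel[i] = (inertia * vel[i]
                      + pull_personal * rng.random() * (best_pos[i] - pos[i])
                      + pull_global * rng.random() * (g_pos - pos[i]))
            pos[i] = min(upper, max(lower, pos[i] + vel[i]))
            val = objective(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val > g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val

def toy_se(theta):
    """Hypothetical stand-in for w(theta) with a single peak at theta = 2."""
    return -(theta - 2.0) ** 2

theta_best, se_best = pso_maximize(toy_se, 0.0, 10.0)
print(theta_best, se_best)
```

Each particle corresponds to one candidate weight, so every objective evaluation stands for one complete clustering and beam-forming run; the number of particles and iterations therefore directly sets the cost of the search.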

Power Allocation for MIMO-NOMA
User clustering and beam-forming are jointly solved in Section 4. However, FTPC is still applied for intra-cluster power allocation, which leaves room for improvement. In this section, power allocation is optimized based on the optimal user clustering and beam-forming schemes. Without loss of generality, the users in $U_n$ are re-ordered in descending order of effective channel gains. The re-ordered sequence is denoted by $\tilde{U}_n = \{\tilde{i}_n(1), \tilde{i}_n(2), \ldots, \tilde{i}_n(m_n)\}$, where $\tilde{i}_n(l)$ ($l \in \{1, 2, \ldots, m_n\}$) denotes the index of the $l$-th user in $\tilde{U}_n$. Moreover, we have

$$|w_n^H h_{\tilde{i}_n(1)}|^2 \geq |w_n^H h_{\tilde{i}_n(2)}|^2 \geq \ldots \geq |w_n^H h_{\tilde{i}_n(m_n)}|^2,$$

where $h_{\tilde{i}_n(l)}$ denotes the channel vector of user $\tilde{i}_n(l)$. The achievable rate of user $\tilde{i}_n(l)$ with normalized bandwidth can be represented as

$$R_{\tilde{i}_n(l)} = \log_2\left(1 + \frac{p_{\tilde{i}_n(l)}\, g_{\tilde{i}_n(l)}}{1 + g_{\tilde{i}_n(l)} \sum_{j=1}^{l-1} p_{\tilde{i}_n(j)}}\right).$$

The power allocation problem is formulated as problem (24). As mentioned in Section 2, $P = \{p_k, k = 1, 2, \ldots, K\}$ denotes the power allocation scheme, and (24b) denotes the SINR constraint for decoding. In each cluster $n$, the signals of user $\tilde{i}_n(l)$ should be decodable by the other users with higher channel gains. The series of SINR constraints can be represented as below: . . .
The power budget of each cluster can be automatically adjusted based on the channel characteristics of intra-cluster users. To solve (28), we first introduce an auxiliary variable $t$ to bound (28a) from below and then optimize (28) by maximizing $t$, which yields the equivalent problem (29). However, problem (29) is non-convex since (29b) is a non-convex constraint. To address this issue, we produce a convex approximation of (29b) based on SCA. Accordingly, (29) is transformed into a convex problem, which can be solved efficiently in polynomial time.
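To relax (29b), we first consider a DC function, given by

$$\xi_{n,l} = \log_2\Big(1 + g_{\tilde{i}_n(l)} \sum_{j=1}^{l} p_{\tilde{i}_n(j)}\Big) - \log_2\Big(1 + g_{\tilde{i}_n(l)} \sum_{j=1}^{l-1} p_{\tilde{i}_n(j)}\Big). \qquad (30)$$

Both terms in (30) are logarithms of affine functions of the powers, which makes (30) a DC function. Due to the concavity of logarithmic functions, the second term in (30) can be tightly bounded from above by its first-order Taylor expansion, i.e., with any given reference point $\{\hat{p}_{\tilde{i}_n(1)}, \hat{p}_{\tilde{i}_n(2)}, \ldots, \hat{p}_{\tilde{i}_n(l-1)}\}$, we have

$$\log_2\Big(1 + g_{\tilde{i}_n(l)} \sum_{j=1}^{l-1} p_{\tilde{i}_n(j)}\Big) \leq \log_2\Big(1 + g_{\tilde{i}_n(l)} \sum_{j=1}^{l-1} \hat{p}_{\tilde{i}_n(j)}\Big) + \frac{g_{\tilde{i}_n(l)} \sum_{j=1}^{l-1} \big(p_{\tilde{i}_n(j)} - \hat{p}_{\tilde{i}_n(j)}\big)}{\big(1 + g_{\tilde{i}_n(l)} \sum_{j=1}^{l-1} \hat{p}_{\tilde{i}_n(j)}\big)\ln 2}. \qquad (31)$$

Substituting (31) into (30), we obtain a concave lower bound on $\xi_{n,l}$, given by (32). The left-hand side (LHS) of (29b) can be represented in terms of $\xi_{n,l}$. Accordingly, with any given reference point $\{\hat{p}_k, k = 1, 2, \ldots, K\}$, we can derive a lower bound $B$ for the LHS of (29b), given by (33). Since $B$ is concave in $P$, (29b) can be replaced by the convex constraint that $B$ be no less than $t$. The resulting convex problem is given by (34)-(37).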
$$p_{\tilde{i}_n(l)}\, g_{\tilde{i}_n(l-1)} - \Gamma\Big(1 + \sum_{j=1}^{l-1} p_{\tilde{i}_n(j)}\, g_{\tilde{i}_n(l-1)}\Big) > 0, \quad \forall l = 2, 3, \ldots, m_n \qquad (36)$$

According to the principles of SCA, the solution of problem (29) is obtained through multiple iterations. In each iteration, we produce a convex approximation of (29) in the form of (34)-(37), which is solved by standard convex optimization tools. Specifically, in the $i$-th iteration, the reference point $\{p_k, k = 1, 2, \ldots, K\}$ used in the Taylor expansion is set to the solution obtained in the previous iteration. More details are described in Algorithm 4.
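The bounding step behind SCA can be verified numerically on a single user's rate. The sketch assumes that the rate is written as a difference of two logarithms, $\log_2(1 + g(p + I)) - \log_2(1 + gI)$, where $p$ is the user's own power and $I$ is the total power of the users that interfere with it, and that the second logarithm is linearized around a reference interference level; all names and values are illustrative.

```python
import math

def rate(gain, own_power, interf_power):
    """Rate as a difference of two concave functions of the powers."""
    return (math.log2(1.0 + gain * (own_power + interf_power))
            - math.log2(1.0 + gain * interf_power))

def rate_lower_bound(gain, own_power, interf_power, interf_ref):
    """Replace the second logarithm by its first-order expansion at interf_ref."""
    linearized = (math.log2(1.0 + gain * interf_ref)
                  + gain * (interf_power - interf_ref)
                  / ((1.0 + gain * interf_ref) * math.log(2.0)))
    return math.log2(1.0 + gain * (own_power + interf_power)) - linearized

gain, own_power, interf_ref = 2.0, 0.4, 0.3
for interf_power in [0.0, 0.1, 0.3, 0.6, 1.0]:
    exact = rate(gain, own_power, interf_power)
    bound = rate_lower_bound(gain, own_power, interf_power, interf_ref)
    print(interf_power, exact, bound)
```

The bound coincides with the exact rate at the reference point and lies below it elsewhere, which is why re-centering the expansion at the previous solution in each iteration cannot decrease the true objective.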

Simulation Results
In this section, the performance of the proposed scheme is evaluated through simulations. The distance from each user to the BS is uniformly distributed between 0 and 500 m. The channel vector of each user is modeled as the product of large-scale path loss and Rayleigh fading. We also evaluate the performance of two existing schemes [9,21] for comparison. Key parameters are summarized in Table 1. The effects of multiple factors are discussed in more detail below.
Figure 1 plots SE versus M with γ = 0.2 and S = 2. As shown in Figure 1, SE increases with M since a larger M allows a cluster to serve more users. However, this effect saturates as M grows beyond a certain point, owing to the limited total power budget.
The effect of user diversity is also considered. Figure 2 plots SE versus K with γ = 0.2 and S = 2. From Figure 2, one can observe that user diversity plays a key role in increasing SE. Moreover, the proposed scheme outperforms the two existing schemes in terms of SE.
Next, the impact of the decay parameter γ is considered. In Algorithm 1, γ determines the correlation between users' channel gains and transmitting power. As γ increases, weak users gain notably at the expense of strong users. However, the achievable SE mainly depends on strong users due to their high channel gains.
In Algorithm 2, both channel differences and correlations are involved in the design of user clustering. A relative weight θ is introduced for the two aspects, which is obtained by PSO. As discussed before, θ is of great significance in steering the ultimate clustering scheme. With M = 2, γ = 0.2 and S = 2, Figure 4 plots SE versus K under different θ. When θ = 0.1, the achievable SE is relatively small because channel differences are neglected. As θ increases, a better balance between the two factors is achieved and SE increases accordingly. However, when θ becomes too large, e.g., θ ≥ 10, the achievable SE decreases again because channel correlations are overlooked. Moreover, the performance upper bound is also considered. An exhaustive search over user clusterings is performed to find the optimal clustering scheme: based on Algorithms 1 and 4, beam-forming and power allocation are carried out for each possible clustering scheme to compare their performance. From Figure 4, the performance of the proposed scheme approaches the upper bound owing to the optimization strategy of PSO.
The number of users that can be served simultaneously increases with S. Moreover, when N ≥ S − 1, there is no interference among different clusters.
As shown in Figure 5, SE increases nearly linearly with S due to the orthogonality among different clusters.
The effect of Γ is also considered. In the SIC-based decoding process, the received SINR at users increases with Γ, which not only improves system reliability, but also allocates more of the power budget to edge users. Accordingly, as Γ increases, there is a corresponding increase in edge users' data rates. With γ = 0.2, Figure 6 plots SE versus K under different Γ. From Figure 6, the achievable SE is almost unaffected by Γ, i.e., as Γ increases, strong users' data rates are traded for weak users' rates so that all multiplexed users achieve satisfactory performance. However, the existing schemes fail to obtain a feasible solution when Γ is greater than 0.5. By contrast, the proposed scheme is more robust and can obtain a feasible solution with a larger Γ. One explanation is that the proposed scheme solves beam-forming and power allocation separately, which reduces the scale of the considered problem and helps to achieve better performance with strong robustness.
The effect of beam-forming on the optimal decoding order is also investigated. We generate 1000 instances with γ = 0.2, K = 60, S = 5 and M = 6, and apply the proposed scheme to each instance. In each cluster, the users are sorted in descending order of channel gains and of normalized channel gains, respectively. The positions of each user in the two sorted sequences are recorded, and their difference describes how often beam-forming changes the optimal SIC order. Figure 7 plots the distribution of the position differences. From Figure 7, the decoding order generally remains unchanged. However, there are circumstances where the optimal SIC order is slightly adjusted.
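The position-difference statistic used for the decoding-order study can be sketched as follows. The helper names and the example gains are hypothetical; the sketch only shows how the two rankings of the users in one cluster are compared.

```python
def rank_positions(values):
    """Position of each user when users are sorted by descending value."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    positions = [0] * len(values)
    for position, user in enumerate(order):
        positions[user] = position
    return positions

def position_differences(channel_gains, normalized_gains):
    """Per-user shift between the channel-gain order and the post-beam order."""
    before = rank_positions(channel_gains)
    after = rank_positions(normalized_gains)
    return [a - b for b, a in zip(before, after)]

# Hypothetical cluster of three users: the beam swaps the two strongest users.
channel_gains = [9.0, 5.0, 1.0]
normalized_gains = [4.0, 6.0, 0.5]
shifts = position_differences(channel_gains, normalized_gains)
print(shifts)
```

A difference of zero means that the user keeps its place in the SIC order after beam-forming, so a histogram of these values over many instances concentrated at zero corresponds to a decoding order that is rarely changed.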

Complexity Analysis
With any given relative weight, the corresponding clustering scheme can be obtained by Algorithm 2. For each user k, Algorithm 2 measures k's preference for different clusters. User k is first assigned to its favorite cluster n k . After assigning user k to cluster n k , two cases can occur:
• Case 1: The number of users in cluster n k is no greater than M;
• Case 2: The number of users in cluster n k is greater than M.
In case 1, user k can be directly assigned to cluster n k . In case 2, cluster n k should reject one user to meet the size constraint. Algorithm 2 produces (M + 1) candidate user sets for cluster n k . Based on Algorithm 1, beam-forming is performed for each candidate user set to compare their performance, and the rejected user is determined accordingly. If the rejected user is user k, k is then assigned to its second-favorite cluster. This process is repeated until either user k is successfully assigned to a cluster or all the clusters have been processed.
Accordingly, Algorithm 2 consists of two parts: part 1 measures each user's preference for different clusters; part 2 helps each user select a suitable cluster to join. The complexity of part 1 is O(SK), and the complexity of part 2 is O(SK). The complexity of Algorithm 2 is O(SK).
The optimal relative weight can be obtained by Algorithm 3 through D iterations. In each iteration, Algorithm 3 produces G possible relative weights, each corresponding to a possible clustering scheme. Based on Algorithm 1, beam-forming can be done for each possible clustering scheme to compare their performance. The complexity of Algorithm 3 is O(SK).

Conclusions
In this paper, we consider the multi-dimensional resource allocation problem for MIMO-NOMA, which comprises power allocation, user clustering and beam-forming. A three-step resource allocation framework is proposed to solve the considered problem: step 1 solves the beam-forming problem for a given user cluster; step 2 obtains the optimal clustering scheme based on the proposed beam-forming algorithm; step 3 further optimizes power allocation based on the optimal user clustering and beam-forming schemes. Simulation results show that the proposed scheme can effectively increase the received SINR at users. Additionally, the performance of the proposed scheme can approach the performance upper bound in terms of SE.
Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A
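The normalized channel gain of user $l$ is given by

$$g_l = \frac{|h_l^H w_n|^2}{\sigma^2} = \frac{1}{N\sigma^2}\left|\sum_{c=1}^{N} \kappa_{l,c}\, e^{j(\varphi_c - \phi_{l,c})}\right|^2 = \frac{1}{N\sigma^2}\left[\sum_{c=1}^{N} \kappa_{l,c}^2 + 2\sum_{c=1}^{N-1}\sum_{d=c+1}^{N} \kappa_{l,c}\,\kappa_{l,d}\cos\big((\varphi_c - \phi_{l,c}) - (\varphi_d - \phi_{l,d})\big)\right],$$

where $\kappa_{l,c}$ and $\phi_{l,c}$ denote the amplitude and phase of the $c$-th element in $h_l$, respectively, and $\varphi_c$ denotes the phase of the $c$-th element in $w_n$.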