Efficient Allocation for Downlink Multi-Channel NOMA Systems Considering Complex Constraints

To enable efficient dynamic power and channel allocation (DPCA) for users in downlink multi-channel non-orthogonal multiple access (MC-NOMA) systems, this paper treats the optimization as a combinatorial problem and proposes three heuristic solutions: a stochastic algorithm, a two-stage greedy randomized adaptive search procedure (GRASP), and a two-stage stochastic sample greedy (SSG) algorithm. Additionally, multiple complicated constraints are taken into consideration to reflect practical scenarios, for instance, the user capacity per sub-channel, the power budget per sub-channel, the power budget per user, the minimum data rate, and priority control during allocation. The effectiveness of the algorithms is illustrated by a demonstration, and their performance is compared through simulations. The stochastic solution is useful when sub-channel resources are abundant, i.e., in a spectrum-dense environment with modest data rate requirements. With a small sub-channel number, i.e., in a spectrum-scarce environment, both GRASP and SSG outperform the stochastic algorithm in data rate (achieving more than six times higher data rate) while requiring a shorter running time. SSG shows benefits over GRASP as the channel number grows due to its low computational complexity (saving 66% of the running time of GRASP while maintaining similar data rate outcomes). With a small sub-channel number, GRASP performs better than SSG in terms of average data rate, variance, and time consumption.


Introduction
To fully exploit spectrum resources and meet explosive traffic growth, the non-orthogonal multiple access (NOMA) scheme has attracted considerable attention in recent years for next-generation cellular systems, and promises significant improvements over current communication structures [1]. Distinct from communication systems (e.g., long-term evolution (LTE)) based on orthogonal multiple access (OMA) schemes (e.g., orthogonal frequency division multiple access (OFDMA) and single-carrier FDMA (SC-FDMA)), NOMA allows multiple users to share the spectrum in the time and frequency domains simultaneously, at the cost of increased computational complexity for detection at the receiver side. The NOMA scheme can outperform OMA in dense networks with large numbers of users [2]. Several NOMA-related applications have been investigated, e.g., enabling reliable and energy-efficient transmission in multiple power beacon (PB) based NOMA-mobile edge computing (MEC) [3] and applying NOMA with energy harvesting to support device-to-device (D2D) transmission [4]. NOMA is enabled by successive interference cancellation (SIC), which requires prior knowledge of the users before allocation [2]. An illustrative diagram of NOMA communication is presented in Figure 1. Therefore, efficient joint allocation of spectrum and transmit power, i.e., joint dynamic power and channel allocation (DPCA) [5], is central to realizing the NOMA concept and improving the efficiency of wireless communication. The DPCA problem is typically regarded as a function optimization problem (FOP) and solved through the joint optimization of scheduling and power control [6]. Cui et al. [7] utilized matching theory and successive convex approximation techniques to tackle the user scheduling and power allocation problems. Wei et al.
[8] proposed a reliable allocation algorithm for multi-channel NOMA (MC-NOMA) considering uncertain and imperfect channel state information (CSI). Fang et al. [9] addressed user scheduling and power allocation with imperfect CSI by proposing iterative algorithms, where the outage probability requirement, minimum user data rate, and user capacity per sub-channel are specified in modeling the allocation. The classical mixed-integer formulations solved by searching, matching, and iterative algorithms for addressing DPCA are commonly believed to require a large number of iterative steps to converge [5], due to the continuous power variable, which motivates the exploration of solutions with low running cost.
Dai et al. [2] compared Welch-bound equality spread multiple access (WSMA) based NOMA with an OMA scheme, concluding that NOMA reduces the implementation complexity. To reduce the computational burden of allocation, Liu and Petrova [5] proposed a water-filling based algorithm for allocating power among sub-channels. Nain et al. [10] proposed a method that removes users who do not satisfy a derived condition before allocation, using a pairing and Lagrangian approach. Fang et al. [11] applied convex relaxation and dual-decomposition techniques to relieve the computational load of the optimization, deriving optimal closed-form power allocation expressions with the Lagrangian approach.
Another way to relax the computation involving the power variable is to split the continuous power into a finite set of power streams during processing. Sokun et al. [12] discretized the power and resource block allocations, and proposed an iterative sub-optimal heuristic for constructing energy-efficient orthogonal frequency division multiple access (OFDMA) based wireless communication systems. D'Andreagiovanni et al. [13] sampled the power resources and constructed a single knapsack polytope with two GUB inequalities for designing a wireless network in which the user transmitters are assumed to operate at the same frequency.
In contrast to exact algorithms, heuristic solutions offer advantages in large networks due to their efficiency, especially when integrated with approximation strategies that relax the constraints. However, heuristic performance is typically not provable and cannot guarantee solution quality. D'Andreagiovanni et al. [14] integrated variable fixing with a mixed-integer programming (MIP) heuristic guided by linear programming (LP) relaxations of the constraints to address a robust optimization model. Capone et al. [15] relaxed the minimum signal-to-noise-and-interference ratio (SINR) constraint to a knapsack constraint, aiming to approach global optimality.
Several fundamental works have modeled diverse DPCA problems and investigated NOMA performance. User fairness is modeled in [16] to balance user performance during allocation by adding user sum weights to the sum rate function. Lei et al. [17] modeled the maximum user number per sub-channel as a constraint in DPCA for practical considerations, as well as the processing delay in the SIC. A distinct transmission power budget per individual sub-channel is modeled as a constraint in [5]. To guarantee the QoS of weak users, a W-QoS-based NOMA algorithm is proposed in [18], which is designed to be merged with established selection and clustering methods and guarantees a NOMA rate higher than the TDMA rate.
Submodularity is a useful tool for modeling objective functions in resource allocation problems. It simply requires that the marginal gain obtained by allocating a resource diminishes as the number of allocated resources increases. Finding the optimal solution of submodular maximization is proven to be NP-hard [19], which implies that the complexity of the problem explodes as the problem size increases. Greedy-based algorithms have been widely applied to submodular maximization problems owing to their efficiency and their ability to provide a constant-factor approximation ratio. Typical applications include Max-Cut [20], facility location selection [21,22], sensor placement [23-25], and machine learning [26-28]. Recently, greedy-based algorithms have shown great potential in reducing computational complexity when handling resource allocation problems [29,30].
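As a concrete illustration (using hypothetical sets, not drawn from the DPCA model), the diminishing-return property can be checked numerically for a coverage function, a standard example of a monotone submodular function:

```python
# Minimal sketch: f(S) = number of ground elements covered by the subsets
# indexed by S. Coverage functions are monotone and submodular.

def coverage(ground_subsets, S):
    """Number of elements covered by the subsets selected in S."""
    covered = set()
    for i in S:
        covered |= ground_subsets[i]
    return len(covered)

def marginal_gain(ground_subsets, u, Z):
    """Marginal gain value: f(Z + {u}) - f(Z)."""
    return coverage(ground_subsets, Z | {u}) - coverage(ground_subsets, Z)

subsets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}}
small, large = {0}, {0, 1}                     # small is a subset of large
gain_small = marginal_gain(subsets, 2, small)  # adding u = 2 to the smaller set
gain_large = marginal_gain(subsets, 2, large)  # adding u = 2 to the larger set
assert gain_small >= gain_large                # diminishing returns holds
```

Adding subset 2 to the smaller set gains three new elements, while adding it to the larger set gains only two, since element 4 is already covered.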
This paper treats the allocation problem as a combinatorial problem by discretizing the continuous power resources into finite sets based on a sampling concept. Three heuristic solutions are proposed for their efficiency and ease of modeling: a stochastic strategy, a greedy randomized adaptive search procedure (GRASP), and a stochastic sample greedy (SSG) strategy. Distinct from solvers for non-convex problems, the applied combinatorial solver does not require searching for optimal continuous results, which accelerates the computation. Distinct from other greedy-based solvers [31,32], the randomization of the sub-channel allocation and the resource assignment are integrated, and multiple practical constraints are considered.
The contributions of this paper are: (1) We propose greedy-enabled low-complexity algorithms applicable to general environments (including extreme spectrum-dense and spectrum-scarce environments), where DPCA is regarded as a combinatorial problem. (2) To reflect practical scenarios, we introduce feasibility constraints (e.g., the user number limit per sub-channel, maximum power restrictions on users and BSs, and the minimum data rate per user) as hard constraints that must be ensured during allocation. Priority control management is represented as a soft constraint, meaning that the function is taken into account but need not be guaranteed at every iteration.
(3) Several strategies are presented specifically for addressing DPCA. For instance, a two-phase allocation structure is applied to maintain the hard constraint of the minimum data rate requirement, the user capacity is enforced through the construction of the objective function, and the sampling of power values is applied to reduce the computational complexity of the combinatorial problem.
This paper is organized as follows. Section 2 discusses the typical greedy algorithm along with its worst-case performance. Section 3 presents the formulation of the DPCA problem considered in this paper. Section 4 proposes heuristic algorithms for solving the DPCA problem. The performance of the proposed algorithms is verified in Section 5 through simulations. Section 6 concludes the paper.

Preliminary Background
This section discusses characteristics of the unified greedy algorithm (see Algorithm 1) for addressing allocation problems with a single objective function.

Algorithm 1 Greedy algorithm subject to matroid constraints

Input: f : 2^N → R≥0, N, I. Output: a set S ∈ I.
1: S ← ∅
2: while ∃u ∈ N such that S ∪ {u} ∈ I do
3:   u* ← argmax{∆f(u|S) : u ∈ N, S ∪ {u} ∈ I}
4:   S ← S ∪ {u*}
5: end while
6: return S

Since the greedy policy essentially adopts iterative allocation, we denote u as the current resource of interest and Z as the allocated set. The objective function with resource u is denoted f(u|Z) and is to be maximized. Consequently, the differential objective function ∆f(u|Z) can be calculated using the marginal gain value (mgv) [33]:

Definition 1 (Marginal gain value (mgv)). For a set function f : 2^N → R, a set Z ⊂ N, and an element u ∈ N, the mgv of u given Z is defined as
∆f(u|Z) ≐ f(Z ∪ {u}) − f(Z),
where N is named the ground set, which is a finite set. The notation "≐" means equal by definition.
The objective function considered in this work is submodular and monotone. The formal definitions of submodularity and monotonicity are provided in the following.

Definition 2 (Submodularity). A set function f : 2^N → R is submodular if, for all X, Y ⊆ N,
f(X) + f(Y) ≥ f(X ∪ Y) + f(X ∩ Y),
where N is the ground set. N, X, and Y are finite sets.

Definition 3 (Monotonicity). A set function f : 2^N → R is monotone if f(X) ≤ f(Y) for all X ⊆ Y ⊆ N.
Equivalently, submodularity can also be derived by combining Definitions 1 and 2 as:
∆f(u|X) ≥ ∆f(u|Y), for all X ⊆ Y ⊆ N and u ∈ N\Y. (1)
Equation (1) is also known as the diminishing return property [34], an important property of submodular functions. Intuitively, the marginal gain of adding a new element u to a set is larger when the set is smaller.
The constraint that one resource can be allocated to at most one user is represented by a partition matroid constraint. A matroid is defined as follows: Definition 4 (Matroid [35]). A matroid is a pair M = (N, I), where N is the ground set and I ⊆ 2^N is a collection of independent sets, satisfying: (i) if Y ∈ I and X ⊆ Y, then X ∈ I; and (ii) if X, Y ∈ I and |X| < |Y|, then ∃u ∈ Y\X such that X ∪ {u} ∈ I. |·| denotes the cardinality operation.
Given the above definitions of mgv, submodularity, monotonicity, and matroid, a partition matroid constraint can be used in formulating the maximization problem, where the partition matroid requires that an independent subset S contain at most a certain number of elements from each of the disjoint partitions of N. Consequently, the worst-case performance of the typical greedy algorithm is given by: Theorem 1. The greedy algorithm achieves an approximation ratio of at least 50% of optimality for maximizing monotone submodular functions subject to matroid constraints.
Proof. Let OPT\S = {a_1, ..., a_l}, where OPT is the optimal solution and S is the greedy solution. By the matroid exchange property, the elements of OPT\S can be matched to greedy steps such that a_j could have been added at step j. Let a*_j denote the element selected by the greedy algorithm at step j given S_{j−1}. Since a*_j maximizes the marginal gain at step j, we have
∆f(a_j | S_{j−1}) ≤ ∆f(a*_j | S_{j−1}).
When the objective function is monotone,
f(OPT) ≤ f(OPT ∪ S).
Then, by submodularity, we get
f(OPT ∪ S) ≤ f(S) + Σ_{j=1}^{l} ∆f(a_j | S_{j−1}) ≤ f(S) + Σ_{j=1}^{l} ∆f(a*_j | S_{j−1}) ≤ 2f(S).
It is worth noting that, when the hard constraints are relaxed (such as in the spectrum-dense environment), the greedy algorithm used in this paper can be regarded as a simplified version of the typical greedy algorithm, and hence the 50% approximation guarantee applies.
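The generic procedure of Algorithm 1 can be sketched for the special case of a partition matroid (the weight values and block structure below are hypothetical, chosen only to exercise the loop):

```python
import math

def greedy_partition_matroid(f, blocks, caps):
    """Algorithm 1 specialized to a partition matroid: at most caps[j]
    elements may be chosen from block j. At every step the feasible
    element with the largest marginal gain value (mgv) is added."""
    S = set()
    used = {j: 0 for j in blocks}
    while True:
        best, best_gain = None, 0.0
        for j, elems in blocks.items():
            if used[j] >= caps[j]:
                continue  # S + {u} would not be independent
            for u in elems - S:
                gain = f(S | {u}) - f(S)  # mgv of u given S
                if gain > best_gain:
                    best, best_gain = (j, u), gain
        if best is None:  # no feasible element with positive gain remains
            break
        j, u = best
        S.add(u)
        used[j] += 1
    return S

# Monotone submodular example: a concave function of a modular weight sum.
weights = {'a': 4.0, 'b': 1.0, 'c': 9.0}
f = lambda S: math.sqrt(sum(weights[u] for u in S))
blocks = {0: {'a', 'b'}, 1: {'c'}}   # two disjoint partitions
caps = {0: 1, 1: 1}                  # one element allowed per partition
```

With these inputs the greedy rule first takes 'c' (largest gain), then 'a' from the other block, returning {'a', 'c'}.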

Problem Formulation
Solving the DPCA problem means determining the user sequence and the power sequence, i.e., assigning users and their transmission power to the individual sub-channels.
For the downlink MC-NOMA system [5], we suppose that m ∈ M = {1, 2, 3, ..., M} users are allocated to n ∈ N = {1, 2, 3, ..., N} sub-channels in the base station (BS). The allocated sub-channels form a sub-channel set l ∈ L = {1, 2, 3, ..., L}, where L ⊂ N. We define U as the allocated user sequence set, whose elements are sequences U ∈ U with U ⊂ M. The power budget for the mth user is defined as w_m ∈ W, where W is the user power budget set representing the maximum power output of each transmission. The power budget for the nth sub-channel is defined as k_n ∈ K with a given power budget set K. The overall power budget for all N sub-channels is thus K_x = Σ_{i=1}^{N} k_i. We define the power sequence set P, which is treated as a discrete variable through the sampling strategy in order to construct the combinatorial optimization problem. For the user sequence U, the allocated cumulative power p over the nth sub-channel is defined as p_{U(n)} and is obtained by p_{U(n)} = Σ_{i=1}^{r_n} p_{U_i(n)}. Similarly, p_{U(m)} denotes the allocated cumulative power for the mth user, with p_{U(m)} = Σ_{i=1}^{L} p_{U_i(m)}. The instantaneous channel quality indicator (CQI) [5], defined as φ ∈ Φ, is used as the SINR to represent the effects of the channel environment. φ_{U(n)} represents the CQI of the allocated user U over the nth sub-channel. Channel propagation effects, such as small-scale fading, large-scale fading, and the Doppler effect, are reflected in the CQI but are not modeled in this paper.
We assume that the inter-cell interference [36] induced by multiplexing multiple users over the same spectrum is resolved by SIC. The instantaneous data rate D for the allocated user U over the nth sub-channel is estimated by Shannon's theorem:
D_{U(n)} = B ln(1 + p_{U(n)} φ_{U(n)}), (2)
where B denotes the bandwidth of the sub-channel, assumed equal for all sub-channels. The unit of the data rate is nat/s. The cumulative data rate D̄ is obtained by summing the data rates D of the individual users m over the N sub-channels:
D̄ = Σ_{m=1}^{M} D̄_m, with D̄_m = Σ_{n=1}^{N} D_{m(n)}. (3)
Consequently, the general DPCA problem can be solved by maximizing the data rate for the allocated user set U:
max_{U, P} D̄. (4)
This paper considers several feasibility constraints and useful functions in practice, i.e., the user number limit per sub-channel, maximum power restrictions on users and BSs, the minimum data rate per user, and user priority control during allocation. The former three feasibility constraints are treated as hard constraints, whereas the priority control function is a soft constraint that relaxes the computation, as it need not be imposed at each allocation. This paper aims at maximizing the cumulative data rate with discrete unit power costs sampled in advance. The overall DPCA problem is formulated as a combinatorial optimization problem extending the format of (4), and is reformulated as below:
max_{U, P} D̄ = Σ_{m=1}^{M} β_m D̄_m
s.t. C1: r_n ≤ s_n, ∀n ∈ N,
     C2: p_{U(n)} ≤ k_n, ∀n ∈ N,
     C3: p_{U(m)} ≤ w_m, ∀m ∈ M,
     C4: D̄_m ≥ q_m, ∀m ∈ M, (5)
where P is regarded as a discrete variable in the proposed solutions. As depicted in (5), we impose the following constraints when constructing the objective function: (1) The maximum user number r_n = len(U_n) multiplexed per individual sub-channel is limited to a certain number s_n ∈ S [37] (the first inequality constraint, C1), where the len function returns the length of the sequence.
(2) The power budget for each sub-channel p_{U(n)} is limited to a maximum power budget k_n ∈ K for all n ∈ N to prevent over-usage of a single sub-channel (the second inequality constraint, C2), where p_{U(n)} = Σ_{i=1}^{r_n} p_{U_i(n)}. (3) Given the maximum power budget among users, the allocated cumulative power p_{U(m)} for the mth user is limited to an individual power budget w_m ∈ W for each m ∈ M (the third inequality constraint, C3). (4) To ensure the minimum QoS requirement, we apply the fourth inequality constraint, C4, which guarantees the minimum data rate among users. (5) Priority control during allocation matters due to the multi-level nature of user service qualities. This function is implemented by adding weights β to the cumulative data rate function D̄ in (3), i.e., the term β_m D̄_m in the objective.
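The hard constraints C1-C4 can be sketched as a feasibility check over a candidate allocation. The helper below is illustrative only (the function names, the dictionary-based allocation encoding, and the linear power units are our assumptions), with the per-allocation rate following D = B ln(1 + pφ):

```python
import math

def rate(p, phi, B=200e3):
    """Shannon rate in nat/s for power p and CQI (SINR) phi."""
    return B * math.log(1.0 + p * phi)

def feasible(alloc, s, k, w, q, phi, B=200e3):
    """Check hard constraints C1-C4 for a candidate allocation.
    alloc[(m, n)] = power of user m on sub-channel n (linear units).
    s[n]: user cap per sub-channel, k[n]: sub-channel power budget,
    w[m]: user power budget, q[m]: minimum data rate per user."""
    users = {m for (m, _) in alloc}
    chans = {n for (_, n) in alloc}
    # C1: at most s[n] users multiplexed on sub-channel n
    for n in chans:
        if len({m for (m, nn) in alloc if nn == n}) > s[n]:
            return False
    # C2: cumulative power on sub-channel n within k[n]
    for n in chans:
        if sum(p for (_, nn), p in alloc.items() if nn == n) > k[n]:
            return False
    # C3: cumulative power of user m within w[m]
    for m in users:
        if sum(p for (mm, _), p in alloc.items() if mm == m) > w[m]:
            return False
    # C4: per-user cumulative rate meets the minimum q[m]
    for m in users:
        if sum(rate(p, phi[(mm, n)], B)
               for (mm, n), p in alloc.items() if mm == m) < q[m]:
            return False
    return True
```

A solver can use such a predicate to reject infeasible candidate moves before evaluating their objective values.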
The defined objective function exhibits submodularity and monotonicity according to the discussion in Section 2. It is worth noting that the presented DPCA problem is difficult to solve with off-the-shelf algorithms. This paper presents heuristic solutions to address it, which are discussed in the following section.

Heuristic Solutions
In this paper, we consider two cases: (1) the BS holds only the minimum resources for maintaining the least data rate among users (see Theorem 2), which means that users may not be allocated sufficient power and sub-channels to maximize the data rate; and (2) the BS resources are sufficient to achieve the maximum data rate for all users (see Theorem 3).

Theorem 2. To successfully enable DPCA under the minimum data rate constraints, the overall power budget K_x needs to satisfy the condition:
K_x ≥ M (e^{Q_max/B} − 1)/Φ_min.
Proof. The transmission power for a single user can be estimated from Shannon's theorem, i.e., p = (e^{D/B} − 1)/φ. Given the diversity in D ∈ Q and φ ∈ Φ, the worst-case transmission power for a single user is p_max = (e^{Q_max/B} − 1)/Φ_min. For ease of expression, the power budget can be configured to M × p_max to mitigate the computation load, which yields the required K_x value.
Theorem 3. The overall power budget K_x needs to satisfy the inequality K_x ≥ Σ(W) to enable the maximum data rate under the power limit p_{U(m)} ≤ w_m.
Given that the K_x value in Theorem 3 exceeds the K_x value in Theorem 2, the following relationship between Q, Φ, and W is obtained:
mean(W) ≥ (e^{Q_max/B} − 1)/Φ_min,
where the mean function denotes the average value of a set.
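The power inversion used in the proof, p = (e^{D/B} − 1)/φ, can be checked numerically. The sketch below uses hypothetical values for Q_max, Φ_min, and M:

```python
import math

def rate(p, phi, B):
    """Shannon rate D = B*ln(1 + p*phi) in nat/s."""
    return B * math.log(1.0 + p * phi)

def min_power(D, phi, B):
    """Inverse of the rate formula: smallest power achieving rate D."""
    return (math.exp(D / B) - 1.0) / phi

B = 200e3                          # sub-channel bandwidth (Hz)
Q_max, Phi_min, M = 1.0e5, 0.5, 3  # hypothetical worst-case demand and CQI
p_max = min_power(Q_max, Phi_min, B)
K_x = M * p_max                    # sufficient overall budget per Theorem 2
```

The round trip rate(min_power(D, φ, B), φ, B) recovers D exactly, confirming that the inversion is consistent with the rate model.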
It is worth noting that the constraints of the maximum sub-channel power budgets K and the maximum user number S affect the allocated sub-channel sequence length L. Therefore, the sub-channel number N is assumed to guarantee a minimum number of sub-channels, i.e., the division of the maximal power budget by the minimal power budget among sub-channels.

Remark 1. For maintaining the complex constraints, the sub-channel number N is assumed to satisfy the following condition, where the min(x, y) function returns the minimum value of the inputs x and y.
Given the difficulty of addressing Cases (1) and (2) under the complex constraints with existing solutions, heuristic solutions are presented in the following subsections to accelerate the computation for DPCA, i.e., the stochastic strategy, the GRASP strategy, and the SSG strategy.

Stochastic Strategy
The stochastic strategy based solver addresses DPCA by generating random values or sequences, i.e., the allocated sub-channel set L, the user sequence set U, and the corresponding user power set P. It is worth noting that fairness fitting is negligible under the stochastic strategy, since all users are treated equally in the allocation. The optimization function (4) holds. If the β sequence follows a decreasing order, the priority control function for the M users is reflected as the condition statement sort(D̄) == D̄, where sort is the re-ordering function. We assume s_n ≤ k_n, ∀n ∈ N, to reflect the user number limit among sub-channels. The stochastic strategy based DPCA solver is presented in Algorithm 2.

11: end if
12: if D̄_m ≥ q_m, ∀q_m ∈ Q then
13:   flag_q = 1.
14: end if
15: flag = flag_β × flag_q.
16: end while
17: return L, U, and P

The complicated constraints are taken into consideration as follows: the maximum user number is enforced in Step (6); the sub-channel power budget and cumulative power budget are implemented in Step (8) by updating the remaining power budgets Kr and Wr for K and W, respectively; and the constraints of the minimum data rate and priority control are enforced via flag_q (Step (14)) and flag_β (Step (12)), respectively.
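A minimal sketch of the stochastic strategy follows. This is our own simplified variant (all helper names are hypothetical, the priority-control flag is omitted for brevity, and powers are drawn in linear units): random user/sub-channel/power triples are drawn, each draw respects the remaining budgets Kr and Wr, and drawing stops once every user's cumulative rate reaches its minimum.

```python
import math
import random

def stochastic_dpca(M, N, K, W, Q, phi, B=200e3, max_draws=10000, seed=0):
    """Stochastic DPCA sketch: random feasible draws until every user's
    cumulative rate reaches its minimum Q[m] (the flag_q condition).
    Kr/Wr track remaining sub-channel/user budgets. Returns (L, U, P)."""
    rng = random.Random(seed)
    Kr, Wr = list(K), list(W)
    D = [0.0] * M                     # cumulative data rate per user
    L, U, P = [], [], []
    for _ in range(max_draws):
        if all(D[m] >= Q[m] for m in range(M)):  # flag_q = 1: done
            return L, U, P
        m, n = rng.randrange(M), rng.randrange(N)
        p = rng.uniform(0.0, min(Kr[n], Wr[m]))  # respect both budgets
        if p <= 0.0:
            continue                             # nothing left to draw here
        Kr[n] -= p
        Wr[m] -= p
        D[m] += B * math.log(1.0 + p * phi[m][n])
        L.append(n); U.append(m); P.append(p)
    return L, U, P                    # budgets exhausted before meeting Q
```

Because the draws are uniform, no user or sub-channel is preferred, matching the equal-treatment property noted above.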
The expectation of D̄ can be estimated as follows, where the average sub-channel number is E[L] = (L_max + N + 1)/2, and the expectation of the data rate per sub-channel is derived accordingly.

Greedy Randomized Adaptive Search
Given that the typical greedy algorithm is only applicable to single-objective optimization (DPCA in this paper is a joint optimization problem), the greedy randomized adaptive search procedure (GRASP) is adopted and modified owing to its efficiency for combinatorial optimization problems. Given the continuity of the power candidates among sub-channels, discretizing the power values is important both to mitigate the computational complexity and to simplify the local-search procedure for obtaining locally optimal outcomes.
The typical GRASP consists of two phases, i.e., a construction phase and a local search phase. The construction phase selects a preferable outcome using a randomized adaptive greedy strategy. Based on the selected outcome, the local search phase improves the performance at a cost of additional computational complexity. It is worth noting that local search could be added to both GRASP and SSG; it is therefore omitted here for a fair comparison.
Due to the lower-bound constraint, i.e., the minimum data rate limit for users, a two-stage GRASP (with a combined construction phase) is presented. In the first stage, the minimum data rate is allocated to every user. Afterwards, the remaining power and sub-channel resources are assigned in the second stage to maximize performance.
For the optimal selection at each iteration, we reformulate the objective function from Equation (5) into a piece-wise marginal gain function ∆F_{m,n}, subject to 0 ≤ p_{U(n)} ≤ k_n, k_n ∈ K, ∀n ∈ N, where λ is a coefficient factor ranging in (0, 1) that introduces the submodular characteristic for sub-channels (resulting from the CQI), and p̃_{m,n} represents the power sum for the mth user over the nth sub-channel. The piece-wise format ensures that the allocated user number r_n per sub-channel is limited to the sub-channel capacity s_n. Consequently, the GRASP based solver for DPCA is presented in Algorithm 3.
As depicted in Algorithm 3, the first stage (Steps (2)-(6)) of the allocation uses a typical greedy strategy (power resources are not segmented into pieces) until the minimum data rate is satisfied for each user. Before the further allocation of the remaining power resources, the power K over the sub-channels is divided into segments K̃ with a fixed length of ∆k_n.
3: [m, n, p] = max(∆F_{m,n}(min(f^{−1}(Dr_m), k_n), s_n)), ∀n ∈ N, ∀m ∈ M.
4: Save and combine m to U, n to L, and p to P. Update s_n.

5: Dr_m = Dr_m − f(p). Kr_n = Kr_n − p. Wr_m = Wr_m − p.
6: end while
7: Discretize K with a fixed interval ∆k to form a discrete power set ∆k_n ∈ K̃_n among the N sub-channels.
8: while 1 do
9: RCL ← R_{m,n} = {∆k_n ∈ K̃_n, ∀n ∈ N, ∀m ∈ M | ∆F_{m,n}(min(Wr_m, ∆k_n), s_n)}.
10: if Σ R_{m,n} ≠ 0, ∀n ∈ N, ∀m ∈ M then
11: c_max = max(R_{m,n}), R_{m,n} ∈ RCL.
12: c_min = min(R_{m,n}), R_{m,n} ∈ RCL. // the minimum excludes zero values.
13: R̃CL ← {R_{m,n} ≥ c_min + α(c_max − c_min) : R_{m,n} ∈ RCL}.
14: Select an element R_{m,n} from R̃CL at random.
15: p = min(Wr_m, ∆k_n).
16: Save and combine m to U, n to L, and p to P. Update s_n.

17: Wr_m = Wr_m − p. Kr_n = Kr_n − p. K̃_n = K̃_n − ∆k_n.

The second stage (Steps (8)-(20)) follows the concept of the construction phase in GRASP. We first evaluate the objective function (11) by an ergodic calculation of the power segments over all users and sub-channels, and save the objective values with user and sub-channel indices R_{m,n} as a restricted candidate list (RCL). The RCL is then filtered to reduce its size, following {R_{m,n} ≥ c_min + α(c_max − c_min) : R_{m,n} ∈ RCL}, where c_min and c_max correspond to the minimum and maximum values in the RCL set, respectively. The α parameter adjusts the randomness in generating the RCL (the selection becomes fully random when α = 0 and fully greedy when α = 1). We randomly select a value from the filtered RCL set R̃CL, which determines the allocated user, sub-channel, and power.
The termination of the second stage (Step (10)) is triggered by the following three cases: (1) full utilization of the user power budgets, Wr_m = 0, ∀m ∈ M; (2) exhaustion of the sub-channel power budgets, Kr_n = 0, ∀n ∈ N; and (3) the user number constraint r_n = s_n in the second-phase allocation.
It is worth noting that, owing to the user capacity constraint, zero values in the objective function may lead to an incorrect minimum value of R_{m,n}. Therefore, zero values should be ignored in the calculation (see Step (12)).
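The RCL filtering in Steps (9)-(14) can be sketched as follows (the gain values and candidate keys are hypothetical); note that zero gains, which merely signal a capacity-blocked candidate, are excluded before c_min is computed:

```python
import random

def filter_rcl(rcl, alpha, rng=random):
    """GRASP construction step: keep candidates whose gain is at least
    c_min + alpha*(c_max - c_min), ignoring zero gains (candidates
    blocked by the user-capacity constraint), then pick one at random."""
    nonzero = {key: g for key, g in rcl.items() if g > 0.0}
    if not nonzero:
        return None                   # termination: nothing allocatable
    c_max, c_min = max(nonzero.values()), min(nonzero.values())
    threshold = c_min + alpha * (c_max - c_min)
    restricted = [key for key, g in nonzero.items() if g >= threshold]
    return rng.choice(restricted)

# alpha = 1 keeps only the best candidate (pure greedy);
# alpha = 0 keeps every nonzero candidate (pure random).
rcl = {('u1', 'ch1'): 3.0, ('u1', 'ch2'): 0.0, ('u2', 'ch1'): 1.0}
best = filter_rcl(rcl, alpha=1.0)   # only ('u1', 'ch1') survives the filter
```

Excluding zeros before taking the minimum mirrors the caveat above: a zero gain would otherwise drag c_min down and distort the threshold.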
The worst-case computational complexity is:
O( M N f^{-1}(Q_max)/K_min + M ((W_max − f^{-1}(Q_min))/∆K) · ((K_max N − f^{-1}(Q_min) M)/∆K) ). (12)
Proof. Since the proposed GRASP algorithm has two stages, we analyze the computational complexity of the two stages separately.
In the first stage, the algorithm meets the minimum power requirement for each user. In the worst case, the minimum power requirement of one user needs multiple channels to be met. The number of channels needed to meet the minimum power requirement for each user is f^{-1}(Q_max)/K_min, which equals the number of greedy iterations. In each iteration, there are N channels as options for each user to select. Therefore, the computational complexity for each user is f^{-1}(Q_max)N/K_min, and the total computational complexity for all users is O(M f^{-1}(Q_max)N/K_min).
In the second stage, the algorithm allocates the rest of the resources to the users. The maximum total amount of power left over all channels is K_max N − f^{-1}(Q_min)M. We discretize the remaining power with an interval of ∆K, forming (K_max N − f^{-1}(Q_min)M)/∆K power fractions. The maximum number of iterations for each user in the second stage is (W_max − f^{-1}(Q_min))/∆K. Therefore, the computational complexity of the second stage is O( M ((W_max − f^{-1}(Q_min))/∆K) · ((K_max N − f^{-1}(Q_min)M)/∆K) ). The total computational complexity of the algorithm is the sum of the complexities of the two stages.

Stochastic Sample Greedy Strategy
Given the high computational load introduced by the ergodic processing when building the RCL, another heuristic allocation solution, i.e., SSG, is proposed for addressing the DPCA problem. The main concept is to apply stochastic sampling of the sub-channels with probability ρ to avoid the ergodic searching of the sub-channels. Similar to GRASP, the discretization of the power values and the two-stage mechanism are needed for acceleration and for maintaining the minimum data rate constraint. The proposed stochastic greedy based solver is presented in Algorithm 4.
7: [m, n, p] = max(∆F_{m,n}(min(f^{−1}(Dr_m), k_n), s_n)), ∀n ∈ Ns, ∀m ∈ M.
8: Save m to U, n to L, and p to P.

9: Dr_m = Dr_m − f(p). Kr_n = Kr_n − p. Wr_m = Wr_m − p.
10: end while
11: Discretize K with fixed interval ∆k to form a discrete power set ∆k_n ∈ K̃_n among the N sub-channels.
12: while 1 do
13: [m, n, p] = max(∆F_{m,n}(min(f^{−1}(Wr_m), k_n, Kr_n), s_n)), ∀n ∈ Ns, ∀m ∈ M, ∆k_n ∈ K̃_n.
14: if Σ ∆F_{m,n}(min(Wr_m, k_n), s_n) ≠ 0, ∀n ∈ Ns, ∀m ∈ M then
15: Save m to U, n to L, and p to P.
16: Wr_m = Wr_m − p. Kr_n = Kr_n − p. K̃_n = K̃_n − ∆k_n.
17: else
18: Break.
19: end if
20: end while
21: return L, U, and P

Similar to the GRASP algorithm, the constraint of the sub-channel limits and the priority control function are reflected in the objective function (11); the power budget limit of the sub-channels is reflected in Steps (11) and (13); and the power budget limit of the users is represented in Step (13).
Owing to the stochastic sampling of the sub-channels, the computational complexity can be derived from (12) by replacing N with ρN. The complexity of SSG is:
O( M ρN f^{-1}(Q_max)/K_min + M ((W_max − f^{-1}(Q_min))/∆K) · ((K_max ρN − f^{-1}(Q_min)M)/∆K) ). (13)
Comparing the computational complexities in (12) and (13), SSG reduces the complexity by down-sampling the sub-channels, controlled by the ρ factor.
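The ρ-sampling that distinguishes SSG from GRASP can be sketched in a few lines (a minimal illustration; the greedy scan over the sampled set is unchanged):

```python
import random

def sample_subchannels(N, rho, rng=random):
    """SSG down-sampling: each of the N sub-channels enters the greedy
    scan independently with probability rho, so the expected number of
    scanned candidates per iteration drops from N to rho*N."""
    return [n for n in range(N) if rng.random() < rho]

rng = random.Random(42)
Ns = sample_subchannels(100, 0.3, rng)  # roughly 30 sub-channels on average
```

The greedy step then ranges over the sampled set Ns instead of all N sub-channels, which is exactly the substitution N → ρN in the complexity expression.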
When the resources are abundant for the users, the theoretical performance of the proposed greedy-based algorithms approaches that of the greedy algorithm [38] subject to matroid constraints in the monotone case. According to Buchbinder et al. [39] and [34], the sampling processes in GRASP and the stochastic sample greedy algorithm do not reduce the theoretical approximation guarantee of the original greedy algorithm [38] for maximizing monotone submodular functions, except that the approximation guarantees become stochastic. The theoretical approximation guarantee of the original greedy algorithm is analyzed through Algorithm 1.
It is worth noting that the applied greedy strategy over discrete power resources follows a mechanism similar to water-filling concepts. The computation is largely mitigated by the greedy policy, and the complex constraints can be introduced conveniently. However, the current theoretical performance model is only applicable to optimization problems with upper bounds, while we consider lower bounds, i.e., the minimum data rate for each user and the user capacity among sub-channels in DPCA. The best performance, expectation value, and variance are difficult to estimate theoretically; hence, we simulate the allocations and present the results in the following section.

Simulations and Discussions
In this section, we first present a demonstration of DPCA with the presented stochastic, GRASP, and SSG solutions, respectively. The performance of the presented solutions is then evaluated in terms of the data rate and the actual computation time for varying sub-channel and user numbers. Given the trade-offs involved in configuring the parameters of GRASP and SSG, a trade-off analysis is carried out, and the discussion is presented for guidance purposes.

Use Case
We consider a use case of allocating five sub-channels to three users with complicated constraints. The detailed parameter configurations are presented in Table 1. The sub-channel is configured with a 200 kHz bandwidth [40]. The power budget configurations follow Fang et al. [9], where the power budget per sub-channel averages 30 dBm, and the power budget per user averages 40 dBm. The user capacity per sub-channel is 3. The trade-off factor α is set to 0.8. The power budget is sampled with a fixed interval of 1 dBm. We assume five sub-channels are available for allocation to meet the requirement in Remark 1. The channel propagation model and other environmental interference effects are reflected in the CQI [9]. We define the remaining power budget as W − P_U. The stochastic algorithm, GRASP, and SSG results for DPCA are presented in Table 2. As shown in Table 2, all constraints in (5), i.e., the power restrictions, minimum data rate, and priority management, are satisfied by the three solutions. The GRASP result outperforms the other two in terms of the remaining power budget and data rate. Specifically, GRASP leaves the smallest remaining power budget of 3.2 dBm, followed by SSG with 35.5 dBm and the stochastic algorithm with 42.25 dBm. In terms of the cumulative data rate D̄, the GRASP allocation yields the largest value (9.22 × 10^3 kbps), which is nearly 1.20 times that of the stochastic and SSG solutions. Due to the random selection of sub-channels before allocation, the stochastic and SSG algorithms may leave some sub-channels vacant, contributing to their performance degradation (SSG's data rate is smaller than GRASP's for the first user).

Performance Analysis
For further analysis of the proposed solutions, we assume homogeneous configurations among the sub-channels, i.e., K = 30 dBm, W = 60 dBm, S = 3, and Q = 1000. The elements in the CQI are randomly generated following the uniform distribution over [5, 6] dB to avoid unique allocation outcomes. The users are treated equally, hence the priority management is disabled (flag_β = 1) for the stochastic algorithm, and β = 1 for the other two algorithms. By replicating the simulation 200 times, we obtain the averaged cumulative data rates D̄ along with error bars representing the performance uncertainty resulting from the random selection of sub-channels. The performance results with the proposed solutions are presented in Figure 2.

As depicted in Figure 2a, the GRASP result outperforms the stochastic solution in terms of showing a larger data rate value and presenting smaller uncertainties (the steady mean data rate of GRASP is 6.26 times that of the stochastic algorithm, while the steady spread is around 0.1 times that of the stochastic one). Compared with SSG, GRASP displays better average values and spreads when the sub-channel number N ≤ 6 (the increment rate of GRASP is larger than that of SSG by 17%, while the largest spread of SSG is 48 times that of GRASP when N = 6). With the further increment of the sub-channel number, when N > 7, GRASP and SSG present similar data rates (the absolute error of the steady data rate values is limited to 0.2 × 10^4 kbps).
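The replication procedure described above can be sketched as follows; `allocate` is a placeholder for any of the three allocators (it receives a CQI matrix and returns a cumulative data rate), and only the uniform CQI range and the 200-replication averaging follow the configuration in the text.

```python
import random
import statistics

def simulate(allocate, n_users=3, n_channels=5, reps=200, seed=0):
    """Average cumulative data rate over repeated runs with random CQI.
    `allocate` stands in for the stochastic, GRASP, or SSG allocator.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        # CQI drawn uniformly in [5, 6] dB to avoid identical outcomes
        cqi = [[rng.uniform(5.0, 6.0) for _ in range(n_channels)]
               for _ in range(n_users)]
        results.append(allocate(cqi))
    mean = statistics.mean(results)
    spread = max(results) - min(results)  # error-bar length
    return mean, spread
```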
When analyzing the data rate curves of the solutions separately, some further results for the stochastic solution are obtained, as shown in Figure 2a. With the increasing sub-channel number N, the data rate of the stochastic solution increases slightly (the climb rate is around 0.03 kbps per sub-channel), indicating that the stochastic solution maintains a relatively stable performance on average. The spread length of the stochastic solution increases, while the increment rate reduces with the further growth of N. The lower bound of the data rate is close to a fixed value attributed to the configured minimum data rate, i.e., 0.3 × 10^4 kbps in this case.
Moreover, the GRASP performance presents stable results with high data rate values and small uncertainties. There is a slight reduction of the data rate when N > 6, and we attribute this phenomenon to the non-optimal outcome in the first allocation step. Specifically, the allocated power resources are not sampled when guaranteeing the minimum requirement of transmission. With the increment of the user number, such degradation may be amplified, further degrading the communication performance.
The averaged data rate curve for SSG grows with the increasing sub-channel number (nearly reaching the peak when N = 7), along with a reduction of the increment rate. We can roughly observe that the spreads of SSG shorten with the growing sub-channel number (the spread is negligible when N = 12, i.e., four times M = 3). Therefore, the uncertainty of the SSG solution narrows (i.e., it becomes more reliable) with more frequency resources. The performance degradation attributed to the randomization in removing sub-channel candidates could be negligible under the spectrum dense environment, i.e., with sufficient sub-channels.
To analyze the allocation performance on the sub-channel capacity, we varied the user number M from 1 to 8, and the data rate results are presented in Figure 2b. Similar to the results in Figure 2a, the GRASP and SSG algorithms outperform the stochastic algorithm for each M value. The peak values for GRASP and SSG are obtained when M = 5, which means the optimal capacity with N = 10 sub-channels is M = 5 users, i.e., M = N/2. The reason for the increasing data rate curves of GRASP and SSG when M < 5 is that the current sub-channels are sufficient for allocating a small number of users. Afterward, the data rate decreases because the resources become insufficient with the growing user number. Nevertheless, such a reduction is not reflected in the stochastic solution, which means the resources are still not fully exploited by the stochastic one. The best data rate of SSG is similar to GRASP's, while the worst data rates may vary when M ≥ 4. Both GRASP and SSG present close performance in terms of the data rate values and error-bar spreads, representing a similar performance under the resource dense environment (i.e., a small user number and a large sub-channel number).
We simulated the practical running time for analyzing the computational complexity, where the PC specification was an Intel i7-6700 CPU with 16 GB of RAM. We enabled the priority management function in the simulation (i.e., β = 1, 2, 3 for GRASP and SSG), and averaged the CPU running time over the allocation iterations. The time consumption results with diverse channel numbers are presented in Figure 3a. Given the practical running results, we observe that GRASP and SSG consume more time with the increasing sub-channel number N, because more sub-channel resources are taken into processing. The time consumption curve for the stochastic algorithm presents a declining tendency. Such a decrease results from the fact that there is a higher possibility to meet the constraints with more sub-channels. When comparing the stochastic time consumption curve with the GRASP and SSG ones, it is noticeable that GRASP and SSG even run faster than the stochastic algorithm. Combined with the results in Figure 2, we conclude that GRASP and SSG are suggested when the allocation is for the extreme spectrum scarce environment (i.e., the channel number is very small). By comparing SSG with GRASP, the SSG solution is processed more quickly than GRASP for all N. With a smaller ρ value, SSG can be sped up further, although at the cost of losing data rate (the details of the performance loss are shown in the following section).

We simulated the running time consumption with a large sub-channel number (N = 30), and the time consumption curves are presented in Figure 3b. Note that the priority management is disabled for the stochastic algorithm, because it is almost impossible for the stochastic solution to obtain the ordered allocation outcomes under complex constraints when the user number is more than 5. Results similar to those in Figure 3a are observed; for instance, SSG runs more quickly than GRASP, and a small ρ value accelerates the processing in SSG.
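The per-iteration timing measurement above can be reproduced with a wrapper of the following shape; the function name and the warm-up pass are our additions (the warm-up stabilizes the first measurements), and `allocate`/`instances` are placeholders for an allocator and its problem instances.

```python
import time

def average_runtime(allocate, instances, warmup=3):
    """Average wall-clock time per allocation call."""
    # warm-up runs, excluded from the measurement
    for inst in instances[:warmup]:
        allocate(inst)
    start = time.perf_counter()
    for inst in instances:
        allocate(inst)
    return (time.perf_counter() - start) / len(instances)
```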
The time consumption curves for SSG and GRASP are nearly linear in Figure 3a, while they present an exponential growth in Figure 3b.

Trade-Off Analysis
Given that some factors, e.g., ∆K and ρ, balance the trade-off between the performance and the computational complexity in the GRASP and SSG solutions, we use the data rate to measure the performance and the number of queries of the objective function (11) to reflect the computational complexity. The reason for using the number of queries rather than the actual running time is to improve the accuracy of the measurements and reduce the interference from other programs. It is worth noting that the simulations show no clear evidence that α presents a trade-off effect; hence, the α effect is not discussed in this paper. ∆K ranges in [0.1, 1.5] with a fixed interval of 0.05, and ρ ranges in [0.1, 0.95] with a fixed interval of 0.05. The results of the data rate versus the number of queries and the sub-channel number N are presented in Figure 4.
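Counting objective queries instead of wall-clock time can be done with a small wrapper such as the sketch below; the class name and interface are ours, and `objective` stands in for the objective function (11).

```python
class QueryCounter:
    """Wrap an objective function and count its evaluations,
    a more stable complexity measure than wall-clock time."""
    def __init__(self, objective):
        self.objective = objective
        self.count = 0

    def __call__(self, *args, **kwargs):
        self.count += 1
        return self.objective(*args, **kwargs)
```

An allocator is then handed the wrapped objective, and `count` is read off after the run.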
As depicted in Figure 4a, both the data rate and the query number rise with increasing ∆K values. We use a fixed straight line (plotted as the solid line) to reflect the consistent selection of the optimal ∆K. Therefore, the optimal ∆K is selected as {0.1, 0.2, 0.25, 0.3, 0.35}, corresponding to the N values of {3, 5, 7, 9, 11}. Consequently, we obtain the result that the increasing sub-channel number leads to a relaxation of the sampling step (a big ∆K) of the power resources, and could potentially accelerate the allocation further (a big ∆K corresponds to a smaller power set size). Such acceleration is attributed to the fact that the necessity of dividing the power resources into small pieces is relaxed when the resources are sufficient. It is also observable that the increment of ∆K tends to be linear along with the growing sub-channel number (∆K increases by 0.05 when N grows by 2). Similar results can be obtained for the SSG algorithm; hence, the corresponding trade-off curves are not presented.

Moreover, we plot the data rate versus the number of queries for differing sub-channel numbers and ρ values in Figure 4b. With the same method to select the consistent values as in Figure 4a, we observe that, when N increases from 5 to 11, ρ degrades quickly from ρ = 0.9 to ρ = 0.4. Specifically, only 40% usage of the sub-channel resources maintains the 'optimal' result when N = 11, which could significantly improve the allocation efficiency. Combined with the time consumption results presented in Figure 3a, we observe that, when N = 11 and ρ = 0.3 (approximate to the optimal ρ = 0.4), the time consumption is reduced by 66% when comparing the SSG curve with the GRASP curve. In this case, GRASP and SSG present a similar data rate (the optimal points both show data rate values around 1.5 × 10^5 kbps), which indicates that SSG has promising potential under the spectrum dense environment (a large sub-channel number).
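The role of ∆K as a sampling step can be illustrated as follows: a larger ∆K produces a smaller candidate power set, which is why relaxing the step accelerates the allocation. The function name and the rounding details are illustrative.

```python
def power_levels(budget, delta_k):
    """Candidate power levels from sampling a budget with step delta_k.
    A larger delta_k yields fewer candidates, trading some data rate
    for a faster search over power assignments."""
    n = int(round(budget / delta_k))  # assumes budget divisible by step
    return [round((i + 1) * delta_k, 6) for i in range(n)]
```

For example, a 30 dBm budget sampled at ∆K = 0.1 yields 300 candidate levels, while ∆K = 0.3 yields only 100, so each greedy step examines a third as many options.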
The linear tendency of the curve when N = 3 means that it may be difficult to find the optimal ρ under the spectrum scarce environment.
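The ρ-controlled behavior of SSG can be pictured with the sketch below; interpreting ρ as the kept fraction of sub-channel candidates before the greedy step is our assumption based on the description above, and the function name is illustrative.

```python
import math
import random

def sample_channels(channels, rho, rng=None):
    """SSG-style subsampling: keep roughly a fraction rho of the
    sub-channel candidates before the greedy allocation step.
    Smaller rho speeds up the search at the cost of data rate."""
    rng = rng or random.Random(0)
    k = max(1, math.ceil(rho * len(channels)))  # keep at least one
    return rng.sample(channels, k)
```

Under this reading, ρ = 0.4 with N = 11 sub-channels keeps only 5 candidates per step, consistent with the "40% usage" observation above.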

Conclusions
This paper proposes three heuristic solutions for addressing the DPCA problem in the downlink MC-NOMA system. Multiple complicated constraints are taken into consideration, e.g., the capacity per sub-channel, the power budget per sub-channel, the power budget per user, the minimum data rate, and the priority control during the allocation. All three algorithms are capable of handling the complicated constraints. The stochastic algorithm reveals benefits in large resource allocation scenarios without priority management or high-performance requirements. GRASP consumes a longer time in processing, while it presents a better performance in terms of the data rate and reliability (narrower spread) than SSG. Considering the large computation scenario, i.e., a large user number and sub-channel number, SSG reveals advantages in accelerating the computation at the cost of losing some performance. With a sufficient sub-channel number, the computation time is saved by 66% when selecting the optimal configuration of ρ in SSG, with the performance remaining similar.
Author Contributions: Conceptualization and methodology, Z.X. and T.L.; software and validation, Z.X.; writing-original draft preparation, Z.X. and T.L.; writing-review and editing, I.P.; supervision, I.P. and A.T.; and project administration, A.T. All authors have read and agreed to the published version of the manuscript.