Resource Allocation in Multi-Carrier Multiplexed NOMA Cooperative System

Non-orthogonal multiple access (NOMA) cooperative communication technology combines the advantages of NOMA and cooperative communication, providing high spectrum efficiency and increased user coverage for next-generation wireless systems. However, research on NOMA cooperative communication is still at a preliminary stage and has mainly concentrated on scenarios with few users. This paper focuses on a user-centered NOMA cooperative system in an ultra-dense network and constructs a resource allocation optimization problem to meet the demands of each user. The optimization problem is then decomposed into two subproblems: one is the group matching among multiple relays and users, and the other is the joint allocation of power and subcarrier resources. Accordingly, a dynamic group matching algorithm based on Gale–Shapley and an iterative algorithm based on difference of convex functions programming are proposed. Compared with existing schemes, the proposed algorithms improve system throughput while ensuring the quality of service of users.


Introduction
With the expansion of the Internet of Things (IoT) and the development of communication technology, a large number of wireless connections and huge data traffic are expected to pose challenges for next-generation wireless systems [1]. The demand for spectrum efficiency and network capacity has grown rapidly [2]; since orthogonal resources are limited, traditional orthogonal multiple access (OMA) has difficulty meeting the demands of multiple user equipments (UEs) [3,4]. Recently, owing to its superior spectral efficiency, non-orthogonal multiple access (NOMA) has attracted tremendous attention in industry and academia [5]. Compared with traditional OMA, NOMA can reuse resources through non-orthogonal superposition and thereby support large-scale connectivity [6,7].
Previous research on single-carrier NOMA technology is relatively complete, and performance evaluations at the link and system levels have shown that NOMA achieves a better transmission rate and a lower error rate than OMA systems [8]. For multi-carrier NOMA technology, the research is not yet sufficient and has mainly focused on the typical sparse code multiple access (SCMA) and pattern division multiple access (PDMA) technologies. Research on SCMA technology mainly focuses on codebook design, channel transmission [...]
• We designed a user-centered multi-carrier multiplexed NOMA cooperative system that fully combines the advantages of multi-carrier NOMA and cooperative communication technology to meet the demands of a large number of UEs.
• We constructed a problem that optimizes throughput while ensuring the demands of multiple users and decomposed it into two subproblems. Then, we proposed a corresponding dynamic grouping matching algorithm and an iterative algorithm based on difference of convex functions programming (DCP) to solve them.
• Simulations were used to verify the effectiveness of the proposed NOMA cooperative network framework and the corresponding algorithms. Compared with two existing schemes, combining the dynamic grouping matching algorithm with the iterative algorithm improved system throughput while ensuring user quality of service (QoS).
The remainder of the paper is organized as follows. Section 2 describes the system model, including the signaling model and the throughput model. In Section 3, the problem is formulated as an optimization problem. In Section 4, we propose the resource allocation algorithm of the cooperative network. Section 5 presents the simulation results, which prove the effectiveness of the proposed algorithm. Finally, the conclusions are drawn in Section 6.

System Model
In this section, we describe the downlink NOMA-based cooperative network setting [29], which consists of a base station (BS), M relays, and N UEs, as shown in Figure 1. Each node is equipped with one transmit antenna and one receive antenna. The system frequency band is divided into K subcarriers. The signals of different UEs, or different packets of one UE, can be superposed on one subcarrier and transmitted simultaneously. In addition, all the relays are connected to the BS in the backhaul stage. We assume that the UEs and relays follow two independent Poisson point processes (PPPs) with densities $\lambda_u$ and $\lambda_r$, respectively. The notation used in the system model in this section is listed in Table 1.
The BS supports the backhaul stage for relays and provides access services for UEs. In this paper, the UEs are assumed to be served dynamically via dense relays. The BS and the relays share the same frequency band, and the relays work in the time division duplex (TDD) mode. The signals of the backhaul stage and the forward stage do not interfere with each other. In the backhaul stage, the signal is transmitted from the BS to the relays; the relays then decode the information of the users and transmit it to the corresponding users during the forward stage. Therefore, the downlink transmission can be divided into two processes, backhaul transmission and forward transmission, as illustrated in Figure 1.
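The node placement described above can be sketched in a few lines. This is a minimal illustration, not the authors' simulator: the function name `sample_ppp`, the square region, and the interpretation of the density as points per square kilometre are all assumptions made here for the example.

```python
import random

def sample_ppp(density_per_km2, side_m=1000.0, rng=None):
    """Draw one realization of a homogeneous PPP on a side_m x side_m square.

    The density unit (points per km^2) is an assumption of this sketch.
    """
    rng = rng or random.Random()
    area_km2 = (side_m / 1000.0) ** 2
    mean_points = density_per_km2 * area_km2
    # Poisson count: number of unit-rate exponential arrivals before mean_points.
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < mean_points:
        count += 1
        elapsed += rng.expovariate(1.0)
    # Conditioned on the count, the points are i.i.d. uniform over the region.
    return [(rng.uniform(0.0, side_m), rng.uniform(0.0, side_m))
            for _ in range(count)]

rng = random.Random(0)
ues = sample_ppp(200, rng=rng)      # UE positions, density lambda_u (assumed value)
relays = sample_ppp(300, rng=rng)   # relay positions, density lambda_r (assumed value)
```

Drawing the UEs and the relays with separate calls keeps the two processes independent, which matches the modeling assumption stated above.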

Table 1. Symbols and descriptions.
Symbols          Description
$\lambda_u$      The density of UEs
$\lambda_r$      The density of relays
$\alpha$         Path loss exponent
$\beta_k$        The proportion of a time slot occupied by the access stage on the kth subcarrier
[...]            The signal-to-interference-plus-noise ratio (SINR) of the mth relay on the kth subcarrier
$R_{n}^{B}$      The backhaul throughput for the nth UE
$\tilde{R}$      The whole system throughput

Signaling Model
In the forward stage, we take the nth UE, $n \in \{1, 2, \ldots, N\}$, as an example to illustrate the signals in the downlink cooperative network. The inter-group interference can be avoided by properly grouping relays and allocating subcarriers [30]. We use $x_{mn}^{k}$ to denote the signal transmitted from the mth relay to the nth UE on the kth subcarrier; its transmit power is $p_{mn}^{k}$. Meanwhile, $h_{mn}^{k}$ and $\ell_{mn}$ are the small-scale and large-scale fading channel coefficients from the mth relay to the nth UE on the kth subcarrier, respectively. The channels of the forward and backhaul stages are independent Rayleigh fading channels, and the path loss exponent is $\alpha$, so that $\ell_{mn} = d_{mn}^{-\alpha}$, where $d_{mn}$ denotes the distance from the mth relay to the nth UE. Then, the signal received by the nth UE on the kth subcarrier is the superposition of the faded relay signals and the noise $z_{n}^{k}$, where $z_{n}^{k}$ is the additive white Gaussian noise (AWGN) at the receiver of the nth UE on the kth subcarrier with zero mean and variance $\sigma^{2}$.
In the backhaul stage, we take the mth relay, $m \in \{1, 2, \ldots, M\}$, as an example to analyze the signals in the considered downlink cooperative network. Suppose that the signal transmitted from the BS to the mth relay on the kth subcarrier is $x_{m}^{k}$ and that the power of this signal is $q_{m}^{k}$. Additionally, $h_{m}^{k}$ and $\ell_{m}$ denote the small-scale and large-scale channel coefficients from the BS to the mth relay on the kth subcarrier. Then, the signal received by the mth relay on the kth subcarrier is the faded superposition of the BS signals plus the noise $z_{m}^{k}$, where $z_{m}^{k}$ denotes the AWGN with zero mean and variance $\sigma^{2}$ at the receiver of the mth relay on the kth subcarrier.
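For orientation, the two received signals described above can be sketched as follows. This is a hedged reconstruction from the definitions in the text, not a quotation of the original equations: the symbols $y_{n}^{k}$ and $y_{m}^{k}$ for the received signals, the symbol $\ell$ for the large-scale coefficient, the set $\mathcal{M}_n$ of relays serving the nth UE, and the unit-power normalization of the transmitted symbols are assumptions introduced here.

```latex
% Forward stage: UE n on subcarrier k.
% \mathcal{M}_n is the (assumed) set of relays serving UE n.
y_{n}^{k} = \sum_{m \in \mathcal{M}_n} \sqrt{\ell_{mn}\, p_{mn}^{k}}\; h_{mn}^{k}\, x_{mn}^{k} + z_{n}^{k}

% Backhaul stage: relay m on subcarrier k.
% The BS superposes the signals intended for all M relays.
y_{m}^{k} = \sqrt{\ell_{m}}\; h_{m}^{k} \sum_{i=1}^{M} \sqrt{q_{i}^{k}}\; x_{i}^{k} + z_{m}^{k}
```

In both expressions the large-scale term follows the path loss model $d^{-\alpha}$ stated in the text, and the noise terms are zero-mean with variance $\sigma^{2}$.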

Throughput Model
Since the transmission process is divided into two stages, the throughput is analyzed separately at the two stages.
In the forward stage, the successive interference cancellation (SIC) technique is applied at the receivers to decode the signals from different relays. We assume that the channel coefficients satisfy $|H_{1n}^{k}| \geq |H_{2n}^{k}| \geq \cdots \geq |H_{Mn}^{k}|$, where $H_{mn}^{k} = \sqrt{\ell_{mn}}\, h_{mn}^{k}$ represents the channel coefficient between the nth UE and the mth relay on the kth subcarrier. Then, the decoding order is consistent with the relay indexes. This decoding order determines the received signal-to-interference-plus-noise ratio (SINR) of the nth UE served by the mth relay on the kth subcarrier and, in turn, the corresponding throughput of the nth UE.
In the backhaul stage, without loss of generality, we assume that the channel coefficients $H_{m}^{k}$ are ordered across the relays. Here, $H_{m}^{k} = \sqrt{\ell_{m}}\, h_{m}^{k}$ represents the channel coefficient between the BS and the mth relay on the kth subcarrier. Then, the decoding is carried out in the reverse order of the relay indexes, which determines the SINR of the mth relay on the kth subcarrier. Correspondingly, the backhaul throughput for the nth UE depends on $c_{mn}^{k} \in \{0, 1\}$, which indicates whether the nth UE is served by the mth relay on the kth subcarrier: if $c_{mn}^{k} = 1$, the nth UE is served by the mth relay on the kth subcarrier. $\beta_{k}$ ($0 \leq \beta_{k} \leq 1$) denotes the proportion of a time slot occupied by the forward stage on the kth subcarrier, and $1 - \beta_{k}$ is the proportion of the time slot used for backhaul.
Since the signal needs to be transmitted to the relays first and then forwarded to the UEs, the system throughput $\tilde{R}$ is limited by both the backhaul throughput and the forward throughput.
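The throughput model above can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than the paper's exact expressions: it assumes normalized bandwidth (rates in bit/s/Hz), channel gains already sorted in descending order so that decoding follows the index order, and a two-hop rate equal to the smaller of the forward and backhaul rates. The function names are hypothetical.

```python
import math

def sic_sinrs(powers, gains, noise_var):
    """SINR of each superposed signal when decoding follows the index order.

    gains[i] = |H_i|^2 are assumed sorted in descending order, so signal i
    only sees interference from the not-yet-decoded signals i+1, i+2, ...
    """
    received = [p * g for p, g in zip(powers, gains)]
    sinrs = []
    for i, signal in enumerate(received):
        interference = sum(received[i + 1:])
        sinrs.append(signal / (interference + noise_var))
    return sinrs

def stage_rate(time_share, sinrs):
    """Rate in bit/s/Hz of a stage that occupies time_share of the slot."""
    return time_share * sum(math.log2(1.0 + s) for s in sinrs)

def end_to_end_rate(beta, forward_sinrs, backhaul_sinrs):
    """Two-hop rate, limited by the slower of the two stages (assumption)."""
    forward = stage_rate(beta, forward_sinrs)
    backhaul = stage_rate(1.0 - beta, backhaul_sinrs)
    return min(forward, backhaul)

# Two relays serve one UE on one subcarrier (illustrative numbers only).
forward = sic_sinrs(powers=[0.2, 0.3], gains=[4.0, 1.0], noise_var=0.1)
# First signal: 0.8 / (0.3 + 0.1); second signal: 0.3 / 0.1 (interference-free).
rate = end_to_end_rate(beta=0.5, forward_sinrs=forward, backhaul_sinrs=[1.0])
```

In this example the backhaul stage is the bottleneck, which shows why the time share $\beta_k$ is itself an optimization variable.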

Problem Formulation
In this section, we maximize the system throughput under QoS constraints. Because the SIC technique is applied at the receiver, the complexity of the receiver grows with the number of superposed signals on a subcarrier. To limit the complexity of the receiver at the UEs, we assume that each UE can be served by up to Q relays on a subcarrier. The first of these constraints is written as
C1(a): $c_{mn}^{k} \in \{0, 1\}, \ \forall m, n, k$.
Apart from the overall power constraints of the system, the power allocation in the NOMA system needs to satisfy the threshold for SIC decoding at the receiver (in contrast to the OMA system). Therefore, constraints C2 and C3 bound the transmit powers and enforce the decoding threshold, where $P_{\max}^{BS}$ and $P_{m,\max}$ denote the maximum available powers of the BS and the mth relay, respectively, and $p_{thr}$ is the decoding power threshold for the SIC receiver.
In terms of the QoS of each UE, we consider constraint C4, which requires the throughput of the nth UE to reach $R_{n}^{\text{target}}$, the target data rate of the nth UE. Additionally, the time slot assignment coefficient between the access stage and the backhaul stage on an arbitrary kth subcarrier needs to meet
C5: $0 \leq \beta_{k} \leq 1, \ \forall k$.
Therefore, the optimization problem (13) can be formulated as
$$\underset{\mathbf{c}, \boldsymbol{\beta}, \mathbf{p}, \mathbf{q}}{\text{maximize}} \ \tilde{R} \quad \text{subject to C1-C5},$$
where $\mathbf{q} \in \mathbb{R}_{+}^{KM \times 1}$ and $\mathbf{p} \in \mathbb{R}_{+}^{NKM \times 1}$ collect the power $q_{m}^{k}$ allocated at the BS and the power $p_{mn}^{k}$ allocated at the relays, respectively, and $\mathbf{c} \in \mathbb{Z}^{NKM \times 1}$ and $\boldsymbol{\beta} \in \mathbb{R}^{K \times 1}$ collect the variables $c_{mn}^{k}$ and $\beta_{k}$, respectively.

Resource Allocation Algorithms
Problem (13) is a mixed integer non-linear programming problem. It is challenging to derive a global-optimal solution [31]. In this paper, a low-complexity suboptimal solution is developed in the presence of multiple relays and UEs. Problem (13) is divided into two subproblems. First, we apply a dynamic group matching algorithm to map each UE with relays. Then, an iterative algorithm is proposed based on the D.C. programming to achieve a suboptimal solution for the joint power and subcarrier allocation.

Dynamic Group Matching for UEs and Relays
The grouping process of relays and UEs is a matching process between each UE and the set of relays serving that UE. To maximize the system throughput, we apply the deferred-acceptance strategy of the Gale-Shapley algorithm to balance the two-sided matching priorities of the UEs and relays. Let $\Phi(m, n)$ represent the matched pair of the mth relay and the nth UE, and let $\Phi$ denote the set of matched pairs. $|\Phi(m, n)| = 1$ denotes that the nth UE is matched with the mth relay; otherwise, $|\Phi(m, n)| = 0$. We also define an evaluation model of the pair between the nth UE and the mth relay.
With a two-sided competitive selection of the UEs and relays, each node has its own matching priority list. We denote the matching priority sets of UEs and relays as {MP_UE} and {MP_RE}, where MP_UE$_n$ is the matching priority list with which the nth UE matches its nearby relays; similarly, MP_RE$_m$ is the matching priority list of the nearby UEs that the mth relay can match with. Each list is indexed by priority, where $M_n$ and $N_m$ are the number of relays near the nth UE and the number of UEs near the mth relay, respectively; MP_UE$_n(m_n)$ denotes the relay whose matching priority for the nth UE is $m_n$, and MP_RE$_m(n_m)$ denotes the UE whose matching priority for the mth relay is $n_m$. If MP_UE$_n(m_n)$ > MP_UE$_n(l_n)$, the matching priority of the nth UE with the $m_n$th relay is higher than its matching priority with the $l_n$th relay. We also define the relay with the highest matching priority for the nth UE as MP_UE$_n^{\text{highest}}$. Correspondingly, we define the UE with the highest matching priority for the mth relay as MP_RE$_m^{\text{highest}}$.
In this paper, to maximize system throughput, the matching priorities are defined by (20) and (21), which use the throughput $R_{mn}$ and the small-scale channel coefficient $h_{mn}$ as the priority criteria of relays and UEs. We choose these criteria because they are either our optimization function or one of its parameters, so the results screened by them are more conducive to maximizing throughput.
With the above illustration, the dynamic grouping matching algorithm between UEs and relays can be described as follows. First, we initialize the matching priority according to the available CSI. Then, we divide the grouping process into two matching processes. The first process is to guarantee that each UE can be served by a relay, and the second process is to group the relays for each UE.
In the first process, each UE requests matching the relay that prioritizes the UE over the other UEs. Then, each relay that has received the matching request from the UEs matches the UE which prioritizes the relay over the other relays, and then, it rejects the other UEs. This process is repeated until all UEs are served by at least one relay.
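The request-and-reject loop of the first process follows the deferred-acceptance pattern. The sketch below is the textbook UE-proposing Gale-Shapley procedure with one relay per UE, shown only to make the mechanics concrete; it is not the authors' Algorithm 1, and the priority lists, node labels, and function name are hypothetical.

```python
def deferred_acceptance(ue_prefs, relay_prefs):
    """UE-proposing deferred acceptance (Gale-Shapley), one relay per UE.

    ue_prefs[n]    : relays ordered from most to least preferred by UE n
    relay_prefs[m] : UEs ordered from most to least preferred by relay m
    Returns a dict mapping each matched relay to its UE.
    """
    rank = {m: {n: i for i, n in enumerate(prefs)}
            for m, prefs in relay_prefs.items()}
    next_choice = {n: 0 for n in ue_prefs}  # next list position to propose to
    match = {}                              # relay -> UE
    free = list(ue_prefs)
    while free:
        n = free.pop()
        if next_choice[n] >= len(ue_prefs[n]):
            continue                        # UE n has exhausted its list
        m = ue_prefs[n][next_choice[n]]
        next_choice[n] += 1
        if n not in rank[m]:
            free.append(n)                  # relay m does not list UE n
            continue
        current = match.get(m)
        if current is None:
            match[m] = n                    # relay m tentatively accepts
        elif rank[m][n] < rank[m][current]:
            match[m] = n                    # relay m prefers the new UE
            free.append(current)            # the previous UE is rejected
        else:
            free.append(n)                  # the request is rejected

    return match

# Hypothetical priority lists for two UEs and three relays.
ue_prefs = {"u1": ["r1", "r2"], "u2": ["r1", "r3"]}
relay_prefs = {"r1": ["u2", "u1"], "r2": ["u1", "u2"], "r3": ["u2", "u1"]}
match = deferred_acceptance(ue_prefs, relay_prefs)
```

A rejected UE moves on to the next relay in its list, which mirrors the step in which rejected UEs remove the rejecting relay from their priority set.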
In the second process, each UE requests matching the unmatched relay that has the highest priority to the UE. Subsequently, the relay that has received matching requests from UEs selects the UE according to its matching priority if the number of relays in a group is below Q. When the number of relays in a group is Q, we determine whether the UE sending this matching request is more effective for improving the throughput than the other UEs in the group.
If this is the case, then we update the matched pair; otherwise, we reject the matching request. This process is repeated until all the relays are grouped or no UEs request matching with any relays. The details of the dynamic grouping matching algorithm are provided in Algorithm 1.

Algorithm 1. Dynamic grouping matching algorithm.
    [...] (20) and (21)
    [...]
 4:     for relay m = 1, 2, ..., M do
 5:         Each relay matches the UE with the highest priority according to the matching priority set of relays {MP_RE} and rejects the other UEs
 6:         The rejected UEs remove the mth relay from their matching priority set {MP_UE}
 7:         Add the matched pairing Φ(m, n) to the set Φ and remove the mth relay and the nth UE from U_RE and U_UE, respectively
 8:     end for
 9: end while
10: while {MP_UE} ≠ ∅ and U_RE ≠ ∅ do
11:     Each UE requests to match its highest matching priority relay from U_RE according to the updated set {MP_UE}
12:     for relay m = 1, 2, ..., M do
13:         The mth relay makes the following judgment for its highest matching priority UE according to its matching priority set {MP_RE}
14:         if $\sum_{m=1}^{M} |\Phi(m, n)| < Q$ then
15:             Add the matched pairing Φ(m, n) to the set Φ and remove the mth relay and the nth UE from U_RE and U_UE, respectively
16:         else
17:             The relay matches with the nth UE when there exists a relay l that satisfies $\psi_{mn} > \psi_{ln}$ and $|\Phi(l, n)| = 1$; then, it updates Φ and returns the lth relay to U_RE
    [...]

In the joint power and subcarrier allocation subproblem, Problem (22), $\mathbf{b} \in \mathbb{Z}^{NK \times 1}$ collects the variables $b_{n}^{k}$, $\forall n, k$. We combine the mixed-integer constraint C1 with constraint C5. The matching between the UEs and relays in constraint C1 is obtained by Algorithm 1, so only $b_{n}^{k}$ remains to be solved in constraint C1. The integer constraint C1(c) can be replaced by an equivalent continuous expression, so that the optimization with integer constraints is transformed into a continuous-valued problem. We define $\mathbf{u} \in \mathbb{R}^{NK \times 1}$ and $\mathbf{v} \in \mathbb{R}^{NK \times 1}$ to collect the variables $u_{n}^{k}$ and $v_{n}^{k}$, respectively.
Problem (22) can be reformulated as Problem (25). According to the theorem of monotone optimization [28], Problem (25) is equivalent to Problem (26), in which $\eta \gg 1$ is a sufficiently large penalty factor that penalizes the objective whenever $u_{n}^{k} + v_{n}^{k}$ is neither 0 nor 1. Then, we transform the decoding threshold constraint C3 into a maximum interference constraint C3' [32]. The new constraint C3' defines a convex set. However, the problem is still non-convex, since neither the objective function nor constraint C4 is convex. Nevertheless, the forward throughput and, similarly, the backhaul throughput admit an equivalent form in which each is written as a difference of two functions. The non-convex constraint C4 can then be rewritten as constraint C4', which is the difference of the two convex functions in (31)-(33).
Therefore, we can rewrite (26) as the minimization problem (35) over $\mathbf{u}, \mathbf{v}, \mathbf{p}, \mathbf{q}$. Note that $F_{n}^{A}(\mathbf{u}, \mathbf{p})$, $G_{n}^{A}(\mathbf{u}, \mathbf{p})$, $F_{n}^{B}(\mathbf{v}, \mathbf{q})$, $G_{n}^{B}(\mathbf{v}, \mathbf{q})$, $H(\mathbf{u}, \mathbf{v})$, and $M(\mathbf{u}, \mathbf{v})$ are convex functions. Therefore, problem (35) is a D.C. program, and we can apply successive convex approximation to obtain a suboptimal solution [33,34]. Given the differentiability of the convex functions $F_{n}^{A}(\mathbf{u}, \mathbf{p})$, $G_{n}^{A}(\mathbf{u}, \mathbf{p})$, $F_{n}^{B}(\mathbf{v}, \mathbf{q})$, and $M(\mathbf{u}, \mathbf{v})$, for any feasible point $\mathbf{u}^{(\tau)}$, $\mathbf{v}^{(\tau)}$, $\mathbf{p}^{(\tau)}$, and $\mathbf{q}^{(\tau)}$, we have the first-order bounds (36)-(39), whose right-hand sides are affine functions. The gradients in these affine functions are evaluated at the feasible point.
For a given feasible point $\mathbf{u}^{(\tau)}$, $\mathbf{v}^{(\tau)}$, $\mathbf{p}^{(\tau)}$, and $\mathbf{q}^{(\tau)}$, we can obtain an upper bound of (35) by solving the convex optimization problem (48). The convex problem in (48) can be readily solved by standard convex programming solvers such as CVX [33]. We propose a successive convex approximation that tightens the upper bound obtained from (48) through an iterative algorithm, i.e., Algorithm 2. It generates a sequence of feasible solutions and achieves a locally optimal solution in polynomial time [34].
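The successive convex approximation idea can be seen on a one-dimensional toy problem. The sketch below is not the paper's Algorithm 2 and does not use CVX: it minimizes the hypothetical D.C. objective $f(x) - g(x)$ with $f(x) = x^4$ and $g(x) = 2x^2$, both chosen here purely for illustration, by linearizing $g$ at the current point and minimizing the resulting convex surrogate in closed form.

```python
import math

def dc_iterate(x0, tol=1e-9, max_iter=100):
    """Successive convex approximation for min f(x) - g(x), f=x**4, g=2*x**2.

    Each step replaces g by its tangent at x_tau. The surrogate
    x**4 - (g(x_tau) + g'(x_tau) * (x - x_tau)) is convex, upper-bounds
    f - g, and is minimized where 4*x**3 = g'(x_tau) = 4*x_tau.
    """
    def objective(x):
        return x ** 4 - 2.0 * x ** 2

    x, history = x0, [objective(x0)]
    for _ in range(max_iter):
        # Closed-form surrogate minimizer: the real cube root of x.
        x_new = math.copysign(abs(x) ** (1.0 / 3.0), x)
        history.append(objective(x_new))
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x, history

x_star, history = dc_iterate(0.2)
```

Because every surrogate touches the true objective at the current point and lies above it elsewhere, the objective values in `history` never increase, which is the property that makes the iterates converge to a locally optimal point.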

Computational Complexity Analysis
The computational complexity of an exhaustive search in the grouping matching algorithm is $O(2^{M} N)$. The exhaustive search scheme is user-centric: it assigns each UE its own group, and each relay can either belong to the group of the UE or not. Thus, the number of possible groupings is $2^{M} N$, and the computational complexity of the exhaustive search is $O(2^{M} N)$. The computational complexity of Algorithm 1 is $O(NM^{2})$. Specifically, $NM$ steps are needed while each UE matches with a relay for grouping in the proposed grouping algorithm, and the grouping process takes fewer than $M \cdot NM$ steps. Therefore, the total computational complexity of the proposed grouping algorithm is $O(NM^{2})$. The computational complexity of the D.C. programming is $O(T_{\max} M)$; as $T_{\max}$ is no more than $QM$, the computational complexity of the D.C. programming is $O(M)$. Thus, the computational complexity is $O(NM^{3})$.
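To give a feel for the gap between the two grouping approaches, the following sketch simply evaluates the counts $2^{M} N$ and $N M^{2}$ stated above for a few network sizes. The sizes are arbitrary illustrative values, and the function names are hypothetical.

```python
def exhaustive_groupings(num_relays, num_ues):
    """Number of candidate groupings, 2^M * N, for the exhaustive search."""
    return (2 ** num_relays) * num_ues

def matching_steps_bound(num_relays, num_ues):
    """Step bound N * M^2 for the proposed grouping algorithm."""
    return num_ues * num_relays ** 2

for m, n in [(5, 10), (10, 20), (20, 50)]:
    print(m, n, exhaustive_groupings(m, n), matching_steps_bound(m, n))
```

The exponential term dominates quickly, which is why the exhaustive search is impractical in the dense-relay regime considered here.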

Convergence of Algorithm 1
We divide the algorithm into two processes, and the first process guarantees UE communications. The system performance is slightly degraded to satisfy QoS. The second process of the algorithm is convergent, and the proof is as follows.
Proof. Suppose that UE $n_a$ is matched with relays $m_2, m_3, \ldots, m_Q$ (in descending order of priority) in the second process. Assume, for contradiction, that there exists a relay $m_q$, matched with UE $n_b$, such that the priority of $m_q$ for $n_a$ is higher than that of $m_Q$ and, simultaneously, $n_a$ is higher than $n_b$ in the priority list of relay $m_q$.
Since UE $n_a$ is matched with relay $m_Q$ and not with $m_q$, there are two possible situations. (1) UE $n_a$ sent a request to $m_q$ and the request was rejected, which indicates that the priority of $n_b$ is higher than that of $n_a$ in the priority list of $m_q$. (2) UE $n_a$ did not send a request to $m_q$, from which we conclude that the priority of $m_Q$ is higher than that of $m_q$ in the priority list of $n_a$. Each situation contradicts one of the two conditions of the hypothesis, and thus the hypothesis is not true. There are no better matched pairs, so the matched pair obtained is stable.
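The stability argument above can be checked mechanically on small instances. The sketch below searches for a blocking pair, that is, a UE and a relay that both rank each other above their current partners. It is a simplified illustration under the assumption of one relay per UE (the paper allows up to Q relays per UE), and the priority lists and function name are hypothetical.

```python
def find_blocking_pairs(match, ue_prefs, relay_prefs):
    """Return (UE, relay) pairs that would both rather be matched together.

    match maps relay -> UE, with one UE per relay and one relay per UE.
    """
    ue_partner = {n: m for m, n in match.items()}
    blocking = []
    for n, prefs in ue_prefs.items():
        current = ue_partner.get(n)
        # Relays that UE n ranks strictly above its current partner.
        better = prefs if current is None else prefs[:prefs.index(current)]
        for m in better:
            if n not in relay_prefs[m]:
                continue
            holder = match.get(m)
            if holder is None or (relay_prefs[m].index(n)
                                  < relay_prefs[m].index(holder)):
                blocking.append((n, m))
    return blocking

ue_prefs = {"u1": ["r1", "r2"], "u2": ["r1", "r3"]}
relay_prefs = {"r1": ["u2", "u1"], "r2": ["u1", "u2"], "r3": ["u2", "u1"]}

stable = {"r1": "u2", "r2": "u1"}     # no UE-relay pair wants to deviate
unstable = {"r1": "u1", "r3": "u2"}   # u2 and r1 prefer each other
```

An empty result for a matching corresponds to the conclusion of the proof: no better matched pair exists.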

Simulation Results and Analysis
In this section, we evaluate the proposed framework and algorithms in terms of the system throughput through simulations. To make a fair comparison, we use the same system configuration for the OMA, Co-OMA, and traditional NOMA systems as for the proposed scheme. We deploy the BS in the middle of a 1000 m × 1000 m area. The UEs and relays are modeled as independent PPPs with densities $\lambda_u$ and $\lambda_r$. UEs are generated in the area randomly, as illustrated at the top of Figure 2. Other simulation parameters are summarized in Table 2. To show the grouping directly, we provide the schematic diagram for the case in which the maximum number of superposed signals Q in our proposed scheme is set to 3. The matched pairs of relays and UEs chosen by our proposed algorithms are shown at the bottom of Figure 2. The codes are developed in MATLAB using the CVX toolbox and are executed on a 64-bit operating system with 16 GB RAM and an Intel Core i7 processor at 3.4 GHz.
In Figure 3, we compare the system throughput of the proposed NOMA-based cooperative network scheme, the OMA-based cooperative network (Co-OMA) scheme, the NOMA scheme, and the OMA scheme under different densities of UEs. The density of relays was set to 300. According to Figure 3, the proposed scheme achieved the highest system throughput, and it exhibited a 50% gain over the Co-OMA scheme when the density of UEs exceeded 200. This is because, as the density of UEs increases, the spectrum resources become limited, and the proposed scheme shows the advantage of using non-orthogonal resources.
[Figure 3. System throughput (bps) of the proposed scheme, Co-OMA, traditional NOMA, and OMA.]
In addition, NOMA and OMA solutions without cooperative communication technology cannot provide high overall throughput when the density of user nodes becomes too high. This is because, to ensure the QoS of some UEs with weak channels, a large amount of system resources is sacrificed to these specific UEs, which leads to a slow decline of the total system throughput that eventually stabilizes. The comparison between NOMA and cooperative OMA in [35] shows the relationship between backhaul capacity and the number of micro-area accesses. When the number of UEs served by the system exceeds the threshold value, the system performance declines. With sufficient system resources (low user node density), non-cooperative NOMA outperforms cooperative OMA. However, as the user node density increases, cooperative OMA can improve the throughput by using the channel gain of the backhaul link and the multiplexing gain of a large number of relay nodes, thus exceeding the throughput performance of the non-cooperative NOMA scenario.
To verify the effectiveness of our proposed algorithm (DRG-DCPA), we compared it with three benchmarks in the NOMA-based cooperative network: ORG-DCPA, DRG-FTPA, and ORG-FTPA. It needs to be emphasized that although DRG-FTPA has the lowest complexity, it shows the worst performance during the simulation because of the unchangeable nature of its transmit power. We then show the simulation results in terms of the maximum power of relays, the density of relays, and the density of UEs. Figure 4 shows the system throughput of the algorithms under different relay transmit powers, where the densities of UEs and relays are 200 and 500, respectively. The proposed DRG-DCPA resource allocation algorithm obtains the highest system throughput, followed by the ORG-DCPA algorithm and the ORG-FTPA algorithm. In particular, compared with the other resource allocation algorithms, the DRG-DCPA algorithm at least doubles the system throughput when the maximum power is more than 15 dBm.
[Figure 4. System throughput (bps) of the proposed algorithm, ORG-DCPA, DRG-FTPA, and ORG-FTPA.]
Figure 5 shows the system throughput comparison of the above resource allocation algorithms under different relay densities, where the user node density is set to 50. When the density of relays is large enough, the system throughput of the DRG-DCPA resource allocation algorithm is more than 60%, 100%, and 200% higher than that of the three benchmarks. When the density of relays increases, the system throughput of the DRG-DCPA algorithm improves faster, indicating that the algorithm has a stronger ability to utilize relay resources and thus can obtain more multiplexing gain.
Figure 5. The system throughput versus the density of relays for different optimization algorithms.
Figure 6 shows the system throughput comparison of the above resource allocation algorithms for different densities of UEs. The density of relays is set to 300. When the relay node density is sufficient, the throughput of the DRG-DCPA algorithm increases with increasing user density, and when the user node density is 200, the throughput of the DRG-DCPA algorithm is more than 60%, 120%, and 210% higher than that of the other three algorithms. In addition, the throughput of the algorithms that use fixed-percentage power allocation decreases when the user node density is excessive. This is because such an allocation algorithm may spend too much power on users with weak channels to ensure their QoS. As a result, the system cannot utilize its resources effectively; when the user nodes are oversaturated, the system throughput decreases.
In Figure 7, we present the effect of different maximum numbers of signals superposed per subcarrier on the throughput of the proposed cooperative system. When the maximum relay power is 15 dBm, the proposed system more than doubles the throughput under QoS constraints. In addition, we use Q to denote the maximum number of signals superposed per subcarrier; when Q is larger, the proposed scheme achieves a higher throughput. In particular, the throughput increases rapidly when the maximum relay power is higher. One possible reason is that, at low SINR, an excessive number of superposed signals makes the interference in the SIC decoder too large to meet constraint C3; thus, the throughput gain is reduced. It should be noted that the influence of the direct path is not considered in the reference setting, and the channel condition is poor. The simulation comparison result is therefore a lower bound on the system and algorithm performance, but it still demonstrates the effectiveness of the proposed cooperative system and optimization algorithm.
To confirm the above conjecture, we provide a further simulation that considers the direct path scenario. If the impact of the direct path on the system is considered, there is generally a direct path between the BS and the relay, but there is no direct path between the relay and the user. In this case, the channel between the BS and the relay is better and is generally regarded as a Rician fading channel, and the improved channel condition helps relax constraint C3. The simulation shows that at low SINR, Q = 3 with a direct path performs better than Q = 4; however, as the power increases, Q = 4 without a direct path achieves higher utilization of the non-orthogonal resources. This simulation result further supports our conjecture.

Conclusions
In this paper, the NOMA-based multi-user and multi-relay cooperative network has been studied. To maximize the system throughput, we have formulated the resource allocation problem as a mixed integer non-linear programming problem. To improve its tractability, we have divided the problem into (1) dynamic group matching of relays and UEs and (2) DCP-based joint allocation of power and subcarriers. Simulation results have confirmed that higher system throughput can be achieved through the proposed algorithm. Compared with Co-OMA, OMA, and NOMA, the proposed algorithm had the highest throughput. The proposed algorithm can also increase the system throughput substantially when the maximum power of the relays is high. The superiority of the proposed algorithm was substantiated by comparing it with different algorithms under various user density and relay density configurations. Simulation results confirmed that the proposed algorithms can be appropriately applied to IoT scenarios with massive numbers of small UEs.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: