Deep Neural Network (DNN) for Efficient User Clustering and Power Allocation in Downlink Non-Orthogonal Multiple Access (NOMA) 5G Networks

Non-orthogonal multiple access (NOMA) has emerged as a promising candidate for 5G, as it radically alters the way users share the spectrum. In NOMA systems, user clustering (UC) becomes an important research issue, because grouping users on different subcarriers with different power levels has a significant impact on spectral utilization. Many previous works have been devoted to solving the UC problem. Recently, the artificial neural network (ANN) has gained considerable attention for UC owing to the availability of UC datasets obtained from the brute-force search (BF-S) method. In this paper, deep neural network-based UC (DNN-UC) is employed to effectively characterize the nonlinear relationship between cluster formation and the users' channel diversity and transmission powers. Compared with ANN-UC, DNN-UC is more competent, because UC is a non-convex, NP-hard problem that cannot be entirely captured by the ANN model. In this work, DNN-UC is first trained with the training samples and then validated with the testing samples to examine its mean squared error (MSE) and throughput performance in an asymmetrical fading NOMA channel. Unlike ANN-UC, the DNN-UC model offers greater room for hyper-parameter optimization to maximize its learning capability. With optimized hyper-parameters, DNN-UC achieves near-optimal throughput, approximately 97% of that of the BF-S method.


Introduction
The main vision for 5G is not only to provide more synergistic, pervasive, and ubiquitous broadband access with higher capacity and throughput at a more affordable cost, but also to fully embrace new technological challenges that extend far beyond, transforming industries, enabling new inventive technologies, and empowering new types of user experiences [1]. However, one persistent hurdle in the 5G rollout is the scarcity of radio resources, particularly spectral bandwidth, which becomes even more limited as 5G is anticipated to connect everyone and everything to support the deployment of the massive Internet of Things (IoT) [2]. The conventional orthogonal multiple access (OMA) used in 4G has reached its spectral performance asymptote, beyond which further enhancement of bandwidth utilization no longer improves performance. Recently, non-orthogonal multiple access (NOMA) has raised this spectral-efficiency asymptote to a new level, making NOMA a viable and promising multiple access candidate for 5G [3].
Symmetry 2021, 13
In principle, NOMA can concurrently support the transmissions of multiple users in a single resource block. In a downlink power-domain NOMA system, the signals destined for different users are multiplexed at the base station (BS) using superposition coding (SC), with different transmission power levels on a non-orthogonal basis. Usually, the stronger users (with higher channel gains) are allocated lower power levels, whereas the weaker users (with lower channel gains) are allocated higher power levels. With this power allocation strategy, the stronger users can suppress the interfering signals using a successive interference cancellation (SIC) receiver, which first decodes the dominant interfering signals and subtracts them from the superposed signal. For the weaker users, since the powers assigned to the stronger users are low, the resulting interference is weak and can be treated as noise.
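To make the superposition coding and SIC steps above concrete, the following is a minimal sketch for two users with BPSK symbols, real positive channel gains, and AWGN. The modulation, gain values, noise level, and all function names are illustrative assumptions, not part of the system model used later: the near (strong) user receives the lower power and cancels the far user's high-power signal before decoding its own, while the far user simply treats the near user's signal as noise.

```python
import numpy as np

def superpose(bits_near, bits_far, p_near, p_far):
    """BS side: superposition coding of two BPSK streams (bits in {0, 1})."""
    s_near = 2 * np.asarray(bits_near) - 1   # map {0, 1} -> {-1, +1}
    s_far = 2 * np.asarray(bits_far) - 1
    return np.sqrt(p_near) * s_near + np.sqrt(p_far) * s_far

def decode_far_user(y):
    """Far (weak) user: treats the low-power near-user signal as noise."""
    return (y > 0).astype(int)

def decode_near_user_sic(y, g, p_far):
    """Near (strong) user: decode the dominant far-user signal first,
    subtract its reconstruction, then decode its own signal."""
    far_hat = (y > 0).astype(int)
    residual = y - g * np.sqrt(p_far) * (2 * far_hat - 1)
    return (residual > 0).astype(int)

# Illustrative scenario (all values are assumptions)
rng = np.random.default_rng(0)
n_sym = 1000
bits_near = rng.integers(0, 2, n_sym)
bits_far = rng.integers(0, 2, n_sym)
p_near, p_far = 0.2, 0.8        # the weaker (far) user gets more power
g_near, g_far = 1.0, 0.5        # the near user has the better channel
x = superpose(bits_near, bits_far, p_near, p_far)
y_near = g_near * x + 0.05 * rng.standard_normal(n_sym)
y_far = g_far * x + 0.05 * rng.standard_normal(n_sym)

ber_near = np.mean(decode_near_user_sic(y_near, g_near, p_far) != bits_near)
ber_far = np.mean(decode_far_user(y_far) != bits_far)
print(ber_near, ber_far)
```

If the power split is reversed (for example `p_near = 0.8`), the first SIC stage decodes the wrong stream, which is why the stronger user must be given the lower power.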

Prior Works
Driven by 5G, NOMA has been extensively researched in academia and industry, and research on NOMA technology is still actively ongoing. Recent works on rate-region analysis, bit error rate (BER) performance, and various resource allocation schemes for NOMA systems [4][5][6] have validated the viability and effectiveness of NOMA in handling more users and managing scarce radio resources. In particular, numerous resource management techniques employing NOMA have been explored, and some of them have been reviewed and recommended for 5G. In [7], the performance of downlink NOMA was scrutinized with randomly distributed users, and the proposed NOMA scheme was shown to achieve a superior ergodic sum-rate and better outage performance than conventional multiple access methods. However, the users' target data rates and allocated powers must be selected judiciously, because a user's outage probability is always one if these two parameters are not chosen properly. Next, unicast-multicast transmission with fixed and dynamic power allocation schemes for NOMA was developed in [8], where a best-user selection scheme was proposed to fully utilize the diversity order offered by the multicast users. The drawback of this approach is that the heterogeneity among the multicast users is not taken into account, so it cannot serve all users well concurrently. A fair NOMA approach that pairs a near-BS user with a cell-edge user was proposed in [9] to show that, using only a fraction of the total transmit power, the ergodic capacity of every user can be enhanced relative to the per-user capacity achieved by OMA.
However, this power allocation strategy requires the BS to acquire perfect instantaneous channel state information, which is difficult, if not impossible, to obtain in practical wireless systems. Meanwhile, the performance of cooperative NOMA was examined in [10], where closed-form outage probability expressions were derived for the case in which the near user helps the far user, overcoming the zero diversity order of the far user. Implementing such a system requires cancelling both self-interference and inter-user interference.
Over the last few years, research has focused on optimizing user clustering (UC) to improve NOMA performance. Unlike OMA, NOMA allows users to share radio resources; therefore, cluster formation plays a vital role in determining the ultimate bandwidth performance. The impact of user pairing on the performance of fixed power allocation NOMA and cognitive radio-assisted NOMA was studied for downlink 5G transmission in [11], demonstrating that NOMA can offer a larger sum-rate than its OMA counterpart. In fixed power allocation NOMA, users are paired based on a significant difference in their channel gains. In cognitive radio-assisted NOMA, by contrast, a user that does not experience a significantly different channel gain is opportunistically paired with another user, provided that the former's interference does not adversely affect the QoS requirement of the latter. It was found that cognitive radio-assisted NOMA prefers to pair the strongest user with the second strongest user, whereas fixed power allocation NOMA favors pairing the strongest user with the weakest user. However, both techniques have limitations. Fixed power allocation NOMA might be unable to satisfy a user's predefined QoS if the target data rate of the weak user is large. In cognitive radio-assisted NOMA, the performance of the strong user might be sacrificed, because this user is served only after the QoS of the weak user is fulfilled. Moreover, user pairing for cooperative NOMA transmission was examined in [12], in which two users with significantly different channel gains are grouped together based on the users' channel gains sorted in ascending order.
The same work demonstrated that such pairing of users in a NOMA cluster provides a significant performance gain over other existing UC methods as well as over OMA. However, the authors considered only fixed power allocation rather than optimal power allocation. In another study, two novel UC schemes, a centralized UC and a distributed UC, were proposed in [13], in which users are sorted based on their large-scale fading (LSF) gains to enhance the performance of the NOMA system. In the centralized UC, the primary user with the highest LSF gain is selected and paired with a complementary user chosen according to the principle of signal difference alignment (SDA). In the distributed UC, the primary user is selected in the same way as in the centralized UC, while the complementary user is selected based on the zero-forcing vector achieved by the primary user. The work in [13] analytically showed that NOMA with the proposed centralized and distributed UC techniques achieves much better throughput than conventional UC schemes. However, these conventional UC schemes exploit only the channel gains and do not take the distribution of users in each cluster into account.
Dynamic UC (DUC) was proposed in [14] to maximize the sum throughput of a NOMA system by solving the formulated mixed-integer nonlinear programming problem based on the Karush-Kuhn-Tucker (KKT) optimality conditions. In that work, UC was extended from grouping pairs of users to grouping three and four users in a cluster, which yields better performance owing to higher frequency reuse. The number of clusters is pre-defined to satisfy the SIC constraint before UC is conducted. Because of the fixed number of users per cluster and the fixed number of clusters per network, NOMA users are sometimes forced to form clusters with unfavorable users, which may degrade throughput.
To fully exploit the asymmetrical channel diversity among NOMA users for UC, the above-mentioned limitations were removed in [15], which permits users to dynamically form clusters with any other users with the sole objective of maximizing the sum throughput, regardless of the number of users per cluster and the number of clusters per network. The authors of [15] analyzed the theoretical performance upper bound of a NOMA system by incorporating a brute-force search (BF-S) method in the UC. The BF-S approach searches all possible cluster formations and produces the optimal clustering together with the optimal powers, which results in the highest throughput. Because of this exhaustive search, BF-S-based UC incurs a prohibitive complexity that makes it impractical for real implementation. In [16], the same UC problem was solved by employing particle swarm optimization (PSO) for UC in a NOMA system. That work showed that the fast-converging PSO-based UC algorithm can greatly reduce the computational complexity while attaining sub-optimal throughput. The main issue in the PSO-based UC is premature convergence of the particles, which frequently traps the search in a local optimum, particularly in large NOMA networks.
To reduce the performance loss caused by the sub-optimality of PSO-based UC, a more scalable and intelligent UC scheme is required to cope with challenging 5G environments. The BF-S-based UC also opens the door to machine learning for UC: UC datasets for various NOMA scenarios can be generated with the BF-S method and used to train machine learning models for UC. In 2020, the artificial neural network (ANN) was first adopted to solve the UC problem [17]. The proposed ANN-based UC is first trained with datasets generated by the BF-S method and is then used to predict cluster formations in the testing phase. Empirical results showed that its throughput is much better than that of PSO-based UC, especially in NOMA networks with a large number of users. It was also observed that ANN-based UC is adaptive and flexible, because the ANN model is trained to explore the channel heterogeneity and diversity among NOMA users so that it can output a good clustering solution in a wide range of circumstances.

Motivations and Contributions
The work in [17] shows that machine learning algorithms have started making inroads into UC optimization for NOMA systems, thanks to the large training datasets generated by the method in [15]. Even though ANN-based UC [17] can outperform other existing UC schemes in different scenarios, throughput comparable to that of the BF-S method remains far out of reach. Owing to the lack of hyper-parameter optimization and its limited number of hidden layers, the ANN model cannot fully characterize the nonlinear relationship between UC and the channel diversity and power differences among users. To model the nonlinearity between cluster formation, channel gains, and power allocation precisely, a deep learning [18] approach can be adopted. Compared with an ANN, a deep neural network (DNN) offers more room for optimization.
The application of DNNs in wireless communications is not new; it can be traced back to 2017, when a DNN was incorporated into an orthogonal frequency division multiplexing (OFDM) system [19] for signal detection. DNNs have also been applied to traffic control systems, where their performance has been demonstrated in [20][21][22][23]. An effective DNN-aided NOMA system with randomly deployed users was proposed in [24] to learn the channel state information (CSI) and estimate the channel characteristics automatically. Meanwhile, a DNN that automatically analyzes the CSI of the communication system and detects the original transmitted sequences was proposed in [25]. That scheme combines channel estimation with the recovery of the desired signal suffering from channel distortion and multiuser signal superposition.
In [26], a deep learning-based resource allocation framework was developed for the downlink of a simultaneous wireless information and power transfer (SWIPT)-enabled MC-NOMA system with the pattern division multiple access (PDMA) scheme. That work demonstrated that the deep learning model could substantially reduce the required computation time while attaining power consumption similar to that of the exhaustive search method. However, when the number of users is large, the complexity of this technique becomes significantly higher. To deal with imperfect SIC, [27] applied a DNN to allocate power in the downlink of the NOMA system. Although the DNN-based power allocation scheme achieves near-optimal energy efficiency at much lower complexity, only a very small number of users is considered. In another attempt, Sim et al. proposed a convolutional neural network-based SIC for the downlink of the NOMA system to alleviate the sum-rate loss caused by imperfect SIC [28]; however, the maximum number of users considered in [28] is limited to four. In [29], a deep learning-assisted receiver was developed for the uplink of a faster-than-Nyquist (FTN)-based NOMA system, but the robustness of the technique with high-order modulations was not investigated. To enhance the sum-rate of the NOMA system under imperfect SIC, a DNN-based resource management framework was designed in [30]; however, it does not consider scenarios in which the number of users changes dynamically in real time. These prior works provide strong evidence that DNNs have great potential to achieve near-optimal performance in NOMA resource management. A DNN is therefore expected to capture the nonlinear transformations of clustering information more completely, enabling precise UC prediction and near-optimal throughput.
In this paper, we propose a novel DNN-UC scheme that adapts the DNN architecture to the NOMA environment, taking asymmetrical fading in the NOMA channel into account. DNN-UC is implemented in two phases: in the training phase, the DNN-UC model is trained with datasets; in the testing phase, the performance of the DNN-UC scheme is validated and benchmarked against other existing UC schemes. The contributions of this paper are summarized as follows:

1. A new NOMA-based DNN architecture is developed, and the DNN model is adapted to various NOMA environments with asymmetrical channel fading to perform UC. A novel DNN-UC algorithm is developed for the BS to optimally cluster the users before transmitting the superposed signals to different clusters.

2. The proposed DNN-UC model is comprehensively trained with a back-propagation learning algorithm using datasets containing cluster formations for a large variety of NOMA scenarios. The trained model is then validated by examining the mean squared error (MSE) in the testing phase.

3. An efficient power allocation scheme is derived from the final cluster (coalition) formation to further enhance throughput subject to the SIC constraints.

4. The hyper-parameters of the DNN model are optimized extensively during the learning process to minimize the MSE and maximize the sum throughput, achieving a balanced trade-off between the two. The impact of different activation functions on the performance of DNN-UC is also analyzed.

5. The throughput performance of the DNN-UC scheme is investigated in different NOMA environments to demonstrate its ability to adapt to various scenarios.

Organization of the Paper
The rest of the paper is organized as follows. Section 2 develops the channel and system models for a downlink NOMA-based 5G network and formulates the throughput maximization problem. Section 3 describes the dataset generation, gives a brief introduction to the DNN model, and designs the novel DNN-UC scheme together with the proposed DNN algorithms for the training and testing phases; an efficient power allocation strategy based on the output of DNN-UC is also derived there. Section 4 presents simulation results with in-depth analysis and discussion, including plots showing the hyper-parameter optimization of the proposed DNN architecture. Finally, Section 5 concludes the paper with some insightful remarks and points readers to possible future research avenues related to this work.

Channel and System Models for 5G NOMA-Based Networks
In this work, we consider the downlink transmission of a NOMA-based 5G cellular network with a single cell in which $K$ users are randomly and uniformly deployed. A single BS located at the center of the cell serves all NOMA users on an allocated 5G frequency band $B_T$, which is divided equally into $M$ subcarriers, each with a bandwidth of $B = B_T/M$. The distance of user $n$ from the BS is denoted as $d_n$. It is assumed that the BS has full knowledge of the channel state information (CSI) of all users in the network. Based on this CSI, the BS transmits the superposed signal with power $p_{n,m}$ to user $n$ on subcarrier $m$, and the channel gain between the BS and user $n$ on subcarrier $m$ is denoted as $g_{n,m}$.
Unlike conventional OMA, a NOMA-based network allows subcarriers to be shared among users. In this work, we consider a NOMA system with dynamic subcarrier sharing, in which a user can autonomously form a group with other users and receive its own data from the BS on the same set of subcarriers. This flexibility allows users to form clusters freely, without the limits on the number of users per cluster or the number of clusters in the network that are normally enforced in the literature to satisfy the SIC constraint. Dynamic clustering may result in a "grand" cluster (all users sharing all their subcarriers in a single big cluster) or in "singleton" clusters (each cluster contains only one user and is exclusively allocated a different set of subcarriers, similar to OMA). Between the singleton and grand clusterings, there are $B_K - 2$ possible clustering outcomes, where $B_K$ is the Bell number [15], given by
$$B_{K+1} = \sum_{k=0}^{K} \binom{K}{k} B_k, \tag{1}$$
where $B_0 = B_1 = 1$. For brevity, the set of $N$ NOMA users sharing subcarrier $m$ is denoted as $U_m = \{U_{1,m}, U_{2,m}, \ldots, U_{N,m}\}$. Since a cluster consists of a set of users sharing the same set of subcarriers, cluster $i$ can be formed as $C_i = \{U_m,\ m \in \mathcal{M}_i\}$, where $\mathcal{M}_i$ is the set of subcarriers assigned to cluster $i$. Consider a specific two-user scenario on subcarrier $m$ with a nearby NOMA user $U_{1,m}$ and a distant NOMA user $U_{2,m}$ ($d_1 < d_2$), where the channel condition of $U_{1,m}$ is naturally better than that of $U_{2,m}$ ($g_{1,m} > g_{2,m}$).
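The growth of the clustering search space can be checked numerically. The sketch below is an illustration rather than code from [15]: it evaluates the Bell-number recurrence $B_{k+1} = \sum_{j=0}^{k}\binom{k}{j}B_j$ with $B_0 = 1$ using Python's `math.comb`, and returns the $B_K - 2$ clusterings that lie strictly between the singleton and grand clusters. The function names are hypothetical.

```python
from math import comb

def bell_numbers(k_max):
    """Return [B_0, B_1, ..., B_{k_max}] via B_{k+1} = sum_j C(k, j) * B_j."""
    bell = [1]  # B_0 = 1
    for k in range(k_max):
        bell.append(sum(comb(k, j) * bell[j] for j in range(k + 1)))
    return bell

def intermediate_clusterings(K):
    """Clusterings of K >= 2 users other than the grand and singleton ones."""
    if K < 2:
        raise ValueError("need at least two users")
    return bell_numbers(K)[K] - 2

for K in (4, 6, 10):
    print(K, bell_numbers(K)[K], intermediate_clusterings(K))
```

For $K = 10$ users there are already $B_{10} = 115{,}975$ possible clusterings, which illustrates why an exhaustive search over all of them quickly becomes impractical as $K$ grows.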
If $U_{1,m}$ and $U_{2,m}$ are grouped in the same cluster to receive data on subcarrier $m$, a lower transmit power is assigned to $U_{1,m}$, but the power allocation must satisfy the SIC constraint $p_{2,m} g_{1,m} - p_{1,m} g_{1,m} \ge p_{\min}$, where $p_{\min}$ is the minimum power difference required to distinguish the decoded signal from the remaining non-decoded signals. Under this condition, $U_{1,m}$ can perform SIC successfully: it first decodes the interfering signal destined for $U_{2,m}$ and removes it, after which its own message can be decoded without interference. Figure 1 illustrates an N-user NOMA-based 5G system in which N users share a specific frequency band for downlink transmission in a cluster. Each of the N users performs SIC independently to retrieve its message from the superposed signal. The SIC constraint above can be extended to the N-user scenario so that each user can distinguish and decode its message successfully. From this diagram, it can be seen that UC is essential in a multicarrier NOMA system, because the clustering of users determines whether power can be allocated efficiently while the SIC constraints are fulfilled. Furthermore, since each user's channel realization differs across subcarriers, the UC strategy is crucial for fully exploring and exploiting the channel diversity of all users.

In the NOMA system where N users share subcarrier $m$ in cluster $C_i$, the signal received at $U_{n,m} \in C_i$ can be formulated as
$$y_{n,m} = g_{n,m} \sum_{j=1}^{N} \sqrt{p_{j,m}}\, x_{j,m} + \eta_{n,m}, \tag{2}$$
where $x_{j,m}$ is the signal transmitted by the BS to $U_{j,m}$ and $\eta_{n,m} \sim \mathcal{CN}(0, \sigma_i^2)$ represents additive white Gaussian noise (AWGN). To enable successful SIC, power must be allocated to all users within a cluster according to the SIC conditions [14]. Following the power allocation strategy proposed in [14], the channel gains of the users in a cluster are first sorted in descending order, $g_{1,m} > g_{2,m} > \cdots > g_{N,m}$, and the powers are then assigned to all users to satisfy the SIC constraints. The received signal-to-interference-plus-noise ratio (SINR) of $U_{n,m}$, $\forall n = 1, 2, \ldots, N$, can be expressed as
$$\gamma_{n,m} = \frac{p_{n,m} G_{n,m}}{1 + G_{n,m} \sum_{j=1}^{n-1} p_{j,m}}, \tag{3}$$
where $G_{n,m} = |g_{n,m}|^2 / \sigma_i^2$ is the normalized channel gain of user $n$ on subcarrier $m$. From (3), the achievable throughput of $U_{n,m}$, $\forall n = 1, 2, \ldots, N$, can be written as
$$R_{n,m} = B \log_2\left(1 + \gamma_{n,m}\right). \tag{4}$$
Based on the Shannon equation in (4), the achievable throughput of cluster $i$ is
$$R_{C_i} = \sum_{m \in \mathcal{M}_i} \sum_{n=1}^{N} R_{n,m}. \tag{5}$$
Unlike the work in [14], which rigidly pre-defines the number of clusters in the UC of the NOMA system, the UC proposed in this paper fully explores the channel diversity among all users to form clusters, so the number of clusters is not known in advance and is mainly determined by the diversity of the users' channel gains. If $C$ clusters are formed in the NOMA system, the total system throughput of the NOMA network is
$$R_{\text{total}} = \sum_{i=1}^{C} R_{C_i}. \tag{6}$$
In this paper, the UC problem is formulated as a throughput maximization problem in which the BS performs efficient UC and power allocation to maximize the total system throughput in (6) subject to various constraints.
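The SINR and Shannon-rate expressions of this system model can be evaluated directly. The sketch below assumes that the users on one subcarrier are already sorted by descending channel gain, that user n cancels the signals of the weaker users and sees the powers of users 1, ..., n-1 as interference, i.e. $\gamma_{n,m} = p_{n,m}G_{n,m}/(1 + G_{n,m}\sum_{j<n}p_{j,m})$ with $G_{n,m} = |g_{n,m}|^2/\sigma^2$, and that every cluster sees the same noise power $\sigma^2$ (a simplification of the per-cluster $\sigma_i^2$). The function names and the nested-list representation of clusters are hypothetical.

```python
import numpy as np

def subcarrier_rates(p, g, sigma2, bandwidth):
    """Per-user rates R_{n,m} (bit/s) for N users sharing one subcarrier.

    p, g: powers and channel gains ordered so that |g_1| > |g_2| > ... > |g_N|.
    """
    p = np.asarray(p, dtype=float)
    G = np.abs(np.asarray(g)) ** 2 / sigma2                    # normalized gains G_{n,m}
    interference = np.concatenate(([0.0], np.cumsum(p)[:-1]))  # sum_{j<n} p_{j,m}
    sinr = p * G / (1.0 + G * interference)
    return bandwidth * np.log2(1.0 + sinr)

def system_throughput(clusters, sigma2, bandwidth):
    """Sum over clusters and over each cluster's subcarriers.

    clusters: list of clusters; each cluster is a list of (p, g) pairs,
    one pair per subcarrier assigned to that cluster (hypothetical layout).
    """
    return sum(
        subcarrier_rates(p, g, sigma2, bandwidth).sum()
        for cluster in clusters
        for p, g in cluster
    )

# A two-user cluster on one subcarrier plus a singleton cluster (illustrative values)
clusters = [
    [([0.2, 0.8], [1.0, 0.5])],
    [([1.0], [0.7])],
]
print(system_throughput(clusters, sigma2=1.0, bandwidth=1.0))
```

The cumulative sum gives each user the total power of the stronger users in a single vectorized step, which matches the ordering convention of the SIC decoding chain.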
For convenience, the notations adopted in this paper are listed in Table 1.

Table 1. List of notations.
$p_{n,m}$: Transmit power of user n on subcarrier m
$g_{n,m}$: Channel gain of user n on subcarrier m
$\mathbf{W}^{(q)}$: Weight matrix that links the (q−1)-th layer to the q-th layer
$f_q(\cdot)$: Activation function for the q-th layer
$\mathbf{b}^{(q)}$: Bias vector for the q-th layer
$\mathbf{a}^{(q)}$: Output vector of the q-th hidden layer
$\hat{\mathbf{z}}$: Predicted UC at the output of DNN-UC
$J(\theta)$: Cost function
$\eta$: Step size
$S$: Total number of samples

User Clustering Problem Formulation
To ensure that SIC can be performed successfully at the users' receivers, the BS must allocate its downlink transmit power carefully. The SIC condition for power allocation in cluster $i$ can be expressed as
$$\Big(p_{n,m} - \sum_{j=1}^{n-1} p_{j,m}\Big)\, g_{n-1,m} \ge p_{\text{SIC}}, \quad n = 2, \ldots, N, \tag{7}$$
where $p_{\text{SIC}}$ is the minimum power difference that user $n$ needs to distinguish the intended decodable signal from the unwanted non-decodable signals in the superposed signal sent by the BS. Let the user clustering indicator set of user $n$ be $\theta_n = \{\theta^1_{n,m}, \theta^2_{n,m}, \ldots, \theta^C_{n,m}\}$, $\forall m \in \mathcal{M}$, where $\theta^i_{n,m}$ is a Boolean clustering variable such that $\theta^i_{n,m} = 1$ if user $n$ is grouped into cluster $i$ sharing subcarrier $m$, and $\theta^i_{n,m} = 0$ otherwise. The clustering indicator set of all $K$ users is then $\theta = \{\theta_1, \theta_2, \ldots, \theta_K\}$. The BS power allocation strategy for all users is represented by the power allocation vector $\mathbf{P} = \{p_{1,m}, p_{2,m}, \ldots, p_{K,m}\}$, $\forall m \in \mathcal{M}$. By optimally selecting the clustering and power allocation strategies, the sum throughput of the NOMA system can be maximized subject to various constraints. The throughput maximization problem is formulated as
$$\max_{\theta,\, \mathbf{P}} \; R_{\text{total}} \tag{8}$$
$$\text{s.t.} \quad \sum_{m \in \mathcal{M}} \sum_{n=1}^{K} p_{n,m} \le P_{\max}, \tag{8a}$$
$$R_{n,m} \ge R_{\min}, \quad \forall n, m, \tag{8b}$$
$$\text{(7) holds in every cluster } C_i, \quad i = 1, \ldots, C, \tag{8c}$$
$$\sum_{i=1}^{C} \theta^i_{n,m} \le 1, \quad \theta^i_{n,m} \in \{0, 1\}, \quad \forall n, m, \tag{8d}$$
where $P_{\max}$ denotes the power budget at the BS. Constraint (8a) limits the total allocated power to the available power budget at the BS, while constraint (8b) ensures that all users achieve the minimum required throughput $R_{\min}$ on their allocated subcarriers. Constraint (8c) defines the SIC conditions that must be enforced in every cluster, and constraint (8d) guarantees that each user is assigned to only one cluster. In this work, we assume that subcarriers are pre-allocated to every user, and users can share their subcarriers with others once they form clusters. Unlike the problem formulated in [14], the number of clusters and the number of users in each cluster are unknown and not pre-determined. This relaxation allows channel diversity to be exploited more fully to improve throughput, but it makes the maximization an NP-hard combinatorial problem.
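A candidate clustering can be screened against the SIC condition before its throughput is evaluated. The sketch below checks, for one cluster on one subcarrier with users sorted by descending channel gain, that $(p_{n,m} - \sum_{j<n} p_{j,m})\, g_{n-1,m} \ge p_{\text{SIC}}$ for every $n \ge 2$, optionally together with a total power cap. The function name and the `p_budget` parameter are illustrative assumptions, and a small tolerance absorbs floating-point rounding.

```python
def satisfies_sic(p, g, p_sic, p_budget=None, tol=1e-12):
    """Return True if powers p (users sorted by descending gain g) meet the SIC condition.

    For n = 2..N (index n = 1..N-1 here), user n's power minus the total power
    of the stronger users, scaled by g_{n-1}, must reach p_sic.
    """
    if p_budget is not None and sum(p) > p_budget + tol:
        return False                       # power budget violated
    for n in range(1, len(p)):
        margin = (p[n] - sum(p[:n])) * g[n - 1]
        if margin < p_sic - tol:
            return False                   # SIC cannot separate this user's signal
    return True

print(satisfies_sic([0.2, 0.8], [1.0, 0.5], p_sic=0.1))             # True
print(satisfies_sic([0.5, 0.5], [1.0, 0.5], p_sic=0.1))             # False
print(satisfies_sic([0.1, 0.3, 0.6], [1.0, 0.8, 0.5], p_sic=0.1))   # True
```

Equal powers fail immediately, which reflects why the weaker users in a NOMA cluster must always receive strictly more power than the stronger ones.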
The throughput maximization is achieved by clustering users into their optimal groups with optimal power allocations, which yields the highest aggregate system throughput. In other words, individual user interests are not taken into account, and some users may be deprived of their best individual throughput for the sake of overall throughput improvement.

Proposed DNN Based UC in NOMA
In this section, a detailed description of the UC dataset used is provided and the working principle of the proposed DNN-UC scheme for the NOMA systems is presented.

Description of UC Dataset
To train the proposed model, the dataset, which consists of transmit powers, channel gains, and the optimal cluster formation, is generated using the BF-S-based UC developed in [15]. Specifically, the BF-S approach exhaustively searches all possible cluster formations obtained from (1) and evaluates their throughput using (6) to find the cluster formation that yields the highest throughput. In a K-user NOMA system, each sample has a total of 3K attributes: K channel gain values, K transmit power values, and K cluster information values. Although the computational complexity of BF-S is prohibitively high, which makes it impractical for real-world implementation, it plays a crucial role in establishing the theoretical throughput upper bound for NOMA systems and in enabling DNN-UC to learn from the best UC.
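The exhaustive search behind the dataset can be sketched as an enumeration of every set partition of the K users (there are $B_K$ of them) followed by an arg-max over a throughput score. In the sketch below, `throughput` is a hypothetical placeholder for the evaluation in (6); in [15] that step also includes optimal power allocation, which is not reproduced here, and the toy score and gain values are assumptions chosen only to give a unique best clustering.

```python
from typing import Callable, Iterator, List, Tuple

Partition = List[List[int]]

def set_partitions(users: List[int]) -> Iterator[Partition]:
    """Yield every partition of `users` into non-empty clusters."""
    if not users:
        yield []
        return
    first, rest = users[0], users[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing cluster ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or open a new singleton cluster for it.
        yield [[first]] + partition

def brute_force_uc(users: List[int],
                   throughput: Callable[[Partition], float]) -> Tuple[Partition, float]:
    """Return the partition with the highest throughput score."""
    best, best_value = None, float("-inf")
    for partition in set_partitions(users):
        value = throughput(partition)
        if value > best_value:
            best, best_value = partition, value
    return best, best_value

# Toy score (an assumption): reward clusters with a large channel-gain spread.
gains = [0.9, 0.7, 0.4, 0.1]

def toy_throughput(partition: Partition) -> float:
    return sum((max(gains[u] for u in c) - min(gains[u] for u in c)) ** 2 for c in partition)

print(sum(1 for _ in set_partitions(list(range(4)))))  # 15 = B_4
print(brute_force_uc(list(range(4)), toy_throughput))
```

Because the number of partitions follows the Bell numbers, this loop is only practical for small K, which is the motivation for training DNN-UC on its outputs instead of running it online.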
Once the best cluster formation is obtained, the clusters must be numbered systematically to facilitate the training of DNN-UC. To this end, the smallest channel gain in each cluster is first determined; the clusters are then sorted in ascending order of their smallest channel gains and numbered accordingly. Furthermore, to ensure that DNN-UC can adapt to network dynamics and channel variations without re-training, the training dataset covers NOMA networks with different numbers of users at random positions and different numbers of subcarriers experiencing different fading levels. The input vector x of DNN-UC contains the users' channel gains and initial transmit powers, while the target vector z contains the clustering information. Hence, for a K-user NOMA system, there are 2K input nodes and K output nodes, i.e., I = 2K and V = K. Mathematically, the i-th input of DNN-UC, $x_i$, can be formulated as
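The cluster-numbering rule and the input/target layout described above can be sketched as follows. The ordering of the 2K inputs (all K gains first, then all K powers) is an assumption made for illustration, since the exact input formulation is not reproduced here; the helper names and example values are hypothetical.

```python
import numpy as np

def label_clusters(partition, gains):
    """Number clusters 1..C in ascending order of their smallest channel gain
    and return one label per user (the K-dimensional target z)."""
    ordered = sorted(partition, key=lambda c: min(gains[u] for u in c))
    labels = np.zeros(len(gains), dtype=int)
    for label, cluster in enumerate(ordered, start=1):
        for user in cluster:
            labels[user] = label
    return labels

def build_input(gains, powers):
    """2K-dimensional input x: K channel gains followed by K initial powers (assumed order)."""
    return np.concatenate([np.asarray(gains, float), np.asarray(powers, float)])

gains = [0.9, 0.2, 0.5, 0.7]
powers = [0.25, 0.25, 0.25, 0.25]
best_partition = [[0, 2], [1, 3]]           # e.g. the output of an exhaustive search
x = build_input(gains, powers)               # I = 2K = 8 inputs
z = label_clusters(best_partition, gains)    # V = K = 4 outputs
print(x.shape, z)                            # (8,) [2 1 2 1]
```

Numbering clusters by their weakest member makes the target labels independent of the order in which the search happens to list the clusters, so identical clusterings always map to identical training targets.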

Working Principle of DNN Based UC
Let DNN-UC consist of an input layer with I nodes, J hidden layers, and an output layer with V nodes, where the q-th layer contains $N_q$ nodes. $\mathbf{W}^{(q)}$ is the $N_q \times N_{q-1}$ weight matrix that links the (q−1)-th layer to the q-th layer, such that the (n, m)-th element of $\mathbf{W}^{(q)}$ is the weight associated with the connection from node m in the (q−1)-th layer to node n in the q-th layer. With $f_q(\cdot)$ denoting the activation function and $\mathbf{b}^{(q)}$ the bias vector of the q-th layer, the output of the q-th hidden layer is
$$\mathbf{a}^{(q)} = f_q\big(\mathbf{W}^{(q)} \mathbf{a}^{(q-1)} + \mathbf{b}^{(q)}\big), \quad q = 1, \ldots, J,$$
where $\mathbf{a}^{(0)} = \mathbf{x}$ is the input vector. In DNN-UC, activation functions are applied only to the hidden-layer nodes. These activation functions play an essential role in the success of DNN-UC in predicting the optimal clustering. Besides determining whether a node is activated, they facilitate the training of the connection weights and enable powerful modeling of the complex user clustering problem through nonlinear transformations. The predicted UC at the final layer of DNN-UC is obtained as the weighted sum of $\mathbf{a}^{(J)}$ plus the output-layer bias $\mathbf{b}^{(J+1)}$:
$$\hat{\mathbf{z}} = \mathbf{W}^{(J+1)} \mathbf{a}^{(J)} + \mathbf{b}^{(J+1)}.$$
During the training phase, finding the optimal user clustering is formulated as the minimization of a cost function. To train the proposed DNN-UC, stochastic gradient descent is used to update the weights and biases iteratively via back-propagation:
$$\theta_{t+1} = \theta_t - \eta\, \nabla_\theta J(\theta_t),$$
where $\theta_t$ denotes the parameters to be optimized at iteration t, $\eta$ is the scalar step size, $\nabla_\theta$ denotes the gradient with respect to $\theta$, and $J(\theta)$ is the cost function. In the proposed method, the mean squared error (MSE) is chosen as the cost function:
$$J(\theta) = \frac{1}{S} \sum_{n=1}^{S} \big(z_n - \hat{z}_n\big)^2,$$
where S is the total number of samples and $z_n$ denotes the ground truth for the n-th user, i.e., the optimal user clustering for the n-th user generated by the BF-S method. Compared with shallow machine learning models, the deeply layered architecture and hierarchical learning process of DNN-UC give it a better ability to disentangle intricate features from large volumes of complex data, which is expected to lead to superior UC prediction performance in NOMA systems.
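The forward pass, linear output layer, MSE cost, and SGD update described above can be sketched in plain NumPy. This is an illustrative implementation under several assumptions: ReLU is used as the hidden-layer activation (the paper compares several activation functions), He-style random initialization is used, gradients are derived by hand for single-sample SGD, and the toy inputs and targets are synthetic stand-ins for BF-S data. The layer sizes and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(sizes):
    """One (W, b) pair per layer; W^{(q)} is N_q x N_{q-1}, b^{(q)} has N_q entries."""
    return [(rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Hidden layers: a^{(q)} = ReLU(W^{(q)} a^{(q-1)} + b^{(q)}); output layer is linear."""
    acts, pre = [x], []
    for q, (W, b) in enumerate(params):
        h = W @ acts[-1] + b
        pre.append(h)
        acts.append(h if q == len(params) - 1 else np.maximum(h, 0.0))
    return acts, pre

def dataset_mse(params, X, Z):
    """Mean squared error between predicted and ground-truth targets."""
    preds = np.array([forward(params, x)[0][-1] for x in X])
    return np.mean((preds - Z) ** 2)

def sgd_step(params, x, z, eta):
    """One back-propagation + gradient-descent update on a single sample."""
    acts, pre = forward(params, x)
    delta = 2.0 * (acts[-1] - z) / z.size        # dJ/dz_hat for the per-sample MSE
    new_params = list(params)
    for q in reversed(range(len(params))):
        W, b = params[q]
        grad_W, grad_b = np.outer(delta, acts[q]), delta
        if q > 0:                                 # propagate through the previous ReLU
            delta = (W.T @ delta) * (pre[q - 1] > 0)
        new_params[q] = (W - eta * grad_W, b - eta * grad_b)
    return new_params

# Synthetic stand-in data: K = 2 users -> I = 2K = 4 inputs, V = K = 2 outputs.
X = rng.uniform(size=(200, 4))
Z = np.stack([X[:, 0] + X[:, 2], X[:, 1] * X[:, 3]], axis=1)

params = init_params([4, 16, 16, 2])             # two hidden layers (J = 2)
loss_before = dataset_mse(params, X, Z)
for epoch in range(20):
    for i in rng.permutation(len(X)):
        params = sgd_step(params, X[i], Z[i], eta=0.05)
loss_after = dataset_mse(params, X, Z)
print(f"MSE before: {loss_before:.4f}, after: {loss_after:.4f}")
```

In practice a framework with automatic differentiation would replace the hand-written backward pass; the manual version is shown only to make each term of the update rule visible.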

Power Allocation for Predicted UC
Once the UC has been predicted by the DNN-UC, the next step is to allocate power to the predicted clusters. To perform power allocation, the UC must satisfy the necessary power constraint of the NOMA system. Intuitively, the power constraint for successful SIC of the $N$ users in a downlink NOMA cluster on subcarrier $m$ can be generalized as in [16], where $p_{\min}$ is the minimum power level required to distinguish between decoded and non-decoded signals. Under this SIC constraint, the power allocated to a high-channel-gain user $n$ on subcarrier $m$ must satisfy $p_{n,m} < \frac{p_c - \Delta}{2^{N-1}}$, $n = 2, 3, \ldots, N$, where $p_c$ is the maximum downlink transmission power for a cluster and $\Delta$ is the minimum power difference required to perform SIC. Therefore, the power allocation for an $N$-user cluster can be generalized as in (18). With the cluster formation obtained from the DNN-UC and the power allocated based on (18), the throughput of a cluster can be computed using (5), and the sum throughput of the NOMA system can be obtained via (6).
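The per-user power bound $p_{n,m} < (p_c - \Delta)/2^{N-1}$ for $n = 2, \ldots, N$ is simple to check programmatically. The sketch below is a hypothetical helper, not the power allocation of (18), which is not reproduced here. It assumes that `powers[0]` belongs to the low-channel-gain user 1 and `powers[1:]` to users 2 to N, and it checks only this bound and the cluster power budget $p_c$; the full SIC constraint of [16] involving $p_{\min}$ is not modelled.

```python
def max_strong_user_power(p_c, delta, n_users):
    """Upper bound (p_c - delta) / 2**(N - 1) on the power of users n = 2..N."""
    if n_users < 2:
        raise ValueError("a NOMA cluster needs at least two users")
    if not 0 <= delta < p_c:
        raise ValueError("require 0 <= delta < p_c")
    return (p_c - delta) / 2 ** (n_users - 1)

def check_cluster_powers(powers, p_c, delta):
    """powers[0] is assumed to be user 1 (low channel gain); powers[1:] are users 2..N.
    Returns True if every user 2..N respects the bound and the total stays within p_c."""
    bound = max_strong_user_power(p_c, delta, len(powers))
    strong_ok = all(p < bound for p in powers[1:])
    return strong_ok and sum(powers) <= p_c

if __name__ == "__main__":
    print(max_strong_user_power(1.0, 0.2, 2))            # two-user bound
    print(check_cluster_powers([0.6, 0.3], 1.0, 0.2))    # strong user below the bound
    print(check_cluster_powers([0.5, 0.5], 1.0, 0.2))    # strong user above the bound
```

For $N = 2$ the bound reduces to $(p_c - \Delta)/2$, which is exactly the largest power the strong user can receive when the two powers sum to $p_c$ and differ by at least $\Delta$.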

Simulation Settings
In this section, extensive simulations are carried out to evaluate the performance of the proposed DNN-UC approach in different NOMA environments. Specifically, MATLAB is used to model the channels and networks in a NOMA-based 5G system. The channel gains between the BS and the NOMA users account for shadowing through the attenuation factor $S_n$, which is independent and identically distributed (i.i.d.) with variance $\sigma_s^2$. To model the urban non-line-of-sight propagation environment of the 5G network, a path-loss exponent $v$ is used, such that the path loss experienced by user $n$ is $PL_n = d_n^{-v}$, where $d_n$ is the distance between user $n$ and the BS. Frequency-selective Rayleigh fading is adopted to model the multipath propagation environment, taking into account the correlation between adjacent subcarriers. The total bandwidth is partitioned such that the bandwidth of each subcarrier is much smaller than the coherence bandwidth of the channel, so that each subcarrier experiences flat fading. In this work, the 5G NOMA network is assumed to be static, in the sense that the time scale of the algorithm convergence is much shorter than the channel coherence time. In other words, the channel gains remain constant during one run of the algorithm.
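As an illustration of the channel model described above, the sketch below draws per-user, per-subcarrier gains as the product of path loss $d_n^{-v}$, a shadowing factor $S_n$, and a Rayleigh fading power $|h_{n,m}|^2$. It is a simplified sketch under stated assumptions, not the authors' MATLAB model: the function name and default values (`v=3.5`, `sigma_s_db=8.0`) are hypothetical rather than taken from Table 3, shadowing is assumed log-normal with its standard deviation given in dB, and fading is drawn independently per subcarrier, so the subcarrier correlation used in the paper is not modelled.

```python
import random

def channel_gains(distances, num_subcarriers, v=3.5, sigma_s_db=8.0, seed=None):
    """Return gains[n][m] = d_n**(-v) * S_n * |h_{n,m}|**2.

    Assumptions (hypothetical defaults, not the values of Table 3):
      * S_n is log-normal: 10**(X/10) with X ~ N(0, sigma_s_db) in dB, i.i.d. across users.
      * h_{n,m} is Rayleigh, so |h|**2 ~ Exp(1), drawn independently per subcarrier
        (the subcarrier correlation used in the paper is NOT modelled here).
    """
    rng = random.Random(seed)
    gains = []
    for d in distances:
        if d <= 0:
            raise ValueError("distances must be positive")
        path_loss = d ** (-v)                                # PL_n = d_n^{-v}
        shadowing = 10 ** (rng.gauss(0.0, sigma_s_db) / 10)  # S_n
        row = []
        for _ in range(num_subcarriers):
            x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # real and imaginary parts
            row.append(path_loss * shadowing * (x * x + y * y) / 2)  # E[|h|^2] = 1
        gains.append(row)
    return gains

if __name__ == "__main__":
    for row in channel_gains([50.0, 100.0, 200.0], num_subcarriers=4, seed=7):
        print(row)
```

Fixing `seed` makes a drawn channel realization reproducible, which fits the assumption that the gains stay constant during one run of the algorithm.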
The UC dataset adopted in this paper is generated using the BF-S-UC scheme described in Section 3.1. A total of 110,000 samples are collected, and the maximum number of users considered is 50. Each sample has 150 attributes, consisting of 50 channel gain values, 50 transmit power values, and 50 cluster information values.
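A minimal sketch of how one 150-attribute sample could be split into DNN inputs (channel gains and transmit powers) and regression targets (cluster information). The column order (gains, then powers, then cluster values) and the helper name `split_sample` are assumptions for illustration; the paper does not specify the storage layout.

```python
MAX_USERS = 50  # maximum number of users per sample, as stated in the text

def split_sample(sample, max_users=MAX_USERS):
    """Split one sample into (features, targets).

    Assumed (hypothetical) column order: channel gains, transmit powers, cluster info.
    Returns 2 * max_users input features and max_users UC targets.
    """
    if len(sample) != 3 * max_users:
        raise ValueError(f"expected {3 * max_users} attributes, got {len(sample)}")
    gains = sample[:max_users]
    powers = sample[max_users:2 * max_users]
    clusters = sample[2 * max_users:]
    return list(gains) + list(powers), list(clusters)

if __name__ == "__main__":
    features, targets = split_sample(list(range(150)))
    print(len(features), len(targets))
```

Under this layout, the DNN-UC would have 100 input nodes and 50 output nodes.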
The DNN architecture for the NOMA UC optimization is constructed by modifying the MATLAB Deep Learning Toolbox, which is then used to perform the training, validation, and testing of the proposed DNN-UC scheme. To optimize the hyper-parameters of the proposed DNN model and select the best activation function, the DNN-UC is simulated under various scenarios (different learning rates, training data lengths, numbers of hidden layers, numbers of epochs, batch sizes, etc.), and its performance is observed in terms of MSE and throughput. With the optimized parameters, the DNN-UC is benchmarked against OMA, DUC, ANN-UC, and BF-S-UC in terms of throughput for different network sizes. The simulation parameters for the proposed DNN-UC scheme and the downlink NOMA system are summarized in Tables 2 and 3.

Simulation Results and Discussions
First, the DNN-UC scheme is simulated with different activation functions to compare their throughput for different numbers of users in the NOMA system. Since the choice of activation function is one of the most important hyper-parameters of a DNN model, four well-established functions, namely ReLU, Sigmoid, Sine, and Tanh, are incorporated into the DNN-UC to evaluate their effectiveness for UC. Figure 3 shows that ReLU consistently outperforms Sigmoid, Sine, and Tanh by 10%, 16%, and 18%, respectively, in terms of throughput for K = 24. In general, the main issue with the ReLU function is the dead-neuron problem, which results from mapping all negative inputs to zero. However, this is less of an issue in the DNN-UC, where all the inputs (channel gains and powers of all users) are positive, so negative pre-activations are less likely to occur. The Sine function is nonmonotonic, while the Sigmoid and Tanh functions, although monotonic, saturate for large inputs; none of them therefore precisely characterizes the cluster formation, whose size grows monotonically with the diversity of the channel gains and power allocation. Furthermore, the Sigmoid and Tanh functions are well known to be vulnerable to the vanishing gradient problem when more hidden layers are employed, in which case the network may be unable to propagate useful gradient information for adapting the weights and biases during back-propagation. Consequently, a DNN-UC equipped with the Sigmoid or Tanh function may suffer from slow convergence and local optima. Conversely, as the derivative of the ReLU function is always one for positive inputs, the vanishing gradient problem can be effectively mitigated for arbitrary network depth [31]. Therefore, to ensure precise mapping and stable training of the DNN, ReLU is recommended for the proposed DNN-UC.
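The vanishing-gradient argument can be illustrated numerically. During back-propagation, the gradient reaching an early layer is multiplied by one activation derivative per hidden layer; the sketch below evaluates that product for Sigmoid, Tanh, and ReLU at a fixed pre-activation. It is a deliberately simplified illustration (the weights, which also enter the product, are ignored, and every layer is evaluated at the same input `x`), and the helper names are hypothetical. Sine is omitted because the issue raised for it above is nonmonotonicity rather than saturation.

```python
import math

def d_sigmoid(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)            # at most 0.25, reached at x = 0

def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2  # at most 1, decays towards 0 for large |x|

def d_relu(x):
    return 1.0 if x > 0 else 0.0    # exactly 1 for every positive input

def path_gradient_factor(derivative, x, depth):
    """Product of activation derivatives along one back-propagation path through
    `depth` hidden layers, all evaluated at the same pre-activation x (weights ignored)."""
    factor = 1.0
    for _ in range(depth):
        factor *= derivative(x)
    return factor

if __name__ == "__main__":
    for name, d in [("sigmoid", d_sigmoid), ("tanh", d_tanh), ("relu", d_relu)]:
        print(name, [path_gradient_factor(d, 2.0, k) for k in (1, 2, 4)])
```

For a positive pre-activation, the ReLU factor stays at 1 regardless of depth, whereas the Sigmoid and Tanh factors shrink geometrically as layers are added.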
In the following simulations, ReLU is adopted in the DNN-UC while the other hyper-parameters are optimized. It is also noteworthy that, in each network scenario, the users tend to form multiple clusters to achieve higher throughput. For instance, in the scenario with 50 users, 13 clusters are formed, each occupying an exclusive set of subcarriers. With NOMA and SIC, the interference among the users within a cluster is minimized through efficient power allocation, which leads to higher throughput than that of its predecessor, the OMA scheme.
In Figure 4, the effects of the batch size and the number of epochs on the training dynamics of the DNN-UC are investigated in terms of MSE at a learning rate of 0.001. As expected, the MSE decreases as the number of epochs increases for all batch sizes. Further increasing the number of epochs improves the MSE, but at the cost of a longer training duration. Moreover, choosing a large number of epochs may cause the model to overfit, which widens the generalization gap. Among all batch sizes, a batch size of 50 provides the fastest decay, ultimately achieving an MSE of 0.2. Batch sizes of 10 and 20 do not produce satisfactory results, yielding MSEs of 0.4 and 0.3, respectively, after 50 epochs. The UC problem defined in this work is a nonconvex NP-complete optimization problem. Small batch sizes produce noisy gradient estimates, which may cause the output of the DNN-UC to oscillate around the optimum, depending on the ratio of the batch size to the dataset size. On the other hand, a batch size as large as 100 does not guarantee better MSE performance either, because the UC problem is also combinatorial in nature and difficult to generalize well. Hence, Figure 4 shows that a batch size of 50 is appropriate, as the MSE exhibits asymptotic behavior at this point, where further increasing the batch size provides little or no improvement. The R² score of this regression model is also evaluated to assess the predictive accuracy of the proposed scheme. From Figure 4, the DNN-UC achieves its best R² value of about 0.98 with a batch size of 50 at 50 training epochs, which is consistent with the MSE results.
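The two metrics used in Figure 4 and the effect of the batch size on the number of weight updates can be sketched in a few lines. The functions below are generic textbook definitions (MSE, and the coefficient of determination R² = 1 − SS_res/SS_tot), not code from the paper, and the toy numbers in the usage block are made up. The batch-size helper shows that, with the 70,000 training samples used in the subsequent simulations, batch sizes of 10, 50, and 100 give 7,000, 1,400, and 700 updates per epoch, respectively.

```python
import math

def mse(z, z_hat):
    """Mean squared error over paired ground-truth / predicted values."""
    return sum((a - b) ** 2 for a, b in zip(z, z_hat)) / len(z)

def r2_score(z, z_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(z) / len(z)
    ss_res = sum((a - b) ** 2 for a, b in zip(z, z_hat))
    ss_tot = sum((a - mean) ** 2 for a in z)
    return 1.0 - ss_res / ss_tot

def updates_per_epoch(n_samples, batch_size):
    """Number of mini-batch weight updates in one pass over the training data."""
    return math.ceil(n_samples / batch_size)

if __name__ == "__main__":
    z = [3.0, 1.0, 2.0, 4.0]          # toy ground truth
    z_hat = [2.8, 1.1, 2.2, 3.9]      # toy predictions
    print(mse(z, z_hat), r2_score(z, z_hat))
    for bs in (10, 50, 100):
        print(bs, updates_per_epoch(70_000, bs))
```

With the toy values above, the MSE is about 0.025 and R² is about 0.98, i.e., the residual error is only 2% of the variance of the targets around their mean.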
An R² of nearly 0.98 implies that the DNN-UC accounts for almost all of the variation of the clustering output around its mean value.
The effects of the learning rate and the number of epochs on the efficiency of the DNN-UC are investigated in Figure 5. In this simulation, 70,000 data samples and a batch size of 50 (the optimal batch size) are used. As anticipated, the best throughput of 8.8 Mbps is obtained when the learning rate is fixed at 0.001 with 50 epochs. At this small learning rate, each weight update is small, so the DNN-UC usually requires more training epochs, but it converges to a smaller loss. As the learning rate increases, the throughput drops. At a learning rate of 0.1, the throughput is 6 Mbps after 50 epochs, a drop of 32%. This degradation is mainly due to the large weight updates, with which the optimizer may overshoot the optimal weight values. Since the cluster formation of a NOMA system is highly dependent on the users' channel gains and powers, fine-grained weight updates are essential for UC optimization.
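The overshoot mechanism can be seen on a one-dimensional toy problem. The sketch below runs plain gradient descent on the quadratic J(w) = 0.5·c·w², where each step multiplies w by (1 − ηc); the curvature c = 25 is an arbitrary illustrative value and has nothing to do with the actual DNN-UC loss surface. For this toy problem, η = 0.001 converges slowly without overshooting, whereas η = 0.1 exceeds the stability limit 2/c = 0.08, so every step overshoots the minimum and the iterate diverges.

```python
def gradient_descent_1d(curvature, eta, w0=1.0, steps=50):
    """Minimise J(w) = 0.5 * curvature * w**2 with w <- w - eta * dJ/dw.

    Each step multiplies w by (1 - eta * curvature), so the iteration converges
    only if eta < 2 / curvature and overshoots (changes sign) once eta > 1 / curvature.
    """
    w = w0
    for _ in range(steps):
        w -= eta * curvature * w   # dJ/dw = curvature * w
    return w

if __name__ == "__main__":
    for eta in (0.001, 0.1):
        print(eta, gradient_descent_1d(curvature=25.0, eta=eta))
```

In a real DNN the curvature varies across parameters and during training, so this only illustrates why an overly large step size can prevent the optimizer from settling near good weight values.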
Another important hyper-parameter is the length of the training data, because the DNN-UC must be sufficiently trained to operate efficiently during the testing phase. As the complexity of cluster formation in the DNN-UC increases exponentially with the numbers of users and subcarriers, the variations in cluster formation and power allocation may grow tremendously with even a slight increase in the number of users and subcarriers. Therefore, it is essential to determine a training data length that is sufficient for the DNN-UC to function satisfactorily. Figure 6 illustrates the throughput for different training sample lengths and network sizes at a learning rate of 0.001. As expected, the throughput increases with the number of users, but the rate of increase slows because the bandwidth resource (number of subcarriers) is not increased accordingly. Among the compared sample lengths, the DNN-UC achieves the best throughput when trained with 70,000 samples. Sample lengths below 70,000 degrade the efficiency of the DNN-UC, as the model is not adequately trained to capture the relationships between the cluster formation and the input features. Increasing the training samples beyond 70,000 may not yield better performance, which is attributed to overfitting of the model: the DNN learns too much from the training samples, including their noise, which results in poorer performance.
The number of hidden layers plays a vital role in learning the cluster formation for the NOMA system. The DNN-UC is simulated with different numbers of hidden layers, and the resulting MSE and throughput versus the number of epochs are presented in Figure 7. For 50 epochs, the best throughput of 8.7 Mbps and the best MSE of 0.2 are obtained with four hidden layers. With two and three hidden layers, the throughput degrades by 6% and 5%, respectively, compared with that of the DNN-UC with four hidden layers. When the numbers of users and subcarriers are sufficiently large, the complexity of cluster formation and power allocation becomes extremely high, increasing the dimensionality and nonconvexity of the UC problem. As a result, the DNN-UC with two or three hidden layers cannot cope with the high dimensionality, which leads to an inadequate approximation. On the other hand, the DNN-UC with four hidden layers is deep enough to produce the best generalization for the UC optimization. Nevertheless, further increasing the number of hidden layers to five and beyond brings no benefit to the MSE or throughput, as the DNN-UC model begins to overfit.
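One way to quantify how model capacity grows with depth is to count trainable parameters. The sketch below counts the weights and biases of a fully connected network; the dimensions in the usage block are assumptions for illustration (100 inputs for 50 channel gains and 50 powers, 50 outputs for the cluster information, and 40 nodes per hidden layer, the value that Figure 8 identifies as best), not an architecture specification from the paper.

```python
def count_parameters(n_inputs, hidden_sizes, n_outputs):
    """Total weights + biases of a fully connected net [n_inputs, *hidden_sizes, n_outputs]."""
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

if __name__ == "__main__":
    # Hypothetical dimensions: 100 inputs, 50 outputs, 40 nodes per hidden layer.
    for depth in (2, 3, 4, 5):
        print(depth, count_parameters(100, [40] * depth, 50))
```

Under these assumptions, two to five hidden layers give 7,730, 9,370, 11,010, and 12,650 parameters, respectively: each additional 40-node layer adds 40 × 40 + 40 = 1,640 parameters, which increases the capacity (and the risk of overfitting) discussed above.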
The number of hidden nodes is also important: too few hidden nodes may leave the network unable to capture the features of a complex dataset, whereas too many may lead to overfitting. Since the UC dataset is complex, the number of hidden nodes should be chosen carefully. In Figure 8, different numbers of hidden nodes are tested on DNN-UC models with two, three, and four hidden layers to compare their achievable throughput. All three models exhibit the same trend, with the highest throughput obtained when the number of hidden nodes per layer is 40.
To show the effectiveness of the proposed UC scheme, the DNN-UC is benchmarked against the BF-S-UC, ANN-UC, DUC, and OMA in terms of throughput for different numbers of users. In this simulation, the DNN-UC uses the best hyper-parameters identified in the preceding investigations. Figure 9 shows that the throughputs of the DUC, ANN-UC, and DNN-UC are upper-bounded by the BF-S-UC and lower-bounded by OMA. The case for replacing OMA with NOMA in 5G is reinforced by the four-fold throughput improvement of the DNN-UC over OMA, while the other NOMA UC schemes also outperform OMA. Moreover, the DNN-UC achieves near-optimal performance, approximately 97% of the throughput attained by the BF-S-UC, for different network sizes. Compared with the DUC, the DNN-UC, which does not fix the number of users per cluster, gains more than 30% in throughput by fully exploiting the channel diversity and heterogeneity among the users. Therefore, the more users present in the network, the larger the advantage of the DNN-UC over the DUC, as the diversity increases accordingly. The sub-optimality of the ANN-UC is partially overcome by the DNN-UC, which includes more hidden layers in its architecture. The additional layers provide more room for hyper-parameter optimization, allowing the DNN-UC to better characterize the nonlinear relationship between cluster formation, channel gains, and power allocation. Overall, the DNN-UC outperforms the ANN-UC by 8-10% for different numbers of users.

Conclusions
In this work, the DNN model has been shown to be feasible and efficient in solving the UC problem for NOMA 5G networks. With the datasets generated by the BF-S-UC, the proposed DNN-UC is trained, validated, and tested. The inclusion of more hidden layers in the DNN model enables the DNN-UC to better characterize the nonlinear transformation of the diversity in channel gains and powers into cluster formation. After cluster formation, an efficient power allocation scheme is implemented to ensure that all users achieve the required minimum throughput subject to the SIC constraint in each cluster. Extensive simulations have been carried out to optimize the training hyper-parameters of the DNN-UC. The investigation shows that the DNN-UC performs best with four hidden layers of 40 nodes each, trained with 33,600 data samples using a batch size of 50 at a learning rate of 0.001 for 50 epochs. With these hyper-parameters, the proposed DNN-UC achieves near-optimal throughput, approximately 97% of that achieved by the BF-S-UC method. Furthermore, the robustness of the DNN-UC is evaluated in various NOMA environments, revealing that the proposed method adapts to the NOMA environments considered without retraining the model.
Nevertheless, owing to the deep architecture of the proposed method, which involves multiple levels of processing and nonlinear transformations, the learning process of the DNN-UC is slower and the computational complexity of its testing phase is higher than those of the ANN-UC. When the depth and width of the DNN-UC are increased to cater for more complex deployment scenarios, these issues would be exacerbated, and the DNN-UC would also be more prone to overfitting. As such, future work could address model and complexity reduction for the DNN-UC, for example by adopting a deep learning pruning approach to eliminate redundant weights, or by using faster deep learning variants or advanced weight adaptation strategies to improve its learning speed and computational efficiency.