User Selection Approach in Multiantenna Beamforming NOMA Video Communication Systems

For symmetric non-orthogonal multiple access (NOMA)/multiple-input multiple-output (MIMO) systems, radio resource allocation is an important research problem. The optimal solution has high computational complexity. One existing solution, proposed by Kim et al., is a suboptimal user selection with optimal power assignment for total data rate maximization. Another existing solution, proposed by Tseng et al., is a different suboptimal user grouping with optimal power assignment for sum video distortion minimization. However, the performance of both suboptimal schemes is still much lower than that of the optimal user grouping scheme. To approach the optimal scheme and outperform the existing suboptimal schemes, a deep neural network (DNN) based approach is proposed. It uses the results of the optimal user selection (exhaustive search) as the training data, together with a loss function modification specific to NOMA user selection that enforces the constraint that a user cannot be in both the strong and weak sets, thereby avoiding the online computational complexity of post-processing. The simulation results show that the theoretical peak signal-to-noise ratio (PSNR) of the proposed scheme is 0.7~2.3 dB higher than that of the state-of-the-art suboptimal schemes of Kim et al. and Tseng et al., and only 0.4 dB less than that of the optimal scheme, at lower online computational complexity. The online (testing stage) computational complexity of the proposed DNN user selection scheme is 60 times less than that of the optimal user selection scheme. In short, the proposed DNN-based scheme outperforms the existing suboptimal solutions and only slightly underperforms the optimal scheme (exhaustive search) at much lower computational complexity.


Introduction
To meet the rapidly increasing consumer demand for wireless data, especially wireless video delivery, wireless transmission technology is continuously evolving. To efficiently manage the resources of wireless transmission technology, resource allocation, such as user selection and beamforming group allocation, is key. Multiple-input multiple-output (MIMO) has been widely used in wireless communications. Chen et al. [1] investigated resource management in MIMO systems for multiview 3D video delivery. Yang et al. [2] proposed user grouping for multicell uplink multiuser MIMO systems to achieve higher sum rates. Lee et al. [3] proposed a cross-layer optimization scheme for heterogeneous multiuser MIMO networks.
In addition, non-orthogonal multiple access (NOMA) can meet the world's demand for higher data transmission rates. NOMA has promising applications in 5G networks and beyond [4][5][6][7] and in the digital TV standard ATSC 3.0 [8]. NOMA can serve more than one user on the same radio resource and has higher bandwidth efficiency than conventional orthogonal multiple access (OMA) [9]. Since the receiver uses successive interference cancellation (SIC) technology, multiple signals can be superposed and transmitted [10].
Combining MIMO and NOMA can achieve higher spectrum efficiency and diversity. Senel et al. [11] investigated the combination of multi-user beamforming and NOMA.

Related Works
For NOMA-MIMO systems, user selection is a key research topic. One prior work [12] proposed suboptimal user selection and optimal power allocation to maximize the sum data rate. Another prior work [13] proposed a different suboptimal user selection and optimal power allocation to minimize the sum video mean square error (MSE) distortion. A comparison of the prior works and the proposed scheme is given in Table 1.
Deep learning has been applied to radio resource allocation in wireless communication systems. Sun et al. [22] proposed learning from the suboptimal WMMSE algorithm and achieved performance close to it. Lee et al. [19] proposed power control for underlaid device-to-device communications. Tseng et al. [25] proposed a resource allocation scheme for OFDMA/NOMA systems that learns from a suboptimal scheme; a post-processing step in the testing stage is also proposed to guarantee the constraint that each user has at least one subcarrier, for user fairness. Wang [26] proposed a modified loss function in the training process of OFDMA-NOMA resource allocation such that the constraint that each user has at least one subcarrier is usually satisfied.

Table 1. Comparison of the prior works [12,13] and the proposed scheme.

[12]: user selection and power allocation are based on the physical-layer metric, the information rate in (9) and (13); iterative algorithm, so high computational complexity.
[13]: user selection and power allocation are based on the cross-layer metric, the video MSE in (14); iterative algorithm, so high computational complexity.
Proposed: user selection is learned by a DNN from the optimal scheme (Section 4); power allocation is the same as in [13]; non-iterative, deep learning-based approach, so low online computational complexity.

The performance of the suboptimal schemes by Kim et al. [12] and Tseng et al. [13] still shows a significant gap from the optimal scheme. The previous works on deep learning for radio resource allocation [22,25,26] all learn from a suboptimal scheme (as training data), so their performance is slightly worse than the suboptimal scheme and cannot approach the optimal solution. Our proposed scheme uses a DNN to learn the strong/weak set user selection from the optimal solution (obtained by exhaustive search) and thus performs better than the suboptimal schemes and close to the optimal scheme at lower complexity.
Compared to the prior works, our proposed scheme makes the following contributions: (1) A deep learning scheme (Scheme DNN in Section 4) that learns from the optimal scheme (Scheme Optimal) is proposed. Scheme Optimal attempts all the combinations/permutations of K candidate users (exhaustive search) and chooses the best performing user grouping. Scheme DNN uses the Scheme Optimal results as training data. The proposed Scheme DNN achieves near-optimal performance at lower complexity and outperforms the previous suboptimal schemes proposed in [12,13]. (2) A new loss function for deep learning of the user selection, dealing with constraint violation, is proposed. If a user is selected in both the strong set and the weak set (constraint violation), an extra value is added to the cost function. This avoids post-processing after the training stage to satisfy the constraint that a user cannot be in both the strong and weak sets, and reduces the complexity. For comparison, Tseng et al. [25] investigated deep learning-based resource allocation for OFDMA/NOMA but not MIMO; their scheme requires post-processing after the training stage to satisfy the constraints, with additional complexity and latency at runtime. The scheme in [26] modified the loss function to satisfy the constraint that each user has at least one subcarrier and thus avoids post-processing, but it deals with a different constraint (a user has at least one subcarrier, not that a user cannot be in both the strong and weak sets) in a different system (OFDMA/NOMA, not NOMA-MIMO). (3) The proposed deep learning approach for NOMA resource management crosses the physical and application layers. Previous NOMA schemes such as [4][5][6],[12,27] focus on the physical layer, and there is currently no deep learning-based cross-layer user selection scheme for NOMA-MIMO video systems [28][29][30].
The remainder of this paper is organized as follows. Section 3 describes the system model. Section 4 describes the proposed deep learning approach and the proposed modified cost function for constrained optimization. Simulation results are shown in Section 5. The conclusion is given in Section 6.

Uplink NOMA-MIMO Video Transmission System Model
Sections 3.1-3.3 describe the structure of the uplink NOMA-MIMO video communication system, the received signal model, and the multiantenna beamforming method (ZF post-coder). The key idea is that the N antennas at the BS create N multiantenna beamforming groups, and NOMA allows two users in the same resource, so a total of 2N users can be supported in the uplink NOMA-MIMO system, multiplying the sum data rate of all users accordingly. Section 3.4 derives the received SINR and then the information (data) rates in (9) and (13) for the strong and weak NOMA users, respectively. The information (data) rate is a physical-layer metric used in the prior work [12]. Section 3.5 describes the model of the video MSE distortion in (14), which is a function of the information (data) rates in Section 3.4. The video MSE distortion is a cross-layer metric used in the prior work [13] and in the proposed scheme. Then, the video quality indicator, PSNR, is a log expression of the video MSE distortion, defined in (17).

Uplink NOMA-MIMO System Structure
The structure of the symmetric uplink NOMA-MIMO video transmission system is shown in Figure 1, and is the same as that in [12,13] except for the gray part, resource allocation. The resource allocation in [12,13] is non-deep-learning-based, whereas the resource allocation block in Figure 1 is a deep learning-based one, with the training data obtained from the optimal solution. Figure 2 shows the symmetric uplink NOMA-MIMO system model with K candidate users and N antennas at the BS, K ≥ 2N, and is the same as that in [12,13]. Overall, the uplink NOMA-MIMO video transmission system model in Figures 1 and 2 is the same as that in [12,13] except that the resource allocation is deep-learning-based.


Received Signal Model
The received signal of all the groups with all users in the uplink NOMA system can be expressed as

y = H_s x_s + H_w x_w + n_awgn

where H_s and H_w denote the channel matrices of the strong and weak sets, respectively, n_awgn is the additive white Gaussian noise (AWGN) with power P_awgn, and x_s, x_w are the N × 1 transmitted signal vectors of the strong and weak sets, respectively. The channel matrices of the strong and weak sets can be written as

H_s = [h_s,1 · · · h_s,p · · · h_s,N], H_w = [h_w,1 · · · h_w,q · · · h_w,N]

where p ∈ {1, 2, · · · , N}, q ∈ {1, 2, · · · , N}, and h_s,p, h_w,q are the N × 1 uplink channel vectors of the p-th and q-th users in the strong and weak sets, respectively. The transmitted signal vectors of the strong and weak sets are given by

x_s = [√α_s,1 s_s,1 · · · √α_s,N s_s,N]^tr, x_w = [√α_w,1 s_w,1 · · · √α_w,N s_w,N]^tr

where (.)^tr denotes the transpose of the matrix. s_s,p and s_w,q are the signals of the p-th and q-th users in the strong and weak sets, respectively. α_s,p and α_w,q are the power control factors of the p-th and q-th users in the strong and weak sets, respectively.
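The received-signal model above can be sketched numerically. This is a minimal numpy illustration, assuming the standard superposition form y = H_s x_s + H_w x_w + n_awgn (the equation itself is missing from the extracted text); all dimensions and values are illustrative.

```python
import numpy as np

# Sketch of the uplink NOMA received signal, assuming y = H_s x_s + H_w x_w + n_awgn.
def received_signal(H_s, H_w, alpha_s, alpha_w, s_s, s_w, noise):
    x_s = np.sqrt(alpha_s) * s_s   # strong-set transmit vector: power-scaled symbols
    x_w = np.sqrt(alpha_w) * s_w   # weak-set transmit vector
    return H_s @ x_s + H_w @ x_w + noise

rng = np.random.default_rng(0)
N = 2
H_s = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H_w = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
s = np.ones(N, dtype=complex)      # dummy unit symbols
y = received_signal(H_s, H_w, np.ones(N), np.ones(N), s, s, np.zeros(N, dtype=complex))
```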

Multiantenna Beamforming: Zero-Forcing Post-Coder
As in [12], the BS in an uplink (UL) beamforming NOMA system can utilize the CSI of the entire set of users. To eliminate intra-set interference, the zero-forcing (ZF) scheme is used to generate the post-coding matrices. Based on H_s and H_w, the ZF post-coding matrices W_s and W_w are defined, where (.)* is the complex conjugate of the matrix, and w_s,j and w_w,j are the 1 × N ZF post-coders of the j-th user in the strong and weak sets, respectively.
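Since the paper's exact ZF expression is omitted in the extracted text, the following sketch uses the standard ZF choice, the left pseudo-inverse, which satisfies W H = I and hence nulls intra-set interference; the function name is mine.

```python
import numpy as np

# Assumed ZF post-coder: rows of pinv(H) are the per-user 1 x N post-coders w_j.
def zf_postcoder(H):
    return np.linalg.pinv(H)

rng = np.random.default_rng(1)
N = 3
H_s = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
W_s = zf_postcoder(H_s)
# Intra-set interference is removed: W_s @ H_s is (numerically) the identity.
```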

Received SINR and Information (Data) Rate of Users
As mentioned above, the post-coded signal for the strong set can be obtained using W_s: the received vector z_s = [z_s,1 · · · z_s,p · · · z_s,N]^tr is obtained as z_s = W_s y. In the received signal of strong-set user (s, p), the term ∑_{q=1}^{N} w_s,p h_w,q √α_w,q s_w,q represents the interference coming from the weak users. The received SINR of the strong user (s, p) is denoted SINR_s,p, and the information rate of strong-set user (s, p) is given by

rate_s,p = BW × log2(1 + A_s,p/(C_p + 1))    (9)

where BW is the signal bandwidth, A_s,p = η · P_s,p · ||h_s,p||² / P_awgn, C_p = ∑_{q=1}^{N} P_w,q · |w_s,p h_w,q|² / P_awgn, and η represents the gap to the theoretical capacity [13,15].
The transmit powers of strong user (s, p) and weak user (w, q) are denoted P_s,p and P_w,q, respectively, and the maximum transmit power per user is P_max. On the other side, the weak-set signal can be decoded by perfect SIC after the interference from the strong set is removed. Then z_w = [z_w,1 · · · z_w,q · · · z_w,N]^tr is obtained after the W_w ZF post-coder, and the received SINR of the weak user is

SINR_w,q = ||h_w,q||² α_w,q P_w,q / P_awgn    (12)

Then the information rate of the weak user is

rate_w,q = BW × log2(1 + A_w,q)    (13)

where A_w,q = η · P_w,q · ||h_w,q||² / P_awgn.
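The two rates can be sketched directly from the stated definitions of A_s,p, C_p, and A_w,q. Since Equations (9) and (13) are missing from the extracted text, the closed forms below (strong user with residual weak-set interference, weak user interference-free after perfect SIC) are my reconstruction.

```python
import numpy as np

# Hedged sketch of the strong/weak information rates, reconstructed from the
# definitions A_{s,p} = eta*P_{s,p}*||h_{s,p}||^2/P_awgn and
# C_p = sum_q P_{w,q}*|w_{s,p} h_{w,q}|^2/P_awgn.
def rate_strong(BW, A_sp, C_p):
    # Strong user: weak-set signals remain as interference, hence A/(C + 1).
    return BW * np.log2(1.0 + A_sp / (C_p + 1.0))

def rate_weak(BW, A_wq):
    # Weak user: decoded after perfect SIC, hence no interference term.
    return BW * np.log2(1.0 + A_wq)
```

With BW = 1, A = 3, and no interference, the strong-user rate is log2(4) = 2; interference (C_p > 0) can only reduce it.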

Video MSE Distortion Model and PSNR
According to the video distortion model [15], the video MSE of each group of pictures (GOP) in the NOMA system can be approximated as the following equation [31]:

MSE_k = a_k + b_k/(rate_NOMA − c_k)    (14)

The rate_NOMA is either rate_s,p in (9) for strong users or rate_w,q in (13) for weak users. The parameters a_k, b_k, and c_k are fitted before transmission and depend on the video content [15,16,25].
The video MSE of the OMA system is

MSE_OMA,k = a_k + b_k/(rate_OMA − c_k)    (15)

where rate_OMA is the information rate of the OMA system, defined analogously to (13) with parameter A_OMA. The reason that A_OMA approximates A_w,q, the parameter of the weak user in the NOMA system, is that the users of the OMA system do not interfere with each other, so A_OMA = η · P_OMA,k · ||h_OMA,k||² / P_awgn, where P_OMA,k is the transmit power of the OMA user and h_OMA,k is the channel vector of the OMA user. The peak signal-to-noise ratio (PSNR) is defined as [31]

PSNR = 10 × log10(255 × 255 / MSE)    (17)

The theoretical PSNR is obtained by using the MSE in (14) and (15). The simulated PSNR is obtained by using the MSE from the simulation, which accounts for channel-induced errors, imperfect source encoding rate control, etc. [15].
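The distortion model and the PSNR conversion in (17) can be sketched as follows. The PSNR formula is taken directly from the text; the rate-to-MSE form MSE = a + b/(rate − c) is my reconstruction of (14) from the fitted parameters a_k, b_k, c_k.

```python
import math

# Assumed rate-distortion form of Equation (14): MSE = a + b/(rate - c).
def video_mse(rate, a_k, b_k, c_k):
    return a_k + b_k / (rate - c_k)

# PSNR from MSE as in Equation (17): PSNR = 10 * log10(255*255 / MSE).
def psnr_from_mse(mse):
    return 10.0 * math.log10(255.0 * 255.0 / mse)
```

For example, an MSE equal to 255 × 255 maps to 0 dB, and an MSE of 1 maps to about 48.1 dB.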

Proposed Deep Learning Approach for User Selection (Scheme DNN)
The optimal user grouping attempts all the combinations/permutations of K candidate users (exhaustive search) and chooses the best performing grouping, where K is the number of candidate users the BS can choose from. Its complexity is high, so the user set selections in previous studies such as [12,13] are heuristic suboptimal solutions. The proposed deep learning approach for user selection uses the optimal user grouping results as the training data and achieves near-optimal performance at lower online computational complexity. The normalized channel gains [32] (physical layer) and RD-function parameters (application layer) of all users are adopted as the input to the DNN, and the output is the user grouping result, represented as a 2 × K matrix. The first 1 × K row indicates the N users selected in the strong set (N ones, the others zeros). The second 1 × K row indicates the N users selected in the weak set (N ones, the others zeros). Therefore, it is possible for the raw DNN output to place a user in both the strong and weak sets. Furthermore, since the DNN data need to be one-dimensional, the output data are reshaped to a 1 × 2K vector.
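The label encoding described above can be illustrated as follows (K = 4 candidate users and N = 2 users per set are example values chosen for illustration):

```python
import numpy as np

# Illustrative DNN label: first K entries mark the strong set, next K the weak set.
K, N = 4, 2
strong = np.array([1, 0, 1, 0])          # users 0 and 2 in the strong set (N ones)
weak = np.array([0, 1, 0, 1])            # users 1 and 3 in the weak set (N ones)
label = np.concatenate([strong, weak])   # reshaped 1 x 2K training label
```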

Deep Neural Network Structure
The training data, in the form of (DNN input, DNN desired output) pairs, are generated as follows. The channel coefficients are the DNN input of the training data and are randomly generated based on the independent and identically distributed (i.i.d.) probabilistic model. The 1 × 2K resource allocation matrix is the desired output of the training data, obtained from the optimal or suboptimal resource allocation algorithm, such as Scheme Optimal or Scheme A [13] in the next section. The testing data are generated in a similar way: the channel coefficients are generated based on the i.i.d. probabilistic model and differ from those in the training data.

DNN System Model
ω is used to represent the parameters of the DNN, ω = {ω_1, ω_2, . . . , ω_L}. The set of the parameters of layer l is ω_l = {W_l, b_l}, where W_l is the weight of the neurons and b_l is the bias of the neurons at the l-th layer. The l-th layer can be denoted as follows:

y_l = σ(W_l y_{l−1} + b_l)

where σ( ) is an activation function. A rectified linear unit (ReLU) function, σ_ReLU(x) = max(0, x), is used as the activation function in each layer except the last. The ReLU function keeps the gradient at 1, so the size of the gradients does not shrink exponentially when back-propagating through many layers [33]. A softmax activation function in the output layer was also attempted: all user combinations in the NOMA system were numbered, the pre-training data were transformed into these numbers as DNN training data, and the numbers were converted back to the original data type after training. However, the accuracy of this method was only 30%. Finally, the original training data are used and the output activation function is changed to the sigmoid, σ_sigmoid(x) = 1/(1 + e^−x), which maps the output to the interval [0, 1].
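The forward pass described above (ReLU hidden layers, sigmoid output) can be sketched in numpy. Layer sizes here are illustrative, not the paper's 1024/1024/1024/2048 configuration.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, layers):
    # layers is a list of (W_l, b_l); each layer computes sigma(W_l x + b_l),
    # with ReLU everywhere except the sigmoid output layer.
    for W, b in layers[:-1]:
        x = relu(W @ x + b)
    W, b = layers[-1]
    return sigmoid(W @ x + b)

rng = np.random.default_rng(2)
layers = [(rng.standard_normal((8, 6)), np.zeros(8)),
          (rng.standard_normal((4, 8)), np.zeros(4))]
out = forward(rng.standard_normal(6), layers)   # every output lies in (0, 1)
```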
Binary cross-entropy (BCE) is used as the cost function since this is a classification problem:

Loss_BCE = −(1/2K) ∑_{i=1}^{2K} [Y(i) log(Y_L(i)) + (1 − Y(i)) log(1 − Y_L(i))]

where Y(i) is the labeled (desired) DNN output and Y_L(i) is the DNN output during the training stage.

Proposed Modified Cost Function for Constrained Optimization
The proposed modified loss function is as follows:

Loss = Loss_BCE + Loss_constraint    (20)

where Loss_constraint represents the proposed modification to meet the constraint of the resource allocation. In the NOMA system, a user cannot be selected in both the strong set and the weak set. To avoid post-processing of the DNN output and the resulting additional online computational complexity, e.g., [25], the following modification of the cost function is proposed: if the strong and weak sets have user(s) in common, the value of Loss_constraint is 0.5; otherwise it is 0. To minimize the loss function during the training stage, the DNN learns to avoid the situation where the strong and weak sets have user(s) in common. Thus, the post-processing that deals with violations of the constraint that the strong and weak sets cannot have user(s) in common can be saved.
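The modified loss can be sketched as binary cross-entropy plus the 0.5 penalty whenever the hard-decided strong and weak sets share a user. The function names and the 0.5 decision threshold for the hard decision are my assumptions for illustration.

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-12):
    # Mean binary cross-entropy over the 2K outputs.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

def modified_loss(y_true, y_pred, K):
    strong = y_pred[:K] > 0.5   # first K outputs: strong-set selection
    weak = y_pred[K:] > 0.5     # last K outputs: weak-set selection
    penalty = 0.5 if np.any(strong & weak) else 0.0   # Loss_constraint
    return bce(y_true, y_pred) + penalty

y_true = np.array([1.0, 0.0, 0.0, 1.0])    # K = 2: user 0 strong, user 1 weak
pred_ok = np.array([0.9, 0.1, 0.1, 0.9])   # feasible prediction, no penalty
pred_bad = np.array([0.9, 0.1, 0.9, 0.9])  # user 0 in both sets -> +0.5 penalty
loss_ok = modified_loss(y_true, pred_ok, K=2)
loss_bad = modified_loss(y_true, pred_bad, K=2)
```

Minimizing this loss during training pushes the DNN away from constraint-violating outputs without any post-processing at the testing stage.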

Statistical Analysis
For tasks in communications and networks, the training data can be collected or generated [17], so there is no problem of limited training samples. The training data do not have the data imbalance problem described in [34] since the channel coefficients of users at different slots are randomly generated based on the i.i.d. probabilistic model.

Simulation Results
The video content in the simulation is as follows. The video is a travel documentary of CIF size and of length 50 s at 30 fps [13,15]. Each user has a different starting time in the same cyclic video. In this way, application-layer diversity among users is created and the complexity over time is the same for all users. The size of a GOP (time slot) is 15 frames. The resource allocation is conducted once per GOP. The source encoding rate control is H.264/AVC baseline profile, at 80~600 kbps for each GOP.
The signal bandwidth is BW = 50 kHz, and the adaptive modulation method is M-QAM with M = 4~256. The users are randomly located, and their channel gains are also random.
The channel gain is modeled as K_0 (d_k/d_0)^−γ α_rayleigh, where α_rayleigh is Rayleigh fading and γ is the path loss exponent. K_0 is −24 dB, d_k is uniformly distributed in [40 m, 100 m], and d_0 is 40 m. The maximum transmit power per user P_max is 24 dBm. The time-varying channel response is assumed to be block fading; that is, the channel coefficients are constant during a GOP/time slot and independently and identically distributed (i.i.d.) across different GOPs/time slots [13,15,16]. Additionally, Table 2 shows the parameters of the DNN. The activation function for the hidden layers is ReLU, since it keeps the gradient at 1 so the gradients do not shrink exponentially when back-propagating through many layers [33]. The activation function for the output layer is sigmoid, since the user selection in NOMA-MIMO systems is a multi-label classification problem. The number of epochs is selected based on the convergence of the training/validation loss curves in Figures 4 and 5.
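The channel-gain model above can be sketched as a distance-dependent path loss times a Rayleigh-fading power. K_0 = −24 dB and d_0 = 40 m follow the text; the path loss exponent γ = 3 is an assumed example value, since the text does not state it.

```python
import numpy as np

# Path loss K0 * (d_k/d0)^(-gamma); gamma = 3 is an assumed example value.
def path_loss(d_k, d0=40.0, K0_db=-24.0, gamma=3.0):
    return 10.0 ** (K0_db / 10.0) * (d_k / d0) ** (-gamma)

rng = np.random.default_rng(3)
# Rayleigh fading: |h|^2 with h ~ CN(0, 1).
h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2.0)
gain = path_loss(rng.uniform(40.0, 100.0)) * abs(h) ** 2
```

At the reference distance d_0 the path loss equals K_0, and doubling the distance divides the gain by 2^γ.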
The following schemes are considered for comparison.
Scheme Optimal: the optimal scheme (the exhaustive search bound).
Scheme DNN (proposed): the proposed DNN, learning from Scheme Optimal (optimal training data).
Scheme A: [13], suboptimal scheme, state of the art.
Scheme A': DNN, but learning from Scheme A (suboptimal training data).
Scheme B: [12], suboptimal scheme, state of the art.
Scheme C: the OMA system.
The model validation and the credibility of the simulation results for the proposed Scheme DNN are justified as follows. The training loss and validation loss versus epochs are shown in Figures 4 and 5, respectively. The following is observed: (1) The loss function converges after 200 epochs, so epochs = 200 in Table 2.
(2) The validation loss converges to almost zero in the same way as the training loss, and no overfitting occurs. The DNN model can learn the correct answers from unseen data (the validation data are different from the training data). This validates the DNN system model with the parameters in Table 2.
(3) The initial loss is greater than 1 (the maximum of the binary cross-entropy). Also, there are jumps of 0.5 (constraint violations) before convergence (about epoch 100) in the training and validation loss curves. These validate the Loss_constraint in (20) in the DNN system model.
The comparison metrics are the theoretical and simulated PSNR. The theoretical PSNR is obtained by using the MSE in (14) and (15). The simulated PSNR is obtained by using the MSE from the simulation, which accounts for channel-induced errors, imperfect source encoding rate control, etc. [15]. Figure 6 shows the average theoretical PSNR of all schemes. As expected, Scheme Optimal performs best. The proposed Scheme DNN, which learns from the optimal solution, outperforms the previous suboptimal Schemes A and B by 0.7 dB and 2.3 dB, respectively, and is only 0.4 dB away from Scheme Optimal. Scheme C, the OMA scheme, has the lowest theoretical PSNR (29.0 dB) among all schemes.
In Figure 7, the simulated PSNRs of all schemes are lower than the corresponding theoretical PSNRs. This is due to communication channel errors, imperfect rate control at the source encoder, etc. [15,16]. The complexity of Scheme Optimal is too high, so its simulated PSNR cannot be obtained. It can be seen that the proposed Scheme DNN outperforms Schemes A and B by 0.8 dB and 2.0 dB, respectively. Scheme DNN and Scheme A' are compared in Figures 6 and 7. Scheme DNN uses the DNN to learn from the optimal scheme (Scheme Optimal), whereas Scheme A' uses the DNN to learn from the suboptimal scheme. The two schemes use the same DNN structure but different training data (from Scheme Optimal or Scheme A). Scheme DNN outperforms Scheme A' by 1.6 dB and 1.8 dB in theoretical and simulated PSNR, respectively.
The DNN architecture, a computational model composed of more than one hidden layer, learns to represent data at multiple abstraction levels, in a way similar to human brains [35]. A more complicated problem needs more hidden layers in a neural network. For an ordinary neural network (number of hidden layers = 1), the theoretical PSNR is 30.2 dB, significantly worse than the deep neural network (Scheme DNN in Figure 6, number of hidden layers = 4). Thus, a DNN is more useful than an ordinary neural network for the complicated cross-layer user selection in uplink NOMA-MIMO video transmission.

Discussions
The proposed DNN model details, and why it is a good solution, are as follows. The numbers of neurons at the 4 hidden layers are 1024/1024/1024/2048. The input is the normalized channel gains [32] (physical layer) and RD-function parameters (application layer) of all users. DNN model parameters such as the number of hidden layers and the number of neurons at each hidden layer are determined by exhaustive search [18]. The DNN model quality is quantitatively indicated by the training loss and validation loss [36][37][38]. In Figure 4, the training loss converges to almost zero after 200 epochs, so there is no underfitting and the DNN model is not too simple. In Figure 5, the validation loss also converges to almost zero in the same way as the training loss, so there is no overfitting and the DNN model is not too complex. Therefore, the DNN model is identified as a good one.
Next, the performance is discussed. The training and validation losses in Figures 4 and 5 converge before 200 epochs with no underfitting/overfitting, so the parameter setting in Table 2, including the DNN size, epochs, training data size, etc., is appropriate. The jumps of 0.5 and the greater-than-1 initial loss reflect the modified loss function in (20) with Loss_constraint = 0.5. The proposed Scheme DNN outperforms the prior suboptimal Scheme A [13] by 0.7 dB and is only 0.4 dB away from Scheme Optimal in theoretical PSNR in Figure 6, since it learns from Scheme Optimal. For comparison, Scheme A' learns from the suboptimal Scheme A and slightly underperforms Scheme A. The simulated PSNR is obtained by using the MSE from the simulation, which accounts for channel-induced errors, imperfect source encoding rate control, etc. The proposed Scheme DNN outperforms the prior suboptimal Scheme A by 0.8 dB in the more realistic simulated PSNR in Figure 7. Again, the proposed Scheme DNN learns from the optimal scheme, so it can surpass the suboptimal Scheme A.
Next, the computational complexity is discussed. First, note that the training stage is executed beforehand (offline), so it is not an obstacle for the real-time (online) operation of the deep learning-based scheme [19]. Deep learning-based resource allocation decisions can be obtained with much less online computation than non-deep-learning-based resource allocation schemes [28]. Thus, as in [18][19][20],[28], the training time is excluded from the computational complexity comparison; only the online (testing stage) computational complexity is counted, since the training procedure is conducted offline. For K = 12 and N = 2, the execution time of Scheme Optimal for 3000 testing data is over 15 min. The execution time (testing stage only, not including the training stage) of Scheme DNN is 14.86 s for the same 3000 testing data. The schemes are executed on a desktop computer with an Intel Core i7-8700 CPU and an NVIDIA 1080 Ti GPU. For each testing datum (resource allocation in one GOP), the proposed Scheme DNN requires only about 5 ms, while Scheme Optimal requires about 300 ms.
Lastly, a comparison among different video samples is needed to evaluate the overall adopted methodology (such as the modified loss function); it allows evaluating the scalability of the solution to other cases and thus the goodness of the DNN model. We simulated the PSNR for other video sequences in CIF resolution at 30 fps from [39]. Although the absolute values of the simulated PSNR differ across video samples, the relative performance gains among the schemes are similar.

Conclusions
A DNN structure with a modified loss function to learn the optimal user selection scheme is proposed. The loss function modification skips the post-processing of the DNN output (and the corresponding complexity and delay) during the testing stage. The numerical results show that the proposed DNN-based approach, learning from the optimal user selection (exhaustive search), outperforms the state-of-the-art schemes [13] and [12] by 0.7 dB and 2.3 dB in theoretical PSNR, respectively, and is only 0.4 dB less than the optimal solution. The proposed Scheme DNN, using the results of the optimal user selection as training data, is 1.6 dB and 1.8 dB higher in theoretical and simulated PSNR, respectively, than Scheme A', which uses the results of the suboptimal user selection as training data. The proposed Scheme DNN has 60 times lower computational complexity during the testing stage than the optimal scheme (Scheme Optimal), since each layer of the DNN is just a linear combination followed by a nonlinear activation function, and it may benefit low-latency scenarios in next generation communication systems. Previously, deep learning-based resource allocation schemes all learned from suboptimal schemes, so they could not outperform those suboptimal schemes. In this paper, the proposed deep learning-based scheme learns from the optimal scheme and offers near-optimal video quality at much lower computational complexity. It may be beneficial for next generation multimedia communications to increase the quality of user experience.