A Machine Learning Approach to Achieving Energy Efficiency in Relay-Assisted LTE-A Downlink System

In recent years, Energy Efficiency (EE) has become a critical design metric for cellular systems. Achieving EE also requires a fine balance between throughput and fairness. To this end, this paper presents several resource block (RB) allocation schemes for relay-assisted Long Term Evolution-Advanced (LTE-A) networks. For a single relay, the Maximum Throughput (MT) scheme and SAMM (abbreviated from the authors' names), a scheme that alternates between MT and proportional fairness (PF), are presented, each driven by either equal power allocation or the Bisection-based Power Allocation (BOPA) algorithm. In the case of multiple relays, the dependency of RB and power allocation on relay deployment and users' association is first addressed through a k-means clustering approach. Secondly, to reduce the computational cost of RB and power allocation, a two-step neural network (NN) process (SAMM NN) is presented that uses SAMM-based unsupervised learning for RB allocation and BOPA-based supervised learning for power allocation. The results for all the schemes are compared in terms of EE and user throughput. For a single relay, SAMM BOPA offers the best EE, whereas SAMM equal power provides the best fairness. In the case of multiple relays, the results indicate that SAMM NN achieves better EE than SAMM equal power and BOPA, and better throughput fairness than MT equal power and MT BOPA.


Introduction
Green Radio communication has received a lot of attention in the past few years, with the aim of decreasing the carbon footprint of wireless networks. It has been estimated that nearly 70% of the energy used by cellular operators is consumed by the radio part [1] and that around 9% of global CO2 emissions come from communication systems [2]. In addition, one of the main concerns is the User Equipment (UE) battery, which has not progressed on par with the Radio Access Technology (RAT). This phenomenon is highly visible for the cell edge users, who despite spending higher energy (due to high path loss, shadow fading and adjacent cell interference) are unable to achieve fair share of the radio with the low complexity energy-efficient resource block and power allocation (LERPA) algorithm 3 and 4 of [11].
Artificial intelligence techniques can be used in highly dynamic Next-Generation networks with stringent constraints. Since machine learning is one of the most promising artificial intelligence techniques, it can be employed directly or indirectly to achieve the goals of 5G in cognitive radios, massive multiple-input multiple-output (MIMO), hybrid beamforming, femto/small cells, smart grid, wireless power transfer, device-to-device communications, non-orthogonal multiple access (NOMA), etc. [12]. The paper [12] gives an overview of the applications of machine learning in Next-Generation wireless networks. Specifically, supervised learning techniques are suitable for massive MIMO channel estimation and spectrum sensing, unsupervised learning could be helpful in user grouping and clustering, and reinforcement learning can be applied to resource allocation problems.
A detailed review of existing techniques and methods has been provided in [13]. For example, in [14], cooperative Q-learning was applied as an efficient approach to solve the resource allocation problem in a multi-agent network. The quality of service (QoS) of each user and fairness in the network are taken into account, and more than a four-fold increase in the number of supported small cells is reported. The authors in [15] proposed a machine learning framework for resource allocation that determines optimal or near-optimal solutions by learning from the most similar historical scenario.
In [16], the authors proposed an approximate solution to a wireless network capacity problem using flow allocation, link scheduling, and power control. A Support Vector Machine (SVM) was used to classify each link as either assigned maximal transmit power or turned off, whereas deep belief networks (DBNs) compute an approximation of the optimal power allocation. Both learning approaches were trained on optimal solutions computed offline. A novel resource allocation method using deep learning to squeeze the benefits of resource utilization was developed in [17]. It was reported that when the channel environment is changing fast, the deep learning method outperforms traditional resource optimization methods. The resource allocation is optimized by a convolutional neural network using channel information. A similar problem has been explored in [18], which uses Upper Confidence Bound learning for Greedy Maximal Matching (GMM) when the channel statistics are unknown. Since the subchannel and power allocation problem is a non-convex combinatorial problem, its optimal solution requires an exhaustive search over all possible combinations of subchannels and power levels. In order to train a deep neural network (DNN) for an optimal solution, Ref. [19] utilizes a genetic algorithm to obtain the training data for the DNN. It shows that the prediction accuracy increases with the size of the dataset and the number of hidden layers. A four-step reinforcement learning based intercell interference coordination (ICIC) scheme is presented in [20]. User selection, resource allocation, power allocation, and retransmit packet identification are handled by reinforcement learning to reduce the intercell interference.
However, to the best of our knowledge, no available literature discusses LTE-A with L3 relays from the perspective of spectral efficiency (SE) and EE. In this work,

• We present an energy-efficient algorithm based on SAMM and BOPA for an LTE-A system with an L3 relay. Performance in terms of throughput, fairness, power consumption, SE and EE is compared between the two best performing schemes, i.e., MT and SAMM, considering equal power and BOPA.

• Considering practical deployment, where more than one relay may support the cell edge users, we devise a clustering strategy to obtain near-optimal placement of L3 relays and users' association.

• In a multiple relay scenario, to optimize EE and reduce the computational complexity of running the algorithm every TTI, we present a two-step machine learning process that uses both the SAMM and BOPA approaches for resource and power allocation of the cell users. The proposed approach is compared to MT equal power, MT BOPA and SAMM equal power in terms of users' throughput and EE.
A complete list of notations used in this paper is given in Table 1.

Table 1. Notations used in this paper.
Symbol              Description
$\alpha_k$          proportional rate constraint for user k
$\lambda_k$         rate parameter for user k
$D_k$               allocated RB set for user k
$R_k$               rate matrix for user k
$\hat{p}_{k,n}$     optimal power allocation
$P_T$               total transmit power
$\theta$            Lagrangian multiplier

The rest of the paper is organized as follows: the system model is described in Section 2; the algorithms and performance of MT, SAMM and BOPA with a single relay network are given in Section 3; multiple relay users' association and deployment with machine learning based power and RB allocation for SAMM are presented in Section 4; complexity analysis is given in Section 5, followed by the conclusions in Section 6.

System Model
We consider a two-tier LTE-A system with a BS supported by L3 relays, as shown in Figure 1. The relays are assumed to be in-band type 1b [3] and full duplex, placed about midway between the BS and the most distant user. A total of K users and N RBs are considered, with users placed at uniform distances from the BS. The total powers of the BS and RN are denoted by $P^{BS}_{total}$ and $P^{RN}_{total}$, respectively. The LTE-A system uses OFDMA transmission in the downlink. Let the system bandwidth be B with N RBs; then $W = B/N$ is the bandwidth of one RB. We denote by $g^{Direct\_link}_{k,n}$ and $g^{Relay\_link}_{k,n}$ the channel gains of user k, where $k \in \mathcal{K} = \{1, \ldots, K\}$, on RB n, where $n \in \mathcal{N} = \{1, \ldots, N\}$, for the BS and RN, respectively. Practically, the channel gain depends upon various factors, including thermal noise at the receiver, receiver noise figure, antenna gains, distance between transmitter and receiver, path loss exponent, log-normal shadowing and fading. Therefore, for all the links, we can write

$$g_{k,n} = -\kappa - \phi \, 10 \log_{10} d_k - \zeta_{k,n} + 10 \log_{10} h_{k,n}$$

In the above equation, the constant term $\kappa$ (83.46 dB) depends upon the thermal noise at the receiver, the receiver noise figure, and the antenna gains; $\phi$ is the path loss exponent; $d_k$ is the distance in km from UE k to the BS/relay; $\zeta_{k,n}$ (10.5 dB) is the shadowing parameter, modeled by a normally distributed random variable with standard deviation 8 dB; and $h_{k,n}$ corresponds to the Rayleigh fading channel coefficient of user k in subchannel n [21]. The throughput of user k is given by Equation (2), with separate expressions for direct link users and relay link users, where the factor 1/2 in the access link accounts for the two time-slot transmission (BS-RN and RN-UE), $\mu_{k,n}$ is a binary variable such that $\mu_{k,n} = 1$ when RB n is allocated to user k, and $SNR_{k,n}$ is the maximum average signal-to-noise ratio for user k between the direct and relay links.
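As an illustration of the channel gain model above, the following minimal sketch draws one gain sample in dB. It rests on several assumptions that the text does not state: the shadowing term is taken as zero-mean with the quoted 8 dB standard deviation, $h_{k,n}$ is treated as a unit-mean exponentially distributed fading power (Rayleigh amplitude), and the path loss exponent 3.5 and the function names are hypothetical.

```python
import math
import random

KAPPA_DB = 83.46       # constant term quoted in the text (dB)
SHADOW_STD_DB = 8.0    # shadowing standard deviation quoted in the text (dB)

def mean_path_gain_db(d_km, phi):
    """Deterministic part of the gain: -kappa - phi * 10 * log10(d_k)."""
    return -KAPPA_DB - phi * 10.0 * math.log10(d_km)

def channel_gain_db(d_km, phi, rng):
    """One random sample of g_{k,n} in dB for a link of length d_km (km)."""
    # Assumption: zero-mean Gaussian shadowing in dB.
    shadowing_db = rng.gauss(0.0, SHADOW_STD_DB)
    # Assumption: unit-mean exponential fading power; the floor avoids log10(0).
    fading_power = max(rng.expovariate(1.0), 1e-12)
    return (mean_path_gain_db(d_km, phi) - shadowing_db
            + 10.0 * math.log10(fading_power))

rng = random.Random(1)
# phi = 3.5 is a hypothetical path loss exponent; 0.55 km is the RN distance.
sample = channel_gain_db(0.55, 3.5, rng)
```

Only the deterministic part is reproducible; the random terms change from sample to sample, which is why results in the paper are averaged over many TTIs.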
Let $SNR^{Direct\_link}_{k,n}$ be the signal-to-noise ratio for user k via the direct link, and $SNR^{Relay\_link}_{k,n}$ be the signal-to-noise ratio for user k via the relay link; then $SNR_{k,n}$ is given as

$$SNR_{k,n} = \max\left( SNR^{Direct\_link}_{k,n}, \; SNR^{Relay\_link}_{k,n} \right)$$

The energy efficiency (EE) is expressed in bits/s/W, and the EE optimization problem for the above scenario is formulated subject to the proportional rate constraint $\alpha_k$ [22]. We assume that the channel state information (CSI) of all the users is known to the BS. It is also assumed that the RB allocation decision and assignment are completed in less than the channel coherence time, so that the CSI can be used. This further constrains the complexity of the RB allocation algorithm. The two-hop transmission to the RN users is carried out in two TTIs. In the first TTI, the BS sends data only to the RN users that are in close proximity to the RN or have better RN-UE channel conditions than the direct BS-UE link. In the second TTI, the RN-UE data are sent. The BS chooses the path to the user (direct or via the RN) with the best channel coefficient in each TTI. The centralized scheduling minimizes the possibility of interference for in-band RNs. Frequency division duplexing ensures that the RN can handle backhaul data simultaneously with access link data, so that from the second TTI onwards the backhaul BS-RN transmission is carried out simultaneously with the access link RN-UE transmission. The LTE-A downlink is an OFDM-based system that supports M-ary quadrature amplitude modulation (MQAM). We can use Equation (2) to calculate the throughput of user k on RB n for both the direct and relay-link paths. The two paths provide channel diversity that increases the user-level and system-level throughput. We use the MT and SAMM criteria for RB allocation, with either equal power allocation to all RBs or BOPA, as explained below.
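The link selection and the factor 1/2 for the two-hop path can be illustrated with a short sketch. It is not the paper's Equation (2): a Shannon-type rate $W \log_2(1 + SNR)$ is assumed here (the original expression may include an MQAM/BER-dependent term), EE is taken as the sum rate divided by the total power, and all function names and the 180 kHz RB bandwidth are illustrative.

```python
import math

def rb_rate(w_hz, snr_direct, snr_relay):
    """Rate on one RB: use the link with the larger SNR, and halve the
    rate on the relay path to account for the two time slots."""
    if snr_direct >= snr_relay:
        return w_hz * math.log2(1.0 + snr_direct)
    return 0.5 * w_hz * math.log2(1.0 + snr_relay)

def user_throughput(w_hz, allocation, snr_direct, snr_relay):
    """Sum the RB rates over the RBs with mu_{k,n} = 1 (allocation is 0/1)."""
    return sum(mu * rb_rate(w_hz, sd, sr)
               for mu, sd, sr in zip(allocation, snr_direct, snr_relay))

def energy_efficiency(throughputs_bps, total_power_w):
    """Assumed EE definition: sum throughput divided by total power (bits/s/W)."""
    return sum(throughputs_bps) / total_power_w

W = 180e3                                   # illustrative RB bandwidth in Hz
r_direct = rb_rate(W, 3.0, 1.0)             # direct link wins: W * log2(4)
r_relay = rb_rate(W, 1.0, 3.0)              # relay link wins: 0.5 * W * log2(4)
r_user = user_throughput(W, [1, 0], [3.0, 3.0], [1.0, 1.0])
ee = energy_efficiency([r_user], 2.0)
```

With linear SNR values of 3 and 1, the direct path yields 360 kbit/s on one RB, while the same SNR on the relay path yields half of that.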

Fairness-Aware Power and Resource Block Allocation with Single Relay LTE-A Network
There are several well-known resource allocation schemes for cellular systems, namely round robin (RR), maximum throughput (MT), maximization of the minimum throughput (max-min), and proportional fairness (PF). An improved hybrid MT and PF scheme, SAMM, is presented in [4]. We briefly summarize MT, PF, and SAMM, and then present our fairness-aware power and resource allocation algorithm.

Maximum Throughput
In the Maximum Throughput (MT) scheme, the aim is to maximize the sum throughput of the network. It assigns more RBs to the user with better channel conditions on the direct link or the two-hop link, thereby adding more throughput to the system; its drawback is that users with the worst channel conditions are essentially ignored. The maximum throughput criterion is expressed in terms of $D_k$, the RB allocation matrix, and $R_k$, the rate matrix.

Proportional Fairness
Proportional fairness based resource allocation schemes are widely used in practical wireless communication systems. In this scheme, the system allocates the resource to the user with the maximum PF metric. The PF metric is the ratio $R_k(t) / \bar{R}_k(t)$, where $R_k(t)$ is the throughput of user k at scheduling time t, and $\bar{R}_k(t)$ is the average user throughput (moving average) over a past window of length $T_w = 1/\alpha$ [23].

SAMM
In SAMM [4], PF and MT are run one after the other: in the first TTI, PF is run for K users, and in the second TTI, MT is run for K − 1 users, ignoring the user with the highest throughput in the previous TTI. This maximizes fairness and throughput alternately, one in each TTI.
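The alternation between PF and MT can be sketched as follows. This is a simplified illustration and not the implementation of [4]: it schedules a single RB per TTI, interprets "the user with the highest throughput in the previous TTI" as the user served in that TTI, and assumes an exponentially weighted moving average with window `t_w` for the PF metric. The function name and the parameter values are hypothetical, and at least two users are required.

```python
def samm_schedule(rate_per_tti, t_w=10.0):
    """Return the user served in each TTI: PF on even TTIs, MT on odd TTIs."""
    num_users = len(rate_per_tti[0])
    avg = [1e-6] * num_users      # small positive start avoids division by zero
    previous = None               # user served in the previous TTI
    picks = []
    for t, rates in enumerate(rate_per_tti):
        users = range(num_users)
        if t % 2 == 0:
            # PF step over all K users: largest instantaneous/average ratio.
            chosen = max(users, key=lambda k: rates[k] / avg[k])
        else:
            # MT step over K - 1 users: skip the user served in the last TTI.
            chosen = max((k for k in users if k != previous),
                         key=lambda k: rates[k])
        # Assumed moving-average update with window length t_w.
        for k in users:
            served = rates[k] if k == chosen else 0.0
            avg[k] = (1.0 - 1.0 / t_w) * avg[k] + served / t_w
        previous = chosen
        picks.append(chosen)
    return picks

# Two users; user 0 always has the better channel.
picks = samm_schedule([[10.0, 1.0]] * 4)
```

In this toy case a pure MT scheduler would always serve user 0, whereas the alternation gives user 1 every second TTI.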

BOPA Algorithm
The bisection-based optimal power allocation (BOPA) algorithm, Algorithm 1, allocates power to the RBs assigned to a particular user. Given the RB allocation from MT or SAMM and the throughput of each user under equal power allocation to all RBs, we can calculate the rate parameter $\lambda$ as

$$\lambda_k = \frac{R_k}{\alpha_k}$$

where $R_k$ is the rate of each user and $\alpha_k$ is the proportional rate constraint set for fairness [6]. The optimal power allocation $\hat{p}_{k,n}$ for a single user is a water-filling operation, Equation (13), in which $\theta_L$ is the Lagrangian multiplier whose value is chosen such that $R_k$ is satisfied. Hence, the user power can be expressed as $P_k(\lambda \alpha_k | D_k)$ and the total transmit power $P_T(\lambda)$ can be rewritten as

$$P_T(\lambda) = \sum_{k \in \mathcal{K}} P_k(\lambda \alpha_k \mid D_k)$$

EE can be given as the user rate divided by the power consumed to achieve that rate, and the total transmit power is also subject to an upper limit. According to [24], if the transmit power $P_T(\lambda)$ is strictly convex in the rate, then $EE(\lambda)$ is quasi-concave; the proof of the global optimal solution is given in the appendix of [22]. Since power is a monotonically increasing function of the rate parameter $\lambda$, we apply the bisection method to the following equation to find the root:

$$f(\lambda) = P_T(\lambda) - \lambda \ln 2 \sum_{k \in \mathcal{K}} \alpha_k \min_{n \in D_k} \frac{1 + \hat{p}_{k,n} \, g_{k,n}}{g_{k,n}} \qquad (17)$$

The bisection method is simple and robust. Since the method brackets the root, it is guaranteed to converge. We apply BOPA to the RB allocation scheme SAMM, an alternating MT and PF scheme for the relay-assisted LTE-A, to obtain the optimal power allocation with the objective of maximizing the EE. In addition, we trained a neural network with the dataset generated by BOPA.

Algorithm 1 BOPA Algorithm
1: Require: prior RB allocation $D_k$, obtained through any algorithm.
2: Ensure: $\hat{p}_{k,n}$, the optimal power allocation matrix.
3: Compute all the $\lambda_k$, then calculate $\lambda_{max}$, which gives the maximum energy efficiency by substitution in Equation (6).
4: Using $\lambda_{max}$, set the user rate as $\alpha_k \lambda_{max}$, do water filling using Equation (13) and calculate $f(\lambda_{max})$ based on Equation (17).
5: If $f(\lambda_{max}) \geq 0$
6: Return $\hat{p}_{k,n}$
7: Else go to Step 9
8: End if
9: Set $\lambda_{high} = \lambda_{max}$, $\lambda_{low} = 0$, $\lambda_{current} = \lambda_{max}/2$
10: Repeat: set the user rate according to $\alpha_k \lambda_{current}$, do water filling using Equation (13) and calculate $f(\lambda_{current})$ based on Equation (17).
11: If $f(\lambda_{current}) > 0$
12: Set $\lambda_{low} = \lambda_{current}$
13: Else set $\lambda_{high} = \lambda_{current}$
14: End if
15: Set $\lambda_{current} = (\lambda_{high} + \lambda_{low})/2$
16: Until $\lambda_{high} - \lambda_{low}$ is sufficiently small
17: Return $\hat{p}_{k,n}$
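The bisection in Algorithm 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: rates are normalized to bits/s/Hz, water filling is solved numerically for the water level, and a hypothetical constant `p_circuit` is added to the transmit power so that the EE has an interior maximum (the source does not state how fixed power is handled). All function and parameter names are illustrative.

```python
import math

def water_fill(gains, rate):
    """Minimum-power allocation that reaches `rate` (bits/s/Hz) over the
    RBs with the given linear gains; found by bisection on the water level."""
    def achieved(level):
        # With p = level - 1/g on active RBs, log2(1 + p*g) = log2(level*g).
        return sum(math.log2(level * g) for g in gains if level * g > 1.0)
    lo, hi = 0.0, 1.0
    while achieved(hi) < rate:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if achieved(mid) < rate:
            lo = mid
        else:
            hi = mid
    return [max(0.0, hi - 1.0 / g) for g in gains]

def f(lam, alphas, gains_per_user, p_circuit):
    """f(lambda): total power minus lambda times its slope in lambda."""
    total, slope = p_circuit, 0.0
    for alpha, gains in zip(alphas, gains_per_user):
        powers = water_fill(gains, alpha * lam)
        total += sum(powers)
        slope += alpha * min((1.0 + p * g) / g for p, g in zip(powers, gains))
    return total - lam * math.log(2.0) * slope

def bopa(lam_max, alphas, gains_per_user, p_circuit, tol=1e-9):
    """Return the rate parameter selected by the bisection of Algorithm 1."""
    if f(lam_max, alphas, gains_per_user, p_circuit) >= 0.0:
        return lam_max
    lo, hi = 0.0, lam_max
    while hi - lo > tol:
        cur = 0.5 * (lo + hi)
        if f(cur, alphas, gains_per_user, p_circuit) > 0.0:
            lo = cur      # EE still increasing: move the lower bracket up
        else:
            hi = cur
    return 0.5 * (lo + hi)

# One user, one RB with unit gain, alpha = 1, unit circuit power.
lam_star = bopa(5.0, [1.0], [[1.0]], 1.0)
```

For this toy case the EE is $\lambda / 2^{\lambda}$, whose maximum lies at $\lambda = 1/\ln 2 \approx 1.44$, which is the value the bisection approaches. The final power allocation follows from calling `water_fill` with the rate $\alpha_k \lambda$ for each user.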

Performance Evaluation
A single cell is considered for generating the simulation results. The cell consists of a BS, an RN and UEs equipped with omnidirectional antennas. The throughput, energy efficiency and spectral efficiency are averaged over 1000 TTIs, with a TTI duration of 0.5 ms. The channel involves Rayleigh fading and distance-based path loss, as shown in Figure 1. The BS is located at the center of the cell coverage, and the most distant user is 1 km from the BS, with the RN in between at 0.55 km. The RNs are in-band full-duplex relays, and the bit error rate (BER) considered for MQAM modulation is $10^{-3}$. Table 2 summarizes all simulation parameters used to derive the results shown next.

Figure 2 shows the average throughput for MT and SAMM with equal power and BOPA-based power allocation. The SAMM curves remain above the MT curves for most of the users because of the inherent fairness, which ensures that all users get their due share of RBs. However, as is evident from Figure 3, the sum throughput of MT is higher than that of SAMM, because MT exploits the users with good channel conditions. This makes MT better than SAMM, as BOPA has a proportional rate constraint set for assigning user priorities.

Figure 4 shows the energy efficiency per user in bits/s/W. SAMM BOPA outperforms for the initial users and remains considerably lower for the rest of the users, whereas MT BOPA, compared to all other schemes, performs better for every user of the system with consistency, due to the convergence of BOPA to maximize throughput and minimize energy.

Figure 5 shows the fairness index, computed using Jain's fairness index [25] as

$$J = \frac{\left( \sum_{k=1}^{K} r_k \right)^2}{K \sum_{k=1}^{K} r_k^2}$$

where $r_k$ can be throughput or EE. Figure 5 shows that SAMM has better fairness in terms of throughput due to the PF component of its algorithm. Figure 6 depicts the system's EE with and without power allocation.
The BOPA-based power allocation algorithm allocates the available power to the RBs so as to maximize the EE; therefore, both MT-BOPA and SAMM-BOPA outperform their corresponding MT and SAMM schemes with equal power allocation.
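Jain's fairness index used for Figure 5 is straightforward to compute. The sketch below is a direct transcription of the standard definition; the function name and the sample values are illustrative only and are not results from the paper.

```python
def jain_index(values):
    """Jain's fairness index: (sum r_k)^2 / (K * sum r_k^2).
    Equals 1 for a perfectly even allocation and 1/K when one user gets all."""
    total = sum(values)
    return total * total / (len(values) * sum(v * v for v in values))

even = jain_index([1.0, 1.0, 1.0, 1.0])     # all users equal
skewed = jain_index([1.0, 0.0, 0.0, 0.0])   # one user takes everything
```

The same function applies whether $r_k$ is the per-user throughput or the per-user EE.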

Fairness-Aware Machine Learning Based Power and RB Allocation with Multiple Relays
In practical scenarios, multiple relays are deployed to serve the cell-edge users, as shown in Figure 7. Multiple relay deployment causes inter-relay interference. This interference can be minimized by careful deployment of the relays, transmit power control, and the scheduling of time/frequency resources. Although L3 relays incur more processing delay than L1 and L2 relays, they provide robust transmission in the presence of interference [26]. Assume there are Q relays in a cell, such that relay $q \in \mathcal{Q} = \{1, \ldots, Q\}$. The signal-to-interference-and-noise ratio (SINR) at UE k in the direct link depends on $p^{q}_{k',n}$, the transmit power of relay q assigned to its associated user $k'$, and on $g^{q}_{k,n}$, the channel gain between relay q and UE k. Similarly, the SINR at UE k in the relay q link is given as

$$SINR^{q}_{k,n} = \frac{p^{q}_{k,n} \, g^{q}_{k,n}}{\sum_{q' \in \mathcal{Q} - \{q\}} p^{q'}_{k',n} \, g^{q'}_{k,n} + p^{BS}_{k,n} \, g^{Direct\_link}_{k,n}}$$

As seen from the above equation, interference and fairness cause a significant increase in the computational cost when deploying multiple relays. Therefore, we present a machine learning based approach that utilizes relay deployment and users' association data to develop an RB allocation and power allocation strategy that maximizes the sum EE. Once trained, the proposed approach can save the cost of scheduling in every TTI. As shown in Figure 8, the machine learning model takes as inputs the number of relays, the relays' coordinates, CSI, SNR, and total transmit power, and produces as outputs the optimal relays' coordinates with associated users, the set of RBs assigned to each user k, and the optimal power allocation ($p^{*}_{k,n}$) to each user k in RB n. Based on the single relay performance, the RB allocation block is trained using SAMM and the power allocation block is trained using BOPA.
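To make the relay-link SINR expression concrete, the following sketch evaluates it for one user on one RB. It follows the ratio as written in the text (other relays and the BS appear in the denominator, with no separate noise term), uses linear-scale powers and gains, and the function and argument names are illustrative.

```python
def relay_link_sinr(q, relay_power, relay_gain, bs_power, direct_gain):
    """SINR at a user served by relay q on one RB.

    relay_power[i]: power that relay i transmits on this RB (linear scale).
    relay_gain[i]:  gain from relay i to the user of interest (linear scale).
    The denominator must be positive, i.e. some interference must be present.
    """
    signal = relay_power[q] * relay_gain[q]
    interference = sum(p * g
                       for i, (p, g) in enumerate(zip(relay_power, relay_gain))
                       if i != q)
    return signal / (interference + bs_power * direct_gain)

# Two relays: relay 0 serves the user, relay 1 and the BS interfere.
sinr = relay_link_sinr(0, [2.0, 1.0], [3.0, 0.5], 1.0, 0.5)
```

Every additional relay adds one term to the denominator for every user and RB, which illustrates why the per-TTI scheduling cost grows with the number of relays.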
Since the relay deployment can significantly alter the RB and power allocation, a clustering approach is presented that determines relay positioning and corresponding users' association based on a pre-defined metric.

Relays Deployment and Users Association
In this section, we present an autonomous unsupervised machine learning scheme that associates users with optimally deployed relay nodes in the cell-edge area. Machine learning algorithms can broadly be divided into two main categories, namely supervised and unsupervised learning algorithms. The former class of algorithms learns by training on labeled input examples, called the training dataset, $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), (x^{(3)}, y^{(3)}), \ldots, (x^{(m)}, y^{(m)})\}$, where the i-th example $(x^{(i)}, y^{(i)})$ consists of the i-th instance of the feature vector $x^{(i)}$ and the corresponding label $y^{(i)}$. Given a labeled training dataset, these algorithms try to find the decision boundary that separates the positive and negative labeled examples by fitting a hypothesis to the input dataset. Unsupervised machine learning algorithms, on the other hand, are given an unlabeled input dataset. These algorithms are used for extracting information or features from the dataset. These features might be related, but not confined, to the underlying structures or patterns in the input data, relationships between data items, grouping/clustering of data items, etc. The discovered features are meant to provide deeper insight into the input dataset that can subsequently be exploited for achieving specific goals. Clustering algorithms form an important part of unsupervised learning, where the input examples are grouped into two or more separate clusters based on some features. The K-Means (KM) algorithm is probably the most popular clustering algorithm. It is an iterative algorithm that starts with a set of initial centroids given to it as input. During each iteration, it performs the following two steps.
1. Assign cluster: For every user, the algorithm computes the distance between the user and every centroid. The user is then associated with the cluster that has the closest centroid. During this step, a user might change its association from one cluster to another.
2. Recompute centroids: Once all users have been associated with their respective clusters, the new position of the centroid of every cluster is calculated.
Let us define the following notations to be used later in this section.
$K$ = total number of clusters being formed.
$x^{(i)}$ = location coordinates of user $u^{(i)}$. In our case, $x^{(i)} \in \mathbb{R}^2$.
$c^{(i)}$ = cluster to which the user $u^{(i)}$ is currently associated.
$\mu_k$ = centroid of the k-th cluster, $\mu_k \in \mathbb{R}^2$.
$\mu_{c^{(i)}}$ = centroid of the cluster to which the user $u^{(i)}$ is currently associated.
Now the cost function J can be defined as

$$J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2$$

with the following optimization objective:

$$\min_{c^{(1)}, \ldots, c^{(m)}, \; \mu_1, \ldots, \mu_K} J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$$

It may be pointed out that Equation (22) allows us to compare multiple clustering layouts based on their cost and select the one with the lowest cost.
In this section, we use the KM algorithm for optimal clustering of m users competing for resources in a particular cell. The clustering is performed based on their geographic location; thus our input dataset $\{u^{(1)}, u^{(2)}, u^{(3)}, \ldots, u^{(m)}\}$ has m vectors $u^{(i)}$, $1 \leq i \leq m$, each consisting of the location coordinates of the i-th user. For the sake of simplicity, we assume these users are deployed in a two-dimensional area, i.e., a plane, and so $u^{(i)} = (x^{(i)}_1, x^{(i)}_2)$, i.e., an ordered pair of location coordinates. Our clustering algorithm is summarized in Algorithm 2.
The proposed algorithm takes the location coordinates of m users as input. It also takes two numbers, $min_k$ and $max_k$, as additional inputs. The algorithm outputs the best number of clusters, k, such that $min_k \leq k \leq max_k$, and the corresponding members of each cluster. It starts with $k = min_k$ and randomly selects k user locations as the initial centroids (line 6). It assigns the closest centroid to each user (line 8) and then computes new centroids by calculating the center/average location of all nodes in each cluster (line 11). In effect, the location of the centroids keeps moving in successive iterations. It repeats the above two steps until the change in the centroids' positions is zero or negligible. We repeat the test $max_t$ times with a new set of randomly chosen initial centroids every time. During every test, the discovered centroids, the corresponding centroid assignment to users, and the cost are saved (lines 14-16) for later comparison. After running the loop $max_t$ times, we select and store the best k centroids resulting from the test with the lowest cost while discarding the rest (lines 19-21). The same is repeated for the next value of k, i.e., k = k + 1, until $k > max_k$. At the end we have $cnt = max_k - min_k + 1$ vectors $\mu_k$, one for each value of k, with the corresponding assignment vector $a_k$ and cost $c_k$. Finally, we choose the vector $\mu$ having the lowest cost and the corresponding assignment vector a among the cnt stored cases. This gives the best number of clusters and the corresponding centroids that the algorithm found. A snapshot of the output of the relay deployment and users' association algorithm is shown in Figure 9.

Algorithm 2, lines 6 to 17:
6: Randomly choose initial k centroids $\mu_1, \mu_2, \mu_3, \ldots, \mu_k$
7: for i = 1 : m do
8: $a^{(i)} = j$, $1 \leq j \leq k$, such that $\mu_j$ is the centroid closest to $u^{(i)}$
9: end for
10: for l = 1 : k do
11: $\mu_l$ = mean of all users/points $u^{(i)}$ assigned to the l-th centroid
12: end for
13: until convergence
14: $\mu^{(t)} = (\mu_1, \mu_2, \mu_3, \ldots, \mu_k)$
15: $a^{(t)} = (a^{(1)}, a^{(2)}, a^{(3)}, \ldots, a^{(m)})$
16: $c^{(t)} = cost(\mu_1, \mu_2, \mu_3, \ldots, \mu_k)$
17: end for
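The procedure described above (random restarts for each k, then selection of the lowest-cost layout over the range of k) can be sketched as follows. This is an illustrative reading of the description and not the authors' code: the cost is taken as the mean squared distance between users and their centroids, an empty cluster keeps its previous centroid, and the function names, the seed and the toy coordinates are hypothetical.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points in the plane."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def assign(points, centroids):
    """Index of the closest centroid for every user location."""
    return [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
            for p in points]

def kmeans(points, k, rng, max_iter=100):
    """One K-Means run from k randomly chosen user locations."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, a in zip(points, labels) if a == j]
            if members:
                updated.append((sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members)))
            else:
                updated.append(centroids[j])   # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    cost = sum(dist2(p, centroids[a]) for p, a in zip(points, labels)) / len(points)
    return centroids, labels, cost

def best_clustering(points, min_k, max_k, max_t, seed=0):
    """Lowest-cost layout over k in [min_k, max_k] with max_t restarts each."""
    rng = random.Random(seed)
    best = None
    for k in range(min_k, max_k + 1):
        for _ in range(max_t):
            result = kmeans(points, k, rng)
            if best is None or result[2] < best[2]:
                best = result
    return best

users = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centroids, labels, cost = best_clustering(users, 2, 2, 3)
```

In this toy layout the two centroids settle midway between each pair of neighbouring users. In a relay deployment the returned centroids would play the role of candidate relay positions and the labels the users' association.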

Resource Allocation by Multiclass Classification
The resource block allocation problem has multiple discrete outputs, i.e., the users; therefore, we use multiclass classification to select one out of K users. Multiclass classification is an extension of One-vs-All classification. The input of the training network comprises the channel state information in terms of the SNR, and the output is the particular user that maximizes the utility function (throughput for MT and the PF metric for proportional fairness). The training data are obtained from an implementation of the SAMM algorithm of [4] as 25,000 K-dimensional samples of received SNR and the corresponding selected users. The dataset is partitioned into three parts, the training, validation, and test datasets, in a 70%, 15%, and 15% ratio, respectively. The MATLAB Neural Network Pattern Recognition app is used to train and deploy the neural network. It uses the Scaled Conjugate Gradient algorithm [27] for training. Our application requires K = 10 neurons in the input layer and 10 neurons in the output layer. A trial-and-error choice of eight neurons in the hidden layer gave the best result. The neural network architecture is shown in Figure 10. The neural network loss function is a generalization of the logistic regression loss function. In a logistic regression classification problem, we try to find the weight parameter $\theta$ such that the mean square error between the predicted output and the actual output is minimized. This error is called the loss function (LF) or the cost function, where the prediction or hypothesis function $h_\theta(x)$ is a sigmoid function, i.e., $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$, and $(x^{(i)}, y^{(i)})$, $i = 1, \ldots, m$, are the input-output pairs of the training dataset.
However, a mean square error loss combined with the sigmoid function leads to a non-convex function; therefore, a cross-entropy based loss function is used to make it convex, in which a second summation regularizes the weight or bias units $\theta_j$ and $\lambda_R$ is a regularization parameter. In the case of neural networks with multiclass classification, the prediction variable becomes K-dimensional, $h_\Theta(x) \in \mathbb{R}^K$, and the loss function is generalized accordingly, where L is the number of layers in the neural network, $s_l$ is the number of neurons in layer l, and $\lambda_R = 5 \times 10^{-4}$ is a regularization parameter that controls the tradeoff between fitting the training dataset and keeping the parameter $\Theta$ small. The neural network is trained using the stochastic gradient descent algorithm. The gradient, or partial derivative, is calculated by the backpropagation algorithm and the weights ($\theta$) are updated. The step size by which the weights are updated is called the learning rate. In our case, we set the learning rate to 0.01. A batch is a matrix of input (or output) vectors applied to the network simultaneously to produce one update of the network weights and biases. In our work, a batch of 128 (MATLAB default) 10 × 1 input vectors is used. We use the MATLAB 2019a app Neural Network Pattern Recognition (nprtool), which is a two-layer (one for hidden layer activation functions and the other for output layer activation functions) feedforward network.
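A regularized cross-entropy loss of the kind described above can be written compactly. The sketch below uses the commonly used textbook form (per-class binary cross-entropy averaged over m samples, plus an L2 penalty scaled by $\lambda_R / 2m$); whether this matches the paper's exact expression is an assumption, and the function name and argument layout are illustrative. Bias terms are expected to be left out of `weights` by the caller.

```python
import math

def regularized_cross_entropy(predictions, targets, weights, lambda_r):
    """Cross-entropy over m samples and K outputs plus an L2 weight penalty.

    predictions[i][k]: network output for sample i, class k, strictly in (0, 1).
    targets[i][k]:     one-hot label (1 for the selected user, else 0).
    weights:           list of weight matrices (lists of rows), biases excluded.
    """
    m = len(predictions)
    data_term = 0.0
    for h, y in zip(predictions, targets):
        for h_k, y_k in zip(h, y):
            data_term -= y_k * math.log(h_k) + (1 - y_k) * math.log(1.0 - h_k)
    penalty = sum(w * w for layer in weights for row in layer for w in row)
    return data_term / m + lambda_r / (2.0 * m) * penalty

# One sample, two classes, an uninformative prediction and zero weights.
loss_plain = regularized_cross_entropy([[0.5, 0.5]], [[1, 0]], [[[0.0]]], 5e-4)
# Same prediction with a single weight of 2 and lambda_R = 1.
loss_reg = regularized_cross_entropy([[0.5, 0.5]], [[1, 0]], [[[2.0]]], 1.0)
```

The first value is the pure data term, $2 \ln 2$; the second adds the penalty $\lambda_R \cdot 4 / 2 = 2$, which shows how the regularization parameter trades fitting accuracy against weight magnitude.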
The lower the cross entropy, the higher the classification accuracy; zero cross entropy means no error. Figure 11 shows that the cross entropy reaches 0.0078318 at iteration 136. Figure 12 shows the variation of the gradient coefficient with the number of epochs. The final value of the gradient coefficient at epoch 142 is 0.001787, which is close to zero. The smaller the gradient coefficient, the better the training and testing of the network. From the figure, it can be seen that the gradient value decreases as the number of epochs increases. In Figure 12, validation fails are the iterations at which the validation mean square error (MSE) increased. A large number of validation fails indicates overtraining. MATLAB automatically stops training after 6 fails in a row. Figure 13 shows the error histogram of the trained neural network for the training, validation and testing parts. In this figure we can see that the data fitting errors are small and distributed within a narrow range around zero. The confusion matrix in Figure 14 visualizes the performance of supervised learning. The rows correspond to the predicted user (Output Class) and the columns correspond to the true user (Target Class). The diagonal cells correspond to user-RB pairs that are correctly assigned. The off-diagonal cells correspond to incorrectly assigned user-RB pairs. The trained neural network provides 97.5% classification accuracy. Figure 15 presents the receiver operating characteristic (ROC) curves. The ROC plot shows the true positive rate versus the false positive rate as the threshold is varied. A perfect test would show points in the upper-left corner, with 100% sensitivity and 100% specificity [28]. For the RB allocation module, the classifier worked very well.

Figure 11. The mean-squared-error for the training and testing of the RB allocation module.

Figure 13. The error histogram of the trained neural network for the training, validation and testing phases. (Horizontal axis: Errors = Targets − Outputs. Legend: Training, Validation, Test, Zero Error.)

Power Allocation through Two-Layer Feedforward Neural Network
In the power allocation problem, we have to map a numeric input dataset (SNR) to a numeric output dataset (allocated power) per user per RB. Therefore, we use a neural network curve fitting technique. The training dataset is generated by Algorithm 1 (BOPA), with the received SNR as input and the allocated transmit power as output. Given the resource block allocation set $D_k \; \forall k \in \mathcal{K}$, the power allocation problem is solved using a two-layer feedforward neural network. The hidden layer neurons use the sigmoid function as the activation function and the output neurons implement a linear function, as shown in Figure 16. We use the Bayesian Regularization method to train the neural network. This method typically requires more training time but gives good results for difficult and noisy datasets. The Bayesian Regularization method uses Levenberg-Marquardt optimization to update the weight and bias values. It minimizes a combination of squared errors and weights, and then determines the correct combination for better generalization. In this method, the training does not stop after six consecutive validation fails; by default, max_fail = inf. The training continues until an optimal combination of errors and weights is reached. More detail on the use of Bayesian regularization, along with Levenberg-Marquardt training, can be found in [29].
We use MATLAB 2019a App, Neural Net Fitting (nftool) which is a two-layer (one for hidden layer activation functions and other for output layer activation functions) feedforward network.
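The structure of such a two-layer fitting network (sigmoid hidden layer, linear output layer) can be illustrated by its forward pass. This sketch only shows how a trained network maps an SNR vector to allocated powers; it does not reproduce MATLAB's training procedure, and the function names, layer sizes and weight values are placeholders rather than values from the paper.

```python
import math

def sigmoid(z):
    """Logistic activation used by the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: sigmoid hidden layer followed by a linear output layer.

    x:        input vector (e.g., received SNR per RB).
    w_hidden: one row of weights per hidden neuron; b_hidden: hidden biases.
    w_out:    one row of weights per output neuron; b_out: output biases.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]

# Placeholder weights: one input, one hidden neuron, one output.
y = forward([0.0], [[1.0]], [0.0], [[2.0]], [1.0])
```

Once the weights are trained, each allocation requires only this fixed sequence of multiply-add operations instead of an iterative search in every TTI.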
The mean-squared-error graph for training and testing is shown in Figure 17. It shows that the MSE reaches 0.087358 in 498 epochs. The input/output samples of the training network were channel gain and allocated power. Since the total transmit power is a sum of linear functions of the channel gain, the neural network is trained in a single epoch. An epoch is a full pass through the entire dataset together with the calculation of new weights and biases. Figure 18 shows that the gradient coefficient reaches 0.00076591 in 499 epochs. A lower gradient value indicates that the training and testing of the network have converged. Other parameters, such as Mu, Num Parameters, and Sum Squared Param, are the stopping criteria defined in the Bayesian regularization backpropagation function 'trainbr' [30]. The error histogram in Figure 19 visualizes the errors between the target values and the predicted values after training the feedforward neural network. In this figure we can see that the data fitting errors are small and distributed within a narrow range around zero. Around 88.1% of the errors fall between −0.3 and 0.33.

Figure 19. The error histogram (Errors = Targets − Outputs; legend: Training, Test, Zero Error).
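The quantities behind the error histogram can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the sample targets and outputs are hypothetical, and the bin count is an arbitrary choice. It computes Errors = Targets − Outputs, bins them, and reports the fraction of errors inside a given interval, which is how a figure such as the 88.1% above would be obtained.

```python
def error_histogram(targets, outputs, n_bins=20):
    """Return the errors (targets - outputs) and their counts per bin."""
    errors = [t - o for t, o in zip(targets, outputs)]
    lo, hi = min(errors), max(errors)
    # Fall back to a unit width when all errors are identical.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for e in errors:
        # Clamp so that the maximum error lands in the last bin.
        idx = min(int((e - lo) / width), n_bins - 1)
        counts[idx] += 1
    return errors, counts


def fraction_within(errors, lower, upper):
    """Fraction of errors falling inside the closed interval [lower, upper]."""
    inside = sum(1 for e in errors if lower <= e <= upper)
    return inside / len(errors)


if __name__ == "__main__":
    # Hypothetical targets and network outputs, for illustration only.
    targets = [1.0, 2.0, 3.0, 4.0]
    outputs = [0.9, 2.2, 3.0, 5.0]
    errors, counts = error_histogram(targets, outputs, n_bins=5)
    print(counts, fraction_within(errors, -0.3, 0.33))
```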

Performance Evaluation with Machine Learning Techniques
First, we apply the neural network to the RB and power allocation modules with a single relay. For the SAMM scheme, Figure 20 shows a 30.25% increase in EE. This is because of a limitation of the BOPA method, which sometimes returns no result, whereas the neural network is trained on a diverse dataset and always produces an output. We also compare our proposed schemes with LERPA of [11], which uses a max-min criterion for RB allocation and fractional-programming-based transmission power control. In an LTE network with multiple relays, as shown in Figure 7 or Figure 9, the users associated with relay $q$ experience interference from the neighboring relays $q_{neigh}$. This interference decreases the users' throughput, as shown in Figure 21. However, the EE-maximization-based NN power allocation continues to dominate in the multiple-relay scenario. Since the transmission is orthogonal between the BS and the RNs, only the users associated with a relay are affected by the other relays' transmissions. The equal power MT throughput is not affected because almost all the users are associated with the BS. This further reduces the required transmit power of the relay; hence, a net increase in EE is observed in Figure 22. Adding multiple relays slightly affects SAMM NN positively and SAMM equal power negatively. The PF component of SAMM forces the association of low-throughput users in order to increase fairness. This association benefits SAMM NN because of its EE-based power allocation, but penalizes SAMM equal power because the interference power is not compensated. The increased fairness of SAMM NN is evident from Figure 21, where even the farthest users, 9 and 10, have higher throughput. It can be seen that in LERPA, closer users get lower throughput, while fairly large throughput is given to the farther users.
This is because LERPA uses the max-min criterion for RB allocation, which assigns the RB to the users who have the lowest received SNR.

Figure 22. The system energy efficiency (y-axis: Energy Efficiency in bits/sec/Watt, ×10^5) with neural network for SAMM, which is trained on waterfilling-based power allocation among users and BOPA-based power allocation among subchannels, along with LERPA of [11], in a multiple-relay scenario.

It can be seen that SAMM with BOPA and with NN competes well in fairness while offering the best EE, at the cost of a tradeoff in system throughput. LERPA has better fairness performance but is less efficient in EE and system throughput, whereas the hypothetical MT performs better in average system throughput. We call it hypothetical because it allocates the RBs and power only to the users with the highest SNR, which is not applicable in practical scenarios.

Complexity Analysis
The RB allocation scheme SAMM uses alternating MT and PF metrics to assign the $N$ RBs to $K$ users. MT assigns $N/2$ RBs to $K$ users and PF assigns $N/2$ RBs to $K-1$ users in alternate TTIs. Therefore, the computational complexity of SAMM is $O\left(N\left(K - \tfrac{1}{2}\right)\right)$. The BOPA Algorithm 1 first requires $\lambda_{\min}$ and $\lambda_{\max}$ in line 4 using the water-filling algorithm, for which the worst-case complexity is $O(2NK)$. After that, BOPA uses the binary search method to estimate the roots of Equation (17). In the worst case, with $N_p$ points in the search space, binary search requires $\log_2(N_p)$ iterations to find the roots of the polynomial. In our case, $N_p = \frac{\lambda_{\max} - \lambda_{\min}}{\epsilon}$, where $\epsilon$ is the error tolerance. Therefore, the overall complexity of Algorithm 1 is $O(2NK^2 \log_2(N_p))$. When the optimal exhaustive-search RB allocation ($K^N$ combinations) is combined with BOPA, the complexity is $O(2NK^{N+2} \log_2(N_p))$, whereas the complexity of SAMM-BOPA is $O((NK)^2 (2K + 1) \log_2(N_p))$.
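To give a feel for the scale of these expressions, the following Python sketch evaluates them for a small, hypothetical configuration ($N = 10$ RBs, $K = 4$ users, search interval $[0, 1]$ with tolerance $10^{-3}$). The function names and the numeric values are assumptions chosen for illustration; the generic bisection routine is a stand-in used only to check the $\log_2(N_p)$ iteration count and is not the BOPA algorithm itself.

```python
import math


def bisection_iterations(lam_min, lam_max, tol):
    """Worst-case number of halvings: ceil(log2(N_p)), N_p = (max - min) / tol."""
    n_points = (lam_max - lam_min) / tol
    return math.ceil(math.log2(n_points))


def bisect_root(f, lo, hi, tol):
    """Generic bisection on [lo, hi]; returns the root estimate and step count."""
    steps = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid  # sign change in the left half
        else:
            lo = mid  # sign change in the right half
        steps += 1
    return 0.5 * (lo + hi), steps


def samm_ops(n_rb, n_users):
    """N/2 RBs over K users (MT) plus N/2 RBs over K-1 users (PF)."""
    return n_rb * (n_users - 0.5)


def bopa_ops(n_rb, n_users, iters):
    """Operation count 2 N K^2 log2(N_p) quoted for Algorithm 1."""
    return 2 * n_rb * n_users ** 2 * iters


def samm_bopa_ops(n_rb, n_users, iters):
    """Operation count (NK)^2 (2K + 1) log2(N_p) quoted for SAMM-BOPA."""
    return (n_rb * n_users) ** 2 * (2 * n_users + 1) * iters


def exhaustive_bopa_ops(n_rb, n_users, iters):
    """Operation count 2 N K^(N+2) log2(N_p) for exhaustive search with BOPA."""
    return 2 * n_rb * n_users ** (n_rb + 2) * iters


if __name__ == "__main__":
    n_rb, n_users = 10, 4  # hypothetical system size
    iters = bisection_iterations(0.0, 1.0, 1e-3)
    _, steps = bisect_root(lambda x: x - 0.3, 0.0, 1.0, 1e-3)
    print("bisection iterations:", iters, "observed:", steps)
    print("SAMM:", samm_ops(n_rb, n_users))
    print("SAMM-BOPA:", samm_bopa_ops(n_rb, n_users, iters))
    print("Exhaustive + BOPA:", exhaustive_bopa_ops(n_rb, n_users, iters))
```

Even for this small configuration, the exponential term $K^{N+2}$ makes the exhaustive search several orders of magnitude more expensive than SAMM-BOPA, which is polynomial in $N$ and $K$.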
The running-time complexity of the K-mean algorithm is $O(kmdi)$ [31], where $k$ is the number of clusters, $m$ is the number of objects to be clustered, $d$ is the dimension of the objects, and $i$ is the number of iterations. In our application of the K-mean Algorithm 2, we use $\mathrm{min}_k < k < \mathrm{max}_k$ and the two-dimensional geographical locations of the users. Therefore, the worst-case computational complexity is $O(\mathrm{max}_k\, K i)$.
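The origin of the $O(kmdi)$ term can be seen in a plain K-mean loop over two-dimensional user locations. The sketch below is a generic textbook version written in Python, not the exact Algorithm 2 of this paper; the function name, the optional `init` argument, and the sample coordinates are assumptions for illustration.

```python
import random


def kmeans(points, k, iters=20, seed=0, init=None):
    """Cluster 2-D points; returns (centroids, labels).

    The nested loops make the cost visible: iters (i) passes, each visiting
    every point (m) and every centroid (k) with a 2-D distance (d = 2).
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        for idx, (x, y) in enumerate(points):
            labels[idx] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2
                + (y - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical user locations forming two groups, for illustration only.
    users = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
             (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, labels = kmeans(users, k=2)
    print(centroids, labels)
```

With $d = 2$ fixed and $m = K$ users, one run costs on the order of $kKi$ operations, so bounding $k$ by $\mathrm{max}_k$ gives the worst-case expression quoted above.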

Conclusions
In this paper, we have investigated the impact of using single and multiple L3 relays in terms of EE and throughput. For the single-relay scenario, equal power and BOPA are used in conjunction with the SAMM and MT RB allocation algorithms. Simulation results show that SAMM BOPA achieves a 26% power saving compared with MT BOPA, whereas, compared with SAMM with equal power allocation to all RBs, our proposed scheme gives a 77% increase in EE. For the multiple-relay scenario, a clustering scheme is proposed that addresses relay placement and users' association. This information acts as an input to a machine learning process (SAMM NN) that learns both the SAMM and BOPA approaches using One-Vs-All classification and feedforward neural networks, respectively. The SAMM NN approach, when compared with SAMM Equal Power, gives a 2.07 times increase in EE at the cost of a 0.72 times decrease in throughput. The SAMM BOPA approach adopted in the single-relay case still provides the best tradeoff in terms of EE, throughput and fairness in the case of multiple relays.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: