Intelligent Resource Allocation Using an Artificial Ecosystem Optimizer with Deep Learning on UAV Networks

Abstract: An Unmanned Aerial Vehicle (UAV)-based cellular network operating over a millimeter wave (mmWave) frequency band addresses the need for flexible coverage and high data rates in next-generation networks. However, the use of large antenna arrays and the higher propagation loss of mmWave networks result in high power consumption, while UAVs are limited by low-capacity onboard batteries. Energy Harvesting (EH) is a promising solution to cut down the energy cost of UAV-aided mmWave networks, but sustaining strong connectivity in UAV-based terrestrial cellular networks remains challenging owing to the random nature of renewable energy. With this motivation, this article introduces an intelligent resource allocation technique using an artificial ecosystem optimizer with deep learning (IRA-AEODL) on UAV networks. The presented IRA-AEODL technique aims to effectually allocate the resources in wireless UAV networks. Specifically, the IRA-AEODL technique focuses on maximizing the system utility over all users by jointly optimizing user association, energy scheduling, and trajectory design. To optimally allocate the UAV policies, the stacked sparse autoencoder (SSAE) model is used in the UAV networks. For the hyperparameter tuning process, the AEO algorithm is used to enhance the performance of the SSAE model. The experimental results of the IRA-AEODL technique are examined under different aspects, and the outcomes show the improved performance of the IRA-AEODL approach over recent state-of-the-art approaches.


Introduction
Unmanned aerial vehicle (UAV)-assisted communication provides line-of-sight (LoS) wireless connections with controllable and flexible deployment [1]. In this regard, UAVs are mainly utilized to enrich the capacity and network coverage for ground users. In addition, in wireless powered networks (WPN), UAVs are used as mobile charging stations to deliver radio frequency (RF) energy to low-power user devices [2]. As a UAV generally relies on limited-capacity batteries to carry out tasks such as flying, hovering, and offering services, it is vital to trade off its coverage area, energy consumption, and service time [3]. Specifically, UAV-based aerial platforms that provide wireless services have attracted wide industry and research efforts concerning control, deployment, and navigation. To enhance the coverage and energy efficiency of UAV-aided communication networks, resource allocation, namely of subchannels, transmit power, and serving users, is essential [4].
Furthermore, consider a multiple-UAV wireless communication network (multi-UAV network), for which a joint trajectory and resource allocation optimization model was analyzed as a means to guarantee fairness by maximizing the minimum throughput among users [5]. In [6], to strike a tradeoff between the sum rate and the delay of sensing tasks in a multi-UAV uplink single-cell network, the authors devised a hybrid trajectory design and subchannel assignment method. Human intervention in the control design of UAVs is constrained by the maneuverability and versatility of UAVs. Hence, to boost the performance of UAV-enabled communication networks, machine learning (ML)-based intelligent control of UAVs is a priority [7]. Neural network (NN)-based trajectory design has also been considered from the viewpoint of the UAVs' mechanical structures. Likewise, a UAV routing design method based on reinforcement learning (RL) was developed.
To model data distributions, the Gaussian mixture model was used, and a weight-expectation-based predictive on-demand deployment algorithm for UAVs was devised to reduce the transmit power. As previously mentioned, ML is a promising and powerful tool for offering smart, autonomous solutions that boost UAV-assisted communication networks. However, most research has focused on the trajectory and deployment models of UAVs in communication networks [8]. Resource allocation variables such as subchannels and transmit power have been taken into account, but previous research concentrated on time-independent scenarios. For time-dependent cases, the capabilities of ML-based resource allocation techniques have also been inspected [9]. However, many ML techniques concentrate on single- or multi-UAV scenarios by assuming that the whole network data are accessible to all UAVs.
This article introduces an intelligent resource allocation using an artificial ecosystem optimizer with a deep learning (IRA-AEODL) technique on UAV networks. The presented IRA-AEODL technique aims to effectually allocate the resources in the wireless UAV network. In such cases, the IRA-AEODL technique focuses on maximizing the system utility over all users by jointly optimizing user association, energy scheduling, and trajectory design. To optimally allocate the UAV policies, the stacked sparse autoencoder (SSAE) model is used in the UAV networks. For the hyperparameter tuning process, the AEO algorithm is used to enhance the performance of the SSAE model. The experimental results of the IRA-AEODL technique are examined under different aspects.
The highlights of this article include the use of unmanned aerial vehicles (UAVs) as a solution for flexible coverage and high data rates in next-generation networks, the challenge posed by energy consumption and limited battery capacity in UAVs, and the introduction of an intelligent resource allocation technique using an artificial ecosystem optimizer with deep learning (IRA-AEODL) on UAV networks. The research motivation behind this article is to address the energy cost issue in UAV-aided mmWave networks by utilizing energy harvesting together with an intelligent resource allocation technique.

Related Works
In [10], the authors examine the Resource Allocation (RA) issue in UAV-assisted EH-powered D2D Cellular Networks (UAV-EH-DCNs). The main goal is to enhance power efficiency while ensuring the satisfaction of Ground Users (GUs). An LSTM network is also implemented to speed up convergence by extracting prior data on GU satisfaction when regulating the current RA policy. Chang et al. [11] suggest an ML-based policy RA protocol that combines RL and DL to devise the optimal strategy for the overall UAV network. The authors also introduce a Multi-Agent (MA) DRL system for distributed deployment that requires no prior knowledge of the dynamic behavior of the network. Li et al. [12] suggest a novel DRL-based Flight Resource Allocation Framework (FRA) to lessen the overall information packet loss over a sequential action space. A state classification layer leveraging LSTM is also established to forecast the network dynamics resulting from time-varying airborne channels and energy arrivals at the devices on the ground.
In [13], the authors concentrate on a downlink cellular network where several UAVs act as aerial base stations for users on the ground over Frequency Division Multiple Access (FDMA). Targeting the maximization of both fairness and overall throughput, the authors model RA and route design as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and suggest MARL as a solution. In [14], a MADRL-based approach is introduced to accomplish the optimal long-term network utility while satisfying the user devices' quality-of-service needs. Considering that the efficacy of every UAV is determined by the network environment and the actions of the other UAVs, the JTDPA issue is modeled as a stochastic game.
In [15], the authors present their IRA-AEODL framework, which combines the Intra-Routing Algorithm (IRA) and Aerial Edge-mounted On-Demand Learning (AEODL). The IRA allows UAVs in the network to organize themselves for routing, while AEODL leverages machine learning to enhance dynamic route optimization. The authors then evaluate the performance of their proposed IRA-AEODL network against existing UAV network solutions, performing numerical simulations to evaluate the end-to-end delay, network throughput, and packet delivery ratio. They also analyze the mobile edge computing capabilities of their proposed network.
In [16], the authors examine the anti-jamming issue with joint channel and energy allocation for UAV networks. Specifically, the authors concentrate on discarding both the mutual interference among UAVs and external malicious jamming to optimize the system's Quality of Experience (QoE) relative to energy utilization. They then suggest a joint MA Layered Q-Learning (MALQL)-based anti-jamming transmission protocol to reduce the huge dimensionality of the action space and examine the asymptotic convergence of the suggested protocol. The novelty of this research lies in its ability to address the non-convex total energy reduction issue while also incorporating several advanced protocols, such as a central MARL protocol and an MA Federated RL protocol, into an MEC scheme with multiple UAVs. By doing so, the authors propose an innovative approach that can potentially reduce energy consumption and improve overall energy efficiency. The authors of [17] present a stochastic geometry-based analysis of an integrated aerial-ground network enabled by multiple UAVs. The novelty of this paper is that the exact distribution of the network throughput is derived and explored under various system parameters. However, the analysis is restricted to Rayleigh fading and a single interfering UAV.
Overall, the literature survey highlights a research gap in the area of resource allocation in UAV-assisted networks. While previous studies have used algorithms such as LSTM, RL, and DRL for efficient resource allocation, further investigation is still needed. There is also a need to explore the use of multi-agent reinforcement learning (MARL) in resource allocation, as it has shown promising results in other areas of machine learning. A further gap lies in the evaluation of the proposed resource allocation techniques, as most existing studies rely on simulation-based results rather than real-world implementation and testing. Therefore, further research in this field can contribute to the development of more efficient and adaptive resource allocation policies for UAV-assisted networks.

The Proposed Model
In this article, we propose a novel IRA-AEODL technique for efficient resource allocation in UAV networks. A key advantage of the proposed IRA-AEODL technique compared to existing solutions is its ability to maximize system utility over a set of users by jointly optimizing user association, energy scheduling, and trajectory design. Figure 1 shows the overall architecture of the IRA-AEODL approach. Furthermore, a 3D Cartesian coordinate system is used to ensure optimal coverage for each user. The user set and UAV swarm are represented as U and M, respectively, with |M| = M and |U| = U. The trajectory of each UAV is modeled through time slots t, with t ∈ {1, 2, . . . , T}. Additionally, the constellation of UAVs is assumed to fly at a fixed height H. Finally, the base station or satellite is responsible for the learning procedure required to ensure optimization within the IRA-AEODL approach. The main motivators for using this technique are (i) the availability and ease of access to unlabeled data; (ii) the potential for notable improvements in the model's performance by including a significant amount of unlabeled data in training; and (iii) the practical constraints of human resources in terms of labeling data. To assess the efficacy of this approach, we conducted a practical analysis on a real dataset, which showcased the considerable boost in the overall classification accuracy of the SSAE model brought by including a substantial quantity of unlabeled data in the pre-training stage.


System Model
Assume M > 1 UAVs share the same frequency spectrum and serve a group of U > 1 GUs. The GU set and the UAV swarm are represented as U and M, respectively [18], with |M| = M and |U| = U. Each UAV provides service to the users in successive time slots. We represent the time slot as t, t ∈ {1, 2, . . . , T}, and the total period as T. In the presented model, a 3D Cartesian coordinate system is adopted, with the predetermined position of every GU u represented by its horizontal coordinates, e.g., φ_u = [x_u, y_u]^T ∈ R^{2×1}, u ∈ U. Each UAV is considered to fly at a fixed altitude d_h = H above the ground, and the horizontal coordinate of UAV m at time t is denoted here by q_m(t). A base controller, which can be a BS or a satellite, is assumed to perform the learning procedure. Furthermore, each UAV is capable of communicating within the swarm.
Assuming each UAV must fly back to its base, the trajectory needs to fulfil a return constraint. Moreover, the trajectory of each UAV is also subject to speed and distance constraints, where V_max denotes the maximal UAV speed and S_min represents the minimal inter-UAV distance required to prevent collision or severe interference. Consequently, the distance between UAV m and user u in time slot t is computed from the UAV altitude H and the horizontal positions of the UAV and the user.
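The constraint equations referenced above did not survive extraction; with q_m(t) denoting the horizontal coordinate of UAV m in slot t and δ_t the slot duration (both notational assumptions here), they plausibly take the standard form:

```latex
% Return-to-base constraint: each UAV ends where it starts
q_m(1) = q_m(T), \qquad \forall m \in \mathcal{M}
% Speed and collision-avoidance constraints
\|q_m(t+1) - q_m(t)\| \le V_{\max}\,\delta_t, \qquad
\|q_m(t) - q_{m'}(t)\| \ge S_{\min}, \quad m \ne m'
% Distance between UAV m (altitude H) and ground user u
d_{m,u}(t) = \sqrt{H^2 + \|q_m(t) - \phi_u\|^2}
```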

Path Loss Model
The UAV is capable of establishing an LoS link with a GU. Since changes in real-world environments (urban, rural, suburban, and so on) are generally unpredictable, the randomness related to LoS and Non-LoS (NLoS) conditions at a given time must be considered while deploying the UAV. Consequently, consider the GU connecting with the UAV through an LoS link with a specific probability, which we denote the LoS probability. The LoS probability depends on the environment and on the locations of the GU and the UAV.
In Equation (5), ξ1 and ξ2 denote constants whose values depend on the environment and carrier frequency, and θm,u(t) indicates the elevation angle. The LoS and NLoS path loss models between user u and UAV m then follow, where, in Equation (7), η1 and η2 denote the excess path loss coefficients of the LoS and NLoS links, respectively, fc indicates the carrier frequency, c represents the speed of light, and α is the path loss exponent. Given only the UAV and GU locations, it is challenging to determine whether the LoS or the NLoS path loss model must be utilized.
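The probability and path loss expressions (Equations (5)-(7)) were lost in extraction; a common air-to-ground model consistent with the constants named above (ξ1, ξ2, η1, η2, fc, c, α) is:

```latex
% LoS probability as a logistic function of the elevation angle (degrees)
P^{LoS}_{m,u}(t) = \frac{1}{1 + \xi_1 \exp\!\big(-\xi_2\,[\theta_{m,u}(t) - \xi_1]\big)},
\qquad
\theta_{m,u}(t) = \frac{180}{\pi}\arcsin\!\Big(\frac{H}{d_{m,u}(t)}\Big)
% Free-space-based path loss with excess coefficients for LoS/NLoS
PL^{LoS}_{m,u}(t) = \eta_1 \Big(\frac{4\pi f_c\, d_{m,u}(t)}{c}\Big)^{\alpha},
\qquad
PL^{NLoS}_{m,u}(t) = \eta_2 \Big(\frac{4\pi f_c\, d_{m,u}(t)}{c}\Big)^{\alpha}
```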

Transmission Model
A binary parameter βm,u(t) is defined as the user association indicator to express the relationship between GU u and UAV m. Each GU is served by at most one UAV in a given time slot, viz., ∑M m=1 βm,u(t) ≤ 1. Furthermore, the transmit power of UAV m for user u is represented by pm,u(t), and the channel gain between UAV m and user u is represented by hm,u(t).
As a result, other UAVs can cause interference to GU u; γm,u(t) denotes the SINR of the link between m and u. In Equation (11), σ2 denotes the noise variance. Note that the transmit power, channel state, and trajectory of the UAV are continuous variables; after quantizing and partitioning each value into distinct levels within its range, the value of each variable in every time slot t is treated as a discrete counterpart.
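Equation (11) was garbled in extraction; with β, p, and h as defined above, the SINR of the m-u link plausibly reads:

```latex
\gamma_{m,u}(t) =
\frac{\beta_{m,u}(t)\, p_{m,u}(t)\, h_{m,u}(t)}
     {\sum_{m' \ne m} \sum_{u' \in \mathcal{U}} \beta_{m',u'}(t)\, p_{m',u'}(t)\, h_{m',u}(t) + \sigma^2}
```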

SSAE-Based Resource Allocation Scheme
To optimally allocate the UAV policies, the SSAE model is used in the UAV networks. The building block of the SSAE is the AE, an archetypal NN that learns to map the input X to the output Y [18]. The AE is split into encoder and decoder parts: the encoder (W X , B X ) maps the input X to the code I C , and the decoder (W Y , B Y ) maps the code to the reconstructed data Y. The architecture of the SSAE is shown in Figure 2; the decoder has weight W Y and bias B Y , and the encoder has weight W X and bias B X . The output Y represents the estimate of the input X, and g LS denotes the log-sigmoid function. The SAE differs from the AE model in that sparsity helps the AE attain better performance. To minimize the error between the output Y and the input vector X, the raw loss function of the AE is defined over the N S training instances, as in Equation (15). From Equations (12) and (13), the output Y is formulated accordingly.
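The encoder/decoder mapping and raw loss (Equations (12)-(15)), whose displayed forms were lost in extraction, can be written compactly as:

```latex
I_C = g_{LS}(W_X X + B_X), \qquad Y = g_{LS}(W_Y I_C + B_Y),
\qquad g_{LS}(z) = \frac{1}{1 + e^{-z}}
% Raw reconstruction loss over the N_s training instances
L_{raw} = \frac{1}{N_s} \sum_{i=1}^{N_s} \big\| X_i - Y_i \big\|^2
```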

Hyperparameter Tuning using the AEO Algorithm
For the hyperparameter tuning process, the AEO algorithm is used to enhance the performance of the SSAE model. The AEO is a nature-inspired metaheuristic algorithm that hinges on the energy transfer model among living creatures, which helps maintain species stability [19]. The three operators utilized to obtain solutions are production, consumption, and decomposition, mirroring the energy flow in an ecosystem among producers, consumers, and decomposers.

Production
In AEO, the producer represents the worst individual in the population. Thus, it needs to be updated with respect to the optimal individual and the lower and upper boundaries, such that it helps the others to search other areas. Through the production operator, a new individual is produced between a randomly generated individual (x rand ) and the best individual by substituting the prior one. Here, n represents the population size, T signifies the iteration number, Ub and Lb denote the upper and lower boundaries, and r 1 signifies a random number that lies in [0, 1]. r and α denote a random vector within [0, 1] and a linear weight coefficient, respectively.
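For reference, the production operator in the original AEO formulation, consistent with the description above (x_n denotes the best individual and x_1 the producer), is:

```latex
x_1(t+1) = (1 - \alpha)\, x_n(t) + \alpha\, x_{rand}(t)
\alpha = \Big(1 - \frac{t}{T}\Big) r_1, \qquad
x_{rand} = \mathbf{r}\,(Ub - Lb) + Lb
```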
The α coefficient provided in Equation (21) assists in drifting the individual linearly from the random location to the optimal individual across iterations. In Equation (16), g AE denotes the abstract AE function, and Equation (15) is reformulated accordingly. To learn a non-trivial mapping and prevent an over-complete mapping, we introduce one regularization term Γ s for the sparsity constraint and one L 2 regularization term Γ w on the weights W X , W Y . In Equation (18), a s and a w refer to the sparsity and weight regularization factors. In the sparsity regularization term of Equation (19), ρ j denotes the j-th neuron's average activation value over the N s training samples, |I| denotes the number of components of the internal code output I C , ρ indicates the desired value, termed the sparsity proportion factor, and g KL represents the Kullback-Leibler divergence function. The weight regularization term is given in Equation (20). The SAE is utilized as the key component, and the final SSAE classifier is constructed by the following actions: (i) append the softmax layer at the end of the AE model; (ii) include the preprocessing, input, vectorization, and 2D-FrFE layers; and (iii) stack the available SAEs. In the classifier stage, four SAE blocks with (N 1 , N 2 , N 3 , N 4 ) neurons are applied; the number of blocks was chosen by trial and error. Lastly, a softmax layer with N c neurons is appended, where N c denotes the number of fruit classes.
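As a concrete illustration of the regularized loss in Equations (15) and (18)-(20), the following NumPy sketch computes the reconstruction, KL-sparsity, and L2 terms for a single sparse autoencoder; the shapes and factor values below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def logsig(z):
    """Log-sigmoid activation g_LS."""
    return 1.0 / (1.0 + np.exp(-z))

def ssae_loss(X, Wx, Bx, Wy, By, rho=0.05, a_s=1.0, a_w=1e-4):
    """Regularized loss: reconstruction + KL sparsity + L2 weight decay."""
    I_c = logsig(X @ Wx + Bx)        # encoder: internal code I_C
    Y = logsig(I_c @ Wy + By)        # decoder: reconstruction Y
    n = X.shape[0]
    recon = np.sum((X - Y) ** 2) / n           # raw loss (Eq. 15)
    rho_hat = np.clip(I_c.mean(axis=0), 1e-8, 1 - 1e-8)  # avg activation per code neuron
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))  # sparsity term (Eq. 19)
    l2 = 0.5 * (np.sum(Wx ** 2) + np.sum(Wy ** 2))                # weight term (Eq. 20)
    return recon + a_s * kl + a_w * l2         # combined objective (Eq. 18)

# Toy example with random data and small random weights
rng = np.random.default_rng(0)
X = rng.random((16, 8))
Wx, Bx = 0.1 * rng.standard_normal((8, 4)), np.zeros(4)
Wy, By = 0.1 * rng.standard_normal((4, 8)), np.zeros(8)
loss = ssae_loss(X, Wx, Bx, Wy, By)
print(loss)  # a positive scalar
```

In practice, the AEO algorithm described below would tune hyperparameters such as rho, a_s, and a_w.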


Consumption
Consumption is performed by the consumers after the production operator finishes. Each consumer may eat a randomly selected consumer with a lower energy level, or a producer, to obtain energy. To enhance the exploration ability, a consumption factor C based on a Lévy-flight-like random walk is defined, where N(0, 1) represents the normal distribution with mean zero and standard deviation one. Distinct strategies are implemented for the different kinds of consumers. A consumer eats only the producer if it is randomly selected as a herbivore (x 2 and x 5 are herbivore consumers and therefore consume only the producer x 1 ); this strategy is depicted in Equation (26).
A consumer eats only another consumer with a higher energy level when it is randomly selected as a carnivore (a consumer among individuals x 2 -x 5 is consumed by consumer x 6 , since the latter is a carnivore and has a lower energy level than individuals x 2 -x 5 ). Unlike the previous two behaviors, a consumer with a higher energy level or the producer is randomly eaten when the consumer is selected as an omnivore (either the producer x 1 or a randomly selected consumer among x 2 -x 6 is eaten by x 7 , since the latter is an omnivore and has the lowest energy level among x 2 -x 7 ).
Here, r 2 denotes a random number in the range of zero to one. A searching individual's position is updated with respect to both a randomly selected individual and the worst individual in the population using the consumption operator. Hence, this operator enables the technique to perform a global search.
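In the original AEO formulation, consistent with the description above, the consumption factor and the three feeding behaviors are:

```latex
C = \frac{1}{2}\,\frac{v_1}{|v_2|}, \qquad v_1, v_2 \sim N(0, 1)
% Herbivore (eats the producer x_1)
x_i(t+1) = x_i(t) + C\,\big(x_i(t) - x_1(t)\big)
% Carnivore (eats a randomly chosen higher-energy consumer x_j, j \in [2, i-1])
x_i(t+1) = x_i(t) + C\,\big(x_i(t) - x_j(t)\big)
% Omnivore (eats both the producer and a random consumer)
x_i(t+1) = x_i(t) + C\,\Big(r_2\big(x_i(t) - x_1(t)\big) + (1 - r_2)\big(x_i(t) - x_j(t)\big)\Big)
```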

Decomposition
Decomposition is a vital procedure for a properly working ecosystem. The decomposer continuously breaks down every dead individual in the population to provide the nutrients needed for the producers' growth. The decomposition factor D, together with the weight coefficients h and e, is designed for the mathematical model. These parameters support updating the position of x i (the i-th individual) with respect to the position of x n (the decomposer). Besides, each individual's next position is allowed to spread around the decomposer (the optimal individual).
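Since the displayed operator equations did not survive extraction, the following NumPy sketch illustrates how the three AEO operators (production, consumption with herbivore/carnivore/omnivore behaviors, and decomposition) cooperate to minimize a test function; the population size, iteration count, and behavior-selection rules are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def aeo_minimize(f, lb, ub, n=30, dim=5, iters=200, seed=0):
    """Minimal AEO sketch: population sorted worst-first (index 0 = producer)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n, dim))
    fit = np.apply_along_axis(f, 1, X)
    order = np.argsort(fit)[::-1]          # worst-first: X[0] producer, X[-1] best
    X, fit = X[order], fit[order]
    best_x, best_f = X[-1].copy(), fit[-1]
    for t in range(iters):
        newX = X.copy()
        # --- Production: producer moves between the best individual and a random point
        a = (1 - t / iters) * rng.random()
        x_rand = rng.uniform(lb, ub, dim)
        newX[0] = (1 - a) * X[-1] + a * x_rand
        # --- Consumption: each consumer eats the producer and/or another consumer
        for i in range(1, n):
            v1, v2 = rng.standard_normal(dim), rng.standard_normal(dim)
            C = 0.5 * v1 / np.abs(v2)      # Levy-flight-like consumption factor
            r = rng.random()
            if r < 1 / 3:                  # herbivore: eats producer only
                newX[i] = X[i] + C * (X[i] - newX[0])
            elif r < 2 / 3 and i > 1:      # carnivore: eats a random higher-energy consumer
                j = rng.integers(1, i)
                newX[i] = X[i] + C * (X[i] - X[j])
            else:                          # omnivore: mixes producer and a consumer
                r2 = rng.random()
                j = rng.integers(1, i) if i > 1 else 0
                newX[i] = X[i] + C * (r2 * (X[i] - newX[0])
                                      + (1 - r2) * (X[i] - X[j]))
        newX = np.clip(newX, lb, ub)
        newfit = np.apply_along_axis(f, 1, newX)
        improved = newfit < fit            # greedy selection
        X[improved], fit[improved] = newX[improved], newfit[improved]
        if fit.min() < best_f:
            best_f, best_x = fit.min(), X[np.argmin(fit)].copy()
        # --- Decomposition: spread individuals around the decomposer (best individual)
        D = 3 * rng.standard_normal()              # decomposition factor D = 3u, u ~ N(0,1)
        e = rng.integers(1, 3) * rng.random() - 1  # weight coefficients e and h
        h = 2 * rng.random() - 1
        decX = np.clip(best_x + D * (e * best_x - h * X), lb, ub)
        decfit = np.apply_along_axis(f, 1, decX)
        improved = decfit < fit
        X[improved], fit[improved] = decX[improved], decfit[improved]
        order = np.argsort(fit)[::-1]              # re-sort worst-first
        X, fit = X[order], fit[order]
        if fit[-1] < best_f:
            best_f, best_x = fit[-1], X[-1].copy()
    return best_x, best_f

sphere = lambda x: float(np.sum(x ** 2))
x_star, f_star = aeo_minimize(sphere, -5.0, 5.0)
print(f_star)  # a small nonnegative value
```

In the IRA-AEODL context, f would instead score an SSAE hyperparameter vector (e.g., sparsity proportion and regularization factors) by validation loss.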

Results and Discussion
In this section, the experimental validation of the IRA-AEODL technique is examined under various aspects. Table 1 and Figure 3 report a comparative average throughput (ATHRO) study of the IRA-AEODL technique with recent models [20]. The outcomes indicate the increased ATHRO values of the IRA-AEODL technique under all K values. For K = 2, the IRA-AEODL technique obtains a higher ATHRO value of 1.62 bps while the MP, RP, MAB, DQL, and MADDPG [21] models accomplish reduced ATHRO values of 0.71 bps, 0.72 bps, 1.43 bps, 1.50 bps, and 1.57 bps, respectively. Similarly, with K = 6, the IRA-AEODL technique reaches an improved ATHRO of 1.72 bps while the MP, RP, MAB, DQL, and MADDPG models result in reduced ATHRO values of 1.20 bps, 1.06 bps, 1.47 bps, 1.59 bps, and 1.66 bps, respectively. The proposed DNN was trained on an offline dataset of simulated UAV-aided mmWave networks. The parameters of the proposed algorithm were optimized to obtain the best learning performance. The training was conducted for 1000 epochs using Keras and TensorFlow on an Nvidia GTX 1060 GPU. The accuracy of the proposed DNN was compared against existing state-of-the-art algorithms. The results showed that the proposed IRA-AEODL technique achieved an average improvement of 11.5% over existing algorithms. This improvement is attributed to the stacked sparse autoencoder's ability to efficiently perform resource allocation and the AEO algorithm's ability to optimize the model. The proposed model is a Deep Neural Network (DNN) trained on a dataset of images of different fruits. The DNN architecture uses convolutional layers to extract features from the images, followed by a densely connected set of layers to identify the classes of fruits. The training process involves feeding the DNN model labeled images of each of the desired fruit classes; the model learns the features associated with each class and develops a set of weights that allows it to recognize which fruits belong to which class. After training has been completed, the model can be used to classify new images of fruits into their respective classes. Additionally, to improve accuracy, the model can be fine-tuned using data augmentation techniques, such as randomly adjusting the size and orientation of the images as well as adjusting the brightness and contrast. This can help the model better recognize the features in different images. Once training and fine-tuning are complete, the DNN can be tested with a set of validation images to ensure that it is able to accurately classify the different types of fruits. Once satisfactory accuracy has been achieved, the model can be deployed for use in applications.
Table 2 and Figure 4 demonstrate a comparative ATHRO study of the IRA-AEODL method with recent methods. The results represent the increased ATHRO values of the IRA-AEODL technique under varying time slots. For 100 time slots, the IRA-AEODL method attains a maximum ATHRO value of 1.84 bps whereas the MP, RP, MAB, DQL, and MADDPG methods attain decreased ATHRO values of 0.97 bps, 0.92 bps, 1.47 bps, 1.58 bps, and 1.72 bps, respectively. Similarly, with 300 time slots, the IRA-AEODL method attains an increased ATHRO of 1.83 bps while the MP, RP, MAB, DQL, and MADDPG methods result in decreased ATHRO values of 1.14 bps, 1.05 bps, 1.58 bps, 1.69 bps, and 1.75 bps, respectively. Table 3 and Figure 5 illustrate a comparative ATHRO study of the IRA-AEODL method with recent models under varying users. For 100 users, the IRA-AEODL technique obtains a higher ATHRO value of 1.84 bps while the MP, RP, MAB, DQL, and MADDPG methods accomplish reduced ATHRO values of 1.04 bps, 0.84 bps, 1.49 bps, 1.52 bps, and 1.70 bps, respectively. Similarly, with 300 users, the IRA-AEODL technique reaches an improved ATHRO of 2.28 bps while the MP, RP, MAB, DQL, and MADDPG models result in reduced ATHRO values of 1.43 bps, 1.36 bps, 1.79 bps, 1.91 bps, and 2.06 bps, respectively. Table 6 and Figure 8 report additional comparative results of the IRA-AEODL technique. Finally, the average reward examination of the IRA-AEODL technique with different models takes place in Table 7 and Figure 9. The results demonstrate that the IRA-AEODL technique gains increased reward values over other models. For instance, with 200 episodes, the IRA-AEODL technique attains an increased average reward of 1.41 while the MAB, DQL, and MADDPG techniques obtain reduced average rewards of 1.28, 1.34, and 1.29, respectively. Meanwhile, with 800 episodes, the IRA-AEODL technique attains an increased average reward of 1.44 while the MAB, DQL, and MADDPG methods attain reduced average rewards of 1.31, 1.38, and 1.43, respectively. Eventually, with 1600 episodes, the IRA-AEODL technique attains an increased average reward of 1.54 while the MAB, DQL, and MADDPG techniques obtain reduced average rewards of 1.29, 1.43, and 1.47, respectively. These results exhibit the superior performance of the IRA-AEODL technique over other existing models on UAV networks.

Conclusions
In this article, we introduced a new IRA-AEODL technique for the optimal allocation of resources in UAV networks. The presented IRA-AEODL technique is intended for the effectual allocation of resources in wireless UAV networks. Here, the IRA-AEODL technique focuses on maximizing the system utility over all users by jointly optimizing trajectory design, user association, and energy scheduling. To optimally allocate the UAV policies, the SSAE model is used in the UAV networks. For the hyperparameter tuning process, the AEO algorithm is used to enhance the performance of the SSAE model. The experimental results of the IRA-AEODL technique are examined under different aspects, and the outcomes show the better performance of the IRA-AEODL approach over recent state-of-the-art approaches. In comparison to other learning methods, the proposed algorithm has several advantages, such as fast convergence, improved local optimization ability, and a well-balanced global and local search. With the help of the production, consumption, and decomposition operators, the proposed model is able to quickly explore the search space and find the optimal solution. Therefore, the proposed algorithm is well suited for hyperparameter tuning, as it ensures optimal results, sustainability, and robustness compared to other learning models. In the future, the ensemble learning process can be included to improve the resource allocation performance of the IRA-AEODL technique.

Figure 1. Overall procedure of the IRA-AEODL system.


Figure 3. ATHRO analysis of the IRA-AEODL approach under varying UAVs.


Figure 4. ATHRO analysis of the IRA-AEODL approach under varying time slots.


Figure 5. ATHRO analysis of the IRA-AEODL approach under varying users.

Table 4 and Figure 6 depict a comparative ATHRO study of the IRA-AEODL technique with recent models. The outcomes indicate the higher ATHRO values of the IRA-AEODL technique under varying energy arrival Emax. For an energy arrival Emax of 80, the IRA-AEODL technique attains a higher ATHRO value of 1.73 bps, while the MAB, DQL, and MADDPG methods obtain lower ATHRO values of 1.55 bps, 1.66 bps, and 1.71 bps, respectively. Similarly, with an energy arrival Emax of 160, the IRA-AEODL technique reaches an improved ATHRO of 1.85 bps, while the MAB, DQL, and MADDPG models result in reduced ATHRO values of 1.75 bps, 1.80 bps, and 1.83 bps, respectively.


Figure 6. ATHRO analysis of the IRA-AEODL approach under varying energy arrival values Emax.

Table 5 and Figure 7 demonstrate a comparative ATHRO study of the IRA-AEODL technique with recent methods. The results indicate the higher ATHRO values of the IRA-AEODL technique under varying battery capacity (BC). For 3000 BC, the IRA-AEODL technique obtains a higher ATHRO value of 1.74 bps, while the MAB, DQL, and MADDPG methods achieve reduced ATHRO values of 1.55 bps, 1.63 bps, and 1.70 bps, respectively. Similarly, with 5000 BC, the IRA-AEODL technique reaches an ATHRO of 1.79 bps, while the MAB, DQL, and MADDPG models yield ATHRO values of 1.60 bps, 1.67 bps, and 1.79 bps, respectively.

Figure 7. ATHRO analysis of the IRA-AEODL approach under varying battery capacity.


Figure 8. ATHRO analysis of the IRA-AEODL approach with other systems under varying ETTUAV.

Table 7 and Figure 9 report the average reward analysis of the IRA-AEODL approach under varying episodes. With 800 episodes, the IRA-AEODL technique attains a higher average reward of 1.44, while the MAB, DQL, and MADDPG methods attain lower average rewards of 1.31, 1.38, and 1.43, respectively. Eventually, with 1600 episodes, the IRA-AEODL technique attains a higher average reward of 1.54, while the MAB, DQL, and MADDPG techniques obtain lower average rewards of 1.29, 1.43, and 1.47, respectively. These results exhibit the superior performance of the IRA-AEODL technique over other existing models on the UAV networks.

5. Conclusions

In this article, we introduced a new IRA-AEODL technique for the optimal allocation of resources in UAV networks. The presented IRA-AEODL technique is intended for the effectual allocation of resources in wireless UAV networks. Here, the IRA-AEODL technique focused on the maximization of system utility over all users, combined trajectory design, user association, and energy scheduling. To optimally allocate the UAV policies, the SSAE model is used in the UAV networks. For the hyperparameter tuning process, the AEO algorithm is used to enhance the performance of the SSAE model. The experimental results of the IRA-AEODL technique are examined under different aspects, and the outcomes confirm the improved performance of the IRA-AEODL approach over recent state-of-the-art approaches.

Figure 9. Average reward analysis of the IRA-AEODL approach under varying episodes.

Table 1. ATHRO analysis of the IRA-AEODL approach with other systems under varying UAVs.

Table 2. ATHRO analysis of the IRA-AEODL approach with other systems under varying time slots.


Table 3. ATHRO analysis of the IRA-AEODL approach with other systems under varying users.


Table 4. ATHRO analysis of the IRA-AEODL approach with other systems under varying energy arrival values Emax.


Table 5. ATHRO analysis of the IRA-AEODL approach with other systems under varying battery capacity.



Table 6 and Figure 8 depict a comparative ATHRO study of the IRA-AEODL technique with recent models. The results indicate the higher ATHRO values of the IRA-AEODL technique under varying Energy Transfer between Two UAVs (ETTUAV). For 3000 ETTUAV, the IRA-AEODL technique attains a higher ATHRO value of 1.69 bps, while the MAB, DQL, and MADDPG methods achieve reduced ATHRO values of 1.54 bps, 1.58 bps, and 1.65 bps, respectively. Similarly, with 5000 ETTUAV, the IRA-AEODL technique reaches an improved ATHRO of 1.78 bps, while the MAB, DQL, and MADDPG models result in reduced ATHRO values of 1.65 bps, 1.70 bps, and 1.77 bps, respectively.

Table 6. ATHRO analysis of the IRA-AEODL approach with other systems under varying Energy Transfer between Two UAVs.
Drones 2023, 7, x FOR PEER REVIEW

Finally, the average reward examination of the IRA-AEODL technique with different models takes place in Table 7 and Figure 9. The results demonstrate that the IRA-AEODL technique gains higher reward values than the other models. For instance, with 200 episodes, the IRA-AEODL technique attains a higher average reward of 1.41, while the MAB, DQL, and MADDPG techniques obtain lower average rewards of 1.28, 1.34, and […], respectively.

Table 7. Average reward analysis of the IRA-AEODL approach with other systems under varying episodes.
