On the Application of Machine Learning to the Design of UAV-Based 5G Radio Access Networks

Abstract: A groundbreaking design of radio access networks (RANs) is needed to fulfill 5G traffic requirements. To this aim, a cost-effective and flexible strategy consists of complementing terrestrial RANs with unmanned aerial vehicles (UAVs). However, several problems must be solved in order to effectively deploy such UAV-based RANs (U-RANs). Indeed, due to the high complexity and heterogeneity of these networks, model-based design approaches, often relying on restrictive assumptions and constraints, exhibit severe limitations in real-world scenarios. Moreover, designing a set of appropriate protocols for such U-RANs is a highly sophisticated task. In this context, machine learning (ML) emerges as a useful tool to obtain practical and effective solutions. In this paper, we discuss why, how, and which types of ML methods are useful for designing U-RANs, by focusing in particular on supervised and reinforcement learning strategies. Future research directions include: (a) design of ML-based cross-layer algorithms; (b) integration of multiple-antenna devices in ML solutions; (c) development of ML approaches explicitly accounting for security aspects; and (d) investigation of the pros and cons of unsupervised learning procedures.


Introduction
Future cellular communication systems should be capable of adaptively changing their radio access functions in response to dynamic changes in the environment [1,2]. In particular, radio access networks (RANs) have to cope with the high variability of data traffic patterns by deploying additional stations, acting either as base stations (BSs) or relay stations (RSs), whenever required and in case of unexpected events, such as peaks of multimedia data traffic and the occurrence of disasters. In such situations, conventional deployment of terrestrial BSs and RSs is not feasible, due to placement costs, long installation time, and environmental limitations.
There has recently been increasing interest in integrating communication devices mounted on board unmanned aerial vehicles (UAVs) into 5G and beyond networks [3]. From a communication perspective, UAVs can operate (see Figure 1) as flying stations (BSs or RSs) to increase the coverage area, balance traffic load, and enhance network capacity [4]. Thanks to their inherent features, such as mobility, flexibility, and variable altitude, UAV-based stations can be deployed faster, offer higher flexibility for reconfiguration, and provide line-of-sight (LoS) connectivity towards users. In spite of their many benefits, integration of UAVs into terrestrial networks introduces some challenges, both from a theoretical and practical viewpoint, which must be properly addressed [5,6], such as propagation channel modeling, 3D trajectory planning, energy-efficient design, radio resource allocation, and user association. In addition to the customary throughput-delay trade-off [7], further trade-offs must be taken into account: (i) probability of LoS connection versus flying altitude and signal-to-noise ratio (SNR); (ii) number of UAVs, supported data rate, and network lifetime versus signal-to-interference-plus-noise ratio (SINR); and (iii) speed of the UAVs and network coverage versus energy consumption and outage probability.
From the physical (PHY) to the application layer, effective protocol design for UAV-based RANs (U-RANs) is a challenging task, since a large number of heterogeneous variables must be taken into account, such as physical characteristics of the UAVs, flight duration, storage capacity, distance from terrestrial nodes, terrestrial node density, and communication requirements. Therefore, model-based optimization approaches for U-RANs typically include many restrictive assumptions and constraints [8][9][10][11], which limit their usefulness in practice. Moreover, they do not provide the adaptivity needed to cope with rapidly changing system parameters.
In recent years, data-driven machine learning (ML) methods, including support vector machines, decision-tree learning, Bayesian networks, genetic algorithms, and rule-based learning, have been developed and utilized to support the design and operation of complex communication systems [12]. Very recently, innovative ML techniques, such as deep learning (DL) and reinforcement learning (RL) methods, have attracted attention for UAV-based wireless communications [13][14][15][16][17][18][19][20][21]. The advantage of using ML tools for the design of U-RANs stems from the fact that they easily allow one to take into account application-specific issues, such as, among others, the choice of the best type of UAV [22][23][24][25][26][27][28], Doppler effects due to the UAV motion, dynamic positioning, interference management, and load balancing, which are instead difficult to incorporate into more conventional model-based design approaches.
Similarly to other applications, e.g., those typical of the Internet of Things (IoT) and smart cities, the use of advanced ML tools for U-RAN design is expected to ease fulfillment of the stringent requirements of 5G and beyond systems [29]. Indeed, ML tools are capable of automatically extracting valuable hidden information from the huge amount of available data, and can take into account many parameters that model-based approaches typically do not consider, not even in an oversimplified manner. Additionally, ML models can easily incorporate different metrics, such as energy consumption of UAVs for hovering and transmitting/receiving information, into a single feature called information bit per joule of flying time [30]. Motivated by the above considerations, the main focus of this paper is to highlight the possible U-RAN scenarios in which the use of ML can allow system designers to take into account specific issues of UAVs, while pointing out the fundamental limits of model-based approaches.
The paper is organized as follows. Related work is discussed in Section 1.1, which also underlines the novel issues covered by our survey. In Section 2, the main application scenarios are introduced, whereas technical challenges and requirements are discussed in Section 3, including the main disadvantages of conventional model-based solutions [31][32][33][34][35] when applied to this problem. The principles of ML techniques are discussed in Section 4, with reference to a number of typical problems in U-RAN design. Finally, promising future research topics and their potential impacts are outlined in Section 5.

Related Work
A survey on UAV positioning for throughput maximization using DL tools has been presented in [36]. Multi-layer perceptron (MLP) and long short-term memory (LSTM) approaches are used to determine the UAV position maximizing the overall system performance and user throughput. Specific ML methods are applied, including a hybrid MLP-LSTM model for classification/regression tasks and K-means algorithms for automatic clustering of classes. Moreover, it is suggested to apply DL for indoor-outdoor UAV positioning in various environments, such as urban, suburban, and rural areas. Our paper is complementary to [36], since we present additional U-RAN applications and discuss potential challenges by highlighting the possible solutions based on supervised learning (SL) and RL methods.
Survey [37] provides a detailed description of all relevant research, wherein ML techniques have been employed in U-RAN to improve different design and functional aspects, such as channel modeling, resource management, positioning, and security. Although [37] is a rather comprehensive survey, some U-RAN applications and challenges have not been discussed. Moreover, approaches relying on SL are not considered for several U-RAN scenarios and applications. In our paper, instead, we have considered all the main applications of U-RAN along with their challenges, by delineating how the available information can be used in SL and multi-objective SL approaches.
ML applications in wireless sensor networks from networking and application perspectives have been discussed in the survey [25], which gives a good perspective of general applications in sensor networks that can be solved using ML methods. However, the use of UAVs as BSs (i.e., U-RANs) is not considered in this work.
Several UAV applications have been discussed in [38]. A number of existing applications have been categorized into groups that have distinct qualitative and quantitative communication needs. However, Ref. [38] is mainly devoted to performance analysis, and recent solutions based on ML are not discussed.

Application Scenarios
In the following subsections, three different scenarios where UAVs have a major impact on system design are briefly outlined. We do not consider those applications wherein network optimization issues are less challenging, e.g., UAVs used as static data collectors/aggregators or gateways.

UAV-Mounted Base Stations
A UAV-mounted BS is a flying vehicle equipped with one or more different types of payloads, aimed at providing radio access to ground users (see Figure 1). Flying BSs can be used in non-accessible areas or in all those situations where temporary, unexpected, and critical events require additional communication resources. UAV-based BSs with caching capabilities can also provide download links to ground users in content delivery applications, thus significantly decreasing terrestrial backhaul demands.
The coverage area supported by a UAV-mounted BS is a function of many parameters [1,39,40,41], such as the type of RAN, the transmission power (depending on large- and small-scale fading of the channel), the antenna characteristics, the environment, the user density, and the UAV characteristics (i.e., flying altitude, battery lifetime, hovering time duration, storage size, and processing capabilities). In particular, multi-tier U-RANs serving the same area by several cells of different sizes can be employed to accommodate large traffic volumes while keeping high quality of service (QoS) and good coverage [10,42,43].
With respect to conventional RANs, the use of flying BSs allows one to better serve ground users with different requirements in terms of mobility, data rate, delay sensitivity, and reliability. Furthermore, they can be used to easily integrate recently developed communication technologies, such as free space optical (FSO) or millimeter-wave (mmWave) transmissions, which can achieve high data rate, assuring low spectrum costs [44][45][46].

UAV-Based Cooperation
Relaying is a way to improve transmission reliability over fading channels, by opportunistically taking advantage of the broadcast nature of wireless communications [47]. Due to practical constraints, i.e., limited mobility and wired backhauls, many relays in terrestrial RANs are placed in fixed locations, thus providing only static relaying functionalities. Using UAVs as relays (see Figure 2) introduces additional degrees of freedom: for instance, UAVs can move between a source-destination pair, by uploading data from the source and, then, downloading it to the destination (so-called data ferrying) [38]. In addition to relaying, cooperation among BSs can be beneficial to system performances as well. Specifically, neighboring UAV-based BSs can cooperate among themselves and/or with terrestrial networks, with the aim of decreasing the number and the latency of handovers, optimizing power and resource allocation, and mitigating interference of terrestrial and air-to-ground (A2G) transmissions [48,49].

UAV-Based Software-Defined Networks
Network function virtualization (NFV) is commonly used to virtualize network functions on servers with general-purpose, standard storage devices, and switches, thus allowing a programmable network architecture, which is particularly useful to seamlessly integrate UAVs into existing networks. Moreover, by separating the control and data planes, software-defined networking (SDN) provides centralized control, a comprehensive view of the network, ease of reconfiguration, and arrangement of network functions through flow-based networking.
In a cellular network, a centralized SDN controller performs efficient radio resource and mobility management, which is particularly important to enable U-RANs [40]. For instance, SDN-based load balancing can be very useful for multi-tier U-RANs, so as to suitably weigh up the load of each UAV-mounted BS and terrestrial BS. Moreover, SDN allows faster switching between RANs, enabling the use of different RANs in multi-tier UAV-based cell networks.
U-RANs can be particularly useful in cloud computing architectures (see Figure 3), since they can perform remote processing and computing, either to reduce the load of the main cloud center, or to substitute damaged parts of the terrestrial networks in case of natural disasters. Furthermore, network architectures based on hierarchical SDN controllers are able to allow granular management of flows through UAV cells to implement efficient handoff and routing procedures [38,48,49].

Technical Challenges and Requirements
Compared to terrestrial networks, the use of UAVs in RANs offers additional degrees of freedom at the designer's disposal. However, their integration in future network infrastructures poses important technical challenges and requirements. Although some of these issues are application-dependent, we describe herein the main common problems. Moreover, even though such issues are presented separately, they might be closely intertwined in practice.

Payload and Flight-Time Constraints
Size and weight limitations of UAVs induce severe restrictions on the embarked payload (including cameras, sensors, and transceiver machinery), which motivates the need for a careful choice of the most efficient type of UAV, especially in multi-tier UAV cells. Moreover, UAVs have limited on-board batteries and are subject to strict flight regulation rules and weather conditions. Therefore, minimizing the flight time, while meeting the demands needed to optimize the service performance, is another challenge in U-RANs [50][51][52][53]. For instance, in U-RANs supporting different applications, the choice of the type of UAV depends on payload, data generating period, delay sensitivity, and required amount of data processing. As another example, the usage of heavier UAVs is more suitable in windy weather, since they are more robust. However, their hovering time is limited and their maneuvering flexibility is low.

Optimal UAV Placement and Trajectory Optimization
UAVs are expected to operate in highly dynamic and heterogeneous environments, since they must serve users moving with different speeds and characterized by different communication requirements. Moreover, some network parameters, such as cell load and backhaul conditions, can be changing as well. In this respect, 2D/3D optimal placement and trajectory optimization of UAVs are crucial issues, which are even more challenging if the UAVs are allowed to cooperate with each other. In particular, the altitude of the UAVs (which depends on the antenna beamwidth and on the size and user density of the area to be covered) and the number of deployed UAVs are further issues that should be considered, and they considerably increase the complexity of system design [53,54].
Many state-of-the-art solutions do not provide the required flexibility or adaptability to cope with environmental changes, since mobility of users is not adequately taken into account and optimized UAV placement is carried out only for specific and/or static scenarios. In this case, any environmental change imposes a redesign of the whole system in order to adapt it to the new situation. For these reasons, these types of solutions might not be suitable for practical situations, wherein the environment is frequently changing [53][54][55].

Channel Acquisition and Reconstruction
In many applications involving U-RANs, acquisition and tracking of channel state information (CSI) at the transmitters and/or receivers is of paramount importance to optimize system performance [56]. Thus, large- and small-scale characterization of the communication channel plays a major role in U-RANs, especially for multiple-input multiple-output (MIMO) systems and mmWave transmissions. Most existing channel models and CSI acquisition/tracking methods targeted at terrestrial RANs cannot be directly applied to U-RANs. Indeed, although a UAV flying over a non-crowded zone might experience favorable propagation conditions (e.g., strong LoS with negligible multipath), CSI acquisition is generally more challenging since both communicating parties (i.e., access points and user devices) might be in motion. Moreover, high-speed UAVs flying in urban environments experience different propagation scenarios (crowded and narrow streets, large roads, parks, etc.), which might induce variable SINR values and frequent changes in the fading scenario from LoS to non-LoS (NLoS) states [57]. In these cases, the capacity of radio links exhibits a highly dynamic nature, which depends on an extremely large number of parameters. Therefore, one of the major tasks in cellular U-RANs is to continuously monitor the link quality and determine when handoff is required. If poor signal quality is not detected fast enough (due to rapid changes in channel state), the system capacity may suddenly drop due to excessive co-channel interference and/or undue switching load.
A reference scenario using ML methods to overcome channel modeling problems is given in [56]. The challenge hence is to reconstruct the path loss map of the environment on the basis of local measurements, together with the path loss values for those geographical locations where no measurements are available [55,58,59].

Backhauling
Satellite and 5G technologies are considered as the main candidate technologies to provide wireless backhauling for U-RANs. Satellite backhauling offers unlimited coverage and allows aerial networks to be connected for any practical distance (see Figure 4). However, the latency introduced by satellite links considerably affects delay-sensitive services, such as real-time audio/video conversations. On the other hand, compared to satellite backhauling, terrestrial links have the advantages of lower cost and latency, but they provide reduced backhaul coverage. Other solutions for wireless backhauling are mmWave and FSO communications with ground stations. The choice of the best backhauling technology is a challenging issue possibly affecting the overall performance of U-RANs [41,59,60,61,62].

ML-Based Designs
Design of wireless communications systems is dominated by model-based approaches. However, it is difficult to develop accurate models in highly dynamic environments and, moreover, most existing algorithms cannot fully process and utilize the available data. As a result, much valuable information and many patterns in the data, the so-called hidden information, remain unexploited.
ML techniques can be a powerful tool to design U-RANs using example data or past experience [15,17,18,37]. ML algorithms can be classified as supervised, unsupervised, and reinforcement learning. In supervised learning (SL), the aim is to automatically acquire a mapping from the input to the output, whose correct values (referred to as labels) are provided by a supervisor during a training phase. In very complex scenarios with multiple UAVs, there are many parameters that can be considered as training input data for SL. In this case, an ML box can be used that takes as input the features and variables related to the defined trade-offs of the network, and a loss function can be defined based on the given scenario. However, a sufficient amount of training data may not be available in many cases. In unsupervised learning, there is no such supervisor, and the algorithm must autonomously find similarities, patterns, and differences in the available data without any prior training.
In some applications, the output of a system is a sequence of actions. In such a case, an ML algorithm should be able to find the best sequence of actions, i.e., a policy, to reach a given goal, by observing past good action sequences. Such learning methods are called reinforcement learning (RL) algorithms, and are based on sequential decisions, where the input at the next step depends on the decision of the system. In contrast to SL, where the algorithm learns by analyzing a labeled training dataset, RL algorithms explore the environment in order to learn which actions are best taken in specific states. Due to their capability of online learning, RL algorithms can be particularly suitable for U-RANs. Examples of basic RL elements, namely actions, states, and reward functions, are reported in Table 1. As an example, let us consider a non-flat area including many buildings, trees, and open area markets, and suppose the goal is to find the best location of a UAV that has to broadcast some data to the users. In this case, an RL approach will consider the number of covered users as a reward function. The state can be defined in terms of UAV movement directions, the agent is the UAV itself [15,19], and the actions are UAV speed values and trajectories. This approach can help to find the best location of a UAV in an online manner when no training dataset is available, whereas finding the best UAV location through conventional optimization methods is too complicated [20]. Moreover, SL and RL can be combined [1,37,63].
In general, a reward function can be defined based on the scenario, e.g., increasing coverage or the number of users, as shown in Table 1. In a well-defined area with a uniform distribution of users, increasing the number of supported users can be considered as an improvement in coverage. Moreover, coverage can be increased by increasing the UAV altitude, but interference will increase as well. Furthermore, hovering time decreases when the UAV altitude is increased. UAVs take different actions in a defined period of time, and the number of satisfied users is counted in each step. Based on the defined reward function, the configuration with the highest number of supported users is chosen. This can be done in real time if the user traffic is not delay-sensitive. On the contrary, when delay-sensitive applications must be supported by U-RANs encompassing many UAVs, RL may not be a convenient choice, as it might take a long time to find the best location based on the reward function [64]. In contrast, if enough data for training are available, SL might allow for obtaining real-time adjustments of UAV locations. Generally speaking, SL is faster than RL if a training dataset is available.
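To make the RL formulation of the placement example more concrete, the following is a minimal sketch of tabular Q-learning on a toy grid, under simplifying assumptions that are not taken from the cited works: the state is the UAV grid cell (rather than its movement direction), the actions are single-cell moves or hovering, and the reward is the number of users within a fixed coverage radius. The grid size, user positions, hyperparameters, and function names are all hypothetical.

```python
import random

# Hypothetical toy scenario (not from the paper): five ground users on a 5x5 grid.
GRID = 5
USERS = [(3, 3), (3, 4), (4, 3), (4, 4), (0, 0)]
# Actions: hover, or move one cell along +x, -x, +y, -y.
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def covered_users(pos, radius=1):
    """Reward: number of users within `radius` cells (Chebyshev distance) of the UAV."""
    return sum(1 for ux, uy in USERS
               if max(abs(ux - pos[0]), abs(uy - pos[1])) <= radius)


def step(pos, action):
    """Move the UAV by one action, clipping at the border of the area."""
    x = min(max(pos[0] + action[0], 0), GRID - 1)
    y = min(max(pos[1] + action[1], 0), GRID - 1)
    return (x, y)


def train(episodes=2000, horizon=20, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration and random start cells."""
    rng = random.Random(seed)
    n_actions = len(ACTIONS)
    q = {((x, y), a): 0.0
         for x in range(GRID) for y in range(GRID) for a in range(n_actions)}
    for _ in range(episodes):
        pos = (rng.randrange(GRID), rng.randrange(GRID))
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)  # explore
            else:
                a = max(range(n_actions), key=lambda i: q[(pos, i)])  # exploit
            nxt = step(pos, ACTIONS[a])
            reward = covered_users(nxt)
            best_next = max(q[(nxt, i)] for i in range(n_actions))
            # Standard Q-learning update.
            q[(pos, a)] += alpha * (reward + gamma * best_next - q[(pos, a)])
            pos = nxt
    return q


def greedy_rollout(q, start, horizon=20):
    """Follow the learned greedy policy from `start` and return the final cell."""
    pos = start
    for _ in range(horizon):
        a = max(range(len(ACTIONS)), key=lambda i: q[(pos, i)])
        pos = step(pos, ACTIONS[a])
    return pos
```

Calling `greedy_rollout(train(), (0, 0))` returns the cell at which the learned policy settles when the UAV starts in a corner. In this toy layout the cluster of four users dominates the reward, so the policy is expected to leave the isolated user and hover near the cluster; no labeled dataset is needed, which is the point made in the text.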
In highly dynamic environments, there might exist unknown relationships between the variables in the input dataset. In a traditional neural network, all inputs (and outputs) are assumed to be independent of each other and, thus, such a network is unable to capture data interdependencies [1,37,63]. The idea behind recurrent neural networks (RNNs), instead, is to make use of sequential information, in order to capture long-term dependencies in the input dataset, such as UAV altitude and hovering time. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous computations. It is worthwhile to note that data collected using RNN and RL can be used as input datasets for SL.
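The recurrence that gives RNNs their name can be illustrated with a scalar sketch. The code below is only an illustrative assumption (a single hidden unit with hypothetical weights `w_x`, `w_h`, and bias `b`), not the architecture used in any of the cited works; it shows that the same weights are applied to every element of the sequence and that each output depends on the previous hidden state.

```python
import math


def rnn_forward(xs, w_x, w_h, b=0.0, h0=0.0):
    """Scalar Elman-style recurrence: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = h0
    states = []
    for x in xs:
        # The same weights (w_x, w_h, b) are reused at every step of the sequence.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states
```

With `w_h = 0` the unit has no memory and the last state depends only on the last input; with a nonzero `w_h`, earlier inputs (for instance, earlier altitude samples in a hypothetical altitude sequence) influence later states, so the order of the sequence matters.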
The focus of the rest of the paper is on the use of SL and RL in all those cases where UAVs are employed as a part of a RAN. In the following subsections, applications and benefits of such ML tools in U-RANs are discussed in detail.

Radio Resource Allocation
Radio resource allocation (RRA) in U-RANs includes functionalities such as power control, spectrum management, beamforming design, backhaul management, cache management, and computational resource management. Such tasks can be jointly managed by resorting to ML tools, as depicted in Figure 5. User patterns and behavior, such as spatial positions and communication requirements (e.g., required data rate, delay tolerance, and reliability), are the main inputs that can be injected into the ML box. User pattern data can be gathered in an offline manner or can be predicted at a specific time during the day. RL techniques can be used to single out the best A2G connections that minimize interference, whereas SL approaches are more suitable for dynamic bandwidth assignments. The number of long-distance connections should be accurately controlled to mitigate interference. The optimum number of connections can also be obtained through a learning process using RL: the state is the number of links at a certain time, and the action consists of adding or removing a certain number of UAV-based radio links. When there is no dataset available, RL can also be used to generate input data for the SL approach. Hidden information in user traffic patterns and the priority of some users over others can be extracted from data via ML techniques in order to build an area traffic map. UAV features, such as flight time and speed, and characteristics of the RAN interface, can be used in combination with a path loss map and specific information on weather conditions [11,34,40,55,65].

Design of Collectors and Relays
UAVs can be used as collectors and relays in IoT applications [66,67]. With reference to Figure 6, the UAV starts its journey towards the source (S) at time t = 0 with speed v, to collect its data at t = δ/2 (first phase or hop). In the second phase or hop, which starts at t = δ, the UAV flies to the destination (D) and sends the data in its buffer to D at t = 3δ/2. It is obvious that, compared to static relaying with fixed relays at a given position, such a mobile relaying strategy allows one to shorten the link distance in each of the two hops. With reference to this problem, an approach based on genetic algorithms (GAs) has been proposed in [68], where a delay-tolerant network is considered with ground nodes far apart from each other. Such nodes communicate through flying UAVs from one location to another and a static area having nodes with fixed traffic generating features is assumed. Results in [68] show that the GA-based approach outperforms other schemes in terms of delivery ratio, although it does not always present the best average delay. However, in real operative scenarios, data traffic dynamically changes and optimally finding the speed and location of UAVs to support the nodes efficiently is an open problem. The design of the UAV trajectory, to collect and ferry data from multiple widespread sensor nodes, has been considered in [69]. To this aim, an RL algorithm is used by assuming a slowly changing scenario, in which a single UAV can potentially adapt its trajectories to these changes. However, the extension to the case of multiple UAVs and the incorporation of other rapidly changing variables, such as channel state information and user requirements in terms of delay, still remain to be addressed. Other open issues are pointed out in [69,70].
The choice of the best positions to collect data from S and deliver it to D, as well as the storage capacity and speed of the UAV, are crucial issues to fulfill a QoS target in terms of reliability and latency, when several nodes and UAVs are involved. For instance, UAVs should choose the best speed and stop point to fulfill all nodes' communication requirements. The best speed can be calculated by a regression model with SL and the stop point can be obtained by RL. Specifically, SINR values, activation periods, required data rate and delay sensitivity, and UAV features (i.e., buffer size, processing capability, and maximum hovering time) can be considered as training data in an SL approach, with the optimal speed of the UAVs as the desired output. With mobile collector-relaying, UAVs constantly fly between the source and the destination, with the aim of reducing the link distances during both the UAV information reception and relaying phases. Searching for the best location of UAVs with model-based approaches is a difficult problem, since many restrictions and assumptions must be considered. Such a task can instead be solved by resorting to RL approaches [38,71]. The agents in RL are the UAVs, and their flying parameters are the system state; the number of satisfied users measured in a certain time period is the reward function; the action consists of lowering or increasing the speed of the UAVs.
Another crucial aspect is the adjustment of the transmit power of the UAV relays, which can be evaluated by using an ML algorithm. In this case, the outage probability due to low power is considered as a loss function to be minimized. The location of the users, their densities and communication requirements, and the mutual distances among UAVs are inputs to the ML box. The reward or loss function can be the power level of the UAVs and the percentage of non-satisfied users. The latter can be evaluated through different methods; for example, it can be obtained by knowing the density of users in the area, the size of the coverage area, and the number of acknowledgment messages received by the UAVs from the users during a hovering time cycle.
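As an illustration of the regression step mentioned above, the sketch below fits an ordinary least-squares model that maps scenario features to a UAV speed. It is a minimal example under stated assumptions: only two features are used (a hypothetical SINR value in dB and a hypothetical delay tolerance in seconds), the training labels are synthetic and generated by an arbitrary linear rule, and the function names `fit_linear` and `predict` are introduced here for illustration only.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations, with a bias term."""
    rows = [[1.0] + list(x) for x in X]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]  # [bias, w_1, ..., w_d]


def predict(w, x):
    """Predicted speed for feature vector x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Synthetic training set (assumption): features are (SINR in dB, delay tolerance in s),
# labels follow the arbitrary rule speed = 20 - 0.5 * sinr - 2 * delay (in m/s).
X_TRAIN = [(5, 1), (10, 2), (15, 1), (20, 3), (8, 4)]
Y_TRAIN = [20 - 0.5 * s - 2 * d for s, d in X_TRAIN]
```

After `w = fit_linear(X_TRAIN, Y_TRAIN)`, the call `predict(w, (12, 2))` gives the speed suggested for a new scenario. In practice the feature vector would also include the activation periods, required data rates, and UAV features listed in the text, and a nonlinear regressor may be preferable.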
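The estimate of the percentage of non-satisfied users described above can be sketched as follows. The sketch assumes, for illustration only, that users are uniformly distributed over the coverage area and that each satisfied user sends exactly one acknowledgment per hovering time cycle; the function and parameter names are hypothetical.

```python
def unsatisfied_fraction(user_density, coverage_area, acks_received):
    """Fraction of users in the coverage area that did not acknowledge.

    user_density: users per square meter (assumed uniform).
    coverage_area: size of the covered area in square meters.
    acks_received: acknowledgments collected during one hovering cycle,
                   assuming one acknowledgment per satisfied user.
    """
    expected_users = user_density * coverage_area
    if expected_users <= 0:
        raise ValueError("expected number of users must be positive")
    satisfied = min(acks_received, expected_users)
    return 1.0 - satisfied / expected_users
```

For example, with a density of 0.01 users per square meter over 10,000 square meters and 80 acknowledgments, the estimate is about 0.2, which can then be used as (part of) the loss when adjusting the transmit power.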

Choice of the Type of UAV
The main energy expenditures of UAVs with communication capabilities are due to the aircraft operations (i.e., hovering and flying) and radio transceivers. Different UAV features, such as speed, wing type, motion features, and size, may affect the energy consumption of the radio interface [53,63,66,67,69]. Therefore, the choice of the type of UAV (i.e., in terms of hovering time, speed range, storage capacity, and maximum download/upload speed) is important to support users with different communication requirements. This task can be accomplished via ML techniques.
As a first step, the area to be covered is divided into sub-areas, each of which has to be served by a specific UAV. By using available traffic patterns, each sub-area can be classified on the basis of the number of users and their communication requirements by means of SL approaches. The obtained classification is then used to choose the most appropriate type of UAV. For instance, if N types of UAVs with different features are available, an appropriate match for a sub-area can be chosen among them. For a given time slot, when the features of both the area traffic and the UAV change, the class of that area changes to a new one and a new UAV is employed to cover the considered area. This new UAV can again be chosen from the N available types (classification task), by choosing the one ensuring the best functionality in terms of hovering time, buffer size, achievable speed, and maximum altitude. Based on a dataset that can be collected in an offline manner, the aforementioned features can be chosen to be injected into an ML box. A schematic view of this process is given in Figure 7.
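The two-stage procedure (classify each sub-area from its traffic pattern, then map the class to a UAV type) can be sketched with a very simple supervised classifier. The nearest-centroid rule, the two normalized features, the class labels, and the UAV type names below are all assumptions made for illustration; the text does not prescribe a specific classifier.

```python
def fit_centroids(samples, labels):
    """Training: average the feature vectors of each class."""
    groups = {}
    for x, label in zip(samples, labels):
        groups.setdefault(label, []).append(x)
    return {label: tuple(sum(col) / len(xs) for col in zip(*xs))
            for label, xs in groups.items()}


def classify(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical labeled traffic patterns: (normalized user count, normalized rate demand).
SAMPLES = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.7)]
LABELS = ["low", "low", "high", "high"]

# Hypothetical catalog mapping each traffic class to one of the N available UAV types.
UAV_FOR_CLASS = {"low": "small_uav", "high": "large_uav"}


def select_uav(centroids, x):
    """Classify the sub-area, then look up the matching UAV type."""
    return UAV_FOR_CLASS[classify(centroids, x)]
```

Re-running `select_uav` in each time slot with the updated traffic features reproduces the behavior described above: when the class of a sub-area changes, a different UAV type is selected for it.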

Choice of the Number of UAVs Acting as BSs
When UAVs are used as BSs to serve ground users, the choice of the number and positions of UAVs greatly impacts the transmit power of each UAV, the coverage area, and the SINR at the receivers [1,3,7,20,63]. Joint optimization of the number and positions of the UAVs might be too complex to be performed in many scenarios of interest due to the dynamic system requirements of U-RANs, especially when UAVs have to be integrated with ground BSs. For this reason, they are often optimized separately [4,10,19,20,34,35,39,49]: in the first step, the best number of UAVs is found and, then, the optimal UAV positions are obtained for such a number. The former step is discussed herein, whereas the latter one is deferred to the next subsection.
The number of users also affects the distance between the UAVs and their flying altitudes. The most appropriate number of UAVs depends on the considered scenario. Let us consider an application where multiple UAVs share the same file to be requested by users and, moreover, the users have device-to-device (D2D) communication links in order to facilitate file download (see Figure 8). In this case, the optimum number of UAVs can be obtained by using RL methods in combination with SL. As a matter of fact, UAVs need high altitude to better support users in LoS conditions and manage interference effectively. If user patterns change, the number of UAVs has to be updated accordingly. This can be done by operating on a slot-by-slot basis: in each time slot, user patterns, such as activation period, payload, delay sensitivity, movement behavior and densities, are monitored.
Joint control of altitude and interference as a function of user patterns is a challenging task. To simplify the matter, static scenarios can be considered, and the number of UAVs needed to support a given area can be calculated through simulation using data from telecommunication operators, or via sophisticated, time-consuming optimization methods. In this scenario, RL can be used offline to generate a large data set for SL. In this case, the UAVs are the agents, the current number of UAVs is the state, the action consists of increasing the number of UAVs by one, and the reward function can be the number of supported users. In these static scenarios, the optimum number of UAVs can be calculated for different numbers of users and of interference-sensitive users in each area. If UAVs are employed to serve delay-sensitive users, the dataset can then be used to obtain the number of UAVs in real time by means of SL approaches. In this case, the users' behavior in a certain time slot, their sensitivity to interference, and their communication requirements are the features used as training data for the ML black box, and the label is the number of UAVs. Once the number of UAVs is known, and provided that the UAVs are evenly distributed over the area, the altitude of each UAV above the ground can be easily computed so as to cover the area with minimum overlap. Despite the several assumptions made, this ML algorithm may be the best available method to calculate the number of UAVs in each area, and it is worth investigating. Moreover, when a dataset is not available, the efficiency of an evolutionary algorithm, such as GA, in finding the optimal number and positions of the UAVs deserves further investigation.
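The offline RL formulation above (state: current number of UAVs; action: add one UAV; reward: number of supported users) can be sketched with tabular Q-learning. The sketch below rests on several assumptions that are not in the paper: a single centralized learner instead of one agent per UAV, an extra STOP action that ends the episode, and a toy reward model `supported_users` in which interference grows with the number of UAV pairs. All constants are hypothetical.

```python
import random

N_MAX = 5            # hypothetical upper bound on the fleet size
STOP, ADD = 0, 1     # actions: keep the current fleet, or add one UAV

def supported_users(n_uavs, users=100, cap=30, interference=2):
    """Toy reward model (assumption): capacity-limited coverage minus an
    interference penalty that grows with the number of UAV pairs."""
    return max(0, min(users, cap * n_uavs) - interference * n_uavs * (n_uavs - 1))

def train_q_table(episodes=2000, alpha=0.5, gamma=1.0, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = {n: [0.0, 0.0] for n in range(1, N_MAX + 1)}
    for _ in range(episodes):
        n = 1
        while True:
            action = STOP if n == N_MAX else rng.choice((STOP, ADD))
            if action == STOP:
                # Terminal step: the reward is the number of supported users.
                q[n][STOP] += alpha * (supported_users(n) - q[n][STOP])
                break
            # Adding a UAV yields no immediate reward; bootstrap from n + 1.
            target = gamma * max(q[n + 1])
            q[n][ADD] += alpha * (target - q[n][ADD])
            n += 1
    return q

def best_fleet_size(q):
    """Follow the greedy policy from a single UAV until it chooses STOP."""
    n = 1
    while n < N_MAX and q[n][ADD] > q[n][STOP]:
        n += 1
    return n
```

Under this toy model the score peaks at three UAVs (78 supported users out of 100), and `best_fleet_size(train_q_table())` recovers that value. Each pair of scenario parameters and learned fleet size could then be stored as one labelled sample for the SL phase.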
The percentage of users failing to download the file can serve as another loss function (i.e., reward and punishment), where adding or removing one UAV is the action and the number of UAVs is the state. The flowchart of this process is reported in Figure 9. As a first step, a certain number of scenarios is defined and the number of UAVs is obtained for each of them by using RL with the defined loss function. The results are stored as the input dataset and the desired outputs. Then, a new set of scenarios is defined, the output for these scenarios is generated by SL, and the loss function is checked. If the failure percentage exceeds the desired value, more data must be generated and injected into the SL box. Thus, data are generated by the RL method and stored until the desired value is achieved in the second phase. The number of additional scenarios can be set as a certain percentage of the number of scenarios used at the first step in the RL phase.
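The two-phase loop described above can be summarized in a short sketch. To keep it self-contained, the RL phase is replaced by an exhaustive search over a toy score (`rl_label`), the SL box by a 1-nearest-neighbour lookup (`sl_predict`), and a scenario is reduced to a single number of users; these stand-ins, the function names, and all numerical values are assumptions made for illustration only.

```python
import random

def rl_label(users, cap=30, interference=2, n_max=8):
    """Stand-in for the RL phase (assumption): exhaustively pick the fleet size
    that maximises a toy 'supported users' score for the scenario."""
    def score(n):
        return min(users, cap * n) - interference * n * (n - 1)
    return max(range(1, n_max + 1), key=score)

def sl_predict(dataset, users):
    """Stand-in for the SL box: 1-nearest-neighbour lookup on the user count."""
    nearest = min(dataset, key=lambda pair: abs(pair[0] - users))
    return nearest[1]

def build_dataset(initial=5, extra_fraction=0.4, max_failure=0.1,
                  n_check=200, max_rounds=100, seed=1):
    """Grow the labelled data set until the SL failure rate is acceptable."""
    rng = random.Random(seed)

    def draw():
        return rng.randint(10, 200)   # hypothetical scenario generator

    # Phase 1: label an initial batch of scenarios with the RL stand-in.
    dataset = [(u, rl_label(u)) for u in (draw() for _ in range(initial))]
    for _ in range(max_rounds):
        # Phase 2: define new scenarios and compare the SL output with the target.
        checks = [draw() for _ in range(n_check)]
        failure = sum(sl_predict(dataset, u) != rl_label(u) for u in checks) / n_check
        if failure <= max_failure:
            break
        # Otherwise go back to the RL phase and add a fraction of the initial size.
        extra = max(1, int(extra_fraction * initial))
        dataset += [(u, rl_label(u)) for u in (draw() for _ in range(extra))]
    return dataset, failure
```

`build_dataset()` returns the labelled data set together with the last measured failure rate; `extra_fraction` plays the role of the "certain percentage" of first-step scenarios mentioned in the text.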

Positioning of UAVs Acting as BSs
Traditionally, the locations of terrestrial BSs are mainly calculated on the basis of the user density and traffic requirements. When UAVs are used as BSs, their 3D positioning is a critical issue. Figure 10 shows a scenario in which different users are served by multiple UAVs. Specifically, in the upper plot of Figure 10, the users are served by seven UAVs (UAVs 1, 2, 3, ... support areas A, B, C, and so forth). Each UAV is associated with an area delimited by curves. The boundaries of these areas change as user traffic varies and the location of each UAV is updated accordingly, as in the lower plot of Figure 10. The partition is designed so as to maximize the coverage area while minimizing the flying time of the UAVs and their communication power expenditure.
As mentioned earlier, there is a fundamental trade-off between the energy consumption of UAVs for flying and communication power, which can be described in a single metric [30]. In particular, the authors in [30] have studied energy-efficient UAV communications with rotary-wing UAVs. The propulsion power consumption model of a rotary-wing UAV is derived, based on which an optimization problem is formulated so as to minimize the total UAV energy consumption, while satisfying the individual target communication throughput requirement for multiple ground nodes.
The scenario considered in [30] encompasses a single UAV, and dynamic traffic generation is not considered at all. The problem in [30] can be regarded as an SL task. RNN and RL can be used when the target is to maximize the number of users served by a single UAV. The same interplay between SL and RL depicted in Figure 9 can be applied in this scenario as well. When a certain target has to be maximized, RL is usually the best choice, and a reward function can be defined that includes both the energy consumption and the number of satisfied users. In RL, the UAVs are the agents and their current locations form the state. The action consists of finding the best location of each UAV, one after the other, in order to maximize the number of supported users.
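A reward that combines the number of served users with the energy spent can be illustrated as follows. The sketch is not an RL algorithm: it uses a greedy one-after-the-other placement over a finite set of candidate locations as a simple stand-in, works in 2D at a fixed altitude, and takes flight distance as a proxy for energy. The coverage radius, the energy weight, and the example coordinates are all hypothetical.

```python
from math import dist

def reward(candidate, start, uncovered, radius=1.5, energy_weight=0.1):
    """Newly covered users minus a travel-energy proxy (flight distance)."""
    covered = sum(dist(candidate, u) <= radius for u in uncovered)
    return covered - energy_weight * dist(candidate, start)

def place_uavs(starts, users, candidates, radius=1.5, energy_weight=0.1):
    """Place UAVs one after the other, each at its highest-reward candidate."""
    uncovered = list(users)
    placements = []
    for start in starts:
        best = max(candidates,
                   key=lambda c: reward(c, start, uncovered, radius, energy_weight))
        placements.append(best)
        # Users covered by this UAV no longer count for the following ones.
        uncovered = [u for u in uncovered if dist(best, u) > radius]
    return placements, len(users) - len(uncovered)

# Hypothetical example: two user clusters, two UAVs, three candidate locations.
USERS = [(0.5, 1.0), (1.0, 1.0), (1.5, 1.2), (8.0, 8.0), (8.5, 7.5)]
STARTS = [(0.0, 0.0), (9.0, 9.0)]
CANDIDATES = [(1.0, 1.0), (4.0, 4.0), (8.0, 8.0)]
```

With these values, `place_uavs(STARTS, USERS, CANDIDATES)` sends the first UAV to `(1.0, 1.0)` and the second to `(8.0, 8.0)`, covering all five users. In an RL setting, the same `reward` could be used as the per-step reward signal.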
In addition to the collected data, it is useful to have a sufficient amount of data for real-time SL, thus avoiding the delay incurred in finding the UAV configuration, which is particularly important for users with strict latency requirements. After the appropriate locations of the UAVs have been found by RL, the 3D locations of the UAVs are determined on the basis of detailed traffic patterns. More precisely, the users' interference sensitivity, density, data rate, delay sensitivity, and reliability are used as features to build the input dataset for a multi-output SL (MOSL) box, whose output is the set of 3D locations of the UAVs [72]. Multi-output learning subsumes many learning problems in multiple disciplines and supports complex decision making in many real-world applications. It has a multivariate nature, and the multiple outputs may have complex interactions, which structured inference is designed to handle. The output values can have diverse data types, depending on the type of ML problem.
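As an illustration of a multi-output SL box, the sketch below maps a traffic feature vector to a 3D UAV location with k-nearest-neighbour regression, averaging each output coordinate over the k closest training samples. The choice of k-NN, the three normalized features (user density, data rate, delay sensitivity), and the training values are assumptions made for the example; this simple regressor treats the three outputs independently and therefore does not model the output interactions mentioned above.

```python
from math import dist

def knn_predict(train_x, train_y, query, k=2):
    """Average the output vectors of the k training samples closest to query."""
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))[:k]
    dims = len(train_y[0])
    return tuple(sum(train_y[i][d] for i in order) / k for d in range(dims))

# Hypothetical training set: normalized (density, data rate, delay sensitivity)
# mapped to a UAV location (x, y, z) in metres, e.g. as produced by an RL phase.
TRAIN_X = [(0.1, 0.2, 0.1), (0.2, 0.1, 0.2), (0.8, 0.9, 0.7), (0.9, 0.8, 0.9)]
TRAIN_Y = [(100.0, 100.0, 120.0), (120.0, 80.0, 100.0),
           (400.0, 420.0, 60.0), (420.0, 400.0, 40.0)]
```

For instance, `knn_predict(TRAIN_X, TRAIN_Y, (0.15, 0.15, 0.15))` returns `(110.0, 90.0, 110.0)`, the average of the two low-traffic samples. The lookup involves no iterative optimization, which is the property that makes SL attractive for users with strict latency requirements.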

Design of a Mobile Cloud/Cloudlet
ML can be used to analyze the data that are stored and processed in the cloud (see Figure 3). A number of clouds, the so-called mobile clouds or cloudlets, can be placed on a certain number of UAVs, local clouds, and terrestrial BSs. The main challenges are collecting, storing, and processing big data so as to balance centralized (e.g., mobile cloud) and distributed (e.g., mobile edge computing) paradigms in a U-RAN. Cloudlets can have different cloud computing characteristics. For instance, in order to support the processing requirements of different users in the area, the cloudlet should have an optimized speed or location.
UAV-based cloudlets have been established in [73] as an efficient approach to reduce the latency and effectively enlarge the computational abilities of IoT devices. In particular, the problem of jointly optimizing the number and positions of deployed UAV cloudlets in 3D space in order to support IoT services with stringent latency requirements is discussed. The method of [73] exhibits good performance with low complexity, under some restrictive assumptions. For instance, it is assumed in [73] that the uplink bandwidth is equally distributed among the nodes and, moreover, that the average service times and QoS of the different users are equal. These assumptions may not be fulfilled in many scenarios of practical interest and, in this case, model-based solutions are difficult to find.
A very comprehensive survey on using U-RANs in this area has been provided in [70], where the main challenges of managing UAV-based cloud computing in heterogeneous environments are discussed without, however, providing possible solutions to the reported issues.
Let us consider a scenario where multiple UAVs are employed to offer cloud-based services to ground users with different communication requirements. The goal is to find the best speed of the mobile cloud and the number of UAVs that are equipped with cloudlets. The number of UAVs can be determined by means of SL, where the computing/processing time of a certain task and the percentage of satisfied users are used as the dataset. Indeed, if a cloudlet serves users with different communication requirements, the number of UAVs acting as processors in the cloudlet should be increased as the number of users rises. The users' density, their communication requirements, and the UAV features (e.g., processing/hovering time) can be considered as the input dataset for an SL box, whose output is the number of UAVs in the cloudlet.
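The relation between the number of users and the number of UAVs in the cloudlet can be sketched with the simplest possible SL box: a one-feature least-squares line, rounded up to a whole number of UAVs. Reducing the input to a single feature (the number of active users), the linear model, and the training records are all assumptions made for illustration; a realistic SL box would also use the communication requirements and UAV features listed above.

```python
from math import ceil

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def uavs_in_cloudlet(model, users):
    """Round the regression output up to a whole number of UAVs (at least one).
    The small offset guards against floating-point noise just above an integer."""
    slope, intercept = model
    return max(1, ceil(slope * users + intercept - 1e-9))

# Hypothetical offline records: (active users, UAVs that kept the users satisfied).
USERS = [20, 40, 60, 80]
UAVS = [1, 2, 3, 4]
MODEL = fit_line(USERS, UAVS)
```

With these records the fitted slope is 0.05 UAVs per user, so `uavs_in_cloudlet(MODEL, 50)` returns 3. Rounding up reflects the design choice of slightly over-provisioning rather than leaving users unserved.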
Deep RL can also be used to dynamically change the speed of the UAV cloudlet(s) in order to maximize user performance. This is relevant when such a dataset is not available and the users are not delay sensitive. In this case, the agents are still the UAVs, the action is an increase in the speed of the cloudlet, and the state is the current speed of the cloudlet. The reward function is the achieved percentage of satisfied users, as stated in previous sections.

Conclusions, Open Problems, and Future Directions
The use of UAVs is a promising approach for improving the QoS of 5G RANs due to their inherent ability to form LoS radio links in highly dynamic environments. However, the design and deployment of U-RANs is a challenging problem, which cannot be easily solved by conventional model-based techniques. In this respect, ML-based solutions are expected to be a viable alternative, able to properly take into account all the valuable information and parameters of the problem. In this paper, the advantages and potential of using ML algorithms in U-RANs have been discussed for a number of different network design scenarios that cannot be easily solved by conventional model-based methods. A summary of the considered applications and results of the survey is reported in Table 2. To meet the stringent requirements of 5G and beyond systems, additional parameters, such as latency and resiliency, must be used in ML algorithms for U-RAN design, as well as additional figures of merit based not only on QoS but also on the quality of experience (QoE) of the users. It is worth mentioning that the performance of ML algorithms may degrade, or the algorithms may exhibit unexpected behaviors, when the obtained solutions operate on data whose properties differ from those of the data used for training the model. Such issues deserve further investigation.
Specific issues for future work include: (a) design of ML-based cross-layer algorithms; (b) integration of multiple-antenna devices in ML solutions; (c) development of ML approaches explicitly accounting for security aspects; and (d) investigation of the pros and cons of unsupervised learning procedures.
Author Contributions: Conceptualization, V.K., F.V., G.G., and J.A.; methodology, V.K. and F.V.; writing-original draft preparation, V.K. and J.A.; writing-review and editing, F.V. and G.G.; supervision, F.V. All authors have read and agreed to the published version of the manuscript.