Using Reinforcement Learning in the Path Planning of Swarms of UAVs for the Photographic Capture of Terrains

Abstract: The number of applications using unmanned aerial vehicles (UAVs) is increasing. Many operators see more advantages in using UAVs in swarms than individually, since swarms reduce operational time and costs. The main objective of this work is to design a system that uses Reinforcement Learning (RL) and Artificial Neural Network (ANN) techniques to obtain a good path for each UAV in the swarm and to distribute the flight environment so that the captured images are as simple as possible to combine. To determine whether it is better to use a global ANN or multiple local ANNs, experiments were carried out over the same map with different numbers of UAVs at different altitudes. The results are measured by the time taken to find a solution. They show that the system works with any number of UAVs if the map is correctly partitioned. In addition, local ANNs appear to find solutions faster and to yield better trajectories than a single global network. Apart from the current state of the environment, no additional map information, such as targets or distance maps, is needed.


Introduction
There are more and more applications for the collective use of Unmanned Aerial Vehicles (UAVs), better known as UAV swarms. Besides the advantages of using these systems individually, the main motivation for using swarms is the reduction of flight time and operating costs, together with increased fault tolerance [1]. Advances in algorithm design [2] and telecommunications [3] make it possible to build collective systems that are almost entirely autonomous, so one operator per vehicle is no longer necessary. Currently, few systems in the literature solve these path planning problems for agricultural and forestry use, especially for optimizing field survey tasks. This sector can benefit greatly from the use of aircraft in groups; therefore, the main field of application of this project is field prospecting. The objective of this paper is to develop a system that uses Q-Learning techniques to solve the path planning problem on 2D grid-based maps adapted to the UAVs' sensors, for different numbers of UAVs.

Materials and Methods
This section describes how the flight maps are computed and the proposed method for calculating the flight paths, each in its own subsection.

Flight Maps
To compute the flight maps, the cell size is taken as the projection of the sensors' capture area onto the terrain, based on the image size, the flight height and the lens angle of view. To make the captured data easier to combine, the smallest capture area among all UAVs is chosen, so that the images of UAVs with larger capture areas overlap.
No prior information is extracted from the computed grid map to guide the path calculation, in order to avoid biases. However, by also storing information such as the position of the drones and the cells already visited at each moment, a large amount of information can be provided in real time to improve the path calculation.
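The cell-size calculation can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes flat terrain, a nadir-pointing pinhole camera, and that the "lens angle of view" is the diagonal field of view; the function names `footprint` and `grid_cell_size` are hypothetical.

```python
import math

def footprint(height_m, diag_fov_deg, img_w_px, img_h_px):
    """Ground footprint (width, length) in metres of a nadir-pointing pinhole camera."""
    # The image diagonal projects onto flat ground as 2 * h * tan(FOV_diag / 2).
    diag_m = 2.0 * height_m * math.tan(math.radians(diag_fov_deg) / 2.0)
    # The footprint keeps the image aspect ratio, so split the diagonal accordingly.
    diag_px = math.hypot(img_w_px, img_h_px)
    return diag_m * img_w_px / diag_px, diag_m * img_h_px / diag_px

def grid_cell_size(uavs):
    """Common grid cell: the smallest footprint (by area) among all UAVs.

    Each entry of `uavs` is (height_m, diag_fov_deg, img_w_px, img_h_px).
    """
    return min((footprint(*u) for u in uavs), key=lambda wl: wl[0] * wl[1])
```

For example, a UAV at 100 m with a 90° diagonal field of view and a 4000 x 3000 px sensor covers roughly 160 m x 120 m; a second UAV flying at 50 m with the same camera would set the common cell to roughly 80 m x 60 m, and the higher UAV's images would then overlap neighbouring cells.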
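One possible way to feed this real-time information to a network is to stack it as binary grid layers and flatten them into an input vector. The sketch below assumes three channels (visited cells, the UAV's own position, the other UAVs' positions); this layout and the name `encode_state` are illustrative assumptions, not the encoding used in the paper.

```python
import numpy as np

def encode_state(shape, visited, own_pos, other_pos):
    """Flatten the current grid state into a vector of 3 * rows * cols values.

    Channel 0: cells already visited; channel 1: this UAV's position;
    channel 2: positions of the other UAVs. Positions are (row, col) tuples.
    """
    rows, cols = shape
    state = np.zeros((3, rows, cols), dtype=np.float32)
    for r, c in visited:
        state[0, r, c] = 1.0
    state[1, own_pos[0], own_pos[1]] = 1.0
    for r, c in other_pos:
        state[2, r, c] = 1.0
    return state.ravel()
```

Because only the current state is encoded, no precomputed targets or distance maps enter the input, in line with the bias-avoidance goal described above.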

Proposed Model
The proposed model for calculating the paths is a variation of the Q-Learning algorithm [4]. In this Reinforcement Learning (RL) algorithm [5], the Q-values are predicted by an Artificial Neural Network (ANN) [6] with two fully connected layers with sigmoid activations, trained with the RMSprop optimizer.
To obtain better results in less time, a Hill-Climbing policy [7] is used to update the rewards received by the UAVs as they move. The network is also trained with Memory Replay (experience replay) [8].
Another issue inherent to the proposed models is how they are configured with respect to the UAVs. There are two possibilities: a single global ANN shared by all UAVs, or one ANN per UAV (a local ANN). The first option requires fewer computational resources, but the path calculation for one UAV can be distorted by erroneous information from the other UAVs' paths. The second option requires more computational resources, but each ANN specializes in a single UAV.
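A Q-value network of this kind, trained with RMSprop towards the standard Q-Learning target r + γ·max Q(s', a'), can be sketched in NumPy as below. This is a minimal sketch under assumptions not stated in the paper: the sigmoid is applied to the hidden layer while the output layer is linear (so Q-values are not bounded to (0, 1)), and the layer sizes, learning rate, decay `rho` and discount `gamma` are placeholder values. The names `QNetwork` and `q_targets` are hypothetical.

```python
import numpy as np

class QNetwork:
    """Two fully connected layers: sigmoid hidden layer, linear Q-value outputs."""

    def __init__(self, n_in, n_hidden, n_actions, lr=1e-3, rho=0.9, eps=1e-8, seed=0):
        rng = np.random.default_rng(seed)
        self.params = {
            "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
            "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.1, (n_hidden, n_actions)),
            "b2": np.zeros(n_actions),
        }
        # RMSprop keeps a running average of squared gradients per parameter.
        self.cache = {k: np.zeros_like(v) for k, v in self.params.items()}
        self.lr, self.rho, self.eps = lr, rho, eps

    def forward(self, x):
        p = self.params
        h = 1.0 / (1.0 + np.exp(-(x @ p["W1"] + p["b1"])))
        return h, h @ p["W2"] + p["b2"]

    def predict(self, x):
        return self.forward(np.atleast_2d(x))[1]

    def train_step(self, x, actions, targets):
        """One RMSprop step on the MSE between Q(s, a) and the given targets."""
        x = np.atleast_2d(x)
        h, q = self.forward(x)
        idx = np.arange(x.shape[0])
        err = q[idx, actions] - targets
        # Only the Q-value of the action actually taken receives a gradient.
        dq = np.zeros_like(q)
        dq[idx, actions] = 2.0 * err / x.shape[0]
        dh = (dq @ self.params["W2"].T) * h * (1.0 - h)
        grads = {"W2": h.T @ dq, "b2": dq.sum(axis=0),
                 "W1": x.T @ dh, "b1": dh.sum(axis=0)}
        for k, g in grads.items():
            self.cache[k] = self.rho * self.cache[k] + (1.0 - self.rho) * g * g
            self.params[k] -= self.lr * g / (np.sqrt(self.cache[k]) + self.eps)
        return float(np.mean(err ** 2))

def q_targets(net, rewards, next_states, dones, gamma=0.9):
    """Q-Learning target r + gamma * max_a' Q(s', a'); no bootstrap at terminal states."""
    next_q = net.predict(next_states).max(axis=1)
    return rewards + gamma * next_q * (1.0 - dones)
```

The targets are computed with the same network being trained; a separate target network is a common variation, but the paper does not mention one, so it is left out here.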
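Memory Replay stores past transitions and trains the network on random mini-batches of them, which breaks the correlation between consecutive moves. A minimal buffer is sketched below; the capacity, the tuple layout and the name `ReplayMemory` are illustrative assumptions, and the Hill-Climbing reward update is not modelled here.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity, seed=None):
        # When full, deque(maxlen=...) silently drops the oldest transition.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a random mini-batch without replacement, grouped by field."""
        batch = self.rng.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Training would typically start only once `len(memory)` reaches the batch size, then sample one mini-batch per step.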
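The difference between the two configurations comes down to which network, and which pool of experience, each UAV uses. The sketch below assumes a generic network factory `make_net` and one experience list per network; both helpers are hypothetical illustrations, not part of the paper.

```python
def build_networks(n_uavs, make_net, shared):
    """Return the Q-network used by each UAV, indexed by UAV number.

    shared=True builds one global ANN referenced by every UAV;
    shared=False builds an independent local ANN per UAV.
    """
    if shared:
        net = make_net()
        return [net] * n_uavs
    return [make_net() for _ in range(n_uavs)]

def assign_memories(nets):
    """Give each UAV the experience store of its network.

    UAVs that share a network share one store, so a global ANN is trained on
    transitions from every UAV, while a local ANN only sees its own UAV's data.
    """
    stores = {}
    return [stores.setdefault(id(net), []) for net in nets]
```

With the global configuration, a transition recorded by one UAV ends up in the training data of all of them, which is the source of the possible distortion mentioned above; with local networks, each store stays specific to its UAV at the cost of training several networks.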

Results
For the experiments, simulations were carried out over the terrain of the CITIC research center. The metric of interest is the flight time taken to find a solution, since it influences the energy consumption of each UAV. Results are listed in Table 1.

Conclusions
Flight path calculation for UAV swarms can be approached with Q-Learning using small fully connected ANNs. This makes the system faster and more efficient than others found in the literature, which facilitates its adoption by other users. Minimizing the time taken to find each solution is a suitable metric that other authors rarely use. However, it is one of the most realistic, because battery consumption cannot be predicted in advance: it depends on external factors such as incident wind. Using one ANN per UAV is usually the best option: as the number of UAVs increases, the time taken to find a solution grows only slightly, unlike with a global ANN.
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.