Ellipsoidal Path Planning for Unmanned Aerial Vehicles

: The research in path planning for unmanned aerial vehicles (UAV) is an active topic nowadays. The path planning strategy highly depends on the map abstraction available. In a previous work, we presented an ellipsoidal mapping algorithm (EMA) that was designed using covariance ellipsoids and clustering algorithms. The EMA computes compact in-memory maps, but still with enough information to accurately represent the environment and to be useful for robot navigation algorithms. In this work, we develop a novel path planning algorithm based on a bio-inspired algorithm for navigation in the ellipsoidal map. Our approach overcomes the problem that there is no closed formula to calculate the distance between two ellipsoidal surfaces, so it was approximated using a trained neural network. The presented path planning algorithm takes advantage of ellipsoid entities to represent obstacles and compute paths for small UAVs regardless of the concavity of these obstacles, in a very geometrically explicit way. Furthermore, our method can also be used to plan routes in dynamical environments without adding any computational cost. a ﬁtness to ﬁnd the TLBO


Introduction
Autonomous Unmanned Aerial Vehicles (UAVs) play an important role in both military and civilian applications. In contrast with manned aircrafts, UAVs are able to perform complex and dangerous tasks with high maneuverability and low cost [1,2]. An important problem to solve in order to achieve a certain level of autonomy is path planning. In the past, the best path was selected as the shortest distance to a goal; now, the best path is associated with the traveled distance and energy consumption [3]. If more parameters, besides distance, are considered, the path planning problem can been stated as an optimization problem, and population based algorithms have been used in many cases to solve it successfully [4][5][6][7][8].
In [9], we described a novel algorithm for path planning, which uses conformal geometric algebra to generate maps using spheres. By using spheres, we gain in terms of the number of parameters needed for representing the maps and these maps are rich in information. For example, we need the same number of parameters for representing a sphere as for a plane, but the plane also needs an extra number of parameters for bounding the plain. Moreover, the spheres are easy to operate in conformal geometric algebra.
The algorithm employs the characteristics of the spheres described in this algebra to navigate through the maps by combining them with Teaching-Learning Based Optimization (TLBO). In this paper, we compared different evolutionary optimization algorithms where TLBO had the best result.
On the other hand, we also explored approaching the robotic mapping problem by using ellipsoidal representations [10]. These ellipsoidal geometric entities are coded in the geometric algebra G 6,3 . The resulting map is compact and rich in information as we showed in [10].
The problem of robotic mapping consists of constructing a spatial representation of the environment, which is helpful for the robot [11]. There are classic mapping abstractions, such as grid occupancy [12], where cubes represent the objects. This abstraction is memory efficient but discretizes the environment; furthermore, it is useful for office-like environments but is not adequately suited for outdoor environments.
We can also find variable size grid occupancy [13], where we can change the resolution of the grid. With particular modification, we can model dynamic maps with this abstraction [14].
There are other map abstractions such as multiplanar maps [15], landmarks, and points of obstacles [16]. A mapping algorithm called OctoMap was proposed in [17,18]. OctoMap has variable resolution grid occupancy representation with a probabilistic construction. We include in Table 1 a qualitative comparison between ellipsoidal maps and Octomap. In Figure 1, we present an example of a cloud point (left) [19] and its ellipsoidal map (right).
(b) Ellipsoidal Map (700 ellipsoids). Figure 1. Example of an ellipsoidal map generated with the ellipsoidal mapping algorithm presented in [10] .
In this work, we present a novel algorithm for path planning in 3D environments for small UAVs. This algorithm works on ellipsoidal mapping provided by the algorithm in [10]. There is no closed form for calculating the distance between two ellipsoidal surfaces and using an iterative algorithm will be computational expensive.
We propose to solve this problem by training a dense neural network for approximating the distance between two ellipsoids. We propose a new fitness function to find the path with the TLBO algorithm.
The TLBO algorithm was chosen because it obtained the best performance in a similar problem presented in [9]. We refer the reader to [9] for a performance comparison on metaheuristics for similar path-planning.
The paper is organized as follows: in Section 2, we introduce our solution to efficiently calculate the distance between ellipsoids and we show the training and generalization results. In Section 3, we offer a brief review of the TLBO algorithm and we develop the fitness function for path planning in ellipsoidal maps. Then, the simulation and results of the proposed algorithm are presented in Section 4. Finally, in Section 5, we offer a conclusion and future directions based on this work.

Approximating the Distance Function with Neural Networks
Our goal was to develop a path planning algorithm to work with the ellipsoidal maps to take advantage of these maps being compact-in-memory yet rich-in-information. The hypothesis to achieve the above goal was to design an algorithm that could compute a path using the distance between the envelope ellipsoids of the obstacles, and the ellipsoid that models the UAV. The computed path maintains the vehicle safe free space between itself and the occupied places.
To know how much free space exists between ellipsoids, we needed to solve the nontrivial key problem of finding a method to compute the distance between them due to the fact that there is not a closed way to do it. We propose a machine learning method to solve the above computation using neural networks to overcome the problem that represents the great computational costs of using iterative algorithms to calculate distances in maps, where the number of ellipsoids is large.
To solve this problem efficiently, we use a dense neural network to estimate the distance between two ellipsoids. One ellipsoid will represent a small UAV and the other will represent an obstacle.
We can train a neural network for the regression problem. To generate a dataset for training, we randomly generate a pose for the UAV (yaw, pitch and roll). We fix the semi-axes of the ellipsoid representing the UAV with (0.5, 0.5, 0.3) in meters. Furthermore, we generate a random ellipsoid around the UAV. We calculate a cloud mesh for every ellipsoid and estimate the distance between the UAV by using a brute force approach. In Figure 2, we present an example of generated samples and their estimated distance. One problem with this approach is to find a correct normalization of the data. If the input data of a neural network is not well normalized it could lead to slow or biased learning. In the following, we describe a novel normalization method that achieves good results on the neural network performance.
Firstly, the small UAV is represented by an ellipsoid with fixed semi-axes. These semi-axes are not considered in the learning problem, because the neural network can learn them as well. We will fix the UAV position at the center of the 3D space. Then, the position and the semi-axes are not input data for the neural network. The UAV is represented only with the angles (yaw, pitch, roll) ∈ [0, 2π] 3 .
In the second instance, the 3D points on the map representing the obstacles are mapped using ellipsoids. As we presented in [10], we applied a clustering technique to the cloud point. Each cluster is a set of 3D points [x i , y i , z i ] T n i=1 , with center of mass [µ x , µ y , µ z ] T . We can also calculate the pair-wise covariance between two variables; for example, for x and y coordinates, the covariance is calculated with (1): With the pair-wise covariances, we construct the covariance matrix is defined with (2). The parameters (c 1 , . . . , c 9 ) carry the information of an ellipsoid that covers all non-outlier data points.
The obstacle ellipsoids will be represented with the normalized covariance matrix. This parametrization is chosen because it is the output of the multi-ellipsoidal mapping algorithm presented in [10]. Other parametrizations lead to a high computational cost; for instance, one could use the angles of rotation and the semi-axes, but this will require the spectral decomposition of the covariance matrix.
In Figure 3, we present a 2D scheme of the maximum and minimum distances between two ellipses. The desired distance will be between the minimum and the maximum distances and it will depend on the orientation of the ellipses.  The positions of the UAV and obstacles are difficult to normalize. If we normalize using common techniques like max-min or the standard normalization the neural network could output strange values for ellipsoid outside this normalization. To avoid this situation, we will code the relative position with a normalized vector from the UAV center to the obstacle center (u x , u y , u z ).
Instead of doing regression with the real distance between the ellipsoids, we just estimate a correction variable δ if we subtract this value from the centers distance of the ellipsoids, we can calculate the distance between ellipsoids, as we show in (3): The δ correction factor depends only on the relative position of the obstacle ellipsoid and the UAV ellipsoid and their orientations. We can approximate this correction factor with a neural network. In Figure 4, we present the normalized input vector and the neural network architecture. We generate 185,000 random examples that took one day to calculate. We distribute these samples the following way: 149,850 samples for training, 16,650 for validating and hyper-parameter tuning, and 18,500 for testing. We trained this neural network for 50 epochs. In Figure 5, we show the training and validation evolution on the mean squared error (MSE). We got 0.0016 final MSE for training and 0.0017 final MSE for validation. In Table 2, we present the training results. In particular, we present the R 2 -score where the best possible value is one, we also show the mean absolute error (MAE) and the median absolute error (MedianAE), in order to give a good sense of the capabilities of the neural network. The neural network achieves a high performance in predicting the correction factor and by using (3), we can accurately calculate the distance between the UAV and the obstacle ellipsoids. The Network was programmed on Keras/Tensorflow, then the network natively can run on a graphical process unit (GPU) for high performance. After testing, we can calculate the distance between a UAV ellipsoid and 200 obstacles in a mean time of 0.1208 s (We use a RTX 2060 GPU). With this result, we can assure that the application of path planning over ellipsoidal maps is computationally affordable.
Finally, we run an experiment on a virtual environment by placing the UAV in a grid position and calculating the minimum distance to the closest object. In Figure 6, we show the result of the experiment. The warmer colors represent greater distances. Notice that the neural network can calculate an accurate map that is compact in memory and rich in information. In the next section, we develop the path planning algorithm based on this neural network by using a bio-inspired algorithm with a greedy approach.

Teaching-Learning Based Optimization
TLBO is an optimization algorithm based on the teaching of knowledge from a teacher to his or her students on a classroom [20]. This population based algorithm has two main phases, in which it generates new knowledge: the teacher and the learner phase.
The teacher phase is inspired by the transmission of knowledge from a teacher to the students, and centers efforts to increase the average score of the class. The learner phase is inspired by the knowledge shared among students; the students with more information will be beneficial as the other learners learn new information from them. The teacher is the best solution so far in the current iteration.

Teacher Phase
A good teacher will try to increase the knowledge of the students/learners based on his or her own knowledge over time/iterations. But no matter how good a teacher is, because of many factors this can only be done to some extent in a classroom composed of n students. It can be said that the mean of the new knowledge of the class will be moved in some extent towards the teacher's knowledge, but it will also depend on the capabilities of the class [21].
The following equation shows the intent of the teacher X T to influence, to some degree, each individual X i (composed of the drones (x, y, z, yaw, pitch, roll) values) with the help of the mean of knowledge of the whole classx: where T f ∈ {1, 2} is a random value of only two possible values, named the teaching factor; and r ∈ [0, 1] is a random number. If f (X i ) provides a better solution than f (X i ) (where f is a fitness function), X i replaces X i as a solution.

Learner Phase
The learner phase depends on the interchange of knowledge between students. A learner with new information will have an influence on the overall knowledge of the class [22].
This phase consists of adjusting each learner X i based on another learner X k , where i and k ∈ [1, n] : i = k [9].
There are two alternatives that could happen for learners X; when the f (X i ) is better than f (X k ) which generates the following learner: or vice versa, when f (X k ) is better than f (X i ), X i replaces X i as a solution if it represents a better solution. As can be seen, this phase also includes the teacher solution from the previous phase, but in a less important role.

Fitness Function
We designed a fitness function that is composed of four terms: where d t is the Euclidean distance between a learner X i and the target point θ (composed of only by x, y and z values); by itself this term helps to attract the population towards the target. c is the obstacle collision indicator: where o j is the distance between the leaner X i and the obstacle j, which is obtained using the neural network described in Section 2. The collision indicator's function is to heavily penalize collisions, since it is of utmost importance to guarantee the UAV's safety. h represents a heat factor that indicates the proximity of X i to a set of obstacles o d ∈ o j : 0 < o d ≤ r 1 , where r 1 is a user defined range of proximity: To give the drone the capability to avoid large convex obstacles inside a room (usually obstacles that go from floor to ceiling), a stuck factor s is added. To obtain s it is necessary to create a set of stuck zones s z ; when the euclidean distance of the (x, y, z) values of teachers τ t−1 and τ t (where τ is the last teacher obtained from a TLBO run and t is the current run) is less than a user defined threshold α, a zone is added to s z . If the condition is met the (x, y, z) values from τ t are queued to s z . The stuck factor is obtained as follows: s = ∑ (−log 2 (0.001 * s k ) + log 2 (r 2 )), (10) where s k is the set of Euclidean distances from a learner X i to any point in s z , where 0 < s k ≤ r 2 , and r 2 represents a user defined range of proximity. The pseudocode for path planning is presented in Algorithm 1. X i ← initialize to zero 5: X T ← obtain teacher from population as a separate value 6: ngens ← number of generations 7: stop ← stopping value 8: tqueue ← teachers queue 9: S z ← set stuck zones list to empty 10: User defined threshold 12: while X T f itness > stop do 13: for gen ← 1 to ngens do 14: for every X i do 15: X i ← teacher_phase(X i ) This step corresponds to (4) 16: X i ← bound_increments(X i , actual_pos) 17: (7) 18: Selects the best candidate between X i and X i

Simulation and Results
To define the whole path, TLBO was run several times, and each time (except the first one) the learners were initialized randomly within the proximity of τ t−1 ; for the first iteration the (x, y, z, yaw, pitch, roll) base values where defined arbitrarily. Each TLBO run consisted of 20 iterations and a population of five individuals; r 1 and r 2 were assigned values of 0.35 and 1.5, respectably. As a stopping condition, the Euclidean distance from τ t and the target θ was used (a distance value less than 0.1). At the end of a run, τ t is added to the path. As Figure 7 shows.
The values that could be achieved by a learner were bounded, so that the drone could not make abrupt changes that could make it behave unstably from one state to another or fly at very pronounced angles. In each iteration, the values of the learners were bounded by τ t−1 ± (0.5, 0.5, 0.5, 0.15, 0.15, 0.25). The drone's (yaw, pitch, roll) values were also bounded globally to ±[π, 0.7, 0.7] radians, respectively. The bounds and other values were empirically selected. We included the pitch and roll angles in the search space because some control schemes for UAV such as Backstepping or Inverse Optimal Control need these references [23][24][25]. However, our proposal can work even when ignoring pitch and roll angles.
Compared to [9], our approach involving the path planning algorithm offers several advantages. Firstly, our approach was designed to work indoors and outdoors alike.
Ref. [9] shows several limitations avoiding large convex obstacles inside a room since it does not take them into consideration and this prevents the algorithm to be trapped in certain local minimums. Tthe influence of the obstacles in our approach also only takes into account the nearest obstacles in the range and adds smaller penalty values that do not heavily obfuscate the influence of the distance to a target in the fitness function. Figures 8-12 show several maps where paths were generated for a drone to follow. All the maps were contained in a room composed of ellipsoids . The room was not plotted for display purposes. (c) Side view. Figure 8. Path of map 1 generated with TLBO. The green ellipsoids represent the path obtained, the asterisk represent the target and the multicolor ellipsoids represent the obstacles. It can be seen that the path is sufficiently smooth for a drone to follow it. (c) Side view. Figure 9. Path of map 2 generated with TLBO. The green ellipsoids represent the drone path and the multicolor ellipsoids represent the obstacles. Although it shows difficulties to find a path, it is safely pushed away the obstacles at the start. (c) Side view. Figure 10. Path of map 2 without the room of ellipsoids. In this case, the ellipsoids, where the map is contained, have been removed. As can be seen, now the UAV can easily find a path above and beside the obstacles. (c) Side view. Figure 11. Path of map 3 generated with TLBO. The path represented by the green ellipsoids, are sufficiently smooth for the UAV.  The figures display the path followed by the drone (green ellipsoids) to its target (red asterisk), avoiding on the way several obstacles (multicolor ellipsoids). As can be seen, the drone can easily follow the paths obtained by the algorithm. Maps, like those shown in Figures 8 and 9, show little difficulty finding a path; even in the presence of convex obstacles at the start, the drone is safely pushed away until it finds a way to circumnavigate the obstacles. Comparing Figures 9 and 10 it can be seen that, without the room, the path simply passes above and beside the obstacles.

Comparison with State of the Art Algorithms
In order to validate the proposed method, we compared it with other evolutionary techniques and a well-known path planning algorithm. Firstly, we describe the Rapidlyexploring Random Tree Star (RRT*) [26].
The RRT* algorithm, presented in Algorithm 2, finds a free path from the initial point z init to the final point z goal . We created a tree T with an initial node z init . Next, we grew the tree up for N attempts. if ObstacleFree(z new ) then 8: z near ← Near(T, z new ) 9: z min ← ChooseParent(z near , z nearest ,z new ) 10: To find the next node in the tree, we randomly sampled a point on the map z rand that is not with an obstacle. We found the nearest node of the tree and renamed it z nearest . After steering from z nearest to z rand , this function has heuristics about the robot's kinematics. Then we renamed z rand as z new and found a path U new .
The ObstacleFree function search for collision with obstacles on the line from z nearest to z new ; if there were no collisions, we proceeded to add the z new point. We collected the points close to z new within a certain radius. Then we chose the parent node that carried the least cost and renamed z min . Finally, we added the link between z min and z new and rewired the tree to find the minimum cost.
To apply the RRT* algorithm on an ellipsoidal map we developed a collision detection function. We used the Cholesky factorization of the covariance matrices that represent the obstacles. We show this in (11). In the case of the evolutionary algorithms, we can expect longer run-times because of the neural network prediction.
Using the triangular matrix L, we can calculate if a point x is inside an ellipsoid on the map by using the inequality (12), where µ is the center of the ellipsoid.
In Figure 13, we show the resulting path of the RRT* algorithm in the same four maps. Notice that the found paths avoid the obstacles but do not consider the ellipsoid that represents the UAV. We used a maximum of 5000 iterations, but for the mean convergence on the four maps it was 1348 iterations. Furthermore, we also present experiments with other state-of-the-art evolutionary optimization algorithms. We ran the same tests for the the Differential Evolution (DE) [27] algorithm, the Particle Swarm Optimization (PSO) [28] algorithm, and the Firefly (FF) [29] algorithm.
All the evolutionary algorithms used the same fitness function presented in (7). All the proposed methods are non-deterministic. To ensure the validity of the results, we ran each algorithm experiment 30 times on the same computer. The start and endpoints were the same for all the experiments. In Table 3, we present the comparison of the different methods. Notice that the evolutionary schemes have the best mean performance for maps 1, 3, and 4. In Figure 14, we show the box-and-whisker plots for each map and each method."   In Table 4, we include the number of steps in each evolutionary algorithm and the number of nodes in the RRT* algorithm. We proved the novel algorithm in a real environment. We used the map from [19], and we created the ellipsoidal map. In Figure 15, we offer the resulting path planning. Notice that this path planning has a good performance even in non-structural environments. In Table 5, we show how the Cholesky factorization allows lower complexity in the RRT* algorithm.  15. Path planning in a real ellipsoidal map, we show the point cloud for a better understanding of the image. The cloud point has 100,000 points but the map only has 700 ellipsoids.

Conclusions and Future Work
In this work we presented a geometrically explicit and simple algorithm that takes advantage of the ellipsoidal representation of maps generated to plan collision-free and smooth paths regardless of the concavity of the obstacles or whether the environment is indoor or outdoor.
Our method solves the non-trivial problem of computing the distance between two ellipsoids by using a neural network. To train the neural network, it was necessary to produce a training dataset and design a novel normalization method for these data. So, we obtain an accurate approximation for the distance, which allows the computation of the free-space and occupied-space of an environment, as is shown in Table 2 and Figure 6.
In order to obtain paths and to keep the computational costs of our approach low, a bio-inspired algorithm named TLBO was used. We have chosen TLBO because it has had the best performance in previous comparisons with other techniques [9]. The fitness function was designed taking into account four desired features for the path obtained: to keep a safe distance between obstacles' ellipsoids and the UAV ellipsoid; to avoid collisions; to reduce the total amount of the UAV's proximity range throughout the whole path; and to give the drone the capability to avoid large convex obstacles (walls inside a room, dense and tall vegetation).
It is important to mention that, based on our design of the fitness function, the computed paths are not optimal. Because it was used as a greedy approach, it cannot get the optimal path but is more versatile and can manage dynamical maps without extra computational costs.
We compare the proposed approach with other evolutionary algorithms, DE, PSO, and FF. We also compare with RRT*, and we constructed the collision detection function for the ellipsoidal map. In most cases, evolutionary techniques with the presented fitness function obtained a better performance than RRT*, but clearly RRT* has a better performance on time. Furthermore, the evolutionary methods also calculate the UAV orientation and not just the (x, y, z) position.

Future Work
The proposed algorithm uses a greedy strategy to find a nearby optimal position. Therefore, the algorithm is locally optimal. To compare different path planning algorithms, a cost function must be developed, which has non dependency on the map abstraction.
We are working on substituting the greedy politic used to compute the path as well as the bio-inspired algorithm by a reinforcement learning algorithm so the system can learn policies of navigation instead of computing paths. Furthermore, we are designing an intelligent low-level control algorithm that considers the dynamical model of the UAV to follow the planned path.