Recursive Rewarding Modified Adaptive Cell Decomposition ( RR-MACD ) : A Dynamic Path Planning Algorithm for UAVs

A relevant task in unmanned aerial vehicles (UAV) flight is path planning in 3D environments. This task must be completed using the least possible computing time. The aim of this article is to combine methodologies to optimise the task in time and offer a complete 3D trajectory. The flight environment will be considered as a 3D adaptive discrete mesh, where grids are created with minimal refinement in the search for collision-free spaces. The proposed path planning algorithm for UAV saves computational time and memory resources compared with classical techniques. With the construction of the discrete meshing, a cost response methodology is applied as a discrete deterministic finite automaton (DDFA). A set of optimal partial responses, calculated recursively, indicates the collision-free spaces in the final path for the UAV flight.


Introduction
The world market for unmanned aerial vehicles (UAVs) is expanding rapidly, and there are various forecasts and projections regarding the market for unmanned vehicles.The economic impact of integrating UAVs into the National Airspace System in the United States will grow substantially and reach more than $82.1 billion between 2015 and 2025 [1].
A wide diversity of air missions can be completed by UAVs [2,3] in various scenarios and including outdoor/indoor and water/ground/air/space environments [4].The types of missions include military (missile launching drones, bomb-dropping drones, flying camouflaged drones) and civilian (video-graph/photography, disaster response, environment and climate) [5][6][7][8][9].A highly demanded task for UAVs is 3D autonomous navigation (either in static or dynamic environments) that optimises the route and minimises the computational cost.Thus, path planning defines the methodology that an autonomous robot must complete to move from an initial location to a final location, deploying its own resources as sensors, actuators, and strategies, while avoiding obstacles during the trip.Several path planning and obstacle avoidance techniques are being used in unmanned ground vehicles (UGVs), autonomous underwater vehicles (AUVs), and unmanned aerial vehicles (UAVs).
From the traditional robotics point of view, numerous works have been developed in which the path planning and obstacle avoidance algorithms perform searches in continuous or discrete Euclidean [10] dimensional movement environments.It is important to mention that LaValle in [11] has done significant work on sampling-based path planning algorithms.However, although his analysis is complete from a two-dimensional perspective, 3D planning analysis is not completely addressed.An exhaustive study of the growing work on sampling-based algorithms is presented in [12].It must be remembered that the 2D path planning problem is NP-hard; and so environmental dimensional increases and UAV kinematics affect problem complexity.
An in-depth review of the current literature shows several works focus on two-dimensional (2D) scenarios [13] that limit vehicle behaviour to just a flat surface and consider its height as constant by making a dimensional analysis (2.5D) [14].However, in complex unstructured situations (including, for example, forests, urban, or underwater environments) a simple 2D algorithm is insufficient and 3D path planning is needed.
A diversity of methodological paradigms have been developed to complete the task of 3D path planning.These are based on sampling, node/based algorithms, bio-inspired algorithms, and mathematical models, among other techniques.A brief bibliographic review focused on 3D trajectory planning is presented below.Some representative techniques used in path planning methods and based on continuous and discrete environment sampling include: RRT (rapidly-exploring random tree) [15][16][17][18]; PRM (probabilistic road maps) [19][20][21][22][23]; Voronoi diagrams [24][25][26]; and artificial potential [27][28][29][30].Nevertheless, it is important to note that RRT and PRM make random explorations (continuous sampling) of the defined environment.RRT is an expensive algorithm in terms of computational cost when searching for feasible solutions in cluttered environments.It should be emphasised that once the PRM road map is made, a methodological base built on nodes must be invoked to define the lowest cost path.The main disadvantage of the Voronoi diagram is that it is an offline method.Finally, artificial potential algorithms present little computational complexity-although they tend to fall into local minimums.
Node-based algorithms (discrete space) are mathematical structures used to model pairwise relations (in this context, the structures are made with vertices and edges) and the aim is to calculate the cost of exploring nodes to find the optimal path.Various methodologies and subsequent variations, such as Dijkstra's algorithm [31,32], A* [33,34], D* [35,36], and Theta* [37], present these characteristics in their results.In [38] the characteristics and approximations of various methodologies of * (Star) search algorithms are studied.
In recent years, these classical techniques have been improved with new learning machine techniques.ANN (Artificial Neural Networks) [39][40][41], fuzzy logic [42,43], ACO (Ant Colony Optimisation) [44,45], and PSO (Particle Swarm Optimisation) [46,47], among others [48][49][50], are examples of these heuristic methodologies.Hence, these biological algorithms attempt to optimise the path by mimicking animal behaviour.The weaknesses and strengths of a set of heuristic techniques are discussed in [51].The implementation relevance of these methodologies does not present significant experimental results.In addition, the different techniques presented in this section have a particular computational cost and complexity based on the different approaches [52] (see Table 1).A summary of the above mentioned methodologies is shown in Table 2, which details the approximation methodology, authors and reference, type of obstacle avoidance (static or dynamic), type of implementation (simulation or real), and publication year.
The 3D path planning problem is still an open issue in this field.The general approach is to combine several of the above mentioned techniques to improve overall performance.
Several planners optimise the path planning distance.However, this paper attempts to include distance as an objective, as well as the geometrical characteristics of the UAV and flight constraints (velocity, turning capacity, battery, flight distance, etc.).All of these constraints are evaluated as potential cost and the path planning result is based on the sum of contributions for each cost (see Section 4).It is important to highlight that planning results do not attempt to arrive at an optimal path in a shorter distance.Furthermore, unlike other path planning methodologies in which pruning of the results is necessary, this paper attempts to minimise such pruning.An adaptive cell decomposition (ACD) is a strong methodology for solving physical systems led by partial differential equations [53,54].Such techniques offer a substantial improvement in computational time and discretisation is not governed by a dominant equations system.This methodology is used in accurate complex 3D Cartesian geometry reconstructions [55,56].The approach presented in this work does not seek a refined environment reconstruction, and only tries to determine occupied and free spaces within the 3D Cartesian space.The savings in computational and memory effort is significant.The computational structure that constructs the algorithm makes a rapid labelling of the geometric figure of the environment as a 3D solid with a rectangular shape.
In this paper, a functional 3D UAV path planning algorithm is proposed that is based on an evolution of the (ACD) method.The proposal attempts to achieve a linear speedup, exploring and decomposing the 3D environment under a recursive reward cost paradigm, and building an efficient and simple 3D path detection.The aim is not to generate a large scale reconstruction of the environment, nor start the procedure with a defined cloud of points [57].In the presented paper, the UAV just receives the obstacle information from the control station and generates a trajectory.Over time, physical phenomena often generate unknown space distributions between the UAV and obstacles, and so adaptation according these changes and spatial constraints might exist.In the event of obstacle collision with the previously calculated path, a new estimated path is generated.This paper is organised as follows.Section 2 defines the terrain representation and codification obstacles, and the general problem of path planning under static and dynamic environments is stated.In Section 3, the basis of the adaptive cell decomposition technique is revisited and modified to be ready for our proposal.In Section 4, the new algorithm for planned paths is then explained, and finally, several application examples are shown in Section 5. Conclusions and future works are considered in the final section.

Problem Definition
At the moment when obstacles in the 3D real world are represented, they do not possess exact geometries, and for simplicity in this paper, obstacles have been modelled as cuboids in direct relation to their dimensional characteristics and location.When cuboids increase or decrease, special care is taken to ensure that they do not collide with each other and there is a free flight route during the environment tests.
In a 3D environment where a UAV performs a continuous flight from an init point (q i ) to a goal point (q f ), a set of various manoeuvres to complete this mission are deployed.The UAV had previously defined the trajectory to follow after taking into account several considerations and constraints.
Let us assume that a complete description of the possible operating environment as an urban space in which buildings of different dimensions are defined.The UAV in flight receives data from its control station about the environmental conditions and it makes the necessary calculations to determine the best trajectory.The relevant data includes the goal point q f , the current location, and the size of the obstacles (static or dynamic), as well as speed and movement directions.Since q i is related directly with the current UAV location, the aim is to apply a discrete decomposition (partial and recursive) of the environment to find the set of collision-free spaces for the UAV flight and head towards the middle of those collision-free spaces until it arrives at q f .Hence, the final trajectory result of this methodology generates a vector (x i , y i , z i ) of three-dimensional points (system coordinates) translated as waypoints.Furthermore, it is important to emphasise that the resulting vector indicates spatial positions, and therefore the UAV that performs the trajectory tracking must possess these tracking skills.Thus, a UAV that has a quadropter-type holonomic system (the number of controllable degrees of freedom of the UAV system is equal to the total degrees of freedom) can complete the 3D waypoint tracking.Hence, a non-holonomic (the system is described by a set of parameters subject to differential constraints) (fixed wing) UAV does not have to be able to follow these trajectories.
Figure 1 shows an environmental example, where the discrete decomposition is built around the obstacles that interfere with the UAV flight.Hence, the three-dimensional characteristics of the obstacle might be considered to determine an escape trajectory that can surround the obstacle (including its sides and above and below) in continuous flight.Therefore, the 3D environment decomposition will take advantage of the 3D displacement capabilities of the UAV.

Modified Adaptive Cell Decomposition (MACD)
A standard 3D ACD algorithm attempts to make a discrete approximation of the environment, typically in a tree data structure known as octree (Octree is a tree data structure in which each internal node has exactly eight children) [58].This process requires considerable computational resources and time.The modified adaptive cell decomposition (MACD) [59] does not make an exhaustive routing for each little space in the environment.If an obstacle exists, the routing in the three dimensions is in direct relation to the obstacle characteristics.The parameterised decomposition level is fixed in direct relation to the UAV manoeuvrability.This means that whenever the UAV needs a minimal space to complete a movement, this value will be defined in the level of decomposition.
Let us say Υ = (x, y, z) denotes a discrete 3D environment, where the set of collision-free voxels are defined as S f ree , the set of occupied voxels are defined as S occup , and the optimal path (metric term of distance) between q i and q f as ρ.The aim is to find the optimal path ρ compound with a set of the nearest voxels in the S f ree space that enclose the obstacles.
The procedure is summarised in the flowchart shown in Figure 2, beginning with a specification of the number of decomposition levels.For the first level (i = 0), search limits are set to the environmental dimensions Υ. Partition of the octree divides the environment in 8 equal parts in relation to the previous boundaries.In every single octree, an obstacle search is performed that determines the possibility of a later decomposition.Once this level decomposition is finished, the complete information of S occup and S f ree space is completed.Finally, a planner determines the best trajectory in distance terms.A simple example in a 2D environment is shown in Figure 3, using quadtree decomposition.The decomposition level n is predefined, n being the total levels which define the tree growth as a hierarchical computational structure.In a first segmentation, with n = 1, the environment is partitioned in (2 n ) 2 underlying discrete spaces (blue lines).A second division of the environment (n = 2) will generate 16 spaces (green lines).Figure 3 shows an example of a quadtree decomposition, denoted by q k with current level n = 0, with reference to its own neighbourhood.On its left side, there are neighbours with smaller dimensional characteristics than the present q k .There is a total of 8 neighbours, from (2 n ) 2 with n = 3 → (2 3 ) 2 = 64, of which only 8 are related directly with the q k neighbourhood.The lower and upper face of q k has the same level of decomposition, being n = 0 and producing a single q k+1 neighbour.On its right side, as the dimensions of the neighbour q k+1 are larger than q k , the number of neighbours is equal to 1, counting a total of 11 neighbours and possible movements in this case.The 3D decomposition methodology has two relevant variations.Firstly, the definition and location of each voxel boundary, and secondly, the definition of the number of neighbours per each voxel belonging to S f ree .The number of neighbours q k is bounded by at least q k+1 = 3 neighbours, and the maximum number of each q k voxel face is a multiple of (2 0,1,2,...,n ) 2 with n decomposition levels (shown in Figure 4).At this point, the procedure is partially complete, since the decomposition just finds the set S f ree and additional calculations to determine if ρ are needed.Hence, MACD uses Dijkstra's [31] algorithm to calculate the optimal path in distance terms.
Algorithm 1 shows the pseudo-code to generate a structure called a rectangloid which contains all the information compiled throughout the cell decomposition process.The algorithm performs a recursive searching of the free S f ree space and the occupied S occup space in the discrete 3D environment to determine each voxel property (including its set of neighbours), and it simultaneously builds the computational structure that joins the voxels.For a better understanding, a brief description of several algorithm steps is offered here.if rectangloid(j).ocup== S occup then 9: [boundary] = boundaryOctree(rectangloid(j).boundaries);

Recursive Rewarding Modified Approximate Cell Decomposition (RR-MACD)
In this section, a new algorithm for path planning in 3D environments is formulated so that using an external planner based on nodes, such as Dijkstra [31] or A * [33], becomes unnecessary.Since it attempts to achieve a final path ρ based on starting conditions, or initial states, each future system state is determined by the present one (each state is a collision-free neighbour voxel).

Methodology
Let Υ denote a work environment as a discrete 3D space that contains a finite set of collision-free voxels (S f ree ) and a finite set of busy voxels (S occup ).
Let us assume an UAV is included within a collision-free voxel, which is considered as initial state s k , with the aim of reaching the end point q f .Let's assume a set formed by voxels of different sizes S k+1 as a neighbourhood of s k .In this context, a state model and a transition matrix can be developed as in Figure 5 to determine the optimal transition from s k to any state belonging to S k+1 based on two transitional measurements (D 1 and D 2 ).Starting from the current state s k , the method will try to obtain the optimum neighbour state belonging to S k+1 (s k → S k+1 = D 1 ).It will then locate which sub-path from each neighbour in S k+1 to the final point q f is best (S k+1 → q f = D 2 ).
To solve this problem a discrete deterministic finite automaton (DDFA) [60,61], F, can be defined as F = (S, G, D, q) T with a set of R m partial functions, where: • q are two points in the 3D environment space, where q i is the initial point q f is the final point.
• S is a finite set of M current states, where -S f ree is the finite set of collision-free voxels.Split in the current voxel and the set of its neighbours s k+1 = [s k+1 (x), s k+1 (y), s k+1 (z)].
-S occup is the finite set of occupied voxels.
• R m , m = 1 . . .N is a set of N partial functions involved in the 3D UAV navigation characteristics and determining feasible progress.In this paper, N = 4 functions are defined as flight parameters, being: where M distance (s i → s j ) is the Euclidean distance between any two states and M trDirect is the distance in a straight line between q i and q f .
where M tan (s i → s j ) is the direction change measurement of the tangent vector to a curve, which shows the inclination angle between any two states.
where M phi (s i → s j ) is the direction change of the bi-normal vector around the tangent vector between any two states.R 4 (s i , s j ) is associated with the amount of battery and determines the possibility of success on a predefined trajectory: where M batt (s i → s j ) is the normalised theoretical quantity of battery needed to fly from any state to any other-and the Cur batt is the current amount of battery available for flight.
Further, a Gaussian function g(R m ) is used to determine the reward in executing a possible action and it is defined as: where the transition cost values (s k → S k+1 , S k+1 → q f ) have been normalised within boundaries [0, 1].Notice how the greater the effort R m , the lower the reward g(R m ) and vice-versa.Therefore, the execution of an action from state s i to different states s j produces state transitions at different costs-an elevated cost will produce a lower reward on the transition.
All these rewards can be expressed as a vector G(i, j) of flight parameters such as: • D ∈ R M−2 is the received reward associated with a priority p ∈ R N for executing an action on a function g(R m ) and is stated as the sum of two transition priority vectors (D 1 and D 2 ) defined as: where ξ is a predefined negative reward value in each state belonging to S k+1 .Notice that the probability distribution values of the functions g(R 1 ), g(R 2 ), . . ., g(R N ) are independent and the set of answer vectors D which are mappings of S × S f ree .Therefore, this map is generated with a time-independent probability distribution.Hence, the probability of moving between one instant and the next does not change.
Hence, the best reward value from vector D generates the best x-and the final path, denoted by ρ x (F), defines a finite labeled graph with vertex S x ∈ S f ree . (a)

Simple Application
To explain the methodology, and for sake of simplicity, let's state the problem of travelling from the actual state q i ∈ s k or init point to the goal point q f in a 2D environment using a standard 2D cell decomposition.Figure 6a shows the initial scenario as well as the different S k+1 states (observable and neighbours) that may become a new s k .In this case, in t = 0, it is the same cost to move right or down-this situation appears due to the inherent symmetry of the decomposition methodology.In such a situation, randomness decides the next state.
Since a state cannot point to itself and can be visited only once, when the S k+1 states are visited and evaluated, one of them is selected by producing a forwarding movement (see Figure 6b).Hence, the state selected is the new initial point s k , and the process is repeated again (see Figure 6c).The network structure shown indicates a forward movement s k to the immediately next state s k+1 .This movement is independent of any previous state s k .
Moving towards a 3D environment, a similar scene is represented as a master voxel containing an obstacle inside (Figure 7).This voxel can be split into different levels of voxels with different dimensional characteristics in different spatial locations.To obtain the finite set of collision-free spaces S f ree , MACD is performed recursively.The main environment boundaries have been defined from the initial coordinates q i ≡ (x i , y i , z i ) to the final one q f ≡ (x f , y f , z f ), resulting in a rectangular shape, being env = ([ where h i (λ) → i > 0 could take two possible states, static (Equation ( 10), the obstacle does not change its position with passing time) or dynamic (Equation ( 11), the obstacle changes its position to another with passing time).Figure 7a shows the environment definition (blue as the main node n [1] with an obstacle h 1 (λ) placed inside (box green lines).Once the first decomposition is performed (Figure 7b), a first octal level n[[1 1], . . ., [1 8]] is generated with nodes having different occupancy properties.Each will belong to S f ree if there is no h i (λ) within its voxel limits (s k h i (λ) = 0) (blue lines), or to S occup if the voxel is partial or totally occupied by the obstacle ((s The first step is to determine the q i container voxel (for this example, it is n[1 8]).In case of obstacle detection in n[1 8], a recursive decomposition would be performed on the location.At this level, the node n[1 8] is the new starting state s k and the observable neighbours S k+1 are the nodes n[1 4], n[1 6], and n[ 1 7].At this stage, the state model is depicted in Figure 8 and the transition matrix is detailed in Table 3.
The probability distribution for the discrete states can be derived from a multidimensional matrix of dimensions M × M × N (see Table 3) where, M is the number of states, and N is the number of partial functions R (m=1...N) involved in the 3D UAV navigation.Table 3.Initial multidimensional transition matrix listed by first column (i = 1 . . .M) and the first row (j = 1 . . .M).The number of levels are given by the number of N functions R m for the particular navigation characteristics.
This multidimensional matrix equivalent to Figure 5b has been constructed with the information of the state model and the reward values.The aim is find the partial responses coming from the first row and last column, and split in two transition priority vectors (D 1 , D 2 ) ∈ R (M−2) .Using the priority vector p and the offset ξ, vectors D 1 and D 2 deliver partial reward distributions to a possible next state.
The transition priority vector D 1 is built with the set of columns j = 2 . . .(M − 1) and the row i = 1 such that . . .
The second transition priority vector, D 2 , is built with the set of rows i = 2 . . .(M − 1) and column j = M.
Finally, the final reward vector D expressed as the sum of D 1 and D 2 contains the optimal value which points to the best state (node) for continuing the search of the path ρ x : To continue with the example in Figure 7, let us assume that x = best(D) points to n[ 1 6].Nevertheless, n [1,6] is occupied (it belongs S occup ), so MACD is invoked, creating a new level in the data structure, composed of n[1 6 (1 . . .8)] (see Figure 7c).Even though the state s k remains in n[1 8], the new decomposition on n[1 6] returns a new set of neighbours, which join with previous ones, and define the new set S k+1 defined by (n So looking for the optimum within S k+1 is required.Let us assume the best node from s k+1 is n[1 6 3] (notice that MACD is invoked once until now) and so the new state s k is reassigned to n[1 6 3] and consequently, the neighbourhood of the new state s k , is conformed by The previous actions produce the displacement from a current s k state to the next best state, and towards the final point.While the container voxel of the resulting better state x does not contain the goal point, there is the possibility that x has neighbours in different decomposition levels.Hence, the process continues until the goal point is reached.
Once the previous phase has been completed, the optimal path ρ x (F) is totally determined and the search finishes.The flowchart depicted in Figure 9 and the Algorithm 2 show the pseudo-code for the described actions.Algorithm 2 RR-MACD singleDecomp(s k ); 6: end if 12: end while 13: return ρ x (F) The procedure is split in two stages.Firstly, a start voxel location is defined and, if there is an h i (λ) in the environment, an initial simple decomposition singleDecomp is performed (as a result, the collision-free voxel that contains the init point will be defined).Once the initial s k state is assigned-which is also an initial ρ x -the procedure proceeds depending on its occupation.If s k is collision-free, the neighbours S k+1 are assigned, the rewards are calculated, and the optimal x is located.Therefore, the best state x becomes the new s k and is added to ρ x .If this new s k is occupied, the process requires another decomposition on the current s k , and s k will return to its previous state.The procedure is completed when the current s k contains the goal point and s k is collision-free.
Algorithm 2 summarises the procedure described in Figure 9.In line 2, a search of the first containing q i ∈ s k is performed.The loop continues until the current s k state contains q f (line 3).If the current state s k collides with an obstacle h i (λ), the s k state is decomposed (line 5: singleDecomp) and s k returns to its previous state (line 6).In line 8, the neighbours of s k , S k+1 , are defined.Line 9 measures the transition rewards for any neighbour in S k+1 .Finally, the optimum is added to the path ρ x and x is assigned as the new s k in line 10.When the loop finishes, the complete path ρ x (F) is returned (line 13).
The methodology described presents the following properties: (a) A stochastic process in discrete time has been defined (it lacks memory), the probability distribution for a future state depends solely on its present values and is independent of the current state history.(b) The sum of the priorities defined in vector p is not equal to 1. ∑ N i=1 p i = 1.(c) The sum of the values of each priority vector, is not equal to 1.
Finally, it should be noted that the environmental discrete decomposition results in ever smaller voxels of differing sizes.The level of voxel decomposition is variable based on two goals that must be fulfilled: (1) Designer defines the maximum decomposition level (minimum voxel size); however, the algorithm tries to reduce the computational cost and, as a consequence, the minimum voxel size is generally avoided.(2) As soon as a free space meets the defined constraints it is selected by the algorithm regardless of the size of the voxel.

Dynamic environment approach
The RR-MACD can be applied to a 3D UAV environment with obstacles for movement.Once q i and q f points are defined, the trajectory ρ x (F) is calculated and the UAV navigates through it.A dynamic environment implies a positional Equation (11).If the obstacles intersect the previously calculated trajectory (h i (λ)| t ρ x (F) = 1), a new trajectory must be generated.The described methodology in the previous sections has constant targets q i and q f .However, for a dynamic trajectory the path must be updated with a new q i in the current UAV location.

Experiments
In this section, a comparison of computational performance between MACD and RR-MACD algorithms is made.Three different simulation examples have been carried out with 10 executions of each algorithm over each environment.The 150 set of responses supplies the results to determine the performance.The algorithms have been run in a "8 x Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz" computer (Manufacturer: Gigabyte Technology Co., Ltd., Model: B85M-D3H) with 8Gb RAM and S.O.Ubuntu Linux 16.04 LTS.The algorithms were programmed in MATLAB version 9.4.0.813654 (R2018a).
For each simulation example, five different 3D environments were defined.Table 4 shows one row per environment, where the column entitled "UAV target coordinates" expresses the coordinates in terms of init and goal points.The environmental dimensions have been defined in distances related to those coordinates (in a scale of meters m).The column "Obstacles Dimensions" shows the dimensional characteristics of each obstacle and the "Obstacle Location" places the obstacles in a specific location.Moreover, the altitude between init and goal target points are different, guaranteeing that the UAV path will be built in the (x, y, z) axes.It should be highlighted that maps are created with predefined static obstaclesfor the different groups of experiments.
First, an urban environment with several buildings is described.For the construction of this environment, a maximum altitude reference has been taken, such as stated in "The Regulation of Drones in Spain 2019 [62,63]".For environments defined as 2, 3, 4, and 5, larger dimensions have been considered in which a varying number of obstacles in different air spaces are defined.This example shows the performance of RR-MACD when four functions R m are used (N = 4).These functions are the same as those detailed in Section 4. In Figure 10, the specifically results for environments #1 and #2 are shown.Therefore, Figure 10a shows the urban environment.Hence, Figure 10b shows the built path by RR-MACD of the environment #1, where black boxes are the obstacles h i (λ), green stars shows the voxel set of vertices in the state s k and the orange line shows the final path ρ x (F).In Figure 10c, the resulting MACD technique on the environment #2 can be appreciated, where black boxes are the static obstacles h i (λ), blue boxes are the set of voxels used for the Dijkstra's planner to obtain the optimal final path (green stars are the vertices set of its belonging voxel and the orange line shows the final path ρ.Finally, Figure 10d depicts the final trajectory built by RR-MACD, where the orange line shows the final path ρ x (F).
Comparisons of both algorithms regarding computational time is shown in Table 5.The results show an important advantage in decomposition time S f ree and ρ x (F) spaces for RR-MACD.Notice that, when an obstacle h i (λ) intersects with a voxel decomposition, MACD recursively continues until the predefined level n (for this reason in environment #3 the searching time for MACD is considerably greater than other environments).The third column "RR-MACD 4 vs.MACD (%)" shows in percentages the differences between MACD and RR-MACD 4 regarding environmental decomposition time, number of S f ree free-spaces generated during the searching process, and the number of final path nodes ρ.It is important to mention that MACD needs an additional time because Dijks time(s) shows the seconds needed to find ρ.For example, in the first environment, the RR-MACD algorithm just needs 36.238% of the time that MACD takes for environment decomposition, and 54.641% of the time that MACD takes to generate S f ree , and 93.684% of the nodes that MACD needs to build ρ.Therefore, RR-MACD shows a general improvement on the process.

Example 2. Static Obstacles and 10 Constraints
In example 1, the set of partial functions involved in 3D UAV navigation is equal to 4. For this example, an additional set of 6 random values were added as new functions to simulate complex flight characteristics.Therefore, the number of partial functions in 3D UAV navigation R m will be equal to M = 10 and the probability distribution multidimensional matrix now has 10 levels.This increment of functions, and its random nature, will provoke inherent changes in the results.These can be observed in the fourth column of Table 5, entitled "RR-MACD 10 vs. RR-MACD 4 %", where the relative difference between RR-MACD 4 and RR-MACD 10 shows the percentage increase or decrease in decomposition time, S f ree and ρ x (F).
For example, let us compare performances between RR-MACD 10 and RR-MACD 4 in the environment #1.RR-MACD 10 needs 51.499% more time to find a final path.It generates 74.975% plus voxels and the number of nodes in the final path ρ is 50% higher.
Table 6 shows a set of additional data corresponding to the results in terms of distances travelled between the init point and the goal point after the execution of each algorithm.As mentioned in the previous sections, the main objective of planning is not to reach optimality in exclusive terms of distance.

Example 3. Dynamic Obstacles
An additional experiment was performed that considered obstacles in motion.A new environment has been proposed for obstacles intersecting with the calculated ρ x (F).Hence, a new ρ x (F) with a new init (actual location of the UAV) is built.
Figure 11 represents this new environment with two obstacles in motion (black boxes).The first dynamic obstacle begins its displacement in location (600, 600, 600)m, and performs a continuous motion in an east direction with a constant velocity of 15 m/s.The second one begins its flight in location (80, 800, 600)m with a constant velocity of 15 m/s in the same direction.After 11 s, the first obstacle collides with ρ x (F) (this crash is illustrated with orange line in Figure 11a) and a new ρ x (F) is calculated, Figure 11b shows the new location of the obstacles (notice that limits in x and y axis have changed from 1000 m to 800 m).At t = 34 s, a new collision is detected (Figure 11c), even though the first obstacle is out of the environment, the second one has collided with ρ x .When the UAV (blue square) has passed this part of ρ x (F), the new trajectory until goal is collision-free.

Conclusions
This paper presents an adaptive grid methodology in 3D environments applied to flight path planning.The approach described considers different constraints such as UAV maneuverability and geometry or static and dynamic environment obstacles.
The proposed algorithm, RR-MACD, divides the 3D environment in a synthesised way and does not need to invoke any additional planner to search (such as Dijkstra or A * ) for an optimal path generation.The improvement in the computational effort and the reduction in the number of nodes generated by the RR-MACD has been shown.The stochastic process in discrete time involved in the algorithm also shows a future probability distribution that only depends on its present states.
The partition of the 3D space into a defined geometric form enables decreasing the number of control points in the generated trajectory.In addition, the issue of computational cost and complexity has been addressed by providing a solution that generates relatively shorter time responses compared to techniques for generating similar trajectories.
In a future work, an additional processing task will be carried out, using the set of nodes, or control points ρ x (F) generated, to create a smoother path-and then the methodology will be tested under real flight conditions on an UAV model as in [64][65][66].

Figure 1 .
Figure 1.General scenario for 3D unmanned aerial vehicles (UAV) path planning.The grey cubes depict the environmental obstacles.Blue cubes are generated by the environmental discretisation and represent collision-free spaces.Orange lines are possible paths.

Figure 9 .
Figure 9. Flowchart of RR-MACD.Notice how steps 3 to 8 in Figure 2 are re-used in this chart and renamed as singleDecomp.

Figure 10 .
Figure 10.Graphical Results of Example 1.(a) Modeled urban environment cluttered with surrounding buildings.(b) Results for environment #1 with RR-MACD.(c) Final path generated using MACD for environment #2.(d) Final path generated using RR-MACD for environment #2.

Figure 11 .
Figure 11.Network structure and state transition. (a) Collision in time of flight t = 11.(b) New ρ x (F) after first collision.(c) New ρ x (F) after first collision.

Table 1 .
Computational cost and complexity in the graph structure, where n is the number of vertex and m is the number of edges.

Table 4 .
Definition of five different 3D environments for the simulation examples.

Table 5 .
Comparison of both algorithm executions for different environments.Column recursive rewarding modified adaptive cell decomposition (RR-MACD) 4 vs.MACD (%) shows the average resources (decomposition time (s), number of free voxel decomposition "S f ree " and number of nodes in the final path "ρ") used for RR-MACD in comparison with MACD.Column RR-MACD 10 vs. RR-MACD 4 (%) shows the average resources (decomposition time (s), number of free voxels decomposition "S f ree " and number of nodes in the final path) used for RR-MACD when the number of flight parameters (constraints) are augmented to 10.

Table 6 .
Path results in distance metrics.