Optimal Motion Planning in GPS-Denied Environments Using Nonlinear Model Predictive Horizon

Navigating robotic systems autonomously through unknown, dynamic and GPS-denied environments is a challenging task. One requirement of this is a path planner which provides safe trajectories in real-world conditions such as nonlinear vehicle dynamics, real-time computation requirements, complex 3D environments, and moving obstacles. This paper presents a methodological motion planning approach which integrates a novel local path planning approach with a graph-based planner to enable an autonomous vehicle (here a drone) to navigate through GPS-denied subterranean environments. The local path planning approach is based on a recently proposed method by the authors called Nonlinear Model Predictive Horizon (NMPH). The NMPH formulation employs a copy of the plant dynamics model (here a nonlinear system model of the drone) plus a feedback linearization control law to generate feasible, optimal, smooth and collision-free paths while respecting the dynamics of the vehicle, supporting dynamic obstacles and operating in real time. This design is augmented with computationally efficient algorithms for global path planning and dynamic obstacle mapping and avoidance. The overall design is tested in several simulations and a preliminary real flight test in unexplored GPS-denied environments to demonstrate its capabilities and evaluate its performance.


Introduction
Throughout the last century, injuries and fatalities in subterranean environments have remained a major concern around the world. For example, mine workers are vulnerable to hazards such as cave-ins, underground flooding, and gas explosions. Unmanned vehicles can play a key role in performing both tedious and dangerous tasks, for instance air quality sampling, tunnel inspections, and search-and-rescue missions. A flying drone is a particularly attractive platform for underground operations due to its abilities to move quickly, traverse any terrain, navigate through tight spaces, and capture data from any angle. Recent advances in robotics have motivated research into designing novel path planning approaches, allowing the vehicle to plan safe paths and navigate through previously unknown environments.
Path planning is a computational problem to generate and follow a collision-free trajectory from one point to another [1]. It has many applications, such as robotic surgery [2], driverless cars [3], automation [4], and mining [5]. An extensive amount of research has been conducted in the field of path planning for autonomous vehicles [3,6]. However, most of the presented approaches provide non-or sub-optimal solutions and do not account for the dynamics of the vehicle, instead treating it as a kinematic model with velocity inputs [1], for instance a unicycle or kinematic car [7]. Moreover, navigating through dynamic and unknown environments is a challenging task as it requires safe navigation around both static and dynamic obstacles, which adds computational load for the onboard computer of the autonomous vehicle. Nonlinear Model Predictive Control (NMPC [8]) is The contributions of this work are as follows: • A recently proposed trajectory generation algorithm (NMPH) is used for local path planning of the drone. The NMPH is integrated inside the global motion planner and produces optimal local trajectories for the drone vehicle in real-time. • A methodological two-layer global motion planner design is proposed. The first layer utilizes a graph-based planner to generate terminal setpoints for the second layer, which uses the NMPH design to generate continuous optimal paths from the vehicle's current pose to the terminal setpoint in real time. • Efficient algorithms for obstacle mapping and avoidance are proposed which produce models of static and dynamic obstacles used by the NMPH to generate safe and collision-free paths in a dynamically changing environment. • A robust path guidance algorithm is implemented to avoid the risk of NMPH getting trapped into a local minima. • The overall design is implemented using quadcopter and hexacopter drone dynamics, enabling navigation through unknown, dynamic and GPS-denied environments. • Several simulation results and a preliminary experiment are presented in this work to validate the proposed approach.
The remainder of this paper is organised as follows. Section 2 surveys literature related to our work. Section 3 presents the problem formulation of NMPH and its integration with feedback linearization. The two-layer global motion planner design and choices of algorithms to provide robust path planning and obstacle avoidance are discussed in Section 4. System dynamic models of and implementation of our design on rotary-wing drones are presented in Section 5. In Section 6, various simulation and experimental results are presented to evaluate and validate the proposed approach. Finally, concluding remarks are given in Section 7.

Related Work
Path planning for an autonomous robot in an unknown, GPS-denied and dynamically changing environment is a challenging task, since the robot needs to plan trajectories that consider the vehicle's relative motion with respect to the surrounding obstacles. The path planning problem itself has been thoroughly studied in the literature and can be classified into three main categories: search-based, sampling-based, and optimizationbased methods.
The search-based methods, a.k.a. grid-based, discretize the environment map into a graph of grids and use a search algorithm to find a collision-free path through these grids [6]. The two fundamental graph search algorithms are Breadth-First Search (BFS) and Depth-First Search (DFS) [12]. BFS is based on a first-in-first-out queue and can produce an optimal solution if the graph is uniformly weighted. Meanwhile, a last-in-first-out stack is used in DFS until the goal is reached, but no optimality is guaranteed.
One of the most widely used optimal searching algorithms for quickly finding the shortest path is the Dijkstra algorithm [13]. It directs the search towards unvisited nodes, then calculates and updates the shortest distances to the neighbor nodes from the root node. It keeps doing this until all the nodes are visited. Meanwhile, A* [14] is a commonly used algorithm for path planning. A* is an extension to Dijkstra algorithm, where it combines the cost search with heuristics that guide the search towards the goal point to achieve quicker searching performance. Many extensions of A* have been proposed, for instance Lifelong Planning A* (LPA*) [15] was developed to support changes in the environment without recalculating the entire graph, D* Lite [16] extends LPA* to re-plan the path while the robot is moving, Anytime Repairing A* (ARA*) [17] improves the optimality of the path by reusing suboptimal solutions from previous executions, and Hybrid-state A* [18] generates the graph based on the robot dynamics and thus searches for a dynamically feasible path.
The sampling-based methods are considered one of the main motion planning methods for robots with a high number of Degrees-Of-Freedom (DOF) [19]. In these methods, feasible robot poses are randomly sampled to form admissible paths. Probabilistic Road-Map (PRM) [20] and Rapidly-Exploring Random Tree (RRT) [21] are the fundamental sampling-based methods for motion planning. In PRM, a graph is built from random configurations and connected using a local planner (for instance Dijkstra's searching algorithm for the shortest path between two configuration). PRM is complete but does not necessarily provide an optimal path solution. The RRT method is designed to randomly build a space-filling tree of vertices and edges inside a complex environment to find a feasible path to the goal node. However, the RRT-generated paths are not optimal [22]. Asymptotic optimality of paths can be achieved by employing various extensions of RRT, such as RRT* and Rapidly-Exploring Random Graph (RRG) methods [22]. RRG constructs a graph by connecting new samples with all nodes within a specified distance, then finding the shortest path using a local planner such as the Dijkstra algorithm. Meanwhile, RRT* searches the local nodes and finds the shortest path from the start to end nodes.
In general, there are three main limitations of the search-and sampling-based motion planning approaches. First, they do not account for the constraints imposed by the robot dynamics, even if some support kinematic and/or dynamic constraints (e.g., velocity and/or acceleration limits, respectively) [23]. A second limitation is consistency, since for several executions the algorithms may not produce identical trajectories between a start and goal configuration in the very same environment. Third, the computational load of these methods generally prevents them from being able to actively regenerate paths while moving between the start and goal configuration, which makes motion planning a difficult task in dynamic environments. However, some optimization-based methods can overcome these limitations, and the present work is directed at using optimization for planning safe, consistent, and time-efficient paths which also respect the dynamics of the vehicle. This last feature allows generating smooth trajectories for the robot vehicle, avoding the jerky motions and rapidly changing trajectories often generated by other planning methods [1].
The optimization-based approaches solve a constrained non-convex optimization problem to smooth the trajectory generated by other methods. Some optimization-based methods use cost-gradient information of a trajectory's waypoints for refinement purposes, for instance CHOMP [24], Trajopt [25], and STOMP [26]. Other optimization-based methods are more closely related to optimal control, which focuses on system dynamics more than collision prevention. Examples include dynamic programming [27], LQR-based [28], and Model Predictive Control (MPC) [29].
One of the challenges of using an optimization-based path planning approach is accounting for obstacle constraints at each time instant the optimization problem is solved, especially for real-time implementations [30]. For a small number of obstacles, it has been demonstrated that finding local optimal trajectories is possible with MPC in outdoor environments [31]. Conversely, increasing the number of obstacles and considering 3D and dynamic environments makes the optimization problem much more computationally expensive to find feasible paths in real-time.
Our proposed formulation addresses the above challenges in two ways. First, it reduces the non-convexity of the optimization problem by combining the nonlinear plant model with a feedback linearization. Second, it maps obstacles as possibly moving volumes, and adaptively introduces state constraints modeling them in order to efficiently find local solutions. Our proposed design is integrated with a graph-based exploration planner [5] in order to provide global motion planning capabilities.

Nonlinear Model Predictive Horizon for Path Planning
The Nonlinear Model Predictive Horizon is a recently proposed methodology by the authors [9]. The purpose of NMPH is to generate optimal reference trajectories for closed-loop systems, and it will be implemented here for path planning. An overview of the NMPH algorithm is presented next.

NMPH Algorithm for Optimal Trajectory Generation
NMPH is an optimization-based reference trajectory generation technique for nonlinear closed-loop systems. Using a model of the nonlinear plant dynamics plus a feedback linearization control law, the NMPH creates optimal reference trajectories for a closed-loop system as shown in Figure 2. The generated trajectory is continuously updated by accounting for the current state of the system and path constraints within the optimization problem.
The purpose of the nonlinear control law within the NMPH is to reduce the nonlinearity of the system model, and consequently the non-convexity of the optimization problem. This leads to better performance in terms of reduced computational time and better convergence properties, enabling motion planning to be repeatedly computed in real time.  Figure 2. NMPH architecture. A model of the nonlinear system dynamics is used to perform the optimization process within NMPH (gray box). The resulting optimized reference trajectory is passed to the actual closed-loop system (blue box) for tracking purposes.
Consider a continuous-time nonlinear system controlled by a nonlinear control law, where x ∈ X ⊆ R n x are the system states, u ∈ U ⊆ R n u are the system inputs, and ξ ∈ Ξ ⊆ R n ξ are the system outputs (here 3D position and heading, such that Ξ ⊆ X). The reference trajectory is denoted by ξ re f ∈ Ξ. The map f : X × U → X represents the system dynamics and g : X × Ξ → U is the control law that is used to make the system output follow the reference trajectory. The optimization within NMPH uses Equations (1)-(3) plus assigned constraints to solve for the predicted system state trajectoryx (which for us includes the predicted output trajectoryξ as a subset) and the estimated reference trajectory for the closed-loop systemξ re f , all whilex converges to a desired stabilization setpoint x ss . The optimization calculations are summarized in Algorithm 1.
Within Algorithm 1, the optimization problem is solved over the finite time horizon τ ∈ [t n , t n + T]. The cost function is chosen to penalize the deviation of the predicted system state trajectory from the stabilization setpoint x ss , as well as the deviation of the predicted output trajectoryξ from the estimated reference trajectoryξ re f . The weighting matrices W x , W ξ , and W T are chosen by the user to balance the effects of the optimization. The cost function consists of the stage cost function L x(τ),ξ re f (τ) , which represents the cost of the problem over the time horizon, and the terminal cost function E x(t n + T) , which represents the cost of steady-state error. X ⊆ X, U ⊆ U, and Z ⊆ X are the state, control, and trajectory constraint sets, respectively. The inequality constraint O i (x) ≤ 0 is used to model p dynamic obstacles in the environment.
In Algorithm 1, the current system state x(t n ) of the actual system is first measured or estimated, then a prediction of the reference trajectoryξ re f for an admissible control is obtained by minimizing the cost function over the prediction horizon while respecting the system model, control law, and other constraints. Finally, the predicted reference trajectory is sent to the closed-loop system for tracking, and the above process is repeated at the next sampling time instant, possibly with an updated terminal setpoint x ss . Algorithm 1 (NMPH algorithm with stabilizing terminal condition x ss ).
(9) ifx → x ss then (estimated trajectory converging towards terminal setpoint) n ← n + 1; else break; In this paper,ξ is defined as the predicted output trajectory which in our case (drone's position and yaw angle) represents a subset of the predicted state trajectoryx, whileξ re f is the estimated reference trajectory which is computed by minimizing the cost function Equation (4) within the NMPH.ξ re f is used as the reference trajectory for the actual closedloop system, providing smooth trajectories which respect the dynamics of the vehicle. Please note that ξ re f (n) in Equation (3) is taken asξ re f , where the latter is actively updated by the NMPH algorithm in response to events such as new dynamic obstacles. Also, please note thatξ re f andξ converge to each other, and both can be used as a reference trajectory for the actual closed-loop system (here the drone). Further details about the NMPH algorithm can be found in the authors' recent work [9].

Feedback Linearization Control Law
This section reviews the feedback linearization (FBL) control law used within the NMPH structure to improve the prediction performance of the reference trajectory. In this work, a Multi-Input Multi-Output (MIMO) control-affine system (here a drone) is used. The FBL design for this class of systems is summarized below.
Consider a MIMO nonlinear control-affine system of the form, where f , g 1 , . . . , g n u are smooth vector fields in R n x . G(x) is a n x × n u matrix with rank G(0) = n u . The objective of FBL is to transform the nonlinear system (10) into a linear canonical form with a non-singular state feedback [11], which is defined as where β(x) is a smooth function, and β(0) = 0. D(x) −1 is the inverse of a non-singular n u × n u matrix. D(x) and the non-singular state feedback transformation are . . , n u } are smooth functions that form the non-singular matrix D(x). r i are the controllability indices and G is a distribution of vector fields. L f ϕ = dϕ, f is the Lie derivative, L r f means the Lie derivative is applied r times, and ad r f g = [ f , ad r−1 f g] is the Lie bracket between the vector fields f and g. Further details about the non-singular state feedback transformation for MIMO control-affine systems can be found in Theorem 2.7.3 of reference [11].

Motion Planner Architecture
Our proposed motion planning design aims to produce optimal vehicle paths while navigating in unexplored, dynamic and GPS-denied environments. We combine a graph-based exploration technique with a Nonlinear Model Predictive Horizon-based approach based on optimization which respects the vehicle's dynamics and supports dynamic obstacles. This integration yields feasible, optimal, and robust paths while exploring challenging environments. Figure 3 describes the overall architecture of our motion planner. The design is composed of three successive stages. The first stage acquires sensor data to build a physical representation of the environment which contains both static and dynamic objects in it. Volumetric mapping is used for this stage since it is computationally efficient, easy to visualize, can be incrementally constructed and reconstructed online, and provides the voxel grid structure needed for the next stage. Details about the volumetric mapping and its layers will be discussed in Section 4.2. Section 4.3 discusses the second stage of the motion planner, which is built around a graph-based planning approach. It consists of the sampling-based RRG algorithm, which builds a connected roadmap graph, and the Dijkstra searching algorithm to extract the best path from within the graph. The main purpose of the graph-based approach is to guide the vehicle towards unexplored areas within the environment and provide terminal vertices, a.k.a. stabilization or terminal setpoints, to the local path planner.
The third stage of the motion planner uses the NMPH-FBL local path planning method. Fusing this method with the earlier stages improves the robustness of generating optimal paths and avoiding static and dynamic obstacles. The considerations involved in finding a feasible path are shown in Figure 3. Further details are provided in Section 4.4.
The reference trajectory computed by the path planner is fed to the control system of the vehicle for tracking purposes. As the drone vehicle moves, the NMPH continues to update its reference trajectory in response to feedback of the vehicle's state and new obstacles. Once the vehicle reaches a setpoint, the motion planning process is repeated, which continues until the environment is fully explored or the mission is interrupted by the operator. Dijkstra

Volumetric Mapping
Volumetric mapping is the foundation of motion planning and navigation strategies in 3D environments. The volumetric mapping algorithm named Voxblox [32] is used in our work. In this technique, the map of the environment is represented volumetrically using the signed distance field to distinguish between known, unknown, free, and occupied spaces [33]. The grid consists of voxels with a corresponding type. Groups of occupied voxels represent surfaces of an object, walls, and so on. The main advantage of volumetric mapping is its real-time capability for incrementally representing unstructured and unexplored environments, which makes it a suitable solution for online planning and exploration. The Truncated Signed Distance Field (TSDF [34]) is one of the most efficient methods of representing volumetric maps. TSDF is an implicit surface representation that consists of a 3D voxel array. Each voxel is indexed by the distance of the ray between the sensor and the surface, and is truncated near the surface to decrease the errors that are caused by sensor noise. TSDFs are computationally efficient and can be constructed online. Also, they are capable of filtering out sensor noise and create meshes with voxel resolution for visualization purposes.
In contrast to TSDF, the Euclidean Signed Distance Field (ESDF) uses the Euclidean distance to the nearest occupied cell in labeling the voxel grid [32]. ESDFs are directly built out of existing TSDFs to make use of the distance information in determining the obstacle surface location for planning purposes. In other words, TSDF is for mapping and ESDF for planning, and the main difference between them is the way that distances are computed [35].
As presented in Figure 3, the volumetric mapping process consists of three layers. The sensor data is processed to build the TSDF layer, then the voxels are integrated into the ESDF and mesh layers as presented in [32]. The ESDF voxels and mesh blocks are updated incrementally allowing real-time map generation for planning and online visualization of the environment. To reduce the complexity of calculating the layers data, a voxel hashing approach [36] is used to store the information of each layer in a hash table, which results in O(1) complexity for adding or retrieving the data.

Graph-Based Path Planning
In this section, we summarize the graph-based planner presented in [5], which is used to help the vehicle navigate through unknown GPS-denied environments.
Assume that M G is a global 3D voxel-based map, which consists of voxels m ∈ M G . The map is incrementally built by a depth sensor S plus the vehicle's pose estimation using the volumetric mapping approach previously discussed in Section 4.2. The map is categorized into three spaces, free space M f G , occupied space M o G , and unknown space M u G . The map has a global volume V G and is incrementally constructed within a local map sub-space M L of volume V L centered around the current vehicle's output (here 3D position and heading) The graph-based planner [5] performs a local search towards unknown areas of M G . It is based on the sampling-based RRG algorithm [37] which builds a connected roadmap graph G G composed of collision-free vertices ν and edges e stored in vertex set V and edge set E, respectively. The edges are straight paths connecting the vertices using the nearest neighbor search [38]. The global graph G G is continuously constructed from the undirected local graph G L within the local space M L . The local search within the bounded volume V L considers the physical size of the vehicle V R and bounds it within a sub-space M R . Collision detection is performed to ensure collision-free paths σ L , where M R ∈ M f G for all randomly generated vertices and edges.
The set of all shortest paths Σ L , starting from the initial or current vertex ξ 0 to all destination vertices ξ ν , is found using the Dijkstra algorithm [12]. After that, the best path is evaluated by calculating the Volumetric Gain V for each vertex. The Volumetric Gain of a vertex is a measure of the unmapped volume based on the depth reading around that vertex. The weight functions related to distance and direction combined with V are used to compute the Exploration Gain E (σ i ) for all σ i ∈ Σ L , i = 1, . . . , n. The vertices of these paths are ν i j , j = 1, . . . , m i , and ν i 0 is the initial vertex along path σ i . The Exploration Gain for a path is calculated as [5] where S(σ i , σ sp ) is a distance factor between a path σ i and its corresponding straight path σ sp which has the same length along the estimated exploration direction. This factor prevents the vehicle from sudden changes in its exploration direction which might happen in branched environments within M L . D(σ i , σ sp ) is the distance between ν i j and ν i 0 of the path σ i , which penalizes longer paths for a higher exploration rate. The tunable gains λ S and λ D are positive gains.
Subsequently, the best path σ best that maximizes the Exploration Gain is selected and sent to the NMPH-FBL local motion planner to find the optimal path that the vehicle will follow. The whole procedure is repeated once the vehicle reaches the destination vertex. The detailed algorithm for building the map and planning the best path is presented in Algorithm 2.
If all vertices within G L are explored, the search will be expanded to the unexplored vertices of G G . This will guide the vehicle to another location on the global map and continue the exploration mission. For the return-to-home feature, the Dijkstra algorithm is also used to find the shortest path between the vehicle's current output ξ 0 and the homing vertex on G G . This feature can be invoked once the exploration mission is completed, the battery level is low, or by intervention from the operator. ξ nearest ← NearestVertex(G L , ξ rand ); 8: if CollisionFree(ξ rand , ξ nearest ) then 9: V ← V ∪ {ξ rand }; 10: E ← E ∪ {ξ rand , ξ nearest };

Nmph for Local Path Planning
The graph-based planner in Section 4.3 generates non-optimal or sub-optimal paths because the vertices are created randomly within V L . In addition, the straight edges connecting vertices cause jerky motions for a drone following the path. Finally, the path generated by the graph-based planner does not respect the vehicle's dynamics. The NMPHequipped path planning approach presented in Algorithm 3 overcomes these issues by generating a path which respects the system's dynamics and provides a smooth and optimal path which also avoids obstacles. From Figure 3, the NMPH path planning stage includes end for 10: end for 11: C L ← ObstacleConstraint(ν term , ξ 0 , M obs ); Find the obstacles constraints 12: σ opt ← NMPH_Planning(ν term , ξ 0 , C L ); 13: if σ opt not f easible then 14: for i = 1, . . . , n do Path Guidance Algorithm 15: ν i ← ExtractVertex i (σ best ); 16: σ opt ← NMPH_Planning(ν i , ξ 0 , C L ); 17: end for 18: if σ opt not f easible then 19: σ opt = σ best ; 20: end if 21: end if 22: PathFollowing(σ opt ); Follow the optimal path

Dynamic Local Obstacle Mapping
Transforming physical obstacles within the volumetric map to optimization constraints is a challenging task. These obstacles need to be represented by a cluster of constraints while respecting the limitations of the optimization process, specifically a limit on the number of inequality constraints that the optimization problem can handle.
In this Section, we will present a strategy that maps obstacles in the environment into a dynamically moving space around the vehicle. This facilitates representing the obstacles as inequality constraints for optimization. This mapping technique, called Dynamic Local Obstacle Mapping (DLOM), generates a continuously changing map M obs .
Based on the occupied voxel in M o G , the DLOM strategy generates virtual spheres centered on occupied voxels within a certain space surrounding the vehicle (e.g., a box of dimensions D obs ). These virtual spheres have a radius which ensures a safe clearance between the vehicle sides and the occupied voxel. Figure 4 shows the volumetric map without and with DLOM. One advantage of using a sphere is for modeling the obstacle as a state constraint. This inequality constraint requires r i obs , the distance between the vehicle and the center of the i th sphere, to be larger than a specific threshold r thld representing the radius of the virtual sphere as r i obs ≥ r thld . The solution of the optimization problem within NMPH will thus generate a path that doesn't pass through the virtual spheres and hence avoids the obstacles in the environment.  Modeling all occupied voxels in D obs as obstacles would result in an excessively large computational burden to continuously generate M obs and solve the optimization problem. Instead, any voxels inside the ith sphere are excluded from M obs . Lines (3-10) of Algorithm 3 employ a simple running window strategy to remove extra voxels, and those remaining are represented as virtual spheres which provide constraints to the optimization problem. Figure 5 shows how the extra spheres are removed to reduce the computational load involved in producing the obstacles map. The exact time needed to build the dynamic obstacles map depends on the number of occupied voxels within D obs . Experimentally, we found that the time required to build such a map on a desktop-class machine with a modern GPU (detailed specifications are given in Section 6.1) takes approximately 3 ms.

Obstacle Avoidance Algorithm
As soon as the obstacle map is created, the NMPH creates an optimal local path respecting the constraints acquired from M obs . The optimization solver is limited in the number of inequality constraints it can handle, making it impossible to include all the mapped obstacles in M obs within the optimization problem. Hence, a dynamic method for selecting a specific number of constraints is described next and included in Algorithm 4. if k is n T then 12: k = 1; 13: end if 14: end if 15: end for 16: end for 17: Our chosen solver provides a solution to the optimization problem every ∼4 ms (running on the computer described in Section 6.1), which makes it possible to solve the problem several times before sending the optimum reference path to the vehicle's flight control system. The Obstacle Avoidance algorithm takes advantage of this by first solving the optimization problem without considering obstacle constraints, then running a collision check on the generated path to find whether it crosses any virtual spheres in M obs . It is important to mention that the collision check is performed over the entire optimization horizon [t n , t n + T] in Algorithm 1, which is discretized into N points for numerical computation.
If a collision is detected at some points within the optimization horizon, a Dynamic Constraint Array registers the center of a sphere s ∈ R 3 that contains these collision points, and passes them to the solver as inequalities used to compute a new solution which avoids them. The Dynamic Constraint Array has dimensions of N × 3 and can register up to N different inequality constraints for the next run of the optimization problem. For example, assume that a collision is detected on horizon points 3, 4 and 5, and each of the collision points are located within the 40th virtual sphere. In this case, the coordinates of the center of this sphere are registered in the Dynamic Constraint Array at indices 3, 4 and 5, while the rest of the array entries are kept Null. In the next iteration of the solver, a new constraint representing the cloned entries of the Dynamic Constraint Array will yield a path which avoids the region where the collisions previously occured.
To enhance the reliability of the Obstacle Avoidance algorithm while the vehicle is in motion, a specific number of Temporary Constraint Arrays (labeled by k in Algorithm 4) store the information from the Dynamic Constraint Array and are used in the optimization solution as well. The Temporary Constraint Arrays are static, which means that each registers only one virtual sphere over all its N indices.

Robust Path Guidance Algorithm
The initial state of the vehicle, the nature of the environment (e.g., branched narrow passages), and the terminal condition location may all affect the feasibility of the optimization problem solution. Figure 6 depicts two different path planning scenarios. In Figure 6a, the obstacle is small and NMPH can easily find a feasible solution. In Figure 6b, the obstacles almost block the way to the destination point. In this situation, the NMPH solver risks producing infeasible solutions by getting trapped in local minima.
As mentioned in Section 4.3, the graph-based path planning yields multiple vertices, which are used by the NMPH approach to generate multiple feasible paths, ranging from the nearest to the most distal (terminal) vertex. The small obstacle depicted in Figure 6a does not cause any issues for the NMPH in generating a feasible path directly to the terminal vertex. However, Figure 6b illustrates how the NMPH algorithm uses multiple consecutive paths (gray lines) generated to the intermediate vertices of the path generated by the graph-based planner (green line) to eventually find the resulting optimal path (blue line). Lines 12-21 in Algorithm 3 demonstrate the Path Guidance algorithm that adds robustness to the NMPH approach in finding a feasible solution. Note in case the Path Guidance algorithm is unable to help NMPH find a feasible path to the terminal vertex, the system can still use the path generated by the graph-based planner.

System Model
In this section, both a quadcopter and a hexacopter system are modeled as rigid bodies with lumped force and torque inputs at each rotor. For simplicity, drag forces, rotor gyroscopic effects, and propeller dynamics are not included in the models. The rigid-body dynamics are formulated using the Newton-Euler equations [39].
A fixed navigation frame N and a moving body-fixed frame B are the two reference frames used in this work and their basis are selected to follow the North, East, and Down (NED) aerospace convention. Schematics of the drones with their body-fixed reference frames and basis are depicted in Figure 7. The dynamics of a rigid body moving through 3D space is represented by rigid-body kinematics and the Newton-Euler equations as shown below [39], where p n ∈ R 3 is the vehicle's position and v n ∈ R 3 is its velocity, both in coordinates of the inertial navigation frame N . The mass moment of inertia matrix J is assumed to be diagonal as J = diag([J 1 , J 2 , J 3 ]), and m is the total mass of the drone. ω b ,ω b are the angular velocity and acceleration vectors, respectively, in coordinates of the body-fixed frame. The rotation matrix of B with respect to N is R = R nb ∈ SO(3). S(·) ∈ R 3×3 is a skew-symmetric matrix such that S(x)y = x × y, x, y ∈ R 3 . The system input vector is [ū, τ b ] T , whereū > 0 is the net thrust from all rotors in the direction of −b 3 , and τ b = [τ b1 , τ b2 , τ b3 ] T are the torques created by the rotors about the three body frame axes.
It is important to mention that each of the vehicle configurations (quadcopter and hexacopter) transforms the rotors' thrusts and torques to the system input vector [ū, τ b ] T differently. These transformations are assumed to be performed in the onboard flight controller, and consequently both configurations are represented by the same dynamics Equation (15). Hence, the proposed algorithm development is the same for both configurations.

Development of NMPH on a Drone Vehicle
The state and input vectors of the rigid-body dynamics presented in Equation (10) are Using the roll-pitch-yaw (φ, θ, ψ) Euler angle parameterization of R ∈ SO(3), the nonlinear control-affine representation of Equation (10) can be expressed aṡ where x 10 x 12 where s (·) = sin(·), c (·) = cos(·), and t (·) = tan(·). In order to make the system in Equation (17) state feedback linearizable, the state vector is augmented with two additional states, which are the thrust x 13 =ū and its rate x 14 =u, and the thrust is replaced byü in the input vector [9]. Moreover, integral states ζ are added to the system dynamics to compensate for unmodeled external disturbances which affect the control system and NMPH performances. The proposed extension of the system is presented in Equations (18) and (19).
Finally, the feedback linearization control inputs are found using Equation (13), which arë where the feedback inputs are selected as follows, and the error e z i is defined as the difference between the desired and the actual feedback state e z i = z re f i − z i , i = 1, . . . , n. As presented in Equations (15) and (21), the drone's behavior is described by its nonlinear system dynamics and the feedback linearization control. The optimization within NMPH exploits their integration to enhance the performance of generating the reference trajectory. The continuous-time NMPH presented in Algorithm 1 is solved using a multiple shooting optimization technique. The solver used in our work, ACADO [40], discretizes the system dynamics, control law, and inequality constraints over the prediction horizon at each time instant t n . Figure 8 shows the optimization process from the problem formulation to the trajectory generation.

Nonlinear Program (NLP)
Solves the optimization problem that includes nonlinear function and/or nonlinear constraints using Sequential Quadratic Programming (SQP)

Trajectory Generation
Continuous online generation of the desired trajectory for the system Figure 8. NMPH Optimization Process. The non-convex optimization problem can be solved iteratively using Sequential Quadratic Programming (SQP) [41]. In SQP, the problem is divided into a sequence of subproblems, each of which solves a quadratic objective function subject to linearized constraints [42].
The integration of the feedback linearization within the NMPH aims to reduce the non-convexity of the optimization problem. A perfect model of the system dynamics would allow the closed-loop form to be represented by a linear canonical form as shown in Equation (20), for which the feasibility and stability of the optimized solution are guaranteed and the computational power needed to solve the optimization problem is greatly reduced over the non-convex case. However, even an imperfect model still reduces the non-convexity of the optimization problem as compared to working directly with the (nonlinear) plant dynamics as in standard NMPC.

Experimental Results
In this section, simulation and a preliminary real-time hardware flight test are presented to evaluate and validate the proposed design on quadcopter and hexacopter vehicles while operating in GPS-denied environments.
The algorithms are implemented within the Robot Operating System (ROS) [43], a Linux-based system that handles communication between the individual subsystems and the vehicle. The ACADO Toolkit [40] is used for optimization. The optimization problem is programmed in a self-contained C++ environment within this toolkit, then a real-time nonlinear solver is generated to run the optimizations online. The resulting code can be compiled and run within ROS, which also handles the communication between the solver and the vehicle, either simulated or real [44]. The NMPH optimization problem (4)-(9) was written in C++ code using ACADO, then automatically converted into a highly efficient C code that is able to solve the optimization problem in real-time.

Simulation Results
In order to test the proposed approach before implementing it on a real drone, the quadcopter drone vehicle is simulated within AirSim [45]. AirSim is an open-source simulator that provides photo-realistic environments plus a physics engine to enable performing lifelike simulations.
All frameworks and the AirSim simulator are run in ROS on an Intel Core i7-10750H CPU @ 2.60-5.00 GHz equipped with the Nvidia GeForce RTX 2080 Super Max-Q GPU. The prediction horizon for NMPH was set to T = 8 s using a sampling time of 0.2 s, which was found satisfactory for trajectory generation. The cost function weights W x , W ξ , and W T were adjusted heuristically to ensure a balanced trajectory generation performance towards the terminal setpoint.
The drone's measured pose and pointcloud information are obtained from the AirSim simulator and sent to the motion planner. The global graph-based and local NMPH planners generate reference trajectories for the vehicle, which are forwarded to AirSim for trajectory tracking purposes. RViz, the 3D visualization tool for ROS, is used to monitor and visualize the simulation process. Figure 9 shows the ROS network architecture of the nodes and topics employed in performing our simulation. Different simulation results are now presented to evaluate the performance of the proposed approach on a quadcopter drone navigating autonomously through a previously unexplored, GPS-denied underground environment available within the AirSim simulator. The motion planner design illustrated in Figure 3 is implemented for this purpose. Within AirSim, the virtual quadcopter is equipped with a customized 32-channel 360 • scanning Lidar sensor with a 45 • vertical field of view, 10 Hz rotation rate, and 50 m range. The pointcloud data plus the vehicle pose are acquired and used to build a volumetric map of the environment and locate the vehicle within it.
As discussed in Section 4.3, the graph-based planning algorithm guides the vehicle towards unexplored areas within the map and provides vertices as terminal setpoints x ss for the NMPH local path planner. The design's robustness is increased by implementing the Obstacle Avoidance and Path Guidance algorithms proposed in Sections 4.4.2 and 4.4.3, respectively. Finally, the generated reference path from the motion planner is sent to the AirSim quadcopter for tracking. The NMPH continues updating the path during the vehicle's movement toward a setpoint. This allows to avoid dynamic obstacles and improves the tracking performance. Once the vehicle reaches a setpoint, the planning process is repeated until the environment is fully explored or the mission is interrupted by the operator. Figure 10 shows the paths generated by the graph-based and the NMPH path planners. The NMPH is seen to provide a smooth and optimal path as compared to the sharp-corner path generated by the graph-based planner. Moreover, the NMPH keeps updating the path dynamically from the start to the terminal point at a rate of up to 100 Hz, while the graphbased planner generates only one path between the two points. To reduce computational load, the NMPH algorithm rate is set to 5 Hz, which was found to be suitable in generating continuous and smooth paths in the environment. This rate of path regeneration also provides good path following performance in the presence of static and dynamic obstacles. A portion of the overall tracked trajectory between multiple vertices using the NMPH algorithm can be seen in Figure 11. Respecting the system dynamics provides smooth flight paths and thus reduces power consumption caused by abrupt changes in the trajectory.
The DLOM generates a continuously changing obstacle map modeled by virtual spheres as depicted in Figure 12. As discussed in Sections 4.4.1 and 4.4.2, mapped obstacles are represented by (continuously updated) inequality constraints within the optimization problem. The Obstacle Avoidance Algorithm helps in creating and updating a path that avoids the obstacles as shown in Figure 12. In the next simulation test, the quadcopter autonomously navigates an unexplored GPS-denied environment. Figure 13 shows the exploration mission performed by the quadcopter. The drone travels a total distance of 774.5 m while following smooth trajectories generated by our proposed algorithm. Meanwhile, the graph-based planner generated paths with a total length of 993.1 m for the same exploration mission. Table 1 and Figure 14 offer a mission performance comparison between using the graph-based planner solo versus using the graph-based planner integrated with our NMPH approach in terms of exploration time, average computation time of the generated paths, path lengths between terminal vertices, and average and total length of the generated paths. This comparison shows the impact of using the NMPH algorithm for reducing power consumption, total mission time, and unwanted abrupt motions while following the generated reference paths.  In the final simulation test, obstacle avoidance for a moving object is tested while the drone navigates through the environment. This is shown in Figure 15, where the continuous regeneration of the path by the NMPH algorithm enables the drone to safely navigate to the stabilization point. Figure 15. Obstacle avoidance for a moving object. The object (sphere) is moving to the left. The NMPH regenerates the red path continuously to avoid the object. The blue curve represents the flight trajectory of the drone.

Preliminary Real-Time Flight Test Results
For real-time hardware testing, a FlameWheel F550 hexacopter (DJI Technology, Shenzhen, China) was used. The vehicle is equipped with a Pixhawk 2.1 autopilot control board running the PX4 flight control software [46], and an onboard Jetson TX2 (NVidia, Santa Clara, CA, USA) computer running ROS. The communication between ROS and PX4 is established through MAVLink. A RealSense T265 stereo camera and a RealSense L515 LiDAR camera (Intel, Santa Clara, CA, USA) are mounted on the hexacopter to provide pose and RGB-D pointcloud data, respectively.
The preliminary flight test evaluates the path generation performance of NMPH running onboard a real drone, whose Jetson TX2 has lower computational power than the computer used in simulation. Also, local trajectory tracking and the functionality of the motion planner are tested in this experiment, as shown in Figure 16. Note that hardware flights in large-scale areas will be performed and evaluated in future work. In the current setup, the optimization solver within NMPH was able to provide continuous regeneration of the reference trajectories at rates approaching 70 Hz. This rate was reduced to 5 Hz to minimize the computational load on the onboard system. The graphbased motion planner was also tested by generating terminal points for NMPH. The latter sent the predicted reference trajectories to the onboard PX4, and the hexacopter was able to follow them. Figure 17 shows the hexacopter following a continuously regenerated reference path to a terminal vertex.

Conclusions
This paper presented a methodological motion planning approach for drone exploration in GPS-denied environments, which integrates our recently proposed NMPH path planning approach with a graph-based planner. The NMPH formulation employs the nonlinear system dynamics model with feedback linearization control inside an online optimization-based process to generate feasible, optimal and smooth reference trajectories for the vehicle. The performance of the overall motion planner is increased by introducing methods for robust path generation and dynamic obstacle mapping and avoidance.
The developed motion planner was evaluated through a series of simulation flights as well as a real-time hardware flight test to validate the performance of the proposed design on quadcopter and hexacopter drones navigating the environment. The results show the ability our algorithm to improve motion planning performance over conventional techniques and generate smooth and safe flight trajectories in a computationally efficient way.
Future work will include testing the proposed motion planning methodology inside large-scale GPS-denied environments such as subterranean mines.