Nonlinear Model Predictive Horizon for Optimal Trajectory Generation

Abstract: This paper presents a trajectory generation method for a nonlinear system under closed-loop control (here a quadrotor drone) motivated by the Nonlinear Model Predictive Control (NMPC) method. Unlike NMPC, the proposed method employs a closed-loop system dynamics model within the optimization problem to efficiently generate reference trajectories in real time. We call this approach the Nonlinear Model Predictive Horizon (NMPH). The closed-loop model used within NMPH employs a feedback linearization control law design to decrease the nonconvexity of the optimization problem and thus achieve faster convergence. For robust trajectory planning in a dynamically changing environment, static and dynamic obstacle constraints are supported within the NMPH algorithm. Our algorithm is applied to a quadrotor system to generate optimal reference trajectories in 3D, and several simulation scenarios are provided to validate the features and evaluate the performance of the proposed methodology.


Introduction
For autonomous vehicle systems, the ability to generate and track collision-free trajectories in unknown environments is a crucial task that is attracting both researchers' and companies' attention in a world witnessing steep advancements in this area. The need to generate real-time trajectories for highly nonlinear systems puts an emphasis on finding ways to efficiently predict trajectories which respect the system dynamics as well as internal and external constraints. One methodology which meets these requirements is nonlinear model predictive control.
Nonlinear Model Predictive Control (NMPC) is the nonlinear variant of Model Predictive Control (MPC), which was originally developed for process control applications and can be traced back to the late 1970s [1,2]. MPC, sometimes called moving horizon or receding horizon optimal control, involves using a system dynamics model to predict a sequence of control inputs over a time interval known as the prediction horizon. For nonlinear systems, NMPC can be used as an optimization-based feedback control technique [3], and it has been viewed as one of the few control strategies which can account for state and input constraints and respect the nonlinearities and coupling of the system dynamics.
NMPC is employed in applications including setpoint stabilization, trajectory tracking, and path following. In setpoint stabilization, the system is controlled to converge to a specified setpoint within a terminal region. NMPC for setpoint stabilization has been used in many applications, such as fluid level and temperature control [4,5]. In trajectory tracking, the system must track a time-varying reference, which is a more challenging problem. Some examples of this taken from the fields of aerial vehicles and medicine are presented in [6,7]. Meanwhile, the path following problem considers time-invariant reference trajectories, where the goal is to achieve the best possible tracking of the geometric path regardless of the time taken. A common example of a path following problem is controlling the end-effector of a robot manipulator to follow a prescribed path within its workspace, which is treated using MPC in [8,9]. Ref. [10] provides a comprehensive overview of the setpoint stabilization, trajectory tracking, and path following problems and their relevant features and challenges.
Within the model predictive control approach, the optimization problem solves for both the input and the state of the system over a finite time horizon. These local plans can then be combined online to generate a prediction of the state trajectory, which is used for motion planning [11]. The work in [12] considers the problem of point-to-point trajectory generation using linear MPC, while NMPC is used for trajectory optimization and tracking control in [13], where it solves the nonlinear problem using an iterative sequential linear quadratic algorithm to obtain the optimal feedforward and feedback control inputs to a hexacopter vehicle.
In the present work, we focus on the reference trajectory generation problem for a nonlinear system (in our case a drone) under closed-loop control by an existing control law design, which may be linear (e.g., a PID-based design) or nonlinear (e.g., the geometric tracking controller in [14]). Our paper presents the development of an algorithm which allows the generation of optimal trajectories based on the NMPC approach, but using a closed-loop system model consisting of the nonlinear plant connected to a state feedback linearization (FBL) control law. This proposed formulation is called Nonlinear Model Predictive Horizon (NMPH). The purpose of employing FBL within the NMPH is to reduce or eliminate the nonconvexity of the optimization problem relative to working directly with the nonlinear plant dynamics as in standard NMPC.
Please note that while NMPC provides an input to a nonlinear plant, our proposed NMPH computes a reference trajectory to be tracked by a closed-loop system consisting of a nonlinear plant connected to a feedback control law. This makes NMPH ideally suited for drones, which run a closed-loop control law in their onboard firmware to obtain a stable hover for the vehicle. Employing an NMPC design thus requires bypassing the onboard control system, which can lead to a loss of the drone in scenarios where the computer running the NMPC code has an operating system crash or timeout. An alternative is to include a model of the onboard control law into the plant model used by the NMPC, but the full details of this control law are often kept proprietary (as seen in [15], for instance), which in turn requires employing system identification techniques and their resulting uncertain models.
To understand the differences between the proposed method and approaches based on model predictive control, Table 1 shows a comparison between NMPC, NMPH and NMHE (Nonlinear Moving Horizon Estimation). NMHE is an optimization-based technique that inputs measurements, which may contain noise and other uncertainties, and outputs estimates of the nonlinear system state or parameters. Further information about NMHE is given in [16].
The research contributions of this paper are:
• Formulating a novel motion planning approach (named NMPH), which employs a model of a closed-loop system under feedback linearization to efficiently solve for the optimal reference trajectory of the target closed-loop system;
• Designing a feedback linearization control law for a model augmented with integral states to achieve more robust performance in the presence of modeling uncertainties;
• Implementing support for static and dynamic obstacles within the NMPH, enabling collision-free reference trajectory generation in unknown, dynamic environments;
• Validating the ability of the system to generate optimal trajectories for the quadrotor vehicle in real time using realistic flight environment simulation scenarios.
The remainder of this paper is structured as follows. Section 2 presents the formulation of NMPH and the design of its state feedback linearization control law. The application of the algorithm to a quadrotor vehicle is presented in Section 3. Section 4 presents different simulation scenarios to evaluate and validate the proposed approach. Concluding remarks are given in Section 5.

Nonlinear Model Predictive Horizon
Our proposed Nonlinear Model Predictive Horizon (NMPH) is an optimization-based trajectory generation method based on Nonlinear Model Predictive Control (NMPC). Unlike NMPC, which yields an optimal feedback control law for a nonlinear system, the objective of NMPH is to generate optimal reference trajectories that a closed-loop system can follow.
The goal of NMPH is to generate a smooth trajectory which is continuously updated by solving an Optimal Control Problem (OCP) in real time while respecting the state and input constraints of the closed-loop system. The resulting optimization problem will be referred to as an Optimal Trajectory Problem (OTP).
An overview of the NMPH architecture is depicted in Figure 1. The Nonlinear System Model and the Nonlinear Control Law, representing the model of the plant and the upcoming feedback linearization control law design, are both involved in the solution of the optimization problem. The NMPH inputs the current system state and a desired stabilization setpoint, and outputs an optimal reference trajectory by solving the OTP at each time instant t_n. Our proposed NMPH method:
• Predicts the trajectory of a nonlinear closed-loop system;
• Works in real time over a specified time horizon;
• Uses a feedback linearization control law to reduce the nonconvexity of the optimization problem;
• Supports state and input constraints of the closed-loop system, and is able to account for environmental constraints such as dynamic obstacles;
• Assumes that a stable terminal point is specified and that the state vector of the closed-loop system is available (measured or estimated);
• Provides a combination of stabilization and tracking functionality:
  - Stabilization: provides a solution which guides the closed-loop system to a specified setpoint or terminal condition;
  - Tracking: generates a smooth reference trajectory for the closed-loop system to track or follow.
The NMPH provides a dynamic parameterization of the reference trajectory, yielding a continually evolving optimal reference trajectory from the current state of the closed-loop system to the terminal setpoint which respects the system dynamics and environmental constraints such as dynamic obstacles.
In the following subsections, a formulation of the NMPH using a state feedback linearization control law is presented which produces optimal reference trajectories. First, the proposed NMPH algorithm structure and its constraints are discussed, then the design of the state feedback linearization control law is performed.

NMPH Algorithm
Consider a discrete-time model of the nonlinear closed-loop system at time instant t_n, where x(n) ∈ X ⊆ R^{n_x} are the system states, z(n) ∈ Z ⊆ R^{n_z} are the system outputs, u(n) ∈ U ⊆ R^{n_u} are the system inputs, and z_ref(n) ∈ Z is the reference trajectory at time instant t_n. We assume that the system outputs are a subset of the system state vector, Z ⊆ X; in our case they are the drone's position and yaw angle. The map f : X × U → X represents the discrete-time plant dynamics, so x(n + 1) = f(x(n), u(n)) are the states at the next sampling instant, and g : X × Z → U is the control law used to steer the system output to follow the reference trajectory.
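To make the closed-loop pair (f, g) concrete, the following minimal sketch steps a hypothetical 1-D double-integrator plant under a PD-like tracking law. These are stand-ins for the quadrotor and its controller; the gains and sampling interval are assumptions for illustration only.

```python
import numpy as np

DT = 0.05  # sampling interval (assumed)

def f(x, u):
    """Discrete-time plant dynamics x(n+1) = f(x(n), u(n))."""
    pos, vel = x
    return np.array([pos + DT * vel, vel + DT * u])

def g(x, z_ref):
    """Control law u(n) = g(x(n), z_ref(n)); a PD-like tracking law."""
    pos, vel = x
    return 4.0 * (z_ref - pos) - 2.0 * vel

def closed_loop_step(x, z_ref):
    """One step of the closed-loop model used inside the NMPH optimizer."""
    u = g(x, z_ref)
    x_next = f(x, u)
    z = x_next[0]  # the output is a subset of the state (position here)
    return x_next, z

x = np.zeros(2)
for _ in range(200):  # track a constant reference z_ref = 1
    x, z = closed_loop_step(x, 1.0)
```

Iterating `closed_loop_step` with a constant reference drives the output toward it; this closed-loop convergence is what the NMPH optimizer exploits when searching over reference sequences.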
A closer look at the NMPH structure is shown in Figure 2. The OTP solver uses the plant dynamics (1a), output (1b) and control law (1c), plus any applicable constraints (e.g., obstacles in the environment), to predict the sequence of future states and outputs of the closed-loop system model over the finite time horizon. To differentiate between the variables of the actual nonlinear system and its model within the NMPH, the latter uses the subscript k, as seen in Figure 2 and discussed below in Algorithm 1. As shown in Figure 2, the sequence x_k provides a prediction of the trajectory of the system (1) between its current state x(n) and the stabilization setpoint x_ss, and is regenerated each time the OTP is solved. Meanwhile, the sequences z_k and ẑ_k calculated by the OTP are defined as follows:
• z_k is the predicted output trajectory sequence, which represents a subset of the vector entries of the state sequence x_k (in our case the quadrotor's position and yaw angle). It is important to distinguish between z(n), the current output of the actual closed-loop system, and z_k, the predicted output sequence produced by the OTP solution.
• ẑ_k is the estimated reference trajectory sequence, which is calculated by solving an optimization problem inside the NMPH. Using ẑ_k as the reference trajectory for the actual closed-loop system yields smoother flight paths plus the ability to deal with constraints such as obstacles in the environment. In this way, z_ref(n) in (1c) acquires its value from the first predicted point of the ẑ_k sequence.
As illustrated in Figure 3, the trajectory sequences z k andẑ k will converge to each other and towards the terminal point. Thus, either of them can be taken as the reference trajectory for the actual closed-loop system.
The predicted future of the closed-loop system's behaviour is optimized at each sampling instant n and over a finite time horizon k = 0, 1, …, N − 1 of length N ≥ 2. The system is assumed to follow the first j elements of the predicted optimal trajectory sequence until the next sampling instant, at which time the trajectory is recalculated, and so on. The predicted state x_k and control input u_k sequences at each time instant n are calculated from the closed-loop model as

    x_{k+1} = f(x_k, u_k),  u_k = g(x_k, ẑ_k),  x_0 = x(n).

Now, consider µ(x_k, u_k) : R^{n_x} × R^{n_u} → R^{n_z}, the mapping from the predicted state and control law sequences to the estimated trajectory sequence. This is written as

    ẑ_k = µ(x_k, u_k).

Our objective is to use an optimization methodology to determine the estimated reference trajectory ẑ_k and state x_k (whose subset is z_k) sequences that both converge to the stabilization setpoint x_ss = x_ref_N. To do this, a cost function J(x_k, ẑ_k) is chosen to penalize the deviation of the system states from the reference, and of the predicted output from the estimated reference trajectory, as shown below in Algorithm 1 and the cost function in (5).
Algorithm 1 NMPH algorithm with stabilizing terminal condition x_ss.
while x(n) ≠ x_ss do
    Solve the following Optimal Trajectory Problem:
        min_{x_k, ẑ_k}  E(x_N) + Σ_{k=0}^{N−1} L(x_k, ẑ_k)                    (4a)
        subject to  x_{k+1} = f(x_k, u_k),  x_0 = x(n),                        (4b)
                    u_k = g(x_k, ẑ_k),  ẑ_k = µ(x_k, u_k),                     (4c)
                    x_k ∈ X,                                                   (4d)
                    u_k ∈ U,  ẑ_k ∈ Z,                                         (4e)
                    O_i(x_k) ≤ 0,  i = 1, …, p;                                (4f)
    if x_k → x_ss then (estimated trajectory converging to terminal condition)
        n ← n + 1;
    else
        break;
    end
end
In Algorithm 1, N is the prediction horizon, x(n) is the current measured state at time instant t_n, which represents the initial condition of the OTP, and L(·, ·) and E(·) are the stage cost function and the terminal cost function, respectively. The constraint sets X, U, and Z will be defined in Section 2.2, and the inequality constraints O_i(x_k) ≤ 0 model a set of p static and dynamic obstacles. The optimization process of Algorithm 1 is summarized as follows:

1. Measure or estimate the actual closed-loop system's current state x(n);
2. Obtain a prediction of the reference trajectory sequence ẑ_k for an admissible control input by minimizing the cost function over the prediction horizon subject to the dynamics of the closed-loop system plus state and input constraints;
3. Send the predicted reference trajectory sequence to the closed-loop system for tracking;
4. Repeat until the system reaches the desired terminal point or encounters an infeasible optimization solution.
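The four steps above can be sketched as a receding-horizon loop. The OTP solver is replaced here by a hypothetical straight-line planner and closed-loop tracking by a simple first-order update, so the loop is runnable; a real implementation would call an NLP solver at step 2 and the actual vehicle would perform step 3.

```python
import numpy as np

N = 20   # prediction horizon length (assumed)
J = 5    # number of leading trajectory points tracked before re-planning

def solve_otp(x_now, x_ss):
    """Stand-in OTP: returns an estimated reference sequence of length N."""
    return np.linspace(x_now, x_ss, N)

def track(x_now, z_ref):
    """Stand-in closed-loop tracking: move a fraction of the way to z_ref."""
    return x_now + 0.8 * (z_ref - x_now)

x, x_ss = 0.0, 10.0
while abs(x - x_ss) > 1e-3:          # step 4: repeat until the terminal point
    z_hat = solve_otp(x, x_ss)       # steps 1-2: re-plan from the current state
    for z_ref in z_hat[1:J + 1]:     # step 3: follow the first J points
        x = track(x, z_ref)
```

Only the leading J points of each plan are consumed before re-planning, which is the mechanism the text uses to trade tracking performance against OTP solution rate.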
Within NMPH, convergence can be achieved by proper choices of the stage cost L(x, ẑ) and the terminal cost E(x) for a setpoint stabilization problem [10]. The requirements for these cost functions are summarized below in Assumption 1:
• The stage cost is continuous and bounded from below, meaning that L(x, ẑ) ≥ α for all (x, ẑ) ∈ X × Z \ {0, 0} and L(0, 0) = 0.
• The terminal cost is a positive semi-definite function which is continuously differentiable and satisfies (∂E/∂x) f(x, g(x, ẑ)) + L(x, ẑ) ≤ 0 within the terminal region.
• The terminal constraint set E_N is a subset of the state constraint set X and it is compact.
• For every x_N ∈ E_N, there exist an estimate of the reference trajectory ẑ_k and a predicted output trajectory z_k sequence which both converge to the terminal setpoint and stay within the terminal region E_N.
Our cost function is chosen to penalize the deviation of states from their reference values, and the deviation of the predicted output trajectory from the estimated reference trajectory, as follows:

    J(x_k, ẑ_k) = ‖x_N − x_ref_N‖²_{W_N} + Σ_{k=0}^{N−1} ( ‖x_k − x_ref_k‖²_{W_x} + ‖z_k − ẑ_k‖²_{W_z} ),    (5)

where x_ref_k ∈ X is the reference state sequence used in the optimization problem. The terminal cost term with its weighting matrix W_N steers the system towards the stabilization setpoint x_ss = x_ref_N, while the stage cost function L(x_k, ẑ_k) uses the weighting matrices W_x and W_z to penalize deviations of the states and outputs, respectively. The entries of the weighting matrices are selected to adjust the relative importance of these three factors for the optimization problem.
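A cost of this structure — weighted quadratic stage terms plus a weighted terminal term — can be sketched as below. The dimensions, weights, and sequences in the usage lines are illustrative assumptions, not the quadrotor's values.

```python
import numpy as np

def quad(e, W):
    """Weighted quadratic form ||e||^2_W = e^T W e."""
    return float(e @ W @ e)

def nmph_cost(x_seq, x_ref_seq, z_seq, z_hat_seq, x_ss, W_x, W_z, W_N):
    """Quadratic stage costs over the horizon plus a terminal cost."""
    stage = sum(quad(x - xr, W_x) + quad(z - zh, W_z)
                for x, xr, z, zh in zip(x_seq[:-1], x_ref_seq,
                                        z_seq, z_hat_seq))
    terminal = quad(x_seq[-1] - x_ss, W_N)
    return stage + terminal

# Toy usage with a horizon of one step and assumed weights
W_x, W_z, W_N = np.eye(2), 2.0 * np.eye(1), 3.0 * np.eye(2)
x_seq = [np.array([1.0, 0.0]), np.array([0.0, 0.0])]
cost = nmph_cost(x_seq, [np.zeros(2)], [np.array([1.0])],
                 [np.zeros(1)], np.zeros(2), W_x, W_z, W_N)
```

Scaling the entries of W_x, W_z, and W_N shifts the optimizer's priorities between state tracking, output/reference agreement, and terminal convergence, as described above.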
A visual interpretation of the NMPH process can be seen in Figure 3, where the path planning task is to guide the closed-loop system described by (1) to follow a predicted trajectory from x(n) (at time instant t_n) to x_ss (at a future time t_{n+N}) while minimizing the cost function (5).
Figure 3. NMPH process at time t_n, which predicts the optimal trajectory until time t_{n+N}. The difference between the predicted output z_k and the estimated reference ẑ_k trajectories is penalized to ensure their convergence towards each other.
Some of the features of the predicted trajectory sequences are:
• The first j elements of the reference trajectory sequence ẑ_k are passed to the closed-loop system, which differs from the NMPC control problem, where only the first element of the predicted control sequence u*(n) is used. This provides some flexibility in choosing the rate at which the OTP is solved, which addresses the computation time issue of solving a Nonlinear Program (NLP).
• Thanks to recent advancements in computing, specifically graphics processing units (GPUs), the computations required for optimization problems can be performed very quickly, meaning that solving the NLP for the OTP or even an OCP can be done in real time. Irrespective of this, the OTP has an advantage over the OCP since the computational power requirement can be controlled by adjusting the rate of solving the optimization problem while allowing the vehicle to track the first j elements of the estimated reference trajectory.
• While the trailing N − j elements of the reference trajectory sequence are discarded, the entire trajectory must still be calculated over the prediction horizon, since optimizing over the full horizon ensures a smooth trajectory from the initial state to the terminal setpoint.
• The optimization problem is solved iteratively using a reliable and accurate optimization approach based on the multiple shooting method and sequential quadratic programming.
The discrete-time representation (1) shown in Section 2.1 was presented to support the problem formulation analysis and NMPH development, with the optimization being performed in the discrete-time domain as in sampled-data MPC [17]. At the same time, a continuous-time representation is important for the NMPH implementation, since our chosen optimization package (ACADO [18]) has the ability to discretize the system equations itself.
Consider the continuous-time nonlinear closed-loop system

    ẋ(t) = f(x(t), u(t)),  u(t) = g(x(t), ẑ(t)),

where x(t) ∈ R^{n_x} are the system states and u(t) ∈ R^{n_u} are the system inputs. z(t) ∈ R^{n_z} are the system outputs, assumed to be a subset of the state vector x(t). The maps f and g are the nonlinear system dynamics and control law, respectively. The optimization problem presented in Algorithm 1 can be rewritten in the continuous-time domain as

    min  E(x(t_n + T)) + ∫_{t_n}^{t_n+T} L(x(τ), ẑ(τ)) dτ,

subject to the closed-loop dynamics, the constraint sets, and the obstacle constraints. The continuous-time optimization problem must be solved over the full time interval τ ∈ [t_n, t_n + T]. As discussed above, the closed-loop system will be asked to track a portion of the resulting reference trajectory running from t_n to t_n + t_j, where t_j < T and where t_j can be adjusted online to affect the trajectory generation and tracking performance and control the computational power required by the optimization.

NMPH Constraints
Support for constraints within the NMPH algorithm provides full control over the optimization problem. The constraints can apply to the state, input and output trajectories, and also model dynamic obstacles.
State constraints belong to the constraint set X ⊆ X, while the outputs, which are assumed to be a subset of the state vector entries, belong to Z ⊆ X. The input constraint set U(x, ẑ) ⊆ U is defined by physical input constraints in the system, for instance due to actuator limits.
The objective of introducing the constraint sets is to ensure that the optimized trajectories are bounded and lie within their allowable ranges. The following assumptions regarding the constraint sets are made [3,10]:
Assumption 2 (Closed and Bounded Sets). The constraint sets of the state X and the reference trajectory Z are closed, and the control constraint set U is compact.

Assumption 3 (Differentiability and Lipschitz).
The system dynamics f (x, u) : R n x × R n u → R n x is continuously differentiable for all (x, u) ∈ X × U . Also, f (x, u) and the reference trajectory mapping µ(x, u) : R n x × R n u → R n z are considered to be locally Lipschitz.
Assumption 4 (Uniqueness). For any element of the estimated reference trajectoryẑ resulting from a control input u and any possible initial states x 0 ∈ X , the system dynamics produce a unique and continuous solution.
Assumption 5 (Viability). For each state x ∈ X and estimated reference trajectoryẑ ∈ Z there exists a control u = g(x,ẑ) ∈ U such that f (x, u) ∈ X .
Taking Assumptions 2-4, we can make the following definition:
Definition 1 (Admissibility). In the discrete-time OTP, consider the system dynamics x_{k+1} = f(x_k, u_k) and the control law u_k = g(x_k, ẑ_k), which maps the state to the estimated trajectory as µ(x_k, u_k) := ẑ_k ∈ Z, with the constraint sets for the state X ⊆ X, control U(x, ẑ) ⊆ U, and reference trajectory Z ⊆ X.
• The system states x_k ∈ X and the estimated reference trajectory ẑ_k ∈ Z are called admissible states and trajectories, respectively, and the controls u_k = g(x_k, ẑ_k) ∈ U are called admissible control values for x_k and ẑ_k. Hence, the admissible set can be defined as Y := {(x, ẑ, u) ∈ X × Z × U : u = g(x, ẑ)}.
• The control sequence u_k and its associated estimated reference trajectory ẑ_k and state sequence x_k, from the time t_0 of the initial value x_0 ∈ X up to the time t_N of the setpoint stabilization value x_N, are admissible if (x_k, ẑ_k, u_k) ∈ Y for k = 0, …, N − 1 and x_N ∈ X.
• The control law is called admissible if g(x_k, ẑ_k) ∈ U for all x_k ∈ X and ẑ_k ∈ Z.
• The estimated reference trajectory is called admissible if µ(x_k, u_k) ∈ Z for all x_k ∈ X and u_k ∈ U.
A feasible problem is defined as an optimization problem in which there exists at least one set of solutions that satisfies all the constraints [3]. Based on Assumptions 2-5 and the admissibility Definition 1, the feasibility of the OTP is determined by Theorem 1.

Theorem 1 (Feasibility).
If the OTP is feasible for an initial condition x_0, and the cost function for the setpoint stabilization problem together with its associated constraint sets satisfies Assumptions 2-5, then the OTP is recursively feasible, meaning the state converges to the stabilizing terminal point x_N, and both the estimated reference trajectory ẑ_k and the predicted output trajectory z_k sequences converge toward the terminal stabilization setpoint under sampled-data NMPH.
Proof. The solution of the OTP is feasible for an initial value x 0 ∈ X to a stabilizing terminal point x N ∈ X if the sets over which we optimize are nonempty. The viability considered in Assumption 5 for Z and X implies that the OTP is feasible for each initial state x 0 and consequently ensures that the control g(x k ,ẑ k ) is properly defined for each x ∈ X andẑ ∈ Z. Since the OTP is performed with respect to admissible predicted state trajectory and control law sequences (as stated in Definition 1), the future behavior of the system is consequently feasible.
It is important to note that the solution of the OTP is viable in the case of a stabilizing terminal constraint, meaning the NMPH problem is confined to feasible subsets since the terminal constraint is viable. The closed-loop system embedded within NMPH satisfies the desired constraints, which will lead to a feasible solution in the OTP. The stability of a feasible solution is governed by Theorem 2.
Theorem 2 (Stability). Assume that the OTP within Algorithm 1 satisfies Assumption 1 and has a feasible solution as determined by Theorem 1. Then the optimized solution leads to a stable prediction of the system state x k and estimated reference trajectoryẑ k .
Proof. Assume that at any time instant t_i, i ∈ [n, …, n + N − 2], x*_i and ẑ*_i are the optimal solutions of the OTP in Algorithm 1, with their associated control value u*_i. A Lyapunov-like function is defined as the optimal cost-to-go

    V(x_i, ẑ_i) = E(x*_{n+N}) + Σ_{k=i}^{n+N−1} L(x*_k, ẑ*_k).

The cost function in (5) guarantees a positive semi-definite Lyapunov-like candidate [19], meaning that 0 ≤ V(x_i, ẑ_i) < ∞. Considering a feasible (not necessarily optimal) solution at a subsequent time t_{i+δ}, obtained by shifting the optimal solution and appending a terminal step, its cost satisfies

    V(x_{i+δ}, ẑ_{i+δ}) ≤ V(x_i, ẑ_i) − L(x*_i, ẑ*_i) + E(x_{n+N+δ}) − E(x*_{n+N}) + L(x_{n+N−1+δ}, ẑ_{n+N−1+δ}).

Since E(x_{n+N+δ}) − E(x*_{n+N}) + L(x_{n+N−1+δ}, ẑ_{n+N−1+δ}) ≤ 0 based on the inequality considered in Assumption 1, it follows that V(x_{i+δ}, ẑ_{i+δ}) − V(x_i, ẑ_i) ≤ −L(x*_i, ẑ*_i) ≤ 0, i.e., the rate of change of the Lyapunov-like function is decreasing with time.
Hence, the solution of the OTP problem in Algorithm 1 converges asymptotically to the terminal setpoint.
To perform safe navigation, it is necessary to include the obstacle constraints within the optimization problem. The inequality constraints presented in (4f) account for the space that the predicted trajectory should avoid. For instance, a spherical obstacle i can be modeled as

    O_i(x_k) = ε_i² − ‖x_pos,k − o_i‖² ≤ 0,

where x_pos,k ∈ R³ is the predicted vehicle position over the prediction horizon, o_i ∈ R³ is the position of the i-th obstacle's center, and ε_i represents the safety distance between the i-th obstacle and the vehicle, which accounts for the drone and obstacle sizes, including the safety tolerance.
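As a minimal numerical sketch of this constraint — assuming spherical obstacles, with centers and safety distances chosen arbitrarily — every entry of O_i(x_k) must be non-positive for a collision-free predicted position:

```python
import numpy as np

def obstacle_constraints(p_k, centers, eps):
    """Return O_i(x_k) = eps_i^2 - ||p_k - o_i||^2 for every obstacle i.

    All entries must be <= 0 for the position p_k to be admissible."""
    d2 = np.sum((centers - p_k) ** 2, axis=1)  # squared distances to centers
    return eps ** 2 - d2

# Two assumed obstacles and their safety distances
centers = np.array([[2.0, 0.0, 1.0],
                    [0.0, 3.0, 1.0]])
eps = np.array([0.5, 1.0])
```

For moving obstacles, the centers o_i would simply be updated (or predicted) over the horizon before each OTP solve.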

NMPH Closed-Loop Form with Feedback Linearization Control Law
In this section, we cover the feedback linearization-based control law design within the NMPH. The nonlinear system studied in this work, a quadrotor drone, is a Multi-Input Multi-Output (MIMO) system. We thus start by reviewing the method of feedback linearization for a class of MIMO systems.
Consider a MIMO nonlinear control-affine system of the form

    ẋ = f(x) + G(x)u = f(x) + Σ_{i=1}^{n_u} g_i(x) u_i,    (12)

in which x ∈ R^{n_x}, and f, g_1, …, g_{n_u} are smooth vector fields on R^{n_x}. G(x) is an n_x × n_u matrix whose rank at x = 0 is rank G(0) = n_u. For notational simplicity in the following sections, take n_x ≡ n and n_u ≡ m.
Prior to feedback linearization analysis, the following theorems and definitions are presented in the context of differential geometry.

Definition 2 (Diffeomorphism).
A diffeomorphism is a differentiable map ϕ between two manifolds M and N, such that ϕ : M → N is one-to-one and onto (bijective), and its inverse map ϕ⁻¹ : N → M is differentiable as well. ϕ is called a C^ω diffeomorphism if it is ω times continuously differentiable; if ω = ∞, then ϕ is called a C^∞ smooth map [20].
A change in coordinates can be defined globally or locally. A map ξ : R^n → R^n is called a global diffeomorphism between two coordinate systems if, and only if, det(∂ξ/∂x) ≠ 0 for all x ∈ R^n and lim_{‖x‖→∞} ‖ξ(x)‖ = ∞ [21]. For a local change in coordinates, let U be an open subset of R^n with ξ : U → R^n. If det(∂ξ/∂x) ≠ 0 at some x ∈ U, then there exists an open set V ⊂ U containing x such that the map ξ : V → ξ(V) is a diffeomorphism [20].
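The Jacobian determinant test for a change of coordinates can be spot-checked numerically. This sketch uses an assumed illustrative map ξ(x) = (x₁ + x₂³, x₂) and a finite-difference Jacobian; it demonstrates the test itself and is not part of the paper's design.

```python
import numpy as np

def xi(x):
    """Assumed coordinate change xi(x) = (x1 + x2^3, x2)."""
    return np.array([x[0] + x[1] ** 3, x[1]])

def jacobian(fn, x, h=1e-6):
    """Central finite-difference Jacobian of fn at x."""
    n = len(x)
    J = np.zeros((n, n))
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = h
        J[:, i] = (fn(x + dx) - fn(x - dx)) / (2 * h)
    return J

# The determinant is 1 at every point for this map, so the nonsingularity
# condition det(d(xi)/dx) != 0 holds and xi is a diffeomorphism.
J = jacobian(xi, np.array([0.5, -1.2]))
```
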
A specific class of nonlinear systems can be transformed into a linear state feedback controllable form by satisfying the conditions to be presented in Theorem 3. To facilitate our understanding of this process, we first define a nonsingular state feedback transformation and controllability indices in Definitions 3 and 4, respectively [20]:
Definition 3 (Nonsingular state feedback transformation). Consider V_0 ⊂ U_0, a neighborhood of the origin. On the one hand, the nonsingular state feedback is defined in terms of a smooth function β(x) from V_0 into R^n with β(0) = 0, and D(x)⁻¹, the inverse of an m × m matrix which is nonsingular in V_0. On the other hand, a local diffeomorphism in V_0, defined by ξ = T(x) with T(0) = 0, exists if, and only if, in U_0 the distributions G_l = span{ad^j_f g_i : 1 ≤ i ≤ m, 0 ≤ j ≤ l} are involutive and of constant rank for 0 ≤ l ≤ n − 2, and rank G_{n−1} = n.
A transformation that contains a nonsingular state feedback and a local diffeomorphism is called a nonsingular state feedback transformation.
Employing Definitions 3 and 4, the sufficient conditions for feedback linearization and the form of its feedback transformation are given in Theorem 3 [20].
and consequently the nonsingular state feedback transformation of Definition 3 can be constructed. In Theorem 3, L_f ϕ = ⟨dϕ, f⟩ is the Lie derivative, which can be viewed as the directional derivative of the smooth function ϕ along the smooth vector field f, and ad^r_f g = [f, ad^{r−1}_f g] is the iterated Lie bracket between the vector fields f and g.
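These Lie operators can be verified symbolically from their definitions, L_f ϕ = ∇ϕ · f and [f, g] = (∂g/∂x) f − (∂f/∂x) g. The vector fields below are arbitrary illustrative choices (a pendulum-like drift and a constant input field), not the quadrotor's:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
x = sp.Matrix([x1, x2])
f = sp.Matrix([x2, -sp.sin(x1)])   # assumed pendulum-like drift field
g = sp.Matrix([0, 1])              # assumed constant input field
phi = x1                           # output function

def lie_derivative(phi, f, x):
    """L_f(phi) = (d(phi)/dx) * f, the directional derivative along f."""
    return (sp.Matrix([phi]).jacobian(x) * f)[0]

def lie_bracket(f, g, x):
    """[f, g] = (dg/dx) f - (df/dx) g; ad_f g is this bracket iterated."""
    return g.jacobian(x) * f - f.jacobian(x) * g

Lf_phi = sp.simplify(lie_derivative(phi, f, x))
ad_f_g = sp.simplify(lie_bracket(f, g, x))
```

Iterating `lie_bracket` builds the distributions G_l = span{ad^j_f g_i} whose involutivity and rank are checked in Theorem 3.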
Using the feedback linearization control law yields a locally stable closed-loop system model within the NMPH structure. One of the main motivations for choosing the feedback linearization approach in our work is to reduce nonlinearities: since the NMPH works with a closed-loop system model, the optimization problem has reduced nonconvexity compared to working directly with the nonlinear plant model as in NMPC. This leads to better optimization performance in terms of the computation time needed to find feasible solutions.

Modeling errors, system uncertainties, and external disturbances can affect the performance of the state feedback linearization control law. For instance, a constant wind gust applied to the drone while following a trajectory will lead to an offset in the corresponding position outputs. The baseline feedback linearization controller is unable to compensate for this type of offset, which will consequently degrade the accuracy of the predicted states and reference trajectories produced by the NMPH. To overcome this issue, the state feedback linearization is extended by augmenting the system model with integral states of the position vector and yaw angle of the drone. The validity of this extension is demonstrated using Theorem 3 in the feedback linearization design of Section 3.2.

System Model
In this section, the NMPH is developed for a quadrotor vehicle connected to a feedback linearization control law. A standard nonlinear rigid-body dynamics model of the vehicle is adopted from [22]. This model is simplified by ignoring drag forces, rotor gyroscopic effects, and propeller dynamics; a more comprehensive nonlinear model that includes these effects may be considered in future work.
The rigid-body kinematics and dynamics are developed using two reference frames: the fixed navigation frame N and the moving body frame B (fixed to the quadrotor's Center of Gravity, CG). The bases of both frames are selected based on the North, East, and Down (NED) directions in a local tangent plane as the orthonormal vector sets {n_1, n_2, n_3} and {b_1, b_2, b_3} for the navigation and body frames, respectively. The two bases are depicted in Figure 4. In general, any configuration of a rigid body in space belongs to the Special Euclidean group SE(3), the product space of the rigid-body orientation and position (R_nb, p_n) ∈ SO(3) × R³ = SE(3), where the Special Orthogonal group is defined as SO(3) = {R ∈ R^{3×3} | R Rᵀ = Rᵀ R = I, det(R) = +1}, and the rotation matrix of B with respect to N is denoted R_nb ∈ SO(3). To represent the orientation, the ZYX Euler angle parameterization is employed. Based on the roll-pitch-yaw Euler angles η = [φ, θ, ψ]ᵀ, the rotation matrix can be written as

    R_nb = [ cθ cψ    sφ sθ cψ − cφ sψ    cφ sθ cψ + sφ sψ
             cθ sψ    sφ sθ sψ + cφ cψ    cφ sθ sψ − sφ cψ
             −sθ      sφ cθ               cφ cθ            ]

where s(·) = sin(·) and c(·) = cos(·); note t(·) = tan(·) will be used in (20). The most prominent issue of using Euler angles is the singularity where the parameterization loses injectivity, at θ = π/2 + kπ, k ∈ Z. A recent work [23] tackles this issue by using non-Euclidean rotation groups in the NMPC, but in this work the problem is addressed simply by adding state constraints within the NMPH optimization problem in (4d).
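The rotation matrix and its SO(3) properties can be checked numerically. This sketch composes R_nb = R_z(ψ) R_y(θ) R_x(φ), the standard construction for the ZYX parameterization, with arbitrary test angles:

```python
import numpy as np

def R_zyx(phi, theta, psi):
    """ZYX Euler-angle rotation matrix R_nb = Rz(psi) @ Ry(theta) @ Rx(phi)."""
    c, s = np.cos, np.sin
    Rx = np.array([[1, 0, 0],
                   [0, c(phi), -s(phi)],
                   [0, s(phi), c(phi)]])
    Ry = np.array([[c(theta), 0, s(theta)],
                   [0, 1, 0],
                   [-s(theta), 0, c(theta)]])
    Rz = np.array([[c(psi), -s(psi), 0],
                   [s(psi), c(psi), 0],
                   [0, 0, 1]])
    return Rz @ Ry @ Rx

R = R_zyx(0.3, -0.2, 1.1)  # arbitrary roll, pitch, yaw (rad)
```

Orthogonality (R Rᵀ = I) and det R = +1 hold for any angles, but the map from angles to R loses injectivity at θ = π/2 + kπ, which is why the state constraint in (4d) is needed.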
The rigid-body kinematics and dynamics are modeled as

    ṗ_n = v_n,
    v̇_n = g n_3 − (ū/m) R_nb b_3,
    η̇ = W(η) ω_b,
    J ω̇_b = −S(ω_b) J ω_b + τ_b,    (18)

where p_n, v_n ∈ R³ are the vehicle's position and velocity with respect to the inertial frame N, respectively, and W(η) is the Euler angle rate transformation matrix. m is the mass of the quadrotor, g = 9.81 m/s² is the gravitational acceleration, and the vehicle's mass moment of inertia matrix is assumed to be diagonal, J = diag([J_1, J_2, J_3]). The system input vector is [ū, τ_bᵀ]ᵀ, where ū = Σ_{i=1}^{4} f_i > 0 is the total generated thrust in the direction of −b_3, and τ_b = [τ_{b1}, τ_{b2}, τ_{b3}]ᵀ are the torques created by the rotors about the body frame axes. ω_b and ω̇_b are the angular velocity and acceleration vectors, respectively. The operator S(·) : R³ → so(3) maps a vector to a skew-symmetric matrix such that S(a)b = a × b for a, b ∈ R³.
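The skew-symmetric operator S(·) appearing in the attitude dynamics can be sketched and verified directly against the cross product:

```python
import numpy as np

def S(a):
    """Hat map S(.): R^3 -> so(3), so that S(a) @ b == a x b."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

# Arbitrary test vectors
a = np.array([1.0, 2.0, 3.0])
b = np.array([-1.0, 0.5, 2.0])
```

This is the standard hat map; with it, the gyroscopic term −S(ω_b) J ω_b in (18) is just −ω_b × (J ω_b).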

Quadrotor Feedback Linearization Control
The system represented in (18) has to be described in the nonlinear control-affine form shown in (12). The state and input vectors are

$$x = [p^n, v^n, \eta, \omega_b]^T \in \mathbb{R}^{12}, \qquad u = [\bar{u}, \tau_b]^T \in \mathbb{R}^4,$$

and the control-affine form is

$$\dot{x} = f(x) + G(x)u, \qquad (21)$$

where the entries of f(x) and G(x) follow directly from (18). As stated in Theorem 3, the controllability indices {r_1, r_2, r_3, r_4} of the system need to be found first to verify whether the system is state feedback linearizable. To find them, the integers m_j, 0 ≤ j ≤ 4, are computed from the distributions

$$G_l = \operatorname{span}\left\{\operatorname{ad}_f^j g_i : 1 \le i \le 4,\; 0 \le j \le l\right\}, \qquad l = 0, \ldots, 10.$$

The calculated controllability indices are {5, 2, 2, 2}. Checking these against the conditions of Theorem 3, G_3 is found not to be involutive and dim G_4 = 11 ≠ n; hence, the system is not state feedback linearizable, as conditions (ii) and (iii) are not satisfied. It can also be noted that ∑_{i=1}^4 r_i = 11 ≠ n. From the above, the system is not locally state feedback linearizable, meaning it cannot be transformed into a linear controllable system written in Brunovsky controller form. The system states and inputs are thus reformulated by augmenting the state vector with two additional states, the thrust x_13 = ū and its rate x_14 = \dot{\bar{u}}, and replacing the thrust by \ddot{\bar{u}} in the input vector. The same approach was successfully validated in [24-26]. Furthermore, the system is extended by including the integral states ζ defined by \dot{ζ}_p = p^n and \dot{ζ}_ψ = ψ. Based on the extended system's vectors, the state space control-affine form in (21) can be written as

$$\dot{x} = \bar{f}(x) + \bar{G}(x)u, \qquad (22)$$

where \bar{f} and \bar{G} follow from (21) and the augmented states. The controllability indices of the extended system are {5, 5, 5, 3}, where ∑_{i=1}^4 r_i = 18 = n. With r_1 = 5, the distribution G_3 is found to be involutive and dim G_4 = 18 = n, meaning all conditions of Theorem 3 are satisfied.
Therefore, the system is state feedback linearizable, meaning it is locally transformable into a linear controllable system in Brunovsky controller form about the origin, as shown below in (23):

$$\dot{\xi}^{(i)} = A_i \xi^{(i)} + b_i v_i, \qquad i = 1, \ldots, 4, \qquad (23)$$

where each pair (A_i, b_i) is a chain of r_i integrators in controller canonical form.
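As a sketch of what the Brunovsky controller form with indices {5, 5, 5, 3} looks like, the block-diagonal integrator chains can be assembled and verified to be controllable with rank n = 18 (the helper name is illustrative only):

```python
import numpy as np

def brunovsky(indices):
    """Block-diagonal Brunovsky (controller) canonical form:
    one chain of r_i integrators per controllability index."""
    n, m = sum(indices), len(indices)
    A, B = np.zeros((n, n)), np.zeros((n, m))
    row = 0
    for i, r in enumerate(indices):
        A[row:row+r-1, row+1:row+r] += np.eye(r - 1)  # integrator chain
        B[row+r-1, i] = 1.0                           # input drives the last state
        row += r
    return A, B

A, B = brunovsky([5, 5, 5, 3])
# Controllability matrix [B, AB, ..., A^{n-1} B] has full rank n = 18
n = A.shape[0]
C = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
assert np.linalg.matrix_rank(C) == 18
```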
The existence of four smooth functions is guaranteed, representing the linearizing position and yaw outputs {ϕ_1(x), ..., ϕ_4(x)}, such that the decoupling matrix D(x) in (15) is nonsingular about the origin. The resulting linear system is written as

$$\dot{\xi} = A\xi + Bv.$$

To determine the domain of the transformation, the determinant of the matrix D(x) is calculated to be

$$\det D(x) = \frac{-\bar{u}^2 \cos\varphi}{m^3 J_1 J_2 J_3 \cos\theta}.$$

Therefore, the domain for a nonsingular solution is {ū ≠ 0, −π/2 < φ < π/2, −π/2 < θ < π/2}. As discussed earlier in Section 3.1, these constraints will be included within the NMPH optimization problem in (4d).
The actual control law is obtained using (16), giving

$$u = D(x)^{-1}\left(v - b(x)\right),$$

where v = [v_1, ..., v_4]^T are feedback inputs defined as linear state feedback on the tracking errors of each chain,

$$v_k = \sum_{i} K_{k,i}\, e_{\xi_i}, \qquad k = 1, \ldots, 4,$$

and the error e_{ξ_i} is defined as the difference between the desired and the actual feedback linearized system state, e_{ξ_i} = ξ_{d_i} − ξ_i, i = 1, ..., n. The errors can be interpreted as the differences between the desired outputs [p^n_d, ψ_d]^T, with their rates and integrals, and the corresponding actual system outputs [p^n, ψ]^T, with their rates and integrals.
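Once the system is in Brunovsky form, choosing the gains for one integrator chain reduces to pole placement on a companion matrix. The sketch below (with a hypothetical pole set, not the paper's tuned gains) computes gains for a chain of r_1 = 5 integrators and checks the closed-loop poles:

```python
import numpy as np

def chain_gains(poles):
    """Gains K for a chain of r integrators so that v = -K @ xi places
    the closed-loop poles at the given locations (controller form)."""
    coeffs = np.real(np.poly(poles))   # s^r + a_{r-1} s^{r-1} + ... + a_0
    return coeffs[1:][::-1]            # K = [a_0, a_1, ..., a_{r-1}]

r = 5
K = chain_gains([-1, -2, -3, -4, -5])  # hypothetical pole set for r_1 = 5
A = np.diag(np.ones(r - 1), 1)         # chain of integrators
b = np.zeros(r); b[-1] = 1.0
Acl = A - np.outer(b, K)               # closed loop under v = -K @ xi
poles_cl = np.sort(np.linalg.eigvals(Acl).real)
assert np.allclose(poles_cl, [-5, -4, -3, -2, -1])
```

In the NMPH closed-loop model, the same feedback acts on the errors e_ξ rather than the raw states, steering each chain toward its desired output.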

Trajectory Generation Using NMPH with Feedback Linearization
The optimization problem of NMPH uses the system dynamics (18) and the state feedback linearization control law (24) to solve for an optimal reference trajectory for the resulting closed-loop system. The OTP, which is written in the continuous-time domain and presented in (7), is solved using an efficient direct multiple shooting technique [27]. At each time instant t_n, the solver discretizes the system dynamics, control law, and state and input constraints over the prediction horizon into the nodes k = n, ..., n + N, as discussed in (4). An overview of the process from the optimization problem formulation to the trajectory generation is given in Figure 5.
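The core of direct multiple shooting is to integrate each horizon segment separately and impose "defect" constraints that stitch the segments together. A minimal sketch (using a toy double integrator as a stand-in for the closed-loop quadrotor model, with the paper's N = 40 and 0.2 s sampling):

```python
import numpy as np

def rk4_step(f, x, u, dt):
    """One RK4 integration step of x' = f(x, u)."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def defects(f, X, U, dt):
    """Multiple-shooting defects d_k = x_{k+1} - Phi(x_k, u_k); the NLP
    drives every d_k to zero so the segments join continuously."""
    return np.array([X[k + 1] - rk4_step(f, X[k], U[k], dt)
                     for k in range(len(U))])

# Toy double integrator: state [position, velocity], scalar input
f = lambda x, u: np.array([x[1], u])
dt, N = 0.2, 40
U = np.zeros(N)
X = [np.array([0.0, 1.0])]
for k in range(N):                     # a dynamically consistent guess
    X.append(rk4_step(f, X[-1], U[k], dt))
d = defects(f, np.array(X), U, dt)
assert np.allclose(d, 0.0)             # consistent trajectory: zero defects
```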

The optimization problem operates on the dynamics of a closed-loop system, which may not be convex. Therefore, the resulting nonlinear program (NLP) is solved iteratively using a Sequential Quadratic Programming (SQP) [28] approach, which splits the problem into a sequence of subproblems, each minimizing a quadratic objective function subject to constraints linearized about the current operating point, using the qpOASES solver [29].
To ensure local convergence of the SQP, the quadratic subproblem has to be bounded within the feasible region of the optimization problem. Starting from an initial condition x_0 whose optimization variables are sufficiently close to the terminal condition x_ss, the sequence x_k generated by the NMPH converges to the terminal condition at a quadratic rate.
The purpose of using the feedback linearization-based control law within the NMPH is to reduce the nonconvexity of the optimization problem. Assuming a perfect system model, the closed-loop system contains the linear Brunovsky system form as shown in (23), leading to an optimization problem in which feasibility and stability of the optimized solution and the computational power required to solve the optimization problem are notably improved relative to working with a nonlinear open-loop system as in standard NMPC.
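To make the SQP mechanics concrete, the following minimal sketch (not the paper's ACADO/qpOASES implementation) performs the Newton/KKT iteration on a small equality-constrained toy problem; the function names and problem data are illustrative only:

```python
import numpy as np

def sqp_equality(x0, lam0, grad_f, hess_L, c, jac_c, iters=20):
    """Basic SQP for min f(x) s.t. c(x) = 0: each iteration solves the
    KKT system of a QP built from a quadratic model of the Lagrangian
    and a linearization of the constraints at the current iterate."""
    x, lam = x0.astype(float), lam0.astype(float)
    for _ in range(iters):
        H, A = hess_L(x, lam), jac_c(x)
        K = np.block([[H, A.T],
                      [A, np.zeros((A.shape[0], A.shape[0]))]])
        rhs = -np.concatenate([grad_f(x) + A.T @ lam, c(x)])
        step = np.linalg.solve(K, rhs)   # QP subproblem = one linear solve
        x += step[:x.size]
        lam += step[x.size:]
    return x, lam

# Toy problem: min x1^2 + x2^2  s.t.  x1^2 + x2 - 1 = 0
grad_f = lambda x: 2 * x
c = lambda x: np.array([x[0]**2 + x[1] - 1])
jac_c = lambda x: np.array([[2 * x[0], 1.0]])
hess_L = lambda x, lam: np.diag([2 + 2 * lam[0], 2.0])
x, lam = sqp_equality(np.array([1.0, 1.0]), np.array([0.0]),
                      grad_f, hess_L, c, jac_c)
assert np.allclose(x, [np.sqrt(0.5), 0.5])
```

Starting close enough to the solution, the iterates converge quadratically, mirroring the local convergence property stated above.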

Simulation Results
In this section, several simulations are presented to validate the proposed NMPH approach to generating optimal trajectories for a quadrotor vehicle.
The Robot Operating System (ROS) [30] is the base environment used to implement our algorithm. It is a platform that integrates different software packages or frameworks by handling communication between them and the host hardware. The ACADO Toolkit [18] is used for dynamic optimization in this work. It allows users to write their code within a self-contained C++ environment and generates a nonlinear solver that can solve the optimization problems in real time. The compiled code runs within ROS to communicate with either a simulation model or the actual hardware [27]. For testing and validation of the proposed approaches on a quadrotor vehicle, we used the AirSim simulator [31], an open-source package that includes a physics engine and provides photorealistic images.
The NMPH optimization problem (7) was written in C++ using ACADO and compiled into highly efficient C code that solves the optimization problem online. The AirSim simulator, ACADO optimization solver, and ROS environment run on a system with an Intel Core i7-10750H CPU @ 2.60-5.00 GHz equipped with an Nvidia GeForce RTX 2080 Super (Max-Q) GPU. The prediction horizon of the optimization problem was set to N = 40 with a sampling period of 0.2 s, which was found to be sufficient for the purposes of trajectory generation. The cost function weights W_x, W_z, and W_N were adjusted heuristically to ensure balanced trajectory generation performance towards the terminal setpoint.
The initial state of the quadrotor is acquired from the AirSim simulator and sent to the NMPH solver. The solution of the optimization problem is sent back to AirSim as a reference trajectory for the vehicle. The 3D visualization tool for ROS (rviz) is used to monitor and visualize the simulation process. Figure 6 shows the network architecture of the nodes and topics employed for running the simulation. In the following sections, various simulation scenarios are provided to verify the features and evaluate the performance of the proposed NMPH approach. Note the NED frame representation used within NMPH is converted to ENU (East, North, and Up) representation used by AirSim, and so the simulation results will be presented using the ENU convention as well.

Predicted Output and Estimated Reference Trajectories
The purpose of this simulation is to show the convergence of the estimated reference ẑ_k and predicted output z_k trajectories toward the stabilization setpoint x_ss while minimizing the difference between them. The quadrotor is assumed to start at position p^n = [0, 0, 0.2]^T m, and NMPH is used to generate a trajectory to the terminal setpoint p_d = [6, −3, 5]^T m, as shown in Figure 7. The plots depict a sequence of the optimal predicted trajectory z_k, k = 0, ..., N, and the estimated reference trajectory ẑ_k, k = 0, ..., N − 1, produced by the NMPH. It is important to mention that the convergence of the estimated reference trajectory to the terminal point ensures that the closed-loop system is steered to the desired endpoint.

Trajectory Generation and Initial Conditions
In this simulation, the generated trajectory is investigated for different initial conditions. The initial conditions being tested are related to the quadrotor's kinematics: the vehicle is commanded to move in a straight path between [4, 2, 0]^T and [8, 8, 8]^T while changing its speed |v| linearly from 0 to 1.5 m/s, as shown in Figure 8. Note that the objective is not to track the generated trajectories, but simply to observe the behaviour of the OTP solutions towards the stabilization setpoint p_d = [5, 10, 5]^T. The solutions of the optimization problem for eight different trajectories are plotted in Figure 8 to show the effect of the initial conditions. Each solution shows a trajectory which converges smoothly to the stabilization setpoint while taking into consideration the initial position and velocity of the system. Commanding the quadrotor to track generated trajectories which account for its initial conditions reduces jerky flight motions and therefore flight power consumption, which is especially important for exploration missions.
Using the setup described in Section 4, our NMPH achieves a 250 Hz generation rate, meaning a reference trajectory is generated every 4 ms. If running on lower-powered hardware, the computational power can be minimized by reducing the rate of trajectory generation, which still provides a smooth reference trajectory for the vehicle.

Trajectory Tracking
In this simulation, the quadrotor's trajectory tracking and static obstacle avoidance performance are examined. First, the vehicle is commanded to track a continuously updated trajectory generated on the fly by the NMPH algorithm while avoiding static obstacles, as shown in Figure 9. Each static obstacle is considered to be a sphere 1 m in diameter, and a radial allowance of 1 m is added to avoid crashing into it. Hence, the constraint for each obstacle represents a sphere with a diameter of 3 m, making the safety distance 1.5 m. The smooth tracking performance while avoiding the obstacles can be seen in Figure 9, which shows the importance of using the NMPH to regenerate the trajectory while tracking it. In the second study, the regeneration process of the predicted trajectory is limited to a single regeneration in order to examine its effect on the tracking performance while avoiding the obstacles. The simulation result is depicted and explained in Figure 10. The continuous regeneration of the reference trajectory provides optimal flight paths in real time based on the system's state. This ability also enables handling dynamic obstacles, as shown next in Section 4.4. Selected predictions over 2 s are shown in Figure 11d, which illustrates the smooth regeneration of the trajectories while avoiding the dynamic obstacle; note that about 500 trajectories are generated in these 2 s. Figure 11. Dynamic obstacle avoidance for a 2 m spherical obstacle that moves at a velocity of 0.5 m/s in the y-axis direction starting from the initial position (3, 0, 0.5) m. (a-c) show the continuous regeneration of the NMPH predicted optimal trajectory, which avoids the moving obstacle, and (d) depicts the smooth regeneration process for a selected number of the trajectory updates.
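The spherical obstacle constraint described above, including the constant-velocity prediction used for a moving obstacle, can be sketched as follows (the drone position and time instants are hypothetical; the obstacle data match Figure 11's setup):

```python
import numpy as np

def obstacle_clearance(p, p_obs, r_obs=0.5, allowance=1.0):
    """Signed clearance for a spherical obstacle: the constraint
    ||p - p_obs|| >= r_obs + allowance must hold at every horizon node."""
    d_safe = r_obs + allowance          # 0.5 m radius + 1 m allowance = 1.5 m
    return np.linalg.norm(p - p_obs) - d_safe

def moving_obstacle(p0, v, t):
    """Constant-velocity prediction of a dynamic obstacle's center."""
    return p0 + v * t

# Obstacle from Figure 11: starts at (3, 0, 0.5) m, moves at 0.5 m/s along y
p0, v = np.array([3.0, 0.0, 0.5]), np.array([0.0, 0.5, 0.0])
p_drone = np.array([3.0, 2.0, 0.5])     # hypothetical hovering position
assert obstacle_clearance(p_drone, moving_obstacle(p0, v, 0.0)) > 0  # safe now
assert obstacle_clearance(p_drone, moving_obstacle(p0, v, 4.0)) < 0  # violated later
```

Evaluating this clearance along the predicted horizon is what forces the NMPH solver to regenerate trajectories that bend around the obstacle's future positions.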

Conclusions
This paper proposed a novel reference trajectory generation method for a nonlinear closed-loop system (in our case, a piloted quadrotor drone) based on the NMPC approach. The proposed formulation, called NMPH, applies a feedback linearization control law to the nonlinear plant model, resulting in a closed-loop dynamics model with decreased nonconvexity used by the online optimization problem to generate feasible and optimal reference trajectories for the actual closed-loop system. The feedback linearization design includes integral states to compensate for modeling uncertainties and external disturbances in the system. The proposed NMPH algorithm supports both static and dynamic obstacles, enabling trajectory generation in continuously changing environments.
The NMPH was implemented on a simulated quadrotor drone and validated to generate 3D optimal reference trajectories in real time. Four different simulation scenarios were carried out to evaluate the performance of the proposed method. Convergence of the predicted and estimated trajectories, trajectory generation under different initial conditions, trajectory tracking performance, and the ability to navigate around static and dynamic obstacles were validated through simulation results.
Future work will include testing the NMPH methodology with alternative nonlinear control law designs such as backstepping, using a more detailed dynamics model of the vehicle, and implementing and validating the proposed method onboard hardware drones, which has become feasible thanks to GPU-equipped single-board computers such as NVIDIA's Jetson Xavier NX.