Survey on Motion Planning for Multirotor Aerial Vehicles in Plan-Based Control Paradigm

: In general, optimal motion planning can be performed both locally and globally. In such a planning, the choice in favor of either local or global planning technique mainly depends on whether the environmental conditions are dynamic or static. Hence, the most adequate choice is to use local planning or local planning alongside global planning. When designing optimal motion planning, both local and global, the key metrics to bear in mind are execution time, asymptotic optimality, and quick reaction to dynamic obstacles. Such planning approaches can address the aforementioned target metrics more efﬁciently compared to other approaches, such as path planning followed by smoothing. Thus, the foremost objective of this study is to analyze related literature in order to understand how the motion planning problem, especially the trajectory planning problem, is formulated when being applied for generating optimal trajectories in real-time for multirotor aerial vehicles, as well as how it impacts the listed metrics. As a result of this research, the trajectory planning problem was broken down into a set of subproblems, and the lists of methods for addressing each of the problems were identiﬁed and described in detail. Subsequently, the most prominent results from 2010 to 2022 were summarized and presented in the form of a timeline.


Introduction
Adroit motion planning of little flying creatures, such as birds and butterflies, is an extraordinarily demanding task for several reasons, including aggressive maneuvers.An example of such a high-speed maneuver need is one in particularly tight spots where the environment is obstacle-rich.Researchers have been trying to replicate similar maneuvers using two different types of aerial vehicles: conventional and unconventional.In this research, we deal with conventional aerial vehicles, for instance, unmanned aerial vehicles (UAVs), multirotor aerial vehicles (MAVs), etc.Recent progression in computation capabilities and embedded sensing has been boosting the procedure of mimicking natural flying animals; this advancement has enabled plenty of new opportunities in diverse fields: inspection, autonomous transportation, logistics, delivery, aerial photography, post-disaster, and medical services.However, optimal motion planning remains a crucial task in all the fields listed above.In optimal motion planning, the environmental reasoning cannot be predictable, since environmental conditions change rapidly.Hence, there are various challenges to be addressed to obtain highly efficient and optimal motion planning.In this paper, we mainly focus on how researchers have been addressing these challenges in optimal motion planning to obtain robust navigation in various domains for multirotor aerial vehicles (MAVs).
In most industrial applications, the environment is either fully or partially unexplored, and unpredictable events can occur at any time due to a variety of factors.Therefore, a fast and accurate optimal motion planning technique is required to handle these unexpected problems in real time.The optimal motion planning problem is generally divided into three subcategories: path planning followed by smoothing, kinodynamic search-based trajectory generation, and motion-primitive-based approaches.Of these three, plan-based control approaches are the most widely used and efficient way to address the problem considered, compared to the other two approaches.Numerous plan-based control strategies have been proposed in the past decade, with promising results.This is one of the key motivating factors for reviewing plan-based control, especially for industrial multirotor aerial vehicles (MAVs).Most industrial multirotor aerial vehicles (MAVs), such as quadrotors, have low-level controllers, for example, PX4 [1], DJI [2], that can be operated independently irrespective of high-level execution commands.Moreover, such controllers reduce the overhead and complexity of developing high-level planning algorithms due to their independence.In other words, the same planner can be deployed on different firmware by implementing an interface between a high-level planner and a low-level controller.Thus, we narrowed down our study to considering only plan-based control approaches (Figure 1), particularly in application to industrial multirotor aerial vehicles (MAVs).The main limitation of multirotor aerial vehicles (MAVs) is low flight time.Hence, a multirotor aerial vehicle (MAV) should be capable of executing robust, agile, and aggressive maneuvers while ensuring dynamic feasibility and guaranteeing smoothness of the trajectory in low flight time.Furthermore, trajectory plotting should be performed within an obstacle-free zone at high speed to handle a given mission effectively.Such behavior is imposed by adhering to a set of constraints.If and only if the constraints are incorporated appropriately, desired needs can be fulfilled.Obtaining the right constraints at the right moment and applying appropriate control sequences to improve motion quality is the key objective of any plan-based control approach.However, the procedure of obtaining such right constraints is an open research problem due to its complexity and numerous other challenges that should be handled simultaneously.For example, the multirotor aerial vehicle (MAV) has been widely employed in video-making-related fields in recent years, with cinematographic aerial shooting being one of the popular areas of interest during the last five years.In such shooting, generating smooth, obstacle-free trajectories is the main challenge.Various other challenges exist, and most of them are application-specific.In this work, we examine the most common problems related to trajectory planning applications in the paradigm of plan-based control, as well as how researchers have been alleviating those problems by proposing compelling solutions.
In optimal trajectory planning, trajectory generation and controlling the multirotor aerial vehicle are strongly interconnected.For multirotor aerial vehicles (MAVs), the trajectory generation process is relatively easy due to the dynamic properties of the multirotor aerial vehicles (MAVs).When dynamic obstacles are incorporated, the trajectory has to be refined at a high rate in order to keep a smooth maneuver despite increased computational demands.Moreover, understanding close-in obstacles' positions relative to the multirotor aerial vehicle (MAV) is crucial for making decisions in real time; this raises a new challenge: the one of the rapidity and accuracy of relative environment reconstruction, which essentially is how obstacle constraints are added to the problem formulation.Yet another challenge is of the impact of the obstacles and constraints on the smoothness and dynamic feasibility of the generated trajectory.After conducting an extensive literature review on the topic of trajectory planning for multirotor aerial vehicles (MAVs), we were able to isolate basic building blocks that are essential for optimal motion planning, as shown in Figure 2.Each of the primary components plays a key role in the process of trajectory generation.• Kinodynamic and kinematic enable, A-star, RRT-star, FMT-star • Incorporate motion primitive • Minimum-snap • B-spline (uniform or nonuniform), Minimum-time B-spline • Bernstein basis polynomial • Refinement trajectory cost in most of the cases, defined by J(Γ) = ξ smooth J smooth (Γ) + ξ obs J obs (Γ) + ξ so f t J so f t (Γ) + ξ end J end (Γ); different types of techniques are employed considering a few or all of the preceding individual costs: jerk or snap, end point, obstacle, elastic band for control points refinement • iterative linear quadratic regulator (iLQR), Extended linear quadratic regulator (LQR), linear quadratic Gaussian (LQG), model predictive control (MPC), corridor-based model predictive contouring control (CMPCC) • A set of control barrier functions (CBFs) for improving the robustness Figure 2. The basic building blocks are encountered in trajectory planning problems.In general, a considered trajectory planning problem can be composed of one or more blocks sequentially or in parallel to fulfil the desired needs.Details explanation of each of the blocks are explained in the following sections: motion model selection (Section 2), intermediate waypoint identification (Section 3), initial trajectory generation (Section 4), free space segmentation (Section 5), continuous trajectory refinement (Section 6), and receding horizon trajectory planning (Section 7).
We survey the existing literature and identify the main building blocks of trajectory planning for MAVs: free-space segmentation, motion model selection, initial waypoint identification, initial trajectory generation, continuous trajectory refinement, and receding horizon trajectory planning.For each building block, we discuss and examine how previous research has addressed it.Furthermore, we used the same benchmark example to fairly compare different strategies.Also, we aim to provide a comprehensive overview of the recent advances and the most prominent results in trajectory planning for multirotor aerial vehicles (MAVs) from 2010 to 2022, with a focus on real-time generation of optimal trajectories as follows : Quadrotor attitude control for trajectory tracking [3], trajectory planning in large-scale and cluttered environments [4], a differentially flat hybrid system is used to formulate the trajectory planning problem for a bidirectional quadrotor [5], decentralized trajectory planning method for quadrotor swarm [6] Sampling-based method for time-optimal paths generation for a point-mass model [7], a continuous reference trajectory refinement technique for slow-speed maneuvering [8], trajectory planning approach considering geometrical configuration constraints and user-defined dynamic constraints for unconstrained control effort minimization [9], Logistic curve-based trajectory generation technique [10] Gaussian process-based residual dynamic learning [11], nonuniform kinodynamic search-based trajectory generation [12], a standard form of a two-point boundary-value problem using Pontryagin's minimum principle-based approach is proposed [13] Online teach and repeat planning technique was proposed [14], in which a geometric controller [15] was utilized for trajectory tracking.Moreover, an iterative trajectory refinement strategy was proposed to relieve the local minima problem where the free space was represented as a convex cluster, i.e., a set of convex polytopes [14], a faster approach for segmenting free space as a set of polytopes using point cloud [16], receding horizon trajectory generation was proposed in [17], whereas trajectory generation for moving target was proposed in [18] Trajectory planning technique was proposed based on nonuniform B-splines ensuring kinodynamic feasibility [19] where geometric tracking control (GTC) is used for controlling, incremental ESDF method for constructing the environment [20], B-spline based kinodynamic search algorithm followed by elasticbased optimization [21], preception-aware optimal trajectory generation with limited filed of view [22], direct collocation method for trajectory generation [23], Minimum-time B-spline trajectory generation [24] B-spline based kinodynamic search followed by refining the trajectory by using elastic optimization (EO) [25], fast marching method alone side with Bernstein basis polynomial trajectory generation [26], Topomap: three-dimensional topological map in which the sparse point cloud was directly utilized to construct the environment [27], continuous-time trajectory optimization technique was applied for generating the trajectory in which initial waypoints were generated using RRG.Furthermore, monocular visual-inertial fusion was used for constructing the environment [28] Informed RRG method for finding an initial obstacle free path [29], uniform B-spline based trajectory generation [30], using visual features to construct dense map and utilized for extracting obstacle-free space [31], SFC for extracting obstacle-free regions as a convex set [32], free space was constructed as a set of convex polytopes based on stereographic projection [33], topologically distinctive online trajectory planning [34], proposing 3D jump point search (JPS) [32] Extending Minimum-snap as an unconstrained quadratic program in which path segments were jointly optimized [35], mixed integer quadratic programming (MIQP) based trajectory generation technique in which free space was segmented convexly by IRIS [36], generating safe avoidance trajectories [37] which was inspired by covariant hamiltonian optimization for motion planning (CHOMP) and minimumsnap.Moreover, it introduces a random restart technique to avoid local minima, kinodynamic FMT-star followed by minimum-snap trajectory smoother [38], sophisticated octree-based partitioning tree-based obstacles representation [39] Proposing IRIS for free space segmentation [40], minimum-snap trajectory generation using MIQP in which IRIS used for free space segmentation [41], motion primitive based approach for polynomial trajectory generation [42], Long range navigation based on teach and repeat where iterative closest point matching (ICP) was utilized [43], coordinate descent optimization in which objective was to minimize the along the coordinate hyperplanes [44] Trajectory generation based on pre-computed convex regions, which were used to build the map [45], the trajectory was generated seeking the time-optimal parameterization of a given path (TOPP) [46] Local replanning for exploring in which motion primitives were used to ensure the dynamic feasibility [47], path planning by using A-star for searching the optimal path in lattice space (x, y, z, heading) followed by motion-primitive-based trajectory generator [48], asymptotically optimal kinodynamic RRTstar trajectory planner [49], CHOMP trajectory generation and continuous improvement of the initial trajectory considering obstacles and smoothness of the trajectory [50] Proposing MIQP based approach for trajectory generation [51], seeking different homology classes of trajectories and generating an optimal trajectory subject to that homology classes [52] Minimum-snap trajectory generation [53], Gradient free optimization technique, STOMP [54], proposing quite faster search algorithm JPS in uniform grid [55] Free space was extracted by discretizing the space via the 3D Delaunay triangulation [56] Covariant gradient-based trajectory generation, CHOMP [57]  Our goal was to investigate trajectory planning for multirotor aerial vehicles (MAVs) in the plan-based control paradigm, focusing on analytical approaches rather than learning and evolutionary approaches.In future work, we plan to investigate trajectory planning approaches based on imitation learning, inverse reinforcement learning, and evolutionary computing.Further investigations also include a comparison of different paradigms in a simulated environment and through real-world experiments for different operating scenarios: trajectory planning for high-dense and less-dense environments, static and dynamic obstacle avoidance, and high-speed and low-speed maneuvers.These considerations were outside of the scope of this work since we focused on the theoretical aspects of the plan-based trajectory planning paradigm.The rest of the paper is organized as follows:

Motion Model Selection
Exact model, empirical model and differential flatness are the main techniques that can be employed for selecting the most appropriate motion model for a specified application.The appropriate motion model selection procedure varies depending on the problem formulation.For example, planning followed by controlling approaches does not necessarily have an exact motion model mainly due to high computational demands.In such scenarios, an empirical motion model is sufficient for planning, since a dedicated controller is utilized for controlling the quadrotor.

Exact Model
In general, multirotor aerial vehicle (MAV) dynamics is described by six degrees of freedom.However, in planning followed by high-level controlling approaches, it is not required to define an actual motion model for planning, since a high-level controller consists of a fully-fledged quadrotor motion model.In most circumstances, the planner is composed of approximated quadrotor dynamics; this is due to computational complexity, which is not adequate for real-time onboard processing.Hence, the motion model selection process depends on the approach that formulates needs.In [58], the researchers proposed a six-degrees-of-freedom motion model, whose state vector is defined by x = [p , v , ψ , ω ], where ψ, p, v and ω stand for orientation (rad), position (m), velocity (m/s), and angular velocity (rad/s) in R 3 , respectively, with respect to a defined local coordinate frame.The system input or total trust that is applied for each of the motors is given by f 8 ms −2 and e i , i = x, y, z stand for standard basis vectors in R 3 , and k v , m, J, ρ, and k m are robotspecific constants.

Empirical Model
Other than the exact model, a six-degree-of-freedom motion model was proposed for governing quadrotor in a distributed setup [59].Later, it was reduced to four-degree-offreedom motion model [60].Furthermore, in [61], a 4-degree of freedom (DoF) motion was used for controlling several quadrotors in a distributed setup in which nonlinear model predictive control (NMPC) and model horizon estimation (MHE) are incorporated for relative tracking, where the relative motion model was defined as: where the function is the state of the motion model, where p i , i ∈ {x, y, z} is the position of the MAV in the world frame.ψ z and ψz denote the yaw angle or heading angle around the z axis and relative yaw angle, respectively.Derivative of ψ z and ψz are denoted by ψz and ψz , respectively.v i , i ∈ {x, y, z} denote the velocities on each direction, whereas ṗi , i ∈ {x, y, z} gives the derivatives of p i .Since discrete space was chosen for controlling the system, Euler discrete model (1) was formulated as follows: where δ is the sampling period and . f c and f d denote continuous and discrete dynamics, respectively.x + depicts the next state given the current state x.Subsequently, the motion model was simplified to 4-DoF for trajectory tracking for a quadrotor [62] (Equation (1)).In this trajectory-tracking approach, planning followed by the high-level controlling paradigm was applied.Such an approach was introduced because a simplified motion model is a reasonable choice for achieving real-time performance.
Quadrotor state was defined as x = [p x , p y , p z , ψ z ] T ∈ R n x , whereas input to the system was given by u = [v x , v y , v z , ψz ] T ∈ R n u .The simplified motion model was given by: where f c (•) : R n x × R n u → R n x and n x = n u = 4.The discretization of (3) was given by: where

Differential Flatness
Here, differential flatness [63] provides algebraic functions (e.g., polynomials) which analytically map the trajectory and whose higher-order derivatives map to system states and inputs.Since the Nth-order polynomial can be differentiated up to N − 1 times, the differential flatness property ensures the feasibility of the trajectory and generates appropriate control commands.More precisely, let the following: (5) be a nonlinear system.According to to [64], if the system is differentially flat, there always exists a flat output, namely z ∈ R n z , where the dimension of the output is given by n z .In such a system, states and control inputs can also be formulated from the system flat outputs whose derivatives are mapped through functions, namely and τ.Let z = (x, u, u, . . ., u (q) ) be the flat output, holding x = (z, ż, . . ., z (r) ) and u = τ(z, ż, . . ., z (r) ), where apices (i) stipulates the ith derivative.Along with that, the explicit trajectory generation process can benefit when it uses differentially flat systems, for example, and τ can be a dth-order polynomial p(t).Then, x (t) = [p (t) ṗ (t) p (t)] are the state of the system at time t in which ṗT and pT indicate the velocity and acceleration of the system, respectively.Control inputs can be determined by jerk [65], namely ... p T (t), where p(t) = λ d t d + . . .+ λ 1 t + λ 0 , t ∈ [0, dt], where λ i , i = 0, . . ., d are the polynomial coefficients.There are various ways to construct these kinds of polynomials, including Minimum-snap, B-spline, etc.

Initial Waypoint Identification
Generally speaking, robots have a limited sensing range.Thus, planning a trajectory out of such a sensing range would be counterproductive.Hence, local trajectory planning and refinement when a robot moves is the optimal choice.With the help of sensing capabilities within the robots' sensing range, the robot's surrounded environment can be constructed as the intersection of three separate disjoint sets: free-known (C f ree ), occupied (C obs ), and unknown (C unknown ).Once C f ree ∪ C unknown is identified, a set of intermediate waypoints is needed to navigate the robot along the trajectory from the start position to the desired position.There are various techniques for finding a set of intermediate waypoints: sampling-based techniques (e.g., RRT*, probabilistic road map (PRM)) and path-searching techniques (e.g., A*, D*, JPS), where waypoint poses are given in the UTM (universal transverse mercator) coordinate system and then converted into the local coordinate system.Moreover, kinodynamic properties are incorporated into preceding intermediate waypoints finding techniques to ensure the dynamic feasibility of the robot.One of the first kinodynamic-based path planning approaches was proposed in [66], in which a variant of the A* method alongside with kinodynamic properties was applied to ensure the dynamic feasibility.Subsequently, several different methods were proposed for enhancing path planning, ensuring the dynamic feasibility by kinodynamic properties, including motion-primitive-based approaches.
Motion-primitive-based approaches [42,67,68] can be utilized for finding intermediate waypoints and for trxajectory generation.Gordon et al. [69] proposed a set of motion primitives for connecting edges of the graph that was constructed from A*.In this method, motion primitives were used to defining state vector x(t) and control input u(t) as a linear time invariant (LTI) system as follows: where p µ (t) = Σ d j=0 λ j t j , µ ∈ {x, y, z}, which is formulated similar to (16), while k r and d are the order of the derivative and the order of the polynomial, respectively.
Hence, given control policy u i (t) and initial state x(0), a sequence of succeeding states for a given time duration is determined by: where γ is the time duration that control policy is applied.In [69], to define the actual and heuristic cost of A*, the researchers used motion primitives, which are defined (as shown) in ( 8), and calculated initial waypoints set.
Another interesting approach to finding a set of initial intermediate waypoints is by using fast marching methods.In general, fast marching methods [70] are applied to track the propagation of a convoluted interface such as wavefront, especially in image processing.Let ϕ be a close curve in a plane ∈ R 3 that propagates orthogonally to the plane with a speed v(p), assume v > 0. Given T time period, propagation of the plane can be described by | T(x)| = 1 v(p) based on Eikonal partial differential Equation [71], where p is the position in R 3 and the arrival time is formulated by T(x).The fast marching concept was applied for path searching in [26] by proposing a method for calculating a velocity map.In this method, the arrival time was determined by assessing the desired velocity at the considered position in the local coordinate system.Hence, arrival time was calculated by backtracking from the goal pose to the start pose along the minimum cost path, which can be estimated from the gradient descendant.Though gradient descendant may trap in a local minimum, when smart marching is applied, gradient descendant does not trap in local minimum due to fast marching nature; this property was proven in [72].To define the velocity map, ESDF was utilized to get the closest obstacle poses from the given pose.A quadrotor should move faster when there are no close-in obstacles and should be slower when it is moving through a cluttered environment.Such a behavior was mimicked by incorporating a hyperbolic tangential function, i.e., tanh.With such an assumption, the corresponding velocity was calculated based on (9): where v max is the maximum velocity a quadrotor can fly, l is the distance to the closest obstacle from the considered pose p and, e is Euler's constant.

Initial Trajectory Generation
Let us consider a nonlinear system in the form of ẋ(t) = f c (x(t), u(t)) with initial state x(t 0 ) = x 0 , where state vector and control inputs are denoted by x ∈ R n x and u ∈ R n u , respectively.When generating an initial trajectory (Γ), ensuring dynamic feasibility is a must.In other words, x and u satisfy the following constraints: In addition to these constraints, safety constraints should also be imposed after reasoning the environment, to guarantee safety.The environment or configuration space C can be decomposed into C obs and C f ree .Hence, a set of constraints should be introduced for the quadrotor to always be within free space x ∈ C f ree = C/ C obs .Hence, the initial trajectory generation process has to consider both said types of constraints simultaneously so that the quadrotor would have a smooth flying experience.

Define Trajectory
Let Γ ← C ⊂ R d be an initial trajectory, which is parameterized as a function of time where d denotes the C's dimension.Since Γ is a function, the objective of the trajectory generator is to determine the precise objective, which will eventually provide the optimal trajectory in a timely manner satisfying constraints and hypotheses that are imposed.Hence, optimal trajectory, namely Γ * , can be posed as a discrete or continuous optimal control problem (OCP) [73]: where t 0 and t n denote the start and terminal time, respectively.Yet another challenging problem is to formulate the objective function, namely J.In the following subsections, we discuss several approaches to address this problem.

Minimum-Snap-Based Trajectory Generation
minimum-snap trajectory generation [53] uses the differential flatness property (Section 2.3) to automate the trajectory generation process.Let quadrotor trajectory be Γ T (t) = [r T (t), ψ T (t)] T for flat output [x, y, z, ψ z ] T , where r = [x, y, z] is the center position of the MAV with respect to world coordinate system and ψ z is the yaw angle of the MAV.The continuous trajectory can be expressed as follows: where d is the dimension of the space, e.g., 3.As we defined in Section 2.3, system states and inputs can be determined in terms of Γ and its derivatives.Γ, Γ, and Γ correspond to position, velocity, and acceleration, respectively.Flat output and its derivatives estimation in minimum-snap refer to the original work [53] (Equations (1)-( 35)).
In minimum-snap trajectory parameterization, the total time duration of the trajectory is divided into a set of subintervals, i.e., keyframes.Each keyframe consists of a desired position and a yaw angle.A safe corridor is constructed between consecutive keyframes as a set of piece-wise-polynomial functions to estimate smooth transitions through the keyframes.Let m d and d be the number of keyframes and the order of the piece-wisepolynomial functions, respectively.Hence, Γ T (t) can be formulated as: To generate an optimal trajectory, the following objective is utilized: where ξ r and ξ ψ are regulation parameters, k r and k ψ are the order of derivation at each keyframe, and Γ T (t i ) = [x i , y i , z i , ψ z i ] T , i = 0, . . ., T. Time intervals, t 1 , t 2 , . . ., t m d can be kept constant or varying when deriving the minimum-snap trajectory generation.In most cases, having varying time intervals between keyframes is necessary.Mellinger et al. [53] proposed a gradient descent-based approach for finding optimal time intervals between keyframes.Furthermore, Chen et al. [74] utilized A* to find the intermediate waypoints.
Based on these estimations, time segments or keyframes are calculated incorporating both velocity and acceleration limits.In the latter approach, the steps listed below were used to obtain intermediate waypoints.Initially, the environment was constructed as a map using OctoMap.Afterwards, the formed map was split into two subsets: allocated and nonallocated (a set of free spaces).Then, the discrete graph was constructed connecting consecutive free spaces, which were represented as cubes.Afterwards, A* was applied for finding the optimal path segment within each cube.Similar to (14), the researchers set k r = 3 and minimized only total jerk (15) to minimize the angular velocity.As an aside, minimizing the angular velocity helps to avoid fast rotation.

Polynomial Trajectory Generation as QP
In minimum-snap trajectory generation, total trust force, i.e., attitude acceleration, is proportional to the fourth derivative (snap) of the trajectory [53].The gracefulness of such behavior helps to avoid generating excessive control commands.Subsequently, a slight variation of minimum-snap trajectory generation was proposed in [35], where segment times or keyframes were fixed initially.Once start and goal positions were provided, RRT* [49] was utilized for finding an obstacle-free path between the start and the goal poses as a sequence of optimal waypoints.Initial segment times (m d ), which were estimated using optimal waypoints, were calculated according to the maximum velocities that the quadrotor is allowed to fly due to set technical limits.Let p i (t) be the dth-order polynomial in the ith segment that describes as follows: Each p i (t) provides a flat output for a given time index t.λ j , j = 0, . . ., d denotes the polynomial coefficients.The objective or cost function J(Γ i ) can be fully determined by penalizing the derivatives of squares [35]: where P i is a vector whose elements contain polynomial coefficients: ξ 0 , ξ 1 , . . ., , where k i r is the highest order of derivative and Q(T i ) is Hassin matrix, which contains the ith segment squares of derivatives.Since there are m d number of segments, total cost J(Γ) can be expressed by: For a smooth flight experience, ensuring the continuity of derivatives between segments is necessary.Hence, imposing constraints between segments, e.g., velocity, acceleration, jerk, and snap is needed, which can be formulated as follows: where C i contains a mapping matrix whose entries contain the start and end coefficients of ith segment, whereas d i contains derivative values, i.e., start and end of ith segment.Taking all constraints of m n segments: Now this can be solved as a constrained quadratic programming (QP) problem.

Unconstrained Polynomial Trajectory Generation
The techniques that are used for uconstrained trajectory optimization are faster than constraints optimization.In [35], the researchers extended minimum-snap trajectory generation as an unconstrained QP.According to their findings, minimum-snap works well for small segments size.For higher-order polynomials with varying segment sizes, minimumsnap becomes ill-conditioned.Thus, an unconstrained QP was proposed.After substituting (19) and ( 20) into (18), J(Γ) can be reformulated as: where d contains fixed derivatives (d f ) and free derivatives (d p ), and S is a permutation matrix (ones and zeros), which is used to correct the order.Then, dJ(Γ) dd p = 0 yields the optimal value for d p : Once d p is determined, a polynomial that corresponds to each segment can be recovered.

Unconstrained Polynomial Trajectory Generation with Collision Avoidance
Oleynikova et al. [37] extended what Richter [35] proposed for adding support for collision avoidance capabilities.They added an additional term for calculating the collision cost: where J smooth exactly equals (21).To estimate J obs (Γ), it is required to initially calculate position p i (t) (16) and velocity v i (t) for each axis at time t after selecting the corresponding segment (i, i = 1, . . ., m d ): Knowing (the values of) p i (t) and v i (t), J obs (Γ i ) can be fully determined by: where (C −1 S) pp is the right-side matrix, which corresponds to d p .For representing the collision cost c(p i (t)), a line integral of a potential function, i.e., (44), was used.As total cost is given (21), J obs (Γ) can be calculated for all the segments provided that d * p can be estimated.In a cluttered environment, optimization problem is most likely to be nonlinear as well as nonconvex.Thus, Broyden-Fletcher-Goldfarb-Shanno (BFGS) [75] was used to solve the optimization problem.However, the solver failed to obtain the global minimum most of the time.Hence, several random restarts were needed to find the optimal solution.A thorough discussion of how random restarts were invoked into the optimization problem was detailed in [45].

Covariant Gradients for Trajectory Generation
The significance of covariant gradients technique is that both J obs (Γ) and J smooth (Γ) depend solely on physical characteristic of the desired trajectory.In other words, the trajectory generation is invariant to its parameterization.If gradient descent is applied, it depends on the way trajectory is parameterized.The covariant gradients technique removes this dependency.Hence, covariant gradient technique depends solely on physical representation or dynamic quantities of the trajectory with respect to an operator, Θ: where ξ is a constant and apices (n) determine the nth-order derivative.The correlation of derivatives between two trajectories: Γ 1 and Γ 2 , which are defined by assuming inner product as given (27).
The primary objective of Θ is to distinguish the norm (26) and the inner product (27) from the L2 norm [50].

B-Spline-Based Trajectory Generation
dth-order B-spline can be defined for a given knot sequence p k = {t 0 , t 1 , . . ., t n k } and control points p c = {p 0 , p 1 , . . ., p n p }, where t * ∈ R, p * ∈ R d and n k = n p + d + 1.If d is set to 3, each p i represents position in R 3 , where i = 0, . . ., n p .For a given time index t, the corresponding position p(t) can be fully determined by using the de Boor-Cox formula [76].
Estimation is not limited to the position; velocity, acceleration, or any high-order derivative of p c can be estimated using DeBoorCox(t, p ( * ) c ), as given in Algorithm 1, where ( * ) depicts the order of the derivative of p c such that ( * ) < d.
Algorithm 1 The B-spline trajectory (p) and its derivative estimation for a given time index t, where p equals p ( * ) c .1: procedure DEBOORCOX(t, p) while true do for i ← 0 to d do 10: for r ← 1 to d do 12: for i ← d to r do 13: Later, the B-spline matrix representation was proposed by Qin [77].B-spline can be formulated as uniform or nonuniform.J. Hu et al. [78] detailed the uniform B-spline matrix representation.In uniform B-spline, knot span is the same for any considered consecutive time interval, i.e., ∆t = t i+1 − t i , i ∈ [0, n k ).Any position of the trajectory can be parameterized by considering only d + 1 consecutive control points: [p i , p i+1 , . . ., p i+d ].Hence, corresponding normalized time q(t) can be calculated as follows: In the matrix representation, c(q(t)), which is given in (28), can be determined by: c(q(t)) = q(t)M d p i , q(t) = [1, q(t), q 2 (t), . . ., q d (t)] T , p i = [p i , p i+1 , . . ., p i+d ] T , Since each control point p i belongs to d + 1 of successive spans, B-spline can be controlled locally.Due to such controllability, B-spline is suitable for local trajectory planning [30].Moreover, the derivatives of a given B-spline are also B-spline [8].Hence, B-spline's derivatives (e.g., velocity, acceleration, jerk) can be calculated considering corresponding span [t i , t i + 1) for a given d + 1 consecutive control points p i = [p i , p i+1 , . . ., p i+d ] T ∈ R d×3 and corresponding knot vector.
The explicit form of estimation of velocity and acceleration of a given time index is calculated as follows: , ). ( In most of the situations, initial control points are generated, as explained in Section 3.Such methods may or may not be smooth enough for initial trajectory generation.There are various ways to generate intermediate waypoints to improve the quality of the trajectory using B-splines.For example, the initial trajectory was constructed using cubic B-Spline in [62].Such a capability is mainly due to B-spline's properties.
It is particularly continuity and convex-hall properties that make B-spline trajectory generation such a robust technique.

Convex Hull Property
Among the properties of the B-spline, the convex hull property is the most significant property due to its capabilities for checking the dynamical feasibility and the collision.How convex hull property is incorporated for calculating dynamical feasibility is given in (32).As shown in Figure 3, d h > 0 and d h > d c − r h should be held for a considered point in the trajectory to ensure a collision-free trajectory, where d c is the distance between a given control point and its closest obstacle position.In dth-order B-spline, a convex hull is formed by connecting any successive d + 1 control points, e.g., p i , p i+1 , p i+2 , . . ., p i+d or union of all consecutive control points that lie on the corresponding B-spline curve [69].Moreover, r h can be substituted with [19], the following condition should hold for collision-free trajectory planning:

Bernstein Piece-Wise Trajectory Generation
Bernstein polynomial is a specific form of B-spline, which is similar to the Bezier curve [79,80].Bernstein polynomial can be described as follows: where d is the degree of the polynomial (Figure 4), λ 0 j , λ 1 j , . . ., λ d j are the control points of jth polynomial segment, and t ∈ [0, 1].Since Bezier is a particular form of the B-spline curve, such curves hold convex hull property.Hence, given a sequence of control points, a constrained convex hull can be defined using the control points that are considered.Both the beginning and end of the curve are determined by the first and the last control points, respectively.Furthermore, the derivative of Bezier is also a Bezier curve.
where i, j refer to ith control point in jth segment, i.e., λ i j , s j is a scaling factor of jth segment for mapping time duration from [0 , 1] to [t j−1 , t j ] and µ ∈ {x, y, z}.Once Γ µ (t) is obtained, the objective is to minimize the total cost, which can be determined by taking the integral of square error up to k r order as given in (15).Such a problem can be formulated as a QP constraint problem.For instance, Gao and Wu [26] proposed a Bernstein-based trajectory optimization approach in which three types of constraints piece-wise trajectory continuity, safety constraints which are based on convex hull property, and dynamical feasibility constraints enforced [26].

Comparison of Several Trajectory Techniques
In the preceding subsections, several types of trajectory parameterization techniques were considered.We have selected three different types of trajectory parameterization techniques for this comparison: piece-wise-polynomials technique, fitting based on a sequence of points, and the third is uniform B-spline-based technique.The objective of piecewise-polynomials is to find optimal polynomial coefficients [53] or end-derivatives [35] of consecutive segments, whereas the objective of the third technique is to find a set of points satisfying the provided constraints [57].A comparison of how velocity, acceleration, jerk, and snap are varied for selected techniques in terms of mean, standard deviation (std), min, and max for the same a set of control points and knot vector is present below.The considered knot vector and control points are: Each approach has its own set of parameters to fine-tune for obtaining an optimal trajectory.The generated trajectories are shown in Figure 5 with different configuration setups (with different parameter sets).Figure 6 shows how the derivatives up to the fourth change over time in each direction, i.e., x, y, z, separately for each technique.When looking at the derivatives of each method, it is clear that smoothness, which is the main point to be considered for motion planning, is higher in both B-spline and minimum-snap compared to CHOMP.Since uniform B-spline is used in this comparison, smoothness changes of each derivative between B-spline and minimum-snap cannot be compared directly due to time allocation when generating the trajectories.Hence, minimum-snap trajectory smoothness can be changed, optimizing the time allocation process [35].On the contrary, such a time allocation process is not necessary for a uniform B-spline.However, control points are interpolated appropriately to generate a continuous and smooth trajectory.We varied the parameters for each approach appropriately and estimated the mean, standard deviation, maximum, and minimum of the velocity, acceleration, jerk, and snap profiles.The results are shown in Table 1.The results clearly show that the consistency of the trajectory depends on the parameters used to parameterize the trajectory.Therefore, selecting the appropriate parameter set for a given task is of utmost importance, as can be seen by looking at the statistical properties (mean, standard deviation, minimum, and maximum) of the higher-order derivatives, such as velocity, acceleration, jerk, and snap.As described in the previous paragraph, the time allocation process directly affects the parameter selection for minimum-snap.Further, the optimal polynomial coefficients process depends on time allocation, as given in (13).On the other hand, the Poly-traj [35] generation process has fewer parameters to be optimized, since it uses free-end derivatives of each segment.Hence, the latter technique is faster than minimum-snap.
Table 1.Velocity, acceleration, jerk, and snap profile for generating an optimal trajectory for a given set of knot vector and control points (Figure 6) using three different techniques: minimum-snap [53], Poly-traj [35], and CHOMP [57].d: order of the polynomial, mc: maximum continuity or maximum continuity order in between consecutive segments, pd: number of proposed points or point density per defined time duration of the trajectory.

Free Space Extraction
Obstacle region identification is of utmost importance for optimal trajectory planning in real time.In a cluttered environment, the way the trajectory planning problem formulated matters for fast reaction.Such trajectory planning approaches can be designed as QP mainly due to less computation power required for such tasks.Hence, forming obstacle-free regions in the form of convex has more advantages in terms of reducing the computation power, simplicity, and fast convergence.Chen [74] attempted to define free space as a series of cubes between the start and goal pose.Thenceforth, OctoMap [81] was used for constructing the map surrounding the quadrotor, where regions with no obstacles are considered free spaces.After obtaining the free space information, obstacle constraints are enforced into (15) to generate optimal trajectory.
Once C f ree was obtained, free space regions can be considered as a set of inequality constraints that can be added into the piece-wise-polynomials trajectory generation as , where i = 1, . ., m d − 1 and Γ T was defined in (13).In such a trajectory, additional boundary constraints should be introduced if the extrema of dth-order polynomial violates the boundary constraints corresponding to each axis, i.e., x, y, and z, in each segment [74] (Equation ( 10)).Similar to the preceding approach, Gao and Shen [82] proposed a sequence of spheres to represent free space from the initial position to the final position.To construct the environment, a map was not built; instead, they bypassed map building by constructing a KD-tree-based placeholder [83] to store raw point cloud for the LiDAR.Afterwards, a relative map to the current pose of the MAV was retrieved using nearest neighbour search; RRG [84] combined with A* was used to find a flight corridor or intermediate waypoints.Such intermediate waypoints were connected by overlapping spheres.
IRIS [40] is one of the first successful ideas in which obstacle-free spaces are extracted using a convex optimization technique.In this proposed approach, initially, it is required to provide a seeking point and an area with a boundary box where an obstacle-free region is to be searched.Seeking point is defined as a unit ball: where p 0 is the center point.The linear constraints, which separate the boundary box into obstacle-free and obstacle-rich regions, are defined as a set of hyper-planes: P = {p | Ap ≤ b}.Subsequently, finding the optimal representation of ε(C, p 0 ) and P with respect to given obstacles, ı j , j = 1, . . ., N is solved as an iterative process (38): where A i and b i correspond to ith row of A and b.The first constraint, i.e., A T j p k ≥ b j , is imposed to move the obstacle into one side of the plane, A T j p = b j , whereas the second constraint, i.e., sup p A T i (C p + p 0 ) ≤ b i , ensures the ellipsoid lies on the other side of the plane.The researchers proposed to solve the (38) as a two-step process: searching, first, for proper constraints (i.e., A i and b i ), and then the maximum volume that satisfies ellipsoid, ensuring preceding constraints.In other words, they attempted to find hyperplanes that separate obstacle regions and free regions.Conceptually, hyperplane separation is completed by finding planes that intersect with obstacle boundaries.Afterwards, the ellipsoid is uniformly expanded until it intersects with obstacle boundaries.Let α be the scaling factor which defines the expansion.Let ε α = {C p + p 0 | p 2 ≤ α} for α ≥ 1 be the expanded ellipsoid.Hence, the optimal α * can be determined by: After finding α * , it is possible to define the optimal inscribed ellipsoid (ε * ), which is the obstacle-free region [40] (Section 3.3) .
Sikang et al. [32] proposed a new approach, quite different from the aforementioned IRIS, for extracting obstacle-free regions as a convex set SFC (Figure 7).SFC searches a set of overlapping polyhedra from the start pose to the goal pose.To obtain intermediate obstacle-free positions, the researchers utilized a graph search technique, namely JPS [55].The main reason for selecting JPS over sampling-based algorithms (e.g., RRT* and PRM) or search-based techniques such as A* or Dijkstra is due to the nature of JPS; it uses a uniform-cost grid map with uniform voxels.In general, sampling-based techniques are not deterministic though probabilistically complete.Thus, there is no guarantee about the duration of searching time.On the other hand, the computational time for search-based methods is pretty high if the environment is cluttered.However, JPS has a lower searching time compared to A* [32].Let p c = p 0 , p 1 , . . ., p n be the intermediate waypoints from start to goal pose and l i =< p i , p i+1 > be the ith line segment, where i 1, . . ., n − 1.Each line segment constitutes convex polyhedra, namely, E i .Along with that, SFC can be expressed as SFC(P) = {E i | i = 0, . . ., n − 1}.SFC has two steps: finding E i that fits the l i and seeking a set of linear inequalities that are tangent to , C i can be decomposed as R T SR, where R gives the axis of rotation between considered line segment in between p i and p i+1 ).The semi-axis of E i is given by S = diag(a, b, c) as a diagonal matrix.p 0 i is the center of l i .The objective of SFC is to find each pair E i and p i 0 , given the l i and obstacles set (Obs i ), which touches the E i .Initially, ellipsoids are spheres whose center poses are located as the midpoints of l i , i = 1, . . ., n − 1. Afterwards, semi-axes, except for the axis along p i+1 − p i , are shrunk until the corresponding ellipsoid contains no obstacles.Let ε * i (C i , p 0 i ) be the ith ellipsoid after applying the shrinking process.p j denotes the closest point that touches the ε * i (C i , p 0 i ), where j = 1, . . ., m and m is the number of obstacles.Hence, corresponding half-space H j = {p j | a T j p j < b j } is defined as a plane that is tangential to ε * i (C, p 0 ), where a j and b j are determined by: Hence, the intersections of these m half spaces create a convex polyhedron C = ∪ m j=0 H j = {p | A T p < b}.The same approach is applied to each line segment, l i in which we can generate each C i .All in all, SFC(P) = {C | i = 0, . . ., n − 1} can be constructed.A more descriptive formulation is given in [32] (Algorithm 1).

Continuous Trajectory Refinement
The objective function consists of several subobjective functions: for improving the smoothness, for avoiding obstacles, and so forth.In this section, a precise explanation is given on how to construct subobjective functions for each of the various occasions.First, we examine the simplest case where only dynamic feasibility and obstacle avoidance constraints are taken into consideration.Let J be the objective function or performance index: There are various formulations of how J obs and J smooth are determined.In general, J smooth can be expressed as: Eliminating unnecessary higher-order motion is the main objective of the J smooth .On the other hand, J obs encourages to generate or modify collision-free trajectory by trying to push control points away from the obstacle zone if the trajectory is already in collisions or penalizing parts of the trajectory that is close to the obstacles.Let B ⊂ R d be the exterior boundary of the MAV and c is the cost function of penalizing close-in obstacles with respect to B. Along with that, J obs can be formulated as follows: where the function f c (Γ(t), p), which was proposed by Ratliff at al. [57], can be defined as follows: where δ dis denotes the distance from the boundary (B) of the quadrotor to a given obstacle position.Before taking gradient at i, J(Γ) is linearized around i, J(Γ) ≈ J(Γ i ) + (Γ − Γ i ) T J(Γ i ).Defining c and d is detailed in [57] (Equations ( 22)-( 28)).
In [19], the cost of the trajectory was estimated based on the following formulation: where J so f t (Γ) is determined by soft limits on acceleration and velocity.J smooth (Γ) is defined by considering only geometric information without minimizing snap and/or jerk [53].Such minimization is required because of the following stages of trajectory optimization.In such trajectory optimization, time reallocation has less impact on the objective function.Hence, J smooth (Γ) is defined as follows: where a number of control points, denoted n, and f i+1,i and f i−1,i can be interpreted as connecting joint force of two springs between control points pairs: (p i+1 , p i ) and (p i−1 , p i ), for example, control points lie on a straight line if the sum of all terms equals zero.As an aside, similar approaches were proposed in [85,86].The value of J obs (Γ) is determined by calculating the distance to the closest object pose from each control point, in which the distance to the obstacle, i.e., f c (p i ), is given by: where δ dis is the free distance between MAV's center and the pose of the closest obstacle.Hence, J obs (Γ) = Σ n i=d f c (p i ) can be estimated based on a given trajectory in the form of control points.Soft constraints are defined by not exceeding both acceleration and velocity within those max limits.
To calculate acceleration and velocity at each control point and when both acceleration and velocity exceed their maximum limits, the convex hull property (33) of B-spline is utilized to penalize those control points.Based on the previous method, Ref. [30] proposed an endpoint cost J endpoint (Γ) into the objective function as an additional term.The key intuition behind adding J endpoint (Γ) is to reduce the error between local trajectory and global trajectory since J endpoint (Γ) penalizes error of both velocity and position with respect to the desired global trajectory.J endpoint (Γ) is determined as follows: where ξ p end and ξ v end are regularization parameters, whereas p end and ṗend are the desired end position and velocity of the trajectory.

Receding Horizon Trajectory Planning
On most occasions, paths which are obtained by planning techniques are suboptimal.Hence, the initial trajectory that is generated based on the initial path is to be refined, ensuring dynamic feasibility for controlling the MAV.Various approaches can be applied for trajectory refinement.However, enabling recursive feasibility and incorporating terminal constraints and convergence to the desired state are the utmost importance considerations to be contemplated throughout the process.LQR and MPC are the two most popular approaches that are being used for receding horizon planning.LQR is applied for linear systems, whereas iLQR and differential dynamic programming (DDP) are applied for nonlinear system.Both in LQR or iLQR, OCP is defined as an open-loop control problem.On the other hand, MPC is designed as a close-loop OCP.In other words, OCP is seeking actions knowing the behavior of the surrounding environment.

LQR-Based Trajectory Generation
DDP [87,88] is one of the first techniques proposed for solving optimal control problems.Let x k+1 = f d (x k , u k ) be the discrete-time system dynamics; the total cost of the trajectory can be formulated for a given control policy, i.e., π k+i , for all i = {0, 1, . . ., N − 1}.
The optimal control input, i.e., u k+i = π k+i (x k+i ), for a given time index, i.e., i + k, can be obtained by minimizing the (50).Thus, cost (cost-to-go), which was proposed in [89], is fully determined by: The same procedure can be applied recursively in a backward direction for seeking the optimal π k+i (x k+i ) = arg min u k+i (c(x k+i , u k+i ) + V k+i (f d (x k+i , u k+i ))).DDP yields almost the same behavior: first estimate optimal control and then apply a forward pass to determine the updated nominal trajectory.Consequently, LQR is a simplified version of DDP.LQR is one of the fundamental ways to obtain a closed-form solution for a given optimal control problem under which system dynamics is assumed to be linear.Let us assume the system dynamics is defined as in (4).The intuition of LQR is to estimate the optimal control sequence for maneuvering the quadrotor from an initial position to the desired pose.Let N be the receding horizon whose optimal trajectory is to be determined.The total cost, i.e., J k (x k , π N ), consists of three parts: initial, intermediate, and final costs, where π N = {π k , π k+1 , . . ., π k+i , . . ., π N−1 }: where ≤ 0, and ∂ 2 C ∂u∂u ≤ 0 are positive semi- definite Hessians to guarantee the minimizing of the total cost.The total cost can be formulated in various ways.In LQR, the total cost is defined as Quadratic costs as follows: where i ∈ {0, 1, . . ., N − 1}, , and ξ ∈ R are predefined in which Q start , Q goal , Q, and R are positive definite, whereas J k ≥ 0 and j k ≥ 0 assumed to be positive semi-definite.LQR problem ( 52) and ( 53) provides an optimal π N in close form solution as expressed in (51); the cost-to-go function, i.e., (51), can be reformulated as an explicit quadratic formulation as follows: The estimation of both J k+i and j k+i can be obtained in a recursive way starting from the goal position x x+N to the initial position x k , using Riccati differential equation for all i = {0, . . ., N − 1}.
In general, system dynamics is described by: If the system dynamics is nonlinear, A k and B k are recalculated by linearizing the f c at each time index.Since linearization has to be carried out in each iteration, it is called the iLQR [90].
Boundary or goal position conditions are given by S k+N = Q goal , j k+N = q goal .The feedback control policy in LQR is fully determined as follows: As given in (55), system stability depends on system dynamics.When quadrotor dynamics is nonlinear, the stability of iLQR is not guaranteed.Jur and Berg [91] attempted to address the stability issue by proposing a novel method called LQR smoothing; this method can be applied for linear or nonlinear systems to acquire the minimum-cost trajectory.The main difference in LQR smoothing compared to LQR is that LQR minimizes the cost of not only backward direction, i.e., cost-to-go, but also applies forward direction, i.e., cost-tocome [58,91,92].However, the output of LQR, iLQR or LQR smoothing does not address the system noise.Both linear or nonlinear state estimator may eliminate the system noise.LQG [93,94] is one of the ways to solve this problem.LQG consists of a state estimator, i.e., Kalman filter (KF), and state feedback, i.e., iLQR or LQR.

MPC-Based Trajectory Generation
As detailed in Section 7.1, unaccountability of addressing sudden disturbances is the main limitation of OCP techniques (e.g., LQR, DDP); this is due to its nature.LQR calculates fixed receding control policy and applies it to the system; there is no intervention during the control policy execution.MPC is one of the ways to address the preceding problem, which is characteristic of both LQR and DDP.The difference between MPC and LQR is that only the first portion of the control policy is applied to a system (Figure 8) in MPC through the calculation of full control policy rather than employing full control policy as in LQR.Let us assume the system dynamics as given in (2).In general, MPC can be formed as follows: min where w = u k , . . ., u k+N−1 is the optimal control sequence to be estimated in each iteration.Variable J end (x k+N ) plays a significant role in terms of the stability of the system locally and globally.Presenting local stability is relatively easy, e.g., Lyapunov's analysis compared to global stability.In addition to terminal cost, terminal constraints for states should be enforced, which is quite computationally challenging for real-time applications.Moreover, enforcing terminal constraints is even more difficult for nonlinear dynamics.Thus, in most of the practical applications, terminal constraints are not enforced into the optimization procedure.Furthermore, classical MPC lacks recursive feasibility.Several varieties of MPC have been proposed to address processing issues to a certain extent.For a linear system, the performance index, i.e., J k (x, u, z re f , u re f ), can be defined as follows: where Q x , which is a positive semi-definite matrix, consists of the state error penalty coefficients, whereas R u should be positive definite and P is state error on the terminal cost.
In principle, stability and feasibility are not assured implicitly.Consequently, stability and feasibility tend to improve for the longer receding horizon, which is quite challenging due to computational demands.
Quadrotor dynamics are usually expressed in a nonlinear fashion.Therefore, LQR or linear MPC cannot be applied without linear approximation.Hence, the nonlinearprogramming-based approach has to be applied.Direct multiple shooting and direct collocation are the main two techniques that are used to transform OCP into nonlinear programming (NLP).In both direct multiple shooting and direct collocation, the state is minimized in addition to controlling inputs.Direct multiple shooting differs from direct collocation due to the way of the problem formulation.In multiple shooting, the problem is quantized into N subintervals, i.e., receding horizon length.In direct collocation, however, those subintervals are further described by a set of polynomials such as B-spline or Lagrangian; this will increase the problem sparsity.On the contrary, the number of optimization parameters to be optimized has dramatically increased in direct collocation compared to multiple shooting.This, collocation is better when it is accuracy-wise, but direct multiple shooting is better when it is performance-wise.In [62], the trajectory tracking problem is formulated based on direct collocation and multiple shooting.Furthermore, the researchers have proven that multiple shooting has a lower computational footprint compared to direct collocation.

Disturbance Estimation
In the context of optimal trajectory planning, simultaneously computing optimal control policy, which is required to respond to unknown, sudden disturbances, and handling kinematics (i.e., obstacle avoidance) as well as dynamics (i.e., satisfying velocity and acceleration constraints) yields a challenging problem, especially for quadrotors.While geometry-based path planning techniques [95,96] ensure the asymptotical optimality of a path, they however do not consider quadrotor dynamics.However, it is essential that the generation of an optimal control policy ensures dynamic feasibility.Thus, in [97,98], LQR was incorporated into path planning, by which both dynamic feasibility and local optimality were guaranteed.However, local optimality does not necessarily yield global optimality [99].In [26,32], a set of motion primitives was used to find feasible trajectories ensuring both global and local optimality.When dealing with unknown disturbances, MPC is a more robust technique than LQR.In [32], MPC-based trajectory planning approach was proposed, ensuring both the local and global optimality.However, none of the aforementioned approaches formally guarantees stability and safety.Lyapunov's analysis can be applied to confirm the local stability.Moreover, the terminal constraints set [100] can be incorporated.However, those measures are time consuming, which directly affects the realtime performance [101].A set of CBFs was proposed for improving real-time performance without affecting the system stability in [102][103][104].Recently, reference governors-based techniques were proposed in [105,106], enforcing safety constraints.It is natural that designing a path planer is followed by the actual controller to maneuver MAV.In such approaches, a reference governor can be used to handle the stability and constraint satisfaction separately to ensure system stability [107].
The above approaches are employed to estimate optimal control policy for safe navigation while imposing stability either using Lyapunov functions or reference governors.On the other hand, Li et al. [108] proposed to obtain an optimal control policy using a state-dependent distance metric (SDDM).They have modeled the system dynamics as a linear, time-invariant as follows: where u indicates the control input.The system state, i.e., x := (p(t), y(t)), consists of two parts: p and y, where p(t) denotes the quadrotor position at a given time t and y(t) describes the higher-order terms, e.g., velocity, acceleration, etc.In the latter work, the quadratic norm was utilized to represent the error between robot position and close-in obstacles positions.The quadratic norm is defined as p R := p T Rp, where R is a symmetric positive matrix.R[ψ z ] is fully determined by the MAV heading direction ψ z at a given time instance as follows: where both c 1 and c 2 are predefined scales such that c 2 > c 1 > 0; this process is called the SDDM, trajectory will be bounded incorporating SDDM information.Since quadrotor dynamics are linear, a reference governor [106] is introduced to maintain safety and stability.

Solving the Trajectory Planning Problem
As explained in the preceding sections, several constraints (e.g., soft and hard) are imposed to ensure dynamic feasibility, smooth navigation, handling disturbances, etc.Hence, optimal trajectory planning is posed as a constraint optimization problem in most situations.Constraint-based optimization problems are solved in two different ways: adding hard constraints or introducing soft constraints.In general, a constraint-based optimization problem can be formulated as a quadruple, i.e., P constraint = (c, g1, g2, J), where c stands for performance index or cost function, whereas equality and inequality constraints are given by g1 and g2, respectively.The objective function is given by J.In hard-constraint-based formulation, the optimal solution, i.e., w, for P constraint is calculated, ensuring all the constraints.In soft constraints formulation, the objective function does not need to satisfy all the constraints, but satisfying those constraints will improve the final w.D. Mellinger and V. Kumar [53] took the lead in proposing a successful approach for trajectory generation as a hard-constraint-based optimization approach, i.e., minimum-snap.Subsequently, in [35], the researchers extended the minimum-snap trajectory generation as an unconstrained or soft-constraint-based optimization problem.
When generating trajectories, ensuring a collision-free path is essential.Hence, representing free space in a structured way and imposing obstacle constraints for trajectory generation is a must for safety.Free space can be represented in different ways, such as cubes [26,74], spheres [82,118], and polyhedrons [32].The intuition of these approaches is to apply path planning through the free space to obtain the intermediate waypoints.Once intermediate waypoints are extracted, the trajectory generation procedure is utilized for retrieving a smooth, feasible, and collision-free trajectory.On the other hand, in [21,25,26], kinodynamic path planning followed by B-spline-based trajectory generation is considered.Most of the works that were proposed for soft-constraint-based trajectory generation formulated optimal trajectory planning as nonlinear optimization problems in which smoothness and safety were introduced as soft constraints.Most of the time gradient-descent-based [50] or gradient-free approaches [37,54] were applied for minimizing the cost of smoothness and safety.
The constraints optimization problem can be designed in either QP or NLP form.In QP, the procedure is to minimize or maximize the objective subject to a set of linear constraints in most situations.On the other hand, nonquadratic programming is used to handle the nonlinear constraints each of which has a unique nature to solve the problem.In general, the QP objective can be described as: where Ax b stands for the set of linear inequalities and Q is a positive symmetric matrix.There are various ways to solve QP, including interior point, active set, and gradient projection.In some situations, multiple variables that are to be optimized are integer values; those are solved as MIQP.For example, FASTER [119] used MIQP for safe trajectory planning with aggressive controls [36].Most of the recent optimal trajectory planning techniques [19,29,30,37] were formulated as gradient-based trajectory optimization (GTO) in which optimization problem was designed as a nonlinear form.The gradient descent is performed with respect to each parametrization index of Γ to minimize the difference, i.e., Γ i+1 − Γ i .Hence, Γ i+1 can be determined by solving the following optimization problem, as given in [50,120].
where M is a weighting matrix and η is a regularization parameter.GTO is rather popular due to its ability to deform ineffability trajectory segments, low memory requirement, and high throughput.Despite having the listed advantages, GTO cannot avoid the problem of a local minimum.STOMP [54] is one of the early techniques proposed to address the local minimum problem.STOMP is based on the gradient-free technique.However, STOMP is unable to obtain real-time performance.Besides STOMP, the local minimum problem has been addressed by various recent works.However, this remains an open problem to be solved.Zhou [121] proposed a method, i.e., path-guided optimization (PGO), for overcoming local minima problems by generating topologically distinct paths and doing parallel optimization.Furthermore, various solvers can be utilized for solving optimization problems, including BOBYQA [122], L-BFGS [8,123], ACADO [124], SLSQP [125], Proximal Operator Graph Solver (POGS) [126,127], sequential quadratic programming (SQP), and MMA [128].Shravan et al. [65] proposed a trajectory optimization technique in a distributed setup in which the researchers evaluated their formulation with several solvers.According to their observations, BOBYQA is faster compared to BFGS and SLSQP, while MMA yielded a similar performance to that of BOBYQA.In [129], L-BFGS was proposed for finding the shortest path in real-time; in this research effort, however, L-BFGS does not guarantee optimality, and only feasibility is enforced.Mathematical program with complementarity constraints (MPCC) [130] is yet another proposed method for fast trajectory optimization in real-time.Moreover, Mathieu and Nicolas [131] proposed a SQP-based trajectory generation approach for carrying augmented loads.The intuition behind selecting SQP over other solvers is due to its superlinear convergency and ability to handle nonlinear constraints within milliseconds.

Conclusions
All in all, we have thoroughly reviewed the trajectory planning problem in the paradigm of plan-based control for multirotor aerial vehicles (MAVs).Such a trajectory planning problem was broken down into a set of subproblems: free-space segmentation, motion model selection, initial waypoints identification, initial trajectory generation, continuous trajectory refinement, and receding horizon trajectory planning.Afterwards, for each subproblem, we examined how previous research has addressed those by presenting and evaluating various approaches to the considered subproblem.Finally, several selected recent approaches were listed (Table 3) according to the listed subproblems we have identified.Furthermore, Table 4 summarizes the key findings of the study, including features and performance.With that, we concluded that the trajectory planning problem can be designed by addressing those subproblems carefully for MAVs.A sums-of-squares (SOS) programming approach that ensures the entire piece-wise-polynomial trajectory is collision-free using convex constraints.
Computation time (≈60 ms), limited maneuvering capabilities A trajectory tracker [62] Nonlinear model predictive control (NMPC) with multiple shooting is used to predict the optimal control policy at each iteration.Computation time (≈60 ms), limited maneuvering capabilities Residual dynamics [147] A learning-based technique using Sparse Gaussian Process Regression is proposed to reduce the residual dynamics between high-level planning and low-level controlling Computation time (≈20 ms), extreme maneuvering capabilities A replanner [8] A continuous optimization-based method for refining the reference trajectory to move it out of obstacle-occupied space in the global phase.
Computation time (≈15 ms), extreme maneuvering capabilities Our hypothesis was to investigate trajectory planning for multirotor aerial vehicles (MAVs) in the plan-based control paradigm, focusing on analytical approaches rather than fully learnable approaches, such as machine learning and deep learning.In future work, we plan to investigate other types of trajectory planning approaches, such as those based on imitation learning and inverse reinforcement learning.We also plan to conduct simulated and real-world experiments to compare the performance of the considered approaches in various conditions, such as high-dense and less-dense environments, and with static and dynamic obstacles.These considerations were not included in this work, as we focused on the theoretical aspects of trajectory planning.Apices (q) stipulates the qth derivative, for example z (q) C Configuration space that can be one of these: The optimal estimation for states and/or controls after minimizing given cost function g Equality constraints are denoted by g 1 (w), whereas inequality constraints are denoted by g 2 (w)

Figure 1 .
Figure 1.An overview of the plan-based control paradigm in the context of trajectory planning problem formulation.There are various ways to formulate the trajectory planning problem, each of which consists of a set of submodules (green color boxes) depending on the problem behavior.

Figure 3 .
Figure 3. Showing the B-spline convex-hull property.Convex hull, which comprises consecutive control points, e.g., p i , p i+1 , p i+2 and p i+3 , always belongs to obstacle-free space if the preceding control points satisfy the inequality (33).4.7.2.Continuity B-spline-based trajectory generation offers several advantages over piece-wise-based trajectory generation [35,37].In piece-wise-based trajectory generation, the boundary constraints must be explicitly satisfied to ensure continuity.The smoothness of the trajectory depends solely on how the control points are formed.In contrast, B-spline-based trajectory generation can neglect boundary constraints because the entire trajectory can be treated as one segment.Moreover, as explained in Section 4.7.1,B-spline-based trajectories can be controlled locally without affecting the rest of the trajectory.

Figure 4 .
Figure 4. Trajectory generation using uniform B-spline.The smoothness of the curve is dependent on the degree of the B-spline.The trajectory passes precisely through the given control points at the degree equal to 1, as depicted in light blue color.The smoothness of the trajectory increases with the order of the B-spline.

Figure 5 .
Figure 5. Generated trajectories using three different approaches for a given sequence of control points and knot vector (36).

Figure 6 .
Figure 6.Changes of position, velocity, acceleration, jerk, and snap profiles over time for the provided control points sequence and knot vector (36).
Let C = [c m 1 , c m 2 , . . .] be a set of consecutive grids within the OctoMap and corresponding free space regions beC f ree = [c f 1 , c f 2 , . . .].Both c mi and c f i were defined as cubes, each of which is described by:

Figure 7 .
Figure 7. Free space extraction using SFC.Once intermediate initial waypoints are defined, SFC calculates free space along the path, which is constructed from the initial waypoints.

Figure 8 .
Figure 8.The basic idea of MPC-based receding horizon planning.MPC-based receding horizon planning predicts the optimal control policy u * at each iteration to minimize the given cost function J.
C f ree , C obs , C unknown , and C unknown d Order of polynomial Γ Initial trajectory; the optimal trajectory is defined as Γ * , trajectory derivatives are defined as Γ and Γ, and trajectory is a function of time, i.e., Γ T (t) ξ Regularization parameter c Formulation of cost function, where c(•), • denotes the inputs A H representation of polytope, i.e,A T p = b w

Table 2 .
Comparison of the properties of receding horizon trajectory planning techniques.Checks and cross marks indicate whether a feature is available or not.

Table 3 .
The basic building blocks that are encountered in trajectory planning problems as described in the paper.

Table 4 .
[25]comparison of contrasts features and performance of selected different approaches for trajectory planning.PGO) approach to address infeasible local minima problems, not limited to a specific use Computation time (≈15 ms), perform aggressive maneuvers A replanner[25]A dynamically feasible time parameterized trajectory generation to overcome the limitation of the greedy search, not limited to a specific use Computation time (≈30 ms), limited maneuvering capabilities A replanner [101] Searches for smooth, minimum-time trajectories using a set of short-duration motion primitives, not limited to a specific use Computation time (≈15 ms), limited maneuvering capabilities Teach-repeat-replan [14] Generate safe local trajectories (smooth, safe, and kinodynamically feasible) to avoid moving obstacles, infrastructure inspection, aerial transportation, and search-and-rescue Computation time (≈15 ms), perform aggressive maneuvers Fast planner [19] Kinodynamic feasible and minimum-time trajectory generation in the discretized control space, not limited to a specific use Computation time (≈5 ms), perform aggressive maneuvers EGO-Planner [134] A Euclidean signed distance field (ESDF)-free gradient-based planner, not limited to a specific use Computation time (≈2 ms), extreme maneuvering capabilities ILQR [91] LQR smoothing to compute a locally optimal feedback control policy, can work with nonlinear dynamics and nonquadratic cost, limited to a specific use Computation time (≈3 s), limited maneuvering capabilities Monocular visual-inertial fusion [28] A monocular visual-inertial navigation system (VINS), consisting only an inertial measurement unit (IMU) and a camera.VINS supports self-extrinsic calibration Computation time (≈30 ms), limited maneuvering capabilities Safe flight corridors (SFC) [32] The SFC is a set of overlapping convex polyhedra that represent free space and provide a connected path for the robot to reach its goal.Computation time (≈100 ms), extreme maneuvering capabilities SDDM [108] A control policy for MAVs systems that uses ellipsoidal trajectory bounds defined by a quadratic state-dependent distance metric.SDDM behavior is adapted to the geometry of the local environment.Computation time (≈100 ms), limited maneuvering capabilities FASTER [119] FASTER guarantees safety without compromising speed by having a safe backup trajectory, and MIQP is used to allocate trajectory intervals Computation time (≈15 ms), extreme maneuvering capabilities MADER [141] MADER uses the MINVO basis to generate trajectories through free space more effectively than Bernstein or B-Spline bases in obstacle-dense environments.
State vector and its derivative is denoted as ẋ.Term x + depicts the next state given the current state x, and term x k denotes discrete state at time t equals k u Control input.The term u * denoted as the optimal control inputs p Position (m) in R 3 and its derivative is denoted as ṗ.p * , * ∈ x, y, z, stands for position alone * component p dth-order polynomial, which is a function of time.Term ṗ(t), p(t) (or p(t) (1) , p(t) (2) ) denote the higher order derivatives of p(t) λ Polynomial coefficients, e.g., p(t) = λ d t d + ...+ λ 1 t + λ 0 , t ∈ [0, dt], where d is the order of the polynomial v Velocity (m/s) in R 3 and its derivative is denoted as v. v * , * ∈ x, y, z, stands for velocity alone * component ω Angular velocity (rad/s) in R 3 and its derivative is denoted as ω ω ω ψ Orientation is represented as quaternion in R 3 and its derivative is denoted as ψ ψ ψ. ψ ψ ψ * , * ∈ x, y, z, stands for orientation alone * component f= [ f 1 , f 2 , f 3 , f 4 ] TSystem input or total trust that is applied for each of the motors in N (Newton) f d