Vision Object-Oriented Augmented Sampling-Based Autonomous Navigation for Micro Aerial Vehicles

: Autonomous navigation of micro aerial vehicles in unknown environments not only requires exploring their time-varying surroundings, but also ensuring the complete safety of ﬂights at all times. The current research addresses estimation of the potential exploration value neglect of safety issues, especially in situations with a cluttered environment and no prior knowledge. To address this issue, we propose a vision object-oriented autonomous navigation method for environment exploration, which develops a B-spline function-based local trajectory re-planning algorithm by extracting spatial-structure information and selecting temporary target points. The proposed method is evaluated in a variety of cluttered environments, such as forests, building areas, and mines. The experimental results show that the proposed autonomous navigation system can effectively complete the global trajectory, during which an appropriate safe distance could always be maintained from multiple obstacles in the environment.


Introduction
Due to their good flexibility, strong maneuverability, easy operation, and unique viewpoint, micro aerial vehicles (MAVs) are widely used in aerial photography, searchand-rescue missions [1], delivery of goods, and mine exploration [2], etc. The autonomous navigation systems of drones can replace humans to complete some tasks that are dangerous or impractical for humans. The research on autonomous navigation systems can effectively reduce the complexity of tasks and improve the efficiency of their execution. It has attracted widespread attention in recent years. For example, in search-and-rescue missions, an autonomous navigation system can autonomously control the MAV to avoid obstacles in real time and provide image information with a unique perspective for rescue operations.
Typically, an autonomous navigation system consists of four parts: perception, mapping, planning and control, in which the planning provides a real-time, safe, collision-free trajectory for MAVs that meets dynamic constraints. At the present, the research of planning based on target searches in unknown environments mainly deals with two problems: the first is obstacle avoidance, while the second is solving the problem of getting stuck at local minima. This poses a special problem in unexplored or partially unexplored environments, where only locally optimal or reactive planners will frequently fail to find a path [3]. Different from the known environment, an autonomous navigation system in partially known or unknown environments will model the environment through sensors step by step. Since the environmental information is gradually obtained, there is a high risk of possible collision in the trajectory at the current and next moments. Therefore, the planning policy should not only gradually complete the task through exploration, but also have the capability of guaranteeing the safety of the MAV in unknown surroundings.
However, current algorithms cannot fully meet this requirement. The proposed scheme simplifies the environment into a model with sparse and evenly distributed obstacles and makes optimistic assumptions about it [4]. This method cannot make immediate adjustments in the presence of large, irregular obstacles, although it is indispensable in a real environment. The author proposes that the trajectory follows obstacles and always maintains a safety range [5] when MAVs face complex environments. Although this method guarantees sufficient safety, planning is restricted. It is not only unfavorable to the execution of the task, but also makes planning get stuck at local minima, and can even fail to find a path. Another type of approach does not focus on obstacles, but ensures safety from the perspective of local re-planning. Some scholars combine the strategy of exploration and re-planning, which has shown good results. However, the global trajectory is mechanically spliced between the previous and next trajectories, and so lacks flexibility. This leads to discontinuities in the trajectory direction, which greatly increases the possibility of drone collisions.
Based on the rich information from the vision sensor, in this work we propose a model for extracting spatial-structural information, which can guide drones well to maintain a suitable safe distance with obstacles in space. With it, we develop an exploration strategy with a B-spline-based local trajectory optimization by selecting intermediate goal points to ensure a sufficiently smooth global trajectory. Additionally, we quantitatively compare our autonomous navigation system with the Ewok system [6] by simulating a variety of different environments for the system to operate in, including three specific scenarios: a dense forest, building and mine. Experiments show that our planning algorithm can not only achieve a better connection between the two trajectories, but also maintain a high success rate in a variety of complex environments.

Related Work
At the present, there are many studies on path planning strategies in unknown environments, which can be divided into three categories: motion primitive-based approaches,trajectory optimization approaches and sampling-based approaches.
Motion primitive-based approaches discretize the state space of the drone into a variety of motion primitives and then search for the corresponding operation in the motion primitive library according to the current state of the drone. The approach makes an immediate decision based on the sensor data without building a persistent map [7][8][9]. This method has the advantages of a fast response and a good effect on moving objects, but it can only run in a scenario with low obstacle density, and it easy for it to get stuck in the local optimal position in a complex obstacle environment.
Trajectory optimization approaches rely on a minimized objective function to achieve a collision-free and smooth trajectory through multiple iterations of optimization. CHOP [10] and CTTOP [11] both use a two-part objective function with a smoothness and collision cost. The difference is that the former uses a series of discrete points to express the trajectory and the latter uses polynomial segments instead. A more recent advent in trajectory optimization expresses the trajectory by B-spline and performs gradient descent with positions of discrete waypoints as parameters [6]. A trajectory optimization-based planner requires an initial trajectory. If the initial trajectory passes through a large obstacle, the ability of this optimizer will be very limited, and the trajectory cannot be guaranteed to be collision-free. Therefore, it is not suitable for large-scale path planning.
The closest to our proposed approach are search-based approaches. Traditional methods, such as RRT [12] and RRBT [13], obtain samples from graphs by random sampling, collision detection, expansion of the RRT tree, and repeated iterations. Finally, a nonsmooth path from the start point to the end point is constructed. This method initially needs to search in the global map, so an a priori map is necessary. In order to adapt to the unknown environment, scholars have proposed a strategy of exploration, which does not focus on collision avoidance, but tries to discover the unexplored space. The proposed receding horizon Next-Best-View Planner (NBVP) was proposed by Bircher in [14], which counted the number of unknown occupied cells in the field of view of the sensor as the potential exploration value of this sample point and established the RRT tree. Each time, the MAV only executed the branch that could obtain the highest profit. Anna [15] uses the storage characteristics of an Octomap to extract the frontier instead of the complex calculation of the traditional method and incorporates the idea of exploration gain based on the sampling method, which achieved a good effect in the exploration map. Another work saves the explored historical information of NBVP so as to achieve a high sampling efficiency [16]. In an unknown environment, due to the limited perception range of the sensor, the contribution of RRT to planning is limited, requiring a lot of calculations to maintain the RRT tree. Oleynikova et al. [3] extend the method of NBVP and replace the method of RRT tree with the strategy of intermediate goal, by proposing a local trajectory optimization method represented trajectory based on a polynomial. Although the exploration strategy of intermediate goal is very effective in guiding drones to explore in unknown environments, the local plan optimizer expressing the trajectory in a polynomial obtained the global trajectory with insufficient smoothness and many inflection points, especially in the connection point between two segments trajectories. The proposed algorithm in this paper extends the exploration strategy of intermediate goal, but replaces random sampling by extracting samples from the special spatial-structure information; in particular, the proposed algorithm expresses the trajectory based on B-spline that has better locality for connecting two segments trajectories and achieves sufficient smoothness of the global trajectory.

System Framework
In order to make the autonomous navigation system adapt to different application scenarios, we assume a task scenario in which a drone needs to explore from the source point (x s , y s , z s ) to the goal point (x g , y g , z g ). There is a space V ∈ R 3 without prior information, where the obstacles are densely distributed and unevenly distributed. The drone is equipped with an autonomous navigation system and vision sensor. The vision sensor can obtain an RGB image and depth image in the field of view of the drone at the current position, but the effective range of depth information is limited, as shown in Figure 1. The autonomous navigation system is mainly composed of four parts, perception, mapping, planning and control; in this work, we focus on the part of planning. Since there is no prior environmental information, MAVs need to continuously model environmental information through vision sensors, called perception. Vision-based autonomous navigation systems generally obtain visual information from a vision camera that can be an RGB-D camera, monocular or stereo. Visual simultaneous localization and mapping (V-SLAM) technology is often used to provide the drone's pose and map, which is mainly divided into two parts: localization and mapping. Since our research is mainly focused on the planning part, we provide drone poses through high-precision IMU and obtain the grid map through ray-cast operations [6,17], which is a typical mapping algorithm. At the same time, we propose a special spatial-structure information extraction model, which will input the two-dimensional image and output a special spatial-structure information ξ. Planning plans a collision-free path and satisfies a series of constraints trajectory for MAV. In this part, we have improved the sampling method to enhance the applicability of the autonomous navigation system in the actual environment. The intermediate goal selected by our planning algorithm is more instructive and safer. Local planning optimizer can adjust flexibly in real-time, when the drone faces unmodeled, large and irregular obstacles. Control receives control information from planning to guide drone flight, as shown in Figure 2. Autonomous navigation system. The spatial-structure extraction model obtains image data from the visual-sensor, and sends it to goal-find planning. Goal-find planning quantifies the exploration and safety values of each viewpoint, selects the best inter-goal and detects collision. Then, local planning optimizer relies on inter-goal guidance to plan the trajectory.

Methods
Our path planning method is an extension of intermediate goal strategy [3], which belongs to the sampling method. Intermediate goal strategy compares a large number of randomly sampled points to obtain a position with a greater exploration value. However, the sample points obtained by random sampling contain limited spatial information, which cannot well express the relationship between the drone and the objects in space. It is difficult for such an intermediate goal point to ensure the safety of the drone in an unknown environment. To deal with these problems, we propose a spatial-structure extraction model, which will process the two-dimensional image and output special spatialstructure information. The special information can be expressed as a ray or a viewpoint in R 3 , which are limited in number and contain information about the distribution of objects in space. Then, we improve the original value evaluation function by adding a safety evaluation factor, balance the value of exploration and safety with a weight factor, and adjust the number of candidate intermediate goal points in an environment-adaptive manner. Last but not least, we propose setting the intermediate goal for the initialization parameter of the local planning optimizer, which expresses the trajectory in B-spline. In order to facilitate the evaluation of the value of exploration and safety, we use the circular buffer, a kind of occupancy grid map, to store the environmental information obtained by the sensor, as shown in Figure 3.

Map Representation
Building a grid map is a very common strategy used to facilitate collision detection in planning. Environmental information is projected onto the grid map through the sensor. Objects in space are represented in the grid in an occupied state. One grid also includes occupied and unknown states. Usenko compares a kind of 3D circular buffer [6] and the Octomap [18]. The three-dimensional Circular Buffer is a kind of grid map that defines a variable o f f set and a continuous array with a size of 2 N. The o f f set is given by the drone position, and array stores the state of each voxel in the corresponding space with the o f f set as the center. The circular buffer maintains a higher rate and higher quality for point cloud insertion and occupancy information query, which can speed up the estimation of exploration value and safety value. So, we set the three-dimensional circular buffer as a storage method for local environment information.
where O f f set(o x , o y , o z ) is constantly updated by the drone position. In particular, (n x , n y , n z ) is the index in the Array corresponding to the spatial coordinate (x, y, z). We proposed a spatial-structure information extraction model, in which this special information expresses a direction relationship between the position of the MAV and the geometric center of the object or hole in MAV's field of view. We define ξ as the special structure information, which represents multiple rays from the position of the drone to its viewpoints. The direction of the ray is the geometric center of one object or one hole formed by multiple objects.

Spatial-Structure Information Extraction
The ray-cast method [6] can efficiently insert the point cloud data of the visual sensor into the circular buffer and update the occupancy probability of each voxel. However, the detection distance range of the vision sensor is limited, so only the point cloud information within the detection range is beneficial to the circular buffer. We design a spatial-structure information extraction model based on image segmentation technology to extract environmental information. The method can extract information outside of the valid depth range of the visual sensor from the image. It is not a traditional one-to-one mapping of voxels in space and does not require the support of depth information of vision sensor. Instead, it uses the overall characteristic of a single object in space presented in the image data to map the spatial structure of the object from the outline.
The spatial-structure extraction algorithm is divided into three steps: cutting image, extracting center-point, and converting reference system. Cutting image is the core of the spatial-structure information extraction algorithm. We define a single image Fig, and a single Block, which represents a collection of pixels with the same characteristic in the image. We segmented the image based on the graph-based image segmentation algorithm [19], which abstracts the pixels in the image into vertices and edges in the graph theory, and divides the graph into multiple regions C by the clustering method. The algorithm gives three parameters, Sigma, k, and Min to, respectively, control the size, color sensitivity and minimum block of the segmented image.
After cutting, Block corresponds to all the pixels contained in the regional C in the graph, which incorporates different objects or gaps formed by the extrusion of multiple objects. In the extracting center-point part, we calculated the smallest circumscribed circle of each Block and extracted the center points (u, v). Since (u, v) is only extracted from the RGB image, the depth information is lost. In order to facilitate the calculation, we fixed z c = C as a constant, which is more appropriate to use the effective detection range of the vision sensor. One 3D viewpoint view(x, y, z) will be obtained in the drone world reference by converting the reference system. The conversion is shown in Equations (6) and (7).
where cx, cy, f x, and f y are four camera internal parameters and T is the transform matrix. From above, we obtain some viewpoints view(x, y, z = c) with spatial-structure information from the drone's field of view. It is used to represent a ray ξ, which starts from Dp and points to the view that is the geometrical center of an object or an gap, as shown in Equation (8). The spatial-structure information extraction is shown in Algorithm 1.
where Dp is the drone's position, and view is the viewpoint obtained from cutting fig. The number of viewpoints is n.

Intermediate Goal
We combined the special spatial-structure information ξ to improve the intermediate goal strategy [3]. Since the special spatial-structure information always points to the geometric center of the object or the gap, it can find an optimal intermediate goal point very effectively and control the number of candidate seeds through the parameters of the spatial-structure information extraction model.
The planning algorithm [3,14,20] sets the potential exploration value as a cost function in order to escape from the local optimal solution when selecting the next intermediate goal points. After analysis, the method that simply sets the exploration value as an indicator can easily lead drones into dangerous scenarios. Combining the proposed spatial-structure information ξ, we divide the cost function Dv into two parts. One part is the exploration obtained by counting the number of grids in an unknown state in the circular buffer map. The other part is safety obtained by counting the number of grids in an unoccupied state in the circular buffer map. We balance the two indicators of exploration and safety by different weight parameters.
D v = λ s sa(Dp, ξ) + λ e ex(Dp, ξ) where λ s and λ e are the weighting parameter to balance the two values of exploration and safety in the cost function Dv. In particular, Dp is the drone's position, while ξ represents the special spatial-structure information. Vision sensors have a fixed view field FOV, which limits the observation range of a single frame of image in the horizontal plane. The strategy of intermediate goal just obtains the candidate seeds by the camera's view field at a certain range. The adaptive control of the number of viewpoints expands the drone's view field. The method is used to adaptively control the number of viewpoints through the environment and store them in the inter-goal buffer. The intermediate goal algorithm is shown in Algorithm 2.
We set a threshold for the cost function Dv. Viewpoints that exceeded the threshold were all saved in the inter-goal buffer, while the others were discarded. This threshold is dynamically updated in each iteration. The advantage of self-adaptation is that the number of viewpoints can be adjusted in real time according to the environmental conditions, and the Dv value of viewpoint can be kept stable.
where DvBu f f er is the set of Dv values for each viewpoint, and num is the number of viewpoints expected to be retained. α and β are two respective weight parameters. The larger the α, the more stable the number of viewpoints and the less flexibility.

Flight Status Switch
We obtained the intermediate goal ξ by cost function, which contains a specific direction from the drone position in R 3 . In order to provide a three-dimensional point for local planning to optimize the local trajectory, we added a radius r to ξ and set three types of radius lengths to divide the flight state into three types. The longer the radius r, the smaller the number of intermediate goals. However it is difficult to find a feasible solution in the environment with dense obstacles. As a result, it is easy for planning to fall into a local optimal solution. On the other side, the shorter the radius r, the more intermediate goals, and the more time it takes to iterate an intermediate goal. However, in the gradual short-distance exploration, more feasible solutions can be searched to ensure that the MAV reaches the final goal.
The first flight state is called the Forward state, which is designed to pass through the scene with sparse obstacles. The flight step length r is set to be as long as possible to avoid the complexity caused by frequent switching of intermediate goal. The second state is called the exploratory state, designed to pass through dense obstacles, in which the planning algorithm will frequently fail to find a path. The step length r is set to be shorter in order to explore the environment and eliminate visual blind spots step by step. The third state is designed to control the direction of the MAV's view field. The flight state switch algorithm is shown in Algorithm 3.
In order to avoid detours and ensure the MAV advances to the goal, we set the state to keep the MAV's yaw angle as far as possible to the goal direction and propose a simplified method to estimate the deviation between the intermediate goal and the goal in the yaw angle and pitch angle directions, as shown in Figure 4. The reward function Lv is used and defined as follows.
Lv = wYaw·y vsg + wPich·p vsg (14) where y vsg is the angle formed by the intermediate goal point ξ r = (x ξ , y ξ , z ξ ), the drone position point Dp(x s , y s , z s ), and the goal point goal(x g , y g , z g ) on the horizontal plane. p vsg is the angle formed by ξ r , Dp and goal on the vertical plane. wYaw and wPich are weights parameters.

Local Planning Optimization
The local planning optimization algorithm takes the global path as the initial trajectory input [20,21] or initializes a straight line from the starting point to the goal point as the input [3,6,11,14]. Then, it penalizes the deviation from the initial trajectory in the cost function to ensure that the local path planning is around the initial trajectory. Although the method greatly ensures that the local trajectory follows the initial trajectory, it also limits the optimization range of the local trajectory, lacks flexibility, and cannot effectively deal with the emergency goal point change. We use b-spline to express the trajectory, other than the traditional mode of taking initialization trajectory as input. The intermediate goal point is directly used as a part of the objective function of the local planning. We divided the penalty function of trajectory optimization into three parts: the first part is the intermediate goal cost function that penalizes the trajectory deviating from the intermediate goal, the second part is the collision cost, and the third part is the trajectory smooth cost function.
where p(t) is the drone position at time t. ξ r is the intermediate goal point intercepted from the ray of radius r. d(p(t)) is the distance function from the drone position to the obstacle, which is directly obtained by ESDF. Es is the penalty integral over square derivatives of the trajectory to smooth the trajectory. It set an exponential function to limit the maximum speed and maximum acceleration, which was proposed by Usenko in [6]. sgn(x) is a symbolic function. λ g , λ c and λ s are three weight parameters.

end if end switch end while
The papers [22,23] found that B-spline has sufficient local characteristics. When the local control point changes, the new trajectory can be expressed as q(t).
The new curve q(t) is the sum of the original curve p(t) and a translation vector vN i,k (t),which is non-zero on the interval [t i , t i+k+1 ). Hence, moving a control point only affects the shape of a section of the given curve. It can be seen from Equation (22) that when the intermediate goal points are alternated or urgently replaced, only the optimized local trajectory in the trajectory will change, and the optimized local trajectory will be retained. The flight control of the drone is not affected. It not only improves the flexibility of path planning, but also greatly guarantees the safety of MAV flight.

Experimental Results
In this section, we present experimental results obtained using the proposed approach. We combined the drone simulation platform Rotors simulator [24] and the ROS software platform to build a drone autonomous navigation system simulation platform. We provided six kinds of simulation maps to test the performance of the planning algorithm and autonomous navigation system. The Rotors simulator provides a simulated RGB-D camera, which provides RGB and depth images similar to the real camera at 20 FPS, and the effective range of the depth image is set to 4.8 m, which is based on a general RGB-D camera as a reference standard. ROS is a well-known distributed operating system, which provides task scheduling for tasks in the form of nodes. The spatial-information extraction model we proposed is encapsulated as a Graph Cut node in the ROS system and runs at a frequency of 4 Hz, which continuously provides seeds for the intermediate goal. Planning is also packaged in the form of an Intermediate Goal Planning node in ROS and runs at a frequency of 2 Hz. Additionally, dt is set to extract control points at 0.5 s for optimization. Six simulation maps include three typical application scenarios: forests, urban building areas, and underground mines. Some holes are designed in the urban building area to test the ability of planning to penetrate and avoid obstacles.

Spatial-Structure Information Extraction
In the experiment, we set parameters sigma = 0.5, k = 2000, Min = 2000 to the graph-based image segmentation algorithm and checked the effect of spatial-structure information extraction method. Some of these images are obtained from the software Gazebo, simulating the actual environment, and the other part is obtained from the real environment. The image's resolution is 640 × 480. The result is shown in Figure 5. It can be seen that the center point extracted by the image in the simulation environment is very effective, and it can basically match the center of the object and the gap in the image. Due to the complex environment, the cut block became irregular, but different objects and gap centers can still be extracted. This verifies the applicability of our model to the actual environment.

Across the Gap
In order to estimate the performance of the spatial structure extraction model, we used a pavilion with four holes as a test scene. The goal position was set at (6.0, 0.0, 1.0), and the drone started from a distance of about 6 m from it. The directions are 0°, 15°, 30°, and 45°. The result is shown in Figure 6. It shows that the first intermediate goal is directly positioned to the geometric center of the hole at 0°, 15°and 30°. Then, the drone is guided to safely fly to the goal point. At the 45°position, the drone did not find the best intermediate goal under the initial search, but the geometric center of the hole was still located during the flight, so as to guide the drone to reach the goal position safely.

Local Planning Optimization
We used a variety of simulation scenarios to test the performance of our local planning optimizer. Part of the trajectory is shown in Figure 7. In the process of testing, the current intermediate goal is blocked by obstacles, which negatively affects local planning, but a new intermediate goal is selected quickly to guide local planning. Trajectory avoids obstacles smoothly and the original path will not be affected. Compared with the polynomial trajectory expression proposed in [3], the time cost and smoothness are greatly improved. The smooth trajectory can greatly ensure the safety of MAV flight and effectively shorten the flight trajectory.

System Simulation
In order to evaluate different aspects of our autonomous navigation system, we compare our system and Ewok [6] in different scenarios, and record the time cost, distance cost, and success ratio, respectively. Both our method and Ewok take the path planning of a fixed source point and goal point as the basic task.   The results show that Ewok reached the goal with a 100% success rate, its trajectory maintained smoothness and was collision-free, and it performed perfectly in terms of time and distance in map No.1. However, EWOK's success rate is greatly reduced when faced with large and irregular obstacles in maps No.2, No.3 and No.4. In its successful cases, the MAV frequently approaches the obstacle surface during the flight, leading to difficulty in ensuring the safety of drones. On the contrary, our algorithm can still maintain a high success rate in complex scenarios, while maintaining a suitable distance between multiple obstacles, which greatly guarantees the safety of the drone.
In maps No.5, No.6 and No.7, the MAV can only pass through the door to reach the goal due to the occlusion of the wall. EWOK cannot obtain a feasible path through re-planning optimization, but our intermediate goal strategy plays an important role. The position of the door can be found with the help of spatial-structure information and switching flight status. When the drone falls into a local optimal problem, our planning algorithm can quickly pull the drone out of the local optimal solution area. It can be seen that in a space with a more regular structure, the intermediate goal is more accurate, and the drone is safer.

Conclusions
This paper presents a set of vision-based autonomous navigation systems, which can still provide a collision-free and real-time trajectory in an environment with densely distributed obstacles, as well as with a previously unknown map. We use the basic idea of graph-based image segmentation to construct a spatial extraction model, combined with the exploration-inspired sampling method. We improved the optimization performance by adding the control point of B-Spline with an intermediate goal. The simulation results verified that our system can still show excellent performance in the face of complex and changeable scenarios in contrast to Ewok.
Author Contributions: Methodology, X.Z.; data collection, X.Z.; formal analysis, Z.Y. and X.Q.; writing-original draft preparation, X.Z., J.C. and Z.Y.; writing-review and editing, X.Z. and Z.Y.; supervision, J.C. and X.Q.; project administration Z.Y. All authors have read and agreed to the published version of the manuscript.