Optimal Frontier-Based Autonomous Exploration in Unstructured Environments Using an RGB-D Sensor

Aerial robots are widely used in search and rescue applications because of their small size and high maneuverability. However, designing an autonomous exploration algorithm remains a challenging and open task because of the limited payload and computing resources on board UAVs. This paper presents an autonomous exploration algorithm for aerial robots with several improvements for search and rescue tasks. First, an RGB-D sensor is used to perceive the environment, and OctoMap divides the environment into obstacle, free, and unknown spaces. Then, a clustering algorithm filters the frontiers extracted from the OctoMap, and an information-gain-based cost function selects the optimal frontier. Finally, a feasible path is given by an A* path planner and a safe corridor generation algorithm. The proposed algorithm has been tested and compared with baseline algorithms in three different environments at map resolutions of 0.2 m and 0.3 m. The experimental results show that the proposed algorithm yields a shorter exploration path and saves exploration time compared with the state of the art. The algorithm has also been validated in real flight experiments.


Introduction
During the last decade, research on Micro Aerial Vehicles (MAVs) for search and rescue tasks has considerably increased [1][2][3][4][5]. Because of their small size and high maneuverability, MAVs can perform search and rescue missions efficiently. Considering the payload of MAVs, only a limited set of sensors can be carried on-board. For this reason, choosing the type of sensors and building an efficient system for MAVs remains challenging when performing such autonomous missions.
In MAV applications, a large number of works have addressed autonomous exploration [6][7][8][9][10][11][12][13][14][15][16][17][18][19][20]. MAV-based autonomous exploration needs miniaturized sensors that can provide rich information to successfully fulfill the endeavor. In particular, LIDARs and cameras are the main sensors used by the research and industrial communities, as they provide high-dimensional information for autonomous exploration or inspection tasks. Compared with LIDARs, cameras such as RGB cameras, RGB-D cameras, and event cameras have a considerable advantage in price and size.
In this paper, an RGB-D camera that provides point clouds is used as the main sensor to perform the perception tasks. The information from the RGB-D camera is used by the mapping and planning modules to find a feasible path that gathers more environment information. OctoMap is used for mapping, and Information Gain (IG) is used to search for the optimal goal position and orientation for exploration. With the optimal goal, a safe corridor-based A* algorithm is used to plan a safe and feasible path. Finally, position control is applied to fly the path.
There are three main innovations in our work. First, an efficient strategy for generating frontier points, filtering them using k-means++, and finding the optimal frontier using IG information. Second, the combination of a safe corridor-based A* planning algorithm with a position controller to move to the optimal frontier. Finally, the algorithm is integrated with our aerial robot framework and tested on a custom edge-computing aerial platform.
The remainder of the paper is organized as follows. Section 2 presents the related works. The robot architecture and environment definition are described, and the problem is formulated, in Section 3. The proposed optimal frontier generation and motion planning algorithms are explained in Section 4. Section 5 introduces the algorithm complexity. The experimental results and discussion are presented in Sections 6 and 7, respectively. Finally, Section 8 concludes the paper and summarizes future research directions.

Related Works
A number of methods have been proposed during the last few decades to address the problem of autonomous exploration in unstructured environments. There are two fundamental types of approaches: frontier-based approaches and sampling-based approaches. Frontier-based approaches try to maximize the exploration efficiency by selecting and exploring the optimal frontiers between the known and unknown areas of a map, while sampling-based approaches randomly generate robot states and search for the path that maximizes the information gathered in the environment. There are also works [8,9] that combine the frontier-based and sampling-based approaches. In recent years, learning-based strategies [10,11,21], such as reinforcement learning, have also been used to solve the autonomous exploration problem. In this paper, classic autonomous navigation algorithms are mainly discussed. In the remainder of the section, the literature using the frontier-based approach is introduced first, then the works using sampling-based methods are explained, and finally the research using hybrid exploration strategies is summarized.

Frontier-Based Autonomous Exploration
The frontier-based exploration approach was first introduced by B. Yamauchi in 1997 [12]. Considering the limited computing resources on-board an MAV, S. Shen et al. [22] proposed an autonomous exploration algorithm using only the information of the known occupied space in the current map. After evaluating the information in regions determined by the evolution of a stochastic differential equation, the regions with the most significant particle expansion were correlated to unexplored space, and the MAV then started to explore these unknown regions. In 2013, C. Dornhege and A. Kleiner proposed a frontier-void-based autonomous exploration approach [23]. They combined the concepts of voids, which are unexplored volumes, and frontiers. Their approach was evaluated on a mobile platform with a five-DOF manipulator. More recently, S. Bai et al. [13] used Bayesian optimization to predict the mutual information (MI) of the frontiers and then trained a Gaussian process (GP) to estimate the MI in the robot's action space and choose the next exploration step. C. Wang et al. [14] proposed a multi-objective reward function that minimizes both the map entropy and the path cost during the exploration process; the frontier with the best reward is chosen as the goal, and finally a potential field is constructed to guide the robot to the goal. H. Umari et al. [15] presented local and global RRT-based frontier detection for candidate frontier generation, after which a function combining information gain and navigation cost is applied to select the optimal frontier. Because of the local and global RRT-based frontier generation, this approach has a high algorithmic complexity. T. Cieslewski et al. [16] proposed a frontier selection method for high-speed flight. They used a function that selects the frontier requiring the minimum change in velocity to reach it. Their strategy could significantly reduce the exploration time because of the high exploration speed.
There have been more works in the last two years. An effective exploration strategy using expected information gain was proposed by E. Palazzolo and C. Stachniss [24]. Their method reduces the risk of collision while still maximizing the information gain during the exploration process. C. Gomez et al. introduced a topological frontier-based exploration using semantic information [25]. They combined frontier-based concepts with behavior-based strategies in order to build a topological representation of the environment.

Sampling-Based Autonomous Exploration
As one of the most famous sampling-based exploration methods, the Receding Horizon "Next-Best-View" Planner (RHC-NBVP) [17] was first proposed by A. Bircher et al. RHC-NBVP repeatedly expands a rapidly-exploring random tree and selects the best next node to explore using an objective function that considers the set of visible and unmapped voxels from the robot configuration. This approach has two main disadvantages: (1) repeatedly expanding the RRT is computationally expensive; (2) selecting the optimal next step within the field of view might lead to a sub-optimal solution. C. Papachristos et al. [18] presented an RHC-NBVP that considers localization and mapping uncertainty. This implementation helps the planner run in challenging environments such as dark or cluttered environments. A two-stage optimized next-view planning framework was developed by Z. Meng et al. [26]. The two-stage planner consists of a frontier-based boundary coverage planner and a planner that returns an exploration path with consideration of global optimality in the context of accumulated space information. To reduce the computing resources of the RHC-NBVP, C. Witting et al. proposed a history-aware strategy [19]. Their planner maintains and uses a history of visited places to improve the sampling efficiency. A visual saliency-aware NBVP was introduced by T. Dang et al. [20]. They used visual attention to build maps annotated with the visual importance and saliency of different objects and entities in the environment. The visual saliency information was then used by the planner to find the next best step.
In 2020, L. Schmid et al. [27] presented an RRT*-inspired online informative path planning algorithm. This algorithm can achieve global coverage and maximize the utility of a path in a global context using a single objective function.
Compared with the algorithms above, this work uses a frontier-based exploration algorithm with global information gain that avoids sub-optimal paths. To achieve good exploration coverage, the robot performs the exploration missions at a low velocity.

Preliminary
In this section, firstly the architecture of the robot and the definition of the environment are given and then the problem is formulated.

Robot Architecture and Environment Definition
The robot used in this paper is an aerial robot with six degrees of freedom: the translational movements along the X, Y, and Z axes and the rotational movements of roll, pitch, and yaw. The onboard sensor of the robot is a visual-inertial sensor with an RGB-D camera that can be used to perform localization and perceive the environment.
A 3D occupancy grid map generated by using OctoMap [28] is applied to describe the environment and the gathered data from the RGB-D camera are used to provide the point clouds for updating the map.
The 3D occupancy map is organized in voxels, and every voxel has one of three states: free, occupied, or unknown. With OctoMap, every voxel has an occupancy probability with a value in the range of 0 to 1. For a voxel x, the entropy e(x) can be calculated from the occupancy probability p(x) as in Equation (1):

e(x) = −p(x) ln p(x) − (1 − p(x)) ln(1 − p(x))    (1)
The entropy describes the uncertainty of a voxel: lower entropy implies lower uncertainty. The sum of the voxels' entropies can be used to guide the autonomous exploration so as to reduce the map uncertainty.
In this work, voxels whose occupancy probability is higher than 0.5 are treated as obstacle voxels.
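As a concrete illustration, the per-voxel entropy and the obstacle threshold above can be sketched as follows (a minimal sketch; the function names are ours, and we assume the standard binary Shannon entropy for Equation (1)):

```python
import math

def voxel_entropy(p):
    """Entropy e(x) of a voxel with occupancy probability p (Equation (1)):
    e(x) = -p ln p - (1 - p) ln(1 - p). It peaks at p = 0.5 (ln 2) and is
    zero when the voxel is certainly free (p = 0) or occupied (p = 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def map_uncertainty(probabilities):
    """Sum of voxel entropies: the quantity the exploration tries to reduce."""
    return sum(voxel_entropy(p) for p in probabilities)

def is_obstacle(p, threshold=0.5):
    """Voxels with occupancy probability above 0.5 are treated as obstacles."""
    return p > threshold
```

An unknown voxel (p = 0.5) thus contributes the maximum uncertainty, which is why views covering unknown space score highest later in the information gain function.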

Problem Formulation
The goal of the autonomous exploration is to explore an unknown environment as fast as possible.
To improve the efficiency of the exploration algorithm, the width, length, and height of the environment are previously defined in this work.
Assume the width, length, and height of the environment are w, l, and h, the volume of every voxel is V_voxel, and the guide goal pose is G_i = (x_i, y_i, z_i, ψ_i). The robot starts to move from the initial pose P_0 to G_i. The exploration process ends when the algorithm cannot find a new G_i. If the sum of the volumes of the mapped voxels equals w × l × h, the run of the exploration algorithm is considered successful. The exploration process can be seen in Figure 1.
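A minimal sketch of this success criterion (the helper name is ours; `known_voxel_count` stands for the number of voxels mapped so far):

```python
def exploration_successful(known_voxel_count, v_voxel, w, l, h):
    """A run is successful when the summed volume of the mapped voxels,
    known_voxel_count * V_voxel, reaches the bounding volume w * l * h."""
    return known_voxel_count * v_voxel >= w * l * h
```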

Methodology
In this section, the optimal frontier selection approach and the motion planning and control strategy are explained in detail. Firstly the optimal frontiers generation approach, which generates the goal for the motion planning module, is described and then how we move the robot to the goal using the motion planning and control module is explained.

Optimal Frontiers Generation
The points between the known and unknown areas in the occupancy map are selected as frontier points. Candidate points are then generated around these frontier points using the formulas below.
In Equation (2), θ_i is the yaw angle generated by the candidate generation algorithm, and R_i is the distance from the frontier to the candidate, bounded by the minimum and maximum distances R_min and R_max. ∆R and ∆θ are predefined increments for updating R_i and θ_i; in this article, ∆R and ∆θ are chosen as 0.5 m and 12°:

R_i = R_min + i ∆R,  θ_i = i ∆θ    (2)

In Equation (3), X_c, Y_c, Z_c, and θ_c are the x, y, z coordinates and yaw angle of the candidate, respectively, and X_f, Y_f, and Z_f are the x, y, z coordinates of the frontier:

X_c = X_f − R_i cos θ_i,  Y_c = Y_f − R_i sin θ_i,  Z_c = Z_f,  θ_c = θ_i    (3)
The distance from a candidate point to the frontier point is denoted D_cf and its distance to the robot D_cr. Once the candidates are generated, first the candidates lying in the unknown area are removed, and then the candidates whose D_cf or D_cr is smaller than a predefined value are removed.
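The candidate generation and filtering steps can be sketched as below (a sketch under assumptions: the function names, the `r_max` default, and the `is_known` map-query callback are ours; ∆R = 0.5 m and ∆θ = 12° follow the text):

```python
import math

def generate_candidates(frontier, r_min=0.5, r_max=2.0, dr=0.5,
                        dtheta=math.radians(12.0)):
    """Candidate viewpoints on concentric rings around one frontier point
    (Equations (2)-(3)): each candidate sits at distance R from the frontier,
    displaced opposite its yaw theta, so the candidate looks at the frontier."""
    xf, yf, zf = frontier
    n_theta = int(round(2.0 * math.pi / dtheta))       # yaw steps per circle
    n_r = int(math.ceil((r_max - r_min) / dr))         # ring radii per yaw
    candidates = []
    for i in range(n_theta):
        theta = i * dtheta
        for j in range(n_r):
            r = r_min + j * dr
            candidates.append((xf - r * math.cos(theta),
                               yf - r * math.sin(theta),
                               zf,
                               theta))
    return candidates

def filter_candidates(candidates, robot_xyz, is_known, d_cr_min=0.5):
    """Remove candidates lying in unknown space or too close to the robot.
    `is_known(x, y, z) -> bool` is an assumed occupancy-map query; the
    minimum frontier distance D_cf is already enforced by r_min above."""
    return [(x, y, z, yaw) for (x, y, z, yaw) in candidates
            if is_known(x, y, z)
            and math.dist((x, y, z), robot_xyz) >= d_cr_min]
```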
With the generated candidates, the k-means++ algorithm is used for candidate clustering. By using k-means++, the number of candidates can be reduced, which speeds up the optimal frontier selection process.
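The clustering step can be sketched with a self-contained k-means++ (our own minimal implementation rather than a library call; candidates are treated as plain numeric tuples):

```python
import random

def _d2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans_pp(points, k, iters=10, seed=0):
    """Cluster points with k-means++ seeding followed by Lloyd iterations;
    the k cluster centers replace the raw candidate set."""
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]
    while len(centers) < k:
        # D^2 weighting: sample the next seed proportional to the squared
        # distance to the nearest already-chosen center
        weights = [min(_d2(p, c) for c in centers) for p in points]
        total = sum(weights)
        if total == 0.0:  # all points coincide with a center
            centers.append(points[rng.randrange(len(points))])
            continue
        r = rng.random() * total
        acc = 0.0
        for p, w in zip(points, weights):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    for _ in range(iters):  # Lloyd updates: assign, then recompute means
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: _d2(p, centers[i]))].append(p)
        centers = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers
```

In the real pipeline a library implementation could equally be used; the point is that k centers summarize many nearby candidates, so the information gain function is evaluated only k times.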
The information value of the candidates after clustering can be calculated using the information gain cost function [29]. Using this value, the candidates can be sorted. The information gain cost function is given in Equation (4):

G_v = Σ_{r ∈ R_v} Σ_{x ∈ X_r} e(x)    (4)

In Equation (4), G_v is the information gain of the candidate point with view v, R_v is the set of rays cast through the candidate point in the robot's field of view, X_r is the set of voxels each ray r traverses, and e(x) is the mapping uncertainty (entropy) of voxel x. The sum of e(x) over all voxels seen from view v is taken as the information gain value.
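The cost function reduces to a double sum once the map interaction is abstracted away; in the sketch below, `cast_rays` and `entropy_of` are hypothetical callback names (a real implementation would ray-cast in the OctoMap):

```python
def information_gain(view_pose, cast_rays, entropy_of):
    """Equation (4): G_v is the sum, over every ray r in R_v cast from the
    candidate view v through the FOV, of the entropies e(x) of the voxels x
    that the ray traverses. cast_rays(pose) -> iterable of voxel lists;
    entropy_of(voxel) -> float."""
    return sum(entropy_of(x) for ray in cast_rays(view_pose) for x in ray)

def best_candidate(candidates, cast_rays, entropy_of):
    """Rank candidates by information gain and return the best one,
    mirroring the optimal-frontier selection step."""
    return max(candidates,
               key=lambda c: information_gain(c, cast_rays, entropy_of))
```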
After the candidates are sorted, they are sent to the motion planning and control module, the whole process of the optimal frontiers generation is given by Figure 2.

Motion Planning and Control
With the frontier points from the optimal frontiers generation module, the motion planning module first selects the frontier point with the highest information value as the goal point.
Firstly, the A* path planning algorithm is used to plan a primitive path from the current point to the goal point. Then, safe corridors are generated around the path points, with a corridor generation approach similar to that in [30]. The occupancy voxel containing a path point is inflated from each of its six faces. The inflation of a face finishes when it reaches an obstacle voxel or the size of the inflated region reaches a predefined value. Finally, the path is replanned through the center of every safe corridor. By using this strategy, the replanned path is farther away from the obstacles than the previous path. The process is shown in Figure 3. For the yaw planning, the yaw angle of the endpoint always looks towards the direction with the largest IG, while the other path points always look towards the next path point.
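The per-cell corridor inflation described above can be sketched on an integer voxel grid as follows (helper names are ours; `is_obstacle` is an assumed occupancy-map query, and `max_inflation` plays the role of the predefined size limit):

```python
from itertools import product

def inflate_corridor(cell, is_obstacle, max_inflation=5):
    """Grow an axis-aligned box around one integer path cell by pushing each
    of its six faces outward, one voxel slab at a time, until the next slab
    contains an obstacle voxel or the face has moved max_inflation steps.
    Returns the (lo, hi) inclusive index bounds of the corridor box."""
    lo, hi = list(cell), list(cell)
    for axis in range(3):
        for direction in (-1, +1):
            bound = lo if direction < 0 else hi
            for _ in range(max_inflation):
                new = bound[axis] + direction
                # the slab of voxels the face would sweep over if pushed once
                ranges = [[new] if a == axis else range(lo[a], hi[a] + 1)
                          for a in range(3)]
                if any(is_obstacle(x, y, z) for x, y, z in product(*ranges)):
                    break
                bound[axis] = new
    return tuple(lo), tuple(hi)

def corridor_center(lo, hi):
    """The replanned path goes through the center of every corridor box."""
    return tuple((a + b) / 2.0 for a, b in zip(lo, hi))
```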
If the planner cannot find a feasible path to the goal, or the planning time exceeds a predefined value, the current goal point is discarded and the frontier point with the next-highest information value is chosen as the goal point.
After obtaining a feasible path, the planner sends the path points one by one to the motion controller. When a new obstacle is detected in the path, the controller stops the robot and lets the planner find another feasible path. If there is no feasible path, the algorithm generates candidates and searches for a feasible path again. The whole exploration process stops when the algorithm cannot find new frontiers. The whole process of the motion planning and control module is given in Figure 4.
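The goal fallback and time-budget logic above can be sketched as below (hypothetical names throughout; `plan_path` stands in for the A* plus safe-corridor planner):

```python
import time

def plan_to_best_frontier(sorted_frontiers, plan_path, time_budget_s=1.0):
    """Try frontiers in decreasing information value; a goal is discarded
    when the planner fails or exceeds its time budget.
    `plan_path(goal) -> list | None` is an assumed planner callback."""
    for goal in sorted_frontiers:
        start = time.monotonic()
        path = plan_path(goal)
        if path is not None and time.monotonic() - start <= time_budget_s:
            return goal, path
    return None, None  # no feasible goal left: exploration ends
```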

Algorithm Complexity
The algorithm complexity is important when the computing resources of the agent are limited. For every run of the algorithm, firstly the frontiers are generated, then the information gain of the frontiers is computed and the best frontier with the most information is selected, and finally the A* algorithm is used to search the feasible path.
1. Frontiers Generation: the complexity of the frontier generation depends on the raw frontier generation and the k-means++ clustering algorithm. As can be seen from Algorithm 1, the complexities of the raw frontier algorithm and the k-means++ clustering algorithm are O(2π × (R_max − R_min)/(∆yaw × ∆R)) and O(3 × i × k × m), respectively, where i, k, and m are the number of iterations of the k-means++ algorithm, the predefined number of clusters, and the number of frontiers generated by the raw frontier generation, respectively.
2. Information Gain Computing: the complexity of the gain estimation for each frontier point is the number of rays n in the FOV times the length l of the rays to the obstacle, divided by the resolution r of the map, which is O(n × l/r).
3. Path Search: when using the A* algorithm to search the feasible path in a 3D occupancy map, we assume that A* can expand a new node in six directions and that the number of nodes in the path is a. The complexity of the A* search is then O(6 × a). The complexity of the safe corridor generation algorithm is given by the minimum distance L_j to the obstacle in the x, y, and z directions of every path node, summed over the a nodes and divided by the map resolution r, which is O(∑_{j=0}^{a} L_j / r).
4. Total Computational Complexity: the total computational complexity per run of the algorithm is the sum of the terms above, O(2π × (R_max − R_min)/(∆yaw × ∆R) + 3 × i × k × m + n × l/r + 6 × a + ∑_{j=0}^{a} L_j / r).

Algorithm 1 Autonomous exploration path generation
1:  n ← 0
2:  for every raw frontier point F_f do
3:    yaw ← 0
4:    for yaw < 2π do
5:      R ← R_min
6:      for R < R_max do
7:        F_n(x) ← F_f(x) − R cos(yaw)
8:        F_n(y) ← F_f(y) − R sin(yaw)
9:        F_n(z) ← F_f(z)
10:       F_n(yaw) ← yaw
11:       compute D_cf and D_cr of F_n
12:       if F_n ∉ unknown space and D_cf, D_cr are above their thresholds then
13:         F_group(n) ← F_n
14:       end if
15:       R ← R + ∆R
16:     end for
17:     yaw ← yaw + ∆yaw
18:   end for
19:   n ← n + 1
20: end for
21: {F_0, F_1, ..., F_k} ← KmeanCluster(F_group, k)
22: IG_max ← 0; n ← 0; opt ← 0
23: for n < k do
24:   IG_Fn ← IG(F_n)
25:   if IG_Fn > IG_max then
26:     IG_max ← IG_Fn
27:     opt ← n
28:   end if
29:   n ← n + 1
30: end for
31: Path ← AstarSearch(P_init, F_opt)
32: while P_i ∈ Path do
33:   while P_i is not in collision do
34:     P_new ← Inflate(P_i(x), P_i(y), P_i(z))
35:     if P_new is in collision then
36:       break
37:     end if
38:   end while
39:   P_i ← Center(P_new)
40: end while

The process of the whole exploration algorithm is shown in Algorithm 1. The autonomous exploration path generation algorithm can be divided into three parts: the frontier generation part (lines 1 to 20), the optimal frontier selection part (lines 21 to 30), and the corridor-based A* path planning part (lines 31 to 40). Detailed descriptions of these three parts are given in the Methodology section.

Experimental Setup
For the simulation experiments, the Robot Operating System (ROS) [31] runs on top of Ubuntu 18.04. The Gazebo simulator provides the environment model and the RotorS simulation [32] provides the physical model parameters of the robot. All the experiments run on a laptop with an Intel Core i7-8750H at 2.2 GHz. The robot in the simulation is the AscTec Firefly, which is equipped with an RGB-D camera with a horizontal field of view of 135° and a vertical field of view of 60°. The sensing range of the RGB-D camera is set to 5 m, and the camera is pitched down 6° towards the ground. In the simulation experiments, the ground truth is used for robot localization and the robot starts at the origin with a yaw angle of zero.
To evaluate our exploration algorithm, it is tested in three different environments: a small-size environment, a medium-size environment, and a large-size environment, named the flat environment, the maze environment, and the office environment, respectively. These three environments can be seen in Figure 5. The size of the flat environment is 15 m × 14 m × 3 m and the size of the maze environment is 20 m × 20 m × 2.5 m; the office environment is 2.5 m high with an area of 800 m². Our algorithm is compared with two different algorithms: one uses Bayesian optimization and a Gaussian process to predict the mutual information for frontier selection [13], and the other uses the Euclidean distance to choose the nearest frontier to explore. The main differences between our algorithm and these two algorithms are the use of k-means++ for candidate clustering and the use of information gain for calculating the information value of the frontiers. The algorithms are also tested in the same environments with both map resolutions (0.2 m and 0.3 m).
The platform for the real flight experiments is shown in Figure 6. As can be seen from the figure, the aerial platform is the Parrot Bebop 2, the on-board computer is a Jetson TX2 4 GB module with an Orbitty carrier board, and the RGB-D sensor is an Intel RealSense D435i. The D435i is part of the Intel RealSense D400 series of depth cameras; its added IMU allows the application to refine its depth awareness whenever the camera moves, which enables rudimentary SLAM and tracking with better point-cloud alignment. The minimum depth distance of the camera is 0.105 m, while its horizontal and vertical FOVs are 86° and 57°, respectively. The whole system, including the controller, mapping, and exploration modules, runs on the Jetson TX2, with a dual-core NVIDIA Denver 2 64-bit CPU and an NVIDIA Pascal architecture GPU with 256 CUDA cores. The exploration algorithm is integrated with the Aerostack framework [33] in the real flights. The system architecture used in this work is shown in Figure 7.
As can be seen in Figure 7, there are five main parts in the system architecture: the human operator, execution module, communication module, sensor system, and actuator system. The human operator uses three basic commands: take off, position control, and land. The basic commands are sent to the onboard computer via Wi-Fi. The other four parts, including the autonomous exploration algorithm, run on the onboard computer, and the outputs are the roll, pitch, yaw, and thrust commands. The control commands are sent to the Parrot Bebop 2 via a USB cable that connects the onboard computer and the drone. The OptiTrack odometry is provided by the OptiTrack motion capture system, whose position and steering tracking errors are less than 0.03 mm and 0.05°, respectively.

Simulation Experiments
The simulation results of the exploration paths in the office, maze, and flat environments are shown in Figure 8. The start point of the exploration algorithms in the three environments with different mapping resolutions is always (0, 0, 1) m. We limit the maximum exploration time of the office, maze, and flat environments to 1500 s, 1200 s, and 500 s, respectively. In this article, BOGP and NF denote the Bayesian optimization and Gaussian process-based algorithm and the nearest-frontier-based exploration algorithm, respectively. Each algorithm is run five times in the flat, maze, and office environments with different resolutions. Table 1 shows the exploration times of these three algorithms under different map resolutions. The results of the BOGP-based algorithm in the office environment with a map resolution of 0.2 m are not included because the exploration time far exceeds the limit.

Real Flight Experiments
In order to validate the proposed algorithm, a real flight experiment is presented that runs the algorithm on-board an MAV. The experiment was conducted in a 4 m × 2 m × 2 m scenario equipped with an OptiTrack motion capture system providing the MAV pose. Figure 9 shows the scenario for the MAV to navigate and the map built for visualization. As can be seen from the figure, a cylinder is placed as an obstacle and the MAV navigates around it. The cylinder obstacle is placed at the position (1.3, −0.1) m; its height and radius are 1.26 m and 0.1 m, respectively. The boxes and board show the boundary of the scenario. The exploration time and the length of the exploration path are 44.9 s and 9.0398 m, respectively. P1 and P2 are the selected optimal frontier points. As can be seen from Figure 10, the robot first flew to P1 and then to P2. When the robot arrived at P2, the final optimal frontier point P3 was selected. The exploration process finished when the robot arrived at P3.
A video of the experiments is available at the following link: https://vimeo.com/455285058/f969e81451.

Discussion
As can be seen from Figure 8, if the exploration time is not limited, all the exploration algorithms can explore the whole environment when the map resolution is 0.2 m and 0.3 m.
The performance improvement of the proposed algorithm becomes clearer when the exploration time is limited. As shown in Figure 11a,c,e, when the resolution of the map is 0.2 m (the solid lines in the figure), the proposed algorithm (red line) finishes the exploration tasks within the limited time in all the scenarios. The BOGP-based and NF-based algorithms can also finish the exploration tasks in most scenarios, but they take a longer time than the proposed algorithm. In Figure 11a, however, the BOGP-based exploration cannot finish within the limited time when the map resolution is 0.2 m (the volume of the whole scenario is 2000 m³, while it only reaches around 1250 m³). This is because the Bayesian optimization and Gaussian processes take a long time to analyze the map, which becomes more pronounced in a large environment such as the office scenario. As can be seen from Table 1, when the map resolution is 0.2 m, the proposed algorithm takes 255 s, 808 s, and 1495 s on average to finish the exploration, while the BOGP-based and NF-based algorithms take much longer. When the map resolution is increased to 0.3 m, as shown in Figure 11a,c,e and Table 1, the performance of the NF-based algorithm is close to that of the proposed algorithm, but the proposed algorithm can still finish the task faster. It can also be seen from Table 1 that the proposed algorithm has a shorter exploration path than the NF-based algorithm in all three scenarios, and a shorter path than the BOGP-based algorithm in the maze scenario at both resolutions and in the flat scenario at the resolution of 0.2 m.
The results show that both optimal frontier selecting functions of the proposed and BOGP-based algorithm can help the exploration algorithm follow a shorter path for fulfilling the exploration task. In the office environment, with a resolution of 0.2 m, our algorithm has a longer exploration path length because the BOGP-based method only explores around half of the whole environment in the limited time. Figure 11 shows one run for each method with different resolutions. Repeated experiments confirm that these curves are representative.

Conclusions and Future Works
In this article, an autonomous exploration strategy using the information gain to select the optimal frontier is presented. Firstly, an RGB-D camera is used to sense the environment, and OctoMap is used to express the obstacle, free, and unknown space. Secondly, the k-means++ algorithm is used to filter the frontiers, and an information gain cost function is applied to select the optimal frontier to explore. Finally, the A* path planner is applied to find the path, and a safe corridor generation algorithm is used to move the path points away from the obstacles before sending the path to the controller. The proposed algorithm has been compared with the BOGP- and NF-based algorithms in three different environments with different map resolutions; the experimental results show that the proposed algorithm saves exploration time compared with the other two algorithms and also yields a shorter exploration path.
Future work will focus on reducing the algorithm complexity and achieving higher exploration efficiency. A reinforcement-learning-based exploration approach will also be considered. The proposed exploration algorithm will also be implemented with Visual Inertial Odometry to perform large-environment exploration.

Acknowledgments: The work reported in this paper is sponsored by the Chinese Scholarship Council (CSC). We also thank Alejandro Rodriguez-Ramos, Hriday Bavle and Adrian Carrio for their sincere advice and help during the real flight experiments.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

MAV    Micro Aerial Vehicle
UAV    Unmanned Aerial Vehicle
IG     Information Gain
MI     Mutual Information
GP     Gaussian Process
RRT    Rapidly-exploring Random Tree
NBVP   Next-Best-View Planner
BOGP   Bayesian Optimization and Gaussian Process (baseline)
NF     Nearest Frontier (baseline)
FOV    Field of View
ROS    Robot Operating System
DOF    Degree of Freedom
IMU    Inertial Measurement Unit
SLAM   Simultaneous Localization and Mapping