Real-Time Compact Environment Representation for UAV Navigation

Recently, unmanned aerial vehicles (UAVs) have attracted much attention due to their on-demand deployment, high mobility, and low cost. For UAVs navigating in an unknown environment, efficient environment representation is needed due to the storage limitation of the UAVs. Nonetheless, building an accurate and compact environment representation model is highly non-trivial because of the unknown shapes of the obstacles and time-consuming operations such as finding and eliminating environmental details. To overcome these challenges, a novel vertical strip extraction algorithm is proposed to analyze the probability density function characteristics of the normalized disparity values and segment the obstacles through an adaptive-size sliding window. In addition, a plane adjustment algorithm is proposed to represent the obstacle surfaces as polygonal prism profiles while minimizing the redundant obstacle information. By combining these two proposed algorithms, the depth sensor data can be converted into multi-layer polygonal prism models in real time. In addition, a drone platform equipped with a depth sensor is developed to build the compact environment representation models in the real world. Experimental results demonstrate that the proposed scheme achieves better performance in terms of precision and storage as compared to the baseline.


Introduction
Driven by the advantages of on-demand deployment, high mobility, and low cost, unmanned aerial vehicles (UAVs) have become appealing solutions for a wide range of commercial and civilian applications over the past few years, including remote sensing [1], search and rescue [2], and surveillance [3]. For a UAV navigating in an unknown environment, obstacle detection is essential, specifically for autonomous navigation. In particular, the environment representation model needs to be quickly built from the sensor carried by the UAV and stored in the onboard memory. Such information can be utilized for path planning, traversability analysis, and exploration [4][5][6].
As the UAVs only have limited storage [7], efficient environment representation is crucial.
The most commonly used environment representation method is to divide space into cubes of equal size [8][9][10]. This method can simplify the original point cloud data and quickly analyze the surrounding obstacle areas. However, the resolution of the corresponding grid maps has to be carefully designed, since a fine-resolution grid consumes a large amount of storage space while a coarse-resolution grid may lead to unfavorable errors near the obstacle surfaces [11]. With this consideration, an accurate and compact environment representation model that does not rely on the grid map needs to be developed, so as to provide a concise spatial relationship [12,13] and improve the efficiency of navigation [14], as well as environmental information sharing among the UAVs [15]. Nonetheless, building an accurate and compact three-dimensional (3D) environment representation model from an onboard sensor is highly non-trivial. Specifically, the main challenges are three-fold. Firstly, it is difficult to construct a unified compact model for the obstacles because of their unknown shapes and the sensing noise. In addition, it is quite time-consuming to find all the arbitrarily shaped gaps on the obstacle surfaces and determine whether these environmental details should be removed to reduce storage consumption. Moreover, due to the latency requirement for processing the captured sensor data and the limited computational capability of the UAVs [16,17], only low-complexity algorithms can be applied.
To tackle the above challenges, a novel vertical strip extraction algorithm is proposed in this work to analyze the probability density function characteristics of the depth sensor data and extract the obstacle information according to the roughness of the obstacle surface by an adaptive-size sliding window. Furthermore, to speed up the elimination of irrelevant environmental details, a two-stage plane adjustment algorithm is presented to quickly fill the irrelevant gaps and then obtain a rectangular outline of the obstacle. Further combining these two algorithms, the obtained overall modeling scheme, dubbed obstacle surface adaptive plane extraction (OSAPE), can convert the onboard sensor data into multi-layer polygonal prism models in real time, as shown in Figure 1. In addition, by establishing the compact environment representation model within the field of view, a global map can be obtained by merging the processing results from successive frames and different perspectives. The main contributions of this work are summarized as follows:
• Two novel algorithms, namely the vertical strip extraction algorithm and the plane adjustment algorithm, are proposed to effectively adapt to different obstacle shapes and different surface roughness, as well as to speed up the elimination of irrelevant environmental details by minimizing redundant information.
• The proposed OSAPE modeling scheme, which is the combination of the two proposed algorithms, can convert the normalized data into simplified prisms based on the size of the UAV in real time.
• By building a drone platform with a depth sensor, real-world experiments are conducted to demonstrate the advantage of the proposed scheme over the baseline.
The rest of this paper is organized as follows. Section 2 discusses the related works. In Section 3, the system model and the main process of the proposed scheme are presented. The vertical strip extraction algorithm and the plane adjustment algorithm are presented in Sections 4 and 5, respectively. Experimental results are presented in Section 6. In Section 7, conclusions and future works are discussed.

Related Works
In the literature, several methods for building compact environment representation models have been developed for unmanned ground robot trajectory generation [18][19][20]. However, two-dimensional (2D) compact environment representation models for ground robots cannot be directly used in UAV autonomous navigation due to the lack of height information [21]. The extraction of 3D world representations from depth sensors has attracted tremendous interest in recent years. For example, in [22], an obstacle detection system in an indoor environment was proposed based on the Kinect sensor, which can accurately detect several types of obstacles in real time. In [23], an omnidirectional obstacle detection method was proposed by repairing the obstacle regions in the depth images, which can provide three-dimensional omnidirectional obstacle viewing. Most recently, another related research interest in recognizing the obstacles in an unknown environment is semantic SLAM [24], which can provide not only where the obstacles are, but also what they are. Different from these works, the focus of our work is how to further simplify the obstacle model.
Recently, several pioneering works studied the problem of compact environment representation in the context of UAV autonomous navigation. For example, in [25], the polygonal outline of obstacles was extracted based on the grid map, which was used to generate a safe UAV trajectory and avoid the detected obstacles. In [26], an online autonomous collision-free navigation approach was presented by utilizing the octree-based map to generate flight corridors. However, the above grid-based environment representation models result in inflexible grid placement and quantization errors, and the octree-based map [10] requires an additional computational cost to acquire a free space area by searching and querying the state of the grid in tree-based storage structures [27].
There is also some literature on plane extraction algorithms [28][29][30][31][32]. These algorithms segment the parts of the depth image or the point cloud data belonging to the same plane, thereby utilizing a small number of geometric parameters to simplify the environment representation. For example, approximate planes are extracted by principal component analysis (PCA) based on voxel maps, and the plane parts are preserved for path planning [28]. However, some irregular obstacle surfaces may be lost during the conversion process due to the preset roughness threshold [29]. Moreover, the farthest recognizable distance of this method is limited (less than 10 m), as the plane fitting parameters cannot be adaptively adjusted according to the distance to the object [30].
Existing works on 3D compact building reconstruction have already demonstrated that UAVs are an effective platform for improving the efficiency of reconstruction [33,34]. The observation data are mainly composed of the top areas of the objects (e.g., building roofs and the tops of trees). For example, in [35], a modeling method combining meshes and geometric primitives was proposed, which simplifies the environment model while preserving the details of the environments. In [36], a novel algorithm was presented to obtain a lightweight, watertight polygonal surface model from the obtained global point cloud. However, due to different observation angles and global data requirements, these compact building reconstruction methods cannot be directly utilized for real-time obstacle modeling and route planning.

Adaptive Plane Extraction Model
In this section, the compact environment representation model is presented first, followed by the main process of the proposed modeling scheme.
Consider a UAV equipped with a depth sensor (e.g., binocular vision sensor, ToF camera [37]) performing tasks in an unknown environment. The generated depth data are given by D̂(u, v) (the data of other scanning sensors such as LiDAR [38] can also be converted into a 2D matrix D̂(u, v)), where u = 1, ..., U and v = 1, ..., V are the column and the row numbers of the sensor data, respectively, and D̂(u, v) ∈ [D_min, D_max], where D_min and D_max correspond to the minimum and the maximum measurement distances of the sensor, respectively. For the ease of analysis, the normalized disparity data D(u, v) are defined as:

D(u, v) = C · f_c / D̂(u, v),  (1)

where f_c is the camera focal length and C is a normalization parameter. Furthermore, the pitch angle θ, the roll angle φ, and the yaw angle ϕ of the camera can be measured by a six degrees of freedom inertial measurement unit (IMU). A static obstacle in an unknown environment is denoted by O ∈ R^3 and can be modeled as a multi-layer polygonal prism Θ ∈ R^3 surrounding this obstacle. The heights of all the layers in Θ are denoted by Y_Θ = (Y_1, Y_2, ..., Y_k, ...), where Y_k and Y_{k+1} define the height range of the kth layer. In addition, the corresponding side profile of the polygonal prism is composed of several vertical rectangles indexed by l ∈ {1, ..., L}, i.e.,

Π_l = (P_a^1, P_b^2, η).  (2)

In Equation (2), P_a^1 and P_b^2 are two diagonal points of the rectangle Π_l shown in Figure 2a, and the plane fitting parameter η provides a basis for merging planes, which will be elaborated later in Section 5.3. In addition, the orientation from the point P_a^1 to the point P_b^1 is utilized to distinguish free space and obstacles without additional parameters. In particular, following the orientation from P_a^1 to P_b^1, the area on the right-hand side is always free space from the top view. On the contrary, the area on the left-hand side is either an obstacle or an unknown area.
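The orientation convention above can be illustrated with a small sketch. The class and function names below are hypothetical (not from the paper), and the rectangle is reduced to its top-view edge; the sign of a 2D cross product decides on which side of the oriented edge a query point lies.

```python
import numpy as np

# Hypothetical sketch of the side-profile rectangle and its free-space test.
class Rect:
    def __init__(self, p_a, p_b, eta=None):
        # p_a, p_b: top-view (x, z) positions of the two vertical edges,
        # following the stated orientation from P_a to P_b.
        self.p_a = np.asarray(p_a, dtype=float)
        self.p_b = np.asarray(p_b, dtype=float)
        self.eta = eta  # plane fitting parameter (cf. Section 5.3)

    def is_free_side(self, q):
        """True if the top-view point q lies on the right-hand side of the
        oriented edge P_a -> P_b, i.e., in free space."""
        d = self.p_b - self.p_a
        w = np.asarray(q, dtype=float) - self.p_a
        # 2D cross product: negative -> right-hand side of direction d.
        return bool(d[0] * w[1] - d[1] * w[0] < 0)

r = Rect((0.0, 0.0), (4.0, 0.0))
print(r.is_free_side((2.0, -1.0)))  # True: right of the +x direction
print(r.is_free_side((2.0, 1.0)))   # False: obstacle or unknown side
```

This encodes the free-space side implicitly in the vertex order, so no extra flag needs to be stored per rectangle.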
With the above consideration, a novel modeling scheme is proposed to convert the normalized data D(u, v) into several rectangles and merge them into simplified prisms in real time. The main process of the proposed scheme is illustrated in Figure 3. Firstly, the depth image is obtained by a depth sensor, shown in Figure 3a, and whether the image should be rotated or not is determined by the roll angle φ of the camera. Then, the identified 2D obstacles are converted into 3D Euclidean space, shown in Figure 3d, which will be referred to as the vertical strips in the subsequent discussions. The vertical planes are fitted based on the coplanarity of the obtained vertical strips, shown in Figure 3e. Based on the above discussion, the obtained rectangles at multiple viewing angles can be merged and fused to form closed multi-layer polygonal prisms. The modeling process of a single-layer prism is shown in Figure 2b. Specifically, the newly obtained planes are clustered according to the positions of their vertices and merged with adjacent planes. A polygonal prism Θ will be generated if the planes in one cluster form a closed side profile. The proposed modeling scheme mainly consists of the vertical strip extraction algorithm and the plane adjustment algorithm, which will be illustrated in detail in Sections 4 and 5, respectively. Important notations and symbols used in this work are given in Table 1.

Table 1. List of notations.

Notation          Physical Meaning
η                 The parameter of the rectangle
θ, φ, and ϕ       The pitch, roll, and yaw angles
D(u, v)           The normalized disparity data
K(x)              The Gaussian kernel function
F_u(x)            The probability density function in column u
F(x)              The minimum threshold of the probability density function
x_i               The center of the ith peak
∆x_i              The width of the ith sliding window
∆v_i              The height of the ith sliding window
H_m               The minimum recognizable obstacle height
H_s               The minimum height of the passable region for the UAV
W_s               The minimum width of the passable region for the UAV
g(x)              The estimated disparity value
C_r               The rth cluster of the vertical strips

The Proposed Vertical Strip Extraction Algorithm
In this section, the vertical strip extraction algorithm is proposed, which identifies the obstacles in the normalized disparity data by columns and converts them into a 3D Euclidean space.

Statistical Estimation of Obstacles
The normalized disparity data D(u, v) of pixels belonging to the same obstacle are approximately equal or within a certain range. Therefore, to improve the anti-noise capability and the accuracy of the obstacle model, the distribution characteristics of the normalized disparity data are analyzed, and the roughness of the obstacle surfaces is estimated statistically.
Kernel density estimation (KDE) [39] is a non-parametric way to estimate the probability density function of random variables, which can be utilized to analyze the distribution characteristics of the obstacles in the sensor data while resisting measurement noise. If the roll angle |φ| exceeds the threshold φ_0, the disparity data will be rotated by the angle −φ to facilitate statistical analysis.

Lemma 1. Given the normalized disparity data D(u, v) and the pitch angle θ of the UAV, a new converted disparity value D_θ(u, v) that eliminates the influence of the pitch angle can be derived.

Proof. Please refer to Appendix A.
After eliminating the influence of the pitch angle of the UAV by Lemma 1, the probability density function of the disparity value x in the uth column is given by:

F_u(x) = (1 / (V · h)) · Σ_{v=1}^{V} K( (x − D_θ(u, v)) / h ),  (4)

where K(x) is the kernel function and h represents the width of the kernel function. The kernel K(x) satisfies the conditions ∫ K(x)dx = 1 and K(x) > 0. The widely used Gaussian kernel is chosen to estimate the obstacle because of its strong noise immunity:

K(x) = A · exp( −x² / (2σ²) ),  (5)

where σ is the standard deviation and A is a normalization parameter. The center of the Gaussian kernel is D_θ(u, v), and σ is set according to the sensor noise and depth resolution.
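The per-column estimate can be sketched as follows. This is a minimal illustration, assuming a Gaussian kernel with unit integral; the function name, the bandwidth value, and the synthetic column are illustrative, not from the paper.

```python
import numpy as np

# Minimal sketch of the per-column kernel density estimate.
def column_pdf(d_col, xs, h=0.02):
    """Estimate F_u(x) over query points xs from one column of
    normalized disparity values d_col (invalid pixels dropped)."""
    d_col = d_col[np.isfinite(d_col)]
    V = len(d_col)
    # Gaussian kernel with unit integral: K(t) = exp(-t^2/2)/sqrt(2*pi)
    t = (xs[:, None] - d_col[None, :]) / h
    return np.exp(-0.5 * t**2).sum(axis=1) / (V * h * np.sqrt(2 * np.pi))

xs = np.linspace(0.0, 1.0, 201)
# synthetic column: a large obstacle near disparity 0.40, a smaller one at 0.70
col = np.concatenate([np.full(120, 0.40), np.full(40, 0.70)]) \
      + np.random.default_rng(0).normal(0, 0.005, 160)
F = column_pdf(col, xs)
print(xs[np.argmax(F)])  # dominant peak near 0.40
```

Peaks of the resulting curve indicate obstacles at the corresponding disparity values, which is exactly what the threshold test in the next paragraph operates on.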
To filter out the measurement noise and slight undulations on the ground, the minimum recognizable obstacle height is set to H_m, and it is not difficult to verify that the corresponding minimum value of F_u(x) is given by:

F(x) = (H_m · x / C) · K(0) / (V · h),  (6)

where x is the corresponding disparity value of the obstacle and C is the normalization parameter used in Equation (1). Intuitively, an obstacle of height H_m at disparity x occupies about H_m · x / C rows of one column, and these pixels contribute at least this value to the kernel density estimate. As shown in Figure 4, several columns of the probability density function are displayed in different colors, and the blue translucent plane represents the minimum threshold F(x) under different disparity values. A peak of the probability density function greater than F(x) indicates that there exists an obstacle near the current disparity value x. The probability density function in Figure 4b corresponds to the white dotted line in Figure 4a.

Obstacle Identification with a Sliding Window
Due to the measurement error of the depth sensor and the narrow gaps on the obstacle surface, depth pixels may be partially mutated, resulting in the same obstacle being divided into several parts. To address this issue, the sliding windows with adaptive sizes are utilized to identify the obstacles in the normalized disparity data, and the corresponding window size is adjusted according to the roughness of the obstacle surfaces to improve the anti-noise capability.
Specifically, the number and the distances of the obstacles in the uth column can be determined by the peak values F_u(x_i) above the threshold F(x), where x_i is the disparity value of the ith identified obstacle in column u. One sliding window is constructed for each disparity value x_i. The width of the corresponding sliding window is given by ∆x_i = x̄_i − x_i, where x_i and x̄_i are the lower and the upper disparity boundaries of the ith peak above the threshold F(x), respectively, and ∆x_i reflects the roughness of the obstacle surface. Furthermore, the height ∆v_i of the ith sliding window is adjusted according to the minimum height H_s of a passable region of the UAV as follows:

∆v_i = H_s · g(x_i) / C.  (7)

In Equation (7), the estimated disparity value g(x_i) is designed according to the influence of the measurement noise, where k_e is the proportional coefficient of the sensor measurement error. Intuitively, the estimated disparity value g(x_i) corresponds to the minimum distance to the obstacle under the influence of noise, so that the window height can be adjusted according to the measurement noise. More specifically, the noise increases approximately in proportion to the square of the distance according to the measurement characteristics of the depth sensor [40].
To speed up identification, the sliding step is set to half the height of the corresponding window. As the window slides down, the state of the pixel D_θ(u, v) can be determined by the following conditions: 1. the average of the disparity values in the sliding window is within the range [x_i, x̄_i], and 2. more than half of the pixels in the sliding window are within the range [x_i, x̄_i].
If both conditions are satisfied, the pixels in the sliding window belong to the ith obstacle. Otherwise, the last pixel with a disparity value within the range [x_i, x̄_i] is set as the endpoint of the obstacle.
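The two window conditions can be written down directly. This is a sketch; the function name and the example disparity values are illustrative, not from the paper.

```python
import numpy as np

# Sketch of the two membership conditions for one sliding window.
def window_belongs(window, x_lo, x_hi):
    """A window of disparity pixels belongs to the i-th obstacle iff
    (1) its mean lies in [x_lo, x_hi], and
    (2) more than half of its pixels lie in [x_lo, x_hi]."""
    window = np.asarray(window, dtype=float)
    inside = (window >= x_lo) & (window <= x_hi)
    return bool((x_lo <= window.mean() <= x_hi) and inside.mean() > 0.5)

print(window_belongs([0.41, 0.42, 0.40, 0.43], 0.39, 0.45))  # True
print(window_belongs([0.41, 0.90, 0.95, 0.97], 0.39, 0.45))  # False
```

Requiring both the mean test and the majority test makes the window robust to a few mutated pixels without absorbing a genuinely different surface.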
Based on the above discussion, the obstacle detected in column u is described as (x̂_i, u, v_i^t, v_i^b), where x̂_i is the estimated disparity value of the obstacle x_i, and v_i^t and v_i^b are the top and the bottom coordinates of the obstacle in the disparity data, respectively. In particular, the estimated disparity value x̂_i is determined by the roughness of the obstacle surfaces: x̂_i = x̄_i if ∆x_i > e(x_i), and otherwise x̂_i is the average disparity value of the pixels belonging to the ith obstacle, where e(x) = k_e · (C · f_c / x)² + 2α · h. For example, Figure 5 demonstrates this sliding process. Specifically, the horizontal and the vertical units in Figure 5 correspond to the normalized disparity value and the row index of the depth data, respectively. Since ∆x_i for the tree is greater than its corresponding threshold e(x_i), the distance to the object is represented by the closest point. On the contrary, for a relatively smooth wall, its distance is estimated by the average disparity value to improve the estimation accuracy.
With the above process, the obstacle surfaces with relatively concentrated depth values can be identified. However, the obstacles with an irregular structure or scattered depth values may not be extracted by this method. To build a unified obstacle model, a discretization strategy is presented in the next subsection to segment the remaining pixels.

Irregular Object Processing
In this subsection, the horizontal surface in the remaining pixels is identified first, followed by irregular or inclined object segmentation.
The geometric relationship of the horizontal surface in the camera coordinate system is given as follows:

v = v_0 + f_c · (Y_c / Z_c),  (10)

and:

Y_c = ∆H / cos θ − Z_c · tan θ,  (11)

where Z_c and Y_c are the horizontal distance and the vertical distance to a point on the horizontal plane (cf. the red circle in Figure 6), respectively. In Equations (10) and (11), ∆H is the altitude difference between the horizontal surface and the depth sensor shown in Figure 6, and the image calibration center is (u_0, v_0). By plugging Equation (11) into Equation (10), the row index v for the horizontal surface is given as follows:

v = v_0 − f_c · tan θ + (∆H · D(u, v)) / (C · cos θ).  (12)

Figure 6. The geometric relationship of the horizontal surface.
If D(u, v) = 0, then v = v_0 − f_c · tan θ, which corresponds to the vanishing point. For example, it can be seen from Figure 5 that the extension of the ground portion in the v-disparity image passes through a fixed point. Hence, a horizontal surface can be identified by fitting the correlation between the normalized disparity data D(u, v) and the row index v. The fitted image center is given by:

v_c = v̄ − x̄ · ( Σ_{j=1}^{n_r} (x_j − x̄)(v_j − v̄) / Σ_{j=1}^{n_r} (x_j − x̄)² ),  (13)

and the correlation coefficient between the disparity value and the row index is given by:

r = Σ_{j=1}^{n_r} (x_j − x̄)(v_j − v̄) / sqrt( Σ_{j=1}^{n_r} (x_j − x̄)² · Σ_{j=1}^{n_r} (v_j − v̄)² ),  (14)

where n_r is the number of pixels to be identified, and x̄ and v̄ are the average disparity value and the average row index of these pixels, respectively. The pixels belong to a horizontal surface if (1 − |r|) < ε_r and |v_c − (v_0 − f_c · tan θ)| < ε_v. The horizontal object surface is used as a reference for the thickness of the obstacle in this work.
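The horizontal-surface test can be sketched as a line fit in the v-disparity domain. This is an illustration under stated assumptions: v_c is taken as the intercept of the least-squares line at zero disparity (so it can be compared with the vanishing point), and all numeric values are made up for the example.

```python
import numpy as np

# Sketch of the horizontal-surface test in the v-disparity domain.
def horizontal_surface(xs, vs, v0, fc, theta, eps_r=0.05, eps_v=5.0):
    """xs: normalized disparity values; vs: row indices of the candidate
    pixels. Checks the correlation and the intercept against the
    vanishing point v0 - fc*tan(theta)."""
    r = np.corrcoef(xs, vs)[0, 1]
    slope, intercept = np.polyfit(xs, vs, 1)  # least-squares line v(x)
    v_c = intercept                           # fitted row at x = 0
    vanish = v0 - fc * np.tan(theta)
    return bool((1 - abs(r)) < eps_r and abs(v_c - vanish) < eps_v)

fc, v0, theta, dH, C = 350.0, 188.0, 0.1, 1.2, 1.0
xs = np.linspace(0.1, 0.8, 50)
# rows generated from the ground-plane line of Equation (12)
vs = v0 - fc * np.tan(theta) + dH * xs / (C * np.cos(theta))
print(horizontal_surface(xs, vs, v0, fc, theta))  # True for a planar ground
```

A non-planar patch (low |r|, or an intercept far from the vanishing point) fails the test and falls through to the irregular-object handling below.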
Then, the pixels that do not meet the above conditions should belong to a sloping or irregular surface. However, it is difficult to obtain the accurate position of these obstacles and build a unified compact model. Here, a simple and effective strategy is presented to quickly divide the remaining pixels at a constant height interval H_d, which is set according to the resolution requirements. Similar to Equation (6), the number of segmented pixels can be estimated as n_d = H_d · x / C, where x is the disparity value of the irregular object. The segmentation positions of the irregular obstacles can be obtained by searching in steps of n_d. As a result, the pixels belonging to the irregular objects in the same column are divided into multiple segments along the vertical direction, and the closest point in each segment is used to represent the distance to the divided object. In this way, different types of objects are converted into arrays with four elements, i.e., (x_i, u, v_i^b, v_i^t).
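The constant-height discretization can be sketched as follows. The function name and the numeric values are illustrative; the closest point of a segment is taken as its largest disparity, consistent with the text above.

```python
import numpy as np

# Sketch of the constant-height segmentation of one irregular column span.
def segment_column(col, v_b, v_t, H_d, C):
    """col: one column of normalized disparity values; [v_b, v_t): the span
    classified as irregular. Cuts the span into segments of
    n_d = H_d * x / C pixels and keeps each segment's closest point."""
    x = np.nanmax(col[v_b:v_t])        # disparity of the irregular object
    n_d = max(1, int(H_d * x / C))     # pixels per height interval H_d
    segs = []
    for s in range(v_b, v_t, n_d):
        e = min(s + n_d, v_t)
        # (closest disparity, bottom row, top row) for this segment
        segs.append((float(np.nanmax(col[s:e])), s, e))
    return segs

col = np.linspace(0.30, 0.60, 100)     # a sloped surface in one column
print(len(segment_column(col, 0, 100, 2.0, 0.05)))  # 5 segments
```

Each resulting segment then has the same four-element form (x_i, u, v_i^b, v_i^t) as the regular obstacles, so the downstream projection step treats both uniformly.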

Vertical Strip Clustering
In this subsection, the identified obstacles in the normalized disparity data are converted into 3D Euclidean space. In particular, the projection of a point P_c = (X_c, Y_c, Z_c) from the camera coordinate system to the image coordinate system is given by:

u = u_0 + f_c · X_c / Z_c,  v = v_0 + f_c · Y_c / Z_c.  (15)

Furthermore, the corresponding point in the world coordinate system is given by:

P_w = R · P_c + T,  (16)

where R and T denote the rotation matrix and the translation vector of the camera. According to Equations (15) and (16), (x̂_i, u, v_i^b, v_i^t) in the camera coordinate system is converted into 3D Euclidean space, i.e., S_i = (X_i, Y_i^b, Y_i^t, Z_i), which looks like a strip perpendicular to the ground. To further build the compact contours of the obstacle, the spatially adjacent vertical strips should be clustered.
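The back-projection of one detected column segment can be sketched as follows, assuming a pinhole model; the function name and the intrinsic/extrinsic values (fc, u0, v0, R, T) are illustrative placeholders.

```python
import numpy as np

# Sketch of the back-projection from a column segment (x_i, u, v_b, v_t)
# to a vertical strip S_i = (X, Y_b, Y_t, Z) in world coordinates.
def to_strip(x_i, u, v_b, v_t, fc, C, u0, v0, R, T):
    Z_c = C * fc / x_i                  # depth recovered from Equation (1)
    X_c = (u - u0) * Z_c / fc           # inverse of Equation (15)
    Y_b = (v_b - v0) * Z_c / fc
    Y_t = (v_t - v0) * Z_c / fc
    p_b = R @ np.array([X_c, Y_b, Z_c]) + T   # Equation (16)
    p_t = R @ np.array([X_c, Y_t, Z_c]) + T
    # the strip is vertical: one (X, Z) with a height range [Y_b, Y_t]
    return p_b[0], p_b[1], p_t[1], p_b[2]

R, T = np.eye(3), np.zeros(3)  # identity pose for illustration
S = to_strip(0.5, 400, 300, 100, 350.0, 1.0, 320, 188, R, T)
print(np.round(S, 2))
```

With an identity pose, the strip stays in camera coordinates; onboard, R and T would come from the IMU and the flight controller state.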
The clustering of the vertical strips in 3D Euclidean space often requires a radius search to find their neighbors within the radius. However, the 3D radius search is time consuming when the amount of data is large. By utilizing the spatial continuity of adjacent pixels on the normalized disparity data, the vertical strips are clustered according to the 2D index on these disparity data, which can improve operational efficiency. Specifically, the distance between the vertical strip and the last inserted strip in each cluster is calculated. If the distance is less than the minimum width W s of the passable region, this vertical strip will be inserted into the corresponding cluster. Otherwise, a new cluster is created for this vertical strip. The details of the fast strip clustering algorithm are given in Algorithm 1.

Algorithm 1 Fast strip clustering algorithm.
1: Initialize the cluster set C_1 = {S_1}, R = 1.
2: for u = 1 to U do
3:    for each S_i in column u do
4:        Calculate the distance d_{i,C_r} from this strip to the last inserted strip in each cluster C_r, where r ∈ {1, ..., R}.
5:        if min_r d_{i,C_r} < d_th then
6:            Insert the strip S_i into the cluster C_r with the minimum distance.
7:        else
8:            Create a new cluster C_{R+1} = {S_i}; R = R + 1.
9:        end if
10:   end for
11: end for
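Algorithm 1 can be implemented directly. The sketch below is an assumption-laden illustration (each strip is reduced to its top-view position (X, Z), and d_th plays the role of the minimum passable width W_s); it is not the paper's C++ implementation.

```python
import math

# Runnable sketch of Algorithm 1 (fast strip clustering).
def fast_strip_clustering(columns, d_th):
    """columns: list over u of lists of (X, Z) strip positions.
    Returns a list of clusters; each cluster's last element is the
    most recently inserted strip, as in Algorithm 1."""
    clusters = []
    for col in columns:
        for s in col:
            best, best_d = None, float("inf")
            for c in clusters:
                last = c[-1]  # only the last inserted strip is compared
                d = math.hypot(s[0] - last[0], s[1] - last[1])
                if d < best_d:
                    best, best_d = c, d
            if best is not None and best_d < d_th:
                best.append(s)
            else:
                clusters.append([s])  # open a new cluster
    return clusters

cols = [[(0.0, 5.0)], [(0.3, 5.1)], [(9.0, 5.0)], [(9.2, 5.1)]]
print(len(fast_strip_clustering(cols, d_th=2.0)))  # 2 clusters
```

Comparing only against the last inserted strip of each cluster is what avoids the 3D radius search and keeps the pass linear in the number of strips.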

Computational Complexity
For depth sensor data of size n = U · V, the calculation time of the probability density function is t_1 ∝ n · ⌈h⌉, where ⌈·⌉ is the ceiling operator and h is the kernel width. When the height of the sliding window is M, the calculation time of the sliding extraction is t_2 ∝ 2 · ⌈n/M⌉. The time of the remaining-pixel processing is t_3 ∝ n_r (n_r ≤ n), where n_r is the number of remaining pixels. In addition, the computational complexity for the projection of the vertical strips is negligible, since the number n_s of strips satisfies n_s ≪ n. Therefore, the complexity of the vertical strip extraction algorithm is O(n).

The Proposed Plane Adjustment Algorithm
In this section, the plane adjustment algorithm is proposed to convert the obtained vertical strips into prisms, while useless environmental details will be removed. Since the gap shape on the obstacle surface is arbitrary and unknown, it will take a long time to search every gap and determine whether the UAV can pass. To avoid this, a two-stage adjustment method is presented: first, fill the gap in the vertical direction, and then, convert the concave surface in the horizontal direction. These two main procedures are elaborated in Sections 5.1 and 5.2, respectively.

Vertical Gap Filling
The vertical strips belonging to the same obstacle can be regarded as a wall with a certain thickness as shown in Figure 7, in which the translucent blue cylinders represent the vertical strips. The height of this wall is determined by the maximum value Y max and the minimum value Y min of the vertical strips in the vertical direction. Obviously, if a narrow gap on the wall is smaller than the minimum size of the passable region, this gap should be filled, as these details are unnecessary for UAV navigation. As a result, the main work of removing unnecessary gaps is to obtain the maximum gap area among the vertical strips.
It is not difficult to find that, in the vertical direction, the heights of the gaps are given by the relative complements of each vertical strip within [Y_min, Y_max]. Specifically, there exist two relative complements for each strip, i.e., the ranges below and above the occupied range (cf. the cylinder with the dotted line in Figure 7). If the ranges of these two relative complements are less than H_s, the relative complements should be filled. Otherwise, the size of the gap can be obtained by calculating the intersection range of the adjacent complements in the vertical direction. Then, a gap smaller than the minimum size of the passable region will be filled, while the reserved gaps split the wall into multiple clusters. (If the width of the wall is less than the minimum width W_s of the passable region, there is no need to search inside this cluster. Therefore, a binary searching strategy is adopted to calculate the complements of the vertical strips, which can greatly reduce the amount of calculation.) Hence, the vertical strips inside one cluster are filled to the same height.

Figure 7. Illustration of the vertical gap filling method.
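The complement-intersection idea can be sketched with simple interval arithmetic. The function name and the numeric wall are illustrative; a real implementation would add the binary-search shortcut mentioned above.

```python
# Sketch of the vertical gap test via interval complements.
def shared_gaps(strips, y_min, y_max, H_s):
    """strips: list of (Y_b, Y_t) occupied ranges, one per vertical strip.
    Intersects the complements of all strips within [y_min, y_max] and
    keeps only the gap bands at least H_s high; smaller bands are filled."""
    gaps = [(y_min, y_max)]
    for y_b, y_t in strips:
        nxt = []
        for g_lo, g_hi in gaps:
            # complement of the occupied range [y_b, y_t] inside this band
            if y_b > g_lo:
                nxt.append((g_lo, min(y_b, g_hi)))
            if y_t < g_hi:
                nxt.append((max(y_t, g_lo), g_hi))
        gaps = [(lo, hi) for lo, hi in nxt if hi > lo]
    # fill (drop) any band smaller than the passable height H_s
    return [(lo, hi) for lo, hi in gaps if hi - lo >= H_s]

strips = [(0.0, 2.0), (0.0, 1.9), (3.6, 6.0)]  # a 1.6 m-high opening
print(shared_gaps(strips, 0.0, 6.0, H_s=1.0))  # [(2.0, 3.6)]
```

Only gap bands shared by every strip of the wall survive, which matches the intuition that a passable opening must extend across the whole cluster.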

Concave Surface Converting
Based on the vertical strips obtained in the previous subsection, a concave surface converting algorithm is presented to remove the concave surface according to the size of the passable region of the UAV and obtain the compact obstacle model.
Since the vertical strips in a cluster share the same height after the vertical filling adjustment, plane extraction is equivalent to a line segment extraction process with a certain height. However, vertical strips belonging to an uneven surface may be split into more scattered groups, which makes the obstacle model less compact. Inspired by the split and merge algorithm developed in [41] (the core idea of the split and merge algorithm is to iteratively split the entire cluster at the point with the maximum fitting error and merge adjacent parallel line segments), a concave surface converting algorithm is proposed to model uneven surfaces and reduce irrelevant detailed information. Whether a redundant node should be deleted can be determined by the vector cross product and the distances among three adjacent points. Specifically, for three adjacent vertical strips S_{i−1}, S_i, and S_{i+1}, the sign of the cross product of the vectors (S_i − S_{i−1}) and (S_{i+1} − S_i) determines the concavity at S_i. The above steps are repeatedly performed until no improvement can be made. The detailed steps of the concave surface converting algorithm are illustrated in Algorithm 2.
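The redundant-node test can be sketched in top view. This is an illustration under assumptions: the strip is reduced to its (X, Z) position, a negative cross product is treated as a concave corner (the actual sign convention depends on the contour orientation), and the depth of the dent is compared against the passable width.

```python
# Sketch of the redundant-node test for three adjacent strips (top view).
def removable(p_prev, p, p_next, w_s):
    """The middle strip p is redundant if the corner is concave (sign of
    the cross product) and its deviation from the chord
    p_prev -> p_next is smaller than the passable width w_s."""
    ax, az = p[0] - p_prev[0], p[1] - p_prev[1]
    bx, bz = p_next[0] - p[0], p_next[1] - p[1]
    cross = ax * bz - az * bx  # concavity indicator (assumed sign)
    # perpendicular distance from p to the chord p_prev -> p_next
    cx, cz = p_next[0] - p_prev[0], p_next[1] - p_prev[1]
    dist = abs(cx * (p[1] - p_prev[1]) - cz * (p[0] - p_prev[0])) \
        / max((cx * cx + cz * cz) ** 0.5, 1e-9)
    return cross < 0 and dist < w_s  # shallow concave corner -> remove

print(removable((0.0, 0.0), (1.0, 0.3), (2.0, 0.0), w_s=2.0))  # True
```

A dent deeper than w_s is kept, since the UAV could fit inside it; only details smaller than the passable region are smoothed away.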

Adjacent Plane Refinement
Due to different observation angles and fitting errors of the extracted obstacle contours, the adjacent planes belonging to the same object may overlap or intersect. With this consideration, a plane refinement strategy is presented to extend the adjacent planes, remove redundant parts, and then obtain an interconnected prism profile. Specifically, the adjacent planes are supposed to be merged if the following two conditions are satisfied: 1. the angle between the two planes is less than the threshold ϕ_0, and 2. the distance between the boundaries of the adjacent sides is smaller than W_s.

Algorithm 2 Concave surface converting algorithm.
1: Initialize the line set L = ∅, put all strips into the set Ω_1, and R = 1.
2: for each Ω_r do
3:    if |Ω_r| > 3 then
4:        Fit the strips in Ω_r into a line according to their horizontal locations with the fitting error ε_l.
5:        if ε_l <= ε_th then
6:            Put this line into L, and remove the corresponding strips from Ω_r.
7:        else
8:            Split the strips at the maximum fitting error into Ω_{R+1} and Ω_{R+2}; R = R + 2.
9:        end if
10:   else if the middle strip S_i of Ω_r is redundant then
11:       Delete S_i; put the line composed of S_{i+1} and S_{i−1} into L.
12:   end if
13: end for

To improve the efficiency of the plane fitting, the parameters of a new plane can be calculated based on the original parameters as follows:

n_s = n_s^(1) + n_s^(2),  Q̄ = ( n_s^(1) · Q̄^(1) + n_s^(2) · Q̄^(2) ) / n_s  for each Q̄ ∈ {X̄, Z̄, X̄Z̄, X̄X̄},  (17)

where η = (n_s, X̄, Z̄, X̄Z̄, X̄X̄); here, n_s represents the number of the strips that are fitted into the plane, and X̄, Z̄, X̄Z̄, and X̄X̄ denote the corresponding average coordinate values of the vertical strips. Hence, the newly acquired plane can be quickly merged with the established compact model without storing all the merged planes.
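Since η stores per-plane averages together with the strip count, merging two planes reduces to a count-weighted combination of their statistics. The sketch below illustrates this bookkeeping; the function name and the exact layout of η are assumptions for the example.

```python
# Sketch of the incremental plane-parameter merge: eta = (n_s, X, Z, XZ, XX),
# where the last four entries are running averages over the fitted strips.
def merge_eta(eta1, eta2):
    n1, n2 = eta1[0], eta2[0]
    n = n1 + n2
    merged = [n]
    for a, b in zip(eta1[1:], eta2[1:]):  # X, Z, XZ, XX averages
        merged.append((n1 * a + n2 * b) / n)  # count-weighted average
    return tuple(merged)

eta1 = (10, 1.0, 2.0, 2.0, 1.0)
eta2 = (30, 2.0, 4.0, 8.0, 4.0)
print(merge_eta(eta1, eta2))  # (40, 1.75, 3.5, 6.5, 3.25)
```

Because only these sufficient statistics are kept per plane, a newly acquired plane can be folded into the global model in constant time, without revisiting the strips of previously merged planes.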

Computational Complexity
In the plane extraction process, the computational complexity of the vertical gap filling step is O(log n s ), and the complexity of the concave surface converting step is O((n s ) 2 ). As O(log n s ) is negligible as compared to O((n s ) 2 ), the complexity of the overall proposed scheme is O(n)+O((n s ) 2 ).

Experiment and Analysis
In this section, numerical results are provided to evaluate the performance of the proposed algorithms. The simulation results on the AirSim simulator (The AirSim simulator is a photorealistic game engine with cutting-edge graphics features such as high-resolution textures and realistic lighting and shadows, where the depth image can be obtained in real time) and the experiment results on the developed drone platform are presented in Sections 6.1 and 6.2, respectively. Specifically, the proposed scheme in this work was implemented in C++11, and the simulations were performed on an Intel Core i7-8700K processor. Furthermore, the drone platform was built based on DJI Matrice 100, which is equipped with a ZED binocular vision sensor and can build the compact environment representation model of the real world as shown in Figures 1 and 8. The image size obtained from the AirSim simulation platform [42] is 640 × 480, and the image size of the ZED sensor is 672 × 376.
In the simulation and experiment, the minimum height H_s = 1 m and the minimum width W_s = 2 m of the passable region of the UAV are set according to the size of the drone platform (the height of the drone platform is 0.7 m, and the width of the drone platform is 1.4 m; H_s and W_s can be set according to the size of the drone platform and the accuracy of the flight controller). The height division range is set to 2 m according to the resolution requirements [25]. Furthermore, the coefficient of the measurement error is k_e = 0.01, and the threshold of the fitting error is ε_l = 0.2 m (based on our experience in the experiments, when ε_l changes within [0.1 m, 0.3 m], there is little difference in the obtained models). In addition, the threshold of the roll angle is φ_0 = 2°. The proposed OSAPE scheme is compared with the 3D prisms (3DP) scheme [25] in terms of storage, precision, and processing time.

AirSim Simulation
In this subsection, the compact environment representation model on the AirSim simulator is presented first (A video showing the simulations can be found at https://youtu.be/TketUNf0ers), followed by the comparison of storage, precision, and processing time.

Compact Model
The proposed scheme is evaluated on different kinds of obstacles in this subsection, including buildings, trees, and other man-made objects, some of which are shown in Figures 9 and 10 (the colors of the planes are labeled manually for the sake of clarity), where the coordinate unit of the 3D compact environment model is meters. For example, it can be seen from Figure 9 that irregular objects (e.g., stones) are modeled as a set of planes surrounding them, and that gaps in the field of view smaller than the passable region of the drone are filled. In Figure 10, the left, middle, and right columns are the RGB images, the disparity data, and the corresponding compact environment representation models, respectively. Specifically, Figure 10 shows the models obtained when the drone is at different pitch and roll angles, which indicates that the proposed scheme is adaptable to large tilt angles.
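A multi-layer polygonal prism model of this kind can be stored as a list of layers, each a closed 2-D polygon extruded between two heights. The sketch below is an assumed data layout for illustration; the paper does not specify the actual C++ data structures.

```python
from dataclasses import dataclass, field

@dataclass
class PolygonalPrism:
    """One obstacle layer: a closed 2-D polygon extruded between two heights."""
    vertices: list   # [(x, y), ...] polygon outline in meters
    z_min: float     # bottom of the layer (m)
    z_max: float     # top of the layer (m)

@dataclass
class CompactModel:
    """Multi-layer polygonal prism environment representation."""
    prisms: list = field(default_factory=list)

    def add(self, vertices, z_min, z_max):
        self.prisms.append(PolygonalPrism(vertices, z_min, z_max))

# Example: one rectangular building footprint occupying the 0-2 m height layer.
model = CompactModel()
model.add([(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)], z_min=0.0, z_max=2.0)
```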

Memory Usage and Model Precision
The proposed scheme is compared with the 3DP scheme [25] for storage consumption and model precision in this subsection (In 3DP, the grid map is obtained from the depth data, and then the contour of the compact model is extracted based on this grid map). Different grid resolutions of the 3DP scheme were set for a clearer comparison, i.e., 0.1 m, 0.2 m, 0.4 m, and 0.8 m.
First, the storage consumption was evaluated on the AirSim simulator, where the compact obstacle model was obtained in real time. It can be seen from Figure 11 that, as the grid size increases, the storage consumption of the 3DP scheme decreases, which conforms to the discussion in the Introduction. In the AirSim simulator, a compact environment representation model is constructed by gradually merging and fusing the models obtained from multiple frames. The storage consumption for structured and unstructured obstacles is compared in Figure 11a,b, respectively (structured obstacles refer to obstacles with planar surfaces, while unstructured obstacles refer to obstacles with rough surfaces). For structured obstacles, the proposed OSAPE scheme extracts the planes without quantization and removes unrelated obstacle details; thus, the number of planes is smaller than that obtained by the 3DP scheme. For unstructured obstacles, the number of planes is slightly larger than that of the 3DP scheme with 0.8 m grids and far smaller than that of the 3DP scheme with 0.1 m grids.
In addition, the modeling accuracy is defined by the deviation between the established compact model and the actual obstacle surfaces. However, the location of the obstacles is unknown, and it is difficult to accurately measure the position and shape of the obstacle surfaces. Therefore, for quantitative analysis, the modeling error is approximated by the average distance from the points of the obstacle surfaces to the obtained compact model. Based on this metric, the modeling errors of the proposed scheme and the 3DP scheme are compared in Figure 12. For the 3DP scheme, as the grid size increases, the modeling error also increases due to the quantization error. For structured obstacles, the modeling error of the proposed scheme is much lower than that of the 3DP scheme even with 0.1 m grids. The main reason is that the grids in the 3DP scheme cannot be placed arbitrarily, and the extracted polygon depends on the grid borders. For unstructured obstacles, however, the modeling error of the proposed scheme is slightly larger than that of the 3DP scheme with 0.1 m grids, because the proposed scheme eliminates irrelevant obstacle details and thereby introduces modeling error.
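The point-to-model distance metric above can be sketched as follows, treating each face of the compact model as an (unbounded) plane in implicit form n·x + d = 0; restricting the distance to the bounded polygonal face would refine this. The function name and interface are our own, chosen for illustration.

```python
import numpy as np

def modeling_error(points, planes):
    """Approximate modeling error: mean distance from obstacle-surface
    points to the nearest plane of the compact model.

    points : (N, 3) array-like of obstacle-surface points
    planes : list of (n, d), each plane given implicitly as n . x + d = 0
    """
    points = np.asarray(points, dtype=float)
    dists = np.full(len(points), np.inf)
    for n, d in planes:
        n = np.asarray(n, dtype=float)
        scale = np.linalg.norm(n)              # normalize in case n is not unit length
        dists = np.minimum(dists, np.abs(points @ n + d) / scale)
    return float(dists.mean())

# Two points against the plane z = 0: distances 1.0 and 0.0, mean 0.5.
err = modeling_error([(0, 0, 1.0), (1, 1, 0.0)], [((0, 0, 1), 0.0)])
```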

Processing Time
In this subsection, the processing time of the proposed scheme is compared with that of the 3DP scheme in the AirSim simulator. The processing times of the 3DP scheme with 0.1 m grids and 0.8 m grids are on average about 226.1 ms and 17.6 ms, respectively, as shown in Figure 13a (since no source code is provided, the 3DP algorithm was implemented according to [25]). It can also be seen from Figure 13a that, as the grid size increases, the processing time of one frame gradually decreases. As shown in Figure 13b, the proposed scheme takes only 18.6 ± 3.0 ms to process one frame of depth sensor data without downsampling in the simulations. The processing time of the proposed scheme can be reduced by downsampling the depth image in the column direction: as the sampling interval increases, the processing time gradually decreases. Here, the sampling interval refers to the column interval used when downsampling the depth data only in the column direction, which provides a way to improve the efficiency of the proposed scheme.
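Column-direction downsampling itself is a one-line stride over the depth image. A minimal sketch follows (the array shape matches the 640 × 480 AirSim images; the function name is ours):

```python
import numpy as np

def downsample_columns(depth, interval):
    """Keep every `interval`-th column of the depth image.

    A larger interval reduces per-frame processing time at the cost of
    modeling accuracy and noise robustness.
    """
    return depth[:, ::interval]

frame = np.zeros((480, 640))               # rows x columns, as in the AirSim images
reduced = downsample_columns(frame, 4)     # shape becomes (480, 160)
```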
Furthermore, the anti-noise performance of the proposed scheme is evaluated by adding random noise of different amplitudes. Figure 13 illustrates that, as the sampling interval increases, the efficiency of the proposed scheme improves significantly, but the modeling accuracy and anti-noise performance decrease. Therefore, the sampling interval can be set according to the requirements on efficiency and precision.

Experiment on the Developed Platform
In this subsection, the proposed OSAPE scheme is evaluated on the developed drone platform. Specifically, the platform flies at altitudes ranging from a few meters to several tens of meters and converts the depth sensor data into simplified obstacle models in real time. The modeling results are shown in Figure 8, where the left, middle, and right columns are the RGB images, the disparity data, and the corresponding compact environment representation models, respectively. It can be seen from Figure 8 that different obstacles, including buildings, street lights, and trees, are converted into compact planar models. The red dotted rectangles in Figure 8c,d represent gaps that the drone can pass through; their extracted sizes are 2.7 m × 2.0 m and 5.7 m × 2.0 m, while the actual sizes are 2.9 m × 2.1 m and 5.8 m × 2.2 m, respectively. The extracted gap is slightly smaller than the actual one due to the conservative extraction strategy.
The processing time of the proposed scheme is on average about 89 ms on the developed drone platform, so the compact environment model can be obtained at 10-12 Hz. Since the processing time has already been compared via simulations (cf. Section 6.1.3), the processing time comparison on the developed platform is omitted. In addition, the storage consumption is compared based on the modeling results derived from real flight data collected by the drone platform. Specifically, the proposed scheme reduces the number of planes in the field of view by up to 40% as compared to the 3DP scheme with 0.8 m grids, as shown in Figure 14.

Application
In this subsection, the obstacles of different shapes shown in Figure 15, including cylinders, trapezoidal prisms, and spheres, are converted into multi-layer polygonal prisms. Furthermore, the obtained compact environment representation model can be used by traditional path planning algorithms, such as rapidly exploring random trees (RRT) [43] and probabilistic road maps (PRM) [44]. Here, the commonly used RRT method is adopted to verify the validity of the obtained compact model. RRT searches non-convex spaces and generates feasible paths by incrementally building a space-filling tree of random samples. Based on the position of the obtained compact model, it can be determined whether a sampled path point is feasible. As shown in Figure 16, the pink dots represent the starting and target locations, and the green dots represent the sampled path points on the space-filling tree. In addition, the red and black lines correspond to the planned path and the smoothed path, respectively. It is worth noting that gaps the UAV cannot pass through are eliminated, which prevents the path planning algorithm from generating unsafe paths through them.
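The feasibility test for a sampled path point can be sketched as a point-in-prism check against the compact model, here with a standard ray-casting point-in-polygon test plus rejection sampling for RRT nodes. This is an illustration of the idea under our own naming, not the authors' implementation.

```python
import random

def point_in_prism(p, polygon, z_min, z_max):
    """True if 3-D point p lies inside the polygon extruded between z_min and z_max.

    Uses the ray-casting point-in-polygon test on the (x, y) footprint.
    """
    x, y, z = p
    if not (z_min <= z <= z_max):
        return False
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # count crossings of a horizontal ray from p to the right
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def sample_feasible(prisms, bounds, rng=random):
    """Rejection-sample a collision-free RRT node against the compact model.

    prisms : list of (polygon, z_min, z_max)
    bounds : [(lo, hi), (lo, hi), (lo, hi)] sampling box for x, y, z
    """
    while True:
        p = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        if not any(point_in_prism(p, poly, z0, z1) for poly, z0, z1 in prisms):
            return p
```

A sampled point is kept only if it lies outside every prism of the compact model, which is exactly the feasibility check the planner needs.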

Conclusions and Future Works
In this paper, the problem of building a compact environment representation model for UAV navigation is studied, where the depth data from the onboard sensor are analyzed and converted into multi-layer polygonal prisms. By analyzing the probability density function of the normalized disparity data, a novel vertical strip extraction algorithm is proposed to improve the adaptability to rough obstacle surfaces and obtain more accurate obstacle locations. Furthermore, a plane adjustment algorithm is presented to quickly eliminate irrelevant environmental details, minimizing redundant information and producing rectangular outlines of the obstacles. By combining these two algorithms, the proposed modeling scheme converts the depth sensor data into simplified prisms in real time. In addition, a drone platform is developed to build compact environment representation models in the real world. The experimental results demonstrate that the proposed scheme consumes less storage than the baseline algorithm and provides higher accuracy when modeling structured obstacles.
Extending the proposed scheme to complex terrains and dynamic scenes, exploring collaborative multi-UAV modeling strategies, and investigating more efficient storage and search mechanisms for large mission areas are all worthwhile directions for future work.