Urban Scene Vectorized Modeling Based on Contour Deformation

Abstract: Modeling urban scenes automatically is an important problem for both GIS and non-GIS specialists, with applications such as urban planning, autonomous driving, and virtual reality. In this paper, we present a novel contour deformation approach to generate regularized and vectorized 3D building models from the orthophoto and digital surface model (DSM). The proposed method has four major stages: dominant direction extraction, target align direction assignment, contour deformation, and model generation. To begin with, we extract the dominant directions of each building contour in the orthophoto. Then every edge of the contour is assigned one of the dominant directions via a Markov random field (MRF). Taking the assigned direction as the target, we define a deformation energy with the Advanced Most-Isometric ParameterizationS (AMIPS) to align the contour to the dominant directions. Finally, the aligned contour is simplified and extruded into 3D models. Through the alignment deformation, we are able to straighten the contour while keeping the sharp turning corners. Experiments on a public dataset show that our contour deformation based urban modeling approach is accurate and robust compared with the state of the art.


Introduction
In recent decades, there has been significant development in remote sensing technology. From costly satellites to low-cost unmanned aerial vehicles (UAVs), from high-resolution multi-spectral images to LiDAR, we can capture both 2D and 3D data of our environment easily. Therefore, extracting semantic, geometric, and topological information from these heterogeneous data automatically has become an important research field in both the GIS and non-GIS communities. Obtaining a precise and compact geometric representation of a large-scale urban scene is one of the core problems in urban reconstruction [1], which is referred to as vectorized modeling by many researchers. It not only has direct applications in GIS, like urban planning, navigation, and real estate, but is also beneficial for model storage, transmission, and rendering. Apart from that, the structured and vectorized representation of an object is also in popular demand in 3D computer vision. In this paper, we aim to generate compact building models from the orthophoto and digital surface model (DSM), as shown in Figure 1.
Aerial images and LiDAR point clouds are two common data types for mapping the outdoor environment. LiDAR can measure distance with great precision but the irregular form of point cloud brings an extra burden to further processing. With the increasing amount of literature on deep learning, semantic information can be easily extracted from images. Meanwhile, structure-from-motion (SfM) and multi-view-stereo (MVS) enable us to reconstruct the scene from images with ease. Therefore, we choose the orthophoto and DSM generated from aerial images as input.
Semantic segmentation [2,3] is the fundamental part of our system. With the help of deep learning, geographic information practitioners are able to predict the semantics of aerial images effortlessly. DeepLab [4,5] is one of the most acknowledged neural network architectures, featuring dilated convolution, spatial pyramid pooling, and an encoder-decoder structure. Liu et al. [3] use a self-cascaded architecture and achieve the highest overall accuracy on the ISPRS 2D semantic labeling contest [6]. Typically, the boundary of each semantic region is a chain of rasterized pixels that is too dense for a modern GIS system. The Ramer-Douglas-Peucker (RDP) algorithm [7] is widely used to simplify a polyline: it approximates a curve composed of line segments with fewer points by iteratively decimating the vertex with the minimum distance to the polyline. Poullis et al. [8] use a Gaussian mixture model and a Markov random field (MRF) to classify the boundary points before applying the RDP algorithm. 3D geometry is another important aspect of our system. Modern SfM and MVS pipelines [9-16] can generate accurate models from images. However, the dense triangle surface meshes or point clouds are not suitable for modern GIS systems. Turning these dense outputs into a compact form, also known as vectorization, has been attracting increasing attention [17-23]. Generally, these methods can be classified into two categories: (1) bounding volume slicing and selection [17,18,24], which slice the bounding volume with planar primitives and select the polytopes inside the building or the faces on the building; (2) contour regularization and extrusion [19,21,25,26], which usually regularize the contours of a building first under the assumption of a 2.5D scene, then simplify the contours and extrude them into 3D space. The levels of detail (LODs) defined by CityGML [27] are widely recognized as the standard for how the urban environment should be described, represented, and exchanged in the digital world.
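For reference, the classic recursive form of the RDP algorithm [7] mentioned above can be sketched in a few lines of Python (a minimal sketch; the function name `rdp` and the tolerance parameter `epsilon` are our own, not taken from the cited work):

```python
import math

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker polyline simplification (recursive form).

    Keeps the two endpoints, finds the interior point farthest from the
    chord joining them, and recurses on both halves while that distance
    exceeds `epsilon`; otherwise the whole span collapses to the chord.
    """
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy) or 1.0
    # Perpendicular distance from each interior point to the chord.
    dists = [abs(dy * (x - x1) - dx * (y - y1)) / norm
             for x, y in points[1:-1]]
    i_max = max(range(len(dists)), key=dists.__getitem__)
    if dists[i_max] <= epsilon:
        return [points[0], points[-1]]
    split = i_max + 1
    left = rdp(points[:split + 1], epsilon)
    right = rdp(points[split:], epsilon)
    return left[:-1] + right  # drop duplicated split point
```

For example, `rdp([(0, 0), (1, 0.05), (2, 0), (2, 1)], 0.1)` drops the near-collinear middle point but keeps the corner at `(2, 0)`. Note that, as discussed later, the output is always a subset of the input vertexes, which is exactly the limitation our deformation approach avoids.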
Both [17,21] focus on generating models adhering to the CityGML [27] standard. We follow the second contour regularization path and generate LOD0 and LOD1 models as shown in Figure 1d.
In this paper, we propose a novel contour deformation based urban vectorized modeling method, as shown in Figure 1. We take the segmented orthophoto as input and first extract the contours of each building. Then the contour normals are smoothed with a bilateral filtering, and dominant directions are detected from them with the RANSAC algorithm. After that, each edge is assigned one of the dominant directions through an MRF. By defining a deformation energy on the triangulation of the contour polygon, we can align the boundary edges to the dominant directions. Finally, the contour is vectorized into the polygon LOD0 model and extruded to the LOD1 model using the DSM. Our main contributions include:
- An effective bilateral smoothing and RANSAC based dominant direction detection method.
- An efficient deformation energy optimization defined on the contour triangulation to align the boundary to the target directions.
- A novel deformation based building modeling method, which enables us to generate compact LOD0 and LOD1 models from the orthophoto and DSM.

Overview
The proposed modeling method takes the orthophoto and DSM as input and outputs vectorized LOD0 and LOD1 models. As shown in Figure 2, our algorithm has four major stages: dominant direction detection, align direction assignment, contour deformation, and model generation. Generally, an urban scene contains elements including roads, buildings, and vegetation [17,21]. Among them, the building is the most important category for urban vectorized modeling; therefore, we focus only on buildings in this paper. In the following sections, each building is reconstructed separately, with its related data isolated from the orthophoto and DSM.

Dominant Directions Detection
Strong directional regularity is a common property of buildings. The Manhattan assumption, which assumes that the whole scene shares three global orthogonal directions, is widely used in urban modeling [24,26]. The less restrictive Atlanta assumption [28] requires only that each building has its own local orthogonal directions. Here we place no restriction on the dominant directions and detect them from the bilaterally smoothed contour normals with the RANSAC algorithm.

Figure 3a shows the building contour C = {c_i} extracted from the segmentation in Figure 1. Due to the rasterization of the image, the boundary normals {n_i} of the boundary edges {e_i = c_i c_{i+1}} are limited to a few discretized directions [8]. To alleviate this problem, we propose a bilateral smoothing of the normals, weighting by both the angle similarity and the geodesic distance:

    n̂_i = normalize( Σ_{n_k ∈ N_i} w(i, k) · n_k ),

where N_i = {n_k | dist(e_i, e_k) < thre_d} is the set of neighboring normals and dist(e_i, e_k) measures the distance between e_i and e_k along the contour. The composite weight w(i, k) is given by

    w(i, k) = exp(−dist(e_i, e_k)² / (2σ_d²)) · exp(−∠(n_i, n_k)² / (2σ_a²)),

where ∠(n_i, n_k) represents the angle between n_i and n_k. In our experiments, the distance variation σ_d and the angle variation σ_a are set to 0.5 m and 10°, respectively. Figure 3b shows the effect of our distance and angle weighted bilateral smoothing.

With the smoothed contour normals, the RANSAC algorithm is used to detect the dominant directions D = {d_i}, with the inlier set of a candidate direction d defined as

    I = {n̂_k | ∠(n̂_k, d) < thre_a}.

Specifically, in the i-th iteration, with the remaining normals R_i = {n̂_k} \ ∪_{j=1}^{i−1} I_j, we keep generating candidate directions {d_i^c} and collecting their corresponding inlier sets {I_i^c} until the missing probability [29] p_m falls below a threshold thre_p. The best candidate I_i = argmax{|I_i^c|} is selected, and the i-th dominant direction d_i is computed as the normalized average of the normals in I_i. In our experiments, the iteration stops when |I_i| is less than |{n_i}|/16. Figure 3c shows the directions detected from the smoothed contour normals.
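The bilateral normal smoothing described above can be sketched as follows, assuming each boundary edge is summarized by the arclength position of its midpoint on the closed contour (the function name `smooth_normals` and this parameterization are our own choices; σ_d, σ_a, and the neighborhood radius take illustrative values consistent with the text):

```python
import math

def smooth_normals(normals, arclen, total_len,
                   sigma_d=0.5, sigma_a=math.radians(10.0), thre_d=1.5):
    """Bilateral smoothing of contour edge normals.

    normals   : unit 2D normals (nx, ny), one per boundary edge
    arclen    : arclength position of each edge midpoint along the contour
    total_len : total length of the (closed) contour
    Each normal is replaced by a weighted average of its neighbors, with
    Gaussian weights on the along-contour distance and on the angle between
    the normals, then renormalized to unit length.
    """
    out = []
    for i, (nix, niy) in enumerate(normals):
        sx = sy = 0.0
        for k, (nkx, nky) in enumerate(normals):
            d = abs(arclen[i] - arclen[k])
            d = min(d, total_len - d)   # geodesic distance on a closed loop
            if d >= thre_d:
                continue                # outside the neighborhood N_i
            dot = max(-1.0, min(1.0, nix * nkx + niy * nky))
            ang = math.acos(dot)        # angle between n_i and n_k
            w = (math.exp(-d * d / (2 * sigma_d ** 2))
                 * math.exp(-ang * ang / (2 * sigma_a ** 2)))
            sx += w * nkx
            sy += w * nky
        norm = math.hypot(sx, sy) or 1.0
        out.append((sx / norm, sy / norm))
    return out
```

Because the angular weight decays quickly beyond σ_a, a normal that deviates slightly from its neighbors is pulled toward their shared direction, while a normal across a genuine 90° corner is left essentially untouched.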

Align Direction
With the detected dominant directions D, we adopt an MRF formulation similar to that of [8] to assign each boundary edge e_i a target direction d_i, which is later used to drive the deformation in Section 2.4.
For each building contour C, an undirected dual graph G = (V, E′) is constructed on it: each edge e_i is treated as a vertex in V, and each vertex c_i is treated as an edge in E′. Accordingly, our label set is D = {d_i} and a labeling configuration is f : V → D. The data term measures the difference between the observed smoothed normal n̂_i and the mapped label f(e_i):

    E_data(f) = Σ_{e_i ∈ V} ∠(n̂_i, f(e_i)).

The smoothness term favors connected edges e_i, e_j ∈ V with similar orientations f(e_i), f(e_j) ∈ D and penalizes otherwise:

    E_smooth(f) = Σ_{(e_i, e_j) ∈ E′} exp(−∠(n̂_i, n̂_j)² / (2δ²)) · [f(e_i) ≠ f(e_j)],

where δ controls the smoothness uncertainty. Intuitively, if two neighboring edge normals n̂_i and n̂_j are close, there is a higher probability that f(e_i) and f(e_j) are similar. The last label term favors a lower number of labels in a labeling configuration f:

    E_label(f) = Σ_{i=1}^{|D|} ζ_i(f) · (1 − h_i),

where h_i = |I_i| / Σ_{j=1}^{|D|} |I_j| measures the relative portion of a dominant direction on the contour, and ζ_i(f) indicates whether label d_i exists in f:

    ζ_i(f) = 1 if d_i ∈ f(V), and 0 otherwise.

The label term penalizes using a label whose direction covers only a small portion h_i of the contour, and thus tends to keep the few most significant directions in D. The overall energy function for the graph cut is then given by

    E(f) = E_data(f) + κ_1 E_smooth(f) + κ_2 E_label(f),

where κ_1 and κ_2 are the balance coefficients of the smoothness term and the label term, respectively; they are set to 1 and 10 in the following experiments. As proven in [8], the formulation is regular and can be minimized by the α-expansion algorithm [30,31]. Figure 3d shows how each boundary edge is assigned a target dominant direction.
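To make the labeling objective concrete, the following sketch evaluates a three-term energy of this shape for a given configuration. The exact smoothness and label costs here are plausible instantiations of the terms described above rather than the paper's exact forms, and in practice the energy is minimized with α-expansion rather than evaluated by hand:

```python
import math

def ang(u, v):
    """Angle between two unit 2D vectors, with the dot product clamped."""
    dot = max(-1.0, min(1.0, u[0] * v[0] + u[1] * v[1]))
    return math.acos(dot)

def mrf_energy(normals, labels, dirs, portions,
               kappa1=1.0, kappa2=10.0, delta=math.radians(10.0)):
    """Labeling energy E(f) = E_data + k1 * E_smooth + k2 * E_label for a
    closed contour whose edge i is assigned direction dirs[labels[i]].
    `portions[l]` is the relative portion h_l of dominant direction l.
    """
    n = len(normals)
    # Data term: angular deviation of each smoothed normal from its label.
    e_data = sum(ang(normals[i], dirs[labels[i]]) for i in range(n))
    # Smoothness term: neighboring edges with similar normals but different
    # labels are penalized most (Potts-like, modulated by normal similarity).
    e_smooth = 0.0
    for i in range(n):
        j = (i + 1) % n
        if labels[i] != labels[j]:
            a = ang(normals[i], normals[j])
            e_smooth += math.exp(-a * a / (2 * delta ** 2))
    # Label term: every label actually used pays a cost that grows as its
    # portion of the contour shrinks.
    e_label = sum(1.0 - portions[l] for l in set(labels))
    return e_data + kappa1 * e_smooth + kappa2 * e_label
```

With this objective, a configuration that flips a single edge to a rarely supported direction pays in all three terms at once, which matches the intended behavior of keeping only the few most significant directions.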

Deformation Formulation
This section introduces our dominant direction driven deformation formulation, which regularizes the contour and greatly helps the following modeling step. For each contour C, we use the constrained Delaunay triangulation [32] to triangulate the bounded area as shown in Figure 4. The generated mesh M = {t 1 , · · · , t l } contains vertexes {v 1 , · · · , v m } and boundary edges {e 1 , · · · , e n }.
Our goal is to align the normal n̂_i of each e_i on C to its assigned target direction under the configuration f in Section 2.3. Therefore, we have the align energy that measures the difference between the deformed edge normals n̂′_i and their target dominant directions d_i ∈ D:

    E_align = Σ_{i=1}^{n} ||n̂′_i − d_i||².

By changing the positions of {v_1, ..., v_m}, we can deform M to align the contour C to the major directions D and minimize E_align. The deformation is a piecewise linear function g : R² → R² defined on M, which maps every original triangle t = (v_p, v_q, v_r) to the output triangle t′ = (v′_p, v′_q, v′_r), as illustrated in Figure 4. We adopt the Advanced Most-Isometric ParameterizationS (AMIPS) [33] to achieve as low a distortion as possible. The mapping is an affine transformation defined on each t:

    g_t(x) = J_t x + b_t,

where J_t is a 2 × 2 affine matrix and also the Jacobian [33] of g_t. With the singular values of J_t denoted σ_1 and σ_2, the AMIPS [33] conformal and area distortion energy defined on t is

    E_amips(t) = exp( α (σ_1/σ_2 + σ_2/σ_1) + (1 − α)(σ_1 σ_2 + 1/(σ_1 σ_2)) ).

The rigid energy, measuring how rigid the mapping from the original triangle t to the deformed triangle t′ is, is defined as

    E_rigid(t) = ||J_t − R_t||_F²,

where R_t is the rotation closest to J_t. The overall deformation energy balanced by λ is then

    E_deform = Σ_{t ∈ M} ( E_amips(t) + E_rigid(t) ) + λ E_align.

To minimize E_deform, we use the MATLAB optimization toolbox; the closed-form gradient is given in Appendix A. The conformal and area distortion balance α is set to 0.5, and the alignment and deformation balance λ is set to 1 × 10^5. Figure 5 shows the process of energy minimization.
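The per-triangle quantities driving the deformation can be sketched as follows: the Jacobian of the affine map between a source and a deformed triangle, its singular values, and an AMIPS-style distortion that blends conformal and area terms with α. Helper names are our own, and this is an illustrative sketch of the measures involved, not the paper's optimized implementation:

```python
import math

def jacobian(tri, tri_def):
    """2x2 Jacobian J of the affine map taking triangle `tri` to `tri_def`,
    i.e. J = E' @ inv(E) where the columns of E / E' are the edge vectors
    of the source / deformed triangle."""
    (px, py), (qx, qy), (rx, ry) = tri
    (Px, Py), (Qx, Qy), (Rx, Ry) = tri_def
    a, b = qx - px, rx - px     # source edge matrix E = [[a, b], [c, d]]
    c, d = qy - py, ry - py
    A, B = Qx - Px, Rx - Px     # deformed edge matrix E'
    C, D = Qy - Py, Ry - Py
    det = a * d - b * c         # assumes a non-degenerate source triangle
    return (((A * d - B * c) / det, (-A * b + B * a) / det),
            ((C * d - D * c) / det, (-C * b + D * a) / det))

def singular_values(J):
    """Singular values of a 2x2 matrix from its Frobenius norm and
    determinant (s1^2 + s2^2 = ||J||_F^2, s1 * s2 = |det J|)."""
    (a, b), (c, d) = J
    f = a * a + b * b + c * c + d * d
    det = a * d - b * c
    disc = math.sqrt(max(0.0, f * f - 4 * det * det))
    s1 = math.sqrt((f + disc) / 2)
    s2 = math.sqrt(max(0.0, (f - disc) / 2))
    return s1, s2

def amips_energy(J, alpha=0.5):
    """AMIPS-style distortion: exponential of the alpha-blended conformal
    (s1/s2 + s2/s1) and area (s1*s2 + 1/(s1*s2)) terms; it is minimal when
    J is a rotation, i.e. s1 = s2 = 1.  Assumes a non-degenerate map."""
    s1, s2 = singular_values(J)
    conf = s1 / s2 + s2 / s1
    area = s1 * s2 + 1.0 / (s1 * s2)
    return math.exp(alpha * conf + (1 - alpha) * area)
```

The exponential makes near-degenerate triangles prohibitively expensive, which is what lets the optimization straighten boundary edges without collapsing the interior mesh.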

Model Generation
After the deformation optimization in Section 2.4, the building contour C becomes C′ and is aligned to the dominant directions. LOD0 and LOD1 models adhering to the CityGML [27] standard can then easily be extracted from it. Specifically, we traverse the contour C′ and keep only the corner vertexes {c′_i}, i.e., those with a large turning angle between their neighboring edges. Connecting the corner vertexes gives us the simplified polygon of the LOD0 model. The LOD1 model is given by extruding the LOD0 contour to the averaged height of the building [17,21]. Figure 6 shows the process of model generation.
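The model generation step, corner detection on the aligned contour followed by extrusion, can be sketched as follows (the angle threshold value and the helper names are our own assumptions):

```python
import math

def keep_corners(contour, angle_thresh=math.radians(20.0)):
    """Keep only vertexes of a closed contour where the travel direction
    turns by more than `angle_thresh`; vertexes on straightened runs are
    dropped.  On the aligned contour, edges snap to the dominant
    directions, so true corners stand out sharply."""
    corners = []
    n = len(contour)
    for i in range(n):
        px, py = contour[i - 1]
        cx, cy = contour[i]
        nx, ny = contour[(i + 1) % n]
        u = (cx - px, cy - py)              # incoming edge
        v = (nx - cx, ny - cy)              # outgoing edge
        lu = math.hypot(*u) or 1.0
        lv = math.hypot(*v) or 1.0
        dot = (u[0] * v[0] + u[1] * v[1]) / (lu * lv)
        turn = math.acos(max(-1.0, min(1.0, dot)))
        if turn > angle_thresh:
            corners.append(contour[i])
    return corners

def extrude_lod1(lod0, height):
    """Extrude a LOD0 footprint polygon to a LOD1 prism at the building's
    averaged height; returns the bottom and top rings."""
    bottom = [(x, y, 0.0) for x, y in lod0]
    top = [(x, y, height) for x, y in lod0]
    return bottom, top
```

Because the deformation has already straightened the edges, this step reduces to discarding (near-)collinear vertexes; no further fitting is needed.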

Results and Discussion
The proposed method is implemented in C++. The max-flow library [30,31] is used to solve the MRF in Section 2.3, and the MATLAB optimization toolbox is used to solve the energy minimization in Section 2.4. Qualitative as well as quantitative assessments are conducted on the public dataset from ISPRS [6] in this section. Experiments show that our contour deformation optimization framework can generate regular and compact building models compared with the state of the art.

Effect of Alignment Deformation
To demonstrate the effect of the alignment deformation, we conduct a simple experiment that aligns the contour to the axes, as shown in Figure 7. The contour is extracted from our own real-world orthophoto of a high-rise building, which is sampled from the dense textured mesh reconstructed from aerial images by Pix4D [34]. Due to the high level of noise and inaccuracy introduced by the mesh reconstruction and orthophoto segmentation, the extracted building contour is uneven and jagged, as illustrated in Figure 7. To make matters worse, the corners of the small protrusion are rounded (green rectangle in Figure 7), and such corners are essential for vectorized modeling. For each border edge normal, we simply assign the nearest axis as its target direction. As shown in Figure 7, the jaggy input contour is aligned to the axes gradually as the deformation energy drops. The closeup on the right of the area in the green rectangle on the left shows the detail of the optimization process. The energy drops dramatically in the first few iterations, as shown in Figure 5d. Since we set the alignment and rigidness balance λ to 1 × 10^5, we can also observe that the alignment is achieved quickly at first, and the rigidness is then recovered in the following iterations.

Quality Comparison
There are three major aspects in evaluating the modeling quality: contour accuracy, contour complexity, and regularity. Both general urban modeling algorithms [8,21] and the general polyline simplification algorithm [7] are evaluated and compared on the public Vaihingen dataset [6]. The input orthophotos are shown in Figures 1 and 8. For LOD0 and LOD1 generation, the problem is usually treated as a contour extraction and simplification problem. The RDP algorithm [7] removes the vertex closest to the curve iteratively until the target number of vertexes or the minimum distance is reached. The simplified curve is a subset of the points of the original polyline, which makes it easily affected by a few extreme points, as shown in Table 1.
Poullis et al. [8] conduct an orientation detection and classification before the RDP algorithm [7], which makes the method aware of the turning points of the global orientation. Unfortunately, the simplified curve is still a subset of the input curve. Therefore, neither RDP [7] nor Poullis et al. [8] can capture the positions of the corners well, due to the rounded corners in the segmentation mask, as shown in Table 1.

Table 1. In each row from left to right are: the ground-truth building contour provided by ISPRS [6], our LOD0 model, the output of the RDP algorithm [7] with the number of vertexes set to be the same as ours, the result of Poullis et al. [8], and the result of Zhu et al. [21].

The last column in Table 1 demonstrates that Zhu et al. [21] can generate contours with strong regularity. However, the method fails when the scene does not satisfy the Manhattan assumption: the missing part in the lower-left corner of the second building in Table 1 is caused by the missing short segments.

As illustrated in Table 1, our method achieves the highest IoU while maintaining the sharp corners of the contours. Thanks to the deformation optimization process, we are able to recover the positions of the corner vertexes more accurately. Table 2 lists more results for the buildings in Figure 1, where the orthogonal neighboring edges are correctly reconstructed. In addition, Figure 8 shows the result on another block from the Vaihingen dataset [6]: on the left is the input orthophoto of a residential area, and on the right is the output LOD1 model overlaid on the semantic segmentation map [3].

Table 2. Generated models of different buildings in Figure 2. In each column from top to bottom are: the building segmentation, the generated vectorized polygon LOD0 model, and the extruded LOD1 model.

[Table 2 layout: columns for regions B, C, and D; rows Contour, LOD0, and LOD1.]
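For completeness, the IoU used above as the contour-accuracy measure can be computed on rasterized footprint masks; a minimal sketch (representing the masks as 0/1 grids is our assumption, and the paper does not specify its exact evaluation code):

```python
def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two binary masks given as 2D lists of
    0/1 values of the same shape, e.g. the rasterized ground-truth
    footprint versus the rasterized generated LOD0 polygon."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0
```

A higher IoU means the simplified footprint overlaps the ground truth more closely, so corner displacement and missing protrusions both lower the score.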

Conclusions
In this paper, we turn dense 2D orthophotos into compact 3D polygon models automatically for urban scenes [17,21]; such models are suitable for efficient representation, processing, and rendering of large-scale scenes. The generated models can be used in various fields like urban planning, navigation, emergency simulation, and risk reduction [1].
Specifically, we propose a novel deformation based contour simplification approach that generates vectorized LOD0 and LOD1 building models from orthophotos. To begin with, building contours are extracted from the semantically labeled orthophotos and processed separately. For each building, we first extract dominant directions by applying the RANSAC algorithm to the bilaterally smoothed contour edge normals. Then each edge normal is assigned one of the dominant directions as its alignment target by formulating the task as an MRF labeling problem [8]. Finally, a deformation energy combining the edge normal alignment and AMIPS [33] rigidness is defined on the contour triangle mesh. By minimizing the deformation energy and connecting the corner vertexes, we can generate compact LOD0 and LOD1 models easily.
Compared with the classic RDP algorithm [7] and recent advanced contour based methods [8,21], we are able to enhance the global regularity and retain the contour topology at the same time. The proposed deformation approach can aggregate different constraints into one optimization framework, and it shows great potential in urban scene modeling, which is rich in regularities like orthogonality, parallelism, and collinearity. In the future, we would like to add more common regularity constraints and take the generated models to higher LODs.
Funding: This research was funded by Natural Science Foundation of China under Grants 61991423, 61873265, 61421004.