1. Introduction
Recently, 3D digital city modeling using aerial images has become a popular research field. Aerial images provide not only geometric position information but also texture information, supporting the whole reconstruction pipeline of photogrammetric point cloud generation, surface modeling, and texture mapping. It is worth emphasizing that buildings constitute a pivotal element of the urban landscape, and their modeling has a substantial impact [1]. By combining these entities with computer vision methods, important information such as geometry, semantics, textures, and structural systems can be stored in building models and ultimately serve applications such as city planning and energy consumption analysis [2].
In the modeling process, reconstructing 3D models from photogrammetric point clouds is an important step [3,4]. Level of detail (LOD) models are simplified building models with five successive levels of detail defined in the CityGML standard, and they have been widely used in building information modeling (BIM) [5], urban energy conservation [6], and indoor geometry [7]. Among them, LOD2 building models usually include complex mesh models and lightweight polygonal models. Commercial 3D modeling software such as Pix4DMapper and Context Capture [8] frequently triangulates dense point clouds to produce complex mesh models. Nevertheless, such methods can result in uneven surfaces or modeling errors due to outliers and uneven point cloud density. Conversely, lightweight polygonal models not only have flat surfaces but also offer advantages in applications with extensive scenes or numerous objects thanks to their efficient use of storage resources [9]. However, how to reasonably build lightweight surface models that remain convenient for subsequent modeling steps, such as texture mapping, is an important research direction. Therefore, exploiting the advantages of planes for extracting the main building components and their textures, this work uses point clouds generated by oblique photography and models buildings from the perspective of plane combination, which not only yields a lightweight LOD2 geometric model but also provides a basis for subsequent texture searching and mapping.
Data-driven methods and model-driven methods are the two main kinds of building modeling methods [10]. The data-driven method employs a bottom–up approach that begins with data extraction to reconstruct the geometric model [11,12]. This method offers great flexibility and can reconstruct complex shapes; however, its success hinges on data accuracy. Conversely, the model-driven method follows a top–down strategy: it constructs a library of primitive models, aligns them with the data, and finally combines the selected primitives into a model [13,14]. Models produced by this method incorporate semantic information and maintain an accurate framework. Nonetheless, their descriptive capacity is constrained by the contents of the model library. Hence, it is necessary to develop a method that minimizes the impact of data quality and avoids excessive reliance on rules while still ensuring boundary accuracy and a correct framework.
Primitive extraction constitutes the foundational step in building reconstruction [15,16]. Its primary objective is to obtain high-quality geometric primitives for texture mapping and semantic attachment. As typical artifacts, buildings are mostly composed of planes [15,17]; therefore, this work concentrates on piecewise planar buildings. Typical methods encompass Random Sample Consensus (RANSAC) [18], region growing [19], and clustering [20], primarily relying on cues such as normal vectors, point distances, and colors [21]. Nevertheless, the results of these methods can be affected by the complexity of the building, causing problems such as over-segmentation, which requires further post-processing of the generated results.
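As a minimal sketch of the RANSAC family of methods mentioned above, the following snippet iteratively extracts plane primitives from a point cloud with Open3D; the file name, stopping criteria, and thresholds are placeholders rather than values used in this work.

```python
# Minimal iterative RANSAC plane extraction sketch using Open3D.
import open3d as o3d

pcd = o3d.io.read_point_cloud("building.ply")  # hypothetical input file
planes, rest = [], pcd
while len(rest.points) > 500:                  # stop when too few points remain
    # Fit one plane: model is (a, b, c, d) with ax + by + cz + d = 0.
    model, inliers = rest.segment_plane(distance_threshold=0.05,
                                        ransac_n=3, num_iterations=1000)
    if len(inliers) < 200:                     # reject weakly supported planes
        break
    planes.append((model, rest.select_by_index(inliers)))
    rest = rest.select_by_index(inliers, invert=True)
print(f"extracted {len(planes)} plane primitives")
```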
Framework reconstruction is the key to polygonal surface reconstruction. It aims to enhance the geometry of plane primitives with a framework, resulting in a coherent and logical model. Moreover, framework reconstruction can be used to simplify complex building models [22] and to further build different LOD models, both indoor and outdoor [23,24]. To construct a reasonable overall building framework, there are three common reconstruction approaches: cell decomposition methods, shape grammar methods, and connection evaluation methods [25].
Cell decomposition methods are commonly used to create two-dimensional indoor or three-dimensional outdoor watertight models [26]. Researchers transform the reconstruction task into an optimal subset selection problem with hard constraints [16]. PolyFit, proposed by Nan et al., is a state-of-the-art (SOTA) cell decomposition method for 3D building reconstruction. It generates face candidates by intersecting the extracted planar primitives and then selects an optimal subset of these candidates via optimization. Xie et al. further improved PolyFit by incorporating constraints related to adjacent faces [16], while Liu et al. enhanced it by integrating feature line detection to improve the connectivity between plane primitives [15]. Nevertheless, while cell decomposition methods excel at creating watertight models, they may struggle with complex buildings or thin, sharp structures such as fences precisely because of the watertightness requirement. In addition, under the influence of global optimization, some generated planes may deviate from the original point cloud, which is not conducive to subsequent modeling steps.
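To make the optimal-subset-selection idea concrete, the toy sketch below encodes face selection as a small binary program; the support values and edge topology are invented for illustration, and this is not PolyFit's actual formulation, which additionally includes coverage and complexity terms.

```python
# Toy sketch of PolyFit-style candidate-face selection as a binary program.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# support[i]: fraction of input points supporting candidate face i
# (hypothetical values for illustration).
support = np.array([0.9, 0.1, 0.8, 0.7, 0.2])
n_faces = len(support)
# edge_faces[e]: candidate faces meeting at edge e (hypothetical topology).
edge_faces = [[0, 1, 2], [2, 3, 4]]
n_edges = len(edge_faces)

# Variables: x (keep face?) followed by one auxiliary binary y per edge.
c = np.concatenate([0.5 - support, np.zeros(n_edges)])  # reward supported faces

# Manifold constraint: each edge is used by 0 or 2 selected faces,
# encoded as sum(x over faces at e) - 2*y_e = 0 with y_e binary.
A = np.zeros((n_edges, n_faces + n_edges))
for e, faces in enumerate(edge_faces):
    A[e, faces] = 1.0
    A[e, n_faces + e] = -2.0

res = milp(c=c, constraints=LinearConstraint(A, 0.0, 0.0),
           integrality=np.ones(n_faces + n_edges), bounds=Bounds(0, 1))
keep = res.x[:n_faces].round().astype(bool)
print("selected candidate faces:", np.flatnonzero(keep))
```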
Shape grammar methods establish a predetermined set of grammar rules or hypotheses, which are then applied in a defined sequence to build the model [27]. Within BIM in particular, shape grammar methods are used to generate instances of architectural elements such as walls and doors [25]. Some researchers segment buildings into parts, match these parts with predefined building primitives, and estimate the corresponding parameters of the identified primitives to form the model [28,29]. One exemplar of the shape grammar approach is Computer-Generated Architecture (CGA), which excels at procedurally generating intricate 3D building geometries [30]. While CGA facilitates rapid batch modeling, it is contingent on rule definitions and may require pre-training effort; for complex buildings, its results may be unsatisfactory.
Connection evaluation methods are conventional approaches to framework reconstruction [25]. These methods assess potential connections among primitives and link them based on specific criteria. Frequently, a Euclidean distance threshold between primitives is used to determine candidate adjacent planes [31]. Among the candidate adjacent planes, the connection mode is selected according to the wall axes, with the intersecting connection being the most common, and the topological connection is then established [32]. However, connection evaluation methods are usually applied to indoor or BIM modeling, where the orientations of the wall axes remain fixed. For whole-building modeling, determining appropriate wall axes is challenging because the normal directions of the planes are uncertain.
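As a small sketch of the distance-based adjacency test described above, the following function flags two plane segments as candidate neighbours when their closest points lie within a threshold; the threshold value is illustrative.

```python
# Sketch of candidate-adjacency detection between plane segments.
import numpy as np
from scipy.spatial import cKDTree
from itertools import combinations

def candidate_adjacent(segments, dist_thresh=0.3):
    """segments: list of (N_i, 3) arrays of points, one per plane primitive."""
    trees = [cKDTree(seg) for seg in segments]
    pairs = []
    for i, j in combinations(range(len(segments)), 2):
        # Minimum point-to-point distance between the two segments.
        d, _ = trees[j].query(segments[i], k=1)
        if d.min() < dist_thresh:
            pairs.append((i, j))
    return pairs
```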
With the continuous development of deep learning, 3D reconstruction using deep learning algorithms has gradually emerged in recent years. Implicit Geometric Regularization for Learning Shapes (IGR), proposed by Gropp et al., has proved useful for various shape analysis and reconstruction tasks [33]. However, IGR is not designed for building reconstruction, so it generates complex mesh models and may create spurious parts due to point cloud noise. Bacharidis et al. used deep learning methods for image segmentation and single-image depth prediction in 3D building facade reconstruction; however, they only considered building facades and likewise produced complex mesh models [34]. As for lightweight polygonal LOD2 building models, Building3D is an urban-scale building modeling benchmark that allows a comparison of supervised and self-supervised learning methods [35]. In the paper introducing Building3D, the authors first proposed and adopted self-supervised pre-training methods for 3D building reconstruction. However, Building3D does not handle model planes well, producing wavy regions that should belong to a single plane. Moreover, to reconstruct buildings with different characteristics, a new model must be trained instead of using the provided one. Therefore, a non-deep learning method may be needed to provide data in advance to assist training.
Drawing upon photogrammetric point clouds and targeting the creation of lightweight polygonal models for planar buildings, this work introduces a building modeling pipeline. The main contributions of this paper are outlined below:
- (1) Hybrid method for surface reconstruction. To achieve the structural integrity and accurate framework essential for building modeling, a hybrid method is proposed that combines connection evaluation and framework optimization to reconstruct building surfaces from photogrammetric point clouds.
- (2) Candidate plane generation method. To address the poor results of plane primitive extraction, a candidate plane generation method is proposed to remove redundancies and merge similar elements.
- (3) Improved connection evaluation method. To ensure the integrity of the model framework, an improved connection evaluation method is proposed that extracts wall axes from adjacent planes in any direction and determines the intersection mode to extract potential skeleton lines from different planes.
- (4) Plane shape modeling method. To make the modeling result close to the actual situation while retaining vital boundary details, a plane shape optimization method based on topological connection rules is proposed, which converts the plane modeling task into an optimal boundary loop selection problem.
The remainder of this paper is organized as follows. Section 2 introduces the workflow of the proposed method and describes each step in detail. Section 3 presents the experimental results and comparative analyses against other methods. Section 5 summarizes the findings and draws the conclusions.
3. Experiments
To evaluate the proposed approach, a building is first taken as an example to illustrate the result of each step of the modeling process. This work then qualitatively and quantitatively compares the reconstruction quality on different buildings against several SOTA methods. The experiments demonstrate the advantages of our hybrid method of connection evaluation and framework optimization for building surface reconstruction. In this section, comparisons are made with SOTA traditional methods and deep learning methods, respectively, and two kinds of metrics are used to verify simplicity and accuracy. Moreover, to demonstrate the suitability of the proposed method for 3D building reconstruction, this work further illustrates an application based on the resulting model: taking one building as an example, a textural model and an LOD3 model are constructed on top of the surface model. The results are shown below.
3.1. Data and Metrics Selection
3.1.1. Experimental Data
The method in this paper reconstructs piecewise planar buildings. To qualitatively and quantitatively analyze the proposed method, several real-world buildings with different shapes and complexity are chosen, as shown in Figure 10 (buildings 1–6). The photogrammetric point clouds are generated from oblique photography images of typical modern buildings captured in suburban areas of China; buildings 1–4 are captured in Beijing and buildings 5–6 in Zhuhai. Image orientation and point cloud generation are obtained with existing solutions, namely structure from motion (SfM) and multi-view stereo (MVS). The photogrammetric surveys employed for the experiments and their characteristics are described in Table 1. The major components of the buildings are visible in the point clouds, yet some noise and missing regions remain due to factors such as shooting heights and occlusions. For the comparison experiment with the deep learning algorithm proposed in Building3D [35], to avoid differences caused by differing data characteristics, three buildings from the dataset of the original paper are used for comparison, as shown in Figure 10 (buildings 7–9). Notably, the point clouds in Building3D are laser point clouds rather than photogrammetric point clouds, which are not our focus. In addition, to clearly display the characteristics of the buildings, scale bars and coordinate axes are shown below the building point clouds in Figure 10; the unit of the scale bars is the meter (m).
Moreover, according to the method proposed in Section 2, six thresholds and four weights need to be set before the experiments take place. After an experimental comparison of the visual and numerical effects of different values of these factors, the thresholds and weights are set as shown in Table 2. It is worth noting that the values can be modified according to the actual conditions of the building.
3.1.2. Metrics Selection
To quantitatively analyze the modeling effect, this work adopts metrics covering two aspects of reconstruction performance: simplicity and accuracy.
The first aspect pertains to the simplicity of the models. Since the objective is to create lightweight polygonal models, the number of vertices (Ver. No.) and the number of faces (Fac. No.) in a model are critical evaluation factors. Taking building 1 from Section 3.1.1 as an example, Figure 11a shows the built 3D model, Figure 11b the vertices on the model (each point rendered as a sphere), Figure 11c the wireframe of the model, and Figure 11d the faces based on the wireframe (each face is a triangle). Small vertex and face counts signify a simple reconstructed model. This work uses the software CloudCompare to count vertices and faces: after importing the model into CloudCompare, the number of vertices can be read from "Cloud" and the number of faces from "Mesh" in the "Properties" column. Both counts are dimensionless.
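As a scriptable alternative to the CloudCompare GUI, the same counts can be obtained with, for example, the trimesh Python library; the file name below is a placeholder.

```python
# Count vertices and faces for the simplicity metrics.
import trimesh

mesh = trimesh.load("building1_model.obj", force='mesh')
print("Ver. No.:", len(mesh.vertices))   # number of model vertices
print("Fac. No.:", len(mesh.faces))      # number of triangular faces
```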
The second aspect concerns the accuracy of the models. This work mainly uses the orthogonal distance from the original building points to their nearest faces of the 3D model (P2M) and the shortest distance from the model vertices to the original building points (O2P) to assess model accuracy [15,16]. Taking building 1 from Section 3.1 as an example, Figure 12a shows the input point cloud overlaid on the built 3D model, Figure 12b the P2M distance of each point in the point cloud, Figure 12c the input point cloud overlaid on the model vertices, and Figure 12d the O2P distance of each model vertex. For an intuitive comparison, this work reports the mean P2M distance (Mean P2M) and the mean O2P distance (Mean O2P). The lower the mean P2M distance, the better the fit between the input point cloud and the output model mesh; the lower the mean O2P distance, the fewer artificial vertices exist and the more accurately key features are retained. Therefore, low values of both indicate accurate results. The mean P2M and O2P distances can also be calculated in CloudCompare: after importing the model and the original point cloud, "Tools"-"Distances"-"Cloud/Mesh Dist." and "Cloud/Cloud Dist." yield the mean P2M and mean O2P distances, respectively. The unit for both P2M and O2P is the meter (m), which is also the unit of the scale bars in Figure 12.
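Equivalently, these two metrics can be scripted; the sketch below uses trimesh and scipy (file names are placeholders) and should agree with CloudCompare up to sampling details.

```python
# Sketch of the two accuracy metrics: mean P2M and mean O2P.
import numpy as np
import trimesh
from scipy.spatial import cKDTree

mesh = trimesh.load("building1_model.obj", force='mesh')
points = np.asarray(trimesh.load("building1_points.ply").vertices)

# Mean P2M: distance from each input point to its nearest mesh face.
_, p2m, _ = trimesh.proximity.closest_point(mesh, points)
print("Mean P2M (m):", p2m.mean())

# Mean O2P: distance from each model vertex to its nearest input point.
o2p, _ = cKDTree(points).query(mesh.vertices, k=1)
print("Mean O2P (m):", np.mean(o2p))
```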
3.2. Modeling Result Example
To illustrate the experimental result of each step of the modeling process, the process diagram in Figure 13 is given with building 1 as an example.
In the candidate plane generation step, the building point cloud (a) is segmented first. However, some parts of the segmentation result are not suitable for modeling alone, such as those in the red dotted boxes in (b). After redundancies are removed and similar planes are merged, a relatively satisfactory result (c) is obtained.
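As a sketch of the plane-merging criterion, two fitted planes might be considered similar when their normals are nearly parallel and their offsets are close; the thresholds below are illustrative, not the values in Table 2.

```python
# Sketch of a "merge similar planes" test.
import numpy as np

def similar(plane_a, plane_b, angle_thresh_deg=10.0, offset_thresh=0.2):
    """Each plane is (n, d): unit normal n, offset d in n.x + d = 0."""
    na, da = plane_a
    nb, db = plane_b
    if np.dot(na, nb) < 0:            # flip so both normals point the same way
        nb, db = -nb, -db
    angle_ok = np.dot(na, nb) > np.cos(np.radians(angle_thresh_deg))
    offset_ok = abs(da - db) < offset_thresh
    return angle_ok and offset_ok
```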
Based on the candidate planes, in the skeleton line generation step, pairwise intersection is used to find the potential skeleton lines of the building. Taking the plane in the red dotted box in (d) and its two adjacent planes as an example, two intersection lines (the black lines in (d)) can be calculated. Moreover, according to the configuration of the plane axes (the green and red lines in (d)) at the intersection, the mode of each intersection is judged, which is used in the subsequent loop selection. After calculating all the intersection relations of the current plane, the intermediate results ((e) and (f)) are obtained and used to generate the final skeleton lines (g). Notably, the final skeleton lines are not yet simple enough and must be further selected for modeling.
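For reference, a minimal sketch of the pairwise plane intersection underlying this step: the line direction is the cross product of the two plane normals, and a point on the line follows from a small linear solve. This is standard geometry, not the paper's exact implementation, which additionally classifies intersection modes.

```python
# Intersection line of two planes n.x + d = 0 (unit normals assumed).
import numpy as np

def plane_intersection(n1, d1, n2, d2):
    """Returns (point, unit direction) of the intersection line, or None."""
    direction = np.cross(n1, n2)
    if np.linalg.norm(direction) < 1e-8:
        return None                      # parallel planes: no single line
    # Solve for the point on the line closest to the origin.
    A = np.array([n1, n2, direction])
    b = np.array([-d1, -d2, 0.0])
    point = np.linalg.solve(A, b)
    return point, direction / np.linalg.norm(direction)
```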
In the plane shape modeling step, by combining the point cloud with the skeleton lines (h), this work computes the optimal loop (i) for modeling. The optimal loop not only fits the point cloud well but is also easy to triangulate, which helps obtain a correct and simple mesh model.
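A toy sketch of the loop selection idea follows, under the assumption that candidate loops are scored by how well their interiors cover the points projected into the plane, with a penalty on perimeter; the scoring terms and weight are invented for illustration and are not the paper's optimization equation.

```python
# Toy scoring of candidate boundary loops in a plane's 2D coordinates.
import numpy as np
from matplotlib.path import Path

def loop_score(loop_2d, points_2d, lam=0.1):
    """loop_2d: (K, 2) closed polygon vertices; points_2d: (N, 2)."""
    inside = Path(loop_2d).contains_points(points_2d)
    coverage = inside.mean()                     # fraction of points covered
    perimeter = np.linalg.norm(np.diff(
        np.vstack([loop_2d, loop_2d[:1]]), axis=0), axis=1).sum()
    return coverage - lam * perimeter            # higher is better

def best_loop(candidate_loops, points_2d):
    return max(candidate_loops, key=lambda lp: loop_score(lp, points_2d))
```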
Finally, this work combines models of different planes to achieve building reconstruction.
3.3. Comparison with Traditional Methods
To demonstrate the advantages of our method, this work conducts experiments with two SOTA traditional building surface reconstruction methods, namely the 2.5D dual contouring method (2.5DC) [41] and the PolyFit method [42].
3.3.1. Modeling Results and Qualitative Analysis
Table 3 shows the reconstruction results of the three methods for buildings 1–6.
From the visual comparison of the reconstruction results, it can be seen that all three methods obtain polygonal surface models of the buildings. The 2.5DC method reconstructs point clouds mainly by identifying building roofs and then extruding them to create the models. Although the major components are retained, framework information is ignored. In addition, the 2.5DC method performs poorly on flat surfaces, which hampers subsequent steps such as texture mapping. The PolyFit method produces flat polygonal surface models and offers building framework information; however, due to the manifold and watertightness requirements, along with some limitations of its optimization settings, some model details can be overly simplified. The method in this work is grounded in planes: the framework is modeled by exploring the connections between planes, and polygonal models of the planes are then constructed using optimization equations. Consequently, this method preserves plane feature information to a great extent while also ensuring the correctness of the model framework.
3.3.2. Quantitative Analysis
The simplicity results for the six buildings in Section 3.3.1 are shown in Table 4, reported as the numbers of vertices and faces.
Table 4 shows that the proposed method achieves the lowest numbers of vertices and faces among the three methods. Analysis suggests that the large counts of 2.5DC stem mainly from its planes not being flat enough, while those of PolyFit stem from its faces within a plane being too fragmented. The method proposed in this paper finds the optimal closed loop for each plane and constructs the triangular mesh from the intersection points of that loop, which ensures both the flatness of each plane and as little fragmentation of the surface as possible.
Table 5 shows the accuracy results of the three methods; distances are in meters (m).
As seen in the simplicity validation, the proposed method yields small numbers of faces and vertices, which means the faces cover large areas and the vertices are concentrated at model edges. In practice, photogrammetric point clouds are not perfectly flat but undulating, which can make the face fits less than ideal. Moreover, when the vertices lie at the edges of models rather than in the middle of planes, the mean O2P value can be higher because no central vertices are available to average it down. Despite these considerations, the experimental results show that, on the whole, the mean P2M and O2P values of the proposed method are still more accurate than those of the other methods. Furthermore, the values show no significant outliers, which further underscores the stability and reliability of the proposed method.
3.4. Comparison with Deep Learning Methods
Considering the rapid development of deep learning algorithms, this work also conducts experiments with two SOTA deep learning 3D reconstruction methods, namely the IGR method [33] and the Building3D method [35]. Notably, the IGR method is not designed solely for buildings; therefore, buildings 1–6 are reconstructed for this comparison. As for the Building3D method, because different types of buildings have different data characteristics and Building3D is mainly aimed at laser point clouds, buildings 7–9 from the Building3D dataset are selected for comparison. Moreover, because IGR produces very dense meshes and Building3D produces uneven surfaces, only the 3D models are displayed for a more intuitive comparison. The comparison results with the two deep learning methods are presented separately below.
3.4.1. Comparison Results with IGR
Table 6 shows the reconstruction comparison results of the IGR method and the proposed method for buildings 1–6.
Table 7 shows the simplicity comparison results of the IGR method and the proposed method for buildings 1–6. The results are the numbers of vertices and faces.
Table 8 shows the accuracy comparison results of the IGR method and the proposed method for buildings 1–6; the accuracy results are in meters (m).
From these results, it can be seen that the reconstruction results of the IGR method conform more closely to the point clouds, and the mean P2M and mean O2P values of IGR are lower than those of the method proposed in this work. However, IGR builds complex mesh models, which not only yields very high vertex and face counts but can also lead to unwanted tortuous meshes and even distortions caused by noise points. Moreover, the goal of this work is to build LOD2 mesh models that can be further used to build textural models and LOD3 models; a simple mesh model is better suited to these subsequent models, and an application example is given in Section 3.5.
3.4.2. Comparison Results with Building3D
Table 9 shows the reconstruction comparison results of the Building3D method and the proposed method for buildings 7–9.
Table 10 shows the simplicity comparison results of the Building3D method and the proposed method for buildings 7–9. The results are the numbers of vertices and faces.
Table 11 shows the accuracy comparison results of the Building3D method and the proposed method for buildings 7–9; the accuracy results are in meters (m).
Building3D adopts self-supervised pre-training for 3D building reconstruction and can build simple LOD2 mesh models. However, because it does not take the actual framework of the buildings into account, some of its models have uneven surfaces, which not only pose potential problems for subsequent steps such as texture mapping but also increase the number of faces. Since the proposed method performs surface reconstruction from planar point cloud segments, each plane retains its topological structure well. The reconstructed results of the proposed method not only look closer to the actual buildings but also show smaller mean P2M and O2P values.
3.5. Application
Since the proposed method uses aerial images to generate point clouds through an SfM algorithm, the correspondence between point clouds and images is available, enabling texture mapping. In addition, the skeletons are extracted after plane fitting based on point cloud segmentation, which can not only determine the topological relationships between planes but also facilitate the clipping of flat texture images. The flat texture images ensure that the building texture is not distorted, and components can be extracted from them for LOD3 reconstruction. To illustrate the application of our method in CityGML, this work takes building 1 as an example to build a textural model and an LOD3 model. The results are shown in Figure 14.
These results show that the model built by the proposed method can yield satisfactory LOD2 textural and LOD3 models, which is closely tied to the plane fitting and topological relationship detection performed on the photogrammetric point cloud in this work.
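As a sketch of the texture-clipping step described above, the snippet below rectifies one planar facade region from a source image with a perspective warp; the corner pixels, image names, and output size are placeholders, and in the actual pipeline the corners would come from projecting the plane's model corners through the SfM camera pose.

```python
# Rectify a flat texture for one plane with OpenCV.
import cv2
import numpy as np

img = cv2.imread("oblique_view.jpg")                  # hypothetical source image
corners_px = np.float32([[812, 304], [1620, 350],     # projected plane corners
                         [1598, 980], [790, 940]])    # (placeholder pixels)
w, h = 1024, 768                                      # output texture size
target = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
H = cv2.getPerspectiveTransform(corners_px, target)
texture = cv2.warpPerspective(img, H, (w, h))
cv2.imwrite("facade_texture.png", texture)
```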
5. Conclusions
This work proposes a hybrid method of connection evaluation and framework optimization for building surface reconstruction from photogrammetric point clouds. Given the point cloud of a single planar building, a candidate plane generation method is first proposed to remove redundancies and merge similar elements, addressing the poor results of plane primitive extraction. Second, by improving the connection evaluation method, this work solves the problem of connecting two plane primitives in any direction, and the skeleton lines of each plane are extracted. Third, the plane modeling task is converted into an optimal boundary loop selection problem to realize plane shape modeling. Finally, through plane primitive merging and hole filling, a complete 3D building model is generated.
Experimental comparisons of our method with SOTA methods demonstrate its qualitative and quantitative advantages in 3D building reconstruction. The reconstruction results and qualitative analysis show that the proposed method reproduces details while maintaining the major structure of the building. Meanwhile, the quantitative analysis reveals that the proposed method builds simple models while ensuring modeling accuracy. Moreover, constructing a textural model and an LOD3 model on the basis of the 3D model further demonstrates the utility of this work.
Although this work proposes a candidate plane generation method, the modeling results still rely on the initial plane segmentation; this reliance on the segmentation process is a common problem of primitive-based methods. If data are significantly lacking, how to complete the modeling becomes a critical challenge for future research. This could involve developing robust methods for handling incomplete or sparse data and exploring data augmentation techniques to improve the modeling process. In addition, for buildings with curved parts, how to transfer some of the steps in this work, such as skeleton line extraction, to other modeling algorithms, such as deep learning algorithms, will also be a focus of future research. Through the fusion of different algorithms, it may be possible to obtain a simple polygonal surface model while preserving the original topological relationships between curved and flat surfaces. Finally, to obtain higher-LOD models, adding semantic, material, structural, and other information to the polygonal surface model is also future research content; on this basis, it is hoped that more building simulations can be carried out.