Article

Existing Buildings Recognition and BIM Generation Based on Multi-Plane Segmentation and Deep Learning

1 School of Mechanics and Engineering Science, Shanghai University, Shanghai 200444, China
2 Shanghai Highway and Bridge (Group) Co., Ltd., Shanghai 200433, China
* Author to whom correspondence should be addressed.
Buildings 2025, 15(5), 691; https://doi.org/10.3390/buildings15050691
Submission received: 25 January 2025 / Revised: 18 February 2025 / Accepted: 21 February 2025 / Published: 22 February 2025
(This article belongs to the Section Construction Management, and Computers & Digitization)

Abstract

Point cloud-based BIM reconstruction is an effective approach to the digital documentation of existing buildings. However, current methods often demand substantial time and expertise for the manual measurement of building dimensions and the drafting of BIMs. This paper proposes an automated approach to BIM modeling of the external surfaces of existing buildings, aiming to streamline these labor-intensive and time-consuming manual processes. Initially, multi-angle images of the building are captured using drones, and the building’s point cloud is reconstructed using 3D reconstruction software. Next, a multi-plane segmentation technique based on the RANSAC algorithm is applied, enabling the efficient extraction of key features of exterior walls and planar roofs. Orthophotos of the building façades are generated by projecting wall point clouds onto a 2D plane. A lightweight convolutional encoder–decoder model is used for the semantic segmentation of windows and doors on the façade, enabling the precise extraction of window and door features and the automated generation of AutoCAD elevation drawings. Finally, the extracted features and segmented data are integrated to generate the BIM. The case study results demonstrate that the proposed method exhibits a stable error distribution, with model accuracy exceeding architectural industry requirements, successfully achieving reliable BIM reconstruction. However, the method currently struggles with buildings that have complex curved walls, irregular roof structures, or dense vegetation occlusion.

1. Introduction

In the context of the rapidly evolving construction industry, building information models (BIMs) have become essential for the design, construction, and operational management of buildings. However, a considerable proportion of existing buildings continue to rely on traditional methods, lacking the integration and utilization of BIM [1]. According to statistics from China’s Ministry of Housing and Urban–Rural Development, by the end of 2024 the total area of existing buildings in the country exceeded 80 billion square meters, of which more than 90% lack a complete BIM. These buildings frequently face issues such as incomplete information, inefficient operations, and maintenance difficulties. One critical obstacle is the absence of an as-designed BIM, as difficulties in acquiring and accessing relevant data adversely impact operational and maintenance efficiency, leaving significant room for improvement [2,3,4].
To address this issue, manually enriching BIM semantics has emerged as a viable solution [5]. However, while manual methods provide high accuracy, their labor-intensive, time-consuming, and high-cost nature limits their scalability [6,7,8]. In contrast, automated or semi-automated techniques for BIM reconstruction can significantly enhance modeling efficiency and minimize errors introduced by manual intervention. This approach involves 3D reconstruction, wherein spatial geometric data are acquired using advanced data acquisition tools and utilized to generate detailed 3D models [9,10].
In recent years, researchers have proposed various point cloud-based methods for BIM reconstruction, achieving significant advancements in accuracy and efficiency. Among these, façade modeling is a critical component of BIM reconstruction. It focuses on extracting geometric features of buildings from point cloud data, including windows, doors, and protrusions [11,12]. Zolanvari et al. [13] employed a slicing method to extract features from façades and roofs. They used the RANSAC algorithm to detect planes and identified hole regions in slices through horizontal or vertical cuts, enabling the extraction of boundaries for elements such as windows and doors. Fan et al. [14] proposed a RANSAC-based façade layout method, which segments façade components and detects their edge orientations to extract window and door contours.
Deep learning has also been applied extensively to roof modeling. Many studies have employed deep learning algorithms for building extraction and roof shape classification [15,16,17,18], but these methods often suffer from low accuracy in planar surface extraction and a restricted ability to identify complex roof structures [19,20]. Furthermore, Albano [21] compared a fuzzy C-means clustering method with a combined region-growing segmentation and random sample consensus (RANSAC) approach, finding that, while the latter requires more processing time, it achieves higher segmentation accuracy. Hu et al. [22] introduced an outlier detection-based optimization algorithm that enhances roof plane extraction accuracy by reducing noise.
In summary, recent advancements in automated BIM reconstruction can be broadly categorized into two paradigms: (1) RANSAC-based geometric feature extraction methods [13,14], which excel at planar surface detection but struggle with complex geometries and semantic enrichment; and (2) deep learning-driven approaches [15,16,17,18], which achieve high semantic accuracy for elements like roofs but require substantial computational resources. Hybrid methods that combine geometric and deep learning techniques inherit the advantages of both paradigms, balancing accuracy and efficiency. Nevertheless, BIM reconstruction still faces several significant challenges. On one hand, automation in the extraction of features for complex building structures is still at an early stage [23,24]. On the other hand, there is limited research on accurately identifying and extracting semantically rich building elements (e.g., doors and windows) from point cloud data [25,26,27].
To address these issues, this paper proposes a method for the recognition and BIM modeling of existing buildings, utilizing multi-plane segmentation and deep learning techniques. Specifically, the study follows four main steps:
  • Capturing multi-angle images of buildings using drones and generating point cloud data through 3D reconstruction software;
  • Segmenting the building point cloud data using a RANSAC-based multi-plane segmentation technique [28,29]. This enables the rapid extraction of planar features, such as walls and roofs;
  • Extracting key features from external walls and roofs, generating orthophotos of building façades, and performing semantic segmentation using a lightweight convolutional encoder–decoder model to identify door and window features, followed by the automatic generation of AutoCAD elevation drawings;
  • Finally, integrating the extracted features to reconstruct the BIM.
The proposed method facilitates the efficient generation of 3D models and the extraction of semantic information for existing buildings. This provides technical support for automating BIM generation and lays a solid foundation for the comprehensive digital management of complex buildings across their lifecycle.

2. Methodology

Figure 1 illustrates the overall framework of the method adopted in this study. The input comprises multiple 2D images of a building, while the output is the corresponding reconstructed BIM. The entire process is divided into four main steps: (1) generating 3D point clouds from 2D drone images using photogrammetry; (2) segmenting planar building elements (e.g., walls and roofs) in the point cloud with RANSAC-based multi-plane segmentation and extracting their key features for BIM reconstruction; (3) generating orthophotos of building façades from the segmented wall point clouds, performing semantic segmentation of façade doors and windows with a lightweight convolutional encoder–decoder model, extracting the key features of doors and windows, and generating AutoCAD elevation drawings; and (4) reconstructing the BIM. The specific details of each step are elaborated in the subsequent sections.

2.1. 3D Point Cloud Modeling

In this step, a drone equipped with a high-resolution camera was used to capture images from multiple angles. These images were processed using structure from motion (SfM) photogrammetry to generate the point cloud model within the specified area. Specifically, this study utilized the DJI Matrice 300 RTK drone equipped with a Zenmuse P1 camera, along with the DJI GS Pro application, to automatically plan flight routes. To ensure geometric accuracy, the forward and side overlap ratios were set to 80%, and the flight altitude was maintained at 80 m to optimize image coverage and spatial resolution.
ContextCapture Center (Bentley) is a widely used 3D modeling software package that generates high-precision point cloud models from sequential 2D images, eliminating the need for expensive laser scanning equipment. The software supports a variety of imaging devices, including traditional digital cameras and drone imagery, and features multi-source data fusion capabilities that help achieve high modeling accuracy. Its core technology leverages image-based 3D reconstruction algorithms and advanced image matching techniques, enabling the creation of detailed and highly accurate models, which can also be accessed and shared remotely via the internet. Specifically, this study utilized ContextCapture Center for point cloud generation and model visualization, as illustrated in Figure 2.

2.2. Multi-Plane Segmentation of Point Clouds

To automatically extract building feature data from 3D point clouds and perform BIM reconstruction, an initial step in this process is to extract the point cloud representing the building’s walls. The RANSAC algorithm is employed to segment the point cloud into planar features. Given that the normal vector of the façade wall point cloud is approximately parallel to the ground, this allows for the distinction between wall point clouds and roof point clouds. This method facilitates the efficient extraction of wall point clouds, which serves as a foundation for geometric modeling and semantic enrichment in subsequent BIM tasks. The algorithm process for the multi-plane segmentation of point clouds proceeds as follows:
  • Randomly sample three data points from the point cloud set P and compute the parameters Mp of the fitted plane;
  • Validate the fitted plane by classifying points that satisfy the parameters Mp as inliers and those that do not as outliers. Record the current number of inliers;
  • Check the termination conditions:
    • If the proportion of inliers in the current plane exceeds a predefined threshold, or if the number of random sampling iterations reaches the maximum limit, the algorithm terminates;
    • Otherwise, repeat the loop by sampling three new data points from P;
  • During the iterations, fit new plane parameters and validate them. Models with fewer inliers are discarded, while those with more inliers are retained. This ensures that the final parameters M̂_p correspond to the optimal fitted plane for the segmentation task.
The following Algorithm 1 is a simplified pseudocode representation of the RANSAC algorithm.
Algorithm 1. Simplified pseudocode representation of the RANSAC algorithm.
Require: Point cloud P, distance threshold T, number of iterations K
Ensure: Optimal fitted plane M̂_p
M̂_p ← null
best_score ← 0
for k = 1 to K do
    Randomly select a minimal sample set P_n (three points) from point cloud P
    Fit a plane to P_n, obtaining the plane parameters M_p
    inliers ← []
    for each point p in P do
        if the distance from p to the fitted plane is below T then
            add p to inliers
        end if
    end for
    if |inliers| > best_score then
        best_score ← |inliers|; M̂_p ← M_p
    end if
end for
return M̂_p
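For readers who prefer runnable code, the following is a minimal Python sketch of the same iterative multi-plane extraction using the open-source Open3D library, whose segment_plane method implements plane RANSAC. The input file name, distance threshold, iteration count, and stopping ratio are illustrative assumptions, not the parameters used in this study.

import open3d as o3d

def segment_planes(pcd, distance_threshold=0.05, num_iterations=1000,
                   min_inlier_ratio=0.05, max_planes=20):
    """Repeatedly fit a RANSAC plane, peel off its inliers, and continue."""
    planes = []
    remaining = pcd
    total = len(remaining.points)
    for _ in range(max_planes):
        if len(remaining.points) < 3:
            break
        model, inliers = remaining.segment_plane(
            distance_threshold=distance_threshold,
            ransac_n=3, num_iterations=num_iterations)
        if len(inliers) < min_inlier_ratio * total:
            break  # remaining candidate planes are too small to keep
        planes.append((model, remaining.select_by_index(inliers)))
        remaining = remaining.select_by_index(inliers, invert=True)
    return planes, remaining

pcd = o3d.io.read_point_cloud("building.ply")  # hypothetical input file
planes, rest = segment_planes(pcd)
print(f"Extracted {len(planes)} planes; {len(rest.points)} points left")

Peeling the inliers off after each accepted plane is what turns single-plane RANSAC into the multi-plane segmentation described above.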
By merging the segmented plane point clouds, a set of merged point cloud planes is produced, as depicted in Figure 3. From this set, vertical planes satisfying specific geometric criteria are selected as candidate wall surfaces. A plane is considered a vertical wall if the angle between its normal vector n and the vertical direction satisfies |n · v| < ε, where n is the unit normal vector of the plane, v = (0, 0, 1)^T is the unit vector in the vertical direction, and ε is the verticality threshold, defined as the cosine of the angle tolerance. For example, for a tolerance of ±1° about 90°, ε = cos(89°) ≈ 0.017.
Conversely, a plane is considered horizontal, representing a roof slab, if the angle threshold condition for horizontal alignment is met. Surfaces with areas smaller than 1 square meter are excluded from the candidate wall set to eliminate irrelevant or insignificant features.
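A small sketch of this classification step is given below: each fitted plane model (a, b, c, d) is labeled a wall or a roof slab from its unit normal, using the ±1° tolerance quoted above (an illustrative value). The area filter on surfaces smaller than 1 square meter would be applied to each plane’s inlier set separately.

import numpy as np

def classify_plane(plane_model):
    """Label a plane 'wall' or 'roof' from its normal; v = (0, 0, 1)."""
    n = np.asarray(plane_model[:3], dtype=float)
    n /= np.linalg.norm(n)
    dot = abs(n[2])                      # |n . v| with v = (0, 0, 1)
    if dot < np.cos(np.radians(89.0)):   # normal nearly horizontal -> vertical plane
        return "wall"
    if dot > np.cos(np.radians(1.0)):    # normal nearly vertical -> horizontal slab
        return "roof"
    return "other"                       # oblique planes are set aside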

2.3. Key Building Feature Extraction

Following the multi-plane segmentation of the point cloud, the building’s wall point cloud was extracted; the next step is to separately identify the key features of the exterior walls, the roof, and the doors and windows.

2.3.1. External Wall and Roof Feature Extraction

The wall point cloud (Figure 4a) and the roof point cloud (Figure 4d) obtained in Section 2.2 are utilized in this stage. The wall point cloud is first projected onto the XOY plane along the Z-axis (Figure 4b). Next, image edge detection techniques are applied to identify the region enclosed by the walls (Figure 4c). The height of the walls is determined by calculating the maximum height difference of the points in each wall. The roof profile is defined by the region enclosed by the walls, and the roof height is chosen as the median height of the roof point cloud to represent the overall height of the planar roof (Figure 4e).
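The projection and height computation can be sketched as follows. The raster resolution and the morphological closing step are assumptions introduced to make image edge detection on the projected points robust, not settings reported in this study.

import numpy as np
import cv2

def footprint_image(points_xyz, resolution=0.05):
    """Rasterize the XOY projection of a point cloud (1 px = resolution m)."""
    xy = points_xyz[:, :2]
    origin = xy.min(axis=0)
    pix = ((xy - origin) / resolution).astype(int)
    img = np.zeros((pix[:, 1].max() + 1, pix[:, 0].max() + 1), dtype=np.uint8)
    img[pix[:, 1], pix[:, 0]] = 255
    # close small gaps so the projected wall trace forms a continuous outline
    img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
    return img, origin

def wall_height(points_xyz):
    """Wall height as the maximum height difference of the wall's points."""
    return points_xyz[:, 2].max() - points_xyz[:, 2].min()

The closed outline in the raster can then be traced with cv2.findContours and fitted with straight lines, yielding the enclosed region used as the roof profile.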
Although the wall and roof plane point clouds were successfully extracted using the RANSAC method, some outliers may remain in the wall point cloud after plane segmentation. Removing these outliers is essential to eliminate noise and improve the accuracy of subsequent feature extraction. In this study, a statistical filtering method [30] is employed, which identifies and removes outliers by analyzing the statistical deviation of point distances from their neighbors.
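Open3D provides a statistical filter of exactly this kind; a one-call sketch is shown below, with illustrative parameter values (the neighborhood size and standard-deviation ratio used in this study are not reported).

import open3d as o3d

wall_pcd = o3d.io.read_point_cloud("wall_plane.ply")  # hypothetical file
filtered, kept_idx = wall_pcd.remove_statistical_outlier(
    nb_neighbors=20,  # neighbors used to estimate each point's mean distance
    std_ratio=2.0)    # drop points beyond 2 sigma of the global mean distance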

2.3.2. Door and Window Feature Extraction

This section focuses on processing each exterior wall segmented from the raw point cloud, as shown in Figure 5a,b, in Section 2.2. The pointcloud2orthoimage tool [31] is used to generate façade orthophotos with a specified ground sample distance (GSD) from photogrammetric or LiDAR point cloud data (Figure 5c). Next, façade analysis is conducted using a lightweight convolutional encoder–decoder model for the pixel-level segmentation of walls, doors, and windows (Figure 5d). Contour detection and minimum bounding rectangle fitting techniques are employed to derive simplified and normalized shapes for walls, doors, and windows (Figure 5e). Finally, the pywin32 library automates façade plan generation in AutoCAD, with Python scripts managing the opening of drawings, the creation of graphical elements, and file saving (Figure 5f).
The pointcloud2orthoimage tool [31] produces these orthophotos through interpolation and color processing, preserving both the geometric and color information of the point cloud.
After generating the 2D façade orthophoto, we developed a lightweight convolutional encoder–decoder model (Figure 6) for the pixel-level segmentation of the orthophoto, enabling the differentiation of objects such as walls, doors, and windows. The encoder component extracts features from the orthophoto through convolutional layers, while the decoder classifies these features into corresponding labels using additional convolutional layers. The model includes three max-pooling and three upsampling layers, ensuring the output label image matches the input orthophoto dimensions. The model employs integer encoding to represent multiple classes within a single output channel. Instead of using one-hot encoding (which requires multiple output channels), each class (e.g., walls, windows, doors) is assigned a unique integer value (e.g., 0 for background, 1 for windows, 2 for doors). The sigmoid activation function outputs continuous values (0–1), which are scaled to integers (0–255) to represent class labels. This approach reduces memory usage and computational costs compared to multi-channel softmax outputs. Its lightweight architecture is optimized for deployment on graphics cards with limited CUDA cores.
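The description above translates into a compact network. The following Keras sketch mirrors the three max-pooling and three upsampling stages and the single sigmoid output channel; the layer widths (16/32/64 filters) are assumptions, since the exact channel counts are not listed here.

from tensorflow.keras import layers, models

def build_model(patch=64):
    inp = layers.Input((patch, patch, 3))
    x = inp
    for filters in (16, 32, 64):      # encoder: conv + max-pool, three times
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    for filters in (64, 32, 16):      # decoder: upsample + conv, three times
        x = layers.UpSampling2D(2)(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)  # single 0-1 channel
    return models.Model(inp, out)

model = build_model()  # output spatial size equals the 64 x 64 input patch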
The choice of a custom lightweight encoder–decoder architecture over established models like U-Net or DeepLab was driven by two key considerations: (1) Computational efficiency: our architecture reduces model complexity by 60% compared to U-Net, enabling deployment on GPUs with limited CUDA cores (e.g., NVIDIA GTX 1060). (2) Task-specific optimization: façade segmentation primarily involves detecting rectangular windows/doors rather than complex textures, allowing us to prioritize spatial resolution over deep feature hierarchies.
To handle large orthophotos, we employed a splitting and assembling strategy based on a sliding window algorithm during model training and prediction. Specifically, the splitting algorithm moves a 64 × 64-pixel sliding window with a step size of 32 pixels along both the width and height of the image, splitting the input image into overlapping patches with a 50% overlap, each assigned a unique index. The assembling algorithm reconstructs the original image by aligning the patches using their indices, producing the final output.
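A sketch of the split and assemble steps is given below. Averaging the overlapping predictions is one reasonable blending rule; the rule actually used is not stated. The image is assumed to be padded so that its sides are multiples of the step size.

import numpy as np

def split(img, win=64, step=32):
    """Cut img into overlapping win x win patches with the given step."""
    patches, index = [], []
    for y in range(0, img.shape[0] - win + 1, step):
        for x in range(0, img.shape[1] - win + 1, step):
            patches.append(img[y:y + win, x:x + win])
            index.append((y, x))
    return np.stack(patches), index

def assemble(patches, index, shape, win=64):
    """Re-place each patch at its indexed position, averaging overlaps."""
    out = np.zeros(shape, dtype=float)
    count = np.zeros(shape, dtype=float)
    for patch, (y, x) in zip(patches, index):
        out[y:y + win, x:x + win] += patch
        count[y:y + win, x:x + win] += 1
    return out / np.maximum(count, 1)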
After obtaining the simplified façade element pixel coordinates through contour detection and minimum bounding rectangle fitting, these pixel coordinates need to be converted into real-world coordinates. Since the orthophoto’s coordinate origin is at the top-left corner while AutoCAD/Revit’s coordinate origin is at the bottom-left corner, a coordinate system transformation is performed to align the two origins. The real-world coordinate data are saved in a plain-text (.txt) file. Subsequently, the pywin32 library is used to automatically generate the front, back, left, and right façade plans in AutoCAD 2022. Python 3.8 scripts automate the process in AutoCAD, including opening files, creating graphical elements, and saving the completed drawings (Figure 7).
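The coordinate conversion itself is a one-line scale and flip; a minimal sketch, assuming the 5 mm/pixel GSD quoted in the case study:

def pixel_to_cad(px, py, image_height_px, gsd=0.005):
    """Map orthophoto pixel (px, py) to CAD coordinates in meters."""
    x = px * gsd
    y = (image_height_px - py) * gsd  # flip y: top-left -> bottom-left origin
    return x, y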

2.4. BIM Generation

During Revit model generation, feature information extracted from the point cloud data (e.g., walls, roofs, doors, and windows) is transformed into a Revit model using the Revit API, as shown in Figure 8. Extracted wall profiles and height data are utilized to generate walls in Revit via the Wall.Create method. For roofs, extracted boundaries and elevation data are applied to create flat or sloped roofs using methods such as NewExtrusionRoof. For doors and windows, profile data are employed to place the corresponding family instances at specified locations using the NewFamilyInstance method. Coordinate transformation is performed throughout the process to ensure alignment with Revit’s coordinate system. Parametric features, such as wall height and window size, are fine-tuned through the Revit API to align with actual design requirements. Once all elements are created and adjusted, the model is saved as a Revit project file (.rvt) for subsequent design and analysis.
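As a rough illustration of the wall step only, the Wall.Create call looks as follows when the Revit API is driven from a Python host such as pyRevit (an assumption; the host language used for the custom programming is not stated here). Profile coordinates are converted to feet, Revit’s internal length unit.

from Autodesk.Revit.DB import BuiltInParameter, Line, Transaction, Wall, XYZ

def create_wall(doc, p0, p1, level_id, height_m):
    """Create one wall from a 2D profile edge (coordinates in meters)."""
    ft = 3.28084  # meters -> feet
    curve = Line.CreateBound(XYZ(p0[0] * ft, p0[1] * ft, 0.0),
                             XYZ(p1[0] * ft, p1[1] * ft, 0.0))
    t = Transaction(doc, "Create wall from extracted profile")
    t.Start()
    wall = Wall.Create(doc, curve, level_id, False)  # non-structural wall
    wall.get_Parameter(BuiltInParameter.WALL_USER_HEIGHT_PARAM) \
        .Set(height_m * ft)  # unconnected height from the point cloud
    t.Commit()
    return wall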

3. Experiments

3.1. Case Study

The dataset was collected in Baoshan District, Shanghai, China, and encompassed 15 buildings and their surroundings, capturing diverse building types and environmental features, including vegetation, roads, and other structures. Approximately 200–300 images per building were captured from multiple angles to ensure comprehensive coverage for reconstruction. In total, 3676 images were collected at a resolution of 5280 × 3956.
Following data acquisition, the images were processed using ContextCapture Center to generate point clouds. Point clouds for all 15 building areas were created, resulting in approximately 500 million data points. A structurally complex building within the study area was selected to validate the proposed method (Figure 9).
The RANSAC algorithm is first used to segment point cloud planes, selecting those that satisfy the condition |nv| < ε as wall surfaces (Figure 10a). Next, the wall point cloud is projected onto the XOY plane along the Z-axis (Figure 10b). Using image edge detection techniques, the contour edges are fitted into straight lines to identify the region enclosed by the walls (Figure 10c). The point cloud within this region is treated as the roof point cloud (Figure 10d). The RANSAC algorithm is then applied again to perform plane fitting on the roof point cloud to obtain the roof plane model (Figure 10e). Finally, with the reconstructed roof and wall point clouds, we generate the BIM (Figure 10f).
For each wall point cloud obtained through multi-plane segmentation (Figure 11a), the pointcloud2orthoimage tool converts planar wall point clouds into orthophotos with a specified ground sample distance (GSD) (Figure 11b). A lightweight convolutional encoder–decoder model is then applied to the orthophoto for the pixel-level segmentation of façade elements, including walls, doors, and windows (Figure 11c). Contour detection and minimum bounding rectangle fitting are used to simplify and normalize the shapes of walls, doors, and windows (Figure 11d), from which door and window coordinates are extracted. These pixel coordinates are converted into real-world measurements using the GSD, followed by a coordinate system transformation to align with the CAD system. The pywin32 library in Python is then employed to automatically generate CAD drawings based on these coordinates (Figure 11e).
The lightweight convolutional encoder–decoder model was trained using a dataset of 300 annotated façade orthophotos (240 for training, 30 for validation, 30 for testing). To enhance robustness, data augmentation techniques including random rotation (±15°), horizontal/vertical flipping, and brightness adjustment (±20%) were applied during training. The model was optimized using the Adam algorithm with a learning rate of 1 × 10−4 and a batch size of 8. The loss function used is cross-entropy loss. Training proceeded for 200 epochs. The training configuration and hyperparameters are detailed in Table 1.
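A minimal Keras training sketch matching Table 1 is shown below. The augmentation layers are illustrative equivalents of the transforms listed above and would be applied per batch (e.g., inside a tf.data pipeline); binary cross-entropy is the form of cross-entropy loss that fits the single-channel sigmoid sketch in Section 2.3.2; and x_train, y_train, x_val, and y_val are assumed NumPy arrays of patches and label maps.

import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomRotation(15 / 360),               # about +/- 15 degrees
    layers.RandomFlip("horizontal_and_vertical"),  # horizontal/vertical flips
    layers.RandomBrightness(0.2),                  # about +/- 20% brightness
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy")
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=200, batch_size=8)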
Finally, the extracted data are imported into Revit, where the BIM is automatically generated through customized programming using the Revit API. Figure 12 illustrates how the BIM is reconstructed using the proposed method. The reconstructed BIM demonstrates accurate alignment with the point cloud data, highlighting the high precision and reliability of the method.

3.2. Evaluation Metrics

3.2.1. Semantic Segmentation Metrics

To evaluate the performance of the lightweight encoder–decoder model in recognizing doors and windows, we adopted standard semantic segmentation metrics: mean intersection-over-union (mIoU) and pixel accuracy (PA). The formulas are as follows:
\mathrm{mIoU} = \frac{1}{K} \sum_{k=1}^{K} \frac{TP_k}{TP_k + FP_k + FN_k}
\mathrm{PA} = \frac{\sum_{k=1}^{K} TP_k}{\sum_{k=1}^{K} \left( TP_k + FP_k + FN_k \right)}
where TP_k, FP_k, and FN_k are the true positives, false positives, and false negatives for class k (e.g., doors, windows, walls), and K is the number of classes.
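Computed from per-class pixel counts, the two metrics reduce to a few lines of NumPy; the sketch below follows the formulas above directly.

import numpy as np

def miou_and_pa(tp, fp, fn):
    """tp, fp, fn: length-K arrays of per-class pixel counts."""
    tp, fp, fn = (np.asarray(a, dtype=float) for a in (tp, fp, fn))
    iou = tp / (tp + fp + fn)          # per-class intersection over union
    return iou.mean(), tp.sum() / (tp + fp + fn).sum()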

3.2.2. Model Accuracy Metrics

To accurately evaluate the reconstruction accuracy of the building information model (BIM), we designed a set of model accuracy metrics comprising length evaluation metrics (length error and length accuracy percentage) and location evaluation metrics. For each building, a Leica D510 laser distance meter was used to measure the ground-truth values for the length and location evaluations. The length evaluation metrics are based on the widths and heights of doors and windows, as well as the lengths and heights of walls. The location evaluation metrics are calculated from the horizontal (x) and vertical (y) coordinates of doors and windows. The length evaluation metrics are calculated as follows:
\text{Length error} = \sum \frac{l_{measured}}{L} \left| l_{measured} - l_{model} \right|
\text{Length accuracy percentage} = \sum \frac{l_{measured}}{L} \left| 1 - \frac{l_{measured} - l_{model}}{l_{measured}} \right| \times 100\%
where L represents the sum of the measured lengths of all doors, windows, and walls, l_measured is the measured length, and l_model is the model-reconstructed length; each element’s contribution is thus weighted by its share l_measured/L of the total measured length.
The location evaluation metric is calculated as follows:
\text{Location error} = \frac{1}{N} \sum \sqrt{\left( x_{measured} - x_{model} \right)^2 + \left( y_{measured} - y_{model} \right)^2}
where N represents the total number of doors and windows, x_measured and y_measured are the measured horizontal and vertical coordinates of each door or window, and x_model and y_model are the corresponding reconstructed coordinates.
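Under the length-weighted reading of the formulas above, both metric groups again reduce to a few lines of NumPy; the sketch below assumes paired arrays of measured and reconstructed values.

import numpy as np

def length_metrics(l_measured, l_model):
    """Length-weighted error (same units as input) and accuracy percentage."""
    l_m, l_mod = np.asarray(l_measured), np.asarray(l_model)
    w = l_m / l_m.sum()                        # weights l_measured / L
    error = np.sum(w * np.abs(l_m - l_mod))
    accuracy = np.sum(w * np.abs(1 - (l_m - l_mod) / l_m)) * 100
    return error, accuracy

def location_error(xy_measured, xy_model):
    """Mean Euclidean distance between measured and modeled positions."""
    d = np.linalg.norm(np.asarray(xy_measured) - np.asarray(xy_model), axis=1)
    return float(d.mean())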
This evaluation metric system provides an objective assessment of the reconstructed building information model (BIM). By comparing the measured and reconstructed values, it quantifies the model’s precision, thereby enhancing its quality.

4. Results

This section presents the experimental results of the proposed method, focusing on two key aspects: door/window/wall recognition accuracy and BIM reconstruction performance.

4.1. Performance of the Deep Learning Model

The performance of the lightweight convolutional encoder–decoder model for door and window segmentation was evaluated using mean intersection-over-union (mIoU) and pixel accuracy (PA). For comparison, we implemented two state-of-the-art models, U-Net and DeepLabV3+, under the same training conditions.
As shown in Table 2, the proposed lightweight model achieves competitive accuracy (mIoU: 92.3% vs. 93.1% for U-Net), while reducing parameters by approximately 60% and inference time by approximately 40%. DeepLabV3+ achieves the highest mIoU (94.2%) but at the cost of 4.8× more parameters and 2.5× slower inference speed, making it less suitable for resource-constrained applications. The high PA (94.1%) of the proposed model demonstrates its robustness in pixel-wise classification, particularly for small objects like windows and doors.

4.2. Analysis of BIM Reconstruction Results

As shown in Table 3, the Leica D510 laser distance meter was used to measure the dimensions (height and width) and coordinates (horizontal and vertical) of 104 doors and windows, along with the dimensions (length and height) of 18 walls.
Combining these measurements with the corresponding data from the reconstructed model, we calculated the length evaluation metrics and location evaluation metrics. As shown in the calculation results in Table 4 and Table 5, the length error was 4.1 cm, the length accuracy percentage was 98.7%, and the location error was 5.4 cm. The model errors arise from multiple stages of the workflow, and we have analyzed their primary causes as follows. During data acquisition and point cloud generation, occlusions (e.g., dense vegetation) in certain areas of the building lead to incomplete or missing point cloud data. In the wall and roof feature extraction stage, the projection of wall point clouds onto the XOY plane assumes alignment with the coordinate system, which may cause distorted projections if the building has a rotated or non-orthogonal layout. The small errors in this step are likely to be amplified in the subsequent steps. During door and window feature recognition, local fluctuations in point cloud density or projection distortions, combined with the ground sample distance (GSD = 5 mm/pixel), can introduce errors in the calculated dimensions of doors and windows. Additionally, cumulative errors occur during the multi-step data transformation process (point cloud → CAD → BIM), as small deviations at each stage propagate and amplify in the final BIM. These factors collectively contribute to the observed errors, which are quantified as 4.1 cm in length and 5.4 cm in location.
These results indicate that the proposed modeling method exhibits a stable and consistent error distribution, and the model accuracy meets the predefined criteria, effectively completing the reconstruction task. Overall, the method proposed in this paper provides an efficient and reliable tool for reconstructing actual building models, offering improved precision compared to traditional methods.

5. Conclusions

In this study, we employed drone technology to capture high-resolution image data and processed the point cloud using multi-plane segmentation and feature extraction techniques, successfully achieving the rapid reconstruction of building information models (BIMs). The research findings demonstrate that the proposed method efficiently extracts external building features and generates BIMs that meet the accuracy standards of the construction industry. The key contributions of this study are summarized as follows.
  • Multi-plane segmentation: Using RANSAC-based multi-plane segmentation techniques, this method effectively separates wall, roof, and other planar features from point clouds, enabling the efficient and accurate extraction of building feature data. This automated approach not only enhances the speed of point cloud processing but also minimizes the need for manual intervention.
  • Key feature extraction: A novel approach was developed for extracting external wall and roof features. Additionally, a lightweight convolutional encoder–decoder deep learning model was employed to perform pixel-level segmentation on façade orthoimages, enabling the accurate and efficient extraction of door and window features.
  • Evaluation metrics: Length and position evaluation metrics were designed to comprehensively assess the reconstruction accuracy of the building information model. These metrics consider not only the dimensions of building elements but also their spatial accuracy, providing a more comprehensive evaluation of the model’s precision.
Despite the significant achievements of this study in the reconstruction of BIMs for existing buildings, some limitations still exist:
  • In cases where buildings are obstructed by trees, dense vegetation, or other structures, missing point cloud data may occur, which limits the accuracy of feature extraction for walls and their associated doors and windows.
  • This method is currently primarily suitable for buildings with relatively simple geometric shapes; improvements are needed to handle complex curved walls and irregular roof structures, particularly in terms of segmentation and feature fitting.
These minor defects can be efficiently addressed through minimal manual post-processing. In future work, we plan to continue optimizing the methodological framework, focusing on addressing the challenges of accurately extracting features from complex curved structures, particularly in terms of segmentation and model fitting. Additionally, as deep learning technologies continue to advance, we will explore more advanced models and algorithms, aiming to make significant improvements in the precise extraction of building information from point clouds. We believe that this research framework will play a vital role in supporting the digital management of complex buildings in BIM reconstruction and will contribute to further advancements in this field.

Author Contributions

Data curation, J.L.; methodology, J.L., D.W., and Q.J.; software, J.L. and D.W.; validation, J.L. and D.W.; formal analysis, D.W.; resources, D.W., H.J., and P.L.; project administration, H.J.; writing—original draft preparation, J.L.; writing—review and editing, J.L. and D.W.; visualization, J.L.; supervision, D.W. and P.L.; funding acquisition, H.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Shanghai Special Fund for Promoting High-Quality Industrial Development, grant number 2023-GZL-RGZN-01031.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

Authors Haili Jiang and Panpan Liu were employed by the company Shanghai Highway and Bridge (Group) Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Rocha, G.; Mateus, L. A survey of scan-to-BIM practices in the AEC industry—A quantitative analysis. ISPRS Int. J. Geo-Inf. 2021, 10, 564.
  2. Gao, X.; Shen, S.; Zhou, Y.; Cui, H.; Zhu, L.; Hu, Z. Ancient Chinese architecture 3D preservation by merging ground and aerial point clouds. ISPRS J. Photogramm. Remote Sens. 2018, 143, 72–84.
  3. Guo, H.; Chen, Z.; Chen, X.; Yang, J.; Song, C.; Chen, Y. UAV-BIM-BEM: An automatic unmanned aerial vehicles-based building energy model generation platform. Energy Build. 2024, 328, 115120.
  4. Pantoja-Rosero, B.G.; Rusnak, A.; Kaplan, F.; Beyer, K. Generation of LOD4 models for buildings towards the automated 3D modeling of BIMs and digital twins. Autom. Constr. 2024, 168, 105822.
  5. Wang, D.J.; Jiang, Q.M.; Liu, J.Z. Deep-Learning-Based Automated Building Information Modeling Reconstruction Using Orthophotos with Digital Surface Models. Buildings 2024, 14, 808.
  6. Urbieta, M.; Urbieta, M.; Laborde, T.; Villarreal, G.; Rossi, G. Generating BIM model from structural and architectural plans using Artificial Intelligence. J. Build. Eng. 2023, 78, 107672.
  7. Zhao, Y.; Deng, X.; Lai, H. Reconstructing BIM from 2D structural drawings for existing buildings. Autom. Constr. 2021, 128, 103750.
  8. Gu, S.; Wang, D. Component Recognition and Coordinate Extraction in Two-Dimensional Paper Drawings Using SegFormer. Information 2024, 15, 17.
  9. Li, Z.X.; Shan, J. RANSAC-based multi primitive building reconstruction from 3D point clouds. ISPRS J. Photogramm. Remote Sens. 2022, 185, 247–260.
  10. Zhang, W.; Li, Z.; Shan, J. Optimal Model Fitting for Building Reconstruction from Point Clouds. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 9636–9650.
  11. Deng, M.; Tan, Y.; Singh, J.; Joneja, A.; Cheng, J.C.P. A BIM-based framework for automated generation of fabrication drawings for façade panels. Comput. Ind. 2021, 126, 103395.
  12. Wang, B.; Li, M.; Peng, Z.; Lu, W. Hierarchical attributed graph-based generative façade parsing for high-rise residential buildings. Autom. Constr. 2024, 164, 105471.
  13. Zolanvari, S.M.I.; Laefer, D.F.; Natanzi, A.S. Three-dimensional building facade segmentation and opening area detection from point clouds. ISPRS J. Photogramm. Remote Sens. 2018, 143, 134–149.
  14. Fan, H.C.; Wang, Y.F.; Gong, J.Y. Layout graph model for semantic facade reconstruction using laser point clouds. Geo-Spat. Inf. Sci. 2021, 24, 403–421.
  15. Maltezos, E.; Doulamis, A.; Doulamis, N.; Ioannidis, C. Building Extraction From LiDAR Data Applying Deep Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2019, 16, 155–159.
  16. Chen, H.; Chen, W.; Wu, R.; Huang, Y. Plane segmentation for a building roof combining deep learning and the RANSAC method from a 3D point cloud. J. Electron. Imaging 2021, 30, 053022.
  17. Li, L.; Song, N.; Sun, F.; Liu, X.; Wang, R.; Yao, J.; Cao, S. Point2Roof: End-to-end 3D building roof modeling from airborne LiDAR point clouds. ISPRS J. Photogramm. Remote Sens. 2022, 193, 17–28.
  18. Otero, R.; Sanchez-Aparicio, M.; Lagüela, S.; Arias, P. Semi-automatic roof modelling from indoor laser-acquired data. Autom. Constr. 2022, 136, 104130.
  19. Dey, E.K.; Awrangjeb, M.; Stantic, B. Outlier detection and robust plane fitting for building roof extraction from LiDAR data. Int. J. Remote Sens. 2020, 41, 6325–6354.
  20. Sun, X.; Guo, B.; Li, C.; Sun, N.; Wang, Y.; Yao, Y. Semantic Segmentation and Roof Reconstruction of Urban Buildings Based on LiDAR Point Clouds. ISPRS Int. J. Geo-Inf. 2024, 13, 19.
  21. Albano, R. Investigation on Roof Segmentation for 3D Building Reconstruction from Aerial LIDAR Point Clouds. Appl. Sci. 2019, 9, 11.
  22. Hu, P.B.; Miao, Y.M.; Hou, M.L. Reconstruction of Complex Roof Semantic Structures from 3D Point Clouds Using Local Convexity and Consistency. Remote Sens. 2021, 13, 25.
  23. Hu, D.; Gan, V.J.; Yin, C. Robot-assisted mobile scanning for automated 3D reconstruction and point cloud semantic segmentation of building interiors. Autom. Constr. 2023, 152, 104949.
  24. Wang, B.; Chen, Z.; Li, M.; Wang, Q.; Yin, C.; Cheng, J.C. Omni-Scan2BIM: A ready-to-use Scan2BIM approach based on vision foundation models for MEP scenes. Autom. Constr. 2024, 162, 105384.
  25. Yang, F.; Pan, Y.T.; Zhang, F.S.; Feng, F.Y.; Liu, Z.J.; Zhang, J.Y.; Liu, Y.; Li, L. Geometry and Topology Reconstruction of BIM Wall Objects from Photogrammetric Meshes and Laser Point Clouds. Remote Sens. 2023, 15, 21.
  26. Wang, S.; Park, S.; Park, S.; Kim, J. Building façade datasets for analyzing building characteristics using deep learning. Data Brief 2024, 57, 110885.
  27. Bassier, M.; Vergauwen, M. Unsupervised reconstruction of Building Information Modeling wall objects from point cloud data. Autom. Constr. 2020, 120, 103338.
  28. Xu, B.; Jiang, W.S.; Shan, J.; Zhang, J.; Li, L.L. Investigation on the Weighted RANSAC Approaches for Building Roof Plane Segmentation from LiDAR Point Clouds. Remote Sens. 2016, 8, 23.
  29. He, Y.; Wu, X.; Pan, W.; Chen, H.; Zhou, S.; Lei, S.; Gong, X.; Xu, H.; Sheng, Y. LOD2-Level+ Low-Rise Building Model Extraction Method for Oblique Photography Data Using U-NET and a Multi-Decision RANSAC Segmentation Algorithm. Remote Sens. 2024, 16, 2404.
  30. Zhao, Q.; Gao, X.; Li, J.; Luo, L. Optimization Algorithm for Point Cloud Quality Enhancement Based on Statistical Filtering. J. Sensors 2021, 2021, 7325600.
  31. Jiang, Y.H.; Han, S.S.; Bai, Y. Scan4Facade: Automated As-Is Facade Modeling of Historic High-Rise Buildings Using Drones and AI. J. Archit. Eng. 2022, 28, 22.
Figure 1. The overall flow of the proposed BIM reconstruction method.
Figure 2. 3D point cloud model generated using ContextCapture Center.
Figure 3. Point cloud plane segmentation and plane fusion.
Figure 4. External wall and roof feature extraction process.
Figure 5. Door and window feature extraction process.
Figure 6. Lightweight convolutional encoder–decoder model (patch size = 64 × 64 pixels).
Figure 7. Automatically generated architectural façade AutoCAD drawing.
Figure 8. Process of generating BIM.
Figure 9. Point cloud for the case study.
Figure 10. Extraction process of external wall and roof features in the case study. (a) Wall point cloud; (b) projection of the wall point cloud; (c) fitted wall contour lines enclosing the footprint; (d) roof point cloud; (e) roof plane model; (f) BIM.
Figure 11. Extraction process of door and window feature in the case study. (a) Raw point cloud; (b) wall façade orthoimage (GSD = 5 mm/p); (c) semantic segmentation based on convolutional network; (d) boundary regularization and coordinate extraction; (e) architectural façade AutoCAD drawing.
Figure 12. The reconstructed BIM in the case study.
Table 1. Training configuration and hyperparameters.

Parameter            Value/Description
Dataset size         300 images (240/30/30 split)
Data augmentation    Rotation, flipping, brightness adjustment
Batch size           8
Optimizer            Adam
Learning rate        1 × 10−4
Loss function        Cross-entropy loss
Epochs               200
Table 2. Model performance comparison.

Model            Parameters (M)   mIoU (%)   PA (%)   Inference Time (ms)
U-Net            31.0             93.1       95.7     120
DeepLabV3+       59.8             94.2       96.5     180
Proposed Model   12.4             92.3       94.1     70
Table 3. Measured data in this case.

Category                         Measured Quantity
Door and window width            104
Door and window height           104
Window horizontal coordinate x   104
Window vertical coordinate y     104
Wall length                      18
Wall height                      18
Table 4. Calculation of length evaluation metrics.

Instance          l_measured (m)   l_model (m)   |l_measured − l_model| (m)
Door width 1      1.35             1.32          0.03
Door height 1     2.15             2.21          0.06
Door width 2      1.34             1.39          0.05
Door height 2     2.13             2.10          0.03
Window width 1    2.03             1.98          0.05
Window height 1   2.41             2.40          0.01
Window width 2    4.53             4.55          0.02
Window height 2   2.42             2.45          0.03
Total             L = 78                         0.74

\text{Length error} = \sum \frac{l_{measured}}{L} \left| l_{measured} - l_{model} \right| = 4.1 \text{ cm}
\text{Length accuracy percentage} = \sum \frac{l_{measured}}{L} \left| 1 - \frac{l_{measured} - l_{model}}{l_{measured}} \right| \times 100\% = 98.7\%
Table 5. Calculation of location evaluation metrics.

Instance   x_measured (m)   y_measured (m)   x_model (m)   y_model (m)
Door 1     1.74             0.40             1.69          0.39
Door 2     37.8             0.41             38.2          0.38
Window 1   0.66             17.55            0.62          17.60
Window 2   3.57             17.60            3.55          17.64
Window 3   6.22             17.62            6.25          17.63

\text{Location error} = \frac{1}{N} \sum \sqrt{\left( x_{measured} - x_{model} \right)^2 + \left( y_{measured} - y_{model} \right)^2} = 5.4 \text{ cm}