Automatic Classification of BIM Object Based on IFC Data Using the Uniclass Classification Standard

Tang, Shi; Bito, Takamasa; Shide, Kazuya

doi:10.3390/buildings15132347


by Shi Tang ¹,*, Takamasa Bito ² and Kazuya Shide ³

¹ Graduate School of Engineering and Science, Shibaura Institute of Technology, Tokyo 135-8548, Japan

² STARTS Research Institute, Ltd., Tokyo 103-0027, Japan

³ School of Architecture, Shibaura Institute of Technology, Tokyo 135-8548, Japan

* Author to whom correspondence should be addressed.

Buildings 2025, 15(13), 2347; https://doi.org/10.3390/buildings15132347

Submission received: 21 May 2025 / Revised: 24 June 2025 / Accepted: 2 July 2025 / Published: 4 July 2025

(This article belongs to the Section Construction Management, and Computers & Digitization)


Abstract

Classification of BIM objects is critical for enhancing information interoperability and standardization within construction projects; however, research on automated BIM object classification based on standardized classification systems remains limited. This study therefore proposes an automated method for classifying BIM objects from IFC data under the Uniclass system, aiming to improve standardization, semantic clarity, and practical applicability. The method first assigns Uniclass codes to 8715 BIM objects, then extracts 13 IFC-derived feature variables covering semantic, spatial, and dimensional information, and uses two categories of Uniclass codes, from the Elements/Functions (EF) and Systems (Ss) tables, as classification labels, comprising 11 and 17 classes, respectively. A Random Forest model with 100 decision trees, evaluated with 10-fold cross-validation, then performs the automatic classification. Experimental results show that the method achieves classification accuracies of 1.00 and 0.99 for the Elements/Functions and Systems classification tasks, respectively. This study demonstrates that accurate, fine-grained classification of BIM objects can be achieved using only low-LOD IFC data, contributing to standardized information structuring and facilitating intelligent model management during the early design phase.

Keywords:

Building Information Modeling; IFC; Random Forest; Uniclass; classification

1. Introduction

In the Architecture, Engineering, and Construction (AEC) industry, the delivery of building projects requires a high degree of collaboration and a seamless flow of information [1]. Building Information Modeling (BIM) is a technology for digitally representing the information contained in buildings. BIM models, characterized by rich and well-structured data, support efficient information collaboration throughout the entire lifecycle of a construction project, improving design and construction efficiency while reducing waste and cost overruns [2]. Within the BIM environment, the Industry Foundation Classes (IFC) format serves as a centralized data storage medium for building projects [3]. Unlike data confined to traditional construction workflows, IFC data can be integrated with emerging technologies such as Big Data, Artificial Intelligence (AI), the Internet of Things (IoT), unmanned aerial vehicles (UAVs), and Geographic Information Systems (GIS), promoting advanced applications of BIM in construction engineering projects [4,5]. To support such applications, BIM objects should be classified so that their information can be managed in a structured way, improving usability and maintainability [6]. Uniclass is a unified classification system for the construction industry, developed in the United Kingdom, that provides a coding framework and hierarchical structure for systematically organizing components, functions, and other elements [7]. It improves the consistency and interoperability of information management and facilitates fine-grained management and efficient reuse of information. Pupeikis et al. compared Denmark's Construction Classification System (CCI) with the United Kingdom's Uniclass 2015 in the context of BIM applications and found that Uniclass 2015 offers certain advantages in classification structure and applicability, enabling more effective information management and collaboration in BIM projects [8].

Although some studies have attempted to encode prefabricated components, traditional coding methods are often time-consuming, labor-intensive, and error-prone [9]. Choi et al. proposed a rule-based automatic classification method that improves efficiency compared with manual classification [10]. However, rule-based classification depends on rules defined manually from expert knowledge: when data or task conditions change, the rules must be updated by hand, and building the rule sets requires experienced experts whose knowledge is often subjective and difficult to standardize or share. In recent years, advances in artificial intelligence have given rise to classification methods based on image recognition and semantic analysis, significantly improving the accuracy and adaptability of automatic classification [11]. Liu et al. proposed an Automatic Fine-Grained BIM Element Classification method that uses a multimodal deep learning (MMDL) approach, integrating geometric and semantic information to improve classification accuracy and efficiency and thereby supporting information management and decision-making during design and construction [12]. Seydgar evaluated several point-cloud-based deep neural network models for IFC object classification and reported strong results in the automatic classification of BIM information [13]. However, these methods focus primarily on the geometric shapes of construction components and do not adopt a standardized classification framework. Xu et al. developed an automated classification and coding system that uses IFC data from BIM models as feature variables, achieving unique identification and information sharing for prefabricated components [14]. Although this study was the first to apply a standardized classification approach to BIM objects, it covered only prefabricated components rather than all BIM object types. Comprehensive classification of BIM data under standardized systems therefore remains largely unexplored.

To address the lack of standardized, fine-grained classification for BIM objects, this study proposes an automated method that uses low Level of Development (LOD 200) IFC data. It examines whether BIM objects can be classified accurately from low-LOD input alone, and whether such classification can follow a standard system such as Uniclass. The study also evaluates whether a data-driven model can improve modeling efficiency and data interoperability. These questions arise from a common issue in BIM workflows: many components still require manual classification, which is time-consuming, inconsistent, and dependent on individual experience. By addressing these questions, the study demonstrates that a Random Forest model based on semantic, spatial, and dimensional features can provide a practical and scalable solution for automatic classification during early design stages.

The experiment begins with BIM data collection and Uniclass code assignment. IFC data are then extracted to generate the feature variables of the dataset. A machine learning model is trained to classify BIM objects into the detailed Uniclass categories used in this study (EF Level 3 and Ss Level 4). Finally, the performance of the Random Forest algorithm is compared against other models.

The remainder of this paper is organized as follows: Section 2 reviews existing research on BIM object classification, classification methods based on building information classification standards, and the application of the Random Forest algorithm for data classification. Section 3 details the proposed methodological framework, including the Uniclass Elements/Functions (EF) and Systems (Ss) classification structures adopted in this study, the data collection and feature selection methods for machine learning model training, and the experimental analysis methods. Section 4 presents the data preprocessing procedures, the classification models employed, and the experimental results. Section 5 discusses the experimental outcomes, summarizes the research contributions, compares the findings with existing studies, and outlines the study’s limitations and future research directions. Finally, Section 6 concludes the paper.

2. Related Work

2.1. Building Classification System

Architectural forms vary widely in both structure and function. Their basic components usually include structural systems, envelope elements, and interior finishes. To clearly define building characteristics and ensure design suitability, construction-related information must be organized within a classification framework [15]. Building Information Classification Systems (BICS) began to develop in the early 20th century in the United States and Europe. However, due to differences in laws, regulations, and construction practices, classification methods and structures vary across regions [2].

In 1948, Sweden developed the SfB classification system, which was grounded in practical building content and is considered one of the earliest architectural classification systems in the world; it was subsequently adopted by other countries, including the United Kingdom. In 1963, the precursor to MasterFormat, known as the CSI Format for Construction Specification, was introduced in the United States to organize construction project documents by work sections and trades. Similarly, in 1987 the United Kingdom released the first version of the Common Arrangement of Work Sections (CAWS), which provided a standardized structure for classifying construction work packages by task type and process. In addition, the Royal Institute of British Architects (RIBA) refined and extended the SfB system into SfB/UDC and, in 1968, proposed the Construction Index (CI)/SfB system. International standardization of construction classification systems was subsequently led by the International Organization for Standardization (ISO). In 1994, ISO published Technical Report ISO/TR 14177, which addressed the classification of information in the building production process, and in 2001 it formally released ISO 12006-2 (Building construction: Organization of information about construction works). To align with international standards, many countries restructured their national classification systems. For example, in 1997 the UK revised its CI/SfB system into Uniclass, and in North America MasterFormat and UniFormat were integrated into OmniClass on the basis of ISO 12006-2. In Japan, the Japan Construction Information Center (JACIC) developed the Japan Construction Classification System (JCCS). Other countries established systems tailored to their own construction contexts, such as StLB in Germany, CCS in Denmark, SfB+BSAB in Sweden, and Building 90 (Talo 90) in Finland.

In summary, building classification systems worldwide have gradually evolved in line with ISO 12006-2, showing a clear trend toward systematization. In recent years, as digital technologies have become more integrated into construction, these systems have continued to develop to support the digitalization of building information. With the growing use of BIM, classification systems are receiving increasing attention and have become key tools for linking project data across the building lifecycle.

In the United Kingdom, early practice relied on the CI/SfB Construction Information Index Handbook published by RIBA in 1976, a faceted classification system composed of five classification tables. Subsequently, in 1987, RIBA, together with the Royal Institution of Chartered Surveyors (RICS) and other organizations, established the Construction Project Information Committee (CPIc), which developed the Uniclass classification system in 1997. The system was later revised several times: it became Uniclass2 in 2013, was updated to Uniclass2015 in 2015, and was renamed simply Uniclass in 2023, giving four major versions in total. Uniclass is a unified classification system used across all building-related sectors. It covers a wide range of projects and objects, from infrastructure such as railways down to components such as station surveillance cameras. With the adoption of BIM, Uniclass offers a comprehensive and versatile framework for infrastructure, landscapes, engineering services, and buildings, supporting all phases of the project lifecycle. Although both Uniclass and OmniClass are established classification systems developed in alignment with ISO 12006-2, this study adopts Uniclass 2015 because of its clearly defined hierarchical structure. As our research moves toward multi-level analysis of building components, Uniclass provides a suitable framework for structured classification and progressive refinement of BIM data.

The Uniclass classification system consists of 15 interrelated tables that encompass various aspects of the built environment, ranging from functional requirements to component-level details. Each table is identified by a two-letter code and is designed to support data at different levels of granularity throughout the lifecycle of a construction project. Table 1 summarizes the names, abbreviations, and functional descriptions of these tables, highlighting their respective roles in facilitating structured and interoperable information management in BIM applications. This overview is adapted from the official Uniclass documentation published by NBS.

The Elements/Functions (EF) table classifies the components and functional parts of a building. “Elements” are physical parts, such as floors, walls, and roofs, or major structural parts such as foundations and piers; “Functions” are the services provided by spaces or equipment. The Systems (Ss) table covers groups of components that work together to form elements or deliver functions. For example, a pitched roof system includes rafters, tiles, insulation, ceiling finishes, and other parts, and a heating system may consist of boilers, pipes, tanks, and radiators. Through its hierarchical structure, Uniclass supports detailed organization of building information and meets the needs of BIM throughout the project lifecycle. This study adopts Uniclass for its clear logic, structured format, and flexibility in supporting multi-dimensional classification, including the EF and Ss tables and potential integration with the Products (Pr) table.

2.2. BIM Object Classification

The classification of BIM objects plays a crucial role in improving the organization and retrieval efficiency of construction information, facilitating information sharing and collaboration among project teams, reducing design and construction errors, and ultimately lowering costs and improving project quality [1,16]. Early research primarily relied on expert systems to address this issue. Zhi et al. proposed a semi-explicit coding method for prefabricated building components, aiming to improve the dissemination and integration of information for prefabricated components in China through a flexible and widely applicable coding system [17]. Parece et al. developed a tool based on Building Information Modeling and a construction classification system to assess building carbon emissions. This tool integrated the classification system into BIM models, enabling adaptability across different development stages and modeling techniques, thereby simplifying the evaluation of design options and reducing the need for repeated modeling [18]. These studies focused on optimizing the organization and application of construction information through the integration of BIM technology and classification systems [17,18]. However, such methods have limited automation capabilities. When classification tasks involve a broad target range and complex data, it becomes difficult to ensure high efficiency and accuracy.

As a result, researchers have begun to explore data-driven methods, such as machine learning, which can learn and generalize classification rules from the data itself, thereby overcoming the efficiency and scalability limitations of rule-based approaches [9,19]. Utkucu et al. proposed an integrated method combining semantic keyword search, rule-based reasoning, machine learning based on object geometry, and deep learning based on visual shape features. This approach enabled automatic classification of architectural and MEP BIM objects, significantly improving the efficiency, interoperability, and accuracy of data exchange in building performance assessments [20]. Yu et al. proposed an ensemble deep learning method that combines a multi-view convolutional neural network (MVCNN) and a multilayer perceptron (MLP), leveraging image and adjacency features of BIM objects to enhance classification performance [21]. Koo et al. applied the support vector machine (SVM) algorithm to classify BIM objects automatically based on geometric and physical relationship features of construction objects, aiming to evaluate their semantic integrity [22]. These studies demonstrate that data-driven methods offer stronger interoperability, higher efficiency, and better classification accuracy in BIM object classification tasks [20,21,22]. However, challenges remain in terms of feature selection, model complexity, and data requirements. Most existing BIM object classification studies rely primarily on image-based datasets, which often lead to increased model complexity and a strong dependence on high-quality data.

2.3. Automatic Classification by Machine Learning

In recent years, with advances in computational power and the availability of large datasets, machine learning has achieved significant breakthroughs across various fields [23,24,25]. In strategic management research, for example, machine learning classification combined with techniques such as natural language processing has improved the quantification of key variables as well as variable selection, topic extraction, and pattern recognition; classification models such as Random Forest and Support Vector Machine (SVM) in particular have been shown to improve the effectiveness and robustness of model estimation [26]. Current research focuses mainly on numerical data, text classification, and image recognition tasks. In construction engineering, Kim et al. applied a machine learning method based on terrestrial laser scanning (TLS) and density-driven techniques to classify rebar diameters automatically, improving the automation and accuracy of rebar size detection [27]. Zeng et al. used machine learning to classify and predict the failure modes, load-bearing capacities, and effective stiffness of corroded reinforced concrete columns, improving the precision and efficiency of seismic performance assessment [28]. Li et al. employed machine learning-based classification to assess the defect conditions of wooden columns in ancient building walls, automating structural health monitoring of heritage structures [29]. Although machine learning-based classification has been applied to structural inspection, performance evaluation, and facility maintenance in the construction industry, its application to the automatic classification of BIM objects remains comparatively limited and is an important research topic.

Emunds et al. proposed SpaRSE-BIM, a sparse convolution-based neural network model for classifying BIM geometric data in IFC format, enhancing the semantic richness of BIM models [30]. Koo et al. applied the SVM algorithm to classify BIM model objects automatically based on geometric and physical relationship features of building objects in order to verify semantic integrity [22]. Liu et al. developed a multimodal deep learning approach that integrates the geometric, semantic, and topological features of BIM objects to achieve fine-grained classification of building components [12]. These studies demonstrate that machine learning methods can classify BIM objects with high accuracy, enabling precise recognition and management of building information. Research on classifying BIM objects according to standardized industry classification systems nevertheless remains limited and warrants further exploration. Xu et al. introduced an automated classification and coding method for prefabricated building components based on BIM information and the Random Forest algorithm; their method extracted IFC information from components and classified BIM objects according to the OmniClass classification system [14]. However, the feature variables required in their study included not only geometric and spatial information but also material attributes, which are often unavailable during the early design stages. Consequently, their method could not be applied until the construction documentation phase. Additionally, their research focused solely on prefabricated components and did not extend to all types of building objects. Across these BIM object classification studies, standard supervised learning metrics such as accuracy, precision, recall, and F1-score are commonly used to evaluate model performance; these metrics have become a de facto standard in machine learning-based classification research, allowing model effectiveness to be benchmarked across different data distributions and classification tasks [12,14,22].

These studies illustrate the expanding potential of machine learning techniques in BIM object classification across diverse scenarios. However, most existing approaches either rely on high-LOD geometric or image-based data or employ custom or non-standard coding schemes, limiting their applicability during the early design phase. Notably, Xu et al. (2022) [14] leveraged a Random Forest model to classify prefabricated components using OmniClass, yet their method required detailed material information and was constrained to a narrow object scope. In contrast, the present study builds upon these efforts by proposing a streamlined classification pipeline using low-LOD IFC data and standard Uniclass codes, aiming for broader applicability across building objects. The full methodological implementation is elaborated in Section 3.

2.4. Comparative Summary of BIM Object Classification Methods

To provide a structured overview of current approaches in BIM object classification, Table 2 compares representative studies in terms of classification methods, adopted standards, feature types, level of development (LOD) requirements, and application scopes. This comparison highlights the diversity of algorithms, input modalities, and classification frameworks applied in recent research.

3. Methodology

3.1. Research Process

This study proposes an automated classification methodology, as illustrated in Figure 1. First, to create the target labels for supervised learning, a mapping was established between BIM objects and the Uniclass classification system. The Uniclass EF and Ss tables were adopted as the classification basis, and each IfcEntity was assigned one EF label and one Ss label, generating the target variables required for model training. To ensure reliable and consistent code assignment, all annotations underwent multi-party verification: the initial coding was performed by the primary researcher and then reviewed collaboratively by the academic supervisor and industry collaborators, which ensured the consistency and standardization of BIM object classification. Second, IFC4-format files were exported from a set of BIM models to create the initial dataset. The IFC files were parsed with the BIM visualization tool BIMvision to extract the IfcEntities representing individual building components. Each entity carried three categories of feature attributes: semantic attributes, such as LoadBearing (whether the element is load-bearing) and Storey (the associated floor); spatial features, including global coordinates (GlobalX, GlobalY, GlobalZ) and bounding box dimensions; and dimensional information, such as maximum projected area, total area, and volume.
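BIMvision was used interactively for this step. As a hedged, tool-independent illustration of what "extracting IfcEntities" involves, the sketch below counts building-element instances in the STEP (ISO 10303-21) text of an IFC file using a regular expression. The entity subset `BUILDING_ELEMENTS` and the function `count_entities` are hypothetical, and the approach assumes each instance record starts on its own line; a production pipeline would use a full IFC parser rather than regular expressions.

```python
import re
from collections import Counter

# Matches the start of an instance record, e.g. "#120=IFCWALL('2O2Fr...',#5,...);"
# Assumption: each record begins at the start of a line.
INSTANCE = re.compile(r"^#(\d+)\s*=\s*(IFC[A-Z0-9]+)\s*\(", re.MULTILINE)

# Hypothetical subset of IFC4 building-element entity names to keep.
BUILDING_ELEMENTS = frozenset({
    "IFCWALL", "IFCWALLSTANDARDCASE", "IFCSLAB", "IFCROOF", "IFCDOOR",
    "IFCWINDOW", "IFCCOLUMN", "IFCBEAM", "IFCSTAIR", "IFCRAILING",
})

def count_entities(step_text: str, keep=BUILDING_ELEMENTS) -> Counter:
    """Count instances of the selected IFC entity types in IFC-SPF text."""
    counts = Counter()
    for _instance_id, entity in INSTANCE.findall(step_text):
        if entity in keep:  # geometry and property records are skipped
            counts[entity] += 1
    return counts
```

Geometry records such as `IFCCARTESIANPOINT` also match the pattern but are filtered out, so only component-level entities are counted.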

To handle missing values in the original data, the K-Nearest Neighbors (KNN) method was used for imputation, resulting in a cleaned dataset. The data were then randomly split into training and testing sets. The Random Forest algorithm was selected as the classification model, using the extracted features to predict Uniclass codes for BIM objects. The model was trained in batches and evaluated on the testing set. The final output established mappings between BIM objects and their corresponding Uniclass codes. For example, an IfcWall was classified as Ef_25_10_40 (Internal walls) and Ss_25_10_30_35 (Gypsum board partition systems). Further details of the methodology are provided in the following sections.
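The paper does not specify its KNN imputation settings. The stdlib-only sketch below shows the general idea under stated assumptions: each missing numeric value (`None`) is replaced by the mean of that feature over the `k` nearest rows that observe it, where distance is Euclidean over the features both rows share, scaled up for the coordinates that could not be compared (the weighting used by scikit-learn's `nan_euclidean` metric). The name `knn_impute` and the default `k=5` are illustrative; categorical attributes such as Storey would need separate handling.

```python
import math
from typing import List, Optional

def knn_impute(rows: List[List[Optional[float]]], k: int = 5) -> List[List[float]]:
    """Return a copy of `rows` with each None replaced by the mean of that
    feature over the k nearest rows that have it observed."""
    n_features = len(rows[0])
    result = []
    for i, row in enumerate(rows):
        if None not in row:
            result.append(list(row))
            continue
        neighbours = []
        for j, other in enumerate(rows):
            if j == i:
                continue
            shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
            if not shared:
                continue  # no commonly observed feature, so no distance
            squared = sum((a - b) ** 2 for a, b in shared)
            # Scale for the coordinates that could not be compared (nan_euclidean weighting)
            neighbours.append((math.sqrt(squared * n_features / len(shared)), other))
        neighbours.sort(key=lambda pair: pair[0])
        filled = list(row)
        for f in range(n_features):
            if filled[f] is None:
                donors = [other[f] for _, other in neighbours if other[f] is not None][:k]
                # Fallback of 0.0 only when no row observes this feature (simplifying assumption)
                filled[f] = sum(donors) / len(donors) if donors else 0.0
        result.append(filled)
    return result
```

For example, with rows `[[1.0, 2.0], [1.1, None], [5.0, 9.0], [1.2, 2.4]]` and `k=2`, the two nearest rows to `[1.1, None]` are the first and last, so the gap is filled with 2.2. scikit-learn's `KNNImputer` implements the same idea with vectorized distances.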

3.2. Uniclass Classification System

In Uniclass, the EF and Ss tables have a faceted structural relationship characterized by “A Part of (Composition)” and “A Type of (Selection)” associations, which together form a unified hierarchical structure [31]. Specifically, the EF and Ss tables share identical numbering at Level 1 (Group) and Level 2 (Subgroup) and are mapped to each other through “A Part of” associations. For example, as shown in Figure 2, for wall-related objects the first-level EF entry “EF_25 Wall and barrier elements” corresponds to “Ss_25 Wall and barrier systems” in Ss; at the second level, “EF_25_10 Walls” includes several wall systems, such as “Ss_25_10 Framed wall systems” and “Ss_25_11 Monolithic wall structure systems.” Thus, under “EF_25_10 Walls,” more detailed information such as “Ss_25_10_32 Framed wall systems” can be nested, establishing a hierarchical link between EF Level 2 and Ss Level 3. Moreover, EF Level 3 comprises types such as External walls, Internal walls, and Parapet walls, forming a “Type of” structure; Ss Level 4 presents a corresponding detailed structure. Table A1 in Appendix A lists all EF and Ss codes included in the dataset. In terms of LOD, EF Levels 1 and 2 are suited to lower LOD stages (e.g., LOD 100–200), representing the basic functional classification of objects, whereas EF Level 3 and Ss Level 4 are suited to higher LOD stages (e.g., LOD 300 and above), providing more detailed categorization of components and system divisions.
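Because every Uniclass code encodes its own position in the hierarchy, the level of a label and its ancestors can be recovered from the string alone. The sketch below is a minimal illustration with hypothetical helper names (`parse_uniclass`, `shared_levels`); it counts levels as in this section, so EF_25 is Level 1 (Group) and the table prefix is not counted as a level.

```python
from typing import List, NamedTuple

class UniclassCode(NamedTuple):
    table: str          # e.g. "EF" or "Ss"
    numbers: List[str]  # e.g. ["25", "10", "40"]

    @property
    def level(self) -> int:
        # Level 1 = Group, Level 2 = Subgroup, Level 3 = Section, Level 4 = Object
        return len(self.numbers)

    def prefix(self, level: int) -> str:
        """Ancestor code at the given level, e.g. "EF_25_10" for level 2."""
        return "_".join([self.table] + self.numbers[:level])

def parse_uniclass(code: str) -> UniclassCode:
    table, *numbers = code.split("_")
    if not numbers or not all(n.isdigit() for n in numbers):
        raise ValueError(f"not a Uniclass code: {code!r}")
    return UniclassCode(table, numbers)

def shared_levels(a: str, b: str) -> int:
    """Number of leading levels on which two codes (from any tables) share numbering."""
    count = 0
    for x, y in zip(parse_uniclass(a).numbers, parse_uniclass(b).numbers):
        if x != y:
            break
        count += 1
    return count
```

For the wall example in this section, `parse_uniclass("EF_25_10_40").level` is 3, `parse_uniclass("Ss_25_10_30_35").level` is 4, and `shared_levels("EF_25_10_40", "Ss_25_10_30_35")` is 2, reflecting the shared Group and Subgroup numbering.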

Figure 3 shows the hierarchical structure of the Elements/Functions (EF) and Systems (Ss) classifications in the Uniclass system. It illustrates how both EF and Ss categories are broken down into five levels, with “part of” and “type of” relationships connecting the layers. Structurally, EF Level 2 and Ss Level 2 can be linked directly to represent how buildings are organized into systems and functions. However, this connection alone may not provide enough detail for accurate information management. To address this, EF Level 3 is inserted between EF Level 2 and Ss Level 2, adding more specific functional types and strengthening the classification logic. This layered design supports the semantic interpretation of BIM objects, particularly for automatic classification. It also motivates the use of EF and Ss levels as classification targets in this study and highlights the value of a standardized mapping approach.

3.3. Data Acquisition and Feature Selection

Table 3 shows the composition of the dataset used in this study. The data come from five residential building projects located in Japan. In total, 8715 data records were collected for training the machine learning model. The dataset size is similar to those used in earlier studies [12,22]. Since this study aims to test automatic classification using low-LOD IFC data under the Uniclass system, the dataset shown in Table 3 reflects a typical and practical use case. It is representative of early-stage BIM modeling scenarios where detailed geometry is limited but classification is still needed.

Figure 4 shows an example of how BIM objects are represented in the dataset. It illustrates a wall object along with its assigned Uniclass codes and extracted IFC features. The EF code “EF_25_10_40” represents “Internal walls,” and the Ss code “Ss_25_10_30_35” stands for “Gypsum board partition systems.” In addition to these classification labels, the object includes detailed IFC-based attributes, such as spatial coordinates, bounding box dimensions, and volume. These features are grouped into three categories: semantic, spatial and dimensional.

Semantic attributes: including IfcEntity, Storey, LoadBearing, and IsExternal;
Spatial features: including the global spatial coordinates within the model (GlobalX, GlobalY, GlobalZ) and the bounding box dimensions (length, width, and height);
Dimensional information: including Area max, Area total, and Volume.
The selection of semantic, spatial, and dimensional features in this study is informed by previous literature summarized in Section 2 [12,14,22]. The feature set is based on prior research and adjusted to fit the classification goals of the Uniclass system.
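The 13 features listed above can be held in a single typed record per BIM object. In the minimal sketch below, the field names are hypothetical snake_case renderings of the attributes in this section, and the integer encoding of categorical values is an assumption, since the paper does not state how IfcEntity or Storey were encoded for the model.

```python
from dataclasses import dataclass, fields
from typing import Dict, List

@dataclass
class BimObjectFeatures:
    # Semantic attributes
    ifc_entity: str       # e.g. "IfcWall"
    storey: str           # associated floor
    load_bearing: bool
    is_external: bool
    # Spatial features: global coordinates and bounding box dimensions
    global_x: float
    global_y: float
    global_z: float
    bbox_length: float
    bbox_width: float
    bbox_height: float
    # Dimensional information
    area_max: float
    area_total: float
    volume: float

def to_vector(rec: BimObjectFeatures, vocab: Dict[str, Dict[str, int]]) -> List[float]:
    """Encode one record as 13 floats.

    Strings receive per-column integer codes, assigned on first sight and stored
    in `vocab`; booleans become 0.0/1.0; numbers are passed through unchanged.
    """
    vector = []
    for f in fields(rec):
        value = getattr(rec, f.name)
        if isinstance(value, str):
            column = vocab.setdefault(f.name, {})
            value = column.setdefault(value, len(column))
        vector.append(float(value))
    return vector
```

Integer codes impose an arbitrary order on categories. Tree-based models such as Random Forest are generally less sensitive to this than distance-based models, but one-hot encoding is a common alternative.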

As shown in Figure 5, all input features correspond to information at LOD 200 or lower, which can be directly extracted from low-LOD IFC models without requiring highly detailed modeling information. In contrast, the inferred target variables (EF/Ss codes) correspond to classification information typically associated with approximately LOD 300. Therefore, this study establishes an automatic classification method that predicts higher-detail classification labels based on low-detail IFC data.

In addition, shared parameters can be added to BIM objects in Revit models using the Autodesk Interoperability Tools plugin, enabling the assignment of Uniclass-based EF and Ss classification identifiers. After the BIM objects in the Revit models were encoded, the relationship between the EF and Ss classifications was analyzed further. As shown in Figure 6, the correspondence between EF and Ss classifications in the dataset is not strictly one-to-one; one-to-many, many-to-one, and even many-to-many mappings are common. Wall objects in particular show a prominent many-to-many relationship between EF_25_10_25 (External walls) and EF_25_10_40 (Internal walls) on the one hand and Ss_25_10_32_70 (Reinforced concrete wall systems), Ss_25_11_16 (Concrete wall systems), Ss_25_10_30_35 (Gypsum board partition systems), and Ss_25_12_60 (Panel enclosure systems) on the other. The classification performance for wall objects therefore warrants particular attention in the subsequent experiments. Such complex multi-level relationships reflect the functional and system-level multiplicity of building components and make the classification task more challenging. At the same time, they reflect the logical consistency and flexible extensibility of the Uniclass classification system at the Group and Subgroup levels.
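The one-to-many, many-to-one, and many-to-many structures described above can be detected mechanically from the labeled (EF, Ss) pairs. The sketch below is one way to do this, with a hypothetical function name; the pairs in the usage example are illustrative combinations of codes named in this paragraph, not counts taken from the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def mapping_cardinality(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], str]:
    """Label each observed (EF, Ss) pair by the cardinality of its link."""
    unique_pairs = set(pairs)  # many objects can share the same label pair
    ss_by_ef: Dict[str, Set[str]] = defaultdict(set)
    ef_by_ss: Dict[str, Set[str]] = defaultdict(set)
    for ef, ss in unique_pairs:
        ss_by_ef[ef].add(ss)
        ef_by_ss[ss].add(ef)
    labels = {}
    for ef, ss in unique_pairs:
        ef_fans_out = len(ss_by_ef[ef]) > 1  # this EF code co-occurs with several Ss codes
        ss_fans_in = len(ef_by_ss[ss]) > 1   # this Ss code co-occurs with several EF codes
        if ef_fans_out and ss_fans_in:
            labels[(ef, ss)] = "many-to-many"
        elif ef_fans_out:
            labels[(ef, ss)] = "one-to-many"
        elif ss_fans_in:
            labels[(ef, ss)] = "many-to-one"
        else:
            labels[(ef, ss)] = "one-to-one"
    return labels
```

With the illustrative pairs (EF_25_10_25, Ss_25_11_16), (EF_25_10_40, Ss_25_11_16), and (EF_25_10_40, Ss_25_10_30_35), the three links are labeled many-to-one, many-to-many, and one-to-many, respectively.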

Figure 7 and Figure 8 illustrate the distribution of category counts across the EF and Ss classification dimensions within the dataset. The results reveal an evident imbalance in the number of components among different categories for both EF and Ss classifications. Notably, wall-related objects dominate across both classification dimensions, reflecting the ubiquity and critical importance of wall objects in real-world residential projects. Moreover, the EF and Ss classification systems provide relatively detailed categorization for wall objects, implying that the classification accuracy for walls will significantly influence the overall performance of the model. In addition to walls, floor slabs and openings also constitute important portions of the dataset.
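The degree of imbalance shown in the figures can be quantified from the label column alone. The sketch below, with a hypothetical function name, returns the classes ranked by frequency, the ratio between the largest and smallest class, and per-class weights computed with the same formula as scikit-learn's `class_weight="balanced"` option (n_samples / (n_classes × class count)). The paper does not state that class weighting was applied; the weights are shown only as one common way to respond to such imbalance.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

def class_distribution(labels: Iterable[str]) -> Tuple[List[Tuple[str, int]], float, Dict[str, float]]:
    """Return classes sorted by frequency, the max/min imbalance ratio, and balanced weights."""
    counts = Counter(labels)
    ranked = counts.most_common()  # most frequent class first
    imbalance_ratio = ranked[0][1] / ranked[-1][1]
    n_samples, n_classes = sum(counts.values()), len(counts)
    # Rare classes receive weights above 1, frequent classes below 1
    weights = {label: n_samples / (n_classes * c) for label, c in counts.items()}
    return ranked, imbalance_ratio, weights
```

For a toy label list with six EF_25_10_40 objects and three EF_25_10_25 objects, the imbalance ratio is 2.0 and the weights are 0.75 and 1.5, respectively.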

3.4. Evaluation Metrics

In this study, a sample IFC dataset was created and representative feature vectors were extracted to train machine learning classification models, in order to verify the effectiveness of the proposed features for the automatic classification of building components. Following established practice in previous studies [32,33,34], the dataset was randomly split into a training set and a testing set at a ratio of 8:2, containing 6972 and 1743 samples, respectively. During the training phase, 10-fold cross-validation was employed to obtain a more reliable assessment of the models’ generalization capability and robustness.
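As an illustration of the split-then-cross-validate workflow described above, the sketch below holds out 20% of the samples and runs 10-fold cross-validation on the remaining 80% with scikit-learn. Because the real dataset is not public, `make_classification` generates a synthetic stand-in with 8715 samples and 13 features, so all scores are illustrative only; the use of `StratifiedKFold` with shuffling is an assumption, not a documented detail of the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in: 8715 samples, 13 features, 11 classes (like the EF task).
X, y = make_classification(n_samples=8715, n_features=13, n_informative=8,
                           n_classes=11, n_clusters_per_class=1, random_state=42)

# 8:2 hold-out split, giving 6972 training and 1743 testing samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 10-fold cross-validation on the training portion only (the test set stays untouched).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
scores = cross_val_score(clf, X_train, y_train, cv=cv, scoring="accuracy")

print(f"train={len(X_train)}, test={len(X_test)}")
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The spread of the ten fold scores, rather than their mean alone, is what indicates how stable the model is across different subsets of the training data.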

To evaluate classification performance, four commonly used supervised learning metrics were adopted, consistent with previous work on BIM object classification: Accuracy, Precision, Recall, and F1-score. These metrics were computed from the confusion matrix shown in Table 4 using Equations (1)–(4). Accuracy measures the proportion of correctly classified samples among all test samples (Equation (1)); however, under class imbalance, accuracy alone may obscure the model’s performance on minority classes and should therefore not be the sole evaluation metric. Recall is the proportion of actual positive samples that are correctly predicted, reflecting the model’s coverage of the positive class (Equation (2)). Precision is the proportion of true positives among all samples predicted as positive (Equation (3)). The F1-score, the harmonic mean of Precision and Recall, provides a balanced measure of model performance under imbalanced data conditions (Equation (4)).

\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \qquad (1)

\text{Recall} = \frac{TP}{TP + FN} \qquad (2)

\text{Precision} = \frac{TP}{TP + FP} \qquad (3)

\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (4)
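For a multiclass task such as EF (11 classes) or Ss (17 classes), Equations (2)–(4) are typically applied to each class in a one-vs-rest fashion and then averaged, while overall accuracy reduces to the diagonal sum divided by the total. The sketch below derives per-class TP, FP and FN from a confusion matrix with NumPy and macro-averages them. The paper does not state which averaging scheme it used, so macro averaging and the 3-class toy matrix (with invented counts) are assumptions for illustration.

```python
import numpy as np

def metrics_from_confusion(cm) -> dict:
    """Accuracy plus macro-averaged precision, recall and F1 from a confusion matrix.

    Rows are true classes and columns are predicted classes (scikit-learn's convention).
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                   # correctly predicted samples of each class
    fp = cm.sum(axis=0) - tp           # predicted as the class but belonging elsewhere
    fn = cm.sum(axis=1) - tp           # belonging to the class but predicted elsewhere
    accuracy = tp.sum() / cm.sum()     # multiclass form of Equation (1)
    with np.errstate(divide="ignore", invalid="ignore"):
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)               # Eq. (2)
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)            # Eq. (3)
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)  # Eq. (4)
    return {"accuracy": accuracy, "precision": precision.mean(),
            "recall": recall.mean(), "f1": f1.mean()}

# Toy 3-class matrix (for example external, internal and parapet walls); counts are invented.
cm = np.array([[8, 1, 1],
               [2, 7, 1],
               [0, 0, 5]])
print(metrics_from_confusion(cm))
```

Macro averaging weights every class equally, so a weak minority class (such as parapet walls) lowers the macro F1-score even when overall accuracy stays high.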

In addition, a feature importance analysis was conducted to examine how much each feature variable contributes to the model’s classification decisions. Within the adopted Random Forest algorithm, feature importance was quantified with the Gini impurity criterion, as the total weighted decrease in node impurity produced by splits on each feature, averaged over all trees. This approach identifies the key variables that most strongly affect the classification outcomes. The analysis both informs the optimization of feature selection strategies and improves the interpretability of the model, supporting future improvements in object classification standardization and feature engineering.
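In scikit-learn, this Gini-based (mean decrease in impurity) importance is exposed as the `feature_importances_` attribute of a fitted `RandomForestClassifier`, normalized so that the values sum to 1. The sketch below ranks features this way. The feature names echo some of those mentioned in this paper, but the data are synthetic: the toy label is deliberately built from `BoundingBoxH` and `IsExternal` only, so the resulting ranking says nothing about the actual study.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
# Hypothetical feature table: two informative columns plus four noise columns.
X = pd.DataFrame({
    "BoundingBoxH": rng.uniform(0.1, 3.0, n),
    "IsExternal": rng.integers(0, 2, n),
    "Volume": rng.uniform(0.0, 5.0, n),
    "GlobalX": rng.uniform(0, 50, n),
    "GlobalY": rng.uniform(0, 50, n),
    "Storey": rng.integers(1, 6, n),
})
# Toy 4-class label that depends only on height and exposure.
y = (X["BoundingBoxH"] > 1.5).astype(int) * 2 + X["IsExternal"]

clf = RandomForestClassifier(n_estimators=100, criterion="gini", random_state=42)
clf.fit(X, y)

# Mean decrease in Gini impurity per feature, averaged over the trees.
importances = pd.Series(clf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).round(3))
```

Because the label ignores the noise columns, the two informative features should dominate the ranking, which is the kind of signal the analysis above looks for.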

4. Experiments and Results

4.1. Data Preprocessing

Before training the model, the original dataset had to be preprocessed. First, the non-numeric attributes “IsExternal” and “LoadBearing” were one-hot encoded, because machine learning models require numerical input features. In addition, the “Volume” attribute was missing for some IFC model objects. To address this, the dataset was preprocessed prior to data splitting and model training, following the steps outlined in Figure 1, and K-Nearest Neighbors (KNN) imputation was used to fill the missing values. This method estimates missing values from the distances between instances: for a sample with a missing value, the five most similar samples (neighbors) in the dataset are selected, and the missing attribute is imputed with the average value of these neighbors. The implementation code is as follows (Algorithm 1):

Algorithm 1: KNN imputation code

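from sklearn.impute import KNNImputer
# df: pandas DataFrame of IFC-derived features, with NaN marking missing volumes
volume_data = df[['Volume']]
imputer = KNNImputer(n_neighbors=5, weights='uniform')
volume_imputed = imputer.fit_transform(volume_data)
df['Volume'] = volume_imputed[:, 0]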
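One caveat about Algorithm 1: because `Volume` is passed to `KNNImputer` on its own, a row whose volume is missing has no observed feature from which to compute a distance, and scikit-learn then fills it with the column mean rather than a neighbor average. The sketch below is a variant, not the authors' code, that one-hot encodes the two Boolean attributes with `pandas.get_dummies` and supplies bounding-box dimensions as additional distance features. The column names `BBox_L`, `BBox_W`, `BBox_H` and all values are hypothetical, and `n_neighbors=2` is used only because the toy table is tiny.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical extract of IFC-derived features; NaN marks a missing Volume.
df = pd.DataFrame({
    "IsExternal":  [True, False, True, False, True, False],
    "LoadBearing": [True, False, False, True, True, False],
    "BBox_L": [5.0, 3.0, 5.1, 3.1, 4.9, 3.0],
    "BBox_W": [0.2, 0.1, 0.2, 0.1, 0.2, 0.1],
    "BBox_H": [2.8, 2.6, 2.8, 2.6, 2.8, 2.6],
    "Volume": [2.80, 0.78, np.nan, 0.81, 2.74, np.nan],
})

# One-hot encode the Boolean attributes (IsExternal_False, IsExternal_True, ...).
df = pd.get_dummies(df, columns=["IsExternal", "LoadBearing"], dtype=int)

# Impute Volume from the nearest rows in (L, W, H, Volume) space. In practice the
# dimensions would usually be scaled first so that no single column dominates.
cols = ["BBox_L", "BBox_W", "BBox_H", "Volume"]
imputer = KNNImputer(n_neighbors=2, weights="uniform")
df[cols] = imputer.fit_transform(df[cols])
print(df)
```

Here the row with missing volume and dimensions 5.1 x 0.2 x 2.8 receives the mean volume of its two geometrically closest walls (2.77), instead of the global mean.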

4.2. Model Training

To classify BIM objects into standardized Uniclass EF and Ss categories, a Random Forest model was trained using scikit-learn. The model input comprised 13 IFC-derived features. The classification outputs were the EF (11 classes) and Ss (17 classes) codes assigned to each object, shown in Table A1, representing functional and system-level categorizations, respectively.

First, the dataset, consisting of BIM object data from five real-world residential buildings, was divided into training and testing sets in an 80:20 ratio. Stratified sampling was applied to preserve the original category distribution. For example, BIM objects from Project C labeled as EF_25_10_40 (Internal walls) and objects from Project E labeled as Ss_25_30_20_39 (Hinged doorset systems) were distributed proportionally across both subsets, so that each classification category is represented in both training and testing.
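Stratified sampling of this kind is what `train_test_split(..., stratify=y)` in scikit-learn provides: each class keeps, up to rounding, the same share in the training and testing subsets. The sketch checks this on a hypothetical, imbalanced label vector built from three Ss codes in Table A1; the class counts are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 800 / 150 / 50 objects.
y = pd.Series(["Ss_25_10_30_35"] * 800 + ["Ss_25_30_20_39"] * 150
              + ["Ss_25_30_20_37"] * 50)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features (object index only)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Class shares in the full set, the training set and the test set.
shares = pd.DataFrame({
    "all": y.value_counts(normalize=True),
    "train": y_train.value_counts(normalize=True),
    "test": y_test.value_counts(normalize=True),
})
print(shares.round(3))
```

Without `stratify`, the smallest class (here 50 objects) could end up noticeably over- or under-represented in a 200-sample test set.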

Next, a Random Forest classifier was trained on the dataset with the following parameters. The model comprised 100 decision trees (n_estimators = 100) to enhance classification stability and robustness. Gini impurity (criterion = “gini”) was adopted as the splitting criterion to evaluate the purity gain after each feature partition. No maximum depth was imposed (max_depth = None), allowing each tree to grow fully according to the data characteristics. The minimum number of samples required to split an internal node was set to 2 (min_samples_split = 2), and the minimum number of samples required at a leaf node was set to 1 (min_samples_leaf = 1), allowing the model to capture fine-grained patterns even in small classes. The number of features considered at each split was left at the default, “auto”, which for classifiers in scikit-learn is equivalent to “sqrt” (the square root of the total number of features); this increases diversity among trees and improves overall ensemble performance. Bootstrap sampling was enabled (bootstrap = True), so each tree was trained on a bootstrapped subset of the data, further improving the model’s generalization capability. To ensure reproducibility, the random seed was fixed at 42 (random_state = 42).

The parameter configuration was informed by recent studies on the classification of BIM objects using Random Forest models [14,21]. The final configuration was determined through iterative tuning and comparative evaluation, aiming to balance training efficiency, classification accuracy, and model stability. The core code used to train the model in the Python (version 3.9) environment is as follows (Algorithm 2):

Algorithm 2: Random Forest core code

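from sklearn.ensemble import RandomForestClassifier
# x_train, y_train, x_test: features and labels from the 80:20 stratified split
# Initialize the Random Forest classifier with the parameters described above
clf = RandomForestClassifier(
    n_estimators=100,        # Number of trees in the forest
    criterion='gini',        # Splitting criterion
    max_depth=None,          # No maximum depth restriction
    min_samples_split=2,     # Minimum samples required to split an internal node
    min_samples_leaf=1,      # Minimum samples required at a leaf node
    max_features='sqrt',     # sqrt(n_features) per split ('auto' in older scikit-learn)
    bootstrap=True,          # Enable bootstrapped sampling
    random_state=42          # Fixed seed for reproducibility
)
# Fit the model using the training data
clf.fit(x_train, y_train)
# Predict the class labels on the test set
y_pred = clf.predict(x_test)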

4.3. Experimental Results

To validate the classification performance of the Random Forest (RF) model, comparative experiments were conducted against Support Vector Machine (SVM) and Decision Tree (DT) algorithms. All three models were trained and tested on the same dataset and evaluated on both the EF and Ss coding classification tasks using four metrics: Accuracy, Precision, Recall, and F1-score. Table 5 summarizes the classification performance of each model on the testing set. As shown in the table, the RF model achieved the highest scores on all metrics for both the EF and Ss tasks, indicating better classification accuracy and stability than the other two models. Overall, the comparison confirms that RF is better suited than DT and SVM to the BIM object classification tasks addressed in this study. As an ensemble of multiple decision trees, RF models nonlinear relationships well and is relatively robust to noise; it can maintain high classification accuracy without explicit feature selection, which supports its generalization ability. These properties make RF well suited to the automatic classification of BIM objects.
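A minimal way to reproduce this kind of comparison with scikit-learn is to train the three classifiers on the same split and report accuracy plus averaged precision, recall and F1-score. The sketch below does so on synthetic data. The SVM and Decision Tree hyperparameters, the feature scaling applied for the SVM, and the macro averaging are assumptions, since the paper does not report them, and the printed scores say nothing about the study's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with 17 classes, mirroring the size of the Ss label set.
X, y = make_classification(n_samples=3000, n_features=13, n_informative=10,
                           n_classes=17, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    "DT": DecisionTreeClassifier(random_state=42),
    # SVMs are sensitive to feature scale, so standardize inside a pipeline.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    p, r, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0)
    results[name] = (accuracy_score(y_test, y_pred), p, r, f1)
    print(f"{name}: acc={results[name][0]:.3f} P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Using one fixed split and one set of metrics for all three models keeps the comparison fair; repeating it over several random seeds would show whether the ranking is stable.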

5. Discussion

5.1. Results Analysis

5.1.1. Classification Results Analysis

Based on the evaluation metrics and the analysis of the confusion matrices, the Random Forest algorithm performed well in the object classification tasks, achieving accuracies of 1.00 and 0.99 for the EF and Ss classification tasks, respectively. Figure 9 shows the confusion matrix for the 11 EF coding categories. Overall, the model classified most categories well, with the majority of samples assigned to their correct classes. In particular, the model distinguished EF_25_10_25 (External walls) and EF_25_10_40 (Internal walls) effectively, correctly classifying 229 internal wall samples and 705 external wall samples. This indicates that the model can accurately differentiate between external and internal walls, with minimal misclassification.

However, the confusion matrix also revealed some misclassification among certain categories. In particular, EF_25_10_40 (Internal walls), EF_25_10_60 (Parapet walls), and EF_25_10_25 (External walls) showed cross-category errors: one external wall sample was misclassified as an internal wall, one external wall sample as a parapet wall, three internal wall samples as external walls, and one parapet wall sample as an external wall. Although four samples were confused between internal and external walls, the overall classification accuracy remained high given the large sample size. Regarding the confusion between parapet walls and external walls, further analysis revealed that both categories share the “IsExternal” attribute (True) and have similar geometric features, which makes them harder for the model to distinguish; the relatively small number of parapet wall samples likely also contributed to the lower accuracy for this category. Apart from these cases, misclassification among other categories was limited. The confusion matrix is dominated by a dark diagonal, with only a few non-zero off-diagonal cells, indicating robust overall classification performance. The model achieved high recognition accuracy for most EF categories, demonstrating stable and reliable classification across different types of BIM objects. The remaining errors also point to a direction for future refinement: better discrimination between highly similar categories.
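Reading individual off-diagonal cells, as done above, can be automated by listing the largest off-diagonal entries of the confusion matrix. The helper below is a hypothetical utility, not part of the study. In the toy matrix, the off-diagonal counts follow the wall confusions quoted in this paragraph, while the diagonal values are invented placeholders.

```python
import numpy as np

def top_confusions(cm, labels, k=3):
    """Return up to k largest off-diagonal cells as (true, predicted, count)."""
    off = np.asarray(cm).copy()
    np.fill_diagonal(off, 0)                                     # ignore correct predictions
    order = np.argsort(off, axis=None, kind="stable")[::-1][:k]  # largest cells first
    rows, cols = np.unravel_index(order, off.shape)
    return [(labels[i], labels[j], int(off[i, j]))
            for i, j in zip(rows, cols) if off[i, j] > 0]

labels = ["EF_25_10_25", "EF_25_10_40", "EF_25_10_60"]  # external, internal, parapet
# Off-diagonal counts follow the text; diagonal counts are placeholders.
cm = np.array([[100, 1, 1],
               [3, 100, 0],
               [1, 0, 20]])
print(top_confusions(cm, labels))
```

The first entry returned is the three internal walls predicted as external walls, the most frequent confusion described above.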

Figure 10 presents the confusion matrix for the Ss coding classification task, covering 17 system categories. Overall, the model achieved high classification accuracy for most system categories. Notably, Ss_25_10_30_35 (Gypsum board partition systems) achieved outstanding classification performance, with 654 correctly classified samples, the highest count of any category in the test set. This suggests that this system type had sufficient training samples and distinctive feature patterns conducive to accurate recognition. In contrast, the classification accuracy for Ss_25_10_32_70 (Reinforced concrete wall systems) and several door system categories was comparatively lower. For example, six Ss_25_10_32_70 (Reinforced concrete wall systems) samples were misclassified as Ss_25_11_16 (Concrete wall systems), indicating some difficulty in distinguishing between types of concrete wall systems. A few misclassifications were also observed for Ss_25_12_60 (Panel enclosure systems), with two samples incorrectly classified as Ss_25_10_30_35 (Gypsum board partition systems). This confusion is likely attributable to similarities in geometric dimensions or material properties between these system types, which complicate the model’s discrimination.

Regarding door system classification, the accuracy for Ss_25_30_20_37 (High-security doorset systems) was relatively low, with some samples misclassified as Ss_25_30_20_39 (Hinged doorset systems). Similarly, samples from Ss_25_30_20_78 (Sliding folding doorset systems) were sometimes misclassified as Ss_25_30_20_39 (Hinged doorset systems) or Ss_25_30_20_77 (Sliding doorset systems). This can primarily be attributed to the high similarity in geometric features and opening/closing mechanisms among these door system categories, coupled with the relatively small sample size available for training, leading to less distinct decision boundaries.

In summary, the Random Forest model developed in this study exhibited robust performance in recognizing the system-level attributes of BIM objects, achieving high classification accuracy, particularly for common categories such as foundations, beams, columns, and windows. However, a certain degree of confusion persists among system categories with structurally or functionally similar characteristics, suggesting that future research should focus on feature refinement or dataset expansion to further improve classification accuracy.

5.1.2. Feature Importance Analysis

Figure 11 illustrates the feature importance rankings of the Random Forest model for the EF and Ss code classification tasks. In the EF task, the three most important features were IfcEntity, Bounding Box H, and IsExternal. IfcEntity represents the semantic category of the object and is the most discriminative feature, while Bounding Box H and IsExternal capture a key geometric dimension and a semantic boundary attribute, respectively. These results indicate that, when distinguishing functional categories such as walls, slabs, and foundations, the model relies mainly on semantic type, geometric dimensions, and boundary characteristics rather than on spatial position. Volume and LoadBearing also contributed meaningfully to the classification. Notably, global spatial information (GlobalX, GlobalY, GlobalZ) and the floor attribute (Storey) ranked lowest in importance, indicating a much smaller influence than the geometric features. This suggests that the model’s classification decisions are driven not by the absolute spatial location or floor of an object but by its intrinsic semantic, geometric, and structural properties, which should favor generalization and robustness. The Ss task showed a similar trend: IfcEntity and Bounding Box H again ranked first and second, while Bounding Box L, IsExternal, and Bounding Box W were also important. This further indicates that, at the system classification level, the model bases its decisions primarily on semantic type and geometric dimensions rather than on global spatial coordinates.
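Because this discussion contrasts semantic, geometric and spatial features, it can help to sum the Gini importances within each feature group. The sketch below does this with pandas. The group assignment and the importance values are hypothetical placeholders, not the values plotted in Figure 11, and cover only the features named in this section; in practice the input would be `clf.feature_importances_` from the trained model.

```python
import pandas as pd

# Placeholder importances for a subset of features (NOT the study's values);
# like RandomForestClassifier.feature_importances_, they sum to 1.
importances = pd.Series({
    "IfcEntity": 0.30, "IsExternal": 0.12, "LoadBearing": 0.06,
    "BoundingBoxH": 0.18, "BoundingBoxL": 0.10, "BoundingBoxW": 0.08,
    "Volume": 0.08, "GlobalX": 0.02, "GlobalY": 0.02, "GlobalZ": 0.02,
    "Storey": 0.02,
})
# Hypothetical grouping into the semantic / dimensional / spatial categories.
groups = {
    "IfcEntity": "semantic", "IsExternal": "semantic", "LoadBearing": "semantic",
    "BoundingBoxH": "dimensional", "BoundingBoxL": "dimensional",
    "BoundingBoxW": "dimensional", "Volume": "dimensional",
    "GlobalX": "spatial", "GlobalY": "spatial", "GlobalZ": "spatial",
    "Storey": "spatial",
}
# Series.groupby with a dict maps each index label to its group before summing.
by_group = importances.groupby(groups).sum().sort_values(ascending=False)
print(by_group.round(2))
```

Group totals make claims such as "spatial features contribute little" directly checkable, though impurity-based importances can be biased toward high-cardinality continuous features, so they are best read as a relative indication.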

In summary, the feature importance analysis indicates that the proposed classification model depends little on the spatial distribution of objects and achieves generalizable classification through geometry-driven learning. These characteristics support the model’s applicability even during early design stages (low-LOD phases), laying a foundation for the practical deployment of automatic BIM object classification in preliminary modeling workflows.

5.2. Comparison with Existing Methods

Currently, research focusing on the automatic classification of BIM objects based on standardized classification systems remains relatively limited. Most existing studies primarily adopt semantic analysis or image-based datasets to classify BIM objects. Despite differences in dataset types and specific methods, these studies, like the present work, confirm the feasibility of utilizing machine learning techniques for automatic BIM object classification [13,20,22].

For instance, Liu et al. demonstrated an approach that combines IFC data, image features, and semantic labels to automatically classify building components, achieving a classification accuracy exceeding 98%. However, their study mainly focused on the classification of object geometries, relying on complex data features that require significant data collection and preprocessing efforts [12]. Similarly, Xu et al. proposed an automatic BIM object classification method based on IFC data targeting the OmniClass classification system, achieving 98.9% accuracy within the domain of prefabricated wall objects. Nevertheless, their study was limited to prefabricated components, constraining its applicability, and primarily emphasized symbolic tagging of features rather than broader category classification [14].

In contrast, the method proposed in this study—an automatic classification approach based on the Uniclass system using low-LOD IFC data—demonstrates clear advantages in terms of data requirements, applicability, and systematization. The proposed method enables high-precision, multi-level object classification under conditions of low-detail modeling, showcasing superior practicality and engineering applicability, particularly for intelligent modeling assistance and information management during the early design stages.

5.3. Research Scope

This study focused on the automatic classification and coding of BIM objects based on IFC data, achieving high accuracy through a supervised learning approach. However, the dataset was sourced exclusively from residential building projects, which limits the variety of component types and functional diversity included in the model training. As such, the current classification model may not generalize well to other types of buildings, such as commercial complexes, industrial facilities, or infrastructure projects. These project types may introduce new or uncommon BIM objects not covered in the training data, potentially reducing classification performance. This indicates that the current research scope is constrained primarily to residential buildings and typical architectural objects within that domain.

5.4. Potential Impact

This study’s method enables automated assignment of EF and Ss codes using only geometric and semantic features, improving model standardization and supporting structured information exchange.

By embedding classification at early stages, the method enhances information continuity across the design, construction, and operational phases. Classified objects can serve as semantic anchors for integration with 5D modeling tools, scheduling systems, and facility management platforms, thereby promoting consistent data flow throughout the building lifecycle. Furthermore, standardized classification facilitates intelligent model checking and automated auditing. Components with defined functional or system roles can be assessed for compliance and performance, improving the accuracy and efficiency of quality control. Finally, the classification framework supports lightweight model management and lays a foundation for linking with product-level information, including unit costs or environmental metrics. This opens opportunities for early-stage cost estimation, material planning, and sustainability analysis, contributing to data-driven decision-making in digital construction environments.

5.5. Future Recommendations

Future research should focus on extending the current classification framework to incorporate the Products (Pr) layer of the Uniclass system, thereby enabling more fine-grained and application-oriented BIM object identification. While the present study achieves accurate classification at the Element/Function (EF) and Systems (Ss) levels, the inclusion of the Pr layer would allow for detailed recognition of specific products or materials used in construction, such as wall types, door models, or insulation systems. By establishing a multi-level linkage across EF, Ss, and Pr codes, it would become feasible to map BIM objects to standardized unit cost databases or material libraries. This would support the early prediction of component-level costs based on low-LOD data, enabling preliminary budget analysis and cost optimization during the conceptual design stage. This capability supports better planning decisions and improves BIM integration with cost and procurement systems.

5.6. Key Research Findings and Contributions

This section presents the main experimental findings and highlights the key contributions of the study. The proposed method uses semantic, spatial, and dimensional features from low-LOD IFC data to classify BIM objects under the Uniclass system. The Random Forest model achieved over 99% accuracy in both the Elements/Functions (EF) and Systems (Ss) classification tasks, confirming its effectiveness and stability. Feature importance analysis showed that semantic type and dimensional features (bounding-box dimensions and volume) played a major role in classification, whereas global spatial location contributed little. The model performed well across diverse object categories, even with limited data detail, confirming the method’s applicability in early-stage BIM scenarios. Beyond experimental validation, this study makes the following contributions:

A novel automatic classification method for BIM objects based on IFC data is proposed. By extracting feature variables from IFC data and training a Random Forest model, the method achieved over 99% classification accuracy in both EF and Ss coding tasks, demonstrating its effectiveness and robustness.
The study successfully implements automatic classification and coding of BIM objects based on the Uniclass classification system (EF and Ss standards). This approach overcomes the traditional limitations of relying solely on geometric or functional features by employing a standardized classification framework, thereby enhancing the systematization, standardization, and engineering applicability of BIM object classification.
For the first time, high-precision, fine-grained classification is achieved using only low-detail IFC data at LOD 200. The proposed method significantly reduces reliance on high-precision modeling data and substantially improves applicability during the early design stage and low-LOD phases, offering a viable solution for intelligent management at the preliminary stage of BIM projects.

6. Conclusions

Fine-grained classification of BIM objects is essential for enhancing information interoperability and supporting BIM-based applications across the building lifecycle. This study proposes a method for automatic fine-grained classification and coding of BIM objects based on the Uniclass classification system, using low Level of Development (LOD 200) IFC data. Specifically, semantic, spatial, and dimensional features were extracted from the IFC models to create feature vectors, which were mapped to Elements/Functions (EF) and Systems (Ss) codes by a Random Forest classifier. The method achieved over 99% accuracy in both EF and Ss classification tasks while reducing dependence on high-LOD modeling data. The classification results were robust, with high precision across multiple categories, validating the effectiveness of the proposed feature framework and classification model for automatic BIM object recognition. This study further confirms the feasibility and advantages of efficient and scalable BIM object classification using standardized low-LOD BIM data. Employing the standardized Uniclass classification system not only improves the systematicity and standardization of the classification results but also strengthens the interoperability and engineering applicability of the classification framework. Despite these positive outcomes, certain limitations remain. The current dataset is sourced primarily from residential building projects; applying the model to other building types may introduce different object types, potentially affecting its applicability and classification accuracy. In addition, the classification labels were assigned manually, which may introduce human error.
Future work will expand the dataset to cover more building types, improve annotation consistency and accuracy, and explore incorporating the Uniclass Products (Pr) table, combined with the EF and Ss classification data, to link objects to cost information and predict object-level costs. The goal is to enable accurate cost estimation and control in the early design stage using low-LOD BIM models, thereby expanding the value of BIM in early planning and supporting smarter decision-making.

Author Contributions

Conceptualization, S.T. and K.S.; methodology, S.T.; software, S.T.; validation, S.T.; investigation, S.T. and T.B.; writing—Original draft preparation, S.T.; writing—Review and editing, T.B. and K.S.; visualization, S.T.; funding acquisition, T.B. and K.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Research grant of Construction Research Institute, grant number 2024-1, and the APC was funded by Research grant of Construction Research Institute. https://www.kensetu-bukka.or.jp/trendtopics/subsidy/ (accessed on 18 May 2025).

Data Availability Statement

The datasets presented in this article are not publicly available due to proprietary restrictions. The BIM models used in this study were provided by STARTS Corporation and are not authorized for open dissemination.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT-4o for the purpose of polishing English expressions. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

Author Takamasa Bito was employed by the company STARTS Research Institute, Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Table A1. EF and Ss Codes Included in the Dataset Used in This Study.

Code	Title	Gr	Su	Se	Ob	SO	Description
EF_20_05_30	05EF	20	05	30			Foundations
EF_20_10_30	05EF	20	10	30			Framed structures
EF_25_10_25	05EF	25	10	25			External walls
EF_25_10_40	05EF	25	10	40			Internal walls
EF_25_10_60	05EF	25	10	60			Parapet walls
EF_25_30_25	05EF	25	30	25			Doors
EF_25_30_97	05EF	25	30	97			Windows
EF_30_10_30	05EF	30	10	30			Flat roofs
EF_30_20_06	05EF	30	20	06			Basement floors
EF_30_20_34	05EF	30	20	34			Ground floors
EF_35_10_30	05EF	35	10	30			External stairs
Ss_20_05_15_71	06Ss	20	05	15	71		Reinforced concrete pilecap and ground beam foundation systems
Ss_20_05_15_72	06Ss	20	05	15	72		Reinforced concrete raft foundation systems
Ss_20_05_50_70	06Ss	20	05	50	70		Reinforced concrete foundation and plinth systems
Ss_20_20_75_70	06Ss	20	20	75	70		Reinforced concrete beam systems
Ss_20_30_75_70	06Ss	20	30	75	70		Reinforced concrete column systems
Ss_25_10_30_35	06Ss	25	10	30	35		Gypsum board partition systems
Ss_25_10_32_70	06Ss	25	10	32	70		Reinforced concrete wall structure systems
Ss_25_11_16	06Ss	25	11	16			Concrete wall systems
Ss_25_12_60	06Ss	25	12	60			Panel enclosure systems
Ss_25_30_20_37	06Ss	25	30	20	37		High-security doorset systems
Ss_25_30_20_39	06Ss	25	30	20	39		Hinged doorset systems
Ss_25_30_20_77	06Ss	25	30	20	77		Sliding doorset systems
Ss_25_30_20_78	06Ss	25	30	20	78		Sliding folding doorset systems
Ss_25_30_95_95	06Ss	25	30	95	95		Window systems
Ss_30_10_30_70	06Ss	30	10	30	70		Reinforced concrete roof framing systems
Ss_30_12_15	06Ss	30	12	15			Concrete plank floor systems
Ss_35_10_25_85	06Ss	35	10	25	85		Suspended external stair systems

Gr: Group; Su: Subgroup; Se: Section; Ob: Object; SO: Sub Object.

References

Lee, D.-G.; Park, J.-Y.; Song, S.-H. BIM-Based Construction Information Management Framework for Site Information Management. Adv. Civ. Eng. 2018, 2018, 5249548. [Google Scholar] [CrossRef]
Xu, X.; Ma, L.; Ding, L. A Framework for BIM-Enabled Life-Cycle Information Management of Construction Project. Int. J. Adv. Robot. Syst. 2014, 11, 126. [Google Scholar] [CrossRef]
BuildingSMART IFC. Available online: http://standards.buildingsmart.org/IFC/RELEASE/IFC4_1/FINAL/HTML/ (accessed on 8 September 2023).
Zhou, D.; Pei, B.; Li, X.; Jiang, D.; Wen, L. Innovative BIM Technology Application in the Construction Management of Highway. Sci. Rep. 2024, 14, 15298. [Google Scholar] [CrossRef] [PubMed]
Laakso, M.; Nyman, L. Exploring the Relationship between Research and BIM Standardization: A Systematic Mapping of Early Studies on the IFC Standard (1997–2007). Buildings 2016, 6, 7. [Google Scholar] [CrossRef]
Ma, L.; Sacks, R.; Kattel, U.; Bloch, T. 3D Object Classification Using Geometric Features and Pairwise Relationships. Comput.-Aided Civ. Infrastruct. Eng. 2018, 33, 152–164. [Google Scholar] [CrossRef]
Uniclass by NBS. Available online: https://uniclass.thenbs.com/ (accessed on 7 May 2025).
Pupeikis, D.; Navickas, A.A.; Klumbyte, E.; Seduikyte, L. Comparative Study of Construction Information Classification Systems: CCI versus Uniclass 2015. Buildings 2022, 12, 656.
Bloch, T.; Sacks, R. Comparing Machine Learning and Rule-Based Inferencing for Semantic Enrichment of BIM Models. Autom. Constr. 2018, 91, 256–272.
Choi, L.; Kim, H.; Kim, S.; Kim, M.H. Scalable Packet Classification Through Rulebase Partitioning Using the Maximum Entropy Hashing. IEEE/ACM Trans. Netw. 2009, 17, 1926–1935.
Lim, Y.T.; Yi, W.; Wang, H. Application of Machine Learning in Construction Productivity at Activity Level: A Critical Review. Appl. Sci. 2024, 14, 10605.
Liu, H.; Gan, V.J.L.; Cheng, J.C.P.; Zhou, S.A. Automatic Fine-Grained BIM Element Classification Using Multi-Modal Deep Learning (MMDL). Adv. Eng. Inform. 2024, 61, 102458.
Seydgar, M.; Poirier, É.A.; Motamedi, A. Comparative Evaluation of Deep Neural Network Performance for Point Cloud-Based IFC Object Classification. IEEE Access 2024, 12, 108303–108312.
Xu, Z.; Xie, Z.; Wang, X.; Niu, M. Automatic Classification and Coding of Prefabricated Components Using IFC and the Random Forest Algorithm. Buildings 2022, 12, 688.
Austern, G.; Bloch, T.; Abulafia, Y. Incorporating Context into BIM-Derived Data—Leveraging Graph Neural Networks for Building Element Classification. Buildings 2024, 14, 527.
Li, H.; Zhang, J.; Chang, S.; Sparkling, A. BIM-Based Object Mapping Using Invariant Signatures of AEC Objects. Autom. Constr. 2023, 145, 104616.
Shan, Z.; Fu, D.; Qiu, L.; Liang, Y.; Huang, C. A Semi-Explicit Practical Coding Method for Prefabricated Building Component Parts in China. Buildings 2023, 13, 1236.
Parece, S.; Resende, R.; Rato, V. A BIM-Based Tool for Embodied Carbon Assessment Using a Construction Classification System. Dev. Built Environ. 2024, 19, 100467.
Jung, N.; Lee, G. Automated Classification of Building Information Modeling (BIM) Case Studies by BIM Use Based on Natural Language Processing (NLP) and Unsupervised Learning. Adv. Eng. Inform. 2019, 41, 100917.
Utkucu, D.; Ying, H.; Wang, Z.; Sacks, R. Classification of Architectural and MEP BIM Objects for Building Performance Evaluation. Adv. Eng. Inform. 2024, 61, 102503.
Yu, Y.S.; Kim, S.H.; Lee, W.B.; Koo, B.S. Ensemble-Based Deep Learning Approach for Performance Improvement of BIM Element Classification. KSCE J. Civ. Eng. 2023, 27, 1898–1915.
Koo, B.; La, S.; Cho, N.-W.; Yu, Y. Using Support Vector Machines to Classify Building Elements for Checking the Semantic Integrity of Building Information Models. Autom. Constr. 2019, 98, 183–194.
Akinosho, T.D.; Oyedele, L.O.; Bilal, M.; Ajayi, A.O.; Delgado, M.D.; Akinade, O.O.; Ahmed, A.A. Deep Learning in the Construction Industry: A Review of Present Status and Future Innovations. J. Build. Eng. 2020, 32, 101827.
Dopazo, D.A.; Mahdjoubi, L.; Gething, B.; Mahamadu, A.-M. An Automated Machine Learning Approach for Classifying Infrastructure Cost Data. Comput.-Aided Civ. Infrastruct. Eng. 2024, 39, 1061–1076.
Antoniou, F.; Aretoulis, G.; Giannoulakis, D.; Konstantinidis, D. Cost and Material Quantities Prediction Models for the Construction of Underground Metro Stations. Buildings 2023, 13, 382.
Jianzu, W.; Chaojie, Z. Machine Learning in Strategic Management Research: A Review and Prospects. Foreign Econ. Manag. 2025, 47, 119–136.
Kim, M.-K.; Thedja, J.P.P.; Chi, H.-L.; Lee, D.-E. Automated Rebar Diameter Classification Using Point Cloud Data Based Machine Learning. Autom. Constr. 2021, 122, 103476.
Zeng, Z.; Ying, G.; Zhang, Y.; Gong, Y.; Mei, Y.; Li, X.; Sun, H.; Li, B.; Ma, J.; Li, S. Classification of Failure Modes, Bearing Capacity, and Effective Stiffness Prediction for Corroded RC Columns Using Machine Learning Algorithm. J. Build. Eng. 2025, 102, 111982.
Li, Y.; Ouyang, W.; Xin, Z.; Zhang, H.; Sun, S.; Zhang, D.; Zhang, W. Machine Learning for Defect Condition Rating of Wall Wooden Columns in Ancient Buildings. Case Stud. Constr. Mater. 2025, 22, e04458.
Mitera-Kiełbasa, E.; Zima, K. Automated Classification of Exchange Information Requirements for Construction Projects Using Word2Vec and SVM. Infrastructures 2024, 9, 194.
Kieu, T.C.; Shide, K. Analysis of the possibility of table-to-table linkage in the creation of WBS using a faceted classification systems—Targeting OmniClass and Uniclass2015. AIJ J. Technol. Des. 2022, 28, 986–991.
Romero-Jarén, R.; Arranz, J.J. Automatic Segmentation and Classification of BIM Elements from Point Clouds. Autom. Constr. 2021, 124, 103576.
Emunds, C.; Pauen, N.; Richter, V.; Frisch, J.; van Treeck, C. SpaRSE-BIM: Classification of IFC-Based Geometry via Sparse Convolutional Neural Networks. Adv. Eng. Inform. 2022, 53, 101641.
Cui, J.; Zang, M.; Liu, Z.; Qi, M.; Luo, R.; Gu, Z.; Lu, H. BIM Product Style Classification and Retrieval Based on Long-Range Style Dependencies. Buildings 2023, 13, 2280.

Figure 1. Overview of the proposed methodology framework.

Figure 2. Example of the Hierarchical Structure between EF and Ss Codes.

Figure 3. Hierarchical Relationship between EF and Ss Codes.

Figure 4. Feature Variables Extracted from IFC Data for Each BIM Object.

Figure 5. Correspondence between IFC Information and LOD Levels.

Figure 6. Mapping Relationships between EF and Ss Classifications in the Dataset.

Figure 7. Composition Ratio of EF Codes in the Dataset.

Figure 8. Composition Ratio of Ss Codes in the Dataset.

Figure 9. Confusion Matrix (EF).

Figure 10. Confusion Matrix (Ss).

Figure 11. Feature Importance Analysis for Target Labels (Left: EF Labels, Right: Ss Labels).
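Figures 9 and 10 report confusion matrices for the EF and Ss tasks. As a minimal sketch of how such a matrix can be tallied from per-object labels, the function below counts (actual, predicted) pairs. The function name `confusion_matrix`, the row/column orientation, and the example labels are illustrative assumptions; the labels are hypothetical Uniclass-style strings, not codes taken from the dataset.

```python
from typing import List, Optional, Sequence, Tuple


def confusion_matrix(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    labels: Optional[List[str]] = None,
) -> Tuple[List[str], List[List[int]]]:
    """Tally a multiclass confusion matrix.

    matrix[i][j] counts objects whose actual class is labels[i] and whose
    predicted class is labels[j] (rows = actual, columns = predicted).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        # Sorted union of observed classes so the layout is deterministic.
        labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return labels, matrix


if __name__ == "__main__":
    # Hypothetical Uniclass-style labels, for illustration only.
    y_true = ["EF_25_10", "EF_25_10", "EF_30_20", "EF_40_10"]
    y_pred = ["EF_25_10", "EF_30_20", "EF_30_20", "EF_40_10"]
    labels, matrix = confusion_matrix(y_true, y_pred)
    print(labels)  # ['EF_25_10', 'EF_30_20', 'EF_40_10']
    for row in matrix:
        print(row)
```

Rows here are actual classes and columns are predicted classes (the orientation scikit-learn also uses); Table 4 places predictions in rows, so the result would need to be transposed to match that layout.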

Table 1. Overview of the Uniclass Classification Tables and Their Functions.

Uniclass Tables	Description
Activities (Ac)	The Activities table classifies the activities that take place in existing assets, or that need to be accommodated within them.
Complexes (Co)	The Complexes table classifies high-level groupings in the built environment and tends to describe a group of entities brought together in one place as a complex, for a particular purpose or multiple activities.
Entities (En)	The Entities table classifies individual parts of an asset, like buildings, bridges, or tunnels.
Spaces/locations (SL)	The Spaces/locations table classifies spaces where activities take place and locations where specific items or equipment can be found, often in linear infrastructure like pipelines, roads, and rail.
Elements/functions (EF)	The Elements/functions table classifies general elements like walls, decks, and roofs, which can be thought of as the main components of buildings, structures, towers or tunnels, and functions, which describe generic services required for asset operation, such as piped gas supply, rail and paving heating, or waste collection.
Systems (Ss)	The Systems table classifies collections of products brought together to operate as systems, in order to provide a common purpose or solution.
Products (Pr)	The Products table classifies individual products used across the built environment, including those assembled to create systems, and objects located as part of asset operation or functions.
Tools and equipment (TE)	The Tools and equipment table classifies tools and equipment, such as plant machinery, vehicles, tunnel boring machines, formwork, scaffolding, and temporary hoardings, across the full range of the built environment for the construction and ongoing maintenance and repair of assets.
Project management (PM)	The Project management table classifies requirements, information, and records for asset management and project management across the full lifecycle of the built environment, at all scales.
Form of information (FI)	The Form of information table classifies forms of information, often exchanged as part of asset management and construction projects, with codes for contract, quotation, room data sheet, bill of quantities, three-dimensional model, or invoice.
Roles (Ro)	The Roles table classifies the individual or organizational roles required in asset management and the successful delivery of built environment projects.
Risk (RK)	The Risk table is used to categorize various types of risks associated with the lifecycle of built assets, facilitating the identification, management, and communication of potential risks during the design, construction, and operation phases of a project.
Material (Ma)	The Material table classifies materials used in the built environment.
Properties and characteristics (PC)	The Properties and Characteristics table is designed to categorize various attributes and characteristics of built assets, supporting detailed description and management throughout the asset lifecycle, and enhancing consistency and traceability of information.
CAD and modelling content (Zz)	The Zz_ table supports CAD and modelling content to assist with clear and consistent layer naming in modelling platforms, and managing the various components required in digital drawings, models, and construction outputs.

Adapted from Uniclass by NBS [7].
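Uniclass codes combine a two-character table prefix (the abbreviations in Table 1) with underscore-separated two-digit pairs for progressively finer levels, which NBS describes as group, sub-group, section, and object. A minimal parsing sketch, assuming that layout, is shown below; `parse_uniclass`, `UniclassCode`, `LEVEL_NAMES`, and the example code string are hypothetical names and values chosen for illustration, not part of any NBS tool.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple

# Two-character table prefixes, as listed in Table 1 (case-sensitive).
UNICLASS_TABLES = frozenset(
    {"Ac", "Co", "En", "SL", "EF", "Ss", "Pr", "TE",
     "PM", "FI", "Ro", "RK", "Ma", "PC", "Zz"}
)

# Names for the successive two-digit pairs after the table prefix.
LEVEL_NAMES = ("group", "sub_group", "section", "object")

# Prefix followed by zero to four "_NN" pairs, e.g. "Ss_25_10_30_35".
_CODE_RE = re.compile(r"^([A-Za-z]{2})((?:_\d{2}){0,4})$")


@dataclass(frozen=True)
class UniclassCode:
    table: str
    levels: Tuple[str, ...]  # two-digit strings, coarsest first

    def __str__(self) -> str:
        return "_".join((self.table,) + self.levels)

    def named_levels(self) -> Dict[str, str]:
        """Map level names to their two-digit values."""
        return dict(zip(LEVEL_NAMES, self.levels))

    def ancestors(self) -> Iterator[str]:
        """Yield every strictly coarser code, starting from the bare prefix."""
        for depth in range(len(self.levels)):
            yield "_".join((self.table,) + self.levels[:depth])


def parse_uniclass(code: str) -> UniclassCode:
    """Split a code string into its table prefix and numeric levels."""
    match = _CODE_RE.match(code.strip())
    if match is None:
        raise ValueError(f"not a Uniclass-style code: {code!r}")
    table, rest = match.groups()
    if table not in UNICLASS_TABLES:
        raise ValueError(f"unknown table prefix: {table!r}")
    levels = tuple(rest.split("_")[1:]) if rest else ()
    return UniclassCode(table, levels)


if __name__ == "__main__":
    code = parse_uniclass("Ss_25_10_30_35")  # hypothetical example string
    print(code.named_levels())
    # {'group': '25', 'sub_group': '10', 'section': '30', 'object': '35'}
    print(list(code.ancestors()))
    # ['Ss', 'Ss_25', 'Ss_25_10', 'Ss_25_10_30']
```

Listing a code's ancestors is one way to coarsen labels to a higher level of the hierarchy, for instance when comparing classification at group level rather than object level; whether that suits the EF and Ss labels used here is an assumption to check against the dataset. The prefix check is case-sensitive, so "SS_25" is rejected.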

Table 2. Comparison of recent BIM object classification methods.

Study	Method	Classification System	Feature Types	LOD Requirement	Scope
Liu et al. (2024) [12]	Multimodal Deep Learning	Custom Labels	Geometry, Semantic, Topological	300+	All Components
Seydgar et al. (2024) [13]	PC-DNN	IFC Classes	Geometry, Semantic	300+	All Components
Xu et al. (2022) [14]	Random Forest	OmniClass	Geometry, Spatial, Material	300+	Rigid Frame
Utkucu et al. (2024) [20]	Ensemble Learning	Custom Labels	Geometry, Image	300+	Wall Structure
Yu et al. (2023) [21]	Ensemble Deep Learning	IFC Classes	Geometry, Image, Relational	200	All Components
Koo et al. (2019) [22]	SVM	Custom Labels	Geometry, Relational	300+	All Components
This Study	Random Forest	Uniclass	Geometry, Semantic	200	All Components

Table 3. Basic Information of the Construction Projects Used for Dataset Creation.

Project	Site Area (m²)	Building Area (m²)	Number of Floors	Structure Type	Structural Systems
A	221.44	167.00	6	RC (Seismic Resistant)	Rigid Frame
B ¹	4127.08	1935.11	7	RC (Seismic Resistant)	Rigid Frame
C	463.65	261.73	9	RC (Seismic Isolation)	Rigid Frame
D	326.07	188.80	4	RC (Seismic Resistant)	Wall Structure
E	209.98	146.10	6	RC (Seismic Resistant)	Rigid Frame

¹ Project B is a complex consisting of three buildings.

Table 4. Confusion Matrix (Binary Case).

	Actual Positive	Actual Negative
Predicted Positive	TP (True Positive)	FP (False Positive)
Predicted Negative	FN (False Negative)	TN (True Negative)

T: True (prediction is correct); F: False (prediction is incorrect); P: Positive (belongs to the target class); N: Negative (does not belong to the target class).
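Table 4 defines the four outcome counts from which the metrics in Table 5 are derived. A minimal sketch of the standard formulas follows; the names `binary_metrics` and `BinaryMetrics`, and the choice to return 0.0 when a denominator is zero, are assumptions for illustration.

```python
from typing import NamedTuple


class BinaryMetrics(NamedTuple):
    precision: float
    recall: float
    f1: float
    accuracy: float


def _safe_ratio(numerator: float, denominator: float) -> float:
    # Assumption: report 0.0 instead of raising when the denominator is zero.
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> BinaryMetrics:
    """Compute metrics from the four cells of Table 4.

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    F1        = 2 * precision * recall / (precision + recall)
    accuracy  = (TP + TN) / (TP + FP + FN + TN)
    """
    precision = _safe_ratio(tp, tp + fp)
    recall = _safe_ratio(tp, tp + fn)
    f1 = _safe_ratio(2 * precision * recall, precision + recall)
    accuracy = _safe_ratio(tp + tn, tp + fp + fn + tn)
    return BinaryMetrics(precision, recall, f1, accuracy)


if __name__ == "__main__":
    # Hypothetical counts for one target class.
    print(binary_metrics(tp=90, fp=10, fn=30, tn=870))
```

With the hypothetical counts TP = 90, FP = 10, FN = 30, and TN = 870, precision is 0.90 and recall is 0.75, yet accuracy is 0.96 because true negatives dominate. The same effect can appear when a multiclass task is scored one class at a time, since most objects are negatives for any single class.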

Table 5. Classification results of different machine learning algorithms.

Model	Precision EF	Precision Ss	Recall EF	Recall Ss	F1-Score EF	F1-Score Ss	Accuracy EF	Accuracy Ss
RF	0.99	0.99	0.99	0.94	0.99	0.96	1.00	0.99
DT	1.00	0.97	0.96	0.93	0.98	0.94	1.00	0.98
SVM	0.99	0.85	0.91	0.77	0.94	0.79	0.99	0.94

RF: Random Forest; DT: Decision Tree; SVM: Support Vector Machine.
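Table 5 does not state how per-class precision, recall, and F1 are combined for the 11-class EF and 17-class Ss tasks. Assuming macro averaging (an unweighted mean over classes, each scored one-vs-rest with the Table 4 definitions) and accuracy as the overall share of correctly classified objects, the computation could be sketched as follows; `macro_metrics` and the example matrix are hypothetical.

```python
from typing import Dict, Sequence


def macro_metrics(matrix: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Macro-averaged precision/recall/F1 plus overall accuracy.

    matrix[i][j] = number of objects of actual class i predicted as class j.
    """
    n = len(matrix)
    if n == 0 or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square and non-empty")
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, actually other
        fn = sum(matrix[k]) - tp  # actually k, predicted other
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
        "accuracy": correct / total if total else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical 3-class matrix with errors concentrated in a small class.
    example = [[50, 0, 0],
               [0, 45, 5],
               [0, 1, 4]]
    print({k: round(v, 3) for k, v in macro_metrics(example).items()})
    # {'precision': 0.808, 'recall': 0.9, 'f1': 0.836, 'accuracy': 0.943}
```

For the example matrix, macro recall is 0.90 while accuracy is about 0.943, because the errors fall mostly on a small class; macro F1 (about 0.836) also differs from the harmonic mean of macro precision and macro recall (about 0.851). Both effects are worth keeping in mind when reading the gap between accuracy and recall in the Ss columns of Table 5.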

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Tang, S.; Bito, T.; Shide, K. Automatic Classification of BIM Object Based on IFC Data Using the Uniclass Classification Standard. Buildings 2025, 15, 2347. https://doi.org/10.3390/buildings15132347


