Article

Comparing Domain Expert and Machine Learning Data Enrichment of Building Registry

1
School of Information Technology, Tallinn University of Technology, Ehitajate tee 5, 19086 Tallinn, Estonia
2
Department of Civil Engineering and Architecture, School of Engineering, Tallinn University of Technology, Ehitajate tee 5, 19086 Tallinn, Estonia
*
Authors to whom correspondence should be addressed.
Buildings 2025, 15(11), 1798; https://doi.org/10.3390/buildings15111798
Submission received: 17 April 2025 / Revised: 16 May 2025 / Accepted: 22 May 2025 / Published: 24 May 2025

Abstract

Municipal decision-makers must define and quantitatively analyze full-renovation scenarios adapted to specific districts and buildings to achieve the European Union (EU) target of saving 60% to 90% of energy by renovating 75% of building stock. However, poor open-data quality presents a persistent challenge, especially for automatic calculations or decision-making. This study addresses the challenge of enriching Estonian Building Registry (EBR) data by predicting the actual external wall type from existing registry information. To achieve this, both domain expert rules and machine learning models were employed. The study used a training dataset of 416 buildings and a test dataset of 66 buildings. While previous research comparing expert-based and machine learning approaches has been limited and yielded mixed results, our findings demonstrate that both methods perform similarly, improving the initial wall type classification accuracy from 54% to 89%.

1. Introduction

The renewed European Union (EU) Energy Performance of Buildings Directive (EPBD) makes renovation mandatory in all member states, targeting energy savings of 60% to 90% by renovating 75% of the EU building stock [1]. Sandberg et al. [2] modelled the European building stock and showed that renovation volumes must rise. Suitable cost-optimal and energy-efficient measures must be developed to achieve a decarbonized building stock by 2050. Currently, strategies, approaches, and methods for achieving the goals of the renovation wave are primarily focused on the individual building level to find cost-optimal renovation solutions [3,4,5].
The Renovation Wave calls for an integrated, participatory, neighborhood-to-neighborhood renovation approach tailored to local environments. Municipal decision-makers and large-scale real estate owners must define, generate, simulate, and quantitatively analyze full-renovation scenarios adapted to specific districts and buildings to make well-informed investment decisions [6]. To assess the energy performance of the buildings on a district scale, renovation strategy tools use data from national registries such as the building register, register of cultural monuments, climatic restrictions from the Land Board, etc. [7,8]. However, the effectiveness of these tools depends on the accuracy and completeness of the underlying data. In Estonia, for example, the quality of the Estonian Building Registry (EBR) data significantly affects the feasibility of (semi-)automatically suggesting energy-efficient renovation strategies for selected areas [9,10]. In particular, we address the problem of data inconsistency in the EBR.
For preparing and maintaining long-term renovation plans, statistical or reference-building methods, or a combination of the two, are typically used [7]. Calculating energy performance on a district scale often requires data that are unavailable in public registries. Previously, three approaches have been used to tackle this issue: building typologies, building archetypes, and reference buildings [11]. Studies have been conducted to assess energy performance and promote building typology-based building-stock renovation in 20 European countries [12] and evaluate building stock characteristics in Eastern Europe [13,14,15].
There are multiple studies on building classification, many of which have compared different machine learning models on similar data available in the Estonian Building Registry (EBR). One study compared different machine learning models to classify buildings as residential or non-residential using OpenStreetMap [16]. A 2021 study used machine learning to characterize buildings by the number of floors and the first year of use based on data on yearly electricity consumption, height, perimeter, area, and type of building [17].
A typology similar to TABULA (2016 or later) [12] was developed to support energy performance calculations in the Estonian building stock renovation strategy. This typology groups buildings into standardized types, requiring each building to be attributed to a specific category based on key characteristics. Among these, the external wall type is a critical parameter, as it significantly influences thermal performance and, consequently, renovation planning. While the Estonian Building Registry (EBR) contains many entries describing external wall materials, these values are often inconsistent, overly specific, or missing [18,19].
To address this, the national typology simplifies the range of wall descriptions found in the EBR into four generalized categories: wood, brick, lightweight concrete, and precast concrete panel. These categories are defined primarily by material and structural properties, for example, distinguishing between single-layer block walls and multi-layer panel systems. This generalization improves consistency for modeling purposes, though it requires resolving mismatches in the source data.
For instance, walls made from gas silicate panels are frequently misclassified in the EBR. Some users label them as lightweight blocks, while others list them as precast concrete panels. Neither label is technically accurate, though the former more closely reflects the material’s physical properties. In the typology, these are classified under lightweight concrete. Such inconsistencies highlight the need for a systematic identification process to assign the correct wall type, as classification errors can significantly impact renovation strategy due to differing thermal performance and retrofit requirements.
Data enrichment research has mainly focused on adding data, such as enriching the Great Britain landslide database with data from old newspaper articles [20]. Belsky and Sacks [21] described an innovative approach for enriching building models with semantically useful concepts inferred from explicit and implicit information in the building model. Their prototype applies a rule-processing engine and allows the composition of inference rule sets that can be tailored for different domains. Similarly, statistical modeling techniques such as Gaussian mixture models (GMM) with expectation maximization (EM) have been applied to building energy performance datasets to generate synthetic data, improving the reliability of large-scale energy modeling [22]. These probabilistic approaches can help address data inconsistencies and enhance the accuracy of energy performance predictions. Furthermore, multiple studies have focused on building level of detail (LOD) data enrichment, such as enriching LOD1 and LOD2 models with windows and doors to reach LOD3 [23,24,25].
When datasets like EBR are already available, the focus shifts from adding new data to improving data quality. Ensuring the accuracy of external wall data in building registers, such as the EBR, is critical for developing effective national renovation strategies across the EU. External wall type plays a key role in calculating a building’s thermal properties, including thermal transmittance, thermal bridges, and air leakage rate. Given that these calculations inform renovation needs, inaccuracies in wall-type classification can lead to suboptimal investment decisions and ineffective energy-saving measures. Our analysis indicates that the external wall data in the EBR is of low reliability. We hypothesize that this issue can be addressed through data enrichment, using either an expert-driven approach or machine learning techniques. Specifically, by leveraging existing registry data, it should be possible to derive missing wall-type classifications or identify incorrect entries with sufficient accuracy, improving the reliability of thermal performance assessments and supporting more effective renovation strategies.
Comparison of expert and machine learning (ML) models is a sparsely researched topic. Although the superiority of ML models is often assumed and sometimes demonstrated [26,27,28], a credit scoring case study by Ben-David and Frank [29] found no statistically significant advantage for dozens of ML models over the expert system. They concluded that more expert systems in various domains need to be tested against machine learning models before one can reliably deduce which of the approaches gives more accurate results and under which conditions. Older studies in the area of medical diagnosis [30] reported an advantage of expert systems over ML methods, while studies on soybean pathology [26] and census data coding [27] gave the advantage to ML. In the field of construction, an article by Bloch and Sacks [28] from 2018 compared machine learning and rule-based inferencing for semantic enrichment of building information models, specifically for the classification of room types. Their expert model applied rule sets to unique feature signatures. They concluded that while the ML approach was very effective, the rule engineering approach was only able to classify 5 out of 15 types of spaces. A review of the 86 articles citing Ben-David and Frank [29] does not reveal any new comparisons between ML and expert models, though there are articles describing hybrid expert–ML models [31,32]. To address this gap, our study systematically compares these two approaches in the context of EBR data enrichment, assessing their respective advantages and limitations. Our comparison should help in filling this research gap for our area of application. Therefore, our main objectives are the development of effective data enrichment models to correct the faulty wall types in the EBR and the comparison of expert and ML approaches for this problem domain.
We will give an overview of current research comparing expert and ML models, describe the models used to predict the correct external wall type, and evaluate their accuracy. The method of enriching low-quality data using a smaller set of expert-annotated training data and the comparison of expert and machine learning models should be of general interest. Finally, we will verify the hypothesis that additional training data would significantly improve the performance of the ML models.

2. Methods

2.1. Data from the Estonian Building Registry

Our research object, the Estonian Building Registry (EBR) (https://livekluster.ehr.ee/ui/ehr/v1 (accessed on 22 March 2025)), contains information about buildings and infrastructure objects that are planned, under construction, or have been constructed on the territory of Estonia. The registry was established in 2003 and holds records for more than 758,545 buildings. Building owners and local governments use the EBR to process construction documents. The EBR also contains digital twin models of buildings at LOD 0, 1, and 2 [33]. EBR and LOD models with building-specific reference information create a unique possibility to speed up building energy performance assessment at the district level. However, data availability and quality per building in the EBR and LOD models vary significantly. The EBR is the only authoritative source to obtain necessary initial information for building performance calculation at the neighbourhood level. Yet, there are limitations that we aim to overcome through data enrichment.
To develop expert and machine learning classification approaches, 416 apartment buildings from the EBR were studied and used as training data. The focus of this research is on older buildings waiting for renovation. The addresses of 416 buildings were collected from a database of renovated buildings. This ensured that all buildings possessed readily available design documentation within the EBR, enabling us to cross-verify the registry information. The correct external wall type was assigned to each building through a combination of visual on-site observations and an examination of design documentation. In parallel, a building registry-based type was queried from the EBR. In cases where multiple values or information were missing (an occurrence present in a single case within our training data), the wall type was left unassigned.
A closer look at these 416 buildings helped us identify a data quality gap. To evaluate the data quality, the EBR wall type value was used. External walls form most of the building envelope areas for multi-story buildings and, compared to other building envelope structures, have more variations in used materials and solutions. Thus, the external wall types are the basis for archetype assessment for Estonian apartment building archetypes. If the wall type value in the EBR contained either 0 or multiple values, the wall type was determined as unassigned, whereas for single values, the wall type was determined as the EBR value. The EBR wall types were compared to expert-assigned wall types. Overall, 46% of buildings (training and test data) suffered from an inaccurate wall type (in 22% of cases, the assigned wall type was incorrect, whereas for 24% of buildings, the wall type was unassigned).
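The evaluation rule above (a registry wall type counts as unassigned when zero or multiple values are listed, and accurate only when a single value matches the expert label) can be sketched as a small helper. The function and field representations are hypothetical, not the actual EBR schema:

```python
# Sketch of the EBR wall-type evaluation rule described above.
# Function and field names are illustrative, not the real EBR schema.
def ebr_wall_type(wall_type_values):
    """Return the registry wall type, or None if it is unassigned
    (zero or multiple values listed)."""
    if len(wall_type_values) == 1:
        return wall_type_values[0]
    return None  # unassigned: missing or ambiguous entry

def is_accurate(wall_type_values, expert_type):
    """A registry entry counts as accurate only if a single value is
    listed and it matches the expert-assigned type."""
    return ebr_wall_type(wall_type_values) == expert_type
```

Applying such a check across the training and test sets yields the 46% inaccuracy figures reported above.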
An additional set of 66 buildings was utilized to validate the performance of both expert assessments and machine learning (ML) models. These buildings had not been renovated, and as a result, design documentation was not available in the EBR. Their inclusion was primarily based on the feasibility of visually classifying the external wall type. The test set size was determined by practical constraints, particularly the availability of buildings meeting these criteria and the resources required for visual verification. The selected set provided sufficient coverage of the major wall type categories observed in the training data. Similarly to the training data, EBR and expert observation wall types were acquired for all 66 buildings. The EBR data were evaluated similarly: if the wall type value in the EBR contained either 0 or multiple values, the type was determined as unassigned, and for single values, the wall type was determined as the EBR value. Overall, 46% of buildings were assigned an inaccurate wall type (13% of walls were assigned an incorrect type, and 33% were unassigned). The data quality in the EBR was similar for both the training and test datasets.
The training data obtained from EBR contained ten different external wall types. However, several of these were not feasible external wall types in principle—one such example is cladding. Cladding is an exterior finish material rather than a type of wall, illustrating the problem with raw data from the EBR. Therefore, we could not use these types as prediction targets and used expert labels to reduce the list of external wall types to four: wood, brick, lightweight concrete, and precast concrete panels.
The building data from the EBR contain 40 attributes that our models use to predict external wall type. These can be divided into the following:
  • Attributes describing building areas: net floor area, heated area, building common area, etc., presented as real numbers.
  • Attributes describing building rooms and floors: number of rooms, number of underground floors, etc., presented as integer numbers.
  • Nominal textual attributes describing the building location: county, city, and address.
  • Real numbered coordinates describing the building location: x and y.
  • Nominal textual attributes describing various materials: external walls (our target), load-bearing walls, etc.
  • Nominal textual values describing building systems: heating system, cooling system, ventilation system, and energy sources—several may be present at the same time, for example, an oven and a heat pump.
The 416 expert-labeled buildings do not present a large amount of data and, therefore, are not well suited for deep learning models. However, many machine learning (ML) models can be trained on such a limited amount of data. For example, the seminal classification dataset of iris flowers by R.A. Fisher [34] contains only 150 flowers in total for training and testing. In addition, as buildings built during the Soviet era were standardized, there is also a high level of regularity. In the following sections, our learning curves demonstrate that the ML model’s performance plateau can be reached on even smaller sample sizes.

2.2. Expert and Machine Learning Approaches

This study addresses a well-known and general classification problem in the field of machine learning: predicting the class (here, external wall type) from other data. A subset of buildings was hand-checked by an expert for correct external wall type, out of which 416 buildings were used to train the models and 66 separate buildings were used as test data.
An iterative process was used to develop the expert model. A literature-based analysis [35,36,37] was utilized first to identify attributes for verifying the EBR data on external walls. The first four main attributes were identified from the literature, and new ones were added based on iterative analysis of EBR data.
The general machine learning process is depicted in Figure 1. We used expert labeling to provide ground truth for wall types in a subset of EBR data, which we further divided into training and test data. We use training data and hyperparameters (like the maximum depth of a decision tree) to train our predictive models. Then, we evaluate their performance with the test data.
We created three machine learning (ML) models: decision tree [38], random forest [39], and logistic regression classifiers. Training of these ML models depends on several hyper-parameters (parameters controlling the training process), such as the maximal depth of a tree or the number of estimators in the forest. We used automatic hyper-parameter tuning to find the hyper-parameters that optimize accuracy. A fourth ML model, a simplified random forest, was added after learning curve analysis.
Learning curves that plot model performance for a given sample size were used to detect over-fitting and to assess possible benefits from the additional training data. If the model’s performance is significantly better for training than the test sample, then it indicates that the model complexity approaches that of the training data. Such a model encodes the expected result for each training sample (building), performing poorly on novel data due to over-fitting. A larger amount of training data reduces over-fitting. Secondly, if the model’s performance on test data plateaus before the maximal available amount is reached, then we cannot expect much benefit from additional training data for that model. However, if the performance maintains an upward trend, we can expect further performance improvements from additional training data.
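The learning-curve diagnostic described above can be sketched with scikit-learn's `learning_curve` utility. Here the iris dataset stands in for the encoded EBR features and expert labels, and the estimator settings are illustrative:

```python
# Sketch of the learning-curve diagnostic: train the model on growing
# sample sizes and compare training vs. cross-validated accuracy.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the labeled EBR data

sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(max_depth=3, random_state=0),
    X, y, train_sizes=np.linspace(0.2, 1.0, 5), cv=5, scoring="accuracy")

# A persistent gap between the two curves signals over-fitting; a
# validation curve still rising at the largest size suggests that more
# training data would help.
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:3d}  train={tr:.2f}  validation={va:.2f}")
```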
To compare our expert and ML models, we divided our data into training and test datasets and used well-known metrics like accuracy, F1 score, and confusion matrix for the evaluation.
A feature engineering step that selects an informative subset of existing attributes or introduces new attributes is often used in machine learning processes. While selecting an effective subset of attributes can benefit some models like logistic regression, models like decision tree and random forest are quite insensitive to the presence of irrelevant attributes. As our main aim is to develop machine learning models that mimic the expert model (a decision tree), use the same data, and can be implemented quickly (an advantage of ML), we did not include a feature engineering step.

3. Expert and Machine Learning Models

We created expert and ML models to predict the correct external wall material using the training dataset in which we provided the correct external wall type (annotations) based on expert observations. Our test dataset contained 66 annotated buildings with expert labels.

3.1. Expert Data Enrichment

Four main attributes were identified based on the literature review [35,36,37]: external wall type value in EBR, location (represented by county), number of floors, and year of first use. A dataset of 416 buildings was used to evaluate whether the literature-based attributes represent reality. Our analysis demonstrated that the statistical characteristics of these attributes were consistent with the findings documented in the literature. Consequently, the first iteration of the expert model was developed utilizing those four attributes. However, this initial model’s efficacy was found to be inadequate, primarily due to some buildings having either missing or too many wall type values listed in EBR. As a solution, additional attributes had to be considered. We introduced the structural material value obtained from the EBR as an additional attribute for buildings with either no or multiple values listed as the wall type. The expert-created decision tree model is depicted below as a nested list with predicted wall types in bold. A sub-tree of the full decision tree is described separately as Sub-tree One to manage complexity.
  • External wall type?
    (a) Brick: Structural material value contains block?
      • Yes: Lightweight concrete.
      • No: Brick.
    (b) Small or large block: More than 5 stories?
      • Yes: Lightweight concrete.
      • No: Brick.
    (c) Precast concrete panel: Structural material value contains block?
      • Yes: Lightweight concrete.
      • No: Year of first use is before 1960?
        • Yes: Brick.
        • No: Building location is Harju, Rapla, Lääne-Viru, or Tartu?
          • Yes: Precast concrete panel.
          • No: Lightweight concrete.
    (d) Wood, log, or timber truss with filling: More than 4 stories?
      • Yes: Go to Sub-tree One...
      • No: Wood.
    (e) Has multiple values or value is missing: Go to Sub-tree One...
Sub-tree One is depicted below.
  • Structural material type?
    (a) Brick: Brick.
    (b) Small or large block: More than 5 stories?
      • Yes: Brick.
      • No: Lightweight concrete.
    (c) Has multiple values: Structural material value contains block?
      • Yes: Lightweight concrete.
      • No: Structural material value contains wood?
        • Yes: More than 4 stories?
          • Yes: Stone.
          • No: Wood.
        • No: Structural material value contains brick?
          • Yes: Brick.
          • No: Wood.
    (d) Value is missing: More than 4 stories?
      • Yes: Stone.
      • No: Year of first use is before 1920?
        • Yes: Wood.
        • No: Stone.
The expert model was designed with a decision tree structure, branching into three distinct paths based on the availability of data within the EBR. Focusing on the reliability of the external wall type value within the EBR gave this attribute top priority in the decision-making process. The three distinct paths in the expert model were implemented as follows:
  • In instances where the external wall type consisted of a singular value, supplementary checks were conducted, culminating in the assignment of the final wall type.
  • In the case of missing or multiple external wall type values, the determination of wall type relied on the structural material type value, as depicted in Sub-tree One. As a result, the expert model predicted a detailed type or the general identification of stone or wood type.
  • For multiple structural material types, the listed values were checked for keywords. If certain keywords were found, then assigning a specific external wall type was possible. A stone or wood type was assigned if keywords were not listed.
  • If the structural material type was missing in the EBR data, then based on the checks (Sub-tree One), only the general stone or wood external walls were predicted. This represents a loss of accuracy.
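Such expert rules translate directly into explicit branching code. The sketch below encodes only a fragment of the main tree (branches a–c, using the county list from branch c); the function and parameter names are hypothetical, and the remaining branches would fall through to Sub-tree One:

```python
# Minimal sketch of the expert decision tree as explicit rules.
# Field and function names are illustrative; only branches (a)-(c)
# of the main tree are reproduced.
PANEL_COUNTIES = {"Harju", "Rapla", "Lääne-Viru", "Tartu"}

def expert_wall_type(ebr_type, structural_material, stories,
                     first_use_year, county):
    if ebr_type == "brick":
        # Branch (a): block-based structural material overrides the label.
        return "lightweight concrete" if "block" in structural_material else "brick"
    if ebr_type in ("small block", "large block"):
        # Branch (b): tall block buildings are lightweight concrete.
        return "lightweight concrete" if stories > 5 else "brick"
    if ebr_type == "precast concrete panel":
        # Branch (c): check material, era, and location in turn.
        if "block" in structural_material:
            return "lightweight concrete"
        if first_use_year < 1960:
            return "brick"
        return ("precast concrete panel" if county in PANEL_COUNTIES
                else "lightweight concrete")
    return None  # remaining branches fall through to Sub-tree One
```

Coding the rules this way makes each branch individually testable, which supports the iterative refinement process described above.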

3.2. Machine Learning Data Enrichment

The expert and machine learning (ML) models used the same training and test datasets. We transformed all the data into a numerical form for our ML models. We dropped the attribute corresponding to the precise building address as it was too specific. We kept the attributes corresponding to county or town. The original dataset contained several other nominal attributes corresponding to building materials used for roofs, walls, etc., or the presence of a furnace, heat pump, etc. These were all transformed into binary attributes for each nominal value using one-hot encoding.
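The one-hot transformation described above can be sketched with pandas; the column names here are illustrative, not the actual EBR schema:

```python
# Sketch of the preprocessing step: one-hot encode nominal columns
# while numeric columns pass through unchanged. Column names are
# illustrative, not the real EBR attributes.
import pandas as pd

raw = pd.DataFrame({
    "county": ["Harju", "Tartu", "Harju"],
    "heating": ["furnace", "heat pump", "furnace"],
    "net_floor_area": [1250.0, 980.5, 2100.0],
})

# Each nominal value becomes its own binary column.
encoded = pd.get_dummies(raw, columns=["county", "heating"])
print(encoded.columns.tolist())
```

Attributes that may hold several values at once (e.g., a building with both an oven and a heat pump) likewise become independent binary columns, so co-occurring values are preserved.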
We used decision tree [38], random forest [39], and logistic regression ML classifiers as implemented in the Python 3.11 scikit-learn [40] library, version 1.2.2. We tuned the hyper-parameters that defined the structure of our models and the training process for each of these models using grid search as implemented using scikit-learn. The tuning process optimized the hyper-parameters for best accuracy. We analyze the learning curves using the corresponding scikit-learn functionality.
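The grid search tuning step can be sketched as follows, mirroring the hyper-parameter grid later reported for the decision tree; the iris dataset stands in for the encoded EBR training set, and the exact grid values are assumptions:

```python
# Sketch of hyper-parameter tuning with GridSearchCV: every parameter
# combination is evaluated by cross-validated accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the EBR training data

param_grid = {
    "criterion": ["entropy", "gini"],
    "max_depth": [2, 3, 4, 5, None],
    "splitter": ["best", "random"],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, scoring="accuracy", cv=5)
search.fit(X, y)
print(search.best_params_, f"accuracy={search.best_score_:.2f}")
```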
The first model we trained was a decision tree. Optimal hyper-parameters for the decision tree were as follows:
  • criterion = entropy;
  • max depth = 3;
  • splitter = random.
Criterion is the function that measures the quality of the split made by each node in the tree; the options checked were entropy and Gini. Max depth limits the depth of the tree, i.e., the maximum number of decision levels from the root to a leaf. Splitter selects the feature and threshold used in the decision node; the options checked were random and best. A detailed hyper-parameter description can be found in the Python scikit-learn library [40] documentation and theoretical literature. A decision tree trained on the training data of 416 buildings using the above-mentioned optimal hyper-parameters is shown below as a nested list where output types are in bold.
  • Outer wall marked as multi-layer reinforced concrete panel?
    (a) Yes: Harju County?
      • Yes: Precast concrete panel.
      • No: Structural material is marked as other?
        • Yes: Precast concrete panel.
        • No: Lightweight concrete.
    (b) No: Outer wall marked as small or large block?
      • Yes: Lightweight concrete.
      • No: External wall exterior finish is marked as wood?
        • Yes: Wood.
        • No: Brick.
It is interesting to note that the max depth is quite low, and the decision tree has somewhat lower complexity than the expert model, which is also organized as a decision tree. As machine learning training processes are pretty opaque, it is hard to give a precise explanation for such simplicity besides the fact that such a model gives optimal results for the hyperparameter search space under consideration.
Next, we explored the random forest. The optimal hyper-parameters for the random forest were as follows:
  • criterion = Gini;
  • max depth = unlimited;
  • number of trees = 80.
Criterion and max depth are the same as for the previous decision tree model, applying to all decision trees in the forest, while the number of trees is the number of decision trees in the forest. The prediction of the forest is found through the majority voting of the trees. Interestingly, the optimal max depth of a single tree is unlimited, which makes every tree in the forest more complex than the previous decision tree model and the entire forest much more complex. It is unrealistic to reproduce the entire forest of 80 trees here. Therefore, we give the feature importances for the nine most important features of the forest as calculated by sklearn in Table 1.
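Extracting such feature importances from a fitted random forest is straightforward in scikit-learn; in this sketch, the iris dataset stands in for the EBR features, and the hyper-parameters mirror those reported above:

```python
# Sketch of feature-importance extraction from a fitted random forest,
# as summarized in Table 1. Iris data stands in for the EBR features.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(n_estimators=80, criterion="gini",
                                max_depth=None, random_state=0)
forest.fit(data.data, data.target)

# Importances sum to 1; ranking them surfaces the most informative features.
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda p: p[1], reverse=True)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```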
We also trained a simplified version of the random forest, but as the justification of its creation follows from the analysis of the results of the random forest, we present it later in the Results section.
Finally, we trained an ML model based on logistic regression that estimates the probability of the material based on linear functions of the features, where the features are assigned positive (increase the probability), zero, or negative (decrease the probability) coefficients. The logistic regression pipeline included standardization of all attributes to a mean of 0.0 and standard deviation of 1.0.
Optimal hyper-parameters for the logistic regression were as follows:
  • C = 0.45;
  • Penalty = L1;
  • Max iterations = 180;
  • Solver = liblinear.
C is the inverse of the regularization strength; smaller values give stronger regularization, and the default is 1. Regularization aims to simplify the generated functions. The penalty is the norm used for regularization, with the options being L1 (lasso) and L2 (ridge). L1 tends to increase the number of zero coefficients. Max iterations is the maximum number of iterations used in training the model, and the solver selects the algorithm used to fit the model, with the options being lbfgs and liblinear.
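The pipeline described above (standardization followed by an L1-penalized logistic regression with the reported hyper-parameters) can be sketched as follows; the iris dataset stands in for the encoded EBR features:

```python
# Sketch of the logistic regression pipeline: standardize all features,
# then fit an L1-regularized classifier with the reported settings.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # stand-in for the EBR features

pipe = make_pipeline(
    StandardScaler(),  # mean 0.0, standard deviation 1.0
    LogisticRegression(C=0.45, penalty="l1", solver="liblinear",
                       max_iter=180),
)
pipe.fit(X, y)

# L1 regularization drives some coefficients to exactly zero.
coefs = pipe.named_steps["logisticregression"].coef_
print("non-zero coefficients per class:", (coefs != 0).sum(axis=1))
```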
As the number of non-zero coefficients was high even with L1 regularization, we will only present those numbers for each class in Table 2 to illustrate the complexity of our functions.

4. Results

A test set of 66 buildings was used to evaluate the accuracy of the created models. To assess the baseline accuracy of the registry data, we compared the raw EBR data to expert-verified wall types. Using the same approach described in the Methods section for evaluating training data, 54% of the buildings in the test set had a correctly assigned wall type in the EBR. Applying the developed expert model, 89% of buildings were assigned correct wall types, while the ML random forest correctly assigned 88% of wall types. In the context of gas silicate panel walls, the most often indicated wall type within the EBR, the building register values were correct for 56% of buildings. Using the expert model decreased the error, and the accurate type was assigned for 82% of buildings.
We evaluate the performance of the models by accuracy and F1 score. Accuracy is the ratio of correct predictions to all predictions. Accuracy can be misleadingly high if classes are unbalanced and the classifier exhibits good performance for common classes and poor performance for rare classes. The F1 score is used to compensate for class imbalance. For two classes (positive and negative), the F1 score is defined as follows:
F1 = 2 × (precision × recall) / (precision + recall)
where precision is the ratio of true positives to all predicted positives and recall is the ratio of true positives to all real positive cases. The multiclass F1 score (macro F1 score) is an average F1 score over all classes (positive being the presence of the class and negative the lack of the class). Accuracy and F1 score are therefore both dimensionless quantities that range from 0 to 1. Random guessing between two equally probable classes should have an accuracy and F1 score of around 0.5. The results for our models are presented in Table 3.
The confusion matrix displays the frequencies of objects for all true class (row) and predicted class (column) combinations (correct predictions are on the main diagonal), making explicit the frequencies of different error types (false positives, false negatives for binary classification).
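All three metrics can be computed with scikit-learn; the toy labels below are illustrative, not the actual test-set results:

```python
# Sketch of the evaluation: accuracy, macro F1, and the confusion
# matrix on a toy set of wall-type labels (not the real test data).
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

wall_types = ["wood", "brick", "lightweight", "precast"]
y_true = ["brick", "brick", "wood", "precast", "lightweight", "precast"]
y_pred = ["brick", "wood",  "wood", "precast", "precast",     "precast"]

print("accuracy:", accuracy_score(y_true, y_pred))
print("macro F1:", f1_score(y_true, y_pred, average="macro"))
# Rows are true classes, columns predicted classes; correct predictions
# fall on the main diagonal.
print(confusion_matrix(y_true, y_pred, labels=wall_types))
```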

4.1. Expert Data Enrichment

Utilizing the expert model, the correct type was predicted for 89% of buildings.
  • Accuracy = 0.8933.
  • F1 = 0.9012.
The confusion matrix for the expert model is shown in Table 4. As shown in the confusion matrix, the different types of errors are mostly balanced with no single type dominating. There is a slight tendency towards predicting precast concrete when the actual wall type is lightweight concrete or brick.
These results improve significantly on initial low-quality data and are very competitive with the results from machine learning models described below.

4.2. Machine Learning Data Enrichment

For the ML data enrichment models, we checked for over-fitting and the potential for improvement through additional data by plotting the learning curves. As this involves training and evaluating the model afresh for various sample sizes, we could not use this method for the hand-crafted expert model.
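As a sketch of this procedure, scikit-learn's `learning_curve` retrains and re-evaluates the estimator for each training-set size; the data here are synthetic stand-ins, since the registry features are not reproduced in this example.

```python
# Sketch of a learning-curve check with scikit-learn; the data are synthetic
# placeholders, not the EBR features used in the study.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=416, n_features=10, n_classes=4,
                           n_informative=6, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="accuracy")

# A persistent gap between training and validation accuracy signals over-fitting.
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n}: train={tr:.2f}, validation={va:.2f}")
```

Plotting the two mean-score curves against `sizes` reproduces figures of the kind discussed below: a narrowing gap indicates sufficient data, a persistent gap indicates over-fitting.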
Metrics for the decision tree calculated on the test data were as follows:
  • Accuracy = 0.8267.
  • F1 = 0.8452.
The confusion matrix is shown in Table 5.
As demonstrated by the scores and confusion matrix, our classes are quite balanced, and the simple accuracy metric is informative and sufficient. The confusion matrix shows notable confusion between precast and lightweight concrete, with precast concrete walls most often misclassified as lightweight concrete.
The learning curve for the decision tree (Figure 2) plateaus around 220 buildings and the difference between training and validation accuracy becomes quite narrow beyond this point. This demonstrates the absence of over-fitting and that we have sufficient data for the model.
Metrics for the random forest calculated on the test data were as follows:
  • Accuracy = 0.8800.
  • F1 = 0.8903.
The confusion matrix is shown in Table 6. As was the case with the expert model, the different types of errors are mostly balanced with no single type dominating. There is no tendency to falsely predict lightweight concrete walls as precast concrete, which was the case in the previous model.
Although the learning curve for random forest (Figure 3) plateaus around 220 buildings, the difference between training and validation accuracy remains wide and training accuracy stays at the maximum of 1.0. This indicates strong over-fitting by the complex model (no limit on maximum tree depth, etc.). Even though validation accuracy remains high, this suggests that simplifying the model or training it on a larger dataset would be beneficial. Acquiring a larger labeled dataset was not realistic at the time of the study; therefore, we also included a simpler random forest.
To find the optimal hyper-parameters for the simpler random forest, we limited the max depth to five and the maximal number of trees to 100 for our hyper-parameter tuning. The best hyper-parameters were the following:
  • criterion = entropy.
  • max depth = 5.
  • number of trees = 60.
Feature importances for the 10 most important features of the simplified forest, as calculated by sklearn, are given in Table 7. These are dimensionless quantities representing the mean decrease in impurity, as measured by the Gini or entropy criterion.
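The tuning and importance extraction described above can be sketched with scikit-learn's `GridSearchCV`; the grid bounds (maximum depth up to 5, up to 100 trees) follow the text, while the synthetic data and the exact grid values are illustrative assumptions.

```python
# Sketch of the hyper-parameter search and impurity-based importances with
# scikit-learn. Grid bounds follow the text; the data are synthetic placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=416, n_features=10, n_classes=4,
                           n_informative=6, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"criterion": ["gini", "entropy"],
                "max_depth": [3, 4, 5],
                "n_estimators": [20, 60, 100]},
    cv=5, scoring="accuracy")
grid.fit(X, y)
print(grid.best_params_)

# Impurity-based importances (mean decrease in impurity); they sum to 1.0.
importances = grid.best_estimator_.feature_importances_
print(sorted(enumerate(importances), key=lambda t: -t[1])[:10])
```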
Metrics for the simple random forest calculated on the test data were the following:
  • Accuracy = 0.8133.
  • F1 = 0.8319.
There is a decrease in both accuracy and F1 score compared to the previous, more complex version of the random forest (see Table 3).
The confusion matrix is shown in Table 8. There is a tendency to falsely predict that lightweight concrete walls are precast concrete walls, which was not present in the previously discussed more complex random forest.
The learning curve for the simple random forest (Figure 4) shows a dip in training accuracy starting from 250 training samples. Over-fitting is less of a problem than for a more complex random forest, but additional data would likely reduce this even further.
Finally, we trained a logistic regression model. The complexity of the linear functions for predicting each material is characterized in Table 2.
Its performance metrics were as follows:
  • Accuracy = 0.8133.
  • F1 = 0.8319.
There is a decrease in accuracy and F1 score compared to the decision tree and both random forest models.
The confusion matrix is shown in Table 9 and, as with the decision tree, shows a pronounced tendency to predict precast concrete when the actual wall type is lightweight concrete.
Its learning curve (Figure 5) plateaus around the sample size of 250 buildings. Over-fitting does not seem to be a major problem at this point, as the training and validation curves become quite close. Additional data are unlikely to improve this model.
To compare the performance of different methods over different wall types, we also calculated F1 scores (Equation (1)) for each separate class (wall type) as shown in Table 10.
We can see that the expert model and random forest outmatch other models when predicting precast and lightweight concrete walls. For other wall types, the performance is comparable.
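Per-class F1 scores of the kind reported in Table 10 come directly from `f1_score` with `average=None`; the labels and predictions below are illustrative placeholders, not the study's test set.

```python
# Sketch: per-class F1 scores via average=None (one score per label, in order).
from sklearn.metrics import f1_score

labels = ["precast concrete", "lightweight concrete", "brick", "wood"]
y_true = ["brick", "wood", "lightweight concrete", "precast concrete",
          "lightweight concrete"]
y_pred = ["brick", "wood", "lightweight concrete", "precast concrete",
          "precast concrete"]

per_class = f1_score(y_true, y_pred, labels=labels, average=None)
print(dict(zip(labels, per_class.round(2))))
```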

5. Discussion

5.1. Data Enrichment

The data in the EBR were only 54% accurate for our 66 test buildings. The experimental results, summarized in Table 3, show a marked improvement when either the expert model or machine learning models are used to enrich the data. These results confirm our hypothesis that the external wall data in the EBR are of low reliability and that enrichment by both the expert model and the machine learning models improves them significantly. This method could be applicable to various registries and databases. An important benefit of data enrichment is the possibility of assigning values to buildings without a listed wall or structural material type and of correctly identifying lightweight concrete walls.

5.2. Comparison of Expert and ML Approaches

On our relatively small dataset, the results of the expert model were slightly better than those of the machine learning models. These results are consistent with a similar comparison by Ben-David and Frank [29]. The ML decision tree had a somewhat simpler structure than the expert decision tree, with a depth of three compared to the expert model's depth of five, and its accuracy was somewhat lower. Implementing the machine learning models was easier than creating the expert model: most of the effort went into data preprocessing, whereas training and evaluation were quite fast. On the other hand, creating the expert model involved a literature review, which was not needed for machine learning. The expert model also required correcting errors in the data, for example, verifying floor counts against Google Maps for the training and test data.

5.3. Limitations

It is important to note that the structure of the expert model is more flexible, sometimes falling back to predicting the more general class stone instead of brick, precast concrete panel, or lightweight concrete, while the ML models must predict one of the four classes provided in the training data. While these cases are rare (and not present in our test data, as demonstrated by the confusion matrix in Table 4), the accuracy of the expert model may be somewhat overestimated relative to the ML models in future tests.
The learning curves show that while the decision tree and logistic regression are well suited to the relatively small amount of available training data, the most accurate ML model, random forest, would benefit from more data, as it currently shows substantial over-fitting. Additional data would also make the accuracy and F1 score metrics more stable and reliable. More complex models, such as larger random forests and deep neural networks, become feasible on larger datasets and should improve the accuracy of predicting the external wall type. When a large amount of training data is available, the expert model could lag behind such deep learning approaches. There are several good reviews in the literature on the application of these approaches in construction [41] and geotechnical engineering [42]. However, compared to expert, decision tree, and logistic regression models, random forests and deep neural networks are harder to fully interpret due to their complexity.
We also note that the logistic regression model could benefit from feature engineering, which was not part of our process.
We have shown that the expert model could be a feasible alternative or complement to the ML model, but there is no guarantee that this applies to each sub-domain as demonstrated by contrary results obtained by Bloch and Sacks [28] for room type classification.
A further limitation concerns the geographic and structural context of the data. Both the expert and ML models were developed using data from the EBR, which is EU-based and relatively detailed. As a result, the models may not apply well to building registries in other regions, especially those with lower-resolution data. Applying these models to non-EU datasets or to databases with fewer attributes requires adaptations to account for differences in data availability, structure, and local construction practices.
Additionally, the quality of the underlying EBR data introduces potential biases. Incorrect registry values affect model performance. This is particularly relevant for the expert model, which relies heavily on the assumed correctness of certain fields in the registry. Such inconsistencies can lead to misclassification and reduced model reliability.

5.4. Further Work and Applications

Both approaches show promise and should be useful in similar projects. Potential future topics include combining expert and ML models into an improved decision tree or into a hybrid voting-based ensemble method. Transfer learning may hold some promise for overcoming the problems posed by limited training data, but it faces serious challenges when building stocks differ in geography, climate, history, and database structure. The hybrid integration of expert and machine learning models to enrich building registries can lead to the development of more accurate building information models. Active learning, where the model selectively queries a human expert to label uncertain or informative examples, could be another promising avenue of study. The conclusion of Ben-David and Frank [29] that more expert systems in various domains need to be tested against ML models is still true 15 years later. The field of construction could be a fruitful test-bed for such comparisons.
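The voting-based ensemble idea above can be sketched by wrapping hand-written expert rules as a scikit-learn classifier and combining it with ML models by majority vote. The `ExpertRule` class and its threshold rule are hypothetical placeholders, not the paper's expert tree.

```python
# Sketch: a hybrid voting ensemble of a (placeholder) expert rule and ML models.
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier

class ExpertRule(ClassifierMixin, BaseEstimator):
    """Wraps fixed expert rules so VotingClassifier can use them."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        return self
    def predict(self, X):
        # Placeholder rule: threshold on the first feature (hypothetical).
        return np.where(X[:, 0] > 0, self.classes_[0], self.classes_[1])

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
vote = VotingClassifier(
    estimators=[("expert", ExpertRule()),
                ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
                ("rf", RandomForestClassifier(max_depth=5, random_state=0))],
    voting="hard")
vote.fit(X, y)
preds = vote.predict(X[:5])
print(preds)
```

With `voting="hard"`, each member casts one vote per building and the majority label wins, so a confident expert rule can overrule a single mistaken ML model and vice versa.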

6. Conclusions

External wall type determines the thermal properties of a building and is a critical input for developing good renovation strategies. We proposed a solution for improving the data quality regarding external wall types in the Estonian Building Registry, though our approach could be applied to various databases and registries. We explored expert-created solutions (expert decision tree) and several machine learning models suitable for smaller training data: decision tree, random forest, and logistic regression. We estimate the initial accuracy of the Estonian Building Registry wall type data to be 54%; the expert model-enriched wall types are 89% accurate, and the random forest-enriched wall types are 88% accurate. Therefore, both expert and machine learning models efficiently improve the data quality. The good performance of the expert model shows that it could still be a viable option in well-understood domains where available training data are limited and the cost of crafting an expert model is acceptable. These results contribute to the sparse research comparing expert and machine learning models.
Random forest is the most complex of the studied models, while the machine learning decision tree is somewhat less complex but less accurate than the expert model. Learning curve analysis suggests that random forest would benefit from more training data while other machine learning models achieved their peak performance. Logistic regression is a model that could benefit from additional feature engineering. We conclude that data enrichment can significantly improve the quality of data that are used to calculate the energy efficiency of buildings, and further improvements are possible, for example, through hybrid integration of expert and ML systems.

Author Contributions

Conceptualization, I.L., E.I., E.P., T.K. and T.R.; methodology, A.T., E.I., E.P. and I.L.; software, A.T.; validation, A.T. and E.I.; formal analysis, A.T., E.I. and I.L.; investigation, A.T. and E.I.; resources, T.K.; data curation, A.T., E.I., E.P. and I.L.; writing—original draft preparation, A.T., E.I., E.P., I.L. and T.K.; writing—review and editing, A.T., E.I., E.P., I.L., T.R. and T.K.; visualization, A.T. and E.I.; supervision, T.K. and T.R.; project administration, E.P., I.L. and T.K.; funding acquisition, E.P. and T.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Estonian Research Council grant (PSG963); by the European Commission through LIFE IP BUILDEST (LIFE20 IPC/EE/000010); and by the Estonian Centre of Excellence in Energy Efficiency, ENER (grant TK230) funded by the Estonian Ministry of Education and Research.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
BIM | Building Information Modeling
EBR | Estonian Building Registry
F1 | an ML evaluation score
LOD, LOD1, LOD2 | Level of Detail
ML | Machine Learning
RF | Random Forest

References

  1. European Commission. A European Green Deal; Publications Office of the European Union: Luxembourg, 2021.
  2. Sandberg, N.H.; Sartori, I.; Heidrich, O.; Dawson, R.; Dascalaki, E.; Dimitriou, S.; Vimm-r, T.; Filippidou, F.; Stegnar, G.; Šijanec Zavrl, M.; et al. Dynamic building stock modelling: Application to 11 European countries to support the energy efficiency and retrofit ambitions of the EU. Energy Build. 2016, 132, 26–38.
  3. Kuusk, K.; Kalamees, T. Retrofit cost-effectiveness: Estonian apartment buildings. Build. Res. Inf. 2016, 44, 920–934.
  4. Mora, M.D.; Fabbri, K.; Berardinis, L.D.; Andersen, K.K.; Boukhanouf, A.; Ferrando, M. Cost-optimum analysis of building fabric renovation in a Swedish multi-story residential building. Energy Build. 2014, 84, 662–673.
  5. Niemelä, T.; Kosonen, R.; Jokisalo, J. Cost-effectiveness of energy performance renovation measures in Finnish brick apartment buildings. Energy Build. 2017, 137, 60–75.
  6. European Commission. Renovation Wave; European Commission: Brussels, Belgium, 2020.
  7. Civiero, P.; Pascual, J.; Arcas Abella, J.; Bilbao Figuero, A.; Salom, J. PEDRERA. Positive Energy District Renovation Model for Large Scale Actions. Energies 2021, 14, 2833.
  8. Ang, Y.Q.; Berzolla, Z.M.; Letellier-Duchesne, S.; Jusiega, V.; Reinhart, C. UBEM.io: A web-based framework to rapidly generate urban building energy models for carbon reduction technology pathways. Sustain. Cities Soc. 2022, 77, 103534.
  9. Arumägi, E.; Hallik, J.; Kisel, E. Quantification of Building Envelope Heat Losses on a District Level for Comparative Renovation Strategies Assessment. J. Phys. Conf. Ser. 2023, 2654, 012003.
  10. Iliste, E.; Lomp, S.; Kisel, E.; Liiv, I.; Kalamees, T. Heat loss characteristics of typology-based apartment building external walls for a digital twin-based renovation strategy tool. J. Phys. Conf. Ser. 2023, 2654, 012125.
  11. Ali, U.; Shamsi, M.H.; Hoare, C.; Mangina, E.; O’Donnell, J. A data-driven approach for multi-scale building archetypes development. Energy Build. 2019, 202, 109364.
  12. Loga, T.; Stein, B.; Diefenbach, N. TABULA building typologies in 20 European countries—Making energy-related features of residential building stocks comparable. Energy Build. 2016, 132, 4–12.
  13. Csoknyai, T.; Hrabovszky-Horváth, S.; Georgiev, Z.; Jovanovic-Popovic, M.; Stankovic, B.; Villatoro, O.; Szendrő, G. Building stock characteristics and energy performance of residential buildings in Eastern-European countries. Energy Build. 2016, 132, 39–52.
  14. Tuominen, P.; Holopainen, R.; Eskola, L.; Jokisalo, J.; Airaksinen, M. Calculation method and tool for assessing energy consumption in the building stock. Build. Environ. 2014, 75, 153–160.
  15. Xue, F.; Wu, L.; Lu, W. Semantic enrichment of building and city information models: A ten-year review. Adv. Eng. Inform. 2021, 47, 101245.
  16. Atwal, K.S.; Anderson, T.; Pfoser, D.; Züfle, A. Predicting building types using OpenStreetMap. Sci. Rep. 2022, 12, 19976.
  17. Krayem, A.; Yeretzian, A.; Faour, G.; Najem, S. Machine learning for buildings’ characterization and power-law recovery of urban metrics. PLoS ONE 2021, 16, e0246096.
  18. Parts, E.R.; Pikas, E.; Parts, T.M.; Arumägi, E.; Liiv, I.; Kalamees, T. Quality and accuracy of digital twin models for the neighbourhood level building energy performance calculations. E3S Web Conf. 2023, 396, 04021.
  19. Iliste, E. Basics of the Energy Efficiency Assessment of Residential Areas Based on the Data of the Building Register. Master’s Thesis, Tallinn University of Technology, Tallinn, Estonia, 2023.
  20. Taylor, F.E.; Malamud, B.D.; Freeborough, K.; Demeritt, D. Enriching Great Britain’s National Landslide Database by searching newspaper archives. Geomorphology 2015, 249, 52–68.
  21. Belsky, M.; Sacks, R.; Brilakis, I. Semantic Enrichment for Building Information Modeling. Comput.-Aided Civ. Infrastruct. Eng. 2016, 31, 261–274.
  22. Han, M.; Wang, Z.; Zhang, X. An Approach to Data Acquisition for Urban Building Energy Modeling Using a Gaussian Mixture Model and Expectation-Maximization Algorithm. Buildings 2021, 11, 30.
  23. da Silva Ruiz, P.R.; Almeida, C.M.d.; Schimalski, M.B.; Liesenberg, V.; Mitishita, E.A. Multi-approach integration of ALS and TLS point clouds for a 3-D building modeling at LoD3. Int. J. Archit. Comput. 2023, 21, 14780771231176029.
  24. Zhang, X.; Chen, K.; Johan, H.; Erdt, M. A Semantics-aware Method for Adding 3D Window Details to Textured LoD2 CityGML Models. In Proceedings of the 2022 International Conference on Cyberworlds (CW), Kanazawa, Japan, 27–29 September 2022; pp. 63–70.
  25. Szcześniak, J.T.; Ang, Y.Q.; Letellier-Duchesne, S.; Reinhart, C.F. A method for using street view imagery to auto-extract window-to-wall ratios and its relevance for urban-level daylighting and energy simulations. Build. Environ. 2022, 207, 108108.
  26. Michalski, R.S.; Chilausky, R.L. Knowledge acquisition by encoding expert rules versus computer induction from examples: A case study involving soybean pathology. Int. J. Man Mach. Stud. 1980, 12, 63–87.
  27. Creecy, R.H.; Masand, B.M.; Smith, S.J.; Waltz, D.L. Trading MIPS and memory for knowledge engineering. Commun. ACM 1992, 35, 48–64.
  28. Bloch, T.; Sacks, R. Comparing machine learning and rule-based inferencing for semantic enrichment of BIM models. Autom. Constr. 2018, 91, 256–272.
  29. Ben-David, A.; Frank, E. Accuracy of machine learning models versus “hand crafted” expert systems—A credit scoring case study. Expert Syst. Appl. 2009, 36, 5264–5271.
  30. Musen, M.A. Automated Support for Building and Extending Expert Models. In Knowledge Acquisition: Selected Research and Commentary: A Special Issue of Machine Learning on Knowledge Acquisition; Marcus, S., Ed.; Springer: Boston, MA, USA, 1990; pp. 101–129.
  31. Yousofi Tezerjan, M.; Safi Samghabadi, A.; Memariani, A. ARF: A hybrid model for credit scoring in complex systems. Expert Syst. Appl. 2021, 185, 115634.
  32. Mazzetto, S. Hybrid Predictive Maintenance for Building Systems: Integrating Rule-Based and Machine Learning Models for Fault Detection Using a High-Resolution Danish Dataset. Buildings 2025, 15, 630.
  33. Estonian Land Board. Live Kluster Portal (EHR); Estonian Land Board: Tallinn, Estonia, 2025.
  34. Fisher, R.A. The Use of Multiple Measurements in Taxonomic Problems. Ann. Eugen. 1936, 7, 179–188.
  35. Kalamees, T.; Õiger, K.; Kõiv, T.A.; Liias, R.; Kallavus, U.; Mikli, L.; Lehtla, A.; Kodi, G.; Luman, A.; Arumägi, E.; et al. Eesti Eluasemefondi Suurpaneel-Korterelamute Ehitustehniline Seisukord Ning Prognoositav Eluiga; Technical Report; Tallinna Tehnikaülikool: Tallinn, Estonia, 2009.
  36. Kalamees, T.; Kõiv, T.A.; Liias, R.; Õiger, K.; Kallavus, U.; Mikli, L.; Ilomets, S.; Kuusk, K.; Maivel, M.; Mikola, A.; et al. Eesti Eluasemefondi Telliskorterelamute Ehitustehniline Seisukord Ning Prognoositav Eluiga; Technical Report; Tallinna Tehnikaülikool: Tallinn, Estonia, 2010.
  37. Kalamees, T.; Arumägi, E.; Just, A.; Kallavus, U.; Mikli, L.; Thalfeldt, M.; Klõšeiko, P.; Agasild, T.; Liho, E.; Haug, P.; et al. Eesti Eluasemefondi Puitkorterelamute Ehitustehniline Seisukord Ning Prognoositav Eluiga; Technical Report; Tallinna Tehnikaülikool: Tallinn, Estonia, 2011.
  38. Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
  39. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  40. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  41. Akinosho, T.D.; Oyedele, L.O.; Bilal, M.; Ajayi, A.O.; Delgado, M.D.; Akinade, O.O.; Ahmed, A.A. Deep learning in the construction industry: A review of present status and future innovations. J. Build. Eng. 2020, 32, 101827.
  42. Zhang, W.; Li, H.; Li, Y.; Liu, H.; Chen, Y.; Ding, X. Application of deep learning algorithms in geotechnical engineering: A short critical review. Artif. Intell. Rev. 2021, 54, 5633–5673.
Figure 1. Machine learning process with data flows.
Figure 2. Learning curve for decision tree.
Figure 3. Learning curve for random forest.
Figure 4. Learning curve for simple random forest.
Figure 5. Learning curve for logistic regression.
Table 1. Feature importances for random forest.

Feature | Importance
Structural material: brick | 0.0604
First time in use year | 0.0515
External wall: brick | 0.0495
External wall: small or large block | 0.0433
x (Coordinate) | 0.0394
Structural material: prefabricated reinforced concrete | 0.0384
External wall: multi-layer reinforced concrete panel | 0.0355
Area of habitable spaces | 0.034
Building footprint area | 0.0314
Table 2. Function complexity characterization.

Type | # Non-Zero Coefficients
Wood | 7
Lightweight concrete | 45
Precast concrete panel | 16
Brick | 36
Table 3. Comparison of methods.

Method | Accuracy | F1 Score
Expert Model | 0.89 | 0.90
Decision Tree | 0.83 | 0.85
Random Forest | 0.88 | 0.89
Simplified RF | 0.83 | 0.85
Logistic Regression | 0.81 | 0.83
Table 4. Confusion matrix for expert model.

Actual \ Predicted | Precast Concrete | Lightweight Concrete | Brick | Wood
precast concrete | 18 | 2 | 2 | 0
lightweight concrete | 0 | 20 | 3 | 0
brick | 0 | 0 | 21 | 0
wood | 0 | 1 | 0 | 8
Table 5. Confusion matrix for decision tree.

Actual \ Predicted | Precast Concrete | Lightweight Concrete | Brick | Wood
precast concrete | 16 | 6 | 0 | 0
lightweight concrete | 3 | 18 | 2 | 0
brick | 0 | 1 | 20 | 0
wood | 0 | 0 | 1 | 8
Table 6. Confusion matrix for random forest.

Actual \ Predicted | Precast Concrete | Lightweight Concrete | Brick | Wood
precast concrete | 20 | 2 | 0 | 0
lightweight concrete | 1 | 19 | 3 | 0
brick | 0 | 2 | 19 | 0
wood | 0 | 0 | 1 | 8
Table 7. Feature importances for simplified random forest.

Feature | Importance
External wall exterior finish: brick | 0.1122
First time in use year | 0.0744
External wall: multi-layer reinforced concrete panel | 0.0729
External wall: brick | 0.0559
External wall exterior finish: prefabricated reinforced concrete | 0.0474
External wall: small or large block | 0.0444
Area of habitable spaces | 0.0386
Net building area | 0.0384
Number of habitable spaces | 0.0376
Building footprint area | 0.0326
Table 8. Confusion matrix for simple random forest.

Actual \ Predicted | Precast Concrete | Lightweight Concrete | Brick | Wood
precast concrete | 15 | 7 | 0 | 0
lightweight concrete | 3 | 17 | 3 | 0
brick | 0 | 0 | 21 | 0
wood | 0 | 0 | 1 | 8
Table 9. Confusion matrix for logistic regression.

Actual \ Predicted | Precast Concrete | Lightweight Concrete | Brick | Wood
precast concrete | 15 | 7 | 0 | 0
lightweight concrete | 3 | 17 | 3 | 0
brick | 0 | 0 | 21 | 0
wood | 0 | 0 | 1 | 8
Table 10. F1 scores for each class.

Class | Expert Model | Decision Tree | Random Forest | Simplified Random Forest | Logistic Regression
precast concrete | 0.9 | 0.78 | 0.93 | 0.75 | 0.75
lightweight concrete | 0.91 | 0.77 | 0.83 | 0.72 | 0.72
brick | 0.89 | 0.89 | 0.86 | 0.91 | 0.91
wood | 0.94 | 0.94 | 0.94 | 0.94 | 0.94