Article

A Method for Selecting the Appropriate Source Domain Buildings for Building Energy Prediction in Transfer Learning: Using the Euclidean Distance and Pearson Coefficient

Department of Architecture and Built Environment, University of Nottingham Ningbo China, 199 East Taikang Road, Ningbo 315100, China
*
Author to whom correspondence should be addressed.
Energies 2025, 18(14), 3706; https://doi.org/10.3390/en18143706
Submission received: 31 May 2025 / Revised: 30 June 2025 / Accepted: 9 July 2025 / Published: 14 July 2025
(This article belongs to the Special Issue Innovations in Low-Carbon Building Energy Systems)

Abstract

Building energy prediction faces challenges such as data scarcity, while Transfer Learning (TL) demonstrates significant potential by leveraging source building energy data to enhance target building energy prediction. However, the accuracy of TL heavily relies on selecting appropriate source buildings as the source data. This study proposes a novel, easy-to-understand, statistics-based visualization method that combines the Euclidean distance and Pearson correlation coefficient for selecting source buildings in TL for target building electricity prediction. Long Short-Term Memory, the Gated Recurrent Unit, and the Convolutional Neural Network were applied to verify the appropriateness of the selected source domain buildings. The results showed that the source building selected via the proposed method could reduce computational costs by 65%, while the RMSE was approximately 6.5 kWh and the R2 was around 0.92. The proposed method is well suited for scenarios requiring rapid response times and exhibiting low tolerance for prediction errors.

1. Introduction

The construction industry is the largest energy-consuming sector globally and one of the major sources of greenhouse gas emissions. According to the Global Status Report for Buildings and Construction 2023 [1], published by the United Nations Environment Programme, the building sector accounted for 34% of global energy demand and 37% of energy- and process-related CO2 emissions in 2022. To meet net-zero carbon targets, accelerating the decarbonization of the building sector and promoting the development of green buildings are indispensable [1]. The construction industry was also a key focus at the 28th United Nations Climate Change Conference (COP28) through the “Buildings Breakthrough” initiative, signed by 28 countries including China [2], which aims to drive the industry toward a more sustainable and low-carbon future. The core goal of this initiative is to reduce carbon emissions in the construction sector by improving building energy efficiency and promoting the design and construction of green buildings through innovation and collaboration. The Energy Performance of Buildings Directive [2], published by the European Commission, pays particular attention to the energy efficiency of existing buildings, as 75% of EU buildings are still energy inefficient. New policy measures emphasize the importance of building digitalization, monitoring, building automation and smartness, data collection, and sharing.
In addition, Artificial Intelligence (AI) technologies can be applied to energy data for analysis, management, and forecasting [1,3]. Because building energy efficiency is closely tied to accurate building energy consumption prediction, accurate predictions can help optimize building design, operation, maintenance, and financial costs, and enable facility managers to make informed decisions that improve building energy efficiency and thereby reduce energy consumption. When it comes to applying AI algorithms to energy prediction, the majority of research aims to obtain the highest possible accuracy [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]. A summary of these studies is listed in Table A1. A large variety of supervised machine learning algorithms (a branch of AI applications) have been utilized for energy prediction, among which the Artificial Neural Network [12,15,25,43], Support Vector Machine (SVM) [9,10,13,44], and Decision Tree were recognized as popular [4].
Data-driven models, a kind of AI application, are able to make predictions, classifications, or decisions by learning patterns from large amounts of data. In building energy prediction, historical data is one of the essential ingredients of data-driven models, owing to its ability to increase the prediction accuracy of dynamic loads [5]. In the commercial sector, big data is frequently leveraged to classify customer information and to provide robust support for future decision-making [6]. However, newly constructed buildings, or buildings that do not yet have energy monitoring systems installed, lack high-quality historical data, and models trained without such data can suffer low accuracy due to underfitting.
Transfer Learning (TL) is a data technique that employs information from previous related tasks to assist in solving new ones [45]. Owing to its excellent performance on small datasets, it has been widely applied in graphics recognition, text classification, and web page classification. Research on TL across different fields has demonstrated that, if appropriately implemented, TL has the following advantages: (1) reducing the amount of training data required to develop the target model; (2) saving time in constructing and training models; and (3) improving model prediction performance. In recent years, several studies have focused on applying TL to data-driven building energy prediction with insufficient training data, thus taking advantage of additional datasets from other buildings [46,47]. The source domain, source task, target domain, and target task are the four key elements of TL. A domain consists of a feature space and a marginal probability distribution, while a task can be seen as the output of a model. Typically, researchers designate a domain with sufficient data as the source domain. The importance of the source domain in TL is reflected in data quality and task relevance: high-quality source domain data can help the model learn more generalizable features.
Therefore, selecting an appropriate source domain and optimizing its relationship to the target domain is key to the success of TL. To effectively leverage TL for building energy prediction model development, guidance on how to select datasets as source domains is urgently needed. So far, several scholars have studied how to define an appropriate matching method for selecting the source domain in TL. Some studies utilize the Pearson coefficient alone for correlation detection [45,46]. Others employ individual distance metrics, such as the Euclidean or Chebyshev distance [47,48]. However, methodologies that combine distance metrics with the Pearson coefficient to establish data selection guidelines for source domains remain unexplored.
This study presents a novel and effective method for selecting relatively appropriate source domain buildings in TL for building energy prediction, leveraging a combination of the Euclidean distance and Pearson correlation coefficient. By integrating the metrics of these two methods, the proposed approach addresses the limitations of each method when used in isolation. The Pearson correlation coefficient captures the linear correlation between the features of the source and target domains, while the Euclidean distance provides insights into the geometric proximity of the source and target domains in the Euclidean space. This dual-metric method ensures a robust, readily applicable, and accurate selection guide of source domain buildings, enhancing the performance of TL models.
The remainder of this paper is organized as follows: Section 2 gives a brief overview of the basic concepts related to this research, such as the Pearson correlation coefficient and Euclidean distance; Section 3 proposes an effective method for source domain building selection using the Euclidean distance and Pearson correlation; Section 4 discusses the research results; and Section 5 concludes the key findings.

2. Background

2.1. Pearson Correlation Coefficient and Euclidean Distance

The Pearson correlation coefficient quantifies the linear relationship between two continuous, normally distributed variables, although it may overlook the structure of the data distribution. Assume X and Y are samples: X contains n observations (x1, x2, x3, …, xn) and Y contains n observations (y1, y2, y3, …, yn). The Pearson correlation coefficient is then defined as in Equation (1):
r = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{N\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{N\sum y_i^2 - \left(\sum y_i\right)^2}}  (1)
The value of r lies in the interval [−1, 1]. When r = 1, X and Y have a completely positive correlation; when r = −1, X and Y have a completely negative correlation; and when r = 0, there is no obvious linear correlation between X and Y.
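As an illustration, Equation (1) can be computed directly. The following Python sketch uses synthetic values (not data from this study) and cross-checks the result against NumPy's built-in routine:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient, following Equation (1)."""
    n = len(x)
    num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    den = np.sqrt(n * np.sum(x ** 2) - np.sum(x) ** 2) * \
          np.sqrt(n * np.sum(y ** 2) - np.sum(y) ** 2)
    return num / den

# Synthetic example: air temperature vs. electricity load
x = np.array([10.0, 12.5, 15.0, 18.0, 21.5])
y = np.array([55.0, 60.0, 68.0, 75.0, 84.0])
print(pearson_r(x, y))          # close to +1: strong positive linear relation
print(np.corrcoef(x, y)[0, 1])  # same value from the library routine
```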
The Euclidean distance metric quantifies the shortest path length between two points in Euclidean space, equivalent to the magnitude of the displacement vector joining them. In data analysis and machine learning, the Euclidean distance is widely employed as a metric to quantify the similarity between data points. A smaller Euclidean distance implies greater proximity in the feature space, which often corresponds to higher similarity in the underlying characteristics (e.g., cluster assignment in unsupervised learning or affinity modeling in recommender systems). The Euclidean distance (d) between (x1, y1) and (x2, y2) in a two-dimensional coordinate system is given by Equation (2):
d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}  (2)
However, the Euclidean distance has certain limitations. In high-dimensional spaces, it tends to become ineffective because the distances between data points become increasingly similar. It is sensitive to noise and outliers, which may lead to incorrect source domain selection. Moreover, being based solely on geometric distance, it cannot reflect relational similarity between data points and ignores the relevance between source and target domain tasks.
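A minimal sketch of Equation (2), generalized to n-dimensional feature vectors (the values below are illustrative, not taken from the dataset):

```python
import numpy as np

def euclidean(p, q):
    """Euclidean distance, following Equation (2), for n-dimensional points."""
    return float(np.sqrt(np.sum((np.asarray(p) - np.asarray(q)) ** 2)))

# Two buildings described by Min-Max-normalized feature vectors
a = [0.2, 0.5, 0.1]
b = [0.3, 0.4, 0.2]
print(euclidean(a, b))  # ~0.173; a smaller distance suggests higher similarity
```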

2.2. Transfer Learning

In 1976, Bozinovski and Fulgosi [49] published a paper addressing TL in neural network training. The source domain, source task, target domain, and target task are fundamental concepts in TL. A domain consists of a feature space and a marginal probability distribution, while a task can be seen as the output of a model. Typically, researchers designate a domain with sufficient data as the source domain (DS) and the task performed on it as the source domain task (TS). The domain and task that lack sufficient data are defined as the target domain (DT) and target task (TT).

2.2.1. Classifications of Transfer Learning

TL can be classified into various types based on different criteria.
  • Task relationships
One classification of TL is based on task relationships. When DS and DT have the same input space and distribution but different TS and TT, it is referred to as homogeneous TL; when DS ≠ DT, it is called heterogeneous TL [43,50,51,52,53,54,55] (Figure 1).
  • Detailed implementation
TL can be distinguished into four categories regarding the specific implementation method of TL: instance-based, feature-based, parameter-based, and relation-based.
When there are similar data patterns between the DS and DT, instance-based TL can be implemented. This method involves finding features and data in the source domain similar to the DT and adjusting the weights of these data to match the DT data; the reweighted model is then transferred from the DS to tackle the TT. Its advantages lie in its simplicity and ease of implementation. However, its drawbacks include the reliance on empirical weight selection and similarity measurement when the similarity in data distribution between the source and target domains is small.
Feature-based TL assumes that DS and DT share some common overlapping features. By mapping and transforming these features, the data from both domains is brought into the same space, resulting in a similar distribution between the DS and DT data. Its strengths include applicability to most methods and relatively high accuracy. The challenges include the difficulty of mapping and transforming the features and susceptibility to overfitting.
In parameter-based TL, a model trained extensively on data from DS is applied to the DT for prediction. By transferring the pre-trained model to the new domain, high accuracy can still be achieved. However, the drawbacks include difficulties in model parameter convergence.
Relation-based TL is suitable when two domains are similar and share some logical relationship. It involves applying logical network relationships from the source domain to the target domain for TL purposes [51].
  • Technical application
TL is divided into two methods according to technical application: fine-tuning and feature extraction [52]. Compared to fine-tuning, feature extraction typically requires less data and computational resources because only the newly added output layer needs to be learned, while the basic feature extraction layer has already learned the generic features from a pre-trained model. Fine-tuning allows the model to fully adapt to the data distribution of the new task, as all parameters are relearned. The distinction between them is shown in Table 1.
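The distinction in Table 1 can be made concrete in code. The following PyTorch sketch is illustrative only (the study itself used MATLAB, and the layer sizes here are arbitrary): feature extraction freezes the pre-trained layers and trains only the new output layer, whereas fine-tuning relearns all parameters.

```python
import torch.nn as nn

# Stand-in for a backbone pre-trained on the source building's data
backbone = nn.Sequential(nn.Linear(6, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU())
head = nn.Linear(64, 1)               # new output layer for the target task
model = nn.Sequential(backbone, head)

# Feature extraction: freeze the backbone, learn only the new head
for p in backbone.parameters():
    p.requires_grad = False

# Fine-tuning: instead unfreeze everything so all parameters are relearned
# for p in model.parameters():
#     p.requires_grad = True
```

Because only the small head is trained under feature extraction, it needs less data and computation; fine-tuning updates every weight and so adapts more fully to the new task's distribution.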

2.2.2. Transfer Learning in Building Energy Prediction

In recent years, several studies have focused on applying TL to data-driven building energy prediction under insufficient training data, taking advantage of additional datasets from other buildings. In the mature research field of AI for building energy consumption prediction, there exist a few state-of-the-art research papers utilizing TL. Yuan et al. [43] presented models that forecast the peak electricity demand and total energy consumption of a target building. The results indicated direct learning errors of 34.34% and 26.32%, whereas the errors decreased significantly to 12.48% and 10.78%, respectively, when employing TL, demonstrating that the proposed TL models outperform the other models when the dataset is insufficient. Taking the perspective of new construction, Gao et al. [55] highlighted the challenges of acquiring extensive historical data for newly constructed buildings. They proposed using TL alongside two deep learning models, a sequence-to-sequence model and a two-dimensional CNN, analyzed through case studies of three office buildings. The results showed that with only one year of data from the source domain, relatively high accuracy could be achieved. In efforts to enhance TL efficiency, Peng et al. [47] introduced a two-stage source domain building matching method based on advantage comparison to identify multiple source domain buildings similar to the target building. Real-world applications demonstrated that their proposed multi-energy load prediction method could achieve high-precision results. It was observed that previous studies on TL-based building energy prediction usually transferred knowledge such as model structures and parameters, and that the source dataset was usually composed, based on engineering experience, of one or a small number of buildings of a similar industry and scale, located in the same climate area as the target building.

2.2.3. Source Domain Selection Method

The importance of the DS in TL is reflected in the prediction results, with data quality and task relevance being the two essential factors. The quality of the source data directly impacts the effectiveness of TL, as high-quality, well-labeled source domain data can help the model learn more generalizable feature patterns. Therefore, selecting an appropriate source domain for the target domain is key to high accuracy. Additionally, by ensuring the relevance between the source and target domains, the model can achieve higher accuracy and greater generalization capability. So far, several scholars have studied how to define an appropriate selection method for the source domain in TL.
The core idea proposed in several papers was to measure the similarity between the target domain and candidate source domains. Jung et al. [46] selected similar data from electric load datasets collected from 25 districts in Seoul across five categories, together with various external data such as calendar, population, and weather data, by calculating the Pearson correlation coefficient; they constructed a forecasting model using the selected data and finally fine-tuned it using the target data. Jebli et al. [45] used the Pearson correlation coefficient to identify relevant meteorological data for model training, thereby improving prediction accuracy. Peng et al. [47] proposed a multi-source TL-guided ensemble Long Short-Term Memory (LSTM) method for building multi-load forecasting. A two-stage source domain building matching method based on dominance comparison was developed to find multiple source domain buildings similar to the target building: in the first stage, the computationally cheap Euclidean distance was used to find candidate source buildings that may be similar to the target building; in the second stage, a more accurate dynamic time warping distance was used to remove buildings with low similarity from the candidates. An LSTM modeling strategy combining TL and fine-tuning was then used with the multiple source domain datasets to generate multiple basic load forecasting models for the target building. In Iglesias and Kastner’s [45] study on clustering time series to identify typical building energy patterns, four similarity measures were compared: the Euclidean distance, Mahalanobis distance, dynamic time warping distance, and a distance based on Pearson’s correlation. The findings revealed that the Euclidean distance achieved the best-balanced overall performance.

2.3. Black-Box (Data-Driven Method)

Black-Box building load prediction models leverage extensive historical data, enabling the accurate forecasting of future trends and precise values [56]. Since such a model relies on massive data, it can be called a data-driven model as well. Black-Box methods can be categorized into shallow learning networks (e.g., linear regression, XGBoost, Support Vector Machines (SVMs), and Decision Trees) and deep learning (e.g., neural networks) [44,56]. The advantages include achieving prediction errors of less than 10% given a large historical dataset or an appropriate network structure, low computational cost, and no assumptions about inexplicable inputs [43,56]. However, the drawbacks include a lack of interpretability and challenges in finding optimal parameters [57,58].
Linear regression is the simplest and most common predictive method, using a linear model to forecast building energy consumption; although easy to use, it is limited in handling nonlinear relationships [59,60]. Decision Trees, including Random Forest and gradient boosting trees, can manage complex nonlinear relationships and are robust to noise in the data, but deeper trees often generalize poorly to new data. SVMs are suitable for high-dimensional feature spaces and can handle nonlinear data, but they have high computational complexity. Neural networks, including feedforward neural networks and LSTM networks, are well suited for complex nonlinear and large-scale datasets and are capable of capturing long-term dependencies in time series data. This makes them highly suitable for modeling the complex and dynamic nature of building energy usage, providing more accurate and robust predictions compared to traditional methods.

2.3.1. Data-Driven Method Applications for Building Energy Prediction: From Machine Learning to Deep Learning

Using the ScienceDirect search engine, a statistical analysis was conducted of 86 scholarly articles discussing the use of data-driven models for building energy consumption prediction published between 2020 and 2023, of which 35 were selected [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,47,61], as shown in Figure 2. The detailed literature review is summarized in Table A1. As Figure 3 shows, Long Short-Term Memory (LSTM), the Convolutional Neural Network (CNN), Random Forest (RF), and the Gated Recurrent Unit (GRU) are the most frequently discussed algorithms, with LSTM mentioned most frequently among the articles discussing data-driven models over the past four years. The number of articles utilizing LSTM for building energy consumption prediction increased steadily, from 6 in 2020 [11,40,42,44,55,61] (22% of that year's surveyed articles) to 5 in 2021 [7,34,35,36,37] (25%) and 9 in 2023 [8,9,12,13,14,15,16,17,43] (22%). Although 2023 had the highest number of LSTM mentions, the proportion of LSTM papers did not grow because a wider variety of machine learning algorithms was covered in that year's articles. Nevertheless, it was evident that LSTM had gradually matured and was widely applied in building energy consumption prediction. In 2020 and 2021, more articles focused on shallow models such as the SVM and RF [32,34,35,36,37,39,40,44,61]: in 2020, articles using shallow models accounted for 37% of the surveyed articles, while deep learning models accounted for 25% [39,41,42,44,55], and 10 articles in 2021 focused on shallow models. By 2023, however, only two articles mentioned the SVM [9,13] and four mentioned RF [9,14,16,43], while deep neural networks accounted for nearly 40%. This supports the preference for deep neural networks in building energy consumption prediction, driven by their higher accuracy [9,10,11,13,14,15,16,17,18,19,21,22,23]. Most articles that used LSTM reported high accuracy with an RMSE of less than 10 [15,30,35,36], and some have achieved an RMSE of less than 1 by modifying LSTM network structures or leveraging important features [17,21], thereby advancing the development of deep neural networks like LSTM in predicting building energy consumption.
RF was also a frequently mentioned data-driven model; however, RF is a traditional machine learning model, while LSTM, the CNN, and the GRU are deep learning models. Since this study focuses on the performance differences of deep learning models in TL, RF is not addressed further. LSTM, the CNN, and the GRU are introduced in Section 2.3.2, Section 2.3.3 and Section 2.3.4, and only these three deep learning models were tested and discussed in Section 3.

2.3.2. LSTM

LSTM is composed of several storage units that store information. In each storage unit, the input gate, forget gate, and output gate protect and control the information. The basic structure of LSTM and the related equations are shown in Figure 3.
Owing to these gates, LSTM has the ability to remove or add information as it passes from one unit to the next. A gate is a structure that optionally lets information through; it is composed of a sigmoid neural net layer and a pointwise multiplication operation. The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. With the help of the three control gates and the storage units, LSTM can read, reset, and update long-term information [47].
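For reference, the standard LSTM gate formulation (the commonly used definition, consistent with the description above; σ is the sigmoid function and ⊙ denotes pointwise multiplication) is:

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)
\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)
h_t = o_t \odot \tanh(C_t)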

2.3.3. GRU

The Gated Recurrent Unit (GRU) is a gating mechanism for recurrent neural networks, introduced in 2014 by Cho et al. [61]. The structure of the GRU is similar to LSTM, with a gate mechanism to keep or forget certain information, but it lacks a context vector and output gate, resulting in fewer parameters than LSTM. Although the structures are similar, there is no conclusive evidence on which of the two algorithms is better.
A GRU encompasses two gates: the reset gate gr and the update gate gz. The update gate gz is similar to the forget gate and input gate in LSTM, as it controls storing or erasing potential features from the previous state that could be useful later. Meanwhile, the reset gate gr controls the amount of information that should be discarded; it makes the GRU efficient by allowing it to reset information that is useless. The basic unit of a GRU is illustrated in Figure 4.
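In the standard formulation (assumed here; the paper presents the unit graphically in Figure 4), the gates and hidden state are computed as:

g_z = \sigma(W_z [h_{t-1}, x_t])
g_r = \sigma(W_r [h_{t-1}, x_t])
\tilde{h}_t = \tanh(W [g_r \odot h_{t-1}, x_t])
h_t = (1 - g_z) \odot h_{t-1} + g_z \odot \tilde{h}_t

The absence of a separate cell state and output gate, compared with the LSTM equations in Section 2.3.2, is what reduces the parameter count.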

2.3.4. CNN

Although it has been almost 30 years since the first CNN was proposed, the modern CNN structure still shares common properties with the first version, such as convolutional and pooling layers. Also, apart from a few variations, the popular training method, the back-propagation technique, has been another commonality since the 1990s [62]. This section provides a brief overview of conventional deep CNNs, introducing the most fundamental ideas and cornerstone structure. This research project mainly focuses on numerical data; therefore, the main focus is on the 1D CNN.
The configuration of a 1D CNN is determined by the following hyperparameters: the number of hidden CNN and multilayer perceptron (MLP) layers/neurons, the filter (kernel) size in each CNN layer, the subsampling factor in each CNN layer, and the choice of pooling and activation functions. As in 2D CNNs, the input layer is a passive layer that receives the raw 1D signal, and the output layer is an MLP layer with the number of neurons equal to the number of classes. Each CNN layer performs a sequence of convolutions, the sum of which is passed through the activation function; the CNN layers thus process the raw 1D data and learn to “extract” the features used in the classification task performed by the MLP layers. Consequently, feature extraction and classification are fused into one process that can be optimized to maximize classification performance. This is the major advantage of 1D CNNs, which can also result in low computational complexity, since the only operation with a significant cost is a sequence of 1D convolutions, i.e., linear weighted sums of two 1D arrays [62].
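As a sketch of such a configuration (illustrative PyTorch, not the authors' MATLAB network; the kernel sizes loosely echo those discussed later in Section 3.5.3):

```python
import torch
import torch.nn as nn

# Minimal 1D-CNN regressor: convolution + pooling extract features,
# an MLP output layer produces the prediction
model = nn.Sequential(
    nn.Conv1d(in_channels=6, out_channels=16, kernel_size=24, padding="same"),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2),   # subsampling halves the sequence length
    nn.Flatten(),
    nn.Linear(16 * 12, 1),         # assumes a 24-step input window
)

x = torch.randn(8, 6, 24)          # batch of 8 windows, 6 features, 24 steps
print(model(x).shape)              # torch.Size([8, 1])
```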

2.4. Evaluation Indicators

Three indicators were employed in this study to evaluate the variance between observed and predicted values: the root-mean-square error (RMSE), Mean Absolute Error (MAE), and coefficient of determination (R2).
During the training process, the RMSE (3) was utilized to gauge the disparity between predicted and measured values, serving as a common measure of such differences. The smaller the RMSE, the more precise the model’s predictions. All training RMSE values in this study were below 0.05 kWh.
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{T=1}^{N}\left(\mathrm{Observed}_T - \mathrm{Predicted}_T\right)^2}  (3)
where N is the number of data points.
The MAE (4) ranges from zero to positive infinity, with a value of zero indicating perfect alignment between predicted and measured values, thus representing an ideal model. The R2 (5) is used in statistics to measure the proportion of the variance in the dependent variable that is predictable from the independent variables, thereby assessing the explanatory power of the regression model. Typically, higher R2 values for the same dataset indicate minimal differences between measured data and predicted values.
\mathrm{MAE} = \frac{1}{N}\sum_{T=1}^{N}\left|\mathrm{Observed}_T - \mathrm{Predicted}_T\right|  (4)
R^2 = 1 - \frac{\sum_{T=1}^{N}\left(\mathrm{Observed}_T - \mathrm{Predicted}_T\right)^2}{\sum_{T=1}^{N}\left(\mathrm{Observed}_T - \overline{\mathrm{Observed}}\right)^2}  (5)
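A compact NumPy sketch of Equations (3)-(5), with illustrative values only:

```python
import numpy as np

def rmse(obs, pred):
    return float(np.sqrt(np.mean((obs - pred) ** 2)))   # Equation (3)

def mae(obs, pred):
    return float(np.mean(np.abs(obs - pred)))            # Equation (4)

def r2(obs, pred):
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - np.mean(obs)) ** 2)
    return float(1 - ss_res / ss_tot)                    # Equation (5)

obs = np.array([100.0, 110.0, 95.0, 120.0])   # measured electricity, kWh
pred = np.array([98.0, 112.0, 97.0, 118.0])   # predicted electricity, kWh
print(rmse(obs, pred), mae(obs, pred), r2(obs, pred))
```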

3. Methodology

3.1. Proposed Model

This research proposes an effective method for source domain building selection based on two statistical concepts: the Euclidean distance and the Pearson correlation coefficient. This combined approach enables a more accurate selection of the source domain. The Pearson correlation coefficient captures the linear correlation between the features of the source and target domains, while the Euclidean distance provides insights into their geometric proximity in the Euclidean space. Where the Pearson correlation coefficient might overlook the structure of the data distribution, the Euclidean distance complements it; conversely, where the Euclidean distance might cause errors in high-dimensional data, the Pearson correlation coefficient compensates for this limitation. Additionally, the Pearson correlation coefficient provides an intuitive explanation of linear correlation, whereas the Euclidean distance offers an intuitive explanation of geometric distance. This study conducts fine-tuning TL by selecting buildings from different regions with the same building type and comparable scales. Furthermore, this study visualizes the results of the Euclidean distance and Pearson correlation coefficient analyses. By leveraging visual representations, which align more effectively with human cognitive processing than numerical data alone, this approach facilitates rapid comprehension and enhances the interpretability of the selection outcomes.
The overall procedure is shown in Figure 5. The first step was to select the target and source buildings from the dataset. Since the data was raw, with missing and invalid values in both the source and target data, data processing was needed. The next step was to compute the Euclidean distance and Pearson coefficient. After that, three mature deep learning networks were used to carry out pre-training of the model and TL. Finally, the results were discussed.

3.2. Dataset

The data utilized in this study was obtained from the open dataset Building Data Genome Project 2 (BDG2). BDG2 comprises 3053 energy meters from 1636 buildings in 16 categories located across 19 sites in North America and Europe. The dataset covers a time span of two complete years, 2016 and 2017, with hourly metering frequency. It includes eight types of measurements (chilled water, electricity, gas, hot water, irrigation, solar, steam, and water), amounting to approximately 53.6 million data points. Additionally, the dataset provides corresponding hourly meteorological data files.
The naming convention within this dataset follows the format “SITE ID + Building Type + Building ID”, for example, Robin_education_Zenia. The 19 sites are each named after a different animal; there are 15 building types, including education, office, retail, and public service; and Building IDs are common given names.
In this study, the raw electricity data and weather data for education buildings in the Robin, Bear, and Rat areas were utilized from BDG2. To emulate a target building with limited historical data, only 10% of the Robin_education_Zenia data (1754 data points) was selected as the task domain. Two randomly selected buildings (from the Rat and Bear sites) then served as candidate source domain buildings. The data from 01:00 on 14 February 2016 to 01:00 on 14 March 2016 was used for TL. The building information is shown in Table 2. The selected features for this study included the timestamp, air temperature, dew temperature, sea level pressure, wind direction, and wind speed (Table 3).

3.3. Data Process

Since the data from BDG2 was raw, preprocessing was necessary due to missing values. MATLAB 2023b and SPSS 29 were used to identify missing values, which were then supplemented. Because of the temporal and seasonal nature of the data, interpolation combined with seasonal adjustments was employed for supplementation. After that, the data distribution was examined to identify primary features for detailed analysis, such as outlier detection. Outliers were identified using the standard deviation method: a measured value deviating from the mean by more than three standard deviations was treated as an outlier, and the impact of outliers on the model was mitigated by eliminating them. Min-Max normalization was then applied, and the data was divided into training and testing sets and reshaped into the format the algorithms could read.
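These preprocessing steps could be expressed, for example, with pandas as below; the file and column names are hypothetical stand-ins for the BDG2 data, and the seasonal adjustment is simplified here to time-based interpolation:

```python
import pandas as pd

# Hypothetical export of one building's hourly meter data
df = pd.read_csv("robin_education_zenia.csv", parse_dates=["timestamp"])
df = df.set_index("timestamp").sort_index()

# 1. Fill missing values by time-based interpolation (data is hourly)
df["electricity"] = df["electricity"].interpolate(method="time")

# 2. Remove outliers more than 3 standard deviations from the mean
mu, sigma = df["electricity"].mean(), df["electricity"].std()
df = df[(df["electricity"] - mu).abs() <= 3 * sigma]

# 3. Min-Max normalization to [0, 1]
lo, hi = df["electricity"].min(), df["electricity"].max()
df["electricity"] = (df["electricity"] - lo) / (hi - lo)
```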

3.4. Data Mining

Principal Component Analysis (PCA) is first performed to reduce feature redundancy. Then, the Euclidean distance quantifies feature-space separation between the three candidate buildings. Next, Pearson correlation coefficients identify critical input variables by measuring inter-feature relationships. After analyses of the Euclidean distance and Pearson correlation, the distance between the core inputs in the source and target domains was evaluated.

3.4.1. Component Analysis

Prior to conducting the TL, factor analysis was employed to facilitate feature dimensionality reduction. This study utilized Principal Component Analysis (PCA) to generate the Component Matrix illustrated in Table 4. The matrix shows that the six original features were reduced to three principal components. In Component 1, both air temperature and dew temperature exhibited factor loadings exceeding 0.9, indicating that these two features were strongly loaded on this component. To mitigate redundancy in the feature information, this research retained only air temperature as the representative feature and excluded dew temperature from subsequent analyses.
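A sketch of how such a component matrix can be produced with scikit-learn; the data below is synthetic (dew temperature is constructed to correlate with air temperature, mimicking the redundancy reported in Table 4):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
temp = rng.normal(15, 8, 1000)
df = pd.DataFrame({
    "airTemperature": temp,
    "dewTemperature": temp - rng.normal(3, 1, 1000),  # correlated with air temp
    "seaLvlPressure": rng.normal(1013, 5, 1000),
    "windDirection": rng.uniform(0, 360, 1000),
    "windSpeed": rng.gamma(2, 1.5, 1000),
})  # synthetic stand-in for the preprocessed weather features

X = StandardScaler().fit_transform(df)
pca = PCA(n_components=3).fit(X)
# Loadings: weight of each original feature on each principal component
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
print(pd.DataFrame(loadings, index=df.columns,
                   columns=["PC1", "PC2", "PC3"]).round(2))
# Features loading > 0.9 on the same component carry redundant
# information; only one representative is kept.
```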

3.4.2. Calculate Euclidean Distance

K-means clustering uses the Euclidean distance to calculate the distance between data points and cluster centers, relying on it to allocate data to clusters and to update the cluster centers; in other words, the Euclidean distance is its core metric. Figure 6 shows the steps of the K-means algorithm. In this research, K-means clustering was used to determine the cluster centroids of each source and target building over the six features. After obtaining the cluster centroids, the Euclidean distances among them were visualized to find the appropriate source building.
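A sketch of this step with scikit-learn, using random stand-ins for the three buildings' normalized feature matrices (in the study, these would be the preprocessed BDG2 features):

```python
import numpy as np
from sklearn.cluster import KMeans

def centroids(X, k=3):
    """Cluster one building's normalized features; return the k centroids."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).cluster_centers_

rng = np.random.default_rng(0)
X_robin, X_bear, X_rat = (rng.random((500, 6)) for _ in range(3))  # stand-ins

c_robin = centroids(X_robin)
for name, X_src in [("Bear", X_bear), ("Rat", X_rat)]:
    c_src = centroids(X_src)
    # Pairwise Euclidean distances between target and source centroids
    d = np.linalg.norm(c_robin[:, None, :] - c_src[None, :, :], axis=-1)
    print(name, d.min(axis=1))  # nearest source centroid per target centroid
```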

3.4.3. Calculate Pearson Coefficient

Pearson correlation was computed on the features of all buildings to obtain an affinity matrix, which describes the similarity between data features and was used to determine the weather features most relevant for predicting the electricity of the target building. Afterwards, a second affinity matrix was created from the features of the source domain buildings and the target building, to identify the Pearson coefficients between the source domain weather features and the target domain's most prediction-relevant weather features.
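This cross-building affinity matrix can be sketched with pandas as follows (random stand-in data; in the study, the features were aligned hourly observations from each building):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
cols = ["airTemperature", "seaLvlPressure", "windDirection", "windSpeed"]
tgt = pd.DataFrame(rng.random((720, 4)), columns=cols)  # stand-in: Robin
src = pd.DataFrame(rng.random((720, 4)), columns=cols)  # stand-in: Bear

# Pearson r between every target feature and every source feature (cf. Table 10)
combined = pd.concat({"Robin": tgt, "Bear": src}, axis=1)
affinity = combined.corr(method="pearson").loc["Robin", "Bear"]
print(affinity.round(2))  # rows: Robin features; columns: Bear features
```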

3.4.4. Combining Euclidean Distance and Pearson Coefficient

According to the visualization of the Euclidean distances among the cluster centroids and the Pearson coefficients, a matched source building could be determined. In theory, a source building whose cluster centroids lie close to those of the target building could be considered an appropriate source building; likewise, a source building whose features show a strong relation to the target building's weather features that are most relevant to the prediction task could be considered appropriate.
Once the source building and target building were decided, the data was fed into the data-driven model: the source data was used to pre-train the model, and the TL strategy was then applied to obtain the prediction results for the target building. The predicted and measured results were discussed regarding accuracy and model performance.

3.5. Data-Driven Model Construction

All the networks were trained and executed on an Intel(R) Iris(R) Xe system featuring an 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80 GHz processor, running Windows 10 and utilizing MATLAB 2023b. The hardware resources consisted of a single CPU. Figure 7 shows the three deep learning structures.

3.5.1. LSTM

To prevent overfitting, 60% of the electricity data from the source building was utilized for pre-training the model, while the remaining 40% was reserved for validation. Data normalization and format transformation were implemented to prepare the data for input into the LSTM network structure, facilitating the network’s learning process. Following pre-training, a fine-tuning TL strategy was employed: 70% of the target domain was used as the training set and 30% for validation. The hyperparameters were determined primarily through empirical experimentation, with extensive trials identifying convergence points. The hyperparameters and network structure are defined in Table 5 and Figure 7a.

3.5.2. GRU

The GRU showed an overall trend of reduced time consumption as the number of hidden units decreased: 16 hidden units and above required 15 min or more, while 8 and below required less than 7 min. In this experiment, LSTM took less time than the GRU for the same number of hidden units; although the GRU usually takes less time than LSTM on the same data, the actual time may be affected by various factors such as the specific implementation, hardware configuration, and dataset characteristics. For the training set, the MAE decreased as the number of hidden units increased, and R2 approached 1. For the test set, the MAE and R2 converged at 32 hidden units, where the MBE was also within a lower error range. Considering both accuracy and computational cost, this experiment selected 32 hidden units. The hyperparameters and network structure are defined in Table 6 and Figure 7b. To prevent overfitting, 70% of the electricity data from the source building was utilized as the training dataset, while the remaining 30% was reserved for validation. The hyperparameters were determined primarily through empirical experimentation, with extensive trials identifying convergence points.

3.5.3. CNN

For the CNN structure, two parameters had a significant impact on the results: the kernel size of the convolutional layers and the kernel size of the max pooling layers. Generally, for the same pooling kernel size, the larger the convolutional kernel, the longer the time required; likewise, for the same convolutional kernel size, the larger the pooling kernel, the longer the time required. A pooling kernel size of 2 combined with a convolutional kernel size of 24 offered the best compromise between accuracy and computational cost. The hyperparameters and network structure are defined in Table 7 and Figure 7c. To prevent overfitting, 70% of the electricity data from the target domain was utilized as the training dataset, while the remaining 30% was reserved for validation. The hyperparameters were determined primarily through empirical experimentation, with extensive trials identifying convergence points.
The performance of the CNN in this research was not as good as that of LSTM and the GRU, as its accuracy was lower than that of the other two algorithms. However, its computational cost was much lower, which means CNNs have the potential to handle larger volumes of data.

3.5.4. Evaluation Indicators

The RMSE, MAE, and R2, introduced in Section 2.4, were employed in this study to evaluate the variance between observed and predicted values; the computational cost was also discussed.

4. Results

4.1. Euclidean Distance and Pearson Correlation

Firstly, K-means clustering was used to determine the cluster centroids of the source and target buildings. The clustering results and cluster centroids are shown in Table 8.
Figure 8 uses a 3D plot to represent the three cluster centroids of each building. The locations of the three cluster centroids of Robin and Bear were quite similar, especially in terms of airTemperature, windSpeed, and windDirection. Many factors affect temperature, the most important being latitude and altitude. The similar seaLvlPressure of the two buildings indicates that their altitudes could be similar, and it is possible that the two buildings lie at similar latitudes, resulting in similar temperatures.
A Pearson correlation analysis was conducted for Robin first (Table 9), revealing that electricity consumption was positively correlated with both timestamp and air temperature. In other words, timestamp and air temperature are critical feature predictors for Robin’s electricity consumption, as evidenced by their significant statistical associations. The correlation coefficient matrix is presented in Table 9, with all significance values (p) less than 0.001.
A Pearson correlation analysis was also performed between the features of the two source domain buildings (Rat and Bear) and Robin (Table 10). Specific attention was given to features from the source buildings exhibiting correlations with Robin-timestamp and Robin-airTemperature. The strength of the correlations was visually encoded through fill opacity, where lower opacity denotes stronger correlations and higher opacity indicates weaker correlations; only correlations with coefficients above 0.02 were annotated. Among the Rat features, only Rat-airTemperature demonstrated a strong correlation with Robin-airTemperature. For the Bear features, Bear-airTemperature, Bear-seaLvlPressure, Bear-windDirection, and Bear-windSpeed all exhibited significant correlations with Robin-airTemperature, especially for Bear-airTemperature. To validate the accuracy of the visualization, the Euclidean distances between the cluster centroids of the two source domain buildings (Bear and Rat) and those of the Robin target domain were computed (Table 11). The results demonstrate that all three cluster centroids of Bear exhibit consistently closer proximity to Robin compared to those of Rat.

4.2. Transfer Learning Results

Figure 9 and Table 12 present the results for the 389 data points of Robin predicted by LSTM, the CNN, and the GRU under optimal hyperparameters using Bear as the source domain (Bear-based). As illustrated in Figure 9, both LSTM and the GRU demonstrate higher predictive accuracy than the CNN. For comparative analysis, Rat was also employed as the source domain (Rat-based) to implement TL-based prediction using the LSTM and GRU network architectures.
The MAE is presented using boxplots (Figure 10).
In terms of computational cost, the LSTM network required less time. Considering the balance between computational expense and achieved accuracy, the LSTM network achieved accuracy levels differing by less than 0.4% from the GRU network while incurring a 17% reduction in computational cost. Furthermore, the source domain selected by the method proposed in this study enabled Transfer Learning that, while maintaining the RMSE and R2 within the high-accuracy range, achieved a 65% reduction in original computational costs. This demonstrates that the source domain chosen by the proposed method contributes substantially to reducing computational overhead while preserving prediction accuracy at a comparable level.

4.3. Negative Transfer

This study detects negative transfer in TL by comparing the performance baseline of simple models against the transfer models. We constructed minimal LSTM and GRU models using parameters directly adopted from Table 5 and Table 6. A 10% subset (1754 data points) of the Robin_education_Zenia dataset was selected as the target task domain. The simple model results and the performance baseline established from these results are presented in Table 13.
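The baseline comparison amounts to a simple check, sketched below with illustrative numbers only (the measured values are those in Table 12 and Table 13):

```python
def negative_transfer(baseline, transfer):
    """Flag negative transfer: the TL model does worse than the baseline."""
    return transfer["rmse"] > baseline["rmse"] or transfer["r2"] < baseline["r2"]

# Illustrative values, not the measured results
baseline = {"rmse": 9.0, "r2": 0.85}  # simple model, target data alone
transfer = {"rmse": 6.5, "r2": 0.92}  # TL model pre-trained on the source
print(negative_transfer(baseline, transfer))  # False: transfer helped here
```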

5. Discussion

This section includes two parts: key findings and limitations.

5.1. Key Findings

The key finding of the Euclidean distance and Pearson correlation analysis is that Bear represents a more suitable source domain building than Rat for Transfer Learning to Robin. The supporting evidence is as follows: (1) According to Figure 8, the cluster centroid distributions of Robin and Bear demonstrate greater similarity than those of Robin and Rat. (2) According to Table 10, the Pearson correlation analysis reveals that four features of Bear exhibit statistically significant correlations with Robin, whereas Rat shows only one correlated feature.
Regarding the Bear-based TL results (Table 12), regardless of which of the two network architectures was employed, the resulting RMSE was approximately 6.5 kWh, and the R2 was around 0.92. When Rat-based TL was employed, the RMSE and R2 values were comparable to those of the Bear-based TL results, both demonstrating relatively low prediction errors and accurately capturing the overall trend of the target data.
In terms of computational cost (Table 12), as reported in Section 4.2, the LSTM network required less time: its accuracy differed by less than 0.4% from the GRU network while incurring a 17% reduction in computational cost, and the source domain selected by the proposed method enabled Transfer Learning that reduced original computational costs by 65% while maintaining the RMSE and R2 within the high-accuracy range.
As shown in the MAE boxplots (Figure 10), for the same source domain, the MAE values for the LSTM network were more concentrated and exhibited a lower mean value, indicating that the errors clustered more tightly within a lower range. When Rat was used as the source domain, the distribution of MAE values was slightly wider, and the median value was higher compared to the Bear-based TL results. A more concentrated MAE boxplot signifies a more stable distribution of absolute errors and lower variability; in this respect, Bear-based TL demonstrated a slight advantage.
For detecting negative transfer, we analyzed the results in Table 12 and Table 13. Both the GRU and LSTM models demonstrate superior performance after TL compared to the baseline indicators, indicating no evidence of negative transfer in this study.
The source domain selection method proposed in this study is therefore well suited for applications requiring rapid response times and exhibiting low tolerance for prediction errors.

5.2. Limitation

The method proposed in this paper predicts building electricity consumption using only basic weather data. More sophisticated features were not considered, such as advanced weather parameters (e.g., solar radiation), occupancy-related characteristics (e.g., human traffic), or finer temporal granularity (e.g., holiday indicators). The applicability of this approach warrants further investigation as feature complexity increases.
This study only demonstrates the methodology using cases from open-source databases. Future research will apply this approach to cases with richer building information characteristics and temporal dynamics to validate its generalizability.

6. Conclusions

This study proposes a novel, easy-to-understand, statistics-based visualization method that combines the Euclidean distance and the Pearson correlation coefficient for selecting source buildings in TL for target building electricity prediction. By integrating these two metrics, the proposed approach addresses the limitations of each method when used in isolation: the Pearson correlation coefficient captures the linear correlation between the source and target domains, while the Euclidean distance provides insights into the geometric proximity of data points in the Euclidean space. This dual-metric strategy ensures a more robust and accurate selection of source domain buildings, enhancing the performance of TL models. Furthermore, this study visualizes the results of the Euclidean distance and Pearson correlation coefficient analyses. By leveraging visual representations, which align more effectively with human cognitive processing than numerical data alone, this approach facilitates rapid comprehension and enhances the interpretability of the selection outcomes.
The effectiveness of the proposed method was tested with three data-driven models: Long Short-Term Memory (LSTM), the Gated Recurrent Unit (GRU), and the Convolutional Neural Network (CNN). When employing Bear as the source domain for Transfer Learning (Bear-based TL), the computational cost was reduced by 65% while maintaining the RMSE and R2 within the high-accuracy range. In contrast, using Rat as the source domain resulted in a slightly wider distribution of the MAE and a higher median compared to Bear-based TL. The more concentrated MAE boxplot observed for Bear-based TL indicates a more stable distribution of absolute errors with lower variability, demonstrating a marginal advantage for this approach.
Therefore, it can be concluded that the source building selected by the method proposed in this research significantly reduced computational costs while maintaining high accuracy in the TL results.
The findings highlighted the importance and interpretability of selecting source domain buildings in TL tasks in building energy prediction. The proposed method not only achieved high-accuracy prediction results but also reduced computational load, making it a practical and efficient solution for real-world applications. Meanwhile, this research contributed a valuable framework for improving the interpretability of source domain selection in TL energy-related predictive modeling.
Future work could explore the integration of additional metrics or advanced feature selection techniques to further refine the source domain selection process and extend the applicability of this approach to other domains beyond building energy prediction.

Author Contributions

Conceptualization, C.L. and L.X.; methodology, C.L.; writing—original draft preparation, C.L.; writing—review and editing, S.-H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

“Building Data Genome Project 2”. Available online: https://github.com/buds-lab/the-building-data-genome-project (accessed on 6 December 2023).

Acknowledgments

The authors acknowledge all the people who contributed their efforts to this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
TL	Transfer Learning
LSTM	Long Short-Term Memory
GRU	Gated Recurrent Unit
CNN	Convolutional Neural Network
COP28	United Nations Climate Change Conference 28
AI	Artificial Intelligence
SVM	Support Vector Machine
DS	Source domain
TS	Source domain task
DT	Target domain
TT	Target task
RF	Random Forest
RMSE	Root-mean-square error
MAE	Mean Absolute Error
R2	Coefficient of determination

Appendix A

Table A1. Reference for data-driven method applications for building energy prediction.

Reference | Year | Main Algorithms | Input | Output | Accuracy and Key Findings
[7] | 2021 | LSTM | 1 | LSTM weather data; energy consumption | LSTM weather data can provide more realistic simulations than meteorological stations and EMP files
[8] | 2023 | LSTM and GRU | 8-month heating load | 24 h heating load | RMSE improved by 37.78%
[9] | 2023 | CNN, GRU, LSTM | Time features, 1, solar radiation, and historical data | 1 h electricity load | RMSE value reduced by 13.64–34.55%; an integrated energy consumption prediction model considering spatial
[10] | 2023 | Bidirectional gated recurrent unit, CNN, and the residual connection | 1-year heating and cooling load | 1 h heating and cooling load | R2—90.74%; CVRMSE—19.24%
[11] | 2020 | RF | Building material information, 1 | Heating and cooling loads | RMSE—6.97
[12] | 2023 | ANN, LSTM | Occupant characteristics, travel behavior variables, daily load distribution | Cooling, heating and electric load for different buildings considering EV charging load | R2—0.987
[13] | 2023 | LSTM, XGB | Cooling loads, meteorological data, and contextual information | Cooling loads of five building types | R2—35.68%, 25.36%, 32.44%, 73.91%, and 37.06%
[14] | 2023 | LightGBM, RF, and LSTM | 1, electric equipment power density, building material information | Building thermal load | CVRMSE, R2, and computation time are 22.06%, 0.9267, and 758.8 s
[15] | 2023 | ANN, SVR, RF, XGB, LSTM model, hybrid CNN-LSTM model | Historical electricity load | Daily electricity load | For a building with a low dispersion level, the simple persistence model has satisfactory performance
[16] | 2023 | LSTM, GRU | 24, 12, 6, and 2 h cooling and heating loads | 1 h and 1-day cooling and heating load forecasting of building district energy system | CV-RMSE—14.51% and 11.95% for the 1 h-ahead forecasting of cooling and heating loads
[17] | 2023 | CNN, GRU, LSTM | Electricity demand | 5 min electricity load | RMSE—0.0212
[18] | 2023 | CNN, LSTM, SVM | Electricity consumption | 1-day electricity consumption | Relative error values—5.26; combines the CNN with LSTM to improve performance when weather information is lost
[19] | 2023 | SVR, LSTM | Building cooling demands, 1 | Building cooling demands | RMSE—4.33; MAPE—0.66
[20] | 2023 | CNN | Plug and light load, HVAC electric load, 1, timestamp | Building energy load | MAPE reduced by 7.52%, 4.96%, 6.59%, and 2.34%; an accuracy transfer model based on 1D-CNN
[21] | 2023 | BiLSTM, CNN | 1 h electricity consumption | 1-day and 2-day electricity consumption | MAE—9.20 × 10−4 (1-day) and 9.33 × 10−4 (2-day)
[22] | 2023 | RF | 1, building cold load | 1-day building cold load | RMSE—7.84
[23] | 2022 | LSTM, GRU, BILSTM, BIGRU | Outdoor temperature, relative humidity, and load | 15 min building thermal load | MAPE—0.2%
[24] | 2022 | CNN, LSTM, BILSTM | Cooling loads and heating loads | Cooling loads and heating loads | RMSE—0.00874
[25] | 2022 | CNN, ANN, RF, support vector regression, and gradient boosting tree | Building information | Cooling and heating loads | R2—0.92
[26] | 2022 | ANN, SVM, ELM, RVM, MLR, RF, and BLR | Whole building’s electric energy consumption; hourly from September 1989 to February 1990 | Whole building’s electric energy consumption | MAPE—1.06
[27] | 2022 | RF | 1, personnel flow, historical load | Monthly cooling load | RMSE—2.8735
[28] | 2022 | RF, light GBM | 1, hourly electricity consumption data for five years | Electricity consumption | CVRMSE—12.91
[29] | 2022 | GRU | Thermal load | Thermal load | Predicts thermal load accurately when the meteorological parameters are missing; RMSE—14.63%
[30] | 2022 | GRU, RNN, CNN | Electricity load | Electricity load | RMSE—17.282
[31] | 2021 | RNN, LSTM | Cooling electricity data | Short-term (1 h ahead) and long-term (1 day ahead) cooling load | RMSE—37.45; R2—0.9431
[32] | 2021 | LSTM | Short-term heating load, building information | Short-term heating load | CVRMSE—18.53
[33] | 2021 | LSTM, RNN, CNN | 1, cooling load | Cooling load | CVRMSE—11.5
[34] | 2021 | LSTM, SVM, multilayer perceptron | Electric load | Day-ahead electric load | RMSE—10.66
[35] | 2021 | LSTM, RNN, RF | Electricity load | Short-term electricity load | MAE—4.80
[36] | 2021 | ANN, SVM, RF | 1, short-term heating load | Short-term heating load | R2—0.90
[37] | 2021 | ANN, RF, and SVM | 1, building cooling load | Building cooling load | MAE—9.83
[38] | 2020 | ANN, SVR, LSTM | 1, heating, cooling, lighting loads, and BIPV power production | Heating, cooling, lighting loads, and BIPV power production | MAPE—9.01
[39] | 2020 | ANN, LSTM, RF, SVM, XGBoost | 1, building information, daily electricity load | Daily electricity load | MAPE—10.69
[40] | 2020 | LSTM, GRU | Occupant data, plug load, time | Electric loads | RMSE—0.0741
[41] | 2020 | LSTM, CNN | 1, scheduled related parameters and historical loads | Short-term electrical load forecasting | RMSE—6.24
[42] | 2020 | RF, SVM, ANN | 1, hourly electricity consumption | Daily electricity load | MAPE—20
1: temperature, RH, dew point, pressure, wind direction, wind speed, and solar radiation.

References

1. United Nations Environment Programme; Global Alliance for Buildings and Construction. 2023 Global Status Report for Buildings and Construction; 2024. Available online: https://max.book118.com/html/2024/0704/7003042110006130.shtm (accessed on 14 March 2025).
2. Wolf, J. 28 Countries Sign Buildings Breakthrough Agreement at COP28; 2023. Available online: https://www.greenbuildingadvisor.com/article/28-countries-sign-buildings-breakthrough-agreement-at-cop28 (accessed on 14 March 2025).
3. Ali, D.M.T.E.; Motuzienė, V.; Džiugaitė-Tumėnienė, R. AI-Driven Innovations in Building Energy Management Systems: A Review of Potential Applications and Energy Savings. Energies 2024, 17, 4277.
4. Olu-Ajayi, R.; Alaka, H. Building Energy Consumption Prediction Using Deep Learning. J. Build. Eng. 2022, 45, 103406.
5. Sun, Y.; Haghighat, F.; Fung, B.C.M. A Review of The-State-of-the-Art in Data-Driven Approaches for Building Energy Prediction. Energy Build. 2020, 221, 110022.
6. Janssen, M.; Van Der Voort, H.; Wahyudi, A. Factors Influencing Big Data Decision-Making Quality. J. Bus. Res. 2017, 70, 338–345.
7. Zhang, M.; Zhang, X.; Guo, S.; Xu, X.; Chen, J.; Wang, W. Urban micro-climate prediction through long short-term memory network with long-term monitoring for on-site building energy estimation. Sustain. Cities Soc. 2021, 74, 103227.
8. Zhou, Y.; Li, X.; Liu, Y.; Wei, R. Transfer learning-based adaptive recursive neural network for short-term non-stationary building heating load prediction. J. Build. Eng. 2023, 76, 107271.
9. Cao, W.; Yu, J.; Chao, M.; Wang, J.; Yang, S.; Zhou, M.; Wang, M. Short-term energy consumption prediction method for educational buildings based on model integration. Energy 2023, 283, 128580.
10. Wang, L.; Xie, D.; Zhou, L.; Zhang, Z. Application of the hybrid neural network model for energy consumption prediction of office buildings. J. Build. Eng. 2023, 72, 106503.
11. Seyedzadeh, S.; Rahimian, F.P.; Oliver, S.; Glesk, I.; Kumar, B. Data driven model improved by multi-objective optimisation for prediction of building energy loads. Autom. Constr. 2020, 116, 103188.
12. Zhang, X.; Kong, X.; Yan, R.; Liu, Y.; Xia, P.; Sun, X.; Zeng, R.; Li, H. Data-driven cooling, heating and electrical load prediction for building integrated with electric vehicles considering occupant travel behavior. Energy 2023, 264, 126274.
13. Lu, C.; Gu, J.; Lu, W. An improved attention-based deep learning approach for robust cooling load prediction: Public building cases under diverse occupancy schedules. Sustain. Cities Soc. 2023, 96, 104679.
14. Chen, Y.; Ye, Y.; Liu, J.; Zhang, L.; Li, W.; Mohtaram, S. Machine Learning Approach to Predict Building Thermal Load Considering Feature Variable Dimensions: An Office Building Case Study. Buildings 2023, 13, 312.
15. Hu, M.; Stephen, B.; Browell, J.; Haben, S.; Wallom, D.C.H. Impacts of building load dispersion level on its load forecasting accuracy: Data or algorithms? Importance of reliability and interpretability in machine learning. Energy Build. 2023, 285, 112896.
16. Yu, H.; Zhong, F.; Du, Y.; Xie, X.E.; Wang, Y.; Zhang, X.; Huang, S. Short-term cooling and heating loads forecasting of building district energy system based on data-driven models. Energy Build. 2023, 298, 113513.
17. Chiu, M.-C.; Hsu, H.-W.; Chen, K.-S.; Wen, C.-Y. A hybrid CNN-GRU based probabilistic model for load forecasting from individual household to commercial building. Energy Rep. 2023, 9, 94–105.
18. Chen, P.; Chen, L. Prediction method of intelligent building electricity consumption based on deep learning. Evol. Intell. 2023, 16, 1637–1644.
19. Liu, H.; Yu, J.; Dai, J.; Zhao, A.; Wang, M.; Zhou, M. Hybrid prediction model for cold load in large public buildings based on mean residual feedback and improved SVR. Energy Build. 2023, 294, 113229.
20. Zhang, Y.; Zhou, Z.; Du, Y.; Shen, J.; Li, Z.; Yuan, J. A data transfer method based on one dimensional convolutional neural network for cross-building load prediction. Energy 2023, 277, 127645.
21. Sekhar, C.; Dahiya, R. Robust framework based on hybrid deep learning approach for short term load forecasting of building electricity demand. Energy 2023, 268, 126660.
22. Zou, Q.; Wang, L.; Xue, H.; Feng, X.; Qiao, B.; Dong, Y. Random Forest Algorithm Based Dynamic Training Set for Cold Load Prediction. In Proceedings of the 2023 8th Asia Conference on Power and Electrical Engineering (ACPEE), Tianjin, China, 14–16 April 2023; pp. 165–172.
23. Lv, R.; Yuan, Z.; Lei, B.; Zheng, J.; Luo, X. Building thermal load prediction using deep learning method considering time-shifting correlation in feature variables. J. Build. Eng. 2022, 61, 105316.
24. Kavitha, R.J.; Thiagarajan, C.; Priya, P.I.; Anand, A.V.; Al-Ammar, E.A.; Santhamoorthy, M.; Chandramohan, P. Improved Harris Hawks Optimization with Hybrid Deep Learning Based Heating and Cooling Load Prediction on residential buildings. Chemosphere 2022, 309, 136525.
25. Lu, J.; Zhang, C.; Li, J.; Zhao, Y.; Qiu, W.; Li, T.; Zhou, K.; He, J. Graph convolutional networks-based method for estimating design loads of complex buildings in the preliminary design stage. Appl. Energy 2022, 322, 119478.
26. Li, K.; Zhang, J.; Chen, X.; Xue, W. Building's hourly electrical load prediction based on data clustering and ensemble learning strategy. Energy Build. 2022, 261, 111943.
27. Gao, Z.; Yu, J.; Zhao, A.; Hu, Q.; Yang, S. A hybrid method of cooling load forecasting for large commercial building based on extreme learning machine. Energy 2022, 238, 122073.
28. Moon, J.; Rho, S.; Baik, S.W. Toward explainable electrical load forecasting of buildings: A comparative study of tree-based ensemble methods with Shapley values. Sustain. Energy Technol. Assess. 2022, 54, 102888.
29. Ma, Z.; Wang, J.; Dong, F.; Wang, R.; Deng, H.; Feng, Y. A decomposition-ensemble prediction method of building thermal load with enhanced electrical load information. J. Build. Eng. 2022, 61, 105330.
30. Liu, R.; Chen, T.; Sun, G.; Muyeen, S.M.; Lin, S.; Mi, Y. Short-term probabilistic building load forecasting based on feature integrated artificial intelligent approach. Electr. Power Syst. Res. 2022, 206, 107802.
31. Chalapathy, R.; Khoa, N.L.D.; Sethuvenkatraman, S. Comparing multi-step ahead building cooling load prediction using shallow machine learning and deep learning models. Sustain. Energy Grids Netw. 2021, 28, 100543.
32. Lu, Y.; Tian, Z.; Zhang, Q.; Zhou, R.; Chu, C. Data augmentation strategy for short-term heating load prediction model of residential building. Energy 2021, 235, 121328.
33. Sha, H.; Moujahed, M.; Qi, D. Machine learning-based cooling load prediction and optimal control for mechanical ventilative cooling in high-rise buildings. Energy Build. 2021, 242, 110980.
34. Jeong, D.; Park, C.; Ko, Y.M. Short-term electric load forecasting for buildings using logistic mixture vector autoregressive model with curve registration. Appl. Energy 2021, 282, 116249.
35. Bellahsen, A.; Dagdougui, H. Aggregated short-term load forecasting for heterogeneous buildings using machine learning with peak estimation. Energy Build. 2021, 237, 110742.
36. Zhou, Y.; Liu, Y.; Wang, D.; Liu, X. Comparison of machine-learning models for predicting short-term building heating load using operational parameters. Energy Build. 2021, 253, 111505.
37. Zhang, C.; Li, J.; Zhao, Y.; Li, T.; Chen, Q.; Zhang, X.; Qiu, W. Problem of data imbalance in building energy load prediction: Concept, influence, and solution. Appl. Energy 2021, 297, 117139.
38. Luo, X.J.; Oyedele, L.O.; Ajayi, A.O.; Akinade, O.O. Comparative study of machine learning-based multi-objective prediction framework for multiple building energy loads. Sustain. Cities Soc. 2020, 61, 102283.
39. Cao, L.; Li, Y.; Zhang, J.; Jiang, Y.; Han, Y.; Wei, J. Electrical load prediction of healthcare buildings through single and ensemble learning. Energy Rep. 2020, 6, 2751–2767.
40. Das, A.; Annaqeeb, M.K.; Azar, E.; Novakovic, V.; Kjærgaard, M.B. Occupant-centric miscellaneous electric loads prediction in buildings using state-of-the-art deep learning methods. Appl. Energy 2020, 269, 115135.
41. Chitalia, G.; Pipattanasomporn, M.; Garg, V.; Rahman, S. Robust short-term electrical load forecasting framework for commercial buildings using deep recurrent neural networks. Appl. Energy 2020, 278, 115410.
42. Walker, S.; Khan, W.; Katic, K.; Maassen, W.; Zeiler, W. Accuracy of different machine learning algorithms and added-value of predicting aggregated-level energy performance of commercial buildings. Energy Build. 2020, 209, 109705.
43. Yuan, Y.; Chen, Z.; Wang, Z.; Sun, Y.; Chen, Y. Attention mechanism-based transfer learning model for day-ahead energy demand forecasting of shopping mall buildings. Energy 2023, 270, 126878.
44. Wang, Z.; Hong, T.; Piette, M.A. Building thermal load prediction through shallow machine learning and deep learning. Appl. Energy 2020, 263, 114683.
45. Jebli, I.; Belouadha, F.-Z.; Kabbaj, M.I.; Tilioua, A. Prediction of Solar Energy Guided by Pearson Correlation Using Machine Learning. Energy 2021, 224, 120109.
46. Jung, S.-M.; Park, S.; Jung, S.-W.; Hwang, E. Monthly Electric Load Forecasting Using Transfer Learning for Smart Cities. Sustainability 2020, 12, 6364.
47. Peng, C.; Tao, Y.; Chen, Z.; Zhang, Y.; Sun, X. Multi-source transfer learning guided ensemble LSTM for building multi-load forecasting. Expert Syst. Appl. 2022, 202.
48. Iglesias, F.; Kastner, W. Analysis of Similarity Measures in Times Series Clustering for the Discovery of Building Energy Patterns. Energies 2013, 6, 579–597.
49. Bozinovski, S. Reminder of the first paper on transfer learning in neural networks, 1976. Informatica 2020, 44.
50. Wu, X.; Khorshidi, H.A.; Aickelin, U.; Edib, Z.; Peate, M. Transfer Learning to Enhance Amenorrhea Status Prediction in Cancer and Fertility Data with Missing Values. In Artificial Intelligence; Productivity Press: New York, NY, USA, 2020; pp. 233–260. ISBN 978-0-429-31741-5.
51. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
52. Weiss, K.; Khoshgoftaar, T.M.; Wang, D. A survey of transfer learning. J. Big Data 2016, 3, 1345–1459.
53. Wu, Q.; Wu, H.; Zhou, X.; Tan, M.; Xu, Y.; Yan, Y.; Hao, T. Online Transfer Learning with Multiple Homogeneous or Heterogeneous Sources. IEEE Trans. Knowl. Data Eng. 2017, 29, 1494–1507.
54. Santosh, T.; Ichim, O.; Grabmair, M. Zero-shot transfer of article-aware legal outcome classification for European Court of human rights cases. arXiv 2023, arXiv:2302.00609.
55. Gao, Y.; Ruan, Y.; Fang, C.; Yin, S. Deep learning and transfer learning models of energy consumption forecasting for a building with poor information data. Energy Build. 2020, 223, 110156.
56. Zhang, L.; Wen, J.; Li, Y.; Chen, J.; Ye, Y.; Fu, Y.; Livingood, W. A review of machine learning in building load prediction. Appl. Energy 2021, 285, 116452.
57. Ali, S.; Abuhmed, T.; El-Sappagh, S.; Muhammad, K.; Alonso-Moral, J.M.; Confalonieri, R.; Guidotti, R.; Del Ser, J.; Díaz-Rodríguez, N.; Herrera, F. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence. Inf. Fusion 2023, 99, 101805.
58. Li, L.; Su, X.; Bi, X.; Lu, Y.; Sun, X. A novel Transformer-based network forecasting method for building cooling loads. Energy Build. 2023, 296, 113409.
59. Thilker, C.A.; Bacher, P.; Bergsteinsson, H.G.; Junker, R.G.; Cali, D.; Madsen, H. Non-linear grey-box modelling for heat dynamics of buildings. Energy Build. 2021, 252, 111457.
60. Harb, H.; Boyanov, N.; Hernandez, L.; Streblow, R.; Müller, D. Development and validation of grey-box models for forecasting the thermal response of occupied buildings. Energy Build. 2016, 117, 199–207.
61. Cho, K. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
62. Kiranyaz, S.; Avci, O.; Abdeljaber, O.; Ince, T.; Gabbouj, M.; Inman, D.J. 1D convolutional neural networks and applications: A survey. Mech. Syst. Signal Process. 2021, 151, 107398.
Figure 1. A general definition of transfer learning.
Figure 2. A statistical analysis of 86 articles about machine learning for building energy consumption prediction published between 2020 and 2023.
Figure 3. Basic LSTM storage unit.
Figure 4. GRU basic unit.
Figure 5. Research method and K-means algorithm.
Figure 6. K-means clusters; blue/yellow/green dots: cluster centroids; blue/yellow/green circles: cluster extents.
Figure 7. Pre-training network structures. (a) Pre-training LSTM network structure; (b) pre-training GRU network structure; (c) pre-training CNN structure.
Figure 8. Cluster centroids in 6D. (a) Cluster centroid for Group 1; (b) cluster centroid for Group 2; (c) cluster centroid for Group 3. Green dot: ZX projection; blue dot: YZ projection; cyan dot: XY projection. According to the affinity matrix, the correlation between seaLvlPressure and the predicted results was low, so this feature was ignored.
Figure 9. TL results.
Figure 10. MAE boxplots.
Table 1. The difference between feature extraction and fine-tuning.

| | Feature Extraction | Fine-Tuning |
|---|---|---|
| Definition | Keep the feature-extraction layers of the pre-trained model unchanged and train only the newly added output layer. This leverages the generic features learned by the pre-trained model on large-scale datasets while customizing the model through the new output layer. Usually, this involves freezing the pre-trained weights and extracting useful features. | Use the entire pre-trained model as the initial model and then train the whole model on the new dataset. All parameters, including the pre-trained weights and the new output layer, are relearned; usually only the earliest layers are kept frozen. |
| Workflow | 1. Load the pre-trained model. 2. Freeze the feature-extraction layers. 3. Add a new output layer. 4. Train only the newly added output layer. | 1. Load the pre-trained model. 2. Modify the output layer to suit the new task. 3. Load the new dataset. 4. Train the entire model. |
| Advantages | Usually requires less data and computing resources. | Enables the model to fully adapt to the data distribution of the new task. |
| Disadvantages | Prone to overfitting. | Requires a significant amount of new data and computational cost. |
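To make the contrast in Table 1 concrete, the following is a minimal sketch of both strategies in PyTorch; the single-layer LSTM regressor, its dimensions, and the checkpoint name are illustrative assumptions, not this study's exact models.

```python
# A minimal sketch of the two transfer strategies in Table 1 (assumed
# PyTorch LSTM regressor; dimensions and file names are placeholders).
import torch
import torch.nn as nn

class LoadPredictor(nn.Module):
    def __init__(self, n_features: int = 5, hidden: int = 8):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)  # feature extractor
        self.head = nn.Linear(hidden, 1)                           # output layer (kWh)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])

# Feature extraction: freeze the pre-trained extractor, retrain only a new head.
feat_model = LoadPredictor()
# feat_model.load_state_dict(torch.load("source_building.pt"))  # source weights
for p in feat_model.lstm.parameters():
    p.requires_grad = False
feat_model.head = nn.Linear(8, 1)  # new output layer for the target building
opt_feat = torch.optim.Adam(feat_model.head.parameters(), lr=1e-3)

# Fine-tuning: start from the same weights but keep every parameter trainable.
ft_model = LoadPredictor()
# ft_model.load_state_dict(torch.load("source_building.pt"))
opt_ft = torch.optim.Adam(ft_model.parameters(), lr=1e-3)  # whole model re-trained
```

Either optimizer is then used in an ordinary training loop on the target building's data; only the set of trainable parameters differs between the two strategies.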
Table 2. Research building information.

| Building_Id | Space Usage | Area (m²) | Location | Electricity (kWh) |
|---|---|---|---|---|
| Rat_education_Lynn | Education-K-12 School | 7785.3 | US/Eastern | 29–260 |
| Bear_education_Pattie | Education | 8032.9 | US/Pacific | 92–329 |
| Robin_education_Zenia (Target) | Education-College Laboratory | 6337.0 | Europe/London | 52–466 |
Table 3. Research inputs and units.

| Input | timestamp | airTemperature | seaLvlPressure | windDirection | windSpeed | Electricity |
|---|---|---|---|---|---|---|
| Unit | Serial value | °C | hPa | ° | m/s | kWh |
Table 4. Component matrix.

| | Component 1 | Component 2 | Component 3 |
|---|---|---|---|
| timestamp | 0.200 | −0.101 | 0.949 |
| airTemperature | 0.916 | 0.148 | 0.010 |
| dewTemperature | 0.962 | 0.034 | −0.023 |
| seaLvlPressure | −0.439 | −0.596 | 0.204 |
| windDirection | −0.306 | 0.719 | 0.233 |
| windSpeed | −0.241 | 0.789 | 0.063 |
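A component matrix of this form is standard principal component analysis (PCA) output. The sketch below shows one way such loadings can be computed, assuming PCA on standardized inputs with SPSS-style scaling of the eigenvectors by the square roots of the eigenvalues; the random data merely stands in for the actual hourly series.

```python
# A sketch of deriving a component matrix like Table 4 via PCA.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

cols = ["timestamp", "airTemperature", "dewTemperature",
        "seaLvlPressure", "windDirection", "windSpeed"]
# Placeholder data standing in for the hourly inputs of Table 3.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, len(cols))), columns=cols)

z = StandardScaler().fit_transform(df)   # PCA on standardized inputs
pca = PCA(n_components=3).fit(z)

# SPSS-style loadings: eigenvectors scaled by sqrt(eigenvalues), so each
# entry approximates the correlation between a variable and a component.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
print(pd.DataFrame(loadings, index=cols, columns=["1", "2", "3"]))
```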
Table 5. Hyperparameters for the pre-training LSTM model.

| Parameter | Value |
|---|---|
| Solver | Adam |
| Initial learning rate | 1 × 10−3 |
| Batch size | 128 |
| Epochs | 30 |
| Momentum | 0.9 |
| LearnRateSchedule | piecewise |
| LearnRateDropFactor | 0.1 |
| LearnRateDropPeriod | 400 |
| Hidden units | 8 |
Table 6. Hyperparameters for the pre-training GRU model.

| Parameter | Value |
|---|---|
| Solver | Adam |
| Initial learning rate | 1 × 10−3 |
| Batch size | 128 |
| Epochs | 30 |
| Verbose | False |
| LearnRateSchedule | piecewise |
| LearnRateDropFactor | 0.1 |
| LearnRateDropPeriod | 400 |
| Hidden units | 32 |
Table 7. Hyperparameters for the pre-training CNN model.

| Parameter | Value |
|---|---|
| Solver | Adam |
| Initial learning rate | 0.005 |
| Batch size | 128 |
| Epochs | 30 |
| Verbose | False |
| Kernel | [57] |
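The option names in Tables 5–7 (piecewise schedule, drop factor, drop period) resemble MATLAB-style training options. As a rough equivalent, the sketch below wires Table 5's LSTM settings into a PyTorch training loop; the synthetic dataset and the per-step reading of LearnRateDropPeriod are assumptions.

```python
# A sketch of a training setup matching Table 5's LSTM hyperparameters.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class LSTMRegressor(nn.Module):
    def __init__(self, n_features: int = 5, hidden: int = 8):  # 8 hidden units
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])

# Placeholder sequences standing in for the source building's hourly records.
X = torch.randn(1024, 24, 5)   # (samples, time steps, features)
y = torch.randn(1024, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=128, shuffle=True)  # batch size 128

model = LSTMRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # solver Adam, initial LR 1e-3
# Piecewise schedule: multiply the LR by 0.1 every 400 optimizer steps
# (LearnRateDropFactor 0.1, LearnRateDropPeriod 400, interpreted per step here).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=400, gamma=0.1)
criterion = nn.MSELoss()

for epoch in range(30):        # 30 epochs
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        scheduler.step()
```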
Table 8. K-means clustering results and cluster centroids for the source and target buildings.

| Building | Group | timestamp | airTemperature | seaLvlPressure | windDirection | windSpeed | Electricity |
|---|---|---|---|---|---|---|---|
| Rat_Edu_Lynn | 1 | 42,402.25 | 1.6 | 1018.7 | 318 | 5.7 | 166.37 |
| Rat_Edu_Lynn | 2 | 42,409.84 | 7 | 1016.7 | 187 | 3.9 | 144.82 |
| Rat_Edu_Lynn | 3 | 42,408.7 | 14.6 | 1019.8 | 30 | 2.4 | 147.04 |
| Bear_Edu_Pattie | 1 | 42,735.66 | 12.6 | 1017.4 | 97 | 2.8 | 170.6491 |
| Bear_Edu_Pattie | 2 | 42,520.93 | 15.8 | 1016.6 | 239 | 4.3 | 149.309 |
| Bear_Edu_Pattie | 3 | 42,952.07 | 15.8 | 1016.3 | 230 | 4 | 198.1281 |
| Rob_Edu_Zenia | 1 | 42,752.38 | 9.1 | 1019.4 | 179 | 3.7 | 219.7798 |
| Rob_Edu_Zenia | 2 | 42,502.41 | 12.5 | 1014.1 | 204 | 4.2 | 155.1575 |
| Rob_Edu_Zenia | 3 | 42,984.45 | 13.9 | 1015.3 | 233 | 4.2 | 234.195 |
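A minimal sketch of the clustering step behind Table 8: standardize one building's records, fit K-means with three clusters, and map the centroids back to physical units. The feature columns follow Table 3; the random data is a placeholder for the building's hourly series.

```python
# A sketch of the K-means step behind Table 8 (placeholder data).
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

cols = ["timestamp", "airTemperature", "seaLvlPressure",
        "windDirection", "windSpeed", "Electricity"]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(2000, len(cols))), columns=cols)

scaler = StandardScaler()
z = scaler.fit_transform(df)   # cluster on standardized features

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(z)
# Centroids mapped back to physical units, comparable to the rows of Table 8.
centroids = scaler.inverse_transform(km.cluster_centers_)
print(pd.DataFrame(centroids, columns=cols))
```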
Table 9. Affinity matrix for Robin.

| Robin_education_Zenia | Robin-timestamp | Robin-airTemperature | Robin-seaLvlPressure | Robin-windDirection | Robin-windSpeed |
|---|---|---|---|---|---|
| Pearson correlation | 0.600 ** | 0.329 ** | 0.085 ** | 0.035 ** | 0.072 ** |
| Sig. (2-tailed) | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| N | 17,544 | 17,544 | 17,544 | 17,544 | 17,544 |

** Correlation is significant at the 0.01 level (2-tailed).
Table 10. Affinity matrix for Robin, Rat, and Bear. Each cell gives the Pearson correlation coefficient with its two-tailed significance in parentheses.

| | Robin-timestamp | Robin-airTemperature |
|---|---|---|
| Rat-airTemperature | 0.113 ** (<0.001) | 0.754 ** (<0.001) |
| Rat-seaLvlPressure | 0.039 ** (<0.001) | −0.114 ** (<0.001) |
| Rat-windDirection | 0.01 (0.47) | −0.055 ** (0.00) |
| Rat-windSpeed | −0.060 ** (<0.001) | 0.01 (0.32) |
| Bear-airTemperature | 0.01 (0.06) | 0.619 ** (<0.001) |
| Bear-seaLvlPressure | −0.045 ** (<0.001) | −0.346 ** (<0.001) |
| Bear-windDirection | −0.028 ** (<0.001) | 0.392 ** (<0.001) |
| Bear-windSpeed | −0.077 ** (<0.001) | 0.305 ** (<0.001) |

** Correlation is significant at the 0.01 level (2-tailed). In the source figure, the strength of the correlations was visually encoded through fill opacity, where lower opacity denotes stronger correlations and higher opacity indicates weaker correlations; only correlations with coefficients above 0.02 were annotated.
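The affinity entries in Tables 9 and 10 are plain Pearson correlations with two-tailed significance. A small sketch, with placeholder series standing in for the hourly-aligned data (N = 17,544 in Table 9):

```python
# A sketch of the Pearson affinity computation behind Tables 9 and 10.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 17544  # sample size reported in Table 9
robin_electricity = rng.normal(size=n)  # placeholder for the target series
features = {
    "Robin-timestamp": rng.normal(size=n),
    "Robin-airTemperature": rng.normal(size=n),
    "Rat-airTemperature": rng.normal(size=n),
    "Bear-airTemperature": rng.normal(size=n),
}
for name, series in features.items():
    r, p = pearsonr(robin_electricity, series)  # r and two-tailed p-value
    print(f"{name}: r = {r:.3f}, p = {p:.3g}")
```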
Table 11. Euclidean distance matrix of Robin.

| | Group 1 | Group 2 | Group 3 |
|---|---|---|---|
| Rat | 380.559 | 583.382 | 616.758 |
| Bear | 97.131 | 471.277 | 48.610 |
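Each entry in Table 11 is the Euclidean distance between the target's cluster centroid and a source building's centroid across the six features of Table 8. The sketch below recomputes the Group 1 column from the Table 8 centroids and reproduces Table 11's 380.559 (Rat) and 97.131 (Bear):

```python
# Euclidean distances between the Group 1 centroids from Table 8.
import numpy as np

# Centroid order: timestamp, airTemperature, seaLvlPressure,
# windDirection, windSpeed, Electricity (Table 8, Group 1).
robin = np.array([42752.38, 9.1, 1019.4, 179, 3.7, 219.7798])
rat   = np.array([42402.25, 1.6, 1018.7, 318, 5.7, 166.37])
bear  = np.array([42735.66, 12.6, 1017.4, 97, 2.8, 170.6491])

print(f"Rat:  {np.linalg.norm(robin - rat):.3f}")   # 380.559, as in Table 11
print(f"Bear: {np.linalg.norm(robin - bear):.3f}")  # 97.131, as in Table 11
```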
Table 12. Evaluation indicators of the results.

| | Bear-GRU | Bear-LSTM | Rat-GRU | Rat-LSTM | Bear-CNN |
|---|---|---|---|---|---|
| RMSE | 6.50 | 6.21 | 6.47 | 6.19 | 33.21 |
| R2 | 0.926 | 0.922 | 0.928 | 0.923 | 0.7 |
| Computation cost (s) | 29 | 24 | 83 | 71 | 8 |
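For reference, the indicators in Tables 12 and 13 (RMSE in kWh and R2) can be computed as below; a sketch on placeholder prediction arrays using scikit-learn, not the study's actual outputs.

```python
# A sketch of the evaluation indicators in Tables 12 and 13.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([120.0, 150.0, 165.0, 140.0])  # measured electricity (kWh), assumed
y_pred = np.array([118.0, 155.0, 160.0, 143.0])  # model predictions, assumed

rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # root mean squared error
r2 = r2_score(y_true, y_pred)                       # coefficient of determination
print(f"RMSE = {rmse:.2f} kWh, R2 = {r2:.3f}")
```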
Table 13. Negative transfer performance and baseline.

| | GRU | LSTM | Base-GRU | Base-LSTM |
|---|---|---|---|---|
| RMSE | 6.75 | 7.73 | <6.75 | <7.73 |
| R2 | 0.91 | 0.90 | >0.91 | >0.90 |
| Computation cost (s) | 82 | 69 | – | – |