Article | Open Access | 4 July 2025

Limited Data Availability in Building Energy Consumption Prediction: A Low-Rank Transfer Learning with Attention-Enhanced Temporal Convolution Network

1 School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
2 Jiangsu Province Engineering Research Center of Construction Carbon Neutral Technology, Suzhou University of Science and Technology, Suzhou 215009, China
3 Jiangsu Province Key Laboratory of Intelligent Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, China
4 School of Architecture and Urban Planning, Suzhou University of Science and Technology, Suzhou 215009, China
This article belongs to the Special Issue AI Applications in Construction and Infrastructure

Abstract

Building energy consumption prediction (BECP) is the essential foundation for attaining energy efficiency in buildings, contributing significantly to tackling global energy challenges and facilitating energy sustainability. However, while data-driven methods have emerged as a crucial approach to this complex problem, the limited availability of data presents a significant challenge to model training. To address this challenge, this paper presents an innovative method, named Low-Rank Transfer Learning with Attention-Enhanced Temporal Convolution Network (LRTL-AtTCN). LRTL-AtTCN integrates the attention mechanism with the temporal convolutional network (TCN), improving its ability to extract global and local dependencies. Moreover, LRTL-AtTCN incorporates low-rank decomposition, reducing the number of parameters updated when transferring knowledge from similar buildings, which achieves better transfer performance when data are limited. Experimentally, we conduct a comprehensive evaluation across three forecasting horizons: 1 week, 2 weeks, and 1 month. Compared to the horizon-matched baseline, LRTL-AtTCN cuts the MAE by 91.2%, 30.2%, and 26.4%, respectively, and lifts the 1-month R2 from 0.8188 to 0.9286. On every horizon it also outperforms state-of-the-art transfer-learning methods, confirming its strong generalization and transfer capability in BECP.

1. Introduction

Nowadays, building energy consumption represents roughly 34% of global final energy use, while carbon dioxide emissions associated with buildings account for 37% of total global emissions [1]. The primary objective of enhancing building energy efficiency is to optimize energy utilization and reduce unnecessary consumption, especially for large public buildings. Generally, achieving energy efficiency in buildings mainly depends on the implementation of effective building management strategies [2], while the development of these strategies is contingent upon building energy consumption prediction (BECP), which plays a pivotal role in understanding and anticipating dynamic changes in energy consumption patterns and guiding energy management decisions. Effective BECP is a method that analyzes and predicts the dynamics of energy use and assists managers in anticipating trends in energy demand and optimizing energy allocation, significantly reducing the overall level of energy consumption in buildings [3]. Personalized control within BECP facilitates the intelligent and precise operation of energy-consuming equipment, thereby ensuring the stable functioning of building systems while maintaining interior comfort. Moreover, BECP functions as a valuable auxiliary tool in the realms of intelligent control [4], demand-side management strategies [5], and fault detection and diagnosis [6], exhibiting a diverse range of applications across these domains.
There are two main kinds of methods for BECP: physics-based modeling methods and data-driven methods. Physics-based modeling methods involve creating explicit models based on parameters that reflect the intrinsic characteristics of the building, enabling the prediction of its energy consumption [2]. However, the construction of physical models for buildings necessitates a substantial set of detailed operational parameters, including thermophysical characteristics, energy equipment specifications, and system settings such as occupancy schedules and air conditioning zoning, while also relying heavily on extensive measurement data to ensure model accuracy [7]. Consequently, the implementation of physics-based modeling methods faces significant challenges in achieving accurate predictions. In contrast, data-driven methods utilize machine learning to predict building energy consumption by developing models based on extensive historical data [2], resulting in substantial advancements in the field of BECP. The most common data-driven methods can be categorized into two primary kinds: statistical learning methods and deep learning methods. Statistical learning applied in BECP encompasses a variety of techniques, including Support Vector Regression (SVR) [8], Multiple Linear Regression (MLR) [9], Random Forest (RF) [10], and Extreme Gradient Boosting (XGBoost) [11]. These methods provide several advantages, including low computational resource requirements, rapid training and prediction speeds, and high interpretability of the models. Meanwhile, deep learning methods employed in BECP can be categorized into several types: Recurrent Neural Networks (RNNs) [12], such as LSTM [13] and GRU [14], are particularly effective for processing sequential data and adept at capturing both long- and short-term dependencies; Convolutional Neural Networks (CNNs) [15], such as the Temporal Convolutional Network (TCN) [16], excel at extracting local hidden features and managing short-term fluctuations; and Attention-based methods [17] are proficient in processing long sequences of data and capturing global dependencies. Owing to its powerful feature-learning ability, deep learning has achieved remarkable results and has been widely applied in BECP. However, like statistical learning methods, deep learning relies heavily on substantial historical data. In scenarios where such data is limited (e.g., newly constructed buildings or buildings without a comprehensive energy monitoring system), this limitation poses a significant barrier to the effective implementation of BECP.
The challenge of limited data availability has become a central focus in research related to BECP. To address this problem, several perspectives can be considered. Data augmentation [18] and synthetic data generation [19] have proven to be viable strategies for enhancing the quality and diversity of datasets, thereby improving the models’ prediction performance and addressing the issue of limited data availability. However, these methods often rely on techniques like noise injection and data recombination, which may introduce inconsistencies and fail to accurately represent real-world conditions, thus hindering the model’s applicability in practical scenarios. Moreover, transfer learning [20] is an effective method for addressing the issue of limited data availability in BECP. By leveraging data from different buildings or regions, transfer learning applies the acquired knowledge to the target domain with limited data, thereby reducing the dependency on large-scale datasets. In practical applications, transfer learning can swiftly adapt to variations in the energy consumption characteristics of new buildings or regions, maintaining high prediction accuracy in diverse and dynamic environments. Despite the increasing use of transfer learning in BECP, existing methods often require extensive parameter updates to adapt to the target domain, which hinders practical deployment, especially under limited data scenarios [21]. Furthermore, cross-domain discrepancies, stemming from differences in HVAC configurations, occupant behavior, and energy management strategies, introduce significant domain shifts, compromising generalization and increasing the risk of overfitting [20]. These challenges reveal two major gaps in current BECP methods: (1) the lack of lightweight and adaptive transfer mechanisms that reduce parameter overhead; and (2) insufficient robustness to domain shifts when data are limited.
To address the overparameterization associated with traditional transfer learning, mitigate inter-domain discrepancies among buildings, and enhance the performance of BECP in scenarios characterized by limited data availability, we propose a Low-Rank Attention-Enhanced Temporal Convolutional Transfer Learning method (LRTL-AtTCN). By integrating the attention mechanism with the local feature extraction capability of the Temporal Convolutional Network (TCN), we establish a temporal feature learning framework with the ability to model global dependencies. This framework addresses TCN’s inherently limited receptive field, thereby enhancing the feature extraction ability of the model. Meanwhile, the proposed low-rank transfer learning framework reduces parameter redundancy while maintaining cross-domain invariant features, a property verified through experiments. For the first time, LRTL systematically expounds how the low-rank constraint can simultaneously achieve the dual objectives of minimizing the differences across building domains and preserving key knowledge during the transfer process, providing a viable solution to these challenges. In summary, the LRTL-AtTCN framework employs an attention-enhanced TCN architecture, which captures both local and global temporal dependencies simultaneously through a dynamic weighting mechanism; the low-rank transfer mechanism significantly reduces the parameter transfer cost while maintaining prediction performance across different building domains. The principal contributions of this paper are as follows:
  • We propose integrating the attention mechanism with TCN, named AtTCN. AtTCN enhances the model’s ability to dynamically capture global dependencies with the attention mechanism, addressing the limitations of TCN’s local receptive field, and capturing both local and global hidden features more effectively.
  • By introducing low-rank decomposition, we propose a novel transfer learning-based AtTCN method, named LRTL-AtTCN, which significantly reduces the number of parameters during the transfer learning process and achieves better prediction performance in conditions of limited data, demonstrating great adaptability across different building types.
  • We evaluate LRTL-AtTCN, focusing on its performance in BECP in the source domain and the target domain with limited data, and specifically investigate the impact of the experimental results of the attention mechanism and low-rank decomposition. The code in this paper is available at https://github.com/Fechos/LRTL-AtTCN (accessed on 21 May 2025).
The paper is organized as follows. Section 1 introduces the background and motivation of building energy consumption prediction. Section 2 reviews existing work, including physics-based models, data-driven methods, and methods for addressing data scarcity. Section 3 formally defines the problem setting, and describes the proposed methodology, including data preprocessing, feature engineering, source domain model training, and the low-rank transfer learning framework. Section 4 presents the experimental setup, evaluation metrics, and result analysis. Finally, Section 5 concludes the paper and outlines potential future directions.

3. Methodology

3.1. Problem Statement

BECP represents a typical time series prediction problem focused on predicting future energy consumption trends based on existing historical data. By capturing the time series features of building energy consumption, the model offers a scientific foundation for energy management and optimization, facilitating more efficient energy control and promoting sustainable development.
Time series prediction can be described as follows: given the historical time series $D_{1:t} = \{X_{1:t}, y_{1:t}\}$, where $X_{1:t} \in \mathbb{R}^{t \times m}$ and $y_{1:t} \in \mathbb{R}^{t \times 1}$, the future time series is predicted, where $t$ is the length of the historical time series and $m$ is the number of external features. External features, also known as exogenous features or external inputs, encompass variables relevant to the target time series but not directly derived from its historical data. These features generally capture external environmental conditions and timestamp elements, including meteorological parameters (e.g., temperature, relative humidity, atmospheric pressure) and temporal attributes (e.g., specific dates, day of the year, holiday status). The information provided by these external features plays a critical role in enhancing the predictive accuracy and robustness of models associated with the target time series. In the field of BECP, $X_{1:t}$ represents the external feature series of the building consumption data, and $y_{1:t}$ is the building consumption series. By modeling $P(y_{t+1:t+\Omega} \mid X_{1:t}, y_{1:t})$, we can make predictions about future values, where $\Omega$ denotes the prediction horizon, defined as the number of future time steps over which a forecasting model generates predictions.
In this paper, we employ transfer learning to address the challenge of limited data availability in BECP. Transfer learning [20] is an effective technique for improving the performance of a target domain model through transfer knowledge from the source domain that, while disparate, is relevant to the target domain. In this paper, we designate buildings with abundant data as the source domain and new buildings in similar environments, or those with limited data, as the target domain. By utilizing transfer learning, we aim to transfer the knowledge acquired in the source building domain to enhance prediction performance in the target building domain.
Transfer learning in building energy prediction is defined as follows: assume there are a source building domain and a target building domain. The source building domain data are defined as $D_S = \{X_{1:t_S}^S, y_{1:t_S}^S\}$, where $X_{1:t_S}^S$ is the external feature series of the source building domain, $y_{1:t_S}^S$ is the source building consumption series, and $t_S$ is the length of the source domain historical data. The target building domain data are defined as $D_T = \{X_{1:t_T}^T, y_{1:t_T}^T\}$, where $X_{1:t_T}^T$ is the external feature series of the target building domain, $y_{1:t_T}^T$ is the target building consumption series, and $t_T$ is the length of the target domain historical data; in transfer learning, $t_T$ is considerably shorter than $t_S$.
The objective is to enhance the prediction of the target building consumption series in the target building domain $D_T$ by leveraging the knowledge embedded in the source building domain $D_S$. Given a source building domain model $f_S: (X_{1:t_S}^S, y_{1:t_S}^S) \rightarrow y_{t_S+1:t_S+\Omega}^S$, the objective is to learn a target building domain model $f_T: (X_{1:t_T}^T, y_{1:t_T}^T) \rightarrow y_{t_T+1:t_T+\Omega}^T$ that minimizes the prediction error in the target building domain by effectively utilizing the information from the source building domain:
$$f_T = \arg\min_{f_T} \left( \mathbb{E}_{(X_{1:t_T}^T,\, y_{1:t_T}^T) \sim D_T}\big[\mathcal{L}(f_T)\big] + \lambda\, \mathbb{E}_{(X_{1:t_S}^S,\, y_{1:t_S}^S) \sim D_S}\big[\mathcal{L}(f_S)\big] \right)$$
where $\mathcal{L}$ denotes the loss function, and $\lambda$ is the hyperparameter that balances the losses between the source building domain and the target building domain.

3.2. Data Processing

The overall framework of the method in this paper is shown in Figure 1. It includes the following five steps: (1) Step 1: data acquisition. Energy consumption data in this paper were collected from several office buildings in Shanghai, China, and the weather data were collected from a weather station located within 5 km of the office. (2) Step 2: data processing. Statistical methods are employed to identify and process outliers in the energy consumption data, thereby ensuring the accuracy and completeness of the dataset. Furthermore, the pertinent data are normalized to eliminate discrepancies between different scales and enhance the efficacy of model training. (3) Step 3: feature extraction. The data features were selected using Granger causality analysis, which examines the causal relationship between external features and energy consumption. This method was employed to identify the key features that exert a substantial influence on energy consumption prediction. (4) Step 4: modeling and transfer. We utilize AtTCN for training on the source building domain and save the optimal model. Subsequently, we employ LRTL to transfer the trained model to the small-scale dataset of the target building domain, which enables the transfer of knowledge from the source building domain to the target building domain. (5) Step 5: evaluation. A comparative analysis against other methods is conducted on both the source building domain and the target building domain, with further performance evaluation of LRTL-AtTCN in the Results and Analysis section.
Figure 1. The overall framework of methodology.
A dataset of energy consumption from air-conditioned buildings was analyzed, derived from several office buildings in Shanghai, China, which vary in factors such as building area and HVAC system (Dataset source: Prof. Peng Xu’s group, School of Mechanical and Energy Engineering, Tongji University - De ang Technology, https://mp.weixin.qq.com/s/h9L0MFOQG5SghEZeDu92EA, accessed on 21 May 2025). The data were collected from 1 January 2015, at 12:00 a.m. to 31 December 2016, at 11:00 p.m., at one-hour sampling intervals. Due to a lack of data collection on 29 February 2016, the total number of observed samples for each office building amounted to 17,520. Apart from the energy consumption data of the buildings, the dataset encompasses meteorological information within a 5 km radius of each building, including variables such as temperature, dew point, relative humidity, air pressure, and wind speed. Additionally, timestamp features, which also influence energy consumption, are included. These timestamp features are derived from the time series energy consumption data and encompass the hour of the day, the day of the week, the day of the month, the day of the year, the month of the year, the week of the year, and holiday status. The features delineated above are referred to as external features used for energy consumption prediction, as mentioned in the Problem Statement. The features are described in Table 2.
Table 2. Features information.
This study investigates a representative energy consumption curve from the dataset, aiming to uncover characteristic consumption patterns and temporal dynamics. Figure 2 presents four subplots, each depicting the load curves from Thursday to Saturday in a different season; the horizontal axis in each subplot represents time, while the vertical axis corresponds to the energy consumption data. These subplots illustrate how consumption levels vary across seasons, while the underlying daily and weekly load patterns remain broadly consistent regardless of seasonal factors.
Figure 2. Energy consumption trend for four seasons.
Figure 3 illustrates the differences in energy consumption patterns across weekdays, non-working days, and holidays, highlighting the variations in energy usage among these categories. By comparing the load curves of these three categories, the figure reveals the influence of occupancy and operational patterns on energy consumption.
Figure 3. Energy consumption trend for weekday, weekend, and holiday.
The presence of unidentified variables during the data collection and transmission stages can introduce outliers and missing values into the dataset. To ensure the precision and dependability of the model training process, it is essential to preprocess the data. We carefully identify and address any outliers within the dataset to maintain its quality. First, the distribution of values for each feature is checked using the interquartile range (IQR) method, which determines the range of outliers from the upper and lower quartiles. For the identified outliers, we employ several processing strategies, including removing outliers that deviate significantly from the normal range, filling missing values by linear interpolation, and truncating extreme outliers.
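The following minimal sketch illustrates this cleaning pipeline, assuming the hourly records are held in a pandas DataFrame; the column name, the 1.5x/3x IQR fences, and the interpolation settings are illustrative choices rather than the exact configuration used in this study.

```python
import numpy as np
import pandas as pd

def clean_series(df: pd.DataFrame, col: str = "energy_kwh") -> pd.DataFrame:
    """IQR-based cleaning: drop extreme outliers, truncate mild ones, interpolate gaps."""
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    # Values far outside the IQR fence (3 * IQR) are treated as invalid and removed.
    extreme = (df[col] < q1 - 3.0 * iqr) | (df[col] > q3 + 3.0 * iqr)
    df.loc[extreme, col] = np.nan
    # Milder deviations (beyond 1.5 * IQR) are truncated to the fence.
    df[col] = df[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    # Original gaps and removed outliers are filled by linear interpolation over time.
    df[col] = df[col].interpolate(method="linear", limit_direction="both")
    return df
```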

3.3. Feature Engineering

Feature engineering involves developing meaningful features to serve as inputs for a model. To determine whether external features contribute to the prediction of the energy consumption series, we apply Granger causality analysis to examine the causal relationships between the external series and the energy consumption series. In the Results and Analysis section, we provide experimental results showing how different feature combinations affect the model’s prediction performance.
Granger causality analysis [47] defines a baseline model $y_t = \alpha_0 + \sum_{i=1}^{n} \alpha_i y_{t-i} + \epsilon_t$ and an extended model $y_t = \alpha_0 + \sum_{i=1}^{n} \alpha_i y_{t-i} + \sum_{j=1}^{n} \beta_j x_{t-j} + \epsilon_t$, where $y_t$ and $x_t$ are time series of length $t$, $\alpha_0$ is the constant term, $\alpha_i$ is the autoregressive coefficient of the $i$th lagged value of $y$, $\beta_j$ is the coefficient of the $j$th lagged term of $x$, $n$ is the maximum lag order, and $\epsilon_t$ is the error term. The F-test is used to verify the causal relationship between the two variables:
$$F = \frac{(RSS_1 - RSS_2)/L}{RSS_2/(t - 2L - 1)}$$
where $RSS_1$ is the residual sum of squares of the baseline model; $RSS_2$ is the residual sum of squares of the extended model; $L$ is the number of lags; $t$ is the length of the time series; and $F_{cdf}$ is the cumulative distribution function of the F-distribution. We then calculate the corresponding p-value $p$:
$$p = 1 - F_{cdf}\!\left(F,\, L,\, t - 2L - 1\right)$$
We determine whether to reject the null hypothesis based on the p-value, where $\gamma$ is the significance level:
  • When $p < \gamma$, we reject the null hypothesis, indicating that $x$ has a Granger causal influence on $y$.
  • When $p \geq \gamma$, we accept the null hypothesis, indicating that $x$ has no Granger causal influence on $y$.
In the field of BECP, $x_t$ represents the feature data related to energy consumption mentioned earlier, while $y_t$ represents the energy consumption data. Following an analysis of the building energy consumption data and feature data, a selection of input variables was made, which included temperature, dew point, relative humidity, air pressure, wind speed, hour of the day, and holiday status. Meanwhile, the energy consumption data from the previous 24 h were selected as the time window.
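As an illustration of how such a screening step can be implemented, the sketch below uses the Granger causality test from statsmodels; the column names, maximum lag, and significance level are assumptions for demonstration, not the exact settings of this study.

```python
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

def granger_select(df, target, candidates, max_lag=24, alpha=0.05):
    """Keep candidate features whose best p-value over lags 1..max_lag is below alpha."""
    selected = []
    for feat in candidates:
        # grangercausalitytests takes a two-column frame [effect, cause] and tests
        # whether the second column Granger-causes the first.
        pair = df[[target, feat]].dropna()
        results = grangercausalitytests(pair, maxlag=max_lag, verbose=False)
        p_values = [results[lag][0]["ssr_ftest"][1] for lag in range(1, max_lag + 1)]
        if min(p_values) < alpha:
            selected.append(feat)
    return selected

# Usage with illustrative column names:
# keep = granger_select(df, "energy", ["temperature", "dew_point", "humidity",
#                                      "pressure", "wind_speed", "hour", "holiday"])
```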
To circumvent the potential for dominant effects resulting from discrepancies in the scales of the input variables, to enhance numerical stability, and to ensure the fairness of the model with respect to individual features, we employ Z-score normalization to normalize the input feature values:
$$x_{scale} = \frac{x - \mu}{\sigma}$$
where $x_{scale}$ represents the normalized feature value, and $\mu$ and $\sigma$ denote the mean and standard deviation of the data to be scaled, respectively.
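A minimal sketch of this normalization step is given below; it assumes the statistics are fitted on the training (source) data and then reused for the other splits, which is a common practice rather than a detail stated above.

```python
import numpy as np

def zscore_fit(x_train):
    """Estimate per-feature mean and standard deviation on the training split only."""
    mu = x_train.mean(axis=0)
    sigma = x_train.std(axis=0) + 1e-8  # guard against constant columns
    return mu, sigma

def zscore_apply(x, mu, sigma):
    """x_scale = (x - mu) / sigma, reusing the statistics fitted above."""
    return (x - mu) / sigma

x_src = np.random.rand(1000, 7)   # source-domain feature matrix (illustrative)
x_tgt = np.random.rand(168, 7)    # one week of target-domain features
mu, sigma = zscore_fit(x_src)
x_src_scaled, x_tgt_scaled = zscore_apply(x_src, mu, sigma), zscore_apply(x_tgt, mu, sigma)
```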

3.4. Low-Rank Attention-Enhanced Temporal Convolutional Transfer Learning (LRTL-AtTCN)

3.4.1. Attention-Enhanced Temporal Convolution Network (AtTCN)

The TCN [16] applies convolutional neural networks to time series tasks. By using causal and dilated convolutions, it captures long-range dependencies efficiently and processes the entire sequence in parallel, unlike RNNs that work sequentially.
TCN typically includes causal convolutions to avoid using future information, dilated convolutions to expand the receptive field, and residual connections to stabilize training and reduce gradient vanishing in deep networks.
Given the input building energy consumption data $\{X_{1:t}, y_{1:t}\}$, where $X_{1:t} \in \mathbb{R}^{t \times m}$ and $y_{1:t} \in \mathbb{R}^{t \times 1}$, the TCN is computed as follows:
$$Z_n^l = \sigma\!\left(\sum_{i=0}^{k-1} W_i^l \, Z_{n - d \cdot i}^{l-1} + b^l\right) + Z_n^{l-1}$$
where $Z_n^l$ denotes the output of the $l$th layer at time step $n$ (when $l = 0$, $Z_n^0$ is the input $\{X_{1:t}, y_{1:t}\}$); $k$ is the size of the convolution kernel; $W_i^l$ is the $i$th weight of the convolution kernel in the $l$th layer; $d$ is the dilation factor; $Z_{n - d \cdot i}^{l-1}$ is the output of the $(l-1)$th layer at time step $n - d \cdot i$; $b^l$ is the bias term of the $l$th layer; and $\sigma(\cdot)$ is the activation function.
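To make the causal, dilated convolution concrete, the sketch below implements one such block in PyTorch; the channel count, kernel size, and activation are illustrative and not taken from the paper’s configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvBlock(nn.Module):
    """One dilated causal convolution with a residual connection (TCN-style)."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        # Left padding of (k - 1) * d keeps the convolution causal: the output at
        # step n only depends on inputs at steps <= n.
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, z):
        # z: (batch, channels, time)
        h = torch.relu(self.conv(F.pad(z, (self.pad, 0))))  # pad only the past side
        return h + z                                          # residual connection

x = torch.randn(8, 16, 24)                        # batch of 24-step sequences, 16 channels
print(CausalConvBlock(16, dilation=2)(x).shape)   # torch.Size([8, 16, 24])
```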
The attention mechanism [48] enables the model to capture global dependencies by jointly attending to all time steps in the input sequence. This parallel computation improves the modeling of long-range relationships. The multi-head attention mechanism extends this method by performing multiple attention operations in parallel, thereby enhancing the model’s representational capacity. The multi-head attention mechanism can be described as follows:
Given the input building energy consumption data $\{X_{1:t}, y_{1:t}\} = \{(X_1, y_1), (X_2, y_2), \ldots, (X_t, y_t)\}$, where $(X_i, y_i) \in \mathbb{R}^{m+1}$ represents the $i$th input vector and $m + 1$ is the dimension of the input vector, the model uses a multi-head mechanism to compute self-attention scores independently and in parallel. Each attention head $\mathrm{head}_i$ is computed as follows:
$$\mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i$$
where $Q_i$, $K_i$, and $V_i$ denote the query, key, and value matrices of the $i$th head, respectively, and $d_k$ is their dimension.
Concatenate the outputs of these heads:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_n)\, W^O$$
where $W^O$ is the output linear transformation matrix, and $\mathrm{Concat}$ denotes concatenating the outputs of all heads along the feature dimension.
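The following short PyTorch example shows multi-head self-attention applied to a toy batch of hourly sequences; the tensor sizes and head count are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy batch: 32 sequences of 24 hourly steps, each step an 8-dimensional vector
# (e.g., 7 external features plus the consumption value); sizes are illustrative.
x = torch.randn(32, 24, 8)

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
# Self-attention: queries, keys, and values all come from the same sequence,
# so every time step can attend to every other time step in parallel.
out, attn_weights = mha(x, x, x)
print(out.shape, attn_weights.shape)  # torch.Size([32, 24, 8]) torch.Size([32, 24, 24])
```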
In BECP, TCN effectively extracts local features by expanding its receptive field through convolution. However, energy consumption is influenced by complex long-term factors such as seasonal patterns, equipment cycles, and environmental conditions, which exhibit intricate temporal dependencies. Due to its local receptive field, TCN struggles to dynamically focus on such variable factors, limiting its capacity to model long-range dependencies and reducing prediction accuracy.
To address this issue, we propose an innovative Attention-Enhanced Temporal Convolution Network (AtTCN) that integrates the MHA with TCN. As illustrated in Figure 4, this integration enables the model to capture global features and long-term dependencies more effectively. MHA allows dynamic and parallel modeling of temporal relationships, significantly improving the model’s ability to process complex patterns in energy consumption data.
Figure 4. AtTCN architecture.
Local hidden features $H_n^l$ are first extracted by the dilated causal convolution:
$$H_n^l = \sum_{i=0}^{k-1} W_i^l \, Z_{n - d \cdot i}^{l-1} + b^l$$
The extracted local hidden features $H_n^l$ are then linearly transformed to generate $Q$, $K$, and $V$, which are processed by the MHA. Combined with the residual connection structure, the final output $Z_n^l$ of AtTCN is computed as follows:
$$Z_n^l = \sigma\!\left(\mathrm{MultiHead}\!\left(H_n^l W_Q^l,\; H_n^l W_K^l,\; H_n^l W_V^l\right)\right) + Z_n^{l-1}$$
where $W_Q^l$, $W_K^l$, and $W_V^l$ are learned matrices that transform the convolved hidden features $H_n^l$ into $Q$, $K$, and $V$. While retaining TCN’s advantages in local feature extraction and its efficient, flexible capture of short-term fluctuations, AtTCN can further capture long-term trends, thereby addressing the complexity of building energy consumption data influenced by multidimensional factors and enhancing prediction accuracy and model robustness.
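A minimal PyTorch sketch of one such layer is shown below, combining a dilated causal convolution with multi-head self-attention and a residual connection; the layer sizes, activation, and overall wiring details are simplified assumptions rather than the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtTCNLayer(nn.Module):
    """One AtTCN-style layer: dilated causal convolution, then multi-head
    self-attention over the convolved features, plus a residual connection."""
    def __init__(self, channels, kernel_size=3, dilation=1, num_heads=2):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, z):
        # z: (batch, channels, time)
        h = self.conv(F.pad(z, (self.pad, 0)))        # local features H^l (causal)
        h_seq = h.transpose(1, 2)                      # (batch, time, channels) for MHA
        attn_out, _ = self.attn(h_seq, h_seq, h_seq)   # global re-weighting of H^l
        out = torch.relu(attn_out).transpose(1, 2)     # activation sigma(.)
        return out + z                                 # residual from Z^{l-1}

x = torch.randn(8, 16, 24)
print(AtTCNLayer(channels=16)(x).shape)  # torch.Size([8, 16, 24])
```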
The AtTCN architecture integrates MHA and TCN to achieve synergy on three levels. First, in terms of feature granularity, TCN captures local patterns through convolution, while MHA refines these features using global context—forming a multi-scale representation akin to a “microscope-telescope” system. Second, for dynamic receptive field adjustment, the attention weights effectively serve as learnable, time-varying dilation factors, allowing the model to adapt its receptive field based on input characteristics. Third, regarding gradient flow, the residual structure creates dual pathways for gradient propagation: preserving local feature stability via TCN and enabling global error correction through the attention branch.

3.4.2. Low-Rank Transfer Learning (LRTL)

Low-rank decomposition [49] is a matrix factorization technique that represents a matrix as the product of lower-rank matrices. It enables dimensionality reduction while preserving key structural information, thus reducing storage and computational costs. This method effectively captures latent features in the data and is widely used in tasks such as data compression and noise reduction.
Given a matrix $M \in \mathbb{R}^{a \times b}$, the objective of low-rank decomposition is to identify two matrices $U \in \mathbb{R}^{a \times r}$ and $V \in \mathbb{R}^{r \times b}$ such that $M$ can be approximated as their product:
$$M \approx UV$$
where $r$ is a rank smaller than $\min(a, b)$.
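As a small illustration, the sketch below computes such a rank-r factorization with a truncated SVD, which is one standard way to obtain U and V; the matrix sizes are arbitrary.

```python
import numpy as np

def low_rank_approx(M, r):
    """Rank-r approximation of M via truncated SVD, returned as factors U (a x r) and V (r x b)."""
    u, s, vt = np.linalg.svd(M, full_matrices=False)
    U = u[:, :r] * s[:r]   # fold the singular values into the left factor
    V = vt[:r, :]
    return U, V

M = np.random.randn(64, 32)
U, V = low_rank_approx(M, r=4)
print(U.shape, V.shape)                                # (64, 4) (4, 32)
print(np.linalg.norm(M - U @ V) < np.linalg.norm(M))   # approximation error is reduced
```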
To address data scarcity challenges in BECP, we propose a transfer learning method with low-rank decomposition (LRTL). This method introduces an incremental matrix derived from pre-trained source model weights, enabling rapid adaptation to new buildings by fine-tuning target features. Compared to traditional methods, LRTL reduces parameter dimensionality while preserving essential source domain features, which alleviates overfitting and improves generalization. Moreover, low-rank decomposition mitigates domain differences, ensuring stable and accurate predictions even with limited data.
Given the pre-trained model weight matrix $W \in \mathbb{R}^{a \times b}$, we introduce an incremental matrix $W_0 \in \mathbb{R}^{a \times b}$ for the weight update:
$$W' = W + W_0$$
where $W'$ is the transferred model weight matrix, and $a$ and $b$ are the dimensions of the weight matrix.
Directly optimizing $W_0$ in such a high-dimensional space is computationally expensive and prone to overfitting. To mitigate these issues, we apply a low-rank decomposition to $W_0$, expressing it as the product of two smaller matrices $B \in \mathbb{R}^{a \times r}$ and $A \in \mathbb{R}^{r \times b}$, where the rank $r$ is much smaller than $\min(a, b)$. This decomposition reduces the complexity of the update by capturing the most significant variations in a lower-dimensional subspace. Intuitively, this is equivalent to representing a large, complex transformation as two simpler, consecutive transformations.
Accordingly, the updated weight matrix becomes
$$W' = W + BA$$
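The sketch below shows one way to realize this update for a linear prediction layer in PyTorch, keeping the pre-trained weight W frozen and training only the low-rank factors B and A; the initialization scheme and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LowRankAdaptedLinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update W' = W + B A."""
    def __init__(self, pretrained: nn.Linear, r: int = 4):
        super().__init__()
        self.base = pretrained
        for p in self.base.parameters():
            p.requires_grad = False                        # source-domain weights W stay frozen
        a, b = pretrained.out_features, pretrained.in_features
        self.A = nn.Parameter(torch.randn(r, b) * 0.01)    # A: (r, b)
        self.B = nn.Parameter(torch.zeros(a, r))           # B: (a, r); zero init => W' = W at start

    def forward(self, x):
        # Equivalent to applying the updated weight W + B A to x.
        return self.base(x) + x @ self.A.t() @ self.B.t()

layer = LowRankAdaptedLinear(nn.Linear(64, 1), r=4)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 260 trainable values
```

In this toy example only 260 values remain trainable while the 65 base parameters stay frozen; the same mechanism underlies the parameter reduction ratios discussed with Table 6.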
Given a source building domain model $f_S: (X_{1:t_S}^S, y_{1:t_S}^S) \rightarrow y_{t_S+1:t_S+\Omega}^S$, the objective is to learn a target building domain model $f_T: (X_{1:t_T}^T, y_{1:t_T}^T) \rightarrow y_{t_T+1:t_T+\Omega}^T$ that minimizes the prediction error in the target building domain by effectively utilizing the information from the source building domain. Low-Rank Transfer Learning is computed as follows:
$$f_T = \arg\min_{f_T} \left( \mathbb{E}_{(X_{1:t_T}^T,\, y_{1:t_T}^T) \sim D_T}\big[\mathcal{L}(f_T;\, W + BA)\big] + \lambda\, \mathbb{E}_{(X_{1:t_S}^S,\, y_{1:t_S}^S) \sim D_S}\big[\mathcal{L}(f_S;\, W)\big] \right)$$
where $\mathcal{L}$ denotes the loss function, and $\lambda$ is the hyperparameter that balances the losses between the source building domain and the target building domain.
As illustrated in Figure 5, we freeze the model’s hidden feature extraction layers and fine-tune only the final prediction layer. This preserves the model’s ability to capture local and global features while adapting the prediction layer to the target domain. This method reduces training costs and mitigates overfitting, improving transfer learning efficiency and generalization. Experimental results show significant performance gains in predicting building energy consumption with varying feature distributions.
Figure 5. LRTL architecture.
LRTL reduces the number of parameters during model optimization while preserving essential features from the source building’s energy consumption task. This enables rapid adaptation to the target building’s characteristics, improving transfer efficiency. The low-rank structure enhances robustness and mitigates overfitting, maintaining high prediction accuracy even with limited data.

3.4.3. Algorithm

Algorithm 1 details the LRTL-AtTCN method for BECP, structured into two stages: the source domain training stage and the transfer learning stage. In the first stage, the AtTCN model is trained on the source building data, capturing localized temporal features through convolutional layers and enhancing global hidden features using the multi-head attention mechanism with residual connections. In the subsequent transfer learning stage, the pretrained AtTCN model is adapted to the target building data using the Low-Rank Transfer Learning (LRTL) technique. This adaptation refines the model weights to account for domain-specific variations, employing low-rank decomposition to facilitate knowledge transfer.
Algorithm 1 LRTL-AtTCN method for energy consumption prediction
Input: Source building energy consumption data $(X_{1:t_S}^S, y_{1:t_S}^S) \in \mathbb{R}^{t_S \times (m+1)}$ and target building energy consumption data $(X_{1:t_T}^T, y_{1:t_T}^T) \in \mathbb{R}^{t_T \times (m+1)}$, where $X$ is the external feature series, $y$ is the building energy consumption series, $t$ is the number of time steps, $m$ is the number of external features, and the last column is the target variable;
Output: predicted values for the target variable.
 1:Source Domain Training Stage
 2:  Initialize AtTCN model;
 3:  for episode = 1 to EPISODES do
 4:    for each layer l do
 5:       $H_n^l = \sum_{i=0}^{k-1} W_i^l \cdot Z_{n - d \cdot i}^{l-1} + b^l$;
 6:       $Z_n^l = \sigma\!\left(\mathrm{MultiHead}\!\left(H_n^l W_Q^l, H_n^l W_K^l, H_n^l W_V^l\right)\right) + Z_n^{l-1}$;
 7:    end for
 8:     $output = W \cdot Z_n^l + b$;
 9:    Loss computation: $L(\theta) = \mathrm{MSE}(output, true\_y)$;
10:    Update AtTCN using: $\theta \leftarrow \theta - \eta \nabla_{\theta} L$;
11:  end for
12:  Save AtTCN model;
13:
14:Transfer Learning Stage
15:  Initialize LRTL model and load AtTCN model;
16:  for episode = 1 to EPISODES do
17:    Compute the forward pass using the model weights $(W + BA)$;
18:    Loss computation: $L(\theta) = \mathrm{MSE}(output, true\_y)$;
19:    Update $B$ using: $B \leftarrow B - \eta \nabla_{B} L$;
20:    Update $A$ using: $A \leftarrow A - \eta \nabla_{A} L$;
21:  end for
22:  Save transferred model
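A condensed PyTorch sketch of the transfer learning stage is given below; it assumes the final prediction layer of the pre-trained AtTCN has been wrapped as in the low-rank layer sketch above, and the data loader, epoch count, and learning rate are illustrative.

```python
import torch

def transfer_stage(model, target_loader, epochs=50, lr=1e-3):
    """Adapt a pre-trained source model to the target building by training only
    the low-rank factors A and B of its prediction layer (all other layers are
    assumed to have been frozen beforehand, as in Figure 5)."""
    trainable = [p for p in model.parameters() if p.requires_grad]  # A and B only
    optimizer = torch.optim.Adam(trainable, lr=lr)
    loss_fn = torch.nn.MSELoss()

    for _ in range(epochs):
        for x_batch, y_batch in target_loader:
            optimizer.zero_grad()
            pred = model(x_batch)              # forward pass uses W + B A
            loss = loss_fn(pred, y_batch)
            loss.backward()                    # gradients reach only A and B
            optimizer.step()
    return model
```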

4. Results

4.1. Experiments Setting

This section presents a comprehensive evaluation of LRTL-AtTCN. In the experiment, the source domain data consists of weather and energy consumption data from a building collected between 2016 and 2017, with a data sampling interval of one hour. To simulate a scenario with limited data availability, we selected data from another similar building over three different time spans (one month, two weeks, and one week) as the data source for the target domain. The features utilized in the experimental study were determined through Granger causality analysis. To facilitate comparison with the performance of the subsequent transfer model, the energy consumption data of different buildings are normalized uniformly.
The experimental analysis is conducted from three aspects: (1) Method performance: We evaluate the performance of the AtTCN model in the source domain, and compare it with XGBoost [11], LSTM [13], CNN-LSTM [50], TCN [16], and iTransformer [51]. Then, we analyze LRTL’s transfer performance in the target domain, and compare it with some state-of-the-art transfer learning methods; (2) Method details analysis: A detailed examination of the LRTL-AtTCN model is performed from two aspects, including feature combination and rank setting; (3) Ablation analysis: To comprehensively analyze the impact of different components on the overall performance of the LRTL-AtTCN, we conduct a comparative evaluation with several widely adopted models, including LSTM, CNN-LSTM, TCN, and iTransformer. To gain deeper insights into the integration of TCN and the attention mechanism, we conduct experimental comparisons by varying the placement of the attention layers, aiming to further investigate the influence of the attention mechanism in LRTL-AtTCN.

4.2. Evaluation Metrics

To comprehensively evaluate the prediction performance in the source building domain and the transfer performance in the target building domain, a series of key metrics was employed: the mean absolute error (MAE), measuring the average magnitude of errors; the mean square error (MSE), penalizing larger errors more heavily; the mean absolute percentage error (MAPE), expressing errors relative to the actual values; and the coefficient of determination (R2), indicating how well the model captures the variance of the data:
$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$
$$MSE = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$$
$$MAPE = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2}$$
where $y_i$ denotes the true value at time step $i$, $\hat{y}_i$ denotes the predicted value at the same time step, $N$ denotes the total number of samples, and $\bar{y}$ denotes the mean actual value across all time steps.
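For reference, a small NumPy helper that computes these four metrics is sketched below; it assumes the actual values are non-zero wherever MAPE is evaluated.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """MAE, MSE, MAPE (%), and R^2 as defined above."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    mape = np.mean(np.abs(err / y_true)) * 100          # assumes no zero actuals
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "MSE": mse, "MAPE": mape, "R2": r2}
```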

4.3. Results and Analysis

4.3.1. Method Performance Comparison

To evaluate the prediction performance of LRTL-AtTCN in the source domain, we compared AtTCN with other traditional and state-of-the-art methods. The results, reported as mean values with 95% confidence intervals, are presented in Table 3. The results show that AtTCN outperforms XGBoost significantly, reducing MAE from 0.2092 to 0.0830, MSE from 0.1161 to 0.0234, and MAPE from 143.69% to 35.21%, and increasing R2 from 0.8497 to 0.9674. Compared with LSTM, CNN-LSTM, and TCN, AtTCN achieves lower MAE, MSE, and MAPE, indicating superior prediction accuracy. Against TCN, AtTCN’s attention mechanism enables it to better capture latent features, reducing MAE and MSE by 0.0216 and 0.0044, respectively, while improving R2 by 0.0063 and MAPE by 16.25%. Notably, AtTCN surpasses iTransformer in MAE, MSE, and MAPE with an equivalent R2 of 0.9674, demonstrating its high efficiency in a streamlined structure. Thus, AtTCN exhibits robust prediction performance with lower error rates and better model fitness in the source domain.
Table 3. Source Domain Performance of Different Methods (Mean ± 95% CI).
Figure 6 illustrates the prediction results of different methods, where the horizontal axis represents the time step, the vertical axis represents normalized energy consumption, the blue line represents the actual normalized energy consumption data, and the yellow line denotes the method’s predictions without inverse normalization in each subplot.
Figure 6. The prediction results for different methods.
Compared to the traditional method XGBoost, the AtTCN model demonstrates a much closer fit to the true values, particularly in capturing peaks and valleys, where it exhibits significantly higher accuracy, while XGBoost shows larger deviations from the true values, especially at the extremes, resulting in noticeable errors. This indicates that AtTCN possesses stronger prediction performance for complex time series data. Compared to LSTM and CNN-LSTM, all three methods show relatively small deviations from the true values, especially in regions with minimal fluctuations. However, in specific local areas, such as the peaks and valleys, AtTCN’s predictions align more closely with the true values, resulting in smaller prediction errors, which suggests that AtTCN has better prediction performance. In comparison with TCN, AtTCN’s prediction curve is smoother and aligns more closely with the true values, especially in the peak and valley areas. The addition of the attention mechanism allows AtTCN to capture global temporal dependencies dynamically and more effectively, leading to higher precision, particularly at critical data points. Moreover, AtTCN and iTransformer exhibit highly similar prediction curves, both showing a strong fit at the peaks and valleys, with their curves almost overlapping the true values. Therefore, AtTCN consistently outperforms the other models in these comparisons. In complex time series prediction tasks, AtTCN not only captures the overall trends of the data but also leverages the attention mechanism to handle local fluctuations with greater precision.
Figure 7 also provides insights into the performance of various methods from a different perspective. The horizontal axis represents normalized actual values, while the vertical axis denotes normalized predicted values. The solid blue line indicates perfect alignment between predicted and actual values, the blue dashed lines represent a ±20% margin of error, and the shaded area defines an acceptance interval (AI) where the prediction error is within 20%. A similar AI-based visualization was also employed by Li et al. [52], supporting the validity of this method. Compared to the traditional method XGBoost, AtTCN demonstrates more stable predictions, while XGBoost exhibits significant deviations from the true values, particularly showing larger fluctuations and errors in areas where predictions diverge. Compared to LSTM and CNN-LSTM, AtTCN provides slightly better performance in capturing local details, especially at extreme points and regions with high variability. When compared to TCN, AtTCN’s prediction points are smoother and more tightly clustered around the ideal prediction line, particularly in peak and valley regions, where the attention mechanism significantly enhances the model’s ability to capture both global and local dependencies dynamically, further improving prediction performance. Moreover, compared to iTransformer, both models show a strong fit near the “perfect prediction” line, with nearly identical performance on complex time series data. Therefore, AtTCN delivers superior results in terms of prediction performance and stability, particularly in complex time series prediction tasks.
Figure 7. The distribution of prediction errors for different methods.
We evaluated the transfer performance of LRTL-AtTCN in the target domain, using the performance of AtTCN trained directly in the target domain as a benchmark. Subsequently, we compared the performance of fine-tuned AtTCN, LRTL-AtTCN, and other transfer learning methods.
Table 4 summarizes the transfer performance of different methods across various time series under limited data conditions. Due to the small sample size in the target building domain, R2 values for one-week predictions were omitted, and results are reported as means with 95% confidence intervals. Compared with the fine-tuned AtTCN, LRTL-AtTCN consistently achieves lower MAE, MSE, and MAPE across all forecasting horizons, indicating its superior capacity to capture temporal patterns with greater accuracy and stability. In short-term forecasting, LRTL-AtTCN outperforms aRATL by a considerable margin, particularly in MAPE and MAE, suggesting enhanced precision in low-data regimes. While aRATL improves over longer horizons, LRTL-AtTCN maintains stronger data fitting and robustness. Compared with DCORAL, LRTL-AtTCN exhibits markedly reduced MAE and MSE in both one- and two-week settings, especially excelling in MSE reduction. Moreover, DCORAL requires domain alignment steps, incurring additional transfer costs. Fine-Grained RNN with transfer learning and Freeze-LSTM demonstrate competitive performance in one-week predictions, with low errors and narrow confidence intervals, but their effectiveness diminishes at longer horizons. In contrast, LRTL-AtTCN delivers the most balanced and consistent performance across all time spans.
Table 4. Transfer Performance of Different Methods Across Various Time Series (Mean ± 95% CI).
In summary, LRTL-AtTCN excels in scenarios with limited target data, outperforming state-of-the-art baselines by effectively reducing redundant parameters via low-rank decomposition. This results in lower adaptation cost and better generalization across various prediction intervals.
Figure 8 illustrates the prediction results of different transfer learning methods, where the horizontal axis represents the time step, the vertical axis represents normalized energy consumption, the blue line represents the actual normalized energy consumption data, and the yellow line denotes the method’s predictions without inverse normalization in each subplot. Compared to finetune and aRATL, LRTL consistently achieves better overall fitting performance. Although finetune captures the general trend of energy consumption, there is a significant deviation between the predicted and actual values. aRATL demonstrates satisfactory performance in fitting intermediate values; however, its ability to fit peak and trough values is relatively weak. In comparison to DCORAL, LRTL performs similarly in terms of overall fitting quality, but outperforms DCORAL in fitting trough values, while its fitting of peak values is slightly inferior to that of DCORAL. The Freeze LSTM method exhibits a moderate ability to capture both peak and trough values but suffers from noticeable deviations in certain intermediate periods, indicating instability in its predictions. On the other hand, the Fine-Grained RNN with Transfer Learning achieves a more stable fitting of general trends, with relatively accurate intermediate values. However, like aRATL, it struggles with accurately predicting extreme peaks and troughs. Overall, LRTL maintains its superior performance by consistently demonstrating better fitting of both intermediate and extreme values, reinforcing its robustness and adaptability across different time steps.
Figure 8. The prediction results for different transfer learning methods.
Figure 9 illustrates the absolute errors of various transfer learning methods: finetune, aRATL, DCORAL, Freeze LSTM, Fine-Grained RNN with TL, and LRTL. The horizontal axis represents the models, while the vertical axis shows the absolute error values. Each boxplot depicts the distribution of errors for the respective method, including the interquartile range (IQR), median (orange line), and range (whiskers). Among all methods, LRTL demonstrates the lowest overall error distribution, with a smaller interquartile range (IQR) and a lower median absolute error compared to the other methods. The compactness of the box for LRTL indicates that its predictions are consistently closer to the true values, reflecting higher accuracy and stability across various time steps. Finetune and aRATL have much larger error distributions, with both higher medians and significantly wider IQRs, implying that these methods struggle with achieving consistent accuracy, leading to higher variability in their predictions. While DCORAL and Freeze LSTM show reduced error variability compared to finetune and aRATL, their medians remain higher than that of LRTL, indicating inferior overall prediction accuracy. Fine-Grained RNN with TL achieves a comparable error distribution to LRTL but has a slightly larger median and IQR, suggesting that it performs well but still falls short of LRTL’s precision. Both finetune and aRATL exhibit larger error ranges and potential outliers, further demonstrating their inconsistency. In contrast, LRTL maintains a tight error range, highlighting its robustness and reliability across different conditions. The boxplot effectively demonstrates that LRTL outperforms all other methods in terms of accuracy and stability. Its minimal error distribution, low median, and compact IQR underline its superior ability to generalize and adapt, making it the most reliable method for transfer learning in this context.
Figure 9. Absolute Prediction Error Comparison for Transfer Learning Models.

4.3.2. Method Details Exploration

We analyze the impact of different feature combinations on LRTL-AtTCN’s overall performance; the feature combinations and experimental results are shown in Table 5. The inclusion of weather and timestamp features consistently improves model performance across all evaluation metrics, including MAE, MSE, MAPE, and R2. A comparison between F2 and F3 reveals that weather features contribute more substantially to performance enhancement than timestamp features, as evidenced by the lower MAE and MAPE values in F2. Interestingly, although F4 includes more features than F5, F5, which was constructed based on the results of Granger causality analysis, achieves the best overall performance. The Granger analysis reveals a strong causal relationship between weather factors and energy consumption, whereas only two timestamp-related features (“hour of the day” and “holiday status”) exhibit a meaningful causal link. The inclusion of non-causally related timestamp features in F4 appears to introduce noise, slightly impairing performance compared to the more selective F5. These results demonstrate that Granger causality analysis serves as an effective feature selection method, helping to refine the input space and improve model accuracy across all four metrics, particularly in reducing MAPE.
Table 5. Feature Combination Impact on LRTL-AtTCN Transfer Performance (1-Month).
Table 6 presents the parameters and transfer metrics for the LRTL-AtTCN method across different ranks. It is evident that the number of fine-tuning parameters during the transfer stage of LRTL-AtTCN has been significantly reduced. In conjunction with the results depicted in Table 7, the focus is on evaluating the impact of different rank settings (rank = 4, rank = 8, rank = 16, rank = 32, and rank = 64) on the method’s performance, where MAE, MSE, and R2 values are represented using blue, green, and orange bars, respectively, with the R2 values additionally connected by a dashed trend line for clarity.
Table 6. LRTL-AtTCN Parameters and Transfer Parameters Metrics.
Table 7. Different Rank Setting on LRTL-AtTCN Transfer Performance (1-Month).
Table 6 and Table 7 illustrate the impact of different rank settings on model parameterization and transfer performance of LRTL-AtTCN. As shown in Table 6, increasing the rank results in a higher number of transfer parameters, thereby reducing the parameter reduction ratio, and increasing model complexity during the transfer stage. However, as observed in Table 7, this increase in model complexity does not lead to improved performance; on the contrary, performance metrics such as MAE, MSE, MAPE, and R2 deteriorate as the rank grows. We speculate that larger ranks introduce more trainable parameters, which increases the risk of overfitting on the limited target domain data. This overfitting may cause the model to rely excessively on source-domain-specific features, hindering its generalization ability and, thus, reducing transfer effectiveness.
In contrast, lower-rank configurations (e.g., rank = 4) achieve better overall performance while significantly reducing the number of transfer parameters—achieving a parameter reduction ratio as high as 0.9844. These results suggest that low-rank transfer not only helps control model size and computational costs but also enhances generalization by mitigating overfitting. Therefore, selecting an appropriate rank is essential to balance model complexity and transfer performance, especially in data-scarce scenarios.

4.3.3. Ablation Analysis

Table 8 presents an ablation study that compares various transfer strategies for time series prediction across different horizons. Among them, direct training, which uses only limited target data, performs worst due to its lack of transfer mechanisms and high susceptibility to overfitting, especially under data-scarce conditions. It is also not repeated across runs, so statistical measures are not reported. Similarly, the unadapted strategy, applying a source-trained model directly to the target domain, suffers from domain misalignment and lacks generalization, with no statistical variability reported due to its deterministic inference. The fine-tuned approach improves performance by adjusting the model with target data, partially reducing domain gaps, but it still faces overfitting risks due to limited data. The unified training method combines source and target data, increasing data volume but struggling with domain shifts. In contrast, the proposed LRTL method consistently outperforms others by enforcing a low-rank constraint on transfer layers. This reduces parameters, retains source knowledge, and enhances adaptation. LRTL is especially effective for longer horizons, achieving an R2 of 0.9286 and the lowest MAE and MAPE in the 1-month task, confirming the robustness of low-rank transfer under limited data conditions.
Table 8. Ablation Study on Transfer Strategies for Time Series Prediction.
Table 9 illustrates the impact of different source domain models on the transfer performance of LRTL-AtTCN over a one-month prediction horizon. Compared with LRTL-LSTM and LRTL-CNNLSTM, LRTL-AtTCN achieves notably better predictive accuracy, indicating that the attention-enhanced TCN structure is more effective in capturing transferable temporal patterns. During the transfer process, the attention mechanism in LRTL-AtTCN enables dynamic focus on salient features from the source domain, which facilitates more robust adaptation and mitigates performance degradation. When compared to LRTL-TCN, LRTL-AtTCN consistently maintains superior performance, suggesting that the integration of attention further enhances the temporal representation capability beyond conventional convolutional architectures. Additionally, although LRTL-iTransformer also incorporates attention, LRTL-AtTCN demonstrates greater adaptability and robustness, due to its hybrid design that leverages both temporal convolution and attention mechanisms. Consequently, AtTCN exhibits exceptional performance in LRTL. The integration of the attention mechanism not only enhances the model’s prediction accuracy but also strengthens its adaptability during the transfer process, allowing for more effective migration of features learned in the source domain to the target domain.
Table 9. Source Domain Model Impact on LRTL-AtTCN Transfer Performance (1-Month).
We conducted a comparative analysis of the impact of different sequences of convolutional and attention layers on the performance of the LRTL-AtTCN, which is shown in Table 10. The results demonstrate that the sequence of these layers exerts a substantial influence on model performance. Notably, the configuration of “conv1 + conv2 + attention” yields the best performance in terms of MAE, MSE, and R2. This suggests that performing convolutional operations prior to attention enables more effective feature extraction, thereby allowing the attention mechanism to better focus on critical temporal patterns. Moreover, this configuration also achieves the lowest MAPE, indicating enhanced robustness and relative accuracy across varying scales. These findings highlight the crucial role of architectural design in improving the model’s capability to generalize and accurately predict energy consumption patterns.
Table 10. Layer Arrangements Impact on LRTL-AtTCN Transfer Performance (1-Month).

4.3.4. Discussion

In this paper, we propose LRTL-AtTCN, a novel transfer learning method addressing limited data availability in BECP. By integrating the attention mechanism with TCN, the method effectively captures both local and global hidden features. Experimental results show that AtTCN achieves prediction accuracy comparable to state-of-the-art models while maintaining superior stability, validating the effectiveness of combining attention with TCN. In the transfer stage, low-rank decomposition is introduced, significantly reducing model complexity without compromising transfer performance. We attribute this improvement to the reduction in model parameters and redundancy. Additionally, we thoroughly evaluate the method by analyzing feature combinations, rank settings, network components, and layer sequences. These findings confirm that LRTL-AtTCN effectively tackles the data scarcity challenge in BECP.

5. Conclusions

This paper proposes LRTL-AtTCN, a transfer learning method that integrates low-rank decomposition with TCN and incorporates attention mechanisms to address the performance degradation caused by limited data in building energy consumption prediction. During the pre-training phase, the model leverages TCN to extract local temporal features and employs attention mechanisms to dynamically focus on global dependencies, thereby enhancing the modeling of complex temporal patterns. In the transfer phase, low-rank decomposition significantly reduces the number of model parameters, simplifies data representation, and effectively mitigates domain discrepancies, improving adaptation efficiency. Experimental results demonstrate that the proposed method exhibits strong robustness and generalization across different building scenarios, offering a practical solution to challenges such as domain shift and parameter redundancy commonly encountered in existing prediction methods.
While the proposed method performs well for similar buildings, its performance may be limited under extreme domain shift. When the target and source buildings differ substantially in building type, climate, or HVAC systems, the mismatch between source and target feature distributions can cause negative transfer. Future research could improve the method's adaptability to heterogeneous building data by introducing domain adaptation techniques.

Author Contributions

Material preparation, data collection, analysis, and the writing of the first draft of the manuscript were performed by B.W. The manuscript was reviewed and edited by Q.F. Resources and project administration were handled by Y.L. Supervision and formal analysis were conducted by K.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the National Key R&D Program of China (No. 2020YFC2006602), the Foundation of the Engineering Research Center of Construction Carbon Neutral Technology of Jiangsu Province (No. JZTZH2023-0402), the National Natural Science Foundation of China (No. 62372318, No. 62102278, No. 62072324), the University Natural Science Foundation of Jiangsu Province (No. 21KJA520005), and the Science and Technology Project of Suzhou Water under grant 2023008.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The code is available at https://github.com/Fechos/LRTL-AtTCN (accessed on 21 May 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

Variable      Meaning
D             Dataset
X             External feature series of the building energy consumption series
x             An external feature column of the building energy consumption series
x_scale       The normalized feature value
μ, σ          The mean and the standard deviation of the data
y             Building energy consumption series
t             The length of the time series
m             Number of external features
Ω             The length of the prediction horizon
S             Source domain
T             Target domain
λ             Hyperparameter balancing the losses of the source and target building domains
α_0           Constant term
α_i           The inherent tendency of y to recur at the same value
β_j           The lagged term of x
L             The maximum lag order
ε_t           The error term
p             The corresponding p-value
γ             Significance level
k             The size of the convolution kernel
l             Layer index
θ             Model parameters
W             Model weight matrix
A, B          Low-rank matrices for adaptation
r             Matrix rank

References

  1. Sulkowska, M.S.N.; Nugent, A.; Vega, L.A.; Carrazco, C. Global Status Report for Buildings and Construction; UN Environment Programme: Nairobi, Kenya, 2024. [Google Scholar]
  2. Chen, Y.; Guo, M.; Chen, Z.; Chen, Z.; Ji, Y. Physical Energy and Data-Driven Models in Building Energy Prediction: A Review. Energy Rep. 2022, 8, 2656–2671. [Google Scholar] [CrossRef]
  3. Amasyali, K.; El-Gohary, N.M. A Review of Data-Driven Building Energy Consumption Prediction Studies. Renew. Sust. Energ. Rev. 2018, 81, 1192–1205. [Google Scholar] [CrossRef]
  4. Olu-Ajayi, R.; Alaka, H.; Owolabi, H.; Akanbi, L.; Ganiyu, S. Data-Driven Tools for Building Energy Consumption Prediction: A Review. Energies 2023, 16, 2574. [Google Scholar] [CrossRef]
  5. Zhao, Y.; Zhang, C.; Zhang, Y.; Wang, Z.; Li, J. A Review of Data Mining Technologies in Building Energy Systems: Load Prediction, Pattern Identification, Fault Detection and Diagnosis. Energy Build. 2020, 1, 149–164. [Google Scholar] [CrossRef]
  6. Darwazeh, D.; Duquette, J.; Gunay, B.; Wilton, I.; Shillinglaw, S. Review of Peak Load Management Strategies in Commercial Buildings. Sustain. Cities Soc. 2022, 77, 103493. [Google Scholar] [CrossRef]
  7. Qiu, S.; Li, Z.; Pang, Z.; Zhang, W.; Li, Z. A Quick Auto-Calibration Method Based on Normative Energy Models. Energy Build. 2018, 172, 35–46. [Google Scholar] [CrossRef]
  8. Zhong, H.; Wang, J.; Jia, H.; Mu, Y.; Lv, S. Vector Field-Based Support Vector Regression for Building Energy Consumption Prediction. Appl. Energy 2019, 242, 403–414. [Google Scholar] [CrossRef]
  9. Chen, S.; Zhou, X.; Zhou, G.; Fan, C.; Ding, P.; Chen, Q. An Online Physical-Based Multiple Linear Regression Model for Building’s Hourly Cooling Load Prediction. Energy Build. 2022, 254, 111574. [Google Scholar] [CrossRef]
  10. Rana, M.; Sethuvenkatraman, S.; Goldsworthy, M. A Data-Driven Method Based on Quantile Regression Forest to Forecast Cooling Load for Commercial Buildings. Sustain. Cities Soc. 2022, 76, 103511. [Google Scholar] [CrossRef]
  11. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the Association for Computing Machinery, New York, NY, USA, 13 August 2016; pp. 785–794. [Google Scholar]
  12. Chalapathy, R.; Khoa, N.L.D.; Sethuvenkatraman, S. Comparing Multi-step Ahead Building Cooling Load Prediction Using Shallow Machine Learning and Deep Learning Models. Sustain. Energy Grids. 2021, 28, 100543. [Google Scholar] [CrossRef]
  13. Chen, X.; Qiu, X.; Zhu, C.; Liu, P.; Huang, X. Long Short-Term Memory Neural Networks for Chinese Word Segmentation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1197–1206. [Google Scholar]
  14. Dey, R.; Salem, F.M. Gate-variants of Gated Recurrent Unit (GRU) Neural Networks. In Proceedings of the IEEE 60th International Midwest Symposium on Circuits and Systems, Boston, MA, USA, 6–9 August 2017; pp. 1597–1600. [Google Scholar]
  15. Lara-Benítez, P.; Carranza-García, M.; Luna-Romera, J.M.; Riquelme, J.C. Temporal Convolutional Networks Applied to Energy-Related Time Series Forecasting. Appl. Sci. 2020, 10, 2322. [Google Scholar] [CrossRef]
  16. Lea, C.; Flynn, M.D.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal Convolutional Networks for Action Segmentation and Detection. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 156–165. [Google Scholar]
  17. Dong, H.; Zhu, J.; Li, S.; Wu, W.; Zhu, H.; Fan, J. Short-Term Residential Household Reactive Power Forecasting Considering Active Power Demand via Deep Transformer Sequence-to-Sequence Networks. Appl. Energy 2023, 329, 120281. [Google Scholar] [CrossRef]
  18. Fan, C.; Chen, M.; Tang, R.; Wang, J. A Novel Deep Generative Modeling-Based Data Augmentation Strategy for Improving Short-Term Building Energy Predictions. Build. Simul. 2022, 15, 197–211. [Google Scholar] [CrossRef]
  19. Roth, J.; Martin, A.; Miller, C.; Jain, R.K. SynCity: Using Open Data to Create a Synthetic City of Hourly Building Energy Estimates by Integrating Data-Driven and Physics-Based Methods. Appl. Energy 2020, 280, 115981. [Google Scholar] [CrossRef]
  20. Fang, X.; Gong, G.; Li, G.; Chun, L.; Li, W.; Peng, P. A Hybrid Deep Transfer Learning Strategy for Short-Term Cross-Building Energy Prediction. Energy 2021, 215, 119208. [Google Scholar] [CrossRef]
  21. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H. A Comprehensive Survey on Transfer Learning. Proc. IEEE 2020, 109, 43–76. [Google Scholar] [CrossRef]
  22. Al-Hyari, L.; Kassai, M. Development and Experimental Validation of TRNSYS Simulation Model for Heat Wheel Operated in Air Handling Unit. Energies 2020, 13, 4957. [Google Scholar] [CrossRef]
  23. Crawley, D.B.; Lawrie, L.K.; Winkelmann, F.C.; Buhl, W.F.; Huang, Y.J.; Pedersen, C.O.; Strand, R.K.; Liesen, R.J.; Fisher, D.E.; Witte, M.J.; et al. EnergyPlus: Creating a New-Generation Building Energy Simulation Program. Energy Build. 2001, 33, 319–331. [Google Scholar] [CrossRef]
  24. Im, P.; Joe, J.; Bae, Y.; New, J.R. Empirical Validation of Building Energy Modeling for Multi-Zones Commercial Buildings in Cooling Season. Appl. Energy 2020, 261, 114374. [Google Scholar] [CrossRef]
  25. Chen, Y.; Chen, Z.; Xu, P.; Li, W.; Sha, H.; Yang, Z.; Li, G.; Hu, C. Quantification of Electricity Flexibility in Demand Response: Office Building Case Study. Energy 2019, 157, 1–259. [Google Scholar] [CrossRef]
  26. Soleimani-Mohseni, M.; Nair, G.; Hasselrot, R. Energy Simulation for a High-Rise Building Using IDA ICE: Investigations in Different Climates. Build. Simul. 2016, 9, 629–640. [Google Scholar] [CrossRef]
  27. Zhou, Y.; Su, Y.; Xu, Z.; Wang, X.; Wu, J.; Guan, X. A Hybrid Physics-Based/Data-Driven Model for Personalized Dynamic Thermal Comfort in Ordinary Office Environment. Energy Build. 2021, 238, 110790. [Google Scholar] [CrossRef]
  28. Ngo, N.; Truong, T.T.H.; Truong, N.; Pham, A.; Huynh, N.; Pham, T.M.; Pham, V.H.S. Proposing a Hybrid Metaheuristic Optimization Algorithm and Machine Learning Model for Energy Use Forecast in Non-Residential Buildings. Sci. Rep. 2022, 12, 1065. [Google Scholar] [CrossRef]
  29. Fan, C.; Ding, Y. Cooling Load Prediction and Optimal Operation of HVAC Systems Using a Multiple Nonlinear Regression Model. Energy Build. 2019, 197, 7–17. [Google Scholar] [CrossRef]
  30. He, N.; Liu, L.; Qian, C.; Zhang, L.; Yang, Z.; Li, S. A Closed-Loop Data-Fusion Framework for Air Conditioning Load Prediction Based on LBF. Energy Rep. 2022, 8, 7724–7734. [Google Scholar] [CrossRef]
  31. Le, T.; Vo, M.; Vo, B.; Hwang, E.; Rho, S.; Baik, S. Improving Electric Energy Consumption Prediction Using CNN and Bi-LSTM. Appl. Sci. 2019, 9, 4237. [Google Scholar] [CrossRef]
  32. Cen, S.; Lim, C.G. Multi-Task Learning of the PatchTCN-TST Model for Short-Term Multi-Load Energy Forecasting Considering Indoor Environments in a Smart Building. IEEE Access 2024, 12, 19553–19568. [Google Scholar] [CrossRef]
  33. Li, L.; Su, X.; Bi, X.; Lu, Y.; Sun, X. A Novel Transformer-Based Network Forecasting Method for Building Cooling Loads. Energy Build. 2023, 296, 113409. [Google Scholar] [CrossRef]
  34. Alsmadi, L.; Lei, G.; Li, L. Forecasting Day-Ahead Electricity Demand in Australia Using a CNN-LSTM Model with an Attention Mechanism. Appl. Sci. 2025, 15, 3829. [Google Scholar] [CrossRef]
  35. Cheng, J.; Jin, S.; Zheng, Z.; Hu, K.; Yin, L.; Wang, Y. Energy consumption prediction for water-based thermal energy storage systems using an attention-based TCN-LSTM model. Sustain. Cities Soc. 2025, 126, 106383. [Google Scholar] [CrossRef]
  36. Liu, S.; Xu, T.; Du, X.; Zhang, Y.; Wu, J. A hybrid deep learning model based on parallel architecture TCN-LSTM with Savitzky-Golay filter for wind power prediction. Energy Convers. Manag. 2024, 302, 118122. [Google Scholar] [CrossRef]
  37. Bandara, K.; Hewamalage, H.; Liu, Y.; Kang, Y.; Bergmeir, C. Improving the Accuracy of Global Forecasting Models Using Time Series Data Augmentation. Pattern Recogn. 2021, 120, 108148. [Google Scholar] [CrossRef]
  38. Fekri, M.N.; Ghosh, A.M.; Grolinger, K. Generating Energy Data for Machine Learning with Recurrent Generative Adversarial Networks. Energies 2020, 13, 130. [Google Scholar] [CrossRef]
  39. Gao, Y.; Ruan, Y.; Fang, C.; Yin, S. Deep Learning and Transfer Learning Models of Energy Consumption Forecasting for a Building with Poor Information Data. Energy Build. 2023, 292, 113164. [Google Scholar] [CrossRef]
  40. Ye, R.; Dai, Q. A Relationship-Aligned Transfer Learning Algorithm for Time Series Forecasting. Inform. Sci. 2022, 593, 17–34. [Google Scholar] [CrossRef]
  41. Zang, L.; Wang, T.; Zhang, B.; Li, C. Transfer Learning-Based Nonstationary Traffic Flow Prediction Using AdaRNN and DCORAL. Expert Syst. Appl. 2024, 258, 125143. [Google Scholar] [CrossRef]
  42. Hua, Y.; Sevegnani, M.; Yi, D.; Birnie, A.; McAslan, S. Fine-Grained RNN with Transfer Learning for Energy Consumption Estimation on EVs. IEEE Trans. Ind. Inform. 2022, 18, 8182–8190. [Google Scholar] [CrossRef]
  43. Xing, Z.; Pan, Y.; Yang, Y.; Yuan, X.; Liang, Y.; Huang, Z. Transfer Learning Integrating Similarity Analysis for Short-Term and Long-Term Building Energy Consumption Prediction. Appl. Energy 2024, 365, 123276. [Google Scholar] [CrossRef]
  44. Li, G.; Wu, Y.; Yan, C.; Fang, X.; Li, T.; Gao, J.; Xu, C.; Wang, Z. An improved transfer learning strategy for short-term cross-building energy prediction using data incremental. Build. Simul. 2024, 17, 165–183. [Google Scholar] [CrossRef]
  45. Kamalov, F.; Sulieman, H.; Moussa, S.; Avante Reyes, J.; Safaraliev, M. Powering Electricity Forecasting with Transfer Learning. Energies 2024, 17, 626. [Google Scholar] [CrossRef]
  46. Cheng, X.; Cao, Y.; Song, Z.; Zhang, C. Wind power prediction using stacking and transfer learning. Sci. Rep. 2025, 15, 11566. [Google Scholar] [CrossRef] [PubMed]
  47. Seth, A.K.; Barrett, A.B.; Barnett, L. Granger Causality Analysis in Neuroscience and Neuroimaging. J. Neurosci. 2015, 35, 3293–3297. [Google Scholar] [CrossRef]
  48. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  49. Liao, S.; Fu, S.; Li, Y.; Han, H. Image Inpainting Using Non-Convex Low Rank Decomposition and Multidirectional Search. Appl. Math. Comput. 2023, 452, 128048. [Google Scholar] [CrossRef]
  50. Kim, T.; Cho, S. Predicting Residential Energy Consumption Using CNN-LSTM Neural Networks. Energy 2019, 182, 72–81. [Google Scholar] [CrossRef]
  51. Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. arXiv 2023, arXiv:2310.06625. [Google Scholar]
  52. Li, T.; Liu, T.; Sawyer, A.O.; Tang, P.; Loftness, V.; Lu, Y.; Xie, J. Generalized Building Energy and Carbon Emissions Benchmarking with Post-Prediction Analysis. Dev. Built Environ. 2024, 17, 100320. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
