1. Introduction
As the optimization of the global energy structure and the pursuit of carbon-neutrality targets become increasingly urgent, building energy consumption has become a core topic in smart-city and green-development strategies. According to published statistics, the operational phase of buildings accounts for approximately 30–40% of global final energy consumption, and heating, ventilation and air conditioning (HVAC) systems, as the key equipment for indoor temperature regulation, commonly consume 30–40% or even more of the total building energy use [
1,
2,
3]. Therefore, improving the accuracy of HVAC energy consumption prediction is crucial: it is a key means of enhancing building energy efficiency and reducing operating costs, and it provides technical support for building intelligence and low-carbon operation.
Energy consumption prediction methods mainly include rule-based engineering models, Linear Regression (LR) [
4], time series models (e.g., ARIMA) [
5], and machine learning methods such as Support Vector Machines (SVMs) [
6] and Random Forests [
7]. These models have certain advantages for linear, univariate, and low-dimensional energy consumption data, but when confronted with the highly nonlinear, multi-source heterogeneous, and strongly time-dependent characteristics of real building systems, they often fail to capture the underlying patterns accurately, and their predictions are biased and unstable. A closer look shows that the nonlinearity arises from cascaded equipment behaviors (power-law responses, saturation, and step changes); the multi-source heterogeneity stems from weather, building structure, equipment, occupant, and network data observed at different spatial and temporal scales; and the strong temporal dependence results from the combined effects of building thermal inertia, equipment state memory, and occupant-driven feed-forward behavior. These three characteristics are not flaws in the data but faithful reflections of the physical operation of HVAC systems, and they place corresponding demands on the modeling structure. It is therefore necessary to introduce deep learning methods with strong feature representation and temporal modeling capabilities to improve prediction accuracy and generalization ability.
A number of studies have investigated building and HVAC energy consumption prediction from different perspectives. Zhao and Magoulès [
8] provided a comprehensive review of building energy consumption prediction methods, covering engineering, statistical, and artificial intelligence-based models. More recently, Ciampi et al. [
9] applied Bayesian Networks to industrial HVAC systems and demonstrated that probabilistic graphical models can effectively capture the complex dependencies between operating conditions and energy use. These works highlight both the importance and the difficulty of accurate HVAC energy prediction, especially under nonlinear and multi-factor coupling effects.
More recently, a variety of deep learning-based approaches have been proposed for building and HVAC energy prediction. For example, Wan et al. [
10] developed a least-squares support vector machine-based framework for HVAC energy consumption prediction and demonstrated that nonlinear kernel methods can effectively handle multi-factor coupling effects. Liu et al. [
11] proposed a dynamic prediction method for HVAC systems operating under different modes and seasons, highlighting the importance of capturing complex temporal patterns and mode transitions. Pan et al. [
12] introduced a probabilistic HVAC load forecasting method based on TCN, showing that temporal convolutional architectures can effectively model long-term dependencies in HVAC load series. In addition, Chen et al. [
13] presented a systematic review of building energy consumption prediction methods using artificial intelligence techniques, summarizing the current trends and challenges in this field.
In recent years, as deep learning has made breakthroughs in areas such as image recognition and natural language processing, its application in building energy prediction has become increasingly widespread. Typical architectures such as the Recurrent Neural Network (RNN) [
14], Convolutional Neural Network (CNN) [
15], Long Short-Term Memory (LSTM) [
16], Gated Recurrent Unit (GRU) [
17,
18,
19] and other structures have been widely used in energy consumption prediction tasks, showing good nonlinear modeling capabilities and the ability to capture temporal information. Chae et al. [
20] developed an LSTM-based model for forecasting sub-hourly electricity usage in commercial buildings, and reported that the LSTM reduced the RMSE from 0.27 to 0.19 compared with SVR, corresponding to a performance improvement of approximately 29.63%; Zhao et al. [
18] proposed a GRU-based approach for short-term heating load prediction. Their experimental results showed that the GRU model reduced the MAPE from 28.33% to 17.34% compared with traditional methods, and explicitly captured the cyclical variations between weekdays, weekends, and holidays.
However, these recurrent architectures still suffer from two shortcomings: on the one hand, their unidirectional structure makes it difficult to fully exploit the complete contextual information, limiting the model's ability to jointly model historical and future information; on the other hand, their sequential processing restricts training efficiency and the flexibility of model scaling.
To overcome the above problems, scholars have tried to introduce more expressive and flexible model structures in recent years, such as Temporal Convolutional Network (TCN) [
21] and Attention [
22]. For temporal convolutional networks, previous studies have demonstrated that TCN can achieve lower prediction errors with fewer training epochs than RNN/LSTM models, e.g., reducing RMSE by about 20% while shortening the training time by 40% in time-series forecasting tasks [
21,
23].
In real building operations, HVAC energy consumption exhibits complex temporal patterns, such as daily and weekly cycles, as well as slow seasonal trends, even when the data are sampled at an hourly resolution. Capturing such multi-scale temporal dependencies within a single time series is therefore a key challenge for accurate HVAC energy consumption prediction.
Although the above studies have significantly advanced HVAC and building energy consumption prediction, several limitations remain. First, many existing works focus on either purely statistical models or single deep learning architectures, which may struggle to simultaneously capture multi-scale temporal dependencies, bidirectional contextual information, and key time slices with large load changes. Second, a large portion of the literature evaluates the models on a limited number of buildings or in short-term test periods, and the cross-building generalization capability is still insufficiently explored. Third, the interpretability of deep learning models for HVAC load forecasting, especially regarding the contribution of different time steps and exogenous features, is often not explicitly analyzed. These limitations motivate the development of hybrid deep learning architectures that can integrate complementary temporal modeling modules and provide more interpretable predictions for building-level HVAC energy consumption.
To address these gaps, this paper proposes a hybrid TCN-BiGRU-Attention model that integrates three complementary temporal modeling components within a single framework, and performs a comprehensive evaluation on the ASHRAE public dataset, including ablation experiments, attention-based interpretability analysis, cross-building generalization tests, and comparisons with both deep-learning and non-deep-learning baselines. The model synthesizes the structural advantages of the three types of sub-modules and possesses several significant features as follows:
- (1)
The convolutional structure of TCN is utilized to extend the receptive field without increasing network depth and to extract local-to-global temporal features, enhancing multi-scale temporal feature modeling;
- (2)
BiGRU is introduced to construct bidirectional temporal dependencies, strengthening the model's joint perception of trends and short-term fluctuations and its comprehensive modeling of contextual information;
- (3)
The attention mechanism is introduced to identify key moments in energy consumption (e.g., equipment start/stop, drastic climate changes), realizing weighted modeling of key time slices and sharpening the model's focus on them;
- (4)
The modularized design facilitates deployment and iteration, and the analysis of attention weights provides a degree of interpretability, which supports engineering applications and improves scalability.
The remainder of this paper is organized as follows.
Section 2 reviews the related work on HVAC and building energy consumption prediction.
Section 3 presents the architecture and key components of the proposed TCN-BiGRU-Attention model.
Section 4 describes the dataset, feature engineering steps, and evaluation setup.
Section 5 reports and discusses the model performance, ablation studies, sensitivity analysis, and cross-building generalization results. Finally,
Section 6 summarizes the main findings, discusses the limitations of this study, and outlines directions for future work and practical deployment.
2. Related Works
2.1. Application of TCN in Temporal Modeling
Temporal Convolutional Network (TCN) is a temporal modeling method based on a convolutional structure, proposed by Zhao et al. (2019) [
24] as an alternative to RNN-type models. Its core structural design points are as follows:
- (1)
Dilated Convolution: by exponentially increasing the dilation rate, TCN is able to capture long-distance dependencies with a limited number of convolutional layers;
- (2)
Causal Convolution: ensures that the current output depends only on historical information, avoiding future information leakage;
- (3)
Residual Block: alleviates the vanishing-gradient problem in deep networks and improves training stability.
Owing to this design, TCN supports parallel computation and trains faster than RNN/LSTM models, can model long-range dependencies, has a larger receptive field than GRU at a comparable number of layers, and its convolutional kernel structure provides good local feature extraction that captures both trends and fluctuations.
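To make the receptive-field argument concrete, the following minimal sketch (illustrative only, not the configuration used in this paper) computes the receptive field of a stack of dilated causal convolutions whose dilation doubles at each layer:

```python
def tcn_receptive_field(kernel_size, num_layers):
    """Receptive field of stacked dilated causal conv layers
    with dilations 1, 2, 4, ... (doubling per layer)."""
    return 1 + (kernel_size - 1) * sum(2 ** i for i in range(num_layers))

# With kernel size 3, four layers already cover 31 time steps,
# and eight layers cover 511 -- enough for a 168-hour weekly window.
print(tcn_receptive_field(3, 4))  # 31
print(tcn_receptive_field(3, 8))  # 511
```

This exponential growth is why TCN can cover long histories without the depth (or sequential cost) that an RNN would need.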
In energy prediction, TCN has been used for tasks such as wind power prediction, load decomposition, and HVAC energy modeling. Gautam et al. (2025) [
23] achieved short- and medium-term prediction of building electrical loads based on a TCN model, which significantly improved robustness and prediction accuracy.
2.2. Advantages of BiGRU in Bidirectional Sequence Modeling
GRU is a recurrent neural network that is structurally simpler and more computationally efficient than LSTM, using two gating mechanisms (a reset gate and an update gate) to control the information flow. BiGRU, the bidirectional GRU structure, learns context dependencies in a sequence in both the forward and backward directions, making it suitable for tasks that depend on both past and future information.
Compared with the unidirectional GRU, BiGRU offers more comprehensive context modeling, improves the perception of turning points and cycle boundaries, helps the model consistently capture trend changes and abrupt time points, and performs more stably in small-sample or long-range dependency tasks.
In recent years, BiGRU has been widely used in tasks such as air-quality prediction and building temperature-control curve regression, demonstrating its reliability in time-series regression.
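As an illustration of the gating and bidirectional mechanism described above, the following minimal NumPy sketch (a toy implementation with randomly initialized weights, not the trained model used in this paper) runs a GRU over a sequence in both directions and concatenates the hidden states:

```python
import numpy as np

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update: the update gate z and reset gate r control the flow."""
    sigmoid = lambda a: 1 / (1 + np.exp(-a))
    z = sigmoid(x @ Wz + h @ Uz)               # update gate
    r = sigmoid(x @ Wr + h @ Ur)               # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)   # candidate state
    return (1 - z) * h + z * h_tilde

def bigru(seq, params_f, params_b, hidden):
    """Run a GRU forward and backward over the sequence, concatenate states."""
    hf, hb = np.zeros(hidden), np.zeros(hidden)
    fwd, bwd = [], []
    for x in seq:                              # forward pass
        hf = gru_step(x, hf, *params_f)
        fwd.append(hf)
    for x in seq[::-1]:                        # backward pass
        hb = gru_step(x, hb, *params_b)
        bwd.append(hb)
    # Re-reverse the backward states so both directions align in time.
    return np.concatenate([np.array(fwd), np.array(bwd)[::-1]], axis=1)

rng = np.random.default_rng(0)
d, h = 4, 8
params = lambda: tuple(rng.normal(scale=0.1, size=s) for s in [(d, h), (h, h)] * 3)
out = bigru(rng.normal(size=(24, d)), params(), params(), h)
print(out.shape)  # (24, 16)
```

Each time step thus carries both a past-looking and a future-looking summary, which is what lets BiGRU represent ramp-up/ramp-down behavior around peaks.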
2.3. Attention Mechanism and Its Introduction Value in Energy Consumption Prediction
The attention mechanism (Attention) was proposed by Vaswani et al. (2017) [
25] for the neural machine translation task. This mechanism achieves selective modeling of input information by computing importance weights for each part of the input sequence with respect to the current task goal. The core idea is “resource focusing”: instead of processing all time steps equally, computational resources are concentrated on the more critical parts.
In energy consumption prediction, the Attention mechanism brings three benefits. First, it can automatically identify intervals with large changes in energy consumption during key periods such as peak hours and equipment start/stop times. Second, the time-step weights can be visualized to assist in analyzing the causes of energy consumption, enhancing the interpretability of the model. Third, it provides a degree of filtering for anomalous and non-critical data, improving prediction robustness.
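The "resource focusing" idea can be sketched as a scoring-and-softmax step (an illustrative NumPy sketch; the scaled dot-product scoring and the dimension sizes are assumptions for the example, not the exact mechanism used in this paper):

```python
import numpy as np

def attention_pool(hidden_states, query):
    """Score each time step against a query vector, softmax the scores into
    weights, and return the weighted sum plus the weights themselves
    (the weights are what gets visualized for interpretability)."""
    scores = hidden_states @ query / np.sqrt(len(query))
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    context = weights @ hidden_states          # weighted sum over time steps
    return context, weights

rng = np.random.default_rng(1)
H = rng.normal(size=(24, 16))                  # 24 hourly hidden states
context, w = attention_pool(H, rng.normal(size=16))
print(w.sum().round(6), context.shape)  # 1.0 (16,)
```

Because the weights form a probability distribution over time steps, plotting them directly shows which hours the model attends to, as done later in Section 5.3.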
At present, Attention mechanism has been integrated into a variety of structures, such as Seq2Seq+Attention [
26], TCN-Attention [
27], Informer [
28], etc., which are widely used in short-term load forecasting, smart grid scheduling, air conditioning system control and other tasks.
To provide a clearer comparison of representative HVAC and building energy prediction methods,
Table 1 summarizes several typical models, their main characteristics, and limitations.
As shown in
Table 1, previous studies have extensively explored LSTM/GRU, SVM-based models, probabilistic graphical models, and TCN-based architectures for HVAC and building energy prediction. However, only a few works attempt to jointly integrate multi-scale temporal convolutions, bidirectional recurrent modeling and attention mechanisms, or to systematically evaluate hybrid models on publicly available datasets with cross-building validation. These observations further motivate the hybrid TCN-BiGRU-Attention framework proposed in this paper.
2.4. Summary
Recent reviews and application studies on building and HVAC energy prediction [
8,
9] further confirm that there is still room for improvement in terms of modeling nonlinear dynamics, temporal dependencies, and cross-feature interactions.
In summary, existing studies have explored multiple deep time-series modeling for HVAC energy consumption prediction. However, it is usually difficult for a single model structure to simultaneously meet the following three requirements: (i) multi-scale time-series modeling capability; (ii) context modeling capability; and (iii) time-slice criticality identification capability.
The multi-module structure introduced in this paper, integrating TCN, BiGRU, and the Attention mechanism, is an integrated innovation that absorbs the advantages of the above three types of methods, offering stronger representation capability and practical value.
4. Modelling and Evaluation
The TCN-BiGRU-Attention model proposed in this paper was tested on a unified platform with a systematically designed experimental procedure, covering the data source and preprocessing, feature construction, experimental parameter settings, and the selection of comparison models. The model proves highly effective in the HVAC energy consumption prediction task, and multiple independent runs were performed to ensure the reproducibility and comparability of the results.
4.1. Dataset Source
The modelling and evaluation in this paper are based on the building energy consumption dataset published by the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE). This dataset includes energy consumption data of HVAC systems from multiple commercial buildings across various climate zones, along with corresponding environmental, equipment, and temporal information.
According to the dataset description, the analyzed buildings are commercial facilities equipped with centralized, electrically driven HVAC systems (e.g., chillers, pumps, air-handling units, and ventilation fans). The publicly released ASHRAE dataset provides aggregated meter readings at the building level and does not disclose detailed device-level specifications such as manufacturer, rated capacity, or control strategy. Consequently, the proposed model targets building-level HVAC energy consumption prediction for such centralized systems, rather than the performance of individual air-conditioning units.
In this study, two office buildings from the ASHRAE dataset are selected and denoted as Building 1 (B1) and Building 2 (B2):
- (1)
Building 1 (B1) is used for model development, including training, validation, and primary testing. It is an office building located in a temperate climate region and equipped with a centralized, electrically driven HVAC system. Unless otherwise specified, all model comparisons in
Section 4 and
Section 5 are conducted on B1.
- (2)
Building 2 (B2) is another office building with different floor area and climatic conditions. It is reserved for cross-building generalization experiments in
Section 5.5, where the model trained on B1 is directly applied to B2 without retraining.
Throughout this paper, we consistently use the notation “Building 1 (B1)” and “Building 2 (B2)” to avoid ambiguity.
To avoid information leakage, the dataset was split chronologically rather than randomly. For B1, the first 70% of the hourly records in the 12-month period were used as the training set, the subsequent 10% as the validation set for hyperparameter tuning and early stopping, and the final 20% as the test set for performance evaluation. The same chronological splitting strategy was adopted for Building 2 (B2) in the generalization experiment.
In particular, this study uses the publicly available “ASHRAE—Great Energy Predictor III” dataset released by the American Society of Heating, Refrigerating and Air-Conditioning Engineers on the Kaggle platform [
29].
Each building contains approximately 8784 hourly records per year (366 × 24, since the measurement year is a leap year), resulting in hundreds of thousands of samples in total. In this study, a representative building (Building 1, B1) was selected as the primary experimental subject. All data were standardized, cleaned, and preprocessed to ensure the stability and consistency of the input features.
In the official metadata of the ASHRAE dataset, the building selected as B1 in this study is labeled as an office building located in a temperate climate region. The building is served by a centralized HVAC system and is representative of medium-to-large commercial office buildings. For the cross-building generalization experiment in
Section 5.5, B2 is another office building with different floor area and climatic conditions. The dataset anonymizes the exact geographical locations and detailed architectural characteristics for privacy reasons; therefore, this study focuses on the building type (office) and climatic context rather than on specific addresses.
As shown in
Figure 2, the dataset records electricity consumption readings over a continuous 12-month period, exhibiting a clear seasonal pattern. During the late-spring to autumn months (late May to October), electricity consumption increases significantly.
The data were sampled on an hourly basis over 12 consecutive months. As illustrated in
Figure 3, after normalization, the data reveal the daily, weekly, and monthly electricity consumption of the HVAC system. The peak consumption period occurs between 10:00 and 19:00 each day.
Additionally, the dataset includes meteorological and climatic attributes such as temperature and wind speed. As depicted in
Figure 4, the statistical relationship between electricity consumption and climatic factors shows that energy usage peaks when the ambient temperature ranges between 20 °C and 30 °C, and similarly, when the wind speed is within levels 3 to 6 (on the Beaufort scale).
4.2. Data Preprocessing and Feature Construction
To enhance the model’s learning performance and generalization capability, the following preprocessing steps were applied to the raw data:
- (1)
Handling Missing Values
For the missing meteorological and meter data, linear interpolation and forward fill methods were employed.
Outliers were handled using the 3σ (three-sigma) rule for truncation, with values clipped to the following range:

x ∈ [μ − 3σ, μ + 3σ]

where μ and σ represent the mean and standard deviation of the corresponding feature, respectively.
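A minimal sketch of the truncation step (illustrative NumPy code; the sample values and statistics are made up for the example):

```python
import numpy as np

def three_sigma_clip(x, mu, sigma):
    """Truncate values outside [mu - 3*sigma, mu + 3*sigma]."""
    return np.clip(x, mu - 3 * sigma, mu + 3 * sigma)

# Two spurious readings get pulled back to the 3-sigma bounds [12, 30].
x = np.array([20.0, 22.0, 95.0, -40.0, 21.0])
print(three_sigma_clip(x, mu=21.0, sigma=3.0))  # [20. 22. 30. 12. 21.]
```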
- (2)
Temporal Feature Extraction
To capture the temporal periodic structure, the following time-based features were extracted from the timestamp:
Hour ∈ {0, 1, …, 23}, Day of the week ∈ {0, 1, …, 6}, holiday (binary: 0/1)
where both the hour and the day of the week were encoded cyclically using sine–cosine functions, constructing continuous features as follows:

Hour_sin = sin(2π · Hour/24), Hour_cos = cos(2π · Hour/24)
Week_sin = sin(2π · Day/7), Week_cos = cos(2π · Day/7)
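The encoding can be sketched as follows (illustrative NumPy code); note that hour 23 and hour 0 become neighbors on the unit circle, which a raw integer encoding fails to capture:

```python
import numpy as np

def cyclic_encode(value, period):
    """Map a periodic integer feature onto the unit circle."""
    angle = 2 * np.pi * value / period
    return np.sin(angle), np.cos(angle)

s0, c0 = cyclic_encode(0, 24)
print(s0, c0)  # 0.0 1.0

# Distance between hour 23 and hour 0 is small (adjacent on the circle),
# whereas the raw integers 23 and 0 are maximally far apart.
s23, c23 = cyclic_encode(23, 24)
print(round(np.hypot(s23 - s0, c23 - c0), 3))  # 0.261
```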
- (3)
Feature Normalization
All continuous numerical features were standardized using Z-score normalization to eliminate the effects of differing scales:

z = (x − μ)/σ

where μ and σ are the mean and standard deviation computed from the training dataset.
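A sketch of the standardization step, emphasizing that the statistics come from the training split only so that no test-period information leaks into preprocessing (illustrative NumPy code with made-up values):

```python
import numpy as np

def zscore_fit_transform(train, test):
    """Standardize using statistics computed on the training split only."""
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    return (train - mu) / sigma, (test - mu) / sigma

train = np.array([[10.0], [12.0], [14.0]])
test = np.array([[16.0]])
tr, te = zscore_fit_transform(train, test)
# The test value is scaled by the *training* mean and std.
print(round(float(te[0, 0]), 3))  # 2.449
```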
- (4)
Input-Output Window Setup
The input sequence consists of the feature set from the past T = 168 h (i.e., 7 days), and the prediction target is the HVAC energy consumption for the next P = 24 h.
The input–output format is as follows:

X_t = [x_{t−T+1}, …, x_t], Y_t = [y_{t+1}, …, y_{t+P}]

where x_t and y_t represent the feature vector and the target value, respectively, at time step t.
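The windowing scheme can be sketched as follows (illustrative NumPy code with synthetic data; the series length of 1000 steps is made up for the example):

```python
import numpy as np

def make_windows(features, target, T=168, P=24):
    """Slice a continuous hourly series into (past T hours -> next P hours) pairs."""
    X, Y = [], []
    for t in range(T, len(target) - P + 1):
        X.append(features[t - T:t])   # input: features of the past T steps
        Y.append(target[t:t + P])     # output: target over the next P steps
    return np.array(X), np.array(Y)

rng = np.random.default_rng(2)
feats = rng.normal(size=(1000, 8))    # 1000 hourly steps, 8 features per step
targ = rng.normal(size=1000)
X, Y = make_windows(feats, targ)
print(X.shape, Y.shape)  # (809, 168, 8) (809, 24)
```

Because windows are formed over the chronologically split series, each training window precedes every test window in time.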
In summary, the input feature vector at each time step consists of the following attributes:
- (1)
meter_reading_norm: normalized electricity consumption of the HVAC system;
- (2)
air_temperature (°C);
- (3)
wind_speed (m/s);
- (4)
Hour_sin and Hour_cos: sine–cosine encodings of the hour of day;
- (5)
Week_sin and Week_cos: sine–cosine encodings of the day of week;
- (6)
holiday: binary indicator (0/1) denoting whether the current time falls on a weekend or public holiday.
These attributes are used consistently in all models and experiments described in this paper.
4.3. Model Parameter Settings
During the experiments, the main model parameters and their configurations were set as listed in
Table 2.
To prevent model overfitting, an early stopping mechanism was introduced during training (Patience = 7). Specifically, training was terminated if the validation loss did not decrease for seven consecutive epochs.
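The stopping rule can be sketched as follows (an illustrative pure-Python sketch of the Patience = 7 criterion, not the actual PyTorch training loop used in the experiments):

```python
def early_stopping(val_losses, patience=7):
    """Return the 0-based epoch at which training stops: the first epoch
    after which the validation loss has failed to improve for `patience`
    consecutive epochs, or the last epoch otherwise."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1

# The loss stops improving after epoch 3, so training halts 7 epochs later.
losses = [1.0, 0.8, 0.7, 0.6] + [0.65] * 10
print(early_stopping(losses))  # 10
```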
All models in this study were implemented in Python 3.10 using the PyTorch 2.1 deep learning framework. The experiments were conducted on a workstation equipped with an NVIDIA RTX 3080 GPU (10 GB memory; NVIDIA Corporation, Santa Clara, CA, USA), an Intel Core i7-12700 CPU (Intel Corporation, Santa Clara, CA, USA), and 32 GB of RAM, running the Windows 11 operating system (Microsoft Corporation, Redmond, WA, USA). The optimization was performed using the Adam optimizer with an initial learning rate of 1 × 10−3 and a batch size of 64, unless otherwise specified.
To ensure fair comparison, all baseline models were implemented within the same framework and trained using the same training–validation–test splits and early stopping strategy. The main code structure follows an open-source implementation for time-series forecasting, and can be made available upon request to facilitate reproducibility.
4.4. Comparison of Models
To verify the effectiveness of the model, several commonly used forecasting models were selected as baseline models for comparison.
In addition to the deep learning baselines, we also implemented two classical machine-learning models—multiple linear regression (LR) and random forest regression (RF)—as non-deep-learning benchmarks. Both models use the same input features and training/validation splits as the proposed method.
As observed in
Table 3, each model was trained and tested on the same input features and sequence lengths to ensure a fair comparison. The Transformer model and other architectures were implemented based on publicly available implementations, and the input format was adjusted to match the format of the data in this study.
4.5. Evaluation Metrics
This paper uses the following four standard metrics to evaluate the forecasting performance:
Mean Absolute Error (MAE):

MAE = (1/N) Σᵢ |y_i − ŷ_i|

Mean Squared Error (MSE):

MSE = (1/N) Σᵢ (y_i − ŷ_i)²

Root Mean Squared Error (RMSE):

RMSE = √[(1/N) Σᵢ (y_i − ŷ_i)²]

Mean Absolute Percentage Error (MAPE):

MAPE = (1/N) Σᵢ |(y_i − ŷ_i)/y_i|

where y_i denotes the actual energy consumption, ŷ_i represents the predicted value, and N is the number of samples. MAE reflects the overall magnitude of prediction errors, while MSE and RMSE are more sensitive to outliers and thus better suited for evaluating model robustness. MAPE measures the average relative deviation of predictions from the actual values, providing an interpretable indicator of relative prediction accuracy.
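The four metrics can be computed as follows (illustrative NumPy code; the toy values are made up, and MAPE is returned as a fraction, matching the way it is reported in the result tables of this paper):

```python
import numpy as np

def metrics(y_true, y_pred):
    """MAE, MSE, RMSE, and MAPE over paired actual/predicted arrays."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)                    # RMSE is always sqrt(MSE)
    mape = np.mean(np.abs(err / y_true))   # assumes y_true has no zeros
    return mae, mse, rmse, mape

y = np.array([100.0, 200.0, 400.0])
p = np.array([110.0, 190.0, 440.0])
mae, mse, rmse, mape = metrics(y, p)
print(mae, round(rmse, 3), round(mape, 4))  # 20.0 24.495 0.0833
```

The identity RMSE = √MSE also provides a quick consistency check on reported result tables.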
5. Results and Discussion
To comprehensively evaluate the effectiveness of the proposed model in HVAC energy consumption forecasting, this section presents a multi-perspective discussion covering model performance, ablation study analysis, attention visualization, sensitivity analysis, and generalization testing.
5.1. Model Performance Results
Using the ASHRAE B1 dataset, the proposed model and seven comparative models were trained and tested under identical model evaluation settings.
As shown in
Figure 5, the prediction curve of the TCN-BiGRU-Attention model aligns more closely with the actual energy consumption curve. Compared with single or simpler hybrid models such as TCN, BiGRU, and TCN-BiGRU, the proposed architecture effectively integrates the advantages of each component. It achieves synergistic effects in temporal feature extraction, bidirectional information utilization, and key information focusing. Consequently, the proposed model demonstrates superior accuracy and stability in energy forecasting tasks, providing a promising architecture reference for deep learning–based energy prediction research.
The evaluation results are summarized in
Table 4.
As observed in
Table 4, the proposed model outperforms mainstream deep learning models across all evaluation metrics. Compared with the Transformer model, the TCN-BiGRU-Attention model achieves reductions of 2.3% in MAE, 22.2% in RMSE, and 34.7% in MAPE, while the MSE is significantly reduced by 54.1%.
Given that MSE is more sensitive to outliers, these results indicate that the proposed model exhibits stronger robustness in mitigating large prediction deviations.
Computational Cost Comparison
In addition to predictive accuracy, computational efficiency is also an important factor for practical deployment.
Table 5 therefore reports, for each model, the average training time per epoch and the inference time per 1000 prediction windows, measured on the same hardware platform (the GPU configuration described in Section 4.3).
Table 5 summarizes the computational costs of different models for HVAC load forecasting and shows that these costs are consistent with their structural characteristics, while highlighting the favorable accuracy–efficiency trade-off of the proposed TCN-BiGRU-Attention model. Linear Regression (LR) achieves the highest efficiency (4.2 s/epoch training, 8.7 ms per thousand windows inference) due to its minimal parameter size and simple linear operations, and thus serves as an efficient baseline. Random Forest (RF) requires more training time (18.5 s/epoch) because multiple decision trees must be constructed, yet it still avoids the higher complexity of attention-based architectures. Overall, the computational cost follows the clear pattern LR < TCN < GRU < TCN-GRU < TCN-BiGRU-Attention < RF < Transformer, which aligns with the common hierarchy from linear to complex attention-based models. Although TCN-BiGRU-Attention slightly increases training time compared with single TCN/GRU, it maintains acceptable inference latency (25.3 ms per thousand windows) and achieves a 22.2% RMSE and 54.1% MSE reduction relative to the Transformer, while LR shows over 40% higher RMSE and RF yields moderate accuracy with higher cost, confirming the practical deployment advantages of TCN-BiGRU-Attention for HVAC load forecasting.
From a physical and operational perspective, the superiority of the TCN-BiGRU-Attention model can be interpreted as follows. The TCN module effectively captures daily and weekly cycles, as well as smoother seasonal trends, which correspond to typical operation schedules and outdoor weather variations. The BiGRU module exploits bidirectional information to better represent the gradual ramp-up and ramp-down behaviors before and after peak periods. The Attention mechanism highlights those time steps that are most informative for forecasting upcoming loads, such as morning start-up hours and midday peaks. This combination allows the model to produce forecasts that are not only numerically more accurate but also more consistent with HVAC operation patterns observed in practice.
5.2. Ablation Study Analysis
As shown in
Figure 6, removing any individual module leads to degraded performance, indicating that the TCN, BiGRU, and Attention modules work collaboratively in temporal modeling. Notably, the introduction of the Attention mechanism substantially enhances the model’s responsiveness to key temporal moments, as depicted in
Figure 7.
To further validate the effectiveness of each submodule, four variant models were designed for ablation experiments, as summarized in
Table 6:
Ablation experiments were conducted to verify the contributions of key components (Attention mechanism, BiGRU module, TCN module) in the TCN-BiGRU-Attention model, with performance evaluated by MAE, MSE, RMSE, and MAPE.
The results show that when the Attention mechanism is removed (No-Attention), the MAE increases to 0.1192, MSE to 0.0275, RMSE to 0.1657, and MAPE to 0.0325. The significant rise in error indicators indicates that the Attention mechanism can effectively capture long-range dependencies of sequence features and improve prediction accuracy. When the BiGRU module is removed (No-BiGRU), MAE, MSE, and RMSE surge to 0.2236, 0.0748, and 0.2735 respectively, with MAPE at 0.0346. The most severe error deterioration confirms the irreplaceable role of BiGRU in modeling temporal information of sequences. When the TCN module is removed (No-TCN), MAE is 0.1091, MSE is 0.0239, RMSE is 0.1546, and MAPE is 0.0252. Although it outperforms No-Attention and No-BiGRU, it is still much worse than the complete model, demonstrating the critical value of TCN in local feature extraction.
In contrast, the complete TCN-BiGRU-Attention model achieves an MAE of 0.0299, an MSE as low as 0.0017, an RMSE of 0.0414, and a MAPE of 0.0182. All indicators are significantly better than those of the ablated models, fully verifying the synergistic advantages of integrating TCN, BiGRU, and the Attention mechanism. This confirms that the architecture provides efficient feature extraction and temporal modeling capabilities in sequence prediction tasks.
From the ablation study, it can be seen that each submodule—TCN, BiGRU, and Attention—plays a distinct and complementary role. TCN mainly contributes to multi-scale temporal feature extraction, BiGRU enhances bidirectional context modeling, and Attention focuses the model on key time slices with large load changes. The complete TCN-BiGRU-Attention model therefore integrates these three capabilities and achieves the best overall performance, which directly supports the effectiveness of the proposed combination.
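The four metrics used throughout the ablation study follow their standard definitions. As a minimal NumPy sketch for reference (not the evaluation code actually used in the experiments; the small `eps` guard against zero targets is an assumption here), they can be computed as:

```python
import numpy as np

def regression_metrics(y_true, y_pred, eps=1e-8):
    """Compute the MAE, MSE, RMSE, and MAPE used in the ablation study."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)  # RMSE is always the square root of MSE
    mape = np.mean(np.abs(err) / (np.abs(y_true) + eps))
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}
```

Note that RMSE is by definition the square root of MSE, which provides a quick internal consistency check on any reported metric table.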
5.3. Attention Visualization Analysis
To verify the interpretability of the attention mechanism, a random 24-h input sequence was selected, and the corresponding attention weights were visualized, as shown in Figure 8.
Figure 8 reveals that the model assigns higher attention weights to the preceding 1–2 h and to typical HVAC operation peaks (e.g., around 8:00 AM and 1:00 PM). This demonstrates that the model effectively captures critical time segments corresponding to HVAC activity patterns.
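The inspection behind Figure 8 amounts to softmax-normalizing the raw attention scores over the 24 input hours and locating the most heavily weighted time steps. A minimal sketch of this procedure is shown below; the function names are illustrative placeholders, not the model's actual implementation:

```python
import numpy as np

def attention_weights(scores):
    """Softmax-normalize raw attention scores over the time axis."""
    s = np.asarray(scores, dtype=float)
    e = np.exp(s - s.max())  # subtract the max for numerical stability
    return e / e.sum()

def top_hours(weights, k=3):
    """Return the indices (hours) of the k largest attention weights."""
    w = np.asarray(weights)
    return np.argsort(w)[::-1][:k].tolist()
```

Applied to a 24-step score vector, `top_hours` would flag the time slices (e.g., morning start-up and midday peaks) that dominate the prediction.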
5.4. Sensitivity Analysis
To evaluate the influence of individual input features on model performance, a sensitivity analysis was conducted. Specifically, while keeping the model architecture and parameters unchanged, each input feature was removed in turn, and performance metrics on the test set were recorded. The comparative results quantify the relative importance of each feature.
As shown in Figure 9, the model achieves the lowest error (MAE = 0.0299) when all features are included. Removing air_temperature or wind_speed results in performance degradation, with air_temperature removal increasing MAE to 0.0334, indicating its substantial contribution to prediction accuracy. Excluding meter_reading_norm (i.e., historical target values) causes a sharp performance drop (MAE = 0.0574), confirming its critical role as a predictive feature.
In addition to feature-level sensitivity, it is also important to consider the measurement uncertainty of the underlying sensors. The ASHRAE dataset does not explicitly provide detailed accuracy specifications for individual meters and environmental sensors. However, typical commercial-grade electricity meters and temperature sensors used in building energy monitoring often have accuracy levels within ±0.5–1.0% of full scale and ±0.3–0.5 °C, respectively. Therefore, even an ideal model cannot be expected to achieve arbitrarily small prediction errors, since part of the observed deviation originates from measurement noise. The reported MAE and RMSE values in this study are of the same order of magnitude as the plausible sensor uncertainty, suggesting that a non-negligible portion of the residual error is likely due to data-level uncertainty rather than purely model deficiencies.
5.5. Generalization Ability Test
To assess the generalization capability of the proposed model, the trained model was transferred and tested on the ASHRAE B2 dataset. Compared with B1, B2 differs in its floor area and climatic conditions, which allows us to preliminarily examine the robustness of the proposed model across office buildings with different scales and environmental contexts.
It should be noted that both B1 and B2 are office buildings, but they are located at different sites in the ASHRAE dataset, which correspond to different climatic conditions (e.g., different distributions of outdoor temperature and humidity) and building scales. To obtain an initial insight into the influence of climate and building characteristics on generalization, we compare the mean and variance of the outdoor temperature and HVAC energy consumption between B1 and B2. B2 exhibits higher temperature variability and a larger peak-to-average load ratio, indicating more pronounced seasonal and peak-load behaviours.
As illustrated in Figure 10, four independent experiments were conducted. From the cross-building experiments, the prediction errors on B2 are slightly higher than those on B1, which suggests that differences in climatic conditions and building operation patterns do affect the generalization performance. Nevertheless, the TCN-BiGRU-Attention model maintains a lower RMSE and MAPE than the baseline models on both buildings, indicating that the hybrid architecture retains a certain robustness across office buildings with varying climates. A more systematic sensitivity analysis across multiple building types (e.g., residential, retail, hospitals) and climate zones would be required to draw stronger conclusions, and this is left as an important direction for future work.
5.6. Model Validation
Model validation in this study is conducted at two levels. First, an internal validation is performed by splitting the time series chronologically into training, validation, and test sets, and by using the validation set to tune hyperparameters and prevent overfitting through early stopping. Multiple independent runs with different random seeds are carried out, and the average performance on the test set is reported to reduce the influence of randomness.
Second, an external validation is carried out by transferring the model trained on Building 1 to Building 2, which has different floor area and climatic conditions. The results show that the TCN-BiGRU-Attention model maintains a competitive performance on Building 2, achieving an RMSE and MAPE comparable to those obtained on Building 1, which indicates that the model has a certain generalization capability across office buildings.
Moreover, the magnitude of the error metrics observed in this study is broadly consistent with the performance reported in recent HVAC energy prediction works using data-driven models, where MAPE values typically fall in the range of about 5–10% for similar hourly load forecasting tasks [XX–YY]. Although direct numerical comparison is limited by differences in building types, climates, and feature sets, this consistency suggests that the proposed model reaches a realistic level of predictive accuracy in line with the current state of the art.
6. Limitations and Future Work
Despite these achievements, there remain several directions for further improvement:
(1) Validation and Optimization on Proprietary Datasets: Considering the complexity of HVAC systems in real-world deployment, future research can focus on collecting and refining proprietary building energy datasets to validate and optimize the proposed model’s forecasting reliability under practical conditions.
(2) Multimodal Feature Fusion and Climate Adaptability: Future work could integrate multimodal data sources, such as thermal infrared imagery, semantic sensor data, and Building Information Modeling (BIM) information, to enhance the model’s adaptability to diverse building types and varying climatic environments.
It should be emphasized that the experiments in this paper were conducted at an hourly sampling resolution. Extending the evaluation to multiple temporal granularities (e.g., 15-min, half-hourly, or daily series) and explicitly modeling cross-granularity interactions will be considered in future work.
From a practical perspective, the proposed TCN-BiGRU-Attention model is suitable for deployment in both day-ahead and intra-day HVAC load forecasting scenarios. In an online application, the model can be used to generate rolling 24-h forecasts every hour based on the most recent 168 h of measurements, providing decision support for optimal operation scheduling and demand-side management.
Since the statistical characteristics of building loads may gradually drift due to equipment aging, occupancy pattern changes and retrofits, periodic model retraining with newly collected field data (e.g., on a weekly or monthly basis) is recommended to maintain long-term accuracy. In future work, we plan to investigate lightweight online update and transfer learning strategies to further reduce the maintenance cost. In addition, integrating the proposed model into building energy management systems (BEMS) to drive rule-based or optimization-based control actions will be an important direction towards real-time energy-efficient HVAC control.
Furthermore, the cross-building validation in this study is limited to two office buildings in different climate contexts. The generalization of the proposed model to other building types and climate zones has not yet been fully assessed.
7. Conclusions
This paper proposes a hybrid TCN-BiGRU-Attention model for building-level HVAC energy consumption prediction based on the ASHRAE public dataset. The model is designed to jointly exploit multi-scale temporal patterns through TCN, bidirectional contextual information through BiGRU, and key time slices through an Attention mechanism.
The evaluation results on B1 show that the proposed model outperforms several representative baseline methods, including GRU, TCN, TCN-GRU, Transformer, linear regression, and random forest. Specifically, the TCN-BiGRU-Attention model achieves the lowest MAE, RMSE, MSE and MAPE among all models, reducing the RMSE and MSE by approximately 22.2% and 54.1%, respectively, compared with the best-performing baseline. The visualization of prediction curves indicates that the proposed model can more accurately track both the overall trend and most of the peak loads, which is crucial for reliable HVAC operation and demand-side management.
Ablation experiments further confirm the effectiveness of the hybrid architecture. Removing any submodule (TCN, BiGRU, or Attention) leads to a noticeable degradation in prediction accuracy, showing that each component plays a distinct and complementary role in multi-scale temporal feature extraction, bidirectional context modeling, and time-step weighting. The Attention visualization provides additional interpretability by highlighting the time periods that contribute most to the prediction, such as typical start-up and peak operation hours.
The cross-building generalization experiment using B2 demonstrates that the model maintains satisfactory performance when transferred to another office building with different floor area and climatic conditions, suggesting that the proposed approach has a certain degree of robustness across buildings. Combined with the analysis of measurement uncertainty, these results indicate that a substantial portion of the residual error may be attributed to data-level noise rather than purely model deficiencies.
From a practical perspective, the proposed model is suitable for day-ahead and intra-day HVAC load forecasting applications. With its relatively low inference cost and improved predictive accuracy, the model can be embedded into building energy management systems to support proactive scheduling, peak shaving, and energy-saving strategies. Periodic retraining with newly collected field data is recommended to maintain long-term accuracy.
In summary, this work contributes a hybrid, interpretable deep learning framework for HVAC energy consumption prediction, together with a comprehensive evaluation on a publicly available dataset. Future work will focus on extending the model to more building types and temporal granularities, integrating it with advanced control strategies, and developing online updating mechanisms for long-term deployment.